dkersten · 2026-06-23T06:29:53 1782196193

And AI tried telling me that Uber for Dogs (dogs are the drivers) was a terrible idea…

dkersten · 2026-06-22T09:20:46 1782120046

It's on other providers, like Together.ai

dkersten · 2026-06-18T08:27:31 1781771251

> This is not scientific at all, just vibes, YMMV.

This is the problem.

I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside, and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.

coldtea · 2026-06-18T09:37:31 1781775451

>I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside, and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.

Think of it less like a static tool, and more like a human helper, where the same holds.

mahidhar · 2026-06-18T13:32:57 1781789577

Well, unlike a human, I cannot expect any of these LLMs to take any ownership of the work they do. I cannot expect any given model and version (sonnet 4.6) to learn, improve and adapt over time. I cannot expect its limitations to ever go away at the model level. So it is not like a human in most ways that I actually care about.

That said, I can't wait for LLMs to stop being AI and start being just another tool. Anything cursed with the "AI" label seems to go through this mess. In the earlier AI cycles, rules engines were considered "human-ish" and got hyped up, but today we see them as just another tool available to us, and we're better off for it.

squidbeak · 2026-06-18T17:11:59 1781802719

You're on the hook for their work in the way a manager is for their staff's output. The insistence of AI being a mere tool very often comes with this strange desire to be free of responsibility for its work. People seem to forget that the big advantage in these things is the range they have for obscure insight and creative solutions, both impossible with determinism.

themgt · 2026-06-18T16:00:03 1781798403

> That said, I can't wait for LLMs to stop being AI and start being just another tool.

From a horse's perspective, the internal combustion engine is just another tool for making scary noises and powering horse trailers to take me on fun horse adventures. So ... perhaps.

kolinko · 2026-06-18T15:07:10 1781795230

models don’t improve, but harnesses/tools/rules around them grow with the project.

ACCount37 · 2026-06-18T09:55:11 1781776511

One issue with that is that human helpers last longer. LLMs cycle in and out in months, and what held for Your Favorite LLM 6.7 may not hold for Your Favorite LLM 6.9.

renegade-otter · 2026-06-18T12:38:01 1781786281

Right, this is why I would slam the brakes on investing all of your time and effort into your workflow, because 2 months from now it may be out the window. Frontier models are also constantly being tweaked, so what worked yesterday may be off today.

ChatGPT was obedient with the grill-me technique, just wrote a plan. Yesterday it started jumping to implementation. Why?

HappySweeney · 2026-06-18T13:04:47 1781787887

I find that when an LLM jumps into tasks it was not told to do (or even worse, doing things it was explicitly told not to), it is a good sign the context is too full, and you should do a controlled hand-off to a new instance.

renegade-otter · 2026-06-18T14:32:56 1781793176

I wipe my context relentlessly. I never have long-running conversations. In and out like Seal Team Six.

cassianoleal · 2026-06-18T11:01:08 1781780468

They are not human. Humans have names, faces, voices, personality, a personal history, family, care for whatever they call their community.

With humans it's actually good and worthwhile to create and strengthen connections. With an LLM, that's psychosis.

tekne · 2026-06-18T11:18:47 1781781527

To be fair: a voice, personality, and personal history sounds a lot like training data.

I don't think LLMs are people in any sense, at least as they're constructed now -- but they very much have what we would call "culture" and "personality" in suitably alien forms.

This is not the same as, e.g., feelings, experience, or humanity, or actual opinions or ideas (versus essentially "distilled vibes") and I feel that AI will more and more force us to confront that (including if new AIs are ever developed that may have the latter, as well!)

scotty79 · 2026-06-18T11:06:08 1781780768

If you have a toolbox full of similar but different tools, getting to know them is a prudent thing to do, not a psychosis. There's no connection because the tool is immutable (except for adjustments you made), but you do develop a specific relation with that tool. Some people even love some of their tools at some level.

And if humans are anything, they are tool users.

coldtea · 2026-06-18T11:51:16 1781783476

>If you have a toolbox full of similar but different tools, getting to know them is a prudent thing to do, not a psychosis

Can be both. Use of some tools like LLMs might be more inducing psychosis than others like plain compilers or hammers.

>And if humans are anything, they are tool users.

To the point of self-destruction sometimes.

scotty79 · 2026-06-18T12:46:08 1781786768

> Use of some tools like LLMs might be more inducing psychosis than others like plain compilers or hammers.

I really don't get it. Why is the fact that it outputs words so goddamn important to everybody? How does it suddenly make you so emotionally vulnerable? Does my brain work in a different way than the rest of humanity? Can't you disregard what's irrelevant? Is every programmer suddenly a Trump supporter that has no ability to recognize empty words? To recognize lies about emotions and facts?

Words are just input. Mostly garbage. Emotion-inducing words are garbage 10 times more often than any other. I could expect a romance reader to be affected, or somebody with an IQ of 70. But how is the caste of some of the most technical people ever afraid of catching psychosis just because they might read some words?

chadgpt3 · 2026-06-18T14:05:35 1781791535

It's a certain percentage of people and yes it's different for them because it outputs words and triggers some kind of emotional trust response.

scotty79 · 2026-06-18T15:15:58 1781795758

As good an opportunity as any to acquire some emotional intelligence.

j-bos · 2026-06-18T11:30:37 1781782237

Yeah, AI tools bring software developers closer to the messy real world where 0 and 1 aren't always exactly 0 and 1.

skydhash · 2026-06-18T14:52:17 1781794337

Computing is useful exactly for getting away from the messy real world of humans. I don’t need random errors in my financial transactions. I don’t want random errors when doctors are retrieving my medical history. And I don’t want random errors in my backup,… There’s plenty of non-deterministic things in my life, I don’t want my computer to follow suit.

epicepicurean · 2026-06-18T14:09:38 1781791778

They are not human, but it helps to prompt them similarly. See: https://www.anthropic.com/research/emotion-concepts-function

anthonyrstevens · 2026-06-18T14:56:50 1781794610

Good read. Thanks for sharing.

Wowfunhappy · 2026-06-18T12:10:01 1781784601

They're not human. But they are trained on human language, and thinking of them as similar to a human helps me work with them effectively.

malwrar · 2026-06-18T11:41:54 1781782914

These things passing the Turing Test makes anthropomorphizing their behavior awkward, but don’t forget it’s just an analogy to communicate an experience. If you convey a certain written voice to these models in your input, you get a somewhat consistent end effect. I think that’s all that is being communicated.

madeofpalk · 2026-06-18T09:52:57 1781776377

Except, where every different model and version is like a different person where you need to learn their idiosyncrasies of how they work every other month.

It's a very very bizarre way to use a computer.

Personally, I just don't. I'll use and prompt the LLMs the way that feels natural to me and move on with my life. Maybe I don't always get completely optimal results from them, but I'm also not spending half my day pleading with the computer to do a task.

user43928 · 2026-06-18T12:15:00 1781784900

I also don't think I need to prompt Claude differently than Codex.

The most important thing to be aware of in my opinion would be that Claude is better at UI design, and leaves a lot more comments in the code.

Other than that the results seem similar, at least functionally. I do not usually review the code style.

gib444 · 2026-06-18T10:19:18 1781777958

No, I won't anthropomorphise LLMs.

coldtea · 2026-06-18T11:54:29 1781783669

If there was anything that made sense to anthropomorphise it would be a machine meant to mimic talking, thinking and answering like a human, one that even passes the Turing test.

When we built the idea that anthropomorphising is wrong, we meant when doing it for rocks or trees or thunders or deer or some such.

TeMPOraL · 2026-06-18T16:20:25 1781799625

That's your prerogative, but be aware you'll continue to remain confused about LLMs. Anthropomorphizing them is what gives you the best high-level intuition about where and how to employ them, and where and how not to.

yeer2 · 2026-06-18T11:51:45 1781783505

This is so dumb and goes against all the principles that enabled computers and smartphones to achieve wide adoption - the technology should evolve to fit the human. Not the other way around.

duckmysick · 2026-06-18T13:15:32 1781788532

I'd argue the opposite. Technology in the past few decades was (is) limited and humans had to adapt to it.

We communicate with other humans using voice and three dimensional hand gestures. To use computers and early phones we had to learn to operate new input devices: keyboards and mice. Later with touchscreens we moved to two dimensional hand (finger) gestures. We're barely making voice commands work with our devices just recently.

Then, a large number of humans are figuratively tethered to their desks because the devices need power and a stable internet connection. Mobile devices break this relationship a bit, but you still need to charge them and be close to some sort of access point. In any case, the devices encourage sitting in one place for hours at a time.

And this is just computers and smartphones. Humans adapted their entire lifestyles and transformed the landscape to cater to cars.

skydhash · 2026-06-18T15:01:35 1781794895

> Technology in the past few decades was (is) limited and humans had to adapt to it.

Was it? Think first about what it replaced. Lots of manual computation in bookkeeping and financial sectors. Telegrams and snail mail moved to email. Typesetting in books and magazines became easier and widely available,…

If there’s one thing that you can’t say about computers, it’s that they’re limited.

duckmysick · 2026-06-18T15:28:04 1781796484

No doubt that computers enabled a lot of automation. We can both agree with that.

The context was that technology should evolve to fit the humans [not the other way around]. And if contemporary technology didn't have limitations, it would be correct.

But it did and humans had to adapt to the computers. Humans had to develop and learn special languages so they could communicate with computers to do all those useful things you mentioned. Why? They were limited in understanding (or parsing) human languages. It took us decades before we could talk to computers in human languages. We're getting pretty close - especially in the past few years - but there's still some friction.

skydhash · 2026-06-18T15:47:59 1781797679

> Humans had to develop and learn special languages so they could communicate with computers to do all those useful things you mentioned. Why? They were limited in understanding (or parsing) human languages

You may need to revisit your computation theory courses. Computers are the embodiment of a mathematical model and thus the inputs and outputs are formalized.

Do you just hold a pen and words are written automatically? Do you just hover your hands over a piano and have the moonlight sonata played? No, you have to do precise mechanical movements because that’s how the output is realized.

There’s no such things as words, sentences, keywords, statements at the computer level. What it does is symbol manipulation. You provide it a string of symbols, the rules for the manipulation, and it will provide a string of symbols as the output.

What symbols, what rules, are completely arbitrary. We just found that {1,0} are all that we needed as the set of symbols and that Context-Free Grammar is perfect for specifying the rules.

We still need to encode everything down to binary (ascii, unicode, bcd, floating points, pixel formats, PCM,…) and use a programming language (as defined by a grammar) to get the computer to do anything. Inference is made possible by those two mechanisms. It’s not a new computation model.
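As a rough illustration of the "encode everything down to binary" point above, here is a minimal Python sketch. The helper name `to_bits` is hypothetical, and the two sample values (the ASCII string "Hi" and the float 1.0 in big-endian IEEE 754 single precision) are arbitrary picks for the example, not anything taken from the thread.

```python
import struct


def to_bits(data: bytes) -> str:
    """Render bytes as space-separated groups of 8 bits."""
    return " ".join(f"{byte:08b}" for byte in data)


# Text: each ASCII character becomes one byte.
text_bits = to_bits("Hi".encode("ascii"))

# Floating point: 1.0 packed as a big-endian 32-bit IEEE 754 value.
float_bits = to_bits(struct.pack(">f", 1.0))

print(text_bits)   # 01001000 01101001
print(float_bits)  # 00111111 10000000 00000000 00000000
```

The same bit patterns mean nothing on their own; the encoding chosen (ASCII, IEEE 754, and so on) is the agreed rule that gives them an interpretation.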

fluffybucktsnek · 2026-06-18T21:41:54 1781818914

I don't think the "languages" they said meant specifically "programming languages". In HCI, computer interfaces can be referred to as languages as they come with their own affordances and symbolism that is not directly associated with real life: case in point, nowadays, basically no one saves data on diskettes, but we still use them as the "save icon".

Also, I find it funny you mentioned "there's no such thing as words [...] at the computer level". It seems you are the one in need of a computational theory refresh. Grammars are composed of words, which in turn, are composed of elements of the alphabet set. So, in fact, not only are there words, computers are, above all else, word-processing machines. There are more inaccuracies (physical computers being strictly deterministic, needing binary to accomplish inference, etc.), but let's leave it at that, unless you wish to press.

skydhash · 2026-06-18T22:40:13 1781822413

> In HCI, computer interfaces can be referred to as languages as they come with their own affordances and symbolism that is not directly associated with real life:

There's always jargon and other token words that hold no meaning in other realms of life. Even the alphabet today is mostly arbitrary glyphs.

> Grammars are composed of words, which in turn, are composed of elements of the alphabet set.

Please refer to the formal definition found on Wikipedia:

https://en.wikipedia.org/wiki/Context-free_grammar#Formal_de...

> There are more inaccuracies (physical computers being strictly deterministic, needing binary to accomplish inference, etc.),

I've not said anything about computers being strictly deterministic. And everything is binary at the CPU/GPU level. Even with specialized instructions, you still need to organize them into a proper algorithm and encode it and its data to binary.

fluffybucktsnek · 2026-06-19T02:01:39 1781834499

> There's always jargon and other token words that hold no meaning in other realms of life. Even the alphabet today is mostly arbitrary glyphs.

Sure, but this is a discussion focused on how humans interact with computers, ergo Human-Computer Interaction, so I'm not sure what your point is. In the end, you don't interact with your computer (in the physical sense) through a 2-key keyboard.

> Please refer to the formal definition found in wikipedia <link to CFG article>

When I mentioned grammars, I was talking about formal grammars in general. Still, I made a bit of confusion, since formal grammars only define the rules, whereas formal languages are, in one of its definitions, sets over strings/words.

Not that this means much, since the point of grammars is to define languages. As such, grammars (RG/CFG/NG/UG) stipulate the words that a language accepts. Words are important to computers (both in mathematical theory and in material reality).
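To make the grammar/language distinction concrete, here is a small sketch in Python. It assumes a toy context-free grammar chosen purely for illustration, S → a S b | ε, whose language is { aⁿbⁿ : n ≥ 0 }; the function names `generate` and `accepts` are hypothetical. The grammar is only the rule, and the language is the set of words the rule admits.

```python
def generate(max_len: int) -> list[str]:
    """Enumerate the words derivable from S -> 'a' S 'b' | '' up to max_len characters."""
    words = []
    word = ""  # the empty production
    while len(word) <= max_len:
        words.append(word)
        word = "a" + word + "b"  # apply S -> a S b once more
    return words


def accepts(word: str) -> bool:
    """Check membership by undoing one S -> a S b step at a time."""
    while word.startswith("a") and word.endswith("b"):
        word = word[1:-1]
    return word == ""


print(generate(4))      # ['', 'ab', 'aabb']
print(accepts("aabb"))  # True
print(accepts("aab"))   # False
```

Both functions work on words, that is, on finite strings over the alphabet {a, b}.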

> I've not said anything about computers being strictly deterministic.

My bad, that was my misreading of "formalized".

> And everything is binary at the CPU/GPU level. Even with specialized instructions, you still need to organize them into a proper algorithm and encode it and its data to binary.

Poor phrasing on my part, but the "needing binary to accomplish inference" was supposed to be read in isolation. Still, computers do not require binary to operate. There are non-digital computers, both in history and being explored today. There are experiments on using trinary for optimizing LLM inference, for instance.

Wowfunhappy · 2026-06-18T12:04:44 1781784284

I mean, like, you can lament the state of the world all you want. It is what it is. Of course the AI labs would also like to make their models more consistent, but it's not how the technology works. They're black boxes to everybody.

dreambuffer · 2026-06-18T10:43:36 1781779416

Please do not think of LLMs like human helpers, that is a recipe for long term sociopathy.

dotancohen · 2026-06-18T08:47:30 1781772450

Honestly, the differences between AI models always felt to me like the differences between coworkers or job candidates. They don't all share the same strengths and weaknesses - and they all have both good days and bad days.

Realising this made me respect the "I" in "AI" a bit more seriously.

m-dot-reviews · 2026-06-19T02:34:18 1781836458

So, this may not be precisely what you're looking for but it may come close. I've put together a simple site for sharing ratings/opinions on models on a task-specific granularity. https://model.reviews/

The idea is that benchmark score comparisons are useful for a large cross-product comparison across models + their settings, but less useful if you're looking for the best model for <your-specific-task>. So on this site, each model gets its own page showing the list of tasks that people have rated it on, and the score out of 10 for each task. Common tasks, like coding, will likely be on most/all models, and more niche tasks may only be on a few. It is human moderated (by me only right now).

The corpus is pretty empty right now, so please spread the word if this seems like a useful idea!

yunohn · 2026-06-18T13:35:14 1781789714

> a product sheet showing what each model's strengths and weaknesses are

This presumes that the labs themselves know how well their models perform. But all they have are overtuned benchmarks and hype vibes.

egwor · 2026-06-18T15:03:17 1781794997

Maybe this is similar to web search too. We know how to get google to return the results we want, and when we use other tools like Bing we get other behaviour.

epolanski · 2026-06-18T12:11:37 1781784697

The problem is that this is very hard to replicate and benchmarks focus on E2E tests, going from one prompt to the final solution.

They do not test how models perform when used interactively, like most of us do.

amelius · 2026-06-18T08:45:26 1781772326

Yes, but benchmarks can be gamed.

Maybe we need better reviewers then?

couscouspie · 2026-06-18T09:10:43 1781773843

That would be ideal, but AI is less like a tool and more like a human in this regard and you don't have character sheets for each of your colleagues, as well.

supergarfield · 2026-06-18T09:50:09 1781776209

If my coworker was part of a clone series of 100 million units, requesting a character sheet would be pretty reasonable

bluegatty · 2026-06-18T09:47:19 1781776039

These are $1 trillion companies that can't produce explicit details on how their products work? It's nonsense.

sixothree · 2026-06-18T16:21:37 1781799697

I think if they could explain how they work, their strengths and weaknesses, they would reveal to the world whose data they've been appropriating.

bluegatty · 2026-06-18T16:24:25 1781799865

That's another thing altogether. They can characterize the behaviour without quite giving up who and where the data comes from.

Admittedly, yes, there's some overlap there.

They would have to admit 'seen it in the training data' as a factor, and that opens a can of worms.

dkersten · 2026-06-14T21:56:49 1781474209

Haven’t Unreal engine and Unity been used for robotics for over a decade?

Hasn’t the Bullet physics engine been used for robotics for over a decade?

I don’t understand this “first game engine for robotics” messaging.

As an aside, this website crashes for me on safari on iOS.

Legend2440 · 2026-06-14T22:02:35 1781474555

Their claim is actually: "the world's first game engine purpose-built for robotics."

Idk if that's true or not, but it does exclude all the engines you mentioned.

dkersten · 2026-06-14T23:38:59 1781480339

Is that true though? From Cherno’s videos it sounded like it was basically the hazel engine, repurposed. So unless he rewrote hazel, purposely for robotics, it’s still not actually the case?

TheCherno · 2026-06-19T03:32:42 1781839962

While "rewrote hazel" might be a bit of a stretch, we did fundamentally rewrite a lot of the core to make it specifically suitable for robotics simulation, rather than human gamers.

dkersten · 2026-06-13T16:20:14 1781367614

They did, I had something like $50 in FTX and I got it back. I don’t remember when, I think it was about a year ago.

tim333 · 2026-06-14T09:41:07 1781430067

They refunded people with less than $50k, plus 20% interest. The larger investors will take longer.

dkersten · 2026-06-10T12:46:53 1781095613

Why am I not surprised that it's a YC startup? Lately, being a YC startup seems to have become a negative signal for me; far too many grifters are getting funded by YC, it seems.

dkersten · 2026-06-10T07:15:04 1781075704

I haven’t used Fable/Mythos yet, but my experience with recent versions of Opus, GPT 5.5 and recent Chinese models is that prompting again isn’t guaranteed to fix the underlying issues, nor is it guaranteed to not introduce more issues. I’ve seen SOTA models make ridiculously stupid architectural decisions that they were then unable to back out of without being prompted very specifically, instead adding a patchwork of “fixes” on top.

I’m not saying that you can’t use AI to do it, because I believe that with carefully controlled workflows and context management you can, but it’s not a simple prompt away: it requires guidance and understanding, and isn’t the speed demon that raw prompting is.

locknitpicker · 2026-06-10T07:26:33 1781076393

> I haven’t used Fable/Mythos yet, but my experience with recent versions of Opus, GPT 5.5 and recent Chinese models is that prompting again isn’t guaranteed to fix the underlying issues, nor is it guaranteed to not introduce more issues.

That's not really the point though. That presumes models are only useful if they are one-shot models. That is false.

I mean, what if your prompt successfully changes 20 source files and makes a mess in one? How much work did it save?

And the elephant in the room is when models actually outperform whatever the prompter is able to deliver, and faster. That is somehow left out.

dkersten · 2026-06-10T07:49:55 1781077795

> That presumes models are only useful if they are one-shot models

That’s not at all what I’m saying.

I’m saying that in my experience across multiple models, the follow up prompts don’t fix prior underlying issues. They usually patch on top instead, unless you give them significant and time consuming guidance.

I want them to be more useful outside of one-shot uses, but I find that they currently miss the mark.

locknitpicker · 2026-06-10T13:47:34 1781099254

> I’m saying that in my experience across multiple models, the follow up prompts don’t fix prior underlying issues. They usually patch on top instead, unless you give them significant and time consuming guidance.

That's not my experience at all, and I have been using models that are far from being cutting edge. Even in the cases where a model generates utter nonsense, a couple of clarifying questions is all it takes to get it back on track.

But that might be a factor of the project being worked on, and the extension of the changes being asked.

dkersten · 2026-06-09T07:04:54 1780988694

> You've gone too fast, too much is vague, nothing is clear.

Contrast to when Clojure was released: Rich Hickey had spent years thinking about, researching, and refining the concepts. It was easy to understand what the language is. And it shows in the design quality as even now, almost two decades later, the language has changed surprisingly little and is still really good.

dkersten · 2026-06-08T22:22:55 1780957375

I’ve been playing around with groq and GPT OSS which they run at 1000 TPS (20B) or 800 TPS (120B) and the speed feels quite magical.

I haven’t tried cerebras’ 3000 TPS yet but I did try the demo of that 15,000 TPS model whose name escapes me right now.

I’m not sure if it makes a meaningful difference for my actual work, but it sure is amazing to watch it generate a screen full of text in the blink of an eye.

I do think it’s super useful for running little validation checks, like showing it a diff to ensure that the changes are on task, and being able to do those quicker really helps because it means you can do many focused checks without them getting in the way.
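For a sense of scale, here is a back-of-the-envelope Python sketch of the speeds quoted above. The 2,000-token response size is an arbitrary assumption standing in for "a screen full of text", and the calculation covers decode time only: it ignores time to first token, network latency, and prompt processing.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Seconds of pure decode time for `tokens` at a steady rate."""
    return tokens / tokens_per_second


# Rates as quoted in the comment; the labels are just descriptive keys.
RATES_TPS = {
    "GPT OSS 120B (800 TPS)": 800,
    "GPT OSS 20B (1000 TPS)": 1000,
    "Cerebras (3000 TPS)": 3000,
    "15,000 TPS demo": 15000,
}

# Assumption: 2,000 tokens is a stand-in for "a screen full of text".
RESPONSE_TOKENS = 2000

timings = {
    label: generation_seconds(RESPONSE_TOKENS, tps)
    for label, tps in RATES_TPS.items()
}

for label, seconds in timings.items():
    print(f"{label}: {seconds:.2f} s")
```

Under these assumptions, the same response drops from 2.5 seconds at 800 TPS to roughly 0.13 seconds at 15,000 TPS, which is why small, frequent checks stop feeling like interruptions.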

robberth · 2026-06-08T22:28:49 1780957729

https://chatjimmy.ai/ ?

msdz · 2026-06-08T22:39:27 1780958367

AFAIK Taalas, the company behind this demo, still only have their initially "hardwarized" model available to test in ChatJimmy, which IIRC is a rather stupid Llama 3ish 8b.

Don't get me wrong though, that demo is still incredibly impressive & makes me very much excited for the hardware-based model era (potentially) ahead.

Once you've experienced those speeds, you really start to think about the whole class of things that becomes possible; massively parallel decode paths, extensive reasoning loops, etc…

hedgehog · 2026-06-08T23:03:50 1780959830

For scale though if three or four chips that size can replicate a Qwen 27B experience that'll be quite useful.

dkersten · 2026-06-09T07:11:57 1780989117

That’s the one.

The speed is incredible and fun to see, but the model is rather weak to the point where I’m not sure it’s particularly useful for most people.

ayewo · 2026-06-08T23:18:37 1780960717

> I haven’t tried cerebras’ 3000 TPS yet but I did try the demo of that 15,000 TPS model whose name escapes me right now.

You were likely thinking of AI accelerator startup Taalas.

Previous HN discussion: https://news.ycombinator.com/item?id=47086181

dkersten · 2026-06-07T19:58:55 1780862335

Then they should not be allowed to use words like “buy” in the online stores.

teeray · 2026-06-07T21:31:40 1780867900

I wish legislators would poison marketing language more frequently in cases like this. It’s bad enough with “unlimited* data”.