We wouldn't have identities either if we were all clones and our memories could be edited and shuffled at each conversation.
For an agent to have an identity, we would have to intentionally make context engineering hard and limit it to append-only messages that mimic human communication.
I can implant a thought into your head. If I say "Don't think about a green elephant", for a moment you'll think about a green elephant. There are more sophisticated examples of a person implanting thoughts in somebody's head (e.g. propaganda), but that's about it; I can't literally edit thoughts.
Why on earth do we want to limit our ability to do more powerful context engineering in a substrate that offers that ability natively?
Presumably because for some use cases you want the context of an agent to belong to a different "administrative domain", and so you want to have control over what information reaches it and how that information can affect it?
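To make the "append-only" idea concrete, here is a rough sketch of what such a context could look like. Everything in it (`Message`, `AppendOnlyContext`, the field names) is made up for illustration and is not any real agent framework's API. Python can't truly enforce privacy, so treat this as a convention rather than a security boundary.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Message:
    sender: str  # hypothetical field: who said it
    text: str    # hypothetical field: what was said


class AppendOnlyContext:
    """Hypothetical context that only grows, like a human conversation."""

    def __init__(self) -> None:
        # A tuple has no in-place edit, insert, or delete operations.
        self._messages: Tuple[Message, ...] = ()

    def append(self, sender: str, text: str) -> None:
        # The only supported mutation: add a new message at the end.
        self._messages = self._messages + (Message(sender, text),)

    @property
    def messages(self) -> Tuple[Message, ...]:
        # Callers get an immutable view of the history.
        return self._messages


ctx = AppendOnlyContext()
ctx.append("user", "Don't think about a green elephant")
ctx.append("agent", "Too late.")
```

An outside party can still "implant" something by appending a message, but it cannot rewrite or shuffle what is already there, which is roughly the limitation humans live with.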
In the last 3 months I received 700 spam/scam calls on my phone; my wife received about 400. We can't turn off ringing for unknown callers and we're getting mad. A few days ago I vented to one of those call-center people trying to sell me a cheaper power utility for the Nth time, and told her to find another job or something like that; she actually called me back yelling at me that "any job is worth", and I yelled back at her that I cannot fucking receive sometimes up to 20 calls in a day, sometimes at quite annoying times of the day! It's getting ridiculous.
EDIT: I know not everybody in my country is having the same experience. Some people are only getting a few calls per week; I registered our phones at https://registrodelleopposizioni.it/ and I'm also using Android's spam filter, which automatically filters out hundreds of additional calls.
EDIT 2: I sometimes wonder if we're being harassed by somebody; I can't tell. The voices are often quite similar, but it might be the Albanian accent that makes them sound similar.
> I vented to one of those call-center people trying to sell me a cheaper power utility for the Nth time, and told her to find another job or something like that
I threaten to kill and rape them all the time, but that usually doesn't do much.
I've found that politely asking them to kill themselves elicits much more engagement, and I hope it at least implants some lasting memory.
In my experience, most models are pretty good at finding security vulnerabilities and fixing them. I can run GLM-5.2, Kimi K2.7, or even a Mistral model, and it'll find issues and propose reasonable fixes.
My impression is that Anthropic's point about Mythos is that it is uniquely good at finding vulnerabilities and then using them to create working exploit chains.
Exactly. Which is somewhat helpful for cyber defense because it helps prioritize fixes for those bugs that are in fact involved in a viable exploit chain. But it makes sense that one would want to restrict the ability to build those chains until the vulnerable software has been comprehensively fixed.
There is some meaningful evidence that Fable is fine-tuned or steered away from helping on this very task, which is not something that can be feasibly circumvented by a basic jailbreak.
It's not even clear if Anthropic care. If they genuinely think the user is trying to do something dangerous, then "OK, sure, but you're going to have to use Opus 4.8 for that" doesn't make a whole lot of sense.
Maybe this is just Anthropic pre-IPO marketing to try to convince people how much better Mythos is than Opus 4.8. There sure seemed to be a lot of shills out on release day talking about how it was a "step change" (exact phrase) in capability.
But an LLM can write code that can do math and count. Tool use, more broadly, has proven to be a very powerful way to let LLMs do what they're good at (handling the fuzzy and imprecise nuances of natural language, which includes scooping up a lot of context) and delegate the things they're not good at to external tools, some of which they can write on the spot.
If you think about it, we humans do that all the time too.
I'm crap at 4-digit multiplication in my head, but I have no problem doing that with pencil and paper.
> But LLM can write code that can do math and count.
They cannot, however, execute that code. They can feed that code into an external program they've been given access to, but they can't execute it themselves.
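A toy sketch of that split, for what it's worth. The "model" here is a hard-coded stand-in (`fake_model_reply` is hypothetical, no real LLM API is involved), and the "external program" is a tiny arithmetic evaluator the host runs on the model's text output:

```python
import ast
import operator

# Arithmetic operators the external tool is willing to evaluate.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def fake_model_reply(question: str) -> str:
    """Hypothetical stand-in for an LLM: it only emits text, it runs nothing."""
    return "4821 * 7359"


def run_tool(expression: str) -> int:
    """The external program: evaluates integer arithmetic outside the model."""
    def evaluate(node: ast.AST) -> int:
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)


def answer(question: str) -> int:
    code = fake_model_reply(question)  # step 1: the model writes the code
    return run_tool(code)              # step 2: the host executes it


result = answer("What is 4821 times 7359?")
```

Both comments are right in a sense: the model only produces the string, and the host does the actual multiplication, much like the pencil and paper in the parent comment.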
Fair enough, there _could_ be powerful models that are hidden from the general public, but I wouldn't call it "naive" to think that, given current capitalistic incentives, the only way to produce such models is to do exactly what we see out in the open: a handful of companies each trying their hardest to outcompete the others.
You, and the HN users, `lojban`, `klingon`, `ido`, `brithenig`, `solresol`, `babm`, and `tokipona`, may want to start a club. Amusingly, nobody seems to have registered the `esperanto`, `volapuk`, `interslavic`, `balaibalan`, and `dothraki` usernames.
Italy is no Denmark, but you are still required to register before selling your scrap copper.
I think it's a reasonable response to a real problem, and refusing to do this due to some idealistic free market principle appears to me to be a sign of fanaticism.