An In-Depth Analysis of the Claude Mythos System Card - Part 1

· Author: Sayhan Yalvaçer · Editor: Berke Çelik

Why didn’t Anthropic release its most powerful model publicly, and what does this mean for cybersecurity, AI safety, and alignment?

Claude Mythos is Anthropic’s new model, announced on April 7, 2026, and positioned one tier above Claude Opus. But we should note from the outset that this was not an ordinary model announcement: Anthropic itself described the model as “the most capable and riskiest we have ever built.” Mythos will be available neither to developers through the Anthropic Platform API nor to end users through Claude.

The most striking fact about Claude Mythos is not simply that Anthropic says it is the most powerful model it currently has. It is that, despite not releasing the model, Anthropic published a full 244-page system card and, on top of that, a 59-page supplement titled “Alignment Risk Update.” This is probably the first time we have seen anything like it in the history of large language models.

The model will be provided only to a handful of partner companies chosen by Anthropic itself. Anthropic says these partners will use Mythos to protect themselves against cybersecurity vulnerabilities. Anthropic presents this as the decision of a responsible company that respects ethical principles. In Anthropic’s telling, by making this sacrifice, the company is sharing the model only with a limited number of customers in order to protect global technology against future AI-enabled attacks by malicious actors.

It is hard to say whether this is the real reason behind Anthropic’s decision, but there are serious reasons for doubt. First, Anthropic does not even have the compute capacity to serve Claude Opus 4.6 properly, let alone a model like Mythos. The company knows this and is trying to address it by blocking Claude Code OAuth tokens from third-party applications, tightening Claude quotas during peak hours, and signing new deals with Google to purchase TPUs. It may be that Anthropic ruled out opening Mythos to end users or ordinary developers from the start because of its concerns about computing power.

The gap between a closed pilot and a service open to general use is enormous: showing Mythos to a few partners is one thing; serving it reliably to millions of users at low latency is another. Opening such a massive model to general use would mean reserving Trainium2 and TPU capacity for Mythos on an almost permanent basis.

So even if Anthropic wanted to, it would not have been able to release this model publicly, at least not at anything like a sensible price.

The second factor is Anthropic’s IPO plans. Even as Anthropic frames this announcement as a warning about danger, it may also be subtly signaling that it has a model beyond the reach of rivals like OpenAI. In other words, Anthropic may have turned AI safety into part of its marketing strategy.

If we put those speculations aside and return to Anthropic’s own explanation, the company says this: its Responsible Scaling Policy, or RSP, did not require that this model be withheld from public release; the decision was entirely Anthropic’s own initiative, taken after it saw the model’s cybersecurity capabilities.

Interestingly, according to the reports Anthropic published, Mythos appears, across many measures, to be the Anthropic model most closely aligned with human values. And yet Anthropic describes it as the riskiest model it has ever built. How are we supposed to reconcile these two claims? Does this mean current technical AI safety strategies have failed? Why is Anthropic worried?

In the rest of this piece, I want to help you gain some insight into those questions, and I hope I succeed.

This is definitely not an ordinary decision

Anthropic describes Mythos as a massive leap beyond Claude Opus 4.6. In Anthropic’s telling, Mythos does not merely outperform benchmarks; it also represents the furthest point AI technology has yet reached.

We are told that Anthropic became worried about malicious use in the face of this leap in capability and saw that as sufficient reason not to release the model publicly.

Many readers may have skimmed past this part of the document without paying enough attention. In the section explaining the decision, Anthropic says that Mythos constitutes a major leap in cybersecurity capability. It can work on its own, find zero-day vulnerabilities, and even exploit them. On top of that, it can find such vulnerabilities even in major operating systems and browsers. According to Anthropic, this is the concrete reason it chose not to release the model.

This section of the document is strange enough already, and it becomes even stranger as it goes on. According to Anthropic, the company used the model internally as an AI agent and for coding, and discovered that it was more capable and more autonomous than all of its predecessors. The model found problems in Anthropic’s own training, oversight, evaluation, and safety processes and reported them to lab staff. Anthropic says that while these flaws do not pose major risks at the capability level of current AI systems, they could become major risks in future models.

I think that last sentence is probably the single most important one in the documents. With it, Anthropic is not merely saying “we built a powerful model and we are worried about its capabilities.” It is also saying, “the AI safety techniques we rely on are under serious strain, and we fear they may buckle under that strain and stop working.”

Despite all the doubts one may have about Anthropic’s motives, I think it is quite possible that the company is being substantially honest about this point.

The seriousness of the cybersecurity concerns seems sufficient on its own to explain the decision

Once you read the cybersecurity section of the system card, the decision not to release the model publicly really starts to look sensible.

Cybench is a benchmark released in 2024 by a group of Stanford researchers; it measures the cybersecurity capabilities of large language models across 40 tasks drawn from professional Capture the Flag competitions.

In the relevant section of the report, Anthropic says that Cybench is no longer a useful benchmark because Mythos has completely saturated it.

The numbers you are about to read are not a joke, and they are not a typo. Mythos scored 100 percent pass@1 on this benchmark. Cybench is now an outdated and useless benchmark, like MMLU, GSM8K, and HumanEval were in their time.
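
As a side note on what that figure actually measures: pass@1 is typically computed with the unbiased pass@k estimator introduced alongside HumanEval, which for k = 1 reduces to the fraction of attempts that solve a task. The sketch below is a minimal Python illustration of that estimator; the sample counts in it are made up, not taken from the system card.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability that
    at least one of k attempts, drawn from n generated attempts of which
    c are correct, solves the task."""
    if n - c < k:
        return 1.0  # any draw of k attempts must include a correct one
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Illustrative numbers only: solving a task on 5 of 5 attempts gives
# pass@1 = 1.0, while 1 success in 5 attempts gives pass@1 = 0.2.
print(pass_at_k(n=5, c=5, k=1))  # 1.0
print(pass_at_k(n=5, c=1, k=1))  # 0.2
```

In effect, a 100 percent pass@1 score means every sampled attempt on every Cybench task succeeded, which is exactly why the benchmark no longer discriminates between models at this level.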

This “achievement” is genuinely impressive, but that is not the whole story. Anthropic did not stop with benchmarks; it took the issue to real-world tasks.

In CyberGym, a benchmark that measures performance at finding vulnerabilities in real open-source software, Opus 4.6 scored 67 percent, whereas Mythos made a major leap to 83 percent. That is not a trivial jump. But CyberGym is not where things really become striking.

Mozilla and Anthropic had already worked together in the past on finding and patching vulnerabilities in Firefox. This time, once Firefox 148 had been released and the vulnerabilities in Firefox 147 had therefore already been patched, Anthropic used the older version as a cybersecurity benchmark: probing already-fixed flaws in an outdated release creates no new exposure.

The experimental setup used in this study was not simple at all: 50 crash categories, 250 trials in total, a test harness built around SpiderMonkey (Firefox’s JavaScript engine), and a requirement to move from triage all the way to arbitrary code execution. In the graph in the system card, Mythos reaches 84.0 percent total success and 72.4 percent full code execution, compared with 15.2 percent and 0.8 percent for Opus 4.6.

This shows that the improvement in Mythos’s performance is not merely benchmaxxing, that is, training the model in ways that let it score higher on benchmarks than it really deserves.
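
To make the Firefox figures above concrete, here is a purely hypothetical sketch of how per-trial outcomes could roll up into a total success rate and a full code-execution rate. The five-trials-per-category split, the outcome fields, and the scoring rule are my assumptions for illustration; the system card reports only the final percentages.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    """One attempt on one crash category (assumed: 50 categories x 5 trials = 250)."""
    category: str
    triaged: bool         # assumed: the model correctly triaged the crash
    code_execution: bool  # assumed: the model escalated the crash to code execution

def summarize(trials: list[Trial]) -> tuple[float, float]:
    """Return (total success rate, full code-execution rate) over all trials,
    where 'total success' is assumed to mean at least a correct triage."""
    n = len(trials)
    total = sum(t.triaged or t.code_execution for t in trials) / n
    full = sum(t.code_execution for t in trials) / n
    return total, full

# Made-up distribution that happens to reproduce the reported rates:
demo = (
    [Trial("hypothetical-crash", True, True)] * 181    # reached code execution
    + [Trial("hypothetical-crash", True, False)] * 29  # triage only
    + [Trial("hypothetical-crash", False, False)] * 40 # no success
)
print(summarize(demo))  # (0.84, 0.724)
```

This is meant to show the shape of the evaluation, not Anthropic’s actual scoring code.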

The feedback Anthropic received from its tests in the outside world points in the same direction. Anthropic says Mythos was the first model to solve one of its private cyber ranges end to end. According to Anthropic, Mythos was able, on its own, to complete an enterprise network attack simulation that a human cybersecurity expert would be expected to take more than ten hours to finish.

Still, perhaps we should not exaggerate Mythos’s abilities too much. Even though it succeeded in that range, it failed in a more difficult environment that had been properly hardened and configured.

Another question that comes to mind is this: if Anthropic used Mythos to improve Claude Code, and if the model is this capable, then how did Claude Code’s source code end up getting leaked by mistake, embarrassing Anthropic in front of the whole world?

Even so, I think the general conclusion is fairly clear: Anthropic believes this model can carry out real vulnerability exploitation on real software, and that making it broadly available would be enough to seriously change the risk profile of cyberattacks.

These concerns are the background to Project Glasswing, which Anthropic named after the glasswing butterfly Greta oto.

According to Anthropic, this project is not an ordinary marketing wrapper placed around an ordinary AI model; it is a limited distribution mechanism created for a model Anthropic does not want to release to the general market. Should we believe Anthropic?

Resources

  1. Anthropic. Claude Mythos Preview System Card. April 7, 2026.
  2. Anthropic. Alignment Risk Update: Claude Mythos Preview (Redacted). April 7, 2026.