
Anthropic's Claude Mythos: The Leaked 'Step-Change' Model Rewriting AI Safety Calculus

Anthropic has accidentally confirmed the existence of Claude Mythos — its most powerful model ever, leaked through a misconfigured content management system. Early testers describe it as a new tier beyond Opus, with unprecedented capabilities in reasoning and cybersecurity that bring both immense promise and serious dual-use risk.


The Leak That Wasn’t Supposed to Happen

Anthropic has spent four years building a reputation as the “safety-first” AI lab — the company founded by former OpenAI researchers who believed their former employer was moving too fast. The irony, then, is almost painful: the existence of Anthropic’s most powerful model ever was revealed not by a press release, but by a misconfigured content management system that left an unpublished blog post publicly searchable.

Fortune discovered and reported the leak in late March 2026. Anthropic confirmed it shortly after, attributing the exposure to “human error” in its CMS configuration. In doing so, the company inadvertently confirmed details it hadn’t planned to announce publicly: the model is real, it’s called Mythos, and it represents what Anthropic internally described as “a step-change in capabilities.”

That phrase — “step-change” — carries significant weight in the AI industry. It means something qualitatively different, not just quantitatively larger. It’s the language used to describe the jumps between generations that actually alter what AI systems can do. Anthropic has used this term sparingly.

What Is Claude Mythos?

The model naming is somewhat complex. “Mythos” is the product generation name — the equivalent of “Opus” or “Sonnet” in Anthropic’s existing lineup. “Capybara” is the tier designation within that generation. The full designation is something like Claude Mythos Capybara, and it sits above the current Opus tier: the first model Anthropic has built that is explicitly “larger and more intelligent than our Opus models — which were, until now, our most powerful.”

What can it do that current models cannot? Anthropic has been measured in its disclosures, but the contours are becoming clear from early-access customer reports and the inadvertently published draft:

Reasoning at scale. Mythos shows significantly improved performance on complex, multi-step reasoning tasks that require maintaining coherent chains of logic across very long contexts. Early testers working in scientific research describe it as the first AI model capable of independently navigating literature reviews without losing thread — synthesizing contradictory papers and generating novel hypotheses with a degree of rigor that approaches expert human performance.

Advanced coding. The model’s coding capabilities are described as a meaningful jump from Claude Opus 4.6. In internal testing, Mythos reportedly completes large-scale refactoring tasks that previously required multiple rounds of human correction in a single pass. It can reason about an entire codebase’s architecture, not just isolated functions.

Cybersecurity. This is where Anthropic’s communications have been most careful — and most revealing. The company has explicitly focused Mythos’s initial rollout on cybersecurity use cases, working with a small group of enterprise customers in that sector. Anthropic describes the potential as “unprecedented” — both in terms of defensive applications (threat detection, vulnerability assessment, incident response) and, implicitly, in terms of dual-use risk.

The Safety Paradox

The cybersecurity focus is Anthropic’s acknowledgment of a tension at the heart of frontier AI development: the more capable a model becomes at understanding and reasoning about complex systems, the more capable it becomes at attacking them.

Anthropic’s approach is to lean into controlled, supervised deployment in the cybersecurity vertical before broader release — accumulating evidence about how the model behaves in the wild with expert operators before democratizing access. This is the “responsible scaling” playbook, and it’s been Anthropic’s differentiating narrative.

But critics in the security research community have noted the obvious limitation: a model capable of producing “unprecedented” cybersecurity insights in the hands of a vetted enterprise client is, architecturally, the same model that could produce unprecedented attack capabilities in the hands of a less scrupulous user. The difference between a vulnerability report and an exploit is often syntactic.

CNN’s reporting adds texture to this concern. Cybersecurity experts interviewed for the piece described Mythos as potentially the first model capable of autonomously developing novel zero-day exploits — not just cataloguing known vulnerabilities, but reasoning from first principles about where gaps in a given system’s defenses might exist. That’s a qualitative capability shift that the current generation of models, including Claude Opus 4.6 and GPT-5.4, does not clearly exhibit.

The Release Strategy

Mythos is not launching publicly. As of early April, Anthropic has distributed access to “a small group of early access customers selected by Anthropic,” with an explicit initial focus on cybersecurity applications. The company says it is “slowly expanding access to Claude Mythos to more customers using the Claude API over the coming weeks.”

The caveat is significant: Mythos is described as “a large, compute-intensive model that is very expensive to serve.” Anthropic is working to make it more efficient before any general release. This is consistent with the history of other frontier models — GPT-4 was initially expensive and capacity-constrained at launch, and its pricing fell by roughly 97% over 18 months as infrastructure improved and smaller distilled variants emerged.

The current expectation among Claude API customers is a limited beta through Q2 2026, with broader availability in H2 2026 potentially tied to the release of a more efficient Mythos variant. That variant would presumably carry a tier name other than Capybara, which appears to designate the maximum-compute configuration.

What Anthropic Signaled, Even Inadvertently

The leak, embarrassing as it was operationally, may have served a useful function for Anthropic. In a competitive market where OpenAI just raised $122 billion at an $852 billion valuation and Microsoft shipped independent frontier models, Anthropic needed a signal that it remains at the capability frontier.

Mythos provides that signal — and the cybersecurity framing provides the safety narrative that differentiates Anthropic from its rivals. The message is: we have the most powerful model, we know it’s dangerous, and we’re handling it more carefully than anyone else would.

Whether the safety-first approach proves durable — or whether competitive pressure eventually forces faster, broader rollout — will define Anthropic’s next chapter. For now, Claude Mythos is the most anticipated model no one is allowed to use.

That might be exactly the point.

anthropic claude mythos AI-safety cybersecurity frontier-models LLM
