April 9, 2026
When will the most powerful model on Earth be available?
Last night, Anthropic's latest release, the Claude Mythos (Mythos) preview, set the entire AI community abuzz.
The Claude Mythos preview, officially dubbed the 'most powerful AI model to date,' represents a new level of capability, significantly outpacing even Anthropic's previous strongest model, Claude Opus 4.6.
At least based on the data and results presented so far, this is not mere marketing hype but a genuine qualitative leap. First, the Claude Mythos preview ranks at the top of nearly every public benchmark, and the margins are striking:
On SWE-bench Verified for software engineering, it surged from Opus 4.6's 80.8% to 93.9%, and on SWE-bench Pro, it jumped from 53.4% to 77.8%. For challenging mathematical reasoning like USAMO 2026, it skyrocketed from 42.3% to a near-perfect 97.6%.

Image source: Anthropic
It can be considered the most powerful model on Earth at present.
These are just a few 'small' examples. What's more impressive is that over the past few weeks Anthropic ran real-world tests in which the Mythos preview autonomously discovered thousands of high-risk zero-day vulnerabilities in mainstream operating systems and browsers, including core components such as the Linux kernel, OpenBSD, the Firefox browser, and FFmpeg.
Many of these vulnerabilities had eluded human security teams for decades, such as a remote crash vulnerability in OpenBSD, an OS renowned for its security, that had stayed hidden for 27 years. Anthropic stated confidently that the Mythos preview far surpasses any other AI model in cybersecurity capability.
This is not just a 'better Claude'; its coding, reasoning, and security capabilities have reached unprecedented autonomy and depth. Yet developers hoping to 'finally unleash productivity' were in for a disappointment:
Anthropic directly closed the door.
Yes, at least for now, the Claude Mythos preview is not available to the public. According to official statements, the Mythos preview is currently used only for 'defensive cybersecurity' and is accessible only to 12 partners (including AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks) and to over 40 organizations that build or maintain critical software infrastructure.

Image source: Anthropic
This is Project Glasswing, which Anthropic launched alongside the model. Anthropic has even allocated $100 million to support those 40-plus organizations in using the Mythos preview to shore up the 'foundations' of the open-source ecosystem.
But why hide a 'most powerful model' from public use?
A weapon this powerful requires a transition period.
First, to be clear: the Claude Mythos preview, or super-large models of similar capability, will eventually be made available to the public. Anthropic put it very bluntly:
'Although we currently have no plans to make the Claude Mythos preview available to the public, our ultimate goal is to enable users to safely deploy Mythos-level models at scale—not just for cybersecurity but also for the countless other benefits these powerful models will bring.'
As implied in the official blog, this model is 'too dangerous.'
Late last year, Google Threat Intelligence Group (GTIG) discovered two real-world malware samples, PromptFlux and PromptSteal. At runtime they connect to commercial large models (such as the Gemini API) to dynamically generate malicious scripts and obfuscate their own code in real time, and they can even create new functions on the fly based on the target environment, completely bypassing traditional signature-based detection.
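To see why runtime code regeneration defeats that kind of detection, consider a minimal sketch (the 'regenerated' variant below is a hand-written stand-in for what an LLM would emit; the hash-based signature mirrors how traditional antivirus blocklists work):

```python
import hashlib

# Two payloads with identical behavior; the second is a trivially
# "regenerated" variant (renamed identifier, junk comment) of the first.
payload_v1 = "def run():\n    print('payload')\n"
payload_v2 = "def r_9f3a():  # regenerated\n    print('payload')\n"

# A traditional signature is just a fingerprint of known-bad bytes.
known_bad = {hashlib.sha256(payload_v1.encode()).hexdigest()}

def signature_scan(code: str) -> bool:
    """Return True if the code matches a known-bad signature."""
    return hashlib.sha256(code.encode()).hexdigest() in known_bad

print(signature_scan(payload_v1))  # True  -- the original sample is caught
print(signature_scan(payload_v2))  # False -- the regenerated variant slips past
```

Because an LLM can emit a syntactically fresh variant on every run, each sample carries a fingerprint the scanner has never seen, which is exactly the behavior GTIG attributes to these samples.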
This is not an isolated case. According to a report by market research firm SQmagazine, the number of AI-driven cyberattacks reported worldwide has risen by 47%, with more than 28 million incidents expected.
Against that backdrop, the Mythos preview's ability to find vulnerabilities stands out even more. Compared with Claude's previous strongest model, Opus 4.6, whose success rate at autonomously discovering and exploiting vulnerabilities was close to zero, the Mythos preview's performance borders on the miraculous.
Take the vulnerability discovered in the Mozilla Firefox 147 JavaScript engine (now patched) as an example: Claude Opus 4.6 attempted to exploit the vulnerability hundreds of times and succeeded only twice, while the Claude Mythos preview successfully exploited it 181 times in the same test.

Image source: Anthropic
Test reports tell a similar story: during internal red-team exercises over the past few weeks, the Mythos preview demonstrated offensive capabilities far beyond those of top human security experts. It does not merely 'find vulnerabilities'; it can autonomously discover, chain together, and exploit thousands of high-risk zero-days.
As is well known, hackers come in white-hat and black-hat varieties. White-hat hackers typically alert project maintainers when they discover security vulnerabilities, and in open-source projects may even patch them proactively. Black-hat hackers are different: they are likely to exploit those vulnerabilities to attack systems.
The Mythos preview is capable of both offense and defense, but its offensive potential is the worrying part. In the wrong hands, it could instantly arm attackers with AI-grade attack chains. Anthropic itself has said this is no ordinary frontier model; its general capabilities are strong enough to push cyber warfare into a new dimension.
Offense and defense in computer security have always been a game of one-upmanship, each side forever trying to stay a step ahead of the other. Over the past two years, the security battles surrounding large AI models have been a key industry focus, especially among major companies. Close to home, Chinese companies such as ByteDance and Ant Group have held similar AI large-model attack-and-defense competitions in recent years, using red-team (attack) versus blue-team (defense) exercises to surface and address the security challenges of the AI era.

Image source: Global AI Large Model Offensive and Defensive Challenge
Anthropic also pointed out that, in the long run, powerful language models like the Mythos preview actually favor the defending 'blue team.' In the short term, though, opening the Mythos preview to the public would let attackers launch unprecedentedly efficient attacks on today's global networks. The crux is that defense is inherently reactive while offense takes the initiative, and, weighing the payoff, attackers have more incentive to put models like the Mythos preview to active use.
Hence, to ensure a 'smooth transition,' Anthropic launched Project Glasswing.
As an aside, the project's name comes from Greta oto, a butterfly widespread across the Americas and better known as the 'glasswing butterfly' for its transparent wings. Fragile as they look, those wings can bear a load of up to 40 times the insect's body weight.

Glasswing butterfly, image source: Pixabay
The logic of Project Glasswing is simple: get the weapons into defenders' hands first, patch the holes before attackers obtain AI of the same caliber, and learn to defend with the help of advanced AI.
Seen from this angle, keeping Claude's most powerful model out of public hands is the right call. And beyond that, even from the perspective of ordinary Claude users, holding back the Claude Mythos preview for now does more good than harm.
The most powerful model is not public, yet Claude actually works better?
Many people were disappointed when they learned that the Mythos preview was not available: Why not let everyone use such a powerful model?
However, if you are an ordinary Claude user or a developer who relies on Claude Code to write code and work on projects daily, you might discover a somewhat counterintuitive fact: temporarily not releasing the Mythos preview is more beneficial than harmful to us.
Let's first discuss the most pressing pain points everyone has been feeling recently.
Starting around February this year, Claude and Claude Code have experienced an 'epic performance degradation.' On Reddit's r/ClaudeCode and r/ClaudeAI, related posts flooded the feed, with some directly posting '4.6 Regression is real!' and others complaining that 'Claude Code has been dumb over the last 1.5-2 days.'

Image source: Reddit
Some developers tracked the numbers and found that file reads had dropped from the previous six or seven down to only about two. On complex tasks the model grew increasingly 'lazy,' its thinking depth noticeably shallower, often jumping straight to edits instead of researching first.
AMD AI Director Stella Laurenzo even publicly stated that Claude Code had become 'dumber and lazier' and could not be trusted for complex engineering tasks.
Boris (a member of the Claude Code team) responded on Hacker News, acknowledging regression in some agentic use cases. The core change was the February introduction of 'redact-thinking' and Adaptive Thinking, which let the model decide for itself how long to think; on complex tasks, thinking depth fell by roughly 67%.

Image source: LinkedIn
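For developers calling the Claude API directly (rather than going through Claude Code), Anthropic's documented extended-thinking parameter lets you pin an explicit reasoning budget instead of leaving the depth to the model. Below is a minimal sketch using the official Python SDK; the model id is a placeholder, and whether an explicit budget sidesteps the adaptive behavior described above is our assumption, not something Anthropic has confirmed:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",  # placeholder id; substitute whatever your account exposes
    max_tokens=16000,         # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8000},  # pin the reasoning budget
    messages=[{"role": "user", "content": "Plan the refactor before touching any file."}],
)

# Extended-thinking responses interleave 'thinking' and 'text' blocks.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```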
Similar voices have been persistent on X, with developers complaining that Claude Code has degraded into an 'intern' that requires constant supervision.
Why did this happen?
The pattern with training ultra-large-parameter models is clear: whenever a major company goes all-in on its next 'most powerful model,' it needs massive compute. Before Gemini 3.0/3.1 launched, developers repeatedly complained that 2.5 Pro had grown 'dumber' after silent updates, forgetting content in long contexts and failing more often on logical tasks. The same happened before GPT-5's release, with users reporting that 4o had become shorter, lazier, and more mechanical on complex instructions, a visible 'dumbing down.'
Computational power is finite. Training a model of an entirely new tier like Mythos is extremely expensive, and the resources can only be 'squeezed' out of the models currently in service, through dynamic load balancing, adaptive effort reduction, and other light optimizations. The result is the 'dumbing down' and 'laziness' everyone has been feeling.
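As a back-of-the-envelope illustration, and purely a toy model with made-up numbers rather than anything describing Anthropic's actual scheduling, consider a fixed GPU fleet split between frontier training and serving:

```python
# Toy model of a fixed GPU fleet split between frontier training and serving.
# Every number here is invented for illustration only.
TOTAL_GPUS = 10_000

def effort_per_request(training_share: float, requests_per_sec: float) -> float:
    """GPU-seconds of compute available per serving request (toy units)."""
    serving_gpus = TOTAL_GPUS * (1 - training_share)
    return serving_gpus / requests_per_sec

# Before the frontier run: 30% of the fleet on training, moderate traffic.
print(effort_per_request(0.30, requests_per_sec=5_000))  # 1.4

# During the frontier run: 70% on training, and the user base has grown too.
print(effort_per_request(0.70, requests_per_sec=8_000))  # 0.375
```

In this toy model, per-request effort falls by roughly 73%; the point is simply that training reallocation and user growth compound against serving quality.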
On top of that, Claude Code's user base has grown far beyond expectations, repeatedly straining the infrastructure, while training and testing the Mythos preview (internal codename: Capybara) demand priority access to top-tier GPUs. Releasing the Mythos preview without opening it to the public therefore means no further dilution of compute, and no continued decline in the quality of Claude or Claude Code.
For ordinary Claude users, the experience will actually be more stable.
On the other hand, Anthropic is using Mythos in Project Glasswing to help major companies and open-source projects fix vulnerabilities. Once those vulnerabilities are patched, all users benefit indirectly.
By the time Anthropic has its risk controls and infrastructure fully in place and can safely deploy Mythos-level models at scale, ordinary users will get a genuinely stable, powerful experience that doesn't 'dumb down' every few days, instead of a rushed release that leaves everyone suffering the squeeze on compute.
In conclusion
The emergence of the Claude Mythos preview puts a harsh but realistic proposition in front of everyone: the more powerful AI becomes, the more real its risks.
When the most powerful model's offensive capabilities far exceed today's defensive systems, Anthropic's choice not to make it publicly available is not conservatism; it buys time for the entire industry. It lets defenders shore up their foundations first and gives ordinary users a relatively stable Claude experience, instead of dragging everyone into the chaos of squeezed compute and security spun out of control.
For most people, this might be the best arrangement for now.
Source: Leikeji
The images in this article are from: 123RF Royalty-free Image Library