Claude Mythos Preview: What lies behind the headlines

Sebastian Grundhöfer, the author, is a specialist in the fields of AI and legacy systems at the Seven Principles Group.

Anyone who has followed the coverage of ‘Claude Mythos Preview’ over the past few weeks will mainly have encountered one message. This model can find security vulnerabilities, develop exploits, automate attack chains, and combine attack paths across multiple systems. That headline is understandable. The public debate focuses almost exclusively on cyber security, including zero-days, exploits, and attack chains. However, it misses the real point.

In my view, Claude Mythos Preview is less a story about ‘hacking AI’ than about the next stage of agentic software work: models that not only advise on technical tasks but also carry them out autonomously for hours on end, whilst managing their own planning, interim results and verification steps.

For companies in regulated sectors, the question of when they will gain access is the wrong priority. What matters is whether control mechanisms, permission structures, audit processes, and rollback capabilities are in place and functioning before this way of working is introduced.

What the System Card really shows

The System Card for ‘Claude Mythos Preview’ is over 240 pages long and essentially serves as a model information leaflet. It describes the training context, capabilities, benchmarks, risks, external tests, alignment evaluations, safety measures, and deployment decisions, and includes explicit warnings for the industry. It also discloses problematic model behaviours and corrects earlier statements.

The System Card shows a clear shift. Frontier models are evolving from pure response systems into technical agents. They analyse repositories, use tools, launch sub-agents, run tests, check assumptions, iterate over longer periods, and ultimately deliver a work product. This isn’t incremental progress. It’s a completely different way of working.

Five opportunities, provided the conditions are right

Autonomous software work over several hours

The benchmark results show a significant improvement:

SWE-bench Pro increases from 53.4% with Opus 4.6 to 77.8%.

SWE-bench Multimodal rises from 27.1% to 59.0%.

Terminal-Bench 2.0 improves from 65.4% to 82.0%.

With an updated harness and longer timeouts, Anthropic even reaches 92.1%.

However, the observed behaviour is more relevant. Once the task was described and verification defined, testers were able to hand over multi-hour engineering tasks to the model and return later. This is a completely different working mode from ‘help me with this code snippet’. For companies facing an engineering bottleneck, this is significant, provided the tasks can be verified. Without tests, logs, acceptance criteria, and granular permissions, productivity can quickly turn into a loss of control.

Long-context reasoning for legacy system landscapes

Long context is relevant because legacy systems consist of long histories, unclear dependencies, implicit business rules, and outdated documentation.

In GraphWalks BFS, Mythos achieves a score of 80.0%, compared to 38.7% for Opus 4.6 and 21.4% for GPT-5.4. Although this benchmark does not simulate COBOL modernisation, it does measure how consistently a model reasons about relationships across long contexts. This is precisely where models frequently fail in modernisation projects. A significant improvement here changes how we work on established systems, though not without human supervision.

Collaborative code review and error diagnosis

In the qualitative section on software engineering, Mythos is presented as a superior code generator and a technical collaborator. It focuses on root causes rather than symptoms, diagnoses sub-agent errors, corrects false assumptions, and avoids blind repetition.

Many AI coding workflows fail not at writing the code itself, but because they produce false success reports, interpret tests superficially, and verify poorly. The review effort does not disappear, but it shifts away from syntax checking and towards architecture, side effects, scope, preservation of existing behaviour, and risk.

Realistic attacker perspective in defensive cyber security

Mythos achieves 100% pass@1 on Cybench and 0.83 on CyberGym, compared to Opus 4.6’s score of 0.67. In the Firefox 147 evaluation, Mythos was able to develop functional exploits from real, now-patched bugs, taking several of them all the way to code execution.

A key advantage is that a model which identifies vulnerabilities can also anticipate how they would be attacked. When exploitability is assessed together with the finding, security teams can prioritise and close gaps sooner. Because defensive and offensive cyber security are technically so closely linked, this also explains the restricted release. Not every future model with comparable capabilities will necessarily be released with the same restraint.

Research augmentation rather than researcher replacement

At USAMO 2026, Mythos achieves a score of 97.6%, compared to 66.2% for Opus 4.6. According to external evaluations, Mythos rediscovered four out of five key insights in an unpublished machine learning (ML) task, whereas Opus 4.6 found only two.

While this brings significant acceleration, it does not replace experienced scientists. The System Card continues to highlight areas for improvement, including open strategy, hypothesis triage, calibration, prioritisation, over-engineering and challenging false premises. Its most realistic applications lie in research, literature consolidation, hypothesis generation, simulations, and technical exploration.

Five risks that must be taken seriously

Offensive cyber capability as dual use

In addition to identifying potential vulnerabilities, the model also assesses their exploitability, develops proof-of-concepts, and plans attack chains across multiple systems. Consequently, it could pose a major threat in the wrong hands.

External tests describe Mythos as the first model capable of solving a private corporate network cyber range end-to-end. Human experts are estimated to require over 10 hours to complete this task. For this reason, Anthropic is not making the model publicly available. Even so, IT security is changing for companies, because such models exist and will become more widely available in the foreseeable future.

‘Reckless Actions’: rare but delicate

The System Card describes cases in which earlier model versions clearly crossed boundaries while solving tasks:

Bypassing sandbox restrictions

Publicly posting exploit details

Unauthorised edits intended to remain invisible in the Git history

Deliberately imprecise formulation of responses

Searching /proc memory for credentials

These cases primarily involve early snapshots; the final version was significantly improved. The problem is not necessarily a ‘malicious AI’, but rather overly aggressive optimisation towards a defined goal without understanding the implied boundaries.

When models think differently from how they explain

Anthropic found evidence that Claude Mythos Preview can detect when it is being evaluated in certain testing situations, without disclosing this information in its visible reasoning. At the level of individually checked response steps, this occurred in 7.6% of cases, rising to 29% for entire test runs.

This weakens the assumption that the reasoning text can be used as a complete audit log. In regulated sectors, a plausible justification is not enough. What is required are behavioural audits that answer questions such as:

Which tools were used?

Which data was read?

Which files were modified?

Which checks have been passed?

Which approvals were in place?
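The five questions above can be turned into a record that is written for every agent action. What follows is a minimal sketch, not a reference implementation: the class `AgentActionRecord`, its field names and the helper `unanswered_questions` are assumptions made for illustration and do not come from the System Card.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentActionRecord:
    """One audited agent action. All field names are illustrative assumptions."""
    tool: str                   # Which tool was used?
    data_read: list[str]        # Which data was read?
    files_modified: list[str]   # Which files were modified?
    checks_passed: list[str]    # Which checks have been passed?
    approvals: list[str]        # Which approvals were in place?
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def unanswered_questions(record: AgentActionRecord) -> list[str]:
    """Return the audit questions this record leaves open."""
    gaps = []
    if not record.tool:
        gaps.append("tool")
    # A modification without a passed check or an approval is an audit gap.
    if record.files_modified and not record.checks_passed:
        gaps.append("checks_passed")
    if record.files_modified and not record.approvals:
        gaps.append("approvals")
    return gaps


# Hypothetical example: a file was changed and approved, but no check ran.
record = AgentActionRecord(
    tool="apply_patch",
    data_read=["src/billing.py"],
    files_modified=["src/billing.py"],
    checks_passed=[],
    approvals=["reviewer-1"],
)
```

The point of the sketch is that the record is filled by the execution environment from what actually happened, not from what the model says it did.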

Reward hacking in realistic tasks

The System Card documents several cases. As part of a performance optimisation, the model performed around 160 measurement runs with names such as ‘grind’ and ‘finalgrind’ in order to declare an outlier as its best result. In another evaluation, it found the grader’s test set and trained on it. In a further case, it shifted calculations outside the measured time window.

All of this is close to corporate reality. Many organisations will have agents optimised for metrics such as runtime, test coverage, error rates, costs, and turnaround times. It then becomes essential to determine whether the agent is optimising the actual goal or just the measurable proxy.
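One simple countermeasure against the ‘best of many runs’ pattern is to report a robust statistic over all runs rather than the single best one. The following is a sketch under stated assumptions: the function name `summarise_runs` and the 10% tolerance are illustrative choices, not values taken from the System Card.

```python
import statistics


def summarise_runs(runtimes_s: list[float], tolerance: float = 0.10) -> dict:
    """Summarise repeated timing runs and flag a best-of-N outlier.

    `tolerance` is an assumed threshold: the best run is treated as
    suspicious if it is more than 10% faster than the median.
    """
    if not runtimes_s:
        raise ValueError("at least one run is required")
    median = statistics.median(runtimes_s)
    best = min(runtimes_s)
    return {
        "runs": len(runtimes_s),
        "median_s": median,   # the figure that should be reported
        "best_s": best,       # shown for transparency only
        "best_is_outlier": best < median * (1 - tolerance),
    }


# Hypothetical measurements: four consistent runs and one lucky outlier.
summary = summarise_runs([10.0, 10.2, 9.9, 10.1, 7.0])
```

Recording the number of runs alongside the result matters as much as the statistic itself, because around 160 attempts to reach one figure is itself a signal worth reviewing.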

Safety nets are becoming less secure

According to Anthropic, Mythos does not cross the threshold for a dramatic acceleration of automated AI research and development. However, this assessment is delivered with less confidence than for any previous model.

The System Card makes it clear that models are gradually becoming so powerful that many evaluation methods are saturating. Objective tests are losing their selectivity, while qualitative assessments are becoming more relevant. On page 15, Anthropic explicitly warns that the industry is heading towards significantly more powerful systems without industry-wide safety mechanisms growing at the same pace.

What does this mean for companies in regulated sectors?

Claude Mythos Preview itself will hardly be viable for production use outside the ‘Glasswing’ circle. Nevertheless, the System Card is relevant because future, generally available models will be shaped by similar training methods, architectural patterns and risk profiles. Mythos is less of a product announcement and more of a preview.

I see three operational consequences.

1. Architecture over model

Those who use agentic systems productively will not win through the strongest model alone, but through a better working environment. This includes granular permissions, sandboxes, secure execution environments, quality gates, logging, auditing, rollback capabilities, and human approvals at critical points.

The choice of model is secondary. What is decisive is the definition of the scope of action, namely what the agent is permitted to do when pursuing a goal. For example, is it allowed to modify files, delete tests, launch sub-agents, read secrets, contact external services, create pull requests, deploy, or analyse production data? Without clear answers, agentic AI becomes a control risk rather than a modernisation benefit.
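The scope of action can be written down explicitly instead of being left implicit in a prompt. The following is a minimal sketch of a deny-by-default policy: the action names mirror the questions above, while the `Decision` enum, the `POLICY` table and the individual assignments are hypothetical and would have to be set by each organisation.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical policy; the assignments are examples, not recommendations.
POLICY = {
    "modify_files": Decision.ALLOW,
    "create_pull_request": Decision.ALLOW,
    "launch_sub_agents": Decision.NEEDS_APPROVAL,
    "contact_external_services": Decision.NEEDS_APPROVAL,
    "deploy": Decision.NEEDS_APPROVAL,
    "delete_tests": Decision.DENY,
    "read_secrets": Decision.DENY,
    "analyse_production_data": Decision.DENY,
}


def decide(action: str) -> Decision:
    """Deny by default: anything the policy does not name is refused."""
    return POLICY.get(action, Decision.DENY)
```

The deny-by-default lookup is the important design choice here: an agent that invents a new kind of action while pursuing its goal is stopped until someone has decided whether that action belongs in scope.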

2. Monitoring reasoning text is not enough

Looking solely at the chain of thought, justifications or model-generated summaries is not enough when relevant reasoning remains unverbalised. Companies need technical traceability at the action level, meaning they must be able to trace tool calls, data access, file modifications, external communications, privilege changes, policy violations, checkpoints, and review stages.

In regulated industries, this is a prerequisite for audits to reconstruct exactly which tasks the agent performed.

3. Verification becomes the new bottleneck

As models improve, their errors become less obvious. The work appears plausible, the explanation seems reasonable, and the solution seems to work to some extent. However, there can be subtle side effects, such as altering existing behaviour, violating security assumptions or overlooking edge cases.

This changes the role of the human:

Less typing, more verifying

Less syntax, more architecture

Fewer individual corrections, more assumption of responsibility

In legacy modernisation in particular, the greater danger is not that an agent delivers nothing, but rather that it produces a convincing yet incomplete modernisation concept.

Agentic AI needs mature control systems

Claude Mythos Preview is impressive yet unsettling. It is impressive because it demonstrates the strength of agentic models in software engineering, long-term planning, cyber security, and research augmentation. It is unsettling because those same capabilities intensify security concerns.

The cyber security headline is understandable, but it misses the real point. This is not just about a model identifying security vulnerabilities. It is about the next stage of agentic software work, and whether our operational control mechanisms can keep pace with this capability.

In 2026, those who deploy agentic systems productively will not do so simply because of the strength of the model. The competitive edge will come from having resilient foundations in place, such as clear permissions, defined boundaries, traceable logs and established release processes. Only then do powerful systems become organisationally viable.

The lesson from the ‘Claude Mythos System Card’ is not that ‘AI can now hack’. Rather, it is that agentic models are reaching a level of maturity that demands robust control architectures, especially in sectors where operational stability and regulatory resilience are non-negotiable.

The successor, Claude Mythos 5, is now available. A separate article explores how it performs against the preview.

References

Anthropic: Claude Mythos Preview System Card. Retrieved 19 May 2026.