
Anthropic details how attackers are weaponising Claude Code — but says AI will ultimately give defenders the edge

Anthropic's report reveals how attackers weaponise Claude Code for autonomous cyber attacks, but suggests AI advancements will eventually favour defenders.

Paul Sawers

·8 Jun 2026·8 min read

In cybersecurity, the same tools have always served both sides of the fight. Nmap, the network scanning tool used to map systems and probe for weaknesses, is standard kit for penetration testers and malicious actors alike. Metasploit ships with Kali Linux — the operating system of choice for both security professionals and cybercriminals. Cobalt Strike was built for legitimate red-team operations; ransomware groups have been repurposing it for years.

A new report from Anthropic suggests the same dynamic is now playing out with agentic AI tooling — and the tools at the centre of it are ones your developers are very likely already using.

How attackers turned Claude Code into an autonomous weapon

Published this week, Anthropic's Frontier Red Team analysis of 832 accounts banned for malicious cyber activity over the past year returns to a case study the company first disclosed in November 2025. A state-sponsored espionage campaign, labelled GTG-1002, targeted government and critical infrastructure organisations across multiple countries, achieving the maximum possible risk score in Anthropic's assessment framework.

The new report adds a detail that wasn't in the original disclosure: that GTG-1002 wired open-source penetration testing tools into Claude Code as MCP servers, "effectively turning the AI into an autonomous attack platform rather than a code-writing assistant."

The AI moved autonomously through reconnaissance, exploited a vulnerability in a public-facing web server, harvested SSH keys and cloud credentials, moved laterally across the victim's infrastructure, and staged tens of thousands of files for exfiltration. Human intervention was required at just four to six critical decision points across the entire campaign, with the operator stepping back in at the end to pull the data.

What made GTG-1002 the most dangerous actor in the dataset wasn’t the number of techniques it used — its MITRE profile was comparable to dozens of medium-risk actors. It was how those techniques were chained together autonomously, with minimal human direction at each step.

As the report puts it:

"The more durable differentiator is the type of scaffolding attackers build around the model: higher-risk actors design architectures that allow models to chain together discrete stages of a cyberattack and carry them out with minimal human input."

Put simply, the danger isn't which AI tools an attacker uses; it's how autonomous, and how deliberately "ungoverned", they make the system built around those tools.

The Mythos factor: Anthropic's most capable model is too dangerous to release

Sitting in the background of all this is something Anthropic is being deliberately cautious about. Claude Mythos, touted as its most capable model, isn’t publicly available — and the reason is cybersecurity. In a technical report published in April, Anthropic describes a model capable of autonomously identifying and exploiting zero-day vulnerabilities across every major operating system and web browser, including a 27-year-old bug in OpenBSD and a 16-year-old vulnerability in FFmpeg. In one case, the company says, it wrote a web browser exploit chaining together four vulnerabilities entirely without human intervention.

"Non-experts can also leverage Mythos Preview to find and exploit sophisticated vulnerabilities," the report notes — engineers at Anthropic with no formal security training have reportedly prompted it to find remote code execution vulnerabilities overnight and woken up to working exploits.

Anthropic is not making Mythos Preview generally available. Instead, through Project Glasswing, it's being shared with a small number of trusted organisations to begin securing critical software before a model of comparable capability becomes widely accessible.

Defenders will win in the end — but the transition will be rough

For all the alarm, Anthropic's position is cautiously optimistic for the long run. The report draws an explicit parallel to software fuzzers — tools that, when first deployed at scale, raised fears about enabling attackers to find vulnerabilities faster. They did. But modern fuzzers are now a critical part of the defensive ecosystem.

"We believe the same will hold true here too — eventually," Anthropic wrote in its Mythos report. "Once the security landscape has reached a new equilibrium, we believe that powerful language models will benefit defenders more than attackers."

The operative word is eventually. In the near term, the report is clear that the advantage could belong to attackers if frontier labs aren't careful. The transitional period, in Anthropic's own words, "may be tumultuous." Getting there will require defenders to match attackers in how they deploy AI, share threat intelligence across organisations, and close the gap between identifying vulnerabilities and patching them.

“If industry, government, and civil society treat the current moment with the urgency it warrants, we believe capable AI systems will benefit defenders more than attackers in the long run: finding bugs before new code ships, and making the systems societies depend on more secure,” Anthropic notes. “The result could be better-defended infrastructure, and a digital environment with materially less fraud and abuse.”

If your agents lack a governance layer, you have a problem

For engineering teams deploying agentic AI tooling right now, that transition is already underway. The GTG-1002 attack used no exotic infrastructure — just Claude Code, MCP servers, and open-source penetration testing tools assembled into a system designed to operate with maximum autonomy and minimum human oversight. The attacker's edge was deliberate ungovernedness.

The lesson for organisations deploying the same patterns legitimately runs in the opposite direction: deliberately constraining what agents can access and do, maintaining visibility over what tools are integrated, and ensuring audit trails exist for what agents are actually doing in production. This is precisely the gap that agent governance platforms like Tessl are built to close — giving engineering leaders a full inventory of what tools and skills their agents can access, policy gates over what can be installed and run, and audit logs of what is actually happening in production. For most engineering teams today, that governance layer is still missing.
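To make those three controls concrete, here is a minimal Python sketch of a policy gate placed in front of an agent's tool calls. It is an illustration only: ToolPolicy, GovernedAgent and the tool names are hypothetical, and none of it reflects how Tessl, Claude Code or MCP actually implement governance. It assumes every tool call can be routed through a single checkpoint, which then enforces an allowlist, holds sensitive tools until a named human approves, and records every decision.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    """Hypothetical policy: tools an agent may call, and which need sign-off."""
    allowed: set[str]
    needs_approval: set[str] = field(default_factory=set)


@dataclass
class GovernedAgent:
    """Hypothetical wrapper that gates tool requests and keeps an audit trail."""
    policy: ToolPolicy
    audit_log: list[dict] = field(default_factory=list)

    def request(self, tool: str, args: dict, approved_by: str | None = None) -> str:
        """Return 'allow', 'deny' or 'pending', and log the decision either way."""
        if tool not in self.policy.allowed:
            decision = "deny"      # not in the inventory: blocked outright
        elif tool in self.policy.needs_approval and approved_by is None:
            decision = "pending"   # sensitive tool: wait for a human
        else:
            decision = "allow"
        self.audit_log.append({
            "ts": time.time(),
            "tool": tool,
            "args": args,
            "approved_by": approved_by,
            "decision": decision,
        })
        return decision

    def export_log(self) -> str:
        """Serialise the audit trail as one JSON object per line."""
        return "\n".join(json.dumps(entry, sort_keys=True) for entry in self.audit_log)
```

With a policy of allowed={"read_file", "deploy"} and needs_approval={"deploy"}, a request for "read_file" is allowed, a request for an unlisted tool such as "port_scan" is denied, and "deploy" stays pending until approved_by names a person. Denied and pending requests are logged alongside allowed ones, so the audit trail shows what the agent attempted as well as what it did.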


Paul Sawers

Freelance tech writer at Tessl, former TechCrunch senior writer covering startups and open source
