Anthropic launches AI code review agents that scan pull requests for bugs

12 Mar 2026 · 7 minute read

Paul Sawers

Freelance tech writer at Tessl, former TechCrunch senior writer covering startups and open source

Code review remains one of the slowest parts of modern software development. Engineers often wait hours or days for feedback on pull requests, and large changes can overwhelm human reviewers.

Anthropic says it has a new approach: send a team of AI agents to review the code instead. The company this week introduced Code Review, a feature inside Claude Code that automatically reviews code changes and surfaces potential bugs and design issues.

AI agents review pull requests in parallel

Anthropic describes Code Review as a multi-agent system that runs several AI reviewers in parallel to examine a pull request (PR).

“Code review has become a bottleneck, and we hear the same from customers every week,” the company wrote in a blog post. “They tell us developers are stretched thin, and many PRs get skims rather than deep reads.”

Before teams can use the feature, Code Review must first be enabled inside Claude Code’s organization settings. Administrators do this by selecting the Code Review preview and clicking “Setup,” which begins the process of connecting Claude to GitHub.

Getting set up

The setup installs the Claude GitHub app and grants permission to read pull request diffs and post review comments directly on a repository’s PRs.

Once the integration is complete, Claude can automatically analyze pull request changes as part of the normal review workflow.

Analyzing pull requests

After the connection is established, developers can run reviews against their codebase using Claude Code.

When a pull request opens, Claude dispatches multiple agents to inspect the changes. Each agent searches the diff and surrounding files for potential bugs or implementation problems. The findings then pass through a verification stage where additional agents review the results before anything is surfaced to developers, a step Anthropic says helps reduce false positives.

The results appear inside the pull request as a structured review. Claude posts a summary comment highlighting the most important issues, alongside inline comments attached to specific lines of code. Issues are ranked by severity so engineers can focus on the most critical fixes first.

Anthropic says this multi-agent approach helps the system handle large pull requests more effectively.
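The pattern Anthropic describes, parallel reviewer agents whose candidate findings pass through a verification stage before being ranked by severity, can be sketched in miniature. This is not Anthropic's implementation; the reviewer functions below are hypothetical stand-ins for what would be LLM calls in the real system, and all names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical reviewer agents: each scans the diff for one class of problem.
# In a real system these would be model calls, not string heuristics.
def find_unchecked_gets(diff):
    return [{"line": i + 1, "severity": 2, "msg": "value used without None check"}
            for i, line in enumerate(diff.splitlines()) if ".get(" in line]

def find_todo_debt(diff):
    return [{"line": i + 1, "severity": 1, "msg": "unresolved TODO in change"}
            for i, line in enumerate(diff.splitlines()) if "TODO" in line]

def verify(finding, diff):
    # Verification stage: re-check each candidate finding before surfacing it,
    # the step Anthropic says helps reduce false positives.
    return finding["line"] <= len(diff.splitlines())

def review(diff):
    agents = [find_unchecked_gets, find_todo_debt]
    with ThreadPoolExecutor() as pool:  # dispatch reviewers in parallel
        results = pool.map(lambda agent: agent(diff), agents)
    findings = [f for batch in results for f in batch if verify(f, diff)]
    # Rank by severity so the most critical issues surface first.
    return sorted(findings, key=lambda f: -f["severity"])

diff = "x = cfg.get('mode')\n# TODO: handle errors\n"
for finding in review(diff):
    print(finding["severity"], finding["msg"])
```

The fan-out/verify/rank shape is the interesting part: each agent sees the whole diff but looks for one thing, which is what lets the approach scale to large pull requests.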

Internal testing shows higher review coverage

Anthropic says it has been running the tool internally on most pull requests during development of Claude Code.

According to the company, the share of pull requests receiving substantive review comments increased from 16% to 54% during testing. Engineers also marked fewer than 1% of findings as incorrect, suggesting the system produces relatively few false alarms.

Large pull requests appear to benefit the most. On changes exceeding 1,000 lines of code, Anthropic says 84% produced findings, with an average of 7.5 issues per pull request.

Developers question the AI-review loop

The launch has also prompted questions about the prospect of AI systems reviewing code written by other AI systems. As coding assistants generate larger volumes of software, automated review tools could increasingly end up evaluating AI-produced pull requests rather than human-written ones.

One Reddit user described Anthropic’s feature as essentially “an AI auditor for your AI-written code.”

In the Hacker News discussion following the announcement, some developers were openly skeptical of the premise. “The concept of ‘AI will review AI-authored PRs’ seems completely wrong to me,” one commenter wrote. “Why didn’t the AI write the correct code in the first place?”

Another questioned the incentives behind such tools, asking: “So their business model is to deliver me buggy code and then charge me to fix it?”

Others, meanwhile, questioned the idea at a more fundamental level: “Does AI review of AI-generated code even make sense?” one responder asked.

The debate reflects a broader tension emerging around AI-assisted development. Coding tools are now capable of generating large amounts of software quickly, while systems like Anthropic’s Code Review are designed to inspect that code automatically before it reaches production.

In theory, that creates a development loop where AI systems both generate and review code changes. If that model becomes widespread, the human role in the process could shift toward oversight and deployment decisions rather than writing or reviewing the code itself, a prospect skeptics see as a recipe for AI-generated mayhem.

Code Review: Pricing and availability

Code Review is available as a research preview for Claude Code customers on Team and Enterprise plans.

Reviews are billed based on token usage and typically cost $15 to $25 per pull request, according to Anthropic. Pricing varies depending on the size and complexity of the code being analyzed.

That price point has already sparked discussion among developers. In a Hacker News thread following the launch, several engineers questioned the cost and how it compares with other AI-assisted code review tools.

Some contrasted the pricing with GitHub Copilot’s review system, where assigning Copilot as a reviewer consumes a “premium request” from a monthly allowance included with Copilot subscriptions. GitHub’s documentation states that each pull request review typically uses one request, with additional requests costing around $0.04 once the monthly quota is exhausted.
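The gap behind that discussion is easy to quantify. A back-of-envelope comparison using only the figures cited here (Anthropic's $15 to $25 per review, GitHub's roughly $0.04 overage per premium request) shows how quickly per-PR token billing diverges from a flat per-request rate; the 200-PRs-per-month team volume is an assumed number for illustration only.

```python
# Figures cited in the article.
claude_per_pr = (15.0, 25.0)    # Anthropic: token-billed, typical per-PR range
copilot_overage_per_pr = 0.04   # GitHub: approx. cost per request past the quota

prs_per_month = 200             # assumed team volume, illustrative only

claude_monthly = tuple(cost * prs_per_month for cost in claude_per_pr)
copilot_monthly = copilot_overage_per_pr * prs_per_month

print(f"Claude Code Review: ${claude_monthly[0]:,.0f}-${claude_monthly[1]:,.0f}/month")
print(f"Copilot overage:    ${copilot_monthly:,.2f}/month")
```

At that assumed volume, the spread is three orders of magnitude, which explains why the comparison dominated the thread even though the two products are not doing equivalent work per review.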

Others in the thread compared the pricing to third-party review tools such as Greptile, which offer a fixed number of reviews per month under a subscription model.

What the community is saying

However the pricing is structured, the broader trend is clear: AI agents are moving beyond code generation into ongoing engineering tasks such as debugging, testing, and review.

If systems like this gain traction, pull requests may increasingly arrive with a first layer of automated analysis already completed.