We ran Composer 2.5 and 2.5 Fast across 11 skills. Surprisingly, Fast won.

Discover why Composer 2.5 Fast outperforms Composer 2.5 in speed and skill across 11 benchmarks. Upgrade now for faster results at no extra cost.

Simon Maple

·28 May 2026·6 min read

Cursor just shipped Composer 2.5 and Composer 2.5 Fast. We benchmarked both across 11 engineering skills, 5 scenarios per skill, averaged across three independent LLM judges. The fast model scored higher, ran 32% quicker, and costs exactly the same. If you are reaching for Composer 2.5 over Composer 2.5 Fast, you are paying the same price for a slower, slightly worse model.

Here is the full picture.

TL;DR

Composer 2.5 Fast scores 92.7% with skill context. Composer 2.5 scores 92.1%. Fast wins.
Both are ahead of gpt-5.5, gpt-5.4, and the previous Composer 2.
The fast model completes scenarios in 59 seconds on average. The regular model takes 87 seconds.

Where They Land in the Benchmark

The leaderboard covers 8 models across 11 skills, with each run scored by three independent judges and the results averaged. Here is where it sits:

Model	Avg baseline	Avg with-skill	Lift
opus-4-7	80.8%	93.4%	+12.6
composer-2.5-fast	79.6%	92.7%	+13.1
composer-2.5	79.0%	92.1%	+13.1
composer-2	74.2%	89.6%	+15.4
gpt-5.5	75.5%	89.4%	+13.9
gpt-5.4	74.1%	89.3%	+15.2
gpt-5.3	65.5%	83.9%	+18.4
gpt-5-codex	68.7%	78.7%	+10.0

Composer 2.5 Fast sits 0.7 points behind opus-4-7 and 3.1 points clear of the best model outside the 2.5 family, Composer 2 at 89.6%. That is a meaningful gap. The previous Composer 2 sits alongside gpt-5.5 and gpt-5.4 at roughly 89-90%. Cursor has moved its own model up a full competitive tier in a single release.
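The lift column and the gaps between models can be recomputed directly from the table. The snippet below is a minimal sketch: the numbers are copied from the leaderboard, while the helper names `lift` and `gap` are illustrative and not part of any benchmark tooling.

```python
# (baseline %, with-skill %) per model, copied from the leaderboard table.
LEADERBOARD = {
    "opus-4-7": (80.8, 93.4),
    "composer-2.5-fast": (79.6, 92.7),
    "composer-2.5": (79.0, 92.1),
    "composer-2": (74.2, 89.6),
    "gpt-5.5": (75.5, 89.4),
    "gpt-5.4": (74.1, 89.3),
    "gpt-5.3": (65.5, 83.9),
    "gpt-5-codex": (68.7, 78.7),
}


def lift(model: str) -> float:
    """Points gained from skill context: with-skill minus baseline."""
    baseline, with_skill = LEADERBOARD[model]
    return round(with_skill - baseline, 1)


def gap(ahead: str, behind: str) -> float:
    """Difference in with-skill score between two models, in points."""
    return round(LEADERBOARD[ahead][1] - LEADERBOARD[behind][1], 1)


for model in LEADERBOARD:
    print(f"{model:<18} lift={lift(model):+.1f}")

print("opus-4-7 over composer-2.5-fast:", gap("opus-4-7", "composer-2.5-fast"))
print("composer-2.5-fast over composer-2:", gap("composer-2.5-fast", "composer-2"))
```

Rounding to one decimal keeps floating-point noise out of the comparison, since the table itself is reported to one decimal place.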

The Fast model seems better.

Normally a "fast" variant trades quality for speed. Composer 2.5 Fast does not do that. It scores 0.6 points higher than the regular model while running 28 seconds faster per scenario (59s vs 87s on average across 110 scored runs).
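The speed figures reduce to simple arithmetic. As a quick check, using only the two averages reported above (the function name is illustrative):

```python
# Average wall-clock time per scenario, in seconds, as reported in the article.
REGULAR_SECONDS = 87
FAST_SECONDS = 59


def time_saved(regular: float, fast: float) -> tuple[float, float]:
    """Return (seconds saved, percent reduction relative to the regular model)."""
    saved = regular - fast
    return saved, 100 * saved / regular


saved_seconds, percent_reduction = time_saved(REGULAR_SECONDS, FAST_SECONDS)
print(f"{saved_seconds:.0f}s saved per scenario, {percent_reduction:.1f}% less time")
```

The percentage is measured against the regular model's time, which is how a 28 second saving on an 87 second run comes out at roughly 32%.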

The per-skill breakdown shows where the differences accumulate:

Skill	2.5 with-skill	2.5-fast with-skill	Winner
documentation	97%	98%	fast
fastify	99%	94%	2.5
init	87%	86%	2.5
linting	98%	99%	fast
node-best-practices	95%	95%	tie
nodejs-core	98%	98%	tie
oauth	92%	89%	2.5
octocat	95%	96%	fast
skill-optimizer	98%	98%	tie
snipgrapher	93%	93%	tie
typescript	82%	76%	2.5

The regular model wins on fastify (+5), oauth (+3), and typescript (+6), and edges init by a single point. The fast model wins by one point each on documentation, linting, and octocat. Outside fastify, oauth, and typescript, the two are within noise. The overall average breaks toward fast because it avoids some of the deeper failures the regular model hits on documentation and linting under stricter judges.
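One way to separate real differences from noise in the per-skill table is to apply a cutoff to the deltas. The sketch below uses the table's figures; the 2 point threshold is an assumption chosen for illustration, not a value taken from the benchmark.

```python
# (Composer 2.5, Composer 2.5 Fast) with-skill scores in percent, from the table.
SKILL_SCORES = {
    "documentation": (97, 98),
    "fastify": (99, 94),
    "init": (87, 86),
    "linting": (98, 99),
    "node-best-practices": (95, 95),
    "nodejs-core": (98, 98),
    "oauth": (92, 89),
    "octocat": (95, 96),
    "skill-optimizer": (98, 98),
    "snipgrapher": (93, 93),
    "typescript": (82, 76),
}

NOISE_THRESHOLD = 2  # assumed cutoff in points; not a figure from the article


def classify(regular: int, fast: int) -> str:
    """Label a skill by which model leads, or 'within noise' below the cutoff."""
    delta = regular - fast
    if abs(delta) < NOISE_THRESHOLD:
        return "within noise"
    return "2.5" if delta > 0 else "fast"


for skill, (regular, fast) in SKILL_SCORES.items():
    print(f"{skill:<20} delta={regular - fast:+d}  {classify(regular, fast)}")
```

With this cutoff, only fastify, oauth, and typescript register as clear differences, all in favour of the regular model. A different threshold would shift the boundary.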

The typescript result is worth flagging separately. Both models score lower with skill context than without it on typescript. The regular model drops from baseline to 82% with skill; the fast model drops further to 76%. Something about how these models interact with the typescript skill works against them. If typescript is central to your workflow, treat this as a yellow flag worth investigating.

The Cost Argument

Both Composer 2.5 variants are part of the Cursor subscription. The marginal cost of choosing one over the other is zero. There is no per-token bill that changes when you switch from the regular to the fast model.

This makes the benchmark result unusually clean: faster, better, and no more expensive. The only case where you might prefer the regular model is if you are working heavily in fastify or oauth-heavy codebases where it holds a consistent 3-5 point lead. For everything else, the fast model is the better default.

Compare this to the OpenAI side of the leaderboard. gpt-5.5 and gpt-5.4 both land around 89%, behind both Composer 2.5 variants, and carry per-token API costs that accumulate with usage. The Cursor subscription gives you a stronger model at a fixed price, which changes the economics significantly if you are running agents at any kind of scale.

What Changed from Composer 2

The gap between Composer 2 and Composer 2.5 is larger than the leaderboard position suggests. The with-skill scores are 89.6% vs 92.1-92.7%, a jump of 2.5 to 3.1 points. More importantly, the baseline scores tell a different story: Composer 2 sits at 74.2% without context, while Composer 2.5 sits at 79.0-79.6%. That baseline improvement of roughly 5 points means the new model is genuinely stronger at the task, not just better at following instructions when given them.

The lift numbers reinforce this. Composer 2 shows +15.4 points of lift from skill context. Both 2.5 variants show +13.1. A lower lift number means the model needs less scaffolding to perform well. Composer 2 was getting more out of the skill context because it needed it more. Composer 2.5 is a better baseline model that skills push even higher.
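The relationship between baseline, lift, and with-skill score can be made explicit: the with-skill gain over Composer 2 equals the baseline gain plus the change in lift. A minimal sketch using the leaderboard figures (the `decompose` helper is illustrative):

```python
# (baseline %, with-skill %) for the Composer family, from the leaderboard.
COMPOSER = {
    "composer-2": (74.2, 89.6),
    "composer-2.5": (79.0, 92.1),
    "composer-2.5-fast": (79.6, 92.7),
}


def decompose(old: str, new: str) -> tuple[float, float, float]:
    """Return (baseline gain, change in lift, with-skill gain) in points."""
    old_base, old_skill = COMPOSER[old]
    new_base, new_skill = COMPOSER[new]
    baseline_gain = new_base - old_base
    lift_change = (new_skill - new_base) - (old_skill - old_base)
    with_skill_gain = new_skill - old_skill
    return (
        round(baseline_gain, 1),
        round(lift_change, 1),
        round(with_skill_gain, 1),
    )


for new_model in ("composer-2.5", "composer-2.5-fast"):
    base, lift_delta, total = decompose("composer-2", new_model)
    print(f"{new_model}: baseline {base:+.1f}, lift {lift_delta:+.1f}, with-skill {total:+.1f}")
```

For both 2.5 variants the baseline gain is larger than the with-skill gain, and the difference is exactly the drop in lift. That is the arithmetic behind the claim that the new models need less scaffolding.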

The One Caveat

These scores are averaged across three judges (Sonnet, GPT-5.5, Opus-4-7). The raw Sonnet-only scores for Composer 2.5 were 94% and 92%, which looked even better. After applying stricter judges, the numbers settled at 92.1% and 92.7%. That is the correct comparison to make against the other models in this benchmark, which went through the same three-judge process. A single-judge Sonnet score would have overstated the gap.
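To illustrate why a three-judge average can land below a single lenient judge, here is a rough sketch of the averaging step. It assumes each scenario is scored by every judge and the scores are combined with a plain mean; the article does not spell out the exact aggregation, and the example scores are invented purely for illustration.

```python
from statistics import mean

# Judge names as listed in the article.
JUDGES = ("sonnet", "gpt-5.5", "opus-4-7")


def scenario_score(judge_scores: dict[str, float]) -> float:
    """Average one scenario's score across all judges; reject partial results."""
    missing = [judge for judge in JUDGES if judge not in judge_scores]
    if missing:
        raise ValueError(f"missing judge scores: {missing}")
    return mean(judge_scores[judge] for judge in JUDGES)


# Hypothetical scores for a single scenario, not real benchmark data.
example = {"sonnet": 96.0, "gpt-5.5": 90.0, "opus-4-7": 90.0}

print("single-judge (sonnet):", example["sonnet"])
print("three-judge average:  ", scenario_score(example))
```

Requiring all three judges before averaging keeps models comparable: a score built from one judge and a score built from three are not measuring the same thing.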

Simon Maple

Simon Maple is Tessl’s Founding Developer Advocate, a Java Champion, and former DevRel leader at Snyk, ZeroTurnaround, and IBM.
