Run e2e tests, fix flaky and outdated tests, and identify bugs against the spec. Use when running e2e tests, debugging test failures, or fixing flaky tests. Never changes source code logic or API without spec backing.
92
Quality: 96% (Does it follow best practices?)
Impact: 75% (1.22x; average score across 3 eval scenarios)
Passed: no known issues
Flaky test remediation (without context / with context)
No waitForTimeout: 100% / 100%
Semantic role selectors: 0% / 0%
Web-first assertions: 0% / 100%
No arbitrary delays: 100% / 100%
Mock before navigation: 100% / 100%
No manual retry loops: 66% / 100%
Assertions not weakened: 100% / 100%
No source code changes: 100% / 100%
Without context: $0.3058 · 1m 22s · 17 turns · 22 in / 4,701 out tokens
With context: $0.4967 · 1m 56s · 24 turns · 61 in / 6,802 out tokens
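The flaky-test criteria above name concrete anti-patterns. As a rough illustration only, the hypothetical Python helper below flags two of them in Playwright spec source, line by line: `waitForTimeout` calls and `setTimeout`-based sleeps. The function name `find_anti_patterns` and both regexes are assumptions for this sketch; they are not part of the skill or its eval rubric, and a plain regex scan will miss delays hidden behind helper functions.

```python
import re

# Hypothetical checks named after the "No waitForTimeout" and
# "No arbitrary delays" criteria; the regexes are illustrative assumptions.
ANTI_PATTERNS = {
    "waitForTimeout": re.compile(r"\.waitForTimeout\s*\("),
    "arbitrary delay": re.compile(r"\bsetTimeout\s*\("),
}


def find_anti_patterns(source: str) -> list[tuple[int, str]]:
    """Return (1-based line number, pattern name) for each flagged line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in ANTI_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# Example spec text (line 1 is empty because the string starts with a newline).
spec = """
await page.goto('/transfer');
await page.waitForTimeout(2000);
await new Promise(r => setTimeout(r, 500));
await expect(page.getByRole('button', { name: 'Send' })).toBeEnabled();
"""

print(find_anti_patterns(spec))
# [(3, 'waitForTimeout'), (4, 'arbitrary delay')]
```

The last line of the sample, a web-first `expect(...).toBeEnabled()` on a role-based locator, is not flagged, which matches the direction the criteria reward.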
Failure taxonomy and fix ordering (without context / with context)
Auth failure: outdated category: 0% / 0%
Auth fix: test updated not source: 0% / 0%
Transfer failure: flaky category: 100% / 100%
Settings failure: unverified category: 0% / 0%
No source changes for non-bug: 0% / 0%
Failure table produced: 50% / 75%
Fix ordering: 0% / 0%
Report: E2E Results header: 0% / 50%
Report: Fixed section: 0% / 75%
Report: Remaining Failures section: 0% / 25%
Source code boundary respected: 100% / 100%
Report: Unit Tests Added section: 0% / 87%
Without context: $0.3934 · 1m 45s · 22 turns · 23 in / 6,394 out tokens
With context: $0.7096 · 2m 56s · 26 turns · 29 in / 11,054 out tokens
Bug fix with spec and TDD gate (without context / with context)
Bug classification: 100% / 100%
Spec section cited: 100% / 100%
Source code fixed: 100% / 100%
E2E test unchanged: 100% / 100%
Unit tests produced: 100% / 100%
Unit test covers fix: 100% / 100%
No API contract changes: 100% / 100%
No source logic removed: 100% / 100%
Investigation report produced: 100% / 100%
Without context: $0.3898 · 1m 37s · 23 turns · 30 in / 5,228 out tokens
With context: $0.5386 · 2m 5s · 26 turns · 276 in / 6,465 out tokens
5342bca