Reference tile for Themis, a Node.js and TypeScript unit test framework designed for AI coding agents. Covers unit-test authoring, Jest/Vitest migration, agent-readable failure output with repair hints, and first-class integrations for Claude Code, Cursor, and generic agents.
96
94%
Does it follow best practices?
Impact
97%
2.69x
Average score across 10 eval scenarios
Passed
No known issues
{
  "context": "Tests whether the agent uses --reporter agent to get structured JSON failure output rather than parsing raw stack traces, and uses --rerun-failed for efficient re-validation after fixes. Also checks that the agent knows the specific fields available in the structured output.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "--reporter agent in script",
      "description": "fix-workflow.sh includes a `themis test --reporter agent` command (or npx/bunx equivalent) as the primary failure diagnosis step",
      "max_score": 20
    },
    {
      "name": "--rerun-failed in script",
      "description": "fix-workflow.sh includes `themis test --rerun-failed` (or equivalent) as the step used to confirm fixes without running the full suite",
      "max_score": 20
    },
    {
      "name": "failures[].cluster mentioned",
      "description": "DIAGNOSIS_REPORT.md mentions `failures[].cluster` (or `cluster`) as a field that groups failures by likely common cause",
      "max_score": 12
    },
    {
      "name": "failures[].repairHints mentioned",
      "description": "DIAGNOSIS_REPORT.md mentions `failures[].repairHints` (or `repairHints`) as structured suggestions that can be acted on directly",
      "max_score": 12
    },
    {
      "name": "sourceFile/lineNumber mentioned",
      "description": "DIAGNOSIS_REPORT.md mentions `sourceFile` and/or `lineNumber` fields as pre-parsed location information",
      "max_score": 10
    },
    {
      "name": "No stack trace re-reading",
      "description": "DIAGNOSIS_REPORT.md explicitly states that re-reading stack traces is unnecessary because the structured output pre-parses location information",
      "max_score": 10
    },
    {
      "name": "expected/actual mentioned",
      "description": "DIAGNOSIS_REPORT.md mentions `expected` and/or `actual` fields from the structured failure output",
      "max_score": 8
    },
    {
      "name": "Script comment explains why agent reporter",
      "description": "fix-workflow.sh contains a comment explaining that --reporter agent provides structured/machine-readable output (not just naming the flag)",
      "max_score": 8
    }
  ]
}
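The checklist above rewards reading the structured fields of the agent reporter output instead of re-parsing stack traces. The TypeScript below is a minimal sketch of that idea, not Themis's documented schema: only the field names (`failures`, `cluster`, `repairHints`, `sourceFile`, `lineNumber`, `expected`, `actual`) come from the checklist, while the field types, the helper names `groupByCluster` and `summarize`, and every value in `sampleReport` are hypothetical.

```typescript
// Assumed shape of one entry in `failures[]`. Field names are taken from the
// checklist; the types are guesses made for illustration only.
interface AgentFailure {
  cluster: string;        // groups failures by likely common cause
  repairHints: string[];  // structured suggestions to act on directly
  sourceFile: string;     // pre-parsed location, so no stack trace reading
  lineNumber: number;
  expected: unknown;
  actual: unknown;
}

interface AgentReport {
  failures: AgentFailure[];
}

// Group failures by cluster so that one shared cause is fixed once.
function groupByCluster(report: AgentReport): Map<string, AgentFailure[]> {
  const groups = new Map<string, AgentFailure[]>();
  for (const failure of report.failures) {
    const bucket = groups.get(failure.cluster) ?? [];
    bucket.push(failure);
    groups.set(failure.cluster, bucket);
  }
  return groups;
}

// Turn the grouped failures into a short, readable repair plan.
function summarize(report: AgentReport): string[] {
  const lines: string[] = [];
  groupByCluster(report).forEach((failures, cluster) => {
    lines.push(`${cluster}: ${failures.length} failure(s)`);
    failures.forEach((f) => {
      lines.push(
        `  ${f.sourceFile}:${f.lineNumber} ` +
          `expected=${JSON.stringify(f.expected)} actual=${JSON.stringify(f.actual)}`
      );
      f.repairHints.forEach((hint) => lines.push(`    hint: ${hint}`));
    });
  });
  return lines;
}

// Hypothetical payload standing in for parsed `--reporter agent` output.
const sampleReport: AgentReport = {
  failures: [
    { cluster: "missing-mock", repairHints: ["Mock the price service"],
      sourceFile: "src/cart.test.ts", lineNumber: 14, expected: 3, actual: 0 },
    { cluster: "missing-mock", repairHints: ["Mock the price service"],
      sourceFile: "src/checkout.test.ts", lineNumber: 31, expected: 9, actual: 0 },
    { cluster: "off-by-one", repairHints: ["Check the loop upper bound"],
      sourceFile: "src/range.test.ts", lineNumber: 8, expected: 5, actual: 4 },
  ],
};

summarize(sampleReport).forEach((line) => console.log(line));
```

Grouping by `cluster` first keeps the plan short: each likely cause appears once with its locations and hints beneath it, which suits a workflow that applies a fix and then confirms it with `--rerun-failed` rather than the full suite.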