Name: vitron-ai/themis
Rating: 96.7 (1 review)
Author: vitron-ai

vitron-ai/themis

Reference tile for Themis, a Node.js and TypeScript unit test framework designed for AI coding agents. Covers unit-test authoring, Jest/Vitest migration, agent-readable failure output with repair hints, and first-class integrations for Claude Code, Cursor, and generic agents.

Quality: 94% (Does it follow best practices?)
Impact: 97%, 2.69x (average score across 10 eval scenarios)
Security: Passed (no known issues)

evals/scenario-6/criteria.json

{
  "context": "Tests whether the agent uses --reporter agent to get structured JSON failure output rather than parsing raw stack traces, and uses --rerun-failed for efficient re-validation after fixes. Also checks that the agent knows the specific fields available in the structured output.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "--reporter agent in script",
      "description": "fix-workflow.sh includes a `themis test --reporter agent` command (or npx/bunx equivalent) as the primary failure diagnosis step",
      "max_score": 20
    },
    {
      "name": "--rerun-failed in script",
      "description": "fix-workflow.sh includes `themis test --rerun-failed` (or equivalent) as the step used to confirm fixes without running the full suite",
      "max_score": 20
    },
    {
      "name": "failures[].cluster mentioned",
      "description": "DIAGNOSIS_REPORT.md mentions `failures[].cluster` (or `cluster`) as a field that groups failures by likely common cause",
      "max_score": 12
    },
    {
      "name": "failures[].repairHints mentioned",
      "description": "DIAGNOSIS_REPORT.md mentions `failures[].repairHints` (or `repairHints`) as structured suggestions that can be acted on directly",
      "max_score": 12
    },
    {
      "name": "sourceFile/lineNumber mentioned",
      "description": "DIAGNOSIS_REPORT.md mentions `sourceFile` and/or `lineNumber` fields as pre-parsed location information",
      "max_score": 10
    },
    {
      "name": "No stack trace re-reading",
      "description": "DIAGNOSIS_REPORT.md explicitly states that re-reading stack traces is unnecessary because the structured output pre-parses location information",
      "max_score": 10
    },
    {
      "name": "expected/actual mentioned",
      "description": "DIAGNOSIS_REPORT.md mentions `expected` and/or `actual` fields from the structured failure output",
      "max_score": 8
    },
    {
      "name": "Script comment explains why agent reporter",
      "description": "fix-workflow.sh contains a comment explaining that --reporter agent provides structured/machine-readable output (not just naming the flag)",
      "max_score": 8
    }
  ]
}
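The checklist names the fields an agent is expected to read from `themis test --reporter agent` output. The sketch below assumes the report is a JSON object with a top-level `failures` array whose entries carry those fields. The field types are assumptions, not Themis's documented schema, and so are the `AgentFailure`, `AgentReport`, `groupByCluster`, and `location` names, which are hypothetical. Under those assumptions, a TypeScript consumer might group failures by `cluster` and take locations straight from `sourceFile` and `lineNumber` instead of parsing stack traces:

```typescript
// Hypothetical shape of one entry in the `failures` array produced by
// `themis test --reporter agent`. Field names come from the checklist;
// the types are assumptions, not Themis's published schema.
interface AgentFailure {
  cluster: string;       // groups failures that likely share a common cause
  repairHints: string[]; // structured suggestions that can be acted on directly
  sourceFile: string;    // pre-parsed location: file
  lineNumber: number;    // pre-parsed location: line
  expected: unknown;     // value the assertion expected
  actual: unknown;       // value the code under test produced
}

interface AgentReport {
  failures: AgentFailure[];
}

// Group failures by cluster so a single fix can address several failing tests.
function groupByCluster(report: AgentReport): Map<string, AgentFailure[]> {
  const groups = new Map<string, AgentFailure[]>();
  for (const failure of report.failures) {
    const bucket = groups.get(failure.cluster) ?? [];
    bucket.push(failure);
    groups.set(failure.cluster, bucket);
  }
  return groups;
}

// Build a "file:line" pointer from the pre-parsed fields; no stack trace parsing needed.
function location(failure: AgentFailure): string {
  return `${failure.sourceFile}:${failure.lineNumber}`;
}

// Hand-written sample payload for illustration only (not real Themis output).
const sample: AgentReport = JSON.parse(`{
  "failures": [
    { "cluster": "date-parsing", "repairHints": ["Check timezone handling"],
      "sourceFile": "src/date.ts", "lineNumber": 42, "expected": "2024-01-01", "actual": "2023-12-31" },
    { "cluster": "date-parsing", "repairHints": ["Check timezone handling"],
      "sourceFile": "src/date.ts", "lineNumber": 57, "expected": 1, "actual": 0 },
    { "cluster": "config", "repairHints": ["Add missing default"],
      "sourceFile": "src/config.ts", "lineNumber": 8, "expected": true, "actual": false }
  ]
}`);

// Print one section per cluster, with location, expected/actual, and repair hints.
groupByCluster(sample).forEach((failures, cluster) => {
  console.log(`${cluster}: ${failures.length} failure(s)`);
  for (const f of failures) {
    console.log(`  ${location(f)} expected=${JSON.stringify(f.expected)} actual=${JSON.stringify(f.actual)}`);
    for (const hint of f.repairHints) console.log(`    hint: ${hint}`);
  }
});
```

The sketch parses an inline sample because this page does not say whether the agent reporter writes to stdout or to a file. After fixes are applied, the checklist expects re-validation with `themis test --rerun-failed` rather than a full-suite run.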
