Triage Antithesis test reports to understand what happened in a run: look up runs, check status, investigate failed properties (assertions), view metadata, download logs, inspect findings, and examine environmental details. Load after a run completes or when investigating a failure.
75
92%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly identifies the tool (Antithesis), lists specific concrete actions, and provides explicit trigger conditions. It uses proper third-person voice and covers both the 'what' and 'when' dimensions effectively, making it easy for Claude to select this skill appropriately from a large pool.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: look up runs, check status, investigate failed properties (assertions), view metadata, download logs, inspect findings, and examine environmental details. | 3 / 3 |
| Completeness | Clearly answers both what ('Triage Antithesis test reports... look up runs, check status, investigate failed properties...') and when ('Load after a run completes or when investigating a failure'), with explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | Includes natural keywords users would say: 'test reports', 'run', 'failed properties', 'assertions', 'logs', 'findings', 'failure', 'triage', 'Antithesis'. These cover the domain well and match how users would describe investigating test failures. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive due to the specific product name 'Antithesis' and the focused domain of test report triage. Unlikely to conflict with generic testing or logging skills due to the specific tool reference and detailed action list. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured triage skill with excellent workflow clarity and progressive disclosure. The actionability is high with concrete commands and specific field names throughout. The main weakness is moderate verbosity — the follow-up skills section and some repeated guidance about log importance could be tightened to save tokens.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient and avoids explaining concepts Claude already knows, but there's some verbosity: the preflight section is somewhat lengthy, the 'Suggesting follow-up skills' section includes detailed descriptions of four other skills that could be more concise, and some guidance is repeated (e.g., the importance of logs is stated multiple times). | 2 / 3 |
| Actionability | The skill provides concrete, executable commands throughout (e.g., `snouty runs --json properties`, `snouty runs --json events ${RUN_ID} ${PROPERTY_NAME}`, `snouty runs --json build-logs ${RUN_ID}`). Steps are specific with exact flags, field names to check, and clear decision points. The guidance is copy-paste ready and leaves little ambiguity about what to do. | 3 / 3 |
| Workflow Clarity | Multiple workflows are clearly sequenced with explicit validation checkpoints: the preflight checklist stops at the first failure, the triage workflow checks for null triage_report before proceeding, incomplete runs have a distinct diagnostic path, and the 'Investigate failed properties' workflow includes feedback loops (downloading additional logs, comparing passing vs failing examples). The self-review section adds a final validation checkpoint. | 3 / 3 |
| Progressive Disclosure | The skill is well-structured as an overview that delegates detailed content to reference files (run-discovery.md, run-info.md, properties.md, logs.md) at exactly the point they're needed. References are one level deep, clearly signaled, and the skill explicitly instructs not to read them all up front. Navigation is intuitive. | 3 / 3 |
| Total | | 11 / 12 Passed |
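The three commands quoted under Actionability can be collected into a small helper, shown here as a minimal Python sketch. The subcommands and the `--json` flag are copied from the review text. The function names `snouty_commands` and `render` and the placeholder values are hypothetical, and the sketch only builds the command lines; it does not run `snouty` or assume anything about its output.

```python
import shlex


def snouty_commands(run_id: str, property_name: str) -> list[list[str]]:
    """Return the three command lines quoted in the review as argv lists.

    Only the subcommands and the --json flag come from the review;
    run_id and property_name stand in for ${RUN_ID} and ${PROPERTY_NAME}.
    """
    return [
        ["snouty", "runs", "--json", "properties"],
        ["snouty", "runs", "--json", "events", run_id, property_name],
        ["snouty", "runs", "--json", "build-logs", run_id],
    ]


def render(argv: list[str]) -> str:
    """Join an argv list into a shell-safe string (quotes names with spaces)."""
    return shlex.join(argv)


if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    for argv in snouty_commands("RUN_ID_PLACEHOLDER", "my property"):
        print(render(argv))
```

Keeping each command as an argument list rather than a single string means a property name containing spaces is passed as a single argument, which matters when property names are free-form text.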
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
f837248