antithesis-triage

Triage Antithesis test reports to understand what happened in a run: look up runs, check status, investigate failed properties (assertions), view metadata, download logs, inspect findings, and examine environmental details. Load after a run completes or when investigating a failure.

70

Quality

85%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use

Quality

Content

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured triage skill with clear, actionable workflows, excellent progressive disclosure via reference files loaded on demand, and strong validation checkpoints throughout. The main weakness is moderate verbosity in the follow-up skills section and self-review criteria, which could be tightened. Overall it is a high-quality skill that effectively guides Claude through complex multi-step triage operations.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is mostly efficient and avoids explaining concepts Claude already knows, but there is some verbosity: for example, the 'Suggesting follow-up skills' section is lengthy with descriptions of each skill that could be more compact, and the 'Self-Review' section repeats guidance already implicit in the workflows. Some tightening is possible without losing clarity. | 2 / 3 |
| Actionability | The skill provides concrete, executable commands (snouty runs --json properties, snouty runs --json logs $RUN_ID $INPUT_HASH $VTIME, snouty doctor, etc.) with specific flags and arguments. Steps are precise about which fields to check, how to match them, and what to do with the results. The guidance is copy-paste ready and specific. | 3 / 3 |
| Workflow Clarity | Multiple workflows are clearly sequenced with explicit validation checkpoints (preflight with snouty doctor, checking for null triage_report before proceeding, verifying run status before choosing workflow). The 'Investigate failed properties' workflow includes a feedback loop (download more logs if uncertain, compare passing vs failing). The 'Diagnose incomplete run' workflow has clear fallback steps. Error recovery paths are well-defined. | 3 / 3 |
| Progressive Disclosure | The skill is structured as an overview that explicitly references detailed guides in references/ (run-discovery.md, run-info.md, properties.md, logs.md) at the exact points where they are needed, with a clear instruction not to read them all up front. References are one level deep and clearly signaled. The content is well-organized with distinct workflow sections. | 3 / 3 |
| Total | | 11 / 12 |

Passed

Description

85%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly enumerates specific capabilities and provides explicit trigger conditions. The main weakness is trigger term quality: 'Antithesis' correctly scopes the skill, but users might use alternative phrasings such as 'test results' or 'test failures' that the description does not cover. Overall, it effectively communicates both purpose and activation context.

Suggestions

Add more natural trigger term variations users might say, such as 'test results', 'test failures', 'why did my test fail', or 'debug test run'.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: look up runs, check status, investigate failed properties (assertions), view metadata, download logs, inspect findings, and examine environmental details. | 3 / 3 |
| Completeness | Clearly answers both what ('Triage Antithesis test reports... look up runs, check status, investigate failed properties...') and when ('Load after a run completes or when investigating a failure'), providing explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'test reports', 'run', 'failed properties', 'logs', 'findings', but 'Antithesis' is a specific product name that limits natural keyword coverage. Missing common variations users might say like 'test results', 'test failures', 'debugging tests', or 'why did my test fail'. | 2 / 3 |
| Distinctiveness Conflict Risk | Very distinct niche: it specifically targets 'Antithesis test reports', a named product/platform, making it highly unlikely to conflict with other skills. The combination of the product name and specific actions creates a clear, unique identity. | 3 / 3 |
| Total | | 11 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
antithesishq/antithesis-skills
Reviewed
