Analyze a codebase to figure out how it should be tested with Antithesis: map the system, identify failure-prone areas and testable properties, and produce the research artifacts needed for workload and environment planning.
## Quality
### Discovery: 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is strong on specificity and distinctiveness due to the clear mention of Antithesis and concrete deliverables. However, it lacks an explicit 'Use when...' clause, which limits completeness, and could benefit from more natural trigger terms that users might use when requesting this type of analysis.
**Suggestions**

- Add a 'Use when...' clause such as 'Use when the user wants to prepare a codebase for Antithesis testing, asks about autonomous testing setup, or needs to identify testable properties for reliability testing.'
- Include additional natural trigger terms like 'fuzz testing', 'autonomous testing', 'reliability testing', 'test harness', or 'Antithesis setup' to improve discoverability.
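As an illustration, the two suggestions combined might produce a frontmatter description like the sketch below. This is one possible phrasing only, and it assumes the skill uses a standard SKILL.md YAML header with `name` and `description` fields:

```yaml
---
name: antithesis-research
# Sketch of a revised description incorporating a 'Use when...' clause
# and broader trigger terms; wording is illustrative, not prescriptive.
description: >
  Analyze a codebase to figure out how it should be tested with Antithesis:
  map the system, identify failure-prone areas and testable properties, and
  produce the research artifacts needed for workload and environment planning.
  Use when the user wants to prepare a codebase for Antithesis testing, asks
  about autonomous testing, fuzz testing, or reliability testing setup, or
  needs to identify testable properties for a test harness.
---
```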
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'map the system', 'identify failure-prone areas and testable properties', and 'produce the research artifacts needed for workload and environment planning'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers 'what does this do' (analyze codebase, map system, identify failure areas, produce artifacts), but lacks an explicit 'Use when...' clause or equivalent trigger guidance for when Claude should select this skill. | 2 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'Antithesis', 'codebase', 'testable properties', 'failure-prone areas', but misses common user-facing variations like 'fuzz testing', 'autonomous testing', 'reliability testing', or 'test planning'. The term 'Antithesis' is quite specific and niche, which helps but also limits natural trigger coverage. | 2 / 3 |
| Distinctiveness / Conflict Risk | The mention of 'Antithesis' specifically, combined with the detailed scope of workload and environment planning artifacts, creates a very clear niche that is unlikely to conflict with general testing or code analysis skills. | 3 / 3 |
| **Total** | | **10 / 12 (Passed)** |
### Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, highly actionable skill for a complex research workflow. Its greatest strengths are the concrete output specifications, clearly sequenced multi-step workflows, and thorough self-review criteria. The main weaknesses are moderate verbosity (particularly the definitions section and the lengthy self-review checklist) and the inability to verify the progressive disclosure structure without bundle files.
**Suggestions**

- Consider moving the self-review checklist to a separate reference file (e.g., references/research-review-checklist.md) to reduce the main skill's length while keeping it accessible.
- Trim the 'Definitions and Concepts' section to only Antithesis-specific terms that Claude wouldn't know (e.g., Test Template, Test Command, Timeline); safety, liveness, and reachability properties are standard CS concepts.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long but most content is necessary for a complex multi-artifact research workflow. Some sections like 'Definitions and Concepts' explain things Claude likely knows (e.g., safety vs liveness properties), and the self-review checklist is quite verbose, with some redundancy against the success criteria stated at the top. However, the reference table and workflows are dense and purposeful. | 2 / 3 |
| Actionability | The skill provides highly concrete guidance: specific file paths for all outputs, exact assertion function names to search for (assert_always!, assert_sometimes!, etc.), numbered step-by-step workflows, and clear artifact formats. The instructions are specific enough to be directly executable without ambiguity. | 3 / 3 |
| Workflow Clarity | Three distinct workflows are clearly sequenced with numbered steps, each building on specific reference files. The full research pass includes validation via the property evaluation step and a comprehensive self-review checklist that serves as a verification checkpoint. The property expansion workflow includes a conditional evaluation gate (substantial vs small expansions), showing thoughtful workflow design. | 3 / 3 |
| Progressive Disclosure | The skill references 8 external reference files via a well-organized table, and workflows direct the reader to specific references at appropriate points. However, since no bundle files were provided, we cannot verify these references exist. The SKILL.md itself is quite long (~150+ lines of substantive content), and the self-review section could potentially be a separate reference file. The structure is good, but the main file carries a lot of inline content. | 2 / 3 |
| **Total** | | **10 / 12 (Passed)** |
### Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

**Validation for skill structure:** 11 / 11 checks passed; no warnings or errors.