Analyze a codebase to figure out how it should be tested with Antithesis: map the system, identify failure-prone areas and testable properties, and produce the research artifacts needed for workload and environment planning.
60
68%: Does it follow best practices?
Impact: no eval scenarios have been run
Advisory: Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./antithesis-research/SKILL.md
Quality
Discovery
67%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description does a good job of specifying concrete actions and carving out a distinct niche around Antithesis testing analysis. However, it lacks an explicit 'Use when...' clause which caps completeness, and the trigger terms could be broader to capture more natural user phrasings. The specificity and distinctiveness are strong points.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about testing with Antithesis, planning Antithesis workloads, or analyzing a codebase for autonomous/fuzz testing.'
Include additional natural trigger terms users might say, such as 'fuzz testing', 'reliability testing', 'autonomous testing', 'test harness', or 'Antithesis setup'.
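The two suggestions above can be checked mechanically. The following is a minimal sketch, not part of the Tessl CLI or its scoring logic: the function names `has_use_when_clause` and `missing_trigger_terms` are hypothetical, the description string is the one quoted at the top of this review, and the term list is copied from the suggestion above.

```python
import re

# The skill description as quoted at the top of this review.
DESCRIPTION = (
    "Analyze a codebase to figure out how it should be tested with Antithesis: "
    "map the system, identify failure-prone areas and testable properties, and "
    "produce the research artifacts needed for workload and environment planning."
)

# Trigger terms named in the suggestions above.
SUGGESTED_TERMS = [
    "fuzz testing",
    "reliability testing",
    "autonomous testing",
    "test harness",
    "Antithesis setup",
]


def has_use_when_clause(description: str) -> bool:
    """Return True if the description contains an explicit 'Use when' clause."""
    return re.search(r"\buse when\b", description, flags=re.IGNORECASE) is not None


def missing_trigger_terms(description: str, terms: list[str]) -> list[str]:
    """Return the suggested terms that do not appear in the description."""
    lowered = description.lower()
    return [term for term in terms if term.lower() not in lowered]


if __name__ == "__main__":
    print("Has 'Use when' clause:", has_use_when_clause(DESCRIPTION))
    print("Missing trigger terms:", missing_trigger_terms(DESCRIPTION, SUGGESTED_TERMS))
```

Run against the current description, this reports no "Use when" clause and all five suggested terms as missing, which matches the reviewer's two suggestions. It is a plain substring check, so it says nothing about how the actual discovery score is computed.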
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'map the system', 'identify failure-prone areas and testable properties', and 'produce the research artifacts needed for workload and environment planning'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers 'what does this do' (analyze codebase, map system, identify failure areas, produce artifacts), but lacks an explicit 'Use when...' clause or equivalent trigger guidance for when Claude should select this skill. | 2 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'Antithesis', 'codebase', 'testable properties', 'failure-prone areas', but misses common user-facing variations like 'fuzz testing', 'autonomous testing', 'reliability testing', or 'test planning'. The term 'Antithesis' is quite specific and niche, which helps but also limits natural trigger coverage. | 2 / 3 |
| Distinctiveness Conflict Risk | The mention of 'Antithesis' as a specific testing platform, combined with the particular workflow of mapping systems and producing research artifacts for workload/environment planning, creates a very clear niche that is unlikely to conflict with general testing or code analysis skills. | 3 / 3 |
| Total | | 10 / 12 Passed |
Implementation
70%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured orchestration skill for a complex multi-phase research workflow. Its greatest strengths are workflow clarity (three distinct, well-sequenced workflows with validation) and progressive disclosure (clean delegation to reference files with clear navigation). Its main weaknesses are moderate verbosity — particularly the lengthy self-review checklist that partially duplicates earlier content — and limited direct actionability since most concrete guidance lives in unreachable reference files.
Suggestions
Trim the self-review checklist to only items not already stated in the Output or Workflows sections, or restructure it as a compact checklist table to reduce redundancy.
Add at least one concrete inline example of a well-formed property entry or sut-analysis excerpt so the skill is partially actionable without reading all reference files.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long but most content is necessary for a complex multi-phase research workflow. However, there's some redundancy: the self-review checklist repeats many points already covered in the workflows and output sections, and some definitions (like 'SUT') are things Claude already knows. The reference table and workflow steps are efficient, but the overall document could be tightened by ~20%. | 2 / 3 |
| Actionability | The workflows provide clear step sequences and the output artifacts are well-specified with paths and content expectations. However, the actual analysis work is delegated to reference files that aren't provided, making the skill itself more of an orchestration guide than directly executable. The assertion scanning step (step 4 of full research) is concrete and actionable, but most other steps say 'read reference X and follow its workflow' without inline specifics. | 2 / 3 |
| Workflow Clarity | Three distinct workflows are clearly sequenced with numbered steps. The full research pass has a logical progression (setup → discovery → analysis → property discovery → topology → evaluation → refinement). The self-review section serves as an explicit validation checklist. The property evaluation step includes a feedback loop (apply refinements, fill gaps, escalate biases). Conditional logic is present (e.g., 'if scratchbook artifacts already exist, extend them'). | 3 / 3 |
| Progressive Disclosure | The skill is well-structured as an orchestrator that points to 8 clearly-signaled reference files via a table with 'when to read' guidance. References are one level deep. The main file provides the overview, workflows, and success criteria while delegating detailed methodology to reference files. Navigation is clear with the reference table and explicit 'read X' instructions in workflows. | 3 / 3 |
| Total | | 10 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
f837248