antithesis-research

Analyze a codebase to figure out how it should be tested with Antithesis: map the system, identify failure-prone areas and testable properties, and produce the research artifacts needed for workload and environment planning.

60

Quality

68%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./antithesis-research/SKILL.md

Quality

Content

70%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured orchestration skill for a complex multi-phase research workflow. Its strengths are excellent progressive disclosure with clearly signaled references, well-sequenced workflows with validation checkpoints, and comprehensive self-review criteria. Its weaknesses are moderate verbosity (especially the self-review section, which partially duplicates earlier content) and a lack of concrete executable examples: the skill tells Claude what to produce but defers nearly all of the 'how' to reference files that aren't available for evaluation.

Suggestions

Add a concrete example of the grep/ripgrep command for scanning Antithesis SDK assertions (step 4 of the full research pass) rather than just describing what to search for

Include an inline example of the provenance frontmatter format rather than requiring a reference file read just to understand the output format

Trim the self-review checklist by removing items that are direct restatements of the success criteria or output section — consolidate into one authoritative list
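
As an illustration of the first suggestion, the following is a minimal sketch of an assertion scan, written in Python rather than as a grep/ripgrep one-liner so it can be run without extra tools. The name patterns (`always`, `sometimes`, `reachable`, `unreachable`, `always_or_unreachable`, with an optional `assert_` prefix or trailing `!`) are an assumption about how Antithesis SDK assertions commonly appear at call sites; they are not taken from the skill or its reference files, and they will also match unrelated functions with the same names. The helper `scan_assertions` and the default suffix list are hypothetical.

```python
import re
from pathlib import Path

# ASSUMPTION: these names approximate Antithesis SDK assertion call sites
# (e.g. assert.Always(...), assert_sometimes!(...), ALWAYS(...)).
# This is not an official list; adjust it to the SDKs the codebase uses.
ASSERTION_PATTERN = re.compile(
    r"\b(?:assert_)?"
    r"(always_or_unreachable|alwaysorunreachable|always|sometimes|unreachable|reachable)"
    r"!?\s*\(",
    re.IGNORECASE,
)

# Hypothetical default: source file suffixes worth scanning.
DEFAULT_SUFFIXES = (".py", ".go", ".rs", ".java", ".cc", ".cpp", ".h")


def scan_assertions(root, suffixes=DEFAULT_SUFFIXES):
    """Return (relative path, line number, assertion kind) for each matching line."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = ASSERTION_PATTERN.search(line)
            if match:
                # Record only the first match on a line, normalized to lower case.
                hits.append((str(path.relative_to(root)), lineno, match.group(1).lower()))
    return hits
```

Calling `scan_assertions(".")` from the repository root returns a list that can be grouped by assertion kind to see which properties the codebase already checks. Treat the results as candidates to review, since a plain name match cannot tell an SDK assertion from a similarly named helper.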

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is fairly long but most content is necessary for a complex multi-phase research workflow. However, there's some redundancy: the self-review checklist repeats many points already covered in the workflows and output sections, and some definitions (like SUT, workload) are things Claude already knows. The reference threading instructions are also somewhat verbose. | 2 / 3 |
| Actionability | The skill provides clear step-by-step workflows and specific output file paths, which is good. However, it lacks concrete executable examples: there are no code snippets for searching for SDK assertions (e.g., grep/ripgrep commands), no example frontmatter format inline, and the actual analysis methodology is deferred entirely to reference files that aren't provided. The guidance is structured but largely procedural rather than executable. | 2 / 3 |
| Workflow Clarity | The three workflows (full research pass, targeted property research, property expansion) are clearly sequenced with numbered steps. The full research pass includes validation via a self-review checklist with explicit criteria. The property evaluation step serves as a feedback loop (evaluate → refine → fill gaps → escalate). The conditional logic in property expansion (when to run full evaluation vs skip) adds useful decision points. | 3 / 3 |
| Progressive Disclosure | The skill is well-structured as an orchestrator that points to 8 clearly-signaled reference files via a table with 'when to read' guidance. Content is appropriately split: the SKILL.md provides the overview, workflows, and success criteria while deferring methodology details to one-level-deep references. The reference table makes navigation easy and each reference has a clear purpose. | 3 / 3 |

Total: 10 / 12 (Passed)

Description

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description does well at specifying concrete actions and carving out a distinct niche around Antithesis testing analysis. However, it lacks an explicit 'Use when...' clause, which is important for Claude to know when to select this skill. Adding trigger guidance and more natural user-facing keywords would strengthen it.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about testing with Antithesis, planning Antithesis workloads, or analyzing a codebase for Antithesis integration.'

Include more natural trigger terms users might say, such as 'fuzz testing', 'chaos testing', 'Antithesis setup', 'test environment planning', or 'reliability testing'.
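
The two description suggestions can be checked mechanically. Below is a minimal sketch, assuming a simple substring test is good enough; it is not the rubric this review applies. The function `check_description` is hypothetical, and the trigger terms are simply the examples listed in the suggestion above.

```python
import re

# ASSUMPTION: trigger terms copied from the suggestion above; not an official list.
TRIGGER_TERMS = (
    "fuzz testing",
    "chaos testing",
    "Antithesis setup",
    "test environment planning",
    "reliability testing",
)


def check_description(description, trigger_terms=TRIGGER_TERMS):
    """Report whether a description has a 'Use when' clause and which terms it lacks."""
    lowered = description.lower()
    return {
        # True when the phrase "use when" appears anywhere, ignoring case.
        "has_use_when": re.search(r"\buse when\b", lowered) is not None,
        # Terms that do not appear as case-insensitive substrings.
        "missing_terms": [t for t in trigger_terms if t.lower() not in lowered],
    }
```

Run against the current description, this sketch reports no 'Use when' clause and all five example terms missing; appending the suggested 'Use when...' sentence flips the first field to true.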

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'map the system', 'identify failure-prone areas and testable properties', and 'produce the research artifacts needed for workload and environment planning'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers 'what does this do' with specific actions, but lacks an explicit 'Use when...' clause or equivalent trigger guidance. The 'when' is only implied by the description of capabilities, which caps this at 2 per the rubric guidelines. | 2 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'codebase', 'tested', 'Antithesis', 'failure-prone areas', 'testable properties', but misses common user variations. Users might say 'fuzz testing', 'chaos testing', 'test planning', or 'Antithesis setup' which aren't covered. | 2 / 3 |
| Distinctiveness Conflict Risk | The mention of 'Antithesis' as a specific testing platform creates a very clear niche. This is unlikely to conflict with general testing skills or code analysis skills due to the distinct Antithesis focus and the specific artifact types mentioned. | 3 / 3 |

Total: 10 / 12 (Passed)

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
antithesishq/antithesis-skills
Reviewed
