Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
77
72%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/core/agent-ops-testing/SKILL.md
Quality
Discovery
67%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is structurally sound with a clear 'what' and 'when' clause, earning full marks on completeness. However, it lacks specificity in the concrete actions it performs and misses common trigger terms users would naturally use (e.g., 'unit test', 'code coverage', specific frameworks). The phrase 'beyond baseline checks' is somewhat ambiguous and could create confusion about when to select this skill versus a more general one.
Suggestions
Add specific concrete actions such as 'generate test plans, write unit/integration tests, measure code coverage, identify untested paths' to improve specificity.
Include more natural trigger terms users would say, such as 'unit tests', 'integration tests', 'code coverage', 'TDD', 'test plan', and common framework names like 'pytest', 'jest', or 'mocha'.
Clarify the boundary implied by 'beyond baseline checks'—specify what distinguishes this skill from basic test-related tasks to reduce conflict risk with other skills.
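As a rough illustration of the trigger-term suggestion above, the sketch below checks the current description against the terms the review proposes. The helper `missing_terms` and both constants are hypothetical names introduced here, and plain substring matching is an assumption for illustration only; it is not how Tessl scores discovery.

```python
# Hypothetical helper: report which suggested trigger terms a skill
# description lacks. Substring matching is a deliberate simplification.

# The skill description as quoted at the top of this review.
DESCRIPTION = (
    "Test strategy, execution, and coverage analysis. Use when designing tests, "
    "running test suites, or analyzing test results beyond baseline checks."
)

# Trigger terms taken from the review's suggestions.
SUGGESTED_TERMS = [
    "unit tests",
    "integration tests",
    "code coverage",
    "TDD",
    "test plan",
    "pytest",
    "jest",
    "mocha",
]


def missing_terms(description: str, terms: list[str]) -> list[str]:
    """Return the terms that do not appear in the description (case-insensitive)."""
    lowered = description.lower()
    return [term for term in terms if term.lower() not in lowered]


if __name__ == "__main__":
    for term in missing_terms(DESCRIPTION, SUGGESTED_TERMS):
        print(f"missing trigger term: {term}")
```

Run against the current description, every suggested term is reported as missing, which is consistent with the 2 / 3 score for Trigger Term Quality. Substring matching is crude (a short term such as "jest" would also match inside a longer word), so a real check would likely want word-boundary matching.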
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (testing) and some actions ('designing tests, running test suites, analyzing test results'), but the actions are somewhat general and not highly concrete; for example, it doesn't specify types of tests, frameworks, or specific outputs like coverage reports or test plans. | 2 / 3 |
| Completeness | Clearly answers both 'what' (test strategy, execution, and coverage analysis) and 'when' (Use when designing tests, running test suites, or analyzing test results beyond baseline checks), with an explicit 'Use when...' clause. | 3 / 3 |
| Trigger Term Quality | Includes relevant terms like 'tests', 'test suites', 'test results', 'coverage analysis', and 'test strategy', but misses common natural variations users might say such as 'unit tests', 'integration tests', 'test coverage', 'TDD', 'pytest', 'jest', or 'code coverage'. | 2 / 3 |
| Distinctiveness Conflict Risk | The phrase 'beyond baseline checks' attempts to carve out a niche distinct from basic testing, but 'designing tests' and 'running test suites' are broad enough to potentially overlap with general coding or CI/CD skills. The boundary between this and a baseline testing skill is unclear. | 2 / 3 |
| Total | | 9 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a solid, actionable testing skill with excellent workflow clarity and concrete guidance. Its main weakness is length—it tries to be both an overview and a comprehensive reference in one file, leading to some verbosity and content that could be split out. Some sections cover knowledge Claude already possesses (general test quality principles, anti-patterns).
Suggestions
Trim the 'Test Quality Checklist' and 'Anti-Patterns to Avoid' sections significantly—Claude already knows these testing fundamentals; keep only project-specific conventions.
Split detailed sections (Test Isolation rules, Coverage Analysis, Failure Investigation) into separate referenced files to make SKILL.md a leaner overview with progressive disclosure.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly comprehensive but includes some unnecessary verbosity: the 'When to Use' and 'Preconditions' sections add little value Claude couldn't infer, and the general test quality checklist (test behavior not implementation, deterministic, etc.) covers knowledge Claude already has. The coverage analysis section has some redundancy between the confidence-based thresholds table and the general coverage metrics table. | 2 / 3 |
| Actionability | The skill provides concrete, executable commands for multiple ecosystems (Python/pytest, TypeScript/vitest, .NET), specific code examples for forbidden vs. correct patterns, and detailed templates for coverage checks, failure analysis, and issue discovery output. The guidance is specific and copy-paste ready throughout. | 3 / 3 |
| Workflow Clarity | Multi-step processes are clearly sequenced with explicit validation checkpoints: incremental testing has a stop-diagnose-fix loop, bug fixes follow a write-fail-fix-verify sequence, coverage has hard/soft enforcement gates, and failure investigation has a systematic categorize-then-decide workflow with clear feedback loops. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and sections, but it's a long monolithic document (~200+ lines) that could benefit from splitting detailed sections (e.g., coverage analysis, failure investigation, test isolation) into separate referenced files. References to external skills like 'agent-ops-debugging' and 'agent-ops-tasks' are mentioned but the overall file is heavy for a SKILL.md overview. | 2 / 3 |
| Total | | 10 / 12 Passed |
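The write-fail-fix-verify sequence credited under Workflow Clarity can be sketched as follows. This is a minimal illustration, not code from the skill itself: `slugify`, its earlier buggy behaviour, and the test name are all hypothetical, and the test is written in the plain-assert style that pytest collects.

```python
# Hypothetical function under test. Assume an earlier, buggy version did
# title.lower().replace(" ", "-"), which turned repeated spaces into
# repeated hyphens.
def slugify(title: str) -> str:
    """Fixed version: split() collapses any run of whitespace."""
    return "-".join(title.lower().split())


def test_slugify_collapses_repeated_spaces():
    # Write: this regression test is added before touching the code.
    # Fail: against the assumed buggy version it fails ("hello---world").
    # Fix: slugify is changed to the split/join form above.
    # Verify: the test now passes and stays in the suite to guard the fix.
    assert slugify("Hello   World") == "hello-world"


if __name__ == "__main__":
    test_slugify_collapses_repeated_spaces()
    print("regression test passed")
```

The point of the ordering is that the test is observed failing first, which confirms it actually exercises the bug rather than passing for unrelated reasons.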
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
d156cd1