agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

77

Quality: 72% (Does it follow best practices?)

Impact: Pending (no eval scenarios have been run)

Security by Snyk: Passed (no known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/core/agent-ops-testing/SKILL.md

Quality

Discovery

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is structurally sound with a clear 'what' and 'when' clause, earning full marks on completeness. However, it lacks specificity in the concrete actions it performs and misses common trigger terms users would naturally use (e.g., 'unit test', 'code coverage', specific frameworks). The phrase 'beyond baseline checks' is somewhat ambiguous and could create confusion about when to select this skill versus a more general one.

Suggestions

Add specific concrete actions such as 'generate test plans, write unit/integration tests, measure code coverage, identify untested paths' to improve specificity.

Include more natural trigger terms users would say, such as 'unit tests', 'integration tests', 'code coverage', 'TDD', 'test plan', and common framework names like 'pytest', 'jest', or 'mocha'.

Clarify the boundary implied by 'beyond baseline checks'—specify what distinguishes this skill from basic test-related tasks to reduce conflict risk with other skills.
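
The trigger-term gap noted in the suggestions above can be checked mechanically. The following is a minimal sketch, not part of the review tooling: `missing_terms` is a hypothetical helper, the description string is copied from the skill summary, and the term list is taken from the second suggestion. It uses a naive case-insensitive substring match, so it is only a rough approximation of how an agent matches a request to a skill.

```python
# Description text copied from the skill summary in this review.
DESCRIPTION = (
    "Test strategy, execution, and coverage analysis. Use when designing tests, "
    "running test suites, or analyzing test results beyond baseline checks."
)

# Trigger terms named in the review's suggestion (illustrative list, not exhaustive).
SUGGESTED_TERMS = [
    "unit tests",
    "integration tests",
    "code coverage",
    "TDD",
    "test plan",
    "pytest",
    "jest",
    "mocha",
]


def missing_terms(description, terms):
    """Return the terms that do not occur in the description.

    Hypothetical helper: matching is a case-insensitive substring test.
    """
    lowered = description.lower()
    return [term for term in terms if term.lower() not in lowered]


if __name__ == "__main__":
    for term in missing_terms(DESCRIPTION, SUGGESTED_TERMS):
        print(f"missing trigger term: {term}")
```

Running the same check against a revised description shows which of the suggested terms it would still lack.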

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (testing) and some actions ('designing tests, running test suites, analyzing test results'), but the actions are somewhat general and not highly concrete; for example, it doesn't specify types of tests, frameworks, or specific outputs like coverage reports or test plans. | 2 / 3 |
| Completeness | Clearly answers both 'what' (test strategy, execution, and coverage analysis) and 'when' (Use when designing tests, running test suites, or analyzing test results beyond baseline checks), with an explicit 'Use when...' clause. | 3 / 3 |
| Trigger Term Quality | Includes relevant terms like 'tests', 'test suites', 'test results', 'coverage analysis', and 'test strategy', but misses common natural variations users might say such as 'unit tests', 'integration tests', 'test coverage', 'TDD', 'pytest', 'jest', or 'code coverage'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The phrase 'beyond baseline checks' attempts to carve out a niche distinct from basic testing, but 'designing tests' and 'running test suites' are broad enough to potentially overlap with general coding or CI/CD skills. The boundary between this and a baseline testing skill is unclear. | 2 / 3 |
| Total | | 9 / 12 |

Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable testing skill with excellent workflow clarity and concrete guidance. Its main weakness is length—it tries to be both an overview and a comprehensive reference in one file, leading to some verbosity and content that could be split out. Some sections cover knowledge Claude already possesses (general test quality principles, anti-patterns).

Suggestions

Trim the 'Test Quality Checklist' and 'Anti-Patterns to Avoid' sections significantly—Claude already knows these testing fundamentals; keep only project-specific conventions.

Split detailed sections (Test Isolation rules, Coverage Analysis, Failure Investigation) into separate referenced files to make SKILL.md a leaner overview with progressive disclosure.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is fairly comprehensive but includes some unnecessary verbosity. The 'When to Use' and 'Preconditions' sections add little value Claude couldn't infer, and the general test quality checklist (test behavior not implementation, deterministic, etc.) covers knowledge Claude already has. The coverage analysis section has some redundancy between the confidence-based thresholds table and the general coverage metrics table. | 2 / 3 |
| Actionability | The skill provides concrete, executable commands for multiple ecosystems (Python/pytest, TypeScript/vitest, .NET), specific code examples for forbidden vs. correct patterns, and detailed templates for coverage checks, failure analysis, and issue discovery output. The guidance is specific and copy-paste ready throughout. | 3 / 3 |
| Workflow Clarity | Multi-step processes are clearly sequenced with explicit validation checkpoints: incremental testing has a stop-diagnose-fix loop, bug fixes follow a write-fail-fix-verify sequence, coverage has hard/soft enforcement gates, and failure investigation has a systematic categorize-then-decide workflow with clear feedback loops. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and sections, but it's a long monolithic document (~200+ lines) that could benefit from splitting detailed sections (e.g., coverage analysis, failure investigation, test isolation) into separate referenced files. References to external skills like 'agent-ops-debugging' and 'agent-ops-tasks' are mentioned but the overall file is heavy for a SKILL.md overview. | 2 / 3 |
| Total | | 10 / 12 |

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 |

Passed

Repository
majiayu000/claude-skill-registry
Reviewed
