
agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

77

Quality

72%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/core/agent-ops-testing/SKILL.md

Quality

Discovery

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is structurally sound with a clear 'what' and 'when' clause, earning full marks on completeness. However, it remains somewhat generic in its specificity and trigger terms—listing broad actions like 'designing tests' and 'running test suites' without naming concrete frameworks, test types, or outputs. The 'beyond baseline checks' qualifier is an interesting differentiator but is too ambiguous to reliably prevent conflicts.

Suggestions

Add specific concrete actions such as 'generate unit/integration tests, measure code coverage, create test plans, debug failing tests' to improve specificity.

Include more natural trigger terms users would say, such as 'unit test', 'integration test', 'code coverage', 'TDD', 'test failures', 'pytest', 'jest', or 'test plan'.

Clarify what 'beyond baseline checks' means—e.g., 'Use this instead of basic linting or type-checking skills when the task involves test design, execution, or coverage measurement'—to reduce conflict risk.
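To check a revised description against the terms suggested above, a quick heuristic is a case-insensitive substring search. The term list below is copied from the reviewer's suggestions and is not an official Tessl vocabulary; `missing_trigger_terms` is a hypothetical helper, not part of the Tessl CLI.

```python
# Terms taken from the review's suggestions; not an official Tessl list.
SUGGESTED_TERMS: tuple[str, ...] = (
    "unit test", "integration test", "code coverage", "TDD",
    "test failures", "pytest", "jest", "test plan",
)


def missing_trigger_terms(description: str, terms: tuple[str, ...] = SUGGESTED_TERMS) -> list[str]:
    """Return the suggested terms that do not occur in `description`.

    Uses a case-insensitive substring match, so "unit test" also matches "unit tests".
    """
    text = description.lower()
    return [term for term in terms if term.lower() not in text]


current = (
    "Test strategy, execution, and coverage analysis. Use when designing tests, "
    "running test suites, or analyzing test results beyond baseline checks."
)
print(missing_trigger_terms(current))  # all eight suggested terms are missing
```

A substring match is deliberately loose; it only flags obvious gaps and says nothing about how a discovery model will actually rank the skill.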

Dimension | Reasoning | Score

Specificity | Names the domain (testing) and some actions ('designing tests, running test suites, analyzing test results'), but the actions are somewhat general and not highly concrete; for example, it doesn't specify types of tests, frameworks, or specific outputs like coverage reports or test plans. | 2 / 3

Completeness | Clearly answers both 'what' (test strategy, execution, and coverage analysis) and 'when' (Use when designing tests, running test suites, or analyzing test results beyond baseline checks), with an explicit 'Use when...' clause. | 3 / 3

Trigger Term Quality | Includes relevant terms like 'tests', 'test suites', 'test results', 'coverage analysis', and 'test strategy', but misses common natural variations users might say, such as 'unit tests', 'integration tests', 'test coverage', 'TDD', 'pytest', 'jest', or 'code coverage'. | 2 / 3

Distinctiveness Conflict Risk | The phrase 'beyond baseline checks' attempts to carve out a niche distinct from basic testing, but the boundary is vague: it's unclear what constitutes 'baseline checks' versus this skill's scope, which could cause overlap with general coding or linting skills. | 2 / 3

Total: 9 / 12 (Passed)

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable testing skill with clear workflows, validation checkpoints, and concrete examples. Its main weakness is length—it tries to cover test strategy, execution, isolation rules, coverage analysis, failure investigation, and issue discovery all in one file, leading to a somewhat verbose document that could benefit from splitting into focused sub-files. The test isolation section and confidence-based coverage thresholds are particularly well-done.
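The skill's own forbidden/correct Python patterns are not reproduced on this page. As a hypothetical illustration of the kind of contrast the review praises, the sketch below uses an invented `save_config` helper: the forbidden version writes to a shared path in the working directory, while the isolated version gives each test its own temporary directory.

```python
import json
import tempfile
from pathlib import Path


def save_config(config: dict, path: Path) -> None:
    """Hypothetical code under test: serialize a config dict to JSON at `path`."""
    path.write_text(json.dumps(config))


def _forbidden_example() -> None:
    # FORBIDDEN: writes to a shared, persistent location in the working tree,
    # so runs can interfere with each other and leave files behind.
    save_config({"debug": True}, Path("config.json"))


def test_save_config_isolated() -> None:
    # CORRECT: each run gets a fresh directory that is deleted afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "config.json"
        save_config({"debug": True}, target)
        assert json.loads(target.read_text()) == {"debug": True}
```

Under pytest, the built-in `tmp_path` fixture provides the same per-test directory without the explicit context manager.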

Suggestions

Split detailed sections (test isolation rules, coverage analysis, failure investigation) into separate reference files and link from the main SKILL.md to improve progressive disclosure.

Remove the 'Test Quality Checklist' and 'Anti-Patterns to Avoid' sections—these cover general testing knowledge Claude already possesses and add ~30 lines of low-value content.

Dimension | Reasoning | Score

Conciseness | The skill is fairly comprehensive but includes some unnecessary verbosity; for example, the 'When to Use' and 'Preconditions' sections add little value Claude couldn't infer, and the general test quality checklist (test behavior not implementation, deterministic, etc.) covers knowledge Claude already has. The coverage analysis section has some redundancy between the confidence-based thresholds and the general metrics table. | 2 / 3

Actionability | The skill provides concrete, executable commands for multiple ecosystems, specific code examples (forbidden patterns vs. correct patterns), exact coverage thresholds, formatted output templates, and step-by-step procedures. The test isolation section with forbidden/correct Python patterns is particularly actionable. | 3 / 3

Workflow Clarity | Multi-step processes are clearly sequenced with explicit validation checkpoints: the incremental testing workflow has a stop-diagnose-fix loop, the bug fix workflow enforces write-failing-test-first, coverage checks have pass/fail gates with hard blocks for LOW confidence, and failure investigation has a systematic categorize-then-decide flow with clear handoff criteria. | 3 / 3

Progressive Disclosure | The content is well-structured with clear headers and sections, but it's quite long (~200+ lines) and could benefit from splitting detailed sections (test isolation rules, coverage analysis, failure investigation) into separate reference files. External skills like 'agent-ops-debugging' and 'agent-ops-tasks' are mentioned, but the main content is monolithic. | 2 / 3

Total: 10 / 12 (Passed)
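The Workflow Clarity row credits the skill with coverage checks that have pass/fail gates and hard blocks for LOW confidence. A minimal sketch of such a gate follows, assuming three confidence levels and invented per-level thresholds; the skill's actual numbers are not shown on this page, and `coverage_gate` is a hypothetical helper.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical minimum line coverage (%) per confidence level:
# the less confident the change, the more coverage is demanded.
MIN_COVERAGE = {
    Confidence.HIGH: 70.0,
    Confidence.MEDIUM: 80.0,
    Confidence.LOW: 90.0,
}


def coverage_gate(coverage_pct: float, confidence: Confidence) -> str:
    """Return 'pass', 'warn', or 'block' for a measured coverage percentage."""
    if coverage_pct >= MIN_COVERAGE[confidence]:
        return "pass"
    # Only LOW-confidence work is hard-blocked; elsewhere a shortfall is reported.
    return "block" if confidence is Confidence.LOW else "warn"


print(coverage_gate(85.0, Confidence.MEDIUM))  # pass
print(coverage_gate(85.0, Confidence.LOW))     # block
print(coverage_gate(65.0, Confidence.HIGH))    # warn
```

Keeping the thresholds in one table makes the confidence-to-threshold mapping explicit, which also addresses the reviewer's note about redundancy between the thresholds and a separate metrics table.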

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed

Validation for skill structure

Criteria | Description | Result

frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning

Total: 10 / 11 (Passed)
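The single warning comes from `frontmatter_unknown_keys`. A rough local pre-check is sketched below. It assumes the recognized top-level keys are `name`, `description`, `license`, `compatibility`, `metadata`, and `allowed-tools` (verify against the validator's current spec), and the `category` key in the sample is hypothetical, since this page does not say which key triggered the warning. It only reads unindented `key:` lines, so it is not a full YAML parser.

```python
import re

# Assumed set of recognized top-level frontmatter keys.
KNOWN_KEYS = {"name", "description", "license", "compatibility", "metadata", "allowed-tools"}


def frontmatter_keys(skill_md: str) -> list[str]:
    """Extract top-level keys from a '---'-delimited frontmatter block (no nested parsing)."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        match = re.match(r"^([A-Za-z0-9_-]+):", line)  # unindented 'key:' lines only
        if match:
            keys.append(match.group(1))
    return keys


def unknown_frontmatter_keys(skill_md: str) -> list[str]:
    return [key for key in frontmatter_keys(skill_md) if key not in KNOWN_KEYS]


sample = """---
name: agent-ops-testing
description: Test strategy, execution, and coverage analysis.
category: core
metadata:
  version: "1.0"
---
# Body
"""
print(unknown_frontmatter_keys(sample))  # ['category']
```

Following the validator's hint, an unrecognized key like this would be removed or nested under `metadata`.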

Repository: majiayu000/claude-skill-registry (Reviewed)
