agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

Quality

72%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./core/agent-ops-testing/SKILL.md

Quality

Discovery

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is structurally sound with a clear 'what' and 'when' clause, earning full marks on completeness. However, it lacks specificity in the concrete actions it performs and misses common trigger terms users would naturally use (e.g., 'unit test', 'code coverage', specific frameworks). The phrase 'beyond baseline checks' is somewhat ambiguous and could create confusion about when to select this skill versus a more general one.

Suggestions

Add specific concrete actions such as 'generate test plans, write unit/integration tests, measure code coverage, identify untested paths' to improve specificity.

Include more natural trigger terms users would say, such as 'unit tests', 'integration tests', 'code coverage', 'TDD', 'test plan', and common framework names like 'pytest', 'jest', or 'mocha'.

Clarify the boundary implied by 'beyond baseline checks'—specify what distinguishes this skill from basic test-related tasks to reduce conflict risk with other skills.
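As a rough illustration of the trigger-term suggestion, the check below reports which of the terms named in this review are absent from the current description. It is a minimal sketch, not how the review tool scores discovery: `missing_trigger_terms` is a hypothetical helper, and the case-insensitive substring match is an assumption made for simplicity.

```python
# The skill's current description, copied from the top of this review.
DESCRIPTION = (
    "Test strategy, execution, and coverage analysis. Use when designing tests, "
    "running test suites, or analyzing test results beyond baseline checks."
)

# Trigger terms the review suggests adding.
TRIGGER_TERMS = [
    "unit test", "integration test", "code coverage",
    "TDD", "test plan", "pytest", "jest", "mocha",
]


def missing_trigger_terms(description, terms):
    """Return the terms that do not appear in the description.

    Hypothetical helper: uses a naive, case-insensitive substring match.
    """
    lowered = description.lower()
    return [term for term in terms if term.lower() not in lowered]


if __name__ == "__main__":
    for term in missing_trigger_terms(DESCRIPTION, TRIGGER_TERMS):
        print(f"missing trigger term: {term}")
```

Running the same check against a revised description gives a quick way to confirm that the suggested terms were actually added.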

Dimension	Reasoning	Score
Specificity	Names the domain (testing) and some actions ('designing tests, running test suites, analyzing test results'), but the actions are somewhat general and not highly concrete—e.g., it doesn't specify types of tests, frameworks, or specific outputs like coverage reports or test plans.	2 / 3
Completeness	Clearly answers both 'what' (test strategy, execution, and coverage analysis) and 'when' (Use when designing tests, running test suites, or analyzing test results beyond baseline checks), with an explicit 'Use when...' clause.	3 / 3
Trigger Term Quality	Includes relevant terms like 'tests', 'test suites', 'test results', 'coverage analysis', and 'test strategy', but misses common natural variations users might say such as 'unit tests', 'integration tests', 'test coverage', 'TDD', 'pytest', 'jest', or 'code coverage'.	2 / 3
Distinctiveness Conflict Risk	The phrase 'beyond baseline checks' attempts to carve out a niche distinct from basic testing, but 'designing tests' and 'running test suites' are broad enough to potentially overlap with general coding or CI/CD skills. The boundary between this and a baseline testing skill is unclear.	2 / 3
	Total	9 / 12 Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable testing skill with excellent workflow clarity and concrete guidance. Its main weakness is length—it tries to be both an overview and a comprehensive reference in one file, leading to some verbosity and content that could be split out. Some sections cover knowledge Claude already possesses (general test quality principles, anti-patterns).

Suggestions

Trim the 'Test Quality Checklist' and 'Anti-Patterns to Avoid' sections significantly—Claude already knows these testing fundamentals; keep only project-specific conventions.

Split detailed sections (Test Isolation rules, Coverage Analysis, Failure Investigation) into separate referenced files to make SKILL.md a leaner overview with progressive disclosure.

Dimension	Reasoning	Score
Conciseness	The skill is fairly comprehensive but includes some unnecessary verbosity—e.g., the 'When to Use' and 'Preconditions' sections add little value Claude couldn't infer, and the general test quality checklist (test behavior not implementation, deterministic, etc.) covers knowledge Claude already has. The coverage analysis section has some redundancy between the confidence-based thresholds table and the general coverage metrics table.	2 / 3
Actionability	The skill provides concrete, executable commands for multiple ecosystems (Python/pytest, TypeScript/vitest, .NET), specific code examples for forbidden vs. correct patterns, and detailed templates for coverage checks, failure analysis, and issue discovery output. The guidance is specific and copy-paste ready throughout.	3 / 3
Workflow Clarity	Multi-step processes are clearly sequenced with explicit validation checkpoints—incremental testing has a stop-diagnose-fix loop, bug fixes follow a write-fail-fix-verify sequence, coverage has hard/soft enforcement gates, and failure investigation has a systematic categorize-then-decide workflow with clear feedback loops.	3 / 3
Progressive Disclosure	The content is well structured with clear headers and sections, but it is a long, monolithic document (200+ lines) that would benefit from splitting detailed sections (e.g., coverage analysis, failure investigation, test isolation) into separate referenced files. It references external skills such as 'agent-ops-debugging' and 'agent-ops-tasks', but the file as a whole is heavy for a SKILL.md overview.	2 / 3
	Total	10 / 12 Passed
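The write-fail-fix-verify sequence for bug fixes noted under Workflow Clarity might look roughly like the following in pytest. The skill's own examples are not reproduced on this page, so `slugify`, its bug, and the test name are all hypothetical and only illustrate the order of steps.

```python
def slugify_buggy(title: str) -> str:
    """Hypothetical pre-fix version: repeated spaces produce repeated hyphens."""
    return "-".join(title.lower().split(" "))


def slugify(title: str) -> str:
    """Hypothetical fixed version: split() with no argument collapses whitespace runs."""
    return "-".join(title.lower().split())


def test_slugify_collapses_repeated_spaces():
    # Step 1 (write): capture the reported bug as a test before touching the code.
    # Step 2 (fail): this assertion fails when run against slugify_buggy.
    # Step 3 (fix): replace split(" ") with split().
    # Step 4 (verify): rerun this test, then the whole suite, to confirm the fix.
    assert slugify("Hello  World") == "hello-world"
```

Keeping the regression test in the suite after the fix is what guards against the bug returning.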

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
	Total	10 / 11 Passed
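To locate the keys behind a warning like this, a small script can list the top-level frontmatter keys of a SKILL.md and flag any that fall outside an allowed set. This is a minimal sketch under stated assumptions: the review does not say which key triggered the warning, so the `category` key in the sample is hypothetical, and `ALLOWED_KEYS` is an assumed set that should be replaced with whatever the skill spec actually permits. The line-based parsing handles only simple frontmatter and is not a full YAML parser.

```python
# Assumed set of permitted top-level keys; check the skill spec for the real list.
ALLOWED_KEYS = {"name", "description", "license", "metadata"}

# Hypothetical SKILL.md content; "category" stands in for an unknown key.
SAMPLE = """\
---
name: agent-ops-testing
description: Test strategy, execution, and coverage analysis.
category: core
metadata:
  owner: example
---
# agent-ops-testing
"""


def top_level_keys(skill_md: str) -> list[str]:
    """Return top-level keys from a leading '---' delimited frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        # Indented lines belong to nested values; comment lines are skipped.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def unknown_keys(skill_md: str, allowed=ALLOWED_KEYS) -> list[str]:
    """Return top-level frontmatter keys that are not in the allowed set."""
    return [key for key in top_level_keys(skill_md) if key not in allowed]


if __name__ == "__main__":
    print(unknown_keys(SAMPLE))
```

Following the validator's advice, a flagged key can either be removed or nested under `metadata`, where the sketch no longer treats it as top level.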

Repository: majiayu000/claude-skill-registry-data
Commit: 6770aaa

Reviewed: 5 days ago
