test-skill

Use when creating or editing skills, before deployment, to verify they work under pressure and resist rationalization - applies RED-GREEN-REFACTOR cycle to process documentation by running baseline without skill, writing to address failures, iterating to close loopholes

Quality

68%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Fix and improve this skill with Tessl

tessl review fix ./plugins/customaize-agent/skills/test-skill/SKILL.md

Quality

Content

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill is highly actionable with excellent workflow clarity — the RED-GREEN-REFACTOR cycle is well-defined with concrete examples, validation checkpoints, and feedback loops. However, it is significantly too verbose, repeating the same TDD mapping at least three times and including extensive explanations of concepts Claude already understands. The content would benefit greatly from being split across reference files rather than presented as a single monolithic document.

Suggestions

Cut the document by ~40%: remove the duplicate TDD mapping tables (keep only the Quick Reference one), eliminate 'The Bottom Line' and 'Real-World Impact' sections which restate already-covered points, and trim explanatory text like 'This is identical to TDD's write failing test first'.

Move the detailed pressure types table, common mistakes section, and the full TDD Skill Bulletproofing example into separate referenced files (e.g., PRESSURE_TYPES.md, COMMON_MISTAKES.md, EXAMPLES.md) to improve progressive disclosure.

Provide the referenced bundle files (examples/CLAUDE_MD_TESTING.md, persuasion-principles.md) or remove the references if they don't exist.
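The last suggestion can be checked mechanically before publishing. The following is a minimal sketch, assuming the two referenced paths resolve relative to the directory that holds SKILL.md. The helper name `missing_references` is hypothetical and is not part of Tessl.

```python
from pathlib import Path

# Paths the review says SKILL.md references (copied from the review text).
REFERENCED_FILES = [
    "examples/CLAUDE_MD_TESTING.md",
    "persuasion-principles.md",
]


def missing_references(skill_dir, referenced=REFERENCED_FILES):
    """Return the referenced paths that do not exist under skill_dir.

    Assumption: references are relative to the directory holding SKILL.md.
    """
    root = Path(skill_dir)
    return [rel for rel in referenced if not (root / rel).is_file()]


if __name__ == "__main__":
    # Skill directory taken from the `tessl review fix` command in this review.
    skill_dir = "./plugins/customaize-agent/skills/test-skill"
    for rel in missing_references(skill_dir):
        print(f"missing: {rel}")
```

Any path this reports should either be added to the bundle or have its reference removed from the skill.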

Dimension	Reasoning	Score
Conciseness	Extremely verbose at 300+ lines. Repeats the TDD mapping table three times (overview table, detailed sections, quick reference) and explains TDD concepts Claude already knows. The 'Bottom Line' section restates points already made several times. The pressure types table, common mistakes section, and many examples overlap with each other.	1 / 3
Actionability	Highly actionable with concrete scenario templates, exact markdown formatting for pressure tests, specific before/after examples for skill refactoring, explicit A/B/C choice formats, and copy-paste ready testing setups. The meta-testing prompts and rationalization table formats are directly executable.	3 / 3
Workflow Clarity	The RED-GREEN-REFACTOR workflow is clearly sequenced with explicit validation checkpoints at each phase. The testing checklist provides a comprehensive verification loop. Feedback loops are explicit: 'If agent still fails → revise and re-test' and 'If agent finds NEW rationalization → Continue REFACTOR cycle.' The meta-testing section provides clear error recovery paths.	3 / 3
Progressive Disclosure	References two external files (examples/CLAUDE_MD_TESTING.md and persuasion-principles.md) which is good, but neither is provided in the bundle. The skill itself is monolithic — the pressure types table, common mistakes, rationalization examples, and the full worked example could all be split into separate reference files. Too much detail is inline for a skill of this complexity.	2 / 3
	Total	9 / 12 Passed
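The total row is the sum of the four dimension scores, each out of 3. The following is a short sketch of that arithmetic. The dictionary layout is illustrative only, and the headline percentage for this section is not derived here because the page does not state how it is computed.

```python
# Per-dimension scores copied from the table above; each is out of 3.
content_scores = {
    "Conciseness": 1,
    "Actionability": 3,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 2,
}
MAX_PER_DIMENSION = 3

total = sum(content_scores.values())
maximum = MAX_PER_DIMENSION * len(content_scores)
print(f"Total {total} / {maximum}")  # prints "Total 9 / 12"
```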

Description

75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description has strong completeness with an explicit 'Use when' clause and occupies a distinctive niche, making it unlikely to conflict with other skills. However, the specificity of concrete actions could be improved, and the trigger terms lean toward specialized jargon rather than natural user language, which may reduce discoverability when users ask for skill testing or validation in plain terms.

Suggestions

Add more natural trigger terms users might say, such as 'test skill', 'validate skill', 'skill QA', 'check skill quality', or 'debug skill'.

Make the concrete actions more explicit — e.g., 'runs the skill against test scenarios, compares outputs with and without the skill, identifies failure cases, and iterates on the skill definition'.
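A rough way to see the gap the first suggestion describes is to check which proposed trigger terms literally appear in the current description. The following is a sketch under a strong simplifying assumption: agent discovery is not necessarily literal substring matching, so this only approximates term coverage. The function name `missing_terms` is hypothetical.

```python
# Description text as quoted at the top of this review.
DESCRIPTION = (
    "Use when creating or editing skills, before deployment, to verify they "
    "work under pressure and resist rationalization - applies "
    "RED-GREEN-REFACTOR cycle to process documentation by running baseline "
    "without skill, writing to address failures, iterating to close loopholes"
)

# Natural-language trigger terms proposed in the first suggestion.
SUGGESTED_TERMS = [
    "test skill",
    "validate skill",
    "skill QA",
    "check skill quality",
    "debug skill",
]


def missing_terms(description, terms):
    """Return the terms that do not occur in description (case-insensitive)."""
    lowered = description.lower()
    return [term for term in terms if term.lower() not in lowered]


print(missing_terms(DESCRIPTION, SUGGESTED_TERMS))
```

For the current description, all five suggested terms come back as missing, which matches the reviewer's note that natural phrasings are mostly absent.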

Dimension	Reasoning	Score
Specificity	The description names a domain (skill testing/verification) and some actions ('running baseline without skill', 'writing to address failures', 'iterating to close loopholes'), but the actions are somewhat abstract and process-oriented rather than concrete discrete capabilities. Terms like 'resist rationalization' and 'RED-GREEN-REFACTOR cycle' describe methodology more than specific actions.	2 / 3
Completeness	The description explicitly answers both 'what' (applies RED-GREEN-REFACTOR cycle to process documentation by running baseline, writing to address failures, iterating) and 'when' ('Use when creating or editing skills, before deployment, to verify they work under pressure'). The 'Use when' clause is present and specific.	3 / 3
Trigger Term Quality	Contains some relevant keywords like 'skills', 'verify', 'deployment', 'RED-GREEN-REFACTOR', but many terms are specialized jargon ('resist rationalization', 'close loopholes', 'process documentation'). A user wanting to test a skill might say 'test my skill' or 'validate skill' — these natural phrasings are mostly absent.	2 / 3
Distinctiveness / Conflict Risk	This skill occupies a very specific niche: testing and verifying skills using a RED-GREEN-REFACTOR methodology before deployment. It is unlikely to conflict with other skills due to its unique combination of skill verification, baseline testing, and iterative refinement.	3 / 3
	Total	10 / 12 Passed

Validation

100%


Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: NeoLabHQ/context-engineering-kit
Commit: 3711edf

Reviewed: 20 days ago
