
test-skill

Use when creating or editing skills, before deployment, to verify they work under pressure and resist rationalization - applies RED-GREEN-REFACTOR cycle to process documentation by running baseline without skill, writing to address failures, iterating to close loopholes

60

Quality

68%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Fix and improve this skill with Tessl

tessl review fix ./plugins/customaize-agent/skills/test-skill/SKILL.md

Quality

Content

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill is highly actionable with excellent workflow clarity: the RED-GREEN-REFACTOR cycle is well-defined with concrete examples, validation checkpoints, and feedback loops. However, it is significantly too verbose, repeating the same TDD mapping at least three times and including extensive explanations of concepts Claude already understands. The content would benefit greatly from being split across reference files rather than presented as a single monolithic document.
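The cycle the skill describes (run a baseline without the skill, write the skill to address the observed failures, then keep adding pressure and closing loopholes) can be sketched as a loop. This is a toy model under loud assumptions: `Skill`, `run_agent`, and the scenario labels are hypothetical, and the "agent" simply complies when the skill explicitly covers a scenario, whereas a real test would run an actual agent and judge its behavior.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    # Hypothetical model: the set of scenarios the skill text explicitly addresses.
    addressed: set[str] = field(default_factory=set)


def run_agent(scenario: str, skill: Optional[Skill]) -> bool:
    """Hypothetical agent: complies only when the skill explicitly covers the scenario."""
    return skill is not None and scenario in skill.addressed


def failing(scenarios: list[str], skill: Optional[Skill]) -> list[str]:
    """Return the scenarios in which the agent did not comply."""
    return [s for s in scenarios if not run_agent(s, skill)]


def close_loopholes(skill: Skill, scenarios: list[str], max_revisions: int = 10) -> list[str]:
    """Revise the skill and re-test until every scenario passes or revisions run out."""
    for _ in range(max_revisions):
        failures = failing(scenarios, skill)
        if not failures:
            return []
        skill.addressed.update(failures)  # "revise": address each observed failure
    return failing(scenarios, skill)


def red_green_refactor(baseline: list[str], pressure_rounds: list[list[str]]):
    # RED: run the baseline without the skill; these failures motivate the skill.
    baseline_failures = failing(baseline, None)
    # GREEN: write a skill that addresses exactly those failures, then re-test.
    skill = Skill(addressed=set(baseline_failures))
    remaining = close_loopholes(skill, baseline)
    # REFACTOR: add new pressure scenarios round by round; any new
    # rationalization they expose is a loophole to close before moving on.
    tested = list(baseline)
    for new_scenarios in pressure_rounds:
        tested += new_scenarios
        remaining = close_loopholes(skill, tested)
    return baseline_failures, skill, remaining


baseline_failures, skill, remaining = red_green_refactor(
    ["baseline-1", "baseline-2"], [["pressure-1"], ["pressure-2", "pressure-3"]]
)
print(baseline_failures, sorted(skill.addressed), remaining)
```

The inner `close_loopholes` loop mirrors the "if agent still fails, revise and re-test" checkpoint, and the outer loop over `pressure_rounds` mirrors continuing the REFACTOR cycle whenever a new rationalization appears.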

Suggestions

Cut the document by ~40%: remove the duplicate TDD mapping tables (keep only the Quick Reference one), eliminate 'The Bottom Line' and 'Real-World Impact' sections which restate already-covered points, and trim explanatory text like 'This is identical to TDD's write failing test first'.

Move the detailed pressure types table, common mistakes section, and the full TDD Skill Bulletproofing example into separate referenced files (e.g., PRESSURE_TYPES.md, COMMON_MISTAKES.md, EXAMPLES.md) to improve progressive disclosure.

Provide the referenced bundle files (examples/CLAUDE_MD_TESTING.md, persuasion-principles.md) or remove the references if they don't exist.
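For the last suggestion, one quick way to spot dangling references is to scan SKILL.md for `.md` paths and check each one against the skill directory. This sketch assumes references are written as plain relative paths resolved from the folder containing SKILL.md; the regex is a rough heuristic that will also pick up paths inside URLs or code samples, and `missing_references` is a hypothetical helper, not part of the Tessl CLI.

```python
import re
from pathlib import Path

# Rough heuristic: a run of word characters, dots, slashes, or hyphens ending in ".md".
MD_REF = re.compile(r"[\w./-]+\.md\b")


def missing_references(skill_md: Path) -> list[str]:
    """List .md paths mentioned in skill_md that do not exist relative to its folder."""
    text = skill_md.read_text(encoding="utf-8")
    # Ignore mentions of the skill file itself (e.g. "SKILL.md").
    refs = set(MD_REF.findall(text)) - {skill_md.name}
    return sorted(r for r in refs if not (skill_md.parent / r).exists())
```

Called as `missing_references(Path("SKILL.md"))` from inside a skill folder, it returns the referenced files that are not present in the bundle.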

Dimension / Reasoning / Score

Conciseness

Extremely verbose at ~300+ lines. Repeats the TDD mapping table three times (overview table, detailed sections, quick reference). Explains TDD concepts Claude already knows. The 'Bottom Line' section restates what was already said multiple times. Pressure types table, common mistakes section, and many examples are redundant with each other.

1 / 3

Actionability

Highly actionable with concrete scenario templates, exact markdown formatting for pressure tests, specific before/after examples for skill refactoring, explicit A/B/C choice formats, and copy-paste ready testing setups. The meta-testing prompts and rationalization table formats are directly executable.

3 / 3

Workflow Clarity

The RED-GREEN-REFACTOR workflow is clearly sequenced with explicit validation checkpoints at each phase. The testing checklist provides a comprehensive verification loop. Feedback loops are explicit: 'If agent still fails → revise and re-test' and 'If agent finds NEW rationalization → Continue REFACTOR cycle.' The meta-testing section provides clear error recovery paths.

3 / 3

Progressive Disclosure

References two external files (examples/CLAUDE_MD_TESTING.md and persuasion-principles.md) which is good, but neither is provided in the bundle. The skill itself is monolithic — the pressure types table, common mistakes, rationalization examples, and the full worked example could all be split into separate reference files. Too much detail is inline for a skill of this complexity.

2 / 3

Total: 9 / 12

Passed

Description

75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description has strong completeness with an explicit 'Use when' clause and occupies a distinctive niche, making it unlikely to conflict with other skills. However, the specificity of concrete actions could be improved, and the trigger terms lean toward specialized jargon rather than natural user language, which may reduce discoverability when users ask for skill testing or validation in plain terms.

Suggestions

Add more natural trigger terms users might say, such as 'test skill', 'validate skill', 'skill QA', 'check skill quality', or 'debug skill'.

Make the concrete actions more explicit, for example: 'runs the skill against test scenarios, compares outputs with and without the skill, identifies failure cases, and iterates on the skill definition'.
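The trigger-term gap can be checked mechanically against the skill description quoted at the top of this page. Case-insensitive substring matching is only a crude proxy, since an agent reads a description rather than matching strings; `missing_terms` is a hypothetical helper.

```python
# The skill description as quoted on this page.
DESCRIPTION = (
    "Use when creating or editing skills, before deployment, to verify they work "
    "under pressure and resist rationalization - applies RED-GREEN-REFACTOR cycle "
    "to process documentation by running baseline without skill, writing to "
    "address failures, iterating to close loopholes"
)

# Natural-language trigger terms suggested by the review.
SUGGESTED_TERMS = ["test skill", "validate skill", "skill QA", "check skill quality", "debug skill"]


def missing_terms(description: str, terms: list[str]) -> list[str]:
    """Return the terms that do not appear (case-insensitively) in the description."""
    text = description.lower()
    return [t for t in terms if t.lower() not in text]


print(missing_terms(DESCRIPTION, SUGGESTED_TERMS))
```

For the current description all five suggested terms are reported missing, which agrees with the Trigger Term Quality reasoning.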

Dimension / Reasoning / Score

Specificity

The description names a domain (skill testing/verification) and some actions ('running baseline without skill', 'writing to address failures', 'iterating to close loopholes'), but the actions are somewhat abstract and process-oriented rather than concrete discrete capabilities. Terms like 'resist rationalization' and 'RED-GREEN-REFACTOR cycle' describe methodology more than specific actions.

2 / 3

Completeness

The description explicitly answers both 'what' (applies RED-GREEN-REFACTOR cycle to process documentation by running baseline, writing to address failures, iterating) and 'when' ('Use when creating or editing skills, before deployment, to verify they work under pressure'). The 'Use when' clause is present and specific.

3 / 3

Trigger Term Quality

Contains some relevant keywords like 'skills', 'verify', 'deployment', 'RED-GREEN-REFACTOR', but many terms are specialized jargon ('resist rationalization', 'close loopholes', 'process documentation'). A user wanting to test a skill might say 'test my skill' or 'validate skill'; these natural phrasings are mostly absent.

2 / 3

Distinctiveness Conflict Risk

This skill occupies a very specific niche: testing and verifying skills with a RED-GREEN-REFACTOR methodology before deployment. It is unlikely to conflict with other skills due to its unique combination of skill verification, baseline testing, and iterative refinement.

3 / 3

Total: 10 / 12

Passed
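A raw ratio does not reproduce the section percentages (9 / 12 is 75%, not 62%). The figures shown are, however, consistent with rescaling each 1-to-3 dimension score onto 0 to 1, averaging within a section, and truncating to a whole percent; the Quality figure of 68% then matches the mean of Content and Description. This is an inference from the numbers on this page, not Tessl's documented scoring method, and `rescaled_percent` is a hypothetical helper.

```python
import math


def rescaled_percent(scores: list[int], low: int = 1, high: int = 3) -> float:
    """Map each score from [low, high] onto [0, 1], average them, and return a percentage."""
    return 100 * sum((s - low) / (high - low) for s in scores) / len(scores)


# Dimension scores as listed on this page.
content = [1, 3, 3, 2]      # Conciseness, Actionability, Workflow Clarity, Progressive Disclosure
description = [2, 3, 2, 3]  # Specificity, Completeness, Trigger Term Quality, Distinctiveness Conflict Risk

content_pct = rescaled_percent(content)            # 62.5
description_pct = rescaled_percent(description)    # 75.0
quality_pct = (content_pct + description_pct) / 2  # 68.75

print(math.floor(content_pct), math.floor(description_pct), math.floor(quality_pct))  # 62 75 68
```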

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
NeoLabHQ/context-engineering-kit
Reviewed

