**Skill description:**

> Use when creating or editing skills, before deployment, to verify they work under pressure and resist rationalization - applies RED-GREEN-REFACTOR cycle to process documentation by running baseline without skill, writing to address failures, iterating to close loopholes
- Quality: 68% — Does it follow best practices?
- Impact: Pending — No eval scenarios have been run
- Validation: Passed — No known issues
Optimize this skill with Tessl: `npx tessl skill review --optimize ./plugins/customaize-agent/skills/test-skill/SKILL.md`

## Quality
### Discovery — 75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description has strong completeness, with an explicit 'Use when' clause and a distinctive niche that minimizes conflict risk. However, the actions could be described more concretely (what exactly does 'running baseline without skill' produce?), and the trigger terms lean toward internal jargon rather than natural user language, which could reduce discoverability.
#### Suggestions

- Replace jargon like 'resist rationalization' and 'close loopholes' with more natural user-facing terms like 'test skill reliability', 'validate skill behavior', or 'QA skills'.
- Add more concrete action descriptions — e.g., 'generates test scenarios, runs baseline comparisons, identifies failure cases, and iterates on skill content' instead of the abstract process description.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a domain (skill testing/verification) and mentions some actions like 'running baseline without skill', 'writing to address failures', 'iterating to close loopholes', and 'RED-GREEN-REFACTOR cycle'. However, the actions are somewhat abstract and process-oriented rather than concrete discrete operations. | 2 / 3 |
| Completeness | The description explicitly answers both 'what' (applies RED-GREEN-REFACTOR cycle to process documentation by running baseline, writing to address failures, iterating) and 'when' ('Use when creating or editing skills, before deployment, to verify they work under pressure'). The 'Use when...' clause is present and specific. | 3 / 3 |
| Trigger Term Quality | Contains some relevant terms like 'skills', 'deployment', 'RED-GREEN-REFACTOR', and 'rationalization', but many of these are technical jargon. Users might naturally say 'test my skill' or 'verify skill works', but the description's trigger terms like 'resist rationalization' and 'close loopholes' are less natural user phrases. | 2 / 3 |
| Distinctiveness / Conflict Risk | This skill occupies a very specific niche — testing and verifying skills using a RED-GREEN-REFACTOR methodology before deployment. It is unlikely to conflict with other skills due to its unique combination of skill verification, rationalization resistance, and the specific testing cycle described. | 3 / 3 |
| **Total** | | **10 / 12 — Passed** |
### Implementation — 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill excels at actionability and workflow clarity, providing concrete scenario templates, explicit before/after examples, and a well-structured RED-GREEN-REFACTOR cycle with validation checkpoints. However, it is severely verbose — the same concepts are restated multiple times across tables, examples, and summary sections, roughly doubling the necessary token count. The content would benefit significantly from aggressive deduplication and splitting detailed examples into referenced files.
#### Suggestions

- Eliminate redundant restatements of the RED-GREEN-REFACTOR cycle — the TDD mapping table, quick reference table, testing checklist, and 'Bottom Line' section all say the same thing. Keep one authoritative version.
- Move the detailed pressure scenario examples and pressure types table into a separate reference file (e.g., PRESSURE_SCENARIOS.md) and link to it, keeping only one concise example inline.
- Remove explanatory content Claude already knows — e.g., 'This is identical to TDD's write failing test first', 'Same cycle as code TDD, different test format', and the entire 'The Bottom Line' section, which restates the obvious.
- Consolidate the 'Common Mistakes' section into the relevant phase sections rather than repeating guidance in a separate block.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~300+ lines with significant repetition. The TDD mapping table, quick reference table, and multiple sections restate the same RED-GREEN-REFACTOR cycle. Pressure scenario examples are shown three times with minor variations. The 'Bottom Line' and 'Real-World Impact' sections add no new information. Much content explains concepts Claude already knows (what TDD is, what pressure testing means). | 1 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific scenario templates with exact wording, before/after skill revision examples, explicit A/B/C choice formats, meta-testing prompts, and detailed rationalization table entries. The checklist format makes steps copy-paste actionable. | 3 / 3 |
| Workflow Clarity | The RED-GREEN-REFACTOR workflow is clearly sequenced with explicit validation checkpoints at each phase. The checklist includes verification steps (re-test, meta-test), feedback loops for error recovery (if agent still fails → revise and re-test), and clear success/failure criteria for each phase. The 'When Skill is Bulletproof' section provides explicit completion criteria. | 3 / 3 |
| Progressive Disclosure | References to external files exist (examples/CLAUDE_MD_TESTING.md, persuasion-principles.md, superpowers:test-driven-development), but no bundle files are provided to verify them. The skill itself is monolithic — the extensive pressure types table, common mistakes section, and multiple example scenarios could be split into separate reference files. The inline content is too long for what should be an overview. | 2 / 3 |
| **Total** | | **9 / 12 — Passed** |
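The rubric arithmetic behind both totals above can be sketched as follows, assuming each dimension is scored out of 3 points (the function and variable names here are illustrative, not part of the Tessl tooling):

```python
def rubric_total(scores):
    """Sum per-dimension scores and pair with the maximum (3 points per dimension)."""
    return sum(scores.values()), 3 * len(scores)

# Per-dimension scores taken from the Discovery and Implementation tables above.
discovery = {"Specificity": 2, "Completeness": 3,
             "Trigger Term Quality": 2, "Distinctiveness / Conflict Risk": 3}
implementation = {"Conciseness": 1, "Actionability": 3,
                  "Workflow Clarity": 3, "Progressive Disclosure": 2}

print(rubric_total(discovery))       # (10, 12)
print(rubric_total(implementation))  # (9, 12)
```

Note that the section percentages (75%, 62%) are reported separately by the tool and are not a direct fraction of these totals.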
### Validation — 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

11 / 11 checks passed. Validation for skill structure: no warnings or errors.