
test-prompt

Use when creating or editing any prompt (commands, hooks, skills, subagent instructions) to verify it produces desired behavior - applies RED-GREEN-REFACTOR cycle to prompt engineering using subagents for isolated testing

54

Quality

61%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Fix and improve this skill with Tessl

tessl review fix ./plugins/customaize-agent/skills/test-prompt/SKILL.md

Quality

Content

47%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill has excellent workflow clarity, with well-defined phases, checklists, and feedback loops for the RED-GREEN-REFACTOR cycle applied to prompt testing. However, it is far too long and verbose: it repeatedly explains TDD concepts Claude already knows and restates the same core idea in multiple formats (tables, prose, examples, checklists). Actionability is moderate: the illustrative examples are good, but the skill lacks truly executable Task tool syntax. The monolithic structure would benefit from splitting detailed examples and patterns into separate files.

Suggestions

Cut content by at least 50%: remove the 'Why Use Subagents' rationale section, the TDD mapping table (redundant with the process sections), explanations of what TDD is, and the 'The Bottom Line' section. Trust that Claude knows TDD if the prerequisite skill is loaded.

Provide exact Task tool invocation syntax rather than pseudocode markdown blocks — show the actual tool call format Claude should use to launch subagents.

Split the detailed prompt-type examples (instruction, discipline, guidance, reference) and the testing patterns (parallel, A/B, regression, stress) into a separate EXAMPLES.md or PATTERNS.md file, keeping only a summary table in the main SKILL.md.

Remove the full worked git:commit example (which alone is ~80 lines) or move it to a separate EXAMPLE.md, replacing it with a brief reference link.

Dimension | Reasoning | Score

Conciseness

Extremely verbose at 400+ lines (the validation check counts 715). Extensively explains concepts Claude already knows (what subagents are, why isolation matters, TDD basics). Massive redundancy: the TDD mapping table, the overview, the process sections, the example, and the quick reference all repeat the same RED-GREEN-REFACTOR concept. The 'Why Use Subagents' section explains obvious benefits. Multiple sections could be cut by 60%+ without losing actionable content.

1 / 3

Actionability

Provides concrete examples of test scenarios and prompt structures, and the git commit example is detailed and illustrative. However, the Task tool invocations are pseudocode rather than actual executable syntax, and many sections describe what to do conceptually rather than providing copy-paste ready commands. The scenario examples are helpful but presented as illustrative narratives rather than executable templates.

2 / 3

Workflow Clarity

The RED-GREEN-REFACTOR workflow is clearly sequenced with explicit validation checkpoints at each phase. Each phase has checklists, success criteria, and feedback loops (e.g., 'If agent still fails: Prompt unclear or incomplete. Revise and re-test.' and 'If new failures appear: Refactoring broke something. Revert and try different optimization.'). The comprehensive testing checklist at the end provides a clear verification gate before deployment.

3 / 3

Progressive Disclosure

References to related skills (`tdd:test-driven-development`, `prompt-engineering`, `test-skill`) are mentioned but no bundle files exist to support them. The content is monolithic — all inline in one massive file with no separation of reference material, examples, or detailed patterns into supporting files. The prompt type examples and testing patterns could easily be split into separate reference files.

2 / 3

Total

8 / 12

Passed

Description

75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description effectively communicates a clear niche (TDD for prompts) with an explicit 'Use when' clause that covers relevant trigger scenarios. Its main weakness is moderate specificity — it describes the approach at a high level but doesn't enumerate the concrete actions performed (e.g., spawning test subagents, comparing outputs, iterating on prompt text). Trigger terms lean technical and could benefit from more natural user-facing language.

Suggestions

Add more concrete actions such as 'spawns test subagents to evaluate prompt outputs, compares results against expected behavior, iterates on prompt wording'

Include more natural trigger terms users might say, such as 'test my prompt', 'prompt isn't working', 'debug prompt behavior', or 'improve prompt quality'
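
The gap named in the second suggestion can be checked mechanically. The following is a minimal sketch, not part of the review tooling: it takes the skill's current description and the four phrases suggested above and reports which ones are absent. The function name `missing_triggers` is hypothetical, and plain case-insensitive substring matching is an assumption about what counts as "present".

```python
# Current skill description, copied from the top of this page.
DESCRIPTION = (
    "Use when creating or editing any prompt (commands, hooks, skills, "
    "subagent instructions) to verify it produces desired behavior - applies "
    "RED-GREEN-REFACTOR cycle to prompt engineering using subagents for "
    "isolated testing"
)

# Natural-language trigger phrases proposed in the suggestion above.
SUGGESTED_TRIGGERS = [
    "test my prompt",
    "prompt isn't working",
    "debug prompt behavior",
    "improve prompt quality",
]


def missing_triggers(description, triggers):
    """Return the triggers that do not occur in the description.

    Hypothetical helper: uses case-insensitive substring matching only.
    """
    lowered = description.lower()
    return [t for t in triggers if t.lower() not in lowered]


if __name__ == "__main__":
    for phrase in missing_triggers(DESCRIPTION, SUGGESTED_TRIGGERS):
        print(f"missing trigger: {phrase}")
```

Run against the current description, all four suggested phrases are reported as missing, which matches the 2 / 3 given for Trigger Term Quality.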

Dimension | Reasoning | Score

Specificity

It names the domain (prompt engineering/testing) and describes the approach (RED-GREEN-REFACTOR cycle, subagents for isolated testing), but doesn't list multiple concrete actions beyond 'creating or editing' and 'verify'. The specific steps of the cycle or what testing entails are not enumerated.

2 / 3

Completeness

The description answers both 'what' (applies RED-GREEN-REFACTOR cycle to prompt engineering using subagents for isolated testing) and 'when' (when creating or editing any prompt including commands, hooks, skills, subagent instructions). The 'Use when' clause is explicit and clearly stated.

3 / 3

Trigger Term Quality

Includes relevant terms like 'prompt', 'commands', 'hooks', 'skills', 'subagent instructions', and 'RED-GREEN-REFACTOR', which are useful triggers. However, these are somewhat technical/internal terms; common user phrases like 'test my prompt', 'prompt quality', 'debug prompt', or 'prompt iteration' are missing.

2 / 3

Distinctiveness Conflict Risk

This skill occupies a very specific niche — TDD-style prompt testing using subagents — which is unlikely to conflict with other skills. The combination of RED-GREEN-REFACTOR methodology applied specifically to prompt engineering is highly distinctive.

3 / 3

Total

10 / 12

Passed
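
The totals in the two dimension tables are plain sums of the per-dimension scores, and the headline Quality figure of 61% is consistent with the unweighted mean of the Content (47%) and Description (75%) percentages. The sketch below reproduces that arithmetic. Treating the headline as a simple mean is an assumption; the page does not say how it is computed, nor how the 47% and 75% are derived from the dimension scores, so the sketch does not attempt that step.

```python
# Per-dimension scores copied from the two tables in this review (each out of 3).
CONTENT_SCORES = {
    "Conciseness": 1,
    "Actionability": 2,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 2,
}
DESCRIPTION_SCORES = {
    "Specificity": 2,
    "Completeness": 3,
    "Trigger Term Quality": 2,
    "Distinctiveness Conflict Risk": 3,
}
MAX_PER_DIMENSION = 3


def total(scores):
    """Return (points earned, points available) for one table."""
    return sum(scores.values()), MAX_PER_DIMENSION * len(scores)


content_total = total(CONTENT_SCORES)          # (8, 12)
description_total = total(DESCRIPTION_SCORES)  # (10, 12)

# Assumption: headline Quality = unweighted mean of the two section percentages.
headline_quality = (47 + 75) / 2               # 61.0

print(content_total, description_total, headline_quality)
```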

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria | Description | Result

skill_md_line_count

SKILL.md is long (715 lines); consider splitting into references/ and linking

Warning

Total

10 / 11

Passed
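
A check like `skill_md_line_count` can be approximated locally before resubmitting the skill. The following is a rough sketch under stated assumptions: the function name `line_count_warning` is hypothetical, and the 500-line limit is a guess, since the page reports only that 715 lines triggered the warning and does not state the actual threshold.

```python
from typing import Optional

# Assumed threshold; the validator's real limit is not stated on this page.
ASSUMED_LINE_LIMIT = 500


def line_count_warning(text: str, limit: int = ASSUMED_LINE_LIMIT) -> Optional[str]:
    """Return a warning message if the text exceeds the line limit, else None."""
    count = len(text.splitlines())
    if count > limit:
        return (
            f"SKILL.md is long ({count} lines); "
            "consider splitting into references/ and linking"
        )
    return None


if __name__ == "__main__":
    # Stand-in for the real file contents: 715 one-character lines.
    sample = "x\n" * 715
    print(line_count_warning(sample))
```

To use it on the real file, read the SKILL.md path shown in the `tessl review fix` command above and pass its contents as `text`.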

Repository
NeoLabHQ/context-engineering-kit
Reviewed
