
implement-task

Implement a task with automated LLM-as-Judge verification for critical steps

34

Quality: 31% (Does it follow best practices?)

Impact: No eval scenarios have been run

Security (by Snyk): Passed, no known issues

Fix and improve this skill with Tessl

tessl review fix ./plugins/sdd/skills/implement-task/SKILL.md

Quality

Content: 55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is extremely thorough and actionable with excellent workflow clarity, providing concrete agent prompts, validation gates, and error recovery loops. However, it is massively over-engineered for a single SKILL.md file — at 800+ lines it wastes enormous context window space with repetitive patterns, redundant diagrams, excessive examples, and inlined content that should be in separate reference files. The token cost of loading this skill is very high relative to the unique information it conveys.

Suggestions

Extract the 7 usage examples, Appendix A (verification specs reference), the voting algorithm details, and the refine/continue mode specifications into separate referenced files (e.g., EXAMPLES.md, VERIFICATION-SPECS.md, VOTING.md, MODES.md) to reduce the main file to ~200 lines.

Consolidate Patterns A/B/C into a single parameterized pattern with a table showing differences (verification level, number of judges, threshold) rather than repeating nearly identical prompt templates three times.

Remove explanations of basic concepts (e.g., 'For 2 evaluations: Median = (Score1 + Score2) / 2', 'Sort scores, take middle value') — Claude knows how to calculate medians.

Eliminate the redundant checklist at the end which restates rules already covered in the CRITICAL sections and anti-rationalization rules above.
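A minimal Python sketch of what the consolidated pattern could look like, assuming Patterns A/B/C differ only in verification level, number of judges, and pass threshold. The pattern values, the 1 to 5 scoring scale, and every name here (`VerificationPattern`, `PATTERNS`, `build_judge_prompts`, `aggregate_verdict`) are hypothetical placeholders, not taken from the skill. It also illustrates the median point: `statistics.median` already averages the two middle scores when the count is even, so no hand-written formula is needed.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationPattern:
    # The three dimensions the review suggests tabulating; values are hypothetical.
    name: str
    verification_level: str
    num_judges: int
    threshold: float


# Placeholder table replacing three near-identical Pattern A/B/C sections.
PATTERNS = {
    "A": VerificationPattern("A", "standard", 1, 3.5),
    "B": VerificationPattern("B", "elevated", 2, 4.0),
    "C": VerificationPattern("C", "critical", 3, 4.5),
}

# One shared prompt template instead of one copy per pattern (1-5 scale assumed).
JUDGE_PROMPT = (
    "You are judge {index} of {total} (pattern {name}, level: {level}).\n"
    "Score this step from 1 to 5; PASS requires a median of at least {threshold}.\n"
    "Step: {step}"
)


def build_judge_prompts(pattern_key: str, step: str) -> list[str]:
    """Render one prompt per judge from the single parameterized template."""
    p = PATTERNS[pattern_key]
    return [
        JUDGE_PROMPT.format(index=i + 1, total=p.num_judges, name=p.name,
                            level=p.verification_level, threshold=p.threshold,
                            step=step)
        for i in range(p.num_judges)
    ]


def aggregate_verdict(pattern_key: str, scores: list[float]) -> tuple[float, str]:
    """Take the median judge score and gate it on the pattern's threshold."""
    p = PATTERNS[pattern_key]
    if len(scores) != p.num_judges:
        raise ValueError(f"pattern {p.name} expects {p.num_judges} scores")
    median = statistics.median(scores)  # even count: mean of the two middle values
    return median, ("PASS" if median >= p.threshold else "FAIL")


print(len(build_judge_prompts("C", "Add login endpoint")))  # 3
print(aggregate_verdict("B", [3.5, 4.5]))       # (4.0, 'PASS')
print(aggregate_verdict("C", [4.0, 5.0, 3.0]))  # (4.0, 'FAIL')
```

Adding a Pattern D would then mean adding one row to `PATTERNS` rather than another copy of the prompt text.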

Dimension | Reasoning | Score

Conciseness

Extremely verbose at ~800+ lines. Massive amounts of repetition (e.g., the same prompt templates repeated for Pattern A/B/C, multiple redundant flow diagrams, 7 usage examples that largely repeat the same concepts). Explains concepts Claude already knows (what a median is, how to sort scores). The checklist at the end repeats rules already stated in the body. Much of this could be condensed to 1/4 the length.

1 / 3

Actionability

Highly actionable with concrete prompt templates, specific bash commands, exact file paths, detailed argument parsing logic, and copy-paste ready agent prompts. The workflow patterns (A/B/C) provide executable guidance for each scenario with specific tool configurations.

3 / 3

Workflow Clarity

Excellent workflow clarity with explicit phase sequencing (0-5), clear validation checkpoints at every step (judge PASS/FAIL gates), explicit feedback loops (fix → re-verify → iterate up to MAX_ITERATIONS), and clear error recovery paths. The flow diagrams reinforce the sequence. Destructive operations (moving files) have proper guards.

3 / 3

Progressive Disclosure

Monolithic wall of text with no references to external files for detailed content. Everything is inlined — the refine mode logic, continue mode logic, voting algorithm, all usage examples, the entire appendix on verification specs, and the full checklist. Much of this (e.g., Appendix A, the 7 usage examples, the voting algorithm) should be in separate referenced files. No bundle files are provided to support splitting, but the content desperately needs it.

1 / 3

Total: 8 / 12 (Passed)
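The feedback loop credited under Workflow Clarity (implement, pass through a judge's PASS/FAIL gate, fix, re-verify, and stop after MAX_ITERATIONS) can be sketched in Python as below. This is an illustration under assumptions: the skill itself works through agent prompts, and the callables `implement` and `judge`, their signatures, and the default of 3 iterations are hypothetical.

```python
from typing import Callable

# Hypothetical interfaces: the review describes the loop, not these signatures.
Implement = Callable[[str, str], str]       # (task, feedback) -> artifact
Judge = Callable[[str], tuple[bool, str]]   # artifact -> (passed, feedback)


def run_critical_step(task: str, implement: Implement, judge: Judge,
                      max_iterations: int = 3) -> tuple[str, int]:
    """Implement, verify with a judge, and retry with its feedback until PASS.

    Returns the accepted artifact and the attempt number that passed.
    Raises RuntimeError once max_iterations attempts have all failed, which is
    where an escalation or error-recovery path would take over.
    """
    feedback = ""
    for attempt in range(1, max_iterations + 1):
        artifact = implement(task, feedback)
        passed, feedback = judge(artifact)
        if passed:
            return artifact, attempt
    raise RuntimeError(
        f"step failed verification after {max_iterations} attempts: {feedback}")


# Toy stand-ins: the first draft lacks error handling until the judge asks for it.
def fake_implement(task: str, feedback: str) -> str:
    return task + (" + error handling" if feedback else "")


def fake_judge(artifact: str) -> tuple[bool, str]:
    if "error handling" in artifact:
        return True, ""
    return False, "add error handling"


print(run_critical_step("Add login endpoint", fake_implement, fake_judge))
# ('Add login endpoint + error handling', 2)
```

Passing the judge's feedback into the next `implement` call is what turns a plain retry into the fix and re-verify loop the review describes.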

Description: 7%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is too vague and abstract to be effective for skill selection. It fails to list concrete actions, lacks natural trigger terms users would use, and provides no explicit guidance on when Claude should select this skill. The only slightly redeeming quality is the mention of 'LLM-as-Judge' which provides some domain specificity.

Suggestions

List specific concrete actions the skill performs, e.g., 'Sets up automated evaluation pipelines, creates scoring rubrics, runs LLM-based quality checks on task outputs, and iterates on failing steps.'

Add a 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks to verify task outputs automatically, evaluate LLM responses, set up quality checks, or needs judge-based validation of critical workflow steps.'

Replace the vague 'implement a task' phrasing with specific task types or domains this skill applies to, making it distinguishable from general task-implementation skills.

Dimension | Reasoning | Score

Specificity

The description is vague — 'implement a task' is extremely generic, and 'automated LLM-as-Judge verification' is a single abstract concept rather than a list of concrete actions. No specific operations like 'create test cases', 'run evaluations', or 'generate scoring rubrics' are mentioned.

1 / 3

Completeness

The description weakly addresses 'what' (implement a task with verification) but provides no 'when' clause or explicit trigger guidance. There is no 'Use when...' or equivalent, and the 'what' itself is too vague to be useful.

1 / 3

Trigger Term Quality

The terms used are technical jargon ('LLM-as-Judge', 'verification for critical steps') that users are unlikely to naturally say. Common trigger terms like 'evaluate', 'check quality', 'auto-verify', 'test output', or 'judge responses' are absent.

1 / 3

Distinctiveness Conflict Risk

'LLM-as-Judge' is a somewhat distinctive concept that narrows the domain, but 'implement a task' is so generic it could overlap with virtually any skill. The combination provides some distinctiveness but not enough to clearly carve out a niche.

2 / 3

Total: 5 / 12 (Passed)

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed

Validation for skill structure

Criteria | Description | Result

skill_md_line_count

SKILL.md is long (1786 lines); consider splitting into references/ and linking

Warning

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 9 / 11 (Passed)
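The two warnings above can be approximated with a rough Python sketch. The 500-line limit and the set of recognized frontmatter keys are assumptions drawn from general Agent Skills guidance, not Tessl's actual validator rules, and the frontmatter reader is deliberately naive: it only handles flat, unindented `key: value` lines.

```python
# Assumed limits for illustration; Tessl's validator may use different rules.
MAX_LINES = 500
KNOWN_KEYS = {"name", "description", "license", "compatibility",
              "metadata", "allowed-tools"}


def frontmatter_keys(text: str) -> set[str]:
    """Collect top-level keys from a '---'-delimited frontmatter block.

    Naive on purpose: it reads only unindented 'key:' lines, so it is not a
    general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


def validate_skill_md(text: str) -> list[str]:
    """Return warnings for the two checks that did not pass in this review."""
    warnings = []
    line_count = len(text.splitlines())
    if line_count > MAX_LINES:
        warnings.append(f"skill_md_line_count: SKILL.md is long ({line_count} lines)")
    unknown = sorted(frontmatter_keys(text) - KNOWN_KEYS)
    if unknown:
        warnings.append("frontmatter_unknown_keys: " + ", ".join(unknown))
    return warnings


# 'custom-key' is a made-up example of an unrecognized frontmatter key.
sample = ("---\nname: implement-task\ndescription: Implement a task\n"
          "custom-key: x\n---\n" + "body\n" * 600)
for warning in validate_skill_md(sample):
    print(warning)
# skill_md_line_count: SKILL.md is long (605 lines)
# frontmatter_unknown_keys: custom-key
```

Moving unrecognized keys under `metadata`, as the warning suggests, would clear the second check without losing the information.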

Repository: NeoLabHQ/context-engineering-kit (Reviewed)
