Execute complex tasks through sequential sub-agent orchestration with intelligent model selection, meta-judge → LLM-as-a-judge verification
Impact: Pending (no eval scenarios have been run)
Status: Passed (no known issues)
To optimize this skill with Tessl, run:

```shell
npx tessl skill review --optimize ./plugins/sadd/skills/do-in-steps/SKILL.md
```

## Quality
## Discovery

Score: 7%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is heavily laden with technical jargon and buzzwords while failing to communicate concrete actions or when the skill should be triggered. It reads more like an internal architecture note than a user-facing skill description. It lacks any natural language trigger terms and provides no 'Use when...' guidance.
### Suggestions

- Replace abstract language like 'complex tasks' and 'sub-agent orchestration' with concrete examples of what tasks this skill actually performs (e.g., 'Breaks down multi-step research, analysis, or coding tasks into sequential steps with quality verification').
- Add an explicit 'Use when...' clause with natural trigger terms users would say, such as 'Use when the user requests a complex multi-step task that requires breaking down into subtasks, quality checking, or iterative refinement.'
- Remove or simplify internal implementation jargon like 'meta-judge → LLM-as-a-judge verification' and instead describe the user-facing benefit, e.g., 'automatically verifies output quality at each step.'
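Applying these suggestions, the skill's frontmatter description might look like the following sketch. This is illustrative wording, not the skill's actual metadata; only the skill name is taken from the reviewed path.

```yaml
---
name: do-in-steps
description: >
  Breaks a complex multi-step task (research, analysis, or coding) into
  sequential subtasks, runs each one, and automatically verifies output
  quality at every step before moving on. Use when the user requests a
  multi-step task that needs to be broken down into subtasks,
  quality-checked, or iteratively refined.
---
```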
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses abstract, jargon-heavy language like 'sequential sub-agent orchestration,' 'intelligent model selection,' and 'meta-judge → LLM-as-a-judge verification' without listing any concrete user-facing actions. It does not describe what specific tasks it performs. | 1 / 3 |
| Completeness | The 'what' is vague ('execute complex tasks') and there is no 'when' clause or explicit trigger guidance at all. Both components are weak or missing. | 1 / 3 |
| Trigger Term Quality | The terms used ('sub-agent orchestration,' 'meta-judge,' 'LLM-as-a-judge') are highly technical jargon that no user would naturally say when requesting help. There are no natural trigger keywords a user would use. | 1 / 3 |
| Distinctiveness / Conflict Risk | The mention of 'sub-agent orchestration' and 'meta-judge → LLM-as-a-judge verification' gives it some distinctiveness from other skills, but 'execute complex tasks' is extremely generic and could overlap with virtually any skill. | 2 / 3 |
| Total | | 5 / 12 (Passed) |
## Implementation

Score: 39%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill defines a sophisticated multi-agent orchestration workflow with excellent workflow clarity, including validation checkpoints, retry loops, and escalation procedures. However, it is severely undermined by extreme verbosity—repeating key instructions 4-6 times, including three lengthy examples that demonstrate the same pattern, and explaining concepts Claude already knows. The monolithic structure (~800+ lines with no external file references) makes it a poor fit for a context-window-conscious skill file.
### Suggestions

- Reduce content by 60-70%: eliminate repeated instructions (e.g., 'meta-judge FIRST' and 'reuse spec across retries' each appear 4-6 times), consolidate the three examples into one concise example, and remove explanations of concepts Claude already understands (what orchestration is, what chain-of-thought means).
- Split into multiple files: move prompt templates to TEMPLATES.md, examples to EXAMPLES.md, the model selection matrix to MODEL_SELECTION.md, and the context format reference to FORMATS.md, keeping SKILL.md as a concise overview with clear references.
- Define the Task tool invocation concretely: show the actual tool call syntax/API rather than pseudocode descriptions like 'Use Task tool: - description: ...' so the skill is directly executable.
- Remove the 'Best Practices' section entirely, as it rehashes content already covered in the Process section, or consolidate it into a brief checklist.
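As a sketch of the suggested split, SKILL.md could shrink to a concise index that links out to the proposed bundle files. The file names (TEMPLATES.md, EXAMPLES.md, MODEL_SELECTION.md, FORMATS.md) are the ones suggested above; the overview wording and step summary are illustrative, reconstructed from the workflow described in this review rather than copied from the skill itself.

```markdown
# Do in Steps

Breaks a complex task into sequential steps, dispatches each step to a
sub-agent, and verifies every output before moving on.

## Process
1. Plan the step sequence and select a model per step
   (see [MODEL_SELECTION.md](MODEL_SELECTION.md)).
2. Dispatch each step using the prompt templates in
   [TEMPLATES.md](TEMPLATES.md); context formats are in
   [FORMATS.md](FORMATS.md).
3. Verify the output: dispatch the meta-judge first, then the judges,
   reusing the same meta-judge spec across retries.
4. On failure, retry with feedback (max 3 retries), then escalate.

A full worked example lives in [EXAMPLES.md](EXAMPLES.md).
```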
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | This skill is extremely verbose at ~800+ lines. It over-explains concepts Claude already understands (orchestration patterns, chain-of-thought reasoning, what context passing means), includes massive template blocks that could be summarized, repeats the same instructions multiple times (e.g., 'meta-judge FIRST in dispatch order' appears 4+ times, 'reuse same meta-judge spec across retries' appears 6+ times), and provides three lengthy examples that largely repeat the same pattern with minor variations. The 'Best Practices' section rehashes content already covered in the Process section. | 1 / 3 |
| Actionability | The skill provides detailed prompt templates and structured formats, which is somewhat actionable. However, it is an orchestration skill that relies entirely on dispatching sub-agents via a 'Task tool' that is never concretely defined (no actual tool call syntax, API, or executable example). The 'Dispatch Example' uses pseudocode-like notation rather than actual tool invocation syntax. The process is described in detail but not in a directly executable way. | 2 / 3 |
| Workflow Clarity | The workflow is clearly sequenced across 4 phases with explicit validation checkpoints (judge verification after each step), a well-defined retry/feedback loop (max 3 retries with escalation), clear decision criteria for pass/fail (score thresholds), and an ASCII flow diagram. Error handling covers multiple failure modes with specific escalation procedures. The iteration loop with meta-judge spec reuse is well-defined. | 3 / 3 |
| Progressive Disclosure | The entire skill is a monolithic wall of text with no references to external files despite being ~800+ lines. Content that could easily be split into separate files (prompt templates, examples, model selection matrix, context format reference, error handling procedures) is all inline. There are no bundle files to support progressive disclosure, and the skill makes no attempt to organize content across files. | 1 / 3 |
| Total | | 7 / 12 (Passed) |
## Validation

Score: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
### Validation for skill structure

9 of 11 checks passed; the remaining two are warnings.
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (1417 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 Passed |
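A sketch of the frontmatter_unknown_keys fix, assuming the validator accepts arbitrary keys only when nested under a metadata map. The version and author keys below are hypothetical examples for illustration, not keys actually reported for this skill:

```yaml
# Before: unknown top-level keys trigger the warning
---
name: do-in-steps
description: ...
version: 1.2.0        # hypothetical unknown key
author: sadd-team     # hypothetical unknown key
---

# After: unrecognized keys moved under metadata
---
name: do-in-steps
description: ...
metadata:
  version: 1.2.0
  author: sadd-team
---
```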