Execute complex tasks through sequential sub-agent orchestration with intelligent model selection, meta-judge → LLM-as-a-judge verification
Impact: Pending (no eval scenarios have been run)
Status: Passed (no known issues)
To optimize this skill with Tessl, run:

```shell
npx tessl skill review --optimize ./plugins/sadd/skills/do-in-steps/SKILL.md
```

## Quality
## Discovery

Score: 7%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is heavily laden with technical jargon and buzzwords while failing to communicate concrete actions or when the skill should be triggered. It reads more like an internal architecture note than a user-facing skill description. It lacks any natural language trigger terms and provides no 'Use when...' guidance.
### Suggestions

- Replace abstract language like 'complex tasks' and 'sub-agent orchestration' with concrete examples of what tasks this skill actually performs (e.g., 'Breaks down multi-step research, analysis, or coding tasks into sequential steps with quality verification').
- Add an explicit 'Use when...' clause with natural trigger terms users would say, such as 'Use when the user requests a complex multi-step task that requires breaking down into subtasks, quality checking, or iterative refinement.'
- Remove or simplify internal implementation jargon like 'meta-judge → LLM-as-a-judge verification' and instead describe the user-facing benefit, e.g., 'automatically verifies output quality at each step.'
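Applying these suggestions, the skill's frontmatter description might look like the following sketch. This is illustrative wording, not the skill's actual metadata; only the skill name is taken from the reviewed path.

```yaml
---
name: do-in-steps
description: >
  Breaks a complex multi-step task (research, analysis, or coding) into
  sequential subtasks, runs each one, and automatically verifies output
  quality at every step before moving on. Use when the user requests a
  multi-step task that needs to be broken down into subtasks,
  quality-checked, or iteratively refined.
---
```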
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses abstract, jargon-heavy language like 'sequential sub-agent orchestration,' 'intelligent model selection,' and 'meta-judge → LLM-as-a-judge verification' without listing any concrete user-facing actions. It does not describe what specific tasks it performs. | 1 / 3 |
| Completeness | The 'what' is vague ('execute complex tasks') and there is no 'when' clause or explicit trigger guidance at all. Both components are weak or missing. | 1 / 3 |
| Trigger Term Quality | The terms used ('sub-agent orchestration,' 'meta-judge,' 'LLM-as-a-judge') are highly technical jargon that no user would naturally say when requesting help. There are no natural trigger keywords a user would use. | 1 / 3 |
| Distinctiveness / Conflict Risk | The mention of 'sub-agent orchestration' and 'meta-judge → LLM-as-a-judge verification' gives it some distinctiveness from other skills, but 'execute complex tasks' is extremely generic and could overlap with virtually any skill. | 2 / 3 |
| Total | | 5 / 12 (Passed) |
## Implementation

Score: 39%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill defines a sophisticated multi-agent orchestration workflow with excellent workflow clarity, including validation checkpoints, retry loops, and escalation procedures. However, it is severely undermined by extreme verbosity—repeating key instructions 4-6 times, including three lengthy examples that demonstrate the same pattern, and explaining concepts Claude already knows. The monolithic structure (~800+ lines with no external file references) makes it a poor fit for a context-window-conscious skill file.
### Suggestions

- Reduce content by 60-70%: eliminate repeated instructions (e.g., 'meta-judge FIRST' and 'reuse spec across retries' each appear 4-6 times), consolidate the three examples into one concise example, and remove explanations of concepts Claude already understands (what orchestration is, what chain-of-thought means).
- Split into multiple files: move prompt templates to TEMPLATES.md, examples to EXAMPLES.md, the model selection matrix to MODEL_SELECTION.md, and the context format reference to FORMATS.md, keeping SKILL.md as a concise overview with clear references.
- Define the Task tool invocation concretely: show the actual tool call syntax/API rather than pseudocode descriptions like 'Use Task tool: - description: ...' so the skill is directly executable.
- Remove the 'Best Practices' section entirely, as it rehashes content already covered in the Process section, or consolidate it into a brief checklist.
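As a sketch of the suggested split, SKILL.md could shrink to a concise index that links out to the proposed bundle files. The file names (TEMPLATES.md, EXAMPLES.md, MODEL_SELECTION.md, FORMATS.md) are the ones suggested above; the overview wording and step summary are illustrative, reconstructed from the workflow described in this review rather than copied from the skill itself.

```markdown
# Do in Steps

Breaks a complex task into sequential steps, dispatches each step to a
sub-agent, and verifies every output before moving on.

## Process
1. Plan the step sequence and select a model per step
   (see [MODEL_SELECTION.md](MODEL_SELECTION.md)).
2. Dispatch each step using the prompt templates in
   [TEMPLATES.md](TEMPLATES.md); context formats are in
   [FORMATS.md](FORMATS.md).
3. Verify the output: dispatch the meta-judge first, then the judges,
   reusing the same meta-judge spec across retries.
4. On failure, retry with feedback (max 3 retries), then escalate.

A full worked example lives in [EXAMPLES.md](EXAMPLES.md).
```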
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | This skill is extremely verbose at ~800+ lines. It over-explains concepts Claude already understands (orchestration patterns, chain-of-thought reasoning, what context passing means), includes massive template blocks that could be summarized, repeats the same instructions multiple times (e.g., 'meta-judge FIRST in dispatch order' appears 4+ times, 'reuse same meta-judge spec across retries' appears 6+ times), and provides three lengthy examples that largely repeat the same pattern with minor variations. The 'Best Practices' section rehashes content already covered in the Process section. | 1 / 3 |
| Actionability | The skill provides detailed prompt templates and structured formats, which is somewhat actionable. However, it is an orchestration skill that relies entirely on dispatching sub-agents via a 'Task tool' that is never concretely defined (no actual tool call syntax, API, or executable example). The 'Dispatch Example' uses pseudocode-like notation rather than actual tool invocation syntax. The process is described in detail but not in a directly executable way. | 2 / 3 |
| Workflow Clarity | The workflow is clearly sequenced across 4 phases with explicit validation checkpoints (judge verification after each step), a well-defined retry/feedback loop (max 3 retries with escalation), clear decision criteria for pass/fail (score thresholds), and an ASCII flow diagram. Error handling covers multiple failure modes with specific escalation procedures. The iteration loop with meta-judge spec reuse is well-defined. | 3 / 3 |
| Progressive Disclosure | The entire skill is a monolithic wall of text with no references to external files despite being ~800+ lines. Content that could easily be split into separate files (prompt templates, examples, model selection matrix, context format reference, error handling procedures) is all inline. There are no bundle files to support progressive disclosure, and the skill makes no attempt to organize content across files. | 1 / 3 |
| Total | | 7 / 12 (Passed) |
## Validation

Score: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
### Validation for skill structure

9 of 11 checks passed; the remaining two are warnings.
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (1417 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 Passed |
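A sketch of the frontmatter_unknown_keys fix, assuming the validator accepts arbitrary keys only when nested under a metadata map. The version and author keys below are hypothetical examples for illustration, not keys actually reported for this skill:

```yaml
# Before: unknown top-level keys trigger the warning
---
name: do-in-steps
description: ...
version: 1.2.0        # hypothetical unknown key
author: sadd-team     # hypothetical unknown key
---

# After: unrecognized keys moved under metadata
---
name: do-in-steps
description: ...
metadata:
  version: 1.2.0
  author: sadd-team
---
```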