Execute tasks through competitive multi-agent generation, meta-judge evaluation specification, multi-judge evaluation, and evidence-based synthesis
Overall score: 36%
Best practices: Passed — No known issues
Impact: Pending — No eval scenarios have been run
Optimize this skill with Tessl:
`npx tessl skill review --optimize ./plugins/sadd/skills/do-competitively/SKILL.md`

Quality
Discovery
Discovery — 17%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description relies heavily on technical jargon describing an internal methodology without explaining what user-facing problems it solves or when it should be selected. It lacks natural trigger terms users would use and provides no explicit 'Use when...' guidance, making it very difficult for Claude to correctly select this skill from a pool of options.
Suggestions
- Add a 'Use when...' clause that specifies concrete scenarios, e.g., 'Use when the user wants to compare multiple approaches to a complex problem and select the best solution through structured evaluation.'
- Replace or supplement jargon terms with natural language that users would actually say, such as 'compare solutions', 'evaluate options', 'best-of-N generation', or 'quality ranking'.
- Specify what kinds of tasks this skill handles and what the output looks like, e.g., 'Generates multiple candidate solutions for complex tasks, evaluates them using structured judging criteria, and synthesizes the best result with supporting evidence.'
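Combining the suggestions above, the skill's frontmatter might look something like the following. This is an illustrative sketch of one possible rewrite, not wording produced by the review itself:

```yaml
name: do-competitively
description: >-
  Generates multiple candidate solutions for a complex task, evaluates them
  with structured judging criteria, and synthesizes the best result with
  supporting evidence. Use when the user wants to compare multiple approaches
  to a hard problem (best-of-N generation, evaluate options, quality ranking)
  and select the strongest solution.
```

Note how the rewrite leads with user-facing outcomes and folds the natural trigger terms into the 'Use when...' clause.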
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names specific phases of a process (multi-agent generation, meta-judge evaluation, multi-judge evaluation, evidence-based synthesis), but these are abstract methodological terms rather than concrete user-facing actions. It doesn't clarify what kinds of tasks are executed or what the outputs look like. | 2 / 3 |
| Completeness | The description only vaguely addresses 'what' (execute tasks through a multi-step process) and completely lacks any 'when' clause or explicit trigger guidance. There is no 'Use when...' or equivalent, which per the rubric caps completeness at 2, and the 'what' is itself weak, so this scores a 1. | 1 / 3 |
| Trigger Term Quality | The terms used ('multi-agent generation', 'meta-judge evaluation specification', 'evidence-based synthesis') are highly technical jargon that users would almost never naturally say. No common user-facing keywords are included. | 1 / 3 |
| Distinctiveness / Conflict Risk | The specific methodology terms (multi-agent, meta-judge, multi-judge) provide some distinctiveness from generic skills, but 'execute tasks' is extremely broad and could overlap with many other skills. The niche is somewhat identifiable but not clearly scoped. | 2 / 3 |
| Total |  | 6 / 12 — Passed |
Implementation
Implementation — 55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is highly actionable with excellent workflow clarity — the multi-phase orchestration pattern is well-defined with clear decision logic, thresholds, and feedback loops. However, it is extremely verbose and monolithic, with massive repetition across prompt templates and examples that could be condensed or split into referenced files. The content would be roughly 3x more token-efficient with better progressive disclosure and elimination of redundant explanations.
Suggestions
- Reduce content by at least 50%: consolidate the three nearly-identical worked examples into one with a brief table showing how strategy selection differs, and extract full prompt templates into separate referenced files (e.g., prompts/generator.md, prompts/judge.md).
- Split into bundle files: move prompt templates to a PROMPTS.md, examples to EXAMPLES.md, and strategy details to STRATEGIES.md, keeping SKILL.md as a concise overview with the workflow diagram and phase summaries.
- Remove redundant CRITICAL warnings — the same points about file naming, not reading reports, and passing exact YAML are repeated 3-4 times each. State each constraint once in a constraints section.
- Remove explanatory text Claude doesn't need (e.g., 'Key principle: Diversity through independence', 'Key principle: Evidence-based synthesis leverages collective intelligence') — these are obvious from the instructions themselves.
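Taken together, the restructuring suggestions amount to a bundle layout like this (the file names are the ones suggested above; the annotations are illustrative):

```
do-competitively/
├── SKILL.md        # concise overview: workflow diagram, phase summaries, constraints section
├── PROMPTS.md      # full generator and judge prompt templates
├── EXAMPLES.md     # one consolidated worked example plus a strategy-comparison table
└── STRATEGIES.md   # adaptive strategy-selection details and thresholds
```

SKILL.md would then reference the other files only where a phase needs them, so the agent loads the long prompt templates on demand rather than on every invocation.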
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~500+ lines. Contains massive amounts of repetition (prompt templates repeated in full, examples that restate the same patterns multiple times), explains orchestration concepts Claude can infer, and includes ASCII diagrams, best practices, and three full worked examples that largely duplicate each other. The critical warnings are repeated excessively. Much of this could be condensed to under 150 lines. | 1 / 3 |
| Actionability | Highly actionable with complete prompt templates, specific tool dispatch instructions, concrete file naming conventions, decision logic with exact thresholds, and structured output formats. Every phase has copy-paste ready prompts and clear dispatch patterns. | 3 / 3 |
| Workflow Clarity | Excellent multi-step workflow with clear phase sequencing, explicit validation checkpoints (Phase 2.5 adaptive strategy selection with specific decision logic), feedback loops (REDESIGN returns to Phase 1), and clear dependencies between phases (wait for all Phase 1 before Phase 2). The decision tree with thresholds is well-defined. | 3 / 3 |
| Progressive Disclosure | Monolithic wall of text with no bundle files or external references. Everything is inlined including three full worked examples, all prompt templates, and extensive best practices. The content would benefit enormously from splitting prompt templates, examples, and strategy details into separate referenced files. | 1 / 3 |
| Total |  | 8 / 12 — Passed |
Validation
Validation — 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (778 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total |  | 9 / 11 Passed |
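The two warnings above can be reproduced locally with a short script. The sketch below is an assumption-laden stand-in for Tessl's checks, not its actual implementation: the 500-line threshold, the allowed-key list, and the naive frontmatter parsing are all illustrative.

```python
# Sketch of the two failing checks; threshold and allow-list are assumptions.
ALLOWED_KEYS = {"name", "description", "metadata"}  # assumed allow-list

def check_skill(text: str, max_lines: int = 500):
    """Return warning strings for an SKILL.md file's contents."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        warnings.append(f"skill_md_line_count: SKILL.md is long ({len(lines)} lines)")
    # Naive frontmatter scan: top-level keys between the opening '---' fences.
    if lines and lines[0].strip() == "---":
        keys = set()
        for line in lines[1:]:
            if line.strip() == "---":
                break
            if ":" in line and not line.startswith((" ", "\t")):
                keys.add(line.split(":", 1)[0].strip())
        unknown = keys - ALLOWED_KEYS
        if unknown:
            warnings.append(f"frontmatter_unknown_keys: {sorted(unknown)}")
    return warnings

# A synthetic 778-line body with one unknown frontmatter key.
sample = "---\nname: do-competitively\nauthor: me\n---\n" + "x\n" * 778
print(check_skill(sample))
```

Running it on the synthetic sample trips both checks: the body exceeds the assumed line threshold, and `author` falls outside the assumed allow-list.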