Execute tasks through competitive multi-agent generation, meta-judge evaluation specification, multi-judge evaluation, and evidence-based synthesis
Overall score: 36%
Best practices: Passed — No known issues
Impact: Pending — No eval scenarios have been run
Optimize this skill with Tessl:
`npx tessl skill review --optimize ./plugins/sadd/skills/do-competitively/SKILL.md`

Quality
Discovery
Discovery — 17%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description relies heavily on technical jargon describing an internal methodology without explaining what user-facing problems it solves or when it should be selected. It lacks natural trigger terms users would use and provides no explicit 'Use when...' guidance, making it very difficult for Claude to correctly select this skill from a pool of options.
Suggestions
- Add a 'Use when...' clause that specifies concrete scenarios, e.g., 'Use when the user wants to compare multiple approaches to a complex problem and select the best solution through structured evaluation.'
- Replace or supplement jargon terms with natural language that users would actually say, such as 'compare solutions', 'evaluate options', 'best-of-N generation', or 'quality ranking'.
- Specify what kinds of tasks this skill handles and what the output looks like, e.g., 'Generates multiple candidate solutions for complex tasks, evaluates them using structured judging criteria, and synthesizes the best result with supporting evidence.'
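Combining the suggestions above, the skill's frontmatter might look something like the following. This is an illustrative sketch of one possible rewrite, not wording produced by the review itself:

```yaml
name: do-competitively
description: >-
  Generates multiple candidate solutions for a complex task, evaluates them
  with structured judging criteria, and synthesizes the best result with
  supporting evidence. Use when the user wants to compare multiple approaches
  to a hard problem (best-of-N generation, evaluate options, quality ranking)
  and select the strongest solution.
```

Note how the rewrite leads with user-facing outcomes and folds the natural trigger terms into the 'Use when...' clause.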
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names specific phases of a process (multi-agent generation, meta-judge evaluation, multi-judge evaluation, evidence-based synthesis), but these are abstract methodological terms rather than concrete user-facing actions. It doesn't clarify what kinds of tasks are executed or what the outputs look like. | 2 / 3 |
| Completeness | The description only vaguely addresses 'what' (execute tasks through a multi-step process) and completely lacks any 'when' clause or explicit trigger guidance. There is no 'Use when...' or equivalent, which per the rubric caps completeness at 2, and the 'what' is itself weak, so this scores a 1. | 1 / 3 |
| Trigger Term Quality | The terms used ('multi-agent generation', 'meta-judge evaluation specification', 'evidence-based synthesis') are highly technical jargon that users would almost never naturally say. No common user-facing keywords are included. | 1 / 3 |
| Distinctiveness / Conflict Risk | The specific methodology terms (multi-agent, meta-judge, multi-judge) provide some distinctiveness from generic skills, but 'execute tasks' is extremely broad and could overlap with many other skills. The niche is somewhat identifiable but not clearly scoped. | 2 / 3 |
| Total |  | 6 / 12 — Passed |
Implementation
Implementation — 55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is highly actionable with excellent workflow clarity — the multi-phase orchestration pattern is well-defined with clear decision logic, thresholds, and feedback loops. However, it is extremely verbose and monolithic, with massive repetition across prompt templates and examples that could be condensed or split into referenced files. The content would be roughly 3x more token-efficient with better progressive disclosure and elimination of redundant explanations.
Suggestions
- Reduce content by at least 50%: consolidate the three nearly-identical worked examples into one with a brief table showing how strategy selection differs, and extract full prompt templates into separate referenced files (e.g., prompts/generator.md, prompts/judge.md).
- Split into bundle files: move prompt templates to a PROMPTS.md, examples to EXAMPLES.md, and strategy details to STRATEGIES.md, keeping SKILL.md as a concise overview with the workflow diagram and phase summaries.
- Remove redundant CRITICAL warnings — the same points about file naming, not reading reports, and passing exact YAML are repeated 3-4 times each. State each constraint once in a constraints section.
- Remove explanatory text Claude doesn't need (e.g., 'Key principle: Diversity through independence', 'Key principle: Evidence-based synthesis leverages collective intelligence') — these are obvious from the instructions themselves.
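Taken together, the restructuring suggestions amount to a bundle layout like this (the file names are the ones suggested above; the annotations are illustrative):

```
do-competitively/
├── SKILL.md        # concise overview: workflow diagram, phase summaries, constraints section
├── PROMPTS.md      # full generator and judge prompt templates
├── EXAMPLES.md     # one consolidated worked example plus a strategy-comparison table
└── STRATEGIES.md   # adaptive strategy-selection details and thresholds
```

SKILL.md would then reference the other files only where a phase needs them, so the agent loads the long prompt templates on demand rather than on every invocation.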
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~500+ lines. Contains massive amounts of repetition (prompt templates repeated in full, examples that restate the same patterns multiple times), explains orchestration concepts Claude can infer, and includes ASCII diagrams, best practices, and three full worked examples that largely duplicate each other. The critical warnings are repeated excessively. Much of this could be condensed to under 150 lines. | 1 / 3 |
| Actionability | Highly actionable with complete prompt templates, specific tool dispatch instructions, concrete file naming conventions, decision logic with exact thresholds, and structured output formats. Every phase has copy-paste ready prompts and clear dispatch patterns. | 3 / 3 |
| Workflow Clarity | Excellent multi-step workflow with clear phase sequencing, explicit validation checkpoints (Phase 2.5 adaptive strategy selection with specific decision logic), feedback loops (REDESIGN returns to Phase 1), and clear dependencies between phases (wait for all Phase 1 before Phase 2). The decision tree with thresholds is well-defined. | 3 / 3 |
| Progressive Disclosure | Monolithic wall of text with no bundle files or external references. Everything is inlined including three full worked examples, all prompt templates, and extensive best practices. The content would benefit enormously from splitting prompt templates, examples, and strategy details into separate referenced files. | 1 / 3 |
| Total |  | 8 / 12 — Passed |
Validation
Validation — 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (778 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total |  | 9 / 11 Passed |
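The two warnings above can be reproduced locally with a short script. The sketch below is an assumption-laden stand-in for Tessl's checks, not its actual implementation: the 500-line threshold, the allowed-key list, and the naive frontmatter parsing are all illustrative.

```python
# Sketch of the two failing checks; threshold and allow-list are assumptions.
ALLOWED_KEYS = {"name", "description", "metadata"}  # assumed allow-list

def check_skill(text: str, max_lines: int = 500):
    """Return warning strings for an SKILL.md file's contents."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        warnings.append(f"skill_md_line_count: SKILL.md is long ({len(lines)} lines)")
    # Naive frontmatter scan: top-level keys between the opening '---' fences.
    if lines and lines[0].strip() == "---":
        keys = set()
        for line in lines[1:]:
            if line.strip() == "---":
                break
            if ":" in line and not line.startswith((" ", "\t")):
                keys.add(line.split(":", 1)[0].strip())
        unknown = keys - ALLOWED_KEYS
        if unknown:
            warnings.append(f"frontmatter_unknown_keys: {sorted(unknown)}")
    return warnings

# A synthetic 778-line body with one unknown frontmatter key.
sample = "---\nname: do-competitively\nauthor: me\n---\n" + "x\n" * 778
print(check_skill(sample))
```

Running it on the synthetic sample trips both checks: the body exceeds the assumed line threshold, and `author` falls outside the assumed allow-list.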