
do-competitively

Execute tasks through competitive multi-agent generation, meta-judge evaluation specification, multi-judge evaluation, and evidence-based synthesis

37

Quality: 36%

Does it follow best practices?

Impact: No eval scenarios have been run

Security by Snyk: Passed

No known issues

Fix and improve this skill with Tessl

tessl review fix ./plugins/sadd/skills/do-competitively/SKILL.md

Quality

Content: 55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill provides an extremely thorough and actionable multi-agent orchestration workflow, with excellent workflow clarity and concrete prompt templates. However, its verbosity severely undermines it. The content is roughly three to four times longer than necessary, with extensive repetition across prompt templates, examples, and instructions. Its monolithic structure, with no progressive disclosure, makes it a poor fit for context-window efficiency, which is ironic given the skill's own emphasis on cost savings.

Suggestions

Reduce content by 50-60%: eliminate the three full worked examples (or move to a separate EXAMPLES.md), deduplicate repeated instructions (e.g., 'CRITICAL' warnings appear 8+ times), and remove explanatory text Claude doesn't need (e.g., explaining what REST APIs or caching strategies are).

Split into multiple files: move prompt templates to PROMPTS.md, examples to EXAMPLES.md, and keep SKILL.md as a concise overview with the workflow diagram, decision logic, and references to detail files.

Consolidate the prompt templates — generators, judges, synthesizer, and polish prompts share significant boilerplate that could be parameterized rather than fully repeated.

Remove or drastically shorten the three worked examples (API Design, Caching, Auth) — they repeat the entire process flow and add ~200 lines without teaching new concepts beyond what the process section already covers.
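The third suggestion is to parameterize the shared prompt boilerplate. The following minimal Python sketch shows one way to do that: a single template filled per role. The template wording, the role names, the instruction strings and the output filename are all hypothetical. The review does not quote the skill's actual prompts; it only says that they share significant boilerplate.

```python
from string import Template

# Hypothetical shared boilerplate. The review only states that generator,
# judge, synthesizer, and polish prompts repeat much of the same text.
BASE_PROMPT = Template(
    "You are the $role in a competitive multi-agent workflow.\n"
    "Task: $task\n"
    "Write your output to $output_path.\n"
    "$role_instructions"
)

# Hypothetical per-role differences: only these strings vary between agents.
ROLE_INSTRUCTIONS = {
    "generator": "Produce one complete candidate solution.",
    "judge": "Score each candidate against the evaluation specification.",
    "synthesizer": "Combine the strongest elements, citing judge evidence.",
}


def build_prompt(role: str, task: str, output_path: str) -> str:
    """Fill the shared template with the fields that differ per role."""
    return BASE_PROMPT.substitute(
        role=role,
        task=task,
        output_path=output_path,
        role_instructions=ROLE_INSTRUCTIONS[role],
    )


# Usage: one call per agent instead of one fully written-out prompt per agent.
prompt = build_prompt("judge", "Design a REST API for orders", "judge-1.md")
print(prompt)
```

With this design, adding a new agent type means adding one dictionary entry rather than copying the whole prompt. Shared rules such as the repeated "CRITICAL" warnings would then live in one place.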

Dimension / Reasoning / Score

Conciseness

Extremely verbose at 778 lines. Contains massive amounts of repetition (e.g., the same prompt templates, examples, and decision logic explained multiple times), redundant ASCII diagrams, and extensive examples that largely restate the same process. Many sections explain things Claude already knows (what REST APIs are, what caching strategies exist). The skill could be reduced to ~30-40% of its current size without losing actionable content.

1 / 3

Actionability

Despite verbosity, the skill provides highly concrete, executable guidance: exact prompt templates for each agent type, specific tool call dispatch patterns, structured output formats for judges, precise decision logic with thresholds (e.g., <3.0 triggers REDESIGN), file naming conventions, and directory setup commands. The instructions are copy-paste ready for orchestration.

3 / 3

Workflow Clarity

The multi-phase workflow is exceptionally well-sequenced with clear phase boundaries, explicit validation checkpoints (Phase 2.5 adaptive strategy selection with specific decision logic), feedback loops (REDESIGN returns to Phase 1), and clear dependencies (wait for all Phase 1 agents before Phase 2). The ASCII diagram provides a clear visual overview, and each phase has explicit entry/exit criteria.

3 / 3

Progressive Disclosure

Everything is crammed into a single monolithic file with no bundle files or external references. The three detailed examples, prompt templates, best practices, and output format specifications could easily be split into separate reference files. The skill is a wall of text that would benefit enormously from splitting into overview + detailed references.

1 / 3

Total: 8 / 12 (Passed)
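The Actionability and Workflow Clarity rows credit the skill's Phase 2.5 decision logic. The review quotes only one rule from it: a score below 3.0 triggers REDESIGN, which returns the workflow to Phase 1. The following minimal Python sketch covers that single check. Several parts are assumptions and are not from the skill: the score scale, averaging each candidate's judge scores, and the placeholder "PROCEED" outcome that stands in for the skill's other strategies.

```python
from statistics import mean

# Only the "<3.0 triggers REDESIGN" rule comes from the review. Averaging the
# judge scores and the "PROCEED" label are hypothetical placeholders.
REDESIGN_THRESHOLD = 3.0


def select_strategy(judge_scores: dict[str, list[float]]) -> str:
    """Return "REDESIGN" when even the best candidate scores below threshold.

    judge_scores maps each candidate ID to the scores it received from every
    judge. The check assumes all judges have reported, since this step comes
    after judging finishes.
    """
    if not judge_scores:
        raise ValueError("no candidates were scored")
    best = max(mean(scores) for scores in judge_scores.values())
    if best < REDESIGN_THRESHOLD:
        return "REDESIGN"  # loop back to Phase 1 and generate new candidates
    return "PROCEED"  # hypothetical: hand off to the next phase


print(select_strategy({"a": [2.5, 2.8], "b": [2.9, 2.6]}))  # REDESIGN
print(select_strategy({"a": [4.0, 3.5], "b": [2.0, 3.0]}))  # PROCEED
```

A score of exactly 3.0 does not trigger REDESIGN here, because the quoted rule uses a strict "<3.0".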

Description: 17%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description relies heavily on technical jargon that describes an internal methodology rather than user-facing capabilities. It lacks a 'Use when...' clause, concrete action verbs tied to observable outcomes, and natural trigger terms that users would actually use. The description would be nearly impossible for Claude to correctly match to user requests.

Suggestions

Add a 'Use when...' clause specifying the types of user requests that should trigger this skill, e.g., 'Use when the user wants to compare multiple approaches to a problem, needs quality evaluation of generated outputs, or requests a best-of-N generation strategy.'

Replace abstract methodology terms with concrete, user-facing language describing what the skill produces, e.g., 'Generates multiple candidate solutions, evaluates them against defined criteria, and synthesizes the best result.'

Include natural trigger terms users might say, such as 'compare approaches', 'evaluate options', 'best answer', 'multi-perspective analysis', or 'judge quality'.

Dimension / Reasoning / Score

Specificity

The description names specific phases of a process (multi-agent generation, meta-judge evaluation, multi-judge evaluation, evidence-based synthesis), but these are abstract methodological terms rather than concrete user-facing actions. It doesn't clarify what kinds of tasks are executed or what the outputs look like.

2 / 3

Completeness

The description only vaguely addresses 'what' (execute tasks through a multi-step process) and completely lacks any 'when' clause or explicit trigger guidance. There is no 'Use when...' or equivalent, which per the rubric caps completeness at 2, and the 'what' is itself weak enough to warrant a 1.

1 / 3

Trigger Term Quality

The terms used ('meta-judge evaluation specification', 'multi-agent generation', 'evidence-based synthesis') are highly technical jargon that users would almost never naturally say. No common user-facing keywords or natural language variations are included.

1 / 3

Distinctiveness Conflict Risk

The highly specialized jargon makes it unlikely to conflict with common skills, but the phrase 'execute tasks' is extremely generic and could overlap with any task-execution skill. The niche methodology terms provide some distinctiveness but the overall scope is unclear.

2 / 3

Total: 6 / 12 (Passed)

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

Criteria / Description / Result

skill_md_line_count

SKILL.md is long (778 lines); consider splitting into references/ and linking

Warning

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 9 / 11 (Passed)

Repository: NeoLabHQ/context-engineering-kit (Reviewed)
