Turn a refined research proposal or method idea into a detailed, claim-driven experiment roadmap. Use after `research-refine`, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation that supports the core problem, novelty, simplicity, and any LLM / VLM / Diffusion / RL-based contribution.
Discovery — 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly defines its purpose (converting research proposals into experiment roadmaps), provides explicit trigger conditions (both pipeline-based and keyword-based), and uses domain-specific terminology that researchers would naturally use. The description is concise yet comprehensive, covering both the input context and the specific outputs it produces.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete deliverables: 'claim-driven experiment roadmap', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation'. | 3 / 3 |
| Completeness | Clearly answers both 'what' (turn a refined proposal into a detailed experiment roadmap) and 'when' (after `research-refine`, or when the user asks for an experiment plan, ablation matrix, evaluation protocol, etc.) with explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms a researcher would use: 'experiment plan', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation', 'LLM', 'VLM', 'Diffusion', 'RL-based'. Also references the upstream skill `research-refine` for pipeline triggers. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche: experiment planning from a research proposal. The specific triggers like 'ablation matrix', 'run order', and 'compute budget', plus the explicit pipeline position (after `research-refine`), make it highly distinguishable from other research or planning skills. | 3 / 3 |
| Total | | 12 / 12 — Passed |
Implementation — 85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured skill that provides a comprehensive claim-driven experiment planning workflow. Its greatest strengths are the highly actionable output templates, clear phase sequencing with decision gates, and appropriate use of progressive disclosure. The main weakness is moderate verbosity—some explanatory framing and motivational text could be trimmed without losing clarity, though the overall token cost is justified by the complexity of the task.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is well-structured and mostly efficient, but it is quite long (~200 lines of content), with some sections that could be tightened. The constants section and key rules are lean, but the phase-by-phase breakdown includes explanatory text a capable agent could infer (e.g., 'The goal is not to generate a giant benchmark wishlist'). The full markdown templates in Phase 5 add significant length but are arguably necessary for reproducibility. | 2 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific file paths to read and write, exact markdown templates with table structures, a precise milestone structure, specific output formats, and even fallback instructions for tool failures. Every phase has clear deliverables, and the output templates are copy-paste ready. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced across six phases (0–5) with explicit stop/go decision gates at each milestone, validation checkpoints (the final checklist), and a clear execution order. The milestone structure includes risk/mitigation, and the separation of must-run vs. nice-to-have provides built-in prioritization. The failure-interpretation requirement for each block serves as a feedback loop. | 3 / 3 |
| Progressive Disclosure | The skill has a clear overview section, references shared protocols via one-level-deep links (output-versioning.md, output-manifest.md, output-language.md), and points to related skills at the end. The inline markdown templates are appropriately included, since they are the core deliverable rather than reference material that should be split out. | 3 / 3 |
| Total | | 11 / 12 — Passed |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure — 10 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| Total | | 10 / 11 — Passed |