Turn a refined research proposal or method idea into a detailed, claim-driven experiment roadmap. Use after `research-refine`, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation that supports the core problem, novelty, simplicity, and any LLM / VLM / Diffusion / RL-based contribution.
88%: Does it follow best practices?
Impact: No eval scenarios have been run
Passed: No known issues
Quality
Discovery
100%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates what the skill does (converts research proposals into detailed experiment roadmaps), when to use it (after research-refine or when specific experiment planning artifacts are requested), and for what domain (ML research). It uses third-person voice correctly and includes rich, natural trigger terms that researchers would use.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'claim-driven experiment roadmap', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation'. These are concrete, actionable outputs rather than vague descriptions. | 3 / 3 |
| Completeness | Clearly answers both 'what' (turn a refined research proposal into a detailed experiment roadmap) and 'when' (explicitly states 'Use after research-refine, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation'). | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'experiment plan', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation', 'research proposal', 'LLM', 'VLM', 'Diffusion', 'RL-based'. Good coverage of terms a researcher would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche: experiment planning for ML research, distinct from general research refinement or paper writing. The explicit mention of 'after research-refine' and specific ML domains (LLM/VLM/Diffusion/RL) makes it unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured skill that provides a clear claim-driven experiment planning workflow with concrete output templates and explicit validation checkpoints. Its main weakness is moderate verbosity: some motivational framing and inline templates could be trimmed or externalized. The actionability and workflow clarity are excellent, with specific file paths, decision gates, and a realistic milestone structure.
Suggestions
Trim motivational/explanatory text in the Overview section (e.g., 'The goal is not to generate a giant benchmark wishlist') since Claude can infer intent from the workflow itself.
Consider moving the large output templates (EXPERIMENT_PLAN.md and EXPERIMENT_TRACKER.md structures) into separate reference files to reduce the main skill's token footprint and improve progressive disclosure.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is well-structured but somewhat verbose. Some sections like the overview explain motivations Claude can infer ('The goal is not to generate a giant benchmark wishlist'), and the constants section, while useful, includes explanatory comments that could be trimmed. The detailed template in Phase 5 is borderline: useful for output format but lengthy. | 2 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific file paths to read, exact markdown templates for output files, a precise milestone structure, and detailed specifications for each experiment block including success criteria, failure interpretation, and table/figure targets. The output templates are copy-paste ready. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced across 6 phases (0-5) with logical dependencies. Phase 4 includes explicit decision gates ('stop / go decision gate') at each milestone, and the run order separates must-run from nice-to-have. The milestone structure provides validation checkpoints (sanity stage before baselines, decision stage before polish). | 3 / 3 |
| Progressive Disclosure | The skill references shared protocols (output-versioning.md, output-manifest.md, output-language.md) and companion skills, which is good. However, the main body is quite long (~200+ lines) with the full output templates inline. The experiment block template and tracker template could potentially be referenced files. No bundle files are provided to verify referenced paths exist. | 2 / 3 |
| Total | | 10 / 12 Passed |
Validation
90%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| Total | 10 / 11 Passed | |
a425a71