Turn a refined research proposal or method idea into a detailed, claim-driven experiment roadmap. Use after `research-refine`, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation that supports the core problem, novelty, simplicity, and any LLM / VLM / Diffusion / RL-based contribution.
90
88%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Critical
Do not install without reviewing
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly communicates its purpose, lists specific deliverables, includes abundant natural trigger terms, and explicitly states when it should be used. It positions itself well within a research pipeline by referencing the upstream skill and covers both the 'what' and 'when' comprehensively. The description is concise yet information-dense with minimal fluff.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'claim-driven experiment roadmap', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation'. These are highly specific deliverables. | 3 / 3 |
| Completeness | Clearly answers both 'what' (turn a refined research proposal into a detailed experiment roadmap) and 'when' (after research-refine, or when user asks for experiment plan, ablation matrix, evaluation protocol, etc.) with explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms a researcher would use: 'experiment plan', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation', 'LLM', 'VLM', 'Diffusion', 'RL-based'. Also references the upstream skill 'research-refine' for pipeline triggering. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche: experiment planning for ML research, distinct from general research ideation or paper writing. The specific mention of 'ablation matrix', 'run order', 'compute budget', and the pipeline position after 'research-refine' make it highly distinguishable from other skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, highly actionable skill that provides a clear claim-driven experiment planning workflow with concrete output templates and well-sequenced phases. Its main weakness is moderate verbosity—some explanatory text could be trimmed and the large inline templates could be split into reference files. The workflow clarity is excellent with explicit decision gates, risk assessment, and a final validation checklist.
Suggestions
Consider moving the full EXPERIMENT_PLAN.md and EXPERIMENT_TRACKER.md templates into separate reference files to reduce inline bulk and improve progressive disclosure.
Tighten the Overview and Phase 2 descriptions by removing rationale sentences that Claude can infer (e.g., 'The goal is not to generate a giant benchmark wishlist' and explanations of why strong baselines matter).
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is well-structured and mostly efficient, but includes some unnecessary elaboration. For instance, the overview section re-explains the purpose redundantly, and some phase descriptions could be tightened (e.g., the bullet lists in Phase 1 and Phase 2 explain rationale that Claude could infer). The constants section and key rules are lean, but the overall document is longer than it needs to be. | 2 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: exact file paths, specific markdown templates for output files, a structured claim map table, a tracker table schema, a user-facing summary format, and explicit phase-by-phase instructions. The output templates are copy-paste ready and the workflow steps are specific enough to execute without ambiguity. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced across 6 phases (0-5) with explicit decision gates at each milestone (stop/go), a structured run order with risk/mitigation, and a final checklist that serves as validation. The separation of must-run vs nice-to-have provides clear prioritization checkpoints. | 3 / 3 |
| Progressive Disclosure | The skill references companion skills at the end and outputs to specific files, which is good. However, the document itself is quite long (~200 lines of substantive content), and the inline markdown templates for EXPERIMENT_PLAN.md and EXPERIMENT_TRACKER.md add significant length that could be split into reference files. | 2 / 3 |
| Total | | 10 / 12 Passed |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| Total | 10 / 11 Passed | |
dc00dfb