Turn a refined research proposal or method idea into a detailed, claim-driven experiment roadmap. Use after `research-refine`, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation that supports the core problem, novelty, simplicity, and any LLM / VLM / Diffusion / RL-based contribution.
Refine and concretize: $ARGUMENTS
Use this skill after the method is stable enough that the next question becomes: what exact experiments should we run, in what order, to defend the paper? If the user wants the full chain in one request, prefer /research-refine-pipeline.
The goal is not to generate a giant benchmark wishlist. The goal is to turn a proposal into a claim -> evidence -> run order roadmap that supports four things:
refine-logs/ — Default destination for experiment planning artifacts.

Read the most relevant existing files first if they exist:

- refine-logs/FINAL_PROPOSAL.md
- refine-logs/REVIEW_SUMMARY.md
- refine-logs/REFINEMENT_REPORT.md

Extract the problem statement, method thesis, and stated constraints.
If these files do not exist, derive the same information from the user's prompt.
Before proposing experiments, write down the claims that must be defended.
Use the Claim Map structure from the plan template below: claim, why it matters, minimum convincing evidence, and linked blocks.
Do not exceed MAX_PRIMARY_CLAIMS unless the paper truly has multiple inseparable claims.
Design the paper around a compact set of experiment blocks. Default to the following blocks and delete any that are not needed:
For each block, decide whether it belongs in the main paper, the appendix, or the intentionally-cut list.
Prefer one strong baseline family over many weak baselines. If a stronger modern baseline exists, use it instead of padding the list.
For every kept block, fully specify every field in the block template below.
Special rules:
Build a realistic run order so the user knows what to do first.
Use this milestone structure:
For each milestone, estimate its runs, cost, and risk, and define its decision gate.
Separate must-run from nice-to-have experiments.
refine-logs/EXPERIMENT_PLAN.md

Use this structure:
# Experiment Plan
**Problem**: [problem]
**Method Thesis**: [one-sentence thesis]
**Date**: [today]
## Claim Map
| Claim | Why It Matters | Minimum Convincing Evidence | Linked Blocks |
|-------|-----------------|-----------------------------|---------------|
| C1 | ... | ... | B1, B2 |
## Paper Storyline
- Main paper must prove:
- Appendix can support:
- Experiments intentionally cut:
## Experiment Blocks
### Block 1: [Name]
- Claim tested:
- Why this block exists:
- Dataset / split / task:
- Compared systems:
- Metrics:
- Setup details:
- Success criterion:
- Failure interpretation:
- Table / figure target:
- Priority: MUST-RUN / NICE-TO-HAVE
### Block 2: [Name]
...
## Run Order and Milestones
| Milestone | Goal | Runs | Decision Gate | Cost | Risk |
|-----------|------|------|---------------|------|------|
| M0 | ... | ... | ... | ... | ... |
## Compute and Data Budget
- Total estimated GPU-hours:
- Data preparation needs:
- Human evaluation needs:
- Biggest bottleneck:
## Risks and Mitigations
- [Risk]:
- [Mitigation]:
## Final Checklist
- [ ] Main paper tables are covered
- [ ] Novelty is isolated
- [ ] Simplicity is defended
- [ ] Frontier contribution is justified or explicitly not claimed
- [ ] Nice-to-have runs are separated from must-run runs

refine-logs/EXPERIMENT_TRACKER.md

Use this structure:
# Experiment Tracker
| Run ID | Milestone | Purpose | System / Variant | Split | Metrics | Priority | Status | Notes |
|--------|-----------|---------|------------------|-------|---------|----------|--------|-------|
| R001 | M0 | sanity | ... | ... | ... | MUST | TODO | ... |

Keep the tracker compact and execution-oriented.
Experiment plan ready.
Must-run blocks:
- [Block 1]
- [Block 2]
Highest-risk assumption:
- [risk]
First three runs to launch:
1. [run]
2. [run]
3. [run]
Plan file: refine-logs/EXPERIMENT_PLAN.md
Tracker file: refine-logs/EXPERIMENT_TRACKER.md

Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
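The chunked fallback can be sketched as follows. This is an illustrative pattern, not a required script: the chunk boundaries are arbitrary, and only the file path is taken from the plan template above.

```shell
# Hypothetical fallback: write a large plan file in chunks with quoted
# heredocs when a single Write call fails.
mkdir -p refine-logs

# The first chunk creates (and truncates) the file.
cat << 'EOF' > refine-logs/EXPERIMENT_PLAN.md
# Experiment Plan
**Problem**: ...
**Method Thesis**: ...
EOF

# Later chunks append, so nothing already written is lost.
cat << 'EOF' >> refine-logs/EXPERIMENT_PLAN.md
## Claim Map
| Claim | Why It Matters | Minimum Convincing Evidence | Linked Blocks |
EOF
```

Quoting the delimiter ('EOF') keeps markdown content literal, so $-signs and backticks in the plan are not expanded by the shell.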
Every experiment must defend a claim. If it does not change a reviewer belief, cut it.
Prefer a compact paper story. Design the main table first, then add only the ablations that defend it.
Defend simplicity explicitly. If complexity is a concern, include a deletion study or a stronger-but-bloated variant comparison.
Defend frontier choices explicitly. If a modern primitive is central, prove why it is better than the strongest simpler alternative.
Prefer strong baselines over long baseline lists. A short, credible comparison set is better than a padded one.
Separate must-run from nice-to-have. Do not let appendix ideas delay the core paper evidence.
Reuse proposal constraints. Do not invent unrealistic budgets or data assumptions.
Do not fabricate results. Plan evidence; do not claim evidence.
/research-refine-pipeline -> one-shot method + experiment planning
/research-refine -> method and claim refinement
/experiment-plan -> detailed experiment roadmap
/run-experiment -> execute the runs
/auto-review-loop -> react to results and iterate on the paper