
experiment-plan

Turn a refined research proposal or method idea into a detailed, claim-driven experiment roadmap. Use after `research-refine`, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation that supports the core problem, novelty, simplicity, and any LLM / VLM / Diffusion / RL-based contribution.
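As a sketch of how a description like this might be wired up for discovery, the frontmatter below follows common SKILL.md conventions; the field names are assumptions, not taken from this repository:

```markdown
---
name: experiment-plan
description: >
  Turn a refined research proposal or method idea into a detailed,
  claim-driven experiment roadmap. Use after `research-refine`, or when
  the user asks for an experiment plan, ablation matrix, evaluation
  protocol, run order, compute budget, or paper-ready validation.
---
```

An agent matches incoming requests against this description text, which is why the dense trigger-term coverage scored below matters.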

Overall score: 90

Quality: 88% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security (by Snyk): Critical (Do not install without reviewing)


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly communicates its purpose, lists specific deliverables, includes abundant natural trigger terms, and explicitly states when it should be used. It positions itself well within a research pipeline by referencing the upstream skill and covers both the 'what' and 'when' comprehensively. The description is concise yet information-dense with minimal fluff.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific, concrete deliverables: 'claim-driven experiment roadmap', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation'. | 3 / 3 |
| Completeness | Clearly answers both 'what' (turn a refined research proposal into a detailed experiment roadmap) and 'when' (after research-refine, or when the user asks for an experiment plan, ablation matrix, evaluation protocol, etc.) with explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms a researcher would use: 'experiment plan', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation', 'LLM', 'VLM', 'Diffusion', 'RL-based'. Also references the upstream skill 'research-refine' for pipeline triggering. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche: experiment planning for ML research, distinct from general research ideation or paper writing. The specific mention of 'ablation matrix', 'run order', 'compute budget', and the pipeline position after 'research-refine' make it highly distinguishable from other skills. | 3 / 3 |

Total: 12 / 12 (Passed)

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, highly actionable skill that provides a clear claim-driven experiment planning workflow with concrete output templates and well-sequenced phases. Its main weakness is moderate verbosity—some explanatory text could be trimmed and the large inline templates could be split into reference files. The workflow clarity is excellent with explicit decision gates, risk assessment, and a final validation checklist.

Suggestions

Consider moving the full EXPERIMENT_PLAN.md and EXPERIMENT_TRACKER.md templates into separate reference files to reduce inline bulk and improve progressive disclosure.

Tighten the Overview and Phase 2 descriptions by removing rationale sentences that Claude can infer (e.g., 'The goal is not to generate a giant benchmark wishlist' and explanations of why strong baselines matter).
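The first suggestion could be realized with a layout along these lines; the file and directory names here are hypothetical, since the repository's actual structure is not shown on this page:

```
experiment-plan/
├── SKILL.md                     # workflow phases, decision gates, key rules
└── references/
    ├── EXPERIMENT_PLAN.md       # full plan template, loaded only when drafting the plan
    └── EXPERIMENT_TRACKER.md    # tracker table schema, loaded only when logging runs
```

SKILL.md would then point at the templates (e.g. "use the template in `references/EXPERIMENT_PLAN.md`") so the bulk is read only when needed, which is the progressive-disclosure pattern the review asks for.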

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is well-structured and mostly efficient, but includes some unnecessary elaboration. For instance, the overview section re-explains the purpose redundantly, and some phase descriptions could be tightened (e.g., the bullet lists in Phase 1 and Phase 2 explain rationale that Claude could infer). The constants section and key rules are lean, but the overall document is longer than it needs to be. | 2 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: exact file paths, specific markdown templates for output files, a structured claim map table, a tracker table schema, a user-facing summary format, and explicit phase-by-phase instructions. The output templates are copy-paste ready and the workflow steps are specific enough to execute without ambiguity. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced across 6 phases (0-5) with explicit decision gates at each milestone (stop/go), a structured run order with risk/mitigation, and a final checklist that serves as validation. The milestone structure in Phase 4 includes explicit decision gates, and the separation of must-run vs nice-to-have provides clear prioritization checkpoints. | 3 / 3 |
| Progressive Disclosure | The skill references companion skills at the end and outputs to specific files, which is good. However, the document itself is quite long (~200 lines of substantive content), with detailed output templates inline that could be referenced separately. The inline markdown templates for EXPERIMENT_PLAN.md and EXPERIMENT_TRACKER.md add significant length that could be split into reference files. | 2 / 3 |

Total: 10 / 12 (Passed)
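The 'claim map table' credited under Actionability is not reproduced on this page; a hypothetical row, assuming the usual claim-to-evidence structure, might look like:

```markdown
| Claim | Experiment(s) | Metric | Baseline(s) | Must-run? |
| --- | --- | --- | --- | --- |
| Method X improves sample efficiency | E1, E3 (ablation A2) | Return @ 100k steps | PPO, SAC | Yes |
```

All column names and values here are illustrative; the skill's actual schema may differ.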

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |

Total: 10 / 11 (Passed)

Repository: wanshuiyin/Auto-claude-code-research-in-sleep (Reviewed)
