experiment-plan

Turn a refined research proposal or method idea into a detailed, claim-driven experiment roadmap. Use after `research-refine`, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation that supports the core problem, novelty, simplicity, and any LLM / VLM / Diffusion / RL-based contribution.

72

Quality

88%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly articulates what the skill does (converts research proposals into detailed experiment roadmaps), when to use it (after research-refine or when specific experiment planning artifacts are requested), and for what domain (ML research). It uses third person voice correctly and includes rich, natural trigger terms that researchers would use.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'claim-driven experiment roadmap', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation'. These are concrete, actionable outputs rather than vague descriptions. | 3 / 3 |
| Completeness | Clearly answers both 'what' (turn a refined research proposal into a detailed experiment roadmap) and 'when' (explicitly states 'Use after research-refine, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation'). | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'experiment plan', 'ablation matrix', 'evaluation protocol', 'run order', 'compute budget', 'paper-ready validation', 'research proposal', 'LLM', 'VLM', 'Diffusion', 'RL-based'. Good coverage of terms a researcher would naturally use. | 3 / 3 |
| Distinctiveness Conflict Risk | Occupies a clear niche: experiment planning for ML research, distinct from general research refinement or paper writing. The explicit mention of 'after research-refine' and specific ML domains (LLM/VLM/Diffusion/RL) makes it unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, well-structured skill that provides a clear claim-driven experiment planning workflow with concrete output templates and explicit validation checkpoints. Its main weakness is moderate verbosity: some motivational framing and inline templates could be trimmed or externalized. The actionability and workflow clarity are excellent, with specific file paths, decision gates, and a realistic milestone structure.

Suggestions

Trim motivational/explanatory text in the Overview section (e.g., 'The goal is not to generate a giant benchmark wishlist') since Claude can infer intent from the workflow itself.

Consider moving the large output templates (EXPERIMENT_PLAN.md and EXPERIMENT_TRACKER.md structures) into separate reference files to reduce the main skill's token footprint and improve progressive disclosure.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is well-structured but somewhat verbose. Some sections like the overview explain motivations Claude can infer ('The goal is not to generate a giant benchmark wishlist'), and the constants section, while useful, includes explanatory comments that could be trimmed. The detailed template in Phase 5 is borderline: useful for output format but lengthy. | 2 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific file paths to read, exact markdown templates for output files, a precise milestone structure, and detailed specifications for each experiment block including success criteria, failure interpretation, and table/figure targets. The output templates are copy-paste ready. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced across 6 phases (0-5) with logical dependencies. Phase 4 includes explicit decision gates ('stop / go decision gate') at each milestone, and the run order separates must-run from nice-to-have. The milestone structure provides validation checkpoints (sanity stage before baselines, decision stage before polish). | 3 / 3 |
| Progressive Disclosure | The skill references shared protocols (output-versioning.md, output-manifest.md, output-language.md) and companion skills, which is good. However, the main body is quite long (~200+ lines) with the full output templates inline. The experiment block template and tracker template could potentially be referenced files. No bundle files are provided to verify referenced paths exist. | 2 / 3 |
| Total | | 10 / 12 |

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| Total | | 10 / 11 |

Passed

Repository
wanshuiyin/Auto-claude-code-research-in-sleep
Reviewed
