ablation-planner

Use when main results pass result-to-claim (`claim_supported = yes` or `partial`) and ablation studies are needed for paper submission. A secondary Codex agent designs ablations from a reviewer's perspective; the local executor reviews feasibility and implements.

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Quality

Content

100%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is a lean, well-structured orchestration workflow with concrete templates, explicit validation checkpoints, and clear sequencing. It assumes Claude's competence and provides actionable guidance throughout without verbosity.

Dimension	Reasoning	Score
Conciseness	The body is lean and assumes Claude's competence: it gives an agent prompt template, a normalization table format, and rules without explaining what ablations are or padding with generalities, matching the level-3 anchor 'lean and efficient; every token earns its place'.	3 / 3
Actionability	Provides a concrete spawn_agent prompt template with explicit fields, a structured markdown table to normalize output, and specific filenames and naming conventions (EXPERIMENT_LOG.md, ablation-no-module-X), giving concrete executable guidance appropriate for an instruction/orchestration skill rather than vague direction.	3 / 3
Workflow Clarity	A clear 5-step sequence includes explicit validation checkpoints (Step 4 feasibility review before running, Step 5 smoke test each ablation) and a feedback loop (propose cuts and ask for re-prioritization when over budget), matching the level-3 anchor of clear sequence with explicit validation and feedback loops.	3 / 3
Progressive Disclosure	No bundle files exist so there are no nested references to evaluate; the body is self-contained and well-organized into When to Use, Workflow, and Rules sections with nothing that should be split out, satisfying the well-organized-sections criterion for a skill with no need for external references.	3 / 3
	Total	12 / 12 Passed
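
The over-budget feedback loop credited in the Workflow Clarity row (propose cuts, then ask for re-prioritization) can be illustrated with a minimal sketch. This is not taken from the skill itself: the `Ablation` fields, the `split_by_budget` helper, the GPU-hour budget, and the `-Y` and `-Z` run names are all hypothetical, chosen only to follow the `ablation-no-module-X` naming style the review cites.

```python
from dataclasses import dataclass


@dataclass
class Ablation:
    name: str         # follows the "ablation-no-module-X" naming style
    priority: int     # hypothetical: 1 = most important to a reviewer
    gpu_hours: float  # hypothetical cost estimate from the feasibility review


def split_by_budget(ablations, budget_gpu_hours):
    """Return (keep, proposed_cuts), filling the budget in priority order."""
    keep, cuts = [], []
    remaining = budget_gpu_hours
    for ab in sorted(ablations, key=lambda a: a.priority):
        if ab.gpu_hours <= remaining:
            keep.append(ab)
            remaining -= ab.gpu_hours
        else:
            # Over budget: list it as a proposed cut rather than dropping it silently.
            cuts.append(ab)
    return keep, cuts


# Hypothetical plan; names and costs are illustrative only.
plan = [
    Ablation("ablation-no-module-X", priority=1, gpu_hours=4.0),
    Ablation("ablation-no-module-Y", priority=2, gpu_hours=6.0),
    Ablation("ablation-no-module-Z", priority=3, gpu_hours=2.0),
]
keep, cuts = split_by_budget(plan, budget_gpu_hours=8.0)
print("run:", [a.name for a in keep])
print("proposed cuts:", [a.name for a in cuts])
```

In this sketch the proposed cuts are returned rather than discarded, so the user can be asked to re-prioritize before anything is run.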

Description

85%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific, complete with an explicit 'Use when...' trigger, and clearly distinctive within the ML research workflow niche. Its main weakness is trigger-term quality, which leans on internal slash-command names rather than the natural language a user would actually say.

Suggestions

Broaden trigger terms beyond slash-command names by adding natural phrasings users would say, e.g. 'Use when the user needs ablation studies, sensitivity analyses, or reviewer-requested experiments for a paper submission'.

Lead with what the skill does before the gating condition so the capability is visible even when a user hasn't run /result-to-claim.

Dimension	Reasoning	Score
Specificity	Names multiple concrete actions tied to the domain: 'designs ablations from a reviewer's perspective', 'reviews feasibility and implements', and 'ablation studies are needed for paper submission', matching the level-3 anchor of multiple specific concrete actions rather than the level-2 partial coverage.	3 / 3
Completeness	An explicit 'Use when...' clause states the trigger condition (result-to-claim passing) and the remainder of the description clearly states what the skill does, so both what and when are explicit, satisfying the level-3 anchor and not the level-2 'when only implied'.	3 / 3
Trigger Term Quality	Includes relevant terms like 'ablation studies' and 'paper submission' but relies heavily on slash-command gating ('result-to-claim', '/auto-review-loop') rather than natural user phrasings, leaving common variations ('I need ablations for my paper') under-covered, which fits level 2 not 3.	2 / 3
Distinctiveness Conflict Risk	The niche is tightly scoped to ablation planning for ML paper submission gated on a specific prior check, with distinct triggers unlikely to fire for unrelated skills, matching the level-3 'clear niche with distinct triggers'.	3 / 3
	Total	11 / 12 Passed

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 15 / 16 Passed

Validation for skill structure

Criteria	Description	Result
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning
	Total	15 / 16 Passed

Repository: wanshuiyin/Auto-claude-code-research-in-sleep
Commit: fe5963c

Reviewed: about 18 hours ago
