
ablation-planner

Use when main results pass result-to-claim (`claim_supported = yes` or `partial`) and ablation studies are needed for paper submission. A secondary Codex agent designs ablations from a reviewer's perspective; the local executor reviews feasibility and implements.

74

Quality

68%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/skills-codex/ablation-planner/SKILL.md

Quality

Discovery

75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is strong on completeness: it has an explicit 'Use when' clause with clear triggering conditions, and it occupies a distinct niche that is unlikely to conflict with other skills. However, the actions could be more concrete (what exactly do 'designs ablations' and 'implements' entail?), and the trigger terms lean toward internal pipeline jargon rather than natural user language.

Suggestions

Add more concrete action verbs describing what the skill actually does, e.g., 'identifies key components to ablate, designs controlled experiments removing individual contributions, generates comparison tables'.

Include more natural trigger terms users might say, such as 'ablation experiments', 'component contribution analysis', 'remove and test individual features', or 'verify each component's contribution'.

| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a specific domain (ablation studies for paper submission) and mentions some actions (designs ablations, reviews feasibility, implements), but the actions are somewhat vague and don't list concrete specific operations like 'removes individual components', 'reruns experiments', or 'generates comparison tables'. | 2 / 3 |
| Completeness | The description explicitly answers both 'what' (designs ablations from a reviewer's perspective, reviews feasibility, implements) and 'when' (when main results pass result-to-claim with claim_supported = yes or partial and ablation studies are needed for paper submission), with a clear 'Use when' clause at the start. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'ablation studies', 'paper submission', 'reviewer's perspective', and 'result-to-claim', but misses common natural variations users might say such as 'ablation experiments', 'component analysis', 'contribution analysis', or 'what if I remove'. The term 'claim_supported' is internal jargon that users wouldn't naturally use. | 2 / 3 |
| Distinctiveness Conflict Risk | This is a very specific niche: ablation study design triggered by a specific upstream condition (result-to-claim passing). It is unlikely to conflict with other skills due to its narrow scope and specific prerequisite conditions. | 3 / 3 |

Total: 10 / 12 (Passed)

Implementation

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured workflow skill with clear sequencing and good validation checkpoints, particularly the feasibility review before execution and smoke testing. Its main weaknesses are the lack of truly executable code/commands (the spawn_agent block is a template, not an API call) and some inline verbosity in the template section that could be externalized. The rules section is strong and provides clear constraints for Claude's behavior.

Suggestions

Replace the pseudocode spawn_agent block with the actual API or CLI command syntax used to spawn a secondary Codex agent, or clarify that this is a conceptual template if no concrete API exists.

Add concrete, executable examples for the implementation steps, e.g., actual shell commands for creating config variants and running smoke tests, or specific code snippets for modifying experiment configs.

Consider moving the full ablation plan markdown template (Step 3) to a separate reference file and linking to it, keeping only a brief summary inline.
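
To illustrate the kind of copy-paste-ready example the second suggestion asks for, here is a minimal Python sketch of generating one-component-off config variants and a shortened smoke-test config. It assumes a dict-based experiment config in which each component is toggled by a boolean `use_<component>` key. The function names, the `ablate_` prefix, and the `max_steps` key are hypothetical and are not taken from the skill itself.

```python
import copy
import json
from pathlib import Path


def make_ablation_variants(base_config, components):
    """Return {variant_name: config}, each with exactly one component disabled.

    Assumption (hypothetical): every component is toggled by a boolean
    key named ``use_<component>`` in the base config.
    """
    variants = {}
    for name in components:
        flag = f"use_{name}"
        if flag not in base_config:
            raise KeyError(f"base config has no flag {flag!r}")
        cfg = copy.deepcopy(base_config)  # never mutate the base config
        cfg[flag] = False
        variants[f"ablate_{name}"] = cfg
    return variants


def to_smoke_config(config, max_steps=10):
    """Return a copy of the config shortened for a quick smoke run.

    ``max_steps`` is a hypothetical key; substitute whatever the real
    training script uses to bound run length.
    """
    cfg = copy.deepcopy(config)
    cfg["max_steps"] = max_steps
    return cfg


def write_variants(variants, out_dir):
    """Write each variant to <out_dir>/<variant_name>.json; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, cfg in variants.items():
        path = out / f"{name}.json"
        path.write_text(json.dumps(cfg, indent=2, sort_keys=True))
        paths.append(path)
    return paths
```

A typical use would be to call `make_ablation_variants` on the main-result config, run each variant through `to_smoke_config` first, and launch the full runs only after every smoke run finishes cleanly.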

| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably efficient but includes some unnecessary elaboration. The structured tables and rules are useful, but some sections like the full markdown template in Step 3 could be more concise since Claude can generate table formats without a full template. The spawn_agent message block is appropriately detailed as it's a prompt template. | 2 / 3 |
| Actionability | The skill provides a clear structured workflow with specific steps, table formats, and naming conventions. However, it lacks executable code: the spawn_agent block is pseudocode/template rather than an actual API call, and there are no concrete commands for creating configs, running smoke tests, or tracking results. The guidance is specific but not copy-paste ready. | 2 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with logical progression from context gathering through design, parsing, feasibility review, and implementation. Step 4 serves as an explicit validation/feasibility checkpoint before execution, Step 5 includes smoke testing before full runs, and the rules section addresses error cases like budget overruns with explicit feedback loops (propose cuts, ask for re-prioritization). | 3 / 3 |
| Progressive Disclosure | The skill references external files like EXPERIMENT_LOG.md, findings.md, and research_contract.md, showing awareness of the broader project structure. However, the skill itself is somewhat monolithic: the full ablation plan template in Step 3 is lengthy inline content that could be referenced separately. There are no links to supplementary files for advanced usage or examples. | 2 / 3 |

Total: 9 / 12 (Passed)

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |

Total: 10 / 11 (Passed)

Repository
wanshuiyin/Auto-claude-code-research-in-sleep
Reviewed
