Use when main results pass result-to-claim (`claim_supported = yes` or `partial`) and ablation studies are needed for paper submission. A secondary Codex agent designs ablations from a reviewer's perspective; the local executor reviews feasibility and implements.
Optimize this skill with Tessl:
`npx tessl skill review --optimize ./skills/skills-codex/ablation-planner/SKILL.md`

Quality
Discovery
75%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description has strong completeness with an explicit 'Use when' clause and clear triggering conditions, and occupies a very distinct niche. However, the specific actions could be more concrete (what exactly does 'designs ablations' and 'implements' entail?), and the trigger terms lean heavily on technical jargon that may not match how users naturally phrase requests.
Suggestions

- Add more concrete action verbs describing what the skill does, e.g., 'identifies key components to ablate, designs controlled experiments, generates comparison tables, and produces ablation result summaries'.
- Include more natural trigger terms users might say, such as 'ablation experiments', 'component contribution analysis', 'justify experimental results', or 'reviewer requested ablations'. A sketch combining both suggestions follows this list.
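As a concrete illustration, the two suggestions combined might yield frontmatter along these lines (a hypothetical sketch following the usual SKILL.md convention; the wording is not the skill's actual metadata):

```yaml
# Hypothetical SKILL.md frontmatter sketch, not the skill's real metadata.
name: ablation-planner
description: >
  Use when main results pass result-to-claim (claim_supported = yes or
  partial) and ablation experiments are needed for paper submission or a
  reviewer response. Identifies key components to ablate, designs
  controlled experiments from a reviewer's perspective, checks local
  feasibility, implements and runs the ablations, and generates comparison
  tables and ablation result summaries.
```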
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a specific domain (ablation studies for paper submission) and mentions some actions (designs ablations, reviews feasibility, implements), but the actions are somewhat vague and don't list concrete specific operations like 'removes individual components', 'reruns experiments', 'generates comparison tables'. | 2 / 3 |
| Completeness | The description explicitly answers both 'what' (designs ablations from a reviewer's perspective, reviews feasibility, implements) and 'when' (when main results pass result-to-claim with claim_supported = yes or partial and ablation studies are needed for paper submission). The 'Use when' clause is present and specific. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'ablation studies', 'paper submission', 'reviewer's perspective', and 'result-to-claim', but these are fairly technical/jargon-heavy. Missing more natural user phrases like 'ablation experiments', 'component analysis', 'justify results', or 'reviewer response'. | 2 / 3 |
| Distinctiveness / Conflict Risk | This is a very specific niche: ablation study design triggered by a specific upstream condition (result-to-claim passing). The combination of ablation studies, reviewer perspective, and the prerequisite condition makes it highly unlikely to conflict with other skills. | 3 / 3 |
| Total | | 10 / 12 (Passed) |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured workflow skill with clear sequencing and good validation checkpoints, particularly the feasibility review step before execution. Its main weaknesses are the lack of truly executable/concrete implementation examples (configs, scripts, commands) and moderate verbosity in the table templates and agent prompt that could be tightened or split into referenced files. The rules section is strong and provides clear guardrails.
Suggestions

- Add a concrete, minimal example of an ablation config file or script modification to make Step 5 more actionable (e.g., a sample YAML config diff for a config-only ablation); a sketch follows this list.
- Consider moving the detailed markdown table template and the full Codex agent prompt into a referenced file (e.g., ABLATION_TEMPLATES.md) to reduce the main skill's token footprint while preserving the overview structure.
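To make the first suggestion concrete, a config-only ablation could be expressed as a pair of YAML files that differ in a single key (all file names, keys, and values below are hypothetical):

```yaml
# Hypothetical baseline config: configs/baseline.yaml
model:
  encoder: transformer
  use_auxiliary_loss: true    # component under study
training:
  seed: 42                    # held fixed across runs
---
# Hypothetical ablation config: configs/ablate_aux_loss.yaml
# Identical to the baseline except for the single toggled component.
model:
  encoder: transformer
  use_auxiliary_loss: false   # ablated: auxiliary loss disabled
training:
  seed: 42
```

Keeping every key identical except the one being toggled is what makes any difference in results attributable to the ablated component.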
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably efficient but includes some unnecessary verbosity, such as the detailed markdown table templates and the lengthy Codex prompt that could be more compact. Some explanatory text (e.g., 'The reviewer thinks like a reviewer; the local executor thinks like an engineer') is mildly redundant but not egregious. | 2 / 3 |
| Actionability | The skill provides structured guidance with table formats and a clear agent prompt, but lacks truly executable code: there are no actual scripts, commands, or config examples to copy-paste. The 'Implement and Run' step is high-level ('create configs or scripts') without concrete examples of what those configs look like. | 2 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with logical progression from context gathering through design, parsing, feasibility review, and implementation. Step 4 serves as an explicit validation/feasibility checkpoint before execution, and Step 5 includes smoke testing before full runs. The feedback loop for budget constraints (propose cuts, ask for re-prioritization) is well-defined. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear sections and headers, but it's somewhat monolithic; the detailed table templates and the full agent prompt could be split into referenced files. There are no external file references for advanced usage or examples, though the skill does reference project files like EXPERIMENT_LOG.md and findings.md appropriately. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 10 / 11 passed.
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| Total | | 10 / 11 (Passed) |
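The single warning points at the frontmatter's allowed-tools list. Since the report doesn't show the offending entry, the sketch below is only a guess at the shape of the fix: replace any unrecognized name with tools the validator knows about.

```yaml
# Hypothetical frontmatter fix; the original unrecognized entry is not
# shown in the report, and these replacement tool names are assumptions.
allowed-tools:
  - Bash
  - Read
  - Write
```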