Use when experiments complete to judge what claims the results support, what they don't, and what evidence is still missing. A secondary Codex agent evaluates results against intended claims and routes to next action (pivot, supplement, or confirm). Use after experiments finish — before writing the paper or running ablations.
- Overall score: 79
- Quality: 76% (Does it follow best practices?): Passed; no known issues.
- Impact: Pending; no eval scenarios have been run.
Optimize this skill with Tessl: `npx tessl skill review --optimize ./skills/skills-codex/result-to-claim/SKILL.md`

Quality
Discovery: 75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is strong on completeness and distinctiveness, clearly articulating both when to use the skill and its unique niche in the experimental workflow. However, it could be more specific about the concrete actions performed and include more natural trigger terms that users might employ when seeking this functionality. The language is appropriately third-person and avoids fluff.
Suggestions
- Add more concrete actions the skill performs, e.g., 'Generates a structured evidence report listing supported claims, unsupported claims, and gaps in evidence'
- Include additional natural trigger terms users might say, such as 'analyze results', 'interpret findings', 'evaluate experiment outcomes', or 'check if results support hypothesis'
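A hedged sketch of how the two suggestions could combine in the skill's frontmatter. The wording below is illustrative, not the skill's actual file; only the phrases quoted elsewhere in this review are taken from the real description:

```markdown
---
name: result-to-claim
description: >
  Use when experiments complete to analyze results and check whether they
  support the intended hypothesis. A secondary Codex agent evaluates results
  against intended claims, generates a structured evidence report (supported
  claims, unsupported claims, gaps in evidence), and routes to the next
  action: pivot, supplement, or confirm. Use to interpret findings or
  evaluate experiment outcomes after experiments finish, before writing the
  paper or running ablations.
---
```

Folding the suggested trigger terms ('analyze results', 'interpret findings', 'evaluate experiment outcomes') into the description this way keeps them discoverable without adding a separate keywords field.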
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (experiment evaluation) and some actions (judge claims, evaluate results against intended claims, route to next action), but the actions are somewhat abstract. 'Pivot, supplement, or confirm' are named but not fully concrete in terms of what the skill actually does mechanically. | 2 / 3 |
| Completeness | The description clearly answers both 'what' (a secondary Codex agent evaluates results against intended claims and routes to next action) and 'when' (after experiments finish, before writing the paper or running ablations), with explicit trigger guidance including 'Use when experiments complete'. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'experiments complete', 'results', 'claims', 'ablations', 'paper', but misses common natural variations a user might say such as 'analyze results', 'interpret findings', 'evaluate outcomes', or 'results analysis'. The terms are somewhat niche and academic. | 2 / 3 |
| Distinctiveness / Conflict Risk | This skill occupies a very specific niche (post-experiment claim evaluation with routing decisions) that is unlikely to conflict with other skills. The workflow positioning (after experiments, before paper writing) and the specific mechanism (secondary Codex agent) make it highly distinctive. | 3 / 3 |
| **Total** | | 10 / 12 (Passed) |
Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, highly actionable skill with clear multi-step workflow and explicit routing logic. Its main weakness is length — particularly the research wiki update section (Step 5) which could be extracted to a separate file. The skill does a good job of providing concrete guidance and validation checkpoints, though some introductory/explanatory text could be trimmed.
Suggestions
- Extract Step 5 (Research Wiki updates) into a separate reference file (e.g., `result-to-claim-wiki-updates.md`) and link to it from the main skill to reduce inline verbosity.
- Remove the 'When to Use' section — this information is better suited for the YAML frontmatter description and is redundant context for Claude.
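One way the extraction could look. The file name `result-to-claim-wiki-updates.md` comes from the suggestion above; the heading text and link placement are illustrative assumptions, since the skill's actual Step 5 content is not reproduced in this review:

```markdown
<!-- SKILL.md: Step 5 reduced to a pointer -->
## Step 5: Update the Research Wiki

Follow the update procedure in
[result-to-claim-wiki-updates.md](./result-to-claim-wiki-updates.md),
which covers which wiki entries to touch and how to record the
verdict and routing decision from Step 4.
```

This keeps the main skill short while preserving progressive disclosure: the agent only loads the wiki-update details when it actually reaches Step 5.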
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long but most content is functional (workflow steps, routing logic, wiki updates). However, some sections like 'When to Use' and the introductory sentence ('Experiments produce numbers; this gate decides what those numbers *mean*') are unnecessary for Claude. The research wiki update section (Step 5) is quite verbose and could be more compact. | 2 / 3 |
| Actionability | The skill provides concrete, executable guidance throughout: specific W&B API calls, exact prompt templates for the secondary Codex agent, structured output fields to extract, conditional logic for routing, and specific file paths and commands. The spawn_agent prompt is copy-paste ready and the routing decisions are explicit. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (Steps 1-5) with explicit validation via the secondary Codex judgment and integrity audit check. The routing logic in Step 4 provides clear branching paths for each verdict (yes/partial/no) with specific actions. The integrity check in Step 3.5 includes a feedback loop that downgrades confidence on failure. Multiple rounds of 'partial' are addressed. | 3 / 3 |
| Progressive Disclosure | The skill references external files appropriately (shared-references/experiment-integrity.md, shared-references/review-tracing.md, findings.md, EXPERIMENT_AUDIT.json) but the main file itself is quite long. Step 5 (Research Wiki updates) is extensive inline content that could be split into a separate reference file. The conditional 'skip if not exists' pattern is good but the wiki update details bloat the main skill. | 2 / 3 |
| **Total** | | 10 / 12 (Passed) |
Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | 9 / 11 Passed | |
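A hedged before/after sketch of the frontmatter cleanup these two warnings usually call for. The specific keys and tool names shown are placeholders, since the report does not list the actual offending values:

```markdown
---
name: result-to-claim
description: ...
# Fix for allowed_tools_field: list only tool names the agent
# runtime recognizes (the names below are examples, not the
# skill's real values).
allowed-tools: Read, Bash
# Fix for frontmatter_unknown_keys: move unrecognized top-level
# keys under 'metadata', as the warning suggests, or delete them.
metadata:
  owner: skills-codex
---
```

Clearing both warnings would bring the validation score to 11 / 11, since the remaining checks already pass.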