result-to-claim

Use when experiments complete to judge what claims the results support, what they don't, and what evidence is still missing. A secondary Codex agent evaluates results against intended claims and routes to next action (pivot, supplement, or confirm). Use after experiments finish — before writing the paper or running ablations.

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A well-sequenced, highly actionable research-gating skill with strong validation checkpoints and feedback loops. Its main weaknesses are token efficiency and progressive disclosure: the long inline bash and prompt blocks and the duplicated field lists could be tightened or extracted into reference files.

Suggestions

De-duplicate the seven evaluation fields: define them once and have the Step 3 parse template reference the Step 2 prompt's field list rather than restating all fields.

Extract the large Step 5 research-wiki bash block into a referenced script (e.g. scripts/update_research_wiki.sh) and keep only the call site and key constraints inline, improving both conciseness and progressive disclosure.

Tighten the inline bash comments to the non-obvious constraints only, trimming the explanatory prose that restates what the commands already show.
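The de-duplication suggestion above can be sketched as a single field list that drives both the reviewer prompt and the reply parser. This is a minimal illustration, not the skill's code: the seven field names are hypothetical placeholders (the review does not list the real ones), and `render_prompt_fields` and `parse_reply` are assumed helper names.

```python
# Hypothetical field names: the review does not list the skill's actual
# seven evaluation fields, so these are placeholders for illustration only.
EVALUATION_FIELDS = (
    "claim_supported",
    "what_results_support",
    "what_results_dont_support",
    "missing_evidence",
    "suggested_claim_revision",
    "next_experiments_needed",
    "confidence",
)


def render_prompt_fields(fields=EVALUATION_FIELDS):
    """Build the field list for the reviewer prompt, one bullet per field."""
    return "\n".join(f"- {name}: <value>" for name in fields)


def parse_reply(reply, fields=EVALUATION_FIELDS):
    """Parse 'name: value' lines from a reply.

    Returns the known fields that were found and the names still missing,
    so the parse step never restates the field list by hand.
    """
    parsed = {}
    for line in reply.splitlines():
        name, sep, value = line.lstrip("- ").partition(":")
        name = name.strip()
        if sep and name in fields:
            parsed[name] = value.strip()
    missing = [name for name in fields if name not in parsed]
    return parsed, missing
```

Because both functions read the same tuple, adding or renaming a field happens in one place, which is the token saving the suggestion is after.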

Dimension	Reasoning	Score
Conciseness	The body is operational and free of basic-concept padding, but at ~200 lines it duplicates the seven-field list between the Step 2 spawn prompt and Step 3 parse template, and the Step 5 bash block carries verbose inline comments that could be tightened.	2 / 3
Actionability	Provides concrete executable guidance — real W&B (`wandb.Api().run(...).history()`) and ssh calls, a complete `spawn_agent` prompt schema, and copy-paste-ready `python3 research_wiki.py` commands with specific flags — rather than vague direction.	3 / 3
Workflow Clarity	Steps are clearly sequenced (Collect → Codex Judgment → Parse → Integrity check → Route → Wiki) with explicit validation (integrity_status fail/warn downgrades confidence), a partial→supplement→re-run feedback loop, and explicit edge-case handling (unavailable reviewer, unreachable wiki script).	3 / 3
Progressive Disclosure	No bundle files exist; the external refs that are present (`shared-references/experiment-integrity.md`, `../shared-references/review-tracing.md`) are well-signaled and one-level-deep, but the large inline Step 5 bash block and spawn prompt are content that could be split into reference files.	2 / 3
	Total	10 / 12 Passed
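The gating behaviour credited under Workflow Clarity (an integrity check that downgrades confidence, followed by routing to pivot, supplement, or confirm) could look roughly like the sketch below. Only `integrity_status`, its `fail` and `warn` values, the `partial` verdict, and the three route names come from this review. The `pass` status, the `yes` and `no` verdicts, the three confidence levels, and the size of each downgrade are assumptions made for illustration.

```python
# Assumed three-level confidence scale, ordered from weakest to strongest.
CONFIDENCE_LEVELS = ["low", "medium", "high"]

# Assumption: "warn" drops one level and "fail" drops two; the review only
# states that both statuses downgrade confidence.
DOWNGRADE_STEPS = {"pass": 0, "warn": 1, "fail": 2}

# Assumed mapping from verdict to route; only "partial" -> "supplement"
# is stated in the review.
ROUTES = {"yes": "confirm", "partial": "supplement", "no": "pivot"}


def downgrade(confidence, steps):
    """Lower a confidence level by `steps`, never going below 'low'."""
    index = CONFIDENCE_LEVELS.index(confidence)
    return CONFIDENCE_LEVELS[max(0, index - steps)]


def route(verdict, confidence, integrity_status):
    """Return (next_action, adjusted_confidence) for one evaluated claim."""
    adjusted = downgrade(confidence, DOWNGRADE_STEPS[integrity_status])
    return ROUTES[verdict], adjusted
```

For example, `route("partial", "high", "warn")` yields the supplement route with confidence lowered to medium, after which the experiment would be supplemented and the evaluation re-run.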

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A strong, specific description that clearly states both the capability and the trigger conditions with concrete routing actions and natural researcher terminology. No vague fluff or over-claims.

Dimension	Reasoning	Score
Specificity	Names multiple concrete actions — "judge what claims the results support, what they don't, and what evidence is still missing", "evaluates results against intended claims", and "routes to next action (pivot, supplement, or confirm)" — matching the score-3 anchor of several specific concrete actions.	3 / 3
Completeness	Explicitly answers both what (judge/evaluate/route results vs intended claims) and when (two explicit "Use when/Use after" clauses), so it is not capped at 2 by the missing-trigger guideline.	3 / 3
Trigger Term Quality	Includes natural researcher phrasing ("Use when experiments complete", "Use after experiments finish — before writing the paper or running ablations", "claims", "evidence is still missing") that a user would naturally say, giving good coverage rather than jargon-only terms.	3 / 3
Distinctiveness / Conflict Risk	It carves a clear research-specific niche (post-experiment claim gating with pivot/supplement/confirm routing) with distinct triggers unlikely to fire for unrelated skills.	3 / 3
	Total	12 / 12 Passed

Validation

87%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 14 / 16 Passed

Validation for skill structure

Criteria	Description	Result
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
	Total	14 / 16 Passed

Repository: wanshuiyin/Auto-claude-code-research-in-sleep
Commit: fe5963c

Reviewed: about 15 hours ago

