paper-claim-audit

Zero-context verification that every number, comparison, and scope claim in the paper matches raw result files. Uses a fresh cross-model reviewer with NO prior context to prevent confirmation bias. Use when user says "审查论文数据", "check paper claims", "verify numbers", "论文数字核对", or before submission to ensure paper-to-evidence fidelity.
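
The core check named in this description, that a number printed in the paper agrees with the value in a raw result file, can be sketched as follows. This is a minimal illustration rather than the skill's implementation: the function name `matches_rounded` and the round-half-up rule are assumptions, and the skill's own rounding rules live in its SKILL.md, which is not reproduced on this page.

```python
from decimal import Decimal, ROUND_HALF_UP


def matches_rounded(reported: str, raw: float) -> bool:
    """Return True if `raw`, rounded to the precision of `reported`, equals `reported`.

    `reported` is the number exactly as printed in the paper (e.g. "84.3").
    The rounding mode (half up) is an assumption made for this sketch.
    """
    reported_dec = Decimal(reported)
    # The exponent of the printed value gives its precision: "84.3" -> 0.1, "85" -> 1.
    quantum = Decimal(1).scaleb(reported_dec.as_tuple().exponent)
    raw_rounded = Decimal(str(raw)).quantize(quantum, rounding=ROUND_HALF_UP)
    return raw_rounded == reported_dec


if __name__ == "__main__":
    print(matches_rounded("84.3", 84.27))  # raw value rounds to the printed one
    print(matches_rounded("84.3", 84.35))  # raw value rounds to 84.4 instead
```

Taking the printed value as a string keeps its trailing precision, so "84.30" and "84.3" are checked at different numbers of decimal places.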

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A highly actionable, well-sequenced audit workflow with concrete prompts, schemas, and explicit validation states. Its main weaknesses are repeated rule restatements across sections and monolithic inlining of large blocks that duplicate referenced material.

Suggestions

Collapse the restated rules into one canonical location (e.g. keep them only in "Key Rules") and reference it from the prompt and Step 1 to remove the repeated exclude-list and rounding/thread/cross-model statements.

Move the full Codex prompt template and the inlined assurance-contract JSON schema into a referenced file (e.g. references/audit-prompt.md) and keep only a short summary plus a link in SKILL.md, avoiding duplication of the shared-references contract.

Dimension	Reasoning	Score
Conciseness	Most of the body is operational and earned, but the same rules are restated in multiple places — the exclude list appears in both "Core Principle" and Step 1, and the rounding/fresh-thread/cross-model rules repeat across the prompt, "Key Rules", and "Thread independence" — which could be tightened.	2 / 3
Actionability	It gives fully executable guidance: the exact MCP tool (mcp__codex__codex, model gpt-5.5, reasoning xhigh), a complete copy-paste prompt template, a concrete JSON artifact schema, file-glob patterns, path conventions, and the /render-html command.	3 / 3
Workflow Clarity	The four-step workflow is clearly sequenced with explicit validation checkpoints — a verdict decision table (PASS/WARN/FAIL/NOT_APPLICABLE/BLOCKED/ERROR) and an external-verifier rehash feedback loop for STALE detection — providing clear error recovery for a batch operation.	3 / 3
Progressive Disclosure	The shared-references/*.md links are well-signaled and one level deep, but the full Codex prompt and the JSON schema are inlined monolithically in SKILL.md, with the schema even duplicating the referenced assurance-contract.md — content that should be separate sits inline.	2 / 3
	Total	10 / 12 Passed
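
The rehash loop for STALE detection mentioned under Workflow Clarity can be illustrated with a short sketch. It assumes the audit artifact records a SHA-256 digest per input file; the names `sha256_of` and `stale_inputs` and the shape of the `recorded` mapping are hypothetical and are not taken from the skill's schema.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_inputs(recorded: dict[str, str], root: Path) -> list[str]:
    """Return the recorded paths whose file is missing or whose hash has changed.

    `recorded` maps a path relative to `root` to the digest stored at audit time
    (a hypothetical layout chosen for this sketch).
    """
    stale = []
    for rel_path, old_hash in recorded.items():
        file_path = root / rel_path
        if not file_path.is_file() or sha256_of(file_path) != old_hash:
            stale.append(rel_path)
    return sorted(stale)
```

A verifier built along these lines would treat an empty list as "audit still current" and any other result as a reason to mark the audit STALE and re-run it.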

Description

82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A strong description that clearly states both capability and explicit use-when triggers with good natural keyword coverage. It is held back from the top band by a single-action specificity profile and modest conflict risk against sibling audit skills.

Suggestions

Add one or two more concrete verbs to the capability clause (e.g. "extracts, traces, and cross-checks each claim against raw files") to reach the multi-action specificity anchor.

Disambiguate the generic trigger "verify numbers" by tying it to the paper context (e.g. "verify paper numbers against raw results") to reduce overlap with /result-to-claim and /experiment-audit.

Dimension	Reasoning	Score
Specificity	It names the domain and core action ("verification that every number, comparison, and scope claim in the paper matches raw result files") and the mechanism ("fresh cross-model reviewer"), but it is essentially a single verification task rather than the multiple distinct concrete actions the score-3 anchor expects.	2 / 3
Completeness	It explicitly answers both what (zero-context verification of paper claims against raw result files) and when ("Use when user says … or before submission"), matching the score-3 anchor's what-and-when-with-explicit-triggers pattern.	3 / 3
Trigger Term Quality	It includes natural phrases a user would actually say — "check paper claims", "verify numbers", "审查论文数据", "论文数字核对", and "before submission" — giving good coverage across English and Chinese.	3 / 3
Distinctiveness / Conflict Risk	The paper-to-evidence niche is fairly distinct, but the body reveals overlapping sibling audit skills (/experiment-audit, /result-to-claim) and the trigger "verify numbers" is generic enough to risk firing for the wrong skill.	2 / 3
	Total	10 / 12 Passed

Validation

81%

Only warnings and errors are listed below; passing checks are omitted.

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 13 / 16 Passed

Validation for skill structure

Criteria	Description	Result
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
relative_links	Relative link issues: 1 suspicious	Warning

	Total	13 / 16 Passed
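
A check like frontmatter_unknown_keys can be approximated with a few lines of Python. This is a sketch under stated assumptions: the set `ASSUMED_KNOWN_KEYS` is hypothetical (the validator's actual list of accepted keys is not shown on this page), and the parser only reads top-level `key: value` lines instead of doing full YAML parsing.

```python
# Assumption: this set is illustrative only, not the validator's real list.
ASSUMED_KNOWN_KEYS = {"name", "description", "allowed-tools", "metadata"}


def top_level_keys(skill_md: str) -> list[str]:
    """Collect top-level keys from the leading '---' delimited frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Top-level keys start in column 0; indented lines, list items, and
        # comments belong to nested values or are ignored.
        if line and not line[0].isspace() and line[0] not in "-#" and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def unknown_keys(skill_md: str, known: set[str] = ASSUMED_KNOWN_KEYS) -> list[str]:
    """Return frontmatter keys that are not in the assumed known set."""
    return [key for key in top_level_keys(skill_md) if key not in known]
```

Moving any reported key under `metadata`, as the warning suggests, would make it disappear from this list because nested lines are skipped.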

Repository: wanshuiyin/Auto-claude-code-research-in-sleep
Commit: 82076e5

Reviewed: 5 days ago

