
paper-claim-audit

Zero-context verification that every number, comparison, and scope claim in the paper matches raw result files. Uses a fresh cross-model reviewer with NO prior context to prevent confirmation bias. Use when the user says "审查论文数据" ("audit paper data"), "check paper claims", "verify numbers", "论文数字核对" ("verify paper numbers"), or before submission to ensure paper-to-evidence fidelity.

Score: 83

Quality: 81% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security by Snyk: Passed (No known issues)


Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly articulates a specific, well-defined capability (verifying paper claims against raw data), explains the unique methodology (fresh cross-model reviewer to prevent confirmation bias), and provides explicit bilingual trigger terms. It covers all dimensions strongly with concrete actions, natural keywords, complete what/when guidance, and a distinctive niche.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: verifying numbers, comparisons, and scope claims against raw result files. Also specifies the mechanism (fresh cross-model reviewer with no prior context) and the purpose (prevent confirmation bias, ensure paper-to-evidence fidelity). | 3 / 3 |
| Completeness | Clearly answers both 'what' (zero-context verification of numbers, comparisons, and scope claims against raw result files using a fresh cross-model reviewer) and 'when' (explicit 'Use when' clause with specific trigger phrases and the situational trigger 'before submission'). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms in both English and Chinese: '审查论文数据', 'check paper claims', 'verify numbers', '论文数字核对', and 'before submission'. These cover natural phrases users would actually say when needing this skill. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: paper data verification against raw evidence files with a cross-model bias-prevention approach. The bilingual trigger terms and specific domain (academic paper claims vs. raw results) make it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation: 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill excels at actionability and workflow clarity with concrete tool invocations, a complete reviewer prompt, detailed output schemas, and a well-sequenced 4-step process. However, it is significantly over-long and verbose — explaining concepts like confirmation bias, repeating the 'fresh thread' rule at least 4 times, and inlining extensive artifact schemas that could be in referenced files. Trimming redundancy and moving reference material to bundle files would significantly improve this skill.

Suggestions

- Remove the 'Why This Exists' section and the comparison table — Claude doesn't need motivation or skill differentiation explanations to execute the workflow.
- Consolidate the repeated 'fresh thread / never codex-reply / zero context' instructions into a single prominent rule rather than restating it in Core Principle, Key Rules, Thread Independence, and the workflow step.
- Move the full JSON schema, path conventions, and verdict decision table into a referenced bundle file (e.g., `paper-claim-audit-schema.md`) and keep only a brief summary inline.
- Remove explanatory text like 'This is stricter than reviewer-independence — it's zero-context evidence audit' which adds no actionable value.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is extremely verbose at ~250+ lines. It over-explains why zero-context matters, includes a comparison table with other skills, repeats the 'fresh thread' rule multiple times, explains confirmation bias concepts Claude already understands, and has extensive sections on submission artifacts and path conventions that could be in a referenced file. The 'Why This Exists' section and failure mode explanations are largely unnecessary padding. | 1 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific file paths to collect, exact MCP tool invocations with parameters, a complete structured prompt for the reviewer, specific output formats (both markdown and JSON with full schema), and a clear verdict decision table. The audit protocol with specific failure modes and examples (84.7% → 85.3%) is very actionable. | 3 / 3 |
| Workflow Clarity | The 4-step workflow is clearly sequenced (Collect → Audit → Report → Summary) with explicit validation built into the process itself (the entire skill IS a validation checkpoint). The verdict decision table provides clear branching logic, and the advisory-never-blocking pattern with PASS/WARN/FAIL handling is well-defined with explicit feedback loops for integration with other skills. | 3 / 3 |
| Progressive Disclosure | The skill references external files like `shared-references/review-tracing.md`, `shared-references/assurance-contract.md`, `shared-references/reviewer-independence.md`, and `tools/save_trace.sh`, which is good progressive disclosure. However, no bundle files are provided, and the main SKILL.md itself is monolithic — the detailed JSON schema, path conventions, and verdict decision table could be split into referenced files rather than inlined, making the core workflow harder to scan. | 2 / 3 |
| Total | | 9 / 12 |

Passed
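The advisory PASS/WARN/FAIL branching praised above can be pictured as a small verdict function. This is a minimal sketch under assumptions: the function name, the finding severities, and the mapping are illustrative, not the skill's actual verdict decision table.

```python
# Hypothetical sketch of an advisory verdict for a paper-claim audit.
# The severity categories ("mismatch", "minor", "ok") are assumptions
# for illustration; the skill's real decision table may differ.

def audit_verdict(findings: list[dict]) -> str:
    """Map audit findings to an advisory PASS/WARN/FAIL verdict.

    Each finding is a dict like {"severity": "mismatch" | "minor" | "ok"}.
    The verdict is advisory, never blocking: it reports, the caller decides.
    """
    severities = {f["severity"] for f in findings}
    if "mismatch" in severities:   # a paper claim contradicts the raw results
        return "FAIL"
    if "minor" in severities:      # a tolerable discrepancy worth flagging
        return "WARN"
    return "PASS"                  # every checked claim matched the evidence

print(audit_verdict([{"severity": "ok"}]))                         # PASS
print(audit_verdict([{"severity": "ok"}, {"severity": "minor"}]))  # WARN
print(audit_verdict([{"severity": "mismatch"}]))                   # FAIL
```

Ordering the checks from most to least severe means a single mismatch dominates any number of minor findings, which matches the "clear branching logic" the review describes.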

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 |

Passed
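Both warnings concern the SKILL.md frontmatter. A header that would likely clear them might look like the following sketch; the field values shown are assumptions based on common skill conventions, not the audited file's actual contents.

```yaml
# Hypothetical SKILL.md frontmatter, for illustration only.
# Extra keys live under `metadata` instead of as unknown top-level keys,
# and `allowed-tools` lists only standard tool names.
name: paper-claim-audit
description: Zero-context verification that paper claims match raw result files.
allowed-tools: Read, Grep, Bash
metadata:
  language: bilingual (en/zh)
```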

Repository: wanshuiyin/Auto-claude-code-research-in-sleep (Reviewed)
