Zero-context verification that every number, comparison, and scope claim in the paper matches raw result files. Uses a fresh cross-model reviewer with NO prior context to prevent confirmation bias. Use when user says "审查论文数据", "check paper claims", "verify numbers", "论文数字核对", or before submission to ensure paper-to-evidence fidelity.
64
77%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/paper-claim-audit/SKILL.md
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly articulates a specific, well-defined capability (verifying paper claims against raw data), explains the unique methodology (fresh cross-model reviewer to prevent confirmation bias), and provides explicit bilingual trigger terms. It scores highly across all dimensions with strong specificity, natural trigger coverage, completeness, and distinctiveness.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: verifying numbers, comparisons, and scope claims against raw result files. Also specifies the mechanism (fresh cross-model reviewer with no prior context) and the purpose (prevent confirmation bias, ensure paper-to-evidence fidelity). | 3 / 3 |
| Completeness | Clearly answers both 'what' (zero-context verification of numbers, comparisons, and scope claims against raw result files using a fresh cross-model reviewer) and 'when' (explicit 'Use when' clause with specific trigger phrases and the scenario 'before submission'). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms in both English and Chinese: '审查论文数据', 'check paper claims', 'verify numbers', '论文数字核对', and 'before submission'. These cover natural phrases users would actually say when needing this skill. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: paper data verification against raw evidence files with a cross-model bias-prevention approach. The bilingual trigger terms and specific focus on numerical claim verification make it very unlikely to conflict with general writing, editing, or code review skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
55%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides excellent actionability with concrete tools, schemas, and a well-structured workflow, but is severely undermined by verbosity and poor progressive disclosure. It repeats key rules (fresh thread, zero context) multiple times, explains concepts Claude already understands (confirmation bias, why fresh context matters), and inlines extensive schema and protocol details that should be in separate reference files. The content could likely be cut by 50-60% without losing any actionable information.
Suggestions
Remove the 'Why This Exists' section and the comparison table — Claude doesn't need motivation or skill differentiation explanations. A single sentence like 'Fresh zero-context cross-model audit to catch executor confirmation bias' suffices.
Move the full JSON schema, verdict decision table, and path convention details into a referenced file (e.g., PAPER_CLAIM_AUDIT_SCHEMA.md) and keep only a brief summary inline.
Consolidate the repeated 'fresh thread' / 'zero context' / 'never codex-reply' rules into a single 'Key Rules' section instead of restating them in Core Principle, Key Rules, Thread Independence, and the reviewer prompt.
Move the detailed reviewer prompt template to a separate file (e.g., REVIEWER_PROMPT.md) since it's ~40 lines of content that only needs to be referenced, not read every time the skill is loaded.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~300+ lines. It explains why confirmation bias exists (Claude knows this), includes a comparison table with other skills, repeats the 'fresh thread' rule 4+ times across different sections, and has extensive schema documentation that could be in a referenced file. The 'Why This Exists' section and failure mode explanations are largely unnecessary for Claude. | 1 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific MCP tool calls with parameters, exact file paths to collect and exclude, a complete JSON schema for the output artifact, a detailed prompt template for the reviewer, and specific examples of failure modes with numeric thresholds (e.g., 84.7% → 85.3% is NOT OK). | 3 / 3 |
| Workflow Clarity | The 4-step workflow is clearly sequenced (Collect → Audit → Report → Summary) with explicit validation built into the audit protocol. The verdict decision table provides clear branching logic. Error recovery is addressed (non-blocking HTML render, ERROR verdict for failed reviewer calls). The feedback loop with /auto-paper-improvement-loop is well-defined. | 3 / 3 |
| Progressive Disclosure | Despite being extremely long, the skill is a monolithic wall of text with no bundle files to offload content to. The full JSON schema, the complete reviewer prompt, the verdict decision table, the path convention details, and the audit protocol could all be in referenced files. References to shared-references/ files exist but no bundle files are provided, and the inline content is not appropriately split. | 1 / 3 |
| Total | | 8 / 12 Passed |
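The Actionability row cites a numeric failure mode (84.7% → 85.3% is NOT OK). As a rough illustration of that kind of check, here is a minimal Python sketch. It assumes a claim is acceptable only when the raw value, rounded to the precision the paper reports, equals the claimed value. The function name `claim_matches` and the half-up rounding rule are hypothetical; the review does not show the skill's actual audit protocol.

```python
from decimal import Decimal, ROUND_HALF_UP


def claim_matches(raw: str, claimed: str) -> bool:
    """Return True if `claimed` equals `raw` rounded to the claim's precision.

    Hypothetical helper: the rounding rule (half-up) is an assumption,
    not something taken from the skill under review.
    """
    claimed_dec = Decimal(claimed)
    # Build a quantum with the same number of decimal places as the claim,
    # e.g. "85.3" -> Decimal("0.1"), "85" -> Decimal("1").
    quantum = Decimal(1).scaleb(claimed_dec.as_tuple().exponent)
    rounded_raw = Decimal(raw).quantize(quantum, rounding=ROUND_HALF_UP)
    return rounded_raw == claimed_dec


if __name__ == "__main__":
    print(claim_matches("84.68", "84.7"))  # raw value rounds to the claim
    print(claim_matches("84.7", "85.3"))   # the mismatch cited in the review
```

Values are passed as strings so that `Decimal` preserves the precision exactly as written in the paper and the result file, instead of inheriting binary floating-point error.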
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
66b974e