Zero-context verification that every number, comparison, and scope claim in the paper matches raw result files. Uses a fresh cross-model reviewer with NO prior context to prevent confirmation bias. Use when user says "审查论文数据", "check paper claims", "verify numbers", "论文数字核对", or before submission to ensure paper-to-evidence fidelity.
**Does it follow best practices?**

- Impact: Pending (no eval scenarios have been run)
- Quality: Passed (no known issues)
## Discovery (100%)

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly articulates a specific, well-defined capability (verifying paper claims against raw data), explains the unique methodology (fresh cross-model reviewer to prevent confirmation bias), and provides explicit bilingual trigger terms. It covers all dimensions strongly with concrete actions, natural keywords, complete what/when guidance, and a distinctive niche.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: verifying numbers, comparisons, and scope claims against raw result files. Also specifies the mechanism (fresh cross-model reviewer with no prior context) and the purpose (prevent confirmation bias, ensure paper-to-evidence fidelity). | 3 / 3 |
| Completeness | Clearly answers both 'what' (zero-context verification of numbers, comparisons, and scope claims against raw result files using a fresh cross-model reviewer) and 'when' (explicit 'Use when' clause with specific trigger phrases and the situational trigger 'before submission'). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms in both English and Chinese: '审查论文数据', 'check paper claims', 'verify numbers', '论文数字核对', and 'before submission'. These cover natural phrases users would actually say when needing this skill. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: paper data verification against raw evidence files with a cross-model bias-prevention approach. The bilingual trigger terms and specific domain (academic paper claims vs. raw results) make it very unlikely to conflict with other skills. | 3 / 3 |
| **Total** | | **12 / 12 (Passed)** |
## Implementation (62%)

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill excels at actionability and workflow clarity with concrete tool invocations, a complete reviewer prompt, detailed output schemas, and a well-sequenced 4-step process. However, it is significantly over-long and verbose — explaining concepts like confirmation bias, repeating the 'fresh thread' rule at least 4 times, and inlining extensive artifact schemas that could be in referenced files. Trimming redundancy and moving reference material to bundle files would significantly improve this skill.
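As a rough illustration of the Collect → Audit → Report → Summary sequence and the advisory PASS/WARN/FAIL handling praised above, a minimal sketch is given below. All function names and record shapes are hypothetical assumptions of this sketch, not the skill's actual schema or API.

```python
# Hypothetical sketch of the audited workflow; names and record shapes
# are illustrative assumptions, not the skill's actual schema.

def audit_claims(claims, evidence):
    """Audit step: compare each paper claim to its raw evidence value."""
    findings = []
    for claim in claims:
        raw = evidence.get(claim["file"], {}).get(claim["metric"])
        if raw is None:
            findings.append({**claim, "verdict": "WARN",
                             "note": "no matching evidence found"})
        elif raw != claim["value"]:
            findings.append({**claim, "verdict": "FAIL",
                             "note": f"paper says {claim['value']}, raw file says {raw}"})
        else:
            findings.append({**claim, "verdict": "PASS", "note": ""})
    return findings

def overall_verdict(findings):
    """Summary step: advisory, never blocking; the worst finding wins."""
    order = {"PASS": 0, "WARN": 1, "FAIL": 2}
    worst = max(findings, key=lambda f: order[f["verdict"]], default=None)
    return worst["verdict"] if worst else "PASS"
```

The advisory pattern means the caller reports the verdict and hands FAIL/WARN findings back to the author rather than halting any pipeline.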
### Suggestions

- Remove the 'Why This Exists' section and the comparison table; Claude doesn't need motivation or skill-differentiation explanations to execute the workflow.
- Consolidate the repeated 'fresh thread / never codex-reply / zero context' instructions into a single prominent rule rather than restating it in Core Principle, Key Rules, Thread Independence, and the workflow step.
- Move the full JSON schema, path conventions, and verdict decision table into a referenced bundle file (e.g., `paper-claim-audit-schema.md`) and keep only a brief summary inline.
- Remove explanatory text like 'This is stricter than reviewer-independence — it's zero-context evidence audit', which adds no actionable value.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~250+ lines. It over-explains why zero-context matters, includes a comparison table with other skills, repeats the 'fresh thread' rule multiple times, explains confirmation bias concepts Claude already understands, and has extensive sections on submission artifacts and path conventions that could be in a referenced file. The 'Why This Exists' section and failure mode explanations are largely unnecessary padding. | 1 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific file paths to collect, exact MCP tool invocations with parameters, a complete structured prompt for the reviewer, specific output formats (both markdown and JSON with full schema), and a clear verdict decision table. The audit protocol with specific failure modes and examples (84.7% → 85.3%) is very actionable. | 3 / 3 |
| Workflow Clarity | The 4-step workflow is clearly sequenced (Collect → Audit → Report → Summary) with explicit validation built into the process itself (the entire skill IS a validation checkpoint). The verdict decision table provides clear branching logic, and the advisory-never-blocking pattern with PASS/WARN/FAIL handling is well-defined with explicit feedback loops for integration with other skills. | 3 / 3 |
| Progressive Disclosure | The skill references external files like `shared-references/review-tracing.md`, `shared-references/assurance-contract.md`, `shared-references/reviewer-independence.md`, and `tools/save_trace.sh`, which is good progressive disclosure. However, no bundle files are provided, and the main SKILL.md itself is monolithic: the detailed JSON schema, path conventions, and verdict decision table could be split into referenced files rather than inlined, making the core workflow harder to scan. | 2 / 3 |
| **Total** | | **9 / 12 (Passed)** |
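The mismatch example cited under Actionability (a paper reporting 84.7% where the raw file holds 85.3%) amounts to a numeric comparison with a rounding tolerance. A minimal sketch follows; the tolerance value is an assumption of this sketch, not anything the skill specifies.

```python
def claim_matches(claimed: float, raw: float, tol: float = 0.05) -> bool:
    """True when the paper's figure is within a rounding tolerance of the
    raw result. tol=0.05 is an assumed half-unit bound for values rounded
    to one decimal place."""
    return abs(claimed - raw) <= tol

claim_matches(84.7, 85.3)   # False: off by 0.6, far beyond rounding error
claim_matches(84.7, 84.74)  # True: consistent with rounding to one decimal
```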
## Validation (81%)

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
**9 / 11 checks passed**

### Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | `allowed-tools` contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | | **9 / 11 passed** |