
paper-claim-audit

Zero-context verification that every number, comparison, and scope claim in the paper matches raw result files. Uses a fresh cross-model reviewer with NO prior context to prevent confirmation bias. Use when the user says "审查论文数据" ("audit paper data"), "check paper claims", "verify numbers", "论文数字核对" ("verify paper numbers"), or before submission to ensure paper-to-evidence fidelity.

Score: 83

Quality: 81% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security by Snyk: Passed (No known issues)


Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly articulates a specific, well-defined capability (verifying paper claims against raw data), explains the unique methodology (fresh cross-model reviewer to prevent confirmation bias), and provides explicit bilingual trigger terms. It covers all dimensions strongly with concrete actions, natural keywords, complete what/when guidance, and a distinctive niche.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: verifying numbers, comparisons, and scope claims against raw result files. Also specifies the mechanism (fresh cross-model reviewer with no prior context) and the purpose (prevent confirmation bias, ensure paper-to-evidence fidelity). | 3 / 3 |
| Completeness | Clearly answers both 'what' (zero-context verification of numbers, comparisons, and scope claims against raw result files using a fresh cross-model reviewer) and 'when' (explicit 'Use when' clause with specific trigger phrases and the situational trigger 'before submission'). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms in both English and Chinese: '审查论文数据', 'check paper claims', 'verify numbers', '论文数字核对', and 'before submission'. These cover natural phrases users would actually say when needing this skill. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: paper data verification against raw evidence files with a cross-model bias-prevention approach. The bilingual trigger terms and specific domain (academic paper claims vs. raw results) make it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation: 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill excels at actionability and workflow clarity with concrete tool invocations, a complete reviewer prompt, detailed output schemas, and a well-sequenced 4-step process. However, it is significantly over-long and verbose — explaining concepts like confirmation bias, repeating the 'fresh thread' rule at least 4 times, and inlining extensive artifact schemas that could be in referenced files. Trimming redundancy and moving reference material to bundle files would significantly improve this skill.

Suggestions

- Remove the 'Why This Exists' section and the comparison table — Claude doesn't need motivation or skill differentiation explanations to execute the workflow.
- Consolidate the repeated 'fresh thread / never codex-reply / zero context' instructions into a single prominent rule rather than restating it in Core Principle, Key Rules, Thread Independence, and the workflow step.
- Move the full JSON schema, path conventions, and verdict decision table into a referenced bundle file (e.g., `paper-claim-audit-schema.md`) and keep only a brief summary inline.
- Remove explanatory text like 'This is stricter than reviewer-independence — it's zero-context evidence audit' which adds no actionable value.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is extremely verbose at ~250+ lines. It over-explains why zero-context matters, includes a comparison table with other skills, repeats the 'fresh thread' rule multiple times, explains confirmation bias concepts Claude already understands, and has extensive sections on submission artifacts and path conventions that could be in a referenced file. The 'Why This Exists' section and failure mode explanations are largely unnecessary padding. | 1 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific file paths to collect, exact MCP tool invocations with parameters, a complete structured prompt for the reviewer, specific output formats (both markdown and JSON with full schema), and a clear verdict decision table. The audit protocol with specific failure modes and examples (84.7% → 85.3%) is very actionable. | 3 / 3 |
| Workflow Clarity | The 4-step workflow is clearly sequenced (Collect → Audit → Report → Summary) with explicit validation built into the process itself (the entire skill IS a validation checkpoint). The verdict decision table provides clear branching logic, and the advisory-never-blocking pattern with PASS/WARN/FAIL handling is well-defined with explicit feedback loops for integration with other skills. | 3 / 3 |
| Progressive Disclosure | The skill references external files like `shared-references/review-tracing.md`, `shared-references/assurance-contract.md`, `shared-references/reviewer-independence.md`, and `tools/save_trace.sh`, which is good progressive disclosure. However, no bundle files are provided, and the main SKILL.md itself is monolithic — the detailed JSON schema, path conventions, and verdict decision table could be split into referenced files rather than inlined, making the core workflow harder to scan. | 2 / 3 |
| Total | | 9 / 12 |

Passed
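The advisory PASS/WARN/FAIL branching praised above can be pictured as a small verdict function. This is a minimal sketch under assumptions: the function name, the finding severities, and the mapping are illustrative, not the skill's actual verdict decision table.

```python
# Hypothetical sketch of an advisory verdict for a paper-claim audit.
# The severity categories ("mismatch", "minor", "ok") are assumptions
# for illustration; the skill's real decision table may differ.

def audit_verdict(findings: list[dict]) -> str:
    """Map audit findings to an advisory PASS/WARN/FAIL verdict.

    Each finding is a dict like {"severity": "mismatch" | "minor" | "ok"}.
    The verdict is advisory, never blocking: it reports, the caller decides.
    """
    severities = {f["severity"] for f in findings}
    if "mismatch" in severities:   # a paper claim contradicts the raw results
        return "FAIL"
    if "minor" in severities:      # a tolerable discrepancy worth flagging
        return "WARN"
    return "PASS"                  # every checked claim matched the evidence

print(audit_verdict([{"severity": "ok"}]))                         # PASS
print(audit_verdict([{"severity": "ok"}, {"severity": "minor"}]))  # WARN
print(audit_verdict([{"severity": "mismatch"}]))                   # FAIL
```

Ordering the checks from most to least severe means a single mismatch dominates any number of minor findings, which matches the "clear branching logic" the review describes.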

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 |

Passed
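Both warnings concern the SKILL.md frontmatter. A header that would likely clear them might look like the following sketch; the field values shown are assumptions based on common skill conventions, not the audited file's actual contents.

```yaml
# Hypothetical SKILL.md frontmatter, for illustration only.
# Extra keys live under `metadata` instead of as unknown top-level keys,
# and `allowed-tools` lists only standard tool names.
name: paper-claim-audit
description: Zero-context verification that paper claims match raw result files.
allowed-tools: Read, Grep, Bash
metadata:
  language: bilingual (en/zh)
```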

Repository: wanshuiyin/Auto-claude-code-research-in-sleep (Reviewed)
