Audit experiment integrity before claiming results. Uses cross-model review (GPT-5.4) to check for fake ground truth, score normalization fraud, phantom results, and insufficient scope. Use when user says "审计实验", "check experiment integrity", "audit results", "实验诚实度", or after experiments complete before writing claims.
Overall score: 89 (88%). Does it follow best practices?

Impact: Pending (no eval scenarios have been run).
Risky: do not use without reviewing.
Quality
Discovery
100% — Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly defines a specialized niche (experiment integrity auditing), lists concrete actions (checking for fake ground truth, score normalization fraud, phantom results, insufficient scope), and provides explicit bilingual trigger terms with both keyword and situational triggers. The description is concise, uses third-person voice, and would be easily distinguishable from other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: cross-model review using GPT-5.4, checking for fake ground truth, score normalization fraud, phantom results, and insufficient scope. These are highly specific audit capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (audit experiment integrity via cross-model review checking for specific fraud types) and 'when' (explicit 'Use when' clause with specific trigger phrases and a situational trigger). | 3 / 3 |
| Trigger Term Quality | Includes natural trigger terms in both Chinese and English: '审计实验', 'check experiment integrity', 'audit results', '实验诚实度', plus a situational trigger 'after experiments complete before writing claims'. Good coverage of how users would naturally invoke this. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: experiment integrity auditing with specific fraud detection types and cross-model review. Very unlikely to conflict with other skills due to the specialized domain and unique trigger terms. | 3 / 3 |
| Total | | 12 / 12 Passed |
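The bilingual trigger terms praised above can be made concrete with a small sketch. This is a hypothetical illustration of how an agent-side router might match those triggers against a user request; the trigger list comes from the skill description, but the matching logic is an assumption, not the registry's actual implementation.

```python
# Trigger phrases copied from the skill description; the substring
# matcher below is an illustrative assumption, not the real router.
TRIGGERS = [
    "审计实验",
    "check experiment integrity",
    "audit results",
    "实验诚实度",
]

def matches_skill(user_query: str) -> bool:
    """Return True if any trigger phrase appears in the query."""
    q = user_query.lower()
    return any(t.lower() in q for t in TRIGGERS)

print(matches_skill("Please check experiment integrity before we publish"))  # True
print(matches_skill("帮我审计实验结果"))  # True
```

A real registry would likely use embedding similarity rather than substring matching, but substring checks show why explicit, distinctive trigger phrases score well on discovery.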
Implementation
77% — Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, highly actionable skill with a clear multi-step workflow and explicit validation criteria. Its main weakness is moderate verbosity — the motivational section, integration details for hypothetical sibling skills, and acknowledgements add tokens without proportional value. The references to external shared files suggest good progressive disclosure intent, but without bundle files to verify, the actual navigation story is incomplete.
Suggestions
- Trim the 'Why This Exists' section to 1-2 sentences — Claude doesn't need a detailed explanation of why fraud patterns occur, just what to check for.
- Move the 'Integration with Other Skills' section to a separate file (e.g., INTEGRATION.md) and reference it with a one-line link, since it's only relevant when used within a larger pipeline.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The 'Why This Exists' section explaining fraud patterns is useful context but somewhat verbose. The 'Acknowledgements' section and some integration details (read by /paper-write, /result-to-claim) add tokens that may not be essential. The core workflow is reasonably efficient but the overall document is longer than necessary. | 2 / 3 |
| Actionability | The skill provides fully concrete, executable guidance: specific file glob patterns for artifact collection, a complete MCP tool invocation with exact prompt text, detailed output templates for both markdown and JSON reports, and a precise emoji-formatted summary. Every step has copy-paste ready content. | 3 / 3 |
| Workflow Clarity | The 4-step workflow is clearly sequenced with explicit separation of executor vs reviewer roles. The audit checklist has clear PASS/WARN/FAIL criteria for each check. The integration section shows how results flow into downstream skills with explicit conditional logic. The reviewer independence constraint serves as a validation checkpoint. | 3 / 3 |
| Progressive Disclosure | The skill references several external files (shared-references/reviewer-independence.md, shared-references/experiment-integrity.md, shared-references/reviewer-routing.md, shared-references/review-tracing.md, tools/save_trace.sh), which is good progressive disclosure design, but no bundle files are provided to verify these exist. The main SKILL.md itself is quite long, with integration details that could potentially be split out, and the inline MCP prompt is very large. | 2 / 3 |
| Total | | 10 / 12 Passed |
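The workflow the table describes — glob for artifacts, run named integrity checks, emit a JSON report with PASS/WARN/FAIL per check — can be sketched as follows. The check names mirror the skill's fraud categories; the file patterns and placeholder verdict logic are illustrative assumptions, since the skill's real auditor sends artifacts to a reviewer model.

```python
import glob
import json

# Check names taken from the skill's fraud categories.
CHECKS = [
    "fake_ground_truth",
    "score_normalization_fraud",
    "phantom_results",
    "insufficient_scope",
]

def collect_artifacts(patterns=("results/**/*.json", "logs/**/*.log")):
    """Gather experiment artifacts; the glob patterns are assumptions."""
    files = []
    for p in patterns:
        files.extend(glob.glob(p, recursive=True))
    return files

def audit(artifacts):
    """Build a PASS/WARN/FAIL report keyed by check name.

    Placeholder verdicts only: a real auditor would route the artifacts
    to an independent reviewer model and parse its findings per check.
    """
    report = {check: {"status": "PASS", "evidence": []} for check in CHECKS}
    if not artifacts:
        for check in report.values():
            check["status"] = "WARN"
            check["evidence"].append("no artifacts found")
    return report

print(json.dumps(audit(collect_artifacts()), indent=2))
```

Keeping the machine-readable JSON report separate from the human-readable markdown summary is what lets downstream skills apply the conditional logic the review mentions.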
Validation
81% — Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
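The two warnings above can be reproduced with a minimal validator sketch. The known-key and known-tool sets below are assumptions for illustration, not the registry's real spec, which is why both checks emit warnings rather than failures.

```python
# Assumed allow-lists; the real validator's spec may differ.
KNOWN_KEYS = {"name", "description", "allowed-tools", "metadata"}
KNOWN_TOOLS = {"Read", "Write", "Bash", "Glob", "Grep"}

def validate_frontmatter(fm: dict) -> list[str]:
    """Return warning strings for unknown keys and unusual tool names."""
    warnings = []
    for key in fm:
        if key not in KNOWN_KEYS:
            warnings.append(f"frontmatter_unknown_keys: '{key}'")
    for tool in fm.get("allowed-tools", []):
        if tool not in KNOWN_TOOLS:
            warnings.append(f"allowed_tools_field: unusual tool '{tool}'")
    return warnings
```

Run against a frontmatter dict with a stray key and a custom MCP tool name, this yields exactly the two warnings the table reports.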