Audit experiment integrity before claiming results. Uses cross-model review (GPT-5.4) to check for fake ground truth, score normalization fraud, phantom results, and insufficient scope. Use when user says "审计实验", "check experiment integrity", "audit results", "实验诚实度", or after experiments complete before writing claims.
Overall score: 89 (88%). Does it follow best practices?

Impact: Pending (no eval scenarios have been run).
Risky: do not use without reviewing.
Quality
Discovery
100% — Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly defines a specialized niche (experiment integrity auditing), lists concrete actions (checking for fake ground truth, score normalization fraud, phantom results, insufficient scope), and provides explicit bilingual trigger terms with both keyword and situational triggers. The description is concise, uses third-person voice, and would be easily distinguishable from other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: cross-model review using GPT-5.4, checking for fake ground truth, score normalization fraud, phantom results, and insufficient scope. These are highly specific audit capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (audit experiment integrity via cross-model review checking for specific fraud types) and 'when' (explicit 'Use when' clause with specific trigger phrases and a situational trigger). | 3 / 3 |
| Trigger Term Quality | Includes natural trigger terms in both Chinese and English: '审计实验', 'check experiment integrity', 'audit results', '实验诚实度', plus a situational trigger 'after experiments complete before writing claims'. Good coverage of how users would naturally invoke this. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: experiment integrity auditing with specific fraud detection types and cross-model review. Very unlikely to conflict with other skills due to the specialized domain and unique trigger terms. | 3 / 3 |
| Total | | 12 / 12 Passed |
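The bilingual trigger terms praised above can be made concrete with a small sketch. This is a hypothetical illustration of how an agent-side router might match those triggers against a user request; the trigger list comes from the skill description, but the matching logic is an assumption, not the registry's actual implementation.

```python
# Trigger phrases copied from the skill description; the substring
# matcher below is an illustrative assumption, not the real router.
TRIGGERS = [
    "审计实验",
    "check experiment integrity",
    "audit results",
    "实验诚实度",
]

def matches_skill(user_query: str) -> bool:
    """Return True if any trigger phrase appears in the query."""
    q = user_query.lower()
    return any(t.lower() in q for t in TRIGGERS)

print(matches_skill("Please check experiment integrity before we publish"))  # True
print(matches_skill("帮我审计实验结果"))  # True
```

A real registry would likely use embedding similarity rather than substring matching, but substring checks show why explicit, distinctive trigger phrases score well on discovery.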
Implementation
77% — Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, highly actionable skill with a clear multi-step workflow and explicit validation criteria. Its main weakness is moderate verbosity — the motivational section, integration details for hypothetical sibling skills, and acknowledgements add tokens without proportional value. The references to external shared files suggest good progressive disclosure intent, but without bundle files to verify, the actual navigation story is incomplete.
Suggestions
- Trim the 'Why This Exists' section to 1-2 sentences — Claude doesn't need a detailed explanation of why fraud patterns occur, just what to check for.
- Move the 'Integration with Other Skills' section to a separate file (e.g., INTEGRATION.md) and reference it with a one-line link, since it's only relevant when used within a larger pipeline.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The 'Why This Exists' section explaining fraud patterns is useful context but somewhat verbose. The 'Acknowledgements' section and some integration details (read by /paper-write, /result-to-claim) add tokens that may not be essential. The core workflow is reasonably efficient but the overall document is longer than necessary. | 2 / 3 |
| Actionability | The skill provides fully concrete, executable guidance: specific file glob patterns for artifact collection, a complete MCP tool invocation with exact prompt text, detailed output templates for both markdown and JSON reports, and a precise emoji-formatted summary. Every step has copy-paste ready content. | 3 / 3 |
| Workflow Clarity | The 4-step workflow is clearly sequenced with explicit separation of executor vs reviewer roles. The audit checklist has clear PASS/WARN/FAIL criteria for each check. The integration section shows how results flow into downstream skills with explicit conditional logic. The reviewer independence constraint serves as a validation checkpoint. | 3 / 3 |
| Progressive Disclosure | The skill references several external files (shared-references/reviewer-independence.md, shared-references/experiment-integrity.md, shared-references/reviewer-routing.md, shared-references/review-tracing.md, tools/save_trace.sh), which is good progressive disclosure design, but no bundle files are provided to verify these exist. The main SKILL.md itself is quite long, with integration details that could potentially be split out, and the inline MCP prompt is very large. | 2 / 3 |
| Total | | 10 / 12 Passed |
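The workflow the table describes — glob for artifacts, run named integrity checks, emit a JSON report with PASS/WARN/FAIL per check — can be sketched as follows. The check names mirror the skill's fraud categories; the file patterns and placeholder verdict logic are illustrative assumptions, since the skill's real auditor sends artifacts to a reviewer model.

```python
import glob
import json

# Check names taken from the skill's fraud categories.
CHECKS = [
    "fake_ground_truth",
    "score_normalization_fraud",
    "phantom_results",
    "insufficient_scope",
]

def collect_artifacts(patterns=("results/**/*.json", "logs/**/*.log")):
    """Gather experiment artifacts; the glob patterns are assumptions."""
    files = []
    for p in patterns:
        files.extend(glob.glob(p, recursive=True))
    return files

def audit(artifacts):
    """Build a PASS/WARN/FAIL report keyed by check name.

    Placeholder verdicts only: a real auditor would route the artifacts
    to an independent reviewer model and parse its findings per check.
    """
    report = {check: {"status": "PASS", "evidence": []} for check in CHECKS}
    if not artifacts:
        for check in report.values():
            check["status"] = "WARN"
            check["evidence"].append("no artifacts found")
    return report

print(json.dumps(audit(collect_artifacts()), indent=2))
```

Keeping the machine-readable JSON report separate from the human-readable markdown summary is what lets downstream skills apply the conditional logic the review mentions.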
Validation
81% — Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
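The two warnings above can be reproduced with a minimal validator sketch. The known-key and known-tool sets below are assumptions for illustration, not the registry's real spec, which is why both checks emit warnings rather than failures.

```python
# Assumed allow-lists; the real validator's spec may differ.
KNOWN_KEYS = {"name", "description", "allowed-tools", "metadata"}
KNOWN_TOOLS = {"Read", "Write", "Bash", "Glob", "Grep"}

def validate_frontmatter(fm: dict) -> list[str]:
    """Return warning strings for unknown keys and unusual tool names."""
    warnings = []
    for key in fm:
        if key not in KNOWN_KEYS:
            warnings.append(f"frontmatter_unknown_keys: '{key}'")
    for tool in fm.get("allowed-tools", []):
        if tool not in KNOWN_TOOLS:
            warnings.append(f"allowed_tools_field: unusual tool '{tool}'")
    return warnings
```

Run against a frontmatter dict with a stray key and a custom MCP tool name, this yields exactly the two warnings the table reports.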