
experiment-audit

Audit experiment integrity before claiming results. Uses cross-model review (GPT-5.4) to check for fake ground truth, score normalization fraud, phantom results, and insufficient scope. Use when user says "审计实验", "check experiment integrity", "audit results", "实验诚实度", or after experiments complete before writing claims.

Score: 89

Quality: 88% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security (by Snyk): Risky (Do not use without reviewing)

Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly defines a specialized niche (experiment integrity auditing), lists concrete actions (checking for fake ground truth, score normalization fraud, phantom results, insufficient scope), and provides explicit bilingual trigger terms with both keyword and situational triggers. The description is concise, uses third-person voice, and would be easily distinguishable from other skills.
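The four integrity checks named in the description can be sketched as data, e.g. in Python. This is a hypothetical illustration: the check names and questions below are paraphrased assumptions, not the skill's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch of the four audit categories named in the skill
# description; the questions are paraphrased assumptions, not the
# skill's actual checklist wording.
@dataclass(frozen=True)
class IntegrityCheck:
    name: str
    question: str

CHECKS = [
    IntegrityCheck("fake_ground_truth",
                   "Was the ground truth produced by the system under test?"),
    IntegrityCheck("score_normalization_fraud",
                   "Were scores rescaled or clipped in a way that inflates results?"),
    IntegrityCheck("phantom_results",
                   "Does every claimed number trace back to a real artifact?"),
    IntegrityCheck("insufficient_scope",
                   "Are the sample size and coverage enough to support the claim?"),
]

for check in CHECKS:
    print(f"{check.name}: {check.question}")
```

Modeling the checks as named records like this is one reason the description scores well on specificity: each category maps to a concrete question an auditor can answer.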

Dimension scores:

Specificity (3 / 3)

Lists multiple specific concrete actions: cross-model review using GPT-5.4, checking for fake ground truth, score normalization fraud, phantom results, and insufficient scope. These are highly specific audit capabilities.

Completeness (3 / 3)

Clearly answers both 'what' (audit experiment integrity via cross-model review checking for specific fraud types) and 'when' (explicit 'Use when' clause with specific trigger phrases and a situational trigger).

Trigger Term Quality (3 / 3)

Includes natural trigger terms in both Chinese and English: '审计实验', 'check experiment integrity', 'audit results', '实验诚实度', plus a situational trigger, 'after experiments complete before writing claims'. Good coverage of how users would naturally invoke this.

Distinctiveness / Conflict Risk (3 / 3)

Highly distinctive niche: experiment integrity auditing with specific fraud detection types and cross-model review. Very unlikely to conflict with other skills due to the specialized domain and unique trigger terms.

Total: 12 / 12 (Passed)

Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, highly actionable skill with a clear multi-step workflow and explicit validation criteria. Its main weakness is moderate verbosity — the motivational section, integration details for hypothetical sibling skills, and acknowledgements add tokens without proportional value. The references to external shared files suggest good progressive disclosure intent, but without bundle files to verify, the actual navigation story is incomplete.

Suggestions

Trim the 'Why This Exists' section to 1-2 sentences — Claude doesn't need a detailed explanation of why fraud patterns occur, just what to check for.

Move the 'Integration with Other Skills' section to a separate file (e.g., INTEGRATION.md) and reference it with a one-line link, since it's only relevant when used within a larger pipeline.

Dimension scores:

Conciseness (2 / 3)

The 'Why This Exists' section explaining fraud patterns is useful context but somewhat verbose. The 'Acknowledgements' section and some integration details (read by /paper-write, /result-to-claim) add tokens that may not be essential. The core workflow is reasonably efficient, but the overall document is longer than necessary.

Actionability (3 / 3)

The skill provides fully concrete, executable guidance: specific file glob patterns for artifact collection, a complete MCP tool invocation with exact prompt text, detailed output templates for both markdown and JSON reports, and a precise emoji-formatted summary. Every step has copy-paste ready content.
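The glob-based artifact collection mentioned above is not reproduced on this page; a minimal sketch of what such a step could look like, where the patterns and directory names are assumptions rather than the skill's actual globs:

```python
import glob
import os

# Hypothetical artifact patterns -- the skill's actual globs are not
# shown on this page, so these paths are assumptions for illustration.
ARTIFACT_PATTERNS = [
    "results/**/*.json",
    "logs/**/*.log",
    "experiments/**/metrics*.csv",
]

def collect_artifacts(root="."):
    """Gather experiment artifacts for the reviewer to inspect."""
    found = []
    for pattern in ARTIFACT_PATTERNS:
        found.extend(glob.glob(os.path.join(root, pattern), recursive=True))
    # De-duplicate and sort so the audit input is deterministic.
    return sorted(set(found))
```

Handing the reviewer a deterministic, de-duplicated file list is what makes the "phantom results" check possible: every claimed number must trace back to one of these files.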

Workflow Clarity (3 / 3)

The 4-step workflow is clearly sequenced with explicit separation of executor vs reviewer roles. The audit checklist has clear PASS/WARN/FAIL criteria for each check. The integration section shows how results flow into downstream skills with explicit conditional logic. The reviewer independence constraint serves as a validation checkpoint.
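The PASS/WARN/FAIL criteria described above imply a gating rule for the overall audit; a hedged sketch of how per-check verdicts might roll up (this aggregation logic is an assumption, not taken from the skill itself):

```python
from enum import Enum

class Verdict(Enum):
    PASS = "PASS"
    WARN = "WARN"
    FAIL = "FAIL"

def overall_verdict(verdicts):
    """Roll per-check verdicts up into one result: any FAIL blocks
    the claim, any WARN requires review, otherwise the audit passes."""
    if Verdict.FAIL in verdicts:
        return Verdict.FAIL
    if Verdict.WARN in verdicts:
        return Verdict.WARN
    return Verdict.PASS
```

A strictest-verdict-wins rule like this matches the skill's stated purpose: results are only claimed after every integrity check clears.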

Progressive Disclosure (2 / 3)

The skill references several external files (shared-references/reviewer-independence.md, shared-references/experiment-integrity.md, shared-references/reviewer-routing.md, shared-references/review-tracing.md, tools/save_trace.sh), which is good progressive disclosure design, but no bundle files are provided to verify these exist. The main SKILL.md itself is quite long, with integration details that could be split out, and the inline MCP prompt is very large.

Total: 10 / 12 (Passed)

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 9 / 11 passed

Validation for skill structure

Criteria results:

allowed_tools_field (Warning): 'allowed-tools' contains unusual tool name(s).

frontmatter_unknown_keys (Warning): Unknown frontmatter key(s) found; consider removing or moving to metadata.

Total: 9 / 11 (Passed)
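Both warnings above come from frontmatter validation; a minimal sketch of the kind of check involved, where the known-key whitelist is an assumption based on common SKILL.md conventions rather than the validator's actual list:

```python
# Hypothetical whitelist -- assumed from common SKILL.md frontmatter
# conventions, not the actual keys this validator accepts.
KNOWN_KEYS = {"name", "description", "allowed-tools", "metadata"}

def unknown_frontmatter_keys(frontmatter):
    """Return frontmatter keys the validator would not recognize."""
    return set(frontmatter) - KNOWN_KEYS

frontmatter = {
    "name": "experiment-audit",
    "description": "Audit experiment integrity before claiming results.",
    "reviewer": "GPT-5.4",  # hypothetical unrecognized key
}
print(unknown_frontmatter_keys(frontmatter))  # {'reviewer'} triggers the warning
```

A set difference against a whitelist is the simplest way to produce a warning rather than a hard failure: unknown keys are surfaced but do not block the skill from loading.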

Repository: wanshuiyin/Auto-claude-code-research-in-sleep (Reviewed)
