Audit experiment integrity before claiming results. Uses cross-model review (GPT-5.4) to check for fake ground truth, score normalization fraud, phantom results, and insufficient scope. Use when the user says "审计实验" (audit the experiment), "check experiment integrity", "audit results", "实验诚实度" (experiment integrity), or after experiments complete and before claims are written.
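As a rough illustration of the cross-model review flow described above, the sketch below gathers experiment artifacts and asks a second model to flag the four failure modes the skill targets. The helper `call_reviewer_model`, the prompt wording, and the directory layout are hypothetical placeholders, not this skill's actual implementation.

```python
# Hedged sketch only: one way a cross-model integrity audit could be wired up.
from pathlib import Path

CHECKS = [
    "fake ground truth",
    "score normalization fraud",
    "phantom results",
    "insufficient scope",
]

def call_reviewer_model(prompt: str) -> str:
    """Placeholder for the reviewing-model client (e.g. GPT-5.4); replace with a real call."""
    raise NotImplementedError

def audit_experiment(results_dir: str) -> str:
    # Collect the experiment artifacts the reviewer should inspect.
    artifacts = "\n\n".join(
        f"--- {p} ---\n{p.read_text(errors='replace')}"
        for p in sorted(Path(results_dir).rglob("*"))
        if p.is_file()
    )
    prompt = (
        "Audit the following experiment artifacts for integrity issues: "
        + ", ".join(CHECKS)
        + ". Report each suspected issue with the evidence you relied on.\n\n"
        + artifacts
    )
    return call_reviewer_model(prompt)
```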
Score: 89
Does it follow best practices? 88%
Impact: Pending. No eval scenarios have been run.
Risky: Do not use without reviewing.
Security: 1 high severity finding. Review it carefully before deciding whether to use this skill.
The skill handles credentials insecurely by requiring the agent to include secret values verbatim in its generated output. This exposes credentials in the agent’s context and conversation history, creating a risk of data exfiltration.
Insecure credential handling detected (high risk: 0.80). The reviewer is explicitly asked to read config and result files and to produce "exact file:line references" and detailed findings. Secrets present in those files (API keys, tokens, cookies) can therefore be read and reproduced verbatim in the audit outputs, creating an exfiltration risk.
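One way to reduce this exposure, offered purely as a hedged sketch and not something the skill currently does, is to mask likely secrets before file contents ever reach the reviewer, so any verbatim quotes in the audit output are already redacted. The patterns below are illustrative and would need tuning to the actual config formats involved.

```python
import re

# Rough, illustrative patterns for common secret shapes; a real deployment would
# extend these to cover the config formats actually present in the repo.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                               # OpenAI-style API keys
    re.compile(r"(?i)(api[_-]?key|token|secret|cookie)\s*[:=]\s*\S+"),  # key=value style credentials
]

def redact_secrets(text: str) -> str:
    """Replace likely credential values with a placeholder before review."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

# Usage: redact each artifact before it is concatenated into the review prompt,
# e.g. artifact_text = redact_secrets(path.read_text(errors="replace")).
```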
2028ac4
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.