evilissimo/implementation-integrity-review

Reviews repositories, pull requests, diffs, and agent-generated code for reward hacking, fake completion, defensive theater, architectural bypasses, weakened guarantees, hidden fallbacks, and misleading abstractions.

98

1.09x
Quality

97%

Does it follow best practices?

Impact

100%

1.09x

Average score across 6 eval scenarios

Security by Snyk

Passed

No known issues


Quality

Content

92%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-crafted skill with a clear, actionable workflow for integrity review. It is concise, avoids explaining concepts Claude already knows, and provides specific tools, scripts, and output requirements. The main weakness is that the skill heavily depends on bundled reference files (5 docs, 3 scripts, evals) that are not provided, which limits its standalone utility and makes progressive disclosure harder to fully evaluate.

Suggestions

Provide the bundled reference files (docs/integrity-signals.md, docs/reward-hacking-patterns.md, etc.) so the skill can function as designed and progressive disclosure can be properly evaluated.
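One way to act on this suggestion is to check, before publishing, that every relative link in the skill file points at a file that actually ships in the bundle. The sketch below assumes the skill references its docs and scripts with markdown-style links; `missing_references` and `LINK_PATTERN` are hypothetical names for illustration, not part of Tessl or of the skill.

```python
import re
from pathlib import Path

# Matches markdown links such as [signals](docs/integrity-signals.md).
# Assumption: the skill uses this link style for its bundled files.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def missing_references(skill_text: str, bundle_root: Path) -> list[str]:
    """Return relative link targets in skill_text that do not exist under bundle_root."""
    missing = []
    for target in LINK_PATTERN.findall(skill_text):
        # Skip external URLs and in-page anchors; only bundled files are checked.
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        # Drop any "#section" fragment before testing the file path.
        path_part = target.split("#", 1)[0]
        if not (bundle_root / path_part).exists():
            missing.append(target)
    return missing
```

An empty result means every bundled reference resolves; any returned path is a file the bundle still needs to include.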

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The content is lean and efficient. It avoids explaining what code review is or how tools work. Every section serves a distinct purpose (contract establishment, scan guidance, false positive control, output format) without redundancy or padding. | 3 / 3 |
| Actionability | The skill provides a concrete 7-step review workflow, specific script paths to run, named tools for each language ecosystem, and a precise output format with required fields (category, severity, confidence, evidence, rationale, remediation). The guidance is specific and directly executable. | 3 / 3 |
| Workflow Clarity | The 7-step review workflow is clearly sequenced from establishing the contract through profiling, scanning, inspecting tests, comparing against contract, correlating signals, and producing findings. It includes validation checkpoints (step 6 correlates signals before reporting, the integrity-first rule acts as a final filter, and false positive control provides a feedback loop to drop illegitimate findings). | 3 / 3 |
| Progressive Disclosure | The skill references 5 bundled docs, 3 scripts, and an evals README with clear one-level-deep links, which is good structure. However, no bundle files were provided, so we cannot verify these references resolve correctly. The main content is well-organized but the referenced materials are entirely absent, making the skill incomplete in isolation. | 2 / 3 |
| Total | | 11 / 12 |

Passed
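The six required output fields named under Actionability can be pictured as a small record type. This is a minimal sketch: the field names come from the review above, while the `Finding` and `Severity` names, the field types, the severity levels, and the 0-to-1 confidence scale are assumptions made for illustration and are not taken from the skill itself.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical levels; the skill's actual scale is not shown in this review.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Finding:
    category: str      # e.g. "hidden fallback" or "fake completion"
    severity: Severity
    confidence: float  # assumed to be a value between 0.0 and 1.0
    evidence: str      # where the problem was observed
    rationale: str     # why it violates the stated contract
    remediation: str   # what to change

    def __post_init__(self) -> None:
        # Reject findings that omit a required text field or use an invalid confidence.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        for name in ("category", "evidence", "rationale", "remediation"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


# Illustrative values only; the file path and wording are invented for the example.
example = Finding(
    category="hidden fallback",
    severity=Severity.HIGH,
    confidence=0.8,
    evidence="src/client.py: bare except returns cached data",
    rationale="Errors are swallowed, so callers cannot tell that the fetch failed.",
    remediation="Propagate the exception or return an explicit error result.",
)
```

Validating at construction time means an incomplete finding fails loudly instead of reaching the final report.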

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines a specific niche (implementation integrity review), lists concrete failure types it detects, and provides explicit trigger guidance with multiple natural keyword variations. It uses proper third-person voice throughout and would be easily distinguishable from general code review or testing skills.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions and failure types: 'reward hacking, fake completion, defensive theater, test manipulation, architectural bypasses, weakened guarantees, and misleading abstractions.' Also specifies concrete targets: 'repositories, pull requests, diffs, and agent-generated code.' | 3 / 3 |
| Completeness | Clearly answers both 'what' (reviews code for implementation integrity failures including specific failure types) and 'when' (explicit 'Use whenever...' clause listing multiple trigger scenarios like integrity review, adversarial review, shortcut audit, etc.). | 3 / 3 |
| Trigger Term Quality | Includes a strong set of natural trigger terms users would say: 'integrity review', 'adversarial review', 'reward-hacking review', 'fake implementation review', 'shortcut audit', plus domain-specific terms like 'pull requests', 'diffs', 'agent-generated code'. Good coverage of variations. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive niche focused on implementation integrity failures and adversarial code review. The specific failure categories (reward hacking, defensive theater, test manipulation) clearly distinguish it from general code review or testing skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.
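The page does not state how its percentages are derived. The figures shown are, however, consistent with each section score being the rounded ratio of points earned to points possible, and the overall Quality figure being the rounded mean of the three section scores. The sketch below only checks that arithmetic; the `percent` helper is hypothetical and the formula is an assumption, not documented behaviour.

```python
def percent(earned: int, possible: int) -> int:
    """Points earned as a whole-number percentage of points possible."""
    return round(100 * earned / possible)


content = percent(11, 12)      # Content rubric: 11 / 12
description = percent(12, 12)  # Description rubric: 12 / 12
validation = percent(11, 11)   # Validation checks: 11 / 11

# Assumed aggregation: unweighted mean of the three section scores.
overall = round((content + description + validation) / 3)

print(content, description, validation, overall)  # prints "92 100 100 97"
```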

Reviewed