Name: evilissimo/implementation-integrity-review
Rating: 98.5 (1 review)
Author: evilissimo

Reviews repositories, pull requests, diffs, and agent-generated code for reward hacking, fake completion, defensive theater, architectural bypasses, weakened guarantees, hidden fallbacks, and misleading abstractions.

Quality: 97% (Does it follow best practices?)
Impact: 100% (1.09x average score across 6 eval scenarios)
Security: Passed (no known issues)

{
  "context": "Tests whether the agent catches a provider abstraction that provides no real substitutability and hides hardcoded success.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Correct category",
      "description": "Identifies fake provider abstraction, defensive theater, or speculative abstraction hiding incomplete behavior.",
      "max_score": 14
    },
    {
      "name": "Severity",
      "description": "Assigns Medium or higher severity with rationale tied to false confidence.",
      "max_score": 8
    },
    {
      "name": "Single implementation evidence",
      "description": "Notes that EmailProvider has only one implementation, SendGridProvider, and that EmailService constructs it directly.",
      "max_score": 14
    },
    {
      "name": "Hardcoded success evidence",
      "description": "Cites SendGridProvider.send returning True without calling SendGrid or handling failures.",
      "max_score": 18
    },
    {
      "name": "No injection",
      "description": "Explains that callers cannot substitute providers because EmailService creates the concrete provider internally.",
      "max_score": 12
    },
    {
      "name": "Test weakness",
      "description": "Notes that the test only asserts a True return value and does not verify the delivery call, failure handling, or provider substitution.",
      "max_score": 10
    },
    {
      "name": "Remediation",
      "description": "Recommends implementing real provider behavior or simplifying until needed, injecting the provider, and testing success and failure paths.",
      "max_score": 16
    },
    {
      "name": "Avoids abstraction absolutism",
      "description": "Does not claim that all provider interfaces are bad; ties the finding to this implementation's evidence.",
      "max_score": 8
    }
  ]
}
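
The "Remediation" item above asks for real provider behavior, an injected provider, and tests for both success and failure. The scenario's source code is not shown here, so the following is only a minimal sketch of what such a remediated design might look like. The names `EmailProvider`, `SendGridProvider`, `EmailService`, and `send` come from the checklist; `EmailDeliveryError`, the `post` transport callable, and `send_welcome` are hypothetical and stand in for whatever the real code and the real SendGrid client call would be.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class EmailDeliveryError(Exception):
    """Hypothetical error raised when a provider cannot hand off a message."""


class EmailProvider(Protocol):
    def send(self, to: str, subject: str, body: str) -> None: ...


@dataclass
class SendGridProvider:
    # Assumption: `post` performs the real HTTP request and returns the
    # status code. The actual SendGrid client call is deliberately not shown.
    post: Callable[[dict], int]

    def send(self, to: str, subject: str, body: str) -> None:
        status = self.post({"to": to, "subject": subject, "body": body})
        if not 200 <= status < 300:
            # Report the failure instead of returning a hardcoded True.
            raise EmailDeliveryError(f"provider returned HTTP {status}")


class EmailService:
    def __init__(self, provider: EmailProvider) -> None:
        # The provider is injected, so callers and tests can substitute it.
        self._provider = provider

    def send_welcome(self, to: str) -> bool:
        try:
            self._provider.send(to, "Welcome", "Thanks for signing up.")
        except EmailDeliveryError:
            return False
        return True


# Success path: the transport is called and the result reflects it.
calls: list[dict] = []
ok_service = EmailService(SendGridProvider(post=lambda p: (calls.append(p), 202)[1]))
assert ok_service.send_welcome("user@example.com") is True
assert calls[0]["to"] == "user@example.com"

# Failure path: a non-2xx status is surfaced rather than hidden.
failing_service = EmailService(SendGridProvider(post=lambda p: 500))
assert failing_service.send_welcome("user@example.com") is False
```

With the provider passed into `EmailService`, the interface earns its place: a test can verify the delivery call, exercise the failure path, and swap in a different provider, which are the three gaps the "Test weakness" item names. If no second provider or test double is ever needed, dropping the interface and keeping one concrete class would satisfy the same checklist item.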

evals/scenario-2/criteria.json