Evidence-first pull request review with independent critique, selective challenger review, and human handoff.
87
92%
Does it follow best practices?
Impact
87%
1.31x
Average score across 43 eval scenarios
Risky
Do not use without reviewing
{
  "context": "Realistic: correlation ID from request header accepted without validation (no length limit, no character restriction) and echoed back in response header",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Catches unsanitized header propagation",
      "description": "Identifies that the correlation ID from the incoming HTTP header is accepted without any validation (no length limit, no format check, no character restriction) and propagated through context, logs, and response headers",
      "max_score": 10
    },
    {
      "name": "Catches response header echo risk",
      "description": "Identifies that the correlation ID is echoed back in the HTTP response header (X-Correlation-ID), and that an attacker-controlled value in a response header could enable header injection if the value contains CRLF sequences",
      "max_score": 10
    },
    {
      "name": "Risk classified yellow or higher",
      "description": "PR is classified as yellow or higher because it processes untrusted input into logs and response headers",
      "max_score": 10
    }
  ]
}
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
scenario-25
scenario-26
scenario-27
scenario-28
scenario-29
scenario-30
scenario-31
scenario-32
scenario-33
scenario-34
scenario-35
scenario-36
scenario-37
scenario-38
scenario-39
scenario-40
scenario-41
scenario-42
scenario-43
rules
skills
challenger-review
finding-synthesizer
fresh-eyes-review
human-review-handoff
pr-evidence-builder
review-retrospective
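
The eval criteria above describe a correlation ID that is taken from a request header and echoed into logs and the X-Correlation-ID response header without validation. As a rough illustration of the kind of fix a reviewer might look for, here is a minimal sketch. The function name `safe_correlation_id`, the 64-character cap, and the allowed character set are all assumptions made for this example; none of them come from the scenario or from any particular framework.

```python
import re
import uuid
from typing import Optional

# Assumed policy for illustration only: at most 64 characters,
# limited to letters, digits, dot, underscore, and hyphen.
MAX_LEN = 64
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def safe_correlation_id(raw: Optional[str]) -> str:
    """Return the client-supplied ID if it passes the allowlist.

    Anything else (missing, empty, too long, or containing characters
    outside the allowlist, which includes CR and LF) is replaced with a
    freshly generated UUID so that no attacker-controlled bytes reach
    logs or response headers.
    """
    if raw is not None and len(raw) <= MAX_LEN and _ALLOWED.fullmatch(raw):
        return raw
    return str(uuid.uuid4())
```

Replacing a rejected value rather than trying to strip bad characters keeps the rule easy to audit: a value is either accepted unchanged or discarded entirely.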