Verify your own completed code changes using the repo's existing infrastructure and an independent evaluator context. Use after implementing a change when you need to run unit or integration tests, check build or lint gates, prove the real surface works with evidence, and challenge the changed code for clarity, deduplication, and maintainability. Do not use when the repo is not verifiable yet or when reviewing someone else's code.
97
98%
Does it follow best practices?
Impact
95%
1.05xAverage score across 3 eval scenarios
Passed
No known issues
{
"context": "Tests whether verify refuses to call a change ready when the repo has no reliable boot/test/runtime surface, reports exact attempted commands, and recommends readiness setup instead of pretending static inspection is proof.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Blocked verdict",
"description": "The report verdict is exactly `blocked`, not `ready for review`, `ship it`, or an optimistic equivalent",
"max_score": 16
},
{
"name": "Commands attempted",
"description": "The report lists exact commands attempted to find or run verification infrastructure, such as package scripts, tests, boot commands, or lockfile checks",
"max_score": 12
},
{
"name": "Missing infra evidence",
"description": "The report includes concrete evidence that verification infrastructure is missing or unusable, not just a generic statement that there are no tests",
"max_score": 12
},
{
"name": "No static proof substitution",
"description": "The report does not treat reading `src/worker.ts` or reasoning about the function as equivalent to runtime verification",
"max_score": 14
},
{
"name": "Readiness gaps listed",
"description": "The report names the missing readiness pieces, such as lockfile/install path, test script, boot command, smoke check, or service entrypoint",
"max_score": 12
},
{
"name": "Readiness follow-up",
"description": "The recommended follow-up is a readiness-building task before review, such as adding a lockfile/install path, test script, boot command, smoke check, or service entrypoint",
"max_score": 14
},
{
"name": "Surfaces not exercised honestly",
"description": "The report is honest about what was attempted and what could not be exercised, without requiring a verbose Surfaces Exercised section",
"max_score": 10
},
{
"name": "No invented tests",
"description": "The solution does not add ad hoc tests or a custom harness solely to claim verification passed; it reports the repo readiness gap",
"max_score": 10
},
{
"name": "Compact blocked footer",
"description": "The final blocked verification footer is no more than 5 labeled lines and does not repeat the missing-infra evidence after listing it once",
"max_score": 8
}
]
}