Verify your own completed code changes using the repo's existing infrastructure and an independent evaluator context. Use after implementing a change when you need to run unit or integration tests, check build or lint gates, prove the real surface works with evidence, and challenge the changed code for clarity, deduplication, and maintainability. Do not use when the repo is not verifiable yet or when reviewing someone else's code.
Self-check your own completed change before independent review. Verify proves the change boots, passes guardrails, and survives the real surface; it is not a ship decision.
- `AGENTS.md`, `CLAUDE.md`, or repo rules before judging the result
- `AGENTS.md`, `CLAUDE.md`, or repo rules, when present
- `make verify`, `just verify`, `pnpm test`, `cargo test`, or the nearest targeted equivalent
- `curl http://127.0.0.1:3000/health`
- `node dist/cli.js --help` or the repo's packaged entrypoint

Follow `references/evidence-rules.md` when collecting proof.
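One way to find "the nearest targeted equivalent" is to look for the build files a repo already has. The sketch below is only an illustration: the marker-file mapping and the name `pick_verify_command` are assumptions, not part of this skill, and a `Makefile` or `justfile` being present does not guarantee that a `verify` target exists.

```python
# Hypothetical mapping from a marker file to the verify command it suggests.
# Order expresses preference: a dedicated verify target beats a bare test run.
CANDIDATES = [
    ("Makefile", ["make", "verify"]),
    ("justfile", ["just", "verify"]),
    ("package.json", ["pnpm", "test"]),
    ("Cargo.toml", ["cargo", "test"]),
]


def pick_verify_command(present_files):
    """Return the first command whose marker file is present, else None.

    `present_files` is any collection of file names found at the repo root.
    The caller should still confirm the target exists before trusting it.
    """
    names = set(present_files)
    for marker, command in CANDIDATES:
        if marker in names:
            return command
    return None
```

A caller could pass `os.listdir(repo_root)` and treat a `None` result as a sign that the repo may not be verifiable yet.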
While exercising the change, fix anything cheap and obvious that you spot:
- `any`, unsafe `as`, or non-null assertion you can replace with a real type in seconds
- `throw` makes the diagnostic useful

Keep this as a self-check pass. Leave substantive code-shape concerns (architecture mismatches, broader duplication, error-classification redesigns) for independent review. Use `references/simplification.md` as a short self-check.
Produce one clear outcome:
- ready for review: guardrails green, real surface confirmed, no obvious self-correctable issues left
- needs more work: the change is not ready to be reviewed; specific issues to address are listed
- blocked: verification cannot proceed, usually because infrastructure is too weak

Verify reports readiness for review. If a requested proof surface was unavailable, name that boundary explicitly instead of substituting a weaker check. The independent ship decision belongs to a separate review pass.
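The three outcomes above can be read as a small decision rule. The following is a minimal sketch under assumed inputs; the function name and its four parameters are hypothetical and only restate the conditions listed for each outcome.

```python
def decide_verdict(infra_usable, guardrails_green, surface_confirmed, open_issues):
    """Map verification findings to exactly one outcome string.

    infra_usable:      False when verification cannot proceed at all.
    guardrails_green:  tests, build, and lint gates passed.
    surface_confirmed: the real surface was exercised with evidence.
    open_issues:       list of self-correctable issues still outstanding.
    """
    if not infra_usable:
        # Verification itself is impossible, so no readiness claim is made.
        return "blocked"
    if guardrails_green and surface_confirmed and not open_issues:
        return "ready for review"
    # Anything short of all three conditions means more implementation work.
    return "needs more work"
```

Checking `infra_usable` first keeps a weak setup from being reported as a failing change.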
After verification, report in this compact bullet shape:
- verdict: exactly one of ready for review, needs more work, or blocked
- evidence: concise explanations of what checks proved, not full commands
- fixed during verify: only if self-corrections happened
- unverified or gaps: readiness gaps, doc drift, or none
- next: independent review, readiness setup, documentation cleanup, more implementation, or none

Keep the final answer short:
- omit fixed during verify when nothing was corrected
- state evidence briefly, for example "typecheck passed for tv-vite" or "API smoke check returned 200"; include full commands only when they failed, are needed for reproduction, or the user asks for them

Example:
- verdict: ready for review
- evidence: retry tests covered success and failure paths; API retry smoke returned 200
- unverified or gaps: none
- next: review-gang
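
The report shape above is regular enough to generate. This is a rough sketch, assuming a hypothetical helper named `format_report`; it drops the `fixed during verify` line when nothing was corrected and writes `none` when there are no gaps.

```python
ALLOWED_VERDICTS = ("ready for review", "needs more work", "blocked")


def format_report(verdict, evidence, fixed=(), gaps=(), next_step="none"):
    """Render the compact bullet report as a single string."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError("verdict must be one of: " + ", ".join(ALLOWED_VERDICTS))
    lines = [
        "- verdict: " + verdict,
        "- evidence: " + "; ".join(evidence),
    ]
    if fixed:
        # Only reported when self-corrections actually happened.
        lines.append("- fixed during verify: " + "; ".join(fixed))
    lines.append("- unverified or gaps: " + ("; ".join(gaps) if gaps else "none"))
    lines.append("- next: " + next_step)
    return "\n".join(lines)


report = format_report(
    "ready for review",
    ["retry tests covered success and failure paths", "API retry smoke returned 200"],
    next_step="review-gang",
)
print(report)
```

Called with these arguments, the helper reproduces the example report shown above.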