Evidence-first pull request review with independent critique, selective challenger review, and human handoff.
Version: 1.0
The evidence pack is the composed artifact produced by pr-evidence-builder from individual script outputs. It is the single input to fresh-eyes-review, challenger-review, and finding-synthesizer. Reviewer skills receive ONLY this pack and the raw diff, never the authoring context.
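Because the pack is the only thing a reviewer skill receives, any intake check has to work from the pack alone. The following is a minimal Python sketch of such a check, assuming the pack has already been parsed from JSON into a dict. It applies two rules this page states: schema_version is checked, and an oversized raw_diff blocks AI review instead of being truncated. The names check_pack_for_review, ReviewBlocked, SUPPORTED_SCHEMA_VERSIONS and the character budget MAX_DIFF_CHARS are hypothetical; a real tile would measure the model's context window in tokens, not characters.

```python
from typing import Any

# Assumption: this consumer only understands schema 1.0.
SUPPORTED_SCHEMA_VERSIONS = {"1.0"}

# Hypothetical stand-in for "the model's usable context window".
# A real implementation would count tokens, not characters.
MAX_DIFF_CHARS = 400_000


class ReviewBlocked(Exception):
    """Raised when AI review must not run on this evidence pack."""


def check_pack_for_review(pack: dict[str, Any], max_diff_chars: int = MAX_DIFF_CHARS) -> str:
    """Return the raw diff if the pack can be reviewed; otherwise raise ReviewBlocked.

    Only the pack is consulted; no authoring context is passed in.
    """
    version = pack.get("schema_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ReviewBlocked(f"unsupported schema_version: {version!r}")

    diff = pack["diff"]
    raw_diff = diff["raw_diff"]
    if len(raw_diff) > max_diff_chars:
        # Block instead of truncating: a partial diff would hide changes from reviewers.
        raise ReviewBlocked(
            f"diff too large for AI review ({len(raw_diff)} chars, "
            f"{diff['stats']['files_changed']} files); split the PR"
        )
    return raw_diff
```

A caller would catch ReviewBlocked, surface its message (including the recommendation to split the PR), and skip the AI reviewer skills. The fields the sketch reads (schema_version, diff.raw_diff, diff.stats.files_changed) are defined in the pack schema that follows.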
{
"schema_version": "1.0",
"generated_at": "ISO-8601 timestamp",
"wall_clock_ms": int,
"pr": {
"number": int,
"title": str,
"description": str,
"labels": [str],
"url": str
},
"authorship": {
"ai_assisted": bool,
"ai_tools": [str],
"ai_commit_ratio": float,
"total_commits": int
},
"diff": {
"stats": {
"files_changed": int,
"insertions": int,
"deletions": int
},
"touched_files": [{
"path": str,
"change_type": "added" | "modified" | "deleted" | "renamed",
"subsystem": str,
"insertions": int,
"deletions": int
}],
"subsystems": [str],
"raw_diff": str
},
"context": {
"linked_issues": [{
"number": int,
"title": str,
"body": str,
"labels": [str]
}],
"related_tests": [{
"source_file": str,
"test_file": str,
"test_exists": bool
}],
"owners": [{
"path_pattern": str,
"owners": [str]
}],
"repo_instructions": str | null
},
"risk": {
"lane": "green" | "yellow" | "red",
"confidence": "high" | "medium" | "low",
"factors": [{
"factor": str,
"files": [str],
"severity": "moderate" | "red"
}],
"override_applied": bool,
"override_source": str | null
},
"verification": {
"verifiers": [{
"name": str,
"status": "pass" | "fail" | "warn" | "skipped" | "timeout",
"findings": [{
"file": str,
"line": int | null,
"message": str,
"severity": "error" | "warning" | "info"
}],
"duration_ms": int
}],
"summary": {
"passed": int,
"failed": int,
"warnings": int,
"skipped": int
}
},
"hotspots": [{
"file": str,
"line_start": int,
"line_end": int,
"category": str,
"why": str,
"risk_contributing": bool
}],
"missing_artifacts": [{
"artifact": str,
"why_required": str,
"suggested_location": str,
"required": bool
}]
}
raw_diff is included so reviewer skills can inspect the actual code, not just metadata. If the diff exceeds the model's usable context window, the tile should not truncate it; it should block AI review and recommend splitting the PR (see pr-evidence-builder oversized diff handling).
repo_instructions captures contributing guides, PR templates, or .pr-review/ config that the reviewer should respect.
schema_version is included for forward compatibility. Backward compatibility and migration rules are a post-v1.0 concern: the schema will change significantly before stabilizing.
The finding schema below is the output format for fresh-eyes-review and challenger-review. The finding-synthesizer consumes arrays of these findings from all review sources.
{
"finding_id": str,
"source": "fresh_eyes" | "challenger" | "verifier",
"title": str,
"file": str,
"line_start": int | null,
"line_end": int | null,
"hunk": str | null,
"why_it_matters": str,
"evidence": {
"type": "verifier_output" | "hunk_level_code" | "repo_policy" | "contextual_reasoning",
"detail": str
},
"confidence": "high" | "medium" | "low",
"severity": "critical" | "high" | "medium" | "low",
"action": "fix" | "verify" | "discuss",
"requires_human": bool
}
The finding-synthesizer adds the following fields to each finding it merges:
{
"corroborated_by": [str],
"contested_by": [str],
"merged_confidence": "high" | "medium" | "low",
"suppressed": bool,
"suppression_reason": str | null
}
Suppressed findings are retained in the data for eval but are not surfaced in the reviewer packet.
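The suppression rule gives synthesized findings two audiences: the reviewer packet and the eval data. Below is a minimal Python sketch of that split, assuming findings arrive as dicts shaped like the finding schema plus the synthesizer fields (corroborated_by, contested_by, merged_confidence, suppressed, suppression_reason). SynthesizedFinding and split_for_packet are hypothetical names, and the requirement that every suppressed finding carries a suppression_reason is an illustrative assumption, not part of the stated schema.

```python
from typing import Literal, Optional, TypedDict


class SynthesizedFinding(TypedDict):
    # Subset of the finding schema plus the fields added during synthesis.
    finding_id: str
    source: Literal["fresh_eyes", "challenger", "verifier"]
    title: str
    severity: Literal["critical", "high", "medium", "low"]
    corroborated_by: list[str]
    contested_by: list[str]
    merged_confidence: Literal["high", "medium", "low"]
    suppressed: bool
    suppression_reason: Optional[str]


def split_for_packet(
    findings: list[SynthesizedFinding],
) -> tuple[list[SynthesizedFinding], list[SynthesizedFinding]]:
    """Return (surfaced, suppressed); the input list itself is left untouched for eval."""
    surfaced: list[SynthesizedFinding] = []
    suppressed: list[SynthesizedFinding] = []
    for finding in findings:
        if finding["suppressed"]:
            # Assumption: a suppressed finding must record why it was suppressed.
            if not finding["suppression_reason"]:
                raise ValueError(f"{finding['finding_id']} is suppressed without a reason")
            suppressed.append(finding)
        else:
            surfaced.append(finding)
    return surfaced, suppressed
```

Only the surfaced list would go into the reviewer packet. Returning the suppressed findings instead of dropping them keeps them available for eval, as the rule above requires.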
evals
  scenario-1 through scenario-43
rules
skills
  challenger-review
  finding-synthesizer
  fresh-eyes-review
  human-review-handoff
  pr-evidence-builder
  review-retrospective