Auto-syncs stale docstrings and README when function signatures change. Detects documentation drift after refactors, parameter additions, or return type changes. Dry-run by default — proposes before writing.
87
100%
Does it follow best practices?
Impact
86%
1.59xAverage score across 17 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent correctly handles symbols moved between files: it should flag them as [NEEDS HUMAN REVIEW] because import references may be stale, and must NOT auto-update import examples or delete any documentation.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Human review flagged",
"description": "doc-sync-report.md contains '[NEEDS HUMAN REVIEW]' for parse_config or sanitize_input indicating the move requires human attention",
"max_score": 30
},
{
"name": "Move context reported",
"description": "The report mentions that parse_config (or the functions) were moved from utils.py OR references src/utils.py in the flagged entry",
"max_score": 25
},
{
"name": "README not auto-updated",
"description": "README.md still contains the original `parse_config(raw)` and `sanitize_input(value)` entries — the agent did NOT rewrite the import paths or module references in README",
"max_score": 55
},
{
"name": "Documentation not deleted",
"description": "The report does NOT say the documentation was deleted or that 'Missing coverage' applies to parse_config — the existing docs are preserved",
"max_score": 55
},
{
"name": "Unified report format",
"description": "doc-sync-report.md contains '## Doc Sync Report' as a top-level heading",
"max_score": 45
}
]
}evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17