Run an evidence-grounded software architecture audit workflow that builds a repo brief, selects single-auditor or specialist-panel mode, inspects boundary, layering, dependency, composition, cohesion, and testability risks, writes required finding blocks, and sequences incremental refactors. Use when asked for an architecture audit, architecture review, repo-structure review, software architecture report, audit_report.md, structural issue findings, or specialist-panel synthesis across multi-module systems.
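The mode choice and the panel synthesis step can be sketched in Python. This is a minimal sketch, not the skill's actual logic. The `RepoBrief` and `Symptom` types, the `select_mode` and `synthesize` helpers, the two-module threshold, and the root-cause keys are all hypothetical. Only two rules come from the source: a multi-module, full-repo target gets specialist-panel mode, and symptoms that share a root cause collapse into one finding. The specialist role names and file paths are borrowed from the rubric's examples.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class AuditMode(Enum):
    SINGLE_AUDITOR = "single-auditor"
    SPECIALIST_PANEL = "specialist-panel"


@dataclass(frozen=True)
class RepoBrief:
    full_repo: bool    # is the audit target the whole repository?
    module_count: int  # top-level modules found while building the brief


@dataclass(frozen=True)
class Symptom:
    specialist: str    # panel role that reported it, e.g. "testability"
    root_cause: str    # hypothetical root-cause key shared across specialists
    evidence: str      # file path cited as evidence


def select_mode(brief: RepoBrief, min_modules: int = 2) -> AuditMode:
    # Assumed rule: a full-repo target with several modules gets the panel.
    if brief.full_repo and brief.module_count >= min_modules:
        return AuditMode.SPECIALIST_PANEL
    return AuditMode.SINGLE_AUDITOR


def synthesize(symptoms: list[Symptom]) -> dict[str, list[Symptom]]:
    # Collapse symptoms that share a root cause into one finding, so a single
    # god-class does not appear as separate boundary and testability findings.
    findings: dict[str, list[Symptom]] = defaultdict(list)
    for s in symptoms:
        findings[s.root_cause].append(s)
    return dict(findings)


# Usage with hypothetical panel output.
symptoms = [
    Symptom("boundaries/layering", "god-class in src/core/app.js", "src/core/app.js"),
    Symptom("testability", "god-class in src/core/app.js", "src/core/app.js"),
    Symptom("dependencies/composition", "shared constants hub", "src/shared/constants.js"),
]
mode = select_mode(RepoBrief(full_repo=True, module_count=6))
findings = synthesize(symptoms)
print(mode.value, len(findings))  # specialist-panel 2
```

Grouping by an explicit root-cause key keeps the three specialist reports above down to two findings. Each finding can then be written once, in a single category section.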
Does it follow best practices? 100%. Impact: 100% (1.85x average score across 3 eval scenarios). Status: passed, no known issues.
{
  "context": "Tests whether the agent uses specialist-panel mode for a multi-module full-repo target, correctly synthesizes findings by collapsing symptoms to root causes, avoids prescribing heavyweight architectural patterns, and does not include scorecards or strengths sections.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Specialist-panel mode",
      "description": "Report explicitly indicates specialist-panel mode was used (e.g. references multiple specialist perspectives or roles such as boundaries/layering, dependencies/composition, smells, or testability), given the multi-module full-repo scope",
      "max_score": 12
    },
    {
      "name": "No scorecards",
      "description": "Report does NOT contain a scorecard, numeric rating grid, or any 'Score: X/10' style evaluation",
      "max_score": 8
    },
    {
      "name": "No strengths section",
      "description": "Report does NOT contain a 'Strengths', 'What's Good', 'Positives', or 'What's Working' section",
      "max_score": 8
    },
    {
      "name": "No heavyweight pattern prescription",
      "description": "Report does NOT recommend converting to microservices, adopting event sourcing, or other heavyweight distributed architecture patterns without clear evidence-based justification",
      "max_score": 8
    },
    {
      "name": "Root cause deduplication",
      "description": "At least two symptoms that share a common root cause (e.g. god-class causing both boundary leaks and testability problems) are grouped under one root-cause finding rather than listed as separate unrelated issues",
      "max_score": 12
    },
    {
      "name": "Findings in one section only",
      "description": "Each full finding appears in exactly one of the three category sections (Boundary and Layering Problems, Dependency and Composition Problems, Testability and Change Friction); no finding is duplicated verbatim across sections",
      "max_score": 10
    },
    {
      "name": "Evidence from actual files",
      "description": "At least three finding blocks cite specific file names from the codebase (e.g. src/core/app.js, src/shared/constants.js) as evidence",
      "max_score": 10
    },
    {
      "name": "Improvement Sequence: Goal field",
      "description": "The Improvement Sequence contains at least two steps each with a 'Goal:' line",
      "max_score": 8
    },
    {
      "name": "Improvement Sequence: Why first field",
      "description": "The Improvement Sequence contains at least two steps each with a 'Why first:' line",
      "max_score": 8
    },
    {
      "name": "Incremental first moves",
      "description": "The first recommended improvement is a contained incremental structural change (e.g. extract a module, introduce an interface, isolate a boundary) rather than a rewrite or full redesign",
      "max_score": 8
    },
    {
      "name": "Open Evidence Gaps present",
      "description": "Report contains a '## Open Evidence Gaps' section with at least one item",
      "max_score": 8
    }
  ]
}
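The checklist is written for a grader to apply, but four of its items are mechanical enough to pre-check with regular expressions. The sketch below is minimal Python and makes two assumptions: the report is Markdown text, and each item is all-or-nothing (the rubric does not say whether partial credit is allowed). The helper names and regexes are hypothetical heuristics, not an official grader. Across all eleven items, the `max_score` values sum to 100.

```python
import json
import re

# Abbreviated copy of the checklist above: only the mechanically checkable items.
RUBRIC = json.loads("""
{"type": "weighted_checklist", "checklist": [
  {"name": "No scorecards", "max_score": 8},
  {"name": "Improvement Sequence: Goal field", "max_score": 8},
  {"name": "Improvement Sequence: Why first field", "max_score": 8},
  {"name": "Open Evidence Gaps present", "max_score": 8}
]}
""")


def count_lines_starting_with(report: str, label: str) -> int:
    # Counts lines such as "Goal: ...", "- Goal: ..." or "**Goal:** ...".
    return len(re.findall(rf"^[^\w\n]*{re.escape(label)}", report, re.M))


def open_gaps_has_item(report: str) -> bool:
    # Take the text between "## Open Evidence Gaps" and the next "## " heading,
    # then require at least one bullet or numbered list item inside it.
    section = re.search(r"^## Open Evidence Gaps[ \t]*\n(.*?)(?=^## |\Z)", report, re.M | re.S)
    return bool(section and re.search(r"^[ \t]*(?:[-*]|\d+\.)[ \t]+\S", section.group(1), re.M))


# Heuristic pre-checks keyed by checklist item name (hypothetical).
CHECKS = {
    "No scorecards": lambda r: not re.search(r"\bscore:\s*\d+(?:\.\d+)?\s*/\s*\d+", r, re.I),
    "Improvement Sequence: Goal field": lambda r: count_lines_starting_with(r, "Goal:") >= 2,
    "Improvement Sequence: Why first field": lambda r: count_lines_starting_with(r, "Why first:") >= 2,
    "Open Evidence Gaps present": open_gaps_has_item,
}


def score(report: str) -> tuple[int, int]:
    # Assumed all-or-nothing scoring: an item earns its max_score only if its check passes.
    items = RUBRIC["checklist"]
    earned = sum(item["max_score"] for item in items if CHECKS[item["name"]](report))
    possible = sum(item["max_score"] for item in items)
    return earned, possible


# Usage with a hypothetical report excerpt.
sample = """# Architecture Audit

## Improvement Sequence
1. Extract billing rules from src/core/app.js
   - Goal: isolate pricing logic behind one module
   - Why first: unblocks unit tests for checkout
2. Introduce a config interface
   - Goal: stop importing src/shared/constants.js everywhere
   - Why first: removes the widest fan-in dependency

## Open Evidence Gaps
- No runtime traces were available
"""
print(score(sample))                            # (32, 32)
print(score(sample + "\nScore: 7/10\n"))        # (24, 32)
```

The remaining items, such as root-cause deduplication or whether the first move is incremental, depend on judgment and are better left to the grader.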