Use when the user wants to audit a user journey or a signup/onboarding/checkout flow, do a UX audit, find the friction in a funnel, understand why and where users are dropping off, or improve conversion in a web app: any diagnostic review of a multi-step, in-product flow. Use it whenever the user mentions drop-off, funnels, session replay, heatmaps, activation, time-to-value, cart or checkout abandonment, onboarding friction, or rage clicks, or wants to know where users struggle and what to fix first, even if they don't say "audit." Runs evidence-first intake, five parallel specialist lenses, and synthesis to produce a severity-ranked, prioritized improvement backlog in which each fix comes with an experiment to validate it.
Score: 94. Best practices: 100%. Impact: 72% (1.26x average score across 3 eval scenarios). Passed, no known issues.
{
"context": "Tests whether the agent correctly scopes a quick triage (2-3 lenses, not all five), applies an explicit scoring framework (ICE or PXL) with all per-factor inputs shown, routes lenses appropriately given the symptom and evidence type, builds a 9-field journey brief marking unavailable evidence as 'not provided', and avoids fabricating quantitative data.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Triage scope stated",
"description": "The report explicitly states that this is a quick triage (not a full audit) and acknowledges that depth was traded for speed",
"max_score": 8
},
{
"name": "2-3 lenses selected",
"description": "The report runs exactly two or three named lenses, not four or five",
"max_score": 8
},
{
"name": "Lens justification present",
"description": "The report explains why those specific lenses were chosen given the available evidence and symptoms described",
"max_score": 8
},
{
"name": "Heuristic or qualitative lens included",
"description": "At least one of the selected lenses is a Heuristic/Usability lens or a Qualitative/Friction lens (appropriate when there are user complaints and no analytics)",
"max_score": 8
},
{
"name": "9-field journey brief present",
"description": "The report includes a journey brief with all nine fields: Product, Journey audited, Conversion goal/micro, Segments & devices in scope, Evidence available, Step inventory, Quantitative signal, Behavioral/qualitative signal, Data-trust notes",
"max_score": 8
},
{
"name": "Missing evidence marked 'not provided'",
"description": "Journey brief fields covering analytics, replay, and a live URL are explicitly marked 'not provided' or an equivalent label, not left blank or omitted",
"max_score": 8
},
{
"name": "ICE or PXL framework used",
"description": "The prioritized action list uses either ICE or PXL scoring by name — not a generic ranking or intuition-based order",
"max_score": 8
},
{
"name": "Per-factor scores shown",
"description": "Every item in the prioritized backlog shows its individual factor scores (e.g. Impact=7, Confidence=6, Ease=8 for ICE) so the ranking is auditable",
"max_score": 8
},
{
"name": "7-field finding blocks used",
"description": "Each finding is presented using the labeled 7-field block format: Finding, Evidence, Why it matters, Fix, Validate, Severity, Journey step",
"max_score": 8
},
{
"name": "No fabricated drop-off rates",
"description": "The report does NOT assert any specific drop-off percentage or conversion rate (e.g. does NOT say '40% of users abandon at step 2') given that no analytics were provided",
"max_score": 8
},
{
"name": "Magnitudes described as unmeasured",
"description": "The report explicitly states that impact magnitudes are unmeasured or unknown (since there is no funnel data), rather than presenting findings as measured facts",
"max_score": 8
},
{
"name": "Treat benchmarks as directional",
"description": "If any external benchmark or industry figure is cited, it is attributed to a source and described as directional rather than presented as a universal target",
"max_score": 6
},
{
"name": "Next evidence step stated",
"description": "The report names at least one specific type of additional evidence (e.g. analytics, session replay, live walkthrough) that would enable a deeper audit",
"max_score": 6
}
]
}
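The checklist gives each criterion a `max_score`, but the page does not say how awarded points roll up into a single report score. The sketch below assumes a grader awards between 0 and `max_score` points per criterion and reports the earned total as a fraction of the maximum; it is an illustration, not the evaluation harness's actual scoring. The names `CHECKLIST` and `weighted_score` are hypothetical. Note that the max scores sum to 100 (11 criteria at 8 plus 2 at 6), so under this assumption earned points read directly as a percentage.

```python
# Hypothetical aggregation of the weighted checklist above.
# Criterion names and max_score values are copied from the JSON.
CHECKLIST: dict[str, int] = {
    "Triage scope stated": 8,
    "2-3 lenses selected": 8,
    "Lens justification present": 8,
    "Heuristic or qualitative lens included": 8,
    "9-field journey brief present": 8,
    "Missing evidence marked 'not provided'": 8,
    "ICE or PXL framework used": 8,
    "Per-factor scores shown": 8,
    "7-field finding blocks used": 8,
    "No fabricated drop-off rates": 8,
    "Magnitudes described as unmeasured": 8,
    "Treat benchmarks as directional": 6,
    "Next evidence step stated": 6,
}


def weighted_score(awarded: dict[str, float], checklist: dict[str, int] = CHECKLIST) -> float:
    """Return earned points as a fraction of the maximum (0.0 to 1.0).

    Assumptions (not stated in the source): criteria absent from `awarded`
    earn 0, and each award is clamped to the range [0, max_score].
    """
    unknown = set(awarded) - set(checklist)
    if unknown:
        # Reject misspelled or extra criterion names instead of silently ignoring them.
        raise KeyError(f"unknown criteria: {sorted(unknown)}")
    total_max = sum(checklist.values())
    earned = sum(
        min(max(awarded.get(name, 0.0), 0.0), max_score)
        for name, max_score in checklist.items()
    )
    return earned / total_max


if __name__ == "__main__":
    # Full marks except: no per-factor scores shown, half credit on benchmarks.
    awarded = dict(CHECKLIST)
    awarded["Per-factor scores shown"] = 0
    awarded["Treat benchmarks as directional"] = 3
    print(sum(CHECKLIST.values()))  # 100
    print(weighted_score(awarded))  # 0.89
```

Clamping keeps an over-generous grader from pushing one criterion past its weight; a stricter harness might instead reject out-of-range awards.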