Use when the user wants to audit a user journey, audit a signup/onboarding/checkout flow, do a UX audit, find the friction in a funnel, understand why users are dropping off or where they are being lost, or improve conversion in a web app — any diagnostic review of a multi-step, in-product flow. Use it whenever the user mentions drop-off, funnels, session replay, heatmaps, activation, time-to-value, cart or checkout abandonment, onboarding friction, or rage clicks, or wants to know where users struggle and what to fix first, even if they don't say "audit." Produces a severity-ranked, prioritized, experiment-validated improvement backlog via evidence-first intake, five parallel specialist lenses, and synthesis.
94
100%
Does it follow best practices?
Impact
72%
1.26xAverage score across 3 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent audits a checkout funnel by ranking leaks using absolute recoverable volume rather than drop-rate percentages, surfaces instrumentation and analytics data-trust caveats, correctly segments findings by device without overgeneralizing, flags uninstrumented gaps, uses structured finding blocks alongside (not instead of) a backlog table, and avoids asserting causes without supporting qualitative evidence.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Absolute volume calculation",
"description": "Agent computes or references the absolute number of sessions lost at each step (e.g., ~3,045 lost at Payment→Review, ~2,808 at Shipping→Payment, ~2,200 at Cart→Shipping) rather than comparing only percentage drop rates",
"max_score": 20
},
{
"name": "Correct top-priority step",
"description": "Agent ranks Payment→Review as the highest-priority step to fix, with reasoning tied to the absolute session volume lost at that step — not solely because it has the highest percentage drop rate",
"max_score": 10
},
{
"name": "Instrumentation sanity check",
"description": "Agent explicitly flags the need to verify analytics instrumentation before fully trusting the funnel numbers — e.g., mentions checking for duplicate event tags, bot/internal-IP filtering, staging events leaking into production, or correct funnel step ordering and windowing",
"max_score": 10
},
{
"name": "Analytics data-trust caveats",
"description": "Agent notes known GA4 accuracy limitations: explicitly mentions ad-blocker data loss (~25–40%) or consent-driven data loss (~50–55%), or notes that GA4 Blended/modeled figures are estimates rather than exact counts",
"max_score": 10
},
{
"name": "Mobile segment isolated",
"description": "Agent reports mobile (28%) and desktop (51%) Payment→Review conversion as distinct findings and uses the device split to inform the specific recommendation for that step — does NOT treat the aggregate 39% as the full picture",
"max_score": 10
},
{
"name": "Hidden drop-offs flagged",
"description": "Agent explicitly notes the possibility of uninstrumented drop-offs between visible funnel events (e.g., between payment submission and the review page load, or between the review page and the confirmation page) that would be invisible in the GA4 funnel",
"max_score": 10
},
{
"name": "Finding blocks with all 7 fields",
"description": "At least two findings in the report use a structured block with all seven labeled fields: Finding, Evidence, Why it matters, Fix, Validate, Severity, Journey step",
"max_score": 10
},
{
"name": "Backlog alongside finding blocks",
"description": "The report contains both a prioritized backlog table AND the detailed structured finding blocks — the backlog table exists as a summary layer alongside the finding blocks, not as a replacement for them",
"max_score": 5
},
{
"name": "Needs-research labeling",
"description": "Leaks or drop-offs for which the report lacks qualitative or behavioral corroboration are explicitly labeled as needing further investigation (e.g., 'needs research', 'cause unknown', 'recommend session replay review') rather than asserting a speculative cause as established fact",
"max_score": 5
},
{
"name": "What's Working Well section",
"description": "Report includes a named section or clearly labeled passage that highlights at least one specific funnel strength with supporting evidence (e.g., the 91% Review→Confirmation rate, or the strong returning-visitor conversion rate of 31%)",
"max_score": 5
},
{
"name": "Falsifiable hypothesis present",
"description": "At least one recommendation includes both a named validation method (A/B test, before-after, or ship-and-monitor) and a falsifiable hypothesis in the form 'changing [X] will improve [metric] for [segment] because [evidence]'",
"max_score": 5
}
]
}