Estimates implementation time for web development tasks (frontend and/or backend) by analyzing the existing codebase and calibrating for an AI coding agent as executor — not a human developer. Use when the user asks about effort, sizing, or feasibility: 'how long', 'how much work', 'estimate this', 'what is the effort', 'breakdown this task', 'can we do this in X days', 'is this a big task', 'how complex is', 'what's involved in', 'fits in the sprint', 'rough sizing', 't-shirt size', 'story points'. Also use when the user describes a feature and implicitly wants to know scope — e.g. 'we need to add X to the app', 'thinking about building Y', 'is this feasible by Friday'. Supports batch estimation from any structured source (BMAD output, spec folders, PRDs, backlogs, task lists) — use when the user mentions 'estimate the stories', 'estimate the epic', 'scan the backlog', 'estimate all tasks', 'estimate the specs', or points to a folder of task/story/spec files.
95
Does it follow best practices? 94%
Impact: 98% (1.40x), average score across 5 eval scenarios
Passed: No known issues
Structured estimation output format
Time ranges used: 100% | 100%
Sub-task decomposition: 100% | 100%
Sub-task granularity: 100% | 100%
Sub-tasks grouped by category: 0% | 100%
Confidence level stated: 0% | 100%
Top risk identified: 100% | 100%
Assumptions declared: 0% | 100%
T-shirt size assigned: 0% | 100%
Prior art acknowledged: 100% | 100%
Agent-calibrated times: 100% | 100%
Summary section present: 60% | 100%
Credit line present: 0% | 100%
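The criteria above name the parts a structured estimate is expected to contain: sub-tasks grouped by category, time ranges, a confidence level, a top risk, declared assumptions, and a T-shirt size. A minimal sketch of how such an estimate could be represented follows. The class names, field names, and T-shirt thresholds are hypothetical, chosen only for illustration, and are not taken from the skill itself.

```python
from dataclasses import dataclass, field


@dataclass
class SubTask:
    name: str
    category: str      # hypothetical grouping label, e.g. "backend"
    low_hours: float   # optimistic end of the time range
    high_hours: float  # pessimistic end of the time range


@dataclass
class Estimate:
    title: str
    subtasks: list[SubTask]
    confidence: str    # "low", "medium", or "high"
    top_risk: str
    assumptions: list[str] = field(default_factory=list)

    def total_range(self) -> tuple[float, float]:
        """Sum the low and high ends of every sub-task range."""
        low = sum(s.low_hours for s in self.subtasks)
        high = sum(s.high_hours for s in self.subtasks)
        return (low, high)

    def tshirt_size(self) -> str:
        """Map the pessimistic total to a size.

        The hour thresholds here are illustrative assumptions only.
        """
        _, high = self.total_range()
        for limit, size in ((2, "S"), (8, "M"), (24, "L")):
            if high <= limit:
                return size
        return "XL"


example = Estimate(
    title="Add CSV export",
    subtasks=[
        SubTask("Export endpoint", "backend", 1.0, 2.0),
        SubTask("Download button", "frontend", 2.0, 4.5),
    ],
    confidence="medium",
    top_risk="Large exports may need streaming",
    assumptions=["Existing auth middleware is reused"],
)
print(example.total_range(), example.tshirt_size())
```

Keeping the range as two numbers per sub-task, rather than a single point value, makes the "time ranges, not points" criterion straightforward to check.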
Greenfield correction factors
Greenfield acknowledged: 50% | 100%
Wider time ranges: 60% | 100%
Medium or low confidence: 0% | 100%
Five or more sub-tasks: 100% | 100%
Library selection risk noted: 12% | 75%
Stack detected as Python: 100% | 100%
Agent-calibrated times: 100% | 100%
T-shirt size L or larger: 0% | 100%
All required sections present: 62% | 100%
Time ranges not points: 100% | 100%
Top risk identified: 60% | 100%
Prior art assessed: 0% | 100%
Escalation and honesty rules
Does not match CEO timeline: 100% | 100%
Poor docs risk flagged: 100% | 100%
Escalation or clarification: 100% | 100%
Low or medium confidence: 50% | 100%
External API buffer applied: 100% | 100%
Time ranges used: 100% | 100%
Wide range spread: 100% | 62%
Sub-task decomposition: 100% | 100%
Stack detected as Go: 100% | 100%
T-shirt size M or larger: 83% | 100%
Top risk named: 100% | 100%
Assumptions listed: 50% | 100%
Codebase not read declaration
Codebase not read declared: 100% | 100%
Variance warning present: 100% | 100%
Low confidence assigned: 33% | 100%
Very wide ranges used: 100% | 100%
Stack unknown noted: 100% | 100%
Sub-task decomposition present: 100% | 100%
Assumptions made explicit: 75% | 100%
Top risk identified: 62% | 100%
T-shirt size assigned: 0% | 100%
Time ranges not points: 100% | 100%
Recommendation to read codebase: 100% | 80%
Batch estimation workflow
Consolidated summary table: 66% | 100%
All five features estimated: 80% | 100%
Per-task T-shirt size: 25% | 100%
Grand total provided: 62% | 100%
Implementation order suggested: 100% | 100%
Cross-task dependencies noted: 90% | 100%
Dark mode sized smallest: 75% | 50%
Permissions sized largest: 87% | 100%
Time ranges not points: 0% | 100%
Stack detected as TypeScript: 100% | 100%
Per-task risk or confidence: 100% | 100%
Shared assumptions listed: 25% | 100%