Estimates implementation time for web development tasks (frontend and/or backend) by analyzing the existing codebase and calibrating for an AI coding agent as executor — not a human developer. Use when the user asks about effort, sizing, or feasibility: 'how long', 'how much work', 'estimate this', 'what is the effort', 'breakdown this task', 'can we do this in X days', 'is this a big task', 'how complex is', 'what's involved in', 'fits in the sprint', 'rough sizing', 't-shirt size', 'story points'. Also use when the user describes a feature and implicitly wants to know scope — e.g. 'we need to add X to the app', 'thinking about building Y', 'is this feasible by Friday'. Supports batch estimation from any structured source (BMAD output, spec folders, PRDs, backlogs, task lists) — use when the user mentions 'estimate the stories', 'estimate the epic', 'scan the backlog', 'estimate all tasks', 'estimate the specs', or points to a folder of task/story/spec files.
Produces structured, honest time estimates for web development tasks. Calibrated for Claude Code as the executor, not a human developer.
AI coding agents are fast at mechanical work and slow at ambiguous work. Always read the codebase before estimating — agents that estimate from description alone systematically miss prior art and produce 2–3× wider ranges.
For calibration multipliers and detailed rationale → read references/calibration.md
For common patterns and reference times → read references/patterns.md
For honesty rules and escalation thresholds → read references/honesty-rules.md
Before reading anything else, detect what kind of project this is:
```bash
# Detect stack
find . -maxdepth 2 \( \
  -name "package.json" \
  -o -name "pyproject.toml" \
  -o -name "go.mod" \
  -o -name "Gemfile" \
  -o -name "composer.json" \
  -o -name "Cargo.toml" \
  \) 2>/dev/null | grep -v node_modules | head -10

# Detect monorepo
ls -d */ 2>/dev/null | grep -E "^(apps|packages|services|libs|frontend|backend|api)/" | head -10
```

Map the result to one of: `typescript-fullstack` | `typescript-frontend` | `node-backend` | `python-backend` | `go-backend` | `ruby-backend` | `php-backend` | `rust-backend` | `monorepo` | `unknown`.
If unknown or monorepo, note it and proceed with caution — add +30% uncertainty buffer.
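The manifest-to-label mapping can be sketched as a small shell helper. This is a heuristic under stated assumptions (for example, treating `package.json` plus `tsconfig.json` as full-stack TypeScript), not the only valid mapping:

```shell
# Heuristic sketch: map detected manifests in a directory to one stack label.
# The tsconfig.json check is an assumption; adjust to your repo's conventions.
detect_stack() {
  dir=$1
  if [ -f "$dir/package.json" ] && [ -f "$dir/tsconfig.json" ]; then
    echo typescript-fullstack
  elif [ -f "$dir/package.json" ]; then
    echo node-backend
  elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/requirements.txt" ]; then
    echo python-backend
  elif [ -f "$dir/go.mod" ]; then
    echo go-backend
  elif [ -f "$dir/Gemfile" ]; then
    echo ruby-backend
  elif [ -f "$dir/Cargo.toml" ]; then
    echo rust-backend
  else
    echo unknown
  fi
}

# Demo on a throwaway directory so the result is deterministic
demo=$(mktemp -d)
touch "$demo/go.mod"
detect_stack "$demo"   # prints: go-backend
rm -rf "$demo"
```

A real detector would also need the frontend/backend split and monorepo layouts; the point is only that the label should come from files on disk, not from the task description.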
Run the reads appropriate to the detected stack. Estimating without reading the codebase leads to generic guesses — the multipliers in this skill only work when grounded in actual project structure and prior art.
For TypeScript / JavaScript projects:
```bash
# Structure and dependencies
cat package.json 2>/dev/null | python3 -c "
import json,sys
d=json.load(sys.stdin)
deps={**d.get('dependencies',{}), **d.get('devDependencies',{})}
print(json.dumps(list(deps.keys()), indent=2))
" 2>/dev/null | head -50

# Source layout
find . -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \) \
  | grep -v node_modules | grep -v dist | grep -v .next | grep -v coverage \
  | head -60

# Existing patterns closest to the task
ls src/components/ src/pages/ src/app/ src/features/ 2>/dev/null | head -30
ls server/ backend/ api/ services/ controllers/ routes/ 2>/dev/null | head -30
```

For Python projects:
```bash
{ cat pyproject.toml 2>/dev/null || cat requirements.txt 2>/dev/null; } | head -30
find . -name "*.py" | grep -v __pycache__ | grep -v .venv | head -60
ls app/ src/ api/ routers/ models/ services/ 2>/dev/null | head -30
```

For Go projects:
```bash
cat go.mod 2>/dev/null | head -20
find . -name "*.go" | grep -v vendor | head -60
ls cmd/ internal/ pkg/ api/ handlers/ 2>/dev/null | head -30
```

All stacks — find prior art for the specific task:
```bash
# Replace KEYWORD with a term from the task description
grep -r "KEYWORD" . --include="*.ts" --include="*.tsx" --include="*.py" --include="*.go" \
  -l 2>/dev/null | grep -v node_modules | grep -v .git | head -10
```

Read 2–4 representative files most similar to the task — this is what makes the "prior art" dimension reliable.
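A minimal skim helper for that read (the file names in the demo call are hypothetical placeholders; in practice, pass the grep hits from the prior-art search):

```shell
# Sketch: skim the head of each prior-art candidate before estimating.
skim() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      echo "== $f =="
      sed -n '1,80p' "$f"   # the first ~80 lines usually reveal the pattern in use
    else
      echo "missing: $f"
    fi
  done
}

# Placeholder paths for illustration only
skim src/features/export/ExportButton.tsx src/api/export.ts
```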
If the codebase cannot be read (e.g., path not provided, no filesystem access), state this explicitly and add 2–3× uncertainty to all estimates.
Determine:
| Dimension | Options |
|---|---|
| Layer | Frontend only / Backend only / Full-stack / Infrastructure |
| Type | Feature / Bug fix / Refactor / Integration / Performance / Migration |
| Prior art | High (similar pattern exists) / Medium (partial) / Low (greenfield) |
| Spec clarity | Clear / Partial / Vague |
| External deps | None / Internal service / 3rd-party API / Auth / DB migration |
Break the work into concrete sub-tasks of 5–120 agent-minutes each. Never leave a block >2 hours without decomposing further.
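The granularity rule can be enforced mechanically. A sketch over hypothetical sub-task minutes:

```shell
# Sketch: flag sub-tasks outside the 5-120 agent-minute window (data is hypothetical).
printf '%s\n' \
  "scaffold-form 30" \
  "wire-api 90" \
  "full-db-migration 180" |
awk '$2 < 5 || $2 > 120 { print "DECOMPOSE FURTHER: " $1 " (" $2 "m)" }'
# prints: DECOMPOSE FURTHER: full-db-migration (180m)
```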
Group by: Frontend / Backend / Tests / Config & Infrastructure / Other.
For each sub-task, apply the multiplier table in `references/calibration.md`.

Read `references/patterns.md` to cross-check sub-task times against known patterns.
Read references/honesty-rules.md before writing any number in the output.
Before producing the final output, verify internal consistency: sub-task times must sum to the stated total, and sizes must match the reference scale in `references/patterns.md`. Catching errors here avoids publishing an estimate that contradicts itself. If any check fails, fix the estimate before proceeding.
Use this exact format. Do not abbreviate or skip sections.
## Estimation: [Task name]
**Summary**
- Total (agent): X–Y hours
- Confidence: High / Medium / Low
- Main uncertainty: [one sentence — what could most change this estimate]
- Stack detected: [what was found]
---
### Sub-tasks
#### Frontend
| Task | Agent time | Prior art | Notes |
|---|---|---|---|
| [sub-task] | Xh–Yh | High/Med/Low | [risk or pattern note] |
#### Backend
| Task | Agent time | Prior art | Notes |
|---|---|---|---|
#### Tests
| Task | Agent time | Prior art | Notes |
|---|---|---|---|
#### Config / Infrastructure
| Task | Agent time | Prior art | Notes |
|---|---|---|---|
---
### Assumptions
- [Every assumption made. If any is wrong, the estimate is wrong.]
### Blockers / Clarifications needed
- [Any question that, if unanswered, makes the estimate unreliable.]
### Agent-specific risks
- [Anything that slows agents specifically: missing docs, unclear contracts, heavy shared state, lack of types, etc.]
---
### T-Shirt Size
| Size | Agent time |
|---|---|
| This task | **[XS/S/M/L/XL/XXL]** — [X–Y hours] |
*Reference scale in `references/patterns.md`*
---
*Estimated with [web-dev-estimation](https://github.com/ecappa/web-dev-estimation) — agent-calibrated skill by [Eric Cappannelli](https://www.linkedin.com/in/ecappannelli/)*

This skill supports several invocation patterns:
- **Automatic** — Claude loads this skill when it detects estimation intent in the conversation.
- **Direct** — `/estimate [task description]` invokes the skill immediately. When invoked directly with a task description, skip Step 0 confirmation and go straight to codebase reads.
- **Re-estimate** — When scope changes mid-conversation ("actually, skip the export feature"), apply a delta estimate: show only the removed/added sub-tasks and the revised total. Do not re-run the full workflow.
- **Batch** — Estimate multiple tasks at once from any structured source. See the Batch Estimation workflow below.
When the user asks to estimate a set of tasks, use this workflow instead of running the single-task flow N times. This avoids redundant codebase reads and produces a consolidated summary table.
The batch mode accepts any structured source of tasks. Auto-detect the format by scanning the input:
| Source type | How to detect | What to extract |
|---|---|---|
| Directory of files (specs, stories, tasks) | User points to a folder, or scan common locations | One task per file: title from H1/filename, description, acceptance criteria |
| BMAD output (`_bmad-output/stories/`) | `story-*.md` files with structured frontmatter | Title, description, acceptance criteria, technical notes |
| Single document with sections (PRD, epic, RFC) | One file with H2/H3 sections per feature/task | One task per section: heading = title, body = description |
| Markdown checklist / backlog | A file with - [ ] items or numbered list | One task per item: the item text is the description |
| User-provided list in chat | User types or pastes a list of tasks | One task per line/bullet |
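The detection column above can be approximated with a few shell heuristics (a sketch; the thresholds and patterns are assumptions, and the table remains the authoritative definition):

```shell
# Sketch: classify a batch source. Heuristics only.
classify_source() {
  src=$1
  if [ -d "$src" ]; then
    echo directory-of-files
  elif grep -q '^- \[ \]' "$src" 2>/dev/null; then
    echo markdown-checklist
  elif grep -q '^## ' "$src" 2>/dev/null; then
    echo sectioned-document
  else
    echo chat-list
  fi
}

demo=$(mktemp)
printf -- '- [ ] add export button\n- [ ] fix login bug\n' > "$demo"
classify_source "$demo"   # prints: markdown-checklist
```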
```bash
# Check common locations
ls _bmad-output/stories/ docs/stories/ docs/specs/ epics/ stories/ specs/ tasks/ 2>/dev/null | head -30

# Or find markdown files that look like task definitions
find . -maxdepth 3 -name "*.md" \( -path "*/stories/*" -o -path "*/specs/*" -o -path "*/tasks/*" \) 2>/dev/null | head -20
```

If the user provides a path, file, or list directly, use that instead of scanning.
For each task, extract whatever is available:
Tasks with only a title and no description get a "Vague" spec clarity rating automatically.
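This auto-downgrade can be detected mechanically. A sketch, where "body" means any non-blank, non-heading line after the title (a crude heuristic that treats every `#` line as a heading):

```shell
# Sketch: flag task files that have a title but no body, to auto-rate them "Vague".
is_title_only() {
  # count non-blank lines that are not headings
  body_lines=$(grep -cv -e '^#' -e '^[[:space:]]*$' "$1" || true)
  [ "$body_lines" -eq 0 ]
}

demo=$(mktemp)
printf '# Add export button\n' > "$demo"
is_title_only "$demo" && echo "auto-rate: Vague (title only)"
# prints: auto-rate: Vague (title only)
```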
## Batch Estimation: [Project / Epic / Sprint name]
**Summary**
- Tasks scanned: N
- Total (agent): X–Y hours
- Overall T-shirt size: [XS/S/M/L/XL/XXL]
- Confidence: High / Medium / Low
- Stack detected: [what was found]
- Source: [what was scanned — e.g. "_bmad-output/stories/", "docs/specs/", "user-provided list"]
---
### Task estimates
| # | Task | Layer | Agent time | Size | Confidence | Top risk |
|---|---|---|---|---|---|---|
| 1 | [Task title] | FE | Xh–Yh | S | High | [one-liner] |
| 2 | [Task title] | BE | Xh–Yh | M | Med | [one-liner] |
| 3 | [Task title] | Full | Xh–Yh | L | Low | [one-liner] |
| ... | | | | | | |
| | **Total** | | **Xh–Yh** | **[size]** | | |
---
### Tasks requiring clarification
These tasks have vague or missing specs. Estimates are unreliable until clarified:
- [Task title]: [specific question or what's missing]
### Shared assumptions
- [Assumptions that apply across all tasks — e.g. "Auth system already exists", "Using existing DB schema"]
### Cross-task dependencies
- [Task A] must complete before [Task B] because [reason]
### Suggested implementation order
1. [Task] — [why first: foundation, dependency, risk reduction]
2. [Task] — [why next]
...
---
*Estimated with [web-dev-estimation](https://github.com/ecappa/web-dev-estimation) — agent-calibrated skill by [Eric Cappannelli](https://www.linkedin.com/in/ecappannelli/)*