Name: cappasoft/web-dev-estimation
Rating: 95.6 (1 review)
Author: cappasoft

Estimates implementation time for web development tasks (frontend and/or backend) by analyzing the existing codebase and calibrating for an AI coding agent as executor — not a human developer. Use when the user asks about effort, sizing, or feasibility: 'how long', 'how much work', 'estimate this', 'what is the effort', 'breakdown this task', 'can we do this in X days', 'is this a big task', 'how complex is', 'what's involved in', 'fits in the sprint', 'rough sizing', 't-shirt size', 'story points'. Also use when the user describes a feature and implicitly wants to know scope — e.g. 'we need to add X to the app', 'thinking about building Y', 'is this feasible by Friday'. Supports batch estimation from any structured source (BMAD output, spec folders, PRDs, backlogs, task lists) — use when the user mentions 'estimate the stories', 'estimate the epic', 'scan the backlog', 'estimate all tasks', 'estimate the specs', or points to a folder of task/story/spec files.

Quality: 94% (Does it follow best practices?)
Impact: 98%, 1.40x (Average score across 5 eval scenarios)
Security: Passed (No known issues)

{
  "context": "Tests whether the agent applies greenfield correction factors when no prior art exists for the requested feature, uses appropriate calibration multipliers for a Python stack, and produces wider ranges reflecting higher uncertainty.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Greenfield acknowledged",
      "description": "Estimate explicitly notes that the codebase has no existing chart/visualization library or similar pattern, and that this pushes the estimate upward",
      "max_score": 12
    },
    {
      "name": "Wider time ranges",
      "description": "Time ranges are noticeably wider than those for a typical well-defined task (reflecting greenfield uncertainty), with the upper bound at least 50% above the lower bound for major sub-tasks",
      "max_score": 10
    },
    {
      "name": "Medium or low confidence",
      "description": "Confidence is set to Medium or Low (not High), reflecting the greenfield nature and absence of prior patterns",
      "max_score": 10
    },
    {
      "name": "Five or more sub-tasks",
      "description": "Work is decomposed into at least 5 distinct sub-tasks covering library setup, data endpoints, chart rendering, filters, and export",
      "max_score": 10
    },
    {
      "name": "Library selection risk noted",
      "description": "Estimate mentions that selecting and integrating a new chart library (with no existing pattern to follow) is a risk or driver of uncertainty",
      "max_score": 8
    },
    {
      "name": "Stack detected as Python",
      "description": "Output identifies the stack as Python/FastAPI (not TypeScript or another framework)",
      "max_score": 8
    },
    {
      "name": "Agent-calibrated times",
      "description": "Sub-task times are on the agent scale (minutes to hours), not the human-developer scale (days to weeks)",
      "max_score": 8
    },
    {
      "name": "T-shirt size L or larger",
      "description": "T-shirt size is L, XL, or XXL, consistent with a greenfield multi-component feature",
      "max_score": 8
    },
    {
      "name": "All required sections present",
      "description": "Output includes Summary, Sub-tasks table, Assumptions, Risks, and T-shirt size sections",
      "max_score": 8
    },
    {
      "name": "Time ranges not points",
      "description": "All time estimates use ranges rather than single point values",
      "max_score": 8
    },
    {
      "name": "Top risk identified",
      "description": "A specific top risk is named explaining what could most increase the estimate",
      "max_score": 5
    },
    {
      "name": "Prior art assessed",
      "description": "Estimate explicitly assesses prior art level (noting it as Low or none for the visualization component)",
      "max_score": 5
    }
  ]
}
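
The listing does not say how a "weighted_checklist" is turned into a single score. A minimal sketch of one plausible aggregation follows, assuming each criterion is awarded between 0 and its max_score and the result is reported as a percentage of the total. The names `MAX_SCORES`, `weighted_total`, and `range_is_wide` are hypothetical and are not part of the skill or its eval harness. The second helper illustrates the "upper bound at least 50% above the lower bound" rule from the "Wider time ranges" criterion.

```python
from typing import Mapping

# max_score values copied from the checklist above; they sum to 100.
MAX_SCORES = {
    "Greenfield acknowledged": 12,
    "Wider time ranges": 10,
    "Medium or low confidence": 10,
    "Five or more sub-tasks": 10,
    "Library selection risk noted": 8,
    "Stack detected as Python": 8,
    "Agent-calibrated times": 8,
    "T-shirt size L or larger": 8,
    "All required sections present": 8,
    "Time ranges not points": 8,
    "Top risk identified": 5,
    "Prior art assessed": 5,
}


def weighted_total(awarded: Mapping[str, float]) -> float:
    """Return the percentage of available points earned.

    Assumed aggregation (not confirmed by the source): sum the awarded
    points, clamp each to [0, max_score], and divide by the total.
    """
    total_max = sum(MAX_SCORES.values())
    earned = 0.0
    for name, max_score in MAX_SCORES.items():
        # A criterion missing from `awarded` counts as 0 points.
        earned += min(max(awarded.get(name, 0), 0), max_score)
    return 100.0 * earned / total_max


def range_is_wide(low_minutes: float, high_minutes: float) -> bool:
    """True when the upper bound is at least 50% above the lower bound."""
    return low_minutes > 0 and high_minutes >= 1.5 * low_minutes
```

For example, an output that earns full marks on every criterion except "Medium or low confidence" would score 90 under this scheme, and a sub-task estimated at 40 to 60 minutes would just satisfy the range-width rule.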