Name: cappasoft/web-dev-estimation
Rating: 95.6 (1 review)
Author: cappasoft

Estimates implementation time for web development tasks (frontend and/or backend) by analyzing the existing codebase and calibrating for an AI coding agent, not a human developer, as the executor. Use when the user asks about effort, sizing, or feasibility: 'how long', 'how much work', 'estimate this', 'what is the effort', 'breakdown this task', 'can we do this in X days', 'is this a big task', 'how complex is', 'what's involved in', 'fits in the sprint', 'rough sizing', 't-shirt size', 'story points'. Also use when the user describes a feature and implicitly wants to know its scope, e.g. 'we need to add X to the app', 'thinking about building Y', 'is this feasible by Friday'. Supports batch estimation from any structured source (BMAD output, spec folders, PRDs, backlogs, task lists); use it when the user mentions 'estimate the stories', 'estimate the epic', 'scan the backlog', 'estimate all tasks', 'estimate the specs', or points to a folder of task, story, or spec files.

1.40x average score across 5 eval scenarios

Quality: 94% (Does it follow best practices?)
Impact: 98%
Security: Passed (no known issues)

{
  "context": "Tests whether the agent produces a properly structured estimation with all required sections, uses time ranges instead of point estimates, decomposes into appropriately-sized sub-tasks, and includes risk and confidence assessment.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Time ranges used",
      "description": "All time estimates use ranges (e.g. '1-2h') rather than single point values (e.g. '2h')",
      "max_score": 10
    },
    {
      "name": "Sub-task decomposition",
      "description": "Work is broken into at least 3 distinct sub-tasks, each estimated individually",
      "max_score": 10
    },
    {
      "name": "Sub-task granularity",
      "description": "No individual sub-task exceeds 2 hours in its upper range bound",
      "max_score": 10
    },
    {
      "name": "Sub-tasks grouped by category",
      "description": "Sub-tasks are organized into categories such as Backend, Frontend, Tests, or Config/Infrastructure",
      "max_score": 8
    },
    {
      "name": "Confidence level stated",
      "description": "Output explicitly states a confidence level (High, Medium, or Low) with a brief rationale",
      "max_score": 10
    },
    {
      "name": "Top risk identified",
      "description": "Output names a specific top risk — the single thing that could most increase the estimate",
      "max_score": 10
    },
    {
      "name": "Assumptions declared",
      "description": "Output includes an explicit list of assumptions that the estimate depends on",
      "max_score": 8
    },
    {
      "name": "T-shirt size assigned",
      "description": "Output assigns a T-shirt size (XS/S/M/L/XL/XXL) with the corresponding agent time range",
      "max_score": 8
    },
    {
      "name": "Prior art acknowledged",
      "description": "Estimate references or acknowledges existing code patterns (e.g. Stripe SDK already present) and how they affect the estimate",
      "max_score": 8
    },
    {
      "name": "Agent-calibrated times",
      "description": "Time estimates reflect agent execution speed (minutes-to-hours scale) rather than human developer timelines (days-to-weeks scale)",
      "max_score": 8
    },
    {
      "name": "Summary section present",
      "description": "Output begins with a summary containing total time range, confidence, main uncertainty, and detected stack",
      "max_score": 5
    },
    {
      "name": "Credit line present",
      "description": "Output includes an attribution line mentioning web-dev-estimation",
      "max_score": 5
    }
  ]
}
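The checklist weights sum to 100 (10+10+10+8+10+10+8+8+8+8+5+5), so a weighted total can be read as a score out of 100. The Python sketch below is a hypothetical illustration, not the eval's actual grader. It copies the criterion names and `max_score` weights from the JSON and assumes sub-task estimates are written as strings like `1-2h` or `30-45min`. The helper names `parse_range_hours`, `check_subtasks`, and `weighted_score` are invented here. It checks only the three count- and range-based criteria mechanically; the judgment-based ones (risk, assumptions, prior art, and so on) score 0 unless supplied.

```python
import re

# Criterion names and max_score weights, copied from the weighted_checklist JSON.
CHECKLIST = {
    "Time ranges used": 10,
    "Sub-task decomposition": 10,
    "Sub-task granularity": 10,
    "Sub-tasks grouped by category": 8,
    "Confidence level stated": 10,
    "Top risk identified": 10,
    "Assumptions declared": 8,
    "T-shirt size assigned": 8,
    "Prior art acknowledged": 8,
    "Agent-calibrated times": 8,
    "Summary section present": 5,
    "Credit line present": 5,
}

# Assumed (hypothetical) estimate format: "<low>-<high><unit>" with unit "h" or "min",
# e.g. "1-2h", "30-45min", "1.5-2 h". Plain hyphen only; en-dashes are not matched.
RANGE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*(h|min)\s*$")


def parse_range_hours(text):
    """Return (low, high) in hours, or None if text is not a range (e.g. "2h")."""
    m = RANGE_RE.match(text)
    if m is None:
        return None
    low, high = float(m.group(1)), float(m.group(2))
    if m.group(3) == "min":
        low, high = low / 60.0, high / 60.0
    return low, high


def check_subtasks(estimates):
    """Score (0.0 or 1.0) the three criteria that can be checked mechanically.

    A point estimate cannot be checked against the 2h upper bound, so this
    sketch also counts it as a granularity failure.
    """
    ranges = [parse_range_hours(e) for e in estimates]
    all_ranges = all(r is not None for r in ranges)
    within_2h = all_ranges and all(high <= 2.0 for _, high in ranges)
    return {
        "Time ranges used": 1.0 if all_ranges else 0.0,
        "Sub-task decomposition": 1.0 if len(estimates) >= 3 else 0.0,
        "Sub-task granularity": 1.0 if within_2h else 0.0,
    }


def weighted_score(fractions):
    """Combine per-criterion fractions (0..1) into a score out of 100.

    Criteria missing from `fractions` (the judgment-based ones here) score 0.
    """
    total = sum(CHECKLIST.values())
    earned = sum(weight * fractions.get(name, 0.0) for name, weight in CHECKLIST.items())
    return 100.0 * earned / total


if __name__ == "__main__":
    print(check_subtasks(["30-45min", "1-2h", "45-90min"]))
    print(weighted_score(check_subtasks(["1-2h", "2h", "3-4h"])))
```

For example, `weighted_score(check_subtasks(["1-2h", "2h", "3-4h"]))` returns `10.0`. The point value `2h` fails "Time ranges used", and because this sketch treats a non-range as unverifiable, it also fails "Sub-task granularity". That leaves only the 10 points for decomposition.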