Name: cappasoft/web-dev-estimation
Rating: 95.6 (1 review)
Author: cappasoft

Estimates implementation time for web development tasks (frontend and/or backend) by analyzing the existing codebase and calibrating for an AI coding agent as executor — not a human developer. Use when the user asks about effort, sizing, or feasibility: 'how long', 'how much work', 'estimate this', 'what is the effort', 'breakdown this task', 'can we do this in X days', 'is this a big task', 'how complex is', 'what's involved in', 'fits in the sprint', 'rough sizing', 't-shirt size', 'story points'. Also use when the user describes a feature and implicitly wants to know scope — e.g. 'we need to add X to the app', 'thinking about building Y', 'is this feasible by Friday'. Supports batch estimation from any structured source (BMAD output, spec folders, PRDs, backlogs, task lists) — use when the user mentions 'estimate the stories', 'estimate the epic', 'scan the backlog', 'estimate all tasks', 'estimate the specs', or points to a folder of task/story/spec files.

Quality: 94% (Does it follow best practices?)
Impact: 98%, 1.40x (average score across 5 eval scenarios)
Security: Passed (no known issues)

{
  "context": "Tests whether the agent properly declares that the codebase was not read, applies appropriate uncertainty multipliers, reduces confidence accordingly, and provides honest caveats about estimate reliability when no project files are available.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Codebase not read declared",
      "description": "Output explicitly states that the codebase was not read or that no project files were available, using a warning or caveat",
      "max_score": 15
    },
    {
      "name": "Variance warning present",
      "description": "Output warns that actual time may differ by 2-3x or more once the codebase is analyzed",
      "max_score": 12
    },
    {
      "name": "Low confidence assigned",
      "description": "Confidence level is set to Low (not Medium or High), reflecting the lack of codebase access",
      "max_score": 12
    },
    {
      "name": "Very wide ranges used",
      "description": "Time ranges have at least ±60% spread or wider, reflecting high uncertainty without codebase context",
      "max_score": 10
    },
    {
      "name": "Stack unknown noted",
      "description": "Output acknowledges that the stack is unknown or undetected, and notes this increases uncertainty",
      "max_score": 10
    },
    {
      "name": "Sub-task decomposition present",
      "description": "Work is still broken into sub-tasks despite the lack of codebase access",
      "max_score": 8
    },
    {
      "name": "Assumptions made explicit",
      "description": "Output lists assumptions about the stack, existing infrastructure, or patterns that could change the estimate",
      "max_score": 8
    },
    {
      "name": "Top risk identified",
      "description": "A specific top risk is named, likely related to the unknown codebase or stack",
      "max_score": 8
    },
    {
      "name": "T-shirt size assigned",
      "description": "A T-shirt size is provided with the caveat that it may shift once the codebase is reviewed",
      "max_score": 7
    },
    {
      "name": "Time ranges not points",
      "description": "All time values are expressed as ranges, not single point estimates",
      "max_score": 5
    },
    {
      "name": "Recommendation to read codebase",
      "description": "Output recommends sharing the codebase or project access for a more reliable estimate",
      "max_score": 5
    }
  ]
}
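
One way a `weighted_checklist` like the one above could be turned into a single score is sketched below in Python. This is an illustration only, not the evaluator's actual implementation: the function names `weighted_score` and `relative_spread` are hypothetical, the clamping behavior is an assumption, and reading "±60% spread" as the half-width of a range relative to its midpoint is one plausible interpretation. The criterion names and `max_score` values are copied from the checklist.

```python
# Criterion names and max_score values copied from the checklist above.
RUBRIC = {
    "Codebase not read declared": 15,
    "Variance warning present": 12,
    "Low confidence assigned": 12,
    "Very wide ranges used": 10,
    "Stack unknown noted": 10,
    "Sub-task decomposition present": 8,
    "Assumptions made explicit": 8,
    "Top risk identified": 8,
    "T-shirt size assigned": 7,
    "Time ranges not points": 5,
    "Recommendation to read codebase": 5,
}


def weighted_score(awarded: dict[str, float], rubric: dict[str, int] = RUBRIC) -> float:
    """Return the percentage of available points earned (0 to 100).

    Hypothetical helper: assumes each award is clamped to [0, max_score]
    and that criteria missing from `awarded` earn zero points.
    """
    total = sum(rubric.values())
    earned = 0.0
    for name, max_score in rubric.items():
        earned += min(max(awarded.get(name, 0), 0), max_score)
    return 100.0 * earned / total


def relative_spread(low: float, high: float) -> float:
    """Half-width of a range relative to its midpoint.

    Assumed reading of the "±60% spread" criterion: a 2-8 hour range has
    midpoint 5 and half-width 3, so its spread is 0.6.
    """
    mid = (low + high) / 2
    return (high - low) / (2 * mid)
```

The eleven `max_score` values sum to 100, so under these assumptions an output that satisfies every criterion except "Low confidence assigned" would score 88. A range check such as `relative_spread(low, high) >= 0.6` could then feed the "Very wide ranges used" criterion.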