Name: cappasoft/web-dev-estimation
Rating: 95.6 (1 reviews)
Author: cappasoft

cappasoft/web-dev-estimation

Estimates implementation time for web development tasks (frontend and/or backend) by analyzing the existing codebase and calibrating for an AI coding agent as executor — not a human developer. Use when the user asks about effort, sizing, or feasibility: 'how long', 'how much work', 'estimate this', 'what is the effort', 'breakdown this task', 'can we do this in X days', 'is this a big task', 'how complex is', 'what's involved in', 'fits in the sprint', 'rough sizing', 't-shirt size', 'story points'. Also use when the user describes a feature and implicitly wants to know scope — e.g. 'we need to add X to the app', 'thinking about building Y', 'is this feasible by Friday'. Supports batch estimation from any structured source (BMAD output, spec folders, PRDs, backlogs, task lists) — use when the user mentions 'estimate the stories', 'estimate the epic', 'scan the backlog', 'estimate all tasks', 'estimate the specs', or points to a folder of task/story/spec files.

1.40x

Quality

94%

Does it follow best practices?

Impact

98%

1.40x

Average score across 5 eval scenarios

Securityby

Passed

No known issues

{
  "context": "Tests whether the agent applies honesty rules when facing poor documentation, identifies escalation triggers, does not compress estimates to match unrealistic stakeholder expectations, and flags high-risk factors appropriately.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Does not match CEO timeline",
      "description": "Estimate does NOT compress to 'a couple hours' — the total exceeds the CEO's stated expectation, with honest reasoning for why",
      "max_score": 12
    },
    {
      "name": "Poor docs risk flagged",
      "description": "Estimate identifies the incomplete PDF documentation (missing appendix, no SDK, no sandbox) as a significant risk or uncertainty driver",
      "max_score": 12
    },
    {
      "name": "Escalation or clarification",
      "description": "Output includes specific questions or clarifications needed before the estimate can be considered reliable (e.g. missing appendix, no sandbox, authentication details)",
      "max_score": 10
    },
    {
      "name": "Low or medium confidence",
      "description": "Confidence level is Low or Medium, not High, reflecting the documentation gaps and external dependency",
      "max_score": 10
    },
    {
      "name": "External API buffer applied",
      "description": "Time estimates are visibly inflated compared to a well-documented integration, reflecting the poor documentation quality",
      "max_score": 8
    },
    {
      "name": "Time ranges used",
      "description": "All time estimates use ranges rather than single point values",
      "max_score": 8
    },
    {
      "name": "Wide range spread",
      "description": "Range spread is at least ±40% on major sub-tasks, reflecting the high uncertainty",
      "max_score": 8
    },
    {
      "name": "Sub-task decomposition",
      "description": "Work is broken into at least 4 distinct sub-tasks (e.g. API client, polling, reconciliation, push updates, tests)",
      "max_score": 8
    },
    {
      "name": "Stack detected as Go",
      "description": "Output identifies the stack as Go (not TypeScript, Python, etc.)",
      "max_score": 6
    },
    {
      "name": "T-shirt size M or larger",
      "description": "T-shirt size reflects substantial effort (M, L, or XL), not XS or S",
      "max_score": 6
    },
    {
      "name": "Top risk named",
      "description": "A specific top risk is identified relating to the undocumented API or missing specification details",
      "max_score": 6
    },
    {
      "name": "Assumptions listed",
      "description": "Output explicitly lists assumptions that the estimate depends on",
      "max_score": 6
    }
  ]
}