
cappasoft/web-dev-estimation

Estimates implementation time for web development tasks (frontend and/or backend) by analyzing the existing codebase and calibrating for an AI coding agent as executor — not a human developer. Use when the user asks about effort, sizing, or feasibility: 'how long', 'how much work', 'estimate this', 'what is the effort', 'breakdown this task', 'can we do this in X days', 'is this a big task', 'how complex is', 'what's involved in', 'fits in the sprint', 'rough sizing', 't-shirt size', 'story points'. Also use when the user describes a feature and implicitly wants to know scope — e.g. 'we need to add X to the app', 'thinking about building Y', 'is this feasible by Friday'. Supports batch estimation from any structured source (BMAD output, spec folders, PRDs, backlogs, task lists) — use when the user mentions 'estimate the stories', 'estimate the epic', 'scan the backlog', 'estimate all tasks', 'estimate the specs', or points to a folder of task/story/spec files.


# Agent Skill Audit Guide

A scoring rubric for auditing any Agent Skill against the official specification and best practices. This guide is designed to be read by an AI agent performing the audit.


## Official Sources

All criteria below are derived from these authoritative sources. Cite them when reporting findings.

| Source | URL | What it covers |
|---|---|---|
| Agent Skills Specification | https://agentskills.io/specification | Formal spec: frontmatter fields, constraints, directory structure, progressive disclosure |
| Anthropic Best Practices | https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices | Authoring principles, degrees of freedom, progressive disclosure patterns, feedback loops, anti-patterns |
| Anthropic Engineering Blog | https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills | Architecture, context window mechanics, code execution model |
| Anthropic Blog (Product) | https://claude.com/blog/skills | Product overview, composability, portability |
| skill-creator SKILL.md | https://github.com/anthropics/skills/blob/main/skills/skill-creator/SKILL.md | Meta-skill: how Anthropic builds skills, description optimization, eval workflow, writing style |
| Anthropic Skills Repo | https://github.com/anthropics/skills | Official examples, template, spec |
| Cursor Docs: Skills | https://cursor.com/docs/skills | Cursor-specific skill loading, directories, frontmatter fields |

## Audit Procedure

### Step 1 — Read the skill

Read ALL files in the skill directory. For each file, note:

- File path and line count
- Whether it's referenced from SKILL.md
- Whether it's a reference file, script, asset, or support file

### Step 2 — Score each criterion

For every criterion below, assign one of:

- **Pass** — Fully meets the requirement
- **Partial** — Meets some but not all aspects
- **Fail** — Does not meet the requirement
- **N/A** — Not applicable to this skill

Provide a one-sentence justification for each score.

### Step 3 — Calculate the score

Each criterion has a weight (1–3). Multiply:

- Pass = weight × 1
- Partial = weight × 0.5
- Fail = weight × 0
- N/A = excluded from the total

Final score = sum of scores / sum of applicable weights × 100.
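
The weighted calculation above can be sketched as a small helper. `final_score` and its verdict map are illustrative names for this guide, not part of any official tooling:

```python
# Verdict multipliers from Step 3.
MULTIPLIERS = {"Pass": 1.0, "Partial": 0.5, "Fail": 0.0}

def final_score(results):
    """results: list of (weight, verdict) pairs. N/A entries are excluded."""
    applicable = [(w, v) for w, v in results if v != "N/A"]
    if not applicable:
        return None  # every criterion was N/A
    earned = sum(w * MULTIPLIERS[v] for w, v in applicable)
    total = sum(w for w, _ in applicable)
    return round(earned / total * 100, 1)
```

For example, a Pass on a weight-3 criterion, a Partial on weight-2, a Fail on weight-1, and one N/A yields 4 of 6 applicable points, i.e. a score of 66.7.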

### Step 4 — Produce the report

Use the output format at the bottom of this guide.


## Audit Criteria

### A. Frontmatter & Metadata

| # | Criterion | Weight | Source | How to check |
|---|---|---|---|---|
| A1 | `name` field exists, 1–64 chars, lowercase + hyphens only, no consecutive hyphens, no leading/trailing hyphen | 3 | Spec | Parse the YAML frontmatter |
| A2 | `name` matches parent directory name | 2 | Spec | Compare frontmatter name to directory name |
| A3 | `description` field exists, 1–1024 chars, non-empty | 3 | Spec | Parse and count characters |
| A4 | Description includes WHAT the skill does | 2 | Best Practices | Read for a functional description |
| A5 | Description includes WHEN to use it (trigger scenarios) | 2 | Best Practices, skill-creator | Read for trigger phrases or context cues |
| A6 | Description is written in third person | 1 | Best Practices | No "I can...", "You can...", no imperative directives to the agent |
| A7 | Description is "pushy" enough for reliable triggering | 1 | skill-creator | Includes specific trigger phrases or use-case patterns |
| A8 | `license` field present (if applicable) | 1 | Spec | Check frontmatter |
| A9 | `compatibility` field present (if environment requirements exist) | 1 | Spec | Check frontmatter |
| A10 | `metadata` field with author info (if publishing publicly) | 1 | Spec | Check frontmatter |
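
A1's naming constraints can be checked mechanically. A sketch of such a check — note that allowing digits alongside lowercase letters is an assumption the criterion text doesn't settle, so confirm against the spec:

```python
import re

# Lowercase runs separated by single hyphens: leading, trailing,
# and consecutive hyphens are ruled out by construction.
_NAME_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")

def valid_name(name: str) -> bool:
    return 1 <= len(name) <= 64 and _NAME_RE.fullmatch(name) is not None
```

`valid_name("web-dev-estimation")` passes, while `"-web"`, `"web--dev"`, and `"WebDev"` all fail.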

### B. Structure & Progressive Disclosure

| # | Criterion | Weight | Source | How to check |
|---|---|---|---|---|
| B1 | SKILL.md body is under 500 lines | 3 | Spec, Best Practices | Count lines |
| B2 | SKILL.md body is under ~5000 tokens | 2 | Spec | Estimate tokens (~4 chars/token for English) |
| B3 | Reference files are linked from SKILL.md (one level deep) | 3 | Best Practices, Spec | Check that SKILL.md links to refs, and refs don't link to further refs |
| B4 | Reference files over 300 lines have a table of contents | 1 | skill-creator | Check line counts and look for a ToC |
| B5 | Directory structure follows convention (SKILL.md at root; optional scripts/, references/, assets/) | 2 | Spec | List the directory structure |
| B6 | No deeply nested references (A links to B links to C) | 2 | Best Practices | Trace all links from SKILL.md |
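
B2's token estimate uses the rule of thumb repeated in the Notes section (~4 chars/token for English prose, ~3 for code-heavy text). This is a rough heuristic, not a real tokenizer:

```python
def estimate_tokens(text: str, code_heavy: bool = False) -> int:
    """Rough token count: ~4 chars/token for prose, ~3 for code-heavy text."""
    return len(text) // (3 if code_heavy else 4)
```

By this estimate, a 20,000-character SKILL.md body lands right at the ~5000-token B2 limit.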

### C. Content Quality

| # | Criterion | Weight | Source | How to check |
|---|---|---|---|---|
| C1 | Skill adds knowledge Claude doesn't already have | 3 | Best Practices | Evaluate whether the content is non-trivial and domain-specific |
| C2 | Instructions use imperative form | 1 | skill-creator | Check verb forms in instructions |
| C3 | Instructions explain the "why" behind important rules, not just "ALWAYS/NEVER" | 2 | skill-creator | Look for reasoning alongside directives |
| C4 | Consistent terminology throughout (no synonym mixing) | 2 | Best Practices | Read for inconsistent terms |
| C5 | Language is consistent (no mixing of languages) | 1 | Best Practices | Scan for non-English text in English skills (or vice versa) |
| C6 | No time-sensitive information (or it is properly handled in an "old patterns" section) | 1 | Best Practices | Look for dates and version-dependent instructions |
| C7 | No Windows-style paths (backslashes) | 1 | Best Practices, Spec | Grep for `\` in paths |
| C8 | Appropriate degrees of freedom: low for fragile ops, high for flexible ones | 2 | Best Practices | Evaluate rigidity vs. flexibility per section |
| C9 | Examples are concrete, not abstract | 1 | Best Practices | Check for specific examples vs. vague placeholders |
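
C7's backslash check is a one-line grep. The pattern below is a heuristic that flags backslash-separated path fragments; it may need tuning to avoid false positives on escape sequences inside code samples:

```python
import re

# word\word(\more...) — catches fragments like scripts\run.py
_WIN_PATH = re.compile(r"[\w.]+\\[\w.\\]+")

def windows_paths(text: str) -> list[str]:
    """Return Windows-style path fragments found in text."""
    return _WIN_PATH.findall(text)
```

An empty result on every file in the skill directory satisfies C7.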

### D. Workflows & Validation

| # | Criterion | Weight | Source | How to check |
|---|---|---|---|---|
| D1 | Workflow has clear, sequential steps | 2 | Best Practices | Check for numbered or named steps |
| D2 | Output has a strict template/format | 2 | Best Practices | Check for template blocks |
| D3 | Feedback loop / self-check before final output | 2 | Best Practices | Look for a validation step |
| D4 | Escalation thresholds defined (when to stop and ask) | 1 | Best Practices | Check for escalation conditions |
| D5 | Conditional workflows for decision points | 1 | Best Practices | Check for branching logic (if X → do Y) |

### E. Evaluations & Testing

| # | Criterion | Weight | Source | How to check |
|---|---|---|---|---|
| E1 | Evaluation file exists (evals.json or similar) | 2 | skill-creator, Best Practices | Check for an evals/ directory |
| E2 | At least 3 test scenarios | 2 | skill-creator | Count eval entries |
| E3 | Test scenarios cover different complexity levels | 1 | skill-creator | Check for variety (simple, medium, complex) |
| E4 | Test scenarios include edge cases or failure modes | 1 | skill-creator | Look for negative tests, ambiguous inputs, escalation triggers |
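
Counting scenarios for E2 depends on how evals.json is laid out. The shapes handled below (a bare list, or an object with an `evals`, `scenarios`, or `tests` key) are assumptions about common layouts, not a fixed schema:

```python
import json

def eval_count(evals_json: str) -> int:
    """Count test scenarios in an evals.json payload."""
    data = json.loads(evals_json)
    if isinstance(data, list):
        return len(data)
    # Object form: look for a list under a conventional key.
    for key in ("evals", "scenarios", "tests"):
        if isinstance(data.get(key), list):
            return len(data[key])
    return 0
```

A count of 3 or more satisfies E2's minimum.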

### F. Scripts & Executable Code (if applicable)

| # | Criterion | Weight | Source | How to check |
|---|---|---|---|---|
| F1 | Scripts are self-contained with error handling | 2 | Best Practices | Read scripts for try/catch and error messages |
| F2 | Scripts solve problems rather than punt to the agent | 1 | Best Practices | Check for explicit error handling vs. bare exceptions |
| F3 | Dependencies are documented | 1 | Best Practices | Check for package requirements |
| F4 | Clear distinction between "execute" and "read as reference" | 1 | Best Practices | Check SKILL.md instructions for each script |
| F5 | No magic numbers (all values justified) | 1 | Best Practices | Read for unexplained constants |

### G. Distribution & Packaging (if publishing)

| # | Criterion | Weight | Source | How to check |
|---|---|---|---|---|
| G1 | README exists with a clear description | 1 | General | Check for README.md |
| G2 | Install instructions are accurate and multi-platform | 1 | General | Verify install commands |
| G3 | No speculative or non-existent commands/platforms | 1 | General | Check for made-up marketplaces or CLIs |
| G4 | Author/attribution info present | 1 | General | Check README and frontmatter |

## Scoring Table

| Category | Max points (all applicable) |
|---|---|
| A. Frontmatter & Metadata | 17 |
| B. Structure & Progressive Disclosure | 13 |
| C. Content Quality | 14 |
| D. Workflows & Validation | 8 |
| E. Evaluations & Testing | 6 (or N/A) |
| F. Scripts & Executable Code | 6 (or N/A) |
| G. Distribution & Packaging | 4 (or N/A) |
| **Total** | **68** (less if categories are N/A) |

## Output Format

When producing an audit report, use this structure:

## Skill Audit: [skill-name]

**Audited on**: [date]
**Auditor**: [agent name or "automated"]
**Skill version**: [version from metadata or commit hash]

### Summary

| Category | Score | Max | % |
|---|---|---|---|
| A. Frontmatter & Metadata | X | Y | Z% |
| B. Structure & Disclosure | X | Y | Z% |
| C. Content Quality | X | Y | Z% |
| D. Workflows & Validation | X | Y | Z% |
| E. Evaluations & Testing | X | Y | Z% |
| F. Scripts (if applicable) | X | Y | Z% |
| G. Distribution (if applicable) | X | Y | Z% |
| **Total** | **X** | **Y** | **Z%** |

### Detailed Results

#### A. Frontmatter & Metadata

| # | Criterion | Score | Justification |
|---|---|---|---|
| A1 | name field valid | Pass | "web-dev-estimation": 22 chars, lowercase + hyphens only |
| A2 | name matches directory | Pass | Directory is web-dev-estimation/ |
| ... | ... | ... | ... |

[Repeat for each category]

### Top Issues (by impact)
1. [Issue] — [why it matters] — [how to fix]
2. ...

### Recommendations
- [Actionable improvement]
- ...

## Notes for the Auditing Agent

- Read ALL files before scoring. Don't estimate from file names alone.
- The description character count is critical — parse the YAML carefully; the value may span multiple lines or use special quoting.
- For token estimation, use ~4 characters per token for English prose, ~3 for code-heavy content.
- If the skill has no scripts, mark all F criteria as N/A.
- If the skill is not intended for public distribution, mark all G criteria as N/A.
- When in doubt between Pass and Partial, check the source document for the exact wording of the requirement.
- Cross-reference the skill's actual output format against the template — don't just check that a template exists; verify it covers all recommended sections.
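
Isolating the frontmatter before measuring the description is the safest way to honor the multi-line caveat above. A minimal sketch that extracts the raw YAML block; a real audit should hand this block to a YAML parser rather than count characters on the raw text:

```python
import re

def frontmatter_block(skill_md: str) -> str:
    """Return the raw YAML between the opening '---' fences, or ''."""
    m = re.match(r"\A---\r?\n(.*?)\r?\n---(?:\r?\n|\Z)", skill_md, re.DOTALL)
    return m.group(1) if m else ""
```

Feeding the returned block to a YAML parser then gives the true `description` value, regardless of folded or quoted scalars.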
