Comprehensive developer toolkit providing reusable skills for Java/Spring Boot, TypeScript/NestJS/React/Next.js, Python, PHP, AWS CloudFormation, AI/RAG, DevOps, and more.
The Task Quality KPI Framework provides objective, quantitative metrics for evaluating task implementation quality.
Key Architecture: KPIs are auto-generated by a hook - you read the results, not run scripts.
```
┌──────────────────────────────────────┐
│ HOOK (auto-executes)                 │
│   Trigger: PostToolUse on TASK-*.md  │
│   Script:  task-kpi-analyzer.py      │
│   Output:  TASK-XXX--kpi.json        │
├──────────────────────────────────────┤
│ SKILL / AGENT (reads output)         │
│   Input:  TASK-XXX--kpi.json         │
│   Action: Make evaluation decisions  │
└──────────────────────────────────────┘
```

| Problem | Solution |
|---|---|
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective review_status | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |
After any task file modification, find KPI data at:
```
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
```

```
┌──────────────────────────────────────┐
│ OVERALL SCORE (0-10)                 │
├──────────────────────────────────────┤
│ Spec Compliance (30%)                │
│ ├── Acceptance Criteria Met (0-10)   │
│ ├── Requirements Coverage (0-10)     │
│ └── No Scope Creep (0-10)            │
├──────────────────────────────────────┤
│ Code Quality (25%)                   │
│ ├── Static Analysis (0-10)           │
│ ├── Complexity (0-10)                │
│ └── Patterns Alignment (0-10)        │
├──────────────────────────────────────┤
│ Test Coverage (25%)                  │
│ ├── Unit Tests Present (0-10)        │
│ ├── Test/Code Ratio (0-10)           │
│ └── Coverage Percentage (0-10)       │
├──────────────────────────────────────┤
│ Contract Fulfillment (20%)           │
│ ├── Provides Verified (0-10)         │
│ └── Expects Satisfied (0-10)         │
└──────────────────────────────────────┘
```

| Category | Weight | Why |
|---|---|---|
| Spec Compliance | 30% | Most important - did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |
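
The overall score is the weighted average of the four category scores. The helper below is an illustrative sketch, not the actual `task-kpi-analyzer.py` code, but it reproduces the arithmetic visible in the sample KPI file, where a Spec Compliance score of 8.5 at 30% contributes 2.55:

```python
# Illustrative sketch of the weighted overall score; names are hypothetical.
CATEGORY_WEIGHTS = {
    "Spec Compliance": 30,
    "Code Quality": 25,
    "Test Coverage": 25,
    "Contract Fulfillment": 20,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted average of category scores (each 0-10)."""
    total = sum(
        CATEGORY_WEIGHTS[name] / 100 * score
        for name, score in category_scores.items()
    )
    return round(total, 1)

print(overall_score({
    "Spec Compliance": 8.5,      # 30% -> 2.55
    "Code Quality": 8.0,         # 25% -> 2.00
    "Test Coverage": 8.2,        # 25% -> 2.05
    "Contract Fulfillment": 8.0, # 20% -> 1.60
}))  # -> 8.2
```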
Skills and agents that consume the results (for example `evaluator-agent.md` and `agents_loop.py`) DO NOT run scripts; they read the auto-generated KPI file:

```
docs/specs/001-feature/tasks/TASK-001--kpi.json
```

The KPI file contains:
```json
{
  "task_id": "TASK-001",
  "evaluated_at": "2026-01-15T10:30:00Z",
  "overall_score": 8.2,
  "passed_threshold": true,
  "threshold": 7.5,
  "kpi_scores": [
    {
      "category": "Spec Compliance",
      "weight": 30,
      "score": 8.5,
      "weighted_score": 2.55,
      "metrics": {
        "acceptance_criteria_met": 9.0,
        "requirements_coverage": 8.0,
        "no_scope_creep": 8.5
      },
      "evidence": [
        "Acceptance criteria: 9/10 checked",
        "Requirements coverage: 8/10"
      ]
    }
  ],
  "recommendations": [
    "Code Quality: Moderate improvements possible"
  ],
  "summary": "Score: 8.2/10 - PASSED"
}
```

Use `overall_score` and `passed_threshold`:

```
IF passed_threshold == true:
    → Task meets quality standards
    → Approve and proceed

IF passed_threshold == false:
    → Task needs improvement
    → Check recommendations for specific targets
    → Create fix specification
```

## Review Process
1. Read KPI file: TASK-XXX--kpi.json
2. Extract overall_score and kpi_scores
3. Read task file to validate
4. Generate evaluation report
5. Decision based on `passed_threshold`

```python
import json

# spec_path (a pathlib.Path) and task_id come from the surrounding orchestrator.
# Check whether the hook has generated a KPI file for this task.
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"
if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())
    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Needs more work: turn the recommendations into fix targets
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - the task may not be implemented
    log_warning("No KPI data found")
```

Instead of a maximum of 3 retries, iterate until the quality threshold is met:
```
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions
Iteration 3: Score 7.8 → PASSED → Proceed
```

Each iteration updates the KPI file automatically on task save.
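
A minimal sketch of that loop, assuming a hypothetical `apply_fixes` callback (the real orchestration lives in `agents_loop.py`):

```python
import json
from pathlib import Path

def quality_loop(kpi_path: Path, apply_fixes, max_iterations: int = 10) -> float:
    """Iterate until the hook-generated KPI file reports a pass.

    Sketch only: apply_fixes is a hypothetical callback that edits the task;
    saving the task file re-triggers the hook, which refreshes kpi_path.
    """
    for _ in range(max_iterations):  # safety cap, not a quality cap
        kpi = json.loads(kpi_path.read_text())
        if kpi["passed_threshold"]:
            return kpi["overall_score"]
        apply_fixes(kpi["recommendations"])
    raise RuntimeError("Quality threshold not reached")
```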
| Score | Quality Level | Action |
|---|---|---|
| 9.0-10.0 | Exceptional | Approve, document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve when at or above the configured threshold (e.g., 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |
| Project Type | Threshold | Rationale |
|---|---|---|
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |
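
If thresholds are kept in code, a simple mapping suffices. The names below are illustrative; the analyzer's actual configuration mechanism is not shown here:

```python
# Hypothetical per-project-type thresholds mirroring the table above.
THRESHOLDS = {
    "production_mvp": 8.0,
    "internal_tool": 7.0,
    "prototype": 6.0,
    "critical_system": 8.5,
}

def passes(overall_score: float, project_type: str) -> bool:
    """True when the score meets the threshold for this project type."""
    return overall_score >= THRESHOLDS[project_type]
```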
How each metric is computed:

**Spec Compliance**

- Acceptance Criteria Met: `(checked_criteria / total_criteria) * 10`
- Requirements Coverage: derived from `traceability-matrix.md`
- No Scope Creep: `(implemented_files / expected_files) * 10`

**Code Quality**

- Static Analysis
- Complexity: `10 - (long_functions_ratio * 5)`
- Patterns Alignment: checked against `knowledge-graph.json`

**Test Coverage**

- Unit Tests Present: `min(10, test_files * 5)`
- Test/Code Ratio: `(test_count / code_count) * 10`
- Coverage Percentage: `coverage_percent / 10`

**Contract Fulfillment**

- Provides Verified: against the `provides` frontmatter
- Expects Satisfied: against the `expects` frontmatter
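
The formulas translate directly into code. A few of them, sketched with zero-division guards that are assumptions rather than documented behavior:

```python
# Illustrative implementations of the metric formulas above.
def acceptance_criteria_met(checked: int, total: int) -> float:
    return (checked / total) * 10 if total else 0.0

def unit_tests_present(test_files: int) -> float:
    return min(10, test_files * 5)

def test_code_ratio(test_count: int, code_count: int) -> float:
    return (test_count / code_count) * 10 if code_count else 0.0

def coverage_percentage(coverage_percent: float) -> float:
    return coverage_percent / 10

print(acceptance_criteria_met(9, 10))  # 9.0, matching the JSON example above
```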
If `TASK-XXX--kpi.json` doesn't exist, DO NOT try to calculate KPIs manually. The hook runs automatically whenever a `TASK-*.md` file is saved (the PostToolUse trigger).
Before evaluating, check that the KPI file exists:

```
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
```

If it is missing:

- The task may not be implemented yet
- Ask the user to save the task file first

The KPIs are objective. Only override them with documented evidence.
Target specific categories:

```
❌ "Fix code quality issues"
✅ "Improve Code Quality KPI from 5.2 to 7.0:
    - Complexity: Refactor processData() (5→8)
    - Patterns: Add error handling (6→8)"
```

Monitor quality over time:
```
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
```
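
One way to compute those sprint averages from the generated files, assuming the documented layout (`docs/specs/[ID]/tasks/TASK-XXX--kpi.json`):

```python
import json
from pathlib import Path

def sprint_average(spec_dir: Path) -> float:
    """Average overall_score across every KPI file under a spec directory."""
    scores = [
        json.loads(p.read_text())["overall_score"]
        for p in (spec_dir / "tasks").glob("TASK-*--kpi.json")
    ]
    return round(sum(scores) / len(scores), 1) if scores else 0.0

print(sprint_average(Path("docs/specs/001-feature")))
```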
If the KPI file is not generated, check:

- `hooks.json` (the hook configuration)
- that the task file matches the `TASK-*.md` pattern

then validate the generated output. If a score seems wrong, investigate the possible causes in the metrics and evidence, and fix the root cause, not just the score.
Read the KPI file to evaluate task quality:

```
docs/specs/001-feature/tasks/TASK-042--kpi.json
```

Based on the data:

- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: Add unit tests

Decision: REQUEST FIXES - target Test Coverage improvement

```
Iteration 1 KPI: Score 6.2 → FAILED
- Spec Compliance: 7.0 ✓
- Code Quality: 5.5 ✗
- Test Coverage: 6.0 ✗
Fix targets:
1. Refactor complex functions (Code Quality)
2. Add test coverage (Test Coverage)
Iteration 2 KPI: Score 7.8 → PASSED ✓
```

```python
# In agents_loop, after the implementation step.
# spec_dir, task_id, and advance_state come from the orchestrator context.
import json

kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"
if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())
    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
```

Related files:

- `evaluator-agent.md` - Agent that uses KPI data for evaluation
- `hooks.json` - Hook configuration for auto-generation
- `task-kpi-analyzer.py` - Hook script (do not execute directly)
- `agents_loop.py` - Orchestrator that reads KPI for decisions

Repository structure:

```
docs
plugins
  developer-kit-ai
  developer-kit-aws
    agents
    docs
    skills
      aws
        aws-cli-beast
        aws-cost-optimization
        aws-drawio-architecture-diagrams
        aws-sam-bootstrap
      aws-cloudformation
        aws-cloudformation-auto-scaling
        aws-cloudformation-bedrock
        aws-cloudformation-cloudfront
        aws-cloudformation-cloudwatch
        aws-cloudformation-dynamodb
        aws-cloudformation-ec2
        aws-cloudformation-ecs
        aws-cloudformation-elasticache
          references
        aws-cloudformation-iam
          references
        aws-cloudformation-lambda
        aws-cloudformation-rds
        aws-cloudformation-s3
        aws-cloudformation-security
        aws-cloudformation-task-ecs-deploy-gh
        aws-cloudformation-vpc
          references
  developer-kit-core
    agents
    commands
    skills
  developer-kit-devops
  developer-kit-java
    agents
    commands
    docs
    skills
      aws-lambda-java-integration
      aws-rds-spring-boot-integration
      aws-sdk-java-v2-bedrock
      aws-sdk-java-v2-core
      aws-sdk-java-v2-dynamodb
      aws-sdk-java-v2-kms
      aws-sdk-java-v2-lambda
      aws-sdk-java-v2-messaging
      aws-sdk-java-v2-rds
      aws-sdk-java-v2-s3
      aws-sdk-java-v2-secrets-manager
      clean-architecture
      graalvm-native-image
      langchain4j-ai-services-patterns
        references
      langchain4j-mcp-server-patterns
        references
      langchain4j-rag-implementation-patterns
        references
      langchain4j-spring-boot-integration
      langchain4j-testing-strategies
      langchain4j-tool-function-calling-patterns
      langchain4j-vector-stores-configuration
        references
      qdrant
        references
      spring-ai-mcp-server-patterns
      spring-boot-actuator
      spring-boot-cache
      spring-boot-crud-patterns
      spring-boot-dependency-injection
      spring-boot-event-driven-patterns
      spring-boot-openapi-documentation
      spring-boot-project-creator
      spring-boot-resilience4j
      spring-boot-rest-api-standards
      spring-boot-saga-pattern
      spring-boot-security-jwt
        assets
        references
        scripts
      spring-boot-test-patterns
      spring-data-jpa
        references
      spring-data-neo4j
        references
      unit-test-application-events
      unit-test-bean-validation
      unit-test-boundary-conditions
      unit-test-caching
      unit-test-config-properties
        references
      unit-test-controller-layer
      unit-test-exception-handler
        references
      unit-test-json-serialization
      unit-test-mapper-converter
        references
      unit-test-parameterized
      unit-test-scheduled-async
        references
      unit-test-service-layer
        references
      unit-test-utility-methods
      unit-test-wiremock-rest-api
        references
  developer-kit-php
  developer-kit-project-management
  developer-kit-python
  developer-kit-specs
    commands
    docs
    hooks
      test-templates
      tests
    skills
  developer-kit-tools
  developer-kit-typescript
    agents
    docs
    hooks
    rules
    skills
      aws-cdk
      aws-lambda-typescript-integration
      better-auth
      clean-architecture
      drizzle-orm-patterns
      dynamodb-toolbox-patterns
        references
      nestjs
      nestjs-best-practices
      nestjs-code-review
      nestjs-drizzle-crud-generator
      nextjs-app-router
      nextjs-authentication
      nextjs-code-review
      nextjs-data-fetching
      nextjs-deployment
      nextjs-performance
      nx-monorepo
      react-code-review
      react-patterns
      shadcn-ui
      tailwind-css-patterns
      tailwind-design-system
        references
      turborepo-monorepo
      typescript-docs
      typescript-security-review
      zod-validation-utilities
        references
  github-spec-kit
```