Formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles
Install with Tessl CLI
npx tessl i github:affaan-m/everything-claude-code --skill eval-harness
47
Does it follow best practices?
If you maintain this skill, you can automatically optimize it using the tessl CLI to improve its score:
npx tessl skill review --optimize ./path/to/skill
Validation for skill structure
EDD pre-implementation eval definition
| Criterion | Without context | With context |
| --- | --- | --- |
| Capability eval present | 0% | 100% |
| Capability eval format: task | 50% | 100% |
| Capability eval format: criteria checkboxes | 0% | 100% |
| Capability eval format: expected output | 37% | 100% |
| Regression eval present | 25% | 100% |
| Regression eval format: baseline | 0% | 100% |
| Regression eval format: test list | 37% | 100% |
| Correct storage path | 0% | 100% |
| Plan describes define-first order | 100% | 100% |
| Four-step workflow present | 70% | 100% |
| No implementation code | 100% | 100% |
Without context: $0.4005 · 1m 40s · 17 turns · 550 in / 6,104 out tokens
With context: $0.4221 · 1m 33s · 21 turns · 539 in / 5,399 out tokens
Scorer type selection and formatting
| Criterion | Without context | With context |
| --- | --- | --- |
| Code-based scorer present | 40% | 100% |
| Code scorer format | 12% | 100% |
| Model-based scorer present | 40% | 100% |
| Model scorer format: numbered questions | 37% | 0% |
| Model scorer format: rating scale | 100% | 50% |
| Human review scorer present | 40% | 100% |
| Human review format: fields | 12% | 100% |
| Human review for security | 100% | 100% |
| Code scorer preferred for deterministic checks | 90% | 100% |
| Rationale distinguishes scorer types | 100% | 100% |
| No security check fully automated | 100% | 100% |
Without context: $0.3470 · 2m · 18 turns · 25 in / 5,824 out tokens
With context: $0.4538 · 2m 1s · 22 turns · 26 in / 6,264 out tokens
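The criterion "Code scorer preferred for deterministic checks" can be illustrated with a minimal sketch. The function name `code_scorer` and the PASS/FAIL return strings are hypothetical choices for illustration; the page does not show the skill's actual scorer format.

```python
import re


def code_scorer(output: str, pattern: str) -> str:
    """Deterministic check: PASS if the regex pattern occurs in the output.

    Hypothetical helper, not part of the skill. The same input always
    yields the same verdict, which is why a code-based scorer suits
    checks that do not need judgment.
    """
    return "PASS" if re.search(pattern, output) else "FAIL"


if __name__ == "__main__":
    sample = "export function handleAuth() { return true; }"
    print(code_scorer(sample, r"export function handleAuth"))
    print(code_scorer(sample, r"export function logout"))
```

Checks that need judgment (for example, the security review criteria above) would not fit this shape, which matches the table's separation of code, model, and human scorers.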
pass@k metrics and eval report format
| Criterion | Without context | With context |
| --- | --- | --- |
| Capability section present | 100% | 100% |
| Regression section present | 100% | 100% |
| pass@k used for capability | 0% | 100% |
| pass^k used for critical path | 75% | 100% |
| pass@k computed correctly | 0% | 100% |
| pass^k computed correctly | 100% | 100% |
| Metrics section present | 50% | 100% |
| Status/conclusion present | 100% | 100% |
| Critical path justification | 90% | 100% |
| Pass@1 computed for capability | 75% | 100% |
| Report completeness | 87% | 100% |
Without context: $0.1612 · 1m 5s · 8 turns · 13 in / 3,494 out tokens
With context: $0.3082 · 1m 24s · 14 turns · 17 in / 4,371 out tokens
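The pass@k and pass^k criteria above can be made concrete with a short sketch. It assumes the common per-task reading: pass@k means at least one of the first k attempts succeeded, and pass^k means all of the first k attempts succeeded. The page itself does not spell out the formulas, and the function names `pass_at_k` and `pass_hat_k` are hypothetical.

```python
from typing import Sequence


def _check_k(results: Sequence[bool], k: int) -> None:
    # Guard against asking for more attempts than were recorded.
    if k < 1 or k > len(results):
        raise ValueError("k must be between 1 and the number of attempts")


def pass_at_k(results: Sequence[bool], k: int) -> bool:
    """Assumed definition: at least one success in the first k attempts."""
    _check_k(results, k)
    return any(results[:k])


def pass_hat_k(results: Sequence[bool], k: int) -> bool:
    """Assumed definition of pass^k: every one of the first k attempts succeeded."""
    _check_k(results, k)
    return all(results[:k])


if __name__ == "__main__":
    attempts = [False, True, True]  # illustrative outcomes, not real data
    print(pass_at_k(attempts, 1))   # first attempt only
    print(pass_at_k(attempts, 3))   # any of three attempts
    print(pass_hat_k(attempts, 3))  # all three attempts
```

Under these definitions pass^k is the stricter bar, which is consistent with the table reserving it for critical paths while pass@k is used for capability evals.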