eval-harness

A formal evaluation framework for Claude Code sessions that implements eval-driven development (EDD) principles.

Install with Tessl CLI

npx tessl i github:ysyecust/everything-claude-code --skill eval-harness

Quality: 42% — Does it follow best practices?

Impact: 100% · 2.08x — Average score across 6 eval scenarios

Optimize this skill with Tessl

npx tessl skill review --optimize ./docs/zh-TW/skills/eval-harness/SKILL.md

Evaluation results

Real-Time Notifications Module: Eval Planning

EDD pre-implementation eval definitions

Without context: 79% · With context: 100%

| Criteria | Without context | With context |
| --- | --- | --- |
| CAPABILITY EVAL header | 0% | 100% |
| Success criteria checkboxes | 0% | 100% |
| Expected output field | 0% | 100% |
| REGRESSION EVAL header | 0% | 100% |
| Regression baseline reference | 0% | 100% |
| Regression test list with status | 0% | 100% |
| .claude/evals/ directory | 0% | 100% |
| Feature-named eval file | 50% | 100% |
| No implementation code | 100% | 100% |
| Covers new capabilities | 50% | 100% |
| Covers regression risk | 50% | 100% |

Without context: $0.6419 · 2m 33s · 26 turns · 4,409 in / 8,982 out tokens

With context: $0.2723 · 1m 18s · 12 turns · 266 in / 4,389 out tokens
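The criteria in this scenario are structural checks on a pre-implementation eval file. A deterministic checker for them might look like the sketch below; the section names come from the criteria above, while the file format, function name, and regex are assumptions, not eval-harness internals:

```python
import re

def check_eval_file(text: str) -> dict[str, bool]:
    """Deterministic structural checks mirroring the scenario's criteria.

    The checked strings (CAPABILITY EVAL, REGRESSION EVAL, checkbox
    syntax, Expected output) come from the criteria table; the exact
    format eval-harness expects is an assumption.
    """
    return {
        "capability_header": "CAPABILITY EVAL" in text,
        "regression_header": "REGRESSION EVAL" in text,
        # Markdown-style checkboxes: "- [ ]" (pending) or "- [x]" (done)
        "checkboxes": bool(re.search(r"- \[[ x]\]", text)),
        "expected_output": "Expected output" in text,
    }
```

A file placed at, say, `.claude/evals/<feature-name>.md` would pass when every check returns `True`.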

AI Code Assistant Quality Evaluation Pipeline

Grader type selection and scoring formats

Without context: 32% · With context: 100%

| Criteria | Without context | With context |
| --- | --- | --- |
| Code-based scorer present | 100% | 100% |
| Code scorer applied correctly | 100% | 100% |
| MODEL GRADER PROMPT header | 0% | 100% |
| Model grader has numbered questions | 0% | 100% |
| Model grader score scale | 100% | 100% |
| HUMAN REVIEW REQUIRED header | 0% | 100% |
| Human review risk level | 100% | 100% |
| Human review applied to security | 100% | 100% |
| Deterministic preference stated | 100% | 100% |
| Model grader for qualitative checks | 100% | 100% |

Without context: $0.3425 · 1m 43s · 16 turns · 23 in / 5,491 out tokens

With context: $0.3574 · 1m 50s · 15 turns · 18 in / 5,437 out tokens
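This scenario contrasts deterministic code-based scorers (preferred for objective checks) with model graders that take numbered questions on a score scale (for qualitative checks). A hypothetical sketch of the two grader shapes — the function names and prompt wording are illustrative, not taken from the skill:

```python
def code_scorer(output: str, expected: str) -> float:
    """Deterministic, code-based check: does the output contain the
    expected snippet? Objective criteria should prefer scorers like
    this over a model grader."""
    return 1.0 if expected in output else 0.0

def model_grader_prompt(questions: list[str], scale: int = 5) -> str:
    """Build a model-grader prompt with numbered questions and an
    explicit score scale, for qualitative checks a code scorer can't
    decide (e.g. readability). The header and wording are assumptions."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    return (
        "MODEL GRADER PROMPT\n"
        f"Answer each question with a score from 1 to {scale}.\n"
        f"{numbered}"
    )
```

High-risk areas such as security would additionally be routed to a HUMAN REVIEW REQUIRED step rather than either automated grader.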

Search Autocomplete: Eval Report Generation

EDD workflow report with pass@k metrics

Without context: 43% · With context: 100%

| Criteria | Without context | With context |
| --- | --- | --- |
| Capability evals section | 100% | 100% |
| Regression evals section | 100% | 100% |
| Individual test results listed | 100% | 100% |
| Capability totals | 100% | 100% |
| pass@1 metric computed | 0% | 100% |
| pass@3 metric computed | 0% | 100% |
| pass^k for regression | 40% | 100% |
| Status line present | 100% | 100% |
| Report in .claude/evals/ | 0% | 100% |
| Feature-named report file | 100% | 100% |
| Regression totals | 100% | 100% |

Without context: $0.2994 · 1m 9s · 17 turns · 129 in / 4,172 out tokens

With context: $0.3450 · 1m 18s · 16 turns · 19 in / 4,486 out tokens
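The pass@1, pass@3, and pass^k criteria above are standard sampling metrics: pass@k estimates the chance that at least one of k sampled attempts succeeds, while pass^k (for regressions) is the chance that all k runs succeed. A sketch using the widely used unbiased pass@k estimator — whether eval-harness computes them exactly this way is an assumption:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples passes, given n total attempts of which c passed.
    Computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failures exist, so any k-sample must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_hat_k(p: float, k: int) -> float:
    """pass^k for regression evals: probability that all k independent
    runs pass, given per-run pass probability p. Regressions demand
    consistency, hence the stricter all-runs-pass metric."""
    return p ** k
```

For example, with 2 passes out of 4 attempts, pass@1 is 0.5, while a 90%-reliable regression test has pass^3 of only about 0.73.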

Evaluated with agent: Claude Code · model: Claude Sonnet 4.6

