benchmark-e2e

End-to-end benchmark suite for vercel-plugin. Runs realistic projects through skill injection, launches dev servers, verifies everything works, analyzes conversation logs, and produces an improvement report for overnight self-improvement loops.

62

Quality

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Quality

Content

80%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is concise and highly actionable, with concrete commands, typed contracts, and a clear pipeline. It loses points on workflow clarity for missing explicit validation/error-recovery checkpoints and on progressive disclosure because its referenced script bundle is not present.

Suggestions

Add explicit validation/error-recovery checkpoints to the verify stage (e.g. 'if the dev server does not return 200 within N polls, capture logs and abort that slug') and guard the destructive cleanup with a confirmation step.

Either provide the referenced scripts/ bundle (benchmark-e2e.ts and the stage scripts) or remove/replace the dangling file references so progressive disclosure points to real artifacts.
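
The checkpoint in the first suggestion could look roughly like the sketch below. It is a minimal illustration, not the skill's actual code: `verifyDevServer`, `Probe`, `VerifyResult`, and the defaults of 30 polls at 1-second intervals are all hypothetical. The probe is injected so the loop can be exercised without a real server.

```typescript
// Hypothetical helper: names, shapes, and defaults are illustrative assumptions,
// not taken from the skill's scripts.
type Probe = () => Promise<number>; // resolves to an HTTP status code

interface VerifyResult {
  slug: string;
  ok: boolean;
  attempts: number;
  lastStatus: number | null;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function verifyDevServer(
  slug: string,
  probe: Probe,
  maxPolls = 30,
  delayMs = 1000,
): Promise<VerifyResult> {
  let lastStatus: number | null = null;
  for (let attempt = 1; attempt <= maxPolls; attempt++) {
    try {
      lastStatus = await probe();
      if (lastStatus === 200) {
        return { slug, ok: true, attempts: attempt, lastStatus };
      }
    } catch {
      // A failed connection while the server boots counts as "not ready yet".
      lastStatus = null;
    }
    if (attempt < maxPolls) await sleep(delayMs);
  }
  // Checkpoint failed: the caller should capture logs and abort this slug.
  return { slug, ok: false, attempts: maxPolls, lastStatus };
}
```

A caller might pass `async () => (await fetch(url)).status` as the probe and, on `ok: false`, capture the dev server's logs before skipping that slug. How logs are captured depends on how the suite spawns the server, which this review does not show.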

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The body is lean: options table, typed interfaces, and terse stage descriptions assume Claude's competence and avoid explaining concepts it already knows; every section earns its place. | 3 / 3 |
| Actionability | Provides copy-paste-ready commands ('bun run scripts/benchmark-e2e.ts --quick'), a full options table, and complete TypeScript interfaces for run-manifest, events, and report; fully executable guidance. | 3 / 3 |
| Workflow Clarity | The four-stage pipeline and numbered self-improvement cycle are clearly sequenced, but verification/error-recovery checkpoints are only implicit and the destructive cleanup ('rm -rf') has no validation, capping at 2. | 2 / 3 |
| Progressive Disclosure | Sections are well organized, but the only reference is to scripts/benchmark-e2e.ts and no scripts/ bundle exists on disk, so the reference is unverified and there is no real split across files. | 2 / 3 |
| Total | | 10 / 12 |

Passed
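
The unvalidated 'rm -rf' noted under Workflow Clarity could be guarded along the lines of the sketch below. This is an assumption-laden illustration: `isSafeToRemove`, `guardedCleanup`, the base-directory argument, and the `confirmed` flag are hypothetical and do not come from the skill. The actual deletion is passed in as a callback so the guard logic stays separate from the destructive call.

```typescript
import * as path from "node:path";

// Hypothetical guard: only paths strictly inside baseDir are eligible for removal.
function isSafeToRemove(target: string, baseDir: string): boolean {
  const base = path.resolve(baseDir);
  const resolved = path.resolve(target);
  const rel = path.relative(base, resolved);
  const climbsOut = rel === ".." || rel.startsWith(".." + path.sep);
  // Empty rel means target IS the base dir; absolute rel means a different root.
  return rel !== "" && !climbsOut && !path.isAbsolute(rel);
}

type CleanupOutcome = "removed" | "skipped-unconfirmed" | "refused-unsafe";

function guardedCleanup(
  target: string,
  baseDir: string,
  confirmed: boolean, // e.g. set from an explicit flag or prompt (assumption)
  remove: (dir: string) => void,
): CleanupOutcome {
  if (!isSafeToRemove(target, baseDir)) return "refused-unsafe";
  if (!confirmed) return "skipped-unconfirmed";
  remove(path.resolve(target));
  return "removed";
}
```

In real use the `remove` callback might be `(dir) => fs.rmSync(dir, { recursive: true, force: true })` from `node:fs`. Checking the path before the confirmation means an unsafe target is refused even when the user has confirmed.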

Description

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific and distinct, enumerating concrete pipeline actions and carving out a clear niche. Its main weakness is the absence of an explicit 'Use when...' trigger clause, which leaves the 'when to use' guidance only implied.

Suggestions

Add an explicit 'Use when...' clause naming natural trigger terms (e.g. 'Use when running end-to-end benchmarks for vercel-plugin, testing skill injection, or running overnight self-improvement loops').

Soften technical jargon with common user-facing phrasings so natural-language triggers match more easily.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple concrete actions ('Runs realistic projects through skill injection, launches dev servers, verifies everything works, analyzes conversation logs, and produces an improvement report'), matching the top anchor. | 3 / 3 |
| Completeness | The 'what' is clearly and thoroughly answered, but there is no 'Use when...' or equivalent explicit trigger clause, capping completeness at 2 per the guidelines. | 2 / 3 |
| Trigger Term Quality | Relevant terms like 'benchmark suite', 'skill injection', 'dev servers', and 'conversation logs' appear, but they lean technical and lack the common natural phrasings a user would actually say. | 2 / 3 |
| Distinctiveness / Conflict Risk | Scoped to 'vercel-plugin' end-to-end benchmarking and 'overnight self-improvement loops', a clear niche unlikely to trigger for unrelated skills. | 3 / 3 |
| Total | | 10 / 12 |

Passed

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 15 / 16 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| referenced_paths_exist | Referenced path issues: 4 missing | Warning |
| Total | | 15 / 16 |

Passed
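
A maintainer could reproduce a check like referenced_paths_exist locally before publishing. The sketch below is only a guess at the idea, not the validator's actual logic: `extractScriptRefs`, `findMissing`, and the `scripts/...` regex are hypothetical, and the existence test is injected so the functions can be tried without touching the disk.

```typescript
// Hypothetical sketch: the regex and function names are illustrative assumptions.
function extractScriptRefs(markdown: string): string[] {
  const matches = markdown.match(/scripts\/[\w./-]+/g) ?? [];
  // Drop trailing punctuation such as a sentence-ending period.
  const cleaned = matches.map((m) => m.replace(/[.,;:]+$/, ""));
  // De-duplicate while preserving first-seen order.
  return Array.from(new Set(cleaned));
}

function findMissing(
  refs: string[],
  exists: (p: string) => boolean,
): string[] {
  return refs.filter((ref) => !exists(ref));
}
```

Passing `existsSync` from `node:fs` as `exists`, with paths resolved relative to the skill directory, would list the references that point at files not on disk.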

Repository
vercel/vercel-plugin
Reviewed
