
benchmark-e2e

End-to-end benchmark suite for vercel-plugin. Runs realistic projects through skill injection, launches dev servers, verifies everything works, analyzes conversation logs, and produces an improvement report for overnight self-improvement loops.

53

Quality

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues


Quality

Content

70%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is highly actionable with executable commands and explicit data contracts, and the pipeline stages are clearly sequenced. It loses points for missing validation feedback loops on destructive operations and for inline detail that belongs in reference files that are not actually bundled.

Suggestions

Add an explicit validation/verify checkpoint and error-recovery loop to the destructive cleanup and overnight-loop workflows.

Move the verbose TypeScript interface definitions into a bundled reference file and link to it from SKILL.md so the overview stays lean.
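The first suggestion can be illustrated with a guarded cleanup. This is a minimal sketch, not the skill's actual implementation: `isSafeToRemove` and `guardedCleanup` are hypothetical names, and the only detail taken from the review is the `~/dev/vercel-plugin-testing` path quoted under Actionability. It assumes the cleanup should refuse anything outside an allowed root and should verify afterwards that the directory is gone.

```typescript
import { existsSync, lstatSync, rmSync } from "node:fs";
import { homedir } from "node:os";
import { resolve, sep } from "node:path";

// Hypothetical guard: a target is removable only if it is the allowed root
// itself or lives strictly inside it. The home directory and the filesystem
// root are always refused.
export function isSafeToRemove(target: string, allowedRoot: string): boolean {
  const root = resolve(allowedRoot);
  const full = resolve(target);
  if (full === resolve(homedir()) || full === sep) return false;
  return full === root || full.startsWith(root + sep);
}

// Hypothetical replacement for a bare `rm -rf`: validate, delete, then verify.
export function guardedCleanup(target: string, allowedRoot: string): "removed" | "absent" {
  if (!isSafeToRemove(target, allowedRoot)) {
    throw new Error(`Refusing to remove ${target}: outside ${allowedRoot}`);
  }
  if (!existsSync(target)) return "absent";
  if (!lstatSync(target).isDirectory()) {
    throw new Error(`Refusing to remove ${target}: not a directory`);
  }
  rmSync(target, { recursive: true, force: true });
  // Verification checkpoint: fail loudly if anything survived the delete.
  if (existsSync(target)) {
    throw new Error(`Cleanup of ${target} did not complete`);
  }
  return "removed";
}
```

A caller would pass the testing directory as both target and allowed root, for example `guardedCleanup(resolve(homedir(), "dev", "vercel-plugin-testing"), resolve(homedir(), "dev", "vercel-plugin-testing"))`, and treat a thrown error as a signal to stop the run rather than continue.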

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Mostly efficient, with concrete command examples and tables. However, the contracts section repeats interface definitions and the self-improvement cycle restates pipeline behavior already covered, so it could be tightened. | 2 / 3 |
| Actionability | Provides copy-paste-ready commands ('bun run scripts/benchmark-e2e.ts', 'rm -rf ~/dev/vercel-plugin-testing'), concrete flag tables, and exact TypeScript interfaces for each contract. | 3 / 3 |
| Workflow Clarity | The four-stage pipeline is sequenced and abort-on-failure is stated, but destructive/batch operations (cleanup 'rm -rf', overnight loop) lack explicit validation checkpoints or error-recovery feedback loops, capping clarity at 2. | 2 / 3 |
| Progressive Disclosure | Sections are well organized, but the referenced bundle files (scripts/benchmark-e2e.ts, run-manifest.json) do not exist in the bundle, and inline interface/type definitions that could live in reference files remain in SKILL.md, so the structure is only partially split out. | 2 / 3 |

Total: 9 / 12 (Passed)
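The error-recovery feedback loop that the Workflow Clarity row finds missing could take the shape of a bounded retry with a verification step after each attempt. The sketch below is an assumption about what such a loop might look like, not the skill's code: `runWithVerification`, `StepResult`, and the default of three attempts are all hypothetical, and the function is synchronous only to keep the example short.

```typescript
// Hypothetical result shape for a verification checkpoint.
export interface StepResult {
  ok: boolean;
  detail: string;
}

// Run `step`, then `verify`; retry up to `maxAttempts` times and stop as soon
// as verification passes. Exceptions from either callback count as a failed
// attempt and are recorded in the result instead of crashing the loop.
export function runWithVerification(
  step: () => void,
  verify: () => StepResult,
  maxAttempts = 3,
): { attempts: number; result: StepResult } {
  if (maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  let last: StepResult = { ok: false, detail: "not run" };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      step();
      last = verify();
    } catch (err) {
      last = { ok: false, detail: err instanceof Error ? err.message : String(err) };
    }
    if (last.ok) return { attempts: attempt, result: last };
  }
  // All attempts exhausted: surface the last failure so the caller can abort.
  return { attempts: maxAttempts, result: last };
}
```

Bounding the attempts matters for an unattended overnight run: a step that never verifies ends with a recorded failure instead of looping indefinitely.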

Description

50%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is strong on concrete actions and specificity but weak on trigger-term quality and explicit 'when to use' guidance. It reads as an internal tool summary rather than a naturally-triggered skill description.

Suggestions

Add an explicit 'Use when...' clause naming natural user phrasing (e.g. 'Use when running or validating the vercel-plugin skill-injection benchmark against test projects').

Replace internal jargon ('skill injection', 'overnight self-improvement loops') with user-facing trigger keywords a person would actually say.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple concrete actions ('Runs realistic projects through skill injection, launches dev servers, verifies everything works, analyzes conversation logs, and produces an improvement report'), matching the multi-action anchor. | 3 / 3 |
| Completeness | Clearly answers 'what does this do' but lacks an explicit 'Use when...' trigger clause, so 'when' is only implied, capping completeness at 2 per the guidelines. | 2 / 3 |
| Trigger Term Quality | Uses internal jargon ('skill injection', 'vercel-plugin', 'overnight self-improvement loops') rather than natural terms a user would say when needing this skill; no common user-facing keywords. | 1 / 3 |
| Distinctiveness Conflict Risk | The vercel-plugin benchmark niche is fairly specific, but the absence of explicit trigger guidance leaves overlap risk with other benchmark or testing skills. | 2 / 3 |

Total: 8 / 12 (Passed)

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 15 / 16 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| referenced_paths_exist | Referenced path issues: 4 missing | Warning |

Total: 15 / 16 (Passed)
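A check in the spirit of `referenced_paths_exist` can be run locally before publishing. The sketch below is not the validator's actual logic, which this page does not describe. It assumes a "referenced path" is the target of a relative markdown link in SKILL.md; `extractLinkedPaths`, `findMissingPaths`, and the example file `references/contracts.md` are hypothetical.

```typescript
import { existsSync } from "node:fs";
import { resolve } from "node:path";

// Collect relative link targets such as [text](scripts/benchmark-e2e.ts).
// URLs, absolute paths, and pure #anchors are skipped; duplicates are dropped.
export function extractLinkedPaths(markdown: string): string[] {
  const paths: string[] = [];
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(markdown)) !== null) {
    const target = match[1].split("#")[0];
    const isUrl = /^[a-z][a-z0-9+.-]*:/i.test(target);
    if (target === "" || isUrl || target.startsWith("/")) continue;
    if (paths.indexOf(target) === -1) paths.push(target);
  }
  return paths;
}

// Return the linked paths that do not exist under `bundleRoot`.
// `exists` is injectable so the check can be exercised without touching disk.
export function findMissingPaths(
  markdown: string,
  bundleRoot: string,
  exists: (absolutePath: string) => boolean = existsSync,
): string[] {
  return extractLinkedPaths(markdown).filter((p) => !exists(resolve(bundleRoot, p)));
}
```

Calling `findMissingPaths(skillText, bundleDir)` on the skill's own SKILL.md would list any links that point at files absent from the bundle, which is the kind of gap the warning above reports.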

Repository: vercel/vercel-plugin
Reviewed
