Name: benchmark-e2e
Rating: 53.6 (1 review)
Author: vercel

benchmark-e2e

End-to-end benchmark suite for vercel-plugin. Runs realistic projects through skill injection, launches dev servers, verifies everything works, analyzes conversation logs, and produces an improvement report for overnight self-improvement loops.

Quality

60%

Does it follow best practices?

Security

Passed

No known issues

Fix and improve this skill with Tessl

tessl review fix ./.claude/skills/benchmark-e2e/SKILL.md

Quality

Content

70%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is highly actionable with executable commands and explicit data contracts, and the pipeline stages are clearly sequenced. It loses points for missing validation feedback loops on destructive operations and for inline detail that belongs in reference files that are not actually bundled.

Suggestions

Add an explicit validation/verify checkpoint and error-recovery loop to the destructive cleanup and overnight-loop workflows.

Move the verbose TypeScript interface definitions into a bundled reference file and link to it from SKILL.md so the overview stays lean.
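The first suggestion above could take the form of a pre-flight check that runs before the `rm -rf ~/dev/vercel-plugin-testing` cleanup. The following is a minimal sketch, not code from the skill: `checkCleanupTarget` and `CleanupCheck` are hypothetical names, and the only detail taken from the review is the `vercel-plugin-testing` directory name.

```typescript
// Hypothetical guard for a destructive cleanup step; the names are
// illustrative and are not taken from the skill's scripts.
type CleanupCheck = { ok: true } | { ok: false; reason: string };

function checkCleanupTarget(target: string, home: string): CleanupCheck {
  // Drop trailing slashes so "/home/u/" and "/home/u" compare equal.
  const strip = (p: string): string => (p.length > 1 ? p.replace(/\/+$/, "") : p);
  const t = strip(target);
  const h = strip(home);

  if (!t.startsWith("/")) {
    return { ok: false, reason: "target must be an absolute path" };
  }
  if (t.split("/").includes("..")) {
    return { ok: false, reason: "target must not contain '..'" };
  }
  if (t === "/" || t === h) {
    return { ok: false, reason: "refusing to remove root or home" };
  }
  if (!t.startsWith(h + "/")) {
    return { ok: false, reason: "target must be inside the home directory" };
  }
  if (!t.endsWith("/vercel-plugin-testing")) {
    return { ok: false, reason: "target is not the benchmark directory" };
  }
  return { ok: true };
}
```

A cleanup wrapper might call this first and abort with the returned `reason` whenever `ok` is false, which gives the workflow the explicit checkpoint the review asks for.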

Dimension	Reasoning	Score
Conciseness	Mostly efficient with concrete command examples and tables, but the contracts section repeats interface definitions and the self-improvement cycle restates pipeline behavior already covered, so it could be tightened.	2 / 3
Actionability	Provides copy-paste-ready commands ('bun run scripts/benchmark-e2e.ts', 'rm -rf ~/dev/vercel-plugin-testing'), concrete flag tables, and exact TypeScript interfaces for each contract.	3 / 3
Workflow Clarity	The four-stage pipeline is sequenced and abort-on-failure is stated, but destructive/batch operations (cleanup 'rm -rf', overnight loop) lack explicit validation checkpoints or error-recovery feedback loops, capping clarity at 2.	2 / 3
Progressive Disclosure	Sections are well organized, but the referenced bundle files (scripts/benchmark-e2e.ts, run-manifest.json) do not exist in the bundle, and inline interface/type definitions that could live in reference files remain in SKILL.md, so the structure is only partially split out.	2 / 3
	Total	9 / 12 Passed

Description

50%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is strong on concrete actions and specificity but weak on trigger-term quality and explicit 'when to use' guidance. It reads as an internal tool summary rather than a naturally-triggered skill description.

Suggestions

Add an explicit 'Use when...' clause naming natural user phrasing (e.g. 'Use when running or validating the vercel-plugin skill-injection benchmark against test projects').

Replace internal jargon ('skill injection', 'overnight self-improvement loops') with user-facing trigger keywords a person would actually say.
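Both suggestions can be checked mechanically before resubmitting a description. The sketch below is an assumption about how such a check might look, not the reviewer's actual scoring method: `lintDescription` and `DescriptionReport` are hypothetical names, and the jargon list contains only the two phrases the suggestion above asks to replace.

```typescript
// Hypothetical description lint. The jargon terms are the ones named in the
// review; the check itself is an illustrative assumption.
const JARGON: string[] = ["skill injection", "overnight self-improvement loops"];

interface DescriptionReport {
  hasUseWhen: boolean; // true when the text contains a "Use when" clause
  jargonFound: string[]; // jargon terms present in the text
}

function lintDescription(description: string): DescriptionReport {
  const lower = description.toLowerCase();
  return {
    hasUseWhen: /\buse when\b/.test(lower),
    jargonFound: JARGON.filter((term) => lower.includes(term)),
  };
}
```

Run against the current description, this reports no "Use when" clause and both jargon terms, which matches the two points the review raises.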

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions — 'Runs realistic projects through skill injection, launches dev servers, verifies everything works, analyzes conversation logs, and produces an improvement report' — matching the multi-action anchor.	3 / 3
Completeness	Clearly answers 'what does this do' but lacks an explicit 'Use when...' trigger clause, so 'when' is only implied — capping completeness at 2 per the guidelines.	2 / 3
Trigger Term Quality	Uses internal jargon ('skill injection', 'vercel-plugin', 'overnight self-improvement loops') rather than natural terms a user would say when needing this skill; no common user-facing keywords.	1 / 3
Distinctiveness Conflict Risk	The vercel-plugin benchmark niche is fairly specific, but the absence of explicit trigger guidance leaves overlap risk with other benchmark or testing skills.	2 / 3
	Total	8 / 12 Passed

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 15 / 16 Passed

Validation for skill structure

Criteria	Description	Result
referenced_paths_exist	Referenced path issues: 4 missing	Warning

	Total	15 / 16 Passed
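The `referenced_paths_exist` warning can be reproduced locally with a small scan of SKILL.md. This is a rough sketch under stated assumptions, not the validator's implementation: `findMissingPaths` is a hypothetical helper, the regular expression only recognises relative paths with at least one directory segment and a file extension (so a bare name such as run-manifest.json is not caught), and the `exists` predicate is injected so the caller decides how to test for a file.

```typescript
// Hypothetical sketch: collect path-like references from SKILL.md text and
// return the ones that the injected `exists` predicate cannot find.
function findMissingPaths(
  markdown: string,
  exists: (path: string) => boolean
): string[] {
  // Relative paths with one or more directory segments and a file extension,
  // for example scripts/benchmark-e2e.ts or ./reports/out.json.
  const pathPattern = /(?:\.\/)?(?:[\w.-]+\/)+[\w.-]+\.\w+/g;
  const matches: string[] = markdown.match(pathPattern) ?? [];
  // Keep the first occurrence of each path so duplicates are reported once.
  const unique = matches.filter((p, i) => matches.indexOf(p) === i);
  return unique.filter((p) => !exists(p)).sort();
}
```

In a real check, `exists` would typically wrap a filesystem lookup relative to the skill's bundle directory.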

Repository: vercel/vercel-plugin
Path: .claude/skills/benchmark-e2e/SKILL.md
Commit: 19606ac

Reviewed: 1 day ago
