benchmark-sandbox

Run vercel-plugin eval scenarios in Vercel Sandboxes instead of local WezTerm panels. Provisions ephemeral microVMs with Claude Code + plugin pre-installed, runs benchmark prompts, extracts hook artifacts, and produces coverage reports.

73 (2.09x)

Quality: 61%. Does it follow best practices?

Impact: 92% (2.09x). Average score across 3 eval scenarios.

Security (by Snyk): Advisory. Suggest reviewing before use.

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/benchmark-sandbox/SKILL.md

Evaluation results

95% / 20%

Benchmark Scenarios Authoring

Dynamic scenarios JSON authoring

| Criteria | Without context | With context |
| --- | --- | --- |
| Valid JSON array | 100% | 100% |
| slug field present | 100% | 100% |
| prompt field present | 100% | 100% |
| expectedSkills field present | 100% | 100% |
| userStories exactly 3 | 100% | 100% |
| No tech name-dropping | 100% | 100% |
| Vercel-labs link in every prompt | 0% | 100% |
| Dev server command in every prompt | 0% | 100% |
| AI feature included | 50% | 50% |
| Storage scenario included | 100% | 100% |
| Scheduled task scenario included | 100% | 100% |
| Auth/middleware scenario included | 100% | 100% |
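
The structural criteria in this scenario (valid JSON array, `slug`, `prompt`, `expectedSkills`, and exactly three `userStories`) could be checked mechanically. The following is a minimal sketch, not the evaluator's actual logic: the field names come from the criteria, while the value types (strings and arrays) and the helper name `validateScenarios` are assumptions made for illustration. The content criteria (link, dev server command, scenario themes) are not covered.

```typescript
// Field names are taken from the criteria list; the value types are assumptions.
interface Scenario {
  slug: string;
  prompt: string;
  expectedSkills: string[];
  userStories: string[];
}

// Hypothetical helper: returns a list of problems; an empty list means the
// input passed every structural check.
function validateScenarios(json: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return ["input is not valid JSON"];
  }
  if (!Array.isArray(parsed)) {
    return ["top-level value must be a JSON array"];
  }

  const problems: string[] = [];
  parsed.forEach((item: unknown, i: number) => {
    if (typeof item !== "object" || item === null) {
      problems.push(`[${i}] is not an object`);
      return;
    }
    const s = item as Partial<Record<keyof Scenario, unknown>>;
    if (typeof s.slug !== "string" || s.slug.length === 0) {
      problems.push(`[${i}] slug is missing`);
    }
    if (typeof s.prompt !== "string" || s.prompt.length === 0) {
      problems.push(`[${i}] prompt is missing`);
    }
    if (!Array.isArray(s.expectedSkills)) {
      problems.push(`[${i}] expectedSkills is missing`);
    }
    if (!Array.isArray(s.userStories) || s.userStories.length !== 3) {
      problems.push(`[${i}] userStories must contain exactly 3 entries`);
    }
  });
  return problems;
}
```

Returning a list of problems rather than throwing on the first one makes it possible to report every failing criterion for a file in a single pass.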

90% / 74%

Sandbox Provisioning Utility

Sandbox provisioning code

| Criteria | Without context | With context |
| --- | --- | --- |
| SDK version 1.8.0 | 0% | 100% |
| runtime node24 | 0% | 100% |
| ports 3000 in create | 0% | 100% |
| Home dir /home/vercel-sandbox | 0% | 100% |
| writeFiles for uploads | 0% | 100% |
| AbortSignal timeout | 0% | 0% |
| Timestamped project names | 100% | 100% |
| API keys via env in create | 30% | 100% |
| Snapshot stops source sandbox | 30% | 100% |
| No --print/-p for build | 0% | 100% |
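
Several of these criteria describe plain configuration values (runtime `node24`, port 3000, the `/home/vercel-sandbox` home directory, timestamped project names, API keys passed as `env`). The sketch below gathers them into one place without calling the SDK. Everything beyond those literal values is an assumption: the `SandboxCreateOptions` shape is a hypothetical stand-in rather than the SDK's verified type, the timestamp format is invented for illustration, and placing projects directly under the home directory is a guess.

```typescript
// Hypothetical options shape. The field names mirror the criteria above and
// are NOT a verified copy of the SDK's create() parameter type.
interface SandboxCreateOptions {
  runtime: "node24";
  ports: number[];
  env: Record<string, string>;
}

// Home directory named in the criteria.
const SANDBOX_HOME = "/home/vercel-sandbox";

// Builds a project name with a UTC timestamp suffix.
// The "YYYYMMDD-HHMMSS" format is an assumption, not the skill's own format.
function timestampedProjectName(slug: string, now: Date = new Date()): string {
  const stamp = now
    .toISOString()       // e.g. 2025-01-02T03:04:05.678Z
    .slice(0, 19)        // drop milliseconds and the trailing "Z"
    .replace(/[-:]/g, "") // 20250102T030405
    .replace("T", "-");  // 20250102-030405
  return `${slug}-${stamp}`;
}

// Assumed layout: each project lives directly under the sandbox home directory.
function projectPath(projectName: string): string {
  return `${SANDBOX_HOME}/${projectName}`;
}

// Collects the create-time settings the criteria mention. API keys are copied
// into env so the caller's object is not mutated later.
function buildCreateOptions(apiKeys: Record<string, string>): SandboxCreateOptions {
  return {
    runtime: "node24",
    ports: [3000],
    env: { ...apiKeys },
  };
}
```

Keeping these values in small pure functions means they can be unit-tested without provisioning a sandbox; the remaining criteria (`writeFiles`, snapshots, timeouts) involve live SDK calls and are left out of this sketch.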

92% / 49%

Phase Scoring Module

Haiku structured scoring module

| Criteria | Without context | With context |
| --- | --- | --- |
| Uses claude -p flag | 100% | 100% |
| Uses --json-schema flag | 0% | 100% |
| Uses --model haiku | 0% | 100% |
| Uses --setting-sources empty string | 0% | 100% |
| Extracts structured_output | 20% | 100% |
| Build schema correct | 100% | 100% |
| Verify schema correct | 100% | 100% |
| Deploy schema correct | 100% | 100% |
| No -p for phase commands | 0% | 0% |
| Timeout on scoring call | 0% | 100% |
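
The flag-related criteria here can be illustrated by assembling the argument list for a scoring call and reading `structured_output` from the response. This is a rough sketch under stated assumptions: the flags `-p`, `--json-schema`, `--model haiku`, and `--setting-sources ""` are taken from the criteria names, `--output-format json` is added on the assumption that the response must be JSON for the field to be parsed, and the helper names and example schema are hypothetical. The actual build, verify, and deploy schemas are not shown on this page and are not reproduced.

```typescript
// Hypothetical helper: builds the argv for a `claude` scoring call.
// Flags mirror the criteria names; `--output-format json` is an assumption.
function buildScoringArgs(prompt: string, schema: object): string[] {
  return [
    "-p", prompt,
    "--model", "haiku",
    "--json-schema", JSON.stringify(schema),
    "--setting-sources", "", // empty string, as the criteria describe
    "--output-format", "json",
  ];
}

// Hypothetical helper: pulls the `structured_output` field out of the CLI's
// JSON response, failing loudly if the field is absent.
function extractStructuredOutput(stdout: string): unknown {
  const parsed: unknown = JSON.parse(stdout);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    !("structured_output" in parsed)
  ) {
    throw new Error("no structured_output field in CLI response");
  }
  return (parsed as { structured_output: unknown }).structured_output;
}

// Illustrative schema only; the real phase schemas are not given in the source.
const exampleSchema = {
  type: "object",
  properties: { passed: { type: "boolean" } },
  required: ["passed"],
};

const exampleArgs = buildScoringArgs("Score the build phase.", exampleSchema);
```

Passing the arguments as an array rather than a shell string avoids quoting problems with the embedded JSON schema. The timeout criterion would be handled wherever the process is actually spawned, which this sketch leaves out.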

Repository: vercel/vercel-plugin
Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
