Runs vercel-plugin eval scenarios in Vercel Sandboxes instead of local WezTerm panes. It provisions ephemeral microVMs with Claude Code and the plugin pre-installed, runs the benchmark prompts, extracts hook artifacts, and produces coverage reports.
Score: 73
Does it follow best practices? 61%
Impact: 92% (2.09x average score across 3 eval scenarios)
Advisory: suggest reviewing before use
Optimize this skill with Tessl:

npx tessl skill review --optimize ./.claude/skills/benchmark-sandbox/SKILL.md

Each criterion below is scored twice: once for runs without the skill (baseline) and once with the skill installed.

Dynamic scenarios JSON authoring
Criterion                                 Without skill   With skill
Valid JSON array                          100%            100%
slug field present                        100%            100%
prompt field present                      100%            100%
expectedSkills field present              100%            100%
userStories exactly 3                     100%            100%
No tech name-dropping                     100%            100%
Vercel-labs link in every prompt          0%              100%
Dev server command in every prompt        0%              100%
AI feature included                       50%             50%
Storage scenario included                 100%            100%
Scheduled task scenario included          100%            100%
Auth/middleware scenario included         100%            100%
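For reference, the entry below is a minimal sketch of a scenario that would satisfy the criteria above, written as a TypeScript constant. Only the field names slug, prompt, expectedSkills, and userStories come from the rubric; the concrete values, the vercel-labs URL placeholder, and the overall shape are illustrative assumptions, not the skill's actual scenarios file.

// scenarios.ts: hypothetical example of one entry in the scenarios JSON array
const scenarios = [
  {
    slug: "ai-recipe-planner",
    prompt:
      "Build a meal-planning app that suggests recipes from pantry photos. " +
      "Start from the template at https://github.com/vercel-labs/... and " + // rubric: vercel-labs link in every prompt
      "run the dev server with `npm run dev` on port 3000.",                 // rubric: dev server command in every prompt
    expectedSkills: ["ai-sdk", "blob-storage"],
    userStories: [ // rubric: exactly 3 user stories
      "As a user, I can upload a photo of my pantry",
      "As a user, I get three recipe suggestions",
      "As a user, I can save a recipe for later",
    ],
  },
  // ...a storage, a scheduled-task, and an auth/middleware scenario would follow
];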
Sandbox provisioning code
Criterion                                 Without skill   With skill
SDK version 1.8.0                         0%              100%
runtime node24                            0%              100%
ports 3000 in create                      0%              100%
Home dir /home/vercel-sandbox             0%              100%
writeFiles for uploads                    0%              100%
AbortSignal timeout                       0%              0%
Timestamped project names                 100%            100%
API keys via env in create                30%             100%
Snapshot stops source sandbox             30%             100%
No --print/-p for build                   0%              100%
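As a rough illustration of what these criteria look for, here is a minimal provisioning sketch assuming the @vercel/sandbox SDK (Sandbox.create, writeFiles, runCommand). The option names mirror the rubric above, but whether every option is accepted exactly as written (for example env on create or signal on runCommand), and the omitted snapshot step, are assumptions rather than the skill's actual code.

// provision.ts: hypothetical sketch of the sandbox provisioning step
import { Sandbox } from "@vercel/sandbox"; // rubric: SDK version 1.8.0
import { readFile } from "node:fs/promises";

const projectName = `vercel-plugin-bench-${Date.now()}`; // rubric: timestamped project names
console.log(`provisioning ${projectName}`);

const sandbox = await Sandbox.create({
  runtime: "node24",       // rubric: runtime node24
  ports: [3000],           // rubric: ports 3000 in create
  timeout: 10 * 60 * 1000, // overall sandbox budget
  env: {                   // rubric: API keys via env in create (assumption: create accepts env)
    ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY ?? "",
  },
});

// rubric: writeFiles for uploads; files land under the /home/vercel-sandbox home dir
await sandbox.writeFiles([
  { path: "/home/vercel-sandbox/scenarios.json", content: await readFile("scenarios.json") },
]);

// rubric: AbortSignal timeout on long-running commands (assumption: runCommand accepts a signal)
await sandbox.runCommand({
  cmd: "npm",
  args: ["install"],
  signal: AbortSignal.timeout(15 * 60 * 1000),
});

// The skill is also expected to snapshot the provisioned sandbox and then stop the
// source sandbox; that call is omitted here because the exact API is not shown above.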
Haiku structured scoring module
Criterion                                 Without skill   With skill
Uses claude -p flag                       100%            100%
Uses --json-schema flag                   0%              100%
Uses --model haiku                        0%              100%
Uses --setting-sources empty string       0%              100%
Extracts structured_output                20%             100%
Build schema correct                      100%            100%
Verify schema correct                     100%            100%
Deploy schema correct                     100%            100%
No -p for phase commands                  0%              0%
Timeout on scoring call                   0%              100%
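To make these criteria concrete, the snippet below sketches the kind of scoring call the rubric describes. The claude flags (-p, --model haiku, --json-schema, --setting-sources "") are the ones named above; the surrounding Node wrapper, the added --output-format json flag, and the assumption that the CLI's JSON response carries a structured_output field are mine, not the skill's verified implementation.

// score.ts: hypothetical sketch of the Haiku structured scoring call
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function scorePhase(prompt: string, schemaPath: string) {
  // One-shot prompt (-p) against the haiku model, constrained to a JSON schema,
  // with local settings ignored via an empty --setting-sources value.
  const { stdout } = await run(
    "claude",
    [
      "-p", prompt,
      "--model", "haiku",
      "--json-schema", schemaPath,
      "--setting-sources", "",
      "--output-format", "json", // assumption: needed so the response is parseable JSON
    ],
    { timeout: 120_000 }, // rubric: timeout on the scoring call
  );

  // rubric: extract the structured_output field rather than raw text
  return JSON.parse(stdout).structured_output;
}

Usage would be something like scorePhase("Score the build phase output...", "schemas/build.json"), called once each against the build, verify, and deploy schemas.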
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.