Agent skill for sandbox - invoke with $agent-sandbox
46
18%
Does it follow best practices?
Impact
93%
4.65x
Average score across 3 eval scenarios
Risky
Do not use without reviewing
Optimize this skill with Tessl
npx tessl skill review --optimize ./.agents/skills/agent-sandbox/SKILL.md
Quality
Discovery
0%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an extremely weak description that fails on every dimension. It provides no information about what the skill does, when it should be used, or what distinguishes it from other skills. It reads more like a label than a functional description.
Suggestions
Describe the concrete actions this skill performs (e.g., 'Executes code in an isolated sandbox environment, manages sandbox sessions, and retrieves execution results').
Add an explicit 'Use when...' clause with natural trigger terms that describe scenarios where this skill should be selected (e.g., 'Use when the user asks to run code safely, test scripts in isolation, or execute untrusted code').
Remove the invocation command ('invoke with $agent-sandbox') from the description and replace it with functional details that help Claude distinguish this skill from others.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions whatsoever. 'Agent skill for sandbox' is entirely vague and does not describe what the skill actually does. | 1 / 3 |
| Completeness | Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an 'agent skill for sandbox' with an invocation command, providing no functional or contextual information. | 1 / 3 |
| Trigger Term Quality | The only potentially relevant term is 'sandbox', which is generic and technical. There are no natural keywords a user would say when needing this skill. '$agent-sandbox' is an invocation command, not a trigger term. | 1 / 3 |
| Distinctiveness Conflict Risk | The term 'sandbox' is extremely generic and could overlap with any number of skills involving sandboxed environments, testing, or isolated execution. There is nothing distinctive about this description. | 1 / 3 |
| Total | | 4 / 12 Passed |
Implementation
37%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides a reasonable reference for E2B sandbox MCP tool signatures, which is its primary value. However, it is padded with generic persona framing, obvious best practices, and vague workflow steps that don't add actionable value. The workflow lacks validation checkpoints critical for sandbox lifecycle management, and the content could be cut by roughly 40% without losing useful information.
Suggestions
Replace the vague 6-step 'deployment approach' with a concrete workflow that includes validation checkpoints (e.g., check sandbox_status after creation before executing code, verify execution output before cleanup).
Remove the persona introduction, 'core responsibilities' list, and 'quality standards' section—these are generic concepts Claude already knows and waste tokens.
Add a complete end-to-end example showing sandbox creation → code execution → output verification → cleanup as a single executable workflow.
Add error handling examples showing what to do when sandbox creation fails or code execution returns errors.
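The workflow these suggestions ask for could be sketched roughly as follows. This is a minimal illustration in Python, not the skill's actual tooling: `FakeSandboxClient` is an in-memory stand-in, and the `sandbox_create`, `sandbox_execute`, and `sandbox_delete` names, their parameters, and the return shapes are all assumptions made for the example. Only `sandbox_status` is named in this review, and its signature here is guessed as well.

```python
from dataclasses import dataclass, field


class SandboxError(RuntimeError):
    """Raised when a sandbox step fails a validation checkpoint."""


@dataclass
class FakeSandboxClient:
    """In-memory stand-in for the real sandbox tools (hypothetical API)."""

    sandboxes: dict = field(default_factory=dict)
    fail_create: bool = False

    def sandbox_create(self, template: str) -> str:
        if self.fail_create:
            raise SandboxError("creation failed")
        sandbox_id = f"sbx-{len(self.sandboxes) + 1}"
        self.sandboxes[sandbox_id] = "running"
        return sandbox_id

    def sandbox_status(self, sandbox_id: str) -> str:
        return self.sandboxes.get(sandbox_id, "missing")

    def sandbox_execute(self, sandbox_id: str, code: str) -> dict:
        # Pretend any code containing "raise" fails; everything else succeeds.
        if "raise" in code:
            return {"exit_code": 1, "stdout": "", "stderr": "error"}
        return {"exit_code": 0, "stdout": "ok", "stderr": ""}

    def sandbox_delete(self, sandbox_id: str) -> None:
        self.sandboxes.pop(sandbox_id, None)


def run_in_sandbox(client, code: str, template: str = "python") -> str:
    """Create, verify status, execute, verify output, and always clean up."""
    # If creation itself fails, the error propagates and there is nothing to clean up.
    sandbox_id = client.sandbox_create(template)
    try:
        # Checkpoint 1: confirm the sandbox is running before executing code.
        status = client.sandbox_status(sandbox_id)
        if status != "running":
            raise SandboxError(f"sandbox {sandbox_id} is {status}, not running")

        result = client.sandbox_execute(sandbox_id, code)

        # Checkpoint 2: verify the execution result before treating it as success.
        if result["exit_code"] != 0:
            raise SandboxError(f"execution failed: {result['stderr']}")
        return result["stdout"]
    finally:
        # Cleanup runs on success and on failure, so no sandbox is leaked.
        client.sandbox_delete(sandbox_id)
```

Calling `run_in_sandbox(FakeSandboxClient(), "print('hi')")` walks through every checkpoint against the fake client. Putting the delete call in `finally` is one way to keep the destructive cleanup step tied to creation regardless of how execution ends; a real skill would swap the fake client for the actual tool calls.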
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill includes some unnecessary padding like the persona framing ('You are a Flow Nexus Sandbox Agent, an expert in...'), the 'core responsibilities' list which largely restates obvious things, and the 'quality standards' section which lists generic best practices Claude already knows. The toolkit section is useful but the surrounding content could be significantly tightened. | 2 / 3 |
| Actionability | The toolkit section provides concrete function signatures with example parameters, which is useful. However, the code examples are illustrative rather than fully executable workflows: they show individual API calls but don't demonstrate complete end-to-end usage patterns or error handling. The deployment approach is a generic checklist rather than concrete guidance. | 2 / 3 |
| Workflow Clarity | The 'deployment approach' is a vague 6-step list with no validation checkpoints, no error recovery steps, and no feedback loops. For sandbox management involving resource creation and deletion (destructive operations), there are no verification steps (e.g., confirm sandbox is running before executing code, validate execution output before cleanup). The workflow is more of a conceptual overview than an actionable sequence. | 1 / 3 |
| Progressive Disclosure | The content is organized into logical sections (toolkit, deployment approach, templates, quality standards) which provides some structure. However, it's a monolithic file with no references to external documentation, and the template descriptions and quality standards could be separated or omitted. No navigation to deeper resources is provided. | 2 / 3 |
| Total | | 7 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
398f7c2