Skill description under review: "Agent skill for sandbox - invoke with $agent-sandbox"

Overall score: 41
Best practices: 11% (Does it follow best practices?)
Impact: 93% (4.65x average score across 3 eval scenarios)
Rating: Risky. Do not use without reviewing.

Optimize this skill with Tessl:

npx tessl skill review --optimize ./.agents/skills/agent-sandbox/SKILL.md

Quality

Discovery

Score: 0%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an extremely weak description that fails on every dimension. It provides no information about what the skill does, when it should be used, or what distinguishes it from other skills. It reads more like a label than a functional description.
Suggestions
Describe the concrete actions this skill performs (e.g., 'Executes code in an isolated sandbox environment, manages sandbox sessions, and retrieves execution results').
Add an explicit 'Use when...' clause with natural trigger terms that describe scenarios where this skill should be selected (e.g., 'Use when the user asks to run code safely, test scripts in isolation, or execute untrusted code').
Remove the invocation command ('invoke with $agent-sandbox') from the description and replace it with capability and context information that helps Claude distinguish this skill from others.
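Putting these suggestions together, a stronger description might look like the following. This is a hypothetical sketch of SKILL.md frontmatter; the exact field names depend on the agent framework, and the wording is assembled from the examples in the suggestions above.

```yaml
---
name: agent-sandbox
description: >
  Executes code in an isolated sandbox environment: creates sandboxes from
  templates, runs scripts, captures output, and cleans up sessions. Use when
  the user asks to run code safely, test scripts in isolation, or execute
  untrusted code.
---
```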
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions whatsoever. 'Agent skill for sandbox' is entirely vague and does not describe what the skill actually does. | 1 / 3 |
| Completeness | Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an 'agent skill for sandbox' with an invocation command, providing no functional or contextual information. | 1 / 3 |
| Trigger Term Quality | The only potentially relevant term is 'sandbox', which is generic and technical. There are no natural keywords a user would say when needing this skill. '$agent-sandbox' is an invocation command, not a trigger term. | 1 / 3 |
| Distinctiveness / Conflict Risk | The term 'sandbox' is extremely generic and could overlap with any number of skills involving sandboxed environments, testing, or isolated execution. There is nothing distinctive about this description. | 1 / 3 |
| Total | | 4 / 12 (Passed) |
Implementation
Score: 22%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like a persona prompt than an actionable skill document. It spends significant tokens on role description, generic quality standards, and abstract workflow steps that Claude already understands. The MCP tool signatures are the most valuable part but lack concrete usage workflows, error handling patterns, and validation checkpoints.
Suggestions
Remove the persona framing ('You are a Flow Nexus Sandbox Agent...') and generic quality standards; replace with a concise purpose statement and jump straight to tool usage.
Add a concrete end-to-end workflow example showing sandbox creation → code execution → output capture → cleanup, with actual expected responses and error handling.
Add explicit validation checkpoints: check sandbox_status after creation before executing code, verify execution output before proceeding, confirm deletion succeeded.
Replace the abstract 'deployment approach' steps with specific decision trees or conditional logic (e.g., 'If template is python and packages needed, use install_packages parameter').
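The workflow recommended above (creation, readiness check, execution, output verification, cleanup) can be sketched in Python. The tool names (`sandbox_create`, `sandbox_status`, `sandbox_execute`, `sandbox_delete`) and response fields are hypothetical placeholders standing in for the skill's actual MCP tool calls, not the real API.

```python
def run_in_sandbox(client, code, template="python", packages=None):
    """Create a sandbox, run code in it, return stdout, and always clean up.

    `client` is assumed to expose the (hypothetical) sandbox tool calls.
    """
    sandbox = client.sandbox_create(template=template,
                                    install_packages=packages or [])
    try:
        # Validation checkpoint: confirm the sandbox is ready before executing.
        status = client.sandbox_status(sandbox_id=sandbox["id"])
        if status.get("state") != "running":
            raise RuntimeError(f"sandbox not ready: {status}")

        result = client.sandbox_execute(sandbox_id=sandbox["id"], code=code)

        # Validation checkpoint: verify execution succeeded before proceeding.
        if result.get("exit_code", 1) != 0:
            raise RuntimeError(f"execution failed: {result.get('stderr')}")
        return result.get("stdout", "")
    finally:
        # Cleanup runs even when a checkpoint raises, so no sandbox leaks.
        client.sandbox_delete(sandbox_id=sandbox["id"])
```

The two explicit checkpoints and the `finally` block are the point of the sketch: each lifecycle step is verified before the next, and deletion is confirmed to run on every path.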
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is verbose with significant padding. It explains Claude's role and responsibilities in a persona-style format ('You are a Flow Nexus Sandbox Agent'), lists obvious quality standards ('implement proper error handling'), and includes generic advice Claude already knows ('always consider security isolation, resource efficiency'). The template list and quality standards sections add little actionable value. | 1 / 3 |
| Actionability | The MCP tool call examples are concrete and show actual function signatures with parameters, which is useful. However, they are illustrative rather than fully executable: there's no real workflow showing how to chain these calls, handle responses, or deal with errors. The deployment approach section is generic and abstract rather than providing specific guidance. | 2 / 3 |
| Workflow Clarity | The 6-step 'deployment approach' is vague and generic (e.g., 'Analyze Requirements', 'Monitor Performance') without concrete validation checkpoints or error recovery steps. There's no feedback loop for handling failed sandbox creation, execution errors, or cleanup failures. For operations involving resource lifecycle management, this lack of validation caps the score. | 1 / 3 |
| Progressive Disclosure | The content is organized into logical sections (toolkit, templates, quality standards) which provides some structure. However, there are no references to external files, and content that could be separated (like the full template descriptions or detailed API reference) is inline. For a standalone skill with no bundle, the organization is adequate but not optimal. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Validation
Score: 100%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 11 / 11 checks passed. No warnings or errors.