agent-sandbox

Agent skill for sandbox - invoke with $agent-sandbox

Quality

18%

Does it follow best practices?

Impact

93%

4.65x

Average score across 3 eval scenarios

Security

Risky

Do not use without reviewing

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agents/skills/agent-sandbox/SKILL.md

Quality

Discovery

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an extremely weak description that fails on every dimension. It provides no information about what the skill does, when it should be used, or what distinguishes it from other skills. It reads more like a label or placeholder than a functional description.

Suggestions

Replace the entire description with concrete actions the skill performs (e.g., 'Executes code in an isolated sandbox environment, runs tests, and manages temporary files').

Add an explicit 'Use when...' clause with natural trigger terms that describe scenarios where this skill should be selected (e.g., 'Use when the user needs to run untrusted code, test scripts in isolation, or execute commands in a sandboxed environment').

Remove the invocation command ('$agent-sandbox') from the description, as it does not help Claude decide when to select this skill and adds no semantic value.

Dimension	Reasoning	Score
Specificity	The description contains no concrete actions whatsoever. 'Agent skill for sandbox' is entirely vague and does not describe what the skill actually does.	1 / 3
Completeness	Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an 'agent skill for sandbox' with an invocation command, providing no functional or contextual information.	1 / 3
Trigger Term Quality	The only potentially relevant term is 'sandbox', which is generic and technical. There are no natural keywords a user would say when needing this skill. '$agent-sandbox' is an invocation command, not a trigger term.	1 / 3
Distinctiveness Conflict Risk	The term 'sandbox' is extremely generic and could overlap with any number of skills involving sandboxed environments, testing, or isolated execution. There is nothing distinctive about this description.	1 / 3
	Total	4 / 12 Passed

Implementation

37%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill provides a useful reference for E2B sandbox MCP tool signatures but is weakened by verbose persona framing, generic best-practice lists Claude already knows, and a lack of concrete multi-step workflows with validation checkpoints. The actionable core (the toolkit code block) is buried among unnecessary content, and the deployment approach lacks the specificity needed for reliable sandbox lifecycle management.

Suggestions

Remove the persona framing, 'core responsibilities' list, and 'quality standards' section—these are generic concepts Claude already knows. Focus the skill on the concrete toolkit API and usage patterns.

Add a complete end-to-end workflow example showing sandbox creation → code execution → output validation → cleanup, with explicit error handling and validation checkpoints between steps.

Include specific error scenarios and recovery steps (e.g., what to do if sandbox_create times out, how to verify execution succeeded before cleanup).

Replace the vague 'deployment approach' with concrete decision logic, e.g., 'If the user needs React: use template react with install_packages for additional deps; verify with sandbox_status before executing code.'
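
As an illustration of the end-to-end workflow these suggestions ask for, here is a minimal Python sketch of create → verify → execute → validate → cleanup. It does not call the real MCP tools. `FakeSandboxClient` is an in-memory stand-in, and while `sandbox_create` and `sandbox_status` are names taken from this review, `sandbox_execute`, `sandbox_delete`, the status strings, and the result shape are assumptions made up for the example.

```python
from dataclasses import dataclass, field


class SandboxError(RuntimeError):
    """Raised when a lifecycle step fails its validation checkpoint."""


@dataclass
class FakeSandboxClient:
    """In-memory stand-in for the sandbox tools (hypothetical behaviour)."""

    sandboxes: dict = field(default_factory=dict)
    next_id: int = 1

    def sandbox_create(self, template: str) -> str:
        sandbox_id = f"sbx-{self.next_id}"
        self.next_id += 1
        self.sandboxes[sandbox_id] = {"template": template, "status": "running"}
        return sandbox_id

    def sandbox_status(self, sandbox_id: str) -> str:
        return self.sandboxes.get(sandbox_id, {}).get("status", "missing")

    def sandbox_execute(self, sandbox_id: str, code: str) -> dict:
        # Assumed result shape: an exit code plus captured stdout.
        if "raise" in code:
            return {"exit_code": 1, "stdout": ""}
        return {"exit_code": 0, "stdout": "ok"}

    def sandbox_delete(self, sandbox_id: str) -> None:
        self.sandboxes.pop(sandbox_id, None)


def run_in_sandbox(client, template: str, code: str) -> str:
    """Create a sandbox, run code in it, validate the result, always clean up."""
    sandbox_id = client.sandbox_create(template)
    try:
        # Checkpoint 1: confirm the sandbox is up before executing anything.
        status = client.sandbox_status(sandbox_id)
        if status != "running":
            raise SandboxError(f"{sandbox_id} not ready (status={status})")

        result = client.sandbox_execute(sandbox_id, code)

        # Checkpoint 2: validate the output before reporting success.
        if result["exit_code"] != 0:
            raise SandboxError(
                f"execution failed with exit code {result['exit_code']}"
            )
        return result["stdout"]
    finally:
        # Cleanup runs on success and on failure, so no sandbox is leaked.
        client.sandbox_delete(sandbox_id)
```

Putting the deletion in `finally` is the design choice that matters here: a failed checkpoint still raises a clear error, but the sandbox is removed either way. A real skill would swap the fake client for the actual tool calls and adjust the status and result checks to whatever those tools return.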

Dimension	Reasoning	Score
Conciseness	The skill includes some unnecessary padding like the persona framing ('You are a Flow Nexus Sandbox Agent, an expert in...'), the 'core responsibilities' list which largely restates obvious things, and the 'quality standards' section which lists generic best practices Claude already knows. The toolkit section is useful but the surrounding content could be significantly tightened.	2 / 3
Actionability	The toolkit section provides concrete function signatures with example parameters, which is helpful. However, the code examples are illustrative rather than fully executable workflows—they show individual API calls but don't demonstrate complete end-to-end usage patterns or error handling. The deployment approach is a generic checklist rather than concrete guidance.	2 / 3
Workflow Clarity	The 'deployment approach' is a vague 6-step list with no validation checkpoints, no error recovery steps, and no feedback loops. For sandbox management involving resource creation and deletion (destructive/lifecycle operations), there are no explicit validation or verification steps between creating, executing, and cleaning up sandboxes.	1 / 3
Progressive Disclosure	The content is organized into logical sections (toolkit, deployment approach, templates, quality standards), which provides some structure. However, it's a monolithic file with no references to external documentation, and the template descriptions and quality standards could be separated or omitted. For a skill of this size (~70 lines of content), the organization is adequate but not optimal.	2 / 3
	Total	7 / 12 Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: ruvnet/ruflo
Commit: ccb062f

Reviewed: 3 days ago
