Skill description under review: "Agent skill for sandbox - invoke with $agent-sandbox"

Overall score: 41
Best practices: 11% (Does it follow best practices?)
Impact: 93% (4.65x average score across 3 eval scenarios)
Rating: Risky. Do not use without reviewing.

Optimize this skill with Tessl:

npx tessl skill review --optimize ./.agents/skills/agent-sandbox/SKILL.md

Quality

Discovery

Score: 0%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an extremely weak description that fails on every dimension. It provides no information about what the skill does, when it should be used, or what distinguishes it from other skills. It reads more like a label than a functional description.
Suggestions
Describe the concrete actions this skill performs (e.g., 'Executes code in an isolated sandbox environment, manages sandbox sessions, and retrieves execution results').
Add an explicit 'Use when...' clause with natural trigger terms that describe scenarios where this skill should be selected (e.g., 'Use when the user asks to run code safely, test scripts in isolation, or execute untrusted code').
Remove the invocation command ('invoke with $agent-sandbox') from the description and replace it with capability and context information that helps Claude distinguish this skill from others.
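Putting these suggestions together, a stronger description might look like the following. This is a hypothetical sketch of SKILL.md frontmatter; the exact field names depend on the agent framework, and the wording is assembled from the examples in the suggestions above.

```yaml
---
name: agent-sandbox
description: >
  Executes code in an isolated sandbox environment: creates sandboxes from
  templates, runs scripts, captures output, and cleans up sessions. Use when
  the user asks to run code safely, test scripts in isolation, or execute
  untrusted code.
---
```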
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions whatsoever. 'Agent skill for sandbox' is entirely vague and does not describe what the skill actually does. | 1 / 3 |
| Completeness | Neither 'what does this do' nor 'when should Claude use it' is answered. The description only states it's an 'agent skill for sandbox' with an invocation command, providing no functional or contextual information. | 1 / 3 |
| Trigger Term Quality | The only potentially relevant term is 'sandbox', which is generic and technical. There are no natural keywords a user would say when needing this skill. '$agent-sandbox' is an invocation command, not a trigger term. | 1 / 3 |
| Distinctiveness / Conflict Risk | The term 'sandbox' is extremely generic and could overlap with any number of skills involving sandboxed environments, testing, or isolated execution. There is nothing distinctive about this description. | 1 / 3 |
| Total | | 4 / 12 (Passed) |
Implementation
Score: 22%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like a persona prompt than an actionable skill document. It spends significant tokens on role description, generic quality standards, and abstract workflow steps that Claude already understands. The MCP tool signatures are the most valuable part but lack concrete usage workflows, error handling patterns, and validation checkpoints.
Suggestions
Remove the persona framing ('You are a Flow Nexus Sandbox Agent...') and generic quality standards; replace with a concise purpose statement and jump straight to tool usage.
Add a concrete end-to-end workflow example showing sandbox creation → code execution → output capture → cleanup, with actual expected responses and error handling.
Add explicit validation checkpoints: check sandbox_status after creation before executing code, verify execution output before proceeding, confirm deletion succeeded.
Replace the abstract 'deployment approach' steps with specific decision trees or conditional logic (e.g., 'If template is python and packages needed, use install_packages parameter').
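The workflow recommended above (creation, readiness check, execution, output verification, cleanup) can be sketched in Python. The tool names (`sandbox_create`, `sandbox_status`, `sandbox_execute`, `sandbox_delete`) and response fields are hypothetical placeholders standing in for the skill's actual MCP tool calls, not the real API.

```python
def run_in_sandbox(client, code, template="python", packages=None):
    """Create a sandbox, run code in it, return stdout, and always clean up.

    `client` is assumed to expose the (hypothetical) sandbox tool calls.
    """
    sandbox = client.sandbox_create(template=template,
                                    install_packages=packages or [])
    try:
        # Validation checkpoint: confirm the sandbox is ready before executing.
        status = client.sandbox_status(sandbox_id=sandbox["id"])
        if status.get("state") != "running":
            raise RuntimeError(f"sandbox not ready: {status}")

        result = client.sandbox_execute(sandbox_id=sandbox["id"], code=code)

        # Validation checkpoint: verify execution succeeded before proceeding.
        if result.get("exit_code", 1) != 0:
            raise RuntimeError(f"execution failed: {result.get('stderr')}")
        return result.get("stdout", "")
    finally:
        # Cleanup runs even when a checkpoint raises, so no sandbox leaks.
        client.sandbox_delete(sandbox_id=sandbox["id"])
```

The two explicit checkpoints and the `finally` block are the point of the sketch: each lifecycle step is verified before the next, and deletion is confirmed to run on every path.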
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is verbose with significant padding. It explains Claude's role and responsibilities in a persona-style format ('You are a Flow Nexus Sandbox Agent'), lists obvious quality standards ('implement proper error handling'), and includes generic advice Claude already knows ('always consider security isolation, resource efficiency'). The template list and quality standards sections add little actionable value. | 1 / 3 |
| Actionability | The MCP tool call examples are concrete and show actual function signatures with parameters, which is useful. However, they are illustrative rather than fully executable: there's no real workflow showing how to chain these calls, handle responses, or deal with errors. The deployment approach section is generic and abstract rather than providing specific guidance. | 2 / 3 |
| Workflow Clarity | The 6-step 'deployment approach' is vague and generic (e.g., 'Analyze Requirements', 'Monitor Performance') without concrete validation checkpoints or error recovery steps. There's no feedback loop for handling failed sandbox creation, execution errors, or cleanup failures. For operations involving resource lifecycle management, this lack of validation caps the score. | 1 / 3 |
| Progressive Disclosure | The content is organized into logical sections (toolkit, templates, quality standards) which provides some structure. However, there are no references to external files, and content that could be separated (like the full template descriptions or detailed API reference) is inline. For a standalone skill with no bundle, the organization is adequate but not optimal. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Validation
Score: 100%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 11 / 11 checks passed. No warnings or errors.