
agent-sandbox

Agent skill for sandbox - invoke with $agent-sandbox

Quality: 63 (Does it follow best practices?)

Impact: 93%, 4.65x (Average score across 3 eval scenarios)

Security (by Snyk): Risky. Do not use without reviewing.


Quality

Content

50%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body provides a clear role, a useful toolkit of MCP signatures, and a sequenced deployment workflow, but it is padded with role-playing boilerplate, uses placeholder examples, and lacks validation checkpoints for destructive operations.

Suggestions

Trim the 'You are a Flow Nexus Sandbox Agent...' role preamble and the generic quality-standards list to reduce tokens.

Add validation/error-recovery steps to the deployment workflow (e.g., check sandbox_status after create, retry on failure, confirm before sandbox_delete).

Replace placeholder examples (sandbox_id: 'id') with concrete, complete runnable examples, or move the toolkit reference to a separate REFERENCE.md and link to it.
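The validation and error-recovery suggestion above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the skill's actual workflow. `SandboxClient`, its `create`/`status`/`delete` methods, and the `"running"`/`"failed"` status strings are hypothetical stand-ins for the Flow Nexus MCP tools (such as `sandbox_status` and `sandbox_delete`). The review does not show their real signatures or return values.

```python
import time
from typing import Callable, Optional, Protocol


class SandboxClient(Protocol):
    """Hypothetical wrapper around the skill's MCP sandbox tools.

    Method names, arguments, and status strings are assumptions for
    illustration, not the real Flow Nexus tool signatures.
    """

    def create(self, template: str) -> str: ...
    def status(self, sandbox_id: str) -> str: ...
    def delete(self, sandbox_id: str) -> None: ...


def deploy_with_checks(
    client: SandboxClient,
    template: str,
    max_attempts: int = 3,
    poll_interval: float = 1.0,
    poll_timeout: float = 30.0,
) -> str:
    """Create a sandbox, check its status after create, and retry on failure."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sandbox_id = client.create(template)
        state = "unknown"
        deadline = time.monotonic() + poll_timeout
        # Checkpoint: poll the status until the sandbox is ready, fails, or times out.
        while time.monotonic() < deadline:
            state = client.status(sandbox_id)
            if state == "running":
                return sandbox_id
            if state == "failed":
                break
            time.sleep(poll_interval)
        # Recovery: remove the sandbox this call just created before retrying.
        last_error = f"attempt {attempt}: {sandbox_id} ended in state {state!r}"
        client.delete(sandbox_id)
    raise RuntimeError(f"sandbox never became ready ({last_error})")


def safe_delete(
    client: SandboxClient, sandbox_id: str, confirm: Callable[[str], bool]
) -> bool:
    """Delete only after explicit confirmation; returns True if deleted."""
    if not confirm(sandbox_id):
        return False
    client.delete(sandbox_id)
    return True
```

Failed sandboxes created inside `deploy_with_checks` are cleaned up without prompting because the function owns them. Deleting any other sandbox goes through `safe_delete`, so the destructive step always has a confirmation gate, which is the kind of checkpoint the review asks for.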

Dimension scores (score, then reasoning):

Conciseness: 2 / 3
The body is mostly efficient with a concrete MCP toolkit block, but includes unnecessary boilerplate ('You are a Flow Nexus Sandbox Agent, an expert in...') and quality-standards prose Claude could infer, so it could be tightened.

Actionability: 2 / 3
The toolkit provides real MCP tool signatures with parameters, but examples use placeholders like sandbox_id: 'id' and omit return values, leaving them not fully copy-paste ready.

Workflow Clarity: 2 / 3
The 'deployment approach' lists six sequenced steps, but there are no validation checkpoints or error-recovery feedback loops, which caps workflow clarity at 2 for operations that include destructive actions like sandbox_delete.

Progressive Disclosure: 2 / 3
The content is organized into clear sections in a single self-contained file, but an inline API/toolkit reference is embedded rather than split out, and the body exceeds the ~50-line simple-skill threshold that would allow a 3 on structure alone.

Total: 8 / 12 (Passed)

Description

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is too thin: it only states the skill is for sandboxes and how to invoke it, without describing concrete capabilities or when to use it. It lacks a 'Use when...' trigger clause and specific actions.

Suggestions

List concrete capabilities (e.g., 'Create, configure, and manage E2B sandboxes; execute code in isolated environments') instead of the generic 'Agent skill for sandbox'.

Add an explicit trigger clause such as 'Use when the user needs isolated execution environments, sandbox deployment, or running code safely in E2B.'

Include natural trigger terms like 'E2B sandbox', 'isolated environment', 'sandbox template', and 'execute code in sandbox' to improve distinctiveness and recall.

Dimension scores (score, then reasoning):

Specificity: 1 / 3
The description 'Agent skill for sandbox - invoke with $agent-sandbox' names no concrete actions, only the broad domain 'sandbox', matching the vague 'no actions; abstract language' anchor.

Completeness: 2 / 3
It gives a weak 'what' (sandbox skill) but no 'Use when...' clause or equivalent trigger guidance, so the 'when' is missing entirely; per the guidelines, this caps completeness at 2.

Trigger Term Quality: 2 / 3
'sandbox' is a relevant keyword a user might say, but 'invoke with $agent-sandbox' is a technical invocation hint rather than natural trigger phrasing, and common variations are missing.

Distinctiveness Conflict Risk: 2 / 3
'sandbox' is a somewhat specific domain but is generic enough to overlap with other sandbox-related skills, fitting 'somewhat specific but could still overlap'.

Total: 7 / 12 (Passed)

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 16 / 16 passed

Validation for skill structure

No warnings or errors.

Repository: ruvnet/claude-flow
Reviewed
