This skill should be used when the user asks to "build background agent", "create hosted coding agent", "set up sandboxed execution", "implement multiplayer agent", or mentions background agents, sandboxed VMs, agent infrastructure, Modal sandboxes, self-spawning agents, or remote coding environments.
43
28%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/hosted-agents/SKILL.md
Quality
Discovery
37%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is essentially a trigger-term list with no explanation of what the skill actually does. While the trigger terms are strong and specific, the complete absence of capability descriptions makes it impossible for Claude to understand the skill's purpose or differentiate it from other agent-related skills. The description needs a clear 'what it does' section listing concrete actions and outputs.
Suggestions
Add a concrete capability statement before the trigger clause, e.g., 'Scaffolds background coding agents with Modal sandbox infrastructure, configures VM environments, sets up agent spawning logic, and implements multiplayer agent coordination.'
Restructure to follow the pattern: '[What it does]. Use when [triggers].' — currently the entire description is only the 'Use when' clause with no 'what' component.
Include specific outputs or artifacts the skill produces (e.g., 'generates Dockerfiles, Modal deployment configs, agent orchestration code') to help Claude distinguish this from general infrastructure or agent skills.
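The restructuring suggested above can be sketched in a few lines. This is only an illustration: `build_description` and `has_what_and_when` are hypothetical helper names, the capability text is shortened from the reviewer's own example, and the check is a simple heuristic rather than the rubric the review tool actually applies.

```python
def build_description(capability: str, triggers: list[str]) -> str:
    """Compose '[What it does]. Use when [triggers].' from its two parts."""
    what = capability.strip().rstrip(".")
    when = ", ".join(f'"{t}"' for t in triggers)
    return f"{what}. Use when the user asks to {when}."


def has_what_and_when(description: str) -> bool:
    """Heuristic: non-empty text must appear on both sides of 'Use when'."""
    head, sep, tail = description.partition("Use when")
    return bool(sep) and bool(head.strip()) and bool(tail.strip())


# Capability text shortened from the suggestion above; triggers come from
# the skill's current description.
description = build_description(
    "Scaffolds background coding agents with Modal sandbox infrastructure",
    ["build background agent", "set up sandboxed execution"],
)
print(description)
print(has_what_and_when(description))
```

A description that consists only of the trigger clause, like the current one, fails this check because nothing precedes the "Use when" part.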
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions or capabilities — it only lists trigger phrases. There is no explanation of what the skill actually does (e.g., 'creates infrastructure for...', 'configures Modal sandboxes...', 'deploys agents to...'). It is entirely vague on the 'what'. | 1 / 3 |
| Completeness | The description answers 'when' extensively but completely fails to answer 'what does this do'. There is no explanation of the skill's capabilities, outputs, or concrete actions. The rubric states missing 'what' OR 'when' should score 1. | 1 / 3 |
| Trigger Term Quality | The description includes a rich set of natural trigger terms users would say: 'build background agent', 'create hosted coding agent', 'set up sandboxed execution', 'Modal sandboxes', 'self-spawning agents', 'remote coding environments'. These are specific and varied enough to cover common user phrasings. | 3 / 3 |
| Distinctiveness Conflict Risk | The trigger terms like 'Modal sandboxes', 'background agents', and 'sandboxed VMs' are fairly niche, but without describing what the skill actually does, it could overlap with other agent-related or infrastructure skills. The specificity of some terms (e.g., 'Modal sandboxes') helps, but the lack of capability description introduces ambiguity. | 2 / 3 |
| Total | | 7 / 12 Passed |
Implementation
20%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like an architectural whitepaper or design document than an actionable skill file. It provides extensive strategic reasoning and conceptual guidance but lacks the concrete, executable examples that would make it useful for Claude to actually build hosted agent infrastructure. The extreme verbosity and absence of code examples are its most significant weaknesses.
Suggestions
Add concrete, executable code examples for key operations: a Modal sandbox definition, a Dockerfile for image building, git configuration commands, WebSocket streaming setup, and session state management with SQLite.
Cut the 'because...' justifications throughout - Claude doesn't need rationale for architectural decisions, just the decisions themselves. This could reduce the content by 30-40%.
Move detailed subsections (Client Implementations, Multiplayer Support, Authentication) into separate reference files and keep SKILL.md as a concise overview with links.
Add a concrete end-to-end workflow with validation steps: e.g., 'Build image → Verify image health → Start sandbox → Validate sandbox ready → Run agent → Extract results → Verify PR created'.
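The end-to-end workflow with validation steps suggested above could take roughly the following shape. This is a minimal sketch under stated assumptions: the step names are condensed from the suggestion, `run_workflow` is a hypothetical helper, and the lambdas are stand-ins for real image builds, sandbox calls, and PR checks, none of which are shown in the skill itself.

```python
from typing import Callable

# Each step pairs an action with a validation check. The lambdas below are
# hypothetical placeholders for real build, sandbox, and PR operations.
Step = tuple[str, Callable[[dict], None], Callable[[dict], bool]]


def run_workflow(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order and stop at the first step whose validation fails."""
    state: dict = {}
    completed: list[str] = []
    for name, action, validate in steps:
        action(state)
        if not validate(state):
            return False, completed
        completed.append(name)
    return True, completed


steps: list[Step] = [
    ("build image", lambda s: s.update(image="built"),
     lambda s: s.get("image") == "built"),
    ("start sandbox", lambda s: s.update(sandbox="ready"),
     lambda s: s.get("sandbox") == "ready"),
    ("run agent", lambda s: s.update(result="diff"),
     lambda s: bool(s.get("result"))),
    ("create PR", lambda s: s.update(pr_created=True),
     lambda s: s.get("pr_created") is True),
]

ok, completed = run_workflow(steps)
print(ok, completed)
```

Pairing every action with its own check means a failure is reported together with the list of steps that did complete, which is the kind of explicit checkpoint the review finds missing.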
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~300+ lines, with extensive explanatory rationale ('because...') for nearly every point. Much of this explains architectural reasoning Claude already understands. Sections like 'Why Multiplayer Matters' and 'Adoption Strategy' are strategic advice, not actionable skill content. The repeated 'because' justifications add significant token bloat. | 1 / 3 |
| Actionability | Despite the length, there is almost no executable code, no concrete commands, no specific API calls, and no copy-paste ready examples. Everything is described at a conceptual/architectural level (e.g., 'Pre-build environment images', 'Take filesystem snapshots') without showing how to actually implement any of it. No Dockerfiles, no Modal sandbox code, no actual git commands, no API endpoint definitions. | 1 / 3 |
| Workflow Clarity | The Sandbox-to-API Flow section provides a clear 4-step sequence, and the Guidelines section lists ordered priorities. However, most multi-step processes (image building, sandbox lifecycle, session teardown) lack explicit validation checkpoints or feedback loops. The Gotchas section partially compensates by identifying failure modes but doesn't integrate them into workflows. | 2 / 3 |
| Progressive Disclosure | The References section provides well-signaled links to external resources and related skills with 'Read when' guidance, which is good. However, the main body is a monolithic wall of text that could benefit from splitting detailed topics (sandbox infrastructure, client implementations, multiplayer) into separate reference files. The inline content is far too long for a SKILL.md overview. | 2 / 3 |
| Total | | 6 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
7a95d94