Test level system (LevelRoot, LevelTag, default lighting, scene hierarchy) against the poke example using the iwsdk CLI.
Quality: 55% (Does it follow best practices?)
Impact: — (no eval scenarios have been run)
Validation: Passed (no known issues)
Optimize this skill with Tessl
npx tessl skill review --optimize ./.claude/skills/test-level/SKILL.md

Quality

Discovery
40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description targets a very specific niche (testing a level system via iwsdk CLI), which gives it strong distinctiveness, but it lacks a 'Use when...' clause and relies heavily on domain-specific jargon without natural trigger terms. The actions described are narrow ('test against the poke example') rather than listing multiple concrete capabilities.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to test or validate level system components like LevelRoot, LevelTag, lighting, or scene hierarchy using the iwsdk CLI.'
Include natural language trigger terms a user might say, such as 'run level tests', 'validate scene setup', 'check poke example', or 'iwsdk test'.
List more specific concrete actions beyond just 'test against the poke example', e.g., 'Validates LevelRoot and LevelTag configuration, checks default lighting setup, verifies scene hierarchy, and runs poke example tests via the iwsdk CLI.'
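Taken together, these suggestions might produce frontmatter along these lines (a sketch only; the skill name and the exact wording are assumptions, not taken from the skill itself):

```yaml
# Hypothetical SKILL.md frontmatter; name and phrasing are illustrative.
---
name: test-level
description: >-
  Test the level system (LevelRoot, LevelTag, default lighting, scene
  hierarchy) against the poke example using the iwsdk CLI. Use when the
  user wants to run level tests, validate scene setup, check the poke
  example, or verify LevelRoot/LevelTag configuration and default
  lighting via iwsdk.
---
```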
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a specific domain (test level system) and mentions concrete components (LevelRoot, LevelTag, default lighting, scene hierarchy) and a specific tool (iwsdk CLI), but the actual actions are limited to 'test...against the poke example' without listing multiple distinct concrete actions. | 2 / 3 |
| Completeness | The description addresses 'what' (test level system components against the poke example) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' itself is also somewhat unclear, warranting a score of 1. | 1 / 3 |
| Trigger Term Quality | Includes some relevant technical keywords like 'LevelRoot', 'LevelTag', 'iwsdk CLI', 'scene hierarchy', and 'poke example', but these are highly domain-specific jargon. Common natural language variations a user might say (e.g., 'run level tests', 'validate scene setup') are missing. | 2 / 3 |
| Distinctiveness Conflict Risk | The description is highly specific to a particular system (level system with LevelRoot, LevelTag, poke example, iwsdk CLI), making it very unlikely to conflict with other skills. It occupies a clear, narrow niche. | 3 / 3 |
| Total | | 8 / 12 (Passed) |
Implementation
70%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a highly actionable and well-sequenced test skill with concrete commands, clear assertions, and good error recovery workflows. Its main weakness is that all content is packed into a single large file with no progressive disclosure — the MCPCALL helper, known issues, and individual test suites could benefit from being split into referenced files. Some minor verbosity in explanatory comments could be trimmed.
Suggestions
Extract the MCPCALL shell function into a separate helper script file (e.g., mcpcall.sh) and reference it from the skill, reducing inline bulk significantly.
Move 'Known Issues & Workarounds' to a separate KNOWN_ISSUES.md file and link to it, keeping the main skill focused on execution steps.
Consider splitting the 5 test suites into individual files or a TESTS.md reference document, with the main SKILL.md providing only the workflow overview and setup/teardown steps.
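The split these suggestions describe can be sketched as a bundle layout. The file names below (scripts/mcpcall.sh, KNOWN_ISSUES.md, TESTS.md) are hypothetical, since the review only establishes that a single SKILL.md exists:

```shell
#!/usr/bin/env sh
# Hypothetical restructure of the skill bundle (all paths are assumptions):
#   SKILL.md           : workflow overview and setup/teardown steps only
#   scripts/mcpcall.sh : the extracted MCPCALL helper, sourced before each suite
#   KNOWN_ISSUES.md    : workarounds, linked from SKILL.md
#   TESTS.md           : the five test suites
skill=.claude/skills/test-level
mkdir -p "$skill/scripts"
touch "$skill/scripts/mcpcall.sh" "$skill/KNOWN_ISSUES.md" "$skill/TESTS.md"
# SKILL.md would then reference the helper instead of inlining it:
#   source scripts/mcpcall.sh
ls "$skill"
```
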
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long (~200+ lines) with some unnecessary verbosity. The MCPCALL shell function is a large inline block that could be referenced externally. The 'Known Issues & Workarounds' section and some explanatory comments (e.g., 'The LevelSystem enforces identity transform on the level root every frame') add context Claude doesn't strictly need. However, most content is functional and test-specific. | 2 / 3 |
| Actionability | Every step has concrete, executable bash commands with specific tool names, JSON arguments, and clear assertions with expected values. The MCPCALL helper is fully executable code, and each test specifies exact expected outputs (e.g., position [0,0,0], id 'level:default', exactly 10 entities). | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (install → start server → verify connectivity → run suites → cleanup) with explicit validation at each step. There are feedback loops for server startup failure, retry logic for transient errors, and a dedicated Recovery section. Pre-test setup includes sleep/wait steps and console log checks before proceeding. | 3 / 3 |
| Progressive Disclosure | Everything is in a single monolithic file with no references to external files. The large MCPCALL shell function, known issues, and detailed test suites could be split into separate files. With no bundle files provided, there's no external structure to support navigation or layered discovery. | 1 / 3 |
| Total | | 9 / 12 (Passed) |
Validation
90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
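A sketch of the fix the warning points at: move any key the spec does not recognize under metadata. The report does not name the offending key, so license below is purely illustrative:

```yaml
# Before: a top-level key the validator does not recognize
# ('license' is a hypothetical example; the actual key isn't named above).
---
name: test-level
description: Test level system components against the poke example via the iwsdk CLI.
license: MIT
---
# After: the unrecognized key moved under metadata, per the warning.
---
name: test-level
description: Test level system components against the poke example via the iwsdk CLI.
metadata:
  license: MIT
---
```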