Test UI system (PanelUI, ScreenSpace) against the poke example using the iwsdk CLI.
52
58%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./.claude/skills/test-ui/SKILL.md
Quality
Discovery
40%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a very specific niche (testing PanelUI/ScreenSpace with iwsdk CLI against the poke example), which makes it distinctive but also quite narrow and jargon-heavy. It lacks an explicit 'Use when...' clause and doesn't enumerate multiple concrete actions, limiting its usefulness for skill selection. The technical specificity helps avoid conflicts but hurts discoverability for users who might phrase requests differently.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to run UI tests, verify PanelUI or ScreenSpace behavior, or test against the poke example using iwsdk.'
Include natural language trigger terms a user might say, such as 'run UI tests', 'test panels', 'screen space rendering', 'poke example test'.
Expand the 'what' portion to list specific actions, e.g., 'Runs UI tests for PanelUI and ScreenSpace components, validates rendering output, and reports test results using the iwsdk CLI against the poke example.'
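As a rough illustration, the three suggestions above could be combined into a revised SKILL.md frontmatter along these lines. This is a sketch only: the `name` value is inferred from the skill's path (`./.claude/skills/test-ui/`), and the description merges the suggested phrasings rather than quoting the actual file, so the listed actions should be checked against what the skill really does.

```yaml
---
# Hypothetical revision; name inferred from the skill's directory.
name: test-ui
# Wording assembled from the review's suggestions, not from the real SKILL.md.
description: >-
  Runs UI tests for PanelUI and ScreenSpace components, validates rendering
  output, and reports test results using the iwsdk CLI against the poke
  example. Use when the user wants to run UI tests, test panels, verify
  screen space rendering, or run the poke example test.
---
```

Placing the 'Use when...' clause in the same `description` field keeps the trigger terms where an agent reads them during skill selection.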
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names a specific domain (UI testing with PanelUI, ScreenSpace) and mentions concrete tools (iwsdk CLI, poke example), but doesn't list multiple distinct actions: it's essentially one action, 'test UI system against the poke example.' | 2 / 3 |
| Completeness | Describes what it does (test UI system against poke example) but has no explicit 'Use when...' clause or equivalent trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' is also fairly thin, so this scores a 1. | 1 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'PanelUI', 'ScreenSpace', 'iwsdk CLI', and 'poke example', but these are highly technical/project-specific terms. Missing natural language variations a user might say (e.g., 'run UI tests', 'test panels', 'screen space testing'). | 2 / 3 |
| Distinctiveness Conflict Risk | The description is highly specific to a particular UI system (PanelUI, ScreenSpace), a specific example (poke), and a specific CLI tool (iwsdk). This is unlikely to conflict with other skills due to its narrow, project-specific scope. | 3 / 3 |
| Total | 8 / 12 Passed | |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted test execution skill with highly actionable, concrete commands and specific assertions. The workflow is clearly sequenced with proper validation checkpoints, error recovery procedures, and failure handling. The main weakness is its length—the detailed assertion values for each test suite make it verbose, and some of the 'Known Issues' content is informational rather than directly actionable for the test workflow.
Suggestions
Consider moving the detailed per-suite assertion values into a separate TEST_ASSERTIONS.md reference file, keeping SKILL.md as a concise workflow overview with links to detailed specs.
Trim the 'Known Issues' section to only items that directly affect test execution (e.g., keep 'PanelDocument loading is async' and 'Entity indices change on reload', but remove 'Panel interaction' and 'ScreenSpace re-parenting in XR' which are informational).
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient and avoids explaining concepts Claude already knows, but it's quite long (~180 lines) with some redundancy. The 'Known Issues & Workarounds' section adds useful context, but some items (like 'Panel interaction' and 'ScreenSpace re-parenting in XR') are informational rather than actionable for the test workflow. The repeated 'IMPORTANT' callouts and some verbose assertion descriptions could be tightened. | 2 / 3 |
| Actionability | Every step has concrete, executable CLI commands with exact flags and arguments. Assertions are specific with expected values (e.g., `maxWidth = 0.5`, `height = "50%"`). The commands are copy-paste ready with clear JSON input parameters, and the output format is specified for parsing. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced with numbered steps, explicit validation checkpoints (verify connectivity before testing, check for errors in pre-test setup, poll for server readiness), feedback loops (retry logic in Recovery section), and clear failure handling (skip to Step 5 if server fails). The pre-test setup includes sleep commands and assertion checks before proceeding to test suites. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear section headers and logical grouping of test suites, but it's a monolithic document that could benefit from splitting detailed test assertions into a separate reference file. For a skill of this length (~180 lines of dense test specifications), the inline approach makes it harder to navigate. However, no bundle files exist to reference, and the section organization is reasonable. | 2 / 3 |
| Total | 10 / 12 Passed | |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
e8f9df1