Run isolated E2E tests in devcontainer from ai_docs/tests runbooks. Use this skill whenever the user asks to: run an E2E test, execute a test runbook, validate a feature end-to-end, create a new runbook, or test CLI behavior in isolation. If you need to run a multi-step CLI validation sequence (init → install → sync → verify), this is the skill — it handles ssenv isolation, flag verification, and structured reporting. Prefer this over ad-hoc docker exec sequences for any test that follows a runbook or needs reproducible isolation.
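The multi-step sequence named in the description (init → install → sync → verify) can be sketched as a small Python runner that stops at the first failing step and returns a structured report. This is only an illustration under stated assumptions: the skill's real commands, ssenv handling, and report format are not shown on this page, so the step commands below are placeholder Python one-liners and `run_steps` is a hypothetical helper.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, argv) pairs in order; stop at the first non-zero exit code."""
    report = []
    for name, argv in steps:
        proc = subprocess.run(argv, capture_output=True, text=True)
        report.append(
            {"name": name, "exit_code": proc.returncode, "passed": proc.returncode == 0}
        )
        if proc.returncode != 0:
            break  # later steps depend on earlier ones, so do not continue
    return report


# Placeholder commands (assumption): each step is a trivial Python process.
# The "sync" step is made to fail so the early stop is visible.
OK = [sys.executable, "-c", "raise SystemExit(0)"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]
DEMO_STEPS = [("init", OK), ("install", OK), ("sync", FAIL), ("verify", OK)]

if __name__ == "__main__":
    for entry in run_steps(DEMO_STEPS):
        print(entry)
```

With the demo steps, the report contains entries for init, install, and sync only; verify is never started because sync fails.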
90
88%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the marks. It provides specific concrete actions, includes natural trigger terms developers would use, explicitly states both what the skill does and when to use it, and carves out a clear niche that distinguishes it from general testing or Docker skills. The guidance on when to prefer this skill over alternatives is particularly helpful for skill selection.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete actions: 'run an E2E test', 'execute a test runbook', 'validate a feature end-to-end', 'create a new runbook', 'test CLI behavior in isolation', and describes specific workflow 'init → install → sync → verify'. Also mentions specific capabilities like 'ssenv isolation, flag verification, and structured reporting'. | 3 / 3 |
| Completeness | Clearly answers both what ('Run isolated E2E tests in devcontainer from ai_docs/tests runbooks') and when ('Use this skill whenever the user asks to: run an E2E test, execute a test runbook...'). Includes explicit 'Use this skill whenever' clause with multiple trigger scenarios and even provides guidance on when to prefer this over alternatives. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'E2E test', 'test runbook', 'end-to-end', 'CLI behavior', 'docker exec', 'devcontainer', 'isolation'. These are terms developers would naturally use when requesting this functionality. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive with clear niche: E2E tests specifically in devcontainer context, runbook-based testing, ssenv isolation. The explicit contrast with 'ad-hoc docker exec sequences' helps distinguish from general Docker or testing skills. Specific mention of 'ai_docs/tests runbooks' location adds uniqueness. | 3 / 3 |
| Total | 12 / 12 Passed | |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a highly actionable, well-structured skill for running E2E tests with clear workflows and validation checkpoints. The executable code examples and jq assertion patterns are excellent. The main weakness is length: the document tries to be both a workflow guide and a reference manual, and would benefit from splitting detailed references (checklist, assertion types, command templates) into separate files.
Suggestions
Extract the Runbook Quality Checklist into a separate CHECKLIST.md file and reference it from the main skill
Move the `--json` Quick Reference and Container Command Templates to a REFERENCE.md file to reduce main document length
Consolidate redundant information about ssenv isolation (mentioned in Rules, ssenv Quick Reference, and Container Command Templates)
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is comprehensive but includes some redundancy (e.g., the checklist repeats information from earlier sections, multiple explanations of ssenv isolation). Some sections like the relationship table with /mdproof could be trimmed. However, most content is project-specific knowledge Claude wouldn't have. | 2 / 3 |
| Actionability | Excellent executable guidance throughout: complete docker exec commands, specific jq patterns, copy-paste ready templates, and concrete examples for every operation. The Container Command Templates and JSON assertion patterns are immediately usable. | 3 / 3 |
| Workflow Clarity | Clear 5-phase workflow (Environment Check → Detect Scope → Select Tests → Prepare & Execute → Cleanup & Report) with explicit validation checkpoints. Phase 3 includes error recovery paths (filter failures with jq, debug individually). The checklist provides pre-execution validation. | 3 / 3 |
| Progressive Disclosure | Content is well-organized with clear sections and tables, but it's a monolithic document (~400 lines). The Runbook Quality Checklist, Assertion Types reference, and Quick References could be split into separate files. References to /mdproof skill and lessons-learned.md are good but inline content is heavy. | 2 / 3 |
| Total | 10 / 12 Passed | |
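The error-recovery path noted under Workflow Clarity (filter failures with jq, then debug them individually) can be illustrated with a rough Python equivalent. The report shape used here, a `steps` list whose entries carry `name`, `status`, and an optional `detail`, is hypothetical; the skill's actual `--json` output is not reproduced in this review.

```python
import json

# Hypothetical report (assumption): field names are illustrative only.
SAMPLE_REPORT = """
{"steps": [
  {"name": "init", "status": "pass"},
  {"name": "install", "status": "pass"},
  {"name": "sync", "status": "fail", "detail": "exit code 1"},
  {"name": "verify", "status": "skip"}
]}
"""


def failed_steps(report_json: str) -> list:
    """Return the steps marked as failed.

    Roughly what `jq '.steps[] | select(.status == "fail")'` would select
    for a report of this assumed shape.
    """
    report = json.loads(report_json)
    return [step for step in report.get("steps", []) if step.get("status") == "fail"]


if __name__ == "__main__":
    for step in failed_steps(SAMPLE_REPORT):
        print(f"{step['name']}: {step.get('detail', 'no detail recorded')}")
```

Filtering first keeps the follow-up debugging focused: only the failed entries need to be re-run one at a time.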
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
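A check along the lines of `frontmatter_unknown_keys` can be sketched as follows. The allow-list and the sample frontmatter are assumptions made for illustration: this page does not say which keys the spec permits or which key triggered the warning, and the validator's real implementation is not shown.

```python
# Assumed allow-list (not taken from the spec).
ALLOWED_KEYS = {"name", "description", "metadata"}

# Hypothetical skill file; "targets" stands in for an unrecognized key.
SAMPLE_SKILL = """---
name: e2e-tests
description: Run isolated E2E tests
targets:
  - devcontainer
---
# Body
"""


def frontmatter_keys(text: str) -> list:
    """Collect top-level keys from a leading '---' delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter at the top of the file
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        # Top-level keys start in column 0; indented lines are nested values.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def unknown_keys(text: str, allowed=ALLOWED_KEYS) -> list:
    """Return top-level frontmatter keys that are not in the allow-list."""
    return [key for key in frontmatter_keys(text) if key not in allowed]


if __name__ == "__main__":
    print(unknown_keys(SAMPLE_SKILL))  # prints ['targets']
```

The line-based parsing handles only flat top-level keys; a real validator would more likely use a YAML parser.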
053ecb4