Add integration/E2E tests to existing codebase using Design Docs
Overall score: 47%

Does it follow best practices? Passed. No known issues.

Impact: — (no eval scenarios have been run).

Optimize this skill with Tessl:

`npx tessl skill review --optimize ./skills/recipe-add-integration-tests/SKILL.md`

Quality
Discovery
32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is too terse and lacks both detailed capability listing and explicit trigger guidance. While it identifies a reasonably specific niche (adding integration/E2E tests using design docs), it fails to enumerate concrete actions or provide a 'Use when...' clause, making it difficult for Claude to reliably select this skill from a large pool.
Suggestions
- Add an explicit 'Use when...' clause with trigger terms like 'add integration tests', 'write E2E tests', 'end-to-end testing', 'test coverage from design doc', 'acceptance tests'.
- List specific concrete actions such as 'Reads design documents to identify test scenarios, generates integration test files, creates E2E test suites with setup/teardown, and validates API contracts'.
- Include common keyword variations users might use: 'end-to-end', 'e2e', 'integration testing', 'test generation', 'design document', 'spec-based tests'.
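Taken together, these suggestions could be folded into the skill's frontmatter description roughly as follows. This is a hypothetical sketch, not the skill's actual frontmatter; the `name` and wording are illustrative:

```markdown
---
name: recipe-add-integration-tests
description: >
  Adds integration and E2E tests to an existing codebase by reading design
  documents to identify test scenarios, generating integration test files,
  creating E2E test suites with setup/teardown, and validating API contracts.
  Use when asked to 'add integration tests', 'write E2E tests', improve
  'test coverage from a design doc', or create 'acceptance tests'
  (common variations: 'end-to-end', 'e2e', 'spec-based tests').
---
```

A description in this shape covers both the concrete actions and the explicit trigger guidance that the rubric rewards.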
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (integration/E2E tests) and a key action (add tests using Design Docs), but doesn't list multiple concrete actions like generating test cases, writing assertions, setting up test fixtures, etc. | 2 / 3 |
| Completeness | Describes what it does (add integration/E2E tests using Design Docs) but has no explicit 'Use when...' clause or equivalent trigger guidance, which per the rubric should cap completeness at 2, and the 'what' itself is also quite thin, placing this at 1. | 1 / 3 |
| Trigger Term Quality | Includes relevant terms like 'integration tests', 'E2E tests', and 'Design Docs', but misses common variations users might say such as 'end-to-end', 'test coverage', 'testing', 'test suite', or 'acceptance tests'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The combination of 'integration/E2E tests' with 'Design Docs' provides some distinctiveness, but it could overlap with general testing skills or design doc skills. The scope is somewhat specific but not sharply delineated. | 2 / 3 |
| Total | | 7 / 12 (Passed) |
Implementation
62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured orchestration workflow with strong sequencing, validation gates, and feedback loops for error recovery. Its main weaknesses are moderate verbosity (repeated routing patterns, unnecessary orchestrator identity preamble) and reliance on external skills/agents that aren't bundled, making full actionability hard to verify. The workflow clarity is the standout strength.
Suggestions
- Deduplicate the subagent routing logic (repeated in Steps 4, 6, 7) by defining it once at the top as a routing table, then referencing it in each step.
- Remove the 'Orchestrator Definition' preamble (Core Identity, Why Delegate): Claude doesn't need to be told why delegation saves context; just instruct the behavior directly.
- Include bundle files for referenced skills (acceptance-test-generator, integration-test-reviewer, quality-fixer, monorepo-flow.md) or at minimum provide a file listing so the progressive disclosure can be properly evaluated.
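The first suggestion, deduplicating the routing logic, could take a shape like the sketch below in the skill body. The subagent names are taken from the review above; the table layout and step wording are hypothetical:

```markdown
## Subagent routing

| Task                    | subagent_type             |
|-------------------------|---------------------------|
| Generate test file      | acceptance-test-generator |
| Review generated test   | integration-test-reviewer |
| Fix quality issues      | quality-fixer             |

In Steps 4, 6, and 7, dispatch via the Agent tool using the routing table
above rather than restating the routing rules inside each step.
```

Defined once, the table becomes the single source of truth, and each step shrinks to a one-line reference.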
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is moderately efficient but includes some redundancy: routing rules are repeated across Steps 4, 6, and 7, and the orchestrator identity/delegation rationale at the top is unnecessary context for Claude. The template and step descriptions could be tightened. | 2 / 3 |
| Actionability | Steps provide concrete tool invocations (Agent tool with subagent_type, bash commands) and specific file naming conventions, but many details rely on external skills/agents (acceptance-test-generator, task-executor, quality-fixer) without showing their actual behavior. The bash in Step 1 is executable, but most steps are structured instructions rather than copy-paste-ready commands. | 2 / 3 |
| Workflow Clarity | The multi-step workflow is clearly sequenced (Steps 0-9) with explicit validation checkpoints: Step 3 is marked as a GATE, Step 5 review feeds back to Step 6, Step 7 quality check loops back to Step 4 on stub_detected, and there's clear escalation on 'blocked'. The per-task-file sequential execution through Steps 4→5→6→7 is well-defined. | 3 / 3 |
| Progressive Disclosure | The skill references external skills (documentation-criteria, acceptance-test-generator, integration-test-reviewer, quality-fixer, monorepo-flow.md) but no bundle files are provided to verify these exist. The content is all inline in one file; the task file template and scope boundary block could potentially be extracted, but references are at least one level deep and clearly signaled. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Validation
90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
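The single `frontmatter_unknown_keys` warning is typically resolved by moving unrecognized top-level keys under `metadata`. A hypothetical sketch, with an invented `author` key standing in for whatever key actually triggered the warning:

```markdown
---
name: recipe-add-integration-tests
metadata:
  author: example   # formerly a top-level key, which triggered the warning
---
```

The review does not say which key was flagged, so check the validator output for the actual key name before moving it.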