Playwright E2E testing patterns, Page Object Model, configuration, CI/CD integration, artifact management, and flaky test strategies.
62
45%
Does it follow best practices?
Impact
92%
1.58x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./.agents/skills/e2e-testing/SKILL.md
Quality
Discovery
32%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies the domain (Playwright E2E testing) and lists relevant topic areas, but reads more like a table of contents than an actionable skill description. It lacks concrete action verbs describing what the skill does and entirely omits 'when to use' guidance, making it difficult for Claude to reliably select this skill from a large pool.
Suggestions
- Add a 'Use when...' clause with explicit triggers, e.g., 'Use when the user asks about writing Playwright tests, debugging flaky E2E tests, setting up browser test automation, or configuring Playwright for CI/CD pipelines.'
- Replace topic nouns with concrete action phrases, e.g., 'Generates Playwright E2E test files using Page Object Model patterns, configures playwright.config.ts, sets up artifact collection, and diagnoses flaky tests.'
- Include common user-facing trigger terms and file extensions like 'end-to-end tests', 'browser testing', 'test automation', '.spec.ts', 'playwright.config.ts' to improve keyword coverage.
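The three suggestions above can be checked mechanically. The sketch below is a hypothetical helper, not part of Tessl or Playwright: it joins the two example sentences quoted in the suggestions into one candidate description, then reports whether a 'Use when' clause is present and which of the suggested trigger terms are still missing. Plain substring matching is an assumption made for brevity and says nothing about how the review itself scores descriptions.

```typescript
// Candidate description assembled from the two example sentences in the suggestions.
const description =
  "Generates Playwright E2E test files using Page Object Model patterns, " +
  "configures playwright.config.ts, sets up artifact collection, and diagnoses flaky tests. " +
  "Use when the user asks about writing Playwright tests, debugging flaky E2E tests, " +
  "setting up browser test automation, or configuring Playwright for CI/CD pipelines.";

// Trigger terms listed in the third suggestion.
const triggerTerms = [
  "end-to-end tests",
  "browser testing",
  "test automation",
  ".spec.ts",
  "playwright.config.ts",
];

// Hypothetical check: true if the text contains an explicit "Use when" clause.
function hasUseWhenClause(text: string): boolean {
  return /\buse when\b/i.test(text);
}

// Hypothetical check: returns the terms that do not occur in the text (case-insensitive).
function missingTriggerTerms(text: string, terms: string[]): string[] {
  const haystack = text.toLowerCase();
  return terms.filter((term) => !haystack.includes(term.toLowerCase()));
}
```

Under these assumptions the combined description has a 'Use when' clause but still lacks 'end-to-end tests', 'browser testing', and '.spec.ts', so the third suggestion would need its own wording in the final description.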
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (Playwright E2E testing) and lists several topic areas (Page Object Model, configuration, CI/CD integration, artifact management, flaky test strategies), but these are more like categories than concrete actions. No verbs describing what the skill actually does (e.g., 'generates test files', 'configures test runners'). | 2 / 3 |
| Completeness | Describes the 'what' at a topic level but completely lacks any 'when should Claude use it' guidance. There is no 'Use when...' clause or equivalent explicit trigger guidance, which per the rubric should cap completeness at 2, and since the 'what' is also weak (topics rather than actions), this scores a 1. | 1 / 3 |
| Trigger Term Quality | Includes relevant keywords like 'Playwright', 'E2E testing', 'Page Object Model', 'CI/CD', and 'flaky test' that users might naturally mention. However, it misses common variations like 'end-to-end tests', 'browser testing', 'test automation', '.spec.ts', or 'playwright.config'. | 2 / 3 |
| Distinctiveness Conflict Risk | Mentioning 'Playwright' specifically helps distinguish it from generic testing skills, but terms like 'E2E testing', 'CI/CD integration', and 'configuration' are broad enough to overlap with other testing framework skills or CI/CD skills. | 2 / 3 |
| Total | 7 / 12 Passed |
Implementation
57%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides excellent, actionable Playwright code examples covering a wide range of E2E testing concerns. However, it suffers from being a monolithic document that tries to cover too many topics in one file without progressive disclosure or clear workflow sequencing. Trimming domain-specific sections (Web3, financial flows) into separate files and adding an explicit workflow with validation steps would significantly improve it.
Suggestions
- Split domain-specific sections (Wallet/Web3 Testing, Financial Flow Testing) and reference material (CI/CD config, artifact management) into separate linked files, keeping SKILL.md as a concise overview with navigation.
- Add an explicit numbered workflow for setting up and running an E2E test suite (e.g., 1. Configure playwright.config.ts, 2. Create page objects, 3. Write tests, 4. Run and validate, 5. Review artifacts), with validation checkpoints.
- Remove the Test Report Template section; Claude can generate report templates on demand without needing this boilerplate in the skill.
- Trim the bad/good pattern comparisons in the flaky test section; Claude already understands auto-waiting vs arbitrary timeouts, so a brief note with the 'good' pattern only would suffice.
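For the 'Create page objects' step of the suggested workflow, a page object might look roughly like the sketch below. It is not taken from the reviewed skill: `LoginPage`, the `/login` route, and the selectors are hypothetical placeholders, and `PageLike` is an assumed minimal subset of Playwright's `Page` (`goto`, `fill`, `click`) so that the example runs without a browser.

```typescript
// Assumed minimal subset of Playwright's Page; a real suite would pass the
// `page` fixture from @playwright/test, which provides these methods.
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Hypothetical page object: route and selectors are placeholders.
class LoginPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // Navigates to the login route.
  async open(): Promise<void> {
    await this.page.goto("/login");
  }

  // Fills both credential fields, then submits the form.
  async signIn(email: string, password: string): Promise<void> {
    await this.page.fill("#email", email);
    await this.page.fill("#password", password);
    await this.page.click("button[type=submit]");
  }
}

// Recording fake, used only to show the call sequence without a browser.
class FakePage implements PageLike {
  readonly calls: string[] = [];

  async goto(url: string): Promise<unknown> {
    this.calls.push(`goto ${url}`);
    return null;
  }

  async fill(selector: string, value: string): Promise<void> {
    this.calls.push(`fill ${selector}=${value}`);
  }

  async click(selector: string): Promise<void> {
    this.calls.push(`click ${selector}`);
  }
}
```

Typing the constructor against a small interface rather than the full `Page` is a design choice for this sketch: it keeps selectors in one class, as the Page Object Model intends, and lets the call order be verified with a fake.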
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly efficient with code examples doing most of the talking, but it's quite long and includes sections like Wallet/Web3 testing and Financial/Critical Flow testing that are domain-specific additions adding bulk. The 'Test Report Template' section is boilerplate Claude could generate. Some patterns (like the bad/good comparisons in flaky tests) are things Claude already knows. | 2 / 3 |
| Actionability | Nearly all guidance is provided as fully executable, copy-paste-ready TypeScript code, bash commands, and YAML configurations. The POM example, config, CI workflow, and flaky test patterns are all concrete and immediately usable. | 3 / 3 |
| Workflow Clarity | The skill presents many patterns but lacks explicit multi-step workflows with validation checkpoints. There's no clear sequence like 'first set up config, then create page objects, then write tests, then validate.' The flaky test section has good diagnostic steps but no feedback loop for fixing and re-verifying. | 2 / 3 |
| Progressive Disclosure | This is a monolithic wall of content (~250+ lines) with no references to external files. Sections like Wallet/Web3 testing, Financial Flow testing, artifact management, and CI/CD could easily be split into separate reference files. There are no navigation links or cross-references. | 1 / 3 |
| Total | 8 / 12 Passed |
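The configuration, CI/CD, and artifact-management material that the table recommends splitting out usually centres on a handful of Playwright options. The sketch below shows one plausible set, not the reviewed skill's actual config. `buildConfig` is a hypothetical helper, and `./e2e` and the retry count of 2 are illustrative assumptions; the option names and string values are standard Playwright Test settings.

```typescript
// Hypothetical helper returning an object in the shape playwright.config.ts expects.
function buildConfig(isCI: boolean) {
  return {
    testDir: "./e2e",                   // assumed test directory
    forbidOnly: isCI,                   // fail a CI run if a test.only was committed
    retries: isCI ? 2 : 0,              // retry only on CI (count is an assumption)
    reporter: isCI ? "github" : "html", // annotations on CI, HTML report locally
    use: {
      trace: "on-first-retry",          // record a trace when a test is first retried
      screenshot: "only-on-failure",    // keep screenshots for failed tests only
      video: "retain-on-failure",       // discard videos of passing tests
    },
  };
}

// In a real playwright.config.ts these options would normally be passed inline
// to defineConfig from @playwright/test, with isCI derived from the CI
// environment variable.
```

Collecting traces, screenshots, and videos only on retry or failure keeps CI artifacts small while still leaving evidence for diagnosing flaky tests.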
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
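A check like `frontmatter_unknown_keys` can be approximated locally before re-running the review. The sketch below is an assumption-laden illustration, not the validator's implementation: the allow-list is hypothetical (the real set of permitted keys is defined by the skill spec), and the parser handles only simple top-level `key: value` lines rather than full YAML.

```typescript
// Hypothetical allow-list for illustration only; consult the actual skill spec.
const allowedKeys = new Set(["name", "description", "license", "metadata"]);

// Returns top-level frontmatter keys that are not in the allow-list.
// Not a YAML parser: it only recognises unindented `key:` lines.
function unknownFrontmatterKeys(markdown: string, allowed: Set<string>): string[] {
  // Capture the text between the opening and closing `---` fences.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return [];

  const keys: string[] = [];
  for (const line of match[1].split(/\r?\n/)) {
    // Indented lines (nested values) do not match and are skipped.
    const key = /^([A-Za-z0-9_-]+):/.exec(line);
    if (key) keys.push(key[1]);
  }
  return keys.filter((k) => !allowed.has(k));
}
```

Any key this reports could then be removed or moved under `metadata`, as the warning suggests.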
5df943e