Playwright E2E testing patterns, Page Object Model, configuration, CI/CD integration, artifact management, and flaky test strategies.
60%
Does it follow best practices?
Impact: Pending. No eval scenarios have been run.
Passed: No known issues.
Quality
Discovery
32%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies the domain (Playwright E2E testing) and lists relevant topic areas, but reads more like a table of contents than an actionable skill description. It lacks concrete action verbs describing what the skill does and entirely omits a 'Use when...' clause, making it difficult for Claude to know when to select this skill over others.
Suggestions
Add a 'Use when...' clause with explicit triggers, e.g., 'Use when the user asks about writing Playwright tests, debugging flaky E2E tests, setting up Page Object Models, or configuring Playwright for CI/CD pipelines.'
Replace topic nouns with concrete action phrases, e.g., 'Guides writing Playwright E2E tests using Page Object Model patterns, configures Playwright for CI/CD pipelines, manages test artifacts, and diagnoses flaky test failures.'
Include common user-facing trigger terms and variations such as 'end-to-end tests', 'browser testing', 'test automation', 'spec.ts', and 'playwright.config' to improve keyword coverage.
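The three suggestions above can be turned into a quick self-check. The following is a minimal sketch, not part of the review tooling: `checkDescription`, `ACTION_VERBS`, and `TRIGGER_TERMS` are hypothetical names, and the word lists are illustrative samples taken from the suggestions rather than from the scoring rubric.

```typescript
// Hypothetical lint for a skill description; the checks mirror the three suggestions above.
interface DescriptionReport {
  hasUseWhenClause: boolean;
  actionVerbsFound: string[];
  missingTriggerTerms: string[];
}

// Illustrative lists only (assumptions); neither comes from the review rubric.
const ACTION_VERBS = ["guides", "configures", "manages", "diagnoses", "creates", "debugs"];
const TRIGGER_TERMS = [
  "end-to-end tests",
  "browser testing",
  "test automation",
  "spec.ts",
  "playwright.config",
];

function checkDescription(description: string): DescriptionReport {
  const text = description.toLowerCase();
  return {
    // Suggestion 1: an explicit "Use when..." clause.
    hasUseWhenClause: text.includes("use when"),
    // Suggestion 2: concrete action verbs, matched as whole words.
    actionVerbsFound: ACTION_VERBS.filter((verb) => new RegExp(`\\b${verb}\\b`).test(text)),
    // Suggestion 3: user-facing trigger terms that are still absent.
    missingTriggerTerms: TRIGGER_TERMS.filter((term) => !text.includes(term)),
  };
}

// The description currently under review.
const currentDescription =
  "Playwright E2E testing patterns, Page Object Model, configuration, CI/CD integration, artifact management, and flaky test strategies.";

console.log(checkDescription(currentDescription));
```

Run against the current description, the report shows no 'Use when' clause, no matching action verbs, and every sample trigger term missing, which is consistent with the reasoning in the table below.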
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (Playwright E2E testing) and lists several areas like Page Object Model, configuration, CI/CD integration, artifact management, and flaky test strategies, but these are topic areas rather than concrete actions (no verbs like 'create', 'configure', 'debug'). | 2 / 3 |
| Completeness | Describes what the skill covers (Playwright testing topics) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' portion is also weak (topic listing rather than clear capability description), warranting a 1. | 1 / 3 |
| Trigger Term Quality | Includes relevant keywords like 'Playwright', 'E2E testing', 'Page Object Model', 'CI/CD', and 'flaky test' that users might naturally use, but misses common variations like 'end-to-end tests', 'browser testing', 'test automation', 'Playwright config', or file extensions like 'spec.ts'. | 2 / 3 |
| Distinctiveness / Conflict Risk | Mentioning 'Playwright' specifically helps distinguish it from generic testing skills, but terms like 'E2E testing', 'CI/CD integration', and 'configuration' are broad enough to overlap with other testing framework skills (e.g., Cypress, Selenium) or CI/CD-focused skills. | 2 / 3 |
| Total | 7 / 12 Passed | |
Implementation
57%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill excels at actionability with high-quality, executable Playwright code examples covering POM, configuration, flaky tests, CI/CD, and artifact management. However, it suffers from being a monolithic document that tries to cover too many topics inline without progressive disclosure or clear cross-file references. The lack of an overarching workflow sequence and the inclusion of domain-specific sections (Web3, financial flows) bloat the content unnecessarily.
Suggestions
Split domain-specific sections (Wallet/Web3 testing, Financial/Critical Flow testing) and detailed reference content (artifact management, CI/CD) into separate linked files, keeping SKILL.md as a concise overview with navigation.
Add a brief workflow overview at the top showing the recommended sequence: configure → create page objects → write tests → run & debug → CI integration, to tie all sections together.
Remove the Test Report Template section — it's a static markdown template that doesn't provide actionable guidance and could be a separate file if needed.
Add a validation step or feedback loop for the overall test development process (e.g., 'run tests locally before pushing, check trace on failure, fix and re-run').
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient with good code examples, but it's quite long (~250 lines) and includes sections like Wallet/Web3 testing and Financial/Critical Flow testing that are domain-specific and could be split into separate files. The test report template section is also verbose filler that Claude doesn't need inline. | 2 / 3 |
| Actionability | Every section provides fully executable, copy-paste ready TypeScript code, bash commands, and YAML configurations. The POM example, config, CI/CD workflow, and flaky test patterns are all concrete and immediately usable. | 3 / 3 |
| Workflow Clarity | Individual patterns are clear, but there's no overarching workflow sequence tying them together (e.g., 'set up config → write POM → write tests → run → analyze artifacts'). The flaky test section has good before/after patterns but lacks an explicit validation/feedback loop for the overall test development process. | 2 / 3 |
| Progressive Disclosure | This is a monolithic wall of content with no references to external files. The Wallet/Web3 testing, Financial Flow testing, artifact management, and CI/CD sections could easily be split into separate referenced documents. Everything is inline with no navigation structure beyond flat headings. | 1 / 3 |
| Total | 8 / 12 Passed | |
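The Total rows in the Discovery and Implementation tables are plain sums of the per-dimension scores. As a small worked check, the sketch below adds them up; `DimensionScore` and `totalScore` are hypothetical names introduced for illustration. It does not attempt to reproduce the headline percentages, since the page does not explain how those are derived from the totals.

```typescript
// Hypothetical helper that sums per-dimension rubric scores into a section total.
type DimensionScore = { dimension: string; score: number; max: number };

function totalScore(scores: DimensionScore[]): { score: number; max: number } {
  return scores.reduce(
    (acc, s) => ({ score: acc.score + s.score, max: acc.max + s.max }),
    { score: 0, max: 0 },
  );
}

// Scores copied from the Discovery table.
const discovery: DimensionScore[] = [
  { dimension: "Specificity", score: 2, max: 3 },
  { dimension: "Completeness", score: 1, max: 3 },
  { dimension: "Trigger Term Quality", score: 2, max: 3 },
  { dimension: "Distinctiveness / Conflict Risk", score: 2, max: 3 },
];

// Scores copied from the Implementation table.
const implementation: DimensionScore[] = [
  { dimension: "Conciseness", score: 2, max: 3 },
  { dimension: "Actionability", score: 3, max: 3 },
  { dimension: "Workflow Clarity", score: 2, max: 3 },
  { dimension: "Progressive Disclosure", score: 1, max: 3 },
];

const d = totalScore(discovery);
const i = totalScore(implementation);
console.log(`Discovery: ${d.score} / ${d.max}`); // Discovery: 7 / 12
console.log(`Implementation: ${i.score} / ${i.max}`); // Implementation: 8 / 12
```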
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |