Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
90
71%
Does it follow best practices?
Impact
100%
2.17x average score across 7 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/anthropic-webapp-testing/SKILL.md
Quality
Discovery
57%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description adequately identifies its domain (Playwright-based local web app testing) and lists several capabilities, making it reasonably distinctive. However, it lacks an explicit 'Use when...' clause and could benefit from more natural trigger terms that users would actually say when needing this skill. Adding explicit trigger guidance and more user-facing keywords would significantly improve skill selection accuracy.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user asks to test a web app, open a browser, take a screenshot of a page, check UI elements, or debug frontend issues on localhost.'
Include more natural trigger terms users would say, such as 'e2e testing', 'end-to-end', 'localhost', 'open browser', 'click button', 'web page', 'check element', '.html'.
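The two suggestions above can be checked mechanically. The following is a minimal sketch, not part of Tessl or of the skill: `audit_description` and `SUGGESTED_TERMS` are hypothetical names, and the check is a plain case-insensitive substring match rather than whatever the reviewer's rubric actually does.

```python
# Hypothetical helper for illustration only; not a Tessl or skill API.
SUGGESTED_TERMS = [
    "e2e testing", "end-to-end", "localhost", "open browser",
    "click button", "web page", "check element", ".html",
]


def audit_description(description, terms=SUGGESTED_TERMS):
    """Report whether a description has a 'Use when' clause and which terms it lacks."""
    lowered = description.lower()
    return {
        "has_use_when": "use when" in lowered,
        "missing_terms": [t for t in terms if t.lower() not in lowered],
    }


# The description as currently published.
current = (
    "Toolkit for interacting with and testing local web applications using "
    "Playwright. Supports verifying frontend functionality, debugging UI "
    "behavior, capturing browser screenshots, and viewing browser logs."
)

# The same description with the reviewer's example 'Use when...' clause appended.
revised = current + (
    " Use when the user asks to test a web app, open a browser, take a "
    "screenshot of a page, check UI elements, or debug frontend issues on localhost."
)

current_report = audit_description(current)
revised_report = audit_description(revised)
print(current_report)
print(revised_report)
```

Because the match is literal, a phrase such as "open a browser" does not count as the term "open browser"; a real selector would likely be more forgiving.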
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (local web applications, Playwright) and several actions (verifying frontend functionality, debugging UI behavior, capturing screenshots, viewing browser logs), but the actions are somewhat general rather than highly concrete operations. | 2 / 3 |
| Completeness | Clearly describes what the skill does (interacting with and testing local web apps via Playwright, screenshots, logs), but lacks an explicit 'Use when...' clause specifying when Claude should select this skill. Per rubric guidelines, missing 'Use when' caps completeness at 2. | 2 / 3 |
| Trigger Term Quality | Includes useful terms like 'Playwright', 'browser screenshots', 'browser logs', 'frontend', and 'UI behavior', but misses common user phrases like 'test my app', 'open browser', 'click button', 'e2e testing', 'end-to-end', 'localhost', or 'web page'. | 2 / 3 |
| Distinctiveness Conflict Risk | The combination of 'Playwright', 'local web applications', 'browser screenshots', and 'browser logs' creates a clear niche that is unlikely to conflict with other skills. This is distinctly about browser-based testing of local apps. | 3 / 3 |
| Total | | 9 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill that provides clear, actionable guidance for web application testing with Playwright. The decision tree is an effective way to guide approach selection, and the executable examples are concrete and useful. Minor verbosity in repeated advice about black-box script usage and some obvious best practices slightly reduce token efficiency.
Suggestions
Remove the redundant 'use bundled scripts as black boxes' bullet from Best Practices since it's already emphasized in the intro paragraph.
Trim Best Practices items that Claude already knows (e.g., 'Always close the browser when done', 'Use sync_playwright() for synchronous scripts') to improve conciseness.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Mostly efficient but has some redundancy: the 'use scripts as black boxes' advice is repeated in the intro and Best Practices, and the Best Practices section includes some items Claude already knows (close browser, use sync_playwright). The decision tree and examples are well-structured though. | 2 / 3 |
| Actionability | Provides fully executable bash commands for with_server.py, complete Python Playwright code snippets that are copy-paste ready, and concrete examples for both single and multi-server setups. The reconnaissance pattern includes specific method calls. | 3 / 3 |
| Workflow Clarity | The decision tree clearly sequences the approach based on context (static vs dynamic, server running vs not). The reconnaissance-then-action pattern is a clear 3-step workflow. The common pitfall section serves as a validation checkpoint for the critical networkidle wait. For a non-destructive testing skill, this level of workflow clarity is appropriate. | 3 / 3 |
| Progressive Disclosure | The skill provides a concise overview with clear references to examples/ directory listing specific files and their purposes. Helper scripts are referenced but not inlined. Content is well-organized into logical sections with appropriate depth for a SKILL.md. | 3 / 3 |
| Total | | 11 / 12 Passed |
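For readers who have not seen the skill itself, the reconnaissance-then-action pattern and the networkidle wait mentioned in the table might look roughly like the sketch below. It is not the skill's own snippet, which this review does not reproduce. The function name `reconnoiter`, the screenshot path, and the localhost port are assumptions for illustration; the Playwright calls are from its synchronous Python API.

```python
def reconnoiter(url, screenshot_path="/tmp/recon.png"):
    """Load a page, wait for the network to go idle, then report what is on it."""
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    console_lines = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Collect browser console output as it is emitted.
            page.on(
                "console",
                lambda msg: console_lines.append(f"{msg.type}: {msg.text}"),
            )
            page.goto(url)
            # Wait for dynamic content to finish loading before inspecting the DOM.
            page.wait_for_load_state("networkidle")
            # Reconnaissance: capture a screenshot and list the visible button labels.
            page.screenshot(path=screenshot_path, full_page=True)
            button_labels = [b.inner_text() for b in page.locator("button").all()]
        finally:
            # Close the browser even if a step above raises.
            browser.close()
    return {
        "buttons": button_labels,
        "console": console_lines,
        "screenshot": screenshot_path,
    }


if __name__ == "__main__":
    # Hypothetical local dev server address; adjust to the app under test.
    print(reconnoiter("http://localhost:5173"))
```

The action step would then use selectors discovered from the returned button labels or the screenshot, rather than guessing them up front.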
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
431bfad