Agent-native E2E runtime with verifiable safety. 16 MCP tools, including alethia_propose_tests (the agent generates tests from a URL), alethia_assert_safety (proves destructive actions are blocked), and the expect block, an NLP primitive unique to Alethia. Zero-IPC; 2-5x faster than Playwright MCP per flow; signed evidence packs. Works with Claude Code, Cursor, and Cline.
95
94%
Does it follow best practices?
Impact
98%
2.80x Average score across 5 eval scenarios
Advisory
Suggest reviewing before use
Quality
Discovery
100% Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly communicates both capabilities and trigger conditions. It opens with an explicit 'Use when...' clause covering diverse user intents, lists concrete actions and specific output types, and occupies a distinct niche combining browser automation with safety auditing. The description is well-structured, concise, and uses appropriate third-person voice throughout.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: run E2E tests, verify a web page, generate tests, prove destructive actions are blocked, check UI element visibility, fill out a form, drive a browser. Also specifies concrete outputs: per-step results, safety classifications, policy decisions, DOM diffs, structured page context, signed audit trail. | 3 / 3 |
| Completeness | Clearly answers both 'what' (runs E2E tests, verifies web pages, generates tests, checks UI elements, fills forms, drives browser; returns step results with safety classifications, DOM diffs, audit trail) and 'when' (explicit 'Use when...' clause listing multiple trigger scenarios). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'E2E tests', 'web page', 'generate tests', 'UI element', 'fill out a form', 'drive a browser', 'natural language'. These are terms users would naturally use when requesting browser automation or end-to-end testing. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive with a clear niche: browser-driven E2E testing with safety classifications and audit trails. The combination of browser automation, testing, safety/policy decisions, and DOM diffs creates a unique profile unlikely to conflict with generic testing or web scraping skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85% Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted skill that provides highly actionable, clearly structured guidance for driving the Alethia MCP server. The three primary workflows are well-sequenced with validation checkpoints, the NLP phrasing guide gives exact syntax, and structured response fields are documented for agent self-repair. Minor verbosity in the trigger terms list and a few explanatory asides prevent a perfect conciseness score, but overall token efficiency is good.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient and avoids explaining basic concepts, but some sections are slightly verbose: for example, the 'Use when' trigger terms list is long, and some explanatory prose (like 'This is the primitive that makes Alethia a verifiable-safety framework, not just a test runner') could be trimmed. The safety classifications table and reason codes are well-structured and earn their tokens. | 2 / 3 |
| Actionability | The skill provides concrete, copy-paste-ready NLP phrasings, executable JSON config, specific tool call sequences, and structured response field names. Every workflow has specific tool names and clear expected outputs. The NLP phrasing guide is particularly actionable with exact syntax examples. | 3 / 3 |
| Workflow Clarity | Three distinct workflows (A, B, C) are clearly sequenced with numbered steps. Workflow A includes a self-repair loop (read suggestedFix, retry). Workflow B has explicit validation (passed: true/false) with clear guidance on what to do when validation fails. The status check as step 1 serves as a liveness checkpoint. | 3 / 3 |
| Progressive Disclosure | The skill provides a concise overview with clear one-level-deep references to docs/index.md and rules/alethia.md. Content is well-organized into logical sections (workflows, NLP guide, response fields, safety, limitations) without being monolithic. The inline content is appropriately scoped for what an agent needs during execution. | 3 / 3 |
| Total | | 11 / 12 Passed |
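The self-repair loop credited under Workflow Clarity (read `suggestedFix`, retry) might look roughly like the sketch below. Only the field names `passed` and `suggestedFix` come from the review; the `run_step` callable, the retry limit, the idea that `suggestedFix` holds a replacement instruction, and the example phrasings are all assumptions made for illustration, and `fake_run_step` is a stand-in rather than a real Alethia MCP call.

```python
from typing import Callable

# Hypothetical result shape: only `passed` and `suggestedFix` are named in the review.
StepResult = dict


def run_with_self_repair(
    run_step: Callable[[str], StepResult],
    instruction: str,
    max_attempts: int = 3,
) -> StepResult:
    """Run an instruction, retrying with the suggested fix when one is returned."""
    result: StepResult = {"passed": False}
    for _ in range(max_attempts):
        result = run_step(instruction)
        if result.get("passed"):
            return result
        fix = result.get("suggestedFix")
        if not fix:
            break  # no suggestion to retry with, so report the failure
        instruction = fix  # assumption: the fix is a corrected instruction
    return result


def fake_run_step(instruction: str) -> StepResult:
    """Stand-in for a real tool call; passes only for one exact phrasing."""
    if instruction == "click the Submit button":
        return {"passed": True}
    return {"passed": False, "suggestedFix": "click the Submit button"}


if __name__ == "__main__":
    print(run_with_self_repair(fake_run_step, "press submit"))
```

Capping the number of attempts keeps an agent from looping indefinitely when the suggested fix itself keeps failing.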
Validation
100% Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
Reviewed