
vitron-ai/alethia

Agent-native E2E runtime with verifiable safety. Provides 13 MCP tools, including alethia_propose_tests (the agent generates tests from a URL) and alethia_assert_safety (proves destructive actions are blocked), plus the expect block, an NLP primitive unique to Alethia. Zero-IPC, ~45x faster than Playwright, signed evidence packs. Works with Claude Code, Cursor, and Cline.

95

Quality: 94% (Does it follow best practices?)

Impact: 97%, 2.77x (Average score across 5 eval scenarios)

Security (by Snyk): Advisory. Suggest reviewing before use.


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that excels across all dimensions. It opens with an explicit 'Use when...' clause containing diverse, natural trigger terms, then clearly describes the outputs. The combination of browser automation, E2E testing, and safety/audit capabilities creates a distinctive and well-defined niche.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: run E2E tests, verify a web page, generate tests, prove destructive actions are blocked, check UI element visibility, fill out a form, drive a browser. Also specifies concrete outputs: per-step results, safety classifications, policy decisions, DOM diffs, structured page context, signed audit trail. | 3 / 3 |
| Completeness | Clearly answers both 'what' (runs E2E tests, verifies web pages, generates tests, checks UI elements, fills forms, drives browser; returns structured results with audit trail) and 'when' (explicit 'Use when...' clause listing multiple trigger scenarios). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'E2E tests', 'web page', 'generate tests', 'UI element', 'fill out a form', 'drive a browser', 'natural language'. These are terms users would naturally use when requesting browser automation or end-to-end testing. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive niche combining browser automation, E2E testing, and safety/audit features. The specific mention of DOM diffs, safety classifications, policy decisions, and signed audit trails makes this clearly distinguishable from generic testing or web scraping skills. | 3 / 3 |
| Total | | 12 / 12 (Passed) |

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, highly actionable skill that clearly defines three primary workflows with proper validation steps and self-repair loops. The NLP phrasing guide with concrete examples and the structured response field documentation are particularly strong. Minor verbosity in some explanatory passages and the trigger terms list prevents a perfect conciseness score, but overall the skill is effective and well-organized.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is mostly efficient and avoids explaining basic concepts, but some sections are slightly verbose: for example, the 'Use when' trigger terms list is long, and some explanatory prose (like 'This is the primitive that makes Alethia a verifiable-safety framework, not just a test runner') could be trimmed. The safety classifications table and reason codes are well-structured, but the surrounding prose adds some unnecessary context. | 2 / 3 |
| Actionability | The skill provides concrete, executable NLP phrasing examples, specific MCP tool calls with named functions, JSON config for setup, and structured response field documentation. Every workflow has specific tool names and clear instructions on what to call and what to read from the response. | 3 / 3 |
| Workflow Clarity | Three distinct workflows (A, B, C) are clearly sequenced with numbered steps. Each includes validation checkpoints: checking `alethia_status` for liveness, reading `run.ok`, using `suggestedFix` for self-repair on failure, and surfacing `passed: false` prominently. The feedback loop (fail → read nearMatches/suggestedFix → retry) is explicit. | 3 / 3 |
| Progressive Disclosure | The skill provides a concise overview with clear references to deeper docs (Tile Overview, Rules) that are one level deep. Content is well-organized into logical sections (workflows, NLP guide, response fields, safety classifications, limitations) without being monolithic, and the main file stays focused on actionable guidance. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
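The feedback loop credited under Workflow Clarity (fail, read `nearMatches`/`suggestedFix`, retry) can be sketched roughly as follows. This is a minimal illustration, not Alethia's actual client code: the response is assumed to be a flat dictionary with `ok`, `nearMatches`, and `suggestedFix` keys, and both `run_with_self_repair` and `fake_runner` are hypothetical names standing in for whatever MCP tool call an agent really makes.

```python
from typing import Any, Callable, Dict


def run_with_self_repair(
    run_steps: Callable[[str], Dict[str, Any]],
    steps: str,
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Run steps; on failure, retry with the response's suggestedFix.

    `run_steps` is a hypothetical stand-in for the real tool call.
    The response keys used here ("ok", "suggestedFix") are assumed
    from the field names mentioned in the review, not a documented schema.
    """
    result: Dict[str, Any] = {}
    for attempt in range(1, max_attempts + 1):
        result = run_steps(steps)
        result["attempts"] = attempt
        if result.get("ok"):
            return result
        fix = result.get("suggestedFix")
        if not fix:
            break  # no repair hint available, so surface the failure as-is
        steps = fix  # retry with the suggested phrasing
    return result


def fake_runner(steps: str) -> Dict[str, Any]:
    """Hypothetical runner: only the exact phrase 'click Submit' succeeds."""
    if steps == "click Submit":
        return {"ok": True}
    return {
        "ok": False,
        "nearMatches": ["Submit"],
        "suggestedFix": "click Submit",
    }


repaired = run_with_self_repair(fake_runner, "click Sumbit")
unrepairable = run_with_self_repair(lambda s: {"ok": False}, "click Nothing")
```

In this sketch the misspelled step fails once, the suggested fix is substituted, and the second attempt passes; a failure with no `suggestedFix` stops immediately so the caller can report it prominently rather than looping.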

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.
