
vitron-ai/alethia

Agent-native E2E runtime with verifiable safety. Provides 13 MCP tools, including alethia_propose_tests (the agent generates tests from a URL) and alethia_assert_safety (proves destructive actions are blocked), plus the expect block, an NLP primitive unique to Alethia. Zero-IPC, ~45x faster than Playwright, with signed evidence packs. Works with Claude Code, Cursor, and Cline.

95

2.77x
Quality: 94% (Does it follow best practices?)

Impact: 97%, 2.77x (average score across 5 eval scenarios)

Security by Snyk: Advisory. Suggest reviewing before use.


evals/scenario-5/criteria.json

{
  "context": "Tests whether the agent follows the correct three-step Alethia workflow: calling alethia_status first to check runtime health, then alethia_compile to preview the Action IR, then alethia_tell to execute. Also checks that the agent knows to inspect PlanRun response fields and includes a name parameter in alethia_tell.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "alethia_status first",
      "description": "The first tool call in test_script.json is alethia_status (before alethia_compile and alethia_tell)",
      "max_score": 15
    },
    {
      "name": "alethia_compile second",
      "description": "alethia_compile appears as a step before alethia_tell in test_script.json",
      "max_score": 15
    },
    {
      "name": "alethia_tell present",
      "description": "alethia_tell appears as a step in test_script.json",
      "max_score": 10
    },
    {
      "name": "name parameter in alethia_tell",
      "description": "The alethia_tell step in test_script.json includes a 'name' parameter",
      "max_score": 10
    },
    {
      "name": "run.ok check documented",
      "description": "test_plan.md or test_script.json mentions checking run.ok from the PlanRun response",
      "max_score": 10
    },
    {
      "name": "stepRuns inspection documented",
      "description": "test_plan.md mentions inspecting stepRuns (per-step results) from the PlanRun response",
      "max_score": 10
    },
    {
      "name": "policyAudits documented",
      "description": "test_plan.md mentions reviewing policyAudits from the PlanRun response",
      "max_score": 10
    },
    {
      "name": "kill switch check documented",
      "description": "test_plan.md or test_script.json describes verifying the kill switch is inactive as part of the status check",
      "max_score": 10
    },
    {
      "name": "compile purpose documented",
      "description": "test_plan.md explains that alethia_compile is used to catch NLP issues before execution (not just as an optional step)",
      "max_score": 10
    }
  ]
}
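
The script-level items in the checklist above (tool ordering, presence of alethia_tell, and its name parameter) can be checked mechanically. The following is a minimal sketch, not the evaluator Tessl or Alethia actually uses. It assumes a hypothetical shape for test_script.json: a list of steps, each with a `tool` name and a `params` object. The `instructions` parameter and the `login-smoke` name are likewise illustrative assumptions. Only the tool names and the weights come from the criteria file.

```python
import json

# Hypothetical test_script.json content: a list of steps, each naming the
# MCP tool it calls and the parameters it passes. The real format may differ.
SCRIPT = json.loads("""
[
  {"tool": "alethia_status", "params": {}},
  {"tool": "alethia_compile", "params": {"instructions": "open the login page"}},
  {"tool": "alethia_tell",
   "params": {"name": "login-smoke", "instructions": "open the login page"}}
]
""")


def score_script(steps):
    """Score the four script-level checks from the checklist (max 50 points)."""
    tools = [step.get("tool") for step in steps]
    tell_at = tools.index("alethia_tell") if "alethia_tell" in tools else None
    tell_params = steps[tell_at].get("params", {}) if tell_at is not None else {}

    # Each entry maps a checklist name to (max_score, passed).
    checks = {
        # alethia_status must be the very first tool call.
        "alethia_status first": (15, bool(tools) and tools[0] == "alethia_status"),
        # alethia_compile must appear somewhere before alethia_tell.
        "alethia_compile second": (
            15,
            tell_at is not None and "alethia_compile" in tools[:tell_at],
        ),
        "alethia_tell present": (10, tell_at is not None),
        "name parameter in alethia_tell": (10, "name" in tell_params),
    }
    earned = sum(weight for weight, passed in checks.values() if passed)
    return earned, {name: passed for name, (_, passed) in checks.items()}


earned, results = score_script(SCRIPT)
print(earned, results)
```

The remaining five items (run.ok, stepRuns, policyAudits, the kill switch check, and the purpose of alethia_compile) concern what test_plan.md documents, so they need a text review rather than a structural check like this one.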
