jbaruch/auto-skill-discovery

Automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs with-skill agent). A1 MVP cell of the produce/consume × personalization 2x2.

88 · 1.45x
Quality: 86% (Does it follow best practices?)
Impact: 89%, 1.45x (average score across 13 eval scenarios)
Security by Snyk: Advisory (suggest reviewing before use)


evals/scenario-4/criteria.json

{
  "context": "selection.json has selection_status='skipped' with selected_target_id=null. The build-and-evaluate skill must read the selection, surface the reviewer's rationale, and finish at Step 1 without scaffolding a skill, generating scenarios, running review, or invoking any eval command. Baseline agents tend to march through the workflow regardless of the gating status.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "No generated-skill scaffold",
      "description": "No generated-skill/ directory exists in the working directory. The agent did NOT invoke 'tessl skill new'.",
      "max_score": 20
    },
    {
      "name": "No tessl scenario generate",
      "description": "No scenarios.json file or evals/ scaffolding for the generated skill exists. The agent did NOT invoke 'tessl scenario generate'.",
      "max_score": 15
    },
    {
      "name": "No tessl skill review invocation",
      "description": "No skill-review.json artifact exists. The agent did NOT invoke 'tessl skill review --threshold 85' on a non-existent scaffolded skill.",
      "max_score": 12
    },
    {
      "name": "No eval execution",
      "description": "Neither baseline-results.json nor with-skill-results.json exists. The agent did NOT invoke 'tessl eval run'.",
      "max_score": 12
    },
    {
      "name": "selection_status surfaced in run-log.md",
      "description": "run-log.md explicitly states the selection_status value ('skipped') from selection.json.",
      "max_score": 10
    },
    {
      "name": "selection_rationale surfaced",
      "description": "run-log.md includes the selection_rationale text from selection.json (or a close paraphrase covering 'partner-gated', 'confidence 0.55', or 'clarifying answers').",
      "max_score": 13
    },
    {
      "name": "No fabricated lift numbers",
      "description": "run-log.md does NOT contain a per-scenario lift table, an aggregate lift number, or any specific score values — there were no eval runs to report on, and the agent did not invent any.",
      "max_score": 10
    },
    {
      "name": "Run-log.md exists and explains the halt",
      "description": "run-log.md is non-empty and clearly explains why the pipeline halted at the load-selection step rather than proceeding to scaffold.",
      "max_score": 8
    }
  ]
}
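
The gating behaviour this checklist rewards can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the file names (selection.json, run-log.md) and field names (selection_status, selected_target_id, selection_rationale) come from the criteria above, while the function name load_selection_gate, the exact halt condition, and the run-log layout are assumptions made for the example.

```python
import json
from pathlib import Path


def load_selection_gate(workdir: Path) -> bool:
    """Return True if the pipeline may proceed past Step 1, False if it halted.

    Hypothetical helper: the halt condition and the log layout are assumptions.
    """
    selection = json.loads((workdir / "selection.json").read_text(encoding="utf-8"))
    status = selection.get("selection_status")
    target_id = selection.get("selected_target_id")

    # Assumed gate: stop when the reviewer skipped selection or no target was chosen.
    if status == "skipped" or target_id is None:
        rationale = selection.get("selection_rationale", "(no rationale provided)")
        (workdir / "run-log.md").write_text(
            "# Run log\n\n"
            f"- selection_status: {status}\n"
            f"- selection_rationale: {rationale}\n\n"
            "Halted at Step 1 (load selection). No skill was scaffolded, no "
            "scenarios were generated, and no review or eval commands were run.\n",
            encoding="utf-8",
        )
        return False

    return True
```

A caller would run the scaffold, scenario generation, review, and eval steps only when the function returns True. On a halt the log records just the status and rationale, so it contains no lift table or score values.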
