jbaruch/auto-skill-discovery

Automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs with-skill agent). A1 MVP cell of the produce/consume × personalization 2x2.

88 · 1.45x
Quality: 86% (Does it follow best practices?)
Impact: 89%, 1.45x (average score across 13 eval scenarios)
Security by Snyk: Advisory (suggest reviewing before use)

evals/scenario-10/criteria.json

{
  "context": "Baillie Gifford is an investment management firm with no public engineering blog, no GitHub org of substance, no public talks describing internal engineering, and no published consumer-side tooling for daily engineering work. This tests whether the agent runs the consume-mode probe set, declares verdict=SKIP with a documented search trail, and does not fabricate a low-confidence dogfood target from marketing content.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "verdict SKIP",
      "description": "discovery.json has verdict='SKIP' at the top level. The agent did not produce BUILD with a low-confidence target inflated from marketing-site keywords.",
      "max_score": 20
    },
    {
      "name": "skip_reason names consumer-side probes",
      "description": "skip_reason enumerates at least four consumer-side probes that were attempted (engineering blog, conference talks, hiring posts, X-on-X patterns, engineering handbook, OSS the company consumes, podcast appearances, internal eval harnesses). Generic phrasing without a probe trail is insufficient.",
      "max_score": 20
    },
    {
      "name": "search_attempted populated",
      "description": "discovery.json contains a search_attempted[] array with at least four distinct probe types named.",
      "max_score": 12
    },
    {
      "name": "would_change_verdict_if populated",
      "description": "discovery.json contains a non-empty would_change_verdict_if field describing what observable change would flip the verdict.",
      "max_score": 8
    },
    {
      "name": "no INTEGRATE/external inflation",
      "description": "discovery.json does NOT contain a skill_targets entry with task_shape='INTEGRATE' and size_class='external' presented as the primary BUILD candidate. The skill did not relabel an external-builder candidate as a consumer-side target to clear the BUILD floor.",
      "max_score": 18
    },
    {
      "name": "sources present",
      "description": "discovery.json contains a sources[] array with at least one entry, confirming that probing was performed before declaring SKIP.",
      "max_score": 10
    },
    {
      "name": "intake-notes.md exists",
      "description": "A file named intake-notes.md exists in the working directory with at least one paragraph explaining the SKIP reasoning.",
      "max_score": 8
    },
    {
      "name": "company field present",
      "description": "discovery.json contains a company object with name='Baillie Gifford' (or the canonical equivalent).",
      "max_score": 4
    }
  ]
}
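
The checklist above mixes criteria that need judgment (for example, whether skip_reason reads as a real probe trail) with criteria that are purely structural. As a rough sketch only, the structural ones could be pre-checked mechanically before any graded review. Everything below is an assumption for illustration: the function names `structural_checks` and `weighted_score` are hypothetical, the entries of search_attempted[] are assumed to be plain strings, and the real eval harness may score these items differently.

```python
# Weights copied from the checklist above; they sum to 100.
WEIGHTS = {
    "verdict SKIP": 20,
    "skip_reason names consumer-side probes": 20,
    "search_attempted populated": 12,
    "would_change_verdict_if populated": 8,
    "no INTEGRATE/external inflation": 18,
    "sources present": 10,
    "intake-notes.md exists": 8,
    "company field present": 4,
}


def structural_checks(discovery):
    """Return pass/fail for the six criteria that can be checked mechanically.

    Hypothetical helper: skip_reason quality and intake-notes.md are left out
    because they need a reader (or the filesystem), not just the JSON.
    """
    # Assumption: each probe is a plain string naming a probe type.
    probes = discovery.get("search_attempted") or []
    distinct = {str(p).strip().lower() for p in probes}

    # Stricter than the criterion: flags ANY INTEGRATE/external target,
    # not only one presented as the primary BUILD candidate.
    targets = discovery.get("skill_targets") or []
    inflated = any(
        isinstance(t, dict)
        and t.get("task_shape") == "INTEGRATE"
        and t.get("size_class") == "external"
        for t in targets
    )

    company = discovery.get("company")
    return {
        "verdict SKIP": discovery.get("verdict") == "SKIP",
        "search_attempted populated": len(distinct) >= 4,
        "would_change_verdict_if populated": bool(
            discovery.get("would_change_verdict_if")
        ),
        "no INTEGRATE/external inflation": not inflated,
        "sources present": len(discovery.get("sources") or []) >= 1,
        "company field present": isinstance(company, dict)
        and company.get("name") == "Baillie Gifford",
    }


def weighted_score(results):
    """Sum max_score over the criteria that passed."""
    return sum(WEIGHTS[name] for name, ok in results.items() if ok)


# Illustrative input only; the field values are invented for this sketch.
sample = {
    "verdict": "SKIP",
    "company": {"name": "Baillie Gifford"},
    "search_attempted": [
        "engineering blog",
        "conference talks",
        "hiring posts",
        "engineering handbook",
    ],
    "would_change_verdict_if": "A public engineering blog appears.",
    "sources": ["(placeholder source entry)"],
    "skill_targets": [],
}
print(weighted_score(structural_checks(sample)))  # 72 of the 72 points covered here
```

The remaining 28 points (skip_reason and intake-notes.md) are deliberately not scored by this sketch, so a result of 72 means only that the structural floor is met.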
