An automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs. with-skill agent). This is the A1 MVP cell of the produce/consume × personalization 2×2 grid.
88
86%
Does it follow best practices?
Impact
89%
1.45x
Average score across 13 eval scenarios
Advisory
Suggest reviewing before use
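As a rough illustration of how a per-scenario lift figure like the one above might be derived, the sketch below assumes lift is the ratio of the with-skill score to the baseline score, and that the aggregate figure is the ratio of the two mean scores. Neither definition is confirmed by this page; the function name `lift_report` and the sample scores are hypothetical.

```python
from statistics import mean


def lift_report(baseline, with_skill):
    """Return (per-scenario lift, overall lift of mean scores).

    Both arguments map a scenario name to a score in [0, 1].
    Assumption: lift = with-skill score / baseline score.
    """
    per_scenario = {}
    for name, base in baseline.items():
        skilled = with_skill[name]
        # A ratio is undefined when the baseline scored zero, so report None.
        per_scenario[name] = skilled / base if base > 0 else None

    avg_base = mean(baseline.values())
    avg_skill = mean(with_skill.values())
    overall = avg_skill / avg_base if avg_base > 0 else None
    return per_scenario, overall


# Made-up scores, for illustration only (not taken from the eval report).
baseline = {"scenario-1": 0.50, "scenario-2": 0.80}
with_skill = {"scenario-1": 0.75, "scenario-2": 0.80}
per_scenario, overall = lift_report(baseline, with_skill)
```

A ratio-based lift treats a scenario where the baseline already scores well as lift 1.0, which makes it easy to see where the skill actually changes the outcome.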
{
"context": "Baillie Gifford is an investment management firm with no public engineering blog, no GitHub org of substance, no public talks describing internal engineering, and no published consumer-side tooling for daily engineering work. This tests whether the agent runs the consume-mode probe set, declares verdict=SKIP with a documented search trail, and does not fabricate a low-confidence dogfood target from marketing content.",
"type": "weighted_checklist",
"checklist": [
{
"name": "verdict SKIP",
"description": "discovery.json has verdict='SKIP' at the top level. The agent did not produce BUILD with a low-confidence target inflated from marketing-site keywords.",
"max_score": 20
},
{
"name": "skip_reason names consumer-side probes",
"description": "skip_reason enumerates at least four consumer-side probes that were attempted (engineering blog, conference talks, hiring posts, X-on-X patterns, engineering handbook, OSS the company consumes, podcast appearances, internal eval harnesses). Generic phrasing without a probe trail is insufficient.",
"max_score": 20
},
{
"name": "search_attempted populated",
"description": "discovery.json contains a search_attempted[] array with at least four distinct probe types named.",
"max_score": 12
},
{
"name": "would_change_verdict_if populated",
"description": "discovery.json contains a non-empty would_change_verdict_if field describing what observable change would flip the verdict.",
"max_score": 8
},
{
"name": "no INTEGRATE/external inflation",
"description": "discovery.json does NOT contain a skill_targets entry with task_shape='INTEGRATE' and size_class='external' presented as the primary BUILD candidate. The skill did not relabel an external-builder candidate as a consumer-side target to clear the BUILD floor.",
"max_score": 18
},
{
"name": "sources present",
"description": "discovery.json contains a sources[] array with at least one entry, confirming that probing was performed before declaring SKIP.",
"max_score": 10
},
{
"name": "intake-notes.md exists",
"description": "A file named intake-notes.md exists in the working directory with at least one paragraph explaining the SKIP reasoning.",
"max_score": 8
},
{
"name": "company field present",
"description": "discovery.json contains a company object with name='Baillie Gifford' (or the canonical equivalent).",
"max_score": 4
}
]
}
evals: scenario-1 through scenario-13
skills: batch-driver, build-and-evaluate, company-list-filter, discovery, discovery-produce, select-target
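The weighted checklist in the scenario JSON above can be approximated mechanically. The following is a minimal sketch, not the actual grader: it assumes each criterion is all-or-nothing, uses a hypothetical keyword list (`PROBE_KEYWORDS`) to guess whether `skip_reason` names a probe, and treats any INTEGRATE/external entry in `skill_targets` as inflation, which is stricter than the rubric's "presented as the primary BUILD candidate". The function name `score_skip_run` is also hypothetical; the field names and weights come from the checklist.

```python
import json
from pathlib import Path

# Weights copied from the checklist's max_score values (they sum to 100).
WEIGHTS = {
    "verdict SKIP": 20,
    "skip_reason names consumer-side probes": 20,
    "search_attempted populated": 12,
    "would_change_verdict_if populated": 8,
    "no INTEGRATE/external inflation": 18,
    "sources present": 10,
    "intake-notes.md exists": 8,
    "company field present": 4,
}

# Hypothetical keywords standing in for the probes listed in the rubric.
PROBE_KEYWORDS = ["engineering blog", "conference talk", "hiring post",
                  "handbook", "oss", "podcast", "eval harness"]


def score_skip_run(workdir):
    """Score a working directory against the SKIP checklist (all-or-nothing)."""
    workdir = Path(workdir)
    discovery = json.loads((workdir / "discovery.json").read_text(encoding="utf-8"))
    notes = workdir / "intake-notes.md"

    reason = str(discovery.get("skip_reason", "")).lower()
    probes_named = sum(1 for kw in PROBE_KEYWORDS if kw in reason)

    targets = discovery.get("skill_targets") or []
    inflated = any(
        isinstance(t, dict)
        and t.get("task_shape") == "INTEGRATE"
        and t.get("size_class") == "external"
        for t in targets
    )
    attempted = discovery.get("search_attempted") or []
    company = discovery.get("company")

    passed = {
        "verdict SKIP": discovery.get("verdict") == "SKIP",
        "skip_reason names consumer-side probes": probes_named >= 4,
        "search_attempted populated": len(set(map(str, attempted))) >= 4,
        "would_change_verdict_if populated": bool(discovery.get("would_change_verdict_if")),
        "no INTEGRATE/external inflation": not inflated,
        "sources present": len(discovery.get("sources") or []) >= 1,
        "intake-notes.md exists": notes.is_file()
        and bool(notes.read_text(encoding="utf-8").strip()),
        "company field present": isinstance(company, dict)
        and company.get("name") == "Baillie Gifford",
    }
    results = {name: (WEIGHTS[name] if ok else 0) for name, ok in passed.items()}
    return results, sum(results.values())
```

Calling `score_skip_run(".")` in a directory containing `discovery.json` and `intake-notes.md` returns the per-criterion points and the total out of 100. Criteria such as "generic phrasing without a probe trail is insufficient" call for judgment that keyword matching cannot capture, so this is best read as a pre-check rather than a replacement for the eval.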