Automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs with-skill agent). A1 MVP cell of the produce/consume × personalization 2x2.
88
86%
Does it follow best practices?
Impact
89%
1.45xAverage score across 13 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "The exported file contains only blank lines — the deduplication step returns count_out=0. This tests whether the agent recognizes an unusable input and produces an explanatory artifact rather than fabricating a triage report from no data.",
"type": "weighted_checklist",
"checklist": [
{
"name": "triage-skipped.md exists",
"description": "A file named triage-skipped.md exists in the working directory with at least one paragraph of content.",
"max_score": 15
},
{
"name": "Dedup yield reported as zero",
"description": "triage-skipped.md surfaces that the deduplication step yielded zero unique companies (count_out=0 or a clear phrasing referring to an empty result).",
"max_score": 15
},
{
"name": "Input filename named",
"description": "triage-skipped.md names the input file read (inputs/companies.txt or equivalent path).",
"max_score": 10
},
{
"name": "Re-export guidance present",
"description": "triage-skipped.md includes guidance the coworker can act on (e.g., re-export the file, verify the column header, check the source system).",
"max_score": 12
},
{
"name": "No triage-report.md written",
"description": "The file triage-report.md does NOT exist in the working directory. An empty source cannot be triaged into the four-bucket classification.",
"max_score": 20
},
{
"name": "No fabricated four-section report",
"description": "No file in the working directory contains the four MEGA_CORP / SELF_OR_NA / RUN_DISCOVERY / UNKNOWN section headers with synthesized company entries. The agent did not invent companies to fill the buckets.",
"max_score": 18
},
{
"name": "dedup-output.json reflects empty result",
"description": "If dedup-output.json exists, it shows count_in matching the row count of the input file and count_out=0 with unique as an empty array. If it does not exist, this criterion is satisfied by triage-skipped.md citing the equivalent values.",
"max_score": 10
}
]
}evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
skills
batch-driver
build-and-evaluate
company-list-filter
discovery
discovery-produce
select-target