Automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs. the same agent with the skill). It is the A1 MVP cell of the produce/consume × personalization 2x2 matrix.
88
86%
Does it follow best practices?
Impact
89%
1.45x
Average score across 13 eval scenarios
Advisory
Suggest reviewing before use
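The per-scenario lift named in the description can be illustrated with a small sketch. It assumes lift is the with-skill score divided by the baseline score for the same scenario; the page does not state the formula, and the scores below are made up for illustration, not taken from the eval report.

```python
def per_scenario_lift(baseline, with_skill):
    """Map each scenario to with_skill / baseline (assumed definition of lift).

    A scenario whose baseline score is 0 gets None, since the ratio is undefined.
    """
    lifts = {}
    for name, base in baseline.items():
        skilled = with_skill[name]
        lifts[name] = skilled / base if base else None
    return lifts


# Hypothetical scores, for illustration only.
baseline = {"scenario-1": 0.50, "scenario-2": 0.80}
with_skill = {"scenario-1": 0.75, "scenario-2": 0.80}

lifts = per_scenario_lift(baseline, with_skill)
for name, lift in lifts.items():
    print(name, "n/a" if lift is None else f"{lift:.2f}x")
```

A lift of 1.00x means the skill made no measurable difference on that scenario; values above 1.00x mean the with-skill agent scored higher than the baseline.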
The ML platform team has selection results queued at inputs/selection.json (with the linked discovery at inputs/discovery.json) and asked you to run the standard build-and-evaluate pipeline. The selection step ran earlier and the reviewer captured their rationale in the selection file.
Run the pipeline against this selection. Persist a run-log.md documenting which steps you executed, what was produced, and the final state. The team uses this log to audit pipeline runs without scrubbing through CLI history.
Produce run-log.md in the working directory. The log should make clear what was run (or what was skipped, and why) and the disposition of every artifact the pipeline would normally produce.
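One possible shape for such a log is sketched below. The task only asks that run-log.md cover the steps run or skipped (with reasons), the disposition of each artifact, and the final state; the `StepRecord` type, the section headings, and the artifact name `eval-report.md` are assumptions made for this example, not part of the pipeline.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class StepRecord:
    name: str               # pipeline step, e.g. "build-and-evaluate"
    status: str             # "ran" or "skipped"
    reason: str = ""        # why the step was skipped, if it was
    artifacts: tuple = ()   # (path, disposition) pairs


def render_run_log(steps, final_state):
    """Render steps, artifact dispositions, and final state as markdown text."""
    lines = ["# Run log", "", "## Steps", ""]
    for step in steps:
        entry = f"- {step.name}: {step.status}"
        if step.reason:
            entry += f" ({step.reason})"
        lines.append(entry)
    lines += ["", "## Artifacts", ""]
    for step in steps:
        for path, disposition in step.artifacts:
            lines.append(f"- {path}: {disposition}")
    lines += ["", "## Final state", "", final_state, ""]
    return "\n".join(lines)


def write_run_log(steps, final_state, path="run-log.md"):
    """Persist the rendered log in the working directory."""
    Path(path).write_text(render_run_log(steps, final_state), encoding="utf-8")


# Hypothetical run: the step was skipped, so its artifact was never produced.
steps = [
    StepRecord(
        name="build-and-evaluate",
        status="skipped",
        reason="selection rationale did not approve a build",
        artifacts=(("eval-report.md", "not produced"),),
    ),
]
log_text = render_run_log(steps, "No artifacts written; pipeline stopped.")
```

Listing an artifact even when it was not produced keeps the log auditable: a reader can tell a deliberate skip from a silent failure.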
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
skills
batch-driver
build-and-evaluate
company-list-filter
discovery
discovery-produce
select-target