Automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs with-skill agent). A1 MVP cell of the produce/consume × personalization 2x2.
88
86%
Does it follow best practices?
Impact
89%
1.45x
Average score across 13 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "The CSV has a valid header (Company, Domain) but every data row has empty company values. parse-attendees.py exits non-zero with ok=false and reports the dropped-row count in its stderr summary. This tests whether the agent halts at Step 1, surfaces the parse summary, and refuses to initialize a run directory for a batch with zero usable rows.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "parse-attendees.py invoked",
      "description": "summary-log.txt references skills/batch-driver/scripts/parse-attendees.py being invoked on inputs/attendees.csv (or shows its JSON output).",
      "max_score": 15
    },
    {
      "name": "Empty-result reported as zero usable rows",
      "description": "summary-log.txt surfaces that the parse step yielded zero usable rows (kept_rows=0 or equivalent phrasing). The dropped-row counter is named explicitly (empty_company or equivalent).",
      "max_score": 15
    },
    {
      "name": "No batch-manifest.json written",
      "description": "No file named batch-manifest.json exists under any runs/ subdirectory created during this invocation.",
      "max_score": 18
    },
    {
      "name": "No per-company subdirectories",
      "description": "No company-slug subdirectories under runs/ were created. The agent did not loop into discovery-produce on non-existent companies to keep the batch going.",
      "max_score": 18
    },
    {
      "name": "No index.md fabricated",
      "description": "No index.md file exists with synthesized per-company rows. The agent did not invent companies to fill an empty batch summary.",
      "max_score": 10
    },
    {
      "name": "Re-export guidance present",
      "description": "summary-log.txt suggests an actionable next step for the marketing-ops team (e.g., re-export with company names populated, check the export job, verify the column header).",
      "max_score": 12
    },
    {
      "name": "Source CSV named",
      "description": "summary-log.txt names the CSV file processed (inputs/attendees.csv or the equivalent path).",
      "max_score": 7
    },
    {
      "name": "Header row noted",
      "description": "summary-log.txt acknowledges that the CSV header was valid (the failure is in the data rows, not the header parsing); this saves the team from re-checking column names.",
      "max_score": 5
    }
  ]
}

evals
  scenario-1
  scenario-2
  scenario-3
  scenario-4
  scenario-5
  scenario-6
  scenario-7
  scenario-8
  scenario-9
  scenario-10
  scenario-11
  scenario-12
  scenario-13
skills
  batch-driver
  build-and-evaluate
  company-list-filter
  discovery
  discovery-produce
  select-target
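
The halt-on-empty behavior described in the scenario's "context" field can be sketched roughly as follows. This is a hypothetical stand-in, not the actual parse-attendees.py: the function names (`parse_attendees`, `exit_code`), the `header_ok` and `dropped` keys, and the overall summary shape are assumptions made for illustration. Only `ok`, `kept_rows`, `empty_company`, the Company/Domain header, and the non-zero exit come from the scenario text.

```python
import csv
import io


def parse_attendees(csv_text: str) -> dict:
    """Summarize an attendee CSV (hypothetical; not the real script)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames
    # The scenario says the header is (Company, Domain); require both columns.
    header_ok = fields is not None and {"Company", "Domain"} <= set(fields)

    kept = []
    empty_company = 0
    for row in reader:
        company = (row.get("Company") or "").strip()
        if company:
            domain = (row.get("Domain") or "").strip()
            kept.append({"company": company, "domain": domain})
        else:
            # Counted separately so the summary can name the drop reason.
            empty_company += 1

    return {
        # ok is false whenever there is nothing usable to run a batch on.
        "ok": header_ok and len(kept) > 0,
        "header_ok": header_ok,
        "kept_rows": len(kept),
        "dropped": {"empty_company": empty_company},
        "rows": kept,
    }


def exit_code(summary: dict) -> int:
    """Map the summary to a process exit status: non-zero means halt."""
    return 0 if summary["ok"] else 1
```

Under these assumptions, a caller would check `exit_code(summary)` before creating anything under runs/, and on failure would log the summary (source path, `kept_rows`, the `empty_company` count, and whether the header was valid) instead of initializing a run directory.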