An automated pipeline that takes a company name and produces a custom Tessl skill, plus an eval report showing per-scenario lift (baseline agent vs. with-skill agent). It is the A1 MVP cell of the produce/consume × personalization 2x2 grid.
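The per-scenario lift in the eval report can be sketched as follows. This sketch makes three assumptions: each scenario yields one score per agent on a shared 0-100 scale, per-scenario lift is the with-skill score minus the baseline score, and a headline multiplier such as the 1.45x figure on this page is the ratio of the two averages. The function name `lift_report` and its input shape are hypothetical and not taken from Tessl's report format.

```python
from statistics import mean


def lift_report(baseline: dict[str, float], with_skill: dict[str, float]) -> dict:
    """Summarise lift between a baseline run and a with-skill run.

    Assumes both dicts map scenario id -> score on the same 0-100 scale.
    """
    if baseline.keys() != with_skill.keys():
        raise ValueError("both runs must cover the same scenarios")
    # Per-scenario lift, expressed as a difference in points.
    per_scenario = {sid: with_skill[sid] - baseline[sid] for sid in baseline}
    baseline_avg = mean(baseline.values())
    with_skill_avg = mean(with_skill.values())
    # Headline multiplier as a ratio of averages, so a single zero-score
    # baseline scenario cannot blow up the aggregate.
    multiplier = with_skill_avg / baseline_avg if baseline_avg else float("inf")
    return {
        "per_scenario": per_scenario,
        "baseline_avg": baseline_avg,
        "with_skill_avg": with_skill_avg,
        "multiplier": multiplier,
    }


# Illustrative scores only; these are not results from this tile.
report = lift_report(
    baseline={"scenario-1": 40, "scenario-2": 60},
    with_skill={"scenario-1": 80, "scenario-2": 70},
)
print(report["per_scenario"], f"{report['multiplier']:.2f}x")
```

Taking the ratio of averages rather than averaging per-scenario ratios keeps the headline figure defined even when a baseline agent scores zero on some scenario.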
88
Does it follow best practices? 86%
Impact: 89% (1.45x average score across 13 eval scenarios)
Advisory: Suggest reviewing before use
{
"context": "selection.json has selection_status='skipped' with selected_target_id=null. The build-and-evaluate skill must read the selection, surface the reviewer's rationale, and finish at Step 1 without scaffolding a skill, generating scenarios, running review, or invoking any eval command. Baseline agents tend to march through the workflow regardless of the gating status.",
"type": "weighted_checklist",
"checklist": [
{
"name": "No generated-skill scaffold",
"description": "No generated-skill/ directory exists in the working directory. The agent did NOT invoke 'tessl skill new'.",
"max_score": 20
},
{
"name": "No tessl scenario generate",
"description": "No scenarios.json file or evals/ scaffolding for the generated skill exists. The agent did NOT invoke 'tessl scenario generate'.",
"max_score": 15
},
{
"name": "No tessl skill review invocation",
"description": "No skill-review.json artifact exists. The agent did NOT invoke 'tessl skill review --threshold 85' on a non-existent scaffolded skill.",
"max_score": 12
},
{
"name": "No eval execution",
"description": "Neither baseline-results.json nor with-skill-results.json exists. The agent did NOT invoke 'tessl eval run'.",
"max_score": 12
},
{
"name": "selection_status surfaced in run-log.md",
"description": "run-log.md explicitly states the selection_status value ('skipped') from selection.json.",
"max_score": 10
},
{
"name": "selection_rationale surfaced",
"description": "run-log.md includes the selection_rationale text from selection.json (or a close paraphrase covering 'partner-gated', 'confidence 0.55', or 'clarifying answers').",
"max_score": 13
},
{
"name": "No fabricated lift numbers",
"description": "run-log.md does NOT contain a per-scenario lift table, an aggregate lift number, or any specific score values — there were no eval runs to report on, and the agent did not invent any.",
"max_score": 10
},
{
"name": "Run-log.md exists and explains the halt",
"description": "run-log.md is non-empty and clearly explains why the pipeline halted at the load-selection step rather than proceeding to scaffold.",
"max_score": 8
}
]
}

evals/
  scenario-1
  scenario-2
  scenario-3
  scenario-4
  scenario-5
  scenario-6
  scenario-7
  scenario-8
  scenario-9
  scenario-10
  scenario-11
  scenario-12
  scenario-13
skills/
  batch-driver
  build-and-evaluate
  company-list-filter
  discovery
  discovery-produce
  select-target
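The weighted_checklist criteria above can be made concrete in code. The eight items carry max_score weights that sum to 100, and a run earns the weight of every item it satisfies. The sketch below approximates that scoring for the skipped-selection scenario using only filesystem and text checks. The function `grade_skipped_selection` is hypothetical. It assumes all artifacts live directly in one working directory and awards no partial credit. Its last three checks are crude heuristics standing in for a reviewer's judgement, so it is not how Tessl actually grades a run.

```python
import json
import re
import tempfile
from pathlib import Path


def grade_skipped_selection(workdir: Path) -> int:
    """Score one run of the 'selection skipped' scenario out of 100.

    Hypothetical approximation: each checklist item awards its full
    max_score when its check passes and 0 otherwise (no partial credit).
    """
    selection = json.loads((workdir / "selection.json").read_text())
    log_file = workdir / "run-log.md"
    log = log_file.read_text() if log_file.exists() else ""
    rationale = selection.get("selection_rationale") or ""

    def absent(name: str) -> bool:
        return not (workdir / name).exists()

    checklist = [  # (max_score, passed)
        (20, absent("generated-skill")),
        (15, not any(workdir.rglob("scenarios.json"))),
        (12, absent("skill-review.json")),
        (12, absent("baseline-results.json") and absent("with-skill-results.json")),
        (10, str(selection["selection_status"]) in log),
        # Verbatim match only; judging a close paraphrase needs a reviewer.
        (13, bool(rationale) and rationale in log),
        # Heuristic: no multiplier like "1.45x" and no lift table column.
        (10, re.search(r"\b\d+(?:\.\d+)?x\b", log) is None and "| lift" not in log.lower()),
        # Heuristic: a non-empty log that mentions halting or stopping.
        (8, bool(log.strip()) and re.search(r"\b(?:halt|stop)", log, re.IGNORECASE) is not None),
    ]
    return sum(weight for weight, passed in checklist if passed)


# Usage: a clean halted run scores 100; a stray scaffold loses 20 points.
with tempfile.TemporaryDirectory() as tmp:
    wd = Path(tmp)
    (wd / "selection.json").write_text(json.dumps({
        "selection_status": "skipped",
        "selected_target_id": None,
        "selection_rationale": "Target is partner-gated.",  # illustrative text
    }))
    (wd / "run-log.md").write_text(
        "selection_status: skipped\n"
        "Rationale: Target is partner-gated.\n"
        "Halted at Step 1 (load selection); nothing was scaffolded.\n"
    )
    clean = grade_skipped_selection(wd)
    (wd / "generated-skill").mkdir()
    scaffolded = grade_skipped_selection(wd)

print(clean, scaffolded)
```

Weighting the "did not scaffold" item highest (20 of 100) means that marching past the gate costs more than any single reporting omission. This matches the scenario's stated concern that baseline agents ignore the gating status.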