Automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs with-skill agent). A1 MVP cell of the produce/consume × personalization 2x2.
Score: 88 · Review (does it follow best practices?): 86% · Impact: 89%, 1.45x average score across 13 eval scenarios · Advisory: suggest reviewing before use
Automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs with-skill agent). Designed for booth-scale runs (~100 leads) ahead of conferences.
Install: tessl install jbaruch/auto-skill-discovery
Strategic context: see SPEC.md (the 10-step pipeline contract) and STAKEHOLDER-BRIEF.md (the four operating modes, what's validated, cost envelope).
One pipeline, four cells on a 2x2 of (target surface × personalization):
| | Personalization OFF | Personalization ON |
|---|---|---|
| Produce (company's public API/SDK) | A1 ✅ shipped | A2 (deferred) |
| Consume (engineers' daily tools) | B1 (deferred) | B2 (deferred) |
A1 is the MVP for the AI Security Summit by Snyk. The other three cells layer in after.
company name
│
▼ Step 1 — produce-mode discovery
/discovery-produce (skills/discovery-produce/SKILL.md)
│ → runs/<ts>/<slug>/discovery.json
│
▼ Step 2 — target selection (human gate OR auto-pick)
/select-target <slug> (skills/select-target/SKILL.md)
OR
auto-select.py (skills/batch-driver/scripts/auto-select.py)
│ → runs/<ts>/<slug>/selection.json
│
▼ Steps 3–9 — skill build + eval + report
/build-and-evaluate <slug> (skills/build-and-evaluate/SKILL.md)
→ runs/<ts>/<slug>/generated-skill/
→ runs/<ts>/<slug>/scenarios.json
→ runs/<ts>/<slug>/skill-review.json
→ runs/<ts>/<slug>/lift.json
→ runs/<ts>/<slug>/gap-analysis.md
→ runs/<ts>/<slug>/report.md ← final deliverable

For a batch, /batch-driver <csv> loops the per-company pipeline under one shared run timestamp and emits a per-batch index.md.
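As a quick way to see how far a single company's run got, the per-company artifacts named in the flow above can be checked with a few lines of Python. This is a minimal sketch, not part of the pipeline: the helper name `missing_artifacts` is hypothetical, and it only tests for existence, not for valid contents.

```python
from pathlib import Path

# Per-company artifacts named in the pipeline flow, relative to runs/<ts>/<slug>/.
EXPECTED_ARTIFACTS = [
    "discovery.json",
    "selection.json",
    "generated-skill",
    "scenarios.json",
    "skill-review.json",
    "lift.json",
    "gap-analysis.md",
    "report.md",
]


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return the expected artifacts that do not exist under run_dir (hypothetical helper)."""
    return [name for name in EXPECTED_ARTIFACTS if not (run_dir / name).exists()]
```

Calling `missing_artifacts(Path("runs/<ts>/<slug>"))` with a real run directory returns an empty list once report.md and everything before it has been written.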
1. /discovery-produce: give it a company name when prompted. Writes runs/<ts>/<slug>/discovery.json.
2. /select-target <slug>: presents the top-3 candidate targets; reply with a tgt_NN id, skip, or defer. Writes selection.json.
3. /build-and-evaluate <slug>: runs phases 4–10 unattended. Final output is runs/<ts>/<slug>/report.md.

Skip the human gate by auto-picking the top-confidence target:
# After /discovery-produce has run
python3 skills/batch-driver/scripts/auto-select.py runs/<ts>/<slug>/discovery.json

Then run /build-and-evaluate <slug> as above. Auto-pick writes auto_selected: true to selection.json for traceability; downstream steps otherwise treat auto-picks and human picks identically.
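The auto-pick idea can be sketched as follows. This is an illustration only, not the contents of auto-select.py: the `candidates`, `id`, `confidence`, and `target_id` keys are assumed names, since the real shape is defined in discovery-output-contract.md. Only the `auto_selected: true` marker comes from the description above.

```python
def pick_top_target(discovery: dict) -> dict:
    """Build a selection record for the highest-confidence candidate target."""
    # Assumed shape (hypothetical keys): discovery["candidates"] is a list of
    # {"id": "tgt_01", "confidence": 0.9, ...} entries.
    candidates = discovery.get("candidates", [])
    if not candidates:
        raise ValueError("discovery has no candidate targets to pick from")
    best = max(candidates, key=lambda c: c["confidence"])
    # auto_selected marks the record as machine-picked, for traceability.
    return {"target_id": best["id"], "auto_selected": True}
```

On ties, `max` keeps the first candidate it sees, so the order of the list acts as the tie-breaker in this sketch.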
/batch-driver path/to/attendees.csv

The CSV header must include company (required) and may include domain (optional). The skill:
1. Parses the CSV (skills/batch-driver/scripts/parse-attendees.py), dropping empty / duplicate / non-printable rows
2. Sets one shared DISCOVERY_RUN_TS for the batch
3. Runs discovery-produce → auto-select.py → build-and-evaluate for each company
4. Writes runs/<DISCOVERY_RUN_TS>/index.md with verdict counts, per-company table, aggregate lift across BUILDs, and a follow-up checklist for SKIPs/AMBIGUOUS/errors

runs/
  <DISCOVERY_RUN_TS>/            # e.g., 2026-05-14T-stripe-live
    batch-manifest.json          # batch mode only
    index.md                     # batch mode only
    <slug>/                      # one per company
      discovery.json             # Step 1 output (contract: discovery-output-contract.md)
      selection.json             # Step 2 output
      scenarios.json             # Step 4 output (from tessl scenario generate)
      generated-skill/           # Step 3: scaffolded by tessl skill new
        tile.json
        SKILL.md
        evals/
      skill-review.json          # Step 6: tessl skill review --threshold 85
      baseline-results.json      # Step 7: without-context variant
      with-skill-results.json    # Step 7: with-context variant
      lift.json                  # Step 8: compute-lift.py
      gap-analysis.md            # Step 9
      report.md                  # Step 10: final deliverable

| Skill | Slash command | Role | Review score |
|---|---|---|---|
| discovery-produce | /discovery-produce | A1 entry point: produce-mode discovery from company name | 85 |
| discovery | /discovery | Consume-mode discovery (B1/B2, deferred path) | 76 ⚠️ below gate |
| select-target | /select-target | Human-gated target selection | 85 |
| build-and-evaluate | /build-and-evaluate | Phases 4–10 orchestrator | 88 |
| batch-driver | /batch-driver | CSV-driven batch runner | 94 |
| company-list-filter | /company-list-filter | Pre-discovery triage (drop obviously-out, surface obviously-in) | 90 |
All scores are from tessl skill review --threshold 85. The consume-mode discovery skill (76) is below the gate; this is a pre-existing issue, out of A1 scope, and flagged for follow-up.
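The attendee-CSV cleanup that /batch-driver performs (a required company column, an optional domain column, and empty, duplicate, or non-printable rows dropped) might look roughly like this. It is a sketch under assumptions rather than a copy of parse-attendees.py: the function name is hypothetical, and treating duplicates case-insensitively is a guess about the intended behavior.

```python
import csv
import io


def clean_companies(csv_text: str) -> list[dict]:
    """Parse attendee CSV text and keep one usable row per company."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "company" not in reader.fieldnames:
        raise ValueError("CSV header must include a 'company' column")
    seen = set()
    rows = []
    for row in reader:
        company = (row.get("company") or "").strip()
        # Drop rows with an empty or non-printable company name.
        if not company or not company.isprintable():
            continue
        # Assumption: duplicates are compared case-insensitively.
        key = company.casefold()
        if key in seen:
            continue
        seen.add(key)
        rows.append({"company": company, "domain": (row.get("domain") or "").strip()})
    return rows
```

Rows keep their original order, so the first spelling of a company name is the one that survives.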
./smoke-test-a1.sh

Exercises all deterministic plumbing (contract validator, auto-select, bleeding audit, lift compute, report render, batch summary) end-to-end on synthetic Stripe data. Passes in seconds. Does NOT exercise the live Tessl tooling; those calls happen when build-and-evaluate is invoked from Claude Code.
Inspect the artifacts at runs/2026-05-13T-a1-smoke/stripe/.
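To make the lift-compute step concrete, here is one plausible way to compare a baseline run against a with-skill run per scenario. The input shape (scenario name mapped to a numeric score) and the output keys are assumptions for illustration; compute-lift.py and the real lift.json may define lift differently.

```python
def compute_lift(baseline: dict[str, float], with_skill: dict[str, float]) -> dict:
    """Compare per-scenario scores and summarize the average lift."""
    scenarios = {}
    # Only scenarios present in both runs can be compared.
    for name in sorted(baseline.keys() & with_skill.keys()):
        b, w = baseline[name], with_skill[name]
        scenarios[name] = {"baseline": b, "with_skill": w, "delta": w - b}
    n = len(scenarios)
    avg_b = sum(s["baseline"] for s in scenarios.values()) / n if n else 0.0
    avg_w = sum(s["with_skill"] for s in scenarios.values()) / n if n else 0.0
    return {
        "scenarios": scenarios,
        "avg_baseline": avg_b,
        "avg_with_skill": avg_w,
        # The ratio is undefined when the baseline average is zero.
        "ratio": avg_w / avg_b if avg_b else None,
    }
```

Reporting both the per-scenario delta and the average ratio keeps a single large win from hiding scenarios where the skill made no difference.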
The discovery JSON shape is the canonical interface between every step; see discovery-output-contract.md. There are two modes (consume, produce) and three schema versions (1, 2, 3; v3 introduces the mode field). The validator at skills/discovery/scripts/validate-output.py enforces the shape: produce-mode rejects consume-side fields and vice versa.
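The cross-mode rule can be illustrated with a small check. Everything named here is hypothetical: the `schema_version` key, the two example field names, and the choice to skip the check for v1/v2 documents are assumptions made for the sketch. The authoritative field lists are in discovery-output-contract.md and validate-output.py.

```python
# Hypothetical field names, for illustration only.
CONSUME_ONLY_FIELDS = {"daily_tools"}
PRODUCE_ONLY_FIELDS = {"public_api_targets"}


def validate_mode(doc: dict) -> list[str]:
    """Return a list of mode-related errors for a discovery document."""
    version = doc.get("schema_version")
    if version not in (1, 2, 3):
        return [f"unsupported schema_version: {version!r}"]
    if version < 3:
        # Assumption: the mode field only exists from v3, so nothing to cross-check.
        return []
    mode = doc.get("mode")
    if mode not in ("consume", "produce"):
        return [f"mode must be 'consume' or 'produce', got {mode!r}"]
    # Each mode rejects the fields that belong to the other mode.
    forbidden = CONSUME_ONLY_FIELDS if mode == "produce" else PRODUCE_ONLY_FIELDS
    return [
        f"{mode}-mode document must not contain {field!r}"
        for field in sorted(forbidden & set(doc))
    ]
```

Returning a list of messages instead of raising on the first problem lets a caller report every violation in one pass.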
- The output schema of tessl eval run --variant without-context --variant with-context has not yet been exercised. Until it is, build-and-evaluate Step 8 instructs the agent to reason over the result and emit baseline-results.json + with-skill-results.json in the canonical shape. Once the schema is observed, that step gets a parse-eval-view.py script.
- The discovery skill scores 76 against tessl skill review --threshold 85. Pre-existing, not in A1 scope.

~$13–$28 per company on Opus 4.7 with prompt caching (clarifying $0–$2 + discovery $3–$6 + skill gen + eval + report $10–$20). A 100-company Snyk batch runs at ~$1.3k–$2.8k in tokens. See STAKEHOLDER-BRIEF.md § 3 for the full breakdown.
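The cost envelope is plain arithmetic over the per-phase ranges, worked through below. The sketch assumes every company runs all three phases at the quoted ranges; a batch with many SKIPs would likely cost less. The function name is hypothetical.

```python
# Per-company token cost ranges in USD (low, high), as quoted in the text.
PHASES = {
    "clarifying": (0, 2),
    "discovery": (3, 6),
    "skill gen + eval + report": (10, 20),
}


def cost_envelope(companies: int) -> tuple[int, int]:
    """Return the (low, high) USD token cost for a batch of the given size."""
    low = sum(lo for lo, _ in PHASES.values())
    high = sum(hi for _, hi in PHASES.values())
    return companies * low, companies * high
```

With these inputs, `cost_envelope(1)` gives (13, 28) and `cost_envelope(100)` gives (1300, 2800).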
Strategic / context documents:
Skill companion files (referenced from the SKILL.md execution plans):