Automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs with-skill agent). A1 MVP cell of the produce/consume × personalization 2x2.
Reference material for SKILL.md Step 4. Read once per run when classifying a candidate surface.
For every candidate surface, ask: does the target company's daily-employee population CONSUME this surface, or BUILD it?
Consumer-side: the company's employees use the surface the same way external customers would. USE/majority targets pointing at this surface ARE credible.
Examples:
Builder-side: the company ships this surface for outside consumers. Their relationship is "we build it," not "we use it as customers do."
Examples:
USE/majority targets pointing at a builder-side surface are WRONG. Three legitimate moves instead, in priority order — exhaust each before falling to the next.
"Builder-side at the public API" does NOT mean "no consumer-side surface exists." Internal surfaces are often described publicly. Run all of these searches before concluding none exists:
"we use internally", "our internal", "how we built", "dogfood", "we run on our own" (many companies blog explicitly about internal tooling)
The skip_reason must enumerate which of these searches were attempted and what each returned. Generic phrasing like "internal tooling is non-public" is insufficient and indicates lazy search.
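A minimal sketch of how the search trail for the phrase search above could be recorded. It covers only the phrases quoted above; the other searches in the list would be recorded the same way. The function name `build_skip_reason` and the shape of the `results` mapping are assumptions made for illustration, not part of the skill.

```python
# Phrases quoted in the text above; everything else here is hypothetical.
INTERNAL_USE_PHRASES = [
    "we use internally",
    "our internal",
    "how we built",
    "dogfood",
    "we run on our own",
]


def build_skip_reason(results: dict) -> str:
    """Build a skip_reason that names every phrase searched and what came back.

    `results` maps each searched phrase to a short description of the outcome.
    Raises ValueError if any listed phrase has not been attempted, so a generic
    "internal tooling is non-public" claim cannot pass as a search trail.
    """
    missing = [p for p in INTERNAL_USE_PHRASES if not results.get(p, "").strip()]
    if missing:
        raise ValueError(f"searches not yet attempted or undocumented: {missing}")
    return "; ".join(f'searched "{p}": {results[p]}' for p in INTERNAL_USE_PHRASES)
```

Raising on a missing phrase, rather than silently omitting it, keeps an incomplete trail from reaching the verdict step.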
"How to author a new endpoint in our API following our conventions." Population is small_team (the API team only). Score will be lower (AUTHOR weight is 0.5). Use this when Move 1 finds nothing AND there's a real authoring pattern an agent could codify.
For the company's customers, not its employees. Booth-aha score will be low (external × INTEGRATE weights ≈ 0.06). Acknowledge this honestly rather than relabeling.
Fall to verdict SKIP only when both conditions hold: Move 1's full search list has been run and documented in skip_reason, AND no candidate clears the BUILD floor of raw confidence ≥ 0.5 in any task_shape. The skip_reason MUST list the specific consumer-side searches attempted (what was searched and what came back), not just a generic "non-public internal engineering" claim. A SKIP without a documented search trail is a process error and should be reworked.
Inflating an INTEGRATE/external candidate to USE/majority to clear the BUILD threshold is also a process error. Both shortcuts (lazy-SKIP without search and over-inflate-to-BUILD) violate the contract.
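The SKIP gate described above can be sketched as a small check. This is an illustration under assumptions: the `Target` dataclass, the `decide_verdict` function, and the representation of the search trail as (searched, returned) pairs are hypothetical names, not the skill's actual schema. It does not try to detect inflated labels, which needs human review of the evidence.

```python
from dataclasses import dataclass

BUILD_FLOOR = 0.5  # raw-confidence floor stated in the text


@dataclass
class Target:
    task_shape: str    # "USE", "AUTHOR", or "INTEGRATE"
    size_class: str    # "majority", "minority", "small_team", or "external"
    confidence: float  # raw confidence, before any weighting


def decide_verdict(targets, search_trail):
    """Return "BUILD" or "SKIP"; raise if a SKIP has no documented search trail.

    `search_trail` is a list of (what_was_searched, what_came_back) pairs.
    """
    # BUILD needs at least one target at or above the raw-confidence floor.
    if any(t.confidence >= BUILD_FLOOR for t in targets):
        return "BUILD"
    # Only entries with both a query and an outcome count as documented.
    documented = [(q, r) for q, r in search_trail if q.strip() and r.strip()]
    if not documented:
        raise ValueError("SKIP without a documented search trail is a process error")
    return "SKIP"
```

For example, `decide_verdict([Target("INTEGRATE", "external", 0.3)], [("dogfood", "no hits")])` returns `"SKIP"`, while the same call with an empty trail raises.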
task_shape: USE for the catalog/scaffolder/tech-docs daily workflow with size_class: majority.
INTEGRATE / external.
INTEGRATE / external.
score = confidence × iu_weight × pop_weight × ts_weight
Weights:
iu_weight (from internal_usage level): confirmed=1.0, inferred=0.7, weak=0.3, none=0.1
pop_weight (from target_population.size_class): majority=1.0, minority=0.6, small_team=0.4, external=0.2
ts_weight (from task_shape): USE=1.0, AUTHOR=0.5, INTEGRATE=0.3
Score is advisory ranking only: a BUILD verdict still requires raw confidence ≥ 0.5 on at least one target. The score signals to the human-gate whether the booth-aha audience is broad (majority/USE) or narrow (small_team/AUTHOR or external/INTEGRATE).
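As a worked illustration of the formula, the following sketch computes the advisory score from the weight tables listed above. The weights are copied from the text; the function and constant names are assumptions made for this example.

```python
# Weight tables copied from the text; names are illustrative only.
IU_WEIGHT = {"confirmed": 1.0, "inferred": 0.7, "weak": 0.3, "none": 0.1}
POP_WEIGHT = {"majority": 1.0, "minority": 0.6, "small_team": 0.4, "external": 0.2}
TS_WEIGHT = {"USE": 1.0, "AUTHOR": 0.5, "INTEGRATE": 0.3}


def advisory_score(confidence, internal_usage, size_class, task_shape):
    """score = confidence x iu_weight x pop_weight x ts_weight (ranking only)."""
    return (
        confidence
        * IU_WEIGHT[internal_usage]
        * POP_WEIGHT[size_class]
        * TS_WEIGHT[task_shape]
    )


if __name__ == "__main__":
    # Same raw confidence, three different audiences.
    for size_class, task_shape in [
        ("majority", "USE"),
        ("small_team", "AUTHOR"),
        ("external", "INTEGRATE"),
    ]:
        s = advisory_score(0.8, "confirmed", size_class, task_shape)
        print(f"{task_shape}/{size_class}: {s:.3f}")
```

With confirmed internal usage, the external × INTEGRATE product is 0.2 × 0.3 = 0.06, so even a high-confidence customer-facing target ranks far below a USE/majority target. The score only orders candidates; the raw-confidence floor of 0.5 is checked separately.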