
jbaruch/auto-skill-discovery

Automated pipeline that takes a company name and produces a custom Tessl skill plus an eval report showing per-scenario lift (baseline agent vs with-skill agent). A1 MVP cell of the produce/consume × personalization 2x2.

88 (1.45x)

Quality: 86% (Does it follow best practices?)

Impact: 89%, 1.45x (average score across 13 eval scenarios)

Security by Snyk: Advisory (suggest reviewing before use)


evals/scenario-1/criteria.json

{
  "context": "The CSV has a valid header (Company, Domain) but every data row has empty company values. parse-attendees.py exits non-zero with ok=false and reports the dropped-row count in its stderr summary. This tests whether the agent halts at Step 1, surfaces the parse summary, and refuses to initialize a run directory for a batch with zero usable rows.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "parse-attendees.py invoked",
      "description": "summary-log.txt references skills/batch-driver/scripts/parse-attendees.py being invoked on inputs/attendees.csv (or shows its JSON output).",
      "max_score": 15
    },
    {
      "name": "Empty-result reported as zero usable rows",
      "description": "summary-log.txt surfaces that the parse step yielded zero usable rows (kept_rows=0 or equivalent phrasing). The dropped-row counter is named explicitly (empty_company or equivalent).",
      "max_score": 15
    },
    {
      "name": "No batch-manifest.json written",
      "description": "No file named batch-manifest.json exists under any runs/ subdirectory created during this invocation.",
      "max_score": 18
    },
    {
      "name": "No per-company subdirectories",
      "description": "No company-slug subdirectories under runs/ were created. The agent did not loop into discovery-produce on non-existent companies to keep the batch going.",
      "max_score": 18
    },
    {
      "name": "No index.md fabricated",
      "description": "No index.md file exists with synthesized per-company rows. The agent did not invent companies to fill an empty batch summary.",
      "max_score": 10
    },
    {
      "name": "Re-export guidance present",
      "description": "summary-log.txt suggests an actionable next step for the marketing-ops team (e.g., re-export with company names populated, check the export job, verify the column header).",
      "max_score": 12
    },
    {
      "name": "Source CSV named",
      "description": "summary-log.txt names the CSV file processed (inputs/attendees.csv or the equivalent path).",
      "max_score": 7
    },
    {
      "name": "Header row noted",
      "description": "summary-log.txt acknowledges that the CSV header was valid (the failure is in the data rows, not the header parsing) — this saves the team from re-checking column names.",
      "max_score": 5
    }
  ]
}
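The checklist above assigns each criterion a `max_score`, and the eight values sum to 100. The following is a minimal sketch of how a `weighted_checklist` like this could be turned into a percentage. It is an illustration only: the `weighted_score` function, the `awarded` mapping, and the clamping rule are assumptions and do not describe how Tessl's grader actually works.

```python
def weighted_score(criteria: dict, awarded: dict) -> float:
    """Return the percentage of available points earned across the checklist.

    Hypothetical helper: `awarded` maps a criterion name to the points a
    grader gave it. This is not part of the Tessl tooling.
    """
    if criteria.get("type") != "weighted_checklist":
        raise ValueError("expected a weighted_checklist criteria file")
    total = 0
    earned = 0.0
    for item in criteria["checklist"]:
        cap = item["max_score"]
        total += cap
        # Clamp each award into [0, max_score]; a missing criterion scores 0.
        earned += min(max(awarded.get(item["name"], 0), 0), cap)
    if total == 0:
        raise ValueError("checklist has no points to award")
    return 100.0 * earned / total


# Names and max_score values copied from the criteria file above
# (descriptions omitted for brevity).
criteria = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "parse-attendees.py invoked", "max_score": 15},
        {"name": "Empty-result reported as zero usable rows", "max_score": 15},
        {"name": "No batch-manifest.json written", "max_score": 18},
        {"name": "No per-company subdirectories", "max_score": 18},
        {"name": "No index.md fabricated", "max_score": 10},
        {"name": "Re-export guidance present", "max_score": 12},
        {"name": "Source CSV named", "max_score": 7},
        {"name": "Header row noted", "max_score": 5},
    ],
}

# Illustrative grading, not real eval data: the agent halted correctly but
# gave no re-export guidance and did not mention the valid header.
awarded = {item["name"]: item["max_score"] for item in criteria["checklist"]}
awarded["Re-export guidance present"] = 0
awarded["Header row noted"] = 0

score = weighted_score(criteria, awarded)
print(f"{score:.1f}%")
```

Under these assumed awards, the two "no artifacts written" checks carry the most weight (18 points each), so an agent that halts cleanly but writes a thin summary still keeps most of the score.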
