Use this skill any time someone wants to create, scaffold, build, fix, improve, benchmark, or optimize a Tessl/Claude skill — even if they don't say 'tessl' explicitly. If the request involves making a new skill ('create a skill for X', 'build me a skill that does Y', 'scaffold a skill called Z'), fixing or completing an existing one (missing tile.json, broken repo integration, low eval scores, description not triggering), or running and iterating on evals, invoke this skill. The full workflow covers: structured interview → SKILL.md + tile.json + rules/ scaffolding → README/CI repo integration → tessl tile lint → optional Tessl CLI pipeline (skill review, scenario generate/download, eval run) → hand-authored evals or LLM-as-judge fallback → benchmark logging. Do NOT use for: editing application code, debugging, refactoring, writing general documentation, or creating presentations.
Create production-quality Tessl skills from scratch and optimize them through eval-driven iteration.
Two modes of operation:
Detect which mode from the user's request. If ambiguous, ask.
tessl tile lint if available; otherwise run simulated lint checks (Phase 2.5).
skills-lock.json: it pins vendored skills under .agents/skills/; first-party tiles live under skills/ and are wired via README + CI only.
tessl eval run or LLM-as-judge runs. All skill mutations (e.g. tessl skill review, Phase 5 apply) must occur before eval execution starts, at explicit workflow boundaries, never interleaved with a running eval.
tessl skill review --optimize --yes is an exception: Tessl applies changes immediately without per-edit approval. Treat it as a distinct "Tessl apply" step and tell the user when you run it.
metadata.version. Use a semver string (e.g. 1.0.0 for new skills; bump minor or patch when behavior or documentation meaningfully changes).
Run all 10 questions using AskUserQuestion before generating any files. Collect answers into a working decision map held in memory. Complete the Completeness check at the end before proceeding.
| # | Question | Key options | If unsure |
|---|---|---|---|
| 1 | What does this skill do? (one sentence) | Free text | Ask "What task should the AI do better?" and "What goes wrong without it?" |
| 2 | Who will use this skill? | Developers / Semi-technical / Both | Default to Both |
| 3 | What type of project? | Code generation / Writing / Tool use / Interview / Other | Ask for a brief domain description |
| 4 | What are the 3–5 things this skill MUST do every time? | Free text (list) | Ask "What would make you say 'it worked perfectly'?" |
| 5 | What should this skill NEVER do? | Free text (list) | Generate domain-specific anti-patterns from purpose + domain answers |
| 6 | What phrases or signals activate this skill? | Free text / Generate suggestions / Research similar | Produce ≥5 candidate trigger terms from purpose + domain + behaviors; present for approval |
| 7 | What does the final output look like? | Files / Structured message / Interactive flow | Research similar skills |
| 8 | Does this skill need companion files beyond SKILL.md? | No / Rules files / Templates | Recommend companion files if >5 core behaviors or estimated length >300 lines |
| 9 | Which tools does this skill need? | AskUserQuestion only / + file tools / + WebSearch / All + Bash | Infer from domain: Code-gen → file tools + Bash; Writing → file tools; Workflow → all; Interview → AskUserQuestion + optionally WebSearch |
| 10 | Describe 2–3 realistic test tasks for this skill | Free text / Generate / Skip | Generate from purpose + behaviors |
Completeness check: Before scaffolding, verify all 10 categories have resolved values. If any are missing or still "unsure," resolve them before continuing.
Turn the decision map into a complete skill directory. See scaffold-rules for full implementation details.
Frontmatter:
---
name: <skill-name>
description: "<Purpose>. Triggers: <trigger terms>. Uses <tools>. Outputs: <deliverables>. Do NOT use for: <exclusions>."
metadata:
  version: "1.0.0"
  tags: <domain tags>
---
Apply activation-design heuristics: front-load trigger terms; use imperatives throughout ("Use X", "Do not Y", "Always Z"), never "consider", "may want", or "try to".
Body structure: title + one-liner → non-negotiables (numbered) → process/phases → integrated example (realistic, exercises ≥2 non-negotiables) → anti-patterns.
Length target: 150–400 lines. If content exceeds 400 lines, extract secondary rules into rules/*.md and reference with relative links.
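The phrasing and length rules above lend themselves to a mechanical pre-check. The following is a minimal sketch, not part of the Tessl tooling: `check_skill_body` is a hypothetical helper, and the phrase list simply mirrors the three examples named above.

```python
import re

# Hedged phrases the text above says to avoid; extend as needed.
WEAK_PHRASES = ("consider", "may want", "try to")
MIN_LINES, MAX_LINES = 150, 400  # length target stated above


def check_skill_body(text: str) -> list[str]:
    """Return findings for a SKILL.md body (hypothetical helper, not a Tessl API)."""
    findings = []
    lines = text.splitlines()
    if not MIN_LINES <= len(lines) <= MAX_LINES:
        findings.append(f"length {len(lines)} lines is outside {MIN_LINES}-{MAX_LINES}")
    for number, line in enumerate(lines, start=1):
        for phrase in WEAK_PHRASES:
            # Word-boundary match so "considerable" is not flagged.
            if re.search(rf"\b{re.escape(phrase)}\b", line, flags=re.IGNORECASE):
                findings.append(f"line {number}: weak phrasing '{phrase}'")
    return findings
```

An empty list means both checks passed; a length finding is a prompt to extract secondary rules into rules/*.md rather than a hard failure.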
{
  "name": "oh-my-ai/<skill-name>",
  "version": "1.0.0",
  "private": false,
  "summary": "<one-line purpose>",
  "skills": { "<skill-name>": { "path": "SKILL.md" } }
}
If the interview identified companion file needs:
Rules: rules/<rule-name>.md with YAML frontmatter (name, description) and structured content.
Templates: at skill root, referenced from SKILL.md with relative links.
Both integrations are mandatory. Check for existing entries before inserting.
| Skill | Description | table.
.github/workflows/tessl-publish.yml): Append - skills/<skill-name> to the tile: array. Validate YAML after editing (python3 -c "import yaml; yaml.safe_load(open(...))"); revert and report on failure.
With tessl CLI: cd skills/<skill-name> && tessl tile lint
Simulated (no CLI): Verify: SKILL.md has valid YAML frontmatter with name, description, and metadata.version (semver string); tile.json has name, version, summary, skills; tile.json name matches oh-my-ai/<skill-name>; no broken relative links; each rules/*.md has YAML frontmatter with name and description.
Report results. Fix failures and re-lint.
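When the CLI is unavailable, some of the simulated checks can be scripted. The sketch below covers the tile.json and frontmatter checks only. It is an illustration under assumptions: `lint_tile` and `lint_frontmatter` are hypothetical names, the semver pattern is simplified (no pre-release or build suffixes), and the frontmatter is scanned with regular expressions rather than a real YAML parser.

```python
import json
import re

SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # simplified: MAJOR.MINOR.PATCH only
REQUIRED_TILE_KEYS = ("name", "version", "summary", "skills")


def lint_tile(tile_text: str, skill_name: str) -> list[str]:
    """Check tile.json for required keys and the expected oh-my-ai/<skill-name> name."""
    try:
        tile = json.loads(tile_text)
    except json.JSONDecodeError as exc:
        return [f"tile.json is not valid JSON: {exc.msg}"]
    if not isinstance(tile, dict):
        return ["tile.json must contain a JSON object"]
    errors = [f"tile.json missing '{key}'" for key in REQUIRED_TILE_KEYS if key not in tile]
    expected = f"oh-my-ai/{skill_name}"
    if tile.get("name") != expected:
        errors.append(f"tile.json name does not match '{expected}'")
    return errors


def lint_frontmatter(skill_md: str) -> list[str]:
    """Check SKILL.md frontmatter for name, description, and a semver metadata.version."""
    match = re.match(r"\A---\n(.*?)\n---[ \t]*(?:\n|\Z)", skill_md, flags=re.DOTALL)
    if not match:
        return ["SKILL.md has no YAML frontmatter block"]
    block = match.group(1)
    errors = []
    for key in ("name", "description"):
        # Top-level key with a non-empty value on the same line.
        if not re.search(rf"^{key}:[ \t]*\S", block, flags=re.MULTILINE):
            errors.append(f"frontmatter missing '{key}'")
    # Assumes version is an indented child of metadata, as in the template above.
    version = re.search(r"^[ \t]+version:[ \t]*[\"']?([^\"'\s]+)", block, flags=re.MULTILINE)
    if not version or not SEMVER.match(version.group(1)):
        errors.append("frontmatter metadata.version is not a semver string")
    return errors
```

Broken relative links and rules/*.md frontmatter are not covered here and still need a manual or separate check.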
Run from repository root with paths like ./skills/<skill-name>. Use which tessl (or equivalent) first; if missing, skip this subsection and use Phase 3 Path M + Phase 4 Path B as needed.
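The CLI check above decides which path each later phase takes. A minimal sketch of that branching, assuming only that the executable is named `tessl`; `choose_paths` and its return keys are hypothetical, and the `which` parameter exists purely so the lookup can be stubbed.

```python
import shutil
from typing import Callable, Optional


def choose_paths(which: Callable[[str], Optional[str]] = shutil.which) -> dict[str, str]:
    """Pick the lint, scenario, and eval paths based on whether tessl is on PATH."""
    if which("tessl"):
        return {"lint": "tessl tile lint", "scenarios": "Path T", "evals": "Path A"}
    # No CLI: simulated lint, hand-authored scenarios, LLM-as-judge evals.
    return {"lint": "simulated", "scenarios": "Path M", "evals": "Path B"}
```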
Boundary: tessl skill review writes the skill and must complete before tessl eval run starts (→ non-negotiable #6).
| Step | Command / action |
|---|---|
| 1 | tessl skill review --optimize --yes ./skills/<skill-name> — may rewrite SKILL.md (and other files per Tessl). This is Tessl auto-apply (non-negotiable #8). |
| 2 | If the skill has tile.json: cd skills/<skill-name> && tessl tile lint — same as Phase 2.5. |
| 3 | tessl scenario generate ./skills/<skill-name> — parse the generation id from stdout; do not guess. |
| 4 | tessl scenario download <generation> — use the id from step 3. |
| 5 | Place downloaded scenarios under the skill: if Tessl wrote ./evals/ at repo root, move it with mv ./evals/ ./skills/<skill-name>/ (or merge — see below). If output landed elsewhere, move that directory into skills/<skill-name>/evals/. |
| 6 | Continue to Phase 4 — Path A in eval-runner. |
If skills/<skill-name>/evals/ already exists: Use AskUserQuestion before moving: replace entirely, merge (explain how), or download to a temp directory — never overwrite silently.
Local Tessl cache under .tessl/ stays out of git (typically gitignored).
If the skill has no evals/ directory, or the user asks for eval scenarios, offer to create them via AskUserQuestion.
Path T — Tessl CLI (preferred): Run steps 3–5 from §2.6: tessl scenario generate → parse generation id → tessl scenario download → place under skills/<skill-name>/evals/. To tune the skill before generating scenarios, run steps 1–2 from §2.6 first. After download, verify coverage against benchmark-loop; add or adjust scenarios by hand if gaps remain.
Path M — Manual (fallback): Use when Tessl is missing, the user declines CLI generation, or download fails. Author scenarios directly.
Generate 2–3 scenarios (or validate CLI output) following benchmark-loop coverage rules (full scenario schema, scoring rules, and selection heuristics are defined there). For each scenario, ensure evals/<scenario-slug>/ contains:
task.md — A realistic problem (100–300 words) reflecting actual user prompts. Not a toy example.
criteria.json:
{
  "context": "Tests whether <specific capability>",
  "type": "weighted_checklist",
  "checklist": [
    { "name": "<criterion>", "max_score": N, "description": "<what to check>" }
  ]
}
Key constraints: all max_score values must sum to exactly 100; each criterion must be independently verifiable. Name scenarios as kebab-case slugs (e.g., core-interview-flow, noisy-context-retrieval).
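The structural constraints on a scenario can be verified before any eval runs. The following sketch assumes the criteria.json shape shown above; `validate_scenario` is a hypothetical helper, and whether a criterion is independently verifiable remains a human judgment it cannot make.

```python
import json
import re

KEBAB = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate_scenario(slug: str, criteria_text: str) -> list[str]:
    """Check a scenario slug and its criteria.json against the stated constraints."""
    errors = []
    if not KEBAB.match(slug):
        errors.append(f"scenario name '{slug}' is not kebab-case")
    criteria = json.loads(criteria_text)
    if criteria.get("type") != "weighted_checklist":
        errors.append("type must be 'weighted_checklist'")
    checklist = criteria.get("checklist", [])
    if not checklist:
        errors.append("checklist is empty")
    for index, item in enumerate(checklist):
        for key in ("name", "max_score", "description"):
            if key not in item:
                errors.append(f"checklist[{index}] missing '{key}'")
    # Weights must total exactly 100 so scenario scores are comparable.
    total = sum(item.get("max_score", 0) for item in checklist)
    if total != 100:
        errors.append(f"max_score values sum to {total}, expected 100")
    return errors
```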
See eval-runner for full implementation (including the full Tessl CLI pipeline and --json vs --agent). Summary:
Path A — Tessl CLI (preferred): From repo root, tessl eval run ./skills/<skill-name> --json (add --agent=... when you need a fixed judge model; see eval-runner). Parse JSON output into per-scenario, per-criterion scores.
Path B — LLM-as-Judge Fallback: For each scenario, run two subagents (Agent tool) — one with task only (baseline), one with SKILL.md prepended (with-skill). Score each criterion by launching a judge subagent with the criterion description and agent output; request a JSON response {"score": N, "reasoning": "..."}.
Assemble results into a unified schema: date, method, model, scenarios (each with baseline score, with-skill score, delta, and per-criterion breakdown).
Calibration: If both paths are available, run both on the same scenarios. Accept if within ±15%; otherwise flag to user and prefer CLI results.
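The unified schema and the calibration rule can be sketched together. This is an illustration under assumptions: the text does not say whether ±15% is relative or absolute, so the sketch treats it as 15 absolute points on a 0-100 scale applied to the with-skill score. `ScenarioResult` and `calibrate` are hypothetical names.

```python
from dataclasses import dataclass

TOLERANCE = 15.0  # assumption: absolute points on a 0-100 scale


@dataclass
class ScenarioResult:
    name: str
    baseline: float
    with_skill: float

    @property
    def delta(self) -> float:
        """Improvement attributable to the skill for this scenario."""
        return self.with_skill - self.baseline


def calibrate(cli: list[ScenarioResult], judge: list[ScenarioResult]) -> list[str]:
    """Return scenario names where CLI and judge with-skill scores differ beyond TOLERANCE."""
    judge_by_name = {result.name: result for result in judge}
    flagged = []
    for result in cli:
        other = judge_by_name.get(result.name)
        if other is None:
            continue  # scenario was only run on one path; nothing to compare
        if abs(result.with_skill - other.with_skill) > TOLERANCE:
            flagged.append(result.name)
    return flagged
```

Any flagged scenario should be reported to the user, with the CLI result taken as authoritative.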
Analyze eval results, classify failures, and propose targeted edits. See activation-design and benchmark-loop for full failure pattern definitions and classification guidance.
| Pattern | Signal | Fix |
|---|---|---|
| Activation gap | Skill didn't fire / agent ignored instructions | Add explicit triggers to description; front-load non-negotiables |
| Ambiguous instruction | Inconsistent behavior across runs | Replace "consider"/"may want" with imperatives |
| Missing example | Agent doesn't know expected output shape | Add integrated example showing input → decision points → output |
| Regression | Negative delta vs. baseline | Identify which edit caused it; revert or rewrite |
| Context overload | Skill too long, agent loses focus | Compress; extract rules to companion files |
On user approval: apply edits → re-run Phase 4 → compare new vs. previous results → log to benchmark-log.md → flag any negative deltas immediately.
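Comparing the new run against the previous one is a simple per-scenario subtraction. A minimal sketch, assuming each run is reduced to a mapping from scenario name to with-skill score; `find_regressions` is a hypothetical helper.

```python
def find_regressions(previous: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Map scenario name to its score change, for scenarios that got worse."""
    return {
        name: current[name] - previous[name]
        for name in current
        # Scenarios absent from the previous run have no reference point.
        if name in previous and current[name] < previous[name]
    }
```

A non-empty result is the signal to flag immediately and to identify which edit caused the drop.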
Optional Tessl loop: Before Phase 4, re-run §2.6 from step 1 (tessl skill review through scenario refresh) to regenerate scenarios after major skill changes. All such mutations must finish before eval execution begins (→ non-negotiable #6).
After every eval run, append to skills/<skill-name>/benchmark-log.md:
## Run: <ISO-8601 timestamp>
**Method:** <tessl-cli | llm-as-judge> | **Model:** <model-name>
| Scenario | Baseline | With Skill | Delta |
|----------|----------|------------|-------|
| <name> | <score> | <score> | <+/-N> |
**Changes applied:** <summary of edits, or "Initial evaluation">
---
Create the file if it doesn't exist. Always append, never overwrite.
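The append-only rule maps directly onto opening the file in append mode. The sketch below mirrors the template above line for line; `format_run` and `append_run` are hypothetical helpers, and the optional `timestamp` argument exists only so output can be made deterministic.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def format_run(method: str, model: str, rows: list[tuple[str, int, int]],
               changes: str, timestamp: Optional[str] = None) -> str:
    """Render one benchmark-log entry; rows are (scenario, baseline, with_skill)."""
    timestamp = timestamp or datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        f"## Run: {timestamp}",
        f"**Method:** {method} | **Model:** {model}",
        "| Scenario | Baseline | With Skill | Delta |",
        "|----------|----------|------------|-------|",
    ]
    for name, baseline, with_skill in rows:
        # The "+" format flag prints an explicit sign, e.g. +25 or -5.
        lines.append(f"| {name} | {baseline} | {with_skill} | {with_skill - baseline:+} |")
    lines += [f"**Changes applied:** {changes}", "---", ""]
    return "\n".join(lines)


def append_run(log_path: Path, entry: str) -> None:
    """Append an entry, creating the file and parent directories if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" creates a missing file and never truncates earlier runs.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
```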
Warnings do not block. If warnings exist, offer to run another optimization cycle (return to Phase 5).
User says: "Create a skill for writing git commit messages"
Interview summary → decision map:
| # | Answer |
|---|---|
| 1 | "Generate conventional commit messages from staged diffs" |
| 2 | Developers |
| 3 | Code generation |
| 4 | Read staged diff; use Conventional Commits format; keep subject ≤72 chars; include body for non-trivial changes |
| 5 | Never fabricate changes not in the diff; never use vague subjects like "update code" |
| 6 | "commit message, write commit, git commit, conventional commit" |
| 7 | Structured message |
| 8 | No companion files |
| 9 | Bash (for git diff --staged) |
| 10 | Generate scenarios |
Scaffold produced:
skills/commit-message/
├── SKILL.md                    # Frontmatter with triggers, non-negotiables, format rules, integrated example
├── tile.json                   # oh-my-ai/commit-message, v1.0.0
└── evals/
    ├── simple-feature-commit/
    │   ├── task.md             # "Given this staged diff adding a login form..."
    │   └── criteria.json       # Tests: conventional format, subject length, body presence
    └── noisy-multi-file-commit/
        ├── task.md             # "Given this large diff touching 8 files..."
        └── criteria.json       # Tests: focus, not fabricating, correct scope
Repo integration: README row added (alphabetically); CI matrix updated.
tessl eval run / judge runs) or overwriting previous benchmark-log.md entries.
tessl skill review in the middle of Phase 4, or guessing a scenario generation id instead of parsing CLI output.
skills-lock.json when scaffolding a first-party skill under skills/.