
uinaf/skill-audit

Audit existing skills with Tessl scoring, trigger-coverage checks, repo conventions, or quick experiential feedback from a recent task. Use when revising skills, triaging weak activation, or turning observed skill guidance failures into scoped repo edits.

Quality

95%

Does it follow best practices?

Impact

95%

1.03x

Average score across 3 eval scenarios

Security by Snyk

Passed

No known issues


evals/scenario-1/criteria.json

{
  "context": "Tests whether the agent audits discovery quality (name specificity, description completeness), flags missing overlap boundaries, identifies misplaced content that should be handed off to a different skill, uses Tessl per skill, writes descriptions in third person with what/when/boundary, and avoids calling the optimizer without approval.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Tessl run per skill",
      "description": "audit-log.sh contains a separate Tessl invocation for each of the four skills (four distinct Tessl commands targeting different skill paths)",
      "max_score": 8
    },
    {
      "name": "Score per skill reported",
      "description": "audit-report.md states a Tessl score for each of the four skills individually",
      "max_score": 7
    },
    {
      "name": "Vague names flagged",
      "description": "audit-report.md identifies at least two of the four skill names (notifier, alerter, comms-router, data-pipeline) as vague, generic, or forgettable",
      "max_score": 8
    },
    {
      "name": "Missing boundaries flagged",
      "description": "audit-report.md flags that the overlapping skills (notifier, alerter, comms-router) lack explicit boundary statements distinguishing them from each other",
      "max_score": 9
    },
    {
      "name": "Third-person descriptions proposed",
      "description": "All four proposed replacement descriptions in rewrite-suggestions.md are written in third person (no first-person pronouns)",
      "max_score": 8
    },
    {
      "name": "What and when in descriptions",
      "description": "Each proposed replacement description in rewrite-suggestions.md states both what the skill does and when to use it",
      "max_score": 9
    },
    {
      "name": "Overlap boundary in descriptions",
      "description": "At least two of the proposed replacement descriptions in rewrite-suggestions.md explicitly state when NOT to use that skill (boundary with overlapping skills)",
      "max_score": 8
    },
    {
      "name": "Misplaced code review content flagged",
      "description": "audit-report.md identifies the Code Review Guidance section in data-pipeline/SKILL.md as out of scope and recommends it be moved or handed off",
      "max_score": 9
    },
    {
      "name": "Correct handoff skill named",
      "description": "audit-report.md names `review` (or a `review` skill) as the appropriate destination for the code review guidance, rather than leaving it in data-pipeline",
      "max_score": 9
    },
    {
      "name": "Evidence over taste",
      "description": "audit-report.md grounds each finding in Tessl output, repo conventions, or the actual file content — not subjective preference alone",
      "max_score": 8
    },
    {
      "name": "Optimizer not invoked",
      "description": "audit-log.sh does NOT contain `--optimize`",
      "max_score": 8
    },
    {
      "name": "Scope respected",
      "description": "The agent does NOT rewrite the skill body or workflow sections — only metadata (name/description) improvements are proposed, and structural body changes are flagged as separate findings",
      "max_score": 9
    }
  ]
}
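The `max_score` values in the checklist above sum to 100, so a grader's per-item awards could be read directly as a percentage. The following is a minimal sketch of that aggregation, not the scoring code Tessl actually uses: the function names `total_max` and `weighted_percent`, the `awarded` mapping, and the clamping of each award to its `max_score` are all assumptions made for illustration. Only the item names and `max_score` values come from the file.

```python
import json

# Item names and max_score values copied from criteria.json above;
# descriptions are omitted to keep the sketch short.
CRITERIA = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Tessl run per skill", "max_score": 8},
    {"name": "Score per skill reported", "max_score": 7},
    {"name": "Vague names flagged", "max_score": 8},
    {"name": "Missing boundaries flagged", "max_score": 9},
    {"name": "Third-person descriptions proposed", "max_score": 8},
    {"name": "What and when in descriptions", "max_score": 9},
    {"name": "Overlap boundary in descriptions", "max_score": 8},
    {"name": "Misplaced code review content flagged", "max_score": 9},
    {"name": "Correct handoff skill named", "max_score": 9},
    {"name": "Evidence over taste", "max_score": 8},
    {"name": "Optimizer not invoked", "max_score": 8},
    {"name": "Scope respected", "max_score": 9}
  ]
}
""")


def total_max(criteria: dict) -> int:
    """Sum of max_score over every checklist item."""
    return sum(item["max_score"] for item in criteria["checklist"])


def weighted_percent(criteria: dict, awarded: dict) -> float:
    """Hypothetical aggregation: earned points as a percentage of the total.

    `awarded` maps an item name to the points a grader gave it.
    Items missing from `awarded` count as 0, and each award is
    clamped to the range [0, max_score] (an assumption, not a documented rule).
    """
    earned = 0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, item["max_score"]))
    return 100.0 * earned / total_max(criteria)


if __name__ == "__main__":
    # Example: full marks on everything except the optimizer check.
    awards = {item["name"]: item["max_score"] for item in CRITERIA["checklist"]}
    awards["Optimizer not invoked"] = 0
    print(total_max(CRITERIA), weighted_percent(CRITERIA, awards))
```

Under these assumptions, a run that earns full marks everywhere but invokes `--optimize` in audit-log.sh would lose the 8 points attached to "Optimizer not invoked".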
