oh-my-ai/skill-maker

Interactive skill creation and eval-driven optimization. Triggers: create a skill, make a skill, new skill, scaffold skill, optimize skill, run evals, improve skill. Uses AskUserQuestion for interview; WebSearch for research; Bash for eval execution. Outputs: complete skill directory with SKILL.md, tile.json, evals, and repo integration.

Score: 93 (1.26x)

Quality: 94% (does it follow best practices?)

Impact: 91%, 1.26x (average score across 3 eval scenarios)

Security (by Snyk): passed, no known issues

evals/scenario-3/criteria.json

{
  "context": "Tests whether the agent follows the prescribed optimization workflow: prioritizing negative deltas first, then 0% criteria, then low-delta scenarios; producing specific actionable edits rather than vague suggestions; and appending to the benchmark log rather than overwriting it. The existing benchmark-log.md content must be preserved.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Benchmark log preserved",
      "description": "The existing entries in benchmark-log.md are present unchanged in the output file — prior run data is NOT removed, truncated, or overwritten",
      "max_score": 14
    },
    {
      "name": "New entry appended",
      "description": "A new dated entry for this eval run is appended AFTER the existing entries in benchmark-log.md (not inserted before them or replacing them)",
      "max_score": 8
    },
    {
      "name": "Negative delta addressed first",
      "description": "The optimization proposals address the scenario/criterion that has a negative delta (scenario-2 / 'Changelog check') BEFORE addressing lower-priority issues",
      "max_score": 14
    },
    {
      "name": "Zero-percent criteria addressed",
      "description": "The optimization proposals include a fix for the criterion that scored 0% with-skill (the 'Security patterns' criterion in scenario-1)",
      "max_score": 7
    },
    {
      "name": "Proposals are specific edits",
      "description": "Each optimization proposal contains specific wording to add, remove, or restructure in SKILL.md — does NOT only say things like 'add more examples', 'clarify this section', or 'improve coverage'",
      "max_score": 14
    },
    {
      "name": "No vague direction",
      "description": "None of the optimization proposals use vague direction phrases without a concrete edit: 'add more examples', 'make it clearer', 'consider adding', 'try to include' are absent from proposals",
      "max_score": 10
    },
    {
      "name": "Priority order followed",
      "description": "The list of proposed edits is ordered: negative-delta issues first, then 0% criteria, then lowest-delta scenarios — not in arbitrary or alphabetical order",
      "max_score": 10
    },
    {
      "name": "Eval read-only respected",
      "description": "The output does NOT include a modified SKILL.md or tile.json — optimization proposals are documented separately (e.g. in an analysis file), not applied directly to skill source files",
      "max_score": 14
    },
    {
      "name": "Result schema present",
      "description": "The analysis output includes a structured summary with at minimum: scenario names, baseline scores, with-skill scores, and deltas",
      "max_score": 4
    },
    {
      "name": "Readout table format",
      "description": "The benchmark-log.md new entry includes a Markdown table with columns for scenario, baseline, with-skill, and delta — matching the format of existing entries in the log",
      "max_score": 5
    }
  ]
}
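The `max_score` values in the checklist above sum to 100, so a scenario score can be read directly as a percentage. The sketch below shows one way such a weighted checklist could be totalled. It assumes each criterion is awarded between 0 and `max_score` points and that the scenario score is the earned share of the total; the actual grader may work differently. The `awarded` points and the helper names `total_weight` and `score_percent` are hypothetical.

```python
# Criterion names and weights copied from the checklist above.
CRITERIA = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "Benchmark log preserved", "max_score": 14},
        {"name": "New entry appended", "max_score": 8},
        {"name": "Negative delta addressed first", "max_score": 14},
        {"name": "Zero-percent criteria addressed", "max_score": 7},
        {"name": "Proposals are specific edits", "max_score": 14},
        {"name": "No vague direction", "max_score": 10},
        {"name": "Priority order followed", "max_score": 10},
        {"name": "Eval read-only respected", "max_score": 14},
        {"name": "Result schema present", "max_score": 4},
        {"name": "Readout table format", "max_score": 5},
    ],
}


def total_weight(criteria: dict) -> int:
    """Sum of all max_score values in the checklist."""
    return sum(item["max_score"] for item in criteria["checklist"])


def score_percent(criteria: dict, awarded: dict) -> float:
    """Earned points as a percentage of the total weight.

    `awarded` maps criterion name -> points; missing criteria count as 0,
    and each value is clamped to the range [0, max_score].
    """
    earned = 0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, item["max_score"]))
    return 100.0 * earned / total_weight(criteria)


# Hypothetical grading: full marks everywhere except one missed criterion.
awarded = {item["name"]: item["max_score"] for item in CRITERIA["checklist"]}
awarded["Negative delta addressed first"] = 0

result = score_percent(CRITERIA, awarded)
print(f"total weight = {total_weight(CRITERIA)}, score = {result:.1f}%")
```

Clamping keeps a single over-awarded criterion from pushing the total past 100%.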

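The checklist's `context` describes two behaviours that are easy to get wrong: ordering proposals by priority (negative deltas, then 0% criteria, then lowest delta) and appending to `benchmark-log.md` without overwriting it. The sketch below illustrates both. All scores, the date, the existing log content, and the exact table layout are hypothetical; only the criterion and scenario names come from the checklist.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass
class Finding:
    scenario: str
    criterion: str
    baseline: float    # percent without the skill
    with_skill: float  # percent with the skill

    @property
    def delta(self) -> float:
        return self.with_skill - self.baseline


def priority_key(f: Finding):
    """Tier 0: negative delta; tier 1: 0% with-skill; tier 2: everything else.
    Within a tier, the lowest delta comes first."""
    if f.delta < 0:
        tier = 0
    elif f.with_skill == 0:
        tier = 1
    else:
        tier = 2
    return (tier, f.delta)


def format_entry(date: str, findings) -> str:
    """Build a dated Markdown entry with a scenario/baseline/with-skill/delta table."""
    lines = [
        f"\n## {date}\n",
        "| Scenario | Baseline | With-skill | Delta |",
        "| --- | --- | --- | --- |",
    ]
    for f in findings:
        lines.append(
            f"| {f.scenario} | {f.baseline:.0f}% | {f.with_skill:.0f}% | {f.delta:+.0f}% |"
        )
    return "\n".join(lines) + "\n"


def append_entry(log_path: Path, entry: str) -> None:
    """Open in append mode so existing entries are never rewritten."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)


# Hypothetical findings, deliberately listed out of priority order.
findings = [
    Finding("scenario-3", "Result schema present", 70, 90),
    Finding("scenario-1", "Security patterns", 0, 0),
    Finding("scenario-3", "Readout table format", 80, 85),
    Finding("scenario-2", "Changelog check", 60, 50),
]
ordered = sorted(findings, key=priority_key)

# Hypothetical prior log content, written to a temporary directory for the demo.
existing = "# Benchmark log\n\n## 2024-01-01\n\n(previous run)\n"
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "benchmark-log.md"
    log.write_text(existing, encoding="utf-8")
    append_entry(log, format_entry("2024-01-02", ordered))
    final_text = log.read_text(encoding="utf-8")

print([f.criterion for f in ordered])
```

Opening the file with mode `"a"` rather than `"w"` is what keeps the prior entries intact; the new entry always lands after them.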