oh-my-ai/skill-maker

Interactive skill creation and eval-driven optimization. Triggers: create a skill, make a skill, new skill, scaffold skill, optimize skill, run evals, improve skill. Uses AskUserQuestion for interview; WebSearch for research; Bash for eval execution. Outputs: complete skill directory with SKILL.md, tile.json, evals, and repo integration.

93 · 1.26x

Quality: 94% (Does it follow best practices?)

Impact: 91%, 1.26x (average score across 3 eval scenarios)

Security by Snyk: Passed (no known issues)

evals/scenario-2/criteria.json

{
  "context": "Tests whether the agent structures the skill creation interview correctly: using the AskUserQuestion mechanism for user-facing choices, always including an uncertainty option in every question, covering all required topic areas, and running the complete interview before producing any scaffold. The task asks the agent to document their interview plan as a structured artifact, making the process observable.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "AskUserQuestion mechanism",
      "description": "The interview plan document explicitly names AskUserQuestion (or the equivalent tool call) as the mechanism for presenting questions to the user — does NOT describe printing questions via echo, console output, or inline text responses",
      "max_score": 12
    },
    {
      "name": "Uncertainty option present",
      "description": "Every question in the interview plan includes an uncertainty/escape option (e.g. 'Not sure yet', 'Unsure — decide for me', or equivalent) as one of the selectable answers",
      "max_score": 12
    },
    {
      "name": "Core purpose question",
      "description": "The interview plan includes a question that captures the skill's core purpose or what problem it solves",
      "max_score": 6
    },
    {
      "name": "Trigger signals question",
      "description": "The interview plan includes a question about what user phrases, keywords, or situations should activate the skill",
      "max_score": 6
    },
    {
      "name": "Non-negotiables question",
      "description": "The interview plan includes a question about steps or behaviors the skill must always perform",
      "max_score": 6
    },
    {
      "name": "Gotchas/warnings question",
      "description": "The interview plan includes a question about edge cases, warnings, or things that can go wrong",
      "max_score": 6
    },
    {
      "name": "Anti-patterns question",
      "description": "The interview plan includes a question about behaviors or patterns the skill must avoid",
      "max_score": 6
    },
    {
      "name": "Outputs/artifacts question",
      "description": "The interview plan includes a question about what the skill produces (files, messages, side-effects)",
      "max_score": 6
    },
    {
      "name": "Completeness check logic",
      "description": "The interview plan or accompanying notes describe a completeness check: if core purpose, trigger signals, gotchas, or sub-steps are still marked 'unsure' after all questions, a follow-up question is asked before proceeding",
      "max_score": 10
    },
    {
      "name": "Interview before scaffold",
      "description": "The plan/document makes clear that all interview questions are completed BEFORE any scaffold files are generated — does NOT interleave file generation with question-asking",
      "max_score": 12
    },
    {
      "name": "Question count",
      "description": "The interview plan contains at least 8 distinct questions (covering different topic areas)",
      "max_score": 6
    },
    {
      "name": "Scaffold follows from answers",
      "description": "Any scaffold produced (SKILL.md, tile.json, etc.) reflects the answers provided in the scenario input — the interview responses are used, not ignored",
      "max_score": 6
    },
    {
      "name": "metadata.version in scaffold",
      "description": "If SKILL.md is produced, its frontmatter includes a metadata.version field",
      "max_score": 6
    }
  ]
}
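The file declares its type as "weighted_checklist" but does not say how the per-criterion points are combined. The sketch below assumes the simplest reading: each criterion's awarded points are clamped to [0, max_score], the clamped points are summed, and the sum is reported as a percentage of the summed max_score values (which total exactly 100 for this checklist). The `weighted_score` function and the sample `awarded` grades are hypothetical illustrations, not the Tessl grader's actual implementation.

```python
import json

# Trimmed copy of the checklist above (names and max_score only).
# In practice you would json.load() evals/scenario-2/criteria.json instead.
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "AskUserQuestion mechanism", "max_score": 12},
    {"name": "Uncertainty option present", "max_score": 12},
    {"name": "Core purpose question", "max_score": 6},
    {"name": "Trigger signals question", "max_score": 6},
    {"name": "Non-negotiables question", "max_score": 6},
    {"name": "Gotchas/warnings question", "max_score": 6},
    {"name": "Anti-patterns question", "max_score": 6},
    {"name": "Outputs/artifacts question", "max_score": 6},
    {"name": "Completeness check logic", "max_score": 10},
    {"name": "Interview before scaffold", "max_score": 12},
    {"name": "Question count", "max_score": 6},
    {"name": "Scaffold follows from answers", "max_score": 6},
    {"name": "metadata.version in scaffold", "max_score": 6}
  ]
}
"""


def weighted_score(criteria: dict, awarded: dict[str, float]) -> float:
    """Hypothetical scorer: percentage of the checklist's maximum points earned.

    Assumption: awarded points are clamped to [0, max_score] per criterion,
    and any criterion missing from `awarded` counts as 0.
    """
    total = 0.0
    maximum = 0.0
    for item in criteria["checklist"]:
        cap = item["max_score"]
        points = awarded.get(item["name"], 0.0)
        total += min(max(points, 0.0), cap)  # clamp to the criterion's range
        maximum += cap
    return 100.0 * total / maximum


criteria = json.loads(CRITERIA_JSON)
max_total = sum(item["max_score"] for item in criteria["checklist"])
print(max_total)  # 100

# Hypothetical grading: full marks except the completeness check was missed.
awarded = {item["name"]: item["max_score"] for item in criteria["checklist"]}
awarded["Completeness check logic"] = 0
print(weighted_score(criteria, awarded))  # 90.0
```

Because the weights sum to 100, each max_score can be read directly as the percentage of the scenario it carries: the three 12-point criteria (AskUserQuestion mechanism, uncertainty option, interview before scaffold) together account for over a third of the total.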
