
oh-my-ai/skill-maker

Interactive skill creation and eval-driven optimization. Triggers: create a skill, make a skill, new skill, scaffold skill, optimize skill, run evals, improve skill. Uses AskUserQuestion for interview; WebSearch for research; Bash for eval execution. Outputs: complete skill directory with SKILL.md, tile.json, evals, and repo integration.

Score: 93
Quality: 94% (does it follow best practices?)
Impact: 91%, 1.26x (average score across 3 eval scenarios)
Security (by Snyk): Passed, no known issues


evals/scenario-3/task.md

Analyse Eval Results and Propose Improvements for the Git Commit Helper Skill

Problem / Feature Description

The AI tooling team at a software consultancy has been running a skill called git-commit-helper for several weeks. This skill helps engineers write clear, consistent git commit messages. Recently the team ran a round of evaluations to measure how much the skill actually improves agent behaviour compared to a baseline (no skill).

The results are mixed. Some scenarios show strong improvement, but one scenario shows the skill making things worse than the baseline, and some criteria showed no improvement at all (a delta of zero) even with the skill present. The team lead wants to understand what's going wrong and get a concrete action plan before the next sprint.

Your job is to analyse the eval results, record the outcome in the project's benchmark log, and produce a prioritised list of specific proposed edits to address the failures. The analysis should be ready to hand to the engineer who will actually make the edits.

Output Specification

  • benchmark-log.md — update this file with the new eval run results (the existing file is provided; preserve its history)
  • optimization-proposals.md — a prioritised list of specific proposed edits to SKILL.md, ready for an engineer to action

Do NOT modify SKILL.md or tile.json directly — capture all proposed changes in optimization-proposals.md.
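As a starting point for the analysis, the triage step can be sketched in Python. This is a minimal sketch, not part of the task's inputs: `flag_problems` is a hypothetical helper that assumes the structure of the `inputs/eval-results.json` file provided below, and reports any scenario where the with-skill score falls below baseline, plus any criterion whose delta is negative or zero.

```python
def flag_problems(results: dict) -> list[str]:
    """List scenario regressions and criteria whose delta is negative or zero."""
    findings = []
    for sc in results["scenarios"]:
        # A scenario regresses when the skill scores below the no-skill baseline.
        if sc["withSkill"] < sc["baseline"]:
            findings.append(
                f"REGRESSION {sc['scenario']}: {sc['baseline']} -> {sc['withSkill']} ({sc['delta']:+d})"
            )
        for c in sc["criteria"]:
            if c["delta"] < 0:
                # The skill made this specific criterion worse.
                findings.append(f"  {sc['scenario']}: criterion regressed: {c['name']} ({c['delta']:+d})")
            elif c["delta"] == 0:
                # The skill had no measurable effect on this criterion.
                findings.append(f"  {sc['scenario']}: no lift: {c['name']} stuck at {c['withSkill']}")
    return findings
```

To run it against the provided file, load it with `json.load` (for example `results = json.load(open("inputs/eval-results.json"))`) and print the returned list. On the 2026-04-07 data it should flag scenario-2 as a regression (65 -> 60), its Merge commit format, Changelog check, and No squash marker criteria as regressed, and Security patterns in scenarios 0 and 1 as having no lift.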

Input Files

The following files are provided as inputs. Extract them before beginning.

=============== FILE: inputs/benchmark-log.md ===============

Benchmark Log — git-commit-helper

Run 2026-03-01 | method: llm-as-judge | model: claude-opus-4-6

| Scenario | Baseline | With Skill | Delta |
|---|---|---|---|
| scenario-0: basic commit | 52 | 81 | +29 |
| scenario-1: breaking change | 48 | 74 | +26 |
| scenario-2: merge commit | 61 | 79 | +18 |

Overall: Baseline avg 53.7 → With-skill avg 78.0 | Δ +24.3


Run 2026-03-15 | method: llm-as-judge | model: claude-opus-4-6

| Scenario | Baseline | With Skill | Delta |
|---|---|---|---|
| scenario-0: basic commit | 55 | 83 | +28 |
| scenario-1: breaking change | 50 | 77 | +27 |
| scenario-2: merge commit | 58 | 76 | +18 |

Overall: Baseline avg 54.3 → With-skill avg 78.7 | Δ +24.4
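Each run in the log follows the same shape: a header line, a per-scenario table, and an Overall line whose averages are unweighted means over the scenarios. A minimal Python sketch of producing that entry for a new run follows; `format_run` is a hypothetical helper, and the pipe-table layout is an assumption about how the log's tables are written.

```python
def format_run(date: str, method: str, model: str, rows: list[tuple[str, int, int]]) -> str:
    """Render one benchmark-log entry from (scenario, baseline, with_skill) rows."""
    lines = [
        f"Run {date} | method: {method} | model: {model}",
        "",
        "| Scenario | Baseline | With Skill | Delta |",
        "|---|---|---|---|",
    ]
    for name, base, skill in rows:
        # Delta is always shown with an explicit sign, e.g. +29 or -5.
        lines.append(f"| {name} | {base} | {skill} | {skill - base:+d} |")
    # Overall averages are unweighted means across scenarios.
    base_avg = sum(r[1] for r in rows) / len(rows)
    skill_avg = sum(r[2] for r in rows) / len(rows)
    lines.append("")
    lines.append(
        f"Overall: Baseline avg {base_avg:.1f} → With-skill avg {skill_avg:.1f} | Δ {skill_avg - base_avg:+.1f}"
    )
    return "\n".join(lines)
```

This sketch computes the overall delta from unrounded averages, so it can differ by 0.1 from subtracting the rounded figures. The 2026-03-15 entry records Δ +24.4 (78.7 - 54.3), whereas (236 - 163) / 3 is about 24.33; whichever convention the log keeps, it is worth applying consistently.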


=============== FILE: inputs/eval-results.json ===============

{
  "date": "2026-04-07",
  "method": "llm-as-judge",
  "model": "claude-opus-4-6",
  "scenarios": [
    {
      "scenario": "scenario-0: basic commit",
      "baseline": 56,
      "withSkill": 82,
      "delta": 26,
      "criteria": [
        { "name": "Conventional commit prefix", "baseline": 60, "withSkill": 90, "delta": 30 },
        { "name": "Subject line length", "baseline": 70, "withSkill": 95, "delta": 25 },
        { "name": "Imperative mood", "baseline": 45, "withSkill": 80, "delta": 35 },
        { "name": "No period at end", "baseline": 55, "withSkill": 75, "delta": 20 },
        { "name": "Security patterns", "baseline": 40, "withSkill": 40, "delta": 0 }
      ]
    },
    {
      "scenario": "scenario-1: breaking change",
      "baseline": 48,
      "withSkill": 71,
      "delta": 23,
      "criteria": [
        { "name": "BREAKING CHANGE footer", "baseline": 30, "withSkill": 75, "delta": 45 },
        { "name": "Scope in prefix", "baseline": 55, "withSkill": 80, "delta": 25 },
        { "name": "Body explains why", "baseline": 40, "withSkill": 65, "delta": 25 },
        { "name": "Security patterns", "baseline": 38, "withSkill": 38, "delta": 0 },
        { "name": "Blank line after subject", "baseline": 75, "withSkill": 90, "delta": 15 }
      ]
    },
    {
      "scenario": "scenario-2: merge commit",
      "baseline": 65,
      "withSkill": 60,
      "delta": -5,
      "criteria": [
        { "name": "Merge commit format", "baseline": 72, "withSkill": 68, "delta": -4 },
        { "name": "Changelog check", "baseline": 60, "withSkill": 48, "delta": -12 },
        { "name": "No squash marker", "baseline": 80, "withSkill": 77, "delta": -3 },
        { "name": "Co-author attribution", "baseline": 50, "withSkill": 62, "delta": 12 },
        { "name": "Blank line after subject", "baseline": 70, "withSkill": 85, "delta": 15 }
      ]
    }
  ]
}

=============== FILE: inputs/SKILL.md ===============

---
name: git-commit-helper
description: "Use this skill when writing or reviewing git commit messages. Triggers: write a commit message, review my commit, help me commit, check my git message."
metadata:
  version: "1.1.0"
  tags: git, commit, version-control
---

Git Commit Helper

Helps engineers write clear, consistent git commit messages following Conventional Commits.

Non-negotiables

  1. Always use the Conventional Commits prefix format: type(scope): description
  2. Keep subject line under 72 characters
  3. Use imperative mood in subject line ("add" not "added")
  4. Never end subject line with a period
  5. Always include a blank line between subject and body
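Rules 1, 2, 4, and 5 are mechanical enough to check in code; rule 3 (imperative mood) needs judgement and is left out. The Python sketch below is a hypothetical linter (`check_message` and `HEADER_RE` are illustrative names, not part of the skill). It assumes lowercase types, treats the scope as optional and allows the `!` breaking marker as the Conventional Commits specification does, and reads "under 72 characters" strictly as at most 71.

```python
import re

# Conventional Commits header: type, optional (scope), optional "!" marker, then ": description".
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<bang>!)?: (?P<desc>\S.*)$")


def check_message(message: str) -> list[str]:
    """Return rule violations for a commit message (an empty list means it passes)."""
    lines = message.split("\n")
    subject = lines[0]
    problems = []
    if not HEADER_RE.match(subject):
        problems.append("subject must use type(scope): description")
    if len(subject) >= 72:
        problems.append(f"subject is {len(subject)} chars; keep it under 72")
    if subject.endswith("."):
        problems.append("subject must not end with a period")
    # If there is anything after the subject, the second line must be blank.
    if len(lines) > 1 and lines[1].strip() != "":
        problems.append("a blank line must separate subject and body")
    return problems
```

Run against the example output below, `check_message` should return an empty list; a subject such as `Added feature X` fails the prefix rule.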

Process

Step 1 — Analyse the staged changes

Read the diff and identify the type of change: feat, fix, docs, refactor, chore, etc.

Step 2 — Draft the subject line

Write a subject under 72 chars using imperative mood with the correct prefix.

Step 3 — Write the body (if needed)

For non-trivial changes, explain why the change was made, not what files changed.

Step 4 — Add footers

For breaking changes, add a BREAKING CHANGE footer. For co-authored commits, add Co-authored-by lines.
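Step 4 can be sketched as a small Python builder that assembles the header, optional body, and footers. `build_message` is hypothetical; it assumes all footers go in a single final paragraph after one blank line (as Conventional Commits footers and git trailers expect) and that co-authors are passed as `Name <email>` strings.

```python
def build_message(type_, scope, subject, body=None, breaking=None, co_authors=()):
    """Assemble a Conventional Commits message; breaking and co_authors add footers."""
    header = f"{type_}({scope}): {subject}" if scope else f"{type_}: {subject}"
    parts = [header]
    if body:
        parts.append(body)
    footers = []
    if breaking:
        footers.append(f"BREAKING CHANGE: {breaking}")
    footers.extend(f"Co-authored-by: {name}" for name in co_authors)
    if footers:
        # Keep all footers together in the last paragraph.
        parts.append("\n".join(footers))
    # Subject, body, and footer block are separated by one blank line each.
    return "\n\n".join(parts)
```

Keeping the footers in one trailing block means `BREAKING CHANGE` and `Co-authored-by` lines stay adjacent, which is where tools that parse trailers look for them.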

Example

Input: staged changes adding a new API endpoint for user authentication

Output:

feat(auth): add user authentication endpoint

Adds POST /api/auth/login that validates credentials against the user
store and returns a JWT. Replaces the legacy session-based flow.

Anti-patterns

  • Using past tense ("added feature X")
  • Subject lines over 72 characters
  • Forgetting the BREAKING CHANGE footer on breaking changes
  • Including file names in the subject instead of describing the intent
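Two of these anti-patterns can be approximated with simple heuristics. In the Python sketch below, `lint_subject` is a hypothetical helper that takes the description part of the subject: it flags a first word ending in "ed" as likely past tense, and any token shaped like `name.ext` as a file name. Both checks are rough: a verb such as "embed" triggers a false positive, and an abbreviation like "e.g." can look like a file name.

```python
import re

# Heuristic only: a first word ending in "ed" is treated as past tense.
PAST_TENSE_RE = re.compile(r"^\w+ed$", re.IGNORECASE)
# Heuristic only: a token like "utils.py" or "src/app.ts" looks like a file name.
FILE_NAME_RE = re.compile(r"\b[\w/.-]+\.[A-Za-z]{1,4}\b")


def lint_subject(description: str) -> list[str]:
    """Return anti-pattern warnings for a subject description."""
    warnings = []
    words = description.split()
    if words and PAST_TENSE_RE.match(words[0]):
        warnings.append(f"'{words[0]}' looks like past tense; use imperative mood")
    if FILE_NAME_RE.search(description):
        warnings.append("subject mentions a file name; describe the intent instead")
    return warnings
```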
