oh-my-ai/skill-maker

Interactive skill creation and eval-driven optimization. Triggers: create a skill, make a skill, new skill, scaffold skill, optimize skill, run evals, improve skill. Uses AskUserQuestion for interview; WebSearch for research; Bash for eval execution. Outputs: complete skill directory with SKILL.md, tile.json, evals, and repo integration.

Quality: 94% (Does it follow best practices?)

Impact: 91%, 1.26x (average score across 3 eval scenarios)

Security: Passed, no known issues

Analyse Eval Results and Propose Improvements for the Git Commit Helper Skill

Name: oh-my-ai/skill-maker
Rating: 93.4 (1 review)
Author: oh-my-ai

Problem / Feature Description

The AI tooling team at a software consultancy has been running a skill called git-commit-helper for several weeks. This skill helps engineers write clear, consistent git commit messages. Recently the team ran a round of evaluations to measure how much the skill actually improves agent behaviour compared to a baseline (no skill).

The results are mixed. Some scenarios show strong improvement, but one scenario shows the skill making things worse than the baseline, and several criteria show no improvement at all (a delta of zero) even with the skill present. The team lead wants to understand what's going wrong and get a concrete action plan before the next sprint.

Your job is to analyse the eval results, record the outcome in the project's benchmark log, and produce a prioritised list of specific proposed edits to address the failures. The analysis should be ready to hand to the engineer who will actually make the edits.
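As a starting point for the analysis, the following minimal sketch separates criteria that regressed (negative delta) from criteria the skill did not move (zero delta). The field names `scenarios`, `criteria`, `name`, and `delta` are taken from the eval-results.json input; the function name `flag_criteria` and the `sample` excerpt are illustrative assumptions, not part of the skill or its tooling.

```python
def flag_criteria(results):
    """Return (regressions, flat) as lists of (scenario, criterion, delta)."""
    regressions, flat = [], []
    for scenario in results["scenarios"]:
        for criterion in scenario["criteria"]:
            row = (scenario["scenario"], criterion["name"], criterion["delta"])
            if criterion["delta"] < 0:
                regressions.append(row)
            elif criterion["delta"] == 0:
                flat.append(row)
    # Worst regression first, so it lands at the top of the proposal list.
    regressions.sort(key=lambda row: row[2])
    return regressions, flat


# Excerpt of eval-results.json (hypothetical subset, values copied from the input).
sample = {
    "scenarios": [
        {
            "scenario": "scenario-0: basic commit",
            "criteria": [
                {"name": "Imperative mood", "delta": 35},
                {"name": "Security patterns", "delta": 0},
            ],
        },
        {
            "scenario": "scenario-2: merge commit",
            "criteria": [
                {"name": "Merge commit format", "delta": -4},
                {"name": "Changelog check", "delta": -12},
                {"name": "No squash marker", "delta": -3},
                {"name": "Co-author attribution", "delta": 12},
            ],
        },
    ]
}

regressions, flat = flag_criteria(sample)
for scenario, name, delta in regressions + flat:
    print(f"{scenario} | {name} | {delta:+d}")
```

Sorting by delta is only one possible priority order; a team might instead weight criteria by how often the scenario occurs in practice.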

Output Specification

benchmark-log.md — update this file with the new eval run results (the existing file is provided; preserve its history)
optimization-proposals.md — a prioritised list of specific proposed edits to SKILL.md, ready for an engineer to action

Do NOT modify SKILL.md or tile.json directly — capture all proposed changes in optimization-proposals.md.
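When appending the new run to benchmark-log.md, the "Overall" line has to follow the same arithmetic as the existing entries. The sketch below is one way to compute it. It assumes the delta is the difference of the two rounded averages, which is inferred from the 2026-03-15 entry (78.7 − 54.3 = +24.4, whereas averaging the per-scenario deltas would give 24.3); the function name `overall_line` is hypothetical.

```python
def overall_line(rows):
    """Build the 'Overall' summary line for one benchmark run.

    rows: list of (baseline, with_skill) score pairs, one per scenario.
    """
    base_avg = round(sum(b for b, _ in rows) / len(rows), 1)
    skill_avg = round(sum(s for _, s in rows) / len(rows), 1)
    # Assumption: delta is taken from the rounded averages, as the log appears to do.
    delta = skill_avg - base_avg
    return (
        f"Overall: Baseline avg {base_avg:.1f} → "
        f"With-skill avg {skill_avg:.1f} | Δ {delta:+.1f}"
    )


# Reproduces the existing 2026-03-15 entry.
print(overall_line([(55, 83), (50, 77), (58, 76)]))
# Scenario scores from the 2026-04-07 eval-results.json.
print(overall_line([(56, 82), (48, 71), (65, 60)]))
```

Checking the function against an entry already in the log before using it for the new run is a cheap way to confirm the rounding convention.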

Input Files

The following files are provided as inputs. Extract them before beginning.

=============== FILE: inputs/benchmark-log.md ===============

Benchmark Log — git-commit-helper

Run 2026-03-01 | method: llm-as-judge | model: claude-opus-4-6

Scenario	Baseline	With Skill	Delta
scenario-0: basic commit	52	81	+29
scenario-1: breaking change	48	74	+26
scenario-2: merge commit	61	79	+18

Overall: Baseline avg 53.7 → With-skill avg 78.0 | Δ +24.3

Run 2026-03-15 | method: llm-as-judge | model: claude-opus-4-6

Scenario	Baseline	With Skill	Delta
scenario-0: basic commit	55	83	+28
scenario-1: breaking change	50	77	+27
scenario-2: merge commit	58	76	+18

Overall: Baseline avg 54.3 → With-skill avg 78.7 | Δ +24.4

=============== FILE: inputs/eval-results.json ===============

{
  "date": "2026-04-07",
  "method": "llm-as-judge",
  "model": "claude-opus-4-6",
  "scenarios": [
    {
      "scenario": "scenario-0: basic commit",
      "baseline": 56,
      "withSkill": 82,
      "delta": 26,
      "criteria": [
        { "name": "Conventional commit prefix", "baseline": 60, "withSkill": 90, "delta": 30 },
        { "name": "Subject line length", "baseline": 70, "withSkill": 95, "delta": 25 },
        { "name": "Imperative mood", "baseline": 45, "withSkill": 80, "delta": 35 },
        { "name": "No period at end", "baseline": 55, "withSkill": 75, "delta": 20 },
        { "name": "Security patterns", "baseline": 40, "withSkill": 40, "delta": 0 }
      ]
    },
    {
      "scenario": "scenario-1: breaking change",
      "baseline": 48,
      "withSkill": 71,
      "delta": 23,
      "criteria": [
        { "name": "BREAKING CHANGE footer", "baseline": 30, "withSkill": 75, "delta": 45 },
        { "name": "Scope in prefix", "baseline": 55, "withSkill": 80, "delta": 25 },
        { "name": "Body explains why", "baseline": 40, "withSkill": 65, "delta": 25 },
        { "name": "Security patterns", "baseline": 38, "withSkill": 38, "delta": 0 },
        { "name": "Blank line after subject", "baseline": 75, "withSkill": 90, "delta": 15 }
      ]
    },
    {
      "scenario": "scenario-2: merge commit",
      "baseline": 65,
      "withSkill": 60,
      "delta": -5,
      "criteria": [
        { "name": "Merge commit format", "baseline": 72, "withSkill": 68, "delta": -4 },
        { "name": "Changelog check", "baseline": 60, "withSkill": 48, "delta": -12 },
        { "name": "No squash marker", "baseline": 80, "withSkill": 77, "delta": -3 },
        { "name": "Co-author attribution", "baseline": 50, "withSkill": 62, "delta": 12 },
        { "name": "Blank line after subject", "baseline": 70, "withSkill": 85, "delta": 15 }
      ]
    }
  ]
}

=============== FILE: inputs/SKILL.md ===============

name: git-commit-helper
description: "Use this skill when writing or reviewing git commit messages. Triggers: write a commit message, review my commit, help me commit, check my git message."
metadata:
  version: "1.1.0"
  tags: git, commit, version-control

Git Commit Helper

Helps engineers write clear, consistent git commit messages following Conventional Commits.

Non-negotiables

Always use the Conventional Commits prefix format: type(scope): description
Keep subject line under 72 characters
Use imperative mood in subject line ("add" not "added")
Never end subject line with a period
Always include a blank line between subject and body

Process

Step 1 — Analyse the staged changes

Read the diff and identify the type of change: feat, fix, docs, refactor, chore, etc.

Step 2 — Draft the subject line

Write a subject under 72 chars using imperative mood with the correct prefix.

Step 3 — Write the body (if needed)

For non-trivial changes, explain why the change was made, not what files changed.

Step 4 — Add footers

For breaking changes, add a BREAKING CHANGE footer. For co-authored commits, add Co-authored-by lines.

Example

Input: staged changes adding a new API endpoint for user authentication

Output:

feat(auth): add user authentication endpoint

Adds POST /api/auth/login that validates credentials against the user
store and returns a JWT. Replaces the legacy session-based flow.

Anti-patterns

Using past tense ("added feature X")
Subject lines over 72 characters
Forgetting the BREAKING CHANGE footer on breaking changes
Including file names in the subject instead of describing the intent
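Several of the non-negotiables in the SKILL.md above are mechanical enough to check in code, which can help when judging whether a proposed edit changes agent output. The following is a minimal sketch and not part of SKILL.md: the `PREFIX` pattern and the function `check_commit_message` are hypothetical, and imperative mood is deliberately left out because it cannot be checked reliably with a regular expression.

```python
import re

# Assumed shape of a Conventional Commits prefix: lowercase type,
# optional (scope), optional "!", then ": " and a description.
PREFIX = re.compile(r"^[a-z]+(\([^)]+\))?!?: \S")


def check_commit_message(message):
    """Return a list of rule violations found in a commit message string."""
    lines = message.split("\n")
    subject = lines[0]
    problems = []
    if not PREFIX.match(subject):
        problems.append("missing type(scope): prefix")
    if len(subject) >= 72:
        problems.append("subject is not under 72 characters")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    # A body, if present, must be separated from the subject by a blank line.
    if len(lines) > 1 and lines[1].strip() != "":
        problems.append("no blank line between subject and body")
    return problems


good = (
    "feat(auth): add user authentication endpoint\n"
    "\n"
    "Adds POST /api/auth/login that validates credentials against the user\n"
    "store and returns a JWT. Replaces the legacy session-based flow."
)
print(check_commit_message(good))                     # []
print(check_commit_message("Added login endpoint."))  # two violations
```

A check like this covers only the format rules; criteria such as "Body explains why" still need a human or an LLM judge.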
