mcollina/skill-optimizer

Optimizes AI skills for activation, clarity, and cross-model reliability. Use when creating or editing skill packs, diagnosing weak skill uptake, reducing regressions, tuning instruction salience, improving examples, shrinking context cost, or setting benchmark and release gates for skills. Trigger terms: skill optimization, activation gap, benchmark skill, with/without skill delta, regression, context budget, prompt salience.

Quality: 87% (Does it follow best practices?)
Impact: 87%, 1.14x (average score across 5 eval scenarios)
Security (by Snyk): Passed, no known issues

evals/scenario-2/task.md

Benchmark Report for a Commit Message Skill

Problem/Feature Description

Your team has been developing a skill called commit-message-writer that guides models in producing well-structured git commit messages (with a subject line, a body, and a Refs: footer). You've collected raw evaluation data from running three language models (ModelA, ModelB, ModelC) on five scenarios, each both with and without the skill active.

Your tech lead wants a formal benchmark report they can use to track the skill's effectiveness over time and decide what to improve next. The data is in the raw results file below. The report needs to be formatted consistently so it can be compared to future runs.
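Tracking effectiveness over time means comparing successive runs on the same footing. A minimal Python sketch, assuming each run's `results` object uses the same layout as the raw results file (model, then `without_skill`/`with_skill`, then scenario, then score). `delta_change` is a hypothetical helper, not part of the task: it reports how the with/without-skill delta moved between two runs and skips any model or scenario missing from either run.

```python
from typing import Dict

# Assumed layout: model -> "without_skill" / "with_skill" -> scenario -> score.
Results = Dict[str, Dict[str, Dict[str, int]]]


def delta_change(previous: Results, current: Results) -> Dict[str, Dict[str, int]]:
    """Change in (with_skill - without_skill) per model and scenario.

    Only models and scenarios present in both runs are compared; a positive
    value means the skill's uplift grew since the previous run.
    """
    changes: Dict[str, Dict[str, int]] = {}
    for model in sorted(set(previous) & set(current)):
        prev, curr = previous[model], current[model]
        # Compare only scenarios scored under both conditions in both runs.
        shared = (set(prev["with_skill"]) & set(prev["without_skill"])
                  & set(curr["with_skill"]) & set(curr["without_skill"]))
        changes[model] = {
            s: (curr["with_skill"][s] - curr["without_skill"][s])
            - (prev["with_skill"][s] - prev["without_skill"][s])
            for s in sorted(shared)
        }
    return changes
```

Keying the comparison on the delta rather than on raw with-skill scores keeps a change in the base model from being mistaken for a change in the skill.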

Output Specification

Produce a file called benchmark-report.md that contains:

  • A summary table of results across all models
  • A section highlighting any models or scenarios that stand out as concerning
  • A brief interpretation section recommending what to do next for each model/pattern you identified
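Generating the summary table rather than typing it by hand helps keep its layout identical from run to run. A minimal Python sketch, assuming the `results` object layout of the raw results file and an unweighted mean over scenarios; the column names and the helpers `summary_rows` and `render_markdown_table` are hypothetical choices, not requirements of the task.

```python
from statistics import mean
from typing import Dict, List

# Fixed column order so tables from different runs line up (assumed layout).
COLUMNS = ["Model", "Mean without", "Mean with", "Mean delta", "Regressed scenarios"]


def summary_rows(results: Dict[str, Dict[str, Dict[str, int]]]) -> List[List[str]]:
    """One row per model, sorted by name, with unweighted means over scenarios."""
    rows = []
    for model in sorted(results):
        without = results[model]["without_skill"]
        with_skill = results[model]["with_skill"]
        deltas = [with_skill[s] - without[s] for s in without]
        rows.append([
            model,
            f"{mean(without.values()):.1f}",
            f"{mean(with_skill.values()):.1f}",
            f"{mean(deltas):+.1f}",
            # Count scenarios where the skill lowered the score.
            str(sum(1 for d in deltas if d < 0)),
        ])
    return rows


def render_markdown_table(rows: List[List[str]]) -> str:
    """Render rows as a Markdown pipe table with the fixed COLUMNS header."""
    lines = ["| " + " | ".join(COLUMNS) + " |",
             "|" + "|".join(" --- " for _ in COLUMNS) + "|"]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)
```

Calling `render_markdown_table(summary_rows(...))` on the parsed `results` object yields one row per model; a per-scenario table could follow the same pattern with scenarios as rows.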

Also produce a file called methodology.md explaining how you structured the report and what decisions you made.

Input Files

The following files are provided as inputs. Extract them before beginning.

=============== FILE: inputs/raw-results.json ===============
{
  "skill": "commit-message-writer",
  "run_date": "2026-04-15",
  "scenarios": [
    "basic-feature-commit",
    "bug-fix-commit",
    "omission-stress-refs-footer",
    "noisy-context-large-diff",
    "multi-file-refactor"
  ],
  "results": {
    "ModelA": {
      "without_skill": {
        "basic-feature-commit": 72,
        "bug-fix-commit": 68,
        "omission-stress-refs-footer": 40,
        "noisy-context-large-diff": 55,
        "multi-file-refactor": 61
      },
      "with_skill": {
        "basic-feature-commit": 88,
        "bug-fix-commit": 85,
        "omission-stress-refs-footer": 90,
        "noisy-context-large-diff": 78,
        "multi-file-refactor": 80
      }
    },
    "ModelB": {
      "without_skill": {
        "basic-feature-commit": 80,
        "bug-fix-commit": 77,
        "omission-stress-refs-footer": 45,
        "noisy-context-large-diff": 60,
        "multi-file-refactor": 70
      },
      "with_skill": {
        "basic-feature-commit": 79,
        "bug-fix-commit": 76,
        "omission-stress-refs-footer": 88,
        "noisy-context-large-diff": 58,
        "multi-file-refactor": 68
      }
    },
    "ModelC": {
      "without_skill": {
        "basic-feature-commit": 30,
        "bug-fix-commit": 28,
        "omission-stress-refs-footer": 15,
        "noisy-context-large-diff": 20,
        "multi-file-refactor": 25
      },
      "with_skill": {
        "basic-feature-commit": 35,
        "bug-fix-commit": 32,
        "omission-stress-refs-footer": 22,
        "noisy-context-large-diff": 0,
        "multi-file-refactor": 30
      }
    }
  }
}
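As a starting point for the "concerning" section, here is a minimal Python sketch of per-scenario deltas and simple flags, assuming the JSON layout of inputs/raw-results.json. The flag rules (any negative delta, any with-skill score of 0, and a mean delta below a hypothetical `min_mean_delta` of 2.0 points) are illustrative assumptions, not criteria stated in the task; `load_results`, `scenario_deltas`, and `flag_concerns` are likewise hypothetical names.

```python
import json
from typing import Dict, List, Tuple


def load_results(path: str) -> dict:
    """Read a file laid out like inputs/raw-results.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def scenario_deltas(data: dict) -> Dict[str, Dict[str, int]]:
    """Map each model to {scenario: with_skill score - without_skill score}."""
    return {
        model: {s: runs["with_skill"][s] - runs["without_skill"][s] for s in data["scenarios"]}
        for model, runs in data["results"].items()
    }


def flag_concerns(data: dict, min_mean_delta: float = 2.0) -> List[Tuple[str, str, str]]:
    """Return (model, scenario, reason) tuples; thresholds are illustrative assumptions."""
    flags: List[Tuple[str, str, str]] = []
    for model, deltas in scenario_deltas(data).items():
        for scenario, delta in deltas.items():
            if delta < 0:  # the skill made this scenario worse
                flags.append((model, scenario, f"regression {delta:+d}"))
            if data["results"][model]["with_skill"][scenario] == 0:  # possible collapse
                flags.append((model, scenario, "with-skill score is 0"))
        mean_delta = sum(deltas.values()) / len(deltas)
        if mean_delta < min_mean_delta:  # skill barely moves this model overall
            flags.append((model, "(all)", f"mean delta {mean_delta:+.1f} below {min_mean_delta:.1f}"))
    return flags
```

On the data above, `flag_concerns(load_results("inputs/raw-results.json"))` reports ModelB's small regressions on four scenarios, ModelC's drop to 0 on noisy-context-large-diff, and ModelC's mean delta of +0.2.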
