Optimizes AI skills for activation, clarity, and cross-model reliability. Use when creating or editing skill packs, diagnosing weak skill uptake, reducing regressions, tuning instruction salience, improving examples, shrinking context cost, or setting benchmark and release gates for skills. Trigger terms: skill optimization, activation gap, benchmark skill, with/without skill delta, regression, context budget, prompt salience.
87%: Does it follow best practices?
Impact: 87%, 1.14x average score across 5 eval scenarios
Passed: no known issues
Your team has been developing a skill called commit-message-writer that guides models in producing well-structured git commit messages (with subject line, body, and a Refs: footer). You've collected raw evaluation data from running three different language models (ModelA, ModelB, ModelC) on five scenarios, both with and without the skill active.
Your tech lead wants a formal benchmark report they can use to track the skill's effectiveness over time and decide what to improve next. The data is in the raw results file below. The report needs to be formatted consistently so it can be compared to future runs.
Produce a file called benchmark-report.md that contains:
Also produce a file called methodology.md explaining how you structured the report and what decisions you made.
The following files are provided as inputs. Extract them before beginning.
=============== FILE: inputs/raw-results.json ===============
{
  "skill": "commit-message-writer",
  "run_date": "2026-04-15",
  "scenarios": [
    "basic-feature-commit",
    "bug-fix-commit",
    "omission-stress-refs-footer",
    "noisy-context-large-diff",
    "multi-file-refactor"
  ],
  "results": {
    "ModelA": {
      "without_skill": {
        "basic-feature-commit": 72,
        "bug-fix-commit": 68,
        "omission-stress-refs-footer": 40,
        "noisy-context-large-diff": 55,
        "multi-file-refactor": 61
      },
      "with_skill": {
        "basic-feature-commit": 88,
        "bug-fix-commit": 85,
        "omission-stress-refs-footer": 90,
        "noisy-context-large-diff": 78,
        "multi-file-refactor": 80
      }
    },
    "ModelB": {
      "without_skill": {
        "basic-feature-commit": 80,
        "bug-fix-commit": 77,
        "omission-stress-refs-footer": 45,
        "noisy-context-large-diff": 60,
        "multi-file-refactor": 70
      },
      "with_skill": {
        "basic-feature-commit": 79,
        "bug-fix-commit": 76,
        "omission-stress-refs-footer": 88,
        "noisy-context-large-diff": 58,
        "multi-file-refactor": 68
      }
    },
    "ModelC": {
      "without_skill": {
        "basic-feature-commit": 30,
        "bug-fix-commit": 28,
        "omission-stress-refs-footer": 15,
        "noisy-context-large-diff": 20,
        "multi-file-refactor": 25
      },
      "with_skill": {
        "basic-feature-commit": 35,
        "bug-fix-commit": 32,
        "omission-stress-refs-footer": 22,
        "noisy-context-large-diff": 0,
        "multi-file-refactor": 30
      }
    }
  }
}
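One way to turn the raw results into the with/without skill delta that the report depends on is sketched below. This is a minimal illustration, not a prescribed method: the helper names (`by_scenario`, `skill_deltas`, `mean_delta`, `regressions`) are hypothetical, the scores are copied by hand from the file above rather than loaded from disk, and treating any negative delta as a regression is an assumption the brief does not state.

```python
# Scenario order as listed in inputs/raw-results.json.
SCENARIOS = [
    "basic-feature-commit",
    "bug-fix-commit",
    "omission-stress-refs-footer",
    "noisy-context-large-diff",
    "multi-file-refactor",
]


def by_scenario(scores):
    """Pair a list of scores with SCENARIOS, in order (hypothetical helper)."""
    return dict(zip(SCENARIOS, scores))


# Scores copied from the "results" object of inputs/raw-results.json.
RESULTS = {
    "ModelA": {
        "without_skill": by_scenario([72, 68, 40, 55, 61]),
        "with_skill": by_scenario([88, 85, 90, 78, 80]),
    },
    "ModelB": {
        "without_skill": by_scenario([80, 77, 45, 60, 70]),
        "with_skill": by_scenario([79, 76, 88, 58, 68]),
    },
    "ModelC": {
        "without_skill": by_scenario([30, 28, 15, 20, 25]),
        "with_skill": by_scenario([35, 32, 22, 0, 30]),
    },
}


def skill_deltas(results):
    """Return {model: {scenario: with_skill score - without_skill score}}."""
    return {
        model: {
            scenario: runs["with_skill"][scenario] - runs["without_skill"][scenario]
            for scenario in runs["without_skill"]
        }
        for model, runs in results.items()
    }


def mean_delta(deltas):
    """Return {model: mean delta across that model's scenarios}."""
    return {model: sum(d.values()) / len(d) for model, d in deltas.items()}


def regressions(deltas):
    """Return (model, scenario, delta) wherever the skill lowered the score.

    Assumption: any negative delta counts as a regression; no noise
    threshold is applied because the brief does not define one.
    """
    return [
        (model, scenario, delta)
        for model, per_scenario in deltas.items()
        for scenario, delta in per_scenario.items()
        if delta < 0
    ]


if __name__ == "__main__":
    deltas = skill_deltas(RESULTS)
    for model, avg in mean_delta(deltas).items():
        print(f"{model}: mean delta {avg:+.1f}")
    for model, scenario, delta in regressions(deltas):
        print(f"regression: {model} / {scenario} ({delta:+d})")
```

On this data the mean deltas come out to +25.0 for ModelA, +7.4 for ModelB, and +0.2 for ModelC. Five cells show a negative delta: four small ones for ModelB and a -20 for ModelC on noisy-context-large-diff, where the with-skill score is 0. The raw file does not say whether that 0 is a genuine score or a failed run, so the report would need to flag it rather than average it in silently.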