Optimizes AI skills for activation, clarity, and cross-model reliability. Use when creating or editing skill packs, diagnosing weak skill uptake, reducing regressions, tuning instruction salience, improving examples, shrinking context cost, or setting benchmark and release gates for skills. Trigger terms: skill optimization, activation gap, benchmark skill, with/without skill delta, regression, context budget, prompt salience.
87%
Does it follow best practices?
Impact
87%
1.14x average score across 5 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent correctly applies release gate criteria: checking required pass conditions (no 0% criteria, no negative deltas on critical scenarios, recorded benchmark run, follow-up issues opened), soft pass conditions, and producing a complete PR checklist with all required items.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Identifies 0% criterion failure",
"description": "release-assessment.md flags ModelC's async-error-propagation score of 0 with skill as a universal failure / 0% criterion, and this counts as a required-pass-condition failure",
"max_score": 10
},
{
"name": "Identifies negative delta",
"description": "release-assessment.md identifies ModelC's logging-integration as a regression (with=40 < without=42, delta=-2) and treats it as a required-pass-condition failure",
"max_score": 10
},
{
"name": "Benchmark run entry recorded",
"description": "release-assessment.md contains a benchmark run log entry with the date (2026-04-16), the model matrix (ModelA/B/C), and delta values",
"max_score": 10
},
{
"name": "Follow-up issues required",
"description": "release-assessment.md states that follow-up issues must be opened for the 0% failure and the regression before or alongside the merge (not just 'nice to have')",
"max_score": 10
},
{
"name": "Go/no-go recommendation",
"description": "release-assessment.md contains an explicit go or no-go recommendation, and it is NO-GO (or conditional) given the 0% criterion and regression",
"max_score": 10
},
{
"name": "Soft pass: measurable gain noted",
"description": "release-assessment.md notes that the soft pass condition of 'at least one measurable gain on a target weak model' is met (ModelC shows gains on custom-error-classes and centralized-middleware)",
"max_score": 8
},
{
"name": "PR checklist: SKILL.md links updated",
"description": "pr-description.md includes a checklist item confirming SKILL.md links are updated for the new rules/async-patterns.md file",
"max_score": 8
},
{
"name": "PR checklist: benchmark run log entry",
"description": "pr-description.md includes a checklist item for the benchmark run log entry being added/updated",
"max_score": 8
},
{
"name": "PR checklist: validation commands",
"description": "pr-description.md includes a checklist item for validation command outputs (test, typecheck, lint) — and flags that these were NOT run (per the input data)",
"max_score": 8
},
{
"name": "PR checklist: tracking issues linked",
"description": "pr-description.md includes a checklist item for linking tracking issues and remediation notes",
"max_score": 8
},
{
"name": "Post-merge loop mentioned",
"description": "release-assessment.md or pr-description.md mentions scheduling a rerun after the next model update and/or comparing against prior run history",
"max_score": 5
},
{
"name": "Stale guidance pruning mentioned",
"description": "release-assessment.md or pr-description.md mentions pruning stale guidance that no longer moves metrics as part of post-merge maintenance",
"max_score": 5
}
]
}
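
The two required pass conditions that this rubric checks mechanically (no 0% criterion with the skill, no negative with/without delta) can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual tooling: the `Result` type and the `required_gate_failures` function are hypothetical names, every scenario is treated as critical for simplicity, and the only data points used are the two ModelC values quoted in the checklist descriptions above. The baseline for async-error-propagation is not stated there, so it is left unset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    """One benchmark cell: a model/scenario pair scored with and without the skill."""
    model: str
    scenario: str
    with_skill: float
    without_skill: Optional[float] = None  # None when the baseline is not known


def required_gate_failures(results: List[Result]) -> List[str]:
    """Return a human-readable reason for each required-pass-condition failure.

    Assumption: every scenario counts as critical. A real gate would
    likely filter to a configured list of critical scenarios first.
    """
    failures = []
    for r in results:
        # Required condition 1: no criterion may score 0% with the skill.
        if r.with_skill == 0:
            failures.append(f"{r.model}/{r.scenario}: 0% criterion with skill")
        # Required condition 2: the skill must not make a scenario worse.
        if r.without_skill is not None and r.with_skill < r.without_skill:
            delta = r.with_skill - r.without_skill
            failures.append(f"{r.model}/{r.scenario}: negative delta ({delta:+g})")
    return failures


# Only the values stated in the checklist descriptions are used here.
results = [
    Result("ModelC", "async-error-propagation", with_skill=0),
    Result("ModelC", "logging-integration", with_skill=40, without_skill=42),
]

failures = required_gate_failures(results)
verdict = "NO-GO" if failures else "GO"

print(verdict)
for reason in failures:
    print(" -", reason)
```

Each failure reason maps naturally onto a follow-up issue, which the rubric requires to be opened before or alongside the merge. Soft pass conditions, such as a measurable gain on a target weak model, are deliberately left out of the sketch because they inform the recommendation without blocking it.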