he-improve

Improve existing Harness Engineering implementations or workflows with evidence-backed changes. Use when users ask for targeted enhancement of shipped or drafted work.

Progressive Disclosure Entry

This entrypoint stays concise and keeps full operational context in archived references.

Philosophy

Optimize with measurable evidence, not subjective preference.
Keep experiments bounded and reversible.
Persist experiment state to disk as the source of truth; chat context is not durable state.

When to use

Use when behavior exists and needs targeted quality, reliability, or performance improvement.
Use after baseline implementation when iterative tuning is appropriate.
Use when multiple plausible changes should be compared under explicit gates instead of picking one implementation path up front.
Use when prior Codex sessions, archived sessions, or ~/.agents/session-collector evidence should become targeted Harness Engineering plugin or workflow improvements.

Inputs

Request, artifacts, repo context, linked Linear issues, and optional session evidence paths or collector output.

Outputs

Result, validation, blockers, and the next Harness Engineering action; include schema_version: 1 when the output is structured.
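
A structured output of this kind might be sketched as below. The field names follow the Outputs line; the Python types, the `ImproveOutput` class name, and the `next_action` key are assumptions for illustration, not a schema defined by this skill.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ImproveOutput:
    """Hypothetical container; only the field names come from the Outputs line."""

    result: str
    validation: str
    blockers: list[str] = field(default_factory=list)
    next_action: str = ""  # the next Harness Engineering action (key name assumed)
    schema_version: int = 1  # emitted when the output is structured


def to_json(output: ImproveOutput) -> str:
    """Serialize with sorted keys so repeated runs produce stable diffs."""
    return json.dumps(asdict(output), sort_keys=True)
```

Usage: `to_json(ImproveOutput(result="kept", validation="passed"))` yields a JSON object with an empty `blockers` list and `schema_version` set to 1.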

Procedure

Load or create the optimization spec and validate metric type, scope, gates, and stopping limits.
If session evidence is requested or supplied, read ../../../../../../references/session-evidence-contract.md and classify recurring signals before choosing improvements.
Decide whether the target should use direct hard metrics, judge scoring, session-recurrence evidence, or hybrid gates plus judge evaluation.
Detect and resolve fresh versus resume state before running new experiments.
Establish a trusted baseline with the measurement harness, collector output, index counts, or explicit evidence samples before widening execution.
Run bounded iterations with explicit measurement gates and isolated experiment state.
After each experiment or session-evidence pass, write results to disk immediately, verify the write, and only then report or compare outcomes.
Keep, revise, or discard changes based on measured outcomes or recurring evidence, then route proven results to the next stage.
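
The write-then-verify step in the procedure above could be sketched as follows. This is a minimal sketch, assuming experiment records are JSON-native dictionaries; the function name `persist_and_verify` and the file layout are hypothetical and not part of this skill.

```python
import json
import os
import tempfile
from pathlib import Path


def persist_and_verify(path: Path, record: dict) -> dict:
    """Write record as JSON, read it back, and return the verified copy.

    Assumes record holds only JSON-native values (no tuples or custom types),
    so the round-trip comparison below is meaningful.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(record, sort_keys=True)
    # Write to a temp file in the same directory, then rename over the target
    # so a crash never leaves a half-written state file behind.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    # Verify the write before anything is reported or compared.
    stored = json.loads(path.read_text(encoding="utf-8"))
    if stored != record:
        raise RuntimeError(f"verification failed for {path}")
    return stored
```

Reporting from the returned copy rather than the in-memory record keeps the disk file as the source of truth.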

Validation

Ensure the spec, metric mode, and measurement command are valid before experimentation starts.
Ensure session-derived improvements cite the collector output, archived path, session index count, or exact sample used.
Ensure each iteration has an explicit metric target and a rollback posture.
Ensure accepted changes are justified by observed improvement.
Ensure critical experiment state is written to disk and verified before moving on.
Fail fast: stop at the first failed gate and do not proceed.
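
The fail-fast rule can be illustrated with a small gate runner. This is a sketch under stated assumptions: gates are modeled as named zero-argument callables returning a boolean, and `run_gates` is a hypothetical helper, not an interface this skill defines.

```python
from typing import Callable

# A gate is a (name, check) pair; check returns True when the gate passes.
Gate = tuple[str, Callable[[], bool]]


def run_gates(gates: list[Gate]) -> tuple[bool, list[str]]:
    """Run gates in order and stop at the first failure.

    Returns (all_passed, log). Gates after a failure are never called.
    """
    log: list[str] = []
    for name, check in gates:
        if not check():
            log.append(f"FAILED: {name}")
            return False, log
        log.append(name)
    return True, log
```

Ordering gates from cheapest to most expensive (for example spec validity before a measurement run) keeps a failed run short.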

Constraints

Redact secrets, credentials, tokens, and sensitive data by default.
Do not broaden scope beyond bounded optimization goals.
Do not mutate the measurement harness or declared immutable surfaces inside experiment edits.
Do not summarize optimization results before they have been durably logged.
Apply the context-disposition policy: move important still-valid context to references and index it when meaningful; intentionally discard stale, duplicated, unsafe, superseded, or low-signal text.
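
Default redaction might look like the sketch below. The single pattern is an illustrative assumption that covers only `key=value` and `key: value` forms; it is not a vetted secret detector, and the `redact` name is hypothetical.

```python
import re

# Hypothetical pattern list; a real one needs review against actual log formats.
_SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\b(\s*[:=]\s*)(\S+)"),
]


def redact(text: str) -> str:
    """Replace the value part of key/value pairs whose key looks sensitive."""
    for pattern in _SECRET_PATTERNS:
        # Keep the key and separator, drop the value.
        text = pattern.sub(r"\1\2[REDACTED]", text)
    return text
```

Running `redact` on evidence samples before they are written to disk keeps the durable experiment log free of the values it matches.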

Anti-patterns

Tuning without baseline metrics.
Changing HE workflows from anecdotal memory without session evidence.
Keeping changes that do not improve target outcomes.
Running parallel experiments before confidence in the baseline and the readiness probe exists.
Treating optimization as a one-shot implementation instead of a measured keep-or-revert loop.
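
By contrast, a measured keep-or-revert loop could be sketched like this. The sketch assumes a single numeric metric where higher is better and a fixed iteration cap as the stopping limit; `keep_or_revert` and its parameters are hypothetical names, not part of this skill.

```python
from typing import Callable, Iterable

Candidate = tuple[str, Callable[[], float]]  # (name, measurement function)


def keep_or_revert(
    baseline: float,
    candidates: Iterable[Candidate],
    min_gain: float = 0.0,
    max_iterations: int = 5,
) -> tuple[float, list[tuple[str, float, str]]]:
    """Measure each candidate against the current best score.

    A candidate is kept only if it beats the current best by more than
    min_gain; otherwise it is reverted. Higher scores are assumed better.
    """
    best = baseline
    log: list[tuple[str, float, str]] = []
    for index, (name, measure) in enumerate(candidates):
        if index >= max_iterations:
            break  # stopping limit reached
        score = measure()
        if score - best > min_gain:
            best = score
            log.append((name, score, "keep"))
        else:
            log.append((name, score, "revert"))
    return best, log
```

A candidate that merely ties the current best is reverted, which matches the rule against keeping changes that do not improve target outcomes.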

Full Context

Assets: icon-small.png, icon-large.png

Examples

"Can you inspect this shipped retry workflow and prove the improvement with before and after metrics?"
"Help me tune this validation lane, but keep each experiment reversible and stop if the metric gets worse."
"This feature works, but the review loop is slow. Compare two bounded improvements and keep only the measured winner."
"Use archived Codex sessions and the session collector to find repeated HE workflow failures and improve the plugin."

Repository: jscraik/Agent-Skills
Commit: 5a6027f

Last updated: 5 days ago
Created: 5 days ago
