Design a loss function and harness for a long-running /goal optimization run (loss-function development, LFD). Use when the user wants to set up an autonomous optimization loop, distill a product from public artifacts, turn a spec into an optimization target, or asks to design a /goal. Observes the existing environment, interrogates the task, ingests or generates the spec, builds a blinded eval, generates and verifies the harness, red-teams the target for cheats, and emits goal.md ready to launch. Re-invoke in patch mode when a running loop cheated and the loss function needs patching.
You are designing an optimization target, not solving a task. The agent that
receives goal.md is a competent, tireless, literal optimizer: it will satisfy
the target by the cheapest available path — memorizing the eval, hardcoding
answers, mining feedback channels into lookup tables. Your job is to make
genuine capability the cheapest path left.
A spec says "build this, make the tests pass." A loss function says "build this, make the tests pass, then descend toward this bar on data you cannot see." You are writing the second thing. It has four parts: the target, the constraints, the instruments, and the forced entropy. Every /goal you emit must contain all four.
Two modes. Design mode (default): the phases below, in order. Patch mode (see end): a running loop cheated; fix the loss function, not the agent.
Inventory the environment BEFORE asking the user anything. The first principle of harness engineering is observability — apply it to your own task:
Reuse what exists — extend an existing scorer or eval rather than generating a parallel one. Whatever observation could not answer becomes Phase 1.
Ask the user in ONE batched round, only what Phase 0 couldn't answer:
The spec is the starting point, not the finish line. Before designing any optimization target:
spec.md.
goal.md must gate the outer loop behind the inner one: Stage 0 = build
to spec, tests green, before any descent on the eval. Never let the agent
optimize a half-built system against sparse, slow feedback.
If the user cannot hand over enough cases, build them. For a growing number of problems, real expected outputs are sitting in public:
eval/dev (scored freely, misses reported but capped) and
eval/holdout (scored rarely, aggregate-only; acceptance measured here
exclusively; answers outside the repo if at all possible).
Target.
Constraints.
Enumerate the cheats. Read references/cheat-museum.md, then list at
least 10 ways a lazy optimizer could max THIS metric without solving THIS
task. For each, write the fence: a constraint in goal.md AND a way to
detect violation. A constraint without an instrument is a vibe — the agent
will violate it cheerfully because it can't tell it's violating it.
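One way to keep that rule honest is to record every cheat as a (cheat, constraint, instrument) triple and refuse to proceed while any slot is empty. The sketch below is a minimal illustration, not part of the skill: the `Fence` type, the `unfenced` helper, and the example entries are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fence:
    cheat: str        # how a lazy optimizer could max the metric
    constraint: str   # the rule that goes into goal.md
    instrument: str   # how the harness detects a violation ("" = none yet)


def unfenced(fences, minimum=10):
    """Return the problems that should block emitting goal.md."""
    problems = []
    if len(fences) < minimum:
        problems.append(f"only {len(fences)} cheats listed, need {minimum}")
    for fence in fences:
        if not fence.constraint.strip():
            problems.append(f"no constraint for: {fence.cheat}")
        if not fence.instrument.strip():
            # A constraint with no detector: the "vibe" case.
            problems.append(f"no instrument for: {fence.cheat}")
    return problems


# Hypothetical entries, for illustration only.
fences = [
    Fence("hardcode answers", "no eval literal in code", "lint.sh overlap check"),
    Fence("edit the scorer", "harness/ is read-only", ""),
]
for problem in unfenced(fences, minimum=2):
    print(problem)
```

The point of the structure is that an empty `instrument` field is visible at design time, before the optimizer finds it at run time.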
Enforcement design rule. Any constraint that references eval content
(e.g. "no literal in the codebase may match an eval item") can only be
checked by the harness — the agent can't check it without reading the eval.
Put the check in harness/lint.sh, run it inside score.sh, and on
violation VOID the score and report nothing else. Naming the offending
literal turns your lint into a membership oracle the agent can mine
string-by-string (museum exhibit 12). Your enforcement instrument is itself
a feedback channel — leak-audit it like any other.
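The VOID rule can be sketched as follows. This is an illustrative Python model of the logic, not the actual harness/lint.sh or score.sh: the function names, the substring-match overlap test, and the private report path are all assumptions made for the example. What it shows is the asymmetry: details go to a file the optimizer cannot read, while the optimizer-visible output is a single constant line.

```python
import json
from pathlib import Path

VOID_LINE = "VOID: constraint violation"


def find_overlaps(source_text, eval_literals):
    """Return the eval literals that appear verbatim in the source text."""
    return sorted(lit for lit in eval_literals if lit and lit in source_text)


def score_with_lint(source_text, eval_literals, compute_score, private_report):
    """Run the lint first; on any hit, return only the constant VOID line.

    Offending literals are written to private_report, which is assumed to
    live outside the optimizer's read surface. They never reach the return
    value, so the output cannot be mined as a membership oracle.
    """
    hits = find_overlaps(source_text, eval_literals)
    if hits:
        Path(private_report).write_text(json.dumps({"overlaps": hits}, indent=2))
        return VOID_LINE
    # The score is only computed when the lint is clean.
    return f"score: {compute_score():.3f}"
```

Note that the VOID line is identical for one violation or fifty, and for any literal. Anything that varies with the violation (a count, a line number, a timing difference) is a channel to audit.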
Write these files now, tailored to the task. Do not ship placeholders. Reuse anything Phase 0 found.
harness/score.sh: the task-specific scorer. Pixel-diff for a UI clone
(deterministic rendering: frozen time, animations off, pinned fonts,
fixed viewport), recall@k + precision for retrieval, structured JSON diff
for API behavior. Runs lint.sh first: any violation voids the score
(output VOID: constraint violation and nothing more). Scores eval/dev
by default; --holdout returns one aggregate number, rate-limited, and
appends to an audit log.
harness/lint.sh: checks capacity caps and eval-literal overlap. Called
only by score.sh; its detailed findings go to a file outside the
optimizer's read surface, for the human.
harness/probe.sh: generates perturbed variants of dev INPUTS
(paraphrases, date shifts, entity swaps) and reports the dev-vs-probe
score gap. The gap is the memorization gauge.
harness/status.sh: per-step timestamps and total wall-clock elapsed;
spend so far AND projected burn before the next paid batch, per surface;
score history per cycle; and the optimizer's own token consumption where
session logs allow. Gain per token is the gradient of the optimization
itself, so the loop should be self-aware.
eval/dev/ and eval/holdout/: from Phase 3.
LOG.md: instantiate references/log-template.md, one entry per cycle
with hypothesis / expected failure mode / diagnostic / result, written
before the change, not after. This is what survives context compaction.

Do this now, with your own tools. Do not delegate it to the user:
score.sh on dev: it must produce a number.
probe.sh and status.sh once each.

Before emitting, simulate the laziest possible agent against your draft /goal: what is the five-minute win? Common ones: seed data that mirrors the eval, mining per-item miss feedback into a keyword lookup table, gaming a judge, editing the scorer or the goal itself, declaring victory on the dev set. Patch the draft and simulate again. Emit only when three consecutive simulations find nothing cheaper than doing the real work.
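When you run the probe during verification, the number to look at is the dev-vs-probe gap. A minimal sketch of that gauge, assuming exact-match scoring and a perturbation that preserves the expected answer (a date shift that changes the answer would need the label regenerated too); `accuracy`, `probe_gap`, and the toy addition task are hypothetical, chosen only to make the effect visible:

```python
from statistics import mean


def accuracy(predict, cases):
    """Fraction of (input, expected) pairs the predictor gets exactly right."""
    return mean(1.0 if predict(x) == y else 0.0 for x, y in cases)


def probe_gap(predict, dev_cases, perturb):
    """Return (dev score, probe score, gap) for label-preserving perturbations."""
    probe_cases = [(perturb(x), y) for x, y in dev_cases]
    dev = accuracy(predict, dev_cases)
    probe = accuracy(predict, probe_cases)
    return dev, probe, dev - probe


# Toy task: evaluate "a + b". Swapping the operands keeps the answer.
dev_cases = [("2 + 3", 5), ("10 + 4", 14), ("1 + 6", 7)]


def swap_operands(text):
    left, right = text.split(" + ")
    return f"{right} + {left}"


lookup_table = dict(dev_cases)


def memorizer(text):
    # Knows the dev inputs verbatim and nothing else.
    return lookup_table.get(text)


def solver(text):
    left, right = text.split(" + ")
    return int(left) + int(right)


print("memorizer:", probe_gap(memorizer, dev_cases, swap_operands))
print("solver:   ", probe_gap(solver, dev_cases, swap_operands))
```

Both predictors score perfectly on dev; only the gap separates them. A gap near zero is what genuine capability looks like, and a large gap is the lookup table showing through.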
Fill the structure in references/goal-template.md. Every placeholder gets
a task-specific value; no section is dropped. Invariants the emitted goal.md
must keep regardless of task: the Stage 0 tests-green gate, VOID semantics,
holdout-only acceptance, the read-only set including goal.md itself, the
per-cycle checkpoint commit, the entropy rules, and the stop conditions.
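A cheap tripwire before emitting is to scan the draft for each invariant and for leftover placeholders. The sketch below is an assumption-heavy illustration: the marker phrases and the `{{name}}` placeholder syntax are hypothetical, since the actual wording and placeholder style are defined by references/goal-template.md, not here.

```python
import re

# Hypothetical marker phrases, one per invariant listed above.
# Adjust them to match the wording of the real template.
REQUIRED_INVARIANTS = {
    "stage 0 gate": "stage 0",
    "VOID semantics": "void",
    "holdout-only acceptance": "holdout",
    "read-only set": "read-only",
    "checkpoint commit": "checkpoint",
    "entropy rules": "entropy",
    "stop conditions": "stop condition",
}

# Assumes placeholders look like {{name}}; the real template may differ.
PLACEHOLDER = re.compile(r"\{\{\s*[\w-]+\s*\}\}")


def missing_invariants(goal_text):
    """Names of invariants whose marker phrase never appears in the draft."""
    text = goal_text.lower()
    return [name for name, marker in REQUIRED_INVARIANTS.items()
            if marker not in text]


def unfilled_placeholders(goal_text):
    """Placeholders still left in the draft, in order of appearance."""
    return PLACEHOLDER.findall(goal_text)
```

A keyword match only proves the words are present, not that the section says the right thing, so treat a clean result as permission to start the real review rather than as the review.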
Everything else was verified in Phase 6. Tell the user:
A cheat mid-run is a bug in the target, not the agent. When invoked against a running or paused loop (the user reports a cheat, or LOG.md / a probe gap shows one):
LOG.md, the score history, and the diff since the last honest
checkpoint.
references/cheat-museum.md: what it looked
like → the fence that closed it.