Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).
Every constraint that can be expressed as an action the agent should take must be written that way. The ONLY negative-framed lines that may appear in the final prompt are inside the "Never Do" tier of the three-tier permission structure (safety-critical only — typically ≤3 items).
| ❌ Negative phrasing | ✅ Positive directive |
|---|---|
| "Do not use line-number-based patches" | "Use str_replace with exact unique strings to edit files" |
| "Don't return vague error messages" | "Return errors with expected format, constraints, and a correction example" |
| "Avoid relative paths" | "Use absolute filepaths for every file operation" |
| "Do not modify the user's git config" | "Limit git operations to commits and reads of the working tree" |
| "Never edit files outside the repo" | "Confine all edits to paths inside the configured repository root" |
| "Don't write to stdout in tool outputs" | "Route all tool output through the structured logger" |
| "Avoid hardcoding API keys" | "Read every secret from the configured env vars or vault path" |
Apply this rewrite pass once after the first draft, before shipping the prompt.
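The rewrite pass can be partly automated. Below is a minimal sketch of a lint that flags negative-framed lines outside the "Never Do" tier. The opener list, the function name `find_negative_lines`, and the assumption that tiers are introduced by Markdown headings are all illustrative choices, not part of the original guidance.

```python
import re

# Hypothetical heuristic: openers that usually signal a negative-framed rule.
NEGATIVE_OPENERS = re.compile(r"^\W*(do not|don't|never|avoid)\b", re.IGNORECASE)


def find_negative_lines(prompt: str, exempt_heading: str = "never do") -> list:
    """Return (line_number, text) pairs for negative-framed lines outside the exempt tier."""
    flagged = []
    in_exempt = False
    for number, line in enumerate(prompt.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            # Assumption: each tier starts with a Markdown heading, so a new
            # heading either opens or closes the exempt "Never Do" tier.
            in_exempt = exempt_heading in stripped.lower()
            continue
        if not in_exempt and NEGATIVE_OPENERS.match(stripped):
            flagged.append((number, stripped))
    return flagged
```

Each flagged line is a candidate for the positive rewrite shown in the table above; the lint only finds candidates and does not attempt the rewrite itself.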
Keep the system prompt minimal and inject contextual guidance immediately before the relevant decision or tool call, rather than front-loading every instruction. This approach is reported to achieve 100% accuracy across 600 runs, versus 82.5% for upfront-only instructions.
When the deliverable includes system-prompt.md, prompt-architecture.md, or another prompt architecture artifact, include a visible `## Just-in-Time Steering Protocol` section. This section must use the words "just-in-time" or "contextual steering" explicitly and include a small trigger table like this:
| Trigger | Injected guidance | Removed when |
|---|---|---|
| Editing a file over 500 lines | Require exact string replacement with 3+ lines of context | Edit succeeds or one retry fails |
| Touching an ask-first path | Require explicit user approval before write | Approval is granted or denied |
| Verification fails | Preserve the error verbatim and retry only after root-cause analysis | Next correction is attempted |
Implementation pattern — three layers:
Example — file-edit tool
[System prompt — always present]
You edit files using str_replace with absolute paths.
[Injected before any str_replace call when target_file > 500 lines]
This file is 1,247 lines. Include >=3 lines of context on each side
of your old_string to guarantee uniqueness. If your match fails,
re-read the relevant region before retrying.
[Injected after a failed str_replace]
Your previous str_replace failed: old_string was not unique.
Add more surrounding context and retry once before considering
a different edit strategy.

The system prompt stays minimal. The just-in-time blocks fire only when relevant, so the prompt the agent sees at each decision point is laser-targeted.
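One way the harness side of this pattern might look is sketched below. The `ToolCall` and `Trigger` types and the `build_prompt` function are hypothetical names introduced for illustration; the guidance strings are shortened from the file-edit example above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str
    target_lines: int              # line count of the file being edited
    previous_failed: bool = False  # True when the last str_replace did not match


@dataclass
class Trigger:
    fires: Callable[[ToolCall], bool]    # condition checked before each call
    guidance: Callable[[ToolCall], str]  # text injected when the condition holds


BASE_PROMPT = "You edit files using str_replace with absolute paths."

# Hypothetical trigger list mirroring the two injected blocks in the example.
TRIGGERS = [
    Trigger(
        fires=lambda c: c.name == "str_replace" and c.target_lines > 500,
        guidance=lambda c: (
            f"This file is {c.target_lines:,} lines. Include >=3 lines of "
            "context on each side of your old_string."
        ),
    ),
    Trigger(
        fires=lambda c: c.name == "str_replace" and c.previous_failed,
        guidance=lambda c: (
            "Your previous str_replace failed. Add more surrounding context "
            "and retry once."
        ),
    ),
]


def build_prompt(call: ToolCall) -> str:
    """Return the base prompt plus only the guidance whose trigger fires."""
    injected = [t.guidance(call) for t in TRIGGERS if t.fires(call)]
    return "\n\n".join([BASE_PROMPT, *injected])
```

Because the prompt is rebuilt for every call, guidance disappears as soon as its trigger stops firing, which corresponds to the "Removed when" column of the trigger table.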
| Task type | Architecture |
|---|---|
| Single-turn completable | Minimal prompt via Right Altitude |
| Multi-session | Differentiated prompts (initializer + continuation) with JSON feature tracking |
| Multi-step within session | Technical Design Spec with sprint contracts |
| Subjective quality | Generator-Evaluator with separated rubric prompts |
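For the multi-session row, a minimal sketch of choosing between an initializer and a continuation prompt from a JSON feature list is shown below. The prompt texts, the `select_prompt` name, and the `name`/`done` fields of the feature list are assumptions made for illustration; the original does not specify a schema.

```python
import json
from typing import Optional

# Hypothetical prompt texts; real ones would be far more detailed.
INITIALIZER_PROMPT = "Set up the project and write the initial feature list."
CONTINUATION_PROMPT = "Resume from the feature list and pick the next unfinished feature."


def select_prompt(feature_state: Optional[str]) -> str:
    """Use the initializer on the first session and the continuation prompt afterwards."""
    if feature_state is None:
        # No tracking file exists yet, so this is the first session.
        return INITIALIZER_PROMPT
    # Assumed schema: a JSON list of {"name": str, "done": bool} objects.
    features = json.loads(feature_state)
    pending = [f["name"] for f in features if not f["done"]]
    return f"{CONTINUATION_PROMPT}\nPending features: {', '.join(pending)}"
```

Keeping the feature state in JSON rather than prose lets each new session recover progress by parsing instead of re-reading earlier transcripts.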
| Requirement type | Specify? | Rationale |
|---|---|---|
| Format requirements | Usually no | 70.7% guessed correctly; specifying adds bloat |
| Conditional logic | Always | Only 22.9% guessed correctly; high regression risk |
| Error handling patterns | Yes | Model-dependent; regresses across updates |
| Code style | Yes, with examples | Actual code, not prose |
| Implementation details | No | Over-specification causes cascade failures |
When producing a ready-to-use system prompt, include a compact `## Conditional Rules` section with explicit branches. Use if / then / otherwise wording, not implicit prose. Include at least three rules covering permission decisions, large-file edits, verification, or recovery:
## Conditional Rules
- If a target file is larger than 500 lines, then edit with exact string
replacement and at least 3 lines of context; otherwise use the standard edit
path.
- If a change touches an ask-first path, then request approval before writing;
otherwise proceed within the Always Do permissions.
- If verification fails, then preserve the error verbatim, identify root cause,
and retry once; otherwise summarize the passing check count only.

When the system prompt specifies code style, include 2-3 verbatim code blocks showing the desired style. Prose descriptions ("use modern idioms", "keep functions short") produce inconsistent results because models infer different styles from the same words. A code block is unambiguous.
Include code blocks for: naming conventions, error-handling pattern, async/concurrency pattern, module/file structure, comment style. Even one well-chosen example anchors the model better than 10 lines of prose.
# In your produced system-prompt.md, include sections like:
## Code style
Use type hints on every public function. Prefer explicit returns
over implicit None. Use f-strings for formatting.
# Example of the style we expect:
from pathlib import Path

import yaml

# Config is assumed to be a pydantic v2 model defined elsewhere in the project.
def parse_config(path: Path) -> Config:
    """Load a Config from a YAML file."""
    if not path.exists():
        raise FileNotFoundError(f"Config not found at {path}")
    raw = yaml.safe_load(path.read_text())
    return Config.model_validate(raw)

The actual code blocks become part of the system prompt, so the agent sees them at every step. Always include at least two examples covering different aspects of the style.
The evaluator's opening line activates a quality register that propagates through the entire evaluation. Utilitarian openers ("Check the work for issues") produce permissive, surface-level reviews. Aspirational openers produce rigorous, standards-driven reviews.
Open every evaluator prompt with at least one of these aspirational phrases (or equivalents in your domain):
Verbatim example evaluator opener (use as a template, swap the domain):
You are an expert code migration reviewer with the standards of a museum
curator examining a Renaissance painting — every fragment must earn its
place on the wall. The migration you are evaluating must be:
- idiomatically modern (no remnants of the legacy style)
- conceptually clean (avoid AI slop, copy-paste cargo cult, defensive
pseudo-coverage)
- worthy of a senior engineer's signature on the PR
Score the migration against the rubric below. For each criterion, cite
the specific lines or patterns that earn or fail the score.

Two stylistic levers in that example: (1) aspirational phrases like "expert", "exceptional", "highest standard", "museum-quality", which activate the quality register; (2) anti-pattern callouts like "AI slop", "cargo cult", "defensive pseudo-coverage", which steer more effectively than positive-only instructions because they name the failure modes directly.
Always include both registers in the evaluator opener — aspirational ceiling + anti-pattern floor.
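A draft opener can be checked for both registers with a simple phrase scan. The sketch below is illustrative only: the phrase lists are taken from the examples named above, and `has_both_registers` is a hypothetical helper, not an established tool.

```python
# Phrase lists taken from the examples above; extend them for your domain.
ASPIRATIONAL = ("expert", "exceptional", "highest standard", "museum-quality")
ANTI_PATTERNS = ("ai slop", "cargo cult", "defensive pseudo-coverage")


def has_both_registers(opener: str) -> bool:
    """True when the opener has an aspirational phrase and an anti-pattern callout."""
    text = opener.lower()
    has_ceiling = any(phrase in text for phrase in ASPIRATIONAL)
    has_floor = any(phrase in text for phrase in ANTI_PATTERNS)
    return has_ceiling and has_floor
```

A substring scan like this only confirms that the phrases are present; whether the opener reads well still needs human review.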