Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system, working in either greenfield design mode or audit mode. Covers loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery.
100
100%
Does it follow best practices?
Impact: 100% (1.23x average score across 4 eval scenarios)
Passed: no known issues
Greenfield harness design document
Architecture level stated: 75% / 100%
Separate evaluator: 100% / 100%
Loop pattern matched to duration: 100% / 100%
Generator-Critic iteration cap: 50% / 100%
Tool count within budget: 0% / 100%
Content-based file editing: 37% / 100%
Absolute filepaths required: 0% / 100%
Architectural loop detection: 90% / 100%
Layered termination mechanisms: 75% / 100%
Context management strategy: 100% / 100%
Output sections complete: 62% / 100%
Reasoning budget differentiated: 25% / 100%
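Several of the criteria above (Generator-Critic iteration cap, architectural loop detection, layered termination mechanisms) describe control-flow properties of the harness itself. Here is a minimal Python sketch of one way they could fit together. It assumes hypothetical `propose` and `execute` callables that stand in for the model call and the tool runner. The thresholds (`max_steps=50`, `repeat_limit=3`, `max_rounds=3`) are illustrative and are not values specified by the skill.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical action shape: (tool_name, argument). Tuples compare by value,
# so the harness can detect exact repeats without asking the model.
Action = tuple[str, str]


@dataclass
class LoopResult:
    reason: str  # which termination layer fired
    history: list = field(default_factory=list)  # (action, observation) pairs

    @property
    def steps(self) -> int:
        return len(self.history)


def run_agent_loop(
    propose: Callable[[list], Optional[Action]],
    execute: Callable[[Action], str],
    max_steps: int = 50,
    repeat_limit: int = 3,
) -> LoopResult:
    history: list = []
    last_action: Optional[Action] = None
    repeats = 0
    for _ in range(max_steps):
        action = propose(history)
        # Layer 1: the model declares completion by proposing no further action.
        if action is None:
            return LoopResult("done", history)
        # Layer 2: loop detection enforced in code rather than requested in the prompt.
        repeats = repeats + 1 if action == last_action else 1
        if repeats >= repeat_limit:
            return LoopResult("loop_detected", history)
        last_action = action
        history.append((action, execute(action)))
    # Layer 3: a hard step budget that always ends the loop.
    return LoopResult("max_steps", history)


def generate_with_critic(
    generate: Callable[[Optional[str]], str],
    critique: Callable[[str], tuple[bool, str]],
    max_rounds: int = 3,
) -> tuple[str, int, bool]:
    """Generator-Critic iteration with a fixed cap; returns (draft, rounds_used, accepted)."""
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)
        accepted, feedback = critique(draft)
        if accepted:
            return draft, round_no, True
    return draft, max_rounds, False
```

Loop detection and the step budget live in code, so the loop terminates even when the model never signals completion. A prompt instruction alone cannot guarantee that.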
Observation and context strategy design
Success silent, failure verbose: 100% / 100%
Token-based truncation limits: 50% / 100%
Truncation bands by output size: 100% / 100%
Error output preserved verbatim: 60% / 100%
FIC strategy for 3h+ tasks: 66% / 100%
Error history preserved during compaction: 100% / 100%
KV-cache preservation rules: 100% / 100%
Compaction preference hierarchy: 50% / 100%
Observation pipeline steps: 100% / 100%
Recovery hints on truncation: 100% / 100%
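The observation criteria (success silent, failure verbose; token-based limits; truncation bands; verbatim errors; recovery hints) can be read as stages of one formatting pipeline. Here is a minimal Python sketch that assumes a shell-style tool whose successful output is noise. The band thresholds, the whitespace token proxy, the keyword match on "error", and the hint wording are all hypothetical choices, not the skill's own.

```python
# Crude token proxy (an assumption): whitespace-separated words. A real harness
# would count with the model's own tokenizer.
def count_tokens(text: str) -> int:
    return len(text.split())


# Hypothetical bands, in tokens: (upper bound of band, head budget, tail budget).
SMALL_LIMIT = 400
BANDS = [(4000, 150, 250), (float("inf"), 50, 150)]


def _take(lines: list[str], budget: int) -> list[str]:
    """Keep whole lines from the start of `lines` until the token budget is spent."""
    kept, used = [], 0
    for line in lines:
        cost = count_tokens(line)
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    return kept


def format_observation(command: str, exit_code: int, output: str, max_error_lines: int = 20) -> str:
    # Success is silent: the model only needs to know the command worked.
    if exit_code == 0:
        return f"$ {command}\n[exit 0, {count_tokens(output)} tokens of output suppressed]"
    header = f"$ {command}\n[exit {exit_code}]"
    total = count_tokens(output)
    # Small failures pass through verbatim.
    if total <= SMALL_LIMIT:
        return f"{header}\n{output}"
    # Larger failures: choose the band for this size and keep head and tail lines verbatim.
    # Every band's head + tail budget is <= SMALL_LIMIT, so head and tail never overlap.
    _, head_budget, tail_budget = next(b for b in BANDS if total <= b[0])
    lines = output.splitlines()
    head = _take(lines, head_budget)
    tail = list(reversed(_take(list(reversed(lines)), tail_budget)))
    middle = lines[len(head): len(lines) - len(tail)]
    # Error lines from the dropped middle are copied verbatim, never summarized.
    errors = [line for line in middle if "error" in line.lower()][:max_error_lines]
    omitted = total - sum(count_tokens(line) for line in head + tail + errors)
    # Recovery hint: tell the model how to reach the rest instead of dropping it silently.
    hint = (f"[... {omitted} tokens omitted; redirect output to a file and "
            f"search it or read specific line ranges to see more]")
    return "\n".join([header, *head, *errors, hint, *tail])
```

The tail gets a larger budget than the head in both bands because compilers and test runners usually print the final error summary last. That weighting is a design assumption, not a requirement stated in the criteria.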
System prompt architecture design
Three-tier permission structure: 100% / 100%
Positive framing of restrictions: 50% / 100%
Concrete code examples in prompt: 20% / 100%
Conditional logic specified: 75% / 100%
Error handling specified: 100% / 100%
Code style with examples: 0% / 100%
Just-in-time steering present: 10% / 100%
Evaluator prompt produced: 100% / 100%
Evaluator prompt aspirational language: 10% / 100%
Reasoning differentiation stated: 50% / 100%
Format and implementation details left implicit: 100% / 100%
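Two of the prompt criteria, a three-tier permission structure and just-in-time steering, translate naturally into code that assembles prompt text. Here is a minimal Python sketch. It assumes the three tiers are "always", "ask first", and "never" (the criteria only name a three-tier structure) and that steering reminders are appended to tool results when a trigger condition matches. `PromptSpec`, `build_system_prompt`, and `just_in_time_steering` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PromptSpec:
    role: str
    # Hypothetical tier names; the criteria only require three tiers.
    always: list[str] = field(default_factory=list)     # do without asking
    ask_first: list[str] = field(default_factory=list)  # confirm with the user first
    never: list[str] = field(default_factory=list)      # hard boundaries, each paired with an alternative
    style_example: str = ""                              # a short, concrete code sample


def _bullets(items: list[str]) -> str:
    return "\n".join(f"- {item}" for item in items)


def build_system_prompt(spec: PromptSpec) -> str:
    parts = [
        spec.role,
        "## Always\n" + _bullets(spec.always),
        "## Ask first\n" + _bullets(spec.ask_first),
        "## Never\n" + _bullets(spec.never),
    ]
    if spec.style_example:
        # Show the preferred style with real code instead of describing it in prose.
        parts.append("## Code style\n<example>\n" + spec.style_example + "\n</example>")
    return "\n\n".join(parts)


# Hypothetical steering rule: (trigger on the latest observation, reminder text).
SteeringRule = tuple[Callable[[str], bool], str]


def just_in_time_steering(observation: str, rules: list[SteeringRule]) -> str:
    """Attach reminders to a tool result only when their trigger fires, not up front."""
    reminders = [text for trigger, text in rules if trigger(observation)]
    if not reminders:
        return observation
    tagged = "\n".join(f"<reminder>{text}</reminder>" for text in reminders)
    return f"{observation}\n\n{tagged}"
```

One way to keep restrictions positively framed is to pair each "never" item with the allowed alternative, for example "Push to main; open a pull request instead."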
Evaluation pipeline architecture
Evaluation stage ordering: 78% / 100%
LLM evaluation stage last: 100% / 100%
Separate evaluator model: 100% / 100%
Holistic rubric presentation (CRE): 100% / 100%
Calibration approach specified: 100% / 100%
Precision over catch rate: 37% / 100%
Iteration limit stated: 100% / 100%
Two-stage judge for developer-facing output: 58% / 100%
Self-evaluation bias addressed: 100% / 100%
Deterministic tests before LLM: 100% / 100%
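The evaluation criteria imply an ordering: deterministic checks first, the LLM stage last, and a judge tuned for precision. Here is a minimal Python sketch of that ordering. It assumes hypothetical `Check` and `Judge` callable types and a judge that reports a confidence per finding. The 0.8 threshold is illustrative. In practice the judge callable would wrap a different model from the generator, per the separate-evaluator criterion.

```python
from typing import Callable

# A deterministic check returns (passed, detail); stage names and checks are hypothetical.
Check = Callable[[str], tuple[bool, str]]
# The LLM judge returns (finding, confidence in [0, 1]) pairs. This interface is an assumption.
Judge = Callable[[str], list[tuple[str, float]]]


def evaluate(artifact: str, stages: list[tuple[str, Check]], judge: Judge,
             min_confidence: float = 0.8) -> dict:
    # Cheap, deterministic stages run first, in order, and stop at the first failure.
    for name, check in stages:
        passed, detail = check(artifact)
        if not passed:
            return {"passed": False, "stage": name, "findings": [detail]}
    # The LLM stage runs last, only on artifacts that already pass every deterministic check.
    # Dropping low-confidence findings trades catch rate for precision.
    findings = [text for text, conf in judge(artifact) if conf >= min_confidence]
    return {"passed": not findings, "stage": "llm_judge", "findings": findings}
```

Running the judge only after the deterministic stages pass keeps LLM calls off artifacts that a test or linter can already reject, and the confidence filter keeps reported findings few but reliable.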