
How Too Much Information Destroys Agent Performance

with Robert Brennan and Itamar Friedman

Transcript

Chapters

Introduction
[00:00:00]
Parallel vs Sequential Agents
[00:02:33]
Model Selection per Role
[00:03:58]
Context: The Critical Factor
[00:07:45]
Scaling Code Maintenance with Robert Brennan
[00:11:04]
OpenHands SDK's Single Framework
[00:13:02]
Parallelizing at Scale
[00:14:41]
Building Trust & Starting Small
[00:16:35]

In this episode

In this episode from QCon, host Simon Maple speaks with Itamar Friedman, CEO of Qodo, and Robert Brennan, CEO of OpenHands, about advancing AI in software development through multi-agent systems. They explore how specialised, role-based agents can enhance code quality, manage context effectively, and automate repetitive tasks at scale, moving beyond singular AI "copilot" models. Discover how these systems can transform code maintenance in the cloud and why designing distinct agent roles, leveraging the right models, and orchestrating workflows are key to optimising developer productivity.

From the QCon hallway track, host Simon Maple catches up with two builders shaping how developers work with AI: Itamar Friedman, CEO of Qodo, on designing multi‑agent systems for the software development lifecycle, and Robert Brennan, CEO of OpenHands, on scaling code maintenance with AI orchestration in the cloud. The throughline: moving beyond single “copilot” usage into structured, role‑based agent systems that improve code quality, tame context, and automate the boring but necessary parts of engineering at scale.

Why Multi‑Agent Systems for the SDLC

Friedman frames multi‑agent systems as a direct response to code quality and lifecycle coverage. Instead of relying on one monolithic agent to “do everything,” software teams benefit from role‑specialised agents—planners, coders, reviewers, and security checkers—that mirror how effective human teams distribute work. This matters most in brownfield enterprise environments where compliance, standardization, and reliability are non‑negotiable. Recent research (including a new Google paper) and benchmarks like SWE‑bench reinforce the architectural choices needed to make these agents productive on real codebases.

Design choices start with execution patterns: agents that run in parallel for throughput, or sequentially for tighter control. Planning versus coding is a prime example—what happens when those agents disagree? Adding an arbitration step (a separate “decider” agent or a deterministic policy) prevents deadlock and ensures forward progress. Friedman emphasises that developing a multi‑agent system is less about stacking prompts and more about assembling complementary capabilities: different agents, different graphs, different contexts, different permissions.
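
To make that arbitration step concrete, here is a minimal Python sketch. The Proposal structure, the confidence threshold, and the optional decider callable are all assumptions made for illustration; this is a sketch of the pattern, not any product's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Proposal:
    agent: str          # "planner" or "coder"
    summary: str        # what the agent wants to do next
    confidence: float   # self-reported confidence, 0.0 to 1.0

def arbitrate(
    plan: Proposal,
    implementation: Proposal,
    decider: Optional[Callable[[Proposal, Proposal], Proposal]] = None,
) -> Proposal:
    """Resolve a planner/coder disagreement so the workflow always moves forward."""
    if plan.summary == implementation.summary:
        return plan  # agreement: nothing to arbitrate

    # Deterministic policy first: trust the clearly more confident agent.
    if abs(plan.confidence - implementation.confidence) > 0.2:
        return max(plan, implementation, key=lambda p: p.confidence)

    # Otherwise defer to a separate "decider" agent (an LLM call in practice).
    if decider is not None:
        return decider(plan, implementation)

    # Last resort: keep the plan but flag it, rather than deadlocking.
    return Proposal(plan.agent,
                    plan.summary + " [unresolved disagreement: needs human review]",
                    plan.confidence)
```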

Role Design and Architecture: One Agent vs Many

A common anti‑pattern is dressing a single agent up with multiple prompts and calling it a multi‑agent system. That can work for lightweight tasks, but complex SDLC workflows benefit from truly distinct agents with their own architectures and control flows. For coding, Friedman argues the core should preserve LLM creativity—developers constantly navigate ambiguities and invent workarounds. In practice, a coding agent’s graph leans heavily on the LLM’s generative reasoning, with access to tools like AST analysis or repo introspection.

By contrast, review and security agents should be much more structured. Think explicit checklists of standards and rules, deterministic validations, and narrow tool permissions. Instead of a loose “don’t do X” prompt, a security agent needs a “one‑minus” stance: enumerate exactly what must be checked (e.g., 100 rules), then allow creativity only outside those constraints. This shift from “creative generation” to “rigorous verification” often means the LLM’s role is smaller in the graph, and control logic does more of the heavy lifting.
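
As an illustration of that shift from generation to verification, here is a small Python sketch of a checklist-driven review pass. The rules, their identifiers, and the patterns they match are invented for the example and are not Qodo's actual ruleset.

```python
import re
from typing import Callable, NamedTuple

class Rule(NamedTuple):
    rule_id: str
    description: str
    violates: Callable[[str], bool]   # True when the source violates the rule

# Two illustrative rules; a real security agent would load hundreds from a rules database.
SECURITY_RULES = [
    Rule("SEC-001", "No hard-coded credentials",
         lambda src: bool(re.search(r"(password|api_key)\s*=\s*['\"].+['\"]", src, re.I))),
    Rule("SEC-002", "No use of eval() on untrusted input",
         lambda src: "eval(" in src),
]

def security_review(source: str) -> list[str]:
    """Deterministic pass: every enumerated rule is checked, every time."""
    findings = [f"{r.rule_id}: {r.description}" for r in SECURITY_RULES if r.violates(source)]
    # Only after the checklist runs would an LLM be invited to look for issues
    # outside the enumerated rules; creativity stays at the margins.
    return findings
```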

The choice of model vendor and tool strategy also changes by role. Friedman notes Anthropic’s philosophy—keep tools simple, let the model do more of the thinking—maps well to coding agents. But don’t expect a single model plus a new prompt to excel equally at security review. Different vendors, prompts, tools, and permissioning should be treated as independent dials you tune per agent role.
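
Those dials can be captured as plain configuration. The sketch below is purely illustrative: the role names, model identifiers, prompt paths, and tool names are placeholders, not any vendor's real catalogue.

```python
# Every name below is a placeholder. Each agent role gets its own independently
# tuned dials: vendor/model, prompt, tools, and permissions.
AGENT_CONFIG = {
    "coder": {
        "model": "vendor-a/reasoning-model",    # creative generation, simple tools
        "system_prompt": "prompts/coder.md",
        "tools": ["repo_search", "ast_analysis", "run_tests"],
        "permissions": {"write_code": True, "open_pr": False},
    },
    "security_review": {
        "model": "vendor-b/structured-model",   # a different vendor may fit this role better
        "system_prompt": "prompts/security_checklist.md",
        "tools": ["dependency_scan", "rule_checker"],
        "permissions": {"write_code": False, "open_pr": False},
    },
}
```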

Context Is “King and Kingdom”: How to Feed Agents the Right Facts

Developers consistently report context as the number one source of poor quality and hallucination in coding agents—by some surveys, 33–80% identify context gaps as the primary pain. Friedman’s advice: treat context as a system, not a single blob of “more code.” Useful context spans the codebase, vulnerability databases, best‑practice and style rules, dependency metadata, and stack‑specific standards. Qodo, for example, maintains a rules/best‑practices database per stack and learns from developer interactions to adapt what to surface.

But more context isn’t always better. There’s a spectrum: too little starves the agent; too much overwhelms it; and a noisy middle includes the relevant material but pollutes it with irrelevant detail. The job of a context engine is retrieval and curation—ranking, cutting, and prioritising so the agent sees just what it needs, in the right order. This is the essence of the “context wars”: bandwidth is limited, and indiscriminate stuffing reduces quality.
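
A minimal sketch of that retrieval-and-curation step, assuming the caller supplies candidate snippets and a relevance scorer (for instance, embedding similarity to the task); the budget and field names are illustrative:

```python
from typing import Callable, NamedTuple

class Snippet(NamedTuple):
    source: str     # e.g. "codebase", "vuln_db", "style_rules", "dependency_metadata"
    text: str
    tokens: int

def curate_context(
    snippets: list[Snippet],
    relevance: Callable[[Snippet], float],   # e.g. embedding similarity to the task
    token_budget: int = 8_000,
) -> list[Snippet]:
    """Rank, cut, and prioritise: the agent sees only what fits the budget, best first."""
    ranked = sorted(snippets, key=relevance, reverse=True)
    selected, used = [], 0
    for snippet in ranked:
        if used + snippet.tokens > token_budget:
            continue   # indiscriminate stuffing reduces quality; skip what does not fit
        selected.append(snippet)
        used += snippet.tokens
    return selected
```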

Equally critical is how an agent uses context. Friedman points out variability across tools; give Cursor the same prompt and context repeatedly and you may see different outputs. That variability can be mitigated by more structured agent graphs—e.g., enforce checklists, verify against rules, and gate steps on concrete signals (tests pass, static analysis clean, policy checks satisfied). In other words, don’t just retrieve better context; make agents prove they consumed it correctly.
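
A small sketch of that kind of gate, using pytest and ruff as stand-ins for whatever test and static-analysis commands a team actually runs:

```python
import subprocess

def _passes(cmd: list[str], repo_path: str) -> bool:
    return subprocess.run(cmd, cwd=repo_path, capture_output=True).returncode == 0

def gate(repo_path: str) -> dict[str, bool]:
    """Concrete signals a step must satisfy before the workflow advances."""
    return {
        "tests_pass": _passes(["pytest", "-q"], repo_path),
        "static_analysis_clean": _passes(["ruff", "check", "."], repo_path),
        # Policy checks (licences, dependency rules, etc.) plug in the same way.
    }

def may_proceed(repo_path: str) -> bool:
    return all(gate(repo_path).values())
```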

Scaling Code Maintenance in the Cloud with AI Agents

Brennan shifts the focus from day‑to‑day pairing on a laptop (e.g., OpenHands CLI or Claude Code) to high‑leverage automation in the cloud. Many maintenance tasks are repetitive and automatable across repositories: dependency management, open source vulnerability remediation, and larger migrations like Python 2→3, Java upgrades, or even COBOL→modern languages. These are exactly the kinds of jobs where agent orchestration shines and where running many agents in parallel pays dividends.

The goal is not to “one‑shot” a massive change, but to encode a repeatable strategy that scales: break down the work, parallelise across codebases, and automate 90% of the path with a human in the loop at the end for verification. Treat it like a production system—schedule runs, observe outcomes, and iterate. Over time, that orchestration layer becomes a factory for tech‑debt reduction, turning what used to be dreaded chores into predictable workflows.
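
A minimal sketch of that fan-out, with the per-repository maintenance run reduced to a stub; the repository names and result fields are illustrative only:

```python
from concurrent.futures import ThreadPoolExecutor

REPOS = ["service-a", "service-b", "service-c"]   # placeholder repository names

def run_maintenance(repo: str) -> dict:
    """Apply the encoded strategy (e.g. a dependency upgrade) to one repository.
    In a real setup this would launch a cloud agent session; here it is a stub."""
    return {"repo": repo, "automated": True, "needs_human_verification": True}

# The same playbook, parallelised across many codebases. Humans work the results
# queue at the end instead of doing the rote changes themselves.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_maintenance, REPOS))

review_queue = [r for r in results if r["needs_human_verification"]]
```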

This approach pairs well with enterprise CI/CD: agents open branches, run build/test pipelines, and submit PRs with linked evidence (test results, changelogs, CVE references). Humans focus on oversight—reviewing deltas, approving merges, and handling edge cases—rather than doing the rote work themselves. The productivity gain compounds when you apply the same playbook across dozens or hundreds of services.

Orchestrating Agents with OpenHands SDK and Inter‑Agent Protocols

On the implementation side, Brennan advocates for a unified framework that defines multiple agents—each with its own system prompt, toolset, and behaviour—inside a single orchestration layer. OpenHands SDK enables this, including access to MCP servers to expose external tools and data sources, and the ability to wire agents together. While agent‑to‑agent communication is possible via emerging protocols, putting them in one framework simplifies coordination, logging, and control.

A practical orchestration pattern looks like this (a minimal, framework‑agnostic sketch follows the list):

  • Planning agent scopes the work, proposes a migration strategy, and produces a task graph.
  • Worker coding agents execute changes in chunks (per package, per module, per repo), using creative LLM capabilities plus tools like AST analyzers and code search.
  • Review/security agents run structured checks: rulesets, vulnerability databases, style/standard compliance, and policy gates.
  • An arbitration or controller agent resolves disagreements (e.g., planner vs. reviewer), escalates to a human when confidence drops, and enforces stop/go thresholds.
  • Integration steps build, test, and package; successful runs become PRs with evidence; failures loop back into the plan.
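
The OpenHands SDK's real API isn't reproduced here; the sketch below is a framework-agnostic Python outline of the loop above, with every agent reduced to a hypothetical stub so the control flow is visible end to end.

```python
from dataclasses import dataclass

# All types and agent functions are hypothetical stand-ins, not any SDK's real API.

@dataclass
class Change:
    chunk: str
    diff: str = ""
    evidence: str = "test results, changelog, CVE references"

def planning_agent(task: str, repos: list[str]) -> list[str]:
    return [f"{repo}:{task}" for repo in repos]           # 1. scope work into a task graph

def coding_agent(chunk: str) -> Change:
    return Change(chunk=chunk, diff="...generated diff...")   # 2. creative generation + tools

def review_agent(change: Change) -> list[str]:
    return []                                             # 3. rulesets, vuln DBs, policy gates

def controller_agent(change: Change, findings: list[str]) -> str:
    return "escalate" if len(findings) > 3 else "revise"  # 4. arbitration / stop-go thresholds

def build_test_and_open_pr(change: Change) -> bool:
    return True                                           # 5. CI: build, test, PR with evidence

def orchestrate(task: str, repos: list[str]) -> list[Change]:
    human_review_queue: list[Change] = []
    pending = planning_agent(task, repos)
    attempts: dict[str, int] = {}
    while pending:
        chunk = pending.pop(0)
        attempts[chunk] = attempts.get(chunk, 0) + 1
        change = coding_agent(chunk)
        findings = review_agent(change)
        if attempts[chunk] >= 3 or (findings and controller_agent(change, findings) == "escalate"):
            human_review_queue.append(change)             # low confidence or retries exhausted
            continue
        if findings or not build_test_and_open_pr(change):
            pending.append(chunk)                         # revisions and failures loop back
            continue
        # success: the change lands as a PR with linked evidence for human oversight
    return human_review_queue
```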

Crucially, this isn’t just about “more agents.” It’s about role clarity, guardrails, and the right data at the right time. Designers should define explicit permissions (which repos, which tools, which environments), decide parallel vs. sequential phases, and set verification criteria per stage. When that foundation is in place, the choice of model vendor per role (e.g., Claude for coding; a different vendor for structured review) becomes a tactical optimization.

Key Takeaways

  • Start with roles, not prompts: design different agent graphs for planning, coding, review, and security. Coding agents need creative latitude; review/security agents need deterministic checklists and narrow permissions.
  • Expect and design for disagreement: introduce an arbiter agent or deterministic policies to resolve conflicts between planner, coder, and reviewer outputs.
  • Treat context as a product: retrieve from multiple sources (code, vulnerabilities, best‑practice rules), rank and trim aggressively, and verify that agents used the context via tests, policy checks, and structured gates.
  • Choose models per role: a single model rarely excels at every task. For coding, a model like Claude with strong reasoning and minimal tool complexity can work well; for security/review, lean into structured workflows and possibly different vendors.
  • Move maintenance to the cloud: orchestrate multi‑agent workflows for dependency updates, CVE remediation, and language/runtime upgrades. Aim for 90% automation with a human in the loop for final verification.
  • Use a unified framework: tools like OpenHands SDK let you define multiple agents with different prompts, tools, and MCP servers in one place, simplifying coordination, observability, and control.
  • Build guardrails into the graph: enforce tests, static checks, and policy gates between phases. Measure outcomes, capture evidence in PRs, and iterate on the orchestration plan.
  • Optimise for repeatability and scale: once a workflow works on one repo, parameterise it and run it across many codebases.
