Why AI Coding Agents Could Wreck Your Codebase

Spring creator Rod Johnson is back—and betting on deterministic AI. Why GOAP planning beats LLM chaos for enterprise agents, and why Java isn't going anywhere. #AINativeDev

5 May 2026, 56 min 37 sec, with Rod Johnson

In this episode

Rod Johnson — the creator of Spring Framework and founder of Embabel — joins Simon Maple on the AI Native Dev Podcast to share his unfiltered take on where enterprise AI is actually heading.

In this episode, Rod breaks down why enterprises are making a huge mistake rewriting Java apps in Python, why vibe coding will destroy your codebase if left unchecked, and why this might be the last generation of frameworks that developers ever choose for themselves.

Rod also pulls back the curtain on Embabel — the new JVM-native agentic framework he's building — including how it borrows its planning algorithm from NPC AI in video games, why he's skeptical of MCP despite its hype, and the AI failure pattern he keeps seeing in large enterprises.

Whether you're a Java developer navigating the AI wave or a tech lead trying to figure out where to actually invest, this is essential listening.

Deterministic AI Planning and Why Java Still Matters: Rod Johnson on Embabel

The enterprise AI landscape has a strange contradiction at its center. Companies with decades of Java applications are being told to rebuild in Python for AI enablement, despite Python having no meaningful advantage for making HTTP calls to external models. The confusion between data science workloads and business application development has created costly rewrites that solve problems that do not actually exist.

In a recent episode of the AI Native Dev podcast, Simon Maple spoke with Rod Johnson, creator of Spring Framework and founder of Embabel. The conversation covered why deterministic planning matters for enterprise AI, the case against Python rewrites, and how to maintain architectural control when agents write most of your code.

The Python Rewrite Fallacy

Rod's argument against Python rewrites is straightforward: if your business logic already runs in Java, the LLM is not running in your process regardless of language. It is an HTTP call away.

"It's utterly baffling to me that people think any particular language has a natural advantage to make what is an incredibly simple HTTP call," Rod observed.
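
To make the point concrete, here is a minimal sketch of a model call from plain Java using the JDK's built-in java.net.http client. The endpoint URL, the JSON body shape, and the names ModelCall, buildRequest, and complete are assumptions for illustration only; a real provider documents its own URL and payload.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class ModelCall {
    // Hypothetical endpoint: replace with your provider's documented URL.
    static final String ENDPOINT = "https://llm.example.com/v1/complete";

    // Builds the POST request. The JSON body shape is an assumption for illustration.
    static HttpRequest buildRequest(String apiKey, String prompt) {
        // Minimal escaping of backslashes and quotes; real code should use a JSON library.
        String escaped = prompt.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"prompt\": \"" + escaped + "\"}";
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Sends the request and returns the raw response body.
    static String complete(HttpClient client, String apiKey, String prompt) {
        try {
            HttpResponse<String> response =
                    client.send(buildRequest(apiKey, prompt), HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException("Model call failed: HTTP " + response.statusCode());
            }
            return response.body();
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while calling the model", e);
        }
    }

    public static void main(String[] args) {
        // Only builds the request here, so the sketch runs without network access.
        HttpRequest request = buildRequest("demo-key", "Summarise order 42");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Nothing here depends on an AI-specific runtime: the model sits behind an ordinary HTTP request, which is the substance of Rod's argument.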

The confusion stems from conflating two different problem domains. Data science work involving TensorFlow, model training, and data processing legitimately benefits from Python's ecosystem. Enabling enterprise applications with AI capabilities does not. When you consider the critical adjacencies of a business problem, including existing codebases, enterprise services, databases, and compliance requirements, the case for staying in the existing stack becomes clear.

Rod pointed to OpenClaw as evidence: it is not written in Python. The choice of language for agent frameworks reflects developer preference, not technical necessity.

Goal-Oriented Action Planning

Embabel uses an approach called Goal-Oriented Action Planning (GOAP), borrowed from video game NPC AI. Unlike frameworks where the LLM decides what to do next, GOAP provides deterministic planning with clear traceability.

The system works by defining actions with preconditions and postconditions. The planner identifies a path from the current world state to a desired goal by chaining actions. After executing each action, it replans based on the actual world state rather than assuming the happy path. This enables the system to adapt when actions do not produce expected results.
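
The plan, act, replan loop described above can be sketched in a few dozen lines. This is a generic GOAP illustration, not Embabel's implementation: the world state is modelled as a set of string facts, the planner is a plain breadth-first search, and the names GoapSketch, Action, plan, run, and the three example actions are all hypothetical.

```java
import java.util.*;
import java.util.function.BiFunction;

class GoapSketch {
    // An action is applicable when its preconditions hold; the planner assumes its effects.
    record Action(String name, Set<String> preconditions, Set<String> effects) {}
    record Node(Set<String> state, List<Action> path) {}

    // Breadth-first search over world states: returns the shortest chain of actions to the goal.
    static Optional<List<Action>> plan(Set<String> state, Set<String> goal, List<Action> actions) {
        Deque<Node> frontier = new ArrayDeque<>();
        Set<Set<String>> seen = new HashSet<>();
        frontier.add(new Node(Set.copyOf(state), List.of()));
        seen.add(Set.copyOf(state));
        while (!frontier.isEmpty()) {
            Node node = frontier.poll();
            if (node.state().containsAll(goal)) return Optional.of(node.path());
            for (Action action : actions) {
                if (!node.state().containsAll(action.preconditions())) continue;
                Set<String> next = new HashSet<>(node.state());
                next.addAll(action.effects());
                if (seen.add(next)) {
                    List<Action> path = new ArrayList<>(node.path());
                    path.add(action);
                    frontier.add(new Node(next, path));
                }
            }
        }
        return Optional.empty();
    }

    // Executes only the first step of each plan, then replans from the state actually observed.
    static List<String> run(Set<String> start, Set<String> goal, List<Action> actions,
                            BiFunction<Action, Set<String>, Set<String>> execute) {
        Set<String> state = new HashSet<>(start);
        List<String> trace = new ArrayList<>();
        for (int step = 0; !state.containsAll(goal); step++) {
            if (step == 20) throw new IllegalStateException("Gave up after 20 steps");
            List<Action> plan = plan(state, goal, actions)
                    .orElseThrow(() -> new IllegalStateException("No plan reaches the goal"));
            Action next = plan.get(0);
            trace.add(next.name());
            state = execute.apply(next, state); // the real outcome, not the assumed effects
        }
        return trace;
    }

    static List<String> demo() {
        List<Action> actions = List.of(
                new Action("fetchCustomer", Set.of("customerId"), Set.of("customer")),
                new Action("scoreRisk", Set.of("customer"), Set.of("riskScore")),
                new Action("draftOffer", Set.of("customer", "riskScore"), Set.of("offer")));
        int[] scoreAttempts = {0};
        // Simulated executor: the first scoreRisk attempt fails and leaves the state unchanged.
        BiFunction<Action, Set<String>, Set<String>> execute = (action, state) -> {
            if (action.name().equals("scoreRisk") && scoreAttempts[0]++ == 0) return state;
            Set<String> next = new HashSet<>(state);
            next.addAll(action.effects());
            return next;
        };
        return run(Set.of("customerId"), Set.of("offer"), actions, execute);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [fetchCustomer, scoreRisk, scoreRisk, draftOffer]
    }
}
```

Because the loop replans from the observed state, the failed scoreRisk attempt simply shows up as a repeated step in the trace rather than as a broken plan.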

The integration with Java's type system is central to the approach. Actions are defined as annotated Java methods, and their argument and return types give the planner the information it needs for chaining. An action therefore cannot be invoked unless its required parameters are available, a constraint that is enforced at compile time.
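
As a rough illustration of type-driven chaining, the sketch below treats each method's parameter types as its preconditions and its return type as its effect. The @Action annotation here is a hypothetical marker defined inside the example, not Embabel's real annotation, and the simple forward-chaining loop stands in for a real planner, which would search for only the actions the goal needs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class TypedActions {
    // Hypothetical marker annotation, defined here for illustration only.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Action {}

    record CustomerId(String value) {}
    record Customer(CustomerId id, String name) {}
    record RiskScore(int value) {}
    record Offer(String text) {}

    static class Workflow {
        @Action public Customer load(CustomerId id) { return new Customer(id, "Ada"); }
        @Action public RiskScore score(Customer customer) { return new RiskScore(customer.name().length()); }
        @Action public Offer draft(Customer customer, RiskScore risk) {
            return new Offer(customer.name() + ":" + risk.value());
        }
    }

    // Runs any action whose parameter types are all available, until the goal type appears.
    static <T> T achieve(Class<T> goal, Object workflow, Object... inputs) {
        Map<Class<?>, Object> blackboard = new LinkedHashMap<>();
        for (Object input : inputs) blackboard.put(input.getClass(), input);
        boolean progressed = true;
        while (!blackboard.containsKey(goal) && progressed) {
            progressed = false;
            for (Method method : workflow.getClass().getDeclaredMethods()) {
                if (!method.isAnnotationPresent(Action.class)) continue;
                if (blackboard.containsKey(method.getReturnType())) continue; // already produced
                Class<?>[] needed = method.getParameterTypes();
                if (!blackboard.keySet().containsAll(Arrays.asList(needed))) continue;
                Object[] args = Arrays.stream(needed).map(blackboard::get).toArray();
                try {
                    blackboard.put(method.getReturnType(), method.invoke(workflow, args));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Action failed: " + method.getName(), e);
                }
                progressed = true;
            }
        }
        if (!blackboard.containsKey(goal)) {
            throw new IllegalStateException("No chain of actions produces " + goal.getSimpleName());
        }
        return goal.cast(blackboard.get(goal));
    }

    public static void main(String[] args) {
        Offer offer = achieve(Offer.class, new Workflow(), new CustomerId("c-42"));
        System.out.println(offer.text()); // prints Ada:3
    }
}
```

The draft method can only run once both a Customer and a RiskScore exist, and the compiler checks each method body against those types, which is the kind of guarantee the paragraph above describes.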

The practical benefit is explainability. The system can show exactly what plan was formulated and what world state conditions triggered that plan. For enterprise applications where audit trails matter, this deterministic approach offers something LLM-controlled planning cannot: reproducibility.

There may be multiple routes to a goal, and the planner can choose the cheapest. Costs can even be dynamic, reflecting real-time conditions like system load. If one path becomes expensive, the planner automatically routes around it.
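
Cost-aware routing of this kind can be sketched with a uniform-cost (Dijkstra-style) search in which each action's cost is read from a supplier at planning time. Everything named here (CostedPlanner, the three actions, and the load-based cost formula) is an assumption for illustration, and the sketch assumes costs are never negative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.DoubleSupplier;

class CostedPlanner {
    // The cost is a supplier so it can reflect live conditions each time a plan is made.
    record Action(String name, Set<String> preconditions, Set<String> effects, DoubleSupplier cost) {}
    record Node(Set<String> state, List<String> path, double cost) {}

    // Uniform-cost search: always expands the cheapest known state first.
    static Optional<List<String>> cheapestPlan(Set<String> start, Set<String> goal, List<Action> actions) {
        PriorityQueue<Node> frontier = new PriorityQueue<>(Comparator.comparingDouble(Node::cost));
        Map<Set<String>, Double> best = new HashMap<>();
        frontier.add(new Node(Set.copyOf(start), List.of(), 0.0));
        best.put(Set.copyOf(start), 0.0);
        while (!frontier.isEmpty()) {
            Node node = frontier.poll();
            if (node.state().containsAll(goal)) return Optional.of(node.path());
            for (Action action : actions) {
                if (!node.state().containsAll(action.preconditions())) continue;
                Set<String> next = new HashSet<>(node.state());
                next.addAll(action.effects());
                double cost = node.cost() + action.cost().getAsDouble();
                if (cost < best.getOrDefault(next, Double.MAX_VALUE)) {
                    best.put(next, cost);
                    List<String> path = new ArrayList<>(node.path());
                    path.add(action.name());
                    frontier.add(new Node(next, path, cost));
                }
            }
        }
        return Optional.empty();
    }

    // Two routes to a response: a direct call whose cost grows with load, and a two-step fallback.
    static List<String> demo(double load) {
        List<Action> actions = List.of(
                new Action("callPrimaryService", Set.of("request"), Set.of("response"), () -> 1.0 + 10.0 * load),
                new Action("queueRequest", Set.of("request"), Set.of("queued"), () -> 1.5),
                new Action("callBatchService", Set.of("queued"), Set.of("response"), () -> 1.5));
        return cheapestPlan(Set.of("request"), Set.of("response"), actions).orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(demo(0.0)); // prints [callPrimaryService]
        System.out.println(demo(0.9)); // prints [queueRequest, callBatchService]
    }
}
```

When the simulated load rises, the direct route becomes more expensive than the two-step fallback and the planner switches routes without any change to the action definitions.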

The MCP Skepticism

Rod expressed skepticism about MCP as a universal solution, which runs counter to much of the current conversation. His argument has two parts.

First, if you are AI-enabling an enterprise system already written in Java, why go through MCP? Frameworks like Embabel, Spring AI, or LangChain4j can expose Java methods as tools directly. You can expose tools on domain objects retrieved from repositories, something that would be difficult through MCP.
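
The idea of exposing a method on a domain object directly can be sketched with plain reflection. The @ExposedTool annotation, the Order class, and the describe and call helpers below are hypothetical and defined only for this example; the frameworks named above provide their own mechanisms, which this sketch does not reproduce.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DirectTools {
    // Hypothetical annotation for this sketch, not taken from any real framework.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ExposedTool { String description(); }

    // A domain object as it might come back from a repository.
    static class Order {
        private final String id;
        private int quantity;

        Order(String id, int quantity) {
            this.id = id;
            this.quantity = quantity;
        }

        @ExposedTool(description = "Change the ordered quantity")
        public String changeQuantity(int newQuantity) {
            quantity = newQuantity;
            return id + " now has quantity " + quantity;
        }
    }

    // Lists the tool descriptions that could be offered to a model for this object.
    static List<String> describe(Object target) {
        List<String> tools = new ArrayList<>();
        for (Method method : target.getClass().getDeclaredMethods()) {
            ExposedTool tool = method.getAnnotation(ExposedTool.class);
            if (tool != null) tools.add(method.getName() + ": " + tool.description());
        }
        Collections.sort(tools);
        return tools;
    }

    // Dispatches a tool call chosen by the model to the in-process method.
    static Object call(Object target, String toolName, Object... args) {
        for (Method method : target.getClass().getDeclaredMethods()) {
            if (method.isAnnotationPresent(ExposedTool.class) && method.getName().equals(toolName)) {
                try {
                    return method.invoke(target, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Tool failed: " + toolName, e);
                }
            }
        }
        throw new IllegalArgumentException("Unknown tool: " + toolName);
    }

    public static void main(String[] args) {
        Order order = new Order("o-1", 2);
        System.out.println(describe(order));                  // prints [changeQuantity: Change the ordered quantity]
        System.out.println(call(order, "changeQuantity", 5)); // prints o-1 now has quantity 5
    }
}
```

The tool operates on the live Order instance, with its identity and state already loaded, and no extra protocol layer sits between the model's tool call and the business logic.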

"Your first thought for any tool you want to expose is, 'I could just expose it using my stack,'" Rod noted. The question should be why you need the additional indirection.

Second, MCP is essentially another API specification, and we already have OpenAPI, Swagger, and GraphQL. The argument for MCP is that it is specifically designed for agents, but Rod suggested that the ideal tool interface for any given agent might be unique to that agent anyway.

His view is not that MCP is bad, but that it has become the only hammer people reach for. Plan A should always be exposing logic directly from the existing stack.

Maintaining Architectural Control

Rod writes perhaps 5% of his code directly, with agents generating the rest. But if you read Embabel's codebase, you would think he wrote it. The designs are distinctively his. He watches diffs, reviews output, and frequently stops agents to correct architectural decisions.

"You cannot vibe-code serious software," Rod stated. "I am a vigorous user of coding agents. But I am very much in control, and I find that from a design perspective, the agent more often gets it wrong than right."

The pattern he described involves design conversations with Claude before coding begins. The agent surfaces problems Rod had not considered, but original architectural ideas remain the human's responsibility. Left entirely to agents, quality degrades with each feature addition as designs accumulate compromises.

This aligns with the context engineering (https://claude.ai/blog/context-engineering-guide) approach of treating agents as highly capable implementers rather than architects. The human provides design direction and quality standards; the agent provides speed and consistency within those constraints.

Rod described frustration when working at the frontier of testing patterns. When building elaborate testing infrastructure involving real LLMs with Testcontainers databases, the agent struggled significantly. The pattern was too novel, outside its training distribution. Context and skills helped but did not fully solve the problem. Attention still falls off in long contexts.

Coding Agents and Language Choice

The conversation surfaced a counterintuitive observation about which languages coding agents handle best. Given the massive training corpus for Java and Python, you would expect those to be strongest. Instead, Rod found Kotlin consistently better.

The explanation: Java and Python have evolved significantly in recent years, but most training data reflects older idioms. When you tell an agent to use var or enhanced switch statements in Java, or type hints in Python, you are fighting against the weight of outdated examples. Kotlin started modern and has not accumulated as much legacy style in its training data. The people who adopted Kotlin early were also generally skilled, so there is less bad Kotlin code in the world.
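
For readers less familiar with the Java idioms in question, the following sketch puts an older-style switch statement next to a switch expression and local variable type inference. The Tier enum and the seat counts are made up for the example.

```java
import java.util.List;

class ModernIdioms {
    enum Tier { FREE, PRO, ENTERPRISE }

    // Older idiom: a switch statement with explicit breaks and a mutable local.
    static int seatsLegacy(Tier tier) {
        int seats;
        switch (tier) {
            case FREE:
                seats = 1;
                break;
            case PRO:
                seats = 10;
                break;
            default:
                seats = 100;
                break;
        }
        return seats;
    }

    // Newer idiom: a switch expression (Java 14+), checked for exhaustiveness over the enum.
    static int seats(Tier tier) {
        return switch (tier) {
            case FREE -> 1;
            case PRO -> 10;
            case ENTERPRISE -> 100;
        };
    }

    static int totalSeats(List<Tier> tiers) {
        var total = 0; // local variable type inference (Java 10+)
        for (var tier : tiers) {
            total += seats(tier);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalSeats(List.of(Tier.FREE, Tier.PRO))); // prints 11
    }
}
```

Both versions behave identically; the difference is purely one of style, which is why a training corpus dominated by the older form pulls an agent's output in that direction.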

This suggests that training corpus size is not the only factor in agent code quality. The distribution of that corpus matters, and languages with cleaner, more consistent codebases may produce better results despite smaller absolute numbers.

The Last Wave of Human-Chosen Frameworks

Looking forward, Rod offered a stark prediction: Embabel represents the last generation of frameworks that will be chosen by humans. Increasingly, tools and agents will make technology choices.

This shifts the value proposition for framework creators. Rather than marketing to developers who evaluate options, frameworks need to be the choice that agents make when given latitude. What makes a framework attractive to an LLM choosing how to implement a requirement may differ from what makes it attractive to a human reading documentation.

The immediate practical implication for enterprise teams is to resist the pressure to rewrite for AI. The business logic, domain models, and integration points that already exist in Java applications represent enormous accumulated value. AI enablement is primarily a matter of exposing that logic through appropriate interfaces, whether that is native tools, MCP servers, or direct API integration. The language of the interface call is largely irrelevant to the outcome.

The full conversation covers additional ground on Spring's survival through multiple acquisitions, why events should drive logging, and comparisons between TypeScript and Kotlin. It is worth a full listen for anyone navigating the intersection of established enterprise stacks and emerging AI capabilities.
