
Stop Maintaining Your Code. Start Replacing It
Transcript
[00:00:00] Simon Maple: Back in November, we hosted the first ever in-person AI Native DevCon in New York. This June 1st and 2nd, we are bringing it to London. It's two days built for AI-native developers and engineering teams: one day full of hands-on workshops and one day full of practical talks on agent skills, context engineering, agent orchestration and enablement platforms, and how teams are actually shipping AI in production.
[00:00:34] Simon Maple: Join us at the Brewery in London, near the Barbican, for all of that, plus networking parties, giveaways, and a room full of people building the future of AI native development. You can also join us from anywhere in the world via the livestream. As a listener of this podcast, you get 30% off your ticket with the code CODEPOD30.
[00:00:54] Simon Maple: Just head to AI native devcon.io, and we'll see you in London.
[00:01:01] Guy Podjarny: Hello everyone. Thanks for tuning back into the AI native Dev. Today we are going to go, I don't know if it's far into the future, but definitely looking into the future or the concepts of it, with Chad Fowler, who is a VC today, but don't hold it against him. He's at Blue Yard Capital today.
[00:01:19] Guy Podjarny: He really has this sort of rich technical background on it. Some of you might know him as the person leading Ruby Central at the time when Ruby was all the rage, which feels like a while ago. Very innovative. I think you were a CTO at Wunderlist. I'm not sure how right it is to call that what you sold to Microsoft, and you were a CTO for a variety of departments and areas for Microsoft then, and generally, I think, a fairly well-known thought leader in the space.
[00:01:37] Guy Podjarny: Specifically, for this podcast, what pulled you in is that you've been writing some really interesting blog posts and little storylines and perspectives on the future of software development and where it is headed, in the Phoenix Architecture blog, which is inconveniently posted on aicoding.leaflet.pub.
[00:02:09] Guy Podjarny: Every time, I have to go find it. But really, really good blog, really, really good ideas, which we will talk about today. So Chad, thanks for coming onto the show here and sharing some perspectives.
[00:02:21] Chad Fowler: Yeah, thanks for having me. It's a totally exciting time to be talking about software development, so I want to do as much of it as I can.
[00:02:40] Chad Fowler: Yeah.
[00:02:41] Guy Podjarny: So we're going to talk a lot about the Phoenix architecture, and I want to maybe dedicate some specific focus on what you mean by this.
[00:02:50] Guy Podjarny: But maybe for starters, I know there was a bit of an origin for all of this thinking, and you were thinking about code as disposable and these types of views for a while now. So can I, for starters, just have you tell us a little bit about the origin story of how you came to this line of thinking and how it came about?
[00:03:09] Chad Fowler: Yeah, I guess it started in the early two thousands. Somehow I ended up in this line of consulting and full-time jobs where my job was to euthanise an existing software system. So come find some system that's doing good work but can't keep up with the demands, like the technical demands or the business demands posed to it.
[00:03:36] Chad Fowler: And so I'd come along and kill the systems, and I started getting really tired of that, because I was thinking, "What about my own systems that I create?" Think about how much time, effort, and emotion I put into my work. How can we build systems that can outlive us? Why are we building all these systems that are dying within, like, five years?
[00:03:53] Chad Fowler: It's insane. And so I started thinking a lot about legacy and what that word means, and also biological metaphors for software development. Because I thought, well, I don't take very great care of myself as a human, but I'm still here. How is that possible? We spend all this money on software systems, and they still sort of get corrupted and die and fail under their own weight.
[00:04:12] Chad Fowler: And a couple of key insights came to me, one of which I didn't really act on much over the years. The first one is just about infrastructure, and I started thinking about how we, in the nineties, would be excited about uptime for a given server.
[00:04:30] Chad Fowler: We installed these massive big Sun servers or whatever, and I remember one that had been up for like two years without being rebooted. Terrifying. But back then we were proud of it.
[00:04:40] Guy Podjarny: Yeah.
[00:04:40] Chad Fowler: And I realised that if you want to be able to change things, you need to do it all the time. And so this sort of behavior, whether it be infrastructure or software, would lead to situations where you just couldn't progress the system because you're not practicing change.
[00:04:50] Chad Fowler: So the first insight that I had in this legacy thinking is let's just kill things over and over and over, right, so that we can be used to killing. And I started with infrastructure, and now it's just old hat, the idea of immutable infrastructure.
[00:05:08] Chad Fowler: But never modify a server, always replace it, which of course you couldn't do with physical hardware, but you can do with containers and EC2 instances, which is what we were doing back then.
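The replace-don't-modify rule Chad describes can be sketched in a few lines; this toy `Instance`/`provision`/`replace` model is a hypothetical stand-in for whatever your container or cloud API actually provides:

```python
import itertools

# Toy model of immutable infrastructure: an instance is never patched in
# place. To "change" it, we provision a replacement from the new image and
# retire the old one.
_ids = itertools.count(1)

class Instance:
    def __init__(self, image):
        self.id = next(_ids)   # a fresh identity every time: no pet names
        self.image = image     # the only state an instance carries

def provision(image):
    return Instance(image)

def replace(fleet, image):
    """Roll the whole fleet onto a new image by replacement, not mutation."""
    fresh = [provision(image) for _ in fleet]  # bring up the new instances
    fleet.clear()                              # then terminate the old ones
    fleet.extend(fresh)
    return fleet

fleet = [provision("app:v1") for _ in range(3)]
replace(fleet, "app:v2")
# Every instance now runs v2, and none of the original instances survived.
```

The point of the sketch is that the fleet's identity is the only stable thing; individual instances are disposable by construction.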
[00:05:19] Guy Podjarny: Right, but not even just for the purpose of learning and the agility, but also to know that you can do it; the way to really, truly know that you're able to recover from a server failure is to do it all the time, intentionally.
[00:05:32] Chad Fowler: Right. And why is that? Why is that the rule? It's because if you cannot change the system, it means you don't actually understand the system.
[00:05:43] Chad Fowler: So if you're changing it constantly, you understand what the requirements are. You have to. So the same should be true in software, but it was much more expensive 15 years ago or so to just throw away code all the time.
[00:05:56] Chad Fowler: But I was thinking a lot about what it would mean to have disposable software and immutable code. So as soon as you write one unit of work, whatever that is, you never modify it again. You just throw it away, because the unit of code itself is small enough and bounded enough that you can understand what it's supposed to do trivially, and therefore you can safely replace it.
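The same idea applied to code can be sketched as a minimal registry where a unit of work, once written, is never edited, only replaced; `UnitRegistry` and its method names are invented for illustration:

```python
# Sketch of "immutable code": a unit of work is registered once and never
# edited. Changing behavior means registering a replacement and retiring
# the old unit, exactly as you would replace a server.
class UnitRegistry:
    def __init__(self):
        self._units = {}    # name -> function (the current implementation)
        self._retired = []  # history of thrown-away units

    def register(self, name, fn):
        if name in self._units:
            # never modify in place: retire the old unit, install the new one
            self._retired.append(self._units[name])
        self._units[name] = fn

    def call(self, name, *args):
        return self._units[name](*args)

registry = UnitRegistry()
registry.register("greet", lambda who: f"hello, {who}")
registry.register("greet", lambda who: f"hi, {who}")  # a replacement, not an edit
```

Because each unit is small and bounded, replacing it wholesale is as safe as editing it would be, and the retired history keeps the churn visible.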
[00:06:18] Chad Fowler: So this is the sort of backstory that when you put it against where we are in terms of LLM-generated code, it starts to click and you say, "Oh, okay." And we can go into that in more detail later, but that's sort of the origin story of this thinking.
[00:06:32] Guy Podjarny: No, I love that. And I love the analogy to immutable infrastructure, because I think so many DevOps analogies are helpful, at least to me or to people that have been through the DevOps change, to be able to apply those to AI and try and forecast where it's headed.
[00:06:49] Guy Podjarny: The fact that servers are statistical creatures, and they don't always behave the way we want them, and so we need to deal with a world in which it's not deterministic and to move those. So I love the immutable software, or rather the immutable infrastructure concept, and applying that to immutable software.
[00:07:08] Guy Podjarny: And think, okay, what have we learned about immutable infrastructure that we need to now represent here?
[00:07:15] Chad Fowler: Yeah. There's also, like, as you were talking, I was thinking, there's also this thing back in the day, like when I talked about how proud I was of my two-year uptime on a server, I also knew that server's name, and I knew its IP address.
[00:07:28] Chad Fowler: And I set it up myself physically. I cared about it. And so I was attached to this thing. And when you get to immutable infrastructure, you can never be attached to one running instance of your software stack. I think the same sort of lesson applies to code too.
[00:07:45] Chad Fowler: Yeah. We have all developed, especially developers, this almost fetish-like worship of our code, forgetting that the code is not actually the thing we're trying to build; it's just the implementation of the thing we're trying to build.
[00:07:59] Guy Podjarny: Yeah. Yeah. No, I love that. That's, you know, in DevOps, a whole ethos of cattle versus pets, right?
[00:08:04] Guy Podjarny: You have the ones that you're nurturing as pets, which is what you're describing. And it is a really interesting thought to say, how do you think of code as cattle? You just have plenty of it, and you keep swapping it, which was just not feasible before, just like immutable infrastructure was not really feasible or practical, I guess, except for very unique circumstances, before the cloud enabled it.
[00:08:28] Chad Fowler: Yeah. So, I mean, how would you make it feasible even back then? As I said, I was starting to think about this. The first rule when I was working on Wunderlist was you can do code in any language you want, as long as it's no more than this big. And so I'd literally say this.
[00:08:46] Chad Fowler: So it just means, you mean
[00:08:47] Guy Podjarny: gesturing for a small amount? Just between
[00:08:49] Chad Fowler: Yeah, roughly. Roughly. Like it fits on a page of your editor. So if you have really good eyes, you can have more code, because the font can be smaller. But we actually did this, and we did this sort of radical microservices architecture.
[00:09:02] Chad Fowler: And there were at least a couple of cases where code was disposable because of this. We had consistent calling conventions and just a way of dealing with mutations offline, and we had a way of dealing with REST and CRUD sort of stuff through the system.
[00:09:20] Chad Fowler: And it was all consistent, and all the code was tiny, which meant we were able to replace things. There were two versions of this. For the first one, I had written a service in Haskell, and I was one of only two people who would tolerate even looking at Haskell. Yeah, I loved it.
[00:09:37] Guy Podjarny: Write-only code.
[00:09:39] Chad Fowler: Yeah. Well, yeah, and in fact, in this case, that was the problem. So we had to change this code so infrequently that the build system became stale underneath it. And after we were acquired, we could not recompile the code anymore. It still worked. It was great, but the toolchain had broken underneath us because we didn't have that rigor of constantly changing it.
[00:10:02] Chad Fowler: So there was a problem with it at one point, and someone just said, "Well, we're just going to rewrite this in Go or something." So they rewrote it in like an afternoon and deployed it. It was fine. And the other thing was we had deployed a major release of Wunderlist, which was the crowning achievement of what we did technically.
[00:10:19] Chad Fowler: And when we did it, we had literally hundreds of server instances running these heavy Ruby on Rails servers, because that's what we knew. It was the consistent choice. But then afterward, over the course of three months, we replaced something like 70% of the code base with Clojure, Go, Rust, et cetera, and made it, I think, cost like 25% of what it did at launch to run, because we were able to replace all these systems that were running Ruby, because they were all just tiny little things with consistent calling conventions and easy to plug in.
[00:10:52] Chad Fowler: That was our first attempt at doing this sort of disposable code.
[00:10:55] Guy Podjarny: Yeah. No, and I love that. And I think some of the kind of clear modularity goes all the way back to Java design patterns, just thinking about abstract factories and the creation of those, that at least from a dynamism of the software and the ability to hot load different components will come in.
[00:11:16] Guy Podjarny: And I guess we still think of the code itself oftentimes as the thing that is precious, even if we modularise it within the system. And I guess this is AI land or the AI era now, and so clearly that is now changing. So if you tell us a little bit about the Phoenix architecture, I love, in general, a lot of the storytelling you have on the blog, but some of these visual analogies are helpful and fun.
[00:11:43] Guy Podjarny: So you talk about Phoenix architecture, and you talk about regenerative software. So what's the philosophy behind that?
[00:11:51] Chad Fowler: So if you think about what I was just saying about consistent calling conventions and generally conventions for what the interfaces would be and how you think about CRUD operations on data, and this doesn't just apply to CRUD-based REST apps, but these are all ways of constraining a system so that you understand how the components interact with each other so that the components could then be potentially implemented in any language or framework or whatever.
[00:12:20] Chad Fowler: You still have the idea of how the system works. And one thing I used to always say to the team at Wunderlist is the code that we have is a liability, and the system is the asset that we're building. And the Phoenix architecture takes this and sort of codifies it.
[00:12:39] Chad Fowler: And I also have a not-yet-released, but eventually open-source project called Phoenix, which is the actual implementation of all this stuff I'm talking about. But it codifies it, and it says any software in the system you should think about like it's a build artifact. So imagine you have a Makefile or Ant file or whatever, the dependency-oriented build system thing.
[00:13:02] Chad Fowler: As you know, there's always some top-level asset that you start with. Usually it's code in a high-level language, and you run a thing, and it understands the path, and it can compile all the way through. It can invalidate any object code that's changed if you're talking about a truly compiled system or a VM-type system like JVM, and it can replace just the paths that need to be changed.
[00:13:26] Simon Maple: Mm-hmm.
[00:13:27] Chad Fowler: But the top-level asset that you're building in that case is the Java code or the C code or the Rust code. Phoenix architecture takes the same exact idea, but it says ideally what you should have is some sort of a specification for the system, and then the code you should think about as just an implementation detail or a build artifact.
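The build-artifact analogy can be sketched as a Make-style cache keyed on the spec's hash: code is regenerated only when the spec above it changes. `ArtifactCache` and the `generate` callback (standing in for an LLM call) are assumptions for illustration:

```python
import hashlib

# Sketch of the build-system analogy: the spec is the top-level asset, the
# code is a derived artifact, and we regenerate only when the spec's hash
# changes, the way make rebuilds only invalidated paths.
def spec_hash(spec: str) -> str:
    return hashlib.sha256(spec.encode()).hexdigest()

class ArtifactCache:
    def __init__(self, generate):
        self.generate = generate  # e.g. a model call that emits code
        self.cache = {}           # spec hash -> generated artifact
        self.generations = 0      # how many times we actually regenerated

    def build(self, spec):
        key = spec_hash(spec)
        if key not in self.cache:          # spec changed (or first build)
            self.cache[key] = self.generate(spec)
            self.generations += 1
        return self.cache[key]

cache = ArtifactCache(generate=lambda s: f"// code for: {s}")
cache.build("users can log in")
cache.build("users can log in")            # unchanged spec: artifact reused
cache.build("users can log in with 2FA")   # changed spec: regenerated
```

Only two generations run for the three builds above, because the middle build hits the cache; that is the invalidation behavior being described.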
[00:13:46] Guy Podjarny: Right?
[00:13:47] Chad Fowler: And the reason it's called Phoenix architecture is that you have the metaphor of the Phoenix that's destroyed and rebuilt and rises from the ashes, but the entire architecture is about constantly replacing every piece of the system, but not replacing the system as a whole, because you need that sort of consistent shape and calling convention for interacting with the system generatively.
[00:14:11] Chad Fowler: And so that's sort of the key thing. If we want to be able to change systems safely and rapidly, and we want that to be how we interact with our systems, and they're very change-friendly, then I take a lesson from Extreme Programming, Kent Beck, and all those people in the early agile days: if something's hard, just do it all the time.
[00:14:35] Chad Fowler: So if testing is hard, then do it all the time. If code review is hard or expensive, then just do it all the time by doing pair programming. The idea here is if change is hard, then just build it into the system so you have no choice but to constantly change. Right?
[00:14:50] Chad Fowler: And you can also never actually look at the code. And I guess I should back up and say the goal here is to be able to deploy code in production that was generated by AI of some sort and that humans never reviewed. This would have sounded crazier six months ago. Now a lot of people are doing this, but at scale, in big enterprise systems that are very complex and older, of course you can't do this yet.
[00:15:13] Chad Fowler: So if that is the goal, why would that be the goal? Because that's what's going to happen. We know that's what's going to happen, not because we're looking at the future and at technology trends, but because we know human behavior, and it's way too easy to do that now.
[00:15:28] Chad Fowler: So people are always going to do the easy things. So let's figure out systems that make the easy thing okay to do.
[00:15:34] Guy Podjarny: Yeah, I think it's also the opportunity. It's not just that it will happen; it's also the means of staying competitive is to be able to tap into that and to take the human out of the equation.
[00:15:44] Guy Podjarny: And, you know, as someone who's been saying code is gonna become disposable for a couple of years now, and was kind of dismissed as heresy not that long ago, like sort of two years ago; today it doesn't feel as far-fetched. But let me ask about a few challenges within that world.
[00:16:01] Guy Podjarny: So I think the most blatant one is what is sufficient detail in the spec, and at what point does that sufficient detail just become a different piece of code that you've written? And all you've done now is create some sort of template that's at the top. How would you define the right level of specificity in the spec, and which parts are you happy to delegate to the implementer that is the agent?
[00:16:28] Chad Fowler: That's a good question. And the answer changes weekly, kind of. Since you've been talking about this for at least two years, it probably means four years ago you were messing around with code generation and constantly in the frustration loop, like, "Oh my God, it's magic," and then it's not magic anymore; it fails.
[00:16:46] Chad Fowler: And I think the last time I felt that way was like September or October with the tools available, and now I don't feel that way anymore. But I know it doesn't sound like I'm answering the question, but the reason I'm saying this is there's a whole lot more you can one-shot with very little detail these days.
[00:17:03] Chad Fowler: However, like forgetting about the fact that it just keeps getting smarter and smarter, I think the answer for me is that I want to iterate until I see the thing change, right? So like I'm working on a new IRC network, for example, a reboot of IRC right now. I didn't one-shot it.
[00:17:25] Chad Fowler: I've got all sorts of interesting technical things in it. It's called Freak, by the way. And when I worked on it, I didn't create a spec that had a great level of specificity, but I iterated until I had a spec that was enough. Mm-hmm. And I knew it was enough because I saw the evidence in the software.
[00:17:44] Chad Fowler: Right. So it's kind of like you go back to the early two thousands, and we had this thing called Naked Objects in Java where you just generate Java objects, and it creates the UI for you on the fly. Mm-hmm. And the whole idea was a new agile way of working with customers and software where you just pop things open and show that to them and iterate like that, sitting with the customer instead of going in these long project things.
[00:18:06] Guy Podjarny: Right. Interact with the data model. Yeah.
[00:18:08] Chad Fowler: Yeah. And I think it's the same sort of thing now where you can iterate and iterate. The problem this creates, though, is if you want a system that can survive change over time, you can't just regenerate the entire thing constantly.
[00:18:21] Guy Podjarny: Yeah.
[00:18:21] Chad Fowler: So you need to be able to iterate, but you need to have some way of being very clear about which things are changing. So there are two issues. One, if a user knows how to use an application and it changes every day because you've changed the spec and regenerated it and reimagined an entire UI, that's not okay. Even if the UI is great, even if it's better tomorrow than it was today, users have to actually understand how to use software.
[00:18:47] Chad Fowler: And the other is you just don't want things that are core to the technical foundation of your system to change constantly because they might be wrong. And that's where this level of specificity matters a lot. So an example of this is I'm building this IRC network, so I've got an IRC server that implements the protocol, and then I've got protocol layers on top and another layer, and then I've got clients and client SDKs.
[00:19:17] Chad Fowler: At the very bottom level, I want to get that right quickly and just set it in stone and probably never change it unless there's a mistake or we want to upgrade the IRC protocol. And so the best way to do that would be if there were some set of specifications that we iterate on that we can literally tie through some build graph back to the code that gets generated.
[00:19:40] Chad Fowler: And it never has to change unless the dependency tree changes before it. So we can change the protocol all we want in the beginning, and then we sort of lock it because we never change even the vague specs we created that happened to create a thing that was good enough. This sounds pretty scary.
[00:19:55] Chad Fowler: And then you just work in the layer where it makes sense to do a lot of change. There's a term, and I can't remember the name of the person who coined it, called "pace layers." It sort of codifies this idea of different speeds of change across the layers of a software architecture.
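Pace layers plus the "set it in stone once it's right" idea might look roughly like this; the `Layer` class, its `lock` flag, and the layer names are all hypothetical:

```python
# Sketch of pace layers with locking: each layer of the system carries its
# own change cadence, and a locked layer refuses spec edits (and therefore
# regeneration) until someone deliberately unlocks it.
class Layer:
    def __init__(self, name, spec):
        self.name = name
        self.spec = spec
        self.locked = False

    def lock(self):
        self.locked = True

    def update_spec(self, new_spec):
        if self.locked:
            raise RuntimeError(f"{self.name} is locked; unlock it deliberately first")
        self.spec = new_spec

protocol = Layer("irc-protocol", "wire protocol, v1")
ui = Layer("web-client", "chat UI, iterate freely")

protocol.lock()                            # slow layer: set in stone once right
ui.update_spec("chat UI with threads")     # fast layer: still free to change
```

The slow layer at the bottom only changes when its lock is consciously released; the fast layers above it can churn daily.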
[00:20:13] Guy Podjarny: Yeah, so I've done a fair bit of thinking about this, in the Tessl framework that we've since kind of parked as we focus on Tessl and context management, and we can get a little bit into the delta there.
[00:20:25] Guy Podjarny: We had a lot of these elements, and we talked precisely about that, in that you will create a spec, and the spec can be at whatever level of resolution you reach software that you bless by using it and looking at it and deciding that it is good enough. And then from here you have the question of stable regeneration.
[00:20:46] Guy Podjarny: So when you regenerate it, you want stability. You might have a button, you might not care what color it is, and you're happy to delegate that to the agent, but you don't want a change in colors every version. And so you want some stability there. And so you want that, and the challenge, of course, is figuring out what the aspects are that you want to stabilise.
[00:21:08] Guy Podjarny: Because increasingly those change, like maybe you want the button to be the same color unless it has a good reason to change because you've just changed all of the theme colors, and so the button should also change in that process.
[00:21:27] Guy Podjarny: And we actually introduced, you use the word "lock," we had in the system tests for the different parts of the spec, and you could lock a test to imply it. And so it is very interesting. And I think probably a bit of an unsolved and an interesting interaction model as well of if you were to create these different things, and let's say your information architecture allowed that, what is the sort of user interaction that you would have with these assets?
[00:21:50] Guy Podjarny: And maybe I'll throw in one more, which is I think there's also an element of who made the decision that is more important. If I said something, I, the human in control who is paying the bills and whose neck is on the line, that should be more holy than if the agent had made a decision, and that decision should be more easily overwritten because it was just made at the time.
[00:22:20] Guy Podjarny: So there's a lot to sort of figure out. I don't know if you would anticipate new programming; this is like a frequent question that I get as well. Would you anticipate these things manifesting in new programming language paradigms, or are these just patterns that will eventually be applied by the agents?
[00:22:40] Chad Fowler: They will manifest in programming languages, but I think that's also a step along the way to where it goes, because I think what we're gonna find is that any programming language we create is gonna be trying to solve problems that LLMs can't currently solve with plain language, and then they'll become obsolete.
[00:23:01] Chad Fowler: My first assumption when I started doing stuff with ChatGPT, in the early ChatGPT days, was that we should be using Haskell, or OWL and RDF, like ontologies, and I had all these ideas about static typing being better. It just doesn't matter, actually.
[00:23:20] Chad Fowler: But I think what we need instead are different programming paradigms and patterns that aren't languages. So my work right now on my implementation of this Phoenix architecture is focused on going from raw human specs to refined ones, which are called "clauses" in my system, where you canonise the actual intents, and then you can do hash-based tracking of these intents, all the intents, all the way through cryptographic hashes, which gives you that build-system thing.
[00:23:55] Chad Fowler: But it also gives the ability to add provenance because I absolutely agree. Like capturing intents of the system and capturing who made them and when and why are really critical, both in knowing what to regenerate and also in just figuring out how you would remove things from a system. To me that's the biggest problem because we're generating things at such speed.
[00:24:21] Chad Fowler: You're gonna end up creating absolute masses of garbage. And if the agents did it and not the person, it's probably okay to delete. There are a lot of other factors you have to check. But on my IRC system, Freak, I actually have two parallel web apps by accident. I have a really nice one that I've been iterating with, and then one that the agents just decided they were going to make, and it's linked some places, so I have to remove it.
[00:24:46] Chad Fowler: But at least I know in this case that I made this decision. But I think what's really interesting is the combination of what we were just talking about with specs, et cetera, and then capturing the human interaction with the LLM as the code is being built. I know there's a bunch of work going on here doing this.
[00:25:05] Chad Fowler: And to me that's the important thing: how do we figure out how to map from intent all the way through to object of code, whatever that means, implementation unit in as clear and replaceable a unit as possible? Then we have the ability to do this regenerative software thing for real.
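Hash-based intent tracking with provenance, as Chad describes it, might be sketched like this: each clause hashes its text plus who stated it and why, and the artifact commits to the clause hashes it was built from. The `clause` and `artifact` shapes here are invented for illustration, not Phoenix's actual format:

```python
import hashlib
import json

# Sketch of intent provenance: a "clause" (a refined intent) records who
# stated it and why, and its hash is referenced by the artifact built from
# it, giving a traceable chain from intent to code.
def h(data) -> str:
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()

def make_clause(text, author, reason):
    clause = {"text": text, "author": author, "reason": reason}
    clause["id"] = h(clause)   # content-addressed: same intent, same id
    return clause

def make_artifact(code, clauses):
    # the artifact commits to exactly the intents that produced it
    return {"code": code, "built_from": sorted(c["id"] for c in clauses)}

c1 = make_clause("messages are delivered in order", "human", "protocol core")
c2 = make_clause("use green send buttons", "agent", "default styling")
artifact = make_artifact("...generated code...", [c1, c2])

# Agent-made decisions are the safest candidates to discard and regenerate.
disposable = [c for c in (c1, c2) if c["author"] == "agent"]
```

With provenance attached, "what can we safely delete?" becomes a query over authors and reasons rather than archaeology over generated code.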
[00:25:24] Guy Podjarny: Throughout, I think it's super interesting, and I guess my thesis also would be that there are actually three layers of human condoning or blessing. There's the explicit instruction that came from the human developer. There's the explicit review. I actually read this and said yes, which is a bit tricky because did you really read it?
[00:25:43] Guy Podjarny: I don't know. Like you technically said, "Is this okay? Should I go?" and you said yes. And then there's the no human interaction. The agents just decided those. And yeah, we had a lot of those. We, by the way, refer to them as shadow specs. There's the spec and the shadow spec, which show the decisions you never made but have been made for you.
[00:26:03] Chad Fowler: Yeah, that's a good term. I like it. And I must admit, most of the work that I do, I've got mostly shadow specs with really high-level stupid instructions these days. And I go YOLO, so my review, my explicit review, I did not read it. I scanned through. I'm like, Yeah, that sounds right.
[00:26:20] Chad Fowler: Exactly.
[00:26:21] Guy Podjarny: Yeah. And there's, again, these two levels. I don't know how to name them. They're sort of levels of shadow, like the shadow and the dark spec. I don't know. But the ones that you in theory reviewed and the decisions that were never presented to you, they were just made.
[00:26:32] Guy Podjarny: Yeah. And you could have proactively gone and reviewed, but you didn't, bless.
[00:26:36] Chad Fowler: Asking about programming languages reminds me of a conversation I had earlier this week with a real language innovator. I won't say who it is because I didn't ask if I could, but we were talking specifically about what happens as I take all these stupid ideas I have and try to make them into real code and real projects. One of the first things that I ran into was that I keep seeing consistent calling conventions and ways of dealing with mutations in the system asynchronously.
[00:27:06] Chad Fowler: There's some sort of message bus thing. There are these things that I have in my head for what you need for a Phoenix-type architecture to work. And the immediate thing that I ran up against when I started generating this is, oh, I actually have to choose what language and framework. Let's say I'm going to build a web app. What MVC framework is it going to be in, what programming language, and what runtime?
[00:27:21] Chad Fowler: And I reached the conclusion that I should just decide this for the entire system, for any code that's generated. So we can start with web apps. I'll just decide, okay, what's the best framework for generated code for a web app and AI today?
[00:27:44] Chad Fowler: I think there are two answers to it. One is whatever the LLM knows, so probably two-year-old Next.js or something, React. Not things I like. It makes me sort of ill to say this.
[00:27:58] Guy Podjarny: Yeah.
[00:27:58] Guy Podjarny: But it's whatever it is that has this sort of sweetheart of the weights at the moment.
[00:28:02] Chad Fowler: Exactly.
[00:28:02] Guy Podjarny: Which tends to persist historical popularity.
[00:28:08] Chad Fowler: Yes. Well, now it's really gonna lock in forever, because we're gonna keep regenerating, everything is getting duplicated, and it feeds on itself.
[00:28:14] Guy Podjarny: Yeah. Yeah.
[00:28:15] Chad Fowler: But what really matters now is, I was all into Ruby and Rails and evangelism and all this stuff back in the day, as you mentioned. I would not do that today.
[00:28:25] Chad Fowler: I would not tie myself to such a thing, because what matters is the code has to be able to be correct. It has to be as cheap as possible to run. It has to perform as well as it can. It does not have to be human-friendly. It doesn't have to be trendy. It doesn't even have to have unit tests.
[00:28:44] Guy Podjarny: Yeah.
[00:28:45] Chad Fowler: Yeah. It's okay.
[00:28:45] Chad Fowler: So what I realised is this Phoenix architecture, I don't think you'll decide when you use it what the framework is or the language that it generates. You shouldn't. That should also be a thing that's completely open to change.
[00:29:06] Chad Fowler: So what we'll be compiling to is more like system architectures: a shape of the system that allows us to have pluggable, replaceable components because of the shape of the system, because of the calling conventions and the consistency, and probably some sort of type system.
[00:29:22] Chad Fowler: There will be these sorts of properties, but they won't be about the programming language. They'll be about how this thing runs and how the components interact.
[00:30:18] Guy Podjarny: And I feel like when you work with humans and with people on your team, the best developers that you have on the team, or even yourself and your future self on it, it's not so much that you expect them to truly represent these definitions of your software, right?
[00:30:39] Guy Podjarny: Or the historical decisions that you've made and that need to be blessed and all that, but rather that you expect them to combine intelligence and information, to combine smart decision-making and conclusion-drawing with the right context. We're using a term that I think is good for humans and agents to put the emphasis in the right spots.
[00:31:05] Guy Podjarny: They should know how much you care about finance and budgets versus how much you care about speed and latency. How much you care about functional programming versus how much you care about Ruby, how much you care about security versus just all of these different preferences, and even your business surroundings.
[00:31:30] Guy Podjarny: So it's sort of an understanding of your constraints. And sometimes those constraints may include that it has to be written in this language, right? Maybe if you're building within Microsoft and someone decrees it, well, this has to run on Windows.
[00:31:45] Guy Podjarny: So that's a constraint, and that limits some of your options.
[00:31:49] Guy Podjarny: So I don't know, I really love and still expect a representation of these things to manifest, but I wonder where it is that we, even if we're thinking ahead, are not thinking sufficiently ahead and we still like the idea of a data model that can represent robustness, right, or some semblance of determinism when really it's more about making good decisions.
[00:32:18] Chad Fowler: Yeah. I mean, I guess just a random thought from what you were talking about is I think instead of specs, we need to be thinking about building knowledge graphs, and that probably sounds a little bit too inflated, but knowledge graphs of all of these things, whether it be the values and invariants and intents of the person. The beautiful thing with LLMs is they can actually mine this stuff from you very easily.
[00:32:48] Chad Fowler: Yeah. They can say, "I feel like you're missing this," which is sort of how I did it with the tests. I've been doing this Phoenix architecture so far, and invariants are part of the system, and metrics of success are part of the system. And you're right that there's a deploy target or build target or whatever.
[00:33:06] Chad Fowler: Clearly there's still a Windows app sometimes, and there's still a Mac app sometimes. The way I'm thinking about it, there's probably just a set of patterns that I'll say I create for myself with this Phoenix thing, but everyone's gonna have their own center of gravity of ways of doing this that they adhere to.
[00:33:25] Chad Fowler: But for mine it would be a set of patterns that apply to native macOS apps, native iOS apps for now, web apps, TUI apps for consoles or whatever, server-based things, and even distributed systems, so that I have an architecture that I compile into, which can include things like platform.
[00:33:48] Chad Fowler: For most of the cases for me, I'll be pretty agnostic about platform and language, but then you could have different constraints that you add to the system. Yeah, very hand-wavy abstract stuff because we're not there yet. I mean, the other thing was we are talking about apps that manifest, but maybe there's just a data substrate and apps manifest on the fly.
[00:34:12] Chad Fowler: It could be visual or it could be LLMs, whatever conversation.
[00:34:15] Guy Podjarny: Yeah. There's always still an efficiency element there. And I don't want to challenge it; I feel either way we need these data structures for it. And I think another lesson learned is that you can't be too far out into the future. You have to take it a step at a time.
[00:34:30] Guy Podjarny: And so I think the Phoenix architecture is already fairly dramatically innovative and a bit of a mental switch to embrace for anybody in the current software development world. So even if it is a stepping stone as you think ahead, it is very likely to be a kind of necessary stepping stone.
[00:34:52] Guy Podjarny: I also really like how it enables something else that I think about, which is personalised software, and the idea that because I find increasingly, I don't know if you feel the same, I use a bunch of tools that I really like, and I don't really have a problem with them. I don't want to vibe code an alternative, even if I could.
[00:35:09] Guy Podjarny: It's good. I don't mind paying them a little bit. But what I increasingly get frustrated about is that I can't customise them. I want to add, they're not doing it; they don't have this feature, and I'm annoyed. And it's almost to the level that I was like, "You know what, I'm gonna write one of my own just because I want that feature."
[00:35:22] Guy Podjarny: And I guess I'm imagining a world in which you, the author, can produce a piece of software that has some of these definitions of what you can muck about with and what you cannot, what the essence of this piece of software is, and it has to be embedded.
[00:35:41] Guy Podjarny: And what are the parts where, Chad, if you want to add a different UI, you can go about it a different way? Maybe this is one that we do know we can do because of APIs and all that, but you can modify it. Have you thought about the Phoenix architecture being an actual distributable piece of software that you're sending off to users?
[00:36:05] Chad Fowler: Yeah, I think so. Although it's not that I disagree, the way I've been thinking about this is more from the data model up. And it comes from the local-first community specifically. I think Ink & Switch wrote a good thing called "Malleable Software" about some of this. In the local-first community, it's all about your data being local; it's on your device.
[00:36:31] Chad Fowler: People write applications that interact with that data, and then you have CRDTs for synchronising this stuff. So why wouldn't you have a shared data model across apps? And I'm sort of answering a question you didn't ask, but imagine you had a shared data model of all the sort of common things you need to do.
[00:36:51] Chad Fowler: And anyone could write software that allows you to interact with your shared data models, or it could be your productivity stuff, like meetings and tasks and emails and stuff. If it's not in a cloud and it's not on some private server that you have to deploy to, but rather it's software that you run locally, then manifesting new interfaces to these things is sort of just like an assumption.
[00:37:13] Chad Fowler: Because of third parties, you know, maybe one company writes software that you start using, let's say Linear, which is actually local-first software; someone else could then come along very easily and write a second interface to that same stuff. And Linear can't stop you. They wouldn't want to; that's part of the point of doing local-first software.
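The merge behaviour that makes a second interface possible can be sketched with the simplest CRDT, a last-writer-wins register. This is an illustrative toy, not the API of any real local-first library (real systems such as Automerge use much richer CRDTs):

```python
import time

class LWWRegister:
    """Last-writer-wins register: a toy version of the simplest CRDT."""
    def __init__(self, value, timestamp=0.0):
        self.value = value
        self.timestamp = timestamp

    def set(self, value, timestamp=None):
        self.value = value
        self.timestamp = time.time() if timestamp is None else timestamp

    def merge(self, other):
        # Merge is commutative, associative, and idempotent, so any
        # number of apps can sync in any order and still converge.
        winner = self if self.timestamp >= other.timestamp else other
        return LWWRegister(winner.value, winner.timestamp)

# App A (the original client) and App B (a third-party interface)
# edit the same local task title offline, then sync.
a = LWWRegister("Write report", timestamp=1.0)
b = LWWRegister("Write Q3 report", timestamp=2.0)
merged = a.merge(b)  # both apps converge on the later edit
```

Because convergence is a property of the data, not of any one application, the second interface needs no permission from the first.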
[00:37:33] Chad Fowler: And maybe I'm just an old stupid software developer that's still chasing this idea of reusability and the semantic web and everything. But you know what? If we could now think industry-wide, or at least in pockets of consortia, about doing local-first as a default and having shared data models by default, schemas, JSON Schema, or whatever it is, schema.org.
[00:37:59] Chad Fowler: That is so that new vibe-coded apps just talk to this stuff, and there's an assumption that it's there. I think this is really interesting. The other idea of constraining which parts are malleable is interesting, although I don't see why there should be a limit to which pieces of the software you should be able to change.
[00:38:22] Chad Fowler: Though maybe from an IT department perspective, I do see it, because someone calls and says my software doesn't work because I changed it. That's a bad thing. Yes.
[00:38:32] Guy Podjarny: But part of it is not so much the constraint about what you're allowed and not allowed to change. At least a portion of it is just that if you've invested in Wunderlist, you've done this deep thinking about choosing the right functionality or visuals or means of it, or the things that will drive the correct kind of to-do list handling.
[00:38:52] Guy Podjarny: And so I don't want to lose that. I just want to add this feature, so it's more like de-risking me. When I added my sort of "export to my own social network" button that didn't exist on it, because I wanted to extend it, can you help me preserve Chad's brain when I get the software from you and just modify it?
[00:39:13] Guy Podjarny: Big software does this, right? Like CRMs and all of those, you would customise them for your organisation. I just find it interesting to think about.
[00:39:19] Chad Fowler: Yeah, it is a great idea because one of the things that we all, when you start talking about regenerative software, the immediate pushback is you can't just regenerate the whole thing. The users won't know how to use it.
[00:39:28] Chad Fowler: So it's true that if you could define even pace layers within the UI, maybe this panel and this panel or these types of panels are things that are always extensible; it makes sense. And you're right, it's like a lesson from the nineties or eighties; this has been a thing in enterprise software forever.
[00:39:47] Chad Fowler: It's usually not because they want to preserve the experience, because the experience is usually not very good, but everyone needs to customise them.
[00:39:54] Guy Podjarny: To add constraints. And I love the idea of pace layers, and to be clear, I'm a huge fan of reusability. I think even if you get to the point in which LLMs and agents can create everything from scratch with a very high reliability, that is so massively inefficient.
[00:40:09] Guy Podjarny: I want to create a to-do app, and it needs to build a kernel. Why would it do that? And familiarity between different systems, consistency within enterprise systems, and just mutual learnings; when there's a bug, I don't want everybody to have all of these problems.
[00:40:28] Guy Podjarny: So
[00:40:29] Chad Fowler: Yeah, the efficiency thing is a really interesting one. I was talking a couple weeks ago with one of the founders of Radicle, which is this open distributed GitHub sort of replacement. He is now working on something called Radiant Computer, and it's like a ground-up operating system that he's building.
[00:40:50] Chad Fowler: And he's thinking a lot about LLMs, and he wrote a piece about how you can reduce the surface area of how you interact with something, let's say an operating system, to the benefit of the efficiency of LLM use. Because part of what he's doing is everything is private by default, and you can run local models, et cetera.
[00:41:11] Chad Fowler: Of course, you're dealing with local power constraints and the fact that the models aren't quite as good as the frontier models.
[00:41:17] Chad Fowler: And it's an obvious thing to reach, but somehow there's this beauty in the insight that if you start with an open playing field and then you create this constrained language that actually expresses what the system's supposed to do, and I think this is a version of what you're talking about, then it becomes a very easy thing for the LLMs to interact with.
[00:41:35] Chad Fowler: And you can do this over time with every system where it starts with this open field and then it becomes, it congeals. Maybe this is what programming languages are in the future. You asked about new programming languages. I don't know.
[00:41:50] Chad Fowler: There's another guy, Guy Steele, who is a well-known programming language researcher. He did an amazing talk at OOPSLA in, like, '98, I think, called "Growing a Language," that every generation I reintroduce to people because it's always relevant somehow.
[00:42:13] Chad Fowler: But the talk starts with him defining a subset of English from scratch that he can use for the talk. And it's all this big lesson on building up vocabulary and building up foundations so you can then be expressive. And maybe that's what we end up evolving toward, because in this agent, this rapid change, rapid software creation environment, we should be able to create tons of reusable vocabulary like this.
[00:42:41] Chad Fowler: And we should be able to easily differentiate between what's reusable for me, what's reusable for my organisation, and what's reusable for the world. And the ideal state is that we figure out markets for this or ways of sharing this stuff.
[00:42:54] Guy Podjarny: Yeah, absolutely. And we've gone through all of that type of thinking as well at Tessl in the early days.
[00:43:00] Guy Podjarny: Like, create an HTML-to-Markdown converter for me. Sounds like a simple spec. Which HTML do you mean? Which Markdown do you mean? Is it to be loaded in a browser? Which browser?
[00:43:22] Guy Podjarny: And these things rot over time. Like, however you interpret it today, LLMs will interpret the thing from two years ago differently. Well, I guess maybe it stays the same because it's persisting itself, but over time the definition...
[00:43:30] Simon Maple: Hey everyone, hope you're enjoying the episode so far. Our team is working really hard behind the scenes to bring you the best guests so we can have the most informative conversations about agentic development.
[00:43:42] Simon Maple: Whether that's talking about the latest tools, the most efficient workflows, or defining best practices. But for whatever reason, many of you have yet to subscribe to the channel. If you are enjoying the podcast and want us to continue to bring you the very best content, please do us a favor and hit that subscribe button.
[00:43:58] Simon Maple: It really does make a difference and lets us continue to improve the quality of our guests and build an even better product for you. Alright. Back to the episode.
[00:44:07] Guy Podjarny: So I want to ask you about another set of posts that you have there. There's more about evaluations and durability. And so if everything is regenerated, and as we've been talking here, that's not always ideal, and there's some accumulation of learnings and preferences and distributions.
[00:44:28] Guy Podjarny: You have a statement in one of these posts that says evaluations are the real code base. I guess, how do you think about durability? What is it that, as I build IP or as I build anything, persists across?
[00:44:46] Chad Fowler: Yeah. So I do say this thing about evaluations, and it might be an overloaded word where it sounds like I'm talking about some modern definition of evaluations in the context of LLMs or ML.
[00:44:58] Chad Fowler: And I'm not, although I also am; I'm really talking about invariants, metrics, constraints, and all these things that we were just talking about. You know, to take a stupid example just because it'll be easy to talk about: let's say you have, I don't know, an order record in a database, and you're just talking about the CRUD operations for orders in an e-commerce system. There would be correctness requirements.
[00:45:28] Chad Fowler: Obviously there might be side effects that you expect when an order is placed, and you can measure the fact that they happen. There might be speed constraints, there might be memory constraints or IO, or all these sorts of system-level things. And to me, this set of stuff is kind of why I was talking about knowledge graphs earlier versus specs.
[00:45:53] Chad Fowler: Because I think you have to create this knowledge graph, which may be pretty inscrutable, and we probably need tools for dealing with this too.
[00:46:00] Chad Fowler: But it captures all this intent that's both business intent and technical intent. Now in my old naive way of doing things, which is still the way I would do it for now.
[00:46:12] Chad Fowler: I would always talk to my teams about focusing on mean time to resolution of problems versus mean time between failure, which was another way of saying, "Don't worry so much about testing everything upfront, but let's just make sure everything is really well instrumented so that we can easily roll back."
[00:46:32] Chad Fowler: And so my way of changing software is YOLO in production, just to overstate the point.
[00:46:39] Guy Podjarny: Yeah.
[00:46:40] Chad Fowler: And then immediately respond to the live metrics and the real state of the system via intense observability at all layers, including business and technical. I think there's probably some aspect of that still, at least for now, in a system that's going to be changing rapidly like this.
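The order example from a moment ago hints at what the leaves of such a knowledge graph might look like: business and technical intent captured as executable checks rather than prose. This is a hypothetical sketch — every name here is invented, not taken from Chad's project:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Intent:
    description: str
    kind: str                      # "correctness" | "side_effect" | "performance"
    check: Callable[[dict], bool]  # predicate over an observed outcome

@dataclass
class UnitOfWork:
    name: str
    intents: list = field(default_factory=list)

# Business and technical intent for placing an order, as data.
place_order = UnitOfWork(
    name="orders.create",
    intents=[
        Intent("order is persisted with a positive total", "correctness",
               lambda o: o["order"]["total"] > 0),
        Intent("confirmation email is queued", "side_effect",
               lambda o: "confirmation_email" in o["effects"]),
        Intent("responds within 200 ms", "performance",
               lambda o: o["elapsed_ms"] < 200),
    ],
)

def violated(unit: UnitOfWork, outcome: dict) -> list:
    """A regenerated implementation is acceptable only if this is empty."""
    return [i.description for i in unit.intents if not i.check(outcome)]

ok = {"order": {"total": 42.0}, "effects": ["confirmation_email"],
      "elapsed_ms": 120}
```

A regenerated implementation can then be judged against the recorded intent rather than against the code it replaced.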
[00:46:52] Chad Fowler: The other thing is you can create tests, and you can do it both before and after the fact. And you can do it really rapidly with AI now. So I think there's a thing where, and you were talking about locked tests, I think, in your way.
[00:47:13] Guy Podjarny: Yeah, because of tests. Otherwise you don't know what to trust. You take some specs, you create code, and you create tests. They disagree. Who's right?
[00:47:20] Chad Fowler: Exactly. Yeah. My way of thinking about it right now is there's at least a layer of tests that everything is locked.
[00:47:30] Chad Fowler: There is a differentiation between locked and not locked. Of course, I'm not writing tests anymore. I'm not writing any of the code. But I'm explicitly talking to the agents about the purpose of the tests, and the purpose of the tests is based on our current understanding of the system.
[00:47:46] Chad Fowler: Like in my IRC system, for example, I will go through runs where I have it just look for all of the possible edge cases we might not have thought of for a certain class of bugs. And I get multiple models multiple times to do this. And then I review it, and I have it create tests that discover bugs, and then I have it work until the tests run all the time.
[00:48:09] Chad Fowler: And I keep this layer locked all the time. So that's sort of the way I'm thinking about evaluations now. I think probably it goes further when we can collectively spend more time on specs to canonise requirements into invariants and this sort of chain, because that gives you something to canonise and reason about in a more automated way. But right now, I think just this sort of metrics-plus-tests approach is the way to go.
[00:48:36] Guy Podjarny: I think I am a big believer in evaluations. I think in the first year, if you will, I was thinking a lot about tests as the things that persist.
[00:48:48] Guy Podjarny: And you'll have tests; you will lock tests. This is the behavior of the functionality. Increasingly in my current worldview, I think more about making good decisions. And so the tests that I want are more the tests of my agent's behavior as opposed to the tests of the system that has been built.
[00:49:06] Guy Podjarny: And I guess I perceive that to be a higher leverage point because you are making the developer that you want versus making the system that you want. And the developer that you want is actually much more powerful than a system, because they can create the system that you want at a fairly high fidelity.
[00:49:36] Guy Podjarny: And what's interesting is the same concept that you said right now is true though. There's still an element of you having to capture and aggregate what the evaluations are that represent what you want. And you probably have to accumulate them, and over time you need to curate them, which is still something we don't know how to deal with yet.
[00:49:56] Guy Podjarny: You know, we have to, but at least the evaluations speak in slightly higher-level terms, like: when this happens, the developer behaves like this. But there is still a volume challenge of how many of those you assess. But then, once you get that, the instructions, the specs, actually are malleable themselves.
[00:50:20] Guy Podjarny: They also change with the models, right? Like the way that you would instruct Haiku is different, effectively, than the way you would instruct Opus. And we see already that, you know, Opus is very creative, but it's a bit of a smartass, and it doesn't follow instructions. And, you know, Codex is less creative, but it is more obedient, you know, in that sense.
[00:50:40] Guy Podjarny: And, you know, Gemini has its own strengths, and just each of these things is different. And so when you think about these, if you really continue down the same rabbit hole and you continue, you move from programming languages to programmer language, right? It's the language with which you instruct the agent versus the language with which you define the system.
[00:51:01] Chad Fowler: That's really an interesting approach. And I feel this, like I switched from Opus to another model trying to arbitrage credits, yeah, a week ago. And I had a very frustrating day until I woke up the next morning and realised, "Oh, I changed the model.” That's why it feels like, you know, 2025 again. I blamed the model.
[00:51:22] Chad Fowler: I probably should blame myself for the way I interact with the model. However, um, that actually sort of makes a point, like when you said try and make the best programmer. I think this is a really cool approach, and it's absolutely the opposite of the way I think about it. I just assume the programmers are both idiots now and going to be geniuses later, and they're going to progress at different paces.
[00:51:44] Chad Fowler: Depending on the model. And so I lean toward trying to create a system of code generation, and maybe that's the programmer, I don't know, but it's not an intelligence, you know. But a system that allows me to plug in the most idiotic LLM here and the genius here and still have some results that I feel I can trust, but also decouples me from thinking about the models underneath.
[00:52:02] Chad Fowler: They're just gonna keep changing and getting better. Of course, eventually all this stuff that we're talking about is gonna be irrelevant and obsolete, 'cause the models are just going to-
[00:52:16] Guy Podjarny: There's a different point in the future, which is, you know, having kind of a more intelligent conversation by a couple of AI models than what we're sort of having here.
[00:52:25] Guy Podjarny: But yeah, and I think you need both, because I think amidst the team, you need some form of definition of the output of that, you know, intelligent programmer, one that captures the definition of your system and makes it easier for the programmer to be, you know, less intelligent, or even just to capture the alignment on what the system does, and, you know, make that increasingly LLM-friendly, so it's easy for them to understand and kind of apply and represent instructions.
[00:52:49] Guy Podjarny: And then you separately need the instructions of the developer. And I think to me, context engineering is about that, right? And I think we started writing, and we'll probably talk more about the context development lifecycle, and I think the agents will handle the software development lifecycle.
[00:53:09] Guy Podjarny: And we need developers to sit in the context development lifecycle as we develop context. And the context really is the operator's manual, right? It is about how the agents should behave and what your intents are. It'll be interesting to see the interplay there.
[00:53:23] Chad Fowler: A friend of mine, Jess Martin, is working on something called Context Library, which is a way of automating building up context of all of these decisions that you make.
[00:53:34] Chad Fowler: With the goal being kind of like what you're saying, run agents all night while you're asleep, and they make decisions based on probably what you would've said because they have enough context in the system built up in a way that's usable by them.
[00:53:46] Guy Podjarny: Yeah, I see a lot of that. And then I do wonder, the reason I say context development, I think it's a different view on it, although time will tell which one is right, which is there's this spontaneous, you know, observe me and understand what I do.
[00:53:55] Guy Podjarny: But then what I do today is not the same as I would've done a year ago, let alone 10 years ago. And so, like, the alternate approach is that context is a thing you develop. It is not a point in time. And we all know, like at one point during the LLM kind of iterations, all the rage was learning my code base.
[00:54:21] Guy Podjarny: And then we learned it's like there's a bunch of stuff in my code base that I do not want replicated. In fact, yeah, most of the stuff in my code base I do not want replicated because it's been accumulating garbage, and garbage in, garbage out. And so it's not that it's not useful, but you want to develop your sort of practice.
[00:54:38] Guy Podjarny: So it's interesting. I guess it's a good point to sort of try and conclude the conversation at that point in the timeline. So what is your thesis? Like you're building, put aside the guide for the programmer, even aside the switch from current-day software to immutable, regenerative, you know, Phoenix architecture-style software.
[00:55:03] Guy Podjarny: Like, what do you think is practical, or what is your anticipation in terms of the kind of path of adoption for software actually changing to be working like this in the industry?
[00:55:14] Chad Fowler: I think it will be both really quick and incredibly slow. And the slow part is the realisation we all need to make that we have systems that have been in development for decades and no one understands them.
[00:55:29] Chad Fowler: And most of us nerds out on the frontier are not thinking about them so much yet. I think some of us are starting to. I think if I were to go start a company now, I would be doing the company this way today. So, you know, the greenfield code base right now is the answer. And I know some people who are doing this to varying degrees, but I also know of development shops in fairly forward-leaning companies that are barely touching AI at all, and they're resistant.
[00:55:55] Chad Fowler: There's a huge portion of the developer population that's afraid or just not paying attention. You know, just like when Java happened, they didn't want to do Java; they didn't want to do Ruby and Python, you know, and, you know, they just want to have fun and go to work as little as they can.
[00:56:20] Chad Fowler: That is the truth about most people in their jobs, which is fine. That's, you know, humans are humans. Mm-hmm. But organisations are backward; organisations are slow. Most people don't know how to deal with this, and they're gonna be looking at large consulting companies to try and figure it out. So I think it's a long, long time before there's a mass adoption of AI-generated software up and down the stack through, you know, the largest enterprises.
[00:56:38] Chad Fowler: Um, but there's a lot of interesting work that should be happening probably in the coming year or two, because so many of us and companies I'm talking to, you know, from my perspective as a VC, are having these same conversations about, like, trying to capture intent and what is a spec versus intent and where can intent come from.
[00:57:05] Chad Fowler: You could mine it from corporate email systems for, you know, the last several decades. You've got all that stuff captured. You could mine it from source control systems and project management, so work is being done to figure out the legacy intent of this legacy system so that you can then start automating, maybe like a strangler fig pattern breaking pieces of an application off.
[00:57:28] Chad Fowler: But yeah, I think it's gonna be like insanely fast on one side and surprisingly slow on the other side.
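The strangler fig pattern Chad mentions can be sketched as a thin router that migrates one route at a time from the legacy system to regenerated code. Handlers and routes here are made up for illustration:

```python
def legacy_handler(path, request=None):
    # Stands in for the decades-old monolith nobody fully understands.
    return f"legacy:{path}"

def regenerated_orders(path, request=None):
    # The first piece carved off and regenerated from recovered intent.
    return f"new:{path}"

# Routes claimed by the new system grow over time; everything else
# still falls through to the legacy system untouched.
MIGRATED = {"/orders": regenerated_orders}

def route(path, request=None):
    handler = MIGRATED.get(path, legacy_handler)
    return handler(path, request)
```

Each newly recovered piece of legacy intent just adds an entry to the routing table, so the monolith shrinks without a big-bang rewrite.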
[00:57:35] Guy Podjarny: Yeah, I think that's good; it's almost always the right answer to any diffusion of technology. One of the things that I do feel, and maybe this is on the Tessl company-building-ethos side, is that one of the reasons we've kind of focused on this sort of "train the developer" versus "define the system" in the offering.
[00:57:58] Guy Podjarny: You know, there's a whole bunch of reasons. Some of this stuff needs to, you know, be open source, needs to sort of be part of the agents. You know, there's a variety of reasons, but part of it is that if you train the programmer, then you're sort of simulating the human behaviors, right? Like in these ancient systems that are so fragile, there are still humans that are operating in there, and they're making decisions, and those decisions are, by and large, these companies are still alive, right?
[00:58:18] Guy Podjarny: Like they are growing; they're modifying it. And so there's some logic by which they're making this work. And so it felt like, even though conceptually, as we talk about it, it's a higher layer, and maybe a further future to think about, training the programmer versus training the system actually has, in some aspects of it, a more short-term applicability.
[00:58:45] Guy Podjarny: Yeah. Because that better programmer can help you do better on the systems, even if it doesn't fix the fragility or it doesn't make those systems, you know, be able to rise up from the ashes.
[00:58:55] Chad Fowler: Yeah, I agree. I mean, I think, you know, it's like what I said about everything's gonna be obsolete that we're talking about at some point.
[00:59:00] Chad Fowler: I think there are waves where we will solve these problems in the way we're talking about today, and it's totally valid, and it will be useful for a long time. But the industry's just gonna go so fast. We'll think, "Oh God, what we're talking about now is silly, but it's still useful, but it's maybe not the future anymore," you know?
[00:59:18] Chad Fowler: The beautiful thing about today is we can go so fast; we can deal with all these different phases, and we can have roadmaps for our companies where we just say, okay, we'll get to that when it's a problem. Okay, now it's a problem, boom. Next week we've got a whole new release. You know, it's like a miraculous time to be a software developer as well as a terrifying one. But yeah, I think we get to be like steering through this crazy time. It's really exciting.
[00:59:46] Guy Podjarny: Yeah. Yeah. Definitely sort of fasten your seatbelt, but it is an exhilarating ride. Chad, this has been super, super fun, you know, fascinating, future-looking, you know, sort of deep on it, and I think right in many ways.
[01:00:00] Guy Podjarny: So I'm really looking forward to the open-source version of it and to hear more about this development. I don't know if you want to say, but if people are keen to find you to invest in their sort of, you know, invention of the sort of future of software development, where can they find you?
[01:00:16] Chad Fowler: Probably Blue Sky is the best place. I'm chadfowler.com on Blue Sky.
[01:00:20] Guy Podjarny: Sounds good. So thanks again for coming onto the show.
[01:00:24] Chad Fowler: Thank you.
[01:00:25] Guy Podjarny: And thanks, everyone, for tuning in, and I hope you join us for the next one.
In this episode
"The code that we have is a liability. The system is the asset we're building."
Chad Fowler, VC at Blue Yard Capital and former CTO at Wunderlist, sits down with Guy Podjarny to discuss the Phoenix Architecture: software designed to be replaced rather than maintained.
In this episode:
- why the code Chad wrote was never longer than a page
- how he replaced 70% of a codebase in 3 months and cut costs by 75%
- shipping AI code no human ever reviewed, and how to make it safe
- the shadow specs your agents are making without you
- why your system should work with the worst LLM, not just the best
If you're still thinking about your codebase the old way, this one will change that.
Phoenix Architecture: Why Code Should Rise from the Ashes
The idea of treating code as disposable once seemed heretical. Now it feels increasingly inevitable. In a recent episode of the AI Native Dev podcast, Guy Podjarny sat down with Chad Fowler, a VC at Blue Yard Capital with deep technical roots going back to leading Ruby Central and serving as CTO at Wunderlist and Microsoft. The conversation explored what happens when the same principles that transformed infrastructure management get applied to code itself.
The core thesis: if immutable infrastructure taught us to constantly replace servers rather than nurse them along, perhaps immutable software should teach us to constantly regenerate code rather than accumulate it.
From Immutable Infrastructure to Immutable Code
The origin story begins in the early 2000s, when Chad found himself repeatedly euthanising software systems that had become too fragile to change. The pattern was consistent: systems that worked well could not keep pace with new demands because the teams maintaining them had lost the ability to modify them safely.
"If you want to be able to change things, you need to do it all the time," Chad explained. "This sort of behavior, whether it be infrastructure or software, would lead to situations where you just couldn't progress the system because you're not practicing change."
The first insight led to immutable infrastructure, now a standard practice: never modify a server, always replace it. The second insight, which remained theoretical until recently, was that the same principle should apply to code. As soon as you write one unit of work, you never modify it again. You just throw it away.
This was not practical when humans had to write every line. But with LLM-generated code, the economics shift dramatically. The question becomes not whether this will happen but how to make it safe when it does.
Specs as the Real Source of Truth
The Phoenix architecture, both a philosophy and an in-progress open-source project, treats code as a build artifact rather than a primary asset. Just as a Makefile takes source code and produces compiled binaries, a Phoenix system takes specifications and produces running software. The code itself becomes intermediate, disposable, regenerated whenever the spec changes.
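The Makefile analogy above can be sketched in a few lines. This is illustrative only, not part of any real Phoenix tooling: `rebuild` and the `generate` callback (which stands in for an LLM-backed code generator) are invented names, and the stamp-file dependency check mirrors how Make decides when a target is stale.

```python
# Hypothetical sketch: code as a build artifact derived from specs.
# When the spec's fingerprint changes, the old artifact is discarded
# and regenerated, just as Make rebuilds a stale target.
import hashlib
from pathlib import Path

def spec_hash(spec_path: Path) -> str:
    """Fingerprint a spec file so we only regenerate when it changes."""
    return hashlib.sha256(spec_path.read_bytes()).hexdigest()

def needs_rebuild(spec_path: Path, stamp_path: Path) -> bool:
    """Makefile-style dependency check: rebuild if the recorded
    fingerprint is missing or no longer matches the spec."""
    if not stamp_path.exists():
        return True
    return stamp_path.read_text().strip() != spec_hash(spec_path)

def rebuild(spec_path: Path, out_dir: Path, stamp_path: Path, generate) -> None:
    """Throw away the previous artifact and regenerate it from the spec.
    `generate` is a placeholder for an LLM-backed code generator."""
    code = generate(spec_path.read_text())
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "generated.py").write_text(code)
    stamp_path.write_text(spec_hash(spec_path))
```

The point of the sketch is the inversion it makes concrete: the spec is the tracked input, and the code directory is as disposable as a `build/` folder.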
"The code that we have is a liability, and the system is the asset that we're building," Chad noted, echoing something he used to tell his teams at Wunderlist. The architecture codifies this: specifications capture intent, and everything downstream can be regenerated.
This raises the obvious question of what constitutes sufficient specification. At what point does detailed spec writing just become a different form of programming? The answer appears to change weekly as models improve. What required elaborate specification six months ago can now be one-shot from a vague description.
The practical approach involves iteration: start with rough specifications, see what the system produces, refine until the output matches intent. Then lock that layer and work on the next. Different layers change at different speeds, a concept borrowed from architecture theory called "pace layers." Protocol implementations might lock early and change rarely. User interfaces might regenerate frequently.
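One minimal way to make the pace-layers idea operational is to tag each layer with whether it is locked and how often it may churn. The field names and the weekly budget here are assumptions for illustration, not drawn from any real Phoenix implementation:

```python
# Illustrative only: layers tagged with regeneration policy, loosely
# following the "pace layers" idea. Locked layers never regenerate
# automatically; fast layers get a generous churn budget.
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    locked: bool              # locked layers are never auto-regenerated
    max_regen_per_week: int   # churn budget for unlocked layers

LAYERS = [
    Layer("protocol", locked=True,  max_regen_per_week=0),   # changes rarely
    Layer("domain",   locked=False, max_regen_per_week=1),
    Layer("ui",       locked=False, max_regen_per_week=20),  # churns freely
]

def regenerable(layer: Layer, regens_this_week: int) -> bool:
    """May this layer be regenerated again this week?"""
    return not layer.locked and regens_this_week < layer.max_regen_per_week
```

A policy like this gives agents an explicit answer to "am I allowed to rewrite this?" instead of leaving the decision implicit.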
The Challenge of Accumulated Decisions
If code regenerates constantly, what persists? The conversation surfaced several categories. Explicit human instructions have the highest status: decisions a human developer made intentionally and articulated clearly. Explicit reviews come next: a human looked at something and approved it, though the reliability of that review varies. Finally, there are shadow decisions: choices the agent made that were never presented for review, simply executed.
"I actually have two parallel web apps by accident," Chad admitted about one of his projects. "I have a really nice one that I've been iterating with, and then one that the agents just decided they were going to make." The provenance of decisions matters for knowing what can be safely removed.
This connects to a broader challenge in context engineering (https://claude.ai/blog/context-engineering-guide): capturing not just what the system does but why, who decided, and when. Cryptographic hashes can track intent through transformation. Knowledge graphs can capture the relationships between business requirements, technical constraints, and implementation choices.
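The three categories of decision above, plus the hashing idea, can be combined into a simple provenance record. Everything here is a hedged sketch: the category names paraphrase the conversation, and the `Decision` structure is invented for illustration.

```python
# Sketch: record where a decision came from, and chain it to the spec
# text that motivated it via a digest, so intent survives regeneration.
import hashlib
from dataclasses import dataclass

# Paraphrased from the episode: intentional human instruction,
# human-approved output, and unreviewed "shadow" choices.
PROVENANCE = ("human_instruction", "human_reviewed", "shadow")

@dataclass(frozen=True)
class Decision:
    summary: str
    provenance: str
    spec_digest: str  # sha256 of the spec text this decision derives from

def record(summary: str, provenance: str, spec_text: str) -> Decision:
    if provenance not in PROVENANCE:
        raise ValueError(f"unknown provenance: {provenance}")
    digest = hashlib.sha256(spec_text.encode()).hexdigest()
    return Decision(summary, provenance, digest)
```

With records like these, "what can be safely removed?" becomes a query: anything whose only provenance is `shadow` is a candidate for regeneration or deletion.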
Evaluations as the Durable Layer
In a world where code regenerates, tests and evaluations become more important, not less. They represent the invariants that any generated implementation must satisfy. "Evaluations are the real code base," as one of Chad's posts puts it.
This does not mean traditional unit tests, necessarily. The conversation distinguished between tests that agents should generate to verify their own work, which can be numerous and disposable, and tests that capture core invariants about correctness, performance, and behavior. Those core tests should lock and rarely change.
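The distinction can be made concrete as an acceptance gate: locked invariants that any regenerated implementation must pass before it ships, separate from whatever disposable tests the agent wrote alongside the code. The invariants below (a sorting unit used as a stand-in) are invented for illustration:

```python
# Sketch: locked invariants as the durable layer. Any regenerated
# implementation of the "sort" unit must pass these; the agent's own
# scaffolding tests are thrown away with the code they checked.

def invariant_sorts_correctly(sort_fn) -> bool:
    """Behavioral invariant: output is sorted for representative inputs."""
    cases = [[], [1], [3, 1, 2], [2, 2, 1]]
    return all(sort_fn(list(c)) == sorted(c) for c in cases)

def invariant_is_pure(sort_fn) -> bool:
    """Correctness invariant: the input list is never mutated."""
    data = [3, 1, 2]
    sort_fn(data)
    return data == [3, 1, 2]

def accept(sort_fn) -> bool:
    """Gate a regenerated implementation on the locked invariants."""
    return invariant_sorts_correctly(sort_fn) and invariant_is_pure(sort_fn)
```

The invariants encode intent ("sorted, without side effects"), not implementation, which is exactly why they can outlive any particular generated version of the code.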
There is also a metrics dimension. If you practice YOLO deployment but maintain intense observability at all layers, you can respond to real system behavior rather than trying to anticipate every failure mode in advance. Mean time to resolution matters more than mean time between failures when change is cheap and fast.
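For readers less familiar with the two metrics, the arithmetic is simple. The incident timestamps below are made up, and the functions are just the textbook definitions:

```python
# MTTR vs MTBF, with incidents as (detected, resolved) minute pairs.

def mttr(incidents):
    """Mean time to resolution: average of (resolved - detected)."""
    return sum(resolved - detected for detected, resolved in incidents) / len(incidents)

def mtbf(incidents, window_minutes):
    """Mean time between failures: operating window / failure count."""
    return window_minutes / len(incidents)
```

When regeneration makes fixes cheap, driving `mttr` down is often more valuable than squeezing a few more minutes out of `mtbf`.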
The Programmer Versus the Program
An interesting divergence emerged in the conversation about what developers should focus on: training the programmer or defining the system. One approach captures intent in specifications that describe what the software should do. The other focuses on evaluating and shaping agent behavior so that agents make better decisions across all the systems they touch.
The approaches complement each other. Specifications capture what a particular system needs. Agent evaluations capture how a particular development organisation wants decisions made. Both feed into what might be called a context development lifecycle that runs parallel to the traditional software development lifecycle.
The implication is that developers increasingly work on context rather than code. They develop specifications, refine agent instructions, curate evaluations, and review generated output. The code itself becomes something agents handle.
Adoption Patterns
The trajectory appears to split dramatically. Greenfield projects can adopt these approaches today. Some development shops are already treating code as fully regenerable, never looking at the implementation details as long as tests pass and the system behaves correctly.
Meanwhile, most enterprises have legacy systems that no one fully understands, teams resistant to change, and organisations that move slowly. Work is happening to mine intent from existing systems: extracting specifications from code, email archives, and project management systems. The strangler fig pattern might apply, gradually replacing pieces of legacy systems with regenerable components.
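The strangler fig pattern mentioned above reduces, at its core, to routing: requests go to regenerated components where they exist and fall back to the legacy system otherwise. The route names below are invented for illustration:

```python
# Minimal strangler fig sketch: a router that prefers regenerated
# handlers and falls back to the legacy system for everything else.

def make_router(regenerated: dict, legacy_handler):
    """`regenerated` maps paths to new handlers; anything unmapped
    still flows to the legacy system until it too is replaced."""
    def route(path: str, request):
        handler = regenerated.get(path, legacy_handler)
        return handler(request)
    return route
```

Migration then becomes a sequence of dictionary entries: each regenerated component claims one more path, and the legacy system's surface area shrinks until it can be retired.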
"It's gonna be like insanely fast on one side and surprisingly slow on the other side," Chad observed. The gap between leading-edge practice and mainstream adoption will likely persist for years, even as the leading edge continues accelerating.
For teams watching these developments, the practical starting point may be simpler than redesigning entire architectures: begin treating code as more disposable than precious. Practice throwing away agent-generated work and regenerating it. Build the muscle memory for change that makes more dramatic shifts possible later.
The full conversation covers additional ground on local-first software, malleable applications, and the interplay between model capabilities and specification requirements. Worth a listen for anyone thinking about where software development practices are headed.