
Everything 100 Episodes Revealed About AI Native Dev
Transcript
Simon: Back in November, we hosted the first ever in-person AI Native DevCon in New York. This June 1st and 2nd, we are bringing it to London. It's two days built for AI native developers and engineering teams: one day full of hands-on workshops, and one full of practical talks on agent skills, context engineering, agent orchestration and enablement platforms, and how teams are actually shipping AI in production.
Simon: Join us at The Brewery in London, near the Barbican, for all of that, plus networking parties, giveaways, and a room full of people building the future of AI native development. You can also join us from anywhere in the world via the livestream. As a listener of this podcast, you get 30% off your ticket with code POD30. Just head to ainativedevcon.io and we'll see you in London.
Simon: Hello, and welcome to a milestone episode of the AI Native Dev. My name’s Simon Maple, one of the hosts of this show. And joining me is
Guy: Guy Podjarny. I’m the co-host, Simon’s co-host and sidekick overall.
Simon: And today's a milestone episode because, do you know Guy, we've already shot and released 100 episodes of the AI Native Dev.
Guy: That’s insane.
Simon: How does that feel?
Guy: Like, I don’t know how we get anything else done. I know, right? Just sort of a, all the time just sort of coming here and then talking to interesting people, which actually is not a bad reality.
Simon: Yeah. We’re a podcast company, right? And we do this other thing on the side called
Guy: It’s the fallback. It’s the safety net.
Simon: Yeah.
Guy: I don’t know how safe it is, but it’s interesting.
Simon: Amazing. And so this episode is really going to look back at the previous 100 episodes, and we’re gonna take sound bites from some of the most amazing speakers that, like you say, we’ve had on the show.
Guy: Yeah. So it’s gonna be about 45 hours, this.
Simon: Something around there. Yeah, it'll definitely be a week at least, so make sure you've got a long drive ahead. And we're gonna add our commentary and also see how things have changed. For example, let's run the first clip, Guy, because I think we're gonna find this very, very interesting. Okay.
Thomas: Anyone who’s predicting what that looks like in a year or two is going to be fundamentally wrong. And I think that that’s really what makes it so exciting. We don’t know where we’re going other than that we do know that we are going to keep building software while maintaining, you know, billions if not trillions of lines of code that are already out there.
Simon: Thomas Dohmke, of course, the former CEO of GitHub and now, I believe, founder of a new startup in the AI space called Entire. So if anyone's gonna be fundamentally wrong about a prediction in one to two years, let's go back, maybe one to two years Guy, to episode one, where you predicted something around AI development.
Guy: Yeah. I never claimed to be too wise.
Simon: No, no, no, no, no caveats now, Guy. Let’s hear what you have to say and then we’ll see how accurate you were.
Guy: I think the biggest change that we should expect, that I feel fairly convinced will happen, is the move from code-centric to spec-centric development. When you think about how you develop software today: you get some requirements, you write some code. You make a hundred decisions in that code that never leave the code. And the code very quickly becomes the source of truth, and we focus on evolving it and growing it. I think in the world of AI native development, we will have a means of specifying what it is that we need, and AI will provide the implementation.
Simon: So Guy, tell us your new take. How accurate were you?
Guy: Yeah, I mean, I think some things were spot on, in that I think today it’s a much less controversial statement to say that, you know, code will become disposable and that implementations will be, you know, something that is just kind of created for it. And so you will revolve around intent, you know, around what it is that you wanna build. So I think it was not far off on that bit.
Simon: That bit was good. Tell me about the bit that’s fundamentally wrong, as Thomas would say.
Guy: I think the primary change, I'd like to think evolution, in my thinking, in our thinking here at Tessl, is this move from spec to context, right? When we recorded that episode and talked about spec-centric software, this was pre-agents; we talked about how you would convey what it is that you wanna build, and then the LLMs would build that and fill in the gaps. And while that is not wrong, it's true, LLMs need to do that, it's just such a small part of development. When you think about your dev team and what you expect of them, you don't really say: hey, make sure that every time, you read this doc and you follow the exact instructions and you update it. Sometimes you want them to do that, but generally you want them to make good decisions, including choosing when to update documents and read them. And that's just a portion of what you want from a great developer on your team. You want them to have all sorts of preferences around how they build, how they collaborate, how they test and make sure it didn't break, how they balance quality and speed, how they use your existing infrastructure, how they troubleshoot issues. All of that is really about training the developer. And so I guess my catchphrase today is that we've moved from spec'ing the program to spec'ing the programmer.
Simon: Mm-hmm.
Guy: It's much more about how you want a developer, a great developer in your team, to behave. And again, there's a portion, I don't think spec-centric software, or spec-driven development as is the term today, goes away. I think that is important. I just think it is a subset of how you want today's agents to build. And more importantly, when you think about the new software development, it should revolve around training these AI agents to build the way you want them to build.
Simon: Such a good way of putting it. And one way that resonates with me is onboarding. And it’s like if you were to hire an individual to work on your team, you would spend time onboarding them, training them. They could be the most senior developer you’ve hired, but they still need to know how your team works, how your organization works, and take on those policies and ways of working to actually use their skills in the new environment. And there’s no reason why we shouldn’t do that with an agent as well. And it’s that onboarding of that agent, very, very nice.
Guy: Yeah, absolutely. And I would add, it's like onboarding and then continual training. You want to make sure that, as a group, you learn things. Some of it is you individually learning, but a lot of it is the group learning, the team learning, and you wanna make sure you accumulate that and retrain the team. So both of those are human analogies, right? About how you want people in the team to work: you want them to onboard, and then you want them to keep with it, adapt and evolve and be aligned with the rest of the organization and how they build. So we need to build that. All of that today manifests in the form of context engineering. And I guess the unit of software today, or the unit of context that is most discussed today, is skills. So I'd say if you take the clip, you can swap the word spec for skill. It'll still be natural language, it'll still be a capture of intent. But it is not a program spec, it is a programming spec: how do you want to develop software here?
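To make the swap concrete: a skill in this sense is just a small natural-language file that an agent loads alongside a task. Here is a minimal sketch in Python that scaffolds one, assuming a Claude Code-style SKILL.md layout; the path, frontmatter fields, and rules are illustrative, not a fixed standard.

```python
# Minimal sketch: scaffold a "skill" -- a natural-language instruction file
# an agent loads alongside a task. The path, frontmatter fields, and rules
# below are illustrative assumptions, not a fixed standard.
from pathlib import Path

SKILL = """\
---
name: code-review-style
description: How we want agents to review pull requests in this repo.
---
- Prefer small, reversible changes; flag anything touching auth or billing.
- Run the existing test suite before proposing a fix; never weaken a test.
- When you change a public API, update the docs in the same pull request.
"""

def scaffold_skill(root: str = ".") -> Path:
    """Write the skill file where an agent harness could discover it."""
    path = Path(root) / ".agent" / "skills" / "code-review-style" / "SKILL.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(SKILL)
    return path

if __name__ == "__main__":
    print(f"wrote {scaffold_skill()}")
```

Note the content is a programming spec, not a program spec: it says nothing about what to build, only about how.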
Simon: So it was quite a subtle thing that you were fundamentally wrong about.
Guy: I was fundamentally wrong. I had to be fundamentally wrong about something. Yeah.
Simon: Yeah.
Guy: Otherwise we wouldn’t be learning anything.
Guy: And it would be boring, Simon.
Simon: I mean, we can agree, you and I can agree that you were fundamentally wrong about that. So that’s fun. Okay, next up, you know, Thomas didn’t just mention that you were fundamentally wrong about something. He also talked about the changes that developers need to go through. Let’s hear from Thomas.
Thomas: Simon Willison on his blog recently had a statement, I'm paraphrasing, but it was: we have to get comfortable with not reviewing every single line of code before we deploy it into production, because of the sheer amount of code that a single agent can write, 24/7. Now you parallelize that into running 10 agents in parallel all the time; a human is never going to be able to review all the code without becoming the major bottleneck and effectively erasing some of the productivity gains from these agents.
Simon: Yeah, very interesting take there from Thomas, and one actually supported by a couple of other folks on the podcast. Peter Guagenti, from episode three way back, also talked about a potential future whereby agents might actually write code that isn't even human readable, so maybe something close to bytecode or machine-style code. And Birgitta, from Thoughtworks, in episode 69 said she has run into situations where she has done so much, almost vibe coding, allowing the agent to write the code, that she feels so disconnected from it that she isn't necessarily the most appropriate person to do the code reviews.
Guy: Yeah, super interesting. Fully aligned with Thomas's view in general: the human is the bottleneck. There are all sorts of similar statements in the AI world. Elon Musk had a somewhat weird spreadsheet analogy for it, saying if you have a gazillion spreadsheets and just a few cells need to be filled in by humans, you've reduced it all down to human speed. And I think that's what we're seeing with code review. If humans are the bottleneck, then you get human traits like review fatigue, but you also just see slowness, right? This is the human speed that is limiting how much AI can code and build. And rather than just moving the bottleneck, I think this will actually extend further. Once you get through review and you deploy, now the problem will be that the system changes too quickly and we can't do observability and root cause analysis and handle those. We can't figure out that a bug needs to come back and be resolved, and then resolve it and deploy the fix. So really, throughout the end-to-end software development lifecycle, our aspiration really has to be, and I think we're seeing this in the advanced organizations, to really agentify every single step of that process.
Guy: And I think that is the only way to move forward. And when we think, okay, but these things are unreliable, right? You can't rely on them to build the right thing. It comes back to context. At Tessl we talk about the context development lifecycle. I think agents should operate the SDLC, and the human should live in the context development lifecycle, the CDLC. They should really be a guide. And I think the human analogy is really a manager. At the moment we're like the frontline manager, the player-coach: you're giving the tasks, you're monitoring, and you're getting things done. And increasingly, if you want to increase your scale and have more employees, in this case AI employees, or AI labor, that is getting things done, then you need to become a second-line manager and a third-line manager. And those really predicate on your ability to define what correct looks like, to convey instructions, to identify the mistakes that were made and be able to resolve them. So you need to move up the stack. And the way you communicate with agents is through context. It's not just context in the narrow sense: it's the tools you make available to them, it's the foundation to be able to observe what has happened. And so that's why we want to think about a context development lifecycle. We think about generating some piece of context, and actually, we talked about this in one of those episodes as well, knowing what you want is actually hard.
Guy: And so you figure out what it is that you want, what is good, what is correct behavior. Then you want to be able to test and evaluate, this is maybe the equivalent of testing humans, but in a slightly more software form: tests to know whether these instructions are well adhered to. So we think about evals for it. You wanna be able to disseminate and communicate to the horde of agents, so that you have the right communication to the right agent at the right time, and then you wanna observe what has happened. And once you've observed, you will have learned a bunch of things. Also, a bunch of facts in the world will have changed: maybe some code has changed, maybe a pricing model came out. And you want to learn from that, come back, and update your instructions. All of that is what we think of as the context development lifecycle. So I think that is very aligned with this destination. And it's interesting, I like my crystal ball, you know, I like these fundamentally wrong predictions and looking ahead. But one of the things that I find most interesting about the way AI has evolved is that it's oftentimes about solving the next problem, then the next problem, then the next problem. I think the next problem to solve right now is code review, because we're failing there. I think subsequently we'll get into those deployment sides.
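The lifecycle Guy describes has a natural loop shape: generate context, evaluate it, disseminate it, observe, learn, repeat. A minimal sketch of that loop; every function here is a hypothetical stub named after a stage, not a real Tessl or vendor API.

```python
# Minimal sketch of the context development lifecycle (CDLC):
# generate context -> evaluate -> disseminate -> observe -> learn -> repeat.
# Every function is a hypothetical stub named after a stage.
from dataclasses import dataclass, field

@dataclass
class Context:
    instructions: list[str] = field(default_factory=list)
    version: int = 0

def generate(ctx: Context, intent: str) -> Context:
    ctx.instructions.append(intent)            # capture what "correct" means
    ctx.version += 1
    return ctx

def evaluate(ctx: Context) -> float:
    # Run evals: how well do agents adhere to these instructions?
    return 0.9                                 # placeholder adherence score

def disseminate(ctx: Context, agents: list[str]) -> None:
    for agent in agents:                       # right context, right agent
        print(f"sync v{ctx.version} -> {agent}")

def observe(agents: list[str]) -> list[str]:
    # Collect what actually happened: merged PRs, incidents, drift.
    return ["pricing model changed upstream"]  # placeholder finding

def learn(ctx: Context, findings: list[str]) -> Context:
    for finding in findings:                   # fold learnings back in
        ctx.instructions.append(f"account for: {finding}")
    return ctx

ctx, agents = Context(), ["coder", "reviewer", "sre"]
ctx = generate(ctx, "prefer boring, observable designs")
if evaluate(ctx) > 0.8:
    disseminate(ctx, agents)
ctx = learn(ctx, observe(agents))
```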
Simon: Because one of the biggest worries is, of course, that the quickest, cheapest short-term fix is just to let things happen, and for developers to go: if I'm the bottleneck, I can eliminate myself from being the bottleneck by just rubber-stamping, yeah, this is fine, or not reviewing at all. So it's about how we can speed things up and continue at the pace that AI wants to generate at,
Guy: Yeah.
Simon: While doing things accurately and properly. Adding that context in almost takes things away from the developer, so that the developer, the human, can really look at the things that are most important, rather than, as code changes scale up, just trying to do things faster or learning how to code review faster. That doesn't make sense. Adding this context, which can take a lot of that away from us and let us be that second-line manager, to understand it, to manage it, makes a lot more sense.
Guy: Yeah, and I think the cloud analogy works pretty well here, which is: when we were in the waterfall era, a lot of these processes and a lot of these reviews were human. And as we moved into cloud, we had to automate them, right? We had to define policy as code, that's sort of the meta name for it: infrastructure as code to deploy your infrastructure, automated testing, and dev QA became a thing, it wasn't a thing before. So we actually went through a fairly similar shift when we moved to cloud: things are moving faster, the bottleneck is the human, let's automate this action of the human.
Guy: And now we're doing another leap. And it was mighty uncomfortable at the time to say, what do you mean, I'll automate this and I will deploy it and I will deploy more bugs. But time has shown that if you close the loop, then yes, you might even deploy more bugs, but because you're able to identify them, fix them, and roll the fixes out, it's well known today that the systems we're continuously deploying are more secure and higher quality. But you have to go all in. If all you've done is remove the gate and say, just let it flow, then you end up with a system that is more broken. So a lot of that cloud learning actually applies here substantially, and you have to think about the end-to-end operation. It is happening faster than in cloud, so what we don't have is the benefit of trying it out and seeing it work. I have one more small comment, just on what Peter said about non-readable code, which is interestingly something that is now starting to surface as a safety concern.
Guy: There are actually some significant call-outs saying that a safety requirement should be that LLMs are not allowed to produce things that are not human reviewable, like binary code or their own languages, even if it is more efficient, as part of being able to supervise agents. So that's interesting. I don't know if it's an exaggerated concern or not. But I think right now, more likely in the prediction realm, we will produce readable output, readable code, and then it'll get compiled.
Simon: So following on from code review, next up we have Annie Vella, a distinguished engineer at Westpac NZ. She wrote a really interesting blog, and I had a discussion with her around the future of developers. Let's hear from Annie.
Annie: There are literally people who enjoy going through those sorts of LeetCode exercises. There’s a reason that website exists, right? There are leaderboards of people who can solve the most complex problems the fastest, so that, I don’t think will be very important anymore. That’s a skill that a lot of software engineers have taken to be the core of being a really strong software engineer. That is a skill I think we’re gonna need to learn to let go of, both in interviews and personally. It just won’t make much sense anymore. Why would you need to be an expert at writing that level of code when an AI can probably generate something good enough for you?
Simon: Really important point, and I think it’s something that most of the industry is aligned on. But it’s possibly one of the things that we are least comfortable giving away, because it’s a thing that we enjoy.
Guy: Yep.
Simon: It's part of what excellence is, or is seen as, in development, in programming: to write the nicest code, the cleanest code that solves the deepest, most complex problems.
Guy: Yeah. And be fast about it.
Simon: And be fast about it. Is this gonna be a problem for developers adapting in the future?
Guy: I mean, I think for some, for sure. Yeah. I like the mental model that says some developers, I’m generalizing here, but some developers are more motivated by the craft of coding, while others are more motivated by the impact, right, or by the creation. And so I think if you are motivated by the creation side then, you know, agents are amazing. I mean, you can produce so much, shortening that journey from idea to execution. Even if you were a very, very fast human developer, it still takes you that time. Meanwhile you could have thought of other things, right?
Simon: It’s the means to the end of the coding versus
Guy: Exactly. So the fun is having an idea and bringing it to life and tuning it and orienting it, and even making some architectural decisions and things like that, around what is elegant and what's not. But writing the code, stepping through that journey, is not the goal. While for others, writing the code is the craft, you know? And I do think there's beauty in the architecture of correct, well-crafted, short, simple code. All of those are
Simon: Well-abstracted Java classes and interfaces.
Guy: Exactly, yeah. And also in modern languages. But there's beauty in it, you know. So there's probably room for it from an aesthetic and skill perspective, but it's becoming a little like factory-manufactured furniture versus handcrafted furniture, right? Handcrafted, you can create things that are very local, that might even better represent very specific preferences, very specific decisions that are almost hard to convey in any way that isn't code. But for the vast, vast majority of furniture, you're totally comfortable with it being factory made, being built by machines. And I think that is true for code as well, and for increasingly sophisticated and complicated code, where the advantage of doing it by hand is so minuscule compared to the advantage of being able to produce it quickly, cheaply, by more people. Think about the loss of knowledge when you have one person with the idea, and they need to convey it to a person, who conveys it to another person, to eventually get to someone who writes the code. Dropping some of these middlemen is just so much more valuable. So the reality is, I think there will be room for handcrafted code in some niches, in some very specific spots, some artisanal element. And it can remain a hobby, just like today there are a bunch of hardware hobbyists; it's okay for that to happen. But I think as a profession, you gotta let go.
Simon: So do you think the best developers five years ago will continue to be the best developers in five years? What do you think the key skills need to be for a successful developer?
Guy: It's a good question. I don't know what the percentage is of people that are the best developers because of their ability to craft code versus the best developers otherwise. I'd say that even throughout the years, the best developers were the ones that truly understood the problem to be solved and the architectural trade-offs to be made, and then saw the code writing as a pass-through: they knew how to translate the English into code quickly. So I think it's the same best developers, but if you were to compare different developers, for some of them the strength of code writing was a bigger part of that mix, and that will diminish. There's another element, which is just speed of decision making. Writing code is slow, and because it is slow, you actually get to think about the problem and these architectural decisions as you code.
Simon: Hmm.
Guy: And I think that is an interesting thing that is lost a little bit when you are coding quickly, because you need to decide more things upfront. And that’s hard because sometimes you don’t need to do it. I see the pattern that is emerging to try and tackle that is, okay, as you get the agent to code, be explicit around architecture, even if it’s not you who made the decision but the agent. Have the agent be explicit about the architecture. Review that, and don’t be afraid to tell the agent, scrap all of that and build again. So this ability to rebuild.
Guy: Yeah. Is really more important than the ability to build. How do you capture that? It slightly comes back to context; in this case maybe it is sometimes specs and architecture. It's a specific type of context to say: this is what we've built, these are the emphasis points about how we built it. Now remove the whole thing and write it again in, you know, Java, Rust, whatever, per your requirements.
Simon: And I love that thought. As coding becomes something very fast, and actually taken away from us slightly, we do lose that time to think as we're building. But what we gain is fast iteration, which allows us to get feedback faster and throw away quicker. We can throw away, and use that feedback to build something better, with real, stronger knowledge of what works and what doesn't.
Guy: Precisely. Which, by the way, the recent episode with Chad Fowler about regenerative software and the Phoenix architecture, yeah, is really all about that.
Simon: Amazing. So next up, let's jump from dev to ops, via DevOps, with Armon Dadgar.
Armon: You might hire the world's best SRE, who knows Terraform and cloud inside and out. But when they join your organization, they don't know anything about your organization. They don't know: do you run Windows in production, RHEL in production, Ubuntu in production? Now, they might be power users of all of those things, but knowing that, hey, as an organization, we standardized on RHEL, that's a critical piece of context you need to be a useful SRE at the org, right? You don't wanna hire someone where the rest of my stack is Windows and all of a sudden this person's writing Linux for me; it doesn't help me, right? They need to understand how the organization works, what we have standardized on, what my existing infrastructure looks like, to generate things that are contextually relevant to my environment.
Simon: And you can tell from your youthful looks there Guy, that was back in episode 19. Armon, of course, the co-founder of HashiCorp. So yeah, context is the difference between great AI and useless AI. And it’s not just development, it’s generally AI, right?
Guy: Yeah. First of all, context is probably the most abused word right now; it can mean anything you need to know. I think the meta perspective there is to separate intelligence from knowledge. You can be massively intelligent, but what do you know? And there's a bunch of things you need to know to operate in an org. I really liked that episode with Armon, which was early on, about DevOps. We also had a recent one with Mirko, the founder of Dash0, a modern observability play, also talking about how context is important from an observability perspective, for the ability to troubleshoot and do root cause analysis in the world of DevOps. Again highlighting that the key is context. I think ops is an interesting place, because think about who in your team carries the pager and is able to troubleshoot things, or the superhero SREs in some organizations, the ones that will be called to firefight a really bad production outage. Those people have a very broad understanding of the system, right? They know the types of applications, the patterns they're deployed in, the infrastructure they're running on, all the way from the operating system to the types of scaling, historical outages, known weaknesses within the infrastructure that have not yet been prioritized. There's all this knowledge, and because of it, they can look at a piece of current data, like this system crashed or those requests are slow or whatever it is, and quickly translate that into: oh, it's probably this problem, and therefore probably this solution.
Guy: And so I think context is very, very important, because oftentimes the production system is where it all comes together. It is the infrastructure you've deployed, it is the systems you are connected to, it is of course your code and your application and what you're seeing in it. So definitely, in the world of DevOps, context is important, and any player on the planet will talk about context when it comes to DevOps. And what we're seeing is that root cause analysis is the capability manifesting as probably the most useful in the DevOps world: being able to look at the vast amounts of data that exist about how your system operates, which is a strength of AI and agents, and understand it. Combine that with context on the organization, so you're not just blindly looking at the logs, you actually know more of that information, and then you go off to understand it. And the advantage of root cause analysis is that it's kind of harmless. We're still seeing very cautious use of AI, which was already the case back in episode 19 with Armon, for actual ops activity. People are still very uncomfortable with an agent deploying a system, slightly more comfortable with it writing some Terraform code.
Simon: Which is another interesting piece, because everything you talked about up until the Terraform code is really pulling sensitive information, but information that has come out of your ops infrastructure, whether it's troubleshooting, pulling data and logs, and things like that. But when we talk about the creation of enterprise Terraform code, that's not just something you find on someone's GitHub account, right? For a developer, there's typically tons of code that the LLM can train on, and it will provide you with good suggestions and good alternatives to what you're trying to do. But is there really the right training data there for an agent or an LLM to use when generating enterprise-quality, enterprise-grade Terraform scripts, for example?
Guy: Yeah, I think there was definitely a holdback in the early days. There's still a decent amount out there: there's a bunch of public modules and things like that, and there's good documentation it can follow. And eventually, declarative languages like HCL, the HashiCorp configuration language, or Rego and such, are a little bit easier. They're still probably behind; there's a lot of commentary about Rego, the Open Policy Agent language, and agents not being great at creating it. So there's probably room to improve, but I have seen substantial improvements on it from various users, for probably two primary reasons. One is there is some data aggregation that happens as these things are actually used in the real world, so there's more data accumulated from private sources. Not all of it can be trained on right away, but there are still signals that can be learned. And the second is the ability to create synthetic data. The labs are increasingly relying not just on public data but on synthetic data, and I do think that infrastructure as code has the same attribute that regular code has, sometimes even more so: you can generate something and you can check if it actually works. You can deploy it and see whether the result is correct or not. So that becomes a kind of data generation exercise. But still, that is one limiting factor: the public-private split, and maybe the price of generating it. Another big concern or limitation is the ability to deal with time series data. When you analyze observability data, the sequence of things that happen is very important, and LLMs are a little bit more meta thinkers. You've seen a variety of responses: we had Ollie from Datadog on the podcast talk about Toto, their time series model, and you see some startups in this space, Traceable for instance, that actually look to create models, or at least fine-tune RL models, around that. And then on top of all of that, there's just sheer price. When you're producing a lot of data, you're not gonna flow it all through Opus and be happy, you know, until it hits your bank account. So you need to be smarter, and there's much more of a tendency to use open and cheap models. So I'd say overall, context is key in DevOps as much as it is anywhere else, and maybe even more so. And I still think the DevOps world, and we've mentioned three episodes here plus a few other references, ends up being a little more cautious, mostly because of some of these technical limitations, but also because of the blast radius if they get it wrong. What are the horror scenarios we hear about agents running systems? They dropped the database in production, right? Those are very hard-to-recover-from setups. So it's that 'let's solve code review first' dynamic, kind of evolving from there. But it's still making progress.
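That generate-then-check property is easy to picture as a loop around the Terraform CLI. A rough sketch: generate_hcl() stands in for a hypothetical model call, while terraform init and terraform validate are real commands used as the checker.

```python
# Rough sketch: treat infrastructure-as-code generation as a generate/verify
# loop. generate_hcl() is a hypothetical model call; `terraform init` and
# `terraform validate` are real CLI commands used as the checker.
import subprocess
from pathlib import Path

def generate_hcl(prompt: str, feedback: str = "") -> str:
    # Placeholder for a model call; returns trivially simple HCL here.
    return 'resource "null_resource" "example" {}\n'

def check(workdir: str) -> tuple[bool, str]:
    for cmd in (["terraform", "init", "-backend=false"],
                ["terraform", "validate", "-no-color"]):
        r = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
        if r.returncode != 0:
            return False, r.stderr or r.stdout
    return True, ""

def generate_until_valid(prompt: str, workdir: str, tries: int = 3) -> bool:
    feedback = ""
    for _ in range(tries):
        Path(workdir, "main.tf").write_text(generate_hcl(prompt, feedback))
        ok, feedback = check(workdir)
        if ok:
            return True  # valid config; a sandboxed plan/apply is the next gate
    return False
```

A sandboxed terraform plan or apply would be the stronger next gate, which is where the synthetic-data angle comes in: every checked configuration is a potential training signal.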
Simon: You mentioned meta thinking. Let's switch, this is so seamless, Guy, to Ian Thomas, who's a meta thinker because he works for Meta. He has some thoughts about the adoption of AI within the Meta organization, and I caught up with him at QCon AI in New York. To set this up: Peter Guagenti, who's also a friend of mine, mentioned a stat from Gartner saying that 10% of enterprise devs use AI tools today, but by 2027 that number hits 80%. Let's see what Ian thinks about that.
Ian: We had a few people that were really keen and experimenting and they were finding value of these tools outside of work. And then we had a bunch of really senior engineers who were perhaps a little bit more skeptical. And then gradually over time we’ve seen adoption grow. Last time I checked, we were over 80% weekly active users.
Simon: 80%, over 80% weekly active users within Meta, which is a little bit biased perhaps, given how deeply involved they are with AI. But over 80% weekly active users within Meta using AI dev tools today. Does that surprise you?
Guy: It doesn't, but I feel like I have a few grains of salt to throw into that statement. One is that this is a world of haves and have-nots. What you're seeing is this big polarity between organizations that have embraced and invested, with leadership that is both bought in and able to execute on it, and businesses that are in a state to push the use of agents, which is oftentimes disruptive and expensive, it's not an easy thing to do; those organizations are ahead. And we're seeing a lot of other organizations, we engage with a lot of them here, where you'd say, this organization uses agents. But when you pierce through and talk to these orgs, there's a cohort of forerunners that are embracing agents, and some of them are actually very sophisticated in how they use them, but the vast majority of the organization is still very nascent; everybody's using code completion and the like. So anyway, that's one. I think there's a difference between the 10% stat, which was also about a year earlier.
Simon: Oh yes.
Guy: But between that stat and the 80% stat, there's the difference between organizations, and Meta, while a large organization, is a very AI-forward, very driven, very technology-centric organization. The second thing is, I don't think we yet have good enough metrics to say what it means that you're using agents. And I see that the majority of developers today that are using agents are using them for smaller tasks. They're still very supervisory. So they're delegating, but they're delegating small capabilities; they give very detailed instructions, like, write this piece. They use it for things that are immediately useful, like learning the use of an API, or building things that are very repeatable. They build small things with it.
Simon: What are your thoughts on measuring adoption by looking at an organization's spend on models, or tokens used? More and more people are doing that these days.
Guy: Yeah. There's a very controversial view there, right, of whether you should basically require people to spend more on the models. I think it's a very short-lived thing. I do see it today; people talk about that measure as just a means of driving adoption, and everybody that says it says it with a bit of an apology. In part because the models are actually getting very expensive, and one of the competencies of being good at using agents is knowing when you need to use Opus and when you can use a smaller model, or an open model. Those are actually part of the strength, right? And when do you give the right instruction so the agent doesn't need a thousand turns to get it right, but can actually get it done in one go. So there's actually an efficiency element. The metric that I like the most, and which maybe has a shelf life for a bit, is the notion of merged pull requests. And some people, Lemonade is a good example, actually introduce a complexity measure on top of the pull requests. So for starters, just assess the rate of pull requests, the merge rates. Fine, maybe these are small pull requests, but as long as you've established a point in time and you're seeing the trend moving up, that is, I think, a better measure for developers: how many pull requests have you opened, and how many were merged? And from that point on, even if you cheated at the beginning and got to some small pull requests because you just started pushing things that way, you can only do that a little bit at the beginning, and then you still have to accelerate that number over time. So I think that measure is probably a slightly better one.
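For what it's worth, the merged-pull-request measure is easy to pull from a hosting API. A sketch against GitHub's public search endpoint; the repo name, token, and date window are placeholders, and any complexity weighting would need diff stats on top of this raw count.

```python
# Sketch: the "merged pull requests" adoption metric via GitHub's search API.
# Repo, token, and date window are placeholders; per-PR complexity weighting
# (as Lemonade reportedly adds) would need diff stats on top of this count.
import requests

def merged_pr_count(repo: str, since: str, token: str | None = None) -> int:
    q = f"repo:{repo} is:pr is:merged merged:>={since}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    r = requests.get("https://api.github.com/search/issues",
                     params={"q": q, "per_page": 1}, headers=headers, timeout=30)
    r.raise_for_status()
    return r.json()["total_count"]

# Trend, not absolute: compare windows to see whether the rate is moving up.
# print(merged_pr_count("my-org/my-repo", "2026-01-01"))
```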
Simon: So next up, after thinking about a potential future where 80% plus of enterprise developers, and developers generally, are gonna be using AI in 2027, we are gonna switch to Birgitta. In episode 69 she talks about how AI amplifies everything: good engineering practices, bad engineering practices. It amplifies them equally, whichever it sees. Let's hear from Birgitta.
Birgitta: It amplifies indiscriminately, right? So if you have like a bad setup, you might just amplify that bad setup, right? And if you have a good one, everything might go well, right? Can you actually, in your organization, with your pipelines, with your processes, can you support an increase in throughput? I think a lot of organizations are still underestimating this foundation that you need.
Simon: Really interesting, because it means you can't just put your cowboy outfit on and create software. You still need to use good practices. You need to guide, and have the right context, to get an agent to do what you want, the way you want it done. And that's really the crux of what Birgitta's saying, right?
Guy: Yeah, I agree. I really like the idea of AI as a multiplier and amplifier. It actually reminds me of one of the early episodes we had with Des, one of the co-founders of Intercom, who talked, even in the support world, about garbage in, garbage out. If you were to say, hey, just learn from my knowledge base and replicate it, you quickly learn that your knowledge base has a whole bunch of outdated stuff and things that are no longer correct. It's the same for your code base if the agents just replicate the code base: a lot of the things you've done before in your code base, you don't actually want to repeat. So you really have to be mindful of that. And I think we've gone through that phase a little bit in the AI world, in which the thinking was: we will just learn from your code and we will replicate what development does. You don't actually hear that much anymore. Today it is a lot more about context engineering and about defining what the correct behavior is to build out. Then you want to codify that and be consistent. And even if the agents are building things from scratch, you might want them to be explicit about the decisions they've made as they built, so that future versions of that agent are able to build on them. So there are more and more of those types of definitions of what good looks like, of history, of decisions, being built out.
Guy: I still think there's a gap, which makes sense just with time, around looping the learning back into that. Saying: fine, you've made these decisions; what would you see in the deployed system that would make you think, well, that decision was maybe not very good, or it's no longer correct? Can I go and adapt? So we still have to evolve that.
Simon: And also, I guess, what you need to change as a result. Does that decision come from context that needs to be changed, or is it a human decision that needs to be changed? It's about pulling feedback or data in and understanding what the root cause of that issue was.
Guy: Yep. Yeah, I agree. And actually, it's interesting to think about also making the migration to the change, and making that incremental. I saw a recent post from Intercom about how they manage skills and all that, and they have a test generator or something that tests their systems as part of software development, and when it finds a bug, it looks for that pattern elsewhere in the system, and then it modifies it in those areas as well. I don't know how thoroughly and autonomously they do that, but I think that type of pattern has to be there. It has to be a self-healing, self-improving system. And it all comes back to that: the whole end-to-end SDLC has to be handled by the agents. What we need to really focus on is what correct behavior is, what the mechanisms are, and what things we need to review versus what can be made autonomous around modifications to that context. And then, how does the context meet the code? When you modify the context, how does that take effect? And it's interesting, because what we're seeing today is we are moving so quickly, but nobody has substantial mileage with this yet. So people haven't yet experienced what happens when their context goes stale. We think today about making sure the agent applies the right practices, but what happens when a practice changes, right? When a security issue changes. So that is very interesting.
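The "find a bug, then hunt the same pattern elsewhere" loop can be pictured as a simple sweep. A minimal sketch; find_bug_pattern() and open_fix_task() are hypothetical stand-ins for whatever test generator and agent task queue a team actually uses.

```python
# Sketch of the "find a bug, then hunt the same pattern elsewhere" loop.
# find_bug_pattern() and open_fix_task() are hypothetical stand-ins for a
# real test generator and agent task queue.
import re
from pathlib import Path

def find_bug_pattern() -> str:
    # e.g. a test generator found an HTTP call with no timeout argument
    return r"requests\.get\([^)]*\)(?!.*timeout)"

def open_fix_task(path: Path, line_no: int, line: str) -> None:
    # Hand each hit to an agent as a fix task (here: just report it).
    print(f"fix task: {path}:{line_no}: {line.strip()}")

def sweep(root: str, pattern: str) -> None:
    rx = re.compile(pattern)
    for path in Path(root).rglob("*.py"):
        for i, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if rx.search(line):
                open_fix_task(path, i, line)

sweep("src", find_bug_pattern())
```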
Simon: And also the fact that we're never gonna be as manual as we are right now with agents. Agents are getting better, but we're also trusting them more. There's gonna be more autonomy in the future, not less.
Guy: Yeah. Right.
Simon: And for that, we need things to act correctly, and we need things to behave as we need them to. And that's so, so important indeed.
Simon: Hey everyone. Hope you’re enjoying the episode so far. Our team is working really hard behind the scenes to bring you the best guests so we can have the most informative conversations about agentic development, whether that’s talking about the latest tools, the most efficient workflows, or defining best practices, but for whatever reason, many of you have yet to subscribe to the channel. If you are enjoying the podcast and want us to continue to bring you the very best content, please do us a favor and hit that subscribe button. It really does make a difference and lets us continue to improve the quality of our guests and build an even better product for you. Alright. Back to the episode.
Simon: Okay. And then actually one of our most popular episodes in terms of the number of views was with Olivier Pomel, Datadog's CEO, who talked about the wonderful dream of no one waking up at 3:00 AM because of a critical situation. Let's hear from Olivier.
Olivier: At the end of the day, we think just everybody will want automation. Right. Every single company in the world will want automation. That’s the dream, right? The dream is you never have to wake up in the middle of the night ever again to fix an issue. Yeah. The issue’s been fixed for you and so the bar for precision needs to be very high on anything that relates to reliability. And the good news is that I would say three, four years ago being completely right about root cause most of the time was science fiction. And I think now it’s definitely within reach and we can clearly see on the horizon the moment where this technology is good enough to be put into the hands of customers in a large number of situations.
Simon: What’s the worst that can happen, Guy?
Guy: Yeah, we spoke a little bit about root cause analysis and how that is the dominant case. It just really lends itself well: you have to sift through a lot of data, and if you get it wrong, then you've wasted time, as opposed to taking out a system. So those are useful, and they have a bit of the coding-assistant version over there, which is: surface candidates for me, be an assistant for root cause analysis. And that technology is improving, to be able to get you good inklings of what the cause is, with context. So I do think this is, and we touched on this already, very much the dream of not getting woken up. There's a nightmare version of this that we're seeing, and it has actually manifested a lot recently with all the supply chain attacks, which is that attackers are becoming increasingly agentic. There is a marked rise in phishing attempts that are scaling and going downstream into SMBs, and we're seeing agents finding vulnerabilities in open source libraries. So there was a big push around a variety of supply chain vulnerabilities, which are multiplied in different ways, by different parties, because attackers can find vulnerabilities more easily. On top of that, they can actually phish; the Axios vulnerability that happened just now had a phishing element, which combines the other capabilities of AI. And then the agents are pulling in a lot of these libraries, and they're doing it with less supervision. So there's this trifecta, this multi-factor set of reasons driving it. In the DevOps world, as is often the case in the dev and DevOps world, it's a very positive message: you can do this, and we can all sing kumbaya together and sleep well at night, because the systems will be kind of self-healing, and we'll build that. Self-healing is a DevOps term that has existed for a while. In the security world, there's a bit of an 'if you don't, then...' type of risk scenario, which is: we have to level up security handling to not just find vulnerabilities, but actually fix them, and then go all the way to deploying to production. And that forcing function again drives that end-to-end speed. What I like about it is that it's actually a case in which you really should embrace full end-to-end agent capabilities, so that you are able to respond this quickly. So again, an extreme version of the DevOps cloud move, in which initially it's like, well, it's insecure to deploy multiple times a day, to have continuous deployment. And then it became: well, it is insecure not to do that, because the vulnerabilities will happen, the systems do exist; the question is how quickly you can respond. And so I think we'll see a lot of that, and I would expect security to be very much at the forefront this year, as agents are being used for much more real production cases. You'll see both kinds of issues. You'll see security issues because agents were not used, people were just slow, attackers becoming faster than defenders; there was a lot of understanding and awareness of this in the security industry, but no real solutions yet.
And the second is you'll see security issues because of failures on the agent side, on reliability. We've seen some leaks from Anthropic, right? Like the Claude Code one that they quickly clarified was human error, the button push, the human being thrown under the bus. And we've seen the leak of Mythos, somewhat ironic, you know, a super cybersecurity model having a cybersecurity leak. But there will be more and more.
Simon: Yeah. And for the 3:00 AM kind of wake-up, do you feel that will turn more into human-in-the-loop validation? Versus, like you say, the important piece being the activity, the action, the root cause analysis, the fix, and getting all of that done quickly. Do you still feel there's a period where we need a human in the loop to build that trust, or do you feel people will be more immediate?
Guy: Yeah, I think really the best setup is to make yourself error tolerant. What you want is an environment in which you let the agent do it, but you build the infrastructure such that if the agent has made the wrong decision, it is reversible. And I think that's really what we aspire to. I increasingly think about the mental model in which some things were tolerated in waterfall and no longer tolerated in the cloud era. The best teams in waterfall were automating things and finding issues early; shift-left in security is a good example of that. And then those things were no longer tolerated in cloud: you could not get by without shifting security left, or without automated tests of some capacity, because you just couldn't deal with the speed. I think similarly there are now a few things that are tolerated in the cloud era and will not be tolerated in the AI era.
Guy: Largely due to speed. And many of them come down to the same pattern: these were human actions, human reviews that had to be done, and the best teams were automating them in the cloud era. But in the AI era, the average team needs to be doing that. Otherwise, it would just not be viable for you to build in this world.
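Making yourself error tolerant in that sense can be mechanical: pair every agent action with a snapshot and a rollback path. A minimal sketch; take_snapshot() and restore() are placeholders for real primitives like a DB snapshot, a blue/green switch, or a git revert.

```python
# Minimal sketch: make agent actions reversible by pairing each one with a
# rollback. take_snapshot()/restore() are placeholders for real infrastructure
# primitives (DB snapshot, blue/green switch, git revert, etc.).
from contextlib import contextmanager

@contextmanager
def reversible(snapshot, restore):
    ref = snapshot()
    try:
        yield
    except Exception:
        restore(ref)      # agent got it wrong: roll back, don't page a human
        raise

def take_snapshot() -> str:
    return "snap-001"     # placeholder reference

def restore(ref: str) -> None:
    print(f"restored {ref}")

with reversible(take_snapshot, restore):
    pass  # let the agent act here; a bad decision stays recoverable
```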
Simon: Let's switch gears a little bit now to Mati Staniszewski, the co-founder and CEO of ElevenLabs. He has some opinions on how we interact with AI. Let's hear from Mati.
Mati: Like in general, we think voice will be the future of interactions, the digital interactions of how you interact with the interfaces of the digital world, and it can carry so much more emotion, so much more understanding, than text.
Guy: So it will read your mind, to an extent; short of, you know, going to Elon's Neuralink or some scary options over there. Yeah, I mean, I think voice is definitely prevalent. It's interesting because it's less about just typing speed and more about our human tendencies. When we speak over here, there's ums, there's ohs, there's 'you know's, and all those idiosyncrasies.
Simon: The audience don’t know that that gets edited out, Guy.
Guy: So, you know, automation technology makes that go away. And when you type something, you bother fixing those things; you pause, you think. So it's actually not just related to the time of typing things down.
Simon: Mm-hmm.
Guy: It is a lot more related to how much you need to phrase, how much you can ramble on. And so it's been really interesting to see how voice and speech have become common in the world of AI. Less so in AI coding, because coding requires precision: when you code, you can't have ambiguity. But when you are discussing things and talking to an agent, that's okay. The agent can remove those things and take them away, and so it reduces the cognitive load; it just allows you to ramble on. There's still value in precision when you're instructing an agent. But if context is managed well, in this case it's inline context: you ramble something, the agent might create some instruction files, and that instruction file might be the thing that gets passed to a subagent to implement things. Then, using the human analogy a little bit, this is the project manager that you are talking to, and you can ramble at them, and part of their job is to streamline your ramblings and translate that into real instructions. So it's interesting; I think Mati is right. It was clearly a little bit self-serving to talk about voice being the future, and ElevenLabs continues to shine and thrive since we had that episode. Interestingly, they've actually expanded past audio, so now they don't just do audio. But I think you'll see more and more of that in the AI dev world as well. There's an interesting pattern in Claude Code, and I think it probably will be, if not already, mimicked by others, to delegate to subagents a lot more.
Guy: That to me is the analogy: the front agent you have there is the project manager, and you're interacting with it, and then it delegates activity to subagents to do the work. And I will say that, just like with project management, there's a caveat about losing visibility. When you're in the terminal giving those instructions, it's a lot harder to know what happens once you've made that delegation. So it's interesting, but I think generally correct that voice will be a significant factor. And of course
Simon: I think Claude Code released a feature, maybe a month ago or so; it's a slash command now, slash voice I think, or something along those lines, where it actually listens to the mic, and from your laptop you can just chat with Claude Code. And it feels quite natural for me; I quite enjoy doing that with Claude Desktop too. So it was a natural addition for me, and I'm definitely in agreement.
Guy: And it raises a very different question. Will a thing happen, like, will voice be more dominant than text, versus: who will be the players that win in that space? So it is very interesting. I will say that as we record this, there have been recent updates on the Claude side in which they've tightened up a lot of where Claude can be used; they really want their own harness, they want their interfaces. So we're probably starting to toy a little with: as companies get more capable, and Anthropic right now is in that state, do they seek to form a walled garden around it, hey, you just need to use our systems and run everything within our systems? Versus the other trend, where you see, for instance, Nvidia committing to open code and open Claude and building things that are very open and composable, which once again feels a little cloud-esque, a lot more web-like. And we had, in an early episode, Matt from Netlify, the co-founder and CEO, talk about the criticality of the open web and how we build over there. They've continued to invest in that composable infrastructure, while you're seeing Vercel supporting some elements that are open, but mostly investing in a very good end-to-end experience within their kind of walled garden. So these are just different questions: what will happen, and who will win, or what does it take to win, are very, very different questions at the pace this world moves.
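Coming back to the delegation pattern Guy described: the front agent acts as a project manager, turning a rambling request into a structured brief and fanning work out to subagents, while keeping a trail so visibility isn't lost. A sketch with hypothetical stubs, not a real Claude Code API:

```python
# Sketch of the project-manager pattern: a front agent turns a rambling
# voice/text request into a structured brief, then fans work out to
# subagents. All functions are hypothetical stubs, not a real harness API.
def distill(ramble: str) -> dict:
    # The front agent's job: strip the ums and tangents into a brief.
    return {"goal": ramble.strip(), "tasks": ["plan", "implement", "test"]}

def run_subagent(role: str, task: str, brief: dict) -> str:
    return f"{role} finished '{task}' for goal: {brief['goal']}"

def delegate(ramble: str) -> list[str]:
    brief = distill(ramble)
    log = []
    for task in brief["tasks"]:
        result = run_subagent(role=f"{task}-agent", task=task, brief=brief)
        log.append(result)  # keep a visible trail of what was delegated
        print(result)
    return log

delegate("so, um, I want a settings page, you know, with dark mode and stuff")
```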
Simon: Amazing. We'll look forward to chatting about what's next in the next 100 episodes. But for now, that's all we have time for. It was really fun to look back and think about some of the topics raised by the amazing guests we've had on the podcast.
Guy: And I wanna extend a huge thanks to all of you listeners, who are really the reason these guests come on to have these conversations with us, and for all the feedback and the suggestions for different speakers and guests. So thanks. Keep on listening, subscribe if you haven't already, and if you have suggestions for guests we should have in the next hundred episodes and beyond, send them over. And of course, to all these amazing guests who have spent their time sharing their learnings and their perspectives with us and with you: big thanks. We very much enjoy the conversations and learn from them. At this pace of change, one of the critical elements of being able to deal with this space is community: learning together, sharing perspectives, sharing these learnings. So we're very grateful to you for sharing those perspectives, and we'll continue to provide a stage for that to happen.
Simon: Seconded, well put. Thank you very much for listening and tune into the next episode. Thanks Guy.
Guy: Thank you.
In this episode
When did writing code stop being the job and start being the hobby?
One hundred episodes in, Guy Podjarny and Simon Maple pull the clips, check the predictions, and trace the through line across conversations with guests from Datadog, ElevenLabs, GitHub, and more.
They get into:
- The move from spec-driven to context-driven development
- Why humans become the bottleneck in code review
- What changes when agents run the SDLC end-to-end
- Adoption across orgs vs depth of actual usage
Thanks to every guest and every listener who made this possible. On to the next hundred.
Want to have these conversations in person? AI Native DevCon is coming to London on June 1st and 2nd, 2026.
From Spec-Centric to Context Engineering: Lessons from 100 Episodes
The pace of AI development makes predictions precarious. In episode one of the AI Native Dev podcast, Guy Podjarny predicted a shift from code-centric to spec-centric development, where developers would specify what they need and AI would provide the implementation. Looking back after 100 episodes, that prediction was partially right and partially, as former GitHub CEO Thomas Dohmke would say, fundamentally wrong.
The milestone episode brought together clips from guests across the first 100 conversations, with Guy and Simon Maple reflecting on how their thinking has evolved. What emerged was a consistent theme: the industry has moved beyond simply telling AI what to build and toward training AI how to build.
From Specs to Context: The Evolution
The original spec-centric vision imagined capturing intent in natural language specifications that AI would implement. That part holds up. What the early framing missed was how narrow that focus was compared to what great developers actually do.
"When you think about your dev team and what you expect of them, you don't really say, hey, make sure that every time you read this doc and follow the exact instructions," Guy reflected. "Generally you want them to make good decisions, including choosing when to update documents and read them."
A great developer in any organization brings judgment about quality versus speed tradeoffs, collaboration patterns, testing approaches, and infrastructure choices. Spec-driven development addresses a subset of those concerns. The fuller picture involves what Guy now calls "speaking the programmer" rather than "speaking the program," essentially training AI agents to behave like the developers you would want on your team.
The unit of software has shifted accordingly. Where the early conversation focused on specs, the industry now discusses skills, broader instructions about how to develop rather than just what to develop. Context engineering has become the discipline of providing AI agents with everything they need: not just task specifications but organizational standards, architectural patterns, and decision-making frameworks.
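To make that concrete, here is a minimal sketch of context assembly, assuming a hypothetical AgentContext structure. Real tools each have their own conventions (instruction files, skills directories), so treat the field names and prompt format below as illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    """Everything an agent needs beyond the task itself (hypothetical structure)."""
    task_spec: str                                             # what to build
    org_standards: list[str] = field(default_factory=list)    # how we build here
    architecture_notes: list[str] = field(default_factory=list)
    decision_rules: list[str] = field(default_factory=list)   # judgment calls

    def to_prompt(self) -> str:
        """Flatten the bundle into a single instruction block for the agent."""
        sections = [
            ("Task", [self.task_spec]),
            ("Organizational standards", self.org_standards),
            ("Architectural patterns", self.architecture_notes),
            ("Decision-making rules", self.decision_rules),
        ]
        return "\n\n".join(
            f"## {title}\n" + "\n".join(f"- {item}" for item in items)
            for title, items in sections if items
        )

context = AgentContext(
    task_spec="Add rate limiting to the public API.",
    org_standards=["All endpoints need integration tests.", "Prefer boring technology."],
    architecture_notes=["Services communicate via the internal event bus."],
    decision_rules=["When speed and quality conflict on user-facing paths, choose quality."],
)
print(context.to_prompt())
```

The point is the shape of the bundle: the task is one field among several, and the organizational knowledge travels with every request rather than living only in the heads of senior engineers.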
The Human Bottleneck Problem
Thomas Dohmke articulated what many guests confirmed: humans cannot review the volume of code that agents can produce. Running ten agents in parallel around the clock generates more code than any person can meaningfully evaluate without becoming the bottleneck that erases productivity gains.
The response pattern that emerged across episodes was not to skip review but to automate the end-to-end software development lifecycle. Code review is the immediate pressure point, but as Guy noted, once code flows through review, the next constraint becomes deployment, then observability, then incident response. Across the entire cycle, human involvement at pre-AI intensity breaks down.
"Our aspiration really has to be to identify every single step of that process and automate it," Guy observed. The human role shifts from frontline execution to something closer to management: defining what correct looks like, conveying instructions, identifying mistakes, and resolving them.
This maps to the context development lifecycle (https://claude.ai/blog/context-engineering-guide): generating context about desired behavior, testing and evaluating whether instructions are followed, distributing that context to agents, observing what happens, and learning from the results to update instructions. Humans operate in that lifecycle while agents operate in the software development lifecycle itself.
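A minimal sketch of that loop follows, with every function a stub standing in for real tooling; the names and the two-cycle loop are hypothetical, meant only to show where humans sit (running the loop) versus where agents sit (inside the SDLC the loop governs).

```python
def generate_context(goals):
    """Turn desired behavior into written instructions for agents."""
    return [f"Instruction derived from goal: {g}" for g in goals]

def evaluate(instructions):
    """Check whether agents actually follow the instructions (stubbed)."""
    return {i: True for i in instructions}  # pretend every check passes

def distribute(instructions):
    """Push instructions to the agents that need them (stubbed)."""
    print(f"Distributed {len(instructions)} instructions to agents")

def observe():
    """Collect outcomes from agents operating in the SDLC (stubbed)."""
    return ["Agent skipped integration tests on the billing service"]

def learn(goals, observations):
    """Update goals and instructions based on what was observed."""
    return goals + [f"Address: {o}" for o in observations]

goals = ["Every change ships with tests"]
for cycle in range(2):  # humans run this loop; agents run the SDLC inside it
    instructions = generate_context(goals)
    if all(evaluate(instructions).values()):
        distribute(instructions)
    goals = learn(goals, observe())
```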
What Makes a Great Developer Now
Annie Vella from Westpac surfaced an uncomfortable truth: the skill of solving complex coding problems quickly, the skill celebrated in technical interviews and leaderboards, matters less when AI can generate good-enough solutions faster than humans can type.
"That is a skill I think we're going to need to learn to let go of, both in interviews and personally," she noted.
Guy distinguished between developers motivated by the craft of coding and those motivated by creation and impact. For the creation-oriented, agents are tools that shorten the journey from idea to execution. For the craft-oriented, there is an adjustment to make.
The best developers, even before AI, were those who understood the problem deeply and saw code writing as translation from understanding to implementation. That profile remains valuable. What changes is the speed of iteration and the loss of thinking time that slower coding provided.
"As coding becomes something that's very fast and actually taken away from us slightly, we do lose that time to think as we're building," Simon observed. "But what we do gain is that faster iteration which allows us to gain feedback faster and throw away quicker."
The emerging pattern involves making architectural decisions explicit, having agents document their reasoning, and being willing to discard and rebuild rather than incrementally modify. Chad Fowler's recent episode on regenerative software and Phoenix architecture directly addresses this shift toward disposable, rebuildable code.
DevOps Context and Production Reality
Armon Dadgar from HashiCorp emphasized early that context differentiates great AI from useless AI. You might hire the world's best SRE, but they need to know whether your organization runs Windows or Red Hat in production before they can contribute meaningfully. The same applies to AI agents.
The DevOps world has embraced context engineering for root cause analysis, where sifting through massive amounts of observability data plays to AI strengths and where errors waste time rather than destroy systems. Production deployment and modification remain more cautious. The blast radius of dropping a database in production creates appropriate hesitation.
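As an illustration of why root cause analysis is such a natural fit, here is a sketch of the triage step: compressing a raw log stream into a compact summary before an RCA agent reasons over it. The signature grouping is deliberately crude, a stand-in for real log clustering, and the function names are hypothetical.

```python
from collections import Counter

def triage(log_lines: list[str], top_n: int = 3) -> str:
    """Compress raw logs into a compact summary an RCA agent can reason over.
    Grouping by error signature is a crude stand-in for real clustering."""
    errors = [line for line in log_lines if "ERROR" in line]
    signatures = Counter(line.split("ERROR", 1)[1].strip()[:60] for line in errors)
    summary = "\n".join(f"{count}x {sig}" for sig, count in signatures.most_common(top_n))
    return f"Top error signatures out of {len(log_lines)} lines:\n{summary}"

logs = [
    "INFO request handled",
    "ERROR db: connection refused to replica-2",
    "ERROR db: connection refused to replica-2",
    "ERROR cache: key expired mid-transaction",
]
print(triage(logs))  # the compact summary, not the raw firehose, goes to the agent
```

If the agent's conclusion is wrong here, the cost is wasted time, not a dropped production database, which is exactly why this corner of the SDLC automated first.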
Mirko Novakovic from Dash0 and Olivier Pomel from Datadog both emphasized context as central to troubleshooting and automation. The dream Olivier described, never waking up at 3 a.m. to fix an issue because the system handles it automatically, requires precision in root cause analysis that was science fiction four years ago but now seems within reach.
The flip side, which multiple guests addressed, involves security risks accelerating alongside capabilities. Attackers use the same AI capabilities to find vulnerabilities, craft phishing attempts, and exploit supply chain weaknesses. The forcing function pushes toward end-to-end automation not just for efficiency but for security response speed.
Amplifying Good and Bad Equally
Birgitta Böckeler from ThoughtWorks captured a critical insight: AI amplifies indiscriminately. Good engineering practices get multiplied. Bad ones do too. If your code base contains patterns you would not want replicated, agents will replicate them. If your knowledge base has outdated documentation, AI will learn from it.
This drives the shift away from "learn from our code" approaches toward explicit context engineering. Rather than hoping AI infers correct behavior from existing artifacts, organizations increasingly define what correct behavior looks like and provide that as context.
The observation requirement extends beyond initial setup. Context goes stale. Practices change. Security vulnerabilities emerge. The systems need to be self-healing not just in the DevOps sense of recovering from failures but in the sense of updating their own instructions based on observed outcomes.
Intercom's approach, where a test generator finds bugs and then searches for similar patterns elsewhere in the system, represents this direction: closed-loop learning that updates context based on production reality.
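This is not Intercom's actual implementation, but a minimal sketch of the closed-loop idea under assumed stubs: a generated test exposes a bug, similar patterns are hunted down, and the finding is written back into standing context for future agents.

```python
def generate_test(module: str) -> str:
    """Stand-in for an AI test generator."""
    return f"test_{module}_handles_empty_input"

def run_test(test: str) -> bool:
    """Stand-in for the test runner; pretend the generated test found a bug."""
    return False

def find_similar_patterns(module: str) -> list[str]:
    """Stand-in for searching the codebase for the same failure pattern."""
    return ["payments.refund", "payments.capture"]

context_rules: list[str] = []  # standing instructions handed to future agents

module = "payments.charge"
test = generate_test(module)
if not run_test(test):  # the generated test exposed a bug
    suspects = find_similar_patterns(module)
    # close the loop: the finding becomes a durable rule, not a one-off fix
    context_rules.append(f"Validate empty input in: {module}, {', '.join(suspects)}")
print(context_rules)
```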
What Comes Next
The retrospective pointed toward several near-term pressure points: automated code review to remove human bottlenecks, security response automation to match attacker speed, and observability-driven context updates to prevent staleness. Each requires extending the end-to-end automation that cloud-era development initiated.
Some things tolerated in the cloud era will not be tolerated in the AI era. Guy drew the parallel to how practices acceptable in waterfall became inadequate for cloud speed. Similarly, practices acceptable for human-paced development will not survive AI-paced development.
The consistent thread across 100 episodes has been treating AI as team members to be trained rather than tools to be wielded. Onboarding, continuous learning, and organizational alignment matter for agents just as they do for human developers. The difference is that agents can scale, which makes getting the training right both more valuable and more urgent.
It's worth listening through the full retrospective for the guest clips and additional commentary. And for those keeping score on predictions: Guy's spec-centric framing was directionally correct, just insufficiently broad. The next 100 episodes will reveal which of today's frameworks hold up and which get labeled fundamentally wrong.