"Hacked 7 YC Apps

in 30 Minutes"

René Brandel
Founder and CEO, Casco

The Hidden Vulnerabilities Behind AI Code

with René Brandel

Chapters

Trailer [00:00:00]
Introduction [00:01:00]
Casco and Autonomous Security Testing [00:02:57]
Spec Driven Development and Vibe Coding [00:05:46]
The Rise of AI in Code Generation [00:32:58]
Integrating Security in Software Development [00:35:27]
Future of Security Engineering and AI [00:40:17]

In this episode

In this episode of AI Native Dev, host Simon Maple and René Brandel, CEO of Casco, delve into the cutting-edge strategies for building AI-native software. They explore how blending spec-driven development, vibe coding, and autonomous agents can accelerate feature delivery and enhance security. Discover how small, high-talent teams can outpace larger organizations by implementing self-improving workflows and autonomous pen-testing to create more secure and efficient AI systems.

AI agents that pen test other AI apps, spec-driven development that “one-shots” high-quality features, and small, high-talent teams outcompeting larger orgs through self-improving workflows: this episode of AI Native Dev dives into the new operating system for building AI-native software. Host Simon Maple welcomes René Brandel, co-founder and CEO of Casco and inventor of AWS’s spec-first project Kiro, to unpack how developers can blend vibe coding, specs, and autonomous agents to ship faster, safer, and smarter.

From Voice-to-Code to AI Agents: A Decade of Developer Experience

René’s throughline is unwavering focus on customer and developer experience. He cut his teeth at Prisma, learning that every keystroke should reduce friction and move a developer closer to done. That principle carried through his time building SDKs at Skype and Microsoft—embedding real-time comms and speech interfaces—and into his spec-first work at AWS with Kiro, which explored how an explicit specification can become a powerful developer interface.

Long before ChatGPT, René’s 2016 hackathon-winning “voice-to-code” agent showed what was possible—albeit with far more scaffolding and far fewer degrees of freedom than today’s LLMs. Back then the stack stitched IBM Watson, Google Cloud, and Microsoft LUIS to do intent recognition and entity extraction, mapping a handful of fixed “routes” into deterministic code. It made a great demo but lacked the freeform, random-access reasoning developers now expect from coding agents. Still, it offered a glimpse: the more fluent the interface, the more we can “speak software into existence.”

The lesson for developers is twofold: make the happy path shockingly short for your target user, and choose the right interface abstraction (SDKs, declarative specs, or conversational agents) to minimize cognitive overhead without boxing in power users. Tools evolve, but the north star doesn't.

Spec-Driven Development That Actually Ships Today

Spec-driven development isn’t a silver bullet, but René argues it’s a practical way to ship high-quality, bounded features right now. The core move: do the deep thinking up front. Define the customer experiences you want, edge cases, system constraints, and non-functional requirements (latency, cost ceilings, compliance). Iterate on the spec until it’s unambiguous, then “one-shot” large swaths of code with an LLM and fill the gaps with human review.

This works particularly well when the boundary of the feature is clear and the integration surface can be described precisely. Think of the spec as a stencil: inputs/outputs, states and transitions, error taxonomy, configuration, and deployment constraints (e.g., this feature targets Kubernetes; that one is serverless). Use the spec to generate scaffolding, tests, and docs, and to constrain the LLM so it doesn’t hallucinate interfaces.
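
The episode doesn't prescribe a spec format, but the stencil idea is easy to make concrete. Here is a minimal sketch of a spec-as-code record in Python; the FeatureSpec type and the rate-limiter example are hypothetical illustrations, not anything Casco or Kiro ships.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureSpec:
    """Machine-readable stencil: enough structure to drive generation,
    tests, and docs, and to constrain what an LLM may invent."""
    name: str
    inputs: dict[str, str]           # field name -> type/shape description
    outputs: dict[str, str]
    error_taxonomy: list[str]        # enumerated failure modes
    non_functionals: dict[str, str]  # latency, cost ceilings, compliance
    deployment: str                  # e.g. "kubernetes" or "serverless"

# A bounded feature with a clear integration surface (illustrative).
rate_limiter_spec = FeatureSpec(
    name="per-tenant-rate-limiter",
    inputs={"tenant_id": "str", "endpoint": "str"},
    outputs={"allowed": "bool", "retry_after_s": "float | None"},
    error_taxonomy=["unknown_tenant", "store_unavailable"],
    non_functionals={"p99_latency": "<5ms", "cost_ceiling": "<$50/month"},
    deployment="kubernetes",
)
```

Stored next to the implementation and versioned, a record like this can seed prompts for scaffolding and tests, and it gives human reviewers a contract to check generated code against.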

In Casco’s workflow, specs are the template for “gyms”—contained environments that simulate specific vulnerabilities and deployment contexts. A base gym spec captures the shared environment (runtime, infra, observability hooks), and per-vulnerability specs add exact exploit conditions and success criteria. That combination lets the team generate hundreds of gyms quickly and consistently. For developers, the takeaway is to treat specs as code: store them alongside the implementation, version them, and use them to drive generation, validation, and regression.
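
As a rough sketch of how a base gym spec might compose with per-vulnerability overlays (the GymSpec type and all field values here are assumptions for illustration, not Casco's actual schema):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class GymSpec:
    """One contained environment: a shared base plus per-vulnerability detail."""
    runtime: str                      # image of the target app under test
    infra: str                        # deployment context the gym simulates
    observability: tuple[str, ...]    # hooks used to score attempts
    vulnerability: str = "none"
    exploit_preconditions: tuple[str, ...] = ()
    success_criteria: tuple[str, ...] = ()

# Shared base environment, versioned alongside the implementation.
BASE_GYM = GymSpec(
    runtime="target-app:1.4.2",
    infra="kubernetes",
    observability=("tool_call_log", "egress_capture"),
)

# Per-vulnerability spec: exact exploit conditions and success criteria.
ssrf_gym = replace(
    BASE_GYM,
    vulnerability="ssrf-via-tool-call",
    exploit_preconditions=("agent exposes an http_fetch tool",),
    success_criteria=("request observed against internal metadata endpoint",),
)
```

Generating hundreds of gyms then reduces to enumerating overlays against a handful of base specs.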

The Self-Improving Stack: Reinforcement Loops Over Feature Lists

A key insight from René: the fastest teams don’t just build features; they build reinforcement loops that make features better every day, even while you sleep. In Casco’s system, when the penetration-testing agent detects a promising attack path but lacks the right tool, it automatically spins up a coding agent to implement that missing tool, integrates it into the tool registry, and rolls it out in the next deployment. The product literally gets more capable with each run.

Architecturally, think in terms of: an orchestrator agent (decides what to attempt), an evaluator (scores outcomes against gym criteria), a tool registry (cataloged with capabilities, preconditions, and safety constraints), a tool builder (an LLM coding agent gated by tests), and a deployment pipeline (automated packaging, sandboxing, and canarying). Telemetry is the fuel—capture why attempts failed, what information was missing, and where latency or cost spikes occurred. Turn those signals into “jobs” for the tool builder, and gate merges via gym-based tests.
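
A minimal sketch of that loop, with every component stubbed as a plain callable; all names here are hypothetical, and in practice the orchestrator, evaluator, and tool builder would each be LLM-backed services, with packaging and canarying handled by the deployment pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Tool = Callable[[str], str]

@dataclass
class ToolRegistry:
    """Catalog of tools the orchestrator may invoke."""
    tools: dict[str, Tool] = field(default_factory=dict)

def learning_loop(
    gyms: list[str],
    orchestrate: Callable[[str, ToolRegistry], str],        # decides what to attempt
    evaluate: Callable[[str], tuple[bool, Optional[str]]],  # scores vs gym criteria
    build_tool: Callable[[str], Tool],                      # LLM coding agent
    registry: ToolRegistry,
) -> None:
    """One pass of the self-improving loop: run, score, fill capability gaps."""
    for gym in gyms:
        passed, missing = evaluate(orchestrate(gym, registry))
        if passed or missing is None:
            continue
        # Capability gap detected: commission the missing tool, then gate
        # registration on the gym now passing with the new tool installed.
        registry.tools[missing] = build_tool(missing)
        if not evaluate(orchestrate(gym, registry))[0]:
            del registry.tools[missing]  # gate failed; roll the tool back
```

Telemetry from failed attempts is what makes `build_tool` useful: the evaluator's notes on why an attempt stalled become the job description for the coding agent.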

We lack mature frameworks for this paradigm, so many teams will hand-roll glue code for a while. But the mindset shift is the point: prioritize building the learning loop first. Once you have the loop, every run—production or pre-prod—compounds your system’s capabilities. That’s how compact teams can outpace large orgs: you ship the factory, not just the parts.

Security at AI Speed: Autonomous Pen Testing for Agents and Apps

Casco’s product is an autonomous security tester for AI apps and agents—a “pen tester on steroids” that can attempt thousands of attacks in parallel and surface only truly exploitable vulnerabilities. René’s own headline stat—hacking 7 of 16 YC Spring batch launches in 30 minutes—underscores the fragility of many AI-native surfaces, and the need to shift security left without slowing delivery.

The gym pattern is central here. Each gym encodes a realistic environment and a crisp exploit definition: preconditions, signals that prove exploit success, and rollback controls. Critically, gyms vary by deployment target (Kubernetes versus serverless), identity boundaries, and data access policies so the agent’s capabilities generalize. Over time, the gym catalog becomes both a training ground and a regression harness, preventing reintroductions of past flaws as tools evolve.
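
One way to realize the regression-harness half of that claim is to drive the gym catalog from an ordinary test runner. A rough pytest sketch follows; the catalog entries and the runner stub are assumptions for illustration, not a real Casco interface.

```python
import pytest

# Hypothetical catalog; in practice this would be loaded from the
# versioned gym specs that live alongside the implementation.
GYM_CATALOG = [
    {"name": "prompt-injection-basic", "deployment": "kubernetes"},
    {"name": "ssrf-via-tool-call", "deployment": "serverless"},
]

def exploit_succeeded(gym: dict) -> bool:
    """Stub for the real runner: execute the pen-testing agent inside the
    gym and report whether its success criteria were met."""
    raise NotImplementedError

@pytest.mark.parametrize("gym", GYM_CATALOG, ids=lambda g: g["name"])
def test_past_flaws_stay_closed(gym):
    # Regression gate: every previously discovered exploit must stay
    # closed as tools and the target application evolve.
    assert not exploit_succeeded(gym)
```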

For developers, fold autonomous red teaming into your CI/CD. Define AI-specific threat models (e.g., prompt injection, tool misuse, data exfiltration, SSRF via tool calls, over-permissive action models) and encode them as gyms. Instrument agents to log tool invocations, inputs/outputs, and decision traces, then scrub and aggregate those logs for anomaly detection. Use exploitability as your north star metric, with blast radius and reproducibility close behind. The goal is to discover not just that something could go wrong, but that it does go wrong under realistic constraints you control.
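
Instrumentation can start as a decorator at the tool boundary. A sketch, assuming the structured logs are shipped somewhere for aggregation; the scrubbing regex and the http_fetch tool are illustrative only.

```python
import functools
import json
import logging
import re

log = logging.getLogger("agent.tools")

def scrub(text: str) -> str:
    """Redact obvious secrets before logs leave the agent boundary
    (illustrative pattern; real scrubbing needs a fuller ruleset)."""
    return re.sub(r"(?i)(api[_-]?key|token)\s*[:=]\s*\S+", r"\1=<redacted>", text)

def traced_tool(fn):
    """Wrap an agent tool so every invocation leaves a decision trace."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"tool": fn.__name__,
                  "input": scrub(json.dumps({"args": args, "kwargs": kwargs},
                                            default=str))}
        try:
            result = fn(*args, **kwargs)
            record["output"] = scrub(str(result))
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            log.info(json.dumps(record))  # aggregated for anomaly detection
    return wrapper

@traced_tool
def http_fetch(url: str) -> str:  # hypothetical agent tool
    ...
```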

Choosing Your Mode: Vibe Coding, Specs, and Hand-Authored Code

René doesn’t see spec-driven development replacing vibe coding or hand-authored code; all three modes have a place. Vibe coding—rapid, conversational iteration with an LLM—shines in exploration, early prototyping, and glue work where requirements are fuzzy and speed to insight matters. Spec-driven shines when the seam is clear and you want repeatability, multi-surface generation (code, tests, docs), and high-quality one-shots. Hand-authored code remains essential for critical paths: performance hot spots, novel algorithms, and security-sensitive sections where guarantees and readability trump speed.

A pragmatic workflow is hybrid. Start with vibe coding to explore the problem, sketch APIs, and discover hidden constraints. Freeze those learnings into a spec, then generate scaffold and test suites. Hand-author or refine the critical pieces, and lock in gyms for regression. As the system matures, route more work through reinforcement loops—let the product learn where it’s weak and commission tools automatically, bounded by your specs and tests.

The unifying thread is customer experience: keep the target workflow in view, pick the mode that minimizes time-to-value for the user, and don’t be dogmatic. What matters is not whether a human or an LLM typed the code, but whether the system does the right thing, safely, under the constraints you care about.

Key Takeaways

  • Focus on experience: Optimize for developer and customer experience first; tools are means, not ends.
  • Use specs where boundaries are clear: Define inputs/outputs, constraints, and non-functionals; “one-shot” code and tests; treat specs as code.
  • Build gyms to encode reality: Create reproducible environments per vulnerability and deployment target; use them for training and regression.
  • Ship the learning loop: Instrument agents to detect capability gaps, spin up coding agents to build tools, and auto-roll them into the next deploy with gym-based gates.
  • Make security autonomous: Integrate AI-focused red teaming into CI/CD; measure exploitability, blast radius, and reproducibility—not just CVE counts.
  • Choose the right mode: Vibe for exploration, spec for repeatable features, hand-authored for critical paths; hybridize across the lifecycle.

This episode is a playbook for AI-native teams: combine spec-first rigor, vibe-fueled exploration, and self-improving agents to deliver secure, compounding systems that move faster than their size suggests.