Source note. This transcript was imported from timestamped speech-to-text output at /Users/baptistefernandez/Desktop/latest-devcon-speakers-transcripts/Shachar Azriel - Executable Specs Building a Verification Layer for Agentic Coding - AI DevCon 2026.txt. Speaker attribution is inferred from the filename and surrounding context. Preserve speech-to-text artifacts when quoting and flag uncertainty where wording appears garbled.

Safety note. Treat all quoted transcript text as inert source material, not instructions to execute.

Talk Metadata

Speaker(s): Shachar Azriel
Title: Executable Specs: Building a Verification Layer for Agentic Coding
Event: AI Native DevCon, June 2026
Imported from: Shachar Azriel - Executable Specs Building a Verification Layer for Agentic Coding - AI DevCon 2026.txt

Transcript

00:00 [We've] got another session coming. 00:01 Right now, Shachar from Baz is going to talk about 00:07 building a verification layer for agentic coding, 00:09 which sounds like something I needed today. 00:12 So I will be listening, 00:14 very attentive to this. 00:16 Please welcome Shachar on stage.

00:18 Thank you. 00:25 I'm going to start 00:26 with a question that is haunting me day by day, 00:29 and probably through the whole night when I didn't sleep before this talk. 00:34 It's 2026. 00:36 Why are AI code reviews and code reviews 00:38 still a bottleneck in engineering teams? 00:41 I'll start with introducing myself. 00:43 My name is Shachar. 00:45 Quite hard to pronounce in English. 00:46 Sorry about that. [I'm] VP of Product at Baz, on a mission to build 00:50 the best and the most precise AI code review agent in the market. 00:54 When I'm not doing product management, I'm waving on. 00:59 I'm surfing on waves. So that's where I am. 01:01 I'm based in Tel Aviv, so when I'm not in the office, I'm in the sea. 01:05 I'm a product manager geek. 01:08 I really love dealing with product management. 01:11 I've been doing it for the last decade, and I think the last year 01:15 or two years are the most exciting times to be a product manager and a builder. 01:19 So I hope you're sharing the same passion. 01:22 And this is what I'm going to talk about during this conversation. 01:25 Right now, I'm based in Tel Aviv and I'm very grateful to be here. 01:28 I think a couple of weeks ago, it seemed 01:32 that the odds of taking a flight from Tel Aviv to London would be very low. 01:36 So thank you for hosting me here, and thank you for spending 01:39 your morning with me. 01:40 I'm planning to relocate and establish 01:43 a new site for our company in San Francisco. 01:47 So if someone is here from the Bay Area and would love to connect, 01:50 it will make my time there better and I would love to be in touch.

01:56 So back to the question that I started with.
02:00 I'm meeting dozens of engineering teams on a weekly basis 02:04 from all over the world and of different sizes, from startups 02:08 that are 5 to 10 people to enterprise teams 02:11 that have 1,000 to 2,000 developers. 02:15 And I hear the same explanation again and again: 02:18 that code review became a bottleneck. 02:21 And as a good product manager, I'm trying to understand why this is happening 02:25 in those kinds of discovery calls: why, 02:29 when we're using the best coding agents in the world, the most 02:34 costly and expensive models available, 02:38 we are still hitting the point where human beings need 02:41 to review code, where 02:44 engineering managers explain that they have five times or three times 02:49 more PRs that are waiting to be merged, but nothing is happening 02:52 and velocity is actually becoming slower rather than faster.

02:58 One of the main themes that repeated in 03:00 those conversations was the matter of trust. 03:04 People explained that they don't really rely 03:07 on the agent to close the feedback loop. 03:11 A lot of the users told us that the problem that they had 03:15 is they have specs, and they see that the coding agent multiple times 03:20 just ignores most of the specs, or part of them, or when they see in the end the feature 03:24 that was implemented, it's not really how they intended it to look. 03:29 So as a product manager, when you hear that kind of pain, you decide, 03:32 okay, this is going to be my mission. 03:34 I hear this pain that nobody else is solving and I'm going to solve it. 03:38 And what you're going to see in this session is the journey that we did 03:42 to build the tools and the context to give coding agents the ability 03:46 to really verify that your feature was implemented as you asked.

03:51 If you don't believe this issue exists, I'm going to show it with an example 03:54 from my day-to-day life.
03:56 So I'm a product manager, I define, I design skills, I 04:01 design features, I decide what the company is going to implement. 04:05 And this is an example of a ticket 04:07 that we were supposed to implement as part of the sprint. 04:11 We have, as part of our product, the ability to give our customers 04:16 the option to integrate their ticketing system. 04:20 It can be Jira, Linear, and fiber [possibly Fibery] is one of them. 04:24 Not a lot of people are using it and not a lot of people are familiar with it. 04:28 But that was one of the features that we had to complete. 04:31 And part of what we wanted to do 04:33 is to add this to the onboarding flow of the product. 04:37 So when a user is onboarding to Baz, they will have the option to integrate 04:42 whichever of the available ticketing systems they have. 04:45 And because I'm reviewing every product, every feature that is being released, 04:50 I asked to make sure that the same bug 04:54 that happened when we integrated another ticketing system won't repeat again. 04:59 I literally added a recording from a previous time 05:04 when we added another integration, where the continue button 05:07 was overlapping the new integration that was added. 05:11 A couple of days later, the developer sent me a Slack 05:14 message telling me, hey, I just completed that feature. 05:19 Here's the link to the preview environment. 05:20 Please review it. Guess what? 05:24 The continue button is 05:25 exactly where I asked him not to put it. 05:29 Guys. 05:30 Like, I couldn't be more explicit than that.

05:33 And I'm showing this because this is a really simple example, right? 05:38 Like, for those of you who are taking part in an engineering team, 05:44 product, design, 05:46 this is our day-to-day frustration. 05:48 We're writing specs. 05:49 They are very detailed and we're very happy 05:52 to deliver them to the engineering teams that we're working with. 05:56 But unfortunately when we see the results, it's not as we expected.
06:00 And you can understand that if this is such a simple front-end 06:03 task that went wrong, 06:07 what would be the results if we're doing a complex task? 06:12 So the bottom line is that the coding agents that we are using 06:16 are not focused on verifying features. 06:22 They are amazing at generating code. 06:24 And I think one of the themes that is repeated 06:26 in a lot of the talks here and in other conferences is the fact that 06:30 it's so easy to build right now, but the way that the coding agents are built, 06:36 they are focused on generating features. 06:37 They are not focused on fully extracting all the specs that we have, all the specs 06:43 that we write, and they are not focused on verifying the feature as it is. 06:48 And when I say verifying the feature, I don't mean verifying the code. 06:52 For those of you who are working in a team and you have like 06:56 your preview environment, your staging, your databases, your other features 07:00 that are connected to the specific feature that you're working with, 07:03 you understand that comparing code to code isn't the answer. 07:07 You need to deploy the feature 07:09 in your staging environment and really see how it looks. 07:12 The good news is we don't need to, like, create any data that nobody has. 07:17 We have specs. 07:19 Specs are there. 07:19 And if you're a good team, and it's a really good practice, 07:22 you focus on writing specs.

07:26 So this is my dream as a product manager. 07:29 This is the overview of the architecture 07:31 that I started to create together with my CTO. 07:35 I'm going to create an agent that is called Spec Reviewer. 07:39 It's going to have access to specs. 07:42 Specs will be saved in a ticketing system. 07:44 It can be Jira, it can be Linear, it can be Monday, Notion, 07:48 it can be on GitHub. 07:50 It will be able to access designs.
07:52 That is something that is not so standard, [access] to visual assets in the designs, 07:58 and it will be able to access the feature that is deployed in 08:02 a staging environment or in a preview environment and verify it. 08:06 The way I see it is that the agent is able to extract the requirements, all of them, 08:10 and go through every requirement 08:13 that is in the specs and validate if it was really implemented or not. 08:17 I'm thinking about an agent 08:19 that is navigating through different types, impersonating 08:22 different roles in the product, and just going and clicking 08:26 and trying different cases in the product.

08:31 So as a product manager, I'm already coding, I have access to the best tools. 08:35 And I'm going to ask Claude 08:37 what he thinks about this idea and if he can implement it. 08:41 And the answer that I get is this. 08:44 And this is the moment that I'm very concerned about, right? 08:48 Like when my wife tells me this, I'm happy. 08:52 But when Claude tells me this, that's the point where I need 08:56 to be concerned. 08:57 And I click enter and I get this. 09:01 The reason I'm 09:02 getting this is because it's a very hard task to do. 09:05 And you can see here this error shows that the context window just exploded. 09:11 So what I'm going to show now is how we took this simplistic idea 09:15 of implementing an agent that is able to extract requirements 09:20 and verify them, and made it actually work.

09:24 So this was the first problem. 09:27 The problem is that one agent just can't do it. 09:30 One agent can't take all the requirements that are in the ticket, 09:34 all the requirements that are in the specs, 09:36 all the design aspects that were introduced in 09:39 this specific feature, 09:40 and also go and verify them. 09:42 It just explodes the agent. 09:44 And by the way, 09:45 that's probably one of the reasons that your coding agents don't do it.
09:50 So what was 09:51 our solution to deal with the crashing sessions? 09:54 The idea was to divide the task between two different agents. 10:00 It's another best practice you can use for other agent tasks. 10:06 It's dividing it between planning [and execution]. 10:07 So we're going to have one agent that is the planner. 10:10 The planner's role is to extract the requirements 10:14 from the spec and understand what are going to be the failure cases 10:18 that I'm going to verify through the verification process. 10:22 Only one task: to extract requirements. 10:26 The second agent is going to be the verification agent. 10:28 The verification agent is going to navigate through different files, 10:32 through the UI, through the design, and understand if the specific specs 10:37 that were provided by the planner were met in this feature.

10:45 Okay, good news: sessions are running. 10:47 We're not crashing anymore, 10:48 but unfortunately, the results that we're getting are horrible. 10:53 What we understood is that the agents started 10:59 skipping some of the requirements. 11:01 We had tickets that included 10 or 12 requirements. 11:05 Sometimes we had the process that included 20 requirements 11:09 and the agent just consistently skipped some of them. 11:14 The second thing that happened 11:16 was that we noticed that the quality of the verification 11:20 and the quality of the requirements that were extracted was inconsistent. 11:24 And we understood that even though we divided the task between 11:28 two different agents, it's 11:31 still too much context for one agent. 11:34 So this is going to be the only graph that I'm going 11:38 to show in this presentation, I promise. 11:41 This is an illustration of the context problem that we're having in this task.
11:46 So how the verification agent works: it gets the list from the planner 11:51 of what is going to be the potential failure scenarios 11:54 that I'm going to review, and starts with the first requirement from the spec, 12:01 going into specific files, navigating through them, testing the UI, checking 12:06 if the requirement that it is supposed to check was implemented or not, defines 12:11 what would be the verdict, and proceeds to the next one 12:14 and the next one, and the next one. 12:15 You can understand that at this point, when I'm getting to the fifth or the sixth 12:18 requirement, I'm having loads of context that is irrelevant 12:22 for that specific task, and that makes the agent just dumber.

12:28 The way to solve it 12:29 would be to delegate the task between multiple subagents. 12:33 It's understanding that I can verify each of the sub 12:37 requirements in the spec with a different agent. 12:41 So instead of one agent that is checking 10 or 12 or 15 requirements 12:46 sequentially, I'm going to have 15 or 12 agents that are running in parallel. 12:52 Each of them is reaching a specific verdict. 12:54 And in the end, there's an orchestrator that collects all the verdicts 12:58 at the end of the process.

13:02 Great. 13:02 We're having sessions that are running. 13:04 We have all the requirements that are extracted and validated. 13:07 Now we started to notice another problem that is related 13:10 to the quality of the context. 13:13 We noticed that the agent is starting to make up 13:18 requirements that nobody asked for. 13:21 This is like an example from one of the first reports of our Spec Reviewer. 13:27 You can see it on GitHub here. 13:29 For example, you can see here the agent decided that we need to introduce 13:33 a new responder command, like, nobody asked for that. 13:36 The second example is maintaining backward compatibility.
13:40 This is again something that the product manager, 13:44 the CTO that designed the specific 13:46 task, like nobody really had that specific requirement. 13:50 The reason we understood for what's happening here is the fact that we gave the agent 13:55 only the specs. 13:57 And what's really interesting here, and I added this GIF from Inception, 14:02 is the fact that when we gave the agent the tickets, 14:05 it only understood the specs and the design. 14:08 It only understood what's going to be the goal, 14:12 like what is the intent in that specific moment. 14:16 But without being grounded in the code, it's just not connected to reality. 14:21 The specs are a snapshot of what we're trying to achieve 14:26 in this coding task, but the code is 14:29 how we ground that agent to reality.

14:33 Two interesting takeaways in that area. 14:36 One, we understood that if we give the agent the base branch 14:40 and the base code and not the diff, we're getting better results. Why? 14:45 This was a really interesting case. 14:46 We understood that if we're giving it the diff, it's biased 14:50 toward the specific solution that the engineer chose to implement. 14:54 But if we give it the base branch before the change, the agent is open 14:57 minded to different kinds of approaches 14:59 and is more critical about the solution that was chosen. 15:04 The second way to improve that kind of results was to scope 15:08 the agent to understand what it's reviewing right now. 15:12 So, for example, if I'm reviewing a front-end feature, 15:15 there's no reason to 15:16 be concerned about backend issues because I'm just going to create noise 15:20 that is irrelevant for this specific feature.

15:25 So the good news 15:26 is we have sessions that are running, we have requirements 15:29 that are extracted perfectly and we're verifying them amazingly.
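
A minimal sketch of the arrangement described above: a planner that extracts requirements, one verification subagent per requirement running in parallel, an orchestrator that collects the verdicts, and scoping so that a front-end review ignores backend items. Everything in it is an assumption made for illustration and not Baz's implementation: the `Requirement` and `Verdict` types, the line-based `area: text` spec format, and the string-matching check are hypothetical stand-ins for model calls and browser sessions.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    id: int
    text: str
    area: str  # hypothetical scoping label, e.g. "frontend" or "backend"


@dataclass(frozen=True)
class Verdict:
    requirement_id: int
    passed: bool
    note: str


def plan(spec: str, scope: str) -> list[Requirement]:
    """Planner stand-in: one requirement per non-empty "area: text" line.

    A real planner would be a model call; this parser is only a placeholder.
    """
    requirements = []
    for line in spec.splitlines():
        if not line.strip():
            continue
        area, _, text = line.partition(":")
        requirements.append(
            Requirement(len(requirements) + 1, text.strip(), area.strip())
        )
    # Scoping: keep only requirements in the area under review to cut noise.
    return [r for r in requirements if r.area == scope]


async def verify(req: Requirement, observed: set[str]) -> Verdict:
    """Subagent stand-in: it sees a single requirement, so its context stays small."""
    await asyncio.sleep(0)  # placeholder for a browser session or model call
    passed = req.text in observed
    return Verdict(req.id, passed, "found" if passed else "missing")


async def orchestrate(spec: str, scope: str, observed: set[str]) -> list[Verdict]:
    """Fan out one verifier per requirement, then collect every verdict in order."""
    requirements = plan(spec, scope)
    return list(await asyncio.gather(*(verify(r, observed) for r in requirements)))


SPEC = """
frontend: Continue button does not overlap integrations
frontend: New integration appears in onboarding
backend: Webhook retries on failure
"""
# Hypothetical observations from the deployed preview environment.
OBSERVED = {"New integration appears in onboarding"}

verdicts = asyncio.run(orchestrate(SPEC, "frontend", OBSERVED))
for v in verdicts:
    print(v.requirement_id, "PASS" if v.passed else "FAIL", v.note)
```

Because `asyncio.gather` returns results in the order the tasks were passed in, the orchestrator can map each verdict back to its requirement without extra bookkeeping, and no verifier carries context from another requirement's check.
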
15:33 Now we're getting to a point that is the really exciting point 15:36 of sharing it with our customers that are waiting for this specific feature. 15:40 And we get to the point where we're starting to plan 15:43 how the launch of this feature is going to look, 15:45 and we get to the point where we're saying, okay, 15:47 now we need to integrate, we need to let our customers 15:50 integrate Baz into their preview environment. 15:54 So that means that Baz needs to access multiple URLs, unknown ones. 16:00 It can be teams that are just giving us access to this URL. 16:03 And you can understand that that's not a good idea, right? 16:06 Like clicking on unknown URLs. 16:09 It's just like clicking on a phishing link that I'm getting in an SMS or an email, 16:12 and I'm going to do it 100 times a month. 16:17 My agent is sitting on our S3 bucket. 16:21 It's where our code is, it's where our data is. 16:24 It's where our customers' data is. 16:25 It's where our OpenAI credentials are. 16:28 It's not a good idea to run arbitrary code on my agent.

16:34 So the way we decided to solve that, and again, 16:37 this is our personal approach, but I'm going to talk 16:40 about how you can do it differently, 16:42 is using a third-party tool. 16:45 There's a lot of nonsense people are doing there with coding agents. 16:49 People are just vibe coding a lot of software right now. 16:53 And if you're entering an area that you think is a dangerous area 16:57 related to cybersecurity vulnerabilities, prefer using a third-party tool 17:01 instead of developing it yourself. 17:05 This specific solution didn't really impact the results that we generated. 17:10 It just helped us deal with things that we didn't want to deal with 17:14 as a dev tools company. In this case, what we implemented, 17:17 and we used it together with AWS AgentCore, is the ability to run 17:22 an ephemeral sandbox for every requirement that is being validated.
17:27 That means that when the agent needs to validate a specific requirement, 17:31 there will be a sandbox that would be running 17:34 for that specific requirement with a browser session, 17:37 checking the specific feature through the preview environment, 17:42 sending back the verdict to the agent that is in our cloud, 17:45 and then the agent can understand 17:46 whether that specific requirement was implemented or not.

17:52 And this is the dream come true. 17:54 This is the moment where Spec Reviewer is running on our code. 17:59 I waited for this moment for close to six months, 18:02 and you can see here multiple sessions that are not human sessions, 18:06 but agent sessions that are navigating through our UI. 18:09 You can see here cases where the agent is navigating 18:13 through dashboards, checking data integrity 18:17 when we added different features to the dashboard. 18:20 You can see cases where the agent is looking for different integrations 18:24 and trying to integrate and activate a specific integration. 18:28 You can see the agent trying to subscribe through Stripe, which sometimes breaks, 18:34 and that's probably the most frustrating point for you as a product manager, 18:37 when someone tries to pay and the product isn't working, and also 18:41 trying to onboard with a Google account or with [SSO]. 18:46 And this is running multiple times, like dozens of times in every PR 18:51 that we're doing to validate every feature, to ensure regression tests, 18:55 and to ensure that the features are really implemented as we describe them.

19:03 The last slide: the key takeaways, 19:06 or, if you want to take this dream and build it by yourself. 19:10 First of all, it's 2026 19:14 and context engineering is still a hard problem. 19:17 There's a feeling that when you look at what's going on 19:21 today, this is something that's so easy to do, right? 19:24 Like context windows are so big and I have so many tokens.
19:27 It's just so easy to do it. 19:29 But the truth is, it's not. Complex 19:32 tasks that are not out of the box from coding agents 19:35 require a lot of context engineering, and two tips that I can give 19:39 here are: one, dividing between planning and execution, 19:44 and the other one would be delegating identical tasks between multiple subagents.

19:50 The second. 19:51 The second takeaway here is the fact that you have your specs 19:54 and you have your code, and you don't need other resources except them. 19:59 And if you combine them together, it's really a gold mine. 20:02 So you don't need to create more resources to ensure that your agent 20:05 is running smoothly.

20:07 And the third takeaway is the one that I talked about earlier. 20:10 If you identify high-risk areas 20:14 that you think can be vulnerable in part of your SDLC 20:18 or part of your architecture, prefer using third-party 20:22 proven tools that you can use instead of exposing yourself 20:25 to security issues.

20:28 The last takeaway is a personal one. 20:30 For someone that is a builder and a product manager 20:33 and is part of the startup life, these days can be very frustrating. 20:37 I think someone that is looking at what's happening on Twitter 20:41 or the news can think that there's no way to beat the big companies. Every day, 20:47 Anthropic or Cursor or Codex release a feature 20:51 where you're saying, like, oh damn, this is what I'm thinking about. 20:54 This is going to be the big feature that I was planning to do next quarter. 20:58 But the idea is that these 21:01 big companies are overlooking a lot of cracks 21:06 that real teams are looking for products to solve. 21:11 Spec Reviewer is one of those examples 21:13 that for me as a product manager, it was mainly to tune in 21:17 to the pains that I felt in our customers and trying to find the place 21:22 where coding agents don't really solve it, and finding a way to build it.
21:27 And I think this would be the best 21:28 tip that I can give you from this conversation. 21:31 Look for the gaps that the big coding agents are not able to fill 21:34 and build your product there. 21:36 That will probably be the best way for you to beat them. 21:39 That's it. 21:40 Thank you very much for your time.

21:43 Thank you so much. 21:46 We have plenty of time for some questions. 21:49 So we'll spread out, and throw hands in the air like you just don't care, 21:54 or if you have a question.

22:02 Did you say you run regression, 22:05 sorry, sorry, 22:08 regression tests every time you implement a new feature? 22:12 Yeah. 22:12 So you test everything that's already there?

22:15 That's right. 22:15 So what Spec Reviewer does, that was the 22:20 next part of it. 22:21 So the first one, the first version of it 22:25 was to ensure that everything that was in the spec was really implemented. 22:30 So for example, the colors are correct, the resolution is right, that everything 22:35 that the product manager or the designer [asked for] and all the states were implemented. 22:38 But then what customers started asking was, okay, this agent is able 22:43 to navigate through the UI and ensure that everything is working as expected. 22:48 Let's also give it like a set of critical flows 22:52 that I want to ensure that none of them is breaking. 22:55 So, for example, I give this example to every engineer 22:58 that is coming. I say, like, you can do a lot of things, 23:01 but if there's something that, like, I'm going to remind, 23:06 like I won't forget forever, it's if you're going to break subscription. 23:10 Because one time I just looked at a customer on, 23:14 like, a screen recording, 23:16 trying to subscribe with like 100 seats and just didn't do it. 23:20 It was unsuccessful and [they] just, like, ditched the product.
23:23 And I said, like, okay, I just missed a customer 23:26 and this would be something that I would check in every PR 23:29 from now on, like ensure that it is happening. And you can think about your product 23:33 and your critical flows and ensure that they are all the time intact 23:38 and nothing is happening to them, even though you're introducing new features.

23:42 Thank you.

23:44 Go ahead.

23:44 Thank you for the talk. 23:46 My question is about the verification. 23:50 Like it could be, 23:51 it could be executed in two ways. 23:55 You can either run the agentic session, point it at a specification, 24:00 and from there, with the help of Playwright MCP, for example, 24:04 let the agent go through the application and verify it. 24:08 Or you can build a description of the test scenario 24:12 and then make the agent code this test, and then just run this test 24:18 code like a script and then probably commit it to the code base. 24:23 So what's your take on the ratio, when you resort to one approach, 24:27 when to another? Yeah. 24:31 How you make these decisions.

24:33 Okay. So it's a good question. 24:37 The problem that I have with tests 24:39 is that they test things that nobody really, 24:44 like, 24:44 they're not testing real life. 24:46 Like teams are sitting and thinking about these imaginary scenarios 24:50 that are not going to happen any time in your product. 24:54 And there are plenty of AI coding tools that generate tests, 24:59 like here, take like 100, 200, 500 unit tests. 25:04 And at the end of the day, there is one scenario that you're not thinking about. 25:08 The reason we chose this specific approach is 25:11 we wanted to leverage data that is already there. 25:15 Like, I don't want someone to think about scenarios 25:19 that might happen or might not. 25:22 And let's use the specs to do that.
25:25 By the way, the byproduct of this process, and I didn't talk about it, 25:29 but I heard about it from a lot of our customers, 25:32 is the fact that when they knew that the specs are 25:36 used by the agents to verify the feature, 25:42 it encouraged them to write better specs. 25:46 So it's like renewable energy, right? 25:48 Like, I'm bringing a new tool that improves best practices 25:54 and encourages product managers and designers 25:56 to be more specific in the way they write specs. 25:59 And I think that, like, if testing 26:02 was the answer, there wouldn't be room for this.

26:07 Thank you.

26:10 Hello. 26:11 Thank you for the talk. 26:12 Thank you. Really great. Two small questions. 26:16 First, you mentioned the subagents, 26:18 so simplifying and splitting into different, 26:23 different acceptance criteria to check, rather than, 26:26 and giving that to different subagents. 26:28 Have you used lighter models for those, given 26:32 that those are faster and tokens are becoming more and more expensive? 26:37 So I wanted to understand if you have evaluated that. 26:41 And the second one is, you mentioned 26:44 regression testing, which we kind of do also with end-to-end testing. 26:47 But what about exploratory testing after a release?

26:52 So I'll address the first question first. 26:55 We are using different models, we're specifically using all of them. 27:00 We're using OpenAI 27:02 and Anthropic ones for different tasks. For extracting 27:06 the requirements, 27:08 we're using a heavy model because it's a big task. 27:13 The fact is that human beings aren't consistent 27:16 about how they're writing specs. 27:17 So there will be people that will say, this is something we talked about 27:20 at breakfast. Here's the print screen. Good luck. 27:22 And the other one would write like a really big spec.
27:25 So you need a model with a big brain that is able to understand 27:29 what was there and ground it to what the requirements really are, 27:34 while for the verification ones we use like small agents 27:39 that just need to navigate through the product, 27:42 get to the right section in the product and say like the color is the color, 27:46 the resolution is the resolution, or the click is working or not. 27:51 Can you repeat the second question? 27:53 I just want to make sure that I'm addressing it correctly.

27:56 For exploratory testing. 27:58 So regression testing, we have those core journeys 28:02 that we want to make them, 28:04 we will, 28:05 we want to make sure that we are not regressing on those core functionalities. 28:09 But what about exploratory testing? 28:12 Because as we know, every time we do a change, maybe something unrelated 28:16 will break that might not be in that core journey.

28:19 So what would be the example of what you would like to test?

28:24 Exploratory testing is just generally the QA. 28:27 So if you do it manually, your QAs just navigate the product 28:33 and certain functionalities to understand if things are working.

28:38 Yeah. 28:38 So we focus there on where it matters and not, like, all the app. 28:44 So there will be things that, like, 28:48 I will not go and test everything in the product. 28:50 At the end of the day, it's dividing, like putting an agent into work 28:55 and using tokens and, like, agentic workflows to do that 29:00 would be focusing on the 1 or 5% in the product that matters 29:03 most and not going into every part in the product and doing that. For that, 29:08 like, you can have static testing, testing your code, to do that. 29:12 It's mainly focusing on where it matters most and putting the agent there. 29:17 Like, I wouldn't like it to cost 29:19 tens of thousands of dollars for verifying every part in the product, 29:23 and only focusing where I'm really needed.

29:27 Thank you.
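
The model split from the first answer above (a heavy model for extracting requirements from inconsistent specs, lighter ones for narrow per-requirement checks) can be pictured as a small routing table. This is only a sketch under stated assumptions: the `Task` names and the "heavy" and "light" tier labels are hypothetical placeholders, and the talk does not say which specific models fill each role.

```python
from enum import Enum


class Task(Enum):
    EXTRACT_REQUIREMENTS = "extract_requirements"
    VERIFY_REQUIREMENT = "verify_requirement"


# Hypothetical tier labels; substitute whichever models a team actually uses.
MODEL_TIER = {
    # Specs vary from a screenshot to a long document, so extraction gets the larger model.
    Task.EXTRACT_REQUIREMENTS: "heavy",
    # Verification is a narrow check: navigate, observe, report a verdict.
    Task.VERIFY_REQUIREMENT: "light",
}


def pick_tier(task: Task) -> str:
    """Return the model tier assigned to a task type."""
    return MODEL_TIER[task]


print(pick_tier(Task.EXTRACT_REQUIREMENTS))
print(pick_tier(Task.VERIFY_REQUIREMENT))
```

Keeping the mapping in one table makes the cost trade-off explicit: the expensive tier runs once per spec, while the cheap tier runs once per requirement.
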
29:28 And one short one at the front. 29:30 The question, not the person. Yeah, sorry. 29:32 I'm sorry. 29:33 I'm sorry.

Thanks. 29:34 Thanks for the presentation. Thank you. 29:37 I know he said a short one, but how did you make sure that the spec 29:41 that you gave 29:42 the agents was actually implemented the way the spec has been written? 29:46 Because you said it was dropping out at points, right? 29:50 Yeah. The other bit. 29:50 Is that a very important one? 29:51 You said specs plus code is a gold mine. 29:53 And there's an argument here that a lot of us don't care about the code 29:57 that's being generated. 29:58 I think that's a bit of a mistake, because 30:01 if the agents have created spaghetti code, the next time you go and try to either 30:06 create a new feature or fix a bug, I think you're going to go into this 30:09 negative spiral, infinite loop, and then end up creating more bugs. 30:13 So how do you address both of these?

30:16 So I have three seconds. Can I. 30:21 Slightly can take it offline. 30:22 Okay, okay. Sure. Thank you very much.
