
ainativedev/aidevcon-2026-ldn

AI Native DevCon 2026 London — all conference sessions as interactive skills


talk-martinelli-spec-driven-development/transcript.md

⚠️ Speaker labels absent. This transcript was supplied as one continuous block with no per-speaker markers. The main body is clearly Simon Martinelli delivering the talk; introductions and Q&A are mixed in. When attributing specific Q&A exchanges, prefer hedged phrasing ("an audience member asked...", "Simon replied...") unless the turn is unambiguous.

⚠️ Speech-to-text artifacts. The transcript contains many garbled terms. Notable likely-misrecognitions: "spectrum development" → "spec-driven development"; "main search" → likely "Mistral" or similar; "Ivar Jakob's own" → "Ivar Jacobson"; "Siren. Mace" → possibly "Simon Maple", "Guy Podjarny", or another Tessl author [unclear]; "a native dev.io" → likely "ainativedev.io" (Tessl's AI Native Dev site); "near piece estim" / "nearp system" → "ERP system"; the "music festival" segment refers to a real second customer. Quote verbatim: these are part of the source.

1. Opening story — volunteer management system

I'll leave you to it. We're just, we're gonna get started in a few minutes, but we'll let people trickle in. Learn to be [unclear]. Like random empty seats, would you mind scooching? Maybe just one. Thank you. For the introverts of this conference, which there are a handful, you know? Okay, we are gonna go ahead and get started. Thank you guys for coming again. Another time slot was the best competition that you guys are the smart ones, I can see. This is Simon Martinelli, who needs no introduction if you've been involved in the Java community. He is a legend there, and he's going to be talking about some lessons learned from spec-driven development. Let's give Simon a hand.

Let me start by telling you a story. So I'm in a sports club. I'm from Switzerland and I live in a very small town and we lose sports event [likely: we run sports events]. So we are mainly doing track and field and we have competition for kids already for 45 years. And since 1997, I'm doing software for them because I realized that doing computation and doing computation ranking list and stuff like that meets [likely: needs] a lot of people. Back in the days, like we were 12 to 50 [possibly: 12 to 15] people. And now it's just me using software doing that for them. But that's not the only thing that we do. We also have different events and we always need volunteers. And it was in summer 2024 when we had probably holiday in Switzerland, someone of the club came to me and said, hey, Simon, we have an issue. Me and [likely: we have a] volunteer management system, but that's outdated and doesn't work anymore. So you're a software engineer and I heard about this AI stuff. You can certainly do that better. Let's go. And I said, okay, why not? And they [likely: I] started to use main search [unclear: tool name]. That is around 4024 [likely: 2024]. And that was relatively fast. And I had a long dimension system [likely: a volunteer management system]. But the problem is someone else from music festival brought me [likely: approached me] and said, hey, I heard that you're building a volunteer management system, meaning that as well [likely: we need that as well]. And then I miss you [unclear]. I have not told what I was implementing and I like to change the features so that it will fulfill the requirements of this music fest [likely: festival]. And this brought me to spectrum development [likely: spec-driven development].

2. Discovery via nativedev / Tessl

As I was already introduced, I'm doing mostly Java development. I'm consulting with residents for 17 years. And I'm working for insurance companies, wholesale, retail, government, large enterprises mostly, and I'm doing business applications. So I don't do tools, I don't do products, I do business applications.

And during that time when I was kind of lost, I found a website. It is called a native dev.io that's Tessl. And there I found an article from Siren. Mace. And he was writing about AI 90 development [likely: AI-native development] and that kept my attention. And I read about spectrum [likely: spec-driven development]. But at that time spent in the moment [likely: spec-driven development] was relatively new. So the term spectrum in the moment is very old, right? That's from 2000s or maybe late 90s. But the spectrum development with AI was relatively new. So I started to think about what could specs be. And because I'm doing business applications, I had a different take probability of inspector. If you won't put that to find a way to do that.

3. Landscape — tool-centric vs process-centric

And so in the meanwhile, if you look at the specimen development [likely: spec-driven development] ecosystem, there are flavors of bacteria. Development [likely: flavors of spec-driven development]. And there is one process centric. That's my process I created. Say a unified process [AI Unified Process]. I will show you that in a minute. But on the other hand, we have tools. We have some here we have kit of spec kit [Kiro, Spec Kit]. We have pm method [likely: BMAD Method]. These are all great tools. And also Tessl created back in tombs back then [unclear]. 2025. But they are all, in my opinion, at least to [likely: a little too] developer centric. And I drive a covered whole [likely: I try to cover the whole] software development life cycle, especially for enterprises where you know our soap and are summoned in a larger team with different roles in a larger baseage [unclear].

And there's a website called ai unified process. And there you will see the diagram with some color. So the colors mean that usually if you're in the green field, I don't talk about groundfield as well because I'm rarely on greenfield project. We start with the vision. And then you start getting requirements. However, you would do that. By the way there is the international requirements engineering board [IREB] and they just created something called the micro credential. It's called AI for requirements engineering. And they are prompting guide skills and stuff. They're working on that requirements engineer can speed up with AI. And then we have testing and stuff like that.

4. Why system use cases

And grid things probably interesting here. I think about what could be good specs that a stands [likely: that AI understands], but also all stakeholders in the project. And I ended up in system use cases. By the way, who knows this new. Space [likely: use cases]? Who knows? Not a lot of people are safe. So system use cases were created by Ivar Jakob's own back in 1987 [Ivar Jacobson]. And then later on we had UML and the room tea [likely: RUP] and stuff like that. But step moved like that. 80s early 90s. But I was using this news case [likely: use cases] when I was working for space railways early 2000 [Swiss railways]. And this white [likely: quite] worked well as communication. Specification between psychology [unclear]. And I thought, okay, if that worked back then, why shouldn't it work can be any [likely: with AI]?

And then we need something else. I call it the entity model, but it's more like a domain model. So it depends if you bring the major design [domain-driven design] or your board. These are more or less utility data. And yet we need to model. And these things together with the software architecture, we talk about that later can be used to directly generate. Code.

5. Pipeline — use cases → code, skipping plan/tasks

Because the difference from my approach to the tool approach is the two approach [likely: tool approach] is always the same. So usually if you look at times of tira [likely: Kiro], for example, you have product requirements document, from that you generate some plan, and part of the plan, there will be tasks. And finally, AM [likely: AI] will implement the tasks. I escaped [likely: skip] the plan task phase. I just use this use case of 10th global [likely: and entity model] and generate code. Type [unclear].

So now set use cases are not enough because usually there's a UI, for example, or if you build an API with an API spec. So you need additional information to that. So this muny's case [likely: use case] just defines the behavior, but how it looks can be, for example, a Figma design that you can integrate with MCP server directly interact with. And then usually code is generated and Tess [likely: tests] is generated. The question is in which order.

6. Test order and risk-based reviews

So I'm multitwin [likely: full-stack] stack development. That means I have a UI. And if you have a UI tester long [likely: test-first], it's difficult because you first need to figure out how do you have your lock life [likely: how your UI looks like] before you start create the test. If you do apis, for example, always go to testing development [likely: test-driven development]. So start with tests and then use the tests also to drive the current generation [likely: code generation].

And finally, as you can see always there is a review phase. So most often things are reviewed. The question is how much tribute [likely: review] do we need? And that's all about risk management. So I'm currently working on modernization of nearp system [ERP system] for the largest wholesale companies in Switzerland. And if you look at the near piece estim [ERP system], you have different modules in the system that don't have all the same, should I say, criticality. So that means if you have the product management or the inventory or if that doesn't work, that's probably not even a problem because people that are working with that inventory management system, they can just go grab a coffee. If the order management system doesn't work, that's more of a problem because then the company probably will lose money. So you should do risk management and then decide how much review your code from it [likely: needs]. But that's not different from AI or mental driven development [likely: model-driven development]. That's just how it works.
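[Editor's sketch, not part of the talk: the risk-based review idea above can be illustrated as a small lookup from module criticality to review depth. The module names loosely follow the ERP example; the criticality levels, policy strings, and the `review_level` helper are all hypothetical.]

```python
from enum import Enum


class Criticality(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical classification, loosely following the ERP example in the talk.
MODULE_CRITICALITY = {
    "inventory": Criticality.LOW,          # an outage is an inconvenience
    "order-management": Criticality.HIGH,  # an outage costs money
}

# Hypothetical review policy per criticality level.
REVIEW_POLICY = {
    Criticality.LOW: "automated checks only",
    Criticality.HIGH: "peer review of every change",
}


def review_level(module: str) -> str:
    """Return the review depth for a module.

    Modules that have not been classified yet get the strictest level.
    """
    criticality = MODULE_CRITICALITY.get(module, Criticality.HIGH)
    return REVIEW_POLICY[criticality]
```

[Defaulting unclassified modules to the strictest level is a design choice of this sketch, not something stated in the talk.]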

7. Greenfield vs modernization flows

Now let's have a look in the details already. We have a greenfield project, for example. It would start with the specs. So specs means we have some clients engineers, product owners, business analysts. Depending on how the people are called in your organization and they create the entity modeling use cases. So they can also derive that directly from the grindings, for example. And then we will have a phase where we can review that. So then we will be something like the name of done [definition of done] and the software engineer, the gasoline use case is done. Or the specification is done.

And then we use the engine. And the engine comes with what a lot of people were already talking about. So we need scales [skills] and MCP guidelines, guardrails. Whatever. You'll see that in a minute in the demo. But because we don't do the plan and task phase, we need that in the middle. That is probably the most important thing of the whole process. So that means the skills must match the outcome. That means we need skills that know how to code and test should be generated. And that's not always the same. So when we're 24 and six customers [likely: in 2024, six customers] with that process and all skills are different. Because I have customers that use React and Spring Boot on this new spot in the springboard that sort of charge landmark are using Angular and Quarkus. So I have a variety of different frameworks and also. Some in-house frameworks maybe that need to be working in that context.

Now that's a green view project, but I really rarely do that. What I do mostly is software model session [modernization]. So I'm doing mobilization for enterprise applications for about eight years now. And there I just changed the process. That means I extract use case identity model for coding tests and documentation, maybe I've conflict here and there because usually documentation spread around multiple artifacts that we have. And then we have the entry moment use cases. They will be revised for reviewed by the business people. And then we generate the.

Because there are a lot of ideas that we can directly transform maybe from coal to jump [likely: COBOL to Java]. So that's also something that ant traffic [likely: Anthropic] is telling us that that never worked. So we did that like 30 years ago called C or C++ or something, but that's not helpful because that's just left legit [lift-and-shift]. And modernization is not left to chess [lift-and-shift]; modernization is rethinking how people are working with the software, integrating features that maybe are not there. And because of these rigorous engineering, we also have positive feedback from the end users because we are not transforming from one technology to another. Because we probably want use case and then all the features.

Because usually you don't do that. So the guideline that we started with the globalization [modernization] project two years ago was we want to have the exact same system just didn't know the technology and we don't review just because we don't want to use newbox. And now we can just do that because we've just changed.

8. Why use cases (vs user stories) + PetClinic demo

Now the question is maybe why use cases? Because usually I use the stories [likely: people use user stories] or any other product requirements document. The point is use cases are very well defined and even AI knows how to write use cases because it's around for a very long time. That means we have use case and hence you believe that for right [unclear]. We have precommerce conditions [preconditions]. And we have scenarios. We have made cluster [main success] and I have done [likely: and] alternative flows. And if we compare that, we use stories [likely: with user stories], we can see that this stories are just collection or use cases to collect refuge [likely: use cases are just collections of user stories]. So one user story is usually a flow in the use case. And in my opinion or in my experience you guys are better than if stories [likely: use cases are better than user stories] because they are simply big [likely: simply too small].
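[Editor's sketch, not part of the talk: the use case structure described above (actor, preconditions, a main success scenario, alternative flows, postconditions) can be written down as a small data model, with one flow corresponding roughly to one user story. The class names and the example content are hypothetical and are not taken from the speaker's PetClinic specs.]

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """One path through a use case; roughly what a single user story covers."""
    name: str
    steps: list[str]


@dataclass
class UseCase:
    name: str
    actor: str
    preconditions: list[str]
    main_success_scenario: Flow
    alternative_flows: list[Flow] = field(default_factory=list)
    postconditions: list[str] = field(default_factory=list)

    def flows(self) -> list[Flow]:
        """All flows of the use case, main success scenario first."""
        return [self.main_success_scenario, *self.alternative_flows]

    def acceptance_criteria(self) -> list[str]:
        """Postconditions double as verifiable acceptance criteria."""
        return list(self.postconditions)


# Hypothetical example content.
list_vets = UseCase(
    name="List veterinarians",
    actor="Visitor",
    preconditions=["The application is running"],
    main_success_scenario=Flow(
        name="Show all veterinarians",
        steps=["Visitor opens the veterinarians page", "System lists all veterinarians"],
    ),
    alternative_flows=[
        Flow(name="No veterinarians exist", steps=["System shows an empty list"]),
    ],
    postconditions=["Every veterinarian is shown with name and specialties"],
)
```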

So we have that. And I created a small project. So we have geometry numbers here. By the way, you suddenly know the implication that they display before we have the Spring Boot on it. So that's kind of a demo project from Spring Framework. Just some history Spring that was using Pet Clinic because before we had drama enterprise station and they had shop. And they wanted to stay in this pet industry. So to say, to take it that.

So what we can do here, we can see doctors, we can find owners and all these are pets, for example, at one of the league as true and rows here. And for example, joy is here. And a broken leg. And you can have the. So that's more or less. That's very simple. A very, very simple application, but it's more or less what I'm doing. So I don't do that simple application, but I do a business application. So we have UIs, data, stuff. Like that.

And what I did, I reverse engineer Pet Clinic. And that's a use case, UML diagram, by the way. And this is quite helpful because in the diagram, we have actors. So you have two, the visitor and clinic use it. And these usually are also roles in the system. And then we have like use cases. There are groups. They do parts or modules, whatever. So we have a login page with the doctors and the owner management pet management. And finally there is a management that I just. Did. And when I reverse engineering, I just draw that plan.

We also have an entity model. So that's usually derived from the database model and SQL database or other database. Usually you have information to record that. And here we have diagram that just contains the types that we can see how things are related in the system. And then we have definition. Of doing. This. I will talk about architecture in a minute because working like that or working with AI, in my opinion, has a huge impact on architecture style. So we talk about fits well.

And then we have created or reverse engineered use case. Look at that one. That's the list of the doctors. And here you can see that we have an actor. We have preconditions, scenarios. But this one also contains an API that's just reverse engineer. And then we have trans conditions [postconditions] and construction is now a kind of acceptance criteria if you think about users for instance can be verified. It has objective use case. Finally. Most success.

And from that, because I have skills in everything, I just can fully implement it. So that means in those projects we don't prompt. So we have skills for everything and we iterate in these fields. So we try to improve them constantly. We shared skills in the organization. That's kind of problem currently because not everybody in the organization is maybe using the same agent. So we need some skill distribution.

9. Skills, MCP servers, guardrails

I'm using our code [likely: Claude Code] here. I do things that I usually don't do, so I don't run it here inside the ID. That's just for technical purposes. Now it looks at that. But I also have. Heard in the last talk if you were here in that room, I've got empty [likely: CLAUDE.md is empty] and this more or less not much inside, but a very important part. Somewhere I can find it. But somewhere there is no reference to some guidelines. So we have guidelines for architecture, for example, how we structure the application of extract looks like. What tools we are using today. Was about that. But most of the things are in the skills.

And my process also comes with skills. So if you go there, you don't have time to look at that difficulty. You see there are two levels of scale. Some you have skills. Primarily for specification. There are skills specific for one stack that I use. But I stopped adding more skills for more stacks because there are so many combinations out there that's probably going to think that there could be as. It works on that.

And when I go back to this one, because I was already talking about that, in my opinion, the architecture has an impact on how well things work.

10. Architecture — SCS vs microservices vs modular monolith

First of all, in the past 15 years or so, we were doing microservices. And most of my customers did microservices in a very naive way. So they have way too many microservices because they were focusing on micro and microservices [likely: the micro in microservices]. That's probably a problem. So they have this distributed big policy [distributed big-ball-of-mud]. So I'm working for an insurance company and they have around 500 microservices. And that's kind of an impact because they also have kind of 500 micro front ends. And they are kind of related, right? So we have an end-to-end relationship [likely: n-to-n relationship] between front-end and back end. And that's the worst case scenario. Because if you want to work with AI on certain part of your system, you have to have context. So you need code that the AI should work on in a single place, at least on your machine or very energy work with that with AI [unclear]. And if you have 500 microservices, a lot of components somewhere and you have to mix and match that, that becomes very difficult.

Now there's movement away from microservices or modular monolithic application. And I would say stop here. Don't do that because that's not the right way because maybe you have like a huge long light [likely: huge monolith] that we have currently piece estimates [ERP] and one of modules. And that's a problem, right? Because the context is big. So how can we learn from that?

And there's an architecture style that not so well known, but it's approximately created when microservices were created. It's called self-contained system. And sometimes architecture just says we create verticals. So we split our application to verticals which has UI, space [business] logic and database in one report [repo], usually one project at least. We multiplication [unclear]. And if you have that, if you can split that and we do that with the ERP modernization, then the AI can exactly work on that. And you can also add skills, especially depending on technology you're using. So for example in inventory system here we use lot [Vaadin] in my demo. That's enough [likely: a] web framework in Java, but for the ordinance system [likely: order system], for example, we have React in the front end. So we can have different technology and different systems.
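[Editor's sketch, not part of the talk: one way to picture the self-contained-system idea above is a registry in which each vertical names a single repository plus the stack-specific skills an agent should load for it. The UI frameworks follow the talk (Vaadin for inventory, React for orders); the repository names, skill names, and the `agent_context` helper are hypothetical.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertical:
    """A self-contained system: UI, business logic, and database in one repo."""
    name: str
    repo: str
    ui_framework: str
    skills: tuple[str, ...]


# Hypothetical registry of verticals in a modernized ERP.
VERTICALS = {
    "inventory": Vertical(
        name="inventory",
        repo="erp-inventory",
        ui_framework="Vaadin",
        skills=("vaadin-ui", "spring-boot-backend"),
    ),
    "orders": Vertical(
        name="orders",
        repo="erp-orders",
        ui_framework="React",
        skills=("react-ui", "spring-boot-backend"),
    ),
}


def agent_context(vertical_name: str) -> dict[str, object]:
    """Return the single repo and the skills an agent needs for one vertical."""
    vertical = VERTICALS[vertical_name]
    return {"repo": vertical.repo, "skills": list(vertical.skills)}
```

[The point of the sketch is that the agent's working context stays bounded by one vertical rather than spanning hundreds of services.]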

There's another thing that if you can stay on nonstack [one stack], so for example, if you stay with the chosen state, the TypeScript stack or it consisted, then roll the Granny barter as a simpler [it's even simpler] because you need to create skills and stuff just for technology. Otherwise, if you do what my customers usually doing, they are single penny framework. Like React or Angular and then usually Spring Boot and Hibernate they also maintain.

11. Team and process changes

And one another thing that also quite interesting is the impact on the teams. And that's the biggest change. So for the wholesale company that's relatively simple because that's not a huge thing. And they already work in maintenance mats [mode] and they don't really scram [scrum] anymore, right? They don't have Depending on what features they have to implement. But because we use like the specs as an input, as we also kind of did with the stories, by the way, Development way. But we are way faster, so we cannot wait two weeks. So that's way too long. So we need to be extremely software [extremely small?]. And the other reason we use the team size, we are parcel compliances mounted to develops. We firewall two because of knowledge exchange and maybe it's boring to work. But we reduce that from like five to seven to one or two. And as I said, we do normal spreads [no more sprints]. We continuous flow and we use cases kind of in a countdown [kanban] way to track progress. That's all we do.

12. Determinism, regeneration, conclusion

Then something that we can go back. And create is just deep adaptation of. That. And now we have the list of doctors here. I really don't like PowerPoint because this changes. Everything. So that's done. Took one and a half millipur [minutes per?] with the doctor's. Right. So that's a simple use case. But that's something that we have a lot in the pieces that we write.

And why did this work? First of all, because of. This God race [guardrails] that I was talking about. So first of all, my recommendation never let AI create a project. First of all, you just raise tokens that you do that. But if you do that, you probably end up need to know how to date the application. So if you have like Spring Boot, for example, you can start with Spring Boot, you could create an application, the newest dependencies, you can do the way how it's painting currently thinks about great implications. Maybe you have a CLI variables. Use that. Don't you say.

And then you have to define the rules. So that means you don't put everything in here in the 1870 [likely: CLAUDE.md] because there's a study from the university in Zurich. And they say the bigger system prompt more hallucinations. You get. So it's maybe even better to have none of those guys than people. But then you can add architectures or usually that's kind of a format of including architecture documentation. Then we had skills. That's the important thing. A lot of people were talking about that. And we also have MCP servers because skills shouldn't be that big and we have huge documentation, for example, for in-house frameworks or UK, MCP servers, invector search and AI can direct research in the document page. That's how we do that.

And if you do a good job and iterate on that, you really get probably. A near deterministic solution. So what I did before I can delete everything, do it again and get more or less the same outcome. But very important the union should rebuild depending, as I said, on the risk that happens when your system doesn't work, it will have bugs, then you should probably go and add more reviews or depending on if you do reviews by hand or you use AI to you or however you learn from that.

So we don't do pull requests, by the way, at the moment. We do count-paced development [likely: pair-programming-paced] and we do kind of an ongoing reading [review] process. So we work on something and then we do peer readings [peer reviews]. So we work if we are two developers. They work together and to the review to get started. They try to explain parts of the system that they create with the other story completely.

So to conclude, we use not that missing but not specs are not enough. So you need to harness, you need all the context around that that is really works. And something that I didn't talk about is my opinion specs are very sustainable. Hopefully they will be sustainable in the future because currently what I'm doing is reverse engineering of existing content. And every aspect we can probably generate same application with different technology with different drive maybe we don't have even the UI or something directly in this. Place. And the business people can directly change the way systems should behave without you know. So they don't need to.

This accelerates more. The problem is at the moment. Just accelerates the law [likely: the code]. So the requirements phase and spec phase takes some time. For example I'm working for discovery for parliament. They have a business case management software that we want to organize. And then the only one to really prove concept with something with code. But we have two product owner requirements engineers that were on spec and they have much more work to do than I trust augment everything so content working pipeline that they can just change mode as everything will automatically generated. So the kind of the work moves or shapes left. So everything that's left will be private engineering [requirements engineering] because now that's obvious before we had like scrum, you have two weeks to work on the requirements somehow and then lobster [lobbed-over] two weeks working on the implementation. Now you have to mix and then five minutes and two weeks. Something. Like that.

And the most important thing is you should know your architectural domain. And this will lead us to the discussion about the junior developers to appear. At this. And that's it from my side.

13. Q&A

Thank you very much. Simon. We have a few minutes for questions. If anyone would like to ask me, we'll start here.

Q1 (audience): Thank you for a nice hands-on presentation. I have a question about use cases. Did I understand correctly that you generate diagram in Figma and then generate video text to quarantine.

Simon: So usually. Review project, it will have like a requirements capital over something or prolocking private document. And from that I would first generate you space [use case] diagram that we have no view that we have. And maybe this book even gave you an idea of the split replication models. And then from that you can go ahead and generate that because this use case is not kind of in the user story include. Right. It's just more detail steps. And yeah, our environment engineers [likely: requirements engineers] and product owners use AI law just to verify use cases to find many publicates [duplicates] or things that are missing and they work a lot together. Yeah. And we call use case by use case because a lot of people say spend a lot more and that same thing not true. It just is requires and then testing implementation. But that's also the case with that at but we don't pick up from design which was produced by use case.

Q2 (audience follow-up): Maybe to follow a question. But like saying your user story is wrong. And you need to update it. What is the flow? Do you regenerate the whole application? Document?

Simon: So if I have a use case for example let's go here and I have 24. I think that. Now I see okay. Let me have something here. So it says that this place the doctor with party mask name and the concept rate at least of special. Now we could go ahead and say no that's not concept rate that would be. The time that I need to separate and what I will do now I will go ahead and say again the same that I did before. Because at the moment we need to call or we read the code right and I don't want, I could throw it away. But then my dentistry [git history?] wouldn't look nice and in order to read you the changes right that would be an issue.

In my opinion or in my opinion appearance of some people is I just need to source code [unclear]. So I'm a child developer [likely: Java developer] that's byte code that's executed. AI could generate byte code. There's no reason to generate. The software [likely: the source]. Or maybe we end up with programming language customer AI friendly and less human readable phenomena. But at the moment we just work as you would if you do that manually. So we really change the use case or the entity model and then we say just apply the changes.

Moderator: I'm so sorry there's so many more questions but we are out of time. You guys can hunt down Simon and ask him yourselves. Our next session is going to be in 10 minutes. So big round of applause. Because usually on 50 minutes on the same. Question again.

Q3 (in hallway / continued): [verification question]

Simon: Yeah. So we have different verifications. So we have kind of a skill that runs in the pipeline. We have drift management. So because if you work like that. It could happen that AI implement from saying that we create tests that are useless and stuff like that. And that's what we call reflection. So we follow the pipelines that verify if they still implemented or if the implementation part is not as effective. That's not very true. So we can go relatively fast. So as I said really bends into critical application.
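[Editor's sketch, not part of the talk: the drift verification mentioned above is not described in detail, so this only illustrates the general idea of comparing the use cases in the specs with the use cases the implementation claims to cover. The `find_drift` function and the identifier scheme are hypothetical.]

```python
def find_drift(spec_ids: set[str], implemented_ids: set[str]) -> dict[str, set[str]]:
    """Compare use case IDs found in the specs with those covered by code and tests.

    "missing" are specified use cases with no implementation;
    "orphaned" are implemented use cases that no spec describes anymore.
    """
    return {
        "missing": spec_ids - implemented_ids,
        "orphaned": implemented_ids - spec_ids,
    }


# Hypothetical identifiers.
drift = find_drift(
    spec_ids={"UC-001", "UC-002", "UC-003"},
    implemented_ids={"UC-001", "UC-003", "UC-099"},
)
```

[A pipeline step could fail the build when either set is non-empty; how strict to be would again follow the module's criticality.]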

Q4 (audience): Yes. And how much you need to invest in that.

Simon: Because what I did there didn't have the time to test that are generated by the white box. Test. Because AI sees back plus implementation like these are just duration [integration?] tests. But we also have Tess and the test engineering team they work in also with AI. Define workflows aspects and from that they generate Playwright tests. Against the running application so that it's more like a black box test. And we have behaviors without the appearance without giving the agent assessment assessment. Exactly that's very helpful especially because they are testing processes and we are at the end. And that's only not because the outcome of the use case may be correct. But the next use case is something else and there's a cap. Right. So we need to have workflows. But the problem is end-to-end tests are very slow. I'm doing integration tests because I'm using a service site render [server-side render] my framework. I can do that. And everything tested for UI backend or database. That runs around milliseconds per test. Same. Gender.

Q5 (audience): So we have approached it from Granite. Option. Do you have skills? Processing.

[transcript ends mid-question]
