
The Greatest Time to Build a Startup (The AI-Native Advantage)
Transcript
[00:00:00] Simon Maple: Before we jump into this episode, I wanted to let you know that this podcast is for developers building with AI at the core. So whether that is exploring the latest tools, the workflows, or the best practices, this podcast is for you. A really quick ask. 90% of people who are listening to this have not yet subscribed.
[00:00:25] Simon Maple: So if this content has helped you build smarter, hit that subscribe button and maybe a like. Alright, back to the episode. In this episode, we are going to be asking the big questions. What are the hidden pitfalls that you do not even know might be ruining your chances of success with agentic development, and what are the most important factors that a developer needs to know about in using agentic coding tools, as well as rolling it out across your organisation?
[00:00:51] Simon Maple: As I mentioned, joining me on this journey is Daniel Jones. Daniel, tell us a little bit about what re:cinq does.
[00:00:56] Daniel Jones: So re:cinq is a consultancy in Northern Europe, and we help people with AI native transformation, whatever that means. And it is a thing that is kind of emerging as the state of technology evolves. We kind of help people across three threads, really.
[00:01:10] Daniel Jones: One is agent coding, so using AI in the software development process to increase productivity. Another is helping people build things that leverage AI. So maybe your product has some ML features in it, or you are building agents. And then the third string of that is helping the non-software parts of the business.
[00:01:31] Daniel Jones: So accounts, marketing: how can we introduce agents or workflows that make them more productive? So looking at AI nativity, AI nativeness, across all of those different strands.
[00:01:42] Simon Maple: I love that, the AI nativity. We should definitely celebrate that once a year. So Daniel, you work with a lot of large companies and enterprises, talking about AI adoption and how those companies can best take on and roll out agent development across large numbers of teams.
[00:01:58] Simon Maple: Today we are going to be talking about some of the advantages that can bring, but really talking about some of the pitfalls that people very often fall into when trying to adopt, and some of the best practices that we can add in. So why do we not kick off by talking about people having good engineering practices to start off with and where the pain exists?
[00:02:16] Simon Maple: Whereby if someone does not have a good engineering practice and they use agent development, that pain could be exacerbated. So are good engineering practices needed for agent development?
[00:02:32] Daniel Jones: They're certainly beneficial.
[00:02:33] Daniel Jones: One of the things that the DORA report found in 2025 was that teams with high levels of development maturity went faster when they introduced agent coding. Teams with low levels of maturity went slower. And this makes perfect sense when you look at it through the lens of the theory of constraints.
[00:02:49] Daniel Jones: If you take one part of a system and speed it up massively, then you are going to create bottlenecks on either side. The interesting challenge there is that we do not know exactly where that tipping point is. So we are not sure how good you need to be in order for things to get better or how bad things need to be for it to be like throwing petrol on a dumpster fire.
[00:03:08] Simon Maple: And when we talk about how good, what does good look like? Is it the documentation that needs to exist? Is it just people following good workflow practices? What does "good" mean?
[00:03:19] Daniel Jones: So good can be measured in a number of dimensions, and there are quite a few.
[00:03:24] Daniel Jones: So we have got things like test coverage. We have got things like alignment on what good looks like for the coding standards. We have got things like the size of batches that they are working on, like how big and well detailed their stories are. We have got things like the path to production.
[00:03:43] Daniel Jones: Like, what is their lead time to production? Can they get changes out quickly? Because if coding agents can put out commit after commit minute after minute, but it takes three days to get anything into production, which is quite optimistic for some enterprises, then the tip of the branch is going to move on miles, and you are going to end up with loads of merge conflicts when something goes wrong in CI.
[00:04:00] Daniel Jones: So there is quite a lot that really would be considered just general good practice from the last ten or twenty years of software development that needs to be considered just because everything goes much faster now.
[00:04:14] Simon Maple: So having those good pipelines, those good workflows, it enables that speed so that you do not just have one massively sped-up component, which is the actual delivery, the actual build of the code.
[00:04:27] Simon Maple: There is no point having that speed if the rest of the pipeline is just going to take time. Tests, you mentioned tests. Why are tests so important in agentic development?
[00:04:38] Daniel Jones: So tests are important for one reason, but there is some nuance here, and maybe we can get into the ways that maybe some types of tests are less important or at least less important to humans.
[00:04:48] Daniel Jones: First and foremost, if you do not have good test coverage in your software, then how is an agent going to know that it has broken something? And really, one of the things that people who are new to AI agent coding need to realise and often kind of miss is that your agent needs to be able to perceive something.
[00:05:08] Daniel Jones: If it cannot perceive a problem, then it cannot possibly react to it. So whenever you are getting frustrated with a coding agent, you need to consider, could it have known better? Did it have the information to know that it was doing the wrong thing?
[00:05:19] Daniel Jones: And so if your agent keeps breaking your software, did it get fast feedback from a failing test suite?
[00:05:26] Daniel Jones: If you have got low test coverage, then it is quite likely that it could break a whole bunch of things and be absolutely none the wiser. And that is not really the agent's fault because you have not given it the information or the ability to perceive the fact that it is making a mistake.
[00:05:41] Simon Maple: And is it actually the test runs that inform the agent, "Yes, this is broken because the test failed," or is it actually the agent using the test cases as context almost to understand that this is expected behavior? This is kind of what we intend the application to do.
[00:05:59] Daniel Jones: That is a really good point, and it is both. Certainly the failing tests. If your agent is well-mannered enough to actually run the tests all the time, then it should figure out that it has done something wrong.
[00:06:12] Daniel Jones: But using the tests as documentation and guidance of what is expected is quite important. I think that this kind of brings me to what I mentioned earlier about nuance; maybe unit tests are not for humans anymore. Maybe those are too low level. If unit tests help the agent figure out whether it has broken something, great.
[00:06:32] Daniel Jones: Agents, go and write as many unit tests as you like. But acceptance tests, kind of outside-in acceptance tests that define what a user should be able to do with the system, those are important for your ultimate safety barrier of "this is what I need it to do." But also that guidance that you talked about of this expected behavior. This is what it needs to do. These are the features that we want to have. Regardless of whether they work or not, this is what it is supposed to do.
[00:07:00] Daniel Jones: And that kind of then bleeds into things like spec-driven development and where the source of truth is. Is it the tests or is it the spec? But these are all things that, going back to the previous point about the last decade or two of good software practice, include things like behavior-driven development.
[00:07:15] Daniel Jones: A good BDD test suite should have been describing what the software does at a kind of human-accessible level anyway. So if you have got that kind of thing in your code base, then agents will be able to work with it better.
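The outside-in style Daniel describes can be sketched end to end in a few lines of shell. The toy `todo` CLI below is invented purely so the example runs; in a real project the system under test already exists, and only the black-box test at the bottom is the pattern being shown.

```shell
# Self-contained sketch of an outside-in acceptance test. The toy "todo" CLI
# below is a stand-in for your real system; only the test half is the pattern.
cd "$(mktemp -d)"

cat > todo <<'EOF'
#!/bin/sh
# toy CLI: "todo add <item>" records an item, "todo list" prints them all
case "$1" in
  add)  echo "$2" >> items.txt ;;
  list) cat items.txt 2>/dev/null ;;
esac
EOF
chmod +x todo

# Acceptance criterion, stated the way a user would: "after adding an item,
# I can see it in the list". No internals are touched, only the CLI surface.
./todo add "buy milk"
if ./todo list | grep -q "buy milk"; then
  echo "PASS: added item is listed"
else
  echo "FAIL: added item is missing"
fi
```

Because the test only exercises the user-visible surface, an agent can rewrite the implementation freely and still be told immediately whether the behaviour users depend on survived.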
[00:07:27] Simon Maple: And it makes sense really because if we as developers are turning more into intent machines where we describe and define the intent that we expect, and it is actually the agent's responsibility to effectively generate that implementation that describes the behavior we ask for, the tests we care about should be to validate the intent that we are asking for.
[00:07:38] Simon Maple: Whereas it should be the agent's responsibility to create the unit tests, because it is creating the implementation. It would be weird for us not to care as much about the code but then to care about the unit tests for that code.
[00:07:59] Simon Maple: It is nice to keep the tests and the intent at the same level of abstraction. What about typical development hygiene, let us call it, so things like version control and those types of things? Is that as important with organisations doing agent development?
[00:08:21] Daniel Jones: Absolutely. Version control in particular is critical to making good use of agents because sooner or later your agent is going to do something that you do not want it to. It is going to commit a binary into Git, and then you are only going to notice four or five commits later, and you are going to want to remove that. So if folks are not able to do things like interactive rebasing, which, to be fair, can be a little bit fiddly, and I found a little bit mind-bending when I first learned how to do that, then you are going to end up with quite a messy code base that will mislead humans, and it will mislead agents.
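As a concrete sketch of the scenario Daniel describes, the script below builds a throwaway repo, commits a binary "by accident", and then drops that commit with a scripted interactive rebase. `GIT_SEQUENCE_EDITOR` with a GNU sed one-liner stands in for the editor session so it runs non-interactively; this is only safe on branches that have not been pushed or shared.

```shell
# Throwaway demo: an agent committed a binary a few commits back; drop that
# commit with a scripted interactive rebase. Only safe on unshared branches.
cd "$(mktemp -d)"
git init -q .
git config user.email demo@example.com
git config user.name demo

echo "v1" > app.txt && git add -A && git commit -qm "initial"
head -c 64 /dev/zero > agent.bin && git add -A && git commit -qm "oops: binary"
echo "v2" > app.txt && git add -A && git commit -qm "later work"
echo "v3" > app.txt && git add -A && git commit -qm "more later work"

# The rebase todo list for HEAD~3 starts with the offending commit;
# mark that first line "drop" instead of "pick" (GNU sed syntax).
GIT_SEQUENCE_EDITOR='sed -i "1s/^pick/drop/"' git rebase -i HEAD~3

git ls-files   # agent.bin is gone from the tree; app.txt survives at "v3"
```

The later commits replay cleanly here because they never touched the binary; if they had, you would resolve the conflicts as with any rebase.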
[00:08:50] Daniel Jones: Another thing to consider when it comes to version control is that commits and branches are cheap. They are next to free. They add very little overhead. So making lots of small commits, which again is something people would argue, or I would certainly argue, is good development practice anyway.
[00:09:06] Daniel Jones: Maybe making even smaller, more frequent commits is better because then you can throw away the work your agents have done.
[00:09:13] Simon Maple: Hmm.
[00:09:13] Daniel Jones: And when it comes to behavior change, a lot of adopting agent coding is not just knowing how the tools work and gaining new knowledge. It is unlearning old habits.
[00:09:25] Daniel Jones: So being happy to throw away some work that an agent has done because you have got plenty of commits that you can go back to.
[00:09:32] Daniel Jones: You are going to be quicker throwing away something that an agent did and then starting again than you would have done typing on the keyboard yourself like it was 2022 or something like that.
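A minimal illustration of the checkpoint-and-throw-away habit Daniel describes, in a scratch repo:

```shell
# Scratch-repo demo: commit a checkpoint before letting an agent loose,
# then discard the agent's work with a single command.
cd "$(mktemp -d)"
git init -q .
git config user.email demo@example.com
git config user.name demo

echo "stable version" > app.txt
git add -A && git commit -qm "checkpoint: known-good state"

# Simulate an agent rewriting the file in a way we decide we do not want:
echo "questionable agent rewrite" > app.txt

# Commits are cheap, so discarding the attempt is a one-liner:
git reset -q --hard HEAD
cat app.txt   # prints "stable version"
```

The same habit scales up: checkpoint before each agent task, and going back costs seconds rather than an afternoon of untangling.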
[00:09:43] Simon Maple: And agents are not precious about their code. They do not care about throwing that code away, right?
[00:09:47] Daniel Jones: I often say to people that you should imagine you are the worst software manager possible. Treat your agent a bit like a junior member of staff that you really hate and want to demoralise. Throw away its work. Get three of them to work on the same thing at the same time and only have one of them have their work used.
[00:10:06] Daniel Jones: Those patterns actually make you more productive. Mind you, if our AI overlords look back through our chat history in 20 years' time, we are all going to be in trouble.
[00:10:17] Simon Maple: It is always good just to add in the odd "I respect you" or "I very much appreciate you" because we are not always going to be at the top of the pecking order.
[00:10:23] Simon Maple: I am not sure.
[00:10:24] Daniel Jones: Indeed.
[00:10:25] Simon Maple: So version control. Let us continue talking a little bit about version control. Who should decide about the commit? Is this something that we just maybe use as a hook to an agent, where we say, "Look, after every reasonable amount of code change, whatever that is, I want you to make sure you commit into a Git repo," or something like that? Who decides?
[00:10:43] Daniel Jones: I think that is a key indicator of how psychologically comfortable people are on their journey to adopt agent coding. Certainly to start off, I think folks are much more comfortable making commits themselves. Ask the agent to go and do a load of stuff. I am going to read the code, I am going to validate it, and then I am going to commit it.
[00:11:04] Daniel Jones: As you build confidence and build competence with things like spec-driven development, it becomes more acceptable to people psychologically: "I am going to let the agent do the committing here." Anytime that we introduce a human having to take an action, we slow things down by several orders of magnitude.
[00:11:24] Daniel Jones: So if your agent is sitting there going, "I am done, do you want to commit this?" and all your attention is elsewhere, then it is making way less progress than if you let it do that.
[00:11:33] Simon Maple: Yeah.
[00:11:34] Daniel Jones: But then there need to be some kind of guardrails around that to make sure that it is not just steamrolling ahead and definitely not doing any force pushing.
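One way to automate checkpoint commits while keeping those guardrails is a tool hook. As a hedged sketch of the Claude Code hooks format (this is the schema as I understand it; verify against Anthropic's documentation before relying on it), something like the following in `.claude/settings.json` would create a local commit after each file edit and can never push:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "git add -A && git commit -qm 'agent checkpoint' || true"
          }
        ]
      }
    ]
  }
}
```

The `matcher` names the tools that trigger the hook, and the trailing `|| true` keeps a failed commit (for example, nothing staged) from interrupting the agent. Because the hook only ever commits locally, force pushes remain impossible by construction.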
[00:11:44] Simon Maple: Yes. It is funny actually because when I first started building with coding agents, I did manual commits exactly the same, and I was able to go so fast I realised, "Oh crap, I should have committed some of this a long time ago."
[00:12:02] Simon Maple: The one thing I do love now is that I always let my agents commit for me, as they feel like changes are being made. The nice thing about that is when you go back you can dissect commits, and you can go back to a certain point very easily because it actually commits pretty often.
[00:12:18] Simon Maple: And like you say, it is free. It is very cheap to do these commits, and it makes going back retrospectively much nicer. You mentioned safety a little bit. Obviously there is a YOLO mode for when we do want to increase the autonomy of coding agents, when we want them to just go away and do some stuff without constantly asking, like a child, "Can I do this? Can I do this?"
[00:12:41] Simon Maple: What do we need to think about from a safety point of view, particularly from an enterprise that is not going to run havoc on a machine?
[00:12:48] Daniel Jones: One of the points that provided much more friction than I realised was the use of containerisation and people's level of maturity with that. Having come from a cloud-native background where we were deploying things in Kubernetes and containerising everything, I kind of took it for granted that everybody would be familiar with things like Docker and Colima and alternatives.
[00:13:16] Daniel Jones: It turns out a lot of enterprise developers are not, and a lot of enterprises are not necessarily set up for people to be able to run things in containers on their development machines. If you do not, then you have got this situation where people can be running coding agents with dangerously skipped permissions natively on their host, and I am pretty sure those people are going to go to hell if they keep doing that.
[00:13:37] Daniel Jones: So running things inside containers is definitely the way to go because then you can let it off the reins, and you do not need to be there to constantly say, "Yes, you can do that; yes, you can run ls; yes, you can cat that file."
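A hedged sketch of what that confinement looks like in practice: only the project directory is mounted, so dangerously-skipped permissions can at worst damage the workspace. The image, resource limits, and agent command below are placeholder choices, not recommendations, and the command is printed rather than executed so the sketch runs even without Docker installed.

```shell
# Sketch: run a "YOLO mode" agent inside a container with only the project
# directory mounted. AGENT_CMD and the node:22 image are illustrative.
AGENT_CMD='claude --dangerously-skip-permissions'

DOCKER_CMD="docker run --rm -it \
  -v \$PWD:/workspace -w /workspace \
  --cap-drop ALL \
  --memory 4g --pids-limit 512 \
  node:22 \
  sh -c '$AGENT_CMD'"

# Printed for inspection; run it yourself once you are happy with the flags.
echo "$DOCKER_CMD"
```

The key properties are the narrow bind mount (nothing outside the repo is visible), dropped Linux capabilities, and basic resource limits so a runaway agent cannot exhaust the host.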
[00:13:56] Daniel Jones: I was optimistic about dev containers, an open standard that came out of Microsoft and that VS Code supports: you put some config in a directory, VS Code detects that the config is there, and then it relaunches itself inside a container.
[00:14:10] Simon Maple: Nice.
[00:14:10] Daniel Jones: That is all great, but the user experience and the config fit into a sweet spot of hallucination that maybe we are going to talk about in a bit.
[00:14:23] Daniel Jones: It is recent. It is not super popular, but it is popular enough that agents tend to hallucinate the config, which is super frustrating. Dev containers are a nice solution to that, but maybe not quite mature enough. I have certainly seen enterprise developers struggle with that and spend hours, if not weeks, fine-tuning their dev container config and not being able to move forward and make progress.
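For reference, a minimal working config is small; the point of frustration is usually agents hallucinating extra properties rather than the format itself being large. The image and feature below are illustrative choices (devcontainer.json is JSONC, so comments are allowed):

```jsonc
// .devcontainer/devcontainer.json
{
  "name": "agent-sandbox",
  // A stock base image; swap for whatever your stack needs.
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  // Features layer tooling onto the base image declaratively.
  "features": {
    "ghcr.io/devcontainers/features/node:1": {}
  },
  // Ports to expose back to the host, e.g. a dev server.
  "forwardPorts": [3000],
  // Runs once after the container is created.
  "postCreateCommand": "npm install"
}
```

Starting from a known-good minimal file like this, and adding one property at a time, avoids the hours-of-fine-tuning trap Daniel mentions.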
[00:14:50] Daniel Jones: That whole containerisation thing applies to MCP as well. I understand why it is this way, but it would be lovely if all MCP servers, like all the standard IO ones, were running in containers by default. We have had Docker for quite a long time. It would be nice if everything was just secure out of the box.
[00:15:06] Simon Maple: And I was chatting with Oleg Shula, actually, a former colleague of mine who worked as a developer advocate at Docker. I was chatting with him about Docker sandboxes, and they seem ideal for exactly this type of thing. I think enterprises are really going to want their developers to start developing in these.
[00:15:24] Simon Maple: When we talk about the setup cost, though, is it easy to develop in a sandbox compared to, obviously, being a developer who just wants to go to a terminal, kick off Claude, and use local resources and things like that? How much friction is there if we are doing this in a container?
[00:15:40] Daniel Jones: It is not massively problematic or massively burdensome in my opinion.
[00:15:45] Daniel Jones: So if you are in that environment where you are not given freedom and space to explore new things regularly and you are of the mindset that you learn one thing a year, you get sent on a course, your company pays for it, you get a certificate at the end, and then you have learned a thing, then it can be quite uncomfortable to be using new tools like containerisation or dev containers.
[00:16:10] Daniel Jones: You might be using a different terminal emulator if you are used to using a big graphical IDE, and then somebody says, "Spin something up in a terminal emulator," and you are going to need to authenticate with Claude Code.
[00:16:28] Daniel Jones: Then there are line breaks in the URL, and you cannot redirect back to the port on localhost because it is inside a container, and it is not exposing that port.
[00:16:45] Daniel Jones: Then you need to copy and paste tokens. Those little bits of friction definitely put people off.
[00:16:51] Daniel Jones: It is something that is worth considering, guiding people through those steps because you do not want people to bounce off before they have seen how productive these tools can be.
[00:17:01] Simon Maple: Yeah, absolutely. I think it is the dangerously skipped permissions, the YOLO flag of "I am just going to let Claude or whatever agent you are using go off and do whatever it wants." Those are the instances where you really need it in a container, because at least in other instances you still get that choice of saying, "Yes, I am happy for you to do this," or, "No, I do not want you to have access to that."
[00:17:23] Simon Maple: You mentioned spec-driven development a little bit. Providing the agent with the right intent and the right instructions is very important. We can do that in a number of ways, and we will maybe talk about skills and context in a little bit.
[00:17:44] Simon Maple: But when we actually talk about the task that is required, previously we would relate this to something like a user story. How important is the level of detail? Should the user story be verbose or concise, and what are the important things that we need to add?
[00:17:55] Daniel Jones: The story that you provide to your agent, the unit of work you give it, and the level of detail it contains are hugely important.
[00:18:06] Daniel Jones: Something that I think we are exposing as we get agents more and more into the mix is the amount of knowledge in the head that is required or the amount of knowledge in the head that we all take for granted when we are doing software development. Donald Norman, in The Psychology of Everyday Things, a book that was renamed The Design of Everyday Things, talked about knowledge in the head and knowledge in the world.
[00:18:28] Daniel Jones: Knowledge in the world is discoverable. It is written down somewhere. Having your coding standards on Confluence, where maybe an agent could get to it, would be knowledge in the world. Knowledge in the world would also be a well-written story with acceptance criteria so you could write a test from it.
[00:18:45] Daniel Jones: Knowledge in the head would be, "I spoke to my product manager last week, and I know how they want this to be implemented, but it is not written down anywhere." If you are missing that kind of information and there is lots of human context required, or you are used to having that and just going, "I will ask on Slack for a clarification," that is a really bad spot for agents to be in.
[00:19:03] Daniel Jones: They are not going to be able to produce results that you want. They are not going to be able to implement things without making stuff up or asking a whole bunch of questions. So having a decent level of specificity there is important. With one of the organisations that I have been working with, we asked them to connect their coding agents to the Atlassian MCP server and then start critiquing stories, which the product managers are maybe raising an eyebrow at. But it is putting back pressure on the product about how well specified these things are.
[00:19:39] Daniel Jones: That is not necessarily because product teams are doing a bad job. It is just that they are used to having a much more informal relationship. Something I have seen in quite a few organisations is that you speed up the software development part, and that puts pressure on product almost immediately.
[00:19:55] Daniel Jones: They are like, "Well, mate, we have not got enough stuff figured out." We cannot keep the agents well fed. We are going to need to massively speed up what we are doing here. And also we are just so used to it as humans, maybe not going into all the tedious detail we need to because if somebody does not understand, then they can come and ask us rather than meticulously writing out every single detail.
[00:20:17] Simon Maple: I want to switch a little bit now and talk about the tools people aim to use for a lot of these things. In the past it has been very much a discovery kind of thing, and developers have been playing with as many tools as they can just to try and get used to AI and switch their mindset into an "I can do this with AI" approach.
[00:20:40] Simon Maple: What do you see from an organisation or industry point of view in how companies are trying to adopt a range of tools or focusing on specific tools?
[00:20:53] Daniel Jones: 2025 was definitely the year of tech leaders saying, "I am going to let people use what they want, and I am encouraging them to try things out."
[00:21:02] Daniel Jones: We have got somebody over here using GitHub Copilot, somebody over there using Claude Code. I think part of that came from a lack of knowledge and comfort from tech leaders being like, "There is so much to keep track of. I do not know what the latest thing is, and I do not want to bet on any one horse because I am not sure."
[00:21:23] Daniel Jones: There was also a fear that if I say we are going to use this tool, not only might it not be the right one, but developers might not like it, and then they are going to push back because they have a lot of pride in their job and a sense of identity from it. So 2025 was a lot of soft-touch, organic, disorganised trialing of things.
[00:21:43] Daniel Jones: I think Anthropic did a great thing for their marketing by giving people free usage over Christmas. So lots of board members, CTOs, and VPs of engineering tried out Claude Code and went, "Wow, this really works." So 2026 has definitely seen a shift in people wanting to be much more structured. They have seen what is possible, and now they want to roll this out across the department, across the team, and across the organisation.
[00:22:04] Daniel Jones: We are seeing people try to standardise on tools, and I think there is still quite a lot of anxiety over what is the right tool to pick. From an enterprise and organisational level, a lot of it sadly is driven by licensing and the logistics of how we pay for these things.
[00:22:30] Daniel Jones: There is some speculation at the moment as to whether Claude Code subscriptions are going to get revoked for enterprise usage. I have seen some chatter about this. But if you have got a thousand developers and each of them needs to expense an Anthropic subscription every month, that is a huge amount of overhead as opposed to having an enterprise agreement.
[00:22:50] Daniel Jones: So we see quite a few people opting for GitHub Copilot just because it is part of their GitHub Enterprise offering. Whether that is the right choice for every organisation, though, is an entirely different matter.
[00:23:03] Simon Maple: And is the worst decision indecision here?
[00:23:07] Daniel Jones: I think there is a strong case for that. Often when people are faced with transformative change, they jump in with both feet and no plan.
[00:23:18] Daniel Jones: We saw a lot of that in 2024. Companies were saying, "We need to do some AI." What do we need to do? "I do not know. Do some AI. Here is some budget." We definitely saw that happening at large financial institutions, and lo and behold, they did not get any return on investment. Who would have seen that one coming?
[00:23:35] Daniel Jones: Then there is the alternative of saying we are going to wait until this is settled science and we are not going to waste money getting up to speed on this. The speed of change continues to increase as time goes on, so holding off is not a viable solution. Having some amount of budget, some amount of time, and some amount of slack in the system to explore and trial things out is the most sensible strategy.
[00:24:01] Daniel Jones: Otherwise there is a decent chance that you will get outpaced by a startup that is embracing these things. I can think of some SaaS providers I have worked with who have had no competitors for twenty years. In the last six months two have popped up replicating their feature set entirely.
[00:24:20] Daniel Jones: These are brand new startups who presumably are using agentic coding to replicate their software very quickly. There is a chance that you will fall behind if you do not stay on top of this stuff. Then if we do get to a point where it is settled science, do you want to be starting from a standstill?
[00:24:40] Daniel Jones: There are things like managing context windows that, as far as we can see, are going to be an underpinning constraint of all agent software development going forward, assuming models continue working the same way they do and the architecture does not change radically. Even if we end up with software factories that build upon these things, you still need to understand the fundamentals underneath.
[00:25:03] Simon Maple: That is a good segue. Let us talk about tips. I would love to talk about tips you can give developers directly, and then we will maybe talk about tips for rollout across an organisation. So tips for developers. You talked about context. Let us go deeper into that. Context is everything you provide to the agent about the task you want it to do.
[00:25:22] Simon Maple: Context is really important. Whether that is skills or instructions about how your organisation wants to develop software or those types of things, how much is the right amount of context?
[00:25:35] Daniel Jones: Looking at academic research, the NoLiMa paper certainly suggested that above thirty thousand tokens of context, reasoning ability drops off by about fifteen percent.
[00:25:46] Daniel Jones: There is another paper, the name of which escapes me, that repeated the experiment in 2025 for GPT-4.1-era models and found that, again, reasoning capability drops off between 30,000 and 60,000 tokens. That is not very much. And things change all the time; Claude Code has three releases a day.
[00:26:07] Daniel Jones: But there have certainly been points where I have opened up Claude Code with no MCP servers installed and no CLAUDE.md, and it is already using something like 40,000 tokens just for its system prompt.
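There is no universal way to see these numbers from outside a given tool, but a crude budget check helps when deciding what to put in an instructions file. A common rule of thumb, an approximation rather than a real tokenizer, is roughly four characters per token for English text:

```shell
# Rough context budgeting: ~4 characters per token is a common heuristic for
# English text. It is an approximation, not a real tokenizer.
approx_tokens() {
  chars=$(wc -c < "$1")
  echo "$1: ~$((chars / 4)) tokens"
}

# Demo with a stand-in instructions file of exactly 4,000 characters:
printf '%.0sx' $(seq 1 4000) > demo-instructions.md
approx_tokens demo-instructions.md   # prints "demo-instructions.md: ~1000 tokens"
```

Running that over your instructions files and MCP tool descriptions gives a quick sense of how much of a 30,000-to-60,000-token effective budget you are spending before the conversation even starts.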
[00:26:18] Daniel Jones: So it does not take much to tip your model into the point where it is going to get confused. One of the challenges with this is that when I was starting to develop agent coding training for people, I was on a video games website over Christmas, and gamers tend to hate generative AI.
[00:26:38] Daniel Jones: They think it is going to be the ruination of all things. I was going to try and educate people about why it is not all bad, and I mentioned in the comments on some website that I was doing agent coding training. Somebody posted a very snarky response like, why do you need to teach people how to use coding agents?
[00:26:56] Daniel Jones: Do they not just work? I thought that was actually quite a good point. Because these things just work, it is not until you have problems that you need to understand what is going on underneath. Context windows and context management are a good example of that. If you just use a coding agent and everything is working fine, you do not need to worry about context.
[00:27:16] Daniel Jones: Then your conversation goes on a bit, and it starts to fall apart, and you get more frustrated, but it is not obvious that it is because your context window, the amount of context that you are using, has gone over the effective limit or that you have distracted it by talking about too many things in one chat.
[00:27:36] Daniel Jones: That is something that quite clever engineers who are new to this do not immediately notice.
[00:27:44] Simon Maple: Yeah, and I always say a developer knows when they are in the Goldilocks zone of context. They are one-shotting prompts because they are giving enough context to get an acceptable answer.
[00:27:56] Simon Maple: They are not giving too much context, so the agent gets confused. It is just enough. How should a developer keep just enough context? Is it something they should proactively monitor or something they should react to if the agent is going astray?
[00:28:12] Daniel Jones: I was certainly of the opinion that people should be proactively monitoring this. Also workflow habits are important. Whenever you are at a clean break, clear context.
[00:28:21] Daniel Jones: Whenever you need to ask an off-topic question, and my mind does not stay on one train of thought for very long, so I am normally thinking about three things at once, use a different chat. Use a different app.
[00:28:41] Daniel Jones: I use Warp, an agentic terminal. If I need to ask something related to the Linux command line, I will do it in Warp. If I am doing something in Claude Code and trying to build a feature with Claude, I try to keep that on topic. If I need to do research, I will go to a web browser and use a chat app like ChatGPT or Gemini. Splitting up conversations has the advantage of not distracting the model with too many concepts in one chat and not extending the amount of context that you are using.
[00:29:02] Daniel Jones: So clearing your context regularly and not going off topic are valuable things to do. The other thing that may be slightly different is not adding all of the MCP servers.
[00:29:16] Simon Maple: Yeah.
[00:29:17] Daniel Jones: And not adding all of the instructions for every possible eventuality of how to write software into your agent's guidance, your AGENTS.md.
[00:29:25] Daniel Jones: There was a paper that came out of ETH Zurich a couple of weeks ago that showed that adding stuff to your AGENTS.md can reduce the quality of outcomes by about 20 percent compared to having no instructions whatsoever. It is interesting looking at things that have been fashionable over the months as people learn about this.
[00:29:46] Daniel Jones: I remember around September people talking about putting everything in AGENTS.md. Every time it makes a mistake, add something else to AGENTS.md. What you are doing there is adding something that is largely irrelevant most of the time in the hope that it is relevant this time.
[00:30:01] Daniel Jones: That is not a great thing to do. Then it became "I am going to add all the MCP servers for every eventuality," and that causes its own problems. A lot of people do not realise how tool calls work and the fact that the full menu of tools needs to be provided to the model on every single prompt just in case it wants to use one of them.
[00:30:20] Daniel Jones: So skills and progressive disclosure are a much better solution to this. Instead of sending the model all the details about a skill, there is a little summary, and the model can ask for more information if it wants to.
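The difference Daniel describes, sending every tool definition on every prompt versus disclosing skills progressively, can be sketched roughly like this. This is a minimal illustration with entirely hypothetical skill names and content, not any agent's real API: only a one-line summary per skill travels with each prompt, and the full body is fetched on demand.

```python
# Hypothetical skills catalogue. In progressive disclosure, only the
# short summaries are injected into every prompt; the full body is
# loaded only when the model asks for that specific skill.
SKILLS = {
    "code-review": {
        "summary": "Review a diff against the team's style guide.",
        "body": "(full instructions would live here)",
    },
    "db-migration": {
        "summary": "Write a reversible schema migration.",
        "body": "(full instructions would live here)",
    },
}

def skills_index() -> str:
    """Cheap one-line-per-skill index, sent with every prompt."""
    return "\n".join(f"- {name}: {s['summary']}" for name, s in SKILLS.items())

def load_skill(name: str) -> str:
    """Expensive full body, fetched only when the model requests it."""
    return SKILLS[name]["body"]
```

The per-prompt cost is then proportional to the number of summaries, not to the total size of every skill's instructions, which is the point Daniel is making.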
[00:30:33] Simon Maple: Let us talk about that because I think that is really important.
[00:30:35] Simon Maple: There are a couple of things here. One is the behavior under the covers, and the second is more about activation, which is a global problem today but something we expect to get better. You talked about adding things into the agent MD. Anything you add into the Agent MD file the agent will fully read. It does not necessarily mean it will act upon it, but it will fully read it.
[00:31:12] Simon Maple: When you add an MCP, there is a description there, and it chooses when to activate it. Similar to a skill, there is a description there, and it chooses when to use that. These are more on-demand-style approaches.
[00:31:30] Simon Maple: So the overhead there is very real. If you have hundreds of MCP servers or hundreds of skills, it is going to get confused, and it is probably not going to pick the skills you want it to. Do you have experience or advice in terms of not necessarily what the right number of skills or MCP servers are, because it is very subjective, but what advice would you give to someone who is thinking about adding yet another skill or yet another MCP to their environment?
[00:32:01] Daniel Jones: I think add them judiciously. Maybe have loads that you rely on but do not have them all added and registered at the same time. Certainly for MCP servers, do not have them all added and registered at the same time. If you do need them registered at the same time, then maybe selectively filter out their tools. If your agent does not support that natively, then maybe you could use an MCP gateway for that.
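The selective filtering Daniel suggests can be sketched as a simple allow-list applied to a server's advertised tool list. This is a hypothetical illustration, not any real MCP gateway's API, and the tool names are invented:

```python
# Every registered tool definition is sent to the model with every
# prompt, so exposing only the tools you need keeps the context lean.
# Tools as advertised by a hypothetical MCP server:
ALL_TOOLS = [
    {"name": "list_issues", "description": "List open issues"},
    {"name": "create_issue", "description": "Create an issue"},
    {"name": "delete_repo", "description": "Delete a repository"},
]

# Only the tools this project actually needs:
ALLOWLIST = {"list_issues", "create_issue"}

def exposed_tools(tools, allowlist):
    """Return only allow-listed tools; this is all the model ever sees."""
    return [t for t in tools if t["name"] in allowlist]

visible = exposed_tools(ALL_TOOLS, ALLOWLIST)
```

An allow-list also doubles as a safety measure: a tool the model never sees, like the destructive one above, is a tool it can never call.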
[00:32:21] Daniel Jones: For skills, I think there is slightly less of a problem with that because of the progressive disclosure. But you might end up with multiple skills that have very similar descriptions, and then the model is not sure which one it wants to ask for more information on.
[00:32:37] Daniel Jones: That could be a problem. I know, having spoken to Amy Heineke from Tessl, who was talking about this and some of the challenges that you folks face in making sure that the things you are building get selected by the model, it is a thorny problem.
[00:32:53] Simon Maple: Activation generally is a thorny problem in terms of making sure that the agent chooses the right skill or the right piece of context at the right time.
[00:33:02] Simon Maple: It is definitely a problem that exists today. I think it will be less of a problem as time goes on and as agents get more used to using skills. It will be interesting to see how that turns out.
[00:33:02] Simon Maple: When we think about a developer who is part of a team building one part of a much broader project, and about context management and sharing that developer's environment with the rest of the team, what is the best way to replicate that context?
[00:33:31] Simon Maple: Is the context management from one developer to another something that should be global and stay with the repository, or something that should be subjective to the developer and how they want to work?
[00:33:54] Daniel Jones: That is an interesting point, and one of the places where team dynamics, and the conflict avoidance that large teams sometimes allow to proliferate, can create problems. If you have lots of developers with different ways of doing things, then maybe in the old world before agent coding that was acceptable. You knew that Rachel over there has her way of writing code and Steve over there has his way.
[00:34:23] Daniel Jones: But if you have an agent going over that code base and it is being confused by two different code styles, then that becomes more of a problem. Generally I would say that having things as global as possible and consistent is going to lead to better results.
[00:34:43] Daniel Jones: As we move away from individual developers having their own idiosyncratic styles that they like and defer more of the implementation to agents, does it matter if the code is written the way that you like it versus the way that I like it? As long as the agent is able to deliver useful outcomes and features more quickly.
[00:35:01] Daniel Jones: So trying to standardise and make sure that these things are available to agents when they need them is important. Whether that is global, and maybe you put that in Confluence or a wiki and give your agent an MCP server so it can access that.
[00:35:24] Daniel Jones: The problem with that is drift when things change. You are probably better off having that guidance in your code repository so it is pinned to a particular version. As it evolves, it is clear. Wikis are often where documentation goes to die.
[00:35:34] Simon Maple: Yeah.
[00:35:35] Daniel Jones: If you want stale and out-of-date information, that is a good place to put it. So probably in the repository would be the main place to do that.
[00:35:44] Daniel Jones: When you get people having context and guidance that they are keeping local to themselves, I can imagine that leading to problems. If you have conflicting information and then one agent is doing things one way on one person's machine and another agent is doing things another way on someone else's machine, those are the kinds of patterns that throughout computing we have generally recognised will lead to bad outcomes.
[00:36:05] Daniel Jones: We want things done consistently and universally across the team or at least throughout a code base.
[00:36:11] Simon Maple: Probably worth mentioning Tessl here as well actually because Tessl is that kind of package manager for skills and context. It does a lot of what you mentioned there, like the versioning of skills and context.
[00:36:23] Simon Maple: The ability to almost have a manifest file within the repository that then pulls in the right context as pinned in the manifest. But what you mentioned is really interesting because for things like coding styles, I have never heard an agent argue whether it should use spaces or tabs. There are things that are still important, though.
[00:36:42] Simon Maple: The stack that we use, the methodologies we develop with, such as test-driven development, or how we choose to develop. That is going to be more important than the cosmetics of the code. You mentioned hallucinations a little bit before as well.
[00:37:02] Simon Maple: How do you see agents coding incorrectly, and how do we use context to maintain accuracy?
[00:37:14] Daniel Jones: The hallucinations thing is something that a lot of people do not necessarily understand in terms of what leads to that scenario. In the training that we provide, we get people to do an exercise with a really old model, GPT 3.1, where we deliberately mislead it and make it hallucinate things.
[00:37:32] Daniel Jones: It is really interesting doing that. If you ask about Star Wars Episode 10, the model says no, that is not a real film. So when you are talking about big, well-known concepts, then it can detect lies quite well. Like Phantom Menace.
[00:37:49] Simon Maple: Not a real film? It does not exist, does it?
[00:37:51] Daniel Jones: I would probably say more about seven, eight, and nine. Let us not go there. Still scarred. If I were to ask GPT 3.1 to tell me about the London thrash metal band Simon and the Maples, it would probably say there is no such band. If I asked that same model to tell me about the thrash metal hardcore band Simon and the Maples, who were active in Camden between 1997 and 1999, then it is much more likely to start producing a band biography.
[00:38:20] Daniel Jones: We did this exercise, and about fifty percent of the time, moving from a capital city to a specific area and putting a date range in caused the model to start inventing things. Where previously the model would have said there is no such thing. It then started producing band biographies, talking about the breakup of the band and who fell out with whom.
[00:38:42] Daniel Jones: That is an example of how if you are adjacent to large concepts but very specific, you are in the sweet spot of hallucination. It is also more likely if it is something the model has been trained on.
[00:38:56] Simon Maple: Yeah.
[00:38:57] Daniel Jones: I had the exact experience that the Tessl registry was made to solve.
[00:39:02] Daniel Jones: I had one of the developers I was training say they were using the latest version of a library, and the model was hallucinating. It knew it was a new version, so it was not giving old API endpoints, but it was hallucinating new ones that did not exist.
[00:39:19] Daniel Jones: I said I have something for you because somebody has solved that problem.
[00:39:23] Simon Maple: But it's very interesting because typically those types of challenges or problems, the specifics that are added by the user or by the agent in terms of "I need to do these very specific things," are very detailed.
[00:39:35] Simon Maple: So it's actually ripe for those types of hallucinations.
[00:39:38] Daniel Jones: Exactly. And that's one of the reasons why people get misled by one-shot examples of "Oh, make me a to-do list app," or "Make me a Flappy Bird clone," and "Oh wow, this is really great." But real work isn't like that. I've got a very specific internal library that you've never seen before, that you've never been trained on, and I need you to get these specific details.
[00:39:57] Daniel Jones: Right. So that element of detail, and this is why I think the band example is quite illustrative because when you start narrowing down to really specific things, that seems to be when it's more likely to hallucinate.
[00:40:10] Simon Maple: Yeah. Because we mostly toured around the Chelsea area anyway, so it's probably very easy to do.
[00:40:15] Simon Maple: So we have talked a little bit about agents and very little about LLMs. And actually when we think about the hallucinations and things like that, it's coming from the LLM. Yeah. So when we think about the roles of each and how we select and choose the tools, what should we be mindful of when thinking about that?
[00:40:35] Daniel Jones: And I think that's a really good point. A lot of people are new to the technology, and I have probably been conflating the two because you interface with the agent, and then really it's the model making decisions about which tool to use. The agent itself is not very smart.
[00:40:51] Daniel Jones: It doesn't have very much logic in it. It can't do much. It's just a loop running around, and the model is making decisions. When folks are new to agent code and get into this, they don't necessarily understand what the difference is between the agent's responsibilities and the model's. So when we are doing training after we have done an exercise with people, we go through on a whiteboard, like, okay, what did we just do?
[00:41:14] Daniel Jones: And then draw up all of the components and where they live, like where the tool definitions are, how they get to the model, and all that kind of stuff, which helps cement it in people's minds. Folks that are coming into this are new, and there are a lot of them. In the last couple of weeks, so many senior engineers have reached out to me directly and been like, "I feel like I'm falling behind."
[00:41:36] Daniel Jones: Can you help me get up to speed with this? And they get up to speed really quickly, and they normally jump in with Claude Code or something, a terminal-based agent. They get used to using it and get it productive but still without knowing what is under the hood. And so folks end up worrying about which is the best agent to use.
[00:41:56] Daniel Jones: There are functional differences in terms of how large each of the system prompts is and how much effective context window that leaves you with, but really it's more like personality. It's more like different brands of cola. They are all full of sugar and fizzy; they just have slightly different spins on them.
[00:42:16] Daniel Jones: And really also the models, like if I use the latest GPT and if I use Opus 4.6, they are probably going to be roughly equivalent in terms of what they can do. If I switch from Opus to Haiku, then I am going to notice differences. But really the different foundation frontier models are all roughly capable and equivalent.
[00:42:39] Daniel Jones: People see the brand names, and they see the tools, and they are like, "Oh, well, Codex is going to be much better than Claude Code." What I think is more important is ergonomics. Like, do they have the features that you want that allow you to work in the way that you want to without having to sit there and babysit the agent?
[00:42:59] Daniel Jones: Do they support skills? Do they have the UX that you want? Productivity is more important than correctness. Correctness is something you do not have much control over. Either the model is trained well, or it is not.
[00:43:27] Daniel Jones: The productivity that you get from having the features that you want and the user experience that you want, combined with the context management, that is a much bigger determinant of whether you are going to get useful outcomes rather than which model you are using, because they are all pretty close. The frontier labs are trying very hard to keep up with each other.
[00:43:34] Simon Maple: And it could also be around the way you as a developer ask your questions or provide your requirements, because the models are going to pick things up differently.
[00:43:43] Simon Maple: The models are going to have different ways of activating the skills using the context. And so it may be, and perhaps this is why there are so many arguments as to who gets the most out of which models and agents, because people are asking things in a different way.
[00:43:58] Simon Maple: So a lot of it could just be which agent actually works best with the way you like to work, the way you ask for things.
[00:44:06] Daniel Jones: Yeah. One of the ways that agent coding is problematic here is the non-determinism. It is a breeding ground for superstition. If you look back through the annals of human-computer interaction and usability engineering, there are plenty of examples of what happens when people get intermittent, incoherent results.
[00:44:24] Daniel Jones: They start to come up with superstitions like, "Oh, it only works when I click on this thing first before doing the doodah." So folks can start getting all these superstitions about which prompts work and which ones do not, which is why the kind of work that Amy was talking to me about, about running evals on things, is so important to try and get some science into this.
[00:44:53] Daniel Jones: From my own experience and working with the hundreds of developers that I have, the whole prompt engineering does not seem to be that important. It is more about context management and, as you say, how you communicate requirements. The whole idea of how you phrase the question or if you give it a persona does not seem to make too much of a difference, even if it is something that is quite immediately engaging to humans.
[00:45:19] Daniel Jones: It is more about framing the context, making sure the right information is there, that the agent has the ability to perceive when it is doing the wrong thing, and whether your requirements are coherent or not.
[00:45:37] Simon Maple: So let us talk about the data, because I think that is an important piece. Very often so much of this world is anecdotal in terms of, "Oh yeah, I think this skill works most of the time; let us leave it like that." Or essentially going from the gut feel of my experience. How do we bring data into that, maybe through skills or context as well?
[00:45:55] Daniel Jones: Well, I gather that you folks have a-
[00:45:57] Simon Maple: It was a loaded question.
[00:45:58] Daniel Jones: A skills package manager where you are rating skills both on how well written they are and how effective they are, which I think is a hugely valuable thing. Yesterday I was doing something in Golang. I say I; really, Claude Code was doing something in Golang for me. I installed the Go language server plugin, set it running, and I am like, is it even using this thing?
[00:46:20] Daniel Jones: I am not paying attention. I am doing other stuff whilst it is running away. Is this making anything better? Having more scientific data on that, I think, is incredibly valuable. Maybe going out to the far future when we end up with software factories, which is a separate topic of conversation where humans are involved much less.
[00:46:45] Daniel Jones: I am excited by that idea because being able to run requirements through a software factory where you have got multiple agents collaborating and outputting finished software for a given set of requirements, if all of that is under version control and has full observability, then you can run methodological experiments.
[00:47:05] Daniel Jones: In a way that you cannot do with a human team because you cannot get somebody to pretend they did not just implement that story and then try doing it a different way. Whereas with software factories we might be able to do that and slowly iterate and tweak prompts and tweak skills and which skills we are using and how those skills are written.
[00:47:23] Daniel Jones: To be able to automatically start getting ourselves to maximum productivity of that software factory. But until we are there, there is so little visibility for the end user into what they are doing, when they are activating skills, why, and how well those are working. I think it is something where the magic of agentic coding tools works against the user.
[00:47:46] Daniel Jones: It is great when it just works, but actually we also need to be able to see the dials and the readouts to figure out what is going on.
[00:47:54] Simon Maple: Totally. And I think there are a number of things that I find very important here. We talked about activation, and a lot of the time with activation, it is the way we write the skill.
[00:48:03] Simon Maple: For example, Anthropic delivered a whole bunch of best practices for how to write skills, which were added to a blog post. Actually, our Tessl review skill essentially uses LLM-as-a-judge to work out whether those best practices have been adhered to. I think from the activation perspective, realistically the only way to look back in hindsight and say, "How good is my skill from an activation point of view?" is to look through the logs.
[00:48:31] Simon Maple: Almost like saying, "How many times should this skill have been activated, and how many times was it activated?" There is probably an amount of work there that can look through that and provide that data. That is definitely one thing that we are looking at here at Tessl. The other thing I think is what we call task evals within Tessl, which is essentially to run this with the skill, run it without the skill, and work out actually if you do use this skill, what benefit it provides you.
[00:48:58] Simon Maple: And for me, for authors of skills and context as opposed to the consumers, it shows you with data where your skill can be improved. Whether it is an 80% or 85% improvement with a skill is useful information, but it is a kind of fuzzy feel-good number. For me the value here is for the author to go back and say, "I can see what scenarios are failing."
[00:49:26] Simon Maple: I can see what scenarios my context does not provide much value for because it was getting it right beforehand. And going back to the point of overwhelming an agent with context, if an agent does well out of the box already, does it need that context to actually just tell it to do the same thing the same way?
[00:49:46] Simon Maple: It provides you that data to enhance and iterate, to optimise the skill that you are creating.
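The with-skill versus without-skill comparison Simon describes can be harnessed in something as simple as the sketch below. Here `run_agent` is a stand-in for invoking a real coding agent on a scenario and checking the result, and the pass probabilities are invented purely for illustration; a real eval would run the agent and score its output.

```python
import random

def run_agent(scenario: str, with_skill: bool) -> bool:
    """Stand-in for one agent run; True means the scenario's checks passed.
    The pass probabilities are invented purely for illustration."""
    return random.random() < (0.85 if with_skill else 0.60)

def pass_rate(scenarios, with_skill: bool, runs: int = 20) -> float:
    """Run every scenario several times and return the fraction that passed."""
    results = [run_agent(s, with_skill) for s in scenarios for _ in range(runs)]
    return sum(results) / len(results)

scenarios = ["add pagination", "fix flaky test", "rename module"]
baseline = pass_rate(scenarios, with_skill=False)
treated = pass_rate(scenarios, with_skill=True)
print(f"baseline={baseline:.2f} with_skill={treated:.2f} lift={treated - baseline:+.2f}")
```

Beyond the headline lift, keeping the per-scenario results lets the skill author see exactly which scenarios the skill fails to help with, which is the iteration loop described above.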
[00:49:54] Daniel Jones: Yeah. And this is somewhere where I think, from an enterprise context, we are going to see something new emerge. You know, in the last 10 years, platform as a product and platform engineering providing a deployment substrate to your engineers were hugely important.
[00:50:08] Daniel Jones: And people looked at developer experience and that kind of stuff. But the kind of experiment running that you were describing there, that is something that the average enterprise developer does not have the time to do. Like, I am going to do this story twice, or maybe the inclination is like, I will run this once in one work tree with a skill and once in another work tree without a skill, and then I will look and evaluate that. They normally have too much feature pressure.
[00:50:29] Daniel Jones: So I can imagine a world in which, and in fact I see this happening in some organisations, where there is this kind of like, it is not quite a platform team, but like a DevX AI abstraction team where they are building. They are working out which are the right tools to use.
[00:50:48] Daniel Jones: Earlier you mentioned the challenges of an enterprise managing which skills are in use and that whole package management thing that I know you folks are looking at. I have absolutely seen CTOs talking about that and asking the question very directly of like, how the hell do you manage all of these skills?
[00:51:05] Daniel Jones: You know, it is just scripts floating around, moving between people's machines. We need this to be auditable, we need it to be versioned, and we need to see what is running where, because we are regulated. So I can imagine that as these tools start to get more uptake, we are going to end up with teams of people who are trying to provide a sensible base level of abstraction for all the other developers to stand upon.
[00:51:31] Daniel Jones: And to work out which is the right skill to use.
[00:51:34] Simon Maple: And do you think that is like a new role? It feels like an extension of the platform team, but perhaps not something that the platform team has existing skills around. Do you feel like it will be a role that maybe lives in the platform team?
[00:51:46] Simon Maple: Or do you see that as a completely separate team?
[00:51:48] Daniel Jones: I think it would be nice if platform teams did that because really platform teams have tended to evolve out of infrastructure teams, right? But then typically platform teams also have this injection of a lot of platform engineering thinking. Infrastructure teams were very ticket-oriented, especially in the enterprise.
[00:52:04] Daniel Jones: Like a ServiceNow ticket comes in, I do a thing, it goes out. They are not building something for the future that gets better every day. And then you have this injection of product management and software engineering. It is like, no, we are going to make durable solutions for this rather than doing toil where a ticket comes in and a ticket goes out.
[00:52:19] Daniel Jones: We are going to make a bit of automation to make sure that we never have to do that ticket again. So we have already seen the evolution from infrastructure teams to platform teams and them introducing new skill sets. I think it would be nice to see the identity of a platform team evolving into how we can support developers with immediate future things running locally in terms of agents and the skills that they are using.
[00:52:46] Daniel Jones: Maybe the platform team becomes the software factory team in the longer term.
[00:52:51] Simon Maple: And when we think about rollout for a larger organisation, do you see that rollout coming from the platform team, as they essentially are the enablers, right? Whether it is the developer experience or the platform team, all of these groups are enablers for larger enterprise development teams.
[00:53:06] Simon Maple: Do you see that rollout spawning from that team?
[00:53:11] Daniel Jones: Yes. In the sense that a customer that we are talking to at the moment has got 12,300 developers. So rolling out a consistent, sensible, auditable, regulation-safe set of agent coding practices is going to need coordination, and that is going to need to come from somewhere central.
[00:53:33] Daniel Jones: If you have got people making up their own rules and ideas and pulling in skills and plugins willy-nilly across the organisation, that is going to lead to auditing chaos. So that will need to be coming from a central place. And I am thinking of a particular customer here where there are a lot of people that are interested in the path to production.
[00:53:53] Daniel Jones: Platforms were generally interested in tightening feedback loops and value streams. We have not talked about value stream mapping at all, but that is a hugely important part when you are trying to adopt agent coding practices. Working out how your software delivery function works as a system and what is going to happen if you speed up one particular part of it.
[00:54:16] Simon Maple: Right.
[00:54:17] Daniel Jones: But the kind of people that were interested in things like DevOps and platforms were always interested in productivity, getting ideas into production as quickly as possible so then the product folks can learn, was that the right thing to build?
[00:54:30] Daniel Jones: So it is a natural progression for them to be thinking about how can we speed up other aspects of software delivery.
[00:54:38] Simon Maple: What three tips would you give a developer who wants to be successful with agentic coding?
[00:54:43] Daniel Jones: One. Get stuck in, start using something. Two. Context management. Make sure you understand it.
[00:54:49] Daniel Jones: And three. Perception. Make sure that your agent can perceive what is going on so we can fix things. With getting started, just pick it. They are all much of a muchness. They are all fairly similar, so just get started with one. Do not put this off until you think it is going to be settled science, because everything is moving faster and continuing to move forward.
[00:55:10] Daniel Jones: With context management, it is not obvious what is going on under the hood. Clear your chats regularly. Make sure that you stay on topic. Do not give the model an excuse to be confused about things. Do not throw everything and the kitchen sink into your AGENTS.md, because it is just going to make things worse. On the topic of perception, if your agent lets you down or frustrates you by breaking things in CI, it needs to be able to see what is going on in CI.
[00:55:31] Daniel Jones: So add an MCP server so it can check what is going on in the latest build. Otherwise, you cannot blame it for when it breaks things.
[00:55:38] Simon Maple: What three tips would you give an organisation that wants to roll out agentic software development across their company?
[00:55:46] Daniel Jones: One, make sure that you are using a mature team who are enthusiastic and keen to adopt change.
[00:55:52] Daniel Jones: Two, they need to be well aligned to have a documented definition of what good looks like. And also three, they need to have good product management behind them where things are specific and well thought out. If you start off with a team that is already enthusiastic and performing well, they are more likely to go faster, and you have better results that you can then repeat throughout the organisation.
[00:56:13] Daniel Jones: And you have got the social proof of when people are looking at their peers of what has worked well, and they can be impressed by those results. Folks need to be aligned around what good looks like. If you have different ideas of what good code is, there is no way an agent is going to be able to consistently come up with results that satisfy everybody. That needs to be documented.
[00:56:31] Daniel Jones: Everybody needs to be on board and on the same page about what good engineering is. The product management side is something that people often forget. Things need to be well specified so that agents can pick them up. As well as being well specified, there needs to be a decent backlog of work and a good understanding that underpins that of what it is that needs building.
[00:56:48] Daniel Jones: Because as soon as you get your developers going faster, that backlog of stories is going to run dry very quickly. And I have seen that happening in quite a few organisations.
[00:56:56] Simon Maple: Yeah. Amazing. Awesome. And as a final question, let us look a little bit forward. And you kind of, I think you have already alluded to some of this during the session.
[00:57:04] Simon Maple: But looking forward, how do you see the future of agent development changing, maybe the way we work as individual developers, but also as a workforce, as a company, and as an organisation?
[00:57:17] Daniel Jones: The potential organisational change could be massive. There are incremental steps towards this, where engineers are starting to encroach more on product turf by doing spec-driven development.
[00:57:29] Daniel Jones: You have got product people starting to encroach on the engineering space with the idea of prototype as requirements. And those are all incremental. But I think the big potential change here is software factories. You are already hearing about hive mind startups where it is too slow to write things down because factories of multiple agents collaborating can deliver software so quickly that writing something down just slows the whole system down.
[00:57:52] Daniel Jones: So when we end up in that world, we are having conversations with customers about this, like what does the world look like where you have got one product engineer per value stream? You do not have a two-pizza team anymore. So many of the things about good Agile software development might go out the window there.
[00:58:07] Daniel Jones: If we can end up with software factories, dark software factories where no one is reading code and no one is writing code, that reliably produce the right thing, then the future of the industry is going to look very different. But I really hope we are going to end up with much more bespoke, usable software and an abundance of it, a proliferation of unique solutions for individual people and teams, where they do not have to wait 18 months for a SaaS provider.
[00:58:32] Daniel Jones: They are going; can you please implement this feature? Somebody can go and build it for you on a small scale almost immediately. And that, I think, is cause for optimism.
[00:58:41] Simon Maple: Daniel, it has been absolutely wonderful and very insightful. Hearing some of the stories, hearing some of the advice, and particularly hearing the tips for people to really get going as a developer or as someone who wants to roll out agent development more within their organisation.
[00:58:57] Simon Maple: Super appreciate all your advice and all your thoughts, and it has been great having you on the episode.
[00:59:01] Daniel Jones: Cool. Thank you very much. It has been a pleasure being here.
[00:59:03] Simon Maple: Thank you very much, Daniel, and thank you very much for listening. Tune into the next episode soon.
In this episode
The best agentic developers throw away their agent's work without guilt, run three agents at once and only use one, and treat their AI like a junior developer they genuinely dislike. It sounds wrong. It works.
Daniel Jones, Head of Product at re:cinq, has upskilled hundreds of developers across Northern Europe's largest enterprises. In this episode he joins Simon Maple to share the counterintuitive habits, hard data, and practical frameworks behind high-performing agentic development teams.
On the docket:
- Why bad engineering practices get worse, not better, with AI agents
- The exact conditions that make your agent hallucinate every time
- Why your AGENTS.md is quietly working against you
- How to manage context before it kills your productivity
- What enterprise AI rollout actually looks like at scale
- Why the worst managers get the most out of agentic coding
If your team is adding AI and wondering why things aren't getting faster, this episode is for you.
The Hidden Pitfalls of Enterprise Agent Coding Adoption
Teams with high development maturity go faster when they introduce coding agents. Teams with low maturity go slower. This finding from the 2025 Dora report captures a fundamental tension in enterprise AI adoption: the same tools that accelerate well-functioning teams can amplify dysfunction in struggling ones. In a recent episode of the AI Native Dev podcast, Simon Maple sat down with Daniel Jones, founder of re:cinq, a consultancy helping organisations across Northern Europe navigate AI native transformation.
The conversation surfaced practical guidance for both individual developers and organisations looking to roll out agent coding at scale, grounded in Daniel's experience working with enterprises ranging from financial institutions to SaaS providers.
Why Good Engineering Practices Matter More Now
The theory of constraints explains why agent coding can make things worse for unprepared teams. Speed up one part of a system dramatically, and bottlenecks appear elsewhere. If coding agents can produce commits minute after minute but it takes three days to get anything into production, the tip of the branch moves far ahead, creating merge conflicts when CI fails.
This applies across multiple dimensions: test coverage, coding standards alignment, story sizing, and path to production. As Daniel explained, "There's quite a lot that really would be considered just general good practice from the last decade or twenty years of software development that needs to be considered just because everything goes much faster now."
Test coverage deserves particular attention. Agents need to perceive problems to react to them. Without failing tests, an agent can break functionality and remain none the wiser. "Whenever you're getting frustrated with a coding agent, you need to consider: could it have known better? Did it have the information to know that it was doing the wrong thing?"
The conversation surfaced an interesting nuance here: perhaps unit tests are no longer primarily for humans. Agents can write as many as they like to verify their own work. But acceptance tests that define what users should be able to do with the system remain essential as the ultimate safety barrier and as documentation of expected behaviour. This connects naturally to spec-driven development (https://claude.ai/blog/spec-driven-development-guide) practices where explicit specifications guide agent behaviour.
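The distinction can be made concrete with a small sketch. The example below shows an acceptance-level test pinned to user-visible behaviour rather than implementation details; the `TodoList` class and its API are hypothetical stand-ins for whatever system the agent is editing, not anything from the episode.

```python
# Hypothetical system under test: a minimal in-memory to-do list.
class TodoList:
    def __init__(self):
        self._items: list[dict] = []

    def add(self, title: str) -> None:
        self._items.append({"title": title, "done": False})

    def complete(self, title: str) -> None:
        for item in self._items:
            if item["title"] == title:
                item["done"] = True

    def pending(self) -> list[str]:
        return [i["title"] for i in self._items if not i["done"]]


def test_user_can_complete_a_task():
    """Acceptance-level check: states what the user must be able to do,
    so an agent that breaks this behaviour sees a failing test rather
    than silently shipping a regression."""
    todos = TodoList()
    todos.add("write report")
    todos.add("review PR")
    todos.complete("write report")
    assert todos.pending() == ["review PR"]
```

The point is the shape of the assertion: it describes an outcome a user cares about, so it survives internal refactors (by a human or an agent) while still catching behavioural breakage.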
Context Management as Core Competency
Academic research suggests that above 30,000 tokens of context, reasoning ability drops by approximately fifteen percent. More recent work on GPT-4.1-era models found similar degradation between 30,000 and 60,000 tokens. That threshold is lower than many developers realise. Daniel noted that opening Claude Code with no MCP servers installed and no configuration can already consume around 40,000 tokens just for system prompts.
This creates practical constraints. Clear context regularly. Stay on topic within conversations. Use different tools for different types of questions, such as a web browser for research, a terminal agent for command-line tasks, and a coding agent for feature development.
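A rough budget check makes the constraint tangible. The sketch below uses the common "~4 characters per token" heuristic for English text and the 30,000-token threshold cited above; both numbers are approximations, not exact values for any particular model.

```python
# Rough context-budget check before sending another prompt.
# The characters-per-token heuristic and the threshold are
# approximations taken from the discussion above, not model-exact.

DEGRADATION_THRESHOLD = 30_000  # tokens where reasoning reportedly degrades
CHARS_PER_TOKEN = 4             # crude average for English text


def estimate_tokens(text: str) -> int:
    """Cheap token estimate without a tokenizer dependency."""
    return len(text) // CHARS_PER_TOKEN


def should_clear_context(messages: list[str]) -> bool:
    """Suggest clearing the conversation once the estimated token
    count crosses the threshold where reasoning quality drops."""
    total = sum(estimate_tokens(m) for m in messages)
    return total > DEGRADATION_THRESHOLD
```

In practice a real tokenizer (for example, the model vendor's own counting endpoint) gives better numbers, but even this crude estimate is enough to flag a conversation that has drifted past the useful range.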
The conversation also challenged a pattern that became fashionable in late 2024: adding instructions to agent configuration files every time the agent makes a mistake. "What you're doing there is adding something that's largely irrelevant most of the time in the hope that it's relevant this time. That's not a great thing to do."
Similarly, adding every available MCP server creates overhead. Tool definitions get sent to the model on every prompt. Skills with progressive disclosure, where the model receives a summary and can request more detail when needed, offer a better approach than frontloading everything into context.
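The progressive-disclosure idea can be sketched in a few lines: send the model one cheap summary line per tool, and expand the full definition only on request. The registry, tool names, and schemas below are illustrative, not any real MCP server's API.

```python
# Sketch of progressive disclosure for tool definitions: summaries
# go into context up front; full schemas are fetched on demand.
# All names and schemas here are hypothetical.

TOOLS = {
    "search_issues": {
        "summary": "Search the issue tracker by keyword.",
        "full_schema": {"query": "string", "max_results": "int"},
    },
    "deploy": {
        "summary": "Deploy a service to staging.",
        "full_schema": {"service": "string", "version": "string"},
    },
}


def tool_summaries() -> str:
    """Cheap upfront context: one line per tool."""
    return "\n".join(f"{name}: {t['summary']}" for name, t in TOOLS.items())


def expand_tool(name: str) -> dict:
    """Full definition, sent only when the model asks for this tool."""
    return TOOLS[name]["full_schema"]
```

The trade-off is one extra round trip when a tool is actually needed, in exchange for keeping every prompt's fixed overhead close to constant as the tool catalogue grows.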
The Hallucination Sweet Spot
The conversation surfaced an instructive framework for understanding when hallucinations occur. Models can detect obvious fabrications about well-known concepts. Ask about a fictional Star Wars episode, and the model will likely flag it as nonexistent. But move to specific details adjacent to large concepts, and hallucination risk increases dramatically.
Daniel demonstrated this with an exercise: asking about a fictional London thrash metal band produced a rejection, but adding specific details, such as a neighbourhood and date range, caused the model to start inventing band biographies about fifty percent of the time. "That's an example of how if you're adjacent to large concepts but very specific, you're in the sweet spot of hallucination."
This pattern explains why agent coding on real projects proves harder than demos suggest. Building a to-do app or Flappy Bird clone uses well-known patterns. Working with internal libraries the model has never seen, or requesting precise API endpoints for new library versions, sits squarely in the hallucination danger zone. Context that provides accurate, current information about these specifics becomes essential for coding agent reliability.
Containerisation as Safety Infrastructure
Running coding agents with unrestricted permissions on host machines creates obvious security risks. Yet Daniel found that containerisation maturity varied significantly across enterprises, creating unexpected friction in adoption.
"Having come from a cloud-native background, I kind of took it for granted that everybody would be familiar with things like Docker and Colima. It turns out a lot of enterprise developers aren't, and a lot of enterprises aren't necessarily set up for people to be able to run things in containers on their development machines."
Dev containers, the open standard from Microsoft, offer a potential solution but currently sit in that same hallucination sweet spot: recent enough that models tend to generate incorrect configuration, but not obscure enough that they refuse to try. Some enterprise developers have spent weeks troubleshooting dev container setups that agents confidently but incorrectly produced.
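Given that agents often get this file wrong, it helps to start from a known-good baseline. The fragment below is a minimal `devcontainer.json` using one of Microsoft's published base images; the container name and the `postCreateCommand` (installing a coding agent CLI via npm) are illustrative choices, not a prescribed setup.

```json
{
  "name": "agent-sandbox",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "postCreateCommand": "npm install -g @anthropic-ai/claude-code"
}
```

Starting from a minimal file and adding features incrementally is usually faster than debugging a large, agent-generated configuration in one go.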
The recommendation remains clear: running agents inside containers enables the autonomy that makes them productive while maintaining security boundaries. The setup friction is worth the investment.
Rolling Out Across the Organisation
For organisations looking to scale agent coding beyond early adopters, Daniel offered three concrete recommendations. First, start with a mature team that is enthusiastic about change. Teams already performing well are more likely to accelerate rather than stumble, and their success creates social proof for broader adoption.
Second, align on documented standards for what good code looks like. If different developers have different expectations, agents cannot consistently satisfy everyone. Those standards need to be explicit, not assumed.
Third, ensure strong product management support. Stories need sufficient specification that agents can work with them. And backlogs need depth because agents will consume work faster than most product teams expect. "As soon as you get your developers going faster, that backlog of stories is going to run dry very quickly."
The conversation pointed toward an emerging organisational pattern: teams that manage agent tooling, skill curation, and development experience across the enterprise. Platform teams seem well positioned to evolve into this role, extending their focus from deployment infrastructure to the full developer toolchain including agent configuration.
Looking Forward
The trajectory appears to lead toward software factories where multiple agents collaborate with minimal human intervention. Some startups are already experimenting with this model, finding that writing specifications down slows the system because agents can iterate faster than humans can document.
For organisations that cannot wait for settled science, the practical path involves starting somewhere, understanding context management deeply, and ensuring agents have the perceptual capability to detect their own mistakes. The tools will continue evolving rapidly. The fundamental constraints of context windows, the importance of good engineering practices, and the need for agent perception are likely to remain relevant regardless of which specific tools emerge.
The full conversation covers additional ground on version control practices, tool selection anxiety, and the ergonomics of different agent interfaces. Worth a listen for teams navigating the transition from experimental adoption to structured rollout.