AI Native DevCon 2026 London — all conference sessions as interactive skills
⚠️ Speaker-label warning: This transcript was provided as a single continuous block with no per-speaker labels. It opens with an unnamed host/introducer (lines ~1–10) and then continues with Ian Thomas for the remainder. When attributing quotes, treat everything from "Nice to see a wonderful photo of me..." onward as Thomas; treat the opening paragraph as "the host" or "the introducer". The only named individuals are Ian Thomas (the speaker), Simon (mentioned once, in the context of a previous QCon AI in New York), and references to teams/products/tools (Horizon, Horizon Worlds, Horizon Workrooms, Workplace, Reality Labs, Meta, Facebook, Microsoft, Amazon, Oracle, Unity, DRS, "fuse", MCP, DORA, Miro, Yorkshire). Do not fabricate other attributions. Speech-to-text artefacts (e.g. "create a set" likely = "cause an incident"; "her writing experiences" likely = "Horizon Experiences"; "qubit" likely = "crux") are preserved verbatim.
Cool. Okay. Now we're jumping to the penultimate session before the closing keynote. We have Ian Thomas from Meta, and this is going to be a really nice session. One of the things that is a big challenge when we want to adopt agentic engineering or agentic coding is how we go about rolling that out. How we go about adopting these tools, these processes, how we build that community is super important. Things that work, things that don't work. Not everything's going to be the same for every organization or culture, but it's important for us to learn from the experiences of others. In fact, I met Ian at QCon AI in New York, and as soon as I saw his session, I thought, ah, next time I do a conference, I want to make sure Ian is at that conference and giving a very similar session. So it's a great pleasure to introduce Ian Thomas. Please give it up, everyone.
Nice to see a wonderful photo of me on that little voice slide. Nice to know that I'm not presenting. I'm not catfishing people. I am that ugly. Okay, so I've got an interesting case study for you today from my time in Horizon Experiences, which is part of Reality Labs in Meta. Over the past 12 months now, we've been on a journey where we've gone from a nascent understanding and nascent usage of AI tooling to something that is now holistically used across many different workflows, different teams outside of engineering, inside of engineering, and very, very broadly internally at the company. But it wasn't always the case. And there was a time where it was looking like people were starting to drag their heels a little bit with their adoption. So I'm going to hopefully share a bit of information about that story that will help you on your journey today.
So this was the vision that was set out to us by our leadership. They wanted to change engineering so that the engineers could move away from the operational work, the toil, the undifferentiated heavy lifting, so to speak, so that we could go back to exploring and innovating, reducing the toil from what they estimated to be about 50% to less than five percent of engineering time, so that we could focus more on product development. This was broadly how they framed that timeline and where the split of work might end up being. And you can see that at the point where I was starting this process, which is sort of mid-year 2025, we were heavily in the phase where most of the work was still done manually. Lots of people were working on operational work. They had lots of on-call overhead, bugs coming in front end. So things that everyone has to deal with in their day-to-day work, and it takes up a significant amount of your capacity. But just give them. Keys.
We were also trying to build this culture of AI adoption, and we knew that AI tooling was going to be important and it was going to be the way that engineering evolved. But what we were finding was there was a bit of heel dragging going on. And the adoption wasn't uniform. It was ad hoc at best. People were finding local outcomes, they were dragging, they were trying to use the wrong tools for the wrong problem. They didn't always have the same processes, and where there was success, it was kind of kept in a fairly small area of knowledge, with people siloing information within their teams. And there wasn't a great deal of value being perceived. People weren't seeing the return on their investment in time. So we were seeing this was going to be quite a difficult problem for us to bridge.
And to fast forward you to the end, what we did achieve in the end with this work that I was part of: we grew an organic community that was over 40 times bigger than when we started. I think at the time when I stopped working on this, we were well over 500 people, and that was at the start of this year in January. Well over 80% of people were using these tools weekly, which was up from under half. And I would say consistently now, I think I see this in the mid to high 90s. So we're really hyping the adoption curve now. And for certain workflows we're seeing significant time gains. We built a maturity model that I'm going to share with you later today. And there are patterns that we invested in and that we found through experiments, which we then started to spread throughout the company, and it became a pattern that was used widely, not just in these small siloed groups of success.
So, hi, I'm Ian. I am a software engineer based in the UK. I work remotely from Yorkshire, which is a wonderful juxtaposition to working on VR and metaverse products. I'm currently part of the risk team at Meta. I've moved on from her writing experiences and I am a tech lead for the transparency and choice area. But before that, I spent many years, two or three years, working on various parts of Horizon, one of which was Workrooms, which is a collaborative 3D VR product, and more latterly Horizon Worlds, or the metaverse, as the rest of the world seems to know it. So yes, I'm going to go through a few different things with you today, mainly talking about how we achieved the success and the growth and the adoption that we did achieve.
So I want to jump back to this. One of the things that is interesting here is considering which of these areas are going to be the most effective for us to lean into to find value and find ways that people can adopt the tools that will make them feel successful. And if we jump straight into feature development and looking at product exploration, we felt that maybe that was going to be quite hard because it's a novel process in its own right. So there's too many moving parts. The things that we did understand well and that we could control is that top area on the left, the code modernization and administrative tasks.
But there is a slight problem with that. This is a little bit out of date now, but hopefully it will give you an idea of the scale of the problem that I'm going to talk about. When I joined the company, we were already at 500 million lines of Hack code. And to give you some idea of scale, Haskell on the right there, that tiny little slither, that's still half a million lines of code. So this is a significantly large code base and it's all in one repo. So for those of you that have experimented with agents and large repos, this sort of thing is challenging at best. The bit that I worked on at the time is kind of right in the middle. It's the mix of C++ and C#, because we were working on a platform that was based on the Unity framework. But there was all sorts of things going on there. So lots of code.
It was very difficult for us to think about this adoption process. So starting with bugs and operational overhead is probably a good place for us. And that's where we started to say, well, how can we use this AI to help us achieve our engineering excellence goals?
And for us engineering excellence consists of three main parts: implementation quality, production excellence and better engineering. Now some of these things are things that you do every day in the course of your work no matter what you're working on. It's fair to say the implementation quality is important no matter what. But the interesting ones are things like responding to incidents and how you manage service and bugs and things like that. So the production excellence aspect, but also better engineering. How can we build things to help us go faster anyway? How can we build internal tools or processes, improve our working practice, in ways that make everybody better?
You may have seen this meme about organizational structure and companies. Now, I haven't been in Microsoft, Amazon, Oracle, but I can say for the Facebook world it's actually quite true, or Meta, as we are now more fondly known. This is a social company. Things get done through bringing people along with you for the journey. There's no real top-down incentive or hierarchical effort, especially when it comes to engineering adoption of things. It's a very ground-up, evidence-based approach that's needed. And engineering excellence as a program is a great way for us to help leverage that to make the AI adoption more seamless, because we already had small groups of motivated people who were interested in making their lives better and making the code base better.
So the other thing we get with this is that because it's a relatively small group of people that lead this work, they tend to take it on as a challenge for their own teams. We could then build a small community where people could feel safe, they could have the space to fail and they could ask questions and not feel like they were going to be put on trial or have any kind of impact on their performance reviews. This was a place where we could be very specific about the needs that they had, talk through the problems and work through things together. And also because it's a smaller group, we had less overhead, less friction. We could make decisions more quickly, and it allowed us to grow the group from a position of understanding and a core group of people who were kind of aligned on the mission that we were trying to achieve. Gradually over time it grew legs and word of mouth helped to get more people involved, and people saw that there was value in this work and they wanted to find out how they could become part of it too.
And it was interesting: in our culture at the company people focus a lot on the impact of the work they do. They want to know how their time being spent is going to become valuable to the product, the business and to themselves, frankly. And often that means that people can tend to focus on the things that are easy to measure. But I think actually in this case one of the things that's important is thinking about the community aspect. And when I was talking this through with the people that were in the engineering excellence group, we tried to focus on some of the more intangible, hard-to-measure parts of what a community needs to make them successful, so that we could start to build up that sense of belonging, so that people could feel like they were onto something that was going to help them change meaningfully and adopt something that's going to be part of their working practice for many years to come, not just a quick tool that they were going to use for a bit until the next best thing came along. I think we all fundamentally agreed that AI was going to radically change what we do on a day-to-day basis forever. Again, these things aren't easy to measure necessarily. There's no dashboard I can pull up to say how many people feel a sense of belonging in the community. So it's kind of asking people to take a bit of a leap of faith, but I think it's the important thing that we needed to do.
Out of having these conversations with folks and this small group came lots of feedback around what was working and what wasn't. And people were bringing ideas from their individual teams and we started to notice some patterns. And around about August last year we started to see what was going on in the wider world as well and put the two together. And we started to think about how we could build a maturity model to help people understand where they might be on their curve of adoption, how their individual team context might be used, and how they could change what they were doing to try and leverage these tools best for them.
So we came up with this approach that was based on some of the research from DORA and some of the experiences that we had, where we had six dimensions of maturity, and we produced a self-assessment workshop. So teams could take that back. They could run it and then they could have it run on a regular cadence to see how they were changing over time. The six dimensions I mentioned were things like workflow integration, how you find prompting, other ways that you're finding success with different ways of sharing information, sharing prompts and distributing things across the team. Are there individual productivity gains that you've got that you're able to share? Or are there things that people are doing as part of the overall team that you're seeing making improvements? Is there a use case that you're finding that you've got a significant amount of leverage on, or are there others that you're struggling with?
And then we modelled this across five levels. Because we're engineers, we somehow decided this should be a zero-indexed list. But yes, it goes from sit right the way through to leap. And the main thing is that the dimensions are largely independent of each other. So you can get a different feel, and you can focus on different parts at different times. An example of what we have here is the workflow integration dimension. So we were talking about going from "people understand there are tools, but they don't use them" right the way through to "they've seamlessly integrated these tools across many different processes and they would self-identify as AI native now". And it's quite a useful thing to sort of sit and think, well, where am I today in my team? How are we faring on this sort of rubric? And if you've got a camera, apologies, it's quite small, but this is what we ended up with and what we distributed out to the team. So if anyone wants to take a photo, be my guest, or I can try and find a way to distribute this afterwards.
I mentioned that we put some time into creating a workshop around this, so that everyone managed to get a photo of the quantum. We put a workshop around this because I think having a model itself is okay, but actually using it to drive conversations within the teams was the more valuable part of the process. And so we had this setup where we could go in, we'd help facilitate the first session, and we'd talk through the different aspects of the model and what it meant. And then we'd set the teams up with a regular period of every three or four weeks. They'd maybe run it again and give us some feedback on how they were finding themselves progressing over time. And the nice thing about this is because it was team specific, they'd find their own gaps and they'd find unique ways that they could build new ideas into their working practice on a day-to-day basis.
And it's that tracking progress over time that really made the difference here, because we wanted to make sure that people were seeing value in the time that they were investing. But they didn't have any kind of way of measurement, and this is where the model really helped us out. They could actually see, oh yeah, right, we did try that thing. Now we're seeing ourselves embedding this process into our processes more. We're seeing more value in trying this prompting technique, or we found this tool that we're going to use more. So they could track it.
The assessment was really simple. If you want to run this yourselves, it's not rocket science at all. It's a very basic sort of kaizen retro format where we would talk through the descriptions. We would get people to vote in isolation. Something like Miro or another kind of online whiteboard is great for this sort of thing, where you can just have anonymous voting. And then the qubit was the discussion. The 20 to 30 minutes often spiraled into 45 minutes plus. And then we would talk and share learnings with the community. And one of the things in Meta that we use a lot is the tool Workplace, which is essentially an internal version of Facebook. So every time anyone ran one of these assessments, they would post the results and they would share the insights they'd uncovered. And so it was a case of everybody else would then get visibility into what other teams were doing, raising up the network effect.
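As a rough illustration of the assessment format described above (anonymous votes per dimension on a zero-indexed five-level scale, repeated every few weeks), here is a minimal Python sketch. It is not the tooling used at Meta. The function names, the use of a low median to settle on a team level, and the `prompt_sharing` dimension name are all assumptions made for the example; only "workflow integration" is named in the talk.

```python
from statistics import median_low

# Assumption: levels are the integers 0..4 (five levels, zero-indexed).
LEVELS = range(5)


def team_level(votes):
    """Collapse anonymous votes for one dimension into a single level.

    The low median is an illustrative choice, not the method from the talk;
    it always returns one of the submitted votes and resists outliers.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    for vote in votes:
        if vote not in LEVELS:
            raise ValueError(f"vote {vote!r} is outside levels 0-4")
    return median_low(votes)


def summarise(session):
    """Map each dimension name to the team's agreed level for one session."""
    return {dimension: team_level(votes) for dimension, votes in session.items()}


def progress(previous, current):
    """Per-dimension change between two summarised sessions."""
    return {
        dimension: current[dimension] - previous[dimension]
        for dimension in current
        if dimension in previous
    }


# Hypothetical data for two sessions a few weeks apart.
first = summarise({"workflow_integration": [0, 1, 1, 2], "prompt_sharing": [0, 1, 1]})
second = summarise({"workflow_integration": [1, 2, 2, 3], "prompt_sharing": [0, 1, 1]})
print(progress(first, second))
```

Because the dimensions are treated independently, a team can see movement on one dimension while another stays flat, which matches the way the model is described in the talk.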
One of the big things that came out from pretty much every single workshop was: can we trust the code? And it was one of the big reasons the senior engineers were slightly hesitant about adopting AI tooling in the first place. They wanted to have control. They liked finessing what they were doing. They liked to feel that they had the craft in hand. And that was great because it told us where we could focus on more cross-cutting efforts as a leadership group, a team that were looking to help the adoption curve. We could find ways to unilaterally improve people's experiences. So on the back of things like this, we were looking at testing, or how we could increase our test coverage and reduce flakiness. What we were seeing was that AI was great at hallucinating awful tests. It would make stuff that was really looking like it was doing something useful, but it really wasn't. It was just increasing our overhead on CI.
So we invested in an anti-test-slop initiative where, again on CI, we could go and validate the changes that were being made in an autonomous way, using a different AI tool to judge the quality of the output. And these things are again an example of the ground-up culture that we have: this thing started out in Horizon, but the test slop initiative that we started and the tooling that was built was eventually adopted by Infra and rolled out across the whole company. And I think to date it still runs pretty much on every diff. It's got significant volume of AI generated tests in it.
So as time progressed, the momentum built, and this is when the other side of it came in. So we have the ground-up bit. This is where the sort of top-down aspects of support came in. So leadership saw that there was a good thing happening here. And that's where we got the kind of support and push to make this broader. So we opened this whole idea up to more than our local teams. We pushed this out to the whole of the metaverse organization. And it was a way that we could start to scale this and help with the speed of adoption. And what we saw was there was quite a step change in our community just by these things. And this is where I think you start to get to a snowball effect, where you reach a critical mass and enough people know and enough people start sharing things out. The adoption, the numbers, just grow fairly organically and rapidly.
So I've got some patterns of things that we saw that were helping people in their focus on improving their practice. And they kind of go through the maturity model in some respects as well. So we started off by saying, right, how can we think about tests? Tests are always problematic. We know that we want to have more of them. The ones that we've got, especially when we're thinking about building a VR immersive experience and you need to have headsets in a device lab and we want to have end-to-end testing and things and multi-users, they're hard to write and they're hard to maintain and they're not exactly the most fun thing for people to be fixing all the time.
So how can we improve our test coverage? And is there a way that we can use agents more cleverly to find the best places where we could increase our test coverage? So one of the engineers, he went off and did this. He identified some dashboards and some data that we had, some tables that were telling us what files were uncovered. But then we could also understand how those files were being changed over time. Were they hotspots for people making critical changes to the product? And then he taught the agent how to go and find that data and how to assess it and build up the priority list of files that we wanted to go and write test coverage for.
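The prioritisation idea described above (combine coverage data with how often files change, then rank the candidates) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FileStats` record, the `prioritise` function, the 50% coverage cut-off and the sample paths are all hypothetical, and the real data came from internal tables and dashboards that the talk does not describe in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    coverage: float      # fraction of lines covered by tests, 0.0 to 1.0
    recent_changes: int  # number of changes in some recent window


def prioritise(files, max_coverage=0.5, top_n=10):
    """Return paths of poorly covered files, most frequently changed first.

    Ties on change count are broken by lower coverage, then by path, so the
    ordering is deterministic.
    """
    candidates = [f for f in files if f.coverage < max_coverage]
    ranked = sorted(
        candidates,
        key=lambda f: (-f.recent_changes, f.coverage, f.path),
    )
    return [f.path for f in ranked[:top_n]]


# Hypothetical sample data.
files = [
    FileStats("avatar/pose.cpp", coverage=0.0, recent_changes=3),
    FileStats("net/session.cpp", coverage=0.9, recent_changes=50),
    FileStats("world/state.cpp", coverage=0.2, recent_changes=12),
    FileStats("world/loader.cpp", coverage=0.0, recent_changes=12),
]
print(prioritise(files))
```

A well-covered file is skipped even when it changes often, so the list an agent works through stays focused on hotspots that currently lack tests.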
And in the end, I think the time estimate here is quite significant. He applied this and in, I think, about three hours he achieved what would have taken him about half a week normally. I think this is sort of an early adopter, so it might be he was a bit faster than some other people would have done. But it showed the potential in this approach. And again, it was a repeatable pattern that we could think about rolling out more broadly, because these tables have data about lots and lots of files in the Meta ecosystem, not just the Horizon ones. And I think from this initial effort near the 60th got merged.
VR, like I mentioned, is quite a challenging world to build software for in some respects. And one of the challenges is that we largely work on Windows machines that are very, very powerful with very big GPUs. And so they're not necessarily the easiest things to rack up in a service center and provide on demand to people. We tend to have one or two beefy machines per person. And so there's a lot of things that you can only do kind of single threaded. Refactoring in VR means that you're going to be spending a lot of time on your Windows PC looking at some pretty gnarly low-level code at times. And there's a lot of legacy patterns that we built up over the years that needed to be refactored out. And people tended to shy away from these things because there's a high risk of breaking things. These refactorings are not something that you would undertake for fun.
But one of the engineers thought about, okay, well, if I can teach the AI the way I would think about resolving these refactorings and then go and find patterns that are similar across the code base, what can we achieve? And again, roughly half the time spent in achieving his goal, but he finally found multiple areas where this pattern could be reapplied and set the agents off in the kind of team pattern so they would go and work in parallel. And then he was acting more as a reviewer and just being a critical architectural oversight, making sure that he was doing the right thing and not making any kind of gross miscalculations on how we actually wanted the system to look afterwards.
Moving on from that, like I said, we had a challenge with VR in that you're kind of constrained to one machine. But one of the engineers was thinking about this from the angle of what the machine actually needs to render, and what we can do that doesn't require us to have this massive PC with a huge GPU in it. If we provide better context to the models when they're doing work, would they be able to work in our more standardized on demand worlds?
And the result of this was an MCP server that was a bridge to the data that backs Horizon Worlds. So all of the things that explain how world state looks, how things are built, the models, the kind of textures and everything that are in the world. It taught the models exactly what they needed to know about how Horizon works. So they could go and work largely in isolation and in multiple on demand environments. And so that removed the whole Windows PC barrier. And it meant that we could parallelize the work in a lot more effective way. And again, the AI was clever. It knew more about the world that we were working in than it did before. So we got better outputs as well. So not only was it more efficient and we could parallelize more, we actually found the quality of the work go up too.
And then lastly, Meta has a pretty great infrastructure for rolling out code mods. And the infrastructure teams have been experimenting with ways that we can integrate agentic code mods into that platform. And as you saw with our 500 million lines of Hack back in 2023, which has only got bigger since, there were many, many places where there were going to be code quality issues that were just not worth the effort of finding unless we could do it in an automated way.
And that's where this automated code fix pattern really came to the fore. So we could generate ways to find these issues and we could provide rules and runbooks to the AI that look basically like skills or context files. And it would just churn constantly, just run, go, go find a thing, fix it. Promote a diff, get someone to review it. And 30 diffs generated autonomously is a pretty low number. I'm fairly confident that that number is several orders of magnitude higher now. This was just at the start of the use of this. But this was a way that we found to automate code quality at scale. And again, it moved away from humans doing this in a sort of supervised way to a largely unsupervised review afterwards manner.
So what was it behind these wins that kind of made them successful? Thinking about this helps us to look for more opportunities in the future. So being clear on the problem definition was the main thing. We weren't just sort of scattering ideas and trying things. We'd had a clear vision that we wanted to focus on our engineering excellence. We tried to focus on shared tools where we could, things that were going to be unilaterally used across the company so we could benefit from cross-pollination of ideas. One person improves it, other people get the benefit. But we also tried to focus on keeping human oversight in the first place. If we went too far into the fully unsupervised agentic world, I think we were going to set ourselves up for failure. And then all the things that we know from years and years of agile work: iteration, learning, being open minded, sharing the approach so that we get feedback, and looking for other people's insights along the way. And all of this was available to us using commercial tools. It's not something that is restricted to Meta internally only.
And thinking of the tools that we did use, I quite like to map them across the SDLC because code is only one part of the story. And I think actually if we want to be AI native, then we need to think about it more broadly, across all aspects of what it takes to deliver software from idea to production. And we have a bunch of things that, at the time that I was running this group, were available to us. But I think the key thing that I reflected on, that actually made the difference, was that underpinning all of these things and all these stages in the life cycle was a strong investment in platforms and platform engineering. Because a lot of the tools that you see on the screen here were hooked up to existing platforms and had entry points that allowed them to be relatively quickly onboarded and scaled across all the engineers in the company. And so that isn't a trivial amount of effort, and building those platforms in the first place actually is still paying dividends, even if we think about a world where AI is supposedly reducing the need for the multiple people. I think we can focus on doing more, but we still need to have these underpinning things that allow us to do more collectively rather than just individually.
And there's an argument at the bottom there that's doing a lot of heavy lifting as well. So, build your own tools: "fuse" is our platform where people can build their own agents and have them be highly specific. After I gave this talk for the first time, when I saw Simon in New York, I invested a whole bunch of time in creating agents to help us with maintenance mode for different products. They would go and review diffs and they would flag them against known criteria and look for approval signals, so that we could start to have an autonomous way of saying that this was okay or this needed to be sent back. This was a piece of work that was not something that we wanted to be pursuing, because we were looking to migrate from one version of our product to a new version as we released a new version of the Horizon engine that underpinned everything. So we were really trying to minimize the effort of pitman time people were spending on the old product. And having the ability to build custom agents using shared tools just rapidly allowed us to get to a place where that could roll out. And we were starting to see more and more unsupervised tooling in place like that.
Other ones on here that are really valuable include things like DRS, which runs on every single diff in the company and gives us a score of risk based on how likely we think that change is to create a set. And that's something that has proven extremely valuable over time, because you can move from having that be a thing that says, this is super risky, maybe don't do it, to, this looks super not risky, let's get it into production as soon as we possibly can. And again, it grows and learns over time. It's become a really valuable part of our toolkit.
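To make the idea of acting on a per-diff risk score concrete, here is a small Python sketch. It is not DRS and does not reflect its interface: the 0-to-1 score range, the two thresholds and the route names are assumptions chosen purely for illustration.

```python
def route_diff(risk_score, low=0.2, high=0.8):
    """Pick a review path for a diff from a risk score in [0.0, 1.0].

    Hypothetical policy: high scores are held for closer human attention,
    low scores are fast-tracked, everything else gets ordinary review.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    if risk_score >= high:
        return "hold"
    if risk_score <= low:
        return "fast_track"
    return "standard_review"


# Hypothetical scores for three diffs.
for score in (0.05, 0.5, 0.9):
    print(score, route_diff(score))
```

Keeping the thresholds as parameters means a team could start conservatively (a very low `low` value, so almost nothing is fast-tracked) and loosen them as trust in the score grows.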
So let's have a little look at what worked and what didn't. So starting small always helps because you have more control and you can move faster. And looking at ways that we could hook that into existing ways of working and different concepts that we already had in place helped us to get people kind of motivated to take on a challenge of growing the usage and adoption of these tools. The community was a key aspect of it. And building that from the ground up gave it credibility and allowed us to kind of drive this adoption from engineers upwards rather than as a top-down mandate, which honestly has never really gone down that well in any large company that I've worked in.
Some things that were okay but were still needing to be proven out were things like: what does this mean for code review? How do we prevent ourselves from creating new bottlenecks in our SDLC based on going faster in new parts? The theory of constraints applies pretty well here. And what's the cost of this in the long term? How does it look 9, 12, 3 years down the line, when you've got all this code that got generated really rapidly that you don't need to understand? I think we're starting to see more about this now as agents have got bigger context windows and we're able to provide better information to them. And so we're almost standing off a little bit from the direct hands-on ownership of the code in the same way that we used to. But I think it's still something that people have got one eye on, making sure that we're not losing the quality overall for the long period of time.
And then the key things I think are understanding where the value of measurement lies. So we started off being obsessed by the number of people that were using tools on a weekly basis, and how many diffs they were generating, how many lines of code were being written by agents. At the end of the day they're vanity metrics. Token usage as well. That's another: it's a good indicator that someone's using something, but that's about it. It's not actually proving any value is being created there. So thinking about how we actually measure productivity and what the gains are was something that I think is still bearing fruit. And then things that we discounted. We didn't really want to move to full autonomy yet. We didn't feel it was totally realistic. But we knew that we could get somewhere towards it by choosing which workflows were going to be best suited to unsupervised work versus supervised work.
And so these are the strategies that I think worked and helped us the most. Give people permission to fail and share, and don't be worried about penalty if something goes wrong. Be given a fairly long rope to go and do the right thing and try and find out the way forward. But you can constrain that by making sure you're working on something that is largely less risky. So by choosing something like internal tooling and tests and things, we got a good start on that. So we have to make mistakes. Starting bottom-up gives credibility, but once you get to a certain mass you do need that top-down credibility and the push from leadership to say that this is something you should all get behind. And then thinking about how we maintain that quality overall. Consistently investing in the education, supporting people, and then finding the value and the examples that have worked and really highlighting them to prove to people that there is something meaningfully good happening here. There are always ways that can help us to kind of increase the support that we have from the engineering team and from the wider leadership.
If you want to follow this model, just find a couple of people and get started. There's no real permission needed. It's quite fun to just sort of dream big and start off small. I would recommend getting the maturity model in place if you were in this position and trying to set up self-assessment workshops so people can find their own gaps and they can feel like they're invested in the process too. And then finally, find quick wins and start to make progress against your smaller goals, and soon the bigger goals will be hit too. And that's it.