
How DeepSeek leveraged Qwen and Llama to build its model in $5M
Transcript
[00:00:00] Simon Maple: Hello and welcome to another episode of the AI Native Dev, and today we're going to be talking about all things open source. Joining me today is the CEO of OpenUK, Amanda Brock. Amanda Brock, welcome to the session.
[00:00:18] Simon Maple: How are you?
[00:00:18] Amanda Brock: Thank you very much for having me along, Simon Maple. Finally.
[00:00:21] Simon Maple: It's an absolute pleasure. We've been talking about this for quite some time now, and either you're traveling or I'm traveling. But finally we found a day where we could be here in person in London, and you're, of course, UK-based, OpenUK.
[00:00:35] Simon Maple: Tell us a little bit about OpenUK. Because I think a lot of our listeners will maybe have heard of OpenUK from some of the reports and research that has been created or maybe State of Open Con in the UK, a conference you run. But tell us a little bit about what it is.
[00:01:09] Amanda Brock: Yeah. So we are just over six years old at OpenUK, and we are an industry organization with a difference. We realized very quickly that the other industry organizations and other countries for open tech, so software, hardware, data standards, and AI, focus on companies.
[00:01:29] Amanda Brock: Right. By doing that, they end up with a small group of companies, and they miss a whole part of the open-source ecosystem. So we focus on individuals; we bring people together. We've become a sort of convening point for the open tech sector. We then do a bunch of research and reporting. So we do legal and policy work.
[00:01:46] Amanda Brock: We respond to legislation. We do research into open tech in the UK and beyond. And then we do a bit of skills development. We have the annual conference that you've mentioned. We're doing that a bit differently this year, which we'll probably come back to. So we really bring people together to talk about the stuff that's going on in open tech.
[00:02:05] Simon Maple: Yeah, it really is a truly wonderful organization. And of course, your background is more on the legal side, right? You've got 25 years of legal experience, and you've done a lot of work with open source legal frameworks as well as internet law.
[00:02:20] Amanda Brock: Yeah, so I guess I started in the mid-to-late nineties as a lawyer. Went through the dot-com boom. I actually thought that that was going to be the pinnacle of my tech career, and I didn't think there was going to be anything else coming down the line that was going to be quite as innovative and new. I really had enjoyed that part of my career because you're trying to fit the laws around the technologies rather than make the technology fit the law, if that makes sense.
[00:02:47] Amanda Brock: So in 2008 when I joined Canonical, I was just blown away at the stuff that was going on. Canonical back then was really cutting edge. Yeah, I can't really comment so much on what they're doing today, but it was really a moment in time in the industry. So I've worked in open source on and off since 2008 and I thankfully, I will say, gave up being a lawyer in what, 2018, 19?
[00:03:10] Amanda Brock: And started the role with OpenUK in 2019.
[00:03:14] Simon Maple: Amazing.
[00:03:14] Amanda Brock: So in theory, I still use my legal skills in the work we do, but I don't have to be a lawyer anymore.
[00:03:20] Simon Maple: Amazing. And so, when OpenUK started, of course it was probably, we would say, the more traditional open source that we think about: open source maintainers building software components that are available for reuse under certain licenses and so forth.
[00:03:34] Simon Maple: Then AI happened. Or rather, of course, AI's been happening for a long time, but it hit the mainstream and really caught everyone's attention. And OpenUK has recently done a lot in the world of AI as well. Let's talk a little bit about what openness means to AI and to LLMs when we think about an open model, a term that's been thrown around a lot. What is an open model?
[00:04:02] Amanda Brock: Oh, you could get me into so much trouble with that very small, simple question. There's been so much debate about it, and we've seen people trying to come up with definitions, so I'm going to give you what I think this is all about.
[00:04:16] Amanda Brock: And we all know what open-source software is, right? We spent 30 years working on the open-source definition, working on licensing based on that, and relying on it as a trust point, and that trust point has allowed us to build traction. And for me, the key to it is that anyone can use the code for any purpose.
[00:04:36] Amanda Brock: Now, if we take that and sort of bring it to the next level for AI, to me, it means that anyone can use what you are giving them for any purpose. Now you can get into all the debates with models and things about whether you have to give the data. How much of it do you have to give? My preference is that we disaggregate the technology.
[00:04:56] Amanda Brock: We don't try to define something that's so emerging, and we look at AI as it evolves. So a couple of years ago when people were really trying to do definitions, all they were thinking about was language models. They weren't thinking about agentic AI or any of the stuff that's coming down the line now, the robotics, whatever else.
[00:05:13] Amanda Brock: Embodiment. So I think the best way to view it is to disaggregate the components, look at what the components of the AI are, and ask the question about the components. So if we're looking at a model or an algorithm, if we're looking at a data set, if we're looking at an agent, is that something that anybody can use for any purpose?
[00:05:32] Amanda Brock: And that goes back to how it's licensed. So I would look at each of the different components and whether or not it's on an OSD-compliant license, so the licenses that you know from open-source software like the GPL, Apache, and MIT, those kinds of things.
[00:05:48] Amanda Brock: And if it is, then I would say that that part of it is open source. It might not be that the whole of the AI is, but that part of it is.
[00:05:56] Simon Maple: And it's interesting when we say open source, because open source just means, you know, the source code is there for you to see.
[00:06:02] Amanda Brock: Mm-hmm.
[00:06:02] Simon Maple: Now, if I were to try and build an open-source component, a piece of software, I could probably go ahead and, well, it'll be open. I could probably build it through GitHub, et cetera, and some CI/CD pipeline. With a model, though, of course, just having the code isn't really enough to be able to build the model and use the model to the same level. So there are other terms that have been thrown around, like "open model."
[00:06:29] Amanda Brock: Yeah.
[00:06:29] Simon Maple: Open weights, open training. What is the difference?
[00:06:31] Amanda Brock: So for me, I would say that you are looking at the component part, a recognizable part such as the algorithm, the weights, or the model, as being open and meeting that open-source standard.
[00:06:43] Simon Maple: Yeah.
[00:06:43] Amanda Brock: So it's something that's given to you freely so that you can use and reuse it. It doesn't mean that all the component parts are going to be available. So particularly when we look at open weights and open models, I think the real differentiator is that you're not getting the whole of the LLM or small language model, whatever it is you're using. You're only getting that piece of it, the model piece; you're not getting the data.
[00:07:07] Simon Maple: Right.
[00:07:07] Amanda Brock: And that is almost universally the truth. Not quite, but almost.
[00:07:12] Simon Maple: Yeah. What's the value to the end user of something being open source as a model?
[00:08:03] Amanda Brock: So the value depends very much on who you are and where you are in the ecosystem. And if we're looking at innovators, that access to technology... and when we saw the first LLMs being opened up in 2023, first of all with the Llama leak and then Llama opening up in July, what we saw was a pace of innovation that was really unprecedented, right?
[00:08:26] Amanda Brock: And it was suddenly giving the ecosystem the access. One of the big values of something being open is iterative development: innovators being able to freely access and use technology they wouldn't otherwise have access to and to build on top of it. And you see that shift from Meta with Llama through to DeepSeek with R1.
[00:08:46] Amanda Brock: And that wouldn't have happened, one without the other. So you get that kind of innovation. You also, when we look at the landscape globally, see two dominant players, China and the US, way ahead of everybody else. And then we have the middle countries, about ten of us, and really, affording what China and the US have built would be impossible without collaborative innovation.
[00:09:06] Amanda Brock: So we really need access either to what China and the US have done, or we need collaborative access to further innovation, just to fund the scale of the cost of AI innovation. And then for end users and individuals, it again comes back, as most things with open source do, to cost benefits, democratizing technology, and allowing more access.
[00:09:35] Amanda Brock: You know, at the India AI Summit recently, we talked a lot about access for all. And when you look at somewhere like the Global South, the only way they're ever going to get that innovation is if we open it up and create access.
[00:09:46] Simon Maple: Yeah, and a lot of the things that you mentioned there are still very true of the traditional open source we mentioned, with libraries and people building upon them. It's a builder's world, right?
[00:09:57] Amanda Brock: Absolutely.
[00:09:58] Simon Maple: So, when you mentioned Meta and Llama there. Now Meta stated that Llama is open source; the Linux Foundation backed it up. You called them out.
[00:10:08] Amanda Brock: Yeah, that was all a bit confused, right? The Linux Foundation didn't back them up, so it all got very confused. In July '23, OpenUK was actually the only organization that supported the launch of Llama as open innovation.
[00:10:24] Amanda Brock: And it was very carefully structured, all the way through to that date, the 23rd of July I think, off the top of my head, as open innovation.
[00:10:34] Amanda Brock: Because the license has two things: it has an acceptable use policy that puts restrictions in, and it has a commercialization provision that says when you hit X million users, you have to go back and get a commercial license from Meta. Nobody knows what the terms of that license are, because I don't think anybody's hit that level of use that's been trackable to date.
[00:10:51] Amanda Brock: So what that means is, when you go back to that basic principle of open source, that anyone can use it for any purpose, that just doesn't happen. Right? You've always got the risk of this restriction. You don't have that same free flow, that ability to cascade the outputs and let somebody else use it, iterate it, and build on top of it.
[00:11:10] Amanda Brock: So that's kind of broken. And it was fine as open innovation, and we supported it. And I still think it's the right thing to do, because that shift, opening up an LLM, was critical for change in the industry.
[00:11:23] Amanda Brock: And I think it will go down in history as one of the seminal moments. If you talk to people today about Llama and Meta, they'll say to you that the word on the street is that they're moving away from openness.
[00:11:34] Amanda Brock: And I think it's because they haven't done it right. They haven't actually created true open source. So they themselves weren't calling it open source until Zuck posted on Facebook. And by that point, you know, nobody was going to go backwards from there, whether it was intentional or not.
[00:11:49] Amanda Brock: We'll never know unless he tells us, of course. But you know, I recently met Yann LeCun and had a really interesting conversation about some of his aspirations, and he thinks we should be building a global model across countries, across borders, collaborating, and bringing money from states together to do that.
[00:12:07] Amanda Brock: So I suspect they understood what open source was.
[00:12:10] Simon Maple: Absolutely.
[00:12:11] Amanda Brock: Yeah.
[00:12:11] Simon Maple: And for people who are building on something that they believe is open source, but actually it's not truly open source in your definition, what are the real-world risks that I as a developer might have if I'm using something thinking it's open source when actually it's not?
[00:12:29] Amanda Brock: Well, we get into something called "open washing," which existed in open-source software before. To be quite honest, it's such a nerdy thing that I never thought it would be in the mainstream. And there have been headlines in the last couple of years in the New York Times and The Economist with open washing on the cover of the magazines.
[00:12:48] Amanda Brock: So open washing is where somebody takes that open-source goodness and implies that the deliverable they're sharing with the world benefits from the open-source goodness when it doesn't really. So if you are Meta and you're saying Llama is open source when it's not, you're misleading people. It's disingenuous at best.
[00:13:07] Amanda Brock: But what it means is you can't have the same reliance in the ecosystem. So you can't take it, use it, recycle it, and know that those taking it from you can also use and recycle it. And that becomes problematic, at its simplest. And I actually think it's why Meta is now looking at shifting away, because to get the real value from open source, it goes way beyond that legal definition of having a license and making it open.
[00:13:34] Amanda Brock: It goes to the heart of community and collaboration and contribution, and if people can't trust that openness is going to be there forever... So it was like, we did a piece recently with one of the creators of MCP, and he was talking about why they've put MCP into the Linux Foundation's new AI Edge Foundation.
[00:13:53] Amanda Brock: And it's all about keeping it open and knowing that you've got that security forever of it being open. I think that that's the bit that's missing. When you open wash, you take away the trust, and if you don't have trust, you will not be successful because you don't have the ecosystem around you, and you won't grow that ecosystem the way you would with real open source.
[00:14:14] Simon Maple: Yeah. Trust is a really important word here as well. When we talk about the usage and sharing of open source code and open source models, trust is really what binds the whole community together. So it's a term we really need to think about when using this language.
[00:14:30] Simon Maple: So for me as a developer, if I'm using a model, how do I go about working out and deciding whether this model is open source? How open is it? Is there a good, easy way I can do that?
[00:14:50] Amanda Brock: That's quite a difficult one. So I suspect that if you're a developer doing that, you're more technical than me. So you understand the component parts of the model better than I do, and you are better placed than I am to judge which bits you need.
[00:15:03] Amanda Brock: From a licensing perspective, you want to make sure that it's in one of the OSI-approved, OSD-compliant licenses that we all know, the GPLs, Apaches, or MITs, something that you've heard of and that hasn't been amended. And I think that's quite critical to it. So it's really understanding what the licensing is and then which bits you're getting and you're not.
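Amanda's advice here boils down to checking a model's declared license against the OSI-approved list. As a rough sketch, a hypothetical Python helper might look like this; the allowlist entries and the `llama2` identifier are purely illustrative (the real OSI-approved list is much longer):

```python
# Hypothetical helper: check a model's declared license identifier against a
# small allowlist of OSD-compliant licenses. These entries are illustrative;
# the real OSI-approved list is much longer.
OSD_COMPLIANT = {"apache-2.0", "mit", "gpl-2.0", "gpl-3.0", "bsd-3-clause", "mpl-2.0"}

def is_osd_compliant(license_id):
    """True if the SPDX-style license id is on our OSD-compliant allowlist."""
    return license_id.strip().lower() in OSD_COMPLIANT

assert is_osd_compliant("Apache-2.0")
# Bespoke model licenses, like Meta's Llama community license, fail the check:
assert not is_osd_compliant("llama2")
```

The key point the check encodes is Amanda's: an amended or bespoke license is not the license you know, even if the name looks familiar.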
[00:15:23] Amanda Brock: I think we really see a benefit from documentation. And when we saw DeepSeek releasing R1, one of the things that really differentiated it was that although the data it was trained on wasn't provided, the documentation was so good that within days Hugging Face had built Open R1; they were able to go and train the same thing themselves because the documentation made it replicable.
[00:15:48] Simon Maple: Mm-hmm.
[00:15:49] Amanda Brock: So I think you're looking at "What do I need to use this?" if you're a developer.
[00:15:53] Simon Maple: Yeah, absolutely. Let's talk about actually one of the big rumors more recently is about DeepSeek-V4, right? When that's going to come. I don't know if that's going to come.
[00:16:04] Simon Maple: I don't know. But back in the day, a year ago or so, when in fact I think I ran a panel at State of Open...
[00:16:11] Amanda Brock: Yeah, you did.
[00:16:12] Simon Maple: In and around DeepSeek with some of our Chinese colleagues.
[00:16:18] Simon Maple: That's right. And it was just as DeepSeek came out. I think a couple of weeks, maybe a week after. Now, DeepSeek was a model that was, I think, a distilled model for...
[00:16:23] Amanda Brock: Yeah, R1. Not Open R1, just R1. Yes.
[00:16:27] Simon Maple: Yeah. And so, first of all, what is a distilled model?
[00:16:30] Amanda Brock: Yeah. So they took Qwen and they took Llama and used what was in there through this distillation technique to create something equivalent.
[00:16:40] Amanda Brock: Without actually building a model from scratch. And that was super impactful because it took the price of building a model down, apparently, to 5 million instead of a hundred million, so 5% of the cost at that point in time. I think that distillation was really huge in terms of innovation for the time.
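The distillation technique described here trains a small "student" model to match the softened output distribution of a large "teacher", rather than learning from raw data alone. A minimal, generic sketch of that objective in plain Python; this is the textbook idea, not DeepSeek's actual training code, and the logits are toy values:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    z = [x / temperature for x in logits]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions:
    the core objective used to distil a big teacher model into a
    smaller student without training the student from scratch."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2

teacher = [4.0, 1.0, 0.5]   # toy "teacher" logits for one token
aligned = [4.1, 0.9, 0.6]   # a student close to the teacher: small loss
random_ = [0.1, 3.0, 2.0]   # a student far from the teacher: large loss
assert distillation_loss(aligned, teacher) < distillation_loss(random_, teacher)
```

Minimizing this loss over many prompts is how a student can approach teacher quality at a fraction of the training cost, which is the economics Amanda is describing.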
[00:17:01] Amanda Brock: When we look at models now, India has just released, through SRA, open models, which are small language models.
[00:17:08] Simon Maple: Mm-hmm.
[00:17:09] Amanda Brock: And it's quite interesting to see how that progression has worked in the last year. And I think these small language models, you know, that can be used on phones and things, really, that's where we're shifting to as a market.
[00:17:19] Simon Maple: Mm-hmm.
[00:17:21] Simon Maple: It's interesting, and it's interesting when people are saying, "Oh yeah, this is my model," how you actually sometimes don't even realize the models behind it.
[00:17:26] Amanda Brock: Yeah.
[00:18:40] Simon Maple: So I was reading a thread this week, either on Hacker News or Reddit, about Composer 2, which is Cursor's model. I think there was talk about that being derived from, or repackaging, Kimi under the covers, which is a Chinese model.
[00:18:48] Amanda Brock: Interesting. Yeah.
[00:18:49] Simon Maple: Yeah. I don't know how true that is; it was a Reddit thread. I think Kimi actually posted as well, validating it, but then they deleted the post.
[00:19:02] Amanda Brock: So I had quite an interesting experience in December. I went out to China. Several years ago, I edited a book on open source law, and it's been translated by the Chinese open source community into Mandarin.
[00:19:15] Simon Maple: Mm-hmm.
[00:19:15] Amanda Brock: It's 640 pages long. So it took a bit of time to do it, and I was really lucky to get a call with DeepSeek while I was there.
[00:19:24] Amanda Brock: And I'm not allowed to discuss the content specifically, but it was an interesting experience, and it was interesting not just to talk to them but to talk to people across the marketplace there. And you know, DeepSeek is obviously big and so is Kimi. But Kimi seemed to be the thing that absolutely every developer was using in China.
[00:19:42] Amanda Brock: Interesting. Yeah, it really seemed to be running ahead for the developers.
[00:19:45] Simon Maple: Yeah. And that's interesting with Kimi, because of those rumors that it was actually the core model behind Cursor's Composer 2. I don't know how much truth there is to that, but it's becoming clearer to developers that not every model is built from scratch, and models are learning from each other.
[00:20:19] Amanda Brock: Yeah. It has to be.
[00:20:19] Simon Maple: Yeah.
[00:20:20] Amanda Brock: It really has to be.
[00:20:21] Simon Maple: So at what point do you think it makes sense for a developer to use an expensive model?
[00:20:27] Simon Maple: Maybe an Opus from Anthropic or something from OpenAI
[00:20:31] Amanda Brock: Yeah.
[00:20:31] Simon Maple: Versus the cheaper model. You know, these models that are derived from each other because the costs are so low, they're naturally going to be cheaper for the end user.
[00:20:39] Simon Maple: So at what stage does a developer need to make that decision?
[00:20:42] Amanda Brock: I think we're in an odd position right now where, when we look at uptake and adoption, it is actually much lower now than you would expect it to be on the open-source front. And it's very reminiscent of open source software 20 years ago. And I think a lot of it is for similar reasons, which were about risk management, lack of understanding, maybe unnecessary fears, not really knowing what-
[00:21:08] Amanda Brock: The consequences of it being open were going to be. And I think what we saw then were risk professionals, lawyers, procurement, finance, who had to sign off on contracts saying no. And then there were workarounds of procurement, and momentum gathering around GitHub. I think what we will see, rather than a specific engineer or developer making a decision to use it, is an industry shift where it just becomes the norm.
[00:21:35] Amanda Brock: And I think it's an absolute inevitability, and I'm reassured by hearing people like Yann LeCun saying the same thing at events I've been at in the last few weeks. I just think there is no way around it, particularly for anybody who is not from the US or China. And of course, if you're from China, you're probably already using these models.
[00:21:54] Simon Maple: Yeah, absolutely. Do you feel like there's a group? Maybe it's a startup versus an enterprise or something. Do you think there's a style of user that finds open source models easy to-
[00:22:06] Amanda Brock: I think it depends on where you are in the ecosystem. And there's a lot of concern about IP, managing your IP, leveraging your IP, and making money.
[00:22:15] Amanda Brock: Right? And I think this may be the wrong podcast to be saying this, but I think there is a piece where there's no return on investment right now. People are not seeing the returns that they expect to get from AI at this stage.
[00:22:34] Amanda Brock: And they're working out how to actually use it in a practical way when you get to that business level. And I think we're a ways off from people being able to say, "Here is how AI is working for me, and here is how it's saving me cost, taking away risk, increasing my productivity." So I think there are different levels in that ecosystem: the core creators, the development community, and then the end user.
[00:22:53] Amanda Brock: And it's a gradual shift that we're going to see. I don't think it's particularly about anybody being the right user for it.
[00:22:59] Amanda Brock: I think it's about building understanding. I think there's been a quality issue for a long time.
[00:23:04] Amanda Brock: And we see that shifting now. I'm told that there's very little difference in the testing now.
[00:23:09] Simon Maple: Mm-hmm.
[00:23:09] Amanda Brock: That your open-source models are getting to a point where they are good enough.
[00:23:13] Simon Maple: And there's probably like a variation in the accuracy that is needed per task as well. Sometimes you might want to do some prototyping or something just to try out an idea and actually get some rapid development.
[00:23:26] Simon Maple: It's not necessarily production code in that case, so I don't want a slow model that's thinking very deeply.
[00:23:32] Amanda Brock: Absolutely.
[00:23:33] Simon Maple: I want something that will create something fast and cheaply for me.
[00:23:36] Amanda Brock: And that was one of the things I took away from China, to be honest. They're absolutely obsessed with keeping everything lean and reducing compute.
[00:23:43] Simon Maple: Mm-hmm.
[00:23:47] Amanda Brock: And I think we will increasingly, particularly in this world of geopolitics, see people being more and more conservative about the access they have to compute and infrastructure.
[00:24:00] Amanda Brock: Looking at what's most productive. And I think some of the models will start to come into play then. Mm-hmm. I think there'll be a lot more around these small language models and opening up models and sharing more.
[00:24:14] Simon Maple: How much do you think an AI agent can plaster over some of the problems, or maybe the inefficiencies, that a cheaper model will show, compared with what a more expensive model can do out of the box?
[00:24:34] Amanda Brock: That's an interesting way of looking at it. I think we are seeing such a shift to agentic AI because it really starts to deliver what we all thought AI was going to give us.
[00:24:42] Amanda Brock: Right. It's a realer, more productive output that you're getting by applying an agent to what you're doing. I suspect that it is exactly as you're describing it, a sort of sticking plaster that will hide a multitude of sins within the models. I hadn't really thought about it like that before, but yeah, I think you're probably right, at scale.
[00:25:04] Simon Maple: It's like, sometimes if I ask a model a hundred times to do something and it gets it right once, the agent can share that one with me, and I don't have to go through the pain of the 99 problems.
[00:25:13] Amanda Brock: No. But I've also been talking to people recently about abstention and models being trained not to respond when they don't know the answer.
[00:25:19] Amanda Brock: And it's not about hallucination per se; it's just about some of the bullshit answers you get from models because they always want to please. And I think retraining them in that way is quite important. So I think we're seeing shifts there that will actually be quite critical for the business productivity of models and the agentic layer on top of it.
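The abstention idea, a model declining to answer when it isn't confident, can be illustrated with a toy confidence threshold at inference time. Real abstention is trained into the model during fine-tuning, so this is only a sketch of the resulting behavior, with invented probabilities:

```python
def predict_with_abstention(probs, threshold=0.6):
    """Return the top label only when the model is confident enough;
    otherwise abstain (return None) instead of guessing to please."""
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None

confident = {"Paris": 0.92, "Lyon": 0.05, "Nice": 0.03}
unsure = {"Paris": 0.40, "Lyon": 0.35, "Nice": 0.25}
assert predict_with_abstention(confident) == "Paris"
assert predict_with_abstention(unsure) is None  # abstains rather than guess
```

Choosing the threshold is the business decision: too low and you get the people-pleasing answers Amanda describes, too high and the model refuses questions it could have answered.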
[00:25:41] Amanda Brock: I also, speaking to people, increasingly see sectors looking at taking models and refining those models for the sectors and training them. You know, telco is one great example. At Mobile World Congress a couple of weeks ago, the GSMA, one of their industry bodies, has pooled together all the big telcos, all the network providers, and they're working collaboratively to train on specific data related to telco to refine their models.
[00:26:11] Amanda Brock: And you can imagine then how your agent's going to work with that. It's going to become super useful, and I think as that begins to sort of bed in, that's when we're going to really start to see return on investment.
[00:26:23] Amanda Brock: And that's going to be critical to the landscape, right?
[00:26:25] Simon Maple: Absolutely. Yeah. We've talked about things like Llama.
[00:26:29] Amanda Brock: Yeah.
[00:26:29] Simon Maple: Llama, everyone has heard of Llama, I suspect. Is there an open-source model that you think, "Oh, do you know what, this is a really great model; it does so much," but people aren't talking about it or just haven't heard of it before?
[00:26:42] Amanda Brock: I think Kimi is the piece that we are missing in the West.
[00:26:45] Simon Maple: Mm-hmm.
[00:26:46] Amanda Brock: Just from the reactions that I was seeing from engineers and developers around me in China, you know, its pace of adoption was so fast and so great that I think Kimi's probably being underutilized in the West.
[00:26:58] Simon Maple: And tell us a bit about Kimi.
[00:26:59] Amanda Brock: Ah, so what's, I always get this wrong. Are they on their honeymoon?
[00:27:07] Amanda Brock: It's not Moonshot?
[00:27:09] Simon Maple: Is it not a moonshot?
[00:27:09] Amanda Brock: I think it's honeymoon.
[00:27:10] Simon Maple: No, it's a moonshot.
[00:27:12] Amanda Brock: Are you sure?
[00:27:22] Amanda Brock: Moonshot, that's it. I just really struggle with it. So Kimi's a Chinese open source model created by Moonshot, which I think came out towards the end of last year. And I've not used it, but I know that every developer I spoke to said that it is just the best.
[00:27:42] Simon Maple: Yeah.
[00:27:42] Amanda Brock: So I can only recommend it through word of mouth and other people's recommendations rather than my own.
[00:27:48] Simon Maple: Yeah, no, I've heard the same. And it's funny; it was similar when OpenAI were originally bringing out their models and Claude came along, and everyone was saying, "Oh yeah, Claude's the best thing. Claude's the best thing." It's all about word of mouth a lot of the time, isn't it?
[00:28:29] Amanda Brock: That's what works well; this is the thing that people don't understand. So when you are in that open ecosystem, or even an ecosystem, it's about community, right? And I think within engineering over the last 20, 30 years, it's something that's been refined so that it's now a dark art that people understand: how to build community, how to get contribution, how to engage people, how to keep them engaged, and how to be there for maintainers. You know that stuff. If you are not from an open background, you don't understand it.
[00:28:34] Amanda Brock: You think you're going to build something and they will just come. You don't understand that you need to have a user base and people start and that sort of drum signal that everybody hears and everybody's suddenly around.
[00:28:46] Amanda Brock: I mean, Mistral, how quickly did you know about it? Because by the Saturday I was sitting trying to get myself logged in. You know, it's just one of those things that our whole ecosystem converses about, whether it's through social media or through sharing tools amongst each other. Mm-hmm. But there is a very definite way that it happens, and I think that art of collaboration is something that we're going to see more and more of.
[00:29:13] Simon Maple: Let's talk a little bit about people's choices to choose. No, sorry, that's my focus, isn't it?
[00:29:21] Amanda Brock: He's not listening to you.
[00:29:25] Simon Maple: I'm used to that.
[00:29:25] Amanda Brock: Not listening to you.
[00:29:27] Simon Maple: Yeah. Yeah. It's a habit. It's a habit he's fallen into.
[00:29:32] Amanda Brock: He's looking at his watch.
[00:29:33] Simon Maple: Yeah, yeah, yeah.
[00:29:35] Simon Maple: He must be getting a lot of notifications tonight, isn't he? So, given that there are a significant number of open source options for people to pick from, why are most developers still defaulting to closed model APIs as their go-to?
[00:29:50] Amanda Brock: It's interesting. I think that's shifting.
[00:29:52] Amanda Brock: And when you say closed model APIs, do you mean closed models or closed model APIs? You mean APIs?
[00:29:57] Simon Maple: No, no more.
[00:29:59] Amanda Brock: Because they're probably using MCP to get there, right?
[00:30:01] Simon Maple: Yeah, no, which
[00:30:01] Amanda Brock: Is of itself an open protocol.
[00:30:03] Simon Maple: So let me ask that question again. Actually, I think it's closed.
[00:30:05] Amanda Brock: I quite liked the way you asked it.
[00:30:06] Simon Maple: I did. You like that?
[00:30:06] Amanda Brock: Yeah. I think that was quite a good debate.
[00:30:08] Simon Maple: I'm sorry; go on then. You start the answer again.
[00:30:11] Amanda Brock: So I don't know whether you really mean closed model APIs or closed models. And even if they're using a closed model API, they're probably using MCP.
[00:30:20] Amanda Brock: As a protocol, which is open to getting there.
[00:30:22] Simon Maple: That's a great question actually. So explain the difference there between a closed model and a non-open source model.
[00:30:27] Simon Maple: A closed model API.
[00:30:29] Amanda Brock: Yeah. The same, it's whether or not the API is open or closed, mm-hmm, or whether it's something that's freely available. Generally you will get an open API because somebody wants to build an interface that everybody is able to access.
[00:30:41] Amanda Brock: And then when we look at MCP (Model Context Protocol), you have an open standard effectively. Mm-hmm. Which hasn't gone through a standards process. It's a de facto standard, like so many things in open tech, and it's just become the thing that everybody uses because it does what they need it to do.
[00:30:58] Amanda Brock: And it joins things up. It joins the old world up with the new world, and the new world with the new world. But all of that being open means that in that ecosystem, even if you're going from a closed model to a closed API, you're still using something that's open in the middle.
[00:31:14] Simon Maple: Yeah, absolutely. Very interesting.
[00:31:16] Simon Maple: Let's switch a little bit from models
[00:31:18] Amanda Brock: Yeah.
[00:31:18] Simon Maple: To agents. We've touched on agents briefly. Yeah. How important is it that if someone uses an open model,
[00:31:26] Simon Maple: They then use an open agent as well?
[00:31:33] Amanda Brock: I think all of this really comes down to personal preference and choice, right? Mm-hmm. I'm going to advocate for you doing it openly because I think that you are going to build community around it. You're going to build adoption and engagement, and you are able to take technology innovation and iterate yourself on it and share it with others, right?
[00:31:51] Amanda Brock: So I think if you open that up, you're going to build an ecosystem that you're not going to build otherwise. And that's important to most people. But it is like the conversation we were having before; it really comes down to IP. Yeah. And whether people feel that the only way they're going to be able to monetize, if that's their goal, is to hold onto the IP and keep it closed.
[00:32:10] Amanda Brock: You go back to 2023, and there was a famous leaked memo from Google in May '23. It is only the opinion of one person; I have to caveat it with that, and it says, "We have no moat." And what they meant was the moat, the water around the castle, which is normally intellectual property, which will hold something safe, which will keep everybody off, and allow you to charge revenue.
[00:32:36] Amanda Brock: That just wasn't there once things started to open up. And I think that's going to be the case with most things in AI. Yeah, I suspect we're going to see it really open up, partly because we're learning from history. Right. We've ended up with eight companies, I think eight's right, that control the digital infrastructure.
[00:32:56] Amanda Brock: And I don't think anybody wants to see that as the AI future, with eight companies controlling our AI future. So the value of opening things up is something that is much more understood now than it was 30 years ago when we began those big tech journeys.
[00:33:10] Simon Maple: Yeah. I'm going to pause there. I just want to do a quick time check.
[00:33:14] Simon Maple: I didn't check what time we started. I'm guessing 40 minutes
[00:33:17] Amanda Brock: Say again?
[00:33:20] Simon Maple: 33. 33 minutes. Okay. Not too bad. So the only other bits that I was going to talk about, we've done all those questions. Mm-hmm. There were two questions that I was going to talk about. One was going to be, "What do you want to see next?" So, more future-looking. Mm-hmm. What do you want to see next in the world of open source
[00:33:32] Simon Maple: And AI? Yeah.
[00:33:43] Simon Maple: And then the second one I was going to kind of wrap up with was, "What can people expect who are interested in some of the OpenUK events and things like that?"
[00:33:51] Simon Maple: Yeah. Okay. What can they look forward to going forward? Okay. So are you comfortable with those? Yeah, totally. And then we'll do a wrap and do the intro again. Cool. So, what was my first one? It was predictions or what you're looking forward to.
[00:34:11] Amanda Brock: I'll do what you like.
[00:34:11] Simon Maple: Yeah. Okay.
[00:34:13] Amanda Brock: I'll go with whatever you come out with.
[00:34:14] Simon Maple: Okay. Well, not even I know until I say it.
[00:34:20] Simon Maple: So, looking forward now, let's get the crystal ball out and start gazing into what we think is going to happen.
[00:34:26] Amanda Brock: Yeah.
[00:34:26] Simon Maple: What would you love to see going forward in either maybe adoption, but more the innovation around open source and AI?
[00:34:38] Amanda Brock: I'm probably not gonna give you the answer you're expecting.
[00:34:40] Simon Maple: Go for it.
[00:34:41] Amanda Brock: So, about eight years ago, China was very clear that it was adopting an open-source-first strategy as a nation. Mm-hmm. And that was embedded into policy. Then very specific activities and actions were taken across the ecosystem by the Chinese government and Chinese enterprises. And I think if you want to have a successful open-source AI ecosystem future, if you want to be able to compete with what China's achieved on the open-source front and AI, I think you really have to look at that ecosystem and landscape.
[00:35:13] Amanda Brock: And what we need, I don't know if you saw it, but about three or four weeks ago, Feryal Clark, who's the UK's AI minister, made a sweeping statement, which I'm very pleased with, but it's quite sweeping, that the UK is going to become the home of open-source AI. So how is she going to do that? How are we going to make that happen?
[00:35:31] Amanda Brock: And I think we have to learn lessons from China. We have to build on what they've done and do something more. We have to look at what everybody else is currently doing and how the world has changed in that eight-year period. And I think for me, putting those pieces of the picture in place involves things like capacity and skills development and open source.
[00:35:50] Amanda Brock: It involves things like building a national foundation. And I think we need a body that could hold something as a standard, like MCP, that could hold GenTech or language model technology on behalf of UK enterprises.
[00:36:10] Amanda Brock: And I think that's gonna be a big shift, and I'm hoping that we will move towards that.
[00:36:15] Amanda Brock: But I think that model is something we're going to see in every country. There was a press piece this week that described the Linux Foundation as if it were the US National Foundation. Mm-hmm. But we've all been contributing to that, right? And it was talking about a land grab around how the US is trying to own the standards in AI.
[00:36:33] Amanda Brock: China will be doing the same thing. For us to do it, we need the home of that. And I think we'll also see other countries like Germany and France doing something similar. They've both taken steps in that direction without fully doing it yet. Mm-hmm. So I think building the environment for a successful ecosystem at a national level is, for me, the Mystic Meg piece. That's what I'd like to see us do.
[00:36:57] Simon Maple: Really interesting. And I think, you know, we were alluding to it during the episode; models are getting better.
[00:37:04] Amanda Brock: Mm-hmm.
[00:37:04] Simon Maple: People adopting it more.
[00:37:07] Amanda Brock: Yeah.
[00:37:07] Simon Maple: Is there anything in the way, do you think, in the future for people to really grab hold of open-source models and use them?
[00:37:13] Amanda Brock: Yeah.
[00:37:13] Simon Maple: Whether it's in production or more aggressively than they are today? Or do you think it's just a timing thing, that it will happen, but it's natural?
[00:37:20] Amanda Brock: I think it will happen. I think it's inevitable. I think what's in the way of it is, again, going back to 20 years ago in software: it's understanding.
[00:37:30] Amanda Brock: And the people who are able to stop that happening, the sort of naysayers, those with concerns who maybe don't understand the risks associated. I think there's that piece where there are various blockers. You know, you probably saw about a year ago, I think it was JPMorgan's CIO saying that my supply chain isn't agentic. So there are pieces like that that will shift over time as we get more confident. And I think we probably have had so much press in the last 30 months.
[00:38:00] Amanda Brock: Around AI that has caused a lot of concerns for a lot of people. I mean, things like Botborg terrify people. Mm-hmm. You know, if you explain to somebody who's not in tech that you've now got social media for agents and they're off creating their own religion, they think that's HAL, right? Yeah. You know, and you can understand why.
[00:38:15] Amanda Brock: So I think there has to be a sort of shift in general skills, general understanding. People are scared they're going to lose their jobs, so they're resistant to it. I think we have to look at the future of society, the future of work, and the impacts that AI will have, but it's perhaps not as quick as people are panicking around.
[00:38:36] Amanda Brock: And also building that understanding and giving them the training that they need.
[00:38:40] Simon Maple: Yeah. Amazing. And let's pivot slightly back onto OpenUK.
[00:38:45] Amanda Brock: Yeah.
[00:38:45] Simon Maple: What can people look forward to in the next year with OpenUK?
[00:38:49] Amanda Brock: There's a lot going on. Yeah. So we've got ongoing reporting. We did our first international reports at the end of last year with India because we were there for the AI Impact Summit in February.
[00:38:58] Amanda Brock: We did quite a few events there. So we've got an Africa report coming out this week. Um, we are working with the Chinese open-source ecosystem to build one in China as well, particularly focused on not just open source but AI. We have a bunch of reports. We'll do our annual report, all the bits and pieces that you're used to seeing from us.
[00:39:18] Amanda Brock: We have a number of events in Parliament. We just had one last week with the Conservative opposition ministers talking about the needs of open source. I think you'll see a lot more conversation around sovereignty. We have taken State of Open, which is our annual conference that I know you know well, but we've taken it on the road this year, so we've got confirmed dates now: I think the 5th of June in Edinburgh.
[00:39:40] Amanda Brock: And West offices, and then the 8th of July in Cambridge at Pembroke College. We'll do probably three more of those in the autumn. And then we've got our awards launching after Easter. Great opportunity to nominate someone you know, or yourself even, for an award. I think we've got about 10 or 12 categories this year.
[00:39:59] Amanda Brock: That will take place at the House of Commons on the 5th of November.
[00:40:03] Simon Maple: Amazing. And you actually
[00:40:04] Amanda Brock: You didn't pick up on that. It'll take place at the House of Commons on the 5th of November.
[00:40:08] Simon Maple: Oh, 5th of November.
[00:40:09] Amanda Brock: 5th of November.
[00:40:10] Simon Maple: I like it.
[00:40:10] Amanda Brock: Guy Fawkes.
[00:40:11] Simon Maple: Guy Fawkes. Absolutely.
[00:40:12] Amanda Brock: So for anybody who's not from the UK or hasn't done their UK history, the 5th of November is a night upon which,
[00:40:17] Amanda Brock: Many years ago, a gentleman called Guy Fawkes tried to blow up the House of Commons, which we're going to on its anniversary. So everybody has to behave.
[00:40:25] Simon Maple: I'm sure it'll go off with a bang. Amazing.
[00:40:28] Amanda Brock: Oh, you're the first to say that too, I bet.
[00:40:29] Simon Maple: I bet. And your laugh's getting better, actually, when someone does say that.
[00:40:34] Simon Maple: Amanda Brock, it's been an absolute pleasure. Uh, yeah. It's good to see you. And sorry it took so long. We should've done this earlier, but—
[00:40:40] Amanda Brock: We should do it again.
[00:40:40] Simon Maple: It was worth the wait.
[00:40:41] Amanda Brock: Thank you.
[00:40:42] Simon Maple: Thank you very much, Amanda Brock. And all the best with OpenUK. And for people who want to learn more about OpenUK, what's the best place?
[00:40:48] Amanda Brock: So, the website is openuk.uk. stateofopencon.com is for the conference. And then LinkedIn is generally where we hang out. I thought, as we were saying, maybe we need to do more on X.
[00:40:59] Simon Maple: Yeah,
[00:41:00] Simon Maple: Yeah, do check that out. That's a really great community, a very, very welcoming community, and actually very up-to-date with a lot of what's happening.
[00:41:07] Simon Maple: I love the reports. There was a really interesting one that I actually saw in the House of Commons, House of Lords, I think it was, right? With DORA. Yeah, the DORA and OpenUK report, which was excellent. So, really well worth checking that community out. Um, thanks very much.
[00:41:22] Simon Maple: Hope you enjoyed the episode and look forward to the next one. Please do tune in for that one. I'm going to do that again, Tom.
[00:41:33] Amanda Brock: Sorry. Should my laugh put you off?
[00:41:38] Simon Maple: All right. Where shall I start from? Just the "thanks for tuning in." Thanks for tuning in, and we'll see you on the next episode. Bye for now.
In this episode
Meta’s Llama might not actually be open source AI, and the developers building on it have no idea.
In this episode of AI Native Dev, Simon Maple sits down with Amanda Brock, CEO of OpenUK, to break down what open source actually means in the age of AI and why most of the industry is getting it wrong.
They get into:
- what open washing is and why it is happening across major models right now
- how DeepSeek built a frontier model for $5 million instead of $100 million
- why Chinese developers are already on Kimi and the West is sleeping on it
- what the West needs to learn from China's open source strategy before it is too late
Eight companies already control the world's digital infrastructure. We cannot let that happen again with AI.
What Makes an AI Model Truly Open Source?
The term "open source" has accumulated trust over thirty years. Developers know what it means when applied to software: anyone can use the code for any purpose. But as that term migrates to AI models, the definition has become contested, sometimes intentionally muddied by companies seeking the reputational benefits of openness without delivering its substance.
In a recent episode of the AI Native Dev podcast, Simon Maple spoke with Amanda Brock, CEO of OpenUK and a veteran of open source legal frameworks dating back to her time at Canonical in 2008. The conversation cut through the marketing language surrounding "open" AI to examine what openness actually means for developers building with these technologies.
Disaggregating the Components
Rather than trying to define whether an entire AI system is open source, Brock advocates examining each component separately. An AI system might include algorithms, weights, training data, documentation, and increasingly, agents. Each component can be evaluated independently against established open source criteria.
"My preference is that we disaggregate the technology," Brock explained. "We don't try to define something that's so emerging, and we look at AI as it evolves."
The practical test remains the same as traditional open source: is the component licensed under an OSI-approved license like Apache, MIT, or GPL? If the algorithm is openly licensed but the training data is not provided, that represents a partial opening. Developers can make informed decisions about what they are actually receiving rather than accepting blanket claims of openness.
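That component-by-component test can be sketched in a few lines. Everything below is illustrative: the component names, the license strings, and the small allow-list of OSI-approved identifiers are assumptions made for the example, not any real model's release terms.

```python
# Illustrative sketch only: component names, licenses, and the allow-list
# below are invented for this example, not a real model's release terms.

OSI_APPROVED = {"Apache-2.0", "MIT", "GPL-3.0-only", "BSD-3-Clause"}

def openness_report(components):
    """Evaluate each component of an AI system separately.

    `components` maps a component name (weights, training data, ...) to its
    declared license identifier, or None if nothing was released.
    """
    report = {}
    for name, license_id in components.items():
        if license_id is None:
            report[name] = "not released"
        elif license_id in OSI_APPROVED:
            report[name] = "open source"
        else:
            report[name] = "restricted (" + license_id + ")"
    return report

# A hypothetical "partial opening": open code and docs, custom-licensed
# weights, and no training data at all.
example = {
    "inference code": "Apache-2.0",
    "weights": "Custom-Community-License",
    "training data": None,
    "documentation": "MIT",
}
for component, status in openness_report(example).items():
    print(component, "->", status)
```

The point of the per-component report is exactly Brock's: rather than a single "open" badge, a developer sees which freedoms they actually receive for each part of the stack.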
This disaggregation matters because AI systems are evolving rapidly. When organizations first attempted definitions two years ago, they focused almost exclusively on language models. That framing already feels dated as robotics, embodiment, and agentic systems emerge. A component-based approach adapts more gracefully to technological shifts.
The Llama Open Washing Case
Meta's Llama release in July 2023 became a pivotal moment for the industry, but also a textbook case of what Brock calls "open washing." OpenUK supported the launch as "open innovation" but deliberately avoided calling it open source because the license contained two critical restrictions.
First, an acceptable use policy limits what developers can do with the model. Second, a commercialization provision requires organizations that reach a certain user threshold to negotiate a commercial license with Meta. Nobody knows the terms of that commercial license because, according to Brock, no one has triggered it in a trackable way yet.
"When you go back to that basic principle of open source, that anyone can use it for any purpose, that just doesn't happen," Brock noted. "You've always got the risk of this restriction."
The distinction matters because open source trust enables cascading innovation. A developer who builds on truly open technology can share their work knowing that downstream users inherit the same freedoms. With restricted licenses, that chain breaks. You cannot build confidently when the foundation might require commercial negotiation at scale.
Brock suggests Meta's recent signals about moving away from openness stem from this structural problem. Without genuine open source, they failed to build the ecosystem and community that open source typically generates. The reputational benefits of calling something open do not compensate for the missing collaborative dynamics.
DeepSeek and the Documentation Difference
The conversation turned to DeepSeek R1, which demonstrated a different approach to openness. While DeepSeek did not provide its training data, the documentation was comprehensive enough that Hugging Face built Open R1 within days by training on equivalent data themselves.
This points toward documentation as a critical factor in evaluating openness. A model that cannot be replicated offers limited practical benefit regardless of how it is licensed. DeepSeek's approach enabled meaningful iteration even without full data access.
The distillation technique DeepSeek used, building on Qwen and Llama to create something equivalent without training from scratch, also reduced costs dramatically. Reports suggested roughly five million dollars instead of one hundred million. That cost reduction itself enables broader participation in model development.
Kimi: The Model Western Developers Are Missing
When asked about underappreciated open source models, Brock pointed to Kimi, created by Chinese company Moonshot. During a recent trip to China, she observed that virtually every developer she spoke with was using Kimi, describing adoption as remarkably fast.
"Kimi seemed to be the thing that absolutely every developer was using in China," Brock observed. "It really seemed to be running ahead for the developers."
The gap between Kimi's popularity in China and its relative obscurity in Western development communities suggests information asymmetries in the global open source AI landscape. Rumors have circulated that Cursor's Composer 2 model may actually be Kimi repackaged, though the specifics remain unclear. Either way, the model represents capability that developers outside China may be underutilising.
Why Adoption Lags Despite Quality Improvements
Despite open source model quality approaching parity with closed alternatives, enterprise adoption remains lower than expected. Brock sees parallels to open source software adoption twenty years ago: risk management concerns, lack of understanding, and procurement processes that default to rejection.
"It's very reminiscent of open source software 20 years ago," she said. "A lot of it is for similar reasons, which were about risk management, lack of understanding, maybe unnecessary fears."
The shift will likely happen at an industry level rather than through individual developer decisions. Just as open source software became default infrastructure through gradual normalization, open source AI models will probably follow the same trajectory. For organizations outside the US and China, the economics make this particularly compelling. Building frontier capabilities independently is prohibitively expensive, making collaborative development through open models an economic necessity.
Agents as a Compensating Layer
An interesting thread emerged around how agentic systems might interact with model quality. If an agent runs a model a hundred times and surfaces only the successful result, it effectively compensates for lower accuracy. This suggests that cheaper open source models combined with agentic retry logic could deliver outcomes comparable to expensive closed models for many use cases.
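The best-of-N retry idea above can be sketched quickly. The model and the verifier below are stand-ins of my own (a seeded coin-flip "model" and an exact-match check); in practice the verifier might run unit tests, a schema validator, or a judge model, and the per-attempt success rate would depend on the task.

```python
import random

# Sketch of best-of-N sampling: run a cheaper, less accurate model several
# times and surface only the first attempt that passes a verification check.

def flaky_model(prompt, rng):
    # Stand-in for a lower-accuracy open model: "correct" ~30% of the time.
    return "correct answer" if rng.random() < 0.3 else "wrong answer"

def passes_check(answer):
    # Stand-in verifier: in practice, run tests or a judge over the answer.
    return answer == "correct answer"

def best_of_n(prompt, n=100, seed=0):
    rng = random.Random(seed)
    for _ in range(n):
        answer = flaky_model(prompt, rng)
        if passes_check(answer):
            return answer  # surface only the successful result
    return None  # every attempt failed the check

print(best_of_n("some task"))
```

With a per-attempt success rate p, the chance that all n attempts fail is (1 - p)^n, so even a modest p yields near-certain success at n = 100; the trade-off is paying for up to n inference calls, which is where cheap open models change the economics.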
Brock also noted growing interest in training models not to respond when they lack knowledge, rather than generating plausible but incorrect answers. Combined with domain-specific fine-tuning (she cited telco companies pooling resources to train specialised models), these approaches could significantly improve the practical utility of open source models for production applications.
Building National Open Source AI Ecosystems
Looking forward, Brock emphasised the need for national-level infrastructure to support open source AI development. China adopted an open-source-first strategy eight years ago with specific government actions to support it. The UK's AI minister recently declared ambitions to become the home of open source AI, but achieving that requires learning from China's ecosystem development approach.
She advocates for national foundations that could hold standards like MCP, maintain shared model resources, and coordinate capacity building. Without such infrastructure, countries outside the US-China axis will struggle to participate meaningfully in AI development rather than simply consuming what others produce.
For developers evaluating open source AI options today, the practical guidance is clear: examine licenses carefully, distinguish marketing claims from actual openness, evaluate documentation quality, and consider the entire component stack rather than accepting blanket characterizations. The trust that makes open source valuable depends on precision about what that term actually means.
The full conversation covers additional ground on sovereignty concerns, OpenUK's upcoming events across the UK, and the trajectory of small language models. Worth a listen for anyone navigating the increasingly complex landscape of AI model selection.