CtrlK
BlogDocsLog inGet started
Tessl Logo

ainativedev/latest-aidevcon-speakers-london-2026

AI Native DevCon 2026 London — all conference sessions as interactive skills

66

Quality

83%

Does it follow best practices?

Impact

No eval scenarios have been run

SecuritybySnyk

Risky

Do not use without reviewing

Overview
Quality
Evals
Security
Files

transcript.mdtalk-douglas-training-ai-on-your-own-code/

Transcript — The beginner's guide to training AI on your own code

⚠️ Speaker-label warning. This transcript has no per-speaker labels and was produced by speech-to-text with significant garbling of technical terms. Speakers present:

  • MC — unnamed conference host who opens and closes the session.
  • Brian Douglas — the presenter; speaks from "Good morning. I'm Brian Douglas" to the end of the talk, including answering the Q&A question.
  • Audience member(s) — one rapid-fire question at the end about Codex support.

The MC's opening covers lines 1–18 (everything before "Good morning"). Everything from "Good morning. I'm Brian Douglas." onward is Brian, with the exception of one brief audience question near the very end ("Yeah, it's a great question." is Brian's response to a question the transcript did not capture verbatim).

Common transcription substitutions to watch for: "tapes" ↔ "takes"/"taste"; "Claude Code" ↔ "quad code"/"clog code"/"clock code"; "Codex" ↔ "codecs"; "steros" ↔ "stereos"; "Qwen" ↔ "Quin"; "Sweeper Agent" ↔ "super agent" (Brian jokes about this himself); "SFT" ↔ "specialized fine tuning". When citing, preserve the artifact verbatim and clarify in [brackets].


Section 0 — MC intro

I swear. We're close. Hi. Everyone. Come on. In. I think. We have. Let me not. Look at my watch. When I'm holding. Water. We have. Four minutes. So. No rush. If there's a seat like towards the wall, could you. Scooch? I know some of you might. But. Like I said yesterday, it's a service for the introverts at this conference. I'm pretty sure there's at least. Two of them. Thank you so much. And also while you're here while we're waiting, please check these app, check your plans for the day. A couple sessions have moved rooms. And there are a lot of stairs in this venue, so you don't want to pick the wrong room with two minutes to go. Okay, we'll get started in a few minutes. Yeah. It was hard. To. Have to talk. In, like, a different. Lane. Yeah. It was. Very, like, spread out. In the basement, but. In the lavishing of the city. Like, travel. Ing a lot. Okay. But I think that is. I think that it's probably everyone. A couple of stragglers. I say stragglers. They're not even late. Anyway, we're gonna go ahead and get started. This is Brian Douglas. He is here to talk to you about training AI on your own code. I am going to be listening intently. Because I would like to know how to do this. Round of applause for Brian.

Section 1 — Brian's intro & disclaimer

All right. Good morning. I'm Brian Douglas. He'll like bw on the internet. You might know him from the internet. I used to get him for about five years. I've been on your content and stuff like that for a long time. But as of recent, started this new company called Paper Compute. I had nothing to sell you. I'm here to talk about some open source stuff that we've been doing the last couple months. And really, if you guys talk to us about the booth, so we have the blue dubstairs and the expo hall, you kind of got the earliest version of this. And a bunch of other details. So this should be a speedrun. This is not a workshop, so don't expect to, like, step one, step two, step three. I'm going to just share my journey of how I learned this. I'm more than happy to talk. Actually, I have a blog post on this that has step one, step two, and thread step three. But I'll show that later.

Section 2 — Pokémon case study setup

Cool. I'll start with pokemon. I said, who knows what Pokemon is? Okay, excellent. So I am an elder millennial. This came out when I was a kid on the game boy. It was a big deal. I spent a lot of time playing pokemon red. And I will show you. This is what makes sense eventually, but I'm going to show you how to speak from pokemon and how I sort of navigate to that. It's what kind of led into sort of training AI in your code. So actually, it also practices this also paying me from a quad as well. So usually I have your own discretion. So case study, there's a Pokemon game, Game Boy, Pokemon red. I read a paper, and I'll talk about this paper in a second, but I wanted to be able to, like, test out some of the stuff we were building open source. One thing in particular, if you went out to the booth, it takes, takes that dev as a way to capture aging sessions [tapes.dev — captures agent sessions]. And I wanted to basically validate the stuff that we were doing at scale. So, like, an intense amount of scale. So the idea here is that what we basically speed for a pokemon to capture a thousand turns per session, and then each session basically epic. And then every time I ran through a session, I would actually capture that data and go learn. That's what starts us from some kind of loop. So one of the actually prefaces actually leads into the bunch of stuff. It's kind of talked about soon. But the idea here is that we have these two things that Paper Compute open source. Once tape top dead [tapes.dev], we spent a lot of time upstairs talking about turning your dev tokens into skills using tape sessions. But the thing we didn't really talk about was stereos [steros]. And stereo is we're not really chatting about here at conference because if you're interested and you have that problem, like we talked to about it, but most folks are really just like running agents on their own macbook. If you're really, really skilled, you're probably using a cloud environment or a sandbox. But like we're all kind of this learning, we're pretty early around the trajectory of curve and flexion. But stereos [steros], it's a runtime for agents. So like a sandbox would actually battery included. Cool. So disclaimer, I mentioned this earlier when I started. I had nothing to sell you. This is all stuff that I open that this is the open source within our organization. So the only way it can pay us is through a five star rating and then review. I don't know if we're doing reviews at this event, but I appreciate that. We are. Oh, we are. Well, there we go. Bottles of water at the, at the end.

Section 3 — The Pokémon agent setup

So this is the setup that I have for pokemon. Bear with me as explains because this all connects. Pokemon is a gay. We started as a kid in your house. You wake up late and you have to go professor host cabin to get your first pokemon. So I thought, hey, if I can, like, use a couple things. So pyame boy, which I didn't actually explain a couple slides ago, but pygame boy is a python library that you could play game boy emulators in games. So if you, if one of them deep in the lore, like you get up romps, these are games that have been flash that you can then play on handheld devices or on your computer. And with pi gateway, you can play any gameboy game headless. So meaning I don't have to watch it play. I can just let it run, like, through an agent. So then that kind of unlocks and they're like, cool. Well, I can use quad code or codex [Claude Code or Codex] to basically run pygmy because quad code is just a harness. Because it says code, but it's really, it's a pretty general report that's harnessed if you do almost anything with. So I end up doing this in a sense of, like, I knew I want to run the agent thousand turns, but I also want to record those sessions and identify the learning patterns. And then every time, if I got blocked or if I hit a rope, I couldn't go over a bridge. That would get recorded. So this is actually live stream. This is actually spray shots of the game playing. So what I did is never 10 turns, take a screenshot. So that way the h can always go back in the water and find out what happened. So when they get blocked, you can identify the stating is like, it's this character is just not quite getting to conclusion. And my only conclusion was zero to pokemon. So it's very beginning of the game, you start kind of get to professor bodes happen. So battling, there's no conflict. They just get there, pick a pokemon. And I just wanted to get to that point.

Section 4 — The "politely hallucinating" failure

But the challenge is like, ran this, like, for hours and would not basically get to anything. When I found out is I forgot to tell agent to talk to people in the game. Because one of the first contact clues in the game is talk to your mom. And your mom will tell you exactly where to go. But an agent would have no idea. So one of the rules I actually had is I can't use the internet to go over and learn how to play pokemon. It has to be self learned within the ecosystem that I'm giving it to player. So the agent was, it wasn't learning. It was just politely hallucinating progress. It would actually celebrate wins and like, hey, we found this thousand turns. We did it. You know, absolutely right. You guys know that from clog [Claude]. Basically, this was back in February. So, yeah, this would have been like five, four codecs [Codex]. And like one of the offices.

Section 5 — observation.md & observer-state.json

But. What I'm getting at is like, this was like a nice construct for me to take that context and go serve it back to Asia [the agent]. So a lot of folks we talked about stairs [steros], like they have a skill that they kind of sit and look at their session weave and then they summarize like stuff that's been done. Good for you. That's great. You're, you're definitely like top notch when it comes to like using agents. But I don't do that. So instead I just dump all the context into a taste [tapes] database. So at the time we were using SQLite, we've actually looked on the postgres for limitation reasons. SQLite has single player methodology. So only one person can write the database. This setup was actually 10 Pokemon agents running simultaneously, thousand turrets each and we're constantly paying back. So it takes database, which all the raw data. So this is the log for agents. And then I have this memory folder that I, this concept was from master block post observational memory. And this is like taking your thousand sessions and then writing observation. The thing about the end of the day, you open up your journal and you're like, man, I saw this amazing talk from Brian Douglas from San Francisco, California. It's amazing. And it breaks your mom, his glasses. I'm just kidding. I don't know if you can just have the end of the day, but like these are observations at the end of your sessions. I was just dropping these in markdown. So these get written by the agent. The agent has these human breathable feedback. So things like when I found out, oh. The MDC has no idea. Like, we've not talked to any of these things. We have no context of the awareness of the entire world. We should build that into the, into the loop. And then also this observer state JSON. This is something I actually discovered later. This is a very specific post of mine. Like this could live something different in your code base if you're approaching this. I'll show you how to do that. But it's like up down left right. So like you go up to out one, the things actually work through the observer is that when you go through a door, it takes seven seconds. For a cooldown because you can't stay on back and forth through caves. Babylon's playing jrpgs or any sort of games. Yes, no. Yeah. So I just can't stand back and forth and like refresh and like go like level up and kill a bunch of enemies that way. So that was something I actually learned through the observer MD as I was like looking at the doctor and then chat with the agent and say, hey, well got here, what's happening? Like what do you think? Like what's this chat through? And usually like Socratic method to have that move.

Section 6 — Kafka + anomaly detection on the Pokémon loop

So this kind of concludes the bookmark section of this talk. But this all live on pokemon kafka actually did have 10 simotase agents also record the kafka and then capture anomaly detection. So a lot of anomalies is like not going through doors. Actually when I got to the point of battling, that was like some weirder novel days of like okay when you're below a certain HD you should probably eat a berry or you should probably switch pokemon. There's like this a lot of nuance that I had to like package and learn about the game. And I guess I won't have to like pokemon this too much. But the goal was I ended up getting to the point where I can fire it like viciously hit a self healing infrastructure loop to eventually get the three seconds of pokemon. So gain starts spans the a button and then a is the name of the character. A is the name of the pokemon and the speed runs to get the pokemon for professor oak. But at that point through the observer I ended up figuring out that don't hit a when you take your pokemon hit b because they asked you yes or no and you had it be to get out of that.

Section 7 — From Pokémon to codebases

So it's just a lot of like nuance but again like we think about our code bases things the best stuff we've been shipping for like 15, 20 years, maybe less than that five years. Like there's a lot of nuance that we know about our code base or they have learned and complexity we've created with code base that you kind of learn on the fly. But lately we've been basically this like shoot from the hip and just like we got to do every five hours session. So we're just going live next like you got refreshed, got more token window, let's just go. But we never learned from the last five hours of sessions. And that kind of leads me into this like brute force discovery and like massive learning. So it's a great way to like understand code base, search and grep identify files that you should be understanding things like writing a blog. Like I like writing blog posts about code bases I don't know. It's like one of the more fun investing things that I've been doing. And it's mainly I'm not going to write this code, but I do need to know how this code works and you know how to like navigate this and like discuss it with engineers. So what I ended up developing is like cheap simulator to identify some of these best practices that are coming out of this like card packing papers and stuff like that. So I encourage you like it's open source like take this run with it and not go play a different game.

Section 8 — Super Agent & Sweeper Agent

But what I'm getting at is like we are now in a place where we're all leveraging AI. We're all here at this conference. You didn't pay for a ticket because you were using AI or you're not using like you probably sign up today. And then catch up in one of the workshops. But quad code storage sessions codec storage sessions on your machine for 30 days. After 30 days is deleted. So as I walk through all this like pokemon navigation of like infrastructure and learning like all the sessions are also in my machine from because I use quad code that's my harness. But the value is all being lost because we're just moving on to the next five hours and like the next window the next day or the next month. We take the next $200. I guess you pay 180 pound. Is that what the price is that convert that rate. But what I'm getting at is like we're paying for this and we're just renting the tokens. And then we get no output from it. So there's a lot of opportunity for us to figure this out. So all the pokemon turned into what I call super agent. So super agent I thought would be funny to say that out loud sweeper agent. But it goes to our code base in this sweeps there. So I had a bunch of code that I shipped last summer. I was by coding and I never touched that right now. So I'm like, I've got a lot of garbage code. What if I went through and like this linked it and like massively fixed a bunch of stuff. Or what if I ended up like setting up like 10 parallel agents to write documentation or create a context like budget spectrum develop context of engineer figure this out. So that was super agents and it basically takes the separate VMs which is stereo [steros] that I talked about earlier, but then also PC [Paper Compute] has a taste [tapes] session being recorded. So then I have like now 10 take sessions of data that I can now have to one generate skills, but also take a documentation, context.

Section 9 — The data-value argument

Okay, so the sect about fine tuning a model. But this is a mass amount of data. If you're working in enterprise for a company and like you're paying for this stuff, you should be extracting value. I know some people are in a position where their company specifically, they have a deal with their data is being trained by improv picks. So you get a huge discount like that work for any property. But if you want to get the data that dropping like they'll give you a discount, that's expensive. That's a really good deal. Cursor currently is in a deal for $10 million at minimum with SpaceX. Like that's the high point. I've actually been working with a lot of companies in the last year. People are selling your data. So using, I don't want to name a bunch of companies, but like they are selling the data because it's expensive. So you should, you should type in yourself. And you should do something like that. So that's what I'm here to say. So sweepers also open source if you're interested in taking a look. There's a blog post on both the pokemon and the super agent stuff. Highly recommend this table. Look at it. One thing I'll also say is like the agent. S, they don't always think that that model. So like we've got opus or eight as of like two days ago. That's great. It's working pretty well. It's better than four seven apparently maybe. I don't know. Who knows where past performance is not to make their future growth. But sometimes you can use the cheaper model. For certain things. Like when I talk about sweeper agent, I choose haiku at scale 10 different agents and that can crush that. Because it's a very bespoke like rudimentary up down left right type experience. So now you have an opportunity to like use a cheaper model and have to get the job done. Maybe it takes longer, but like if we're using background agents, that would be kind of like none the wiser.

Section 10 — Aside: auto-research, Qwen 3.6, and getting unbanned from Claude

So. This is, I think the other broad damage, but like a Paper Compute, like we capture traces. And the beauty of this is the ever like took your school book and wrote a little picture on the bottom corner and like had like little stick man and you like fold it. Like you kind of trace back and forth and you have this cartoon. Like this is kind of what we're doing with all of our sessions. Like we fully figured things out. We've like innovated something whether we wrote the code or the agent wrote the code. We innovate something that we should keep as a novel concept that like should share and transfer that knowledge. And I also want to mention that I was exploring this a couple months ago. The same week that I finished pokemon thing is the same week that the auto research repo dropped. So everything I explained is basically this car capti has a repo about a lot of research. Quin 3.6 [Qwen 3.6] has auto research built into the opening model. So it's a concept of self healing and the self field of infrastructure exists already in a Chinese model today from Alabama [Alibaba]. So I say all this because I did get unboxed from cloud [unblocked from Claude]. So as we remind you that quad did block makeup for running 10 parallel super regions. Turns out if you run a blog post and you know them and say, hey, sorry about that. I was actually running with the sweeper agent. I had an old repo. I ended up in less than an hour. I fixed all lint errors that I was avoiding for years. This the project has about 100 users on it and like this didn't happen. It couldn't be bothered to fix this project. But instead of running super agent completely and bring it up to modern standard times six months later because it's a segmental product. So this is like kind of like what we're thinking about with tapes. Like there are opportunity for collecting the trace data, but anomaly detection. So common section of like, hey Tessl calls had failures. Like what happened? But another anomaly was actually success. So if you have 26 skills and vocations out of like 10 sessions like there's probably something good happening there. And a lot of us are like we're not even like looking with the like fine tuned of identifying anything about it. Like we're kind of like shooting again. I'm shoot from the head. We're like cool prompt me no mistakes clean code. Go. And there's a lot of stuff we can learn for even a simple problem like that. There's a lot we can learn from that one prompt. So that's why it's important for keep these trace sessions. The unblockly after the block goes took about 12 hours. Pretty amazing. It's all robots that answer emails but apparently the blog post could have gone.

Section 11 — Tapes architecture: Merkle DAG of sessions/turns

So we went up the stairs talking about this story about turning a dead token to seals [skills]. If you haven't quite got this concept yet and I kind of explained this it's like again you're you're using tokens. You get a five hour window for every time you get a new refresh of like access to tokens. You get a weekly allowance of tokens despite the fact that you're typically saying you get to loop the allowance and you can pay for extra time and for task time. So even for a situation like man should I turn on clogged code fast? Or should I pay for overages like this is the value there. There's a lot of opportunity to understand that just by looking at the data. So I want to talk about this T skills [tapes-to-skills] pipeline and I'll get to sort of the fine tuning story in a second. So the wind is down. But takes collective sessions, generates deals [skills]. A lot of folks ask us like how this works. So my co-founder in the back, he's actually the one that designed all this stuff but we have like a marble deck [Merkle DAG]. So you go get every git commits sits within a bag. So you can go and have hash right your commitments. And they have these branches and like looks like this. Um so every session has turns and I was talking about a thousand turns within a session for like my pokemon experience but it's the same thing for your session. So every time you talk tight clawed codex, olama [Claude Code, Codex, Ollama], whatever, that's your session. Your session has begun. Cloud's a little weird where they start splintering sessions in a weird way and like we'll figure out how to like display that and like in a sort of a way eventually. But just consider just very rudimentary each session you type quad [claude] that's your session. When you do clear that's a new session. When you close it, you start again that's a new session.

Section 12 — "Check the tapes" skill

So once you have a concept of like cool everything I've said in this session is now stored in a trace. Now you can do some cool things of like I've got a skill called check the tapes. So again data storing machine for 30 days. I can go back six months and say check the tapes. Like how do we get here? Like why did this happen? We can get blamed but who's actually using complaining? Cool. Okay, what were you using? Sorry. I apologize. But I'm going to take away. I think it's different in like six months because I've been building a bunch of tools that look like this way that perform much better. I built a tool called on top of land cd complicate blamer and it's just basically this. Uh it was very bad work close on it but it wasn't great. So I didn't use it anymore. But if I check the tape and say hey give me all the sessions that I like solve the door cooldown window for pokemon. Like how do we how do we like turn that into a story? What I did actually recently is I built entire wireframe of our actual cloud product and we have designers became one of the contract and she's like hey can you share your comps? With it? And I'm like I don't have any prompts like who saves a prompt? Copy it because then you don't want to see my prompts. But I had a taste [tapes session] so I could say hey based on these wireframe let's open up github issues with every single feature. And then every feature has the prompt, it has the intent. It has a token usage. It has like all the quality of that designer would need to then go take to design the product. And I had a very, very rudimentary wire frame. I'm front end developer by trade but I sort of shag in crop coded that thing to what I do.

Section 13 — TapeDeck UI

But having a skill that's called check the tapes. You can go look at deeper into the database and the way this works is also you have like all these sessions that have been charted. And we have we well I say currently we're not labeling the data right now but at the time we were doing this for take away we were labeling the data so I can have actually a labeled data of like this is type permanent stuff. This is my stuff which was super helpful. What I'm showing briefly for folks who asked enough questions upstairs is take. So Tess deck [TapeDeck] was like we had open source. There's no sign up. This is all built on the command line. And you can actually have a visualization of probability of your tape sessions which is why it's called tape step [TapeDeck]. And takes to being the most durable form medium. So I can see all my sessions and then I can sort of click in and then see all the input output settings all the prompts. Like if all the raw data and I think a lot of folks have asked a question about, well what if my prompt has sensitive information? Like what if you didn't put sensitive information in your prompts? But this is all on you. So like this is all your data and it's all the data on your machine. So like how you treat that data, how you sanitize it. Like it's a it's open for the session and open source, but like for now it's all pretty raw data. So don't put secrets in your clock code [Claude Code]. But we'll get there. So yeah, this is like just looking at the ever same old, so it's hard to see, but these are all tool calls. These like cogs and every pencil is a skill in vocation. So like this is you can actually visualize like how often skills are being called and how often tool calls are being leveraged. And this is actually about permitted in our, sorry take this document or documentation and take that.

Section 14 — Generating skills from tapes

Take scale is a feature that we have a one blog post on it. We have to know what's going on. We got until here at this conference. But because we have everything stored in a database, I didn't mention it's also factorized as well. So you can do natural language. You can say take search. Whatever strain of text you're looking for and it will bring all the sessions up of that search. So if you want to take that session with the output of all obsessions that you've searched and say hey create a skill. And this you can do with like a small model. I've actually been using, I'm a bit of a, I'm a cheap person so I've been using 40 [GPT-4o] as my main driver for models because it's a year old and it's super cheap. But you can generate pretty decent scale [skill] like first pass. I do recommend that you want a newman [a human] to like go clean up like the skill itself. And like go read it as a human with your human eyes to then identify if it actually leverage. But once this skill's up, the skill works. It gets invoked. Then you can use the tape sessions and identify how often it gets invoked or whether it's working. And then I guess you could probably get to the point where you use Tessl to identify scorecard, whether the scale is actually useful, which I'd love to talk to the Tessla [Tessl] people about that next.

Section 15 — The book analogy for SFT

But I say all this because I moved into eventually their skills. And now I want to get on this path of like what if I train the model? I had the benefit of working in this company called continue. We'll retrain the model called next edit. So back when tab completion was so cool, we had a model that we trained using specialized fine tuning [SFT]. It's the exact same way that cursor composer 2 has fine tuned your data. So if you're a pro user of cursor, congratulations. The data is now being sold to space mics [SpaceX/Anthropic — likely a STT garble of an earlier company name]. But that same pattern is basically where I'll show you in a sec. So this is the mark post I wrote about this and this will have more detail when I'm about to go to. But I had 77,000 tape sessions. So this is me just starting your core as of January and everything that I wrote, I was just recording it and I take database. And what's super cool about this is like we're in London, right? This is like the home of king Floyd [Pink Floyd] and Rolling Stones and the Beatles. Did I get all that right? Okay. Yeah. But anyway. Like what I play music and when I worked the old around this time is like they actually intently use less tracks. They actually get everyone in the room and they play live. And it's like I never bought my client listen to Beatles or anything like that until I got older. But it's got one of those weird like deep radical about this. But sometimes you can just do more or less. But right now I have a lot. So 77,000 TH [tape] sessions a lot. It's about four gigs worth of data. We're doing about a gig of data a week. So your storage sessions on your sheet and get about four gigs. If you want to push it up to a year, you're probably like approaching 50,100 gigs of this data that's sitting on your machine, which you've never looked at. And maybe a hypoStrang chatting with on Fridays. So the way I'm going to explain this is two different ways specialized fine tuning. We're not going to go super deep. I'm going to mention some terms. You can go Google and read papers later. But specialized fine tuning is the process that cursor used for composer two. DPO, which is the right preference optimization [Direct Preference Optimization]. Is a bit more specific. And we all just like break this down into cumulative is the tool that folks would probably use pytorch or unswapped lift [unsloth]. And you could fine tune again on your own data. And essentially do this. So if you think about model like a book. So every model's a book. It's got like lots and lots of motivations. They like co-located and took place. This is why you can search for pokemon and search for like big restaurants. And the same chat to be conception. SFT specialized fine tuning is as if you were to write your skill in the margins of the book. So you're now adding extra context and extra notes into the book. Does that kind of make sense to you all? Cool. So I don't write a lot of mint to my book. I haven't been school in a long time. So like better wiser. I actually think kind of annoyed when I get a used book and has a bunch of regular notes. I'm like, who is this person and why is this here? Maybe they were famous. Some of them. But that specialized fine tuning. And what I ended up doing is taking Qwen 4B. So it's a very, it's a smaller model. And I had the idea of like, what if I embed the scale [skill] into a small language model? And the results were actually really good. Uh tipp us. I didn't spend a lot of time. I didn't have a lot of data and this is all my skills. Actually, this was my all my skills. This was me and three other team members skills. And I was able to embed the scale [skills] into the bar [model]. Now you're asking questions like why would you embed in the bottle and why would you not achieve the skill and then both had hardest. And I would say why not? And I would say really just going down this nervous night moment. Like I'm just going to test it out. It's great work. It actually works pretty decently for a small language model on your MacBook or on your Lenovo or whatever your lanux [linux] machine works pretty good. So think about the use case for this is like it is the Pokemon use case. It is onboarding to a random project. It's onboarding into some random documentation that you have to go swap through. It's a very, very specialized small language model problem. It is not your daily driver. So like if you try to replace crop code [Claude Code], you will not. Now if you try to get signed equivalent interactions, you will not. But if you're just trying to like prove a point and do a cool thing for something that's very bespoke if you're working on a machinery or you're working with like weird languages that have bespoke situations like memory management. This is an approach and I talked to a couple enterprises who are approaching it in this manner.

Section 16 — DPO: the Cliff Notes / Matthew McConaughey aside

Uh, I will mention the DPO, which is reinforced force net warning [reinforcement learning], which kind of looks like if you have two choices, it will always pick the best choice no matter what. So this is actually really expensive and I'll just get that out of the way. Uh, I don't recommend doing this ever. Unless you're a researcher and you work at meta, I don't know that guy's here. But like this is probably a thing that costs a lot of money that you have very expensive problem for that you have sort of reasons to use this. And I say this because I was actually doing this all in a 4070 RTX. So local gaming rig specialized kitting works perfect. This could not run. It needed 24 gigabytes. No, sorry, 832. I had 24. So I had to upgrade my, I upgrade to a 32 gigabyte 5090. Which I shamelessly did. And the results were basically middling. So lucky enough I actually got a friend who was at Nvidia. I will not name their name. I think it's an h100s. And sorry, the other thing I was mentioning is like cliff notes. So if anybody's quick notes left hand here. Or just American dropouts, so we use those things. So click meds if I explain this. This is the book you get instead of reading the actual college textbook. So you will get the cliff note and then you'll take the test. Not recommended. But it's like if you just want to skip the line and see like, just take the test and learn to click notes. Um, so that's what DPR [DPO] is. That's fascinating. It's culture. Culture shock here. This is why university costs so much to the US.

Section 17 — Hardware results

Um, so results DPO on a 4 million parameter [4 billion parameter — Qwen 4B]. Not good. Not even worth it. Don't even try to be like an expensive. Like you'll probably spend $1,000 a week in compute just to get something that didn't work. Out of 7 billion parameter model. Yeah, go for it. It also very expensive. So I actually was just doing this because I know I had this talk and I wanted to like kind of round up this talk into a conclusion. But what I'm doing next is I'm actually meeting with folks who actually know how to do research. And we'll figure this out together. But this is more of a side project for me. So with that said, DPO, I'll write, all right. All it does is pick the rights selection every single time. All the time. Anybody watch this movie? Is this an American culture thing? Just the American in the back. All right. Matthew McConaughey, one of his earliest movies. It's a quote to me.

Section 18 — Wrap-up: the three steps

So basically I'll round this up is like step one is capture the sessions. Like if you leave this room today, go look at that clause [Claude] sessions and start using them. If you want to park like the data and pull it off your machine, tape.dev [tapes.dev], it's open source, definitely leverage it. But the next step is knowledge transfer. So the best part about this is multiplayer coding session. So multiplayer as in the sense of you're all solo player on your chain mobile. If you have a second machine, you're probably not transmitted data back and forth. Like let's get to the point where we actually transform that knowledge, then transfer the knowledge actually looks like skills. It's probably the best way to do this. And then you now move into a place where you unlock the hardest [harness] freedom. So like if you're using clock code [Claude Code] and you want to use codecs [Codex] because codex turns out to be better this week. You can just transfer the skills over the codex or pipe builder or anything else. But we have the option. We have optionality to do this. I know as of today, a promise for an IPO in the next couple months, it's announced their SCC [SEC] filing. I'm going to read it on the flavor at home. Um, my background is finance. I went to school for it. So that's my. Guilty pleasure to reading SEC filings. But the next thing is model choice. Like at that point, you don't need all this all the time. You can use high quality. We use some random Chinese model. Like that's an opportunity and then eventually we'll start that the other shoe will drop and we'll have to think about cost. So trace it there reinforce with learnings. Find us an expo hall. That is me on BW. And I think I have that slide. This will be on the internet. Thank you.

Section 19 — Q&A

So much, Brian. We have one minute. If anyone has a rapid fire question.

[Audience question — not captured verbatim in the transcript; from Brian's answer, it was clearly asking about Codex support.]

Yeah, it's a great question. So today works for, it does not work for codecs [Codex]. It worked at a moment. But things change quickly. So we currently today work with Cloud Code [Claude Code], Conductor, and Ollama. We're more than happy to make it work for Codex if someone needs to say they need it. That's pretty much the only implementation. That is all depending on. I'm so sorry. But as well.

[MC closing:] So catch him asking questions, hunting down. We'll be here. All day. Thank you guys so much. We have a coffee break. And like I said, we're beginning to please check what your next session is, where it is. Feel free to ask me if you're not sure. But I am presenting in the other room. So ask me quick. See you at 11:50.

talk-douglas-training-ai-on-your-own-code

README.md

tile.json