Speaker-label warning: The source transcript has no per-speaker labels. The opening welcome reads as host Simon Maple's introduction but transitions into Lamis's talk without a clear marker — the seam is around "By way of introduction, my name is Lamis." The Q&A section contains audience questions whose askers are not named. When attributing words, prefer "the speaker said..." or "a questioner asked..." unless context makes it unambiguous. The two known speakers/participants are:

Simon Maple — host, Tessl Head of Developer Relations

Lamis — Anthropic Applied AI, the talk's presenter

Speech-to-text artifacts are preserved verbatim (e.g. "context blurred", "rep" for grep, "MRE", "bargening" for versioning, "pennies" for "picky", "manner incuration", "open announcement", "audiceable") — do not silently correct them in quotes.

Section 1 — Welcome & host intro

Hey everyone. Great to see you all today and hopefully we'll be having a wonderful day so far here at AI DevCon. I certainly am. I hope that your contacts for those are not too full and you have a bit of space for a little bit more information about context.

Section 2 — Speaker intro: Lamis & Applied AI at Anthropic

By way of introduction, my name is Lamis. I'm a member of technical staff at Anthropic. I work on applied AI team and this is a team which sits between research, product and go-to-market. So we do a mixture of working on internal projects as well as directly with customers who are building agents at the frontier. Me specifically, I work with startups and founders, many of whom I'm sure are in this room today. And I think I have the best seat in the house. Because these are the users that are constantly pushing us right up against the boundary of what is possible with models and products today. And as such, we just get to write the exponential together.

Section 3 — Talk roadmap

One thing that constantly comes up, as I'm sure you're all aware, is what it really takes to take the raw model intelligence that we have today and translate that into durable, scalable, useful product. And one of the main levers that we have in order to do this is context engineering, which will be the focus of my talk today. So on this journey, I want to quickly do a recap of where context engineering has gone in the past year is a space that's completely blown up. And through that, we'll kind of distill primitives and proven to be really useful. Some stuff that has been a little bit less useful. Secondly, we'll talk about what the state of the art is for memory management today. And thirdly, and in particular with that, we'll talk about not just what nice theoretical principles are, but what it takes to actually build these systems in production. And then finally, we'll talk about where this will go on the path to continual learning and in particular touching modern paradigm. Called dreaming.

Section 4 — Why context engineering matters

So we said this before. And models, we recently models all the time, they are more and more intelligent. But when it comes to actually deploying these models in your agents, in your environments, in your organization. The intelligence alone is not going to compound because they need this context that helps them perform the specific tasks that you need them to. In particular, a lot of this context is often kind of orthogonal to the novel intelligence, right? Like the newest model we just released one isn't going to get out of the box, know exactly what it takes to succeed in your organization and with the tasks that you want them to. And so it's a really great investment to work in the context engineering part because over time has the effect of multiplying the intelligence even as models get smarter. So I'm sure you'll be familiar with these problems. It's like agents not knowing their way around a code base or knowing enough about your own user preferences. And then additionally, like you don't have the effect where agents are better at the task the next time they perform it. So they might not learn from their mistakes and as such. You don't have this continual learning effect.

Section 5 — Timeline of context-engineering primitives

So just to recap. Where we've got so far on this. Journey of context engineering. Anthropic, we like to say do the simple thing that works. And this is a timeline that's only really spans the past year and where we started.

5a — CLAUDE.md files

Those were these Claude MD files that we launched with Claude to Code. And what we learned from this was that it was kind of unreasonably effective, like this markdown file that just gives the agent a couple of instructions about making your way around the code base, the organization, your own user preferences. That injected into the beginning of a model context at the beginning of a session was so good at steering it towards the things that mattered and helping it navigate. And Align its actions towards your preferences. However, we also learned a couple of things about what doesn't work here. So when we're injecting this at the beginning of the session into context, we obviously start to run into problems. Where you get problems with context blurred, like what happens when this file with very important preferences gets very, very long. How do I manage that over time? And so we went back to the drawing board and thought about like ways that we can improve this. Separately though. What was true was that having just a very simple markdown file, which is human readable, which your agent can write to, which you can write to, is really, really effective.

5b — Memory tools

So a second avenue that we investigated was these memory tools. And this is interesting because it leans into this idea of, okay, what happens if we let agents autonomously manage their own memory systems? So we let them decide when they read, when they write and when they update memories. And this is all happening in band, which means that it's within the context of a session. So during a session, an agent is thinking about like what might be interesting to pull from memory, what might be interesting to write to memory. So autonomy proves to work really well in this case. And over time we've kind of developed that into systems where we're even less opinionated about what these tools need to look like. And I'll touch on that in a second.

5c — Skills & progressive disclosure (bookshelf analogy)

So the next stop on this journey was skills, which I'm sure you've had a bunch about today. And what this solves is this problem of the like ever growing context. So we have this really clever idea of progressing disclosure. The way I like to think about it is actually first on skill. What skills are good at? Its processes where you have like a procedural workflow. So something where you have an opinion about how the process works end to end, but you want the agent to follow. And what's very clever is that the agent only looks at this front matter a couple of sentences at the top of the file before loading the skill. But you can still load as much detail as you want into the main body of the file. So you're able to at the same time have very deep levels of detail. Whilst not overloading the model's context. And the way I like to think about it is as if I had a bookshelf. In my room. And every time someone talks to me, I can kind of scan and look at my list of books and see if any of the titles might be relevant to the conversation and kind of pick that off the shelf and read it when I need to. So for example, if someone walked up to me and started speaking to me in French and I noticed I have a French dictionary, I can pull that out and it would give me context kind of loaded during the conversation that would help me without me having paid attention in like seven years of French classes at school and having that all loaded into my context already. So this was a really, really great innovation. But one bottle I potentially would have is that it's still kind of driven by humans and agents together. So you'll still, even if you're using your agents to write the skills with you, yourself being quite opinionated about like what things need skills.
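Editor's illustration (not part of the talk): the progressive disclosure idea described above can be sketched as a two-step load, where only the front matter of each skill is read up front and the full body is read once a skill is chosen. The `*/SKILL.md` directory layout, the `---` delimited front matter, and the keyword-overlap matching are assumptions made for this sketch; in a real agent the model itself judges relevance from the descriptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class SkillSummary:
    path: Path
    description: str  # front matter text only, never the full body


def read_front_matter(path: Path) -> str:
    """Return the text between the opening and closing '---' lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    header = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        header.append(line)
    return "\n".join(header)


def index_skills(root: Path) -> List[SkillSummary]:
    """Build the 'bookshelf': one short summary per skill folder."""
    return [SkillSummary(p, read_front_matter(p))
            for p in sorted(root.glob("*/SKILL.md"))]


def pick_skill(index: List[SkillSummary], query: str) -> Optional[SkillSummary]:
    """Stand-in for the model's choice: most query words found in the summary."""
    words = set(query.lower().split())
    best, best_score = None, 0
    for skill in index:
        score = len(words & set(skill.description.lower().split()))
        if score > best_score:
            best, best_score = skill, score
    return best


def load_body(skill: SkillSummary) -> str:
    """Only now is the full skill file pulled into context."""
    return skill.path.read_text(encoding="utf-8")
```

The point of the split is that `index_skills` stays cheap however long the individual skill files grow, while `load_body` is only paid for the one skill that looks relevant.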

5d — Filesystem-as-memory (current state of the art)

So that takes me to the final step on this path, which is what we perceive to be state of art for memory systems today. And what we have done, yeah, what we think is best practice is modeling these memory systems just as file systems. So this kind of aggregates a couple of the learnings from this path. So file systems are great. You can just fill them up with markdown. Agents actually just very good at using normal file system tools like bash and rep. So just let them search over the file system rather than being a pennies about these specific tools that they should use to read and write to memory. And then yeah, that search kind of mirrors this idea of progressive disclosure. You can index these memory systems really well so that agents can intelligently search over them. And that's where we have kind of got to so far.

So just to recap the key learnings from that format markdown is great for reading memories. You know, allow memories to grow large. But give agents tools to quickly index and search for what's relevant. And finally like give agents autonomy when they're writing. To memories. And if you were to go out and build a system, it would work really well. You would have the feeling of continual learning because your agents would get better at the individual, whatever individual tasks you're doing.
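Editor's illustration (not part of the talk): a minimal sketch of a memory store modelled as a directory of markdown files, with a grep-like search and an append-style write. The function names and the example paths are hypothetical; the talk's actual suggestion is to let agents use ordinary tools such as bash and grep directly, which this Python stand-in only imitates.

```python
from pathlib import Path
from typing import List, Tuple


def search_memory(root: Path, term: str) -> List[Tuple[str, int, str]]:
    """Case-insensitive search over all *.md files under root.

    Returns (relative path, 1-based line number, stripped line) per match,
    roughly what `grep -rin term root` would show.
    """
    needle = term.lower()
    hits = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


def write_memory(root: Path, rel_path: str, note: str) -> None:
    """Append a note to a memory file, creating parent folders as needed."""
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(note.rstrip("\n") + "\n")
```

For example, after `write_memory(root, "prefs/style.md", "- Prefer pytest over unittest")`, a later session could call `search_memory(root, "pytest")` and read only the matching line rather than loading every memory file into context.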

Section 6 — Production problems at scale

However. As with everything, this very neat idea runs into some problems when you try and scale it to production. So we have, and we have a concept for theoretically what works. And when we then think about scaling these to production in environments where we have many agents collaborating at the same time where they run over very long periods of time and where potentially things like code bases get really complicated, all manner of problems start to arise. And we've seen these in production time and time again. So one, a couple things just to like, yeah, spark your imagination. Think about multiple agents trying to write to a memory file at the same time. How do you manage that? Think about one agent running into a problem and deciding to like write to the organizational wide context which every other agent is currently reading from. Like if something was incorrect there, that would scale to all of your agents and be pretty disastrous. And think about when you have humans in agents collaborating on memory contacts together, like how do you keep track of what's going on? The final problem is that memories can go stale. Of course something that was relevant in the past. Might not be relevant today or maybe it was written incorrectly or even maliciously injected by someone trying to. Prompt injector agents to write bad things to memory. So you have to have a lot of guardrails in place to make sure that these nice autonomous memory systems actually work in production.

Section 7 — The four production principles

And so I'm going to talk through a couple of key principles that we use when designing memory systems in production to make sure that we do get to use all those nice effects that we've spoken about.

1. Versioning ("bargening") — So the very first thing is bargening. So when you're designing any kind of memory system, you need to be able to store versions. To keep track of what's going on. To allow you to roll back should you need to if a new update isn't particularly good. Additionally you probably want to think about like what context was this update based on? So which agent session which transcript. Resulted in me wanting to make this update. And additionally like you might want to track like who did it, which agent, which human etc etc. So this is really important.

2. Concurrency (hashing) — The second thing is concurrency. So we've talked about okay what happens when I deploy thousands of agents all working off the same memory system. And the solution that we've adopted here is to have this hashing system. So when an agent decides that it wants to write an update to a memory, it takes the hash. It then drafts its edit and then before it writes the update. It takes another hash. If those two things do not match then the agent cannot write it because it means that some update was made in the meantime. And in order to handle that agent reports an MRE drafts its new update and then tries to commit this again. So these are the kinds of just engineering practices that allow you to scale multiple layer architectures scale memories of these kinds of architectures.

3. Permissioning — Another couple key principles. So commissioning is really important. When you have large memory bases. You probably have a mixture of top level organizational wide knowledge. It might be like your key like what your organization is trying to achieve or key principles about the code base which you've really carefully curated. All the way down to the level of like a scratch pad for an agent where it writes down its working memory and it's very like individualized. And all the way in between you have things for specific organizations or cross sections. And so it's really important that you have guardrails when it comes to permissioning these memories. So like I said you wouldn't want one agent to just decide that it should update the organization wide context. Probably you might want that as read only. However for its own scratch pad you'd want it to have right access. And yeah that's permissioning.

4. Portability — Final thing which is kind of peripheral but still really important when you design memory systems. Is portability. So like I mentioned before your curation of your memory systems is going to be so important like throughout the future. This is really really important organization views are like work task specific context. And so it's likely that if you're putting a lot of effort into curating this. You want it to be accessible across potentially multiple product services. And accessible by multiple systems. So designing in a way with the clean API which is portable and you can access it is really important.
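Editor's illustration (not part of the talk): the first three principles above (versioning with provenance, hash-checked writes, and scoped write permissions) can be combined in one small in-memory sketch. All class, method, and field names here are hypothetical and this is not the API of any product mentioned in the talk. It is also single-process only: a production store would need an atomic compare-and-swap rather than the plain check shown here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class StaleWriteError(Exception):
    """The memory changed between the writer's read and its commit."""


class PermissionDenied(Exception):
    """The writer has no write scope covering this path."""


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Version:
    content: str
    author: str          # which agent or human made the change
    source_session: str  # session or transcript the update was based on


@dataclass
class MemoryStore:
    # Path prefixes each writer may modify; all other paths are read-only.
    write_scopes: Dict[str, List[str]]
    history: Dict[str, List[Version]] = field(default_factory=dict)

    def read(self, path: str) -> Tuple[str, str]:
        """Return (content, hash); a missing memory reads as empty text."""
        versions = self.history.get(path, [])
        content = versions[-1].content if versions else ""
        return content, content_hash(content)

    def write(self, path: str, new_content: str, base_hash: str,
              author: str, source_session: str) -> int:
        """Commit a new version if permitted and still based on the latest."""
        scopes = self.write_scopes.get(author, [])
        if not any(path.startswith(prefix) for prefix in scopes):
            raise PermissionDenied(f"{author} may not write {path}")
        _, current_hash = self.read(path)
        if current_hash != base_hash:
            raise StaleWriteError(f"{path} changed; re-read and redraft")
        self.history.setdefault(path, []).append(
            Version(new_content, author, source_session))
        return len(self.history[path])  # 1-based version number

    def rollback(self, path: str) -> None:
        """Discard the latest version, restoring the one before it."""
        if self.history.get(path):
            self.history[path].pop()
```

A writer that hits `StaleWriteError` follows the loop described in the talk: read the memory again, redraft the edit against the new content, and retry the commit with the fresh hash.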

Section 8 — Benefits when guardrails are in place

And so when you put all these things together we have the kind of. Learning to have from allowing agents to creatively manage their memory and then these production level broad guardrails that allow them to like reasonably use one of those principles in practice. And when you do this you get very effective results. So just sharing here a couple of learnings from what we see deploying these large scale memory systems in production. And for example we see you get better accuracy so you have this effect where the second time the agent does the task. It actually does it better. With high results. Because it's noticed all of those memories. About what went wrong. Secondly that then has second order effect on the speed and latency sorry speed and cost of your agents running because then spending a few tokens. You more easily one shot these tasks because they actually know what they're doing. And you'll see that a possible sort of different processes agents are just able to do the task better and faster. Finally having this process where your agents are starting to autonomously write their own memories. Freeze up capacity and context for you as product developers potentially to focus on product wins while you know that the agents are doing this kind of self learning continual learning group in the background. And yeah, once that infrastructure is set up really well this works very well technically.

Section 9 — Limits of in-band memory

As ever we do then reach a new bottleneck. And this specifically is about in-band memory. So inbound memory as I mentioned before is when agents are writing to you and reading from memory within a specific session. So if you think about Claude for example when you like spin up a new session it's largely like focusing on that specific context when it's reading to reading and writing from memory. And this just architecturally or philosophically has limitations in the general like agent fleets continual learning objectives.

There's two main reasons why. First of all is that you have this inherent split of focus and resources. So you're asking an agent to complete a task. But at the same time you're also asking it to invest in manner incuration which would help it perform better in a future run. So when you put these things together it's just a very difficult optimization problem because how much capacity can the agent put into like helping future versions of itself versus doing the task you actually asked it to do. And also there's like other effects like latency for example.

The second thing is that the agents just have an inherent visibility limitation. So they only have a context of what's going on in their session. As such they just won't see patterns that happen across sessions. So when you get frustrated your agent keeps making the same mistake over sessions. It just doesn't understand how frustrating that is because it has a new context window in each of those. Secondly when you're running multiple fleets of agents in different environments. These single agents just don't have the context of what other failures other agents are running into.

Section 10 — Dreaming introduced (school analogy)

So for these two reasons we introduce this concept of some out of band memory curation and this helps to make these problems go away. And just to introduce an analogy for why this in theory should work. I'd like you to think about a school for example where you have lots of students that submit a lot of work. And you also have teachers at market and a head teacher that reviews everything. This is a system that we have in the real world for good reason and it's because when you have certain individuals that have dedicated capacity for helping people learn that's really effective. And when you also have people that have visibility over the whole fleet of agents or learners and they're able to spot patents and then kind of steer context or let's say in this case the curriculum, that's also really effective. So as always we kind of look to the real world to think about how to build these systems.

Sorry, I also didn't touch on a final limitation which is that memories go stale. So you need something, some parts that checks. That everything that's written there is still correct.

Section 11 — What dreaming looks like mechanically

And so we introduce this concept of dreaming which is like a second order process over memory. So if we think about how that's been constructed we have the like actual context which agents reference and has useful information. The memory processes which allow agents to kind of autonomously manage that contact themselves and then dreaming which is a process that runs in batch and asynchronously. With its own allocated resources to ensure that those memories themselves are affected up to date and helping the agents align over time.

So what does dreaming look like? Essentially what we do. Is we take an existing memory store. So this is a collection of memories. We then take a bunch of sessions or transcripts from agent interactions over a period of time. And we give these together to an agent which reviews all of the transcript looks at the memory store and starts to identify patterns for where there could be uplift in the memories. It then outputs a new memory store. Whether it proposed changes to the existing memory store. And what the agent is able to do, as I mentioned is spend tokens on solving this problem of making agents land better. Identify patterns for where agents are consistently failing. And then propose changes for what might make a more effective memory store such that next day when you run these agents again they actually feel smarter and they're running better.

Section 12 — Dreaming examples (school analogy continued)

To go back to my analogy just to paint some pictures of like what this could look like in practice. Let's imagine that the head teacher reviewing all these transcripts notices that. Every geography student has incorrectly answered a question. They're just all writing like complete garbage to this question. The teacher notices that actually by kind of in this case analyzing the memory store, that entire topic is missing from the curriculum. So while the teacher is able to do is notice that pattern, look at the memory store and suggest a new change to that curriculum such that the next day when these agents run they now have that information that they needed.

To give another example the teacher might notice that in a certain maths exam all of your answers are wrong in the same way all of the students are outputting radians when it's meant to be degrees. I don't know if anyone else said like GCSE maths had that problem too. What they're able to do is give an instruction saying like this is how you should configure your calculators and in the case of agents that's like noticing in the transcripts that there's something wrong with a tool configuration. So you might notice that something in the tool halls keeps failing. And what's important here is that when we look at those transcripts. We're not just looking at kind of the passes of Response between agent and the system or the user, but also really scrutinizing like those tools and all of the other metadata. That is really central to agent. S.

Finally, you can also notice something that's like fleet wide organization wide. So for example, like everybody's using too many em dashes and you don't like that, so you want. To add some organizational wide open announcement or context change that says not to do that. And so I hope that gives a picture of like why this could be really, really effective.

Section 13 — Designing dreaming in production

And now I'll just talk about how. You would go about designing such a system. In production. So you have some concept of your memory store, which is a concept, which is a collection of memories. Memories themselves might just be marked down files. Organized in this directory. You then take a number of the transcripts. And like I mentioned. That is a mixture of like the back and forth passes between the agents as well as metadata and tools, other skills they used, etc.

And the way that we've designed it, we have the orchestrator deploy a fleet of sub agents that go and analyze what those transcripts. And one point worth making here is that when you design these systems. You have the ability to steer how these agents, which both right memories and coordinate rename go about the problem and by steering, I mean that you're able to tell them like in your specific case, these are the kinds of things I think are important and relevant. These are the kinds of things that are not important and relevant. So you do have the ability there to start to curate that memory and dreaming process. To your organization specifically.

And the orchestrator then reviews all of the responses from the sub agents. And it then decides like where there are prevalent enough patterns that it thinks this warrants the change in the memories. From that it proposes individual changes to the memory store. And in our case, the way that we've designed this introduction is the agent will additionally give you examples of transcripts where it's noticed as pattern has happened and also some stats on like how prevalent this issue is and why this warrant is actually updating the memory store. And so all of this works really neatly. You get this output and you as the individual. Can decide where you want to accept changes to the memory and where you want to reject them etc. And this works really effectively.
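Editor's illustration (not part of the talk): a minimal sketch of the orchestrator step described above, where findings reported by sub-agents are grouped by pattern and only patterns that are prevalent enough become proposals, each carrying a prevalence figure and a few example transcripts for a human to accept or reject. The data shapes, the `min_prevalence` threshold, and its default value are assumptions for this sketch; the talk does not say how prevalence is measured or what cut-off is used.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Finding:
    transcript_id: str
    pattern: str         # short label a sub-agent gave the recurring issue
    suggested_edit: str  # proposed change to the memory store


@dataclass(frozen=True)
class Proposal:
    pattern: str
    suggested_edit: str
    prevalence: float               # share of reviewed transcripts affected
    example_transcripts: List[str]  # evidence shown to the human reviewer


def propose_changes(findings: List[Finding], total_transcripts: int,
                    min_prevalence: float = 0.2,
                    max_examples: int = 3) -> List[Proposal]:
    """Turn sub-agent findings into reviewable memory-change proposals."""
    if total_transcripts <= 0:
        raise ValueError("total_transcripts must be positive")
    by_pattern: Dict[str, List[Finding]] = defaultdict(list)
    for finding in findings:
        by_pattern[finding.pattern].append(finding)

    proposals = []
    for pattern, group in by_pattern.items():
        # Count each transcript once, even if it shows the pattern repeatedly.
        transcripts = sorted({f.transcript_id for f in group})
        prevalence = len(transcripts) / total_transcripts
        if prevalence >= min_prevalence:
            proposals.append(Proposal(pattern, group[0].suggested_edit,
                                      prevalence, transcripts[:max_examples]))
    # Most widespread problems first.
    return sorted(proposals, key=lambda p: p.prevalence, reverse=True)
```

Nothing in this sketch edits the memory store; it only produces proposals, which matches the review step in the talk where a person decides which changes to accept.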

Section 14 — Memory + dreaming in parallel

So together. We have these two processes that run in parallel. The first is memory. And these agents are using some of their. Like inbound contacts and inbound resources to write an memory where they think it's important. And this is neat because it means that in the actual next run, the next session that agent will be better. So there's a shorter time to kind of seeing that change. But inherent movie these agents have competing resources when they think about what to dedicate to memory, what to dedicate to completing the task. And additionally a lack of visibility.

So on the other side we have dreaming. Which is this out of bound process and this allows again broader visibility and dedicated capacity. I.e. token spend which is specifically directed towards helping agents learn better. And you might think, okay, that sounds really expensive. Why would I want to chuck extra resources at this? But if we kind of go back to the improvements we saw when you build effective memory stores, actually you can see a bunch of costs go down because agents are able to one shot things more effectively. They have more information that they need. In order to perform a task well.

Section 15 — Summary & call to action

To summarize. At the very least do this in a thing that works context management makes such a huge difference to your agent performance. Like forward MD file, like skills, which I'm sure you've heard about a bunch and allowing agents to autonomously manage these systems themselves goes a really long way.

Once you think about scaling those things into architectures with many agents, agents have run over a very long time. Situations where you will like continue to work and develop on a workspace or a co-pace over a long period of time. Or very complex domains. You should start thinking about adding some features or some guardrails that allow those agents to manage their memory in a way that is safe, verifiable, audiceable.

I shall also say here that whilst this kind of these kinds of practices are really effective when it comes to coding tasks for example, this by no means is just specific to coding. Like I use memory all the time when I'm producing presentations that has context on like how I like to write things, how I like my slides, etc. Etc. And that develops over time. So this is really not coding specific.

The final thing is. If you really want to kind of like close the loop here, think about adding an additional out of bound process like dreaming as we call it to consolidate the memory. And cut things that are no longer relevant, add things that agents are missing and clean up and organize memory systems.

So to close, I would say that this journey that we've been on with contacts engineering, a lot of this stuff has only happened in this past year. This is very much an open area research and development. And one which we see huge value in the future. So we're so excited to see the kinds of things and contributions that you will get to this space. So I encourage you to keep thinking, keep learning and keep dreaming. Thank you.

Section 16 — Q&A: Memory store implementations

Do I have any questions in the room? Oh golly question. I'll dive straight here first.

Questioner: Thank. You. Thanks for the presentation. Do you have any memory store implementation that you would like to suggest?

Speaker: Any memory storage implementation? That you would like us.

Questioner: So are you saying papers? No restoration. Potential.

Speaker: Okay, okay. I would like to suggest. To solve what problem specifically.

Questioner: Well one thing is put things in files that I have on my laptop. Yeah, but I think I'm looking into something more enterprisey. So what kind of solution do you suggest over there?

Speaker: Okay, yeah. So I mean maybe I was kind of coy in the talk because we're not allowed to make product transactions. But given that you asked. A lot of this reference to the architecture that we used in art, memory. Our memory infrastructure, our managed agent solutions. And so when I talk about these things about productionizing memory, so everything like versioning. Hashing etc, that's all available within our memory and dreaming API through Claude Managed Agents. So if you did want an out-of-box solution for this kind of thing, that is where I would point you to.

Section 17 — Q&A: Guardrails & permissions for dreaming

Questioner: Hi. Early on you talked about guardrails and permissions. And I think. Sure most of us have probably. Read the pod. Coded leak and memory stuff. The dreaming stuff was definitely some of the most interesting. In it. But how do you scale that at enterprise if you've got hundreds of users with different permission sets? How do you make sure dreaming follows those same. Guardrails? If it's handing out of band and the context is different compared to say the contents of the agent might have when the user is using it.

Speaker: So just to check that I understand. We have like some permissioning about what agents can access like in terms of memory and then a separate like yeah, so I mean I think these things compose quite well. Because when you set up a dreaming procedure you decide exactly which session transcripts to attach and so you could build a process which mirrors whatever permission you have on the agents. So yeah, I mean. If that's to say that it's not the case that when you kind of trigger a dream job it just takes like everything in a certain time period. You can configure it that way. But you could also just search over whichever transcripts have the same mission set as like whatever your memory stories and then make sure that that matches. If that makes sense.

Section 18 — Q&A: "Are we reinventing databases?"

Questioner: Hi. Thank you. This was really, really interesting. I found it fascinating earlier when you were mentioning about like versioning concurrency, durability, all these things. At what point are we like reinventing databases from like first principles again?

Speaker: Yeah, this is an interesting point. I still think. Like one of the things that. I suppose and this is a good reminder for me that's something I didn't say. What we're trying to do here is like thread the needle like sorry find the right boundary to draw between kind of letting these agents autonomously act and then also like which things should just be kind of programmatic things that are baked into the harness. And so I think what you allude to is like. First of all we were just kind of like letting these agents write in markdown files and just like commit whatever they wanted and now they're kind of seeing having seen which primitives work really well with thinking about like kind of codifying that in the harness. And so when we think about like hashing or the version of yeah we are kind of going back to the software engineering principles that we've seen work well in the past, but in a way that kind of autonomous agents can act and like. Can interact with those really effectively. So I think to some extent like we sort of aren't merging back into those practices but that's because we have enough signal now to know that those things should just be done in a very deterministic way and there's no need to reinvent the wheel. I hope that answered your question.

Section 19 — Close

Perfect we're absolutely out of time. Thank you once again big round of applause for that. Where we will all. Be. Outside.

The. Face. Delivery. Help with anything? For. Me. Help at all. I think I'm good. I think. I just. Like it this. Way I go back to life. So that. S probably the right spot. Does a 10 minute. Break between the two. Anyway? If you. Re. Wondering why you were going with that.

ainativedev/aidevcon-2026-ldn

Transcript — Context Engineering, Memory Systems, and Dreaming — Lamis