Speaker-label warning. The source transcript has no per-speaker labels. The opening (~first paragraph) is a conference host introducing Oleg; from "My name is. Oleg." onward, virtually everything is Oleg. The closing Q&A fragments are audience questions interleaved with Oleg's answers but the labelling and word boundaries are heavily garbled by speech-to-text. When quoting from the Q&A section, hedge attribution. Preserve all speech-to-text artifacts (e.g. "as next" likely = "SBX", "ext lives" likely = "ext live", "Testkube" appears mistranscribed as variants).

Participants explicitly named in the transcript: Oleg Šelajev (speaker), Alberta (Oleg's daughter, mentioned by host), Liran (from Snyk, referenced but not present), Simon (audience member, greeted mid-demo), Justin Cormack (next speaker, mentioned by host at end).

Section 1 — Introduction & framing

Okay. You're back with me for the remainder of the conference now so looking forward to looking forward to finishing off the last half with some great sessions as well. I'm really looking forward to inviting up this next speaker because me and Oleg go back many many years and one thing that I really love our conferences where folks can kind of like bring their families and their other halves. I know obviously in two or three speakers have done that and I think this is the first time Oleg is going to be presenting in front of such an amazing audience and room with his daughter Alberta here so I want everyone to say after three hello Alberta okay one two three hello Alberta awesome right now you're going to see how smart or not so smart your daddy is okay. If that is very very smart so I worked with Oleg many years ago at ZeroTurnaround where we did like a huge ton of amazing stuff around bytecode manipulation things like that Oleg also moved on to another company called Atomic Jar doing a huge amount on Testcontainers and Atomic Jar were acquired by Docker so I actually got a couple of sessions from from a Docker and ex-Docker CTO coming up straight after so it gives me huge pleasure to invite up on stage an amazing speaker amazing person and a really good friend please give it up on next live.

My name is. Oleg. I work. At Docker. And I'm a member of the developer team doing little things AI. And Docker like every other company has been trying to figure out its place in the AI ecosystem and how we can help. And one of the recent initiatives that we have is the sandboxing AI agents and this is how this session came to be. We're going to talk about isolating your AI workloads specifically running locally so if any of you are writing agents. To do some stuff for you and you're running them on your work machine. And you maybe consider if there are some other ways or better ways that you can do that or what you need to take care of the session for you.

Section 2 — Spectrum of AI usage and the YOLO problem

And of course of course we all run things and we like the AI to do as much as possible for us. So the more we do that the more the horror stories come online every week or so there is another news article or Reddit thread where somebody complains how AI went rogue and did something that it was not supposed to do. And that is just half of the story if a rogue agent messes up your laptop or optimizes the home directory or destroys your database even and then it's completely different story if whatever the malicious action is actually targeting your machine, your credentials, your identity and it can result in a massive security incident. So we want AI to do stuff for us but we also don't want to really give the keys to our whole lives to this soulless. Maybe artificially intelligent entity that has no consequences for itself whatsoever.

So there are different stages where people can be in their AI usage story and to the left we have more. To the left. We have more conservative usage where you can use maybe AI autocomplete or assistant and of course to the right. I hope most of us in this audience are where we use AI more autonomously and give them tasks as goal set like setting goals for AI for them to actually try to figure out the path to achieving them and this is where you can have like autonomous agents or swarms of agents or any sort of hierarchies. How many of you run agents? How many of you do that only on your machine? Okay how many of you run sort of YOLO. Brave? Very good. So more autonomous more skipping permissions more removing a human as a bottleneck right yesterday was talking how from OpenAI there was a session about the prompt engineering how removing human is the only conceivable way now to scaling the workloads further and faster. And this is what we need to do more autonomy. More autonomy comes with a drawback.

This is Liran from Snyk. Yes they had such a community saw the session. Perfect. If you watch an online you can go and watch that video. Excellent session how AI can be influenced by malicious prompts and how it can do things on behalf not you but somebody else who could be an evil person. How many of you got scared watching that session? Very good. This is the 15 second PSA watch the run session get scared use sandboxes.

Section 3 — The trifecta of risk

Very interesting. The reason AI is dangerous is the combination of private data, external communication, and untrusted content. That's the point where security teams start to struggle. Leadership wants more AI, developers want more AI, and the middle is responsible for enabling it safely. Because when an agent is allowed to act, it can also be coaxed into doing the wrong thing if the guardrails are only advisory.

The problem is that the stakes there are completely asymmetric. You need to protect your data and your environments all the time. The attacker only needs to succeed once, and that is why the problem is so hard. Once a prompt injection works, the blast radius can grow very quickly across the tools and libraries the agent can reach.

Section 4 — Why soft guidelines fail

So it's very hard. So the question becomes what your agent can do for you and how to make it so it can do useful things but you have the control. On telling it what to do or not. The important part is that when the agent has something in mind it will keep trying until it fails. You need to be very mindful of that and even without sudo, people and the agents can find workarounds which are completely legit. For example by using Docker. That's not an issue; it's the documented workaround.

So if your security policy is just a suggestion, it is not enforceable. And while you might feel safe by putting a reassuring sentence in the prompt or the skill, that does not create a real boundary.

Section 5 — The "best skill ever" demo

So here is a quick example of that. I have a little skill that performs a local awareness check. It gives you a report so you can see how exposed a machine could be. It is not a malicious skill, but it does demonstrate how easy it is for sensitive material to surface.

So then some people come to me and say like oh like there is the anthropic Claude and auto mode will look at the instructions and it will block the malicious instructions and then AI will do the sensible thing and not break everything. And I tried it with this particular skill. And it's a video because it's inherently unreliable. So what you can see here is Claude running in auto mode and I'm asking it to run the skill. It sensibly refuses when the instructions are dangerous, which is what auto mode is supposed to do. Then we continue the session and I ask it to rewrite that in a python code. It does that, but the point is that if you let the context reset, the same agent can be coaxed into running a security audit against the machine. That is the part that matters here.

So we can do this. Boundary is if it's a suggestion, if it's a guideline in the context of the agent, that is not enough. It might work for you and depending on your risk profile you can accept that and be like I'm not risking too much. Right? If it's your personal laptop. After all you are exposing your personal life if you're a company, if you're a security team, if you're a platform team and you need to enable loads of people, please is not the security. 25th frame so you get a little bit scared by Liran reminding of your dangers. So you need hard isolation. You need a way for the agent to run unbothered because you cannot be accepting all the requests. Right and as we saw even the automatic controls as an auto mode that will use AI to accept or not accept running individual commands that is not the enforceable policy. You need hard isolation that is on the hardware level and preferably you want to have the configuration outside of that sandbox. Where you would have the keys and the access to your files and networks outside. So the agent can do whatever it wants and you could. If push comes to shove you can just remove that environment and create it again.

Section 6 — Why containers aren't enough

So naturally at Docker and all everyone else thinks about containers. We've been packaging and isolating software in containers for years. So why cannot we just take an agent and put it into. The container? After all we had dev containers. And other approaches. And that is better than not running it in any isolated environment at all. But containers are not ideal for this. Containers are great for immutable software. It's great for packaging and databases. It's going to run there. You have the full visibility and control what goes into the container how it was built. What are the sources for that. You have the prominence files and SBOMs and all the verification there. But it's not great for the agent that is actively changing the environment within the container. It needs to install its tools. It needs to write its own. Maybe helper scripts and so on. So in the container ecosystem hasn't been really designed with that in mind. There are a number of projects that use containers as a sandbox in primitive and you are better if you're using that than not using that at all.

But at Docker we thought about this. And when we were building sandboxes we built one version with containers. We went out, we talked to a bunch of security teams at enterprises and all security teams said that no containers are not. The isolation boundary that we can trust. So being great engineers what did you do? We rewrote that using microvms. So containers were some microvms might not sound like such a big difference. But if you're sharing a kernel, if you have a number of security vulnerabilities for exploiting from outside of a container that is not something that. Well a security team can sign off because it's going to be well their jobs and their reputation when the leaks will happen. And yeah, albeit container escaping exploits are not frequent. There are like a dozen over the last maybe six seven eight years. But as we discussed you only need to be breached once and then it's bad.

Section 7 — Docker Sandboxes (microVMs) and live demo

So you want the hardware, sort of virtual machine isolated primitive and this is what Docker has built for the sandboxes, the Sandbox feature that runs in microVM on your machine and then your agent is put there within the container in addition to that microVM can run other containers. So if you are using agents for software development. The agent within this microVM can do exactly what you would do. It can run tests. You test my spin up some containers maybe with Testcontainers libraries, maybe just normal Docker Compose approach, but they can create a full environment, run your applications and that gives you much more control over how your agent can develop software because it can do whatever you would do on your machine. Just fully isolate. It.

And within that sandbox within that microVM there is no access to the host file system. You choose what you share with the thing. You have the networking proxy so all the requests out of the sandbox go through that so you have control over what it can reach. There is also a secret injection mechanism, so the agent inside doesn't have access to your private data while still being able to do things on your behalf.
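
Editor's illustration (not part of the talk): the networking proxy described above can be pictured as an allowlist check on every outbound request. The sketch below is only an assumption about how such a check might look; the domain names, `ALLOWED_DOMAINS`, `host_allowed`, and `proxy_decision` are hypothetical and are not Docker's implementation or configuration format.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the talk does not describe the real policy format.
ALLOWED_DOMAINS = {"api.example.com", "registry.example.org"}


def host_allowed(host: str, allowed: set[str]) -> bool:
    """Return True if host equals an allowed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)


def proxy_decision(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> str:
    """Decide what an egress proxy would do with a request leaving the sandbox."""
    host = urlsplit(url).hostname or ""
    return "forward" if host_allowed(host, allowed) else "block"
```

Matching on the exact domain or a dot-prefixed suffix, rather than a plain substring, is what keeps a lookalike host such as `api.example.com.evil.net` from passing the check.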

So here's how it works. I'm going to try to show you a very quick sandbox. We're going to pick a random directory. This is my terminal and I'm going to say sbx run Claude here. There are a bunch of different agents supported with this. When you do sbx run, you can run Codex or Gemini or Claude or OpenCode or maybe some others. You get dropped into the agent. Now it looks exactly the same way as if you run code, but this is me running in isolation here. So if I say something like, can you explore this project and give me a short readme and maybe build this project for me as well, the agent will explore stuff and then do what it needs. It will start building the project, running Docker containers, and running shell commands, all while staying fully isolated. I can let it solve problems for me and run it for a prolonged period if I need to. That's the point of the sandbox.

Right so a sandbox that's fully isolated. It's there. It can run. If I don't run this I can run sbx. I get the little bit of terminal UI that can control and tell me what the sandbox are doing for me. So you can see the network request they're doing and so on. It just it's a it's a it's a nice ergonomics. We really wanted sandboxes to be as convenient as containers because everyone knows and runs containers. But really to have this microVM boundary with additional security controls on top. And it works pretty well.

Section 8 — Secret injection / sentinel values

So now further. So the security proxy. We really don't want your agent to have access to your data. Your host machine is a treasure of real keys. But within this sandbox we're going to use the sentinel values and we're going to inject the actual keys into the request going to particular services outside of the sandbox. And you can extend this mechanism and do more complex setups with this. So you can have your agent work within the sandbox. But for example you can do commits. And code signing and everything else outside of that and outside of sandbox you can also have the sort of the trusted environment for example for enabling and establishing the provenance and gathering the metadata and knowing exactly what went into the particular commits. So you can build a larger ecosystem of this. While allowing your agents to run autonomously without oversight.
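
Editor's illustration (not part of the talk): one way to picture the sentinel mechanism is a host-side step that rewrites the `Authorization` header of an outgoing request, replacing a placeholder with the real key only when the destination is the service that key belongs to. Everything named here (`SENTINEL`, `HOST_VAULT`, `SWAP_RULES`, `inject_secrets`, the token values, the host name) is hypothetical and assumed for the sketch.

```python
SENTINEL = "SENTINEL_API_TOKEN"  # placeholder the agent sees inside the sandbox

# Host-side secret store; it lives outside the sandbox (values are made up).
HOST_VAULT = {SENTINEL: "real-token-123"}

# Which destination host each sentinel may be swapped for.
SWAP_RULES = {SENTINEL: "api.example.com"}


def inject_secrets(host: str, headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of headers with sentinels replaced, only for permitted hosts."""
    out = dict(headers)
    auth = out.get("Authorization", "")
    for sentinel, allowed_host in SWAP_RULES.items():
        if sentinel in auth and host == allowed_host:
            out["Authorization"] = auth.replace(sentinel, HOST_VAULT[sentinel])
    return out
```

Because the swap happens outside the sandbox and is tied to a destination, a request carrying the sentinel to any other host leaves with the placeholder intact, so the agent never holds the real key.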

Section 9 — The empty-environment problem & Kits

The problem with sandboxing of course is that those isolated environments are completely empty. They are horrible from developer experience perspective. They're just frankly annoying to use. It's like getting a clean machine all the time. And if you're a developer you know how much time you spend configuring your beloved. Machine to be yours. To have your tools to know what you do and then you get dropped into this empty environment and nothing works. You don't have your Rust compiler chain. You don't have access to your Maven Central for your for your Java libraries. You don't have access to this and that and everything you need to install again and again and again. And then you create a new sandbox and you need to repeat that. And of course there is a way to sort of build a template with base image that the sandbox starts from. And stuff it with all the tools in the world. So that problem is not a problem. But it's already at like bigger than the gigabyte. Which is sort of pushing the limit of what people find convenient when they run a sandbox.

So to alleviate that problem and to make sharing sandboxes and enabling teams, not individuals but teams working with sandbox is a little bit easier. We built the ecosystem of plugins which we call. Kits. And SBX kit is a declarative way of configuring a sandbox. Technically it's a YAML file. Well it could be a local files or an OCI registry artifact OCI artifact. But it's a YAML file that sort of defines which commands you want to run when creating a sandbox. So it can for example install your programming language tool chain. The commands to run which processes you want to run, which files you want to place inside the sandbox. And also the networking configuration. And how to do the secrets injection. And that is convenient because you can create those reusable artifacts. And then whoever runs their preferred sandbox can just apply it on top of that. If you have experience with dev containers this is similar dev containers features. Which is again sort of a mixing on top of the configuration within the container. With the addition of controls for network and secrets that are outside of the sandbox.

So here's how it works. I have conveniently here an example of. A kit. And this kit is the Testkube sandboxes kit sitting in my GitHub repository. So what it contains it contains just a few things. So it has the command section that says install Testkube. Testkube is like a binary right like we need to install it. So gives access to Testkube. It initializes the Testkube project when needed. It manages the environment so it. Sets the Testkube token to develop proxy managed. And it also does the credential swap on the network level for getting the Testkube token from my machine. Set in the secret. Vault. Injected into request. And then in the network we allow Testkube domains. So we can we can do that and we say that we swap the authorization token and the bearer gets injected from the host.
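
Editor's illustration (not part of the talk): the idea of kits being applied "on top" of a sandbox can be modelled as layering small declarative records. This is a sketch under stated assumptions only: the `Kit` fields and the `apply_kits` merge rules are hypothetical and do not reflect the real kit YAML schema, which the transcript does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Kit:
    """Hypothetical in-memory model of a kit; not the real YAML schema."""
    name: str
    commands: list[str] = field(default_factory=list)       # run at sandbox creation
    env: dict[str, str] = field(default_factory=dict)       # environment variables
    allowed_domains: set[str] = field(default_factory=set)  # network allowlist


def apply_kits(kits: list[Kit]) -> Kit:
    """Layer kits in order: commands append, env overrides, domains union."""
    merged = Kit(name="+".join(k.name for k in kits))
    for kit in kits:
        merged.commands.extend(kit.commands)
        merged.env.update(kit.env)
        merged.allowed_domains |= kit.allowed_domains
    return merged
```

For example, `apply_kits([Kit("toolchain", commands=["install-toolchain"]), Kit("cli", commands=["install-cli"], allowed_domains={"api.example.com"})])` yields one combined configuration whose commands run in the order the kits were listed.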

So when I run this. When I run this the command looks like that. Sandbox run and I specify which kits I want and I specify which agent I want. And the directory where I want to run this. Let's go to some other directory. Just because I'm confused. Let's go to crush. And then we're going to do this as we ex command. The reason for that is very simple. It's going to recreate the sandbox and I don't want to have any name collisions. So it will create me a sandbox. It will install Testkube in that it will drop me into Claude. But now what I have I have my Testkube here. What's the command? ls. Testkube? Do we have Testkube binary on the thingy. I love how when you talk to AI you can just not worry about spelling. Which two of the Testkube you saw that it was saying in so in Testkube running. That's interesting. We're going to try the. Different. SBX. Run Claude. Right do have Testkube here. Okay, so Testkube was here installed previously I'm sorry I don't know what exactly happened there but now my agent has access to Testkube so I can ask things which skills do we have access to via Testkube and can you for example install the old X best skill ever. And what you're going to have spelling is unnecessary. Don't get any ideas please.

Right so skills and kits pardon me kits are the ecosystem primitive that allows people work more efficiently with sandboxes and of course we can create loads of kits but if you are managing a technology if you were a vendor and you would like people to work with your technology from sandbox a little bit better please contact us we would like to grow the ecosystem of kits successfully so it's not just Testkube there but maybe access to your cloud services your AWS CLI or your Confluent Cloud CLI or your Rust chain. In the kit that is not managed by us at Docker individually but in collaboration with actually people who are venerating that technology. So Testkube works here beautifully which is very good because I want to get invited again. And we can continue with the further. Things.

Section 10 — IDE integration and limits

In addition to building the ecosystem within sandboxes what you also might not obviously realize is that by giving sandboxes the ergonomics of containers just giving you the this command for SBX to create those environments you can build sandboxes into your existing ecosystem. So for example I can take my IDE and my editor and run. And run very easily agents is inside the sandbox effect I'm going to use the ACP for that but you can imagine whenever I open my agent pane in my Visual Studio Code for my IntelliChat or in IntelliJ IDEA instead of running that agent naked on my host it would go to the sandbox. So that is very very straightforwardly implemented.

The thing is. If you are super excited about enrolling into this it is great you should be but it will also not save you from all the attacks it only limits the blast radius it gives you control of the infrastructure levels but if you give your agent access to your email. For reading and writing your attacker can send you email to summarize your inbox and send them an email in response. So the application level attacks are still something that you need to think of. You need to give your agent enough control to do what it needs but with some security audit and moderation in mind. Docker is working towards building a more comprehensive solution for that for governance that includes controls and skills is based on sandboxes. So if you're interested please don't hesitate to contact but at the current stage sandboxes is a good primitive for running your agents in isolation on your local machine. That gives you more safety that you will get otherwise.

Section 11 — Closing & Q&A fragments

Speed without security. Is chaos. Security without speed is paralysis. Neither of them are good. You want balanced execution. You want to understand the attack vectors, what you control what you don't control and then you can run agents a little bit more in a more sensible way. So if you want to try as next pre-installed SBX, very good it's there. You log in, you don't have to pay. You can run your containers in that if it needs a way to run containers and it's pretty good for letting agents cook on them for the moment. Thank you so much. There are some resources to docs. You can find all the resources online. Thank you Liran. For allowing me to use your likeness. As. The. Scary face. Thank you so much.

We don't have to have the questions but we're going straight into a break so feel free to come up and ask Oleg any questions you like. We have just shy of 10 minutes. And then we'll jump into the next session which in here is Justin Cormack, previous Docker CTO. So come back in about 10 minutes or so.

Q&A fragments below are heavily garbled by speech-to-text and lack clear speaker boundaries. Treat with caution; do not attribute confidently.

It is. An architecture. We are working on. Figuring out. Which way. To. Bundle. Them. Or. Maybe. Trust catalog. S and then. Wide open. They can do. Commands to. Just steal stuff. At the. Free time. It. Still runs inside. The cycle. So we can still receive. These automatically. But. It can open. Network request. Stuff. Which will have the enter. Prise rollout. Most of the. Same. Like no. We're. Global security policy. Things. Control. We need. That display. It's. Going to be good. We need. Partner. We need to. Figure out. Whether the. Kits are very new. Right. So we need. To figure it out. If you're. Interested in building. More component. S. Like the. Micro VM is. Ours internal. But. No. Can't be. Able. Yeah. There were a couple of iterations. So. I might not. Be. Up. To date with that but like. It's. A. Different. Thing. There is. The kits. Feature. That. Is very. Much the containers. Dev containers features. Which is. What I said. So. You need. A mind map. Where. It's kind of. Planned. You can. Do that. So you. Probably when you. Spawn the sand. Box you can set up. The. Template. What is the. Base. Image that it. Gets. That. Gets injected and. That gets. Run. So you have. Lightweight containers. You can do that. With. Your base. Container image. And then. Yeah. So we don't have that. Somewhere in the. Roadmap. We like to enable that. A little bit better. But currently.

ainativedev/aidevcon-2026-ldn

talk-selajev-docker-sandboxes-agents/transcript.md

Transcript — You're absolutely right, it was your home directory!