AI Native DevCon 2026 London — all conference sessions as interactive skills
⚠️ Speaker labels are absent. This transcript was provided as one continuous block with no per-speaker markers. The talk has two named presenters — Simon Maple and Baruch Sadogursky — plus audience interjections. Do not invent attributions. Where context clearly identifies a speaker (e.g. "hand over to Baruch", "This is Baruch", first-person references to Tessl/JFrog history), attribution can be inferred but should be hedged.
The transcript also contains substantial speech-to-text noise (e.g. "Macey" likely = "Maple"; "Sadly mansource" likely garbled; cat-themed slide-deck jokes). Preserve these verbatim — do not silently correct.
The final ~15% of the captured audio is hallway/lunch chatter and the intro to a separate Anthropic managed-agents workshop. It is not part of the Maple/Sadogursky talk.
Yes, Derek is downstairs. We swapped his rooms and this. Workshop. That's not a workshop. It's a deep dive. It can be a workshop if only. Yeah. So. What I'm going to represent ourselves in a second. First I want to show you something. I want to show people something. Yeah, I want to give you guys a little bit background as well. So at Tessl, we've been trying to turn our own software development lifecycle into full dark factory. How many of you are kind of playing with something like this as well? Yeah. Cool. No, I thought it was just us. So we made this up. Ourselves. So we've been focusing on the part of the process that is linear issue to merge PR. And you might call this like an orchestrator. This is what we've kind of been working on. And this is what we're going to try and attempt today. Using just context. And we'll explain a little bit more about what that means.
So. To go from an issue to a merged PR, there are many steps. What are like, just like chat now? What is your process in your head of going from an issue to API? Okay. Get one. Next. Planning. Planning. What do you expect in like in the VR in the issue? What I want to see in the issue. Some identifier. Right? What else? Requirements. Have been. Tests, CI, verification, I guess. Yeah. V steps for reproduction for verification. Yep. What else? Think simpler. Like think about the steps. What would you do if it was 2019 and you had to do this as a human person? Go to your source control. Right? Open an issue. Like not of any issues, sorry, like bright code, the fix, write the fix. I guess. Right. Yeah. Check out a branch. Yep. Write code. Noble idea. That's even scarier. Evolve. Well, evolves for code. Yes, exactly. PR reviews. What if it's not code? What if it's like a readme file? You don't need test for that. So it's kind of even documentation. What do we need? To. Review? Someone to review it. Yeah. So basically. Yeah. And maybe like a bunch. If it's a documentation. Then just do nothing. Just thank you for documenting it. Oh, here you go. Yeah. Which is different from code. And then let's say everything works perfectly. What's the last step? Bird and deploy and deploy if all green. Yay. I think we have good processes. I think we have a good process here. Like I've put numbers on these, but this doesn't matter. That doesn't relate to the order. It's just the things that you call it what they want to do. Like we want to document the process right now.
So what I want to get out of Claude right now is like a big. Mega prompt. That's going to take care of all of the things that we mentioned. And if you think about it, like when we use agents, think of this dark factory thing. It's kind of like we've got an agent, we have an ultimate goal. This stuff in between is really what we're talking about today. And a giant mega prompt is one way that we could achieve that. Realistically, like the models are good enough. So I'm going to say give me a mega prompt that will help me take a linear issue and turn it into a merged PR. If anyone would like to race me and type. Hey, let's go, GitHub instead of linear because we can do link github right now with the workshop and linear maybe forgot about this. Okay, github issue turn it into a merged PR using. All of the. Steps. Above. Perfect. Cool. So we're going to get a mega prompt out of Claude. In tokens. So I was having some issues. Yeah. So I don't know. So today I want workshop obviously Claude Henna knowledge because this is how we roll. It would be not interesting otherwise. Oh, okay. Look at that. I'll get to all that into a single usable prompt and save. Okay. Done. Or it's just me. Okay, great. Okay. I need to. Dig into something. Go for it.
Let's start with this first line. Just thought I'd make it ignore the errors. So you were a seasoned world-class senior quality engineer who's 20 plus years hands on that experience and suffered. Just raise your hand if you've used like a line like this. URA something. These don't really work. These are a little bit of theater. There's also a lot of what I called begging. Usually like all caps is kind of a smell for begging. It's like please do. I capitalize it so I'm serious. Make no mistakes. No mistakes. Yeah. A million ARR no mistakes. Well make no mistakes works. I don't believe otherwise. Make no mistakes work. One other thing to point out and then I'm going to hand over to Baruch. So we have a conditional in this mega prompt. If you know about how LLMs process input. They don't do branching. When it comes to like imbibing a prompt, it's all or nothing. It's going to read all these tokens or none of them. So this conditional is useful for telling it what behavior we want. But if this is a documentation to get, we're going to read this entire branch anyway. So all those tokens are going in regardless. Here. It's a month rot. Come on. Okay, step one, step two. Okay, so let's see. Step two, if it's a documentation ticket, I mean just like don't summon bothering your human spine. If it's a code ticket, step one before you write a single line of code check if project actually has tests. Okay, that makes sense, right? We want that. If there is no test, do not even start implementing and don't do it. That's great. Once you're confident the project has tests, go ahead. Now run the tests. And step four open pull request. Step five and this step everybody forgets. You have to second compiler to review the pull request. For conduit download whether I'm from human, we use copilot reviewer and then someone through github graph like API so that we have a pros of how to use graphql, like put the word bearer following space and then like, you know, talking that this is good. 
And right. So I mean what can possibly go wrong and the body request JSON, it has query field. After you summon review, you can also continue because if you do not come back instantly, you have to wait for it. So pull for it like this. Waste about 10 seconds. Send another graph in our query. Fetch your face. Come on. Once commander, I read all the comments about any comment. Put any changes. You have to read a single line. This is brilliant. This is actually this. Will work. Right? I mean probably most of the time. It's a prompt. Okay, this is everything. Follow all of it. Make no mistakes. Take a depression. We used to do all this by hand cut. We did. We did. We did. This is like. Suspiciously complete. It isn't specifically than all what I will say about that. All right.
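Editor's sketch (not part of the spoken transcript): the mega prompt described above spells out, in prose, how to send a GraphQL request with a bearer token and a JSON body containing a query field, then wait about ten seconds between polls. A minimal Python sketch of that same plumbing follows. The function names, the 30-attempt limit, and the injected `fetch` callback are assumptions made for illustration and do not come from the talk; only the endpoint URL is GitHub's real GraphQL address. No network call is made here; the caller is expected to supply one.

```python
import json
import time
from typing import Callable, Optional

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"  # GitHub's GraphQL endpoint


def build_graphql_request(token: str, query: str,
                          variables: Optional[dict] = None) -> dict:
    """Return the URL, headers, and JSON body for one GraphQL POST."""
    body = {"query": query}
    if variables:
        body["variables"] = variables
    return {
        "url": GITHUB_GRAPHQL_URL,
        "headers": {
            # The word "Bearer", a space, then the token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }


def poll_until(fetch: Callable[[], Optional[dict]],
               interval_s: float = 10.0,
               max_attempts: int = 30,  # hypothetical limit, not from the talk
               sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Call fetch() until it returns a result or the attempts run out."""
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before asking again
    return None
```

Written this way, the waiting loop costs no tokens: the agent only needs to be involved once `poll_until` hands back a review to read.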
Okay. So this is Macey maker. This is Baruch. Source here. And we are our social credentials. Our bottom of every slide because you will feel a certain urge to praise this workshop on socials. And boom, it's right there. So you can easily do it in whatever. Once I was a cat. Pure engagement date. This is a cat themed slide deck purely for enjoyment services. You're welcome. Also one of us, I will let you guess. That. Look like this. Do what you singing. Cats. One is called any Russian speaker skills. Okay, so because it is ears, which is in Russian and the other is. Fail because he has Tessl. So you have to join us in our journey to a better context engineering. The most important slide is this. The QR code right there is for everything that we are going to talk about. Including all the demo that will emerge out of this talk and everything else about this QR code will appear once more in the end for skeptics of you who think that it won't be so awesome that it requires the scanning right now. There are two people like that in the crowd. And this QR code is going to disappear in five. Three, two. One. Half. Quarter. And chaos. A. Excellent. All right. So just housekeeping, as Baruch said, this is. Like kind of pushing the boundaries of what a workshop is supposed to be. You can follow along if you want to. You do not have to. The purpose of this session is just to give you some ideas about how to treat context. Properly as real software as we believe it is. So we hope to inspire you. If you want to follow along, you're going to need probably Claude Code or agent of your choice coding agent of your choice installed and ready to go. Ideally like some kind of sandbox repo like a little hello world thing that we play with. And our Tessl account plus a workspace setup would be ideal. We will talk a little bit more about where Tessl comes into the process. But I think that will help a lot if you're trying to follow up. Right. So we are in exactly the same boat as you are. 
We have nothing prepared. This is how you come to a proper workshop. So Claude will, and in your presence open up in you repo with your word and we're going to start a lot of tickets on how to improve and then see what are going to happen with no feet. All right. So shall we go for it?
Hey, do we want to do a first weapon file first issue and then point our mega prompt on it? Let's find out. Let's see the problem actually. What? So what we're going to do here is we're going to open a new Claude. And. It's not on screen because the slides are on screen. The slides are. Okay, so that's the page, right? And the slides are right up. So and I'm not logged in. So they should be fine. It might be a browser. What does. It say? Okay, we'll fix it. Go get the slides. I don't know what happened, but we'll fix it. Okay. This code and am I in the right directory? Yes, context workshop. Okay. Let's create hello world and art and open. An issue. Yeah. I have no idea what's going to end here. I don't care as much. Because. Whatever good or good. It doesn't have any context. It's a brand new. Everything is super authentic. Your actual request is simple. So two things create a word and open gamma machine improvement. We have that problem set to be referencing that. Yeah, exactly. So now we can, okay, it has hello python. We have got right here. Okay, so we have yellow python, which is amazingly lower in python and we have an issue. So if we go to jar github. Right, let me make it too much. Water is it's a workshop here in a workshop. We have an issue. Boom. And this issue is improving our. Suggestion, accept name argument and tests at basic CLI, bootstrap acceptance criteria. That's a corporation. And what I'm going to do now is should we go to place or point it to, we can point it to the. Front gauge. Let's see what happens. Also remember all of the steps we kind of told it to do. The models are good. Like they should, they should understand everything that we want to do. Look at the tokens consumption. Now. What it does now is actually parses the entire promise into context. Because prompt is all or nothing. Also time. It's the cycling. Also takes a long time. It is extra height, I believe. 
We'll check the model but it should be like delayed conspiracy or on the latest thinking. I draw all the money on it because it's a genuine issue that requires like all the llm powered together. Just for context, please like if you have questions as we go, please ask. Oh yeah, I'm trying to get this in interactive session. I'm going to be a bit of a great chorus to Baruch, but like please. Shut the car. Yeah. If we try to follow along. Just clone the context workshop AI deployed on London 2026 repo what you're doing locally. Or let me commit the prompting view real quick and let me go. And then we've got a Tessl account. Is there anything else we need other than our own collection? Just a workspace in Tessl where you can upload skills and set off emails like that. Ideally yes yes exactly. We are going to move. Like kind of fast which is why I said at the beginning that you don't have to follow. Yeah I mean this is just pure fun and most of the things are not something super old to just follow along or you can actually type it with us which makes it fun. Ny 8.8,000 tokens. That's a decent amount of money for like improving. It's good that Tessl for having a name for actually yeah I mean it's python I couldn't do it by hand but even if it asked me to because I'm a drama guy is showing here but. Yeah. Look at it look at it PR2 is open now to step five it's actually go through the steps look at that. Is it's pretty impressive Old Test passive so it actually added text I guess right so here let's see what it does. Okay you asked me. So issue one makes first asking progress Turn prompt does come first. Look at that it added actually test for hello work that's very important. Confirmed that has failed because nice. Fail is expected. Now implementing the feature it has the flag now now running step test again. Drew is offense five request for pilot review okay let's see what it does. 
So first of all it didn't use graph right it says like no not going to do it instead I'll use. API GraphQL it uses the binary that is installed on my machine which I didn't allow but it didn't ask. Well nice of it reviewers available now I request review will request reviews user ID. Check okay now it's gonna do that's burn the tokens step five succeeded compiled reviewers now reviewing and now it's going to do what? It's do pulling and now I guess we wait for the copilot. Okay let's see what's going on in each. Thank you for the do think about actually doing something. Yeah. Obviously pull request here we go summary at Sadog criteria compile they are reviewing. Let comment. Actually requested changes for. Oh. This is excellent. Look at. That it's ripping apart our mega frogs yes okay so what's going on okay now it reads the reviews it's going to be very disappointed. If bringing about our no it's actually have comments on the on the real thing on the code yeah default name world is duplicated between greed and the earth phase default. I mean you've never seen anything like that like a three phase review in the whole world in python. It's as extraordinary and this is like I wonder how much it cost between this and what compiler costs. For fiscal world. Less than one hourly review like that. And one flag stands out. Includes prompt which are deliberately didn't come in it's a race condition without commitment. For push because yes because that's our own okay it's going to be fun okay so okay you got the idea right it's going to eventually it's going to work. Okay so. Yes sorry. It's a proper. Question pushed it right now. And like both compilers and my agent are both mad at me for that time. Now I think I think it's fine all three round to future branch yes but I think now it should be. Now it was automatic green branch but I don't think it's. Yeah it's here okay fine.
Okay so what what can you do better I mean that was hard myself. I was kind of a hot mess what can be. Well I think one big as you saw and the same applies. To software as well as we all know like one big mega prompt one big God component like. One huge piece of responsibility probably will get the job done. But it's very difficult to maintain it's very difficult to expand upon. And the same is true for our context like this mega prompt worked. But it wasn't very efficient it used a lot of tokens it used a lot of money and I think we can. Decompose it and probably follow like some basic software principles to actually make this better okay just like a stupid question in June 2026 but who uses. One option is that one Henderson okay who doesn't use two? Okay we didn't get so you will start using skills right off of that oh actually right now because you were following along yeah. We have a skills version in the room and they're going to use their first skills that's everyone clapped I'm kidding.
All right this is awesome I'm excited for you okay so what I'm going to do now is we're going to ask. Our agent because we're not going to do it by hand identify. Skills. I keep forgetting that sorry about that but thank you very much and do remind me next time because I will forget identify skills in the brown indeed and extract. Them as kids. So there's a bunch of candidates for this. I think the first thing on my mind oh yeah let's let's make more precise yes yeah the first thing on my mind is that conditional where we're saying if there's a code change do this if there's a docs change do that. I wonder if we can say like actually just extract what I'm going to say okay. So what I'm going to say here is we have some skills. Candidates. In from. And we probably need to move this problem beyond of this project because yeah I will do it right now clinical next move the front indeed yes. I have. A question since my. College know. What skills are. That's a great question. To be honest like most of the agent harnesses know what skills are in them sorry. They provide that yeah I mean in Claude's case Anthropic kind of invented the skills paradigm so their bottles know what it is but it's quickly been adopted by the other harnesses who understand what it is. For. All of the. Fuel. Creating. Skills they. Get. More. Or less the harness. I mean there's been a blurring of lines I guess like does chat GPT know what well I'm sure chat GPT does know what agent skills are because they're kind of ubiquitous now but. Yeah. The history is like engine history October 2025. 
Remember this is where actually skills were invented and it's mind-blowing it's like six months ago now everybody know it in music a drop it was the ones who came up who was the who was the concept and it only worked for Claude for a couple months until the other harnesses kind of and it's in the balance in the CLI so it's hot coal it's codex it's it's whatever Gemini CI or whatever hunting ground CLI right now they are the ones that know about kids now what is so special about skill except of that now we will have two separate files that will. Like. Be called in two different ways so what is so special? What is so special is a tiny bit on top of this prompt which is called the front mechanism from leather it's a yaml piece inside markdown file because God hates And this YAML part has two. Items. The first is name and the second is description. The name is the name, but description is description is where the magic happens. Our agent. S are going to preload all the available skills. For them, but only the discriminators, only the inscriptions. And then we'll apply logic or reasoning, should I say, to determine which skill should be invoked when. Now what it means for. Is that the description is absolutely crucial. This is already quite a differentiator from like the big prompt we had. I mean, not only was the palm garbage, but it's getting sucked up every single time you open an agent session. So there is no conditional activation. It's in your context. It's taking up your context window no matter what you're doing. So this conditional activation for effectively like little reusable atomic prompts was kind of a game changer. So here we go. We have two skills. They are saved into dot Claude skills. Which is obviously. Claude specific location that was working on code that we use now, but we won't work in any other harness that you want. And this is the magic. This part is the front matter name and description. 
Now here's the description workflow for the communication only ticket only changes that only changes the rhythm files under dots and then it's going to file or plane fix with no source of changes. And then it has our flow, which is exactly extracted from our uber product. For example, do not sound water dirt. That was our part. And here is our cord ticket. Workflow for ticket detaches, real source code. And this one has the entire shebang, right? The whole nine yards with graph. No. Can I ask a question? Yes, please. Did you give it some context to put it in that format? No. I think the last time I tried to create a skill is created it as a command. So by now all the harnesses know what creates skill is. And there are slash commands that I can assign to skills. And they are optional. I can say invoke this skill using slash command. But I can also leave it to the harness to determine when to invoke it. And that's exactly what we need. This one will apply reasoning to decide which skill it should involve. Our. Workspace. Get these skills you're going to use. Locally.
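Editor's sketch (not part of the spoken transcript): the speakers describe a skill as a markdown file with a small YAML block on top holding a name and a description, and say that only the descriptions are preloaded while the body is read when the skill is invoked. The Python below illustrates that split under simplifying assumptions. The skill text, its name, and its description are invented for illustration (the on-screen description is garbled in the transcript), the parser handles only flat `key: value` lines rather than full YAML, and `preload_catalog` is a hypothetical stand-in for whatever a real harness does internally.

```python
SKILL_MD = """\
---
name: documentation-ticket
description: Workflow for documentation-only tickets. Use when a change touches only README or docs files.
---
# Documentation ticket

1. Make the documentation change.
2. Open a pull request.
"""


def split_front_matter(text: str) -> tuple[dict, str]:
    """Split a skill file into (front-matter fields, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter of the front matter
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def preload_catalog(skill_files: list[str]) -> dict[str, str]:
    """Keep only name -> description; skill bodies stay out of the catalog."""
    catalog = {}
    for text in skill_files:
        meta, _body = split_front_matter(text)
        if "name" in meta and "description" in meta:
            catalog[meta["name"]] = meta["description"]
    return catalog
```

The catalog is what sits in context all the time, which is why the talk stresses that a description must be specific enough to trigger on yet short enough to preload cheaply.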
So this is a great question. We didn't pay any money. And it's like that's the free advertisement that we get because we're so awesome. We're going to get that. Thank you for asking. Those. Yes. We encourage two cents of context with them skills, allowance. Is it that explicit? So if you put too much information into description, what it does, it will tell you good news. Now ignoring the rest of the problems. Beautiful. That was my next question that I will ask you. How do we know? If this description is good? There are two conflicting. How to the second strings. In the description. The one is if it's detailed enough. For the agent to know where to pick it up. The other constraint is we just said all the descriptors of all the skills are preloaded. So we need to keep it small and tidy. How do we figure that out? Yes. End of workshop. Things like. You. Must absolutely use. This. Skill. When. Something occurs. Or when. The user requests. Something. So. I think. One thing we'll touch on a bit more is. Skills activate pretty well depending on the harness. They never activate perfectly. We at Tessl have done a lot of work on like researching this and actually trying to get some numbers. And I hope no one from Anthropic is in the room. But Claude Code was actually one of the worst activators of skills. So no matter what you put in the prescription. No matter what. So even if you're begging, even if you're like, you are an expert skill activator with 20 years of skill experience, still not guaranteed to work. So I think it depends on the scope of whatever else you're working on. How kind of likely it is to activate. But in general, they're not perfect. We use a lot of slash commands. And actually kind of expect a gentic process calling other agents to also use slash commands. This is a little more reliable, even though it's a bit light. And meta. But yeah, it's a good question. You can actually turn into service. On or off. Can you give the same thought? 
The deterministic way to interact with scale would be to invoke it with its slash command, which is like splash. So if you've created skills that you don't want to just. Off hand. Le. In the conf. Erence. Codex marketing team, everybody. Okay. Okay, here we go. What I did, I asked Claude to ask Tessl to review my skills. We have a command that reviews the skills. To figure out this tension that we spoke about. How well it activates. Is a description to bloated or what's not. And this is what we got. The documentation ticket is 76, which is okay. I would say that 82 is also not great. So the description and documentation ticket is really bad. 68 is like terrible. Right? No explicit use when. All right. Thing on trigger facing is these keywords that the llm is going to use to know where to trigger. And steps are directional rather than pressed commands. That's a problem or prompt actually. That is like very prose, hey, do that. Do not forget instead of like live an iPhone. Okay.
So now we're gonna put those skills to the goodies. So first of all, I'm going to. Delete. Our mega prompt, right? Or should we move. Into my spirit, some people might find it useful for later. Oh, come on. What do I want to do? I want to do. This. And yes, So I want to. Do this. I will go here. And project and I want to go quantum workshop. I'm going to take this prompt and I will move. It. I will move it. I love this part of. Workshop. Yeah, that's funny. That's like, that's how we all. Here we go. Hey, no more. Okay, clear context. Boom. One. Side. And create a documentation. Improvement. Should we just say agree? Does anybody. Have a. You know, I mean, it's too much. I need to go and check if we make these and it's not come up. It's another idea for the community. Come on. So much work. So much work. It actually foundry. What did we find? Where did this comes from? Is it like. It's read me? I just moved the prompt. Ah, it's in the. Ripple. Okay. And just like that. Our context is polluting me. And okay, the computationally issue two sections prerequisite and what's not. How do I make it not inverted? Should I delete it from this? But it has the readmen added to the dot floor. Yes, yes, yes, yes, yes, yes. Yes. And to Claude ignore whatever it is. I've not had one call in order. If it's not a thing, it's going to create it, invented, fetch the CLI. Make it ignore it. It actually knows Granola is. That's cool. Today I learned. And I said, yeah, I did. Oh, okay. But I haven't used that. I really like that. So it doesn't rebuild your tokens. No court settings exist. Only skills will create a meeting. How do we look more in collects? Codex monitoring. Just uses git. No, is it the right thing? Is it always the gift or should we the agent ignore? Running yourself in the session. Oh, it's so much work. I'm changing the right command. Electric commands. Right commands. All right. I think it's done. Okay. Yeah. Fix documentation issue. What is issue number two? Yes. 
Is it number two? Maybe people open up a bunch of issues while we. Ignore number three. I told you. Ah, because they employ for example. We also have enclosed that. It's fine. All right, let's what I want to see now is activation. Right. So what I want to see is much smaller usage of. Okay successfully loaded skill. Colonomics. We have no idea. Right? It says. Oh okay I'm blind. The condition. Yeah, yeah, it should work life skill activation. Okay, it will do that. So what did we learn? Skills are awesome. They are activated properly. And love is good. Job everybody. Next. Conditional activation less complex lets money than the mega branch that's kind of our. Listener.
Now what else really really bothered you in the megaprompt? Yes? It's just too much. Things. Yeah too much things but that's it that's a fixed right the skills now we can obviously ask Claude now to tear it apart to proper skills it will be a lot of skills and then we'll solve the single the lack of single responsibility. What else? The field like make no mistakes in hope that it will do something yeah absolutely and then again put cloth to clean it up it was just pure fun obviously should be there but you all know that what else? Stuff that has to be automated? Very very good what does it mean what what like really what was really stupid? The pros. Of. The scripts. Right how. The. Comment was stuff that should be automated. Was there something with rough field in there yeah yeah exactly right so you remember that it was in prose on how to make a graph ql api code. It was like use token and like was too specific and not adaptable. It wasn't prose when it should have been. Code code it was code really improves it was absolutely. Bonker. That was very big gentleman. Yeah everyone here thinks your British I'm more because I have a dopamine. Yeah. So I think that. S a really important takeaway if you have been working in this towards this like dark factory dream that we all have you've probably noticed this yourself there are a lot of places that agents. Insert themselves that they don't need to be and I think in our mega prompt like we had a ton of kind of smells of this like you know use the key this way use the API this way use it in this exact situation. For this exact amount of time right make sure request a PR review check that. It's made comments check that it's passed check that it's approved it seems like almost if I want something predictable I shouldn't use llm and should use code. For the amazing idea. Something predictable like I wanted the exact prompt in the beginning. How did I get the mega prompt to be exactly the way I wanted it. I scripted it. 
Because I couldn't rely on an llm to actually produce the right prompt every time. So that was a script. Everyone I did like and that's a lesson for us today to our work with the llm every time we want something predictable we walk away from an llm and convert it to script shall we? Okay I remember to exit the presentation. I hit the front back right for. 8. Great I don't need the prone bag because we have the school okay. Find. Scripting over to in the skills. Everything. Goes to. Right. Okay all right now the reason kind of you know how agents and models want to do the right thing that you asked and sometimes get like over overboard you say with that. Oversee so what would be. The over going overboard with this request? What is demonstra. Tion? What the agent will script that it should. There is one very classic example. Creating call no I want the grade book but something that shouldn't be screamed that will actually try and make it quick. Working with text. What. People hate and agent love. Regex. Agents love regex it's like it's their go-to tool for everything. Now when you do this exercise. You need to be aware of this exact problem the agents are going to script. Working with text. To on TypeScript by using regex it will create stupid lists of allowed words and and like blocked words and then write absolutely crazy regexes and trying to guess what the text will be. And this is obviously wrong thing to do because we want the reasoning to be applying to reading text so this is how I'm going to be working order what's going on what language should be. Right back. Does. That like increase the token usage in the by regex sort of filtering out even of inefficiency? So generally scripts are great for degreasing plugin use right because I push everything to scrapes no and it's invoked scripts are direct invocations zero tokens zero money scripts are awesome the problem with the waves work is that it's going to start missing stuff. 
So imagine you have a skill that reads your emails and surfaces important stuff. And it's all on the left it's reasoning about hey this is a message from the bank about large transaction I should surface is important. And then you ask it to script it and it's like okay I'm going to come up with the list of keywords that will be important. It will have words like bank and transactions and this kind of stuff and then I'm going to outsource all of it for ratings what is going to happen? It won't be that great it will obviously pass a lot of noise. Include a lot of noise and skip a lot of important stuff. Because this is where I wanted to actually work. Here I'm ready to pay the tokens because I need reasoning so there is a thin line of where you actually need to be aware. In some cases the line is not as thin as my cake and you're finding this with the orchestrators that have come out so speaking of dark factory orchestration has anyone tried Symphonia? Codex market painting? Yes. So you know then that Symphonia is on a very far end of everything is agentic every single thing is agentic and this is the story of how I spent 60 million tokens trying to check if the PR was open or closed. With me thank you openai but I expected they'll fix this but it's an interesting architectural choice just burn as many tokens as possible as many tokens as it takes for me to be totally hands off. That's not really the right approach when it comes to these things right like. We don't have cutting edge problems that agents need to refrigerate every step we have problems like we've been having right check that the PR is there check what issue number this is is it open or close is review passed or not? Simple things all of these things should be extracted anything deterministic anything like you know two three four branch conditional just make it a script and agent doesn't need to figure out everything and for us to check make the dove change. 
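Editor's sketch (not part of the spoken transcript): the point made above is that simple checks (is the PR open or closed, has the review passed, is everything green) are small branch conditionals that can live in a script instead of being reasoned about by an agent. A minimal Python sketch of such a decision function follows. The `PullRequestStatus` fields, the review-decision strings, and the `NextStep` names are hypothetical and chosen for illustration; a real script would fill the status from whatever API or CLI the project already uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NextStep(Enum):
    NOTHING_TO_DO = "nothing_to_do"
    WAIT_FOR_REVIEW = "wait_for_review"
    ASK_AGENT_TO_ADDRESS_COMMENTS = "ask_agent_to_address_comments"
    MERGE = "merge"


@dataclass(frozen=True)
class PullRequestStatus:
    is_open: bool
    # Hypothetical values: "APPROVED", "CHANGES_REQUESTED", or None if no review yet.
    review_decision: Optional[str]
    checks_green: bool


def decide_next_step(status: PullRequestStatus) -> NextStep:
    """Deterministic branching: no model call, no tokens spent."""
    if not status.is_open:
        return NextStep.NOTHING_TO_DO
    if status.review_decision == "CHANGES_REQUESTED":
        # Reading and answering review comments is text work, so it goes to the agent.
        return NextStep.ASK_AGENT_TO_ADDRESS_COMMENTS
    if status.review_decision == "APPROVED" and status.checks_green:
        return NextStep.MERGE
    return NextStep.WAIT_FOR_REVIEW
```

Only one branch hands control back to the agent, and it is the one that involves reading text, which matches the split the speakers draw between scripted plumbing and reasoning.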
Skill judgment perfect right this is exactly that it works in text it cannot be scripted branch plus commit plus plus pi script plumbing. A script human use don't summon the bottle this is a rule. Hold that. Marriage wants you and approves. Check deterministic okay. And human not both is so this is for documentation right let's see what it did for code and I expect what's the name graphql thingy to go into doors to manuscripts do we have already. Writes all the scripts and it's not there slow right.
Okay while it will do that. We probably have a video on my presentation. Totally absolutely real of video money so Macey asked if my cats can do tricks because we need to use it in our presentation so I called home and. Some of my wife like hey quick and teach us to press button so we can take a video and put it in our presentation and of course you need. Come on. That's a lie patching of powerpoint during June slides this is something that you never saw coming so look at that I'm doing live patching not only you learn skills you also learn how to work with the parts are really useful presentation skills exactly I have a video and I can ask the question yeah. So you've got a code skills do you have anyone on your team that doesn't use code and if so how did they make it incredible we're all agent agnostic I mean Tessl this is extra true because we want to make sure that our software works for every agent and we're also like interested to do emails all these things so yeah I am a Claude user actually we use Gemini we use OpenAI we use anything you can imagine skills work in all of these cases I think it's really important to have agent agnostic skills actually this is something that Tessl does this is why like we bend our scales to the agent agnostic it's also the case as you know that these agents are like you know one week one of them is the best and the next week another one is better so yeah we use all kinds of in the .claude folder. Or is that just for the stem one that's just required yeah so .claude is obviously Claude specific if you run Codex now in the same directory it have no idea that you have rules so how do you. How do you have the same skill for multiple it's a great question we're coming up to that right next but first is it your request with a pushy pressing the button pressing the button? So Ushi is a real cat is not video of him but he's real cat come on he can't write. 
A multi-million dollar AR app well it's it's not something that you can he cannot it is actually proven that he cannot he has tried. But he can push a button give it a condition he can push a button and we talked about determinism I want to talk also about smart models and dumb models. Who's excited about Claude 3.5 Opus? Yeah. Two points on the finance benchmark yes no I'm kidding I'm I'm excited too I'm excited because the frontier moving is always cool I love with these foundations a lot foundation labs do it's very interesting. But it's so interesting for us it's not interesting for our day to day. Because we don't have like I said we don't have genius level problems we just don't we like to think we do ask your therapist about it really don't. I use 3.5 for our hello world demo because 3 is perfect it requires this level of sophistication. But I think. We're getting to a point now where like all of our problems were solved actually many models ago Haiku can probably do most of what you're doing and this again feeds into the conversation about like token usage and cost and just general responsibility with the software that we're using and wastage. Like a dumb model and open model can probably do 90% of this workflow that we're talking about. Well that's absolutely true in this like hello world example but it may be true on your code base too. I think it's worth thinking about what part of this process how atomic can I make each step such that the dumbest model in the world can use it such that. A free model can do it. I really think this is possible and it's something we're going to be seeing a lot more of people kind of moving away from the expensive frontier models for stuff that's more trivial and it's because money right the price flying into your model are absolutely insane and don't make any sense. I mean the build that we will going to rack up for this demo is absolutely mind-blowing and it doesn't make any sense for this kind of work. 
How can we use cheaper models? So obviously the free models are free if you run them and not all of us have the infrastructure to run those models but we don't need to. There are at least two great projects out there. The one is called open router. The author the other is called liteLLM that allow you to run the free models for almost free. They take a little bit of money for hosting those models. But you get them for an absolute fraction of the cost on the frontier models and they can do most of the work. Now how do you know what can you delegate to those cheaper models? How they know? You asking who you. Re model. I'm not kidding this is actually good use of printing model because it requires a lot of reasoning to decide work and go well. But once they laid out for you how to get rid of them. All right here is our where is our table for the code? No, that's the summary. I want to see the decision label first. If I can find it. That though is not excellent. Okay. Talking about monetary. Okay let's see. It. Okay so five determinant script here you go run test we have run test using scripts open VR request panel review. This is our graph URL trial primary stuff resolves and summons await confirm with you the weight is in the script now and where it PR. OOK. There are test. S. How about. That? Which code we check if we test how amazing. Okay now this is also something interesting. Look at. Determinism war moved to script but I deliberately left something for rules. As a rule for it that's ushi thank you very much.
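One way to picture the resulting decision table is a static mapping from workflow step to the cheapest tier that can handle it. This is a sketch under assumptions: the step names, the tier assignments, and the frontier fallback are hypothetical and do not reproduce the table shown on screen. No model provider is called.

```python
from enum import Enum


class Tier(Enum):
    SCRIPT = "script"      # deterministic, no model call at all
    CHEAP = "cheap"        # small or open model
    FRONTIER = "frontier"  # expensive reasoning model


# Hypothetical assignments; in the talk the frontier model itself is asked
# once to propose which steps can be delegated.
ROUTING = {
    "run-tests": Tier.SCRIPT,
    "open-pr": Tier.SCRIPT,
    "write-commit-message": Tier.CHEAP,
    "plan-change": Tier.FRONTIER,
}


def tier_for(step: str) -> Tier:
    """Look up the tier for a step.

    Unknown steps fall back to the frontier tier. That fallback is a design
    assumption of this sketch: paying for reasoning is the safe default.
    """
    return ROUTING.get(step, Tier.FRONTIER)


if __name__ == "__main__":
    for step in ("run-tests", "write-commit-message", "triage-new-issue"):
        print(step, "->", tier_for(step).value)
```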
Skills are activating well is. Aspirational curve. Sometimes it activates sometimes it misses there are things that should not be missed. And those things that should not be missed can be put in another primitive that it's called rules. Are tiny prompts. That are going to be there at every time a conversation is sent to the model. No matter what's going on. Downsides of this amazing process. Where burning tockets right so what do we need to remind what do we need to keep in mind the road should be. Time but when we need 100% guaranteed of activation this will be the withdrawals should we get almost and our rules just a Tessl prim. Itive no so the rules are exactly the same thing as skills the rules are actually primitives that are per agent or per model and every agent model kind of treated a little bit tweaked a little bit differently. For our. Screen oh yeah I forgot it again you see I remember one time and I forgot the name okay so this is a well rules. In second that's our worst time. I might not be great yes yes so it kind of find a place for it oh yeah the one thing I didn't do is extracting rules okay. Okay I want me to put those in the let's do. Right so rules are you will see rules are much shorter than skills because rules are activated no matter what but rules are also important what went into rules? We have at least there it was. This is we had rules somewhere. Okay so here's an example of a you remember how we said in the connotation we don't want GitHub review that's that's a rule that should happen no matter what so few don't something about conditional routing this is our desire rule a should always. Okay so it will fire the now and we will see a rules and also. Yes. Every time. I. Believe is at the start of the session so repeat the question. Just to ask you yeah so yeah the question is how the rules are activated are there activated are they re-injected in every turn? And the answer is depend on all. 
The hardest on the hormones yeah I think in general in most harnesses they are loaded in the same exact way that like a system prompt or Claude with you loaded so once at the start so as with all these things like big asterisk the further you can get on into the window. You know your mileage can vary and again the harness my refresh it when it decides it's kind of too like burned out there the rules are activated period this is something that we know for sure. I. Was. On the. Opportunity. To create. Scripts. We didn't have. Oh there's a good fidget. So hooks is another primitive that is pretty new and very varied. Agent dependent. The concept of rules exists everywhere and kind of we understand how to do this we inject it to like 11d and the rolls are out there hooks are very harness dependent and they will be completely different for each and every one of them. Now you can and you should add hooks the classroom is back to the question of everybody how do we make it cross agent. So we're going to do that as well. Doing groups. So. Rules can be bundled with skills we're actually going to show you how to do that in a second but the idea that they don't count as tokens so they do they're kind of mandatory tokens skills. The only part that doesn't count or the only part that doesn't get loaded every time is the actual body of the skill the description will be loaded every time so that's also mandatory tokens. It's just taking a little. Less seriously I guess if you will federal by the agent does that make sense? A project. Where they will. Go. There you go. All the way inside the script. Get out here if you're repeating stuff exactly like if you're finding yourself repeating a bunch of rules maybe that should be its own contextual primitive in your project like that's a great comment. Yeah okay. 
Great pattern yeah great great pattern safety agent safety we all do it perfectly and here is our rule activation right there whoever mentions different prompt was completely was absolutely right. The project rules are allowed directly including. Now what does it mean that loads of deed that is Claude only you won the rules lows and Codex how do we let we put them in. Structure. And go for. Encounter. Rooms we're agents out of day ranking our rules agents of living deed rules of indeed every harness and it's all in their. Own little thing. Okay we have number five clear. Fix issue. Well it does that you can expect it to use rules to use the scripts and the skills now which is which. Okay. Sorry what is that. Quick question. About the rules yes. Skills. Are what's to say. What ignore. This. Is. From sometimes you can do what's in the process yeah so. It's really special. So it's just treated as a different type of primitive like within an agent's context window. The word prompt is doing a lot of work there and that's exactly how you can actually look up like you can look up Claude Code system prompt. I don't know about the other agents I'm sure they're leaked online somewhere but you will see in the prompt a place where it's like these are the scale descriptions you have available it'll just be a big list rules are activated in a different way you're not going to find them in the system prompt like listed out in the same way so it's just down to kind of the architecture of the harness yeah exactly right they they're given more priority because they are sitting prompt and not user. Content yes. This is another reason to keep them very short and very clear very short very clear and it's also kind of. Like. It's a good cue to think about what even should be a rule. Like if something is a skill description you think I actually want that to be a rule I don't know like how how fuzzy is it really? Kind of. 
Oh yeah absolutely thank you for calling me on that so basically our system prompt is now pointing to a rule directory. Yes. That's the way you load rules in what there is no other way. It is not a negative primitive because Claude doesn't support rules out of the box as opposite to Cursor Codex Gemini I'm not sure about Gemini they have native sports Claude doesn't have native sports rules that's the hack that we do for Claude 24 souls we put it in .claude and point rules those are rules that first scope now. Other harnesses a Cursor for example have conditional rule activation. You can say only activate a rule on a files.py or *.py only on Claude doesn't have that we actually try to scope it within our code within activation which is you know some optimal but it is one of those so scope any ticket detached source project is noticed give a continge nothing to scan therefore before implementing optiget you must run the latest ministry gate. It will run the script. We're good? All right.
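Conditional rule activation of the kind attributed to Cursor above (a rule that only applies to `*.py` files) can be approximated with a glob match. This is a sketch of the idea only, not any harness's actual implementation; the `Rule` type, the rule names, and the rule texts are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    glob: Optional[str]  # None means the rule is always active
    text: str


def active_rules(rules: List[Rule], path: str) -> List[str]:
    """Return names of rules that apply to the file at `path`."""
    return [r.name for r in rules
            if r.glob is None or fnmatch(path, r.glob)]


# Hypothetical rules for illustration.
RULES = [
    Rule("human-review", None, "Always request a human review."),
    Rule("python-tests", "*.py", "Run the test script before committing."),
]

if __name__ == "__main__":
    print(active_rules(RULES, "src/app.py"))
    print(active_rules(RULES, "README.md"))
```

Scoping keeps the always-loaded rule text short, which matters because rules are paid for in tokens on every conversation.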
Now. Do your thing. He was asked again and again. How do we make it cross age. How do we make it distributable. How do other people on our team can take advantage of all this goodness. You might be thinking you might actually do this now. I can just commit this stuff in the github that's a place where files. Go right. That's the place where sources. Go. Now how things that all those artifacts the skills the scripts and the rules are just like source files text files that are fine to put them in github. Somehow like this quarter okay I have good news for you there now context. Is an artifact the primitives that we spoke about the rules the scripts and the skills. Have to be bundled together as an artifact because they only make sense when they work together. And the only way to manage an artifact is not putting it on guide who commits ambient baggages to github? Thank you who commits jar for almost to github? Thank you whoever don't do that won't do it in this context artifact as well. We have a register for that. I mean. You want the version. Controller somewhere. Says sources are going to gihan. Package artifacts are going to register. We discovered 25 years ago. Please keep doing that that way okay please keep your sword skill sources. Script sources and rule sources in github where you tag them when you package them use registry and then you will have a distributable piece of software that works just fine. Right how do you do that? Amazing how do we do that? Both of us are Tessl employees so you should know the answers. But. In. Headache ending the world. Turn the 16. Th deterior. Ate. Yeah right right. Yeah. So so the question is about references and skills and if you haven't seen this before it's like a skill might say a skill might kind of be the top level instruction and then it also contains like a reference just folder that has sort of sub instructions. 
We discussed it like there's a there's a recursion kind of problem with this talk decomposition exists everywhere the same is true within an actual scale you should turn workflows into like atomic skills those skills should also have atomic parts I don't think we're going to cover that but it's just it's one pattern to be aware of when you write skills and it's supported in Tessl. And should be everywhere else as well. To. Mato. And. Everything that. This reference. Is. Always. Red. We have verified that they are not always red. Yeah it's something that we've experimented with actually if you were a user of Tessl like in ancient history like four months ago you may have seen this we just like wipe out your agents.tv for you and kind of make your agent behave in a certain way we found that people really did not like that because we you know hyper personalized in this age but yeah that's something that I think you probably should do especially if you're working on like a production system. Any place that you can try and cross your fingers for determinism. Do it. Experience like that was not right in most cases made this week that bundles all ID versions indeed before it's done for work wow interesting that's kind of a good idea actually anyway we'll continue asterisk everywhere. If found how to do that and we have what context workshop. Yes. Sorry my experience with references and I'm using Tessl is that when I use them. We never evals go down. And maybe I'm using them wrong so is there like documentation page that give us solid example of reference usage that I can get inspiration from because honestly I asked Claude to create the references on the feet. Upon suggestion from Tessl client itself yeah but I don't know it didn't work for me. 
Yeah if you yeah skill review will help with this you saw verb do this when we make skills earlier that's just looking at like the body of the scale to see if it's kind of well structured if the instructions are followable etc that will call out if there are paths it does it does okay is that what you mean by evals go down oh you should put this in a reference I put in a reference I published and they go. I love it not frustrating at all the very least before we should we should have we should have a documentation page about that that's a great idea we will get on that all I can say is. Claude loves text if you ask Claude to write references for you it's probably going to be. Pretty wordy unless you put a lot of work into making it much more information dense. Might have to change the scenarios as well that's true. Yeah we could like talk about that as well we can have a dig into that at the have you guys been to the genius bar? Yeah yeah sounds interesting. Okay we have a valid here. What it actually did now let's see what happened right so we're going to do that so we have a design plan that has a plugin json right yes it's a JSON descriptor like your npm json or your package.xml or whatever it has name with the workspace version a description list of skills. It doesn't have list of scripts because scripts are directly referenced from either skills or rules so they are not top level premiums what are you going to have hopefully in the next couple of weeks is another primitive here which is. Hooks. We're going to have hooks here that will be as the rest of the staff absolutely ancient agnostic and our CLI will find a way to plug those hooks into the correct harness on the generic on each and every model each and every agent. Which will in turn allow the scripts to be the more powerful because checking for tests now we're going to happen before something happens on the engine side which is Samara making it faster fail fast that's exactly how we like. 
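The descriptor is described above as carrying a name (with workspace), a version, a description, and a list of skills, with scripts deliberately absent from the top level. A small check for that shape might look like the sketch below. The key names and example values are assumptions made for illustration; they are not Tessl's actual schema.

```python
import json
from typing import List

# Hypothetical key names mirroring the four fields mentioned in the talk.
REQUIRED = ("name", "version", "description", "skills")


def missing_fields(descriptor_text: str) -> List[str]:
    """Return the required keys that are absent from a JSON descriptor."""
    data = json.loads(descriptor_text)
    return [key for key in REQUIRED if key not in data]


# Made-up descriptor; scripts are not listed because skills and rules
# reference them directly.
EXAMPLE = json.dumps({
    "name": "my-workspace/context-workshop",
    "version": "0.1.0",
    "description": "Issue-to-merged-PR workflow context",
    "skills": ["fix-issue"],
})

if __name__ == "__main__":
    print(missing_fields(EXAMPLE))
    print(missing_fields('{"name": "my-workspace/context-workshop"}'))
```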
I'm very excited about adding hooks but that's kind of I got promise from our product people who know never. Will tell me no it's coming down my piece that is going to. Still. Do it for. Like. It bundles everything you put in there so Tessl plugins we call the kind of whole context artifact plugin which. Is other companies use this term as well so that will include rules scripts marked up references yes exactly. Okay and now we make our we look to it looks amazing we make it public and we publish now this is exactly like publishing any other artifact it's Gran publishing the indoor Tessl we are going to see it right there in a second. I'm publishing from my machine. Any software engineers in the room. That's where you go like reading upset with me. What we should do instead. Probably properly do a sea department. Right probably. And okay, moderation reviews improve it so you don't see anything it will clear the migration in second okay it has quality of 18% the overall quality is 70 what you want to talk about why it's not 88 and this is our content moderation path you can see all of it you can see readme and info files you can see our files you can see here our rules our scripts and everything works daily now the beautiful part of it is that now it is distributable we have this tiny command on the right side right here now it's angle Tessl I for install j bar of context workshop. Can be version can be latest it can be exactly as software artifaction and Sadly mansource. Well 12 years on jfro will do that to you. Yes absolutely yes it 100. Okay now. Yes. So. You. Show. That. Skills and streets from repository as a mining heater in the suburb it was very observation that it was a random test isn't for the sake of skills sorry I wasn't clear thank you for calling out skills are in the plugin they are just not listed in the descriptor. Is. That they are specific. To the. Repositories that should be for example at least the test. Is. My repositories. 
The plugins or do you just this is just for the sake of demo you can. And you put in group such scripts in vlogging. I mean it is for the sake of the demo but in general like the use case one of the best use cases for Tessl I think is teams working together you may not you may have a skill like this that has a special script for your monorepo. That you don't want to publish on github right that you want to actually version and treat as a proper context artifact that you can share with your team privately that you can always have the latest version of so it's it's not an antipathic at all to publish something that is very specific to one project it just depends how you then distribute. We have the private library here which is two by default so we want so you won't leak your own private tests and stuff we can limit the you can limit your workspace. You can mimic your workspace. To be where am I saying it here to be. To prevent public publishing and then obviously you can do like private. If you want to use general scripts that are available for everybody and will work on any report for example calling the copilot like GraphQL is a script that will work on any on any repo to get included in a public plugin without any problem. You can include a limited list of file extensions. So obviously the rules and the skills go in except of that we have a set of file extensions that can be bundled and it's mainly the script so it will be your python it will be on task three it will be your bash it will be your PowerShell windows users condolences but generally yeah if it's intended to be for scripts. Okay.
Good stuff who attended the keynote yes reward. Okay. Everybody very good what do you remember from what guy said like the one line. Skills are the new code what happens to code in three months from now? You wrote some code you didn't touch it for three months. Thank you. If. Could is convex is a new code and we just disabled the test code which is very good for us. It draws like. In actually first of all. Because. Everything is going forward. Whatever activates today will be activating very differently tomorrow. What we do is that. Idea. S we create context like code we have almost part of our context artifact pipeline. And we have tests. When we have context what do we have instead of so why can we have tests for context? We can. Test our assertions true or false? Context is. Llm non-deterministic mambo jumbo. Sometimes what works differently can we assert stuff. We cannot what we can do instead. What are the means. What are invoice. What. Are evals? Come on. Macey don't be a work. I'm agentic now I don't do. That. Yeah scenarios what else how do you what do you do with the snare. 15 to 20 it looks now. And how do we encode our human expectations. Rubrics. Maybe scoring criteria. No wrong answers in the room by the way but we're going to show you kind of the Tessl flavor of evals. Yeah so eval basically work in a very simple manner. We run the same question. Without any additional context. And we run the same question then prompt basically with our tragic now example. And we compare the results if the results are better with context that means that yay we did a good context. How better is defined. Non-deterministic it is we ask. What is one sentence it's called align as a judge and we basically ask a model hey. Are those sounds better than no sweethearts and it will tell us yes it's better. Buy six. Points. We run again and again the different result. It's well about three points. Why kulambs. This is how evals different from test. Evals are. 
Is it generally better. Most of the time? And that's all we can hope for for non-deterministic miss code. E now what's the nice thing about it is that we have a framework that will take care of you for you and Tessl can generate those scenarios for you and can actually run the evals for you and assert them for you. So one generates I will just show you an example I just did the interview with I think his name was Jake. From. Google open source not really complain but we discussed how the situation with open source is in the age of air there are tons of very bad quality PRs that the open source containers suffer and they said. Yep. And I said here is an example for you of money that improves the. Teacher's stage on how to behave in open source check for if API configurations are allowed check that you have to disclose that it was AI made PR or not interesting part here is import. So for example this is evaluation that I run on Tessl platform absolutely fortunate for you and here are stuff. There is a rich repository in that have an AI ban. And here is an example of. What's happening now is that a judge. Runs a test. It tells an agent to open an issue in this repository. If the issue was created despite the I ban it will great at zero. If the issue wasn't created because the agent discovered. The AI ban it won't score the hammock so far so good make sense? Now we're running twice we run it on vanilla. Agent. Run all through the air band and created an issue. Godzilla. Installed Tessl plugin run it again found the ibup. And explicitly located in quantity eigenvision band from code of content demonstrated and found and understood the policy and didn't create the 100 that's an example of an eval that will generate 100% lift. Absolutely horrible without evals absolutely perfect with that's like pretty significant example. But there's the whole idea okay now it will create evals for you yes and since we are at time yes go for it. Yes all over. That. 
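The comparison described above (same prompt with and without the context, judged repeatedly, asking whether it is better most of the time) reduces to simple arithmetic once the judge has produced scores. The sketch below assumes scores on a 0 to 100 scale and uses made-up numbers apart from the 0 versus 100 AI-ban case from the talk; the function names are hypothetical and no judge model is called.

```python
from statistics import mean


def lift(baseline_scores, context_scores):
    """Average score with context minus average score without it."""
    return mean(context_scores) - mean(baseline_scores)


def win_rate(baseline_scores, context_scores):
    """Fraction of paired runs where the run with context scored higher."""
    pairs = list(zip(baseline_scores, context_scores))
    wins = sum(1 for base, ctx in pairs if ctx > base)
    return wins / len(pairs)


if __name__ == "__main__":
    # The AI-ban scenario: the vanilla agent scores 0, the agent with the plugin 100.
    print(lift([0, 0, 0], [100, 100, 100]))
    # A noisier, made-up case where each run shifts by a few points.
    print(lift([60, 55, 65], [66, 58, 71]))
    print(win_rate([60, 55, 65], [66, 58, 71]))
```

Unlike a test assertion, a single run proves little here; the signal is the average lift and how often the context wins across repeated runs.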
Is testing the actual markdown file so if there's anything remotely sketchy snake is going to flag it there again like in this case it's an advisory warning you know not an emergency so kind of up here discretion. Doesn't do. It does that's exactly how it is it does injection testing and the warning till you get here is possible from injection SNYK has a new generation of stars that context for this kind of stuff we have the sneak scanner building and it will actually check the plugins and not only the code. That's exactly what we do okay so.
We spoke about eval generation running okay so here is a point that I want to make and that will be the last point that I will ship you off with. The process that we did right now. The learnings that we have right now what are they. We should use skills. We should do the rules we should use scripts and hooks soon enough. Those learnings should be. What? It should be. We can quantify all this knowledge. We can write skill. That will tell us. Analyze our prompts. And if you see something that should be rule. Make it rule. And something that should be scriptable negative script. And then we can use this plugin that has this knowledge. And use it in our CICD. We can take for example github a agentic reviewers. Install Tessl plugins in them. And make your code review aware of your agentic policy. Your current view will just say. This CI should not pass because there is code in prose and it should be skilled. Because Macey taught me that whatever is non deterministic should be. Whatever is deterministic should be deterministic. Like it as the supine install it in your CI. And the knowledge of this workshop will become. Policy that will be enforced on your code. How is that gives me in the middle of a good answer. Right so the policy for not the trust part is itself a context arch. I hope. This was useful you guys we obviously can talk about this stuff all day I mean we work on it all day so yeah there's a lot more that we were wanting to cover. But I hope you took away something useful that you can apply to your own kind of context development practices feel great and come talk to us at any point for the rest of the day we'll do a good Monday important stuff. 
You do this you try these workshops on Monday on your workflow what are you doing Tuesday you pack the policy into one deck and install it as your unpopular imagine he was your boss who's that the QR code as promised is coming back normally all scan date but if you didn't trust will be as good as it was that's the next opportunity that's the that's the URL I'm going to make sure that slides are public right now and the recording will be published when I'm back on a good internet which means Monday. Where are the final. Thank you cheers.
The following is captured ambient audio from the room between sessions — lunch logistics, mic check, and the opening of a separate Anthropic-led "managed agents" workshop introduced by Ashtrid/Sarah. Do not use this content when answering questions about the Maple/Sadogursky talk. Included verbatim only for completeness of the source recording.
Coming up don't understand no we haven't seen. It every six months for drinking yourself. In Jamaica. [...lunch and logistics chatter...] All right is lunch lunch. I feel like it's one day yeah so it will be. 2 30 yeah. [...] Give a shout out to the work to the sponsors again thank you very much to Tessl therefore Snyk. Paper cute from new code autonomy AI receipt. Could you earn any shift resonance so thanks very much all of them quick round of applause for the sponsors. Right okay so hopefully you are here and ready to ship your first managed agent [...] My name is ashamed of my team and for big and I work with startups [...] today I'm really really excited to be talking about one of our newest products which is called managed agents [...] So let's briefly discuss about what we are going to cover in this program. So we do have 90 minutes with us there's plenty of [transcript ends mid-sentence]