ainativedev/latest-aidevcon-speakers-london-2026

AI Native DevCon 2026 London — all conference sessions as interactive skills

talk-groetzinger-skills-everywhere/transcript.md

Transcript — Skills Everywhere

⚠ Speaker-label warning. This transcript has no per-speaker labels. The opening ~1 minute is an MC introducing the speaker. The body of the talk is John Groetzinger speaking. The closing section (from "Thank you, John" onward) is a Q&A in which audience-member questions and Groetzinger's answers are interleaved without labels.

When attributing, prefer:

  • "the MC said..." for lines 1–8
  • "Groetzinger said..." for the talk body (lines 9–366)
  • "an audience member asked..." and "Groetzinger answered..." for the Q&A (lines 367–end)

Do not invent named attributions for audience members. The transcript contains one apparent audience-member name fragment ("Romano") which is likely a speech-to-text artifact. Other likely STT artifacts include "MTP tool" (probably MCP), "LangChain and LangGraph and like Smiths" (probably LangSmith), and "Tessell" appearing as "Tessel"/similar variants. Quote them verbatim.


Section 1 — MC introduction (L1–8)

Okay, we're gonna kick off. So, yeah, actually, following on nicely from that previous session with Mark talking, talking about skills, this is a really interesting session about how, you know, what happens at a, on a larger enterprise when, when skills really are distributed, and there are many, many development teams wanting to, wanting to use skills. It really poses kind of different challenges that a larger, larger organization. So John Groetzinger, he's a principal engineer at Cisco. We'll be, we'll be presenting this. I actually presented with John a little while earlier at Cisco Live down in Amsterdam, so it gives me great pleasure to welcome John on stage. So please give it up for John.

Section 2 — The real bottleneck is context, not models (L9–32)

Here. This is about skills and putting them everywhere and really putting pipelines around that so you don't have to maintain multiple versions of context in various places. Because, again, at the end of the day, both humans and agents really need to consume this context. So I'm going to kind of talk about some things we've been struggling with, missing patterns, reviews internally at Cisco. So this might be a little controversial for some people here, especially maybe a few years or a frontier model thing to hear more say, but you don't really need a smarter model today. The models are perfectly capable of doing what you need for business value, right? I'm not talking like pan intelligence. I don't think getting business out of the models. It's already there today. The problem is really that you need smarter context and smarter context engineering.

And so for me, the last couple years, I've been jumping harness to harness. I'm always chasing the latest, greatest thing, constantly breaking my workflows and other people can relate to this. And always, there's so many variables that anytime I switch to, like, a lower model and it breaks something, I'm immediately like, oh, that's the model's bottom. Going back to this, right? Because this thing that I don't understand has to be the model. Right.

But once skills came onto the scene, it kind of changed my perspective for section on things, especially recently with the new patterns of agentic fan-out in the harness. Right. Where I have my model set to Opus. So I asked it to do, you know, this massive prompt, and it spawns 15 sub agents, and that cost is just really increasing. And we're actually seeing that becoming actually a concern where we have to start limiting model use on analytics enterprise because of the cost of this stuff and the agentic fan-out being a problem.

Section 3 — Why skills changed his perspective (L33–54)

So I really wanted to figure out, okay, why can't I just use Sonnet for everything? And I found out I can. And today, for the last couple months, I largely use the medium tier model, or GPT medium reasoning. I don't really use those high-end models. It's very rare that I use those, you know, maybe for some super complex planning, lots of context. I still use them. But I challenge my engineers to really try to use that medium tier as their baseline. Right. And only jump into the greater field when they need to.

And the real unlock for this was skills because all the harnesses, all the models are training on skills, they're honoring skills. And so if you can lean into that, your context can actually transfer between the harnesses. And that was great for me because I can, you know, I can try Claude Code. I can jump into dev and I can jump into GitHub Copilot CLI. And if my skills work in all three of them, relatively, it works the same. So, you know, it's great.

But what was, what you need to understand is how do you manage your context? Right? How do you share it, package it once, and then ship it to everyone, but only have one source of, you know, management. And the real unlock for that is evals. Evals are extremely important and you need to make sure that you're actually spending engineering effort learning how to build evaluations for your context.

And what you'll find is if you do that correctly, you can actually use extremely low end models, and your workflows become much more deterministic, especially for things that are repetitive that you need to do all the time. Right. I don't think anyone in this room, if I went to them and I said, hey, you can use Opus and do this workflow, or you can use Haiku. And one is way faster and way cheaper, but you get the exact same result. I mean, which one are you going to choose? Right? It's a no brainer. No one's going to choose the more expensive, slow model just because. Right. If you actually figure out how to do that optimization, you're going to go with the cheaper, faster one.

So the story that I'm going to tell today stem from kind of that journey. Of two completely different teams. One is more of a traditional context already exists that humans maintain. How do we convert that into something agents can maintain without Reinventing it or, you know, having complete drift in the few systems. And then the second one is more developer centric, where everything we do should be more skill based going forward and kind of being the starting point.

Section 4 — Personal background (L55–72)

So before we go too deep in, just a little background on myself. My name is John Groetzinger, and I've been in Cisco for about 14 years now. I actually came in through an acquisition of a company called Sourcefire. We wrote an intrusion prevention software. It's still one of the core technologies in Cisco firewall today. Right.

And so I spent about 12 years in the TAC, which is our technical assistance center, technical support for enterprise customers. Now, I'd be dealing with very high pressure situations, financial institutions, banks, hospitals, you know, critical infrastructure. When that goes down, I'd be the kind of person figuring out how to get them up quickly, you know, and I really want to automate that because those are very unknowing situations. Right.

And so I've been a full stack developer for the last 12 to 14 years. But the last two to five years, there's a lot of edge cases that you can't handle in automation. Right easily anyway. And AI kind of changed that a lot. Right. I really got excited. I was like, oh, man, these 50 edge cases. I don't have to hard code if AI is for. I can just throw it to an agent with some context and actually handles that quite well.

And then for the last year or so, I've been focused heavily on production agentic system. We just released a couple weeks ago, GA. We have a fully AI native platform built from the ground up to be AI native. It's customer external facing. We have over a thousand users on boarded in just a few weeks. And it's been, you know, that's what these stories are actually coming from, is shipping production agents.

Section 5 — What a skill is (L73–98)

Okay, so let's take a step back and hopefully everyone knows what a skill is. If not, you probably haven't been paying attention over the last couple days. And so I really hope you have. But a core of it, this can be very simple or it can be very complex, which is nice. This is flexible. But the bare minimum is just you have that skill.md that just, you know, it's like your starting point for your agent.

And so one skill, for example, that I've created for our team is a repository standard because we have tons of technical debt. We have people creating new projects all the time using models that are trained on stuff that's not quite up to date. Right. So if I say create me a new Python project, it pumps out a requirements.txt with the pip install and I go insane because I only want people using uv. Right. And so I put that into the skill so I don't have to have my engineers worry about, like, setting up projects the wrong way. They just install the skill and then set up the project with the agent. Right. So that's just one example. That's an easy way. Right.

So you want to make your skill ship with rules, which is essentially just more markdown files. You can put scripts in there, any scaffolding you want can really be, you know, as advanced as you want. But what's key is that you add evaluations for your skills. Right. Define the bare minimum that you expect this skill to always help with and always do it well. And so as you increase your, you know, improve the skill or you add more to it, you don't want that old functionality to break. Right. So you need to focus on what is the core stuff that should always do well. It's going to handle those other edge cases decently well, and you can update the eval as those come out.

Now, these are not really case studies. This is not like, hey, you must go do this. This is the way forward. These are really just some patterns we've tried because, again, one team, it was a development shipping team. We have extremely tight deadlines because our executive leadership says you have AI now. You can deliver way more features just as fast, right? Let the world be live in. Not quite, but that is the expectation that we live in. So we don't really have time to engineer our workflows day to day, unfortunately. I wish we did. And then our support teams, same thing. They're under a lot of pressure, have high severity cases coming in. You know, they really have to figure out, you know, how to spend their time on focusing on the customers and not really optimizing the day-to-day. So a lot of anecdotal stuff in here, but these are just some patterns I wanted to share. That might be useful for you guys.

Section 6 — Story 1: Pipelining the Cisco TAC knowledge base into skills (L99–158)

So story number one comes from the support team. Right. And if you have an existing platform of content for humans, how do you take that and make it usable for agents? It's kind of a source story.

So in our, in Cisco TAC, we have a very strong culture around our knowledge base, right? It's something instilled in our culture because engineers are constantly shifting out. The real value comes from that experience. I help this customer with this really complicated problem. There's other ones that are definitely going to have the same problem. Let me write that down so the next engineer doesn't have to suffer like I did. And the customer doesn't have to wait as long for a solution. It's just ingrained in our culture.

So we really wanted to lean into how do you leverage that, which is highly curated, high quality content, and feed it into an agent system. And I think that some of the first things we tried were just given MTP tool to search that HTML, pull in the docs and let the agent figure it out. Didn't really work so well. You know, there were wins here and there, but there were a lot of hallucinations and stuff that just didn't make sense. Because there is still stale content in our knowledge base system, right? Not every article is up to date. So you really need to pay attention to which ones you let the agent consume.

So two engineers that have been kind of working on this for two years, a couple months ago created a platform that converts our articles, which are kind of handpicked, doesn't just blindly take every article and shove it into this agent system, but high quality articles that people are actually actively maintaining get converted into skills, and then we build evaluations for that. Right. So think customer opens a case. They have a problem description. We go through three weeks of collecting logs and figuring how to fix the problem. And then finally we fix it. Right. Why did it take three weeks? Right? How can we cut that time down to just going from like description to solution on the first day? And you'll write an article to kind of help with that. And then we give it to our agent so we can kind of automate that even more.

Now, how many people here have actually written a skill? Almost all hands up? Okay, now how many of the people who write skills, how many write them by hand? Do you write your skills out by hand? All right, a few people. Stop doing that.

Section 7 — "Skills are not for you — stop writing them by hand" (L159–186)

The skill is not for you. The skill is for your agent. So it's good to have human docs, but skill is for agents, right? It shouldn't, you should not really care what is inside this skill, the context itself, what you should care about is the outcome that comes out of using this skill with the agent. So where your human spends the time, that is your most precious resource is your humans validating your tests, the quality, what actually matters. That's the eval and then the initial KB. The stuff in the middle, let the agent do it for you. Let LLMs do it for you. Even better, if you want a smaller model to do this workflow, let the small model build the skill for itself. Found a lot of anecdotal success in doing that where it just kind of like, hey, well, this is for you. How do you want to organize this information so you can be more efficient with it? Just put that in the pipeline, has the output quality of an eval and you can kind of verify if it's working or not based on that.

So a lot of people will say, you know, we can use our docs going to die. And of course largely I do agree with that because people just don't really maintain docs. But when you have a strong culture like we do, our docs are not really dying. We're actually seeing the opposite. Our docs are becoming more important to grounding the agents. But what we're seeing is that a lot of people are building agentic systems that blindly try to use any content or they'll have agents that build their own memories on supporting stuff using that and then the memories drift from the content and then you end up having some answer come out to a customer and that's just completely unaligned with the article that it says it uses as a source and then everyone just loses faith in the entire system. Right? And that's the problem we've been struggling with for the last couple years.

So we're trying to figure out how do we keep these more in sync in a way that we can actually trust it? And that's what the system that these engineers built is that when the article changes, we kind of pipeline that, right? Think of like a GitHub action triggering when we update a knowledge base article that goes into a pipeline with an LLM that reviews the changes, determines if it's a minor change, moderate change, major, and then makes the change for itself to update its skill. And then it runs the evals. If the evals pass, great. Everything's good. If it's a minor change, just changing a typo or wording, and the evals pass, I don't really need to have my human gate that new version, right? Just let it automatically release. But if you're doing a more complex change, a major change, maybe you're introducing a new topic, maybe that's something that you need some more quality control and that's where you insert your human, right? Your humans either go far on the left or far on the right, but in between, you should not be wasting your time. You should not be caring about the text. If your engineers are reviewing text diffs on a skill, you are wasting a lot of time. That is just unnecessary, at least in my experience.
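
The release gate described above can be sketched in a few lines. This is a minimal illustration, not the pipeline Cisco built: the `SkillUpdate` record, its field names, and the decision labels are all hypothetical, and routing a "moderate" change to human review is an assumption, since the talk only spells out the minor and major cases.

```python
from dataclasses import dataclass


@dataclass
class SkillUpdate:
    """Hypothetical record produced after an LLM reviews a KB article change."""
    change_level: str             # "minor", "moderate", or "major"
    evals_passed: bool            # result of re-running the skill's evals
    adds_new_topic: bool = False  # a new topic needs a new eval first


def release_decision(update: SkillUpdate) -> str:
    """Return "block", "human_review", or "auto_release" for a skill update."""
    if not update.evals_passed:
        # Never release a version that breaks existing behavior.
        return "block"
    if update.adds_new_topic:
        # New topic means new eval, so a human defines what success looks like.
        return "human_review"
    if update.change_level == "minor":
        # Typo or wording change with passing evals: no human gate needed.
        return "auto_release"
    # Assumption: anything that is not clearly minor goes to a human.
    return "human_review"
```

The point of the design is that humans only appear at the edges: they own the source article and the evals, and they are pulled in for major changes, but they never review text diffs of the skill itself.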

So new topic means new eval, right? You need to build a new evaluation. Of course, you can let the LLM do that, but you need to really tell it what success looks like in order for it to build a good evaluation. So that's story number one.

Section 8 — Story 2: Shipping an eval framework to 8 distributed teams as a skill (L187–262)

Now story number two is a little bit more complex and it's more probably relatable than one in the room because it's developer focused. And so this is a concept of just, you know, how do you instill that culture of skill first context engineering into your engineering teams? And that is something that we kind of did on our team building this new agent platform. And so I'm going to kind of share that story.

Here. And so to talk a little bit about, you know, the complexity and the environment we're working in, I want to tell you a little bit about the system we built first. So the one that we released a few weeks ago that I mentioned earlier, this AI native platform. It's a multi-agent orchestration system. Of course, we have a central orchestrator. We don't call it an orchestrator because we're Cisco. It's a router. It's a semantic router, of course. Right. So we have our semantic router. It routes to multiple agents. There are books on these eight specialist agents. That's just what we have today. And, you know, we're already building 10 more this quarter, right? And that's constantly growing.

So those are owned by different development teams. Some of them are owned by different organizations. But all of this goes into one single platform. So the customer gets one single unified interface. They have different apps and different pages. They can navigate in there. There's whole pipelines and like databases behind like each of those apps. And each of them comes with their own agent. But we don't expose that kind of complexity to the customer. That would be annoying. They have one AI assistant that they can talk to everywhere. We have AI buttons embedded everywhere. Some are, you know, static buttons that are directly routed to an agent to help them. But then we have free form chat as well. So if I'm sitting on the configuration, you know, best practice app and I ask it what TAC cases I have open, that shouldn't go to the configuration agent. It should go into the cases agent. And our system works well for that. The only way we're able to ship with confidence is because we did heavy evaluations and we have a strongly giant culture around that now that we had to build. Right.

And so that was something that I was actually tasked with was figuring out how do we evaluate this stuff. You know, I got the goal of saying, hey, you're going to be the give the go or no go on if this is actually going to be good enough to ship because I think a lot of people are stuck in the POC phase. So we need to be very confident that we're not going to release something that's just, you know, terrible. Right? We've seen all the disasters, people having agents sell a car for a dollar. Right. We don't need that bad PR. We really needed to be confident about this.

So I kind of came up with this framework for testing, right? We use a lot of LangChain and LangGraph and like Smiths. We're observing our agents. Which has a good eval culture. But the typical way that you create evals and data sets today isn't really aligned with what I wanted. Because I really wanted to be able to compare all the environments. Right.

So when you build an agentic platform, your signals really come from your traces. You should ship early, monitor it, and then adjust based on what you're observing how people interact with it. Because we have a freeform chat. People can ask anything, right? So you can't really predict what your people are going to ask. Kind of just kind of let them do it. Right.

So you're going to be constantly pulling evals from production data and then figuring out, oh, that's a question they asked did not go well, went to the wrong agent. Even if it went to the right agent, maybe not even that answer. I want to build an evaluation for that because it's something we want to support. Now if I pull in this new eval and I run it on my local laptop and I get it all working. Okay, great. That likely it resulted in some sort of prompt change, some sort of tool update, whatever it may be. What else did that break to fix this one thing? Right. That's the loop we constantly see. I'm chasing this current bug. I don't even know what that breaks when I change the prompt. Without evaluations, you won't. Right.

And so I had to figure out, okay, how do we actually do this more unit test style, right, as an enterprise? If you've ever scaled up software, you get testing is absolutely essential. Evals are the same in this world with agentic. They are the unit test for your agents.

And so building this concept, you know, simple things. I wanted basic dataset schemas. Everyone could use kind of that same baseline schema. And storing that with the repository, right? So the data set stored with the code. Now typically a data set is in json format, right? And usually a json. It's just a single line file, right? So when an agent reads it, if you have a thousand examples in there, it's just this massive context that explodes. So immediately I said, okay, we need to switch. Everyone needs to be using JSONL instead of JSON. So then each example for each eval is an individual line. Your coding agent can make more precise edits on the data set because again, we're not hand curating data sets. We absolutely don't have time for that. Everyone has a coding agent. And it's basically updating the data set. We need to make that more efficient.
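
The JSONL layout described above can be sketched as follows: one example per line, so a coding agent can append or edit a single record without rewriting the whole file. The field names in the usage example (`id`, `input`, `expected_agent`) are illustrative assumptions, not the schema the team actually used.

```python
import json
from pathlib import Path


def append_example(path, example):
    """Append one eval example as a single JSON line."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")


def load_examples(path):
    """Read a JSONL dataset, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Usage, with a hypothetical routing example stored next to the code:

```python
append_example("evals/routing.jsonl", {
    "id": "routing-001",
    "input": "What cases do I have open?",
    "expected_agent": "cases",
})
```

Because each record is its own line, a change to one example shows up as a one-line diff in version control.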

This is something I've learned that I would instill into this skill that I'm going to talk about. I wrote a script so this can work basically on your local laptop, dev, staging. The same exact eval code works in any environment because it's very simple. You know, traditionally when you have a unit test, you import the function. You pass in whatever input you're testing and then you verify the output. Right? Agents are no different. When you have this like kind of black box evaluation, you can do the exact same thing. However, what happened in the middle, you might actually care about as well. Right? And so you can actually observe that stuff in a platform like LangSmith or any observability system. That's what our eval code actually runs against. Right? So we don't import right now like a Python import. Evals don't work like that. They get an endpoint, right? And we point our evaluation code at the endpoint to invoke the agent. Answer comes back out. We check the observability system, you know, did it call the tools we expected it to call with the right parameters, did it, you know, spend roughly the amount of tokens that we expected it to spend. What was the latency on this based on when it ran the last time? Right, you got to have a baseline for all this stuff built into your evals.
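
The checks listed above (expected tool calls, token spend, latency against a baseline) can be sketched as a plain comparison. This is only an illustration: `Trace`, `Expectation`, and the 25% tolerance are hypothetical, and fetching the trace from an observability platform after invoking the agent's endpoint is deliberately left out, because that part depends on the platform in use.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical summary of one agent run, as read from an observability system."""
    tool_calls: list      # e.g. [("search_cases", {"status": "open"})]
    total_tokens: int
    latency_s: float


@dataclass
class Expectation:
    """Hypothetical per-example baseline stored with the eval dataset."""
    tool_calls: list
    baseline_tokens: int
    baseline_latency_s: float
    tolerance: float = 0.25  # assumption: allow 25% over the baseline


def check_trace(trace: Trace, expected: Expectation) -> list[str]:
    """Return the names of the checks that failed; an empty list means pass."""
    failures = []
    if trace.tool_calls != expected.tool_calls:
        failures.append("tool_calls")
    if trace.total_tokens > expected.baseline_tokens * (1 + expected.tolerance):
        failures.append("tokens")
    if trace.latency_s > expected.baseline_latency_s * (1 + expected.tolerance):
        failures.append("latency")
    return failures
```

Since the checks only look at the trace, the same code can run against a laptop, dev, or staging endpoint; only the URL used to invoke the agent changes.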

So all this is what I'm trying to basically teach all of my globally distributed team and just basically have like a week or two to basically introduce this concept to eight different teams distributed around the world. What options do I have? Get one meeting and get everyone on it that they can just not pay attention to. And let's be honest. Anyone going to pay attention to for an hour of me yapping about testing. No developer cares about this. It's another complex problem. Testing is not exciting. It's not exciting work. Engineers don't get excited about it. So you got to do it for them as much as you can. How do you plug it into CI? How do you make it environment aware? Right? How do we do the dataset schema? All these things we want a consistency around with, you know, again, hundreds of engineers distributed around the world.

So my thought was, okay, meeting's not going to work. If you have a little bit of time, but everyone has a coding agent in these development teams, it's actual developers. Why don't we make a skill and then I can just have everyone install the skill and the agent can build the models for them. So that was basically the theory. And it ended up working quite well.

So the plugin that I built has our basic skill.md, has a bunch of examples, scripts, example scripts, example evaluations. Basically, I spent a lot of time doing this really well with one single agent to basically figure out all these good patterns. And then I had my agent distill that down into a skill, basically explaining, hey, everything that we just created here, can you make a skill for yourself? For other people with coding agents to be able to do the exact same thing. And so that's what came out. The agent kind of built the skill for me. You know, we iterated it with versions and I shared it slowly. One team, got some feedback on day one, day two another team, day three, everybody. Right. And so teams just kind of installed it. They asked their coding agent, hey, do this thing that I don't care about that John wants me to do. And I have a PR due by the end of the week for, right? Their agent just kind of did it for them. And by the end of the week, every agent team kind of had a data set done the same way, similar metrics, right? Because before this, if we just told eight teams to go build evaluations for their agents, I mean, what do you think would have happened? Right? We would have had massive inconsistency. If I want to go help some other team, I have to understand, oh, their metric is zero to 10 and 10 means good. This team did zero to five and five means bad. And this team's at zero to one. Like that is just something we have to speak the same language and engineer towards the same goal. And a skill can really help you kind of do that.
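
To make the metric-scale problem above concrete, here is a small sketch that maps scores from different team scales onto one shared 0 to 1 scale where higher is better. The function and its `higher_is_better` flag are illustrative assumptions; the talk's point is that a shared skill avoids needing this kind of translation in the first place.

```python
def normalize_score(score, scale_min, scale_max, higher_is_better=True):
    """Map a score on [scale_min, scale_max] to [0, 1], with 1 always meaning good."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= score <= scale_max:
        raise ValueError("score is outside the declared scale")
    fraction = (score - scale_min) / (scale_max - scale_min)
    # Flip the scale for teams where a high number means a bad result.
    return fraction if higher_is_better else 1.0 - fraction
```

For example, a 7 on a zero-to-10 scale where 10 is good and a 1.5 on a zero-to-five scale where five is bad both land at 0.7 on the shared scale.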

So that's what I really wanted. Everyone contribute to this framework and we install it as a skill. We version it and everyone kind of shaped it together. So that was essentially the theory, and it worked.

Section 9 — The README-to-Confluence sync (L263–294)

There was one problem that happened is that the management would then come to me, my boss or whoever, and they would say, hey, what's the update on that evaluation framework? So it says that they liked it. Can you show me what you did? I was like, oh yeah, you can just ask your coding agent, right? You don't have one of those because you're an Azure. So that was a problem being.

I was like, okay, there's still a gap here. But I wanted a very simple solution to this because again, I don't, my immediate reaction or anyone's might be as Confluence, wiki, whatever. I'll go put a document that explains this eval framework that I built. And then I'll never update it and it'll drift, right? So I didn't want that.

So the concept is really just, everything is in the repository. The skill is obviously backed by git. It's in a repo, publish it through a registry like Tessell. But you maintain it in a git repository. It's version control. And you have a skill readme for an agent and then you have a README for a human, right? Very simple. You can use an AI coding agent to kind of update both. But you as the human are kind of just validating the README and the skills for the agent and you have to evaluate.

But not everyone gets on git either. A lot of our managers are not even in git. They operate solely in like Jira, Confluence. So I can't explain to 100 managers go make an account on GitHub. And now you got to put this in Confluence. So there's very, there's really cool open source tools. You can build your own script. To basically convert that README that's in the project directly to just ship that to Confluence, right? And you make a doc, you know, think hashtag one markdown becomes h1. Right? It's a very simple, very deterministic conversion. Don't have an LLM update Confluence. I mean, it can do that too, but it's kind of token heavy. Just have your agent write a deterministic script to sync that markdown into a pretty HTML file that everyone in that other system uses. And that was it. That was all I did.
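
The "hashtag one markdown becomes h1" conversion above can be sketched with the standard library alone. This is a deliberately tiny illustration that handles only headings and plain paragraphs; a real README would need a full Markdown converter, and the step that uploads the HTML to Confluence is omitted because it depends on the tooling chosen. The function name is hypothetical.

```python
import html
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_to_html(markdown: str) -> str:
    """Deterministically convert headings and paragraphs to HTML."""
    out = []
    paragraph = []

    def flush():
        # Emit the buffered paragraph lines as one <p> element.
        if paragraph:
            out.append("<p>" + html.escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))  # "#" -> h1, "##" -> h2, ...
            text = html.escape(match.group(2).strip())
            out.append(f"<h{level}>{text}</h{level}>")
        elif line.strip():
            paragraph.append(line.strip())
        else:
            flush()  # a blank line ends the current paragraph
    flush()
    return "\n".join(out)
```

The same input always yields the same output, which is what keeps the synced page a one-to-one map of the repository without spending tokens on every update.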

So now when someone asks me for, hey, can you tell me about that agent framework or they just ping me on Webex and ask me where can I get information about this? I send them just a link to that Confluence doc, which is a direct one-to-one map of the repo. And so there's no drift. So what my agents are reading, what their manager they're reading is all the same content. And it's all maintained in one place. And that was really just a very productive unlock. And it worked really well.

Section 10 — The cultural shift: "is this a skill?" (L295–320)

And so the cultural shift that I really have to instill in my team was asking first, is this a skill, right? So, you know, someone asked Webex message that, hey, new engineers onboarding today. Where's the DevOps information? How do I get access to the Kubernetes cluster? Right? It's not really that is sometimes the question. And when I see that, I say, no, no, no, no, no. Go ask, do we have a skill that explains this so I can install it in my agent? That needs to become the default reaction. Right?

And it's a cultural shift you have to actually intentionally kind of engineer and kind of distill down to your team. And so that's kind of our default now, or that we're operating at is like, hey, does anyone have the skill for this? If not, we create one together and we maintain it together. Again, I don't want 15 engineers creating 15 skills for the same thing, especially with this world of the token budget becoming a problem. That's just extremely wasteful engineering, right? We should all be trying to share when we can. We have coding agents. There's no reason that we can't really all work together and have our agents maintain it. As long as we have evals that define what the behavior should be, this is actually much easier than it sounds.

Section 11 — What to do tomorrow (L321–354)

Okay, so what do you do when you go back to work one day or maybe tomorrow, whatever. Listen to me, I'm going back one day. But you want to just start small. You don't need to go build some crazy robust platform like we do. That came from like two years of us banging our heads off the wall, right? So engineers really want something more robust. But you don't have to start there.

Start with one word, one concept, one thing that you find yourself explaining over and over to different audiences and just turn that into a skill or a package, whatever you want to call it. But really at the end of the day an agent skill. And really just give thought to, okay, who are the audiences, what platforms do they operate on? Can I maintain the README here and sync to those registries or whatever it is that should be kind of your objective?

Add evaluations. That is key. If you don't do that, it's just going to fall apart. So make sure you define the bare minimum. What do I expect this skill to actually help people with? And how do I actually validate that? That's it. Just give a little thought to that up front. It'll go a long way and kind of compound on that over time.

And then of course start sharing it. But you don't want to just share with everyone right away. Kind of use it yourself first. This is like the 0.0.x version of the context. You really do want to kind of semantically version your skills and your context to give signal to your engineers consuming it. Right? Because if I go look at a skill and it's version 1.0, I expect it to do exactly what it's supposed to do on the first try with almost no friction. Right? And if it doesn't do that, you've lost my trust. So signify that with your semantic versioning. How mature is that actually? So starting with version 0.0, share with a few, get some feedback, iterate. Once it's becoming a little more stable, maybe 0.x, right? And then finally, once you share it with a lot more people and it's regular, like people are like, man, this is working great for me. That's the signal to move it to version 1. And then that gets everyone else a signal. Hey, you can consume this. Here's what the skill does. I promise it'll work because it's battle tested and evaluated.
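
The versioning convention above can be read as a simple mapping from version number to maturity. The sketch below assumes plain `MAJOR.MINOR.PATCH` strings, and the three labels are illustrative names for the stages described, not terms from the talk.

```python
def skill_maturity(version: str) -> str:
    """Classify a MAJOR.MINOR.PATCH skill version by the trust it signals."""
    major, minor, _patch = (int(part) for part in version.split("."))
    if major >= 1:
        # Battle tested and evaluated: expected to work on the first try.
        return "stable"
    if minor >= 1:
        # 0.x: shared with a few teams, becoming more stable.
        return "stabilizing"
    # 0.0.x: the author is still using it on their own.
    return "experimental"
```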

Section 12 — Takeaways (L355–366)

So takeaways here. Again, our agent architectures, they change every day. And you'll have a new one next week, next month. You really can't predict that. But your context is your durable investment, right? Really spend time on how you engineer context and maintain things in one place, but distribute that to multiple systems to keep them in sync. That's really where you want to spend your time. And just build your pipelines today because everything else is definitely going to change on you anyway.

So hopefully this was useful information for everyone. And thank you for listening to me for a while.

Section 13 — Q&A (L367–end)

[MC / host] Thank you, John. Thank you very much. We have a few minutes. If anyone has any questions, shout. And I'll, I don't know, not throw the mic in. Health and safety and all that.

[Audience question 1] Hi, thanks for the talk. There's skill.md and README and d that come out very different.

[Groetzinger] They're, to be honest, they're probably like 80% similar in most cases. The READMEs look very similar. I'll usually start with the skill for the agent and I'll say, can you document this in a README? And if I have questions, I kind of update it. But honestly, they're very, very similar, which feels wasteful. But again, the outline is maintaining it, so it's not really like that much mixture to maintain. But they're very similar in my experience.

[Audience question 2] Hi there. Nice talk. Just right here. So my question is around when you scale out to, like, having multiple skills. I think you've been on this journey, like starting with a few skills. But now I'm assuming that you have maybe hundreds of skills. How would you, like, orchestrate to make sure the end user finds the right skill that is needed when it is distributed, as you said?

[Groetzinger] You're asking how to maintain like or kind of prevent like skill explosion, right? Is that kind of the question? I don't know if I followed exactly.

[Audience] So the state is you already in or you get to it having multiple skills and then the end user is looking to do one task. And require like one or two skills. How do you orchestrate that end user to probably skill with design?

[Groetzinger] Right. So, I mean, for our development team, we're actually using Tessell which kind of helps with that. And they have kind of a server that helps you search your skills. But figuring out how to index your skills so they're searchable by an agent generally is the answer. How you do that kind of depends on what tooling is being used. But you can, what we do, you can put them all in one GitHub repository and you can have your agent have access to that and then it can just grep for the skills that it needs. I mean, that's a simple solution you can do today with almost no effort, right? But if you need something more robust, then you have to kind of look to more robust systems.
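
The "one repository, let the agent grep" approach can be approximated with a few lines of Python. This is a minimal sketch under stated assumptions: each skill is assumed to live in its own directory containing a file named `SKILL.md` (matched case-insensitively), and `find_skills` is a hypothetical helper, not part of any tool mentioned in the talk.

```python
from pathlib import Path


def find_skills(repo_root: str, keyword: str) -> list[str]:
    """Return the directory names of skills whose skill file mentions keyword.

    Assumes one skill per directory, described by a SKILL.md file.
    """
    needle = keyword.lower()
    hits = []
    for path in sorted(Path(repo_root).rglob("*.md")):
        if path.name.lower() != "skill.md":
            continue  # ignore READMEs and other markdown files
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            hits.append(path.parent.name)
    return hits
```

Plain substring matching is the simplest thing that works; a larger catalogue would likely need a real index, which is the "more robust systems" case the answer points to.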

[Audience question 3] Okay, let's see. I actually just answered my first question. Do you store your skills in a GitHub repo or something? Or rather than per project. And then I guess it sounds like the GitHub repos, but like separate from the projects.

[Groetzinger] Right. Yes, correct. We have like each engineering team kind of has their own skills repo. You know, I'll be nice to that. But yes, we have one repo for like all of our team skills.

[Audience] And how do you go about evaluating like, you know, when a skill needs to get updated or something is that then become a few engineers, all coordinators, is everyone free to like put in a pull request to update the skills. I mean, obviously you don't want everyone directly influencing the context for all of your engineers.

[Groetzinger] Yeah, so evaluations are honestly the answer to that. We do let anyone submit PRs to the skills. We treat it just like we would a shared library. Right? So if you go in with that mentality, this is a shared library. You cannot break functionality for other people. How do you validate that? The evaluations need to cover it. Right. So if there's something in the skill that you care about, build an evaluation for it. Every time you update the skill, make sure that evaluation passes. It's not any different than a shared library with unit testing. And if you think of it that way, it is that simple. You have to give a little more thought to how you build your evals because it's just a little more complicated with the non-deterministic nature of all LLMs, but conceptually it's the same.
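
The "shared library with unit tests" analogy can be sketched as follows. Because model output varies between runs, this sketch repeats the same check several times and compares the pass rate against a threshold instead of asserting on a single run. Everything here is hypothetical: `fake_run_skill` stands in for a real agent call with the skill loaded, and the 0.8 threshold is an arbitrary illustration, not a figure from the talk.

```python
from typing import Callable


def pass_rate(
    run_skill: Callable[[str], str],
    prompt: str,
    check: Callable[[str], bool],
    trials: int = 5,
) -> float:
    """Run the same prompt several times and return the fraction of passing runs."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passed = sum(1 for _ in range(trials) if check(run_skill(prompt)))
    return passed / trials


def fake_run_skill(prompt: str) -> str:
    """Hypothetical stand-in for invoking an agent with the skill loaded."""
    return "Run `make lint` before opening a pull request."


def mentions_lint(output: str) -> bool:
    """The behavior this evaluation protects: the answer must mention linting."""
    return "make lint" in output


THRESHOLD = 0.8  # assumed acceptance bar; tune per skill

if __name__ == "__main__":
    rate = pass_rate(fake_run_skill, "How do I prepare a PR?", mentions_lint)
    print("pass" if rate >= THRESHOLD else "fail", rate)
```

Run against every pull request that touches the skill, a check like this plays the same role a unit test plays for a shared library.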

[MC] Wonderful. There's no more questions unless there any last minute.

[Audience question 4 — name likely garbled by STT: "Romano"] I wonder the frontier. Romano. Go. Thank you. For talking. Is. There any way we can go to. Read more into evaluating Cisco ones and what we can use today? Or to that end of this being a skill you are, for example?

[Groetzinger] I'm certainly happy to talk to you one on one about it. Like I could talk about evals all day. So, you know, if you want to find me afterwards, I'm happy to chat about it for sure. Yeah, yeah, sure. Just come find me after the talk.

[MC] Sweet. Thank you very much. Another round of applause. Great job. It's copyright time, I believe. So you can weave your way out of the room and get some sustenance. See, reckon about half an hour. Have you ever seen that? Is it?
