⚠ Attribution warning. This transcript was captured without per-speaker labels and contains speech-to-text artifacts (e.g. the emcee's name appears as "Macy"; "Llamas" almost certainly means "LLM-as-judge"; "Copilot Topics" almost certainly means "Copilot Autofix"; "calling zip" likely means a zip-library buffer overflow; "Sam. Antha." is garbled). The vast majority of the talk is Joseph Katsioloudes speaking. The intro and outro are by the emcee Macy. Two unnamed audience members ask questions during Q&A. Do not invent attributions. When the transcript clearly garbles a term, note the likely intended term but quote the transcript verbatim.

Participants:

Joseph Katsioloudes (speaker, GitHub Security Lab)

Macy (emcee)

Audience Q1 (false-positives question)

Audience Q2 (AI-as-judge / dual-LLM question)

Section 1 — Emcee intro

Hi, everyone. Great. Hi, everybody. 12 room to the tool call. Sam gave me this room. Because I. Am the T. I'm just kidding. This is called the tool call because all the talks in this room today and workshops tomorrow are going to be very practical. They're going to be about things you can take away today. No, like, petty theory stuff. Well, maybe a little bit. Depends. Depends how we go. My name is Macy. I will be emceeing today and tomorrow. I will also be giving a workshop tomorrow. That should be fun. Yeah, it's going to be good. But we have a great conference for you. We have so many good sessions. I'm really excited to be emceeing this room, but I'm also, like, going to be missing a couple of talks. So you guys have to go and report back to me, please. Housekeeping: toilets are out there and down the hall. If you can read English signs, you will find them. There is some water just outside the room. If you need some, please feel free to grab it. So our first talk today is by the one and only Joseph Katsioloudes. Joseph is a senior developer advocate at a company called GitHub. Raise a hand if you have heard of GitHub. Okay. Okay. So you're getting a little traction. Good for you guys. Really proud of you. So no pressure to speakers after this, but Joseph has spoken in over 25 countries and has over 2.8 million views on his videos. Again, no pressure. But he is a seasoned pro, so you're in very good hands. Can we get a little round of applause for Joseph?

Section 2 — Opening + GitHub Security Lab context

Okay. Good morning, everybody. Hello everyone. Perfect. Welcome to Code Security Mindset: navigating the era of artificial intelligence. My goal today is to show you practical ways to use artificial intelligence for security use cases. Whatever I show you, you can also do in Codex, not just GitHub Copilot. Okay, let's get started. Yeah. 180 million plus developers, the platform that build scales secure software. Specifically, my team is the GitHub Security Lab, which is a team of security experts with the mission to secure the open source software that we all rely on. And we do this through research, education and other activities. For instance, this is some research from last year about bypassing Ruby sample [likely "ruby-saml"]. Just last week, we showed how somebody could exploit caling zip [likely a zip library] with a key buffer overflow. We have found and helped fix more than a thousand vulnerabilities, 900 plus of which have been given unique security identifiers. And the most important thing is that we help people fix those. We don't just report them.

Section 3 — The 1-to-100 security gap

And this is important because of the gap between security and developers. If we try to quantify this gap, there is just one application security specialist for every 100 software developers. This gap is the opportunity that we can help minimize with AI. However, if we are not equipped with the right knowledge, the gap may widen, or it is not going to close because we didn't maximize our potential there. And that's the point of this problem. I want to show you the pros and cons, the drawbacks of AI, so you can make a human-in-the-loop use case of how you can use it responsibly. So this is the first statement: AI can help minimize the security gap. Let's see how.

Section 4 — Writing safer code: start left, not shift left

The first thing is writing safer code. For me, this is very important because all my career so far, I'm hearing from senior leaders in cyber security about shifting left. The problem when you shift left is that you keep having a gap on the left. The whole point and the opportunity here is to start left. And starting left means a lot of things. And of course it has to touch the developers and how the code is being built in the first place. I brought some demos that are starting from the beginning of this AI wave that we are living in. So slowly, slowly, we can come up to date releases up to two days ago. Or after Friday. So let's get started. The code might be a bit small to read clearly at the back. I'm gonna show you the blazers and I'm gonna read everything, so we are all on the same page. The first example I want to show you is from the first releases of AI, with a chat on the left. Here we have an example of vulnerable code. On the right you can see a very simple SQL injection on line 23. It comes from the fact that line 22 has a variable that is user-controlled and is not being sanitized. But the first thing I want to show you with that is I'm asking an early version of GitHub Copilot: What is wrong with this code? What are the security issues here? And straight away I want to highlight that out of the three responses that are here. There are Sam. Antha. [garbled] So this one, passwords in plain text. Yeah, absolutely, there are SQL injection bugs. No improvisation [likely "input validation"], but there are no passwords in plain text. And with that, I wanted to show you that we suffered from hallucinations and we will never get away with zero hallucinations.
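The demo code itself is not in the transcript. As a minimal sketch of the pattern described (a user-controlled value reaching a SQL string unsanitized), assuming Python's built-in `sqlite3` module and a hypothetical `users` table, the vulnerable and the parameterized versions could look like this:

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, username):
    # Vulnerable: the user-controlled value is concatenated into the SQL text,
    # so quotes inside it change the structure of the query.
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Safer: the value is bound as a parameter and is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version returns nothing because no user has that literal name.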

The second thing I want to show you is non-determinism at its best. So I'm bringing in a code base of 10 security vulnerabilities. I'm asking the same question. In my context, I have the whole code base. And with that, I get back eight results. I get my eight results. Somebody can tell me, yeah, you got eight results back, you have 80% accuracy. However, that's not the case. And I will not really be worse specific here. Number six, somebody can read, is prototype pollution in JavaScript, which happens when a method such as the array function is security [likely "inheriting"] something from the parent method. And when the parent is poisoned, it gets down to the children methods. And because I was so impressed by laburn [garbled], I have put in context the specific file that was suffering from that pollution. And I asked the exact same question, with the difference that this vulnerability never came up. I reduced the context, and it didn't come up. So this is non-determinism at its best. Someone could tell me that I can run this many times and find out more and more, but this just adds to the friction and to the false positives I have to go through. Then I tried to change the model to see if that improves the situation at all, but it didn't, because the pollution didn't come up. And with that, I want to mention that it's not about models, it's about the rest of things. Specifically in AI in security, models can definitely help, but you can achieve more when you have better scaffolding, and we are going to go there as well.

Section 5 — AI as reasoning layer, not detection

Now, the next thing I was thinking in the early days was, okay, but what if we can still achieve something? And always I was trying to fix the problems that were pointed out by AI, because in cyber security, and I want you to remember that, we don't have a detection problem. In cyber security, we have a fixing problem. We have so many ways to find what's wrong, and we don't have the ways to minimize that gap by getting up to the fixing speed. So by that, I started to realize that maybe I can use something else for detection, and then I can use AI to be the reasoning layer to help me fix this. So the next thing I started to do was, okay, let's be more agentic, slowly focus a bit on security. The results have definitely improved, but always there was this non-determinism, and the hallucinations come into play. And so certainly we are getting up to speed with the latest developments in AI. Before that, I want to mention that AI can definitely help minimize the security gap. But in our view, as the GitHub Security Lab, it's not here to replace the human in the loop or to skip security testing entirely. However, AI is definitely changing the scene of security testing now. You should always be following good security hygiene, and AI shouldn't be your only safety net.
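One way to picture "something else for detection, AI as the reasoning layer" is a small step that turns a deterministic scanner finding into a focused prompt. This is a sketch only: the finding fields (`rule`, `file`, `line`, `message`), the function name and the prompt wording are assumptions for illustration, not a format used by any specific tool, and the model call itself is left out.

```python
def build_fix_prompt(finding: dict, source_lines: list[str], context: int = 2) -> str:
    """Build a prompt that asks a model to explain and fix one scanner finding.

    `finding` is a hypothetical dict with keys rule, file, line (1-based), message.
    """
    line = finding["line"]
    # Include a few lines around the flagged line so the model sees local context.
    start = max(1, line - context)
    end = min(len(source_lines), line + context)
    snippet = "\n".join(f"{n}: {source_lines[n - 1]}" for n in range(start, end + 1))
    return (
        f"A static analysis tool reported rule {finding['rule']} "
        f"at {finding['file']}:{line}.\n"
        f"Tool message: {finding['message']}\n"
        f"Code:\n{snippet}\n"
        "Explain the issue for this specific code and propose a minimal fix. "
        "Do not report issues the tool did not flag."
    )
```

Detection stays with the deterministic tool; the model is only asked to reason about a finding that already exists, which narrows the room for hallucinated issues.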

Section 6 — MCP (Model Context Protocol)

And with that, I'm gonna make a bridge to MCP: Model Context Protocol. The Model Context Protocol, designed by Anthropic, was designed in order to help the AI models go outside of their small and narrow training box, to help us access siloed information from companies that we are using and so on. I can talk for a long time about why you should be careful with MCP and so on. I assume that by now we are in good shape and the protocol is improving itself and taking steps towards a more secure direction. I just want to mention that I'm gonna assume that for the rest of the talk you have a good security hygiene of installing MCP servers in tools that you already trust and you are gonna go out there in the wild and trustworth protects with all their league protocols [garbled] and stuff like that. So let's see that. Can MCPs help make security better and find those problems? Yes and no. So, number one thing, when you detect issues with AI: yes, they are non-deterministic, it may miss me how the snake [garbled], it has higher dollar cost because the cost is variable. However, you use it as the reasoning layer. You can use it for systemic issues, contextual issues. It's easier to model. You can ask questions: Am I using the right cryptographic primitives? Is there any sensitive data on the client side? These are things you will never be able to do with traditional static security tooling such as, for example, Semgrep, SonarQube and CodeQL. Because this can model everything and they can't detect Semgrep side [garbled]. However, before the however, they are checking stuff in a pattern-matching way, right? But right now, what are the trending security issues in AI? It's all about the behavior of stuff. For example, something can be amazing, syntactically correct, and pass all these pattern checks. But at the same time it can leak the whole stuff we have internally because the behavior is wrong. At the same time they have a fixed cost, they have high confidence and they are more mature. So we should still be able to use them.

This one we covered: it was all about you following what you already trust for MCP. And with that, I want to give you the example of, like, you have the MCP server. I can go in more detail with other MCPs, but I prefer to choose a company that if I say something wrong at least in my compile [garbled]. So for security specifically, you have some functions such as code security secret security advisories. These are going to pull server-side findings and bring them to either your CLI or your IDE. And the whole point of this is about transport: you translate these findings into outcomes so that the AI model, the agent, can reason with more context using the latest findings on your GitHub account or what other companies and MCPs are going to store. So therefore this is going to help to extend the context. It can help you reason about your actions in a way that is more focused.

Section 7 — Skills

Let's now go to the next step, about skills. So when it comes to skills, think about it this way, with this specific diagram. Let's start from the bottom, actually. So if I have an MCP, I can get access to my scanner's results, to issues that are there, maybe to seem basically in every functionality out there. But if you have MCPs without skills that are going to give structure, your AI agents are just going to have capability but without your process. If you have just skills, say for example this is what you should be doing in one, two, three, four, and you don't have MCPs, maybe here you don't have enough power to execute what you want to execute. And of course MCP servers can come up in the skills, being processed, being analyzed there. And the good thing with skills is that you can have them in a way that they are auditable, maintainable, extensible in the speed registering [garbled]. So in the security world, some use cases for you: oh, I found something with my code scanning. I'll be able to pull that information and an issue is going to be created. My Copilot CodeQL code is going to fix that thing, propose this fix in the PR, and then I'm going to be able to be the human, the review coming in this case here, in order to release and keep up my CI/CD. If something else needs to be added there, you can use a skill for the triage logic, and of course, to extend into more capabilities, we can use the MCP layer here.

Section 8 — Remediation in the PR (Copilot Autofix)

I'm going to show you an example of the CI/CD from the moment that you can push code. And I want to name that file [likely "find"] and fix security issues. In the traditional way, before AI, somebody could go in their dashboard, such as for example this one from CodeQL, which is the GitHub solution. You can see a list of what is wrong with your code alongside the severities, and where you have that, you are going to see some explanations that are always the same. They are not going to be specific to your code. There is no AI up to this point. I am showing you the past so that you can appreciate the now. In the now, fixes are coming to the PRs, of course. This is called Copilot Topics [likely "Copilot Autofix"]. You can have the same functionality with other AI assistants, of course. Here you get an explanation that is based on the deterministic findings that come from the SAST tools. Of course you can have a layer of AI finding themes and connecting the context you have there with your goals and all the things you want to have that are out of scope. For this uni day actually right and the latest plugin from after update just three days ago [garbled] was trying just to do that. I am going to cover that in a bit. So here you get an explanation of why this is highlighted. You get proposed code that is going to fix this vulnerability on the PR. And then here, on the PR level, you can continue to commit directly to remove this vulnerability. You can edit on the PR, and then once you do that you are good to keep going.

I want to share something. Having worked for GitHub for the past four and a half years, a big learning I had is that developers have to be working in one place. And this should be the pull request. Of course, with products like Codex or the new app from GitHub Copilot, you can have everything in an application. However, the moment you give developers more power and they have to log in on the x website and the y website, it is not long-term sustainable. So by being in the PR, we managed, and our clients as well, to become three times faster when it comes to fixes, because remember, in AI in security we have a fixing problem. We don't have a detection problem. So they have fixed 600 vulnerabilities in two weeks, bringing the fix rate to the top and that the clients part shape the same [garbled].

Section 9 — Agentic workflows

The next big thing is agentic workflows. I am going to get the question, I am sure: okay, so if we are using agentic workflows for security, what's the point of using SAST tooling or other security tools? The answer is all the things we saw before, with non-determinism and the maturity and the other stuff. However, the big advantage of this one is that it's tailored, and I'm going to show you exactly what this means. So here we are going to see a basic example. We have a security agent that is being set up. In this agent, I gave exactly what is out of scope, what I care about, how our company is doing stuff. And then I'm running this agent. These vulnerabilities are going to be picked up only by all the code level. And then somebody can create issues about them, or somebody can choose to fix them on the spot, in the same place that the code is being developed. And here is a script. You can see, let's go back. I'm in the presenter mode. Okay, basically what I wanted to show you here is that you can change the script the way you want. And the AI agent is going to, as deterministically as possible, follow exactly what you try to achieve with your script. You still have to be careful with this stuff, because if you put, let's think about style now, and taste. So I noticed that if I put every structure I have and all the rules and all the things that apply to my team in one script, I'm definitely bloating the AI context and the process is not the best. Therefore I realized that some things should be going into agents.md, some other things should be part of these files, and some other scripts will be just three hours about this thing [garbled]. So let's keep it for the end if I have time.

Now, let's see how you can be on your record [likely "repo"] and you can create agentic workflows straight on there. The good thing with these workflows is that they can either run on a schedule or they know themselves went wrong [garbled] because they understand what's going to trigger them. They run the same VMs as you have actions, and the whole idea is that you have a smart agent tracking and waiting for something to happen through a workflow. Then of course it's not about just security. You have the chance to have productivity gains as well. On the left I. General here I put two different workflows. You can take them directly from the free online library of GitHub agent workflows. Here we have one that is going to help you with automatic reviews of what's coming in. And on the right, I can't really read it. Yeah, so it's a triage issue cookie master [garbled], so trying to triage everything that comes in.

Section 10 — Task flows for vulnerability finding

Now you heard about methods from other abilities fund [garbled] raise your hands. Perfect. So my team has access to that, and of course access to other models, and we try to find vulnerabilities with that. And we try to open source our knowledge of how to do that. So these models don't find the vulnerabilities by themselves, right? It's also security people steering them and codifying their knowledge in what is called task flows. And we did exactly that. In order to help you find vulnerabilities, what we did was that we have codified the knowledge we have from our security researchers that found all these vulnerabilities, in a way that you can open this repo in a Codespace. And when you do so, we are going to run our manual security reviews in a way that is automated by AI, and then you can have vulnerabilities being shown to you. You can access that at gh.io/taskflows. It's free and open source. And it's how vulnerabilities are being found at the end of the day from powerful models. With securities and researchers young men [garbled].

Section 11 — Supply chain decisions

Next one I want to highlight, because it's all about supply chain these days, is how somebody, yourself or your agent, can take better informed supply chain decisions. We tried a lot of things. But miles appetite aligns with this approach [garbled]. I want to show you now. It might be less neat than I wanted, but it matches the quality of something I would be confident to ship in front of you. So, how to make supply chain decisions. Like, be honest here, how many of you are spending more than three minutes deciding if they should introduce something in their order [garbled]? You spend more than three. Is it more than 30 minutes? Okay, so to speed up your research about this, we have put together four instruction files that you can extend. They are free and open source; the URLs are in the next slide. The whole point is for you to get these executive summaries, with these emojis showing you what you have to be careful of. We are using sources of information that we are happy about. And in here you can go to the killings [garbled] and find out for yourself. This is at gh.io/sk in order for you to make good security decisions.

As we are slowly running out of time, I want to remind you that with AI you can have security guidelines, and this is something you should never underestimate. I did the same thing in the CLI and the results were very similar, but I want to show you this on the web because it's easier to visualize. I went into a project called Bootstrap. It's an open source project that I'm considering using in my supply chain. And I asked the question: what should I be careful about in this project if I use it? And this is the number one question people are asking us when we run office hours with open source projects: how can somebody hack me? The results here, running with different models, are consistent. It's a security person telling you what you should be careful about when you start using these projects, so that you don't have to review all the code yourselves. You can go to the specific URLs and start saving time.

Section 12 — AI-assisted fuzzing

Let's go fast over this. So fuzzing is a technique that you can use to cause undesirable behavior in your code. You can use any tooling for this. And the whole point is that security engagements get really slow with fuzzing, right, because a security person like myself has to produce millions of inputs in order to cause undesirable behavior. And AI is amazing with that. AI can create these millions of inputs, and the boilerplate, and also the harness, which also slows down this engagement, and hack your code in a safe way and tell you what you can improve, so you don't just do these things in a static way. AI reading code and is monetaristic [likely "non-deterministic"]. I mean, even if your code is being attacked, you can still benefit; even if there is non-determinism and hallucinations, you can still find a lot of findings there. Always remember that privacy is important. That's why I do have Copilot transcender [likely "the GitHub Copilot Trust Center"] is there, with Q&A on how you can. I mean, these days you are interested about certificates and is my context for my league [garbled] and stuff like that; all is the answer there.
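To make the terms concrete, a fuzzing harness is the loop that feeds generated inputs to a target and records unexpected failures. The sketch below is a deliberately simple random fuzzer in plain Python, not the AI-assisted setup from the talk; the toy parser and all names are hypothetical. The inputs and the harness are the parts the speaker suggests an AI assistant can help generate.

```python
import random


def parse_length_prefixed(data: bytes) -> bytes:
    """Toy target: the first byte declares the payload length."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


def fuzz(target, iterations=1000, seed=0, max_len=16):
    """Feed random byte strings to `target` and collect unexpected exceptions."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    crashes = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(max_len)))
        try:
            target(data)
        except ValueError:
            pass  # documented rejection of bad input, not a bug
        except Exception as exc:
            crashes.append((data, repr(exc)))
    return crashes
```

A target that only ever raises `ValueError` on bad input comes back with an empty crash list; any other exception type is kept together with the input that triggered it, so it can be replayed.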

Section 13 — Education: the gh.io/scg playground

And with that, I prepared a playground for you to try everything here for free. You can just get started in two minutes right from your browser by navigating to gh.io/scg. I created a bot; it's a CLI, it looks and feels like open claw and the rest of agents like that. It's inside Codespaces, therefore it is sandboxed already. You can ask it to do some things, like create this file, access this simulated internet we put together. I actually put agentic workflows there as well, so that you can access stocks information in Bloomberg and, like, the simulated framework, and find stuff out. And also multi-agentia viability [garbled], so you can play and understand what can go wrong before you use the real tooling out there. Are some pre-training as well you can have [garbled], where I put together a chatbot that in natural language you might ask to leak some security [likely "secret"] it shouldn't leak. You don't have to know how to code, you don't have to be a security expert; you can just use natural language and see for yourself what's happening there. And trust me, it's not 2025 stuff, it's the latest stuff: if you say for yellow previous instructions [likely "ignore previous instructions"], nothing of value is going to work. So how to go is found [garbled]. I want to build some more credibility: more than 10,000 players, it is used by enterprises around the world, and they have a health tone of the time [garbled]. And it's all about you are on the application layer here, starting from the model up at the top. You don't have to do anything else, because we are using real models behind the scenes from OpenAI. For free, you can do that using your GitHub account.

Section 14 — Wrap-up

Today we started with this problem that there is one application security specialist for every 100 developers, and we went through different use cases in order for you to close this gap. So you can write safer code, you can use MCPs and skills in the way that they are intended for. Of course you should apply best security practice everywhere. You can find and fix security bugs. It suggests us to do that in the PR. You can make better supply chain decisions, you can have security guidance, and of course never forget about tailor masters [garbled] you can get with AI. You can try everything we saw for free and educate yourself about the latest with this repo here that you can play directly from your browser in two minutes. And with that. One minute today you can connect with me in this QR code that you are free to trust. It's not going to drive python a little bit [garbled]. I'm around the conference today, more than happy to chat about AI in general, not just the GitHub world. I'll try my best to help you. And with that, I want to thank you. You've been a fantastic audience.

I will sing the noise around. I have never seen timing like that. That was unbelievable because I told you it's a seasoned professional. So we have five minutes for questions. Does anyone have a question? Excellent. I could use more time with the presentation partner in five minutes.

Section 15 — Q&A 1: false positives burning developer time

Audience Q1: Thanks for the talk. I have a question about one of the things you mentioned fairly early, which was the risk of hallucinations, of false positives. Think about it: most organizations treat security vulnerabilities very seriously, and there is probably quite a comprehensive process when an issue is raised. So an AI security review that brings up five false positives can take a tremendous amount of time from a development team because of the seriousness of that. I think you mentioned something about making sure that the people that are doing these reviews kind of have the right domain knowledge and things like that. But do you have any kind of, it's more of an organizational problem than a technical problem, but, like, how do you avoid that time burn from AI dyspoke [garbled] vulnerabilities?

Joseph: Let's take a step back. How can you improve your AI approach so that what is put in front of you as a human is as curated as possible? Because that helps the end result as well. So, different approaches, different budgets, different taste and so on. One approach can be that you are using different models. This is because they have different training. They might find different stuff. So, two different models: if you aggregate them, you might get more prominent to play with [garbled]. You might have an approach where you are running these models more times. Therefore you only get the results that are common between the runs, and then you aggregate them. Every time you run, the cost goes up. You can have the approach that this is an area that started turning [garbled] definitely has to improve, and we are taking active steps on that. You shouldn't care too much to build that DIY. It should be that you trust a company. They have Semgrep, you name it. At the moment you push code, you don't care what happens behind the scenes. It's their job to combine static testing with AI testing, and maybe another round of AI reasoning all about them, to give you the best result there at the end. We are actively working on that. It's an area that we have to improve. Because I think the most waste of time for developers is when they lose confidence that the results of. Static traditional testing with pattern matching is not going to add value to a world where we move from the execution layer in the vulnerabilities are on the behavior of these agents.
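The "keep only what is common between the runs" idea can be sketched as a small aggregation step. This is an illustration under assumptions: each run is taken to be a list of `(file, issue)` tuples, and the function name and the `min_runs` threshold are made up for the example.

```python
from collections import Counter


def consensus_findings(runs, min_runs):
    """Keep findings reported by at least `min_runs` of the given runs.

    `runs` is a list of runs; each run is an iterable of hashable findings,
    for example (file, issue) tuples. Results are returned sorted.
    """
    counts = Counter()
    for run in runs:
        counts.update(set(run))  # count each finding at most once per run
    return sorted(f for f, n in counts.items() if n >= min_runs)
```

Setting `min_runs` to the number of runs gives the strict intersection described in the answer; a lower threshold trades more candidate findings for more noise. As noted above, every extra run also adds cost.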

So when it comes to the developers that have to spend time on this, I think it touches Bible education [garbled]. Traditionally, technology has been done in a way that developers are learning how to add functionality, but they didn't learn how to do that. I don't want to be, like, speaking 100%, but most developers think that it is the job of somebody else. Companies have a hard time educating developers about creative [garbled]; video-based trainings are boring, they don't work. There are some other trainings out there, but they are like a Who Wants to Be a Millionaire kind of approach: they give you four options and you choose one. In my experience in security, you can introduce more vulnerabilities by that. There are no guarantees you are going to fix that; there are more than 100 ways. That's exactly why I created that training that allows telling you here, the gh.io/scg, is hands-on. That's the number one thing you get from there. So I believe that if I was a chief security officer, I would push down to my developer teams service level objectives about expectations I have that are tightly aligned with their performance objectives. So it's not about shipping code, it's about shipping code that is secure, because if the code is not secure, it is low quality code. Therefore I will tell them, at the end of each sprint, this is your allowance for open security problems with that specific severity score, and anything else is not accepted because you don't pass the service level objectives. This is me giving you context and experience from working with enterprises and trying all the other things with champions and non-scale other things. I'm sure people in the room have experience with other approaches that work. In my view, I will follow that one so that developers are dedicated, and where the problems arrive to them, they have the objectives and the knowledge to pixels [likely "fix those"].
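The per-severity allowance described in this answer can be expressed as a simple end-of-sprint check. The sketch below assumes a hypothetical list of open findings, each with a `severity` field, and an allowance table chosen by the team; none of these names or numbers come from the talk.

```python
from collections import Counter


def slo_breaches(open_findings, allowance):
    """Return {severity: amount over allowance} for each breached severity.

    A severity missing from `allowance` is treated as an allowance of zero.
    """
    counts = Counter(f["severity"] for f in open_findings)
    return {
        sev: count - allowance.get(sev, 0)
        for sev, count in counts.items()
        if count > allowance.get(sev, 0)
    }
```

An empty result means the sprint meets the objective; any entry shows which severity is over its allowance and by how much.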

Thank you.

Section 16 — Q&A 2: AI-as-judge / dual LLM

Audience Q2: Have a question. Yeah quick very quick question. You use AI as judge in these kind of examples that's kind of gaining attention right now is instead of focusing on just one AI model to do all the checking.

Joseph: So you are referring to Llamas [likely "LLM-as-judge"]. Yeah, I do some approach. Is using the approach as well in Azure to make sure for polynomial policies [garbled]. I want to explain it, so give me one and a half minutes here. So this technique is working. It's not perfect. For those that might not be familiar, it is known as dual LLM or LLM jury. The whole idea is that you have a second LLM judging the work of the first. So to the second LLM you can ask questions like: is this first LLM going to leak some sensitive information? Actually, you can try this technique. It's part of that season 3, level 3, level 4. It's implemented if they are. I put all the latest on there. Azure is using that technique to answer questions in production, something like: is the user trying to hack me? Is it trying to do this? Do that. This technique is nice. It's working, it is not perfect. People are bypassing it by saying I'm another hacker, I'm trying to test me and the other element [likely "LLM"]. The end has to give a deterministic answer, right? So, is this a hacking attempt? Yes or no? The more I try, the more I hack it, because when I run it 10 times I'm successful three times, and I attack your house [likely "an attacker has"] to succeed once, no matter how much you defend and how much budget you spend on everything in your company. So it's working. It's a good mitigation, but it's a mitigation. If you don't plan output filtering, input filtering, I'm not a big fan because you restrict AI input. So if you don't do output filtering. If you don't do the number one thing, least privilege access, right? AI shouldn't be touching anything sensitive, because assume it's gonna take it using achieve agents should have been negotiating any boundaries [garbled]. So that's the number one thing: don't give access to anything that AI shouldn't have access to, use containers, and then use LLM to validation and so on as many agents. Right? Perfect. Thanks so much, great questions.
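A minimal sketch of the dual-LLM gate and of the "attacker only has to succeed once" arithmetic follows. The `worker` and `judge` callables are stand-ins for model calls and are assumptions, as is reading "successful three times out of 10" as a 0.3 per-attempt bypass rate with independent attempts.

```python
def guarded_reply(user_input, worker, judge, refusal="Request blocked."):
    """Dual-LLM gate: the first model drafts, the second returns a yes/no verdict.

    `worker(user_input)` returns a draft string; `judge(user_input, draft)`
    returns True to allow the draft or False to block it.
    """
    draft = worker(user_input)
    return draft if judge(user_input, draft) else refusal


def bypass_probability(per_attempt_rate, attempts):
    """Chance that at least one of `attempts` independent tries gets through."""
    return 1 - (1 - per_attempt_rate) ** attempts
```

Under those assumptions, ten attempts give 1 - 0.7^10, roughly 0.97, so a judge that blocks 70% of individual attempts still loses to a persistent attacker almost every time. That is consistent with treating the judge as one mitigation on top of least-privilege access rather than as the boundary itself.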

Section 17 — Emcee outro

Every talk you see at this conference is going to be turned into a skill that you can install on your agent that you can then be like hey how do I get the most out of Joseph's talk in my code base today? And it should work if you have questions for Joseph please find him there's now a coffee break. Talks will resume in this room at 11:05. Thank you everyone. Seems to remember seeing something come up by the haven.

ainativedev/latest-aidevcon-speakers-london-2026

transcript.md talk-katsioloudes-code-security-ai/

Transcript — Code Security Reinvented: Navigating the era of AI