
Cisco Principal Engineer's Fix for AI Code Security

Transcript

Chapters

Trailer
[00:00:00]
Introduction
[00:02:07]
Defining CodeGuard: Security Skills for AI
[00:04:46]
AI Culture at Cisco: Balancing Creativity & Security
[00:06:33]
The Evolution of Tooling vs. LLM Models
[00:07:55]
From OWASP Rules to Agent Skills
[00:09:09]
Challenges of Distribution Across Different IDEs
[00:12:48]
Tips for Increasing Agent Skill Activation
[00:15:48]
Measuring Success: The Tessl Evaluation Journey
[00:20:25]
Navigating Enterprise-Grade Security Scenarios
[00:24:41]
How to Build an AI Skill from Scratch
[00:28:15]
The Future of CodeGuard & Open Source
[00:31:59]

In this episode

Your AI coding agent learned from millions of lines of code, including insecure ones. That means by default, it can write vulnerable code too.


So how do you fix that?


John Groetzinger, Principal Engineer at Cisco, built CodeGuard, a security skills layer that teaches coding agents how to write and review code securely. He tested it against real scenarios.


The result:
84% success rate vs 47% baseline. Nearly 2× improvement.


In this episode we get into:
• how CodeGuard works
• why Cisco open sourced it
• the surprisingly simple method that gets agents to fix their own mistakes


Try CodeGuard: cisco/software-security on the Tessl registry.


Connect with us here:

John Groetzinger: https://www.linkedin.com/in/john-groetzinger/

Cisco: https://www.linkedin.com/company/cisco/

Simon Maple: https://www.linkedin.com/in/simonmaple/

Tessl: https://www.linkedin.com/company/tesslio/

How Cisco Built Security Skills for AI Coding Agents

Getting coding agents to write secure code remains one of the harder problems in agentic development. Agents have learned from all of our code, including our worst practices, and they tend to reproduce those patterns unless given specific guidance. In a recent episode of the AI Native Dev podcast recorded at Cisco Live Amsterdam, Simon Maple sat down with John Groetzinger, a principal engineer at Cisco, to explore how the company tackled this challenge with CodeGuard, a set of security skills designed to work across multiple AI coding tools.

The conversation offered practical insights into skill development, context engineering, and the evaluation workflows that help teams know whether their agent guidance is actually working.

Why Security Context Matters for AI Coding Agents

CodeGuard emerged from a straightforward problem: Cisco wanted to accelerate software development with AI coding agents, but doing so securely and to enterprise standards required giving those agents specific security knowledge. As John explained, "You can think of it as security skills for your AI coding agent. We really want our developers to use AI coding because it accelerates software development. But doing that securely and to our standards is very important."

The approach took existing OWASP best practices and internal security standards and distilled them into context that agents could actually consume. The key insight was simplification. Raw security documentation is too dense for effective agent consumption. "If you've ever tried to go look at OWASP rules, it's very encompassing," John noted. "It's not something you can even feed to an agent that would make sense of that, to make sense of your code."

The resulting skills cover traditional code security concerns like SQL injection and session management, but have also expanded to include MCP security and skill security as the agentic landscape evolves. This reflects a broader pattern: security context needs to grow alongside the capabilities teams give their agents.
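To make the SQL injection case concrete, here is a minimal sketch (not taken from CodeGuard itself) of the pattern a security skill steers an agent away from, and the parameterized alternative it steers toward:

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Vulnerable: user input is interpolated straight into the SQL string,
    # so input like "x' OR '1'='1" rewrites the query's logic.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn, username):
    # Safe: a parameterized query treats the input strictly as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: injection returned every row
print(len(find_user_safe(conn, payload)))    # 0: payload matched nothing
```

An agent trained on millions of lines of code has seen both forms; the skill's job is to make the second one the default.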

The Distribution Problem Across Multiple Tools

One of the more instructive parts of the conversation centred on distribution. Cisco developers use Cursor, Windsurf, Claude Code, and other tools. Each has different conventions for agent configuration. Getting security context to all developers without adding friction proved surprisingly difficult.

The initial approach involved creating tool-specific packages that would unzip into the right locations for each IDE. John even wrote a script, or rather had Claude write a script, that would create symlinks from a single agents.md file to all the various tool-specific directories. But this still cluttered repositories with dot files and created maintenance overhead whenever the security guidance needed updates.
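The symlink approach can be sketched roughly as follows. The target filenames here are assumptions for illustration, not CodeGuard's actual layout; each tool has its own convention for where agent context lives:

```python
from pathlib import Path

# Illustrative per-tool context files -- these names are assumptions,
# not the exact set John's script targeted.
TOOL_FILES = [".cursorrules", ".windsurfrules", "CLAUDE.md"]

def link_agent_context(repo_root: str, source: str = "agents.md") -> list[Path]:
    """Symlink one shared agents.md to each tool-specific location."""
    root = Path(repo_root)
    created = []
    for name in TOOL_FILES:
        dest = root / name
        if dest.exists() or dest.is_symlink():
            continue  # don't clobber an existing per-tool file
        dest.symlink_to(source)  # relative link within the repo root
        created.append(dest)
    return created
```

This keeps a single source of truth, but as John notes below, the dot files still land in the repository, which is exactly the clutter the registry model avoids.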

"Every time it updates, you have to merge it in," John observed. "It really complicates PR processes too. It's just noise that your team shouldn't have to deal with. Because it's not part of the central code base."

The comparison to package managers proves useful here. Just as developers do not commit entire NPM packages to their repositories, security context should be pulled in as needed rather than duplicated across codebases. This points toward a model where context engineering (https://claude.ai/blog/context-engineering-guide) happens at the organisation level, with distribution handled separately from the code itself.

Evaluating Whether Agent Security Guidance Works

Perhaps the most valuable portion of the conversation addressed evaluation. Before structured evals, John's approach was largely anecdotal: "Did I get a good vibe that this worked well? Did it do the tasks that I wanted to do efficiently?" He would share skills with other engineers, but their feedback was equally anecdotal, complicated by the fact that different tools and models produced different results.

The shift to structured evaluation through task evals provided something more concrete. CodeGuard showed a 1.79x improvement over baseline Claude Code performance across security scenarios. More importantly, the scenario-based approach revealed specific gaps. In one case, baseline performance showed zero percent success across all evaluation criteria, while the skill-equipped agent achieved success on four out of five criteria.
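The scenario-based idea can be illustrated with a minimal sketch: score an agent's output against several pass/fail criteria per scenario, then compare a baseline run to a skill-equipped run. This is a toy harness, not the Tessl evaluation system, and the criteria and outputs below are invented for illustration:

```python
# A minimal sketch of scenario-based evaluation, assuming simple
# string checks stand in for real criteria graders.
def score(criteria, output: str) -> float:
    """Fraction of criteria the generated code satisfies."""
    return sum(1 for check in criteria if check(output)) / len(criteria)

# Hypothetical criteria for a SQL-handling scenario.
sql_criteria = [
    lambda out: "?" in out or "%s" in out,   # parameterized query used
    lambda out: 'f"SELECT' not in out,       # no f-string query building
    lambda out: "execute(" in out,           # goes through the DB API
]

baseline_output = 'cur.execute(f"SELECT * FROM users WHERE name = \'{name}\'")'
skilled_output = 'cur.execute("SELECT * FROM users WHERE name = ?", (name,))'

b = score(sql_criteria, baseline_output)
s = score(sql_criteria, skilled_output)
print(f"baseline {b:.2f}, with skill {s:.2f}")
```

The value of the approach is less the headline multiplier than the per-criterion breakdown, which is what surfaced the zero-percent baseline scenario John describes.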

"I was a skeptic on 'Do I really need this?'" John admitted. "I know security; I can just tell it to go read about security and do this. Where does this help me? And it's optimised, so it's way faster. And the 1.8 improvement was impressive, especially when you have a baseline to compare it to."

The evaluation data also surfaced scenarios that John had not considered. Security requirements vary by context. Enterprise applications demand stricter session management than consumer products. Having distinct scenarios for these cases helps teams understand whether their agents are appropriately calibrated for their specific environment.

Building Skills That Agents Actually Use

The conversation surfaced several practical lessons for anyone building agent skills. First, keeping skills lean matters. Bloated skill files become less useful as agents struggle to extract relevant guidance from too much context. John's approach involves putting large concepts in separate files and referencing them from the main skill document, though activation remains inconsistent across models.
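A lean skill file following that pattern might look something like the sketch below. This layout and these filenames are hypothetical, not CodeGuard's actual structure; the point is that the main document stays short and defers detail to referenced files:

```markdown
<!-- SKILL.md: hypothetical layout, for illustration only -->
# Secure SQL Handling

Use parameterized queries for all database access. Never build SQL
statements by interpolating user input into strings.

## When to apply
Any code that constructs SQL from request data, form fields, or
other external input.

## Details (loaded only when needed)
- Injection examples and remediation: [sql-injection.md](./sql-injection.md)
- Session-management rules: [sessions.md](./sessions.md)
```

Keeping the top-level file terse gives the agent a cheap trigger for activation, while the linked files carry the depth it pulls in only when the scenario calls for it.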

Second, building skills collaboratively with the model helps. Rather than writing skill documentation himself, John describes the task, does a dry run with the agent, and then has the agent generate the skill based on that interaction. "At the end, you're going to review everything we just did and we're going to make a skill out of that. And then we also have our first evaluation from that as well."

Third, self-healing workflows can address activation problems. When the agent fails to use relevant context, John asks it directly why it missed the hint, then has it update its own configuration to improve future behaviour. "Who better to ask than the model itself?"

From Internal Tool to Open Standard

CodeGuard has since been donated to the Coalition for Secure AI, reflecting Cisco's view that these security patterns should be broadly available. The skill is also available through public registries for teams that want to experiment with it.

For development organisations wrestling with similar challenges, the CodeGuard journey suggests a path forward: start with existing best practices, simplify them for agent consumption, solve the distribution problem at the organisation level rather than per-repository, and invest in evaluation workflows that provide actual data rather than anecdotal impressions.

The full conversation covers additional ground on model selection, context window management, and the evolving tooling landscape. Worth a listen for teams working to make their AI coding workflows more secure.
