Audit applications for AI prompt injection, agent security, and LLM permission boundary vulnerabilities. Use when the user mentions 'prompt injection,' 'LLM security,' 'AI security,' 'jailbreak,' 'indirect prompt injection,' 'prompt leaking,' 'AI red team,' 'LLM vulnerabilities,' 'AI input validation,' 'system prompt extraction,' 'agent security,' 'MCP security,' 'AI permissions,' 'AI privilege escalation,' or needs to secure any application with AI features, AI agents, or LLM integrations.
68
83%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Audit applications that use AI features, LLM integrations, or AI agents for prompt injection, privilege escalation, and authorization bypass vulnerabilities.
Prompt injection is the #1 vulnerability in LLM-integrated applications (OWASP Top 10 for LLMs, LLM01). It occurs when untrusted input influences the instructions an LLM follows, causing it to ignore its system prompt, leak secrets, or take unauthorized actions.
Three attack classes:
Identify every place the application uses AI. This includes direct LLM API calls AND higher-level AI features:
Grep for LLM API calls:
- openai, anthropic, cohere, replicate, ollama
- ChatCompletion, messages.create, generate, complete
- langchain, llamaindex, autogen, crewai
Also look for AI features that may not be obvious LLM calls:
- AI-powered search or recommendations
- AI content generation (summaries, descriptions, emails)
- AI chatbots or copilots embedded in the app
- AI-assisted form completion or auto-fill
- AI moderation or classification
- AI-driven workflow automation
- MCP (Model Context Protocol) servers and tool registrationsFor each AI integration, document:
Check how prompts are assembled. Look for:
Unsanitized interpolation:
# VULNERABLE — user input directly in prompt
prompt = f"Summarize this: {user_input}"
# VULNERABLE — external data injected without marking
prompt = f"Answer based on this context: {rag_results}"Missing input/output boundaries:
# BETTER — clear delimiters separating instructions from data
prompt = f"""Summarize the text between the <document> tags.
<document>
{user_input}
</document>"""Secrets in system prompts:
# VULNERABLE — API keys, database credentials, or internal URLs in system prompt
system = f"You are a helper. Use API key {API_KEY} to call..."Check for these patterns:
Check what happens with LLM responses:
Rendered as HTML (XSS via LLM):
// VULNERABLE — LLM output rendered as raw HTML
<div dangerouslySetInnerHTML={{ __html: llmResponse }} />If the LLM can be tricked into outputting <script> tags or event handlers, and the output is rendered unsanitized, this is XSS.
Executed as code:
# VULNERABLE — LLM output passed to eval/exec
exec(llm_response)Used in database queries:
# VULNERABLE — LLM output used in raw SQL
cursor.execute(f"SELECT * FROM {llm_response}")Passed to another LLM (chained injection): If LLM A's output becomes input to LLM B, an attacker can inject instructions that propagate through the chain.
If the LLM has access to tools, function calls, or operates as an autonomous agent:
Tool inventory and validation:
# VULNERABLE — LLM can call any tool without validation
result = execute_tool(tool_name=llm_choice, args=llm_args)
# BETTER — allowlist + argument validation + confirmation for destructive actions
if tool_name not in ALLOWED_TOOLS:
raise ValueError("Tool not permitted")
validated_args = validate_tool_args(tool_name, llm_args)
if tool_name in DESTRUCTIVE_TOOLS:
require_user_confirmation(tool_name, validated_args)AI agent-specific risks:
Check for autonomous agent patterns (agent loops, multi-agent orchestration, agent frameworks):
Test whether the system prompt can be extracted:
Common extraction attempts:
Check if the application:
This is critical for apps with role-based access, multi-tenant data, or tiered permissions.
Confused deputy — does the AI inherit the right permissions?
# VULNERABLE — AI queries database with admin-level service account
results = db.query(ai_generated_sql) # Bypasses row-level security
# BETTER — AI queries execute under the requesting user's permissions
results = db.query(ai_generated_sql, user_context=request.user)Privilege escalation through AI:
Multi-tenant data leakage:
Cross-privilege injection:
Permission check checklist for AI features:
| Check | Status | Notes |
|---|---|---|
| AI tool calls go through the same auth middleware as user actions | ||
| AI database queries are scoped to the requesting user's permissions | ||
| RAG retrieval is filtered by tenant/user access level | ||
| AI cannot access admin APIs on behalf of non-admin users | ||
| Shared data consumed by AI is treated as untrusted input | ||
| AI feature access itself is gated by user role where appropriate |
Check what defenses are in place and whether they're sufficient:
| Defense | Present? | Notes |
|---|---|---|
| Input validation/sanitization | Strip or escape control characters, limit length | |
| Prompt delimiters | Clear boundaries between instructions and data | |
| Output validation | Check LLM output before rendering/executing/storing | |
| Tool call validation | Allowlist tools, validate arguments, gate destructive actions | |
| Privilege separation | LLM operates with minimum necessary permissions | |
| User-scoped AI queries | AI data access filtered by requesting user's role/tenant | |
| Agent loop limits | Max iterations, token budgets, timeouts for autonomous agents | |
| Agent memory isolation | Untrusted data cannot poison agent memory/state | |
| MCP server auth | MCP tools authenticated and scoped per user | |
| Rate limiting | Prevent automated injection attempts | |
| Monitoring/logging | Log prompts, completions, and tool calls for anomaly detection | |
| Human-in-the-loop | Require approval for high-risk actions |
# Prompt Injection Audit Report
## Application: [name]
## Date: [date]
### LLM Integration Map
| Integration | Model | User Input? | External Data? | Tools? | Output Usage |
|-------------|-------|-------------|----------------|--------|-------------|
### Findings
#### [SEVERITY] [Title]
**File:** `path/to/file:line`
**Category:** Direct Injection / Indirect Injection / Cross-Privilege Injection / Prompt Leaking / Insecure Output / Tool Abuse / Agent Security / Permission Bypass
**Description:** [What the vulnerability is]
**Attack scenario:** [How an attacker could exploit this]
**Vulnerable code:**
[code snippet]
**Remediation:**
[Fixed code with explanation]
---
### Defense Assessment
| Defense Layer | Status | Recommendation |
|--------------|--------|----------------|
### Prioritized Remediation
1. [Critical — permission bypass, privilege escalation, or multi-tenant data leakage through AI]
2. [Critical — exploitable injection paths with tool/agent access]
3. [High — unsanitized user input in prompts, agent memory poisoning]
4. [Medium — missing output validation, unbounded agent loops]
5. [Low — defense-in-depth improvements, monitoring gaps]2400590
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.