Security Benchmark Runner - Auto-activating skill for Security Advanced. Triggers on: security benchmark runner. Part of the Security Advanced skill category.
38
Quality: 7% (Does it follow best practices?)
Impact: 94% (1.02x; average score across 3 eval scenarios)
Passed: No known issues
Optimize this skill with Tessl:
npx tessl skill review --optimize ./planned-skills/generated/04-security-advanced/security-benchmark-runner/SKILL.md

SOC2 compliance benchmark automation
Criterion: without context / with context
Step-by-step structure: 100% / 100%
Production-ready script: 100% / 100%
SOC2 trust criteria coverage: 100% / 100%
Industry standard alignment: 100% / 100%
Validation against standards: 100% / 100%
Configuration checks included: 100% / 100%
Output report generation: 100% / 100%
Access control checks: 100% / 100%
Logging and monitoring checks: 100% / 100%
Error handling: 90% / 90%
Without context: $0.5579 · 2m 26s · 23 turns · 22 in / 10,713 out tokens
With context: $0.7092 · 2m 58s · 34 turns · 34 in / 9,957 out tokens
Threat modeling documentation and assessment
Criterion: without context / with context
Structured methodology: 100% / 100%
Step-by-step process: 100% / 100%
Enterprise security domains covered: 100% / 100%
Threat enumeration: 100% / 100%
Risk validation: 100% / 100%
Mitigation recommendations: 100% / 100%
Industry standard reference: 100% / 100%
Data flow or trust boundary analysis: 100% / 100%
Compliance context: 100% / 100%
Actionable output format: 100% / 100%
Without context: $0.4627 · 3m 12s · 11 turns · 53 in / 11,352 out tokens
With context: $0.7151 · 4m 11s · 24 turns · 208 in / 13,858 out tokens
Penetration testing benchmark runner script
Criterion: without context / with context
Runnable script produced: 100% / 100%
Step-by-step organization: 100% / 100%
Multiple pentesting domains: 100% / 100%
Industry tool usage: 50% / 70%
Standards-based validation: 20% / 30%
Structured results output: 100% / 100%
Scope safety controls: 100% / 100%
Compliance tagging: 0% / 12%
Production-ready quality: 100% / 100%
No destructive actions: 100% / 100%
Without context: $0.7116 · 3m 31s · 23 turns · 24 in / 14,552 out tokens
With context: $0.6892 · 3m 14s · 28 turns · 29 in / 12,107 out tokens
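The page reports a 1.02x figure but does not show how it is derived. As a rough cross-check only, the sketch below takes the criterion scores from the three scenarios above and compares mean scores with and without context. It assumes that the first percentage in each criterion row is the without-context score and the second is the with-context score, and that every criterion carries equal weight. The names `SCENARIOS`, `scenario_means`, and `overall_ratio` are hypothetical, and this is not Tessl's documented scoring method.

```python
# Criterion scores transcribed from the three eval scenarios,
# stored as (without_context, with_context) pairs.
# ASSUMPTION: column order is without/with, and all criteria are weighted equally;
# the page states neither.
SCENARIOS = {
    "SOC2 compliance benchmark automation": [
        (100, 100), (100, 100), (100, 100), (100, 100), (100, 100),
        (100, 100), (100, 100), (100, 100), (100, 100), (90, 90),
    ],
    "Threat modeling documentation and assessment": [(100, 100)] * 10,
    "Penetration testing benchmark runner script": [
        (100, 100), (100, 100), (100, 100), (50, 70), (20, 30),
        (100, 100), (100, 100), (0, 12), (100, 100), (100, 100),
    ],
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


def scenario_means(pairs):
    """Return (mean without context, mean with context) for one scenario."""
    return mean([w for w, _ in pairs]), mean([c for _, c in pairs])


def overall_ratio(scenarios):
    """Average the per-scenario means, then return (without, with, with / without)."""
    per_scenario = [scenario_means(pairs) for pairs in scenarios.values()]
    without = mean([w for w, _ in per_scenario])
    with_ctx = mean([c for _, c in per_scenario])
    return without, with_ctx, with_ctx / without


if __name__ == "__main__":
    for name, pairs in SCENARIOS.items():
        w, c = scenario_means(pairs)
        print(f"{name}: {w:.1f}% -> {c:.1f}%")
    w, c, r = overall_ratio(SCENARIOS)
    print(f"overall: {w:.1f}% -> {c:.1f}% ({r:.2f}x)")
```

Under these assumptions the overall means come out to 92.0% without context and 93.4% with context, a ratio of about 1.02x. The 93.4% does not match the page's 94% Impact figure, so the page may weight or aggregate scores differently.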
994edc4