Cross-model benchmark for gstack skills. Runs the same prompt through Claude, GPT (via Codex CLI), and Gemini side by side, then compares latency, tokens, cost, and optionally quality via an LLM judge. Answers "which model is actually best for this skill?" with data instead of vibes. Separate from /benchmark, which measures web page performance. Use when: "benchmark models", "compare models", "which model is best for X", "cross-model comparison", "model shootout". (gstack) Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".
Security
2 findings: 1 critical severity, 1 medium severity. Installing this skill is not recommended. If you do intend to install it, review these findings carefully first.
Detected a prompt injection in the skill instructions. The skill contains hidden or deceptive instructions that fall outside its stated purpose and attempt to override the agent’s safety guidelines or intended behavior.
Potential prompt injection detected (high risk: 0.90). The skill embeds numerous side-effecting instructions unrelated to benchmarking: automatic telemetry writes, config changes (proactive/routing/telemetry), possible git commits to CLAUDE.md, and GBrain sync/publish flows that could publish session memory. It also instructs the agent to prioritize or auto-run these steps, including plan-mode exceptions. These are hidden, outsized behaviors that fall outside the skill's stated benchmarking purpose.
The skill exposes the agent to untrusted, user-generated content from public third-party sources, creating a risk of indirect prompt injection. This includes browsing arbitrary URLs, reading social media posts or forum comments, and analyzing content from unknown websites.
Third-party content exposure detected (high risk: 0.80). The skill explicitly runs external providers via gstack-model-benchmark (Step 4/5), which streams and interprets remote model outputs to decide the "best" model. In addition, its GBrain Sync (at skill start) and optional Lake intro flow perform git fetch / gstack-brain-sync and may open a public URL (https://garryslist.org/...). All of these are clear instances of ingesting third-party web, repo, or model content that can materially influence the agent's actions.
db9447c