Use when the user asks to run Codex CLI (codex exec, codex resume) or references OpenAI Codex for code analysis, refactoring, or automated editing. Uses GPT-5.2 by default for state-of-the-art software engineering.
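The trigger commands named above can be sketched as shell invocations. This is a hedged illustration only: the `codex exec` and `codex resume` subcommands are taken from the description, the prompts are invented, and exact flags should be verified against `codex --help` before use.

```shell
# Hypothetical one-shot, non-interactive task (GPT-5.2 is the stated default model):
codex exec "Analyze src/ for dead code and suggest refactors"

# Hypothetical follow-up: resume the previous session with a prompt piped on stdin:
echo "Apply the refactors you suggested" | codex resume
```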
**Overall score: 86**

- Quality: 83% (Does it follow best practices?)
- Impact: 88% (4.00x average score across 3 eval scenarios)
- Advisory: Suggest reviewing before use
## Quality

### Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-structured skill description with explicit 'Use when...' guidance and strong trigger terms specific to the Codex CLI tool. The main weakness is that the capability description could be more concrete about what specific actions the skill enables beyond general categories like 'code analysis' and 'refactoring'.
**Suggestions**

- Add more specific, concrete actions beyond general categories, e.g. 'generate code patches, apply multi-file edits, execute shell commands' instead of just 'code analysis, refactoring, or automated editing'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (Codex CLI) and some actions ('code analysis, refactoring, or automated editing'), but doesn't list comprehensive concrete actions. The capabilities are somewhat vague beyond the general categories mentioned. | 2 / 3 |
| Completeness | Explicitly answers both what (run Codex CLI for code analysis, refactoring, automated editing using GPT-5.2) and when (starts with a 'Use when...' clause with specific triggers like 'codex exec', 'codex resume', or references to OpenAI Codex). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'codex exec', 'codex resume', 'OpenAI Codex', 'code analysis', 'refactoring', 'automated editing'. Good coverage of both command-specific and task-oriented terms. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with specific tool references (Codex CLI, codex exec, codex resume, OpenAI Codex, GPT-5.2). Unlikely to conflict with generic coding skills due to the specific product/command triggers. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
### Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, actionable skill with clear workflows and good error handling. The command syntax and quick reference table are particularly strong. Minor improvements could be made by trimming redundant model information and potentially splitting detailed model specs to a separate reference file.
**Suggestions**

- Consider moving the detailed Model Options table to a separate MODELS.md reference file, keeping only the default model and a brief note about alternatives in the main skill.
- Remove redundant information from the model table (identical context windows, repeated SWE-bench scores) to improve token efficiency.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Generally efficient, but includes some redundancy: the model table repeats SWE-bench scores, and context windows are identical across models. The reasoning-effort levels and model descriptions could be tighter. | 2 / 3 |
| Actionability | Provides concrete, executable commands with specific flags and syntax. The quick reference table gives copy-paste-ready examples, and the resume syntax is explicit with proper stdin piping. | 3 / 3 |
| Workflow Clarity | Clear numbered steps for running tasks, explicit validation via AskUserQuestion checkpoints, an error handling section with specific recovery actions, and well-defined follow-up procedures with feedback loops. | 3 / 3 |
| Progressive Disclosure | Content is reasonably organized with clear sections and tables, but the model options table is quite detailed for an overview skill file. Could benefit from splitting model details into a separate reference file. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
### Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

All 11 of 11 validation checks for skill structure passed, with no warnings or errors.