Hand off a task to Codex CLI for autonomous execution. Use when a task would benefit from a capable subagent to implement, fix, investigate, or review code. Codex has full codebase access and can make changes.
83
76%
Does it follow best practices?
Impact
99%
2.67x
Average score across 3 eval scenarios
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./data/skills-md/0xbigboss/claude-code/codex/SKILL.md
Quality
Discovery
75%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid description that clearly communicates both what the skill does and when to use it, with a distinct niche around Codex CLI delegation. Its main weakness is that the action verbs ('implement, fix, investigate, review') are somewhat broad, and the trigger terms could include more natural user language variations. Overall it performs well for skill selection purposes.
Suggestions
Add more specific concrete actions or examples, e.g., 'delegate file refactoring, bug fixes, test writing, or code reviews to Codex CLI'.
Include more natural trigger terms users might say, such as 'delegate', 'run autonomously', 'background task', or 'offload work'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (Codex CLI, autonomous execution) and some actions ('implement, fix, investigate, or review code'), but these are fairly broad categories rather than highly specific concrete actions like 'run shell commands', 'create pull requests', or 'generate diffs'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (hand off a task to Codex CLI for autonomous execution, full codebase access, can make changes) and 'when' ('Use when a task would benefit from a capable subagent to implement, fix, investigate, or review code') with an explicit 'Use when' clause. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'Codex CLI', 'subagent', 'implement', 'fix', 'investigate', 'review code', and 'codebase access'. However, it misses natural user phrases like 'delegate', 'background task', 'run in parallel', or 'autonomous agent', and 'subagent' is more technical jargon than a natural user term. | 2 / 3 |
| Distinctiveness Conflict Risk | The description is clearly about delegating to Codex CLI specifically, which is a distinct tool/workflow. The mention of 'subagent', 'autonomous execution', and 'Codex CLI' creates a clear niche that is unlikely to conflict with other coding or review skills. | 3 / 3 |
| Total | 10 / 12 Passed | |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted skill with strong actionability and workflow clarity — it provides concrete commands, clear sequencing, and explicit validation/monitoring steps throughout the Codex subagent lifecycle. The main weaknesses are moderate verbosity (model descriptions, complexity assessment examples that Claude could infer) and the monolithic structure that could benefit from splitting detailed reference material into separate files. Overall it's a solid, production-ready skill that effectively guides Claude through a complex multi-step orchestration task.
Suggestions
Trim the model descriptions to just model names and one-word descriptors (e.g., 'gpt-5.2 - flagship'), since Claude doesn't need marketing-style descriptions to select the right model.
Consider extracting the CTCO prompt template and the model/sandbox reference tables into a separate REFERENCE.md to reduce the main file's token footprint.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly well-structured but includes some unnecessary verbosity. The model descriptions ('Flagship model, best for complex professional tasks') and the task complexity assessment section explain things Claude can infer. The CTCO prompt template and monitoring sections are useful but could be tighter. Some sections like 'Assess Task Complexity' with examples are helpful but borderline redundant for Claude. | 2 / 3 |
| Actionability | The skill provides concrete, executable bash commands throughout: git state gathering, mkdir, the actual codex exec command with heredoc syntax, monitoring commands (wc -l, tail -n 3), and reading results. Flag rules are specific and conditional. The CTCO prompt template is copy-paste ready with clear placeholders. | 3 / 3 |
| Workflow Clarity | The multi-step workflow is clearly sequenced: parse arguments → assess complexity → gather context → generate prompt → execute → monitor → return result. Each step has explicit instructions. Monitoring includes validation checkpoints (check if summary exists, check line count, tail for status). Background vs foreground decision criteria are explicit. The 'Do NOT' anti-patterns for monitoring serve as important guardrails. | 3 / 3 |
| Progressive Disclosure | The content is a single monolithic file with no references to supporting files, which is acceptable given no bundle exists. However, at ~200+ lines, some sections (like the full CTCO prompt template, the model list, or the detailed monitoring instructions) could benefit from being split into referenced files. The internal organization with clear headers is good, but the length pushes against what should be inline. | 2 / 3 |
| Total | 10 / 12 Passed | |
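The monitoring checkpoints named in the Workflow Clarity row above (check whether a summary exists, check the line count, tail for status) might look roughly like the sketch below. The function name `check_run` and both file arguments are hypothetical; this review does not show the skill's actual file layout or output names, so treat it as an illustration of the pattern rather than the skill's own code.

```shell
#!/usr/bin/env bash
# Sketch of the three monitoring checks the review mentions.
# ASSUMPTION: the run appends to a log file and writes a summary file when
# it finishes. Both paths are hypothetical placeholders.

# Print a short status for a background run.
#   $1: log file the run appends to
#   $2: summary file written on completion
check_run() {
  local log_file="$1" summary_file="$2"

  # Checkpoint 1: a non-empty summary means the run has finished.
  if [ -s "$summary_file" ]; then
    echo "done"
    return 0
  fi

  # No log yet means the run has not produced any output.
  if [ ! -f "$log_file" ]; then
    echo "not started"
    return 0
  fi

  # Checkpoint 2: line count as a cheap progress signal.
  local lines
  lines=$(wc -l < "$log_file" | tr -d ' ')
  echo "running: ${lines} log lines"

  # Checkpoint 3: only the last few lines, not the whole log.
  tail -n 3 "$log_file"
}
```

Reading only the line count and the last three lines keeps each poll cheap, which matches the review's note that the skill's monitoring guardrails matter.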
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
aa009ea