
codex

Hand off a task to Codex CLI for autonomous execution. Use when a task would benefit from a capable subagent to implement, fix, investigate, or review code. Codex has full codebase access and can make changes.

Overall score: 83 (2.67x)

Quality: 76% (Does it follow best practices?)

Impact: 99% (2.67x, average score across 3 eval scenarios)

Security (by Snyk) advisory: suggest reviewing before use.

Optimize this skill with Tessl

npx tessl skill review --optimize ./data/skills-md/0xbigboss/claude-code/codex/SKILL.md

Quality

Discovery: 75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is well-structured with a clear 'Use when' clause and a distinct niche around Codex CLI delegation. Its main weakness is that the listed capabilities (implement, fix, investigate, review) are somewhat broad, and the trigger terms could better cover natural user phrasings for task delegation. Overall it's a solid, functional description that would perform well in skill selection.

Suggestions

Add more specific concrete actions beyond the broad categories, e.g., 'run tests, apply patches, refactor modules, debug errors' to improve specificity.

Include more natural trigger terms users might say, such as 'delegate', 'run in background', 'parallel task', or 'offload work' to improve discoverability.
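Putting both suggestions together, an improved frontmatter description might look like the following sketch (the wording and field layout are illustrative, not the skill's actual frontmatter):

```yaml
---
name: codex
description: >
  Hand off (delegate) a task to Codex CLI for autonomous background
  execution. Use when work can be offloaded to a capable subagent to
  implement features, fix bugs, run tests, apply patches, refactor
  modules, debug errors, investigate issues, or review code. Codex
  has full codebase access and can make changes.
---
```

This version keeps the existing 'Use when' clause while adding the concrete actions and natural delegation phrasings the suggestions call for.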

Specificity (2/3): Names the domain (autonomous code execution via Codex CLI) and lists some actions ('implement, fix, investigate, or review code'), but these are fairly broad categories rather than highly specific concrete actions like 'run tests', 'create pull requests', or 'apply patches'.

Completeness (3/3): Clearly answers both 'what' (hand off a task to Codex CLI for autonomous execution, with full codebase access and ability to make changes) and 'when' ('Use when a task would benefit from a capable subagent to implement, fix, investigate, or review code') with an explicit 'Use when' clause.

Trigger Term Quality (2/3): Includes some relevant keywords like 'Codex CLI', 'subagent', 'implement', 'fix', 'investigate', 'review code', and 'codebase'. However, it misses common user phrasings like 'delegate', 'background task', 'run in parallel', or 'autonomous agent', and 'subagent' is more technical jargon than a natural user term.

Distinctiveness / Conflict Risk (3/3): The description is clearly about delegating to a specific tool (Codex CLI) for autonomous execution, which is a distinct niche unlikely to conflict with other skills. The mention of 'subagent' and 'Codex CLI' makes it clearly distinguishable from general coding or code review skills.

Total: 10 / 12 (Passed)

Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-crafted, highly actionable skill with clear workflow sequencing and concrete executable commands throughout. Its main weakness is moderate verbosity — some sections explain things Claude could infer (model descriptions, complexity heuristics) and the overall length could benefit from splitting detailed reference material into separate files. The monitoring section with explicit DO/DON'T guidance and the conditional flag logic are particular strengths.

Suggestions

Trim model descriptions to just model names and one-word descriptors (e.g., 'gpt-5.2-codex - default', 'o3 - deep reasoning') since Claude doesn't need marketing-style descriptions.

Consider extracting the CTCO prompt template and monitoring commands into a referenced file to reduce the main skill's token footprint.

Conciseness (2/3): The skill is fairly well-structured but includes some unnecessary verbosity. The model descriptions ('Flagship model, best for complex professional tasks', etc.) and the detailed complexity assessment section add tokens that Claude could infer. The CTCO prompt template is somewhat redundant given Claude knows how to structure prompts. However, most content is operationally relevant.

Actionability (3/3): The skill provides concrete, executable bash commands throughout, from git state gathering, to mkdir, to the full codex exec command with heredoc syntax, to monitoring commands. Flag rules are specific and conditional. The output format template is copy-paste ready.
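The heredoc pattern mentioned above can be sketched roughly as follows. Here `cat` stands in for the actual Codex CLI invocation so the snippet runs without Codex installed, and the prompt wording is purely illustrative:

```shell
# Build the delegation prompt with a quoted heredoc (quoting the
# delimiter prevents variable expansion inside the body).
prompt=$(cat <<'EOF'
Context: repository is on branch main; tests in src/parser are failing.
Task: make the failing tests pass without changing the public API.
Output: write a short summary of the changes to .codex/summary.md.
EOF
)

# In the skill this prompt would be handed to the subagent, e.g.:
#   codex exec "$prompt"
echo "$prompt" | head -n 1
```

The quoted `'EOF'` delimiter is the usual way to keep literal `$` and backticks intact in a multi-line prompt.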

Workflow Clarity (3/3): The multi-step workflow is clearly sequenced: parse arguments → assess complexity → gather context → generate prompt → execute → monitor → return result. Validation checkpoints are present (check git repo status, check if summary exists before reading, token-efficient monitoring with explicit DO/DON'T rules). The background vs foreground decision tree and monitoring feedback loop are well-defined.
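The execute-then-monitor loop described above can be sketched as a runnable pattern. A subshell of `echo` commands stands in for the real delegated task, and the directory and file names are illustrative, not the skill's actual paths:

```shell
# Run the delegated task in the background, capturing all output to a
# log file instead of streaming it into the conversation.
mkdir -p .codex-runs
log=".codex-runs/run.log"
( echo "gathering context"; echo "applying changes"; echo "DONE" ) > "$log" 2>&1 &
job=$!

# Token-efficient monitoring: wait for completion, then inspect only
# the tail of the log rather than reading the whole file.
wait "$job"
tail -n 1 "$log"    # prints: DONE
```

Redirecting to a log and polling its tail is what keeps long-running background work from flooding the agent's context.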

Progressive Disclosure (2/3): The content is entirely self-contained in one file with no references to external documentation, which is acceptable for a skill of this complexity. However, at ~200+ lines, some sections (like the full CTCO prompt template, the model list, or the monitoring details) could be split into referenced files. The structure uses headers well but the document is on the edge of being a monolithic wall.

Total: 10 / 12 (Passed)

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 checks passed

Validation for skill structure

allowed_tools_field (Warning): 'allowed-tools' contains unusual tool name(s).

frontmatter_unknown_keys (Warning): Unknown frontmatter key(s) found; consider removing or moving to metadata.
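The frontmatter_unknown_keys warning would typically be addressed by relocating the extra keys. A hypothetical before/after sketch (the specific keys shown here are illustrative, not taken from this skill):

```yaml
# Before: unknown top-level keys trigger the warning
---
name: codex
description: ...
allowed-tools: Bash, Read
author: someone
version: 1.0
---

# After: unknown keys moved under metadata
---
name: codex
description: ...
allowed-tools: Bash, Read
metadata:
  author: someone
  version: 1.0
---
```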

Total: 9 / 11 (Passed)

Repository: NeverSight/skills_feed (Reviewed)

