
codex

Use when the user asks to run Codex CLI (codex exec, codex resume) or references OpenAI Codex for code analysis, refactoring, or automated editing. Uses GPT-5.2 by default for state-of-the-art software engineering.

Install with Tessl CLI

npx tessl i github:softaworks/agent-toolkit --skill codex
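The skill wraps the `codex exec` and `codex resume` subcommands named in the description above. A minimal usage sketch, assuming the standard Codex CLI invocation style (the prompt text below is illustrative, not taken from the skill itself):

```shell
# Run a one-shot, non-interactive Codex task against the current repository
codex exec "explain the failing test and propose a fix"

# Resume the most recent Codex session to continue a previous task
codex resume
```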

Overall score: 84%

Does it follow best practices?

Validation for skill structure


Discovery: 90%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-structured description with strong trigger terms and explicit 'Use when' guidance. The main weakness is that the capability descriptions are somewhat generic (code analysis, refactoring, automated editing) rather than listing specific concrete actions the skill can perform. The mention of GPT-5.2 adds context but doesn't clarify capabilities.

Suggestions

Add more specific concrete actions beyond 'code analysis, refactoring, or automated editing', e.g. 'generate code patches, apply multi-file edits, explain code changes'

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (Codex CLI) and some actions (code analysis, refactoring, automated editing), but doesn't list comprehensive concrete actions. The capabilities are somewhat vague: 'code analysis' and 'automated editing' are broad categories rather than specific operations. | 2 / 3 |
| Completeness | Explicitly answers both what (run Codex CLI for code analysis, refactoring, automated editing) and when (starts with 'Use when the user asks to run Codex CLI...'). Has clear explicit trigger guidance at the beginning. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'codex exec', 'codex resume', 'OpenAI Codex', 'code analysis', 'refactoring', 'automated editing'. Good coverage of both command-specific and task-oriented keywords. | 3 / 3 |
| Distinctiveness / Conflict Risk | Very distinct niche: specifically targets Codex CLI commands ('codex exec', 'codex resume') and OpenAI Codex references. Unlikely to conflict with general coding skills due to the specific tool focus and command names. | 3 / 3 |

Total: 11 / 12 (Passed)

Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, actionable skill with clear workflows and good use of tables for quick reference. The main weaknesses are minor redundancy in the model information and the monolithic structure that could benefit from splitting reference material. The explicit permission checks and error handling demonstrate good safety practices.

Suggestions

Remove duplicate model entries (gpt-5.2 and gpt-5.2-max appear to have identical stats) or clarify their differences

Consider moving the detailed Model Options table to a separate MODELS.md reference file to keep the main skill leaner

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Generally efficient but includes some redundancy (the model table has duplicate gpt-5.2 and gpt-5.2-max entries with the same stats, and the CLI version section repeats default model info). The quick reference table is well-structured, but some explanatory text could be tighter. | 2 / 3 |
| Actionability | Provides concrete, copy-paste ready commands with specific flags, clear syntax for resume operations, and a practical quick reference table. The command patterns are executable and complete. | 3 / 3 |
| Workflow Clarity | Clear numbered sequence for running tasks with explicit validation checkpoints (ask user for reasoning effort, confirm permissions for high-impact flags, summarize outcomes). Error handling section provides explicit feedback loops for failures and partial results. | 3 / 3 |
| Progressive Disclosure | Content is well-organized with clear sections and tables, but everything is inline in a single file. The model options table and reasoning effort levels could be split to a reference file for a cleaner overview, though the current length (~80 lines) is borderline acceptable. | 2 / 3 |

Total: 10 / 12 (Passed)

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 13 / 16 passed

| Criteria | Description | Result |
| --- | --- | --- |
| metadata_version | 'metadata' field is not a dictionary | Warning |
| license_field | 'license' field is missing | Warning |
| body_examples | No examples detected (no code fences and no 'Example' wording) | Warning |

Total: 13 / 16 (Passed)

