Use when the user wants Codex to build, refine, test, or validate a CLI-Anything harness for a GUI application or source repository. Adapts the CLI-Anything methodology to Codex without changing the generated Python harness format.
74
62%
Does it follow best practices?
Impact
91%
1.75x average score across 3 eval scenarios
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./codex-skill/SKILL.md
Quality
Discovery
75%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description has a clear 'Use when...' clause and targets a distinctive niche, which are its main strengths. However, the specific actions (build, refine, test, validate) are somewhat generic, and the trigger terms rely heavily on the specialized term 'CLI-Anything' which users may not naturally use. Adding more natural language synonyms and concrete capability details would improve discoverability.
Suggestions
Add natural language trigger terms that users might actually say, such as 'command line wrapper', 'CLI wrapper for GUI', 'automate GUI testing', or 'headless automation'.
List more concrete actions/outputs, e.g., 'Generates a Python CLI harness that wraps GUI interactions, adds argument parsing, and includes test scaffolding for automated validation.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a specific domain ('CLI-Anything harness for a GUI application or source repository') and mentions actions like 'build, refine, test, or validate', but these actions are somewhat generic and don't describe concrete capabilities like specific outputs or transformations. | 2 / 3 |
| Completeness | The description explicitly answers both 'what' (build/refine/test/validate a CLI-Anything harness, adapts methodology to Codex) and 'when' ('Use when the user wants Codex to build, refine, test, or validate a CLI-Anything harness for a GUI application or source repository'). The 'Use when...' clause is present and explicit. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'CLI-Anything', 'GUI application', 'harness', 'Python harness', and 'Codex', but 'CLI-Anything' is a specialized/niche term that users may not naturally say. Missing common variations like 'command line wrapper', 'CLI wrapper', 'automate GUI', or 'headless testing'. | 2 / 3 |
| Distinctiveness Conflict Risk | The description targets a very specific niche (CLI-Anything harnesses for GUI applications within the Codex context), which is highly distinctive and unlikely to conflict with other skills. The combination of 'CLI-Anything', 'harness', and 'GUI application' creates a clear, unique trigger profile. | 3 / 3 |
| Total | | 10 / 12 Passed |
Implementation
50%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides a solid structural overview of the CLI-Anything methodology with clear modes of operation and a useful directory layout template. However, it falls short on actionability by lacking executable code examples (e.g., a Click CLI skeleton or sample setup.py), and the workflow lacks explicit validation checkpoints between steps. The content is reasonably concise but could be tightened in places where it offers generic software engineering advice.
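The summary above points out that the skill ships no Click CLI skeleton. A minimal sketch of one follows, assuming Click 8.x. It shows the three conventions the review asks for: a REPL when no subcommand is given, one sample subcommand, and a global `--json` flag. The `status` command, the `emit` and `run_repl` helpers, and the `in_repl` marker are hypothetical names for illustration, not part of the CLI-Anything harness format.

```python
import json
import shlex

import click


def emit(ctx: click.Context, data: dict) -> None:
    """Print data as JSON when --json is set, otherwise as 'key: value' lines."""
    if ctx.obj.get("json"):
        click.echo(json.dumps(data))
    else:
        for key, value in data.items():
            click.echo(f"{key}: {value}")


def run_repl(ctx: click.Context) -> None:
    """Minimal REPL: each line is parsed as if typed after the program name."""
    while True:
        try:
            line = input("> ")
        except EOFError:  # end of input ends the session
            break
        if line.strip() in ("exit", "quit"):
            break
        if not line.strip():
            continue
        try:
            args = shlex.split(line)
        except ValueError as exc:  # e.g. unbalanced quotes
            click.echo(f"Error: {exc}")
            continue
        try:
            # Re-enter the same group; in_repl stops a nested REPL from starting.
            cli.main(args, standalone_mode=False,
                     obj={"json": ctx.obj["json"], "in_repl": True})
        except click.ClickException as exc:
            click.echo(f"Error: {exc.format_message()}")


@click.group(invoke_without_command=True)
@click.option("--json", "as_json", is_flag=True, help="Emit machine-readable JSON.")
@click.pass_context
def cli(ctx: click.Context, as_json: bool) -> None:
    """Hypothetical harness entry point; with no subcommand it starts a REPL."""
    ctx.ensure_object(dict)
    ctx.obj["json"] = ctx.obj.get("json") or as_json
    if ctx.invoked_subcommand is None and not ctx.obj.get("in_repl"):
        run_repl(ctx)


@cli.command()
@click.pass_context
def status(ctx: click.Context) -> None:
    """Hypothetical sample subcommand reporting placeholder harness state."""
    emit(ctx, {"backend": "not-configured", "ok": True})


if __name__ == "__main__":
    cli()
```

Saved as a script, `python cli.py --json status` prints a JSON object, and running it with no arguments reads commands line by line until `quit` or end of input. Putting `--json` on the group rather than on each command keeps the flag consistent across subcommands; a real harness would also keep session state in `ctx.obj` instead of the placeholder values shown here.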
Suggestions
Add a concrete, executable Click CLI skeleton example (even minimal) showing the REPL default, a sample subcommand, and --json flag implementation to boost actionability.
Include a sample setup.py snippet with the console_scripts entry point and find_namespace_packages call, since these are specific and easy to get wrong.
Add explicit validation checkpoints in the 7-step workflow, e.g., 'After step 2, confirm architecture analysis with user before proceeding' and 'After step 5, verify all tests pass before updating docs.'
Provide a concrete example of a backend wrapper in utils/<software>_backend.py to make the Backend Rules section actionable rather than descriptive.
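The last suggestion asks for a concrete backend wrapper. Since the Backend Rules section is not quoted in this review, the sketch below assumes the rule is to call the real application's executable headlessly rather than reimplement its behavior. `find_executable`, `run_backend`, `BackendResult`, and `BackendNotFoundError` are hypothetical names, and the 120-second timeout is an arbitrary default.

```python
import shutil
import subprocess
from dataclasses import dataclass
from typing import Sequence


class BackendNotFoundError(RuntimeError):
    """Raised when none of the candidate executables is on PATH."""


@dataclass
class BackendResult:
    returncode: int
    stdout: str
    stderr: str


def find_executable(candidates: Sequence[str]) -> str:
    """Return the first candidate that shutil.which() can resolve."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    raise BackendNotFoundError(
        f"none of {list(candidates)} found on PATH; install the application first"
    )


def run_backend(candidates: Sequence[str], args: Sequence[str],
                timeout: float = 120.0) -> BackendResult:
    """Run the real application and capture its output as text."""
    exe = find_executable(candidates)
    # check=False: the caller decides how to report a non-zero exit code.
    proc = subprocess.run(
        [exe, *args], capture_output=True, text=True, timeout=timeout, check=False
    )
    return BackendResult(proc.returncode, proc.stdout, proc.stderr)
```

A LibreOffice harness, for instance, might call `run_backend(["libreoffice", "soffice"], ["--headless", "--convert-to", "pdf", "in.odt"])`. A missing executable surfaces as `BackendNotFoundError`, which a Click command can turn into a readable error instead of a traceback.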
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is mostly efficient and well-structured, but includes some sections that could be tightened; for example, the 'Refine' mode's bullet list of preferences is somewhat generic advice Claude already knows (prefer high-impact features, compose well with existing commands). The Backend Rules and Packaging Rules sections are appropriately concise. | 2 / 3 |
| Actionability | The skill provides a concrete directory structure and specific packaging commands (find_namespace_packages, console_scripts), but lacks executable code examples: no actual Click CLI skeleton, no sample setup.py, no example command implementation. Key details like how to wrap a backend executable are described abstractly rather than with copy-paste-ready code. | 2 / 3 |
| Workflow Clarity | The 7-step workflow is clearly sequenced and the Validate mode provides a checklist, but there are no explicit validation checkpoints between steps (e.g., verify analysis before designing, verify tests pass before updating docs). The workflow involves code generation and installation, and the missing feedback loops for error recovery cap this at 2. | 2 / 3 |
| Progressive Disclosure | The skill references `../cli-anything-plugin/HARNESS.md` as a deeper source of truth, which is good progressive disclosure. However, no bundle files are provided to support this reference, and the skill itself is somewhat monolithic: the modes (Build/Refine/Test/Validate) could benefit from being split or more clearly signaled as separate reference sections. The content is reasonably organized with headers but could be better structured. | 2 / 3 |
| Total | | 8 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
436a4f5