Systematically review and improve every shell feature and builtin command. Iterates through each feature/command, runs code-review, fixes issues, and re-reviews until clean.
Quality: 52%. Does it follow best practices?
Impact: 91%. 1.56x average score across 3 eval scenarios.
Advisory: suggest reviewing before use.
Optimize this skill with Tessl: `npx tessl skill review --optimize ./.claude/skills/improve-loop/SKILL.md`

Quality
Discovery
50%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description communicates a clear domain (shell features and builtins) and a defined workflow (iterative review and fix), but lacks explicit trigger guidance ('Use when...') and natural user-facing keywords. It would benefit from more specific actions and common terminology users would employ when requesting this kind of work.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to audit, review, or improve shell builtins, bash/zsh features, or shell script quality.'
Include natural trigger terms users would say, such as 'bash', 'zsh', 'shell script', 'audit builtins', 'lint shell code', or specific shell names.
List more concrete actions beyond the generic review loop, e.g., 'checks for POSIX compliance, identifies deprecated syntax, improves error handling in shell builtins.'
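Pulling these suggestions together, a revised frontmatter might look like the following sketch. The skill name is taken from the review command above; the description wording is illustrative, not the skill's actual frontmatter:

```yaml
# Hypothetical SKILL.md frontmatter sketch; description wording is illustrative.
name: improve-loop
description: >
  Systematically review and improve every shell feature and builtin command,
  iterating code-review, fixes, and re-review until clean. Use when the user
  wants to audit, lint, or improve bash/zsh features, shell builtins, or
  shell script quality; checks for POSIX compliance, deprecated syntax, and
  error handling in shell builtins.
```

A description in this shape covers all four rubric dimensions: concrete capabilities, an explicit 'Use when...' clause, and natural trigger terms.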
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (shell features and builtin commands) and describes a process (iterate, run code-review, fix issues, re-review), but the actions are somewhat generic and process-oriented rather than listing specific concrete capabilities. | 2 / 3 |
| Completeness | Describes what it does (review and improve shell features/builtins via iterative code review) but lacks an explicit 'Use when...' clause or equivalent trigger guidance, which caps this at 2 per the rubric guidelines. | 2 / 3 |
| Trigger Term Quality | Includes relevant terms like 'shell', 'builtin command', 'code-review', and 'review', but misses common user variations like 'bash', 'zsh', 'shell script', 'lint', 'audit', or specific shell names that users would naturally mention. | 2 / 3 |
| Distinctiveness / Conflict Risk | The focus on shell features and builtin commands provides some specificity, but 'code-review' and 'fixes issues' are broad enough to overlap with general code review or linting skills. The iterative review process adds some distinction but not enough to be clearly unique. | 2 / 3 |
| Total | | 8 / 12 (Passed) |
Implementation
55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is extremely thorough and actionable with an exemplary workflow structure including gate checks, parallel execution, feedback loops, and decision tables. However, it is severely over-long and monolithic — the entire content is crammed into a single file with no progressive disclosure, making it a massive token burden. The review criteria sections (A-K) alone constitute a reference manual that should be extracted into separate files.
Suggestions
Extract the review dimensions (A through K) into a separate REVIEW_CRITERIA.md file and reference it from the main skill, reducing the SKILL.md body by ~60%.
Extract the output format templates (findings format, PR comment templates, final summary format) into a TEMPLATES.md file to reduce inline verbosity.
Remove redundant reminders that appear multiple times (e.g., 'fix implementation not tests' appears in Steps 2C, 2D, 3B, and Important Rules) — state each rule once and reference it.
Trim explanatory text that Claude can infer, such as 'The randomized order ensures that each run of the improve loop covers targets in a different sequence, avoiding systematic bias toward alphabetically early targets' — the command `sort -R` is self-explanatory.
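The last suggestion's point about self-explanatory commands can be made concrete. The skill's randomized sweep compresses to roughly this shape; the target names below are illustrative, not the skill's actual target list:

```shell
#!/bin/sh
# Sketch of a randomized review sweep over shell builtins.
# Target names are illustrative; `sort -R` (GNU coreutils) shuffles the order
# so each run covers targets in a different sequence.
targets="alias cd export umask"

printf '%s\n' $targets | sort -R | while read -r target; do
  # In the real skill this step would launch a code-review pass on $target.
  echo "reviewing: $target"
done
```

A one-line comment like the one above carries the intent of `sort -R` without a paragraph of explanation, which is the kind of trimming the suggestion calls for.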
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~500+ lines. Massive amounts of detail that Claude could infer or that repeat themselves. The review dimensions alone (A through K) are exhaustive to the point of being a reference manual rather than a concise skill. Many sections restate the same rules (e.g., 'fix implementation not tests' appears multiple times). The step-by-step protocol with all its sub-steps, gate checks, and repeated completion checks adds significant token overhead. | 1 / 3 |
| Actionability | Highly actionable with concrete bash commands, exact tool invocations (gh pr comment, go test, gofmt), specific file paths, precise output formats, and copy-paste ready code blocks throughout. The review dimensions provide specific checks with exact function names and patterns to look for. | 3 / 3 |
| Workflow Clarity | Exceptionally clear multi-step workflow with explicit gating checks (TaskList verification before each step), numbered sub-steps, decision tables for branching logic, feedback loops (fix → test → retry up to 3 times), and a clear visual flow diagram. Validation checkpoints are explicit at every stage, including test runs after fixes and a full sweep re-review. | 3 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to external files for detailed content. The review dimensions (A-K) alone could be a separate REVIEW_CRITERIA.md file. The output format templates, the agent launch instructions, and the pentest checks could all be split into referenced files. Everything is inlined into one massive document with no bundle files to support it. | 1 / 3 |
| Total | | 8 / 12 (Passed) |
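Progressive disclosure here would mean replacing the inlined sections with short pointers to bundled files. A sketch of what the slimmed-down SKILL.md sections could look like, using the file names the review itself proposes (the `references/` directory and link syntax are assumptions):

```markdown
## Review criteria
See [REVIEW_CRITERIA.md](references/REVIEW_CRITERIA.md) for the full
dimension list (A through K) and the per-dimension checks.

## Output formats
See [TEMPLATES.md](references/TEMPLATES.md) for the findings format,
PR comment templates, and the final summary format.
```

Each pointer costs a few lines in the main skill body while keeping the detailed reference material loadable on demand.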
Validation
81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (552 lines); consider splitting into references/ and linking. | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata. | Warning |
| Total | | 9 / 11 (Passed) |