improve-loop

Systematically review and improve every shell feature and builtin command. Iterates through each feature/command, runs code-review, fixes issues, and re-reviews until clean.

1.56x

Quality

—

Does it follow best practices?

Impact

91%

1.56x

Average score across 3 eval scenarios

Securityby

Advisory

Suggest reviewing before use

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill is highly actionable with a rigorous, gated multi-step workflow and explicit validation loops. Its weaknesses are verbosity (padding and repetition across ~540 lines) and a monolithic structure that would benefit from splitting the review-dimensions reference into a bundled file.

Suggestions

Move the A–K review-dimensions checklist into a bundled reference file (e.g. references/review-dimensions.md) and link to it from the body, improving both conciseness and progressive disclosure.

Collapse the repeated security/prompt-injection warnings into a single stated-once section; the duplicated agent-instruction paragraph inflates token cost.

Trim the 'STOP — READ THIS' framing and repeated 'Never skip steps' rationale to essentials — the gate-check protocol already enforces ordering.

Dimension	Reasoning	Score
Conciseness	The body is mostly efficient domain-specific guidance (RULES compliance, GTFOBins, review dimensions A–K) that Claude would not already know, but at ~540 lines it carries noticeable padding — repeated security warnings, 'STOP — READ THIS' framing, and redundant gate-check prose — so it is not lean enough for a 3.	2 / 3
Actionability	It provides exact, copy-paste-ready commands ('gh pr view ... --json', 'ls -d interp/builtins/*/'), concrete commit-message formats, output templates, and explicit agent-launch examples — fully executable guidance.	3 / 3
Workflow Clarity	Steps are strictly sequenced with TaskList gate checks between phases, validation feedback loops (run tests → fix → re-test, max 3 attempts), an explicit decision table in 2G, and a hard 50-batch limit, satisfying the clear-sequence-with-checkpoints anchor.	3 / 3
Progressive Disclosure	No bundle files exist in references/scripts/assets, and the ~540-line body is monolithic; the detailed review-dimensions block (A–K, ~100 lines) is well-organized but inline content that could be split into a reference file, matching the 'some structure but content that should be separate is inline' anchor.	2 / 3
	Total	10 / 12 Passed

Description

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific, third-person, and occupies a clear niche, but it lacks an explicit 'Use when...' trigger clause, which caps completeness and trigger-term quality. Adding a usage-trigger sentence would lift both dimensions.

Suggestions

Append an explicit trigger clause, e.g. 'Use when systematically reviewing and improving shell builtins or shell features on a PR, or when the user asks to harden/review shell command behavior.'

Include natural user phrasings ('hardening', 'bash compatibility', 'sandbox review') to broaden trigger-term coverage.

Keep the third-person voice but ensure the trigger sentence names the entry conditions (a PR or branch under review).

Dimension	Reasoning	Score
Specificity	Lists several concrete actions — 'review and improve', 'runs code-review', 'fixes issues', and 're-reviews until clean' — matching the multi-action anchor rather than the single-action level 2.	3 / 3
Completeness	It clearly states what the skill does, but the 'when to use' trigger is only implied (no 'Use when...' clause); per the guidelines a missing explicit trigger clause caps completeness at 2.	2 / 3
Trigger Term Quality	Terms like 'shell feature', 'builtin command', and 'code-review' are relevant but there is no explicit 'Use when...' trigger guidance and coverage of natural user phrasings is partial, so it sits at 'some relevant keywords' rather than full coverage.	2 / 3
Distinctiveness Conflict Risk	It targets a narrow niche — every shell feature and builtin command in a shell-interpreter codebase — with distinct triggers unlikely to collide with generic review skills.	3 / 3
	Total	10 / 12 Passed

Validation

87%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 14 / 16 Passed

Validation for skill structure

Criteria	Description	Result
skill_md_line_count	SKILL.md is long (552 lines); consider splitting into references/ and linking	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	14 / 16 Passed

Repository: DataDog/rshell
Commit: a0a1140

Reviewed: about 15 hours ago

Table of Contents

Discovery Implementation Validation

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.