Retrospective meta-skill for the `repo-visuals` skill. Reads accumulated evaluation logs from past runs, spots patterns (recurring low-score criteria, repeated iteration failures, unsatisfied requests), consults other expert skills (skill-creator, frontend-design, etc.) where relevant, and proposes concrete edits to `repo-visuals/SKILL.md` as a reviewable diff. Runs on-demand, not per run.
68
81%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
85%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-crafted description with strong specificity and distinctiveness, clearly carving out a unique niche as a meta-skill for improving repo-visuals. The main weakness is in trigger term quality—the language is quite technical and internal-facing, which could make it harder for users to naturally invoke. The 'when' guidance exists but could be more explicit with a 'Use when...' clause.
Suggestions
Add a 'Use when...' clause with natural trigger terms like 'improve repo-visuals skill', 'review past visualization performance', 'analyze skill evaluation history', or 'skill retrospective'.
Include more user-facing synonyms or phrases that would naturally trigger selection, such as 'skill improvement', 'quality review', or 'performance analysis'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: reads evaluation logs, spots patterns (with examples like recurring low-score criteria, repeated iteration failures, unsatisfied requests), consults expert skills, and proposes concrete edits as a reviewable diff. | 3 / 3 |
| Completeness | Clearly answers 'what' (reads logs, spots patterns, consults skills, proposes edits as diffs) and 'when' ('Runs on-demand, not per run' and implicitly when reviewing/improving the repo-visuals skill). While the 'when' clause isn't a traditional 'Use when...' format, it does explicitly state the trigger condition. | 3 / 3 |
| Trigger Term Quality | Contains some relevant terms like 'retrospective', 'evaluation logs', 'repo-visuals', and 'SKILL.md', but these are fairly technical/internal. Missing natural user-facing trigger terms a user would actually say, such as 'improve skill', 'review past performance', or 'skill quality'. | 2 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive as a meta-skill specifically targeting `repo-visuals` skill improvement via evaluation log analysis. The narrow scope (retrospective analysis of one specific skill's logs to propose SKILL.md edits) makes it very unlikely to conflict with other skills. | 3 / 3 |
| Total | Passed | 11 / 12 |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured meta-skill with a clear workflow, good constraints (minimum 5 runs, user approval gate), and efficient prose. Its main weakness is that the actionability could be stronger — the steps describe what to do conceptually but lack concrete formats, commands, or executable examples for key operations like tabulation and skill consultation. The progressive disclosure is adequate for the content length but could benefit from supporting bundle files.
Suggestions
Add a concrete example of the tabulation output format (e.g., a markdown table with criterion names, averages, variance columns) so Claude knows exactly what to produce in step 1.
Provide a concrete example of how to invoke/consult another skill, rather than just describing it abstractly — e.g., show the actual prompt or interaction pattern to use with `frontend-design`.
Include a template or example for the retro summary that gets appended to `./evaluations/index.md` so the output format is unambiguous.
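As a rough illustration of the first suggestion, the tabulation step could be sketched as below. This is only a sketch under stated assumptions: the on-disk format of the run logs is not shown in this review, so each run is assumed to be already parsed into a mapping from criterion name to numeric score, and the function name `tabulate`, the `MIN_RUNS` constant, and the criterion names in the usage note are hypothetical.

```python
from statistics import mean, pvariance

# Assumption: the review mentions a five-run minimum; the exact rule lives in the skill itself.
MIN_RUNS = 5


def tabulate(runs: list[dict[str, float]]) -> str:
    """Return a markdown table of per-criterion mean and variance, lowest mean first.

    `runs` is a hypothetical in-memory form of the evaluation logs:
    one dict per run, mapping criterion name to score.
    """
    if len(runs) < MIN_RUNS:
        raise ValueError(f"need at least {MIN_RUNS} runs, got {len(runs)}")

    criteria = sorted({name for run in runs for name in run})
    rows = []
    for name in criteria:
        # A criterion may be missing from some runs; average only where it was scored.
        scores = [run[name] for run in runs if name in run]
        rows.append((name, mean(scores), pvariance(scores), len(scores)))

    # Weakest criteria first, so recurring low scores are visible at the top.
    rows.sort(key=lambda row: row[1])

    lines = ["| Criterion | Mean | Variance | Runs |", "|---|---|---|---|"]
    for name, avg, var, count in rows:
        lines.append(f"| {name} | {avg:.2f} | {var:.2f} | {count} |")
    return "\n".join(lines)
```

Calling `tabulate` on five or more parsed runs, for instance runs scored on made-up criteria such as `legibility` and `accuracy`, yields a table in the same pipe style as the score tables on this page. Population variance is used here because the runs are treated as the full set under review; sample variance would be an equally defensible choice.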
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is lean and efficient. It doesn't explain what a retro is or how diffs work; it assumes Claude's competence. Every section earns its place: when to invoke, inputs, workflow steps, outputs, and explicit non-goals. No padding or unnecessary context. | 3 / 3 |
| Actionability | The workflow steps are clearly described and the diff-style proposal example is helpful, but the guidance is largely procedural prose rather than executable code/commands. Steps like 'Read all inputs. Build a small table' and 'consult the relevant expert skill(s)' lack concrete implementation details: no specific file-reading commands, no example table format, no example of how to invoke another skill. | 2 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with logical progression (read → identify → consult → propose → apply). It includes an explicit approval gate before applying edits (step 4: 'Wait for user approval before applying'), a post-apply bookkeeping step (moving processed files), and clear separation of concerns. The non-destructive checkpoint before editing is well-handled. | 3 / 3 |
| Progressive Disclosure | The content is well-organized with clear sections and references to external files (evaluations/index.md, runs/*.md, sibling SKILL.md), but there are no bundle files to support the references. The skill references other skills (frontend-design, skill-creator) and file paths but everything is inline in a single document. For a meta-skill of this complexity, separating the tabulation format or example retro output into a reference file would improve navigation. | 2 / 3 |
| Total | Passed | 10 / 12 |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
7f66ce9