Retrospective meta-skill for the `repo-visuals` skill. Reads accumulated evaluation logs from past runs, spots patterns (recurring low-score criteria, repeated iteration failures, unsatisfied requests), consults other expert skills (skill-creator, frontend-design, etc.) where relevant, and proposes concrete edits to `repo-visuals/SKILL.md` as a reviewable diff. Runs on-demand, not per run.
Overall score: 85

Quality: 81% (Does it follow best practices?)
Impact: Pending (No eval scenarios have been run)
Validation: Passed (No known issues)
Quality
Discovery
85%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-crafted description with strong specificity and distinctiveness, clearly scoped to a narrow meta-skill purpose. Its main weakness is the lack of natural user-facing trigger terms — the language is quite internal/technical, which could make it harder for Claude to match against user requests. The 'when' guidance exists but could be more explicit with a 'Use when...' clause.
Suggestions
Add a 'Use when...' clause with natural trigger phrases like 'Use when the user wants to review repo-visuals performance, improve the repo-visuals skill, or analyze past visualization run quality'.
Include more natural language trigger terms a user might say, such as 'improve skill', 'review past runs', 'skill quality analysis', or 'fix recurring issues'.
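Both suggestions are mechanical enough to self-check. As a hypothetical illustration (not part of the skill), a few lines of Python can flag which of the recommended trigger phrases a draft description still lacks; the phrase list below simply mirrors the suggestions above.

```python
# Hypothetical lint: which recommended trigger phrases does a draft
# description still lack? The phrase list mirrors the suggestions above.
RECOMMENDED_TRIGGERS = [
    "use when",
    "improve skill",
    "review past runs",
    "skill quality",
    "fix recurring issues",
]

def missing_triggers(description: str) -> list[str]:
    """Return the recommended phrases absent from the description."""
    text = description.lower()
    return [phrase for phrase in RECOMMENDED_TRIGGERS if phrase not in text]

draft = (
    "Retrospective meta-skill for the repo-visuals skill. "
    "Reads accumulated evaluation logs from past runs."
)
print(missing_triggers(draft))  # -> all five phrases, so the draft needs work
```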
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: reads evaluation logs, spots patterns (with examples like recurring low-score criteria, repeated iteration failures, unsatisfied requests), consults expert skills, and proposes concrete edits as a reviewable diff. | 3 / 3 |
| Completeness | Clearly answers 'what' (reads logs, spots patterns, consults skills, proposes edits) and 'when' ('Runs on-demand, not per run' and implicitly when reviewing/improving the repo-visuals skill). While the 'when' clause isn't a traditional 'Use when...' format, it does explicitly state the trigger condition. | 3 / 3 |
| Trigger Term Quality | Contains some relevant terms like 'retrospective', 'evaluation logs', 'repo-visuals', and 'SKILL.md', but these are fairly technical/internal. Missing natural user-facing trigger terms a user would say, such as 'improve skill', 'review past performance', or 'skill quality'. | 2 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive — it's a meta-skill specifically scoped to retrospective analysis of the `repo-visuals` skill, reading evaluation logs and proposing diffs to a specific SKILL.md file. Very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
Implementation
77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured meta-skill with a clear, sequenced workflow and good constraints (minimum sample size, approval gates, explicit non-goals). Its main weakness is the gap between descriptive guidance and executable specifics — the tabulation step, file operations, and aggregate summary format are described conceptually but lack concrete templates or commands. The progressive disclosure is adequate but could benefit from separating reference material.
Suggestions
Add a concrete template or markdown table format for the tabulation step (step 1), showing exactly what the score summary should look like.
Provide a concrete example of the retro summary entry format that gets appended to `./evaluations/index.md`, so the output format is unambiguous.
Include the specific file-move command or script snippet for the processed-runs archival step, rather than just describing it.
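The three suggestions above point at the same gap: the skill describes its bookkeeping steps without pinning them down. As one possible shape, here is a minimal Python sketch. It assumes a hypothetical layout where per-run logs are JSON files of `{criterion: score}` pairs under `./evaluations/runs/`; only `./evaluations/index.md` and the `processed/` folder are actually named in this review, and everything else (paths, JSON shape, the `min_runs` gate) is illustrative rather than specified by the skill.

```python
import json
import shutil
from datetime import date
from pathlib import Path

# Hypothetical layout -- only index.md and processed/ are named by the skill;
# the runs/ folder and the JSON shape of each log are assumptions.
RUNS = Path("./evaluations/runs")
PROCESSED = Path("./evaluations/processed")
INDEX = Path("./evaluations/index.md")

def tabulate(run_files):
    """Aggregate per-criterion scores across run logs into a markdown table."""
    totals = {}  # criterion -> [running sum, run count]
    for f in run_files:
        for criterion, score in json.loads(f.read_text()).items():
            bucket = totals.setdefault(criterion, [0, 0])
            bucket[0] += score
            bucket[1] += 1
    rows = ["| Criterion | Mean score | Runs |", "|---|---|---|"]
    for criterion, (total, count) in sorted(totals.items()):
        rows.append(f"| {criterion} | {total / count:.1f} | {count} |")
    return "\n".join(rows)

def run_retro(min_runs: int = 5):
    run_files = sorted(RUNS.glob("*.json"))
    if len(run_files) < min_runs:
        raise SystemExit(f"Only {len(run_files)} runs; need {min_runs} for a retro.")

    # Retro summary entry appended to the index -- the format is illustrative.
    entry = f"\n## Retro {date.today().isoformat()} ({len(run_files)} runs)\n\n"
    entry += tabulate(run_files) + "\n"
    with INDEX.open("a") as idx:
        idx.write(entry)

    # Archive processed run logs so the next retro starts from a clean slate.
    PROCESSED.mkdir(parents=True, exist_ok=True)
    for f in run_files:
        shutil.move(str(f), str(PROCESSED / f.name))

if __name__ == "__main__":
    run_retro()
```

Under these assumptions, the tabulation template and the index entry format fall out of the same sketch: the `rows` list is the score-summary table and `entry` is the appended retro summary, so the skill could lift either as a literal template even if the surrounding script is discarded.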
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is lean and efficient. Every section earns its place — no unnecessary explanations of what a retro is, no padding about meta-skills or evaluation theory. The examples are concrete and illustrative without being verbose. | 3 / 3 |
| Actionability | The workflow steps are clearly described and the diff-style proposal example is helpful, but there's no executable code — no scripts for tabulating scores, no concrete commands for moving files, no template for the aggregate summary. The guidance is specific in intent but relies on Claude inferring implementation details. | 2 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with logical progression (read → identify → consult → propose → apply). It includes an explicit approval gate before destructive edits, a feedback loop via expert skill consultation, and clear post-processing steps (moving files to processed/). The 'do not auto-apply' constraint is explicitly stated. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear sections, but everything is inline in a single file. For a skill of this complexity — with evaluation methodology, expert consultation patterns, and diff proposal formats — some content (e.g., the tabulation template, the retro summary format for index.md) could be split into referenced files. However, it's not egregiously monolithic. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
Validation
100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure: no warnings or errors.