skill-refactor

Scan Codex session history for skill failures, usage patterns, and coverage gaps. Use when the user wants daily skill-health monitoring or evidence-backed recommendations about installing, improving, merging, or pruning skills.

Quality

81%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Quality

Content

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill provides a well-structured workflow with strong validation and failure-handling sections, making it reliable for guiding Claude through a multi-step analysis process. However, it lacks concrete executable examples (no sample command invocations, no example output tables, and no inline schema snippets), which limits immediate actionability. There is also moderate redundancy across the Constraints, Anti-patterns, Gotchas, and Failure mode sections.

Suggestions

Add a concrete example of the 'keep/improve/merge/retire action table' output format so Claude knows exactly what to produce, including the schema_version field.

Include at least one executable command example showing how to invoke the referenced scripts (e.g., `python scripts/scan_codex_sessions.py --scope=last-week --output=findings.json`).

Consolidate 'Anti-patterns', 'Gotchas', and overlapping 'Constraints' entries into a single 'Guardrails' section to reduce redundancy and save tokens.

Inline a minimal snippet from contract.yaml showing the evidence schema structure, so Claude doesn't need to read the reference file for basic usage.
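
To make the first suggestion concrete, here is a minimal sketch of what an action-table example could look like. Only the four actions (keep, improve, merge, retire) and the existence of a `schema_version` field come from this review; the column names, the version value, the skill names, and the evidence strings are hypothetical placeholders, since the skill's contract.yaml is not part of the reviewed bundle.

```python
from dataclasses import dataclass

# The four actions named in the review; everything else here is illustrative.
ACTIONS = ("keep", "improve", "merge", "retire")


@dataclass
class SkillAction:
    skill: str     # hypothetical skill name
    action: str    # must be one of ACTIONS
    evidence: str  # hypothetical pointer to the session evidence behind the call

    def __post_init__(self):
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")


def render_table(rows, schema_version="1"):
    """Render rows as a pipe table preceded by a schema_version line.

    The "1" default is an assumed placeholder, not the skill's real version.
    """
    lines = [
        f"schema_version: {schema_version}",
        "",
        "| Skill | Action | Evidence |",
        "| --- | --- | --- |",
    ]
    lines += [f"| {r.skill} | {r.action} | {r.evidence} |" for r in rows]
    return "\n".join(lines)


rows = [
    SkillAction("example-skill-a", "keep", "placeholder: no failures observed"),
    SkillAction("example-skill-b", "improve", "placeholder: repeated failure"),
]
print(render_table(rows))
```

Rejecting unknown actions at construction time mirrors the review's emphasis on not inventing evidence: a row either fits the agreed vocabulary or fails loudly.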

Dimension	Reasoning	Score
Conciseness	The skill is reasonably efficient but includes some sections that could be tightened—'Philosophy' bullets are somewhat generic, 'Examples' are natural-language prompts rather than concrete demonstrations, and 'Gotchas' partially duplicates 'Anti-patterns' and 'Constraints'. Some redundancy between sections (e.g., 'Do not invent evidence' appears in both Constraints and Gotchas).	2 / 3
Actionability	The procedure provides a clear sequence of steps and references specific scripts, but there are no executable code snippets, command-line invocations, or concrete output examples. The deliverables describe what to produce but don't show a sample action table or structured output format. The referenced scripts and contract.yaml could fill this gap but aren't provided in the bundle.	2 / 3
Workflow Clarity	The six-step procedure is clearly sequenced with explicit validation steps (Section 'Validation' with four concrete checks), a fail-fast policy for missing evidence, and anti-patterns that serve as guardrails. The feedback loop of 'stop and report the exact gap' for missing sources is explicit. The workflow handles scope ambiguity with a clarification gate.	3 / 3
Progressive Disclosure	The skill references external files well (contract.yaml, session-evidence-workflow.md, two Python scripts) with clear 'Read when' signals. However, no bundle files were provided, so we cannot verify these references resolve. The asset references (PNG files) at the end appear to be a flat list rather than well-organized navigation. Some content that could be in references (e.g., the full anti-patterns and gotchas lists) is inline.	2 / 3
	Total	9 / 12 Passed

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description that clearly defines a specific niche (skill health monitoring and management), provides concrete actions, and includes an explicit 'Use when' clause with natural trigger terms. It uses proper third-person voice and is concise without being vague. The description effectively distinguishes itself from other potential skills through its meta-level focus on skill lifecycle management.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: 'scan session history for skill failures, usage patterns, and coverage gaps' plus 'installing, improving, merging, or pruning skills'. These are concrete, actionable capabilities.	3 / 3
Completeness	Clearly answers both what ('Scan Codex session history for skill failures, usage patterns, and coverage gaps') and when ('Use when the user wants daily skill-health monitoring or evidence-backed recommendations about installing, improving, merging, or pruning skills').	3 / 3
Trigger Term Quality	Includes natural keywords users would say: 'skill failures', 'usage patterns', 'coverage gaps', 'skill-health monitoring', 'installing', 'improving', 'merging', 'pruning skills', 'session history'. These are terms a user managing skills would naturally use.	3 / 3
Distinctiveness / Conflict Risk	Highly distinctive niche focused on meta-level skill management and health monitoring via Codex session history. Unlikely to conflict with other skills due to its specific focus on skill lifecycle management and session analysis.	3 / 3
	Total	12 / 12 Passed

Validation

90%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
metadata_version	'metadata.version' is missing	Warning
	Total	10 / 11 Passed
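
The percentages on this page are consistent with one simple reading of the rubric, sketched below. This is an inference from the displayed numbers, not a documented formula. It assumes that each 1-3 dimension score is rescaled to the 0-1 range, that Quality is the mean of Content and Description, and that percentages are truncated rather than rounded. The function names are hypothetical.

```python
import math


def rubric_fraction(scores, lo=1, hi=3):
    """Rescale each score from [lo, hi] to [0, 1] and return the mean."""
    return sum((s - lo) / (hi - lo) for s in scores) / len(scores)


def as_percent(fraction):
    """Truncate a 0-1 fraction to a whole percent (assumed display rule)."""
    return math.floor(fraction * 100)


content = rubric_fraction([2, 2, 3, 2])      # Content dimension scores
description = rubric_fraction([3, 3, 3, 3])  # Description dimension scores
quality = (content + description) / 2        # assumed: mean of the two
validation = 10 / 11                         # checks passed / checks run

print(as_percent(content), as_percent(description),
      as_percent(quality), as_percent(validation))
# prints: 62 100 81 90
```

Under this reading, the 9 / 12 raw total and the 62% Content figure do not conflict: the raw total counts points, while the percentage measures how far each score sits above the minimum of 1.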

Repository: jscraik/Agent-Skills
Commit: 8e7e19d

Reviewed: 5 days ago
