
skill-arc-reactor

Build new Claude skills from scratch or supercharge existing ones through rigorous evaluation and iterative improvement. Use when the user wants to create, build, improve, evaluate, audit, enhance, benchmark, test, or package a skill. Also trigger for "turn this into a skill", "make this reusable", "I keep repeating this workflow", or references to SKILL.md, skill frontmatter, description optimization, or skill packaging. Do NOT use for general coding tasks, document creation, or other non-skill workflows. Even if the user just says "skill" in the context of Claude capabilities, this is likely the right skill to load.

94

Quality

92%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong, well-crafted description that excels across all dimensions. It provides specific capabilities, extensive natural trigger terms, explicit 'Use when' and 'Do NOT use' clauses, and clear boundaries that distinguish it from general coding or document skills. The inclusion of quoted user phrases like 'turn this into a skill' and 'I keep repeating this workflow' is particularly effective for matching real user intent.

Dimension | Reasoning | Score

Specificity

Lists multiple concrete actions: build new skills, improve existing ones through evaluation, iterative improvement, benchmarking, testing, packaging. Also specifies what NOT to use it for, adding further specificity.

3 / 3

Completeness

Clearly answers both 'what' (build new skills, improve existing ones through evaluation and iteration) and 'when' (explicit 'Use when...' clause with extensive trigger terms, plus a 'Do NOT use' clause for disambiguation). Both dimensions are thoroughly addressed.

3 / 3

Trigger Term Quality

Excellent coverage of natural trigger terms: 'create', 'build', 'improve', 'evaluate', 'audit', 'enhance', 'benchmark', 'test', 'package', 'turn this into a skill', 'make this reusable', 'I keep repeating this workflow', 'SKILL.md', 'skill frontmatter', 'description optimization', 'skill packaging'. These are terms users would naturally say.

3 / 3

Distinctiveness Conflict Risk

Highly distinctive with a clear niche around skill creation/improvement. The explicit 'Do NOT use for general coding tasks, document creation, or other non-skill workflows' clause actively reduces conflict risk with other skills.

3 / 3

Total: 12 / 12

Passed
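The Completeness and Trigger Term Quality rows credit the description for its explicit 'Use when' and 'Do NOT use' clauses and for its quoted user phrases. A minimal sketch of how such surface signals could be spotted is shown below. The helper name `describe_discovery_signals` is hypothetical, the description is abridged from the listing above, and this is not the reviewer's actual scoring method.

```python
import re

# Description abridged from the skill listing; only the clauses checked here are kept.
DESCRIPTION = (
    "Build new Claude skills from scratch or supercharge existing ones through "
    "rigorous evaluation and iterative improvement. Use when the user wants to "
    "create, build, improve, evaluate, audit, enhance, benchmark, test, or package "
    'a skill. Also trigger for "turn this into a skill", "make this reusable", '
    '"I keep repeating this workflow", or references to SKILL.md, skill frontmatter, '
    "description optimization, or skill packaging. Do NOT use for general coding "
    "tasks, document creation, or other non-skill workflows."
)


def describe_discovery_signals(description: str) -> dict:
    """Hypothetical helper: report simple, surface-level discovery signals."""
    return {
        # Explicit positive trigger clause.
        "has_use_when": "Use when" in description,
        # Explicit negative clause that narrows the skill's scope.
        "has_do_not_use": "Do NOT use" in description,
        # Phrases in double quotes, i.e. things a user might literally say.
        "quoted_phrases": re.findall(r'"([^"]+)"', description),
    }


signals = describe_discovery_signals(DESCRIPTION)
print(signals)
```

String matching like this only confirms that the clauses exist; judging whether they are specific enough still needs a reviewer or a model.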

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-crafted meta-skill with strong actionability and excellent progressive disclosure. The two-mode structure (Create/Enhance) is clearly delineated with concrete phases, validation checkpoints, and specific tooling references. The main weakness is moderate verbosity: some principles are repeated across sections, and the Improvement Philosophy section partially duplicates guidance already given in the phases. Even so, the content density is generally high enough to justify the length.

Dimension | Reasoning | Score

Conciseness

The skill is lengthy (~300+ lines) but most content is structural and actionable. Some sections could be tightened: the Improvement Philosophy section restates points already made earlier (e.g., description importance, gotchas), and some guidance, such as 'explain why over MUST', appears multiple times. However, it avoids explaining concepts Claude already knows and stays focused on novel workflow instructions.

2 / 3

Actionability

Highly actionable throughout: concrete CLI commands (python scripts/aggregate_benchmark.py, python scripts/package_skill.py), specific JSON schemas for eval files, exact file paths for references, clear phase-by-phase instructions with specific deliverables at each step. The test case design guidance includes concrete categories and a complete JSON example.

3 / 3

Workflow Clarity

Both Create and Enhance modes follow clearly numbered phases with explicit sequencing. Phase 5 (Run & Evaluate) has a particularly well-structured 5-step workflow with validation checkpoints ('Wait for user feedback before making changes'), feedback loops ('fix and re-validate'), and explicit ordering ('Launch everything at once', 'Don't wait idle'). The iterate phase includes clear stop conditions.

3 / 3

Progressive Disclosure

Excellent progressive disclosure: SKILL.md serves as the orchestration overview, with detailed content delegated to clearly-signaled reference files (references/skill-anatomy.md, references/writing-guide.md, etc.). The reference tables at the bottom provide clear navigation with 'When to read' guidance. Agent files, scripts, and hooks are each in their own well-organized tables. No deeply nested references.

3 / 3

Total: 11 / 12

Passed
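The totals in the Discovery and Implementation tables are the sums of the per-dimension scores, each out of 3. The short sketch below reproduces that arithmetic from the values on this page. The function name `tally` is illustrative, and the headline percentages are deliberately not derived, because the page does not say how they are computed (11 / 12 as a plain ratio is about 91.7%, not 85%).

```python
# Per-dimension scores transcribed from the two tables on this page; each is out of 3.
MAX_PER_DIMENSION = 3

discovery = {
    "Specificity": 3,
    "Completeness": 3,
    "Trigger Term Quality": 3,
    "Distinctiveness Conflict Risk": 3,
}

implementation = {
    "Conciseness": 2,
    "Actionability": 3,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 3,
}


def tally(scores: dict) -> tuple:
    """Return (points earned, points available) for one rubric table."""
    return sum(scores.values()), MAX_PER_DIMENSION * len(scores)


for name, table in (("Discovery", discovery), ("Implementation", implementation)):
    earned, available = tally(table)
    print(f"{name}: {earned} / {available}")
```

The only point lost across both tables is in Conciseness, which matches the verbosity noted in the Implementation summary.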

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.
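The page does not list the 11 validation checks. As an illustration only, the sketch below shows one kind of structural check: confirming that a SKILL.md file opens with a frontmatter block containing `name` and `description`. Treating those two fields as required is an assumption here, the function names are hypothetical, and the sample description is abridged.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse simple `key: value` pairs between leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at the top of the file
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter found
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # block was never closed, so treat it as missing


def check_skill(text: str) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    fields = parse_frontmatter(text)
    # Assumed required fields; the real validator's rules are not shown on the page.
    return [
        f"missing frontmatter field: {required}"
        for required in ("name", "description")
        if not fields.get(required)
    ]


SAMPLE = """---
name: skill-arc-reactor
description: Build new Claude skills from scratch or supercharge existing ones.
---
# Skill body goes here
"""

print(check_skill(SAMPLE))
print(check_skill("# A file with no frontmatter"))
```

A real validator would more likely use a full YAML parser; the line-based parsing here just keeps the sketch dependency-free.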

Repository
endor-matt/Arc-Reactor-Skill-Evaluator
Reviewed
