cekura-metric-improvement

Use when the user asks to "improve a metric", "run labs", "leave feedback on a metric", "add to labs", "fix metric accuracy", "review metric results", "find misaligned metrics", or "iterate on metric quality". Covers the metric improvement cycle, the feedback workflow, and the labs pipeline used to refine metric accuracy over time.

68

Quality

83%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description excels at trigger term coverage and completeness, with a strong 'Use when...' clause containing many natural user phrases. Its main weakness is that the 'what it does' portion is somewhat abstract — it describes the domain ('metric improvement cycle', 'feedback workflow', 'labs pipeline') without listing the specific concrete actions the skill performs. Adding more granular action verbs would strengthen specificity.

Suggestions

Replace abstract phrases like 'covers the metric improvement cycle' with specific concrete actions, e.g., 'Creates lab runs, annotates metric outputs, compares metric versions, and surfaces misaligned results.'

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description names a domain (metric improvement/labs pipeline) and mentions some actions like 'feedback workflow' and 'labs pipeline', but the actual concrete actions are vague: 'covers the metric improvement cycle' is abstract rather than listing specific steps like 'create lab runs', 'annotate metric outputs', or 'compare metric versions'. | 2 / 3 |
| Completeness | The description explicitly answers both 'when' (via the 'Use when...' clause with multiple trigger phrases) and 'what' (covers the metric improvement cycle, feedback workflow, and labs pipeline). Both dimensions are clearly addressed. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms: 'improve a metric', 'run labs', 'leave feedback on a metric', 'add to labs', 'fix metric accuracy', 'review metric results', 'find misaligned metrics', 'iterate on metric quality'. These are varied and represent phrases users would naturally say. | 3 / 3 |
| Distinctiveness Conflict Risk | The description targets a very specific niche: metric improvement cycles, labs pipelines, and feedback workflows for metric accuracy. The trigger terms like 'run labs', 'metric accuracy', and 'misaligned metrics' are highly distinctive and unlikely to conflict with other skills. | 3 / 3 |
| Total | Passed | 11 / 12 |

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, highly actionable skill that clearly guides the metric improvement lifecycle with concrete API endpoints, validation checkpoints, and cost safeguards. Its main weakness is that the API reference section is lengthy and inline rather than in a separate file, and some sections contain slightly verbose explanations that Claude wouldn't need. The workflow clarity is excellent with proper feedback loops and explicit decision gates.

Suggestions

Move the API Endpoints Reference section to a separate file (e.g., references/api-endpoints.md) and link to it from the main skill to improve progressive disclosure and reduce inline bulk.

Trim the 'Good Feedback Patterns' / 'Bad Feedback Patterns' section — Claude understands what constitutes good feedback; a single concise example of each would suffice.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is mostly efficient and covers a complex multi-step workflow, but includes some unnecessary explanatory text (e.g., the 'Good Feedback Patterns' vs 'Bad Feedback Patterns' section explains things Claude already understands about giving good feedback). Some sections like the interactive simulation could be tightened. | 2 / 3 |
| Actionability | Provides concrete API endpoints with exact paths, JSON request/response payloads, specific thresholds (6+ feedback instances, >100 calls cost guard), and clear step-by-step instructions. The API reference table and example payloads are copy-paste ready. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced across 6 numbered steps with explicit validation checkpoints (Step 5 validates changes, cost guard at >100 calls requires confirmation, 'manual fix first then labs' prioritization). Includes feedback loops (if validation fails, leave additional feedback and iterate) and clear decision points. | 3 / 3 |
| Progressive Disclosure | The skill references a bundle file (`references/feedback-examples.md`) and cross-references other skills (cekura-metric-design, cekura-eval-design), which is good. However, the API reference section is quite long and inline; it could be split into a separate reference file. The bundle file referenced doesn't actually exist in the provided bundle, which is a minor issue. | 2 / 3 |
| Total | Passed | 10 / 12 |
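
The decision gates that the review credits (a feedback threshold and a cost guard) can be sketched roughly as follows. This is only an illustration of the kind of logic described, not the skill's actual implementation: the names `LabRunPlan` and `next_action` and the returned action strings are hypothetical, and treating "6+ feedback instances" as a prerequisite for running labs is an assumption, since the review quotes the threshold without saying what it gates.

```python
from dataclasses import dataclass

# Thresholds quoted in the review; what the feedback threshold gates is assumed.
MIN_FEEDBACK_INSTANCES = 6   # "6+ feedback instances"
COST_GUARD_CALLS = 100       # ">100 calls" requires confirmation


@dataclass
class LabRunPlan:
    """Hypothetical summary of a pending labs run."""
    feedback_count: int          # feedback instances collected for the metric
    call_count: int              # calls the run would evaluate
    user_confirmed: bool = False # whether the user approved a large run


def next_action(plan: LabRunPlan) -> str:
    """Return the next step under the assumed gating rules."""
    # Assumed gate: do not run labs until enough feedback exists.
    if plan.feedback_count < MIN_FEEDBACK_INSTANCES:
        return "collect_more_feedback"
    # Cost guard: runs over the call limit need explicit confirmation.
    if plan.call_count > COST_GUARD_CALLS and not plan.user_confirmed:
        return "ask_for_confirmation"
    return "run_labs"


if __name__ == "__main__":
    print(next_action(LabRunPlan(feedback_count=8, call_count=250)))
```

Keeping the two thresholds as named constants makes them easy to audit against the skill text if the documented values change.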

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
cekura-ai/cekura-skills
Reviewed
