
cekura-metric-improvement

Use when the user asks to "improve a metric", "run labs", "leave feedback on a metric", "add to labs", "fix metric accuracy", "review metric results", "find misaligned metrics", or "iterate on metric quality". Covers the metric improvement cycle, the feedback workflow, and the labs pipeline used to refine metric accuracy over time.

68

Quality

83%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use


Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, highly actionable skill that clearly guides the metric improvement lifecycle with concrete API calls, validation checkpoints, and cost safeguards. Its main weakness is that it's somewhat long — the API reference section and some explanatory content (good/bad feedback patterns) could be offloaded to reference files to improve conciseness and progressive disclosure. The 'Manual Fix First' prioritization and cost guard sections are excellent additions that show domain expertise.

Suggestions

Move the API Endpoints Reference table and JSON payload examples into a separate reference file (e.g., references/api-endpoints.md) and link to it from the main skill to improve progressive disclosure and conciseness.

Trim the 'Good Feedback Patterns' / 'Bad Feedback Patterns' section — Claude understands what makes feedback good; replace with 2-3 terse examples rather than explaining the reasoning behind each pattern.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is mostly efficient and covers a complex multi-step workflow, but includes some unnecessary explanatory text (e.g., the 'Good Feedback Patterns' vs 'Bad Feedback Patterns' section explains things Claude already understands about giving good feedback). Some sections, like the interactive simulation, could be tightened. | 2 / 3 |
| Actionability | Provides concrete API endpoints with JSON payloads, specific step-by-step workflows, exact parameter names, and copy-paste ready request bodies. The process_feedbacks and create_from_call_log examples include actual JSON schemas with important gotchas (e.g., metrics must be an array of objects, not bare IDs). | 3 / 3 |
| Workflow Clarity | The 6-step cycle is clearly sequenced with explicit validation checkpoints (Step 5 validates changes, checks for regression), feedback loops (if validation fails, leave additional feedback and iterate), and a cost guard that prevents destructive bulk operations without confirmation. The 'Manual Fix First, Then Labs' section adds an important pre-workflow gate. | 3 / 3 |
| Progressive Disclosure | The skill references a bundle file 'references/feedback-examples.md' and cross-references other skills (cekura-metric-design, cekura-eval-design), which is good. However, the API endpoints reference table and detailed JSON examples are inline rather than in a separate reference file, making the main skill longer than necessary. The referenced feedback-examples.md is not provided in the bundle. | 2 / 3 |

Total: 10 / 12 (Passed)
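
The Actionability reasoning above mentions one gotcha: metrics must be an array of objects, not bare IDs. As a rough illustration of what a pre-flight check for that shape could look like, here is a minimal sketch. The `id` key and the `normalize_metrics` helper are hypothetical names chosen for this example; the skill's actual request schema is not shown on this page and may differ.

```python
from typing import Any


def normalize_metrics(metrics: list[Any]) -> list[dict[str, Any]]:
    """Coerce bare metric IDs into objects and pass existing objects through.

    ASSUMPTION: each metric object carries its identifier under an "id" key.
    That key name is hypothetical; check the skill's own JSON examples.
    """
    normalized: list[dict[str, Any]] = []
    for item in metrics:
        if isinstance(item, dict):
            if "id" not in item:
                raise ValueError(f"metric object is missing 'id': {item!r}")
            normalized.append(item)
        elif isinstance(item, int) and not isinstance(item, bool):
            # A bare ID is wrapped in an object instead of being sent as-is.
            normalized.append({"id": item})
        else:
            raise TypeError(f"unsupported metric entry: {item!r}")
    return normalized


# Mixed input: one bare ID and one already well-formed object.
payload_metrics = normalize_metrics([12, {"id": 7}])
```

Running a check like this before building a request body would catch the bare-ID mistake locally rather than at the API.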

Description

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description excels at trigger term coverage and completeness, with a strong 'Use when...' clause containing many natural user phrases. Its main weakness is that the 'what it does' portion relies on somewhat abstract concepts ('metric improvement cycle', 'feedback workflow', 'labs pipeline') rather than listing concrete actions the skill performs. The domain specificity makes it highly distinctive.

Suggestions

Replace abstract phrases like 'covers the metric improvement cycle' with concrete actions such as 'analyzes metric results, identifies misaligned examples, generates feedback, and queues lab runs to refine scoring accuracy'.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description names a domain (metric improvement/labs pipeline) and mentions some actions like 'improve a metric', 'run labs', 'leave feedback', but the actual capabilities are described in abstract terms ('metric improvement cycle', 'feedback workflow', 'labs pipeline') rather than listing concrete specific actions the skill performs. | 2 / 3 |
| Completeness | The description explicitly answers both 'when' (via the 'Use when...' clause with multiple trigger phrases) and 'what' (covers the metric improvement cycle, feedback workflow, and labs pipeline for refining metric accuracy). Both dimensions are clearly addressed. | 3 / 3 |
| Trigger Term Quality | The description includes a rich set of natural trigger phrases that users would actually say: 'improve a metric', 'run labs', 'leave feedback on a metric', 'add to labs', 'fix metric accuracy', 'review metric results', 'find misaligned metrics', 'iterate on metric quality'. These are varied and cover multiple natural phrasings. | 3 / 3 |
| Distinctiveness Conflict Risk | The description targets a very specific niche: metric improvement, labs pipeline, and feedback workflows for metric accuracy. The trigger terms like 'run labs', 'metric accuracy', and 'labs pipeline' are highly domain-specific and unlikely to conflict with other skills. | 3 / 3 |

Total: 11 / 12 (Passed)

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: cekura-ai/cekura-skills (Reviewed)
