cekura-metric-improvement

Use when the user asks to "improve a metric", "run labs", "leave feedback on a metric", "add to labs", "fix metric accuracy", "review metric results", "find misaligned metrics", or "iterate on metric quality". Covers the metric improvement cycle, the feedback workflow, and the labs pipeline used to refine metric accuracy over time.

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is highly actionable with concrete endpoints, payloads, and well-sequenced validation checkpoints. Its main weaknesses are redundant coverage of the improvement cycle and an inline API reference that would benefit from being moved to a bundled reference file.

Suggestions

Consolidate the three overlapping descriptions of the improvement cycle ("The Labs Improvement Cycle", the Step 1-6 sections, and "Interactive Labs Simulation") into a single canonical sequence to remove redundancy.

Move the API Endpoints Reference table and JSON payload examples into a separate references/api-endpoints.md file, linking to it from the workflow steps to keep SKILL.md a lean overview.

The Cost Guard and 6+ feedback thresholds are valuable; surface them once near the top as a quick-reference checklist rather than burying them mid-workflow.
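As an illustration of what such a quick-reference checklist could look like, here is a minimal Python sketch of a pre-flight check built from the thresholds quoted in this review (6+ feedback instances, 20-30 call samples, approval required above 100 calls). The names `LabsRunPlan` and `preflight` are hypothetical, and the way each threshold is enforced is an assumption; the skill itself may apply them differently.

```python
from dataclasses import dataclass
from typing import List

# Threshold values quoted in the review. How the skill enforces them is assumed.
MIN_FEEDBACK_INSTANCES = 6       # "6+ feedback instances"
SAMPLE_RANGE = (20, 30)          # "20-30 call samples"
COST_GUARD_CALLS = 100           # ">100 cost guard" requiring approval


@dataclass
class LabsRunPlan:
    """Hypothetical description of a planned labs run."""
    feedback_instances: int
    sample_calls: int
    approved: bool = False


def preflight(plan: LabsRunPlan) -> List[str]:
    """Return a list of checklist issues; an empty list means the plan passes."""
    issues = []
    if plan.feedback_instances < MIN_FEEDBACK_INSTANCES:
        issues.append("fewer than 6 feedback instances")
    low, high = SAMPLE_RANGE
    if plan.sample_calls > COST_GUARD_CALLS:
        # Large runs are allowed only with explicit approval.
        if not plan.approved:
            issues.append("more than 100 calls requires approval")
    elif not low <= plan.sample_calls <= high:
        issues.append("sample size outside the 20-30 call range")
    return issues
```

Calling `preflight(LabsRunPlan(feedback_instances=6, sample_calls=25))` returns an empty list, while a plan with 3 feedback instances and 150 unapproved calls returns two issues.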

Dimension	Reasoning	Score
Conciseness	Mostly efficient and free of generic concept explanations, but the improvement cycle is described three times ("The Labs Improvement Cycle", the Step 1-6 walkthrough, and "Interactive Labs Simulation"), which is redundant and could be tightened.	2 / 3
Actionability	Provides concrete, executable guidance: exact API endpoints, JSON payloads with field types, parameter values (page_size, agent_id filters), and specific thresholds (6+ feedback instances, 20-30 call samples, >100 cost guard).	3 / 3
Workflow Clarity	The cycle is clearly sequenced (Steps 1-6) with explicit validation checkpoints (Step 5 regression checks, the >100-call Cost Guard requiring approval, and the validate-iterate feedback loop), satisfying the batch-operation validation requirement.	3 / 3
Progressive Disclosure	One clearly-signaled one-level reference exists (references/feedback-examples.md), but the substantial API Endpoints Reference table and JSON payloads are inline content that could be split into a separate reference file for a cleaner overview.	2 / 3
	Total	10 / 12 Passed

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is strong: it pairs a comprehensive set of natural trigger phrases with a concise statement of what the skill covers. Both the 'what' and 'when' are explicit and the niche is distinct.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions such as "improve a metric", "run labs", "leave feedback on a metric", "fix metric accuracy", and "review metric results" rather than vague language, matching the top anchor.	3 / 3
Completeness	Explicitly answers both halves: the "Use when the user asks to..." clause covers when, and "Covers the metric improvement cycle, the feedback workflow, and the labs pipeline..." covers what, with explicit triggers present.	3 / 3
Trigger Term Quality	Provides good coverage of natural phrases a user would actually say ("improve a metric", "run labs", "leave feedback on a metric", "find misaligned metrics", "iterate on metric quality"), not technical jargon.	3 / 3
Distinctiveness / Conflict Risk	Occupies a clear niche (Cekura metric improvement / labs feedback) with distinct triggers like "run labs" and "leave feedback on a metric" that are unlikely to fire for unrelated skills.	3 / 3
	Total	12 / 12 Passed

Validation

100%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 16 / 16 Passed

Validation for skill structure

No warnings or errors.

Repository: cekura-ai/cekura-skills
Commit: f0854af

Reviewed: about 23 hours ago
