cekura-predefined-metrics

Use when the user asks "what predefined metrics are available", "which built-in metrics should I use", "what does CSAT measure", "how does hallucination detection work", "what's the difference between Interruption Score and AI Interrupting User", "which metrics are free", "which metrics need audio", "configure silence threshold", "set up sentiment metric", or any question about Cekura's out-of-the-box metrics. Covers the full catalog of predefined metrics — what each does, costs, constraints, configuration options, and when to use each one.

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Quality

Content

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is a well-structured, actionable catalog with a clear validated workflow and good progressive disclosure into three reference files. Its main weakness is redundancy: the cost quick-reference, key constraints, and enabling-metrics sections restate catalog content and could be condensed.

Suggestions

Collapse the 'Cost & Credits Quick Reference' table or the 'Key Constraints' section into references to the catalog tables (e.g., 'See the Notes and Cost columns above'), since both largely duplicate inline content.

Remove or merge the standalone 'Enabling Predefined Metrics' section — its two-step requirement is already covered as steps 3-4 of 'The Predefined Metrics Workflow' and risks contradicting or drifting from it.

Trim repeated caveats (e.g., the 'main agent'/'testing agent' naming rule appears in both the Expected Outcome row and Common Pitfalls); state once and cross-reference.

Dimension	Reasoning	Score
Conciseness	Mostly efficient domain reference, but the 'Cost & Credits Quick Reference', 'Key Constraints', and the standalone 'Enabling Predefined Metrics' two-step block largely restate information already present in the catalog tables and workflow, so the body could be tightened. It is not score 1 because it never explains generic concepts Claude already knows and is not padded with fluff.	2 / 3
Actionability	Provides concrete config keys, types, defaults, and examples (dropoff_nodes arrays, pronunciation_words phoneme pairs), an exact API endpoint, and specific numeric thresholds (10s silence, 2000ms latency, 0.4-0.6 talk ratio, 70+ score cutoffs). For an instruction/catalog skill this is copy-paste-ready, actionable guidance.	3 / 3
Workflow Clarity	The six-step 'Predefined Metrics Workflow' is clearly sequenced and includes an explicit validation checkpoint ('Validate by running — Execute a small batch and review results') with a feedback loop to the Common Pitfalls section. The two-step enabling requirement and 'missing either means the metric never fires' add a clear checkpoint.	3 / 3
Progressive Disclosure	SKILL.md serves as an overview with three real, verified one-level-deep reference files (configuration-guide.md, api-reference.md, selection-by-use-case.md), each clearly signaled inline and indexed in an 'Additional Resources' section — matching the well-signaled one-level-deep anchor.	3 / 3
	Total	11 / 12 Passed
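
The Total row appears to be the plain sum of the four dimension scores above. A minimal sketch of that tally follows, assuming each dimension is scored out of 3 as the table shows; the variable names are illustrative and not part of any Cekura or registry API.

```python
# Dimension scores copied from the table above; each is out of 3.
# The names here are illustrative, not part of any real API.
scores = {
    "Conciseness": 2,
    "Actionability": 3,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 3,
}
MAX_PER_DIMENSION = 3  # assumption: every dimension shares the same 3-point scale

total = sum(scores.values())
maximum = MAX_PER_DIMENSION * len(scores)

print(f"{total} / {maximum}")  # prints "11 / 12"
```

This only reproduces the Total row; the page does not state how the 85% headline figure is computed.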

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific, trigger-rich, complete, and clearly distinct. It leads with natural user phrasings and pairs an explicit 'Use when' clause with a concrete statement of what the skill covers. Voice is appropriately third person throughout.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete coverage dimensions ('what each does, costs, constraints, configuration options') plus concrete trigger actions ('configure silence threshold', 'set up sentiment metric'), matching the 'lists multiple specific concrete actions' anchor.	3 / 3
Completeness	Explicitly answers both: 'Use when the user asks...' (when) and 'Covers the full catalog of predefined metrics — what each does, costs, constraints, configuration options, and when to use each one' (what), with explicit triggers.	3 / 3
Trigger Term Quality	Surfaces many natural phrases a user would actually say ('what does CSAT measure', 'how does hallucination detection work', 'which metrics are free', 'which metrics need audio'), giving broad coverage of natural terms.	3 / 3
Distinctiveness Conflict Risk	Tightly scoped to 'Cekura's out-of-the-box metrics' with named metric triggers (CSAT, Interruption Score, hallucination), establishing a clear niche unlikely to conflict with other skills.	3 / 3
	Total	12 / 12 Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 16 / 16 Passed

Validation for skill structure

No warnings or errors.

Repository: cekura-ai/cekura-skills
Commit: f0854af

Reviewed: about 23 hours ago
