Use when the user asks "what predefined metrics are available", "which built-in metrics should I use", "what does CSAT measure", "how does hallucination detection work", "what's the difference between Interruption Score and AI Interrupting User", "which metrics are free", "which metrics need audio", "configure silence threshold", "set up sentiment metric", or any question about Cekura's out-of-the-box metrics. Covers the full catalog of predefined metrics — what each does, costs, constraints, configuration options, and when to use each one.
67
80%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./cekura/skills/cekura-predefined-metrics/SKILL.mdQuality
Discovery
89%Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description with excellent trigger term coverage and completeness. The 'Use when' clause is front-loaded with numerous specific example queries users might ask, and the scope is clearly defined around Cekura's predefined metrics catalog. The main weakness is that the specificity of capabilities leans more toward describing coverage areas rather than listing concrete actions the skill performs.
| Dimension | Reasoning | Score |
|---|---|---|
Specificity | The description names the domain (Cekura's predefined metrics) and mentions what it covers (what each does, costs, constraints, configuration options, when to use each), but the actions are more about answering questions than performing concrete operations. It describes coverage areas rather than listing specific concrete actions. | 2 / 3 |
Completeness | Clearly answers both 'what' (covers the full catalog of predefined metrics — what each does, costs, constraints, configuration options, and when to use each one) and 'when' (explicit 'Use when' clause with extensive trigger examples and a catch-all 'any question about Cekura's out-of-the-box metrics'). | 3 / 3 |
Trigger Term Quality | Excellent coverage of natural trigger terms including specific user phrases like 'what predefined metrics are available', 'which metrics are free', 'which metrics need audio', 'configure silence threshold', 'set up sentiment metric', and mentions of specific metric names like CSAT, hallucination detection, and Interruption Score. These are highly natural phrases a user would actually say. | 3 / 3 |
Distinctiveness Conflict Risk | Highly distinctive — it's specifically about Cekura's predefined/out-of-the-box metrics catalog. The specific product name (Cekura) and focus on predefined metrics (as opposed to custom metrics or other Cekura features) creates a clear niche that is unlikely to conflict with other skills. | 3 / 3 |
Total | 11 / 12 Passed |
Implementation
70%Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured reference skill that excels at organizing a large catalog of metrics into scannable tables with clear constraints, costs, and availability. The workflow is clearly sequenced with good validation checkpoints and common pitfalls. The main weakness is the lack of executable examples — no API call snippets, no JSON configuration payloads, and no tool invocation examples are included inline, with key actionable details deferred entirely to reference files that weren't provided for evaluation.
Suggestions
Add at least one complete, executable API call or tool invocation example inline (e.g., toggling a metric on and attaching it to an evaluator) rather than deferring all examples to reference files.
Include one inline JSON configuration payload example (e.g., for silence_duration or dropoff_nodes) so users can act immediately without loading the configuration guide.
| Dimension | Reasoning | Score |
|---|---|---|
Conciseness | The skill is generally well-organized but includes some unnecessary sections like 'Core Terminology' definitions that Claude could infer, the 'Performing Platform Actions' preamble, and the 'Predefined vs Custom Metrics' section which is somewhat obvious. The catalog tables and configuration reference are dense and efficient, but the overall document could be tightened by ~20%. | 2 / 3 |
Actionability | The skill provides concrete metric names, codes, configuration keys, and a clear two-step enablement process. However, it lacks executable code/command examples — no actual API call examples, no JSON payloads for configuration, and no tool invocation examples. The single API endpoint mentioned (GET /test_framework/v1/predefined-metrics/) is helpful but insufficient. Key actionable details are deferred to reference files that aren't provided. | 2 / 3 |
Workflow Clarity | The six-step workflow is clearly sequenced with explicit validation ('Validate by running'), the two-step activation requirement is called out prominently with a warning about the consequence of missing either step, and Common Pitfalls serves as an effective error-recovery checklist. The baseline recommendation provides a clear starting point. | 3 / 3 |
Progressive Disclosure | The skill provides a comprehensive overview with well-organized catalog tables inline (appropriate since they're the core reference), then clearly signals one-level-deep references for configuration details, API endpoints, and use-case-specific selections. The 'Next Steps' section provides clear navigation to related skills. References are consistently formatted and clearly described. | 3 / 3 |
Total | 10 / 12 Passed |
Validation
100%Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
24ad1d0
Table of Contents
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.