Use when the user asks to "create a metric", "write a metric", "design a metric", "build a metric for", "evaluate agent performance", "measure call quality", "track a KPI", "add a workflow metric", "improve my metric", "fix a metric", "debug metric results", "set up quality scoring", or "what metrics do I need". Also relevant when discussing LLM judge prompts, custom code metrics, evaluation triggers, VALID_SKIP patterns, section extraction, or metric best practices for Cekura voice AI agents. Covers both creating new metrics and reviewing, iterating on, or troubleshooting existing ones.
81
71%
Does it follow best practices?
Impact: 98% (1.38x)
Average score across 3 eval scenarios
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./cekura/skills/cekura-metric-design/SKILL.md
LLM judge metric with anti-cross-pollination scoping
Metric type llm_judge | 100% | 100%
Prompt in description field | 100% | 100%
No deprecated type | 100% | 100%
SCOPE & FOCUS present | 100% | 100%
Generic scoping language | 100% | 100%
DO NOT FLAG section | 75% | 100%
Closed FAILURE CONDITIONS | 62% | 100%
N/A conditions checked first | 100% | 100%
Relevant variables only | 33% | 100%
Timestamps in output instructions | 0% | 100%
Safeguarding / spirit vs letter | 75% | 100%
Correct eval_type | 0% | 100%
Design notes explain cross-pollination fix | 100% | 100%
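Several of the criteria listed above are structural properties of a judge prompt, so they can be checked mechanically before a metric is submitted. The following is a minimal sketch of that idea, not Cekura's or Tessl's actual scoring logic: `lint_judge_prompt` is a hypothetical helper, and the section names are assumptions copied from the criterion labels (the real skill may spell or order them differently).

```python
# Hypothetical section names, taken from the criterion labels above.
REQUIRED_SECTIONS = ["SCOPE & FOCUS", "DO NOT FLAG", "FAILURE CONDITIONS"]


def lint_judge_prompt(prompt: str) -> list[str]:
    """Return human-readable problems found in an LLM judge prompt."""
    problems = []
    upper = prompt.upper()

    # Each required section must appear somewhere in the prompt.
    for name in REQUIRED_SECTIONS:
        if name not in upper:
            problems.append(f"missing section: {name}")

    # "N/A conditions checked first": when a FAILURE CONDITIONS section
    # exists, the first mention of N/A should come before it.
    na_pos = upper.find("N/A")
    fail_pos = upper.find("FAILURE CONDITIONS")
    if fail_pos != -1 and (na_pos == -1 or na_pos > fail_pos):
        problems.append("N/A conditions are not checked before FAILURE CONDITIONS")

    # "Timestamps in output instructions": a simple keyword check.
    if "TIMESTAMP" not in upper:
        problems.append("output instructions do not mention timestamps")

    return problems


# Illustrative prompt text, invented for this example.
example_prompt = """SCOPE & FOCUS: evaluate only the greeting.
N/A: return N/A if the call ended before a greeting.
DO NOT FLAG: background noise.
FAILURE CONDITIONS: the agent never greets the caller.
OUTPUT: cite timestamps for each finding."""

for problem in lint_judge_prompt(example_prompt):
    print(problem)
```

A keyword check like this only confirms that the sections exist; whether the FAILURE CONDITIONS list is genuinely closed or the scoping language is generic still needs a human or LLM review.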
Conditional trigger design and two-layer N/A strategy
Positive-then-negative pattern | 100% | 100%
Short-call exclusion present | 91% | 100%
Two-layer distinction explained | 100% | 100%
VALID_SKIP pattern demonstrated | 25% | 100%
Trigger type selection | 100% | 100%
Inclusive trigger instruction | 0% | 100%
Specific flow indicators | 100% | 100%
Transfer/human exclusion | 25% | 25%
Rationale quality | 100% | 100%
No 'always' for conditional flows | 100% | 100%
Dynamic variable metrics and tool call hallucination architecture
One metric per dynamic variable | 100% | 100%
Specific variable reference only | 30% | 100%
Tool-to-scenario mapping | 80% | 100%
Tool metric DO NOT FLAG API errors | 0% | 100%
Tool metric closed FAILURE CONDITIONS | 50% | 100%
Tool metric scope: tool correctness only | 0% | 100%
Baseline metrics recommended | 25% | 100%
Two-step activation documented | 50% | 100%
llm_judge as default type | 100% | 100%
Tool metric not custom_code | 100% | 100%
Identity verification prerequisite in tool metric | 100% | 100%
schedule_payment vs promise_to_pay distinction | 100% | 100%