
cekura-metric-design

Use when the user asks to "create a metric", "write a metric", "design a metric", "build a metric for", "evaluate agent performance", "measure call quality", "track a KPI", "add a workflow metric", "improve my metric", "fix a metric", "debug metric results", "set up quality scoring", or "what metrics do I need". Also relevant when discussing LLM judge prompts, custom code metrics, evaluation triggers, VALID_SKIP patterns, section extraction, or metric best practices for Cekura voice AI agents. Covers both creating new metrics and reviewing, iterating on, or troubleshooting existing ones.

81

1.38x
Quality: 71% (Does it follow best practices?)

Impact: 98%, 1.38x (Average score across 3 eval scenarios)

Security by Snyk: Advisory (Suggest reviewing before use)

Optimize this skill with Tessl

npx tessl skill review --optimize ./cekura/skills/cekura-metric-design/SKILL.md

Evaluation results

100%

25%

Appointment Booking Flow Metric

LLM judge metric with anti-cross-pollination scoping

| Criteria | Without context | With context |
| --- | --- | --- |
| Metric type llm_judge | 100% | 100% |
| Prompt in description field | 100% | 100% |
| No deprecated type | 100% | 100% |
| SCOPE & FOCUS present | 100% | 100% |
| Generic scoping language | 100% | 100% |
| DO NOT FLAG section | 75% | 100% |
| Closed FAILURE CONDITIONS | 62% | 100% |
| N/A conditions checked first | 100% | 100% |
| Relevant variables only | 33% | 100% |
| Timestamps in output instructions | 0% | 100% |
| Safeguarding / spirit vs letter | 75% | 100% |
| Correct eval_type | 0% | 100% |
| Design notes explain cross-pollination fix | 100% | 100% |

94%

18%

Conditional Metric Triggers for a Customer Support Agent

Conditional trigger design and two-layer N/A strategy

| Criteria | Without context | With context |
| --- | --- | --- |
| Positive-then-negative pattern | 100% | 100% |
| Short-call exclusion present | 91% | 100% |
| Two-layer distinction explained | 100% | 100% |
| VALID_SKIP pattern demonstrated | 25% | 100% |
| Trigger type selection | 100% | 100% |
| Inclusive trigger instruction | 0% | 100% |
| Specific flow indicators | 100% | 100% |
| Transfer/human exclusion | 25% | 25% |
| Rationale quality | 100% | 100% |
| No 'always' for conditional flows | 100% | 100% |

100%

38%

Metric Architecture for a Loan Servicing Voice Agent

Dynamic variable metrics and tool call hallucination architecture

| Criteria | Without context | With context |
| --- | --- | --- |
| One metric per dynamic variable | 100% | 100% |
| Specific variable reference only | 30% | 100% |
| Tool-to-scenario mapping | 80% | 100% |
| Tool metric DO NOT FLAG API errors | 0% | 100% |
| Tool metric closed FAILURE CONDITIONS | 50% | 100% |
| Tool metric scope: tool correctness only | 0% | 100% |
| Baseline metrics recommended | 25% | 100% |
| Two-step activation documented | 50% | 100% |
| llm_judge as default type | 100% | 100% |
| Tool metric not custom_code | 100% | 100% |
| Identity verification prerequisite in tool metric | 100% | 100% |
| schedule_payment vs promise_to_pay distinction | 100% | 100% |
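
The 98% figure labeled "Average score across 3 eval scenarios" appears consistent with an unweighted mean of the per-scenario scores. The sketch below checks that arithmetic. It assumes the first percentage printed above each scenario title (100%, 94%, 100%) is that scenario's score; the names `scenario_scores` and `average_score` are hypothetical and not part of Tessl or Cekura.

```python
# Per-scenario scores as read from the page. Assumption: the first
# percentage listed above each scenario title is that scenario's score.
scenario_scores = {
    "Appointment Booking Flow Metric": 100,
    "Conditional Metric Triggers for a Customer Support Agent": 94,
    "Metric Architecture for a Loan Servicing Voice Agent": 100,
}


def average_score(scores):
    """Return the unweighted mean of the scenario scores, in percent."""
    return sum(scores.values()) / len(scores)


print(f"{average_score(scenario_scores):.0f}%")  # prints "98%"
```

How the page weights individual criteria within a scenario is not stated, so this only reproduces the top-level average, not the per-scenario numbers.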

Repository: cekura-ai/cekura-skills

Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
