cekura-metric-design

Use when the user asks to "create a metric", "write a metric", "design a metric", "build a metric for", "evaluate agent performance", "measure call quality", "track a KPI", "add a workflow metric", "improve my metric", "fix a metric", "debug metric results", "set up quality scoring", or "what metrics do I need". Also relevant when discussing LLM judge prompts, custom code metrics, evaluation triggers, VALID_SKIP patterns, section extraction, or metric best practices for Cekura voice AI agents. Covers both creating new metrics and reviewing, iterating on, or troubleshooting existing ones.

Quality (Does it follow best practices?): no score listed

Impact: 98% (1.38x), average score across 3 eval scenarios

Security: Advisory (suggest reviewing before use)

Evaluation results

100%

25%

Appointment Booking Flow Metric

LLM judge metric with anti-cross-pollination scoping

Criteria (Baseline → With context)

Metric type llm_judge: 100%
Prompt in description field: 100%
No deprecated type: 100%
SCOPE & FOCUS present: 100%
Generic scoping language: 100%
DO NOT FLAG section: 75% → 100%
Closed FAILURE CONDITIONS: 62% → 100%
N/A conditions checked first: 100%
Relevant variables only: 33% → 100%
Timestamps in output instructions: 100%
Safeguarding / spirit vs letter: 75% → 100%
Correct eval_type: 100%
Design notes explain cross-pollination fix: 100%

94%

18%

Conditional Metric Triggers for a Customer Support Agent

Conditional trigger design and two-layer N/A strategy

Criteria (Baseline → With context)

Positive-then-negative pattern: 100%
Short-call exclusion present: 91% → 100%
Two-layer distinction explained: 100%
VALID_SKIP pattern demonstrated: 25% → 100%
Trigger type selection: 100%
Inclusive trigger instruction: 100%
Specific flow indicators: 100%
Transfer/human exclusion: 25%
Rationale quality: 100%
No 'always' for conditional flows: 100%

38%

Metric Architecture for a Loan Servicing Voice Agent

Dynamic variable metrics and tool call hallucination architecture

Criteria (Baseline → With context)

One metric per dynamic variable: 100%
Specific variable reference only: 30% → 100%
Tool-to-scenario mapping: 80% → 100%
Tool metric DO NOT FLAG API errors: 100%
Tool metric closed FAILURE CONDITIONS: 50% → 100%
Tool metric scope: tool correctness only: 100%
Baseline metrics recommended: 25% → 100%
Two-step activation documented: 50% → 100%
llm_judge as default type: 100%
Tool metric not custom_code: 100%
Identity verification prerequisite in tool metric: 100%
schedule_payment vs promise_to_pay distinction: 100%

Repository: cekura-ai/cekura-skills
Commit: f0854af

Evaluated: about 2 months ago
Agent: Claude Code
Model: Claude Sonnet 4.6

