Use when the user says "get started with Cekura", "set up Cekura", "onboard to Cekura", "I'm new to Cekura", "help me set up my agent", "how do I use Cekura", "walk me through Cekura", "configure my project", "first time using Cekura", or needs guidance on initial platform setup. Covers two onboarding paths: **testing** (default — build evaluators and run simulated calls) and **observability** (ingest production call logs and evaluate them).
68
83%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Advisory
Suggest reviewing before use
Walk a new user through the complete Cekura setup — from account creation to their first useful artifact.
Two onboarding paths share the same Phases 1–2 (account, project, agent) and diverge after that:
This is an interactive, step-by-step walkthrough. At each phase, confirm with the user before proceeding and help them with the actual API calls or UI steps.
When this skill suggests creating, listing, updating, or evaluating something on Cekura, prefer using available platform tools over describing API calls or dashboard steps. In Claude Code with the Cekura plugin installed, these tools are auto-configured and handle authentication, parameter validation, and error handling for you. Fall back to direct API endpoints or dashboard guidance only when no tools are available in the current session.
Each phase below names the primary tool for that step. Actually call the tool rather than telling the user to do it in the dashboard — that's what makes the onboarding hands-on instead of a tutorial. If a call fails (validation error, missing field, auth), fix the cause or ask the user for the missing input, then retry; don't claim a step is done until the call succeeds.
Every agent ID, scenario ID, call log ID, metric ID, and run ID comes from a real tool response. If you don't have an ID you need, call the relevant list/retrieve tool and pull it from the response — do not fabricate one to keep the flow moving. This holds even when the user gives you a name ("the Booking Bot agent"): look it up and use the returned id. Provider-side identifiers the user must supply (VAPI assistant IDs, Retell agent IDs, API keys, webhook URLs) follow the same rule — ask the user, never guess.
This is an interactive walkthrough, not a reference doc. Guide the user through each phase conversationally:
cekura-create-agent, cekura-metric-design, cekura-eval-design, cekura-metric-improvement) when appropriate.If the caller already specified a path — via the /cekura-onboarding command argument or the invoking context — honour it without asking.
Otherwise, ask once:
Two onboarding paths — which fits your goal?
- Testing (default) — build evaluators and run simulated calls against your agent.
- Observability — ingest your production call logs and evaluate them.
Default to testing when ambiguous. Phases 1–2 are identical for both; the flow forks at Phase 3.
Survey what already exists in the user's project before walking them through any phase. This prevents asking "Resume where?" on an empty project (redundant) and prevents skipping past existing work (risky).
Gathering state:
/cekura-onboarding command pre-detected project state and passed it in context), trust it — don't re-run the same lookups.Decision:
| State of the path's relevant resources | Action |
|---|---|
| Clean slate — none exist (testing: 0 agents + 0 scenarios + 0 results; observability: 0 agents + 0 call logs + 0 metrics) | Proceed straight to Phase 1 (or Phase 2 if account/project already set up). Don't ask "Resume where?" / "Ready to continue?" — there's nothing to resume. |
| Mid-onboarding — some relevant resources exist but the flow is incomplete | Surface ONE concrete clarification: e.g. "Found existing agent Booking Bot with 12 scenarios and 1 result. Continue with it, or create a new agent?" — never a generic "Ready to continue?". |
| Obvious from the user's message — they said "create a new agent" / "start fresh" / named a specific agent | Honour that intent without an extra confirm. |
After deciding, move into the appropriate phase. Confirm at phase boundaries and before destructive operations, but never re-ask the state you just surveyed.
Skip this phase entirely if the user is already signed in with a project selected (or state was handed to you showing an existing project) — go straight to Phase 2. Phase 1 is only for users starting from nothing; don't re-ask account or project facts you already have.
Ask the user:
If they have an API key, verify it works by listing metrics. A successful response (even empty) confirms the key is valid.
If they don't have an account, direct them to sign up at https://dashboard.cekura.ai/sign-up and create a project.
For Claude Code plugin users: If platform operations aren't working, run /setup-mcp to configure API access.
Ask: "Do you already have a project, or do we need to create one?"
If creating: Create the project (projects_create) or point them to the dashboard.
Project organization guidance:
Both paths register the agent with aiagents_create. What differs is the framing:
Ask:
Create the agent on Cekura with aiagents_create — agent name, project ID, and description. For detailed agent setup (provider integration, mock tools, KB, dynamic variables), hand off to the cekura-create-agent skill.
Critical: agent description is essential. It enables automatic evaluator generation (testing) and powers metrics that reference {{agent.description}} (both paths). Ask the user to paste their agent's full system prompt — it's the single most leverage-rich field on the agent record.
Based on their provider, guide them through connecting:
VAPI:
assistant_id alone.Retell:
LiveKit:
metadata.raw_metrics for latency tracking.ElevenLabs:
Pipecat:
provider.type: "pipecat". Credentials: provider.credentials.api_key (Pipecat Cloud API key from pipecat.daily.co → Settings → API Keys).provider.credentials.config.pipecat_agent_name — the Pipecat agent name from your dashboard (required unless tracing_enabled is true).scenarios_run_pipecat_v2.Self-hosted / Custom (reached via SIP, WebSocket, or chat):
provider.type: "self_hosted" agents — SIP / WebSocket / chat are connection modes, not providers.Ask: "Does your agent use dynamic variables — per-call data like customer names, account IDs, or configuration flags?"
If yes:
{{variableName}} patterns in the agent description.{{dynamic_variables.keyName}}.dynamic_variables is also a field on ingestion payloads — values appear alongside the transcript in the UI.Testing path — ask: "Does your agent call external APIs or tools during calls?" If yes:
Observability path — tool calls in production are real. They surface in the call log as tool_calls (alongside the transcript) and the Tool Call Success metric scores them automatically once enabled.
After Phase 2, the flow diverges. Follow ONLY your path's Phase 3+ sections below.
Use this branch when the path is testing (default).
Always recommend selecting ALL pre-defined metrics for comprehensive analysis:
| Category | Metrics |
|---|---|
| Accuracy | Expected Outcome, Hallucination, Relevancy, Response Consistency, Tool Call Success, Transcription Accuracy, Voicemail Detection |
| Quality | Interruption counts, Response latency, Silence detection, Call termination appropriateness |
| Customer Experience | CSAT, Sentiment, Dropoff nodes, Topic categorization |
| Speech Quality | Pitch, Speaking rate, Gibberish detection, Pronunciation verification |
Guide: "Go to your project's Metrics section and enable all pre-defined metrics. This gives you a comprehensive baseline."
Two-step activation: Metrics must be (1) toggled on at the project level AND (2) attached to individual evaluators.
For first-time users, skip custom metrics initially. Once they have test results, they can use the cekura-metric-design skill to create targeted custom metrics.
The fastest path to first tests — generate scenarios with scenarios_agent_create:
{
"agent_id": <agent_id>,
"num_scenarios": 10,
"personalities": [<personality_id>],
"generate_expected_outcomes": true,
"tool_ids": ["TOOL_END_CALL", "TOOL_END_CALL_ONLY_ON_TRANSFER"]
}Generation runs in the background — poll scenarios_generate_progress until it completes, then review the generated scenarios.
After generation, check:
scenario_language to correct code.Common gaps in auto-generated evals:
Hand off to the cekura-eval-design skill for designing more targeted evaluators.
Every evaluator needs metrics attached. At minimum:
Use bulk-add via actions → modify scenarios in the UI.
Run the scenarios with one of the scenarios_run_* tools. The exact tool depends on the agent's provider/transport:
scenarios_run_pipecat_v2 — Pipecat Cloud, WebRTC (uses the agent's provider.credentials.api_key)scenarios_run_livekit_v2 — LiveKit, WebRTCscenarios_run_vapi_webrtc / scenarios_run_retell_webrtc — VAPI / Retell WebRTCscenarios_run_elevenlabs — ElevenLabsscenarios_run_websocket — custom/self-hosted WebSocket agentsscenarios_run_sip — SIP endpointsscenarios_run_voice / scenarios_run_text — phone (PSTN) / text{
"agent_id": <agent_id>,
"scenarios": [<scenario_ids>],
"frequency": 1
}Start with 5–10 scenarios for the first run. Voice calls take 1–3 minutes each.
Check results via the results endpoint. Each run includes:
Guide the user through interpreting results:
| Need | Next step | Description |
|---|---|---|
| Better metrics | cekura-metric-design | Design custom metrics for specific workflows. |
| More evaluators | cekura-eval-design | Design targeted test scenarios. |
| Improve metric quality | cekura-metric-improvement | Iterate metric quality through feedback. |
| Monitor production | Re-run onboarding on the observability path | Ingest live calls and score them. |
| CI/CD integration | GitHub Actions | Auto-test on code changes. |
| Scheduled tests | Cron jobs | Recurring test suites. |
Use this branch when the path is observability.
The observability path does not generate scenarios or run simulations. Instead, you ingest the user's actual production calls, attach metrics, evaluate, and review. The agent registered in Phase 2 is the production agent that owns those calls.
Get the user's production calls into Cekura with observe_create.
Ask: "Do you want to (a) upload a sample call to see how Cekura processes it, or (b) configure continuous webhook ingestion from your provider?"
Call observe_create with the user's transcript. Identify the agent by either:
agent: the Cekura agent ID from Phase 2 (preferred), orassistant_id: the external provider-side ID (Cekura resolves it to your agent).Minimum payload:
{
"call_id": "<unique call id>",
"agent": <agent_id>,
"transcript_type": "cekura",
"transcript_json": [
{"role": "Testing Agent", "content": "Hi, can I book a room?", "start_time": 0.0, "end_time": 2.1},
{"role": "Main Agent", "content": "Of course — for what date?", "start_time": 2.3, "end_time": 4.0}
],
"call_ended_reason": "completed"
}For transcript_type: "cekura", the only valid roles are "Testing Agent" (caller) and "Main Agent" (the agent under test). "agent" / "user" are NOT valid for this format.
If the user has a provider-native transcript (VAPI, Retell, ElevenLabs, Bland, LiveKit, Pipecat, KoreAI, Trillet), set transcript_type to that provider and pass transcript_json exactly as the provider emits it — Cekura normalises it internally.
Useful optional fields:
voice_recording_url — enables audio-based metrics (pitch, speaking rate, gibberish detection).metadata — freeform tags for Observability filtering ({"customer_id": "...", "campaign_id": "..."}).dynamic_variables — values injected into the agent at runtime; shown alongside the transcript.customer_number — caller's number in E.164.metric_ids — comma-separated metric IDs to evaluate immediately (skips Phase 5's separate kickoff).Cekura ships provider-specific webhook endpoints that accept the provider's raw post-call shape — no transformation on the user's side:
| Provider | Webhook URL |
|---|---|
| VAPI | POST /observability/v1/vapi/observe/ |
| Retell | POST /observability/v1/retell/observe/ |
| ElevenLabs | POST /observability/v1/elevenlabs/observe/ |
| LiveKit | POST /observability/v1/livekit/observe/ |
| Pipecat | POST /observability/v1/pipecat/observe/ |
| Other | Use generic observe_create |
Guide the user to configure their provider's webhook to POST every completed call to the relevant URL with their Cekura API key in the Authorization: Bearer ... header. Then trigger one test call so a real ingestion lands.
After the first call lands:
call_logs_list) to confirm it's visible.status is evaluating; full results appear shortly after.If the user has more than one provider, repeat with a sample from each to build the call inventory for Phase 5 evaluation.
Metrics in observability mode score real calls. The starter set should cover correctness, customer experience, and safety.
List metrics (metrics_list) to see what's already configured. If the project already has metrics from a prior testing onboarding, reuse them — they apply to call logs as well as test runs.
For first-time observability onboarding, recommend three metrics that cover the high-value bases:
| Metric | Why it matters in observability |
|---|---|
| Hallucination | Catches the agent inventing facts on live calls — highest blast-radius failure mode. |
| Expected Outcome adherence | Did the agent accomplish the call's purpose (booking, transfer, info-gathering)? |
| Sentiment | Surfaces customer frustration trends; a leading indicator for churn. |
List the full catalog with predefined_metrics_list. For each chosen metric, create it with metrics_create (single) or metrics_bulk_create (multiple), passing the project_id and metric specifics.
If the user wants metrics auto-tailored to their agent (e.g. workflow-specific outcome metrics), use metrics_generate — Cekura generates metric definitions from the agent's description. Defer to the cekura-metric-design skill for designing custom metrics carefully.
If you passed metric_ids during ingestion, auto-evaluation already started. This phase evaluates additional metrics on existing call logs.
Call call_logs_evaluate_metrics_create:
{
"call_log_ids": [<id1>, <id2>],
"metric_ids": [<metric_id1>, <metric_id2>]
}Evaluation runs async — the response shows status: "evaluating" and the call log's metrics array is initially empty. Re-retrieve the call log shortly after to see scores.
If a metric prompt was updated and the user wants existing call logs re-scored, use call_logs_rerun_evaluation_create.
The point of observability is closing the loop: humans review scores, mark ones that disagree with their judgment, and that feedback improves future metric quality.
Retrieve the call log (call_logs_retrieve) with metric results. Walk the user through:
If results still show status: "evaluating", wait a moment and re-retrieve.
Ask the user to pick at least one metric result they disagree with and explain why. Then record it with call_logs_mark_metric_vote_create:
{
"call_log_id": <id>,
"metric_id": <id>,
"vote": "incorrect",
"reasoning": "<user's reason>"
}Encourage 3–5 votes for a meaningful feedback signal.
Hand off to cekura-metric-improvement to use the collected votes to actually refine metric prompts. That skill loops: rebuild prompt → preview on the voted call logs → ship.
| Need | Next step | Description |
|---|---|---|
| Improve metrics with votes | cekura-metric-improvement | Use Phase 6's votes to refine metric prompts. |
| Design custom metrics | cekura-metric-design | New metrics for workflow-specific behaviour. |
| Add pre-deploy tests | Re-run onboarding on the testing path | Use real production calls as the basis for new scenarios. |
| Scheduled re-evaluation | Cron jobs | Re-score live calls as metrics evolve. |
| Multi-project rollups | Observability dashboards | Aggregate metric scores across agents/projects. |
See references/api-quickstart.md for the essential endpoints used during onboarding.
7a49e22
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.