cekura-self-improving-agent

Use to close the loop on agent quality — turn a failure signal into a verified fix. Triggers: "improve my agent", "self-improving agent", "auto-tune / iterate on my prompt", "fix my agent from test results", "optimize my prompt based on failures", "rewrite my prompt". ALSO for production-call bug fixing: "fix this prod call issue", "debug and fix call ID", "reproduce this production bug", "regression test before a PR", "fix the bug from this call and open a PR". Works across VAPI, ElevenLabs, and self-hosted agents, and across three fix surfaces — prompt, tool config, and (self-hosted) owned source code, including infra-flavored / forked-SDK bugs, which are reproduced and validated on Cekura (never a code test) and, for source edits, shipped as a PR.

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Quality

Content

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is an exceptionally well-structured orchestrator: sequenced phases, explicit validation gates, feedback loops, and clean progressive disclosure into per-phase and per-provider files. The main weakness is that key rules are restated across the loop, Invariants, and When-to-pause sections.

Suggestions

State the must-fail-first gate and the CodeBug-vs-Upstream rule once in their canonical location and reference them elsewhere rather than restating them in the loop, Invariants, and When-to-pause sections.

The "When to pause and ask" list is long; consider moving the exhaustive conditions into a referenced file and keeping only the highest-frequency pause triggers inline.

Dimension	Reasoning	Score
Conciseness	Dense and free of concepts Claude already knows, but critical rules are restated redundantly: the must-fail-first gate and the "owned code is a CodeBug, not Upstream" rule each appear in the loop, Invariants, and When-to-pause sections, so the body could be tightened.	2 / 3
Actionability	Provides concrete, specific guidance: exact field paths (conversation_config.agent.prompt.prompt), numeric thresholds (dataset_size 8, range 5–10, ≥ M of N gates), per-phase file links, and explicit stop conditions — fully actionable for an instruction skill.	3 / 3
Workflow Clarity	Ten phases run in a strict sequence with hard pre-conditions, explicit must-fail-first and must-pass validation gates, and named feedback loops (drift rolls back to Apply; regression hands back to Collect; Eval→Collect loop), matching the score-3 anchor.	3 / 3
Progressive Disclosure	The body is a concise overview with clearly signaled one-level-deep links into phases/ and providers/, a dedicated Files tree, and real references/ files — content is appropriately split and easy to navigate.	3 / 3
	Total	11 / 12 Passed
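
For readers unfamiliar with the gates named in the Actionability and Workflow Clarity rows, the following is a rough sketch of how a must-fail-first check and a "≥ M of N" pass gate could combine. It is an illustration under assumptions, not the skill's implementation: the function names, the boolean per-run results, and the returned labels are all hypothetical.

```python
from typing import Sequence


def passes_m_of_n(results: Sequence[bool], m: int) -> bool:
    """Return True when at least m of the recorded runs passed."""
    return sum(results) >= m


def gate_fix(before: Sequence[bool], after: Sequence[bool], m: int) -> str:
    """Classify a candidate fix (hypothetical labels, assumed semantics).

    before: per-run pass/fail of the reproduction scenario prior to the fix.
    after:  per-run pass/fail of the same scenario with the fix applied.
    """
    # Must-fail-first: a scenario that already clears the bar does not
    # reproduce the failure, so it cannot validate a fix.
    if passes_m_of_n(before, m):
        return "not-reproduced"
    # Must-pass: the fix counts only if the same scenario now clears the bar.
    if passes_m_of_n(after, m):
        return "verified"
    return "fix-rejected"
```

With a threshold of 2 of 3, a scenario that failed twice before the fix and passed three times afterwards would be classified as `verified` under these assumptions.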

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific, trigger-rich, and clearly states both capability and use-conditions in third person. Its only weakness is verbosity — the trigger list is long — but every clause is concrete rather than fluff.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions — "turn a failure signal into a verified fix", reproduce bugs, regression test, and ship a PR — across three named fix surfaces (prompt, tool config, owned source code), matching the score-3 anchor.	3 / 3
Completeness	Explicitly answers both what (close the loop / verified fix across providers and surfaces) and when via a labelled "Triggers:" clause plus "ALSO for production-call bug fixing:" guidance, satisfying the explicit-trigger requirement.	3 / 3
Trigger Term Quality	Strong coverage of natural phrasings users would actually say ("improve my agent", "fix my agent from test results", "debug and fix call ID", "regression test before a PR"), well beyond a single keyword.	3 / 3
Distinctiveness / Conflict Risk	Scoped to a clear Cekura niche with provider and fix-surface qualifiers and distinctive triggers; unlikely to fire for unrelated skills despite nearby sibling cekura skills.	3 / 3
	Total	12 / 12 Passed

Validation

93%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 15 / 16 Passed

Validation for skill structure

Criteria	Description	Result
relative_links	Relative link issues: 15 missing, 6 deeper-than-1-level	Warning

	Total	15 / 16 Passed
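
To reproduce the relative_links warning locally, a check along the following lines may help. This is a minimal sketch and not the validator's actual logic: it assumes that "missing" means the link target does not exist on disk and that "deeper-than-1-level" means the target sits more than one directory below the skill file. The function name and the returned dictionary keys are hypothetical.

```python
import re
from pathlib import Path, PurePosixPath

# Inline Markdown links of the form [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Targets that start with a URL scheme such as https: or mailto:
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def classify_relative_links(markdown: str, base_dir: Path) -> dict:
    """Group relative link targets by the two issue types in the warning."""
    issues = {"missing": [], "deeper_than_1_level": []}
    for target in LINK_RE.findall(markdown):
        # Skip absolute URLs, root-absolute paths, and in-page anchors.
        if SCHEME_RE.match(target) or target.startswith(("#", "/")):
            continue
        path = target.split("#", 1)[0]  # drop any fragment
        if not path:
            continue
        # Assumption: one directory plus a file name is the allowed depth.
        if len(PurePosixPath(path).parts) > 2:
            issues["deeper_than_1_level"].append(path)
        if not (base_dir / path).exists():
            issues["missing"].append(path)
    return issues
```

Calling `classify_relative_links(Path("SKILL.md").read_text(), Path("."))` from the skill's directory would list candidate targets to fix; the file name `SKILL.md` is an assumption here.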

Repository: cekura-ai/cekura-skills
Commit: f0854af

Reviewed: about 23 hours ago
