
cekura-self-improving-agent

Use when the user asks to "improve my agent", "self-improving agent", "auto-tune my agent", "iterate on my agent prompt", "fix my agent based on test results", "close the loop on agent quality", "auto-improve agent prompt", "use eval results to improve agent", "optimize my prompt based on failures", "rewrite my prompt", or describes agent self-improvement, prompt iteration from run results, or automated agent quality loops. Covers the full diagnose → propose → apply → re-validate loop for VAPI agents (squads + tool definitions), ElevenLabs Conversational AI agents (system prompt + tool definitions), and self-hosted agents (pipecat pipelines and custom websocket servers, including the offline / pasted-prompt degenerate variant).

62

Quality: 72%. Does it follow best practices?

Impact: No eval scenarios have been run.

Security (by Snyk): Advisory. Suggest reviewing before use.

Optimize this skill with Tessl

npx tessl skill review --optimize ./cekura/skills/cekura-self-improving-agent/SKILL.md

Quality

Content: 54%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill has excellent workflow clarity and progressive disclosure — the multi-phase architecture is well-documented with clear hand-off rules, validation checkpoints, and a comprehensive directory of sub-files. However, it is severely over-verbose: the 40+ Common Pitfalls bullets, exhaustive loop hand-off rules, and repeated explanations of the same concepts (e.g., 'don't skip Setup Step 1.4' appears in at least 4 places) bloat the orchestrator far beyond what's needed. The content would benefit enormously from aggressive deduplication and trusting that detailed rules live in the phase files they govern.

Suggestions

Cut 'Common Pitfalls' by 60-70% — move provider-specific pitfalls into their respective provider files and phase-specific pitfalls into their phase files, keeping only the 5-8 most critical cross-cutting pitfalls in the orchestrator.

Deduplicate repeated explanations: the Setup Step 1.4 hard-gate, the 'auto_mode doesn't skip setup questions' rule, and the 'don't parallelize across phases' rule each appear 3-4 times — state each once with a clear location anchor.

Compress the 'Loop hand-off rules' section into a compact table (Phase A → Phase B: condition) rather than prose paragraphs for each transition.

Move the detailed 'When to ask for feedback' list into a separate reference file and keep only a 2-3 line summary in the orchestrator pointing to it.
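The third suggestion (a compact hand-off table) can be sketched as data rather than prose. This is a minimal illustration only: the two rows reuse the feedback loops named in the review ("drift → rollback to Apply", "failure → loop to Collect"), the source phases `Sync` and `Eval` are assumptions about where those conditions are detected, and `HANDOFFS`, `next_phase`, and `render_table` are hypothetical names that do not come from the skill itself.

```python
from typing import List, Optional, Tuple

# Hypothetical hand-off rules: (from_phase, condition, to_phase).
# Phase names and conditions are assumptions based on the review text;
# the real SKILL.md may define them differently.
HANDOFFS: List[Tuple[str, str, str]] = [
    ("Sync", "drift detected", "Apply"),
    ("Eval", "failure", "Collect"),
]


def next_phase(current: str, condition: str) -> Optional[str]:
    """Return the phase to hand off to, or None if no rule matches."""
    for src, cond, dst in HANDOFFS:
        if src == current and cond == condition:
            return dst
    return None


def render_table(rows: List[Tuple[str, str, str]]) -> str:
    """Render rows in the compact 'Phase A -> Phase B: condition' form."""
    return "\n".join(f"{src} -> {dst}: {cond}" for src, cond, dst in rows)


print(render_table(HANDOFFS))
```

Keeping each transition as a single row makes it easy to check that no transition is stated twice, which is the duplication the review flags.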

Dimension, reasoning, and score:

Conciseness: 1 / 3

This is extremely verbose: over 500 lines of dense prose that extensively explains internal architecture, phase boundaries, loop hand-off rules, anti-patterns, and edge cases at a level far beyond what's needed for an orchestrator file. Massive sections like 'Common Pitfalls' (40+ bullet points) and exhaustive hand-off rules repeat information that likely lives in the referenced phase files. Claude doesn't need paragraphs explaining why parallelizing across phase boundaries is bad; a one-line rule suffices.

Actionability: 2 / 3

The skill provides concrete workflow steps and references to specific phase files with step numbers (e.g., 'Steps 1.1–1.4', 'Step COLLECT.3'), and names specific tools and API paths. However, there is no executable code or copy-paste-ready commands in the orchestrator itself; it delegates everything to sub-files. The guidance is detailed but largely descriptive rather than directly executable.

Workflow Clarity: 3 / 3

The multi-step workflow is exceptionally well-sequenced with an ASCII architecture diagram, explicit phase ordering, clear hand-off conditions between every phase pair, validation checkpoints (Sync verifies writes, Overfitting Gate scrubs edits, Eval validates), feedback loops (drift → rollback to Apply, failure → loop to Collect), and multiple explicit stop conditions. Destructive operations have explicit gates.

Progressive Disclosure: 3 / 3

The skill is structured as a thin orchestrator with clear one-level-deep references to 15+ phase and provider files, each with descriptive link text and path. The directory layout is comprehensive, and every referenced file has a summary of what it contains. Navigation is straightforward despite the complexity.

Total: 9 / 12 (Passed)
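The repetition called out under Conciseness (for example, the Setup Step 1.4 rule appearing in several places) is easy to measure before deduplicating. The following is a rough sketch, not part of Tessl or the skill: `count_phrases` and `repeated` are hypothetical helpers, and the sample text is invented for illustration rather than quoted from SKILL.md.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def count_phrases(text: str, phrases: Iterable[str]) -> Counter:
    """Count case-insensitive occurrences of each phrase in text."""
    counts: Counter = Counter()
    for phrase in phrases:
        pattern = re.escape(phrase)
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def repeated(counts: Counter, limit: int = 1) -> Dict[str, int]:
    """Return only the phrases that occur more than `limit` times."""
    return {phrase: n for phrase, n in counts.items() if n > limit}


# Invented sample text standing in for the orchestrator file.
sample = (
    "Do not skip Setup Step 1.4.\n"
    "Reminder: Setup Step 1.4 is a hard gate.\n"
    "auto_mode does not skip setup questions.\n"
)
counts = count_phrases(sample, ["Setup Step 1.4", "auto_mode"])
print(repeated(counts))
```

Run against the real file, a list of the rules the review names would show which ones are stated more than once and are candidates for a single anchored statement.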

Description: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description with excellent trigger term coverage and clear completeness, explicitly listing both what the skill does and when to use it. The main weakness is that the specificity of concrete actions could be improved — the description leans on process-level language ('diagnose → propose → apply → re-validate loop') rather than enumerating discrete actions. The extensive trigger phrase list is a notable strength for skill selection.

Suggestions

Replace or supplement the abstract 'diagnose → propose → apply → re-validate loop' with more concrete action verbs, e.g., 'Analyzes eval failures, rewrites system prompts, updates tool definitions, and re-runs validation tests.'

Dimension, reasoning, and score:

Specificity: 2 / 3

The description mentions a 'diagnose → propose → apply → re-validate loop' and names specific platforms (VAPI, ElevenLabs, pipecat), but the actual concrete actions are somewhat abstract; it describes a process pattern rather than listing specific discrete actions like 'rewrites system prompts', 'updates tool definitions', 'runs eval suites'.

Completeness: 3 / 3

The description explicitly answers both 'what' (the full diagnose → propose → apply → re-validate loop for VAPI, ElevenLabs, and self-hosted agents) and 'when' (extensive list of trigger phrases prefaced with 'Use when'). Both dimensions are clearly addressed.

Trigger Term Quality: 3 / 3

Excellent coverage of natural trigger phrases: 'improve my agent', 'self-improving agent', 'auto-tune my agent', 'iterate on my agent prompt', 'fix my agent based on test results', 'optimize my prompt based on failures', 'rewrite my prompt'. These are highly natural phrases a user would actually say.

Distinctiveness Conflict Risk: 3 / 3

Highly distinctive: it targets a very specific niche of agent self-improvement loops for specific platforms (VAPI, ElevenLabs, pipecat). The combination of agent improvement + specific platforms + eval-driven iteration makes it unlikely to conflict with generic prompt-writing or coding skills.

Total: 11 / 12 (Passed)
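As a quick check, each section total is the sum of its per-dimension scores, each out of 3. The sketch below just re-adds the numbers shown on this page; the dictionary and function names are illustrative and not part of any Tessl tooling.

```python
# Per-dimension scores as listed in the Content and Description sections.
CONTENT = {
    "Conciseness": 1,
    "Actionability": 2,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 3,
}
DESCRIPTION = {
    "Specificity": 2,
    "Completeness": 3,
    "Trigger Term Quality": 3,
    "Distinctiveness Conflict Risk": 3,
}
MAX_PER_DIMENSION = 3  # every dimension is scored out of 3


def total(scores):
    """Return (points earned, points available) for one section."""
    return sum(scores.values()), MAX_PER_DIMENSION * len(scores)


for name, scores in (("Content", CONTENT), ("Description", DESCRIPTION)):
    earned, available = total(scores)
    print(f"{name}: {earned} / {available}")
```

Note that the headline percentages (54% and 89%) are not the raw ratios of these totals (9 / 12 is 75%), so the page presumably applies a weighting that is not stated here.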

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: cekura-ai/cekura-skills
Reviewed
