
cekura-self-improving-agent

Use when the user asks to "improve my agent", "self-improving agent", "auto-tune my agent", "iterate on my agent prompt", "fix my agent based on test results", "close the loop on agent quality", "auto-improve agent prompt", "use eval results to improve agent", "optimize my prompt based on failures", "rewrite my prompt", or describes agent self-improvement, prompt iteration from run results, or automated agent quality loops. Covers the full diagnose → propose → apply → re-validate loop for VAPI agents (squads + tool definitions) and for self-hosted agents (custom websocket servers, including the offline / pasted-prompt degenerate variant).

59

Quality

68%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use

Optimize this skill with Tessl

npx tessl skill review --optimize ./cekura/skills/cekura-self-improving-agent/SKILL.md

Quality

Discovery

82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description excels at trigger term coverage and completeness, providing an extensive list of natural user phrases and clearly stating both what the skill does and when to use it. However, the actual capabilities could be more concretely specified—the 'diagnose → propose → apply → re-validate loop' is somewhat abstract—and some trigger terms like 'rewrite my prompt' risk overlap with general prompt engineering skills.

Suggestions

Make the capabilities more concrete by specifying what 'diagnose' and 'propose' entail, e.g., 'Analyzes eval/test failures to identify prompt weaknesses, generates revised prompt versions, applies changes to agent configs, and re-runs validation.'

Narrow overly broad trigger terms like 'rewrite my prompt' by qualifying them, e.g., 'rewrite my agent prompt based on eval results', to reduce conflict risk with general prompt engineering skills.

Dimension | Reasoning | Score

Specificity

The description mentions a 'diagnose → propose → apply → re-validate loop' and covers 'VAPI agents (squads + tool definitions)' and 'self-hosted agents (custom websocket servers)', which names the domain and some actions, but the concrete actions (diagnose what? propose what exactly?) remain somewhat abstract rather than listing specific discrete capabilities.

2 / 3

Completeness

The description explicitly answers both 'what' (the full diagnose → propose → apply → re-validate loop for VAPI and self-hosted agents) and 'when' (with an extensive 'Use when...' clause listing many trigger phrases and scenarios). Both dimensions are clearly addressed.

3 / 3

Trigger Term Quality

Excellent coverage of natural trigger terms: 'improve my agent', 'self-improving agent', 'auto-tune my agent', 'iterate on my agent prompt', 'fix my agent based on test results', 'optimize my prompt based on failures', 'rewrite my prompt', etc. These are phrases users would naturally say.

3 / 3

Distinctiveness Conflict Risk

While it targets a specific niche (agent self-improvement loops), terms like 'rewrite my prompt' and 'optimize my prompt based on failures' could overlap with general prompt engineering or optimization skills. The VAPI-specific mentions help but the broader terms introduce some conflict risk.

2 / 3

Total: 10 / 12

Passed

Implementation

54%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a sophisticated, well-architected skill with outstanding workflow clarity and progressive disclosure — the phase decomposition, hand-off conditions, and file organization are exemplary. However, it is severely undermined by extreme verbosity: the same rules and warnings are repeated multiple times across sections, the Common Pitfalls section alone could be cut by 60% without losing information, and the skill explains reasoning that Claude doesn't need. The actionability is moderate since all concrete execution details are deferred to sub-files not provided here.

Suggestions

Cut the Common Pitfalls section by at least 50% — most items restate rules already given in the orchestration flow, mode descriptions, or 'When to ask' section. Consolidate into a single authoritative location per rule.

Eliminate repeated explanations of the same concept (e.g., 'auto_mode skips routine gates but not setup questions' appears at least 4 times; 'redeploy_command collected at Setup Step 1.4' appears 6+ times). State each rule once and reference it.

Add at least one concrete, executable example in the SKILL.md itself — e.g., a sample user input and the expected first-phase output, or a concrete tool invocation showing how the orchestrator calls a phase.

Move the lengthy mode/provider descriptions into the already-referenced provider overview files and keep only a 2-3 line summary per mode in SKILL.md.
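The third suggestion above asks for a concrete example in the SKILL.md. The skill's own phase files are not reproduced on this page, so the following is only an illustrative sketch of what a diagnose → propose → apply → re-validate loop could look like. Every name in it (`improvement_loop`, `run_eval`, `propose_fix`, `max_rounds`, and the toy stubs) is hypothetical and not taken from the skill or from any Cekura or VAPI API.

```python
from typing import Callable, List, Tuple


def improvement_loop(
    prompt: str,
    run_eval: Callable[[str], List[str]],
    propose_fix: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Hypothetical loop: evaluate, revise the prompt, re-evaluate.

    run_eval returns the names of failed scenarios (empty list = all passed).
    propose_fix returns a revised prompt given the current failures.
    """
    rounds = 0
    failures = run_eval(prompt)                 # diagnose
    while failures and rounds < max_rounds:     # stop on success or budget
        prompt = propose_fix(prompt, failures)  # propose + apply
        failures = run_eval(prompt)             # re-validate
        rounds += 1
    return prompt, rounds


# Toy stand-ins so the sketch runs end to end (assumed, not real evals).
def toy_eval(prompt: str) -> List[str]:
    return [] if "confirm the date" in prompt else ["booking_without_date"]


def toy_fix(prompt: str, failures: List[str]) -> str:
    return prompt + " Always confirm the date before booking."


final_prompt, rounds_used = improvement_loop(
    "You are a booking agent.", toy_eval, toy_fix
)
```

The `max_rounds` cap stands in for the enumerated stop conditions the review mentions. A real skill would presumably also cover the rollback and overfitting checks described under Workflow Clarity, which this sketch omits.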

DimensionReasoningScore

Conciseness

Extremely verbose at ~450+ lines. Massive amounts of repetition — the same concepts (phase sequencing, auto_mode behavior, Setup Step 1.4 hard gate, redeploy_command, file-source discovery) are restated 3-5 times across different sections. The 'Common Pitfalls' section alone is enormous and repeats guidance already given in the orchestration flow and mode descriptions. Claude does not need concepts like 'what a websocket is' or lengthy explanations of why pre-fetching is bad — a single rule suffices.

1 / 3

Actionability

The skill provides detailed procedural guidance and references to phase files with specific step numbers (e.g., 'Steps COLLECT.1–5'), but the SKILL.md itself contains zero executable code, no concrete command examples, no curl snippets, and no copy-paste-ready templates. All concrete execution details are deferred to sub-files which are not provided. The orchestration logic is described rather than demonstrated with concrete examples.

2 / 3

Workflow Clarity

The multi-step workflow is exceptionally well-sequenced with an ASCII diagram, explicit phase-boundary hand-off conditions, clear loop/exit decision points, validation checkpoints (Sync verifies writes, Overfitting Gate scrubs edits, Eval validates), and explicit feedback loops (drift → rollback to Apply, failure → loop to Collect). Stop conditions are enumerated precisely. The sequential-only constraint is clearly stated with rationale.

3 / 3

Progressive Disclosure

Excellent progressive disclosure structure. The SKILL.md serves as a clear orchestrator/overview with well-signaled one-level-deep references to phase files, provider files, and reference files. The directory layout is provided, every referenced file has a relative link and a one-line description of its responsibility. Content is appropriately split — phases, providers, and cross-cutting references each have their own files.

3 / 3

Total: 9 / 12

Passed
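As a quick check on the arithmetic, the per-dimension scores listed in the two tables can be summed to reproduce the reported totals. This is a minimal sketch that assumes each dimension is scored out of 3, as the tables show. The headline percentages (82% and 54%) are not reproduced here because the page does not state how they are derived. The names `discovery`, `implementation`, and `tally` are illustrative only.

```python
# Per-dimension scores copied from the two review tables on this page.
discovery = {
    "Specificity": 2,
    "Completeness": 3,
    "Trigger Term Quality": 3,
    "Distinctiveness Conflict Risk": 2,
}
implementation = {
    "Conciseness": 1,
    "Actionability": 2,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 3,
}

MAX_PER_DIMENSION = 3  # each row is reported as "n / 3"


def tally(scores: dict) -> tuple:
    """Return (points earned, points available) for one table."""
    return sum(scores.values()), MAX_PER_DIMENSION * len(scores)


print(tally(discovery))       # (10, 12)
print(tally(implementation))  # (9, 12)
```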

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository
cekura-ai/cekura-skills
Reviewed
