
agent-orchestration-improve-agent

Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.

Quality: 13% (Does it follow best practices?)

Impact: 1.24x, 81% (average score across 3 eval scenarios)

Security (by Snyk): Passed, no known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/agent-orchestration-improve-agent/SKILL.md

Quality

Discovery

14%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description reads like a high-level abstract rather than a functional skill description. It relies on vague buzzwords ('systematic improvement', 'continuous iteration') without specifying concrete actions, lacks any 'Use when...' trigger guidance, and is too generic to be reliably distinguished from other optimization or analysis skills.

Suggestions

Add a 'Use when...' clause with specific trigger terms like 'improve agent performance', 'optimize prompts', 'agent not working well', 'evaluate agent outputs', 'agent accuracy'.

Replace abstract phrases with concrete actions, e.g., 'Analyzes agent outputs against expected results, rewrites system prompts, designs evaluation datasets, and iterates on tool definitions to improve agent reliability.'

Add distinguishing details about what kind of agents (LLM agents, Claude agents) and what specific improvement methods are used to reduce overlap with generic optimization or debugging skills.
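Taken together, a rewritten description following these suggestions might look like the sketch below. This is illustrative only: the frontmatter field names follow the common SKILL.md convention (name, description), and the wording is an example, not the skill's actual metadata.

```markdown
---
name: agent-orchestration-improve-agent
description: >
  Analyzes LLM agent outputs against expected results, rewrites system
  prompts, designs evaluation datasets, and iterates on tool definitions
  to improve agent reliability. Use when asked to improve agent
  performance, optimize prompts, debug an agent that is not working
  well, or evaluate agent outputs for accuracy.
---
```

A description in this shape packs concrete actions, a 'Use when...' trigger clause, and the target domain (LLM agents) into the same field the router reads during discovery.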

Dimension scores:

Specificity: 1 / 3. The description uses vague, abstract language like 'systematic improvement', 'performance analysis', and 'continuous iteration' without listing concrete actions. These are buzzwords rather than specific capabilities.

Completeness: 1 / 3. The 'what' is vague (no concrete actions listed), and there is no 'when' clause or explicit trigger guidance at all. The missing 'Use when...' clause caps this at 2, but the weak 'what' brings it to 1.

Trigger Term Quality: 2 / 3. Contains some relevant keywords like 'agents', 'prompt engineering', and 'performance analysis' that users might mention, but misses common variations like 'optimize prompts', 'debug agent', 'improve accuracy', 'eval', 'benchmarks', or 'agent tuning'.

Distinctiveness / Conflict Risk: 1 / 3. Very generic phrasing like 'systematic improvement' and 'performance analysis' could overlap with many skills related to code optimization, testing, debugging, or general performance tuning. 'Existing agents' is somewhat specific but not enough to create a clear niche.

Total: 5 / 12 (Passed)

Implementation

12%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is a comprehensive but overly verbose and abstract guide to agent optimization. It reads more like a general methodology document than an actionable skill for Claude, explaining many concepts Claude already understands (A/B testing, version control, staged rollouts) without providing concrete, executable guidance. The pseudocode tool references (context-manager, prompt-engineer, parallel-test-runner) are undefined, making the skill largely non-actionable despite its extensive length.

Suggestions

Replace pseudocode tool references with actual executable commands or concrete code examples that Claude can directly use, or clearly document what these tools are and how to invoke them.

Cut the content by at least 60%—remove explanations of well-known concepts (A/B testing methodology, version numbering semantics, what hallucination means) and focus only on project-specific patterns and decisions.

Split into multiple files: keep SKILL.md as a concise overview with phases summarized in 2-3 lines each, then link to separate files like TESTING.md, DEPLOYMENT.md, and PROMPT-ENGINEERING.md for detailed guidance.

Add explicit validation checkpoints between phases (e.g., 'Do not proceed to Phase 3 until baseline metrics document exists and has been reviewed') to create concrete feedback loops.
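A trimmed SKILL.md skeleton combining the last two suggestions could look like the sketch below. The linked file names (TESTING.md, DEPLOYMENT.md, PROMPT-ENGINEERING.md) are hypothetical; they echo the suggested split and do not exist in the current skill.

```markdown
# Improve Agent

## Phase 1: Analyze
Collect failing transcripts and record baseline metrics.
Checkpoint: do not proceed until a baseline metrics document
exists and has been reviewed.

## Phase 2: Improve
Rewrite the system prompt against the observed failure categories.
Details: [PROMPT-ENGINEERING.md](PROMPT-ENGINEERING.md)

## Phase 3: Test
Re-run the eval set and compare against the baseline.
Details: [TESTING.md](TESTING.md)
Checkpoint: every regression is triaged before deploying.

## Phase 4: Deploy
Staged rollout with explicit rollback triggers.
Details: [DEPLOYMENT.md](DEPLOYMENT.md)
```

Each phase is summarized in two or three lines, the checkpoints make the phase gates explicit, and the detailed guidance moves behind links, which addresses both the conciseness and progressive-disclosure scores at once.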

Dimension scores:

Conciseness: 1 / 3. Extremely verbose at ~300+ lines. Much of the content describes general software engineering practices (A/B testing, staged rollouts, version management) and agent optimization concepts that Claude already knows. The skill reads like a textbook chapter rather than a concise reference. Lists like 'Constitutional AI Integration' and 'Human Evaluation Protocol' explain well-known concepts without adding novel, project-specific value.

Actionability: 1 / 3. Despite its length, the skill contains no executable code or concrete commands. The code blocks are pseudocode placeholders (e.g., 'Use: context-manager', 'Use: prompt-engineer', 'Use: parallel-test-runner') referencing tools that aren't defined or explained. Template fields like '[X%]' and '[Y]' provide no real guidance. There's nothing copy-paste ready or directly executable.

Workflow Clarity: 2 / 3. The four-phase structure provides a clear sequence (Analyze → Improve → Test → Deploy), and rollback triggers are explicitly defined. However, validation checkpoints between phases are implicit rather than explicit, and the feedback loop between testing failure and re-optimization is not clearly articulated as a concrete decision point. The rollback section is good, but the overall workflow lacks tight integration between steps.

Progressive Disclosure: 1 / 3. This is a monolithic wall of text with no references to external files. All content, from performance analysis to deployment procedures to continuous monitoring, is inlined in a single massive document. Content like the A/B testing framework, human evaluation protocol, and deployment strategies could easily be split into separate reference files. No navigation aids or cross-references exist.

Total: 5 / 12 (Passed)

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed

Validation for skill structure:

frontmatter_unknown_keys: Warning. Unknown frontmatter key(s) found; consider removing or moving to metadata.

Total: 10 / 11 (Passed)
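To clear the frontmatter warning, any unrecognized top-level keys can be nested under a metadata block, as the validator message suggests. The sketch below is a hypothetical example; the actual offending key names depend on the skill's current frontmatter.

```markdown
---
name: agent-orchestration-improve-agent
description: ...
metadata:
  author: sickn33     # example: previously an unknown top-level key
  version: 1.0.0      # example: previously an unknown top-level key
---
```

Moving rather than deleting the keys preserves the information while keeping the top level restricted to recognized fields.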

Repository: sickn33/antigravity-awesome-skills (Reviewed)

