
agent-orchestration-improve-agent

Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.

Score: 30

Quality: 13% (Does it follow best practices?)
Impact: Pending (No eval scenarios have been run)
Security (by Snyk): Passed (No known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/antigravity-agent-orchestration-improve-agent/SKILL.md

Quality

Discovery: 14%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description relies heavily on abstract buzzwords ('systematic improvement', 'continuous iteration') without specifying concrete actions or when the skill should be triggered. It lacks a 'Use when...' clause and would be difficult for Claude to distinguish from other optimization or prompt-related skills. The description needs significant reworking to be functional in a multi-skill selection context.

Suggestions

Add a 'Use when...' clause with specific trigger scenarios, e.g., 'Use when the user wants to improve an existing agent's performance, debug agent behavior, refine prompts, or run evaluations against benchmarks.'

Replace abstract language with concrete actions, e.g., 'Analyzes agent outputs against expected results, rewrites system prompts for clarity and accuracy, designs evaluation test cases, and iterates on tool definitions.'

Include natural trigger terms users would say, such as 'optimize agent', 'improve prompts', 'agent not working', 'eval results', 'agent debugging', or 'prompt tuning'.
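Applying these suggestions, a revised frontmatter description might look like the following (the `name` value is taken from this page; the wording is an illustrative sketch, not the maintainer's actual text):

```yaml
name: agent-orchestration-improve-agent
description: >
  Improves existing agents by analyzing outputs against expected results,
  rewriting system prompts for clarity and accuracy, designing evaluation
  test cases, and iterating on tool definitions. Use when the user wants
  to optimize an agent, improve prompts, debug agent behavior, or review
  eval results against benchmarks.
```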

Dimension scores

Specificity (1 / 3): The description uses vague, abstract language like 'systematic improvement', 'performance analysis', and 'continuous iteration' without listing concrete actions. These are buzzwords rather than specific capabilities.

Completeness (1 / 3): The 'what' is vaguely stated and the 'when' is entirely missing. There is no 'Use when...' clause or equivalent explicit trigger guidance, which per the rubric should cap completeness at 2, but since the 'what' is also weak, this scores a 1.

Trigger Term Quality (2 / 3): Contains some relevant keywords like 'agents', 'prompt engineering', and 'performance analysis' that users might mention, but misses common variations like 'optimize prompts', 'debug agent', 'improve accuracy', 'eval', 'benchmarks', or 'agent tuning'.

Distinctiveness / Conflict Risk (1 / 3): The description is very generic and could overlap with many skills related to code improvement, debugging, optimization, or general prompt writing. 'Systematic improvement' and 'continuous iteration' are too broad to carve out a clear niche.

Total: 5 / 12 (Passed)

Implementation: 12%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is a comprehensive but overly verbose conceptual framework for agent optimization that reads more like a textbook than an actionable skill. It explains many concepts Claude already knows (A/B testing, semantic versioning, chain-of-thought prompting) without providing concrete, executable guidance. The references to tools like 'context-manager' and 'prompt-engineer' are undefined, making the workflow aspirational rather than actionable.

Suggestions

Cut content by 60-70% by removing explanations of concepts Claude already knows (A/B testing methodology, semantic versioning, chain-of-thought prompting basics) and focus only on project-specific configurations and decision criteria.

Replace pseudocode tool references ('Use: context-manager', 'Use: prompt-engineer') with actual executable commands, scripts, or concrete file paths that Claude can use, or remove them if the tools don't exist.

Split the monolithic content into separate files (e.g., TESTING.md, DEPLOYMENT.md, METRICS.md) and keep SKILL.md as a concise overview with clear references to each.

Add explicit validation gates between phases (e.g., 'Do not proceed to Phase 2 until baseline metrics are documented in metrics.json') with concrete verification steps.
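As a sketch of the last suggestion, a validation gate between phases can be a small script rather than prose. The `metrics.json` filename and the `gate_phase2` helper are hypothetical conventions, not something the skill currently defines:

```shell
#!/bin/sh
# Hypothetical phase gate: block Phase 2 (improvement) until the
# Phase 1 baseline metrics file exists and is non-empty.
gate_phase2() {
  metrics="$1"
  if [ ! -s "$metrics" ]; then
    # Missing or empty file: refuse to proceed.
    echo "gate: baseline metrics missing at $metrics; finish Phase 1 first" >&2
    return 1
  fi
  echo "gate: baseline metrics found; Phase 2 may proceed"
}
```

A SKILL.md instruction like "run `gate_phase2 metrics.json` and stop on failure" gives the agent a concrete, checkable step instead of an implicit checkpoint.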

Dimension scores

Conciseness (1 / 3): Extremely verbose at ~300+ lines. Much of the content is generic knowledge Claude already possesses (what A/B testing is, what Cohen's d means, what semantic versioning is, what chain-of-thought prompting is). Lists like 'Correction patterns, Clarification requests, Task abandonment' are things Claude can generate on its own. The skill reads more like a textbook chapter than a concise operational guide.

Actionability (1 / 3): Despite its length, the skill contains no executable code or concrete commands. References like 'Use: context-manager' and 'Use: prompt-engineer' are pseudocode pointing to undefined tools. The code blocks are templates with placeholders (e.g., '[X%]', '$ARGUMENTS') rather than copy-paste-ready instructions. There's no concrete guidance on how to actually implement any of these steps.

Workflow Clarity (2 / 3): The four-phase structure provides a clear sequence (analyze → improve → test → deploy), and the rollback triggers/procedures are well-defined with explicit thresholds. However, validation checkpoints between phases are implicit rather than explicit, and the feedback loop between testing failure and re-optimization is not clearly articulated as a concrete decision point.

Progressive Disclosure (1 / 3): The entire skill is a monolithic wall of text with no references to external files or bundle resources. Content that could be split into separate reference files (evaluation rubrics, test frameworks, deployment checklists) is all inline, making the skill extremely long. No bundle files are provided or referenced.

Total: 5 / 12 (Passed)

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 10 / 11 Passed

Validation for skill structure

frontmatter_unknown_keys (Warning): Unknown frontmatter key(s) found; consider removing or moving to metadata.
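The warning message itself points at the fix: nest non-standard keys under `metadata` instead of leaving them at the top level. The page does not say which key triggered the warning, so the `author` key below is purely hypothetical:

```yaml
# Before: an unrecognized top-level key trips frontmatter_unknown_keys
#   author: boisenoise        # hypothetical offending key

# After: moved under metadata, where the validator does not flag it
name: agent-orchestration-improve-agent
description: Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
metadata:
  author: boisenoise
```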

Total: 10 / 11 (Passed)

Repository: boisenoise/skills-collections (Reviewed)
