Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
Impact: 81% — 1.24x average score across 3 eval scenarios. Passed; no known issues.
Optimize this skill with Tessl:

`npx tessl skill review --optimize ./skills/antigravity-agent-orchestration-improve-agent/SKILL.md`

Quality
Discovery — 14%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description relies heavily on abstract buzzwords ('systematic improvement', 'continuous iteration') without specifying concrete actions or when the skill should be triggered. It lacks a 'Use when...' clause and is too generic to be reliably distinguished from other skills related to prompt engineering, agent development, or optimization.
Suggestions

- Add a 'Use when...' clause with specific trigger scenarios, e.g., 'Use when the user wants to improve an existing agent's performance, debug agent behavior, refine prompts, or run evaluations.'
- Replace abstract language with concrete actions, e.g., 'Analyzes agent outputs against expected results, rewrites system prompts, designs evaluation criteria, and iterates on tool configurations.'
- Include natural trigger terms users would say, such as 'optimize agent', 'improve prompts', 'agent not working well', 'eval results', 'prompt tuning', or 'agent debugging'.
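Taken together, a revised frontmatter description might look like the following sketch. The wording is hypothetical and the field names assume common SKILL.md frontmatter conventions:

```yaml
---
name: improve-agent
description: >
  Improves an existing agent by analyzing eval results, rewriting system
  prompts, and iterating on tool configurations. Use when the user wants to
  optimize an agent, refine prompts, debug unexpected agent behavior, or act
  on eval scores.
---
```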
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague, abstract language like 'systematic improvement', 'performance analysis', and 'continuous iteration' without listing concrete actions. These are buzzwords rather than specific capabilities. | 1 / 3 |
| Completeness | The 'what' is vaguely stated and the 'when' is entirely missing. There is no 'Use when...' clause or equivalent explicit trigger guidance, which per the rubric should cap completeness at 2, but since the 'what' is also weak, this scores a 1. | 1 / 3 |
| Trigger Term Quality | Contains some relevant keywords like 'agents', 'prompt engineering', and 'performance analysis' that users might mention, but misses common variations like 'optimize prompts', 'debug agent', 'improve accuracy', 'eval', 'benchmarks', or 'agent tuning'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The description is very generic and could overlap with many skills related to prompt writing, code optimization, debugging, testing, or general agent development. 'Systematic improvement' and 'continuous iteration' are too broad to carve out a clear niche. | 1 / 3 |
| Total | | 5 / 12 — Passed |
Implementation — 12%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is a comprehensive but overly verbose and abstract guide that reads more like a consulting framework than actionable instructions for Claude. It explains many concepts Claude already knows (A/B testing, semantic versioning, chain-of-thought prompting) without providing any truly executable code, real tool commands, or concrete examples. The four-phase structure provides reasonable workflow clarity, but the lack of actionable specifics and the monolithic format significantly reduce its utility.
Suggestions

- Cut content by 60-70%: remove explanations of well-known concepts (A/B testing, semantic versioning, Cohen's d, chain-of-thought prompting) and focus only on project-specific conventions and decisions Claude wouldn't already know.
- Replace pseudo-tool invocations with real, executable commands or code snippets. If 'context-manager' and 'prompt-engineer' are actual tools, provide their real CLI syntax with concrete examples; if not, remove them.
- Split into multiple files: keep SKILL.md as a concise overview (<50 lines) with links to separate files for each phase (e.g., ANALYSIS.md, PROMPT_ENGINEERING.md, TESTING.md, DEPLOYMENT.md).
- Add at least one concrete, end-to-end worked example showing a specific agent optimization (e.g., 'Agent had 60% task completion → identified missing few-shot examples → added 3 examples → completion rose to 82%') with actual prompt diffs.
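The bracketed placeholders like '[X%]' could be backed by an actual measurement instead of hand-filled values. A minimal sketch, assuming a hypothetical eval log of per-run records with a boolean outcome, that computes the task-completion rate referenced in the worked example:

```python
# Hypothetical eval log: one record per run, with a boolean outcome.
LOG = [
    {"task": "t1", "completed": True},
    {"task": "t2", "completed": False},
    {"task": "t3", "completed": True},
]

def completion_rate(runs):
    """Fraction of runs marked completed (0.0 for an empty log)."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if r.get("completed")) / len(runs)

print(round(completion_rate(LOG), 2))  # 0.67
```

Running the same computation before and after a prompt change yields the '60% → 82%' style numbers the suggestion asks for, rather than unverifiable placeholders.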
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~300+ lines. Much of the content is generic knowledge Claude already possesses (what A/B testing is, what Cohen's d means, what semantic versioning is, what chain-of-thought prompting is). Lists like 'correction patterns, clarification requests, task abandonment' are obvious categories that don't need enumeration. The skill reads like a textbook chapter rather than actionable instructions. | 1 / 3 |
| Actionability | Despite its length, the skill contains no executable code or concrete commands. Pseudo-tool invocations like 'Use: context-manager Command: analyze-agent-performance $ARGUMENTS --days 30' and 'Use: prompt-engineer Technique: chain-of-thought-optimization' are not real executable commands—they reference undefined tools with placeholder syntax. The templates use bracket placeholders like '[X%]' without showing how to actually compute or collect these values. | 1 / 3 |
| Workflow Clarity | The four-phase structure provides a clear sequence (analyze → improve → test → deploy), and the rollback triggers/procedures in Phase 4 represent meaningful validation checkpoints. However, the phases are so broad and abstract that the actual steps within each phase lack concrete validation gates. The rollback section is the strongest part but relies on undefined monitoring infrastructure. | 2 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with no references to external files. All content is inline despite being far too long for a single SKILL.md. Phases 2-4 could each be separate reference documents. There are no links to supplementary materials, templates, or examples files that would make this navigable. | 1 / 3 |
| Total | | 5 / 12 — Passed |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

10 / 11 checks passed.
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
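One way to clear the frontmatter_unknown_keys warning, assuming the skill spec accepts a metadata map for custom fields, is to nest the offending keys under it. The key names below are hypothetical placeholders:

```yaml
---
name: improve-agent
description: Systematic improvement of existing agents ...
metadata:
  owner: platform-team       # formerly an unknown top-level key
  last-reviewed: 2024-06-01  # formerly an unknown top-level key
---
```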