Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
Overall score: 61

- Quality: 47% (does it follow best practices?)
- Impact: 81%, 1.24x average score across 3 eval scenarios
- Passed, no known issues
Optimize this skill with Tessl
`npx tessl skill review --optimize ./skills/antigravity-agent-orchestration-improve-agent/SKILL.md`

Quality
Discovery
32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear domain (agent improvement) but relies on abstract terminology rather than concrete actions. The complete absence of a 'Use when...' clause significantly weakens its utility for skill selection, and the trigger terms used are more technical than what users would naturally say.
Suggestions
Add an explicit 'Use when...' clause with natural trigger phrases like 'when the user wants to improve, debug, or optimize an existing agent' or 'when agent performance is poor'.
Replace abstract terms with concrete actions such as 'analyze agent logs', 'rewrite system prompts', 'add error handling', or 'tune temperature settings'.
Include common user phrasings like 'fix my agent', 'agent not working', 'make agent better', or 'agent keeps failing' to improve trigger term coverage.
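Taken together, the three suggestions could be folded into the frontmatter along these lines. This is only a sketch: the skill name is taken from the review command on this page, but the description wording is illustrative, not the skill's actual frontmatter.

```yaml
# Hypothetical revised SKILL.md frontmatter (wording is a sketch, not the
# skill's real description). Note the explicit "Use when..." clause and the
# natural-language trigger phrases the review asks for.
name: antigravity-agent-orchestration-improve-agent
description: >
  Improve existing agents: analyze agent logs, rewrite system prompts,
  add error handling, and tune model settings. Use when the user wants to
  improve, debug, or optimize an existing agent, or says things like
  "fix my agent", "agent not working", "make agent better", or
  "agent keeps failing".
```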
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (agent improvement) and some actions (performance analysis, prompt engineering, continuous iteration), but these are somewhat abstract rather than concrete specific actions like 'analyze error logs' or 'rewrite system prompts'. | 2 / 3 |
| Completeness | Describes what the skill does but completely lacks a 'Use when...' clause or any explicit trigger guidance. Per rubric guidelines, missing explicit trigger guidance caps completeness at 2, and this is weak enough to warrant a 1. | 1 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'agents', 'prompt engineering', and 'performance analysis', but missing common variations users might say like 'fix my agent', 'agent not working', 'improve prompts', 'debug agent', or 'optimize agent'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The focus on 'agents' provides some specificity, but 'prompt engineering' and 'performance analysis' could overlap with general coding skills or other optimization-focused skills. Not clearly distinct enough. | 2 / 3 |
| Total | | 7 / 12 Passed |
Implementation
62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides a comprehensive framework for agent optimization with strong workflow structure and clear validation checkpoints. However, it suffers from verbosity, explaining concepts Claude already understands, and lacks truly executable code examples—most commands are placeholder syntax rather than real implementations. The monolithic structure could benefit from splitting detailed content into referenced files.
Suggestions
Replace placeholder command syntax (e.g., 'Use: context-manager') with actual executable code or CLI commands that Claude can run
Remove explanatory content about well-known concepts (A/B testing basics, semantic versioning) to improve token efficiency
Split detailed content like test suite templates, evaluation rubrics, and example prompts into separate referenced files
Add concrete input/output examples showing actual prompt improvements with before/after comparisons
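As one hedged illustration of the first suggestion, a placeholder step like 'Use: context-manager' could be replaced by a small script the agent can actually run. The JSONL transcript format and its `level`/`message` fields are assumptions for this sketch, not something the skill defines.

```python
import json
from collections import Counter

# Assumed transcript format: one JSON event per line with "level" and
# "message" fields (illustrative; the real skill does not specify a format).
sample = [
    '{"level": "error", "message": "tool call timed out"}',
    '{"level": "info", "message": "step completed"}',
    '{"level": "error", "message": "tool call timed out"}',
]

def top_errors(lines, n=5):
    """Return the n most frequent error messages in an agent transcript."""
    errors = Counter(
        json.loads(line)["message"]
        for line in lines
        if json.loads(line)["level"] == "error"
    )
    return errors.most_common(n)

print(top_errors(sample))  # → [('tool call timed out', 2)]
```

A copy-paste-ready step like this gives the agent a concrete action and a checkable output, which is what distinguishes executable guidance from placeholder syntax.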
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill contains useful content but is verbose in places, explaining concepts Claude likely knows (e.g., what A/B testing is, basic versioning semantics). Some sections like 'Continuous Improvement Cycle' and 'Post-Deployment Review' add padding without actionable specifics. | 2 / 3 |
| Actionability | Provides structured guidance and some command examples, but most code blocks are pseudocode or placeholder syntax (e.g., 'Use: context-manager', 'Use: prompt-engineer') rather than executable commands. Lacks concrete, copy-paste ready implementations. | 2 / 3 |
| Workflow Clarity | Clear four-phase workflow with explicit sequencing, validation checkpoints (rollback triggers, success criteria), and feedback loops (detect → alert → rollback → analyze → fix). The staged rollout and rollback procedures provide strong error recovery guidance. | 3 / 3 |
| Progressive Disclosure | Content is well-organized with clear headers and phases, but everything is inline in one large document. No references to external files for detailed content like test suite templates, evaluation rubrics, or example prompts that could be split out. | 2 / 3 |
| Total | | 9 / 12 Passed |
Validation
90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
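The single warning is straightforward to clear. A hedged sketch of the fix, assuming the skill format accepts a nested `metadata` mapping for non-standard keys (the `custom_owner` key name here is invented for illustration):

```yaml
# Before: an unrecognized top-level key trips frontmatter_unknown_keys.
#   custom_owner: platform-team        # <- unknown key (illustrative name)
#
# After: delete the key, or nest it under metadata as the warning suggests.
name: antigravity-agent-orchestration-improve-agent
metadata:
  custom_owner: platform-team
```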