
agent-orchestration-improve-agent

Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.

Score: 30

Quality: 13% (Does it follow best practices?)
Impact: Pending (No eval scenarios have been run)
Security (by Snyk): Passed (No known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/antigravity-agent-orchestration-improve-agent/SKILL.md

Quality

Discovery: 14%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description relies heavily on abstract buzzwords ('systematic improvement', 'continuous iteration') without specifying concrete actions or when the skill should be triggered. It lacks a 'Use when...' clause and would be difficult for Claude to distinguish from other optimization or prompt-related skills. The description needs significant reworking to be functional in a multi-skill selection context.

Suggestions

Add a 'Use when...' clause with specific trigger scenarios, e.g., 'Use when the user wants to improve an existing agent's performance, debug agent behavior, refine prompts, or run evaluations against benchmarks.'

Replace abstract language with concrete actions, e.g., 'Analyzes agent outputs against expected results, rewrites system prompts for clarity and accuracy, designs evaluation test cases, and iterates on tool definitions.'

Include natural trigger terms users would say, such as 'optimize agent', 'improve prompts', 'agent not working', 'eval results', 'agent debugging', or 'prompt tuning'.
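Applying these suggestions, a revised frontmatter description might look like the following (the `name` value is taken from this page; the wording is an illustrative sketch, not the maintainer's actual text):

```yaml
name: agent-orchestration-improve-agent
description: >
  Improves existing agents by analyzing outputs against expected results,
  rewriting system prompts for clarity and accuracy, designing evaluation
  test cases, and iterating on tool definitions. Use when the user wants
  to optimize an agent, improve prompts, debug agent behavior, or review
  eval results against benchmarks.
```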

Dimension scores

Specificity (1 / 3): The description uses vague, abstract language like 'systematic improvement', 'performance analysis', and 'continuous iteration' without listing concrete actions. These are buzzwords rather than specific capabilities.

Completeness (1 / 3): The 'what' is vaguely stated and the 'when' is entirely missing. There is no 'Use when...' clause or equivalent explicit trigger guidance, which per the rubric should cap completeness at 2, but since the 'what' is also weak, this scores a 1.

Trigger Term Quality (2 / 3): Contains some relevant keywords like 'agents', 'prompt engineering', and 'performance analysis' that users might mention, but misses common variations like 'optimize prompts', 'debug agent', 'improve accuracy', 'eval', 'benchmarks', or 'agent tuning'.

Distinctiveness / Conflict Risk (1 / 3): The description is very generic and could overlap with many skills related to code improvement, debugging, optimization, or general prompt writing. 'Systematic improvement' and 'continuous iteration' are too broad to carve out a clear niche.

Total: 5 / 12 (Passed)

Implementation: 12%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is a comprehensive but overly verbose conceptual framework for agent optimization that reads more like a textbook than an actionable skill. It explains many concepts Claude already knows (A/B testing, semantic versioning, chain-of-thought prompting) without providing concrete, executable guidance. The references to tools like 'context-manager' and 'prompt-engineer' are undefined, making the workflow aspirational rather than actionable.

Suggestions

Cut content by 60-70% by removing explanations of concepts Claude already knows (A/B testing methodology, semantic versioning, chain-of-thought prompting basics) and focus only on project-specific configurations and decision criteria.

Replace pseudocode tool references ('Use: context-manager', 'Use: prompt-engineer') with actual executable commands, scripts, or concrete file paths that Claude can use, or remove them if the tools don't exist.

Split the monolithic content into separate files (e.g., TESTING.md, DEPLOYMENT.md, METRICS.md) and keep SKILL.md as a concise overview with clear references to each.

Add explicit validation gates between phases (e.g., 'Do not proceed to Phase 2 until baseline metrics are documented in metrics.json') with concrete verification steps.
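As a sketch of the last suggestion, a validation gate between phases can be a small script rather than prose. The `metrics.json` filename and the `gate_phase2` helper are hypothetical conventions, not something the skill currently defines:

```shell
#!/bin/sh
# Hypothetical phase gate: block Phase 2 (improvement) until the
# Phase 1 baseline metrics file exists and is non-empty.
gate_phase2() {
  metrics="$1"
  if [ ! -s "$metrics" ]; then
    # Missing or empty file: refuse to proceed.
    echo "gate: baseline metrics missing at $metrics; finish Phase 1 first" >&2
    return 1
  fi
  echo "gate: baseline metrics found; Phase 2 may proceed"
}
```

A SKILL.md instruction like "run `gate_phase2 metrics.json` and stop on failure" gives the agent a concrete, checkable step instead of an implicit checkpoint.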

Dimension scores

Conciseness (1 / 3): Extremely verbose at ~300+ lines. Much of the content is generic knowledge Claude already possesses (what A/B testing is, what Cohen's d means, what semantic versioning is, what chain-of-thought prompting is). Lists like 'Correction patterns, Clarification requests, Task abandonment' are things Claude can generate on its own. The skill reads more like a textbook chapter than a concise operational guide.

Actionability (1 / 3): Despite its length, the skill contains no executable code or concrete commands. References like 'Use: context-manager' and 'Use: prompt-engineer' are pseudocode pointing to undefined tools. The code blocks are templates with placeholders (e.g., '[X%]', '$ARGUMENTS') rather than copy-paste-ready instructions. There's no concrete guidance on how to actually implement any of these steps.

Workflow Clarity (2 / 3): The four-phase structure provides a clear sequence (analyze → improve → test → deploy), and the rollback triggers/procedures are well-defined with explicit thresholds. However, validation checkpoints between phases are implicit rather than explicit, and the feedback loop between testing failure and re-optimization is not clearly articulated as a concrete decision point.

Progressive Disclosure (1 / 3): The entire skill is a monolithic wall of text with no references to external files or bundle resources. Content that could be split into separate reference files (evaluation rubrics, test frameworks, deployment checklists) is all inline, making the skill extremely long. No bundle files are provided or referenced.

Total: 5 / 12 (Passed)

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 10 / 11 Passed

Validation for skill structure

frontmatter_unknown_keys (Warning): Unknown frontmatter key(s) found; consider removing or moving to metadata.
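The warning message itself points at the fix: nest non-standard keys under `metadata` instead of leaving them at the top level. The page does not say which key triggered the warning, so the `author` key below is purely hypothetical:

```yaml
# Before: an unrecognized top-level key trips frontmatter_unknown_keys
#   author: boisenoise        # hypothetical offending key

# After: moved under metadata, where the validator does not flag it
name: agent-orchestration-improve-agent
description: Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
metadata:
  author: boisenoise
```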

Total: 10 / 11 (Passed)

Repository: boisenoise/skills-collections (Reviewed)
