
tdg-personal/agent-harness-construction

Design and optimize AI agent action spaces, tool definitions, and observation formatting for higher completion rates.


Quality: 41%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run

Security (by Snyk): Passed
No known issues


Quality

Discovery: 32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a specific domain (AI agent design) and mentions several relevant concepts, but remains at a somewhat abstract level without concrete actions or deliverables. The biggest weakness is the complete absence of a 'Use when...' clause, which makes it harder for Claude to know when to select this skill. Adding explicit trigger conditions and more natural user-facing keywords would significantly improve selection accuracy.

Suggestions

Add a 'Use when...' clause with trigger phrases like 'Use when designing agent tools, optimizing function calling schemas, improving agent task completion, or structuring tool-use APIs.'

Include more natural keyword variations users might say, such as 'function calling', 'tool use', 'agentic workflows', 'agent prompting', or 'ReAct patterns'.

List more concrete actions like 'define tool schemas', 'structure observation payloads', 'reduce action space complexity', or 'format tool responses' to improve specificity.

Dimension / Reasoning / Score

Specificity

Names the domain (AI agent design) and some actions ('design and optimize', 'action spaces', 'tool definitions', 'observation formatting'), but these are still somewhat abstract and don't list concrete deliverables or operations like 'generate tool schemas' or 'refactor action enums'.

2 / 3

Completeness

Describes what the skill does but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and since the 'what' is also somewhat vague, this scores a 1.

1 / 3

Trigger Term Quality

Includes relevant terms like 'AI agent', 'action spaces', 'tool definitions', and 'observation formatting', but these are fairly technical. Missing common user phrasings like 'agent tools', 'function calling', 'tool use', 'agent design', 'prompt engineering for agents', or 'agentic workflows'.

2 / 3

Distinctiveness / Conflict Risk

The focus on AI agent action spaces and tool definitions is a reasonably specific niche, but could overlap with general prompt engineering skills, API design skills, or broader AI development skills. The lack of explicit triggers increases conflict risk.

2 / 3

Total: 7 / 12

Passed

Implementation: 22%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill reads as a high-level checklist of agent design principles rather than actionable construction guidance. It lacks concrete examples (e.g., a sample tool definition schema, an example observation payload, a worked example of error recovery), executable code, and any sequenced workflow for actually building or improving an agent harness. The organization is decent but the content is too abstract to meaningfully change Claude's behavior.
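As a point of reference, the kind of concrete artifact this critique asks for might look like the sketch below: a minimal tool definition with a JSON-Schema-style input contract and a matching observation payload, written as Python dicts. The tool name, fields, and values are hypothetical and are not drawn from the skill itself.

```python
# Hypothetical tool definition: stable name, narrow schema-first inputs.
SEARCH_ISSUES_TOOL = {
    "name": "search_issues",
    "description": "Search the issue tracker and return at most `limit` matches.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search query."},
            "status": {"type": "string", "enum": ["open", "closed", "all"]},
            "limit": {"type": "integer", "minimum": 1, "maximum": 20, "default": 5},
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}

# Hypothetical observation payload returned after a call to that tool:
# a small, predictable envelope rather than a raw API dump.
EXAMPLE_OBSERVATION = {
    "tool": "search_issues",
    "status": "ok",       # "ok" or "error"
    "result_count": 2,
    "results": [
        {"id": "ISS-101", "title": "Login page times out", "status": "open"},
        {"id": "ISS-204", "title": "Timeout on slow networks", "status": "closed"},
    ],
    "truncated": False,   # set when results were cut to fit the context window
}
```

Even a single pair like this gives Claude something to copy and adapt, which is what the Actionability dimension below measures.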

Suggestions

Add concrete, executable examples: a sample tool definition JSON schema, an example observation response payload, and a before/after example of improving a poorly designed action space.

Introduce a step-by-step workflow for auditing and improving an existing agent harness, with explicit validation checkpoints (e.g., 'Run the benchmark suite after each action space change; only proceed if completion rate holds or improves'); a sketch of such a loop follows this list.

Replace abstract guidance like 'Use stable, explicit tool names' with specific patterns and anti-pattern examples showing the actual tool definitions side by side.

Split detailed sections (e.g., observation design, error recovery contracts) into referenced files and keep SKILL.md as a concise overview with navigation links.
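Following up on the workflow suggestion above, here is a minimal sketch of a gated iteration loop. It assumes a hypothetical run_benchmark() helper that returns a task completion rate and change objects with an apply() method; the threshold logic is illustrative only.

```python
# Illustrative audit loop: apply one action-space change at a time and
# keep it only if the benchmark completion rate holds or improves.
def audit_harness(harness, proposed_changes, run_benchmark):
    """run_benchmark(harness) -> float is a hypothetical completion-rate metric."""
    baseline = run_benchmark(harness)
    for change in proposed_changes:
        candidate = change.apply(harness)  # e.g. merge two overlapping tools
        score = run_benchmark(candidate)
        if score >= baseline:              # checkpoint: proceed only on non-regression
            harness, baseline = candidate, score
        else:
            print(f"Rejected {change}: {score:.2%} below baseline {baseline:.2%}")
    return harness, baseline
```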

Dimension / Reasoning / Score

Conciseness

The content is reasonably lean and avoids lengthy explanations of basic concepts, but some sections like 'Architecture Pattern Guidance' and 'Granularity Rules' offer surface-level descriptions that don't add much beyond what Claude already knows about agent design patterns.

2 / 3

Actionability

The skill is almost entirely abstract guidance with no concrete code, tool definition schemas, example JSON payloads, or executable commands. Statements like 'Use stable, explicit tool names' and 'Keep inputs schema-first and narrow' describe rather than instruct; there are no copy-paste-ready artifacts (a before/after sketch of the kind that would help appears after this table).

1 / 3

Workflow Clarity

There is no sequenced workflow for constructing or improving an agent harness. The content is a collection of principles organized by topic but lacks any step-by-step process, validation checkpoints, or feedback loops for iterating on agent design.

1 / 3

Progressive Disclosure

The content is organized into clearly labeled sections which aids scanning, but everything is inline with no references to deeper materials. For a topic this broad (action spaces, observation design, error recovery, benchmarking), splitting detailed guidance into referenced files would be appropriate.

2 / 3

Total: 6 / 12

Passed
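To make the Actionability critique concrete, here is the sort of side-by-side before/after pair the suggestions call for. Both definitions are invented for illustration; only the pattern (explicit naming, narrow schema-first inputs, documented limits) is the point.

```python
# Anti-pattern: vague name, free-form input, no constraints.
TOOL_BEFORE = {
    "name": "do_task",
    "description": "Performs a task.",
    "input_schema": {
        "type": "object",
        "properties": {"input": {"type": "string"}},
    },
}

# Pattern: explicit name, narrow schema-first inputs, documented failure mode.
TOOL_AFTER = {
    "name": "create_calendar_event",
    "description": "Create a calendar event; fails if the requested slot is already booked.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "start_time": {"type": "string", "format": "date-time"},
            "duration_minutes": {"type": "integer", "minimum": 5, "maximum": 480},
        },
        "required": ["title", "start_time"],
        "additionalProperties": False,
    },
}
```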

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria / Description / Result

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 10 / 11

Passed
