
tdg-personal/santa-method

"Multi-agent adversarial verification with convergence loop. Two independent review agents must both pass before output ships."

38

Quality: 38%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run

Security (by Snyk): Passed
No known issues


Quality

Discovery: 7%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is heavily laden with technical jargon and buzzwords while failing to communicate concrete actions, applicable domains, or trigger conditions. It reads more like an architecture note than a skill description, providing no guidance for when Claude should select it. A user needing quality review or verification would never use these terms naturally.

Suggestions

- Replace abstract jargon with concrete actions: specify what is being verified (e.g., 'Verifies code output quality by running two independent review passes checking for correctness and edge cases').
- Add an explicit 'Use when...' clause with natural trigger terms users would actually say, such as 'Use when the user asks for thorough review, double-checking, or high-confidence verification of generated output'.
- Specify the domain or type of output this applies to (code, documents, data analysis, etc.) to make the skill distinguishable and actionable.
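Putting the three suggestions together, a rewritten frontmatter description might look like the sketch below. This is illustrative only, not the skill's actual frontmatter; the field values are assumptions built from the example phrasing above.

```yaml
---
name: santa-method
description: >-
  Verifies code output quality by running two independent review passes
  that check for correctness and edge cases, looping until both
  reviewers pass. Use when the user asks for thorough review,
  double-checking, or high-confidence verification of generated code.
---
```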

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description uses abstract, buzzword-heavy language like 'multi-agent adversarial verification' and 'convergence loop' without specifying concrete actions. It does not explain what is being verified, what kind of output is being reviewed, or what domain this applies to. | 1 / 3 |
| Completeness | The 'what' is vaguely described with abstract process language, and there is no 'when' clause or any explicit trigger guidance for when Claude should select this skill. | 1 / 3 |
| Trigger Term Quality | The terms used ('multi-agent', 'adversarial verification', 'convergence loop') are technical jargon that users would almost never naturally say. There are no natural trigger terms a user would use when needing this skill. | 1 / 3 |
| Distinctiveness Conflict Risk | The jargon-heavy language is somewhat distinctive in that few other skills would use terms like 'adversarial verification' or 'convergence loop,' but the lack of domain specificity means it's unclear what this skill actually covers, making it hard to distinguish from other review/validation skills. | 2 / 3 |
| Total | | 5 / 12 |

Passed

Implementation: 39%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill has excellent workflow clarity with a well-defined 4-phase process, convergence loop, and clear escalation path. However, it is severely bloated — much of the content (domain-specific rubrics, cost analysis, metrics, concept explanations) is either generic knowledge Claude already has or belongs in separate reference files. The actionability is moderate since the code is pseudocode rather than executable patterns for actual Claude Code usage.

Suggestions

- Cut the content by 50-60%: remove the 'core insight' explanation, cost analysis section, metrics section, and domain-specific rubric extension tables — these are generic knowledge Claude already possesses.
- Split into SKILL.md (overview + 4 phases briefly) with references to RUBRIC_TEMPLATES.md, BATCH_PATTERN.md, and FAILURE_MODES.md for detailed content.
- Replace pseudocode with actual Claude Code Agent tool invocation syntax showing real tool_use patterns rather than fictional Python functions.
- Remove the 'When to Activate' and 'When NOT to use' sections — Claude can infer appropriate contexts from the skill description and architecture.
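For reference, the shape of the convergence loop being reviewed can be sketched concretely. This is a hand-rolled Python illustration under stated assumptions, with stub functions standing in for real agent spawning; none of these names (`run_reviewer`, `apply_fixes`, `verify`) are actual Claude Code APIs.

```python
MAX_ITERATIONS = 3  # cap before escalating to a human

def run_reviewer(output: str) -> bool:
    # Stand-in for spawning a fresh, independent review agent.
    # Here the "review" is a trivial check for unresolved TODO markers.
    return "TODO" not in output

def apply_fixes(output: str, round_no: int) -> str:
    # Stand-in for the fix agent: resolve the flagged issues.
    return output.replace("TODO", f"resolved(r{round_no})")

def verify(output: str) -> tuple[str, str]:
    for round_no in range(1, MAX_ITERATIONS + 1):
        # Two fresh reviewers each round -- never reuse agents that have
        # already seen an earlier draft (the skill's critical invariant).
        verdicts = [run_reviewer(output) for _ in range(2)]
        if all(verdicts):
            return "shipped", output       # both reviewers passed
        output = apply_fixes(output, round_no)
    return "escalated", output             # cap reached: hand to a human

status, result = verify("draft containing a TODO")
```

The key design point the review praises is visible here: the verdict gate requires unanimity, the fix cycle has an explicit termination condition, and hitting the iteration cap escalates rather than shipping.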

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is extremely verbose at ~300+ lines. It explains concepts Claude already knows (why independent review matters, what anchoring bias is, cost-benefit analysis), includes extensive tables of domain-specific rubric extensions that are generic knowledge, and pads with motivational rationale ('the core insight') that doesn't add actionable value. The cost analysis section and metrics section are largely filler. | 1 / 3 |
| Actionability | The skill provides structured pseudocode and prompt templates that are somewhat concrete, but nothing is truly executable — it's all pseudocode with fictional functions (generate(), Agent(), fix_agent.execute(), ship(), escalate_to_human()). The reviewer prompt template is the most actionable element, but the actual implementation for Claude Code subagents is hand-waved ('use the Agent tool to spawn reviewers'). | 2 / 3 |
| Workflow Clarity | The multi-step workflow is clearly sequenced across 4 phases with an explicit convergence loop, max iteration cap, escalation path, and the critical invariant that fresh agents must be used each round. The ASCII diagram, verdict gate logic, and fix cycle with explicit termination conditions provide strong workflow clarity with validation checkpoints. | 3 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with no references to external files. Domain-specific rubric extensions, batch sampling patterns, failure mode tables, metrics, cost analysis, and integration notes are all inlined when they could be split into separate reference files. The skill would benefit enormously from being an overview that points to detailed materials. | 1 / 3 |
| Total | | 7 / 12 |

Passed
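The progressive-disclosure split suggested for this skill can be pictured as a small file tree. A sketch only; the directory name is assumed from the skill's slug, and the reference filenames are the ones named in the suggestions:

```
santa-method/
├── SKILL.md             # overview + the four phases, briefly
├── RUBRIC_TEMPLATES.md  # domain-specific rubric extensions
├── BATCH_PATTERN.md     # batch sampling patterns
└── FAILURE_MODES.md     # failure mode tables
```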

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 |

Passed
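Clearing a frontmatter_unknown_keys warning generally means dropping the non-standard top-level keys or nesting them under metadata, as the check's description suggests. A hedged sketch; the offending key name here (author) is invented for illustration, not taken from the skill:

```yaml
---
name: santa-method
description: Two independent review agents must both pass before output ships.
# A hypothetical unknown top-level key like "author: tdg-personal" would
# trigger the warning above; nesting it under metadata keeps validation clean.
metadata:
  author: tdg-personal
---
```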

Reviewed
