
doubt-driven-development

Subjects every non-trivial decision to a fresh-context adversarial review before it stands. Use when correctness matters more than speed, when working in unfamiliar code, when stakes are high (production, security-sensitive logic, irreversible operations), or any time a confident output would be cheaper to verify now than to debug later.

54

Quality

60%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/doubt-driven-development/SKILL.md

Quality

Content

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill excels at actionability and workflow clarity — the 5-step doubt cycle is well-structured with concrete artifacts, adversarial prompts, classification frameworks, and bounded iteration. However, it is significantly over-long and repetitive, with key constraints stated multiple times across the process steps, red flags, rationalizations table, and verification checklist. The content would benefit greatly from aggressive trimming and splitting supplementary material (rationalizations, red flags, cross-model details) into separate reference files.

Suggestions

Cut the 'Common Rationalizations' table to 3-4 entries max or move it to a separate reference file — most entries restate constraints already in the process steps.

Move the cross-model escalation subsection (Steps 1-4 within Step 3) into a separate CROSS-MODEL.md reference file, keeping only a 2-3 line summary and link in the main skill.

Deduplicate the 'Red Flags' section against the process steps — many red flags (e.g., 'silently skipping cross-model', 'passing the CLAIM to the reviewer') are already stated as explicit constraints in the workflow. Keep only flags that add genuinely new information.

Remove explanations of why things matter when Claude can infer them (e.g., 'Confidence correlates poorly with correctness on novel problems' and 'Debugging a wrong commit in production is more expensive') — these are general truths Claude already knows.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is extremely verbose at ~300+ lines. While the domain is complex, there is significant redundancy: the 'Common Rationalizations' table repeats points already made in the body, 'Red Flags' duplicates constraints from the process steps, and the cross-model escalation section alone is longer than many entire skills. Multiple points are stated, restated, and then restated again (e.g., 'silent skipping is not okay' appears at least 4 times). Claude already understands adversarial reasoning, shell escaping risks, and orchestration patterns; much of this is over-explained. | 1 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: a copy-paste checklist, exact adversarial prompt text, specific bash invocation examples for cross-model CLIs, a clear classification framework (contract misread / actionable / trade-off / noise) with precedence ordering, and explicit CLAIM formatting examples. Every step has concrete artifacts to produce. | 3 / 3 |
| Workflow Clarity | The 5-step process is clearly sequenced with explicit validation checkpoints (Step 5 stop conditions, Step 4 classification precedence), feedback loops (re-loop on actionable findings, fix contract on misread, decompose if 3 cycles insufficient), and bounded iteration (hard cap at 3 cycles with escalation). The degraded-mode fallback and non-interactive context handling are also well-specified. | 3 / 3 |
| Progressive Disclosure | The skill references external files (agents/, references/orchestration-patterns.md, other skills) appropriately, but the SKILL.md itself is monolithic: the cross-model escalation section, common rationalizations table, red flags list, and interaction notes could all be split into separate reference files. No bundle files are provided to offload this content, so everything is inline in one very long document. | 2 / 3 |
| Total | | 9 / 12 |

Passed
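The bounded loop described in the Workflow Clarity reasoning can be sketched roughly as follows. This is an illustration of the review's summary, not code from the skill itself: the names `Finding`, `next_action`, and `doubt_loop` are hypothetical, the precedence order is assumed to match the order in which the review lists the four classes, and treating trade-offs and noise as non-blocking is also an assumption.

```python
from enum import IntEnum
from typing import Callable, List


class Finding(IntEnum):
    # Lower value = higher precedence. ASSUMPTION: the order mirrors the
    # review's listing (contract misread / actionable / trade-off / noise).
    CONTRACT_MISREAD = 0
    ACTIONABLE = 1
    TRADE_OFF = 2
    NOISE = 3


MAX_CYCLES = 3  # the review mentions a hard cap at 3 cycles


def next_action(findings: List[Finding]) -> str:
    """Pick the action for one review cycle from its highest-precedence finding."""
    if not findings:
        return "stop"
    top = min(findings)
    if top == Finding.CONTRACT_MISREAD:
        return "fix_contract"  # the reviewer misread the contract: fix it, re-review
    if top == Finding.ACTIONABLE:
        return "revise"  # apply the fix, then loop again
    # ASSUMPTION: trade-offs and noise are recorded but do not force a new cycle.
    return "stop"


def doubt_loop(review: Callable[[int], List[Finding]]) -> str:
    """Run up to MAX_CYCLES review cycles; escalate if none of them ends cleanly."""
    for cycle in range(1, MAX_CYCLES + 1):
        if next_action(review(cycle)) == "stop":
            return f"accepted after {cycle} cycle(s)"
    return "escalate: decompose the decision"
```

Here `review` stands in for whatever produces classified findings in a given cycle (for example, a fresh-context reviewer); the cap guarantees the loop terminates even if every cycle yields an actionable finding.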

Description

57%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description has a strong 'Use when...' clause with multiple explicit trigger conditions, which is its main strength. However, it suffers from vague capability description — 'adversarial review' doesn't convey concrete actions the skill performs. The description reads more like a philosophy or methodology than a skill with specific, tangible outputs.

Suggestions

Replace 'subjects every non-trivial decision to a fresh-context adversarial review' with concrete actions like 'Reviews code changes, validates logic, identifies edge cases, and challenges assumptions through structured adversarial analysis'.

Add natural trigger terms users would actually say, such as 'double-check', 'verify', 'sanity check', 'code review', 'validate my approach', or 'check for bugs'.

Clarify what 'fresh-context' means in practice — does it re-read files, create a checklist, generate test cases? Concrete outputs help distinguish this from generic review skills.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description uses abstract language like 'non-trivial decision' and 'fresh-context adversarial review' without listing concrete actions. It doesn't specify what it actually does: no verbs like 'reviews code', 'validates logic', 'checks configurations', etc. 'Adversarial review' is vague and doesn't describe tangible outputs. | 1 / 3 |
| Completeness | The description answers both 'what' (subjects decisions to adversarial review) and 'when' with an explicit 'Use when...' clause listing multiple trigger conditions: correctness over speed, unfamiliar code, high stakes, production, security-sensitive logic, and irreversible operations. | 3 / 3 |
| Trigger Term Quality | It includes some relevant terms like 'production', 'security-sensitive logic', 'irreversible operations', 'unfamiliar code', and 'debug' that users might naturally mention. However, it lacks common trigger terms users would actually say like 'review my code', 'double-check', 'verify', 'sanity check', or 'code review'. | 2 / 3 |
| Distinctiveness Conflict Risk | The concept of 'adversarial review' is somewhat distinctive, but the broad scope ('every non-trivial decision') could overlap with code review skills, testing skills, or general quality assurance skills. The triggers like 'production' and 'security-sensitive logic' could conflict with deployment or security-specific skills. | 2 / 3 |
| Total | | 8 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
addyosmani/agent-skills
Reviewed
