
tree-of-thoughts

Execute tasks through systematic exploration, pruning, and expansion using Tree of Thoughts methodology with meta-judge evaluation specifications and multi-agent evaluation


Quality

19%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security (by Snyk)

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/sadd/skills/tree-of-thoughts/SKILL.md

Quality

Discovery

0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is heavily laden with abstract methodology jargon without conveying any concrete capabilities or use cases. It fails to help Claude distinguish when to select this skill, as it neither specifies what tasks it handles nor provides natural trigger terms users would use. The description reads more like an academic paper title than a practical skill description.

Suggestions

Replace abstract methodology language with concrete actions: specify what types of tasks or problems this skill solves (e.g., 'Breaks down complex multi-step problems into branching solution paths, evaluates alternatives, and selects optimal approaches').

Add an explicit 'Use when...' clause with natural trigger terms users would actually say, such as 'Use when the user asks to explore multiple approaches, compare solution strategies, or solve complex reasoning problems'.

Remove or minimize jargon like 'meta-judge evaluation specifications' and 'multi-agent evaluation' — instead describe the user-facing benefit in plain language.
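Taken together, these suggestions amount to a rewritten frontmatter description. A possible sketch, reusing the concrete actions and trigger phrases proposed above (the exact wording is illustrative, not the maintainer's actual copy):

```markdown
---
name: tree-of-thoughts
description: >
  Breaks down complex multi-step problems into branching solution paths,
  evaluates alternatives, prunes weak branches, and selects the optimal
  approach. Use when the user asks to explore multiple approaches, compare
  solution strategies, or solve complex reasoning problems.
---
```

A description in this shape answers both the 'what' and the 'when', and swaps methodology jargon for terms a user would actually type.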

Dimension scores

Specificity — 1 / 3
The description uses abstract, buzzword-heavy language ('systematic exploration', 'pruning', 'expansion', 'meta-judge evaluation specifications', 'multi-agent evaluation') without listing any concrete actions a user would recognize. It describes a methodology rather than specific capabilities.

Completeness — 1 / 3
The description vaguely addresses 'what' (execute tasks through Tree of Thoughts) but provides no 'when' clause or explicit trigger guidance. There is no 'Use when...' or equivalent, and the 'what' itself is too abstract to be useful.

Trigger Term Quality — 1 / 3
The terms used are highly technical jargon ('Tree of Thoughts methodology', 'meta-judge evaluation', 'multi-agent evaluation', 'pruning') that users would almost never naturally say when requesting help. No common user-facing keywords are present.

Distinctiveness / Conflict Risk — 1 / 3
'Execute tasks' is extremely generic and could conflict with virtually any skill. The methodology-specific terms (Tree of Thoughts) are niche, but the overall framing is so broad that it's unclear what domain or task type this skill is actually for.

Total: 4 / 12

Passed

Implementation

39%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill defines a sophisticated multi-phase Tree of Thoughts reasoning pattern with strong workflow clarity and adaptive strategy selection. However, it is severely undermined by extreme verbosity: every prompt template, step-by-step thinking instruction, and example is inlined into a single monolithic document that wastes enormous context window space. Much of the content coaches Claude on basic reasoning steps it already knows, and the lack of any progressive disclosure or bundle structure makes the skill impractical for real use.

Suggestions

Extract prompt templates into separate bundle files (e.g., prompts/explorer.md, prompts/pruning-judge.md, prompts/synthesizer.md) and reference them from the main SKILL.md to dramatically reduce token usage.

Remove step-by-step thinking scaffolding from prompt templates (e.g., 'Step 1: Understand the proposal deeply', 'Let's approach this systematically') — Claude already knows how to reason through problems.

Move the detailed API Design example walkthrough to a separate EXAMPLES.md file, keeping only a brief summary in the main skill.

Define or reference the required tooling (Task tool, subagent types like 'sadd:meta-judge') explicitly, or link to documentation that explains the execution environment.
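As a sketch of the proposed bundle structure (the file names follow the suggestions above; the phase headings are paraphrased from the skill, so treat the details as illustrative), the main SKILL.md could shrink to a workflow outline that links out to the extracted files:

```markdown
## Phase 1: Exploration

Dispatch explorer subagents using the prompt template in
[prompts/explorer.md](prompts/explorer.md).

## Phase 2: Pruning

Score each branch with the pruning judge defined in
[prompts/pruning-judge.md](prompts/pruning-judge.md) and discard
low-scoring branches.

## Final Phase: Synthesis

Merge the surviving branches using
[prompts/synthesizer.md](prompts/synthesizer.md).

For a full worked walkthrough (e.g. the API Design example), see
[EXAMPLES.md](EXAMPLES.md).
```

Only the outline and decision logic stay in context by default; the heavy prompt templates are loaded on demand when a phase actually runs.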

Dimension scores

Conciseness — 1 / 3
Extremely verbose at ~600+ lines. Massive prompt templates are repeated with extensive inline instructions that explain basic reasoning steps Claude already knows (e.g., 'Step 1: Understand the proposal deeply', 'Before implementing, analyze'). The ASCII diagram, while helpful, is followed by redundant re-explanation of every phase. Significant token waste from coaching Claude on how to think step-by-step.

Actionability — 2 / 3
Provides detailed prompt templates and dispatch instructions with specific file naming conventions and directory structures, which is concrete. However, the actual execution relies on unspecified tools (Task tool, subagent_type 'sadd:meta-judge', 'sadd:judge') without defining them, and the CLI example at the end uses a '/tree-of-thoughts' command that isn't defined. The prompts are templates with placeholders rather than fully executable examples.

Workflow Clarity — 3 / 3
The multi-phase workflow is exceptionally well-sequenced with clear dependencies (wait for both Phase 1 AND Phase 1.5 before Phase 2), explicit validation checkpoints (Phase 4.5 adaptive strategy with concrete decision logic including score thresholds), feedback loops (REDESIGN returns to Phase 3, with escalation after two failures), and clear parallel execution points. The decision tree for adaptive strategy selection is precise, with specific conditions.

Progressive Disclosure — 1 / 3
Monolithic wall of text with no bundle files or external references. All prompt templates, examples, and detailed instructions are inlined in a single massive document. The prompt templates alone could be separate files, and the example walkthrough at the end could be a separate EXAMPLES.md. No content is split or referenced externally despite the document being extremely long.

Total: 7 / 12

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 checks passed

Validation for skill structure

Criteria results

skill_md_line_count — Warning
SKILL.md is long (943 lines); consider splitting into references/ and linking.

frontmatter_unknown_keys — Warning
Unknown frontmatter key(s) found; consider removing or moving to metadata.

Total: 9 / 11

Passed
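The frontmatter_unknown_keys warning can typically be cleared by nesting non-standard keys under a metadata block, as the check itself suggests. A hypothetical before/after sketch (the author and version keys are invented for illustration; the actual unknown keys in this skill are not named in the report):

```markdown
---
name: tree-of-thoughts
description: (unchanged)
metadata:
  author: NeoLabHQ    # hypothetical; was a top-level unknown key
  version: "1.0"      # hypothetical; was a top-level unknown key
---
```

Standard keys stay at the top level; anything tool-specific moves under metadata so the validator no longer flags it.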

Repository
NeoLabHQ/context-engineering-kit
Reviewed

