thought-based-reasoning

Use when tackling complex reasoning tasks requiring step-by-step logic, multi-step arithmetic, commonsense reasoning, symbolic manipulation, or problems where simple prompting fails - provides comprehensive guide to Chain-of-Thought and related prompting techniques (Zero-shot CoT, Self-Consistency, Tree of Thoughts, Least-to-Most, ReAct, PAL, Reflexion) with templates, decision matrices, and research-backed patterns


Quality: 67% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security by Snyk: Advisory (Suggest reviewing before use)

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/customaize-agent/skills/thought-based-reasoning/SKILL.md

Quality

Discovery

92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description that clearly communicates both what the skill does and when to use it, with rich trigger terms covering multiple prompting methodologies. The main weakness is that the scope is somewhat broad ('complex reasoning tasks', 'problems where simple prompting fails'), which could cause overlap with other reasoning or prompting-related skills. The description uses proper third-person voice and avoids vague fluff.

Dimension / Reasoning / Score

Specificity

Lists multiple specific concrete actions and techniques: Chain-of-Thought, Zero-shot CoT, Self-Consistency, Tree of Thoughts, Least-to-Most, ReAct, PAL, Reflexion, along with deliverables like templates, decision matrices, and research-backed patterns.

3 / 3

Completeness

Clearly answers both what ('provides comprehensive guide to Chain-of-Thought and related prompting techniques with templates, decision matrices, and research-backed patterns') and when ('Use when tackling complex reasoning tasks requiring step-by-step logic, multi-step arithmetic, commonsense reasoning, symbolic manipulation, or problems where simple prompting fails').

3 / 3

Trigger Term Quality

Includes strong natural trigger terms users would say: 'complex reasoning', 'step-by-step logic', 'multi-step arithmetic', 'commonsense reasoning', 'symbolic manipulation', 'prompting techniques', 'Chain-of-Thought'. These cover a good range of how users would describe needing this skill.

3 / 3

Distinctiveness Conflict Risk

While the specific prompting technique names (CoT, Tree of Thoughts, ReAct, etc.) create some distinctiveness, the broader framing around 'complex reasoning tasks' and 'prompting techniques' could overlap with other general prompting or reasoning skills. The phrase 'problems where simple prompting fails' is quite broad and could trigger for many different skill types.

2 / 3

Total: 11 / 12 (Passed)

Implementation

42%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is a comprehensive reference document on CoT prompting techniques with strong actionability through concrete templates and code examples. However, it is severely over-long for a SKILL.md, explaining concepts Claude already knows (paper citations, what prompting is, how LLMs reason) and cramming all content into a single monolithic file. It reads more like a tutorial or survey paper than a concise skill instruction.

Suggestions

Reduce content by 60-70%: remove paper citations/counts, 'How It Works' explanations of concepts Claude knows, and the strengths/limitations sections. Keep only the prompt templates, code examples, decision matrix, and common mistakes.

Split into separate files: create individual technique files (e.g., COT.md, REACT.md, TOT.md) and have SKILL.md serve as a concise overview with the quick reference table and decision matrix linking to detail files.

Add explicit validation/feedback workflow: include a step like 'If technique X doesn't improve results after 2 attempts, escalate to the next technique in the decision matrix' with concrete checkpoints.

Remove the References section entirely; Claude doesn't need arXiv links to follow skill instructions.
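The escalation workflow suggested above could be expressed as a small control loop. This is a hypothetical sketch, not part of the skill: the technique names, the `run_technique` callback, and the `score` function are placeholders for whatever the skill's decision matrix and evaluation criteria actually define.

```python
# Hypothetical sketch of the suggested escalation workflow: try each
# technique in decision-matrix order, re-attempting at most twice
# before escalating to the next one.

TECHNIQUES = ["zero_shot_cot", "self_consistency", "tree_of_thoughts"]
MAX_ATTEMPTS = 2

def solve(problem, run_technique, score):
    """run_technique(name, problem) -> answer; score(answer) -> float in [0, 1]."""
    best_answer, best_score = None, float("-inf")
    for name in TECHNIQUES:
        for _ in range(MAX_ATTEMPTS):
            answer = run_technique(name, problem)
            s = score(answer)
            if s > best_score:
                best_answer, best_score = answer, s
            if s >= 0.9:          # checkpoint: good enough, stop escalating
                return best_answer
    return best_answer            # fall back to the best attempt seen
```

The concrete checkpoint here (a score threshold of 0.9) is arbitrary; the point is that each escalation step has an explicit exit condition rather than an open-ended retry.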

Dimension / Reasoning / Score

Conciseness

Extremely verbose at ~500+ lines. Explains concepts Claude already knows well (what CoT is, how LLMs work, paper citations, citation counts). Much of this is textbook-level prompting knowledge that doesn't need to be in a skill file. The accuracy gain percentages, paper citations, and extensive background explanations waste significant token budget.

1 / 3

Actionability

Provides concrete, executable prompt templates and Python code for each technique. The examples are copy-paste ready (Self-Consistency implementation, ToT search, PAL templates, ReAct trace format) and include specific, usable patterns.

3 / 3

Workflow Clarity

The decision matrix provides a clear flowchart for technique selection, and individual techniques have clear steps. However, there are no validation checkpoints or feedback loops for when a technique fails to improve results—the 'Common Mistakes' table partially addresses this but doesn't provide explicit recovery workflows.

2 / 3

Progressive Disclosure

Monolithic wall of text with no references to external files. All 9 techniques are fully detailed inline, making this extremely long. Content should be split into separate files per technique with the SKILL.md serving as an overview with links. No bundle files exist to support any splitting.

1 / 3

Total: 7 / 12 (Passed)
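The Self-Consistency implementation praised in the Actionability row typically follows one shape: sample several independent reasoning chains and majority-vote on their final answers. A minimal sketch, assuming a hypothetical `sample_chain` callable that stands in for an LLM call at nonzero temperature:

```python
from collections import Counter

def self_consistency(question, sample_chain, n=5):
    """Sample n reasoning chains and majority-vote their final answers.

    sample_chain(question) is assumed to return a (reasoning, answer)
    pair; in practice it would invoke a model with a chain-of-thought
    prompt and parse the final answer out of the completion.
    """
    answers = [sample_chain(question)[1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Voting over only the final answers (not the reasoning text) is what makes the pattern robust: chains that disagree on intermediate steps can still converge on the same result.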

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria / Description / Result

skill_md_line_count

SKILL.md is long (659 lines); consider splitting into references/ and linking

Warning

Total: 10 / 11 (Passed)

Repository: NeoLabHQ/context-engineering-kit (Reviewed)

