Skill description: Use when tackling complex reasoning tasks requiring step-by-step logic, multi-step arithmetic, commonsense reasoning, symbolic manipulation, or problems where simple prompting fails - provides comprehensive guide to Chain-of-Thought and related prompting techniques (Zero-shot CoT, Self-Consistency, Tree of Thoughts, Least-to-Most, ReAct, PAL, Reflexion) with templates, decision matrices, and research-backed patterns
Impact: Pending (no eval scenarios have been run)
Advisory: Suggest reviewing before use
Optimize this skill with Tessl:

`npx tessl skill review --optimize ./plugins/customaize-agent/skills/thought-based-reasoning/SKILL.md`

Quality
Discovery: 92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that clearly communicates both what the skill does and when to use it, with rich trigger terms covering multiple prompting methodologies. The main weakness is that the scope is somewhat broad ('complex reasoning tasks', 'problems where simple prompting fails'), which could cause overlap with other reasoning or prompting-related skills. The description uses proper third-person voice and avoids vague fluff.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions and techniques: Chain-of-Thought, Zero-shot CoT, Self-Consistency, Tree of Thoughts, Least-to-Most, ReAct, PAL, Reflexion, along with deliverables like templates, decision matrices, and research-backed patterns. | 3 / 3 |
| Completeness | Clearly answers both what ('provides comprehensive guide to Chain-of-Thought and related prompting techniques with templates, decision matrices, and research-backed patterns') and when ('Use when tackling complex reasoning tasks requiring step-by-step logic, multi-step arithmetic, commonsense reasoning, symbolic manipulation, or problems where simple prompting fails'). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'complex reasoning', 'step-by-step logic', 'multi-step arithmetic', 'commonsense reasoning', 'symbolic manipulation', 'prompting techniques', 'Chain-of-Thought'. These cover a good range of how users would describe needing this skill. | 3 / 3 |
| Distinctiveness / Conflict Risk | While the specific prompting technique names (CoT, Tree of Thoughts, ReAct, etc.) create some distinctiveness, the broader framing around 'complex reasoning tasks' and 'prompting techniques' could overlap with other general prompting or reasoning skills. The phrase 'problems where simple prompting fails' is quite broad and could trigger for many different skill types. | 2 / 3 |
| Total | | 11 / 12 Passed |
Implementation: 42%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is a comprehensive reference document on CoT prompting techniques with strong actionability through concrete templates and code examples. However, it is severely over-long for a SKILL.md, explaining concepts Claude already knows (paper citations, what prompting is, how LLMs reason) and cramming all content into a single monolithic file. It reads more like a tutorial or survey paper than a concise skill instruction.
Suggestions

- Reduce content by 60-70%: remove paper citations/counts, 'How It Works' explanations of concepts Claude knows, and the strengths/limitations sections. Keep only the prompt templates, code examples, decision matrix, and common mistakes.
- Split into separate files: create individual technique files (e.g., COT.md, REACT.md, TOT.md) and have SKILL.md serve as a concise overview with the quick reference table and decision matrix linking to detail files.
- Add an explicit validation/feedback workflow: include a step like 'If technique X doesn't improve results after 2 attempts, escalate to the next technique in the decision matrix' with concrete checkpoints.
- Remove the References section entirely; Claude doesn't need arxiv links to follow skill instructions.
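The escalation checkpoint suggested above could be sketched as a small driver loop. This is a hypothetical illustration, not part of the reviewed skill: the technique names, `run_technique`, and `passes_check` are placeholder hooks the skill author would supply.

```python
# Hypothetical escalation workflow: try each technique in decision-matrix
# order, and move to the next one after MAX_ATTEMPTS failed checkpoints.
ESCALATION_ORDER = [
    "zero-shot-cot",     # cheapest: append "Let's think step by step"
    "few-shot-cot",      # add worked examples before the question
    "self-consistency",  # sample several chains, majority-vote the answers
    "tree-of-thoughts",  # explicit search over reasoning branches
]
MAX_ATTEMPTS = 2  # per technique, as the suggestion above proposes

def solve_with_escalation(problem, run_technique, passes_check):
    """Try each technique up to MAX_ATTEMPTS times; escalate on failure."""
    for technique in ESCALATION_ORDER:
        for _ in range(MAX_ATTEMPTS):
            answer = run_technique(problem, technique)
            if passes_check(answer):
                return technique, answer  # checkpoint passed: stop escalating
    return None, None  # every technique exhausted; flag for human review
```

The point of the sketch is the concrete stopping rule: the skill currently names techniques but never says when to give up on one and move to the next.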
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~500+ lines. Explains concepts Claude already knows well (what CoT is, how LLMs work, paper citations, citation counts). Much of this is textbook-level prompting knowledge that doesn't need to be in a skill file. The accuracy gain percentages, paper citations, and extensive background explanations waste significant token budget. | 1 / 3 |
| Actionability | Provides concrete, executable prompt templates and Python code for each technique. The examples are copy-paste ready (Self-Consistency implementation, ToT search, PAL templates, ReAct trace format) and include specific, usable patterns. | 3 / 3 |
| Workflow Clarity | The decision matrix provides a clear flowchart for technique selection, and individual techniques have clear steps. However, there are no validation checkpoints or feedback loops for when a technique fails to improve results; the 'Common Mistakes' table partially addresses this but doesn't provide explicit recovery workflows. | 2 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to external files. All 9 techniques are fully detailed inline, making this extremely long. Content should be split into separate files per technique with the SKILL.md serving as an overview with links. No bundle files exist to support any splitting. | 1 / 3 |
| Total | | 7 / 12 Passed |
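The Actionability row credits the skill's copy-paste-ready implementations, and the core of a technique like Self-Consistency really is only a few lines. A minimal sketch, assuming a hypothetical `sample_chain` hook that runs one chain-of-thought sample at non-zero temperature and extracts its final answer:

```python
from collections import Counter

def self_consistency(problem, sample_chain, n=5):
    """Sample n reasoning chains and majority-vote their final answers.

    `sample_chain` is a placeholder hook, not part of the reviewed skill:
    it runs one chain-of-thought sample and returns that chain's answer.
    """
    answers = [sample_chain(problem) for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n  # winning answer and its agreement rate
```

Returning the agreement rate alongside the answer gives a natural checkpoint signal: low agreement suggests escalating to a stronger technique.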
Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure: 10 / 11 passed
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (659 lines); consider splitting into references/ and linking | Warning |
| Total | 10 / 11 Passed | |