
reflect

Reflect on previous response and output, based on Self-refinement framework for iterative improvement with complexity triage and verification

28

Quality

13%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/reflexion/skills/reflect/SKILL.md

Quality

Discovery

0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is vague, jargon-heavy, and lacks concrete actions, natural trigger terms, and explicit usage guidance. It reads more like an academic concept label than a functional skill description. It would be nearly impossible for Claude to reliably select this skill from a pool of alternatives.

Suggestions

Replace abstract language with specific concrete actions, e.g., 'Reviews and improves Claude's previous response by checking for errors, refining reasoning, and verifying accuracy.'

Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks to review, improve, double-check, or refine a previous answer.'

Remove jargon like 'Self-refinement framework' and 'complexity triage' and replace with plain language that describes what the skill actually does in practice.
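Taken together, these suggestions might produce frontmatter like the following sketch. The wording is illustrative, not the maintainer's actual text, and only uses the standard `name` and `description` keys:

```yaml
# SKILL.md frontmatter (hypothetical rewrite of the description)
name: reflect
description: >-
  Reviews and improves Claude's previous response by checking for errors,
  refining reasoning, and verifying factual accuracy. Use when the user asks
  to review, improve, double-check, or refine a previous answer.
```

A description in this shape gives the agent concrete actions plus natural trigger terms, which addresses all three suggestions at once.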

Dimension | Reasoning | Score

Specificity

The description uses vague, abstract language like 'reflect on previous response and output' and 'self-refinement framework for iterative improvement.' No concrete actions are listed—there's no indication of what specific operations are performed.

1 / 3

Completeness

The description vaguely addresses 'what' (reflect on previous response) but provides no 'when' clause or explicit trigger guidance. There is no 'Use when...' or equivalent, and even the 'what' is too abstract to be useful.

1 / 3

Trigger Term Quality

The description relies on jargon like 'Self-refinement framework,' 'complexity triage,' and 'verification' which are not natural terms a user would say. Users are unlikely to request 'complexity triage' or 'iterative improvement with verification.'

1 / 3

Distinctiveness Conflict Risk

The description is extremely generic—'reflect on previous response' and 'iterative improvement' could apply to virtually any skill that involves revision, editing, debugging, or quality checking, creating high conflict risk with many other skills.

1 / 3

Total: 4 / 12

Passed

Implementation

27%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is extremely verbose and tries to be a comprehensive software engineering evaluation handbook rather than a focused, actionable skill for Claude. It explains many concepts Claude already knows (SOLID, Clean Architecture, common libraries, testing patterns), wastes tokens on theatrical persona framing, and presents everything in a single monolithic document. The core idea of structured self-reflection with complexity triage is sound but buried under excessive generic content.

Suggestions

Reduce content by 70%+ by removing explanations of concepts Claude already knows (SOLID, DDD, common libraries, testing patterns, code smells) and keeping only the reflection workflow steps and report template.

Split into multiple files: keep SKILL.md as a concise overview with the triage system and core workflow, then reference separate files for the detailed evaluation criteria, code-specific checks, and report format template.

Remove the theatrical persona instructions ('You will be killed', bias awareness tables) which waste significant tokens without adding actionable guidance - a single line like 'Be critically rigorous and evidence-based' suffices.

Clarify the workflow by creating a single linear decision tree: Triage → Select applicable checklist(s) → Evaluate → Generate report, with explicit transitions between phases rather than parallel sections the reader must navigate.
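One possible shape for the slimmed-down SKILL.md body, sketched under the assumption that the `references/` filenames are free to choose (they are illustrative, not files that exist in the repository):

```markdown
## Workflow (Triage → Checklist → Evaluate → Report)

1. **Triage** the previous response: Quick, Standard, or Deep path.
2. **Select** the applicable checklist:
   - Code output → [references/code-checks.md](references/code-checks.md)
   - Prose / non-code output → [references/non-code-checks.md](references/non-code-checks.md)
3. **Evaluate** against only the selected checklist.
4. **Report** using the template in [references/report-format.md](references/report-format.md).
```

This keeps SKILL.md as a short linear decision tree while the detailed rubrics live in reference files loaded only when needed.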

Dimension | Reasoning | Score

Conciseness

Extremely verbose at ~500+ lines. Explains concepts Claude already knows extensively (what Clean Architecture is, what SOLID principles are, what code smells are, common libraries like lodash/date-fns, what AAA testing pattern is). Massive amounts of padding with generic software engineering advice that doesn't earn its token cost. The 'Library & Existing Solution Check' section alone lists common libraries Claude already knows about. The threatening persona instructions ('You will be killed') waste tokens on theatrical framing.

1 / 3

Actionability

Provides checklists and a report format template which are somewhat concrete, but the actual guidance is largely abstract checklist items rather than executable steps. The code examples are illustrative comparisons (good vs bad patterns) rather than executable tools for performing reflection. The report format template is the most actionable element, but the bulk of content is vague directives like 'verify assumptions' without specifying how.

2 / 3

Workflow Clarity

Has a triage system (Quick/Standard/Deep paths) and numbered steps, but the workflow is sprawling and hard to follow. The relationship between sections is unclear - after Step 2's decision point, the flow branches into multiple parallel evaluation sections (Code-Specific, Non-Code, Fact-Checking) without clear guidance on which to use when. The 'Quick Path' says 'Skip to Final Verification' but Final Verification is itself a lengthy checklist. No clear validation checkpoints between the many evaluation phases.

2 / 3

Progressive Disclosure

Monolithic wall of text with no references to external files and no bundle files provided. All content is inline in a single massive document. Content like the library comparison examples, anti-pattern catalogs, and detailed scoring rubrics could easily be split into separate reference files. The skill would benefit enormously from being an overview that points to detailed reference materials.

1 / 3

Total: 6 / 12

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

Criteria | Description | Result

skill_md_line_count

SKILL.md is long (651 lines); consider splitting into references/ and linking

Warning

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning
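A typical fix for this warning is to nest any non-standard keys under `metadata`, as the check suggests. The offending key shown here is hypothetical, since the report does not name it:

```yaml
# Before: an unrecognized top-level key triggers the validation warning
name: reflect
author: example-maintainer   # hypothetical unknown key

# After: non-standard keys moved under metadata
name: reflect
metadata:
  author: example-maintainer
```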

Total: 9 / 11

Passed

Repository
NeoLabHQ/context-engineering-kit
Reviewed
