# Skill Review: testing-reality-checker

**Description:** Stops fantasy approvals, evidence-based certification - Default to "NEEDS WORK", requires overwhelming proof for production readiness

**Quality:** 21% (Does it follow best practices?)
**Impact:** Pending (No eval scenarios have been run)
**Validation:** Passed (No known issues)

Optimize this skill with Tessl:
`npx tessl skill review --optimize ./testing-reality-checker/skills/SKILL.md`

## Quality

### Discovery (7%)

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description reads more like an opinionated tagline or mission statement than a functional skill description. It fails to specify concrete actions, lacks natural trigger terms users would use, and provides no explicit guidance on when Claude should select this skill. The attitude-driven language ('fantasy approvals', 'overwhelming proof') obscures the actual functionality.
**Suggestions**

- Replace abstract language with concrete actions, e.g., 'Reviews code changes, validates test coverage, checks deployment criteria, and gates production approvals based on evidence.'
- Add an explicit 'Use when...' clause with natural trigger terms like 'code review', 'approve PR', 'deploy to production', 'release readiness check', 'QA gate'.
- Remove opinionated/editorial language ('fantasy approvals', 'overwhelming proof') and replace it with objective capability descriptions that help Claude match this skill to user requests.
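Taken together, these suggestions point toward a rewritten frontmatter description along the following lines. This is an illustrative sketch, not wording the review prescribes, and assumes the standard `name`/`description` frontmatter layout of a SKILL.md file:

```yaml
---
name: testing-reality-checker
description: >
  Reviews code changes, validates test coverage, checks deployment
  criteria, and gates production approvals based on evidence.
  Use when the user asks for a code review, PR approval, a deploy to
  production, a release readiness check, or a QA gate.
---
```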
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague, abstract language like 'fantasy approvals' and 'evidence-based certification' without listing concrete actions. It describes an attitude/philosophy rather than specific capabilities like 'reviews pull requests', 'checks test coverage', or 'validates deployment criteria'. | 1 / 3 |
| Completeness | The 'what' is vaguely implied (some kind of certification/approval process) but not clearly stated, and there is no 'when' clause or explicit trigger guidance at all. Missing a 'Use when...' clause caps this at 2, but the 'what' is also too weak, resulting in a 1. | 1 / 3 |
| Trigger Term Quality | The terms used ('fantasy approvals', 'overwhelming proof', 'production readiness') are not natural keywords a user would say. Users would more likely say 'code review', 'approve PR', 'deployment check', or 'release readiness'. The language is more like internal jargon or opinionated branding. | 1 / 3 |
| Distinctiveness / Conflict Risk | The description has a somewhat distinctive tone and niche (strict approval/certification gating), which gives it some identity. However, without concrete triggers, it could overlap with any code review, QA, or deployment skill. | 2 / 3 |
| **Total** | | **5 / 12 Passed** |
### Implementation (35%)

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is extremely verbose, spending significant tokens on persona definition, motivational framing, and repeated emphasis on skepticism that Claude doesn't need. While it contains some actionable elements (bash commands, report template structure, automatic fail triggers), much of the guidance is qualitative and descriptive rather than providing concrete, measurable decision criteria. The content would benefit greatly from being split into a concise overview with references to detailed templates and methodology files.
**Suggestions**

- Cut the persona/identity/memory/communication-style sections entirely; they consume tokens without adding actionable guidance. Reduce them to a single sentence about defaulting to 'NEEDS WORK' status.
- Move the large report template and methodology sections to separate referenced files (e.g., REPORT_TEMPLATE.md, METHODOLOGY.md) and keep only the core workflow steps in SKILL.md.
- Add concrete, measurable pass/fail thresholds for each validation step (e.g., specific accessibility scores, exact responsive-breakpoint requirements, interaction success rates from test-results.json).
- Add an explicit feedback loop: what to do when validation fails at each step, how to communicate findings, and when to trigger re-assessment.
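As a sketch of what the proposed restructuring could look like, the trimmed SKILL.md below keeps only the workflow and delegates detail to referenced files. The file names REPORT_TEMPLATE.md and METHODOLOGY.md are the review's own examples; the step wording and thresholds are illustrative assumptions, not the skill's current content:

```markdown
# Testing Reality Checker

Default to a "NEEDS WORK" verdict; issue "PASS" only when every check below succeeds.

## Workflow
1. Reality check: run the STEP 1 bash commands; fail if any command errors.
2. QA cross-validation: compare claimed results against test-results.json; fail on any threshold miss.
3. End-to-end validation: load time <= 3 seconds, all scripted interactions succeed.

On failure at any step: stop, report the failing criterion using the report template, and re-run from step 1 after fixes.

See [REPORT_TEMPLATE.md](REPORT_TEMPLATE.md) for the report format and [METHODOLOGY.md](METHODOLOGY.md) for detailed methodology.
```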
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~200+ lines. Extensive sections on personality, identity, memory, communication style, learning patterns, and success metrics are unnecessary; Claude doesn't need motivational framing or persona backstory. The report template alone is massive and could be a separate reference file. Much content restates the same ideas (e.g., 'stop fantasy approvals' is repeated in multiple sections). | 1 / 3 |
| Actionability | The STEP 1 bash commands are concrete and executable, and the report template provides specific structure. However, much of the content is descriptive rather than instructive; markdown templates with bracketed placeholders like '[Honest description]' are not truly executable. The methodology sections describe what to analyze but rely on vague directives like 'review' and 'analyze' without concrete decision criteria. | 2 / 3 |
| Workflow Clarity | There is a 3-step process (Reality Check → QA Cross-Validation → End-to-End Validation) that provides sequence, but steps 2 and 3 lack concrete validation checkpoints or explicit pass/fail criteria. The 'AUTOMATIC FAIL' triggers provide some decision boundaries but are qualitative rather than measurable ('>3 second load times' is the only concrete threshold). No feedback loop for error recovery is defined. | 2 / 3 |
| Progressive Disclosure | There is a reference to `ai/agents/integration.md` at the end for detailed methodology, which is good. However, the massive report template and multiple methodology sections should be in separate reference files rather than inline. The skill tries to be both an overview and a complete reference, resulting in a monolithic document. | 2 / 3 |
| **Total** | | **7 / 12 Passed** |
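To make the workflow-clarity point concrete, here is a minimal sketch of how the skill's qualitative 'AUTOMATIC FAIL' triggers could become measurable criteria applied to a parsed test-results.json. The metric names and thresholds (other than the 3-second load time, which the skill already states) are assumptions for illustration, not fields Tessl or the skill actually defines:

```python
# Hypothetical thresholds; only max_load_time_seconds comes from the
# skill itself, the rest are illustrative assumptions.
THRESHOLDS = {
    "max_load_time_seconds": 3.0,
    "min_interaction_success_rate": 0.95,
    "min_accessibility_score": 90,
}

def evaluate(results: dict) -> tuple[str, list[str]]:
    """Return ('PASS' | 'NEEDS WORK', failures) for one results dict."""
    failures = []
    if results.get("load_time_seconds", float("inf")) > THRESHOLDS["max_load_time_seconds"]:
        failures.append("load time exceeds 3s")
    if results.get("interaction_success_rate", 0.0) < THRESHOLDS["min_interaction_success_rate"]:
        failures.append("interaction success rate below 95%")
    if results.get("accessibility_score", 0) < THRESHOLDS["min_accessibility_score"]:
        failures.append("accessibility score below 90")
    # Default to NEEDS WORK unless every criterion passes, matching
    # the skill's evidence-first stance.
    return ("PASS" if not failures else "NEEDS WORK", failures)

status, reasons = evaluate({
    "load_time_seconds": 2.1,
    "interaction_success_rate": 0.97,
    "accessibility_score": 94,
})
print(status)  # PASS
```

Each entry in the returned failure list doubles as the finding to report back, which is the kind of explicit feedback loop the suggestions call for.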
### Validation (90%)

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

**Validation for skill structure: 10 / 11 Passed**
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | | **10 / 11 Passed** |
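The one warning is straightforward to clear: move any non-spec frontmatter key under `metadata`, as the check itself suggests. In the sketch below, `owner` is a hypothetical stand-in for whichever unknown key triggered the warning:

```yaml
---
name: testing-reality-checker
description: ...
metadata:
  owner: qa-team   # previously a top-level key unknown to the spec
---
```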