
# proof-checker

Rigorous mathematical proof verification and fixing workflow. Reads a LaTeX proof, identifies gaps via cross-model review (Codex GPT-5.4 xhigh), fixes each gap with full derivations, re-reviews, and generates an audit report. Use when user says "检查证明", "verify proof", "proof check", "审证明", "check this proof", or wants rigorous mathematical verification of a theory paper.

Overall score: 79

- Quality: 77% (Does it follow best practices?)
- Impact: Pending (No eval scenarios have been run)
- Security (by Snyk): Passed (No known issues)

Optimize this skill with Tessl: `npx tessl skill review --optimize ./skills/proof-checker/SKILL.md`

## Quality

### Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly articulates a specific workflow for mathematical proof verification, includes explicit trigger terms in both English and Chinese, and occupies a distinct niche. The description effectively communicates both what the skill does (multi-step proof verification and fixing) and when to use it (with natural trigger phrases). Minor note: the reference to 'Codex GPT-5.4 xhigh' is an implementation detail that could be confusing but doesn't significantly detract from the description's quality.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: reads a LaTeX proof, identifies gaps via cross-model review, fixes each gap with full derivations, re-reviews, and generates an audit report. This is a detailed workflow description with clear steps. | 3 / 3 |
| Completeness | Clearly answers both 'what' (reads LaTeX proof, identifies gaps, fixes with derivations, re-reviews, generates audit report) and 'when' (explicit 'Use when...' clause with specific trigger phrases and a general use case description). | 3 / 3 |
| Trigger Term Quality | Includes excellent natural trigger terms in both English and Chinese: '检查证明', 'verify proof', 'proof check', '审证明', 'check this proof', plus contextual triggers like 'rigorous mathematical verification of a theory paper'. Good multilingual coverage of terms users would naturally say. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: mathematical proof verification in LaTeX with a specific cross-model review workflow. The combination of LaTeX proofs, mathematical rigor, audit reports, and bilingual Chinese/English triggers makes it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

### Implementation: 55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill demonstrates exceptional workflow design and actionability — the multi-phase verification process with adversarial review, counterexample red-teaming, blind re-review, and formal acceptance gates is thorough and well-structured. However, it is severely undermined by its extreme length and monolithic structure: the entire skill is a single massive document that inlines taxonomy tables, JSON schemas, opt-in mode specifications, and failure-mode documentation that should be split across reference files. The token cost is very high, with significant portions covering mathematical knowledge Claude already possesses.

Suggestions:

- Extract the 20-category issue taxonomy, two-axis severity system, and side-condition checklists into a separate reference file (e.g., TAXONOMY.md) and link to it from the main skill.
- Move the detailed JSON schemas for PROOF_AUDIT.json (including deep_fix_plans and restatement_drift optional fields) into a separate SCHEMA.md or the referenced shared-references/assurance-contract.md.
- Move the deep-fix and restatement-check opt-in specifications (Phase 3.6 algorithm, failure modes, field semantics) into separate reference files, since they are opt-in features that most invocations won't use.
- Remove or drastically compress the side-condition checklists for common theorems (DCT, MCT, Fubini, etc.). Claude already knows these conditions, and a brief reminder ('verify all side-conditions for cited theorems') would suffice.
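Taken together, these suggestions imply a layout along the following lines. The file and directory names below are illustrative assumptions, not names mandated by the skill or by Tessl:

```text
skills/proof-checker/
├── SKILL.md               # core workflow: phases, acceptance gates, reviewer prompt
└── references/
    ├── TAXONOMY.md        # 20-category issue taxonomy and two-axis severity system
    ├── SCHEMA.md          # PROOF_AUDIT.json schema (deep_fix_plans, restatement_drift)
    └── OPT_IN_MODES.md    # deep-fix and restatement-check specifications (Phase 3.6)
```

With this split, a typical invocation loads only the core workflow, and the opt-in reference files are read on demand.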

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | This skill is extremely verbose, at well over 500 lines with exhaustive taxonomy tables, detailed JSON schemas, multiple opt-in mode specifications, failure mode documentation, and extensive side-condition checklists. Much of this (e.g., the 20-category issue taxonomy, the side-condition checklists for DCT/MCT/Fubini, the two-axis severity system) is knowledge Claude already possesses. The deep-fix and restatement-check opt-in sections alone consume hundreds of tokens on schema details and failure-mode edge cases that could be in separate reference files. | 1 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific MCP tool invocations with exact prompt templates, bash commands for LaTeX compilation, structured JSON schemas for output artifacts, explicit fix-recording templates, and detailed per-phase instructions. The reviewer prompt is copy-paste ready, and output formats are fully specified with examples. | 3 / 3 |
| Workflow Clarity | The multi-phase workflow (0 → 0.5 → 1 → 1.5 → 2 → 3 → 3.5 → 3.6 → 3.9 → 4 → 5) is clearly sequenced with explicit validation checkpoints: acceptance gates, compile checks after fixes, blind re-review for FATAL/CRITICAL fixes, regression proof-audit, and a clear unrecoverable protocol (Phase 3.9) with feedback loops (repeat Phases 2-3 up to MAX_REVIEW_ROUNDS). The workflow handles error recovery thoroughly. | 3 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with everything inlined into a single massive SKILL.md. The 20-category taxonomy, side-condition checklists, detailed JSON schemas, deep-fix specifications, restatement-check algorithm, and submission artifact schemas should all be in separate reference files. There is one reference to `shared-references/reviewer-routing.md` and `shared-references/assurance-contract.md`, but no bundle files are provided, and the vast majority of content that should be split out remains inline. | 1 / 3 |
| Total | | 8 / 12 |

Passed
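For orientation, a minimal PROOF_AUDIT.json instance might look like the sketch below. Only `deep_fix_plans` and `restatement_drift` are field names taken from the review above; every other key and value is a hypothetical placeholder, since the actual schema is defined inside the skill itself:

```json
{
  "issues": [
    {
      "id": "G1",
      "category": "missing-side-condition",
      "severity": { "impact": "CRITICAL", "confidence": "high" },
      "location": "Lemma 3.2, step (4)",
      "status": "fixed-and-re-reviewed"
    }
  ],
  "deep_fix_plans": [],
  "restatement_drift": null
}
```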

### Validation: 72%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 8 / 11 checks passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| skill_md_line_count | SKILL.md is long (711 lines); consider splitting into references/ and linking | Warning |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 8 / 11 |

Passed
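A sketch of what resolving the two frontmatter warnings could look like. Apart from `allowed-tools` and `metadata`, which the warnings themselves name, every key and value below is an assumption about this skill's actual frontmatter:

```yaml
---
name: proof-checker
description: Rigorous mathematical proof verification and fixing workflow.
allowed-tools:          # keep only tool names the agent runtime recognizes
  - Bash
  - Read
metadata:               # unknown top-level keys moved here, per the validator's hint
  reviewer-model: codex-gpt-5.4-xhigh
---
```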

Repository: wanshuiyin/Auto-claude-code-research-in-sleep (Reviewed)

