semantic-consistency-auditor

Use semantic consistency auditor for academic writing workflows that need structured execution, explicit assumptions, and clear output boundaries.

Quality

17%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize "./scientific-skills/Academic Writing/semantic-consistency-auditor/SKILL.md"

Quality

Discovery

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is heavily padded with abstract buzzwords ('structured execution', 'explicit assumptions', 'clear output boundaries') without explaining what the skill concretely does. It fails to list specific actions, lacks natural trigger terms a user would use, and does not clearly answer either 'what does this do' or 'when should Claude use it' in a meaningful way.

Suggestions

Replace abstract buzzwords with concrete actions the skill performs, e.g., 'Checks academic papers for terminological consistency, flags contradictory claims, and verifies definition usage across sections.'

Add a clear 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks to review a paper for consistency, check terminology usage, or audit definitions in academic writing.'

Remove vague phrases like 'structured execution' and 'clear output boundaries' that add no discriminative value for skill selection.

Dimension	Reasoning	Score
Specificity	The description uses vague, abstract language like 'structured execution', 'explicit assumptions', and 'clear output boundaries' without listing any concrete actions the skill performs. There are no specific capabilities mentioned—just buzzwords.	1 / 3
Completeness	The description has a weak 'when' clause ('for academic writing workflows that need...') but the 'what' is essentially missing—it never explains what the skill actually does beyond the vague name 'semantic consistency auditor'. Both components are very weak.	1 / 3
Trigger Term Quality	The terms 'semantic consistency auditor', 'structured execution', 'explicit assumptions', and 'clear output boundaries' are not natural phrases a user would say. Only 'academic writing' is somewhat natural, but the rest is jargon. A user would more likely say 'check my paper for consistency' or 'review my thesis'.	1 / 3
Distinctiveness Conflict Risk	The term 'semantic consistency auditor' is fairly unique and unlikely to conflict with many other skills, but the vague description of what it actually does ('structured execution', 'explicit assumptions') could overlap with other academic writing or editing tools.	2 / 3
	Total	5 / 12 Passed

Implementation

27%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill suffers from severe verbosity, redundancy, and poor organization. It contains multiple overlapping sections covering the same topics (installation, usage, workflow), explains well-known ML concepts unnecessarily, and has broken internal cross-references. While it does provide some concrete CLI and Python API examples, the overall structure makes it difficult to follow and wastes significant token budget on generic boilerplate that adds no value.

Suggestions

Consolidate redundant sections: merge 'Example Usage'/'Usage', 'Quick Check'/'Audit-Ready Commands', 'Dependencies'/'Prerequisites'/'Installation' into single sections each, cutting the document by at least 40%.

Remove explanations of BERTScore/COMET algorithms, precision/recall definitions, and other ML concepts Claude already knows — keep only the specific configuration and thresholds relevant to this tool.

Fix broken internal cross-references (e.g., 'See ## Prerequisites above' when it appears below) and move detailed output format schemas and configuration examples to separate reference files.

Integrate validation checkpoints directly into the workflow steps rather than having a separate generic 'Error Handling' section — e.g., 'After running main.py, verify output JSON contains expected fields before returning results.'
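The checkpoint in the last suggestion could be sketched as follows. This is a minimal illustration only: the field names in `REQUIRED_FIELDS` and the helper `missing_fields` are hypothetical, since the review does not show the actual output schema of main.py.

```python
import json

# Hypothetical field names: the real output schema of main.py
# is not shown in this review.
REQUIRED_FIELDS = ("score", "flagged_segments")


def missing_fields(raw_output: str, required=REQUIRED_FIELDS) -> list[str]:
    """Return the required top-level fields absent from a JSON object string."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    # Preserve the order of `required` so the report is stable.
    return [name for name in required if name not in data]
```

A workflow step would call this right after the tool finishes and stop, rather than return results, whenever the list is non-empty.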

Dimension	Reasoning	Score
Conciseness	The skill is extremely verbose and repetitive. It contains multiple redundant sections (e.g., 'Example Usage' and 'Usage', 'Quick Check' and 'Audit-Ready Commands' with identical content, 'Dependencies' and 'Prerequisites' and 'Installation' all covering setup). It explains concepts Claude already knows (what BERTScore and COMET are, what precision/recall/F1 mean), includes a changelog with a single entry, and has boilerplate sections like 'Evaluation Criteria' with generic test case descriptions. The 'When to Use' section repeats the description nearly verbatim three times.	1 / 3
Actionability	The skill provides concrete CLI commands and Python API examples with specific arguments, which is good. However, much of the code is in ```text blocks rather than properly typed code blocks, the installation command uses 'pip install bertscore comet-ml' which may not match the actual package names, version numbers are 'unspecified' throughout, and the Python API example imports from a module ('semantic_consistency_auditor') without clarifying its relationship to scripts/main.py. The examples are plausible but not verifiably executable.	2 / 3
Workflow Clarity	There are multiple workflow sections ('Example Usage' run plan, 'Workflow' section, 'Response Template') that present overlapping but different step sequences, creating confusion about which workflow to follow. The main 'Workflow' section is abstract and generic (could apply to any skill). Error handling is mentioned but validation checkpoints are not integrated into the workflow steps themselves — they're in a separate section. Missing explicit validate-then-proceed feedback loops for the actual semantic evaluation process.	2 / 3
Progressive Disclosure	The skill is a monolithic wall of text at ~300+ lines with no meaningful progressive disclosure. It references 'references/audit-reference.md' but no bundle files are provided. Multiple sections reference other sections within the same document with broken cross-references ('See `## Prerequisites` above' when Prerequisites appears below, 'See `## Usage` above' when Usage appears below). Content that should be in separate files (full output format schemas, algorithm explanations, configuration details) is all inline.	1 / 3
	Total	6 / 12 Passed
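
The broken cross-references noted under Progressive Disclosure can be caught mechanically. The sketch below assumes references take the form quoted in the review (a backticked heading followed by "above" or "below"); the function name `broken_cross_references` is hypothetical, and headings inside fenced code blocks are not filtered out.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
# Matches references such as: See `## Prerequisites` above
XREF = re.compile(r"`#{1,6}\s+([^`]+?)\s*`\s+(above|below)")


def broken_cross_references(markdown: str) -> list[tuple[int, str, str]]:
    """Return (line number, heading title, problem) for each bad reference."""
    lines = markdown.splitlines()

    # Record the first line on which each heading title appears.
    heading_lines: dict[str, int] = {}
    for number, line in enumerate(lines, start=1):
        match = HEADING.match(line)
        if match:
            heading_lines.setdefault(match.group(2), number)

    problems = []
    for number, line in enumerate(lines, start=1):
        for title, direction in XREF.findall(line):
            target = heading_lines.get(title)
            if target is None:
                problems.append((number, title, "missing"))
            elif direction == "above" and target > number:
                problems.append((number, title, "appears below"))
            elif direction == "below" and target < number:
                problems.append((number, title, "appears above"))
    return problems
```

Running this over SKILL.md before publishing would flag cases like 'See `## Prerequisites` above' when the Prerequisites heading only appears later in the file.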

Validation

90%


Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
	Total	10 / 11 Passed

Repository: aipoch/medical-research-skills
Commit: 73f6514

Reviewed: 3 days ago
