Use semantic consistency auditor for academic writing workflows that need structured execution, explicit assumptions, and clear output boundaries.
Score: 39

- Quality: 24% — Does it follow best practices?
- Impact: Pending — No eval scenarios have been run
- Passed — No known issues
Optimize this skill with Tessl:

```shell
npx tessl skill review --optimize "./scientific-skills/Academic Writing/semantic-consistency-auditor/SKILL.md"
```

Quality
Discovery — 22%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description fails to communicate what the skill actually does: it uses abstract buzzwords ('structured execution', 'explicit assumptions', 'clear output boundaries') instead of concrete actions. While it attempts to specify a domain (academic writing), the lack of specific capabilities and natural trigger terms makes it difficult for Claude to know when to select this skill over others.
Suggestions

- Replace abstract language with concrete actions (e.g., 'Checks terminology consistency across sections, validates citation references, identifies contradictory claims in academic papers').
- Add natural trigger terms users would say (e.g., 'Use when reviewing academic papers, checking manuscript consistency, or when user mentions thesis, dissertation, research paper, or scholarly writing').
- Clarify what 'semantic consistency auditor' actually does: describe the specific checks or analyses it performs on academic documents.
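A minimal sketch of how these suggestions could land in the skill's frontmatter. The field names follow the common SKILL.md convention, and the wording is illustrative, not the skill's actual description:

```yaml
---
name: semantic-consistency-auditor
description: >
  Checks terminology consistency across sections, validates citation
  references, and identifies contradictory claims in academic papers.
  Use when reviewing academic papers, checking manuscript consistency,
  or when the user mentions a thesis, dissertation, research paper, or
  scholarly writing.
---
```

A description in this shape names concrete actions first and natural trigger terms second, which addresses the Specificity and Trigger Term Quality dimensions directly.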
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague, abstract language like 'structured execution', 'explicit assumptions', and 'clear output boundaries' without describing any concrete actions the skill performs. No specific capabilities are listed. | 1 / 3 |
| Completeness | The 'what' is extremely vague (no concrete actions described), and while there's a 'Use when' clause, it describes abstract conditions ('structured execution', 'explicit assumptions') rather than actionable triggers. Neither component is adequately addressed. | 1 / 3 |
| Trigger Term Quality | Contains some relevant terms like 'academic writing' and 'semantic consistency auditor', but these are somewhat technical. Missing natural user phrases like 'check my paper', 'review consistency', or 'academic document'. | 2 / 3 |
| Distinctiveness / Conflict Risk | 'Academic writing' provides some niche focus, but 'semantic consistency auditor' is unclear and 'structured execution' could overlap with many workflow-oriented skills. The boundaries are not well-defined. | 2 / 3 |
| Total | | 6 / 12 — Passed |
Implementation — 27%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill suffers from severe verbosity and poor organization, with circular references, duplicated content, and unnecessary explanations of algorithms Claude already understands. While it provides some actionable code examples, the overall structure makes it difficult to follow and wastes significant token budget on redundant information. The workflow lacks integrated validation checkpoints critical for a tool dealing with model evaluation.
Suggestions

- Remove redundant sections and circular references: consolidate 'When to Use' into a single clear statement, and eliminate 'See ## X above/below' patterns by reorganizing content logically.
- Move the detailed algorithm explanations (BERTScore, COMET theory) to a separate reference file, since Claude already understands these concepts; keep only the configuration and usage specifics.
- Integrate validation checkpoints directly into the workflow steps (e.g., 'Validate model download completed before proceeding to evaluation').
- Fix code block language tags from 'text' to 'bash' or 'python', and ensure installation commands match the declared dependencies in requirements.txt.
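The validation-checkpoint suggestion can be sketched as a small gate pattern: each workflow step runs, then an explicit check must pass before the next step starts. The step names and filenames below are placeholders, not taken from the skill itself:

```shell
set -euo pipefail  # fail fast so later steps never run on a partial setup

run_step() {
  # Run a step, then gate on an explicit validation check before continuing.
  local name="$1" cmd="$2" check="$3"
  echo "step: $name"
  eval "$cmd"
  eval "$check" || { echo "validation failed after: $name" >&2; exit 1; }
}

# Hypothetical pipeline: fetch a model artifact, confirm it is non-empty,
# and only then run the evaluation step.
run_step "download model" "printf 'weights' > model.bin" "test -s model.bin"
run_step "evaluate"       "echo running evaluation"      "true"
```

Embedding the check in the step (rather than in a separate error-handling section) is what turns a numbered list of commands into the validation-gated workflow the review asks for.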
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose with significant redundancy: the 'When to Use' section repeats the same description three times with minor variations, multiple sections reference each other circularly ('See ## Prerequisites above', 'See ## Usage above'), and it includes unnecessary explanations Claude would already know (e.g., explaining in detail what BERTScore and COMET are). | 1 / 3 |
| Actionability | The skill provides some concrete code examples for Python API usage and command-line invocation, but many code blocks are marked as 'text' instead of being executable, the installation instructions use 'pip install bertscore comet-ml', which differs from the dependencies listed in requirements.txt, and the main.py script examples assume a structure that may not match the actual implementation. | 2 / 3 |
| Workflow Clarity | There is a workflow section with numbered steps, but it lacks explicit validation checkpoints between steps. The 'Example run plan' provides a sequence but includes no feedback loops for error recovery. The error-handling section exists but is separate from the workflow rather than integrated as validation gates. | 2 / 3 |
| Progressive Disclosure | The document is a monolithic wall of text with poor organization: sections reference each other circularly ('See ## Prerequisites above' when Prerequisites appears later), content is duplicated across multiple sections (dependencies listed twice, usage examples scattered), and the single reference file mentioned provides minimal value given the document's length. | 1 / 3 |
| Total | | 6 / 12 — Passed |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure — 10 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing them or moving them to metadata | Warning |
| Total | 10 / 11 — Passed | |
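One way the frontmatter warning could be resolved, per the check's own hint. The key name `author` is hypothetical; inspect the actual SKILL.md frontmatter for the offending key:

```yaml
---
name: semantic-consistency-auditor
description: ...
# Before: an unrecognized top-level key (e.g. `author`) triggers the warning.
# After: unknown keys are nested under `metadata`, as the check suggests.
metadata:
  author: example-maintainer
---
```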