Patient safety evaluation harness for healthcare application deployments. Automated test suites for CDSS accuracy, PHI exposure, clinical workflow integrity, and integration compliance. Blocks deployments on safety failures.
77
77%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
67%Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is strong in specificity and distinctiveness, clearly carving out a unique niche in healthcare deployment safety testing. However, it lacks an explicit 'Use when...' clause and relies on technical acronyms (CDSS, PHI) without expansion, which could hinder skill selection when users describe their needs in more natural language.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when deploying healthcare applications, running clinical safety checks, or validating medical decision support systems before release.'
Expand acronyms and add natural language variations: 'clinical decision support systems (CDSS)', 'protected health information (PHI)', 'HIPAA compliance', 'medical app testing'.
| Dimension | Reasoning | Score |
|---|---|---|
Specificity | Lists multiple specific concrete actions: 'automated test suites for CDSS accuracy, PHI exposure, clinical workflow integrity, and integration compliance' and 'blocks deployments on safety failures.' These are concrete, domain-specific capabilities. | 3 / 3 |
Completeness | Clearly answers 'what does this do' (patient safety evaluation, automated test suites, deployment blocking), but lacks an explicit 'Use when...' clause or equivalent trigger guidance. The 'when' is only implied by context. | 2 / 3 |
Trigger Term Quality | Includes some relevant domain keywords like 'patient safety', 'CDSS', 'PHI', 'clinical workflow', and 'deployments', but uses technical jargon (CDSS, PHI) that users might not naturally say. Missing common variations like 'healthcare testing', 'clinical decision support', 'HIPAA', or 'medical app deployment'. | 2 / 3 |
Distinctiveness Conflict Risk | Highly distinctive niche combining healthcare, patient safety, deployment testing, CDSS, and PHI. Very unlikely to conflict with other skills given the specific healthcare deployment safety focus. | 3 / 3 |
Total | 10 / 12 Passed |
Implementation
77%Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, highly actionable skill with clear workflow sequencing and explicit validation gates appropriate for safety-critical healthcare deployments. Its main weakness is redundancy — the HIGH gate pass-rate scripts are repeated verbatim across sections, inflating token cost. The content would benefit from splitting the CI/CD configuration and examples into separate referenced files.
Suggestions
Extract the CI/CD YAML workflow into a separate referenced file (e.g., CI_PIPELINE.md or a template YAML) to reduce duplication and improve progressive disclosure.
Consolidate the HIGH gate pass-rate bash script into a single reusable snippet referenced by name rather than repeating it three times.
| Dimension | Reasoning | Score |
|---|---|---|
Conciseness | The content is mostly efficient but has notable redundancy — the bash scripts for HIGH gates appear three times (in the category descriptions, CI/CD YAML, and examples). The anti-patterns section and 'When to Use' list add value but could be tighter. Some explanatory text like 'Patient safety is non-negotiable' is filler. | 2 / 3 |
Actionability | Fully executable bash commands, a complete GitHub Actions YAML workflow, and concrete examples with expected output are provided. Every test category has a copy-paste-ready command, and the CI/CD integration is production-ready. | 3 / 3 |
Workflow Clarity | The five-step sequential process is clearly defined with explicit pass/fail thresholds, a pass/fail matrix table, bail-on-failure for critical gates, and clear escalation paths (BLOCK vs WARN with review). The ordering is intentional and validation is built into every step. | 3 / 3 |
Progressive Disclosure | The content is well-structured with clear sections and headers, but it's a long monolithic file (~150+ lines of substantive content). The CI/CD YAML, anti-patterns, and examples could be split into referenced files. No external references are provided for deeper topics like FHIR validation or HL7 parsing specifics. | 2 / 3 |
Total | 10 / 12 Passed |
Validation
90%Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
Total | 10 / 11 Passed | |
Reviewed
Table of Contents