CtrlK
BlogDocsLog inGet started
Tessl Logo

tdg-personal/healthcare-eval-harness

Patient safety evaluation harness for healthcare application deployments. Automated test suites for CDSS accuracy, PHI exposure, clinical workflow integrity, and integration compliance. Blocks deployments on safety failures.

77

Quality

77%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

SecuritybySnyk

Passed

No known issues

Overview
Quality
Evals
Security
Files

Quality

Discovery

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is strong in specificity and distinctiveness, clearly carving out a unique niche in healthcare deployment safety testing. However, it lacks an explicit 'Use when...' clause and relies on technical acronyms (CDSS, PHI) without expansion, which could hinder skill selection when users describe their needs in more natural language.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when deploying healthcare applications, running clinical safety checks, or validating medical decision support systems before release.'

Expand acronyms and add natural language variations: 'clinical decision support systems (CDSS)', 'protected health information (PHI)', 'HIPAA compliance', 'medical app testing'.

DimensionReasoningScore

Specificity

Lists multiple specific concrete actions: 'automated test suites for CDSS accuracy, PHI exposure, clinical workflow integrity, and integration compliance' and 'blocks deployments on safety failures.' These are concrete, domain-specific capabilities.

3 / 3

Completeness

Clearly answers 'what does this do' (patient safety evaluation, automated test suites, deployment blocking), but lacks an explicit 'Use when...' clause or equivalent trigger guidance. The 'when' is only implied by context.

2 / 3

Trigger Term Quality

Includes some relevant domain keywords like 'patient safety', 'CDSS', 'PHI', 'clinical workflow', and 'deployments', but uses technical jargon (CDSS, PHI) that users might not naturally say. Missing common variations like 'healthcare testing', 'clinical decision support', 'HIPAA', or 'medical app deployment'.

2 / 3

Distinctiveness Conflict Risk

Highly distinctive niche combining healthcare, patient safety, deployment testing, CDSS, and PHI. Very unlikely to conflict with other skills given the specific healthcare deployment safety focus.

3 / 3

Total

10

/

12

Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, highly actionable skill with clear workflow sequencing and explicit validation gates appropriate for safety-critical healthcare deployments. Its main weakness is redundancy — the HIGH gate pass-rate scripts are repeated verbatim across sections, inflating token cost. The content would benefit from splitting the CI/CD configuration and examples into separate referenced files.

Suggestions

Extract the CI/CD YAML workflow into a separate referenced file (e.g., CI_PIPELINE.md or a template YAML) to reduce duplication and improve progressive disclosure.

Consolidate the HIGH gate pass-rate bash script into a single reusable snippet referenced by name rather than repeating it three times.

DimensionReasoningScore

Conciseness

The content is mostly efficient but has notable redundancy — the bash scripts for HIGH gates appear three times (in the category descriptions, CI/CD YAML, and examples). The anti-patterns section and 'When to Use' list add value but could be tighter. Some explanatory text like 'Patient safety is non-negotiable' is filler.

2 / 3

Actionability

Fully executable bash commands, a complete GitHub Actions YAML workflow, and concrete examples with expected output are provided. Every test category has a copy-paste-ready command, and the CI/CD integration is production-ready.

3 / 3

Workflow Clarity

The five-step sequential process is clearly defined with explicit pass/fail thresholds, a pass/fail matrix table, bail-on-failure for critical gates, and clear escalation paths (BLOCK vs WARN with review). The ordering is intentional and validation is built into every step.

3 / 3

Progressive Disclosure

The content is well-structured with clear sections and headers, but it's a long monolithic file (~150+ lines of substantive content). The CI/CD YAML, anti-patterns, and examples could be split into referenced files. No external references are provided for deeper topics like FHIR validation or HL7 parsing specifics.

2 / 3

Total

10

/

12

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation10 / 11 Passed

Validation for skill structure

CriteriaDescriptionResult

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total

10

/

11

Passed

Reviewed

Table of Contents