`tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill validating-ai-ethics-and-fairness`

Validate AI/ML models and datasets for bias, fairness, and ethical concerns. Use when auditing AI systems for ethical compliance, fairness assessment, or bias detection. Trigger with phrases like "evaluate model fairness", "check for bias", or "validate AI ethics".
Validation
Score: 81%

| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| metadata_version | 'metadata' field is not a dictionary | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 13 / 16 Passed | |
Implementation
Score: 13%

This skill is verbose and abstract, explaining concepts Claude already understands while failing to provide any executable code or concrete commands. The workflow structure exists but lacks validation checkpoints, and the content organization is poor, with redundant sections and empty placeholders. It reads more like a conceptual overview than actionable guidance.
Suggestions
- Add executable Python code examples using Fairlearn or AIF360 showing actual bias detection (e.g., `from fairlearn.metrics import demographic_parity_difference; dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=gender)`); a hedged sketch follows this list.
- Remove explanatory content about what fairness metrics are and what bias means; Claude already knows this. Focus only on project-specific configurations or non-obvious implementation details.
- Add validation checkpoints to the workflow, such as "Verify sample sizes are sufficient before proceeding", with specific thresholds and commands to check them (see the sketch below).
- Move the Resources section to a separate RESOURCES.md file, and delete the empty Overview and Examples sections at the bottom.
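As a concrete illustration of the first and third suggestions, here is a minimal sketch of what an executable bias check with a validation checkpoint might look like. The input file name, column names, and the 50-sample threshold are hypothetical placeholders, not part of the skill; it assumes Fairlearn and pandas are installed.

```python
# Minimal sketch: demographic parity check with a sample-size checkpoint.
# Assumes a predictions file with y_true, y_pred, and gender columns
# (hypothetical names) and that fairlearn/pandas are installed.
import pandas as pd
from fairlearn.metrics import demographic_parity_difference

df = pd.read_csv("predictions.csv")  # hypothetical input file

# Validation checkpoint: every group must have enough samples before
# fairness metrics are computed on it.
MIN_GROUP_SIZE = 50  # illustrative threshold, tune per project
group_sizes = df["gender"].value_counts()
too_small = group_sizes[group_sizes < MIN_GROUP_SIZE]
if not too_small.empty:
    raise ValueError(f"Groups below {MIN_GROUP_SIZE} samples: {too_small.to_dict()}")

# Demographic parity difference: 0.0 means equal positive-prediction
# rates across groups; larger values indicate a bigger gap.
dpd = demographic_parity_difference(
    df["y_true"], df["y_pred"], sensitive_features=df["gender"]
)
print(f"Demographic parity difference: {dpd:.3f}")
```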
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose, with unnecessary explanations of concepts Claude already knows (what fairness metrics are, what demographic parity means). Contains redundant sections: the Overview repeats the intro, and the Examples section is an empty placeholder. Many bullet points explain concepts rather than provide actionable guidance. | 1 / 3 |
| Actionability | No executable code despite mentioning Python libraries like Fairlearn and AIF360. Instructions are vague ('Use the skill to examine', 'Load model predictions') without concrete commands or code examples. References tools but never shows how to use them. | 1 / 3 |
| Workflow Clarity | Steps are numbered and sequenced logically (Identify Scope → Analyze → Report → Mitigate) but lack validation checkpoints. No feedback loops for verifying bias detection results or confirming mitigation effectiveness. An error handling section exists but is disconnected from the workflow. | 2 / 3 |
| Progressive Disclosure | A monolithic wall of text with no references to external files. All content is inline, including detailed resources that could be split out. The Overview section at the bottom is misplaced and redundant, and the Examples section is an empty placeholder. | 1 / 3 |
| Total | | 5 / 12 |
Activation
Score: 90%

This is a well-structured description with strong completeness and trigger-term coverage. It explicitly addresses both what the skill does and when to use it, with natural-language triggers. The main weakness is its reliance on somewhat generic action verbs (validate, audit) rather than specific concrete capabilities.
Suggestions
- Add more specific concrete actions, such as 'analyze demographic parity', 'generate fairness metrics reports', 'test for disparate impact', or 'evaluate protected attribute correlations', to improve specificity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (AI/ML models and datasets) and some actions (validate, audit), but lacks specific concrete actions like 'analyze demographic parity metrics', 'generate fairness reports', or 'test for disparate impact'. | 2 / 3 |
| Completeness | Clearly answers both what (validate AI/ML models and datasets for bias, fairness, and ethical concerns) and when (an explicit 'Use when' clause, plus 'Trigger with phrases' providing additional explicit guidance). | 3 / 3 |
| Trigger Term Quality | Good coverage of terms users would naturally say: 'evaluate model fairness', 'check for bias', 'validate AI ethics', 'fairness assessment', 'bias detection', 'ethical compliance'. | 3 / 3 |
| Distinctiveness / Conflict Risk | A clear niche focused specifically on AI ethics, bias, and fairness validation. Distinct triggers like 'model fairness', 'bias detection', and 'AI ethics' are unlikely to conflict with general ML or data-processing skills. | 3 / 3 |
| Total | | 11 / 12 |
Reviewed