preprocessing-data-with-automated-pipelines

tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill preprocessing-data-with-automated-pipelines

Process automate data cleaning, transformation, and validation for ML tasks. Use when requesting "preprocess data", "clean data", "ETL pipeline", or "data transformation". Trigger with relevant phrases based on skill purpose.

Overall: 43%

Validation: 81%

Implementation: 7%

Activation: 67%

Validation

81%

Warnings & errors only

Criteria	Description	Result
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning
metadata_version	'metadata' field is not a dictionary	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	13 / 16 Passed

Implementation

This skill is essentially a template or placeholder with no actionable content. It describes what a data preprocessing skill should do in abstract terms but provides zero executable code, no concrete examples, and no specific guidance. The content explains concepts Claude already understands while failing to provide the actual implementation details that would make this skill useful.

Suggestions

Replace abstract descriptions with executable Python code examples using pandas, scikit-learn, or similar libraries for common preprocessing tasks (e.g., handling missing values, encoding categorical variables, scaling)

Remove the verbose 'Overview', 'How It Works', and 'When to Use' sections - replace with a quick-start code snippet that demonstrates immediate value

Add concrete validation steps with actual commands, such as data quality checks, schema validation, and before/after metrics

Provide specific, copy-paste ready code for the examples instead of describing what 'the skill will do'
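As an illustration of the kind of quick-start snippet these suggestions ask for, the following is a minimal sketch that uses pandas and scikit-learn to impute missing values, scale numeric features, and one-hot encode a categorical column. The toy data, the column names, and the choice of median and most-frequent imputation are assumptions made for the example; none of them come from the reviewed skill.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 33.0],
    "income": [40000.0, 52000.0, np.nan, 61000.0],
    "city": np.array(["Oslo", "Lima", np.nan, "Oslo"], dtype=object),
})

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0.0 asks ColumnTransformer to always return a dense array.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_cols),
        ("cat", categorical_pipeline, categorical_cols),
    ],
    sparse_threshold=0.0,
)

features = preprocessor.fit_transform(df)
print(features.shape)
```

Wrapping the steps in a `ColumnTransformer` keeps the fitted imputation and scaling statistics together, so the same object can later be applied to held-out data with `preprocessor.transform(...)`.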

Dimension	Reasoning	Score
Conciseness	Extremely verbose with extensive explanations of concepts Claude already knows (what ETL is, what data preprocessing means). The 'Overview', 'How It Works', and 'When to Use' sections are largely redundant and explain obvious concepts rather than providing actionable guidance.	1 / 3
Actionability	No executable code anywhere in the skill. Examples describe what 'the skill will do' in abstract terms rather than providing actual Python code snippets. Instructions like 'Invoke this skill when trigger conditions are met' are completely vague and non-actionable.	1 / 3
Workflow Clarity	The 'How It Works' section lists abstract steps without any concrete validation checkpoints or feedback loops. No actual commands, no validation steps, and the 'Instructions' section is generic boilerplate that provides no real workflow guidance for data preprocessing tasks.	1 / 3
Progressive Disclosure	The content has some structural organization with headers, but it's a monolithic document with no references to external files for detailed content. The 'Resources' section mentions documentation but provides no actual links or file references.	2 / 3
	Total	5 / 12 Passed
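The validation checkpoints that the Workflow Clarity row finds missing could look roughly like the sketch below: a small pandas check run before and after cleaning, so the two results can be compared. The function name `validate`, the null threshold, and the sample data are hypothetical and chosen only for illustration.

```python
import pandas as pd


def validate(df, required_numeric, max_null_fraction=0.0):
    """Return a list of problems found; an empty list means the frame passed."""
    problems = []
    # Schema check: required columns must exist and be numeric.
    for col in required_numeric:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"non-numeric column: {col}")
    # Quality check: fraction of nulls per column must stay under the threshold.
    for col, fraction in df.isna().mean().items():
        if fraction > max_null_fraction:
            problems.append(f"too many nulls in {col}: {fraction:.0%}")
    # Quality check: no fully duplicated rows.
    if df.duplicated().any():
        problems.append("duplicate rows present")
    return problems


# Hypothetical raw data with one null and one duplicated row.
raw = pd.DataFrame({
    "age": [25.0, None, 47.0, 47.0],
    "income": [40000.0, 52000.0, 61000.0, 61000.0],
})

# Simple cleaning step (an assumption): drop duplicates, fill nulls with the column median.
cleaned = raw.drop_duplicates().fillna(raw.median(numeric_only=True))

before = validate(raw, ["age", "income"])
after = validate(cleaned, ["age", "income"])
print("before:", before)
print("after:", after)
```

Running the same check on both frames gives a simple before/after comparison: the raw frame reports the null and duplicate problems, and the cleaned frame should report none.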

Activation

67%

The description provides adequate structure with explicit 'Use when' triggers and covers the ML data processing domain, but suffers from vague action descriptions and includes meaningless filler text ('Trigger with relevant phrases based on skill purpose'). The trigger terms are reasonable but incomplete, missing common ML preprocessing vocabulary.

Suggestions

Replace vague actions with specific concrete operations like 'handle missing values, normalize features, encode categorical variables, remove outliers, split train/test sets'

Remove the meaningless filler sentence 'Trigger with relevant phrases based on skill purpose' and instead add more natural trigger terms like 'feature engineering', 'data wrangling', 'normalize data', 'handle nulls'

Add file format triggers to improve distinctiveness, such as 'CSV cleaning', 'DataFrame preprocessing', or 'tabular data preparation'

Dimension	Reasoning	Score
Specificity	Names the domain (ML data processing) and lists general actions (cleaning, transformation, validation), but lacks concrete specifics like 'handle missing values', 'normalize features', or 'encode categorical variables'.	2 / 3
Completeness	Explicitly answers both what (data cleaning, transformation, validation for ML) and when (with a 'Use when' clause listing specific trigger phrases), meeting the rubric requirement for explicit triggers.	3 / 3
Trigger Term Quality	Includes some useful trigger terms ('preprocess data', 'clean data', 'ETL pipeline', 'data transformation') but the final sentence 'Trigger with relevant phrases based on skill purpose' is vague filler that adds no value. Missing common variations like 'feature engineering', 'data wrangling', 'normalize', 'missing values'.	2 / 3
Distinctiveness Conflict Risk	The ML focus helps distinguish it from general data processing, but 'data transformation' and 'ETL pipeline' could overlap with database or general data engineering skills. The niche is somewhat defined but not sharply bounded.	2 / 3
	Total	9 / 12 Passed

Reviewed

16 days ago
