
preprocessing-data-with-automated-pipelines

Process automate data cleaning, transformation, and validation for ML tasks. Use when requesting "preprocess data", "clean data", "ETL pipeline", or "data transformation". Trigger with relevant phrases based on skill purpose.

45

Quality

33%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security (by Snyk)

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/ai-ml/data-preprocessing-pipeline/skills/preprocessing-data-with-automated-pipelines/SKILL.md

Quality

Discovery

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description has a reasonable structure with both 'what' and 'when' clauses, but suffers from moderate vagueness in its capability listing and includes a meaningless filler sentence ('Trigger with relevant phrases based on skill purpose') that adds no value. The trigger terms cover some common phrases but miss many natural variations users would employ when needing ML data preprocessing.

Suggestions

Replace the vague trailing sentence 'Trigger with relevant phrases based on skill purpose' with additional specific trigger terms like 'feature engineering', 'missing values', 'normalize data', 'data wrangling', 'encode categorical variables'.

Make capabilities more concrete by listing specific actions such as 'handle missing values, remove duplicates, normalize/scale features, encode categorical variables, split train/test sets' instead of broad categories.

Strengthen distinctiveness by emphasizing the ML-specific focus more clearly, e.g., mentioning specific file types (.csv, .parquet), frameworks (pandas, scikit-learn), or distinguishing from general ETL/database pipelines.
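Taken together, these suggestions could produce a frontmatter description along the following lines. This is an illustrative sketch only, not the maintainer's wording:

```yaml
---
name: preprocessing-data-with-automated-pipelines
description: >
  Clean and transform tabular data (.csv, .parquet) for ML with pandas and
  scikit-learn: handle missing values, remove duplicates, normalize/scale
  features, encode categorical variables, and split train/test sets. Use
  when requests mention "preprocess data", "clean data", "feature
  engineering", "missing values", "normalize data", "encode categorical
  variables", "impute", or "data wrangling".
---
```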

Dimension / Reasoning / Score

Specificity

Names the domain (ML data preprocessing) and some actions ('data cleaning, transformation, and validation'), but these are fairly broad categories rather than multiple specific concrete actions like 'remove duplicates, handle missing values, normalize columns, encode categorical features'.

2 / 3

Completeness

Explicitly answers both 'what' (data cleaning, transformation, and validation for ML tasks) and 'when' (with a 'Use when' clause listing specific trigger phrases). Despite the vague trailing sentence, the core structure covers both dimensions.

3 / 3

Trigger Term Quality

Includes some useful trigger terms like 'preprocess data', 'clean data', 'ETL pipeline', and 'data transformation', but misses common variations users might say such as 'feature engineering', 'missing values', 'normalize', 'encode', 'impute', or 'data wrangling'. The final sentence 'Trigger with relevant phrases based on skill purpose' is meaningless filler.

2 / 3

Distinctiveness Conflict Risk

The ML-specific focus helps narrow the scope somewhat, but terms like 'data cleaning' and 'data transformation' are broad enough to overlap with general data processing, database ETL, or analytics skills. The trailing generic sentence further dilutes distinctiveness.

2 / 3

Total: 9 / 12

Passed

Implementation

0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is almost entirely generic boilerplate with no actionable, skill-specific content. It describes what a data preprocessing pipeline should do in abstract terms but provides zero executable code, no concrete library recommendations, no actual pipeline patterns, and no validation steps. It reads like a template that was never filled in with real content.

Suggestions

Replace the abstract examples with actual executable Python code using specific libraries (e.g., pandas for cleaning, scikit-learn's preprocessing module for transformations) showing complete, copy-paste ready pipeline patterns.
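As a sketch of what "copy-paste ready" could look like here, the following pandas/scikit-learn pipeline imputes, scales, and one-hot encodes a toy frame. Column names, data, and imputation strategies are illustrative assumptions, not content from the skill itself:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame standing in for user data (columns are hypothetical).
df = pd.DataFrame({
    "age": [25, None, 47, 33],
    "income": [40_000, 55_000, None, 62_000],
    "city": ["NY", "SF", "NY", np.nan],
})

numeric = ["age", "income"]
categorical = ["city"]

# Impute then scale numeric columns; impute then one-hot categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 4): 2 scaled numeric + 2 one-hot city columns
```

A pattern like this is reusable as-is: swapping in real column lists is the only change a user needs to make.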

Remove all generic boilerplate sections (Integration, Prerequisites, Instructions, Output, Error Handling, Resources) and the 'How It Works' meta-description — these waste tokens describing Claude's own process rather than teaching it something new.

Add concrete validation checkpoints with code, e.g., schema validation with pandera, data quality checks after each transformation step, and explicit error recovery patterns.
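The suggestion names pandera for schema validation; a dependency-light sketch of the same idea, using only pandas and a hypothetical `check_quality` helper, might look like this:

```python
import pandas as pd

def check_quality(df: pd.DataFrame, required: list[str]) -> None:
    """Raise with a clear message instead of passing bad data downstream."""
    missing_cols = set(required) - set(df.columns)
    if missing_cols:
        raise ValueError(f"missing columns: {sorted(missing_cols)}")
    if df[required].isna().any().any():
        raise ValueError("NaNs remain after the imputation step")
    if df.duplicated().any():
        raise ValueError("duplicate rows survived deduplication")

clean = pd.DataFrame({"age": [25.0, 47.0], "income": [40_000.0, 62_000.0]})
check_quality(clean, ["age", "income"])  # passes silently

dirty = pd.DataFrame({"age": [25.0, None]})
try:
    check_quality(dirty, ["age", "income"])
except ValueError as e:
    print(e)  # missing columns: ['income']
```

Calling a checkpoint like this after each transformation step gives the explicit error-recovery points the review asks for.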

Include specific code patterns for common preprocessing tasks: handling missing values, encoding categoricals, scaling features, detecting outliers — with actual function implementations rather than descriptions.
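To make the last point concrete, here is one possible set of small implementations for those four tasks. Function names and thresholds (mean imputation, 1.5 × IQR) are assumptions chosen for illustration:

```python
import pandas as pd

def impute_numeric(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Fill missing numeric values with each column's mean."""
    out = df.copy()
    out[cols] = out[cols].fillna(out[cols].mean())
    return out

def encode_categoricals(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """One-hot encode categorical columns with pandas."""
    return pd.get_dummies(df, columns=cols)

def scale_minmax(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Rescale the given columns to the [0, 1] range."""
    out = df.copy()
    for c in cols:
        lo, hi = out[c].min(), out[c].max()
        out[c] = (out[c] - lo) / (hi - lo)
    return out

def flag_outliers_iqr(s: pd.Series) -> pd.Series:
    """Boolean mask: True where a value falls outside 1.5 * IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

df = pd.DataFrame({
    "x": [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 100.0],
    "c": ["a", "b", "a", "b", "a", "b", "a", "b"],
})
df = impute_numeric(df, ["x"])
outliers = flag_outliers_iqr(df["x"])
df = encode_categoricals(df, ["c"])
df = scale_minmax(df, ["x"])
print(int(outliers.sum()))  # 1: only the 100.0 row is flagged
```

Shipping function bodies like these, rather than prose descriptions of them, is what would move the Actionability score.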

Dimension / Reasoning / Score

Conciseness

Extremely verbose and padded with content Claude already knows. The 'Overview' section restates the title, 'How It Works' describes Claude's own reasoning process, 'When to Use This Skill' repeats the frontmatter description, and sections like 'Best Practices', 'Integration', 'Instructions', 'Output', and 'Error Handling' are generic boilerplate with no skill-specific value.

1 / 3

Actionability

No executable code, no concrete commands, no specific library usage, no copy-paste ready examples. The examples describe what 'the skill will' do in abstract terms rather than providing actual Python code. Phrases like 'using appropriate techniques (e.g., mean imputation)' are vague descriptions, not actionable instructions.

1 / 3

Workflow Clarity

The 'How It Works' section lists abstract meta-steps (analyze, generate, execute, provide metrics) rather than concrete pipeline steps. No validation checkpoints, no feedback loops for error recovery, and no specific sequencing of actual data preprocessing operations. The examples describe outcomes without showing how to achieve them.

1 / 3

Progressive Disclosure

Monolithic wall of text with no references to external files and no bundle files to support it. Content is poorly organized with multiple generic sections (Integration, Prerequisites, Instructions, Output, Error Handling, Resources) that add no value and could be removed entirely. The 'Resources' section lists placeholder items with no actual links.

1 / 3

Total: 4 / 12

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

Criteria / Description / Result

allowed_tools_field

'allowed-tools' contains unusual tool name(s)

Warning

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 9 / 11

Passed

Repository: jeremylongshore/claude-code-plugins-plus-skills (Reviewed)

