Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.
88
Does it follow best practices? 75%
Impact: 98%
1.10x average score across 6 eval scenarios
Passed: No known issues

Pipeline construction with mixed data
Uses Pipeline: 100% / 100%
ColumnTransformer for mixed data: 100% / 100%
Numeric imputation strategy: 100% / 100%
Categorical imputation strategy: 100% / 100%
handle_unknown='ignore': 100% / 100%
Stratified split: 100% / 100%
random_state set: 100% / 100%
Fit on train only: 100% / 100%
Meaningful step names: 100% / 100%
Feature scaling present: 100% / 100%
Uses sklearn Pipeline not manual: 100% / 100%
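
A minimal sketch of code that would meet the criteria above, assuming pandas is available and using a hypothetical DataFrame with invented columns `age`, `income` and `city`. LogisticRegression is an arbitrary classifier choice here, not something the scenario specifies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns, one categorical, with missing values.
rng = np.random.RandomState(42)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50_000, 15_000, n),
    "city": rng.choice(["a", "b", "c"], n).astype(object),
})
df.loc[::17, "age"] = np.nan
df.loc[::23, "city"] = np.nan
y = (df["income"] > 50_000).astype(int)

numeric_features = ["age", "income"]
categorical_features = ["city"]

preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("imputer", SimpleImputer(strategy="median")),      # robust to outliers
        ("scaler", StandardScaler()),
    ]), numeric_features),
    ("cat", Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),  # unseen categories -> all zeros
    ]), categorical_features),
])

model = Pipeline([
    ("preprocess", preprocessor),
    ("classifier", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=42
)
model.fit(X_train, y_train)  # every preprocessing step is fit on training rows only
test_accuracy = model.score(X_test, y_test)

# A category never seen during training is ignored instead of raising an error.
unseen = pd.DataFrame({"age": [30.0], "income": [40_000.0], "city": ["zzz"]})
unseen_pred = model.predict(unseen)
```

Because all preprocessing lives inside the Pipeline, `model.fit` learns the imputation medians, scaling statistics and category sets from the training split alone, which is what the "fit on train only" criterion checks.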

Hyperparameter tuning with imbalanced data
Uses Pipeline: 100% / 100%
GridSearchCV or RandomizedSearchCV: 100% / 100%
Double-underscore param notation: 100% / 100%
StratifiedKFold or stratify in CV: 100% / 100%
Imbalanced metric used: 100% / 100%
Does NOT rely on accuracy alone: 100% / 100%
class_weight or resampling: 100% / 100%
n_jobs=-1 in search: 100% / 100%
Stratified split: 100% / 100%
random_state set: 100% / 100%
best_estimator_ used: 100% / 100%
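
One possible shape for this scenario, sketched on synthetic data from `make_classification` with roughly a 90/10 class split. F1 as the scoring metric, `class_weight="balanced"` and the `C` grid are illustrative assumptions, not settings prescribed by the skill.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data: about 90% negatives and 10% positives.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# "<step name>__<parameter>" addresses a parameter of a step inside the Pipeline.
param_grid = {"classifier__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="f1",  # imbalance-aware metric instead of plain accuracy
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    n_jobs=-1,     # use all available cores
)
search.fit(X_train, y_train)

best_model = search.best_estimator_  # refit on the whole training split (refit=True)
y_pred = best_model.predict(X_test)
test_f1 = f1_score(y_test, y_pred)
print(search.best_params_)
print(classification_report(y_test, y_pred))
```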

Clustering analysis with optimal K selection
Scales before clustering: 100% / 100%
Silhouette score for optimal K: 100% / 100%
k-means++ init: 0% / 100%
random_state set: 100% / 100%
PCA for visualization: 100% / 100%
Scatter plot saved: 100% / 100%
Silhouette score reported: 100% / 100%
Fit scaler on full data: 100% / 100%
Cluster count reported: 100% / 100%
Does NOT skip scaling: 100% / 100%
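
A sketch of silhouette-based K selection on synthetic blobs. The K range of 2 to 8 and `n_init=10` are assumptions. The plotting step is left out to avoid a matplotlib dependency, so `coords` holds the 2-D PCA projection one would scatter-plot and save.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data: 4 groups in 6 dimensions (labels discarded).
X, _ = make_blobs(n_samples=400, centers=4, n_features=6, random_state=0)

# k-means relies on Euclidean distance, so features are scaled first.
# Fitting the scaler on all rows is fine here: there is no held-out set.
X_scaled = StandardScaler().fit_transform(X)

scores = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    labels = km.fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)  # K with the highest silhouette score
final = KMeans(n_clusters=best_k, init="k-means++", n_init=10, random_state=0)
final_labels = final.fit_predict(X_scaled)

# 2-D projection for visualization only; clustering used all 6 scaled features.
coords = PCA(n_components=2, random_state=0).fit_transform(X_scaled)
print(f"clusters: {best_k}, silhouette: {scores[best_k]:.3f}")
```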

Regression pipeline with persistence
Uses Pipeline: 100% / 100%
TransformedTargetRegressor: 0% / 100%
Target inverse transform: 0% / 100%
RMSE reported: 100% / 100%
MAE reported: 100% / 100%
R2 reported: 100% / 100%
joblib persistence: 0% / 0%
model.pkl exists: 100% / 100%
Fit on train only: 100% / 100%
random_state set: 100% / 100%
results.json structure: 75% / 100%
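
A sketch of the regression scenario, assuming a synthetic, positive, right-skewed target so that a `log1p`/`expm1` target transform is reasonable. The `results.json` keys (`rmse`, `mae`, `r2`) are a hypothetical layout, since the scenario does not spell out the expected structure; files are written to a temporary directory.

```python
import json
import os
import tempfile

import joblib
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic right-skewed target: exponential of a noisy linear signal.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 5))
y = np.exp(X @ np.array([0.5, -0.3, 0.2, 0.0, 0.1]) + rng.normal(scale=0.1, size=500))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = TransformedTargetRegressor(
    regressor=Pipeline([("scaler", StandardScaler()), ("ridge", Ridge(alpha=1.0))]),
    func=np.log1p,          # applied to y before the regressor is fit
    inverse_func=np.expm1,  # maps predictions back to the original scale
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)  # already on the original target scale

results = {
    "rmse": float(np.sqrt(mean_squared_error(y_test, y_pred))),
    "mae": float(mean_absolute_error(y_test, y_pred)),
    "r2": float(r2_score(y_test, y_pred)),
}

out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.pkl")
joblib.dump(model, model_path)
with open(os.path.join(out_dir, "results.json"), "w") as f:
    json.dump(results, f, indent=2)

restored = joblib.load(model_path)  # reload to confirm the saved model round-trips
```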

Text classification with TF-IDF pipeline
TfidfVectorizer used: 100% / 100%
stop_words parameter: 0% / 100%
ngram_range parameter: 100% / 100%
max_features parameter: 100% / 100%
min_df or max_df parameter: 100% / 100%
Uses Pipeline: 100% / 100%
Stratified split: 100% / 100%
classification_report used: 100% / 100%
classifier.pkl saved: 100% / 100%
Meaningful step names: 100% / 100%
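
A sketch of a TF-IDF text-classification pipeline on a hypothetical, templated toy corpus. The specific `TfidfVectorizer` settings and the LogisticRegression classifier are illustrative assumptions, not values taken from the skill.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 20 positive and 20 negative templated reviews.
items = ["phone", "charger", "laptop", "headset"]
positive_words = ["great", "excellent", "love", "fantastic", "reliable"]
negative_words = ["terrible", "broken", "hate", "awful", "useless"]
texts, labels = [], []
for item in items:
    for word in positive_words:
        texts.append(f"the {item} is {word} and I would buy it again")
        labels.append(1)
    for word in negative_words:
        texts.append(f"the {item} is {word} and I want a refund")
        labels.append(0)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

text_clf = Pipeline([
    ("tfidf", TfidfVectorizer(
        stop_words="english",  # drop common English words such as "the" and "and"
        ngram_range=(1, 2),    # unigrams and bigrams
        max_features=5000,     # cap the vocabulary size
        min_df=1,              # keep terms that appear in at least one document
        max_df=0.9,            # drop terms that appear in over 90% of documents
    )),
    ("classifier", LogisticRegression(max_iter=1000)),
])
text_clf.fit(X_train, y_train)
print(classification_report(y_test, text_clf.predict(X_test)))

out_path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
joblib.dump(text_clf, out_path)
```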

Temporal cross-validation for forecasting
TimeSeriesSplit used: 100% / 100%
No random split for validation: 100% / 100%
RMSE reported per model: 100% / 100%
MAE reported per model: 100% / 100%
Multiple models compared: 100% / 100%
cross_validate or cross_val_score: 62% / 100%
cv_results.json structure: 100% / 100%
Data is time-ordered: 100% / 100%
random_state set: 100% / 100%
Pipeline or fit-on-train: 100% / 100%
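
A sketch of temporal cross-validation on a synthetic daily series with hypothetical 7-day lag features. Ridge and RandomForestRegressor are arbitrary models chosen for the comparison, and the `cv_results` dictionary stands in for `cv_results.json` with an assumed layout.

```python
import json

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic daily series (trend + weekly seasonality + noise), kept in time order.
rng = np.random.RandomState(0)
t = np.arange(365)
series = 0.05 * t + 2 * np.sin(2 * np.pi * t / 7) + rng.normal(scale=0.5, size=t.size)

# Lag features: row j holds series[j..j+6] and its target is series[j+7].
n_lags = 7
X = np.column_stack([series[i : len(series) - n_lags + i] for i in range(n_lags)])
y = series[n_lags:]

# Each fold trains on earlier observations and tests on the ones that follow.
tscv = TimeSeriesSplit(n_splits=5)
models = {
    "ridge": Pipeline([("scaler", StandardScaler()), ("ridge", Ridge(alpha=1.0))]),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

cv_results = {}
for name, estimator in models.items():
    scores = cross_validate(
        estimator, X, y, cv=tscv,
        scoring=("neg_root_mean_squared_error", "neg_mean_absolute_error"),
    )
    # Scorers are negated so that "higher is better"; flip the sign back.
    cv_results[name] = {
        "rmse": float(-scores["test_neg_root_mean_squared_error"].mean()),
        "mae": float(-scores["test_neg_mean_absolute_error"].mean()),
    }
print(json.dumps(cv_results, indent=2))
```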