Python data science: notebook structure, data validation, reproducibility, and model documentation
Organize notebooks in this order:
Keep notebooks for exploration. Move reusable logic to src/ Python modules with tests.
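As a minimal sketch of that split, assume a hypothetical module `src/features.py` holding one helper, with its test living under `tests/`. The function name, module path, and choice of helper are illustrative assumptions, not part of the skill itself.

```python
# Hypothetical contents of src/features.py (name and path are assumptions).
from statistics import mean, pstdev


def zscore(values: list[float]) -> list[float]:
    """Standardize values to zero mean and unit (population) standard deviation."""
    if not values:
        raise ValueError("values must not be empty")
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant input has no spread; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical contents of tests/test_features.py, runnable with pytest.
def test_zscore_centers_and_scales() -> None:
    result = zscore([1.0, 2.0, 3.0])
    assert abs(sum(result)) < 1e-9  # standardized values sum to zero
    assert result[1] == 0.0         # the middle value sits at the mean
    assert result[0] == -result[2]  # symmetric input gives symmetric output
```

The notebook would then import `zscore` from the module rather than redefining it in a cell, so the same tested code serves both exploration and any later pipeline.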
# Always validate after loading
assert df.shape[0] > 0, "DataFrame is empty"
assert df.isnull().sum().sum() == 0, f"Nulls found: {df.isnull().sum()}"
assert df['price'].between(0, 1_000_000).all(), "Price out of expected range"
# For production pipelines: use pandera
import pandera as pa
schema = pa.DataFrameSchema({
    "price": pa.Column(float, pa.Check.ge(0)),
    "category": pa.Column(str, pa.Check.isin(["A", "B", "C"])),
})
schema.validate(df)

import random
import numpy as np
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
# sklearn: pass random_state=SEED to all estimators
requirements.txt or pyproject.toml.
experiments/ log with metadata JSON.
model_rf_20260115_v1.pkl.
Document for every model:
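A minimal sketch of how the `experiments/` metadata JSON and a dated model file name might be produced, which is also one place per-model documentation could live. The file-name pattern is inferred from the single example `model_rf_20260115_v1.pkl`, and the metadata fields are assumptions; the source does not list them.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def model_filename(algo: str, trained_on: date, version: int) -> str:
    """Build a name shaped like model_rf_20260115_v1.pkl (pattern is an assumption)."""
    return f"model_{algo}_{trained_on:%Y%m%d}_v{version}.pkl"


def log_experiment(log_dir: Path, name: str, metadata: dict) -> Path:
    """Write one experiment's metadata to <log_dir>/<name>.json and return the path."""
    log_dir.mkdir(parents=True, exist_ok=True)
    out_path = log_dir / f"{name}.json"
    out_path.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return out_path


# Usage: write into a temporary experiments/ directory so the sketch has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    artifact = model_filename("rf", date(2026, 1, 15), 1)
    # Field names and parameter values below are hypothetical placeholders.
    record = {"seed": 42, "model_file": artifact, "params": {"n_estimators": 100}}
    written = log_experiment(Path(tmp) / "experiments", Path(artifact).stem, record)
    loaded = json.loads(written.read_text())
```

Keeping the seed and the artifact name in the same record makes it possible to trace a saved model back to the run that produced it.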
# Extract transforms into sklearn Pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=SEED)),
])
tests/.
pathlib.Path: never hardcode file paths.
logging not print in production code.