Enforces Polars over Pandas for functional pipe-style data manipulation (like dplyr in R). Use when writing Python data processing code, data transformation pipelines, ETL workflows, or analytical queries—e.g., "process this CSV", "aggregate sales data", "filter and transform DataFrame", "group by and calculate metrics".
Enforce Polars as the default for all Python data manipulation. Polars provides a functional, pipe-style API similar to dplyr in R—code reads as a clear series of composable transformations rather than imperative steps. This produces more readable, maintainable, and performant data pipelines.
YOU MUST use Polars for ALL data manipulation in Python. No exceptions for new code.
IMMEDIATELY upon loading this skill:
- Default to Polars for every new data-manipulation task.
- Write transformations as pipe-style method chains (the Polars analogue of dplyr's `%>%`).

If you MUST use pandas (ML ecosystem compatibility only):
- Do all manipulation in Polars, then convert at the boundary with `df_polars.to_pandas()`.

Polars represents the modern functional approach to data manipulation—like dplyr in R, it emphasizes readable pipelines over imperative code. Legacy pandas patterns (in-place mutation, index manipulation, row iteration) are technical debt.
| Principle | Polars (Good) | Pandas (Bad) |
|---|---|---|
| Method chains | Continuous pipeline | Breaking into separate statements |
| Readability | Each step clearly named | Mental state tracking required |
| Expressions | `pl.col("x") * 2` | `df["x"] * 2` (implicit) |
| Immutability | `df.with_columns(...)` returns new | `df["x"] = y` mutates in place |
| Functional | Data flows through transformations | Imperative step-by-step |
GOOD - Clear pipe-style chain:
import polars as pl
result = (
    pl.scan_csv("sales.csv", try_parse_dates=True)  # 1. Start with data source
.filter(pl.col("date") >= "2024-01-01") # 2. Filter to relevant rows
.with_columns( # 3. Add computed columns
revenue=pl.col("units") * pl.col("price"),
is_weekend=pl.col("date").dt.weekday().is_in([6, 7]),
)
.group_by(["region", "is_weekend"]) # 4. Group for aggregation
.agg( # 5. Calculate metrics
total_revenue=pl.col("revenue").sum(),
order_count=pl.col("order_id").count(),
avg_order=pl.col("revenue").mean(),
)
.filter(pl.col("order_count") > 10) # 6. Filter aggregated results
.sort(["region", "total_revenue"], descending=[False, True])
.collect() # 7. Execute pipeline
)

BAD - Broken into imperative steps:
import pandas as pd # WRONG - using pandas
df = pd.read_csv("sales.csv", parse_dates=["date"])  # Eager load
df = df[df["date"] >= "2024-01-01"] # Filter
df["revenue"] = df["units"] * df["price"] # Mutate
df["is_weekend"] = df["date"].dt.weekday.isin([5, 6])  # More mutation (pandas weekday is 0-based)
result = df.groupby(["region", "is_weekend"]).agg({ # Group and aggregate
"revenue": ["sum", "mean", "count"]
})

Prefer:
- `pl.scan_csv()`, `pl.scan_parquet()` for large data
- `pl.col("name")` over string indexing
- `.with_columns()` for adding or modifying columns
- `context7_query-docs(libraryId="/pola-rs/polars", query="...")` before writing code

Avoid:
- `iter_rows()` - use vectorized expressions
- `apply()` - use native Polars expressions

| Operation | Pandas (Imperative) | Polars (Pipe-Style) |
|---|---|---|
| Read CSV (large) | `pd.read_csv()` then filter | `pl.scan_csv().filter(...).collect()` |
| Select columns | `df[["a", "b"]]` | `df.select("a", "b")` |
| Filter rows | `df[df.a > 10]` | `df.filter(pl.col("a") > 10)` |
| Add column | `df["c"] = df.a + df.b` | `df.with_columns(c=pl.col("a") + pl.col("b"))` |
| Group by + agg | `df.groupby("x").y.sum()` | `df.group_by("x").agg(pl.col("y").sum())` |
| Window/rank | `df.groupby("x").y.rank()` | `df.with_columns(pl.col("y").rank().over("x"))` |
| Conditional | `np.where(df.a > 10, "high", "low")` | `pl.when(pl.col("a") > 10).then(pl.lit("high")).otherwise(pl.lit("low"))` |
| Join | `df1.merge(df2, on="id")` | `df1.join(df2, on="id")` |
| Sort | `df.sort_values("a")` | `df.sort("a")` |
| Drop duplicates | `df.drop_duplicates()` | `df.unique()` |
| Missing values | `df.fillna(0)` | `df.fill_null(0)` |
Master these patterns for pipe-style code:
# Column reference and arithmetic
pl.col("revenue") / pl.col("units")
# Conditional logic (CASE WHEN equivalent)
pl.when(pl.col("age") >= 18).then(pl.lit("adult")).otherwise(pl.lit("minor"))  # wrap literals in pl.lit()
# String operations
pl.col("name").str.to_uppercase()
pl.col("email").str.contains("@")
# Date/time operations
pl.col("timestamp").dt.year()
pl.col("date").dt.truncate("1d")
# Aggregations (use in .agg() context)
pl.col("value").sum()
pl.col("value").mean()
pl.col("id").n_unique()
# Window functions (use .over() for group transforms)
pl.col("value").sum().over("category") # Category total per row
pl.col("value").rank().over("category")  # Rank within category

YOU MUST query Context7 for Polars APIs before writing implementation code.
# Resolve library ID first
context7_resolve-library-id(libraryName="polars", query="Polars DataFrame library")
# Then query for specific operations
context7_query-docs(libraryId="/pola-rs/polars", query="lazy scan filter example")

Context7 provides current, accurate API documentation. Pre-trained knowledge of Polars APIs will be stale and cause bugs.