Plotnine is a Grammar of Graphics implementation for Python that provides a declarative approach to data visualization, similar to R's ggplot2.

Plotnine includes a comprehensive collection of datasets commonly used for data visualization examples, tutorials, and exploration. These datasets provide real-world data across various domains including economics, biology, automotive, and demographics.
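To see which datasets a given installation actually ships, one option is to inspect the data module for DataFrame attributes. The following is a minimal sketch rather than a plotnine feature: `list_dataframes` is a hypothetical helper name, and it assumes the datasets are exposed as ordinary module attributes.

```python
import types

import pandas as pd


def list_dataframes(module: types.ModuleType) -> list[str]:
    """Return sorted public attribute names of `module` that hold DataFrames."""
    names = []
    for name in dir(module):
        # Skip private and dunder attributes
        if name.startswith("_"):
            continue
        if isinstance(getattr(module, name), pd.DataFrame):
            names.append(name)
    return sorted(names)


# Hypothetical usage, assuming plotnine is installed:
# import plotnine.data as data
# print(list_dataframes(data))
```

The helper only relies on pandas, so it can be pointed at any module that exposes DataFrames.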
```python
# Import specific datasets
from plotnine.data import mtcars, diamonds, economics

# Import all datasets (not recommended for production)
from plotnine.data import *

# Access via module reference
import plotnine.data as data
df = data.mtcars
```

```python
# Motor Trend Car Road Tests - 32 automobiles (1973-74 models)
mtcars: pandas.DataFrame
# Columns: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
```
```python
# Fuel economy data from 1999 and 2008 for 38 popular car models
mpg: pandas.DataFrame
# Columns: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class
```

Usage Example:

```python
from plotnine import ggplot, aes, geom_point
from plotnine.data import mtcars

# Scatter plot of weight vs mpg
plot = (ggplot(mtcars, aes(x='wt', y='mpg')) +
        geom_point())
```

```python
# Prices and attributes of ~54,000 diamonds
diamonds: pandas.DataFrame
# Columns: price, carat, cut, color, clarity, x, y, z, depth, table
# cut: Factor with levels Fair, Good, Very Good, Premium, Ideal
# color: Factor with levels D, E, F, G, H, I, J (D=best, J=worst)
# clarity: Factor with levels I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF (I1=worst, IF=best)
```

```python
# US economic time series from FRED database
economics: pandas.DataFrame
# Columns: date, psavert, pce, unemploy, uempmed, pop
```
```python
# US economic data in long format for easier visualization
economics_long: pandas.DataFrame
# Same data as economics but in tidy/long format
```

```python
# Palmer Penguins - 3 species from Antarctica
penguins: pandas.DataFrame
# Columns: species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex, year
# species: Factor with levels Adelie, Chinstrap, Gentoo

# Updated mammals sleep dataset
msleep: pandas.DataFrame
# Columns: name, genus, vore, order, conservation, sleep_total, sleep_rem, sleep_cycle, awake, brainwt, bodywt
```

```python
# Midwest demographics by county
midwest: pandas.DataFrame
# Columns: PID, county, state, area, poptotal, popdensity, popwhite, popblack, etc.

# Texas housing market data from TAMU real estate center
txhousing: pandas.DataFrame
# Columns: city, year, month, sales, volume, median, listings, inventory, date
```

```python
# Old Faithful Geyser eruption data
faithful: pandas.DataFrame
# Columns: eruptions, waiting
```
```python
# Old Faithful data with density estimates (grid format)
faithfuld: pandas.DataFrame
# Columns: eruptions, waiting, density

# Lake Huron water levels 1875-1972
huron: pandas.DataFrame
# Columns: year, level, decade

# Vector field of seal movements
seals: pandas.DataFrame
# Columns: lat, long, delta_long, delta_lat
```

```python
# US meat production by month (millions of lbs)
meat: pandas.DataFrame
# Columns: date, beef, veal, pork, lamb_and_mutton, broilers, other_chicken, turkey

# Website pageview data
pageviews: pandas.DataFrame
# Columns: date, pageviews
```

```python
# Terms of 11 US presidents from Eisenhower to Obama
presidential: pandas.DataFrame
# Columns: name, start, end, party
```

```python
# Anscombe's Quartet - 4 datasets with nearly identical summary statistics
anscombe_quartet: pandas.DataFrame
# Columns: dataset, x, y
```
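Anscombe's Quartet is usually introduced by comparing summary statistics across its four groups. The sketch below computes them with pandas, assuming the `dataset`, `x`, and `y` columns listed above; `summarize_by_dataset` is a hypothetical helper name and is not part of plotnine.

```python
import pandas as pd


def summarize_by_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Per-dataset mean and variance of x and y, plus the x-y correlation."""
    # observed=True keeps only groups that actually occur if `dataset` is categorical
    grouped = df.groupby("dataset", observed=True)[["x", "y"]]
    summary = grouped.agg(["mean", "var"])
    # Flatten the (column, statistic) MultiIndex into names like "x_mean"
    summary.columns = [f"{col}_{stat}" for col, stat in summary.columns]
    summary["xy_corr"] = grouped.apply(lambda g: g["x"].corr(g["y"]))
    return summary


# Hypothetical usage, assuming plotnine is installed:
# from plotnine.data import anscombe_quartet
# print(summarize_by_dataset(anscombe_quartet).round(2))
```

Plotting the same data with `facet_wrap('dataset')` then shows how differently the four groups are distributed despite similar summaries.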
```python
# Colors in Luv color space
luv_colours: pandas.DataFrame
# Columns: L, u, v, col
```

```python
from plotnine import ggplot, aes, geom_histogram, geom_point, facet_wrap
from plotnine.data import diamonds, penguins

# Explore diamond prices
price_dist = (ggplot(diamonds, aes(x='price')) +
              geom_histogram(bins=30) +
              facet_wrap('cut'))

# Penguin species comparison
penguin_plot = (ggplot(penguins, aes(x='bill_length_mm', y='bill_depth_mm', color='species')) +
                geom_point())
```

```python
from plotnine import ggplot, aes, geom_line
from plotnine.data import economics

# Economic trends over time
econ_plot = (ggplot(economics, aes(x='date', y='unemploy')) +
             geom_line())
```

```python
from plotnine import ggplot, aes, geom_point, stat_smooth
from plotnine.data import mtcars

# Regression analysis
regression_plot = (ggplot(mtcars, aes(x='wt', y='mpg')) +
                   geom_point() +
                   stat_smooth(method='lm'))
```

| Category | Datasets | Use Cases |
|---|---|---|
| Automotive | mtcars, mpg | Regression, clustering, factor analysis |
| Economics | economics, economics_long, txhousing | Time series, trend analysis |
| Biology | penguins, msleep, faithful | Species comparison, behavioral analysis |
| Geography | midwest, seals, huron | Spatial analysis, movement patterns |
| Retail | diamonds | Price modeling, categorical analysis |
| Food | meat | Production trends, seasonal patterns |
| Politics | presidential | Timeline analysis, categorical data |
| Statistics | anscombe_quartet, luv_colours | Statistical education, color analysis |
All datasets are provided as pandas DataFrames with appropriate data types, including categorical columns where relevant, so that discrete variables such as `cut` or `species` plot in a consistent level order.
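To check which columns of a dataset are categorical and what their levels are, the dtypes can be inspected directly with pandas. This is a minimal sketch; `categorical_levels` is a hypothetical helper name, not a plotnine function.

```python
import pandas as pd


def categorical_levels(df: pd.DataFrame) -> dict:
    """Map each categorical column of `df` to its list of levels, in order."""
    levels = {}
    for col in df.columns:
        dtype = df[col].dtype
        if isinstance(dtype, pd.CategoricalDtype):
            levels[col] = list(dtype.categories)
    return levels


# Hypothetical usage, assuming plotnine is installed:
# from plotnine.data import diamonds
# print(categorical_levels(diamonds))
```

Applied to `diamonds`, this would be expected to report the `cut`, `color`, and `clarity` factors described earlier, though the exact output depends on the installed plotnine version.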
Install with Tessl CLI
npx tessl i tessl/pypi-plotnine