
tessl/pypi-plotnine

A Grammar of Graphics for Python providing a declarative approach to data visualization similar to R's ggplot2


Sample Datasets

Plotnine bundles a collection of datasets widely used in data visualization examples, tutorials, and exploratory analysis. They contain real-world data from domains including economics, biology, automobiles, and demographics.

Import Patterns

# Import specific datasets
from plotnine.data import mtcars, diamonds, economics

# Import all datasets (not recommended for production)
from plotnine.data import *

# Access via module reference
import plotnine.data as data
df = data.mtcars

Capabilities

Automotive Data

# Motor Trend Car Road Tests - 32 automobiles (1973-74 models)
mtcars: pandas.DataFrame
# Columns: name, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb

# Fuel economy data from 1999 and 2008 for 38 popular car models  
mpg: pandas.DataFrame
# Columns: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class

Usage Example:

from plotnine import ggplot, aes, geom_point
from plotnine.data import mtcars

# Scatter plot of weight vs mpg
plot = (ggplot(mtcars, aes(x='wt', y='mpg')) + 
        geom_point())

Jewelry and Precious Stones

# Prices and attributes of ~54,000 diamonds
diamonds: pandas.DataFrame  
# Columns: price, carat, cut, color, clarity, x, y, z, depth, table
# cut: Factor with levels Fair, Good, Very Good, Premium, Ideal
# color: Factor with levels D, E, F, G, H, I, J (D=best, J=worst)
# clarity: Factor with levels I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF (I1=worst, IF=best)

Economic Data

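# US economic time series from the Federal Reserve Bank of St. Louis (FRED)
economics: pandas.DataFrame
# Columns: date, psavert, pce, unemploy, uempmed, pop
# psavert: personal savings rate; pce: personal consumption expenditures (billions of dollars)
# unemploy: number of unemployed (thousands); uempmed: median duration of unemployment (weeks)
# pop: total population (thousands)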

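# US economic data in long format for easier visualization
economics_long: pandas.DataFrame
# Same data as economics, in tidy/long format (one row per date and variable)
# Columns: date, variable, value, value01 (value rescaled to 0-1 within each variable)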

Biological Data

# Palmer Penguins - 3 species from the Palmer Archipelago, Antarctica
penguins: pandas.DataFrame
# Columns: species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex, year
# species: Factor with levels Adelie, Chinstrap, Gentoo

# An updated and expanded version of the mammals sleep dataset
msleep: pandas.DataFrame
# Columns: name, genus, vore, order, conservation, sleep_total, sleep_rem, sleep_cycle, awake, brainwt, bodywt

Geographic and Demographic Data

# Midwest demographics by county
midwest: pandas.DataFrame  
# Columns: PID, county, state, area, poptotal, popdensity, popwhite, popblack, etc.

# Texas housing market data from TAMU real estate center
txhousing: pandas.DataFrame
# Columns: city, year, month, sales, volume, median, listings, inventory, date

Natural Phenomena

# Old Faithful Geyser eruption data
faithful: pandas.DataFrame
# Columns: eruptions, waiting

# Old Faithful data with density estimates (grid format)
faithfuld: pandas.DataFrame  
# Columns: eruptions, waiting, density

# Lake Huron water levels 1875-1972
huron: pandas.DataFrame
# Columns: year, level, decade

# Vector field of seal movements
seals: pandas.DataFrame
# Columns: lat, long, delta_long, delta_lat

Food Production and Web Data

# US meat production by month (millions of lbs)
meat: pandas.DataFrame
# Columns: date, beef, veal, pork, lamb_and_mutton, broilers, other_chicken, turkey

# Website pageview data 
pageviews: pandas.DataFrame
# Columns: date, pageviews

Political Data

# Terms of 11 US presidents from Eisenhower to Obama
presidential: pandas.DataFrame
# Columns: name, start, end, party

Statistical Datasets

# Anscombe's Quartet - 4 datasets with nearly identical summary statistics (means, variances, correlation, regression line) but very different shapes
anscombe_quartet: pandas.DataFrame
# Columns: dataset, x, y

# Colors in Luv color space
luv_colours: pandas.DataFrame
# Columns: L, u, v, col

Common Usage Patterns

Quick Data Exploration

from plotnine import ggplot, aes, geom_histogram, geom_point, facet_wrap
from plotnine.data import diamonds, penguins

# Explore diamond prices
price_dist = (ggplot(diamonds, aes(x='price')) + 
             geom_histogram(bins=30) +
             facet_wrap('cut'))

# Penguin species comparison  
penguin_plot = (ggplot(penguins, aes(x='bill_length_mm', y='bill_depth_mm', color='species')) +
               geom_point())

Time Series Analysis

from plotnine import ggplot, aes, geom_line
from plotnine.data import economics

# Economic trends over time
econ_plot = (ggplot(economics, aes(x='date', y='unemploy')) +
            geom_line())

Statistical Examples

from plotnine import ggplot, aes, geom_point, stat_smooth
from plotnine.data import mtcars

# Regression analysis
regression_plot = (ggplot(mtcars, aes(x='wt', y='mpg')) +  
                  geom_point() +
                  stat_smooth(method='lm'))

Dataset Categories

| Category | Datasets | Use Cases |
|---|---|---|
| Automotive | mtcars, mpg | Regression, clustering, factor analysis |
| Economics | economics, economics_long, txhousing | Time series, trend analysis |
| Biology | penguins, msleep | Species comparison, behavioral analysis |
| Demographics | midwest | Regional and demographic comparison |
| Natural phenomena | faithful, faithfuld, huron, seals | Density estimation, time series, movement patterns |
| Retail | diamonds | Price modeling, categorical analysis |
| Food and web | meat, pageviews | Production trends, seasonal patterns |
| Politics | presidential | Timeline analysis, categorical data |
| Statistics | anscombe_quartet, luv_colours | Statistical education, color analysis |

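All datasets are provided as pandas DataFrames with appropriate data types. Variables with a natural level order are stored as categoricals (for example cut, color, and clarity in diamonds), so axes, legends, and facets follow that order rather than alphabetical order.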

Install with Tessl CLI

npx tessl i tessl/pypi-plotnine
