# Sample Datasets

Plotnine includes a collection of datasets, available from the `plotnine.data` module, that are commonly used in data visualization examples, tutorials, and exploration. They contain real-world data from domains including economics, biology, automotive, and demographics.

## Import Patterns

```python
# Import specific datasets
from plotnine.data import mtcars, diamonds, economics

# Import all datasets (not recommended for production)
from plotnine.data import *

# Access via module reference
import plotnine.data as data
df = data.mtcars
```

## Capabilities

### Automotive Data

```python { .api }
# Motor Trend Car Road Tests - 32 automobiles (1973-74 models)
mtcars: pandas.DataFrame
# Columns: name, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb

# Fuel economy data from 1999 and 2008 for 38 popular car models
mpg: pandas.DataFrame
# Columns: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class
```

**Usage Example:**

```python
from plotnine import ggplot, aes, geom_point
from plotnine.data import mtcars

# Scatter plot of weight (1000 lbs) vs fuel economy (miles per US gallon)
plot = (
    ggplot(mtcars, aes(x='wt', y='mpg'))
    + geom_point()
)
```

### Jewelry and Precious Stones

```python { .api }
# Prices and attributes of ~54,000 diamonds
diamonds: pandas.DataFrame
# Columns: price, carat, cut, color, clarity, x, y, z, depth, table
# cut: ordered categorical with levels Fair, Good, Very Good, Premium, Ideal
# color: ordered categorical with levels D, E, F, G, H, I, J (D=best, J=worst)
# clarity: ordered categorical with levels I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF (I1=worst, IF=best)
```

### Economic Data

```python { .api }
# US economic time series from the FRED database
economics: pandas.DataFrame
# Columns: date, psavert, pce, unemploy, uempmed, pop

# US economic data in long format for easier visualization
economics_long: pandas.DataFrame
# Same data as economics but in tidy/long format
```

### Biological Data

```python { .api }
# Palmer Penguins - 3 species from the Palmer Archipelago, Antarctica
penguins: pandas.DataFrame
# Columns: species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex, year
# species: one of Adelie, Chinstrap, Gentoo

# Updated and expanded version of the mammals sleep dataset
msleep: pandas.DataFrame
# Columns: name, genus, vore, order, conservation, sleep_total, sleep_rem, sleep_cycle, awake, brainwt, bodywt
```

### Geographic and Demographic Data

```python { .api }
# Midwest demographics by county
midwest: pandas.DataFrame
# Columns: PID, county, state, area, poptotal, popdensity, popwhite, popblack, etc.

# Texas housing market data from the TAMU Real Estate Center
txhousing: pandas.DataFrame
# Columns: city, year, month, sales, volume, median, listings, inventory, date
```

### Natural Phenomena

```python { .api }
# Old Faithful Geyser eruption data
faithful: pandas.DataFrame
# Columns: eruptions, waiting

# Old Faithful data with density estimates (grid format)
faithfuld: pandas.DataFrame
# Columns: eruptions, waiting, density

# Lake Huron water levels 1875-1972
huron: pandas.DataFrame
# Columns: year, level, decade

# Vector field of seal movements
seals: pandas.DataFrame
# Columns: lat, long, delta_long, delta_lat
```

### Food Production and Web Data

```python { .api }
# US meat production by month (millions of lbs)
meat: pandas.DataFrame
# Columns: date, beef, veal, pork, lamb_and_mutton, broilers, other_chicken, turkey

# Website pageview data
pageviews: pandas.DataFrame
# Columns: date, pageviews
```

### Political Data

```python { .api }
# Terms of 11 US presidents from Eisenhower to Obama
presidential: pandas.DataFrame
# Columns: name, start, end, party
```

### Statistical Datasets

```python { .api }
# Anscombe's Quartet - 4 datasets with nearly identical summary statistics but very different distributions
anscombe_quartet: pandas.DataFrame
# Columns: dataset, x, y

# Colors in Luv color space
luv_colours: pandas.DataFrame
# Columns: L, u, v, col
```

## Common Usage Patterns

### Quick Data Exploration

```python
from plotnine import ggplot, aes, geom_histogram, geom_point, facet_wrap
from plotnine.data import diamonds, penguins

# Distribution of diamond prices, one panel per cut
price_dist = (
    ggplot(diamonds, aes(x='price'))
    + geom_histogram(bins=30)
    + facet_wrap('cut')
)

# Penguin species comparison: bill length vs bill depth
penguin_plot = (
    ggplot(penguins, aes(x='bill_length_mm', y='bill_depth_mm', color='species'))
    + geom_point()
)
```

### Time Series Analysis

```python
from plotnine import ggplot, aes, geom_line
from plotnine.data import economics

# Unemployment over time
econ_plot = (
    ggplot(economics, aes(x='date', y='unemploy'))
    + geom_line()
)
```

### Statistical Examples

```python
from plotnine import ggplot, aes, geom_point, stat_smooth
from plotnine.data import mtcars

# Linear regression of fuel economy on weight
regression_plot = (
    ggplot(mtcars, aes(x='wt', y='mpg'))
    + geom_point()
    + stat_smooth(method='lm')
)
```

## Dataset Categories

| Category | Datasets | Use Cases |
|----------|----------|-----------|
| **Automotive** | mtcars, mpg | Regression, clustering, factor analysis |
| **Jewelry** | diamonds | Price modeling, categorical analysis |
| **Economics** | economics, economics_long | Time series, trend analysis |
| **Biology** | penguins, msleep | Species comparison, behavioral analysis |
| **Geography and demographics** | midwest, txhousing | Regional comparison, housing market trends |
| **Natural phenomena** | faithful, faithfuld, huron, seals | Density estimation, time series, movement patterns |
| **Food production and web** | meat, pageviews | Production trends, seasonal patterns |
| **Politics** | presidential | Timeline analysis, categorical data |
| **Statistics** | anscombe_quartet, luv_colours | Statistical education, color analysis |

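All datasets are provided as pandas DataFrames with appropriate column types, including ordered categorical columns where the level order matters (for example `cut`, `color` and `clarity` in `diamonds`), so that axes, legends and facets follow that order rather than alphabetical order.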