
# Built-in Datasets

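Sample datasets for learning and experimenting with plotly visualizations. The `plotly.data` module (also reachable as `plotly.express.data`) provides more than ten datasets commonly used in data science examples. Each loader returns a pandas DataFrame by default, except `election_geojson()`, which returns a GeoJSON dict. Recent plotly versions also accept a `return_type` argument to request other dataframe libraries.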

## Capabilities

### Classification and Clustering Datasets

Classic datasets for machine learning and statistical analysis.

```python { .api }
def iris():
    """
    Load the Iris flower dataset.

    Contains measurements of iris flowers from three species: setosa, versicolor, and virginica.
    Each sample has four features: sepal length, sepal width, petal length, and petal width.

    Returns:
        DataFrame: 150 rows × 6 columns
        - sepal_length: float, sepal length in cm
        - sepal_width: float, sepal width in cm
        - petal_length: float, petal length in cm
        - petal_width: float, petal width in cm
        - species: str, flower species ('setosa', 'versicolor', 'virginica')
        - species_id: int, numeric species identifier (1, 2, 3)
    """

def tips():
    """
    Load the restaurant tips dataset.

    Contains information about restaurant bills, tips, and customer characteristics.
    Useful for exploring relationships between categorical and continuous variables.

    Returns:
        DataFrame: 244 rows × 7 columns
        - total_bill: float, total bill amount in dollars
        - tip: float, tip amount in dollars
        - sex: str, sex of the bill payer ('Female', 'Male')
        - smoker: str, whether the party included smokers ('Yes', 'No')
        - day: str, day of week ('Thur', 'Fri', 'Sat', 'Sun')
        - time: str, meal time ('Lunch', 'Dinner')
        - size: int, party size (number of people)
    """
```

### Economic and Demographic Data

Datasets containing economic indicators and demographic information, plus small medal tables in wide and long form.

```python { .api }
def gapminder():
    """
    Load the Gapminder world development dataset.

    Contains country-level data on life expectancy, GDP per capita, and population
    from 1952 to 2007 at five-year intervals. Well suited to animated visualizations
    and geographic mapping.

    Returns:
        DataFrame: 1704 rows × 8 columns
        - country: str, country name
        - continent: str, continent name ('Africa', 'Americas', 'Asia', 'Europe', 'Oceania')
        - year: int, year (1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, 2007)
        - lifeExp: float, life expectancy in years
        - pop: int, population count
        - gdpPercap: float, GDP per capita in US dollars
        - iso_alpha: str, 3-letter ISO country code
        - iso_num: int, numeric ISO country code
    """

def medals_wide():
    """
    Load the Olympic medals dataset in wide format.

    Contains Olympic short-track speed skating medal counts for the top three
    nations, with a separate column for each medal type.

    Returns:
        DataFrame: 3 rows × 4 columns
        - nation: str, country name ('South Korea', 'China', 'Canada')
        - gold: int, number of gold medals
        - silver: int, number of silver medals
        - bronze: int, number of bronze medals
    """

def medals_long():
    """
    Load the Olympic medals dataset in long format.

    Same data as medals_wide, reshaped into tidy/long format with the medal type as a variable.

    Returns:
        DataFrame: 9 rows × 3 columns
        - nation: str, country name
        - medal: str, medal type ('gold', 'silver', 'bronze')
        - count: int, number of medals of that type
    """
```

### Time Series and Financial Data

Datasets with temporal components for time series analysis and visualization.

```python { .api }
def stocks():
    """
    Load the tech stock price dataset.

    Contains weekly prices for six major technology companies (GOOG, AAPL, AMZN,
    FB, NFLX, MSFT) over 2018-2019, each normalized to 1.0 on the first date.
    Useful for financial charts and time series analysis.

    Returns:
        DataFrame: one row per week (about 100 rows) × 7 columns
        - date: str, date ('YYYY-MM-DD'); pass datetimes=True to parse it as datetime
        - GOOG: float, Google (Alphabet) price relative to the first date
        - AAPL: float, Apple price relative to the first date
        - AMZN: float, Amazon price relative to the first date
        - FB: float, Facebook (Meta) price relative to the first date
        - NFLX: float, Netflix price relative to the first date
        - MSFT: float, Microsoft price relative to the first date
    """
```

### Election and Political Data

Datasets containing electoral and political information.

```python { .api }
def election():
    """
    Load 2013 Montreal mayoral election results.

    Contains vote counts by electoral district for the three leading candidates,
    designed to be paired with election_geojson() for choropleth mapping.

    Returns:
        DataFrame: 58 rows × 8 columns
        - district: str, district name (matches the GeoJSON property "district")
        - Coderre: int, votes for Denis Coderre
        - Bergeron: int, votes for Richard Bergeron
        - Joly: int, votes for Mélanie Joly
        - total: int, total votes cast
        - winner: str, winning candidate name
        - result: str, type of win ('plurality', 'majority')
        - district_id: int, numeric district identifier
    """

def election_geojson():
    """
    Load GeoJSON data for Montreal election districts.

    Geographic boundary data corresponding to the election dataset,
    used for creating choropleth maps.

    Returns:
        dict: GeoJSON FeatureCollection with one feature per district; each
        feature's properties["district"] holds the district name
    """
```

### Scientific and Environmental Data

Datasets from environmental measurements and urban mobility.

```python { .api }
def wind():
    """
    Load the wind measurement dataset.

    Contains the frequency of wind observations by direction and strength,
    useful for polar bar charts (wind roses) and meteorological visualizations.

    Returns:
        DataFrame: 128 rows × 3 columns (16 directions × 8 strength bins)
        - direction: str, 16-point compass direction ('N', 'NNE', 'NE', 'ENE', 'E', 'ESE',
          'SE', 'SSE', 'S', 'SSW', 'SW', 'WSW', 'W', 'WNW', 'NW', 'NNW')
        - strength: str, wind strength bin ('0-1', '1-2', '2-3', '3-4', '4-4+', '4-5', '5-6', '6+')
        - frequency: float, frequency of occurrence
    """

def carshare():
    """
    Load the car sharing availability dataset.

    Each row describes car-sharing availability near the centroid of a zone
    in Montreal over a month-long period.

    Returns:
        DataFrame: 249 rows × 4 columns
        - centroid_lat: float, latitude of the zone centroid
        - centroid_lon: float, longitude of the zone centroid
        - car_hours: float, total car hours in the zone
        - peak_hour: int, hour of the day (0-23) with peak activity
    """
```

### Experimental and A/B Testing Data

Datasets designed for statistical analysis and experimental design examples.

```python { .api }
def experiment():
    """
    Load the simulated A/B testing experiment dataset.

    Contains results for 100 simulated participants on three hypothetical
    experiments, with each participant's gender and control/treatment group.
    Useful for demonstrating statistical analysis and hypothesis testing.

    Returns:
        DataFrame: 100 rows × 5 columns
        - experiment_1: float, result of the first experiment
        - experiment_2: float, result of the second experiment
        - experiment_3: float, result of the third experiment
        - gender: str, participant gender ('male', 'female')
        - group: str, experimental group ('control', 'treatment')
    """
```

## Usage Examples

```python
import plotly.express as px
import plotly.data as data

# Load and explore the iris dataset
df_iris = data.iris()
print(df_iris.head())
df_iris.info()  # info() prints its summary itself and returns None

# Scatter plot of iris measurements
fig1 = px.scatter(df_iris, x="sepal_width", y="sepal_length",
                  color="species", size="petal_length",
                  title="Iris Dataset Visualization")
fig1.show()

# Gapminder with one animation frame per year
df_gap = data.gapminder()
fig2 = px.scatter(df_gap, x="gdpPercap", y="lifeExp",
                  animation_frame="year", animation_group="country",
                  size="pop", color="continent", hover_name="country",
                  log_x=True, size_max=55, range_x=[100, 100000],
                  range_y=[25, 90], title="Gapminder Animation")
fig2.show()

# Stock price time series (values are relative to the first date)
df_stocks = data.stocks()
fig3 = px.line(df_stocks, x="date", y=["AAPL", "GOOG", "AMZN"],
               title="Tech Stock Prices (normalized)")
fig3.show()

# Tips dataset for statistical analysis
df_tips = data.tips()
fig4 = px.box(df_tips, x="day", y="total_bill", color="time",
              title="Restaurant Bills by Day and Time")
fig4.show()

# Wind data for a polar bar chart (wind rose)
df_wind = data.wind()
fig5 = px.bar_polar(df_wind, r="frequency", theta="direction",
                    color="strength", template="plotly_dark",
                    color_discrete_sequence=px.colors.sequential.Plasma_r,
                    title="Wind Pattern Analysis")
fig5.show()

# Election data for choropleth mapping
df_election = data.election()
geojson = data.election_geojson()
fig6 = px.choropleth(df_election, geojson=geojson, locations="district",
                     featureidkey="properties.district",  # match rows to features by district name
                     color="winner", projection="mercator",
                     hover_data=["Coderre", "Bergeron", "Joly"],
                     title="Montreal Election Results")
fig6.update_geos(fitbounds="locations", visible=False)  # zoom to the districts, hide the base map
fig6.show()

# Car sharing geographic analysis (open-street-map tiles need no Mapbox token)
df_cars = data.carshare()
fig7 = px.scatter_mapbox(df_cars, lat="centroid_lat", lon="centroid_lon",
                         size="car_hours", color="peak_hour",
                         hover_data=["car_hours"], zoom=10, height=600,
                         mapbox_style="open-street-map",
                         title="Car Sharing Usage Patterns")
fig7.show()

# Olympic medals comparison (long format: one row per nation and medal type)
df_medals = data.medals_long()
fig8 = px.bar(df_medals, x="nation", y="count", color="medal",
              title="Olympic Short-Track Speed Skating Medals")
fig8.show()

# A/B testing results: wide-form box plot, one box per experiment column
df_experiment = data.experiment()
fig9 = px.box(df_experiment, y=["experiment_1", "experiment_2", "experiment_3"],
              color="group", title="A/B Testing Results")
fig9.show()

# Dataset information summary
datasets = [
    ("iris", data.iris),
    ("tips", data.tips),
    ("gapminder", data.gapminder),
    ("stocks", data.stocks),
    ("wind", data.wind),
    ("election", data.election),
    ("carshare", data.carshare),
    ("medals_wide", data.medals_wide),
    ("medals_long", data.medals_long),
    ("experiment", data.experiment),
]

for name, func in datasets:
    df = func()
    print(f"{name}: {df.shape[0]} rows, {df.shape[1]} columns")
```