Tessl Tile for pypi/plotly@6.3.0

# Built-in Datasets

Sample datasets for learning and experimenting with plotly visualizations. The `plotly.data` module provides a small collection of datasets commonly used in data science, returned as pandas DataFrames by default (or as another supported backend when one is requested).

## Capabilities

### Classification and Clustering Datasets

Classic datasets for machine learning and statistical analysis.

```python { .api }
def iris():
    """
    Load the Iris flower dataset.

    Contains measurements of iris flowers from three species: setosa, versicolor, and virginica.
    Each sample has four features: sepal length, sepal width, petal length, and petal width.

    Returns:
    DataFrame: 150 rows × 6 columns
        - sepal_length: float, sepal length in cm
        - sepal_width: float, sepal width in cm
        - petal_length: float, petal length in cm
        - petal_width: float, petal width in cm
        - species: str, flower species ('setosa', 'versicolor', 'virginica')
        - species_id: int, numeric species identifier (1, 2, 3)
    """

def tips():
    """
    Load restaurant tips dataset.

    Contains information about restaurant bills, tips, and customer characteristics.
    Useful for exploring relationships between categorical and continuous variables.

    Returns:
    DataFrame: 244 rows × 7 columns
        - total_bill: float, total bill amount in dollars
        - tip: float, tip amount in dollars
        - sex: str, customer gender ('Male', 'Female')
        - smoker: str, smoking status ('Yes', 'No')
        - day: str, day of week ('Thur', 'Fri', 'Sat', 'Sun')
        - time: str, meal time ('Lunch', 'Dinner')
        - size: int, party size (number of people)
    """
```

### Economic and Demographic Data

Datasets containing economic indicators and demographic information over time.

```python { .api }
def gapminder():
    """
    Load Gapminder world development dataset.

    Contains country-level data on life expectancy, GDP per capita, and population
    from 1952 to 2007. Excellent for demonstrating animated visualizations and
    geographic mapping.

    Returns:
    DataFrame: 1704 rows × 8 columns
        - country: str, country name
        - continent: str, continent name ('Africa', 'Americas', 'Asia', 'Europe', 'Oceania')
        - year: int, year (1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, 2007)
        - lifeExp: float, life expectancy in years
        - pop: int, population count
        - gdpPercap: float, GDP per capita in US dollars
        - iso_alpha: str, 3-letter ISO country code
        - iso_num: int, numeric ISO country code
    """

def medals_wide():
    """
    Load Olympic medals dataset in wide format.

    Contains the Olympic short track speed skating medal table for the top
    three nations (as of 2020), with separate columns for each medal type.

    Returns:
    DataFrame: 3 rows × 4 columns
        - nation: str, country name
        - gold: int, number of gold medals
        - silver: int, number of silver medals
        - bronze: int, number of bronze medals
    """

def medals_long():
    """
    Load Olympic medals dataset in long format.

    Same data as medals_wide but in tidy/long format with medal type as a variable.

    Returns:
    DataFrame: 9 rows × 3 columns
        - nation: str, country name
        - medal: str, medal type ('gold', 'silver', 'bronze')
        - count: int, number of medals of that type
    """
```

### Time Series and Financial Data

Datasets with temporal components for time series analysis and visualization.

```python { .api }
def stocks():
    """
    Load stock price dataset.

    Contains closing prices for six technology stocks (GOOG, AAPL, AMZN, FB, NFLX, MSFT)
    during 2018 and 2019. Useful for financial charts and time series analysis.

    Returns:
    DataFrame: 100 rows × 7 columns
        - date: str, trading date (datetime when loaded with datetimes=True)
        - GOOG: float, Google stock price
        - AAPL: float, Apple stock price
        - AMZN: float, Amazon stock price
        - FB: float, Facebook stock price
        - NFLX: float, Netflix stock price
        - MSFT: float, Microsoft stock price
    """

def flights():
    """
    Load airline passenger flights dataset.

    Contains monthly passenger counts for different airlines and airports.
    Good for demonstrating time series patterns and seasonal trends.

    Note: this loader is not among those documented upstream for plotly.data;
    check hasattr(plotly.data, "flights") in your installed version before
    relying on it.

    Returns:
    DataFrame: 5733 rows × 4 columns
        - year: int, year
        - month: int, month (1-12)
        - passengers: int, number of passengers
        - airline: str, airline identifier
    """
```

### Election and Political Data

Datasets containing electoral and political information.

```python { .api }
def election():
    """
    Load 2013 Montreal mayoral election results.

    Contains voting results by electoral district with per-candidate vote
    counts and district identifiers for choropleth mapping.

    Returns:
    DataFrame: 58 rows × 8 columns
        - district: str, electoral district ID and name
        - Coderre: int, votes for Denis Coderre
        - Bergeron: int, votes for Richard Bergeron
        - Joly: int, votes for Mélanie Joly
        - total: int, total votes cast
        - winner: str, winning candidate name
        - result: str, result type ('plurality', 'majority')
        - district_id: int, district identifier for mapping
    """

def election_geojson():
    """
    Load GeoJSON data for Montreal election districts.

    Geographic boundary data corresponding to the election dataset,
    used for creating choropleth maps. Each feature's `id` is the numeric
    district ID and its `district` property matches the election
    dataset's `district` column.

    Returns:
    dict: GeoJSON feature collection with 58 district boundaries
    """
```

### Scientific and Environmental Data

Datasets from scientific measurements and environmental monitoring.

```python { .api }
def wind():
    """
    Load wind measurement dataset.

    Each row gives the frequency of one wind strength category in one compass
    direction. Useful for polar plots, wind roses, and meteorological
    visualizations.

    Returns:
    DataFrame: 128 rows × 3 columns
        - direction: str, 16-point compass direction ('N', 'NNE', 'NE', ..., 'NNW')
        - strength: str, wind strength category ('0-1', '1-2', '2-3', '3-4', '4-4', '4-5', '5-6', '6+')
        - frequency: float, frequency of occurrence
    """

def carshare():
    """
    Load car sharing usage dataset.

    Each row describes the availability of car-sharing services near the
    centroid of a zone in Montreal over a month-long period.

    Returns:
    DataFrame: 249 rows × 4 columns
        - centroid_lat: float, latitude of zone centroid
        - centroid_lon: float, longitude of zone centroid
        - car_hours: float, total car usage hours
        - peak_hour: int, hour of the day with peak usage
    """
```

### Experimental and A/B Testing Data

Datasets designed for statistical analysis and experimental design examples.

```python { .api }
def experiment():
    """
    Load A/B testing experiment dataset.

    Contains results for 100 simulated participants on three hypothetical
    experiments, along with their gender and control/treatment group. Useful
    for demonstrating statistical analysis and hypothesis testing.

    Returns:
    DataFrame: 100 rows × 5 columns
        - experiment_1: float, first experiment result
        - experiment_2: float, second experiment result
        - experiment_3: float, third experiment result
        - gender: str, participant gender
        - group: str, experimental group ('control', 'treatment')
    """
```

## Usage Examples

```python
import plotly.express as px
import plotly.data as data

# Load and explore iris dataset
df_iris = data.iris()
print(df_iris.head())
df_iris.info()  # info() prints its summary directly and returns None

# Create scatter plot with iris data
fig1 = px.scatter(df_iris, x="sepal_width", y="sepal_length",
                  color="species", size="petal_length",
                  title="Iris Dataset Visualization")
fig1.show()

# Load gapminder for animated visualization
df_gap = data.gapminder()
fig2 = px.scatter(df_gap, x="gdpPercap", y="lifeExp",
                  animation_frame="year", animation_group="country",
                  size="pop", color="continent", hover_name="country",
                  log_x=True, size_max=55, range_x=[100, 100000],
                  range_y=[25, 90], title="Gapminder Animation")
fig2.show()

# Stock price time series (the Google column is named GOOG)
df_stocks = data.stocks()
fig3 = px.line(df_stocks, x="date", y=["AAPL", "GOOG", "AMZN"],
               title="Tech Stock Prices")
fig3.show()

# Tips dataset for statistical analysis
df_tips = data.tips()
fig4 = px.box(df_tips, x="day", y="total_bill", color="time",
              title="Restaurant Bills by Day and Time")
fig4.show()

# Wind data for polar visualization
df_wind = data.wind()
fig5 = px.bar_polar(df_wind, r="frequency", theta="direction",
                    color="strength", template="plotly_dark",
                    color_discrete_sequence=px.colors.sequential.Plasma_r,
                    title="Wind Pattern Analysis")
fig5.show()

# Election data for choropleth mapping
df_election = data.election()
geojson = data.election_geojson()
fig6 = px.choropleth(df_election, geojson=geojson, locations="district",
                     featureidkey="properties.district",  # join on district label
                     color="winner",
                     hover_data=["Coderre", "Bergeron", "Joly"],
                     title="Montreal Election Results")
fig6.update_geos(fitbounds="locations", visible=False)  # zoom to the districts
fig6.show()

# Car sharing geographic analysis
# (scatter_map replaces the deprecated scatter_mapbox in recent plotly versions)
df_cars = data.carshare()
fig7 = px.scatter_map(df_cars, lat="centroid_lat", lon="centroid_lon",
                      size="car_hours", color="peak_hour",
                      hover_data=["car_hours"], zoom=10, height=600,
                      map_style="open-street-map",
                      title="Car Sharing Usage Patterns")
fig7.show()

# Olympic medals comparison
df_medals = data.medals_long()
fig8 = px.bar(df_medals, x="nation", y="count", color="medal",
              title="Olympic Short Track Speed Skating Medal Count")
fig8.show()

# Flight passenger trends (only if this loader exists in the installed plotly)
if hasattr(data, "flights"):
    df_flights = data.flights()
    fig9 = px.line(df_flights, x="month", y="passengers", color="airline",
                   title="Airline Passenger Trends")
    fig9.show()

# A/B testing results
df_experiment = data.experiment()
fig10 = px.box(df_experiment, y=["experiment_1", "experiment_2", "experiment_3"],
               color="group", title="A/B Testing Results")
fig10.show()

# Dataset information summary
dataset_names = [
    'iris', 'tips', 'gapminder', 'stocks', 'flights',
    'wind', 'election', 'carshare', 'medals_long', 'experiment',
]

for name in dataset_names:
    func = getattr(data, name, None)
    if func is None:  # loader missing from this plotly version
        print(f"{name}: not available")
        continue
    df = func()
    print(f"{name}: {df.shape[0]} rows, {df.shape[1]} columns")
```
