# Data Processing and Utilities

Utilities for data handling, downsampling, sample data generation, and type processing. These helpers keep tables responsive on large inputs and supply sample data for development, testing, and demonstration.

## Capabilities

### Data Downsampling

Functions that automatically reduce a DataFrame when it exceeds specified limits, keeping the table responsive while preserving the data's structure and overall shape.

```python { .api }
def downsample(df, max_rows=0, max_columns=0, max_bytes=0):
    """
    Return a subset of the DataFrame that fits the specified limits.

    Parameters:
    - df: Pandas/Polars DataFrame or Series to downsample
    - max_rows (int): Maximum number of rows (0 = unlimited)
    - max_columns (int): Maximum number of columns (0 = unlimited)
    - max_bytes (int | str): Maximum memory usage ("64KB", "1MB", or integer bytes; 0 = unlimited)

    Returns:
    tuple[DataFrame, str]: (downsampled_df, warning_message)
    - warning_message is an empty string if no downsampling occurred
    """

def nbytes(df):
    """
    Calculate the memory usage of a DataFrame.

    Parameters:
    - df: Pandas/Polars DataFrame or Series

    Returns:
    int: Memory usage in bytes
    """

def as_nbytes(mem):
    """
    Convert a memory specification to bytes.

    Parameters:
    - mem (int | float | str): Memory specification ("64KB", "1MB", etc., or a number)

    Returns:
    int: Memory size in bytes

    Raises:
    ValueError: If the specification format is invalid or too large (>= 1GB)
    """
```
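
To illustrate the idea behind `as_nbytes`, here is a minimal, self-contained sketch of a `"64KB"`/`"1MB"`-style parser. `parse_nbytes` and `_UNITS` are illustrative names introduced here, not the actual itables implementation, which may differ in detail.

```python
import re

# Bytes per unit for "64KB"/"1MB"-style specifications (illustrative sketch).
_UNITS = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30}

def parse_nbytes(mem):
    """Convert a memory spec (number or '64KB'-style string) to bytes.

    Hypothetical re-implementation sketch, not the itables function."""
    if isinstance(mem, (int, float)):
        return int(mem)
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KMG]?B)\s*", str(mem), re.IGNORECASE)
    if not match:
        raise ValueError(f"Invalid memory specification: {mem!r}")
    value, unit = match.groups()
    n = int(float(value) * _UNITS[unit.upper()])
    if n >= 2**30:  # mirror the documented >= 1GB limit
        raise ValueError(f"Memory specification too large: {mem!r}")
    return n

print(parse_nbytes("64KB"))  # 65536
print(parse_nbytes("1MB"))   # 1048576
print(parse_nbytes(1024))    # 1024
```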

### Sample Data Generation

A collection of functions for generating test data with various data types, structures, and complexities for development, testing, and demonstration.

```python { .api }
def get_countries(html=False, climate_zone=False):
    """
    Return a DataFrame of world countries data from the World Bank.

    Parameters:
    - html (bool): If True, include HTML-formatted country/capital links and flag images
    - climate_zone (bool): If True, add climate zone and hemisphere columns

    Returns:
    pd.DataFrame: Countries data with columns: region, country, capital, longitude, latitude
    """

def get_population():
    """
    Return a Series of world population data from the World Bank.

    Returns:
    pd.Series: Population data indexed by country name
    """

def get_indicators():
    """
    Return a DataFrame with a subset of World Bank indicators.

    Returns:
    pd.DataFrame: World Bank indicators data
    """

def get_df_complex_index():
    """
    Return a DataFrame with a complex multi-level index for testing.

    Returns:
    pd.DataFrame: DataFrame with a MultiIndex (region, country) and MultiIndex columns
    """

def get_dict_of_test_dfs(N=100, M=100):
    """
    Return a dictionary of test DataFrames with various data types and structures.

    Parameters:
    - N (int): Number of rows for generated data
    - M (int): Number of columns for the wide DataFrame

    Returns:
    dict[str, pd.DataFrame]: Test DataFrames including empty, boolean, int, float,
    string, datetime, categorical, object, multiindex, and complex index types
    """

def get_dict_of_polars_test_dfs(N=100, M=100):
    """
    Return a dictionary of Polars test DataFrames.

    Parameters:
    - N (int): Number of rows for generated data
    - M (int): Number of columns for the wide DataFrame

    Returns:
    dict[str, pl.DataFrame]: Polars versions of the test DataFrames, with the
    same structure as the pandas versions
    """

def generate_random_df(rows, columns, column_types=None):
    """
    Generate a random DataFrame with the specified dimensions and data types.

    Parameters:
    - rows (int): Number of rows to generate
    - columns (int): Number of columns to generate
    - column_types (list, optional): Data types to use (default: COLUMN_TYPES)

    Returns:
    pd.DataFrame: Random DataFrame with mixed data types
    """

def generate_random_series(rows, type):
    """
    Generate a random Series of the specified type and length.

    Parameters:
    - rows (int): Number of rows to generate
    - type (str): Data type ("bool", "int", "float", "str", "categories",
      "boolean", "Int64", "date", "datetime", "timedelta")

    Returns:
    pd.Series: Random Series of the specified type
    """
def get_dict_of_test_series():
    """
    Return a dictionary of test Series with various data types.

    Returns:
    dict[str, pd.Series]: Test Series including boolean, int, float, string,
    categorical, datetime, and complex types
    """

def get_dict_of_polars_test_series():
    """
    Return a dictionary of Polars test Series.

    Returns:
    dict[str, pl.Series]: Polars versions of the test Series
    """

def generate_date_series():
    """
    Generate a Series with various date formats and edge cases.

    Returns:
    pd.Series: Date series covering timezones, leap years, and boundary dates
    """

def get_pandas_styler():
    """
    Return a styled pandas DataFrame with background colors and tooltips.

    Returns:
    Styler: Styled DataFrame with trigonometric data and formatting
    """
```

### Package Utilities

Helper functions for accessing ITables package resources and internal file management.

```python { .api }
def find_package_file(*path):
    """
    Return full path to file within ITables package.

    Parameters:
    - *path (str): Path components relative to package root

    Returns:
    Path: Full path to package file
    """

def read_package_file(*path):
    """
    Read and return content of file within ITables package.

    Parameters:
    - *path (str): Path components relative to package root

    Returns:
    str: File content as string
    """
```

## Usage Examples

### Automatic Downsampling

```python
import numpy as np
import pandas as pd
from itables.downsample import downsample

# Create large DataFrame
df = pd.DataFrame({
    'data': range(10000),
    'values': np.random.randn(10000),
})

# Downsample to fit limits
small_df, warning = downsample(df, max_rows=1000, max_bytes="1MB")

if warning:
    print(f"Downsampling applied: {warning}")
print(f"Original shape: {df.shape}, New shape: {small_df.shape}")
```

### Sample Data Usage

```python
from itables.sample_dfs import get_countries, get_dict_of_test_dfs
from itables import show

# Display world countries data
countries = get_countries(html=True, climate_zone=True)
show(countries, caption="World Countries with Climate Data")

# Get various test DataFrames
test_dfs = get_dict_of_test_dfs(N=50, M=10)

# Display different data types
show(test_dfs['float'], caption="Float Data Types")
show(test_dfs['time'], caption="Time Data Types")
show(test_dfs['multiindex'], caption="MultiIndex Example")
```

### Random Data Generation

```python
from itables.sample_dfs import generate_random_df, COLUMN_TYPES
from itables import show

# Generate random DataFrame
random_df = generate_random_df(
    rows=100,
    columns=8,
    column_types=['int', 'float', 'str', 'bool', 'date', 'categories'],
)

show(random_df, caption="Random Generated Data")

# Generate with all supported types
full_random = generate_random_df(rows=50, columns=len(COLUMN_TYPES))
show(full_random, caption="All Data Types")
```

### Styled DataFrames

```python
from itables.sample_dfs import get_pandas_styler
from itables import show

# Get pre-styled DataFrame
styled_df = get_pandas_styler()
show(styled_df,
     caption="Styled Trigonometric Data",
     allow_html=True)  # Required for styled DataFrames
```

### Memory Analysis

```python
from itables.downsample import nbytes, as_nbytes
import pandas as pd

# Analyze DataFrame memory usage
df = pd.DataFrame({
    'A': range(1000),
    'B': ['text'] * 1000,
    'C': pd.date_range('2020-01-01', periods=1000),
})

memory_usage = nbytes(df)
print(f"DataFrame uses {memory_usage:,} bytes")

# Convert memory specifications
print(f"64KB = {as_nbytes('64KB'):,} bytes")
print(f"1MB = {as_nbytes('1MB'):,} bytes")
print(f"Direct int: {as_nbytes(1024)} bytes")
```

### Custom Test Data

```python
from itables.sample_dfs import get_dict_of_test_dfs, get_dict_of_test_series
from itables import show

# Get all test DataFrames
test_data = get_dict_of_test_dfs(N=20, M=5)

# Show specific interesting cases
show(test_data['empty'], caption="Empty DataFrame")
show(test_data['duplicated_columns'], caption="Duplicated Column Names")
show(test_data['big_integers'], caption="Large Integer Handling")

# Test Series data
test_series = get_dict_of_test_series()
for name, series in list(test_series.items())[:3]:
    show(series.to_frame(), caption=f"Series: {name}")
```

### Package Resource Access

```python
from itables.utils import find_package_file, read_package_file

# Find package files
dt_bundle_path = find_package_file("html", "dt_bundle.js")
print(f"DataTables bundle located at: {dt_bundle_path}")

# Read package content (for advanced use cases)
init_html = read_package_file("html", "init_datatables.html")
print(f"Init HTML template length: {len(init_html)} characters")
```

## Data Type Support

### Supported Column Types

The `COLUMN_TYPES` constant defines all supported data types for random generation:

```python
COLUMN_TYPES = [
    "bool",        # Boolean values
    "int",         # Integer values
    "float",       # Floating point (with NaN, inf handling)
    "str",         # String values
    "categories",  # Categorical data
    "boolean",     # Nullable boolean (pandas extension)
    "Int64",       # Nullable integer (pandas extension)
    "date",        # Date values
    "datetime",    # Datetime values
    "timedelta",   # Time duration values
]
```

### Special Value Handling

- **NaN/Null values**: Automatically handled for appropriate data types
- **Infinite values**: Properly encoded for JSON serialization
- **Large integers**: Preserved without precision loss
- **Complex objects**: Converted to string representation with warnings
- **Polars types**: Full compatibility including unsigned integers and struct types
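
To see why infinite values need special encoding: Python's `json` module emits `Infinity`/`NaN` tokens by default, which strict JSON parsers reject. A generic workaround, shown here as a sketch rather than the actual itables encoding, is to replace non-finite floats before serializing (`encode_special` is a hypothetical helper):

```python
import json
import math

# By default json.dumps emits "Infinity"/"NaN" tokens, which are not valid
# JSON -- this is why non-finite floats need special encoding.
print(json.dumps([1.0, float("inf"), float("nan")]))  # [1.0, Infinity, NaN]

def encode_special(values):
    """Replace non-finite floats with string markers before serializing.

    Illustrative workaround sketch, not the itables implementation."""
    return [
        v if isinstance(v, (int, float)) and math.isfinite(v) else str(v)
        for v in values
    ]

print(json.dumps(encode_special([1.0, float("inf"), float("nan")])))
# [1.0, "inf", "nan"]
```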

### Memory Optimization

The downsampling system:

- Preserves data structure (first/last rows for temporal continuity)
- Maintains aspect ratios when possible
- Provides clear warnings about data reduction
- Supports row, column, and byte limits simultaneously
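
The first/last-row idea can be sketched on a plain Python list, independent of pandas. `downsample_rows` is an illustrative helper introduced here; the actual itables algorithm may differ in detail:

```python
# Illustrative sketch of head/tail row downsampling on a plain list;
# not the actual itables algorithm.
def downsample_rows(rows, max_rows):
    """Keep the first and last rows and drop the middle, so both ends of
    the data (e.g. the start and end of a time series) survive."""
    if max_rows <= 0 or len(rows) <= max_rows:
        return rows, ""  # no downsampling needed
    head = (max_rows + 1) // 2       # rows kept from the start
    tail = max_rows - head           # rows kept from the end
    kept = rows[:head] + (rows[-tail:] if tail else [])
    warning = f"Showing {len(kept)} of {len(rows)} rows"
    return kept, warning

rows = list(range(10))
kept, warning = downsample_rows(rows, max_rows=4)
print(kept)     # [0, 1, 8, 9]
print(warning)  # Showing 4 of 10 rows
```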