# Feature Engineering

Comprehensive feature generation and transformation capabilities for automated feature engineering across different data types. AutoGluon's feature engineering system provides modular, composable feature generators that can handle text, categorical, numerical, and datetime data with intelligent preprocessing pipelines.

## Capabilities

### AutoML Pipeline Feature Generators

High-level feature generation pipelines that automatically select and configure appropriate feature transformations.
```python { .api }
class AutoMLPipelineFeatureGenerator:
    def __init__(
        self,
        enable_numeric_features: bool = True,
        enable_categorical_features: bool = True,
        enable_datetime_features: bool = True,
        enable_text_special_features: bool = True,
        enable_text_ngram_features: bool = True,
        enable_raw_text_features: bool = False,
        enable_vision_features: bool = True,
        **kwargs
    ):
        """
        Initialize the automated feature generation pipeline.

        Parameters:
        - enable_numeric_features: Generate numerical feature transformations
        - enable_categorical_features: Generate categorical encodings
        - enable_datetime_features: Generate datetime-based features
        - enable_text_special_features: Generate text special character features
        - enable_text_ngram_features: Generate text n-gram features
        - enable_raw_text_features: Keep raw text features
        - enable_vision_features: Generate image-based features
        """

    def fit_transform(self, X, y=None, **kwargs):
        """
        Fit the feature generators and transform the input data.

        Parameters:
        - X: Input DataFrame with raw features
        - y: Target values (optional)

        Returns:
        Transformed DataFrame with engineered features
        """

    def transform(self, X, **kwargs):
        """
        Transform input data using the fitted feature generators.

        Parameters:
        - X: Input DataFrame to transform

        Returns:
        Transformed DataFrame with engineered features
        """

class AutoMLInterpretablePipelineFeatureGenerator:
    def __init__(self, **kwargs):
        """
        Initialize the interpretable feature generation pipeline.

        Similar to AutoMLPipelineFeatureGenerator, but restricted to
        interpretable transformations suitable for model explanation.
        """
```
### Core Feature Generators

Base classes and fundamental feature transformation components.

```python { .api }
class AbstractFeatureGenerator:
    def __init__(self, **kwargs):
        """Base class for all feature generators."""

    def fit_transform(self, X, y=None, **kwargs):
        """Fit the generator and transform data in one step."""

    def fit(self, X, y=None, **kwargs):
        """Fit the feature generator to training data."""

    def transform(self, X, **kwargs):
        """Transform data using the fitted generator."""

class PipelineFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, generators: list, **kwargs):
        """
        Chain multiple feature generators in sequence.

        Parameters:
        - generators: List of feature generator instances
        """

class BulkFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, generators: list, **kwargs):
        """
        Apply multiple feature generators in parallel.

        Parameters:
        - generators: List of feature generator instances
        """
```
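The chaining behaviour behind `PipelineFeatureGenerator` can be sketched with plain Python objects that share the `fit_transform`/`transform` contract. The step classes below are illustrative toys, not AutoGluon code:

```python
class UppercaseStep:
    """Toy generator: uppercases every string value."""
    def fit_transform(self, rows):
        return self.transform(rows)

    def transform(self, rows):
        return [{k: v.upper() if isinstance(v, str) else v for k, v in r.items()}
                for r in rows]

class LengthStep:
    """Toy generator: adds a length feature for each string column."""
    def fit_transform(self, rows):
        return self.transform(rows)

    def transform(self, rows):
        out = []
        for r in rows:
            new = dict(r)
            for k, v in r.items():
                if isinstance(v, str):
                    new[k + "_len"] = len(v)
            out.append(new)
        return out

def pipeline_fit_transform(steps, rows):
    """Apply each step's fit_transform in order, feeding outputs forward."""
    for step in steps:
        rows = step.fit_transform(rows)
    return rows

rows = [{"text": "hello"}, {"text": "hi"}]
result = pipeline_fit_transform([UppercaseStep(), LengthStep()], rows)
```

The key design point is that every step consumes the previous step's output, so later generators see the features earlier generators created.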
### Categorical Feature Processing

Feature generators for categorical data encoding and transformation.

```python { .api }
class CategoryFeatureGenerator(AbstractFeatureGenerator):
    def __init__(
        self,
        cat_order: str = 'count',
        maximum_num_cat: int = 10000,
        verbosity: int = 0,
        **kwargs
    ):
        """
        Generate categorical features with label encoding.

        Parameters:
        - cat_order: Category ordering method ('count', 'alphabetic')
        - maximum_num_cat: Maximum number of categories to process
        - verbosity: Logging verbosity level
        """

class OneHotEncoderFeatureGenerator(AbstractFeatureGenerator):
    def __init__(
        self,
        maximum_num_cat: int = 10,
        minimum_cat_count: int = 30,
        **kwargs
    ):
        """
        Generate one-hot encoded features for categorical data.

        Parameters:
        - maximum_num_cat: Maximum categories for one-hot encoding
        - minimum_cat_count: Minimum category frequency for inclusion
        """

class LabelEncoderFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, verbosity: int = 0, **kwargs):
        """
        Generate label-encoded features for categorical data.

        Parameters:
        - verbosity: Logging verbosity level
        """
```
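The two encoding strategies can be sketched in plain Python (illustrative helpers, not AutoGluon's implementation; `cat_order='count'` corresponds to ordering categories by frequency):

```python
from collections import Counter

def label_encode(values):
    """Map categories to integer codes, most frequent category first."""
    order = [cat for cat, _ in Counter(values).most_common()]
    mapping = {cat: i for i, cat in enumerate(order)}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values, max_categories=10):
    """Binary indicator columns for up to max_categories most frequent categories."""
    cats = [c for c, _ in Counter(values).most_common(max_categories)]
    return [{f"is_{c}": int(v == c) for c in cats} for v in values]

values = ["A", "B", "A", "C", "A"]
codes, mapping = label_encode(values)   # codes → [0, 1, 0, 2, 0]
onehot = one_hot_encode(values)
```

Label encoding keeps one column per feature (suited to tree models), while one-hot encoding expands each category into its own indicator column, which is why `maximum_num_cat` defaults much lower for `OneHotEncoderFeatureGenerator`.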
### Numerical Feature Processing

Feature generators for numerical data transformation and binning.

```python { .api }
class BinnedFeatureGenerator(AbstractFeatureGenerator):
    def __init__(
        self,
        num_bins: int = 10,
        quantile_bin: bool = True,
        **kwargs
    ):
        """
        Generate binned features from numerical data.

        Parameters:
        - num_bins: Number of bins to create
        - quantile_bin: Use quantile-based binning
        """

class NumericMemoryMinimizeFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, **kwargs):
        """
        Minimize memory usage of numerical features through dtype optimization.
        """

class CategoryMemoryMinimizeFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, **kwargs):
        """
        Minimize memory usage of categorical features through dtype optimization.
        """
```
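Quantile binning of the kind performed by `BinnedFeatureGenerator` can be sketched with the standard library (an illustrative helper, not the actual implementation):

```python
import statistics
from bisect import bisect_right

def quantile_bin(values, num_bins=4):
    """Assign each value a bin index using quantile cut points.

    Quantile bins hold roughly equal numbers of samples, unlike
    equal-width bins, which makes them robust to skewed distributions.
    """
    # statistics.quantiles returns num_bins - 1 interior cut points
    edges = statistics.quantiles(values, n=num_bins)
    return [bisect_right(edges, v) for v in values]

values = [1, 2, 3, 4, 5, 6, 7, 8]
bins = quantile_bin(values, num_bins=4)  # each quartile gets two samples
```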
### Text Feature Processing

Feature generators specialized for text data processing and transformation.

```python { .api }
class TextNgramFeatureGenerator(AbstractFeatureGenerator):
    def __init__(
        self,
        vectorizer_strategy: str = 'tf-idf',
        max_features: int = 10000,
        ngram_range: tuple = (1, 3),
        **kwargs
    ):
        """
        Generate n-gram features from text data.

        Parameters:
        - vectorizer_strategy: Vectorization method ('tf-idf', 'count')
        - max_features: Maximum number of features to generate
        - ngram_range: Range of n-gram sizes (min_n, max_n)
        """

class TextSpecialFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, **kwargs):
        """
        Generate special character and text statistics features.

        Creates features such as text length, word count,
        and special character counts.
        """
```
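Both kinds of text features can be sketched in plain Python (illustrative helpers, not AutoGluon's vectorizers; the real generator uses scikit-learn-style TF-IDF or count vectorization):

```python
import re
from collections import Counter

def char_stats(text):
    """Simple text statistics like those from TextSpecialFeatureGenerator."""
    return {
        "length": len(text),
        "word_count": len(text.split()),
        "digit_count": sum(ch.isdigit() for ch in text),
        "special_count": sum(not ch.isalnum() and not ch.isspace() for ch in text),
    }

def ngram_counts(texts, ngram_range=(1, 2), max_features=1000):
    """Count word n-grams across documents, keeping the most frequent terms."""
    vocab = Counter()
    per_doc = []
    for text in texts:
        words = re.findall(r"\w+", text.lower())
        grams = Counter()
        for n in range(ngram_range[0], ngram_range[1] + 1):
            for i in range(len(words) - n + 1):
                grams[" ".join(words[i:i + n])] += 1
        per_doc.append(grams)
        vocab.update(grams)
    keep = [g for g, _ in vocab.most_common(max_features)]
    return [[doc.get(g, 0) for g in keep] for doc in per_doc], keep

stats = char_stats("Great product!")
vectors, terms = ngram_counts(["hello world", "hello again"])
```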
### Datetime Feature Processing

Feature generators for datetime and temporal data transformation.

```python { .api }
class DatetimeFeatureGenerator(AbstractFeatureGenerator):
    def __init__(
        self,
        features_to_extract: list = None,
        **kwargs
    ):
        """
        Generate datetime-based features from timestamp columns.

        Parameters:
        - features_to_extract: List of datetime features to extract
          Options: ['year', 'month', 'day', 'dayofweek', 'hour', 'minute', 'second']

        Generates features such as:
        - Year, month, and day components
        - Day of week and hour of day
        - Is-weekend and is-business-hour flags
        - Cyclical encodings for periodic features
        """
```
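The kinds of features listed above can be sketched with the standard library (an illustrative subset, not the actual generator):

```python
import math
from datetime import datetime

def datetime_features(ts):
    """Extract calendar components plus a cyclical month encoding."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "dayofweek": ts.weekday(),           # Monday == 0
        "hour": ts.hour,
        "is_weekend": int(ts.weekday() >= 5),
        # Cyclical encoding keeps December numerically adjacent to January.
        "month_sin": math.sin(2 * math.pi * ts.month / 12),
        "month_cos": math.cos(2 * math.pi * ts.month / 12),
    }

feats = datetime_features(datetime(2023, 12, 31, 14, 30))  # a Sunday
```

The sin/cos pair is what makes periodic components usable by distance-based models: month 12 and month 1 end up close together instead of 11 units apart.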
### Data Cleaning and Preprocessing

Feature generators for data cleaning and basic preprocessing operations.

```python { .api }
class FillNaFeatureGenerator(AbstractFeatureGenerator):
    def __init__(
        self,
        inplace: bool = True,
        fillna_map: dict = None,
        **kwargs
    ):
        """
        Handle missing values through various imputation strategies.

        Parameters:
        - inplace: Modify features in place
        - fillna_map: Custom fill values for specific columns
        """

class DropUniqueFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, **kwargs):
        """
        Remove features with only one unique value (constant features).
        """

class DropDuplicatesFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, **kwargs):
        """
        Remove duplicate features (identical columns).
        """

class IsNanFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, **kwargs):
        """
        Generate binary indicator features for missing values.
        """
```
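The interaction between missing-value indicators and imputation can be sketched in plain Python (an illustrative helper combining the ideas of `IsNanFeatureGenerator` and `FillNaFeatureGenerator`, not AutoGluon code):

```python
def add_nan_indicators_and_fill(rows, fill_value=0):
    """Add a <col>_isna flag for every column, then impute missing values.

    Recording the indicator before filling preserves the signal of
    missingness, which the imputed value alone would erase.
    """
    out = []
    for r in rows:
        new = {}
        for k, v in r.items():
            new[k + "_isna"] = int(v is None)
            new[k] = fill_value if v is None else v
        out.append(new)
    return out

rows = [{"x": 1.5}, {"x": None}]
clean = add_nan_indicators_and_fill(rows)
```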
### Utility Feature Generators

Helper feature generators for type conversion and feature management.

```python { .api }
class AsTypeFeatureGenerator(AbstractFeatureGenerator):
    def __init__(
        self,
        convert_map: dict,
        **kwargs
    ):
        """
        Convert feature data types.

        Parameters:
        - convert_map: Dictionary mapping column names to target dtypes
        """

class IdentityFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, **kwargs):
        """
        Pass-through generator that returns features unchanged.
        """

class RenameFeatureGenerator(AbstractFeatureGenerator):
    def __init__(
        self,
        rename_map: dict,
        **kwargs
    ):
        """
        Rename features according to a mapping.

        Parameters:
        - rename_map: Dictionary mapping old names to new names
        """

class DummyFeatureGenerator(AbstractFeatureGenerator):
    def __init__(self, **kwargs):
        """
        Placeholder generator for testing and debugging.
        """
```
## Usage Examples

### Basic Feature Engineering Pipeline
```python
from autogluon.features import AutoMLPipelineFeatureGenerator
import pandas as pd

# Sample dataset with mixed data types
df = pd.DataFrame({
    'numerical_col': [1.5, 2.3, 3.1, 4.7],
    'categorical_col': ['A', 'B', 'A', 'C'],
    'text_col': ['hello world', 'goodbye moon', 'hello again', 'farewell sun'],
    'datetime_col': pd.date_range('2023-01-01', periods=4, freq='D'),
    'target': [0, 1, 0, 1]
})

# Initialize the automated feature generator
feature_generator = AutoMLPipelineFeatureGenerator(
    enable_text_ngram_features=True,
    enable_datetime_features=True,
    enable_categorical_features=True
)

# Fit and transform features
X = df.drop('target', axis=1)
y = df['target']

X_transformed = feature_generator.fit_transform(X, y)
print(f"Original features: {X.shape[1]}")
print(f"Engineered features: {X_transformed.shape[1]}")
print(f"New columns: {list(X_transformed.columns)}")

# Transform new data with the fitted generator
new_data = X.head(2)  # stand-in for fresh rows with the same columns as X
X_new_transformed = feature_generator.transform(new_data)
```
### Custom Feature Engineering Pipeline

```python
from autogluon.features import (
    PipelineFeatureGenerator,
    DatetimeFeatureGenerator,
    CategoryFeatureGenerator,
    TextNgramFeatureGenerator,
    FillNaFeatureGenerator
)

# Build a custom pipeline
custom_pipeline = PipelineFeatureGenerator([
    FillNaFeatureGenerator(),  # Handle missing values first
    DatetimeFeatureGenerator(
        features_to_extract=['year', 'month', 'dayofweek', 'hour']
    ),
    CategoryFeatureGenerator(maximum_num_cat=1000),
    TextNgramFeatureGenerator(
        max_features=5000,
        ngram_range=(1, 2),
        vectorizer_strategy='tf-idf'
    )
])

# Apply the custom pipeline
# (raw_data and target_data are your own DataFrame and label series)
X_custom = custom_pipeline.fit_transform(raw_data, target_data)
```
### Specialized Text Processing

```python
import pandas as pd
from autogluon.features import TextSpecialFeatureGenerator, TextNgramFeatureGenerator
from autogluon.features import BulkFeatureGenerator

# Combine multiple text feature generators
text_features = BulkFeatureGenerator([
    TextSpecialFeatureGenerator(),  # Text statistics
    TextNgramFeatureGenerator(
        ngram_range=(1, 3),
        max_features=10000,
        vectorizer_strategy='tf-idf'
    )
])

# Process text data
text_df = pd.DataFrame({
    'review_text': ['Great product!', 'Not bad', 'Excellent quality', 'Poor service'],
    'description': ['Short desc', 'Longer description here', 'Brief', 'Detailed info']
})

text_features_generated = text_features.fit_transform(text_df)
print(f"Generated {text_features_generated.shape[1]} text features")
```
### Memory-Optimized Feature Processing

```python
from autogluon.features import (
    AutoMLPipelineFeatureGenerator,
    NumericMemoryMinimizeFeatureGenerator,
    CategoryMemoryMinimizeFeatureGenerator,
    PipelineFeatureGenerator
)

# Memory-optimized pipeline for large datasets
memory_optimized = PipelineFeatureGenerator([
    AutoMLPipelineFeatureGenerator(),
    NumericMemoryMinimizeFeatureGenerator(),
    CategoryMemoryMinimizeFeatureGenerator()
])

# Process a large dataset with memory optimization
# (large_dataset is your own pandas DataFrame)
large_data_processed = memory_optimized.fit_transform(large_dataset)
print("Memory usage reduced by dtype optimization")
```