Tessl Tile for pypi/autogluon.tabular@1.4.0

# Experimental Scikit-learn Compatible Interfaces

AutoGluon provides experimental scikit-learn compatible interfaces that integrate with existing scikit-learn workflows, pipelines, and ecosystem tools. These classes expose the familiar fit/predict API while using AutoGluon's automated machine learning capabilities for model selection and ensembling.

## Capabilities

### Tabular Classification

Scikit-learn compatible classifier interface that wraps AutoGluon's TabularPredictor for classification tasks and follows standard sklearn API conventions.

```python { .api }
class TabularClassifier:
    """
    Scikit-learn compatible classifier using AutoGluon's automated ML.

    Provides standard sklearn interface (fit, predict, predict_proba, score)
    while leveraging AutoGluon's model selection and ensemble capabilities.
    """

    def __init__(
        self,
        eval_metric: str = None,
        time_limit: float = None,
        presets: list[str] | str = None,
        hyperparameters: dict | str = None,
        path: str = None,
        verbosity: int = 2,
        init_args: dict = None,
        fit_args: dict = None
    ):
        """
        Initialize TabularClassifier.

        Parameters:
        - eval_metric: Evaluation metric for model selection
        - time_limit: Maximum training time in seconds
        - presets: Preset configurations for training
        - hyperparameters: Custom hyperparameter configurations
        - path: Directory to save models
        - verbosity: Logging level (0-4)
        - init_args: Additional initialization arguments
        - fit_args: Additional fitting arguments
        """

    def fit(
        self,
        X: pd.DataFrame | np.ndarray,
        y: pd.Series | np.ndarray,
        **kwargs
    ) -> 'TabularClassifier':
        """
        Train the classifier on the provided data.

        Parameters:
        - X: Training features
        - y: Training labels
        - kwargs: Additional arguments passed to TabularPredictor.fit()

        Returns:
        Self (fitted TabularClassifier)
        """

    def predict(
        self,
        X: pd.DataFrame | np.ndarray
    ) -> np.ndarray:
        """
        Generate class predictions for input data.

        Parameters:
        - X: Input features

        Returns:
        Predicted class labels as numpy array
        """

    def predict_proba(
        self,
        X: pd.DataFrame | np.ndarray
    ) -> np.ndarray:
        """
        Generate class probabilities for input data.

        Parameters:
        - X: Input features

        Returns:
        Class probabilities as numpy array
        """

    def score(
        self,
        X: pd.DataFrame | np.ndarray,
        y: pd.Series | np.ndarray,
        sample_weight: np.ndarray = None
    ) -> float:
        """
        Calculate accuracy score on the given test data and labels.

        Parameters:
        - X: Test features
        - y: True labels
        - sample_weight: Sample weights for scoring

        Returns:
        Mean accuracy score
        """
```
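
To make the `score` return value concrete, here is a minimal plain-Python sketch of mean accuracy with optional sample weights. It assumes the classifier follows the usual scikit-learn definition of accuracy; `accuracy_sketch` is a hypothetical helper written for illustration and is not part of AutoGluon.

```python
def accuracy_sketch(y_true, y_pred, sample_weight=None):
    """Weighted fraction of predictions that match the true labels.

    Hypothetical helper for illustration only; not an AutoGluon function.
    """
    if sample_weight is None:
        # Unweighted case: every sample counts equally
        sample_weight = [1.0] * len(y_true)
    # Sum the weights of the correctly classified samples
    correct = sum(w for w, t, p in zip(sample_weight, y_true, y_pred) if t == p)
    return correct / sum(sample_weight)


y_true = ["cat", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "cat", "bird"]

# Three of four predictions are correct
acc = accuracy_sketch(y_true, y_pred)

# Doubling the weight of the misclassified sample lowers the score
weighted_acc = accuracy_sketch(y_true, y_pred, sample_weight=[1, 1, 2, 1])

print(acc, weighted_acc)
```

With uniform weights this reduces to the plain fraction of correct predictions, which is what `score(X, y)` reports when `sample_weight` is left as `None`.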

### Tabular Regression

Scikit-learn compatible regressor interface that wraps AutoGluon's TabularPredictor for regression tasks and follows standard sklearn API conventions.

```python { .api }
class TabularRegressor:
    """
    Scikit-learn compatible regressor using AutoGluon's automated ML.

    Provides standard sklearn interface (fit, predict, score)
    while leveraging AutoGluon's model selection and ensemble capabilities.
    """

    def __init__(
        self,
        eval_metric: str = None,
        time_limit: float = None,
        presets: list[str] | str = None,
        hyperparameters: dict | str = None,
        path: str = None,
        verbosity: int = 2,
        init_args: dict = None,
        fit_args: dict = None
    ):
        """
        Initialize TabularRegressor.

        Parameters:
        - eval_metric: Evaluation metric for model selection
        - time_limit: Maximum training time in seconds
        - presets: Preset configurations for training
        - hyperparameters: Custom hyperparameter configurations
        - path: Directory to save models
        - verbosity: Logging level (0-4)
        - init_args: Additional initialization arguments
        - fit_args: Additional fitting arguments
        """

    def fit(
        self,
        X: pd.DataFrame | np.ndarray,
        y: pd.Series | np.ndarray,
        **kwargs
    ) -> 'TabularRegressor':
        """
        Train the regressor on the provided data.

        Parameters:
        - X: Training features
        - y: Training target values
        - kwargs: Additional arguments passed to TabularPredictor.fit()

        Returns:
        Self (fitted TabularRegressor)
        """

    def predict(
        self,
        X: pd.DataFrame | np.ndarray
    ) -> np.ndarray:
        """
        Generate predictions for input data.

        Parameters:
        - X: Input features

        Returns:
        Predicted values as numpy array
        """

    def score(
        self,
        X: pd.DataFrame | np.ndarray,
        y: pd.Series | np.ndarray,
        sample_weight: np.ndarray = None
    ) -> float:
        """
        Calculate R² coefficient of determination on test data.

        Parameters:
        - X: Test features
        - y: True target values
        - sample_weight: Sample weights for scoring

        Returns:
        R² score
        """
```
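
The R² value returned by `score` can be written out by hand. The sketch below assumes the standard scikit-learn definition, R² = 1 - SS_res / SS_tot, with optional sample weights; `r2_sketch` is a hypothetical helper for illustration and not an AutoGluon function.

```python
def r2_sketch(y_true, y_pred, sample_weight=None):
    """Weighted R^2 = 1 - SS_res / SS_tot in plain Python.

    Hypothetical helper for illustration; assumes y_true is not constant,
    otherwise SS_tot is zero and the ratio is undefined.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total_weight = sum(sample_weight)
    # Weighted mean of the true targets
    y_mean = sum(w * y for w, y in zip(sample_weight, y_true)) / total_weight
    # Residual sum of squares: error of the model's predictions
    ss_res = sum(w * (y - p) ** 2 for w, y, p in zip(sample_weight, y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum(w * (y - y_mean) ** 2 for w, y in zip(sample_weight, y_true))
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

r2 = r2_sketch(y_true, y_pred)
print(f"R^2: {r2:.4f}")  # R^2: 0.9486
```

A score of 1.0 means perfect predictions, 0.0 means no better than predicting the mean, and negative values mean worse than the mean.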

## Usage Examples

### Classification with Scikit-learn Pipeline

```python
from autogluon.tabular.experimental import TabularClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
import pandas as pd

# Load data
X_train = pd.read_csv('X_train.csv')
y_train = pd.read_csv('y_train.csv').squeeze()
X_test = pd.read_csv('X_test.csv')

# Create sklearn-compatible classifier
classifier = TabularClassifier(
    eval_metric='roc_auc',
    verbosity=1
)

# Use in sklearn pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', classifier)
])

# Cross-validation with sklearn
cv_scores = cross_val_score(
    pipeline,
    X_train,
    y_train,
    cv=5,
    scoring='roc_auc'
)

print(f"Cross-validation AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")

# Fit and predict
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
probabilities = pipeline.predict_proba(X_test)
```

### Regression with GridSearchCV

```python
from autogluon.tabular.experimental import TabularRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load regression data
X_train = pd.read_csv('X_train.csv')
y_train = pd.read_csv('y_train.csv').squeeze()
X_test = pd.read_csv('X_test.csv')
y_test = pd.read_csv('y_test.csv').squeeze()

# Create regressor
regressor = TabularRegressor(verbosity=1)

# Grid search over AutoGluon parameters
param_grid = {
    'eval_metric': ['mean_squared_error', 'mean_absolute_error'],
    'time_limit': [300, 600],
    'presets': ['good_quality', 'best_quality']
}

# Perform grid search
grid_search = GridSearchCV(
    regressor,
    param_grid,
    cv=3,
    scoring='neg_mean_squared_error',
    n_jobs=1  # AutoGluon handles parallelization internally
)

# Fit with grid search
grid_search.fit(X_train, y_train)

# Best model predictions
best_model = grid_search.best_estimator_
predictions = best_model.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Test RMSE: {rmse:.4f}")
print(f"Test R²: {best_model.score(X_test, y_test):.4f}")
```
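
Because every grid point triggers a full AutoGluon training run per fold, it can help to estimate the time budget before launching the search. The following sketch uses only the standard library and the grid from the example above. It treats `time_limit` as an upper bound on each fit and ignores prediction and scoring time, so the result is a rough ceiling rather than a measurement.

```python
from itertools import product

# Same grid and fold count as the GridSearchCV example above
param_grid = {
    'eval_metric': ['mean_squared_error', 'mean_absolute_error'],
    'time_limit': [300, 600],
    'presets': ['good_quality', 'best_quality'],
}
cv_folds = 3

# Expand the grid into one dict per parameter combination
keys = list(param_grid)
combinations = [dict(zip(keys, values)) for values in product(*param_grid.values())]

# Each combination is fitted once per fold
n_cv_fits = len(combinations) * cv_folds

# Upper bound on cross-validation training time, in seconds
cv_budget_seconds = sum(combo['time_limit'] * cv_folds for combo in combinations)

# GridSearchCV refits the best combination on the full data by default
# (refit=True); the worst case is the largest time_limit in the grid
refit_budget_seconds = max(param_grid['time_limit'])

total_budget_seconds = cv_budget_seconds + refit_budget_seconds
print(f"{len(combinations)} combinations, {n_cv_fits} CV fits")
print(f"Training budget ceiling: {total_budget_seconds / 3600:.2f} hours")
```

For this grid the ceiling is a little over three hours, so trimming the grid or lowering `time_limit` may be worthwhile when iterating.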

### Integration with Model Selection

```python
from autogluon.tabular.experimental import TabularClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
import pandas as pd

# Prepare data
X = pd.read_csv('features.csv')
y = pd.read_csv('target.csv').squeeze()
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Compare AutoGluon with sklearn models
models = {
    'AutoGluon': TabularClassifier(time_limit=300, verbosity=0),
    'RandomForest': RandomForestClassifier(n_estimators=100, random_state=42),
    'LogisticRegression': LogisticRegression(random_state=42)
}

results = {}
for name, model in models.items():
    # Fit model
    model.fit(X_train, y_train)

    # Predictions
    predictions = model.predict(X_val)

    # Store results
    results[name] = {
        'accuracy': model.score(X_val, y_val),
        'predictions': predictions
    }

    print(f"\n{name} Results:")
    print(f"Accuracy: {results[name]['accuracy']:.4f}")
    print(classification_report(y_val, predictions))
```

### Advanced Usage with Custom Configurations

```python
from autogluon.tabular.experimental import TabularClassifier

# Assumes X_train, y_train, and X_test are loaded as in the examples above

# Custom hyperparameters for AutoGluon models: each model key maps to a
# list of configurations, and one model is trained per configuration
hyperparameters = {
    'GBM': [{'num_leaves': 26}, {'num_leaves': 66}, {'num_leaves': 176}],
    'XGB': [{'n_estimators': 50}, {'n_estimators': 100}, {'n_estimators': 200}],
    'CAT': [{'iterations': 100}, {'iterations': 200}, {'iterations': 500}]
}

# Advanced classifier with custom settings, using only the constructor
# parameters listed in the TabularClassifier signature
classifier = TabularClassifier(
    eval_metric='f1_macro',
    time_limit=900,
    presets='best_quality',
    hyperparameters=hyperparameters,
    path='./sklearn_compatible_models/',
    verbosity=2,
    fit_args={'num_bag_folds': 5}  # extra arguments for TabularPredictor.fit()
)

# Fit with the configuration above
classifier.fit(X_train, y_train)

# Access underlying AutoGluon predictor for advanced functionality
# (stored after fit under the sklearn-style trailing-underscore name)
autogluon_predictor = classifier.predictor_
leaderboard = autogluon_predictor.leaderboard(extra_info=True)
print(leaderboard)

# Standard sklearn predictions
predictions = classifier.predict(X_test)
probabilities = classifier.predict_proba(X_test)
```

## Notes

- **Experimental Status**: These interfaces are experimental and may change in future versions.
- **Feature Compatibility**: Most AutoGluon features, such as the leaderboard, remain accessible through the underlying predictor.
- **Performance**: The estimators wrap TabularPredictor, so predictive performance should match using TabularPredictor directly with the same settings.
- **Integration**: The estimators are designed to work with sklearn pipelines, grid search, and cross-validation.
- **Persistence**: Trained models are saved to the directory given by `path`.
- **Parallelization**: AutoGluon parallelizes training internally; avoid nested parallelization in sklearn tools, for example by keeping `n_jobs=1` in GridSearchCV.
