Tessl Tile for pypi/flaml@2.3.0

# Automated Machine Learning

FLAML's AutoML provides a complete automated machine learning pipeline for classification, regression, forecasting, ranking, and NLP tasks. It searches for the best model and hyperparameters within a specified time budget.

## Capabilities

### AutoML Class

The `AutoML` class is the main entry point. It handles model selection, hyperparameter optimization, and optional ensembling.

```python { .api }
class AutoML:
    def __init__(self):
        """Initialize AutoML instance."""

    def fit(self, X_train, y_train, task="classification", time_budget=60,
            metric="auto", estimator_list="auto", eval_method="auto",
            split_ratio=0.1, n_splits=5, ensemble=False,
            n_jobs=1, verbose=0, **kwargs):
        """
        Train AutoML model.

        Args:
            X_train: Training feature data (pandas DataFrame, numpy array, or sparse matrix)
            y_train: Training target data (pandas Series or numpy array)
            task (str): Task type - 'classification', 'regression', 'ts_forecast', 'rank', 'nlp'
            time_budget (float): Time budget in seconds for training
            metric (str or callable): Evaluation metric ('accuracy', 'roc_auc', 'rmse', 'mae', etc.)
            estimator_list (list): List of estimator names to try ('auto' for default selection)
            eval_method (str): Evaluation method - 'auto', 'cv', 'holdout'
            split_ratio (float): Validation split ratio for holdout method
            n_splits (int): Number of cross-validation folds
            ensemble (bool): Whether to perform ensemble learning
            n_jobs (int): Number of parallel jobs (-1 for all processors)
            verbose (int): Verbosity level (0-5+)

        Returns:
            self: Fitted AutoML instance
        """

    def predict(self, X, **kwargs):
        """
        Make predictions on new data.

        Args:
            X: Feature data for prediction (same format as training data)
            **kwargs: Additional prediction parameters

        Returns:
            numpy.ndarray: Predictions
        """

    def predict_proba(self, X, **kwargs):
        """
        Get prediction probabilities (classification only).

        Args:
            X: Feature data for prediction
            **kwargs: Additional prediction parameters

        Returns:
            numpy.ndarray: Prediction probabilities
        """

    def score(self, X, y, **kwargs):
        """
        Evaluate model performance.

        Args:
            X: Feature data for evaluation
            y: True target values
            **kwargs: Additional scoring parameters

        Returns:
            float: Score based on the specified metric
        """

    def add_learner(self, learner_name, learner_class):
        """
        Add custom learner to estimator list.

        Args:
            learner_name (str): Name for the custom learner
            learner_class: Learner class, a subclass of FLAML's BaseEstimator
                (for example SKLearnEstimator) that defines a search space
        """
```
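
As a minimal sketch of how the `fit` arguments listed above might be assembled, the hypothetical helper `build_fit_settings` below (not part of FLAML) checks a few values against the docstring and returns a dict that could be unpacked into `automl.fit(X_train, y_train, **settings)`. The validation rules are assumptions made for illustration, not checks that FLAML itself is known to perform.

```python
# Hypothetical helper (not part of FLAML): assemble keyword arguments for AutoML.fit.
VALID_EVAL_METHODS = {"auto", "cv", "holdout"}


def build_fit_settings(task, time_budget=60, metric="auto",
                       estimator_list="auto", eval_method="auto",
                       split_ratio=0.1, n_splits=5):
    """Return a settings dict after basic sanity checks on the documented arguments."""
    if eval_method not in VALID_EVAL_METHODS:
        raise ValueError(f"eval_method must be one of {sorted(VALID_EVAL_METHODS)}")
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must lie strictly between 0 and 1")
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2 for cross-validation")
    return {
        "task": task,
        "time_budget": time_budget,
        "metric": metric,
        "estimator_list": estimator_list,
        "eval_method": eval_method,
        "split_ratio": split_ratio,
        "n_splits": n_splits,
    }


settings = build_fit_settings("regression", time_budget=120, metric="rmse",
                              estimator_list=["lgbm", "rf"], eval_method="cv")
print(settings)
# With FLAML installed, the dict could then be passed on as:
#     automl = AutoML()
#     automl.fit(X_train, y_train, **settings)
```

Keeping the settings in a plain dict mirrors the pattern used in the regression usage example and makes a run easy to log or reuse.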

### Model Properties and Results

Access to the best model, configuration, and training results.

```python { .api }
class AutoML:
    @property
    def best_estimator(self):
        """Name of the best estimator found (a string such as 'lgbm')."""

    @property
    def best_config(self):
        """Best hyperparameter configuration found."""

    @property
    def best_loss(self):
        """Best validation loss achieved."""

    @property
    def model(self):
        """Best trained model object, exposing predict() and, for classification, predict_proba()."""

    @property
    def feature_importances_(self):
        """Feature importance values from the best model."""

    @property
    def classes_(self):
        """Class labels for classification tasks."""

    @property
    def best_config_per_estimator(self):
        """Best configuration for each estimator type tried."""

    @property
    def time_to_find_best_model(self):
        """Time taken to find the best model in seconds."""

    @property
    def feature_transformer(self):
        """Feature preprocessing pipeline."""

    @property
    def label_transformer(self):
        """Label preprocessing pipeline."""
```
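
The properties above can be gathered into a single report after training. The sketch below uses a hypothetical helper, `summarize_run`, that reads a few of the listed properties by name. A stand-in object with made-up values replaces a fitted `AutoML` instance so the snippet runs without training anything.

```python
from types import SimpleNamespace

# Property names taken from the API listing above.
SUMMARY_FIELDS = ("best_estimator", "best_config", "best_loss",
                  "time_to_find_best_model")


def summarize_run(automl):
    """Collect headline results of a fitted AutoML-like object into a dict.

    Properties that are missing on the object are reported as None.
    """
    return {name: getattr(automl, name, None) for name in SUMMARY_FIELDS}


# Stand-in for a fitted AutoML instance; the values are invented for illustration.
fake_run = SimpleNamespace(
    best_estimator="lgbm",
    best_config={"n_estimators": 50},
    best_loss=0.12,
    time_to_find_best_model=3.4,
)
summary = summarize_run(fake_run)
print(summary)
```

With a real run, `summarize_run(automl)` would be called after `automl.fit(...)` has returned.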

### Model Management and Persistence

Save the best configuration, recover an estimator from a training log, and retrain from a logged configuration.

```python { .api }
class AutoML:
    def save_best_config(self, filename):
        """
        Save best configuration to file.

        Args:
            filename (str): Path to save configuration
        """

    def get_estimator_from_log(self, log_file_name, record_id, task):
        """
        Extract estimator from training log.

        Args:
            log_file_name (str): Path to log file
            record_id (int): Record identifier
            task (str): Task type

        Returns:
            Trained estimator instance
        """

    def retrain_from_log(self, log_file_name, X_train, y_train,
                         task, record_id=-1, **kwargs):
        """
        Retrain model from logged configuration.

        Args:
            log_file_name (str): Path to training log
            X_train: Training features
            y_train: Training targets
            task (str): Task type
            record_id (int): Record ID (-1 for best)
            **kwargs: Additional training parameters
        """
```

### Utility Functions

Helper functions for model analysis and configuration.

```python { .api }
def size(learner_classes, config):
    """
    Calculate memory size for a model configuration.

    Args:
        learner_classes (dict): Dictionary of learner classes
        config (dict): Model configuration

    Returns:
        float: Estimated memory size in bytes
    """
```

### Usage Examples

#### Basic Classification
```python
from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train AutoML model
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=30)

# Make predictions
predictions = automl.predict(X_test)
probabilities = automl.predict_proba(X_test)

print(f"Best model: {automl.best_estimator}")
print(f"Accuracy: {automl.score(X_test, y_test)}")
```

#### Regression with Custom Settings
```python
from flaml import AutoML
import pandas as pd

# Load regression data
df = pd.read_csv("regression_data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Configure AutoML
automl = AutoML()
settings = {
    "task": "regression",
    "time_budget": 300,
    "metric": "rmse",
    "estimator_list": ["lgbm", "xgboost", "rf"],
    "ensemble": True,
    "n_jobs": -1,
    "verbose": 1
}

# Train and evaluate
automl.fit(X, y, **settings)
print(f"Best RMSE: {automl.best_loss}")
print(f"Feature importance: {automl.feature_importances_}")
```

#### Time Series Forecasting
```python
from flaml import AutoML
import pandas as pd

# Load time series data; the timestamp column should come first
df = pd.read_csv("timeseries.csv")
df["ds"] = pd.to_datetime(df["ds"])

# Configure for forecasting
automl = AutoML()
automl.fit(
    dataframe=df,
    label="y",  # assumed name of the target column in timeseries.csv
    task="ts_forecast",
    time_budget=600,
    metric="mape",
    period=12,  # forecast horizon (number of future steps per prediction)
    verbose=2
)

# Generate forecasts for the next 12 periods (matches the horizon above)
forecasts = automl.predict(12)
```

#### Custom Learner Integration
```python
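from flaml import AutoML, tune
from flaml.automl.model import SKLearnEstimator
from sklearn.svm import SVC

# FLAML learners subclass its estimator wrappers; a bare scikit-learn class is not enough
class CustomSVM(SKLearnEstimator):
    def __init__(self, task="binary", **config):
        super().__init__(task, **config)
        self.estimator_class = SVC

    def config2params(self, config):
        params = super().config2params(config)
        params.pop("n_jobs", None)    # SVC does not accept n_jobs
        params["probability"] = True  # enable predict_proba for probability-based metrics
        return params

    @classmethod
    def search_space(cls, data_size, task):
        return {"C": {"domain": tune.loguniform(0.01, 100.0), "init_value": 1.0}}

# Add custom learner
automl = AutoML()
automl.add_learner("custom_svm", CustomSVM)

# Use custom learner in training (X_train, y_train as in the basic classification example)
automl.fit(
    X_train, y_train,
    task="classification",
    estimator_list=["lgbm", "custom_svm"],
    time_budget=120
)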
```

## State Management Classes

Classes for managing AutoML training state and search configuration.

```python { .api }
class AutoMLState:
    """Manages AutoML training state and sample data preparation."""

    def prepare_sample_train_data(self, sample_size):
        """
        Prepare sampled training data for efficient search.

        Args:
            sample_size (int): Size of sample to create
        """

class SearchState:
    """Manages hyperparameter search state and validation."""

    @property
    def search_space(self):
        """Current search space configuration."""

    @property
    def estimated_cost4improvement(self):
        """Estimated cost for model improvement."""
```

## Supported Tasks and Metrics

### Task Types
- **classification**: Binary and multi-class classification
- **regression**: Continuous target prediction
- **ts_forecast**: Time series forecasting
- **rank**: Learning to rank
- **nlp**: Natural language processing tasks

### Metrics
- **Classification**: accuracy, roc_auc, roc_auc_ovr, f1, log_loss, precision, recall
- **Regression**: rmse, mae, mse, r2, mape
- **Forecasting**: mape, smape, mae, rmse
- **Ranking**: ndcg, ap
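
To make the regression and forecasting metric names concrete, here is a small worked example using their standard textbook definitions. The helper `regression_metrics` is hypothetical and written only for illustration. FLAML computes these metrics with its own code, which may differ in details such as sample weighting.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute mae, mse, rmse and mape from paired true and predicted values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # mape divides by the true value, so it is undefined when a true value is 0
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape}


# Errors are +10, -10 and 0, so mae = 20/3, mse = 200/3 and mape = 0.05
metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(metrics)
```

All four values are losses where lower is better, which is why they can be passed directly as the `metric` to minimize.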

### Estimators
- **lgbm**: LightGBM (gradient boosting)
- **xgboost**: XGBoost (gradient boosting)
- **rf**: Random Forest
- **extra_tree**: Extra Trees
- **lrl1**: Logistic Regression with L1 regularization
- **lrl2**: Logistic Regression with L2 regularization
- **catboost**: CatBoost (if installed)
- **kneighbor**: K-Nearest Neighbors
- **prophet**: Prophet for time series (if installed)
- **arima**: ARIMA for time series
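
Because some estimators depend on optional packages, it can help to filter an `estimator_list` before calling `fit`. The sketch below is one possible approach. The helper `installed_estimators` and the name-to-package mapping are assumptions made for illustration and are not part of FLAML.

```python
from importlib.util import find_spec

# Assumed mapping from estimator name to the Python package that backs it.
ESTIMATOR_PACKAGES = {
    "lgbm": "lightgbm",
    "xgboost": "xgboost",
    "rf": "sklearn",
    "extra_tree": "sklearn",
    "lrl1": "sklearn",
    "lrl2": "sklearn",
    "catboost": "catboost",
    "kneighbor": "sklearn",
    "prophet": "prophet",
    "arima": "statsmodels",
}


def _package_available(package):
    """Return True if the top-level package can be located for import."""
    return find_spec(package) is not None


def installed_estimators(names, is_installed=_package_available):
    """Keep only the estimator names whose backing package is available."""
    return [name for name in names if is_installed(ESTIMATOR_PACKAGES[name])]


candidates = ["lgbm", "catboost", "rf"]
usable = installed_estimators(candidates)
print(usable)
```

The resulting list could then be passed as `estimator_list=usable`, provided it is not empty.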
