# Automated Machine Learning

A complete automated machine learning pipeline that supports classification, regression, time series forecasting, ranking, and NLP tasks. AutoML automatically selects the best model and hyperparameters within a specified time budget, providing an efficient solution to a wide range of machine learning problems.

## Capabilities

### AutoML Class

The main AutoML class provides automated machine learning with intelligent model selection, hyperparameter optimization, and ensemble methods.

```python { .api }
class AutoML:
    def __init__(self):
        """Initialize AutoML instance."""

    def fit(self, X_train, y_train, task="classification", time_budget=60,
            metric="auto", estimator_list="auto", eval_method="auto",
            split_ratio=0.1, n_splits=5, ensemble=False,
            n_jobs=1, verbose=0, **kwargs):
        """
        Train AutoML model.

        Args:
            X_train: Training feature data (pandas DataFrame, numpy array, or sparse matrix)
            y_train: Training target data (pandas Series or numpy array)
            task (str): Task type - 'classification', 'regression', 'ts_forecast', 'rank', 'nlp'
            time_budget (float): Time budget in seconds for training
            metric (str or callable): Evaluation metric ('accuracy', 'roc_auc', 'rmse', 'mae', etc.)
            estimator_list (list): List of estimator names to try ('auto' for default selection)
            eval_method (str): Evaluation method - 'auto', 'cv', 'holdout'
            split_ratio (float): Validation split ratio for the holdout method
            n_splits (int): Number of cross-validation folds
            ensemble (bool): Whether to perform ensemble learning
            n_jobs (int): Number of parallel jobs (-1 for all processors)
            verbose (int): Verbosity level (0-5+)

        Returns:
            self: Fitted AutoML instance
        """

    def predict(self, X, **kwargs):
        """
        Make predictions on new data.

        Args:
            X: Feature data for prediction (same format as training data)
            **kwargs: Additional prediction parameters

        Returns:
            numpy.ndarray: Predictions
        """

    def predict_proba(self, X, **kwargs):
        """
        Get prediction probabilities (classification only).

        Args:
            X: Feature data for prediction
            **kwargs: Additional prediction parameters

        Returns:
            numpy.ndarray: Prediction probabilities
        """

    def score(self, X, y, **kwargs):
        """
        Evaluate model performance.

        Args:
            X: Feature data for evaluation
            y: True target values
            **kwargs: Additional scoring parameters

        Returns:
            float: Score based on the specified metric
        """

    def add_learner(self, learner_name, learner_class):
        """
        Add custom learner to estimator list.

        Args:
            learner_name (str): Name for the custom learner
            learner_class: Learner class implementing fit/predict interface
        """
```
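
The evaluation strategy and search scope are controlled through `metric`, `eval_method`, `n_splits`, and `estimator_list`. As a minimal sketch, assuming feature and label data are already loaded as `X_train` and `y_train`:

```python
from flaml import AutoML

automl = AutoML()
# Use 5-fold cross-validation instead of a holdout split, optimize F1,
# and restrict the search to tree-based estimators.
automl.fit(
    X_train, y_train,
    task="classification",
    time_budget=60,
    metric="f1",
    eval_method="cv",
    n_splits=5,
    estimator_list=["lgbm", "rf", "extra_tree"],
)
```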

### Model Properties and Results

Access to the best model, configuration, and training results.

```python { .api }
class AutoML:
    @property
    def best_estimator(self):
        """Best trained estimator instance."""

    @property
    def best_config(self):
        """Best hyperparameter configuration found."""

    @property
    def best_loss(self):
        """Best validation loss achieved."""

    @property
    def model(self):
        """Trained model object (alias for best_estimator)."""

    @property
    def feature_importances_(self):
        """Feature importance values from the best model."""

    @property
    def classes_(self):
        """Class labels for classification tasks."""

    @property
    def best_config_per_estimator(self):
        """Best configuration for each estimator type tried."""

    @property
    def time_to_find_best_model(self):
        """Time taken to find the best model, in seconds."""

    @property
    def feature_transformer(self):
        """Feature preprocessing pipeline."""

    @property
    def label_transformer(self):
        """Label preprocessing pipeline."""
```
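
For illustration, a short sketch of reading these properties after training; it assumes an `AutoML` instance named `automl` that has already been fitted via `fit()` as shown above:

```python
# Inspect the search outcome on a fitted `automl` instance.
print(f"Best estimator: {automl.best_estimator}")
print(f"Best config: {automl.best_config}")
print(f"Best validation loss: {automl.best_loss}")
print(f"Time to find best model: {automl.time_to_find_best_model}s")
print(f"Best config per estimator: {automl.best_config_per_estimator}")
```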

### Model Management and Persistence

Save, load, and retrain models with configuration management.

```python { .api }
class AutoML:
    def save_best_config(self, filename):
        """
        Save best configuration to file.

        Args:
            filename (str): Path to save configuration
        """

    def get_estimator_from_log(self, log_file_name, record_id, task):
        """
        Extract estimator from training log.

        Args:
            log_file_name (str): Path to log file
            record_id (int): Record identifier
            task (str): Task type

        Returns:
            Trained estimator instance
        """

    def retrain_from_log(self, log_file_name, X_train, y_train,
                         task, record_id=-1, **kwargs):
        """
        Retrain model from logged configuration.

        Args:
            log_file_name (str): Path to training log
            X_train: Training features
            y_train: Training targets
            task (str): Task type
            record_id (int): Record ID (-1 for best)
            **kwargs: Additional training parameters
        """
```
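
A minimal sketch of the persistence workflow, assuming `X_train`/`y_train` are available and that a training log is written by passing a `log_file_name` keyword through `fit(**kwargs)`; the file names here are illustrative:

```python
from flaml import AutoML

automl = AutoML()
# Write a training log during the search (log path passed via **kwargs).
automl.fit(X_train, y_train, task="classification",
           time_budget=60, log_file_name="automl.log")

# Persist the best hyperparameter configuration found.
automl.save_best_config("best_config.json")

# Later: rebuild the best logged model on the same task and data.
retrained = AutoML()
retrained.retrain_from_log("automl.log", X_train, y_train,
                           task="classification", record_id=-1)
```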

### Utility Functions

Helper functions for model analysis and configuration.

```python { .api }
def size(learner_classes, config):
    """
    Calculate memory size for a model configuration.

    Args:
        learner_classes (dict): Dictionary of learner classes
        config (dict): Model configuration

    Returns:
        float: Estimated memory size in bytes
    """
```

### Usage Examples

#### Basic Classification

```python
from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train AutoML model
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=30)

# Make predictions
predictions = automl.predict(X_test)
probabilities = automl.predict_proba(X_test)

print(f"Best model: {automl.best_estimator}")
print(f"Accuracy: {automl.score(X_test, y_test)}")
```

#### Regression with Custom Settings

```python
from flaml import AutoML
import pandas as pd

# Load regression data
df = pd.read_csv("regression_data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Configure AutoML
automl = AutoML()
settings = {
    "task": "regression",
    "time_budget": 300,
    "metric": "rmse",
    "estimator_list": ["lgbm", "xgboost", "rf"],
    "ensemble": True,
    "n_jobs": -1,
    "verbose": 1
}

# Train and evaluate
automl.fit(X, y, **settings)
print(f"Best RMSE: {automl.best_loss}")
print(f"Feature importance: {automl.feature_importances_}")
```

#### Time Series Forecasting

```python
from flaml import AutoML
import pandas as pd

# Load time series data; assumes a "ds" timestamp column and a "y" target column
df = pd.read_csv("timeseries.csv")
df["ds"] = pd.to_datetime(df["ds"])

# Configure for forecasting
automl = AutoML()
automl.fit(
    df[["ds"]],  # timestamps as training features
    df["y"],     # values to forecast
    task="ts_forecast",
    time_budget=600,
    metric="mape",
    period=12,  # forecast horizon (number of periods to predict)
    verbose=2
)

# Generate forecasts for the next 12 periods (future timestamps as input)
future = pd.DataFrame(
    {"ds": pd.date_range(start=df["ds"].max(), periods=13,
                         freq=pd.infer_freq(df["ds"]))[1:]}
)
forecasts = automl.predict(future)
```

#### Custom Learner Integration

```python
from flaml import AutoML
from sklearn.svm import SVC

# Add custom learner
automl = AutoML()
automl.add_learner("custom_svm", SVC)

# Use custom learner in training
automl.fit(
    X_train, y_train,
    task="classification",
    estimator_list=["lgbm", "custom_svm"],
    time_budget=120
)
```

## State Management Classes

Classes for managing AutoML training state and search configuration.

```python { .api }
class AutoMLState:
    """Manages AutoML training state and sample data preparation."""

    def prepare_sample_train_data(self, sample_size):
        """
        Prepare sampled training data for efficient search.

        Args:
            sample_size (int): Size of sample to create
        """


class SearchState:
    """Manages hyperparameter search state and validation."""

    @property
    def search_space(self):
        """Current search space configuration."""

    @property
    def estimated_cost4improvement(self):
        """Estimated cost for model improvement."""
```

## Supported Tasks and Metrics

### Task Types

- **classification**: Binary and multi-class classification
- **regression**: Continuous target prediction
- **ts_forecast**: Time series forecasting
- **rank**: Learning to rank
- **nlp**: Natural language processing tasks

### Metrics

- **Classification**: accuracy, roc_auc, roc_auc_ovr, f1, log_loss, precision, recall
- **Regression**: rmse, mae, mse, r2, mape
- **Forecasting**: mape, smape, mae, rmse
- **Ranking**: ndcg, ap

### Estimators

- **lgbm**: LightGBM (gradient boosting)
- **xgboost**: XGBoost (gradient boosting)
- **rf**: Random Forest
- **extra_tree**: Extra Trees
- **lrl1**: Logistic Regression with L1 regularization
- **lrl2**: Logistic Regression with L2 regularization
- **catboost**: CatBoost (if installed)
- **kneighbor**: K-Nearest Neighbors
- **prophet**: Prophet for time series (if installed)
- **arima**: ARIMA for time series