# Experimental Scikit-learn Compatible Interfaces

AutoGluon provides experimental scikit-learn compatible interfaces for seamless integration with existing scikit-learn workflows, pipelines, and ecosystem tools. These classes provide familiar fit/predict APIs while leveraging AutoGluon's automated machine learning capabilities.
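
In practice, "scikit-learn compatible" means an estimator can be handed to any code that relies only on `fit`, `predict`, and `score`. The sketch below illustrates that contract with a toy stand-in so that it runs without AutoGluon installed. `MajorityClassifier` and `evaluate` are hypothetical names invented for this illustration; they are not part of AutoGluon or scikit-learn.

```python
from collections import Counter


class MajorityClassifier:
    """Toy stand-in (hypothetical) that follows the fit/predict/score convention."""

    def fit(self, X, y):
        # Remember the most frequent training label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self  # returning self allows call chaining, as sklearn estimators do

    def predict(self, X):
        # Predict the remembered label for every row.
        return [self.majority_ for _ in X]

    def score(self, X, y):
        # Mean accuracy of the predictions against the true labels.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


def evaluate(estimator, X_train, y_train, X_test, y_test):
    """Works with any object that exposes fit and score (hypothetical helper)."""
    return estimator.fit(X_train, y_train).score(X_test, y_test)


X_train, y_train = [[0], [1], [2], [3]], ["a", "a", "a", "b"]
X_test, y_test = [[4], [5]], ["a", "b"]

accuracy = evaluate(MajorityClassifier(), X_train, y_train, X_test, y_test)
print(accuracy)
```

Assuming the interfaces documented here follow the same contract, a `TabularClassifier` instance could be passed to `evaluate` in place of the stand-in.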

## Capabilities

### Tabular Classification

Scikit-learn compatible classifier interface that wraps AutoGluon's TabularPredictor for classification tasks with standard sklearn API conventions.

```python { .api }
class TabularClassifier:
    """
    Scikit-learn compatible classifier using AutoGluon's automated ML.

    Provides standard sklearn interface (fit, predict, predict_proba, score)
    while leveraging AutoGluon's model selection and ensemble capabilities.
    """

    def __init__(
        self,
        eval_metric: str = None,
        time_limit: float = None,
        presets: list[str] | str = None,
        hyperparameters: dict | str = None,
        path: str = None,
        verbosity: int = 2,
        init_args: dict = None,
        fit_args: dict = None
    ):
        """
        Initialize TabularClassifier.

        Parameters:
        - eval_metric: Evaluation metric for model selection
        - time_limit: Maximum training time in seconds
        - presets: Preset configurations for training
        - hyperparameters: Custom hyperparameter configurations
        - path: Directory to save models
        - verbosity: Logging level (0-4)
        - init_args: Additional initialization arguments
        - fit_args: Additional fitting arguments
        """

    def fit(
        self,
        X: pd.DataFrame | np.ndarray,
        y: pd.Series | np.ndarray,
        **kwargs
    ) -> 'TabularClassifier':
        """
        Train the classifier on the provided data.

        Parameters:
        - X: Training features
        - y: Training labels
        - kwargs: Additional arguments passed to TabularPredictor.fit()

        Returns:
        Self (fitted TabularClassifier)
        """

    def predict(
        self,
        X: pd.DataFrame | np.ndarray
    ) -> np.ndarray:
        """
        Generate class predictions for input data.

        Parameters:
        - X: Input features

        Returns:
        Predicted class labels as numpy array
        """

    def predict_proba(
        self,
        X: pd.DataFrame | np.ndarray
    ) -> np.ndarray:
        """
        Generate class probabilities for input data.

        Parameters:
        - X: Input features

        Returns:
        Class probabilities as numpy array
        """

    def score(
        self,
        X: pd.DataFrame | np.ndarray,
        y: pd.Series | np.ndarray,
        sample_weight: np.ndarray = None
    ) -> float:
        """
        Calculate accuracy score on the given test data and labels.

        Parameters:
        - X: Test features
        - y: True labels
        - sample_weight: Sample weights for scoring

        Returns:
        Mean accuracy score
        """
```
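
To make the `score` return value concrete, the following sketch computes mean accuracy with optional sample weights in plain Python. It assumes `score` follows the usual scikit-learn classifier convention (weighted fraction of correct predictions); `weighted_accuracy` is a hypothetical helper written for this illustration, not a function from AutoGluon or scikit-learn.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of predictions that match the true labels."""
    if sample_weight is None:
        # Without weights every sample counts equally.
        sample_weight = [1.0] * len(y_true)
    # Sum the weights of the correctly classified samples.
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "cat", "cat"]

# Three of four predictions are correct.
plain = weighted_accuracy(y_true, y_pred)

# Doubling the weight of the misclassified sample lowers the score.
weighted = weighted_accuracy(y_true, y_pred, sample_weight=[1.0, 1.0, 2.0, 1.0])

print(plain, weighted)
```

The weighted variant matters when some rows are more important than others, since an error on a heavily weighted row costs more than an error on a lightly weighted one.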

### Tabular Regression

Scikit-learn compatible regressor interface that wraps AutoGluon's TabularPredictor for regression tasks with standard sklearn API conventions.

```python { .api }
class TabularRegressor:
    """
    Scikit-learn compatible regressor using AutoGluon's automated ML.

    Provides standard sklearn interface (fit, predict, score)
    while leveraging AutoGluon's model selection and ensemble capabilities.
    """

    def __init__(
        self,
        eval_metric: str = None,
        time_limit: float = None,
        presets: list[str] | str = None,
        hyperparameters: dict | str = None,
        path: str = None,
        verbosity: int = 2,
        init_args: dict = None,
        fit_args: dict = None
    ):
        """
        Initialize TabularRegressor.

        Parameters:
        - eval_metric: Evaluation metric for model selection
        - time_limit: Maximum training time in seconds
        - presets: Preset configurations for training
        - hyperparameters: Custom hyperparameter configurations
        - path: Directory to save models
        - verbosity: Logging level (0-4)
        - init_args: Additional initialization arguments
        - fit_args: Additional fitting arguments
        """

    def fit(
        self,
        X: pd.DataFrame | np.ndarray,
        y: pd.Series | np.ndarray,
        **kwargs
    ) -> 'TabularRegressor':
        """
        Train the regressor on the provided data.

        Parameters:
        - X: Training features
        - y: Training target values
        - kwargs: Additional arguments passed to TabularPredictor.fit()

        Returns:
        Self (fitted TabularRegressor)
        """

    def predict(
        self,
        X: pd.DataFrame | np.ndarray
    ) -> np.ndarray:
        """
        Generate predictions for input data.

        Parameters:
        - X: Input features

        Returns:
        Predicted values as numpy array
        """

    def score(
        self,
        X: pd.DataFrame | np.ndarray,
        y: pd.Series | np.ndarray,
        sample_weight: np.ndarray = None
    ) -> float:
        """
        Calculate R² coefficient of determination on test data.

        Parameters:
        - X: Test features
        - y: True target values
        - sample_weight: Sample weights for scoring

        Returns:
        R² score
        """
```
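
As a worked illustration of what `score` reports for a regressor, the sketch below computes the unweighted R² coefficient of determination, defined as 1 - SS_res / SS_tot, in plain Python. `r_squared` is a hypothetical helper for this illustration and the sample values are made up.

```python
def r_squared(y_true, y_pred):
    """Unweighted coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

r2 = r_squared(y_true, y_pred)

# A model that always predicts the mean of y_true scores exactly 0.
baseline = r_squared(y_true, [6.0, 6.0, 6.0, 6.0])

print(round(r2, 4), baseline)
```

A score of 1 means perfect predictions, 0 means no better than predicting the mean, and negative values mean worse than that baseline.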

## Usage Examples

### Classification with Scikit-learn Pipeline

```python
from autogluon.tabular.experimental import TabularClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
import pandas as pd

# Load data
X_train = pd.read_csv('X_train.csv')
y_train = pd.read_csv('y_train.csv').squeeze()
X_test = pd.read_csv('X_test.csv')

# Create sklearn-compatible classifier
classifier = TabularClassifier(
    eval_metric='roc_auc',
    verbosity=1
)

# Use in sklearn pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', classifier)
])

# Cross-validation with sklearn
cv_scores = cross_val_score(
    pipeline,
    X_train,
    y_train,
    cv=5,
    scoring='roc_auc'
)

print(f"Cross-validation AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")

# Fit and predict
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
probabilities = pipeline.predict_proba(X_test)
```

### Regression with GridSearchCV

```python
from autogluon.tabular.experimental import TabularRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load regression data
X_train = pd.read_csv('X_train.csv')
y_train = pd.read_csv('y_train.csv').squeeze()
X_test = pd.read_csv('X_test.csv')
y_test = pd.read_csv('y_test.csv').squeeze()

# Create regressor
regressor = TabularRegressor(verbosity=1)

# Grid search over AutoGluon parameters
param_grid = {
    'eval_metric': ['mean_squared_error', 'mean_absolute_error'],
    'time_limit': [300, 600],
    'presets': ['good_quality', 'best_quality']
}

# Perform grid search
grid_search = GridSearchCV(
    regressor,
    param_grid,
    cv=3,
    scoring='neg_mean_squared_error',
    n_jobs=1  # AutoGluon handles parallelization internally
)

# Fit with grid search
grid_search.fit(X_train, y_train)

# Best model predictions
best_model = grid_search.best_estimator_
predictions = best_model.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Test RMSE: {rmse:.4f}")
print(f"Test R²: {best_model.score(X_test, y_test):.4f}")
```
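
Because every grid point triggers a full AutoGluon training run per fold, it can help to estimate the time budget before launching the search. The sketch below enumerates the grid above with the standard library and sums the time limits. It treats `time_limit` as an approximate per-fit cap and assumes `GridSearchCV` refits the best configuration once at the end (its default behavior), so the result is a rough upper bound rather than a prediction.

```python
from itertools import product

# Same grid and fold count as in the example above.
param_grid = {
    'eval_metric': ['mean_squared_error', 'mean_absolute_error'],
    'time_limit': [300, 600],
    'presets': ['good_quality', 'best_quality'],
}
cv_folds = 3

# Expand the grid into one dict per parameter combination.
keys = list(param_grid)
combinations = [dict(zip(keys, values)) for values in product(*param_grid.values())]

# Each combination is fitted once per fold, plus one final refit of the best one.
n_fits = len(combinations) * cv_folds + 1

# Upper bound: every fit uses its full time_limit, and the refit uses the largest one.
search_seconds = sum(c['time_limit'] for c in combinations) * cv_folds
worst_case_seconds = search_seconds + max(param_grid['time_limit'])

print(f"{len(combinations)} combinations, {n_fits} fits")
print(f"Worst case: about {worst_case_seconds / 3600:.1f} hours")
```

If the estimate is too large, shrinking the grid or lowering the `time_limit` values reduces it proportionally.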

### Integration with Model Selection

```python
from autogluon.tabular.experimental import TabularClassifier, TabularRegressor
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
import pandas as pd

# Prepare data
X = pd.read_csv('features.csv')
y = pd.read_csv('target.csv').squeeze()
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Compare AutoGluon with sklearn models
models = {
    'AutoGluon': TabularClassifier(time_limit=300, verbosity=0),
    'RandomForest': RandomForestClassifier(n_estimators=100, random_state=42),
    'LogisticRegression': LogisticRegression(random_state=42)
}

results = {}
for name, model in models.items():
    # Fit model
    model.fit(X_train, y_train)

    # Predictions
    predictions = model.predict(X_val)

    # Store results
    results[name] = {
        'accuracy': model.score(X_val, y_val),
        'predictions': predictions
    }

    print(f"\n{name} Results:")
    print(f"Accuracy: {results[name]['accuracy']:.4f}")
    print(classification_report(y_val, predictions))
```

### Advanced Usage with Custom Configurations

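```python
from autogluon.tabular.experimental import TabularClassifier
import pandas as pd

# Load data (placeholder file names, as in the pipeline example)
X_train = pd.read_csv('X_train.csv')
y_train = pd.read_csv('y_train.csv').squeeze()
X_test = pd.read_csv('X_test.csv')

# Custom hyperparameters for AutoGluon models.
# Each model key maps to a list of configurations, and every configuration
# is trained as a separate model ('GBM' is AutoGluon's key for LightGBM).
hyperparameters = {
    'GBM': [{'num_leaves': 26}, {'num_leaves': 66}, {'num_leaves': 176}],
    'XGB': [{'n_estimators': 50}, {'n_estimators': 100}, {'n_estimators': 200}],
    'CAT': [{'iterations': 100}, {'iterations': 200}, {'iterations': 500}]
}

# Advanced classifier with custom settings.
# problem_type is not a documented constructor parameter, so it is passed
# through init_args (assumed to be forwarded to TabularPredictor);
# extra fit options such as num_bag_folds go through fit_args.
classifier = TabularClassifier(
    eval_metric='f1_macro',
    time_limit=900,
    presets='best_quality',
    hyperparameters=hyperparameters,
    path='./sklearn_compatible_models/',
    verbosity=2,
    init_args={'problem_type': 'multiclass'},
    fit_args={'num_bag_folds': 5}
)

# Fit with the configuration above
classifier.fit(X_train, y_train)

# Access underlying AutoGluon predictor for advanced functionality.
# The fitted predictor is assumed to be stored as `predictor_`, following
# sklearn's trailing-underscore convention for fitted attributes.
autogluon_predictor = classifier.predictor_
leaderboard = autogluon_predictor.leaderboard(extra_info=True)
print(leaderboard)

# Standard sklearn predictions
predictions = classifier.predict(X_test)
probabilities = classifier.predict_proba(X_test)
```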

## Notes

- **Experimental Status**: These interfaces are experimental and may change in future versions.
- **Feature Compatibility**: Most AutoGluon features remain accessible through the underlying `TabularPredictor`.
- **Performance**: The wrappers delegate training to `TabularPredictor`, so predictive performance matches direct use under the same settings.
- **Integration**: Designed to work with sklearn pipelines, grid search, and cross-validation.
- **Storage**: Trained models are saved to the directory given by `path`, so they persist on disk.
- **Parallelization**: AutoGluon handles internal parallelization; keep `n_jobs=1` in sklearn tools to avoid nested parallelization.
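
Since models are written to the `path` directory, repeated runs that reuse one path can collide. One way to avoid that, sketched here under the assumption that any not-yet-existing directory is acceptable as `path`, is to derive a timestamped directory name per run. `make_model_path` is a hypothetical helper and the temporary base directory is only a stand-in for a real project folder.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_model_path(base_dir, run_name):
    """Return a fresh, timestamped path under base_dir for one training run."""
    base = Path(base_dir)
    # Make sure the parent exists; the run directory itself is left uncreated.
    base.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    return base / f'{run_name}_{stamp}'


base_dir = tempfile.mkdtemp()  # stand-in for a project-level models directory
model_path = make_model_path(base_dir, 'classifier')
print(model_path)

# The result could then be passed to an estimator, for example:
# TabularClassifier(path=str(model_path))
```

Keeping one directory per run also makes it easier to delete old models selectively.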