# Online Learning

Automated online learning system using Vowpal Wabbit with multiple model management, adaptive resource allocation, and real-time model selection. The online learning module is designed for streaming data scenarios where models need to continuously adapt to new information.

## Capabilities

### AutoVW Class

Main class for automated online learning with Vowpal Wabbit, managing multiple models simultaneously and selecting the best performer dynamically.

```python { .api }
class AutoVW:
    def __init__(self, max_live_model_num, search_space, init_config={},
                 min_resource_lease="auto", automl_runner_args={}, scheduler_args={},
                 model_select_policy="threshold_loss_ucb", metric="mae_clipped",
                 random_seed=None, model_selection_mode="min", cb_coef=None):
        """
        Initialize AutoVW for automated online learning.

        Args:
            max_live_model_num (int): Maximum number of 'live' models to maintain
            search_space (dict): Hyperparameter search space including both tunable
                and fixed hyperparameters
            init_config (dict): Initial partial or full configuration
            min_resource_lease (str or float): Minimum resource lease for models ('auto' or float)
            automl_runner_args (dict): Configuration for OnlineTrialRunner
            scheduler_args (dict): Configuration for scheduler
            model_select_policy (str): Model selection policy ('threshold_loss_ucb', etc.)
            metric (str): Loss function metric ('mae_clipped', 'mae', 'mse', 'absolute_loss')
            random_seed (int): Random seed for reproducibility
            model_selection_mode (str): Optimization mode ('min' or 'max')
            cb_coef (float): Sample complexity bound coefficient
        """

    def predict(self, data_sample):
        """
        Make prediction on a data sample.

        Args:
            data_sample: Input data sample in VW format or structured format

        Returns:
            Prediction value from the selected model
        """

    def learn(self, data_sample):
        """
        Update models with new data sample.

        Args:
            data_sample: Training data sample with features and label
        """
```

### Class Constants

```python { .api }
class AutoVW:
    WARMSTART_NUM = 100  # Number of warmstart samples
    AUTOMATIC = "_auto"  # Automatic configuration identifier
    VW_INTERACTION_ARG_NAME = "interactions"  # VW interactions argument name
```

### Supporting Classes

#### VowpalWabbitTrial

Individual Vowpal Wabbit trial representing a single model configuration.

```python { .api }
class VowpalWabbitTrial:
    """
    Individual VW model trial in online learning system.
    Manages a single VW model instance with specific hyperparameters.
    """
```

#### OnlineTrialRunner

Manages execution and coordination of multiple online learning trials.

```python { .api }
class OnlineTrialRunner:
    """
    Manages execution of online learning trials.
    Coordinates multiple VW models and handles resource allocation.
    """
```

### Utility Functions

```python { .api }
def get_ns_feature_dim_from_vw_example(vw_example):
    """
    Extract namespace feature dimensions from VW example.

    Args:
        vw_example (str): Vowpal Wabbit format example string

    Returns:
        dict: Dictionary mapping namespace to feature dimensions
    """
```
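
To make the namespace-to-dimension mapping concrete, here is a minimal pure-Python sketch of how feature counts per namespace could be derived from a single-line VW example. The function name `ns_feature_dim` is hypothetical, and this is an illustrative reimplementation rather than FLAML's code; the library's own function may support a narrower or different subset of the VW text format.

```python
def ns_feature_dim(vw_example: str) -> dict:
    """Count features per namespace in a single-line VW example (illustrative only)."""
    dims = {}
    # Everything before the first '|' is the label section, so skip it.
    for section in vw_example.split("|")[1:]:
        tokens = section.split()
        if not tokens:
            continue
        if section[0].isspace():
            # A space right after '|' means the namespace is unnamed.
            name, features = "", tokens
        else:
            # Drop an optional namespace weight such as 'a:0.5'.
            name, features = tokens[0].split(":")[0], tokens[1:]
        dims[name] = dims.get(name, 0) + len(features)
    return dims


example = "1 |a x:0.5 y:2 |b z"
print(ns_feature_dim(example))
```

For the example line, namespace `a` holds two features and namespace `b` holds one.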

### Usage Examples

#### Basic Online Learning Setup

```python
from flaml import AutoVW

# Define search space for hyperparameters
search_space = {
    "learning_rate": {"_type": "loguniform", "_value": [0.001, 1.0]},
    "l1": {"_type": "loguniform", "_value": [1e-10, 1.0]},
    "l2": {"_type": "loguniform", "_value": [1e-10, 1.0]},
    "interactions": {"_type": "choice", "_value": [set(), {"ab"}, {"ac"}, {"ab", "ac"}]}
}

# Initialize AutoVW
autovw = AutoVW(
    max_live_model_num=5,
    search_space=search_space,
    init_config={"learning_rate": 0.1},
    metric="mae_clipped",
    random_seed=42
)

# Simulate streaming data
# `streaming_data` is assumed to be an iterable of VW-format example strings
for i, data_sample in enumerate(streaming_data):
    # Make prediction
    prediction = autovw.predict(data_sample)

    # Update models with new sample
    autovw.learn(data_sample)

    if i % 1000 == 0:
        print(f"Processed {i} samples, latest prediction: {prediction}")
```
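
The predict-then-learn order in the loop above means each prediction is made before the model sees that sample's label, so the running error is a fair estimate of online performance. The sketch below tracks that error with a stand-in learner. `RunningMeanLearner`, `label_of`, and `progressive_mae` are hypothetical names used for illustration and are not part of FLAML; any object exposing `predict` and `learn` could presumably take the learner's place.

```python
def label_of(vw_line):
    """Read the numeric label that precedes the first '|' in a VW line."""
    return float(vw_line.split("|", 1)[0].split()[0])


class RunningMeanLearner:
    """Hypothetical stand-in learner: predicts the mean of the labels seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def predict(self, sample):
        return self.total / self.count if self.count else 0.0

    def learn(self, sample):
        self.total += label_of(sample)
        self.count += 1


def progressive_mae(learner, samples):
    """Predict first, then learn, and average the absolute errors."""
    total_error = 0.0
    n = 0
    for sample in samples:
        prediction = learner.predict(sample)  # the label is not used yet
        total_error += abs(prediction - label_of(sample))
        learner.learn(sample)  # the model is updated only after scoring
        n += 1
    return total_error / n if n else 0.0


stream = ["1.0 |a x:1", "3.0 |a x:2", "2.0 |a x:3"]
print(progressive_mae(RunningMeanLearner(), stream))
```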

#### Advanced Configuration with Custom Policies

```python
from flaml import AutoVW

# Advanced search space with multiple hyperparameters
search_space = {
    "learning_rate": {"_type": "loguniform", "_value": [0.0001, 1.0]},
    "power_t": {"_type": "uniform", "_value": [0.0, 1.0]},
    "l1": {"_type": "loguniform", "_value": [1e-10, 1.0]},
    "l2": {"_type": "loguniform", "_value": [1e-10, 1.0]},
    "interactions": {"_type": "choice", "_value": [
        set(), {"ab"}, {"ac"}, {"bc"}, {"ab", "ac"}, {"ab", "bc"}, {"ac", "bc"}
    ]},
    "bit_precision": {"_type": "choice", "_value": [18, 20, 22, 24]}
}

# Custom runner and scheduler arguments
automl_runner_args = {
    "champion_test_policy": "loss_ucb",
    "remove_worse": True
}

scheduler_args = {
    "resource_dimension": "sample_size",
    "max_resource": 10000,
    "reduction_factor": 2
}

# Initialize with advanced configuration
autovw = AutoVW(
    max_live_model_num=10,
    search_space=search_space,
    init_config={"learning_rate": 0.05, "l1": 1e-6},
    min_resource_lease=100,
    automl_runner_args=automl_runner_args,
    scheduler_args=scheduler_args,
    model_select_policy="threshold_loss_ucb",
    metric="mae",  # Mean absolute error
    cb_coef=0.1,  # Confidence bound coefficient
    random_seed=123
)
```

#### Integration with Data Streams

```python
import pandas as pd
from flaml import AutoVW

# Search space for regression task
search_space = {
    "learning_rate": {"_type": "loguniform", "_value": [0.001, 0.5]},
    "l1": {"_type": "loguniform", "_value": [1e-8, 0.1]},
    "l2": {"_type": "loguniform", "_value": [1e-8, 0.1]}
}

autovw = AutoVW(
    max_live_model_num=3,
    search_space=search_space,
    metric="mse",
    model_selection_mode="min"
)

# Process streaming CSV data
def process_csv_stream(csv_file):
    for chunk in pd.read_csv(csv_file, chunksize=1000):
        for _, row in chunk.iterrows():
            # Convert to VW format: label |features feature1:value1 feature2:value2
            vw_sample = f"{row['target']} |features "
            vw_sample += " ".join([f"{col}:{row[col]}" for col in chunk.columns if col != 'target'])

            # Get prediction before updating
            pred = autovw.predict(vw_sample)

            # Update model
            autovw.learn(vw_sample)

            yield pred, row['target']

# Use with streaming data
predictions_and_actuals = list(process_csv_stream("streaming_data.csv"))
```

#### Multi-Class Classification Online Learning

```python
from flaml import AutoVW

# Search space for multi-class classification
search_space = {
    "learning_rate": {"_type": "loguniform", "_value": [0.01, 1.0]},
    "oaa": {"_type": "choice", "_value": [3, 5, 10]},  # One-Against-All classes
    "loss_function": {"_type": "choice", "_value": ["logistic", "hinge"]}
}

# Initialize for classification
autovw_classifier = AutoVW(
    max_live_model_num=4,
    search_space=search_space,
    init_config={"oaa": 3},
    metric="absolute_loss",
    random_seed=456
)

# Example with categorical features
def create_vw_multiclass_sample(features, label):
    """Convert features to VW multi-class format."""
    vw_line = f"{label} |features "

    for key, value in features.items():
        if isinstance(value, str):
            # Categorical feature
            vw_line += f"{key}_{value}:1 "
        else:
            # Numerical feature
            vw_line += f"{key}:{value} "

    return vw_line.strip()

# Process multi-class data
sample_features = {"age": 25, "category": "A", "score": 0.8}
sample_label = 2  # Class label

vw_sample = create_vw_multiclass_sample(sample_features, sample_label)
prediction = autovw_classifier.predict(vw_sample)
autovw_classifier.learn(vw_sample)
```
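
Building VW lines by plain string concatenation, as `create_vw_multiclass_sample` does, can produce malformed examples when a feature name or categorical value contains a space, a colon, or a pipe, since those characters are separators in the VW text format. The sketch below shows one way to guard against that. `sanitize` and `to_vw_line` are hypothetical helper names, and replacing the special characters with underscores is an assumed convention rather than anything FLAML prescribes.

```python
def sanitize(token) -> str:
    """Replace characters that act as separators in VW text format."""
    return "".join("_" if ch in " :|\t\n" else ch for ch in str(token))


def to_vw_line(label, features: dict, namespace: str = "features") -> str:
    """Format a label and a feature dict as a single-namespace VW line."""
    parts = []
    for key, value in features.items():
        if isinstance(value, str):
            # Categorical: encode as an indicator feature; VW uses value 1 when none is given.
            parts.append(f"{sanitize(key)}_{sanitize(value)}")
        else:
            # Numerical: emit name:value.
            parts.append(f"{sanitize(key)}:{float(value)}")
    return f"{label} |{sanitize(namespace)} " + " ".join(parts)


line = to_vw_line(2, {"age": 25, "category": "A b", "score": 0.8})
print(line)
```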

#### Contextual Bandit Learning

```python
from flaml import AutoVW

# Search space for contextual bandits
search_space = {
    "learning_rate": {"_type": "loguniform", "_value": [0.001, 0.1]},
    "cb_explore_adf": {"_type": "choice", "_value": [True]},
    "epsilon": {"_type": "uniform", "_value": [0.01, 0.3]}
}

# Initialize for contextual bandit
autovw_cb = AutoVW(
    max_live_model_num=5,
    search_space=search_space,
    metric="cb_loss",
    model_selection_mode="min"
)

def create_cb_sample(context, action, cost, probability):
    """Create contextual bandit VW format sample."""
    # VW contextual bandit label format: action:cost:probability |context features
    vw_line = f"{action}:{cost}:{probability} |context "
    vw_line += " ".join([f"{k}:{v}" for k, v in context.items()])
    return vw_line

# Example contextual bandit interaction
context = {"user_age": 30, "day_of_week": 2, "weather": 1}
action = 1  # Action taken
cost = 0.5  # Cost observed (lower is better)
probability = 0.2  # Probability of taking this action

cb_sample = create_cb_sample(context, action, cost, probability)
autovw_cb.learn(cb_sample)

# For prediction, provide context without action/cost
prediction_context = "|context user_age:25 day_of_week:3 weather:0"
predicted_action = autovw_cb.predict(prediction_context)
```
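
The logged probability in a contextual bandit sample is what makes off-policy evaluation possible: costs observed under the logging policy can be reweighted to estimate how a different policy would have performed. The sketch below applies inverse propensity scoring to VW-style lines, assuming the `action:cost:probability` label order of the VW text format. `parse_cb_line`, `ips_average_cost`, and `always_one` are hypothetical names, and nothing here reflects how FLAML scores bandit models internally.

```python
def parse_cb_line(vw_line):
    """Split 'action:cost:probability |features' into typed fields."""
    label, features = vw_line.split("|", 1)
    action, cost, probability = label.strip().split(":")
    return int(action), float(cost), float(probability), features.strip()


def ips_average_cost(logged_lines, policy):
    """Inverse propensity estimate of the average cost of `policy`.

    `policy` maps the feature part of a line to the action it would choose.
    """
    if not logged_lines:
        return 0.0
    total = 0.0
    for line in logged_lines:
        action, cost, probability, features = parse_cb_line(line)
        # Only samples where the policy agrees with the logged action contribute.
        if policy(features) == action:
            total += cost / probability
    return total / len(logged_lines)


def always_one(features):
    """Trivial policy that always picks action 1."""
    return 1


logged = [
    "1:0.5:0.5 |context user_age:30",
    "2:1.0:0.5 |context user_age:45",
]
estimate = ips_average_cost(logged, always_one)
print(estimate)
```

The estimate is unbiased only when the logged probabilities are accurate and every action the evaluated policy can take had a nonzero chance of being logged.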

## Model Selection Policies

### Available Policies

- **threshold_loss_ucb**: Threshold-based selection with upper confidence bounds
- **loss_ucb**: Loss-based selection with confidence bounds
- **min_loss**: Select model with minimum observed loss
- **random**: Random model selection (baseline)
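
One way to read the confidence-bound policies is that a model's average loss is only trusted to within a radius that shrinks as it sees more samples, and a challenger replaces the current best model only when its pessimistic estimate still beats the incumbent's optimistic one. The sketch below illustrates that idea. The bound shape `cb_coef * sqrt(log(n) / n)` and all function names are assumptions made for illustration; FLAML's actual bound and decision rule may differ.

```python
import math


def confidence_radius(n_samples: int, cb_coef: float = 0.1) -> float:
    """Assumed bound shape: shrinks roughly as sqrt(log(n) / n)."""
    if n_samples <= 0:
        return float("inf")
    return cb_coef * math.sqrt(math.log(max(n_samples, 2)) / n_samples)


def loss_bounds(avg_loss: float, n_samples: int, cb_coef: float = 0.1):
    """Return (lower, upper) confidence bounds on the average loss."""
    radius = confidence_radius(n_samples, cb_coef)
    return avg_loss - radius, avg_loss + radius


def challenger_beats_champion(champion, challenger, cb_coef: float = 0.1) -> bool:
    """Each argument is an (avg_loss, n_samples) tuple."""
    champion_lcb, _ = loss_bounds(champion[0], champion[1], cb_coef)
    _, challenger_ucb = loss_bounds(challenger[0], challenger[1], cb_coef)
    # Switch only if the challenger's worst case beats the champion's best case.
    return challenger_ucb < champion_lcb


# With few samples the intervals overlap, so the champion is kept.
print(challenger_beats_champion((0.50, 5), (0.45, 5)))
# With many samples the same gap becomes significant.
print(challenger_beats_champion((0.50, 1000), (0.45, 1000)))
```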

### Metrics

- **mae_clipped**: Mean absolute error with clipping
- **mae**: Mean absolute error
- **mse**: Mean squared error
- **absolute_loss**: Absolute loss (for classification)
- **squared_loss**: Squared loss
- **cb_loss**: Contextual bandit loss
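
For the regression metrics, a short worked example may help show how clipping changes the reported error. The sketch below computes the mean absolute error, the mean squared error, and a clipped mean absolute error over `(prediction, label)` pairs. The function names are hypothetical, and clipping predictions to a fixed `[low, high]` label range is an assumption; the library may derive the clipping range differently, for instance from the labels observed so far.

```python
def mae(pairs):
    """Mean absolute error over (prediction, label) pairs."""
    return sum(abs(p - y) for p, y in pairs) / len(pairs)


def mse(pairs):
    """Mean squared error over (prediction, label) pairs."""
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


def mae_clipped(pairs, low, high):
    """MAE after clipping each prediction into the assumed label range [low, high]."""
    clipped = [(min(high, max(low, p)), y) for p, y in pairs]
    return mae(clipped)


pairs = [(1.5, 1.0), (-2.0, 0.0), (3.0, 1.0)]
print(mae(pairs), mse(pairs), mae_clipped(pairs, low=0.0, high=1.0))
```

Clipping removes the penalty for predictions that overshoot the valid label range, which is why the clipped error here is lower than the plain one.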

### Advanced Trial Management

Lower-level components for managing individual Vowpal Wabbit trials and online trial execution.

```python { .api }
class VowpalWabbitTrial:
    """Individual Vowpal Wabbit trial with specific hyperparameters."""

    def __init__(self, config, trial_id=None):
        """
        Initialize VW trial.

        Args:
            config (dict): VW hyperparameter configuration
            trial_id (str): Unique trial identifier
        """

    def train_eval(self, data_sample, eval_only=False):
        """
        Train and/or evaluate on data sample.

        Args:
            data_sample (str): VW-formatted data sample
            eval_only (bool): Only evaluate without training

        Returns:
            dict: Performance metrics
        """

    def predict(self, data_sample):
        """Make prediction on data sample."""

    @property
    def config(self):
        """dict: Trial configuration"""

    @property
    def trial_id(self):
        """str: Trial identifier"""

class OnlineTrialRunner:
    """Manager for running multiple online learning trials."""

    def __init__(self, search_space, max_live_model_num=5, **kwargs):
        """
        Initialize online trial runner.

        Args:
            search_space (dict): Hyperparameter search space
            max_live_model_num (int): Maximum concurrent models
            **kwargs: Additional configuration
        """

    def step(self, data_sample):
        """
        Process one data sample across all active trials.

        Args:
            data_sample (str): VW-formatted data sample

        Returns:
            dict: Aggregated results from all trials
        """

    def get_best_trial(self):
        """Get currently best performing trial."""

    def suggest_trial(self):
        """Suggest new trial configuration."""

    def remove_trial(self, trial_id):
        """Remove trial from active set."""
```
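
The division of labor between a trial and a runner can be illustrated without Vowpal Wabbit. In the toy sketch below, each trial is a one-dimensional linear model trained by stochastic gradient descent at a fixed learning rate, and the runner feeds every sample to all trials and reports the one with the lowest average loss. `ToyTrial` and `ToyRunner` are hypothetical stand-ins that only mirror the method names listed above; they omit scheduling, trial suggestion, and removal entirely.

```python
class ToyTrial:
    """Hypothetical 1-D linear model y ~ w * x trained by SGD at a fixed learning rate."""

    def __init__(self, trial_id, learning_rate):
        self.trial_id = trial_id
        self.learning_rate = learning_rate
        self.weight = 0.0
        self.total_loss = 0.0
        self.n = 0

    def predict(self, x):
        return self.weight * x

    def train_eval(self, x, y):
        """Score the sample first, then take one gradient step on squared error."""
        error = self.predict(x) - y
        self.total_loss += abs(error)
        self.n += 1
        self.weight -= self.learning_rate * error * x
        return abs(error)

    @property
    def avg_loss(self):
        return self.total_loss / self.n if self.n else float("inf")


class ToyRunner:
    """Feeds each sample to every trial and tracks the best one."""

    def __init__(self, learning_rates):
        self.trials = [ToyTrial(f"lr={lr}", lr) for lr in learning_rates]

    def step(self, x, y):
        return {t.trial_id: t.train_eval(x, y) for t in self.trials}

    def get_best_trial(self):
        return min(self.trials, key=lambda t: t.avg_loss)


runner = ToyRunner([0.001, 0.1, 0.5])
for i in range(200):
    x = (i % 10) / 10.0
    runner.step(x, 2.0 * x)  # the true relationship is y = 2x

best = runner.get_best_trial()
print(best.trial_id, round(best.weight, 3))
```

On this noiseless stream the largest learning rate converges fastest and therefore accumulates the least loss; with noisy data a smaller rate could win instead, which is the situation a live multi-model search is meant to handle.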

## Integration Features

- **Vowpal Wabbit Backend**: Leverages VW's efficient online learning algorithms
- **Multi-Model Management**: Maintains multiple models with different hyperparameters
- **Adaptive Selection**: Dynamic model selection based on performance
- **Resource Management**: Intelligent allocation of computational resources
- **Streaming Data Support**: Designed for continuous data streams
- **Multiple Task Support**: Regression, classification, contextual bandits
- **Hyperparameter Optimization**: Automated search over hyperparameter space