# Deep Learning Integration

Utilities for handling imbalanced datasets in deep learning frameworks, providing balanced batch generators for Keras and TensorFlow that ensure fair representation of all classes during training.

## Overview

Imbalanced-learn provides specialized batch generators for deep learning frameworks that address class imbalance by creating balanced batches during training. These tools integrate seamlessly with Keras and TensorFlow workflows while maintaining the benefits of sampling techniques. A quick-start sketch follows the lists below.

### Key Features
- **Balanced batch generation**: Ensures each batch contains balanced class representation
- **Framework compatibility**: Native support for Keras and TensorFlow
- **Sampling integration**: Uses imblearn samplers for batch balancing
- **Memory efficiency**: Generates balanced batches on demand without duplicating the entire dataset
- **Sparse data support**: Handles both dense and sparse input matrices

### Supported Frameworks
- **Keras**: Via `BalancedBatchGenerator` class and `balanced_batch_generator` function
- **TensorFlow**: Via `balanced_batch_generator` function

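As a quick start, here is a minimal sketch of the two entry points side by side. It is illustrative only; it assumes a toy imbalanced dataset from scikit-learn's `make_classification` and relies on the default sampler (a `RandomUnderSampler`, described below):

```python
from sklearn.datasets import make_classification

from imblearn.keras import BalancedBatchGenerator
from imblearn.tensorflow import balanced_batch_generator

# Toy imbalanced dataset: roughly 90% majority class, 10% minority class
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Keras: a Sequence object usable directly with model.fit()
keras_generator = BalancedBatchGenerator(X, y, batch_size=32, random_state=42)

# TensorFlow: a plain generator plus the number of steps per epoch
tf_generator, steps_per_epoch = balanced_batch_generator(
    X, y, batch_size=32, random_state=42
)
```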
## Keras Integration

### BalancedBatchGenerator

```python { .api }
class BalancedBatchGenerator:
    def __init__(
        self,
        X,
        y,
        *,
        sample_weight=None,
        sampler=None,
        batch_size=32,
        keep_sparse=False,
        random_state=None
    ): ...

    def __len__(self): ...

    def __getitem__(self, index): ...
```
Create balanced batches when training a Keras model, using the Keras `Sequence` API.

**Parameters:**
- **X** (`ndarray` of shape `(n_samples, n_features)`): Original imbalanced dataset
- **y** (`ndarray` of shape `(n_samples,)` or `(n_samples, n_classes)`): Associated targets
- **sample_weight** (`ndarray` of shape `(n_samples,)`, default=`None`): Sample weights
- **sampler** (`sampler object`, default=`None`): A sampler instance which has an attribute `sample_indices_`. By default, a `RandomUnderSampler` is used
- **batch_size** (`int`, default=`32`): Number of samples per gradient update
- **keep_sparse** (`bool`, default=`False`): Whether or not to conserve the sparsity of the input. By default, the returned batches are dense
- **random_state** (`int`, `RandomState` instance or `None`, default=`None`): Controls the randomization of the algorithm

**Attributes:**
- **sampler_** (`sampler object`): The sampler used to balance the dataset
- **indices_** (`ndarray` of shape `(n_selected_samples,)`): The indices of the samples selected during sampling

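Both attributes can be inspected after construction. A minimal sketch, assuming the default `RandomUnderSampler` and reusing `X`, `y` from the quick-start example above:

```python
import numpy as np

generator = BalancedBatchGenerator(X, y, batch_size=32, random_state=42)

print(generator.sampler_)                  # RandomUnderSampler(...)
print(generator.indices_[:10])             # indices of the retained samples
print(np.bincount(y[generator.indices_]))  # classes should be balanced
```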
**Methods:**
##### __len__

```python
def __len__(self) -> int
```

Returns the number of batches per epoch.

##### __getitem__

```python
def __getitem__(self, index) -> tuple[ndarray, ndarray] | tuple[ndarray, ndarray, ndarray]
```

Generate one batch of data.

**Parameters:**
- **index** (`int`): Batch index

**Returns:**
- **batch** (`tuple`): Either `(X_batch, y_batch)` or `(X_batch, y_batch, sample_weight_batch)` if sample weights are provided

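A short sketch of how the two methods behave, assuming the `generator` built in the sketch above:

```python
# Number of batches per epoch; with the default under-sampler this is
# approximately n_resampled_samples // batch_size
n_batches = len(generator)

# Fetch one balanced batch by index
X_batch, y_batch = generator[0]
print(X_batch.shape, y_batch.shape)  # e.g. (32, 20) (32,)
```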
**Usage with Keras:**
The class implements the Keras `Sequence` interface for use with `model.fit()`:

```python
from imblearn.keras import BalancedBatchGenerator
from imblearn.under_sampling import NearMiss
import tensorflow.keras as keras

# Create balanced batch generator
training_generator = BalancedBatchGenerator(
    X, y,
    sampler=NearMiss(),
    batch_size=32,
    random_state=42
)

# Use with Keras model
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(X.shape[1],)),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(training_generator, epochs=10)
```
### balanced_batch_generator (Keras)

```python { .api }
def balanced_batch_generator(
    X,
    y,
    *,
    sample_weight=None,
    sampler=None,
    batch_size=32,
    keep_sparse=False,
    random_state=None
) -> tuple[Generator, int]
```
Create a balanced batch generator to train a Keras model.

**Parameters:**
- **X** (`ndarray` of shape `(n_samples, n_features)`): Original imbalanced dataset
- **y** (`ndarray` of shape `(n_samples,)` or `(n_samples, n_classes)`): Associated targets
- **sample_weight** (`ndarray` of shape `(n_samples,)`, default=`None`): Sample weights
- **sampler** (`sampler object`, default=`None`): A sampler instance which has an attribute `sample_indices_`. By default, a `RandomUnderSampler` is used
- **batch_size** (`int`, default=`32`): Number of samples per gradient update
- **keep_sparse** (`bool`, default=`False`): Whether or not to conserve the sparsity of the input. By default, the returned batches are dense
- **random_state** (`int`, `RandomState` instance or `None`, default=`None`): Controls the randomization of the algorithm

**Returns:**
- **generator** (`generator` of `tuple`): Generates batches of data. The tuples generated are either `(X_batch, y_batch)` or `(X_batch, y_batch, sample_weight_batch)`
- **steps_per_epoch** (`int`): The number of batches per epoch. Required by `fit_generator` in older versions of Keras

**Usage Example:**
```python
from imblearn.keras import balanced_batch_generator
from imblearn.under_sampling import EditedNearestNeighbours

training_generator, steps_per_epoch = balanced_batch_generator(
    X, y,
    sampler=EditedNearestNeighbours(),
    batch_size=64,
    random_state=42
)

# Use with older Keras API
history = model.fit_generator(
    training_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=20
)
```
## TensorFlow Integration

### balanced_batch_generator (TensorFlow)

```python { .api }
def balanced_batch_generator(
    X,
    y,
    *,
    sample_weight=None,
    sampler=None,
    batch_size=32,
    keep_sparse=False,
    random_state=None
) -> tuple[Generator, int]
```
Create a balanced batch generator to train a TensorFlow model.

**Parameters:**
- **X** (`ndarray` of shape `(n_samples, n_features)`): Original imbalanced dataset
- **y** (`ndarray` of shape `(n_samples,)` or `(n_samples, n_classes)`): Associated targets
- **sample_weight** (`ndarray` of shape `(n_samples,)`, default=`None`): Sample weights
- **sampler** (`sampler object`, default=`None`): A sampler instance which has an attribute `sample_indices_`. By default, a `RandomUnderSampler` is used
- **batch_size** (`int`, default=`32`): Number of samples per gradient update
- **keep_sparse** (`bool`, default=`False`): Whether or not to conserve the sparsity of the input `X`. By default, the returned batches are dense
- **random_state** (`int`, `RandomState` instance or `None`, default=`None`): Controls the randomization of the algorithm

**Returns:**
- **generator** (`generator` of `tuple`): Generates batches of data. The tuples generated are either `(X_batch, y_batch)` or `(X_batch, y_batch, sample_weight_batch)`
- **steps_per_epoch** (`int`): The number of batches per epoch

**Generator Function:**
The returned generator loops through balanced batches indefinitely (see the sketch after this list). It:
1. Applies the sampler to balance the dataset
2. Shuffles the resampled indices
3. Creates batches of the specified size
4. Yields batches cyclically for training

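A minimal sketch of this logic, written against plain NumPy arrays; it illustrates the four steps above and is not the library's actual implementation:

```python
from sklearn.utils import check_random_state
from imblearn.under_sampling import RandomUnderSampler

def make_balanced_generator(X, y, batch_size=32, random_state=None):
    rng = check_random_state(random_state)

    # 1. Apply the sampler to balance the dataset and keep its indices
    sampler = RandomUnderSampler(random_state=rng)
    sampler.fit_resample(X, y)
    indices = sampler.sample_indices_.copy()

    def generator():
        while True:  # 4. yield batches cyclically
            rng.shuffle(indices)  # 2. shuffle the resampled indices
            # 3. create batches of the specified size
            for start in range(0, len(indices) - batch_size + 1, batch_size):
                batch = indices[start:start + batch_size]
                yield X[batch], y[batch]

    steps_per_epoch = int(len(indices) // batch_size)
    return generator(), steps_per_epoch
```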
**Usage with TensorFlow:**
```python
from imblearn.tensorflow import balanced_batch_generator
from imblearn.over_sampling import RandomOverSampler
import tensorflow as tf

# Create generator (RandomOverSampler exposes the required sample_indices_;
# synthetic samplers such as SMOTE do not)
training_generator, steps_per_epoch = balanced_batch_generator(
    X, y,
    sampler=RandomOverSampler(random_state=42),
    batch_size=128,
    random_state=42
)

# Use with tf.keras
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(3, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

history = model.fit(
    training_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=50,
    validation_data=(X_val, y_val)
)
```
## Sampler Integration

### Compatible Samplers

All imblearn samplers that expose the `sample_indices_` attribute can be used. A programmatic compatibility check is sketched after the examples below.

**Over-sampling Methods:**
Among the over-samplers, only `RandomOverSampler` exposes `sample_indices_`; samplers that create synthetic samples (`SMOTE`, `ADASYN`, `BorderlineSMOTE`) do not, and therefore cannot be used with the batch generators:

```python
from imblearn.over_sampling import RandomOverSampler
from imblearn.keras import BalancedBatchGenerator

# Using random over-sampling
generator = BalancedBatchGenerator(X, y, sampler=RandomOverSampler())
```
**Under-sampling Methods:**
```python
from imblearn.under_sampling import RandomUnderSampler, TomekLinks, EditedNearestNeighbours

# Using random under-sampling
generator = BalancedBatchGenerator(X, y, sampler=RandomUnderSampler())

# Using Tomek links cleaning
generator = BalancedBatchGenerator(X, y, sampler=TomekLinks())
```
**Combination Methods:**
The combination samplers (`SMOTEENN`, `SMOTETomek`) rely on SMOTE internally and do not expose `sample_indices_`, so they cannot be passed to the batch generators. Resample the dataset up front instead:

```python
from imblearn.combine import SMOTEENN

# Resample once, then train on the balanced data directly
X_resampled, y_resampled = SMOTEENN().fit_resample(X, y)
```
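Compatibility can be checked programmatically by fitting a candidate sampler and testing for the attribute. A minimal sketch, reusing `X`, `y` from the earlier examples:

```python
from imblearn.over_sampling import RandomOverSampler, SMOTE

def supports_batch_generation(sampler, X, y):
    """Return True if the sampler exposes `sample_indices_` after fitting."""
    sampler.fit_resample(X, y)
    return hasattr(sampler, "sample_indices_")

print(supports_batch_generation(RandomOverSampler(), X, y))  # True
print(supports_batch_generation(SMOTE(), X, y))              # False
```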
## Advanced Usage Patterns

### Multi-Class Classification

```python
from sklearn.datasets import make_classification
from imblearn.keras import BalancedBatchGenerator
from imblearn.over_sampling import RandomOverSampler
import tensorflow.keras as keras

# Create multi-class imbalanced dataset
X, y = make_classification(
    n_classes=3,
    n_informative=5,
    weights=[0.7, 0.2, 0.1],
    n_samples=2000,
    random_state=42
)

# Convert to categorical
y_cat = keras.utils.to_categorical(y, 3)

# Create balanced generator (RandomOverSampler provides sample_indices_)
generator = BalancedBatchGenerator(
    X, y_cat,
    sampler=RandomOverSampler(random_state=42),
    batch_size=64,
    random_state=42
)

# Multi-class model
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(X.shape[1],)),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(3, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy', 'categorical_accuracy']
)

history = model.fit(generator, epochs=100, verbose=1)
```
### Sparse Data Handling

```python
from scipy import sparse
from imblearn.tensorflow import balanced_batch_generator

# Convert to sparse matrix
X_sparse = sparse.csr_matrix(X)

# Keep data sparse during batch generation
generator, steps = balanced_batch_generator(
    X_sparse, y,
    keep_sparse=True,
    batch_size=32
)

# Use with a TensorFlow model that handles sparse input;
# the generator loops indefinitely, so bound the iteration by steps
for _, (batch_X, batch_y) in zip(range(steps), generator):
    if sparse.issparse(batch_X):
        batch_X = batch_X.toarray()  # Convert if needed
    # Train with batch
```
### Sample Weight Integration

```python
from sklearn.utils.class_weight import compute_sample_weight
from imblearn.keras import BalancedBatchGenerator
from imblearn.over_sampling import RandomOverSampler

# Compute sample weights
sample_weights = compute_sample_weight('balanced', y)

# Use with generator
generator = BalancedBatchGenerator(
    X, y,
    sample_weight=sample_weights,
    sampler=RandomOverSampler(),
    batch_size=32
)

# Each batch will include sample weights
for i in range(len(generator)):
    X_batch, y_batch, weights_batch = generator[i]
    # Use weights in training
```
## Framework Comparison

### Keras vs TensorFlow Generators

| Feature | Keras `BalancedBatchGenerator` | TensorFlow `balanced_batch_generator` |
|---------|--------------------------------|---------------------------------------|
| **API** | Keras `Sequence` interface | Plain generator function |
| **Integration** | `model.fit(generator)` | `model.fit(generator, steps_per_epoch=steps)` |
| **Memory** | Indexed batch access via the `Sequence` protocol | Manual iteration control |
| **Features** | Full Keras integration | More flexible, lower-level |
## Best Practices

1. **Choose an appropriate sampler**: Match the sampler to your problem characteristics, and make sure it exposes `sample_indices_`
2. **Batch size considerations**: Balance memory usage with training stability
3. **Reproducibility**: Always set `random_state` for consistent results
4. **Validation strategy**: Use separate validation data; don't apply sampling to the validation set
5. **Monitor class distribution**: Verify that balanced batches are being generated, as in the sketch below

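A minimal check along these lines, assuming a `BalancedBatchGenerator` instance named `generator` as in the earlier examples:

```python
import numpy as np

# Inspect the class distribution of the first few batches
for i in range(3):
    X_batch, y_batch = generator[i][:2]  # drop sample weights if present
    labels = y_batch.argmax(axis=1) if y_batch.ndim == 2 else y_batch
    print(f"batch {i}:", np.bincount(labels.astype(int)))
```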
**Complete Training Example:**
```python
from imblearn.keras import BalancedBatchGenerator
from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import train_test_split
import tensorflow.keras as keras

# Split data
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Create balanced training generator
train_generator = BalancedBatchGenerator(
    X_train, y_train,
    sampler=RandomOverSampler(random_state=42),
    batch_size=64,
    random_state=42
)

# Build model
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(X_train.shape[1],)),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1, activation='sigmoid')
])

# Compile with class-aware metrics
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy', 'precision', 'recall']
)

# Train with early stopping
callbacks = [
    keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5)
]

history = model.fit(
    train_generator,
    validation_data=(X_val, y_val),
    epochs=100,
    callbacks=callbacks,
    verbose=1
)
```