# Regularizers

Regularizers apply penalties to layer parameters during training to reduce overfitting by constraining the complexity of the model. They add terms to the loss function that penalize large weights.
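
Concretely, for weights `w` and factor `λ`, L1 adds `λ * Σ|wᵢ|` to the loss and L2 adds `λ * Σ wᵢ²`. A minimal plain-Python sketch of the arithmetic (illustrative only, not the Keras implementation):

```python
# Plain-Python sketch of the penalty terms regularizers add to the loss.
weights = [0.5, -1.5, 2.0]
lam = 0.01  # regularization factor

l1_penalty = lam * sum(abs(w) for w in weights)   # 0.01 * 4.0  = 0.04
l2_penalty = lam * sum(w * w for w in weights)    # 0.01 * 6.5  = 0.065

base_loss = 1.25                                  # e.g. cross-entropy
total_loss = base_loss + l1_penalty + l2_penalty  # what the optimizer minimizes
```

Because the penalties grow with the magnitude of the weights, minimizing the total loss pushes the optimizer toward smaller (L2) or sparser (L1) weight values.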

## Capabilities

### Base Regularizer

The abstract base class for all regularizers, providing the interface for regularization penalties.

```python { .api }
class Regularizer:
    """
    Base class for weight regularizers.

    All regularizers should inherit from this class and implement the __call__ method.
    """

    def __call__(self, x):
        """
        Compute the regularization penalty.

        Parameters:
        - x: Weight tensor to regularize

        Returns:
        Scalar tensor representing the regularization penalty
        """
```

### L1 Regularization

L1 regularization adds a penalty term proportional to the sum of the absolute values of the weights, promoting sparsity.

```python { .api }
class L1(Regularizer):
    """
    L1 regularization penalty.

    Adds a penalty term proportional to the sum of absolute values of weights.
    Promotes sparsity by driving some weights to exactly zero.

    Usage:
    ```python
    layer = Dense(10, kernel_regularizer=L1(0.01))
    # or
    layer = Dense(10, kernel_regularizer='l1')
    ```
    """

    def __init__(self, l1=0.01):
        """
        Initialize the L1 regularizer.

        Parameters:
        - l1: L1 regularization factor (default: 0.01)
        """

def l1(l1=0.01):
    """
    Create an L1 regularizer.

    Parameters:
    - l1: L1 regularization factor (default: 0.01)

    Returns:
    L1 regularizer instance
    """
```
67
68
### L2 Regularization
69
70
L2 regularization adds a penalty term proportional to the sum of squares of the weights, promoting small weights.
71
72
```python { .api }
class L2(Regularizer):
    """
    L2 regularization penalty.

    Adds a penalty term proportional to the sum of squares of weights.
    Promotes small weights and smooth solutions.

    Usage:
    ```python
    layer = Dense(10, kernel_regularizer=L2(0.01))
    # or
    layer = Dense(10, kernel_regularizer='l2')
    ```
    """

    def __init__(self, l2=0.01):
        """
        Initialize the L2 regularizer.

        Parameters:
        - l2: L2 regularization factor (default: 0.01)
        """

def l2(l2=0.01):
    """
    Create an L2 regularizer.

    Parameters:
    - l2: L2 regularization factor (default: 0.01)

    Returns:
    L2 regularizer instance
    """
```
106
107
### Combined L1L2 Regularization
108
109
Combines both L1 and L2 regularization penalties, providing benefits of both sparsity and weight decay.
110
111
```python { .api }
class L1L2(Regularizer):
    """
    Combined L1 and L2 regularization penalty.

    Combines both L1 and L2 penalties, providing both sparsity (L1) and
    weight decay (L2) effects.

    Usage:
    ```python
    layer = Dense(10, kernel_regularizer=L1L2(l1=0.01, l2=0.01))
    # or
    layer = Dense(10, kernel_regularizer='l1_l2')
    ```
    """

    def __init__(self, l1=0.0, l2=0.0):
        """
        Initialize the L1L2 regularizer.

        Parameters:
        - l1: L1 regularization factor (default: 0.0)
        - l2: L2 regularization factor (default: 0.0)
        """

def l1_l2(l1=0.01, l2=0.01):
    """
    Create a combined L1L2 regularizer.

    Parameters:
    - l1: L1 regularization factor (default: 0.01)
    - l2: L2 regularization factor (default: 0.01)

    Returns:
    L1L2 regularizer instance
    """
```
147
148
### Orthogonal Regularization
149
150
Encourages weight matrices to be orthogonal, which can help with gradient flow and representational diversity.
151
152
```python { .api }
class OrthogonalRegularizer(Regularizer):
    """
    Orthogonal regularization penalty.

    Encourages weight matrices to be orthogonal by penalizing the deviation
    from orthogonality. Useful for maintaining diverse representations and
    improving gradient flow.

    Usage:
    ```python
    layer = Dense(10, kernel_regularizer=OrthogonalRegularizer(factor=0.01))
    ```
    """

    def __init__(self, factor=0.01, mode='rows'):
        """
        Initialize the OrthogonalRegularizer.

        Parameters:
        - factor: Regularization strength (default: 0.01)
        - mode: 'rows' or 'columns' - which dimension to orthogonalize (default: 'rows')
        """

def orthogonal_regularizer(factor=0.01, mode='rows'):
    """
    Create an orthogonal regularizer.

    Parameters:
    - factor: Regularization strength (default: 0.01)
    - mode: 'rows' or 'columns' - which dimension to orthogonalize (default: 'rows')

    Returns:
    OrthogonalRegularizer instance
    """
```
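
The underlying idea can be illustrated without Keras: rows of a matrix are mutually orthogonal when their pairwise dot products are zero, so a penalty can be built from the off-diagonal entries of the Gram matrix `W @ W.T`. A simplified plain-Python sketch of this principle (not necessarily the exact Keras formula, which may also normalize the rows):

```python
def gram_offdiag_penalty(rows, factor=0.01):
    """Sum of |dot(row_i, row_j)| over all i != j, scaled by factor.
    Zero exactly when the rows are mutually orthogonal."""
    n = len(rows)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += abs(sum(a * b for a, b in zip(rows[i], rows[j])))
    return factor * total

identity = [[1.0, 0.0], [0.0, 1.0]]  # orthogonal rows -> zero penalty
skewed = [[1.0, 1.0], [1.0, 0.0]]    # correlated rows -> positive penalty
```

Minimizing such a penalty nudges the rows toward mutual orthogonality, which is what keeps the learned features decorrelated.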
187
188
### Utility Functions
189
190
Helper functions for regularizer management and serialization.
191
192
```python { .api }
def serialize(regularizer):
    """
    Serialize a regularizer to a string or config dict.

    Parameters:
    - regularizer: Regularizer to serialize

    Returns:
    String identifier or config dictionary
    """

def deserialize(config, custom_objects=None):
    """
    Deserialize a regularizer from a string or config dict.

    Parameters:
    - config: String identifier or config dictionary
    - custom_objects: Optional dict mapping names to custom objects

    Returns:
    Regularizer instance
    """

def get(identifier):
    """
    Retrieve a regularizer by string identifier.

    Parameters:
    - identifier: String name or regularizer instance

    Returns:
    Regularizer instance
    """
```
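
These helpers round-trip between string identifiers, config dicts, and instances. A short sketch, assuming the standard Keras `regularizers` module:

```python
import keras
from keras import regularizers

reg = regularizers.get('l2')           # string identifier -> L2 instance
config = regularizers.serialize(reg)   # config suitable for saving
restored = regularizers.deserialize(config)  # config -> equivalent instance
```

This is the same machinery model saving uses, which is why custom regularizers need a `get_config` method (and a `custom_objects` entry on reload).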

## Usage Examples

### Basic Regularization

```python
import keras
from keras import regularizers

# Using string identifiers
model = keras.Sequential([
    keras.layers.Dense(64, kernel_regularizer='l2', activation='relu'),
    keras.layers.Dense(32, kernel_regularizer='l1', activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

# Using regularizer classes directly
model = keras.Sequential([
    keras.layers.Dense(64,
                       kernel_regularizer=regularizers.L2(0.01),
                       bias_regularizer=regularizers.L1(0.01),
                       activation='relu'),
    keras.layers.Dense(32,
                       kernel_regularizer=regularizers.L1L2(l1=0.01, l2=0.01),
                       activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
```
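
The penalties registered this way can also be inspected directly: once a layer is built, its `losses` property should expose the pending regularization terms (a sketch assuming Keras 3 behavior):

```python
import keras

layer = keras.layers.Dense(4, kernel_regularizer=keras.regularizers.L2(0.01))
layer.build((None, 3))      # creates the kernel so the penalty can be computed
reg_losses = layer.losses   # list containing the L2 penalty on the kernel
```

Checking `model.losses` like this is a quick way to confirm a regularizer is actually attached before training.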

### Advanced Regularization

```python
import keras
from keras import regularizers

# Orthogonal regularization for maintaining diverse representations
layer = keras.layers.Dense(
    128,
    kernel_regularizer=regularizers.OrthogonalRegularizer(factor=0.01),
    activation='relu'
)

# Different regularizers for different parts of the layer
layer = keras.layers.Dense(
    64,
    kernel_regularizer=regularizers.L2(0.01),    # Weight regularization
    bias_regularizer=regularizers.L1(0.01),      # Bias regularization
    activity_regularizer=regularizers.L1(0.01),  # Output regularization
    activation='relu'
)

# Custom regularization strength
strong_l2 = regularizers.L2(0.1)   # Strong regularization
weak_l1 = regularizers.L1(0.001)   # Weak regularization

model = keras.Sequential([
    keras.layers.Dense(128, kernel_regularizer=strong_l2, activation='relu'),
    keras.layers.Dropout(0.5),  # Combine with dropout for better regularization
    keras.layers.Dense(64, kernel_regularizer=weak_l1, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
```

### Regularization in Different Layer Types

```python
import keras
from keras import regularizers

# Convolutional layers
conv_model = keras.Sequential([
    keras.layers.Conv2D(32, 3,
                        kernel_regularizer=regularizers.L2(0.01),
                        activation='relu'),
    keras.layers.Conv2D(64, 3,
                        kernel_regularizer=regularizers.L1L2(l1=0.01, l2=0.01),
                        activation='relu'),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(10, activation='softmax')
])

# Recurrent layers
rnn_model = keras.Sequential([
    keras.layers.LSTM(64,
                      kernel_regularizer=regularizers.L2(0.01),
                      recurrent_regularizer=regularizers.L1(0.01),
                      return_sequences=True),
    keras.layers.LSTM(32,
                      kernel_regularizer=regularizers.OrthogonalRegularizer(0.01)),
    keras.layers.Dense(10, activation='softmax')
])
```

## Regularization Guidelines

### When to Use Each Type:

- **L1 Regularization**: Use when you want sparse weights (feature selection)
- **L2 Regularization**: Use for general overfitting prevention and smooth solutions
- **L1L2 Regularization**: Use when you want both sparsity and weight decay
- **Orthogonal Regularization**: Use when you want diverse, uncorrelated representations

### Typical Regularization Strengths:

- **Light regularization**: 0.001 - 0.01
- **Moderate regularization**: 0.01 - 0.1
- **Strong regularization**: 0.1 - 1.0

### Best Practices:

1. Start with L2 regularization (0.01) as a baseline
2. Combine with dropout for better regularization
3. Use different strengths for different layers
4. Monitor validation loss to tune regularization strength
5. Apply regularization primarily to dense layers rather than convolutional layers