# Weight Initializers

Weight initializers determine how layer weights are set before training begins. Proper initialization is crucial for effective training and convergence of neural networks.

## Capabilities

### Base Initializer

The abstract base class for all weight initializers, providing the interface for weight initialization.

```python { .api }
class Initializer:
    """
    Base class for weight initializers.

    All initializers should inherit from this class and implement the __call__ method.
    """
    def __call__(self, shape, dtype=None, **kwargs):
        """
        Generate initial weights.

        Parameters:
        - shape: Shape of the weight tensor to initialize
        - dtype: Data type of the weights (default: None)
        - **kwargs: Additional initializer-specific arguments

        Returns:
        Tensor of initialized weights
        """
```
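To make the `__call__` contract concrete, here is a minimal, framework-agnostic sketch of a custom initializer. The `ScaledRandom` class and its `scale` parameter are hypothetical illustrations, not part of any library:

```python
import numpy as np

class Initializer:
    """Minimal stand-in for the base class described above."""
    def __call__(self, shape, dtype=None, **kwargs):
        raise NotImplementedError

class ScaledRandom(Initializer):
    """Hypothetical example: uniform noise scaled by a constant factor."""
    def __init__(self, scale=0.01, seed=None):
        self.scale = scale
        self.rng = np.random.default_rng(seed)

    def __call__(self, shape, dtype=None, **kwargs):
        # Draw uniform noise in [-1, 1) and scale it down.
        weights = self.rng.uniform(-1.0, 1.0, size=shape) * self.scale
        return weights.astype(dtype or "float32")

init = ScaledRandom(scale=0.1, seed=0)
w = init((3, 4))
print(w.shape, w.dtype)  # (3, 4) float32
```

The key points of the interface: the instance is configured once in `__init__`, and `__call__` produces a fresh tensor of the requested shape and dtype each time it is invoked.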
### Constant Initializers

Initializers that set weights to constant values or specific patterns.

```python { .api }
class Zeros(Initializer):
    """
    Initializes weights to zero.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='zeros')
    # or
    layer = Dense(10, kernel_initializer=Zeros())
    ```
    """
    def __init__(self):
        """Initialize the Zeros initializer."""

class Ones(Initializer):
    """
    Initializes weights to one.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='ones')
    # or
    layer = Dense(10, kernel_initializer=Ones())
    ```
    """
    def __init__(self):
        """Initialize the Ones initializer."""

class Constant(Initializer):
    """
    Initializes weights to a constant value.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=Constant(value=0.5))
    ```
    """
    def __init__(self, value=0.0):
        """
        Initialize the Constant initializer.

        Parameters:
        - value: Constant value to initialize weights to (default: 0.0)
        """

class Identity(Initializer):
    """
    Initializes weights to the identity matrix (for square 2D matrices).

    For non-square 2D matrices, ones are placed along the main diagonal.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='identity')
    # or
    layer = Dense(10, kernel_initializer=Identity(gain=1.0))
    ```
    """
    def __init__(self, gain=1.0):
        """
        Initialize the Identity initializer.

        Parameters:
        - gain: Scaling factor for the identity matrix (default: 1.0)
        """
```
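As a rough NumPy sketch (an illustration of the behavior, not the library's implementation), the four constant initializers above reduce to familiar array constructions:

```python
import numpy as np

def zeros(shape):
    return np.zeros(shape, dtype="float32")

def ones(shape):
    return np.ones(shape, dtype="float32")

def constant(shape, value=0.0):
    return np.full(shape, value, dtype="float32")

def identity(shape, gain=1.0):
    # Ones on the main diagonal, scaled by gain; only meaningful for 2D shapes.
    rows, cols = shape
    return gain * np.eye(rows, cols, dtype="float32")

print(constant((2, 2), value=0.5))
print(identity((3, 3), gain=2.0))
```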
### Random Initializers

Initializers that generate random weights from various probability distributions.

```python { .api }
class RandomNormal(Initializer):
    """
    Initializes weights with random values from a normal distribution.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=RandomNormal(mean=0.0, stddev=0.05))
    ```
    """
    def __init__(self, mean=0.0, stddev=0.05, seed=None):
        """
        Initialize the RandomNormal initializer.

        Parameters:
        - mean: Mean of the normal distribution (default: 0.0)
        - stddev: Standard deviation of the normal distribution (default: 0.05)
        - seed: Random seed for reproducibility (default: None)
        """

class RandomUniform(Initializer):
    """
    Initializes weights with random values from a uniform distribution.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=RandomUniform(minval=-0.1, maxval=0.1))
    ```
    """
    def __init__(self, minval=-0.05, maxval=0.05, seed=None):
        """
        Initialize the RandomUniform initializer.

        Parameters:
        - minval: Lower bound of the uniform distribution (default: -0.05)
        - maxval: Upper bound of the uniform distribution (default: 0.05)
        - seed: Random seed for reproducibility (default: None)
        """

class TruncatedNormal(Initializer):
    """
    Initializes weights from a truncated normal distribution.

    Values more than 2 standard deviations from the mean are discarded and redrawn.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=TruncatedNormal(stddev=0.1))
    ```
    """
    def __init__(self, mean=0.0, stddev=0.05, seed=None):
        """
        Initialize the TruncatedNormal initializer.

        Parameters:
        - mean: Mean of the truncated normal distribution (default: 0.0)
        - stddev: Standard deviation before truncation (default: 0.05)
        - seed: Random seed for reproducibility (default: None)
        """
```
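The redraw rule for `TruncatedNormal` can be illustrated with a small NumPy sketch (an illustration of the sampling rule, not the library's code): samples falling more than two standard deviations from the mean are rejected and drawn again until all values lie inside the band.

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=0.05, seed=None):
    rng = np.random.default_rng(seed)
    out = rng.normal(mean, stddev, size=shape)
    # Redraw any sample more than 2 standard deviations from the mean.
    bad = np.abs(out - mean) > 2 * stddev
    while bad.any():
        out[bad] = rng.normal(mean, stddev, size=int(bad.sum()))
        bad = np.abs(out - mean) > 2 * stddev
    return out

w = truncated_normal((1000,), mean=0.0, stddev=0.1, seed=42)
print(w.min(), w.max())  # both within [-0.2, 0.2]
```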
### Variance Scaling Initializers

Initializers that scale the variance based on the number of input and output units.

```python { .api }
class VarianceScaling(Initializer):
    """
    Base class for variance scaling initializers.

    Scales variance based on fan-in, fan-out, or their average.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=VarianceScaling(
        scale=2.0, mode='fan_in', distribution='truncated_normal'
    ))
    ```
    """
    def __init__(self, scale=1.0, mode='fan_in', distribution='truncated_normal', seed=None):
        """
        Initialize the VarianceScaling initializer.

        Parameters:
        - scale: Scaling factor for the variance (default: 1.0)
        - mode: 'fan_in', 'fan_out', or 'fan_avg' (default: 'fan_in')
        - distribution: 'normal', 'uniform', or 'truncated_normal' (default: 'truncated_normal')
        - seed: Random seed for reproducibility (default: None)
        """

class GlorotNormal(VarianceScaling):
    """
    Glorot normal initializer (Xavier normal).

    Draws samples from a truncated normal with stddev = sqrt(2 / (fan_in + fan_out)).

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='glorot_normal')
    # or
    layer = Dense(10, kernel_initializer=GlorotNormal())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the GlorotNormal initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class GlorotUniform(VarianceScaling):
    """
    Glorot uniform initializer (Xavier uniform).

    Draws samples from a uniform distribution within [-limit, limit] where
    limit = sqrt(6 / (fan_in + fan_out)).

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='glorot_uniform')
    # or
    layer = Dense(10, kernel_initializer=GlorotUniform())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the GlorotUniform initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class HeNormal(VarianceScaling):
    """
    He normal initializer (Kaiming normal).

    Draws samples from a truncated normal with stddev = sqrt(2 / fan_in).
    Recommended for ReLU activations.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='he_normal')
    # or
    layer = Dense(10, kernel_initializer=HeNormal())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the HeNormal initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class HeUniform(VarianceScaling):
    """
    He uniform initializer (Kaiming uniform).

    Draws samples from a uniform distribution within [-limit, limit] where
    limit = sqrt(6 / fan_in). Recommended for ReLU activations.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='he_uniform')
    # or
    layer = Dense(10, kernel_initializer=HeUniform())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the HeUniform initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class LecunNormal(VarianceScaling):
    """
    LeCun normal initializer.

    Draws samples from a truncated normal with stddev = sqrt(1 / fan_in).
    Recommended for SELU activations.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='lecun_normal')
    # or
    layer = Dense(10, kernel_initializer=LecunNormal())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the LecunNormal initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class LecunUniform(VarianceScaling):
    """
    LeCun uniform initializer.

    Draws samples from a uniform distribution within [-limit, limit] where
    limit = sqrt(3 / fan_in). Recommended for SELU activations.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='lecun_uniform')
    # or
    layer = Dense(10, kernel_initializer=LecunUniform())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the LecunUniform initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """
```
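The scale formulas in the docstrings above can be checked numerically. A small sketch computing each initializer's stddev or uniform limit from `fan_in` and `fan_out` (formulas taken directly from the docstrings):

```python
import math

fan_in, fan_out = 100, 50

glorot_stddev = math.sqrt(2.0 / (fan_in + fan_out))    # GlorotNormal
glorot_limit  = math.sqrt(6.0 / (fan_in + fan_out))    # GlorotUniform
he_stddev     = math.sqrt(2.0 / fan_in)                # HeNormal
he_limit      = math.sqrt(6.0 / fan_in)                # HeUniform
lecun_stddev  = math.sqrt(1.0 / fan_in)                # LecunNormal
lecun_limit   = math.sqrt(3.0 / fan_in)                # LecunUniform

print(round(glorot_stddev, 4), round(he_stddev, 4), round(lecun_stddev, 4))
# 0.1155 0.1414 0.1
```

Note that He is just LeCun with `scale=2`, and Glorot replaces `fan_in` with the average of `fan_in` and `fan_out`; all three are instances of `VarianceScaling` with different `scale` and `mode` settings.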
### Advanced Initializers

Specialized initializers for specific architectures and use cases.

```python { .api }
class Orthogonal(Initializer):
    """
    Initializes weights with orthogonal matrices.

    Generates random orthogonal matrices via QR decomposition of a random
    Gaussian matrix. Useful for RNNs to avoid vanishing/exploding gradients.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=Orthogonal(gain=1.0))
    ```
    """
    def __init__(self, gain=1.0, seed=None):
        """
        Initialize the Orthogonal initializer.

        Parameters:
        - gain: Scaling factor for the orthogonal matrix (default: 1.0)
        - seed: Random seed for reproducibility (default: None)
        """

class STFT(Initializer):
    """
    Short-time Fourier transform (STFT) initializer for signal processing applications.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=STFT())
    ```
    """
    def __init__(self, **kwargs):
        """
        Initialize the STFT initializer.

        Parameters:
        - **kwargs: Additional STFT-specific parameters
        """
```
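A minimal sketch of orthogonal initialization via QR decomposition of a random Gaussian matrix (an illustration of the idea, not the library's exact algorithm):

```python
import numpy as np

def orthogonal(shape, gain=1.0, seed=None):
    rng = np.random.default_rng(seed)
    rows, cols = shape
    # QR-decompose a random Gaussian matrix; Q has orthonormal columns.
    a = rng.normal(size=(max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    # Sign correction keeps the distribution uniform over orthogonal matrices.
    q *= np.sign(np.diag(r))
    if rows < cols:
        q = q.T
    return gain * q

w = orthogonal((4, 4), seed=0)
print(np.allclose(w.T @ w, np.eye(4)))  # True
```

Because `w.T @ w` is the identity, repeated multiplication by `w` preserves vector norms, which is why this scheme helps recurrent networks avoid vanishing and exploding gradients.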
### Utility Functions

Helper functions for initializer management and serialization.

```python { .api }
def serialize(initializer):
    """
    Serialize an initializer to a string or config dict.

    Parameters:
    - initializer: Initializer to serialize

    Returns:
    String identifier or config dictionary
    """

def deserialize(config, custom_objects=None):
    """
    Deserialize an initializer from a string or config dict.

    Parameters:
    - config: String identifier or config dictionary
    - custom_objects: Optional dict mapping names to custom objects

    Returns:
    Initializer instance
    """

def get(identifier):
    """
    Retrieve an initializer by string identifier.

    Parameters:
    - identifier: String name or initializer instance

    Returns:
    Initializer instance
    """
```
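Conceptually, `get` resolves string identifiers through a name-to-class registry and passes instances through unchanged. A hedged sketch of that dispatch pattern (the registry and the stand-in classes here are illustrative, not the library's internals):

```python
class Zeros:
    """Stand-in initializer class for the sketch."""

class HeNormal:
    """Stand-in initializer class for the sketch."""
    def __init__(self, seed=None):
        self.seed = seed

# Hypothetical registry mapping string identifiers to classes.
_REGISTRY = {"zeros": Zeros, "he_normal": HeNormal}

def get(identifier):
    """Resolve a string name, or pass an initializer instance through unchanged."""
    if isinstance(identifier, str):
        try:
            return _REGISTRY[identifier]()
        except KeyError:
            raise ValueError(f"Unknown initializer: {identifier!r}")
    return identifier  # already an initializer instance

print(type(get("he_normal")).__name__)  # HeNormal
```

This is why layer arguments like `kernel_initializer` accept either a string such as `'he_normal'` or a configured instance such as `HeNormal(seed=42)`.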
## Usage Examples

```python
import keras
from keras import initializers

# Using string identifiers
model = keras.Sequential([
    keras.layers.Dense(64, kernel_initializer='he_normal', activation='relu'),
    keras.layers.Dense(32, kernel_initializer='glorot_uniform', activation='tanh'),
    keras.layers.Dense(10, kernel_initializer='zeros', activation='softmax')
])

# Using initializer classes directly
model = keras.Sequential([
    keras.layers.Dense(64,
                       kernel_initializer=initializers.HeNormal(),
                       bias_initializer=initializers.Zeros(),
                       activation='relu'),
    keras.layers.Dense(32,
                       kernel_initializer=initializers.GlorotUniform(seed=42),
                       activation='tanh'),
    keras.layers.Dense(10,
                       kernel_initializer=initializers.Constant(0.1),
                       activation='softmax')
])

# Custom variance scaling
custom_init = initializers.VarianceScaling(
    scale=2.0,
    mode='fan_out',
    distribution='uniform'
)
layer = keras.layers.Dense(128, kernel_initializer=custom_init)

# For RNNs - orthogonal initialization
rnn_layer = keras.layers.LSTM(
    64,
    kernel_initializer='orthogonal',
    recurrent_initializer='orthogonal'
)
```
## Initialization Guidelines

- **ReLU activations**: Use `he_normal` or `he_uniform`
- **Tanh/Sigmoid activations**: Use `glorot_normal` or `glorot_uniform`
- **SELU activations**: Use `lecun_normal` or `lecun_uniform`
- **RNN layers**: Use `orthogonal` for recurrent weights
- **General purpose**: `glorot_uniform` is a good default choice
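The ReLU guideline above can be verified empirically: in a deep ReLU stack, He-scaled weights keep activation magnitudes roughly constant, while Glorot-scaled weights shrink them by about half per layer. A NumPy sketch of this experiment (illustrative, not library code):

```python
import numpy as np

def deep_relu_std(w_std, width=512, depth=10, seed=0):
    """Std of activations after `depth` ReLU layers with N(0, w_std^2) weights."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(width,))
    for _ in range(depth):
        w = rng.normal(0.0, w_std, size=(width, width))
        x = np.maximum(w @ x, 0.0)  # dense layer followed by ReLU
    return x.std()

width = 512
he_std = deep_relu_std(np.sqrt(2.0 / width))             # He scaling
glorot_std = deep_relu_std(np.sqrt(2.0 / (2 * width)))   # Glorot, fan_in == fan_out
print(he_std > 10 * glorot_std)  # True: Glorot signal decays under deep ReLU
```

ReLU zeroes roughly half the pre-activations, halving the signal's second moment per layer; He's extra factor of 2 in the variance compensates for exactly this, which is why it is the recommended choice for ReLU networks.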