# Weight Initializers
Comprehensive collection of weight initialization strategies for neural network layers. Proper weight initialization is crucial for training stability and convergence speed. Keras provides various initializers from simple constant values to sophisticated variance-scaling methods based on layer characteristics.

## Capabilities

### Constant Initializers

Initializers that set weights to constant values or specific patterns.

```python { .api }
class Zeros:
    """Initialize weights to zero."""
    def __init__(self): ...

class Ones:
    """Initialize weights to one."""
    def __init__(self): ...

class Constant:
    """Initialize weights to a constant value."""
    def __init__(self, value=0.0): ...

class Identity:
    """Initialize weights as identity matrix (for square matrices)."""
    def __init__(self, gain=1.0): ...

class STFT:
    """Short-Time Fourier Transform initializer."""
    def __init__(self, fft_length=128, window_length=128, window_step=32): ...
```

### Random Initializers

Random initialization strategies with different distributions and scaling approaches.

```python { .api }
class RandomNormal:
    """Initialize weights with normal distribution."""
    def __init__(self, mean=0.0, stddev=0.05, seed=None): ...

class RandomUniform:
    """Initialize weights with uniform distribution."""
    def __init__(self, minval=-0.05, maxval=0.05, seed=None): ...

class TruncatedNormal:
    """Initialize weights with truncated normal distribution."""
    def __init__(self, mean=0.0, stddev=0.05, seed=None): ...

class Orthogonal:
    """Initialize weights as orthogonal matrix."""
    def __init__(self, gain=1.0, seed=None): ...

class VarianceScaling:
    """Initialize weights with variance scaling."""
    def __init__(self, scale=1.0, mode='fan_in', distribution='truncated_normal', seed=None): ...
```
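
The named He, Glorot, and LeCun initializers below are special cases of `VarianceScaling`. As a sketch of the relationship (assuming Keras 3 semantics, where `HeNormal` corresponds to `scale=2.0`, `mode='fan_in'`, `distribution='truncated_normal'`), drawing with the same seed should produce identical samples:

```python
import numpy as np
from keras import initializers

# HeNormal behaves like variance scaling with scale=2.0, fan_in mode,
# and a truncated normal distribution.
he = initializers.HeNormal(seed=42)
vs = initializers.VarianceScaling(scale=2.0, mode='fan_in',
                                  distribution='truncated_normal', seed=42)

a = np.asarray(he(shape=(8, 4)))
b = np.asarray(vs(shape=(8, 4)))
print(np.allclose(a, b))
```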

### Xavier/Glorot Initializers

Xavier (Glorot) initialization methods that scale weights based on input and output dimensions.

```python { .api }
class GlorotUniform:
    """Glorot uniform initializer (Xavier uniform)."""
    def __init__(self, seed=None): ...

class GlorotNormal:
    """Glorot normal initializer (Xavier normal)."""
    def __init__(self, seed=None): ...
```

### He Initializers

He initialization methods optimized for ReLU activations.

```python { .api }
class HeUniform:
    """He uniform initializer."""
    def __init__(self, seed=None): ...

class HeNormal:
    """He normal initializer."""
    def __init__(self, seed=None): ...
```

### LeCun Initializers

LeCun initialization methods for SELU activations.

```python { .api }
class LecunUniform:
    """LeCun uniform initializer."""
    def __init__(self, seed=None): ...

class LecunNormal:
    """LeCun normal initializer."""
    def __init__(self, seed=None): ...
```

### Base Classes and Utilities

Base classes and utility functions for working with initializers.

```python { .api }
class Initializer:
    """Base class for all initializers."""
    def __call__(self, shape, dtype=None, **kwargs): ...
    def get_config(self): ...

def get(identifier):
    """Retrieve an initializer by name or instance."""

def serialize(initializer):
    """Serialize an initializer to configuration."""

def deserialize(config, custom_objects=None):
    """Deserialize an initializer from configuration."""
```
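
A small sketch of these utilities in action: `get` resolves a string identifier to an instance, and `serialize`/`deserialize` round-trip the configuration (class names here assume the Keras 3 API):

```python
from keras import initializers

# Resolve a string identifier to an initializer instance
init = initializers.get('he_normal')
print(type(init).__name__)  # HeNormal

# Round-trip through the serialized configuration
config = initializers.serialize(init)
restored = initializers.deserialize(config)
print(isinstance(restored, type(init)))  # True
```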

## Usage Examples

### Basic Initialization

```python
from keras import layers, initializers

# Using string identifiers
dense_layer = layers.Dense(64, kernel_initializer='he_normal')

# Using initializer classes
dense_layer = layers.Dense(64,
    kernel_initializer=initializers.HeNormal(),
    bias_initializer=initializers.Zeros())

# Custom parameters
dense_layer = layers.Dense(64,
    kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.01),
    bias_initializer=initializers.Constant(value=0.1))
```

### Convolutional Layer Initialization

```python
from keras import layers, initializers

# Convolutional layer with He initialization
conv_layer = layers.Conv2D(32, (3, 3),
    kernel_initializer='he_uniform',
    bias_initializer='zeros')

# With custom variance scaling
conv_layer = layers.Conv2D(32, (3, 3),
    kernel_initializer=initializers.VarianceScaling(
        scale=2.0, mode='fan_out', distribution='uniform'))
```

### RNN Layer Initialization

```python
from keras import layers, initializers

# LSTM with orthogonal recurrent weights
lstm_layer = layers.LSTM(128,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros')

# GRU with custom initialization
gru_layer = layers.GRU(64,
    kernel_initializer=initializers.GlorotNormal(),
    recurrent_initializer=initializers.Orthogonal(gain=1.0))
```

### Custom Initializer

```python
import keras
from keras import layers, initializers

class CustomInitializer(initializers.Initializer):
    def __init__(self, scale=1.0):
        self.scale = scale

    def __call__(self, shape, dtype=None, **kwargs):
        # Custom initialization logic: scaled standard-normal samples
        return keras.random.normal(shape, dtype=dtype) * self.scale

    def get_config(self):
        return {'scale': self.scale}

# Use the custom initializer
dense_layer = layers.Dense(64, kernel_initializer=CustomInitializer(scale=0.5))
```

### Initialization Comparison

```python
import keras
from keras import initializers

# Compare different initializers for the same shape
shape = (100, 50)

# Glorot (Xavier) initialization: stddev ~ sqrt(2 / (fan_in + fan_out))
glorot_weights = initializers.GlorotNormal()(shape)
print(f"Glorot std: {keras.ops.std(glorot_weights):.4f}")

# He initialization: stddev ~ sqrt(2 / fan_in)
he_weights = initializers.HeNormal()(shape)
print(f"He std: {keras.ops.std(he_weights):.4f}")

# LeCun initialization: stddev ~ sqrt(1 / fan_in)
lecun_weights = initializers.LecunNormal()(shape)
print(f"LeCun std: {keras.ops.std(lecun_weights):.4f}")
```
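
The printed standard deviations should land near the theoretical targets. A quick sanity check of the scaling formulas, with `fan_in` and `fan_out` taken from the `(100, 50)` shape (plain NumPy, no Keras required):

```python
import numpy as np

# For a 2D kernel of shape (100, 50): rows are fan_in, columns are fan_out
fan_in, fan_out = 100, 50

glorot_std = np.sqrt(2.0 / (fan_in + fan_out))  # Glorot: ~0.1155
he_std = np.sqrt(2.0 / fan_in)                  # He:     ~0.1414
lecun_std = np.sqrt(1.0 / fan_in)               # LeCun:   0.1000
print(round(glorot_std, 4), round(he_std, 4), round(lecun_std, 4))
```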

### Identity Initialization for Skip Connections

```python
from keras import layers, initializers, models

# Identity initialization for residual connections
inputs = layers.Input(shape=(64,))
x = layers.Dense(64, kernel_initializer='he_normal')(inputs)
x = layers.ReLU()(x)

# Skip connection with identity initialization
skip = layers.Dense(64, kernel_initializer=initializers.Identity(gain=0.1))(inputs)
outputs = layers.Add()([x, skip])

model = models.Model(inputs, outputs)
```

## Initialization Guidelines

### By Activation Function

- **ReLU/Leaky ReLU**: Use `HeNormal` or `HeUniform`
- **SELU**: Use `LecunNormal` or `LecunUniform`
- **Tanh/Sigmoid**: Use `GlorotNormal` or `GlorotUniform`
- **Linear**: Use `GlorotNormal` or custom variance scaling


### By Layer Type

- **Dense layers**: Glorot (balanced) or He (with ReLU)
- **Convolutional layers**: He initialization is commonly used
- **Recurrent layers**: Orthogonal for recurrent weights, Glorot for input weights
- **Batch normalization**: Ones for gamma, Zeros for beta
- **Embeddings**: Random uniform or normal with small variance


### General Principles

- Avoid zero initialization for weights (biases are commonly zero-initialized)
- Choose the initializer to match the activation function
- Use orthogonal initialization for recurrent connections
- Adjust scale based on network depth and width
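
Putting the guidelines together, a minimal sketch (layer sizes are arbitrary and chosen only for illustration):

```python
from keras import layers, models

# He init feeds the ReLU hidden layer, Glorot the linear output, and the
# LSTM pairs Glorot input weights with orthogonal recurrent weights.
model = models.Sequential([
    layers.Input(shape=(None, 16)),
    layers.LSTM(32,
        kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros'),
    layers.Dense(64, activation='relu', kernel_initializer='he_normal'),
    layers.Dense(10, kernel_initializer='glorot_normal'),
])
print(model.output_shape)
```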