# Weight Initializers
Comprehensive collection of weight initialization strategies for neural network layers. Proper weight initialization is crucial for training stability and convergence speed. Keras provides various initializers from simple constant values to sophisticated variance-scaling methods based on layer characteristics.

## Capabilities

### Constant Initializers

Initializers that set weights to constant values or specific patterns.

```python { .api }
class Zeros:
    """Initialize weights to zero."""
    def __init__(self): ...

class Ones:
    """Initialize weights to one."""
    def __init__(self): ...

class Constant:
    """Initialize weights to a constant value."""
    def __init__(self, value=0.0): ...

class Identity:
    """Initialize weights as identity matrix (for square matrices)."""
    def __init__(self, gain=1.0): ...

class STFT:
    """Short-Time Fourier Transform initializer."""
    def __init__(self, fft_length=128, window_length=128, window_step=32): ...
```

### Random Initializers

Random initialization strategies with different distributions and scaling approaches.

```python { .api }
class RandomNormal:
    """Initialize weights with normal distribution."""
    def __init__(self, mean=0.0, stddev=0.05, seed=None): ...

class RandomUniform:
    """Initialize weights with uniform distribution."""
    def __init__(self, minval=-0.05, maxval=0.05, seed=None): ...

class TruncatedNormal:
    """Initialize weights with truncated normal distribution."""
    def __init__(self, mean=0.0, stddev=0.05, seed=None): ...

class Orthogonal:
    """Initialize weights as orthogonal matrix."""
    def __init__(self, gain=1.0, seed=None): ...

class VarianceScaling:
    """Initialize weights with variance scaling."""
    def __init__(self, scale=1.0, mode='fan_in', distribution='truncated_normal', seed=None): ...
```
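
The named He, Glorot, and LeCun initializers below are special cases of `VarianceScaling`. As a sketch of the relationship (assuming Keras 3 semantics, where `HeNormal` corresponds to `scale=2.0`, `mode='fan_in'`, `distribution='truncated_normal'`), drawing with the same seed should produce identical samples:

```python
import numpy as np
from keras import initializers

# HeNormal behaves like variance scaling with scale=2.0, fan_in mode,
# and a truncated normal distribution.
he = initializers.HeNormal(seed=42)
vs = initializers.VarianceScaling(scale=2.0, mode='fan_in',
                                  distribution='truncated_normal', seed=42)

a = np.asarray(he(shape=(8, 4)))
b = np.asarray(vs(shape=(8, 4)))
print(np.allclose(a, b))
```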

### Xavier/Glorot Initializers

Xavier (Glorot) initialization methods that scale weights based on input and output dimensions.

```python { .api }
class GlorotUniform:
    """Glorot uniform initializer (Xavier uniform)."""
    def __init__(self, seed=None): ...

class GlorotNormal:
    """Glorot normal initializer (Xavier normal)."""
    def __init__(self, seed=None): ...
```

### He Initializers

He initialization methods optimized for ReLU activations.

```python { .api }
class HeUniform:
    """He uniform initializer."""
    def __init__(self, seed=None): ...

class HeNormal:
    """He normal initializer."""
    def __init__(self, seed=None): ...
```

### LeCun Initializers

LeCun initialization methods for SELU activations.

```python { .api }
class LecunUniform:
    """LeCun uniform initializer."""
    def __init__(self, seed=None): ...

class LecunNormal:
    """LeCun normal initializer."""
    def __init__(self, seed=None): ...
```

### Base Classes and Utilities

Base classes and utility functions for working with initializers.

```python { .api }
class Initializer:
    """Base class for all initializers."""
    def __call__(self, shape, dtype=None, **kwargs): ...
    def get_config(self): ...

def get(identifier):
    """Retrieve an initializer by name or instance."""

def serialize(initializer):
    """Serialize an initializer to configuration."""

def deserialize(config, custom_objects=None):
    """Deserialize an initializer from configuration."""
```
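
A small sketch of these utilities in action: `get` resolves a string identifier to an instance, and `serialize`/`deserialize` round-trip the configuration (class names here assume the Keras 3 API):

```python
from keras import initializers

# Resolve a string identifier to an initializer instance
init = initializers.get('he_normal')
print(type(init).__name__)  # HeNormal

# Round-trip through the serialized configuration
config = initializers.serialize(init)
restored = initializers.deserialize(config)
print(isinstance(restored, type(init)))  # True
```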

## Usage Examples

### Basic Initialization

```python
from keras import layers, initializers

# Using string identifiers
dense_layer = layers.Dense(64, kernel_initializer='he_normal')

# Using initializer classes
dense_layer = layers.Dense(64,
    kernel_initializer=initializers.HeNormal(),
    bias_initializer=initializers.Zeros())

# Custom parameters
dense_layer = layers.Dense(64,
    kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.01),
    bias_initializer=initializers.Constant(value=0.1))
```

### Convolutional Layer Initialization

```python
from keras import layers, initializers

# Convolutional layer with He initialization
conv_layer = layers.Conv2D(32, (3, 3),
    kernel_initializer='he_uniform',
    bias_initializer='zeros')

# With custom variance scaling
conv_layer = layers.Conv2D(32, (3, 3),
    kernel_initializer=initializers.VarianceScaling(
        scale=2.0, mode='fan_out', distribution='uniform'))
```

### RNN Layer Initialization

```python
from keras import layers, initializers

# LSTM with orthogonal recurrent weights
lstm_layer = layers.LSTM(128,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros')

# GRU with custom initialization
gru_layer = layers.GRU(64,
    kernel_initializer=initializers.GlorotNormal(),
    recurrent_initializer=initializers.Orthogonal(gain=1.0))
```

### Custom Initializer

```python
import keras
from keras import layers, initializers

class CustomInitializer(initializers.Initializer):
    def __init__(self, scale=1.0):
        self.scale = scale

    def __call__(self, shape, dtype=None, **kwargs):
        # Custom initialization logic: scaled standard-normal samples
        return keras.random.normal(shape, dtype=dtype) * self.scale

    def get_config(self):
        return {'scale': self.scale}

# Use the custom initializer
dense_layer = layers.Dense(64, kernel_initializer=CustomInitializer(scale=0.5))
```

### Initialization Comparison

```python
import keras
from keras import initializers

# Compare different initializers for the same shape
shape = (100, 50)

# Glorot (Xavier) initialization: stddev ~ sqrt(2 / (fan_in + fan_out))
glorot_weights = initializers.GlorotNormal()(shape)
print(f"Glorot std: {keras.ops.std(glorot_weights):.4f}")

# He initialization: stddev ~ sqrt(2 / fan_in)
he_weights = initializers.HeNormal()(shape)
print(f"He std: {keras.ops.std(he_weights):.4f}")

# LeCun initialization: stddev ~ sqrt(1 / fan_in)
lecun_weights = initializers.LecunNormal()(shape)
print(f"LeCun std: {keras.ops.std(lecun_weights):.4f}")
```
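
The printed standard deviations should land near the theoretical targets. A quick sanity check of the scaling formulas, with `fan_in` and `fan_out` taken from the `(100, 50)` shape (plain NumPy, no Keras required):

```python
import numpy as np

# For a 2D kernel of shape (100, 50): rows are fan_in, columns are fan_out
fan_in, fan_out = 100, 50

glorot_std = np.sqrt(2.0 / (fan_in + fan_out))  # Glorot: ~0.1155
he_std = np.sqrt(2.0 / fan_in)                  # He:     ~0.1414
lecun_std = np.sqrt(1.0 / fan_in)               # LeCun:   0.1000
print(round(glorot_std, 4), round(he_std, 4), round(lecun_std, 4))
```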

### Identity Initialization for Skip Connections

```python
from keras import layers, initializers, models

# Identity initialization for residual connections
inputs = layers.Input(shape=(64,))
x = layers.Dense(64, kernel_initializer='he_normal')(inputs)
x = layers.ReLU()(x)

# Skip connection with identity initialization
skip = layers.Dense(64, kernel_initializer=initializers.Identity(gain=0.1))(inputs)
outputs = layers.Add()([x, skip])

model = models.Model(inputs, outputs)
```

## Initialization Guidelines

### By Activation Function

- **ReLU/Leaky ReLU**: Use `HeNormal` or `HeUniform`
- **SELU**: Use `LecunNormal` or `LecunUniform`
- **Tanh/Sigmoid**: Use `GlorotNormal` or `GlorotUniform`
- **Linear**: Use `GlorotNormal` or custom variance scaling


### By Layer Type

- **Dense layers**: Glorot (balanced) or He (with ReLU)
- **Convolutional layers**: He initialization is commonly used
- **Recurrent layers**: Orthogonal for recurrent weights, Glorot for input weights
- **Batch normalization**: Ones for gamma, Zeros for beta
- **Embeddings**: Random uniform or normal with small variance


### General Principles

- Avoid zero initialization for weights (biases are commonly zero-initialized)
- Choose the initializer to match the activation function
- Use orthogonal initialization for recurrent connections
- Adjust scale based on network depth and width
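
Putting the guidelines together, a minimal sketch (layer sizes are arbitrary and chosen only for illustration):

```python
from keras import layers, models

# He init feeds the ReLU hidden layer, Glorot the linear output, and the
# LSTM pairs Glorot input weights with orthogonal recurrent weights.
model = models.Sequential([
    layers.Input(shape=(None, 16)),
    layers.LSTM(32,
        kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros'),
    layers.Dense(64, activation='relu', kernel_initializer='he_normal'),
    layers.Dense(10, kernel_initializer='glorot_normal'),
])
print(model.output_shape)
```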