
# Weight Initializers

Comprehensive collection of weight initialization strategies for neural network layers. Proper weight initialization is crucial for training stability and convergence speed. Keras provides initializers ranging from simple constant values to variance-scaling methods that adapt their spread to the layer's dimensions.

## Capabilities

### Constant Initializers

Initializers that set weights to constant values or specific patterns.

```python { .api }
class Zeros:
    """Initialize weights to zero."""
    def __init__(self): ...

class Ones:
    """Initialize weights to one."""
    def __init__(self): ...

class Constant:
    """Initialize weights to a constant value."""
    def __init__(self, value=0.0): ...

class Identity:
    """Initialize weights as identity matrix (for square matrices)."""
    def __init__(self, gain=1.0): ...

class STFT:
    """Short-Time Fourier Transform initializer."""
    def __init__(self, fft_length=128, window_length=128, window_step=32): ...
```
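
Every initializer is a plain callable: invoking an instance with a shape returns a tensor of that shape, which makes it easy to inspect values outside of a layer. A quick sketch:

```python
from keras import initializers

# Calling an initializer directly returns a tensor of the requested shape
const_init = initializers.Constant(value=0.5)
print(const_init((2, 3)))      # 2x3 tensor filled with 0.5

identity_init = initializers.Identity(gain=2.0)
print(identity_init((3, 3)))   # 3x3 matrix with 2.0 on the diagonal
```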

### Random Initializers

Random initialization strategies with different distributions and scaling approaches.

```python { .api }
class RandomNormal:
    """Initialize weights with normal distribution."""
    def __init__(self, mean=0.0, stddev=0.05, seed=None): ...

class RandomUniform:
    """Initialize weights with uniform distribution."""
    def __init__(self, minval=-0.05, maxval=0.05, seed=None): ...

class TruncatedNormal:
    """Initialize weights with truncated normal distribution."""
    def __init__(self, mean=0.0, stddev=0.05, seed=None): ...

class Orthogonal:
    """Initialize weights as orthogonal matrix."""
    def __init__(self, gain=1.0, seed=None): ...

class VarianceScaling:
    """Initialize weights with variance scaling."""
    def __init__(self, scale=1.0, mode='fan_in', distribution='truncated_normal', seed=None): ...
```
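
The named Glorot, He, and LeCun initializers in the following sections are special cases of `VarianceScaling`. A sketch of two of the equivalences:

```python
from keras import initializers

# GlorotUniform is equivalent to this VarianceScaling configuration...
glorot_like = initializers.VarianceScaling(
    scale=1.0, mode='fan_avg', distribution='uniform')

# ...and HeNormal to this one
he_like = initializers.VarianceScaling(
    scale=2.0, mode='fan_in', distribution='truncated_normal')

print(glorot_like((100, 50)).shape)  # (100, 50)
print(he_like((100, 50)).shape)      # (100, 50)
```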

### Xavier/Glorot Initializers

Xavier (Glorot) initialization methods that scale weights based on input and output dimensions.

```python { .api }

63

class GlorotUniform:

64

"""Glorot uniform initializer (Xavier uniform)."""

65

def __init__(self, seed=None): ...

66

67

class GlorotNormal:

68

"""Glorot normal initializer (Xavier normal)."""

69

def __init__(self, seed=None): ...

70

```
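
Glorot initialization draws weights with variance 2 / (fan_in + fan_out); the uniform variant samples from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). A small check of the normal variant's standard deviation:

```python
import math
import keras
from keras import initializers

fan_in, fan_out = 100, 50
weights = initializers.GlorotNormal(seed=0)((fan_in, fan_out))

expected = math.sqrt(2.0 / (fan_in + fan_out))   # ~0.1155
observed = float(keras.ops.std(weights))
print(f"expected ~{expected:.4f}, observed {observed:.4f}")
```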

### He Initializers

He initialization methods optimized for ReLU activations.

```python { .api }
class HeUniform:
    """He uniform initializer."""
    def __init__(self, seed=None): ...

class HeNormal:
    """He normal initializer."""
    def __init__(self, seed=None): ...
```
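
He initialization uses variance 2 / fan_in, compensating for ReLU zeroing roughly half of its inputs; only the input dimension matters. A quick check:

```python
import math
import keras
from keras import initializers

fan_in = 100
weights = initializers.HeNormal(seed=0)((fan_in, 50))

# He scales by fan_in only: std = sqrt(2 / fan_in) ~ 0.1414
print(f"expected ~{math.sqrt(2.0 / fan_in):.4f}, "
      f"observed {float(keras.ops.std(weights)):.4f}")
```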

### LeCun Initializers

LeCun initialization methods for SELU activations.

```python { .api }
class LecunUniform:
    """LeCun uniform initializer."""
    def __init__(self, seed=None): ...

class LecunNormal:
    """LeCun normal initializer."""
    def __init__(self, seed=None): ...
```

### Base Classes and Utilities

Base classes and utility functions for working with initializers.

```python { .api }
class Initializer:
    """Base class for all initializers."""
    def __call__(self, shape, dtype=None, **kwargs): ...
    def get_config(self): ...

def get(identifier):
    """Retrieve an initializer by name or instance."""

def serialize(initializer):
    """Serialize an initializer to configuration."""

def deserialize(config, custom_objects=None):
    """Deserialize an initializer from configuration."""
```
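
These utilities let initializers round-trip through plain configuration dictionaries, which is how they are stored in saved model configs. A short sketch:

```python
from keras import initializers

# Resolve a string identifier to an initializer instance
init = initializers.get('he_normal')

# Round-trip through a serializable config
config = initializers.serialize(init)
restored = initializers.deserialize(config)
print(type(restored).__name__)  # HeNormal
```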

## Usage Examples

### Basic Initialization

```python
from keras import layers, initializers

# Using string identifiers
dense_layer = layers.Dense(64, kernel_initializer='he_normal')

# Using initializer classes
dense_layer = layers.Dense(64,
                           kernel_initializer=initializers.HeNormal(),
                           bias_initializer=initializers.Zeros())

# Custom parameters
dense_layer = layers.Dense(64,
                           kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.01),
                           bias_initializer=initializers.Constant(value=0.1))
```

### Convolutional Layer Initialization

```python
from keras import layers, initializers

# Convolutional layer with He initialization
conv_layer = layers.Conv2D(32, (3, 3),
                           kernel_initializer='he_uniform',
                           bias_initializer='zeros')

# With custom variance scaling
conv_layer = layers.Conv2D(32, (3, 3),
                           kernel_initializer=initializers.VarianceScaling(
                               scale=2.0, mode='fan_out', distribution='uniform'))
```

### RNN Layer Initialization

```python
from keras import layers, initializers

# LSTM with orthogonal recurrent weights
lstm_layer = layers.LSTM(128,
                         kernel_initializer='glorot_uniform',
                         recurrent_initializer='orthogonal',
                         bias_initializer='zeros')

# GRU with custom initialization
gru_layer = layers.GRU(64,
                       kernel_initializer=initializers.GlorotNormal(),
                       recurrent_initializer=initializers.Orthogonal(gain=1.0))
```

### Custom Initializer

```python
import keras
from keras import initializers, layers

class CustomInitializer(initializers.Initializer):
    def __init__(self, scale=1.0):
        self.scale = scale

    def __call__(self, shape, dtype=None, **kwargs):
        # Custom initialization logic: scaled standard-normal values
        values = keras.random.normal(shape, dtype=dtype) * self.scale
        return values

    def get_config(self):
        return {'scale': self.scale}

# Use custom initializer
dense_layer = layers.Dense(64, kernel_initializer=CustomInitializer(scale=0.5))
```

### Initialization Comparison

```python
import keras
from keras import initializers

# Compare different initializers for the same shape
shape = (100, 50)

# float() converts the scalar tensor so the format spec works on any backend

# Glorot (Xavier) initialization
glorot_weights = initializers.GlorotNormal()(shape)
print(f"Glorot std: {float(keras.ops.std(glorot_weights)):.4f}")

# He initialization
he_weights = initializers.HeNormal()(shape)
print(f"He std: {float(keras.ops.std(he_weights)):.4f}")

# LeCun initialization
lecun_weights = initializers.LecunNormal()(shape)
print(f"LeCun std: {float(keras.ops.std(lecun_weights)):.4f}")
```
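
For the `(100, 50)` shape above, fan_in is 100 and fan_out is 50, so the theoretical standard deviations are sqrt(2/150) ≈ 0.115 for Glorot, sqrt(2/100) ≈ 0.141 for He, and sqrt(1/100) = 0.100 for LeCun; the printed values should land close to these.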

### Identity Initialization for Skip Connections

```python
from keras import layers, initializers, models

# Identity initialization for residual connections
inputs = layers.Input(shape=(64,))
x = layers.Dense(64, kernel_initializer='he_normal')(inputs)
x = layers.ReLU()(x)

# Skip connection with identity initialization
skip = layers.Dense(64, kernel_initializer=initializers.Identity(gain=0.1))(inputs)
outputs = layers.Add()([x, skip])

model = models.Model(inputs, outputs)
```

## Initialization Guidelines

### By Activation Function

- **ReLU/Leaky ReLU**: Use `HeNormal` or `HeUniform`
- **SELU**: Use `LecunNormal` or `LecunUniform`
- **Tanh/Sigmoid**: Use `GlorotNormal` or `GlorotUniform`
- **Linear**: Use `GlorotNormal` or custom variance scaling
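
One way to encode the recommendations above is a simple lookup table; the helper below is a hypothetical convenience for illustration, not part of the Keras API:

```python
from keras import initializers

# Hypothetical mapping from activation name to recommended initializer class
RECOMMENDED_INIT = {
    'relu': initializers.HeNormal,
    'leaky_relu': initializers.HeNormal,
    'selu': initializers.LecunNormal,
    'tanh': initializers.GlorotNormal,
    'sigmoid': initializers.GlorotNormal,
    'linear': initializers.GlorotNormal,
}

def init_for(activation, seed=None):
    """Return a recommended initializer instance for an activation name."""
    return RECOMMENDED_INIT.get(activation, initializers.GlorotNormal)(seed=seed)
```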

### By Layer Type

- **Dense layers**: Glorot (balanced) or He (with ReLU)
- **Convolutional layers**: He initialization is commonly used
- **Recurrent layers**: Orthogonal for recurrent weights, Glorot for input weights
- **Batch normalization**: Ones for gamma, Zeros for beta
- **Embeddings**: Random uniform or normal with small variance
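
A few of these conventions expressed with actual layer arguments (the `BatchNormalization` values shown are also the Keras defaults):

```python
from keras import layers, initializers

# Batch normalization: ones for gamma (scale), zeros for beta (shift)
bn = layers.BatchNormalization(gamma_initializer='ones',
                               beta_initializer='zeros')

# Embeddings: small-variance uniform initialization
emb = layers.Embedding(input_dim=10000, output_dim=128,
                       embeddings_initializer=initializers.RandomUniform(
                           minval=-0.05, maxval=0.05))

# Recurrent layer: orthogonal recurrent kernel, Glorot input kernel
rnn = layers.SimpleRNN(32,
                       kernel_initializer='glorot_uniform',
                       recurrent_initializer='orthogonal')
```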

### General Principles

- Avoid zero initialization for weights; zero is appropriate only for biases
- Consider the activation function when choosing an initializer
- Use orthogonal initialization for recurrent connections
- Adjust scale based on network depth and width (see the sketch below)
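
On the last point, one widely used depth-aware scheme (popularized by GPT-style residual scaling) shrinks the variance of residual-branch weights as the number of blocks grows. A minimal sketch, where `num_blocks` is a hypothetical example value:

```python
from keras import layers, initializers

num_blocks = 12  # hypothetical depth of the residual stack

# Divide the He-style scale by num_blocks so the summed residual
# contributions keep roughly unit variance at initialization
residual_init = initializers.VarianceScaling(
    scale=2.0 / num_blocks, mode='fan_in', distribution='truncated_normal')

residual_dense = layers.Dense(64, kernel_initializer=residual_init)
```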