
# Weight Initializers

Weight initializers determine how layer weights are initialized before training. Proper initialization is crucial for effective training and convergence of neural networks.

## Capabilities

### Base Initializer

The abstract base class for all weight initializers, providing the interface for weight initialization.

```python { .api }
class Initializer:
    """
    Base class for weight initializers.

    All initializers should inherit from this class and implement the __call__ method.
    """
    def __call__(self, shape, dtype=None, **kwargs):
        """
        Generate initial weights.

        Parameters:
        - shape: Shape of the weight tensor to initialize
        - dtype: Data type of the weights (default: None)
        - **kwargs: Additional initializer-specific arguments

        Returns:
            Tensor of initialized weights
        """
```
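A custom initializer only needs to subclass the base class and implement `__call__`. The sketch below is a minimal, framework-free illustration of that interface; it uses NumPy in place of a backend tensor library, and both the stand-in `Initializer` base and the `ScaledNormal` class are hypothetical names, not part of the documented API:

```python
import numpy as np

class Initializer:
    """Stand-in base class mirroring the documented interface."""
    def __call__(self, shape, dtype=None, **kwargs):
        raise NotImplementedError

class ScaledNormal(Initializer):
    """Hypothetical custom initializer: normal samples scaled by 1/sqrt(fan_in)."""
    def __init__(self, seed=None):
        self.rng = np.random.default_rng(seed)

    def __call__(self, shape, dtype=None, **kwargs):
        fan_in = shape[0]                      # treat rows as input units
        stddev = 1.0 / np.sqrt(fan_in)
        return self.rng.normal(0.0, stddev, size=shape).astype(dtype or "float32")

weights = ScaledNormal(seed=0)((256, 64))
print(weights.shape, weights.dtype)  # (256, 64) float32
```

Because the class satisfies the `__call__(shape, dtype, **kwargs)` contract, an instance can be passed anywhere a built-in initializer object is accepted.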

### Constant Initializers

Initializers that set weights to constant values or specific patterns.

```python { .api }
class Zeros(Initializer):
    """
    Initializes weights to zero.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='zeros')
    # or
    layer = Dense(10, kernel_initializer=Zeros())
    ```
    """
    def __init__(self):
        """Initialize the Zeros initializer."""

class Ones(Initializer):
    """
    Initializes weights to one.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='ones')
    # or
    layer = Dense(10, kernel_initializer=Ones())
    ```
    """
    def __init__(self):
        """Initialize the Ones initializer."""

class Constant(Initializer):
    """
    Initializes weights to a constant value.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=Constant(value=0.5))
    ```
    """
    def __init__(self, value=0.0):
        """
        Initialize the Constant initializer.

        Parameters:
        - value: Constant value to initialize weights to (default: 0.0)
        """

class Identity(Initializer):
    """
    Initializes weights to the identity matrix (for square matrices).

    For non-square matrices, initializes with identity matrix in the center.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='identity')
    # or
    layer = Dense(10, kernel_initializer=Identity(gain=1.0))
    ```
    """
    def __init__(self, gain=1.0):
        """
        Initialize the Identity initializer.

        Parameters:
        - gain: Scaling factor for the identity matrix (default: 1.0)
        """
```

### Random Initializers

Initializers that generate random weights from various probability distributions.

```python { .api }
class RandomNormal(Initializer):
    """
    Initializes weights with random values from a normal distribution.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=RandomNormal(mean=0.0, stddev=0.05))
    ```
    """
    def __init__(self, mean=0.0, stddev=0.05, seed=None):
        """
        Initialize the RandomNormal initializer.

        Parameters:
        - mean: Mean of the normal distribution (default: 0.0)
        - stddev: Standard deviation of the normal distribution (default: 0.05)
        - seed: Random seed for reproducibility (default: None)
        """

class RandomUniform(Initializer):
    """
    Initializes weights with random values from a uniform distribution.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=RandomUniform(minval=-0.1, maxval=0.1))
    ```
    """
    def __init__(self, minval=-0.05, maxval=0.05, seed=None):
        """
        Initialize the RandomUniform initializer.

        Parameters:
        - minval: Lower bound of the uniform distribution (default: -0.05)
        - maxval: Upper bound of the uniform distribution (default: 0.05)
        - seed: Random seed for reproducibility (default: None)
        """

class TruncatedNormal(Initializer):
    """
    Initializes weights with a truncated normal distribution.

    Values more than 2 standard deviations from the mean are discarded and redrawn.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=TruncatedNormal(stddev=0.1))
    ```
    """
    def __init__(self, mean=0.0, stddev=0.05, seed=None):
        """
        Initialize the TruncatedNormal initializer.

        Parameters:
        - mean: Mean of the truncated normal distribution (default: 0.0)
        - stddev: Standard deviation before truncation (default: 0.05)
        - seed: Random seed for reproducibility (default: None)
        """
```
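The truncation rule described for `TruncatedNormal` can be sketched in a few lines of NumPy: draw normal samples, then redraw any that fall more than two standard deviations from the mean. This is an illustration of the documented behaviour, not the library's implementation:

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=0.05, seed=None):
    """Normal samples with values beyond 2*stddev from the mean redrawn."""
    rng = np.random.default_rng(seed)
    out = rng.normal(mean, stddev, size=shape)
    bad = np.abs(out - mean) > 2 * stddev
    while bad.any():                           # redraw out-of-range samples
        out[bad] = rng.normal(mean, stddev, size=int(bad.sum()))
        bad = np.abs(out - mean) > 2 * stddev
    return out

w = truncated_normal((256, 256), stddev=0.05, seed=0)
print(np.abs(w).max() <= 0.1)  # True: nothing lies beyond 2 stddev
```

The redraw keeps the distribution bell-shaped while bounding the largest initial weight, which is why truncated normal is the default distribution for the variance scaling family below.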

### Variance Scaling Initializers

Initializers that scale the variance based on the number of input and output units.

```python { .api }
class VarianceScaling(Initializer):
    """
    Base class for variance scaling initializers.

    Scales variance based on fan-in, fan-out, or their average.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=VarianceScaling(
        scale=2.0, mode='fan_in', distribution='truncated_normal'
    ))
    ```
    """
    def __init__(self, scale=1.0, mode='fan_in', distribution='truncated_normal', seed=None):
        """
        Initialize the VarianceScaling initializer.

        Parameters:
        - scale: Scaling factor for the variance (default: 1.0)
        - mode: 'fan_in', 'fan_out', or 'fan_avg' (default: 'fan_in')
        - distribution: 'normal', 'uniform', or 'truncated_normal' (default: 'truncated_normal')
        - seed: Random seed for reproducibility (default: None)
        """

class GlorotNormal(VarianceScaling):
    """
    Glorot normal initializer (Xavier normal).

    Draws samples from a truncated normal with stddev = sqrt(2 / (fan_in + fan_out)).

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='glorot_normal')
    # or
    layer = Dense(10, kernel_initializer=GlorotNormal())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the GlorotNormal initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class GlorotUniform(VarianceScaling):
    """
    Glorot uniform initializer (Xavier uniform).

    Draws samples from a uniform distribution within [-limit, limit] where
    limit = sqrt(6 / (fan_in + fan_out)).

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='glorot_uniform')
    # or
    layer = Dense(10, kernel_initializer=GlorotUniform())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the GlorotUniform initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class HeNormal(VarianceScaling):
    """
    He normal initializer (Kaiming normal).

    Draws samples from a truncated normal with stddev = sqrt(2 / fan_in).
    Recommended for ReLU activations.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='he_normal')
    # or
    layer = Dense(10, kernel_initializer=HeNormal())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the HeNormal initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class HeUniform(VarianceScaling):
    """
    He uniform initializer (Kaiming uniform).

    Draws samples from a uniform distribution within [-limit, limit] where
    limit = sqrt(6 / fan_in). Recommended for ReLU activations.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='he_uniform')
    # or
    layer = Dense(10, kernel_initializer=HeUniform())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the HeUniform initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class LecunNormal(VarianceScaling):
    """
    LeCun normal initializer.

    Draws samples from a truncated normal with stddev = sqrt(1 / fan_in).
    Recommended for SELU activations.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='lecun_normal')
    # or
    layer = Dense(10, kernel_initializer=LecunNormal())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the LecunNormal initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """

class LecunUniform(VarianceScaling):
    """
    LeCun uniform initializer.

    Draws samples from a uniform distribution within [-limit, limit] where
    limit = sqrt(3 / fan_in). Recommended for SELU activations.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer='lecun_uniform')
    # or
    layer = Dense(10, kernel_initializer=LecunUniform())
    ```
    """
    def __init__(self, seed=None):
        """
        Initialize the LecunUniform initializer.

        Parameters:
        - seed: Random seed for reproducibility (default: None)
        """
```
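Each named initializer above is a special case of `VarianceScaling`: Glorot is `scale=1, mode='fan_avg'`, He is `scale=2, mode='fan_in'`, and LeCun is `scale=1, mode='fan_in'`. A small NumPy sketch of the formulas (an illustration, not library code) shows how the stddev of the normal variants follows from `scale` and `mode`:

```python
import numpy as np

def variance_scaling_std(scale, mode, fan_in, fan_out):
    """stddev of the normal variant: sqrt(scale / fan)."""
    fan = {"fan_in": fan_in,
           "fan_out": fan_out,
           "fan_avg": (fan_in + fan_out) / 2}[mode]
    return float(np.sqrt(scale / fan))

fan_in, fan_out = 512, 256
# glorot_normal: scale=1, fan_avg  -> sqrt(2 / (fan_in + fan_out))
print(variance_scaling_std(1.0, "fan_avg", fan_in, fan_out))
# he_normal:     scale=2, fan_in   -> sqrt(2 / fan_in)
print(variance_scaling_std(2.0, "fan_in", fan_in, fan_out))
# lecun_normal:  scale=1, fan_in   -> sqrt(1 / fan_in)
print(variance_scaling_std(1.0, "fan_in", fan_in, fan_out))
```

The uniform variants use the same fan computation but convert the target variance into a symmetric limit, which is where the `sqrt(6 / fan)` and `sqrt(3 / fan)` expressions above come from.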

### Advanced Initializers

Specialized initializers for specific architectures and use cases.

```python { .api }
class Orthogonal(Initializer):
    """
    Initializes weights with orthogonal matrices.

    Generates random orthogonal matrices via QR decomposition of a random matrix.
    Useful for RNNs to avoid vanishing/exploding gradients.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=Orthogonal(gain=1.0))
    ```
    """
    def __init__(self, gain=1.0, seed=None):
        """
        Initialize the Orthogonal initializer.

        Parameters:
        - gain: Scaling factor for the orthogonal matrix (default: 1.0)
        - seed: Random seed for reproducibility (default: None)
        """

class STFT(Initializer):
    """
    STFT initializer for specific signal processing applications.

    Usage:
    ```python
    layer = Dense(10, kernel_initializer=STFT())
    ```
    """
    def __init__(self, **kwargs):
        """
        Initialize the STFT initializer.

        Parameters:
        - **kwargs: Additional STFT-specific parameters
        """
```

### Utility Functions

Helper functions for initializer management and serialization.

```python { .api }
def serialize(initializer):
    """
    Serialize an initializer to a string or config dict.

    Parameters:
    - initializer: Initializer to serialize

    Returns:
        String identifier or config dictionary
    """

def deserialize(config, custom_objects=None):
    """
    Deserialize an initializer from a string or config dict.

    Parameters:
    - config: String identifier or config dictionary
    - custom_objects: Optional dict mapping names to custom objects

    Returns:
        Initializer instance
    """

def get(identifier):
    """
    Retrieve an initializer by string identifier.

    Parameters:
    - identifier: String name or initializer instance

    Returns:
        Initializer instance
    """
```
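The `get`/`serialize`/`deserialize` trio follows a common registry pattern: `serialize` records the class name plus its config, `deserialize` rebuilds an instance from that record (consulting `custom_objects` for user-defined classes), and `get` accepts either form or a ready-made instance. A self-contained sketch of that pattern, with a toy `Zeros` class and a hypothetical `REGISTRY` dict standing in for the library's internals:

```python
class Zeros:
    """Toy initializer used only to demonstrate the round trip."""
    def get_config(self):
        return {}

REGISTRY = {"zeros": Zeros}  # hypothetical name-to-class table

def serialize(initializer):
    """Return a config dict identifying the initializer class."""
    return {"class_name": type(initializer).__name__.lower(),
            "config": initializer.get_config()}

def deserialize(config, custom_objects=None):
    """Rebuild an initializer from a string name or config dict."""
    lookup = {**REGISTRY, **(custom_objects or {})}
    if isinstance(config, str):
        return lookup[config]()
    return lookup[config["class_name"]](**config["config"])

def get(identifier):
    """Accept a string, a config dict, or a ready-made instance."""
    if isinstance(identifier, (str, dict)):
        return deserialize(identifier)
    return identifier

z = get("zeros")                      # string -> instance
roundtrip = deserialize(serialize(z)) # instance -> config -> instance
print(type(z).__name__, type(roundtrip).__name__)  # Zeros Zeros
```

This round-trip property is what lets model-saving code persist initializers by name and restore them when a saved model is loaded.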

## Usage Examples

```python
import keras
from keras import initializers

# Using string identifiers
model = keras.Sequential([
    keras.layers.Dense(64, kernel_initializer='he_normal', activation='relu'),
    keras.layers.Dense(32, kernel_initializer='glorot_uniform', activation='tanh'),
    keras.layers.Dense(10, kernel_initializer='zeros', activation='softmax')
])

# Using initializer classes directly
model = keras.Sequential([
    keras.layers.Dense(64,
                       kernel_initializer=initializers.HeNormal(),
                       bias_initializer=initializers.Zeros(),
                       activation='relu'),
    keras.layers.Dense(32,
                       kernel_initializer=initializers.GlorotUniform(seed=42),
                       activation='tanh'),
    keras.layers.Dense(10,
                       kernel_initializer=initializers.Constant(0.1),
                       activation='softmax')
])

# Custom variance scaling
custom_init = initializers.VarianceScaling(
    scale=2.0,
    mode='fan_out',
    distribution='uniform'
)
layer = keras.layers.Dense(128, kernel_initializer=custom_init)

# For RNNs - orthogonal initialization
rnn_layer = keras.layers.LSTM(
    64,
    kernel_initializer='orthogonal',
    recurrent_initializer='orthogonal'
)
```

## Initialization Guidelines

- **ReLU activations**: Use `he_normal` or `he_uniform`
- **Tanh/Sigmoid activations**: Use `glorot_normal` or `glorot_uniform`
- **SELU activations**: Use `lecun_normal` or `lecun_uniform`
- **RNN layers**: Use `orthogonal` for recurrent weights
- **General purpose**: `glorot_uniform` is a good default choice
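These guidelines all come from the same principle: keep the activation scale roughly constant as the signal passes through many layers. A framework-free NumPy sketch (hypothetical layer sizes, not library code) makes this concrete by pushing a random input through a stack of ReLU layers and comparing He-scaled weights against an arbitrary small stddev:

```python
import numpy as np

def forward_std(stddev, depth=20, width=512, seed=0):
    """Activation stddev after `depth` ReLU layers with N(0, stddev) weights."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(64, width))          # batch of random inputs
    for _ in range(depth):
        w = rng.normal(0.0, stddev, size=(width, width))
        x = np.maximum(x @ w, 0.0)            # ReLU
    return float(x.std())

he = np.sqrt(2.0 / 512)                       # he_normal stddev for fan_in=512
print(forward_std(he))                        # stays near the input scale
print(forward_std(0.01))                      # shrinks toward zero with depth
```

With the He stddev, each layer's gain and the ReLU's halving of variance cancel, so the signal survives 20 layers; with an arbitrary small stddev it decays geometrically, which is exactly the vanishing-signal failure the guidelines are meant to avoid.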