# Layers and Building Blocks

Keras provides a comprehensive set of layer types for building neural networks. Layers are the fundamental building blocks: each one transforms its inputs through learnable parameters and mathematical operations.

## Capabilities

### Base Layer Class

The foundational `Layer` class that all Keras layers inherit from, providing core functionality for parameter management, computation, and serialization.

```python { .api }
class Layer:
    def __init__(self, trainable=True, name=None, dtype=None, **kwargs):
        """
        Base class for all neural network layers.

        Parameters:
        - trainable: Whether layer weights should be trainable
        - name: Name of the layer
        - dtype: Data type for layer computations
        """

    def call(self, inputs, **kwargs):
        """
        Forward pass computation logic.

        Parameters:
        - inputs: Input tensor(s)

        Returns:
        Output tensor(s)
        """

    def build(self, input_shape):
        """
        Create layer weights based on input shape.

        Parameters:
        - input_shape: Shape of input tensor
        """

    def get_config(self):
        """
        Get layer configuration for serialization.

        Returns:
        Dict containing layer configuration
        """
```

### Core Layers

Fundamental layers for basic neural network operations, including dense connections, embeddings, and utility layers.

```python { .api }
class Dense(Layer):
    def __init__(self, units, activation=None, use_bias=True,
                 kernel_initializer='glorot_uniform', bias_initializer='zeros',
                 kernel_regularizer=None, bias_regularizer=None,
                 activity_regularizer=None, kernel_constraint=None,
                 bias_constraint=None, lora_rank=None, lora_alpha=None, **kwargs):
        """
        Fully connected layer.

        Parameters:
        - units: Number of output units
        - activation: Activation function to use
        - use_bias: Whether to use bias vector
        - kernel_initializer: Initializer for weight matrix
        - bias_initializer: Initializer for bias vector
        - kernel_regularizer: Regularizer for weight matrix
        - bias_regularizer: Regularizer for bias vector
        - activity_regularizer: Regularizer for layer output
        - kernel_constraint: Constraint for weight matrix
        - bias_constraint: Constraint for bias vector
        - lora_rank: Rank for LoRA (Low-Rank Adaptation)
        - lora_alpha: Alpha parameter for LoRA scaling
        """

class Embedding(Layer):
    def __init__(self, input_dim, output_dim, embeddings_initializer='uniform',
                 embeddings_regularizer=None, mask_zero=False, **kwargs):
        """
        Embedding layer for discrete tokens.

        Parameters:
        - input_dim: Size of vocabulary
        - output_dim: Size of dense vector embeddings
        - embeddings_initializer: Initializer for embedding matrix
        - embeddings_regularizer: Regularizer for embedding matrix
        - mask_zero: Whether input value 0 is a special "padding" value
        """

class Flatten(Layer):
    def __init__(self, data_format=None, **kwargs):
        """
        Flatten input tensor to 1D (except batch dimension).

        Parameters:
        - data_format: Data format for input tensor
        """

class Reshape(Layer):
    def __init__(self, target_shape, **kwargs):
        """
        Reshape input tensor to target shape.

        Parameters:
        - target_shape: Target shape tuple (not including batch dimension)
        """

class Lambda(Layer):
    def __init__(self, function, output_shape=None, mask=None, **kwargs):
        """
        Wrap arbitrary expression as layer.

        Parameters:
        - function: Function to be evaluated
        - output_shape: Expected output shape from function
        - mask: Mask to be applied to output
        """
```
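
To make the roles of `units`, `input_dim`, and `output_dim` concrete, the sketch below computes weight counts and a single-sample dense forward pass in plain Python. The helper names are hypothetical and this is not how Keras implements these layers; it only mirrors the arithmetic.

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Kernel of shape (input_dim, units) plus an optional bias of shape (units,)."""
    return input_dim * units + (units if use_bias else 0)


def embedding_param_count(input_dim, output_dim):
    """One output_dim-sized vector per vocabulary entry."""
    return input_dim * output_dim


def flatten_shape(shape):
    """Collapse all non-batch dimensions into a single dimension."""
    size = 1
    for dim in shape:
        size *= dim
    return (size,)


def dense_forward(x, kernel, bias):
    """y[j] = sum_i x[i] * kernel[i][j] + bias[j] for one sample (no activation)."""
    units = len(bias)
    return [
        sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
        for j in range(units)
    ]


dense_params = dense_param_count(784, 128)
embedding_params = embedding_param_count(10000, 128)
flat = flatten_shape((28, 28, 1))
y = dense_forward(
    [1.0, 2.0],
    [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]],
    [0.0, 1.0, 0.0],
)
```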

### Convolutional Layers

Layers for convolutional operations in 1D, 2D, and 3D, including standard, transposed, depthwise, and separable convolutions.

```python { .api }
class Conv2D(Layer):
    def __init__(self, filters, kernel_size, strides=(1, 1), padding='valid',
                 data_format=None, dilation_rate=(1, 1), groups=1,
                 activation=None, use_bias=True, **kwargs):
        """
        2D convolution layer.

        Parameters:
        - filters: Number of output filters
        - kernel_size: Size of convolution window
        - strides: Stride of convolution
        - padding: Padding mode ('valid' or 'same')
        - data_format: Data format ('channels_last' or 'channels_first')
        - dilation_rate: Dilation rate for dilated convolution
        - groups: Number of groups for grouped convolution
        - activation: Activation function
        - use_bias: Whether to use bias
        """

class Conv1D(Layer):
    def __init__(self, filters, kernel_size, strides=1, padding='valid',
                 data_format=None, dilation_rate=1, groups=1,
                 activation=None, use_bias=True, **kwargs):
        """1D convolution layer."""

class Conv3D(Layer):
    def __init__(self, filters, kernel_size, strides=(1, 1, 1), padding='valid',
                 data_format=None, dilation_rate=(1, 1, 1), groups=1,
                 activation=None, use_bias=True, **kwargs):
        """3D convolution layer."""

class Conv2DTranspose(Layer):
    def __init__(self, filters, kernel_size, strides=(1, 1), padding='valid',
                 output_padding=None, data_format=None, dilation_rate=(1, 1),
                 activation=None, use_bias=True, **kwargs):
        """2D transposed convolution layer."""

class DepthwiseConv2D(Layer):
    def __init__(self, kernel_size, strides=(1, 1), padding='valid',
                 depth_multiplier=1, data_format=None, dilation_rate=(1, 1),
                 activation=None, use_bias=True, **kwargs):
        """2D depthwise convolution layer."""

class SeparableConv2D(Layer):
    def __init__(self, filters, kernel_size, strides=(1, 1), padding='valid',
                 data_format=None, dilation_rate=(1, 1), depth_multiplier=1,
                 activation=None, use_bias=True, **kwargs):
        """2D separable convolution layer."""
```
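
The `kernel_size`, `strides`, `padding`, and `dilation_rate` arguments together determine the spatial size of the output. As a rough sketch, the standard formula for one spatial dimension can be written as follows; `conv_output_length` is a hypothetical helper for illustration, not a documented Keras function.

```python
import math


def conv_output_length(input_length, kernel_size, stride=1,
                       padding="valid", dilation_rate=1):
    """Output size along one spatial dimension of a standard convolution."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(input_length / stride)
    if padding == "valid":
        # Dilation spreads the kernel taps over a wider window.
        effective_kernel = dilation_rate * (kernel_size - 1) + 1
        return (input_length - effective_kernel) // stride + 1
    raise ValueError(f"Unsupported padding mode: {padding!r}")


valid_size = conv_output_length(28, 3)                       # 26
strided_same = conv_output_length(28, 3, stride=2, padding="same")  # 14
dilated = conv_output_length(28, 3, dilation_rate=2)         # 24
```

The same function applies independently to height and width for `Conv2D`, and to each of the three spatial dimensions for `Conv3D`.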

### Pooling Layers

Pooling operations for downsampling feature maps using max pooling, average pooling, and global pooling variants.

```python { .api }
class MaxPooling2D(Layer):
    def __init__(self, pool_size=(2, 2), strides=None, padding='valid',
                 data_format=None, **kwargs):
        """
        2D max pooling layer.

        Parameters:
        - pool_size: Size of pooling window
        - strides: Stride of pooling operation
        - padding: Padding mode
        - data_format: Data format
        """

class AveragePooling2D(Layer):
    def __init__(self, pool_size=(2, 2), strides=None, padding='valid',
                 data_format=None, **kwargs):
        """2D average pooling layer."""

class GlobalMaxPooling2D(Layer):
    def __init__(self, data_format=None, keepdims=False, **kwargs):
        """
        Global max pooling for 2D data.

        Parameters:
        - data_format: Data format
        - keepdims: Whether to keep spatial dimensions
        """

class GlobalAveragePooling2D(Layer):
    def __init__(self, data_format=None, keepdims=False, **kwargs):
        """Global average pooling for 2D data."""
```
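
To illustrate the difference between windowed and global pooling, here is a small plain-Python sketch on a single-channel 4x4 grid. It assumes `'valid'` padding and that `strides=None` falls back to the pool size; the function names are hypothetical and the code is not the Keras implementation.

```python
def max_pool_2d(grid, pool_size=2, stride=None):
    """Max pooling over a 2D list with 'valid' padding."""
    stride = stride or pool_size  # stride defaults to the pool size
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - pool_size + 1, stride):
        pooled_row = []
        for c in range(0, cols - pool_size + 1, stride):
            window = [
                grid[r + i][c + j]
                for i in range(pool_size)
                for j in range(pool_size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


def global_average_pool(grid):
    """Reduce the whole feature map to a single value."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_2d(feature_map)           # [[6, 8], [14, 16]]
global_avg = global_average_pool(feature_map)  # 8.5
```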

### Recurrent Layers

Recurrent neural network layers for sequence processing, including LSTM, GRU, and simple RNN variants.

```python { .api }
class LSTM(Layer):
    def __init__(self, units, activation='tanh', recurrent_activation='sigmoid',
                 use_bias=True, kernel_initializer='glorot_uniform',
                 recurrent_initializer='orthogonal', bias_initializer='zeros',
                 dropout=0.0, recurrent_dropout=0.0, return_sequences=False,
                 return_state=False, go_backwards=False, stateful=False,
                 unroll=False, **kwargs):
        """
        Long Short-Term Memory layer.

        Parameters:
        - units: Dimensionality of output space
        - activation: Activation function for the cell candidate and output
        - recurrent_activation: Activation function for the input, forget, and output gates
        - use_bias: Whether to use bias vectors
        - kernel_initializer: Initializer for input weights
        - recurrent_initializer: Initializer for recurrent weights
        - bias_initializer: Initializer for bias vectors
        - dropout: Dropout rate for input connections
        - recurrent_dropout: Dropout rate for recurrent connections
        - return_sequences: Whether to return the full sequence or only the last output
        - return_state: Whether to return last state in addition to output
        - go_backwards: Whether to process sequence backwards
        - stateful: Whether to carry the final state of each sample over as the
          initial state of the sample at the same index in the next batch
        - unroll: Whether to unroll the recurrent loop
        """

class GRU(Layer):
    def __init__(self, units, activation='tanh', recurrent_activation='sigmoid',
                 use_bias=True, dropout=0.0, recurrent_dropout=0.0,
                 return_sequences=False, return_state=False, **kwargs):
        """Gated Recurrent Unit layer."""

class SimpleRNN(Layer):
    def __init__(self, units, activation='tanh', use_bias=True, dropout=0.0,
                 recurrent_dropout=0.0, return_sequences=False,
                 return_state=False, **kwargs):
        """Simple RNN layer."""

class Bidirectional(Layer):
    def __init__(self, layer, merge_mode='concat', weights=None, **kwargs):
        """
        Bidirectional wrapper for RNNs.

        Parameters:
        - layer: RNN layer to wrap
        - merge_mode: How to combine forward and backward outputs
        - weights: Initial weights
        """
```
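
The effect of `units`, `return_sequences`, and `merge_mode` on shapes and weight counts can be sketched with a few lines of arithmetic. The helpers below are hypothetical and assume the standard LSTM formulation with four weight sets (three gates plus the cell candidate); they do not call Keras.

```python
def lstm_param_count(input_dim, units, use_bias=True):
    """Four weight sets, each with input weights, recurrent weights, and a bias."""
    per_set = units * (input_dim + units) + (units if use_bias else 0)
    return 4 * per_set


def rnn_output_shape(batch, timesteps, units, return_sequences=False):
    """Full sequence of outputs, or only the output at the last timestep."""
    if return_sequences:
        return (batch, timesteps, units)
    return (batch, units)


def bidirectional_units(units, merge_mode="concat"):
    """Feature size after merging forward and backward outputs."""
    if merge_mode == "concat":
        return 2 * units
    if merge_mode in ("sum", "mul", "ave"):
        return units
    raise ValueError(f"Unsupported merge_mode: {merge_mode!r}")


params = lstm_param_count(input_dim=128, units=64)       # 49408
last_only = rnn_output_shape(32, 100, 64)                # (32, 64)
full_seq = rnn_output_shape(32, 100, 64, return_sequences=True)  # (32, 100, 64)
merged = bidirectional_units(64)                         # 128
```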

### Attention Layers

Attention mechanisms for focusing on relevant parts of input sequences and implementing transformer-style architectures.

```python { .api }
class MultiHeadAttention(Layer):
    def __init__(self, num_heads, key_dim, value_dim=None, dropout=0.0,
                 use_bias=True, output_shape=None, **kwargs):
        """
        Multi-head attention layer.

        Parameters:
        - num_heads: Number of attention heads
        - key_dim: Size of each attention head for query and key
        - value_dim: Size of each attention head for value
        - dropout: Dropout probability for attention weights
        - use_bias: Whether to use bias in linear projections
        - output_shape: Expected shape of output tensor
        """

class Attention(Layer):
    def __init__(self, use_scale=False, score_mode='dot', **kwargs):
        """
        Attention layer for computing attention weights.

        Parameters:
        - use_scale: Whether to scale attention scores
        - score_mode: Type of attention score computation
        """
```
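
The core computation behind dot-product attention can be sketched in plain Python for a single query. This follows the textbook scaled dot-product formulation (scores divided by the square root of the key size); the function names and the `scale` flag are assumptions made for illustration and do not reproduce the internals of either Keras layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values, scale=True):
    """Attend from one query vector over a list of key/value vectors."""
    factor = 1.0 / math.sqrt(len(query)) if scale else 1.0
    scores = [factor * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
output, weights = dot_product_attention(query, keys, values)
```

Because the query matches the first key more closely, the first value vector receives the larger weight. A multi-head layer repeats this computation in parallel on several learned projections of the inputs.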

### Normalization Layers

Normalization techniques for stabilizing and accelerating training, including batch normalization, layer normalization, and group normalization.

```python { .api }
class BatchNormalization(Layer):
    def __init__(self, axis=-1, momentum=0.99, epsilon=1e-3, center=True,
                 scale=True, beta_initializer='zeros', gamma_initializer='ones',
                 **kwargs):
        """
        Batch normalization layer.

        Parameters:
        - axis: Axis to normalize along
        - momentum: Momentum for moving statistics
        - epsilon: Small constant for numerical stability
        - center: Whether to add learned offset parameter
        - scale: Whether to add learned scaling parameter
        - beta_initializer: Initializer for beta parameter
        - gamma_initializer: Initializer for gamma parameter
        """

class LayerNormalization(Layer):
    def __init__(self, axis=-1, epsilon=1e-3, center=True, scale=True,
                 beta_initializer='zeros', gamma_initializer='ones', **kwargs):
        """Layer normalization layer."""

class GroupNormalization(Layer):
    def __init__(self, groups=32, axis=-1, epsilon=1e-3, center=True,
                 scale=True, **kwargs):
        """
        Group normalization layer.

        Parameters:
        - groups: Number of groups for normalization
        - axis: Axis to normalize along
        - epsilon: Small constant for numerical stability
        - center: Whether to add learned offset parameter
        - scale: Whether to add learned scaling parameter
        """
```
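
The main difference between batch and layer normalization is which values share a mean and variance. The plain-Python sketch below shows this on a small `(batch, features)` input. It uses training-style statistics only and omits the learned `gamma`/`beta` parameters and the moving averages; the function names are hypothetical.

```python
import math


def normalize(values, epsilon=1e-3):
    """Subtract the mean and divide by sqrt(variance + epsilon)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + epsilon) for v in values]


def batch_norm(batch, epsilon=1e-3):
    """Normalize each feature (column) using statistics across the batch."""
    columns = [normalize(list(column), epsilon) for column in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch, epsilon=1e-3):
    """Normalize each sample (row) using statistics across its features."""
    return [normalize(row, epsilon) for row in batch]


batch = [
    [1.0, 10.0, 100.0],
    [3.0, 30.0, 300.0],
]
bn = batch_norm(batch)
ln = layer_norm(batch)
```

After `batch_norm`, every column has a mean of approximately zero; after `layer_norm`, every row does.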

### Regularization Layers

Layers for regularization, including several dropout variants and noise injection to reduce overfitting.

```python { .api }
class Dropout(Layer):
    def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
        """
        Dropout layer for regularization.

        Parameters:
        - rate: Fraction of input units to drop
        - noise_shape: Shape of binary dropout mask
        - seed: Random seed for dropout
        """

class SpatialDropout2D(Layer):
    def __init__(self, rate, data_format=None, **kwargs):
        """
        2D spatial dropout layer.

        Parameters:
        - rate: Fraction of input units to drop
        - data_format: Data format
        """

class GaussianNoise(Layer):
    def __init__(self, stddev, **kwargs):
        """
        Gaussian noise regularization layer.

        Parameters:
        - stddev: Standard deviation of noise distribution
        """

class GaussianDropout(Layer):
    def __init__(self, rate, **kwargs):
        """
        Multiplicative Gaussian noise layer.

        Parameters:
        - rate: Drop probability as in Dropout
        """
```
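
A minimal sketch of what `rate` means in practice, assuming the common "inverted dropout" scheme in which kept units are scaled by `1 / (1 - rate)` during training and inputs pass through unchanged at inference. The `dropout` function and its `training` flag are illustrative, written in plain Python rather than taken from Keras.

```python
import random


def dropout(values, rate, seed=None, training=True):
    """Zero each value with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return list(values)  # inference: identity
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - rate)
    return [v * keep_scale if rng.random() >= rate else 0.0 for v in values]


inputs = [1.0] * 1000
scaled = dropout(inputs, rate=0.2, seed=0)
dropped_fraction = scaled.count(0.0) / len(scaled)
unchanged = dropout(inputs, rate=0.2, training=False)
```

Rescaling keeps the expected sum of the activations the same in training and inference, which is why no correction is needed when the layer is disabled.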

### Activation Layers

Activation functions implemented as layers for explicit control and custom activation patterns.

```python { .api }
class Activation(Layer):
    def __init__(self, activation, **kwargs):
        """
        Activation layer.

        Parameters:
        - activation: Name of activation function or callable
        """

class ReLU(Layer):
    def __init__(self, max_value=None, negative_slope=0.0, threshold=0.0, **kwargs):
        """
        ReLU activation layer.

        Parameters:
        - max_value: Maximum activation value
        - negative_slope: Slope for values below the threshold
        - threshold: Threshold value for activation
        """

class LeakyReLU(Layer):
    def __init__(self, negative_slope=0.3, **kwargs):
        """
        Leaky ReLU activation layer.

        Parameters:
        - negative_slope: Slope for negative values
          (the older `alpha` keyword is deprecated in Keras 3)
        """

class ELU(Layer):
    def __init__(self, alpha=1.0, **kwargs):
        """
        ELU activation layer.

        Parameters:
        - alpha: Scale for negative values
        """

class Softmax(Layer):
    def __init__(self, axis=-1, **kwargs):
        """
        Softmax activation layer.

        Parameters:
        - axis: Axis along which to apply softmax
        """
```
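
How `max_value`, `negative_slope`, and `threshold` interact is easiest to see on scalars. The sketch below is a plain-Python rendering of the piecewise definitions, assuming the usual semantics (clip at `max_value`, pass through at or above `threshold`, apply the slope below it); it is not the library code.

```python
import math


def relu(x, max_value=None, negative_slope=0.0, threshold=0.0):
    """Generalized ReLU on a single number."""
    if max_value is not None and x >= max_value:
        return max_value
    if x >= threshold:
        return x
    return negative_slope * (x - threshold)


def elu(x, alpha=1.0):
    """Identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softmax(values):
    """Numerically stable softmax over a list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


plain = relu(-2.0)                       # 0.0
clipped = relu(3.0, max_value=2.0)       # 2.0
leaky = relu(-2.0, negative_slope=0.1)   # approximately -0.2
probs = softmax([1.0, 2.0, 3.0])
```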

### Merging Layers

Layers for combining multiple input tensors through operations such as addition, concatenation, and other element-wise operations.

```python { .api }
class Add(Layer):
    def __init__(self, **kwargs):
        """Element-wise addition layer."""

class Concatenate(Layer):
    def __init__(self, axis=-1, **kwargs):
        """
        Concatenation layer.

        Parameters:
        - axis: Axis along which to concatenate
        """

class Multiply(Layer):
    def __init__(self, **kwargs):
        """Element-wise multiplication layer."""

class Average(Layer):
    def __init__(self, **kwargs):
        """Element-wise averaging layer."""

class Maximum(Layer):
    def __init__(self, **kwargs):
        """Element-wise maximum layer."""

class Minimum(Layer):
    def __init__(self, **kwargs):
        """Element-wise minimum layer."""

class Dot(Layer):
    def __init__(self, axes, normalize=False, **kwargs):
        """
        Dot product layer.

        Parameters:
        - axes: Axes to compute dot product over
        - normalize: Whether to normalize inputs
        """
```
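
Element-wise merges and concatenation place different requirements on input shapes. The following sketch expresses those rules as shape arithmetic in plain Python. It deliberately ignores broadcasting and unknown (`None`) dimensions, and the helper names are hypothetical.

```python
def elementwise_merge_shape(shapes):
    """Add, Multiply, Average, Maximum, Minimum: shapes must match exactly here."""
    first = tuple(shapes[0])
    if any(tuple(shape) != first for shape in shapes):
        raise ValueError(f"Shapes differ: {shapes}")
    return first


def concatenate_shape(shapes, axis=-1):
    """Concatenate: all dimensions match except the concatenation axis."""
    reference = tuple(shapes[0])
    rank = len(reference)
    axis = axis % rank  # support negative axes
    for shape in shapes:
        if len(shape) != rank:
            raise ValueError("All inputs must have the same rank")
        for i in range(rank):
            if i != axis and shape[i] != reference[i]:
                raise ValueError(f"Dimension {i} differs: {shapes}")
    merged = list(reference)
    merged[axis] = sum(shape[axis] for shape in shapes)
    return tuple(merged)


added = elementwise_merge_shape([(32, 64), (32, 64)])   # (32, 64)
joined = concatenate_shape([(32, 64), (32, 16)])        # (32, 80)
```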

## Usage Examples

### Building a CNN

```python
import keras
from keras import layers

model = keras.Sequential([
    # Declare the input shape with keras.Input; passing input_shape
    # to the first layer is deprecated in Keras 3.
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
```
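
As a sanity check on the model above, the shapes and trainable parameter counts can be traced by hand. The sketch below does this in plain Python, assuming `'valid'` padding, unit strides for the convolutions, and bias terms everywhere (the defaults listed earlier); the helper names are hypothetical.

```python
def conv2d_valid(shape, filters, kernel=3):
    """Return (output_shape, param_count) for a 'valid' stride-1 convolution."""
    height, width, channels = shape
    params = (kernel * kernel * channels + 1) * filters  # +1 for the bias
    return (height - kernel + 1, width - kernel + 1, filters), params


def max_pool(shape, pool=2):
    """Non-overlapping pooling with 'valid' padding; no parameters."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def dense_params(inputs, units):
    return inputs * units + units


shape, total = (28, 28, 1), 0

shape, p = conv2d_valid(shape, 32)   # (26, 26, 32)
total += p
shape = max_pool(shape)              # (13, 13, 32)
shape, p = conv2d_valid(shape, 64)   # (11, 11, 64)
total += p
shape = max_pool(shape)              # (5, 5, 64)
shape, p = conv2d_valid(shape, 64)   # (3, 3, 64)
total += p

flat = shape[0] * shape[1] * shape[2]  # Flatten: 576 features
total += dense_params(flat, 64)
total += dense_params(64, 10)
```

Under these assumptions the total comes to 93,322 trainable parameters, which can be compared with the figure reported by `model.summary()`.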

### Building an LSTM for Sequence Processing

```python
import keras
from keras import layers

model = keras.Sequential([
    # Sequences of 100 token ids; the Embedding `input_length`
    # argument is deprecated in Keras 3, so the length is set here.
    keras.Input(shape=(100,)),
    layers.Embedding(10000, 128),
    layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    layers.Dense(1, activation='sigmoid')
])
```

### Using the Functional API for Complex Architectures

```python
import keras
from keras import layers

inputs = keras.Input(shape=(784,))
x = layers.Dense(128, activation='relu')(inputs)
x = layers.Dropout(0.2)(x)
branch1 = layers.Dense(64, activation='relu', name='branch1')(x)
branch2 = layers.Dense(64, activation='relu', name='branch2')(x)
merged = layers.Add()([branch1, branch2])
outputs = layers.Dense(10, activation='softmax')(merged)

model = keras.Model(inputs=inputs, outputs=outputs)
```