# Keras-Preprocessing

Easy data preprocessing and data augmentation for deep learning models. Keras-Preprocessing provides utilities for text tokenization, sequence padding, and image augmentation, intended for preparing training data for deep neural networks.

## Package Information

- **Package Name**: keras-preprocessing
- **Language**: Python
- **Installation**: `pip install keras-preprocessing`

## Core Imports

```python
import keras_preprocessing
```

Specific modules:

```python
from keras_preprocessing.text import Tokenizer, text_to_word_sequence
from keras_preprocessing.sequence import pad_sequences, TimeseriesGenerator
from keras_preprocessing.image import ImageDataGenerator, load_img, img_to_array
```

Legacy compatibility imports:

```python
from keras_preprocessing import image, text, sequence
```

## Basic Usage

```python
from keras_preprocessing.text import Tokenizer
from keras_preprocessing.sequence import pad_sequences
from keras_preprocessing.image import ImageDataGenerator

# Text preprocessing
tokenizer = Tokenizer(num_words=1000)
texts = ['hello world', 'deep learning', 'neural networks']
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# Sequence padding
padded = pad_sequences(sequences, maxlen=10, padding='post')

# Image data augmentation
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)

# Load data from directory
train_generator = datagen.flow_from_directory(
    'train_data/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
```

## Architecture

Keras-Preprocessing is organized into three main functional modules:

- **Text Module**: Tokenization, text-to-sequence conversion, and vocabulary management for NLP tasks
- **Sequence Module**: Padding, sampling, and temporal data generation for sequential models
- **Image Module**: Data generators, augmentation pipelines, and image transformations for computer vision

Each module provides both low-level utilities and high-level generators that plug into Keras training workflows.

## Capabilities

### Text Processing

Text tokenization, vocabulary management, and text-to-sequence conversion utilities for natural language processing. Includes the hashing trick, one-hot encoding, and tokenization with configurable character filtering, lowercasing, and splitting.

```python { .api }
class Tokenizer:
    def __init__(self, num_words=None, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                 lower=True, split=' ', char_level=False, oov_token=None, **kwargs): ...
    def fit_on_texts(self, texts): ...
    def texts_to_sequences(self, texts): ...
    def texts_to_matrix(self, texts, mode='binary'): ...

def text_to_word_sequence(text, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                          lower=True, split=" "): ...
def one_hot(text, n, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
            lower=True, split=' '): ...
```

[Text Processing](./text-processing.md)

### Sequence Processing

Sequence padding, temporal data generation, and utilities for preparing sequential data for neural networks. Includes padding sequences to uniform length, generating skipgrams for word2vec, and creating time series batches.

```python { .api }
def pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre',
                  truncating='pre', value=0.): ...

class TimeseriesGenerator:
    def __init__(self, data, targets, length, sampling_rate=1, stride=1,
                 start_index=0, end_index=None, shuffle=False, reverse=False,
                 batch_size=128): ...
    def __getitem__(self, index): ...

def skipgrams(sequence, vocabulary_size, window_size=4, negative_samples=1.,
              shuffle=True, categorical=False, sampling_table=None, seed=None): ...
```

[Sequence Processing](./sequence-processing.md)

### Image Processing

Image data augmentation, loading, and preprocessing utilities for computer vision models. Includes data generators, transformation functions, file utilities, and iterators over several data sources (in-memory arrays, directories, and dataframes).

```python { .api }
class ImageDataGenerator:
    def __init__(self, rotation_range=0., width_shift_range=0.,
                 height_shift_range=0., horizontal_flip=False, **kwargs): ...
    def flow(self, x, y=None, batch_size=32, shuffle=True, **kwargs): ...
    def flow_from_directory(self, directory, target_size=(256, 256),
                            color_mode='rgb', batch_size=32, **kwargs): ...
    def flow_from_dataframe(self, dataframe, x_col="filename", y_col="class",
                            target_size=(256, 256), **kwargs): ...

def load_img(path, grayscale=False, color_mode='rgb', target_size=None,
             interpolation='nearest'): ...
def img_to_array(img, data_format='channels_last', dtype='float32'): ...
def array_to_img(x, data_format='channels_last', scale=True, dtype='float32'): ...
```

[Image Processing](./image-processing.md)

## Types

```python { .api }
import typing

import numpy
import PIL.Image

# Common types used across modules
NDArray = numpy.ndarray
PILImage = PIL.Image.Image
Generator = typing.Generator
Iterator = typing.Iterator
```

## Compatibility

```python { .api }
def set_keras_submodules(backend, utils):
    """Set Keras backend and utils submodules (deprecated)."""

def get_keras_submodule(name):
    """Retrieve Keras submodule by name (deprecated)."""

__version__ = '1.1.2'
```