# DataSet Iteration

Core functionality for converting RecordReader data into DataSet objects suitable for DeepLearning4j training. The RecordReaderDataSetIterator provides a bridge between DataVec's data reading capabilities and DeepLearning4j's training requirements.

## Capabilities

### RecordReaderDataSetIterator

Main class for converting RecordReader data into DataSet objects for neural network training.

```java { .api }
public class RecordReaderDataSetIterator implements DataSetIterator, Serializable {
    // Main constructors
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize);
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize,
                                       int labelIndex, int numPossibleLabels);
    // NOTE: the next overload and the label-range overload further down share the
    // parameter types (RecordReader, int, int, int, boolean), so a single class
    // cannot declare both. Check the Javadoc for your DL4J version to see which
    // one it ships.
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize,
                                       int labelIndex, int numPossibleLabels,
                                       boolean regression);
    public RecordReaderDataSetIterator(RecordReader recordReader,
                                       WritableConverter converter, int batchSize,
                                       int labelIndex, int numPossibleLabels,
                                       boolean regression);
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize,
                                       int labelIndexFrom, int labelIndexTo,
                                       boolean regression);
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize,
                                       int labelIndex, int numPossibleLabels,
                                       int maxNumBatches);

    // Iterator methods
    public boolean hasNext();
    public DataSet next();
    public DataSet next(int num);
    public void remove();

    // Configuration methods
    public void setPreProcessor(DataSetPreProcessor preProcessor);
    public DataSetPreProcessor getPreProcessor();
    public void setCollectMetaData(boolean collectMetaData);
    public boolean getCollectMetaData();

    // Information methods
    public int totalExamples();
    public int inputColumns();
    public int totalOutcomes();
    public int batch();
    public int cursor();
    public int numExamples();
    public List<String> getLabels();

    // Reset and async support
    public boolean resetSupported();
    public boolean asyncSupported();
    public void reset();

    // Metadata support
    public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException;
    public DataSet loadFromMetaData(List<RecordMetaData> recordMetaDatas) throws IOException;
}
```

## Constructor Parameters

### Basic Constructor
- **recordReader**: The RecordReader to read data from
- **batchSize**: Number of examples per batch

### Classification Constructor
- **recordReader**: The RecordReader to read data from
- **batchSize**: Number of examples per batch
- **labelIndex**: Column index containing the label (0-based)
- **numPossibleLabels**: Number of possible label classes
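
To make the roles of `labelIndex` and `numPossibleLabels` concrete, the following plain-Java sketch splits one numeric row into a feature vector and a one-hot label vector. It is an illustration only, not DL4J source code: the class `LabelSplitSketch`, its method names, and the sample row are hypothetical, and it assumes the label column holds an integer class index in the range 0 to `numPossibleLabels - 1`.

```java
import java.util.Arrays;

/** Plain-Java illustration of the feature/label split; not DL4J source code. */
final class LabelSplitSketch {

    /** Every column except the one at labelIndex, in the original order. */
    static double[] features(double[] row, int labelIndex) {
        if (labelIndex < 0 || labelIndex >= row.length) {
            throw new IllegalArgumentException("labelIndex out of range: " + labelIndex);
        }
        double[] out = new double[row.length - 1];
        for (int i = 0, j = 0; i < row.length; i++) {
            if (i != labelIndex) {
                out[j++] = row[i];
            }
        }
        return out;
    }

    /** One-hot vector of length numPossibleLabels for the class stored at labelIndex. */
    static double[] oneHotLabel(double[] row, int labelIndex, int numPossibleLabels) {
        int classIndex = (int) row[labelIndex];
        if (classIndex < 0 || classIndex >= numPossibleLabels) {
            throw new IllegalArgumentException(
                "class index " + classIndex + " is outside [0, " + numPossibleLabels + ")");
        }
        double[] out = new double[numPossibleLabels];
        out[classIndex] = 1.0;
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical iris-style row: four measurements, then class 2 in column 4.
        double[] row = {5.1, 3.5, 1.4, 0.2, 2};
        System.out.println(Arrays.toString(features(row, 4)));        // [5.1, 3.5, 1.4, 0.2]
        System.out.println(Arrays.toString(oneHotLabel(row, 4, 3)));  // [0.0, 0.0, 1.0]
    }
}
```

With `labelIndex = 4` and `numPossibleLabels = 3`, each example in this sketch contributes four feature values and a three-element label vector.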

### Advanced Constructor
- **recordReader**: The RecordReader to read data from
- **converter**: WritableConverter for data type conversion (null for default)
- **batchSize**: Number of examples per batch
- **labelIndex**: Column index containing the label (0-based)
- **numPossibleLabels**: Number of possible label classes
- **regression**: true for regression, false for classification

### Multi-Label Constructor
- **recordReader**: The RecordReader to read data from
- **batchSize**: Number of examples per batch
- **labelIndexFrom**: Starting column index for labels (inclusive)
- **labelIndexTo**: Ending column index for labels (inclusive)
- **regression**: true for regression, false for classification
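
Because both ends of the label range are inclusive, off-by-one mistakes are easy to make. The sketch below shows one way to picture the split, again in plain Java and not taken from DL4J: the class `LabelRangeSketch` and its methods are hypothetical names, and the sketch assumes that all columns outside the range are treated as features.

```java
import java.util.Arrays;

/** Plain-Java illustration of an inclusive label-column range; not DL4J source code. */
final class LabelRangeSketch {

    private static void checkRange(double[] row, int labelIndexFrom, int labelIndexTo) {
        if (labelIndexFrom < 0 || labelIndexFrom > labelIndexTo || labelIndexTo >= row.length) {
            throw new IllegalArgumentException(
                "invalid label range [" + labelIndexFrom + ", " + labelIndexTo + "]");
        }
    }

    /** Columns labelIndexFrom..labelIndexTo, both inclusive, form the label vector. */
    static double[] labels(double[] row, int labelIndexFrom, int labelIndexTo) {
        checkRange(row, labelIndexFrom, labelIndexTo);
        // copyOfRange excludes its upper bound, hence the + 1.
        return Arrays.copyOfRange(row, labelIndexFrom, labelIndexTo + 1);
    }

    /** All remaining columns, in their original order, form the feature vector. */
    static double[] features(double[] row, int labelIndexFrom, int labelIndexTo) {
        checkRange(row, labelIndexFrom, labelIndexTo);
        int labelCount = labelIndexTo - labelIndexFrom + 1;
        double[] out = new double[row.length - labelCount];
        for (int i = 0, j = 0; i < row.length; i++) {
            if (i < labelIndexFrom || i > labelIndexTo) {
                out[j++] = row[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical row: three features followed by three label columns (3, 4, 5).
        double[] row = {0.1, 0.2, 0.3, 1, 0, 1};
        System.out.println(Arrays.toString(labels(row, 3, 5)));    // [1.0, 0.0, 1.0]
        System.out.println(Arrays.toString(features(row, 3, 5)));  // [0.1, 0.2, 0.3]
    }
}
```

Setting `labelIndexFrom` equal to `labelIndexTo` yields a single label column in this sketch.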

### Batch-Limited Constructor
- **recordReader**: The RecordReader to read data from
- **batchSize**: Number of examples per batch
- **labelIndex**: Column index containing the label (0-based)
- **numPossibleLabels**: Number of possible label classes
- **maxNumBatches**: Maximum number of batches to iterate over
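
The interaction between `batchSize` and `maxNumBatches` comes down to simple arithmetic, sketched below in plain Java. The class `BatchLimitSketch` and its methods are hypothetical. Two assumptions are baked in and should be checked against your DL4J version: a negative `maxNumBatches` is treated as "no limit", and the final batch holds whatever records remain and may therefore be smaller than `batchSize`.

```java
/** Plain-Java arithmetic for batch counts; not DL4J source code. */
final class BatchLimitSketch {

    /** Number of batches produced for totalRecords records. */
    static int batchCount(int totalRecords, int batchSize, int maxNumBatches) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
        }
        // Ceiling division: a partial final batch still counts as a batch.
        int unlimited = (totalRecords + batchSize - 1) / batchSize;
        // Assumption of this sketch: a negative limit means "no limit".
        return maxNumBatches < 0 ? unlimited : Math.min(unlimited, maxNumBatches);
    }

    /** Size of the final batch when no batch limit applies. */
    static int lastBatchSize(int totalRecords, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
        }
        if (totalRecords == 0) {
            return 0;
        }
        int remainder = totalRecords % batchSize;
        return remainder == 0 ? batchSize : remainder;
    }

    public static void main(String[] args) {
        // 150 records in batches of 32: four full batches plus one batch of 22.
        System.out.println(batchCount(150, 32, -1));  // 5
        System.out.println(batchCount(150, 32, 3));   // 3
        System.out.println(lastBatchSize(150, 32));   // 22
    }
}
```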

## Usage Examples

### Basic CSV Classification

```java
import java.io.File;
import java.util.Arrays;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// Setup CSV reader
RecordReader csvReader = new CSVRecordReader();
csvReader.initialize(new FileSplit(new File("iris.csv")));

// Create iterator for classification
DataSetIterator iterator = new RecordReaderDataSetIterator(
    csvReader,      // recordReader
    32,             // batchSize
    4,              // labelIndex (column 4 contains labels)
    3               // numPossibleLabels (3 classes)
);

// Use iterator
while (iterator.hasNext()) {
    DataSet dataSet = iterator.next();
    System.out.println("Features shape: " + Arrays.toString(dataSet.getFeatures().shape()));
    System.out.println("Labels shape: " + Arrays.toString(dataSet.getLabels().shape()));
}
```

### Regression Example

```java
// Setup for regression task: one target column, given as a one-column label range
DataSetIterator regressionIterator = new RecordReaderDataSetIterator(
    csvReader,      // recordReader
    64,             // batchSize
    5,              // labelIndexFrom (column 5 contains continuous target)
    5,              // labelIndexTo (same column, so a single regression output)
    true            // regression = true
);
```

### Multi-Label Classification

```java
// Labels in columns 3, 4, and 5
DataSetIterator multiLabelIterator = new RecordReaderDataSetIterator(
    csvReader,      // recordReader
    32,             // batchSize
    3,              // labelIndexFrom (start of label columns)
    5,              // labelIndexTo (end of label columns)
    false           // regression = false (classification)
);
```

### With Data Preprocessing

```java
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerMinMaxScaler;

// Create iterator
DataSetIterator iterator = new RecordReaderDataSetIterator(csvReader, 32, 4, 3);

// First pass over the raw data to calculate min/max
NormalizerMinMaxScaler scaler = new NormalizerMinMaxScaler();
scaler.fit(iterator);
iterator.reset();

// Attach the fitted scaler so that later batches are normalized as they are read
iterator.setPreProcessor(scaler);

// Now use normalized data
while (iterator.hasNext()) {
    DataSet normalizedData = iterator.next();
    // Train with normalized data
}
```
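
The two passes in the preprocessing example (collect min/max, then rescale) can be illustrated with a small plain-Java sketch of min-max scaling to the range [0, 1]. This is not the `NormalizerMinMaxScaler` implementation: `MinMaxSketch` and its methods are hypothetical, and mapping a constant column to 0 is a choice made here only to avoid division by zero.

```java
import java.util.Arrays;

/** Plain-Java illustration of two-pass min-max scaling; not ND4J source code. */
final class MinMaxSketch {

    /** First pass: returns {min, max}, each holding one value per column. */
    static double[][] fit(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("need at least one row to fit");
        }
        double[] min = rows[0].clone();
        double[] max = rows[0].clone();
        for (double[] row : rows) {
            for (int c = 0; c < row.length; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }
        return new double[][] {min, max};
    }

    /** Second pass: (x - min) / (max - min) per column. */
    static double[] transform(double[] row, double[] min, double[] max) {
        double[] out = new double[row.length];
        for (int c = 0; c < row.length; c++) {
            double range = max[c] - min[c];
            // Sketch-only choice: a constant column is mapped to 0.
            out[c] = range == 0.0 ? 0.0 : (row[c] - min[c]) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 10}, {3, 30}, {5, 20}};
        double[][] stats = fit(rows);
        System.out.println(Arrays.toString(transform(rows[1], stats[0], stats[1])));  // [0.5, 1.0]
    }
}
```

The sketch also shows why statistics must come from the raw data: the transform only makes sense once `min` and `max` are known.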

### Metadata Collection

```java
import org.datavec.api.records.metadata.RecordMetaData;

// Enable metadata collection
RecordReaderDataSetIterator iterator = new RecordReaderDataSetIterator(
    csvReader, 32, 4, 3);
iterator.setCollectMetaData(true);

// Process data
DataSet batch = iterator.next();
// The typed overload returns List<RecordMetaData> instead of List<Serializable>
List<RecordMetaData> metaData = batch.getExampleMetaData(RecordMetaData.class);

// Later, load specific examples by metadata
DataSet specificExample = iterator.loadFromMetaData(metaData.get(0));
```

## Error Handling

The iterator can raise the following exceptions:

- **IOException**: Thrown when the RecordReader encounters file reading errors
- **IllegalArgumentException**: Thrown for invalid constructor parameters
- **NoSuchElementException**: Thrown when calling next() with no more data

Common validation performed:
- batchSize must be positive
- labelIndex must be a valid column index
- numPossibleLabels must be positive for classification
- labelIndexFrom must be <= labelIndexTo for multi-label scenarios
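
The validation rules listed above can be expressed as a few guard clauses. The sketch below is a plain-Java illustration and not the checks DL4J itself runs: `IteratorArgsSketch`, its methods, and the `numColumns` parameter are hypothetical (the real iterator learns the column count from the records it reads), and the exception messages are made up.

```java
/** Plain-Java illustration of the listed argument checks; not DL4J source code. */
final class IteratorArgsSketch {

    /** Checks for the classification-style arguments. */
    static void checkClassification(int batchSize, int labelIndex,
                                    int numPossibleLabels, int numColumns) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
        }
        if (labelIndex < 0 || labelIndex >= numColumns) {
            throw new IllegalArgumentException("labelIndex is not a valid column: " + labelIndex);
        }
        if (numPossibleLabels <= 0) {
            throw new IllegalArgumentException(
                "numPossibleLabels must be positive: " + numPossibleLabels);
        }
    }

    /** Checks for the inclusive label range used in multi-label scenarios. */
    static void checkLabelRange(int labelIndexFrom, int labelIndexTo, int numColumns) {
        if (labelIndexFrom < 0 || labelIndexTo >= numColumns) {
            throw new IllegalArgumentException("label range falls outside the columns");
        }
        if (labelIndexFrom > labelIndexTo) {
            throw new IllegalArgumentException("labelIndexFrom must be <= labelIndexTo");
        }
    }

    /** Convenience wrapper: true when checkClassification accepts the arguments. */
    static boolean isValidClassification(int batchSize, int labelIndex,
                                         int numPossibleLabels, int numColumns) {
        try {
            checkClassification(batchSize, labelIndex, numPossibleLabels, numColumns);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Convenience wrapper: true when checkLabelRange accepts the arguments. */
    static boolean isValidLabelRange(int labelIndexFrom, int labelIndexTo, int numColumns) {
        try {
            checkLabelRange(labelIndexFrom, labelIndexTo, numColumns);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidClassification(32, 4, 3, 5));  // true
        System.out.println(isValidLabelRange(5, 3, 6));          // false
    }
}
```

Running checks like these before building the iterator surfaces configuration mistakes earlier than waiting for the first call to `next()`.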