# Record Readers

Record readers provide the core functionality for reading structured data from various sources in DataVec. They implement a consistent iterator-based pattern and support metadata tracking for data lineage and debugging.

## Capabilities

### Core RecordReader Interface

The base interface that all record readers implement. Provides standard iteration patterns, initialization, and optional batch reading capabilities.

```java { .api }
public interface RecordReader {
    void initialize(InputSplit split) throws IOException;
    List<Writable> next();
    boolean hasNext();
    void reset();
    List<String> getLabels();
    Record nextRecord();
    boolean batchesSupported();
    List<Writable> next(int numRecords);
}
```

**Usage Example:**

```java
RecordReader reader = new CSVRecordReader();
reader.initialize(new FileSplit(new File("data.csv")));

while (reader.hasNext()) {
    List<Writable> record = reader.next();
    // Process record
}
reader.reset(); // Reset for reuse
```
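
The iteration contract of `hasNext()`, `next()`, and `reset()` can be tried out without DataVec on the classpath. The following is a minimal sketch, assuming a hypothetical `InMemoryReader` class over plain strings. It is not a DataVec type and only mirrors the method names of the interface.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a record reader; not a DataVec class.
final class InMemoryReader {
    private final List<List<String>> rows;
    private int cursor = 0;

    InMemoryReader(List<List<String>> rows) {
        this.rows = rows;
    }

    boolean hasNext() {
        return cursor < rows.size();
    }

    List<String> next() {
        return rows.get(cursor++);
    }

    // Rewind so the same data can be read again, for example for another pass.
    void reset() {
        cursor = 0;
    }

    // Drain the reader and return how many records were seen.
    static int countRemaining(InMemoryReader reader) {
        int count = 0;
        while (reader.hasNext()) {
            reader.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        InMemoryReader reader = new InMemoryReader(Arrays.asList(
                Arrays.asList("1", "a"),
                Arrays.asList("2", "b")));
        System.out.println(countRemaining(reader)); // first pass
        reader.reset();
        System.out.println(countRemaining(reader)); // second pass after reset
    }
}
```

Without the call to `reset()`, the second pass would see no records, which is why a reader is reset before it is reused.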

### CSV Record Reading

Reads comma-separated values files with configurable delimiters and skip lines. Automatically handles type inference and provides labels for classification tasks.

```java { .api }
public class CSVRecordReader implements RecordReader {
    public CSVRecordReader();
    public CSVRecordReader(int skipLines, String delimiter);
    public CSVRecordReader(int skipLines, String delimiter, String quote);
}
```

**Constructor Parameters:**
- `skipLines` - Number of header lines to skip (default: 0)
- `delimiter` - Field separator character (default: ",")
- `quote` - Quote character for escaped fields (default: "\"")

**Usage Example:**

```java
// Read CSV with header line
RecordReader csvReader = new CSVRecordReader(1, ",");
csvReader.initialize(new FileSplit(new File("data.csv")));

while (csvReader.hasNext()) {
    List<Writable> record = csvReader.next();
    // First column as integer
    int id = record.get(0).toInt();
    // Second column as string
    String name = record.get(1).toString();
    // Third column as double
    double value = record.get(2).toDouble();
}
```
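
To make the roles of `skipLines` and `delimiter` concrete, here is a minimal sketch using only the JDK. The `CsvSketch` class and its `parse` method are hypothetical names introduced for illustration. The sketch does not handle quoted fields and is not a description of how `CSVRecordReader` is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DataVec: mimics skipLines + delimiter handling.
final class CsvSketch {
    static List<List<String>> parse(List<String> lines, int skipLines, String delimiter) {
        List<List<String>> records = new ArrayList<>();
        for (int i = skipLines; i < lines.size(); i++) {
            // Pattern.quote treats the delimiter literally; -1 keeps trailing empty fields
            String[] fields = lines.get(i).split(Pattern.quote(delimiter), -1);
            records.add(Arrays.asList(fields));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("id,name,value", "1,alpha,2.5", "2,beta,3.7");
        // Skip the single header line, then convert each field on access
        for (List<String> record : parse(lines, 1, ",")) {
            int id = Integer.parseInt(record.get(0));
            String name = record.get(1);
            double value = Double.parseDouble(record.get(2));
            System.out.println(id + " " + name + " " + value);
        }
    }
}
```

The header line is dropped before any numeric conversion happens, which avoids a `NumberFormatException` on column names.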

### Sequence Record Reading

Handles sequential or time-series data where each record consists of multiple time steps. Extends RecordReader with sequence-specific methods.

```java { .api }
public interface SequenceRecordReader extends RecordReader {
    List<List<Writable>> sequenceRecord();
    List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream) throws IOException;
    SequenceRecord nextSequence();
}

public class CSVSequenceRecordReader implements SequenceRecordReader {
    public CSVSequenceRecordReader();
    public CSVSequenceRecordReader(int skipLines, String delimiter);
}
```

**Usage Example:**

```java
SequenceRecordReader seqReader = new CSVSequenceRecordReader();
seqReader.initialize(new FileSplit(new File("sequence_data.csv")));

while (seqReader.hasNext()) {
    List<List<Writable>> sequence = seqReader.sequenceRecord();
    // Process sequence - each inner list is a time step
    for (List<Writable> timeStep : sequence) {
        // Process individual time step
    }
}
```

### Collection Record Reading

Reads data from in-memory Java collections, useful for testing and when data is already loaded in memory.

```java { .api }
public class CollectionRecordReader implements RecordReader {
    public CollectionRecordReader(Collection<Collection<Writable>> records);
    public CollectionRecordReader(RecordReader recordReader);
}

public class CollectionSequenceRecordReader implements SequenceRecordReader {
    public CollectionSequenceRecordReader(Collection<Collection<Collection<Writable>>> sequences);
}
```

**Usage Example:**

```java
// Create data in memory, typed to match the constructor parameter
Collection<Collection<Writable>> data = new ArrayList<>();
data.add(Arrays.<Writable>asList(new IntWritable(1), new DoubleWritable(2.5)));
data.add(Arrays.<Writable>asList(new IntWritable(2), new DoubleWritable(3.7)));

// The records are already in memory, so no InputSplit is passed here
RecordReader collectionReader = new CollectionRecordReader(data);

while (collectionReader.hasNext()) {
    List<Writable> record = collectionReader.next();
    // Process in-memory record
}
```

### Record Metadata Support

All record readers support metadata tracking for data provenance and debugging. Metadata includes source location, line numbers, and transformation history.

```java { .api }
public interface Record {
    List<Writable> getRecord();
    RecordMetaData getMetaData();
}

public interface RecordMetaData {
    String getLocation();
    URI getURI();
    Class<?> getReaderClass();
}
```

**Usage Example:**

```java
RecordReader reader = new CSVRecordReader();
reader.initialize(new FileSplit(new File("data.csv")));

while (reader.hasNext()) {
    Record recordWithMeta = reader.nextRecord();
    List<Writable> data = recordWithMeta.getRecord();
    RecordMetaData meta = recordWithMeta.getMetaData();

    System.out.println("Data from: " + meta.getLocation());
    // Process data with metadata context
}
```

### Batch Reading Support

Some record readers can return several records per call, which reduces per-call overhead on large datasets. Check `batchesSupported()` before calling `next(int)`.

```java { .api }
int batchSize = 32; // Illustrative value

// Check if batch reading is supported
if (reader.batchesSupported()) {
    List<Writable> batch = reader.next(batchSize);
    // Process batch of records
}
```
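
The idea of batch reading can be sketched with a plain `Iterator`. The `BatchSketch` class and `nextBatch` method below are hypothetical and use only the JDK; DataVec's own `next(int)` may group or return records differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of DataVec.
final class BatchSketch {
    // Pull up to batchSize records; the final batch may be smaller.
    static <T> List<T> nextBatch(Iterator<T> source, int batchSize) {
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        Iterator<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5").iterator();
        while (records.hasNext()) {
            List<String> batch = nextBatch(records, 2);
            System.out.println(batch.size() + " record(s): " + batch);
        }
    }
}
```

Callers should not assume every batch is full, since the last one holds whatever records remain.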

### Advanced Metadata Support

DataVec provides comprehensive metadata tracking through a hierarchy of interfaces and classes that enable data lineage, debugging, and provenance tracking.

```java { .api }
public interface RecordMetaData {
    String getLocation();
    URI getURI();
    Class<?> getReaderClass();
}

public interface RecordMetaDataComposable extends RecordMetaData {
    List<RecordMetaData> getMeta();
}

public class RecordMetaDataComposableMap implements RecordMetaDataComposable {
    public RecordMetaDataComposableMap(Map<String, RecordMetaData> meta);
    public RecordMetaData getMeta(String key);
    public Set<String> getMetaKeys();
}
```

**Usage Example:**

```java
RecordReader reader = new CSVRecordReader();
reader.initialize(new FileSplit(new File("data.csv")));

while (reader.hasNext()) {
    Record recordWithMeta = reader.nextRecord();
    List<Writable> data = recordWithMeta.getRecord();
    RecordMetaData meta = recordWithMeta.getMetaData();

    // Access metadata information
    String sourceLocation = meta.getLocation();
    URI sourceURI = meta.getURI();
    Class<?> readerClass = meta.getReaderClass();

    System.out.println("Processing record from: " + sourceLocation);
    System.out.println("Read by: " + readerClass.getSimpleName());

    // For composite metadata
    if (meta instanceof RecordMetaDataComposable) {
        RecordMetaDataComposable composite = (RecordMetaDataComposable) meta;
        List<RecordMetaData> allMeta = composite.getMeta();
        System.out.println("Composite metadata contains " + allMeta.size() + " entries");
    }
}
```

### Exception Handling in DataVec

DataVec defines specific exceptions for different error conditions during data processing:

```java { .api }
public class WritableConverterException extends Exception {
    public WritableConverterException(String message);
    public WritableConverterException(String message, Throwable cause);
}

public class ZeroLengthSequenceException extends RuntimeException {
    public ZeroLengthSequenceException(String message);
}
```

**Common Exception Scenarios:**

```java
try {
    RecordReader reader = new CSVRecordReader();
    reader.initialize(new FileSplit(new File("data.csv")));

    // CustomConverter stands for a user-defined WritableConverter implementation;
    // its convert method may throw WritableConverterException.
    // It is created once, outside the read loop.
    WritableConverter converter = new CustomConverter();

    while (reader.hasNext()) {
        List<Writable> record = reader.next();

        for (int i = 0; i < record.size(); i++) {
            Writable converted = converter.convert(record.get(i));
            record.set(i, converted);
        }
    }
} catch (IOException e) {
    // Handle file I/O errors
    System.err.println("Error reading file: " + e.getMessage());
} catch (WritableConverterException e) {
    // Handle data conversion errors
    System.err.println("Error converting data: " + e.getMessage());
} catch (ZeroLengthSequenceException e) {
    // Handle empty sequence errors
    System.err.println("Empty sequence encountered: " + e.getMessage());
}
```

### File-Based Record Reading

Reads data from general file inputs with customizable parsing logic.

```java { .api }
public class FileRecordReader implements RecordReader {
    public FileRecordReader();
    public FileRecordReader(RecordReader wrappedReader);
}
```

**Usage Example:**

```java
FileRecordReader fileReader = new FileRecordReader();
fileReader.initialize(new FileSplit(new File("data.txt")));

while (fileReader.hasNext()) {
    List<Writable> record = fileReader.next();
    // Process file-based record
}
```

### Line-by-Line Record Reading

Reads text files line by line, treating each line as a single record.

```java { .api }
public class LineRecordReader implements RecordReader {
    public LineRecordReader();
    public LineRecordReader(String delimiter);
}
```

**Usage Example:**

```java
LineRecordReader lineReader = new LineRecordReader();
lineReader.initialize(new FileSplit(new File("textfile.txt")));

while (lineReader.hasNext()) {
    List<Writable> record = lineReader.next();
    String line = record.get(0).toString(); // Each record contains one line as Text
}
```

### Composable Record Reading

Combines multiple record readers for complex data processing workflows.

```java { .api }
public class ComposableRecordReader implements RecordReader {
    public ComposableRecordReader(RecordReader... readers);
    public ComposableRecordReader(List<RecordReader> readers);
}
```

**Usage Example:**

```java
RecordReader csvReader = new CSVRecordReader();
// labelGenerator is assumed to be a label generator defined elsewhere
RecordReader imageReader = new ImageRecordReader(64, 64, 3, labelGenerator);

ComposableRecordReader composableReader = new ComposableRecordReader(csvReader, imageReader);
composableReader.initialize(new FileSplit(new File("mixed_data")));

while (composableReader.hasNext()) {
    List<Writable> record = composableReader.next();
    // Process combined record from multiple readers
}
```
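
One way to picture a combined record is as the fields of each reader's next record joined side by side. The sketch below works under that assumption, using plain iterators and a hypothetical `ComposeSketch` class; it is not DataVec code and does not claim to match `ComposableRecordReader` in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of DataVec.
final class ComposeSketch {
    // A combined record is available only while every source still has one.
    static boolean hasNext(List<Iterator<List<String>>> sources) {
        for (Iterator<List<String>> source : sources) {
            if (!source.hasNext()) {
                return false;
            }
        }
        return true;
    }

    // One combined record = the fields of each source's next record, in source order.
    static List<String> nextCombined(List<Iterator<List<String>>> sources) {
        List<String> combined = new ArrayList<>();
        for (Iterator<List<String>> source : sources) {
            combined.addAll(source.next());
        }
        return combined;
    }

    public static void main(String[] args) {
        List<List<String>> tabular = Arrays.asList(
                Arrays.asList("1", "a"), Arrays.asList("2", "b"));
        List<List<String>> extra = Arrays.asList(
                Arrays.asList("x"), Arrays.asList("y"));
        List<Iterator<List<String>>> sources =
                Arrays.asList(tabular.iterator(), extra.iterator());
        while (hasNext(sources)) {
            System.out.println(nextCombined(sources));
        }
    }
}
```

Under this reading the sources advance in lockstep, so the number of combined records is bounded by the shortest source.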

### Concatenating Record Reading

Sequentially processes multiple record readers, concatenating their outputs.

```java { .api }
public class ConcatenatingRecordReader implements RecordReader {
    public ConcatenatingRecordReader(RecordReader... readers);
    public ConcatenatingRecordReader(List<RecordReader> readers);
}
```

**Usage Example:**

```java
RecordReader reader1 = new CSVRecordReader();
reader1.initialize(new FileSplit(new File("part1.csv")));

RecordReader reader2 = new CSVRecordReader();
reader2.initialize(new FileSplit(new File("part2.csv")));

ConcatenatingRecordReader concatReader = new ConcatenatingRecordReader(reader1, reader2);

while (concatReader.hasNext()) {
    List<Writable> record = concatReader.next();
    // Process records from both files sequentially
}
```
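
Concatenation means the first source is read to the end before the second one starts. A minimal sketch of that ordering, using plain iterators and a hypothetical `ConcatSketch` class that is not part of DataVec:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of DataVec.
final class ConcatSketch {
    // Drain each source fully, in order, before moving on to the next one.
    static <T> List<T> readAll(List<Iterator<T>> sources) {
        List<T> out = new ArrayList<>();
        for (Iterator<T> source : sources) {
            while (source.hasNext()) {
                out.add(source.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> part1 = Arrays.asList("p1-row1", "p1-row2").iterator();
        Iterator<String> part2 = Arrays.asList("p2-row1").iterator();
        System.out.println(readAll(Arrays.asList(part1, part2)));
    }
}
```

The total record count is the sum of the counts of the individual sources, and record width is unchanged.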

## Integration Patterns

### With DataSetIterator

A record reader can be passed to DL4J's `RecordReaderDataSetIterator`, which exposes its records as a `DataSetIterator` for machine learning workflows:

```java
RecordReader recordReader = new CSVRecordReader();
recordReader.initialize(new FileSplit(new File("training_data.csv")));

DataSetIterator iterator = new RecordReaderDataSetIterator(
    recordReader,
    batchSize,   // Number of examples per batch
    labelIndex,  // Column index of the label
    numClasses   // Number of possible classes
);
```

### Error Handling

Record readers may throw various exceptions during operation:

- `IOException` - File I/O errors during reading
- `NumberFormatException` - Invalid numeric data in CSV files
- `IllegalStateException` - Reader not properly initialized

```java
try {
    reader.initialize(new FileSplit(new File("data.csv")));
    while (reader.hasNext()) {
        List<Writable> record = reader.next();
        // Process record
    }
} catch (IOException e) {
    // Handle file I/O errors
} catch (NumberFormatException e) {
    // Handle invalid numeric data
}
```
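
Catching `NumberFormatException` around the whole loop stops reading at the first bad value. When bad values should be skipped instead, the `try` can be placed around each conversion. The sketch below shows that pattern with plain strings; `LenientParseSketch` and `parseValid` are hypothetical names, not DataVec APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DataVec.
final class LenientParseSketch {
    // Parses each value as a double; values that are not numeric are skipped.
    static List<Double> parseValid(List<String> rawValues) {
        List<Double> valid = new ArrayList<>();
        for (String raw : rawValues) {
            try {
                valid.add(Double.parseDouble(raw));
            } catch (NumberFormatException e) {
                // Skip the bad value instead of aborting the whole read
                System.err.println("Skipping invalid numeric value: " + raw);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("1.5", "abc", "2");
        List<Double> values = parseValid(column);
        int skipped = column.size() - values.size();
        System.out.println(values + " (skipped " + skipped + ")");
    }
}
```

Whether to skip, log, or fail fast depends on how much invalid data the downstream training step can tolerate.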

## Types

### Core Interfaces

```java { .api }
public interface RecordReader {
    void initialize(InputSplit split) throws IOException;
    List<Writable> next();
    boolean hasNext();
    void reset();
    List<String> getLabels();
    Record nextRecord();
    boolean batchesSupported();
    List<Writable> next(int numRecords);
}

public interface SequenceRecordReader extends RecordReader {
    List<List<Writable>> sequenceRecord();
    List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream) throws IOException;
    SequenceRecord nextSequence();
}

public interface Record {
    List<Writable> getRecord();
    RecordMetaData getMetaData();
}

public interface SequenceRecord {
    List<List<Writable>> getSequenceRecord();
    RecordMetaData getMetaData();
}
```

### RecordReader Implementations

```java { .api }
// CSV-based readers
public class CSVRecordReader implements RecordReader;
public class CSVSequenceRecordReader implements SequenceRecordReader;

// Collection-based readers
public class CollectionRecordReader implements RecordReader;
public class CollectionSequenceRecordReader implements SequenceRecordReader;

// File-based readers
public class FileRecordReader implements RecordReader;
public class LineRecordReader implements RecordReader;

// Composite readers
public class ComposableRecordReader implements RecordReader;
public class ConcatenatingRecordReader implements RecordReader;
```