
# DataSet Iteration

Core functionality for converting RecordReader data into DataSet objects suitable for DeepLearning4j training. The RecordReaderDataSetIterator provides a bridge between DataVec's data-reading capabilities and DeepLearning4j's training requirements.

## Capabilities

### RecordReaderDataSetIterator

Main class for converting RecordReader data into DataSet objects for neural network training.

```java { .api }
public class RecordReaderDataSetIterator implements DataSetIterator, Serializable {
    // Main constructors
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize);
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize,
                                       int labelIndex, int numPossibleLabels);
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize,
                                       int labelIndex, int numPossibleLabels,
                                       boolean regression);
    public RecordReaderDataSetIterator(RecordReader recordReader,
                                       WritableConverter converter, int batchSize,
                                       int labelIndex, int numPossibleLabels,
                                       boolean regression);
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize,
                                       int labelIndexFrom, int labelIndexTo,
                                       boolean regression);
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize,
                                       int labelIndex, int numPossibleLabels,
                                       int maxNumBatches);

    // Iterator methods
    public boolean hasNext();
    public DataSet next();
    public DataSet next(int num);
    public void remove();

    // Configuration methods
    public void setPreProcessor(DataSetPreProcessor preProcessor);
    public DataSetPreProcessor getPreProcessor();
    public void setCollectMetaData(boolean collectMetaData);
    public boolean getCollectMetaData();

    // Information methods
    public int totalExamples();
    public int inputColumns();
    public int totalOutcomes();
    public int batch();
    public int cursor();
    public int numExamples();
    public List<String> getLabels();

    // Reset and async support
    public boolean resetSupported();
    public boolean asyncSupported();
    public void reset();

    // Metadata support
    public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException;
    public DataSet loadFromMetaData(List<RecordMetaData> recordMetaDatas) throws IOException;
}
```

## Constructor Parameters

### Basic Constructor

- **recordReader**: The RecordReader to read data from
- **batchSize**: Number of examples per batch

### Classification Constructor

- **recordReader**: The RecordReader to read data from
- **batchSize**: Number of examples per batch
- **labelIndex**: Column index containing the label (0-based)
- **numPossibleLabels**: Number of possible label classes

### Advanced Constructor

- **recordReader**: The RecordReader to read data from
- **converter**: WritableConverter for data type conversion (null for default)
- **batchSize**: Number of examples per batch
- **labelIndex**: Column index containing the label (0-based)
- **numPossibleLabels**: Number of possible label classes
- **regression**: true for regression, false for classification

### Multi-Label Constructor

- **recordReader**: The RecordReader to read data from
- **batchSize**: Number of examples per batch
- **labelIndexFrom**: Starting column index for labels (inclusive)
- **labelIndexTo**: Ending column index for labels (inclusive)
- **regression**: true for regression, false for classification

### Batch-Limited Constructor

- **recordReader**: The RecordReader to read data from
- **batchSize**: Number of examples per batch
- **labelIndex**: Column index containing the label (0-based)
- **numPossibleLabels**: Number of possible label classes
- **maxNumBatches**: Maximum number of batches to iterate over
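The batch-limited constructor is the only one without a worked example later in this document; a minimal sketch follows. It assumes a `csvReader` set up as in the CSV examples, with a 3-class label in column 4; the batch and limit values are placeholders:

```java
// Sketch: cap iteration at 10 batches, even if the reader has more data.
DataSetIterator limitedIterator = new RecordReaderDataSetIterator(
        csvReader,  // recordReader
        32,         // batchSize
        4,          // labelIndex
        3,          // numPossibleLabels
        10          // maxNumBatches: hasNext() returns false after 10 batches
);
```

This is useful for quick smoke tests over a large dataset, where a full epoch would be unnecessarily slow.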


## Usage Examples

### Basic CSV Classification

```java
import java.io.File;
import java.util.Arrays;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// Setup CSV reader
RecordReader csvReader = new CSVRecordReader();
csvReader.initialize(new FileSplit(new File("iris.csv")));

// Create iterator for classification
DataSetIterator iterator = new RecordReaderDataSetIterator(
        csvReader,  // recordReader
        32,         // batchSize
        4,          // labelIndex (column 4 contains labels)
        3           // numPossibleLabels (3 classes)
);

// Use iterator
while (iterator.hasNext()) {
    DataSet dataSet = iterator.next();
    System.out.println("Features shape: " + Arrays.toString(dataSet.getFeatures().shape()));
    System.out.println("Labels shape: " + Arrays.toString(dataSet.getLabels().shape()));
}
```

### Regression Example

```java
// Setup for regression task
DataSetIterator regressionIterator = new RecordReaderDataSetIterator(
        csvReader,  // recordReader
        64,         // batchSize
        5,          // labelIndex (column 5 contains the continuous target)
        1,          // numPossibleLabels (unused for regression)
        true        // regression = true
);
```

### Multi-Label Classification

```java
// Labels in columns 3, 4, and 5
DataSetIterator multiLabelIterator = new RecordReaderDataSetIterator(
        csvReader,  // recordReader
        32,         // batchSize
        3,          // labelIndexFrom (start of label columns, inclusive)
        5,          // labelIndexTo (end of label columns, inclusive)
        false       // regression = false (classification)
);
```

### With Data Preprocessing

```java
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerMinMaxScaler;

// Create iterator
DataSetIterator iterator = new RecordReaderDataSetIterator(csvReader, 32, 4, 3);

// First pass to calculate min/max (fit before attaching the preprocessor,
// so the scaler sees the raw data)
NormalizerMinMaxScaler scaler = new NormalizerMinMaxScaler();
scaler.fit(iterator);
iterator.reset();

// Attach the fitted scaler so each batch is normalized on the fly
iterator.setPreProcessor(scaler);

// Now use normalized data
while (iterator.hasNext()) {
    DataSet normalizedData = iterator.next();
    // Train with normalized data
}
```

### Metadata Collection

```java
// Enable metadata collection
RecordReaderDataSetIterator iterator = new RecordReaderDataSetIterator(
        csvReader, 32, 4, 3);
iterator.setCollectMetaData(true);

// Process data
DataSet batch = iterator.next();
List<RecordMetaData> metaData = batch.getExampleMetaData(RecordMetaData.class);

// Later, load specific examples by metadata
DataSet specificExample = iterator.loadFromMetaData(metaData.get(0));
```

## Error Handling

The iterator surfaces several error conditions:

- **IOException**: thrown when the RecordReader encounters file-reading errors
- **IllegalArgumentException**: thrown for invalid constructor parameters
- **NoSuchElementException**: thrown when calling next() with no more data available

Common validation performed:

- batchSize must be positive
- labelIndex must be a valid column index
- numPossibleLabels must be positive for classification
- labelIndexFrom must be <= labelIndexTo for multi-label scenarios
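The validation rules above can be sketched in plain Java. This is a hypothetical reimplementation for illustration only; the actual checks, messages, and their order inside RecordReaderDataSetIterator may differ:

```java
// Illustrative sketch of the parameter validation described above.
// Not the library's actual code: check order and messages are assumptions.
public class IteratorParamChecks {

    static void validate(int batchSize, int labelIndexFrom, int labelIndexTo,
                         int numPossibleLabels, boolean regression) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive, got " + batchSize);
        }
        if (labelIndexFrom > labelIndexTo) {
            throw new IllegalArgumentException("labelIndexFrom must be <= labelIndexTo");
        }
        if (!regression && numPossibleLabels <= 0) {
            throw new IllegalArgumentException("numPossibleLabels must be positive for classification");
        }
    }

    public static void main(String[] args) {
        validate(32, 4, 4, 3, false); // valid classification setup: no exception

        boolean caught = false;
        try {
            validate(0, 4, 4, 3, false); // invalid: batchSize == 0
        } catch (IllegalArgumentException e) {
            caught = true;
        }
        System.out.println(caught ? "rejected invalid batchSize" : "missed check");
    }
}
```

Catching IllegalArgumentException at construction time, rather than mid-training, keeps configuration mistakes close to their source.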