# Block and CID Iteration

Memory-efficient iteration over CAR contents without loading the entire archive into memory. The iterators provide streaming access to blocks or CIDs, which suits large archives and memory-constrained environments.

## Capabilities

### CarBlockIterator Class

Provides streaming iteration over all blocks in a CAR archive.

```typescript { .api }
/**
 * Streaming iterator over all blocks in a CAR archive
 * Processes blocks one at a time without loading entire archive into memory
 * Can only be iterated once per instance
 */
class CarBlockIterator {
  /** CAR version number (1 or 2) */
  readonly version: number;

  /** Get the list of root CIDs from the CAR header */
  getRoots(): Promise<CID[]>;

  /** Iterate over all blocks in the CAR */
  [Symbol.asyncIterator](): AsyncIterator<Block>;

  /** Create iterator from Uint8Array */
  static fromBytes(bytes: Uint8Array): Promise<CarBlockIterator>;

  /** Create iterator from async stream */
  static fromIterable(asyncIterable: AsyncIterable<Uint8Array>): Promise<CarBlockIterator>;
}
```

**Usage Examples:**

```typescript
import { CarBlockIterator } from "@ipld/car/iterator";
import fs from 'fs';

// Iterate from bytes (the whole archive is held in memory)
const carBytes = fs.readFileSync('archive.car');
const iterator = await CarBlockIterator.fromBytes(carBytes);

// Iterate from stream (more memory efficient)
const stream = fs.createReadStream('large-archive.car');
const streamIterator = await CarBlockIterator.fromIterable(stream);

// Access roots
const roots = await iterator.getRoots();
console.log(`Processing CAR with ${roots.length} roots`);

// Process all blocks
for await (const block of iterator) {
  console.log(`Block ${block.cid}: ${block.bytes.length} bytes`);

  // Process block data (processBlock is an application-defined placeholder)
  await processBlock(block);
}
```

### CarCIDIterator Class

Provides streaming iteration over all CIDs in a CAR archive without loading block data.

```typescript { .api }
/**
 * Streaming iterator over all CIDs in a CAR archive
 * More memory efficient than CarBlockIterator when block data is not needed
 * Can only be iterated once per instance
 */
class CarCIDIterator {
  /** CAR version number (1 or 2) */
  readonly version: number;

  /** Get the list of root CIDs from the CAR header */
  getRoots(): Promise<CID[]>;

  /** Iterate over all CIDs in the CAR */
  [Symbol.asyncIterator](): AsyncIterator<CID>;

  /** Create iterator from Uint8Array */
  static fromBytes(bytes: Uint8Array): Promise<CarCIDIterator>;

  /** Create iterator from async stream */
  static fromIterable(asyncIterable: AsyncIterable<Uint8Array>): Promise<CarCIDIterator>;
}
```

**Usage Examples:**

```typescript
import { CarCIDIterator } from "@ipld/car/iterator";
import fs from 'fs';

// More efficient when you only need CIDs
const stream = fs.createReadStream('large-archive.car');
const cidIterator = await CarCIDIterator.fromIterable(stream);

// Collect all CIDs
const allCids = [];
for await (const cid of cidIterator) {
  allCids.push(cid);
  console.log(`Found CID: ${cid}`);
}

console.log(`Archive contains ${allCids.length} blocks`);
```

### Streaming Processing Patterns

Efficient patterns for processing large CAR files.

```typescript
import { CarBlockIterator, CarCIDIterator } from "@ipld/car/iterator";
import fs from 'fs';

// isRelevantBlock, processBlock, shouldFlushBatch and flushProcessedData
// are application-defined placeholders.

// Pattern 1: Filter and process specific blocks
const stream1 = fs.createReadStream('data.car');
const blockIterator = await CarBlockIterator.fromIterable(stream1);

for await (const block of blockIterator) {
  // Filter blocks by some criteria
  if (isRelevantBlock(block.cid)) {
    await processBlock(block);
  }

  // Memory management - process in batches
  if (shouldFlushBatch()) {
    await flushProcessedData();
  }
}

// Pattern 2: CID analysis without loading block data
const stream2 = fs.createReadStream('data.car');
const cidIterator = await CarCIDIterator.fromIterable(stream2);

const cidStats = {
  total: 0,
  byCodec: new Map<number, number>(),
  byHashType: new Map<number, number>()
};

for await (const cid of cidIterator) {
  cidStats.total++;

  // Analyze CID properties without loading block data
  const codec = cid.code;
  const hashType = cid.multihash.code;

  cidStats.byCodec.set(codec, (cidStats.byCodec.get(codec) || 0) + 1);
  cidStats.byHashType.set(hashType, (cidStats.byHashType.get(hashType) || 0) + 1);
}

console.log('CAR Analysis:', cidStats);
```

### Selective Block Loading

Combine CID iteration with selective block loading for memory efficiency.

```typescript
import { CarCIDIterator } from "@ipld/car/iterator";
import { CarIndexer } from "@ipld/car/indexer";
import { CarReader } from "@ipld/car/reader";
import fs from 'fs';

// First pass: identify blocks of interest using CID iterator
// (isTargetCid and processTargetBlock are application-defined placeholders)
const stream1 = fs.createReadStream('large-archive.car');
const cidIterator = await CarCIDIterator.fromIterable(stream1);

const targetCids = new Set<string>();
for await (const cid of cidIterator) {
  if (isTargetCid(cid)) {
    targetCids.add(cid.toString());
  }
}

console.log(`Found ${targetCids.size} target CIDs`);

// Second pass: load only target blocks using indexer + raw reading
const fd = await fs.promises.open('large-archive.car', 'r');
try {
  const stream2 = fs.createReadStream('large-archive.car');
  const indexer = await CarIndexer.fromIterable(stream2);

  for await (const blockIndex of indexer) {
    if (targetCids.has(blockIndex.cid.toString())) {
      const block = await CarReader.readRaw(fd, blockIndex);
      await processTargetBlock(block);
    }
  }
} finally {
  // Release the file handle even if processing throws
  await fd.close();
}
```

### Data Pipeline Processing

Use iterators in data processing pipelines.

```typescript
import { CarBlockIterator } from "@ipld/car/iterator";
import { CarWriter } from "@ipld/car/writer";
import fs from 'fs';
import { Readable } from 'stream';

// shouldKeepRoot, shouldKeepBlock and transformBlock are application-defined placeholders

// Transform and filter CAR contents
const inputStream = fs.createReadStream('input.car');
const blockIterator = await CarBlockIterator.fromIterable(inputStream);

// Create output CAR
const roots = await blockIterator.getRoots();
const filteredRoots = roots.filter(root => shouldKeepRoot(root));
const { writer, out } = CarWriter.create(filteredRoots);

Readable.from(out).pipe(fs.createWriteStream('filtered.car'));

// Process and filter blocks
for await (const block of blockIterator) {
  if (shouldKeepBlock(block)) {
    // Optionally transform block data; the result must be a { cid, bytes }
    // pair whose CID matches the (possibly changed) bytes
    const transformedBlock = transformBlock(block);
    await writer.put(transformedBlock);
  }
}

await writer.close();
// All blocks have been handed to the writer; the file stream may still be flushing
console.log('Filtered CAR created');
```

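The pipeline above logs completion as soon as the writer is closed, which can be before the output file has finished writing. A minimal sketch of one way to wait for the file, using `pipeline` from Node's `stream/promises`, follows. `drainToFile` is a hypothetical helper name, not part of `@ipld/car`, and the sketch assumes a Node.js environment.

```typescript
import { createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical helper: drain an async iterable of byte chunks into a file.
// The returned promise resolves after the write stream has finished and
// rejects if either the source or the file stream fails.
async function drainToFile(
  chunks: AsyncIterable<Uint8Array>,
  filePath: string
): Promise<void> {
  await pipeline(Readable.from(chunks), createWriteStream(filePath));
}

// Intended use with the writer from the example above (not executed here):
//
//   const { writer, out } = CarWriter.create(filteredRoots);
//   const written = drainToFile(out, 'filtered.car'); // start consuming first
//   for await (const block of blockIterator) await writer.put(block);
//   await writer.close();
//   await written; // the file is complete at this point
```

Starting the drain before the first `put()` matters in this sketch, because the writer's output needs a consumer for writes to make progress.
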
### Error Handling

Common errors when using iterators:

- **TypeError**: Invalid input types, such as passing something other than a `Uint8Array` to `fromBytes`
- **Error**: Multiple iteration attempts, malformed CAR data
- **Stream Errors**: Network or file system issues with streams

```typescript
import { CarBlockIterator } from "@ipld/car/iterator";
import fs from 'fs';

// Input errors (invalidData stands in for a value that is not a Uint8Array)
try {
  const iterator = await CarBlockIterator.fromBytes(invalidData);
} catch (error) {
  if (error instanceof TypeError) {
    console.log('Invalid input format');
  }
}

// Iteration errors (carBytes is a Uint8Array holding a CAR archive)
const iterator = await CarBlockIterator.fromBytes(carBytes);

try {
  for await (const block of iterator) {
    // Process blocks
  }

  // Second iteration will fail
  for await (const block of iterator) {
    // Error: Cannot decode more than once
  }
} catch (error) {
  if (error instanceof Error && error.message.includes('decode more than once')) {
    console.log('Iterator can only be used once - create new instance');
  }
}

// Stream errors
const stream = fs.createReadStream('nonexistent.car');
try {
  const iterator = await CarBlockIterator.fromIterable(stream);
  for await (const block of iterator) {
    // Process blocks
  }
} catch (error) {
  // Node.js file system errors carry a string `code` property
  if ((error as NodeJS.ErrnoException).code === 'ENOENT') {
    console.log('File not found');
  }
}
```

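Because each iterator instance can be consumed only once, multi-pass processing needs a fresh instance per pass. The sketch below shows one possible way to structure that with a factory function. `runPasses` and its `open` parameter are hypothetical names introduced for illustration, not part of `@ipld/car`.

```typescript
// Hypothetical helper: run several passes over the same data, opening a
// fresh async iterable for each pass so that no instance is iterated twice.
async function runPasses<T>(
  open: () => Promise<AsyncIterable<T>>,
  passes: Array<(items: AsyncIterable<T>) => Promise<void>>
): Promise<void> {
  for (const pass of passes) {
    const items = await open(); // new instance per pass, never reused
    await pass(items);
  }
}

// Intended use with a CID iterator (not executed here; countCids and
// collectTargets are placeholder pass functions):
//
//   await runPasses(
//     () => CarCIDIterator.fromIterable(fs.createReadStream('data.car')),
//     [countCids, collectTargets]
//   );
```

Each pass re-reads the source from the start, so this trades extra I/O for the ability to keep memory usage flat.
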
## Performance Considerations

### Memory Usage
- **CarBlockIterator**: Uses minimal memory; processes one block at a time
- **CarCIDIterator**: More efficient than CarBlockIterator when block data is not needed
- Both are suitable for CAR files larger than available memory when created with `fromIterable`; `fromBytes` needs the whole archive in memory as a `Uint8Array`

### Processing Speed
- Streaming iteration is typically I/O bound (disk/network speed)
- CID iteration is generally faster than block iteration
- Consider batch processing for better throughput
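
The batch processing suggestion can be sketched as a small generic helper that groups items from any async iterable into fixed-size arrays. `inBatches` is a hypothetical name, not part of `@ipld/car`, and the batch size is something to tune per workload.

```typescript
// Hypothetical helper: group items from an async iterable into arrays of
// at most `size` items. The last batch may be smaller.
async function* inBatches<T>(
  source: AsyncIterable<T>,
  size: number
): AsyncGenerator<T[]> {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = []; // start a new array so the yielded one is not mutated
    }
  }
  if (batch.length > 0) {
    yield batch; // final partial batch
  }
}

// Intended use with a block iterator (not executed here; processBlock is
// an application-defined placeholder):
//
//   for await (const blocks of inBatches(blockIterator, 100)) {
//     await Promise.all(blocks.map(processBlock));
//   }
```

A batch holds up to `size` blocks in memory at once, so larger batches trade memory for fewer flushes.
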
### Use Cases
- **Large File Processing**: Process CAR files larger than available memory
- **Selective Processing**: Filter blocks without loading all data
- **Data Analysis**: Analyze CAR contents and structure
- **Format Conversion**: Transform CAR files to other formats
- **Validation**: Verify CAR integrity and block accessibility
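
For the validation use case, one possible check is to recompute each block's hash and compare it with the digest embedded in its CID. The sketch below assumes this comparison is left to the caller, handles only sha2-256 (multihash code `0x12`), and relies on Node's `crypto` module. `BlockLike` and `verifySha256` are hypothetical names that describe only the fields the sketch reads.

```typescript
import { createHash } from 'node:crypto';

// Minimal structural view of a block: only the fields this sketch reads.
interface BlockLike {
  cid: { multihash: { code: number; digest: Uint8Array } };
  bytes: Uint8Array;
}

const SHA2_256 = 0x12; // multihash code for sha2-256

// Returns true or false for sha2-256 blocks, and null when the CID uses a
// hash function this sketch does not handle.
function verifySha256(block: BlockLike): boolean | null {
  const { code, digest } = block.cid.multihash;
  if (code !== SHA2_256) {
    return null;
  }
  const actual = createHash('sha256').update(block.bytes).digest();
  return actual.equals(digest);
}

// Intended use with a block iterator (not executed here):
//
//   for await (const block of blockIterator) {
//     if (verifySha256(block) === false) {
//       console.error(`Hash mismatch for ${block.cid}`);
//     }
//   }
```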