# Indexed Reading (Node.js)

File-based indexed reading with random access capabilities for large CAR files. CarIndexedReader pre-indexes CAR files and maintains an open file descriptor for efficient block retrieval by CID. **This functionality is only available in Node.js.**

## Capabilities

### CarIndexedReader Class

Provides memory-efficient random access to large CAR files through pre-built indices.

```typescript { .api }
/**
 * File-based CAR reader with pre-built index for random access
 * Maintains open file descriptor and in-memory CID-to-location mapping
 * Significantly more memory efficient than CarReader for large files
 * Node.js only - not available in browser environments
 */
class CarIndexedReader {
  /** CAR version number (1 or 2) */
  readonly version: number;

  /** Get the list of root CIDs from the CAR header */
  getRoots(): Promise<CID[]>;

  /** Check whether a given CID exists within the CAR */
  has(key: CID): Promise<boolean>;

  /** Fetch a Block from the CAR by CID, returns undefined if not found */
  get(key: CID): Promise<Block | undefined>;

  /** Returns async iterator over all blocks in the CAR */
  blocks(): AsyncGenerator<Block>;

  /** Returns async iterator over all CIDs in the CAR */
  cids(): AsyncGenerator<CID>;

  /** Close the underlying file descriptor - must be called for cleanup */
  close(): Promise<void>;

  /** Create indexed reader from file path, builds complete index in memory */
  static fromFile(path: string): Promise<CarIndexedReader>;
}
```

**Usage Examples:**

```typescript
import { CarIndexedReader } from "@ipld/car/indexed-reader";

// Create indexed reader from file
const reader = await CarIndexedReader.fromFile('large-archive.car');

// Random access to blocks (very efficient)
const roots = await reader.getRoots();
for (const root of roots) {
  // get() resolves to undefined when the CID is absent (a root is not
  // guaranteed to be stored in the CAR), so guard before using the block
  const block = await reader.get(root);
  if (block) {
    console.log(`Root block ${root}: ${block.bytes.length} bytes`);
  }
}

// Iterate through all blocks (uses index for efficient access)
for await (const block of reader.blocks()) {
  console.log(`Block ${block.cid}: ${block.bytes.length} bytes`);
}

// Important: Always close when done
await reader.close();
```

### Efficient Random Access Patterns

Look up many CIDs against a single open reader so that the index is built only once.

```typescript
import { CarIndexedReader } from "@ipld/car/indexed-reader";

// Pattern 1: Bulk CID lookups
const reader = await CarIndexedReader.fromFile('massive-archive.car');
// cid1..cid5 are placeholders for CID instances obtained elsewhere
const targetCids = [cid1, cid2, cid3, cid4, cid5];

// Efficient bulk checking
const existingCids = [];
for (const cid of targetCids) {
  if (await reader.has(cid)) {
    existingCids.push(cid);
  }
}

// Bulk retrieval
const blocks = new Map();
for (const cid of existingCids) {
  const block = await reader.get(cid);
  blocks.set(cid.toString(), block);
}

await reader.close();
console.log(`Retrieved ${blocks.size} of ${targetCids.length} requested blocks`);
```
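
The two loops above call `has` and then `get` for each CID. Since `get` already resolves to `undefined` for a missing CID, one pass is enough. The following is a minimal sketch under that assumption, not part of the library: `BlockLike`, `BlockSource`, and `getMany` are hypothetical names, typed structurally so the snippet stands alone. A `CarIndexedReader` should fit the `BlockSource` shape because it exposes a compatible `get` method.

```typescript
// Hypothetical structural types: only the parts of the reader this helper uses.
interface BlockLike {
  bytes: Uint8Array;
}

interface BlockSource<K> {
  get(key: K): Promise<BlockLike | undefined>;
}

// Fetch every requested key in one pass and report which ones were absent.
async function getMany<K extends { toString(): string }>(
  source: BlockSource<K>,
  keys: readonly K[]
): Promise<{ found: Map<string, BlockLike>; missing: K[] }> {
  const found = new Map<string, BlockLike>();
  const missing: K[] = [];
  for (const key of keys) {
    const block = await source.get(key);
    if (block === undefined) {
      missing.push(key);
    } else {
      // Key the map by the string form, as the example above does
      found.set(key.toString(), block);
    }
  }
  return { found, missing };
}
```

Returning the missing keys alongside the found blocks lets the caller decide whether an absent CID is an error or simply something to fetch from another archive.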

### Index-Based Processing

Use the pre-built index for efficient processing patterns.

```typescript
import { CarIndexedReader } from "@ipld/car/indexed-reader";

// Pattern 2: Selective processing with CID-first approach
// isInterestingCid and processBlock are application-defined placeholders
const reader = await CarIndexedReader.fromFile('data.car');

// First, identify all CIDs of interest
const targetCids = [];
for await (const cid of reader.cids()) {
  if (isInterestingCid(cid)) {
    targetCids.push(cid);
  }
}

// Then efficiently retrieve only the blocks we need
for (const cid of targetCids) {
  const block = await reader.get(cid);
  await processBlock(block);
}

await reader.close();
```
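
The example above leaves `isInterestingCid` undefined. One plausible predicate selects CIDs by codec, since a CID carries its multicodec code in the `code` property. This is a sketch only: `CODEC`, `HasCodec`, and `makeCodecFilter` are hypothetical names, and the choice of DAG-CBOR as the "interesting" codec is an assumption for illustration.

```typescript
// Multicodec codes for a few common IPLD codecs.
const CODEC = {
  raw: 0x55,
  dagPb: 0x70,
  dagCbor: 0x71,
} as const;

// Structural stand-in for a CID: only the codec code is inspected.
interface HasCodec {
  code: number;
}

// Build a predicate that accepts CIDs whose codec is in the given set.
function makeCodecFilter(...codes: number[]): (cid: HasCodec) => boolean {
  const wanted = new Set(codes);
  return (cid) => wanted.has(cid.code);
}

// Assumed policy for this sketch: only DAG-CBOR blocks are of interest.
const isInterestingCid = makeCodecFilter(CODEC.dagCbor);
```

Because the filter only inspects the CID, the first pass over `reader.cids()` decides what to fetch without reading any block bytes.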

### Memory vs. CarReader Comparison

Compare memory usage between CarReader and CarIndexedReader.

```typescript
import { CarReader } from "@ipld/car/reader";
import { CarIndexedReader } from "@ipld/car/indexed-reader";
import fs from 'fs';

// CarReader: Loads entire CAR into memory
const carBytes = fs.readFileSync('large-archive.car'); // Full file in memory
const memoryReader = await CarReader.fromBytes(carBytes); // All blocks in memory

// CarIndexedReader: Only index in memory, blocks loaded on demand
const indexedReader = await CarIndexedReader.fromFile('large-archive.car'); // Only index in memory

// Both provide same interface, but very different memory usage
// (someCid is a placeholder for a CID obtained elsewhere)
const block1 = await memoryReader.get(someCid); // Retrieved from memory
const block2 = await indexedReader.get(someCid); // Read from disk on demand

// Cleanup
await indexedReader.close(); // CarReader has no cleanup needed
```

### Integration with File Processing

Combine indexed reading with file operations.

```typescript
import { CarIndexedReader } from "@ipld/car/indexed-reader";

// Process multiple CAR files efficiently
const carFiles = ['archive1.car', 'archive2.car', 'archive3.car'];
// Note: this map keeps every collected block's bytes in memory
const allBlocks = new Map();

for (const filePath of carFiles) {
  const reader = await CarIndexedReader.fromFile(filePath);

  try {
    // Collect all blocks from this archive
    for await (const block of reader.blocks()) {
      allBlocks.set(block.cid.toString(), {
        block,
        source: filePath
      });
    }
  } finally {
    await reader.close(); // Always close, even on errors
  }
}

console.log(`Collected ${allBlocks.size} blocks from ${carFiles.length} archives`);
```

### Resource Management

Each reader holds an open file descriptor until `close()` is called, so release it on every code path.

```typescript
import { CarIndexedReader } from "@ipld/car/indexed-reader";

// processBlock and doWorkWithReader are application-defined placeholders

// Pattern: Using try/finally for cleanup
let reader;
try {
  reader = await CarIndexedReader.fromFile('archive.car');

  // Do work with reader
  const roots = await reader.getRoots();
  for (const root of roots) {
    const block = await reader.get(root);
    await processBlock(block);
  }

} finally {
  // Always cleanup file descriptor
  if (reader) {
    await reader.close();
  }
}

// Pattern: Wrapping open, work, and cleanup in one function
async function processCarFile(filePath) {
  const reader = await CarIndexedReader.fromFile(filePath);

  try {
    // Process file
    return await doWorkWithReader(reader);
  } finally {
    await reader.close();
  }
}
```
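
The try/finally pattern above can be factored into a reusable helper so that callers cannot forget the `close()` call. The following is a sketch rather than a library feature: `Closable` and `withReader` are hypothetical names, and the helper takes an `open` callback so it does not depend on any particular reader class.

```typescript
// Anything with an async close() method, which includes the reader described above.
interface Closable {
  close(): Promise<void>;
}

// Open a resource, run the work, and close the resource whether the work
// resolves or throws. The result of the work is passed through unchanged.
async function withReader<R extends Closable, T>(
  open: () => Promise<R>,
  work: (reader: R) => Promise<T>
): Promise<T> {
  const reader = await open();
  try {
    return await work(reader);
  } finally {
    await reader.close();
  }
}

// Intended use with the indexed reader (not executed here):
// const roots = await withReader(
//   () => CarIndexedReader.fromFile('archive.car'),
//   (reader) => reader.getRoots()
// );
```

If `open` itself rejects, there is nothing to close, so the error propagates directly to the caller.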

### Error Handling

Handle errors specific to file-based operations.

```typescript
import { CarIndexedReader } from "@ipld/car/indexed-reader";

// File access errors
try {
  const reader = await CarIndexedReader.fromFile('nonexistent.car');
  await reader.close(); // Only reached if the file could be opened
} catch (error) {
  if (error.code === 'ENOENT') {
    console.log('CAR file not found');
  } else if (error.code === 'EACCES') {
    console.log('Permission denied accessing CAR file');
  }
}

// Invalid file format errors
try {
  const reader = await CarIndexedReader.fromFile('invalid.car');
  await reader.close(); // Only reached if the file parsed successfully
} catch (error) {
  if (error.message.includes('Invalid CAR')) {
    console.log('File is not a valid CAR archive');
  }
}

// File descriptor errors during operation
// (someCid is a placeholder for a CID obtained elsewhere)
let reader;
try {
  reader = await CarIndexedReader.fromFile('archive.car');
  const block = await reader.get(someCid);
} catch (error) {
  if (error.code === 'EBADF') {
    console.log('File descriptor error - file may have been closed');
  }
} finally {
  if (reader) {
    try {
      await reader.close();
    } catch (closeError) {
      console.log('Error closing reader:', closeError.message);
    }
  }
}
```
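
Under strict TypeScript settings the variable in a `catch` clause is typed `unknown`, so reading `error.code` directly as above may not compile. A small classifier that narrows the value first is one way around this. This is a sketch: `CarOpenFailure` and `classifyOpenError` are hypothetical names, and the `'Invalid CAR'` substring check is copied from the example above rather than taken from a documented error contract.

```typescript
type CarOpenFailure = "not-found" | "permission-denied" | "invalid-car" | "unknown";

// Map an arbitrary thrown value to a coarse failure category.
function classifyOpenError(error: unknown): CarOpenFailure {
  // Node.js system errors carry a string `code` such as ENOENT or EACCES
  if (typeof error === "object" && error !== null) {
    const code = (error as { code?: unknown }).code;
    if (code === "ENOENT") return "not-found";
    if (code === "EACCES") return "permission-denied";
  }
  // Assumption: format errors mention "Invalid CAR" in their message
  if (error instanceof Error && error.message.includes("Invalid CAR")) {
    return "invalid-car";
  }
  return "unknown";
}
```

Callers can then `switch` on the returned category instead of repeating property checks in every `catch` block.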

## Performance Considerations

### Memory Usage
- **Index Size**: The CID-to-location map is held in memory; it is typically much smaller than the file itself
- **Block Loading**: Blocks are read from disk on demand, not all at once
- **Large Files**: CAR files larger than available memory can be handled

### Access Patterns
- **Random Access**: CID lookups resolve through the in-memory index, followed by a read of just that block from disk
- **Sequential Access**: Less efficient than streaming iterators
- **Mixed Access**: Good balance for applications needing both patterns

### File System Considerations
- **File Descriptor Limits**: Each CarIndexedReader holds one file descriptor until `close()` is called
- **Concurrent Access**: Multiple readers can access the same file simultaneously
- **File Locking**: No file locks are taken; this is safe as long as the file is only read while a reader is open
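
Because each open reader holds one file descriptor, processing a large set of CAR files in parallel can run into the process's descriptor limit. One way to stay under it is to cap how many tasks run at once. The sketch below is a generic helper, not part of the library: `mapWithLimit` is a hypothetical name, and the limit you pass is an assumption you would tune to your environment.

```typescript
// Run `task` over all items with at most `limit` tasks in flight.
// Results are returned in the same order as the input items.
async function mapWithLimit<I, O>(
  items: readonly I[],
  limit: number,
  task: (item: I) => Promise<O>
): Promise<O[]> {
  const results: O[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  // Claiming is safe because JavaScript runs this code on a single thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Intended use (not executed here): open at most 8 readers at a time.
// const counts = await mapWithLimit(carFiles, 8, async (path) => {
//   const reader = await CarIndexedReader.fromFile(path);
//   try { return (await reader.getRoots()).length; } finally { await reader.close(); }
// });
```

If one task rejects, the returned promise rejects with that error while tasks already started run to completion, so each task should close its own reader in a `finally` block.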

### Use Cases
- **Large CAR Analysis**: Process CAR files too large for memory
- **Block Servers**: Serve blocks by CID from large archives
- **Data Mining**: Random access patterns over archived data
- **Content Verification**: Validate specific blocks without full loading
- **Selective Extraction**: Extract specific blocks from large archives