# CAR Indexing

`CarIndexer` builds block indices for CAR files so that individual blocks can later be read by random access. It walks a CAR archive and emits location metadata (offsets and lengths) for each block without decoding or retaining the block data.

## Capabilities

### CarIndexer Class

Provides efficient indexing of CAR archives with streaming block location generation.

```typescript { .api }
/**
 * Creates block indices for CAR archives
 * Processes header and generates BlockIndex entries for each block
 * Implements AsyncIterable for streaming index generation
 */
class CarIndexer {
  /** CAR version number (1 or 2) */
  readonly version: number;

  /** Get the list of root CIDs from the CAR header */
  getRoots(): Promise<CID[]>;

  /** Iterate over all block indices in the CAR */
  [Symbol.asyncIterator](): AsyncIterator<BlockIndex>;

  /** Create indexer from Uint8Array */
  static fromBytes(bytes: Uint8Array): Promise<CarIndexer>;

  /** Create indexer from async stream */
  static fromIterable(asyncIterable: AsyncIterable<Uint8Array>): Promise<CarIndexer>;
}

/**
 * Block index containing location and size information
 */
interface BlockIndex {
  /** CID of the block */
  cid: CID;
  /** Total length including CID encoding */
  length: number;
  /** Length of block data only (excludes CID) */
  blockLength: number;
  /** Byte offset of entire block entry in CAR */
  offset: number;
  /** Byte offset of block data (after CID) in CAR */
  blockOffset: number;
}
```
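
To make the relationship between these fields concrete, here is a minimal sketch of slicing one block's payload out of a CAR that is already in memory. `BlockLocation` and `sliceBlockData` are hypothetical names introduced for illustration and are not part of `@ipld/car`; the sketch relies only on `blockOffset` and `blockLength` as described above.

```typescript
// Hypothetical subset of BlockIndex: only the fields needed to locate the payload.
interface BlockLocation {
  blockOffset: number;
  blockLength: number;
}

// Return a view of the block payload without copying the underlying buffer.
function sliceBlockData(car: Uint8Array, loc: BlockLocation): Uint8Array {
  const end = loc.blockOffset + loc.blockLength;
  if (loc.blockOffset < 0 || end > car.length) {
    throw new RangeError(
      `Block range ${loc.blockOffset}..${end} is outside the CAR (${car.length} bytes)`
    );
  }
  return car.subarray(loc.blockOffset, end);
}
```

Because `subarray` returns a view, the result shares memory with the source buffer; copy it with `slice` if the CAR bytes may be reused or mutated.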

**Usage Examples:**

```typescript
import { CarIndexer } from "@ipld/car/indexer";
import fs from 'fs';

// Index from bytes
const carBytes = fs.readFileSync('archive.car');
const indexer = await CarIndexer.fromBytes(carBytes);

// Index from stream (more memory efficient)
const stream = fs.createReadStream('large-archive.car');
const streamIndexer = await CarIndexer.fromIterable(stream);

// Access roots
const roots = await indexer.getRoots();
console.log(`Indexing CAR with ${roots.length} roots`);

// Iterate through block indices
for await (const blockIndex of indexer) {
  console.log(`Block ${blockIndex.cid}:`);
  console.log(` Total length: ${blockIndex.length}`);
  console.log(` Block data length: ${blockIndex.blockLength}`);
  console.log(` Starts at byte: ${blockIndex.offset}`);
  console.log(` Block data at byte: ${blockIndex.blockOffset}`);
}
```

### Building Block Location Maps

Create lookup maps for random access to blocks by CID.

```typescript
import { CarIndexer } from "@ipld/car/indexer";
import fs from 'fs';

// Build complete index map
const stream = fs.createReadStream('archive.car');
const indexer = await CarIndexer.fromIterable(stream);

const blockMap = new Map();
const sizeStats = { totalBlocks: 0, totalBytes: 0 };

for await (const blockIndex of indexer) {
  // Store location info by CID string
  blockMap.set(blockIndex.cid.toString(), {
    offset: blockIndex.offset,
    blockOffset: blockIndex.blockOffset,
    blockLength: blockIndex.blockLength
  });

  // Collect statistics
  sizeStats.totalBlocks++;
  sizeStats.totalBytes += blockIndex.blockLength;
}

console.log(`Indexed ${sizeStats.totalBlocks} blocks, ${sizeStats.totalBytes} total bytes`);

// Use map for random access
// (`someTargetCid` is a placeholder for a CID obtained elsewhere)
const targetCid = someTargetCid;
const location = blockMap.get(targetCid.toString());
if (location) {
  console.log(`Block ${targetCid} found at offset ${location.blockOffset}`);
}
```
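
An index map like this can be saved so that later runs skip re-indexing. The following is a minimal sketch, assuming the map is keyed by CID string as above; `StoredLocation`, `serializeIndex`, and `parseIndex` are hypothetical helpers, and JSON is only one possible storage format.

```typescript
// Hypothetical shape of the values stored in the location map.
interface StoredLocation {
  offset: number;
  blockOffset: number;
  blockLength: number;
}

// Serialize the map as an array of [cidString, location] pairs.
function serializeIndex(index: Map<string, StoredLocation>): string {
  return JSON.stringify(Array.from(index.entries()));
}

// Rebuild the map from the JSON produced by serializeIndex.
function parseIndex(json: string): Map<string, StoredLocation> {
  const entries = JSON.parse(json) as Array<[string, StoredLocation]>;
  return new Map(entries);
}
```

A saved index is only valid for the exact CAR file it was built from, so it is worth storing alongside something that identifies that file, such as its size or a checksum.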
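
### Integration with Raw Reading

Combine indexing with raw block reading for efficient random access. `CarReader.readRaw` reads from a file descriptor, so this pattern applies to Node.js rather than browser environments.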

```typescript
import { CarIndexer } from "@ipld/car/indexer";
import { CarReader } from "@ipld/car/reader";
import fs from 'fs';

// Open a file handle for random-access reads and a separate stream for indexing
const fd = await fs.promises.open('large-archive.car', 'r');
const stream = fs.createReadStream('large-archive.car');
const indexer = await CarIndexer.fromIterable(stream);

// Find and read specific blocks
// (cid1, cid2, cid3 are placeholders for CIDs obtained elsewhere)
const targetCids = [cid1, cid2, cid3];
const targetKeys = new Set(targetCids.map(cid => cid.toString()));
const foundBlocks = new Map();

try {
  for await (const blockIndex of indexer) {
    const cidStr = blockIndex.cid.toString();

    if (targetKeys.has(cidStr)) {
      // Read only the blocks we need
      const block = await CarReader.readRaw(fd, blockIndex);
      foundBlocks.set(cidStr, block);

      // Stop early if we found all targets
      if (foundBlocks.size === targetKeys.size) {
        break;
      }
    }
  }
} finally {
  // Release the file handle even if indexing or reading throws
  await fd.close();
}

console.log(`Found ${foundBlocks.size} of ${targetKeys.size} target blocks`);
```

### Memory-Efficient Large File Processing

Process large CAR files without loading entire contents into memory.

```typescript
import { CarIndexer } from "@ipld/car/indexer";
import fs from 'fs';

// Process very large CAR file efficiently
const stream = fs.createReadStream('massive-archive.car');
const indexer = await CarIndexer.fromIterable(stream);

let processedCount = 0;
let processedBytes = 0;

// `shouldProcessBlock` and `processBlockIndex` are application-defined
// placeholders; they are not provided by @ipld/car.
for await (const blockIndex of indexer) {
  // Process blocks in chunks or apply filtering
  if (shouldProcessBlock(blockIndex.cid)) {
    await processBlockIndex(blockIndex);
    processedCount++;
    processedBytes += blockIndex.blockLength;

    // Progress reporting
    if (processedCount % 1000 === 0) {
      console.log(`Processed ${processedCount} blocks, ${processedBytes} bytes`);
    }
  }
}

console.log(`Completed processing: ${processedCount} blocks`);
```
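
The loop above leaves `shouldProcessBlock` up to the application. One possible shape, shown as a sketch rather than a prescribed implementation, filters by the CID's multicodec code. The `CidLike` interface and the choice of wanted codecs are assumptions for illustration; `0x70` (dag-pb) and `0x71` (dag-cbor) are the standard multicodec codes.

```typescript
// Minimal structural type: CIDs from `multiformats` expose the codec as `code`.
interface CidLike {
  code: number;
}

const DAG_PB = 0x70;
const DAG_CBOR = 0x71;

// Codecs worth processing in this hypothetical pipeline.
const wantedCodecs = new Set<number>([DAG_PB, DAG_CBOR]);

// Hypothetical filter: accept only blocks whose codec is in the wanted set.
function shouldProcessBlock(cid: CidLike): boolean {
  return wantedCodecs.has(cid.code);
}
```

Filtering on the CID alone keeps the decision cheap, since no block data has to be read before a block is skipped.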

### Error Handling

Common errors when indexing CAR files:

- **TypeError**: Invalid input types (not Uint8Array or async iterable)
- **Error**: Malformed CAR data, invalid headers, unexpected end of data
- **Iteration**: A `CarIndexer` instance can be iterated only once; create a new instance for another pass

```typescript
import { CarIndexer } from "@ipld/car/indexer";

// `invalidData` and `carBytes` are placeholders for input obtained elsewhere
try {
  const indexer = await CarIndexer.fromBytes(invalidData);
} catch (error) {
  if (error instanceof TypeError) {
    console.log('Invalid input format');
  } else if (error instanceof Error && error.message.includes('Invalid CAR')) {
    console.log('Malformed CAR file');
  } else {
    // Other failures (for example, unexpected end of data) are rethrown
    throw error;
  }
}

// Iteration can only be performed once
const indexer = await CarIndexer.fromBytes(carBytes);

// First iteration works
for await (const blockIndex of indexer) {
  // Process blocks
}

// Second iteration will not work - need new indexer instance
// for await (const blockIndex of indexer) { // Won't iterate
```
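
Because an indexer can be consumed only once, code that needs several passes can buffer the entries first. A minimal sketch, assuming the full index fits in memory; `collectAll` is a hypothetical helper that works with any async iterable, including a `CarIndexer`.

```typescript
// Drain an async iterable into an array so it can be traversed repeatedly.
async function collectAll<T>(source: AsyncIterable<T>): Promise<T[]> {
  const items: T[] = [];
  for await (const item of source) {
    items.push(item);
  }
  return items;
}
```

With this helper, `const entries = await collectAll(indexer)` yields a plain array that ordinary `for...of` loops can walk as often as needed. The trade-off is that the index is no longer streamed, so memory use grows with the number of blocks.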

## Performance Considerations

### Memory Usage
- `CarIndexer` keeps memory use low by yielding one block index at a time
- Block data is skipped during indexing rather than decoded or retained
- Suitable for indexing very large CAR files when used with `fromIterable()`; `fromBytes()` needs the whole archive in memory already

### Processing Speed
- Indexing speed depends on stream/disk I/O performance
- Processing thousands of blocks per second is typical, though actual rates vary with block size and hardware
- Use `fromIterable()` with file streams for best memory efficiency

### Use Cases
- **Random Access Preparation**: Build indices for later block lookups
- **CAR Analysis**: Analyze CAR structure without loading block data
- **Selective Processing**: Identify blocks of interest before reading data
- **Statistics Generation**: Count blocks, analyze size distributions
- **Validation**: Verify CAR structure integrity
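
For the statistics use case, block sizes collected during indexing can be summarized without touching block data. A minimal sketch; `SizeSummary` and `summarizeSizes` are hypothetical names, and the input is assumed to be the `blockLength` values gathered from an indexing pass.

```typescript
// Hypothetical summary of block payload sizes, all in bytes.
interface SizeSummary {
  count: number;
  totalBytes: number;
  minBytes: number;
  maxBytes: number;
  meanBytes: number;
}

// Compute count, total, min, max, and mean in a single pass.
function summarizeSizes(blockLengths: number[]): SizeSummary {
  if (blockLengths.length === 0) {
    return { count: 0, totalBytes: 0, minBytes: 0, maxBytes: 0, meanBytes: 0 };
  }
  let totalBytes = 0;
  let minBytes = Infinity;
  let maxBytes = -Infinity;
  for (const len of blockLengths) {
    totalBytes += len;
    if (len < minBytes) minBytes = len;
    if (len > maxBytes) maxBytes = len;
  }
  return {
    count: blockLengths.length,
    totalBytes,
    minBytes,
    maxBytes,
    meanBytes: totalBytes / blockLengths.length,
  };
}
```

For very large archives the same totals can be accumulated inside the indexing loop instead of buffering every length in an array.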