# CAR Indexing

`CarIndexer` builds block indices for CAR archives, enabling random access to large CAR files. It processes an archive and generates location metadata for each block without loading block data into memory.

## Capabilities

### CarIndexer Class

Provides efficient indexing of CAR archives with streaming block location generation.

```typescript { .api }
/**
 * Creates block indices for CAR archives
 * Processes header and generates BlockIndex entries for each block
 * Implements AsyncIterable for streaming index generation
 */
class CarIndexer {
  /** CAR version number (1 or 2) */
  readonly version: number;

  /** Get the list of root CIDs from the CAR header */
  getRoots(): Promise<CID[]>;

  /** Iterate over all block indices in the CAR */
  [Symbol.asyncIterator](): AsyncIterator<BlockIndex>;

  /** Create indexer from Uint8Array */
  static fromBytes(bytes: Uint8Array): Promise<CarIndexer>;

  /** Create indexer from async stream */
  static fromIterable(asyncIterable: AsyncIterable<Uint8Array>): Promise<CarIndexer>;
}

/**
 * Block index containing location and size information
 */
interface BlockIndex {
  /** CID of the block */
  cid: CID;
  /** Total length including CID encoding */
  length: number;
  /** Length of block data only (excludes CID) */
  blockLength: number;
  /** Byte offset of entire block entry in CAR */
  offset: number;
  /** Byte offset of block data (after CID) in CAR */
  blockOffset: number;
}
```
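
To illustrate how the `blockOffset` and `blockLength` fields relate to the underlying bytes, the following is a minimal sketch that cuts one block's data out of a CAR buffer already held in memory. The `BlockDataLocation` type and the `sliceBlockData` helper are hypothetical names introduced here and are not part of `@ipld/car`; the sketch assumes both fields are byte positions within the same buffer that was indexed.

```typescript
// Hypothetical structural type: only the two BlockIndex fields needed for slicing.
interface BlockDataLocation {
  blockOffset: number;
  blockLength: number;
}

// Return a view (no copy) of one block's data inside an in-memory CAR buffer.
function sliceBlockData(carBytes: Uint8Array, location: BlockDataLocation): Uint8Array {
  const end = location.blockOffset + location.blockLength;
  if (location.blockOffset < 0 || end > carBytes.length) {
    throw new RangeError(`Block range ${location.blockOffset}..${end} is outside the buffer`);
  }
  // subarray() shares memory with carBytes, so no block data is duplicated
  return carBytes.subarray(location.blockOffset, end);
}
```

Because a `BlockIndex` entry has both fields, it can be passed to such a helper directly. The returned view shares memory with the source buffer, so copy it (for example with `slice()`) if the buffer may be mutated later.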

**Usage Examples:**

```typescript
import { CarIndexer } from "@ipld/car/indexer";
import fs from 'fs';

// Index from bytes
const carBytes = fs.readFileSync('archive.car');
const indexer = await CarIndexer.fromBytes(carBytes);

// Index from stream (more memory efficient)
const stream = fs.createReadStream('large-archive.car');
const streamIndexer = await CarIndexer.fromIterable(stream);

// Access roots
const roots = await indexer.getRoots();
console.log(`Indexing CAR with ${roots.length} roots`);

// Iterate through block indices
for await (const blockIndex of indexer) {
  console.log(`Block ${blockIndex.cid}:`);
  console.log(`  Total length: ${blockIndex.length}`);
  console.log(`  Block data length: ${blockIndex.blockLength}`);
  console.log(`  Starts at byte: ${blockIndex.offset}`);
  console.log(`  Block data at byte: ${blockIndex.blockOffset}`);
}
```

### Building Block Location Maps

Create lookup maps for random access to blocks by CID.

```typescript
import { CarIndexer } from "@ipld/car/indexer";
import fs from 'fs';

// Build complete index map
const stream = fs.createReadStream('archive.car');
const indexer = await CarIndexer.fromIterable(stream);

const blockMap = new Map();
const sizeStats = { totalBlocks: 0, totalBytes: 0 };

for await (const blockIndex of indexer) {
  // Store location info by CID string
  blockMap.set(blockIndex.cid.toString(), {
    offset: blockIndex.offset,
    blockOffset: blockIndex.blockOffset,
    blockLength: blockIndex.blockLength
  });

  // Collect statistics
  sizeStats.totalBlocks++;
  sizeStats.totalBytes += blockIndex.blockLength;
}

console.log(`Indexed ${sizeStats.totalBlocks} blocks, ${sizeStats.totalBytes} total bytes`);

// Use map for random access
// (someTargetCid is a placeholder for a CID obtained elsewhere)
const targetCid = someTargetCid;
const location = blockMap.get(targetCid.toString());
if (location) {
  console.log(`Block ${targetCid} found at offset ${location.blockOffset}`);
}
```
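
The map above lives only in memory, so a later run would have to re-index the archive. One possible way to avoid that is to persist the map as JSON, sketched below. The `BlockLocation` type and the `serializeBlockMap` / `deserializeBlockMap` helpers are hypothetical names, not part of `@ipld/car`; the sketch assumes the map is keyed by CID string and holds the three numeric fields stored in the example above.

```typescript
// Hypothetical shape of the values stored in the location map.
interface BlockLocation {
  offset: number;
  blockOffset: number;
  blockLength: number;
}

// Encode the map as a JSON array of [cidString, location] pairs.
function serializeBlockMap(blockMap: Map<string, BlockLocation>): string {
  return JSON.stringify(Array.from(blockMap.entries()));
}

// Rebuild the map from JSON produced by serializeBlockMap().
function deserializeBlockMap(json: string): Map<string, BlockLocation> {
  const pairs = JSON.parse(json) as [string, BlockLocation][];
  return new Map(pairs);
}
```

The resulting string can be written next to the CAR file and loaded on a later run. A saved index is only valid for the exact archive it was built from, so it should be discarded whenever the archive changes.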

### Integration with Raw Reading

Combine indexing with raw block reading for efficient random access.

```typescript
import { CarIndexer } from "@ipld/car/indexer";
import { CarReader } from "@ipld/car/reader";
import fs from 'fs';

// Index and read specific blocks
const fd = await fs.promises.open('large-archive.car', 'r');
const stream = fs.createReadStream('large-archive.car');
const indexer = await CarIndexer.fromIterable(stream);

// Find and read specific blocks
// (cid1, cid2, cid3 are placeholders for CIDs obtained elsewhere)
const targetCids = [cid1, cid2, cid3];
// Precompute the string forms once so each lookup is a constant-time Set check
const targetCidStrings = new Set(targetCids.map(cid => cid.toString()));
const foundBlocks = new Map();

try {
  for await (const blockIndex of indexer) {
    const cidStr = blockIndex.cid.toString();

    if (targetCidStrings.has(cidStr)) {
      // Read only the blocks we need
      const block = await CarReader.readRaw(fd, blockIndex);
      foundBlocks.set(cidStr, block);

      // Stop early if we found all targets
      if (foundBlocks.size === targetCidStrings.size) {
        break;
      }
    }
  }
} finally {
  // Release the file handle even if indexing or reading fails
  await fd.close();
}

console.log(`Found ${foundBlocks.size} of ${targetCids.length} target blocks`);
```

### Memory-Efficient Large File Processing

Process large CAR files without loading entire contents into memory.

```typescript
import { CarIndexer } from "@ipld/car/indexer";
import fs from 'fs';

// Process very large CAR file efficiently
const stream = fs.createReadStream('massive-archive.car');
const indexer = await CarIndexer.fromIterable(stream);

let processedCount = 0;
let processedBytes = 0;

// shouldProcessBlock() and processBlockIndex() are placeholders for
// application-specific filtering and handling logic
for await (const blockIndex of indexer) {
  // Process blocks in chunks or apply filtering
  if (shouldProcessBlock(blockIndex.cid)) {
    await processBlockIndex(blockIndex);
    processedCount++;
    processedBytes += blockIndex.blockLength;

    // Progress reporting
    if (processedCount % 1000 === 0) {
      console.log(`Processed ${processedCount} blocks, ${processedBytes} bytes`);
    }
  }
}

console.log(`Completed processing: ${processedCount} blocks`);
```

### Error Handling

Common errors when indexing CAR files:

- **TypeError**: Invalid input types (not Uint8Array or async iterable)
- **Error**: Malformed CAR data, invalid headers, unexpected end of data
- **Iteration Errors**: Can only iterate once per CarIndexer instance

```typescript
import { CarIndexer } from "@ipld/car/indexer";

// invalidData and carBytes are placeholders for bytes obtained elsewhere
try {
  const indexer = await CarIndexer.fromBytes(invalidData);
} catch (error) {
  if (error instanceof TypeError) {
    console.log('Invalid input format');
  } else if (error instanceof Error && error.message.includes('Invalid CAR')) {
    console.log('Malformed CAR file');
  } else {
    // Unrecognized failure: rethrow rather than silently swallowing it
    throw error;
  }
}

// Iteration can only be performed once
const indexer = await CarIndexer.fromBytes(carBytes);

// First iteration works
for await (const blockIndex of indexer) {
  // Process blocks
}

// Second iteration will not work - need new indexer instance
// for await (const blockIndex of indexer) { // Won't iterate
```
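
When the same index entries are needed more than once, one way around the single-iteration limit is to drain the indexer into an array a single time and reuse the array. The sketch below shows that idea with a hypothetical `collectEntries` helper, which is not part of `@ipld/car` and works on any `AsyncIterable`.

```typescript
// Drain a single-use async iterable into an array that can be reused freely.
async function collectEntries<T>(source: AsyncIterable<T>): Promise<T[]> {
  const entries: T[] = [];
  for await (const entry of source) {
    entries.push(entry);
  }
  return entries;
}
```

With an indexer this would be called as `const entries = await collectEntries(indexer)`, after which `entries` can be looped over any number of times. The trade-off is memory: the array holds one entry per block (though still no block data), so for very large archives creating a fresh indexer may be the better choice.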

## Performance Considerations

### Memory Usage

- CarIndexer uses minimal memory - only processes one block index at a time
- Block data is never loaded into memory during indexing
- Suitable for indexing very large CAR files

### Processing Speed

- Indexing speed depends on stream/disk I/O performance
- Processing thousands of blocks per second is typical
- Use `fromIterable()` with file streams for best memory efficiency

### Use Cases

- **Random Access Preparation**: Build indices for later block lookups
- **CAR Analysis**: Analyze CAR structure without loading block data
- **Selective Processing**: Identify blocks of interest before reading data
- **Statistics Generation**: Count blocks, analyze size distributions
- **Validation**: Verify CAR structure integrity
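
As an illustration of the statistics use case, the following sketch summarizes block sizes from index entries that have already been collected into an array. `SizedEntry`, `SizeSummary`, and `summarizeBlockSizes` are hypothetical names introduced for this example; only the `blockLength` field is taken from `BlockIndex`.

```typescript
// Hypothetical structural type: any object with a blockLength, such as a BlockIndex.
interface SizedEntry {
  blockLength: number;
}

interface SizeSummary {
  count: number;
  totalBytes: number;
  minBytes: number;
  maxBytes: number;
  meanBytes: number;
}

// Compute count, total, min, max, and mean of block data sizes.
function summarizeBlockSizes(entries: readonly SizedEntry[]): SizeSummary {
  if (entries.length === 0) {
    // Avoid Infinity/NaN results for an empty archive
    return { count: 0, totalBytes: 0, minBytes: 0, maxBytes: 0, meanBytes: 0 };
  }
  let totalBytes = 0;
  let minBytes = Infinity;
  let maxBytes = -Infinity;
  for (const entry of entries) {
    totalBytes += entry.blockLength;
    minBytes = Math.min(minBytes, entry.blockLength);
    maxBytes = Math.max(maxBytes, entry.blockLength);
  }
  return {
    count: entries.length,
    totalBytes,
    minBytes,
    maxBytes,
    meanBytes: totalBytes / entries.length
  };
}
```

For archives too large to collect, the same running totals could instead be updated inside the `for await` loop over the indexer, keeping memory use constant.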