tessl/npm-ipld--car

Content Addressable aRchive format reader and writer for IPLD data structures.

Block and CID Iteration

Name: tessl/npm-ipld--car
Author: tessl

Memory-efficient iteration over CAR contents without loading the entire archive into memory. The iterators provide streaming access to blocks or CIDs, which suits large archives and memory-constrained environments.

Capabilities

CarBlockIterator Class

Provides streaming iteration over all blocks in a CAR archive.

/**
 * Streaming iterator over all blocks in a CAR archive
 * Processes blocks one at a time without loading entire archive into memory
 * Can only be iterated once per instance
 */
class CarBlockIterator {
  /** CAR version number (1 or 2) */
  readonly version: number;
  
  /** Get the list of root CIDs from the CAR header */
  getRoots(): Promise<CID[]>;
  
  /** Iterate over all blocks in the CAR */
  [Symbol.asyncIterator](): AsyncIterator<Block>;
  
  /** Create iterator from Uint8Array */
  static fromBytes(bytes: Uint8Array): Promise<CarBlockIterator>;
  
  /** Create iterator from async stream */
  static fromIterable(asyncIterable: AsyncIterable<Uint8Array>): Promise<CarBlockIterator>;
}

Usage Examples:

import { CarBlockIterator } from "@ipld/car/iterator";
import fs from 'fs';

// Iterate from bytes (the whole file is read into memory first)
const carBytes = fs.readFileSync('archive.car');
const iterator = await CarBlockIterator.fromBytes(carBytes);

// Iterate from stream (more memory efficient)
const stream = fs.createReadStream('large-archive.car');
const streamIterator = await CarBlockIterator.fromIterable(stream);

// Access roots
const roots = await iterator.getRoots();
console.log(`Processing CAR with ${roots.length} roots`);

// Process all blocks
for await (const block of iterator) {
  console.log(`Block ${block.cid}: ${block.bytes.length} bytes`);
  
  // Process block data
  await processBlock(block);
}
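The `processBlock` call above is a placeholder. As a minimal sketch of a consumer that keeps memory flat, the helper below accepts any async iterable of `{ cid, bytes }` objects, such as a `CarBlockIterator`, and retains only running totals. The name `summarizeBlocks` is hypothetical and not part of `@ipld/car`.

```javascript
// Hypothetical helper (not part of @ipld/car): consumes any async iterable of
// { cid, bytes } objects and keeps only running totals, never the blocks.
async function summarizeBlocks(blocks) {
  let count = 0;
  let totalBytes = 0;
  let largest = null;

  for await (const { cid, bytes } of blocks) {
    count++;
    totalBytes += bytes.length;
    // Remember only the CID string and size of the largest block seen so far
    if (largest === null || bytes.length > largest.size) {
      largest = { cid: cid.toString(), size: bytes.length };
    }
  }

  return { count, totalBytes, largest };
}

// Assumed usage with an iterator created as in the example above:
//   const summary = await summarizeBlocks(iterator);
//   console.log(summary.count, summary.totalBytes);
```

Because the helper consumes the iterator, the same instance cannot be iterated again afterwards.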

CarCIDIterator Class

Provides streaming iteration over all CIDs in a CAR archive without loading block data.

/**
 * Streaming iterator over all CIDs in a CAR archive
 * More memory efficient than CarBlockIterator when block data is not needed
 * Can only be iterated once per instance
 */
class CarCIDIterator {
  /** CAR version number (1 or 2) */
  readonly version: number;
  
  /** Get the list of root CIDs from the CAR header */
  getRoots(): Promise<CID[]>;
  
  /** Iterate over all CIDs in the CAR */
  [Symbol.asyncIterator](): AsyncIterator<CID>;
  
  /** Create iterator from Uint8Array */
  static fromBytes(bytes: Uint8Array): Promise<CarCIDIterator>;
  
  /** Create iterator from async stream */
  static fromIterable(asyncIterable: AsyncIterable<Uint8Array>): Promise<CarCIDIterator>;
}

Usage Examples:

import { CarCIDIterator } from "@ipld/car/iterator";
import fs from 'fs';

// More efficient when you only need CIDs
const stream = fs.createReadStream('large-archive.car');
const cidIterator = await CarCIDIterator.fromIterable(stream);

// Collect all CIDs
const allCids = [];
for await (const cid of cidIterator) {
  allCids.push(cid);
  console.log(`Found CID: ${cid}`);
}

console.log(`Archive contains ${allCids.length} blocks`);
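Collecting every CID into an array, as above, grows with the archive. If only counts are needed, a sketch like the following keeps one string per distinct CID and also reports repeats, should an archive happen to list the same CID more than once. The name `countCids` is hypothetical, and the helper works on any async iterable whose items have a `toString()` method.

```javascript
// Hypothetical helper (not part of @ipld/car): counts total and distinct CIDs
// from any async iterable of CID-like objects.
async function countCids(cids) {
  const seen = new Set();
  let total = 0;

  for await (const cid of cids) {
    total++;
    seen.add(cid.toString()); // string form is used as the identity key
  }

  return { total, unique: seen.size, duplicates: total - seen.size };
}

// Assumed usage with a CarCIDIterator created as in the example above:
//   const { total, unique } = await countCids(cidIterator);
```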

Streaming Processing Patterns

Efficient patterns for processing large CAR files.

import { CarBlockIterator, CarCIDIterator } from "@ipld/car/iterator";
import fs from 'fs';

// Pattern 1: Filter and process specific blocks
const stream1 = fs.createReadStream('data.car');
const blockIterator = await CarBlockIterator.fromIterable(stream1);

for await (const block of blockIterator) {
  // Filter blocks by some criteria
  if (isRelevantBlock(block.cid)) {
    await processBlock(block);
  }
  
  // Memory management - process in batches
  if (shouldFlushBatch()) {
    await flushProcessedData();
  }
}

// Pattern 2: CID analysis without loading block data
const stream2 = fs.createReadStream('data.car');
const cidIterator = await CarCIDIterator.fromIterable(stream2);

const cidStats = {
  total: 0,
  byCodec: new Map(),
  byHashType: new Map()
};

for await (const cid of cidIterator) {
  cidStats.total++;
  
  // Analyze CID properties without loading block data
  const codec = cid.code;
  const hashType = cid.multihash.code;
  
  cidStats.byCodec.set(codec, (cidStats.byCodec.get(codec) || 0) + 1);
  cidStats.byHashType.set(hashType, (cidStats.byHashType.get(hashType) || 0) + 1);
}

console.log('CAR Analysis:', cidStats);
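The maps in `cidStats` are keyed by numeric codes, which are hard to read in a log. A small sketch for labelling them follows. The lookup tables cover only a handful of entries from the multicodec table, and `labelCounts` is a hypothetical helper, not part of `@ipld/car`.

```javascript
// A small, incomplete subset of the multicodec table; extend as needed.
const CODEC_NAMES = new Map([
  [0x55, 'raw'],
  [0x70, 'dag-pb'],
  [0x71, 'dag-cbor'],
  [0x0129, 'dag-json'],
]);

const HASH_NAMES = new Map([
  [0x12, 'sha2-256'],
  [0x13, 'sha2-512'],
  [0xb220, 'blake2b-256'],
]);

// Hypothetical helper: converts a Map of code -> count into a plain object
// keyed by name, falling back to the hex code when the name is unknown.
function labelCounts(counts, names) {
  const labelled = {};
  for (const [code, n] of counts) {
    const label = names.get(code) ?? `0x${code.toString(16)}`;
    labelled[label] = n;
  }
  return labelled;
}

// Assumed usage with the cidStats object built above:
//   console.log(labelCounts(cidStats.byCodec, CODEC_NAMES));
//   console.log(labelCounts(cidStats.byHashType, HASH_NAMES));
```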

Selective Block Loading

Combine CID iteration with selective block loading for memory efficiency.

import { CarCIDIterator } from "@ipld/car/iterator";
import { CarIndexer } from "@ipld/car/indexer";
import { CarReader } from "@ipld/car/reader";
import fs from 'fs';

// First pass: identify blocks of interest using CID iterator
const stream1 = fs.createReadStream('large-archive.car');
const cidIterator = await CarCIDIterator.fromIterable(stream1);

const targetCids = new Set();
for await (const cid of cidIterator) {
  if (isTargetCid(cid)) {
    targetCids.add(cid.toString());
  }
}

console.log(`Found ${targetCids.size} target CIDs`);

// Second pass: load only target blocks using indexer + raw reading
const fd = await fs.promises.open('large-archive.car', 'r');
try {
  const stream2 = fs.createReadStream('large-archive.car');
  const indexer = await CarIndexer.fromIterable(stream2);

  for await (const blockIndex of indexer) {
    if (targetCids.has(blockIndex.cid.toString())) {
      // readRaw reads a single block from the file using the index entry
      const block = await CarReader.readRaw(fd, blockIndex);
      await processTargetBlock(block);
    }
  }
} finally {
  // Release the file handle even if indexing or processing fails
  await fd.close();
}
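The second pass above scans the whole index even after every target has been found. One possible refinement, sketched here under the assumption that only the first occurrence of each target CID is wanted, is a generator that stops as soon as all targets are matched. `firstMatches` is a hypothetical name, and it accepts any async iterable of entries that carry a `cid` property.

```javascript
// Hypothetical helper (not part of @ipld/car): yields the first index entry
// for each target CID string and stops once every target has been seen.
async function* firstMatches(indexEntries, targetCids) {
  const remaining = new Set(targetCids);
  if (remaining.size === 0) return;

  for await (const entry of indexEntries) {
    // Set.delete returns true only if the key was still outstanding
    if (remaining.delete(entry.cid.toString())) {
      yield entry;
      if (remaining.size === 0) return; // all targets found; stop early
    }
  }
}

// Assumed usage with the indexer, fd and targetCids from the example above:
//   for await (const entry of firstMatches(indexer, targetCids)) {
//     const block = await CarReader.readRaw(fd, entry);
//     await processTargetBlock(block);
//   }
```

Whether the underlying read stream is also closed when iteration stops early is not covered by this sketch, so destroying the stream explicitly afterwards may be prudent.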

Data Pipeline Processing

Use iterators in data processing pipelines.

import { CarBlockIterator } from "@ipld/car/iterator";
import { CarWriter } from "@ipld/car/writer";
import fs from 'fs';
import { Readable } from 'stream';

// Transform and filter CAR contents
const inputStream = fs.createReadStream('input.car');
const blockIterator = await CarBlockIterator.fromIterable(inputStream);

// Create output CAR
const roots = await blockIterator.getRoots();
const filteredRoots = roots.filter(root => shouldKeepRoot(root));
const { writer, out } = CarWriter.create(filteredRoots);

Readable.from(out).pipe(fs.createWriteStream('filtered.car'));

// Process and filter blocks
for await (const block of blockIterator) {
  if (shouldKeepBlock(block)) {
    // Optionally transform block data; if the bytes change, the transformed
    // block needs a CID recomputed from the new bytes
    const transformedBlock = transformBlock(block);
    await writer.put(transformedBlock);
  }
}

await writer.close();
console.log('Filtered CAR created');
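The filter loop above can be factored into a reusable function. The sketch below assumes only that the writer exposes `put(block)` and `close()` methods returning promises, as `CarWriter` does. `copyBlocks` is a hypothetical name, and the choice to close the writer in `finally` is one option among several.

```javascript
// Hypothetical helper (not part of @ipld/car): copies blocks that pass `keep`
// from any async iterable into a writer with put() and close() methods.
async function copyBlocks(blocks, writer, keep = () => true) {
  let written = 0;
  try {
    for await (const block of blocks) {
      if (await keep(block)) {
        // Awaiting put() before reading the next block limits buffering
        await writer.put(block);
        written++;
      }
    }
  } finally {
    // Always end the output, even if reading or filtering throws
    await writer.close();
  }
  return written;
}

// Assumed usage with the blockIterator and writer from the example above:
//   const written = await copyBlocks(blockIterator, writer, shouldKeepBlock);
```

Closing in `finally` means the output stream ends even when a block fails. Whether a partially written file should then be deleted is left to the caller.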

Error Handling

Common errors when using iterators:

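TypeError: Invalid input types, such as passing something other than a Uint8Array to fromBytes() or a non-async-iterable to fromIterable()
Error: A second iteration attempt on the same instance, or malformed CAR data
Stream errors: Network or file system failures surfaced by the underlying stream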

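import { CarBlockIterator } from "@ipld/car/iterator";
import fs from 'fs';

// Input type errors: invalidData stands in for a value that is not a Uint8Array
try {
  const iterator = await CarBlockIterator.fromBytes(invalidData);
} catch (error) {
  if (error instanceof TypeError) {
    console.log('Invalid input format');
  } else {
    throw error; // do not swallow unrelated failures
  }
}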

// Iteration errors
const iterator = await CarBlockIterator.fromBytes(carBytes);

try {
  for await (const block of iterator) {
    // Process blocks
  }
  
  // Second iteration will fail
  for await (const block of iterator) {
    // Error: Cannot decode more than once
  }
} catch (error) {
  if (error.message.includes('decode more than once')) {
    console.log('Iterator can only be used once - create new instance');
  }
}

// Stream errors
const stream = fs.createReadStream('nonexistent.car');
try {
  const iterator = await CarBlockIterator.fromIterable(stream);
  for await (const block of iterator) {
    // Process blocks
  }
} catch (error) {
  if (error.code === 'ENOENT') {
    console.log('File not found');
  }
}
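The three checks shown above could be gathered into one function so that calling code handles a single category value. This is only a sketch: `describeCarError` and its category labels are made up for illustration, and the message test relies on the "decode more than once" wording shown in the example above.

```javascript
// Hypothetical helper (not part of @ipld/car): maps an error caught while
// creating or iterating a CAR iterator to a coarse category label.
function describeCarError(error) {
  if (error instanceof TypeError) {
    return 'invalid-input'; // wrong argument type passed to a factory method
  }
  if (error && error.code === 'ENOENT') {
    return 'file-not-found'; // raised by the underlying fs read stream
  }
  if (
    error &&
    typeof error.message === 'string' &&
    error.message.includes('decode more than once')
  ) {
    return 'already-iterated'; // the instance has already been consumed
  }
  return 'unknown';
}

// Assumed usage inside a catch block:
//   } catch (error) {
//     console.log(`CAR processing failed: ${describeCarError(error)}`);
//   }
```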

Performance Considerations

Memory Usage

CarBlockIterator: Holds one block in memory at a time while iterating
CarCIDIterator: More efficient than CarBlockIterator when block data is not needed
Both are suitable for very large CAR files when created with fromIterable() on a stream; fromBytes() requires the whole archive to be in memory already
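When downstream work is cheaper in groups, for example bulk database writes, memory can still be bounded by grouping items into fixed-size batches. The generator below is a minimal sketch of that idea. `inBatches` is a hypothetical name, and it works on any async iterable, including either iterator class.

```javascript
// Hypothetical helper (not part of @ipld/car): groups items from any async
// iterable into arrays of at most `size` items, so at most one batch is
// held in memory at a time.
async function* inBatches(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }

  let batch = [];
  for await (const item of items) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = []; // start a fresh array so the yielded one can be collected
    }
  }

  // Emit the final, possibly shorter, batch
  if (batch.length > 0) yield batch;
}

// Assumed usage with a CarBlockIterator named blockIterator:
//   for await (const blocks of inBatches(blockIterator, 100)) {
//     await flushProcessedData(blocks);
//   }
```

Peak memory then scales with the batch size times the typical block size rather than with the archive size.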