tessl/npm-ipld--car

Content Addressable aRchive format reader and writer for IPLD data structures.

Indexed Reading (Node.js)

CarIndexedReader provides file-based random access to large CAR files. It builds an index of the file when opened and keeps a file descriptor open so that individual blocks can be read from disk by CID. This functionality is only available in Node.js.

Capabilities

CarIndexedReader Class

Provides memory-efficient random access to large CAR files through pre-built indices.

/**
 * File-based CAR reader with pre-built index for random access
 * Maintains open file descriptor and in-memory CID-to-location mapping
 * Significantly more memory efficient than CarReader for large files
 * Node.js only - not available in browser environments
 */
class CarIndexedReader {
  /** CAR version number (1 or 2) */
  readonly version: number;
  
  /** Get the list of root CIDs from the CAR header */  
  getRoots(): Promise<CID[]>;
  
  /** Check whether a given CID exists within the CAR */
  has(key: CID): Promise<boolean>;
  
  /** Fetch a Block from the CAR by CID, returns undefined if not found */
  get(key: CID): Promise<Block | undefined>;
  
  /** Returns async iterator over all blocks in the CAR */
  blocks(): AsyncGenerator<Block>;
  
  /** Returns async iterator over all CIDs in the CAR */
  cids(): AsyncGenerator<CID>;
  
  /** Close the underlying file descriptor - must be called for cleanup */
  close(): Promise<void>;
  
  /** Create indexed reader from file path, builds complete index in memory */
  static fromFile(path: string): Promise<CarIndexedReader>;
}
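
To make the "CID-to-location mapping" concrete, here is a toy model of an offset index: a Map from a string key to the byte range holding that entry's data. This is not the library's implementation. The `buildIndex` and `readBlock` names, the string keys, and the in-memory `Uint8Array` standing in for a file are all assumptions made for illustration.

```javascript
// Toy model of an offset index. NOT the @ipld/car implementation:
// every name and the data layout here are hypothetical.

// Record where each entry's bytes start and how long they are.
function buildIndex(entries) {
  const index = new Map();
  let offset = 0;
  for (const { key, bytes } of entries) {
    index.set(key, { offset, length: bytes.length });
    offset += bytes.length;
  }
  return index;
}

// Look up a key, then slice only that byte range out of the "file".
function readBlock(file, index, key) {
  const location = index.get(key);
  if (location === undefined) return undefined;
  return file.subarray(location.offset, location.offset + location.length);
}

const entries = [
  { key: 'a', bytes: Uint8Array.from([1, 2, 3]) },
  { key: 'b', bytes: Uint8Array.from([4, 5]) }
];
const file = Uint8Array.from([1, 2, 3, 4, 5]); // concatenated entry bytes
const index = buildIndex(entries);
console.log(readBlock(file, index, 'b')); // the two bytes 4 and 5
```

Only the index has to stay in memory. The block bytes are fetched by position when requested, which is why this approach suits large archives.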

Usage Examples:

import { CarIndexedReader } from "@ipld/car/indexed-reader";

// Create indexed reader from file
const reader = await CarIndexedReader.fromFile('large-archive.car');

// Random access to blocks: index lookup in memory, block bytes read from disk
const roots = await reader.getRoots();
for (const root of roots) {
  if (await reader.has(root)) {
    const block = await reader.get(root);
    console.log(`Root block ${root}: ${block.bytes.length} bytes`);
  }
}

// Iterate through all blocks (uses index for efficient access)
for await (const block of reader.blocks()) {
  console.log(`Block ${block.cid}: ${block.bytes.length} bytes`);
}

// Important: Always close when done
await reader.close();

Efficient Random Access Patterns

Optimize random access patterns for large CAR files.

import { CarIndexedReader } from "@ipld/car/indexed-reader";

// Pattern 1: Bulk CID lookups
const reader = await CarIndexedReader.fromFile('massive-archive.car');
const targetCids = [cid1, cid2, cid3, cid4, cid5];

// Efficient bulk checking
const existingCids = [];
for (const cid of targetCids) {
  if (await reader.has(cid)) {
    existingCids.push(cid);
  }
}

// Bulk retrieval
const blocks = new Map();
for (const cid of existingCids) {
  const block = await reader.get(cid);
  blocks.set(cid.toString(), block);
}

await reader.close();
console.log(`Retrieved ${blocks.size} of ${targetCids.length} requested blocks`);
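
Because `get()` returns `undefined` for a CID that is not in the archive, the separate `has()` pass can be folded into retrieval. This is a minimal sketch that assumes only the `get(cid)` method documented above. The `getMany` helper name is hypothetical.

```javascript
// Hypothetical helper: fetch several blocks with one lookup per CID.
// `reader` is anything with an async get(cid) that resolves to a block
// or undefined, such as a CarIndexedReader.
async function getMany(reader, cids) {
  const found = new Map();
  const missing = [];
  for (const cid of cids) {
    const block = await reader.get(cid);
    if (block === undefined) {
      missing.push(cid);
    } else {
      found.set(cid.toString(), block);
    }
  }
  return { found, missing };
}
```

Usage would look like `const { found, missing } = await getMany(reader, targetCids);`, followed by `await reader.close()`.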

Index-Based Processing

Use the pre-built index for efficient processing patterns.

import { CarIndexedReader } from "@ipld/car/indexed-reader";

// Pattern 2: Selective processing with CID-first approach
const reader = await CarIndexedReader.fromFile('data.car');

// First, identify all CIDs of interest
const targetCids = [];
for await (const cid of reader.cids()) {
  if (isInterestingCid(cid)) {
    targetCids.push(cid);
  }
}

// Then efficiently retrieve only the blocks we need
for (const cid of targetCids) {
  const block = await reader.get(cid);
  await processBlock(block);
}

await reader.close();

Memory vs. CarReader Comparison

Compare memory usage between CarReader and CarIndexedReader.

import { CarReader } from "@ipld/car/reader";
import { CarIndexedReader } from "@ipld/car/indexed-reader";
import fs from 'fs';

// CarReader: Loads entire CAR into memory
const carBytes = fs.readFileSync('large-archive.car'); // Full file in memory
const memoryReader = await CarReader.fromBytes(carBytes); // All blocks in memory

// CarIndexedReader: Only index in memory, blocks loaded on demand
const indexedReader = await CarIndexedReader.fromFile('large-archive.car'); // Only index in memory

// Both expose the same read interface but differ greatly in memory usage
const block1 = await memoryReader.get(someCid);   // Retrieved from memory
const block2 = await indexedReader.get(someCid);  // Read from disk on demand

// Cleanup
await indexedReader.close(); // CarReader has no cleanup needed

Integration with File Processing

Combine indexed reading with file operations.

import { CarIndexedReader } from "@ipld/car/indexed-reader";
import fs from 'fs';

// Process multiple CAR files efficiently
const carFiles = ['archive1.car', 'archive2.car', 'archive3.car'];
const allBlocks = new Map();

for (const filePath of carFiles) {
  const reader = await CarIndexedReader.fromFile(filePath);
  
  try {
    // Collect all blocks from this archive
    for await (const block of reader.blocks()) {
      allBlocks.set(block.cid.toString(), {
        block,
        source: filePath
      });
    }
  } finally {
    await reader.close(); // Always close, even on errors
  }
}

console.log(`Collected ${allBlocks.size} blocks from ${carFiles.length} archives`);

Resource Management

Each reader holds an open file descriptor until close() is called, so release it on every code path.

import { CarIndexedReader } from "@ipld/car/indexed-reader";

// Pattern: Using try/finally for cleanup
let reader;
try {
  reader = await CarIndexedReader.fromFile('archive.car');
  
  // Do work with reader
  const roots = await reader.getRoots();
  for (const root of roots) {
    const block = await reader.get(root);
    await processBlock(block);
  }
  
} finally {
  // Always cleanup file descriptor
  if (reader) {
    await reader.close();
  }
}

// Pattern: Wrapping open, work, and close in one function (doWorkWithReader is a placeholder)
async function processCarFile(filePath) {
  const reader = await CarIndexedReader.fromFile(filePath);
  
  try {
    // Process file
    return await doWorkWithReader(reader);
  } finally {
    await reader.close();
  }
}
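
The two patterns above can be generalized into one reusable wrapper. This is a minimal sketch: the `withReader` name and the `open` callback are hypothetical, and the only thing assumed of the reader is the `close()` method documented above.

```javascript
// Hypothetical helper: open a reader, run `work`, and always close it.
// `open` is an async function returning an object with close(), e.g.
// () => CarIndexedReader.fromFile('archive.car').
async function withReader(open, work) {
  const reader = await open();
  try {
    return await work(reader);
  } finally {
    // Runs on success and on error, so the descriptor is never leaked.
    await reader.close();
  }
}
```

A call might look like `const roots = await withReader(() => CarIndexedReader.fromFile('archive.car'), (r) => r.getRoots());`.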

Error Handling

Handle errors specific to file-based operations.

import { CarIndexedReader } from "@ipld/car/indexed-reader";

// File access errors
try {
  const reader = await CarIndexedReader.fromFile('nonexistent.car');
} catch (error) {
  if (error.code === 'ENOENT') {
    console.log('CAR file not found');
  } else if (error.code === 'EACCES') {
    console.log('Permission denied accessing CAR file');
  } else {
    throw error; // Do not swallow unexpected errors
  }
}

// Invalid file format errors  
try {
  const reader = await CarIndexedReader.fromFile('invalid.car');
} catch (error) {
  if (error.message.includes('Invalid CAR')) {
    console.log('File is not a valid CAR archive');
  } else {
    throw error; // Do not swallow unexpected errors
  }
}

// File descriptor errors during operation
let reader;
try {
  reader = await CarIndexedReader.fromFile('archive.car');
  const block = await reader.get(someCid);
} catch (error) {
  if (error.code === 'EBADF') {
    console.log('File descriptor error - file may have been closed');
  }
} finally {
  if (reader) {
    try {
      await reader.close();
    } catch (closeError) {
      console.log('Error closing reader:', closeError.message);
    }
  }
}
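
The checks above can be gathered into a single function so that every call site handles errors the same way. This is a sketch only: `describeCarError` is a hypothetical name, and the `'Invalid CAR'` substring test simply mirrors the example above. Exact message text may differ between library versions.

```javascript
// Hypothetical helper: map the error shapes shown above to short messages.
// Returns null for anything it does not recognize so the caller can rethrow.
function describeCarError(error) {
  switch (error && error.code) {
    case 'ENOENT':
      return 'CAR file not found';
    case 'EACCES':
      return 'Permission denied accessing CAR file';
    case 'EBADF':
      return 'File descriptor error - file may have been closed';
  }
  // Assumption: format errors carry 'Invalid CAR' in their message.
  if (error instanceof Error && error.message.includes('Invalid CAR')) {
    return 'File is not a valid CAR archive';
  }
  return null;
}
```

In a `catch` block this becomes `const msg = describeCarError(error); if (msg === null) throw error; console.log(msg);`.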

Performance Considerations

Memory Usage

  • Index Size: Uses memory for CID-to-location map (typically much smaller than full file)
  • Block Loading: Loads blocks on demand, not all at once
  • Large Files: Can handle CAR files larger than available memory, provided the index itself fits in memory

Access Patterns

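  • Random Access: Each CID lookup is an in-memory index hit followed by a read of just that block from disk
  • Sequential Access: Less efficient than the streaming iterators, because the whole file is indexed up front before any block is returned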
  • Mixed Access: Good balance for applications needing both patterns

File System Considerations

  • File Descriptor Limits: Each CarIndexedReader uses one file descriptor
  • Concurrent Access: Multiple readers can access same file simultaneously
  • File Locking: The reader takes no locks, which is fine for read-only access; the file should not be modified while a reader is open
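
Since each reader holds one file descriptor, opening many archives at once can hit the process's descriptor limit. One way to stay under it is to cap how many files are processed concurrently. The `mapWithLimit` helper below is a hypothetical, generic sketch and is not part of the package.

```javascript
// Hypothetical helper: run `task` over `items` with at most `limit`
// tasks in flight, so at most `limit` readers are open at once.
// `limit` should be at least 1.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }
  const workers = [];
  for (let n = 0; n < Math.min(limit, items.length); n++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

Each task would open a reader with `CarIndexedReader.fromFile(path)`, do its work, and close the reader in a `finally` block before returning.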

Use Cases

  • Large CAR Analysis: Process CAR files too large for memory
  • Block Servers: Serve blocks by CID from large archives
  • Data Mining: Random access patterns over archived data
  • Content Verification: Validate specific blocks without full loading
  • Selective Extraction: Extract specific blocks from large archives
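
As an illustration of the selective-extraction use case, the sketch below copies a chosen set of blocks from a reader to a writer. The `copyBlocks` name is hypothetical. It assumes only that the reader has the `get(cid)` method documented above and that the writer exposes an async `put(block)`.

```javascript
// Hypothetical helper: copy selected blocks from a reader to a writer.
// `reader` needs async get(cid); `writer` needs async put(block).
// CIDs that are not present in the reader are skipped.
async function copyBlocks(reader, writer, cids) {
  let copied = 0;
  for (const cid of cids) {
    const block = await reader.get(cid);
    if (block !== undefined) {
      await writer.put(block);
      copied++;
    }
  }
  return copied; // number of blocks actually written
}
```

With this package the writer would typically come from `CarWriter.create(roots)`, whose output has to be consumed (for example, piped to a file) while blocks are being written.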
