tessl/npm-crawler

A ready-to-use web spider with support for proxies, asynchronous requests, rate limiting, configurable request pools, jQuery, and HTTP/2.

Multi-Format Web Resource Aggregator

Name: tessl/npm-crawler
Rating: 0.94 (1 review)
Author: tessl

A web resource aggregation tool that fetches content from multiple URLs and processes each response according to its Content-Type header.

Capabilities

Fetches and processes HTML content

When fetching a URL that returns HTML (Content-Type: text/html), the response should be automatically parsed and the page title should be extracted. @test
When fetching a URL that returns HTML with a specific charset (e.g., Content-Type: text/html; charset=utf-8), the content should be properly decoded. @test
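
The two HTML requirements above can be sketched without any dependency. This is a minimal illustration, not the generated implementation: `parseContentType` and `processHtml` are hypothetical helper names, the UTF-8 fallback is an assumption, and the regex-based title lookup stands in for a real HTML parser. The crawler package documents a cheerio-backed `res.$` selector for HTML responses, which would normally be the simpler route when the library is installed.

```javascript
// Hypothetical helper: split a Content-Type header into media type and charset.
function parseContentType(header) {
  const [type, ...params] = String(header || '').split(';');
  let charset = 'utf-8'; // assumption: fall back to UTF-8 when no charset is given
  for (const param of params) {
    const [key, value] = param.split('=');
    if (key && value && key.trim().toLowerCase() === 'charset') {
      // Strip optional surrounding quotes and normalise case.
      charset = value.trim().replace(/^"|"$/g, '').toLowerCase();
    }
  }
  return { mediaType: type.trim().toLowerCase(), charset };
}

// Hypothetical helper: decode raw bytes with the declared charset,
// then extract the text of the first <title> element.
function processHtml(buffer, contentTypeHeader) {
  const { charset } = parseContentType(contentTypeHeader);
  // TextDecoder throws a RangeError for charset labels it does not support.
  const body = new TextDecoder(charset).decode(buffer);
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(body);
  return { title: match ? match[1].trim() : '', body };
}
```

Decoding from the raw bytes, rather than from an already-decoded string, is what lets the charset parameter take effect.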

Fetches and processes JSON content

When fetching a URL that returns JSON (Content-Type: application/json), the response body should be automatically parsed into a JavaScript object. @test
When the JSON endpoint returns an array, it should be correctly parsed as an array. @test
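
A minimal sketch of the JSON step, assuming the body arrives as a string or a UTF-8 `Buffer`. `processJson` is a hypothetical helper name, and returning an `{ error, data }` pair instead of throwing is a design choice made for this example only. `JSON.parse` handles objects and arrays identically, so the array case needs no special branch.

```javascript
// Hypothetical helper: parse a JSON response body without throwing.
function processJson(body) {
  // Assumption: binary bodies are UTF-8 encoded JSON text.
  const text = Buffer.isBuffer(body) ? body.toString('utf8') : String(body);
  try {
    return { error: null, data: JSON.parse(text) };
  } catch (err) {
    // Malformed JSON surfaces as a SyntaxError instead of crashing the caller.
    return { error: err, data: null };
  }
}
```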

Fetches and processes binary content

When fetching a URL that returns binary data (e.g., an image with Content-Type: image/png), the raw binary data should be preserved without text encoding. @test
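
The sketch below illustrates why binary payloads must stay as a `Buffer`. `processBinary` is a hypothetical helper name; the eight bytes are the standard PNG file signature. How the raw bytes are obtained depends on the HTTP client: the crawler documentation describes an `encoding: null` option for receiving the body as a `Buffer`, which should be checked against the installed version.

```javascript
// Hypothetical helper: wrap raw bytes in the shape the API section describes.
function processBinary(buffer) {
  return { buffer, size: buffer.length };
}

// The 8-byte PNG file signature; 0x89 is not a valid UTF-8 lead byte.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Preserved path: the bytes are passed through untouched.
const preserved = processBinary(pngSignature);

// Lossy path: decoding as UTF-8 replaces the invalid byte with U+FFFD,
// so re-encoding does not reproduce the original bytes.
const roundTripped = Buffer.from(pngSignature.toString('utf8'), 'utf8');
const survivedRoundTrip = roundTripped.equals(pngSignature); // false
```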

Handles multiple URLs concurrently

When provided with multiple URLs of different content types (HTML, JSON, and binary), all should be fetched and processed correctly according to their respective Content-Type. @test
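
One way to fetch several URLs concurrently while keeping results in input order is a small bounded worker pool. The sketch below is an assumption about how such a pool could look, not a description of how crawler schedules requests internally; `mapWithLimit` and its parameters are hypothetical names. The crawler package exposes its own pool size through a `maxConnections` option, which would replace this helper in practice.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight, returning results in the same order as the input.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runner() {
    // Each runner claims the next index synchronously, so no item is taken twice.
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Writing each result to its own index means a slow HTML page cannot reorder the JSON or binary results that finish before it.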

Implementation

@generates

API

/**
 * Fetches and processes web resources from the given URLs.
 * Automatically handles different content types based on Content-Type headers.
 *
 * @param {string[]} urls - Array of URLs to fetch
 * @param {function} callback - Called when all resources are fetched
 *   Receives (error, results) where results is an array of objects:
 *   { url: string, contentType: string, data: any }
 *   - For HTML: data contains { title: string, body: string }
 *   - For JSON: data contains the parsed JavaScript object/array
 *   - For binary: data contains { buffer: Buffer, size: number }
 */
function fetchResources(urls, callback) {
  // IMPLEMENTATION HERE
}

module.exports = {
  fetchResources
};
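
To make the callback contract above concrete, here is a rough sketch of one possible shape for `fetchResources`. It deliberately swaps the crawler dependency for an injected fetch-like function so that it runs on its own: `createFetchResources` and `fetchImpl` are hypothetical names, the default relies on the global `fetch` in Node 18+, and treating every type other than `text/html` and `application/json` as binary is an assumption. HTML bodies are decoded as UTF-8 here for brevity.

```javascript
// Sketch only: an injected fetch-like function replaces the crawler package.
function createFetchResources(fetchImpl = globalThis.fetch) {
  return function fetchResources(urls, callback) {
    const tasks = urls.map(async (url) => {
      const res = await fetchImpl(url);
      const contentType = res.headers.get('content-type') || '';
      const mediaType = contentType.split(';')[0].trim().toLowerCase();
      const buffer = Buffer.from(await res.arrayBuffer());

      let data;
      if (mediaType === 'text/html') {
        const body = buffer.toString('utf8'); // assumption: UTF-8 HTML
        const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(body);
        data = { title: match ? match[1].trim() : '', body };
      } else if (mediaType === 'application/json') {
        data = JSON.parse(buffer.toString('utf8'));
      } else {
        data = { buffer, size: buffer.length }; // assumption: everything else is binary
      }
      return { url, contentType, data };
    });

    // Promise.all keeps results in input order and rejects on the first failure.
    Promise.all(tasks).then(
      (results) => callback(null, results),
      (error) => callback(error)
    );
  };
}
```

Passing the two handlers to a single `then` call ensures the callback fires exactly once, even if the success handler itself throws.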

Dependencies { .dependencies }

crawler { .dependency }

Provides web scraping and HTTP request capabilities with automatic content-type based response processing.

Install with Tessl CLI

npx tessl i tessl/npm-crawler