
tessl/npm-crawler

A ready-to-use web spider with proxy support, asynchronous crawling, rate limiting, configurable request pools, server-side jQuery parsing, and HTTP/2 support.


evals/scenario-3/task.md

Multi-Region Product Availability Checker

Build a web scraper that checks product availability across multiple regional e-commerce endpoints. Each region has different rate limiting requirements that must be respected to avoid being blocked.

Requirements

Your system must scrape product availability information from multiple regional endpoints (simulated as different URLs). Each region has its own rate limiting policy:

  • Region A: Maximum 1 request per 2 seconds
  • Region B: Maximum 1 request per 3 seconds
  • Region C: Maximum 1 request per 1 second

The scraper should:

  1. Accept a list of product URLs grouped by region
  2. Fetch product information from each URL while respecting per-region rate limits
  3. Allow multiple regions to be scraped concurrently (different regions should not block each other)
  4. Extract and return the product title from each page
  5. Handle the completion of all requests and report results
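The per-region behavior in steps 2 and 3 can be sketched with one serial queue per region: each queue waits out its region's rate limit between requests, while queues for different regions run independently. This is an illustrative sketch with assumed helper names (`makeRegionQueue`, `fakeFetch`), not the required API.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Returns an enqueue function for one region: requests run strictly one
// after another, separated by at least rateLimitMs.
function makeRegionQueue(rateLimitMs, fetchFn) {
  let chain = Promise.resolve();
  let first = true;
  return (url) => {
    const result = chain.then(async () => {
      if (!first) await sleep(rateLimitMs); // wait out the rate limit
      first = false;
      return fetchFn(url);
    });
    chain = result.catch(() => {}); // keep the queue alive on errors
    return result;
  };
}

// Stub fetch in place of real HTTP. Region A (2 s) and Region C (1 s)
// use separate queues, so neither blocks the other.
const fakeFetch = async (url) => `title of ${url}`;
const regionA = makeRegionQueue(2000, fakeFetch);
const regionC = makeRegionQueue(1000, fakeFetch);

Promise.all([
  regionA('https://a.example/p1'),
  regionA('https://a.example/p2'),
  regionC('https://c.example/p1'),
]).then(console.log); // resolves after ~2 s: region A's second request gates the total
```

Because each region owns its chain, the timing capabilities below fall out directly: two Region A URLs take a little over 2 seconds, while one Region A URL and one Region B URL finish in roughly 3 seconds rather than 5.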

Implementation

@generates

API

/**
 * Creates and configures a multi-region product scraper.
 *
 * @param {Object} config - Configuration for the scraper
 * @param {Array<Object>} config.regions - Array of region configurations
 * @param {string} config.regions[].name - Region identifier
 * @param {number} config.regions[].rateLimit - Minimum milliseconds between requests for this region
 * @param {string} [config.regions[].proxy] - Proxy URL for this region
 * @returns {Object} Scraper instance with methods to add tasks and handle completion
 */
function createScraper(config) {
  // Returns an object with:
  // - addTask(regionName, url, callback): adds a scraping task for a specific region
  // - onComplete(callback): registers a callback for when all tasks finish
  // - start(): begins processing the queue
}

module.exports = { createScraper };
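One possible in-memory shape of this contract, with a stubbed fetch step in place of real HTTP so the `addTask`/`onComplete`/`start` flow is concrete. This is a sketch under those assumptions, not the reference implementation (real code would fetch the URL and extract the title, and would need to handle tasks added after `start()`):

```javascript
function createScraper(config) {
  const queues = {}; // regionName -> { rateLimit, tasks, busy }
  for (const region of config.regions) {
    queues[region.name] = { rateLimit: region.rateLimit, tasks: [], busy: false };
  }
  let pending = 0;
  let completeCb = null;

  const fetchTitle = async (url) => `Title for ${url}`; // stub extraction

  // Drain one region's queue serially, pausing rateLimit ms between tasks.
  function drain(q) {
    if (q.busy || q.tasks.length === 0) return;
    q.busy = true;
    const { url, callback } = q.tasks.shift();
    fetchTitle(url).then((title) => {
      callback(null, title);
      if (--pending === 0 && completeCb) completeCb();
      setTimeout(() => { q.busy = false; drain(q); }, q.rateLimit);
    });
  }

  return {
    addTask(regionName, url, callback) {
      pending++;
      queues[regionName].tasks.push({ url, callback });
    },
    onComplete(callback) { completeCb = callback; },
    start() { Object.values(queues).forEach(drain); }, // regions drain concurrently
  };
}

// Usage mirroring the task: two regions scraped concurrently.
const scraper = createScraper({
  regions: [
    { name: 'A', rateLimit: 2000 },
    { name: 'B', rateLimit: 3000 },
  ],
});
scraper.addTask('A', 'https://a.example/p1', (err, title) => console.log(title));
scraper.addTask('B', 'https://b.example/p1', (err, title) => console.log(title));
scraper.onComplete(() => console.log('all done'));
scraper.start(); // logs the two titles, then 'all done'
```

Because `start()` kicks off every region's queue at once and each queue only serializes its own tasks, the per-region rate limits never stack across regions, which is exactly what the concurrency capability below checks.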

Capabilities

Rate limit enforcement per region

  • Given 2 URLs for Region A (2s rate limit), scraping both URLs takes at least 2 seconds but less than 5 seconds @test
  • When scraping 3 URLs from Region C (1s rate limit), the tasks complete in at least 2 seconds @test

Concurrent processing across regions

  • Given 1 URL for Region A (2s rate limit) and 1 URL for Region B (3s rate limit), scraping both concurrently takes approximately 3 seconds (not 5 seconds) @test

HTML content extraction

  • The scraper correctly extracts product titles from HTML responses @test

Dependencies { .dependencies }

crawler { .dependency }

Provides web scraping capabilities with rate limiting and queue management.

@satisfied-by

Install with Tessl CLI

npx tessl i tessl/npm-crawler
