tessl/npm-crawler

A ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.

Web Scraper with User Agent Rotation

Name: tessl/npm-crawler
Rating: 0.94 (1 review)
Author: tessl

A simple web scraping utility that implements user agent rotation to avoid detection when crawling multiple pages from the same website.

Capabilities

User agent rotation

Scrapes multiple web pages, sending each request with a different user agent. The scraper automatically rotates through a provided list of user agent strings, which makes automated scraping harder for websites to detect.

Given an array of 3 URLs ['http://example.com/page1', 'http://example.com/page2', 'http://example.com/page3'] and an array of 3 user agents, the scraper successfully completes all requests and invokes the completion callback with results @test
The returned results include the HTML body content from each URL @test
The completion callback receives an array with 3 result objects, one for each URL @test
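
The rotation described above can be pictured as a round-robin lookup over the user agent list. The following is a minimal sketch of that idea only, not the crawler package's own mechanism. `pickUserAgent` is a hypothetical helper name, and the user agent strings are placeholders.

```javascript
// Hypothetical helper: returns the user agent for the request at `requestIndex`,
// wrapping around when there are more requests than user agents.
function pickUserAgent(userAgents, requestIndex) {
  if (!Array.isArray(userAgents) || userAgents.length === 0) {
    throw new Error('userAgents must be a non-empty array');
  }
  return userAgents[requestIndex % userAgents.length];
}

// Placeholder values for illustration only.
const agents = ['agent-A', 'agent-B', 'agent-C'];
const urls = [
  'http://example.com/page1',
  'http://example.com/page2',
  'http://example.com/page3',
];

// Pair each URL with the user agent it would be requested with.
const assignments = urls.map((url, i) => ({
  url,
  userAgent: pickUserAgent(agents, i),
}));
```

With three URLs and three user agents, each request gets a distinct user agent. With more URLs than user agents, the list simply wraps around.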

Implementation

@generates

API

/**
 * Creates and executes a web scraper that rotates user agents across multiple requests.
 *
 * @param {Array<string>} urls - Array of URLs to scrape
 * @param {Array<string>} userAgents - Array of user agent strings to rotate through
 * @param {Function} onComplete - Callback function invoked when all URLs have been scraped
 * @returns {Object} Crawler instance
 */
function createScraper(urls, userAgents, onComplete) {
  // IMPLEMENTATION HERE
}

module.exports = {
  createScraper
};
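
To make the callback contract concrete, here is a rough sketch of the control flow such a function might follow. It is not the intended implementation: it does not use the crawler package and does not return a Crawler instance. Instead it takes a fourth, hypothetical `fetchPage` argument so that the example runs without network access. The name `createScraperSketch` and the result fields (`url`, `userAgent`, `error`, `body`) are also assumptions made for illustration.

```javascript
// Sketch only: `fetchPage` is a hypothetical injected function
// (url, userAgent, cb) => void, where cb(error, body) reports the outcome.
// A real implementation would delegate fetching to the crawler package.
function createScraperSketch(urls, userAgents, onComplete, fetchPage) {
  const results = new Array(urls.length);
  let pending = urls.length;

  // Nothing to fetch: report an empty result list right away.
  if (pending === 0) {
    onComplete(results);
    return;
  }

  urls.forEach((url, i) => {
    // Round-robin: request i uses user agent i modulo the list length.
    const userAgent = userAgents[i % userAgents.length];
    fetchPage(url, userAgent, (error, body) => {
      // Store by index so results keep the order of `urls`.
      results[i] = {
        url,
        userAgent,
        error: error || null,
        body: error ? null : body,
      };
      pending -= 1;
      // Invoke the completion callback once, after the last response.
      if (pending === 0) onComplete(results);
    });
  });
}

// Usage with a stub fetcher that makes no network requests.
const stubFetch = (url, userAgent, cb) => cb(null, `<html>${url}</html>`);
let collected = null;
createScraperSketch(
  ['http://example.com/page1', 'http://example.com/page2', 'http://example.com/page3'],
  ['agent-A', 'agent-B', 'agent-C'],
  (results) => { collected = results; },
  stubFetch
);
```

Storing each result at its URL's index keeps the output order stable even if responses arrive out of order, and the pending counter ensures the completion callback fires exactly once.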

Dependencies { .dependencies }

crawler { .dependency }

Provides web scraping capabilities with user agent rotation support.

@satisfied-by

Install with Tessl CLI

npx tessl i tessl/npm-crawler