tessl/npm-crawler

A ready-to-use web spider that works with proxies, asynchrony, rate limiting, configurable request pools, jQuery, and HTTP/2 support.


evals/scenario-9/task.md

Resilient Web Scraper

Build a resilient web scraper that can handle unreliable endpoints with automatic retry capabilities.

Background

You are building a monitoring system that periodically checks the status of multiple web services. These services can be unreliable and may fail intermittently due to network issues, server overload, or temporary outages. Your scraper needs to automatically retry failed requests with appropriate delays to maximize successful data collection.

Requirements

Core Functionality

Implement a web scraper that:

  1. Fetches data from multiple endpoints - The scraper should accept an array of URLs to scrape
  2. Handles failures gracefully - When requests fail (timeout, network error, server error), the scraper should automatically retry
  3. Offers configurable retry behavior - The following should be configurable:
    • Number of retry attempts
    • Delay between retry attempts
    • Request timeout
  4. Reports results - After all scraping is complete, report which URLs succeeded and which failed (even after retries)
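The requirements above amount to a small retry loop around each request. The helper below is an illustrative sketch, not the required implementation: names like `scrapeWithRetry`, `fetchFn`, and `retryDelayMs` are invented here, and the request function is injected as a parameter so the retry logic stays independent of any particular HTTP client.

```javascript
// Sketch of the retry loop. `fetchFn(url)` is any function returning a
// Promise that rejects on timeout/network/server error (hypothetical name).
async function scrapeWithRetry(urls, fetchFn, { retries = 3, retryDelayMs = 1000 } = {}) {
  const results = { succeeded: [], failed: [] };
  for (const url of urls) {
    let done = false;
    // One initial attempt plus up to `retries` retries.
    for (let attempt = 0; attempt <= retries && !done; attempt++) {
      try {
        await fetchFn(url);
        results.succeeded.push(url);
        done = true;
      } catch (err) {
        if (attempt < retries) {
          // Log the retry so the mechanism is visible, then wait before retrying.
          console.log(`retry ${attempt + 1} for ${url}: ${err.message}`);
          await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
        }
      }
    }
    if (!done) results.failed.push(url); // retries exhausted
  }
  return results; // which URLs succeeded vs. failed, per requirement 4
}
```

Because failures are caught per URL, the loop keeps processing the remaining URLs even when some fail permanently.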

Implementation Details

  • Use the crawler package for making HTTP requests
  • Configure automatic retries for failed requests
  • Set appropriate timeouts to detect failing endpoints
  • Track which URLs were successfully scraped vs which ones failed
  • Log retry attempts to show the retry mechanism in action
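For the `crawler` package itself, the retry-related configuration might look like the fragment below. The option names (`timeout`, `retries`, `retryTimeout`) are taken from the node-crawler 1.x README; newer major versions may rename them, so verify against the version you install.

```javascript
// Hedged config sketch: option names per node-crawler 1.x; newer major
// versions may rename these (e.g. retryTimeout), so check your version's docs.
const crawlerOptions = {
  timeout: 15000,      // ms before an in-flight request counts as failed
  retries: 3,          // retry attempts after the initial failure
  retryTimeout: 2000,  // ms to wait between retry attempts
  callback: (error, res, done) => {
    if (error) console.log(`request failed: ${error.message}`); // surfaces failures in logs
    done();
  },
};
```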

Test Cases

Create a test file that demonstrates the retry functionality:

Test Case 1: Successful request with no retries needed @test

// Given a reliable endpoint that responds successfully
// When the scraper makes a request
// Then the request succeeds on the first attempt with no retries

Test Case 2: Failed request with successful retry @test

// Given an endpoint that fails initially but succeeds on retry
// When the scraper makes a request with retry enabled
// Then the request eventually succeeds after one or more retries

Test Case 3: Complete failure after all retries exhausted @test

// Given an endpoint that consistently fails
// When the scraper makes a request with limited retries
// Then all retry attempts are exhausted and the request is marked as failed

Expected Behavior

When running the scraper:

  • It should attempt each request and automatically retry on failure
  • Each retry attempt should wait the configured interval before retrying
  • After all retries are exhausted, the request should be marked as failed
  • The scraper should complete processing all URLs even if some fail

Deliverables

  1. Implementation file: scraper.js or scraper.ts
  2. Test file: scraper.test.js or scraper.test.ts
  3. The test file should include the three test cases specified above

Dependencies { .dependencies }

crawler { .dependency }

A ready-to-use web spider with automatic retry mechanisms and failure handling capabilities.

Install with Tessl CLI

npx tessl i tessl/npm-crawler
