tessl/npm-crawler

A ready-to-use web spider that works with proxies, asynchrony, rate limiting, configurable request pools, jQuery, and HTTP/2 support.


evals/scenario-9/task.md

Resilient Web Scraper

Build a resilient web scraper that can handle unreliable endpoints with automatic retry capabilities.

Background

You are building a monitoring system that periodically checks the status of multiple web services. These services can be unreliable and may fail intermittently due to network issues, server overload, or temporary outages. Your scraper needs to automatically retry failed requests with appropriate delays to maximize successful data collection.

Requirements

Core Functionality

Implement a web scraper that:

  1. Fetches data from multiple endpoints - The scraper should accept an array of URLs to scrape
  2. Handles failures gracefully - When requests fail (timeout, network error, server error), the scraper should automatically retry
  3. Offers configurable retry behavior - The following should be configurable:
    • Number of retry attempts
    • Delay between retry attempts
    • Request timeout
  4. Reports results - After all scraping is complete, report which URLs succeeded and which failed (even after retries)
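The requirements above amount to a small retry loop around each request. The helper below is an illustrative sketch, not the required implementation: names like `scrapeWithRetry`, `fetchFn`, and `retryDelayMs` are invented here, and the request function is injected as a parameter so the retry logic stays independent of any particular HTTP client.

```javascript
// Sketch of the retry loop. `fetchFn(url)` is any function returning a
// Promise that rejects on timeout/network/server error (hypothetical name).
async function scrapeWithRetry(urls, fetchFn, { retries = 3, retryDelayMs = 1000 } = {}) {
  const results = { succeeded: [], failed: [] };
  for (const url of urls) {
    let done = false;
    // One initial attempt plus up to `retries` retries.
    for (let attempt = 0; attempt <= retries && !done; attempt++) {
      try {
        await fetchFn(url);
        results.succeeded.push(url);
        done = true;
      } catch (err) {
        if (attempt < retries) {
          // Log the retry so the mechanism is visible, then wait before retrying.
          console.log(`retry ${attempt + 1} for ${url}: ${err.message}`);
          await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
        }
      }
    }
    if (!done) results.failed.push(url); // retries exhausted
  }
  return results; // which URLs succeeded vs. failed, per requirement 4
}
```

Because failures are caught per URL, the loop keeps processing the remaining URLs even when some fail permanently.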

Implementation Details

  • Use the crawler package for making HTTP requests
  • Configure automatic retries for failed requests
  • Set appropriate timeouts to detect failing endpoints
  • Track which URLs were successfully scraped vs which ones failed
  • Log retry attempts to show the retry mechanism in action
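For the `crawler` package itself, the retry-related configuration might look like the fragment below. The option names (`timeout`, `retries`, `retryTimeout`) are taken from the node-crawler 1.x README; newer major versions may rename them, so verify against the version you install.

```javascript
// Hedged config sketch: option names per node-crawler 1.x; newer major
// versions may rename these (e.g. retryTimeout), so check your version's docs.
const crawlerOptions = {
  timeout: 15000,      // ms before an in-flight request counts as failed
  retries: 3,          // retry attempts after the initial failure
  retryTimeout: 2000,  // ms to wait between retry attempts
  callback: (error, res, done) => {
    if (error) console.log(`request failed: ${error.message}`); // surfaces failures in logs
    done();
  },
};
```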

Test Cases

Create a test file that demonstrates the retry functionality:

Test Case 1: Successful request with no retries needed @test

// Given a reliable endpoint that responds successfully
// When the scraper makes a request
// Then the request succeeds on the first attempt with no retries

Test Case 2: Failed request with successful retry @test

// Given an endpoint that fails initially but succeeds on retry
// When the scraper makes a request with retry enabled
// Then the request eventually succeeds after one or more retries

Test Case 3: Complete failure after all retries exhausted @test

// Given an endpoint that consistently fails
// When the scraper makes a request with limited retries
// Then all retry attempts are exhausted and the request is marked as failed

Expected Behavior

When running the scraper:

  • It should attempt each request and automatically retry on failure
  • Each retry attempt should wait the configured interval before retrying
  • After all retries are exhausted, the request should be marked as failed
  • The scraper should complete processing all URLs even if some fail

Deliverables

  1. Implementation file: scraper.js or scraper.ts
  2. Test file: scraper.test.js or scraper.test.ts
  3. The test file should include the three test cases specified above

Dependencies { .dependencies }

crawler { .dependency }

A ready-to-use web spider with automatic retry mechanisms and failure handling capabilities.

Install with Tessl CLI

npx tessl i tessl/npm-crawler
