tessl/npm-crawler

A ready-to-use web spider with support for proxies, asynchronous requests, rate limiting, configurable request pools, jQuery, and HTTP/2.

Session-Aware Web Scraper

Name: tessl/npm-crawler
Rating: 0.94 (1 review)
Author: tessl

Build a web scraper that maintains user session state across multiple page requests by persisting cookies between requests.

Problem Description

You need to create a scraper that can navigate through a multi-page website requiring authentication or session management. The scraper should maintain cookies throughout the crawling session, allowing it to access pages that depend on session state.

Implement a scraper that:

Makes an initial request to establish a session (e.g., a login or homepage visit)
Makes subsequent requests that rely on the session cookies from the first request
Properly shares cookie state across all requests in the crawling session
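
The behaviour listed above rests on a cookie jar: something that remembers `Set-Cookie` values from responses and replays them as a `Cookie` header on later requests. The following is a minimal sketch of that idea only. `SimpleCookieJar` and its methods are hypothetical names, not part of crawler or tough-cookie, and the sketch ignores cookie attributes such as `Path`, `Expires`, and `Secure` that a real jar must honour.

```javascript
// Hypothetical in-memory jar for illustration: host -> (cookie name -> value).
// Attributes like Path, Expires, and Secure are deliberately ignored.
class SimpleCookieJar {
  constructor() {
    this.cookiesByHost = new Map();
  }

  // Record every Set-Cookie header value received from `host`.
  store(host, setCookieHeaders) {
    const cookies = this.cookiesByHost.get(host) ?? new Map();
    for (const header of setCookieHeaders) {
      // Keep only the leading "name=value" pair; drop the attributes.
      const pair = header.split(';')[0];
      const eq = pair.indexOf('=');
      if (eq <= 0) continue; // skip malformed entries
      cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
    this.cookiesByHost.set(host, cookies);
  }

  // Build the Cookie request header for `host` ('' when nothing is stored).
  headerFor(host) {
    const cookies = this.cookiesByHost.get(host);
    if (!cookies) return '';
    return [...cookies].map(([name, value]) => `${name}=${value}`).join('; ');
  }
}
```

In this sketch the first response would be passed to `store()`, and every later request to the same host would send whatever `headerFor()` returns. In the actual task, tough-cookie is expected to fill this role.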

Requirements

Create a module that exports a createSessionScraper function
The function should accept a configuration object with at least a callback function
All requests made by the scraper should share the same cookie jar
Cookies received from responses should be automatically stored and sent with subsequent requests
The scraper should queue multiple URLs and process them with shared session state
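
One way to picture how these requirements fit together is the sketch below. It does not use crawler: the HTTP layer is replaced by `options.send`, a hypothetical injected function `(url, cookieHeader) => Promise<{ setCookie, body }>`, so that the queueing and cookie sharing are visible in isolation. It also assumes every queued URL belongs to the same host, since the jar here is keyed by cookie name only.

```javascript
const { EventEmitter } = require('node:events');

// Sketch only. `options.send` is a hypothetical transport standing in for
// the real HTTP client; a real implementation would delegate to crawler.
function createSessionScraper(options) {
  const { callback, send } = options;
  const jar = new Map();   // one jar per scraper: cookie name -> value
  const pending = [];      // URLs waiting to be processed
  const scraper = new EventEmitter();
  let running = false;

  async function run() {
    running = true;
    while (pending.length > 0) {
      const url = pending.shift();
      // Replay everything stored so far as a single Cookie header.
      const cookieHeader = [...jar].map(([n, v]) => `${n}=${v}`).join('; ');
      let res;
      let err = null;
      try {
        res = await send(url, cookieHeader);
        // Store cookies from this response before the next request starts.
        for (const header of res.setCookie ?? []) {
          const [name, ...rest] = header.split(';')[0].split('=');
          jar.set(name.trim(), rest.join('=').trim());
        }
      } catch (e) {
        err = e;
      }
      callback(err, { url, ...res });
    }
    running = false;
    scraper.emit('drain'); // queue is empty
  }

  // Accepts one URL or an array of URLs.
  scraper.add = (urls) => {
    pending.push(...[].concat(urls));
    if (!running) run();
  };

  return scraper;
}
```

Because each call to `createSessionScraper` creates its own `jar`, two scrapers never see each other's cookies, while all URLs passed to one scraper's `add()` share a session. Requests are processed one at a time here so that a cookie set by one response is always available to the next request.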

Test Cases

When the scraper makes two sequential requests to the same domain, cookies from the first response should be included in the second request @test
Multiple scrapers with different cookie jars should maintain independent session states @test
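
Both test cases need a server that hands out a session cookie and records what clients send back. The fixture below is one possible sketch using only Node's built-in `http` module. `createCookieFixture`, `seenCookies`, and the `sid=session-N` cookie format are hypothetical choices made for illustration, not something the task prescribes.

```javascript
const http = require('node:http');

// Hypothetical test fixture (not part of crawler or tough-cookie).
function createCookieFixture() {
  const seenCookies = []; // Cookie header of each request, in arrival order
  let nextSession = 1;

  function handler(req, res) {
    const cookie = req.headers.cookie ?? '';
    seenCookies.push(cookie);
    // Start a new session only for clients that sent no cookie.
    if (cookie === '') {
      res.setHeader('Set-Cookie', `sid=session-${nextSession}; Path=/`);
      nextSession += 1;
    }
    res.end('ok');
  }

  // The server is created but not started; a test would call
  // server.listen(0) and close it again when done.
  return { handler, seenCookies, server: http.createServer(handler) };
}
```

With such a fixture, the first test case could assert that the second entry in `seenCookies` carries the `sid` issued in the first response. The second test case could point two scrapers at the same server and assert that each one keeps sending its own, different session id.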

Implementation

@generates

API

/**
 * Creates a session-aware web scraper that maintains cookies across requests.
 *
 * @param {Object} options - Configuration options
 * @param {Function} options.callback - Callback function for processing responses
 * @returns {Object} Scraper instance that exposes an add() method and emits a 'drain' event
 */
function createSessionScraper(options) {
  // Implementation
}

module.exports = { createSessionScraper };

Dependencies { .dependencies }

crawler { .dependency }

Provides web scraping functionality with cookie support.

@satisfied-by

tough-cookie { .dependency }

Provides cookie jar functionality for storing and managing cookies.