tessl/npm-crawler

A ready-to-use web spider with support for proxies, asynchronous requests, rate limiting, configurable request pools, jQuery, and HTTP/2.

International News Aggregator

Name: tessl/npm-crawler
Rating: 0.94 (1 review)
Author: tessl

Build a web scraper that collects news article titles and content from international news websites that use different character encodings. The scraper should handle multiple character sets correctly and convert all content to UTF-8 for storage.

Requirements

Your solution should:

- Create a scraper that can fetch articles from multiple URLs
- Automatically detect and handle different character encodings (e.g., UTF-8, GB2312, ISO-8859-1, Shift-JIS)
- Convert all scraped content to UTF-8 format
- Extract article titles from `<h1>` tags and article text from elements with class `article-content`
- Store results in an array with the format: `{ url, title, content, encoding }`
- Handle encoding errors gracefully
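
The detection and conversion requirements above can be illustrated without any third-party dependency. What follows is a minimal sketch, assuming a Node.js build with full ICU (so that the built-in `TextDecoder` recognises legacy encodings such as GB2312 and Shift-JIS). `detectCharset` and `decodeBody` are hypothetical helper names, not part of the crawler package, and a real solution may rely on the library's own charset handling instead.

```javascript
// Sketch only: detectCharset and decodeBody are hypothetical helpers, not crawler APIs.
const FALLBACK_CHARSET = 'utf-8';

// Look for a charset in the Content-Type header first, then in a <meta> tag.
function detectCharset(contentType, bodyBytes) {
  const fromHeader = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || '');
  if (fromHeader) return fromHeader[1].toLowerCase();

  // Charset declarations are ASCII, so a latin1 view of the first 1024 bytes
  // is enough to search, whatever the real encoding of the page is.
  const head = Buffer.from(bodyBytes).subarray(0, 1024).toString('latin1');
  const fromMeta = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  return fromMeta ? fromMeta[1].toLowerCase() : FALLBACK_CHARSET;
}

// Decode raw bytes into a JavaScript string using the detected charset.
function decodeBody(bodyBytes, charset) {
  let decoder;
  try {
    decoder = new TextDecoder(charset);
  } catch (err) {
    // The constructor throws for labels the runtime does not recognise;
    // assumption: falling back to UTF-8 is an acceptable "graceful" behaviour.
    decoder = new TextDecoder(FALLBACK_CHARSET);
  }
  // The default (non-fatal) mode replaces malformed bytes with U+FFFD
  // instead of throwing, so one bad byte does not lose the whole page.
  return { text: decoder.decode(bodyBytes), encoding: decoder.encoding };
}

// Usage: four bytes of GB2312-encoded text, declared in the header.
const page = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);
const charset = detectCharset('text/html; charset=GB2312', page);
console.log(charset, decodeBody(page, charset));
```

Once decoded, the text is an ordinary JavaScript string and can be written out as UTF-8. `TextDecoder` reports WHATWG canonical names, so the returned `encoding` may differ from the declared label; a page declared as GB2312, for example, is decoded with the GBK superset.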

Implementation

@generates

API

/**
 * Scrapes articles from the provided URLs with automatic charset handling
 *
 * @param {Array<string>} urls - Array of URLs to scrape
 * @param {function} onComplete - Callback invoked when all scraping is complete
 *                                 Receives array of results: [{ url, title, content, encoding }]
 */
function scrapeInternationalArticles(urls, onComplete) {
  // IMPLEMENTATION HERE
}

module.exports = { scrapeInternationalArticles };
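
One possible shape for the orchestration part of this function is sketched below. It is not the reference implementation: it swaps the crawler's request queue for plain promises, and it takes a third parameter, `scrapeOne`, which is a hypothetical injected function `(url) => Promise<{ title, content, encoding }>` standing in for the fetching, decoding, and extraction step. Keeping failed URLs in the result with `null` fields and an extra `error` message is also an assumption about what "gracefully" means here.

```javascript
// Sketch only: scrapeOne is a hypothetical per-URL scraper passed in by the caller.
function scrapeInternationalArticles(urls, onComplete, scrapeOne) {
  const tasks = urls.map((url) =>
    // Starting from a resolved promise also captures synchronous throws from scrapeOne.
    Promise.resolve().then(() => scrapeOne(url))
  );

  // allSettled waits for every URL, so one failure cannot hide the other results.
  Promise.allSettled(tasks).then((settled) => {
    const results = settled.map((outcome, i) => {
      if (outcome.status === 'fulfilled') {
        const { title, content, encoding } = outcome.value;
        return { url: urls[i], title, content, encoding };
      }
      // Assumption: a failed URL keeps its slot so results stay in input order.
      return {
        url: urls[i],
        title: null,
        content: null,
        encoding: null,
        error: String(outcome.reason),
      };
    });
    onComplete(results);
  });
}

// Usage with a stub scraper that needs no network access.
const stubScrapeOne = async (url) => ({ title: `Title of ${url}`, content: 'text', encoding: 'utf-8' });
scrapeInternationalArticles(['https://example.com/a'], (results) => console.log(results), stubScrapeOne);
```

Injecting the per-URL step keeps the aggregation logic testable offline; in a crawler-based solution the same aggregation would happen when the library reports that its queue is empty.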

Test Cases

- Given a URL with UTF-8 encoding, the scraper extracts the title and content correctly @test
- Given a URL with GB2312 encoding (Chinese), the scraper detects the encoding and converts content to UTF-8 @test
- Given multiple URLs with different encodings, the scraper processes all of them and returns results in UTF-8 @test
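
Test cases like these need pages in known encodings. The sketch below shows one way to build such fixtures at the byte level and check extraction on the decoded text, again assuming a Node.js build with full ICU. `extractArticle`, `toText`, and `fixture` are hypothetical helpers, and the regular expressions are a deliberately simplified stand-in for a real HTML parser such as the jQuery-style selector support the crawler package advertises.

```javascript
// Sketch only: regex extraction is a simplification, not how a real parser works.

// Strip tags and collapse whitespace in an HTML fragment.
function toText(fragment) {
  return fragment.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

function extractArticle(html) {
  const h1 = /<h1[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  // Limitation: expects class="article-content" exactly and no nested
  // element with the same tag name inside the article container.
  const body = /<(\w+)[^>]*class="article-content"[^>]*>([\s\S]*?)<\/\1>/i.exec(html);
  return {
    title: h1 ? toText(h1[1]) : '',
    content: body ? toText(body[2]) : '',
  };
}

// Build a fixture page: ASCII markup wrapped around pre-encoded title bytes.
function fixture(charset, titleBytes) {
  return Buffer.concat([
    Buffer.from(`<html><head><meta charset="${charset}"></head><body><h1>`, 'ascii'),
    Buffer.from(titleBytes),
    Buffer.from('</h1><div class="article-content"><p>body text</p></div></body></html>', 'ascii'),
  ]);
}

const gb2312Page = fixture('gb2312', [0xd6, 0xd0, 0xce, 0xc4]);     // "中文" in GB2312
const latin1Page = fixture('iso-8859-1', [0x63, 0x61, 0x66, 0xe9]); // "café" in ISO-8859-1

// Decode each fixture with its declared charset, then extract.
console.log(extractArticle(new TextDecoder('gb2312').decode(gb2312Page)));
console.log(extractArticle(new TextDecoder('iso-8859-1').decode(latin1Page)));
```

Hard-coding the encoded bytes avoids needing an encoder for legacy charsets, since Node's built-in `TextEncoder` only produces UTF-8.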

Dependencies { .dependencies }

crawler { .dependency }

Provides web crawling and scraping functionality with charset detection and encoding conversion support.

Install with Tessl CLI

npx tessl i tessl/npm-crawler