
tessl/npm-crawler

A ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.


evals/scenario-7/task.md

International News Aggregator

Build a web scraper that collects news article titles and content from international news websites that use different character encodings. The scraper should handle multiple character sets correctly and convert all content to UTF-8 for storage.

Requirements

Your solution should:

  1. Create a scraper that can fetch articles from multiple URLs
  2. Automatically detect and handle different character encodings (e.g., UTF-8, GB2312, ISO-8859-1, Shift-JIS)
  3. Convert all scraped content to UTF-8 format
  4. Extract article titles from <h1> tags and article text from elements with class article-content
  5. Store results in an array with the format: { url, title, content, encoding }
  6. Handle encoding errors gracefully, so that one undetectable or undecodable page does not abort the remaining URLs
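
Requirements 2, 3 and 6 can be illustrated without any scraping library. The following is a minimal sketch, not the expected solution: `detectCharset` and `decodeToUtf8` are hypothetical helper names, the detection order (Content-Type header first, then a `<meta>` declaration, then a UTF-8 default) is an assumption, and decoding relies on Node's built-in `TextDecoder`, which needs a Node build with full ICU data to recognise labels such as `gb2312` or `shift_jis`.

```javascript
// Sketch only: helper names and detection order are assumptions, not part of the task.

// Guess the charset of a response from its Content-Type header or its markup.
function detectCharset(contentType, bodyBuffer) {
  // 1. Prefer an explicit charset parameter in the Content-Type header.
  const headerMatch = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || '');
  if (headerMatch) return headerMatch[1].toLowerCase();

  // 2. Otherwise look for a <meta> declaration near the top of the document.
  //    Decoding as latin1 keeps the ASCII markup readable whatever the real charset is.
  const head = bodyBuffer.subarray(0, 1024).toString('latin1');
  const metaMatch = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  if (metaMatch) return metaMatch[1].toLowerCase();

  // 3. Assumed default when nothing is declared.
  return 'utf-8';
}

// Decode raw bytes into a JavaScript string, which can then be stored as UTF-8.
function decodeToUtf8(bodyBuffer, charset) {
  try {
    return { text: new TextDecoder(charset).decode(bodyBuffer), encoding: charset };
  } catch (err) {
    // Unknown charset label: fall back to lenient UTF-8 decoding instead of failing.
    return { text: new TextDecoder('utf-8').decode(bodyBuffer), encoding: 'utf-8' };
  }
}
```

In this sketch an unrecognised charset label degrades to lenient UTF-8 decoding rather than raising, which is one possible reading of "gracefully".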

Implementation

@generates

API

/**
 * Scrapes articles from the provided URLs with automatic charset handling
 *
 * @param {Array<string>} urls - Array of URLs to scrape
 * @param {function} onComplete - Callback invoked when all scraping is complete
 *                                 Receives array of results: [{ url, title, content, encoding }]
 */
function scrapeInternationalArticles(urls, onComplete) {
  // IMPLEMENTATION HERE
}

module.exports = { scrapeInternationalArticles };
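
One way the stub might be filled in is sketched here, under several assumptions that should be checked against the installed crawler version: the v1-style CommonJS API (`new Crawler(options)`, `queue({ uri, callback })`, a callback of the form `(error, res, done)`), a cheerio handle on `res.$`, a `forceUTF8` option that converts bodies to UTF-8, and a detected charset exposed as `res.charset`. `createScraper` is a hypothetical factory, introduced only so that the crawler class can be replaced by a stub in tests. Recording failed pages with `null` fields is likewise a design choice of this sketch, not something the task prescribes.

```javascript
// Sketch only: createScraper is hypothetical, and the crawler options and
// response fields used here are assumptions about the v1-style API.
function createScraper(CrawlerClass) {
  return function scrapeInternationalArticles(urls, onComplete) {
    const results = new Array(urls.length); // filled by index to keep input order
    let remaining = urls.length;
    if (remaining === 0) return onComplete([]);

    // Assumption: forceUTF8 asks crawler to convert response bodies to UTF-8.
    const crawler = new CrawlerClass({ forceUTF8: true });

    urls.forEach((url, index) => {
      crawler.queue({
        uri: url, // assumption: v1-style option name
        callback: (error, res, done) => {
          try {
            if (error) throw error;
            const $ = res.$; // assumption: cheerio instance injected by crawler
            results[index] = {
              url,
              title: $('h1').first().text().trim(),
              content: $('.article-content').text().trim(),
              encoding: res.charset || 'utf-8', // assumption: detected source charset
            };
          } catch (err) {
            // A failed or unparsable page is recorded and does not stop the others.
            results[index] = { url, title: null, content: null, encoding: null };
          }
          done();
          remaining -= 1;
          if (remaining === 0) onComplete(results);
        },
      });
    });
  };
}

// Usage, assuming the crawler package is installed:
// const scrapeInternationalArticles = createScraper(require('crawler'));
// module.exports = { scrapeInternationalArticles };
```

Counting outstanding requests, rather than waiting for a queue-empty event, keeps the completion logic independent of the crawler's event semantics and makes the empty-input case explicit.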

Test Cases

  • Given a URL with UTF-8 encoding, the scraper extracts the title and content correctly @test
  • Given a URL with GB2312 encoding (Chinese), the scraper detects the encoding and converts content to UTF-8 @test
  • Given multiple URLs with different encodings, the scraper processes all of them and returns results in UTF-8 @test
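
The GB2312 test case needs a page whose bytes really are GB2312. A possible fixture is sketched below using only Node's `http` module; `buildGb2312Page` and `startFixtureServer` are hypothetical names, and the page markup is an assumption chosen to match the selectors in the requirements. The title bytes are hard-coded because Node's built-in `TextEncoder` only produces UTF-8.

```javascript
const http = require('http');

// "中文" encoded as GB2312 (two bytes per character).
const GB2312_TEXT = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);

// Assemble an HTML page whose non-ASCII content is GB2312-encoded.
function buildGb2312Page() {
  return Buffer.concat([
    Buffer.from('<html><head><meta charset="gb2312"></head><body><h1>', 'ascii'),
    GB2312_TEXT,
    Buffer.from('</h1><div class="article-content">', 'ascii'),
    GB2312_TEXT,
    Buffer.from('</div></body></html>', 'ascii'),
  ]);
}

// Serve the fixture on a free local port and report its URL once listening.
function startFixtureServer(onListening) {
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/html; charset=gb2312' });
    res.end(buildGb2312Page());
  });
  // Port 0 lets the operating system pick an unused port.
  server.listen(0, '127.0.0.1', () => {
    onListening(server, `http://127.0.0.1:${server.address().port}/`);
  });
  return server;
}
```

A test could pass the reported URL to `scrapeInternationalArticles`, assert that the returned `title` and `content` both equal `'中文'`, and then call `server.close()`.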

Dependencies { .dependencies }

crawler { .dependency }

Provides web crawling and scraping functionality with charset detection and encoding conversion support.

Install with Tessl CLI

npx tessl i tessl/npm-crawler
