A ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.
Build a web scraping tool that collects product information from a list of URLs while avoiding duplicate requests to the same URL.
The scraper receives product URLs as input. Each URL represents a product page with the following HTML structure:
```html
<html>
<head><title>Product Name</title></head>
<body>
<h1 class="product-title">Product Name</h1>
<span class="price">$XX.XX</span>
</body>
</html>
```

When all crawling completes, your scraper should output all collected products in JSON format:
```json
[
  {
    "url": "http://example.com/product/1",
    "title": "Product Name",
    "price": "$XX.XX"
  }
]
```

@generates
```js
/**
 * Creates a new product scraper that prevents duplicate URL requests
 *
 * @param {Function} onComplete - Callback invoked when all crawling completes, receives array of products
 * @returns {Object} Scraper instance with addUrls method
 */
function createScraper(onComplete) {
  // Implementation
}

/**
 * Scraper instance
 * @typedef {Object} Scraper
 * @property {Function} addUrls - Adds URLs to crawl. Signature: addUrls(urls: string[])
 */
module.exports = { createScraper };
```

Provides web crawling and HTML parsing capabilities with built-in duplicate URL detection.
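A minimal sketch of how `createScraper` could satisfy the spec. The second parameter `fetchPage` is an assumption added here so the sketch can run without a network (it takes a URL and returns the page HTML, or a Promise of it); a real implementation would use an HTTP client and a jQuery-like HTML parser instead of the regexes below.

```js
// Sketch only: `fetchPage` is a hypothetical injectable fetcher, not part of the spec.
function createScraper(onComplete, fetchPage) {
  const seen = new Set(); // URLs already requested -- duplicate detection
  const products = [];
  let pending = 0;

  return {
    addUrls(urls) {
      for (const url of urls) {
        if (seen.has(url)) continue; // skip duplicate URLs
        seen.add(url);
        pending++;
        Promise.resolve(fetchPage(url)).then((html) => {
          // Naive extraction matching the product-page structure above;
          // a real scraper would use a proper HTML parser.
          const title = (html.match(/<h1 class="product-title">([^<]*)<\/h1>/) || [])[1];
          const price = (html.match(/<span class="price">([^<]*)<\/span>/) || [])[1];
          products.push({ url, title, price });
          if (--pending === 0) onComplete(products); // all crawling complete
        });
      }
    },
  };
}

// Usage: the duplicate URL is requested only once
const scraper = createScraper(
  (items) => console.log(JSON.stringify(items, null, 2)),
  (url) =>
    '<html><head><title>Widget</title></head><body>' +
    '<h1 class="product-title">Widget</h1>' +
    '<span class="price">$9.99</span></body></html>'
);
scraper.addUrls([
  'http://example.com/product/1',
  'http://example.com/product/1', // duplicate -- ignored
  'http://example.com/product/2',
]);
```

The `Set` gives O(1) duplicate checks, and the `pending` counter stands in for the request-pool bookkeeping a production crawler would do.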
Install with Tessl CLI

```shell
npx tessl i tessl/npm-crawler
```

evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10