tessl/pypi-w3lib

Library of web-related functions for HTML manipulation, HTTP processing, URL handling, and encoding detection


Meta Refresh Redirect Extractor

Build a utility that extracts and validates meta refresh redirect information from HTML content. The tool should parse HTML documents to find meta refresh tags, extract the delay time and target URL, and handle edge cases such as relative URLs, quoted URLs, and mixed-case attribute values.

Functionality

Extract Meta Refresh Information

Your utility should extract redirect information from HTML meta refresh tags. The meta refresh tag can appear in different formats:

  • <meta http-equiv="refresh" content="0;url=https://example.com">
  • <meta http-equiv="refresh" content="5; url=https://example.com/page">
  • <meta http-equiv="Refresh" content="10;URL='https://example.com'">

The function should:

  • Return a tuple containing the delay interval (in seconds) and the target URL
  • Handle relative URLs by resolving them against a provided base URL
  • Return None if no meta refresh tag is found
  • Support case-insensitive matching of the http-equiv attribute
  • Handle both quoted and unquoted URLs in the content attribute
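
The content-attribute formats listed above can be handled with one pattern. The following is a minimal sketch using only the Python standard library, not w3lib; the helper name `parse_refresh_content` and the regular expression are assumptions made for illustration, and truncating fractional delays to whole seconds is likewise an assumption.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: "<delay> ; url = <target>", with optional matching
# quotes around the target. Matching is case-insensitive, so "URL=" works too.
_CONTENT_RE = re.compile(
    r"""^\s*(?P<delay>\d+(?:\.\d+)?)\s*;\s*url\s*=\s*(?P<quote>["']?)(?P<url>.*?)(?P=quote)\s*$""",
    re.IGNORECASE,
)


def parse_refresh_content(content, base_url):
    """Parse the value of a meta refresh content attribute.

    Returns (delay_seconds, absolute_url), or None if the value does not
    look like "<delay>;url=<target>".
    """
    match = _CONTENT_RE.match(content)
    if match is None:
        return None
    # Assumption: fractional delays are truncated to whole seconds.
    delay = int(float(match.group("delay")))
    # urljoin leaves absolute targets untouched and resolves relative ones.
    return delay, urljoin(base_url, match.group("url").strip())
```

Because `urljoin` returns an absolute target unchanged, the same code path covers both absolute and relative URLs.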

Test Cases

  • Given HTML with <meta http-equiv="refresh" content="0;url=https://example.com/redirect"> and base URL https://original.com, returns (0, "https://example.com/redirect") @test

  • Given HTML with <meta http-equiv="refresh" content="5;url=relative/path.html"> and base URL https://example.com/page/, returns (5, "https://example.com/page/relative/path.html") @test

  • Given HTML with no meta refresh tag, returns None @test

  • Given HTML with <meta http-equiv="Refresh" content="3; URL='https://example.com/target'"> (case-insensitive and quoted URL), returns (3, "https://example.com/target") @test

Implementation

@generates

API

def extract_meta_refresh(html_content: str, base_url: str) -> tuple[int, str] | None:
    """
    Extracts meta refresh redirect information from HTML content.

    Args:
        html_content: The HTML document as a string
        base_url: The base URL to resolve relative URLs against

    Returns:
        A tuple of (delay_seconds, target_url) if meta refresh is found,
        None otherwise
    """
    pass
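
One way the stub above might be filled in is sketched below. It is a standard-library stand-in meant to show the expected behaviour, not the intended solution: the task declares w3lib as its dependency, and `w3lib.html.get_meta_refresh` offers comparable extraction. The class `_MetaRefreshFinder` is hypothetical. Returning None when the content attribute has no `url=` part is an assumption, since the specification does not cover that case.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _MetaRefreshFinder(HTMLParser):
    """Records the content attribute of the first meta refresh tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.content: str | None = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta" or self.content is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").strip().lower() == "refresh":
            self.content = attr_map.get("content", "")


def extract_meta_refresh(html_content: str, base_url: str) -> tuple[int, str] | None:
    finder = _MetaRefreshFinder()
    finder.feed(html_content)
    if finder.content is None:
        return None  # no meta refresh tag in the document

    # Split "<delay>;url=<target>" at the first semicolon.
    delay_part, _, rest = finder.content.partition(";")
    try:
        delay = int(float(delay_part.strip()))
    except (ValueError, OverflowError):
        return None

    # Assumption: a refresh without a "url=" part is treated as not found.
    match = re.match(r"\s*url\s*=\s*(.*)", rest, re.IGNORECASE | re.DOTALL)
    if match is None:
        return None

    # Drop surrounding whitespace and optional quotes, then resolve the URL.
    target = match.group(1).strip().strip("\"'").strip()
    return delay, urljoin(base_url, target)
```

The four test cases listed in this task can be run directly against this function, for example `extract_meta_refresh(html, "https://example.com/page/")` for the relative-URL case.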

Dependencies { .dependencies }

w3lib { .dependency }

Provides web utility functions for HTML processing and URL handling.

@satisfied-by

Install with Tessl CLI

npx tessl i tessl/pypi-w3lib
