tessl/pypi-w3lib

Library of web-related functions for HTML manipulation, HTTP processing, URL handling, and encoding detection

0.91x

Meta Refresh Redirect Extractor

Name: tessl/pypi-w3lib
Rating: 0.84 (1 review)
Author: tessl

Build a utility that extracts meta refresh redirect information from HTML content. The tool should parse an HTML document to find a meta refresh tag, extract the delay time and target URL, and handle edge cases such as relative URLs, quoted URLs, and case variations in attribute names.

Functionality

Extract Meta Refresh Information

Your utility should extract redirect information from HTML meta refresh tags. The meta refresh tag can appear in different formats:

<meta http-equiv="refresh" content="0;url=https://example.com">
<meta http-equiv="refresh" content="5; url=https://example.com/page">
<meta http-equiv="Refresh" content="10;URL='https://example.com'">

The function should:

Return a tuple containing the delay interval (in seconds) and the target URL
Handle relative URLs by resolving them against a provided base URL
Return None if no meta refresh tag is found
Support case-insensitive matching of the http-equiv attribute
Handle both quoted and unquoted URLs in the content attribute
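
The content-attribute handling listed above can be illustrated with a small standard-library sketch. It covers only the parsing of the `content` value (delay, optional quotes, relative URL resolution), not locating the tag in the HTML. The helper name `parse_refresh_content` and the regular expression are assumptions made for illustration; they are not part of this spec or of w3lib.

```python
from __future__ import annotations

import re
from urllib.parse import urljoin

# Hypothetical pattern: "<delay>;url=<target>", whitespace-tolerant and
# case-insensitive, with the target optionally wrapped in quotes.
_CONTENT_RE = re.compile(
    r"^\s*(?P<delay>\d+)\s*;\s*url\s*=\s*(?P<url>.*?)\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_refresh_content(content: str, base_url: str) -> tuple[int, str] | None:
    """Parse the value of a meta refresh `content` attribute (illustrative helper)."""
    match = _CONTENT_RE.match(content)
    if match is None:
        return None
    # Drop optional surrounding quotes and whitespace around the target.
    target = match.group("url").strip(" \"'")
    if not target:
        return None
    # urljoin resolves relative targets against base_url.
    return int(match.group("delay")), urljoin(base_url, target)
```

For instance, `parse_refresh_content("10;URL='https://example.com'", "https://original.com")` strips the single quotes before resolving the URL.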

Test Cases

Given HTML with <meta http-equiv="refresh" content="0;url=https://example.com/redirect"> and base URL https://original.com, returns (0, "https://example.com/redirect") @test
Given HTML with <meta http-equiv="refresh" content="5;url=relative/path.html"> and base URL https://example.com/page/, returns (5, "https://example.com/page/relative/path.html") @test
Given HTML with no meta refresh tag, returns None @test
Given HTML with <meta http-equiv="Refresh" content="3; URL='https://example.com/target'"> (case-insensitive and quoted URL), returns (3, "https://example.com/target") @test

Implementation

@generates

API

def extract_meta_refresh(html_content: str, base_url: str) -> tuple[int, str] | None:
    """
    Extracts meta refresh redirect information from HTML content.

    Args:
        html_content: The HTML document as a string
        base_url: The base URL to resolve relative URLs against

    Returns:
        A tuple of (delay_seconds, target_url) if meta refresh is found,
        None otherwise
    """
    pass

Dependencies { .dependencies }

w3lib { .dependency }

Provides web utility functions for HTML processing and URL handling.

@satisfied-by

Install with Tessl CLI

npx tessl i tessl/pypi-w3lib