
tessl/pypi-w3lib

tessl install tessl/pypi-w3lib@2.3.0

Library of web-related functions for HTML manipulation, HTTP processing, URL handling, and encoding detection

Agent Success

Agent success rate when using this tile

84%

Improvement

Agent success rate improvement when using this tile compared to baseline

0.91x

Baseline

Agent success rate without this tile

92%

evals/scenario-6/task.md

Meta Refresh Redirect Extractor

Build a utility that extracts and validates meta refresh redirect information from HTML content. The tool should parse HTML documents to find meta refresh tags, extract the delay time and target URL, and handle various edge cases.

Functionality

Extract Meta Refresh Information

Your utility should extract redirect information from HTML meta refresh tags. The meta refresh tag can appear in different formats:

  • <meta http-equiv="refresh" content="0;url=https://example.com">
  • <meta http-equiv="refresh" content="5; url=https://example.com/page">
  • <meta http-equiv="Refresh" content="10;URL='https://example.com'">

The function should:

  • Return a tuple containing the delay interval (in seconds) and the target URL
  • Handle relative URLs by resolving them against a provided base URL
  • Return None if no meta refresh tag is found
  • Match the http-equiv attribute value case-insensitively (for example, both refresh and Refresh)
  • Handle both quoted and unquoted URLs in the content attribute
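
The requirements above can be sketched with the standard library alone, which makes each step visible: find the tag, split the content attribute into delay and URL, strip optional quotes, and resolve against the base URL. This is only an illustration under stated assumptions, not the expected solution: `parse_meta_refresh`, `_MetaRefreshFinder`, and `_CONTENT_RE` are hypothetical names, and the sketch assumes an integer delay and a `url=` part that is always present.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed shape of the content attribute: "<delay>; url=<target>"
_CONTENT_RE = re.compile(r"^\s*(\d+)\s*;\s*url\s*=\s*(.*?)\s*$", re.IGNORECASE | re.DOTALL)


class _MetaRefreshFinder(HTMLParser):
    """Records the content attribute of the first meta refresh tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.content: str | None = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.content is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # The attribute *value* keeps its case, so compare it lowercased.
        if attr_map.get("http-equiv", "").lower() == "refresh":
            self.content = attr_map.get("content", "")


def parse_meta_refresh(html_content: str, base_url: str) -> tuple[int, str] | None:
    """Hypothetical helper: return (delay, absolute_url) or None."""
    finder = _MetaRefreshFinder()
    finder.feed(html_content)
    finder.close()
    if finder.content is None:
        return None
    match = _CONTENT_RE.match(finder.content)
    if match is None:
        return None
    delay = int(match.group(1))
    # Drop optional single or double quotes around the URL.
    target = match.group(2).strip("\"'")
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return delay, urljoin(base_url, target)
```

Using an HTML parser for the tag and a small regex only for the content value keeps attribute order and quoting style out of the pattern. Inputs outside the assumed shape, such as a fractional delay or a content value without a URL, return None in this sketch.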

Test Cases

  • Given HTML with <meta http-equiv="refresh" content="0;url=https://example.com/redirect"> and base URL https://original.com, returns (0, "https://example.com/redirect") @test

  • Given HTML with <meta http-equiv="refresh" content="5;url=relative/path.html"> and base URL https://example.com/page/, returns (5, "https://example.com/page/relative/path.html") @test

  • Given HTML with no meta refresh tag, returns None @test

  • Given HTML with <meta http-equiv="Refresh" content="3; URL='https://example.com/target'"> (case-insensitive and quoted URL), returns (3, "https://example.com/target") @test

Implementation

@generates

API

def extract_meta_refresh(html_content: str, base_url: str) -> tuple[int, str] | None:
    """
    Extracts meta refresh redirect information from HTML content.

    Args:
        html_content: The HTML document as a string
        base_url: The base URL to resolve relative URLs against

    Returns:
        A tuple of (delay_seconds, target_url) if meta refresh is found,
        None otherwise
    """
    pass

Dependencies { .dependencies }

w3lib { .dependency }

Provides web utility functions for HTML processing and URL handling.

@satisfied-by
