or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

index.md

tile.json

tessl/pypi-mkdocs-htmlproofer-plugin

A MkDocs plugin that validates URLs, including anchors, in rendered HTML files

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/mkdocs-htmlproofer-plugin@1.3.x

To install, run

npx @tessl/cli install tessl/pypi-mkdocs-htmlproofer-plugin@1.3.0

MkDocs HTMLProofer Plugin

A MkDocs plugin that validates URLs, including anchors, in rendered HTML files. It integrates seamlessly with the MkDocs build process to automatically check all links (both internal and external) for validity, ensuring documentation maintains high quality and user experience.

Package Information

Package Name: mkdocs-htmlproofer-plugin
Package Type: pypi
Language: Python
Installation: pip install mkdocs-htmlproofer-plugin

Core Imports

from htmlproofer.plugin import HtmlProoferPlugin

Basic Usage

Enable the plugin in your mkdocs.yml configuration:

plugins:
  - search
  - htmlproofer

Basic configuration with error handling:

plugins:
  - search
  - htmlproofer:
      enabled: true
      raise_error: true
      validate_external_urls: true
      skip_downloads: false

Advanced configuration with URL filtering:

plugins:
  - search
  - htmlproofer:
      raise_error_after_finish: true
      raise_error_excludes:
        504: ['https://www.mkdocs.org/']
        404: ['https://github.com/manuzhang/*']
      ignore_urls:
        - https://github.com/myprivateorg/*
        - https://app.dynamic-service.io*
      ignore_pages:
        - path/to/excluded/file
        - path/to/excluded/folder/*
      warn_on_ignored_urls: true

Architecture

The plugin operates through MkDocs' event-driven plugin system:

HtmlProoferPlugin: Main plugin class extending BasePlugin
URL Resolution: Handles different URL schemes (HTTP/HTTPS) with caching
Anchor Validation: Validates internal anchors and attr_list extension support
Error Reporting: Configurable error handling with exclusion patterns
File Mapping: Optimized file lookup for internal link resolution

Capabilities

Plugin Configuration

The main plugin class with comprehensive configuration options for URL validation behavior.

class HtmlProoferPlugin(BasePlugin):
    """
    MkDocs plugin for validating URLs in rendered HTML files.
    
    Configuration Options:
    - enabled (bool): Enable/disable plugin (default: True)
    - raise_error (bool): Raise error on first bad URL (default: False)
    - raise_error_after_finish (bool): Raise error after checking all links (default: False)  
    - raise_error_excludes (dict): URL patterns to exclude from errors by status code (default: {})
    - skip_downloads (bool): Skip downloading remote URL content (default: False)
    - validate_external_urls (bool): Validate external HTTP/HTTPS URLs (default: True)
    - validate_rendered_template (bool): Validate entire rendered template (default: False)
    - ignore_urls (list): URLs to ignore completely with wildcard support (default: [])
    - warn_on_ignored_urls (bool): Log warnings for ignored URLs (default: False) 
    - ignore_pages (list): Pages to ignore completely with wildcard support (default: [])
    """
    
    def __init__(self):
        """Initialize plugin with HTTP session and scheme handlers."""
    
    def on_post_build(self, config: Config) -> None:
        """Hook called after build completion to handle final error reporting."""
    
    def on_files(self, files: Files, config: Config) -> None:
        """Hook called to store files for later URL resolution."""
    
    def on_post_page(self, output_content: str, page: Page, config: Config) -> None:
        """Hook called after page processing to validate URLs."""

URL Validation

Core URL validation functionality with support for internal and external links.

def get_url_status(
    self,
    url: str, 
    src_path: str,
    all_element_ids: Set[str],
    files: Dict[str, File]
) -> int:
    """
    Get HTTP status code for a URL.
    
    Parameters:
    - url: URL to validate
    - src_path: Source file path for context
    - all_element_ids: Set of all element IDs on the page
    - files: Dictionary mapping paths to File objects
    
    Returns:
    Status code (0 for valid, 404 for not found, etc.)
    """

def get_external_url(self, url: str, scheme: str, src_path: str) -> int:
    """
    Get status for external URLs by delegating to scheme handlers.
    
    Parameters:
    - url: External URL to validate
    - scheme: URL scheme (http, https)
    - src_path: Source file path for context
    
    Returns:
    Status code from scheme handler or 0 for unknown schemes
    """

def resolve_web_scheme(self, url: str) -> int:
    """
    Resolve HTTP/HTTPS URLs with caching and timeout handling.
    
    Parameters:
    - url: HTTP/HTTPS URL to resolve
    
    Returns:
    HTTP status code or error code (-1 for connection errors, 504 for timeout)
    """

Internal Link Resolution

Static methods for resolving and validating internal links and anchors.

@staticmethod
def is_url_target_valid(url: str, src_path: str, files: Dict[str, File]) -> bool:
    """
    Check if a URL target is valid within the MkDocs site structure.
    
    Parameters:
    - url: URL to validate
    - src_path: Source file path for relative link resolution
    - files: Dictionary mapping paths to File objects
    
    Returns:
    True if target exists and anchor (if present) is valid
    """

@staticmethod
def find_source_file(url: str, src_path: str, files: Dict[str, File]) -> Optional[File]:
    """
    Find the original source file for a built URL.
    
    Parameters:
    - url: Built URL to resolve
    - src_path: Source file path for relative link resolution  
    - files: Dictionary mapping paths to File objects
    
    Returns:
    File object if found, None otherwise
    """

@staticmethod
def find_target_markdown(url: str, src_path: str, files: Dict[str, File]) -> Optional[str]:
    """
    Find the original Markdown source for a built URL.
    
    Parameters:
    - url: Built URL to resolve
    - src_path: Source file path for context
    - files: Dictionary mapping paths to File objects
    
    Returns:
    Markdown content if found, None otherwise
    """

Anchor Validation

Advanced anchor validation with support for attr_list extension and heading parsing.

@staticmethod
def contains_anchor(markdown: str, anchor: str) -> bool:
    """
    Check if Markdown source contains a heading or element that corresponds to an anchor.
    
    Supports:
    - Standard heading anchors (auto-generated from heading text)
    - attr_list extension custom anchors: # Heading {#custom-anchor}
    - HTML anchor tags: <a id="anchor-name">
    - Paragraph anchors: {#paragraph-anchor}
    - Image anchors: ![alt](image.png){#image-anchor}
    
    Parameters:
    - markdown: Markdown source text to search
    - anchor: Anchor name to find
    
    Returns:
    True if anchor exists in the markdown source
    """

Error Handling and Reporting

Configurable error handling with pattern-based URL exclusions.

def report_invalid_url(self, url: str, url_status: int, src_path: str):
    """
    Report invalid URL with configured behavior (error, warning, or build failure).
    
    Parameters:
    - url: Invalid URL
    - url_status: HTTP status code or error code
    - src_path: Source file path where URL was found
    """

@staticmethod
def bad_url(url_status: int) -> bool:
    """
    Determine if a URL status code indicates an error.
    
    Parameters:
    - url_status: HTTP status code or error code
    
    Returns:
    True if status indicates error (>=400 or -1)
    """

@staticmethod  
def is_error(config: Config, url: str, url_status: int) -> bool:
    """
    Check if URL should be treated as error based on exclusion configuration.
    
    Parameters:
    - config: Plugin configuration
    - url: URL to check
    - url_status: Status code
    
    Returns:
    True if URL should be treated as error (not excluded)
    """

Utility Functions

Logging utilities with plugin name prefixes.

def log_info(msg: str, *args, **kwargs):
    """Log info message with htmlproofer prefix."""

def log_warning(msg: str, *args, **kwargs):
    """Log warning message with htmlproofer prefix."""

def log_error(msg: str, *args, **kwargs):
    """Log error message with htmlproofer prefix."""

Configuration Patterns

Error Handling Strategies

Immediate Failure: Stop on first error

plugins:
  - htmlproofer:
      raise_error: true

Deferred Failure: Check all links, then fail if any are invalid

plugins:
  - htmlproofer:
      raise_error_after_finish: true

Warning Only: Report issues but don't fail build (default)

plugins:
  - htmlproofer:
      # Default behavior - no error raising configured

URL Filtering

Ignore Specific URLs: Skip validation entirely

plugins:
  - htmlproofer:
      ignore_urls:
        - https://private-site.com/*
        - https://localhost:*
        - https://127.0.0.1:*

Error Exclusions: Allow specific status codes for specific URLs

plugins:
  - htmlproofer:
      raise_error: true
      raise_error_excludes:
        404: ['https://github.com/*/archive/*']
        503: ['https://api.service.com/*']
        400: ['*']  # Ignore all 400 errors

Page Exclusions: Skip validation for specific pages

plugins:
  - htmlproofer:
      ignore_pages:
        - draft-content/*
        - internal-docs/private.md

Performance Optimization

Skip External URLs: Validate only internal links

plugins:
  - htmlproofer:
      validate_external_urls: false

Skip Downloads: Don't download full content (faster)

plugins:
  - htmlproofer:
      skip_downloads: true

Template Validation: Validate full page templates (slower but comprehensive)

plugins:
  - htmlproofer:
      validate_rendered_template: true

Constants and Patterns

URL_TIMEOUT: float = 10.0
"""Timeout for HTTP requests in seconds."""

URL_HEADERS: Dict[str, str]
"""Default headers for HTTP requests including User-Agent and Accept-Language."""

NAME: str = "htmlproofer"
"""Plugin name used in logging."""

MARKDOWN_ANCHOR_PATTERN: Pattern[str]
"""Regex pattern to match markdown links with optional anchors."""

HEADING_PATTERN: Pattern[str] 
"""Regex pattern to match markdown headings."""

HTML_LINK_PATTERN: Pattern[str]
"""Regex pattern to match HTML anchor tags with IDs."""

IMAGE_PATTERN: Pattern[str]
"""Regex pattern to match markdown image syntax."""

LOCAL_PATTERNS: List[Pattern[str]]
"""List of patterns to match local development URLs."""

ATTRLIST_ANCHOR_PATTERN: Pattern[str]
"""Regex pattern to match attr_list extension anchor syntax."""

ATTRLIST_PATTERN: Pattern[str]
"""Regex pattern to match attr_list extension syntax."""

EMOJI_PATTERN: Pattern[str]
"""Regex pattern to match emoji syntax in headings."""