A MkDocs plugin that validates URLs, including anchors, in rendered HTML files.

npx @tessl/cli install tessl/pypi-mkdocs-htmlproofer-plugin@1.3.0

The plugin integrates with the MkDocs build process and checks all links, both internal and external, so that broken links are caught at build time rather than by readers.

pip install mkdocs-htmlproofer-plugin

from htmlproofer.plugin import HtmlProoferPlugin

Enable the plugin in your mkdocs.yml configuration:
plugins:
  - search
  - htmlproofer

Basic configuration with error handling:
plugins:
  - search
  - htmlproofer:
      enabled: true
      raise_error: true
      validate_external_urls: true
      skip_downloads: false

Advanced configuration with URL filtering:
plugins:
  - search
  - htmlproofer:
      raise_error_after_finish: true
      raise_error_excludes:
        504: ['https://www.mkdocs.org/']
        404: ['https://github.com/manuzhang/*']
      ignore_urls:
        - https://github.com/myprivateorg/*
        - https://app.dynamic-service.io*
      ignore_pages:
        - path/to/excluded/file
        - path/to/excluded/folder/*
      warn_on_ignored_urls: true

The plugin operates through MkDocs' event-driven plugin system, using the on_files, on_post_page, and on_post_build hooks.
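To picture how these hooks fit together, here is a minimal stand-alone sketch. It does not import MkDocs or the plugin: `SketchProofer`, `BuildError`, and `run_build` are hypothetical stand-ins, and the real plugin's internals may differ. The sketch only mirrors the order in which the three hooks run, with failures deferred to the end in the style of raise_error_after_finish.

```python
class BuildError(Exception):
    """Hypothetical stand-in for the error that aborts a build."""


class SketchProofer:
    """Toy link checker that follows the three-hook sequence (not the real plugin)."""

    def __init__(self, raise_error_after_finish=False):
        self.raise_error_after_finish = raise_error_after_finish
        self.known_paths = set()
        self.invalid_links = []

    def on_files(self, files):
        # Runs once, before pages are rendered: remember which files exist.
        self.known_paths = set(files)

    def on_post_page(self, page_path, links):
        # Runs once per rendered page: record every link with no known target.
        for link in links:
            if link not in self.known_paths:
                self.invalid_links.append((page_path, link))

    def on_post_build(self):
        # Runs once after all pages: fail here if failures were deferred.
        if self.raise_error_after_finish and self.invalid_links:
            raise BuildError(f"{len(self.invalid_links)} invalid link(s) found")


def run_build(proofer, pages):
    """Call the hooks in build order (hypothetical driver, not MkDocs itself)."""
    proofer.on_files(pages.keys())
    for page_path, links in pages.items():
        proofer.on_post_page(page_path, links)
    proofer.on_post_build()
```

Here `pages` is assumed to map each page path to the list of link targets found on it. With deferral enabled, every page is still checked before the build fails, which is the point of collecting results until the final hook.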
The main plugin class with comprehensive configuration options for URL validation behavior.
class HtmlProoferPlugin(BasePlugin):
    """
    MkDocs plugin for validating URLs in rendered HTML files.

    Configuration Options:
    - enabled (bool): Enable/disable plugin (default: True)
    - raise_error (bool): Raise error on first bad URL (default: False)
    - raise_error_after_finish (bool): Raise error after checking all links (default: False)
    - raise_error_excludes (dict): URL patterns to exclude from errors by status code (default: {})
    - skip_downloads (bool): Skip downloading remote URL content (default: False)
    - validate_external_urls (bool): Validate external HTTP/HTTPS URLs (default: True)
    - validate_rendered_template (bool): Validate entire rendered template (default: False)
    - ignore_urls (list): URLs to ignore completely with wildcard support (default: [])
    - warn_on_ignored_urls (bool): Log warnings for ignored URLs (default: False)
    - ignore_pages (list): Pages to ignore completely with wildcard support (default: [])
    """

    def __init__(self):
        """Initialize plugin with HTTP session and scheme handlers."""

    def on_post_build(self, config: Config) -> None:
        """Hook called after build completion to handle final error reporting."""

    def on_files(self, files: Files, config: Config) -> None:
        """Hook called to store files for later URL resolution."""

    def on_post_page(self, output_content: str, page: Page, config: Config) -> None:
        """Hook called after page processing to validate URLs."""

Core URL validation functionality with support for internal and external links.
def get_url_status(
    self,
    url: str,
    src_path: str,
    all_element_ids: Set[str],
    files: Dict[str, File]
) -> int:
    """
    Get HTTP status code for a URL.

    Parameters:
    - url: URL to validate
    - src_path: Source file path for context
    - all_element_ids: Set of all element IDs on the page
    - files: Dictionary mapping paths to File objects

    Returns:
    Status code (0 for valid, 404 for not found, etc.)
    """

def get_external_url(self, url: str, scheme: str, src_path: str) -> int:
    """
    Get status for external URLs by delegating to scheme handlers.

    Parameters:
    - url: External URL to validate
    - scheme: URL scheme (http, https)
    - src_path: Source file path for context

    Returns:
    Status code from scheme handler or 0 for unknown schemes
    """

def resolve_web_scheme(self, url: str) -> int:
    """
    Resolve HTTP/HTTPS URLs with caching and timeout handling.

    Parameters:
    - url: HTTP/HTTPS URL to resolve

    Returns:
    HTTP status code or error code (-1 for connection errors, 504 for timeout)
    """

Static methods for resolving and validating internal links and anchors.
@staticmethod
def is_url_target_valid(url: str, src_path: str, files: Dict[str, File]) -> bool:
    """
    Check if a URL target is valid within the MkDocs site structure.

    Parameters:
    - url: URL to validate
    - src_path: Source file path for relative link resolution
    - files: Dictionary mapping paths to File objects

    Returns:
    True if target exists and anchor (if present) is valid
    """

@staticmethod
def find_source_file(url: str, src_path: str, files: Dict[str, File]) -> Optional[File]:
    """
    Find the original source file for a built URL.

    Parameters:
    - url: Built URL to resolve
    - src_path: Source file path for relative link resolution
    - files: Dictionary mapping paths to File objects

    Returns:
    File object if found, None otherwise
    """

@staticmethod
def find_target_markdown(url: str, src_path: str, files: Dict[str, File]) -> Optional[str]:
    """
    Find the original Markdown source for a built URL.

    Parameters:
    - url: Built URL to resolve
    - src_path: Source file path for context
    - files: Dictionary mapping paths to File objects

    Returns:
    Markdown content if found, None otherwise
    """

Advanced anchor validation with support for attr_list extension and heading parsing.
@staticmethod
def contains_anchor(markdown: str, anchor: str) -> bool:
    """
    Check if Markdown source contains a heading or element that corresponds to an anchor.

    Supports:
    - Standard heading anchors (auto-generated from heading text)
    - attr_list extension custom anchors: # Heading {#custom-anchor}
    - HTML anchor tags: <a id="anchor-name">
    - Paragraph anchors: {#paragraph-anchor}
    - Image anchors: {#image-anchor}

    Parameters:
    - markdown: Markdown source text to search
    - anchor: Anchor name to find

    Returns:
    True if anchor exists in the markdown source
    """

Configurable error handling with pattern-based URL exclusions.
def report_invalid_url(self, url: str, url_status: int, src_path: str):
    """
    Report invalid URL with configured behavior (error, warning, or build failure).

    Parameters:
    - url: Invalid URL
    - url_status: HTTP status code or error code
    - src_path: Source file path where URL was found
    """

@staticmethod
def bad_url(url_status: int) -> bool:
    """
    Determine if a URL status code indicates an error.

    Parameters:
    - url_status: HTTP status code or error code

    Returns:
    True if status indicates error (>=400 or -1)
    """

@staticmethod
def is_error(config: Config, url: str, url_status: int) -> bool:
    """
    Check if URL should be treated as error based on exclusion configuration.

    Parameters:
    - config: Plugin configuration
    - url: URL to check
    - url_status: Status code

    Returns:
    True if URL should be treated as error (not excluded)
    """

Logging utilities with plugin name prefixes.
def log_info(msg: str, *args, **kwargs):
    """Log info message with htmlproofer prefix."""

def log_warning(msg: str, *args, **kwargs):
    """Log warning message with htmlproofer prefix."""

def log_error(msg: str, *args, **kwargs):
    """Log error message with htmlproofer prefix."""

Immediate Failure: Stop on first error
plugins:
  - htmlproofer:
      raise_error: true

Deferred Failure: Check all links, then fail if any are invalid
plugins:
  - htmlproofer:
      raise_error_after_finish: true

Warning Only: Report issues but don't fail build (default)
plugins:
  - htmlproofer:
      # Default behavior - no error raising configured

Ignore Specific URLs: Skip validation entirely
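Wildcard entries of this kind can be thought of as shell-style patterns. The sketch below uses Python's `fnmatch` module to show the idea; `is_ignored` is a hypothetical helper, and the plugin's actual matching rules may differ, for instance in how characters such as `?` or `[` are treated.

```python
from fnmatch import fnmatchcase


def is_ignored(url, ignore_patterns):
    """True if the URL matches any pattern; '*' matches any run of characters."""
    # Note: fnmatch also gives '?' and '[...]' special meaning (an assumption
    # of this sketch, not a documented property of the plugin).
    return any(fnmatchcase(url, pattern) for pattern in ignore_patterns)


def urls_to_check(urls, ignore_patterns):
    """Keep only the URLs that no ignore pattern covers."""
    return [url for url in urls if not is_ignored(url, ignore_patterns)]
```

One consequence of this style of matching is that a pattern ending in `/*` does not cover the bare host without a trailing slash, so both forms may need to be listed. The configuration for this case follows.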
plugins:
  - htmlproofer:
      ignore_urls:
        - https://private-site.com/*
        - https://localhost:*
        - https://127.0.0.1:*

Error Exclusions: Allow specific status codes for specific URLs
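The decision behind such exclusions can be sketched in a few lines. The `bad_url` rule below follows the API summary in this document (a status of 400 or above, or -1, counts as bad). The `excludes` dictionary, the use of `fnmatch`, and `should_fail` are assumptions made for illustration, so the plugin's own `is_error` may behave differently in detail.

```python
from fnmatch import fnmatchcase


def bad_url(status):
    """A status of 400 or above, or -1 (connection error), indicates a problem."""
    return status >= 400 or status == -1


def is_error(excludes, url, status):
    """True unless a pattern registered for this exact status code covers the URL."""
    patterns = excludes.get(status, [])
    return not any(fnmatchcase(url, pattern) for pattern in patterns)


def should_fail(excludes, url, status):
    """Combine both checks: the status is bad and no exclusion applies."""
    return bad_url(status) and is_error(excludes, url, status)
```

In this sketch `excludes` is keyed by integer status code, which matches how a YAML parser reads keys such as `404`. The configuration for this case follows.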
plugins:
  - htmlproofer:
      raise_error: true
      raise_error_excludes:
        404: ['https://github.com/*/archive/*']
        503: ['https://api.service.com/*']
        400: ['*']  # Ignore all 400 errors

Page Exclusions: Skip validation for specific pages
plugins:
  - htmlproofer:
      ignore_pages:
        - draft-content/*
        - internal-docs/private.md

Skip External URLs: Validate only internal links
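What counts as external here can be approximated by looking at the URL scheme. The sketch below is an assumption-laden illustration: `is_external` and `links_to_validate` are hypothetical helpers, and protocol-relative links such as `//example.com` are treated as internal, which the plugin may handle differently.

```python
from urllib.parse import urlsplit


def is_external(url):
    """Treat only http and https URLs as external (a simplifying assumption)."""
    return urlsplit(url).scheme in ("http", "https")


def links_to_validate(urls, validate_external_urls=True):
    """Drop external links when external validation is switched off."""
    return [
        url for url in urls
        if validate_external_urls or not is_external(url)
    ]
```

With external validation off, only relative links and same-page anchors remain to be checked, which avoids network requests during the build. The configuration for this case follows.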
plugins:
  - htmlproofer:
      validate_external_urls: false

Skip Downloads: Don't download full content (faster)
plugins:
  - htmlproofer:
      skip_downloads: true

Template Validation: Validate full page templates (slower but comprehensive)
plugins:
  - htmlproofer:
      validate_rendered_template: true

URL_TIMEOUT: float = 10.0
"""Timeout for HTTP requests in seconds."""
URL_HEADERS: Dict[str, str]
"""Default headers for HTTP requests including User-Agent and Accept-Language."""
NAME: str = "htmlproofer"
"""Plugin name used in logging."""
MARKDOWN_ANCHOR_PATTERN: Pattern[str]
"""Regex pattern to match markdown links with optional anchors."""
HEADING_PATTERN: Pattern[str]
"""Regex pattern to match markdown headings."""
HTML_LINK_PATTERN: Pattern[str]
"""Regex pattern to match HTML anchor tags with IDs."""
IMAGE_PATTERN: Pattern[str]
"""Regex pattern to match markdown image syntax."""
LOCAL_PATTERNS: List[Pattern[str]]
"""List of patterns to match local development URLs."""
ATTRLIST_ANCHOR_PATTERN: Pattern[str]
"""Regex pattern to match attr_list extension anchor syntax."""
ATTRLIST_PATTERN: Pattern[str]
"""Regex pattern to match attr_list extension syntax."""
EMOJI_PATTERN: Pattern[str]
"""Regex pattern to match emoji syntax in headings."""