# MkDocs HTMLProofer Plugin

A MkDocs plugin that validates URLs, including anchors, in rendered HTML files. It runs as part of the MkDocs build and checks both internal and external links, so broken links surface before the documentation is published.

## Package Information

- **Package Name**: mkdocs-htmlproofer-plugin
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install mkdocs-htmlproofer-plugin`

## Core Imports

```python
from htmlproofer.plugin import HtmlProoferPlugin
```

## Basic Usage

Enable the plugin in your `mkdocs.yml` configuration:

```yaml
plugins:
  - search
  - htmlproofer
```

Basic configuration with error handling:

```yaml
plugins:
  - search
  - htmlproofer:
      enabled: true
      raise_error: true
      validate_external_urls: true
      skip_downloads: false
```

Advanced configuration with URL filtering:

```yaml
plugins:
  - search
  - htmlproofer:
      raise_error_after_finish: true
      raise_error_excludes:
        504: ['https://www.mkdocs.org/']
        404: ['https://github.com/manuzhang/*']
      ignore_urls:
        - https://github.com/myprivateorg/*
        - https://app.dynamic-service.io*
      ignore_pages:
        - path/to/excluded/file
        - path/to/excluded/folder/*
      warn_on_ignored_urls: true
```

## Architecture

The plugin operates through MkDocs' event-driven plugin system:

- **HtmlProoferPlugin**: Main plugin class extending `BasePlugin`
- **URL Resolution**: Resolves HTTP and HTTPS URLs, with caching
- **Anchor Validation**: Validates internal anchors, including anchors defined with the attr_list extension
- **Error Reporting**: Configurable error handling with exclusion patterns
- **File Mapping**: Dictionary-based file lookup for internal link resolution

## Capabilities

### Plugin Configuration

The main plugin class and the configuration options that control URL validation.

```python { .api }
from mkdocs.config import Config
from mkdocs.plugins import BasePlugin
from mkdocs.structure.files import Files
from mkdocs.structure.pages import Page


class HtmlProoferPlugin(BasePlugin):
    """
    MkDocs plugin for validating URLs in rendered HTML files.

    Configuration Options:
    - enabled (bool): Enable/disable plugin (default: True)
    - raise_error (bool): Raise error on first bad URL (default: False)
    - raise_error_after_finish (bool): Raise error after checking all links (default: False)
    - raise_error_excludes (dict): URL patterns to exclude from errors by status code (default: {})
    - skip_downloads (bool): Skip downloading remote URL content (default: False)
    - validate_external_urls (bool): Validate external HTTP/HTTPS URLs (default: True)
    - validate_rendered_template (bool): Validate entire rendered template (default: False)
    - ignore_urls (list): URLs to ignore completely with wildcard support (default: [])
    - warn_on_ignored_urls (bool): Log warnings for ignored URLs (default: False)
    - ignore_pages (list): Pages to ignore completely with wildcard support (default: [])
    """

    def __init__(self):
        """Initialize plugin with HTTP session and scheme handlers."""

    def on_post_build(self, config: Config) -> None:
        """Hook called after build completion to handle final error reporting."""

    def on_files(self, files: Files, config: Config) -> None:
        """Hook called to store files for later URL resolution."""

    def on_post_page(self, output_content: str, page: Page, config: Config) -> None:
        """Hook called after page processing to validate URLs."""
```
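
The order in which MkDocs calls these hooks can be illustrated with a small stand-in class. This is a minimal sketch, not the plugin's code: `SketchProofer`, its plain-dict file map, and the `RuntimeError` are assumptions made for illustration. Only the three hook names come from the API above.

```python
from typing import Dict, List


class SketchProofer:
    """Hypothetical stand-in that mirrors the hook order, not the real plugin."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.invalid_links: List[str] = []

    def on_files(self, files: Dict[str, str]) -> None:
        # Called first: remember every built file for later link resolution.
        self.files = dict(files)

    def on_post_page(self, page_links: List[str]) -> None:
        # Called once per rendered page: check its links against the stored files.
        for link in page_links:
            if link not in self.files:
                self.invalid_links.append(link)

    def on_post_build(self) -> None:
        # Called last: report once, after every page has been checked.
        if self.invalid_links:
            raise RuntimeError(f"{len(self.invalid_links)} invalid link(s)")


proofer = SketchProofer()
proofer.on_files({"index.html": "index.md", "guide/index.html": "guide.md"})
proofer.on_post_page(["index.html", "missing.html"])
```

Because `on_files` runs before any page is rendered, every `on_post_page` call can consult the complete file map.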

### URL Validation

Core URL validation functionality with support for internal and external links.

```python { .api }
def get_url_status(
    self,
    url: str,
    src_path: str,
    all_element_ids: Set[str],
    files: Dict[str, File]
) -> int:
    """
    Get HTTP status code for a URL.

    Parameters:
    - url: URL to validate
    - src_path: Source file path for context
    - all_element_ids: Set of all element IDs on the page
    - files: Dictionary mapping paths to File objects

    Returns:
    Status code (0 for valid, 404 for not found, etc.)
    """

def get_external_url(self, url: str, scheme: str, src_path: str) -> int:
    """
    Get status for external URLs by delegating to scheme handlers.

    Parameters:
    - url: External URL to validate
    - scheme: URL scheme (http, https)
    - src_path: Source file path for context

    Returns:
    Status code from scheme handler or 0 for unknown schemes
    """

def resolve_web_scheme(self, url: str) -> int:
    """
    Resolve HTTP/HTTPS URLs with caching and timeout handling.

    Parameters:
    - url: HTTP/HTTPS URL to resolve

    Returns:
    HTTP status code or error code (-1 for connection errors, 504 for timeout)
    """
```
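
The return-code convention of `resolve_web_scheme` (HTTP status, 504 for a timeout, -1 for a connection error) combined with caching can be sketched as follows. This is an illustration under assumptions: it uses only the standard library's `urllib` rather than the plugin's HTTP session, and `classify_failure`, `resolve_web_url`, and the cache size are hypothetical.

```python
import socket
import urllib.error
import urllib.request
from functools import lru_cache

URL_TIMEOUT = 10.0  # seconds, the value documented for the plugin's constant


def classify_failure(exc: Exception) -> int:
    """Map an exception to the status codes described above (hypothetical helper)."""
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code  # the server answered with an error status
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return 504  # the request timed out while reading
    if isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, (socket.timeout, TimeoutError)
    ):
        return 504  # the request timed out while connecting
    return -1  # any other connection problem


@lru_cache(maxsize=1000)  # assumed cache size; repeated URLs are fetched once
def resolve_web_url(url: str) -> int:
    """Fetch a URL and return its status code, or an error code on failure."""
    try:
        with urllib.request.urlopen(url, timeout=URL_TIMEOUT) as response:
            return response.status
    except (urllib.error.URLError, OSError) as exc:
        return classify_failure(exc)
```

Caching on the URL string means a link that appears on many pages costs one request per build.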

### Internal Link Resolution

Static methods for resolving and validating internal links and anchors.

```python { .api }
@staticmethod
def is_url_target_valid(url: str, src_path: str, files: Dict[str, File]) -> bool:
    """
    Check if a URL target is valid within the MkDocs site structure.

    Parameters:
    - url: URL to validate
    - src_path: Source file path for relative link resolution
    - files: Dictionary mapping paths to File objects

    Returns:
    True if target exists and anchor (if present) is valid
    """

@staticmethod
def find_source_file(url: str, src_path: str, files: Dict[str, File]) -> Optional[File]:
    """
    Find the original source file for a built URL.

    Parameters:
    - url: Built URL to resolve
    - src_path: Source file path for relative link resolution
    - files: Dictionary mapping paths to File objects

    Returns:
    File object if found, None otherwise
    """

@staticmethod
def find_target_markdown(url: str, src_path: str, files: Dict[str, File]) -> Optional[str]:
    """
    Find the original Markdown source for a built URL.

    Parameters:
    - url: Built URL to resolve
    - src_path: Source file path for context
    - files: Dictionary mapping paths to File objects

    Returns:
    Markdown content if found, None otherwise
    """
```
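
Resolving a relative link against the page that contains it, then looking the result up in a path-keyed dictionary, might look like the sketch below. It is a simplification under assumptions: `resolve_internal` is a hypothetical helper, `files` maps built paths to source paths as plain strings instead of `File` objects, and `page_url` is the built path of the linking page.

```python
import posixpath
from typing import Dict, Optional
from urllib.parse import urlsplit


def resolve_internal(url: str, page_url: str, files: Dict[str, str]) -> Optional[str]:
    """Return the source path for an internal link, or None if nothing matches."""
    path = urlsplit(url).path  # drops any "#anchor" or "?query" part
    if not path:
        return files.get(page_url)  # a bare "#anchor" points at the page itself
    is_dir = path.endswith("/")
    if path.startswith("/"):
        # Site-absolute link: resolve from the site root.
        target = posixpath.normpath(path).lstrip("/")
    else:
        # Relative link: resolve against the directory of the linking page.
        target = posixpath.normpath(posixpath.join(posixpath.dirname(page_url), path))
    if is_dir:
        target = posixpath.join(target, "index.html")  # directory-style URL
    return files.get(target)


files = {
    "index.html": "index.md",
    "guide/index.html": "guide/index.md",
    "guide/setup.html": "guide/setup.md",
}
```

For example, `resolve_internal("../index.html", "guide/setup.html", files)` normalizes the path to `index.html` before the lookup.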

### Anchor Validation

Anchor validation with support for the attr_list extension and heading parsing.

```python { .api }
@staticmethod
def contains_anchor(markdown: str, anchor: str) -> bool:
    """
    Check if Markdown source contains a heading or element that corresponds to an anchor.

    Supports:
    - Standard heading anchors (auto-generated from heading text)
    - attr_list extension custom anchors: # Heading {#custom-anchor}
    - HTML anchor tags: <a id="anchor-name">
    - Paragraph anchors: {#paragraph-anchor}
    - Image anchors: {#image-anchor}

    Parameters:
    - markdown: Markdown source text to search
    - anchor: Anchor name to find

    Returns:
    True if anchor exists in the markdown source
    """
```
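
A simplified version of this check can be written with three regular expressions. The sketch below is an assumption-laden illustration, not the plugin's implementation: the patterns, `slugify`, and `has_anchor` are hypothetical, the slug rule is a reduced form of the usual "lowercase and hyphenate" convention, and lines starting with `#` inside code fences are not excluded.

```python
import re

# Hypothetical patterns for this sketch only.
HEADING = re.compile(r"^ {0,3}#{1,6}[ \t]+(.*?)[ \t]*$", re.MULTILINE)
ATTR_ID = re.compile(r"\{[^}]*#([\w-]+)[^}]*\}")
HTML_ID = re.compile(r"""<a\s+[^>]*\bid=["']([^"']+)["']""")


def slugify(text: str) -> str:
    """Lowercase, drop punctuation, and join words with hyphens."""
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[-\s]+", "-", text)


def has_anchor(markdown: str, anchor: str) -> bool:
    # Explicit ids from attr_list ({#id}) or raw HTML (<a id="...">).
    explicit = set(ATTR_ID.findall(markdown)) | set(HTML_ID.findall(markdown))
    if anchor in explicit:
        return True
    for heading in HEADING.findall(markdown):
        if ATTR_ID.search(heading):
            continue  # assume an explicit id replaces the generated slug
        if slugify(heading) == anchor:
            return True
    return False


doc = (
    "# Getting Started\n\n"
    "## Install It! {#install}\n\n"
    "Text {#note}\n\n"
    '<a id="legacy"></a>\n'
)
```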

### Error Handling and Reporting

Configurable error handling with pattern-based URL exclusions.

```python { .api }
def report_invalid_url(self, url: str, url_status: int, src_path: str):
    """
    Report invalid URL with configured behavior (error, warning, or build failure).

    Parameters:
    - url: Invalid URL
    - url_status: HTTP status code or error code
    - src_path: Source file path where URL was found
    """

@staticmethod
def bad_url(url_status: int) -> bool:
    """
    Determine if a URL status code indicates an error.

    Parameters:
    - url_status: HTTP status code or error code

    Returns:
    True if status indicates error (>=400 or -1)
    """

@staticmethod
def is_error(config: Config, url: str, url_status: int) -> bool:
    """
    Check if URL should be treated as error based on exclusion configuration.

    Parameters:
    - config: Plugin configuration
    - url: URL to check
    - url_status: Status code

    Returns:
    True if URL should be treated as error (not excluded)
    """
```
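
The two checks compose naturally: first decide whether a status is bad, then decide whether an exclusion pattern for that status covers the URL. The following sketch assumes glob-style matching via `fnmatch.fnmatchcase` and takes the exclusion mapping directly instead of a `Config` object. `is_bad_status` and `is_reportable` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def is_bad_status(status: int) -> bool:
    """Treat connection failures (-1) and HTTP statuses >= 400 as bad."""
    return status == -1 or status >= 400


def is_reportable(excludes: Dict[int, List[str]], url: str, status: int) -> bool:
    """Return False when a pattern listed for this status matches the URL."""
    patterns = excludes.get(status, [])
    return not any(fnmatchcase(url, pattern) for pattern in patterns)


# Same shape as the raise_error_excludes option shown in this document.
excludes = {404: ["https://github.com/manuzhang/*"], 400: ["*"]}
```

Keying the exclusions by status code means a URL excused for a 404 is still reported if it starts returning a different error.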

### Utility Functions

Logging utilities with plugin name prefixes.

```python { .api }
def log_info(msg: str, *args, **kwargs):
    """Log info message with htmlproofer prefix."""

def log_warning(msg: str, *args, **kwargs):
    """Log warning message with htmlproofer prefix."""

def log_error(msg: str, *args, **kwargs):
    """Log error message with htmlproofer prefix."""
```
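
One plausible way to build such helpers on the standard `logging` module is shown below. The logger name and the exact prefix format (`htmlproofer: `) are assumptions for this sketch and may differ from what the plugin emits.

```python
import logging

NAME = "htmlproofer"
logger = logging.getLogger(f"sketch.{NAME}")  # hypothetical logger name


def _prefixed(msg: str) -> str:
    # Prepend the plugin name so messages are easy to spot in build output.
    return f"{NAME}: {msg}"


def log_info(msg: str, *args, **kwargs) -> None:
    logger.info(_prefixed(msg), *args, **kwargs)


def log_warning(msg: str, *args, **kwargs) -> None:
    logger.warning(_prefixed(msg), *args, **kwargs)


def log_error(msg: str, *args, **kwargs) -> None:
    logger.error(_prefixed(msg), *args, **kwargs)
```

Passing `*args` through unchanged keeps lazy `%`-style formatting, for example `log_warning("invalid url - %s [%d]", url, status)`.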

## Configuration Patterns

### Error Handling Strategies

**Immediate Failure**: Stop on first error
```yaml
plugins:
  - htmlproofer:
      raise_error: true
```

**Deferred Failure**: Check all links, then fail if any are invalid
```yaml
plugins:
  - htmlproofer:
      raise_error_after_finish: true
```

**Warning Only**: Report issues but don't fail build (default)
```yaml
plugins:
  - htmlproofer:
      # Default behavior - no error raising configured
```
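
The difference between the three strategies comes down to when, if ever, an exception is raised. The sketch below models that control flow over a list of already-resolved `(url, status)` pairs. `check_links`, `LinkCheckError`, and the message format are hypothetical and only mirror the option names above.

```python
from typing import Iterable, List, Tuple


class LinkCheckError(Exception):
    """Hypothetical exception type for this sketch."""


def check_links(
    results: Iterable[Tuple[str, int]],
    raise_error: bool = False,
    raise_error_after_finish: bool = False,
) -> List[str]:
    warnings: List[str] = []
    for url, status in results:
        if status == -1 or status >= 400:
            message = f"invalid url - {url} [{status}]"
            if raise_error:
                raise LinkCheckError(message)  # immediate failure
            warnings.append(message)  # otherwise record it and keep going
    if raise_error_after_finish and warnings:
        # Deferred failure: every link was checked before the build fails.
        raise LinkCheckError(f"{len(warnings)} invalid url(s)")
    return warnings  # warning only: the caller decides what to print


results = [("https://a.example", 200), ("https://b.example", 404), ("https://c.example", -1)]
```

Deferred failure is usually the more convenient choice in CI, since one run lists every broken link instead of only the first.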

### URL Filtering

**Ignore Specific URLs**: Skip validation entirely
```yaml
plugins:
  - htmlproofer:
      ignore_urls:
        - https://private-site.com/*
        - https://localhost:*
        - https://127.0.0.1:*
```

**Error Exclusions**: Allow specific status codes for specific URLs
```yaml
plugins:
  - htmlproofer:
      raise_error: true
      raise_error_excludes:
        404: ['https://github.com/*/archive/*']
        503: ['https://api.service.com/*']
        400: ['*']  # Ignore all 400 errors
```

**Page Exclusions**: Skip validation for specific pages
```yaml
plugins:
  - htmlproofer:
      ignore_pages:
        - draft-content/*
        - internal-docs/private.md
```
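
To preview which URLs or pages a wildcard pattern would cover, the patterns can be tried out with Python's glob-style matcher. This sketch assumes `fnmatch`-like semantics, where `*` matches any run of characters including `/`. `matches_any` is a hypothetical helper, not part of the plugin.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def matches_any(value: str, patterns: Iterable[str]) -> bool:
    """Return True if any glob-style pattern matches the whole value."""
    return any(fnmatchcase(value, pattern) for pattern in patterns)


# Patterns copied from the configuration examples above.
ignore_urls = ["https://private-site.com/*", "https://localhost:*"]
ignore_pages = ["draft-content/*", "internal-docs/private.md"]
```

A pattern without a wildcard, such as `internal-docs/private.md`, only matches that exact path.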

### Performance Optimization

**Skip External URLs**: Validate only internal links
```yaml
plugins:
  - htmlproofer:
      validate_external_urls: false
```

**Skip Downloads**: Don't download full content (faster)
```yaml
plugins:
  - htmlproofer:
      skip_downloads: true
```

**Template Validation**: Validate full page templates (slower but comprehensive)
```yaml
plugins:
  - htmlproofer:
      validate_rendered_template: true
```

## Constants and Patterns

```python { .api }
URL_TIMEOUT: float = 10.0
"""Timeout for HTTP requests in seconds."""

URL_HEADERS: Dict[str, str]
"""Default headers for HTTP requests including User-Agent and Accept-Language."""

NAME: str = "htmlproofer"
"""Plugin name used in logging."""

MARKDOWN_ANCHOR_PATTERN: Pattern[str]
"""Regex pattern to match markdown links with optional anchors."""

HEADING_PATTERN: Pattern[str]
"""Regex pattern to match markdown headings."""

HTML_LINK_PATTERN: Pattern[str]
"""Regex pattern to match HTML anchor tags with IDs."""

IMAGE_PATTERN: Pattern[str]
"""Regex pattern to match markdown image syntax."""

LOCAL_PATTERNS: List[Pattern[str]]
"""List of patterns to match local development URLs."""

ATTRLIST_ANCHOR_PATTERN: Pattern[str]
"""Regex pattern to match attr_list extension anchor syntax."""

ATTRLIST_PATTERN: Pattern[str]
"""Regex pattern to match attr_list extension syntax."""

EMOJI_PATTERN: Pattern[str]
"""Regex pattern to match emoji syntax in headings."""
```