
# MkDocs HTMLProofer Plugin

A MkDocs plugin that validates URLs, including anchors, in rendered HTML files. It hooks into the MkDocs build process and checks every internal and external link automatically, so broken links surface at build time.

## Package Information

- **Package Name**: mkdocs-htmlproofer-plugin
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install mkdocs-htmlproofer-plugin`

## Core Imports

```python
from htmlproofer.plugin import HtmlProoferPlugin
```

## Basic Usage

Enable the plugin in your `mkdocs.yml` configuration:

```yaml
plugins:
  - search
  - htmlproofer
```

Basic configuration with error handling:

```yaml
plugins:
  - search
  - htmlproofer:
      enabled: true
      raise_error: true
      validate_external_urls: true
      skip_downloads: false
```

Advanced configuration with URL filtering:

```yaml
plugins:
  - search
  - htmlproofer:
      raise_error_after_finish: true
      raise_error_excludes:
        504: ['https://www.mkdocs.org/']
        404: ['https://github.com/manuzhang/*']
      ignore_urls:
        - https://github.com/myprivateorg/*
        - https://app.dynamic-service.io*
      ignore_pages:
        - path/to/excluded/file
        - path/to/excluded/folder/*
      warn_on_ignored_urls: true
```

## Architecture

The plugin operates through MkDocs' event-driven plugin system:

- **HtmlProoferPlugin**: Main plugin class extending BasePlugin
- **URL Resolution**: Handles different URL schemes (HTTP/HTTPS) with caching
- **Anchor Validation**: Validates internal anchors, including anchors defined through the attr_list extension
- **Error Reporting**: Configurable error handling with exclusion patterns
- **File Mapping**: Optimized file lookup for internal link resolution
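
The hook order described above can be sketched with a plain Python stand-in. This is a minimal illustration only: `SketchProofer`, `HREF_PATTERN`, and the simplified hook signatures are hypothetical and do not reproduce the real plugin or the MkDocs `BasePlugin` API.

```python
import re
from typing import Dict, List

# Simplistic link extractor, assumed for illustration only.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


class SketchProofer:
    """Hypothetical stand-in that mimics the hook order, not the real plugin."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.invalid: List[str] = []

    def on_files(self, files: Dict[str, str]) -> None:
        # Remember every built page so internal links can be resolved later.
        self.files = dict(files)

    def on_post_page(self, output_content: str, src_path: str) -> None:
        # Record internal links that point at pages that were never built.
        for url in HREF_PATTERN.findall(output_content):
            if "://" not in url and url not in self.files:
                self.invalid.append(f"{src_path}: {url}")

    def on_post_build(self) -> None:
        # Deferred failure: report only after every page has been checked.
        if self.invalid:
            raise RuntimeError(f"{len(self.invalid)} invalid link(s) found")
```

In this sketch the file map is stored first, each page is checked as it is rendered, and the build fails only at the end, which corresponds to the `raise_error_after_finish` style of reporting.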

## Capabilities

### Plugin Configuration

The main plugin class with comprehensive configuration options for URL validation behavior.

```python { .api }
class HtmlProoferPlugin(BasePlugin):
    """
    MkDocs plugin for validating URLs in rendered HTML files.

    Configuration Options:
    - enabled (bool): Enable/disable plugin (default: True)
    - raise_error (bool): Raise error on first bad URL (default: False)
    - raise_error_after_finish (bool): Raise error after checking all links (default: False)
    - raise_error_excludes (dict): URL patterns to exclude from errors by status code (default: {})
    - skip_downloads (bool): Skip downloading remote URL content (default: False)
    - validate_external_urls (bool): Validate external HTTP/HTTPS URLs (default: True)
    - validate_rendered_template (bool): Validate entire rendered template (default: False)
    - ignore_urls (list): URLs to ignore completely with wildcard support (default: [])
    - warn_on_ignored_urls (bool): Log warnings for ignored URLs (default: False)
    - ignore_pages (list): Pages to ignore completely with wildcard support (default: [])
    """

    def __init__(self):
        """Initialize plugin with HTTP session and scheme handlers."""

    def on_post_build(self, config: Config) -> None:
        """Hook called after build completion to handle final error reporting."""

    def on_files(self, files: Files, config: Config) -> None:
        """Hook called to store files for later URL resolution."""

    def on_post_page(self, output_content: str, page: Page, config: Config) -> None:
        """Hook called after page processing to validate URLs."""
```

### URL Validation

Core URL validation functionality with support for internal and external links.

```python { .api }
def get_url_status(
    self,
    url: str,
    src_path: str,
    all_element_ids: Set[str],
    files: Dict[str, File]
) -> int:
    """
    Get HTTP status code for a URL.

    Parameters:
    - url: URL to validate
    - src_path: Source file path for context
    - all_element_ids: Set of all element IDs on the page
    - files: Dictionary mapping paths to File objects

    Returns:
    Status code (0 for valid, 404 for not found, etc.)
    """

def get_external_url(self, url: str, scheme: str, src_path: str) -> int:
    """
    Get status for external URLs by delegating to scheme handlers.

    Parameters:
    - url: External URL to validate
    - scheme: URL scheme (http, https)
    - src_path: Source file path for context

    Returns:
    Status code from scheme handler or 0 for unknown schemes
    """

def resolve_web_scheme(self, url: str) -> int:
    """
    Resolve HTTP/HTTPS URLs with caching and timeout handling.

    Parameters:
    - url: HTTP/HTTPS URL to resolve

    Returns:
    HTTP status code or error code (-1 for connection errors, 504 for timeout)
    """
```
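
The caching and error-code mapping described for `resolve_web_scheme` can be sketched as follows. This is a rough illustration under stated assumptions, not the plugin's code: `make_resolver` and the injected `fetch` callable are hypothetical, and the use of `functools.lru_cache` is one possible caching choice rather than the plugin's confirmed mechanism.

```python
from functools import lru_cache
from typing import Callable


def make_resolver(fetch: Callable[[str], int]) -> Callable[[str], int]:
    """Wrap a hypothetical `fetch` callable with caching and error mapping."""

    @lru_cache(maxsize=None)
    def resolve(url: str) -> int:
        try:
            return fetch(url)
        except TimeoutError:
            return 504  # timeouts are reported as 504
        except ConnectionError:
            return -1  # connection failures are reported as -1

    return resolve
```

Because results are cached per URL, a link that appears on many pages is fetched only once in this sketch.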

### Internal Link Resolution

Static methods for resolving and validating internal links and anchors.

```python { .api }
@staticmethod
def is_url_target_valid(url: str, src_path: str, files: Dict[str, File]) -> bool:
    """
    Check if a URL target is valid within the MkDocs site structure.

    Parameters:
    - url: URL to validate
    - src_path: Source file path for relative link resolution
    - files: Dictionary mapping paths to File objects

    Returns:
    True if target exists and anchor (if present) is valid
    """

@staticmethod
def find_source_file(url: str, src_path: str, files: Dict[str, File]) -> Optional[File]:
    """
    Find the original source file for a built URL.

    Parameters:
    - url: Built URL to resolve
    - src_path: Source file path for relative link resolution
    - files: Dictionary mapping paths to File objects

    Returns:
    File object if found, None otherwise
    """

@staticmethod
def find_target_markdown(url: str, src_path: str, files: Dict[str, File]) -> Optional[str]:
    """
    Find the original Markdown source for a built URL.

    Parameters:
    - url: Built URL to resolve
    - src_path: Source file path for context
    - files: Dictionary mapping paths to File objects

    Returns:
    Markdown content if found, None otherwise
    """
```
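
Relative link resolution against a source path might look roughly like the sketch below. The helper name `split_and_resolve` is hypothetical, and resolving against the directory of the source file is an assumption made for illustration; the plugin's actual lookup logic may differ.

```python
import posixpath
from typing import Tuple
from urllib.parse import urldefrag


def split_and_resolve(url: str, src_path: str) -> Tuple[str, str]:
    """Return (target path, anchor) for an internal link found in `src_path`."""
    path, anchor = urldefrag(url)
    if not path:
        # A bare "#anchor" link points back at the page it appears on.
        return src_path, anchor
    base = posixpath.dirname(src_path)
    return posixpath.normpath(posixpath.join(base, path)), anchor
```

The returned path could then be used as a key into a files mapping, and the anchor checked against the target page.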

### Anchor Validation

Advanced anchor validation with support for attr_list extension and heading parsing.

```python { .api }
@staticmethod
def contains_anchor(markdown: str, anchor: str) -> bool:
    """
    Check if Markdown source contains a heading or element that corresponds to an anchor.

    Supports:
    - Standard heading anchors (auto-generated from heading text)
    - attr_list extension custom anchors: # Heading {#custom-anchor}
    - HTML anchor tags: <a id="anchor-name">
    - Paragraph anchors: {#paragraph-anchor}
    - Image anchors: ![alt](image.png){#image-anchor}

    Parameters:
    - markdown: Markdown source text to search
    - anchor: Anchor name to find

    Returns:
    True if anchor exists in the markdown source
    """
```
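
A simplified version of this kind of anchor lookup is sketched below. The regular expressions, the `slugify` rule, and the `has_anchor` name are all assumptions for illustration; they are not the plugin's actual patterns and cover fewer edge cases (for example, emoji in headings are not handled).

```python
import re

# Hypothetical, simplified patterns for illustration only.
HEADING = re.compile(r"^#{1,6}[ \t]+(.*?)[ \t]*$", re.MULTILINE)
ATTR_ID = re.compile(r"\{[^}]*#([\w-]+)[^}]*\}")
HTML_ID = re.compile(r'<a\s+[^>]*id="([^"]+)"')


def slugify(text: str) -> str:
    """Approximate an auto-generated heading anchor (assumed rule)."""
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"\s+", "-", text)


def has_anchor(markdown: str, anchor: str) -> bool:
    # Explicit anchors: attr_list ids such as {#name} and <a id="name"> tags.
    if anchor in ATTR_ID.findall(markdown) or anchor in HTML_ID.findall(markdown):
        return True
    # Implicit anchors: slugs derived from heading text.
    for heading in HEADING.findall(markdown):
        heading = ATTR_ID.sub("", heading)  # drop any {#...} suffix
        if slugify(heading) == anchor:
            return True
    return False
```

For example, `has_anchor("## API {#api-ref}", "api-ref")` matches through the attr_list pattern, while `has_anchor("# Getting Started", "getting-started")` matches through the heading slug.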

### Error Handling and Reporting

Configurable error handling with pattern-based URL exclusions.

```python { .api }
def report_invalid_url(self, url: str, url_status: int, src_path: str):
    """
    Report invalid URL with configured behavior (error, warning, or build failure).

    Parameters:
    - url: Invalid URL
    - url_status: HTTP status code or error code
    - src_path: Source file path where URL was found
    """

@staticmethod
def bad_url(url_status: int) -> bool:
    """
    Determine if a URL status code indicates an error.

    Parameters:
    - url_status: HTTP status code or error code

    Returns:
    True if status indicates error (>=400 or -1)
    """

@staticmethod
def is_error(config: Config, url: str, url_status: int) -> bool:
    """
    Check if URL should be treated as error based on exclusion configuration.

    Parameters:
    - config: Plugin configuration
    - url: URL to check
    - url_status: Status code

    Returns:
    True if URL should be treated as error (not excluded)
    """
```
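
The two checks above can be approximated in a few lines. In this sketch `bad_url` follows the documented rule (status `>= 400` or `-1`), while `is_error` takes the `raise_error_excludes` mapping directly instead of a `Config` object and matches wildcards with `fnmatchcase`; both of those choices are assumptions for illustration, not the plugin's confirmed implementation.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def bad_url(url_status: int) -> bool:
    """True for connection errors (-1) and HTTP statuses of 400 or above."""
    return url_status == -1 or url_status >= 400


def is_error(excludes: Dict[int, List[str]], url: str, url_status: int) -> bool:
    """True unless the URL matches an exclusion pattern for this status code."""
    patterns = excludes.get(url_status, [])
    return not any(fnmatchcase(url, pattern) for pattern in patterns)
```

With `excludes = {404: ["https://github.com/manuzhang/*"]}`, a 404 from a URL under that prefix is not treated as an error, but a 500 from the same URL still is.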

### Utility Functions

Logging utilities with plugin name prefixes.

```python { .api }
def log_info(msg: str, *args, **kwargs):
    """Log info message with htmlproofer prefix."""

def log_warning(msg: str, *args, **kwargs):
    """Log warning message with htmlproofer prefix."""

def log_error(msg: str, *args, **kwargs):
    """Log error message with htmlproofer prefix."""
```
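
A prefixed logging helper of this kind could be written as below. The logger name `sketch.htmlproofer` and the exact `"htmlproofer: "` prefix format are assumptions for illustration; only the `NAME` value comes from this document.

```python
import logging

NAME = "htmlproofer"
# Hypothetical logger name; the plugin's real logger may differ.
logger = logging.getLogger("sketch.htmlproofer")


def log_warning(msg: str, *args, **kwargs) -> None:
    """Log a warning with the plugin name prepended to the message."""
    logger.warning(f"{NAME}: {msg}", *args, **kwargs)
```

Keeping `*args` separate from the message lets the logging module perform the `%s` substitution lazily, as with the standard `logging` calls.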

## Configuration Patterns

### Error Handling Strategies

**Immediate Failure**: Stop on first error
```yaml
plugins:
  - htmlproofer:
      raise_error: true
```

**Deferred Failure**: Check all links, then fail if any are invalid
```yaml
plugins:
  - htmlproofer:
      raise_error_after_finish: true
```

**Warning Only**: Report issues but don't fail build (default)
```yaml
plugins:
  - htmlproofer:
      # Default behavior - no error raising configured
```

### URL Filtering

**Ignore Specific URLs**: Skip validation entirely
```yaml
plugins:
  - htmlproofer:
      ignore_urls:
        - https://private-site.com/*
        - https://localhost:*
        - https://127.0.0.1:*
```

**Error Exclusions**: Allow specific status codes for specific URLs
```yaml
plugins:
  - htmlproofer:
      raise_error: true
      raise_error_excludes:
        404: ['https://github.com/*/archive/*']
        503: ['https://api.service.com/*']
        400: ['*']  # Ignore all 400 errors
```

**Page Exclusions**: Skip validation for specific pages
```yaml
plugins:
  - htmlproofer:
      ignore_pages:
        - draft-content/*
        - internal-docs/private.md
```

### Performance Optimization

**Skip External URLs**: Validate only internal links
```yaml
plugins:
  - htmlproofer:
      validate_external_urls: false
```

**Skip Downloads**: Don't download full content (faster)
```yaml
plugins:
  - htmlproofer:
      skip_downloads: true
```

**Template Validation**: Validate full page templates (slower but comprehensive)
```yaml
plugins:
  - htmlproofer:
      validate_rendered_template: true
```

## Constants and Patterns

```python { .api }
URL_TIMEOUT: float = 10.0
"""Timeout for HTTP requests in seconds."""

URL_HEADERS: Dict[str, str]
"""Default headers for HTTP requests including User-Agent and Accept-Language."""

NAME: str = "htmlproofer"
"""Plugin name used in logging."""

MARKDOWN_ANCHOR_PATTERN: Pattern[str]
"""Regex pattern to match markdown links with optional anchors."""

HEADING_PATTERN: Pattern[str]
"""Regex pattern to match markdown headings."""

HTML_LINK_PATTERN: Pattern[str]
"""Regex pattern to match HTML anchor tags with IDs."""

IMAGE_PATTERN: Pattern[str]
"""Regex pattern to match markdown image syntax."""

LOCAL_PATTERNS: List[Pattern[str]]
"""List of patterns to match local development URLs."""

ATTRLIST_ANCHOR_PATTERN: Pattern[str]
"""Regex pattern to match attr_list extension anchor syntax."""

ATTRLIST_PATTERN: Pattern[str]
"""Regex pattern to match attr_list extension syntax."""

EMOJI_PATTERN: Pattern[str]
"""Regex pattern to match emoji syntax in headings."""
```