# Content Reading

Reader classes for parsing different markup formats including Markdown, reStructuredText, and HTML. Readers extract metadata, process content, and convert markup to HTML for theme rendering.

## Capabilities

### Readers Manager

Central reader manager that coordinates different format readers and provides caching functionality for improved performance.

```python { .api }
class Readers(FileStampDataCacher):
    """
    Content reader manager with caching support.

    Parameters:
    - settings (dict): Site configuration dictionary
    - cache_name (str, optional): Cache identifier for file caching
    """
    def __init__(self, settings: dict, cache_name: str = ""): ...

    def read_file(
        self,
        base_path: str,
        path: str,
        content_class=Content,
        fmt: str = None
    ) -> Content:
        """
        Read and parse a content file.

        Parameters:
        - base_path (str): Base directory path
        - path (str): Relative file path
        - content_class (class, optional): Content class to instantiate (default: Content)
        - fmt (str, optional): Force specific format reader

        Returns:
        Content: Parsed content object with metadata and HTML content
        """

    # Available readers (populated from settings)
    readers: dict[str, BaseReader]  # Format -> Reader mapping
```
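
The caching idea can be illustrated with a small, stdlib-only sketch. This is not Pelican's `FileStampDataCacher`; `StampCache`, `parse`, and `calls` are hypothetical names, and the file's modification time is assumed to be the stamp. A parsed result is reused while the stamp is unchanged and recomputed once it differs.

```python
import os
import tempfile


class StampCache:
    """Hypothetical cache keyed by path, invalidated when the file's mtime changes."""

    def __init__(self):
        self._entries = {}  # path -> (stamp, data)

    def get(self, path, parse):
        stamp = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]  # stamp unchanged: reuse the parsed data
        data = parse(path)
        self._entries[path] = (stamp, data)
        return data


calls = []  # records every real parse, so cache hits are visible


def parse(path):
    """Stand-in for an expensive reader call."""
    calls.append(path)
    with open(path, encoding="utf-8") as f:
        return f.read().upper()


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "post.md")
    with open(source, "w", encoding="utf-8") as f:
        f.write("hello")

    cache = StampCache()
    first = cache.get(source, parse)
    second = cache.get(source, parse)  # served from the cache
    calls_before_touch = len(calls)

    # Simulate an edit by moving the modification time forward one second.
    st = os.stat(source)
    os.utime(source, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    third = cache.get(source, parse)  # stamp changed: parsed again
```

The sketch keeps its cache in memory only; a persistent cache would also need to store the entries on disk between runs.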

### Base Reader Class

Foundation class for all content format readers providing common functionality for metadata extraction and content processing.

```python { .api }
class BaseReader:
    """
    Base class for content format readers.

    Parameters:
    - settings (dict): Site configuration dictionary
    """
    def __init__(self, settings: dict): ...

    enabled: bool = True  # Whether this reader is enabled
    file_extensions: list[str]  # Supported file extensions

    def read(self, source_path: str) -> tuple[str, dict]:
        """
        Read and parse content file.

        Parameters:
        - source_path (str): Path to content file

        Returns:
        tuple: (HTML content string, metadata dictionary)
        """

    def process_metadata(self, name: str, value: str) -> Any:
        """
        Process individual metadata field.

        Parameters:
        - name (str): Metadata field name
        - value (str): Raw metadata value

        Returns:
        Any: Processed metadata value
        """
```

### reStructuredText Reader

Reader for reStructuredText (.rst) files using the docutils library for parsing and HTML generation.

```python { .api }
class RstReader(BaseReader):
    """
    reStructuredText content reader.

    Supports:
    - Standard reStructuredText syntax
    - Custom Pelican directives (code highlighting, etc.)
    - Metadata extraction from docutils meta fields
    - Math rendering via MathJax
    - Custom role and directive registration
    """

    file_extensions: list[str] = ['rst']

    def read(self, source_path: str) -> tuple[str, dict]:
        """
        Parse reStructuredText file and extract content/metadata.

        Uses docutils for parsing with Pelican-specific settings and directives.
        Supports custom roles and directives for enhanced functionality.
        """
```

### Markdown Reader

Reader for Markdown (.md, .markdown, .mkd) files using the Python-Markdown library with configurable extensions.

```python { .api }
class MarkdownReader(BaseReader):
    """
    Markdown content reader.

    Supports:
    - Standard Markdown syntax
    - Configurable Python-Markdown extensions
    - Metadata extraction from YAML front matter or meta extension
    - Code highlighting via Pygments
    - Table support, footnotes, and other extensions
    """

    file_extensions: list[str] = ['md', 'markdown', 'mkd']

    def read(self, source_path: str) -> tuple[str, dict]:
        """
        Parse Markdown file and extract content/metadata.

        Uses Python-Markdown with configurable extensions.
        Metadata can be extracted from YAML front matter or meta extension.
        """
```

### HTML Reader

Reader for HTML (.html, .htm) files that extracts metadata from HTML meta tags and preserves HTML content.

```python { .api }
class HTMLReader(BaseReader):
    """
    HTML content reader.

    Supports:
    - Raw HTML content preservation
    - Metadata extraction from HTML meta tags
    - Title extraction from <title> tag
    - Custom metadata via <meta> tags
    """

    file_extensions: list[str] = ['html', 'htm']

    def read(self, source_path: str) -> tuple[str, dict]:
        """
        Parse HTML file and extract content/metadata.

        Extracts metadata from HTML meta tags and preserves HTML content as-is.
        Useful for importing existing HTML content or custom layouts.
        """
```
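
To make the meta-tag extraction concrete, here is a minimal sketch built on the standard library's `html.parser`. It is not the `HTMLReader` implementation; `MetaExtractor` is a hypothetical helper that only collects the `<title>` text and `name`/`content` pairs from `<meta>` tags.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Hypothetical helper: collects <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.metadata[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


page = """<html>
<head>
<title>My Article Title</title>
<meta name="date" content="2023-01-15 10:30">
<meta name="tags" content="tutorial, programming">
</head>
<body><p>Content goes here...</p></body>
</html>"""

extractor = MetaExtractor()
extractor.feed(page)
extractor.close()
metadata = extractor.metadata
```

The values come out as raw strings; turning `date` into a datetime or `tags` into a list would be a separate processing step.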

## Reader Configuration

### Markdown Configuration

Configure Markdown reader behavior in settings:

```python
# In pelicanconf.py
MARKDOWN = {
    'extension_configs': {
        'markdown.extensions.codehilite': {'css_class': 'highlight'},
        'markdown.extensions.extra': {},
        'markdown.extensions.meta': {},
        'markdown.extensions.toc': {'permalink': True},
    },
    'output_format': 'html5',
}
```

### reStructuredText Configuration

Configure reStructuredText reader behavior:

```python
# In pelicanconf.py
DOCUTILS_SETTINGS = {
    'smart_quotes': True,
    'initial_header_level': 2,
    'syntax_highlight': 'short',
    'input_encoding': 'utf-8',
    'math_output': 'MathJax',
}
```

### Custom Readers

Register custom readers for additional formats by mapping file extensions to reader classes:

```python
# In pelicanconf.py
# `path.to.custom` is a placeholder for the module that defines your readers
from path.to.custom import OrgModeReader, TextReader

READERS = {
    'txt': TextReader,
    'org': OrgModeReader,
}
```
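
The following sketch shows one way an extension-to-reader mapping could be merged with built-in defaults and then used to pick a reader for a file. It is an illustration under assumptions, not Pelican's code: the stub classes, `build_registry`, and `reader_for` are hypothetical, mapping values are assumed to be classes, and a `None` value is assumed to disable an extension.

```python
import os


class MarkdownStub:
    """Stand-in for a Markdown reader class."""
    file_extensions = ["md", "markdown", "mkd"]


class HtmlStub:
    """Stand-in for an HTML reader class."""
    file_extensions = ["html", "htm"]


class OrgStub:
    """Stand-in for a user-supplied reader class."""
    file_extensions = ["org"]


def build_registry(default_classes, overrides):
    """Map each extension to a reader class, then apply user overrides."""
    registry = {}
    for cls in default_classes:
        for ext in cls.file_extensions:
            registry[ext] = cls
    registry.update(overrides)
    # Assumption: a None value means "ignore files with this extension".
    return {ext: cls for ext, cls in registry.items() if cls is not None}


def reader_for(path, registry):
    """Pick a reader class from the file extension, or None if unsupported."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return registry.get(ext)


registry = build_registry(
    [MarkdownStub, HtmlStub],
    {"org": OrgStub, "html": None},
)
```

With this registry, `reader_for("notes.org", registry)` returns `OrgStub`, while `.html` files are skipped because their entry was overridden with `None`.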

## Metadata Processing

### Common Metadata Fields

All readers process these standard metadata fields:

- `title`: Content title
- `date`: Publication date (ISO format or custom format)
- `modified`: Last modification date
- `category`: Content category (articles only)
- `tags`: Comma-separated tags (articles only)
- `slug`: URL slug (auto-generated if not provided)
- `author`: Author name
- `authors`: Multiple authors (comma-separated)
- `summary`: Content summary/description
- `lang`: Content language code
- `status`: Content status (published, draft, hidden)
- `template`: Custom template name
- `save_as`: Custom output file path
- `url`: Custom URL path
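
As a rough illustration of per-field processing, the sketch below normalizes a raw metadata dictionary using a table of processors. The names `PROCESSORS`, `parse_date`, `split_list`, and `normalize` are hypothetical and the two date formats are assumptions; Pelican's own processors differ in detail.

```python
from datetime import datetime


def parse_date(value):
    """Try a few common formats; raise ValueError if none match."""
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def split_list(value):
    """Split a comma-separated string, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Hypothetical per-field processors; any other field is only stripped
# of surrounding whitespace.
PROCESSORS = {
    "date": parse_date,
    "modified": parse_date,
    "tags": split_list,
    "authors": split_list,
}


def normalize(raw):
    """Lower-case field names and apply the matching processor to each value."""
    result = {}
    for name, value in raw.items():
        key = name.lower()
        processor = PROCESSORS.get(key, str.strip)
        result[key] = processor(value)
    return result


metadata = normalize({
    "Title": " My Article Title ",
    "Date": "2023-01-15 10:30",
    "Tags": "tutorial, programming,",
})
```

Keeping the processors in a dictionary means a new field type can be supported by adding one entry rather than another branch.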

### Metadata Format Examples

#### Markdown with YAML Front Matter

```markdown
---
title: My Article Title
date: 2023-01-15 10:30
category: Python
tags: tutorial, programming
author: John Doe
summary: A comprehensive guide to Python programming.
---

# Article Content

Content goes here...
```

#### Markdown with Meta Extension

```markdown
Title: My Article Title
Date: 2023-01-15 10:30
Category: Python
Tags: tutorial, programming
Author: John Doe
Summary: A comprehensive guide to Python programming.

# Article Content

Content goes here...
```
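
A header block like the one above can be separated from the body with a few lines of plain Python. This is a simplified sketch, not the Python-Markdown Meta extension: `split_meta_header` is a hypothetical helper that assumes one `Key: value` pair per line and ignores multi-line values.

```python
def split_meta_header(text):
    """Split 'Key: value' header lines from the body at the first blank line."""
    lines = text.splitlines()
    metadata = {}
    index = 0
    for index, line in enumerate(lines):
        if not line.strip():
            break  # first blank line ends the header
        key, sep, value = line.partition(":")
        if not sep:
            break  # not a 'Key: value' line, so treat it as body
        metadata[key.strip().lower()] = value.strip()
    else:
        index = len(lines)  # no blank line found: everything was header
    body = "\n".join(lines[index:]).lstrip("\n")
    return metadata, body


document = """Title: My Article Title
Date: 2023-01-15 10:30
Tags: tutorial, programming

# Article Content

Content goes here..."""

metadata, body = split_meta_header(document)
```

Splitting on the first colon only is what keeps the time in `Date: 2023-01-15 10:30` intact.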

#### reStructuredText

```rst
My Article Title
================

:date: 2023-01-15 10:30
:category: Python
:tags: tutorial, programming
:author: John Doe
:summary: A comprehensive guide to Python programming.

Article Content
---------------

Content goes here...
```

#### HTML

```html
<html>
<head>
    <title>My Article Title</title>
    <meta name="date" content="2023-01-15 10:30">
    <meta name="category" content="Python">
    <meta name="tags" content="tutorial, programming">
    <meta name="author" content="John Doe">
    <meta name="summary" content="A comprehensive guide to Python programming.">
</head>
<body>
    <h1>Article Content</h1>
    <p>Content goes here...</p>
</body>
</html>
```

## Usage Examples

### Using Readers Directly

```python
from pelican.contents import Article
from pelican.readers import Readers
from pelican.settings import read_settings

# Load settings and create readers
settings = read_settings('pelicanconf.py')
readers = Readers(settings)

# Read a Markdown file
content = readers.read_file(
    base_path='content',
    path='articles/my-post.md',
    content_class=Article
)

print(content.title)     # Article title
print(content.content)   # HTML content
print(content.metadata)  # Raw metadata dictionary
```

### Custom Reader Implementation

```python
import json

from pelican.readers import BaseReader


class JsonReader(BaseReader):
    """Custom reader for JSON content files."""

    file_extensions = ['json']

    def read(self, source_path):
        """Read JSON file and extract content/metadata."""
        with open(source_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        # Extract content and metadata
        content = data.get('content', '')
        metadata = {k: v for k, v in data.items() if k != 'content'}

        # Process metadata using the base class method, which takes the
        # field name and a raw string and returns the processed value.
        # Values are assumed to be scalars; a JSON list (e.g. tags) would
        # need joining into a comma-separated string first.
        processed_metadata = {}
        for name, value in metadata.items():
            processed_metadata[name] = self.process_metadata(name, str(value))

        return content, processed_metadata

# Register custom reader
# In pelicanconf.py:
# READERS = {'json': JsonReader}
```

### Reader Integration with Generators

```python
from pelican.contents import Article
from pelican.generators import Generator


class CustomGenerator(Generator):
    """Generator that uses readers to process content."""

    def generate_context(self):
        """Generate content using readers."""
        content_files = self.get_content_files()

        for content_file in content_files:
            # Use readers to parse file
            content = self.readers.read_file(
                base_path=self.path,
                path=content_file,
                content_class=Article
            )

            # Process content
            self.process_content(content)

    def get_content_files(self):
        """Get list of content files to process."""
        # Implementation depends on file discovery strategy
        return []

    def process_content(self, content):
        """Process parsed content."""
        # Add to context or perform custom processing
        pass
```

### Metadata Processing Customization

```python
from datetime import datetime

from pelican.readers import BaseReader


class CustomReader(BaseReader):
    """Reader with custom metadata processing."""

    def process_metadata(self, name, value):
        """Custom metadata processing logic."""
        # The base method returns the processed value for this field
        value = super().process_metadata(name, value)

        # Custom date parsing, applied only if the value is still a string
        if name == 'date':
            if isinstance(value, str):
                try:
                    value = datetime.strptime(value, '%Y-%m-%d %H:%M')
                except ValueError:
                    value = datetime.strptime(value, '%Y-%m-%d')

        # Custom tag processing, applied only if the value is still a string
        elif name == 'tags':
            if isinstance(value, str):
                value = [tag.strip() for tag in value.split(',')]

        return value
```