# Preprocessor System

Foliant's preprocessor system provides content transformation capabilities for modifying Markdown before backend processing. Preprocessors use tag-based content processing to enable features like includes, diagram generation, conditional content, and custom transformations.

## Capabilities

### Base Preprocessor Class

Foundation class for all content preprocessors providing tag parsing, option handling, and common functionality.

```python { .api }
class BasePreprocessor:
    """Base preprocessor class that all preprocessors must inherit from."""

    defaults: dict = {}
    tags: tuple = ()

    def __init__(self, context: dict, logger: Logger, quiet=False, debug=False, options={}):
        """
        Initialize preprocessor with build context and options.

        Parameters:
        - context (dict): Build context containing project_path, config, target, backend
        - logger (Logger): Logger instance for processing messages
        - quiet (bool): Suppress output messages
        - debug (bool): Enable debug logging
        - options (dict): Preprocessor-specific configuration options
        """

    @staticmethod
    def get_options(options_string: str) -> Dict[str, OptionValue]:
        """
        Parse XML attribute string into typed options dictionary.

        Parameters:
        - options_string (str): String of XML-style attributes

        Returns:
        Dict[str, OptionValue]: Parsed options with proper types

        Example:
        'width="800" height="600" visible="true"' ->
        {'width': 800, 'height': 600, 'visible': True}
        """

    def apply(self):
        """
        Run preprocessor against project content.
        Must be implemented by each preprocessor.

        Raises:
        NotImplementedError: If not implemented by subclass
        """
```

### Unescape Preprocessor

Built-in preprocessor that handles escaped tag processing for nested tag scenarios.

```python { .api }
class Preprocessor(BasePreprocessor):
    """
    Internal preprocessor for unescaping escaped tags.
    Removes leading < from escaped tag definitions.
    """

    def process_escaped_tags(self, content: str) -> str:
        """
        Remove escape sequences from tag definitions.

        Parameters:
        - content (str): Markdown content with escaped tags

        Returns:
        str: Content with tags unescaped
        """

    def apply(self):
        """Process all .md files in working directory to unescape tags."""
```
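
The unescaping step can be pictured with a small standalone function. This is only a sketch: it assumes an escaped tag is written with a doubled opening bracket (`<<tag>...</tag>`), and both the `ESCAPED_TAG` pattern and the `unescape_tags` helper are hypothetical names, not part of the API above.

```python
import re

# Assumed escape convention: an escaped tag starts with a doubled "<",
# for example "<<custom>...</custom>". Illustrative pattern only.
ESCAPED_TAG = re.compile(
    r'<<(?P<tag>[\w-]+)(?P<options>\s[^<>]*)?>(?P<body>.*?)</(?P=tag)>',
    flags=re.DOTALL,
)


def unescape_tags(content: str) -> str:
    """Drop the extra leading '<' from every escaped tag in content."""
    return ESCAPED_TAG.sub(lambda match: match.group(0)[1:], content)


sample = 'Write <<custom style="callout">text</custom> to get a callout.'
print(unescape_tags(sample))
# Write <custom style="callout">text</custom> to get a callout.
```

Ordinary (unescaped) tags do not match the pattern, so they pass through unchanged.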

## Type Definitions

```python { .api }
OptionValue = int | float | bool | str

# Preprocessor context structure
PreprocessorContext = {
    'project_path': Path,  # Path to project directory
    'config': dict,        # Parsed configuration
    'target': str,         # Target format
    'backend': str         # Backend name
}

# Tag pattern structure for regex matching
TagPattern = {
    'tag': str,      # Tag name
    'options': str,  # Options string
    'body': str      # Tag content body
}
```
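
To make the `tag`, `options`, and `body` groups concrete, here is a minimal sketch of a regex built from a `tags` tuple. The `build_tag_pattern` helper is hypothetical, and the actual pattern compiled by the base class may differ in detail.

```python
import re


def build_tag_pattern(tags):
    """Compile a regex matching <tag options>body</tag> for the given tag names."""
    names = '|'.join(re.escape(tag) for tag in tags)
    return re.compile(
        # (?<!<) skips tags that are preceded by another "<"
        rf'(?<!<)<(?P<tag>{names})(?P<options>\s[^<>]*)?>(?P<body>.*?)</(?P=tag)>',
        flags=re.DOTALL,
    )


pattern = build_tag_pattern(('custom', 'transform'))
match = pattern.search('<custom style="callout">Hello</custom>')

print(match.group('tag'))      # custom
print(match.group('options'))  # ' style="callout"' (leading space included)
print(match.group('body'))     # Hello
```

When a tag has no attributes, the `options` group is `None`, which is why the examples in this document fall back to an empty string with `match.group('options') or ''`.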

## Usage Examples

### Custom Preprocessor Implementation

```python
from foliant.preprocessors.base import BasePreprocessor
import re

class CustomPreprocessor(BasePreprocessor):
    """Custom preprocessor for special content transformation."""

    defaults = {
        'format': 'html',
        'style': 'default'
    }
    tags = ('custom', 'transform')

    def apply(self):
        """Process all markdown files with custom tags."""
        for markdown_file in self.working_dir.rglob('*.md'):
            self.logger.debug(f'Processing {markdown_file}')

            with open(markdown_file, 'r', encoding='utf8') as f:
                content = f.read()

            # Process tags using inherited pattern
            content = self.pattern.sub(self._process_tag, content)

            with open(markdown_file, 'w', encoding='utf8') as f:
                f.write(content)

    def _process_tag(self, match):
        """Process individual tag occurrence."""
        tag = match.group('tag')
        options_str = match.group('options') or ''
        body = match.group('body')

        # Parse options
        options = self.get_options(options_str)
        final_options = {**self.defaults, **self.options, **options}

        # Transform content based on tag and options
        if tag == 'custom':
            return self._transform_custom(body, final_options)
        elif tag == 'transform':
            return self._transform_content(body, final_options)

        return match.group(0)  # Return unchanged if not handled

    def _transform_custom(self, content, options):
        """Transform custom tag content."""
        format_type = options['format']
        style = options['style']

        if format_type == 'html':
            return f'<div class="custom-{style}">{content}</div>'
        else:
            return f'[{style.upper()}]: {content}'

    def _transform_content(self, content, options):
        """Transform generic content."""
        return content.upper() if options.get('uppercase') else content
```
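
The merge expression in `_process_tag` decides which source of options wins. The short sketch below uses made-up values to show the resulting precedence; the variable names are illustrative only.

```python
# Hypothetical values for illustration only
defaults = {'format': 'html', 'style': 'default'}  # class-level defaults
config_options = {'style': 'modern'}               # from the project configuration
tag_options = {'style': 'callout'}                 # parsed from the tag's attributes

# Later mappings override earlier ones, so the precedence is:
# tag attributes > configuration options > defaults
final_options = {**defaults, **config_options, **tag_options}

print(final_options)
# {'format': 'html', 'style': 'callout'}
```

Keys that appear only in the defaults (here `format`) survive the merge untouched.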

### Tag-based Content Processing

Example Markdown with custom tags:
```markdown
# My Document

<custom format="html" style="highlight">
Important content here
</custom>

<transform uppercase="true">
This text will be uppercase
</transform>

<custom style="callout">
This is a callout box
</custom>
```

Preprocessor usage:
```python
from pathlib import Path
import logging

# Set up context
context = {
    'project_path': Path('./project'),
    'config': {'title': 'Test'},
    'target': 'html',
    'backend': 'mkdocs'
}

# Create and run preprocessor
preprocessor = CustomPreprocessor(
    context=context,
    logger=logging.getLogger(),
    options={'format': 'html', 'style': 'modern'}
)

preprocessor.apply()
```

### Option Parsing

```python
from foliant.preprocessors.base import BasePreprocessor

# Parse XML-style options
options_string = 'width="800" height="600" visible="true" title="My Chart"'
options = BasePreprocessor.get_options(options_string)

print(options)
# Output: {'width': 800, 'height': 600, 'visible': True, 'title': 'My Chart'}

# Handle empty options
empty_options = BasePreprocessor.get_options('')
print(empty_options)  # Output: {}
```
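
For readers curious how attribute strings could become typed values, the following is a rough standalone approximation. It is not the library's implementation: `parse_options` and `coerce` are hypothetical helpers, and the coercion rules (booleans, then integers, then floats, else string) are an assumption chosen to match the outputs shown above.

```python
import re

# Matches key="value" pairs; assumes values are always double-quoted
ATTRIBUTE = re.compile(r'(?P<key>[\w-]+)="(?P<value>[^"]*)"')


def coerce(value: str):
    """Convert a raw attribute value to bool, int, or float when possible."""
    lowered = value.lower()
    if lowered in ('true', 'false'):
        return lowered == 'true'
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value  # Fall back to the original string


def parse_options(options_string: str) -> dict:
    """Parse an XML-style attribute string into a dict of typed values."""
    return {
        match.group('key'): coerce(match.group('value'))
        for match in ATTRIBUTE.finditer(options_string)
    }


print(parse_options('width="800" ratio="1.5" visible="true" title="My Chart"'))
# {'width': 800, 'ratio': 1.5, 'visible': True, 'title': 'My Chart'}
```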

### Complex Preprocessor with File Operations

```python
from foliant.preprocessors.base import BasePreprocessor
import hashlib
import subprocess
from pathlib import Path

class DiagramPreprocessor(BasePreprocessor):
    """Preprocessor for generating diagrams from text."""

    defaults = {
        'format': 'png',
        'theme': 'default',
        'output_dir': 'images'
    }
    tags = ('plantuml', 'mermaid')

    def apply(self):
        """Process diagram tags in all markdown files."""
        # Create output directory
        output_dir = self.working_dir / self.options['output_dir']
        output_dir.mkdir(exist_ok=True)

        for markdown_file in self.working_dir.rglob('*.md'):
            content = self._process_file(markdown_file, output_dir)

            with open(markdown_file, 'w', encoding='utf8') as f:
                f.write(content)

    def _process_file(self, file_path, output_dir):
        """Process single markdown file."""
        with open(file_path, 'r', encoding='utf8') as f:
            content = f.read()

        return self.pattern.sub(
            lambda m: self._process_diagram(m, output_dir, file_path.stem),
            content
        )

    def _process_diagram(self, match, output_dir, file_stem):
        """Process individual diagram tag."""
        tag = match.group('tag')
        options_str = match.group('options') or ''
        body = match.group('body')

        options = {**self.defaults, **self.options, **self.get_options(options_str)}

        # Generate a filename that is stable across builds. The built-in
        # hash() is salted per process for strings, so hashlib is used instead.
        digest = hashlib.sha256((body + str(options)).encode('utf8')).hexdigest()[:16]
        filename = f"{file_stem}_{tag}_{digest}.{options['format']}"
        output_path = output_dir / filename

        # Generate diagram
        if tag == 'plantuml':
            self._generate_plantuml(body, output_path, options)
        elif tag == 'mermaid':
            self._generate_mermaid(body, output_path, options)

        # Return markdown image reference (path relative to the working directory)
        return f"![{tag} diagram]({output_path.relative_to(self.working_dir)})"

    def _generate_plantuml(self, source, output_path, options):
        """Generate PlantUML diagram."""
        # With -pipe, PlantUML reads the source from stdin and writes the
        # rendered image to stdout, so the result is saved to output_path here.
        result = subprocess.run(
            ['plantuml', '-t' + options['format'], '-pipe'],
            input=source.encode('utf8'),
            capture_output=True,
            check=True
        )
        output_path.write_bytes(result.stdout)

    def _generate_mermaid(self, source, output_path, options):
        """Generate Mermaid diagram."""
        subprocess.run([
            'mmdc',
            '-i', '-',
            '-o', str(output_path),
            '-t', options['theme']
        ], input=source, text=True, check=True)
```

### Preprocessor Configuration

Example `foliant.yml` preprocessor configuration:
```yaml
title: My Project

preprocessors:
  - includes
  - plantuml:
      format: svg
      theme: dark
      server_url: http://localhost:8080
  - custom:
      style: modern
      format: html
      uppercase: false
```
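
Entries in the `preprocessors` list come in two shapes: a bare name, or a single-key mapping from name to options. As a sketch of how such a list could be normalized after the YAML is parsed, the hypothetical `normalize_preprocessors` helper below turns both shapes into `(name, options)` pairs. It illustrates the data shapes only and is not taken from Foliant itself.

```python
def normalize_preprocessors(entries):
    """Turn a mixed list of names and single-key mappings into (name, options) pairs."""
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare name: no options were configured
            normalized.append((entry, {}))
        elif isinstance(entry, dict) and len(entry) == 1:
            # Single-key mapping: the key is the name, the value holds the options
            (name, options), = entry.items()
            normalized.append((name, options or {}))
        else:
            raise ValueError(f'Unsupported preprocessor entry: {entry!r}')
    return normalized


# The `preprocessors` list from the configuration above, as parsed Python data
parsed = [
    'includes',
    {'plantuml': {'format': 'svg', 'theme': 'dark', 'server_url': 'http://localhost:8080'}},
    {'custom': {'style': 'modern', 'format': 'html', 'uppercase': False}},
]

for name, options in normalize_preprocessors(parsed):
    print(name, options)
```

The options mapping of each pair corresponds to what a preprocessor receives through its `options` argument.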

### Conditional Preprocessor

```python
from foliant.preprocessors.base import BasePreprocessor

class ConditionalPreprocessor(BasePreprocessor):
    """Preprocessor for conditional content inclusion."""

    defaults = {'target': 'all'}
    tags = ('if', 'unless', 'target')

    def apply(self):
        """Remove or keep content based on conditions."""
        current_target = self.context['target']

        for markdown_file in self.working_dir.rglob('*.md'):
            with open(markdown_file, 'r', encoding='utf8') as f:
                content = f.read()

            # Process conditional tags
            content = self._process_conditionals(content, current_target)

            with open(markdown_file, 'w', encoding='utf8') as f:
                f.write(content)

    def _process_conditionals(self, content, current_target):
        """Process conditional tags based on current build target."""
        def process_tag(match):
            tag = match.group('tag')
            options_str = match.group('options') or ''
            body = match.group('body')

            options = self.get_options(options_str)
            target_condition = options.get('target', 'all')

            if tag == 'if':
                # Include content if target matches
                if target_condition == 'all' or target_condition == current_target:
                    return body
                else:
                    return ''
            elif tag == 'unless':
                # Include content unless target matches
                if target_condition != current_target:
                    return body
                else:
                    return ''
            elif tag == 'target':
                # Include only for specific target
                if target_condition == current_target:
                    return body
                else:
                    return ''

            return match.group(0)

        return self.pattern.sub(process_tag, content)
```

Usage in Markdown:
```markdown
# Documentation

<if target="html">
This content only appears in HTML builds.
</if>

<unless target="pdf">
This content appears in all formats except PDF.
</unless>

<target target="pdf">
PDF-specific content here.
</target>
```
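
To check which of these blocks survive a given build, the decision logic can be isolated in a pure function. The `should_keep` helper below is a hypothetical restatement of the branches in `_process_conditionals`, written so it runs without Foliant installed.

```python
def should_keep(tag: str, target_condition: str, current_target: str) -> bool:
    """Decide whether a conditional block's body is kept for the current target."""
    if tag == 'if':
        return target_condition in ('all', current_target)
    if tag == 'unless':
        return target_condition != current_target
    if tag == 'target':
        return target_condition == current_target
    raise ValueError(f'Unknown conditional tag: {tag!r}')


# (tag, target attribute) pairs taken from the Markdown example above
blocks = [('if', 'html'), ('unless', 'pdf'), ('target', 'pdf')]

for current_target in ('html', 'pdf'):
    kept = [tag for tag, condition in blocks if should_keep(tag, condition, current_target)]
    print(current_target, kept)
# html ['if', 'unless']
# pdf ['target']
```

An HTML build keeps the `if` and `unless` bodies and drops the PDF-only block. A PDF build keeps only the `target` body.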