Tessl Tile for pypi/foliant@1.0.0

# Preprocessor System

Foliant's preprocessor system transforms Markdown content before it is handed to a backend. Preprocessors operate on XML-style tags in the source and enable features such as includes, diagram generation, conditional content, and custom transformations.

## Capabilities

### Base Preprocessor Class

Foundation class for all content preprocessors. It provides tag parsing, option handling, and common functionality.

```python { .api }
class BasePreprocessor:
    """Base preprocessor class that all preprocessors must inherit from."""

    defaults: dict = {}
    tags: tuple = ()

    def __init__(self, context: dict, logger: Logger, quiet=False, debug=False, options={}):
        """
        Initialize preprocessor with build context and options.

        Parameters:
        - context (dict): Build context containing project_path, config, target, backend
        - logger (Logger): Logger instance for processing messages
        - quiet (bool): Suppress output messages
        - debug (bool): Enable debug logging
        - options (dict): Preprocessor-specific configuration options
        """

    @staticmethod
    def get_options(options_string: str) -> Dict[str, OptionValue]:
        """
        Parse XML attribute string into typed options dictionary.

        Parameters:
        - options_string (str): String of XML-style attributes

        Returns:
        Dict[str, OptionValue]: Parsed options with proper types

        Example:
        'width="800" height="600" visible="true"' ->
        {'width': 800, 'height': 600, 'visible': True}
        """

    def apply(self):
        """
        Run preprocessor against project content.
        Must be implemented by each preprocessor.

        Raises:
        NotImplementedError: If not implemented by subclass
        """
```
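
To make the `get_options` contract above more concrete, here is a minimal standalone sketch of how an XML-style attribute string could be turned into typed values. The names `parse_options`, `coerce`, and `ATTRIBUTE`, as well as the coercion rules, are assumptions made for illustration. This is not Foliant's actual implementation, which may accept more value types.

```python
import re

# Hypothetical pattern (not taken from Foliant): key="value" or key='value' pairs
ATTRIBUTE = re.compile(
    r'(?P<key>[A-Za-z_][\w\-.]*)=(?P<quote>["\'])(?P<value>.*?)(?P=quote)',
    re.DOTALL,
)


def coerce(value: str):
    """Convert a raw attribute value to bool, int, or float, else keep the string."""
    lowered = value.lower()
    if lowered in ('true', 'false'):
        return lowered == 'true'
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_options(options_string: str) -> dict:
    """Return a dict of typed options parsed from an attribute string."""
    return {
        match.group('key'): coerce(match.group('value'))
        for match in ATTRIBUTE.finditer(options_string)
    }


print(parse_options('width="800" height="600" visible="true"'))
```

An empty string yields an empty dictionary because the pattern finds no attribute pairs.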

### Unescape Preprocessor

Built-in preprocessor that handles escaped tag processing for nested tag scenarios.

```python { .api }
class Preprocessor(BasePreprocessor):
    """
    Internal preprocessor for unescaping escaped tags.
    Removes leading < from escaped tag definitions.
    """

    def process_escaped_tags(self, content: str) -> str:
        """
        Remove escape sequences from tag definitions.

        Parameters:
        - content (str): Markdown content with escaped tags

        Returns:
        str: Content with tags unescaped
        """

    def apply(self):
        """Process all .md files in working directory to unescape tags."""
```
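
As a rough illustration of the unescaping step, the sketch below strips one leading `<` from anything that looks like an escaped opening tag (`<<name ...>`). The regular expression and the function name `unescape_tags` are assumptions made for this example; the pattern Foliant actually uses may differ.

```python
import re

# Hypothetical pattern: one extra "<" directly in front of an opening tag
ESCAPED_TAG = re.compile(r'<(?P<tag><[A-Za-z_][\w\-]*(?:\s[^<>]*)?>)')


def unescape_tags(content: str) -> str:
    """Drop the escaping "<" so that the tag appears literally in the output."""
    return ESCAPED_TAG.sub(lambda match: match.group('tag'), content)


print(unescape_tags('Use <<include src="intro.md"> to pull in a file.'))
```

Ordinary tags and unrelated `<<` sequences are left untouched, since the pattern requires a tag name immediately after the second `<`.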

## Type Definitions

```python { .api }
OptionValue = int | float | bool | str

# Preprocessor context structure
PreprocessorContext = {
    'project_path': Path,    # Path to project directory
    'config': dict,          # Parsed configuration
    'target': str,           # Target format
    'backend': str           # Backend name
}

# Tag pattern structure for regex matching
TagPattern = {
    'tag': str,              # Tag name
    'options': str,          # Options string
    'body': str              # Tag content body
}
```
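
The `TagPattern` fields correspond to named groups in a regular expression. The sketch below builds such a pattern for a tuple of tag names. `build_tag_pattern` is a hypothetical helper, and the exact expression Foliant compiles for `self.pattern` is not shown in this document, so treat this as one plausible shape rather than the library's definition.

```python
import re


def build_tag_pattern(tags: tuple) -> re.Pattern:
    """Compile a pattern exposing the named groups 'tag', 'options', and 'body'."""
    names = '|'.join(re.escape(tag) for tag in tags)
    return re.compile(
        rf'(?<!<)<(?P<tag>{names})'      # opening tag, not preceded by an escaping "<"
        r'(?:\s(?P<options>[^<>]*))?>'    # optional attribute string
        r'(?P<body>.*?)</(?P=tag)>',      # lazy body up to the matching closing tag
        flags=re.DOTALL,
    )


pattern = build_tag_pattern(('custom', 'transform'))
match = pattern.search('<custom style="callout">Hello</custom>')
print(match.group('tag'), match.group('options'), match.group('body'))
```

When a tag has no attributes, the `options` group is `None`, which is why the examples in this document fall back to an empty string with `match.group('options') or ''`.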

## Usage Examples

### Custom Preprocessor Implementation

```python
from foliant.preprocessors.base import BasePreprocessor
import re

class CustomPreprocessor(BasePreprocessor):
    """Custom preprocessor for special content transformation."""

    defaults = {
        'format': 'html',
        'style': 'default'
    }
    tags = ('custom', 'transform')

    def apply(self):
        """Process all markdown files with custom tags."""
        for markdown_file in self.working_dir.rglob('*.md'):
            self.logger.debug(f'Processing {markdown_file}')

            with open(markdown_file, 'r', encoding='utf8') as f:
                content = f.read()

            # Process tags using inherited pattern
            content = self.pattern.sub(self._process_tag, content)

            with open(markdown_file, 'w', encoding='utf8') as f:
                f.write(content)

    def _process_tag(self, match):
        """Process individual tag occurrence."""
        tag = match.group('tag')
        options_str = match.group('options') or ''
        body = match.group('body')

        # Parse options
        options = self.get_options(options_str)
        final_options = {**self.defaults, **self.options, **options}

        # Transform content based on tag and options
        if tag == 'custom':
            return self._transform_custom(body, final_options)
        elif tag == 'transform':
            return self._transform_content(body, final_options)

        return match.group(0)  # Return unchanged if not handled

    def _transform_custom(self, content, options):
        """Transform custom tag content."""
        format_type = options['format']
        style = options['style']

        if format_type == 'html':
            return f'<div class="custom-{style}">{content}</div>'
        else:
            return f'[{style.upper()}]: {content}'

    def _transform_content(self, content, options):
        """Transform generic content."""
        return content.upper() if options.get('uppercase') else content
```

### Tag-based Content Processing

Example Markdown with custom tags:
```markdown
# My Document

<custom format="html" style="highlight">
Important content here
</custom>

<transform uppercase="true">
This text will be uppercase
</transform>

<custom style="callout">
This is a callout box
</custom>
```

Preprocessor usage:
```python
from pathlib import Path
import logging

# Set up context
context = {
    'project_path': Path('./project'),
    # 'tmp_dir' is included on the assumption that BasePreprocessor derives
    # working_dir from config['tmp_dir']; the directory name is illustrative
    'config': {'title': 'Test', 'tmp_dir': '__folianttmp__'},
    'target': 'html',
    'backend': 'mkdocs'
}

# Create and run preprocessor (CustomPreprocessor is the class defined above)
preprocessor = CustomPreprocessor(
    context=context,
    logger=logging.getLogger(),
    options={'format': 'html', 'style': 'modern'}
)

preprocessor.apply()
```

### Option Parsing

```python
from foliant.preprocessors.base import BasePreprocessor

# Parse XML-style options
options_string = 'width="800" height="600" visible="true" title="My Chart"'
options = BasePreprocessor.get_options(options_string)

print(options)
# Output: {'width': 800, 'height': 600, 'visible': True, 'title': 'My Chart'}

# Handle empty options
empty_options = BasePreprocessor.get_options('')
print(empty_options)  # Output: {}
```

### Complex Preprocessor with File Operations

```python
from foliant.preprocessors.base import BasePreprocessor
import hashlib
import subprocess
from pathlib import Path

class DiagramPreprocessor(BasePreprocessor):
    """Preprocessor for generating diagrams from text."""

    defaults = {
        'format': 'png',
        'theme': 'default',
        'output_dir': 'images'
    }
    tags = ('plantuml', 'mermaid')

    def apply(self):
        """Process diagram tags in all markdown files."""
        # Create output directory (including missing parents)
        output_dir = self.working_dir / self.options['output_dir']
        output_dir.mkdir(parents=True, exist_ok=True)

        for markdown_file in self.working_dir.rglob('*.md'):
            content = self._process_file(markdown_file, output_dir)

            with open(markdown_file, 'w', encoding='utf8') as f:
                f.write(content)

    def _process_file(self, file_path, output_dir):
        """Process single markdown file."""
        with open(file_path, 'r', encoding='utf8') as f:
            content = f.read()

        return self.pattern.sub(
            lambda m: self._process_diagram(m, output_dir, file_path.stem),
            content
        )

    def _process_diagram(self, match, output_dir, file_stem):
        """Process individual diagram tag."""
        tag = match.group('tag')
        options_str = match.group('options') or ''
        body = match.group('body')

        options = {**self.defaults, **self.options, **self.get_options(options_str)}

        # Generate a filename that is stable across runs; the built-in hash()
        # is salted per process for strings, so a digest is used instead
        digest = hashlib.md5((body + str(options)).encode('utf8')).hexdigest()
        filename = f"{file_stem}_{tag}_{digest}.{options['format']}"
        output_path = output_dir / filename

        # Generate diagram
        if tag == 'plantuml':
            self._generate_plantuml(body, output_path, options)
        elif tag == 'mermaid':
            self._generate_mermaid(body, output_path, options)

        # Return markdown image reference (path is relative to working_dir)
        return f"![Diagram]({output_path.relative_to(self.working_dir)})"

    def _generate_plantuml(self, source, output_path, options):
        """Generate PlantUML diagram."""
        # -pipe makes PlantUML read the source from stdin and write the image
        # to stdout, so the result can be saved under the chosen filename
        with open(output_path, 'wb') as image_file:
            subprocess.run(
                ['plantuml', '-t' + options['format'], '-pipe'],
                input=source.encode('utf8'),
                stdout=image_file,
                check=True
            )

    def _generate_mermaid(self, source, output_path, options):
        """Generate Mermaid diagram."""
        subprocess.run([
            'mmdc',
            '-i', '-',
            '-o', str(output_path),
            '-t', options['theme']
        ], input=source, text=True, check=True)
```

### Preprocessor Configuration

Example `foliant.yml` preprocessor configuration:
```yaml
title: My Project

preprocessors:
  - includes
  - plantuml:
      format: svg
      theme: dark
      server_url: http://localhost:8080
  - custom:
      style: modern
      format: html
      uppercase: false
```
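
Each entry in the `preprocessors` list above is either a bare name or a single-key mapping from a name to its options. The sketch below shows one way such entries could be normalized into `(name, options)` pairs once the YAML has been loaded. `normalize_preprocessors` is a hypothetical helper, not a Foliant API, and the parsed config is written as a Python literal to keep the example free of dependencies.

```python
def normalize_preprocessors(entries: list) -> list:
    """Turn a mixed list of names and single-key dicts into (name, options) pairs."""
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare name: no options were given
            normalized.append((entry, {}))
        elif isinstance(entry, dict) and len(entry) == 1:
            name, options = next(iter(entry.items()))
            normalized.append((name, options or {}))
        else:
            raise ValueError(f'Unsupported preprocessor entry: {entry!r}')
    return normalized


# The structure a YAML loader would produce for the configuration above
parsed_config = {
    'title': 'My Project',
    'preprocessors': [
        'includes',
        {'plantuml': {'format': 'svg', 'theme': 'dark',
                      'server_url': 'http://localhost:8080'}},
        {'custom': {'style': 'modern', 'format': 'html', 'uppercase': False}},
    ],
}

for name, options in normalize_preprocessors(parsed_config['preprocessors']):
    print(name, options)
```

The resulting options dictionary is what would be passed as the `options` argument of a preprocessor's constructor.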

### Conditional Preprocessor

```python
from foliant.preprocessors.base import BasePreprocessor

class ConditionalPreprocessor(BasePreprocessor):
    """Preprocessor for conditional content inclusion."""

    defaults = {'target': 'all'}
    tags = ('if', 'unless', 'target')

    def apply(self):
        """Remove or keep content based on conditions."""
        current_target = self.context['target']

        for markdown_file in self.working_dir.rglob('*.md'):
            with open(markdown_file, 'r', encoding='utf8') as f:
                content = f.read()

            # Process conditional tags
            content = self._process_conditionals(content, current_target)

            with open(markdown_file, 'w', encoding='utf8') as f:
                f.write(content)

    def _process_conditionals(self, content, current_target):
        """Process conditional tags based on current build target."""
        def process_tag(match):
            tag = match.group('tag')
            options_str = match.group('options') or ''
            body = match.group('body')

            options = self.get_options(options_str)
            target_condition = options.get('target', 'all')

            if tag == 'if':
                # Include content if target matches
                if target_condition == 'all' or target_condition == current_target:
                    return body
                else:
                    return ''
            elif tag == 'unless':
                # Include content unless target matches
                if target_condition != current_target:
                    return body
                else:
                    return ''
            elif tag == 'target':
                # Include only for specific target
                if target_condition == current_target:
                    return body
                else:
                    return ''

            return match.group(0)

        return self.pattern.sub(process_tag, content)
```

Usage in Markdown:
```markdown
# Documentation

<if target="html">
This content only appears in HTML builds.
</if>

<unless target="pdf">
This content appears in all formats except PDF.
</unless>

<target target="pdf">
PDF-specific content here.
</target>
```
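
The decision rules used by `_process_conditionals` can be checked in isolation. The helper below, `keep_body`, is a hypothetical restatement of those rules outside the class, written only to show which of the three blocks above would survive for a given build target.

```python
def keep_body(tag: str, target_condition: str, current_target: str) -> bool:
    """Return True if the tag body should stay in the output for this target."""
    if tag == 'if':
        return target_condition in ('all', current_target)
    if tag == 'unless':
        return target_condition != current_target
    if tag == 'target':
        return target_condition == current_target
    raise ValueError(f'Unknown conditional tag: {tag}')


# (tag, value of its target attribute) for the three blocks in the Markdown above
blocks = [('if', 'html'), ('unless', 'pdf'), ('target', 'pdf')]

for current_target in ('html', 'pdf'):
    kept = [tag for tag, condition in blocks
            if keep_body(tag, condition, current_target)]
    print(current_target, kept)
```

Under these rules an HTML build keeps the `if` and `unless` blocks, while a PDF build keeps only the `target` block.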
