# Preprocessor System


Foliant's preprocessor system transforms Markdown content before a backend processes it. Preprocessors use tag-based content processing to implement features such as includes, diagram generation, conditional content, and custom transformations.

## Capabilities

### Base Preprocessor Class

Foundation class for all content preprocessors providing tag parsing, option handling, and common functionality.

```python { .api }
class BasePreprocessor:
    """Base preprocessor class that all preprocessors must inherit from."""

    defaults: dict = {}
    tags: tuple = ()

    def __init__(self, context: dict, logger: Logger, quiet=False, debug=False, options={}):
        """
        Initialize preprocessor with build context and options.

        Parameters:
        - context (dict): Build context containing project_path, config, target, backend
        - logger (Logger): Logger instance for processing messages
        - quiet (bool): Suppress output messages
        - debug (bool): Enable debug logging
        - options (dict): Preprocessor-specific configuration options
        """

    @staticmethod
    def get_options(options_string: str) -> Dict[str, OptionValue]:
        """
        Parse XML attribute string into typed options dictionary.

        Parameters:
        - options_string (str): String of XML-style attributes

        Returns:
        Dict[str, OptionValue]: Parsed options with proper types

        Example:
        'width="800" height="600" visible="true"' ->
        {'width': 800, 'height': 600, 'visible': True}
        """

    def apply(self):
        """
        Run preprocessor against project content.
        Must be implemented by each preprocessor.

        Raises:
        NotImplementedError: If not implemented by subclass
        """
```
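To make the documented `get_options` behavior concrete, here is a minimal standard-library sketch of parsing an XML-style attribute string into typed values. It is not Foliant's implementation: the names `parse_options` and `coerce` are hypothetical, and the coercion rules (booleans, then integers, then floats, else string) are an assumption chosen to reproduce the example in the docstring above.

```python
import re
from typing import Dict, Union

OptionValue = Union[int, float, bool, str]

# Matches key="value" pairs; values may not contain double quotes.
_ATTRIBUTE = re.compile(r'(?P<key>[\w:-]+)="(?P<value>[^"]*)"')


def coerce(value: str) -> OptionValue:
    """Convert an attribute value to bool, int, or float when possible."""
    lowered = value.lower()
    if lowered in ('true', 'false'):
        return lowered == 'true'
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_options(options_string: str) -> Dict[str, OptionValue]:
    """Parse 'a="1" b="x"' into {'a': 1, 'b': 'x'}."""
    return {
        match.group('key'): coerce(match.group('value'))
        for match in _ATTRIBUTE.finditer(options_string)
    }


options = parse_options('width="800" height="600" visible="true"')
```

An empty string yields an empty dictionary, because `finditer` finds no attribute pairs.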

### Unescape Preprocessor

Built-in preprocessor that handles escaped tag processing for nested tag scenarios.

```python { .api }
class Preprocessor(BasePreprocessor):
    """
    Internal preprocessor for unescaping escaped tags.
    Removes leading < from escaped tag definitions.
    """

    def process_escaped_tags(self, content: str) -> str:
        """
        Remove escape sequences from tag definitions.

        Parameters:
        - content (str): Markdown content with escaped tags

        Returns:
        str: Content with tags unescaped
        """

    def apply(self):
        """Process all .md files in working directory to unescape tags."""
```
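The unescaping step can be illustrated with a short standalone sketch. It assumes that an escaped tag is written with a doubled opening bracket (`<<tag ...>...</tag>`), which is inferred from the description "removes leading <" rather than taken from Foliant's source; the function name `unescape_tags` is hypothetical.

```python
import re

# An escaped tag: "<<name optional-attributes>body</name>".
_ESCAPED_TAG = re.compile(
    r'<<(?P<tag>[^\s<>/]+)(?P<options>\s[^<>]*)?>(?P<body>.*?)</(?P=tag)>',
    flags=re.DOTALL,
)


def unescape_tags(content: str) -> str:
    """Drop the extra leading '<' from every escaped tag in content."""
    return _ESCAPED_TAG.sub(lambda match: match.group(0)[1:], content)


unescaped = unescape_tags('See <<include src="a.md"></include> for details.')
```

Text that merely contains `<<` without a tag name directly after it, such as `a << b`, is left untouched because the pattern requires a complete tag.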

## Type Definitions

```python { .api }
from pathlib import Path

OptionValue = int | float | bool | str

# Preprocessor context structure
PreprocessorContext = {
    'project_path': Path,  # Path to project directory
    'config': dict,        # Parsed configuration
    'target': str,         # Target format
    'backend': str         # Backend name
}

# Tag pattern structure for regex matching
TagPattern = {
    'tag': str,      # Tag name
    'options': str,  # Options string
    'body': str      # Tag content body
}
```
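The `TagPattern` structure corresponds to three named groups in a regular expression. The sketch below shows one way such a pattern could be built from a preprocessor's `tags` tuple. The exact expression Foliant compiles is not shown in this document, so `build_tag_pattern` is a hypothetical helper, and the negative lookbehind that skips escaped `<<tag>` forms is an assumption.

```python
import re
from typing import Iterable, Pattern


def build_tag_pattern(tags: Iterable[str]) -> Pattern[str]:
    """Compile a pattern with 'tag', 'options', and 'body' named groups."""
    names = '|'.join(re.escape(tag) for tag in tags)
    return re.compile(
        rf'(?<!<)<(?P<tag>{names})(?:\s(?P<options>[^<>]*))?>'
        rf'(?P<body>.*?)</(?P=tag)>',
        flags=re.DOTALL,  # let a tag body span multiple lines
    )


pattern = build_tag_pattern(('custom', 'transform'))
match = pattern.search('Intro <custom style="callout">Hello</custom> outro')
tag, options, body = match.group('tag', 'options', 'body')
```

When a tag has no attributes, the `options` group is `None`, which is why the examples in this document use `match.group('options') or ''`.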

## Usage Examples

### Custom Preprocessor Implementation

```python
from foliant.preprocessors.base import BasePreprocessor


class CustomPreprocessor(BasePreprocessor):
    """Custom preprocessor for special content transformation."""

    defaults = {
        'format': 'html',
        'style': 'default'
    }
    tags = ('custom', 'transform')

    def apply(self):
        """Process all markdown files with custom tags."""
        for markdown_file in self.working_dir.rglob('*.md'):
            self.logger.debug(f'Processing {markdown_file}')

            with open(markdown_file, 'r', encoding='utf8') as f:
                content = f.read()

            # Process tags using inherited pattern
            content = self.pattern.sub(self._process_tag, content)

            with open(markdown_file, 'w', encoding='utf8') as f:
                f.write(content)

    def _process_tag(self, match):
        """Process individual tag occurrence."""
        tag = match.group('tag')
        options_str = match.group('options') or ''
        body = match.group('body')

        # Parse options; tag attributes override config options and defaults
        options = self.get_options(options_str)
        final_options = {**self.defaults, **self.options, **options}

        # Transform content based on tag and options
        if tag == 'custom':
            return self._transform_custom(body, final_options)
        elif tag == 'transform':
            return self._transform_content(body, final_options)

        return match.group(0)  # Return unchanged if not handled

    def _transform_custom(self, content, options):
        """Transform custom tag content."""
        format_type = options['format']
        style = options['style']

        if format_type == 'html':
            return f'<div class="custom-{style}">{content}</div>'
        else:
            return f'[{style.upper()}]: {content}'

    def _transform_content(self, content, options):
        """Transform generic content."""
        return content.upper() if options.get('uppercase') else content
```
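The option merge in `_process_tag` depends on dictionary unpacking order: when the same key appears more than once, the rightmost mapping wins. The following standalone sketch shows the resulting precedence (class defaults, then project configuration, then tag attributes). The helper name `merge_options` and the sample values are illustrative only.

```python
def merge_options(defaults: dict, config_options: dict, tag_options: dict) -> dict:
    """Merge option layers; later layers override earlier ones."""
    return {**defaults, **config_options, **tag_options}


final_options = merge_options(
    defaults={'format': 'html', 'style': 'default'},
    config_options={'style': 'modern'},                      # e.g. from foliant.yml
    tag_options={'style': 'highlight', 'uppercase': True},   # from the tag itself
)
```

Here `style` ends up as `'highlight'` because the tag attribute overrides both the configured and the default value, while `format` falls through to the default.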

### Tag-based Content Processing

Example Markdown with custom tags:

```markdown
# My Document

<custom format="html" style="highlight">
Important content here
</custom>

<transform uppercase="true">
This text will be uppercase
</transform>

<custom style="callout">
This is a callout box
</custom>
```
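As a rough check of what these tags should turn into, the sketch below mirrors the transformation rules of `CustomPreprocessor` using only the standard library, so it runs without Foliant. The `render` function, its regular expressions, and the decision to strip whitespace around the tag body are assumptions made for this illustration, and attribute values stay plain strings here.

```python
import re

_TAG = re.compile(
    r'<(?P<tag>custom|transform)(?:\s(?P<options>[^<>]*))?>'
    r'(?P<body>.*?)</(?P=tag)>',
    flags=re.DOTALL,
)
_ATTRIBUTE = re.compile(r'([\w-]+)="([^"]*)"')
DEFAULTS = {'format': 'html', 'style': 'default'}


def render(markdown: str) -> str:
    """Replace <custom> and <transform> tags with their rendered form."""
    def replace(match):
        attributes = dict(_ATTRIBUTE.findall(match.group('options') or ''))
        options = {**DEFAULTS, **attributes}
        body = match.group('body').strip()  # assumption: trim surrounding newlines

        if match.group('tag') == 'custom':
            if options['format'] == 'html':
                return f'<div class="custom-{options["style"]}">{body}</div>'
            return f'[{options["style"].upper()}]: {body}'

        # <transform>: values are unparsed strings, so compare against 'true'
        return body.upper() if options.get('uppercase') == 'true' else body

    return _TAG.sub(replace, markdown)


callout = render('<custom style="callout">\nThis is a callout box\n</custom>')
```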


Preprocessor usage:

```python
from pathlib import Path
import logging

# CustomPreprocessor is the class defined in the previous example

# Set up context
context = {
    'project_path': Path('./project'),
    # 'tmp_dir' is added on the assumption that the base class derives
    # working_dir from it; adjust to match your project configuration
    'config': {'title': 'Test', 'tmp_dir': '__folianttmp__'},
    'target': 'html',
    'backend': 'mkdocs'
}

# Create and run preprocessor
preprocessor = CustomPreprocessor(
    context=context,
    logger=logging.getLogger(),
    options={'format': 'html', 'style': 'modern'}
)

preprocessor.apply()
```

### Option Parsing

```python
from foliant.preprocessors.base import BasePreprocessor

# Parse XML-style options
options_string = 'width="800" height="600" visible="true" title="My Chart"'
options = BasePreprocessor.get_options(options_string)

print(options)
# Output: {'width': 800, 'height': 600, 'visible': True, 'title': 'My Chart'}

# Handle empty options
empty_options = BasePreprocessor.get_options('')
print(empty_options)  # Output: {}
```

### Complex Preprocessor with File Operations

```python
from foliant.preprocessors.base import BasePreprocessor
import hashlib
import subprocess


class DiagramPreprocessor(BasePreprocessor):
    """Preprocessor for generating diagrams from text."""

    defaults = {
        'format': 'png',
        'theme': 'default',
        'output_dir': 'images'
    }
    tags = ('plantuml', 'mermaid')

    def apply(self):
        """Process diagram tags in all markdown files."""
        # Create output directory (including missing parents)
        output_dir = self.working_dir / self.options['output_dir']
        output_dir.mkdir(parents=True, exist_ok=True)

        for markdown_file in self.working_dir.rglob('*.md'):
            content = self._process_file(markdown_file, output_dir)

            with open(markdown_file, 'w', encoding='utf8') as f:
                f.write(content)

    def _process_file(self, file_path, output_dir):
        """Process single markdown file."""
        with open(file_path, 'r', encoding='utf8') as f:
            content = f.read()

        return self.pattern.sub(
            lambda m: self._process_diagram(m, output_dir, file_path.stem),
            content
        )

    def _process_diagram(self, match, output_dir, file_stem):
        """Process individual diagram tag."""
        tag = match.group('tag')
        options_str = match.group('options') or ''
        body = match.group('body')

        options = {**self.defaults, **self.options, **self.get_options(options_str)}

        # Generate a filename that is stable across builds. The built-in
        # hash() is salted per process for strings, so it would produce a
        # different name on every run.
        digest = hashlib.sha256((body + str(options)).encode('utf8')).hexdigest()[:16]
        filename = f"{file_stem}_{tag}_{digest}.{options['format']}"
        output_path = output_dir / filename

        # Generate diagram
        if tag == 'plantuml':
            self._generate_plantuml(body, output_path, options)
        elif tag == 'mermaid':
            self._generate_mermaid(body, output_path, options)

        # Return markdown image reference (relative to the working directory)
        return f"![Diagram]({output_path.relative_to(self.working_dir)})"

    def _generate_plantuml(self, source, output_path, options):
        """Generate PlantUML diagram (assumes `plantuml` is on PATH)."""
        # With -pipe, PlantUML reads the source from stdin and writes the
        # rendered image to stdout, which is redirected into output_path
        with open(output_path, 'wb') as f:
            subprocess.run(
                ['plantuml', '-t' + options['format'], '-pipe'],
                input=source.encode('utf8'), stdout=f, check=True
            )

    def _generate_mermaid(self, source, output_path, options):
        """Generate Mermaid diagram (assumes `mmdc` is on PATH)."""
        subprocess.run([
            'mmdc',
            '-i', '-',
            '-o', str(output_path),
            '-t', options['theme']
        ], input=source, text=True, check=True)
```

### Preprocessor Configuration

Example `foliant.yml` preprocessor configuration:

```yaml
title: My Project

preprocessors:
  - includes
  - plantuml:
      format: svg
      theme: dark
      server_url: http://localhost:8080
  - custom:
      style: modern
      format: html
      uppercase: false
```
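The `preprocessors` list mixes two entry shapes: a bare name (`includes`) and a single-key mapping from a name to its options. A minimal sketch of normalizing both shapes into `(name, options)` pairs follows. How Foliant itself walks this list is not shown in this document, so `normalize_preprocessors` is a hypothetical helper, and the `parsed` literal stands in for the result of loading the YAML above.

```python
from typing import Any, Dict, List, Tuple, Union

Entry = Union[str, Dict[str, Any]]


def normalize_preprocessors(entries: List[Entry]) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn mixed config entries into uniform (name, options) pairs."""
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare name: no options given
            normalized.append((entry, {}))
        elif isinstance(entry, dict) and len(entry) == 1:
            name, options = next(iter(entry.items()))
            normalized.append((name, dict(options or {})))
        else:
            raise ValueError(f'Unsupported preprocessor entry: {entry!r}')
    return normalized


# Equivalent of the parsed `preprocessors` section shown above
parsed = [
    'includes',
    {'plantuml': {'format': 'svg', 'theme': 'dark',
                  'server_url': 'http://localhost:8080'}},
    {'custom': {'style': 'modern', 'format': 'html', 'uppercase': False}},
]
pipeline = normalize_preprocessors(parsed)
```

The order of `pipeline` matches the order in the configuration file, which matters when one preprocessor's output is another's input.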

### Conditional Preprocessor

```python
from foliant.preprocessors.base import BasePreprocessor


class ConditionalPreprocessor(BasePreprocessor):
    """Preprocessor for conditional content inclusion."""

    defaults = {'target': 'all'}
    tags = ('if', 'unless', 'target')

    def apply(self):
        """Remove or keep content based on conditions."""
        current_target = self.context['target']

        for markdown_file in self.working_dir.rglob('*.md'):
            with open(markdown_file, 'r', encoding='utf8') as f:
                content = f.read()

            # Process conditional tags
            content = self._process_conditionals(content, current_target)

            with open(markdown_file, 'w', encoding='utf8') as f:
                f.write(content)

    def _process_conditionals(self, content, current_target):
        """Process conditional tags based on current build target."""
        def process_tag(match):
            tag = match.group('tag')
            options_str = match.group('options') or ''
            body = match.group('body')

            options = self.get_options(options_str)
            target_condition = options.get('target', 'all')

            if tag == 'if':
                # Include content if target matches (or no target is given)
                if target_condition == 'all' or target_condition == current_target:
                    return body
                return ''
            elif tag == 'unless':
                # Include content unless target matches
                return body if target_condition != current_target else ''
            elif tag == 'target':
                # Include only for specific target
                return body if target_condition == current_target else ''

            return match.group(0)

        return self.pattern.sub(process_tag, content)
```


Usage in Markdown:

```markdown
# Documentation

<if target="html">
This content only appears in HTML builds.
</if>

<unless target="pdf">
This content appears in all formats except PDF.
</unless>

<target target="pdf">
PDF-specific content here.
</target>
```
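To see which blocks survive for a given build target, the conditional rules can be exercised without Foliant. The sketch below reimplements the three rules with the standard library only. `filter_for_target` and its regular expressions are hypothetical stand-ins for the inherited `pattern` and `get_options`, and a missing `target` attribute is assumed to mean `all`.

```python
import re

_CONDITIONAL = re.compile(
    r'<(?P<tag>if|unless|target)(?:\s(?P<options>[^<>]*))?>'
    r'(?P<body>.*?)</(?P=tag)>',
    flags=re.DOTALL,
)
_TARGET_ATTRIBUTE = re.compile(r'target="([^"]*)"')


def filter_for_target(content: str, current_target: str) -> str:
    """Keep or drop conditional blocks depending on the build target."""
    def replace(match):
        found = _TARGET_ATTRIBUTE.search(match.group('options') or '')
        wanted = found.group(1) if found else 'all'
        tag = match.group('tag')

        if tag == 'if':
            keep = wanted in ('all', current_target)
        elif tag == 'unless':
            keep = wanted != current_target
        else:  # 'target'
            keep = wanted == current_target

        return match.group('body') if keep else ''

    return _CONDITIONAL.sub(replace, content)


sample = (
    '<if target="html">H</if>'
    '<unless target="pdf">U</unless>'
    '<target target="pdf">P</target>'
)
html_result = filter_for_target(sample, 'html')
pdf_result = filter_for_target(sample, 'pdf')
```

For the `html` target the first two blocks are kept and the third is dropped; for `pdf` only the third remains.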