
# Parsers and Emitters

Low-level parser and emitter classes for custom conversion workflows and specialized markup processing. They are the foundation of all the conversion functions and give fine-grained control over both parsing and output generation.

## Capabilities

### Creole Parser

Parse Creole markup into a document tree structure for processing.

```python { .api }
class CreoleParser:
    def __init__(self, markup_string: str, block_rules: tuple = None,
                 blog_line_breaks: bool = True, debug: bool = False): ...
    def parse(self) -> DocNode: ...
```

**Parameters:**
- `markup_string`: Creole markup text to parse
- `block_rules`: Custom block-level parsing rules
- `blog_line_breaks`: Use blog-style (True) vs wiki-style (False) line breaks
- `debug`: Enable debug output

**Usage Examples:**

```python
from creole.parser.creol2html_parser import CreoleParser
from creole.parser.creol2html_rules import BlockRules

markup = "This is **bold** text"

# Basic parsing
parser = CreoleParser(markup)
document = parser.parse()

# Custom block rules
custom_rules = BlockRules()
parser = CreoleParser(markup, block_rules=custom_rules)
document = parser.parse()

# Debug mode
parser = CreoleParser(markup, debug=True)
document = parser.parse()
document.debug()  # Print document tree structure
```
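
The effect of `blog_line_breaks` is easiest to see on a paragraph that contains a single newline. The sketch below is a standalone illustration, not python-creole code: `render_paragraph` is a hypothetical helper, and it assumes that blog style turns each newline into a hard break while wiki style joins the lines with a space.

```python
import html


def render_paragraph(text: str, blog_line_breaks: bool = True) -> str:
    """Hypothetical helper: render one paragraph under either line-break style."""
    lines = [html.escape(line.strip()) for line in text.splitlines() if line.strip()]
    # Assumption: blog style keeps every newline as <br />, wiki style ignores it.
    separator = "<br />\n" if blog_line_breaks else " "
    return "<p>" + separator.join(lines) + "</p>"


blog = render_paragraph("first line\nsecond line", blog_line_breaks=True)
wiki = render_paragraph("first line\nsecond line", blog_line_breaks=False)
```

Comparing `blog` and `wiki` for the same input shows whether a given text relies on hard line breaks before choosing a mode.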

### HTML Parser

Parse HTML markup into a document tree structure for conversion to other formats.

```python { .api }
class HtmlParser:
    def __init__(self, debug: bool = False): ...
    def feed(self, html_string: str) -> DocNode: ...
    def debug(self): ...
```

**Parameters:**
- `debug`: Enable debug output and tree visualization

**Usage Examples:**

```python
from creole.parser.html_parser import HtmlParser

# Basic HTML parsing
parser = HtmlParser()
document = parser.feed('<p>Hello <strong>world</strong></p>')

# Debug mode
html_content = '<p>Hello <strong>world</strong></p>'
parser = HtmlParser(debug=True)
document = parser.feed(html_content)
parser.debug()  # Print parsing debug information
```
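
To make "parsing HTML into a document tree" concrete, here is a minimal sketch built only on the standard library's `html.parser.HTMLParser`. `Node`, `TreeBuilder`, and `outline` are hypothetical stand-ins, not the library's classes, and the sketch ignores attributes and void tags such as `<br>`.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical stand-in for a document tree node."""

    def __init__(self, kind, parent=None, content=None):
        self.kind = kind
        self.parent = parent
        self.content = content
        self.children = []
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(HTMLParser):
    """Builds a Node tree: one node per tag, 'data' nodes for text."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.cursor = self.root

    def handle_starttag(self, tag, attrs):
        self.cursor = Node(tag, parent=self.cursor)

    def handle_endtag(self, tag):
        # Climb back to the parent; stay at the root on a stray closing tag.
        if self.cursor.parent is not None:
            self.cursor = self.cursor.parent

    def handle_data(self, data):
        Node("data", parent=self.cursor, content=data)


def outline(node, depth=0):
    """Return an indented, one-line-per-node view of the tree."""
    label = node.kind if node.content is None else f"{node.kind}: {node.content!r}"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<p>Hello <strong>world</strong></p>")
builder.close()
tree_lines = outline(builder.root)
```

Printing `"\n".join(tree_lines)` gives a quick view of the nesting, similar in spirit to a debug dump of the tree.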

### HTML Emitter

Convert document tree to HTML output with macro support and formatting options.

```python { .api }
class HtmlEmitter:
    def __init__(self, document: DocNode, macros: dict = None,
                 verbose: int = None, stderr = None, strict: bool = False): ...
    def emit(self) -> str: ...
```

**Parameters:**
- `document`: Document tree to convert
- `macros`: Dictionary of macro functions
- `verbose`: Verbosity level for output
- `stderr`: Error output stream
- `strict`: Enable strict Creole 1.0 compliance

**Usage Examples:**

```python
from creole.emitter.creol2html_emitter import HtmlEmitter
from creole.parser.creol2html_parser import CreoleParser

# Parse and emit HTML
parser = CreoleParser("**bold** text")
document = parser.parse()
emitter = HtmlEmitter(document)
html = emitter.emit()

# With macros
def code_macro(ext, text):
    return f'<pre><code class="{ext}">{text}</code></pre>'

macros = {'code': code_macro}
emitter = HtmlEmitter(document, macros=macros)
html = emitter.emit()

# Strict mode
emitter = HtmlEmitter(document, strict=True)
html = emitter.emit()
```
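
The macro dictionary works as a name-to-function lookup during emission. The following sketch shows that dispatch on a hand-built tree of plain dictionaries. Everything in it (`emit`, the node layout, the fallback for unregistered macros) is a hypothetical illustration rather than the library's internals.

```python
import html


def emit(node, macros=None):
    """Hypothetical emitter: a node is a dict with 'kind' plus 'content' or 'children'."""
    macros = macros or {}
    kind = node["kind"]
    if kind == "text":
        return html.escape(node["content"])
    if kind == "macro":
        # Look the macro up by name and call it with its arguments and body text.
        func = macros.get(node["name"])
        if func is None:
            # Assumption: unregistered macros fall back to escaped text.
            return html.escape(node["content"])
        return func(**node.get("args", {}), text=node["content"])
    inner = "".join(emit(child, macros) for child in node.get("children", []))
    tag = {"paragraph": "p", "strong": "strong", "emphasis": "i"}.get(kind)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


def code_macro(ext, text):
    return f'<pre><code class="{ext}">{html.escape(text)}</code></pre>'


tree = {"kind": "document", "children": [
    {"kind": "paragraph", "children": [
        {"kind": "strong", "children": [{"kind": "text", "content": "bold"}]},
        {"kind": "text", "content": " text"},
    ]},
    {"kind": "macro", "name": "code", "args": {"ext": "py"}, "content": "x < 1"},
]}
result = emit(tree, macros={"code": code_macro})
```

Keeping macros in a dictionary means callers can add or replace behaviour without subclassing the emitter.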

### Creole Emitter

Convert document tree to Creole markup output with unknown tag handling.

```python { .api }
class CreoleEmitter:
    def __init__(self, document: DocNode, debug: bool = False,
                 unknown_emit = None, strict: bool = False): ...
    def emit(self) -> str: ...
```

**Parameters:**
- `document`: Document tree to convert
- `debug`: Enable debug output
- `unknown_emit`: Handler function for unknown HTML tags
- `strict`: Enable strict Creole output mode

**Usage Examples:**

```python
from creole.emitter.html2creole_emitter import CreoleEmitter
from creole.parser.html_parser import HtmlParser
from creole.shared.unknown_tags import transparent_unknown_nodes

# Parse HTML and emit Creole
parser = HtmlParser()
document = parser.feed('<p><strong>bold</strong> text</p>')
emitter = CreoleEmitter(document)
creole = emitter.emit()

# Handle unknown tags
emitter = CreoleEmitter(document, unknown_emit=transparent_unknown_nodes)
creole = emitter.emit()

# Debug mode
emitter = CreoleEmitter(document, debug=True)
creole = emitter.emit()
```

### ReStructuredText Emitter

Convert document tree to ReStructuredText markup with reference link handling.

```python { .api }
class ReStructuredTextEmitter:
    def __init__(self, document: DocNode, debug: bool = False,
                 unknown_emit = None): ...
    def emit(self) -> str: ...
```

**Parameters:**
- `document`: Document tree to convert
- `debug`: Enable debug output
- `unknown_emit`: Handler function for unknown HTML tags

**Usage Examples:**

```python
from creole.emitter.html2rest_emitter import ReStructuredTextEmitter
from creole.parser.html_parser import HtmlParser

# Parse HTML and emit ReStructuredText
parser = HtmlParser()
document = parser.feed('<h1>Title</h1><p>Content with <a href="http://example.com">link</a></p>')
emitter = ReStructuredTextEmitter(document)
rest = emitter.emit()
# Returns ReStructuredText with proper heading underlines and reference links
```

### Textile Emitter

Convert document tree to Textile markup format.

```python { .api }
class TextileEmitter:
    def __init__(self, document: DocNode, debug: bool = False,
                 unknown_emit = None): ...
    def emit(self) -> str: ...
```

**Parameters:**
- `document`: Document tree to convert
- `debug`: Enable debug output
- `unknown_emit`: Handler function for unknown HTML tags

**Usage Examples:**

```python
from creole.emitter.html2textile_emitter import TextileEmitter
from creole.parser.html_parser import HtmlParser

# Parse HTML and emit Textile
parser = HtmlParser()
document = parser.feed('<p><strong>bold</strong> and <em>italic</em></p>')
emitter = TextileEmitter(document)
textile = emitter.emit()
# Returns: '*bold* and _italic_' (Textile writes <em> as _..._ and <i> as __...__)
```

## Document Tree Structure

### DocNode Class

The document tree node that represents markup elements and hierarchy.

```python { .api }
class DocNode:
    def __init__(self, kind: str = None, parent = None): ...
    def debug(self): ...
    def append(self, child): ...
    def get_text(self) -> str: ...
```

**Properties:**
- `kind`: Node type (e.g., 'document', 'paragraph', 'strong', 'link')
- `parent`: Parent node reference
- `children`: List of child nodes
- `content`: Text content for leaf nodes
- `attrs`: Dictionary of node attributes

**Usage Examples:**

```python
from creole.shared.document_tree import DocNode

# Create document structure
doc = DocNode('document')
para = DocNode('paragraph', parent=doc)
doc.append(para)

bold = DocNode('strong', parent=para)
bold.content = 'bold text'
para.append(bold)

# Debug tree structure
doc.debug()
```

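## Advanced Usage Patterns

### Custom Parser-Emitter Workflow

```python
from creole.parser.creol2html_parser import CreoleParser
from creole.emitter.creol2html_emitter import HtmlEmitter
from creole.parser.html_parser import HtmlParser
from creole.emitter.html2rest_emitter import ReStructuredTextEmitter

# Convert Creole to ReStructuredText by chaining the low-level classes.
# The ReStructuredText emitter is fed an HtmlParser tree here (as in the
# examples above), so the Creole tree is rendered to HTML first.
creole_tree = CreoleParser("= Heading =\n\nThis is **bold** text").parse()
html = HtmlEmitter(creole_tree).emit()

html_tree = HtmlParser().feed(html)
rest_output = ReStructuredTextEmitter(html_tree).emit()
```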

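### Document Tree Manipulation

```python
from creole.parser.creol2html_parser import CreoleParser
from creole.emitter.creol2html_emitter import HtmlEmitter

# Parse, modify, and emit
parser = CreoleParser("Some **bold** text")
document = parser.parse()

# Modify document tree; inline nodes are nested inside block nodes,
# so walk the whole tree rather than only the top-level children
def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)

for node in walk(document):
    if node.kind == 'strong':
        node.kind = 'emphasis'  # Change bold to italic

emitter = HtmlEmitter(document)
modified_html = emitter.emit()
```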

## HTML Processing Utilities

### HTML Entity Decoder

Utility class for converting HTML entities to Unicode characters.

```python { .api }
class Deentity:
    def __init__(self): ...
    def replace_all(self, content: str) -> str: ...
    def replace_number(self, text: str) -> str: ...
    def replace_hex(self, text: str) -> str: ...
    def replace_named(self, text: str) -> str: ...
```

**Usage Examples:**

```python
from creole.html_tools.deentity import Deentity

# Create decoder instance
decoder = Deentity()

# Convert all types of HTML entities
html_text = "&lt;p&gt;Hello &amp; welcome &#8212; &#x2014; &nbsp;"
clean_text = decoder.replace_all(html_text)
# Returns: '<p>Hello & welcome — — \xa0'

# Convert specific entity types
decoder.replace_number("62")  # Returns: '>'
decoder.replace_hex("3E")     # Returns: '>'
decoder.replace_named("amp")  # Returns: '&'
```
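
For comparison, Python's standard library resolves the same three entity forms (named, decimal, hexadecimal) through `html.unescape`. The sketch below is a stdlib cross-check only, not `Deentity` itself. Note that `html.unescape` maps `&nbsp;` to the non-breaking space U+00A0.

```python
import html

# Named, decimal, and hexadecimal references are all resolved in one pass.
decoded = html.unescape("&lt;p&gt;Hello &amp; welcome &#8212; &#x2014; &nbsp;")

numeric = html.unescape("&#62;")       # decimal reference
hexadecimal = html.unescape("&#x3E;")  # hexadecimal reference
named = html.unescape("&amp;")         # named reference
```

Running both decoders over the same input is a cheap way to spot entities that one of them handles differently.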

### HTML Whitespace Stripper

Remove unnecessary whitespace from HTML while preserving structure.

```python { .api }
def strip_html(html_code: str) -> str: ...
```

**Usage Examples:**

```python
from creole.html_tools.strip_html import strip_html

# Clean up HTML whitespace
messy_html = ' <p> one \n two </p>'
clean_html = strip_html(messy_html)
# Returns: '<p>one two</p>'

# Preserves important spacing around inline elements
html = 'one <i>two \n <strong> \n three \n </strong></i>'
clean = strip_html(html)
# Returns: 'one <i>two <strong>three</strong> </i>'
```
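
The general idea behind this kind of cleanup can be sketched with two regular-expression passes. `collapse_whitespace` and `BLOCK_TAGS` below are hypothetical names, and the function is a deliberately simplified stand-in: it reproduces only the first example above, does not implement the inline-element handling of `strip_html`, and does not protect `<pre>` content.

```python
import re

BLOCK_TAGS = ("p", "div", "li", "h1", "h2", "h3")  # assumption: a small sample of block tags


def collapse_whitespace(html_code: str) -> str:
    """Simplified sketch: collapse whitespace runs, then trim inside block tags."""
    text = re.sub(r"\s+", " ", html_code).strip()
    names = "|".join(BLOCK_TAGS)
    # Drop the single space left after an opening block tag ...
    text = re.sub(rf"(<(?:{names})\b[^>]*>) ", r"\1", text)
    # ... and the one left before a closing block tag.
    text = re.sub(rf" (</(?:{names})>)", r"\1", text)
    return text


cleaned = collapse_whitespace(" <p> one \n two </p>")
```

Whitespace next to inline tags is left untouched here because a space between words and an inline element is usually significant.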

## Unknown Tag Handlers

Functions for handling unknown HTML tags during conversion.

```python { .api }
def raise_unknown_node(emitter, node): ...
def use_html_macro(emitter, node): ...
def preformat_unknown_nodes(emitter, node): ...
def escape_unknown_nodes(emitter, node): ...
def transparent_unknown_nodes(emitter, node): ...
```

**Usage Examples:**

```python
from creole.shared.unknown_tags import (
    transparent_unknown_nodes, escape_unknown_nodes,
    raise_unknown_node, use_html_macro
)
from creole import html2creole

# Different ways to handle unknown tags
html = '<p>Text with <unknown>content</unknown></p>'

# Remove tags, keep content (default)
creole = html2creole(html, unknown_emit=transparent_unknown_nodes)
# Returns: 'Text with content'

# Escape unknown tags as text
creole = html2creole(html, unknown_emit=escape_unknown_nodes)
# Returns: 'Text with &lt;unknown&gt;content&lt;/unknown&gt;'

# Raise error on unknown tags
try:
    creole = html2creole(html, unknown_emit=raise_unknown_node)
except NotImplementedError:
    print("Unknown tag encountered")

# Wrap in HTML macro
creole = html2creole(html, unknown_emit=use_html_macro)
# The unknown tags are passed through inside <<html>>...<</html>> macro markers
```
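
All handlers share the `(emitter, node)` signature, so a custom one can be swapped in the same way. The sketch below shows the callback pattern with hypothetical stand-ins (`Node`, `SketchEmitter`, and the three handler functions); it illustrates the dispatch only and does not use the library's classes.

```python
import html


class Node:
    """Hypothetical tree node: a tag name plus its text content."""

    def __init__(self, kind, content=""):
        self.kind = kind
        self.content = content


class SketchEmitter:
    """Hypothetical emitter that defers unknown tags to a handler callback."""

    KNOWN = {"strong": "**{}**", "em": "//{}//"}

    def __init__(self, unknown_emit):
        self.unknown_emit = unknown_emit

    def emit_node(self, node):
        template = self.KNOWN.get(node.kind)
        if template is None:
            return self.unknown_emit(self, node)
        return template.format(node.content)


def transparent(emitter, node):
    return node.content  # drop the tag, keep the text


def escape(emitter, node):
    return html.escape(f"<{node.kind}>{node.content}</{node.kind}>")


def strict(emitter, node):
    raise NotImplementedError(f"no handler for <{node.kind}>")


unknown = Node("unknown", "content")
kept = SketchEmitter(transparent).emit_node(unknown)
escaped = SketchEmitter(escape).emit_node(unknown)
bold = SketchEmitter(strict).emit_node(Node("strong", "bold"))

try:
    SketchEmitter(strict).emit_node(unknown)
    raised = False
except NotImplementedError:
    raised = True
```

Passing the emitter into the handler lets a handler reuse the emitter's own rendering for child content when it needs to.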

### Error Handling and Debugging

The parsers accept a `debug` flag for troubleshooting, as do all emitters except `HtmlEmitter`, which takes a `verbose` level instead:

```python
from creole.parser.creol2html_parser import CreoleParser
from creole.emitter.creol2html_emitter import HtmlEmitter

markup = "This is **bold** text"

# Enable debugging
parser = CreoleParser(markup, debug=True)
document = parser.parse()
document.debug()  # Print tree structure

emitter = HtmlEmitter(document, verbose=2)
html = emitter.emit()  # Verbose output during emission
```