# Block and Inline Parsing

Low-level parsing components that handle the conversion of Markdown text into structured tokens. The parsing system is split into block-level elements (paragraphs, headings, lists) and inline elements (bold, italic, links), with state management for tracking parsing progress and context.

## Capabilities

### Block Parser

Handles block-level Markdown elements like headings, paragraphs, lists, code blocks, and blockquotes.

```python { .api }
class BlockParser(Parser[BlockState]):
    """
    Parser for block-level Markdown elements.

    Handles elements that form document structure: headings, paragraphs,
    lists, code blocks, blockquotes, tables, etc.
    """

    def __init__(self):
        """Initialize block parser with default rules."""

    def parse(self, state: BlockState, rules: Optional[List[str]] = None) -> None:
        """
        Parse state source and populate with block tokens.

        Parameters:
        - state: BlockState to parse and populate with tokens
        - rules: Optional list of rules to use for parsing
        """
```
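
To make the cursor-driven idea behind `parse` concrete, here is a minimal, standard-library-only sketch. It is not mistune's implementation: `parse_blocks` and the `HEADING` pattern are hypothetical names, and only headings and paragraphs are recognized. It shows how a block loop can search for the next rule match, treat unmatched text as paragraphs, and advance a cursor.

```python
import re
from typing import Any, Dict, List

# Hypothetical, simplified rule; mistune's real patterns are more involved.
HEADING = re.compile(r'^(#{1,6}) +(.+)$', re.M)


def parse_blocks(src: str) -> List[Dict[str, Any]]:
    """Split src into heading and paragraph tokens using a moving cursor."""
    tokens: List[Dict[str, Any]] = []
    cursor = 0

    def add_paragraphs(text: str) -> None:
        # Blank lines separate paragraphs
        for chunk in re.split(r'\n\s*\n', text):
            chunk = chunk.strip()
            if chunk:
                tokens.append({'type': 'paragraph', 'text': chunk})

    while cursor < len(src):
        m = HEADING.search(src, cursor)
        if m is None:
            break
        # Text before the match matched no rule: treat it as paragraphs
        add_paragraphs(src[cursor:m.start()])
        tokens.append({
            'type': 'heading',
            'attrs': {'level': len(m.group(1))},
            'text': m.group(2).strip(),
        })
        cursor = m.end() + 1  # skip the newline that ends the heading line

    # Whatever remains after the last match is paragraph text
    add_paragraphs(src[cursor:])
    return tokens


tokens = parse_blocks("# Title\n\nFirst paragraph.\n\n## Section\nSecond paragraph.\n")
for tok in tokens:
    print(tok)
```

The same shape (search, flush unmatched text, dispatch, advance) generalizes to more rules by combining their patterns into one regular expression.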

### Inline Parser

Processes inline Markdown elements within block content like emphasis, links, code spans, and images.

```python { .api }
class InlineParser(Parser[InlineState]):
    """
    Parser for inline-level Markdown elements.

    Handles elements within block content: bold, italic, links,
    images, code spans, line breaks, etc.
    """

    def __init__(self, hard_wrap: bool = False):
        """
        Initialize inline parser.

        Parameters:
        - hard_wrap: Whether to convert line breaks to <br> tags
        """

    def __call__(self, text: str, env: MutableMapping[str, Any]) -> List[Dict[str, Any]]:
        """
        Process text and return inline tokens.

        Parameters:
        - text: Text to process
        - env: Environment mapping for parsing context

        Returns:
        List of inline tokens
        """
```
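
As a rough illustration of what inline tokenization and the `hard_wrap` flag mean, the following sketch uses only the standard library. The `parse_inline` function and the `INLINE` pattern are hypothetical and far simpler than mistune's rules; the token shapes follow the format described under Token Structure.

```python
import re
from typing import Any, Dict, List

# Hypothetical, simplified inline rules (not mistune's real patterns)
INLINE = re.compile(
    r'\*\*(?P<strong>.+?)\*\*'   # **bold**
    r'|`(?P<codespan>[^`]+)`'    # `code`
    r'|(?P<newline>\n)'          # line break inside a paragraph
)


def parse_inline(text: str, hard_wrap: bool = False) -> List[Dict[str, Any]]:
    """Turn text into a flat or nested list of inline tokens."""
    tokens: List[Dict[str, Any]] = []
    pos = 0
    for m in INLINE.finditer(text):
        if m.start() > pos:
            # Unmatched text between rules becomes a plain text token
            tokens.append({'type': 'text', 'raw': text[pos:m.start()]})
        if m.group('strong') is not None:
            # Container token: parse the inner text recursively
            children = parse_inline(m.group('strong'), hard_wrap)
            tokens.append({'type': 'strong', 'children': children})
        elif m.group('codespan') is not None:
            tokens.append({'type': 'codespan', 'raw': m.group('codespan')})
        else:
            # With hard_wrap, every newline is a hard line break
            tokens.append({'type': 'linebreak' if hard_wrap else 'softbreak'})
        pos = m.end()
    if pos < len(text):
        tokens.append({'type': 'text', 'raw': text[pos:]})
    return tokens


inline_tokens = parse_inline("a **b** `c`\nd", hard_wrap=True)
for tok in inline_tokens:
    print(tok)
```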

### Block State

State management for block-level parsing including cursor position, token accumulation, and parsing environment.

```python { .api }
class BlockState:
    """
    State management for block-level parsing.

    Tracks parsing progress, accumulated tokens, and contextual information
    during the block parsing process.

    Attributes:
    - src: str - Source text being parsed
    - tokens: List[Dict[str, Any]] - Accumulated parsed tokens
    - cursor: int - Current position in source text
    - cursor_max: int - Maximum position (length of source)
    - list_tight: bool - Whether current list is tight formatting
    - parent: Any - Parent parsing context
    - env: MutableMapping[str, Any] - Environment variables and data
    """

    def __init__(self, parent: Optional[Any] = None):
        """
        Initialize block parsing state.

        Parameters:
        - parent: Parent state context
        """

    def child_state(self, src: str) -> Self:
        """
        Create child state for nested parsing.

        Parameters:
        - src: Source text for child state

        Returns:
        New BlockState instance with this state as parent
        """

    def process(self, text: str) -> Self:
        """
        Process text and return populated state.

        Parameters:
        - text: Text to process

        Returns:
        Self with populated tokens and updated cursor
        """
```
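
The relationship between a state and its children can be pictured with a small stand-in class. `MiniBlockState` below is hypothetical and only mirrors the attributes listed above; the assumption that a child shares its parent's `env` while keeping its own token list is an illustration, not a statement about mistune's internals.

```python
from typing import Any, Dict, List, MutableMapping, Optional


class MiniBlockState:
    """Hypothetical stand-in mirroring the attributes listed above."""

    def __init__(self, parent: Optional["MiniBlockState"] = None) -> None:
        self.src = ""
        self.tokens: List[Dict[str, Any]] = []
        self.cursor = 0
        self.cursor_max = 0
        self.parent = parent
        # Assumption: children share the parent's env so that
        # document-wide data stays visible during nested parsing.
        self.env: MutableMapping[str, Any] = parent.env if parent is not None else {}

    def process(self, text: str) -> "MiniBlockState":
        """Load new source text and reset the cursor bounds."""
        self.src = text
        self.cursor = 0
        self.cursor_max = len(text)
        return self

    def child_state(self, src: str) -> "MiniBlockState":
        """Create a nested state whose parent is this state."""
        return MiniBlockState(parent=self).process(src)

    def depth(self) -> int:
        """Count how many ancestors this state has."""
        return 0 if self.parent is None else self.parent.depth() + 1


root = MiniBlockState().process("> quoted\n> text\n")
root.env['note'] = 'shared'
child = root.child_state("quoted\ntext\n")
print(child.depth(), child.cursor_max, child.env)
```

Keeping tokens per state while sharing the environment lets a nested parse (for example the body of a blockquote) collect its own tokens without losing access to document-level data.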

### Inline State

State management for inline-level parsing within block elements.

```python { .api }
class InlineState:
    """
    State management for inline-level parsing.

    Tracks parsing of inline elements within block content including
    position tracking and environment data.

    Attributes:
    - src: str - Source text being parsed
    - tokens: List[Dict[str, Any]] - Accumulated inline tokens
    - pos: int - Current position in source text
    - env: MutableMapping[str, Any] - Environment variables and data
    """

    def __init__(self):
        """Initialize inline parsing state."""

    def append_token(self, token: Dict[str, Any]) -> None:
        """
        Add token to the token list.

        Parameters:
        - token: Token to add
        """
```

### Base Parser

Abstract base class providing common parsing functionality.

```python { .api }
ST = TypeVar('ST', bound=Union[BlockState, InlineState])

class Parser(Generic[ST]):
    """
    Base parser class with common parsing functionality.

    Provides rule registration, method dispatch, and parsing utilities
    for both block and inline parsers.
    """

    def register(
        self,
        name: str,
        pattern: Union[str, None],
        func: Callable,
        before: Optional[str] = None
    ) -> None:
        """
        Register a new parsing rule.

        Parameters:
        - name: Rule name
        - pattern: Regex pattern string or None
        - func: Function to handle matches
        - before: Insert rule before this existing rule
        """
```
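
One way to picture rule registration, the `before` parameter, and method dispatch is the sketch below. `MiniParser` is a hypothetical class written only with the standard library; it assumes rules are combined into a single regular expression of named groups, with rule order deciding which rule wins when several could match at the same position.

```python
import re
from typing import Callable, Dict, List, Optional


class MiniParser:
    """Hypothetical sketch of rule registration and dispatch."""

    def __init__(self) -> None:
        self.rules: List[str] = []
        self.patterns: Dict[str, str] = {}
        self.methods: Dict[str, Callable[[re.Match], str]] = {}

    def register(
        self,
        name: str,
        pattern: Optional[str],
        func: Callable[[re.Match], str],
        before: Optional[str] = None,
    ) -> None:
        self.methods[name] = func
        if pattern:
            self.patterns[name] = pattern
        if name not in self.rules:
            if before in self.rules:
                # Insert ahead of an existing rule so it takes priority
                self.rules.insert(self.rules.index(before), name)
            else:
                self.rules.append(name)

    def compile(self) -> re.Pattern:
        # Each rule becomes one named group; earlier groups win ties
        parts = [f'(?P<{n}>{self.patterns[n]})' for n in self.rules if n in self.patterns]
        return re.compile('|'.join(parts))

    def dispatch(self, text: str) -> List[str]:
        # m.lastgroup is the name of the rule that matched
        return [self.methods[m.lastgroup](m) for m in self.compile().finditer(text)]


parser = MiniParser()
parser.register('word', r'[a-z]+', lambda m: f'word:{m.group()}')
parser.register('keyword', r'if|else', lambda m: f'keyword:{m.group()}', before='word')
results = parser.dispatch('if x else y')
print(parser.rules)
print(results)
```

Because the rule patterns are merged, a handler in such a design cannot rely on positional group numbers from its own pattern; named groups stay stable no matter how many rules come first.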

## Usage Examples

### Custom Block Rule

Adding a custom block-level element:

```python
from mistune import create_markdown

def custom_block_plugin(md):
    """Add support for custom block syntax: :::type content :::"""

    def parse_custom_block(block, m, state):
        # Named groups are used because mistune merges all block rules
        # into one regex, so positional group numbers are not reliable.
        block_type = m.group('custom_type')
        content = m.group('custom_text').strip()

        # Parse content as nested blocks
        child = state.child_state(content)
        block.parse(child)

        # Handlers add tokens to the state rather than returning them
        state.append_token({
            'type': 'custom_block',
            'attrs': {'block_type': block_type},
            'children': child.tokens,
        })
        # Return the position just past the closing ::: line
        return m.end() + 1

    # Register rule with block parser
    md.block.register(
        'custom_block',
        r'^:::(?P<custom_type>\w+)\n(?P<custom_text>[\s\S]*?)\n:::$',
        parse_custom_block
    )

    # Add renderer method; registered functions receive the renderer first
    def render_custom_block(renderer, text, block_type):
        return f'<div class="custom-{block_type}">{text}</div>\n'

    md.renderer.register('custom_block', render_custom_block)

# Use custom plugin
md = create_markdown()
md.use(custom_block_plugin)

result = md("""
:::warning
This is a **warning** block.
:::
""")
```

### Custom Inline Rule

Adding a custom inline element:

```python
from mistune import create_markdown

def emoji_plugin(md):
    """Add support for emoji syntax: :emoji_name:"""

    def parse_emoji(inline, m, state):
        # Named group: inline rules are also merged into a single regex
        state.append_token({'type': 'emoji', 'raw': m.group('emoji_name')})
        # Return the position where inline parsing should resume
        return m.end()

    # Register with inline parser
    md.inline.register('emoji', r':(?P<emoji_name>\w+):', parse_emoji)

    # Add renderer method; registered functions receive the renderer first
    def render_emoji(renderer, emoji_name):
        emoji_map = {
            'smile': '😊',
            'heart': '❤️',
            'thumbsup': '👍'
        }
        # Unknown names fall back to the original :name: text
        return emoji_map.get(emoji_name, f':{emoji_name}:')

    md.renderer.register('emoji', render_emoji)

# Use emoji plugin
md = create_markdown()
md.use(emoji_plugin)

result = md('Hello :smile: world :heart:!')
# Output (wrapped in <p> tags): Hello 😊 world ❤️!
```

### State Access and Analysis

Accessing parsing state for analysis:

```python
from mistune import create_markdown

md = create_markdown()

# Parse with state access
text = """
# Heading 1

This is a paragraph with **bold** text.

## Heading 2

- List item 1
- List item 2
"""

output, state = md.parse(text)

# Analyze tokens
def analyze_tokens(tokens, level=0):
    indent = "  " * level
    for token in tokens:
        print(f"{indent}Token: {token['type']}")
        if 'attrs' in token:
            print(f"{indent}  Attrs: {token['attrs']}")
        if 'children' in token:
            analyze_tokens(token['children'], level + 1)

analyze_tokens(state.tokens)

# Access environment data
print(f"Environment: {state.env}")
```
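
Beyond printing, the same recursive walk can feed simple statistics. The sketch below counts token types with `collections.Counter`. The `count_token_types` helper and the hand-written `sample_tokens` list are hypothetical; the tokens follow the shape described under Token Structure and stand in for a real `state.tokens`, so the snippet runs without mistune installed.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def count_token_types(
    tokens: List[Dict[str, Any]],
    counts: Optional[Counter] = None,
) -> Counter:
    """Recursively count how often each token type occurs."""
    counts = Counter() if counts is None else counts
    for token in tokens:
        counts[token['type']] += 1
        # Container tokens keep nested tokens under 'children'
        count_token_types(token.get('children', []), counts)
    return counts


# Hand-written stand-in for state.tokens
sample_tokens = [
    {'type': 'heading', 'attrs': {'level': 1},
     'children': [{'type': 'text', 'raw': 'Heading 1'}]},
    {'type': 'paragraph', 'children': [
        {'type': 'text', 'raw': 'This is a paragraph with '},
        {'type': 'strong', 'children': [{'type': 'text', 'raw': 'bold'}]},
        {'type': 'text', 'raw': ' text.'},
    ]},
]

counts = count_token_types(sample_tokens)
print(counts.most_common())
```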

### Parser Customization

Customizing parser behavior:

```python
from mistune import BlockParser, InlineParser, Markdown, HTMLRenderer

# Create custom parsers
block = BlockParser()
inline = InlineParser(hard_wrap=True)  # Convert line breaks to <br>

# Remove specific rules by modifying rules list
block.rules.remove('block_quote')  # Disable blockquotes
inline.rules.remove('emphasis')  # Disable italic text

# Create parser with custom components
renderer = HTMLRenderer(escape=False)
md = Markdown(renderer=renderer, block=block, inline=inline)

result = md('This is *not italic*\nThis is a line break.')
```

## Token Structure

Understanding the token format for custom processing:

```python
# Block token structure
block_token = {
    'type': 'heading',       # Token type
    'attrs': {'level': 1},   # Element attributes
    'children': [            # Child tokens (for container elements)
        {
            'type': 'text',
            'raw': 'Heading Text'
        }
    ]
}

# Inline token structure
inline_token = {
    'type': 'strong',        # Token type
    'children': [            # Child tokens
        {
            'type': 'text',
            'raw': 'Bold Text'
        }
    ]
}

# Leaf token structure
text_token = {
    'type': 'text',               # Token type
    'raw': 'Plain text content'   # Raw text content
}
```
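
Because tokens are plain dictionaries, custom processing usually comes down to a recursive walk over `children`. As a small sketch (the `plain_text` helper is hypothetical, not part of mistune), the function below flattens any token of the shapes shown above into its text content.

```python
from typing import Any, Dict


def plain_text(token: Dict[str, Any]) -> str:
    """Return the concatenated raw text of a token and its descendants."""
    if 'raw' in token:
        # Leaf token: the text is stored directly
        return token['raw']
    # Container token: join the text of all children (empty if none)
    return ''.join(plain_text(child) for child in token.get('children', []))


heading = {
    'type': 'heading',
    'attrs': {'level': 1},
    'children': [
        {'type': 'text', 'raw': 'Heading with '},
        {'type': 'strong', 'children': [{'type': 'text', 'raw': 'bold'}]},
        {'type': 'text', 'raw': ' text'},
    ],
}

title = plain_text(heading)
print(title)
```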

This parsing architecture makes it possible to extend mistune with custom syntax while keeping block and inline processing as clearly separated stages.