
# Content Reading

Reader classes for parsing different markup formats, including Markdown, reStructuredText, and HTML. Readers extract metadata, process content, and convert markup to HTML for theme rendering.

## Capabilities

### Readers Manager

Central reader manager that coordinates the format-specific readers and caches parsed results to speed up repeated builds.

```python { .api }
class Readers(FileStampDataCacher):
    """
    Content reader manager with caching support.

    Parameters:
    - settings (dict): Site configuration dictionary
    - cache_name (str, optional): Cache identifier for file caching
    """
    def __init__(self, settings: dict, cache_name: str = ""): ...

    def read_file(
        self,
        base_path: str,
        path: str,
        content_class=Content,
        fmt: str = None
    ) -> Content:
        """
        Read and parse a content file.

        Parameters:
        - base_path (str): Base directory path
        - path (str): Relative file path
        - content_class (class, optional): Content class to instantiate (default: Content)
        - fmt (str, optional): Force specific format reader

        Returns:
        Content: Parsed content object with metadata and HTML content
        """

    # Available readers (populated from settings)
    readers: dict[str, BaseReader]  # Format -> Reader mapping
```
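
The format lookup behind `read_file` can be pictured with a small stand-alone sketch. It is not Pelican's implementation: `pick_format` and the sample mapping are hypothetical, and the only behavior assumed from the description above is that the reader is chosen from the file extension unless `fmt` forces one.

```python
import os


def pick_format(path, readers, fmt=None):
    """Return the key in `readers` that should handle `path`.

    Hypothetical helper: an explicit `fmt` wins, otherwise the
    file extension (without the dot) is used as the lookup key.
    """
    if fmt is None:
        _, ext = os.path.splitext(path)
        fmt = ext[1:]  # drop the leading dot; '' when there is no extension
    if fmt not in readers:
        raise KeyError(f"no reader registered for format {fmt!r}")
    return fmt


# Stand-in mapping; in Pelican the values would be reader instances.
readers = {"md": "MarkdownReader", "rst": "RstReader", "html": "HTMLReader"}

chosen = pick_format("articles/my-post.md", readers)
forced = pick_format("notes/readme.txt", readers, fmt="rst")
```

Forcing `fmt` is what lets a file with an unregistered extension, such as the `.txt` file here, be handed to an existing reader.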

### Base Reader Class

Foundation class for all content format readers. It provides common functionality for metadata extraction and content processing.

```python { .api }
class BaseReader:
    """
    Base class for content format readers.

    Parameters:
    - settings (dict): Site configuration dictionary
    """
    def __init__(self, settings: dict): ...

    enabled: bool = True  # Whether this reader is enabled
    file_extensions: list[str]  # Supported file extensions

    def read(self, source_path: str) -> tuple[str, dict]:
        """
        Read and parse content file.

        Parameters:
        - source_path (str): Path to content file

        Returns:
        tuple: (HTML content string, metadata dictionary)
        """

    def process_metadata(self, name: str, value: str) -> Any:
        """
        Process individual metadata field.

        Parameters:
        - name (str): Metadata field name
        - value (str): Raw metadata value

        Returns:
        Any: Processed value for the field (the field name is not returned)
        """
```
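
A per-field dispatch table is one way to implement a method like `process_metadata`. The sketch below is a simplified stand-in rather than Pelican's code: `SketchReader`, `PROCESSORS`, `parse_date`, and `split_list` are hypothetical names, and fields without a registered processor pass through unchanged.

```python
from datetime import datetime


def parse_date(value):
    """Parse 'YYYY-MM-DD HH:MM' or 'YYYY-MM-DD' into a datetime."""
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def split_list(value):
    """Split a comma-separated string, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Hypothetical table mapping field names to processing functions.
PROCESSORS = {"date": parse_date, "tags": split_list}


class SketchReader:
    """Stand-in for a reader; not derived from pelican.readers.BaseReader."""

    def process_metadata(self, name, value):
        processor = PROCESSORS.get(name.lower())
        return processor(value) if processor else value


reader = SketchReader()
date = reader.process_metadata("date", "2023-01-15 10:30")
tags = reader.process_metadata("tags", "tutorial, programming")
title = reader.process_metadata("title", "My Article Title")
```

Keeping the processors in a table means a new field type only needs one new entry, not another branch inside the reader.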

### reStructuredText Reader

Reader for reStructuredText (`.rst`) files. It uses the docutils library for parsing and HTML generation.

```python { .api }
class RstReader(BaseReader):
    """
    reStructuredText content reader.

    Supports:
    - Standard reStructuredText syntax
    - Custom Pelican directives (code highlighting, etc.)
    - Metadata extraction from docutils meta fields
    - Math rendering via MathJax
    - Custom role and directive registration
    """

    file_extensions: list[str] = ['rst']

    def read(self, source_path: str) -> tuple[str, dict]:
        """
        Parse reStructuredText file and extract content/metadata.

        Uses docutils for parsing with Pelican-specific settings and directives.
        Supports custom roles and directives for enhanced functionality.
        """
```

### Markdown Reader

Reader for Markdown (`.md`, `.markdown`, `.mkd`) files. It uses the Python-Markdown library with configurable extensions.

```python { .api }
class MarkdownReader(BaseReader):
    """
    Markdown content reader.

    Supports:
    - Standard Markdown syntax
    - Configurable Python-Markdown extensions
    - Metadata extraction from YAML front matter or meta extension
    - Code highlighting via Pygments
    - Table support, footnotes, and other extensions
    """

    file_extensions: list[str] = ['md', 'markdown', 'mkd']

    def read(self, source_path: str) -> tuple[str, dict]:
        """
        Parse Markdown file and extract content/metadata.

        Uses Python-Markdown with configurable extensions.
        Metadata can be extracted from YAML front matter or meta extension.
        """
```

### HTML Reader

Reader for HTML (`.html`, `.htm`) files. It extracts metadata from HTML meta tags and preserves the HTML content.

```python { .api }
class HTMLReader(BaseReader):
    """
    HTML content reader.

    Supports:
    - Raw HTML content preservation
    - Metadata extraction from HTML meta tags
    - Title extraction from <title> tag
    - Custom metadata via <meta> tags
    """

    file_extensions: list[str] = ['html', 'htm']

    def read(self, source_path: str) -> tuple[str, dict]:
        """
        Parse HTML file and extract content/metadata.

        Extracts metadata from HTML meta tags and preserves HTML content as-is.
        Useful for importing existing HTML content or custom layouts.
        """
```
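
To make the meta-tag extraction concrete, here is a minimal sketch built on the standard library's `html.parser`. It only illustrates the idea of collecting `<title>` and `<meta name=... content=...>` pairs; `MetaCollector` is a hypothetical class and does not reproduce `HTMLReader`'s actual parsing rules.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.metadata[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


page = """<html><head>
<title>My Article Title</title>
<meta name="date" content="2023-01-15 10:30">
<meta name="tags" content="tutorial, programming">
</head><body><p>Content goes here...</p></body></html>"""

collector = MetaCollector()
collector.feed(page)
collector.close()
metadata = collector.metadata
```

The values come back as raw strings, so dates and tag lists would still need per-field processing afterwards.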

## Reader Configuration

### Markdown Configuration

Configure Markdown reader behavior in settings:

```python
# In pelicanconf.py
MARKDOWN = {
    'extension_configs': {
        'markdown.extensions.codehilite': {'css_class': 'highlight'},
        'markdown.extensions.extra': {},
        'markdown.extensions.meta': {},
        'markdown.extensions.toc': {'permalink': True},
    },
    'output_format': 'html5',
}
```
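
A setting shaped like this has to be turned into an explicit list of extensions before a Markdown converter can use it. The sketch below assumes, without confirming it against Pelican's source, that every key under `extension_configs` is treated as an enabled extension. `normalize_markdown_settings` is a hypothetical helper and calls neither Pelican nor Python-Markdown.

```python
def normalize_markdown_settings(markdown_settings):
    """Return a copy whose 'extensions' list covers every configured extension.

    Hypothetical helper illustrating the assumed behavior described above.
    """
    settings = dict(markdown_settings)  # shallow copy; the input is not modified
    configs = settings.setdefault("extension_configs", {})
    extensions = list(settings.get("extensions", []))
    for name in configs:
        if name not in extensions:
            extensions.append(name)
    settings["extensions"] = extensions
    return settings


MARKDOWN = {
    "extension_configs": {
        "markdown.extensions.codehilite": {"css_class": "highlight"},
        "markdown.extensions.meta": {},
    },
    "output_format": "html5",
}

normalized = normalize_markdown_settings(MARKDOWN)
```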

### reStructuredText Configuration

Configure reStructuredText reader behavior:

```python
# In pelicanconf.py
DOCUTILS_SETTINGS = {
    'smart_quotes': True,
    'initial_header_level': 2,
    'syntax_highlight': 'short',
    'input_encoding': 'utf-8',
    'math_output': 'MathJax',
}
```

### Custom Readers

Register custom readers for additional formats:

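```python
# In pelicanconf.py
# READERS maps file extensions to reader classes, not dotted-path strings.
# 'path.to.custom' is a placeholder for your own module.
from path.to.custom import OrgModeReader, TextReader

READERS = {
    'txt': TextReader,
    'org': OrgModeReader,
}
```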
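
The same setting can also switch a built-in reader off. According to the Pelican settings documentation, mapping an extension to `None` stops files with that extension from being processed, which helps when a content directory holds raw HTML that should not become articles. A minimal fragment:

```python
# In pelicanconf.py
# Skip .html files instead of parsing them as content.
READERS = {
    'html': None,
}
```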

## Metadata Processing

### Common Metadata Fields

All readers process these standard metadata fields:

- `title`: Content title
- `date`: Publication date (ISO format or custom format)
- `modified`: Last modification date
- `category`: Content category (articles only)
- `tags`: Comma-separated tags (articles only)
- `slug`: URL slug (auto-generated if not provided)
- `author`: Author name
- `authors`: Multiple authors (comma-separated)
- `summary`: Content summary/description
- `lang`: Content language code
- `status`: Content status (published, draft, hidden)
- `template`: Custom template name
- `save_as`: Custom output file path
- `url`: Custom URL path
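
As an illustration of the "auto-generated if not provided" behavior for `slug`, the sketch below derives a slug from the title when none is set. It is a deliberately simple approximation: `simple_slug` and `with_default_slug` are hypothetical helpers, and Pelican's own slug rules may differ.

```python
import re
import unicodedata


def simple_slug(title):
    """Lowercase, strip accents, and join word runs with hyphens."""
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_only.lower())
    return "-".join(words)


def with_default_slug(metadata):
    """Return a copy of `metadata` with a title-derived slug if none is set."""
    result = dict(metadata)
    if not result.get("slug") and "title" in result:
        result["slug"] = simple_slug(result["title"])
    return result


article = with_default_slug(
    {"title": "My Article Title", "tags": "tutorial, programming"}
)
```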

### Metadata Format Examples

#### Markdown with YAML Front Matter

```markdown
---
title: My Article Title
date: 2023-01-15 10:30
category: Python
tags: tutorial, programming
author: John Doe
summary: A comprehensive guide to Python programming.
---

# Article Content

Content goes here...
```

#### Markdown with Meta Extension

```markdown
Title: My Article Title
Date: 2023-01-15 10:30
Category: Python
Tags: tutorial, programming
Author: John Doe
Summary: A comprehensive guide to Python programming.

# Article Content

Content goes here...
```
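
The header layout above, `Key: value` lines ending at the first blank line, is simple enough to split by hand. The following sketch shows that split with the standard library only. `split_meta_header` is a hypothetical helper that ignores multi-line values and other features of the real Meta-Data extension.

```python
def split_meta_header(text):
    """Split 'Key: value' header lines from the body at the first blank line."""
    metadata = {}
    lines = text.splitlines()
    body_start = len(lines)
    for index, line in enumerate(lines):
        if not line.strip():
            body_start = index + 1  # body begins after the blank separator
            break
        key, sep, value = line.partition(":")
        if not sep:
            # Not a header line: treat everything from here on as body.
            body_start = index
            break
        metadata[key.strip().lower()] = value.strip()
    body = "\n".join(lines[body_start:])
    return metadata, body


source = """Title: My Article Title
Date: 2023-01-15 10:30
Tags: tutorial, programming

# Article Content

Content goes here...
"""

metadata, body = split_meta_header(source)
```

Splitting on the first colon only keeps values such as `10:30` intact.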

#### reStructuredText

```rst
My Article Title
================

:date: 2023-01-15 10:30
:category: Python
:tags: tutorial, programming
:author: John Doe
:summary: A comprehensive guide to Python programming.

Article Content
---------------

Content goes here...
```
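
In this format the metadata lives in `:name: value` field lines, while the title comes from the document heading. As a rough illustration only, the sketch below pulls the field lines out with a regular expression. `rst_fields` is a hypothetical helper; the real reader relies on docutils rather than pattern matching.

```python
import re

# Matches lines such as ':date: 2023-01-15 10:30'.
FIELD = re.compile(r"^:(?P<name>[^:]+):\s*(?P<value>.*)$")


def rst_fields(text):
    """Collect ':name: value' lines into a dict (simplified; no docutils)."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            name = match.group("name").strip().lower()
            fields[name] = match.group("value").strip()
    return fields


source = """My Article Title
================

:date: 2023-01-15 10:30
:category: Python
:tags: tutorial, programming

Article Content
---------------
"""

fields = rst_fields(source)
```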

#### HTML

```html
<html>
<head>
    <title>My Article Title</title>
    <meta name="date" content="2023-01-15 10:30">
    <meta name="category" content="Python">
    <meta name="tags" content="tutorial, programming">
    <meta name="author" content="John Doe">
    <meta name="summary" content="A comprehensive guide to Python programming.">
</head>
<body>
    <h1>Article Content</h1>
    <p>Content goes here...</p>
</body>
</html>
```

## Usage Examples

### Using Readers Directly

```python
from pelican.contents import Article
from pelican.readers import Readers
from pelican.settings import read_settings

# Load settings and create readers
settings = read_settings('pelicanconf.py')
readers = Readers(settings)

# Read a Markdown file
content = readers.read_file(
    base_path='content',
    path='articles/my-post.md',
    content_class=Article
)

print(content.title)     # Article title
print(content.content)   # HTML content
print(content.metadata)  # Raw metadata dictionary
```

### Custom Reader Implementation

```python
from pelican.readers import BaseReader
import json

class JsonReader(BaseReader):
    """Custom reader for JSON content files."""

    file_extensions = ['json']

    def read(self, source_path):
        """Read JSON file and extract content/metadata."""
        with open(source_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        # Extract content and metadata
        content = data.get('content', '')
        metadata = {k: v for k, v in data.items() if k != 'content'}

        # Process metadata using the base class method, which returns
        # the processed value for each field
        processed_metadata = {}
        for name, value in metadata.items():
            processed_metadata[name] = self.process_metadata(name, str(value))

        return content, processed_metadata

# Register custom reader
# In pelicanconf.py (import JsonReader from the module that defines it):
# READERS = {'json': JsonReader}
```

### Reader Integration with Generators

```python
from pelican.contents import Article
from pelican.generators import Generator

class CustomGenerator(Generator):
    """Generator that uses readers to process content."""

    def generate_context(self):
        """Generate content using readers."""
        content_files = self.get_content_files()

        for content_file in content_files:
            # Use readers to parse file
            content = self.readers.read_file(
                base_path=self.path,
                path=content_file,
                content_class=Article
            )

            # Process content
            self.process_content(content)

    def get_content_files(self):
        """Get list of content files to process."""
        # Implementation depends on file discovery strategy
        return []

    def process_content(self, content):
        """Process parsed content."""
        # Add to context or perform custom processing
        pass
```

### Metadata Processing Customization

```python
from pelican.readers import BaseReader
from datetime import datetime

class CustomReader(BaseReader):
    """Reader with custom metadata processing."""

    def process_metadata(self, name, value):
        """Custom metadata processing logic."""
        # The base implementation returns the processed value only
        value = super().process_metadata(name, value)

        # Custom date parsing (applies only if the value is still a raw string)
        if name == 'date':
            if isinstance(value, str):
                try:
                    value = datetime.strptime(value, '%Y-%m-%d %H:%M')
                except ValueError:
                    value = datetime.strptime(value, '%Y-%m-%d')

        # Custom tag processing
        elif name == 'tags':
            if isinstance(value, str):
                value = [tag.strip() for tag in value.split(',')]

        return value
```