# Utilities

Helper functions and classes for content processing, URL handling, caching, date formatting, and file operations. These utilities support common tasks throughout the Pelican generation pipeline.

## Capabilities

### Text Processing

Functions for processing and formatting text content, including slug generation and HTML manipulation.

```python { .api }
def slugify(
    value: str,
    regex_subs: Iterable[tuple[str, str]] = (),
    preserve_case: bool = False,
    use_unicode: bool = False,
) -> str:
    """
    Convert text to URL-safe slug.

    Parameters:
    - value (str): Text to convert to slug
    - regex_subs (Iterable[tuple[str, str]], optional): Regex substitution pairs
    - preserve_case (bool, optional): Whether to preserve original case
    - use_unicode (bool, optional): Whether to allow unicode characters

    Returns:
    str: URL-safe slug with lowercase letters, numbers, and hyphens
    """

def truncate_html_words(s: str, num: int, end_text: str = "…") -> str:
    """
    Truncate HTML content to specified number of words.

    Parameters:
    - s (str): HTML content to truncate
    - num (int): Maximum number of words
    - end_text (str, optional): Text to append when truncated

    Returns:
    str: Truncated HTML with proper tag closure
    """

def truncate_html_paragraphs(s: str, num: int) -> str:
    """
    Truncate HTML content to specified number of paragraphs.

    Parameters:
    - s (str): HTML content to truncate
    - num (int): Maximum number of paragraphs

    Returns:
    str: Truncated HTML with complete paragraphs
    """
```
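
To make the role of `regex_subs` concrete, here is a minimal standard-library sketch of slug generation driven by ordered regex substitutions. The helper name `simple_slugify` and the `SUBS` list are assumptions made for illustration; this is not Pelican's implementation.

```python
import re
import unicodedata


def simple_slugify(value, regex_subs=(), preserve_case=False):
    """Illustrative stand-in for a slug helper (hypothetical, not Pelican's code)."""
    # Decompose accented characters, then drop anything outside ASCII.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Apply the caller's (pattern, replacement) pairs in order.
    for pattern, replacement in regex_subs:
        value = re.sub(pattern, replacement, value, flags=re.IGNORECASE)
    if not preserve_case:
        value = value.lower()
    return value.strip()


# Hypothetical substitution list: drop punctuation, then collapse
# runs of whitespace and hyphens into a single hyphen.
SUBS = [(r"[^\w\s-]", ""), (r"[-\s]+", "-")]

print(simple_slugify("Héllo, World!", SUBS))                          # hello-world
print(simple_slugify("C++ Programming", [(r"C\+\+", "cpp"), *SUBS]))  # cpp-programming
```

Order matters in this sketch: the `C\+\+` rule has to run before the punctuation rule, otherwise the plus signs are already gone when it is applied.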

### File Operations

Functions for file system operations, directory management, and file copying with metadata preservation.

```python { .api }
def clean_output_dir(path: str, retention: list = None) -> None:
    """
    Clean output directory while preserving specified files.

    Parameters:
    - path (str): Output directory path to clean
    - retention (list, optional): List of files/patterns to preserve
    """

def copy(source: str, destination: str, ignores: list = None) -> None:
    """
    Copy files with metadata preservation and optional ignore patterns.

    Parameters:
    - source (str): Source file or directory path
    - destination (str): Destination file or directory path
    - ignores (list, optional): List of patterns to ignore during copy
    """

def mkdir_p(path: str) -> None:
    """
    Create directory and parent directories as needed (like mkdir -p).

    Parameters:
    - path (str): Directory path to create
    """
```
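
The two ideas behind these helpers, idempotent directory creation and cleaning a directory while keeping selected entries, can be sketched with the standard library alone. The function `clean_dir` below is a hypothetical stand-in that matches retained entries by exact name only; it is not how Pelican implements `clean_output_dir`.

```python
import os
import shutil
import tempfile


def clean_dir(path, retention=()):
    """Delete everything directly inside `path` except names listed in `retention`."""
    for name in os.listdir(path):
        if name in retention:
            continue
        target = os.path.join(path, name)
        if os.path.isdir(target) and not os.path.islink(target):
            shutil.rmtree(target)
        else:
            os.remove(target)


root = tempfile.mkdtemp()
css_dir = os.path.join(root, "assets", "css")
os.makedirs(css_dir, exist_ok=True)  # creates parents, like `mkdir -p`
os.makedirs(css_dir, exist_ok=True)  # a second call is a harmless no-op

for filename in ("CNAME", "index.html"):
    open(os.path.join(root, filename), "w").close()

clean_dir(root, retention=("CNAME",))
remaining = sorted(os.listdir(root))
print(remaining)  # ['CNAME']
shutil.rmtree(root)
```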

### URL and Path Utilities

Functions for URL generation, path manipulation, and cross-platform path handling.

```python { .api }
def path_to_url(path: str) -> str:
    """
    Convert file system path to URL path.

    Parameters:
    - path (str): File system path

    Returns:
    str: URL-formatted path with forward slashes
    """

def posixize_path(path: str) -> str:
    """
    Convert path to POSIX format (forward slashes).

    Parameters:
    - path (str): File system path

    Returns:
    str: POSIX-formatted path
    """

def sanitised_join(base: str, *args: str) -> str:
    """
    Safely join path components preventing directory traversal.

    Parameters:
    - base (str): Base directory path
    - *args (str): Path components to join

    Returns:
    str: Safely joined path within base directory
    """
```
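
As a rough illustration of what "preventing directory traversal" involves, the sketch below resolves the joined path and checks that it still lies under the base directory. The names `safe_join` and `to_posix` are hypothetical, and raising `ValueError` is a choice made for this example, not a statement about Pelican's behaviour.

```python
import os


def safe_join(base, *parts):
    """Join `parts` under `base`; raise ValueError if the result escapes `base`."""
    base_abs = os.path.abspath(base)
    joined = os.path.abspath(os.path.join(base_abs, *parts))
    # commonpath compares whole path components, so a sibling such as
    # "output2" is not mistaken for something inside "output".
    if os.path.commonpath([base_abs, joined]) != base_abs:
        raise ValueError(f"{joined!r} is outside {base_abs!r}")
    return joined


def to_posix(path):
    """Replace the platform's path separator with forward slashes."""
    return path.replace(os.sep, "/")


base = os.path.abspath("output")
print(to_posix(safe_join(base, "images", "logo.png")))

try:
    safe_join(base, "..", "..", "etc", "passwd")
except ValueError as exc:
    print("rejected:", exc)
```

Comparing with `os.path.commonpath` rather than a plain string prefix avoids accepting a sibling directory whose name merely starts with the base directory's name.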

### Date and Time Utilities

Classes and functions for date processing, timezone handling, and date formatting.

```python { .api }
class SafeDatetime(datetime.datetime):
    """
    Extended datetime class with safe operations and string representation.

    Provides additional safety checks and consistent formatting for
    date operations in content processing.
    """

class DateFormatter:
    """
    Date formatting utility for Jinja2 templates.

    Provides strftime formatting with locale support and timezone awareness.
    """
    def __init__(self, locale: str = None): ...

    def __call__(self, date, format_string: str) -> str:
        """
        Format date using strftime with locale support.

        Parameters:
        - date: Date object to format
        - format_string (str): strftime format string

        Returns:
        str: Formatted date string
        """

def set_date_tzinfo(date, default_tz) -> datetime.datetime:
    """
    Set timezone info on date object with fallback to default timezone.

    Parameters:
    - date: Date object (may be naive or timezone-aware)
    - default_tz: Default timezone to apply if date is naive

    Returns:
    datetime.datetime: Timezone-aware datetime object
    """
```
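
The "fallback to default timezone" behaviour described for `set_date_tzinfo` can be sketched with the standard library as follows. The helper `with_default_tz` is a hypothetical name used only for this illustration.

```python
from datetime import datetime, timedelta, timezone


def with_default_tz(date, default_tz):
    """Attach `default_tz` to a naive datetime; leave aware datetimes untouched."""
    if date.tzinfo is None or date.tzinfo.utcoffset(date) is None:
        return date.replace(tzinfo=default_tz)
    return date


naive = datetime(2023, 1, 15, 12, 0)
aware = with_default_tz(naive, timezone.utc)
print(aware.isoformat())  # 2023-01-15T12:00:00+00:00

# A datetime that already carries an offset keeps it.
plus_two = datetime(2023, 1, 15, 12, 0, tzinfo=timezone(timedelta(hours=2)))
print(with_default_tz(plus_two, timezone.utc).isoformat())  # 2023-01-15T12:00:00+02:00
```

Note that `replace(tzinfo=...)` labels the existing wall-clock time with a zone; it does not convert the time between zones.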

### Content Organization

Functions for sorting, organizing, and processing content collections.

```python { .api }
def order_content(content_list: list, order_by: str) -> list:
    """
    Sort content list by specified criteria.

    Parameters:
    - content_list (list): List of content objects to sort
    - order_by (str): Sort criteria ('date', 'title', 'basename', etc.)

    Returns:
    list: Sorted content list
    """

def process_translations(content_list: list, translation_id: str) -> None:
    """
    Process and link content translations.

    Parameters:
    - content_list (list): List of content objects
    - translation_id (str): Metadata field used for translation linking
    """

def maybe_pluralize(count: int, singular: str, plural: str = None) -> str:
    """
    Return singular or plural form based on count.

    Parameters:
    - count (int): Number to check for pluralization
    - singular (str): Singular form of word
    - plural (str, optional): Plural form (defaults to singular + 's')

    Returns:
    str: Formatted string with count and appropriate word form
    """
```
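
A small sketch may help picture attribute-based ordering and count-aware pluralization. The helpers `order_by_attribute` and `pluralize`, and the `SimpleNamespace` stand-ins for content objects, are assumptions for illustration only; the `reversed-` prefix follows the convention used in the usage examples later in this document.

```python
from operator import attrgetter
from types import SimpleNamespace


def order_by_attribute(items, order_by):
    """Sort by attribute name; a 'reversed-' prefix flips the order."""
    prefix = "reversed-"
    reverse = order_by.startswith(prefix)
    key = order_by[len(prefix):] if reverse else order_by
    return sorted(items, key=attrgetter(key), reverse=reverse)


def pluralize(count, singular, plural=None):
    """Return the count followed by the matching word form."""
    word = singular if count == 1 else (plural or singular + "s")
    return f"{count} {word}"


# Stand-ins for content objects; ISO date strings sort chronologically.
posts = [
    SimpleNamespace(title="Beta", date="2023-02-01"),
    SimpleNamespace(title="Alpha", date="2023-03-01"),
]

print([p.title for p in order_by_attribute(posts, "date")])           # ['Beta', 'Alpha']
print([p.title for p in order_by_attribute(posts, "reversed-date")])  # ['Alpha', 'Beta']
print(pluralize(len(posts), "article"))                               # 2 articles
print(pluralize(1, "article"))                                        # 1 article
```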

### Caching Utilities

Classes for file-based caching to improve generation performance on large sites.

```python { .api }
class FileDataCacher:
    """
    Base class for file-based data caching.

    Provides caching functionality to avoid reprocessing unchanged files.
    """
    def __init__(self, cache_path: str, cache_name: str = 'cache'): ...

    def get_cached_data(self, path: str, fallback: callable) -> Any:
        """
        Get cached data or compute using fallback function.

        Parameters:
        - path (str): File path for cache key
        - fallback (callable): Function to compute data if not cached

        Returns:
        Any: Cached or computed data
        """

class FileStampDataCacher(FileDataCacher):
    """
    File caching with timestamp-based invalidation.

    Extends FileDataCacher with file modification time checking
    for automatic cache invalidation.
    """

    def should_update_cache(self, path: str, cache_key: str) -> bool:
        """
        Check if cache should be updated based on file modification time.

        Parameters:
        - path (str): Source file path
        - cache_key (str): Cache entry key

        Returns:
        bool: True if cache needs updating
        """
```
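
Timestamp-based invalidation, as described for `FileStampDataCacher`, comes down to storing a file's modification time next to the cached value and recomputing when the two differ. The in-memory `MtimeCache` class below is a hypothetical sketch of that idea; it does not persist anything to disk and is not Pelican's implementation.

```python
import os
import tempfile


class MtimeCache:
    """Cache one computed value per file, invalidated when the file's mtime changes."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, value)

    def get(self, path, compute):
        stamp = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]  # file unchanged: reuse the stored value
        value = compute(path)
        self._entries[path] = (stamp, value)
        return value


calls = []


def word_count(path):
    calls.append(path)  # record each real computation
    with open(path, encoding="utf-8") as handle:
        return len(handle.read().split())


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "article.md")
    with open(source, "w", encoding="utf-8") as handle:
        handle.write("one two three")

    cache = MtimeCache()
    first = cache.get(source, word_count)   # computed
    second = cache.get(source, word_count)  # served from the cache

    # Simulate an edit and push the mtime forward so the change is visible
    # even on file systems with coarse timestamps.
    with open(source, "w", encoding="utf-8") as handle:
        handle.write("one two three four")
    info = os.stat(source)
    os.utime(source, ns=(info.st_atime_ns, info.st_mtime_ns + 1_000_000_000))
    third = cache.get(source, word_count)   # recomputed

print(first, second, third, len(calls))  # 3 3 4 2
```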

### Decorators and Helpers

Utility decorators and helper classes for common patterns.

```python { .api }
class memoized:
    """
    Decorator for caching function results (memoization).

    Caches function return values based on arguments to avoid
    repeated expensive computations.
    """
    def __init__(self, func: callable): ...

    def __call__(self, *args, **kwargs): ...

    def cache_clear(self) -> None:
        """Clear memoization cache."""

def deprecated_attribute(old: str, new: str, since: tuple):
    """
    Decorator for marking class attributes as deprecated.

    Parameters:
    - old (str): Old attribute name
    - new (str): New attribute name
    - since (tuple): Version tuple when deprecation started

    Returns:
    property: Property that issues deprecation warning
    """
```
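
Both patterns can be sketched in a few lines of standard-library Python. The names `memoize`, `deprecated_alias`, and `Post` are hypothetical and chosen for this illustration; the sketch caches on positional arguments only and is not Pelican's implementation.

```python
import functools
import warnings


class memoize:
    """Class-based decorator that caches results by positional arguments (must be hashable)."""

    def __init__(self, func):
        self.func = func
        self.cache = {}
        functools.update_wrapper(self, func)

    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.func(*args)
        return self.cache[args]


def deprecated_alias(old, new):
    """Build a read-only property that forwards `old` to `new` and warns on access."""
    def getter(self):
        warnings.warn(f"{old} is deprecated; use {new}", DeprecationWarning, stacklevel=2)
        return getattr(self, new)
    return property(getter)


class Post:
    filename = deprecated_alias("filename", "source_path")

    def __init__(self, source_path):
        self.source_path = source_path


calls = []


@memoize
def square(n):
    calls.append(n)  # record each real computation
    return n * n


print(square(4), square(4), calls)  # 16 16 [4]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(Post("article.md").filename)  # article.md
    print(caught[0].category.__name__)  # DeprecationWarning
```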

### File Monitoring

Utilities for monitoring file changes during development and auto-reload functionality.

```python { .api }
class FileChangeFilter:
    """
    Filter for file change monitoring.

    Filters file system events to relevant changes for site regeneration.
    """
    def __init__(self, ignore_patterns: list = None): ...

    def should_process(self, path: str, event_type: str) -> bool:
        """
        Check if file change should trigger regeneration.

        Parameters:
        - path (str): Changed file path
        - event_type (str): Type of file system event

        Returns:
        bool: True if change should trigger regeneration
        """

def wait_for_changes(settings_path: str, settings: dict) -> list:
    """
    Wait for and detect file changes in content and theme directories.

    Parameters:
    - settings_path (str): Path to settings file
    - settings (dict): Site settings dictionary

    Returns:
    list: List of changed files with metadata
    """
```
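
Filtering change events against ignore patterns is essentially glob matching. The sketch below uses the standard `fnmatch` module; the helper `is_ignored` and the sample event list are assumptions for illustration and do not reflect how Pelican's filter is implemented.

```python
import os
from fnmatch import fnmatch


def is_ignored(path, ignore_patterns):
    """Return True if the full path or its file name matches any glob pattern."""
    name = os.path.basename(path)
    return any(
        fnmatch(path, pattern) or fnmatch(name, pattern)
        for pattern in ignore_patterns
    )


IGNORE = ["*.tmp", "*.swp", ".git/*"]
events = [
    "content/post.md",
    "content/post.md.swp",  # editor swap file
    ".git/index",           # version-control churn
    "theme/base.html",
]

relevant = [path for path in events if not is_ignored(path, IGNORE)]
print(relevant)  # ['content/post.md', 'theme/base.html']
```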

## Usage Examples

### Text Processing Examples

```python
from pelican.utils import slugify, truncate_html_words, truncate_html_paragraphs

# slugify applies only the substitutions it is given. This list is assumed to
# mirror Pelican's default SLUG_REGEX_SUBSTITUTIONS setting.
slug_subs = [
    (r"[^\w\s-]", ""),   # drop punctuation
    (r"(?u)\A\s*", ""),  # strip leading whitespace
    (r"(?u)\s*\Z", ""),  # strip trailing whitespace
    (r"[-\s]+", "-"),    # collapse whitespace and hyphens
]

# Generate URL-safe slugs
title = "My Article Title with Special Characters!"
slug = slugify(title, regex_subs=slug_subs)  # "my-article-title-with-special-characters"

# Custom substitutions (patterns are regular expressions, so escape the "+")
slug = slugify("C++ Programming", regex_subs=[(r"C\+\+", "cpp"), *slug_subs])  # "cpp-programming"

# Truncate HTML content
html_content = "<p>First paragraph.</p><p>Second paragraph with more text.</p>"
short_text = truncate_html_words(html_content, 5)  # Truncates to 5 words

# Truncate by paragraphs
short_paragraphs = truncate_html_paragraphs(html_content, 1)  # First paragraph only
```

### File Operations Examples

```python
from pelican.utils import clean_output_dir, copy, mkdir_p

# Clean output directory but keep certain files
clean_output_dir('output', retention=['.git', 'CNAME', '*.pdf'])

# Copy files with ignore patterns
copy('content/images', 'output/images', ignores=['*.tmp', '.DS_Store'])

# Create directory structure
mkdir_p('output/assets/css')
```

### URL and Path Examples

```python
from pelican.utils import path_to_url, posixize_path, sanitised_join

# Convert file paths to URLs
file_path = 'content\\articles\\my-post.md'  # Windows path
url_path = path_to_url(file_path)  # 'content/articles/my-post.md' on Windows

# Ensure POSIX format
posix_path = posixize_path(file_path)  # 'content/articles/my-post.md' on Windows

# Safe path joining (prevents directory traversal)
safe_path = sanitised_join('/var/www', 'images/logo.png')  # '/var/www/images/logo.png'

# A path that escapes the base directory is rejected, not rewritten.
# Pelican is expected to raise RuntimeError here; check your installed version.
try:
    sanitised_join('/var/www', '../../../etc/passwd')
except RuntimeError as exc:
    print(f"Rejected: {exc}")
```

### Date Formatting Examples

```python
from pelican.utils import DateFormatter, set_date_tzinfo, SafeDatetime
from datetime import datetime
import pytz

# Create date formatter
formatter = DateFormatter('en_US')

# Format dates in templates (used automatically by Pelican)
date = datetime.now()
formatted = formatter(date, '%B %d, %Y')  # e.g. "January 15, 2023"

# Handle timezone-naive dates
naive_date = datetime.now()
utc_tz = pytz.UTC
aware_date = set_date_tzinfo(naive_date, utc_tz)

# Safe datetime operations
safe_date = SafeDatetime.now()
print(safe_date)  # Safe string representation
```

### Content Organization Examples

```python
from pelican.utils import order_content, process_translations, maybe_pluralize

# article1..article3 and pages are placeholders for existing content objects

# Sort articles by different criteria
articles = [article1, article2, article3]
sorted_by_date = order_content(articles, 'date')
sorted_by_title = order_content(articles, 'title')
sorted_by_reverse_date = order_content(articles, 'reversed-date')

# Process content translations
all_content = articles + pages
process_translations(all_content, 'slug')

# Pluralization helper
article_count = len(articles)
message = maybe_pluralize(article_count, 'article', 'articles')
# "3 articles" here; a count of 1 gives "1 article"
```

### Caching Examples

```python
from pelican.utils import FileStampDataCacher

class ContentProcessor(FileStampDataCacher):
    """Content processor with caching."""

    def __init__(self, cache_path):
        super().__init__(cache_path, 'content_cache')

    def process_file(self, file_path):
        """Process file with caching."""
        def expensive_processing():
            # Expensive content processing
            with open(file_path, 'r') as f:
                content = f.read()
            # ... complex processing ...
            processed_content = content.strip()  # placeholder for the real work
            return processed_content

        return self.get_cached_data(file_path, expensive_processing)

# Usage
processor = ContentProcessor('cache')
result = processor.process_file('content/article.md')
```

### Memoization Examples

```python
from pelican.utils import memoized

@memoized
def expensive_computation(x, y):
    """Expensive function that benefits from caching."""
    print(f"Computing for {x}, {y}")  # Only prints on first call
    return x ** y + y ** x

# First call - computes and caches
result1 = expensive_computation(2, 3)  # Prints "Computing for 2, 3"

# Second call - uses cached result
result2 = expensive_computation(2, 3)  # No print, returns cached result

# Clear cache if needed
expensive_computation.cache_clear()
```

### File Monitoring Examples

```python
from pelican.utils import wait_for_changes, FileChangeFilter

# Monitor for file changes
def auto_regenerate(settings_path, settings):
    """Auto-regeneration loop."""
    while True:
        try:
            # Wait for changes
            changed_files = wait_for_changes(settings_path, settings)

            if changed_files:
                print(f"Detected changes: {[f[1] for f in changed_files]}")
                # Trigger regeneration; regenerate_site is a placeholder
                # for your own rebuild function
                regenerate_site(settings)

        except KeyboardInterrupt:
            break

# File change filtering
filter_obj = FileChangeFilter(ignore_patterns=['*.tmp', '*.swp', '.git/*'])

def should_regenerate(file_path, event_type):
    """Check if file change should trigger regeneration."""
    return filter_obj.should_process(file_path, event_type)
```

### Custom Utility Development

```python
from pelican.utils import memoized, slugify
import re

class ContentEnhancer:
    """Custom utility for enhancing content."""

    @memoized
    def extract_headings(self, html_content):
        """Extract headings from HTML content with caching."""
        headings = []
        for match in re.finditer(r'<h([1-6])[^>]*>(.*?)</h\1>', html_content):
            level = int(match.group(1))
            title = re.sub(r'<[^>]+>', '', match.group(2))  # Strip HTML tags
            headings.append({'level': level, 'title': title})
        return headings

    def generate_toc(self, html_content):
        """Generate table of contents from headings."""
        headings = self.extract_headings(html_content)

        toc_html = '<ul class="toc">'
        for heading in headings:
            indent = ' ' * (heading['level'] - 1)
            toc_html += f'\n{indent}<li><a href="#{slugify(heading["title"])}">{heading["title"]}</a></li>'
        toc_html += '\n</ul>'

        return toc_html

# Usage in plugin or generator
enhancer = ContentEnhancer()

def add_toc_to_articles(generator):
    """Add table of contents to articles."""
    for article in generator.articles:
        if hasattr(article, 'content'):
            article.toc = enhancer.generate_toc(article.content)
```