# Entry Customization and Processing

The `bibtexparser.customization` module collects functions for customizing and processing bibliographic entries. They cover name parsing, field normalization, LaTeX/Unicode conversion, and specialized field handling. Each record-level function takes an entry dictionary, modifies it in place, and returns it. This lets the functions be passed as the `customization` callback during parsing or applied to entries afterwards.

## Capabilities

### Author and Editor Processing

Functions for splitting author and editor fields into formatted names and, for editors, into structured objects.

```python { .api }
def author(record: dict) -> dict:
    """
    Split the author field into a list of formatted names.

    Replaces newlines with spaces, splits the 'author' string on the
    ' and ' delimiter, and formats each name as "Last, First".

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with the author field as a list of formatted names

    Note: An empty author field is removed from the record.
    """

def editor(record: dict) -> dict:
    """
    Process the editor field into structured objects with names and IDs.

    Splits and formats names in the same way as author(). Each name is then
    wrapped in an object with 'name' and 'ID' keys. 'ID' is the formatted
    name with commas, spaces, and periods removed.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with the editor field as a list of editor objects

    Note: An empty editor field is removed from the record.
    """
```

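The splitting and ID rules described above can be illustrated with a minimal, library-free sketch. The helper names `to_last_first`, `split_people`, and `editor_objects` are hypothetical. The name formatting is deliberately simpler than the library's (it ignores von and Jr parts), so treat this as an illustration of the behaviour rather than bibtexparser's implementation.

```python
def to_last_first(name: str) -> str:
    """Format one name as 'Last, First' (simplified: no von/Jr handling)."""
    name = name.strip()
    if "," in name:
        # Already "Last, First": just tidy the whitespace.
        last, first = name.split(",", 1)
        return f"{last.strip()}, {' '.join(first.split())}"
    words = name.split()
    first = " ".join(words[:-1])
    return f"{words[-1]}, {first}" if first else words[-1]


def split_people(field: str) -> list:
    """Flatten newlines, split on ' and ', and format each non-empty name."""
    names = field.replace("\n", " ").split(" and ")
    return [to_last_first(n) for n in names if n.strip()]


def editor_objects(field: str) -> list:
    """Wrap each formatted name in a {'name', 'ID'} object."""
    return [
        # ID: the formatted name with commas, spaces and periods removed
        {"name": n, "ID": n.replace(",", "").replace(" ", "").replace(".", "")}
        for n in split_people(field)
    ]


print(split_people("Albert Einstein and\nCurie, Marie"))
# ['Einstein, Albert', 'Curie, Marie']
print(editor_objects("Jane Q. Doe"))
# [{'name': 'Doe, Jane Q.', 'ID': 'DoeJaneQ'}]
```
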
### Name Parsing and Formatting

Functions for parsing names according to BibTeX conventions. A helper for locating matching bracket pairs is also included.

```python { .api }
def splitname(name: str, strict_mode: bool = True) -> dict:
    """
    Break a name into its constituent parts: First, von, Last, and Jr.

    Parses names according to BibTeX conventions, which allow three forms:
    - First von Last
    - von Last, First
    - von Last, Jr, First

    Parameters:
    - name (str): Single name string to parse
    - strict_mode (bool): If True, raise InvalidName for malformed names;
      if False, work around the problem as far as possible

    Returns:
    dict: Dictionary with keys 'first', 'last', 'von', 'jr' (each a list of
    words, possibly empty)

    Raises:
    InvalidName: If the name is malformed and strict_mode=True
    """

def getnames(names: list) -> list:
    """
    Convert a list of name strings to "Last, First" format.

    Parameters:
    - names (list): List of name strings

    Returns:
    list: List of formatted names in "Last, First" format

    Note: Simplified implementation that does not handle all of BibTeX's
    naming rules; splitname() covers the full set of name forms.
    """

def find_matching(
    text: str,
    opening: str,
    closing: str,
    ignore_escaped: bool = True
) -> dict:
    """
    Find matching bracket pairs in text.

    Parameters:
    - text (str): Text to search
    - opening (str): Opening bracket, e.g. '{'
    - closing (str): Closing bracket, e.g. '}'
    - ignore_escaped (bool): If True, ignore brackets escaped with a backslash

    Returns:
    dict: Mapping of opening-bracket indices to matching closing-bracket indices

    Raises:
    IndexError: If brackets are unmatched
    """
```

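find_matching() returns a mapping from each opening bracket to its partner. A minimal stack-based sketch of that contract follows. `match_brackets` is a hypothetical stand-in written for illustration, not the library's code, and it only handles single-character brackets.

```python
def match_brackets(text: str, opening: str = "{", closing: str = "}",
                   ignore_escaped: bool = True) -> dict:
    """Map each opening-bracket index to its matching closing-bracket index."""
    stack = []   # indices of opening brackets still waiting for a partner
    pairs = {}
    for i, ch in enumerate(text):
        # Optionally skip brackets escaped with a backslash, e.g. \{ in LaTeX
        if ignore_escaped and i > 0 and text[i - 1] == "\\":
            continue
        if ch == opening:
            stack.append(i)
        elif ch == closing:
            if not stack:
                raise IndexError(f"unmatched {closing!r} at index {i}")
            pairs[stack.pop()] = i
    if stack:
        raise IndexError(f"unmatched {opening!r} at index {stack[-1]}")
    return pairs


print(match_brackets("{a{b}c}"))  # {2: 4, 0: 6}
```

Inner pairs are recorded first because they close first. The resulting dictionary is still keyed by opening position, so lookups do not depend on that order.
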
### Field Processing and Normalization

Functions for processing and normalizing specific bibliographic fields.

```python { .api }
def journal(record: dict) -> dict:
    """
    Convert the journal field into a structured object with name and ID.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with journal as an object containing 'name' (the
    original string) and 'ID' (the name with commas, spaces, and periods
    removed)
    """

def keyword(record: dict, sep: str = ',|;') -> dict:
    """
    Split the keyword field into a list using the given separators.

    Parameters:
    - record (dict): Entry dictionary to process
    - sep (str): Regular expression pattern matching the separators

    Returns:
    dict: Modified record with the keyword field as a list of keywords,
    each stripped of surrounding whitespace
    """

def link(record: dict) -> dict:
    """
    Process the link field into structured objects.

    Each line of the link field is split on spaces into 'url', 'anchor',
    and 'format' keys. 'anchor' and 'format' are only set when the line
    has a second and third part; empty lines are skipped.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with the link field as a list of link objects
    """

def page_double_hyphen(record: dict) -> dict:
    """
    Normalize page ranges to use a double hyphen.

    Converts hyphen-like separators in the 'pages' field (hyphen, en dash,
    em dash, minus sign, and similar) to the BibTeX range separator '--'.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with a normalized pages field
    """

def type(record: dict) -> dict:
    """
    Convert the 'type' field (not the entry type) to lowercase.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with a lowercase type field
    """

def doi(record: dict) -> dict:
    """
    Add the DOI to the record's links.

    If a 'doi' field is present, a bare DOI (such as 10.1000/xyz) is
    converted to a resolver URL. The URL is appended to the link field as
    an object with 'url' and 'anchor' keys, unless a DOI link is already
    present.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with the DOI added to its links

    Note: An existing link field must already be a list of link objects,
    so apply link() before doi().
    """
```

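The page normalization can be sketched with a single regular expression. `normalize_pages` is hypothetical and only illustrates the idea. It treats any run of hyphen-like characters as the range separator and keeps the first and last parts, so edge cases may differ from page_double_hyphen() itself.

```python
import re

# Hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign,
# and ASCII hyphen-minus ('-' last so it is literal inside the class).
DASH_RUN = re.compile("[\u2010\u2011\u2012\u2013\u2014\u2212-]+")


def normalize_pages(pages: str) -> str:
    """Rewrite a page range such as '12\u201334' or '12 - 34' as '12--34'."""
    parts = [p.strip() for p in DASH_RUN.split(pages) if p.strip()]
    if len(parts) < 2:
        return pages.strip()  # a single page: nothing to normalize
    return f"{parts[0]}--{parts[-1]}"


for raw in ["12-34", "12\u201334", "12 -- 34", "e1002"]:
    print(repr(raw), "->", normalize_pages(raw))
```
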
### LaTeX and Unicode Conversion

Functions for converting between LaTeX encoding and Unicode in bibliographic data.

```python { .api }
def convert_to_unicode(record: dict) -> dict:
    """
    Convert LaTeX accents and encoding to Unicode throughout the record.

    Converts every field value: plain strings, each item of a list value,
    and each value of a dictionary field. List items and dictionary values
    are expected to be strings.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with Unicode characters
    """

def homogenize_latex_encoding(record: dict) -> dict:
    """
    Homogenize the LaTeX encoding style for BibTeX output.

    First converts the record to Unicode, then converts every field except
    'ID' back to a consistent LaTeX encoding. Uppercase letters in the
    title field are also wrapped in braces to protect their case.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with homogenized LaTeX encoding

    Note: Experimental; results may not be correct for every input.
    """

def add_plaintext_fields(record: dict) -> dict:
    """
    Add a plaintext copy of every field under a 'plain_' prefix.

    For each field, creates 'plain_<field>' with curly braces removed
    (also inside list items and dictionary values). This makes text
    processing and searching easier.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with additional plain_* fields
    """
```

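The plain_* fields amount to copying each field with its curly braces removed. A minimal sketch under that reading follows. `add_plain_fields` and `strip_braces` are hypothetical names, and this sketch leaves any value that is not a string (or a list/dict of strings) unchanged.

```python
def strip_braces(value: str) -> str:
    """Remove BibTeX grouping braces, e.g. '{LaTeX}' -> 'LaTeX'."""
    return value.replace("{", "").replace("}", "")


def add_plain_fields(record: dict) -> dict:
    """Add a brace-free 'plain_<field>' copy of every field."""
    for key in list(record):  # snapshot the keys: new ones are added below
        value = record[key]
        if isinstance(value, str):
            plain = strip_braces(value)
        elif isinstance(value, list):
            plain = [strip_braces(v) if isinstance(v, str) else v for v in value]
        elif isinstance(value, dict):
            plain = {k: strip_braces(v) if isinstance(v, str) else v
                     for k, v in value.items()}
        else:
            plain = value  # e.g. numbers: copied unchanged
        record["plain_" + key] = plain
    return record


entry = add_plain_fields({"title": "The {LaTeX} Companion", "keyword": ["{TeX}", "fonts"]})
print(entry["plain_title"], entry["plain_keyword"])  # The LaTeX Companion ['TeX', 'fonts']
```

The original fields are left untouched. Searching can use the plain_* copies while output keeps the braces that protect capitalization.
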
### Exception Classes

Exception classes for handling errors in name processing.

```python { .api }
class InvalidName(ValueError):
    """
    Exception raised by splitname() when an invalid name is encountered.

    Used when strict_mode=True and name cannot be parsed according to
    BibTeX naming conventions.
    """
    pass
```

## Usage Examples

### Basic Entry Customization

```python
import bibtexparser
from bibtexparser import customization

def my_customization(record):
    """Custom function to process entries during parsing."""
    # Process author names
    record = customization.author(record)

    # Convert journal to structured format
    record = customization.journal(record)

    # Split keywords
    record = customization.keyword(record)

    # Convert LaTeX to Unicode
    record = customization.convert_to_unicode(record)

    return record

# Use with parser
parser = bibtexparser.bparser.BibTexParser(customization=my_customization)
with open('bibliography.bib') as f:
    db = parser.parse_file(f)
```

### Post-processing Entries

```python
import bibtexparser
from bibtexparser import customization

# Load database normally
with open('bibliography.bib') as f:
    db = bibtexparser.load(f)

# Apply customizations to all entries. Each function modifies the entry
# dict in place (and returns it), so db.entries is updated directly.
for entry in db.entries:
    customization.author(entry)
    customization.page_double_hyphen(entry)
    customization.doi(entry)
```

### Name Processing Examples

```python
from bibtexparser.customization import splitname, getnames

# Parse an individual name in the "von Last, Jr, First" form
name_parts = splitname("von Neumann, Jr., Jean-Baptiste")
print(name_parts)
# {'first': ['Jean-Baptiste'], 'von': ['von'], 'last': ['Neumann'], 'jr': ['Jr.']}

# Format multiple names given as "First Last"
authors = ["Albert Einstein", "Isaac Newton", "Marie Curie"]
formatted = getnames(authors)
print(formatted)
# ['Einstein, Albert', 'Newton, Isaac', 'Curie, Marie']
```

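The following simplified, library-free sketch shows how the three BibTeX name forms divide a name into parts. `split_name_sketch` is hypothetical and far less robust than splitname(). It ignores braces and treats a word as part of the von particle when it starts with a lowercase letter. It also raises ValueError rather than InvalidName.

```python
def _von_last(words: list) -> tuple:
    """Split 'von Last': von ends at the last lowercase word before the final word."""
    lower = [i for i, w in enumerate(words[:-1]) if w[0].islower()]
    cut = lower[-1] + 1 if lower else 0
    return words[:cut], words[cut:]


def split_name_sketch(name: str) -> dict:
    parts = [p.strip() for p in name.split(",")]
    if len(parts) == 1:  # First von Last
        words = parts[0].split()
        lower = [i for i, w in enumerate(words[:-1]) if w[0].islower()]
        if lower:
            first = words[:lower[0]]
            von = words[lower[0]:lower[-1] + 1]
            last = words[lower[-1] + 1:]
        else:
            first, von, last = words[:-1], [], words[-1:]
        jr = []
    elif len(parts) == 2:  # von Last, First
        von, last = _von_last(parts[0].split())
        jr, first = [], parts[1].split()
    elif len(parts) == 3:  # von Last, Jr, First
        von, last = _von_last(parts[0].split())
        jr, first = parts[1].split(), parts[2].split()
    else:
        raise ValueError(f"too many commas in {name!r}")
    return {"first": first, "von": von, "last": last, "jr": jr}


print(split_name_sketch("Ludwig van Beethoven"))
# {'first': ['Ludwig'], 'von': ['van'], 'last': ['Beethoven'], 'jr': []}
```
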
### LaTeX Conversion Examples

```python
from bibtexparser.customization import convert_to_unicode, homogenize_latex_encoding

# Sample entry with LaTeX encoding
entry = {
    'title': 'Schr{\\"o}dinger\'s Cat',
    'author': 'Erwin Schr{\\"o}dinger'
}

# Convert to Unicode (copy first: the function modifies the dict in place)
unicode_entry = convert_to_unicode(entry.copy())
print(unicode_entry['author'])  # Erwin Schrödinger

# Homogenize LaTeX encoding
latex_entry = homogenize_latex_encoding(entry.copy())
print(latex_entry['title'])  # Re-encoded as LaTeX, with uppercase letters braced
```

### Field Processing Examples

```python
from bibtexparser.customization import keyword, link, journal

# Process keywords
entry = {'keyword': 'physics; quantum mechanics, uncertainty'}
entry = keyword(entry, sep=';|,')
print(entry['keyword'])  # ['physics', 'quantum mechanics', 'uncertainty']

# Process journal
entry = {'journal': 'Nature Physics'}
entry = journal(entry)
print(entry['journal'])  # {'name': 'Nature Physics', 'ID': 'NaturePhysics'}

# Process links
entry = {'link': 'https://example.com PDF article\nhttps://doi.org/10.1000/123 DOI'}
entry = link(entry)
print(entry['link'])
# [{'url': 'https://example.com', 'anchor': 'PDF', 'format': 'article'},
#  {'url': 'https://doi.org/10.1000/123', 'anchor': 'DOI'}]
```

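doi() appends the DOI to the link list as a link object. It therefore expects that field to already be a list, for example after link() has run. A minimal sketch of this behaviour follows. The function name `doi_to_link`, the https://doi.org/ resolver prefix, and the duplicate check by URL are assumptions of this illustration rather than the library's exact logic.

```python
def doi_to_link(record: dict) -> dict:
    """Append the record's DOI to its list of link objects (illustrative)."""
    doi = record.get("doi", "").strip()
    if not doi:
        return record
    # Bare DOIs ("10.xxxx/...") become resolver URLs; full URLs are kept.
    url = f"https://doi.org/{doi}" if doi.startswith("10.") else doi
    links = record.setdefault("link", [])  # assumes link() already ran
    if not any(item.get("url") == url for item in links):
        links.append({"url": url, "anchor": "doi"})
    return record


entry = doi_to_link({"doi": "10.1000/xyz123"})
print(entry["link"])  # [{'url': 'https://doi.org/10.1000/xyz123', 'anchor': 'doi'}]
```
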
### Creating Custom Processing Functions

```python
from bibtexparser import customization

def custom_year_processor(record):
    """Custom function to process the year field."""
    if 'year' in record:
        year = record['year']
        # Convert to integer if possible
        try:
            record['year_int'] = int(year)
        except ValueError:
            record['year_int'] = None

        # Add century field (e.g. 1905 -> 20, 2000 -> 20, 2001 -> 21)
        if record['year_int'] is not None:
            record['century'] = (record['year_int'] - 1) // 100 + 1

    return record

def comprehensive_customization(record):
    """Comprehensive processing pipeline."""
    # Convert LaTeX to Unicode first, while every field is still a plain string
    record = customization.convert_to_unicode(record)

    # Add brace-free plain_* fields for searching, before editor() and
    # link()/doi() turn fields into lists of objects
    record = customization.add_plaintext_fields(record)

    # Apply built-in structuring customizations
    record = customization.author(record)
    record = customization.editor(record)
    record = customization.journal(record)
    record = customization.keyword(record)
    record = customization.link(record)  # doi() expects link to be a list
    record = customization.doi(record)
    record = customization.page_double_hyphen(record)

    # Apply custom processing
    record = custom_year_processor(record)

    return record
```
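
As a pipeline grows, the individual record-to-record functions can also be combined generically. The sketch below uses `functools.reduce`. `chain`, `lowercase_type`, and `tag_source` are hypothetical helpers, not part of bibtexparser. `chain` applies its steps strictly in the order given.

```python
from functools import reduce
from typing import Callable

Customization = Callable[[dict], dict]


def chain(*steps: Customization) -> Customization:
    """Combine record -> record functions into one customization callback."""
    def run(record: dict) -> dict:
        # Feed each step's output into the next, left to right.
        return reduce(lambda rec, step: step(rec), steps, record)
    return run


def lowercase_type(record: dict) -> dict:
    # Toy step for demonstration: lowercase a 'type' field if present.
    if "type" in record:
        record["type"] = record["type"].lower()
    return record


def tag_source(record: dict) -> dict:
    # Toy step for demonstration: add a marker field.
    record["source"] = "bib"
    return record


pipeline = chain(lowercase_type, tag_source)
print(pipeline({"type": "PhD Thesis"}))  # {'type': 'phd thesis', 'source': 'bib'}
```

The combined function has the same record-in, record-out shape as the built-in customizations. It could therefore be passed as, for example, `BibTexParser(customization=chain(customization.convert_to_unicode, customization.author))`.
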