Tessl Tile for pypi/bibtexparser@1.4.0

# Entry Customization and Processing

Collection of functions for customizing and processing bibliographic entries, including name parsing, field normalization, LaTeX encoding conversion, and specialized field handling. These functions are designed to be used as customization callbacks during parsing or for post-processing entries.

## Capabilities

### Author and Editor Processing

Functions for processing and formatting author and editor names with support for various name formats and structured output.

```python { .api }
def author(record: dict) -> dict:
    """
    Split author field into a list of formatted names.

    Processes the 'author' field by splitting on the ' and ' delimiter and
    formatting each name as "Last, First".

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with author field as list of formatted names

    Note: Removes empty author fields. Handles newlines in author strings.
    """

def editor(record: dict) -> dict:
    """
    Process editor field into structured objects with names and IDs.

    Similar to author processing but creates objects with 'name' and 'ID'
    fields for each editor, where ID is a sanitized version of the name.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with editor field as list of editor objects
    """
```
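As a rough illustration of the processing described above, the following standalone sketch splits a name list on ` and ` and builds editor-style objects. It does not call bibtexparser: `split_people` and `to_editor_objects` are hypothetical helpers, the names are assumed to be in "Last, First" form already, and the ID rule (dropping commas, spaces, and periods) is an assumption about what "sanitized" means here.

```python
def split_people(field):
    """Split a BibTeX name list on the ' and ' delimiter (hypothetical helper)."""
    flattened = field.replace("\n", " ")  # treat line breaks as plain spaces
    return [name.strip() for name in flattened.split(" and ") if name.strip()]


def to_editor_objects(names):
    """Pair each name with an ID; the sanitizing rule is an assumption."""
    return [
        {"name": name, "ID": name.replace(",", "").replace(" ", "").replace(".", "")}
        for name in names
    ]


editors = to_editor_objects(split_people("Knuth, Donald E.\nand Lamport, Leslie"))
print(editors)
# [{'name': 'Knuth, Donald E.', 'ID': 'KnuthDonaldE'},
#  {'name': 'Lamport, Leslie', 'ID': 'LamportLeslie'}]
```

The sketch only shows the shape of the data after processing; the library functions should be preferred for real entries.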

### Name Parsing and Formatting

Advanced name parsing functions that handle complex name formats according to BibTeX conventions.

```python { .api }
def splitname(name: str, strict_mode: bool = True) -> dict:
    """
    Break a name into its constituent parts: First, von, Last, and Jr.

    Parses names according to BibTeX conventions supporting three formats:
    - First von Last
    - von Last, First
    - von Last, Jr, First

    Parameters:
    - name (str): Single name string to parse
    - strict_mode (bool): If True, raise exceptions on invalid names

    Returns:
    dict: Dictionary with keys 'first', 'last', 'von', 'jr' (each a list of words)

    Raises:
    InvalidName: If name is invalid and strict_mode=True
    """

def getnames(names: list) -> list:
    """
    Convert list of name strings to "surname, firstnames" format.

    Parameters:
    - names (list): List of name strings

    Returns:
    list: List of formatted names in "Last, First" format

    Note: Simplified implementation, may not handle all complex cases
    """

def find_matching(
    text: str,
    opening: str,
    closing: str,
    ignore_escaped: bool = True
) -> dict:
    """
    Find matching bracket pairs in text.

    Parameters:
    - text (str): Text to search
    - opening (str): Opening bracket character
    - closing (str): Closing bracket character
    - ignore_escaped (bool): Ignore escaped brackets

    Returns:
    dict: Mapping of opening positions to closing positions

    Raises:
    IndexError: If brackets are unmatched
    """
```
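The bracket matching described for `find_matching` can be pictured with a small stack-based sketch. This is a standalone illustration under stated assumptions, not the library's code: `match_brackets` is a hypothetical name, and "escaped" is assumed to mean "immediately preceded by a backslash".

```python
def match_brackets(text, opening="{", closing="}", ignore_escaped=True):
    """Map each opening-bracket index to its closing-bracket index (sketch)."""
    stack = []  # indices of brackets that are open but not yet closed
    pairs = {}
    for index, char in enumerate(text):
        # Assumption: a character right after a backslash counts as escaped.
        if ignore_escaped and index > 0 and text[index - 1] == "\\":
            continue
        if char == opening:
            stack.append(index)
        elif char == closing:
            # pop() raises IndexError when there is nothing left to close
            pairs[stack.pop()] = index
    if stack:
        raise IndexError("unmatched opening bracket at index %d" % stack[-1])
    return pairs


print(match_brackets(r"{A {nested} \{ title}"))
# {3: 10, 0: 20}
```

The escaped brace at index 13 is skipped, so only the two real pairs are reported. The sketch does not handle an escaped backslash followed by a brace; the library may treat that case differently.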

### Field Processing and Normalization

Functions for processing and normalizing specific bibliographic fields.

```python { .api }
def journal(record: dict) -> dict:
    """
    Convert journal field into structured object with name and ID.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with journal as object containing 'name' and 'ID'
    """

def keyword(record: dict, sep: str = ',|;') -> dict:
    """
    Split keyword field into a list using specified separators.

    Parameters:
    - record (dict): Entry dictionary to process
    - sep (str): Regular expression pattern for separators

    Returns:
    dict: Modified record with keyword field as list of keywords
    """

def link(record: dict) -> dict:
    """
    Process link field into structured objects.

    Parses link field lines into objects with 'url', 'anchor', and 'format' fields.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with link field as list of link objects
    """

def page_double_hyphen(record: dict) -> dict:
    """
    Normalize page ranges to use double hyphens.

    Converts various hyphen and dash characters in the 'pages' field to the
    standard double hyphen (--).

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with normalized pages field
    """

def type(record: dict) -> dict:
    """
    Convert type field to lowercase.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with lowercase type field
    """

def doi(record: dict) -> dict:
    """
    Process DOI field and add to links.

    Converts DOI to URL format and adds to link field if not already present.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with DOI added to links
    """
```
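To make the page-range normalization concrete, here is a minimal standalone sketch of the idea. It does not call bibtexparser: `normalize_pages` is a hypothetical helper, and the set of dash characters it recognizes is an assumption rather than the library's exact list.

```python
import re

# Dash-like characters treated as range separators in this sketch (assumed set):
# hyphen, non-breaking hyphen, en dash, em dash, minus sign, hyphen-minus.
DASH_CLASS = "[\u2010\u2011\u2013\u2014\u2212-]"


def normalize_pages(pages):
    """Replace any run of dash characters (and surrounding spaces) with '--'."""
    return re.sub(r"\s*" + DASH_CLASS + r"+\s*", "--", pages.strip())


for raw in ["12-34", "12\u201334", "12 -- 34", "57"]:
    print(repr(raw), "->", normalize_pages(raw))
```

A single page number such as `57` contains no separator and is returned unchanged.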

### LaTeX and Unicode Conversion

Functions for converting between LaTeX encoding and Unicode in bibliographic data.

```python { .api }
def convert_to_unicode(record: dict) -> dict:
    """
    Convert LaTeX accents and encoding to Unicode throughout record.

    Processes all string fields, lists, and dictionary values in the record
    to convert LaTeX-encoded special characters to Unicode equivalents.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with Unicode characters
    """

def homogenize_latex_encoding(record: dict) -> dict:
    """
    Homogenize LaTeX encoding style for BibTeX output.

    First converts to Unicode, then converts back to consistent LaTeX encoding.
    Protects uppercase letters in title field.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with homogenized LaTeX encoding

    Note: Experimental function, may have limitations
    """

def add_plaintext_fields(record: dict) -> dict:
    """
    Add plaintext versions of all fields with 'plain_' prefix.

    Creates additional fields with braces and special characters removed
    for easier text processing and searching.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with additional plain_* fields
    """
```
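The `plain_` prefix convention can be illustrated with a short standalone sketch. It is not the library's implementation: `add_plain_fields` is a hypothetical helper that handles only string values and strips only braces, which is a simplifying assumption.

```python
def add_plain_fields(record):
    """Add a brace-free 'plain_<key>' copy of every string field (sketch)."""
    for key in list(record):  # snapshot the keys: the dict grows as we go
        value = record[key]
        if isinstance(value, str):
            record["plain_" + key] = value.replace("{", "").replace("}", "")
    return record


entry = add_plain_fields({"title": "The {DNA} of {B}ib{T}e{X}"})
print(entry["plain_title"])  # The DNA of BibTeX
```

The original `title` value is left untouched, so the brace protection is still available when the entry is written back out.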

### Exception Classes

Exception classes for handling errors in name processing.

```python { .api }
class InvalidName(ValueError):
    """
    Exception raised by splitname() when an invalid name is encountered.

    Used when strict_mode=True and name cannot be parsed according to
    BibTeX naming conventions.
    """
    pass
```

## Usage Examples

### Basic Entry Customization

```python
import bibtexparser
from bibtexparser import customization

def my_customization(record):
    """Custom function to process entries during parsing."""
    # Process author names
    record = customization.author(record)

    # Convert journal to structured format
    record = customization.journal(record)

    # Split keywords
    record = customization.keyword(record)

    # Convert LaTeX to Unicode
    record = customization.convert_to_unicode(record)

    return record

# Use with parser
parser = bibtexparser.bparser.BibTexParser(customization=my_customization)
with open('bibliography.bib') as f:
    db = parser.parse_file(f)
```

### Post-processing Entries

```python
import bibtexparser
from bibtexparser import customization

# Load database normally
with open('bibliography.bib') as f:
    db = bibtexparser.load(f)

# Apply customizations to all entries.
# Each function modifies the entry dict in place and returns it.
for entry in db.entries:
    entry = customization.author(entry)
    entry = customization.page_double_hyphen(entry)
    entry = customization.doi(entry)
```

### Name Processing Examples

```python
from bibtexparser.customization import splitname, getnames

# Parse an individual name written in the "von Last, Jr, First" form
name_parts = splitname("von Neumann, Jr., Jean-Baptiste")
print(name_parts)
# Expected parts: first=['Jean-Baptiste'], von=['von'], last=['Neumann'], jr=['Jr.']

# Format multiple names (mixed "First Last" and "Last, First" input)
authors = ["Albert Einstein", "Newton, Isaac", "Marie Curie"]
formatted = getnames(authors)
print(formatted)
# ['Einstein, Albert', 'Newton, Isaac', 'Curie, Marie']
```

### LaTeX Conversion Examples

```python
from bibtexparser.customization import convert_to_unicode, homogenize_latex_encoding

# Sample entry with LaTeX encoding
entry = {
    'title': 'Schr{\\"o}dinger\'s Cat',
    'author': 'Erwin Schr{\\"o}dinger'
}

# Convert to Unicode (work on a copy so the original entry is kept)
unicode_entry = convert_to_unicode(entry.copy())
print(unicode_entry['title'])  # Schrödinger's Cat

# Homogenize LaTeX encoding
latex_entry = homogenize_latex_encoding(entry.copy())
print(latex_entry['title'])  # Consistent LaTeX format
```

### Field Processing Examples

```python
from bibtexparser.customization import keyword, link, journal

# Process keywords
entry = {'keyword': 'physics; quantum mechanics, uncertainty'}
entry = keyword(entry, sep=';|,')
print(entry['keyword'])  # ['physics', 'quantum mechanics', 'uncertainty']

# Process journal
entry = {'journal': 'Nature Physics'}
entry = journal(entry)
print(entry['journal'])  # {'name': 'Nature Physics', 'ID': 'NaturePhysics'}

# Process links
entry = {'link': 'https://example.com PDF article\nhttps://doi.org/10.1000/123 DOI'}
entry = link(entry)
print(entry['link'])
# [{'url': 'https://example.com', 'anchor': 'PDF', 'format': 'article'},
#  {'url': 'https://doi.org/10.1000/123', 'anchor': 'DOI'}]
```

### Creating Custom Processing Functions

```python
from bibtexparser import customization

def custom_year_processor(record):
    """Custom function to process year field."""
    if 'year' in record:
        year = record['year']
        # Convert to integer if possible
        try:
            record['year_int'] = int(year)
        except ValueError:
            record['year_int'] = None

        # Add century field
        if record['year_int']:
            record['century'] = (record['year_int'] - 1) // 100 + 1

    return record

def comprehensive_customization(record):
    """Comprehensive processing pipeline."""
    # Run the string-level steps first, while every field is still a plain string
    record = customization.convert_to_unicode(record)

    # Add plaintext fields for searching
    record = customization.add_plaintext_fields(record)

    # Apply built-in customizations that restructure fields into lists and dicts
    record = customization.author(record)
    record = customization.editor(record)
    record = customization.journal(record)
    record = customization.keyword(record)
    record = customization.doi(record)
    record = customization.page_double_hyphen(record)

    # Apply custom processing
    record = custom_year_processor(record)

    return record
```
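Because every customization takes a record and returns a record, pipelines like the one above can also be assembled generically. The following standalone sketch shows one way to do that; `compose`, `strip_title`, and `add_decade` are hypothetical names introduced for illustration and are not part of bibtexparser.

```python
from functools import reduce


def compose(*steps):
    """Chain record -> record callbacks into one, applied left to right."""
    def pipeline(record):
        return reduce(lambda current, step: step(current), steps, record)
    return pipeline


# Two hypothetical steps, written in the same style as the functions above.
def strip_title(record):
    if "title" in record:
        record["title"] = record["title"].strip()
    return record


def add_decade(record):
    if record.get("year", "").isdigit():
        record["decade"] = int(record["year"]) // 10 * 10
    return record


process = compose(strip_title, add_decade)
print(process({"title": "  On Computable Numbers ", "year": "1936"}))
# {'title': 'On Computable Numbers', 'year': '1936', 'decade': 1930}
```

A function built this way has the same record-in, record-out shape as `my_customization`, so it could be passed as the `customization` argument to `BibTexParser` in the same manner.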
