# Advanced Parsing Configuration

`BibTexParser` (from `bibtexparser.bparser`) is a configurable parser for BibTeX data. Its options control how non-standard entry types, field names, `@string` definitions, and cross-references are handled. A customization hook lets you post-process each entry as it is parsed.

## Capabilities

### Parser Configuration

The `BibTexParser` constructor exposes options for adapting parsing to different BibTeX variants and requirements.

```python { .api }
class BibTexParser:
    """
    A parser for reading BibTeX bibliographic data files.

    Provides configuration options for entry filtering, field processing,
    string handling, and cross-reference resolution.
    """

    def __init__(
        self,
        customization=None,
        ignore_nonstandard_types: bool = True,
        homogenize_fields: bool = False,
        interpolate_strings: bool = True,
        common_strings: bool = True,
        add_missing_from_crossref: bool = False
    ):
        """
        Create a configurable BibTeX parser.

        Parameters:
        - customization (callable, optional): Function applied to each entry dict after parsing; its return value is what gets stored
        - ignore_nonstandard_types (bool): If True, skip entries whose type is not a standard BibTeX type
        - homogenize_fields (bool): If True, map alternative field names to canonical ones (e.g., 'link' to 'url', 'keywords' to 'keyword')
        - interpolate_strings (bool): If True, replace @string references with their values
        - common_strings (bool): If True, predefine the common month abbreviations (jan, feb, ...)
        - add_missing_from_crossref (bool): If True, fill in fields missing from an entry using the entry named in its crossref field

        Returns:
        BibTexParser: Configured parser instance
        """
```

### String Parsing

`parse` parses a BibTeX string using the parser's configuration. The `partial` flag chooses between raising on a syntax error and keeping whatever was parsed before it.

```python { .api }
def parse(self, bibtex_str: str, partial: bool = False) -> BibDatabase:
    """
    Parse a BibTeX string into a BibDatabase object.

    Parameters:
    - bibtex_str (str): BibTeX string to parse
    - partial (bool): If True, log a parsing error instead of raising it and keep the entries parsed before the error; if False, raise the exception

    Returns:
    BibDatabase: The parser's database; entries from earlier calls on the same parser are kept

    Raises:
    ParseException (from pyparsing): If parsing fails and partial=False
    """
```

### File Parsing

`parse_file` reads the whole file and parses its contents exactly like `parse`, with the same configuration and error handling.

```python { .api }
def parse_file(self, file, partial: bool = False) -> BibDatabase:
    """
    Parse a BibTeX file into a BibDatabase object.

    Parameters:
    - file (file): File or file-like object; its full contents are read and passed to parse
    - partial (bool): If True, log a parsing error instead of raising it and keep the entries parsed before the error; if False, raise the exception

    Returns:
    BibDatabase: The parser's database

    Raises:
    ParseException (from pyparsing): If parsing fails and partial=False
    """
```

### Convenience Function

The module-level `parse` function in `bibtexparser.bparser` builds a one-off `BibTexParser` from the given arguments and parses a string with it.

```python { .api }
def parse(data: str, *args, **kwargs) -> BibDatabase:
    """
    Convenience function for parsing BibTeX data.

    Creates a BibTexParser with the provided arguments and parses the data.

    Parameters:
    - data (str): BibTeX string to parse
    - *args, **kwargs: Arguments passed to the BibTexParser constructor

    Returns:
    BibDatabase: Parsed bibliographic database
    """
```

## Configuration Options

### Entry Type Handling

Control which BibTeX entry types the parser keeps:

```python
from bibtexparser.bparser import BibTexParser

# Keep entries of non-standard types (such as @software or @dataset)
parser = BibTexParser(ignore_nonstandard_types=False)

# Keep only standard BibTeX types (article, book, etc.); other entries are skipped
parser = BibTexParser(ignore_nonstandard_types=True)  # Default
```
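
To make the filtering concrete, here is a small standard-library sketch of the rule described above, applied to entry dicts shaped like bibtexparser's (with `ENTRYTYPE` and `ID` keys). It is not the library's code: `STANDARD_TYPES` is assumed to be the classic BibTeX entry types, and `filter_standard` is a hypothetical helper.

```python
# Illustrative sketch of what ignore_nonstandard_types=True does to entries.
# STANDARD_TYPES is assumed to be the classic BibTeX entry types.
STANDARD_TYPES = {
    "article", "book", "booklet", "conference", "inbook", "incollection",
    "inproceedings", "manual", "mastersthesis", "misc", "phdthesis",
    "proceedings", "techreport", "unpublished",
}


def filter_standard(entries):
    """Hypothetical helper: keep only entries whose type is standard."""
    kept = []
    for entry in entries:
        # This sketch compares types in lowercase, so @Article counts as article
        if entry["ENTRYTYPE"].lower() in STANDARD_TYPES:
            kept.append(entry)
    return kept


entries = [
    {"ENTRYTYPE": "article", "ID": "smith2020"},
    {"ENTRYTYPE": "software", "ID": "numpy"},
    {"ENTRYTYPE": "dataset", "ID": "imagenet"},
]
print([e["ID"] for e in filter_standard(entries)])  # ['smith2020']
```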

### Field Processing

Configure how field names are normalized:

```python
from bibtexparser.bparser import BibTexParser

# Map alternative field names to canonical ones (e.g., 'link' -> 'url', 'keywords' -> 'keyword')
parser = BibTexParser(homogenize_fields=True)

# Keep original field names (they are still lowercased)
parser = BibTexParser(homogenize_fields=False)  # Default
```
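
The sketch below illustrates this normalization with the standard library only. It assumes, as in bibtexparser 1.x, that field names are always lowercased and that aliases are mapped only when homogenization is enabled; `ALIASES` is an assumed subset of the library's alias table and `clean_field_key` is a hypothetical helper.

```python
# Illustrative sketch of field-name homogenization.
# ALIASES is an assumed subset of the library's alias table.
ALIASES = {
    "keyw": "keyword",
    "keywords": "keyword",
    "authors": "author",
    "editors": "editor",
    "urls": "url",
    "link": "url",
    "links": "url",
}


def clean_field_key(key, homogenize=False):
    """Hypothetical helper: lowercase a field name, then optionally map aliases."""
    key = key.lower()
    if homogenize:
        key = ALIASES.get(key, key)
    return key


fields = {"Title": "On Parsing", "Link": "https://example.org", "Keywords": "bibtex"}
print({clean_field_key(k, homogenize=True): v for k, v in fields.items()})
# {'title': 'On Parsing', 'url': 'https://example.org', 'keyword': 'bibtex'}
```

Names outside the alias table pass through unchanged apart from lowercasing, which is why `title` survives as-is.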

### String Handling

Control how `@string` definitions are processed:

```python
from bibtexparser.bparser import BibTexParser

# Replace string references with their defined values
parser = BibTexParser(interpolate_strings=True)  # Default

# Keep string references unexpanded to preserve the original structure
parser = BibTexParser(interpolate_strings=False)

# Predefine the common month abbreviations (jan, feb, etc.)
parser = BibTexParser(common_strings=True)  # Default

# Don't predefine any strings
parser = BibTexParser(common_strings=False)
```
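
As a rough model of what interpolation does, the sketch below represents a field value as a list of parts joined by BibTeX's `#` operator and expands the string references. The part encoding and the `interpolate` helper are hypothetical, and `COMMON_STRINGS` is assumed to mirror what `common_strings=True` predefines (month abbreviations mapped to full month names).

```python
# Illustrative sketch of @string interpolation; not the library's implementation.
# A field value is modeled as a list of parts joined by BibTeX's '#' operator:
# ("lit", text) for a quoted or braced literal, ("ref", name) for a string name.
COMMON_STRINGS = {
    "jan": "January", "feb": "February", "mar": "March", "apr": "April",
    "may": "May", "jun": "June", "jul": "July", "aug": "August",
    "sep": "September", "oct": "October", "nov": "November", "dec": "December",
}


def interpolate(parts, strings):
    """Hypothetical helper: expand string references and concatenate the parts."""
    out = []
    for kind, value in parts:
        if kind == "ref":
            # BibTeX string names are case-insensitive
            out.append(strings[value.lower()])
        else:
            out.append(value)
    return "".join(out)


# Predefined strings, plus the equivalent of @string{conf = "Symposium on Parsing"}
strings = dict(COMMON_STRINGS)
strings["conf"] = "Symposium on Parsing"

# month = jan
print(interpolate([("ref", "jan")], strings))  # January
# booktitle = "Proc. " # conf
print(interpolate([("lit", "Proc. "), ("ref", "conf")], strings))  # Proc. Symposium on Parsing
```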

### Cross-reference Resolution

Enable automatic resolution of `crossref` dependencies:

```python
from bibtexparser.bparser import BibTexParser

# Fill in fields missing from an entry using the entry its crossref field points to
parser = BibTexParser(add_missing_from_crossref=True)

# Leave crossref fields unresolved
parser = BibTexParser(add_missing_from_crossref=False)  # Default
```
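
A naive, standard-library version of crossref resolution on entry dicts is sketched below. It copies only fields the child entry lacks and never copies the parent's `ID` or `ENTRYTYPE`. That precedence rule and the `resolve_crossrefs` helper are assumptions for illustration; the library's own resolution may handle more cases.

```python
# Illustrative sketch of crossref resolution on plain entry dicts.
# Not the library's implementation; the precedence rule is an assumption.


def resolve_crossrefs(entries):
    """Hypothetical helper: copy fields an entry lacks from its crossref parent."""
    by_id = {entry["ID"]: entry for entry in entries}
    resolved = []
    for entry in entries:
        merged = dict(entry)  # never mutate the input entries
        parent = by_id.get(entry.get("crossref"))
        if parent is not None:
            for field, value in parent.items():
                # Fields already on the child win; the parent's ID and type are skipped
                if field not in merged and field not in ("ID", "ENTRYTYPE"):
                    merged[field] = value
        resolved.append(merged)
    return resolved


entries = [
    {"ENTRYTYPE": "inproceedings", "ID": "doe2021", "title": "Fast Parsing",
     "crossref": "proc2021"},
    {"ENTRYTYPE": "proceedings", "ID": "proc2021",
     "booktitle": "Proc. of Parsing 2021", "year": "2021"},
]
child = resolve_crossrefs(entries)[0]
print(child["booktitle"], child["year"])  # Proc. of Parsing 2021 2021
```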

## Usage Examples

### Custom Entry Processing

```python
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import author, convert_to_unicode


def customize_entries(record):
    """Process each entry dict as it is parsed; the returned dict is stored."""
    # Split the author field into a list of "Last, First" names
    # (records without an author field are returned unchanged)
    record = author(record)

    # Convert LaTeX sequences to Unicode characters
    record = convert_to_unicode(record)

    return record


parser = BibTexParser(customization=customize_entries)

with open('bibliography.bib', encoding='utf-8') as bibtex_file:
    bib_database = parser.parse_file(bibtex_file)
```
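
Because a customization is just a function from an entry dict to an entry dict, each step can be written and tested on its own and then composed into the single callable passed as `customization=`. The steps below (`split_keywords`, `double_hyphen_pages`) and the `chain` helper are hypothetical and use only the standard library; in practice they would sit alongside library-provided steps such as `author` and `convert_to_unicode`.

```python
import re
from functools import reduce


def split_keywords(record):
    """Hypothetical step: turn 'a, b; c' into ['a', 'b', 'c']."""
    if "keyword" in record:
        parts = re.split(r"[,;]", record["keyword"])
        record["keyword"] = [p.strip() for p in parts if p.strip()]
    return record


def double_hyphen_pages(record):
    """Hypothetical step: normalize page ranges like '10 - 20' to '10--20'."""
    if "pages" in record:
        record["pages"] = re.sub(r"\s*-+\s*", "--", record["pages"])
    return record


def chain(*steps):
    """Compose customization steps, applied left to right, into one callable."""
    return lambda record: reduce(lambda rec, step: step(rec), steps, record)


customize = chain(split_keywords, double_hyphen_pages)
print(customize({"ID": "x", "keyword": "bibtex; parsing, python", "pages": "10 - 20"}))
# {'ID': 'x', 'keyword': ['bibtex', 'parsing', 'python'], 'pages': '10--20'}

# With the library installed, pass it like any other customization:
# parser = BibTexParser(customization=customize)
```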

### Robust Parsing with Error Handling

```python
from bibtexparser.bparser import BibTexParser

# Configure the parser for maximum compatibility
parser = BibTexParser(
    ignore_nonstandard_types=False,  # Accept all entry types
    homogenize_fields=True,          # Normalize field names
    common_strings=True,             # Predefine month abbreviations
    add_missing_from_crossref=True   # Resolve crossrefs
)

try:
    with open('messy_bibliography.bib', encoding='utf-8') as bibtex_file:
        # With partial=True, a parse error is logged instead of raised,
        # and the entries parsed before the error are kept
        bib_database = parser.parse_file(bibtex_file, partial=True)
    print(f"Parsed {len(bib_database.entries)} entries")
except (OSError, UnicodeDecodeError) as e:
    # Problems reading the file itself still raise
    print(f"Could not read file: {e}")
```

### Multiple Parsing Passes

Each call to `parse` or `parse_file` adds its entries to the same `parser.bib_database`, so one parser can combine several files:

```python
from bibtexparser.bparser import BibTexParser

# Create a parser that will be used for several parse calls
parser = BibTexParser()
parser.expect_multiple_parse = True  # Silence the repeated-parse warning

# Parse multiple files into the same database
for filename in ['refs1.bib', 'refs2.bib', 'refs3.bib']:
    with open(filename, encoding='utf-8') as bibtex_file:
        parser.parse_file(bibtex_file)

print(f"Total entries: {len(parser.bib_database.entries)}")
```
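
Since all passes share one database, the same citation key can appear in more than one file. The hypothetical `duplicate_ids` helper below flags such keys; it is standard-library only and works on any list of entry dicts with an `ID` key, so it could be applied to `parser.bib_database.entries` after the loop.

```python
from collections import Counter


def duplicate_ids(entries):
    """Hypothetical helper: return citation keys that occur more than once."""
    counts = Counter(entry["ID"] for entry in entries)
    return sorted(key for key, n in counts.items() if n > 1)


# Entries as they might look after parsing two files into one database
combined = [
    {"ENTRYTYPE": "article", "ID": "smith2020"},
    {"ENTRYTYPE": "book", "ID": "knuth1984"},
    {"ENTRYTYPE": "article", "ID": "smith2020"},  # same key from another file
]
print(duplicate_ids(combined))  # ['smith2020']
```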

### Configuration for Different BibTeX Dialects

```python
from bibtexparser.bparser import BibTexParser

# Configuration for strict academic BibTeX
academic_parser = BibTexParser(
    ignore_nonstandard_types=True,
    homogenize_fields=False,
    interpolate_strings=True,
    common_strings=True
)

# Configuration for modern/extended BibTeX
modern_parser = BibTexParser(
    ignore_nonstandard_types=False,  # Allow @software, @online, etc.
    homogenize_fields=True,          # Normalize field names
    interpolate_strings=True,
    common_strings=True,
    add_missing_from_crossref=True   # Resolve crossref dependencies
)
```
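
When several dialects are used in one project, keeping their options as keyword-argument dicts avoids repeating the constructor calls. The `PRESETS` table and `parser_kwargs` helper are a hypothetical pattern, not part of bibtexparser; the block only builds the dicts, and the commented lines show how they would be passed to `BibTexParser`.

```python
# Hypothetical preset table: each dialect's constructor options in one place.
PRESETS = {
    "academic": {
        "ignore_nonstandard_types": True,
        "homogenize_fields": False,
        "interpolate_strings": True,
        "common_strings": True,
    },
    "modern": {
        "ignore_nonstandard_types": False,
        "homogenize_fields": True,
        "interpolate_strings": True,
        "common_strings": True,
        "add_missing_from_crossref": True,
    },
}


def parser_kwargs(dialect, **overrides):
    """Return a fresh kwargs dict for a dialect, with per-call overrides applied."""
    kwargs = dict(PRESETS[dialect])  # copy so the preset itself is never mutated
    kwargs.update(overrides)
    return kwargs


print(parser_kwargs("academic", add_missing_from_crossref=True))

# With the library installed:
# from bibtexparser.bparser import BibTexParser
# parser = BibTexParser(**parser_kwargs("modern"))
```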