# Bibliography Processing

BibTeX file parsing, caching, and processing functionality built on pybtex. It reads, parses, and manages bibliography data. Parsed results are cached and file dependencies are tracked, so bib files are reparsed only when the files themselves, the file list, or the encoding change.

## Capabilities

### Bibliography Data Management

Core data structures for managing bibliography file information and parsed data.

```python { .api }
class BibFile(NamedTuple):
    """
    Information about a parsed bib file.

    Attributes:
        mtime: File modification time when last parsed
        keys: Set of citation keys in this file (stored as an ordered dict)
    """
    mtime: float
    keys: Dict[str, None]


class BibData(NamedTuple):
    """
    Information about a collection of bib files.

    Attributes:
        encoding: Encoding used for all bib files
        bibfiles: Map from file paths to file information
        data: Parsed bibliography data from pybtex
    """
    encoding: str
    bibfiles: Dict[Path, BibFile]
    data: BibliographyData  # From pybtex
```
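
`BibFile.keys` is typed `Dict[str, None]` because a dict whose values are all `None` works as an insertion-ordered set. The sketch below shows that idiom using only the standard library. `BibFile` here is a local stand-in for the structure above (so the snippet runs without Sphinx or pybtex), and `make_bibfile`, the mtimes, and the keys are hypothetical illustration values.

```python
from pathlib import Path
from typing import Dict, List, NamedTuple


class BibFile(NamedTuple):
    """Local stand-in for the BibFile structure described above."""
    mtime: float
    keys: Dict[str, None]


def make_bibfile(mtime: float, keys: List[str]) -> BibFile:
    """Hypothetical helper: build a BibFile whose keys form an ordered set."""
    # dict.fromkeys drops duplicates but keeps first-seen order (Python 3.7+)
    return BibFile(mtime=mtime, keys=dict.fromkeys(keys))


bibfiles: Dict[Path, BibFile] = {
    Path("refs.bib"): make_bibfile(1700000000.0, ["smith2020", "jones2019", "smith2020"]),
    Path("additional.bib"): make_bibfile(1700000100.0, ["lee2021"]),
}

for path, info in bibfiles.items():
    # Iterating the dict yields keys in insertion order, like an ordered set
    print(path, list(info.keys))

# Membership tests are average O(1), just as with a set
print("jones2019" in bibfiles[Path("refs.bib")].keys)
```

Compared with a plain `set`, the dict keeps the order in which keys appeared in the file, which keeps any output derived from it deterministic.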

### File Processing Functions

Functions for parsing, caching, and managing bibliography files with dependency tracking.

```python { .api }
def parse_bibdata(bibfilenames: List[Path], encoding: str) -> BibData:
    """
    Parse bibliography files with the given encoding.

    Parameters:
        bibfilenames: List of bibliography file paths to parse
        encoding: Text encoding to use when reading files

    Returns:
        BibData holding the encoding, per-file information, and the
        parsed pybtex data for all entries

    Notes:
        Missing files and BibTeX syntax errors (pybtex's
        BibliographyDataError) are logged as warnings rather than
        raised; parsing continues with the remaining files.
    """


def is_bibdata_outdated(
    bibdata: BibData,
    bibfilenames: List[Path],
    encoding: str,
) -> bool:
    """
    Check whether bibliography data needs to be reparsed.

    Parameters:
        bibdata: Current bibliography data
        bibfilenames: List of current bibliography files
        encoding: Current encoding setting

    Returns:
        True if the encoding, the file list, or any file's modification
        time differs from what bibdata records
    """


def process_bibdata(
    bibdata: BibData,
    bibfilenames: List[Path],
    encoding: str,
) -> BibData:
    """
    Parse bibliography files and return the parsed data, with caching.

    Only reparses if files have changed, the encoding has changed, or
    the file list has changed.

    Parameters:
        bibdata: Previous bibliography data (used as the cache)
        bibfilenames: List of bibliography file paths
        encoding: Text encoding for files

    Returns:
        Freshly parsed data, or bibdata itself if it is still up to date
    """


def get_mtime(bibfilename: Path) -> float:
    """
    Get the file modification time, returning -inf if the file doesn't exist.

    Parameters:
        bibfilename: Path to bibliography file

    Returns:
        Modification time as float, or -math.inf if the file is missing
    """
```
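
The three reparse conditions (changed files, changed encoding, changed file list) can be modelled with the standard library alone. This is a simplified sketch, not the library's code: `CacheInfo`, `snapshot`, and `is_outdated` are hypothetical stand-ins that keep only per-file mtimes instead of a full `BibData`. The local `get_mtime` mirrors the documented behaviour of returning `-math.inf` for a missing file. As an assumption, the sketch also treats a reordered file list as a change.

```python
import math
import os
import tempfile
from pathlib import Path
from typing import Dict, List, NamedTuple


def get_mtime(bibfilename: Path) -> float:
    """Modification time of the file, or -inf if it cannot be stat'ed."""
    try:
        return os.path.getmtime(bibfilename)
    except OSError:
        return -math.inf


class CacheInfo(NamedTuple):
    """Hypothetical, simplified BibData: only the encoding and per-file mtimes."""
    encoding: str
    mtimes: Dict[Path, float]


def snapshot(bibfilenames: List[Path], encoding: str) -> CacheInfo:
    """Record the current state (a real implementation would also parse here)."""
    return CacheInfo(encoding, {path: get_mtime(path) for path in bibfilenames})


def is_outdated(cache: CacheInfo, bibfilenames: List[Path], encoding: str) -> bool:
    """True if the encoding, the file list (order included), or any mtime changed."""
    return (
        cache.encoding != encoding
        or list(cache.mtimes) != bibfilenames
        or any(get_mtime(path) != mtime for path, mtime in cache.mtimes.items())
    )


with tempfile.TemporaryDirectory() as tmp:
    refs = Path(tmp) / "refs.bib"
    refs.write_text("@misc{smith2020, title={Example}}\n", encoding="utf-8")
    files = [refs]
    cache = snapshot(files, "utf-8")
    unchanged = is_outdated(cache, files, "utf-8")
    new_encoding = is_outdated(cache, files, "latin-1")
    new_file_list = is_outdated(cache, files + [Path(tmp) / "extra.bib"], "utf-8")
    os.utime(refs, (0, 0))  # set atime/mtime to the epoch to simulate an edit
    touched = is_outdated(cache, files, "utf-8")
    missing_mtime = get_mtime(Path(tmp) / "missing.bib")
```

Comparing mtimes for equality rather than "newer than" also catches files restored from a backup with an older timestamp.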

### Citation Target Processing

Functions for parsing and processing citation targets from role content.

```python { .api }
class CitationTarget(NamedTuple):
    """
    Citation key with optional pre-text and post-text.

    Attributes:
        key: The bibliography key to cite
        pre: Optional text to appear before the citation (empty string if absent)
        post: Optional text to appear after the citation (empty string if absent)
    """
    key: str
    pre: str
    post: str


def parse_citation_targets(targets: str, pos: int = 0) -> Iterable[CitationTarget]:
    """
    Parse a citation target string into citation keys with optional pre/post text.

    Supported formats:
        key             - Simple key citation
        {pre}key        - Key with pre-text
        key{post}       - Key with post-text
        {pre}key{post}  - Key with both pre- and post-text
        key1,key2,key3  - Multiple comma-separated keys

    Parameters:
        targets: String containing citation targets
        pos: Starting position for parsing (used internally for recursion)

    Returns:
        Iterable of CitationTarget objects

    Raises:
        ValueError: If the citation target format is malformed
    """
```
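
The supported formats can be illustrated with a small hand-written scanner. This is a hypothetical sketch under stated assumptions, not the library's `parse_citation_targets`. It is iterative rather than recursive, assumes braces are never nested, assumes keys contain no `{` or `,`, and strips surrounding whitespace from keys. `parse_targets_sketch` and `_read_braced` are illustrative names.

```python
from typing import Iterator, NamedTuple, Tuple


class CitationTarget(NamedTuple):
    key: str
    pre: str
    post: str


def _read_braced(text: str, pos: int) -> Tuple[str, int]:
    """Read '{...}' starting at text[pos] == '{'; return (content, index after '}')."""
    end = text.find("}", pos + 1)
    if end == -1:
        raise ValueError(f"unclosed '{{' at position {pos} in {text!r}")
    return text[pos + 1:end], end + 1


def parse_targets_sketch(targets: str) -> Iterator[CitationTarget]:
    """Hypothetical parser for comma-separated '{pre}key{post}' targets."""
    pos = 0
    while pos < len(targets):
        pre = post = ""
        if targets[pos] == "{":
            pre, pos = _read_braced(targets, pos)
        # The key runs until '{' (start of post-text), ',' (next target), or the end
        start = pos
        while pos < len(targets) and targets[pos] not in "{,":
            pos += 1
        key = targets[start:pos].strip()
        if not key:
            raise ValueError(f"missing key at position {start} in {targets!r}")
        if pos < len(targets) and targets[pos] == "{":
            post, pos = _read_braced(targets, pos)
        if pos < len(targets):
            if targets[pos] != ",":
                raise ValueError(f"expected ',' at position {pos} in {targets!r}")
            pos += 1  # skip the separator
        yield CitationTarget(key, pre, post)


for target in parse_targets_sketch("{see}smith2020{for details},jones2019"):
    print(target)
```

Because this is a generator, a malformed string raises `ValueError` only when iteration reaches the bad part. Wrapping the call in `list(...)` surfaces the error immediately.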

### ID Generation

Utility function for generating unique citation and bibliography IDs.

```python { .api }
def _make_ids(docname: str, lineno: int, ids: Set[str], raw_id: str) -> List[str]:
    """
    Generate unique IDs for citations and bibliographies.

    Parameters:
        docname: Document name, used as the warning location
        lineno: Line number, used as the warning location
        ids: Set of existing IDs to check for conflicts
        raw_id: Raw ID template string

    Returns:
        List containing the generated ID, or an empty list if a duplicate is detected

    Side Effects:
        - Adds the generated ID to ids if successful
        - Logs a warning if a duplicate ID is detected
    """
```
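
The documented contract (record the ID and return a one-element list, or warn and return an empty list on a duplicate) can be sketched with the standard `logging` module. `make_ids_sketch`, the logger name, and the example IDs are hypothetical. The sketch leaves out Sphinx's own logging adapter and any ID normalization the real function may perform; here `raw_id` is used verbatim.

```python
import logging
from typing import List, Set

logger = logging.getLogger("bibtex_sketch")


def make_ids_sketch(docname: str, lineno: int, ids: Set[str], raw_id: str) -> List[str]:
    """Hypothetical re-creation of the documented _make_ids contract.

    raw_id is used verbatim; no ID normalization or templating is applied.
    """
    if raw_id in ids:
        # Duplicate: warn with the source location and return no ID
        logger.warning("%s:%d: duplicate citation id %r", docname, lineno, raw_id)
        return []
    ids.add(raw_id)  # remember the ID so later duplicates are caught
    return [raw_id]


seen: Set[str] = set()
first = make_ids_sketch("index", 12, seen, "cite-smith2020")
second = make_ids_sketch("index", 30, seen, "cite-smith2020")  # logs a warning
```

Returning a list rather than a single string lets callers assign the result directly to a docutils node's `ids` attribute, which is itself a list.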

## Usage Examples

### Basic File Processing

```python
from pathlib import Path
from sphinxcontrib.bibtex.bibfile import parse_bibdata

# Parse bibliography files
bibfiles = [Path("refs.bib"), Path("additional.bib")]
bibdata = parse_bibdata(bibfiles, "utf-8")

# Access parsed data
for key, entry in bibdata.data.entries.items():
    print(f"Key: {key}, Title: {entry.fields.get('title', 'No title')}")
```

### Caching with Process Function

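```python
from pathlib import Path
from pybtex.database import BibliographyData
from sphinxcontrib.bibtex.bibfile import BibData, process_bibdata

bibfiles = [Path("refs.bib"), Path("additional.bib")]

# First call: encoding and file list differ from the empty cache, so files are parsed
bibdata = process_bibdata(BibData("", {}, BibliographyData()), bibfiles, "utf-8")
# Second call: encoding, file list, and mtimes are unchanged, so the cached data is returned
bibdata = process_bibdata(bibdata, bibfiles, "utf-8")
```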

### Citation Target Parsing

```python
from sphinxcontrib.bibtex.citation_target import parse_citation_targets

# Simple keys
targets = list(parse_citation_targets("smith2020,jones2019"))
# [CitationTarget(key='smith2020', pre='', post=''),
#  CitationTarget(key='jones2019', pre='', post='')]

# With pre/post text
targets = list(parse_citation_targets("{see}smith2020{for details}"))
# [CitationTarget(key='smith2020', pre='see', post='for details')]
```

### File Monitoring

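```python
from pathlib import Path
from sphinxcontrib.bibtex.bibfile import get_mtime, is_bibdata_outdated, parse_bibdata

bibfiles = [Path("refs.bib"), Path("additional.bib")]
current_bibdata = parse_bibdata(bibfiles, "utf-8")  # stands in for data cached earlier

# Report modification times (-inf for a missing file)
for path in bibfiles:
    print(f"{path}: mtime={get_mtime(path)}")

# Check if files need reparsing
if is_bibdata_outdated(current_bibdata, bibfiles, "utf-8"):
    print("Bibliography files have changed, reparsing...")
    current_bibdata = parse_bibdata(bibfiles, "utf-8")
else:
    print("Bibliography files unchanged, using cache")
```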

## Error Handling

### File Errors

- **Missing files**: Logs a warning and continues with an empty entry set for that file
- **Permission errors**: Treated as missing files, with a warning
- **Encoding errors**: Logs a warning about the encoding problem

### Parsing Errors

- **BibTeX syntax errors**: Logs a warning with the file name and error details, then continues processing
- **Malformed entries**: Individual entries are skipped with warnings
- **Duplicate keys**: Later entries override earlier ones, with optional warnings
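
The warn-and-continue behaviour described above amounts to a loop in which one bad file can never abort the whole run. Everything in this sketch is hypothetical. `BibParseError` stands in for pybtex's `BibliographyDataError`, and `toy_parse` is a deliberately naive parser, not pybtex's. On a parse error, the sketch discards the whole file's keys, which is a simplification.

```python
import logging
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

logger = logging.getLogger("bibtex_sketch")


class BibParseError(Exception):
    """Hypothetical stand-in for pybtex's BibliographyDataError."""


def toy_parse(path: Path) -> List[str]:
    """Toy parser: collects keys from '@type{key,' header lines only."""
    keys = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("@"):
            if "{" not in line or "," not in line:
                raise BibParseError(f"malformed entry header: {line!r}")
            keys.append(line.split("{", 1)[1].split(",", 1)[0].strip())
    return keys


def parse_all(paths: List[Path], parse_one: Callable[[Path], List[str]]) -> Dict[Path, List[str]]:
    """Parse every file; warn instead of raising so one bad file cannot stop the build."""
    keys_by_file: Dict[Path, List[str]] = {}
    for path in paths:
        if not os.path.isfile(path):
            logger.warning("bib file not found: %s", path)
            keys_by_file[path] = []  # missing file: continue with no entries
            continue
        try:
            keys_by_file[path] = parse_one(path)
        except BibParseError as exc:
            logger.warning("syntax error in %s: %s", path, exc)
            keys_by_file[path] = []  # simplification: discard the whole file
    return keys_by_file


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.bib"
    good.write_text("@book{knuth1984,\n  title={The TeXbook}\n}\n", encoding="utf-8")
    bad = Path(tmp) / "bad.bib"
    bad.write_text("@article broken\n", encoding="utf-8")
    result = parse_all([good, bad, Path(tmp) / "missing.bib"], toy_parse)
    summary = {path.name: keys for path, keys in result.items()}
```

Every requested file still gets an entry in the result, so later steps can iterate over the configured file list without special-casing failures.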

### Citation Target Errors

- **Malformed syntax**: `ValueError` raised with a descriptive message
- **Invalid characters**: Proper escaping required in target strings
- **Missing keys**: Handled at a higher level with appropriate warnings

## Integration Points

### Sphinx Integration

- Automatic file dependency tracking via `env.note_dependency()`
- Configuration through `bibtex_bibfiles` and `bibtex_encoding`
- Caching integrated with Sphinx's build environment
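
A minimal Sphinx `conf.py` sketch using the two configuration values named above. The file names are placeholders, and `"utf-8"` is an example value rather than a statement about the default.

```python
# conf.py (Sphinx configuration): minimal sketch
extensions = ["sphinxcontrib.bibtex"]

# Bib files to parse (placeholder names); changing this list, or
# editing any listed file, causes the cached data to be reparsed
bibtex_bibfiles = ["refs.bib", "additional.bib"]

# Encoding used to read every bib file listed above
bibtex_encoding = "utf-8"
```

A single encoding applies to all files, matching the single `encoding` field in `BibData`.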

### Pybtex Integration

- Uses the pybtex `Parser` class for BibTeX parsing
- Returns pybtex `BibliographyData` objects
- Compatible with pybtex entry and field structures
- Supports all BibTeX features that pybtex supports