# Citation Processing

This module extracts and parses citation syntax from markdown content. It handles citation blocks and inline references, and provides data structures for representing citations.

## Capabilities

### Citation Class

Represents a single citation in raw markdown format without any formatting applied.

```python { .api }
@dataclass
class Citation:
    """Represents a citation in raw markdown without formatting"""

    key: str  # The citation key (without @ symbol)
    prefix: str = ""  # Text before the citation key
    suffix: str = ""  # Text after the citation key

    def __str__(self) -> str:
        """
        String representation of the citation.

        Returns:
            str: Formatted citation string with prefix, @key, and suffix
        """

    @classmethod
    def from_markdown(cls, markdown: str) -> list["Citation"]:
        """
        Extracts citations from a markdown string.

        Args:
            markdown (str): Markdown text containing citations

        Returns:
            list[Citation]: List of parsed Citation objects

        Note:
            Filters out email addresses to avoid false matches
        """
```
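
To make the prefix/key/suffix split concrete, here is a minimal sketch of how a citation string might be decomposed. The pattern `SKETCH_CITATION` and the class `SketchCitation` are hypothetical stand-ins written for this illustration; the module's actual `CITATION_REGEX` and `Citation.from_markdown` may behave differently, particularly on edge cases.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for illustration only; not the module's CITATION_REGEX.
# prefix: text before "@", key: word characters, ":" or "-", suffix: the rest.
SKETCH_CITATION = re.compile(
    r"^(?P<prefix>[^@]*?)\s*@(?P<key>[\w:-]+)\s*,?\s*(?P<suffix>.*)$"
)


@dataclass
class SketchCitation:
    """Illustrative stand-in for Citation (assumed fields: key, prefix, suffix)."""

    key: str
    prefix: str = ""
    suffix: str = ""

    @classmethod
    def from_markdown(cls, markdown: str) -> list["SketchCitation"]:
        """Split on ';' and parse each piece into prefix, key, and suffix."""
        found = []
        for piece in markdown.split(";"):
            match = SKETCH_CITATION.match(piece.strip())
            if match:  # pieces without an @key are ignored
                found.append(
                    cls(
                        key=match["key"],
                        prefix=match["prefix"].strip(),
                        suffix=match["suffix"].strip(),
                    )
                )
        return found


parsed = SketchCitation.from_markdown("see @author2021, pp. 25-30; @other2019")
for citation in parsed:
    print(citation)
```

Named groups keep the three parts addressable by name, which mirrors the three fields of the dataclass.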

### CitationBlock Class

Represents a block of citations enclosed in square brackets, which may contain multiple citations separated by semicolons.

```python { .api }
@dataclass
class CitationBlock:
    """Represents a block of citations in square brackets"""

    citations: list[Citation]  # List of citations in this block
    raw: str = ""  # Raw markdown text of the block

    def __str__(self) -> str:
        """
        String representation of the citation block.

        Returns:
            str: Formatted citation block with square brackets
        """

    @classmethod
    def from_markdown(cls, markdown: str) -> list["CitationBlock"]:
        """
        Extracts citation blocks from a markdown string.

        Process:
            1. Find all square bracket blocks
            2. For each block, try to extract citations
            3. If successful, create CitationBlock object
            4. Skip blocks that don't contain valid citations

        Args:
            markdown (str): Markdown text containing citation blocks

        Returns:
            list[CitationBlock]: List of parsed CitationBlock objects
        """
```
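
The four-step process in the docstring can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's implementation: `SKETCH_BLOCK`, `SKETCH_KEY`, `SketchBlock`, and `extract_blocks` are hypothetical names, and the sketch keeps only citation keys rather than full `Citation` objects.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration; the module's own regexes may differ.
SKETCH_BLOCK = re.compile(r"\[([^\[\]]*)\]")  # text between one pair of brackets
SKETCH_KEY = re.compile(r"@([\w:-]+)")  # a citation key after "@"


@dataclass
class SketchBlock:
    """Illustrative stand-in for CitationBlock, holding keys instead of Citations."""

    keys: list[str]
    raw: str = ""


def extract_blocks(markdown: str) -> list[SketchBlock]:
    blocks = []
    # Step 1: find all square bracket blocks.
    for match in SKETCH_BLOCK.finditer(markdown):
        # Step 2: try to extract a citation key from each ';'-separated piece.
        pieces = match.group(1).split(";")
        keys = [m.group(1) for piece in pieces if (m := SKETCH_KEY.search(piece))]
        # Step 4: skip blocks without any valid citation (e.g. link text).
        if not keys:
            continue
        # Step 3: otherwise record the block together with its raw text.
        blocks.append(SketchBlock(keys=keys, raw=match.group(0)))
    return blocks


text = "See [@smith2020; @jones2019, pp. 100-120] and [a link](https://example.com)."
blocks = extract_blocks(text)
print(blocks)
```

Keeping the raw text alongside the parsed keys lets a later stage replace the exact original span in the markdown.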

### InlineReference Class

Represents an inline citation reference that appears directly in text without square brackets.

```python { .api }
@dataclass
class InlineReference:
    """Represents an inline citation reference"""

    key: str  # The citation key (without @ symbol)

    def __str__(self) -> str:
        """
        String representation of the inline reference.

        Returns:
            str: Formatted as @key
        """

    def __hash__(self) -> int:
        """
        Hash implementation for use in sets.

        Returns:
            int: Hash based on citation key
        """

    @classmethod
    def from_markdown(cls, markdown: str) -> list["InlineReference"]:
        """
        Finds inline references in the markdown text.

        Note:
            Only use this after processing all regular citations to avoid conflicts

        Args:
            markdown (str): Markdown text containing inline references

        Returns:
            list[InlineReference]: List of parsed InlineReference objects
        """
```
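
The note about ordering and the key-based hash can be illustrated with a small sketch. It assumes that "processing regular citations first" can be approximated by removing bracketed blocks before searching for bare `@key` references; the real module may resolve the conflict differently. `SKETCH_BLOCK`, `SKETCH_INLINE`, `SketchInlineReference`, and `find_inline` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration; the module's own regexes may differ.
SKETCH_BLOCK = re.compile(r"\[[^\[\]]*\]")
# "@key" that is not preceded by a word character or "." (so not part of an email)
SKETCH_INLINE = re.compile(r"(?<![\w.])@([\w:-]+)")


@dataclass
class SketchInlineReference:
    """Illustrative stand-in for InlineReference."""

    key: str

    def __str__(self) -> str:
        return f"@{self.key}"

    def __hash__(self) -> int:
        # Hash on the key so that equal references collapse in a set.
        return hash(self.key)


def find_inline(markdown: str) -> list[SketchInlineReference]:
    # Drop bracketed blocks first so their keys are not reported a second time.
    without_blocks = SKETCH_BLOCK.sub("", markdown)
    return [SketchInlineReference(k) for k in SKETCH_INLINE.findall(without_blocks)]


text = "As @smith2020 notes [@jones2019], and again @smith2020."
refs = find_inline(text)
unique = set(refs)
print(len(refs), len(unique))
```

Because the dataclass defines `__hash__` explicitly, instances stay hashable even though `@dataclass` generates `__eq__`, so repeated references deduplicate when collected into a set.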

### Regular Expression Patterns

Pre-compiled regular expression patterns used for citation parsing.

```python { .api }
CITATION_REGEX: re.Pattern[str]
"""Pattern for matching individual citations with optional prefix/suffix"""

CITATION_BLOCK_REGEX: re.Pattern[str]
"""Pattern for matching citation blocks in square brackets"""

EMAIL_REGEX: re.Pattern[str]
"""Pattern for matching email addresses to avoid false citation matches"""

INLINE_REFERENCE_REGEX: re.Pattern[str]
"""Pattern for matching inline references outside of citation blocks"""
```
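
The following sketch shows why a dedicated email pattern is useful: a naive `@key` pattern also matches the domain part of an email address. Both `NAIVE_KEY` and `SKETCH_EMAIL` are hypothetical patterns written for this illustration and are not the definitions used by the module.

```python
import re

# Hypothetical patterns for illustration; not the module's actual definitions.
NAIVE_KEY = re.compile(r"@([\w-]+)")
SKETCH_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

text = "Contact jane@example.org about @smith2020."

# Without filtering, the email's domain is mistaken for a citation key.
naive = NAIVE_KEY.findall(text)

# Removing email addresses first leaves only the genuine citation key.
filtered = NAIVE_KEY.findall(SKETCH_EMAIL.sub("", text))

print(naive)
print(filtered)
```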

## Usage Examples

### Parsing Citation Blocks

```python
from mkdocs_bibtex.citation import CitationBlock

# Citation block with multiple citations
markdown = "This references [@smith2020; @jones2019, pp. 100-120]."

citation_blocks = CitationBlock.from_markdown(markdown)
for block in citation_blocks:
    print(f"Block: {block}")
    for citation in block.citations:
        print(f" Key: {citation.key}")
        print(f" Prefix: '{citation.prefix}'")
        print(f" Suffix: '{citation.suffix}'")

# Output:
# Block: [@smith2020; @jones2019, pp. 100-120]
#  Key: smith2020
#  Prefix: ''
#  Suffix: ''
#  Key: jones2019
#  Prefix: ''
#  Suffix: 'pp. 100-120'
```

### Parsing Individual Citations

```python
from mkdocs_bibtex.citation import Citation

# Citation with prefix and suffix
citation_text = "see @author2021, pp. 25-30"
citations = Citation.from_markdown(citation_text)

for citation in citations:
    print(f"Key: {citation.key}")
    print(f"Prefix: '{citation.prefix}'")
    print(f"Suffix: '{citation.suffix}'")

# Output:
# Key: author2021
# Prefix: 'see'
# Suffix: 'pp. 25-30'
```

### Parsing Inline References

```python
from mkdocs_bibtex.citation import InlineReference

# Text with inline citations
markdown = "According to @smith2020, the results show @jones2019 was correct."

inline_refs = InlineReference.from_markdown(markdown)
for ref in inline_refs:
    print(f"Inline reference: {ref}")

# Output:
# Inline reference: @smith2020
# Inline reference: @jones2019
```

### Complete Processing Pipeline

```python
from mkdocs_bibtex.citation import CitationBlock, InlineReference

# Raw string so that "\b" in "\bibliography" is not read as a backspace escape
markdown_content = r'''
# My Document

This cites [@primary2020; see @secondary2019, pp. 100].

The method from @author2021 shows interesting results.

\bibliography
'''

# Step 1: Process citation blocks first
citation_blocks = CitationBlock.from_markdown(markdown_content)
print(f"Found {len(citation_blocks)} citation blocks")

# Step 2: Process inline references (after blocks to avoid conflicts)
inline_refs = InlineReference.from_markdown(markdown_content)
print(f"Found {len(inline_refs)} inline references")

# Step 3: Extract all unique keys
all_keys = set()
for block in citation_blocks:
    for citation in block.citations:
        all_keys.add(citation.key)
for ref in inline_refs:
    all_keys.add(ref.key)

print(f"Total unique citations: {all_keys}")
# Expected keys (set order may vary): {'primary2020', 'secondary2019', 'author2021'}
```
242
243
## Citation Syntax Patterns
244
245
### Citation Block Syntax
246
247
Citation blocks are enclosed in square brackets and can contain multiple citations:
248
249
```markdown
250
[@single_citation]
251
[@first; @second]
252
[@author2020, pp. 100-120]
253
[see @author2020, pp. 100; @other2019]
254
```

### Inline Citation Syntax

Inline citations appear directly in text:

```markdown
According to @author2020, the method works.
The @author2020 approach is effective.
Results from @study2019 confirm this.
```

### Complex Citation Examples

```markdown
[See @primary2020, pp. 25-30; cf. @secondary2019; @tertiary2018, ch. 5]
[@author2020, Figure 3; @coauthor2020, Table 2]
[e.g., @example2019; @another2020; but see @contrary2018]
```

## Error Handling

The citation parser handles the following edge cases:

- **Invalid Citation Syntax**: Malformed citations are skipped, and the skip is logged at debug level
- **Email Address Filtering**: Email addresses that would otherwise match citation patterns are filtered out automatically
- **Empty Citations**: Empty or whitespace-only citation keys are handled gracefully
- **Special Characters**: Citation keys containing hyphens, underscores, and numbers are handled correctly
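
As a rough sketch of the skip-and-log behavior listed above, the following helper ignores pieces that contain no usable key and records each skip at debug level. The logger name, the pattern `SKETCH_KEY`, and the function `parse_keys` are hypothetical; the module's actual log messages and matching rules are not shown here.

```python
import logging
import re

logger = logging.getLogger("citation_sketch")  # hypothetical logger name

# Hypothetical key pattern: letters, digits, underscores, and hyphens.
SKETCH_KEY = re.compile(r"@([\w-]+)")


def parse_keys(block_body: str) -> list[str]:
    """Return keys from a ';'-separated block body, skipping malformed pieces."""
    keys = []
    for piece in block_body.split(";"):
        match = SKETCH_KEY.search(piece)
        if match is None:
            # Malformed or empty piece: log at debug level and move on.
            logger.debug("Skipping piece without a citation key: %r", piece)
            continue
        keys.append(match.group(1))
    return keys


keys = parse_keys("@smith-2020; ; @ ; see @jones_2019, ch. 5")
print(keys)
```

Logging at debug level keeps normal builds quiet while still making skipped input discoverable when troubleshooting.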

## Performance Considerations

- **Regex Compilation**: All patterns are pre-compiled as module constants
- **Single Pass Processing**: Citation blocks are processed in a single pass through the markdown
- **Lazy Processing**: Inline references are only processed when explicitly requested
- **Memory Efficiency**: Uses dataclasses for minimal memory overhead