# Citation Processing

This module extracts and parses citation syntax from markdown content. It handles citation blocks and inline references, and provides data structures for representing citations.

## Capabilities

### Citation Class

Represents a single citation in raw markdown format without any formatting applied.

```python { .api }
@dataclass
class Citation:
    """Represents a citation in raw markdown without formatting"""

    key: str  # The citation key (without @ symbol)
    prefix: str = ""  # Text before the citation key
    suffix: str = ""  # Text after the citation key

    def __str__(self) -> str:
        """
        String representation of the citation.

        Returns:
            str: Formatted citation string with prefix, @key, and suffix
        """

    @classmethod
    def from_markdown(cls, markdown: str) -> list["Citation"]:
        """
        Extracts citations from a markdown string.

        Args:
            markdown (str): Markdown text containing citations

        Returns:
            list[Citation]: List of parsed Citation objects

        Note:
            Filters out email addresses to avoid false matches
        """
```
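
To make the prefix/key/suffix split concrete, here is a minimal standalone sketch. It does not use the library: `SketchCitation`, `SKETCH_CITATION`, and `parse_one` are hypothetical names, and the pattern is an assumption about what a citation such as `see @author2021, pp. 25-30` looks like, not the library's `CITATION_REGEX`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern (an assumption, not the library's CITATION_REGEX):
# optional prefix, then "@key", then an optional ", suffix".
SKETCH_CITATION = re.compile(
    r"^(?P<prefix>[^@]*?)\s*@(?P<key>[\w:-]+)(?:,\s*(?P<suffix>.*))?$"
)


@dataclass
class SketchCitation:
    """Stand-in for a citation with the same three fields as above."""

    key: str
    prefix: str = ""
    suffix: str = ""

    def __str__(self) -> str:
        # Join the non-empty parts as "prefix @key, suffix".
        head = f"{self.prefix} @{self.key}".strip()
        return f"{head}, {self.suffix}" if self.suffix else head


def parse_one(text: str) -> Optional[SketchCitation]:
    """Parse a single citation; return None when no @key is found."""
    match = SKETCH_CITATION.match(text.strip())
    if match is None:
        return None
    return SketchCitation(
        key=match["key"],
        prefix=match["prefix"].strip(),
        suffix=(match["suffix"] or "").strip(),
    )


print(parse_one("see @author2021, pp. 25-30"))  # see @author2021, pp. 25-30
```

The real class may normalize whitespace or punctuation differently; the sketch only shows how the three fields relate to the raw text.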

### CitationBlock Class

Represents a block of citations enclosed in square brackets, which may contain multiple citations separated by semicolons.

```python { .api }
@dataclass
class CitationBlock:
    """Represents a block of citations in square brackets"""

    citations: list[Citation]  # List of citations in this block
    raw: str = ""  # Raw markdown text of the block

    def __str__(self) -> str:
        """
        String representation of the citation block.

        Returns:
            str: Formatted citation block with square brackets
        """

    @classmethod
    def from_markdown(cls, markdown: str) -> list["CitationBlock"]:
        """
        Extracts citation blocks from a markdown string.

        Process:
            1. Find all square bracket blocks
            2. For each block, try to extract citations
            3. If successful, create CitationBlock object
            4. Skip blocks that don't contain valid citations

        Args:
            markdown (str): Markdown text containing citation blocks

        Returns:
            list[CitationBlock]: List of parsed CitationBlock objects
        """
```
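
The four-step process in the docstring can be sketched without the library as follows. `BRACKET_BLOCK`, `CITE_KEY`, and `find_citation_blocks` are hypothetical names, and the rule that a block is skipped when any semicolon-separated part lacks a key is an assumption made for this illustration.

```python
import re

# Hypothetical patterns for illustration; the library's own regexes may differ.
BRACKET_BLOCK = re.compile(r"\[([^\[\]]*)\]")
CITE_KEY = re.compile(r"@([\w:-]+)")


def find_citation_blocks(markdown: str) -> list[tuple[str, list[str]]]:
    """Return (raw_block, keys) pairs for bracket blocks that contain citations."""
    blocks = []
    # Step 1: find all square bracket blocks.
    for match in BRACKET_BLOCK.finditer(markdown):
        # Step 2: look for a citation key in each semicolon-separated part.
        parts = [part.strip() for part in match.group(1).split(";")]
        found = [CITE_KEY.search(part) for part in parts]
        # Step 4: skip blocks where a part has no key, e.g. "[plain link]".
        if not all(found):
            continue
        # Step 3: record the raw block together with its keys.
        blocks.append((match.group(0), [m.group(1) for m in found]))
    return blocks


text = "This references [@smith2020; @jones2019, pp. 100-120] and a [plain link](url)."
print(find_citation_blocks(text))
# [('[@smith2020; @jones2019, pp. 100-120]', ['smith2020', 'jones2019'])]
```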

### InlineReference Class

Represents an inline citation reference that appears directly in text without square brackets.

```python { .api }
@dataclass
class InlineReference:
    """Represents an inline citation reference"""

    key: str  # The citation key (without @ symbol)

    def __str__(self) -> str:
        """
        String representation of the inline reference.

        Returns:
            str: Formatted as @key
        """

    def __hash__(self) -> int:
        """
        Hash implementation for use in sets.

        Returns:
            int: Hash based on citation key
        """

    @classmethod
    def from_markdown(cls, markdown: str) -> list["InlineReference"]:
        """
        Finds inline references in the markdown text.

        Note:
            Only use this after processing all regular citations to avoid conflicts

        Args:
            markdown (str): Markdown text containing inline references

        Returns:
            list[InlineReference]: List of parsed InlineReference objects
        """
```
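
The note about ordering and the `__hash__` method can be illustrated with a standalone sketch. It assumes that the conflict arises because an inline pattern would also match keys inside bracketed blocks, so the sketch strips the blocks first. `SketchInlineRef`, `BRACKET_BLOCK`, `INLINE_KEY`, and `inline_refs_outside_blocks` are hypothetical names, not library API.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; the library's INLINE_REFERENCE_REGEX may differ.
BRACKET_BLOCK = re.compile(r"\[[^\[\]]*\]")
# The lookbehind rejects "@" preceded by a word character, "." or "@",
# which keeps the domain part of an email address from matching.
INLINE_KEY = re.compile(r"(?<![\w.@])@([\w:-]+)")


@dataclass(frozen=True)  # frozen dataclasses are hashable, so they work in sets
class SketchInlineRef:
    key: str

    def __str__(self) -> str:
        return f"@{self.key}"


def inline_refs_outside_blocks(markdown: str) -> set[SketchInlineRef]:
    # Remove bracketed blocks first so their keys are not counted as inline.
    remaining = BRACKET_BLOCK.sub("", markdown)
    return {SketchInlineRef(key) for key in INLINE_KEY.findall(remaining)}


text = "See [@smith2020]. According to @jones2019, and again @jones2019."
print(sorted(str(ref) for ref in inline_refs_outside_blocks(text)))  # ['@jones2019']
```

Because the stand-in class is hashable, repeated mentions of the same key collapse to one entry in the set.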

### Regular Expression Patterns

Pre-compiled regular expression patterns used for citation parsing.

```python { .api }
CITATION_REGEX: re.Pattern[str]
"""Pattern for matching individual citations with optional prefix/suffix"""

CITATION_BLOCK_REGEX: re.Pattern[str]
"""Pattern for matching citation blocks in square brackets"""

EMAIL_REGEX: re.Pattern[str]
"""Pattern for matching email addresses to avoid false citation matches"""

INLINE_REFERENCE_REGEX: re.Pattern[str]
"""Pattern for matching inline references outside of citation blocks"""
```

## Usage Examples

### Parsing Citation Blocks

```python
from mkdocs_bibtex.citation import CitationBlock

# Citation block with multiple citations
markdown = "This references [@smith2020; @jones2019, pp. 100-120]."

citation_blocks = CitationBlock.from_markdown(markdown)
for block in citation_blocks:
    print(f"Block: {block}")
    for citation in block.citations:
        print(f" Key: {citation.key}")
        print(f" Prefix: '{citation.prefix}'")
        print(f" Suffix: '{citation.suffix}'")

# Output:
# Block: [@smith2020; @jones2019, pp. 100-120]
#  Key: smith2020
#  Prefix: ''
#  Suffix: ''
#  Key: jones2019
#  Prefix: ''
#  Suffix: 'pp. 100-120'
```

### Parsing Individual Citations

```python
from mkdocs_bibtex.citation import Citation

# Citation with prefix and suffix
citation_text = "see @author2021, pp. 25-30"
citations = Citation.from_markdown(citation_text)

for citation in citations:
    print(f"Key: {citation.key}")
    print(f"Prefix: '{citation.prefix}'")
    print(f"Suffix: '{citation.suffix}'")

# Output:
# Key: author2021
# Prefix: 'see'
# Suffix: 'pp. 25-30'
```

### Parsing Inline References

```python
from mkdocs_bibtex.citation import InlineReference

# Text with inline citations
markdown = "According to @smith2020, the results show @jones2019 was correct."

inline_refs = InlineReference.from_markdown(markdown)
for ref in inline_refs:
    print(f"Inline reference: {ref}")

# Output:
# Inline reference: @smith2020
# Inline reference: @jones2019
```

### Complete Processing Pipeline

```python
from mkdocs_bibtex.citation import CitationBlock, InlineReference

# Raw string so that "\b" in "\bibliography" is not read as a backspace escape
markdown_content = r'''
# My Document

This cites [@primary2020; see @secondary2019, pp. 100].

The method from @author2021 shows interesting results.

\bibliography
'''

# Step 1: Process citation blocks first
citation_blocks = CitationBlock.from_markdown(markdown_content)
print(f"Found {len(citation_blocks)} citation blocks")

# Step 2: Process inline references (after blocks to avoid conflicts)
inline_refs = InlineReference.from_markdown(markdown_content)
print(f"Found {len(inline_refs)} inline references")

# Step 3: Extract all unique keys
all_keys = set()
for block in citation_blocks:
    for citation in block.citations:
        all_keys.add(citation.key)
for ref in inline_refs:
    all_keys.add(ref.key)

print(f"Total unique citations: {all_keys}")
# Expected keys (set order may vary): {'primary2020', 'secondary2019', 'author2021'}
```

## Citation Syntax Patterns

### Citation Block Syntax

Citation blocks are enclosed in square brackets and can contain multiple citations:

```markdown
[@single_citation]
[@first; @second]
[@author2020, pp. 100-120]
[see @author2020, pp. 100; @other2019]
```

### Inline Citation Syntax

Inline citations appear directly in text:

```markdown
According to @author2020, the method works.
The @author2020 approach is effective.
Results from @study2019 confirm this.
```

### Complex Citation Examples

```markdown
[See @primary2020, pp. 25-30; cf. @secondary2019; @tertiary2018, ch. 5]
[@author2020, Figure 3; @coauthor2020, Table 2]
[e.g., @example2019; @another2020; but see @contrary2018]
```

## Error Handling

The citation parser handles the following cases:

- **Invalid Citation Syntax**: Malformed citations are skipped with debug logging
- **Email Address Filtering**: Automatically filters out email addresses that match citation patterns
- **Empty Citations**: Handles empty or whitespace-only citation keys gracefully
- **Special Characters**: Properly handles citation keys with hyphens, underscores, and numbers
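
As a rough illustration of the email filtering and special-character points, the sketch below removes email addresses before looking for keys. `EMAIL`, `CITE_KEY`, and `citation_keys_ignoring_emails` are hypothetical names, and the patterns are assumptions rather than the library's `EMAIL_REGEX` or `CITATION_REGEX`.

```python
import re

# Hypothetical patterns; the library's EMAIL_REGEX and CITATION_REGEX may differ.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CITE_KEY = re.compile(r"@([\w:-]+)")  # allows hyphens, underscores, and digits


def citation_keys_ignoring_emails(text: str) -> list[str]:
    # Remove email addresses first so "user@example.com" is not read as "@example".
    without_emails = EMAIL.sub("", text)
    return CITE_KEY.findall(without_emails)


text = "Contact jane.doe@example.com about @smith-2020 and @jones_2019."
print(CITE_KEY.findall(text))               # ['example', 'smith-2020', 'jones_2019']
print(citation_keys_ignoring_emails(text))  # ['smith-2020', 'jones_2019']
```

The first call shows the false match that filtering is meant to prevent; the second shows the result once addresses are removed.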

## Performance Considerations

- **Regex Compilation**: All patterns are pre-compiled as module constants
- **Single Pass Processing**: Citation blocks are processed in a single pass through the markdown
- **Lazy Processing**: Inline references are only processed when explicitly requested
- **Memory Efficiency**: Uses dataclasses for minimal memory overhead