
# Advanced Parsing Configuration

The `BibTexParser` class is a configurable parser for BibTeX variants, non-standard entry types, field normalization, string macros, cross-references, and per-entry customization hooks. It gives fine-grained control over the parsing process.

## Capabilities

### Parser Configuration

`BibTexParser` takes its configuration as constructor arguments, so each parser instance can be tailored to a particular BibTeX variant or set of requirements.

```python { .api }
class BibTexParser:
    """
    A parser for reading BibTeX bibliographic data files.

    Provides configuration options for customizing parsing behavior,
    including entry filtering, field processing, string handling, and
    cross-reference resolution.
    """

    def __init__(
        self,
        data=None,
        customization=None,
        ignore_nonstandard_types: bool = True,
        homogenize_fields: bool = False,
        interpolate_strings: bool = True,
        common_strings: bool = True,
        add_missing_from_crossref: bool = False
    ):
        """
        Create a configurable BibTeX parser.

        Parameters:
        - data (str, optional): Kept for backward compatibility; if given, the
          constructor parses it and returns a BibDatabase instead of a parser.
          Prefer passing options by keyword and calling parse().
        - customization (callable, optional): Function applied to each entry
          (a record dict) after parsing; it must return the record
        - ignore_nonstandard_types (bool): If True, skip entries whose type is
          not a standard BibTeX entry type
        - homogenize_fields (bool): If True, map field-name aliases to a
          canonical name (e.g., 'link' to 'url', 'keywords' to 'keyword')
        - interpolate_strings (bool): If True, replace @string references
          with their values
        - common_strings (bool): If True, predefine common strings such as
          the month abbreviations (jan, feb, ...)
        - add_missing_from_crossref (bool): If True, fill in fields missing
          from an entry using the entry named in its crossref field

        Returns:
        BibTexParser: Configured parser instance
        """
```

### String Parsing

Parse BibTeX data from a string using the parser's configuration. The `partial` flag controls whether parse errors are raised or only logged.

```python { .api }
def parse(self, bibtex_str: str, partial: bool = False) -> BibDatabase:
    """
    Parse a BibTeX string into a BibDatabase object.

    Parameters:
    - bibtex_str (str): BibTeX string to parse
    - partial (bool): If True, log a parsing failure instead of raising it
      and return whatever was parsed; if False, raise the exception

    Returns:
    BibDatabase: Parsed bibliographic database

    Raises:
    ParseException: (from pyparsing) If parsing fails and partial=False
    """
```
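
To make the `partial` flag concrete, here is a toy model, not the library's pyparsing grammar, that recognizes only `@type{key, field = {value}, ...}` entries without nested braces. The names `toy_parse` and `ToyParseError` are hypothetical. The sketch assumes that a strict parse raises on the first malformed entry, while a partial parse logs the failure and returns the entries collected before it.

```python
import logging
import re

logger = logging.getLogger("toy_bibtex")

# Hypothetical toy grammar: @type{key, name = {value}, ...} with no nested braces.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


class ToyParseError(Exception):
    """Stand-in for the exception a strict (non-partial) parse raises."""


def toy_parse(text: str, partial: bool = False) -> list:
    """Parse entries in order; on a malformed entry raise, or log and stop if partial."""
    entries = []
    for chunk in text.split("@")[1:]:
        match = ENTRY_RE.match("@" + chunk.strip())
        if match is None:
            if not partial:
                raise ToyParseError(f"cannot parse entry starting at {chunk[:20]!r}")
            logger.error("Could not parse entry starting at %r", chunk[:20])
            break  # keep the entries parsed before the failure
        entry_type, key, body = match.groups()
        entry = {"ENTRYTYPE": entry_type.lower(), "ID": key}
        for name, value in FIELD_RE.findall(body):
            entry[name.lower()] = value
        entries.append(entry)
    return entries


sample = """
@article{a1, title = {First}}
@book{b2 title = {Missing comma}}
@misc{c3, note = {Never reached}}
"""
print([e["ID"] for e in toy_parse(sample, partial=True)])  # ['a1']
```

The real parser also handles quoted values, nested braces, and `@string` macros, none of which this toy attempts.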

### File Parsing

Parse BibTeX data from file objects with the same configuration and error handling as string parsing.

```python { .api }
def parse_file(self, file, partial: bool = False) -> BibDatabase:
    """
    Parse a BibTeX file into a BibDatabase object.

    Parameters:
    - file (file): Open text file or file-like object; its full contents
      are read and passed to parse()
    - partial (bool): If True, log a parsing failure instead of raising it
      and return whatever was parsed; if False, raise the exception

    Returns:
    BibDatabase: Parsed bibliographic database

    Raises:
    ParseException: (from pyparsing) If parsing fails and partial=False
    """
```

### Convenience Function

A module-level convenience function builds a configured parser and parses a string in a single call.

```python { .api }
def parse(data: str, *args, **kwargs) -> BibDatabase:
    """
    Convenience function for parsing BibTeX data.

    Creates a BibTexParser with the provided arguments and parses the data.

    Parameters:
    - data (str): BibTeX string to parse
    - *args, **kwargs: Arguments passed to the BibTexParser constructor
      (pass options by keyword)

    Returns:
    BibDatabase: Parsed bibliographic database
    """
```

## Configuration Options

### Entry Type Handling

Control how the parser handles different BibTeX entry types:

```python
from bibtexparser.bparser import BibTexParser

# Allow non-standard entry types (like @software, @dataset)
parser = BibTexParser(ignore_nonstandard_types=False)

# Only accept standard BibTeX types (article, book, etc.)
parser = BibTexParser(ignore_nonstandard_types=True)  # Default
```
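
The filtering decision can be sketched in plain Python. This assumes the "standard" set is the 14 entry types defined by classic BibTeX and that type names are compared case-insensitively; `keep_entry` is a hypothetical helper, not part of bibtexparser.

```python
# The 14 entry types defined by classic BibTeX (assumed to be the "standard" set).
STANDARD_TYPES = frozenset({
    "article", "book", "booklet", "conference", "inbook", "incollection",
    "inproceedings", "manual", "mastersthesis", "misc", "phdthesis",
    "proceedings", "techreport", "unpublished",
})


def keep_entry(entry: dict, ignore_nonstandard_types: bool = True) -> bool:
    """Return True if the entry should be kept under the given setting."""
    if not ignore_nonstandard_types:
        return True  # every entry type is accepted
    return entry["ENTRYTYPE"].lower() in STANDARD_TYPES


entries = [
    {"ENTRYTYPE": "article", "ID": "smith2020"},
    {"ENTRYTYPE": "software", "ID": "tool2021"},
]
print([e["ID"] for e in entries if keep_entry(e)])         # ['smith2020']
print([e["ID"] for e in entries if keep_entry(e, False)])  # ['smith2020', 'tool2021']
```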

### Field Processing

Configure whether field names are normalized:

```python
# Homogenize field names (e.g., 'link' -> 'url', 'keywords' -> 'keyword')
parser = BibTexParser(homogenize_fields=True)

# Keep original field names (field names are still lowercased)
parser = BibTexParser(homogenize_fields=False)  # Default
```
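
A minimal sketch of field-name homogenization: lowercase each field name, then map known aliases to a canonical name. The alias table below is an illustrative assumption, not a guaranteed copy of the library's table, and `homogenize` is a hypothetical helper.

```python
# Illustrative alias table: non-canonical field name -> canonical field name.
FIELD_ALIASES = {
    "keyw": "keyword",
    "keywords": "keyword",
    "authors": "author",
    "editors": "editor",
    "urls": "url",
    "link": "url",
    "links": "url",
    "subjects": "subject",
    "xref": "crossref",
}


def homogenize(fields: dict) -> dict:
    """Lowercase field names and replace known aliases with canonical names."""
    result = {}
    for name, value in fields.items():
        key = name.lower()
        # If two fields collapse to the same name, the later one wins here.
        result[FIELD_ALIASES.get(key, key)] = value
    return result


print(homogenize({"Keywords": "parsing, bibtex", "LINK": "https://example.org"}))
# {'keyword': 'parsing, bibtex', 'url': 'https://example.org'}
```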

### String Handling

Control how BibTeX string definitions are processed:

```python
# Replace @string references with their values
parser = BibTexParser(interpolate_strings=True)  # Default

# Keep @string references unexpanded for later processing
parser = BibTexParser(interpolate_strings=False)

# Predefine common month abbreviations (jan, feb, etc.)
parser = BibTexParser(common_strings=True)  # Default

# Don't predefine common strings
parser = BibTexParser(common_strings=False)
```
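
The effect of `interpolate_strings` and `common_strings` can be modeled on a field value represented as a sequence of parts joined by BibTeX's `#` operator, where each part is either a literal or a macro name. The part representation and the `field_value` helper are hypothetical; bibtexparser represents unexpanded values with its own objects.

```python
# Month abbreviations that BibTeX styles predefine as macros.
COMMON_STRINGS = {
    "jan": "January", "feb": "February", "mar": "March", "apr": "April",
    "may": "May", "jun": "June", "jul": "July", "aug": "August",
    "sep": "September", "oct": "October", "nov": "November", "dec": "December",
}


def field_value(parts, strings, interpolate_strings=True, common_strings=True):
    """Resolve a value given as [("literal", text) | ("macro", name), ...]."""
    if not interpolate_strings:
        return list(parts)  # keep the structure for later processing
    table = dict(COMMON_STRINGS) if common_strings else {}
    # Macro names are case-insensitive in BibTeX, so compare lowercased names.
    table.update({name.lower(): text for name, text in strings.items()})
    resolved = []
    for kind, text in parts:
        resolved.append(text if kind == "literal" else table[text.lower()])
    return "".join(resolved)


strings = {"ACM": "Association for Computing Machinery"}  # from @string{ACM = "..."}
parts = [("macro", "acm"), ("literal", " Press")]         # acm # " Press"
print(field_value(parts, strings))          # Association for Computing Machinery Press
print(field_value([("macro", "jan")], {}))  # January
```

With `common_strings=False`, an undefined macro such as `jan` raises `KeyError` in this sketch.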

### Cross-reference Resolution

Enable automatic resolution of `crossref` dependencies, which fills in fields missing from an entry using the entry it references:

```python
# Fill in missing fields from the entry named in each crossref field
parser = BibTexParser(add_missing_from_crossref=True)

# Keep crossref fields as-is
parser = BibTexParser(add_missing_from_crossref=False)  # Default
```
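
A simplified sketch of crossref resolution: each entry that names another entry in its `crossref` field receives copies of the fields it lacks, while its own fields take precedence. `resolve_crossrefs` is hypothetical and handles only one level of reference; the library's own resolution may treat chained references and bookkeeping differently.

```python
def resolve_crossrefs(entries: list) -> list:
    """Return copies of entries with missing fields filled from their crossref target."""
    by_id = {entry["ID"]: entry for entry in entries}
    resolved = []
    for entry in entries:
        merged = dict(entry)
        target = by_id.get(entry.get("crossref", ""))
        if target is not None:
            for field, value in target.items():
                if field in ("ID", "ENTRYTYPE"):
                    continue  # identity fields are never inherited
                merged.setdefault(field, value)  # the entry's own fields win
        resolved.append(merged)
    return resolved


entries = [
    {"ENTRYTYPE": "inproceedings", "ID": "paper1", "title": "A Paper", "crossref": "conf2020"},
    {"ENTRYTYPE": "proceedings", "ID": "conf2020", "title": "Proceedings of Conf", "year": "2020"},
]
paper = resolve_crossrefs(entries)[0]
print(paper["title"], paper["year"])  # A Paper 2020
```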

## Usage Examples

### Custom Entry Processing

```python
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import author, convert_to_unicode


def customize_entries(record):
    """Custom function applied to each entry after parsing."""
    if 'author' in record:
        # Split the author field into a list of "Last, First" names
        record = author(record)

    # Convert LaTeX-encoded characters to Unicode
    record = convert_to_unicode(record)

    # The customization function must return the (modified) record
    return record


parser = BibTexParser(customization=customize_entries)

with open('bibliography.bib', encoding='utf-8') as bibtex_file:
    bib_database = parser.parse_file(bibtex_file)
```
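
Because `customization` takes a single callable, several record-processing steps are usually chained inside one function. A small hypothetical `compose` helper makes that chaining reusable; the two steps shown are illustrative stand-ins built from plain string operations, not functions from `bibtexparser.customization`.

```python
from functools import reduce
from typing import Callable, Dict

Record = Dict[str, object]
Step = Callable[[Record], Record]


def compose(*steps: Step) -> Step:
    """Build one customization callable that applies steps left to right."""
    def run(record: Record) -> Record:
        return reduce(lambda rec, step: step(rec), steps, record)
    return run


def strip_whitespace(record: Record) -> Record:
    # Collapse runs of whitespace (including newlines) inside string fields.
    return {k: " ".join(v.split()) if isinstance(v, str) else v for k, v in record.items()}


def split_keywords(record: Record) -> Record:
    # Turn a comma-separated keyword field into a list.
    if "keyword" in record:
        record = dict(record)
        record["keyword"] = [k.strip() for k in record["keyword"].split(",") if k.strip()]
    return record


customize = compose(strip_whitespace, split_keywords)
print(customize({"ID": "x", "title": "A\n  Title", "keyword": "a, b"}))
# {'ID': 'x', 'title': 'A Title', 'keyword': ['a', 'b']}
```

The composed function can then be passed as `BibTexParser(customization=customize)`.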

### Robust Parsing with Error Handling

```python
from bibtexparser.bparser import BibTexParser

# Configure parser for maximum compatibility
parser = BibTexParser(
    ignore_nonstandard_types=False,  # Accept all entry types
    homogenize_fields=True,          # Normalize field names
    common_strings=True,             # Include month abbreviations
    add_missing_from_crossref=True   # Resolve crossrefs
)

try:
    with open('messy_bibliography.bib', encoding='utf-8') as bibtex_file:
        # partial=True logs parse errors instead of raising them
        bib_database = parser.parse_file(bibtex_file, partial=True)
    print(f"Parsed {len(bib_database.entries)} entries")
except Exception as e:
    # partial=True only suppresses parse errors; I/O and decoding errors still land here
    print(f"Parsing failed: {e}")
```

### Multiple Parsing Passes

```python
from bibtexparser.bparser import BibTexParser

# Create a parser that will be reused across several files
parser = BibTexParser()
parser.expect_multiple_parse = True  # Disable the warning about repeated parse calls

# Each call adds entries to the same database (parser.bib_database)
for filename in ['refs1.bib', 'refs2.bib', 'refs3.bib']:
    with open(filename, encoding='utf-8') as bibtex_file:
        bib_database = parser.parse_file(bibtex_file)

print(f"Total entries: {len(parser.bib_database.entries)}")
```
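
The warning exists because a reused parser appends to one shared database rather than starting fresh. The toy model below illustrates that behavior; `AccumulatingParser` is hypothetical, takes pre-parsed entry dicts instead of BibTeX text, and only mirrors the idea behind `expect_multiple_parse`.

```python
import warnings


class AccumulatingParser:
    """Toy parser that, like a reused BibTexParser, keeps one growing entry list."""

    def __init__(self):
        self.entries = []
        self.expect_multiple_parse = False
        self._parse_calls = 0

    def parse(self, new_entries):
        if self._parse_calls > 0 and not self.expect_multiple_parse:
            warnings.warn("parse() called again; results accumulate in one database")
        self._parse_calls += 1
        self.entries.extend(new_entries)
        return self.entries  # the same list object every time


parser = AccumulatingParser()
first = parser.parse([{"ID": "a"}])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    second = parser.parse([{"ID": "b"}])
print(len(caught), [e["ID"] for e in second], first is second)  # 1 ['a', 'b'] True
```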

### Configuration for Different BibTeX Dialects

```python
from bibtexparser.bparser import BibTexParser

# Configuration for strict academic BibTeX
academic_parser = BibTexParser(
    ignore_nonstandard_types=True,
    homogenize_fields=False,
    interpolate_strings=True,
    common_strings=True
)

# Configuration for modern/extended BibTeX
modern_parser = BibTexParser(
    ignore_nonstandard_types=False,  # Allow @software, @online, etc.
    homogenize_fields=True,          # Normalize field names
    interpolate_strings=True,
    common_strings=True,
    add_missing_from_crossref=True   # Fill in fields from crossref'd entries
)
```