# Entry Customization and Processing

A collection of functions for customizing and processing bibliographic entries, including name parsing, field normalization, LaTeX encoding conversion, and specialized field handling. Each function takes an entry dictionary and returns it, so it can be used as a customization callback during parsing or applied to entries afterwards.

## Capabilities

### Author and Editor Processing

Functions for processing and formatting author and editor names with support for various name formats and structured output.

```python { .api }
def author(record: dict) -> dict:
    """
    Split author field into a list of formatted names.

    Processes the 'author' field by splitting on the ' and ' delimiter and
    formatting each name as "Last, First".

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with author field as list of formatted names

    Note: Removes empty author fields. Handles newlines in author strings.
    """

def editor(record: dict) -> dict:
    """
    Process editor field into structured objects with names and IDs.

    Similar to author processing but creates objects with 'name' and 'ID'
    fields for each editor, where ID is a sanitized version of the name.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with editor field as list of editor objects
    """
```
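
The splitting and ID sanitizing described above can be sketched in plain Python. This is an illustrative approximation, not the library's code: `split_people` and `make_id` are hypothetical helpers, and the set of characters stripped to build the ID (commas, spaces, periods) is an assumption.

```python
def split_people(field: str) -> list:
    """Split a BibTeX author/editor string on the ' and ' delimiter."""
    # Newlines inside the field are treated as plain spaces before splitting.
    flat = field.replace("\n", " ")
    return [name.strip() for name in flat.split(" and ") if name.strip()]


def make_id(name: str) -> str:
    """Build a compact identifier by dropping commas, spaces, and periods (assumed rule)."""
    return name.replace(",", "").replace(" ", "").replace(".", "")


people = split_people("Knuth, Donald E.\nand Lamport, Leslie")
editors = [{"name": p, "ID": make_id(p)} for p in people]
print(editors)
```

The sketch skips the "Last, First" reformatting step, so names are kept exactly as they appear between the delimiters.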

### Name Parsing and Formatting

Advanced name parsing functions that handle complex name formats according to BibTeX conventions.

```python { .api }
def splitname(name: str, strict_mode: bool = True) -> dict:
    """
    Break a name into its constituent parts: First, von, Last, and Jr.

    Parses names according to BibTeX conventions supporting three formats:
    - First von Last
    - von Last, First
    - von Last, Jr, First

    Parameters:
    - name (str): Single name string to parse
    - strict_mode (bool): If True, raise exceptions on invalid names

    Returns:
    dict: Dictionary with keys 'first', 'last', 'von', 'jr' (each a list of words)

    Raises:
    InvalidName: If name is invalid and strict_mode=True
    """

def getnames(names: list) -> list:
    """
    Convert list of name strings to "surname, firstnames" format.

    Parameters:
    - names (list): List of name strings

    Returns:
    list: List of formatted names in "Last, First" format

    Note: Simplified implementation, may not handle all complex cases
    """

def find_matching(
    text: str,
    opening: str,
    closing: str,
    ignore_escaped: bool = True
) -> dict:
    """
    Find matching bracket pairs in text.

    Parameters:
    - text (str): Text to search
    - opening (str): Opening bracket character
    - closing (str): Closing bracket character
    - ignore_escaped (bool): Ignore escaped brackets

    Returns:
    dict: Mapping of opening positions to closing positions

    Raises:
    IndexError: If brackets are unmatched
    """
```
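
To make the bracket-matching contract concrete, here is a minimal stack-based sketch of the same idea. It is not the library's implementation: `match_brackets` is a hypothetical name, and treating a backslash as escaping exactly the next character is an assumption about what "escaped" means.

```python
def match_brackets(text: str, opening: str = "{", closing: str = "}") -> dict:
    """Map the index of each opening bracket to the index of its closing bracket."""
    stack = []  # indices of opening brackets still waiting for a partner
    pairs = {}
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == opening:
            stack.append(i)
        elif ch == closing:
            # pop() raises IndexError when there is no opening bracket to match.
            pairs[stack.pop()] = i
        i += 1
    if stack:
        raise IndexError("unmatched opening bracket at index %d" % stack[-1])
    return pairs


print(match_brackets("{van {der} Berg}"))  # {5: 9, 0: 15}
```

Nested pairs are resolved innermost first, which is why the inner pair appears first in the printed mapping.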

### Field Processing and Normalization

Functions for processing and normalizing specific bibliographic fields.

```python { .api }
def journal(record: dict) -> dict:
    """
    Convert journal field into structured object with name and ID.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with journal as object containing 'name' and 'ID'
    """

def keyword(record: dict, sep: str = ',|;') -> dict:
    """
    Split keyword field into a list using specified separators.

    Parameters:
    - record (dict): Entry dictionary to process
    - sep (str): Regular expression pattern for separators

    Returns:
    dict: Modified record with keyword field as list of keywords
    """

def link(record: dict) -> dict:
    """
    Process link field into structured objects.

    Parses each line of the link field into an object with 'url', 'anchor',
    and 'format' fields.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with link field as list of link objects
    """

def page_double_hyphen(record: dict) -> dict:
    """
    Normalize page ranges to use double hyphens.

    Converts various hyphen types in page ranges to standard double hyphen (--).

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with normalized 'pages' field
    """

def type(record: dict) -> dict:
    """
    Convert type field to lowercase.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with lowercase type field
    """

def doi(record: dict) -> dict:
    """
    Process DOI field and add to links.

    Converts DOI to URL format and adds to link field if not already present.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with DOI added to links
    """
```
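
Two of the normalizations above, keyword splitting and page-range rewriting, are simple enough to sketch with the standard library. The helpers below are hypothetical stand-ins, not the library's functions, and the set of dash characters recognized in `normalize_pages` is an assumption.

```python
import re


def split_keywords(value: str, sep: str = ",|;") -> list:
    """Split a keyword string on a regular-expression separator and trim each item."""
    return [item.strip() for item in re.split(sep, value.replace("\n", ""))]


def normalize_pages(value: str) -> str:
    """Rewrite a page range so its two ends are joined by a double hyphen."""
    # Assumed separator set: Unicode hyphen, en dash, em dash, minus sign, ASCII hyphen.
    parts = [p.strip() for p in re.split("[\u2010\u2013\u2014\u2212-]+", value) if p.strip()]
    # Anything that does not look like a two-ended range is returned unchanged.
    return "--".join(parts) if len(parts) == 2 else value


print(split_keywords("physics; quantum mechanics, uncertainty"))
print(normalize_pages("12\u201334"))  # en dash input
```

Because `sep` is a regular expression, characters such as `|` or `.` must be escaped if they are meant literally.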

### LaTeX and Unicode Conversion

Functions for converting between LaTeX encoding and Unicode in bibliographic data.

```python { .api }
def convert_to_unicode(record: dict) -> dict:
    """
    Convert LaTeX accents and encoding to Unicode throughout record.

    Processes all string fields, lists, and dictionary values in the record
    to convert LaTeX-encoded special characters to Unicode equivalents.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with Unicode characters
    """

def homogenize_latex_encoding(record: dict) -> dict:
    """
    Homogenize LaTeX encoding style for BibTeX output.

    First converts to Unicode, then converts back to consistent LaTeX encoding.
    Protects uppercase letters in title field.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with homogenized LaTeX encoding

    Note: Experimental function, may have limitations
    """

def add_plaintext_fields(record: dict) -> dict:
    """
    Add plaintext versions of all fields with 'plain_' prefix.

    Creates additional fields with braces and special characters removed
    for easier text processing and searching.

    Parameters:
    - record (dict): Entry dictionary to process

    Returns:
    dict: Modified record with additional plain_* fields
    """
```
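
The `plain_` prefix convention can be illustrated with a small stand-alone sketch. `add_plain_copies` is a hypothetical helper rather than the library function, and it strips only curly braces; which other special characters the real function removes is not specified here, so none are assumed.

```python
def add_plain_copies(record: dict) -> dict:
    """Add a 'plain_<key>' copy of every field with curly braces removed."""

    def strip_braces(text: str) -> str:
        return text.replace("{", "").replace("}", "")

    # Iterate over a snapshot of the keys because the dict grows inside the loop.
    for key in list(record):
        value = record[key]
        if isinstance(value, str):
            plain = strip_braces(value)
        elif isinstance(value, list):
            plain = [strip_braces(v) if isinstance(v, str) else v for v in value]
        elif isinstance(value, dict):
            plain = {k: strip_braces(v) if isinstance(v, str) else v for k, v in value.items()}
        else:
            plain = value
        record["plain_" + key] = plain
    return record


entry = add_plain_copies({"title": "The {DNA} of {P}ython", "keyword": ["{GPU}", "parsing"]})
print(entry["plain_title"])    # The DNA of Python
print(entry["plain_keyword"])  # ['GPU', 'parsing']
```

The original fields are left untouched, so the braces that protect capitalization are still available when the entry is written back to BibTeX.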

### Exception Classes

Exception classes for handling errors in name processing.

```python { .api }
class InvalidName(ValueError):
    """
    Exception raised by splitname() when an invalid name is encountered.

    Used when strict_mode=True and name cannot be parsed according to
    BibTeX naming conventions.
    """
    pass
```

## Usage Examples

### Basic Entry Customization

```python
import bibtexparser
from bibtexparser import customization

def my_customization(record):
    """Custom function to process entries during parsing."""
    # Process author names
    record = customization.author(record)

    # Convert journal to structured format
    record = customization.journal(record)

    # Split keywords
    record = customization.keyword(record)

    # Convert LaTeX to Unicode
    record = customization.convert_to_unicode(record)

    return record

# Use with parser
parser = bibtexparser.bparser.BibTexParser(customization=my_customization)
with open('bibliography.bib') as f:
    db = parser.parse_file(f)
```

### Post-processing Entries

```python
import bibtexparser
from bibtexparser import customization

# Load database normally
with open('bibliography.bib') as f:
    db = bibtexparser.load(f)

# Apply customizations to all entries
for entry in db.entries:
    entry = customization.author(entry)
    entry = customization.page_double_hyphen(entry)
    entry = customization.doi(entry)
```

### Name Processing Examples

```python
from bibtexparser.customization import splitname, getnames

# Parse an individual name written in "von Last, Jr, First" form
name_parts = splitname("von Neumann, Jr., Jean-Baptiste")
print(name_parts)
# Expected parts (key order may differ):
# {'first': ['Jean-Baptiste'], 'von': ['von'], 'last': ['Neumann'], 'jr': ['Jr.']}

# Format multiple names, whether given as "First Last" or "Last, First"
authors = ["Albert Einstein", "Newton, Isaac", "Marie Curie"]
formatted = getnames(authors)
print(formatted)
# ['Einstein, Albert', 'Newton, Isaac', 'Curie, Marie']
```

### LaTeX Conversion Examples

```python
from bibtexparser.customization import convert_to_unicode, homogenize_latex_encoding

# Sample entry with LaTeX encoding
# (\\ is a literal backslash; \' is an escaped apostrophe inside the string)
entry = {
    'title': 'Schr{\\"o}dinger\'s Cat',
    'author': 'Erwin Schr{\\"o}dinger'
}

# Convert to Unicode
unicode_entry = convert_to_unicode(entry.copy())
print(unicode_entry['title'])  # Schrödinger's Cat

# Homogenize LaTeX encoding
latex_entry = homogenize_latex_encoding(entry.copy())
print(latex_entry['title'])  # Consistent LaTeX format
```

### Field Processing Examples

```python
from bibtexparser.customization import keyword, link, journal

# Process keywords
entry = {'keyword': 'physics; quantum mechanics, uncertainty'}
entry = keyword(entry, sep=';|,')
print(entry['keyword'])  # ['physics', 'quantum mechanics', 'uncertainty']

# Process journal
entry = {'journal': 'Nature Physics'}
entry = journal(entry)
print(entry['journal'])  # {'name': 'Nature Physics', 'ID': 'NaturePhysics'}

# Process links
entry = {'link': 'https://example.com PDF article\nhttps://doi.org/10.1000/123 DOI'}
entry = link(entry)
print(entry['link'])
# [{'url': 'https://example.com', 'anchor': 'PDF', 'format': 'article'},
#  {'url': 'https://doi.org/10.1000/123', 'anchor': 'DOI'}]
```

### Creating Custom Processing Functions

```python
from bibtexparser import customization

def custom_year_processor(record):
    """Custom function to process year field."""
    if 'year' in record:
        year = record['year']
        # Convert to integer if possible
        try:
            record['year_int'] = int(year)
        except ValueError:
            record['year_int'] = None

        # Add century field (only when the year parsed as an integer)
        if record['year_int'] is not None:
            record['century'] = (record['year_int'] - 1) // 100 + 1

    return record

def comprehensive_customization(record):
    """Comprehensive processing pipeline."""
    # Apply built-in customizations
    record = customization.author(record)
    record = customization.editor(record)
    record = customization.journal(record)
    record = customization.keyword(record)
    record = customization.doi(record)
    record = customization.page_double_hyphen(record)
    record = customization.convert_to_unicode(record)

    # Apply custom processing
    record = custom_year_processor(record)

    # Add plaintext fields for searching
    record = customization.add_plaintext_fields(record)

    return record
```