# LaTeX Encoding Utilities

Utilities in `bibtexparser.latexenc` for converting between LaTeX-encoded text and Unicode. They cover the accented characters, symbols, and other special characters commonly found in bibliographic data.

## Capabilities

### Unicode to LaTeX Conversion

Convert Unicode characters to their LaTeX equivalents for compatibility with LaTeX-based typesetting systems.

```python { .api }
def string_to_latex(string: str) -> str:
    """
    Convert a Unicode string to its LaTeX equivalent.

    Converts Unicode characters to LaTeX commands while preserving
    whitespace and brace characters. Uses comprehensive mapping
    for accented characters, symbols, and special characters.

    Parameters:
    - string (str): Unicode string to convert

    Returns:
    str: LaTeX-encoded string with Unicode characters converted to LaTeX commands

    Example:
    >>> string_to_latex("café résumé")
    "caf{\\'e} r{\\'e}sum{\\'e}"
    """
```
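
The doubled backslashes in the docstring example are Python escaping, not part of the encoded text. The short sketch below uses only built-ins (no bibtexparser call) to show that a literal written as `"caf{\\'e}"` holds a single backslash, which is what LaTeX expects.

```python
# A literal written with "\\" contains ONE backslash character.
encoded = "caf{\\'e}"

print(encoded)        # caf{\'e}
print(len(encoded))   # 8
print(repr(encoded))  # "caf{\\'e}"  (repr re-escapes the backslash)

# A raw string spells the same value without doubling.
same_value = r"caf{\'e}"
print(encoded == same_value)  # True
```

When comparing results in tests, compare the strings themselves rather than their `repr`, so the escaping layer does not get in the way.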

### LaTeX to Unicode Conversion

Convert LaTeX-encoded text to Unicode characters for modern text processing and display.

```python { .api }
def latex_to_unicode(string: str) -> str:
    """
    Convert a LaTeX string to Unicode equivalent.

    Processes LaTeX commands and converts them to Unicode characters.
    Handles accented characters, symbols, and removes braces used
    for LaTeX grouping. Normalizes the result to NFC form.

    Parameters:
    - string (str): LaTeX string to convert

    Returns:
    str: Unicode string with LaTeX commands converted to Unicode characters

    Example:
    >>> latex_to_unicode("caf{\\'e} r{\\'e}sum{\\'e}")
    "café résumé"
    """
```
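
To make the NFC step concrete, here is a minimal standard-library sketch. `decode_acute` is a hypothetical helper that handles only the `{\'x}` acute-accent form; it is not how `latex_to_unicode` is implemented, and it is meant only to show why normalization matters after accents are attached to base letters.

```python
import re
import unicodedata


def decode_acute(text: str) -> str:
    r"""Hypothetical helper: decode only the {\'x} acute-accent form."""
    # Turn {\'x} into x followed by U+0301 (COMBINING ACUTE ACCENT).
    combined = re.sub(r"\{\\'([A-Za-z])\}", "\\1\u0301", text)
    # NFC merges base letter + combining mark into one precomposed code point.
    return unicodedata.normalize("NFC", combined)


decomposed = "cafe\u0301"  # 'e' plus a separate combining accent
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))    # 5 4
print(decode_acute("caf{\\'e}") == composed)  # True
```

Without the NFC step, two strings that render identically can compare unequal, which breaks deduplication and sorting of author names.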

### Uppercase Protection

Protect uppercase letters in titles for proper BibTeX formatting, ensuring they are preserved in LaTeX output.

```python { .api }
def protect_uppercase(string: str) -> str:
    """
    Protect uppercase letters for BibTeX by wrapping them in braces.

    BibTeX and LaTeX bibliography styles often convert titles to sentence case,
    which can incorrectly lowercase proper nouns and acronyms. This function
    protects uppercase letters by wrapping them in braces.

    Parameters:
    - string (str): String to process

    Returns:
    str: String with uppercase letters wrapped in braces

    Example:
    >>> protect_uppercase("The DNA Analysis")
    "{T}he {D}{N}{A} {A}nalysis"
    """
```
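
The wrapping idea can be sketched with a single regular expression. `protect_uppercase_sketch` below is a hypothetical stand-in written for illustration, not the library's implementation; in particular, the installed `protect_uppercase` may treat runs of capitals or already-braced groups differently. This version wraps every ASCII capital, including the first letter of the title.

```python
import re


def protect_uppercase_sketch(text: str) -> str:
    """Wrap each ASCII capital in braces (illustrative sketch only)."""
    # Skip a capital that directly follows '{' or directly precedes '}',
    # so an already protected letter such as {A} is left untouched.
    return re.sub(r"(?<!\{)([A-Z])(?!\})", r"{\1}", text)


print(protect_uppercase_sketch("The DNA Analysis"))    # {T}he {D}{N}{A} {A}nalysis
print(protect_uppercase_sketch("{A}lready protected"))  # {A}lready protected
```

Lookarounds are used so that neighbouring capitals are each tested independently; a pattern that consumes the surrounding characters would skip every second letter in a run such as `DNA`.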

### Legacy Conversion Functions

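Legacy conversions kept for backwards compatibility with older LaTeX encoding approaches. Note that `unicode_to_crappy_latex1` and `unicode_to_crappy_latex2` are also listed under Mapping Constants as lists of tuples, and a module-level name cannot be both a function and a list. Check `callable(...)` against your installed version before calling any of these as a function.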

```python { .api }
def unicode_to_latex(string: str) -> str:
    """
    Convert Unicode to LaTeX using legacy mappings.

    Alternative Unicode to LaTeX conversion using older mapping approach.

    Parameters:
    - string (str): Unicode string to convert

    Returns:
    str: LaTeX-encoded string
    """

def unicode_to_crappy_latex1(string: str) -> str:
    """
    Convert Unicode using first legacy LaTeX approach.

    Uses older, less optimal LaTeX encoding patterns that may not
    be suitable for modern LaTeX systems.

    Parameters:
    - string (str): Unicode string to convert

    Returns:
    str: LaTeX-encoded string using legacy patterns
    """

def unicode_to_crappy_latex2(string: str) -> str:
    """
    Convert Unicode using second legacy LaTeX approach.

    Uses alternative legacy LaTeX encoding patterns.

    Parameters:
    - string (str): Unicode string to convert

    Returns:
    str: LaTeX-encoded string using alternative legacy patterns
    """
```

### Mapping Constants

Pre-built mappings for character conversion used by the conversion functions.

```python { .api }
unicode_to_latex_map: dict
"""
Dictionary mapping Unicode characters to LaTeX commands.
Comprehensive mapping covering accented characters, symbols,
mathematical characters, and special typography.
"""

unicode_to_crappy_latex1: list
"""
List of (Unicode, LaTeX) tuples for legacy conversion approach.
Contains mappings that may not follow modern LaTeX best practices.
"""

unicode_to_crappy_latex2: list
"""
List of (Unicode, LaTeX) tuples for alternative legacy conversion.
Contains additional legacy mappings for special cases.
"""
```
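
As a rough illustration of how data in these two shapes can be used, the sketch below works on a small hand-written sample rather than the real constants. `sample_pairs`, `encode`, and `decode` are hypothetical names, and the three entries are chosen for the example only.

```python
# Hypothetical sample in the (unicode, latex) tuple shape described above.
sample_pairs = [
    ("\u00e9", "{\\'e}"),   # e with acute accent
    ("\u00fc", '{\\"u}'),   # u with umlaut
    ("\u00f1", "{\\~n}"),   # n with tilde
]

# A list of pairs converts directly into a lookup dict, in either direction.
to_latex = dict(sample_pairs)
to_unicode = {latex: uni for uni, latex in sample_pairs}


def encode(text: str) -> str:
    """Per-character lookup; unmapped characters pass through unchanged."""
    return "".join(to_latex.get(ch, ch) for ch in text)


def decode(text: str) -> str:
    """Substring replacement, one LaTeX command at a time."""
    for latex, uni in to_unicode.items():
        text = text.replace(latex, uni)
    return text


print(encode("M\u00fcller"))          # M{\"u}ller
print(decode(encode("Jos\u00e9")))    # José
```

A dict suits the Unicode-to-LaTeX direction because each input is a single character; the reverse direction has multi-character keys, so it falls back to substring replacement.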

## Usage Examples

### Basic Conversion

```python
from bibtexparser.latexenc import latex_to_unicode, string_to_latex

# Convert LaTeX to Unicode
# (the inner double quote is escaped so the string literal stays valid)
latex_title = "Schr{\\\"o}dinger's Cat in Quantum Mechanics"
unicode_title = latex_to_unicode(latex_title)
print(unicode_title)  # Output: Schrödinger's Cat in Quantum Mechanics

# Convert Unicode to LaTeX
unicode_author = "José María Azañar"
latex_author = string_to_latex(unicode_author)
print(latex_author)  # Output: Jos{\'e} Mar{\'\i}a Aza{\~n}ar
```

### Title Protection for BibTeX

```python
from bibtexparser.latexenc import protect_uppercase, string_to_latex

# Protect acronyms and proper nouns in titles
title = "The Effect of DNA Analysis on RNA Processing"
protected_title = protect_uppercase(title)
print(protected_title)  # Output: {T}he {E}ffect of {D}{N}{A} {A}nalysis on {R}{N}{A} {P}rocessing

# Use in BibTeX entry
entry = {
    'title': protect_uppercase("Machine Learning Applications in NLP"),
    'author': string_to_latex("José García")
}
```

### Processing Bibliographic Data

```python
from bibtexparser.latexenc import latex_to_unicode, string_to_latex, protect_uppercase

def process_entry_latex(entry, to_unicode=True):
    """Process entry LaTeX encoding."""
    processed = entry.copy()

    if to_unicode:
        # Convert LaTeX to Unicode
        for field in ['title', 'author', 'journal', 'booktitle']:
            if field in processed:
                processed[field] = latex_to_unicode(processed[field])
    else:
        # Convert Unicode to LaTeX and protect titles
        for field in ['author', 'journal', 'booktitle']:
            if field in processed:
                processed[field] = string_to_latex(processed[field])

        # Special handling for titles
        if 'title' in processed:
            processed['title'] = protect_uppercase(string_to_latex(processed['title']))

    return processed

# Example usage
entry = {
    'title': 'Café Culture in Montréal',
    'author': 'François Dubé',
    'journal': 'Études Québécoises'
}

# Convert for LaTeX output
latex_entry = process_entry_latex(entry, to_unicode=False)
print(latex_entry['title'])   # {C}af{\'e} {C}ulture in {M}ontr{\'e}al
print(latex_entry['author'])  # Fran{\c{c}}ois Dub{\'e}
```

### Handling Different Character Sets

```python
from bibtexparser.latexenc import latex_to_unicode, string_to_latex

# European accented characters
text_fr = "Élève français à l'école"
latex_fr = string_to_latex(text_fr)
print(latex_fr)  # {\'E}l{\`e}ve fran{\c{c}}ais {\`a} l'{\'e}cole

# German umlauts
text_de = "Müller über Käse"
latex_de = string_to_latex(text_de)
print(latex_de)  # M{\"u}ller {\"u}ber K{\"a}se

# Mathematical symbols
text_math = "α-particle β-decay γ-ray"
latex_math = string_to_latex(text_math)
print(latex_math)  # \alpha -particle \beta -decay \gamma -ray

# Convert back
unicode_math = latex_to_unicode(latex_math)
print(unicode_math)  # α-particle β-decay γ-ray
```
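
The "convert back" step suggests a simple round-trip check for your own data. The sketch below is library-agnostic: `round_trip_failures` is a hypothetical helper that takes any encode/decode pair, and `demo_encode` / `demo_decode` are stand-ins so the example runs without bibtexparser installed.

```python
import unicodedata
from typing import Callable, Iterable, List


def round_trip_failures(
    samples: Iterable[str],
    encode: Callable[[str], str],
    decode: Callable[[str], str],
) -> List[str]:
    """Return the samples that do not survive encode followed by decode."""
    failures = []
    for sample in samples:
        # Compare against the NFC form, since decoders may normalize.
        expected = unicodedata.normalize("NFC", sample)
        if decode(encode(sample)) != expected:
            failures.append(sample)
    return failures


# Stand-in codec that only knows about e-acute (assumption for the demo).
def demo_encode(text: str) -> str:
    return text.replace("\u00e9", "{\\'e}")


def demo_decode(text: str) -> str:
    return text.replace("{\\'e}", "\u00e9")


print(round_trip_failures(["caf\u00e9", "na\u00efve"], demo_encode, demo_decode))  # []
```

With bibtexparser installed, passing `string_to_latex` and `latex_to_unicode` as the two callables would list the characters in your bibliography that do not round-trip cleanly.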

### Integration with BibTeX Processing

```python
import bibtexparser
from bibtexparser.latexenc import latex_to_unicode, string_to_latex, protect_uppercase

def latex_processing_customization(record):
    """Customization function for LaTeX processing."""
    # Convert LaTeX to Unicode for processing
    for field in ['title', 'author', 'journal', 'booktitle', 'publisher']:
        if field in record:
            record[field] = latex_to_unicode(record[field])

    # Store LaTeX versions, re-encoded from the Unicode text
    for field in ['title', 'author', 'journal', 'booktitle', 'publisher']:
        if field in record:
            record[f'{field}_latex'] = string_to_latex(record[field])

    # Protect uppercase in title for BibTeX output
    if 'title' in record:
        record['title_protected'] = protect_uppercase(record['title_latex'])

    return record

# Use with parser
parser = bibtexparser.bparser.BibTexParser(customization=latex_processing_customization)
with open('bibliography.bib') as f:
    db = parser.parse_file(f)

# Entries now have both Unicode and LaTeX versions
for entry in db.entries:
    print(f"Unicode title: {entry.get('title', '')}")
    print(f"LaTeX title: {entry.get('title_latex', '')}")
    print(f"Protected title: {entry.get('title_protected', '')}")
```

### Custom Character Mappings

```python
from bibtexparser.latexenc import unicode_to_latex_map

# Check available mappings
print(f"Total mappings: {len(unicode_to_latex_map)}")

# Find specific character mappings
for char, latex in unicode_to_latex_map.items():
    if 'alpha' in latex.lower():
        print(f"'{char}' -> '{latex}'")

# Custom extension of mappings
custom_mappings = unicode_to_latex_map.copy()
custom_mappings['™'] = '\\texttrademark'
custom_mappings['©'] = '\\textcopyright'

def custom_string_to_latex(string):
    """Custom conversion with additional mappings."""
    result = []
    for char in string:
        if char in [' ', '{', '}']:
            result.append(char)
        else:
            result.append(custom_mappings.get(char, char))
    return ''.join(result)
```