# LaTeX Encoding Utilities

Utilities for converting between LaTeX-encoded text and Unicode, covering the accents, symbols, and other special characters commonly found in bibliographic data.

## Capabilities

### Unicode to LaTeX Conversion

Convert Unicode characters to their LaTeX equivalents for compatibility with LaTeX-based typesetting systems.

```python { .api }
def string_to_latex(string: str) -> str:
    """
    Convert a Unicode string to its LaTeX equivalent.

    Converts Unicode characters to LaTeX commands while preserving
    whitespace and brace characters. Uses comprehensive mapping
    for accented characters, symbols, and special characters.

    Parameters:
    - string (str): Unicode string to convert

    Returns:
    str: LaTeX-encoded string with Unicode characters converted to LaTeX commands

    Example:
    >>> string_to_latex("café résumé")
    "caf{\\'e} r{\\'e}sum{\\'e}"
    """
```
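
The doubled backslashes in the docstring example come from Python's `repr` escaping and are not part of the LaTeX output itself. A minimal sketch using only built-in string behaviour (no bibtexparser call is made; the sample value is copied from the example above):

```python
# The same value written two equivalent ways.
escaped = "caf{\\'e}"   # normal literal: the backslash must be doubled
raw = r"caf{\'e}"       # raw literal: the backslash is taken literally

assert escaped == raw

print(escaped)        # caf{\'e}
print(repr(escaped))  # "caf{\\'e}"
print(len(escaped))   # 8
```

Raw string literals are often the more readable choice when LaTeX commands are written directly in Python source.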

### LaTeX to Unicode Conversion

Convert LaTeX-encoded text to Unicode characters for modern text processing and display.

```python { .api }
def latex_to_unicode(string: str) -> str:
    """
    Convert a LaTeX string to its Unicode equivalent.

    Processes LaTeX commands and converts them to Unicode characters.
    Handles accented characters and symbols, and removes braces used
    for LaTeX grouping. Normalizes the result to NFC form.

    Parameters:
    - string (str): LaTeX string to convert

    Returns:
    str: Unicode string with LaTeX commands converted to Unicode characters

    Example:
    >>> latex_to_unicode("caf{\\'e} r{\\'e}sum{\\'e}")
    'café résumé'
    """
```
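
The NFC normalization mentioned in the docstring matters because an accented letter can be stored either as one precomposed code point or as a base letter plus a combining mark. The sketch below uses only the standard library `unicodedata` module to show the difference; it illustrates the normalization step in isolation and is not the library's own code.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
decomposed = "cafe\u0301"
# Precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed = "caf\u00e9"

# Both render as "café" but differ as Python strings.
print(decomposed == precomposed)          # False
print(len(decomposed), len(precomposed))  # 5 4

# NFC composes the base letter and combining mark into one code point.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == precomposed)          # True
```

Normalizing to a single form keeps string comparison, sorting, and deduplication of names and titles predictable.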

### Uppercase Protection

Protect uppercase letters in titles so that BibTeX styles do not lowercase them in the typeset output.

```python { .api }
def protect_uppercase(string: str) -> str:
    """
    Protect uppercase letters for BibTeX by wrapping them in braces.

    BibTeX and LaTeX bibliography styles often convert titles to sentence case,
    which can incorrectly lowercase proper nouns and acronyms. This function
    protects uppercase letters by wrapping them in braces.

    Parameters:
    - string (str): String to process

    Returns:
    str: String with uppercase letters wrapped in braces

    Example:
    >>> protect_uppercase("The DNA Analysis")
    '{T}he {D}{N}{A} {A}nalysis'
    """
```
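
The brace-wrapping idea can be sketched with a single regular expression. `wrap_uppercase` below is a hypothetical helper written for illustration, not the library's implementation, and its handling of edge cases may differ from `protect_uppercase`.

```python
import re


def wrap_uppercase(text: str) -> str:
    """Hypothetical helper: wrap each unbraced ASCII capital in braces."""
    # (?<!\{) skips a capital right after "{"; (?!\}) skips one right before "}".
    return re.sub(r"(?<!\{)([A-Z])(?!\})", r"{\1}", text)


print(wrap_uppercase("The DNA Analysis"))  # {T}he {D}{N}{A} {A}nalysis
print(wrap_uppercase("{T}he cell"))        # {T}he cell
```

The lookarounds only recognize single letters that are already braced, so a group such as `{DNA}` would need extra handling in a fuller version.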

### Legacy Conversion Functions

Legacy functions maintained for backwards compatibility with older LaTeX encoding approaches.

```python { .api }
def unicode_to_latex(string: str) -> str:
    """
    Convert Unicode to LaTeX using legacy mappings.

    Alternative Unicode to LaTeX conversion using older mapping approach.

    Parameters:
    - string (str): Unicode string to convert

    Returns:
    str: LaTeX-encoded string
    """

def unicode_to_crappy_latex1(string: str) -> str:
    """
    Convert Unicode using first legacy LaTeX approach.

    Uses older, less optimal LaTeX encoding patterns that may not
    be suitable for modern LaTeX systems.

    Parameters:
    - string (str): Unicode string to convert

    Returns:
    str: LaTeX-encoded string using legacy patterns
    """

def unicode_to_crappy_latex2(string: str) -> str:
    """
    Convert Unicode using second legacy LaTeX approach.

    Uses alternative legacy LaTeX encoding patterns.

    Parameters:
    - string (str): Unicode string to convert

    Returns:
    str: LaTeX-encoded string using alternative legacy patterns
    """
```

### Mapping Constants

Pre-built mappings for character conversion used by the conversion functions.

```python { .api }
unicode_to_latex_map: dict
"""
Dictionary mapping Unicode characters to LaTeX commands.
Comprehensive mapping covering accented characters, symbols,
mathematical characters, and special typography.
"""

unicode_to_crappy_latex1: list
"""
List of (Unicode, LaTeX) tuples for legacy conversion approach.
Contains mappings that may not follow modern LaTeX best practices.
"""

unicode_to_crappy_latex2: list
"""
List of (Unicode, LaTeX) tuples for alternative legacy conversion.
Contains additional legacy mappings for special cases.
"""
```
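
A list of (Unicode, LaTeX) tuples can be applied as an ordered series of string replacements. The sketch below assumes a tiny hypothetical table, `SAMPLE_PAIRS`, and a hypothetical helper, `replace_latex`; the library's real lists are far larger and its replacement logic may differ.

```python
# Hypothetical sample table, ordered so braced forms come before bare ones.
SAMPLE_PAIRS = [
    ("\u00e9", r"{\'e}"),  # é, braced form
    ("\u00f6", r'{\"o}'),  # ö, braced form
    ("\u00e9", r"\'e"),    # é, bare form
]


def replace_latex(text: str, pairs) -> str:
    """Replace each LaTeX sequence in `pairs` with its Unicode character."""
    # Pairs are applied in list order; a bare form applied first would
    # leave stray braces behind, e.g. "{\'e}" would become "{é}".
    for unicode_char, latex in pairs:
        text = text.replace(latex, unicode_char)
    return text


sample = r"""caf{\'e} Schr{\"o}dinger r\'esum\'e"""
print(replace_latex(sample, SAMPLE_PAIRS))  # café Schrödinger résumé
```

A list keeps the replacement order explicit, which a plain dictionary lookup keyed on single characters does not need.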

## Usage Examples

### Basic Conversion

```python
from bibtexparser.latexenc import latex_to_unicode, string_to_latex

# Convert LaTeX to Unicode (raw string, so the backslash needs no doubling)
latex_title = r"Schr{\"o}dinger's Cat in Quantum Mechanics"
unicode_title = latex_to_unicode(latex_title)
print(unicode_title)  # Output: Schrödinger's Cat in Quantum Mechanics

# Convert Unicode to LaTeX
unicode_author = "José María Azañar"
latex_author = string_to_latex(unicode_author)
print(latex_author)  # Output: Jos{\'e} Mar{\'\i}a Aza{\~n}ar
```

### Title Protection for BibTeX

```python
from bibtexparser.latexenc import protect_uppercase, string_to_latex

# Protect acronyms and proper nouns in titles
title = "The Effect of DNA Analysis on RNA Processing"
protected_title = protect_uppercase(title)
print(protected_title)  # Output: {T}he {E}ffect of {D}{N}{A} {A}nalysis on {R}{N}{A} {P}rocessing

# Use in BibTeX entry
entry = {
    'title': protect_uppercase("Machine Learning Applications in NLP"),
    'author': string_to_latex("José García")
}
```

### Processing Bibliographic Data

```python
from bibtexparser.latexenc import latex_to_unicode, string_to_latex, protect_uppercase

def process_entry_latex(entry, to_unicode=True):
    """Convert an entry's text fields to Unicode or to LaTeX."""
    processed = entry.copy()

    if to_unicode:
        # Convert LaTeX to Unicode
        for field in ['title', 'author', 'journal', 'booktitle']:
            if field in processed:
                processed[field] = latex_to_unicode(processed[field])
    else:
        # Convert Unicode to LaTeX and protect titles
        for field in ['author', 'journal', 'booktitle']:
            if field in processed:
                processed[field] = string_to_latex(processed[field])

        # Titles are encoded first, then their capitals are protected
        if 'title' in processed:
            processed['title'] = protect_uppercase(string_to_latex(processed['title']))

    return processed

# Example usage
entry = {
    'title': 'Café Culture in Montréal',
    'author': 'François Dubé',
    'journal': 'Études Québécoises'
}

# Convert for LaTeX output
latex_entry = process_entry_latex(entry, to_unicode=False)
print(latex_entry['title'])   # {C}af{\'e} {C}ulture in {M}ontr{\'e}al
print(latex_entry['author'])  # Fran{\c{c}}ois Dub{\'e}
```

### Handling Different Character Sets

```python
from bibtexparser.latexenc import latex_to_unicode, string_to_latex

# French accented characters
text_fr = "Élève français à l'école"
latex_fr = string_to_latex(text_fr)
print(latex_fr)  # {\'E}l{\`e}ve fran{\c{c}}ais {\`a} l'{\'e}cole

# German umlauts
text_de = "Müller über Käse"
latex_de = string_to_latex(text_de)
print(latex_de)  # M{\"u}ller {\"u}ber K{\"a}se

# Greek letters used as mathematical symbols
text_math = "α-particle β-decay γ-ray"
latex_math = string_to_latex(text_math)
print(latex_math)  # \alpha -particle \beta -decay \gamma -ray

# Convert back
unicode_math = latex_to_unicode(latex_math)
print(unicode_math)  # α-particle β-decay γ-ray
```

### Integration with BibTeX Processing

```python
import bibtexparser
from bibtexparser.latexenc import latex_to_unicode, string_to_latex, protect_uppercase

def latex_processing_customization(record):
    """Customization function for LaTeX processing."""
    fields = ['title', 'author', 'journal', 'booktitle', 'publisher']

    # Convert LaTeX to Unicode for processing
    for field in fields:
        if field in record:
            record[field] = latex_to_unicode(record[field])

    # Store LaTeX versions re-encoded from the Unicode text
    for field in fields:
        if field in record:
            record[f'{field}_latex'] = string_to_latex(record[field])

    # Protect uppercase in title for BibTeX output
    if 'title' in record:
        record['title_protected'] = protect_uppercase(record['title_latex'])

    return record

# Use with parser
parser = bibtexparser.bparser.BibTexParser(customization=latex_processing_customization)
with open('bibliography.bib') as f:
    db = parser.parse_file(f)

# Entries now have both Unicode and LaTeX versions
for entry in db.entries:
    print(f"Unicode title: {entry.get('title', '')}")
    print(f"LaTeX title: {entry.get('title_latex', '')}")
    print(f"Protected title: {entry.get('title_protected', '')}")
```

### Custom Character Mappings

```python
from bibtexparser.latexenc import unicode_to_latex_map

# Check available mappings
print(f"Total mappings: {len(unicode_to_latex_map)}")

# Find specific character mappings
for char, latex in unicode_to_latex_map.items():
    if 'alpha' in latex.lower():
        print(f"'{char}' -> '{latex}'")

# Custom extension of mappings
custom_mappings = unicode_to_latex_map.copy()
custom_mappings['™'] = '\\texttrademark'
custom_mappings['©'] = '\\textcopyright'

def custom_string_to_latex(string):
    """Custom conversion with additional mappings."""
    result = []
    for char in string:
        # Spaces and braces pass through unchanged
        if char in [' ', '{', '}']:
            result.append(char)
        else:
            result.append(custom_mappings.get(char, char))
    return ''.join(result)
```