Tessl Tile for pypi/bibtexparser@1.4.0

# LaTeX Encoding Utilities

Utilities for converting between LaTeX-encoded text and Unicode, covering the accents, symbols, and other special characters commonly found in bibliographic data.

## Capabilities

### Unicode to LaTeX Conversion

Convert Unicode characters to their LaTeX equivalents so the text can be used with LaTeX-based typesetting systems.

```python { .api }
def string_to_latex(string: str) -> str:
    """
    Convert a Unicode string to its LaTeX equivalent.

    Converts characters one at a time using `unicode_to_latex_map`.
    Spaces and brace characters are copied through unchanged, as are
    characters that have no entry in the map.

    Parameters:
    - string (str): Unicode string to convert

    Returns:
    str: LaTeX-encoded string with Unicode characters converted to LaTeX commands

    Example:
    >>> string_to_latex("café résumé")
    "caf{\\'e} r{\\'e}sum{\\'e}"
    """
```
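A mapping-based conversion of this kind can be illustrated without the library. The sketch below uses a hypothetical three-entry table (`DEMO_MAP`) and a hypothetical function name (`demo_to_latex`); it only shows the lookup-with-fallback idea and is not bibtexparser's implementation.

```python
# Hypothetical miniature table; a real mapping is far larger.
DEMO_MAP = {
    "é": "{\\'e}",
    "ü": '{\\"u}',
    "ñ": "{\\~n}",
}

def demo_to_latex(text: str) -> str:
    """Replace each character found in DEMO_MAP and copy everything else."""
    return "".join(DEMO_MAP.get(ch, ch) for ch in text)

print(demo_to_latex("café {Müller}"))  # caf{\'e} {M{\"u}ller}
```

Characters without an entry, including the braces and the space, pass through untouched, which is why existing LaTeX grouping survives the conversion.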

### LaTeX to Unicode Conversion

Convert LaTeX-encoded text to Unicode characters for text processing and display.

```python { .api }
def latex_to_unicode(string: str) -> str:
    """
    Convert a LaTeX string to its Unicode equivalent.

    Replaces LaTeX commands for accented characters and symbols with
    the corresponding Unicode characters, normalizes the result to
    NFC form, and removes the braces used for LaTeX grouping.

    Parameters:
    - string (str): LaTeX string to convert

    Returns:
    str: Unicode string with LaTeX commands converted to Unicode characters

    Example:
    >>> latex_to_unicode("caf{\\'e} r{\\'e}sum{\\'e}")
    'café résumé'
    """
```
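NFC normalization means that a base letter followed by a combining accent is merged into a single precomposed character where one exists. A short standard-library illustration, independent of bibtexparser:

```python
import unicodedata

decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 5 4
print(composed == "caf\u00e9")         # True
```

Because the result is normalized, strings that render identically also compare equal, which matters when matching author names or titles.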

### Uppercase Protection

Protect uppercase letters in titles so that BibTeX styles do not lowercase them in the formatted output.

```python { .api }
def protect_uppercase(string: str) -> str:
    """
    Protect uppercase letters for BibTeX by wrapping them in braces.

    Many BibTeX bibliography styles convert titles to sentence case,
    which lowercases proper nouns and acronyms. Wrapping an uppercase
    letter in braces tells BibTeX to leave it unchanged.

    Runs of consecutive capitals, as in acronyms, may not be wrapped
    letter by letter, so check the output for such titles.

    Parameters:
    - string (str): String to process

    Returns:
    str: String with uppercase letters wrapped in braces

    Example:
    >>> protect_uppercase("The Analysis")
    '{T}he {A}nalysis'
    """
```
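When every capital in an acronym must be protected individually, a small regular-expression helper can do it. The following is a minimal sketch using only the standard library; `wrap_capitals` is a hypothetical name and is not part of bibtexparser.

```python
import re

# Hypothetical helper, not part of bibtexparser.
# Matches an ASCII capital that is neither preceded by "{" nor followed by "}".
_CAPITAL = re.compile(r"(?<!\{)([A-Z])(?!\})")

def wrap_capitals(title: str) -> str:
    """Wrap each matching ASCII capital letter in its own pair of braces."""
    return _CAPITAL.sub(r"{\1}", title)

print(wrap_capitals("The DNA Analysis"))  # {T}he {D}{N}{A} {A}nalysis
print(wrap_capitals("{T}he DNA"))         # {T}he {D}{N}{A}
```

The lookarounds leave single letters that are already written as `{X}` alone. Multi-letter groups such as `{DNA}` are not handled specially by this sketch.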

### Legacy Conversion Tables

The names `unicode_to_latex`, `unicode_to_crappy_latex1`, and `unicode_to_crappy_latex2` are kept for backwards compatibility. In bibtexparser 1.x they are module-level sequences of `(unicode, latex)` pairs rather than functions, so they cannot be called with a string. Use `string_to_latex` and `latex_to_unicode` for the actual conversion.

```python { .api }
unicode_to_latex: Sequence[Tuple[str, str]]
"""
Sequence of (Unicode, LaTeX) pairs using the preferred encodings.
`unicode_to_latex_map` is built from these pairs.
"""

unicode_to_crappy_latex1: Sequence[Tuple[str, str]]
"""
Sequence of (Unicode, LaTeX) pairs for older accent notations that
are discouraged in BibTeX but still occur in existing files.
"""

unicode_to_crappy_latex2: Sequence[Tuple[str, str]]
"""
Sequence of (Unicode, LaTeX) pairs for further legacy notations,
used as a fallback when decoding LaTeX.
"""
```

### Mapping Constants

Pre-built mapping used by `string_to_latex`.

```python { .api }
unicode_to_latex_map: dict
"""
Dictionary mapping Unicode characters to LaTeX commands.
Covers accented characters, symbols, mathematical characters,
and special typography.
"""
```

## Usage Examples

### Basic Conversion

```python
from bibtexparser.latexenc import latex_to_unicode, string_to_latex

# Convert LaTeX to Unicode
# (single-quoted so the \" accent command does not end the string literal)
latex_title = 'Schr{\\"o}dinger\'s Cat in Quantum Mechanics'
unicode_title = latex_to_unicode(latex_title)
print(unicode_title)  # Output: Schrödinger's Cat in Quantum Mechanics

# Convert Unicode to LaTeX
unicode_author = "José María Azañar"
latex_author = string_to_latex(unicode_author)
print(latex_author)  # Output: Jos{\'e} Mar{\'\i}a Aza{\~n}ar
```

### Title Protection for BibTeX

```python
from bibtexparser.latexenc import protect_uppercase, string_to_latex

# Protect capital letters in a title
title = "The Effect of DNA Analysis on RNA Processing"
protected_title = protect_uppercase(title)
print(protected_title)  # Output begins: {T}he {E}ffect of ...

# Use in BibTeX entry
entry = {
    'title': protect_uppercase("Machine Learning Applications in NLP"),
    'author': string_to_latex("José García")
}
```

### Processing Bibliographic Data

```python
from bibtexparser.latexenc import latex_to_unicode, string_to_latex, protect_uppercase

def process_entry_latex(entry, to_unicode=True):
    """Return a copy of entry with its text fields converted to Unicode or to LaTeX."""
    processed = entry.copy()

    if to_unicode:
        # Convert LaTeX to Unicode
        for field in ['title', 'author', 'journal', 'booktitle']:
            if field in processed:
                processed[field] = latex_to_unicode(processed[field])
    else:
        # Convert Unicode to LaTeX
        for field in ['author', 'journal', 'booktitle']:
            if field in processed:
                processed[field] = string_to_latex(processed[field])

        # Titles are converted first, then their capitals are protected
        if 'title' in processed:
            processed['title'] = protect_uppercase(string_to_latex(processed['title']))

    return processed

# Example usage
entry = {
    'title': 'Café Culture in Montréal',
    'author': 'François Dubé',
    'journal': 'Études Québécoises'
}

# Convert for LaTeX output
latex_entry = process_entry_latex(entry, to_unicode=False)
print(latex_entry['title'])   # {C}af{\'e} {C}ulture in {M}ontr{\'e}al
print(latex_entry['author'])  # ç and é are replaced by LaTeX commands, e.g. Dub{\'e}
```

### Handling Different Character Sets

```python
from bibtexparser.latexenc import latex_to_unicode, string_to_latex

# European accented characters
text_fr = "Élève français à l'école"
latex_fr = string_to_latex(text_fr)
print(latex_fr)  # accented letters become commands such as {\'E}, {\`e}, {\`a}

# German umlauts
text_de = "Müller über Käse"
latex_de = string_to_latex(text_de)
print(latex_de)  # M{\"u}ller {\"u}ber K{\"a}se

# Greek letters used as mathematical symbols
text_math = "α-particle β-decay γ-ray"
latex_math = string_to_latex(text_math)
print(latex_math)  # \alpha -particle \beta -decay \gamma -ray

# Convert back
unicode_math = latex_to_unicode(latex_math)
print(unicode_math)  # α-particle β-decay γ-ray
```

### Integration with BibTeX Processing

```python
import bibtexparser
from bibtexparser.latexenc import latex_to_unicode, string_to_latex, protect_uppercase

TEXT_FIELDS = ['title', 'author', 'journal', 'booktitle', 'publisher']

def latex_processing_customization(record):
    """Customization function for LaTeX processing."""
    for field in TEXT_FIELDS:
        if field in record:
            # Convert LaTeX to Unicode for processing
            record[field] = latex_to_unicode(record[field])
            # Keep a LaTeX version re-encoded from the Unicode text
            record[f'{field}_latex'] = string_to_latex(record[field])

    # Protect uppercase in title for BibTeX output
    if 'title' in record:
        record['title_protected'] = protect_uppercase(record['title_latex'])

    return record

# Use with parser
parser = bibtexparser.bparser.BibTexParser(customization=latex_processing_customization)
with open('bibliography.bib') as f:
    db = parser.parse_file(f)

# Entries now have both Unicode and LaTeX versions
for entry in db.entries:
    print(f"Unicode title: {entry.get('title', '')}")
    print(f"LaTeX title: {entry.get('title_latex', '')}")
    print(f"Protected title: {entry.get('title_protected', '')}")
```

### Custom Character Mappings

```python
from bibtexparser.latexenc import unicode_to_latex_map

# Check available mappings
print(f"Total mappings: {len(unicode_to_latex_map)}")

# Find specific character mappings
for char, latex in unicode_to_latex_map.items():
    if 'alpha' in latex.lower():
        print(f"'{char}' -> '{latex}'")

# Custom extension of mappings (the copy leaves the library's map untouched)
custom_mappings = unicode_to_latex_map.copy()
custom_mappings['™'] = '\\texttrademark'
custom_mappings['©'] = '\\textcopyright'

def custom_string_to_latex(string):
    """Custom conversion with additional mappings."""
    result = []
    for char in string:
        if char in [' ', '{', '}']:
            # Spaces and braces are copied through unchanged
            result.append(char)
        else:
            result.append(custom_mappings.get(char, char))
    return ''.join(result)
```
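Before extending a mapping, it can help to see which characters in your data have no entry at all. A minimal sketch, using a hypothetical helper `find_unmapped` and a small stand-in dictionary rather than the library's map:

```python
def find_unmapped(text: str, mapping: dict) -> list:
    """Return the non-ASCII characters in text that mapping lacks, in first-seen order."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in mapping and ch not in seen:
            seen.append(ch)
    return seen

# Stand-in for a real mapping such as unicode_to_latex_map (assumption for illustration).
demo_map = {"é": "{\\'e}"}

print(find_unmapped("café ™ naïve ™", demo_map))  # ['™', 'ï']
```

Passing the real map instead of `demo_map` would list the characters that still need a custom entry.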
