
tessl/pypi-bibtexparser

A comprehensive BibTeX parser library for Python 3 that enables parsing and writing of bibliographic data files

LaTeX Encoding Utilities

Utilities for converting between LaTeX-encoded text and Unicode, supporting a comprehensive range of special characters, accents, and symbols commonly found in bibliographic data. These functions handle the complexities of LaTeX character encoding in academic publications.

Capabilities

Unicode to LaTeX Conversion

Convert Unicode characters to their LaTeX equivalents for compatibility with LaTeX-based typesetting systems.

def string_to_latex(string: str) -> str:
    """
    Convert a Unicode string to its LaTeX equivalent.
    
    Looks up each character in unicode_to_latex_map and replaces it with
    the corresponding LaTeX command. Spaces and brace characters are copied
    unchanged, as are characters that have no entry in the map.
    
    Parameters:
    - string (str): Unicode string to convert
    
    Returns:
    str: LaTeX-encoded string with Unicode characters converted to LaTeX commands
    
    Example:
    >>> string_to_latex("café résumé")
    "caf{\\'e} r{\\'e}sum{\\'e}"
    """
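Conceptually, the encoder is a per-character table lookup. The following is a minimal sketch of that approach, not bibtexparser's own code: `SAMPLE_MAP`, `PASSTHROUGH` and `sketch_string_to_latex` are hypothetical names, and the three-entry table stands in for the much larger `unicode_to_latex_map`.

```python
# Hypothetical miniature table standing in for unicode_to_latex_map.
SAMPLE_MAP = {
    "é": "{\\'e}",
    "ü": '{\\"u}',
    "ñ": "{\\~n}",
}

# Characters copied verbatim instead of being looked up.
PASSTHROUGH = {" ", "{", "}"}


def sketch_string_to_latex(text: str) -> str:
    """Encode text one character at a time using SAMPLE_MAP."""
    out = []
    for char in text:
        if char in PASSTHROUGH:
            out.append(char)
        else:
            # Characters missing from the table are kept unchanged.
            out.append(SAMPLE_MAP.get(char, char))
    return "".join(out)


print(sketch_string_to_latex("café über"))  # caf{\'e} {\"u}ber
```

A dictionary lookup per character keeps the conversion linear in the length of the input, and the pass-through set is what lets already-braced groups survive encoding.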

LaTeX to Unicode Conversion

Convert LaTeX-encoded text to Unicode characters for modern text processing and display.

def latex_to_unicode(string: str) -> str:
    """
    Convert a LaTeX string to Unicode equivalent.
    
    Replaces known LaTeX commands with the corresponding Unicode
    characters, normalizes the result to NFC form, and removes any
    remaining braces, which are treated as LaTeX grouping.
    
    Parameters:
    - string (str): LaTeX string to convert
    
    Returns:
    str: Unicode string with LaTeX commands converted to Unicode characters
    
    Example:
    >>> latex_to_unicode("caf{\\'e} r{\\'e}sum{\\'e}")
    'café résumé'
    """
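Decoding runs the other way: known LaTeX snippets are replaced, the text is normalized, and leftover braces are dropped. The sketch below illustrates those three steps with a hypothetical two-entry table; `SAMPLE_REVERSE` and `sketch_latex_to_unicode` are illustrative names, not library API. One entry deliberately produces a decomposed accent (a base letter followed by a combining mark) to show why the NFC step matters.

```python
import unicodedata

# Hypothetical reverse table: LaTeX snippet -> Unicode text.
SAMPLE_REVERSE = {
    "{\\'e}": "e\u0301",   # decomposed on purpose: "e" + COMBINING ACUTE ACCENT
    '{\\"o}': "\u00f6",    # precomposed "ö"
}


def sketch_latex_to_unicode(text: str) -> str:
    """Replace known snippets, normalize to NFC, then drop grouping braces."""
    for latex, uni in SAMPLE_REVERSE.items():
        text = text.replace(latex, uni)
    # NFC composes "e" + U+0301 into the single code point U+00E9 ("é").
    text = unicodedata.normalize("NFC", text)
    # Any braces still present were only LaTeX grouping.
    return text.replace("{", "").replace("}", "")


result = sketch_latex_to_unicode("caf{\\'e} {Schr{\\\"o}dinger}")
print(result)  # café Schrödinger
```

Without the normalization step, the decoded "é" would be two code points, so string comparisons and lengths would not match text typed with a precomposed "é".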

Uppercase Protection

Protect uppercase letters in titles so that bibliography styles which change the case of titles keep them as written.

def protect_uppercase(string: str) -> str:
    """
    Protect uppercase letters for BibTeX by wrapping them in braces.
    
    BibTeX and LaTeX bibliography styles often convert titles to sentence case,
    which can incorrectly lowercase proper nouns and acronyms. Braced letters
    are left as written. An uppercase letter is wrapped only when it is not
    already preceded by "{" or followed by "}"; because each match also
    consumes the character after the letter, only the first letter of a run
    of capitals such as "DNA" is wrapped.
    
    Parameters:
    - string (str): String to process
    
    Returns:
    str: String with uppercase letters wrapped in braces
    
    Example:
    >>> protect_uppercase("The DNA Analysis")
    '{T}he {D}NA {A}nalysis'
    """
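BibTeX styles leave any braced text as written, so a common convention is to brace a whole acronym as one group, such as {DNA}, instead of individual letters. The helper below is a hypothetical alternative written for illustration and is not part of bibtexparser. It braces whole words made of two or more capitals and skips words that are already braced.

```python
import re

# Hypothetical helper (not a bibtexparser API): matches whole words made of
# two or more capitals that are not already directly wrapped in braces.
ACRONYM = re.compile(r"(?<!\{)\b[A-Z]{2,}\b(?!\})")


def protect_acronyms(title: str) -> str:
    """Wrap each all-caps word of two or more letters in one pair of braces."""
    return ACRONYM.sub(lambda m: "{" + m.group(0) + "}", title)


print(protect_acronyms("The DNA Analysis of {RNA} in NLP"))
# The {DNA} Analysis of {RNA} in {NLP}
```

Single capital letters, such as the first letter of a proper noun, are not touched by this pattern and still need protect_uppercase or manual bracing.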

Legacy Mapping Tables

The names unicode_to_latex, unicode_to_crappy_latex1 and unicode_to_crappy_latex2 are exported for backwards compatibility. Despite the naming, they are not conversion functions and cannot be called on a string: each is a sequence of (Unicode, LaTeX) pairs consulted by the conversion functions above. unicode_to_latex holds the main table from which unicode_to_latex_map is built; the two "crappy" tables, described under Mapping Constants, cover older or less standard LaTeX spellings.

Mapping Constants

Pre-built mappings for character conversion used by the conversion functions.

unicode_to_latex_map: dict
"""
Dictionary mapping Unicode characters to LaTeX commands.
Comprehensive mapping covering accented characters, symbols,
mathematical characters, and special typography.
"""

unicode_to_crappy_latex1: list
"""
List of (Unicode, LaTeX) tuples for legacy conversion approach.
Contains mappings that may not follow modern LaTeX best practices.
"""

unicode_to_crappy_latex2: list
"""
List of (Unicode, LaTeX) tuples for alternative legacy conversion.
Contains additional legacy mappings for special cases.
"""
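Characters missing from the table are passed through unchanged, so it can be useful to check which characters in a string have no LaTeX equivalent before writing a .bib file. The sketch below uses a hypothetical helper, `find_unmapped`, with a tiny stand-in table; in practice you would pass `unicode_to_latex_map` instead. That wiring is an assumption about how you might use it, not a library feature.

```python
from typing import Mapping, Set


def find_unmapped(text: str, table: Mapping[str, str]) -> Set[str]:
    """Return the non-ASCII characters in text that have no entry in table."""
    # Only non-ASCII characters are checked; plain ASCII is assumed to be safe.
    return {ch for ch in text if ord(ch) > 127 and ch not in table}


# Hypothetical stand-in for unicode_to_latex_map.
SAMPLE_TABLE = dict([("é", "{\\'e}"), ("ñ", "{\\~n}")])

missing = find_unmapped("Peña café Łódź", SAMPLE_TABLE)
print(sorted(missing))  # ['ó', 'Ł', 'ź']
```

The result is sorted by code point, which is why "ó" (U+00F3) comes before "Ł" (U+0141).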

Usage Examples

Basic Conversion

from bibtexparser.latexenc import latex_to_unicode, string_to_latex

# Convert LaTeX to Unicode
latex_title = r"Schr{\"o}dinger's Cat in Quantum Mechanics"
unicode_title = latex_to_unicode(latex_title)
print(unicode_title)  # Output: Schrödinger's Cat in Quantum Mechanics

# Convert Unicode to LaTeX
unicode_author = "José María Azañar"
latex_author = string_to_latex(unicode_author)
print(latex_author)  # Output: Jos{\'e} Mar{\'\i}a Aza{\~n}ar

Title Protection for BibTeX

from bibtexparser.latexenc import protect_uppercase, string_to_latex

# Protect acronyms and proper nouns in titles
title = "The Effect of DNA Analysis on RNA Processing"
protected_title = protect_uppercase(title)
print(protected_title)  # Output: {T}he {E}ffect of {D}NA {A}nalysis on {R}NA {P}rocessing

# Use in BibTeX entry
entry = {
    'title': protect_uppercase("Machine Learning Applications in NLP"),
    'author': string_to_latex("José García")
}

Processing Bibliographic Data

from bibtexparser.latexenc import latex_to_unicode, string_to_latex, protect_uppercase

def process_entry_latex(entry, to_unicode=True):
    """Process entry LaTeX encoding."""
    processed = entry.copy()
    
    if to_unicode:
        # Convert LaTeX to Unicode
        for field in ['title', 'author', 'journal', 'booktitle']:
            if field in processed:
                processed[field] = latex_to_unicode(processed[field])
    else:
        # Convert Unicode to LaTeX and protect titles
        for field in ['author', 'journal', 'booktitle']:
            if field in processed:
                processed[field] = string_to_latex(processed[field])
        
        # Special handling for titles
        if 'title' in processed:
            processed['title'] = protect_uppercase(string_to_latex(processed['title']))
    
    return processed

# Example usage
entry = {
    'title': 'Café Culture in Montréal',
    'author': 'François Dubé',
    'journal': 'Études Québécoises'
}

# Convert for LaTeX output
latex_entry = process_entry_latex(entry, to_unicode=False)
print(latex_entry['title'])   # {C}af{\'e} {C}ulture in {M}ontr{\'e}al
print(latex_entry['author'])  # Fran{\c{c}}ois Dub{\'e}

Handling Different Character Sets

from bibtexparser.latexenc import latex_to_unicode, string_to_latex

# European accented characters
text_fr = "Élève français à l'école"
latex_fr = string_to_latex(text_fr)
print(latex_fr)  # {\'E}l{\`e}ve fran{\c{c}}ais {\`a} l'{\'e}cole

# German umlauts
text_de = "Müller über Käse"
latex_de = string_to_latex(text_de)
print(latex_de)  # M{\"u}ller {\"u}ber K{\"a}se

# Mathematical symbols
text_math = "α-particle β-decay γ-ray"
latex_math = string_to_latex(text_math)
print(latex_math)  # \alpha -particle \beta -decay \gamma -ray

# Convert back
unicode_math = latex_to_unicode(latex_math)
print(unicode_math)  # α-particle β-decay γ-ray

Integration with BibTeX Processing

from bibtexparser.bparser import BibTexParser
from bibtexparser.latexenc import latex_to_unicode, string_to_latex, protect_uppercase

TEXT_FIELDS = ['title', 'author', 'journal', 'booktitle', 'publisher']

def latex_processing_customization(record):
    """Customization function that keeps Unicode and LaTeX versions of text fields."""
    for field in TEXT_FIELDS:
        if field in record:
            # Decode LaTeX to Unicode for processing
            record[field] = latex_to_unicode(record[field])
            # Re-encode as LaTeX; this is regenerated from the Unicode text,
            # so it may differ from the LaTeX in the source file
            record[f'{field}_latex'] = string_to_latex(record[field])
    
    # Protect uppercase in title for BibTeX output
    if 'title' in record:
        record['title_protected'] = protect_uppercase(record['title_latex'])
    
    return record

# Use with parser
parser = BibTexParser(customization=latex_processing_customization)
with open('bibliography.bib', encoding='utf-8') as f:
    db = parser.parse_file(f)

# Entries now have both Unicode and LaTeX versions
for entry in db.entries:
    print(f"Unicode title: {entry.get('title', '')}")
    print(f"LaTeX title: {entry.get('title_latex', '')}")
    print(f"Protected title: {entry.get('title_protected', '')}")

Custom Character Mappings

from bibtexparser.latexenc import unicode_to_latex_map

# Check available mappings
print(f"Total mappings: {len(unicode_to_latex_map)}")

# Find specific character mappings
for char, latex in unicode_to_latex_map.items():
    if 'alpha' in latex.lower():
        print(f"'{char}' -> '{latex}'")

# Custom extension of mappings
custom_mappings = unicode_to_latex_map.copy()
custom_mappings['™'] = '\\texttrademark'
custom_mappings['©'] = '\\textcopyright'

def custom_string_to_latex(string):
    """Custom conversion with additional mappings."""
    result = []
    for char in string:
        if char in [' ', '{', '}']:
            result.append(char)
        else:
            result.append(custom_mappings.get(char, char))
    return ''.join(result)

Install with Tessl CLI

npx tessl i tessl/pypi-bibtexparser@1.4.2
