tessl/pypi-python-bidi

Python Bidi layout wrapping the Rust crate unicode-bidi

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/python-bidi@0.6.x

To install, run

npx @tessl/cli install tessl/pypi-python-bidi@0.6.0

Python Bidi

Python BiDi provides bidirectional (BiDi) text layout support for Python applications, enabling correct display of text that mixes left-to-right and right-to-left scripts (for example, Arabic or Hebrew mixed with English). The library offers two implementations: a high-performance Rust-based implementation (the default) and a pure Python implementation for compatibility.

Architecture

Python-bidi uses a dual-implementation approach to provide both performance and compatibility:

Rust Implementation (Default): High-performance implementation using the unicode-bidi Rust crate, compiled as a Python extension module (.bidi). Implements a more recent version of the Unicode BiDi algorithm.
Pure Python Implementation: Compatible fallback implementation in pure Python, implementing Unicode BiDi algorithm version 5. Provides additional debugging features and internal API access.
Unified API: Both implementations expose the same primary functions (get_display, get_base_level) with identical behavior for standard use cases.
Implementation Selection: Importing from the package root (from bidi import ...) uses the Rust implementation; the Python implementation must be imported explicitly from bidi.algorithm (from bidi.algorithm import ...).

Package Information

Package Name: python-bidi
Language: Python
Installation: pip install python-bidi

Core Imports

Main API (Rust-based implementation):

from bidi import get_display, get_base_level

Pure Python implementation:

from bidi.algorithm import get_display, get_base_level

Basic Usage

from bidi import get_display

# Hebrew text example
hebrew_text = "שלום"
display_text = get_display(hebrew_text)
print(display_text)  # Outputs correctly ordered text for display

# Mixed text with numbers
mixed_text = "1 2 3 ניסיון"
display_text = get_display(mixed_text)
print(display_text)  # "ןויסינ 3 2 1"

# Working with bytes and encoding
hebrew_bytes = "שלם".encode('utf-8')
display_bytes = get_display(hebrew_bytes, encoding='utf-8')
print(display_bytes.decode('utf-8'))

# Override base direction
text = "hello world"
rtl_display = get_display(text, base_dir='R')
print(rtl_display)

# Debug mode (this import uses the Rust implementation, which emits a
# formatted debug representation of its internal BidiInfo structure)
debug_output = get_display("hello שלום", debug=True)

Capabilities

Text Layout Processing

Converts logical text order to visual display order according to the Unicode BiDi algorithm.

def get_display(
    str_or_bytes: StrOrBytes,
    encoding: str = "utf-8",
    base_dir: Optional[str] = None,
    debug: bool = False
) -> StrOrBytes:
    """
    Convert text from logical order to visual display order.
    
    Args:
        str_or_bytes: Input text as string or bytes
        encoding: Encoding to use if input is bytes (default: "utf-8")
        base_dir: Override base direction ('L' for LTR, 'R' for RTL)
        debug: Enable debug output to stderr (default: False)
    
    Returns:
        Processed text in same type as input (str or bytes)
    """

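The "same type as input" contract described above can be pictured as a thin decode, transform, encode wrapper. The following is a minimal sketch and not the library's code: apply_preserving_type is a hypothetical helper, and reverse_text is a stand-in for the real layout step so that the snippet runs with the standard library alone.

```python
from typing import Callable, Union

StrOrBytes = Union[str, bytes]


def apply_preserving_type(
    transform: Callable[[str], str],
    data: StrOrBytes,
    encoding: str = "utf-8",
) -> StrOrBytes:
    """Run a str -> str transform and return the same type that came in."""
    if isinstance(data, bytes):
        # Decode first, transform as text, then re-encode with the same codec.
        return transform(data.decode(encoding)).encode(encoding)
    return transform(data)


def reverse_text(text: str) -> str:
    """Stand-in transform (hypothetical); a real caller would pass a layout function."""
    return text[::-1]


as_text = apply_preserving_type(reverse_text, "abc")
as_bytes = apply_preserving_type(reverse_text, "abc".encode("utf-8"))
print(type(as_text).__name__, type(as_bytes).__name__)
```

The point of the sketch is only the type round trip: bytes in, bytes out, with the encoding argument used in both directions.
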
Base Direction Detection

Determines the base paragraph direction of text.

def get_base_level(text: str) -> int:
    """
    Get the base embedding level of the first paragraph in text.
    
    Args:
        text: Input text string
    
    Returns:
        Base level (0 for LTR, 1 for RTL)
    """

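To make the idea of a base level concrete, here is a simplified sketch of the "first strong character" rule from the Unicode BiDi algorithm (rules P2 and P3), written with the standard library's unicodedata module. It is an illustration under stated assumptions, not what either implementation actually runs: the function name first_strong_level is hypothetical, directional isolates are ignored, and an empty paragraph falls back to 0 here.

```python
import unicodedata


def first_strong_level(text: str) -> int:
    """Return 1 if the first strong character is R or AL, otherwise 0."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "B":
            # Paragraph separator: only the first paragraph is inspected.
            break
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    # No strong character found: default to left-to-right.
    return 0


print(first_strong_level("hello"))      # Latin letters are strong LTR
print(first_strong_level("123 שלום"))   # digits are weak, so the Hebrew letter decides
```

Digits and spaces are skipped because they are not strong characters, which is why text that starts with numbers can still resolve to a right-to-left base level.
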
Pure Python Implementation

For compatibility, or when the Rust implementation is not available, use the pure Python implementation.

# From bidi.algorithm module
def get_display(
    str_or_bytes: StrOrBytes,
    encoding: str = "utf-8", 
    upper_is_rtl: bool = False,
    base_dir: Optional[str] = None,
    debug: bool = False
) -> StrOrBytes:
    """
    Pure Python implementation of BiDi text layout.
    
    Args:
        str_or_bytes: Input text as string or bytes
        encoding: Encoding to use if input is bytes (default: "utf-8")
        upper_is_rtl: Treat uppercase chars as strong RTL for debugging (default: False)
        base_dir: Override base direction ('L' for LTR, 'R' for RTL)
        debug: Enable debug output to stderr (default: False)
    
    Returns:
        Processed text in same type as input (str or bytes)
    """

def get_base_level(text, upper_is_rtl: bool = False) -> int:
    """
    Get base embedding level using Python implementation.
    
    Args:
        text: Input text string
        upper_is_rtl: Treat uppercase chars as strong RTL for debugging (default: False)
    
    Returns:
        Base level (0 for LTR, 1 for RTL)
    """

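The upper_is_rtl flag is easiest to understand as an override applied when each character's BiDi class is looked up. The sketch below illustrates that idea with the standard library; debug_bidi_class is a hypothetical name, and the exact lookup inside the library is assumed rather than quoted.

```python
import unicodedata


def debug_bidi_class(ch: str, upper_is_rtl: bool = False) -> str:
    """Return the BiDi class of ch, treating uppercase letters as strong RTL on request."""
    if upper_is_rtl and ch.isupper():
        return "R"
    return unicodedata.bidirectional(ch)


sample = "car is THE CAR"
classes = [debug_bidi_class(ch, upper_is_rtl=True) for ch in sample]
print(list(zip(sample, classes)))
```

With this override, plain ASCII test strings can exercise right-to-left behavior without typing Hebrew or Arabic, which is why the flag is described as a debugging aid.
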
Internal Algorithm Functions

For advanced usage, the Python implementation exposes internal algorithm functions.

def get_empty_storage() -> dict:
    """
    Return empty storage skeleton for testing and advanced usage.
    
    Returns:
        Dictionary with keys: base_level, base_dir, chars, runs
    """

def get_embedding_levels(text, storage, upper_is_rtl: bool = False, debug: bool = False):
    """
    Get paragraph embedding levels and populate storage with character data.
    
    Args:
        text: Input text string
        storage: Storage dictionary from get_empty_storage()
        upper_is_rtl: Treat uppercase chars as strong RTL (default: False)
        debug: Enable debug output (default: False)
    """

def debug_storage(storage, base_info: bool = False, chars: bool = True, runs: bool = False):
    """
    Display debug information for storage object.
    
    Args:
        storage: Storage dictionary
        base_info: Show base level and direction info (default: False)
        chars: Show character data (default: True) 
        runs: Show level runs (default: False)
    """

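The "runs" mentioned above are stretches of adjacent characters that share an embedding level. As a rough illustration of that concept only, the sketch below groups a list of levels into runs with itertools.groupby. The function name level_runs and the start/length/level fields are assumptions made for this example; the entries the library stores may carry different fields.

```python
from itertools import groupby
from typing import Dict, List


def level_runs(levels: List[int]) -> List[Dict[str, int]]:
    """Group consecutive equal levels into runs (hypothetical record shape)."""
    runs = []
    start = 0
    for level, group in groupby(levels):
        length = len(list(group))
        runs.append({"start": start, "length": length, "level": level})
        start += length
    return runs


# Two LTR characters, three RTL characters, one LTR character.
for run in level_runs([0, 0, 1, 1, 1, 0]):
    print(run)
```
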
Mirror Character Mappings

Access to Unicode character mirroring data.

from bidi.mirror import MIRRORED

# MIRRORED is a dictionary mapping characters to their mirrored versions
# Example: MIRRORED['('] == ')'
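
To show where a mirroring table fits in, the sketch below swaps mirrored characters that sit at an odd (right-to-left) embedding level, in the spirit of rule L4 of the Unicode BiDi algorithm. MIRROR_SUBSET is a small hand-written table for illustration and is not the library's MIRRORED dictionary; mirror_at_odd_levels is a hypothetical helper.

```python
import unicodedata
from typing import Sequence

# Hand-written excerpt for illustration only (assumption, not bidi.mirror.MIRRORED).
MIRROR_SUBSET = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def mirror_at_odd_levels(text: str, levels: Sequence[int]) -> str:
    """Replace mirrored characters by their counterpart when their level is odd."""
    out = []
    for ch, level in zip(text, levels):
        if level % 2 == 1 and unicodedata.mirrored(ch):
            # Characters missing from the small table are left unchanged.
            out.append(MIRROR_SUBSET.get(ch, ch))
        else:
            out.append(ch)
    return "".join(out)


print(mirror_at_odd_levels("(a)", [1, 1, 1]))
print(mirror_at_odd_levels("(a)", [0, 0, 0]))
```
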

Command Line Interface

Use the pybidi command to process text from the command line.

# Basic usage
pybidi "your text here"

# Read from stdin
echo "your text here" | pybidi

# Use Rust implementation (default is Python)
pybidi -r "your text here"

# Override base direction
pybidi -b R "your text here"

# Enable debug output
pybidi -d "your text here"

# Specify encoding
pybidi -e utf-8 "your text here"

# For Python implementation, treat uppercase as RTL (debugging)
pybidi -u "Your Text HERE"
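
One way to read the flags above is as a mapping onto the keyword arguments of get_display. The following argparse sketch mirrors only the short options listed here; it is not the package's main function, the name parse_cli and the defaults are assumptions, and the rule that upper_is_rtl is passed only to the Python implementation is inferred from the signatures documented earlier.

```python
import argparse
from typing import Any, Dict, List, Tuple


def parse_cli(argv: List[str]) -> Tuple[bool, Dict[str, Any], List[str]]:
    """Translate pybidi-style short flags into keyword arguments (sketch)."""
    parser = argparse.ArgumentParser(prog="pybidi-sketch")
    parser.add_argument("-r", dest="use_rust", action="store_true")
    parser.add_argument("-u", dest="upper_is_rtl", action="store_true")
    parser.add_argument("-d", dest="debug", action="store_true")
    parser.add_argument("-e", dest="encoding", default="utf-8")
    parser.add_argument("-b", dest="base_dir", choices=["L", "R"], default=None)
    parser.add_argument("text", nargs="*")
    options = parser.parse_args(argv)

    params: Dict[str, Any] = {
        "encoding": options.encoding,
        "base_dir": options.base_dir,
        "debug": options.debug,
    }
    if not options.use_rust:
        # Only the pure Python get_display accepts upper_is_rtl.
        params["upper_is_rtl"] = options.upper_is_rtl
    return options.use_rust, params, options.text


use_rust, params, text = parse_cli(["-b", "R", "-u", "Your Text HERE"])
print(use_rust, params, text)
```
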

Version Information

Access version information for the package:

from bidi import VERSION, VERSION_TUPLE

# VERSION is a string like "0.6.0"
# VERSION_TUPLE is a tuple like (0, 6, 0)
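
A tuple form is convenient because tuples compare element by element, which makes version gates straightforward. The sketch below shows that relationship with a hypothetical parse_version helper; it assumes a purely numeric dotted version string and would raise ValueError on suffixes such as pre-release tags.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a dotted numeric version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


version_tuple = parse_version("0.6.0")
# Tuple comparison is numeric per component, unlike string comparison.
print(version_tuple >= (0, 6, 0), parse_version("0.10.0") > parse_version("0.9.0"))
```
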

Main Function API

The package provides a main function for command-line usage:

from bidi import main

def main():
    """
    Command-line interface function for pybidi.
    
    Processes command line arguments and applies BiDi algorithm to input text.
    Used by the pybidi console script. Reads from arguments or stdin,
    supports all CLI options (encoding, base direction, debug, etc.).
    
    Returns:
        None (outputs processed text to stdout)
    """

Types

from typing import Union, Optional, List, Dict, Any
from collections import deque

# Type aliases used in the API
StrOrBytes = Union[str, bytes]

# Storage structure (Python implementation)
Storage = Dict[str, Any]  # Contains:
# {
#     "base_level": int,        # Base embedding level (0 for LTR, 1 for RTL)
#     "base_dir": str,          # Base direction ('L' or 'R')
#     "chars": List[Dict],      # Character data with level, type, original type
#     "runs": deque             # Level runs for processing
# }

# Character object structure (within Storage["chars"])
Character = Dict[str, Union[str, int]]  # Contains:
# {
#     "ch": str,        # The character
#     "level": int,     # Embedding level
#     "type": str,      # BiDi character type
#     "orig": str       # Original BiDi character type
# }
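
For orientation, the sketch below builds a dictionary with the shape described in these comments, using the standard library to look up each character's BiDi class. It only illustrates the data layout: build_storage_sketch is a hypothetical helper, every character simply starts at the base level, and no part of the algorithm is run, so the result is not what get_embedding_levels would produce for mixed text.

```python
import unicodedata
from collections import deque
from typing import Any, Dict


def build_storage_sketch(text: str, base_level: int = 0) -> Dict[str, Any]:
    """Create a Storage-shaped dict with one Character record per input character."""
    chars = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        chars.append(
            {"ch": ch, "level": base_level, "type": bidi_class, "orig": bidi_class}
        )
    return {
        "base_level": base_level,
        "base_dir": "R" if base_level % 2 else "L",
        "chars": chars,
        "runs": deque(),  # left empty in this sketch
    }


storage = build_storage_sketch("a1", base_level=0)
for record in storage["chars"]:
    print(record)
```
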

Implementation Differences

Rust Implementation (Default)

Higher performance
Implements more recent Unicode BiDi algorithm
Access via from bidi import get_display, get_base_level (uses compiled .bidi module)
Does NOT support upper_is_rtl parameter
Debug output: Formatted debug representation of internal BidiInfo structure
Limited to main API functions

Python Implementation

Pure Python compatibility
Implements Unicode BiDi algorithm v5
Access via from bidi.algorithm import get_display, get_base_level
Supports upper_is_rtl parameter for debugging
Exposes internal algorithm functions for advanced usage
Debug output: Detailed step-by-step algorithm information to stderr
Suitable for educational purposes or when the Rust implementation is unavailable
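
Because the two implementations follow different versions of the algorithm, it can be useful to check whether they agree on the inputs an application actually uses. The helper below is a generic sketch for that purpose: find_differences is a hypothetical name, and the demonstration uses str.upper and str.swapcase as stand-ins so that the snippet runs without the package installed.

```python
from typing import Callable, Iterable, List, Tuple


def find_differences(
    impl_a: Callable[[str], str],
    impl_b: Callable[[str], str],
    samples: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (input, output_a, output_b) for every sample where the outputs differ."""
    differences = []
    for sample in samples:
        out_a, out_b = impl_a(sample), impl_b(sample)
        if out_a != out_b:
            differences.append((sample, out_a, out_b))
    return differences


# Stand-in callables; they agree on "abc" and disagree on "ABC".
print(find_differences(str.upper, str.swapcase, ["abc", "ABC"]))
```

In practice the two callables would be get_display imported from bidi and get_display imported from bidi.algorithm, with samples drawn from real application text.
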

Error Handling

Both implementations handle common error cases gracefully:

Common Error Conditions:

Invalid encodings: Raise standard Python UnicodeDecodeError or UnicodeEncodeError
Empty or None text inputs: Depending on the function and implementation, either return an empty string or raise ValueError
Invalid base_dir values: Rust implementation raises ValueError for values other than 'L', 'R', or None
Malformed Unicode text: Processed according to Unicode BiDi algorithm specifications

Rust Implementation Specific:

Empty paragraphs: get_base_level_inner() raises ValueError for text with no paragraphs
Invalid base_dir: Raises ValueError with message "base_dir can be 'L', 'R' or None"

Python Implementation Specific:

Assertion errors: Internal algorithm functions may raise AssertionError for invalid character types
Debug mode: Outputs debugging information to sys.stderr, does not raise exceptions

Encoding Support:

Supports any encoding that Python's str.encode() and bytes.decode() support, including:

UTF-8 (default)
UTF-16, UTF-32
ASCII, Latin-1
Windows code pages (cp1252, cp1255 for Hebrew)
ISO encodings (iso-8859-1, iso-8859-8 for Hebrew)
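
The error conditions above can be anticipated in calling code. The sketch below shows two defensive helpers built on the standard library: check_base_dir validates the argument up front, reusing the error message quoted above, and safe_decode shows how decoding with the wrong codec surfaces as UnicodeDecodeError. Both helper names are hypothetical and are not part of the package.

```python
from typing import Optional


def check_base_dir(base_dir: Optional[str]) -> Optional[str]:
    """Reject anything other than 'L', 'R' or None before calling the library."""
    if base_dir not in (None, "L", "R"):
        raise ValueError("base_dir can be 'L', 'R' or None")
    return base_dir


def safe_decode(data: bytes, encoding: str = "utf-8") -> Optional[str]:
    """Decode bytes, returning None when they are not valid in the given encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


hebrew_cp1255 = "שלום".encode("cp1255")
print(safe_decode(hebrew_cp1255))            # wrong codec: not valid UTF-8
print(safe_decode(hebrew_cp1255, "cp1255"))  # matching codec round-trips
```
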

Usage Examples

Processing Mixed Language Text

from bidi import get_display

# English with Hebrew
text = "Hello שלום World"
display = get_display(text)
print(display)  # Correctly ordered for display

# Numbers with RTL text
text = "הספר עולה 25 שקל"
display = get_display(text)
print(display)  # Numbers maintain LTR order within RTL text
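
Why numbers keep their left-to-right order can be seen from the reordering step of the Unicode BiDi algorithm (rule L2): runs are reversed from the highest embedding level down to the lowest odd level, so a number nested one level deeper than the surrounding right-to-left text is reversed twice. The sketch below implements that step for a single line, given levels that are supplied by hand. It is an illustration under those assumptions, not the library's code, and reorder_by_levels is a hypothetical name.

```python
from typing import List, Sequence


def reorder_by_levels(text: str, levels: Sequence[int]) -> str:
    """Apply a simplified form of rule L2 to one line of text."""
    chars: List[str] = list(text)
    lvls: List[int] = list(levels)
    if len(chars) != len(lvls):
        raise ValueError("text and levels must have the same length")
    odd_levels = [lvl for lvl in lvls if lvl % 2 == 1]
    if not odd_levels:
        return text  # purely left-to-right: nothing to reverse
    # From the highest level down to the lowest odd level, reverse every
    # contiguous run of characters whose level is at least that value.
    for level in range(max(lvls), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if lvls[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and lvls[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            lvls[i:j] = lvls[i:j][::-1]
            i = j
    return "".join(chars)


# ASCII stand-in: uppercase letters play the role of RTL text (level 1);
# the digits sit one level deeper (level 2), as numbers do in an RTL paragraph.
logical = "ABC 25 DEF"
levels = [1, 1, 1, 1, 2, 2, 1, 1, 1, 1]
print(reorder_by_levels(logical, levels))  # FED 25 CBA
```

The digits are reversed once at level 2 and again with the whole line at level 1, so "25" reads the same in the output while the surrounding words are flipped.
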

Working with Different Encodings

from bidi import get_display

# Hebrew text in different encoding
hebrew_cp1255 = "שלום".encode('cp1255')
display = get_display(hebrew_cp1255, encoding='cp1255')
print(display.decode('cp1255'))

Debugging Text Processing

from bidi.algorithm import get_display, debug_storage, get_empty_storage, get_embedding_levels

# Enable debug output
text = "Hello שלום"
display = get_display(text, debug=True)
# Outputs detailed algorithm steps to stderr

# Manual debugging with storage
storage = get_empty_storage()
get_embedding_levels(text, storage)
debug_storage(storage, base_info=True, chars=True, runs=True)