tessl/pypi-python-bidi

Python Bidi layout wrapping the Rust crate unicode-bidi

- Workspace: tessl
- Visibility: Public
- Describes: pypipkg:pypi/python-bidi@0.6.x

To install, run

npx @tessl/cli install tessl/pypi-python-bidi@0.6.0

# Python Bidi

Python BiDi provides bi-directional (BiDi) text layout support for Python applications, enabling correct display of text that mixes left-to-right and right-to-left scripts (for example, Arabic or Hebrew mixed with English). The library offers two implementations: a high-performance Rust-based implementation (the default) and a pure Python implementation for compatibility.

## Architecture

python-bidi uses a dual-implementation approach to provide both performance and compatibility:

- **Rust Implementation (Default)**: High-performance implementation using the `unicode-bidi` Rust crate, compiled as a Python extension module (`.bidi`). Implements a more recent version of the Unicode BiDi algorithm.
- **Pure Python Implementation**: Compatible fallback implementation in pure Python, implementing Unicode BiDi algorithm version 5. Provides additional debugging features and internal API access.
- **Unified API**: Both implementations expose the same primary functions (`get_display`, `get_base_level`) with identical behavior for standard use cases.
- **Implementation Selection**: The default import (`from bidi import ...`) uses the Rust implementation; the Python implementation must be imported explicitly via `from bidi.algorithm import ...`.

## Package Information

- **Package Name**: python-bidi
- **Language**: Python
- **Installation**: `pip install python-bidi`

## Core Imports

Main API (Rust-based implementation):

```python
from bidi import get_display, get_base_level
```

Pure Python implementation:

```python
from bidi.algorithm import get_display, get_base_level
```

## Basic Usage

```python
from bidi import get_display

# Hebrew text example
hebrew_text = "שלום"
display_text = get_display(hebrew_text)
print(display_text)  # Outputs correctly ordered text for display

# Mixed text with numbers
mixed_text = "1 2 3 ניסיון"
display_text = get_display(mixed_text)
print(display_text)  # "ןויסינ 3 2 1"

# Working with bytes and encoding
hebrew_bytes = "שלם".encode('utf-8')
display_bytes = get_display(hebrew_bytes, encoding='utf-8')
print(display_bytes.decode('utf-8'))

# Override base direction
text = "hello world"
rtl_display = get_display(text, base_dir='R')
print(rtl_display)

# Debug mode to see algorithm steps
debug_output = get_display("hello שלום", debug=True)
# Outputs algorithm steps to stderr
```

## Capabilities

### Text Layout Processing

Converts logical text order to visual display order according to the Unicode BiDi algorithm.

```python { .api }
def get_display(
    str_or_bytes: StrOrBytes,
    encoding: str = "utf-8",
    base_dir: Optional[str] = None,
    debug: bool = False
) -> StrOrBytes:
    """
    Convert text from logical order to visual display order.

    Args:
        str_or_bytes: Input text as string or bytes
        encoding: Encoding to use if input is bytes (default: "utf-8")
        base_dir: Override base direction ('L' for LTR, 'R' for RTL)
        debug: Enable debug output to stderr (default: False)

    Returns:
        Processed text in same type as input (str or bytes)
    """
```
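
To make the idea of "logical order to visual order" concrete, the following is a minimal sketch of the final reordering step of the Unicode BiDi algorithm (rule L2 of UAX #9): from the highest embedding level down to the lowest odd level, each contiguous run of characters at that level or higher is reversed. It is an illustration only, not python-bidi's code. The function name `reorder_by_levels` is hypothetical, and the embedding levels are supplied by hand here, whereas the library computes them from the text.

```python
from typing import Sequence


def reorder_by_levels(text: str, levels: Sequence[int]) -> str:
    """Illustrative sketch of UAX #9 rule L2 (hypothetical helper, not library code)."""
    if len(text) != len(levels):
        raise ValueError("text and levels must have the same length")
    if not levels:
        return text

    # Keep each character paired with its level so both move together.
    items = list(zip(text, levels))
    highest = max(levels)
    lowest_odd = min((lv for lv in levels if lv % 2), default=None)
    if lowest_odd is None:
        return text  # Only even (LTR) levels: nothing to reverse.

    for level in range(highest, lowest_odd - 1, -1):
        start = None
        # Iterate one past the end so a run touching the end is closed too.
        for i in range(len(items) + 1):
            in_run = i < len(items) and items[i][1] >= level
            if in_run and start is None:
                start = i
            elif not in_run and start is not None:
                items[start:i] = items[start:i][::-1]
                start = None

    return "".join(ch for ch, _ in items)


# Uppercase letters stand in for RTL characters (level 1);
# the digits sit at level 2, as numbers do inside an RTL run.
print(reorder_by_levels("abCD12EF", [0, 0, 1, 1, 2, 2, 1, 1]))  # abFE12DC
```

In the example, the RTL run is reversed as a whole while the digits keep their left-to-right order, which is the same effect `get_display` produces for numbers embedded in Hebrew text.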

### Base Direction Detection

Determines the base paragraph direction of text.

```python { .api }
def get_base_level(text: str) -> int:
    """
    Get the base embedding level of the first paragraph in text.

    Args:
        text: Input text string

    Returns:
        Base level (0 for LTR, 1 for RTL)
    """
```
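
As a rough illustration of how a base level can be derived, the sketch below applies the "first strong character" heuristic (rules P2 and P3 of UAX #9) using only the standard library's `unicodedata` module. It is not the library's implementation: `first_strong_base_level` is a hypothetical name, and the sketch returns 0 when no strong character is found, which may differ from how either python-bidi implementation treats such input.

```python
import unicodedata


def first_strong_base_level(text: str) -> int:
    """Sketch of UAX #9 rules P2/P3 (hypothetical helper, not library code)."""
    isolate_depth = 0
    for ch in text:
        bidi_type = unicodedata.bidirectional(ch)
        if bidi_type == "B":
            # Paragraph separator: only the first paragraph is inspected.
            break
        if bidi_type in ("LRI", "RLI", "FSI"):
            isolate_depth += 1  # Characters inside isolates are skipped.
        elif bidi_type == "PDI":
            if isolate_depth:
                isolate_depth -= 1
        elif isolate_depth == 0:
            if bidi_type == "L":
                return 0  # First strong character is left-to-right.
            if bidi_type in ("R", "AL"):
                return 1  # First strong character is right-to-left.
    return 0  # Assumption: default to LTR when no strong character is found.


print(first_strong_base_level("hello שלום"))  # 0
print(first_strong_base_level("שלום hello"))  # 1
print(first_strong_base_level("123 שלום"))    # 1 (digits are not strong)
```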

### Pure Python Implementation

For compatibility, or when the Rust implementation is not available, use the pure Python implementation.

```python { .api }
# From bidi.algorithm module
def get_display(
    str_or_bytes: StrOrBytes,
    encoding: str = "utf-8",
    upper_is_rtl: bool = False,
    base_dir: Optional[str] = None,
    debug: bool = False
) -> StrOrBytes:
    """
    Pure Python implementation of BiDi text layout.

    Args:
        str_or_bytes: Input text as string or bytes
        encoding: Encoding to use if input is bytes (default: "utf-8")
        upper_is_rtl: Treat uppercase chars as strong RTL for debugging (default: False)
        base_dir: Override base direction ('L' for LTR, 'R' for RTL)
        debug: Enable debug output to stderr (default: False)

    Returns:
        Processed text in same type as input (str or bytes)
    """

def get_base_level(text, upper_is_rtl: bool = False) -> int:
    """
    Get base embedding level using Python implementation.

    Args:
        text: Input text string
        upper_is_rtl: Treat uppercase chars as strong RTL for debugging (default: False)

    Returns:
        Base level (0 for LTR, 1 for RTL)
    """
```

### Internal Algorithm Functions

For advanced usage, the Python implementation exposes internal algorithm functions.

```python { .api }
def get_empty_storage() -> dict:
    """
    Return empty storage skeleton for testing and advanced usage.

    Returns:
        Dictionary with keys: base_level, base_dir, chars, runs
    """

def get_embedding_levels(text, storage, upper_is_rtl: bool = False, debug: bool = False):
    """
    Get paragraph embedding levels and populate storage with character data.

    Args:
        text: Input text string
        storage: Storage dictionary from get_empty_storage()
        upper_is_rtl: Treat uppercase chars as strong RTL (default: False)
        debug: Enable debug output (default: False)
    """

def debug_storage(storage, base_info: bool = False, chars: bool = True, runs: bool = False):
    """
    Display debug information for storage object.

    Args:
        storage: Storage dictionary
        base_info: Show base level and direction info (default: False)
        chars: Show character data (default: True)
        runs: Show level runs (default: False)
    """
```

### Mirror Character Mappings

Access to Unicode character mirroring data.

```python { .api }
from bidi.mirror import MIRRORED

# MIRRORED is a dictionary mapping characters to their mirrored versions
# Example: MIRRORED['('] == ')'
```
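
The following sketch shows how a mirroring table of this shape might be used. It deliberately does not import `bidi.mirror`: `SAMPLE_MIRRORED` is a small hand-written stand-in for illustration, and `mirror_char` is a hypothetical helper. The standard library's `unicodedata.mirrored` is used only to show which characters carry the Unicode "mirrored" property.

```python
import unicodedata

# Hand-written stand-in for a mirroring table (assumption: the real
# MIRRORED dictionary has the same character-to-character shape).
SAMPLE_MIRRORED = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "<": ">", ">": "<",
}


def mirror_char(ch: str, mapping: dict = SAMPLE_MIRRORED) -> str:
    """Return the mirrored counterpart of ch, or ch itself if it has none."""
    return mapping.get(ch, ch)


print(mirror_char("("))               # )
print(mirror_char("a"))               # a
print(unicodedata.mirrored("("))      # 1: "(" has the mirrored property
print(unicodedata.mirrored("a"))      # 0: "a" does not
```

In a BiDi layout pass, such a lookup would typically be applied only to characters that end up at an odd (right-to-left) embedding level.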

### Command Line Interface

Use the `pybidi` command to process text from the command line.

```bash
# Basic usage
pybidi "your text here"

# Read from stdin
echo "your text here" | pybidi

# Use Rust implementation (default is Python)
pybidi -r "your text here"

# Override base direction
pybidi -b R "your text here"

# Enable debug output
pybidi -d "your text here"

# Specify encoding
pybidi -e utf-8 "your text here"

# For Python implementation, treat uppercase as RTL (debugging)
pybidi -u "Your Text HERE"
```

## Version Information

Access version information for the package:

```python { .api }
from bidi import VERSION, VERSION_TUPLE

# VERSION is a string like "0.6.0"
# VERSION_TUPLE is a tuple like (0, 6, 0)
```

## Main Function API

The package provides a main function for command-line usage:

```python { .api }
from bidi import main

def main():
    """
    Command-line interface function for pybidi.

    Processes command line arguments and applies BiDi algorithm to input text.
    Used by the pybidi console script. Reads from arguments or stdin,
    supports all CLI options (encoding, base direction, debug, etc.).

    Returns:
        None (outputs processed text to stdout)
    """
```

## Types

```python { .api }
from typing import Union, Optional, List, Dict, Any
from collections import deque

# Type aliases used in the API
StrOrBytes = Union[str, bytes]

# Storage structure (Python implementation)
Storage = Dict[str, Any]  # Contains:
# {
#     "base_level": int,    # Base embedding level (0 for LTR, 1 for RTL)
#     "base_dir": str,      # Base direction ('L' or 'R')
#     "chars": List[Dict],  # Character data with level, type, original type
#     "runs": deque         # Level runs for processing
# }

# Character object structure (within Storage["chars"])
Character = Dict[str, Union[str, int]]  # Contains:
# {
#     "ch": str,     # The character
#     "level": int,  # Embedding level
#     "type": str,   # BiDi character type
#     "orig": str    # Original BiDi character type
# }
```

## Implementation Differences

### Rust Implementation (Default)

- Higher performance
- Implements a more recent version of the Unicode BiDi algorithm
- Access via `from bidi import get_display, get_base_level` (uses the compiled `.bidi` module)
- Does NOT support the `upper_is_rtl` parameter
- Debug output: formatted debug representation of the internal BidiInfo structure
- Limited to the main API functions

### Python Implementation

- Pure Python compatibility
- Implements Unicode BiDi algorithm v5
- Access via `from bidi.algorithm import get_display, get_base_level`
- Supports the `upper_is_rtl` parameter for debugging
- Exposes internal algorithm functions for advanced usage
- Debug output: detailed step-by-step algorithm information to stderr
- Suitable for educational purposes or when the Rust implementation is unavailable

## Error Handling

Both implementations handle common error cases gracefully:

### Common Error Conditions:

- **Invalid encodings**: Raise the standard Python `UnicodeDecodeError` or `UnicodeEncodeError`
- **Empty or None text inputs**: Handled safely; the call either returns an empty string or raises `ValueError`
- **Invalid `base_dir` values**: The Rust implementation raises `ValueError` for values other than 'L', 'R', or None
- **Malformed Unicode text**: Processed according to the Unicode BiDi algorithm specification

### Rust Implementation Specific:

- **Empty paragraphs**: `get_base_level_inner()` raises `ValueError` for text with no paragraphs
- **Invalid base_dir**: Raises `ValueError` with the message "base_dir can be 'L', 'R' or None"
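
Because the pure Python implementation is not documented here as validating `base_dir`, callers who want the same behavior from both implementations could check the value up front. The sketch below is one way to do that. `check_base_dir` is a hypothetical helper, not part of python-bidi; it simply reuses the error message quoted above.

```python
from typing import Optional


def check_base_dir(base_dir: Optional[str]) -> Optional[str]:
    """Hypothetical pre-check mirroring the documented rule; not library code."""
    if base_dir not in ("L", "R", None):
        raise ValueError("base_dir can be 'L', 'R' or None")
    return base_dir


# Valid values pass through unchanged.
print(check_base_dir("R"))   # R
print(check_base_dir(None))  # None

# Anything else is rejected before the text is processed.
try:
    check_base_dir("rtl")
except ValueError as exc:
    print(f"rejected: {exc}")
```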

### Python Implementation Specific:

- **Assertion errors**: Internal algorithm functions may raise `AssertionError` for invalid character types
- **Debug mode**: Outputs debugging information to `sys.stderr`, does not raise exceptions

### Encoding Support:

Supports any encoding that Python's `str.encode()` and `bytes.decode()` support, including:

- UTF-8 (default)
- UTF-16, UTF-32
- ASCII, Latin-1
- Windows code pages (cp1252, cp1255 for Hebrew)
- ISO encodings (iso-8859-1, iso-8859-8 for Hebrew)
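
One way to picture the bytes handling is as a decode, process, re-encode round trip. The sketch below shows that pattern with the standard library only. It is an assumption about the general shape, not python-bidi's code: `transform_encoded` and `reverse_text` are hypothetical names, and plain string reversal stands in for the real layout step.

```python
from typing import Callable


def transform_encoded(data: bytes, encoding: str,
                      transform: Callable[[str], str]) -> bytes:
    """Decode bytes, apply a str -> str transform, re-encode (illustrative)."""
    text = data.decode(encoding)        # May raise UnicodeDecodeError.
    return transform(text).encode(encoding)  # May raise UnicodeEncodeError.


def reverse_text(text: str) -> str:
    """Stand-in for a layout function; simply reverses the string."""
    return text[::-1]


hebrew = "שלום"
for enc in ("utf-8", "cp1255", "iso-8859-8"):
    encoded = hebrew.encode(enc)
    result = transform_encoded(encoded, enc, reverse_text)
    print(enc, len(encoded), result.decode(enc) == hebrew[::-1])

# An encoding that cannot represent the text fails at encode time.
try:
    hebrew.encode("ascii")
except UnicodeEncodeError as exc:
    print(f"ascii cannot encode Hebrew: {exc.reason}")
```

The loop also shows why the encoding matters: the same four letters occupy 8 bytes in UTF-8 but 4 bytes in cp1255 and iso-8859-8, so bytes must always be decoded with the encoding that produced them.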

## Usage Examples

### Processing Mixed Language Text

```python
from bidi import get_display

# English with Hebrew
text = "Hello שלום World"
display = get_display(text)
print(display)  # Correctly ordered for display

# Numbers with RTL text
text = "הספר עולה 25 שקל"
display = get_display(text)
print(display)  # Numbers maintain LTR order within RTL text
```

### Working with Different Encodings

```python
from bidi import get_display

# Hebrew text in different encoding
hebrew_cp1255 = "שלום".encode('cp1255')
display = get_display(hebrew_cp1255, encoding='cp1255')
print(display.decode('cp1255'))
```

### Debugging Text Processing

```python
from bidi.algorithm import get_display, debug_storage, get_empty_storage, get_embedding_levels

# Enable debug output
text = "Hello שלום"
display = get_display(text, debug=True)
# Outputs detailed algorithm steps to stderr

# Manual debugging with storage
storage = get_empty_storage()
get_embedding_levels(text, storage)
debug_storage(storage, base_info=True, chars=True, runs=True)
```