# wcwidth

A Python implementation of the POSIX `wcwidth()` and `wcswidth()` C functions, which determine the printable width of Unicode strings on a terminal. A string's length in code points does not always equal its display width, because characters can occupy 0 cells (zero-width and combining characters), 1 cell (most characters), or 2 cells (wide East Asian characters).

The library bundles character width tables for many Unicode versions. The version used can be chosen per call or through an environment variable, which makes the library useful for CLI applications, terminal emulators, and any other software that formats or aligns text in a terminal.

## Package Information

- **Package Name**: wcwidth
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install wcwidth`
- **Version**: 0.2.13
- **License**: MIT

## Core Imports

```python
import wcwidth
```

Selective imports for commonly used functions:

```python
from wcwidth import wcwidth, wcswidth, list_versions
```

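Wildcard import. This brings in only the public API listed in `__all__` (`wcwidth`, `wcswidth`, `list_versions`); private helpers such as `_bisearch` are re-exported by the top-level package but must be imported by name:

```python
from wcwidth import *
```
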
## Basic Usage

```python
from wcwidth import wcwidth, wcswidth

# Get the width of a single character
char_width = wcwidth('A')              # Returns 1
wide_char_width = wcwidth('コ')        # Returns 2 (Japanese katakana)
zero_width = wcwidth('\u200d')         # Returns 0 (zero-width joiner)

# Get the width of a string
string_width = wcswidth('Hello')       # Returns 5
japanese_width = wcswidth('コンニチハ')  # Returns 10
mixed_width = wcswidth('Hello コ')     # Returns 8 (5 letters + 1 space + 2 for コ)

# Use the tables of a specific Unicode version
width_unicode_9 = wcwidth('🎉', unicode_version='9.0.0')
```

## Architecture

The wcwidth library is built around Unicode character width tables and binary search:

- **Character Width Tables**: Pre-computed tables, keyed by Unicode version, holding code point ranges for zero-width, wide, and VS16 narrow-to-wide characters
- **Binary Search**: `_bisearch()` checks whether a code point falls inside one of a table's sorted `(start, end)` ranges
- **Unicode Version Support**: Selectable tables for Unicode versions 4.1.0 through 15.1.0
- **Caching**: `functools.lru_cache` memoizes `wcwidth()` and the version-matching helpers
- **Environment Integration**: With `unicode_version='auto'` (the default), the table version is read from the `UNICODE_VERSION` environment variable

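The binary-search lookup can be sketched in pure Python. This is a minimal illustration of the contract documented for `_bisearch()` (return 1 if the code point lies in any range, else 0), not the library's source; the name `bisearch` and the two-range `TOY_ZERO_WIDTH` table are hypothetical.

```python
def bisearch(ucs, table):
    """Return 1 if code point `ucs` lies in a range of `table`, else 0.

    `table` is a sequence of non-overlapping (start, end) pairs sorted by
    start, with both bounds inclusive.
    """
    lo, hi = 0, len(table) - 1
    # Quick rejection: empty table, or below the first / above the last range.
    if not table or ucs < table[0][0] or ucs > table[hi][1]:
        return 0
    while lo <= hi:
        mid = (lo + hi) // 2
        start, end = table[mid]
        if ucs > end:
            lo = mid + 1      # target can only be in a later range
        elif ucs < start:
            hi = mid - 1      # target can only be in an earlier range
        else:
            return 1          # start <= ucs <= end
    return 0


# Hypothetical miniature table: combining diacritics and U+200B..U+200F.
TOY_ZERO_WIDTH = ((0x0300, 0x036F), (0x200B, 0x200F))

print(bisearch(0x0301, TOY_ZERO_WIDTH))  # 1 (combining acute accent)
print(bisearch(0x0041, TOY_ZERO_WIDTH))  # 0 ('A')
```

Each step halves the candidate ranges, so a lookup costs O(log n) comparisons in the number of ranges rather than in the number of code points they cover.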
## Capabilities

### Character Width Calculation

Core functions for determining the printable width of Unicode characters and strings in terminal environments.

```python { .api }
def wcwidth(wc, unicode_version='auto'):
    """
    Given one Unicode character, return its printable length on a terminal.

    Parameters:
    - wc: str, a single Unicode character
    - unicode_version: str, Unicode version ('auto', 'latest', or a specific version such as '9.0.0')

    Returns:
    int, the width in cells:
    - -1: not printable, or indeterminate effect (control characters)
    - 0: does not advance the cursor (NULL, combining characters, zero-width characters)
    - 1: normal-width characters
    - 2: wide characters (East Asian wide and full-width)
    """

def wcswidth(pwcs, n=None, unicode_version='auto'):
    """
    Given a Unicode string, return its printable length on a terminal.

    Parameters:
    - pwcs: str, Unicode string to measure
    - n: int, optional maximum number of characters to measure (for POSIX compatibility)
    - unicode_version: str, Unicode version ('auto', 'latest', or a specific version)

    Returns:
    int, total width in cells, or -1 if any measured character is not printable
    """
```

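To make the return-value contract concrete, here is a rough pure-Python approximation built on the standard library's `unicodedata` module. It is a stand-in under stated assumptions, not wcwidth's algorithm: `unicodedata` follows the Unicode version of the running Python rather than the bundled tables, and `approx_wcwidth` and `approx_wcswidth` are hypothetical names. The string function shows how per-character widths combine: they are summed, the optional `n` limits how many characters are measured, and any -1 makes the whole result -1. ZWJ and VS16 sequences, which `wcswidth()` treats specially, are not handled here.

```python
import unicodedata


def approx_wcwidth(ch):
    """Approximate wcwidth() for one character using stdlib unicodedata."""
    cp = ord(ch)
    if cp == 0:
        return 0                          # NULL does not advance the cursor
    if cp < 32 or 0x7F <= cp < 0xA0:
        return -1                         # C0 controls, DEL, C1 controls
    if unicodedata.category(ch) in ('Mn', 'Me', 'Cf'):
        return 0                          # combining marks, format chars like ZWJ
    if unicodedata.east_asian_width(ch) in ('W', 'F'):
        return 2                          # East Asian wide or full-width
    return 1


def approx_wcswidth(pwcs, n=None):
    """Sum approx_wcwidth() over the first n characters (all if n is None)."""
    end = len(pwcs) if n is None else n
    total = 0
    for ch in pwcs[:end]:
        width = approx_wcwidth(ch)
        if width < 0:
            return -1                     # one unprintable character poisons the total
        total += width
    return total


print(approx_wcswidth('Hello コ'))         # 8
print(approx_wcswidth('コンニチハ', n=2))  # 4
print(approx_wcswidth('Hello\x01World'))   # -1
```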
### Unicode Version Management

Functions for working with supported Unicode versions and version matching.

```python { .api }
def list_versions():
    """
    Return Unicode version levels supported by this module release.

    Returns:
    tuple of str, supported Unicode version numbers in ascending sorted order
    """
```

### Internal/Advanced Functions

Private helpers that the top-level package re-exports for advanced use. They are not part of the public API and may change or disappear in a future release.

```python { .api }
def _bisearch(ucs, table):
    """
    Auxiliary function for binary search in an interval table.

    Parameters:
    - ucs: int, ordinal value of a Unicode character
    - table: list, list of starting and ending ranges as [(start, end), ...]

    Returns:
    int, 1 if ordinal value ucs is found within the lookup table, else 0
    """

def _wcmatch_version(given_version):
    """
    Return the nearest matching supported Unicode version level.

    Parameters:
    - given_version: str, version to compare; may be 'auto' or 'latest'

    Returns:
    str, matched Unicode version string
    """

def _wcversion_value(ver_string):
    """
    Integer-mapped value of the given dotted version string.

    Parameters:
    - ver_string: str, Unicode version string of the form 'n.n.n'

    Returns:
    tuple of int, one integer per dotted component (e.g. '9.0.0' -> (9, 0, 0))
    """
```

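The version-matching rule can be sketched as below. This assumes the behavior of the 0.2.x `_wcmatch_version()`: 'latest' maps to the newest table, an exact match is returned as-is, a shorter version matches on its leading components ('13.0' selects '13.0.0'), and anything else falls back to the highest supported version below it. `SUPPORTED`, `version_value`, and `match_version` are hypothetical names, the version list is abbreviated, and the warnings and 'auto' environment lookup are omitted.

```python
# Hypothetical, abbreviated list of supported versions (the real one is longer).
SUPPORTED = ('4.1.0', '5.0.0', '9.0.0', '12.0.0', '12.1.0', '13.0.0', '15.1.0')


def version_value(ver_string):
    """Map '12.1.0' to (12, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in ver_string.split('.'))


def match_version(given, supported=SUPPORTED):
    """Return the supported version that best matches `given` (sketch only)."""
    if given == 'latest':
        return supported[-1]
    if given in supported:
        return given                      # exact match
    want = version_value(given)           # raises ValueError if unparsable
    best = supported[0]                   # below every version: use the lowest
    for candidate in supported:
        value = version_value(candidate)
        if value[:len(want)] == want:
            return candidate              # leading components match: '13.0' -> '13.0.0'
        if value <= want:
            best = candidate              # highest supported version not above `given`
    return best


print(match_version('13.0'))     # 13.0.0
print(match_version('10.0.0'))   # 9.0.0, since 10.0.0 is not in this toy list
print(sorted(['9.0.0', '12.0.0']))                     # ['12.0.0', '9.0.0'] (string order)
print(sorted(['9.0.0', '12.0.0'], key=version_value))  # ['9.0.0', '12.0.0']
```

The last two lines show why an integer-tuple form of the version is useful: plain string comparison puts '12.0.0' before '9.0.0'.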
## Constants and Tables

Character width lookup tables and constants for different character categories.

```python { .api }
ZERO_WIDTH: dict
# Unicode character table for zero-width characters by version
# Format: {'version': [(start, end), ...]}

WIDE_EASTASIAN: dict
# Unicode character table for wide East Asian characters by version
# Format: {'version': [(start, end), ...]}

VS16_NARROW_TO_WIDE: dict
# Unicode character table for characters that variation selector 16 (U+FE0F) changes from narrow to wide
# Format: {'version': [(start, end), ...]}

__version__: str
# Package version string, currently '0.2.13'
```

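How a `{'version': ranges}` table is consulted can be sketched as follows, assuming the lookup order of the 0.2.x `wcwidth()`: the zero-width table for the chosen version is checked first, and otherwise the width is 1 plus membership in the wide table. The miniature tables are hypothetical and far smaller than the real ones, the ASCII fast path and control-character checks are omitted, and the range search uses the standard `bisect` module in place of `_bisearch()`.

```python
import bisect

# Hypothetical miniature tables in the {'version': ((start, end), ...)} layout.
MINI_ZERO_WIDTH = {'15.1.0': ((0x0300, 0x036F), (0x200B, 0x200F))}
MINI_WIDE_EASTASIAN = {'15.1.0': ((0x1100, 0x115F), (0x3041, 0x3096), (0x30A1, 0x30FA))}


def in_table(cp, ranges):
    """Return 1 if cp falls inside one of the sorted, inclusive ranges."""
    starts = [start for start, _ in ranges]     # a real table would precompute this
    idx = bisect.bisect_right(starts, cp) - 1   # last range whose start <= cp
    return int(idx >= 0 and cp <= ranges[idx][1])


def table_width(ch, version='15.1.0'):
    """0 if zero-width, otherwise 1 + membership in the wide table."""
    cp = ord(ch)
    if in_table(cp, MINI_ZERO_WIDTH[version]):
        return 0
    return 1 + in_table(cp, MINI_WIDE_EASTASIAN[version])


print(table_width('A'), table_width('コ'), table_width('\u0301'))  # 1 2 0
```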
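## Environment Variables

### UNICODE_VERSION

Controls which Unicode version tables are used when `unicode_version='auto'` (the default) is in effect. A value that is not an exact supported version is matched to the nearest supported one, so `'13.0'` selects the `13.0.0` tables.

```python
import os

from wcwidth import wcwidth

# Set this before the first call: wcwidth() and the version lookup are cached,
# so changing the variable later in the same process may have no effect.
os.environ['UNICODE_VERSION'] = '13.0'

# wcwidth() now uses the Unicode 13.0.0 tables by default
width = wcwidth('🎉')
```

If not set, it defaults to the latest supported version (15.1.0).
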
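Why the variable should be set early can be illustrated with a small model of an `lru_cache`-memoized function that reads the environment. `resolve_version` is hypothetical, not a wcwidth function; the sketch assumes wcwidth's 'auto' version lookup is memoized in a similar way, so the first value it sees sticks until the cache is cleared.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=8)
def resolve_version(given='auto'):
    """Hypothetical model: resolve 'auto' from UNICODE_VERSION, memoized."""
    if given == 'auto':
        return os.environ.get('UNICODE_VERSION', 'latest')
    return given


os.environ['UNICODE_VERSION'] = '13.0'
first = resolve_version()            # reads the environment: '13.0'

os.environ['UNICODE_VERSION'] = '9.0'
second = resolve_version()           # served from the cache: still '13.0'

resolve_version.cache_clear()
third = resolve_version()            # cache emptied, environment re-read: '9.0'
print(first, second, third)          # 13.0 13.0 9.0
```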
## Supported Unicode Versions

The library supports the following Unicode versions:

- **4.1.0** through **15.1.0**
- Complete list: 4.1.0, 5.0.0, 5.1.0, 5.2.0, 6.0.0, 6.1.0, 6.2.0, 6.3.0, 7.0.0, 8.0.0, 9.0.0, 10.0.0, 11.0.0, 12.0.0, 12.1.0, 13.0.0, 14.0.0, 15.0.0, 15.1.0

## Special Character Handling

### Zero-Width Joiner (ZWJ) Sequences

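```python
from wcwidth import wcswidth

# wcswidth() does not measure a ZWJ (U+200D) or the character that follows it,
# so a ZWJ sequence takes the width of its first emoji.
# Family emoji: man, ZWJ, woman, ZWJ, girl, ZWJ, boy (escapes make the ZWJs visible)
emoji_sequence = '\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466'
width = wcswidth(emoji_sequence)  # Returns 2 with the default (latest) tables
```
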
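The ZWJ rule can be modeled in a few lines, assuming the 0.2.x behavior in which `wcswidth()` skips a ZWJ together with the character after it. Here `char_width` is a hypothetical stand-in built on `unicodedata.east_asian_width()`, used so the sketch runs without wcwidth installed; only the skipping logic is meant to mirror the library.

```python
import unicodedata

ZWJ = '\u200d'


def char_width(ch):
    """Hypothetical per-character width: 0 for ZWJ, 2 for wide, else 1."""
    if ch == ZWJ:
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1


def zwj_aware_width(text):
    """Sum widths, skipping each ZWJ together with the character after it."""
    total, idx = 0, 0
    while idx < len(text):
        if text[idx] == ZWJ:
            idx += 2              # the joined character renders inside the previous glyph
            continue
        total += char_width(text[idx])
        idx += 1
    return total


family = '\U0001F468' + ZWJ + '\U0001F469' + ZWJ + '\U0001F467' + ZWJ + '\U0001F466'
print(zwj_aware_width(family))                # 2
print(sum(char_width(ch) for ch in family))   # 8 when each emoji is measured separately
```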
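### Variation Selector 16 (VS16)

```python
from wcwidth import wcswidth

# VS16 (U+FE0F) requests emoji presentation, which can turn a narrow character
# wide. U+2764 HEAVY BLACK HEART is normally 1 cell; followed by VS16 it is
# measured as 2 cells when the Unicode 9.0.0 or later tables are in use.
text_with_vs16 = '\u2764\uFE0F'
width = wcswidth(text_with_vs16, unicode_version='9.0.0')
```
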
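A sketch of the VS16 adjustment, assuming the rule used by the 0.2.x `wcswidth()`: when U+FE0F follows a character listed in the narrow-to-wide table, one extra cell is added, and only for Unicode 9.0.0 or later. `NARROW_TO_WIDE` is a hypothetical two-entry subset of that table, and `base_width` is a stdlib-based stand-in rather than wcwidth's own width lookup.

```python
import unicodedata

VS16 = '\uFE0F'
# Hypothetical subset of the narrow-to-wide table: bases that are narrow alone
# but wide in emoji presentation (base + VS16).
NARROW_TO_WIDE = {0x2600, 0x2764}   # BLACK SUN WITH RAYS, HEAVY BLACK HEART


def base_width(ch):
    """Stand-in width: VS16 itself is 0, wide/full-width is 2, else 1."""
    if ch == VS16:
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1


def vs16_aware_width(text, version=(15, 1, 0)):
    """Sum widths, adding 1 when VS16 widens the preceding narrow character."""
    total, last = 0, None
    for ch in text:
        if ch == VS16 and last is not None:
            if version >= (9, 0, 0) and ord(last) in NARROW_TO_WIDE:
                total += 1          # emoji presentation widens 1 cell to 2
            last = None             # a VS16 applies to one base character only
            continue
        width = base_width(ch)
        total += width
        if width > 0:
            last = ch               # remember the last character that took a cell
    return total


print(vs16_aware_width('\u2764\uFE0F'))                    # 2
print(vs16_aware_width('\u2764\uFE0F', version=(8, 0, 0)))  # 1
```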
### Control Characters

```python
from wcwidth import wcwidth, wcswidth

# Control characters return -1
control_char_width = wcwidth('\x01')              # Returns -1
string_with_control = wcswidth('Hello\x01World')  # Returns -1
```

## Error Handling

The library handles various edge cases:

- **Empty strings**: `wcwidth('')` returns 0, and `wcswidth('')` returns 0
- **Control characters**: C0 controls, DEL, and C1 controls return -1
- **Unsupported Unicode versions**: A version that is not exactly supported is matched to the nearest supported version at or below it (for example, `'13.0'` selects `13.0.0`). A warning is issued when the value cannot be parsed (the latest version is used) or is lower than 4.1.0 (4.1.0 is used)
- **Mixed printable/non-printable**: `wcswidth()` returns -1 if any measured character is non-printable

## Performance Considerations

- **LRU Caching**: `wcwidth()` uses `@lru_cache(maxsize=1000)`
- **Version Matching**: Unicode version matching is cached with `@lru_cache(maxsize=8)`
- **Version Parsing**: Version string parsing is cached with `@lru_cache(maxsize=128)`
- **ASCII Optimization**: Printable ASCII characters (code points 32-126) take a fast path that returns 1 before any table lookup; 127 (DEL) is a control character

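The caching pattern can be reproduced with the standard library. This sketch assumes code that must also run on Python versions older than 3.2 would fall back to `backports.functools_lru_cache`, the module provided by the backports package listed under Dependencies; `cached_width` is a hypothetical toy function with an ASCII fast path, and `cache_info()` shows repeated characters being served from the cache.

```python
try:
    from functools import lru_cache                      # Python 3.2+
except ImportError:
    from backports.functools_lru_cache import lru_cache  # older-Python backport


@lru_cache(maxsize=1000)
def cached_width(ch):
    """Hypothetical width function with an ASCII fast path."""
    if 32 <= ord(ch) < 0x7F:
        return 1                                  # printable ASCII: no table lookup
    return 2 if '\u3040' <= ch <= '\u30ff' else 1  # toy rule: kana are wide


text = 'コンニチハ' * 100 + 'hello' * 100
total = sum(cached_width(ch) for ch in text)
info = cached_width.cache_info()
print(total, info.hits, info.misses)   # 1500 991 9
```

Only the 9 distinct characters miss the cache; the remaining 991 calls are answered from it.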
## Dependencies

- **backports.functools-lru-cache**: Required for Python < 3.2
- **No other runtime dependencies**

## Common Use Cases

### Terminal Text Alignment

```python
from wcwidth import wcswidth

def terminal_center(text, width):
    """Center text in a field of `width` terminal cells."""
    text_width = wcswidth(text)
    if text_width < 0:
        return text  # contains unprintable characters, so the width is unknown
    padding = max(0, width - text_width)
    left_pad = padding // 2
    return ' ' * left_pad + text

# Usage
centered = terminal_center('Hello コンニチハ', 40)
```

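### Text Truncation

```python
from wcwidth import wcswidth

def truncate_to_width(text, max_width, ellipsis='…'):
    """Truncate text so that it, plus the ellipsis, fits in max_width cells."""
    total = wcswidth(text)
    if total < 0 or total <= max_width:
        return text  # unprintable (width unknown) or already fits
    budget = max_width - wcswidth(ellipsis)  # reserve cells for the ellipsis
    end = 0
    # Grow the prefix while it still fits; measuring prefixes with wcswidth()
    # accounts for wide characters and zero-width combining characters.
    while end < len(text) and wcswidth(text[:end + 1]) <= budget:
        end += 1
    return text[:end] + ellipsis

# Usage
truncated = truncate_to_width('Very long text with unicode コンニチハ', 20)
```
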
### Column Formatting

```python
from wcwidth import wcswidth

def format_columns(rows, column_widths):
    """Format rows as left-aligned columns, padding by display width."""
    formatted_rows = []
    for row in rows:
        formatted_row = []
        for cell, width in zip(row, column_widths):
            text = str(cell)
            # Treat unprintable text (-1) as width 0 instead of over-padding it
            cell_width = max(wcswidth(text), 0)
            padding = max(0, width - cell_width)
            formatted_row.append(text + ' ' * padding)
        formatted_rows.append(''.join(formatted_row))
    return formatted_rows

# Usage
data = [['Name', 'Age', 'City'], ['Alice', '25', 'Tokyo 東京']]
formatted = format_columns(data, [15, 5, 20])
```