tessl/pypi-wcwidth

Measures the displayed width of unicode strings in a terminal

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/wcwidth@0.2.x

To install, run

npx @tessl/cli install tessl/pypi-wcwidth@0.2.0


wcwidth

A Python implementation of the POSIX wcwidth() and wcswidth() C functions for determining the printable width of Unicode strings on terminals. This library addresses the issue that string length doesn't always equal terminal display width due to characters that occupy 0 cells (zero-width/combining), 1 cell (normal), or 2 cells (wide East Asian characters).
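A quick way to see the gap between len() and display width is to measure the same strings both ways. This is a minimal sketch that uses only wcswidth(); the sample strings are arbitrary:

```python
from wcwidth import wcswidth

samples = {
    'ascii': 'Hello',          # 1 cell per character
    'wide': 'コンニチハ',        # East Asian wide characters: 2 cells each
    'combining': 'e\u0301',    # 'e' + COMBINING ACUTE ACCENT (0 cells)
}

for label, text in samples.items():
    # len() counts code points; wcswidth() counts terminal cells
    print(f'{label:<10} len={len(text)} cells={wcswidth(text)}')
```

This prints len=5 cells=5, len=5 cells=10 and len=2 cells=1 respectively.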

The library ships Unicode character width tables for many Unicode versions. The version used can be chosen per call with the unicode_version argument, or for the whole process with the UNICODE_VERSION environment variable. This makes it useful for CLI applications, terminal emulators, and any software that needs to align or format text in a terminal.

Package Information

  • Package Name: wcwidth
  • Package Type: pypi
  • Language: Python
  • Installation: pip install wcwidth
  • Version: 0.2.13
  • License: MIT

Core Imports

import wcwidth

Selective imports for commonly used functions:

from wcwidth import wcwidth, wcswidth, list_versions

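Import all (only the public API listed in __all__: wcwidth, wcswidth and list_versions; private helpers such as _bisearch must be imported by name):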

from wcwidth import *

Basic Usage

from wcwidth import wcwidth, wcswidth

# Get width of a single character
char_width = wcwidth('A')  # Returns 1
wide_char_width = wcwidth('コ')  # Returns 2 (Japanese character)
zero_width = wcwidth('\u200d')  # Returns 0 (zero-width joiner)

# Get width of a string
string_width = wcswidth('Hello')  # Returns 5
japanese_width = wcswidth('コンニチハ')  # Returns 10
mixed_width = wcswidth('Hello コ')  # Returns 8 (6 narrow cells + 2 wide)

# Use with specific Unicode version
width_unicode_9 = wcwidth('🎉', unicode_version='9.0.0')

Architecture

The wcwidth library is built around Unicode character width tables and binary search algorithms:

  • Character Width Tables: Pre-computed tables for different Unicode versions containing ranges for zero-width, wide, and special characters
  • Binary Search: Efficient lookup of character widths using _bisearch() function
  • Unicode Version Support: Configurable support for Unicode versions 4.1.0 through 15.1.0
  • Caching: LRU caches on core functions for performance optimization
  • Environment Integration: with unicode_version='auto' (the default), the version is taken from the UNICODE_VERSION environment variable, falling back to the latest supported version when it is unset
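The table lookup described above can be sketched in a few lines. This is a simplified re-implementation for illustration only, not the library's code: in_table, toy_width and the two toy tables are hypothetical, the real tables are much larger and keyed by Unicode version, and the control-character check is omitted.

```python
def in_table(cp, table):
    """Return True if code point cp lies in one of the sorted (start, end) ranges."""
    if not table or cp < table[0][0] or cp > table[-1][1]:
        return False  # outside the whole table: skip the search
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start, end = table[mid]
        if cp > end:
            lo = mid + 1
        elif cp < start:
            hi = mid - 1
        else:
            return True
    return False

# Hypothetical excerpts: combining diacritics, Hangul Jamo, kana, CJK ideographs.
TOY_ZERO_WIDTH = [(0x0300, 0x036F)]
TOY_WIDE = [(0x1100, 0x115F), (0x3040, 0x30FF), (0x4E00, 0x9FFF)]

def toy_width(ch):
    """Check the zero-width table first, then the wide table; everything else is 1."""
    cp = ord(ch)
    if in_table(cp, TOY_ZERO_WIDTH):
        return 0
    return 2 if in_table(cp, TOY_WIDE) else 1

print([toy_width(ch) for ch in 'Aコ\u0301中'])
```

Each lookup costs O(log n) comparisons in the number of ranges, which is why storing ranges rather than individual code points keeps the tables small and fast.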

Capabilities

Character Width Calculation

Core functions for determining the printable width of Unicode characters and strings in terminal environments.

def wcwidth(wc, unicode_version='auto'):
    """
    Given one Unicode character, return its printable length on a terminal.

    Parameters:
    - wc: str, a single Unicode character
    - unicode_version: str, Unicode version ('auto', 'latest', or specific version like '9.0.0')

    Returns:
    int, the width in cells:
    - -1: not printable or indeterminate effect (control characters)
    - 0: does not advance cursor (NULL, combining characters, zero-width)
    - 1: normal width characters
    - 2: wide characters (East Asian full-width)
    """

def wcswidth(pwcs, n=None, unicode_version='auto'):
    """
    Given a unicode string, return its printable length on a terminal.

    Parameters:
    - pwcs: str, unicode string to measure
    - n: int, optional maximum number of characters to measure (for POSIX compatibility)
    - unicode_version: str, Unicode version ('auto', 'latest', or specific version)

    Returns:
    int, total width in cells, or -1 if any character is not printable
    """

Unicode Version Management

Functions for working with supported Unicode versions and version matching.

def list_versions():
    """
    Return Unicode version levels supported by this module release.

    Returns:
    tuple of str, supported Unicode version numbers in ascending sorted order
    """

Internal/Advanced Functions

Internal functions exported for advanced usage, but not part of the main public API.

def _bisearch(ucs, table):
    """
    Auxiliary function for binary search in interval table.

    Parameters:
    - ucs: int, ordinal value of unicode character
    - table: list, list of starting and ending ranges as [(start, end), ...]

    Returns:
    int, 1 if ordinal value ucs is found within lookup table, else 0
    """

def _wcmatch_version(given_version):
    """
    Return nearest matching supported Unicode version level.

    Parameters:
    - given_version: str, version for compare, may be 'auto' or 'latest'

    Returns:
    str, matched unicode version string
    """

def _wcversion_value(ver_string):
    """
    Integer-mapped value of given dotted version string.

    Parameters:
    - ver_string: str, Unicode version string of form 'n.n.n'

    Returns:
    tuple of int, one integer per dotted component, e.g. (9, 0, 0) for '9.0.0'
    """

Constants and Tables

Character width lookup tables and constants for different character categories.

ZERO_WIDTH: dict
    # Unicode character table for zero-width characters by version
    # Format: {'version': [(start, end), ...]}

WIDE_EASTASIAN: dict
    # Unicode character table for wide East Asian characters by version
    # Format: {'version': [(start, end), ...]}

VS16_NARROW_TO_WIDE: dict
    # Characters that widen from 1 to 2 cells when followed by U+FE0F (VS16); applied for Unicode 9.0.0 and later
    # Format: {'version': [(start, end), ...]}

__version__: str
    # Package version string, currently '0.2.13'

Environment Variables

UNICODE_VERSION

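Controls which Unicode version tables are used when unicode_version='auto' (the default). The variable is read the first time a version is resolved, and that result is cached for the rest of the process, so set it before the first measurement.

import os
from wcwidth import wcwidth

os.environ['UNICODE_VERSION'] = '13.0'  # set before the first lookup; matched to 13.0.0

# Now wcwidth() will use Unicode 13.0.0 tables by default
width = wcwidth('🎉')  # Uses Unicode 13.0.0 tables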

If not set, defaults to the latest supported version (15.1.0).
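Because the 'auto' resolution is cached per process, comparing settings is simplest in a fresh interpreter per value. This is a minimal sketch; width_with_env is a hypothetical helper that runs the installed wcwidth in a child process:

```python
import os
import subprocess
import sys

def width_with_env(version, code_point):
    """Measure one code point in a fresh interpreter with UNICODE_VERSION preset."""
    env = dict(os.environ, UNICODE_VERSION=version)
    script = 'import sys, wcwidth; print(wcwidth.wcwidth(chr(int(sys.argv[1]))))'
    result = subprocess.run(
        [sys.executable, '-c', script, str(code_point)],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

party = 0x1F389  # PARTY POPPER
print(width_with_env('8.0.0', party), width_with_env('9.0.0', party))  # 1 2
```

Passing the code point as an integer avoids any dependence on how the platform encodes command-line arguments.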

Supported Unicode Versions

The library supports the following Unicode versions:

  • 4.1.0 through 15.1.0
  • Complete list: 4.1.0, 5.0.0, 5.1.0, 5.2.0, 6.0.0, 6.1.0, 6.2.0, 6.3.0, 7.0.0, 8.0.0, 9.0.0, 10.0.0, 11.0.0, 12.0.0, 12.1.0, 13.0.0, 14.0.0, 15.0.0, 15.1.0

Special Character Handling

Zero-Width Joiner (ZWJ) Sequences

from wcwidth import wcswidth

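# ZWJ sequences are handled specially: a ZWJ and the character after it are not measured
emoji_sequence = '\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466'  # 👨‍👩‍👧‍👦 family: 4 emoji joined by 3 ZWJs
width = wcswidth(emoji_sequence)  # Returns 2, the width of the first emoji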

Variation Selector 16 (VS16)

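from wcwidth import wcswidth
# VS16 (U+FE0F) can change a narrow character to wide
heart = '\u2764'  # HEAVY BLACK HEART, 1 cell on its own
wcswidth(heart + '\uFE0F')  # Returns 2 (Unicode 9.0.0 and later)
wcswidth(heart + '\uFE0F', unicode_version='8.0.0')  # Returns 1: VS16 rule not applied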

Control Characters

from wcwidth import wcwidth, wcswidth

# Control characters return -1
control_char_width = wcwidth('\x01')  # Returns -1
string_with_control = wcswidth('Hello\x01World')  # Returns -1
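To see exactly which characters are affected, scan Latin-1 for negative widths. In 0.2.13 these are the C0 controls (excluding NUL), DEL, and the C1 controls. The sketch also includes strip_controls, a hypothetical helper that drops such characters so that wcswidth() returns a usable width:

```python
from wcwidth import wcwidth, wcswidth

# Code points below U+0100 that wcwidth() reports as non-printable
negative = [cp for cp in range(0x100) if wcwidth(chr(cp)) < 0]
print(len(negative), hex(negative[0]), hex(negative[-1]))

def strip_controls(text):
    """Remove characters rated -1 so the rest of the string can be measured."""
    return ''.join(ch for ch in text if wcwidth(ch) >= 0)

print(wcswidth(strip_controls('Hello\x01World')))  # 10
```

Stripping is only appropriate for display purposes. Escape sequences such as color codes span several characters and need a dedicated parser.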

Error Handling

The library handles various edge cases:

  • Empty strings: wcwidth('') returns 0, wcswidth('') returns 0
  • Control characters: Return -1 for non-printable characters
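  • Unsupported Unicode versions: a parseable version between supported levels silently maps to the nearest lower supported level; a malformed value issues a warning and uses the latest level; a value below 4.1.0 issues a warning and uses 4.1.0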
  • Mixed printable/non-printable: wcswidth() returns -1 if any character is non-printable
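The fallback warnings can be captured with the standard warnings module. Version resolution is memoized, so a warning appears only the first time a given value is resolved in a process. A minimal sketch:

```python
import warnings
from wcwidth import wcwidth

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # A malformed level cannot be parsed, so the latest tables are used instead
    width = wcwidth('コ', unicode_version='not-a-version')

print(width)  # 2
for w in caught:
    print(w.category.__name__, w.message)
```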

Performance Considerations

  • LRU Caching: wcwidth() uses @lru_cache(maxsize=1000) for performance
  • Version Matching: Unicode version matching is cached with @lru_cache(maxsize=8)
  • Version Parsing: Version string parsing is cached with @lru_cache(maxsize=128)
  • ASCII Optimization: Fast path returning 1 for printable ASCII characters (32-126)
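Since wcwidth() is wrapped with functools.lru_cache, it exposes the standard cache_info() and cache_clear() methods. This sketch counts hits while measuring a string with repeated characters:

```python
from wcwidth import wcwidth, wcswidth

wcwidth.cache_clear()   # start from an empty cache for a clean count
wcswidth('コンコン')     # four wcwidth() calls, two distinct characters
info = wcwidth.cache_info()
print(info.hits, info.misses, info.maxsize)  # 2 2 1000
```

The same caching means that changing UNICODE_VERSION at runtime has no effect until both wcwidth.cache_clear() and _wcmatch_version.cache_clear() are called.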

Dependencies

  • backports.functools-lru-cache: Required for Python < 3.2
  • No other runtime dependencies

Common Use Cases

Terminal Text Alignment

from wcwidth import wcswidth

def terminal_center(text, width):
    """Center text in a field of `width` terminal cells."""
    text_width = wcswidth(text)
    if text_width < 0:
        return text  # Unprintable characters: width is undefined, leave as-is
    padding = max(0, width - text_width)
    left_pad = padding // 2
    return ' ' * left_pad + text + ' ' * (padding - left_pad)

# Usage
centered = terminal_center('Hello コンニチハ', 40)

Text Truncation

from wcwidth import wcswidth

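def truncate_to_width(text, max_width, ellipsis='…'):
    """Truncate text to fit within max_width terminal cells, ellipsis included."""
    if 0 <= wcswidth(text) <= max_width:
        return text  # Already fits
    # Reserve room for the ellipsis (U+2026 measures 1 cell)
    budget = max_width - wcswidth(ellipsis)
    end = 0
    # Extend the prefix while it stays printable and within budget
    while end < len(text) and 0 <= wcswidth(text[:end + 1]) <= budget:
        end += 1
    return text[:end] + ellipsis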

# Usage
truncated = truncate_to_width('Very long text with unicode コンニチハ', 20)

Column Formatting

from wcwidth import wcswidth

def format_columns(rows, column_widths):
    """Format data in aligned columns considering Unicode width."""
    formatted_rows = []
    for row in rows:
        formatted_row = []
        for cell, width in zip(row, column_widths):
            text = str(cell)
            cell_width = wcswidth(text)
            if cell_width < 0:
                cell_width = len(text)  # Unprintable cell: rough fallback to code-point count
            padding = max(0, width - cell_width)
            formatted_row.append(text + ' ' * padding)
        formatted_rows.append(''.join(formatted_row))
    return formatted_rows

# Usage
data = [['Name', 'Age', 'City'], ['Alice', '25', 'Tokyo 東京']]
formatted = format_columns(data, [15, 5, 20])