Comprehensive Chinese character to Pinyin conversion library with intelligent word segmentation and multiple output styles
Primary functions for converting Chinese characters to pinyin with various output options, heteronym support, and error handling customization.
`pinyin()` is the primary function for converting Chinese characters to pinyin, with options for output style, heteronym handling, and error processing.
def pinyin(hans, style=Style.TONE, heteronym=False, errors='default', strict=True, v_to_u=False, neutral_tone_with_five=False):
    """
    Convert Chinese characters to pinyin.

    Parameters:
    - hans (str): Chinese characters to convert
    - style (Style): Output style (default: Style.TONE)
    - heteronym (bool): Return all possible pronunciations for polyphonic characters (default: False)
    - errors (str): Error handling strategy ('default', 'ignore', 'replace', 'exception') (default: 'default')
    - strict (bool): Strict mode for character processing (default: True)
    - v_to_u (bool): Convert 'v' to 'ü' in output (default: False)
    - neutral_tone_with_five (bool): Use '5' for neutral tone in numeric styles (default: False)

    Returns:
    list: List of lists, where each inner list contains pinyin for one character.
    With heteronym=True, inner lists may contain multiple pronunciations.
    """

from pypinyin import pinyin, Style
# Basic conversion with tone marks
result = pinyin('中国')
print(result) # [['zhōng'], ['guó']]
# Heteronym support - multiple pronunciations
result = pinyin('银行', heteronym=True)
print(result) # [['yín'], ['háng', 'xíng']]
# Different styles
result = pinyin('中国', style=Style.TONE3)
print(result) # [['zhong1'], ['guo2']]
result = pinyin('中国', style=Style.INITIALS)
print(result) # [['zh'], ['g']]
# Character conversion options
result = pinyin('女', style=Style.TONE2, v_to_u=True)
print(result) # [['nü3']] instead of [['nv3']]
# Error handling
result = pinyin('中国abc', errors='ignore')
print(result) # [['zhōng'], ['guó']]
result = pinyin('中国abc', errors='replace')
print(result) # [['zhōng'], ['guó'], ['616263']]  (hex code points of 'abc', without \u)

Optimized function for simple pinyin conversion without heteronym results, ideal for most common use cases.
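To make the difference between the two return shapes concrete, the following is a minimal sketch of how a nested `pinyin()`-style result relates to a flat `lazy_pinyin()`-style result. The sample data is hard-coded from the `pinyin()` examples so the snippet runs without pypinyin installed, and `first_readings` is a hypothetical helper for illustration, not part of the library. Note that the two functions also differ in default style (`Style.TONE` versus `Style.NORMAL`), which this sketch does not model.

```python
# Nested results in the shape pinyin() returns: one inner list per character.
nested_plain = [['zhōng'], ['guó']]
nested_heteronym = [['yín'], ['háng', 'xíng']]


def first_readings(nested_result):
    """Hypothetical helper: keep only the first reading of each character."""
    return [readings[0] for readings in nested_result if readings]


print(first_readings(nested_plain))      # ['zhōng', 'guó']
print(first_readings(nested_heteronym))  # ['yín', 'háng']
```

When heteronyms are disabled every inner list holds a single reading, so taking the first element is equivalent to flattening the list.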
def lazy_pinyin(hans, style=Style.NORMAL, errors='default', strict=True, v_to_u=False, neutral_tone_with_five=False, tone_sandhi=False):
    """
    Convert Chinese characters to pinyin (lazy mode - no heteronyms).

    Parameters:
    - hans (str): Chinese characters to convert
    - style (Style): Output style (default: Style.NORMAL)
    - errors (str): Error handling strategy ('default', 'ignore', 'replace', 'exception') (default: 'default')
    - strict (bool): Strict mode for character processing (default: True)
    - v_to_u (bool): Convert 'v' to 'ü' in output (default: False)
    - neutral_tone_with_five (bool): Use '5' for neutral tone in numeric styles (default: False)
    - tone_sandhi (bool): Apply tone sandhi processing rules (default: False)

    Returns:
    list: Flat list of pinyin strings, one per character.
    """

from pypinyin import lazy_pinyin, Style
# Simple conversion
result = lazy_pinyin('中国')
print(result) # ['zhong', 'guo']
# With tones
result = lazy_pinyin('中国', style=Style.TONE)
print(result) # ['zhōng', 'guó']
# Tone sandhi processing (changes are only visible in a style that shows tones)
result = lazy_pinyin('一个', style=Style.TONE3, tone_sandhi=True)
print(result) # Applies tone change rules
# First letters only
result = lazy_pinyin('中华人民共和国', style=Style.FIRST_LETTER)
print(result) # ['z', 'h', 'r', 'm', 'g', 'h', 'g']

Generate URL-friendly slug strings from Chinese characters using pinyin conversion with customizable separators.
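Conceptually, a slug is the converted syllables joined by the separator. The sketch below illustrates that idea on hard-coded sample data so it runs without pypinyin installed; `join_syllables` is a hypothetical helper written for this illustration and should not be read as the library's actual implementation.

```python
from itertools import chain


def join_syllables(nested_result, separator='-'):
    """Hypothetical helper: flatten nested readings and join them."""
    return separator.join(chain.from_iterable(nested_result))


# Sample data in the nested shape used by the conversion functions.
syllables = [['zhong'], ['guo']]

print(join_syllables(syllables))                 # zhong-guo
print(join_syllables(syllables, separator='_'))  # zhong_guo
```

Because every reading in every inner list is joined, enabling heteronyms lengthens the slug, which is usually undesirable for URLs.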
def slug(hans, style=Style.NORMAL, heteronym=False, separator='-', errors='default', strict=True):
    """
    Generate slug string from Chinese characters.

    Parameters:
    - hans (str): Chinese characters to convert
    - style (Style): Output style (default: Style.NORMAL)
    - heteronym (bool): Include all pronunciations for polyphonic characters (default: False)
    - separator (str): Separator between pinyin syllables (default: '-')
    - errors (str): Error handling strategy ('default', 'ignore', 'replace', 'exception') (default: 'default')
    - strict (bool): Strict mode for character processing (default: True)

    Returns:
    str: URL-friendly slug string.
    """

from pypinyin import slug, Style
# Basic slug generation
result = slug('中国')
print(result) # 'zhong-guo'
# Custom separator
result = slug('中国', separator='_')
print(result) # 'zhong_guo'
# With tones (not typical for URLs)
result = slug('中国', style=Style.TONE)
print(result) # 'zhōng-guó'
# Heteronym handling
result = slug('银行', heteronym=True, separator='_')
print(result) # 'yin_hang_xing'
# Mixed content
result = slug('北京大学2023')
print(result) # 'bei-jing-da-xue-2023'

All core functions support four error handling strategies:
# Demonstration of error handling
from pypinyin import lazy_pinyin
text = '中国abc123'
# Default: keep unrecognized characters
result = lazy_pinyin(text, errors='default')
print(result) # ['zhong', 'guo', 'abc123']
# Ignore: skip unrecognized characters
result = lazy_pinyin(text, errors='ignore')
print(result) # ['zhong', 'guo']
# Replace: substitute unrecognized characters with unicode codes
result = lazy_pinyin(text, errors='replace')
print(result) # ['zhong', 'guo', '616263313233'] # Hex code points without \u
# Exception: raise error for unrecognized characters
from pypinyin.exceptions import PinyinNotFoundException
try:
    result = lazy_pinyin(text, errors='exception')
except PinyinNotFoundException as e:
    print(f"Exception raised: {e}")
    print(f"Problematic chars: {e.chars}")

Install with Tessl CLI
npx tessl i tessl/pypi-pypinyin
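The four error handling strategies from the demonstration can be approximated in plain Python, which also shows where the `replace` output comes from: each unrecognized character is written as the hexadecimal value of its code point. This is a rough sketch under stated assumptions. `handle_unrecognized` is a hypothetical helper, and it raises a plain `ValueError` as a stand-in for the library's own exception so that it runs without pypinyin installed.

```python
def handle_unrecognized(chars, errors='default'):
    """Hypothetical helper mirroring the four documented strategies."""
    if errors == 'default':
        return chars                      # keep the text unchanged
    if errors == 'ignore':
        return None                       # drop the text entirely
    if errors == 'replace':
        # Hex code point of each character, without the \u prefix.
        return ''.join(format(ord(c), 'x') for c in chars)
    if errors == 'exception':
        # Stand-in for the library's exception type (assumption).
        raise ValueError(f'No pinyin found for {chars!r}')
    raise ValueError(f'Unknown strategy: {errors!r}')


print(handle_unrecognized('abc123', 'default'))  # abc123
print(handle_unrecognized('abc123', 'ignore'))   # None
print(handle_unrecognized('abc123', 'replace'))  # 616263313233
```

Here `a`, `b`, `c` map to `61`, `62`, `63` and `1`, `2`, `3` map to `31`, `32`, `33`, giving `616263313233`.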