Comprehensive Chinese character to Pinyin conversion library with intelligent word segmentation and multiple output styles
npx @tessl/cli install tessl/pypi-pypinyin@0.55.00
# pypinyin

A comprehensive Chinese-character-to-Pinyin conversion library for Python. It segments text into words so that multi-character phrases get their correct pronunciation, supports polyphonic characters (heteronyms), and offers multiple Pinyin output styles, including tone marks, tone numbers, first letters, separate initials and finals, and Bopomofo notation.

## Package Information

- **Package Name**: pypinyin
- **Language**: Python
- **Installation**: `pip install pypinyin`
- **Documentation**: https://pypinyin.readthedocs.io/

## Core Imports

```python
import pypinyin
```

Common imports for core functionality:

```python
from pypinyin import pinyin, lazy_pinyin, slug, Style
```

For style constants:

```python
from pypinyin import (
    NORMAL, TONE, TONE2, TONE3,
    INITIALS, FIRST_LETTER, FINALS, FINALS_TONE, FINALS_TONE2, FINALS_TONE3,
    BOPOMOFO, BOPOMOFO_FIRST, CYRILLIC, CYRILLIC_FIRST,
    WADEGILES, GWOYEU, BRAILLE_MAINLAND, BRAILLE_MAINLAND_TONE,
)
```

## Basic Usage

```python
from pypinyin import pinyin, lazy_pinyin, slug, Style

# Basic pinyin conversion with tone marks (one list of readings per character)
text = "中国"
result = pinyin(text)
print(result)  # [['zhōng'], ['guó']]

# Flat list without tone marks (lazy_pinyin defaults to Style.NORMAL)
result = lazy_pinyin(text)
print(result)  # ['zhong', 'guo']

# Different output styles
result = pinyin(text, style=Style.TONE3)
print(result)  # [['zhong1'], ['guo2']]

result = pinyin(text, style=Style.FIRST_LETTER)
print(result)  # [['z'], ['g']]

# Generate URL-friendly slugs
slug_text = slug(text)
print(slug_text)  # zhong-guo

# Heteronym mode: return every known reading of a polyphonic character
text = "中心"  # 中 can be read zhōng or zhòng
result = pinyin(text, heteronym=True)
print(result)  # [['zhōng', 'zhòng'], ['xīn']]
```

## Architecture

pypinyin is built around a modular architecture:

- **Core conversion functions**: Main API functions for different use cases (`pinyin`, `lazy_pinyin`, `slug`)
- **Style system**: Output format control through the `Style` enum and module-level constants
- **Converter backends**: Pluggable converter implementations (`DefaultConverter`, `UltimateConverter`)
- **Segmentation modules**: Word boundary detection for accurate pronunciation (`mmseg`, `simpleseg`)
- **Contrib modules**: Advanced features such as tone sandhi, character variants, and specialized processing

## Capabilities

### Core Conversion Functions

Primary functions for converting Chinese characters to pinyin with various output options, heteronym support, and customization.

```python { .api }
def pinyin(hans, style=Style.TONE, heteronym=False, errors='default', strict=True, v_to_u=False, neutral_tone_with_five=False): ...
def lazy_pinyin(hans, style=Style.NORMAL, errors='default', strict=True, v_to_u=False, neutral_tone_with_five=False, tone_sandhi=False): ...
def slug(hans, style=Style.NORMAL, heteronym=False, separator='-', errors='default', strict=True): ...
```

[Core Functions](./core-functions.md)

### Output Styles and Formatting

A comprehensive style system that controls the pinyin output format: tone notation, initials/finals, alternative notation systems, and specialized styles.

```python { .api }
from enum import IntEnum


# Importable as pypinyin.Style
class Style(IntEnum):
    NORMAL = 0
    TONE = 1
    TONE2 = 2
    INITIALS = 3
    FIRST_LETTER = 4
    FINALS = 5
    FINALS_TONE = 6
    FINALS_TONE2 = 7
    TONE3 = 8
    FINALS_TONE3 = 9
    BOPOMOFO = 10
    BOPOMOFO_FIRST = 11
    CYRILLIC = 12
    CYRILLIC_FIRST = 13
    WADEGILES = 14
    GWOYEU = 15
    BRAILLE_MAINLAND = 16
    BRAILLE_MAINLAND_TONE = 17
```

[Styles and Formatting](./styles-formatting.md)

### Dictionary Customization

Functions for loading custom pronunciation dictionaries to override default pinyin mappings for specific characters or phrases.

```python { .api }
def load_single_dict(pinyin_dict, style='default'): ...
def load_phrases_dict(phrases_dict, style='default'): ...
```

[Dictionary Customization](./dictionary-customization.md)

### Command-line Interface

Command-line tools for batch processing and format conversion.

```bash
pypinyin [options] [input_text]
python -m pypinyin.tools.toneconvert [action] [input]
```

[Command-line Tools](./command-line-tools.md)

### Advanced Features

Extended functionality including custom converters, tone sandhi processing, segmentation control, and specialized mixins.

```python { .api }
class Pinyin:
    def __init__(self, converter=None): ...

class DefaultConverter: ...
class UltimateConverter: ...
```

[Advanced Features](./advanced-features.md)

## Exception Handling

```python { .api }
class PinyinNotFoundException(Exception):
    """
    Raised when no pinyin pronunciation is found for the input characters.

    Attributes:
        message (str): Exception message
        chars (str): Characters that caused the exception
    """

    def __init__(self, chars):
        """Initialize the exception with the problematic characters."""
        self.message = 'No pinyin found for character "{}"'.format(chars)
        self.chars = chars
        super(PinyinNotFoundException, self).__init__(self.message)
```

Common error-handling pattern:

```python
from pypinyin import pinyin, PinyinNotFoundException

try:
    # errors='exception' raises instead of passing unconvertible characters through
    result = pinyin("some text", errors='exception')
except PinyinNotFoundException as e:
    print(f"No pinyin found: {e.message}")
    print(f"Problematic characters: {e.chars}")
```