# pypinyin

pypinyin is a Chinese character to Pinyin conversion library for Python. It uses word segmentation to choose the correct pronunciation for multi-character phrases, supports polyphonic characters (heteronyms), and offers multiple Pinyin output styles, including tone marks, tone numbers, first letters, separate initials and finals, and Bopomofo notation.

## Package Information

- **Package Name**: pypinyin
- **Language**: Python
- **Installation**: `pip install pypinyin`
- **Documentation**: https://pypinyin.readthedocs.io/

## Core Imports

```python
import pypinyin
```

Common imports for core functionality:

```python
from pypinyin import pinyin, lazy_pinyin, slug, Style
```

For style constants:

```python
from pypinyin import (
    NORMAL, TONE, TONE2, TONE3,
    INITIALS, FIRST_LETTER, FINALS, FINALS_TONE, FINALS_TONE2, FINALS_TONE3,
    BOPOMOFO, BOPOMOFO_FIRST, CYRILLIC, CYRILLIC_FIRST,
    WADEGILES, GWOYEU, BRAILLE_MAINLAND, BRAILLE_MAINLAND_TONE
)
```

## Basic Usage

```python
from pypinyin import pinyin, lazy_pinyin, slug, Style

# Basic pinyin conversion with tone marks
text = "中国"
result = pinyin(text)
print(result)  # [['zhōng'], ['guó']]

# Simple pinyin without tone marks
result = lazy_pinyin(text)
print(result)  # ['zhong', 'guo']

# Different output styles
result = pinyin(text, style=Style.TONE3)
print(result)  # [['zhong1'], ['guo2']]

result = pinyin(text, style=Style.FIRST_LETTER)
print(result)  # [['z'], ['g']]

# Generate URL-friendly slugs
slug_text = slug(text)
print(slug_text)  # zhong-guo

# Handle polyphonic characters (heteronyms)
text = "银行"  # 行 has more than one reading (e.g. háng, xíng)
result = pinyin(text, heteronym=True)
# One inner list of candidate readings per character,
# e.g. [['yín'], ['háng', 'xíng']]; the readings listed can vary by version.
print(result)
```

## Architecture

pypinyin is built around a modular architecture:

- **Core conversion functions**: Main API functions for different use cases (`pinyin`, `lazy_pinyin`, `slug`)
- **Style system**: Comprehensive output format control through the `Style` enum and constants
- **Converter backends**: Pluggable converter implementations (`DefaultConverter`, `UltimateConverter`)
- **Segmentation modules**: Word boundary detection for accurate pronunciation (`mmseg`, `simpleseg`)
- **Contrib modules**: Advanced features like tone sandhi, character variants, and specialized processing

## Capabilities


### Core Conversion Functions

Primary functions for converting Chinese characters to pinyin with various output options, heteronym support, and customization.

```python { .api }
def pinyin(hans, style=Style.TONE, heteronym=False, errors='default', strict=True, v_to_u=False, neutral_tone_with_five=False): ...
def lazy_pinyin(hans, style=Style.NORMAL, errors='default', strict=True, v_to_u=False, neutral_tone_with_five=False, tone_sandhi=False): ...
def slug(hans, style=Style.NORMAL, heteronym=False, separator='-', errors='default', strict=True): ...
```

[Core Functions](./core-functions.md)


### Output Styles and Formatting

Comprehensive style system controlling pinyin output format including tones, initials/finals, alternative notation systems, and specialized styles.

```python { .api }
class Style(IntEnum):
    NORMAL = 0
    TONE = 1
    TONE2 = 2
    INITIALS = 3
    FIRST_LETTER = 4
    FINALS = 5
    FINALS_TONE = 6
    FINALS_TONE2 = 7
    TONE3 = 8
    FINALS_TONE3 = 9
    BOPOMOFO = 10
    BOPOMOFO_FIRST = 11
    CYRILLIC = 12
    CYRILLIC_FIRST = 13
    WADEGILES = 14
    GWOYEU = 15
    BRAILLE_MAINLAND = 16
    BRAILLE_MAINLAND_TONE = 17
```

[Styles and Formatting](./styles-formatting.md)


### Dictionary Customization

Functions for loading custom pronunciation dictionaries to override default pinyin mappings for specific characters or phrases.

```python { .api }
def load_single_dict(pinyin_dict, style='default'): ...
def load_phrases_dict(phrases_dict, style='default'): ...
```

[Dictionary Customization](./dictionary-customization.md)


### Command-line Interface

Command-line tools for batch processing and format conversion.

```bash
pypinyin [options] [input_text]
python -m pypinyin.tools.toneconvert [action] [input]
```

[Command-line Tools](./command-line-tools.md)


### Advanced Features

Extended functionality including custom converters, tone sandhi processing, segmentation control, and specialized mixins.

```python { .api }
class Pinyin:
    def __init__(self, converter=None): ...

class DefaultConverter: ...
class UltimateConverter: ...
```

[Advanced Features](./advanced-features.md)


## Exception Handling

```python { .api }
class PinyinNotFoundException(Exception):
    """
    Raised when no pinyin pronunciation found for input characters.

    Attributes:
    - message (str): Exception message
    - chars (str): Characters that caused the exception
    """

    def __init__(self, chars):
        """Initialize exception with problematic characters."""
        self.message = 'No pinyin found for character "{}"'.format(chars)
        self.chars = chars
        super(PinyinNotFoundException, self).__init__(self.message)
```

Common error handling patterns:

```python
from pypinyin import pinyin
from pypinyin.exceptions import PinyinNotFoundException

try:
    # errors='exception' raises instead of passing unknown characters through
    result = pinyin("some text", errors='exception')
except PinyinNotFoundException as e:
    print(f"No pinyin found: {e.message}")
    print(f"Problematic characters: {e.chars}")
```