# Command-line Tools

Command-line interfaces for batch processing, format conversion, and integration with shell scripts and automation workflows.

## Capabilities

### Main Command-line Interface

The `pypinyin` command converts Chinese text to pinyin and exposes the core options of the Python API (output style, error handling, heteronyms, and slug output) as flags.

```bash { .api }
pypinyin [options] [input_text]
```

#### Command-line Options
```bash
# Basic usage (style names are the uppercase constants listed by `pypinyin --help`)
pypinyin "中国"                   # Default style (TONE): tone marks
pypinyin "中国" --style TONE      # Specify output style explicitly
pypinyin "中国" --style TONE3     # Tone numbers after each syllable

# Style options
pypinyin "中国" --style NORMAL        # No tones
pypinyin "中国" --style INITIALS      # Initial consonants only
pypinyin "中国" --style FINALS        # Finals (rhymes) only
pypinyin "中国" --style FIRST_LETTER  # First letters only
pypinyin "中国" --style zhao4         # Example-form alias, equivalent to TONE3

# Advanced options
pypinyin "银行" --heteronym           # Show all pronunciations (short form: -m)
pypinyin "你好abc" --errors ignore    # Drop characters that have no pinyin
# Python-only keyword arguments (for example v_to_u) may have no CLI flag;
# check `pypinyin --help` for the flags your installed version supports

# Output formatting
pypinyin "中国" --func slug --separator "_"  # --separator only applies to slug output
```
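Many tutorials write style names in lowercase (`tone3`, `first_letter`), but the CLI's `--style` choices are matched exactly, so those spellings are rejected. A minimal sketch of a helper that maps the lowercase names onto the uppercase constants and fails early on anything else; `pinyin_style` is a hypothetical name, and the mapping assumes your installed version lists these constants in `pypinyin --help`.

```bash
# Map lowercase style names onto the uppercase constants that the CLI's
# --style option accepts. Unknown names print an error and return status 1.
pinyin_style() {
    case "$1" in
        normal)       echo NORMAL ;;
        tone)         echo TONE ;;
        tone2)        echo TONE2 ;;
        tone3)        echo TONE3 ;;
        initials)     echo INITIALS ;;
        finals)       echo FINALS ;;
        first_letter) echo FIRST_LETTER ;;
        *)
            echo "unknown pinyin style: $1" >&2
            return 1
            ;;
    esac
}
```

For example: `pypinyin "中国" --style "$(pinyin_style tone3)"`.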
#### Usage Examples

```bash
# Simple conversion
$ pypinyin "中华人民共和国"
zhōng huá rén mín gòng hé guó

# Different styles
$ pypinyin "中华人民共和国" --style NORMAL
zhong hua ren min gong he guo

$ pypinyin "中华人民共和国" --style TONE3
zhong1 hua2 ren2 min2 gong4 he2 guo2

$ pypinyin "中华人民共和国" --style FIRST_LETTER
z h r m g h g

# Heteronym support
$ pypinyin "银行" --heteronym
yín háng,xíng

# Pipe support
$ echo "北京大学" | pypinyin
běi jīng dà xué

# File processing
$ pypinyin < input.txt > output.txt
$ pypinyin --style TONE3 < chinese_text.txt > pinyin_output.txt
```
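### Module Execution

Run pypinyin as a Python module when the `pypinyin` console script is not on `PATH`, or to choose the interpreter (for example, a virtualenv's `python`) whose installed pypinyin should be used. It accepts the same options as the `pypinyin` command.

```bash { .api }
python -m pypinyin [options] [input_text]
```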
#### Usage Examples

```bash
# Direct module execution
$ python -m pypinyin "中国"
zhōng guó

# Passing CLI options
$ python -m pypinyin "中国" --style TONE3
zhong1 guo2

# pypinyin does not read its style from environment variables; pass it explicitly
$ python -m pypinyin "中国" --style NORMAL
zhong guo
```
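Scripts that must work whether or not the console script is on `PATH` can choose the entry point at run time. A minimal sketch, assuming `python3` is the interpreter that has pypinyin installed; `resolve_pypinyin_cmd` and `run_pypinyin` are hypothetical helper names.

```bash
# Print the command to use for pypinyin: the console script when it is on
# PATH, otherwise module execution through python3 (assumed to be the
# interpreter that has pypinyin installed).
resolve_pypinyin_cmd() {
    if command -v pypinyin > /dev/null 2>&1; then
        echo "pypinyin"
    else
        echo "python3 -m pypinyin"
    fi
}

# Run pypinyin through whichever entry point is available, forwarding all
# arguments. The command substitution is deliberately unquoted so that
# "python3 -m pypinyin" splits into three words.
run_pypinyin() {
    $(resolve_pypinyin_cmd) "$@"
}
```

For example: `run_pypinyin "中国" --style TONE3`.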
### Tone Conversion Tools

A separate tool converts pinyin that is already romanized from one tone notation to another (tone marks, tone numbers, or no tones). Its input is pinyin, not Chinese characters.

```bash { .api }
python -m pypinyin.tools.toneconvert [action] [input]
```
#### Available Actions

- **to-normal**: Convert to the normal style (remove tones)
- **to-tone**: Convert to standard tone marks
- **to-tone2**: Convert to the TONE2 format (tone number right after the vowel that carries the tone mark, e.g. `zho1ng`)
- **to-tone3**: Convert to the TONE3 format (tone number at the end of each syllable, e.g. `zhong1`)
#### Usage Examples

```bash
# Convert tone marks to tone numbers
$ python -m pypinyin.tools.toneconvert to-tone3 "zhōng guó"
zhong1 guo2

# Convert tone numbers to tone marks
$ python -m pypinyin.tools.toneconvert to-tone "zhong1 guo2"
zhōng guó

# Remove tones entirely
$ python -m pypinyin.tools.toneconvert to-normal "zhōng guó"
zhong guo

# Convert between number formats (TONE2 -> TONE3)
$ python -m pypinyin.tools.toneconvert to-tone3 "zho1ng guo2"
zhong1 guo2

# Batch file processing, one invocation per line
$ while IFS= read -r line; do
>   python -m pypinyin.tools.toneconvert to-tone3 "$line"
> done < input_with_tones.txt > output_with_numbers.txt
```
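For the purely numeric cases, under the assumption that the input is already in TONE2 or TONE3 notation with tone digits 1-5, moving the digit (TONE2 to TONE3) or deleting it (to the normal style) is a plain text rewrite that needs no Python process. A lightweight sed sketch; the function names are hypothetical, and converting to or from tone marks still needs `toneconvert` or the library.

```bash
# TONE2 -> TONE3: move each syllable's tone digit to the end of the syllable.
# A "syllable" here is any run of non-space, non-digit characters, so ü and
# other multibyte letters pass through unchanged. TONE3 input is left as-is.
tone2_to_tone3() {
    sed 's/\([^[:space:][:digit:]]*\)\([1-5]\)\([^[:space:][:digit:]]*\)/\1\3\2/g'
}

# TONE2/TONE3 -> NORMAL: delete the tone digits 1-5.
# Caution: this also strips those digits from non-pinyin text such as "2024".
numbered_to_normal() {
    sed 's/[1-5]//g'
}
```

For example: `echo "zho1ng guo2" | tone2_to_tone3` prints `zhong1 guo2`.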
## Integration Patterns

### Shell Script Integration

Common patterns for integrating pypinyin into shell scripts and automation. Passing text as an argument after `--` keeps input that starts with `-` from being parsed as an option.

```bash
#!/bin/bash

# Process multiple files (skipping outputs from earlier runs)
for file in *.txt; do
    [[ "$file" == *_pinyin.txt ]] && continue
    echo "Processing $file..."
    pypinyin --style NORMAL < "$file" > "${file%.txt}_pinyin.txt"
done

# Create a searchable index: one "abbreviation<TAB>original line" entry per line
create_pinyin_index() {
    local input_file="$1"
    local index_file="${input_file%.txt}_index.txt"
    local line abbrev

    while IFS= read -r line; do
        [[ -z "$line" ]] && continue
        # First letters with spaces removed, e.g. 北京大学 -> bjdx
        # (</dev/null guards the loop's input file from being read by pypinyin)
        abbrev=$(pypinyin --style FIRST_LETTER -- "$line" < /dev/null | tr -d ' ')
        printf '%s\t%s\n' "$abbrev" "$line"
    done < "$input_file" | sort -u > "$index_file"
}

# URL slug generation (--separator only affects --func slug output)
generate_url_slug() {
    local chinese_title="$1"
    pypinyin --func slug --style NORMAL --separator "-" -- "$chinese_title"
}

# Example usage
chinese_title="北京大学计算机科学"
url_slug=$(generate_url_slug "$chinese_title")
echo "URL slug: $url_slug"  # bei-jing-da-xue-ji-suan-ji-ke-xue
```
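Once lines are indexed by their first-letter abbreviation, lookups need no pinyin conversion at all. A minimal sketch, assuming a hypothetical tab-separated index with one `abbreviation<TAB>original text` entry per line; `lookup_by_initials` is a hypothetical helper that prints every entry whose abbreviation starts with the query.

```bash
# Print the original text of every index entry whose abbreviation starts
# with PREFIX. Assumed index format: one "abbreviation<TAB>text" per line.
# Usage: lookup_by_initials PREFIX INDEX_FILE
lookup_by_initials() {
    local prefix="$1" index_file="$2"
    # index() == 1 means the abbreviation begins with the prefix
    awk -F '\t' -v p="$prefix" 'index($1, p) == 1 { print $2 }' "$index_file"
}
```

For example: `lookup_by_initials bj notes_index.txt` lists every indexed line whose abbreviation starts with `bj`.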
### Batch Processing Workflows

Patterns for processing large text corpora. Each `pypinyin` invocation starts a new Python interpreter, so converting a whole file per call is much faster than calling it once per line; the per-line loop below trades that speed for progress reporting.

```bash
# Process large files with progress indication (one pypinyin call per line)
process_large_file() {
    local input_file="$1"
    local output_file="$2"
    local style="${3:-NORMAL}"
    local total_lines current_line=0 line

    echo "Processing $input_file with style $style..." >&2

    # Count lines for progress, and truncate any previous output
    total_lines=$(wc -l < "$input_file")
    : > "$output_file"

    while IFS= read -r line; do
        current_line=$((current_line + 1))
        if [[ -z "$line" ]]; then
            echo >> "$output_file"  # keep blank lines aligned with the input
        else
            pypinyin --style "$style" -- "$line" < /dev/null >> "$output_file"
        fi

        # Progress indicator on stderr
        if ((current_line % 100 == 0)); then
            echo "Progress: $current_line/$total_lines lines" >&2
        fi
    done < "$input_file"
}

# Parallel processing for multiple files (one background job per file)
parallel_process() {
    local style="$1"
    shift
    local files=("$@")

    for file in "${files[@]}"; do
        (
            echo "Starting $file..."
            pypinyin --style "$style" < "$file" > "${file%.txt}_${style}.txt"
            echo "Completed $file"
        ) &
    done

    wait  # Wait for all background jobs to complete
    echo "All files processed"
}

# Usage
parallel_process NORMAL file1.txt file2.txt file3.txt
```
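`parallel_process` starts one background job per file, which can oversubscribe the machine for long file lists. A sketch that caps concurrency with `xargs -P` instead, assuming an `xargs` that supports `-0` and `-P` (GNU and BSD both do); `parallel_process_bounded` is a hypothetical name, and output files are named as in `parallel_process`.

```bash
# Convert FILE... with at most JOBS pypinyin processes running at once.
# Usage: parallel_process_bounded STYLE JOBS FILE...
parallel_process_bounded() {
    local style="$1" jobs="$2"
    shift 2
    [ "$#" -gt 0 ] || return 0  # nothing to do

    # NUL-separated names survive spaces in file names. STYLE reaches the
    # child shells through the environment; each file arrives as their $1.
    printf '%s\0' "$@" |
        STYLE="$style" xargs -0 -n 1 -P "$jobs" sh -c '
            pypinyin --style "$STYLE" < "$1" > "${1%.txt}_${STYLE}.txt" &&
                echo "Completed $1"
        ' sh
}
```

For example: `parallel_process_bounded NORMAL 4 *.txt` keeps at most four conversions running.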
### Data Processing Pipelines

Integration with common Unix text processing tools. The `grep -oP` patterns below need GNU grep built with PCRE support and a UTF-8 locale; the range `\x{4e00}-\x{9fff}` covers only the CJK Unified Ideographs block.

```bash
# Extract Chinese runs from mixed content, convert them, and count syllable frequency
extract_and_convert() {
    local input_file="$1"

    grep -oP '[\x{4e00}-\x{9fff}]+' "$input_file" | \
        pypinyin --style NORMAL | \
        tr ' ' '\n' | \
        sort | uniq -c | sort -nr > chinese_syllable_frequency.txt
}

# Create a pronunciation dictionary from runs of two or more Chinese characters
create_pronunciation_dict() {
    local input_file="$1"
    local phrase pinyin_result

    grep -oP '[\x{4e00}-\x{9fff}]{2,}' "$input_file" | \
        sort -u | \
        while IFS= read -r phrase; do
            pinyin_result=$(pypinyin --style TONE -- "$phrase" < /dev/null)
            echo "$phrase -> $pinyin_result"
        done > pronunciation_dict.txt
}

# Search text by pinyin
search_by_pinyin() {
    local search_term="$1"
    local text_file="$2"
    local search_pattern line line_pinyin

    # Turn the search term into a pattern such as "bei.*jing"
    # (tr maps one character to one character, so use sed for " " -> ".*")
    search_pattern=$(pypinyin --style NORMAL -- "$search_term" < /dev/null | sed 's/ /.*/g')

    # Find matching lines (one pypinyin call per line)
    while IFS= read -r line; do
        [[ -z "$line" ]] && continue
        line_pinyin=$(pypinyin --style NORMAL -- "$line" < /dev/null)
        if grep -q -- "$search_pattern" <<< "$line_pinyin"; then
            echo "$line"
        fi
    done < "$text_file"
}
```
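### Configuration and Environment

The pypinyin CLI takes its settings from command-line flags only: it does not read a default style, separator, or error strategy from environment variables or from a `~/.pypinyinrc` file. The environment variables pypinyin does document, `PYPINYIN_NO_PHRASES` and `PYPINYIN_NO_DICT_COPY`, reduce memory use (the first by skipping the phrase dictionary, at some cost in heteronym accuracy). To keep defaults in one place, store them in script-level variables and pass them as flags yourself:

```bash
# Script-level defaults (names are this document's convention;
# pypinyin itself ignores them)
export PYPINYIN_STYLE=TONE3       # Default style
export PYPINYIN_SEPARATOR="_"     # Default separator for --func slug
export PYPINYIN_ERRORS=ignore     # Error handling strategy

# Optional defaults file, sourced by your own scripts only
cat > ~/.pypinyinrc << 'EOF'
PYPINYIN_STYLE=NORMAL
PYPINYIN_SEPARATOR=-
EOF

# Load the defaults file into the current shell
load_config() {
    if [[ -f ~/.pypinyinrc ]]; then
        source ~/.pypinyinrc
        echo "Loaded configuration from ~/.pypinyinrc" >&2
    fi
}

# Use the loaded defaults as explicit flags
load_config
pypinyin --func slug --style "$PYPINYIN_STYLE" --separator "$PYPINYIN_SEPARATOR" \
    --errors "${PYPINYIN_ERRORS:-default}" -- "北京大学"
```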
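To apply script-level settings like these consistently, a small wrapper can translate them into real flags on every call. A sketch assuming the hypothetical variables `PYPINYIN_STYLE`, `PYPINYIN_ERRORS`, and `PYPINYIN_HETERONYM` (`true`/`false`) are set by your own scripts, plus an equally hypothetical `PYPINYIN_BIN` for pointing at a specific executable; pypinyin itself reads none of them.

```bash
# Run pypinyin with flags derived from script-level settings.
# PYPINYIN_STYLE, PYPINYIN_ERRORS, PYPINYIN_HETERONYM and PYPINYIN_BIN are
# hypothetical variables read only by this function, not by pypinyin.
pypinyin_with_settings() {
    # Prepend --heteronym to the caller's arguments when enabled
    if [ "${PYPINYIN_HETERONYM:-false}" = "true" ]; then
        set -- --heteronym "$@"
    fi
    "${PYPINYIN_BIN:-pypinyin}" \
        --style "${PYPINYIN_STYLE:-TONE}" \
        --errors "${PYPINYIN_ERRORS:-default}" \
        "$@"
}
```

For example: `PYPINYIN_HETERONYM=true pypinyin_with_settings -- "银行"`.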
### Error Handling in Scripts

Error handling for production workflows. Conversion is deterministic, so a failed call usually means a missing installation, a bad flag, or bad input; checking for the command up front and surfacing its error message is more useful than retrying.

```bash
# Safe pypinyin execution with error handling
safe_pypinyin() {
    local input="$1"
    local style="${2:-NORMAL}"
    local result

    if ! command -v pypinyin > /dev/null 2>&1; then
        echo "pypinyin not found on PATH" >&2
        return 127
    fi

    # stderr stays visible so usage errors (e.g. an invalid --style) are not hidden
    if result=$(pypinyin --style "$style" -- "$input" < /dev/null); then
        echo "$result"
        return 0
    fi

    echo "Failed to process: $input" >&2
    return 1
}

# Validate Chinese text before processing (needs GNU grep -P and a UTF-8 locale)
validate_chinese_text() {
    local text="$1"

    # Check if text contains Chinese characters
    if ! grep -qP '[\x{4e00}-\x{9fff}]' <<< "$text"; then
        echo "Warning: No Chinese characters found in: $text" >&2
        return 1
    fi

    # Check text length (counted in characters under a UTF-8 locale)
    if ((${#text} > 1000)); then
        echo "Warning: Text very long (${#text} chars)" >&2
    fi

    return 0
}

# Complete processing function with validation
process_with_validation() {
    local input="$1"
    local style="${2:-NORMAL}"

    if validate_chinese_text "$input"; then
        safe_pypinyin "$input" "$style"
    else
        echo "Skipping invalid input: $input" >&2
        return 1
    fi
}
```
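Redirecting straight into the output file (`> out.txt`) truncates it before the command runs, so a failed conversion leaves an empty or partial file behind. A sketch of the write-to-temp-then-rename pattern; `atomic_convert` is a hypothetical helper that takes the converter command as trailing arguments, and the rename is atomic only because the temporary file is created in the same directory as the target.

```bash
# Run COMMAND [ARGS...] with INPUT on stdin and publish its stdout as OUTPUT
# only if the command succeeds. Output goes to a temporary file in the same
# directory first, so the final mv is a rename within one filesystem.
# Usage: atomic_convert INPUT OUTPUT COMMAND [ARGS...]
atomic_convert() {
    local input="$1" output="$2" tmp
    shift 2

    tmp=$(mktemp "${output}.XXXXXX") || return 1
    if "$@" < "$input" > "$tmp"; then
        mv -f "$tmp" "$output"
    else
        rm -f "$tmp"  # leave no partial output behind
        return 1
    fi
}
```

For example: `atomic_convert input.txt output.txt pypinyin --style NORMAL`. Because `mktemp` creates the file with mode 600, the output keeps those permissions; add a `chmod` if other users need to read it.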