Chinese character to Pinyin conversion with intelligent phrase matching and multiple pronunciation support
Integrated support for multiple Chinese text segmentation libraries to improve conversion accuracy for phrases and compound words by recognizing word boundaries.
Support for multiple Chinese word segmentation libraries with automatic fallback handling.
type IPinyinSegment = "nodejieba" | "segmentit" | "@node-rs/jieba" | "Intl.Segmenter";
Note: Segmentation is handled internally by the main pinyin() function when the segment option is provided. There is no standalone segment function exported from the package.
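As a rough illustration of how the accepted values of the segment option relate to the IPinyinSegment union, the sketch below validates a library name and normalizes a boolean. The helpers isPinyinSegment and resolveSegment are hypothetical names invented for this example and are not part of the package; the mapping of true to "Intl.Segmenter" follows the configuration options described in this document.

```typescript
// Mirrors the union type documented for the package.
type IPinyinSegment = "nodejieba" | "segmentit" | "@node-rs/jieba" | "Intl.Segmenter";

const SEGMENT_LIBRARIES: readonly IPinyinSegment[] = [
  "nodejieba",
  "segmentit",
  "@node-rs/jieba",
  "Intl.Segmenter",
];

// Hypothetical helper (not a package export): narrows an arbitrary value
// to one of the documented library names.
function isPinyinSegment(value: unknown): value is IPinyinSegment {
  return SEGMENT_LIBRARIES.some((name) => name === value);
}

// Hypothetical helper (not a package export): normalizes the `segment` option.
// Assumption taken from this document: `true` selects "Intl.Segmenter",
// while `false` or an omitted value disables segmentation.
function resolveSegment(option?: IPinyinSegment | boolean): IPinyinSegment | undefined {
  if (option === true) return "Intl.Segmenter";
  if (option === false || option === undefined) return undefined;
  return option;
}

console.log(resolveSegment(true));        // "Intl.Segmenter"
console.log(resolveSegment("segmentit")); // "segmentit"
console.log(resolveSegment(false));       // undefined
console.log(isPinyinSegment("jieba"));    // false
```

A guard like this can be useful when the library name comes from user configuration rather than from a typed literal.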
nodejieba (Default)
Requires nodejieba as peer dependency
@node-rs/jieba
Requires @node-rs/jieba as peer dependency
segmentit
Requires segmentit as peer dependency
Intl.Segmenter
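Of the libraries listed, Intl.Segmenter is the only one built into the JavaScript runtime (it is part of the ECMAScript Internationalization API), so its availability can be checked without installing anything. The sketch below shows one way to feature-detect it before deciding what to pass as the segment option. The helper names hasIntlSegmenter and chooseSegmentOption are hypothetical and exist only for this example.

```typescript
// Feature-detects the built-in Intl.Segmenter.
// The cast avoids depending on which TypeScript `lib` setting is in use.
function hasIntlSegmenter(): boolean {
  return (
    typeof Intl !== "undefined" &&
    typeof (Intl as unknown as { Segmenter?: unknown }).Segmenter === "function"
  );
}

// Hypothetical helper (not a package export): picks a value for the
// `segment` option. This document describes `true` as selecting
// Intl.Segmenter and `false` as character-by-character conversion.
function chooseSegmentOption(): boolean {
  return hasIntlSegmenter();
}

console.log("Intl.Segmenter available:", hasIntlSegmenter());
console.log("segment option to pass:", chooseSegmentOption());
```

The result could then be passed along as `{ segment: chooseSegmentOption() }`, so that older environments quietly use character-by-character conversion.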
Segmentation can be enabled through the main pinyin function options:
interface IPinyinOptions {
  /** Text segmentation library for phrase recognition */
  segment?: IPinyinSegment | boolean;
}
Configuration Options:
false (default): No segmentation, character-by-character conversion
true: Enable segmentation using Intl.Segmenter (recommended default)
"nodejieba": Use nodejieba library (fastest, Node.js only)
"@node-rs/jieba": Use Rust-based jieba (very fast, Node.js only)
"segmentit": Use pure JavaScript segmentation (cross-platform)
"Intl.Segmenter": Use web standard segmentation (modern environments)
import pinyin from "pinyin";
// Without segmentation (character-by-character)
console.log(pinyin("我喜欢编程"));
// Result: [["wǒ"], ["xǐ"], ["huān"], ["biān"], ["chéng"]]
// With segmentation (phrase-aware)
console.log(pinyin("我喜欢编程", { segment: true }));
// Result: [["wǒ"], ["xǐ"], ["huān"], ["biānchéng"]]
// With specific segmentation library
console.log(pinyin("我喜欢编程", { segment: "nodejieba" }));
// Result: [["wǒ"], ["xǐhuān"], ["biānchéng"]]
Segmentation significantly improves accuracy for compound words and phrases:
// Without segmentation - less accurate
console.log(pinyin("北京大学"));
// Result: [["běi"], ["jīng"], ["dà"], ["xué"]]
// With segmentation - more accurate phrase recognition
console.log(pinyin("北京大学", { segment: true }));
// Result: [["běijīng"], ["dàxué"]]
// Complex phrases
console.log(pinyin("人工智能技术", { segment: "nodejieba" }));
// Result: [["réngōng"], ["zhìnéng"], ["jìshù"]]
Combine segmentation with group option for phrase-level Pinyin:
// Segmentation with phrase grouping
console.log(pinyin("自然语言处理", {
  segment: true,
  group: true
}));
// Result: [["zìrán"], ["yǔyán"], ["chǔlǐ"]]
// Character-by-character for comparison
console.log(pinyin("自然语言处理"));
// Result: [["zì"], ["rán"], ["yǔ"], ["yán"], ["chǔ"], ["lǐ"]]
The segmentation functionality is internal to the pinyin() function and is not exposed as a standalone API. When you enable segmentation through the segment option, the library automatically handles word boundary detection internally before applying Pinyin conversion.
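To make the "detect word boundaries, then convert" order concrete, here is a minimal sketch of such a two-step pipeline. It is not the package's implementation: segmentWords, toPinyin, and the two tiny dictionaries are hypothetical stand-ins written for this example, and the word boundaries that Intl.Segmenter reports depend on the runtime's locale data. The single-segment fallback mirrors the fallback behavior described in this document.

```typescript
// Minimal structural types so the sketch compiles regardless of the TS `lib` setting.
interface WordSegment { segment: string; }
interface WordSegmenter { segment(input: string): Iterable<WordSegment>; }
type SegmenterCtor = new (locale: string, options: { granularity: "word" }) => WordSegmenter;

// Step 1: word boundary detection. Falls back to the whole text as a
// single segment if Intl.Segmenter is missing or throws.
function segmentWords(text: string): string[] {
  try {
    const Segmenter = (Intl as unknown as { Segmenter?: SegmenterCtor }).Segmenter;
    if (!Segmenter) return [text];
    const segmenter = new Segmenter("zh-Hans-CN", { granularity: "word" });
    return Array.from(segmenter.segment(text), (part) => part.segment);
  } catch {
    return [text];
  }
}

// Hypothetical toy dictionaries for illustration only.
// 行 is polyphonic: "xíng" on its own, "háng" inside 银行.
const PHRASES = new Map<string, string[]>([["银行", ["yín", "háng"]]]);
const CHARS = new Map<string, string>([["银", "yín"], ["行", "xíng"]]);

// Step 2: conversion. A known phrase uses its phrase reading; anything
// else is converted character by character, and unknown characters pass through.
function toPinyin(words: string[]): string[][] {
  const result: string[][] = [];
  for (const word of words) {
    const phrase = PHRASES.get(word);
    if (phrase) {
      for (const syllable of phrase) result.push([syllable]);
    } else {
      for (const char of Array.from(word)) result.push([CHARS.get(char) ?? char]);
    }
  }
  return result;
}

console.log(JSON.stringify(toPinyin(["银行"])));     // [["yín"],["háng"]]
console.log(JSON.stringify(toPinyin(["银", "行"]))); // [["yín"],["xíng"]]
console.log(JSON.stringify(toPinyin(segmentWords("银行")))); // depends on runtime segmentation
```

The first two calls show why word boundaries matter: the same two characters yield different readings depending on whether they arrive as one word or as two separate characters.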
The segmentation system includes robust error handling:
When a specified segmentation library is not available:
// If nodejieba is not installed
console.log(pinyin("测试", { segment: "nodejieba" }));
// Logs: "pinyin v4: 'nodejieba' is peerDependencies"
// Fallback: Returns original text as single segment
If segmentation fails due to errors:
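// Error in segmentation library. segment() here is the internal helper,
// which is not exported from the package; it is shown for illustration only.
console.log(segment("测试", "invalid-library"));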
// Fallback: Returns original text as array with single element
// Result: ["测试"]
Different libraries work on different platforms:
// Browser environment
console.log(pinyin("测试", { segment: "Intl.Segmenter" }));
// Works in modern browsers
console.log(pinyin("测试", { segment: "nodejieba" }));
// Will fall back - nodejieba only works in Node.js
For most applications, use segment: true (defaults to Intl.Segmenter) as it provides good performance without additional dependencies.
// Recommended approach
const result = pinyin("中文文本", { segment: true });
For high-performance Node.js applications with many conversions, consider nodejieba or @node-rs/jieba:
// High-performance Node.js
const result = pinyin("中文文本", { segment: "@node-rs/jieba" });
Install with Tessl CLI
npx tessl i tessl/npm-pinyin