# Pyphen

Pyphen is a pure Python hyphenation library. It uses existing Hunspell hyphenation dictionaries to break words at syllable boundaries and exposes this through several methods: position detection, hyphen insertion, and wrapping a word to a maximum width. The bundled LibreOffice dictionaries cover over 40 languages.

## Package Information

- **Package Name**: pyphen
- **Language**: Python
- **Installation**: `pip install pyphen`
- **Python Requirements**: Python 3.9+ (CPython and PyPy)
- **License**: GPL 2.0+/LGPL 2.1+/MPL 1.1

## Core Imports

```python
import pyphen
```

Access to all public functionality:

```python
from pyphen import Pyphen, LANGUAGES, language_fallback
```

## Basic Usage

```python
import pyphen

# Create a hyphenation instance for a language
dic = pyphen.Pyphen(lang='en_US')

# Get the word with hyphens inserted at all valid positions
hyphenated = dic.inserted('hyphenation')
print(hyphenated)  # 'hy-phen-ation'

# Iterate through all possible splits, longest first part first
for first_part, second_part in dic.iterate('hyphenation'):
    print(f"{first_part} | {second_part}")
# Output:
# hyphen | ation
# hy | phenation

# Wrap the word so the first part (hyphen included) fits the width
wrapped = dic.wrap('hyphenation', width=8)
if wrapped:
    first, second = wrapped
    print(f"{first} {second}")  # 'hyphen- ation'

# Find all valid hyphenation positions
positions = dic.positions('hyphenation')
print(positions)  # [2, 6] (DataInt objects)
```

## Capabilities

### Language Discovery and Fallback

Discover available languages and find appropriate fallback languages using Unicode locale inheritance.

```python { .api }
LANGUAGES: dict[str, Path]
```

Dictionary mapping language codes to hyphenation dictionary file paths. Contains both full codes (e.g., 'en_US', 'de_DE') and short codes (e.g., 'en', 'de') for over 40 supported languages.

```python { .api }
def language_fallback(language: str) -> str | None:
    """
    Get a fallback language available in dictionaries using Unicode locale inheritance.

    Args:
        language (str): Language code with potential region/script variants (e.g., 'en-US', 'sr-Latn')

    Returns:
        str | None: Available language code in LANGUAGES, or None if no fallback found
    """
```

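The truncation behind this kind of fallback can be sketched without Pyphen itself. The snippet below is a minimal illustration, assuming the lookup treats `-` and `_` alike and drops trailing subtags until a known code remains. `AVAILABLE` and `fallback` are hypothetical stand-ins, not Pyphen's `LANGUAGES` or `language_fallback`.

```python
# Hypothetical set of known codes; the real list lives in pyphen.LANGUAGES.
AVAILABLE = {"en", "en_US", "de", "de_AT", "sr", "sr_Latn"}


def fallback(language, available=AVAILABLE):
    """Return the longest known prefix of a locale code, or None.

    Assumed rule: treat '-' and '_' alike, then drop trailing subtags
    one at a time until the remaining code is known.
    """
    parts = language.replace("-", "_").split("_")
    while parts:
        candidate = "_".join(parts)
        if candidate in available:
            return candidate
        parts.pop()  # drop the most specific subtag and retry
    return None


print(fallback("de-AT-x-variant"))  # de_AT
print(fallback("sr-Latn-RS"))       # sr_Latn
print(fallback("xx-YY"))            # None
```

Because the search starts from the full code, the most specific match wins: `sr-Latn-RS` resolves to `sr_Latn` rather than the shorter `sr`.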
### Hyphenation Class

Main interface for text hyphenation with configurable parameters and multiple hyphenation methods.

```python { .api }
class Pyphen:
    def __init__(
        self,
        filename: str | Path | None = None,
        lang: str | None = None,
        left: int = 2,
        right: int = 2,
        cache: bool = True
    ):
        """
        Create hyphenation instance for specified language or dictionary file.

        Args:
            filename (str | Path, optional): Path to custom hyph_*.dic file
            lang (str, optional): Language code for built-in dictionary
            left (int): Minimum characters in first syllable (default: 2)
            right (int): Minimum characters in last syllable (default: 2)
            cache (bool): Whether to cache hyphenation patterns (default: True)

        Raises:
            KeyError: If specified language is not available in LANGUAGES
        """
```

### Position-Based Hyphenation

Find where words can be hyphenated and access position metadata.

```python { .api }
def positions(self, word: str) -> list[DataInt]:
    """
    Get valid hyphenation positions in word, respecting left/right constraints.

    Args:
        word (str): Word to find hyphenation positions for

    Returns:
        list[DataInt]: Positions where word can be hyphenated. Each DataInt may contain
            tuple (change, index, cut) for non-standard hyphenation patterns.
    """
```

### Iterative Hyphenation

Iterate through all possible hyphenation splits of a word.

```python { .api }
def iterate(self, word: str) -> Generator[tuple[str, str], None, None]:
    """
    Iterate over all hyphenation possibilities, longest first part first.

    Args:
        word (str): Word to hyphenate

    Yields:
        tuple[str, str]: (first_part, second_part) for each valid hyphenation point
    """

def __call__(self, word: str) -> Generator[tuple[str, str], None, None]:
    """Alias for iterate() method - makes Pyphen instances callable."""
```

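To make the relationship between positions, splits, and the hyphenated form concrete, here is a minimal sketch using plain integers. It covers standard breaks only (no substitution data), and the helper names and the position list are assumptions for illustration, not part of Pyphen.

```python
def splits_from_positions(word, positions):
    """Yield (first_part, second_part) pairs, longest first part first."""
    for position in sorted(positions, reverse=True):
        yield word[:position], word[position:]


def insert_at_positions(word, positions, hyphen="-"):
    """Insert the hyphen at each position, working right to left so
    earlier offsets stay valid."""
    letters = list(word)
    for position in sorted(positions, reverse=True):
        letters.insert(position, hyphen)
    return "".join(letters)


# Assumed positions for illustration; a real list would come from positions().
word, positions = "hyphenation", [2, 6]
print(list(splits_from_positions(word, positions)))
# [('hyphen', 'ation'), ('hy', 'phenation')]
print(insert_at_positions(word, positions))  # hy-phen-ation
```

Walking the positions from the right is what produces the "longest first part first" ordering.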
### Width-Constrained Wrapping

Wrap words to fit within specific width constraints with hyphenation.

```python { .api }
def wrap(self, word: str, width: int, hyphen: str = '-') -> tuple[str, str] | None:
    """
    Get longest possible first part and remaining part that fits within width.

    Args:
        word (str): Word to wrap
        width (int): Maximum length for first part including hyphen
        hyphen (str): Hyphen character to use (default: '-')

    Returns:
        tuple[str, str] | None: (first_part_with_hyphen, remaining_part) or None
            if no valid hyphenation fits within width
    """
```

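The selection rule described here can be sketched as a small standalone function. This is an illustration under stated assumptions rather than Pyphen's implementation: `wrap_from_splits` is a hypothetical helper, and the split list is supplied by hand in longest-first order.

```python
def wrap_from_splits(splits, width, hyphen="-"):
    """Pick the first split whose first part plus hyphen fits in width.

    `splits` must be ordered longest first part first, so the first fit
    is also the longest one. Returns None when nothing fits.
    """
    room = width - len(hyphen)  # space left for letters before the hyphen
    for first, second in splits:
        if len(first) <= room:
            return first + hyphen, second
    return None


# Assumed splits for illustration, ordered longest first part first.
splits = [("hyphen", "ation"), ("hy", "phenation")]
print(wrap_from_splits(splits, 8))  # ('hyphen-', 'ation')
print(wrap_from_splits(splits, 5))  # ('hy-', 'phenation')
print(wrap_from_splits(splits, 2))  # None
```

Note that the width budget includes the hyphen, so a width of 2 leaves room for only one letter and no split qualifies.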
### Hyphen Insertion

Insert hyphens at all valid hyphenation points in a word.

```python { .api }
def inserted(self, word: str, hyphen: str = '-') -> str:
    """
    Get word with all possible hyphens inserted at valid positions.

    Args:
        word (str): Word to hyphenate
        hyphen (str): Hyphen character to insert (default: '-')

    Returns:
        str: Word with hyphens inserted at all valid hyphenation points
    """
```

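Since insertion works on a single word, running text has to be split into words first. The sketch below shows one way to do that with a regular expression. `hyphenate_text` and `toy_inserter` are hypothetical helpers; the toy inserter only stands in for a real per-word hyphenator so the example runs on its own, and the soft-hyphen default is an assumption of this sketch.

```python
import re


def hyphenate_text(text, insert_word, hyphen="\u00ad"):
    """Apply a per-word inserter to every run of letters in text.

    `insert_word(word, hyphen)` is any callable with the same shape as
    the inserted() method; punctuation and spacing are left untouched.
    """
    return re.sub(
        r"[^\W\d_]+",  # runs of Unicode letters only
        lambda match: insert_word(match.group(), hyphen),
        text,
    )


def toy_inserter(word, hyphen="-"):
    """Stand-in for a real inserter: breaks after every third letter."""
    return hyphen.join(word[i:i + 3] for i in range(0, len(word), 3))


print(hyphenate_text("Hello, wonderful world!", toy_inserter, hyphen="-"))
# Hel-lo, won-der-ful wor-ld!
```

With a real dictionary, the bound method of a `Pyphen` instance (for example `dic.inserted`) could be passed as `insert_word`. The default of U+00AD (soft hyphen) keeps the break points invisible until a renderer needs them.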
## Types

```python { .api }
class DataInt(int):
    """
    Integer with additional data attribute for hyphenation metadata.

    Attributes:
        data (tuple[str, int, int] | None): Non-standard hyphenation data containing
            (change, index, cut) for character substitutions
    """

    def __new__(cls, value: int, data: tuple[str, int, int] | None = None, reference: DataInt | None = None) -> DataInt:
        """
        Create DataInt with optional hyphenation metadata.

        Args:
            value (int): Integer value (hyphenation position)
            data (tuple, optional): Hyphenation metadata (change, index, cut)
            reference (DataInt, optional): Copy data from another DataInt
        """
```

## Language Support

Pyphen includes hyphenation dictionaries for 40+ languages sourced from LibreOffice. The exact set depends on the installed version, so treat the codes below as representative and check `pyphen.LANGUAGES` for the authoritative list.

**Major Languages**: en_US, en_GB, fr, de, es, it, pt_PT, pt_BR, ru_RU, nl_NL, sv, da_DK, no_NO, pl_PL, cs_CZ, hu_HU, el_GR, tr_TR, fi_FI

**Regional Variants**: de_DE, de_AT, de_CH; pt_BR vs pt_PT; en_US vs en_GB

**Script Variants**: Serbian Latin (sr_Latn) and Cyrillic (sr)

Access all available languages:

```python
import pyphen

# List all available language codes
print(list(pyphen.LANGUAGES.keys()))

# Check if language is available
if 'de_DE' in pyphen.LANGUAGES:
    dic = pyphen.Pyphen(lang='de_DE')

# Use fallback for unavailable variants
fallback = pyphen.language_fallback('de-AT-x-variant')  # Returns 'de_AT'
```

## Advanced Usage

### Custom Dictionary Files

Load hyphenation patterns from custom dictionary files:

```python
from pathlib import Path
import pyphen

# Load from file path
custom_dict = Path('/path/to/custom_hyph.dic')
dic = pyphen.Pyphen(filename=custom_dict)

# Use existing dictionary path
dic = pyphen.Pyphen(filename=pyphen.LANGUAGES['fr'])
```

### Syllable Constraints

Control minimum syllable lengths for hyphenation:

```python
import pyphen

# Default: minimum 2 characters on each side
dic = pyphen.Pyphen(lang='en_US')
print(dic.inserted('automatic'))  # 'au-to-mat-ic'

# Require 4 characters at start, 3 at end
dic = pyphen.Pyphen(lang='en_US', left=4, right=3)
print(dic.inserted('automatic'))  # 'auto-matic'

# Loosest setting (left=1, right=1): breaks may fall one character from
# either end, but only where the dictionary patterns allow a break
dic = pyphen.Pyphen(lang='en_US', left=1, right=1)
print(dic.inserted('automatic'))
```

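The effect of `left` and `right` can be pictured as a filter over candidate break positions. The following is a minimal sketch under that assumption: `apply_margins` is a hypothetical helper, and the candidate list is written by hand to match the default output shown above.

```python
def apply_margins(word, candidates, left=2, right=2):
    """Keep break positions that leave at least `left` characters before
    the break and at least `right` characters after it."""
    last_allowed = len(word) - right
    return [p for p in candidates if left <= p <= last_allowed]


# Assumed candidate breaks for 'automatic' (au-to-mat-ic).
word, candidates = "automatic", [2, 4, 7]
print(apply_margins(word, candidates))                   # [2, 4, 7]
print(apply_margins(word, candidates, left=4, right=3))  # [4]
print(apply_margins(word, candidates, left=5, right=5))  # []
```

Lowering the margins never creates new break points; it only stops discarding candidates near the ends of the word.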
### Non-Standard Hyphenation

Some languages use character substitution during hyphenation:

```python
import pyphen

# Hungarian example with character changes
dic = pyphen.Pyphen(lang='hu', left=1, right=1)
word = 'kulissza'

# Standard positions show metadata
for pos in dic.positions(word):
    if pos.data:
        change, index, cut = pos.data
        print(f"Position {pos}: change '{change}' at index {index}, cut {cut}")

# Iteration handles substitutions automatically
for first, second in dic.iterate(word):
    print(f"{first} | {second}")
# Output:
# kulisz | sza  # 'ss' becomes 'sz' + 's'
# ku | lissza

# Inserted form shows final result
print(dic.inserted(word))  # 'ku-lisz-sza'
```

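One plausible reading of the `(change, index, cut)` tuple is sketched below. The interpretation is an assumption: `change` is taken to be a `before=after` pair, `index` an offset relative to the break position, and `cut` the number of original characters replaced. `split_with_change` is a hypothetical helper, and the numeric values are chosen only to reproduce the Hungarian split shown above; the encoding Pyphen uses internally may differ.

```python
def split_with_change(word, position, data=None):
    """Split word at position, applying (change, index, cut) if given.

    Assumed reading of the tuple: `change` is 'before=after', `index` is
    an offset relative to `position`, and `cut` is how many original
    characters the replacement covers.
    """
    if data is None:
        return word[:position], word[position:]
    change, index, cut = data
    start = position + index
    before, after = change.split("=")
    return word[:start] + before, after + word[start + cut:]


# Illustrative values chosen to reproduce the documented split.
print(split_with_change("kulissza", 5, ("sz=sz", -1, 3)))  # ('kulisz', 'sza')
print(split_with_change("kulissza", 2))                    # ('ku', 'lissza')
```

The first call replaces the three letters `ssz` with `sz` before the break and `sz` after it, which is why the two parts together are one character longer than the original word.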
### Performance Considerations

Dictionary loading is cached by default for better performance:

```python
import pyphen

# First instance loads and caches dictionary
dic1 = pyphen.Pyphen(lang='en_US')  # Loads dictionary

# Subsequent instances reuse cached dictionary
dic2 = pyphen.Pyphen(lang='en_US')  # Uses cached dictionary

# Pass cache=False to force the dictionary to be loaded again
dic3 = pyphen.Pyphen(lang='en_US', cache=False)  # Reloads dictionary
```

## Error Handling

```python
import pyphen

# Handle unavailable languages
try:
    dic = pyphen.Pyphen(lang='unknown_language')
except KeyError as e:
    print(f"Language not available: {e}")
    # Use fallback or default language
    dic = pyphen.Pyphen(lang='en_US')

# Handle custom dictionary file errors
try:
    dic = pyphen.Pyphen(filename='/nonexistent/path.dic')
except (FileNotFoundError, PermissionError, UnicodeDecodeError) as e:
    print(f"Dictionary file error: {e}")
    # Fallback to built-in dictionary
    dic = pyphen.Pyphen(lang='en_US')
```