# Pyphen

Pyphen is a pure Python text hyphenation library that uses existing Hunspell hyphenation dictionaries to break words at syllable boundaries. It provides methods for finding hyphenation positions, wrapping a word within a width limit, and inserting hyphens, and it supports over 40 languages through the LibreOffice dictionaries bundled with the package.

## Package Information

- **Package Name**: pyphen
- **Language**: Python
- **Installation**: `pip install pyphen`
- **Python Requirements**: Python 3.9+ (CPython and PyPy)
- **License**: GPL 2.0+/LGPL 2.1+/MPL 1.1

## Core Imports

```python
import pyphen
```

To import the public names directly:

```python
from pyphen import Pyphen, LANGUAGES, language_fallback
```

## Basic Usage

```python
import pyphen

# Create hyphenation instance for a language
dic = pyphen.Pyphen(lang='en_US')

# Get word with hyphens inserted at all valid positions
hyphenated = dic.inserted('hyphenation')
print(hyphenated)  # 'hy-phen-ation'

# Iterate through all possible hyphenation splits (longest first part first)
for first_part, second_part in dic.iterate('hyphenation'):
    print(f"{first_part} | {second_part}")
# Output:
# hyphen | ation
# hy | phenation

# Wrap word to fit within specific width
wrapped = dic.wrap('hyphenation', width=8)
if wrapped:
    first, second = wrapped
    print(f"{first} {second}")  # 'hyphen- ation'

# Find all valid hyphenation positions
positions = dic.positions('hyphenation')
print(positions)  # [2, 6] (DataInt objects, matching 'hy-phen-ation')
```

## Capabilities

### Language Discovery and Fallback

Discover available languages and find appropriate fallback languages using Unicode locale inheritance.

```python
LANGUAGES: dict[str, Path]  # { .api }
```

`LANGUAGES` is a dictionary mapping language codes to hyphenation dictionary file paths. It contains both full codes (e.g., 'en_US', 'de_DE') and short codes (e.g., 'en', 'de') for over 40 supported languages.

```python { .api }
def language_fallback(language: str) -> str | None:
    """
    Get a fallback language available in dictionaries using Unicode locale inheritance.

    Args:
        language (str): Language code with potential region/script variants (e.g., 'en-US', 'sr-Latn')

    Returns:
        str | None: Available language code in LANGUAGES, or None if no fallback found
    """
```
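
The fallback described above can be pictured as repeated truncation of the locale code. The following is a minimal sketch under that assumption; `truncation_fallback` and the `available` mapping are hypothetical names for illustration, not Pyphen's own implementation.

```python
def truncation_fallback(language, available):
    """Return the longest truncation of `language` present in `available`, else None."""
    # Compare case-insensitively, but return the code as spelled in `available`.
    by_lowercase = {code.lower(): code for code in available}
    parts = language.replace('-', '_').lower().split('_')
    while parts:
        candidate = '_'.join(parts)
        if candidate in by_lowercase:
            return by_lowercase[candidate]
        parts.pop()  # drop the most specific subtag and retry
    return None


# Hypothetical set of available codes (values would be dictionary paths).
available = {'de_DE': None, 'de_AT': None, 'de': None, 'sr_Latn': None, 'sr': None}

print(truncation_fallback('de-AT-x-variant', available))
print(truncation_fallback('sr-Latn-RS', available))
print(truncation_fallback('xx-YY', available))
```

Truncating from the right keeps the most specific match that exists, so a regional code is preferred over a bare language code when both are available.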

### Hyphenation Class

Main interface for text hyphenation with configurable parameters and multiple hyphenation methods.

```python { .api }
class Pyphen:
    def __init__(
        self,
        filename: str | Path | None = None,
        lang: str | None = None,
        left: int = 2,
        right: int = 2,
        cache: bool = True
    ):
        """
        Create hyphenation instance for specified language or dictionary file.

        Args:
            filename (str | Path, optional): Path to custom hyph_*.dic file
            lang (str, optional): Language code for built-in dictionary, used when no filename is given
            left (int): Minimum characters in first syllable (default: 2)
            right (int): Minimum characters in last syllable (default: 2)
            cache (bool): Whether to reuse a cached copy of the hyphenation dictionary (default: True)

        Raises:
            KeyError: If specified language is not available in LANGUAGES
        """
```

108

109

### Position-Based Hyphenation

110

111

Find where words can be hyphenated and access position metadata.

112

113

```python { .api }

114

def positions(self, word: str) -> list[DataInt]:

115

"""

116

Get valid hyphenation positions in word, respecting left/right constraints.

117

118

Args:

119

word (str): Word to find hyphenation positions for

120

121

Returns:

122

list[DataInt]: Positions where word can be hyphenated. Each DataInt may contain

123

tuple (change, index, cut) for non-standard hyphenation patterns.

124

"""

125

```
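
To make the left/right constraints concrete, here is a small sketch of how candidate positions could be filtered. The helper `filter_positions` is hypothetical, and the candidate list is an assumption derived from the 'au-to-mat-ic' example in this document rather than read from a real dictionary.

```python
def filter_positions(word, candidates, left=2, right=2):
    """Keep break positions at least `left` characters from the start
    and at least `right` characters from the end of `word`."""
    last_allowed = len(word) - right
    return [p for p in candidates if left <= p <= last_allowed]


# Assumed candidate breaks for 'automatic', i.e. 'au-to-mat-ic'.
candidates = [2, 4, 7]

print(filter_positions('automatic', candidates))
print(filter_positions('automatic', candidates, left=4, right=3))
```

The constraints only remove candidates; they never create break positions that the dictionary did not propose.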

126

127

### Iterative Hyphenation

128

129

Iterate through all possible hyphenation splits of a word.

130

131

```python { .api }

132

def iterate(self, word: str) -> Generator[tuple[str, str], None, None]:

133

"""

134

Iterate over all hyphenation possibilities, longest first part first.

135

136

Args:

137

word (str): Word to hyphenate

138

139

Yields:

140

tuple[str, str]: (first_part, second_part) for each valid hyphenation point

141

"""

142

143

def __call__(self, word: str) -> Generator[tuple[str, str], None, None]:

144

"""Alias for iterate() method - makes Pyphen instances callable."""

145

```
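
The "longest first part first" ordering can be sketched for ordinary positions (those without substitution metadata) as a walk over the positions in reverse. `splits_from_positions` is a hypothetical helper, and the positions used are assumed from the 'hy-phen-ation' example.

```python
def splits_from_positions(word, positions):
    """Yield (first_part, second_part) pairs, longest first part first."""
    for position in reversed(sorted(positions)):
        yield word[:position], word[position:]


# Assumed break positions for 'hyphenation', i.e. 'hy-phen-ation'.
for first_part, second_part in splits_from_positions('hyphenation', [2, 6]):
    print(f"{first_part} | {second_part}")
```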

146

147

### Width-Constrained Wrapping

148

149

Wrap words to fit within specific width constraints with hyphenation.

150

151

```python { .api }

152

def wrap(self, word: str, width: int, hyphen: str = '-') -> tuple[str, str] | None:

153

"""

154

Get longest possible first part and remaining part that fits within width.

155

156

Args:

157

word (str): Word to wrap

158

width (int): Maximum length for first part including hyphen

159

hyphen (str): Hyphen character to use (default: '-')

160

161

Returns:

162

tuple[str, str] | None: (first_part_with_hyphen, remaining_part) or None

163

if no valid hyphenation fits within width

164

"""

165

```
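
The width rule documented above (the hyphen counts toward `width`) can be sketched as picking the first split whose first part still leaves room for the hyphen. `wrap_from_splits` is a hypothetical helper that takes splits ordered longest first; it illustrates the documented behaviour and is not Pyphen's source.

```python
def wrap_from_splits(splits, width, hyphen='-'):
    """Return (first_part + hyphen, rest) for the longest first part that fits."""
    budget = width - len(hyphen)  # room left for letters once the hyphen is added
    for first_part, second_part in splits:
        if len(first_part) <= budget:
            return first_part + hyphen, second_part
    return None


# Assumed splits for 'hyphenation', longest first part first.
splits = [('hyphen', 'ation'), ('hy', 'phenation')]

print(wrap_from_splits(splits, width=8))
print(wrap_from_splits(splits, width=4))
print(wrap_from_splits(splits, width=2))
```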

166

167

### Hyphen Insertion

168

169

Insert hyphens at all valid hyphenation points in a word.

170

171

```python { .api }

172

def inserted(self, word: str, hyphen: str = '-') -> str:

173

"""

174

Get word with all possible hyphens inserted at valid positions.

175

176

Args:

177

word (str): Word to hyphenate

178

hyphen (str): Hyphen character to insert (default: '-')

179

180

Returns:

181

str: Word with hyphens inserted at all valid hyphenation points

182

"""

183

```
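
Since `inserted` works on a single word, hyphenating running text needs a loop over the words. The sketch below assumes a hypothetical helper `hyphenate_text` that accepts any callable with the shape of `inserted`; a small lookup table stands in for a real dictionary so the example runs without Pyphen installed.

```python
import re

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, Unicode-aware


def hyphenate_text(text, insert_hyphens):
    """Apply `insert_hyphens` to every word, leaving punctuation and spaces alone."""
    return WORD.sub(lambda match: insert_hyphens(match.group(0)), text)


# Stand-in for a real hyphenator: only knows one word.
demo_breaks = {'hyphenation': 'hy-phen-ation'}
result = hyphenate_text('Automatic hyphenation, please.',
                        lambda word: demo_breaks.get(word, word))
print(result)
```

With Pyphen available, passing `lambda word: dic.inserted(word, hyphen='\u00ad')` would insert soft hyphens, which renderers show only where a line actually breaks.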

184

185

## Types

186

187

```python { .api }

188

class DataInt(int):

189

"""

190

Integer with additional data attribute for hyphenation metadata.

191

192

Attributes:

193

data (tuple[str, int, int] | None): Non-standard hyphenation data containing

194

(change, index, cut) for character substitutions

195

"""

196

197

def __new__(cls, value: int, data: tuple[str, int, int] | None = None, reference: DataInt | None = None) -> DataInt:

198

"""

199

Create DataInt with optional hyphenation metadata.

200

201

Args:

202

value (int): Integer value (hyphenation position)

203

data (tuple, optional): Hyphenation metadata (change, index, cut)

204

reference (DataInt, optional): Copy data from another DataInt

205

"""

206

```
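
Because the type subclasses `int`, a position can be used anywhere a plain integer works (comparison, sorting, slicing) while still carrying metadata. The sketch below uses a hypothetical `TaggedInt` class to show the pattern; it is an illustration of the idea, not the library's class, and the metadata tuple is made up.

```python
class TaggedInt(int):
    """Hypothetical int subclass that carries an extra `data` attribute."""

    def __new__(cls, value, data=None):
        obj = super().__new__(cls, value)
        obj.data = data
        return obj


plain = TaggedInt(2)
special = TaggedInt(5, data=('sz=sz', -1, 3))  # illustrative metadata only

word = 'hyphenation'
head = word[:plain]  # usable directly as a slice index
print(head, plain.data, special.data)
```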

207

208

## Language Support

209

210

Pyphen includes hyphenation dictionaries for 40+ languages sourced from LibreOffice:

211

212

**Major Languages**: en_US, en_GB, fr, de, es, it, pt_PT, pt_BR, ru_RU, nl_NL, sv, da_DK, no_NO, pl_PL, cs_CZ, hu_HU, el_GR, tr_TR, fi_FI

213

214

**Regional Variants**: de_DE, de_AT, de_CH, pt_BR vs pt_PT, en_US vs en_GB, sr_Latn vs sr (Cyrillic)

215

216

**Script Variants**: Serbian Latin (sr_Latn) and Cyrillic (sr), multiple Chinese variants

217

218

Access all available languages:

219

220

```python

221

import pyphen

222

223

# List all available language codes

224

print(list(pyphen.LANGUAGES.keys()))

225

226

# Check if language is available

227

if 'de_DE' in pyphen.LANGUAGES:

228

dic = pyphen.Pyphen(lang='de_DE')

229

230

# Use fallback for unavailable variants

231

fallback = pyphen.language_fallback('de-AT-x-variant') # Returns 'de_AT'

232

```

233

234

## Advanced Usage

235

236

### Custom Dictionary Files

237

238

Load hyphenation patterns from custom dictionary files:

239

240

```python

241

from pathlib import Path

242

import pyphen

243

244

# Load from file path

245

custom_dict = Path('/path/to/custom_hyph.dic')

246

dic = pyphen.Pyphen(filename=custom_dict)

247

248

# Use existing dictionary path

249

dic = pyphen.Pyphen(filename=pyphen.LANGUAGES['fr'])

250

```

251

252

### Syllable Constraints

253

254

Control minimum syllable lengths for hyphenation:

255

256

```python

257

import pyphen

258

259

# Default: minimum 2 characters on each side

260

dic = pyphen.Pyphen(lang='en_US')

261

print(dic.inserted('automatic')) # 'au-to-mat-ic'

262

263

# Require 4 characters at start, 3 at end

264

dic = pyphen.Pyphen(lang='en_US', left=4, right=3)

265

print(dic.inserted('automatic')) # 'auto-matic'

266

267

# No constraints (left=1, right=1)

268

dic = pyphen.Pyphen(lang='en_US', left=1, right=1)

269

print(dic.inserted('automatic')) # 'a-u-t-o-m-a-t-i-c'

270

```

271

272

### Non-Standard Hyphenation

273

274

Some languages use character substitution during hyphenation:

275

276

```python

277

import pyphen

278

279

# Hungarian example with character changes

280

dic = pyphen.Pyphen(lang='hu', left=1, right=1)

281

word = 'kulissza'

282

283

# Standard positions show metadata

284

for pos in dic.positions(word):

285

if pos.data:

286

change, index, cut = pos.data

287

print(f"Position {pos}: change '{change}' at index {index}, cut {cut}")

288

289

# Iteration handles substitutions automatically

290

for first, second in dic.iterate(word):

291

print(f"{first} | {second}")

292

# Output:

293

# kulisz | sza # 'ss' becomes 'sz' + 's'

294

# ku | lissza

295

296

# Inserted form shows final result

297

print(dic.inserted(word)) # 'ku-lisz-sza'

298

```
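
The `(change, index, cut)` tuple can be read as "replace `cut` characters starting near the break with `change`, where `=` marks the break". The sketch below works under two assumptions that are not confirmed by this document: that `index` is an offset relative to the position, and that `change` has the form `'left=right'`. The helper `split_at` and the metadata values are hypothetical, chosen only to reproduce the 'kulissza' split shown above.

```python
def split_at(word, position, data=None):
    """Split `word` at `position`, applying (change, index, cut) when given."""
    if data is None:
        return word[:position], word[position:]
    change, index, cut = data
    start = position + index            # assumed: index is relative to position
    left, right = change.split('=')     # '=' marks where the word breaks
    return word[:start] + left, right + word[start + cut:]


# Illustrative values, not read from the Hungarian dictionary.
print(split_at('kulissza', 5, ('sz=sz', -1, 3)))
print(split_at('kulissza', 2))
```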

299

300

### Performance Considerations

301

302

Dictionary loading is cached by default for better performance:

303

304

```python

305

import pyphen

306

307

# First instance loads and caches dictionary

308

dic1 = pyphen.Pyphen(lang='en_US') # Loads dictionary

309

310

# Subsequent instances reuse cached dictionary

311

dic2 = pyphen.Pyphen(lang='en_US') # Uses cached dictionary

312

313

# Disable caching if needed (e.g., for memory-constrained environments)

314

dic3 = pyphen.Pyphen(lang='en_US', cache=False) # Reloads dictionary

315

```
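
The caching behaviour described above amounts to memoizing the parsed dictionary by path. The following sketch shows that pattern with hypothetical names (`_loaded`, `load_dictionary`, `fake_parse`); it is an illustration of the idea and makes no claim about how Pyphen stores its cache internally.

```python
_loaded = {}  # hypothetical module-level cache: path -> parsed dictionary


def load_dictionary(path, parse, cache=True):
    """Return the parsed dictionary for `path`, parsing only when needed."""
    if not cache or path not in _loaded:
        _loaded[path] = parse(path)
    return _loaded[path]


calls = []


def fake_parse(path):
    """Stand-in parser that records each call instead of reading a file."""
    calls.append(path)
    return {'source': path}


first = load_dictionary('hyph_en_US.dic', fake_parse)
second = load_dictionary('hyph_en_US.dic', fake_parse)               # cache hit
third = load_dictionary('hyph_en_US.dic', fake_parse, cache=False)   # forced reload
print(len(calls))
```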

316

317

## Error Handling

318

319

```python

320

import pyphen

321

322

# Handle unavailable languages

323

try:

324

dic = pyphen.Pyphen(lang='unknown_language')

325

except KeyError as e:

326

print(f"Language not available: {e}")

327

# Use fallback or default language

328

dic = pyphen.Pyphen(lang='en_US')

329

330

# Handle custom dictionary file errors

331

try:

332

dic = pyphen.Pyphen(filename='/nonexistent/path.dic')

333

except (FileNotFoundError, PermissionError, UnicodeDecodeError) as e:

334

print(f"Dictionary file error: {e}")

335

# Fallback to built-in dictionary

336

dic = pyphen.Pyphen(lang='en_US')

337

```
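
When several languages are acceptable, the try/except pattern above can be wrapped in a small helper that walks a preference list. `first_available` is a hypothetical name, and a dictionary lookup stands in for the real constructor so the sketch runs without Pyphen installed.

```python
def first_available(factory, candidates):
    """Return factory(code) for the first code that does not raise KeyError."""
    for code in candidates:
        try:
            return factory(code)
        except KeyError:
            continue  # this code is unavailable, try the next preference
    raise KeyError(f"none of {list(candidates)!r} is available")


# Stand-in factory for demonstration purposes.
known = {'en_US': 'english hyphenator'}
hyphenator = first_available(lambda code: known[code], ['xx_YY', 'en_US'])
print(hyphenator)
```

With Pyphen installed, the factory could be `lambda code: pyphen.Pyphen(lang=code)`, which raises `KeyError` for unavailable languages as shown above.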