# Python Slugify

A comprehensive Python library for converting Unicode text strings into URL-friendly slugs. Python Slugify handles complex Unicode characters from various languages by transliterating them to ASCII equivalents, while offering extensive customization options including custom separators, stopword filtering, length limits, regex patterns, and character replacements.

## Package Information

- **Package Name**: python-slugify
- **Language**: Python
- **Installation**: `pip install python-slugify`
- **Optional**: `pip install python-slugify[unidecode]` (for advanced Unicode handling)

## Core Imports

```python
from slugify import slugify
```

Additional utilities and special character mappings:

```python
from slugify import slugify, smart_truncate
from slugify import PRE_TRANSLATIONS, CYRILLIC, GERMAN, GREEK
```

Version and metadata information:

```python
from slugify import __version__, __title__, __author__, __description__
```

Regex patterns and constants, imported from the `slugify.slugify` submodule where they are defined:

```python
from slugify.slugify import DEFAULT_SEPARATOR
from slugify.slugify import CHAR_ENTITY_PATTERN, DECIMAL_PATTERN, HEX_PATTERN
```

## Basic Usage

```python
from slugify import slugify

# Basic text slugification
text = "This is a test ---"
result = slugify(text)
print(result)  # "this-is-a-test"

# Unicode text handling
text = '影師嗎'
result = slugify(text)
print(result)  # "ying-shi-ma"

# Preserve Unicode characters
text = '影師嗎'
result = slugify(text, allow_unicode=True)
print(result)  # "影師嗎"

# Custom separator and length limits
text = "C'est déjà l'été."
result = slugify(text, separator='_', max_length=15)
print(result)  # "c_est_deja_l_et"

# Using replacement rules (applied to the raw text, so "50%" becomes "50percent")
text = "50% off | great deal"
result = slugify(text, replacements=[['%', 'percent'], ['|', 'or']])
print(result)  # "50percent-off-or-great-deal"
```

## Capabilities

### Text Slugification

The main function for converting text to URL-friendly slugs with comprehensive Unicode support and customization options.

```python { .api }
def slugify(
    text: str,
    entities: bool = True,
    decimal: bool = True,
    hexadecimal: bool = True,
    max_length: int = 0,
    word_boundary: bool = False,
    separator: str = "-",
    save_order: bool = False,
    stopwords: Iterable[str] = (),
    regex_pattern: re.Pattern[str] | str | None = None,
    lowercase: bool = True,
    replacements: Iterable[Iterable[str]] = (),
    allow_unicode: bool = False,
) -> str:
    """
    Convert text into a URL-friendly slug.

    Parameters:
    - text (str): Input text to slugify
    - entities (bool): Convert HTML entities to unicode (default: True)
    - decimal (bool): Convert HTML decimal entities to unicode (default: True)
    - hexadecimal (bool): Convert HTML hexadecimal entities to unicode (default: True)
    - max_length (int): Maximum output length, 0 for no limit (default: 0)
    - word_boundary (bool): Truncate to complete words (default: False)
    - separator (str): Separator between words (default: "-")
    - save_order (bool): Preserve word order when truncating (default: False)
    - stopwords (Iterable[str]): Words to exclude from output (default: ())
    - regex_pattern (re.Pattern[str] | str | None): Custom regex for disallowed characters (default: None)
    - lowercase (bool): Convert to lowercase (default: True)
    - replacements (Iterable[Iterable[str]]): Custom replacement rules (default: ())
    - allow_unicode (bool): Allow Unicode characters in output (default: False)

    Returns:
    str: URL-friendly slug
    """
```

#### Usage Examples

```python
import re

from slugify import slugify

# HTML entity handling: "&amp;" is decoded to "&", which is then stripped
text = "foo &amp; bar"
result = slugify(text)  # "foo-bar"

# Stopword filtering
text = "The quick brown fox"
result = slugify(text, stopwords=['the', 'a', 'an'])  # "quick-brown-fox"

# Custom regex pattern (matches the characters to remove)
text = "Hello World 123"
pattern = re.compile(r'[^a-z]+')
result = slugify(text, regex_pattern=pattern)  # "hello-world"

# Length limits with word boundaries
text = "This is a very long sentence"
result = slugify(text, max_length=15, word_boundary=True)  # "this-is-a-very"

# Multiple replacement rules (no space before "%", so "20%" becomes "20percent")
text = "Price: $50 | 20% off"
replacements = [['$', 'dollar'], ['%', 'percent'], ['|', 'and']]
result = slugify(text, replacements=replacements)  # "price-dollar50-and-20percent-off"
```

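A rough mental model of the processing steps can help when combining options. The following is a simplified, stdlib-only sketch of such a pipeline, not python-slugify's implementation: `sketch_slugify` is a hypothetical name, it approximates transliteration with `unicodedata` NFKD decomposition (so it cannot romanize scripts such as Chinese), and it supports only a few of the options listed above.

```python
import re
import unicodedata


def sketch_slugify(text, separator="-", stopwords=(), replacements=(), max_length=0):
    """Simplified illustration of a slug pipeline (hypothetical helper)."""
    # 1. Apply user replacement rules to the raw text.
    for old, new in replacements:
        text = text.replace(old, new)
    # 2. Approximate transliteration: decompose accents, drop non-ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # 3. Lowercase, then turn every run of disallowed characters into "-".
    text = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    # 4. Drop stopwords.
    blocked = {word.lower() for word in stopwords}
    words = [word for word in text.split("-") if word not in blocked]
    text = "-".join(words)
    # 5. Truncate, then swap in the requested separator.
    if max_length > 0:
        text = text[:max_length].strip("-")
    return text.replace("-", separator)


print(sketch_slugify("The Café au Lait", stopwords=["the"]))       # cafe-au-lait
print(sketch_slugify("a | b", replacements=[["|", "or"]]))         # a-or-b
print(sketch_slugify("Hello World", separator="_", max_length=8))  # hello_wo
```

Because replacements run first in this sketch, a rule's output is itself lowercased and cleaned by the later steps.
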
### Smart Text Truncation

Intelligent string truncation with word boundary preservation and order control.

```python { .api }
def smart_truncate(
    string: str,
    max_length: int = 0,
    word_boundary: bool = False,
    separator: str = " ",
    save_order: bool = False,
) -> str:
    """
    Intelligently truncate strings while preserving word boundaries.

    Parameters:
    - string (str): String to truncate
    - max_length (int): Maximum length, 0 for no truncation (default: 0)
    - word_boundary (bool): Respect word boundaries (default: False)
    - separator (str): Word separator (default: " ")
    - save_order (bool): Maintain original word order (default: False)

    Returns:
    str: Truncated string
    """
```

#### Usage Examples

```python
from slugify import smart_truncate

# Basic truncation (the trailing separator is stripped from the result)
text = "This is a long sentence"
result = smart_truncate(text, max_length=10)  # "This is a"

# Word boundary preservation
text = "This is a long sentence"
result = smart_truncate(text, max_length=15, word_boundary=True)  # "This is a long"

# Custom separator
text = "word1-word2-word3-word4"
result = smart_truncate(text, max_length=15, word_boundary=True, separator="-")  # "word1-word2"
```

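The effect of `save_order` is easiest to see in isolation. Below is a minimal stdlib-only sketch of word-boundary truncation under one plausible reading of the parameters above. `truncate_words` is a hypothetical helper, not the library's `smart_truncate`, and the real function may differ in edge cases.

```python
def truncate_words(text, max_length, separator=" ", save_order=False):
    """Keep whole words while the result stays within max_length (sketch)."""
    kept = []
    length = 0
    for word in text.split(separator):
        if not word:
            continue
        # Extra characters needed if this word were appended.
        extra = len(word) + (len(separator) if kept else 0)
        if length + extra <= max_length:
            kept.append(word)
            length += extra
        elif save_order:
            # Stop at the first word that does not fit.
            break
        # Otherwise skip this word and keep trying later, shorter ones.
    return separator.join(kept)


text = "one extraordinary two"
print(truncate_words(text, 8))                   # one two
print(truncate_words(text, 8, save_order=True))  # one
```

Without `save_order`, a long word in the middle is skipped and later words may still fit; with it, truncation stops at the first word that overflows.
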
### Language-Specific Character Mappings

Pre-defined character translation mappings for various languages, useful for custom transliteration workflows.

```python { .api }
# Character mapping lists
CYRILLIC: list[tuple[str, str]]
GERMAN: list[tuple[str, str]]
GREEK: list[tuple[str, str]]
PRE_TRANSLATIONS: list[tuple[str, str]]

def add_uppercase_char(char_list: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """
    Add uppercase variants to character replacement list.

    Parameters:
    - char_list (list[tuple[str, str]]): List of character replacement tuples

    Returns:
    list[tuple[str, str]]: Enhanced list with uppercase variants
    """
```

#### Available Character Mappings

```python
from slugify import CYRILLIC, GERMAN, GREEK, PRE_TRANSLATIONS

# Cyrillic mappings: ё->e, я->ya, х->h, у->y, щ->sch, ю->u (with uppercase variants)
print(CYRILLIC[:3])  # three (source, replacement) tuples

# German umlaut mappings: ä->ae, ö->oe, ü->ue (with uppercase variants)
print(GERMAN[:3])  # three (source, replacement) tuples

# Greek mappings: χ->ch, Ξ->X, ϒ->Y, υ->y, etc. (with uppercase variants)
print(GREEK[:3])  # three (source, replacement) tuples

# Combined mappings from all languages
print(len(PRE_TRANSLATIONS))  # Total count of all mappings
```

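Each mapping is a list of `(source, replacement)` pairs, so a pre-translation step can be a plain loop of string replacements. The sketch below uses a small hand-written list, `GERMAN_SKETCH`, in the same shape as the library's lists (its pairs follow the German mappings described above), and a hypothetical helper `pre_translate`. It runs without python-slugify installed.

```python
# Hand-written pairs in the same (source, replacement) shape as GERMAN.
GERMAN_SKETCH = [
    ("Ä", "Ae"), ("ä", "ae"),
    ("Ö", "Oe"), ("ö", "oe"),
    ("Ü", "Ue"), ("ü", "ue"),
]


def pre_translate(text, mapping):
    """Apply every (source, replacement) pair in order (hypothetical helper)."""
    for source, replacement in mapping:
        text = text.replace(source, replacement)
    return text


print(pre_translate("Über Öl und Käse", GERMAN_SKETCH))  # Ueber Oel und Kaese
```

Since `slugify()` takes pairs of the same shape through `replacements`, passing such a list there is a reasonable way to get `ue` rather than `u` for `ü`; verify the result against your installed version.
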
### Command Line Interface

Python Slugify provides a command-line interface for text slugification with full parameter support.

```python { .api }
def main(argv: list[str] | None = None):
    """
    Command-line entry point for slugification.

    Parameters:
    - argv (list[str] | None): Command line arguments (default: None uses sys.argv)
    """
```

#### Command Line Usage

```bash
# Basic usage
slugify "Hello World"  # Output: hello-world

# From stdin
echo "Hello World" | slugify --stdin

# With options
slugify "Hello World" --separator="_" --max-length=8  # Output: hello_wo

# Custom replacements (single quotes stop the shell from expanding "$5")
slugify 'Price: $50' --replacements '$->dollar'  # Output: price-dollar50

# Custom regex pattern
slugify "Keep_underscores" --regex-pattern "[^-a-z0-9_]+"  # Output: keep_underscores

# Allow unicode
slugify "影師嗎" --allow-unicode  # Output: 影師嗎

# Complex combination (truncation cuts mid-word unless --word-boundary is given)
slugify "The ÜBER café costs 50%" --stopwords "the" --replacements "Ü->UE" "%->percent" --max-length=20
# Output: ueber-cafe-costs-50p

# Help
slugify --help
```

#### Command Line Parameters

All `slugify()` function parameters are available as command-line options:

- `--separator`: Custom separator (default: "-")
- `--max-length`: Maximum output length
- `--word-boundary`: Truncate to complete words
- `--save-order`: Preserve word order when truncating
- `--stopwords`: Space-separated list of words to exclude
- `--regex-pattern`: Custom regex for disallowed characters
- `--no-lowercase`: Disable lowercase conversion
- `--replacements`: Replacement rules in format "old->new"
- `--allow-unicode`: Allow Unicode characters
- `--no-entities`: Disable HTML entity conversion
- `--no-decimal`: Disable HTML decimal conversion
- `--no-hexadecimal`: Disable HTML hexadecimal conversion
- `--stdin`: Read input from stdin

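The `--replacements` option takes rules written as `old->new`. A minimal sketch of turning such strings into the pair list that `slugify()` expects is shown below. `parse_rule` is a hypothetical helper, and the CLI's own parsing and error messages may differ.

```python
def parse_rule(rule, sep="->"):
    """Split an 'old->new' rule into an [old, new] pair (hypothetical helper)."""
    if sep not in rule:
        raise ValueError(f"expected ORIGINAL{sep}REPLACEMENT, got {rule!r}")
    # Split on the first separator only, so the replacement may contain '->'.
    old, new = rule.split(sep, 1)
    return [old, new]


rules = [parse_rule(r) for r in ["|->or", "%->percent"]]
print(rules)  # [['|', 'or'], ['%', 'percent']]
```
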
## Types and Constants

```python { .api }
# Default separator constant
DEFAULT_SEPARATOR: str = "-"

# Regex patterns for text processing
CHAR_ENTITY_PATTERN: re.Pattern[str]               # HTML character entities
DECIMAL_PATTERN: re.Pattern[str]                   # HTML decimal references
HEX_PATTERN: re.Pattern[str]                       # HTML hexadecimal references
QUOTE_PATTERN: re.Pattern[str]                     # Quote characters
DISALLOWED_CHARS_PATTERN: re.Pattern[str]          # Disallowed ASCII characters
DISALLOWED_UNICODE_CHARS_PATTERN: re.Pattern[str]  # Disallowed Unicode characters
DUPLICATE_DASH_PATTERN: re.Pattern[str]            # Duplicate dashes
NUMBERS_PATTERN: re.Pattern[str]                   # Comma-separated numbers
```

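To illustrate what the decimal and hexadecimal patterns are for, here is a stdlib-only sketch that decodes numeric character references. The two regexes are assumptions written for this example, not copies of `DECIMAL_PATTERN` and `HEX_PATTERN`, and `decode_numeric_refs` is a hypothetical name.

```python
import re

# Assumed patterns for this sketch; the library's own may differ.
DECIMAL_REF = re.compile(r"&#(\d+);")
HEX_REF = re.compile(r"&#x([0-9a-fA-F]+);")


def decode_numeric_refs(text):
    """Replace &#NNN; and &#xHHH; references with the characters they name."""
    text = DECIMAL_REF.sub(lambda m: chr(int(m.group(1))), text)
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_numeric_refs("&#72;&#x69;!"))  # Hi!
print(decode_numeric_refs("&#invalid;"))    # &#invalid; (no match, left unchanged)
```
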
### Package Metadata

```python { .api }
# Version and package information
__version__: str       # Package version (e.g., "8.0.4")
__title__: str         # Package title ("python-slugify")
__author__: str        # Package author ("Val Neekman")
__author_email__: str  # Author email ("info@neekware.com")
__description__: str   # Package description
__url__: str           # Package URL ("https://github.com/un33k/python-slugify")
__license__: str       # License ("MIT")
__copyright__: str     # Copyright notice
```

#### Usage Examples

```python
from slugify import __version__, __title__, __author__

print(f"{__title__} version {__version__} by {__author__}")
# Output: python-slugify version 8.0.4 by Val Neekman
```

## Error Handling

Python Slugify handles several edge cases without extra work from the caller, but it does not validate every input:

- **Non-string input**: Input that is not a `str` is decoded as UTF-8 bytes. Other types such as `int` or `None` are not converted automatically, so convert them with `str()` before calling `slugify()`
- **HTML entity errors**: Decimal or hexadecimal references that cannot be converted to a character are left unconverted
- **Empty input**: Returns an empty string for empty or whitespace-only input
- **Unicode normalization**: Normalizes Unicode input before building the slug
- **Regex pattern errors**: An invalid `regex_pattern` string is not caught; the `re` module raises `re.error`

```python
from slugify import slugify

# Convert non-string values explicitly before slugifying
result = slugify(str(123))  # "123"
result = slugify("")        # ""

# Text that only resembles a numeric entity is treated as ordinary text
result = slugify("&#invalid;")  # "invalid"
```

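When input may arrive as numbers, bytes, or `None`, it is simplest to normalize it before slugifying. The helper below is a hypothetical sketch: the name `to_text` and the choice to map `None` to an empty string are assumptions of this example, not library behavior.

```python
def to_text(value, encoding="utf-8"):
    """Coerce arbitrary input to str before slugifying (hypothetical helper)."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        # Drop undecodable bytes rather than raising.
        return value.decode(encoding, "ignore")
    return str(value)


print(repr(to_text(123)))          # '123'
print(repr(to_text(None)))         # ''
print(repr(to_text(b"ab\xffc")))   # 'abc'
```

Calling `slugify(to_text(value))` then always passes a string.
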
## Dependencies

- **Required**: `text-unidecode>=1.3` (GPL & Perl Artistic license)
- **Optional**: `Unidecode>=1.1.1` (install with `pip install python-slugify[unidecode]`)

The package automatically uses `Unidecode` if available, otherwise falls back to `text-unidecode`. `Unidecode` is considered more advanced for Unicode transliteration but has different licensing terms.
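
The fallback described above is the usual optional-dependency pattern. As a sketch, the helper below reports which of several candidate modules is importable without actually importing it. `first_available` is a hypothetical name, and `unidecode` and `text_unidecode` are assumed to be the import names of the two packages.

```python
import importlib.util


def first_available(candidates):
    """Return the first importable top-level module name, or None (sketch)."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Prefer Unidecode, fall back to text-unidecode (assumed import names).
backend = first_available(["unidecode", "text_unidecode"])
print(backend)  # depends on what is installed
```
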