Tessl Tile for pypi/python-slugify@8.0.0

tessl/pypi-python-slugify

A Python slugify application that handles Unicode text conversion to URL-friendly slugs

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/python-slugify@8.0.x

To install, run

npx @tessl/cli install tessl/pypi-python-slugify@8.0.0

# Python Slugify

A comprehensive Python library for converting Unicode text strings into URL-friendly slugs. Python Slugify handles complex Unicode characters from various languages by transliterating them to ASCII equivalents, while offering extensive customization options including custom separators, stopword filtering, length limits, regex patterns, and character replacements.

## Package Information

- **Package Name**: python-slugify
- **Language**: Python
- **Installation**: `pip install python-slugify`
- **Optional**: `pip install python-slugify[unidecode]` (for advanced Unicode handling)

## Core Imports

```python
from slugify import slugify
```

Additional utilities and special character mappings:

```python
from slugify import slugify, smart_truncate
from slugify import PRE_TRANSLATIONS, CYRILLIC, GERMAN, GREEK
```

Version and metadata information:

```python
from slugify import __version__, __title__, __author__, __description__
```

Regex patterns and constants (defined in the `slugify.slugify` module):

```python
# Imported from the submodule, since the package root may not re-export these names
from slugify.slugify import DEFAULT_SEPARATOR
from slugify.slugify import CHAR_ENTITY_PATTERN, DECIMAL_PATTERN, HEX_PATTERN
```

## Basic Usage

```python
from slugify import slugify

# Basic text slugification
text = "This is a test ---"
result = slugify(text)
print(result)  # "this-is-a-test"

# Unicode text handling
text = '影師嗎'
result = slugify(text)
print(result)  # "ying-shi-ma"

# Preserve Unicode characters
text = '影師嗎'
result = slugify(text, allow_unicode=True)
print(result)  # "影師嗎"

# Custom separator and length limits
text = 'C\'est déjà l\'été.'
result = slugify(text, separator='_', max_length=15)
print(result)  # "c_est_deja_l_et"

# Using replacement rules (plain string replacements; no separator is
# inserted between "50" and "percent" because the input has no space there)
text = "50% off | great deal"
result = slugify(text, replacements=[['%', 'percent'], ['|', 'or']])
print(result)  # "50percent-off-or-great-deal"
```

## Capabilities

### Text Slugification

The main function for converting text to URL-friendly slugs with comprehensive Unicode support and customization options.

```python { .api }
def slugify(
    text: str,
    entities: bool = True,
    decimal: bool = True,
    hexadecimal: bool = True,
    max_length: int = 0,
    word_boundary: bool = False,
    separator: str = "-",
    save_order: bool = False,
    stopwords: Iterable[str] = (),
    regex_pattern: re.Pattern[str] | str | None = None,
    lowercase: bool = True,
    replacements: Iterable[Iterable[str]] = (),
    allow_unicode: bool = False,
) -> str:
    """
    Convert text into a URL-friendly slug.

    Parameters:
    - text (str): Input text to slugify
    - entities (bool): Convert HTML entities to unicode (default: True)
    - decimal (bool): Convert HTML decimal entities to unicode (default: True)
    - hexadecimal (bool): Convert HTML hexadecimal entities to unicode (default: True)
    - max_length (int): Maximum output length, 0 for no limit (default: 0)
    - word_boundary (bool): Truncate to complete words (default: False)
    - separator (str): Separator between words (default: "-")
    - save_order (bool): Preserve word order when truncating (default: False)
    - stopwords (Iterable[str]): Words to exclude from output (default: ())
    - regex_pattern (re.Pattern[str] | str | None): Custom regex for disallowed characters (default: None)
    - lowercase (bool): Convert to lowercase (default: True)
    - replacements (Iterable[Iterable[str]]): Custom replacement rules (default: ())
    - allow_unicode (bool): Allow Unicode characters in output (default: False)

    Returns:
    str: URL-friendly slug
    """
```
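
To make the order of operations more concrete, here is a rough, standard-library-only sketch of the kind of pipeline such a function runs: decode entities, reduce to ASCII, lowercase, replace disallowed characters, and apply the separator. `sketch_slugify` is a hypothetical name used for illustration and is not part of python-slugify. It drops non-ASCII characters instead of transliterating them, so it will not reproduce results such as "ying-shi-ma", and it ignores stopwords, replacements, and truncation.

```python
import html
import re
import unicodedata


def sketch_slugify(text: str, separator: str = "-") -> str:
    """Hypothetical, simplified stand-in for slugify(); not the library's code."""
    # Decode HTML entities such as &amp; or &#233; into characters.
    text = html.unescape(text)
    # Decompose accented characters, then drop anything outside ASCII.
    # (The real library transliterates via Unidecode/text-unidecode instead.)
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Lowercase, then turn every run of disallowed characters into one dash.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    # Apply the requested separator last.
    return text.replace("-", separator)


print(sketch_slugify("foo &amp; bar"))                     # foo-bar
print(sketch_slugify("C'est déjà l'été.", separator="_"))  # c_est_deja_l_ete
```

Applying the separator as the final step keeps the earlier cleanup rules independent of whichever separator the caller chooses.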

#### Usage Examples

```python
import re

from slugify import slugify

# HTML entity handling
text = "foo &amp; bar"
result = slugify(text)  # "foo-bar"

# Stopword filtering
text = "The quick brown fox"
result = slugify(text, stopwords=['the', 'a', 'an'])  # "quick-brown-fox"

# Custom regex pattern (matches the characters to replace with the separator)
text = "Hello World 123"
pattern = re.compile(r'[^a-z]+')
result = slugify(text, regex_pattern=pattern)  # "hello-world"

# Length limits with word boundaries
text = "This is a very long sentence"
result = slugify(text, max_length=15, word_boundary=True)  # "this-is-a-very"

# Multiple replacement rules (replacements are plain string substitutions,
# so "20%" becomes "20percent" with no separator in between)
text = "Price: $50 | 20% off"
replacements = [['$', 'dollar'], ['%', 'percent'], ['|', 'and']]
result = slugify(text, replacements=replacements)  # "price-dollar50-and-20percent-off"
```

### Smart Text Truncation

Intelligent string truncation with word boundary preservation and order control.

```python { .api }
def smart_truncate(
    string: str,
    max_length: int = 0,
    word_boundary: bool = False,
    separator: str = " ",
    save_order: bool = False,
) -> str:
    """
    Intelligently truncate strings while preserving word boundaries.

    Parameters:
    - string (str): String to truncate
    - max_length (int): Maximum length, 0 for no truncation (default: 0)
    - word_boundary (bool): Respect word boundaries (default: False)
    - separator (str): Word separator (default: " ")
    - save_order (bool): Maintain original word order (default: False)

    Returns:
    str: Truncated string
    """
```

#### Usage Examples

```python
from slugify import smart_truncate

# Basic truncation (a trailing separator left by the cut is stripped)
text = "This is a long sentence"
result = smart_truncate(text, max_length=10)  # "This is a"

# Word boundary preservation
text = "This is a long sentence"
result = smart_truncate(text, max_length=15, word_boundary=True)  # "This is a long"

# Custom separator
text = "word1-word2-word3-word4"
result = smart_truncate(text, max_length=15, word_boundary=True, separator="-")  # "word1-word2"
```
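
Word-boundary truncation can be pictured as keeping whole words for as long as they still fit. The helper below, `truncate_words`, is a hypothetical stand-alone sketch written for illustration. It is not the library's implementation, it does not model `save_order`, and it returns an empty string when even the first word is too long.

```python
def truncate_words(text: str, max_length: int, separator: str = " ") -> str:
    """Hypothetical helper: keep the leading whole words that fit in max_length."""
    if max_length <= 0 or len(text) <= max_length:
        return text
    kept: list[str] = []
    length = 0
    for word in text.split(separator):
        # Account for the separator that joins this word to the previous ones.
        extra = len(word) + (len(separator) if kept else 0)
        if length + extra > max_length:
            break
        kept.append(word)
        length += extra
    return separator.join(kept)


print(truncate_words("This is a long sentence", 15))       # This is a long
print(truncate_words("word1-word2-word3-word4", 15, "-"))  # word1-word2
```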

### Language-Specific Character Mappings

Pre-defined character translation mappings for various languages, useful for custom transliteration workflows.

```python { .api }
# Character mapping lists
CYRILLIC: list[tuple[str, str]]
GERMAN: list[tuple[str, str]]
GREEK: list[tuple[str, str]]
PRE_TRANSLATIONS: list[tuple[str, str]]

def add_uppercase_char(char_list: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """
    Add uppercase variants to character replacement list.

    Parameters:
    - char_list (list[tuple[str, str]]): List of character replacement tuples

    Returns:
    list[tuple[str, str]]: Enhanced list with uppercase variants
    """
```

#### Available Character Mappings

```python
from slugify import CYRILLIC, GERMAN, GREEK, PRE_TRANSLATIONS

# Cyrillic mappings: ё->e, я->ya, х->h, у->y, щ->sch, ю->u (with uppercase variants)
print(CYRILLIC[:3])  # first three tuples; the list holds pairs such as ('ё', 'e') and ('Ё', 'E')

# German umlaut mappings: ä->ae, ö->oe, ü->ue (with uppercase variants)
print(GERMAN[:3])    # first three tuples; the list holds pairs such as ('ä', 'ae') and ('Ä', 'Ae')

# Greek mappings: χ->ch, Ξ->X, ϒ->Y, υ->y, etc. (with uppercase variants)
print(GREEK[:3])     # first three tuples; the list holds pairs such as ('χ', 'ch') and ('Χ', 'Ch')

# Combined mappings from all languages
print(len(PRE_TRANSLATIONS))  # Total count of all mappings
```
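
Because each mapping is a list of `(character, replacement)` pairs, it has the same shape as the `replacements` argument of `slugify()`. The sketch below shows what applying such pairs amounts to, using only the standard library. `german_pairs` is a hand-written, hypothetical subset of the German mapping, and `apply_pairs` is an illustrative helper rather than a library function.

```python
# Hypothetical subset mirroring the German pairs described above.
german_pairs = [
    ("ä", "ae"), ("ö", "oe"), ("ü", "ue"),
    ("Ä", "Ae"), ("Ö", "Oe"), ("Ü", "Ue"),
]


def apply_pairs(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply each (old, new) pair in order as a plain string replacement."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


print(apply_pairs("Über Müller", german_pairs))  # Ueber Mueller
```

With the library installed, the same pairs could be passed as `slugify(text, replacements=GERMAN)` so that umlauts expand to two letters before transliteration.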

### Command Line Interface

Python Slugify provides a command-line interface for text slugification with full parameter support.

```python { .api }
def main(argv: list[str] | None = None):
    """
    Command-line entry point for slugification.

    Parameters:
    - argv (list[str] | None): Command line arguments (default: None uses sys.argv)
    """
```

#### Command Line Usage

```bash
# Basic usage
slugify "Hello World"  # Output: hello-world

# From stdin
echo "Hello World" | slugify --stdin

# With options
slugify "Hello World" --separator="_" --max-length=8  # Output: hello_wo

# Custom replacements (single quotes stop the shell from expanding $5)
slugify 'Price: $50' --replacements '$->dollar'  # Output: price-dollar50

# Custom regex pattern
slugify "Keep_underscores" --regex-pattern "[^-a-z0-9_]+"  # Output: keep_underscores

# Allow unicode
slugify "影師嗎" --allow-unicode  # Output: 影師嗎

# Complex combination (the slug is cut at exactly 20 characters)
slugify "The ÜBER café costs 50%" --stopwords "the" --replacements "Ü->UE" "%->percent" --max-length=20
# Output: ueber-cafe-costs-50p

# Help
slugify --help
```

#### Command Line Parameters

All `slugify()` function parameters are available as command-line options:

- `--separator`: Custom separator (default: "-")
- `--max-length`: Maximum output length
- `--word-boundary`: Truncate to complete words
- `--save-order`: Preserve word order when truncating
- `--stopwords`: Space-separated list of words to exclude
- `--regex-pattern`: Custom regex for disallowed characters
- `--no-lowercase`: Disable lowercase conversion
- `--replacements`: Replacement rules in format "old->new"
- `--allow-unicode`: Allow Unicode characters
- `--no-entities`: Disable HTML entity conversion
- `--no-decimal`: Disable HTML decimal conversion
- `--no-hexadecimal`: Disable HTML hexadecimal conversion
- `--stdin`: Read input from stdin
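
The "old->new" format used by `--replacements` maps naturally onto the `[old, new]` pairs that `slugify()` accepts. A minimal sketch of that conversion follows. `parse_rules` is a hypothetical helper written for illustration, and the CLI's own parsing and error reporting may differ.

```python
def parse_rules(rules: list[str], sep: str = "->") -> list[list[str]]:
    """Hypothetical helper: turn "old->new" strings into [old, new] pairs."""
    pairs = []
    for rule in rules:
        if sep not in rule:
            raise ValueError(f"expected OLD{sep}NEW, got {rule!r}")
        # Split on the first separator only, so NEW may itself contain "->".
        pairs.append(rule.split(sep, 1))
    return pairs


print(parse_rules(["|->or", "%->percent"]))  # [['|', 'or'], ['%', 'percent']]
```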

## Types and Constants

```python { .api }
# Default separator constant
DEFAULT_SEPARATOR: str = "-"

# Regex patterns for text processing
CHAR_ENTITY_PATTERN: re.Pattern[str]      # Named HTML character entities (e.g. &amp;)
DECIMAL_PATTERN: re.Pattern[str]          # HTML decimal references (e.g. &#233;)
HEX_PATTERN: re.Pattern[str]              # HTML hexadecimal references (e.g. &#xE9;)
QUOTE_PATTERN: re.Pattern[str]            # Quote (apostrophe) characters
DISALLOWED_CHARS_PATTERN: re.Pattern[str] # Characters disallowed in ASCII slugs
DISALLOWED_UNICODE_CHARS_PATTERN: re.Pattern[str] # Characters disallowed when allow_unicode=True
DUPLICATE_DASH_PATTERN: re.Pattern[str]   # Runs of duplicate dashes
NUMBERS_PATTERN: re.Pattern[str]          # Commas inside numbers (e.g. 1,000)
```

### Package Metadata

```python { .api }
# Version and package information
__version__: str        # Package version (e.g., "8.0.4")
__title__: str          # Package title ("python-slugify")
__author__: str         # Package author ("Val Neekman")
__author_email__: str   # Author email ("info@neekware.com")
__description__: str    # Package description
__url__: str            # Package URL ("https://github.com/un33k/python-slugify")
__license__: str        # License ("MIT")
__copyright__: str      # Copyright notice
```

#### Usage Examples

```python
from slugify import __version__, __title__, __author__

print(f"{__title__} version {__version__} by {__author__}")
# Example output (the version depends on the installed release):
# python-slugify version 8.0.4 by Val Neekman
```

## Error Handling

Python Slugify handles several edge cases without raising, but it does not validate every input:

- **Input types**: `text` is expected to be a `str`; other types such as `int` or `None` are not converted to strings and raise `TypeError`
- **HTML entity errors**: Text that merely resembles a numeric entity (such as `&#invalid;`) is treated as ordinary characters, and errors raised while converting decimal or hexadecimal references are suppressed
- **Empty input**: Returns an empty string for empty or whitespace-only input
- **Unicode normalization**: Normalizes the input (NFKD, or NFKC when `allow_unicode=True`) before processing
- **Regex pattern errors**: The default pattern is used only when `regex_pattern` is not supplied; an invalid custom pattern raises `re.error`

```python
from slugify import slugify

# Empty and whitespace-only input
result = slugify("")          # ""
result = slugify("   ")       # ""

# Non-string input is not coerced; convert it explicitly
result = slugify(str(123))    # "123"

# Text that only looks like a numeric entity is slugified as plain text
result = slugify("&#invalid;")  # "invalid"
```
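
When callers cannot guarantee that the input is already a `str`, one defensive option is to normalize it before calling `slugify()`. The helper below is a hypothetical sketch (`to_text` is not part of python-slugify). Mapping `None` to an empty string is an assumption that suits some applications and not others.

```python
def to_text(value: object, encoding: str = "utf-8") -> str:
    """Hypothetical helper: coerce arbitrary input to str before slugifying."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        # Undecodable bytes are replaced rather than raising.
        return value.decode(encoding, errors="replace")
    return str(value)


print(to_text(123))             # 123
print(to_text(b"caf\xc3\xa9"))  # café
```

The result can then be passed on, for example as `slugify(to_text(value))`.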

## Dependencies

- **Required**: `text-unidecode>=1.3` (GPL & Perl Artistic license)
- **Optional**: `Unidecode>=1.1.1` (install with `pip install python-slugify[unidecode]`)

The package automatically uses `Unidecode` if available, otherwise falls back to `text-unidecode`. `Unidecode` is considered more advanced for Unicode transliteration but has different licensing terms.
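
A preference order like this is commonly expressed as "import the first module that is installed". The sketch below illustrates the idea with the standard library only. `load_first_available` is a hypothetical helper, and the package's actual import code may be structured differently.

```python
import importlib
import importlib.util


def load_first_available(*names: str):
    """Hypothetical helper: import and return the first module that is installed."""
    for name in names:
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    raise ImportError(f"none of {names} is installed")


# Demonstrated with a standard-library module so the example runs anywhere;
# for this package the call would be
# load_first_available("unidecode", "text_unidecode").
module = load_first_available("no_such_module_xyz", "json")
print(module.__name__)  # json
```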