# Utility Functions

Furl provides utility functions for URL parsing, scheme and path manipulation, validation, and encoding. They can be used on their own or alongside the main furl API.

## Capabilities

### URL Parsing and Splitting

Wrappers around Python's `urllib.parse.urlsplit()` and `urllib.parse.urljoin()`. The standard functions behave differently depending on a URL's scheme (for example, `urllib.parse.urljoin()` only resolves relative references for schemes listed in `urllib.parse.uses_relative`). These wrappers aim to split and join URLs consistently for any scheme.

```python { .api }
def urlsplit(url):
    """
    Split a URL into its components, consistently for any scheme.

    Args:
        url (str): URL string to split

    Returns:
        SplitResult: Named tuple with the URL components (scheme, netloc,
        path, query, fragment), as returned by urllib.parse.urlsplit()
    """

def urljoin(base, url):
    """
    Join a base URL with a relative or absolute URL, consistently for any scheme.

    Args:
        base (str): Base URL string
        url (str): URL string to join (relative or absolute)

    Returns:
        str: Joined URL string
    """
```
**Usage:**

```python
from furl import urlsplit, urljoin

# Split a URL into its components
result = urlsplit('https://user:pass@example.com:8080/path?query=value#fragment')
print(result.scheme)    # 'https'
print(result.netloc)    # 'user:pass@example.com:8080'
print(result.path)      # '/path'
print(result.query)     # 'query=value'
print(result.fragment)  # 'fragment'

# Join a relative URL onto a base URL
base = 'https://example.com/api/v1/'
endpoint = 'users/123'
full_url = urljoin(base, endpoint)
print(full_url)  # 'https://example.com/api/v1/users/123'

# Joining an absolute URL replaces the base
absolute = 'https://different.com/other'
result = urljoin(base, absolute)
print(result)  # 'https://different.com/other'
```
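A wrapper is useful because the standard library's `urllib.parse.urljoin()` resolves relative references only for schemes listed in `urllib.parse.uses_relative`. As a result, `urljoin('asdf://example.com/a/', 'hi')` returns just `'hi'`. The sketch below shows one way around this, assuming a "join as `http`, then restore the scheme" approach. `scheme_agnostic_urljoin` is a hypothetical name used only for illustration. This is not furl's own code.

```python
from urllib.parse import urljoin as std_urljoin
from urllib.parse import urlsplit as std_urlsplit


def scheme_agnostic_urljoin(base: str, url: str) -> str:
    """Hypothetical helper: join as if base used 'http', then restore base's scheme."""
    base_scheme = std_urlsplit(base).scheme
    if not base_scheme:
        return std_urljoin(base, url)

    # 'http' is listed in urllib.parse.uses_relative, so relative resolution applies.
    http_base = 'http' + base[len(base_scheme):]
    joined = std_urljoin(http_base, url)

    # An absolute url keeps its own scheme; otherwise put the base's scheme back.
    if std_urlsplit(url).scheme:
        return joined
    return base_scheme + joined[len('http'):]


print(std_urljoin('asdf://example.com/a/', 'hi'))              # hi
print(scheme_agnostic_urljoin('asdf://example.com/a/', 'hi'))  # asdf://example.com/a/hi
print(scheme_agnostic_urljoin('asdf://example.com/a/', 'https://other.com/x'))  # https://other.com/x
```

The original scheme is restored only when the joined URL was derived from the base. An absolute `url` therefore keeps its own scheme, matching the "absolute URL replaces the base" behavior shown in the usage example.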
### Scheme Manipulation

Functions for extracting, validating, and manipulating URL schemes.

```python { .api }
def get_scheme(url):
    """
    Extract the scheme from a URL string.

    Args:
        url (str): URL string

    Returns:
        str | None: Scheme string, or None if the URL has no scheme
    """

def strip_scheme(url):
    """
    Remove the scheme and its ':' separator from a URL string.

    Args:
        url (str): URL string

    Returns:
        str: URL string without the scheme (a leading '//' is kept)
    """

def set_scheme(url, scheme):
    """
    Set or replace the scheme of a URL string.

    Args:
        url (str): URL string
        scheme (str): New scheme to set

    Returns:
        str: URL string with the new scheme
    """

def is_valid_scheme(scheme):
    """
    Validate URL scheme syntax.

    Args:
        scheme (str): Scheme string to validate

    Returns:
        bool: True if the scheme is valid
    """

def has_netloc(url):
    """
    Check whether a URL has a network location (netloc) component.

    Args:
        url (str): URL string to check

    Returns:
        bool: True if the URL has a netloc
    """
```
**Usage:**

```python
from furl import get_scheme, strip_scheme, set_scheme, is_valid_scheme, has_netloc

url = 'https://example.com/path'

# Extract scheme
scheme = get_scheme(url)
print(scheme)  # 'https'

# Remove scheme (the '//' before the host is kept)
no_scheme = strip_scheme(url)
print(no_scheme)  # '//example.com/path'

# Set new scheme
ftp_url = set_scheme(url, 'ftp')
print(ftp_url)  # 'ftp://example.com/path'

# Validate scheme
print(is_valid_scheme('https'))   # True
print(is_valid_scheme('ht tps'))  # False (spaces are not allowed)

# Check for network location
print(has_netloc('https://example.com/path'))  # True
print(has_netloc('/just/a/path'))              # False
```
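RFC 3986 (section 3.1) defines a scheme as a letter followed by any number of letters, digits, `+`, `-`, or `.`. Under that grammar, `ht-tps` and `svn+ssh` are syntactically valid, while a scheme that starts with a digit or contains a space is not. The following is a hypothetical stand-alone check written directly from that grammar. `rfc3986_scheme_ok` is not furl's `is_valid_scheme()` and may differ from it in edge cases.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r'[A-Za-z][A-Za-z0-9+.\-]*')


def rfc3986_scheme_ok(scheme: str) -> bool:
    """Hypothetical helper: True if `scheme` matches the RFC 3986 scheme grammar."""
    return SCHEME_RE.fullmatch(scheme) is not None


for candidate in ['https', 'svn+ssh', 'ht-tps', '1https', 'ht tps', '']:
    print(repr(candidate), rfc3986_scheme_ok(candidate))
```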
### Path Manipulation Utilities

Functions for manipulating lists of URL path segments.

```python { .api }
def join_path_segments(*args):
    """
    Join multiple lists of path segments into one list, merging empty
    boundary segments so that the intended slashes are preserved.

    Args:
        *args: Lists of path segments (lists of strings) to join

    Returns:
        list: Joined list of path segments
    """

def remove_path_segments(segments, remove):
    """
    Remove the segments in `remove` from the end of `segments`.

    Args:
        segments (list): List of path segments
        remove (list): Trailing segments to remove

    Returns:
        list: Remaining segments, or `segments` unchanged if `remove`
        does not match its end
    """

def quacks_like_a_path_with_segments(obj):
    """
    Duck-typing check for path-like objects with segments.

    Args:
        obj: Object to check

    Returns:
        bool: True if the object behaves like a path with segments
    """
```
**Usage:**

```python
from furl import Path, join_path_segments, remove_path_segments, quacks_like_a_path_with_segments

# Join lists of path segments
joined = join_path_segments(['api', 'v1'], ['users', '123'])
print(joined)  # ['api', 'v1', 'users', '123']

# Remove trailing segments: '/api/v1/users' minus 'v1/users' leaves '/api/'
segments = ['', 'api', 'v1', 'users']
updated = remove_path_segments(segments, ['v1', 'users'])
print(updated)  # ['', 'api', '']

# Duck typing check
path_obj = Path('/api/v1/users')
print(quacks_like_a_path_with_segments(path_obj))  # True
print(quacks_like_a_path_with_segments('string'))  # False
```
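Segment lists like the ones passed to `remove_path_segments()` are easier to read with a mapping back to path strings in mind. The sketch below assumes a plain split on `/`: an empty first segment stands for a leading slash, and an empty last segment stands for a trailing slash. `path_to_segments` and `segments_to_path` are hypothetical helpers, and percent-encoding is ignored here.

```python
def path_to_segments(path: str) -> list[str]:
    """Hypothetical helper: '/api/v1/' -> ['', 'api', 'v1', '']."""
    return path.split('/') if path else []


def segments_to_path(segments: list[str]) -> str:
    """Hypothetical helper: ['', 'api', ''] -> '/api/'."""
    return '/'.join(segments)


print(path_to_segments('/api/v1/users'))                # ['', 'api', 'v1', 'users']
print(segments_to_path(['', 'api', '']))                # /api/
print(segments_to_path(['api', 'v1', 'users', '123']))  # api/v1/users/123
```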
### Validation Functions

Functions for validating various URL components and formats.

```python { .api }
def is_valid_host(hostname):
    """
    Validate hostname format.

    Args:
        hostname (str): Hostname to validate

    Returns:
        bool: True if hostname is valid
    """

def is_valid_port(port):
    """
    Validate port number.

    Args:
        port (int | str): Port number to validate

    Returns:
        bool: True if port is valid (1-65535)
    """

def is_valid_encoded_path_segment(segment):
    """
    Validate percent-encoded path segment.

    Args:
        segment (str): Path segment to validate

    Returns:
        bool: True if segment is properly encoded
    """

def is_valid_encoded_query_key(key):
    """
    Validate percent-encoded query parameter key.

    Args:
        key (str): Query key to validate

    Returns:
        bool: True if key is properly encoded
    """

def is_valid_encoded_query_value(value):
    """
    Validate percent-encoded query parameter value.

    Args:
        value (str): Query value to validate

    Returns:
        bool: True if value is properly encoded
    """
```
**Usage:**

```python
from furl import (is_valid_host, is_valid_port, is_valid_encoded_path_segment,
                  is_valid_encoded_query_key, is_valid_encoded_query_value)

# Validate hostname
print(is_valid_host('example.com'))      # True
print(is_valid_host('sub.example.com'))  # True
print(is_valid_host('192.168.1.1'))      # True
print(is_valid_host('invalid..host'))    # False

# Validate port
print(is_valid_port(80))     # True
print(is_valid_port('443'))  # True
print(is_valid_port(0))      # False
print(is_valid_port(99999))  # False

# Validate encoded components
print(is_valid_encoded_path_segment('users'))        # True
print(is_valid_encoded_path_segment('user%20name'))  # True
print(is_valid_encoded_path_segment('user name'))    # False (not encoded)

print(is_valid_encoded_query_key('search_term'))    # True
print(is_valid_encoded_query_key('search%20term'))  # True

print(is_valid_encoded_query_value('hello%20world'))  # True
print(is_valid_encoded_query_value('hello world'))    # False
```
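For comparison, here is a hypothetical, stricter ASCII hostname and port check built from common rules. Each hostname label must be 1 to 63 letters, digits, or hyphens and must not start or end with a hyphen (RFC 1123). Ports must fall in 1-65535, the range given above. `strict_host_ok` and `strict_port_ok` are illustrative names, not furl functions, and the exact rules furl applies may differ.

```python
import re

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending with '-'.
LABEL_RE = re.compile(r'[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?')
PORT_RE = re.compile(r'[0-9]{1,5}')


def strict_host_ok(hostname: str) -> bool:
    """Hypothetical helper: ASCII hostname check (no IDNA, no IPv6 literals)."""
    if hostname.endswith('.'):  # allow one trailing dot (fully qualified name)
        hostname = hostname[:-1]
    return all(LABEL_RE.fullmatch(label) for label in hostname.split('.'))


def strict_port_ok(port) -> bool:
    """Hypothetical helper: True for an int or ASCII digit string in 1..65535."""
    text = str(port)
    return PORT_RE.fullmatch(text) is not None and 0 < int(text) <= 65535


print(strict_host_ok('sub.example.com'))  # True
print(strict_host_ok('invalid..host'))    # False
print(strict_port_ok('443'))              # True
print(strict_port_ok(99999))              # False
```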
### Encoding and Decoding Utilities

Functions for character encoding, IDNA (Internationalizing Domain Names in Applications) hostname conversion, and related type checks.

```python { .api }
def utf8(obj, default=None):
    """
    Encode an object as UTF-8 bytes.

    Args:
        obj: Object to encode (typically a str)
        default: Value to return if encoding fails

    Returns:
        bytes: UTF-8 encoded bytes, or `default` if encoding fails
    """

def idna_encode(hostname):
    """
    Convert a Unicode hostname to its ASCII-compatible IDNA (punycode) form.

    Args:
        hostname (str): Hostname to encode

    Returns:
        str: IDNA-encoded hostname
    """

def idna_decode(hostname):
    """
    Decode an IDNA-encoded hostname back to Unicode.

    Args:
        hostname (str): IDNA-encoded hostname

    Returns:
        str: Decoded hostname
    """

def attemptstr(obj):
    """
    Attempt to convert an object to a string.

    Args:
        obj: Object to convert

    Returns:
        str: String representation, or the original object if conversion fails
    """

def non_string_iterable(obj):
    """
    Check whether an object is iterable but not a string.

    Args:
        obj: Object to check

    Returns:
        bool: True if the object is iterable and not a string
    """
```
**Usage:**

```python
from furl import utf8, idna_encode, idna_decode, attemptstr, non_string_iterable

# UTF-8 encoding (the result is bytes)
data = utf8('Hello 世界')
print(data)  # b'Hello \xe4\xb8\x96\xe7\x95\x8c'

# IDNA encoding for internationalized domain names
international_domain = 'тест.example'
encoded = idna_encode(international_domain)
print(encoded)  # 'xn--e1aybc.example'

# IDNA decoding
decoded = idna_decode(encoded)
print(decoded)  # 'тест.example'

# String conversion
print(attemptstr(123))    # '123'
print(attemptstr(['a']))  # "['a']"

# Check for non-string iterables
print(non_string_iterable(['a', 'b']))  # True
print(non_string_iterable('string'))    # False
print(non_string_iterable(123))         # False
```
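Python's built-in `idna` codec (IDNA 2003, RFC 3490) performs this kind of hostname conversion. The sketch below uses the codec directly through two hypothetical helpers, `to_ascii_host` and `to_unicode_host`. Whether furl's `idna_encode()` and `idna_decode()` behave identically in every case is an assumption.

```python
def to_ascii_host(hostname: str) -> str:
    """Hypothetical helper: Unicode hostname -> ASCII-compatible (punycode) form."""
    return hostname.encode('idna').decode('ascii')


def to_unicode_host(hostname: str) -> str:
    """Hypothetical helper: ASCII-compatible hostname -> Unicode form."""
    return hostname.encode('ascii').decode('idna')


encoded = to_ascii_host('тест.example')
print(encoded)                   # xn--e1aybc.example
print(to_unicode_host(encoded))  # тест.example
```

The built-in codec follows IDNA 2003. The third-party `idna` package implements IDNA 2008, which maps some characters (such as `ß`) differently.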
### Helper Functions

Additional utility functions for common operations.

```python { .api }
def lget(lst, index, default=None):
    """
    Safe list index access with a default value.

    Args:
        lst (list): List to access
        index (int): Index to access
        default: Value to return if the index is out of range

    Returns:
        Any: List item or default value
    """

def static_vars(**kwargs):
    """
    Decorator factory that attaches static variables to a function as attributes.

    Args:
        **kwargs: Static variables to add

    Returns:
        function: Decorator that adds the static variables to the decorated function
    """

def create_quote_fn(safe_charset, quote_plus):
    """
    Create a custom URL quoting function.

    Args:
        safe_charset (str): Characters considered safe (not quoted)
        quote_plus (bool): Use '+' for spaces instead of '%20'

    Returns:
        function: Custom quoting function
    """
```
**Usage:**

```python
from furl import lget

# Safe list access
items = ['a', 'b', 'c']
print(lget(items, 1))              # 'b'
print(lget(items, 10))             # None
print(lget(items, 10, 'default'))  # 'default'

# Safe access with empty list
empty = []
print(lget(empty, 0, 'fallback'))  # 'fallback'
```
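`static_vars` and `create_quote_fn` have no usage example. The self-contained sketch below illustrates what each describes, under stated assumptions. `with_static_vars` assumes the decorator simply sets each keyword argument as an attribute on the function. `make_quote_fn` is a hypothetical one-argument analogue of `create_quote_fn` built on `urllib.parse.quote`, and furl's real returned function may take different arguments.

```python
import re
from urllib.parse import quote


def with_static_vars(**kwargs):
    """Decorator factory: attach each keyword argument to the function as an attribute."""
    def decorate(func):
        for name, value in kwargs.items():
            setattr(func, name, value)
        return func
    return decorate


@with_static_vars(pattern=re.compile(r'[0-9]+'))
def is_all_digits(text: str) -> bool:
    # The compiled pattern is created once and stored on the function object.
    return is_all_digits.pattern.fullmatch(text) is not None


def make_quote_fn(safe_charset: str, quote_plus: bool):
    """Hypothetical analogue of create_quote_fn: returns a one-argument quoter."""
    def quote_fn(text: str) -> str:
        quoted = quote(text, safe=safe_charset)
        # With quote_plus, spaces become '+' instead of '%20'.
        return quoted.replace('%20', '+') if quote_plus else quoted
    return quote_fn


quote_query = make_quote_fn(safe_charset='/', quote_plus=True)
print(is_all_digits('123'))  # True
print(quote_query('a b/c'))  # a+b/c
```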
## Constants and Configuration

### Default Ports Mapping

Dictionary mapping URL schemes to their default ports.

```python { .api }
DEFAULT_PORTS = {
    'http': 80,
    'https': 443,
    'ftp': 21,
    'ssh': 22,
    'telnet': 23,
    # ... and 34 more common protocols
}
```
**Usage:**

```python
from furl import DEFAULT_PORTS

# Check default port for scheme
print(DEFAULT_PORTS.get('https'))    # 443
print(DEFAULT_PORTS.get('ftp'))      # 21
print(DEFAULT_PORTS.get('unknown'))  # None

# List all supported schemes
print(list(DEFAULT_PORTS.keys()))
```
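A common use of such a table is URL normalization. An explicit port equal to the scheme's default can be dropped without changing where the URL points. Below is a minimal standard-library sketch. It uses a hypothetical five-entry subset of the mapping shown above, and `drop_default_port` is an illustrative name, not a furl function.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical subset of the scheme -> default port mapping shown above.
DEFAULT_PORTS = {'http': 80, 'https': 443, 'ftp': 21, 'ssh': 22, 'telnet': 23}


def drop_default_port(url: str) -> str:
    """Remove an explicit port when it equals the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None and DEFAULT_PORTS.get(parts.scheme) == parts.port:
        # netloc ends with ':<port>'; strip it (this also works for '[::1]:80').
        parts = parts._replace(netloc=parts.netloc.rsplit(':', 1)[0])
    return urlunsplit(parts)


print(drop_default_port('https://example.com:443/a'))   # https://example.com/a
print(drop_default_port('https://example.com:8443/a'))  # https://example.com:8443/a
```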
### Validation Patterns

Regular expression patterns and character sets used for validation.

```python { .api }
PERCENT_REGEX = r'\%[a-fA-F\d][a-fA-F\d]'  # Pattern for percent-encoded chars
INVALID_HOST_CHARS = '!@#$%^&\'"*()+=:;/'  # Invalid characters in hostnames
```
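As an illustration of combining `PERCENT_REGEX` with a set of allowed characters, the sketch below checks whether a path segment is already percent-encoded. The allowed characters are RFC 3986's `pchar` set: unreserved characters, sub-delimiters, `:`, and `@`. That set is an assumption for this example and not necessarily the exact one furl's `is_valid_encoded_path_segment()` uses. `looks_percent_encoded` is a hypothetical helper.

```python
import re

PERCENT_REGEX = r'\%[a-fA-F\d][a-fA-F\d]'  # as defined above

# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
SEGMENT_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%s)*" % PERCENT_REGEX)


def looks_percent_encoded(segment: str) -> bool:
    """True if every character is allowed as-is or is part of a %XX escape."""
    return SEGMENT_RE.fullmatch(segment) is not None


print(re.findall(PERCENT_REGEX, 'user%20name%2F'))  # ['%20', '%2F']
print(looks_percent_encoded('user%20name'))         # True
print(looks_percent_encoded('user name'))           # False
print(looks_percent_encoded('100%'))                # False
```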
## Error Handling

Error behavior varies by function:

- **Invalid values**: The `is_valid_*` functions report an invalid value by returning `False`.
- **Encoding errors**: `utf8()` returns `default` when its argument cannot be encoded, and `attemptstr()` returns the original object when it cannot be converted to a string.
- **Malformed URLs**: `urlsplit()` and `urljoin()` build on `urllib.parse`, so errors raised there (for example, `ValueError` for an unbalanced IPv6 bracket) may propagate.
- **IDNA errors**: `idna_encode()` and `idna_decode()` may raise `UnicodeError` for hostnames the IDNA codec rejects, such as names with empty labels.

```python
from furl import idna_encode, is_valid_port, utf8

# Invalid values and failed conversions produce a result instead of an exception
print(is_valid_port('not-a-port'))  # False
print(utf8(None, 'fallback'))       # 'fallback'

# IDNA encoding errors
try:
    result = idna_encode('invalid..domain')
except UnicodeError as e:
    print(f"Encoding error: {e}")
```