
# URL Extraction

Core functionality for extracting URL components using the convenience `extract()` function. This provides the most common use case with sensible defaults and handles the majority of URL parsing scenarios.

## Capabilities

### Basic Extraction

The primary extraction function separates any URL-like string into its subdomain, domain, and public suffix components.

```python { .api }
def extract(
    url: str,
    include_psl_private_domains: bool | None = False,
    session: requests.Session | None = None
) -> ExtractResult:
    """
    Extract subdomain, domain, and suffix from a URL string.

    Parameters:
    - url: URL string to parse (can include protocol, port, path)
    - include_psl_private_domains: Include PSL private domains like 'blogspot.com'
    - session: Optional requests.Session for HTTP customization

    Returns:
    ExtractResult with parsed components and metadata
    """
```

**Usage Examples:**

```python
import tldextract

# Standard domains
result = tldextract.extract('http://www.google.com')
print(result)
# ExtractResult(subdomain='www', domain='google', suffix='com', is_private=False)

# Complex country code TLDs
result = tldextract.extract('http://forums.bbc.co.uk/')
print(result)
# ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk', is_private=False)

# Subdomains with multiple levels
result = tldextract.extract('http://forums.news.cnn.com/')
print(result)
# ExtractResult(subdomain='forums.news', domain='cnn', suffix='com', is_private=False)

# International domains
result = tldextract.extract('http://www.worldbank.org.kg/')
print(result)
# ExtractResult(subdomain='www', domain='worldbank', suffix='org.kg', is_private=False)
```

### Private Domain Handling

Control how PSL private domains are handled during extraction. Private domains are organizational domains like 'blogspot.com' that allow subdomain registration.

```python
# Default behavior - treat private domains as regular domains
result = tldextract.extract('waiterrant.blogspot.com')
print(result)
# ExtractResult(subdomain='waiterrant', domain='blogspot', suffix='com', is_private=False)

# Include private domains in suffix
result = tldextract.extract('waiterrant.blogspot.com', include_psl_private_domains=True)
print(result)
# ExtractResult(subdomain='', domain='waiterrant', suffix='blogspot.com', is_private=True)
```

### Edge Case Handling

The library gracefully handles various edge cases including IP addresses, invalid suffixes, and malformed URLs.

```python
# IP addresses
result = tldextract.extract('http://127.0.0.1:8080/deployed/')
print(result)
# ExtractResult(subdomain='', domain='127.0.0.1', suffix='', is_private=False)

# IPv6 addresses
result = tldextract.extract('http://[2001:db8::1]/path')
print(result.domain)  # '[2001:db8::1]'

# No subdomain
result = tldextract.extract('google.com')
print(result)
# ExtractResult(subdomain='', domain='google', suffix='com', is_private=False)

# Invalid suffixes
result = tldextract.extract('google.notavalidsuffix')
print(result)
# ExtractResult(subdomain='google', domain='notavalidsuffix', suffix='', is_private=False)
```

### Session Customization

Provide a custom HTTP session for PSL fetching to support proxies, authentication, or other HTTP customizations.

```python
import requests
import tldextract

# Create custom session with proxy
session = requests.Session()
session.proxies = {'http': 'http://proxy.example.com:8080'}

# Use custom session for PSL fetching
result = tldextract.extract('http://example.com', session=session)
```

### Update Functionality

Force an update of the cached Public Suffix List data to get the latest TLD definitions.

```python { .api }
def update(fetch_now: bool = False, session: requests.Session | None = None) -> None:
    """
    Force update of cached PSL data.

    Parameters:
    - fetch_now: Whether to fetch immediately rather than on next extraction
    - session: Optional requests.Session for HTTP customization
    """
```

**Usage Example:**

```python
import tldextract

# Force update of PSL data
tldextract.update(fetch_now=True)

# Use after update
result = tldextract.extract('http://example.new-tld')
```

## Return Value

All extraction functions return an `ExtractResult` object with the following structure:

```python { .api }
@dataclass
class ExtractResult:
    subdomain: str        # All subdomains, empty string if none
    domain: str           # Main domain name
    suffix: str           # Public suffix (TLD), empty string if none/invalid
    is_private: bool      # Whether suffix is from PSL private domains
    registry_suffix: str  # Registry suffix (internal)
```

The `ExtractResult` provides additional properties and methods for working with the parsed components - see [Result Processing](./result-processing.md) for complete details.

## Error Handling

The extraction functions are designed to never raise exceptions for malformed input. Invalid or unparseable URLs will return sensible fallback values:

- Invalid URLs return the entire input as the `domain` with empty `subdomain` and `suffix`
- IP addresses are detected and returned as the `domain` with empty `suffix`
- Network errors during PSL fetching fall back to the bundled snapshot
- Malformed PSL data is handled gracefully with logging warnings