# tldextract

Accurately separates a URL's subdomain, domain, and public suffix using the Public Suffix List (PSL). This library provides robust URL parsing that handles complex domain structures including country code TLDs (ccTLDs), generic TLDs (gTLDs), and their exceptions that naive string splitting cannot parse correctly.

## Package Information

- **Package Name**: tldextract
- **Language**: Python
- **Installation**: `pip install tldextract`

## Core Imports

```python
import tldextract
```

For basic usage, all functionality is available through the main module:

```python
from tldextract import extract, TLDExtract, ExtractResult, __version__
```

## Basic Usage

```python
import tldextract

# Basic URL extraction
result = tldextract.extract('http://forums.news.cnn.com/')
print(result)
# ExtractResult(subdomain='forums.news', domain='cnn', suffix='com', is_private=False)

# Access individual components
print(f"Subdomain: {result.subdomain}")  # 'forums.news'
print(f"Domain: {result.domain}")        # 'cnn'
print(f"Suffix: {result.suffix}")        # 'com'

# Reconstruct full domain name
print(result.fqdn)  # 'forums.news.cnn.com'

# Handle complex TLDs
uk_result = tldextract.extract('http://forums.bbc.co.uk/')
print(uk_result)
# ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk', is_private=False)

# Handle edge cases
ip_result = tldextract.extract('http://127.0.0.1:8080/path')
print(ip_result)
# ExtractResult(subdomain='', domain='127.0.0.1', suffix='', is_private=False)
```

## Architecture

The tldextract library uses the authoritative Public Suffix List (PSL) to make parsing decisions:

- **Public Suffix List (PSL)**: Maintained list of all known public suffixes under which domain registration is possible
- **Caching System**: Local caching of PSL data to avoid repeated HTTP requests
- **Fallback Mechanism**: Built-in snapshot for offline operation
- **Private Domains**: Optional support for PSL private domains (like blogspot.com)

The library automatically fetches and caches the latest PSL data on first use, falling back to a bundled snapshot if network access is unavailable.

## Capabilities

### URL Extraction

Core functionality for extracting URL components using the convenience `extract()` function. This provides the most common use case with sensible defaults.

```python { .api }
def extract(
    url: str,
    include_psl_private_domains: bool | None = False,
    session: requests.Session | None = None
) -> ExtractResult
```

[URL Extraction](./url-extraction.md)

### Configurable Extraction

Advanced extraction with custom configuration options including cache settings, custom suffix lists, and private domain handling through the `TLDExtract` class.

```python { .api }
class TLDExtract:
    def __init__(
        self,
        cache_dir: str | None = None,
        suffix_list_urls: Sequence[str] = PUBLIC_SUFFIX_LIST_URLS,
        fallback_to_snapshot: bool = True,
        include_psl_private_domains: bool = False,
        extra_suffixes: Sequence[str] = (),
        cache_fetch_timeout: str | float | None = CACHE_TIMEOUT
    ) -> None

    def __call__(
        self,
        url: str,
        include_psl_private_domains: bool | None = None,
        session: requests.Session | None = None
    ) -> ExtractResult

    def extract_str(
        self,
        url: str,
        include_psl_private_domains: bool | None = None,
        session: requests.Session | None = None
    ) -> ExtractResult

    def extract_urllib(
        self,
        url: urllib.parse.ParseResult | urllib.parse.SplitResult,
        include_psl_private_domains: bool | None = None,
        session: requests.Session | None = None
    ) -> ExtractResult

    def update(
        self,
        fetch_now: bool = False,
        session: requests.Session | None = None
    ) -> None

    def tlds(self, session: requests.Session | None = None) -> list[str]
```

[Configurable Extraction](./configurable-extraction.md)

### Result Processing

Comprehensive result handling with properties for reconstructing domains, handling IP addresses, and accessing metadata about the extraction process.

```python { .api }
@dataclass
class ExtractResult:
    subdomain: str
    domain: str
    suffix: str
    is_private: bool
    registry_suffix: str

    @property
    def fqdn(self) -> str

    @property
    def ipv4(self) -> str

    @property
    def ipv6(self) -> str

    @property
    def registered_domain(self) -> str

    @property
    def reverse_domain_name(self) -> str

    @property
    def top_domain_under_public_suffix(self) -> str

    @property
    def top_domain_under_registry_suffix(self) -> str
```

[Result Processing](./result-processing.md)

### Command Line Interface

Command-line tool for URL parsing with options for output formatting, cache management, and PSL updates.

```bash { .api }
tldextract [options] <url1> [url2] ...
```
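
For instance, a typical session might look like this (output shape as printed by the current CLI; treat exact flags as version-dependent):

```console
$ tldextract http://forums.bbc.co.uk
forums bbc co.uk

$ tldextract --update   # re-fetch the Public Suffix List into the cache
```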

[Command Line Interface](./cli.md)

### PSL Data Management

Functions for updating and managing Public Suffix List data globally.

```python { .api }
def update(fetch_now: bool = False, session: requests.Session | None = None) -> None
```

[URL Extraction](./url-extraction.md)

## Types

```python { .api }
from typing import Sequence
from dataclasses import dataclass
import requests
import urllib.parse

# Module attributes
__version__: str

# Constants
PUBLIC_SUFFIX_LIST_URLS: tuple[str, ...]
CACHE_TIMEOUT: str | None

# Functions - detailed in their respective sections

# Classes - detailed in their respective sections
class TLDExtract: ...     # see Configurable Extraction
@dataclass
class ExtractResult: ...  # see Result Processing
```