# Command Line Interface

Command-line tool for URL parsing, with options for output formatting, cache management, PSL updates, and batch processing. The CLI makes tldextract functionality available to shell scripts and command-line workflows.

## Capabilities

### Basic Command Structure

The CLI accepts URLs as positional arguments and provides options for customizing behavior and output.

```bash { .api }
tldextract [options] <url1> [url2] ...

Options:
  --version                  Show version information
  -j, --json                 Output in JSON format
  -u, --update               Force fetch the latest TLD definitions
  --suffix_list_url URL      Use an alternate PSL URL/file (may be given multiple times)
  -c DIR, --cache_dir DIR    Use an alternate cache directory
  -p, --include_psl_private_domains, --private_domains
                             Include PSL private domains
  --no_fallback_to_snapshot  Don't fall back to the bundled PSL snapshot
```

### Basic Usage

Extract URL components with the default space-separated output:

```bash
# Single URL
tldextract 'http://forums.bbc.co.uk'
# Output: forums bbc co.uk

# Multiple URLs
tldextract 'google.com' 'http://forums.news.cnn.com/' 'https://www.example.co.uk'
# Output:
# google com
# forums.news cnn com
# www example co.uk

# Complex domains
tldextract 'http://www.worldbank.org.kg/'
# Output: www worldbank org.kg
```

### JSON Output

Get structured JSON output for programmatic processing:

```bash
# Single URL with JSON output
tldextract --json 'http://forums.bbc.co.uk'
# Output: {"subdomain": "forums", "domain": "bbc", "suffix": "co.uk", "is_private": false, "registry_suffix": "co.uk", "fqdn": "forums.bbc.co.uk", "ipv4": "", "ipv6": "", "registered_domain": "bbc.co.uk", "reverse_domain_name": "co.uk.bbc.forums", "top_domain_under_public_suffix": "bbc.co.uk", "top_domain_under_registry_suffix": "bbc.co.uk"}

# Multiple URLs with JSON output
tldextract --json 'google.com' 'http://127.0.0.1:8080'
# Output:
# {"subdomain": "", "domain": "google", "suffix": "com", "is_private": false, "registry_suffix": "com", "fqdn": "google.com", "ipv4": "", "ipv6": "", "registered_domain": "google.com", "reverse_domain_name": "com.google", "top_domain_under_public_suffix": "google.com", "top_domain_under_registry_suffix": "google.com"}
# {"subdomain": "", "domain": "127.0.0.1", "suffix": "", "is_private": false, "registry_suffix": "", "fqdn": "", "ipv4": "127.0.0.1", "ipv6": "", "registered_domain": "", "reverse_domain_name": "127.0.0.1", "top_domain_under_public_suffix": "", "top_domain_under_registry_suffix": ""}
```
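
Each URL's result is printed as one JSON object per line, so downstream tools can process results one object at a time. A minimal sketch of pulling a single field with `jq`, using a hard-coded, abridged sample object in place of a live `tldextract --json` call:

```bash
# A sample line in the shape produced by `tldextract --json` (abridged)
json='{"subdomain": "forums", "domain": "bbc", "suffix": "co.uk", "registered_domain": "bbc.co.uk"}'

# -r emits the raw string without surrounding JSON quotes
printf '%s\n' "$json" | jq -r '.registered_domain'
# Prints: bbc.co.uk
```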

### Private Domain Handling

Control how PSL private domains are processed:

```bash
# Default behavior: private domains treated as regular domains
tldextract 'waiterrant.blogspot.com'
# Output: waiterrant blogspot com

# Include private domains in the suffix
tldextract --include_psl_private_domains 'waiterrant.blogspot.com'
# Output: waiterrant blogspot.com

# Short form of the option
tldextract -p 'waiterrant.blogspot.com'
# Output: waiterrant blogspot.com
```

### PSL Data Management

Update and manage Public Suffix List data:

```bash
# Force an update of PSL data from remote sources
tldextract --update

# Update and then process URLs
tldextract --update 'http://example.new-tld'

# Check the version after updating
tldextract --version
```

### Custom PSL Sources

Use alternative or local PSL data sources:

```bash
# Use a custom remote PSL source
tldextract --suffix_list_url 'http://custom.psl.mirror.com/list.dat' 'example.com'

# Use a local PSL file
tldextract --suffix_list_url 'file:///path/to/custom/suffix_list.dat' 'example.com'

# Use multiple PSL sources (tried in order)
tldextract --suffix_list_url 'http://primary.psl.com/list.dat' --suffix_list_url 'http://backup.psl.com/list.dat' 'example.com'

# Disable the fallback to the bundled snapshot
tldextract --suffix_list_url 'http://custom.psl.com/list.dat' --no_fallback_to_snapshot 'example.com'
```

### Cache Management

Control PSL data caching behavior:

```bash
# Use a custom cache directory
tldextract --cache_dir '/path/to/custom/cache' 'example.com'

# Use an environment variable for the cache location
export TLDEXTRACT_CACHE="/path/to/cache"
tldextract 'example.com'

# Use an environment variable for the cache timeout
export TLDEXTRACT_CACHE_TIMEOUT="10.0"
tldextract 'example.com'
```

## Integration Examples

### Shell Scripting

Extract specific components for use in shell scripts:

```bash
#!/bin/bash
# Extract just the domain name
URL="http://forums.news.cnn.com/"
DOMAIN=$(tldextract "$URL" | awk '{print $2}')
echo "Domain: $DOMAIN" # Output: Domain: cnn

# Extract all components (quote the substitution; note that an
# empty subdomain column shifts the remaining fields left)
read -r SUBDOMAIN DOMAIN SUFFIX <<< "$(tldextract "$URL")"
echo "Subdomain: $SUBDOMAIN"
echo "Domain: $DOMAIN"
echo "Suffix: $SUFFIX"
```
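
Because `read` and default `awk` field splitting collapse whitespace, an empty subdomain column shifts the remaining fields left. One way to preserve empty columns is to split on each literal space with `awk -F'[ ]'`; in this sketch, hard-coded sample lines stand in for live `tldextract` output:

```bash
# Sample CLI output lines; note the leading space marking the
# empty subdomain column for 'google.com'.
line1='forums bbc co.uk'
line2=' google com'

# FS='[ ]' splits on every single space, preserving empty fields,
# whereas the default FS collapses runs of whitespace.
printf '%s\n' "$line1" | awk -F'[ ]' '{print "domain=" $2}'
printf '%s\n' "$line2" | awk -F'[ ]' '{print "domain=" $2}'
# Prints:
# domain=bbc
# domain=google
```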

### Batch Processing

Process multiple URLs from files or pipes:

```bash
# Process URLs from a file
cat urls.txt | xargs tldextract

# Process with JSON output for further processing
cat urls.txt | xargs tldextract --json | jq -r '.domain' | sort | uniq

# Extract domains from access logs
grep "GET" access.log | awk '{print $7}' | xargs tldextract | awk '{print $2}' | sort | uniq -c
```
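
The `awk '{print $2}'` column trick above picks the domain only when a subdomain is present; for tallying suffixes, counting the last field (`$NF`) is insensitive to that shift. A self-contained sketch, with sample lines standing in for live `tldextract` output:

```bash
# Count suffixes from sample output lines; $NF is the last
# (suffix) column whether or not a subdomain is present.
printf '%s\n' 'forums bbc co.uk' ' google com' ' www example co.uk' |
  awk '{print $NF}' | sort | uniq -c
# Output (counts): 2 co.uk, 1 com
```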

### Combined with Other Tools

Use with standard Unix tools for data processing:

```bash
# Count domains by TLD
tldextract --json 'site1.com' 'site2.org' 'site3.com' | jq -r '.suffix' | sort | uniq -c

# Extract and validate domains
echo "http://example.com" | xargs tldextract --json | jq -r 'select(.suffix != "") | .top_domain_under_public_suffix'

# Check for private domains
tldextract --json --include_psl_private_domains 'waiterrant.blogspot.com' | jq '.is_private'
```

## Error Handling

The CLI handles various error conditions gracefully:

### Invalid URLs

```bash
# Inputs that are not valid URLs are still processed without errors
tldextract 'not-a-url' 'google.notavalidsuffix'
# Output:
# not-a-url
# google notavalidsuffix
```

### Network Errors

```bash
# Network errors during a PSL update are logged but don't prevent operation
tldextract --update --suffix_list_url 'http://nonexistent.example.com/list.dat' 'example.com'
# Falls back to cached data or the bundled snapshot
```

### Missing Arguments

```bash
# Running with no URLs shows usage
tldextract
# Output: usage: tldextract [-h] [--version] [-j] [-u] [--suffix_list_url SUFFIX_LIST_URL] [-c CACHE_DIR] [-p] [--no_fallback_to_snapshot] [fqdn|url ...]

# Help is available
tldextract --help
```

## Output Format Details

### Standard Output Format

Space-separated: `subdomain domain suffix`

- Empty fields are printed as empty strings, so a missing subdomain leaves a leading space
- IPv4/IPv6 addresses appear in the domain field with an empty suffix
- Unrecognized suffixes leave the suffix field empty

### JSON Output Format

Complete ExtractResult data, including all properties:

```json
{
  "subdomain": "forums",
  "domain": "bbc",
  "suffix": "co.uk",
  "is_private": false,
  "registry_suffix": "co.uk",
  "fqdn": "forums.bbc.co.uk",
  "ipv4": "",
  "ipv6": "",
  "registered_domain": "bbc.co.uk",
  "reverse_domain_name": "co.uk.bbc.forums",
  "top_domain_under_public_suffix": "bbc.co.uk",
  "top_domain_under_registry_suffix": "bbc.co.uk"
}
```

## Environment Variables

The CLI respects the following environment variables:

- `TLDEXTRACT_CACHE`: Cache directory path (overrides the default)
- `TLDEXTRACT_CACHE_TIMEOUT`: HTTP timeout for PSL fetching, in seconds

```bash
# Set the cache location
export TLDEXTRACT_CACHE="/tmp/tldextract_cache"

# Set the timeout
export TLDEXTRACT_CACHE_TIMEOUT="5.0"

# Run with these settings
tldextract 'example.com'
```