Accurately separates a URL's subdomain, domain, and public suffix using the Public Suffix List
Command-line tool for URL parsing with options for output formatting, cache management, PSL updates, and batch processing. The CLI provides access to tldextract functionality from shell scripts and command-line workflows.
The CLI accepts URLs as positional arguments and provides various options for customizing behavior and output.
tldextract [options] <url1> [url2] ...
Options:
--version Show version information
-j, --json Output in JSON format
-u, --update Force fetch latest TLD definitions
--suffix_list_url URL Use alternate PSL URL/file (can specify multiple)
-c DIR, --cache_dir DIR Use alternate cache directory
-p, --include_psl_private_domains, --private_domains
Include PSL private domains
--no_fallback_to_snapshot Don't fall back to bundled PSL snapshot

Extract URL components with default space-separated output:
# Single URL
tldextract 'http://forums.bbc.co.uk'
# Output: forums bbc co.uk
# Multiple URLs
tldextract 'google.com' 'http://forums.news.cnn.com/' 'https://www.example.co.uk'
# Output:
# google com
# forums.news cnn com
# www example co.uk
# Complex domains
tldextract 'http://www.worldbank.org.kg/'
# Output: www worldbank org.kg

Get structured JSON output for programmatic processing:
# Single URL with JSON output
tldextract --json 'http://forums.bbc.co.uk'
# Output: {"subdomain": "forums", "domain": "bbc", "suffix": "co.uk", "is_private": false, "registry_suffix": "co.uk", "fqdn": "forums.bbc.co.uk", "ipv4": "", "ipv6": "", "registered_domain": "bbc.co.uk", "reverse_domain_name": "co.uk.bbc.forums", "top_domain_under_public_suffix": "bbc.co.uk", "top_domain_under_registry_suffix": "bbc.co.uk"}
# Multiple URLs with JSON output
tldextract --json 'google.com' 'http://127.0.0.1:8080'
# Output:
# {"subdomain": "", "domain": "google", "suffix": "com", "is_private": false, "registry_suffix": "com", "fqdn": "google.com", "ipv4": "", "ipv6": "", "registered_domain": "google.com", "reverse_domain_name": "com.google", "top_domain_under_public_suffix": "google.com", "top_domain_under_registry_suffix": "google.com"}
# {"subdomain": "", "domain": "127.0.0.1", "suffix": "", "is_private": false, "registry_suffix": "", "fqdn": "", "ipv4": "127.0.0.1", "ipv6": "", "registered_domain": "", "reverse_domain_name": "127.0.0.1", "top_domain_under_public_suffix": "", "top_domain_under_registry_suffix": ""}Control how PSL private domains are processed:
# Default behavior - private domains as regular domains
tldextract 'waiterrant.blogspot.com'
# Output: waiterrant blogspot com
# Include private domains in suffix
tldextract --include_psl_private_domains 'waiterrant.blogspot.com'
# Output: waiterrant blogspot.com
# Short form of the option
tldextract -p 'waiterrant.blogspot.com'
# Output: waiterrant blogspot.com

Update and manage Public Suffix List data:
# Force update PSL data from remote sources
tldextract --update
# Update and then process URLs
tldextract --update 'http://example.new-tld'
# Check version after update
tldextract --version

Use alternative or local PSL data sources:
# Use custom remote PSL source
tldextract --suffix_list_url 'http://custom.psl.mirror.com/list.dat' 'example.com'
# Use local PSL file
tldextract --suffix_list_url 'file:///path/to/custom/suffix_list.dat' 'example.com'
# Use multiple PSL sources (tried in order)
tldextract --suffix_list_url 'http://primary.psl.com/list.dat' --suffix_list_url 'http://backup.psl.com/list.dat' 'example.com'
# Disable fallback to bundled snapshot
tldextract --suffix_list_url 'http://custom.psl.com/list.dat' --no_fallback_to_snapshot 'example.com'

Control PSL data caching behavior:
# Use custom cache directory
tldextract --cache_dir '/path/to/custom/cache' 'example.com'
# Use environment variable for cache location
export TLDEXTRACT_CACHE="/path/to/cache"
tldextract 'example.com'
# Use environment variable for cache timeout
export TLDEXTRACT_CACHE_TIMEOUT="10.0"
tldextract 'example.com'

Extract specific components for shell scripts:
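A caveat before parsing fields positionally: empty components collapse in the space-separated output, so field positions can shift. A minimal illustration, where the literal string stands in for tldextract's output for a URL with no subdomain:

```shell
# 'google.com' has no subdomain, so its output has effectively two
# fields; a naive three-variable read mis-assigns them.
out="google com"    # stands in for: tldextract 'google.com'
read -r SUBDOMAIN DOMAIN SUFFIX <<< "$out"
echo "subdomain=$SUBDOMAIN domain=$DOMAIN suffix=$SUFFIX"
# Output: subdomain=google domain=com suffix=
```

When the subdomain or suffix may be empty, the --json output avoids this ambiguity.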
#!/bin/bash
# Extract just the domain name
URL="http://forums.news.cnn.com/"
DOMAIN=$(tldextract "$URL" | awk '{print $2}')
echo "Domain: $DOMAIN" # Output: Domain: cnn
# Extract all components
read -r SUBDOMAIN DOMAIN SUFFIX <<< "$(tldextract "$URL")"
echo "Subdomain: $SUBDOMAIN"
echo "Domain: $DOMAIN"
echo "Suffix: $SUFFIX"

Process multiple URLs from files or pipes:
# Process URLs from file
cat urls.txt | xargs tldextract
# Process with JSON output for further processing
cat urls.txt | xargs tldextract --json | jq -r '.domain' | sort | uniq
# Extract domains from access logs
grep "GET" access.log | awk '{print $7}' | xargs tldextract | awk '{print $2}' | sort | uniq -cUse with standard Unix tools for data processing:
# Count domains by TLD
tldextract --json 'site1.com' 'site2.org' 'site3.com' | jq -r '.suffix' | sort | uniq -c
# Extract and validate domains
echo "http://example.com" | xargs tldextract --json | jq -r 'select(.suffix != "") | .top_domain_under_public_suffix'
# Check for private domains
tldextract --json --include_psl_private_domains 'waiterrant.blogspot.com' | jq '.is_private'

The CLI handles various error conditions gracefully:
# Invalid URLs are processed without errors
tldextract 'not-a-url' 'google.notavalidsuffix'
# Output:
# not-a-url
# google notavalidsuffix

# Network errors during PSL update are logged but don't prevent operation
tldextract --update --suffix_list_url 'http://nonexistent.example.com/list.dat' 'example.com'
# Will fall back to cached data or bundled snapshot

# No URLs provided shows usage
tldextract
# Output: usage: tldextract [-h] [--version] [-j] [-u] [--suffix_list_url SUFFIX_LIST_URL] [-c CACHE_DIR] [-p] [--no_fallback_to_snapshot] [fqdn|url ...]
# Help is available
tldextract --help

The default output format is space-separated: subdomain domain suffix
The JSON output format (--json) contains the complete ExtractResult data, including all properties:
{
"subdomain": "forums",
"domain": "bbc",
"suffix": "co.uk",
"is_private": false,
"registry_suffix": "co.uk",
"fqdn": "forums.bbc.co.uk",
"ipv4": "",
"ipv6": "",
"registered_domain": "bbc.co.uk",
"reverse_domain_name": "co.uk.bbc.forums",
"top_domain_under_public_suffix": "bbc.co.uk",
"top_domain_under_registry_suffix": "bbc.co.uk"
}

The CLI respects the following environment variables:
TLDEXTRACT_CACHE: Cache directory path (overrides default)
TLDEXTRACT_CACHE_TIMEOUT: HTTP timeout for PSL fetching (seconds)

# Set cache location
export TLDEXTRACT_CACHE="/tmp/tldextract_cache"
# Set timeout
export TLDEXTRACT_CACHE_TIMEOUT="5.0"
# Use with settings
tldextract 'example.com'

Install with Tessl CLI
npx tessl i tessl/pypi-tldextract