Accurately separates a URL's subdomain, domain, and public suffix using the Public Suffix List
Core functionality for extracting URL components via the convenience extract() function, which covers the most common use cases with sensible defaults and handles the majority of URL parsing scenarios.
The primary extraction function that separates any URL-like string into its subdomain, domain, and public suffix components.
def extract(
    url: str,
    include_psl_private_domains: bool | None = False,
    session: requests.Session | None = None
) -> ExtractResult:
    """
    Extract subdomain, domain, and suffix from a URL string.

    Parameters:
    - url: URL string to parse (can include protocol, port, path)
    - include_psl_private_domains: Include PSL private domains like 'blogspot.com'
    - session: Optional requests.Session for HTTP customization

    Returns:
    ExtractResult with parsed components and metadata
    """

Usage Examples:
import tldextract
# Standard domains
result = tldextract.extract('http://www.google.com')
print(result)
# ExtractResult(subdomain='www', domain='google', suffix='com', is_private=False)
# Complex country code TLDs
result = tldextract.extract('http://forums.bbc.co.uk/')
print(result)
# ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk', is_private=False)
# Subdomains with multiple levels
result = tldextract.extract('http://forums.news.cnn.com/')
print(result)
# ExtractResult(subdomain='forums.news', domain='cnn', suffix='com', is_private=False)
# International domains
result = tldextract.extract('http://www.worldbank.org.kg/')
print(result)
# ExtractResult(subdomain='www', domain='worldbank', suffix='org.kg', is_private=False)

Control how PSL private domains are handled during extraction. Private domains are organizational domains like 'blogspot.com' that allow subdomain registration.
# Default behavior - treat private domains as regular domains
result = tldextract.extract('waiterrant.blogspot.com')
print(result)
# ExtractResult(subdomain='waiterrant', domain='blogspot', suffix='com', is_private=False)
# Include private domains in suffix
result = tldextract.extract('waiterrant.blogspot.com', include_psl_private_domains=True)
print(result)
# ExtractResult(subdomain='', domain='waiterrant', suffix='blogspot.com', is_private=True)

The library gracefully handles various edge cases, including IP addresses, invalid suffixes, and malformed URLs.
# IP addresses
result = tldextract.extract('http://127.0.0.1:8080/deployed/')
print(result)
# ExtractResult(subdomain='', domain='127.0.0.1', suffix='', is_private=False)
# IPv6 addresses
result = tldextract.extract('http://[2001:db8::1]/path')
print(result.domain) # '[2001:db8::1]'
# No subdomain
result = tldextract.extract('google.com')
print(result)
# ExtractResult(subdomain='', domain='google', suffix='com', is_private=False)
# Invalid suffixes
result = tldextract.extract('google.notavalidsuffix')
print(result)
# ExtractResult(subdomain='google', domain='notavalidsuffix', suffix='', is_private=False)

Provide a custom HTTP session for PSL fetching to support proxies, authentication, or other HTTP customizations.
import requests
import tldextract
# Create custom session with proxy
session = requests.Session()
session.proxies = {'http': 'http://proxy.example.com:8080'}
# Use custom session for PSL fetching
result = tldextract.extract('http://example.com', session=session)

Force an update of the cached Public Suffix List data to get the latest TLD definitions.
def update(fetch_now: bool = False, session: requests.Session | None = None) -> None:
    """
    Force update of cached PSL data.

    Parameters:
    - fetch_now: Whether to fetch immediately rather than on next extraction
    - session: Optional requests.Session for HTTP customization
    """

Usage Example:
import tldextract
# Force update of PSL data
tldextract.update(fetch_now=True)
# Use after update
result = tldextract.extract('http://example.new-tld')

All extraction functions return an ExtractResult object with the following structure:
@dataclass
class ExtractResult:
    subdomain: str        # All subdomains, empty string if none
    domain: str           # Main domain name
    suffix: str           # Public suffix (TLD), empty string if none/invalid
    is_private: bool      # Whether suffix is from PSL private domains
    registry_suffix: str  # Registry suffix (internal)

The ExtractResult provides additional properties and methods for working with the parsed components - see Result Processing for complete details.
The extraction functions are designed to never raise exceptions for malformed input. Invalid or unparseable URLs will return sensible fallback values:
- IP addresses: the whole address is returned in domain, with empty subdomain and suffix
- Unrecognized suffixes: the hostname is split as usual, but with an empty suffix

Install with Tessl CLI
npx tessl i tessl/pypi-tldextract