# Command Line Interface

Command-line tool for URL parsing with options for output formatting, cache management, PSL updates, and batch processing. The CLI provides access to tldextract functionality from shell scripts and command-line workflows.

## Capabilities

### Basic Command Structure

The CLI accepts URLs as positional arguments and provides various options for customizing behavior and output.

```bash { .api }
tldextract [options] <url1> [url2] ...

Options:
  --version                  Show version information
  -j, --json                 Output in JSON format
  -u, --update               Force fetch latest TLD definitions
  --suffix_list_url URL      Use alternate PSL URL/file (can specify multiple)
  -c DIR, --cache_dir DIR    Use alternate cache directory
  -p, --include_psl_private_domains, --private_domains
                             Include PSL private domains
  --no_fallback_to_snapshot  Don't fall back to bundled PSL snapshot
```

### Basic Usage

Extract URL components with default space-separated output:

```bash
# Single URL
tldextract 'http://forums.bbc.co.uk'
# Output: forums bbc co.uk

# Multiple URLs
tldextract 'google.com' 'http://forums.news.cnn.com/' 'https://www.example.co.uk'
# Output:
# google com
# forums.news cnn com
# www example co.uk

# Complex domains
tldextract 'http://www.worldbank.org.kg/'
# Output: www worldbank org.kg
```

### JSON Output

Get structured JSON output for programmatic processing:

```bash
# Single URL with JSON output
tldextract --json 'http://forums.bbc.co.uk'
# Output: {"subdomain": "forums", "domain": "bbc", "suffix": "co.uk", "is_private": false, "registry_suffix": "co.uk", "fqdn": "forums.bbc.co.uk", "ipv4": "", "ipv6": "", "registered_domain": "bbc.co.uk", "reverse_domain_name": "co.uk.bbc.forums", "top_domain_under_public_suffix": "bbc.co.uk", "top_domain_under_registry_suffix": "bbc.co.uk"}

# Multiple URLs with JSON output
tldextract --json 'google.com' 'http://127.0.0.1:8080'
# Output:
# {"subdomain": "", "domain": "google", "suffix": "com", "is_private": false, "registry_suffix": "com", "fqdn": "google.com", "ipv4": "", "ipv6": "", "registered_domain": "google.com", "reverse_domain_name": "com.google", "top_domain_under_public_suffix": "google.com", "top_domain_under_registry_suffix": "google.com"}
# {"subdomain": "", "domain": "127.0.0.1", "suffix": "", "is_private": false, "registry_suffix": "", "fqdn": "", "ipv4": "127.0.0.1", "ipv6": "", "registered_domain": "", "reverse_domain_name": "127.0.0.1", "top_domain_under_public_suffix": "", "top_domain_under_registry_suffix": ""}
```

### Private Domain Handling

Control how PSL private domains are processed:

```bash
# Default behavior - private domains treated as regular domains
tldextract 'waiterrant.blogspot.com'
# Output: waiterrant blogspot com

# Include private domains in the suffix
tldextract --include_psl_private_domains 'waiterrant.blogspot.com'
# Output: waiterrant blogspot.com

# Short form of the option
tldextract -p 'waiterrant.blogspot.com'
# Output: waiterrant blogspot.com
```

### PSL Data Management

Update and manage Public Suffix List data:

```bash
# Force update PSL data from remote sources
tldextract --update

# Update and then process URLs
tldextract --update 'http://example.new-tld'

# Check version after update
tldextract --version
```

### Custom PSL Sources

Use alternative or local PSL data sources:

```bash
# Use custom remote PSL source
tldextract --suffix_list_url 'http://custom.psl.mirror.com/list.dat' 'example.com'

# Use local PSL file
tldextract --suffix_list_url 'file:///path/to/custom/suffix_list.dat' 'example.com'

# Use multiple PSL sources (tried in order)
tldextract --suffix_list_url 'http://primary.psl.com/list.dat' --suffix_list_url 'http://backup.psl.com/list.dat' 'example.com'

# Disable fallback to bundled snapshot
tldextract --suffix_list_url 'http://custom.psl.com/list.dat' --no_fallback_to_snapshot 'example.com'
```

### Cache Management

Control PSL data caching behavior:

```bash
# Use custom cache directory
tldextract --cache_dir '/path/to/custom/cache' 'example.com'

# Use environment variable for cache location
export TLDEXTRACT_CACHE="/path/to/cache"
tldextract 'example.com'

# Use environment variable for cache timeout
export TLDEXTRACT_CACHE_TIMEOUT="10.0"
tldextract 'example.com'
```

## Integration Examples

### Shell Scripting

Extract specific components for shell scripts:

```bash
#!/bin/bash
# Extract just the domain name
URL="http://forums.news.cnn.com/"
DOMAIN=$(tldextract "$URL" | awk '{print $2}')
echo "Domain: $DOMAIN"  # Output: Domain: cnn

# Extract all components
read -r SUBDOMAIN DOMAIN SUFFIX <<< "$(tldextract "$URL")"
echo "Subdomain: $SUBDOMAIN"
echo "Domain: $DOMAIN"
echo "Suffix: $SUFFIX"
```

### Batch Processing

Process multiple URLs from files or pipes:

```bash
# Process URLs from file
cat urls.txt | xargs tldextract

# Process with JSON output for further processing
cat urls.txt | xargs tldextract --json | jq '.domain' | sort | uniq

# Extract domains from access logs
grep "GET" access.log | awk '{print $7}' | xargs tldextract | awk '{print $2}' | sort | uniq -c
```

### Combined with Other Tools

Use with standard Unix tools for data processing:

```bash
# Count domains by TLD
tldextract --json 'site1.com' 'site2.org' 'site3.com' | jq -r '.suffix' | sort | uniq -c

# Extract and validate domains
echo "http://example.com" | xargs tldextract --json | jq -r 'select(.suffix != "") | .top_domain_under_public_suffix'

# Check for private domains
tldextract --json --include_psl_private_domains 'waiterrant.blogspot.com' | jq '.is_private'
```

## Error Handling

The CLI handles various error conditions gracefully:

### Invalid URLs

```bash
# Invalid URLs are processed without errors
tldextract 'not-a-url' 'google.notavalidsuffix'
# Output:
# not-a-url
# google notavalidsuffix
```

### Network Errors

```bash
# Network errors during PSL update are logged but don't prevent operation
tldextract --update --suffix_list_url 'http://nonexistent.example.com/list.dat' 'example.com'
# Will fall back to cached data or bundled snapshot
```

### Missing Arguments

```bash
# No URLs provided shows usage
tldextract
# Output: usage: tldextract [-h] [--version] [-j] [-u] [--suffix_list_url SUFFIX_LIST_URL] [-c CACHE_DIR] [-p] [--no_fallback_to_snapshot] [fqdn|url ...]

# Help is available
tldextract --help
```

## Output Format Details

### Standard Output Format

Space-separated: `subdomain domain suffix`

- Empty fields are represented as empty strings
- IPv4/IPv6 addresses appear in the domain field with empty suffix
- Invalid suffixes result in empty suffix field
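A minimal sketch of consuming this format in a script, using the documented example output as a stand-in for a live `tldextract` call:

```bash
# Stand-in for: line=$(tldextract 'http://forums.bbc.co.uk')
line="forums bbc co.uk"

# Split the three space-separated fields into variables
read -r subdomain domain suffix <<< "$line"

echo "${domain}.${suffix}"
# Output: bbc.co.uk
```

Because empty fields collapse under default word splitting, positional parsing like this is only reliable when all three fields are present; the JSON output is the safer choice otherwise.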

### JSON Output Format

Complete ExtractResult data including all properties:

```json
{
  "subdomain": "forums",
  "domain": "bbc",
  "suffix": "co.uk",
  "is_private": false,
  "registry_suffix": "co.uk",
  "fqdn": "forums.bbc.co.uk",
  "ipv4": "",
  "ipv6": "",
  "registered_domain": "bbc.co.uk",
  "reverse_domain_name": "co.uk.bbc.forums",
  "top_domain_under_public_suffix": "bbc.co.uk",
  "top_domain_under_registry_suffix": "bbc.co.uk"
}
```
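Where `jq` is unavailable, the JSON lines can be consumed with any JSON parser. A sketch using python3's standard `json` module, with the documented object abbreviated to three fields for brevity:

```bash
# Stand-in for: json=$(tldextract --json 'http://forums.bbc.co.uk')
json='{"subdomain": "forums", "domain": "bbc", "suffix": "co.uk"}'

# Extract one property from the JSON line
echo "$json" | python3 -c 'import json, sys; print(json.load(sys.stdin)["domain"])'
# Output: bbc
```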

## Environment Variables

The CLI respects the following environment variables:

- `TLDEXTRACT_CACHE`: Cache directory path (overrides default)
- `TLDEXTRACT_CACHE_TIMEOUT`: HTTP timeout for PSL fetching (seconds)

```bash
# Set cache location
export TLDEXTRACT_CACHE="/tmp/tldextract_cache"

# Set timeout
export TLDEXTRACT_CACHE_TIMEOUT="5.0"

# Use with settings
tldextract 'example.com'
```