
# Result Processing

The `ExtractResult` dataclass holds the outcome of every extraction. It provides properties for reconstructing domain names, detecting IP addresses, accessing metadata, and working with the parsed URL components in several formats.

## Capabilities

### ExtractResult Structure

The core data structure returned by all extraction operations. It holds the parsed URL components and their metadata.

```python { .api }
from dataclasses import dataclass, field

@dataclass(order=True)
class ExtractResult:
    subdomain: str
    """All subdomains beneath the domain, empty string if none"""

    domain: str
    """The topmost domain name, or hostname-like content if no valid domain"""

    suffix: str
    """The public suffix (TLD), empty string if none or invalid"""

    is_private: bool
    """Whether the suffix belongs to PSL private domains"""

    registry_suffix: str = field(repr=False)
    """The registry suffix, unaffected by include_psl_private_domains setting"""
```

**Basic Usage:**

```python
import tldextract

result = tldextract.extract('http://forums.news.cnn.com/')
print(f"Subdomain: '{result.subdomain}'")  # 'forums.news'
print(f"Domain: '{result.domain}'")        # 'cnn'
print(f"Suffix: '{result.suffix}'")        # 'com'
print(f"Is Private: {result.is_private}")  # False
```

### Domain Reconstruction

Properties for reconstructing various forms of the original domain name from the parsed components.

```python { .api }
@property
def fqdn(self) -> str:
    """
    Fully Qualified Domain Name if there is a proper domain and suffix.

    Returns:
        Complete domain name or empty string if invalid
    """

@property
def top_domain_under_public_suffix(self) -> str:
    """
    Domain and suffix joined with a dot if both are present.

    Returns:
        Registered domain name or empty string if invalid
    """

@property
def top_domain_under_registry_suffix(self) -> str:
    """
    Top domain under registry suffix, handling PSL private domains.

    Returns:
        Registry domain name or empty string if invalid
    """

@property
def registered_domain(self) -> str:
    """
    DEPRECATED: Use top_domain_under_public_suffix instead.

    Returns:
        Same as top_domain_under_public_suffix
    """
```

**Usage Examples:**

```python
import tldextract

# Standard domain reconstruction
result = tldextract.extract('http://forums.bbc.co.uk/path')
print(result.fqdn)                            # 'forums.bbc.co.uk'
print(result.top_domain_under_public_suffix)  # 'bbc.co.uk'

# No subdomain
result = tldextract.extract('google.com')
print(result.fqdn)                            # 'google.com'
print(result.top_domain_under_public_suffix)  # 'google.com'

# Invalid domain (IP address)
result = tldextract.extract('http://127.0.0.1:8080')
print(result.fqdn)                            # '' (empty string)
print(result.top_domain_under_public_suffix)  # '' (empty string)

# Private domain handling
result = tldextract.extract('waiterrant.blogspot.com', include_psl_private_domains=True)
print(result.top_domain_under_public_suffix)    # 'waiterrant.blogspot.com'
print(result.top_domain_under_registry_suffix)  # 'blogspot.com'
```
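The joining rules behind `fqdn` and `top_domain_under_public_suffix` can be sketched without the library. The `Parts` class below is a hypothetical stand-in that mirrors three of the documented fields. It is a simplified illustration of the rules described above (it ignores private-suffix handling) and is not tldextract's implementation.

```python
from dataclasses import dataclass


@dataclass
class Parts:
    """Hypothetical stand-in mirroring three ExtractResult fields."""

    subdomain: str
    domain: str
    suffix: str

    @property
    def fqdn(self) -> str:
        # A full name exists only when both a domain and a suffix are present.
        if self.domain and self.suffix:
            # Skip empty components so no stray dots appear.
            return ".".join(p for p in (self.subdomain, self.domain, self.suffix) if p)
        return ""

    @property
    def top_domain_under_public_suffix(self) -> str:
        # Domain and suffix joined with a dot, or empty if either is missing.
        if self.domain and self.suffix:
            return f"{self.domain}.{self.suffix}"
        return ""


bbc = Parts("forums", "bbc", "co.uk")
ip = Parts("", "127.0.0.1", "")

print(bbc.fqdn)                            # forums.bbc.co.uk
print(bbc.top_domain_under_public_suffix)  # bbc.co.uk
print(repr(ip.fqdn))                       # ''
```

An IP address has a `domain` but no `suffix`, which is why both properties return an empty string for it.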

### IP Address Detection

Properties for detecting and extracting IP addresses from the parsed results.

```python { .api }
@property
def ipv4(self) -> str:
    """
    IPv4 address if input was a valid IPv4, empty string otherwise.

    Returns:
        IPv4 address string or empty string
    """

@property
def ipv6(self) -> str:
    """
    IPv6 address if input was a valid IPv6, empty string otherwise.

    Returns:
        IPv6 address string or empty string
    """
```

**Usage Examples:**

```python
import tldextract

# IPv4 detection
result = tldextract.extract('http://192.168.1.1:8080/path')
print(result.ipv4)    # '192.168.1.1'
print(result.ipv6)    # ''
print(result.domain)  # '192.168.1.1'
print(result.suffix)  # ''

# IPv6 detection
result = tldextract.extract('http://[2001:db8::1]/path')
print(result.ipv4)    # ''
print(result.ipv6)    # '2001:db8::1'
print(result.domain)  # '[2001:db8::1]'

# Invalid IP addresses are split into labels like any other hostname
result = tldextract.extract('http://256.1.1.1/')  # Octet out of range
print(result.ipv4)       # ''
print(result.subdomain)  # '256.1.1'
print(result.domain)     # '1'

result = tldextract.extract('http://127.0.0.1.1/')  # Too many octets
print(result.ipv4)       # ''
print(result.subdomain)  # '127.0.0.1'
print(result.domain)     # '1'
```
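A comparable validity check can be sketched with the standard library's `ipaddress` module. This is only an approximation for illustration: `classify_host` is a hypothetical helper, and tldextract may apply different rules when deciding what counts as an IP address.

```python
import ipaddress


def classify_host(host: str) -> str:
    """Return 'ipv4', 'ipv6', or 'name' for a bare host string (hypothetical helper)."""
    # URLs wrap IPv6 literals in square brackets; strip them before parsing.
    candidate = host[1:-1] if host.startswith("[") and host.endswith("]") else host
    try:
        address = ipaddress.ip_address(candidate)
    except ValueError:
        # Not a well-formed address, so treat it as an ordinary hostname.
        return "name"
    return "ipv4" if address.version == 4 else "ipv6"


for host in ("192.168.1.1", "[2001:db8::1]", "256.1.1.1", "127.0.0.1.1"):
    print(host, classify_host(host))
```

The last two hosts are classified as `name`, which matches the examples above where `ipv4` is empty for an out-of-range octet and for an address with too many parts.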

### Domain Name Formatting

A property that converts the domain name to reverse DNS notation, which is commonly used for package names and namespace organization.

```python { .api }
@property
def reverse_domain_name(self) -> str:
    """
    Domain name in reverse DNS notation.

    Joins components as: suffix.domain.reversed_subdomain_parts

    Returns:
        Reverse domain name string
    """
```

**Usage Examples:**

```python
import tldextract

# Simple domain
result = tldextract.extract('login.example.com')
print(result.reverse_domain_name)  # 'com.example.login'

# Complex subdomain
result = tldextract.extract('api.v2.auth.example.com')
print(result.reverse_domain_name)  # 'com.example.auth.v2.api'

# Country code TLD
result = tldextract.extract('login.example.co.uk')
print(result.reverse_domain_name)  # 'co.uk.example.login'

# No subdomain
result = tldextract.extract('example.com')
print(result.reverse_domain_name)  # 'com.example'
```
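The joining rule in the docstring (`suffix.domain.reversed_subdomain_parts`) can be written out as a small free function. The function below is a hypothetical sketch of that rule, taking the three components as plain strings; it is not the library's own code.

```python
def reverse_domain_name(subdomain: str, domain: str, suffix: str) -> str:
    """Hypothetical free-function version of the documented joining rule."""
    parts = [suffix, domain]
    if subdomain:
        # Only the subdomain labels are reversed; a multi-label suffix
        # such as 'co.uk' keeps its original order.
        parts.extend(reversed(subdomain.split(".")))
    return ".".join(parts)


print(reverse_domain_name("api.v2.auth", "example", "com"))  # com.example.auth.v2.api
print(reverse_domain_name("login", "example", "co.uk"))      # co.uk.example.login
print(reverse_domain_name("", "example", "com"))             # com.example
```

Note that the suffix is treated as a single unit, which is why `co.uk` does not become `uk.co` in the country code example.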

## Private Domain Handling

The `include_psl_private_domains` setting controls whether Public Suffix List (PSL) private domains count as suffixes, which changes the result structure and property values.

### Default Behavior (include_psl_private_domains=False)

```python
import tldextract

# Default: private domains treated as regular domains
result = tldextract.extract('waiterrant.blogspot.com')
print(result.subdomain)                         # 'waiterrant'
print(result.domain)                            # 'blogspot'
print(result.suffix)                            # 'com'
print(result.is_private)                        # False
print(result.registry_suffix)                   # 'com'
print(result.top_domain_under_public_suffix)    # 'blogspot.com'
print(result.top_domain_under_registry_suffix)  # 'blogspot.com'
```

### Private Domains Enabled (include_psl_private_domains=True)

```python
import tldextract

# Private domains included in suffix
result = tldextract.extract('waiterrant.blogspot.com', include_psl_private_domains=True)
print(result.subdomain)                         # ''
print(result.domain)                            # 'waiterrant'
print(result.suffix)                            # 'blogspot.com'
print(result.is_private)                        # True
print(result.registry_suffix)                   # 'com'
print(result.top_domain_under_public_suffix)    # 'waiterrant.blogspot.com'
print(result.top_domain_under_registry_suffix)  # 'blogspot.com'
```
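One way to picture how the two `top_domain_*` values relate is to keep exactly one label in front of the registry suffix. The helper below is a hypothetical sketch built only from the values shown in the two examples above; it is an assumption about the relationship, not a copy of the library's logic.

```python
def under_registry_suffix(top_domain: str, registry_suffix: str, is_private: bool) -> str:
    """Hypothetical helper: trim a private-suffix domain back to the registry level."""
    if not top_domain or not is_private:
        # With a non-private suffix, both properties already agree.
        return top_domain
    # Keep the registry suffix labels plus the single label in front of them.
    keep = registry_suffix.count(".") + 2
    return ".".join(top_domain.split(".")[-keep:])


# Private domains enabled: trimmed to the registry level
print(under_registry_suffix("waiterrant.blogspot.com", "com", True))  # blogspot.com
# Default behavior: returned unchanged
print(under_registry_suffix("blogspot.com", "com", False))            # blogspot.com
```

Both configurations end up with the same registry-level domain, which is the point of `registry_suffix` being unaffected by the setting.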

## Edge Cases and Special Handling

### Invalid Suffixes

When the input domain doesn't have a recognized public suffix:

```python
import tldextract

result = tldextract.extract('google.notavalidsuffix')
print(result.subdomain)  # 'google'
print(result.domain)     # 'notavalidsuffix'
print(result.suffix)     # ''
print(result.fqdn)       # ''
```

### Localhost and Private Networks

```python
import tldextract

result = tldextract.extract('http://localhost:8080')
print(result.subdomain)  # ''
print(result.domain)     # 'localhost'
print(result.suffix)     # ''
print(result.fqdn)       # ''

result = tldextract.extract('http://intranet.corp')
print(result.subdomain)  # 'intranet'
print(result.domain)     # 'corp'
print(result.suffix)     # ''
```

### Punycode/IDN Domains

Internationalized domain names are handled automatically, in both Punycode and Unicode form:

```python
import tldextract

# Punycode is decoded internally for suffix matching
result = tldextract.extract('http://xn--n3h.com')  # ☃.com
print(result.domain)  # 'xn--n3h' (returned in its original encoded form)

# Unicode domains work directly
result = tldextract.extract('http://münchen.de')
print(result.domain)  # 'münchen'
print(result.suffix)  # 'de'
```

## Comparison and Sorting

Because the dataclass is declared with `order=True`, `ExtractResult` objects support equality checks, ordering comparisons, and sorting:

```python
import tldextract

results = [
    tldextract.extract('b.example.com'),
    tldextract.extract('a.example.com'),
    tldextract.extract('c.example.org')
]

# Results are sortable (order=True in dataclass)
sorted_results = sorted(results)
for result in sorted_results:
    print(result.fqdn)
# Fields compare in declaration order, subdomain first:
# a.example.com, b.example.com, c.example.org

# Equality comparison
result1 = tldextract.extract('example.com')
result2 = tldextract.extract('http://example.com/')
print(result1 == result2)  # True - same parsed components
```
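The ordering follows standard `dataclasses` behavior: instances are compared as tuples of their fields, in declaration order. The sketch below demonstrates this with a hypothetical three-field stand-in class, `Parts`, rather than the real `ExtractResult`.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Parts:
    """Hypothetical stand-in; fields are compared in this declaration order."""

    subdomain: str
    domain: str
    suffix: str


items = [
    Parts("b", "example", "com"),
    Parts("a", "zebra", "org"),
    Parts("c", "example", "org"),
    Parts("a", "example", "com"),
]

# subdomain decides first; domain only breaks ties between equal subdomains.
ordered = sorted(items)
for item in ordered:
    print(item.subdomain, item.domain, item.suffix)
```

A consequence worth noting is that sorting groups results by subdomain rather than by registered domain. To group by domain instead, pass an explicit `key` to `sorted`.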

## String Representation

`ExtractResult` provides a readable string representation:

```python
import tldextract

result = tldextract.extract('http://forums.news.cnn.com/')
print(result)
# ExtractResult(subdomain='forums.news', domain='cnn', suffix='com', is_private=False)

print(repr(result))
# Same output: str() falls back to the dataclass repr
```
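The printed form omits `registry_suffix` because that field is declared with `field(repr=False)`. The sketch below shows the same standard `dataclasses` behavior on a hypothetical stand-in class, `Parts`, with made-up field values.

```python
from dataclasses import dataclass, field


@dataclass
class Parts:
    """Hypothetical stand-in showing how repr=False hides a field."""

    domain: str
    suffix: str
    registry_suffix: str = field(repr=False)


parts = Parts("waiterrant", "blogspot.com", "com")
text = repr(parts)

print(text)                   # Parts(domain='waiterrant', suffix='blogspot.com')
print(parts.registry_suffix)  # com (still accessible as an attribute)
print(str(parts) == text)     # True
```

The hidden field is excluded from the printed form only. It is still stored on the instance and readable as an attribute.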