# tldjs

tldjs is a JavaScript library for working with complex domain names, subdomains and well-known TLDs. It provides utilities to parse URLs/hostnames and extract domain components based on Mozilla's Public Suffix List, answering questions like "what is mail.google.com's domain?" and "is big.data's TLD well-known?".

## Package Information

- **Package Name**: tldjs
- **Package Type**: npm
- **Language**: JavaScript
- **Installation**: `npm install tldjs`

## Core Imports

```javascript
const { parse, tldExists, getDomain, getSubdomain, getPublicSuffix, isValidHostname, extractHostname } = require('tldjs');
```

Or import the entire module:

```javascript
const tldjs = require('tldjs');
```

For ES6 modules:

```javascript
import { parse, tldExists, getDomain, getSubdomain, getPublicSuffix, isValidHostname, extractHostname } from 'tldjs';
```

Or import the entire module:

```javascript
import tldjs from 'tldjs';
```

## Basic Usage

```javascript
const tldjs = require('tldjs');

// Parse a URL completely
const result = tldjs.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
console.log(result);
// {
//   hostname: 'spark-public.s3.amazonaws.com',
//   isValid: true,
//   isIp: false,
//   tldExists: true,
//   publicSuffix: 's3.amazonaws.com',
//   domain: 'spark-public.s3.amazonaws.com',
//   subdomain: ''
// }

// Check if TLD exists
console.log(tldjs.tldExists('google.com')); // true
console.log(tldjs.tldExists('google.local')); // false

// Extract specific parts
console.log(tldjs.getDomain('fr.google.com')); // 'google.com'
console.log(tldjs.getSubdomain('fr.google.com')); // 'fr'
console.log(tldjs.getPublicSuffix('google.co.uk')); // 'co.uk'
```

## Architecture

tldjs is built around several key components:

- **Public Suffix List**: Uses Mozilla's Public Suffix List for accurate TLD recognition
- **Hostname Extraction**: Robust URL parsing to extract hostnames from complex URLs
- **Validation Layer**: RFC-compliant hostname validation
- **Trie Data Structure**: Efficient suffix lookup using a trie for fast public suffix matching
- **Factory Pattern**: Customizable instances with user-defined rules and validation hosts
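
To make the trie idea concrete, here is a minimal sketch of a suffix trie keyed on hostname labels read right to left. It is a simplified model rather than tldjs's internal structure: the names `buildSuffixTrie` and `lookupSuffix` are hypothetical, the rule list is a tiny hand-picked sample, and wildcard and exception rules from the real Public Suffix List are left out.

```javascript
// Hypothetical helpers for illustration only; not part of the tldjs API.
function buildSuffixTrie(rules) {
  const root = { children: new Map(), isRule: false };
  for (const rule of rules) {
    let node = root;
    // Insert labels right to left, so 'co.uk' is stored as uk -> co.
    for (const label of rule.split('.').reverse()) {
      if (!node.children.has(label)) {
        node.children.set(label, { children: new Map(), isRule: false });
      }
      node = node.children.get(label);
    }
    node.isRule = true; // a complete rule ends at this node
  }
  return root;
}

function lookupSuffix(trie, hostname) {
  const labels = hostname.split('.').reverse();
  let node = trie;
  let matched = 0;
  for (let i = 0; i < labels.length; i++) {
    node = node.children.get(labels[i]);
    if (!node) break;
    if (node.isRule) matched = i + 1; // remember the longest rule seen so far
  }
  // When no rule matches, fall back to the last label alone.
  const count = matched || 1;
  return labels.slice(0, count).reverse().join('.');
}

const demoTrie = buildSuffixTrie(['com', 'uk', 'co.uk', 's3.amazonaws.com']);
console.log(lookupSuffix(demoTrie, 'foo.google.co.uk'));              // 'co.uk'
console.log(lookupSuffix(demoTrie, 'spark-public.s3.amazonaws.com')); // 's3.amazonaws.com'
console.log(lookupSuffix(demoTrie, 'tld.is.unknown'));                // 'unknown'
```

Each lookup walks at most one trie node per hostname label, which is why this layout suits longest-suffix matching.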

## Capabilities

### URL/Hostname Parsing

Parses a URL or hostname and returns all domain components in a single call.

```javascript { .api }
/**
 * Parse URL/hostname and return complete information about domain components
 * @param {string} url - URL or hostname to parse
 * @param {number} [_step] - Internal step control for optimization; callers normally omit it
 * @returns {ParseResult} Complete parsing result
 */
function parse(url, _step);

interface ParseResult {
  hostname: string | null;      // Extracted hostname
  isValid: boolean;             // Whether hostname is valid per RFC
  isIp: boolean;                // Whether hostname is an IP address
  tldExists: boolean;           // Whether TLD is well-known
  publicSuffix: string | null;  // Public suffix portion
  domain: string | null;        // Domain portion
  subdomain: string | null;     // Subdomain portion
}
```

**Usage Examples:**

```javascript
// Standard web URL
tldjs.parse('https://www.example.com/path');
// { hostname: 'www.example.com', isValid: true, isIp: false,
//   tldExists: true, publicSuffix: 'com', domain: 'example.com', subdomain: 'www' }

// Complex AWS hostname
tldjs.parse('https://spark-public.s3.amazonaws.com/data.csv');
// { hostname: 'spark-public.s3.amazonaws.com', isValid: true, isIp: false,
//   tldExists: true, publicSuffix: 's3.amazonaws.com',
//   domain: 'spark-public.s3.amazonaws.com', subdomain: '' }

// IP address
tldjs.parse('https://192.168.0.1/admin');
// { hostname: '192.168.0.1', isValid: true, isIp: true,
//   tldExists: false, publicSuffix: null, domain: null, subdomain: null }

// Valid hostname with an unknown TLD
tldjs.parse('domain.unknown');
// { hostname: 'domain.unknown', isValid: true, isIp: false,
//   tldExists: false, publicSuffix: 'unknown', domain: 'domain.unknown', subdomain: '' }
```

### TLD Existence Checking

Checks whether the TLD of a URL or hostname is well-known according to the Public Suffix List.

```javascript { .api }
/**
 * Check if TLD exists for given URL/hostname
 * @param {string} url - URL or hostname to check
 * @returns {boolean} True if TLD is well-known
 */
function tldExists(url);
```

**Usage Examples:**

```javascript
tldjs.tldExists('google.com'); // true
tldjs.tldExists('google.local'); // false (not a registered TLD)
tldjs.tldExists('com'); // true
tldjs.tldExists('uk'); // true
tldjs.tldExists('co.uk'); // true
tldjs.tldExists('amazon.co.uk'); // true (because 'uk' is valid)
tldjs.tldExists('https://user:password@example.co.uk:8080/path'); // true
```

### Public Suffix Extraction

Extracts the public suffix (effective TLD) from URLs or hostnames.

```javascript { .api }
/**
 * Extract public suffix from URL/hostname
 * @param {string} url - URL or hostname to analyze
 * @returns {string | null} Public suffix or null if invalid
 */
function getPublicSuffix(url);
```

**Usage Examples:**

```javascript
tldjs.getPublicSuffix('google.com'); // 'com'
tldjs.getPublicSuffix('fr.google.com'); // 'com'
tldjs.getPublicSuffix('google.co.uk'); // 'co.uk'
tldjs.getPublicSuffix('s3.amazonaws.com'); // 's3.amazonaws.com'
tldjs.getPublicSuffix('tld.is.unknown'); // 'unknown'
```

### Domain Extraction

Extracts the domain (the public suffix plus the label immediately to its left) from URLs or hostnames.

```javascript { .api }
/**
 * Extract domain from URL/hostname
 * @param {string} url - URL or hostname to analyze
 * @returns {string | null} Domain or null if invalid
 */
function getDomain(url);
```

**Usage Examples:**

```javascript
tldjs.getDomain('google.com'); // 'google.com'
tldjs.getDomain('fr.google.com'); // 'google.com'
tldjs.getDomain('fr.google.google'); // 'google.google'
tldjs.getDomain('foo.google.co.uk'); // 'google.co.uk'
tldjs.getDomain('t.co'); // 't.co'
tldjs.getDomain('fr.t.co'); // 't.co'
tldjs.getDomain('https://user:password@example.co.uk:8080/some/path?query#hash'); // 'example.co.uk'
```

### Subdomain Extraction

Extracts the subdomain portion from URLs or hostnames.

```javascript { .api }
/**
 * Extract subdomain from URL/hostname
 * @param {string} url - URL or hostname to analyze
 * @returns {string | null} Subdomain, empty string if none, or null if invalid
 */
function getSubdomain(url);
```

**Usage Examples:**

```javascript
tldjs.getSubdomain('google.com'); // ''
tldjs.getSubdomain('fr.google.com'); // 'fr'
tldjs.getSubdomain('google.co.uk'); // ''
tldjs.getSubdomain('foo.google.co.uk'); // 'foo'
tldjs.getSubdomain('moar.foo.google.co.uk'); // 'moar.foo'
tldjs.getSubdomain('t.co'); // ''
tldjs.getSubdomain('fr.t.co'); // 'fr'
tldjs.getSubdomain('https://secure.example.co.uk:443/path'); // 'secure'
```
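
The relationship between hostname, public suffix, domain, and subdomain can be sketched as plain string arithmetic. The helper below is hypothetical (`splitHostname` is not a tldjs function) and assumes the public suffix is already known. Returning `null` when the hostname is itself a public suffix is a choice made for this sketch, not a statement about tldjs's behavior.

```javascript
// Hypothetical helper for illustration only; not part of the tldjs API.
function splitHostname(hostname, publicSuffix) {
  // The hostname must be strictly longer than the suffix and end with '.<suffix>'.
  if (hostname === publicSuffix || !hostname.endsWith('.' + publicSuffix)) {
    return { domain: null, subdomain: null };
  }
  // Everything left of the suffix, e.g. 'moar.foo.google' for 'moar.foo.google.co.uk'.
  const prefix = hostname.slice(0, -(publicSuffix.length + 1));
  const lastDot = prefix.lastIndexOf('.');
  // Domain = the label directly left of the suffix, plus the suffix.
  const domain = prefix.slice(lastDot + 1) + '.' + publicSuffix;
  // Subdomain = whatever remains to the left, or '' when nothing remains.
  const subdomain = lastDot === -1 ? '' : prefix.slice(0, lastDot);
  return { domain, subdomain };
}

console.log(splitHostname('moar.foo.google.co.uk', 'co.uk'));
// { domain: 'google.co.uk', subdomain: 'moar.foo' }
console.log(splitHostname('spark-public.s3.amazonaws.com', 's3.amazonaws.com'));
// { domain: 'spark-public.s3.amazonaws.com', subdomain: '' }
```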

### Hostname Extraction

Extracts the hostname from a URL, or normalizes a bare hostname (trimmed and lowercased).

```javascript { .api }
/**
 * Extract hostname from URL or validate hostname
 * @param {string} url - URL or hostname to process
 * @returns {string | null} Clean hostname or null if invalid
 */
function extractHostname(url);
```

**Usage Examples:**

```javascript
tldjs.extractHostname(' example.CO.uk '); // 'example.co.uk'
tldjs.extractHostname('example.co.uk/some/path'); // 'example.co.uk'
tldjs.extractHostname('user:password@example.co.uk:8080/path'); // 'example.co.uk'
tldjs.extractHostname('https://www.example.com/'); // 'www.example.com'
tldjs.extractHostname('台灣'); // 'xn--kpry57d' (punycode)
tldjs.extractHostname(42); // '42' (non-string input is stringified)
```

### Hostname Validation

Validates hostnames according to RFC 1035 standards.

```javascript { .api }
/**
 * Validate hostname according to RFC 1035
 * @param {string} hostname - Hostname to validate
 * @returns {boolean} True if hostname is valid per RFC
 */
function isValidHostname(hostname);
```

**Usage Examples:**

```javascript
tldjs.isValidHostname('google.com'); // true
tldjs.isValidHostname('.google.com'); // false
tldjs.isValidHostname('my.fake.domain'); // true
tldjs.isValidHostname('localhost'); // false
tldjs.isValidHostname('192.168.0.0'); // true
tldjs.isValidHostname('https://example.com'); // false (full URL, not hostname)
```
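
For readers unfamiliar with the label rules behind this kind of check, the following is a rough, self-contained sketch. It is not tldjs's implementation: `looksLikeValidHostname` is a hypothetical name, it also accepts labels that begin with a digit, and it does not reproduce every result listed above (for example, it accepts single-label names such as `localhost`).

```javascript
// Hypothetical helper for illustration only; approximates label rules, nothing more.
function looksLikeValidHostname(hostname) {
  if (typeof hostname !== 'string') return false;
  // Overall length limit of 255 characters.
  if (hostname.length === 0 || hostname.length > 255) return false;
  // Each label: 1 to 63 characters, letters/digits/hyphens,
  // with no leading or trailing hyphen.
  const label = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;
  return hostname.split('.').every((part) => label.test(part));
}

console.log(looksLikeValidHostname('google.com'));          // true
console.log(looksLikeValidHostname('.google.com'));         // false (empty first label)
console.log(looksLikeValidHostname('https://example.com')); // false (':' and '/' are not allowed)
```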

### Deprecated: isValid

Legacy hostname validation function (use `isValidHostname` instead).

```javascript { .api }
/**
 * @deprecated Use isValidHostname instead
 * Validate hostname according to RFC 1035
 * @param {string} hostname - Hostname to validate
 * @returns {boolean} True if hostname is valid per RFC
 */
function isValid(hostname);
```

### Custom Configuration Factory

Creates customized tldjs instances with user-defined settings for specialized use cases.

```javascript { .api }
/**
 * Create customized tldjs instance with user settings
 * @param {FactoryOptions} options - Configuration options
 * @returns {tldjs} Customized tldjs instance with same API
 */
function fromUserSettings(options);

interface FactoryOptions {
  rules?: SuffixTrie;                                // Custom suffix trie for lookups
  validHosts?: string[];                             // Additional hosts to treat as valid domains
  extractHostname?: (url: string) => string | null;  // Custom hostname extraction function
}
```

**Usage Examples:**

```javascript
// Default behavior - localhost is not recognized
tldjs.getDomain('localhost'); // null
tldjs.getSubdomain('vhost.localhost'); // null

// Custom instance with localhost support
const myTldjs = tldjs.fromUserSettings({
  validHosts: ['localhost']
});

myTldjs.getDomain('localhost'); // 'localhost'
myTldjs.getSubdomain('vhost.localhost'); // 'vhost'
myTldjs.getDomain('api.localhost'); // 'localhost'
myTldjs.getSubdomain('api.localhost'); // 'api'
```
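
One plausible way to think about `validHosts` is as a short-circuit that runs before any public suffix lookup. The sketch below illustrates that idea under this assumption; `matchValidHost` is a hypothetical helper, and the library's actual matching logic may differ.

```javascript
// Hypothetical helper for illustration only; not part of the tldjs API.
function matchValidHost(hostname, validHosts) {
  for (const host of validHosts) {
    // Match the host itself or any name nested under it,
    // but not unrelated names that merely share a tail ('notlocalhost').
    if (hostname === host || hostname.endsWith('.' + host)) {
      return host;
    }
  }
  return null;
}

console.log(matchValidHost('api.localhost', ['localhost'])); // 'localhost'
console.log(matchValidHost('notlocalhost', ['localhost']));  // null
```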

## Types

```javascript { .api }
interface ParseResult {
  hostname: string | null;      // Extracted hostname from input
  isValid: boolean;             // Whether hostname follows RFC 1035
  isIp: boolean;                // Whether hostname is IPv4/IPv6 address
  tldExists: boolean;           // Whether TLD exists in Public Suffix List
  publicSuffix: string | null;  // Public suffix (effective TLD)
  domain: string | null;        // Domain name (SLD + public suffix)
  subdomain: string | null;     // Subdomain portion
}

class SuffixTrie {
  constructor(rules?: PlainRules);                // Create trie with optional rules
  static fromJson(json: object): SuffixTrie;      // Create trie from JSON rules
  hasTld(value: string): boolean;                 // Check if TLD exists in trie
  suffixLookup(hostname: string): string | null;  // Find public suffix for hostname
  exceptions: object;                             // Exception rules trie
  rules: object;                                  // Standard rules trie
}

// An array of rule descriptors
type PlainRules = {
  parts: string[];     // Domain parts in reverse order
  exception: boolean;  // Whether this is an exception rule
}[];
```

## Error Handling

All tldjs functions handle invalid input gracefully:

- Invalid URLs return `null` for extracted components
- Malformed hostnames are reported as `isValid: false` in parse results
- IP addresses are identified with `isIp: true` and skip public suffix lookup, so `publicSuffix`, `domain`, and `subdomain` are `null`
- Unknown TLDs are handled transparently (marked as `tldExists: false`)
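
Because nothing throws, callers branch on the fields of the parse result. The sketch below shows one way to collapse a `ParseResult`-shaped object into a single status string. `classifyParseResult` and its status labels are hypothetical, and the sample objects are copied from the parsing examples earlier in this document.

```javascript
// Hypothetical helper for illustration only; not part of the tldjs API.
function classifyParseResult(result) {
  if (result.hostname === null) return 'no-hostname';
  if (result.isIp) return 'ip-address';
  if (!result.isValid) return 'invalid-hostname';
  if (!result.tldExists) return 'unknown-tld';
  if (result.domain === null) return 'no-domain';
  return 'ok';
}

// Objects shaped like the documented parse() results.
const ipResult = {
  hostname: '192.168.0.1', isValid: true, isIp: true,
  tldExists: false, publicSuffix: null, domain: null, subdomain: null
};
const unknownTldResult = {
  hostname: 'domain.unknown', isValid: true, isIp: false,
  tldExists: false, publicSuffix: 'unknown', domain: 'domain.unknown', subdomain: ''
};

console.log(classifyParseResult(ipResult));         // 'ip-address'
console.log(classifyParseResult(unknownTldResult)); // 'unknown-tld'
```

In real code the argument would come from `tldjs.parse(input)` rather than a literal.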

## Performance Notes

tldjs is optimized for performance with different input types:

- **Cleaned hostnames**: ~850,000-8,700,000 ops/sec depending on function
- **Full URLs**: ~230,000-25,400,000 ops/sec depending on function
- **Lazy evaluation**: The `parse()` function uses early termination to avoid unnecessary processing
- **Custom hostname extraction**: You can provide optimized `extractHostname` functions for specialized use cases
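
The early-termination idea can be illustrated with a staged parser that stops as soon as the requested field is available. This is a toy model, not tldjs's code: the `STEP` constants, the `lazyParse` function, and the stand-in logic at each stage (for instance, treating the last label as the public suffix) are all assumptions made for the sketch.

```javascript
// Hypothetical step constants and stages; tldjs's real internals may differ.
const STEP = { HOSTNAME: 0, IS_VALID: 1, PUBLIC_SUFFIX: 2, DOMAIN: 3 };

function lazyParse(input, step = STEP.DOMAIN) {
  const result = { hostname: null, isValid: null, publicSuffix: null, domain: null };

  // Stage 0: cheap normalization stands in for real hostname extraction.
  result.hostname = String(input).trim().toLowerCase();
  if (step === STEP.HOSTNAME) return result;

  // Stage 1: toy validity check; stop early when the hostname is invalid.
  result.isValid = /^[a-z0-9.-]+$/.test(result.hostname) && !result.hostname.startsWith('.');
  if (step === STEP.IS_VALID || !result.isValid) return result;

  // Stage 2: toy public suffix, taken as the last label only.
  const labels = result.hostname.split('.');
  result.publicSuffix = labels[labels.length - 1];
  if (step === STEP.PUBLIC_SUFFIX) return result;

  // Stage 3: domain is the last two labels, when there are at least two.
  result.domain = labels.length > 1 ? labels.slice(-2).join('.') : null;
  return result;
}

console.log(lazyParse(' Example.COM ', STEP.HOSTNAME));
// { hostname: 'example.com', isValid: null, publicSuffix: null, domain: null }
console.log(lazyParse('fr.google.com').domain); // 'google.com'
```

A caller that only needs the hostname pays for the first stage alone, which is the effect the lazy evaluation note describes.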

## Browser Compatibility

tldjs works in browsers via bundlers like browserify, webpack, and others. The library has no Node.js-specific dependencies and uses only standard JavaScript features.

## TLD List Updates

The library bundles a snapshot of Mozilla's Public Suffix List; the rules can be refreshed at install time:

```bash
# Update TLD rules during installation
npm install tldjs --tldjs-update-rules

# Update existing installation
npm install --tldjs-update-rules
```