# parse-domain

**Splits a hostname into subdomains, domain and (effective) top-level domains.**

[![Version on NPM](https://img.shields.io/npm/v/parse-domain?style=for-the-badge)](https://www.npmjs.com/package/parse-domain)
[![Semantically released](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg?style=for-the-badge)](https://github.com/semantic-release/semantic-release)
[![Monthly downloads on NPM](https://img.shields.io/npm/dm/parse-domain?style=for-the-badge)](https://www.npmjs.com/package/parse-domain)<br>
[![NPM Bundle size minified](https://img.shields.io/bundlephobia/min/parse-domain?style=for-the-badge)](https://bundlephobia.com/result?p=parse-domain)
[![NPM Bundle size minified and gzipped](https://img.shields.io/bundlephobia/minzip/parse-domain?style=for-the-badge)](https://bundlephobia.com/result?p=parse-domain)<br>
[![License](https://img.shields.io/npm/l/parse-domain?style=for-the-badge)](./LICENSE)

Since domain name registrars organize their namespaces in different ways, it's not straightforward to split a hostname into subdomains, the domain and top-level domains. In order to do that, **parse-domain** uses a [large list of known top-level domains](https://publicsuffix.org/list/public_suffix_list.dat) from [publicsuffix.org](https://publicsuffix.org/):

```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain(
  // This should be a string with basic latin characters only.
  // More information below.
  "www.some.example.co.uk"
);

// Check if the domain is listed in the public suffix list
if (parseResult.type === ParseResultType.Listed) {
  const { subDomains, domain, topLevelDomains } = parseResult;

  console.log(subDomains); // ["www", "some"]
  console.log(domain); // "example"
  console.log(topLevelDomains); // ["co", "uk"]
} else {
  // Read more about other parseResult types below...
}
```

This package has been designed for modern Node and browser environments, supporting both CommonJS and ECMAScript modules. It assumes an ES2015 environment with [`Symbol()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol) and [`URL()`](https://developer.mozilla.org/en-US/docs/Web/API/URL) globally available. You need to transpile it down to ES5 (e.g. by using [Babel](https://babeljs.io/)) if you need to support older environments.

The list of top-level domains is stored in a [trie](https://en.wikipedia.org/wiki/Trie) data structure and serialization format to ensure the fastest lookup and the smallest possible library size. The library is side-effect free (this is important for proper [tree-shaking](https://webpack.js.org/guides/tree-shaking/)).

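To illustrate why a trie suits this kind of lookup, here is a minimal sketch of a suffix trie keyed by labels in reverse order. It is not the library's internal format: `buildTrie` and `longestSuffix` are hypothetical names, the suffix list is a tiny hand-picked sample, and the wildcard and exception rules of the real public suffix list are ignored.

```javascript
// Build a trie keyed by labels in reverse order ("co.uk" becomes uk -> co).
// Hypothetical helper, not part of parse-domain.
function buildTrie(suffixes) {
  const root = { children: new Map(), isSuffix: false };

  for (const suffix of suffixes) {
    let node = root;

    for (const label of suffix.split(".").reverse()) {
      if (!node.children.has(label)) {
        node.children.set(label, { children: new Map(), isSuffix: false });
      }
      node = node.children.get(label);
    }
    node.isSuffix = true;
  }

  return root;
}

// Walk the hostname labels from right to left and remember the longest
// path that ends on a listed suffix.
function longestSuffix(trie, hostname) {
  const labels = hostname.split(".");
  let node = trie;
  let matched = 0;

  for (let i = labels.length - 1; i >= 0; i--) {
    node = node.children.get(labels[i]);
    if (node === undefined) break;
    if (node.isSuffix) matched = labels.length - i;
  }

  return labels.slice(labels.length - matched);
}

// Tiny sample list, chosen for illustration only
const trie = buildTrie(["com", "uk", "co.uk", "io", "github.io"]);

console.log(longestSuffix(trie, "www.example.co.uk")); // ["co", "uk"]
console.log(longestSuffix(trie, "example.dev")); // []
```

Each lookup touches at most one trie node per label, so the cost depends on the length of the hostname and not on the size of the list.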
<br />

## Installation

```sh
npm install parse-domain
```

## Updates

💡 **Please note:** [publicsuffix.org](https://publicsuffix.org/) is updated several times per month. This package comes with a prebuilt list that has been downloaded at the time of `npm publish`. In order to get an up-to-date list, you should run `npx parse-domain-update` every time you start or build your application. This will download the latest list from `https://publicsuffix.org/list/public_suffix_list.dat`.

<br />

## Expected input

**⚠️ [`parseDomain`](#api-js-parseDomain) does not parse whole URLs**. You should only pass the [puny-encoded](https://en.wikipedia.org/wiki/Punycode) hostname section of the URL:

| ❌ Wrong                                       | ✅ Correct           |
| ---------------------------------------------- | -------------------- |
| `https://user@www.example.com:8080/path?query` | `www.example.com`    |
| `münchen.de`                                   | `xn--mnchen-3ya.de`  |
| `食狮.com.cn?query`                            | `xn--85x722f.com.cn` |

There is the utility function [`fromUrl`](#api-js-fromUrl) which tries to extract the hostname from a (partial) URL and puny-encodes it:

```javascript
import { parseDomain, fromUrl } from "parse-domain";

const { subDomains, domain, topLevelDomains } = parseDomain(
  fromUrl("https://www.münchen.de?query")
);

console.log(subDomains); // ["www"]
console.log(domain); // "xn--mnchen-3ya"
console.log(topLevelDomains); // ["de"]

// You can use the 'punycode' NPM package to decode the domain again
import { toUnicode } from "punycode";

console.log(toUnicode(domain)); // "münchen"
```

[`fromUrl`](#api-js-fromUrl) parses the URL using [`new URL()`](https://developer.mozilla.org/en-US/docs/Web/API/URL). Depending on your target environments you need to make sure that there is a [polyfill](https://www.npmjs.com/package/whatwg-url) for it. It's globally available in [all modern browsers](https://caniuse.com/#feat=url) (no IE) and in [Node v10](https://nodejs.org/api/url.html#url_class_url).

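For orientation, the following sketch shows what the global `URL` constructor does with an absolute URL. It is not the implementation of `fromUrl`: `hostnameFromAbsoluteUrl` and `NO_HOSTNAME_SKETCH` are hypothetical names, and unlike `fromUrl` the sketch does not attempt to handle partial URLs.

```javascript
// Hypothetical sentinel, analogous in spirit to the exported NO_HOSTNAME symbol
const NO_HOSTNAME_SKETCH = Symbol("NO_HOSTNAME_SKETCH");

// Hypothetical helper: extracts the hostname from an absolute URL only
function hostnameFromAbsoluteUrl(input) {
  if (typeof input !== "string") return NO_HOSTNAME_SKETCH;

  try {
    // The URL parser lowercases and puny-encodes the hostname
    return new URL(input).hostname;
  } catch {
    // new URL() throws a TypeError if the input is not an absolute URL
    return NO_HOSTNAME_SKETCH;
  }
}

console.log(
  hostnameFromAbsoluteUrl("https://user@www.example.com:8080/path?query")
); // "www.example.com"
console.log(hostnameFromAbsoluteUrl("https://www.münchen.de?query")); // "www.xn--mnchen-3ya.de"
console.log(hostnameFromAbsoluteUrl("not a url") === NO_HOSTNAME_SKETCH); // true
```

Returning a sentinel instead of throwing keeps the call site free of `try`/`catch`, which matches the non-throwing behavior described for `fromUrl`.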
## Expected output

When parsing a hostname there are 5 possible results:

- it is invalid
- it is an IP address
- it is formally correct and the domain is
  - reserved
  - not listed in the public suffix list
  - listed in the public suffix list

[`parseDomain`](#api-js-parseDomain) returns a [`ParseResult`](#api-ts-ParseResult) with a `type` property that allows you to distinguish these cases.

### 👉 Invalid domains

The given input is first validated against [RFC 1034](https://tools.ietf.org/html/rfc1034). If the validation fails, `parseResult.type` will be `ParseResultType.Invalid`:

```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("münchen.de");

console.log(parseResult.type === ParseResultType.Invalid); // true
```

Check out the [API](#api-ts-ValidationError) if you need more information about the validation error.

### 👉 IP addresses

If the given input is an IP address, `parseResult.type` will be `ParseResultType.Ip`:

```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("192.168.2.1");

console.log(parseResult.type === ParseResultType.Ip); // true
console.log(parseResult.ipVersion); // 4
```

It's debatable if a library for parsing domains should also accept IP addresses. In fact, you could argue that [`parseDomain`](#api-js-parseDomain) should reject an IP address as invalid. While this is true from a technical point of view, we decided to report IP addresses in a special way because we assume that a lot of people are using this library to make sense of an arbitrary hostname (see [#102](https://github.com/peerigon/parse-domain/issues/102)).

### 👉 Reserved domains

There are 5 top-level domains that are not listed in the public suffix list but reserved according to [RFC 6761](https://tools.ietf.org/html/rfc6761) and [RFC 6762](https://tools.ietf.org/html/rfc6762):

- `localhost`
- `local`
- `example`
- `invalid`
- `test`

In these cases, `parseResult.type` will be `ParseResultType.Reserved`:

```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("pecorino.local");

console.log(parseResult.type === ParseResultType.Reserved); // true
console.log(parseResult.labels); // ["pecorino", "local"]
```

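If you only need to know whether a hostname ends in one of the five reserved top-level domains listed above, a check along these lines may be enough. This is a rough sketch, not the library's logic: `hasReservedTld` is a hypothetical helper, and it assumes a lowercase hostname without a trailing dot. It also does not cover the empty string `""`, which the library reports as reserved too.

```javascript
// The five reserved top-level domains named in this section
const RESERVED_TLDS = new Set([
  "localhost",
  "local",
  "example",
  "invalid",
  "test",
]);

// Hypothetical helper: true if the last label is one of the reserved TLDs
function hasReservedTld(hostname) {
  const labels = hostname.split(".");

  return RESERVED_TLDS.has(labels[labels.length - 1]);
}

console.log(hasReservedTld("pecorino.local")); // true
console.log(hasReservedTld("localhost")); // true
console.log(hasReservedTld("example.com")); // false
```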
### 👉 Domains that are not listed

If the given hostname is valid, but not listed in the downloaded public suffix list, `parseResult.type` will be `ParseResultType.NotListed`:

```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("this.is.not-listed");

console.log(parseResult.type === ParseResultType.NotListed); // true
console.log(parseResult.labels); // ["this", "is", "not-listed"]
```

If a domain is not listed, it can be caused by an outdated list. Make sure to [update the list once in a while](#updates).

⚠️ **Do not treat parseDomain as an authoritative answer.** It cannot replace a real DNS lookup to validate if a given domain is known in a certain network.

### 👉 Effective top-level domains

Technically, the term _top-level domain_ describes the very last domain in a hostname (`uk` in `example.co.uk`). Most people, however, use the term _top-level domain_ for the _public suffix_ which is a namespace ["under which Internet users can directly register names"](https://publicsuffix.org/).

Some examples for public suffixes:

- `com` in `example.com`
- `co.uk` in `example.co.uk`
- `co` in `example.co`
- but also `com.co` in `example.com.co`

If the hostname is listed in the public suffix list, the `parseResult.type` will be `ParseResultType.Listed`:

```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("example.co.uk");

console.log(parseResult.type === ParseResultType.Listed); // true
console.log(parseResult.labels); // ["example", "co", "uk"]
```

Now `parseResult` will also provide a `subDomains`, `domain` and `topLevelDomains` property:

```javascript
const { subDomains, domain, topLevelDomains } = parseResult;

console.log(subDomains); // []
console.log(domain); // "example"
console.log(topLevelDomains); // ["co", "uk"]
```

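Once the number of labels in the public suffix is known, the three properties follow mechanically from `labels`. The sketch below shows that relationship. `splitLabels` is a hypothetical helper and not part of the library, and it assumes `suffixLength` is never larger than the number of labels.

```javascript
// Hypothetical helper: splits labels, given how many trailing labels
// form the public suffix
function splitLabels(labels, suffixLength) {
  const topLevelDomains = labels.slice(labels.length - suffixLength);
  const subDomains = labels.slice(0, labels.length - suffixLength);
  // The label directly left of the suffix is the domain.
  // pop() returns undefined if no label is left.
  const domain = subDomains.pop();

  return { subDomains, domain, topLevelDomains };
}

console.log(splitLabels(["www", "some", "example", "co", "uk"], 2));
// { subDomains: ["www", "some"], domain: "example", topLevelDomains: ["co", "uk"] }

console.log(splitLabels(["co", "uk"], 2).domain); // undefined
```

The second call shows why `domain` can be `undefined`: the hostname consists of the public suffix only.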
### 👉 Switch over `parseResult.type` to distinguish between different parse results

We recommend switching over the `parseResult.type`:

```javascript
switch (parseResult.type) {
  case ParseResultType.Listed: {
    const { hostname, topLevelDomains } = parseResult;

    console.log(`${hostname} belongs to ${topLevelDomains.join(".")}`);
    break;
  }
  case ParseResultType.Reserved:
  case ParseResultType.NotListed: {
    const { hostname } = parseResult;

    console.log(`${hostname} is a reserved or unknown domain`);
    break;
  }
  default:
    // hostname can be the NO_HOSTNAME symbol here, so convert it explicitly
    throw new Error(
      `${String(parseResult.hostname)} is an ip address or invalid domain`
    );
}
```

### ⚠️ Effective TLDs !== TLDs acknowledged by ICANN

What's surprising to a lot of people is that the definition of public suffix means that regular user domains can become effective top-level domains:

```javascript
const { subDomains, domain, topLevelDomains } = parseDomain(
  "parse-domain.github.io"
);

console.log(subDomains); // []
console.log(domain); // "parse-domain"
console.log(topLevelDomains); // ["github", "io"] 🤯
```

In this case, `github.io` is nothing other than a private domain name registrar. `github.io` is the _effective_ top-level domain and browsers are treating it like that (e.g. for setting [`document.domain`](https://developer.mozilla.org/en-US/docs/Web/API/Document/domain)).

If you want to deviate from the browser's understanding of a top-level domain and you're only interested in top-level domains acknowledged by [ICANN](https://en.wikipedia.org/wiki/ICANN), there's an `icann` property:

```javascript
const parseResult = parseDomain("parse-domain.github.io");
const { subDomains, domain, topLevelDomains } = parseResult.icann;

console.log(subDomains); // ["parse-domain"]
console.log(domain); // "github"
console.log(topLevelDomains); // ["io"]
```

### ⚠️ `domain` can also be `undefined`

```javascript
const { subDomains, domain, topLevelDomains } = parseDomain("co.uk");

console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // ["co", "uk"]
```

### ⚠️ `""` is a valid (but reserved) domain

The empty string `""` represents the [DNS root](https://en.wikipedia.org/wiki/DNS_root_zone) and is considered to be valid. `parseResult.type` will be `ParseResultType.Reserved` in that case:

```javascript
const { type, subDomains, domain, topLevelDomains } = parseDomain("");

console.log(type === ParseResultType.Reserved); // true
console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // []
```

## API

🧩 = JavaScript export<br>
🧬 = TypeScript export

<h3 id="api-js-parseDomain">
🧩 <code>export parseDomain(hostname: string | typeof <a href="#api-js-NO_HOSTNAME">NO_HOSTNAME</a>): <a href="#api-ts-ParseResult">ParseResult</a></code>
</h3>

Takes a hostname (e.g. `"www.example.com"`) and returns a [`ParseResult`](#api-ts-ParseResult). The hostname must only contain basic latin characters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.

```javascript
import { parseDomain } from "parse-domain";

const parseResult = parseDomain("www.example.com");
```

<h3 id="api-js-fromUrl">
🧩 <code>export fromUrl(input: string): string | typeof <a href="#api-js-NO_HOSTNAME">NO_HOSTNAME</a></code>
</h3>

Takes a URL-like string and tries to extract the hostname. Requires the global [`URL` constructor](https://developer.mozilla.org/en-US/docs/Web/API/URL) to be available on the platform. Returns the [`NO_HOSTNAME`](#api-js-NO_HOSTNAME) symbol if the input was not a string or the hostname could not be extracted. Take a look [at the test suite](/src/from-url.test.ts) for some examples. Does not throw an error, even with invalid input.

<h3 id="api-js-NO_HOSTNAME">
🧩 <code>export NO_HOSTNAME: unique symbol</code>
</h3>

`NO_HOSTNAME` is a symbol that is returned by [`fromUrl`](#api-js-fromUrl) when it was not able to extract a hostname from the given string. When passed to [`parseDomain`](#api-js-parseDomain), it will always yield a [`ParseResultInvalid`](#api-ts-ParseResultInvalid).

<h3 id="api-ts-ParseResult">
🧬 <code>export ParseResult</code>
</h3>

A `ParseResult` is either a [`ParseResultInvalid`](#api-ts-ParseResultInvalid), [`ParseResultIp`](#api-ts-ParseResultIp), [`ParseResultReserved`](#api-ts-ParseResultReserved), [`ParseResultNotListed`](#api-ts-ParseResultNotListed) or [`ParseResultListed`](#api-ts-ParseResultListed).

All parse results have a `type` property that is either `"INVALID"`, `"IP"`, `"RESERVED"`, `"NOT_LISTED"` or `"LISTED"`. Use the exported [ParseResultType](#api-js-ParseResultType) to check for the type instead of checking against string literals.

All parse results also have a `hostname` property that provides access to the sanitized hostname that was passed to [`parseDomain`](#api-js-parseDomain).

<h3 id="api-js-ParseResultType">
🧩 <code>export ParseResultType</code>
</h3>

An object that holds all possible [ParseResult](#api-ts-ParseResult) `type` values:

```javascript
const ParseResultType = {
  Invalid: "INVALID",
  Ip: "IP",
  Reserved: "RESERVED",
  NotListed: "NOT_LISTED",
  Listed: "LISTED",
};
```

<h3 id="api-ts-ParseResultType">
🧬 <code>export ParseResultType</code>
</h3>

This type represents all possible [ParseResult](#api-ts-ParseResult) `type` values.

<h3 id="api-ts-ParseResultInvalid">
🧬 <code>export ParseResultInvalid</code>
</h3>

Describes the shape of the parse result that is returned when the given hostname does not adhere to [RFC 1034](https://tools.ietf.org/html/rfc1034):

- The hostname is not a string
- The hostname is longer than 253 characters
- A domain label is shorter than 1 character
- A domain label is longer than 63 characters
- A domain label contains a character that is not a basic latin character, digit or hyphen

```ts
type ParseResultInvalid = {
  type: ParseResultType.Invalid;
  hostname: string | typeof NO_HOSTNAME;
  errors: Array<ValidationError>;
};

// Example

{
  type: "INVALID",
  hostname: ".com",
  errors: [...]
}
```

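The five rules listed above can be approximated with a few lines of plain JavaScript. The following is a simplified sketch and not the library's validator: `validateHostname` is a hypothetical helper that returns only the error type strings (no `message` or `column`), counts UTF-16 code units instead of octets, and does not special-case the empty string `""`, which the library accepts as the DNS root.

```javascript
// Hypothetical helper: returns the error type strings for a hostname
function validateHostname(hostname) {
  if (typeof hostname !== "string") return ["NO_HOSTNAME"];

  const errors = [];

  if (hostname.length > 253) errors.push("DOMAIN_MAX_LENGTH");

  for (const label of hostname.split(".")) {
    if (label.length < 1) errors.push("LABEL_MIN_LENGTH");
    else if (label.length > 63) errors.push("LABEL_MAX_LENGTH");

    // Anything besides basic latin letters, digits and hyphens is rejected
    if (/[^a-z0-9-]/i.test(label)) errors.push("LABEL_INVALID_CHARACTER");
  }

  return errors;
}

console.log(validateHostname("www.example.com")); // []
console.log(validateHostname(".com")); // ["LABEL_MIN_LENGTH"]
console.log(validateHostname("münchen.de")); // ["LABEL_INVALID_CHARACTER"]
```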
<h3 id="api-ts-ValidationError">
🧬 <code>export ValidationError</code>
</h3>

Describes the shape of a validation error as returned by [`parseDomain`](#api-js-parseDomain):

```ts
type ValidationError = {
  type: ValidationErrorType;
  message: string;
  column: number;
};

// Example

{
  type: "LABEL_MIN_LENGTH",
  message: `Label "" is too short. Label is 0 octets long but should be at least 1.`,
  column: 1,
}
```

<h3 id="api-js-ValidationErrorType">
🧩 <code>export ValidationErrorType</code>
</h3>

An object that holds all possible [ValidationError](#api-ts-ValidationError) `type` values:

```javascript
const ValidationErrorType = {
  NoHostname: "NO_HOSTNAME",
  DomainMaxLength: "DOMAIN_MAX_LENGTH",
  LabelMinLength: "LABEL_MIN_LENGTH",
  LabelMaxLength: "LABEL_MAX_LENGTH",
  LabelInvalidCharacter: "LABEL_INVALID_CHARACTER",
};
```

<h3 id="api-ts-ValidationErrorType">
🧬 <code>export ValidationErrorType</code>
</h3>

This type represents all possible `type` values of a [ValidationError](#api-ts-ValidationError).

<h3 id="api-ts-ParseResultIp">
🧬 <code>export ParseResultIp</code>
</h3>

This type describes the shape of the parse result that is returned when the given hostname was an IPv4 or IPv6 address.

```ts
type ParseResultIp = {
  type: ParseResultType.Ip;
  hostname: string;
  ipVersion: 4 | 6;
};

// Example

{
  type: "IP",
  hostname: "192.168.0.1",
  ipVersion: 4
}
```

According to [RFC 3986](https://tools.ietf.org/html/rfc3986#section-3.2.2), IPv6 addresses need to be surrounded by `[` and `]` in URLs. [`parseDomain`](#api-js-parseDomain) accepts IPv6 addresses both with and without square brackets:

```js
// Recognized as IPv4 address
parseDomain("192.168.0.1");
// Both are recognized as proper IPv6 addresses
parseDomain("::");
parseDomain("[::]");
```

<h3 id="api-ts-ParseResultReserved">
🧬 <code>export ParseResultReserved</code>
</h3>

This type describes the shape of the parse result that is returned when the given hostname

- is the root domain (the empty string `""`)
- belongs to the top-level domain `localhost`, `local`, `example`, `invalid` or `test`

```ts
type ParseResultReserved = {
  type: ParseResultType.Reserved;
  hostname: string;
  labels: Array<string>;
};

// Example

{
  type: "RESERVED",
  hostname: "pecorino.local",
  labels: ["pecorino", "local"]
}
```

⚠️ Reserved IPs, such as `127.0.0.1`, will not be reported as reserved, but as [`ParseResultIp`](#api-ts-ParseResultIp). See [#117](https://github.com/peerigon/parse-domain/issues/117).

<h3 id="api-ts-ParseResultNotListed">
🧬 <code>export ParseResultNotListed</code>
</h3>

Describes the shape of the parse result that is returned when the given hostname is valid and does not belong to a reserved top-level domain, but is not listed in the downloaded public suffix list.

```ts
type ParseResultNotListed = {
  type: ParseResultType.NotListed;
  hostname: string;
  labels: Array<string>;
};

// Example

{
  type: "NOT_LISTED",
  hostname: "this.is.not-listed",
  labels: ["this", "is", "not-listed"]
}
```

<h3 id="api-ts-ParseResultListed">
🧬 <code>export ParseResultListed</code>
</h3>

Describes the shape of the parse result that is returned when the given hostname belongs to a top-level domain that is listed in the public suffix list.

```ts
type ParseResultListed = {
  type: ParseResultType.Listed;
  hostname: string;
  labels: Array<string>;
  subDomains: Array<string>;
  domain: string | undefined;
  topLevelDomains: Array<string>;
  icann: {
    subDomains: Array<string>;
    domain: string | undefined;
    topLevelDomains: Array<string>;
  };
};

// Example

{
  type: "LISTED",
  hostname: "parse-domain.github.io",
  labels: ["parse-domain", "github", "io"],
  subDomains: [],
  domain: "parse-domain",
  topLevelDomains: ["github", "io"],
  icann: {
    subDomains: ["parse-domain"],
    domain: "github",
    topLevelDomains: ["io"]
  }
}
```

## License

MIT

## Sponsors

[<img src="https://assets.peerigon.com/peerigon/logo/peerigon-logo-flat-spinat.png" width="150" />](https://peerigon.com)