# chardet

_Chardet_ is a character encoding detection module written in pure JavaScript (TypeScript). The module uses occurrence analysis to determine the most probable encoding.

- Packed size is only **22 KB**
- Works in all environments: Node / Browser / Native
- Works on all platforms: Linux / Mac / Windows
- No dependencies
- No native code / bindings
- 100% written in TypeScript
- Extensive code coverage

## Installation

```
npm i chardet
```

## Usage

To return the encoding with the highest confidence:

```javascript
import chardet from 'chardet';

// From an in-memory buffer
const fromBuffer = chardet.detect(Buffer.from('hello there!'));
// or asynchronously from a file
const fromFile = await chardet.detectFile('/path/to/file');
// or synchronously from a file
const fromFileSync = chardet.detectFileSync('/path/to/file');
```

To return the full list of possible encodings, use the `analyse` method:

```javascript
import chardet from 'chardet';
chardet.analyse(Buffer.from('hello there!'));
```

The returned value is an array of objects sorted by confidence in descending order:

```javascript
[
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];
```
48
49In browser, you can use [Uint8Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array) instead of the `Buffer`:
50
51```javascript
52import chardet from 'chardet';
53chardet.analyse(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f]));
54```
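
Browser data rarely arrives as a ready-made `Uint8Array`; it is more often a string, an `ArrayBuffer` (for example from a file read), or another typed-array view. The sketch below shows one way to normalise these inputs using only standard web APIs. `toBytes` is a hypothetical helper, not something chardet provides.

```javascript
// Hypothetical helper (not part of chardet): normalise common inputs to a Uint8Array.
function toBytes(input) {
  if (typeof input === 'string') {
    return new TextEncoder().encode(input); // TextEncoder always emits UTF-8 bytes
  }
  if (input instanceof ArrayBuffer) {
    return new Uint8Array(input); // a view over the same memory, no copy
  }
  if (ArrayBuffer.isView(input)) {
    // Respect the view's own window into its underlying buffer
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
  }
  throw new TypeError('Unsupported input type');
}

// 'hello' yields the same five bytes as 0x68 0x65 0x6c 0x6c 0x6f
console.log(Array.from(toBytes('hello'))); // [ 104, 101, 108, 108, 111 ]
```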

## Working with large data sets

When the data set is large and you want to improve performance at the cost of some accuracy, you can sample only the first N bytes of the buffer:

```javascript
chardet
  .detectFile('/path/to/file', { sampleSize: 32 })
  .then((encoding) => console.log(encoding));
```

You can also specify where in the buffer to begin reading:

```javascript
chardet
  .detectFile('/path/to/file', { sampleSize: 32, offset: 128 })
  .then((encoding) => console.log(encoding));
```
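
For data that is already in memory, the same windowing idea can be applied by hand before detection. This is a minimal sketch that assumes `sampleSize` and `offset` describe a window of `sampleSize` bytes starting at `offset`; `sampleWindow` is a hypothetical helper and does not reflect how chardet is implemented internally.

```javascript
// Hypothetical helper (not part of chardet): take up to `sampleSize` bytes
// starting at `offset`, as a view over the same memory (no copy).
function sampleWindow(bytes, { sampleSize = bytes.length, offset = 0 } = {}) {
  // subarray clamps the end index, so a window past the end is simply shorter
  return bytes.subarray(offset, offset + sampleSize);
}

const data = new Uint8Array(1024).fill(0x61); // 1 KiB of the letter "a"
const sample = sampleWindow(data, { sampleSize: 32, offset: 128 });
console.log(sample.length); // 32

// The sample could then be passed on for detection, e.g. chardet.detect(sample).
```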

## Supported encodings

- UTF-8
- UTF-16 LE
- UTF-16 BE
- UTF-32 LE
- UTF-32 BE
- ISO-2022-JP
- ISO-2022-KR
- ISO-2022-CN
- Shift_JIS
- Big5
- EUC-JP
- EUC-KR
- GB18030
- ISO-8859-1
- ISO-8859-2
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1254
- windows-1255
- windows-1256
- KOI8-R

Currently only these encodings are supported.

## TypeScript?

Yes. Type definitions are included.

### References

- ICU project: http://site.icu-project.org/