# chardet

*Chardet* is a character encoding detection module written in pure JavaScript (TypeScript). The module uses occurrence analysis to determine the most probable encoding.

- Packed size is only **22 KB**
- Works in all environments: Node / Browser / Native
- Works on all platforms: Linux / Mac / Windows
- No dependencies
- No native code / bindings
- 100% written in TypeScript
- Extensive code coverage

## Installation

```
npm i chardet
```

## Usage

To return the encoding with the highest confidence:

```javascript
const chardet = require('chardet');

chardet.detect(Buffer.from('hello there!'));
// or
chardet.detectFile('/path/to/file').then(encoding => console.log(encoding));
// or
chardet.detectFileSync('/path/to/file');
```

To return the full list of possible encodings, use the `analyse` method:

```javascript
const chardet = require('chardet');
chardet.analyse(Buffer.from('hello there!'));
```

The returned value is an array of objects sorted by confidence value in descending order:

```javascript
[
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' }
];
```

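
As an illustration of consuming an array of this shape, the sketch below picks the top match only when its confidence clears a threshold and otherwise falls back to a default. The `pickEncoding` helper, the threshold of 50, and the `'UTF-8'` fallback are assumptions made for this example and are not part of chardet's API.

```javascript
// Hypothetical helper, not part of chardet.
// `matches` is assumed to have the shape shown above:
// an array of { confidence, name, lang? } sorted by confidence, descending.
function pickEncoding(matches, minConfidence = 50, fallback = 'UTF-8') {
  const best = matches[0];
  // No candidates, or the best candidate is too weak: use the fallback.
  if (!best || best.confidence < minConfidence) {
    return fallback;
  }
  return best.name;
}

// Sample data copied from the example output above.
const matches = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' }
];

console.log(pickEncoding(matches, 50));
```

A stricter threshold reduces the chance of acting on a weak guess, at the price of falling back more often.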
## Working with large data sets

Sometimes, when the data set is huge and you want to optimize performance (at the cost of some accuracy),
you can sample only the first N bytes of the buffer:

```javascript
chardet
  .detectFile('/path/to/file', { sampleSize: 32 })
  .then(encoding => console.log(encoding));
```

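
To make the idea of sampling concrete, here is a minimal sketch that reads at most the first N bytes of a file using only Node's `fs` module. The `readHead` helper is hypothetical and is not necessarily how `sampleSize` is implemented inside chardet; it only shows what "the first N bytes" means in practice.

```javascript
const fs = require('fs');

// Hypothetical helper, not part of chardet.
// Reads at most `sampleSize` bytes from the start of the file at `path`.
function readHead(path, sampleSize) {
  const fd = fs.openSync(path, 'r');
  try {
    const buffer = Buffer.alloc(sampleSize);
    // readSync returns the number of bytes actually read, which is
    // smaller than sampleSize when the file is shorter than the sample.
    const bytesRead = fs.readSync(fd, buffer, 0, sampleSize, 0);
    return buffer.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// The sample could then be passed to chardet.detect(), for example:
// const encoding = chardet.detect(readHead('/path/to/file', 32));
```

Because only the head of the file is loaded, memory use stays bounded by the sample size regardless of how large the file is.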
## Supported Encodings:

- UTF-8
- UTF-16 LE
- UTF-16 BE
- UTF-32 LE
- UTF-32 BE
- ISO-2022-JP
- ISO-2022-KR
- ISO-2022-CN
- Shift_JIS
- Big5
- EUC-JP
- EUC-KR
- GB18030
- ISO-8859-1
- ISO-8859-2
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1254
- windows-1255
- windows-1256
- KOI8-R

Currently only these encodings are supported.

## TypeScript?

Yes. Type definitions are included.

### References

- ICU project http://site.icu-project.org/