[tests]: http://img.shields.io/travis/mafintosh/csv-parser.svg
[tests-url]: http://travis-ci.org/mafintosh/csv-parser

[cover]: https://codecov.io/gh/mafintosh/csv-parser/branch/master/graph/badge.svg
[cover-url]: https://codecov.io/gh/mafintosh/csv-parser

[size]: https://packagephobia.now.sh/badge?p=csv-parser
[size-url]: https://packagephobia.now.sh/result?p=csv-parser

# csv-parser

[![tests][tests]][tests-url]
[![cover][cover]][cover-url]
[![size][size]][size-url]

Streaming CSV parser that aims for maximum speed as well as compatibility with
the [csv-spectrum](https://npmjs.org/csv-spectrum) CSV acid test suite.

`csv-parser` can convert CSV into JSON at a rate of around 90,000 rows per
second. Performance varies with the data used; try `bin/bench.js <your file>`
to benchmark your data.

`csv-parser` can be used in the browser with [browserify](http://browserify.org/).

[neat-csv](https://github.com/sindresorhus/neat-csv) can be used if a
`Promise`-based interface to `csv-parser` is needed.

_Note: This module requires Node v8.16.0 or higher._

## Benchmarks

⚡️ `csv-parser` is greased-lightning fast

```console
→ npm run bench

  Filename                 Rows Parsed  Duration
  backtick.csv                       2     3.5ms
  bad-data.csv                       3    0.55ms
  basic.csv                          1    0.26ms
  comma-in-quote.csv                 1    0.29ms
  comment.csv                        2    0.40ms
  empty-columns.csv                  1    0.40ms
  escape-quotes.csv                  3    0.38ms
  geojson.csv                        3    0.46ms
  large-dataset.csv               7268      73ms
  newlines.csv                       3    0.35ms
  no-headers.csv                     3    0.26ms
  option-comment.csv                 2    0.24ms
  option-escape.csv                  3    0.25ms
  option-maxRowBytes.csv          4577      39ms
  option-newline.csv                 0    0.47ms
  option-quote-escape.csv            3    0.33ms
  option-quote-many.csv              3    0.38ms
  option-quote.csv                   2    0.22ms
  quotes+newlines.csv                3    0.20ms
  strict.csv                         3    0.22ms
  latin.csv                          2    0.38ms
  mac-newlines.csv                   2    0.28ms
  utf16-big.csv                      2    0.33ms
  utf16.csv                          2    0.26ms
  utf8.csv                           2    0.24ms
```

## Install

Using npm:

```console
$ npm install csv-parser
```

Using yarn:

```console
$ yarn add csv-parser
```

## Usage

To use the module, create a readable stream to a desired CSV file, instantiate
`csv`, and pipe the stream to `csv`.

Suppose you have a CSV file `data.csv` which contains the data:

```
NAME,AGE
Daffy Duck,24
Bugs Bunny,22
```

It could then be parsed, and results shown like so:

```js
const csv = require('csv-parser')
const fs = require('fs')
const results = [];

fs.createReadStream('data.csv')
  .pipe(csv())
  .on('data', (data) => results.push(data))
  .on('end', () => {
    console.log(results);
    // [
    //   { NAME: 'Daffy Duck', AGE: '24' },
    //   { NAME: 'Bugs Bunny', AGE: '22' }
    // ]
  });
```

To specify options for `csv`, pass an object argument to the function. For
example:

```js
csv({ separator: '\t' });
```

## API

### csv([options | headers])

Returns: a duplex (transform) stream that emits an `Object` for each parsed row.

#### options

Type: `Object`

As an alternative to passing an `options` object, you may pass an `Array[String]`
which specifies the headers to use. For example:

```js
csv(['Name', 'Age']);
```

If you need to specify options _and_ headers, please use the object notation
with the `headers` property as shown below.

#### escape

Type: `String`<br>
Default: `"`

A single-character string used to specify the character used to escape strings
in a CSV row.

#### headers

Type: `Array[String] | Boolean`

Specifies the headers to use. Headers define the property key for each value in
a CSV row. If no `headers` option is provided, `csv-parser` will use the first
line in a CSV file as the header specification.

If `false`, specifies that the first row in a data file does _not_ contain
headers, and instructs the parser to use the column index as the key for each
column. Using `headers: false` with the same `data.csv` example from above
would yield:

```js
[
  { '0': 'Daffy Duck', '1': '24' },
  { '0': 'Bugs Bunny', '1': '22' }
]
```

_Note: If using the `headers` option on a file which contains headers on the
first line, specify `skipLines: 1` to skip over that row, or the headers row
will appear as normal row data. Alternatively, use the `mapHeaders` option to
modify the existing headers in that scenario._

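For instance, the note above can be sketched against the `data.csv` example: custom header keys replace the file's own header row, and `skipLines: 1` keeps that row out of the output (the lowercase key names here are illustrative choices, not required names):

```js
const csv = require('csv-parser');
const fs = require('fs');

// Replace the file's header row ("NAME,AGE") with custom keys.
// skipLines: 1 prevents the original header row from being emitted as data.
fs.createReadStream('data.csv')
  .pipe(csv({ headers: ['name', 'age'], skipLines: 1 }))
  .on('data', (row) => console.log(row)); // rows keyed by 'name' and 'age'
```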
#### mapHeaders

Type: `Function`

A function that can be used to modify the value of each header. Return a
`String` to modify the header. Return `null` to remove the header, and its
column, from the results.

```js
csv({
  mapHeaders: ({ header, index }) => header.toLowerCase()
})
```

##### Parameters

**header** _String_ The current column header.<br/>
**index** _Number_ The current column index.

#### mapValues

Type: `Function`

A function that can be used to modify the content of each column. The return
value will replace the current column content.

```js
csv({
  mapValues: ({ header, index, value }) => value.toLowerCase()
})
```

##### Parameters

**header** _String_ The current column header.<br/>
**index** _Number_ The current column index.<br/>
**value** _String_ The current column value (or content).

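Since parsed values arrive as strings, one common use of `mapValues` is casting numeric columns; a sketch, assuming the `AGE` header from the example file above:

```js
csv({
  // Convert the AGE column to a Number; leave other columns as strings.
  mapValues: ({ header, value }) => (header === 'AGE' ? Number(value) : value)
});
```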
#### newline

Type: `String`<br>
Default: `\n`

Specifies a single-character string to denote the end of a line in a CSV file.

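For example, to parse a file that uses old-style Mac (`\r`) line endings:

```js
csv({ newline: '\r' });
```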
#### quote

Type: `String`<br>
Default: `"`

Specifies a single-character string to denote a quoted string.

#### raw

Type: `Boolean`

If `true`, instructs the parser not to decode UTF-8 strings.

#### separator

Type: `String`<br>
Default: `,`

Specifies a single-character string to use as the column separator for each row.

#### skipComments

Type: `Boolean | String`<br>
Default: `false`

Instructs the parser to ignore lines which represent comments in a CSV file.
Since there is no specification that dictates what a CSV comment looks like,
comments should be considered non-standard. The most common character used to
signify a comment in a CSV file is `"#"`. If this option is set to `true`,
lines which begin with `#` will be skipped. If a custom character is needed to
denote a commented line, this option may be set to a string which represents
the leading character(s) signifying a comment line.

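For example, to skip lines that begin with `//` rather than the default `#`:

```js
csv({ skipComments: '//' });
```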
#### skipLines

Type: `Number`<br>
Default: `0`

Specifies the number of lines at the beginning of a data file that the parser
should skip over, prior to parsing headers.

#### maxRowBytes

Type: `Number`<br>
Default: `Number.MAX_SAFE_INTEGER`

Maximum number of bytes per row. An error is thrown if a line exceeds this
value. The default value is roughly 9 petabytes, i.e. effectively unlimited.

#### strict

Type: `Boolean`

If `true`, instructs the parser that the number of columns in each row must
match the number of `headers` specified.

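When `strict` is enabled, a row whose column count does not match the headers causes the stream to emit an `error` event, which should be handled or the process will crash; a minimal sketch:

```js
const csv = require('csv-parser');
const fs = require('fs');

fs.createReadStream('data.csv')
  .pipe(csv({ strict: true }))
  .on('data', (row) => console.log(row))
  // Emitted when a row's column count differs from the header count.
  .on('error', (err) => console.error(err.message));
```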
## Events

The following events are emitted during parsing:

### `data`

Emitted for each row of data parsed, with the notable exception of the header
row. Please see [Usage](#Usage) for an example.

### `headers`

Emitted after the header row is parsed. The first parameter of the event
callback is an `Array[String]` containing the header names.

```js
fs.createReadStream('data.csv')
  .pipe(csv())
  .on('headers', (headers) => {
    console.log(`First header: ${headers[0]}`)
  })
```

### Readable Stream Events

Events available on Node built-in
[Readable Streams](https://nodejs.org/api/stream.html#stream_class_stream_readable)
are also emitted. The `end` event should be used to detect the end of parsing.

## CLI

This module also provides a CLI which will convert CSV to
[newline-delimited](http://ndjson.org/) JSON. The following CLI flags can be
used to control how input is parsed:

```
Usage: csv-parser [filename?] [options]

  --escape,-e         Set the escape character (defaults to quote value)
  --headers,-h        Explicitly specify csv headers as a comma separated list
  --help              Show this help
  --output,-o         Set output file. Defaults to stdout
  --quote,-q          Set the quote character ('"' by default)
  --remove            Remove columns from output by header name
  --separator,-s      Set the separator character ("," by default)
  --skipComments,-c   Skip CSV comments that begin with '#'. Set a value to
                      change the comment character.
  --skipLines,-l      Set the number of lines to skip before parsing headers
  --strict            Require column length match headers length
  --version,-v        Print out the installed version
```

For example, to parse a TSV file:

```
cat data.tsv | csv-parser -s $'\t'
```

## Encoding

Users may encounter issues with the encoding of a CSV file. Transcoding the
source stream can be done neatly with modules such as:

- [`iconv-lite`](https://www.npmjs.com/package/iconv-lite)
- [`iconv`](https://www.npmjs.com/package/iconv)

Or native [`iconv`](http://man7.org/linux/man-pages/man1/iconv.1.html) if part
of a pipeline.

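A sketch of such a transcoding pipeline using `iconv-lite`'s `decodeStream`; the filename and `latin1` encoding here are assumptions for illustration:

```js
const fs = require('fs');
const csv = require('csv-parser');
const iconv = require('iconv-lite');

fs.createReadStream('data-latin1.csv')
  .pipe(iconv.decodeStream('latin1')) // transcode to UTF-8 before parsing
  .pipe(csv())
  .on('data', (row) => console.log(row));
```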
## Byte Order Marks

Some CSV files may be generated with, or contain, a leading [Byte Order Mark](https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8). This may cause issues parsing headers and/or data from your file. From Wikipedia:

> The Unicode Standard permits the BOM in UTF-8, but does not require nor recommend its use. Byte order has no meaning in UTF-8.

To use this module with a file containing a BOM, please use a module like [strip-bom-stream](https://github.com/sindresorhus/strip-bom-stream) in your pipeline:

```js
const fs = require('fs');

const csv = require('csv-parser');
const stripBom = require('strip-bom-stream');

fs.createReadStream('data.csv')
  .pipe(stripBom())
  .pipe(csv())
  ...
```

When using the CLI, the BOM can be removed by first running:

```console
$ sed $'s/\xEF\xBB\xBF//g' data.csv
```

## Meta

[CONTRIBUTING](./.github/CONTRIBUTING)

[LICENSE (MIT)](./LICENSE)