[![Build Status](https://travis-ci.org/Keyang/node-csvtojson.svg?branch=master)](https://travis-ci.org/Keyang/node-csvtojson)
[![Coverage Status](https://coveralls.io/repos/github/Keyang/node-csvtojson/badge.svg?branch=master)](https://coveralls.io/github/Keyang/node-csvtojson?branch=master)
[![OpenCollective](https://opencollective.com/csvtojson/backers/badge.svg)](#backers)
[![OpenCollective](https://opencollective.com/csvtojson/sponsors/badge.svg)](#sponsors)

# CSVTOJSON

`csvtojson` is a comprehensive Node.js CSV parser that converts CSV to JSON or column arrays. It can be used as a Node.js library, a command line tool, or in the browser. Features include:

* Strictly follows the CSV definition [RFC 4180](https://www.loc.gov/preservation/digital/formats/fdd/fdd000323.shtml)
* Works with millions of lines of CSV data
* Provides comprehensive parsing parameters
* Provides an out-of-the-box CSV parsing tool for the command line
* Blazing fast -- [focused on performance](https://github.com/Keyang/csvbench)
* Gives developers flexibility with pre-defined helpers
* Allows async / streaming parsing
* Provides a CSV parser for both Node.js and browsers
* Easy-to-use API

# csvtojson online

[Here](http://keyangxiang.com/csvtojson/) is a free online CSV to JSON conversion service that uses the latest `csvtojson` module.

# Upgrade to V2

`csvtojson` has released version `2.0.0`.
* To upgrade to v2, please follow the [upgrading guide](https://github.com/Keyang/node-csvtojson/blob/master/docs/csvtojson-v2.md)
* If you are looking for documentation for `v1`, see [this page](https://github.com/Keyang/node-csvtojson/blob/master/docs/readme.v1.md)

The v1 API can still be used with `csvtojson@2.0.0`:

```js
// v1
const csvtojsonV1=require("csvtojson/v1");
// v2
const csvtojsonV2=require("csvtojson");
// "csvtojson/v2" is equivalent to "csvtojson"
const csvtojsonV2Explicit=require("csvtojson/v2");
```

# Menu

* [Quick Start](#quick-start)
* [API](#api)
* [Browser Usage](#browser-usage)
* [Contribution](#contribution)

# Quick Start

* [As Library](#library)
* [As Command Line Tool](#command-line-usage)

## Library

### Installation

```
npm i --save csvtojson
```

### From CSV File to JSON Array

```js
/** csv file
a,b,c
1,2,3
4,5,6
*/
const csvFilePath='<path to csv file>'
const csv=require('csvtojson')
csv()
.fromFile(csvFilePath)
.then((jsonObj)=>{
 console.log(jsonObj);
 /**
 * [
 *  {a:"1", b:"2", c:"3"},
 *  {a:"4", b:"5", c:"6"}
 * ]
 */
})

// Async / await usage (inside an async function)
const jsonArray=await csv().fromFile(csvFilePath);
```

### From CSV String to CSV Row

```js
/**
csvStr:
1,2,3
4,5,6
7,8,9
*/
const csv=require('csvtojson')
csv({
 noheader:true,
 output: "csv"
})
.fromString(csvStr)
.then((csvRow)=>{
 console.log(csvRow) // => [["1","2","3"], ["4","5","6"], ["7","8","9"]]
})
```

### Asynchronously Process Each Line from a CSV URL

```js
const request=require('request')
const csv=require('csvtojson')

csv()
.fromStream(request.get('http://mywebsite.com/mycsvfile.csv'))
.subscribe((json)=>{
 return new Promise((resolve,reject)=>{
 // long operation for each json, e.g. transform / write into database.
 })
},onError,onComplete);
```

### Convert to CSV lines

```js
/**
csvStr:
a,b,c
1,2,3
4,5,6
*/

const csv=require('csvtojson')
csv({output:"line"})
.fromString(csvStr)
.subscribe((csvLine)=>{
 // csvLine => "1,2,3" and "4,5,6"
})
```

### Use Stream

```js
const csv=require('csvtojson');
const request=require('request');

const readStream=require('fs').createReadStream(csvFilePath);

const writeStream=request.put('http://mysite.com/obj.json');

readStream.pipe(csv()).pipe(writeStream);
```

For more detailed usage, please see the [API](#api) section.

## Command Line Usage

### Installation

```
$ npm i -g csvtojson
```

### Usage

```
$ csvtojson [options] <csv file path>
```

### Example

Convert a csv file and save the result to a json file:

```
$ csvtojson source.csv > converted.json
```

Pipe in csv data:

```
$ cat ./source.csv | csvtojson > converted.json
```

Print help:

```
$ csvtojson
```

# API

* [Parameters](#parameters)
* [Asynchronous Result Process](#asynchronous-result-process)
* [Events](#events)
* [Hook / Transform](#hook--transform)
* [Nested JSON Structure](#nested-json-structure)
* [Header Row](#header-row)
* [Column Parser](#column-parser)

## Parameters

`require('csvtojson')` returns a constructor function which takes 2 arguments:

1. Parser parameters
2. Stream options

```js
const csv=require('csvtojson')
const converter=csv(parserParameters, streamOptions)
```

Both arguments are optional.

For `Stream Options` please read [Stream Option](https://nodejs.org/api/stream.html#stream_new_stream_transform_options) from Node.JS

`parserParameters` is a JSON object like:

```js
const converter=csv({
 noheader:true,
 trim:true,
})
```
The following parameters are supported:

* **output**: The format to be converted to. "json" (default) -- convert csv to json. "csv" -- convert csv to csv row array. "line" -- convert csv to csv line string
* **delimiter**: delimiter used for separating columns. Use "auto" if the delimiter is unknown in advance; in this case the delimiter will be auto-detected (by best attempt). Use an array to give a list of potential delimiters e.g. [",","|","$"]. default: ","
* **quote**: If a column contains the delimiter, a quote character can be used to surround the column content. e.g. "hello, world" won't be split into two columns while parsing. Set to "off" to ignore all quotes. default: " (double quote)
* **trim**: Indicates if the parser trims spaces surrounding column content. e.g. " content " will be trimmed to "content". Default: true
* **checkType**: Turns field type checking on or off. Default is false. (The default is `true` if version < 1.1.4)
* **ignoreEmpty**: Ignore empty values in CSV columns. If a column value is not given, set this to true to skip it. Default: false
* **fork (experimental)**: Fork another process to parse the CSV stream. It is effective when there are many concurrent parsing sessions for large csv files. Default: false
* **noheader**: Indicates that the csv data has no header row and the first row is a data row. Default is false. See [header row](#header-row)
* **headers**: An array to specify the headers of CSV data. If `noheader` is false, this value will override the CSV header row. Default: null. Example: ["my field","name"]. See [header row](#header-row)
* **flatKeys**: Don't interpret dots (.) and square brackets in header fields as nested object or array identifiers at all (treat them like regular characters for JSON field identifiers). Default: false
* **maxRowLength**: the maximum number of characters a csv row can have. 0 means infinite. If the maximum is exceeded, the parser will emit an "error" of "row_exceed". If possibly corrupted csv data is provided, give it a number like 65535 so the parser won't consume too much memory. default: 0
* **checkColumn**: whether to check that the column count of a row matches the number of headers. If the column count mismatches the header count, an error of "mismatched_column" will be emitted. default: false
* **eol**: End of line character. If omitted, the parser will attempt to detect it from the first chunks of CSV data.
* **escape**: escape character used in quoted columns. Default is double quote (") according to RFC 4180. Change to backslash (\\) or other characters for your own case.
* **includeColumns**: This parameter instructs the parser to include only those columns matched by the regular expression. Example: /(name|age)/ will parse and include columns whose header contains "name" or "age"
* **ignoreColumns**: This parameter instructs the parser to ignore columns matched by the regular expression. Example: /(name|age)/ will ignore columns whose header contains "name" or "age"
* **colParser**: Allows overriding the parsing logic for a specific column. It accepts a JSON object with fields like: `headName: <String | Function | ColParser>`. e.g. {field1:'number'} will use the built-in number parser to convert values of the `field1` column to numbers. For more information see [details below](#column-parser)
* **alwaysSplitAtEOL**: Always interpret each line (as defined by `eol`, e.g. `\n`) as a row. This will prevent `eol` characters from being used within a row (even inside a quoted field). Default is false. Change to true if you are confident there are no inline line breaks (such as a line break in a cell containing multi-line text).
* **nullObject**: How to parse a csv cell containing "null". The default, false, keeps "null" as a string. Change to true if a null object is needed.
* **downstreamFormat**: Option to set which JSON array format is needed by downstream. "line" is also called the ndjson format. This format writes lines of JSON (without square brackets and commas) to downstream. "array" writes a complete JSON array string to downstream (suitable for a file writable stream etc.). Default: "line"
* **needEmitAll**: The parser will build the full JSON result if `.then` is called (or await is used). If this is not desired, set this to false. Default is true.

All parameters can be used in the Command Line tool.

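The best-attempt detection behind `delimiter: "auto"` can be pictured with a small standalone sketch (an illustration of the general idea only, not the library's actual implementation): count each candidate delimiter in the first line and pick the most frequent one.

```javascript
// Simplified illustration of best-attempt delimiter detection.
// This is NOT csvtojson's internal code -- just the general idea.
function detectDelimiter(firstLine, candidates = [",", ";", "|", "\t"]) {
  let best = ",";
  let bestCount = 0;
  for (const d of candidates) {
    // Number of occurrences of this candidate in the first line.
    const count = firstLine.split(d).length - 1;
    if (count > bestCount) {
      bestCount = count;
      best = d;
    }
  }
  return best;
}

console.log(detectDelimiter("a;b;c")); // ";"
console.log(detectDelimiter("a,b,c")); // ","
```

A real detector would also weigh consistency across several rows, which is why the docs call it a "best attempt".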
## Asynchronous Result Process

Since `v2.0.0`, asynchronous processing has been fully supported.

e.g. Process each JSON result asynchronously:

```js
csv().fromFile(csvFile)
.subscribe((json)=>{
 return new Promise((resolve,reject)=>{
 // Async operation on the json
 // don't forget to call resolve and reject
 })
})
```

For more details please read:

* [Add Promise and Async / Await support](https://github.com/Keyang/node-csvtojson/blob/master/docs/csvtojson-v2.md#add-promise-and-async--await-support)
* [Add asynchronous line by line processing support](https://github.com/Keyang/node-csvtojson/blob/master/docs/csvtojson-v2.md#add-asynchronous-line-by-line-processing-support)
* [Async Hooks Support](https://github.com/Keyang/node-csvtojson/blob/master/docs/csvtojson-v2.md#async-hooks-support)

## Events

The `Converter` class defines a series of events.

### header

The `header` event is emitted once for each CSV file. It passes an array containing the names from the header row.

```js
const csv=require('csvtojson')
csv()
.on('header',(header)=>{
 //header=> [header1, header2, header3]
})
```

`header` is always an array of strings, without type conversion.

### data

The `data` event is emitted for each parsed CSV line. It passes a buffer of stringified JSON in [ndjson format](http://ndjson.org/) unless `objectMode` is set to true in the stream options.

```js
const csv=require('csvtojson')
csv()
.on('data',(data)=>{
 //data is a buffer object
 const jsonStr= data.toString('utf8')
})
```

### error

The `error` event is emitted if any errors happen during parsing.

```js
const csv=require('csvtojson')
csv()
.on('error',(err)=>{
 console.log(err)
})
```

Note that if `error` is emitted, the process will stop, as node.js automatically `unpipe()`s the upper-stream and chained down-stream<sup>1</sup>. This causes the `end` event never to be emitted, because `end` is only emitted when all data has been consumed<sup>2</sup>. If you need to know when parsing has finished, use the `done` event instead of `end`.

1. [Node.JS Readable Stream](https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L572-L583)
2. [Writable end Event](https://nodejs.org/api/stream.html#stream_event_end)

### done

The `done` event is emitted either after parsing has finished successfully or when any error happens. This indicates the processor has stopped.

```js
const csv=require('csvtojson')
csv()
.on('done',(error)=>{
 //do some stuff
})
```

If any error occurred during parsing, it will be passed to the callback.

## Hook & Transform

### Raw CSV Data Hook

The `preRawData` hook is called with the csv string before it is passed to the parser.

```js
const csv=require('csvtojson')
// synchronous
csv()
.preRawData((csvRawData)=>{
 var newData=csvRawData.replace('some value','another value');
 return newData;
})

// asynchronous
csv()
.preRawData((csvRawData)=>{
 return new Promise((resolve,reject)=>{
 var newData=csvRawData.replace('some value','another value');
 resolve(newData);
 })
})
```

### CSV File Line Hook

The hook is called each time a file line is parsed in the csv stream. `lineIdx` is the line number in the file, starting with 0.

```js
const csv=require('csvtojson')
// synchronous
csv()
.preFileLine((fileLineString, lineIdx)=>{
 if (lineIdx === 2){
 return fileLineString.replace('some value','another value')
 }
 return fileLineString
})

// asynchronous
csv()
.preFileLine((fileLineString, lineIdx)=>{
 return new Promise((resolve,reject)=>{
 // async function processing the data.
 // resolve with the (possibly modified) line string when done.
 })
})
```

### Result transform

To transform the result that is sent downstream, use the `.subscribe` method, which is called for each populated json object.

```js
const csv=require('csvtojson')
csv()
.subscribe((jsonObj,index)=>{
 jsonObj.myNewKey='some value'
 // OR asynchronously
 return new Promise((resolve,reject)=>{
 jsonObj.myNewKey='some value';
 resolve();
 })
})
.on('data',(jsonObj)=>{
 console.log(jsonObj.myNewKey) // some value
});
```

## Nested JSON Structure

`csvtojson` can convert a csv line to nested JSON when the csv header row is defined accordingly. This is a default, out-of-the-box feature.

Here is an example. Original CSV:

```csv
fieldA.title, fieldA.children.0.name, fieldA.children.0.id,fieldA.children.1.name, fieldA.children.1.employee.0.name,fieldA.children.1.employee.1.name, fieldA.address.0,fieldA.address.1, description
Food Factory, Oscar, 0023, Tikka, Tim, Joe, 3 Lame Road, Grantstown, A fresh new food factory
Kindom Garden, Ceil, 54, Pillow, Amst, Tom, 24 Shaker Street, HelloTown, Awesome castle
```

The data above contains nested JSON, including a nested array of JSON objects and plain text.

Using csvtojson to convert it, the result would be:

```json
[{
  "fieldA": {
    "title": "Food Factory",
    "children": [{
      "name": "Oscar",
      "id": "0023"
    }, {
      "name": "Tikka",
      "employee": [{
        "name": "Tim"
      }, {
        "name": "Joe"
      }]
    }],
    "address": ["3 Lame Road", "Grantstown"]
  },
  "description": "A fresh new food factory"
}, {
  "fieldA": {
    "title": "Kindom Garden",
    "children": [{
      "name": "Ceil",
      "id": "54"
    }, {
      "name": "Pillow",
      "employee": [{
        "name": "Amst"
      }, {
        "name": "Tom"
      }]
    }],
    "address": ["24 Shaker Street", "HelloTown"]
  },
  "description": "Awesome castle"
}]
```

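How dotted headers become nested structures can be sketched in plain JavaScript (a simplified illustration, not the library's actual code): split each header on `.`, treat numeric segments as array indices, and assign the cell value at that path.

```javascript
// Simplified sketch of how a dotted header maps to a nested structure.
// Not csvtojson's actual implementation -- for illustration only.
function setByPath(target, headerPath, value) {
  const keys = headerPath.split(".");
  let node = target;
  for (let i = 0; i < keys.length - 1; i++) {
    const key = keys[i];
    if (node[key] === undefined) {
      // A numeric next segment means the child should be an array.
      node[key] = /^\d+$/.test(keys[i + 1]) ? [] : {};
    }
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return target;
}

const row = {};
setByPath(row, "fieldA.title", "Food Factory");
setByPath(row, "fieldA.address.0", "3 Lame Road");
setByPath(row, "fieldA.address.1", "Grantstown");
console.log(JSON.stringify(row));
// {"fieldA":{"title":"Food Factory","address":["3 Lame Road","Grantstown"]}}
```

Each CSV cell in a row is assigned this way, which is how the header row alone determines the output shape.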
### Flat Keys

To avoid producing nested JSON, simply set `flatKeys:true` in the parameters.

```js
/**
csvStr:
a.b,a.c
1,2
*/
csv({flatKeys:true})
.fromString(csvStr)
.subscribe((jsonObj)=>{
 //{"a.b":1,"a.c":2} rather than {"a":{"b":1,"c":2}}
});
```

## Header Row

`csvtojson` uses the csv header row to generate the JSON keys. However, it does not require the csv source to contain a header row. There are 4 ways to define header rows:

1. First row of the csv source. Use the first row of the csv source as the header row. This is the default.
2. If the first row of the csv source is a header row but it is incorrect and needs to be replaced, use the `headers:[]` and `noheader:false` parameters.
3. If the original csv source has no header row but a header definition can be supplied, use the `headers:[]` and `noheader:true` parameters.
4. If the original csv source has no header row and the header definition is unknown, use `noheader:true`. This will automatically add `fieldN` headers to the csv cells.

### Example

```js
// replace header row (first row) from original source with 'header1, header2'
csv({
 noheader: false,
 headers: ['header1','header2']
})

// original source has no header row. add 'field1' 'field2' ... 'fieldN' as csv header
csv({
 noheader: true
})

// original source has no header row. use 'header1' 'header2' as its header row
csv({
 noheader: true,
 headers: ['header1','header2']
})
```

## Column Parser

`Column Parser` allows writing a custom parser for a column in CSV data.

**What is a Column Parser**

When `csvtojson` walks through csv data, it converts the value in each cell to something else. For example, if `checkType` is `true`, `csvtojson` will attempt to find a proper type parser according to the cell value. That is, if a cell value is "5", a `numberParser` will be used, and all values under that column will use the `numberParser` to transform the data.

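The kind of inference described above can be sketched as follows (a hypothetical simplification, not the library's actual parser-selection logic): inspect the cell value and choose a number, boolean, or string conversion.

```javascript
// Hypothetical sketch of checkType-style value inference.
// Not csvtojson's actual code -- for illustration only.
function inferValue(cell) {
  const trimmed = cell.trim();
  // Numeric-looking cells become numbers (guard against "" -> 0).
  if (trimmed !== "" && !isNaN(Number(trimmed))) {
    return Number(trimmed);
  }
  // Boolean-looking cells become booleans.
  if (trimmed === "true" || trimmed === "false") {
    return trimmed === "true";
  }
  // Everything else stays a string.
  return cell;
}

console.log(inferValue("5"));     // 5
console.log(inferValue("hello")); // "hello"
```

A `colParser` entry (below) short-circuits this inference for its column.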
### Built-in parsers

The following built-in parsers are currently available:

* string: Convert the value to a string
* number: Convert the value to a number
* omit: omit the whole column

These will override types inferred from the `checkType:true` parameter. More built-in parsers will be added as requested on the [issues page](https://github.com/Keyang/node-csvtojson/issues).

Example:

```js
/*csv string
column1,column2
hello,1234
*/
csv({
 colParser:{
 "column1":"omit",
 "column2":"string",
 },
 checkType:true
})
.fromString(csvString)
.subscribe((jsonObj)=>{
 //jsonObj: {column2:"1234"}
})
```

### Custom parser function

Sometimes developers want to define a custom parser. A function can be passed for a specific column in `colParser`.

Example:

```js
/*csv data
name, birthday
Joe, 1970-01-01
*/
csv({
 colParser:{
 "birthday":function(item, head, resultRow, row, colIdx){
 /*
 item - "1970-01-01"
 head - "birthday"
 resultRow - {name:"Joe"}
 row - ["Joe","1970-01-01"]
 colIdx - 1
 */
 return new Date(item);
 }
 }
})
```

The above example will convert the `birthday` column into a js `Date` object.

The returned value will be used in the result JSON object. Returning `undefined` will leave the result JSON object unchanged.

### Flat key column

It is also possible to mark a column as `flat`:

```js
/*csv string
person.comment,person.number
hello,1234
*/
csv({
 colParser:{
 "person.number":{
 flat:true,
 cellParser: "number" // string or a function
 }
 }
})
.fromString(csvString)
.subscribe((jsonObj)=>{
 //jsonObj: {"person.number":1234,"person":{"comment":"hello"}}
})
```

# Contribution

Any type of donation and support is very much appreciated.

## Code

`csvtojson` follows the github convention for contributions. Here are the steps:

1. Fork the repo to your github account
2. Checkout code from your github repo to your local machine.
3. Make code changes, and don't forget to add related tests.
4. Run `npm test` locally before pushing code back.
5. Create a [Pull Request](https://help.github.com/articles/creating-a-pull-request/) on github.
6. Code review and merge
7. Changes will be published to NPM with the next version.

Thanks to all the [contributors](https://github.com/Keyang/node-csvtojson/graphs/contributors)

## Backers

Thank you to all our backers! [[Become a backer](https://opencollective.com/csvtojson#backer)]

[![OpenCollective](https://opencollective.com/csvtojson/backers.svg?width=890)](https://opencollective.com/csvtojson#backer)

## Sponsors

Thank you to all our sponsors! (Please ask your company to also support this open source project by [becoming a sponsor](https://opencollective.com/csvtojson#sponsor).)

## PayPal

[![donate](https://www.paypalobjects.com/en_US/i/btn/btn_donate_SM.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=DUBQLRPJADJFQ)

# Browser Usage

Using `csvtojson` in the browser is quite simple. There are two ways:

**1. Embed the script directly in a script tag**

There is a pre-built script located at `browser/csvtojson.min.js`. Simply include that file in a `script` tag in the `index.html` page:

```html
<script src="node_modules/csvtojson/browser/csvtojson.min.js"></script>
<!-- or use cdn -->
<script src="https://cdn.rawgit.com/Keyang/node-csvtojson/d41f44aa/browser/csvtojson.min.js"></script>
```

then use the global `csv` function:

```html
<script>
csv({
 output: "csv"
})
.fromString("a,b,c\n1,2,3")
.then(function(result){

})
</script>
```

**2. Use webpack or browserify**

If a module packager is preferred, simply `require("csvtojson")`:

```js
var csv=require("csvtojson");

// or with import
import * as csv from "csvtojson";

//then use csv as normal
```