# Binary-parser

[![build](https://github.com/keichi/binary-parser/workflows/build/badge.svg)](https://github.com/keichi/binary-parser/actions?query=workflow%3Abuild)
[![npm](https://img.shields.io/npm/v/binary-parser)](https://www.npmjs.com/package/binary-parser)
[![license](https://img.shields.io/github/license/keichi/binary-parser)](https://github.com/keichi/binary-parser/blob/master/LICENSE)

Binary-parser is a parser builder for JavaScript that enables you to write
efficient binary parsers in a simple and declarative manner.

It supports all common data types required to analyze structured binary
data. Binary-parser dynamically generates and compiles the parser code
on-the-fly, which runs as fast as a hand-written parser (which takes much more
time and effort to write). Supported data types are:

- [Integers](#uint8-16-32-64le-bename-options) (8, 16, 32 and 64 bit signed
  and unsigned integers)
- [Floating point numbers](#float-doublele-bename-options) (32 and 64 bit
  floating point values)
- [Bit fields](#bit1-32name-options) (bit fields with length from 1 to 32
  bits)
- [Strings](#stringname-options) (fixed-length, variable-length and
  zero-terminated strings with various encodings)
- [Arrays](#arrayname-options) (fixed-length and variable-length arrays of
  built-in or user-defined element types)
- [Choices](#choicename-options) (supports integer keys)
- [Pointers](#pointername-options)
- User-defined types (arbitrary combination of built-in types)

Binary-parser was inspired by [BinData](https://github.com/dmendel/bindata)
and [binary](https://github.com/substack/node-binary).

## Quick Start

1. Create an empty `Parser` object with `new Parser()` or `Parser.start()`.
2. Chain methods to build your desired parser. (See [API](#api) for detailed
   documentation of each method.)
3. Call `Parser.prototype.parse` with a `Buffer`/`Uint8Array` object passed as
   its only argument.
4. The parsed result will be returned as an object.
   - If parsing failed, an exception will be thrown.

```javascript
// Module import
const Parser = require("binary-parser").Parser;

// Alternative way to import the module
// import { Parser } from "binary-parser";

// Build an IP packet header Parser
const ipHeader = new Parser()
  .endianness("big")
  .bit4("version")
  .bit4("headerLength")
  .uint8("tos")
  .uint16("packetLength")
  .uint16("id")
  .bit3("offset")
  .bit13("fragOffset")
  .uint8("ttl")
  .uint8("protocol")
  .uint16("checksum")
  .array("src", {
    type: "uint8",
    length: 4
  })
  .array("dst", {
    type: "uint8",
    length: 4
  });

// Prepare buffer to parse.
const buf = Buffer.from("450002c5939900002c06ef98adc24f6c850186d1", "hex");

// Parse buffer and show result
console.log(ipHeader.parse(buf));
```
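
As a cross-check on the field layout above, the same 20 bytes can be decoded by hand with plain `Buffer` reads. This is only a sketch for comparison, not the code binary-parser generates; the `expected` name is made up for illustration, and it assumes that big-endian bit fields are taken most-significant-bit first.

```javascript
// Hand-decoding the example IPv4 header with Node's Buffer API (illustration only).
const buf = Buffer.from("450002c5939900002c06ef98adc24f6c850186d1", "hex");

// Bytes 6-7 hold a 3-bit field followed by a 13-bit field.
const flagsAndFragment = buf.readUInt16BE(6);

const expected = {
  version: buf[0] >> 4, // high 4 bits of byte 0
  headerLength: buf[0] & 0x0f, // low 4 bits of byte 0
  tos: buf[1],
  packetLength: buf.readUInt16BE(2),
  id: buf.readUInt16BE(4),
  offset: flagsAndFragment >> 13, // top 3 bits
  fragOffset: flagsAndFragment & 0x1fff, // remaining 13 bits
  ttl: buf[8],
  protocol: buf[9],
  checksum: buf.readUInt16BE(10),
  src: Array.from(buf.subarray(12, 16)),
  dst: Array.from(buf.subarray(16, 20))
};

console.log(expected.version, expected.packetLength, expected.src.join("."));
// 4 709 173.194.79.108
```

Writing the offsets and masks out by hand like this is exactly the bookkeeping that the declarative chain replaces.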

## Installation

You can install `binary-parser` via npm:

```bash
npm install binary-parser
```

The npm package provides entry points for both CommonJS and ES modules.

## API

### new Parser()
Create an empty parser object that parses nothing.

### parse(buffer)
Parse a `Buffer`/`Uint8Array` object `buffer` with this parser and return the
resulting object. When `parse(buffer)` is called for the first time, the
associated parser code is compiled on-the-fly and internally cached.

### create(constructorFunction)
Set the constructor function that should be called to create the object
returned from the `parse` method.

### [u]int{8, 16, 32, 64}{le, be}(name[, options])
Parse bytes as an integer and store it in a variable named `name`. `name`
should consist only of alphanumeric characters and start with a letter.
The number of bits can be 8, 16, 32 or 64. Byte ordering can be either
`le` for little endian or `be` for big endian. With no prefix, it parses as a
signed number, with the `u` prefix as an unsigned number. The runtime type
returned by the 8, 16 and 32 bit methods is `number`, while the type
returned by the 64 bit methods is `bigint`.

**Note:** [u]int64{be,le} methods only work if your runtime is Node.js v12.0.0
or greater. Lower versions will throw a runtime error.

```javascript
const parser = new Parser()
  // Signed 32-bit integer (little endian)
  .int32le("a")
  // Unsigned 8-bit integer
  .uint8("b")
  // Signed 16-bit integer (big endian)
  .int16be("c")
  // Signed 64-bit integer (big endian)
  .int64be("d");
```
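
To illustrate the `number`/`bigint` split described above, here is a small sketch that reads the same four field types with Node's built-in `Buffer` methods. It does not use binary-parser; the byte values and variable names are chosen only for this example.

```javascript
// Reading the four field types by hand with Node's Buffer API (illustration only).
const bytes = Buffer.from([
  0xfe, 0xff, 0xff, 0xff, // int32le
  0xff, // uint8
  0xff, 0xfe, // int16be
  0x00, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01 // int64be
]);

const a = bytes.readInt32LE(0); // -2
const b = bytes.readUInt8(4); // 255
const c = bytes.readInt16BE(5); // -2
const d = bytes.readBigInt64BE(7); // 9007199254740993n, i.e. 2^53 + 1

console.log(typeof a, typeof d); // number bigint
```

The 64-bit value here is 2^53 + 1, which a `number` cannot represent exactly; that is why the 64-bit readers return a `bigint`.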

### bit\[1-32\](name[, options])
Parse bytes as a bit field and store it in a variable named `name`. There are
32 methods from `bit1` to `bit32`, each corresponding to a bit field of 1 to
32 bits in length.

### {float, double}{le, be}(name[, options])
Parse bytes as a floating-point value and store it in a variable named
`name`.

```javascript
const parser = new Parser()
  // 32-bit floating value (big endian)
  .floatbe("a")
  // 64-bit floating value (little endian)
  .doublele("b");
```

### string(name[, options])
Parse bytes as a string. `name` should consist only of alphanumeric
characters and start with a letter. `options` is an object which can have
the following keys:

- `encoding` - (Optional, defaults to `utf8`) Specify which encoding to use.
  Supported encodings include `"hex"` and all encodings supported by
  [`TextDecoder`](https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/encoding).
- `length` - (Optional) Length of the string. Can be a number, string or a
  function. Use a number for statically sized strings, a string to reference
  another variable and a function to do some calculation.
- `zeroTerminated` - (Optional, defaults to `false`) If true, then this parser
  reads until it reaches zero.
- `greedy` - (Optional, defaults to `false`) If true, then this parser reads
  until it reaches the end of the buffer. Will consume zero-bytes.
- `stripNull` - (Optional, must be used with `length`) If true, then strip
  null characters from the end of the string.
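
The differences between these options are easiest to see on a concrete buffer. The sketch below mimics each mode by hand with `TextDecoder`; it is an illustration of the behaviour described in the list, not binary-parser's implementation, and all variable names are made up.

```javascript
// Hand-written imitation of the string options (illustration only).
const decoder = new TextDecoder("utf-8");
const data = Buffer.from([0x68, 0x69, 0x00, 0x00, 0x41, 0x42]); // "hi\0\0AB"

// zeroTerminated: read up to the first zero byte.
const zeroTerminated = decoder.decode(data.subarray(0, data.indexOf(0)));

// length: 4 reads exactly four bytes, trailing NUL characters included.
const fixed = decoder.decode(data.subarray(0, 4));

// stripNull then drops the NUL characters at the end of the fixed-length read.
const stripped = fixed.replace(/\0+$/, "");

// greedy: read to the end of the buffer, zero bytes included.
const greedy = decoder.decode(data);

console.log(zeroTerminated, fixed.length, stripped, greedy.length); // hi 4 hi 6
```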

### buffer(name[, options])
Parse bytes as a buffer. Its type will be the same as the input to
`parse(buffer)`. `name` should consist only of alphanumeric characters and
start with a letter. `options` is an object which can have the following
keys:

- `clone` - (Optional, defaults to `false`) By default,
  `buffer(name [,options])` returns a new buffer which references the same
  memory as the parser input, but offset and cropped by a certain range. If
  this option is true, the input buffer will be cloned and a new buffer
  referencing a new memory region is returned.
- `length` - (either `length` or `readUntil` is required) Length of the
  buffer. Can be a number, string or a function. Use a number for statically
  sized buffers, a string to reference another variable and a function to do
  some calculation.
- `readUntil` - (either `length` or `readUntil` is required) If `"eof"`, then
  this parser will read until it reaches the end of the `Buffer`/`Uint8Array`
  object. If it is a function, this parser will read the buffer until the
  function returns true.
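
The practical consequence of `clone` is whether later writes to the input show up in the parsed result. The sketch below reproduces that distinction with plain `Buffer` calls, assuming the default behaves like a `subarray` view as the description suggests; it does not call binary-parser itself.

```javascript
// View versus copy, using only Node's Buffer API (illustration only).
const input = Buffer.from([1, 2, 3, 4]);

const view = input.subarray(1, 3); // shares memory with `input` (like clone: false)
const copy = Buffer.from(view); // independent memory (like clone: true)

input[1] = 99; // mutate the original input afterwards

console.log(view[0]); // 99, the view sees the change
console.log(copy[0]); // 2, the copy does not
```

A view avoids a copy, so it is the cheaper choice when the input buffer is not reused or modified after parsing.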

### array(name, options)
Parse bytes as an array. `options` is an object which can have the following
keys:

- `type` - (Required) Type of the array element. Can be a string or a
  user-defined `Parser` object. If it's a string, you have to choose from
  [u]int{8, 16, 32}{le, be}.
- `length` - (either `length`, `lengthInBytes`, or `readUntil` is required)
  Length of the array. Can be a number, string or a function. Use a number for
  statically sized arrays.
- `lengthInBytes` - (either `length`, `lengthInBytes`, or `readUntil` is
  required) Length of the array expressed in bytes. Can be a number, string or
  a function. Use a number for statically sized arrays.
- `readUntil` - (either `length`, `lengthInBytes`, or `readUntil` is required)
  If `"eof"`, then this parser reads until the end of the `Buffer`/`Uint8Array`
  object. If it is a function, it reads until the function returns true.

```javascript
const parser = new Parser()
  // Statically sized array
  .array("data", {
    type: "int32",
    length: 8
  })

  // Dynamically sized array (references another variable)
  .uint8("dataLength")
  .array("data2", {
    type: "int32",
    length: "dataLength"
  })

  // Dynamically sized array (with some calculation)
  .array("data3", {
    type: "int32",
    length: function() {
      return this.dataLength - 1;
    } // other fields are available through `this`
  })

  // Statically sized array (size given in bytes)
  .array("data4", {
    type: "int32",
    lengthInBytes: 16
  })

  // Dynamically sized array (references another variable)
  .uint8("dataLengthInBytes")
  .array("data5", {
    type: "int32",
    lengthInBytes: "dataLengthInBytes"
  })

  // Dynamically sized array (with some calculation)
  .array("data6", {
    type: "int32",
    lengthInBytes: function() {
      return this.dataLengthInBytes - 4;
    } // other fields are available through `this`
  })

  // Dynamically sized array (with stop-check on parsed item)
  .array("data7", {
    type: "int32",
    readUntil: function(item, buffer) {
      return item === 42;
    } // stop when specific item is parsed. buffer can be used to perform a read-ahead.
  })

  // Use user defined parser object
  .array("data8", {
    type: userDefinedParser,
    length: "dataLength"
  });
```

### choice([name,] options)
Choose one parser from multiple parsers according to a field value and store
its parsed result to key `name`. If `name` is null or omitted, the result of
the chosen parser is directly embedded into the current object. `options` is
an object which can have the following keys:

- `tag` - (Required) The value used to determine which parser to use from the
  `choices`. Can be a string pointing to another field or a function.
- `choices` - (Required) An object whose keys are integers and whose values
  are the parsers executed when `tag` equals the corresponding key.
- `defaultChoice` - (Optional) If the tag value doesn't match any of the
  `choices`, this parser is used.

```javascript
const parser1 = ...;
const parser2 = ...;
const parser3 = ...;

const parser = new Parser().uint8("tagValue").choice("data", {
  tag: "tagValue",
  choices: {
    1: parser1, // if tagValue == 1, execute parser1
    4: parser2, // if tagValue == 4, execute parser2
    5: parser3 // if tagValue == 5, execute parser3
  }
});
```

Combining `choice` with `array` is an idiom to parse
[TLV](http://en.wikipedia.org/wiki/Type-length-value)-based binary formats.
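
For readers unfamiliar with TLV, the sketch below decodes a tiny type-length-value stream by hand. It is not written with binary-parser; the record layout (1-byte type, 1-byte length) and the meaning of types 1 and 2 are assumptions made up for this example. The loop plays the role of `array` with `readUntil: "eof"`, and the `switch` plays the role of `choice` with a `defaultChoice`.

```javascript
// Hand-written TLV decoder (illustration only; record layout is hypothetical).
function parseTlv(buf) {
  const records = [];
  let offset = 0;
  while (offset < buf.length) {
    const type = buf.readUInt8(offset);
    const length = buf.readUInt8(offset + 1);
    const raw = buf.subarray(offset + 2, offset + 2 + length);

    let value;
    switch (type) {
      case 1: // hypothetical type 1: big-endian uint16
        value = raw.readUInt16BE(0);
        break;
      case 2: // hypothetical type 2: UTF-8 text
        value = raw.toString("utf8");
        break;
      default: // unknown types keep their raw bytes
        value = raw;
    }

    records.push({ type, length, value });
    offset += 2 + length;
  }
  return records;
}

const records = parseTlv(Buffer.from([1, 2, 0x01, 0x00, 2, 2, 0x6f, 0x6b]));
console.log(records.map((r) => r.value)); // [ 256, 'ok' ]
```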

### nest([name,] options)
Execute an inner parser and store its result to key `name`. If `name` is null
or omitted, the result of the inner parser is directly embedded into the
current object. `options` is an object which can have the following keys:

- `type` - (Required) A `Parser` object.

### pointer(name [,options])
Jump to `offset`, execute the parser for `type` and rewind to the previous
offset. Useful for parsing binary formats such as ELF where the offset of a
field is pointed to by another field.

- `type` - (Required) Can be a string `[u]int{8, 16, 32, 64}{le, be}`
  or a user-defined `Parser` object.
- `offset` - (Required) Indicates the absolute offset from the beginning of
  the input buffer. Can be a number, string or a function.
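
The jump-and-rewind behaviour can be pictured with a hand-written equivalent. The sketch below uses only `Buffer` reads and an explicit `offset` variable; it illustrates the idea rather than binary-parser's generated code, and the byte layout is invented for the example.

```javascript
// Hand-written illustration of following a pointer field (not library code).
const file = Buffer.from([0x00, 0x06, 0xaa, 0xbb, 0xcc, 0xdd, 0x12, 0x34]);

let offset = 0;

// A header field holds the absolute offset of the real data.
const dataOffset = file.readUInt16BE(offset);
offset += 2;

// "Jump" to that absolute offset and read there without touching `offset`...
const pointed = file.readUInt16BE(dataOffset);

// ...so sequential parsing resumes right after the header field.
const nextByte = file.readUInt8(offset);

console.log(dataOffset, pointed.toString(16), nextByte.toString(16)); // 6 1234 aa
```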

### saveOffset(name [,options])
Save the current buffer offset as key `name`. This function is only useful
when called after another function which would advance the internal buffer
offset.

```javascript
const parser = new Parser()
  // this call advances the buffer offset by
  // a variable (i.e. unknown to us) number of bytes
  .string("name", {
    zeroTerminated: true
  })
  // this variable points to an absolute position
  // in the buffer
  .uint32("seekOffset")
  // now, save the "current" offset in the stream
  // as the variable "currentOffset"
  .saveOffset("currentOffset")
  // finally, use the saved offset to figure out
  // how many bytes we need to skip
  .seek(function() {
    return this.seekOffset - this.currentOffset;
  })
  ... // the parser would continue here
```

### seek(relOffset)
Move the buffer offset by `relOffset` bytes from the current position. Use a
negative `relOffset` value to rewind the offset. This method was previously
named `skip(length)`.
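
The arithmetic that turns an absolute position into a relative seek is worth spelling out once. The sketch below walks through it by hand with an explicit `offset` variable; it uses only Node's `Buffer` API, and the sample bytes are invented for illustration.

```javascript
// Hand-written walk-through of saving an offset and seeking (illustration only).
const data = Buffer.from([
  0x61, 0x62, 0x00, // "ab" plus zero terminator
  0x00, 0x00, 0x00, 0x0a, // seekOffset = 10 (big-endian uint32)
  0xee, 0xee, 0xee, // bytes to skip
  0x2a // payload at absolute offset 10
]);

let offset = 0;

// Zero-terminated string: its length is only known after scanning.
const terminator = data.indexOf(0, offset);
const name = data.toString("utf8", offset, terminator);
offset = terminator + 1;

// Absolute position of the payload.
const seekOffset = data.readUInt32BE(offset);
offset += 4;

// Save the current offset, then convert absolute to relative and skip.
const currentOffset = offset;
offset += seekOffset - currentOffset;

const payload = data.readUInt8(offset);
console.log(name, currentOffset, payload); // ab 7 42
```

A negative difference would move `offset` backwards in the same way, which corresponds to rewinding with a negative `relOffset`.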

### endianness(endianness)
Define what endianness to use in this parser. `endianness` can be either
`"little"` or `"big"`. The default endianness of `Parser` is big-endian.

```javascript
const parser = new Parser()
  .endianness("little")
  // You can specify endianness explicitly
  .uint16be("a")
  .uint32le("b")
  // Or you can omit endianness (in this case, little-endian is used)
  .uint16("c")
  .int32("d");
```
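
As a reminder of what the setting changes, the sketch below reads the same two bytes in both byte orders with Node's `Buffer` API. It does not involve binary-parser; the variable names are illustrative.

```javascript
// The same two bytes interpreted in both byte orders (illustration only).
const twoBytes = Buffer.from([0x12, 0x34]);

const asBig = twoBytes.readUInt16BE(0); // most significant byte first
const asLittle = twoBytes.readUInt16LE(0); // least significant byte first

console.log(asBig.toString(16), asLittle.toString(16)); // 1234 3412
```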

### namely(alias)
Set an alias to this parser, so that it can be referred to by name in methods
like `.array`, `.nest` and `.choice`, without the requirement to have an
instance of this parser.

In particular, the parser may reference itself:

```javascript
const stop = new Parser();

const parser = new Parser()
  .namely("self") // use 'self' to refer to the parser itself
  .uint8("type")
  .choice("data", {
    tag: "type",
    choices: {
      0: stop,
      1: "self",
      2: Parser.start()
        .nest("left", { type: "self" })
        .nest("right", { type: "self" }),
      3: Parser.start()
        .nest("one", { type: "self" })
        .nest("two", { type: "self" })
        .nest("three", { type: "self" })
    }
  });

//        2
//       / \
//      3   1
//    / | \  \
//   1  0  2  0
//  /     / \
// 0     1   0
//      /
//     0

const buffer = Buffer.from([
  2,
  /* left -> */ 3,
    /* one   -> */ 1, /* -> */ 0,
    /* two   -> */ 0,
    /* three -> */ 2,
      /* left  -> */ 1, /* -> */ 0,
      /* right -> */ 0,
  /* right -> */ 1, /* -> */ 0
]);

parser.parse(buffer);
```
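
To see how the bytes above map onto the tree in the comment, it can help to trace the recursion by hand. The function below is a hand-written recursive-descent reader for the same byte layout; it is not what binary-parser generates, and `parseNode` and `state` are names invented for this sketch.

```javascript
// Hand-written recursive reader for the tree encoding (illustration only).
function parseNode(buf, state) {
  const type = buf.readUInt8(state.offset++);
  let data;
  switch (type) {
    case 0: // leaf: nothing more to read
      data = {};
      break;
    case 1: // one child of the same node type
      data = parseNode(buf, state);
      break;
    case 2: {
      const left = parseNode(buf, state);
      const right = parseNode(buf, state);
      data = { left, right };
      break;
    }
    case 3: {
      const one = parseNode(buf, state);
      const two = parseNode(buf, state);
      const three = parseNode(buf, state);
      data = { one, two, three };
      break;
    }
    default:
      throw new Error("Unknown node type: " + type);
  }
  return { type, data };
}

const state = { offset: 0 };
const tree = parseNode(Buffer.from([2, 3, 1, 0, 0, 2, 1, 0, 0, 1, 0]), state);

console.log(tree.data.left.type, tree.data.right.type, state.offset); // 3 1 11
```

Every branch ends in a type-0 leaf, which is what keeps the recursion finite for this input.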

In most cases there is little difference from referencing a parser by
instance, but aliases make it possible to parse recursive trees, where
each node can reference a node of the same type from the inside.

Also, when you reference a parser instance twice, the generated code contains
two near-identical copies of that parser's code, whereas with the named
approach it contains a single named function that is called at each point of
use.

**Note**: This style can lead to circular references and infinite recursion.
To avoid this, ensure that every possible path terminates. Also, this
recursion is not tail-optimized, so it can exhaust the call stack when it goes
too deep.

An example of referencing other parsers:

```javascript
// the line below registers the name "self", so we will be able to use it in
// `twoCells` as a reference
const parser = Parser.start().namely("self");

const stop = Parser.start().namely("stop");

const twoCells = Parser.start()
  .namely("twoCells")
  .nest("left", { type: "self" })
  .nest("right", { type: "stop" });

parser.uint8("type").choice("data", {
  tag: "type",
  choices: {
    0: "stop",
    1: "self",
    2: "twoCells"
  }
});

const buffer = Buffer.from([2, /* left */ 1, 1, 0, /* right */ 0]);

parser.parse(buffer);
```

### wrapped([name,] options)
Read a block of data, transform it with a function, and pass the result to
another parser. It works like `buffer` in that it reads a block of data, but
instead of returning the buffer it hands the buffer to a parser for further
processing.

The result will be stored in the key `name`. If `name` is an empty string or
`null`, or if it is omitted, the parsed result is directly embedded into the
current object.

- `wrapper` - (Required) A function taking a buffer and returning a buffer
  (`(x: Buffer | Uint8Array ) => Buffer | Uint8Array`) transforming the buffer
  into a buffer expected by `type`.
- `type` - (Required) A `Parser` object to parse the buffer returned by `wrapper`.
- `length` - (either `length` or `readUntil` is required) Length of the
  buffer. Can be a number, string or a function. Use a number for statically
  sized buffers, a string to reference another variable and a function to do some
  calculation.
- `readUntil` - (either `length` or `readUntil` is required) If `"eof"`, then
  this parser will read until it reaches the end of the `Buffer`/`Uint8Array`
  object. If it is a function, this parser will read the buffer until the
  function returns `true`.

```javascript
const zlib = require("zlib");
// A parser to run on the data returned by the wrapper
const textParser = Parser.start()
  .string("text", {
    zeroTerminated: true,
  });

const mainParser = Parser.start()
  // Read length of the data to wrap
  .uint32le("length")
  // Read wrapped data
  .wrapped("wrappedData", {
    // Indicate how much data to read, like buffer()
    length: "length",
    // Define function to pre-process the data buffer
    wrapper: function (buffer) {
      // E.g. decompress data and return it for further parsing
      return zlib.inflateRawSync(buffer);
    },
    // The parser to run on the decompressed data
    type: textParser,
  });

mainParser.parse(buffer);
```
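
The example above leaves `buffer` undefined. One way to produce a matching input, sketched here with Node's built-in `zlib` and `Buffer` only, is to deflate a zero-terminated string and prefix it with its compressed length. The second half decodes the result by hand to confirm the layout; it stands in for the parser and is not binary-parser code.

```javascript
const zlib = require("zlib");

// Build an input: uint32le length, then raw-deflated "hello\0".
const payload = zlib.deflateRawSync(Buffer.from("hello\0", "utf8"));
const header = Buffer.alloc(4);
header.writeUInt32LE(payload.length, 0);
const buffer = Buffer.concat([header, payload]);

// Decode it by hand, mirroring the steps the wrapped parser describes.
const length = buffer.readUInt32LE(0);
const inflated = zlib.inflateRawSync(buffer.subarray(4, 4 + length));
const text = inflated.toString("utf8", 0, inflated.indexOf(0));

console.log(text); // hello
```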

### sizeOf()
Returns how many bytes this parser consumes. If the size of the parser cannot
be statically determined, `NaN` is returned.

### compile()
Compile this parser on-the-fly and cache its result. Usually, there is no need
to call this method directly, since it's called when `parse(buffer)` is
executed for the first time.

### getCode()
Dynamically generates the code for this parser and returns it as a string.
Useful for debugging the generated code.

### Common options
These options can be used in all parsers.

- `formatter` - Function that transforms the parsed value into a more desired
  form.
    ```javascript
    const parser = new Parser().array("ipv4", {
      type: "uint8",
      length: 4,
      formatter: function(arr) {
        return arr.join(".");
      }
    });
    ```

- `assert` - Do assertion on the parsed result (useful for checking magic
  numbers and so on). If `assert` is a `string` or `number`, the actual parsed
  result will be compared with it with `===` (strict equality check), and an
  exception is thrown if they mismatch. On the other hand, if `assert` is a
  function, that function is executed with one argument (the parsed result)
  and if it returns false, an exception is thrown.

    ```javascript
    // simple magic number validation
    const ClassFile = Parser.start()
      .endianness("big")
      .uint32("magic", { assert: 0xcafebabe });

    // Doing more complex assertion with a predicate function
    const parser = new Parser()
      .int16le("a")
      .int16le("b")
      .int16le("c", {
        assert: function(x) {
          return this.a + this.b === x;
        }
      });
    ```
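
One detail of the magic-number check is easy to miss: the strict comparison only succeeds if the field is read as unsigned. The sketch below shows this with plain `Buffer` reads and also applies the same `join(".")` formatting by hand. It is an illustration under the assumption that the comparison is a plain `===`, as described above, and does not call binary-parser.

```javascript
// Why the magic number is read as uint32, plus a hand-applied formatter.
const input = Buffer.from([0xca, 0xfe, 0xba, 0xbe, 192, 168, 0, 1]);

// Unsigned read matches the literal 0xcafebabe exactly.
const magic = input.readUInt32BE(0);
if (magic !== 0xcafebabe) {
  throw new Error("Assertion failed: unexpected magic number");
}

// A signed read of the same bytes is negative, so `===` would fail.
const signedMagic = input.readInt32BE(0);
console.log(signedMagic === 0xcafebabe); // false

// The formatter from the list above, applied by hand.
const ipv4 = Array.from(input.subarray(4, 8)).join(".");
console.log(ipv4); // 192.168.0.1
```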

### Context variables
You can use some special fields while parsing to traverse your structure.
These context variables will be removed after the parsing process.
Note that this feature is turned off by default for performance reasons, and
you need to call `.useContextVars()` at the top level `Parser` to enable it.
Otherwise, the context variables will not be present.

- `$parent` - This field references the parent structure. This variable will be
  `null` while parsing the root structure.

  ```javascript
  const parser = new Parser()
    .useContextVars()
    .nest("header", {
      type: new Parser().uint32("length"),
    })
    .array("data", {
      type: "int32",
      length: function() {
        return this.$parent.header.length;
      }
    });
  ```

- `$root` - This field references the root structure.

  ```javascript
  const parser = new Parser()
    .useContextVars()
    .nest("header", {
      type: new Parser().uint32("length"),
    })
    .nest("data", {
      type: new Parser()
        .uint32("value")
        .array("data", {
          type: "int32",
          length: function() {
            return this.$root.header.length;
          }
        }),
    });
  ```

- `$index` - This field references the current index in array parsing. This
  variable will be available only when using the `length` mode for arrays.

  ```javascript
  const parser = new Parser()
    .useContextVars()
    .nest("header", {
      type: new Parser().uint32("length"),
    })
    .nest("data", {
      type: new Parser()
        .uint32("value")
        .array("data", {
          type: new Parser().nest({
            type: new Parser().uint8("_tmp"),
            formatter: function(item) {
              return this.$index % 2 === 0 ? item._tmp : String.fromCharCode(item._tmp);
            }
          }),
          length: "$root.header.length"
        }),
    });
  ```

## Examples

See `example/` for real-world examples.

## Benchmarks

A benchmark script to compare the parsing performance with binparse, structron
and destruct.js is available under `benchmark/`.

## Contributing

Please report issues to the
[issue tracker](https://github.com/keichi/binary-parser/issues) if you have
any difficulties using this module, find a bug, or would like to request a
new feature. Pull requests are welcome.

To contribute code, first clone this repo, then install the dependencies:

```bash
git clone https://github.com/keichi/binary-parser.git
cd binary-parser
npm install
```

If you added a feature or fixed a bug, update the test suite under `test/` and
then run it like this:

```bash
npm run test
```

Make sure all the tests pass before submitting a pull request.