# Acorn

[![Build Status](https://travis-ci.org/marijnh/acorn.svg?branch=master)](https://travis-ci.org/marijnh/acorn)
[![NPM version](https://img.shields.io/npm/v/acorn.svg)](https://www.npmjs.org/package/acorn)
[Author funding status: ![maintainer happiness](https://marijnhaverbeke.nl/fund/status_s.png?force)](https://marijnhaverbeke.nl/fund/)

A tiny, fast JavaScript parser, written completely in JavaScript.

## Installation

The easiest way to install acorn is with [`npm`][npm].

[npm]: http://npmjs.org

```sh
npm install acorn
```

Alternately, download the source.

```sh
git clone https://github.com/marijnh/acorn.git
```

## Components

When run in a CommonJS (node.js) or AMD environment, exported values
appear in the interfaces exposed by the individual files, as usual.
When loaded in the browser (Acorn works in any JS-enabled browser more
recent than IE5) without any kind of module management, a single
global object `acorn` will be defined, and all the exported properties
will be added to that.

### Main parser

This is implemented in `dist/acorn.js`, and is what you get when you
`require("acorn")` in node.js.

**parse**`(input, options)` is used to parse a JavaScript program.
The `input` parameter is a string, `options` can be undefined or an
object setting some of the options listed below. The return value will
be an abstract syntax tree object as specified by the
[Mozilla Parser API][mozapi].

When encountering a syntax error, the parser will raise a
`SyntaxError` object with a meaningful message. The error object will
have a `pos` property that indicates the character offset at which the
error occurred, and a `loc` object that contains a `{line, column}`
object referring to that same position.

[mozapi]: https://developer.mozilla.org/en-US/docs/SpiderMonkey/Parser_API

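As an illustration of the error properties described above, the sketch
below wraps a parse call and turns a `SyntaxError` into a short report.
The helper name `describeParseError` is hypothetical and not part of
Acorn. The parse function is passed in as an argument (for example
`acorn.parse`), so the snippet makes no assumption about how Acorn is
loaded.

```javascript
// Hypothetical helper: run `parse` on `input` and return either the AST
// or a summary of the syntax error, built from its `pos` and `loc`.
function describeParseError(parse, input, options) {
  try {
    return {ok: true, ast: parse(input, options)};
  } catch (e) {
    // Only handle syntax errors; rethrow anything else.
    if (!(e instanceof SyntaxError)) throw e;
    return {
      ok: false,
      message: e.message,
      offset: e.pos,                       // character offset of the error
      line: e.loc ? e.loc.line : null,     // line of that same position
      column: e.loc ? e.loc.column : null  // column of that same position
    };
  }
}

// Usage, assuming Acorn has been loaded as `acorn`:
//   var result = describeParseError(acorn.parse, "var x = ;");
//   if (!result.ok) console.log(result.message);
```
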
- **ecmaVersion**: Indicates the ECMAScript version to parse. Must be
  either 3, 5, or 6. This influences support for strict mode, the set
  of reserved words, and support for new syntax features. Default is 5.

- **sourceType**: Indicates the mode the code should be parsed in. Can be
  either `"script"` or `"module"`.

- **onInsertedSemicolon**: If given a callback, that callback will be
  called whenever a missing semicolon is inserted by the parser. The
  callback will be given the character offset of the point where the
  semicolon is inserted as argument, and if `locations` is on, also a
  `{line, column}` object representing this position.

- **onTrailingComma**: Like `onInsertedSemicolon`, but for trailing
  commas.

- **allowReserved**: If `false`, using a reserved word will generate
  an error. Defaults to `true`. When given the value `"never"`,
  reserved words and keywords can also not be used as property names
  (as in Internet Explorer's old parser).

- **allowReturnOutsideFunction**: By default, a return statement at
  the top level raises an error. Set this to `true` to accept such
  code.

- **allowImportExportEverywhere**: By default, `import` and `export`
  declarations can only appear at a program's top level. Setting this
  option to `true` allows them anywhere where a statement is allowed.

- **allowHashBang**: When this is enabled (off by default), if the
  code starts with the characters `#!` (as in a shellscript), the
  first line will be treated as a comment.

- **locations**: When `true`, each node has a `loc` object attached
  with `start` and `end` subobjects, each of which contains the
  one-based line and zero-based column numbers in `{line, column}`
  form. Default is `false`.

- **onToken**: If a function is passed for this option, each token
  found will be passed to it, in the same format as the tokens
  returned by `tokenizer().getToken()`.

  If an array is passed, each token found is pushed to it.

  Note that you are not allowed to call the parser from the
  callback, since that will corrupt its internal state.

- **onComment**: If a function is passed for this option, whenever a
  comment is encountered the function will be called with the
  following parameters:

  - `block`: `true` if the comment is a block comment, `false` if it
    is a line comment.
  - `text`: The content of the comment.
  - `start`: Character offset of the start of the comment.
  - `end`: Character offset of the end of the comment.

  When the `locations` option is on, the `{line, column}` locations
  of the comment’s start and end are passed as two additional
  parameters.

  If an array is passed for this option, each comment found is pushed
  to it as an object in Esprima format:

  ```javascript
  {
    "type": "Line" | "Block",
    "value": "comment text",
    "range": ...,
    "loc": ...
  }
  ```

  Note that you are not allowed to call the parser from the
  callback, since that will corrupt its internal state.

- **ranges**: Nodes have their start and end character offsets
  recorded in `start` and `end` properties (directly on the node,
  rather than on the `loc` object, which holds line/column data). To
  also add a [semi-standardized][range] "range" property holding a
  `[start, end]` array with the same numbers, set the `ranges` option
  to `true`.

- **program**: It is possible to parse multiple files into a single
  AST by passing the tree produced by parsing the first file as the
  `program` option in subsequent parses. This will add the toplevel
  forms of the parsed file to the "Program" (top) node of an existing
  parse tree.

- **sourceFile**: When the `locations` option is `true`, you can pass
  this option to add a `source` attribute in every node’s `loc`
  object. Note that the contents of this option are not examined or
  processed in any way; you are free to use whatever format you
  choose.

- **directSourceFile**: Like `sourceFile`, but a `sourceFile` property
  will be added directly to the nodes, rather than the `loc` object.

- **preserveParens**: If this option is `true`, parenthesized expressions
  are represented by (non-standard) `ParenthesizedExpression` nodes
  that have a single `expression` property containing the expression
  inside parentheses.

[range]: https://bugzilla.mozilla.org/show_bug.cgi?id=745678

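The function form of `onComment` described in the list above can be
exercised with a small collector. This is only a sketch:
`makeCommentCollector` is a hypothetical helper, not part of Acorn, and
it assumes the callback receives `(block, text, start, end)` followed by
the two location objects when `locations` is on. As a simplification it
always records a `range` array.

```javascript
// Hypothetical helper: returns a callback suitable for the `onComment`
// option, plus the array it fills with Esprima-style comment objects.
function makeCommentCollector() {
  var comments = [];
  function onComment(block, text, start, end, startLoc, endLoc) {
    var comment = {
      type: block ? "Block" : "Line",
      value: text,
      range: [start, end]
    };
    // startLoc and endLoc are only expected when `locations` is on.
    if (startLoc && endLoc) comment.loc = {start: startLoc, end: endLoc};
    comments.push(comment);
  }
  return {onComment: onComment, comments: comments};
}

// Usage, assuming Acorn has been loaded as `acorn`:
//   var collector = makeCommentCollector();
//   acorn.parse("var x = 42; // answer", {onComment: collector.onComment});
//   collector.comments now holds one entry per comment
```
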
**parseExpressionAt**`(input, offset, options)` will parse a single
expression in a string, and return its AST. It will not complain if
there is more of the string left after the expression.

**getLineInfo**`(input, offset)` can be used to get a `{line,
column}` object for a given program string and character offset.

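To make the shape of that result concrete, here is a minimal standalone
sketch of the same kind of computation. It is not Acorn's
implementation: the name `lineInfoSketch` is made up, and it only
treats `\n`, `\r`, and `\r\n` as line breaks, which is an assumption.
Lines are counted from one and columns from zero, on the assumption
that this matches the convention used by the `locations` option.

```javascript
// Illustrative stand-in for getLineInfo: scan the input up to `offset`,
// counting line breaks and remembering where the current line starts.
function lineInfoSketch(input, offset) {
  var line = 1, lineStart = 0;
  for (var i = 0; i < offset && i < input.length; i++) {
    var ch = input.charCodeAt(i);
    // For "\r\n", skip the "\r" so the pair counts as one line break.
    if (ch === 13 && input.charCodeAt(i + 1) === 10) continue;
    if (ch === 10 || ch === 13) {
      line++;
      lineStart = i + 1;
    }
  }
  return {line: line, column: offset - lineStart};
}

// Offset 11 points at the `y` in the second line of this input.
var info = lineInfoSketch("var x;\nvar y;", 11);
// info is {line: 2, column: 4}
```
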
**tokenizer**`(input, options)` returns an object with a `getToken`
method that can be called repeatedly to get the next token, a `{start,
end, type, value}` object (with added `loc` property when the
`locations` option is enabled and `range` property when the `ranges`
option is enabled). When the token's type is `tokTypes.eof`, you
should stop calling the method, since it will keep returning that same
token forever.

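The `getToken` loop described above might look like the following
sketch. `collectTokens` is a hypothetical helper, not part of Acorn; it
takes the tokenizer object and the end-of-file token type as arguments,
so it works with anything that follows the same protocol.

```javascript
// Hypothetical helper: drain a tokenizer object (anything with a
// `getToken` method) into an array, stopping at the end-of-file token.
function collectTokens(tokenizer, eofType) {
  var tokens = [];
  for (;;) {
    var token = tokenizer.getToken();
    if (token.type === eofType) break; // stop: eof would repeat forever
    tokens.push(token);
  }
  return tokens;
}

// Usage, assuming Acorn has been loaded as `acorn`:
//   var tokens = collectTokens(acorn.tokenizer("1 + 1"), acorn.tokTypes.eof);
```
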
In an ES6 environment, the returned object can be used like any other
protocol-compliant iterable:

```javascript
for (let token of acorn.tokenizer(str)) {
  // iterate over the tokens
}

// transform code to array of tokens:
var tokens = [...acorn.tokenizer(str)];
```

**tokTypes** holds an object mapping names to the token type objects
that end up in the `type` properties of tokens.

#### Note on using with [Escodegen][escodegen]

Escodegen supports generating comments from an AST when they are
attached in an Esprima-specific format. To produce the same format
with Acorn, consider the following example:

```javascript
var comments = [], tokens = [];

var ast = acorn.parse('var x = 42; // answer', {
  // collect ranges for each node
  ranges: true,
  // collect comments in Esprima's format
  onComment: comments,
  // collect token ranges
  onToken: tokens
});

// attach comments using collected information
escodegen.attachComments(ast, comments, tokens);

// generate code
console.log(escodegen.generate(ast, {comment: true}));
// > 'var x = 42; // answer'
```

[escodegen]: https://github.com/Constellation/escodegen

#### Using Acorn in an environment with a Content Security Policy

Some contexts, such as Chrome Web Apps, disallow run-time code evaluation.
Acorn uses `new Function` to generate fast functions that test whether
a word is in a given set, and will trigger a security error when used
in a context with such a
[Content Security Policy](http://www.html5rocks.com/en/tutorials/security/content-security-policy/#eval-too)
(see [#90](https://github.com/marijnh/acorn/issues/90) and
[#123](https://github.com/marijnh/acorn/issues/123)).

The `dist/acorn_csp.js` file in the distribution (which is built
by the `bin/without_eval` script) has the generated code inlined, and
can thus run without evaluating anything.

### dist/acorn_loose.js ###

This file implements an error-tolerant parser. It exposes a single
function.

**parse_dammit**`(input, options)` takes the same arguments and
returns the same syntax tree as the `parse` function in `acorn.js`,
but never raises an error, and will do its best to parse syntactically
invalid code in as meaningful a way as it can. It'll insert identifier
nodes with name `"✖"` as placeholders in places where it can't make
sense of the input. Depends on `acorn.js`, because it uses the same
tokenizer.

### dist/walk.js ###

Implements an abstract syntax tree walker. Will store its interface in
`acorn.walk` when loaded without a module system.

**simple**`(node, visitors, base, state)` does a 'simple' walk over
a tree. `node` should be the AST node to walk, and `visitors` an
object with properties whose names correspond to node types in the
[Mozilla Parser API][mozapi]. The properties should contain functions
that will be called with the node object and, if applicable, the state
at that point. The last two arguments are optional. `base` is a walker
algorithm, and `state` is a start state. The default walker will
simply visit all statements and expressions and not produce a
meaningful state. (An example of a use of state is to track scope at
each point in the tree.)

**ancestor**`(node, visitors, base, state)` does a 'simple' walk over
a tree, building up an array of ancestor nodes (including the current node)
and passing the array to callbacks in the `state` parameter.

**recursive**`(node, state, functions, base)` does a 'recursive'
walk, where the walker functions are responsible for continuing the
walk on the child nodes of their target node. `state` is the start
state, and `functions` should contain an object that maps node types
to walker functions. Such functions are called with `(node, state, c)`
arguments, and can cause the walk to continue on a sub-node by calling
the `c` argument on it with `(node, state)` arguments. The optional
`base` argument provides the fallback walker functions for node types
that aren't handled in the `functions` object. If not given, the
default walkers will be used.

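The `(node, state, c)` protocol of a recursive walk can be illustrated
without loading `walk.js`. The sketch below is a toy stand-in, not
Acorn's walker: `toyRecursive` is a made-up name, it has no `base`
fallback, and the AST is written by hand rather than produced by the
parser. It sums the numeric literals in `1 + (2 + 3)`.

```javascript
// Toy dispatcher: look up the walker function for a node's type and
// hand it a continuation `c` for descending into child nodes.
function toyRecursive(node, state, functions) {
  function c(node, st) {
    functions[node.type](node, st, c);
  }
  c(node, state);
}

// Hand-written AST for the expression `1 + (2 + 3)`.
var ast = {
  type: "BinaryExpression", operator: "+",
  left: {type: "Literal", value: 1},
  right: {
    type: "BinaryExpression", operator: "+",
    left: {type: "Literal", value: 2},
    right: {type: "Literal", value: 3}
  }
};

var state = {sum: 0};
toyRecursive(ast, state, {
  // Each walker function decides which children to continue into.
  BinaryExpression: function(node, st, c) {
    c(node.left, st);
    c(node.right, st);
  },
  Literal: function(node, st) {
    st.sum += node.value;
  }
});
// state.sum is now 6
```
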
**make**`(functions, base)` builds a new walker object by using the
walker functions in `functions` and filling in the missing ones by
taking defaults from `base`.

**findNodeAt**`(node, start, end, test, base, state)` tries to
locate a node in a tree at the given start and/or end offsets, which
satisfies the predicate `test`. `start` and `end` can be either `null`
(as wildcard) or a number. `test` may be a string (indicating a node
type) or a function that takes `(nodeType, node)` arguments and
returns a boolean indicating whether this node is interesting. `base`
and `state` are optional, and can be used to specify a custom walker.
Nodes are tested from inner to outer, so if two nodes match the
boundaries, the inner one will be preferred.

**findNodeAround**`(node, pos, test, base, state)` is a lot like
`findNodeAt`, but will match any node that exists 'around' (spanning)
the given position.

**findNodeAfter**`(node, pos, test, base, state)` is similar to
`findNodeAround`, but will match all nodes *after* the given position
(testing outer nodes before inner nodes).

## Command line interface

The `bin/acorn` utility can be used to parse a file from the command
line. It accepts as arguments its input file and the following
options:

- `--ecma3|--ecma5|--ecma6`: Sets the ECMAScript version to parse. Default is
  version 5.

- `--locations`: Attaches a "loc" object to each node with "start" and
  "end" subobjects, each of which contains the one-based line and
  zero-based column numbers in `{line, column}` form.

- `--compact`: No whitespace is used in the AST output.

- `--silent`: Do not output the AST, just return the exit status.

- `--help`: Print the usage information and quit.

The utility spits out the syntax tree as JSON data.

## Build system

Acorn is written in ECMAScript 6, as a set of small modules, in the
project's `src` directory, and compiled down to bigger ECMAScript 3
files in `dist` using [Browserify](http://browserify.org) and
[Babel](http://babeljs.io/). If you are already using Babel, you can
consider including the modules directly.

The command-line test runner (`npm test`) uses the ES6 modules. The
browser-based test page (`test/index.html`) uses the compiled modules.
The `bin/build-acorn.js` script builds the latter from the former.

If you are working on Acorn, you'll probably want to try the code out
directly, without an intermediate build step. In your scripts, you can
register the Babel require shim like this:

    require("babelify/node_modules/babel-core/register")

That will allow you to directly `require` the ES6 modules.

## Plugins

Acorn is designed to support plugins which, within reasonable
bounds, redefine the way the parser works. Plugins can add new token
types and new tokenizer contexts (if necessary), and extend methods in
the parser object. This is not a clean, elegant API: using it requires
an understanding of Acorn's internals, and plugins are likely to break
whenever those internals are significantly changed. But still, it is
_possible_, in this way, to create parsers for JavaScript dialects
without forking all of Acorn. And in principle it is even possible to
combine such plugins, so that if you have, for example, a plugin for
parsing types and a plugin for parsing JSX-style XML literals, you
could load them both and parse code with both JSX tags and types.

A plugin should register itself by adding a property to
`acorn.plugins`, which holds a function. When calling `acorn.parse`, a
`plugins` option can be passed, holding an object mapping plugin names
to configuration values (or just `true` for plugins that don't take
options). After the parser object has been created, the initialization
functions for the chosen plugins are called with `(parser,
configValue)` arguments. They are expected to use the `parser.extend`
method to extend parser methods. For example, the `readToken` method
could be extended like this:

```javascript
parser.extend("readToken", function(nextMethod) {
  return function(code) {
    console.log("Reading a token!")
    return nextMethod.call(this, code)
  }
})
```

The `nextMethod` argument passed to `extend`'s second argument is the
previous value of this method, and should usually be called through to
whenever the extended method does not handle the call itself.

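The wrapping behaviour of `extend` can be tried out on a stand-in
object. Everything below is a sketch under assumptions: `fakeParser` is
not a real Acorn parser, its `extend` method is a guess at the minimal
behaviour described above (replace the method with whatever the wrapper
returns when given the previous method), and the plugin name
`countTokensPlugin` is hypothetical.

```javascript
// Stand-in for a parser object, for illustration only.
var fakeParser = {
  readToken: function(code) { return "token:" + code; },
  extend: function(name, f) { this[name] = f(this[name]); }
};

// A plugin initialization function of the kind that would be stored in
// `acorn.plugins` and called with (parser, configValue).
function countTokensPlugin(parser, config) {
  parser.extend("readToken", function(nextMethod) {
    return function(code) {
      config.count++;
      return nextMethod.call(this, code); // defer to the previous method
    };
  });
}

var config = {count: 0};
countTokensPlugin(fakeParser, config);
var result = fakeParser.readToken(40);
// result is "token:40" and config.count is 1
```
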
There is a proof-of-concept JSX plugin in the [`jsx`
branch](https://github.com/marijnh/acorn/tree/jsx) of the
GitHub repository.