# Acorn

[![Build Status](https://travis-ci.org/marijnh/acorn.svg?branch=master)](https://travis-ci.org/marijnh/acorn)
[![NPM version](https://img.shields.io/npm/v/acorn.svg)](https://www.npmjs.org/package/acorn)
[Author funding status: ![maintainer happiness](https://marijnhaverbeke.nl/fund/status_s.png?force)](https://marijnhaverbeke.nl/fund/)

A tiny, fast JavaScript parser, written completely in JavaScript.

## Installation

The easiest way to install acorn is with [`npm`][npm].

[npm]: http://npmjs.org

```sh
npm install acorn
```

Alternately, download the source.

```sh
git clone https://github.com/marijnh/acorn.git
```

## Components

When run in a CommonJS (node.js) or AMD environment, exported values
appear in the interfaces exposed by the individual files, as usual.
When loaded in the browser (Acorn works in any JS-enabled browser more
recent than IE5) without any kind of module management, a single
global object `acorn` will be defined, and all the exported properties
will be added to that.

### Main parser

This is implemented in `dist/acorn.js`, and is what you get when you
`require("acorn")` in node.js.

**parse**`(input, options)` is used to parse a JavaScript program.
The `input` parameter is a string; `options` can be undefined or an
object setting some of the options listed below. The return value will
be an abstract syntax tree object as specified by the
[Mozilla Parser API][mozapi].

When encountering a syntax error, the parser will raise a
`SyntaxError` object with a meaningful message. The error object will
have a `pos` property that indicates the character offset at which the
error occurred, and a `loc` property holding a `{line, column}`
object referring to that same position.

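As a rough sketch of how the `pos` and `loc` properties can be used, the helper below turns such an error into a message that points at the offending column. `describeSyntaxError` is a hypothetical name, and the error is built by hand (with a made-up message) so that the snippet runs without Acorn; with Acorn loaded, the same helper could be called from a `catch` block around `acorn.parse(input)`. It assumes `loc.line` is one-based and `loc.column` zero-based, as described for the `locations` option.

```javascript
// Hypothetical helper: formats an error shaped like the one described above.
function describeSyntaxError(input, err) {
  // `pos` is a character offset; `loc` is a {line, column} object.
  const lineText = input.split("\n")[err.loc.line - 1];
  const pointer = " ".repeat(err.loc.column) + "^";
  return err.message + " at offset " + err.pos + "\n" + lineText + "\n" + pointer;
}

// Hand-built stand-in for what the parser would throw, so that the
// sketch runs without Acorn installed. The message text is made up.
const input = "var x = ;";
const fakeError = Object.assign(new SyntaxError("Unexpected token"), {
  pos: 8,
  loc: {line: 1, column: 8}
});

console.log(describeSyntaxError(input, fakeError));
```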
[mozapi]: https://developer.mozilla.org/en-US/docs/SpiderMonkey/Parser_API

- **ecmaVersion**: Indicates the ECMAScript version to parse. Must be
  either 3, 5, or 6. This influences support for strict mode, the set
  of reserved words, and support for new syntax features. Default is 5.

- **sourceType**: Indicates the mode the code should be parsed in. Can be
  either `"script"` or `"module"`.

- **onInsertedSemicolon**: If given a callback, that callback will be
  called whenever a missing semicolon is inserted by the parser. The
  callback will be given the character offset of the point where the
  semicolon is inserted as argument, and if `locations` is on, also a
  `{line, column}` object representing this position.

- **onTrailingComma**: Like `onInsertedSemicolon`, but for trailing
  commas.

- **allowReserved**: If `false`, using a reserved word will generate
  an error. Defaults to `true`. When given the value `"never"`,
  reserved words and keywords also cannot be used as property names
  (as in Internet Explorer's old parser).

- **allowReturnOutsideFunction**: By default, a return statement at
  the top level raises an error. Set this to `true` to accept such
  code.

- **allowImportExportEverywhere**: By default, `import` and `export`
  declarations can only appear at a program's top level. Setting this
  option to `true` allows them anywhere a statement is allowed.

- **allowHashBang**: When this is enabled (off by default), if the
  code starts with the characters `#!` (as in a shell script), the
  first line will be treated as a comment.

- **locations**: When `true`, each node has a `loc` object attached
  with `start` and `end` subobjects, each of which contains the
  one-based line and zero-based column numbers in `{line, column}`
  form. Default is `false`.

- **onToken**: If a function is passed for this option, each found
  token will be passed to it, in the same format as the tokens
  returned by `tokenizer().getToken()`.

  If an array is passed, each found token is pushed to it.

  Note that you are not allowed to call the parser from the
  callback, since that will corrupt its internal state.

- **onComment**: If a function is passed for this option, whenever a
  comment is encountered the function will be called with the
  following parameters:

  - `block`: `true` if the comment is a block comment, `false` if it
    is a line comment.
  - `text`: The content of the comment.
  - `start`: Character offset of the start of the comment.
  - `end`: Character offset of the end of the comment.

  When the `locations` option is on, the `{line, column}` locations
  of the comment’s start and end are passed as two additional
  parameters.

  If an array is passed for this option, each found comment is pushed
  to it as an object in Esprima format:

  ```javascript
  {
    "type": "Line" | "Block",
    "value": "comment text",
    "range": ...,
    "loc": ...
  }
  ```

  Note that you are not allowed to call the parser from the
  callback, since that will corrupt its internal state.

- **ranges**: Nodes have their start and end character offsets
  recorded in `start` and `end` properties (directly on the node,
  rather than the `loc` object, which holds line/column data). To also
  add a [semi-standardized][range] "range" property holding a
  `[start, end]` array with the same numbers, set the `ranges` option
  to `true`.

- **program**: It is possible to parse multiple files into a single
  AST by passing the tree produced by parsing the first file as the
  `program` option in subsequent parses. This will add the toplevel
  forms of the parsed file to the "Program" (top) node of an existing
  parse tree.

- **sourceFile**: When the `locations` option is `true`, you can pass
  this option to add a `source` attribute in every node’s `loc`
  object. Note that the contents of this option are not examined or
  processed in any way; you are free to use whatever format you
  choose.

- **directSourceFile**: Like `sourceFile`, but a `sourceFile` property
  will be added directly to the nodes, rather than the `loc` object.

- **preserveParens**: If this option is `true`, parenthesized expressions
  are represented by (non-standard) `ParenthesizedExpression` nodes
  that have a single `expression` property containing the expression
  inside parentheses.

[range]: https://bugzilla.mozilla.org/show_bug.cgi?id=745678

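To make the two `onComment` styles concrete, here is a minimal sketch of a callback that does by hand roughly what passing an array is described as doing. `collectComments` is a hypothetical helper, the parser call is simulated so that the snippet runs without Acorn, and treating `range` as a `[start, end]` pair is an assumption borrowed from the `ranges` option.

```javascript
// Hypothetical helper: returns an `onComment` callback that stores
// comments in the Esprima-style shape shown above.
function collectComments(target) {
  return function (block, text, start, end, startLoc, endLoc) {
    const comment = {
      type: block ? "Block" : "Line",
      value: text,
      // Assumption: `range` is a [start, end] pair of character offsets.
      range: [start, end]
    };
    // The two location arguments are only passed when `locations` is on.
    if (startLoc && endLoc) comment.loc = {start: startLoc, end: endLoc};
    target.push(comment);
  };
}

const comments = [];
const onComment = collectComments(comments);

// Simulate the parser reporting the comment in "var x = 42; // answer".
onComment(false, " answer", 12, 21);
console.log(comments);
```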
**parseExpressionAt**`(input, offset, options)` will parse a single
expression in a string, and return its AST. It will not complain if
there is more of the string left after the expression.

**getLineInfo**`(input, offset)` can be used to get a `{line,
column}` object for a given program string and character offset.

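The mapping from a character offset to a `{line, column}` pair can be sketched as follows. This is an illustrative re-implementation under the conventions stated for the `locations` option (one-based lines, zero-based columns), not Acorn's own code; `lineInfo` is a hypothetical name.

```javascript
// Illustrative re-implementation, not Acorn's own code.
function lineInfo(input, offset) {
  let line = 1;       // lines are one-based
  let lineStart = 0;  // offset at which the current line begins
  const lineBreak = /\r\n?|\n|\u2028|\u2029/g;
  let match;
  // Count every line break that starts before the requested offset.
  while ((match = lineBreak.exec(input)) && match.index < offset) {
    line++;
    lineStart = match.index + match[0].length;
  }
  return {line: line, column: offset - lineStart}; // columns are zero-based
}

// Offset 11 is the `b` in the second line: line 2, column 4.
console.log(lineInfo("var a;\nvar b;", 11));
```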
**tokenizer**`(input, options)` returns an object with a `getToken`
method that can be called repeatedly to get the next token, a `{start,
end, type, value}` object (with an added `loc` property when the
`locations` option is enabled and a `range` property when the `ranges`
option is enabled). When the token's type is `tokTypes.eof`, you
should stop calling the method, since it will keep returning that same
token forever.

In an ES6 environment, the returned object can be used like any other
protocol-compliant iterable:

```javascript
for (let token of acorn.tokenizer(str)) {
  // iterate over the tokens
}

// transform code to array of tokens:
var tokens = [...acorn.tokenizer(str)];
```

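The relationship between the `getToken` contract and the iterator protocol can be sketched with a toy tokenizer. Everything here is a stand-in: `toyTokenizer` just splits on whitespace, and `eof` and `word` are made-up token types, so the snippet runs without Acorn. The point is only that iteration stops at the first end-of-file token, while `getToken` itself keeps returning one.

```javascript
// Stand-in token types; real Acorn token types live in `tokTypes`.
const eof = {label: "eof"};
const word = {label: "word"};

// Toy tokenizer (NOT Acorn's): splits on whitespace, but follows the
// same contract: `getToken` keeps returning an eof token once done.
function toyTokenizer(input) {
  let pos = 0;
  const isSpace = ch => /\s/.test(ch);
  return {
    getToken() {
      while (pos < input.length && isSpace(input[pos])) pos++;
      if (pos >= input.length) return {type: eof, start: pos, end: pos};
      const start = pos;
      while (pos < input.length && !isSpace(input[pos])) pos++;
      return {type: word, value: input.slice(start, pos), start: start, end: pos};
    },
    // Iterator protocol on top of getToken: stop at the first eof token.
    [Symbol.iterator]() {
      return {
        next: () => {
          const token = this.getToken();
          return {done: token.type === eof, value: token};
        }
      };
    }
  };
}

const values = [];
for (const token of toyTokenizer("var x = 42")) values.push(token.value);
console.log(values);
```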
**tokTypes** holds an object mapping names to the token type objects
that end up in the `type` properties of tokens.

#### Note on using with [Escodegen][escodegen]

Escodegen supports generating comments from an AST, provided they are
attached in an Esprima-specific format. To produce the same format
with Acorn, consider the following example:

```javascript
var comments = [], tokens = [];

var ast = acorn.parse('var x = 42; // answer', {
    // collect ranges for each node
    ranges: true,
    // collect comments in Esprima's format
    onComment: comments,
    // collect token ranges
    onToken: tokens
});

// attach comments using collected information
escodegen.attachComments(ast, comments, tokens);

// generate code
console.log(escodegen.generate(ast, {comment: true}));
// > 'var x = 42; // answer'
```

[escodegen]: https://github.com/Constellation/escodegen

#### Using Acorn in an environment with a Content Security Policy

Some contexts, such as Chrome Web Apps, disallow run-time code evaluation.
Acorn uses `new Function` to generate fast functions that test whether
a word is in a given set, and will trigger a security error when used
in a context with such a
[Content Security Policy](http://www.html5rocks.com/en/tutorials/security/content-security-policy/#eval-too)
(see [#90](https://github.com/marijnh/acorn/issues/90) and
[#123](https://github.com/marijnh/acorn/issues/123)).

The `dist/acorn_csp.js` file in the distribution (which is built
by the `bin/without_eval` script) has the generated code inlined, and
can thus run without evaluating anything.

### dist/acorn_loose.js ###

This file implements an error-tolerant parser. It exposes a single
function.

**parse_dammit**`(input, options)` takes the same arguments and
returns the same syntax tree as the `parse` function in `acorn.js`,
but never raises an error, and will do its best to parse syntactically
invalid code in as meaningful a way as it can. It'll insert identifier
nodes with name `"✖"` as placeholders in places where it can't make
sense of the input. Depends on `acorn.js`, because it uses the same
tokenizer.

### dist/walk.js ###

Implements an abstract syntax tree walker. Will store its interface in
`acorn.walk` when loaded without a module system.

**simple**`(node, visitors, base, state)` does a 'simple' walk over
a tree. `node` should be the AST node to walk, and `visitors` an
object with properties whose names correspond to node types in the
[Mozilla Parser API][mozapi]. The properties should contain functions
that will be called with the node object and, if applicable, the state
at that point. The last two arguments are optional. `base` is a walker
algorithm, and `state` is a start state. The default walker will
simply visit all statements and expressions and not produce a
meaningful state. (An example of a use of state is to track scope at
each point in the tree.)

**ancestor**`(node, visitors, base, state)` does a 'simple' walk over
a tree, building up an array of ancestor nodes (including the current node)
and passing the array to callbacks in the `state` parameter.

**recursive**`(node, state, functions, base)` does a 'recursive'
walk, where the walker functions are responsible for continuing the
walk on the child nodes of their target node. `state` is the start
state, and `functions` should contain an object that maps node types
to walker functions. Such functions are called with `(node, state, c)`
arguments, and can cause the walk to continue on a sub-node by calling
the `c` argument on it with `(node, state)` arguments. The optional
`base` argument provides the fallback walker functions for node types
that aren't handled in the `functions` object. If not given, the
default walkers will be used.

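The `(node, state, c)` convention of a recursive walk can be illustrated with a small stand-in. `recursiveWalk` below is a hypothetical, much-reduced version written for this example (it has no `base` fallback and is not Acorn's implementation), and the tree is written by hand so the snippet runs without Acorn. Each walker function decides which children to continue into and which state to pass down; here the state tracks nesting depth.

```javascript
// Hand-written tree for `f(1 + 2)`; only the fields the sketch needs.
const tree = {
  type: "CallExpression",
  callee: {type: "Identifier", name: "f"},
  arguments: [{
    type: "BinaryExpression", operator: "+",
    left: {type: "Literal", value: 1},
    right: {type: "Literal", value: 2}
  }]
};

// Minimal stand-in for a recursive walk (NOT Acorn's implementation):
// looks up the function for the node's type and hands it a continuation `c`.
function recursiveWalk(node, state, functions) {
  function c(node, state) {
    const visit = functions[node.type];
    if (visit) visit(node, state, c);
  }
  c(node, state);
}

// Each walker function decides which children to continue into.
const literals = [];
recursiveWalk(tree, {depth: 0}, {
  CallExpression(node, state, c) {
    c(node.callee, state);
    for (const arg of node.arguments) c(arg, {depth: state.depth + 1});
  },
  BinaryExpression(node, state, c) {
    c(node.left, {depth: state.depth + 1});
    c(node.right, {depth: state.depth + 1});
  },
  Literal(node, state) {
    literals.push({value: node.value, depth: state.depth});
  }
});
console.log(literals);
```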
**make**`(functions, base)` builds a new walker object by using the
walker functions in `functions` and filling in the missing ones by
taking defaults from `base`.

**findNodeAt**`(node, start, end, test, base, state)` tries to
locate a node in a tree at the given start and/or end offsets, which
satisfies the predicate `test`. `start` and `end` can be either `null`
(as wildcard) or a number. `test` may be a string (indicating a node
type) or a function that takes `(nodeType, node)` arguments and
returns a boolean indicating whether this node is interesting. `base`
and `state` are optional, and can be used to specify a custom walker.
Nodes are tested from inner to outer, so if two nodes match the
boundaries, the inner one will be preferred.

**findNodeAround**`(node, pos, test, base, state)` is a lot like
`findNodeAt`, but will match any node that exists 'around' (spanning)
the given position.

**findNodeAfter**`(node, pos, test, base, state)` is similar to
`findNodeAround`, but will match all nodes *after* the given position
(testing outer nodes before inner nodes).

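The "inner nodes are preferred" behaviour of a position-based search can be sketched like this. `innermostAround` is a hypothetical stand-in written for illustration, not Acorn's code: it only follows object-valued children (arrays are ignored for brevity) and works on a hand-written tree, so it runs without Acorn. The `test` function takes `(nodeType, node)` arguments, as described above.

```javascript
// Hand-written tree for `x = 1 + 2`, with character offsets.
const ast = {
  type: "AssignmentExpression", start: 0, end: 9,
  left: {type: "Identifier", name: "x", start: 0, end: 1},
  right: {
    type: "BinaryExpression", start: 4, end: 9,
    left: {type: "Literal", value: 1, start: 4, end: 5},
    right: {type: "Literal", value: 2, start: 8, end: 9}
  }
};

// Illustrative stand-in (NOT Acorn's code): returns the innermost node
// that spans `pos` and satisfies `test`, trying children before parents.
function innermostAround(node, pos, test) {
  if (node.start > pos || node.end < pos) return null;
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (child && typeof child === "object" && typeof child.type === "string") {
      const found = innermostAround(child, pos, test);
      if (found) return found;
    }
  }
  return test(node.type, node) ? node : null;
}

const any = () => true;
console.log(innermostAround(ast, 8, any).type); // Literal
console.log(innermostAround(ast, 6, any).type); // BinaryExpression
console.log(innermostAround(ast, 8, type => type === "AssignmentExpression").type);
```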
## Command line interface

The `bin/acorn` utility can be used to parse a file from the command
line. It accepts as arguments its input file and the following
options:

- `--ecma3|--ecma5|--ecma6`: Sets the ECMAScript version to parse. Default is
  version 5.

- `--locations`: Attaches a "loc" object to each node with "start" and
  "end" subobjects, each of which contains the one-based line and
  zero-based column numbers in `{line, column}` form.

- `--allow-hash-bang`: If the code starts with the characters `#!` (as
  in a shell script), the first line will be treated as a comment.

- `--compact`: No whitespace is used in the AST output.

- `--silent`: Do not output the AST, just return the exit status.

- `--help`: Print the usage information and quit.

The utility spits out the syntax tree as JSON data.

## Build system

Acorn is written in ECMAScript 6, as a set of small modules, in the
project's `src` directory, and compiled down to bigger ECMAScript 3
files in `dist` using [Browserify](http://browserify.org) and
[Babel](http://babeljs.io/). If you are already using Babel, you can
consider including the modules directly.

The command-line test runner (`npm test`) uses the ES6 modules. The
browser-based test page (`test/index.html`) uses the compiled modules.
The `bin/build-acorn.js` script builds the latter from the former.

If you are working on Acorn, you'll probably want to try the code out
directly, without an intermediate build step. In your scripts, you can
register the Babel require shim like this:

    require("babelify/node_modules/babel-core/register")

That will allow you to directly `require` the ES6 modules.

## Plugins

Acorn is designed to support plugins which, within reasonable
bounds, redefine the way the parser works. Plugins can add new token
types and new tokenizer contexts (if necessary), and extend methods in
the parser object. This is not a clean, elegant API: using it requires
an understanding of Acorn's internals, and plugins are likely to break
whenever those internals are significantly changed. But still, it is
_possible_, in this way, to create parsers for JavaScript dialects
without forking all of Acorn. And in principle it is even possible to
combine such plugins, so that if you have, for example, a plugin for
parsing types and a plugin for parsing JSX-style XML literals, you
could load them both and parse code with both JSX tags and types.

A plugin should register itself by adding a property to
`acorn.plugins` that holds an initialization function. When calling
`acorn.parse`, a `plugins` option can be passed, holding an object
mapping plugin names to configuration values (or just `true` for
plugins that don't take options). After the parser object has been
created, the initialization functions for the chosen plugins are
called with `(parser, configValue)` arguments. They are expected to
use the `parser.extend` method to extend parser methods. For example,
the `readToken` method could be extended like this:

```javascript
parser.extend("readToken", function(nextMethod) {
  return function(code) {
    console.log("Reading a token!")
    return nextMethod.call(this, code)
  }
})
```

The `nextMethod` argument passed to `extend`'s second argument is the
previous value of this method, and should usually be called through to
whenever the extended method does not handle the call itself.

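How several plugins stack on top of one another can be sketched with a stand-in parser object. Nothing below is Acorn's code: the `parser` object, its `extend` method, and the `logger` and `atSign` plugins are all hypothetical, chosen to show that each extension receives the previous method as `nextMethod` and either handles the call or passes it through.

```javascript
// Stand-in for the parser object (NOT Acorn's): `extend` replaces a
// method with whatever the given function builds from the previous one.
const parser = {
  log: [],
  readToken(code) {
    this.log.push("base:" + code);
    return code;
  },
  extend(name, f) {
    this[name] = f(this[name]);
  }
};

// Two hypothetical plugin initializers, called with (parser, configValue).
const plugins = {
  logger(parser, config) {
    parser.extend("readToken", function (nextMethod) {
      return function (code) {
        this.log.push(config.prefix + code);
        return nextMethod.call(this, code);
      };
    });
  },
  atSign(parser) {
    parser.extend("readToken", function (nextMethod) {
      return function (code) {
        // Handle "@" (char code 64) here; defer everything else.
        if (code === 64) { this.log.push("at-sign"); return code; }
        return nextMethod.call(this, code);
      };
    });
  }
};

// Mimic applying an object that maps plugin names to config values.
const options = {logger: {prefix: "saw:"}, atSign: true};
for (const name of Object.keys(options)) plugins[name](parser, options[name]);

parser.readToken(64);
parser.readToken(97);
console.log(parser.log);
```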
There is a proof-of-concept JSX plugin in the [`jsx`
branch](https://github.com/marijnh/acorn/tree/jsx) of the
GitHub repository.