# ES Module Lexer

[![Build Status][actions-image]][actions-url]

A JS module syntax lexer used in [es-module-shims](https://github.com/guybedford/es-module-shims).

Outputs the list of exports and locations of import specifiers, including dynamic import and import meta handling.

A very small single JS file (4KiB gzipped) that includes inlined WebAssembly for very fast source analysis of ECMAScript module syntax only.

As an example of the performance, Angular 1 (720KiB) is fully parsed in 5ms, in comparison to the fastest JS parser, Acorn, which takes over 100ms.

_Comprehensively handles the JS language grammar while remaining small and fast: ~10ms per MB of JS cold and ~5ms per MB of JS warm, [see benchmarks](#benchmarks) for more info._

### Usage

```
npm install es-module-lexer
```

For use in CommonJS:

```js
const { init, parse } = require('es-module-lexer');

(async () => {
  // either await init, or call parse asynchronously
  // this is necessary for the WebAssembly boot
  await init;

  const [imports, exports] = parse('export var p = 5');
  exports[0] === 'p';
})();
```

An ES module version is also available:

```js
import { init, parse } from 'es-module-lexer';

(async () => {
  await init;

  const source = `
    import { name } from 'mod\\u1011';
    import json from './json.json' assert { type: 'json' }
    export var p = 5;
    export function q () {

    };

    // Comments provided to demonstrate edge cases
    import /*comment!*/ ( 'asdf', { assert: { type: 'json' }});
    import /*comment!*/.meta.asdf;
  `;

  const [imports, exports] = parse(source, 'optional-sourcename');

  // Returns "modထ"
  imports[0].n
  // Returns "mod\u1011"
  source.substring(imports[0].s, imports[0].e);
  // "s" = start
  // "e" = end

  // Returns "import { name } from 'mod\u1011'"
  source.substring(imports[0].ss, imports[0].se);
  // "ss" = statement start
  // "se" = statement end

  // Returns "{ type: 'json' }"
  source.substring(imports[1].a, imports[1].se);
  // "a" = assert, -1 for no assertion

  // Returns "p,q"
  exports.toString();

  // Dynamic imports are indicated by imports[2].d > -1
  // In this case the "d" index is the start of the dynamic import bracket
  // Returns true
  imports[2].d > -1;

  // Returns "asdf" (only for string literal dynamic imports)
  imports[2].n
  // Returns "import /*comment!*/ ( 'asdf', { assert: { type: 'json' }})"
  source.substring(imports[2].ss, imports[2].se);
  // Returns "'asdf'"
  source.substring(imports[2].s, imports[2].e);
  // Returns "( 'asdf', { assert: { type: 'json' }})"
  source.substring(imports[2].d, imports[2].se);
  // Returns "{ assert: { type: 'json' }}"
  source.substring(imports[2].a, imports[2].se - 1);

  // For non-string dynamic import expressions:
  // - n will be undefined
  // - a is currently -1 even if there is an assertion
  // - e is currently the character before the closing )

  // For nested dynamic imports, the se value of the outer import is -1 as end tracking does not
  // currently support nested dynamic imports

  // import.meta is indicated by imports[3].d === -2
  // Returns true
  imports[3].d === -2;
  // Returns "import /*comment!*/.meta"
  source.substring(imports[3].s, imports[3].e);
  // ss and se are the same for import meta
})();
```

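The `d` field conventions described in the example above can be wrapped in a small helper. The following is only a sketch: `importKind` and `sampleImports` are hypothetical names, the records are hand-written stand-ins for `parse()` output with made-up offsets, and treating every other `d` value as a static import is an assumption rather than something stated above.

```js
// Classify a record from the `imports` array by its `d` field.
// Assumption: any value other than "> -1" or "-2" means a static import.
function importKind(record) {
  if (record.d > -1) return 'dynamic';
  if (record.d === -2) return 'import.meta';
  return 'static';
}

// Hand-written records standing in for parse() output (offsets are made up).
const sampleImports = [
  { n: 'mod', s: 22, e: 25, ss: 0, se: 26, d: -1, a: -1 },
  { n: 'asdf', s: 40, e: 46, ss: 30, se: 48, d: 36, a: -1 },
  { n: undefined, s: 50, e: 61, ss: 50, se: 61, d: -2, a: -1 },
];

const kinds = sampleImports.map(importKind);
console.log(kinds.join(','));
```

A helper like this keeps the magic numbers in one place, so calling code can branch on a readable label instead of comparing `d` directly.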
### CSP asm.js Build

The default version of the library uses Wasm and (safe) eval for performance and a minimal footprint.

Neither of these represents a security escalation possibility, since there are no execution string injection vectors, but they can still violate existing CSP policies for applications.

For a version that works with CSP eval disabled, use the `es-module-lexer/js` build:

```js
import { parse } from 'es-module-lexer/js';
```

Instead of WebAssembly, this uses an asm.js build, which is almost as fast as the Wasm version ([see benchmarks below](#benchmarks)).

### Escape Sequences

To handle escape sequences in specifier strings, the `.n` field of imported specifiers holds the unescaped specifier name where possible.

For dynamic import expressions, this field will be undefined if the expression is not a valid JS string literal.

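Because `.n` holds the unescaped name while `s` and `e` index the raw source text, a specifier rewrite can look names up by `.n` and splice by offset. The sketch below is illustrative only: `rewriteSpecifiers` and its `resolve` callback are hypothetical, the records are built by hand rather than by `parse()`, and it assumes static imports carry `d === -1`. Dynamic imports are skipped because their `s`/`e` range includes the quotes.

```js
// Replace static import specifiers using the s/e offsets of each record.
// `resolve` is a hypothetical callback mapping a specifier to its replacement.
function rewriteSpecifiers(source, imports, resolve) {
  let out = source;
  // Work from the last record to the first so earlier offsets stay valid.
  for (let i = imports.length - 1; i >= 0; i--) {
    const { n, s, e, d } = imports[i];
    // Assumption: d === -1 marks a static import; skip everything else,
    // as well as records without a usable name.
    if (d !== -1 || n === undefined) continue;
    out = out.slice(0, s) + resolve(n) + out.slice(e);
  }
  return out;
}

const source = "import a from 'a';\nimport b from './b.js';\n";
// Hand-built records: offsets point just inside the quotes.
const firstStart = source.indexOf("'a'") + 1;
const secondStart = source.indexOf("'./b.js'") + 1;
const records = [
  { n: 'a', s: firstStart, e: firstStart + 1, d: -1 },
  { n: './b.js', s: secondStart, e: secondStart + 6, d: -1 },
];

// Map bare specifiers to a made-up path and leave relative ones untouched.
const rewritten = rewriteSpecifiers(source, records, (name) =>
  name.startsWith('.') ? name : `/node_modules/${name}/index.js`
);
console.log(rewritten);
```

Iterating in reverse is the key design choice here: each splice changes the string length, so applying edits back to front avoids having to adjust the remaining offsets.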
### Facade Detection

Facade modules that only use import / export syntax can be detected via the third return value:

```js
const [,, facade] = parse(`
  export * from 'external';
  import * as ns from 'external2';
  export { a as b } from 'external3';
  export { ns };
`);
facade === true;
```

### Environment Support

Node.js 10+, and [all browsers with WebAssembly support](https://caniuse.com/#feat=wasm).

### Grammar Support

* Token state parses all line comments, block comments, strings, template strings, blocks, parens and punctuators.
* Division operator / regex token ambiguity is handled via backtracking checks against punctuator prefixes, including closing brace or paren backtracking.
* Always correctly parses valid JS source, but may parse invalid JS source without errors.

### Limitations

The lexing approach is designed to deal with the full language grammar including RegEx / division operator ambiguity through backtracking and paren / brace tracking.

The only limitation of the reduced parser is that the "exports" list may not correctly gather all export identifiers in the following edge cases:

```js
// Only "a" is detected as an export, "q" isn't
export var a = 'asdf', q = z;

// "b" is not detected as an export
export var { a: b } = asdf;
```

The above cases are handled gracefully: the lexer continues without error, but it does not detect the export names noted above.

### Benchmarks

Benchmarks can be run with `npm run bench`.

Current results for a high spec machine:

#### Wasm Build

```
Module load time
> 5ms
Cold Run, All Samples
test/samples/*.js (3123 KiB)
> 20ms

Warm Runs (average of 25 runs)
test/samples/angular.js (739 KiB)
> 2.12ms
test/samples/angular.min.js (188 KiB)
> 1ms
test/samples/d3.js (508 KiB)
> 3.04ms
test/samples/d3.min.js (274 KiB)
> 2ms
test/samples/magic-string.js (35 KiB)
> 0ms
test/samples/magic-string.min.js (20 KiB)
> 0ms
test/samples/rollup.js (929 KiB)
> 4.04ms
test/samples/rollup.min.js (429 KiB)
> 2.16ms

Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3123 KiB)
> 14.4ms
```

#### JS Build (asm.js)

```
Module load time
> 2ms
Cold Run, All Samples
test/samples/*.js (3123 KiB)
> 35ms

Warm Runs (average of 25 runs)
test/samples/angular.js (739 KiB)
> 3ms
test/samples/angular.min.js (188 KiB)
> 1.08ms
test/samples/d3.js (508 KiB)
> 3.04ms
test/samples/d3.min.js (274 KiB)
> 2ms
test/samples/magic-string.js (35 KiB)
> 0ms
test/samples/magic-string.min.js (20 KiB)
> 0ms
test/samples/rollup.js (929 KiB)
> 5.04ms
test/samples/rollup.min.js (429 KiB)
> 3ms

Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3123 KiB)
> 17ms
```

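To compare the two builds on a common scale, the "All Samples" timings above can be converted into a per-size rate. This is just arithmetic on the figures listed in the benchmark output; `msPerMiB` is a hypothetical helper name, and the result is expressed per MiB (1024 KiB) since the sample sizes are given in KiB.

```js
// Convert a benchmark timing into milliseconds per MiB of source.
function msPerMiB(ms, sizeKiB) {
  return ms / (sizeKiB / 1024);
}

const totalKiB = 3123; // size of test/samples/*.js from the output above

const results = {
  wasmCold: msPerMiB(20, totalKiB),
  wasmWarm: msPerMiB(14.4, totalKiB),
  asmCold: msPerMiB(35, totalKiB),
  asmWarm: msPerMiB(17, totalKiB),
};

for (const [name, value] of Object.entries(results)) {
  console.log(`${name}: ${value.toFixed(2)} ms/MiB`);
}
```

On these figures the Wasm build comes out at roughly 6.6 ms/MiB cold and 4.7 ms/MiB warm, and the asm.js build at roughly 11.5 ms/MiB cold and 5.6 ms/MiB warm.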
### Building

This project uses [Chomp](https://chompbuild.com) for building.

With Chomp installed, download the WASI SDK 12.0 from https://github.com/WebAssembly/wasi-sdk/releases/tag/wasi-sdk-12.

- [Linux](https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-12/wasi-sdk-12.0-linux.tar.gz)
- [Windows (MinGW)](https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-12/wasi-sdk-12.0-mingw.tar.gz)
- [macOS](https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-12/wasi-sdk-12.0-macos.tar.gz)

Locate the WASI SDK as a sibling folder, or customize the path via the `WASI_PATH` environment variable.

Emscripten emsdk is also assumed to be a sibling folder, or its path can be set via the `EMSDK_PATH` environment variable.

Example setup:

```
git clone https://github.com/guybedford/es-module-lexer
git clone https://github.com/emscripten-core/emsdk
wget https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-12/wasi-sdk-12.0-linux.tar.gz
gunzip wasi-sdk-12.0-linux.tar.gz
tar -xf wasi-sdk-12.0-linux.tar
mv wasi-sdk-12.0-linux.tar wasi-sdk-12.0
cargo install chompbuild
cd es-module-lexer
chomp test
```

For the `asm.js` build, a git clone of `emsdk` is assumed to be a sibling folder as well.

### License

MIT

[actions-image]: https://github.com/guybedford/es-module-lexer/actions/workflows/build.yml/badge.svg
[actions-url]: https://github.com/guybedford/es-module-lexer/actions/workflows/build.yml