# character-parser

Parse JavaScript one character at a time to find code snippets embedded in templates. This is not a validator; it is designed only to let you robustly delimit sections of JavaScript with brackets.

[Build status](https://github.com/ForbesLindesay/character-parser/actions)
[Rolling Versions](https://rollingversions.com/ForbesLindesay/character-parser)
[npm package](https://www.npmjs.com/package/character-parser)

## Installation

    npm install character-parser

## Usage

### Parsing

Work out how the bracket depth changes as a string is parsed:
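
```js
var assert = require('assert');
var parse = require('character-parser').parse; // assumed export; see the API section

// The stack holds the closing brackets that are still expected.
var state = parse('foo(arg1, arg2, {\n foo: [a, b\n');
assert.deepEqual(state.stack, [')', '}', ']']);

// Pass the state back in to continue where the first chunk ended.
parse(' c, d]\n })', state);
assert.deepEqual(state.stack, []);
```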
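
To make the idea of a bracket stack concrete, here is a deliberately simplified sketch. It is not the library's implementation: the hypothetical `trackBrackets` helper only understands `()`, `[]` and `{}`, and it ignores strings, comments and regular expressions, all of which character-parser handles for you.

```javascript
// Simplified illustration only; `trackBrackets` is a hypothetical helper,
// not part of character-parser. It ignores strings, comments and regexps.
var CLOSERS = {'(': ')', '[': ']', '{': '}'};

function trackBrackets(src, stack) {
  stack = stack || [];
  for (var i = 0; i < src.length; i++) {
    var ch = src[i];
    if (CLOSERS.hasOwnProperty(ch)) {
      // Opening bracket: remember which closing bracket is now expected.
      stack.push(CLOSERS[ch]);
    } else if (ch === ')' || ch === ']' || ch === '}') {
      // Closing bracket: it must match the innermost expected one.
      if (stack[stack.length - 1] !== ch) {
        throw new Error('Unexpected "' + ch + '" at index ' + i);
      }
      stack.pop();
    }
  }
  return stack;
}

var stack = trackBrackets('foo(arg1, arg2, {\n  foo: [a, b\n');
console.log(stack); // [ ')', '}', ']' ]
trackBrackets('    c, d]\n  })', stack);
console.log(stack); // []
```

As in the library example above, the stack stores the closing bracket for each open one, so the last element is always the bracket that must be closed next.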

### Custom Delimited Expressions

Find code up to a custom delimiter:

```js
var assert = require('assert');
var parser = require('character-parser');

// EJS-style
var section = parser.parseUntil('foo.bar("%>").baz%> bing bong', '%>');
assert(section.start === 0);
assert(section.end === 17); // exclusive end index
assert(section.src === 'foo.bar("%>").baz');

section = parser.parseUntil('<%foo.bar("%>").baz%> bing bong', '%>', {start: 2});
assert(section.start === 2);
assert(section.end === 19); // exclusive end index
assert(section.src === 'foo.bar("%>").baz');

// Jade-style
section = parser.parseUntil('#[p= [1, 2][i]]', ']', {start: 2});
assert(section.start === 2);
assert(section.end === 14); // exclusive end index
assert(section.src === 'p= [1, 2][i]');

// Dumb parsing
// Stop at the first delimiter encountered, whether or not it is nested.
// This is the character-parser@1 default behavior.
section = parser.parseUntil('#[p= [1, 2][i]]', ']', {start: 2, ignoreNesting: true});
assert(section.start === 2);
assert(section.end === 10); // exclusive end index
assert(section.src === 'p= [1, 2');
```

Delimiters are ignored if they are inside strings or comments.
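
As a rough illustration of why string awareness matters, the sketch below finds a delimiter while skipping single- and double-quoted strings. `findDelimiter` is a hypothetical helper written for this example. It does not handle comments, template strings, regular expressions or bracket nesting, which is what `parseUntil` adds.

```javascript
// Hypothetical helper for illustration; not part of character-parser.
// Returns the index of the first `delimiter` outside a quoted string, or -1.
function findDelimiter(src, delimiter) {
  var quote = null; // quote character of the string we are inside, if any
  for (var i = 0; i < src.length; i++) {
    var ch = src[i];
    if (quote) {
      if (ch === '\\') {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null; // string closed
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch; // string opened
    } else if (src.slice(i, i + delimiter.length) === delimiter) {
      return i;
    }
  }
  return -1;
}

var src = 'foo.bar("%>").baz%> bing bong';
console.log(src.indexOf('%>'));        // 9, inside the string literal
console.log(findDelimiter(src, '%>')); // 17, the real delimiter
```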

## API

All methods may throw an exception if they encounter a syntax error. The exception has an additional `code` property that identifies the kind of error and always starts with `CHARACTER_PARSER:`.

### parse(str, state = defaultState(), options = {start: 0, end: str.length})

Parses `str` from index `options.start` to index `options.end`, and returns the state after parsing.

To parse one string in multiple sections, pass the state returned by each call to the next `parse` call.

Returns a `State` object.
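
A minimal sketch of chunked parsing, assuming only the `parse(str, state)` signature documented above. `parseChunks` is a hypothetical helper, and the parse function is passed in as an argument so the sketch does not depend on how the package is imported.

```javascript
// Hypothetical helper; `parse` is expected to be character-parser's parse
// function (or anything with the same `parse(str, state)` signature).
function parseChunks(parse, chunks) {
  var state; // undefined until the first chunk has been parsed
  chunks.forEach(function (chunk) {
    // The first call lets `parse` create the default state; later calls
    // continue from the state returned by the previous one.
    state = state === undefined ? parse(chunk) : parse(chunk, state);
  });
  return state;
}

// Usage (assumes the package is installed and exports `parse`):
//   var parse = require('character-parser').parse;
//   var state = parseChunks(parse, ['foo(arg1, {', ' a: 1 })']);
//   state.isNesting(); // expected to be false once all brackets are closed
```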

### parseUntil(src, delimiter, options = {start: 0, ignoreLineComment: false, ignoreNesting: false})

Parses the source until the first occurrence of `delimiter` that is not inside a string or a comment.

If `ignoreLineComment` is `true`, a delimiter that occurs inside a line comment still counts.

If `ignoreNesting` is `true`, parsing stops at the first occurrence of the delimiter, regardless of whether it is nested inside other brackets. See the "Dumb parsing" example above.

It returns an object with the structure:

```js
{
  start: 0, // index of the first character of the section
  end: 13, // index of the first character after the end of the section
  src: 'source string'
}
```
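
The returned `start`, `end` and `src` are enough to walk through a template tag by tag. The sketch below shows one way this might look. `extractExpressions` and `naiveParseUntil` are hypothetical names introduced for this example, and the naive stand-in exists only so the snippet runs without the package installed.

```javascript
// Hypothetical helper; not part of character-parser. `parseUntil` is passed in
// and must follow the signature documented above.
function extractExpressions(parseUntil, template, open, close) {
  var results = [];
  var index = template.indexOf(open);
  while (index !== -1) {
    // Parse from just after the opening tag up to the closing delimiter.
    var section = parseUntil(template, close, {start: index + open.length});
    results.push(section.src);
    // `end` is exclusive, so the closing delimiter starts at `section.end`.
    index = template.indexOf(open, section.end + close.length);
  }
  return results;
}

// Naive stand-in so the example runs without the package. Unlike the real
// parseUntil, it does not skip delimiters inside strings or comments.
function naiveParseUntil(src, delimiter, options) {
  var start = (options && options.start) || 0;
  var end = src.indexOf(delimiter, start);
  if (end === -1) throw new Error('Delimiter not found');
  return {start: start, end: end, src: src.slice(start, end)};
}

console.log(extractExpressions(naiveParseUntil, 'a <%x%> b <%y%>', '<%', '%>'));
// [ 'x', 'y' ]
```

Passing the real `parseUntil` instead of the stand-in should make the same loop cope with delimiters that appear inside string literals, as in the EJS-style example earlier.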

### parseChar(character, state = defaultState())

Parses a single character and returns the state. See the State section below for the structure of the returned object. N.B. `character` must be a single character, not a multi-character string.

### defaultState()

Returns a default starting state.

### isPunctuator(character)

Returns `true` if `character` represents punctuation in JavaScript.

### isKeyword(name)

Returns `true` if `name` is a keyword in JavaScript.

### TOKEN_TYPES & BRACKETS

Objects whose values can appear as frames in the `stack` property of a State (documented below).

## State

A state is an object with the following structure:

```js
{
  stack: [], // stack of detected brackets; the outermost is [0]
  regexpStart: false, // true if a slash has just been encountered and a REGEXP state has just been added to the stack

  escaped: false, // true if in a string and the last character was an escape character
  hasDollar: false, // true if in a template string and the last character was a dollar sign

  src: '', // the concatenated source string
  history: '', // reversed `src`
  lastChar: '' // last parsed character
}
```

The `stack` property can contain any of the following:

- Any of the property values of `characterParser.TOKEN_TYPES`
- Any of the property values of `characterParser.BRACKETS` (the end bracket, not the starting bracket)

A state also has the following useful methods:

- `.current()` returns the innermost bracket (i.e. the last stack frame).
- `.isString()` returns `true` if the current location is inside a string.
- `.isComment()` returns `true` if the current location is inside a comment.
- `.isNesting([opts])` returns `true` if the current location is not at the top level, i.e. if the stack is not empty. If `opts.ignoreLineComment` is `true`, line comments are not counted as a level, so for `// a` it will still return `false`.

### Errors

Every error thrown by character-parser has a `code` property that identifies what sort of error was thrown. Errors thrown from `parse` and `parseUntil` also have an `index` property.
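
One way to use the `code` prefix is to separate character-parser errors from unrelated ones. The following is a sketch under that assumption. `isCharacterParserError` and `tryParse` are hypothetical helpers, not part of the package, and no specific error codes are assumed beyond the documented `CHARACTER_PARSER:` prefix.

```javascript
// Hypothetical helper; not part of character-parser.
function isCharacterParserError(err) {
  return Boolean(err) &&
    typeof err.code === 'string' &&
    err.code.indexOf('CHARACTER_PARSER:') === 0;
}

// Runs any parsing call and turns character-parser errors into a result object.
function tryParse(fn) {
  try {
    return {ok: true, value: fn()};
  } catch (err) {
    if (!isCharacterParserError(err)) throw err; // unrelated error, rethrow
    return {ok: false, code: err.code, index: err.index};
  }
}

// Usage (assumes `parser` is the imported character-parser module):
//   var result = tryParse(function () {
//     return parser.parseUntil(source, '%>');
//   });
//   if (!result.ok) console.error(result.code, 'at index', result.index);
```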

## Transition from v1

character-parser@2 changed the API considerably. The notes below should help you move to the new version.

### State Object Changes

Instead of tracking a depth for each kind of bracket, the state now keeps a stack. Some properties were also removed; each can be replaced as follows:

```js
state.lineComment → state.current() === parser.TOKEN_TYPES.LINE_COMMENT
state.blockComment → state.current() === parser.TOKEN_TYPES.BLOCK_COMMENT
state.singleQuote → state.current() === parser.TOKEN_TYPES.SINGLE_QUOTE
state.doubleQuote → state.current() === parser.TOKEN_TYPES.DOUBLE_QUOTE
state.regexp → state.current() === parser.TOKEN_TYPES.REGEXP
```
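
If existing code reads the removed properties in many places, the mapping above could be wrapped in a small shim. This is only a sketch: `legacyFlags` is a hypothetical helper, and `TOKEN_TYPES` is passed in so the snippet does not depend on how the package is imported.

```javascript
// Hypothetical compatibility helper; not part of character-parser.
// Rebuilds the removed v1 boolean flags from a v2 state.
function legacyFlags(state, TOKEN_TYPES) {
  var current = state.current(); // innermost stack frame
  return {
    lineComment: current === TOKEN_TYPES.LINE_COMMENT,
    blockComment: current === TOKEN_TYPES.BLOCK_COMMENT,
    singleQuote: current === TOKEN_TYPES.SINGLE_QUOTE,
    doubleQuote: current === TOKEN_TYPES.DOUBLE_QUOTE,
    regexp: current === TOKEN_TYPES.REGEXP
  };
}

// Usage (assumes `parser` is the imported character-parser module):
//   var flags = legacyFlags(state, parser.TOKEN_TYPES);
//   if (flags.lineComment) { /* inside a `//` comment */ }
```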

### `parseMax`

This function has been removed because its usefulness was questioned. `parseUntil` should be a better fit for the same tasks.

### `parseUntil`

When the delimiter is a bracket, the default behavior now takes nesting into account to determine whether the end has been reached.

To preserve the original behavior, pass `ignoreNesting: true` as an option.

To see the difference between the new and old behaviors, see the "Usage" section earlier.

### `parseMaxBracket`

This function has been merged into `parseUntil`. You can rename the function call directly without any repercussions.

## License

MIT