# character-parser

Parse JavaScript one character at a time to look for snippets in templates. This is not a validator; it is designed to let you robustly extract sections of JavaScript delimited by brackets.

[![Build Status](https://img.shields.io/github/workflow/status/ForbesLindesay/character-parser/Publish%20Canary/master?style=for-the-badge)](https://github.com/ForbesLindesay/character-parser/actions)
[![Rolling Versions](https://img.shields.io/badge/Rolling%20Versions-Enabled-brightgreen?style=for-the-badge)](https://rollingversions.com/ForbesLindesay/character-parser)
[![NPM version](https://img.shields.io/npm/v/character-parser?style=for-the-badge)](https://www.npmjs.com/package/character-parser)

## Installation

    npm install character-parser

## Usage

### Parsing

Work out how the nesting depth changes:

```js
var assert = require('assert');
var parser = require('character-parser');

var state = parser.parse('foo(arg1, arg2, {\n  foo: [a, b\n');
assert.deepEqual(state.stack, [')', '}', ']']);

parser.parse('    c, d]\n  })', state);
assert.deepEqual(state.stack, []);
```

### Custom Delimited Expressions

Find code up to a custom delimiter:

```js
// EJS-style
var section = parser.parseUntil('foo.bar("%>").baz%> bing bong', '%>');
assert(section.start === 0);
assert(section.end === 17); // exclusive end of string
assert(section.src === 'foo.bar("%>").baz');

var section = parser.parseUntil('<%foo.bar("%>").baz%> bing bong', '%>', {start: 2});
assert(section.start === 2);
assert(section.end === 19); // exclusive end of string
assert(section.src === 'foo.bar("%>").baz');

// Jade-style
var section = parser.parseUntil('#[p= [1, 2][i]]', ']', {start: 2});
assert(section.start === 2);
assert(section.end === 14); // exclusive end of string
assert(section.src === 'p= [1, 2][i]');

// Dumb parsing
// Stop at the first delimiter encountered, whether or not it is nested.
// This is the character-parser@1 default behavior.
var section = parser.parseUntil('#[p= [1, 2][i]]', ']', {start: 2, ignoreNesting: true});
assert(section.start === 2);
assert(section.end === 10); // exclusive end of string
assert(section.src === 'p= [1, 2');
```

Delimiters are ignored if they are inside strings or comments.

## API

All methods may throw an exception when they encounter a syntax error. Each exception carries a `code` property that identifies the kind of error and always starts with `CHARACTER_PARSER:`.

### parse(src, state = defaultState(), options = {start: 0, end: src.length})

Parses `src` from index `options.start` to index `options.end` and returns the state after parsing.

To parse one string in multiple sections, pass the state returned by each call to the next `parse` call.

Returns a `State` object.

### parseUntil(src, delimiter, options = {start: 0, ignoreLineComment: false, ignoreNesting: false})

Parses the source until the first occurrence of `delimiter` that is not inside a string or a comment.

If `ignoreLineComment` is `true`, a delimiter that occurs inside a line comment still counts.

If `ignoreNesting` is `true`, parsing stops at the first occurrence of a bracket delimiter, regardless of whether that bracket is nested. See the "Dumb parsing" example above.

It returns an object with the structure:

```js
{
  start: 0, // index of first character of string
  end: 13, // index of first character after the end of string
  src: 'source string'
}
```

### parseChar(character, state = defaultState())

Parses a single character and returns the state. See the State section below for the structure of the returned object. N.B. `character` must be a single character, not a multi-character string.

### defaultState()

Get a default starting state.

### isPunctuator(character)

Returns `true` if `character` represents punctuation in JavaScript.

### isKeyword(name)

Returns `true` if `name` is a keyword in JavaScript.

### TOKEN_TYPES & BRACKETS

Objects whose values can be a frame in the `stack` property of a State (documented below).

## State

A state is an object with the following structure:

```js
{
  stack: [],          // stack of detected brackets; the outermost is [0]
  regexpStart: false, // true if a slash was just encountered and a REGEXP state has just been added to the stack

  escaped: false,     // true if in a string and the last character was an escape character
  hasDollar: false,   // true if in a template string and the last character was a dollar sign

  src: '',            // the concatenated source string
  history: '',        // reversed `src`
  lastChar: ''        // last parsed character
}
```

The `stack` property can contain any of the following:

- Any of the property values of `characterParser.TOKEN_TYPES`
- Any of the property values of `characterParser.BRACKETS` (the end bracket, not the starting bracket)

It also has the following useful methods:

- `.current()` returns the innermost bracket (i.e. the last stack frame).
- `.isString()` returns `true` if the current location is inside a string.
- `.isComment()` returns `true` if the current location is inside a comment.
- `.isNesting([opts])` returns `true` if the current location is not at the top level, i.e. if the stack is not empty. If `opts.ignoreLineComment` is `true`, line comments are not counted as a level, so for `// a` it will still return `false`.

### Errors

All errors thrown by character-parser have a `code` property that identifies what sort of error was thrown. Errors thrown from `parse` and `parseUntil` also have an `index` property.

## Transition from v1

character-parser@2 changed the APIs quite a bit. These notes will help you transition to the new version.

### State Object Changes

Instead of keeping depths of different brackets, we now keep a stack. We also removed some properties:

```js
state.lineComment  → state.current() === parser.TOKEN_TYPES.LINE_COMMENT
state.blockComment → state.current() === parser.TOKEN_TYPES.BLOCK_COMMENT
state.singleQuote  → state.current() === parser.TOKEN_TYPES.SINGLE_QUOTE
state.doubleQuote  → state.current() === parser.TOKEN_TYPES.DOUBLE_QUOTE
state.regexp       → state.current() === parser.TOKEN_TYPES.REGEXP
```

### `parseMax`

This function has been removed because its usefulness was questioned. You should find that `parseUntil` is a better choice for your task.

### `parseUntil`

The default behavior when the delimiter is a bracket has changed: nesting is now taken into account when determining whether the end has been reached.

To preserve the original behavior, pass `ignoreNesting: true` as an option.

To see the difference between the new and old behaviors, see the "Usage" section earlier.

### `parseMaxBracket`

This function has been merged into `parseUntil`. You can directly rename the function call without any repercussions.

## License

MIT