# lop -- parsing library for JavaScript

lop is a library to create parsers using parser combinators with helpful errors.

```javascript
var rules = lop.rules;

function parse(tokens) {
    var parser = new lop.Parser();
    return parser.parseTokens(expressionRule, tokens);
}

// This rule is wrapped inside lop.rule to defer evaluation until
// the rule is used -- otherwise, it would reference integerRule
// and ifRule, which don't exist yet.
var expressionRule = lop.rule(function() {
    return rules.firstOf("expression",
        integerRule,
        ifRule
    );
});

var integerRule = rules.then(
    rules.tokenOfType("integer"),
    function(value) {
        return new IntegerNode(parseInt(value, 10));
    }
);

var ifRule = rules.sequence(
    rules.token("keyword", "if"),
    rules.sequence.cut(),
    rules.sequence.capture(expressionRule),
    rules.token("keyword", "then"),
    rules.sequence.capture(expressionRule),
    rules.token("keyword", "else"),
    rules.sequence.capture(expressionRule)
).map(function(condition, trueBranch, falseBranch) {
    return new IfNode(condition, trueBranch, falseBranch);
});
```

lop tries to provide helpful errors where possible. For instance, in `ifRule`
as defined above, there is a cut following the keyword `if`. Before the cut,
if we fail to match the input, we can backtrack -- in this case, we backtrack
and see if another form of expression might match the input. However, after the
cut, we prevent backtracking. Once we've seen the keyword `if`, there's no doubt
about which sort of expression this is, so if parsing fails later in this rule,
there's no point in backtracking. This allows informative error messages to be
generated: if we try to parse the string `"if 1 42 else 12"`, we get the error:

    Error: File: /tmp/lop-example
    Line number: 1
    Character number: 6:
    Expected keyword "then"
    but got integer "42"

## Tokens

When using a parser built with lop, the input is an array of tokens. A token can be any value so long as it has the property `source`, which must be a `StringSourceRange`. For instance, to create a simple tokeniser that generates a stream of word tokens separated by whitespace tokens:

```javascript
var StringSource = require("lop").StringSource;

function tokeniseString(string) {
    return tokenise(new StringSource(string, "raw string"));
}

function tokenise(source) {
    var string = source.asString();
    var whitespaceRegex = /(\s+)/g;
    var result;
    var start = 0;
    var parts = [];

    while ((result = whitespaceRegex.exec(string)) !== null) {
        parts.push({
            type: "word",
            value: string.substring(start, result.index),
            source: source.range(start, result.index)
        });
        parts.push({
            type: "whitespace",
            value: result[1],
            source: source.range(result.index, whitespaceRegex.lastIndex)
        });
        start = whitespaceRegex.lastIndex;
    }
    parts.push({
        type: "word",
        value: string.substring(start),
        source: source.range(start, string.length)
    });
    parts.push({
        type: "end",
        source: source.range(string.length, string.length)
    });
    return parts.filter(function(part) {
        return part.type !== "word" || part.value !== "";
    });
}
```

lop also defines its own notion of a token. Each instance of `lop.Token` has a type, value, and source, similarly to most of the tokens that would be created by the tokeniser above. For instance, instead of:

    {
        type: "word",
        value: value,
        source: source
    }

you could use:

    new Token("word", value, source)

The main advantage of using `lop.Token` is that you can then use the rules `lop.rules.token` and `lop.rules.tokenOfType` (described later). If you don't use `lop.Token`, you must define your own atomic rules, but you can use the other rules without any modifications.

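As an illustration, plain token objects such as those produced by the tokeniser above can be converted into token instances. The sketch below uses a stand-in `Token` constructor with the same `(type, value, source)` shape described above, so that it runs on its own; real code would use the `Token` constructor exported by lop instead.

```javascript
// Stand-in with the same (type, value, source) shape as lop.Token,
// so this sketch is self-contained; real code would instead use:
// var Token = require("lop").Token;
function Token(type, value, source) {
    this.type = type;
    this.value = value;
    this.source = source;
}

// Convert plain token objects, such as those produced by the
// tokeniser above, into Token instances.
function toTokens(parts) {
    return parts.map(function(part) {
        return new Token(part.type, part.value, part.source);
    });
}
```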
## Parser

To parse an array of tokens, you can call the method `parseTokens` on `lop.Parser`, passing in the parsing rule and the array of tokens. For instance, assuming we already have a `tokenise` function (the one above would do fine):

```javascript
function parseSentence(source) {
    var tokens = tokenise(source);
    var parser = new lop.Parser();
    var parseResult = parser.parseTokens(sentenceRule, tokens);
    if (!parseResult.isSuccess()) {
        throw new Error("Failed to parse: " + describeFailure(parseResult));
    }
    return parseResult.value();
}

function describeFailure(parseResult) {
    return parseResult.errors().map(describeError).join("\n");

    function describeError(error) {
        return error.describe();
    }
}
```

The result of parsing can be a success, a failure, or an error. Failure indicates
that the rule didn't match the input tokens, whereas error indicates that the input
was invalid in some way. In general, rules will backtrack when they
encounter a failure, but will completely abort when they encounter an error.
Each of these results has a number of methods:

```javascript
result.isSuccess() // true for success, false otherwise
result.isFailure() // true for failure, false otherwise
result.isError() // true for error, false otherwise
result.value() // if success, the value that was parsed
result.remaining() // if success, the tokens that weren't consumed by parsing
result.source() // the StringSourceRange containing the consumed tokens
result.errors() // if failure or error, an array of descriptions of the failure/error
```

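Since all three kinds of result share these methods, calling code can dispatch on the predicates. The helper below is an illustrative sketch, not part of lop, and the hand-rolled result objects used to exercise it are hypothetical:

```javascript
// Dispatch on a parse result's outcome. Works with any object that
// implements the methods listed above; not part of lop itself.
function describeResult(result) {
    if (result.isSuccess()) {
        return "success: " + JSON.stringify(result.value());
    } else if (result.isFailure()) {
        return "failure:\n" + result.errors().map(String).join("\n");
    } else {
        return "error:\n" + result.errors().map(String).join("\n");
    }
}
```

In real use, `result` would be the value returned by `parser.parseTokens`.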
The final question is then: how do we define rules for the parser, such as the currently undefined `sentenceRule`?

## Rules

Each rule in lop accepts an iterator over tokens, and returns a result, as
described in the previous section.

### lop.rules.token(*tokenType*, *value*)

Success if the next token has type `tokenType` and value `value`, failure
otherwise. Value on success is the value of the token.

### lop.rules.tokenOfType(*tokenType*)

Success if the next token has type `tokenType`, failure otherwise. Value on
success is the value of the token.

### lop.rules.firstOf(*name*, *subRules*)

Tries each rule in `subRules` on the input tokens in turn, returning the result
from the first sub-rule that returns success or error -- that is, the first
sub-rule that doesn't return failure. If all sub-rules return failure, this
rule returns failure.

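These semantics can be modelled in a few lines. The sketch below is not lop's implementation: it treats a rule as any function from tokens to a result object exposing `isFailure()`, purely to illustrate the ordering and the failure/error distinction described above.

```javascript
// Simplified model of firstOf's semantics (not lop's implementation):
// try each sub-rule in order and return the first result that is not
// a failure; if every sub-rule fails, return the last failure.
function firstOfModel(name, subRules) {
    return function(tokens) {
        var result = null;
        for (var i = 0; i < subRules.length; i += 1) {
            result = subRules[i](tokens);
            if (!result.isFailure()) {
                return result;
            }
        }
        return result;
    };
}
```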
### lop.rules.then(*subRule*, *func*)

Try `subRule` on the input tokens, and if successful, map over the result. For
instance:

```javascript
lop.rules.then(
    lop.rules.tokenOfType("integer"),
    function(tokenValue) {
        return parseInt(tokenValue, 10);
    }
)
```

### lop.rules.optional(*subRule*)

Try `subRule` on the input tokens. If the sub-rule is successful with the value
`value`, then return success with the value `options.some(value)`. If the sub-rule fails, return
success with the value `options.none`. If the sub-rule errors, return that error.
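A model of these semantics, again not lop's implementation: the `options` stand-in below mimics only the some/none shape named above, and the result objects are hypothetical.

```javascript
// Stand-ins mimicking the shape of the options.some/options.none
// values named above; purely for illustration.
var options = {
    some: function(value) { return {isSome: true, value: value}; },
    none: {isSome: false}
};

function successOf(value) {
    return {
        isSuccess: function() { return true; },
        isError: function() { return false; },
        value: function() { return value; }
    };
}

// Simplified model of optional's semantics (not lop's implementation):
// success is wrapped in some, failure becomes success with none, and
// errors pass through untouched.
function optionalModel(subRule) {
    return function(tokens) {
        var result = subRule(tokens);
        if (result.isSuccess()) {
            return successOf(options.some(result.value()));
        } else if (result.isError()) {
            return result;
        } else {
            return successOf(options.none);
        }
    };
}
```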
|