# lop -- parsing library for JavaScript

lop is a library to create parsers using parser combinators with helpful errors.

```javascript
var rules = lop.rules;

function parse(tokens) {
    var parser = new lop.Parser();
    return parser.parseTokens(expressionRule, tokens);
}

// This rule is wrapped inside lop.rule to defer evaluation until
// the rule is used -- otherwise, it would reference integerRule
// and ifRule, which don't exist yet.
var expressionRule = lop.rule(function() {
    return rules.firstOf("expression",
        integerRule,
        ifRule
    );
});

var integerRule = rules.then(
    rules.tokenOfType("integer"),
    function(value) {
        return new IntegerNode(parseInt(value, 10));
    }
);

var ifRule = rules.sequence(
    rules.token("keyword", "if"),
    rules.sequence.cut(),
    rules.sequence.capture(expressionRule),
    rules.token("keyword", "then"),
    rules.sequence.capture(expressionRule),
    rules.token("keyword", "else"),
    rules.sequence.capture(expressionRule)
).map(function(condition, trueBranch, falseBranch) {
    return new IfNode(condition, trueBranch, falseBranch);
});
```

lop tries to provide helpful errors where possible. For instance, in `ifRule`
as defined above, there is a cut following the keyword `if`. Before the cut,
if we fail to match the input, we can backtrack -- in this case, we backtrack
and see if another form of expression might match the input. However, after the
cut, we prevent backtracking. Once we've seen the keyword `if`, there's no doubt
about which sort of expression this is, so if parsing fails later in this rule,
there's no point in backtracking. This allows informative error messages to be
generated: if we try to parse the string `"if 1 42 else 12"`, we get the error:

    Error: File: /tmp/lop-example
    Line number: 1
    Character number: 6:
    Expected keyword "then"
    but got integer "42"

## Tokens

When using a parser built with lop, the input is an array of tokens. A token can be any value so long as it has the property `source`, which must be a `StringSourceRange`. For instance, to create a simple tokeniser that generates a stream of word tokens separated by whitespace tokens:

```javascript
var StringSource = require("lop").StringSource;

function tokeniseString(string) {
    return tokenise(new StringSource(string, "raw string"));
}

function tokenise(source) {
    var string = source.asString();
    var whitespaceRegex = /(\s+)/g;
    var result;
    var start = 0;
    var parts = [];

    while ((result = whitespaceRegex.exec(string)) !== null) {
        parts.push({
            type: "word",
            value: string.substring(start, result.index),
            source: source.range(start, result.index)
        });
        parts.push({
            type: "whitespace",
            value: result[1],
            source: source.range(result.index, whitespaceRegex.lastIndex)
        });
        start = whitespaceRegex.lastIndex;
    }
    parts.push({
        type: "word",
        value: string.substring(start),
        source: source.range(start, string.length)
    });
    parts.push({
        type: "end",
        source: source.range(string.length, string.length)
    });
    return parts.filter(function(part) {
        return part.type !== "word" || part.value !== "";
    });
}
```

lop also defines its own notion of a token. Each instance of `lop.Token` has a type, value, and source, similarly to most of the tokens that would be created by the tokeniser above. For instance, instead of:

    {
        type: "word",
        value: value,
        source: source
    }

you could use:

    new Token("word", value, source)

The main advantage of using `lop.Token` is that you can then use the rules `lop.rules.token` and `lop.rules.tokenOfType` (described later). If you don't use `lop.Token`, you must define your own atomic rules, but you can use the other rules without any modifications.

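As an illustration, plain token objects such as those produced by the tokeniser above can be converted into token instances. The sketch below uses a stand-in `Token` constructor with the same `(type, value, source)` shape described above, so that it runs on its own; real code would use the `Token` constructor exported by lop instead.

```javascript
// Stand-in with the same (type, value, source) shape as lop.Token,
// so this sketch is self-contained; real code would instead use:
// var Token = require("lop").Token;
function Token(type, value, source) {
    this.type = type;
    this.value = value;
    this.source = source;
}

// Convert plain token objects, such as those produced by the
// tokeniser above, into Token instances.
function toTokens(parts) {
    return parts.map(function(part) {
        return new Token(part.type, part.value, part.source);
    });
}
```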
## Parser

To parse an array of tokens, you can call the method `parseTokens` on `lop.Parser`, passing in the parsing rule and the array of tokens. For instance, assuming we already have a `tokenise` function (the one above would do fine):

```javascript
function parseSentence(source) {
    var tokens = tokenise(source);
    var parser = new lop.Parser();
    var parseResult = parser.parseTokens(sentenceRule, tokens);
    if (!parseResult.isSuccess()) {
        throw new Error("Failed to parse: " + describeFailure(parseResult));
    }
    return parseResult.value();
}

function describeFailure(parseResult) {
    return parseResult.errors().map(describeError).join("\n");

    function describeError(error) {
        return error.describe();
    }
}
```

The result of parsing can be a success, a failure, or an error. Failure indicates
that the rule didn't match the input tokens, whereas error indicates that the input
was invalid in some way. In general, rules will backtrack when they
encounter a failure, but will completely abort when they encounter an error.
Each of these results has a number of methods:

```javascript
result.isSuccess() // true for success, false otherwise
result.isFailure() // true for failure, false otherwise
result.isError() // true for error, false otherwise
result.value() // if success, the value that was parsed
result.remaining() // if success, the tokens that weren't consumed by parsing
result.source() // the StringSourceRange containing the consumed tokens
result.errors() // if failure or error, an array of descriptions of the failure/error
```

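Since all three kinds of result share these methods, calling code can dispatch on the predicates. The helper below is an illustrative sketch, not part of lop, and the hand-rolled result objects used to exercise it are hypothetical:

```javascript
// Dispatch on a parse result's outcome. Works with any object that
// implements the methods listed above; not part of lop itself.
function describeResult(result) {
    if (result.isSuccess()) {
        return "success: " + JSON.stringify(result.value());
    } else if (result.isFailure()) {
        return "failure:\n" + result.errors().map(String).join("\n");
    } else {
        return "error:\n" + result.errors().map(String).join("\n");
    }
}
```

In real use, `result` would be the value returned by `parser.parseTokens`.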
The final question is then: how do we define rules for the parser, such as the currently undefined `sentenceRule`?

## Rules

Each rule in lop accepts an iterator over tokens, and returns a result, as
described in the previous section.

### lop.rules.token(*tokenType*, *value*)

Success if the next token has type `tokenType` and value `value`, failure
otherwise. Value on success is the value of the token.

### lop.rules.tokenOfType(*tokenType*)

Success if the next token has type `tokenType`, failure otherwise. Value on
success is the value of the token.

### lop.rules.firstOf(*name*, *subRules*)

Tries each rule in `subRules` on the input tokens in turn, returning the result
from the first sub-rule that returns success or error -- that is, the first
sub-rule that doesn't return failure. If all sub-rules return failure, this
rule returns failure.

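These semantics can be modelled in a few lines. The sketch below is not lop's implementation: it treats a rule as any function from tokens to a result object exposing `isFailure()`, purely to illustrate the ordering and the failure/error distinction described above.

```javascript
// Simplified model of firstOf's semantics (not lop's implementation):
// try each sub-rule in order and return the first result that is not
// a failure; if every sub-rule fails, return the last failure.
function firstOfModel(name, subRules) {
    return function(tokens) {
        var result = null;
        for (var i = 0; i < subRules.length; i += 1) {
            result = subRules[i](tokens);
            if (!result.isFailure()) {
                return result;
            }
        }
        return result;
    };
}
```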
### lop.rules.then(*subRule*, *func*)

Try `subRule` on the input tokens, and if successful, map over the result. For
instance:

```javascript
lop.rules.then(
    lop.rules.tokenOfType("integer"),
    function(tokenValue) {
        return parseInt(tokenValue, 10);
    }
)
```

### lop.rules.optional(*subRule*)

Try `subRule` on the input tokens. If the sub-rule is successful with the value
`value`, then return success with the value `options.some(value)`. If the sub-rule fails, return
success with the value `options.none`. If the sub-rule errors, return that error.
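A model of these semantics, again not lop's implementation: the `options` stand-in below mimics only the some/none shape named above, and the result objects are hypothetical.

```javascript
// Stand-ins mimicking the shape of the options.some/options.none
// values named above; purely for illustration.
var options = {
    some: function(value) { return {isSome: true, value: value}; },
    none: {isSome: false}
};

function successOf(value) {
    return {
        isSuccess: function() { return true; },
        isError: function() { return false; },
        value: function() { return value; }
    };
}

// Simplified model of optional's semantics (not lop's implementation):
// success is wrapped in some, failure becomes success with none, and
// errors pass through untouched.
function optionalModel(subRule) {
    return function(tokens) {
        var result = subRule(tokens);
        if (result.isSuccess()) {
            return successOf(options.some(result.value()));
        } else if (result.isError()) {
            return result;
        } else {
            return successOf(options.none);
        }
    };
}
```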
|