>This repository is part of the [Pelias](https://github.com/pelias/pelias)
>project. Pelias is an open-source, open-data geocoder built by
>[Mapzen](https://www.mapzen.com/) that also powers [Mapzen Search](https://mapzen.com/projects/search). Our
>official user documentation is [here](https://mapzen.com/documentation/search/).

# Pelias Query

Elasticsearch geospatial and linguistic matching queries used by Pelias.

## Installation

```bash
$ npm install pelias-query
```

[![NPM](https://nodei.co/npm/pelias-query.png?downloads=true&stars=true)](https://nodei.co/npm/pelias-query)

## NPM Module

The `pelias-query` npm module can be found here:

[https://npmjs.org/package/pelias-query](https://npmjs.org/package/pelias-query)

#### About

This repository contains all the *geospatial* and *linguistic* matching Elasticsearch queries used in the [Pelias geocoder](https://github.com/pelias/pelias).

An attempt has been made to provide the queries in a more general-purpose fashion. Only a few variables need to be changed in order to use the same queries with any Elasticsearch [schema](https://github.com/pelias/schema).

Feel free to fork the project; pull requests are welcome!

#### Motivation

As the complexity and variability of database queries grow in a large project, their maintenance becomes more and more difficult.

Changes to the controller layer can have a significant impact on the query layer and vice versa, making refactoring a chore.

Additionally, the controller code used to compose these queries becomes a horrible mix of user input validation and query composition logic.

In many cases query logic is simply copy-pasted between queries to ensure validity when it could simply be reused.

This repo aims to solve some of these issues by providing:

- a logical boundary between query composition and input validation.
- a way to notate query variables which is distinct from the RESTful API.
- a method of composing complex queries from smaller components.
- a way of testing/debugging and re-using queries across repos/forks.
- a language which describes the problem domain rather than an individual implementation.

The composition workflow should be instantly familiar to anyone who has used an MVC-type framework before.

### Variables

Variables are used as placeholders so that queries can be pre-built before we know the final values which will be provided by the user.

**note:** Variables can only hold JavaScript primitive values (`string`, `number` *or* `boolean`) or an `array`. No objects allowed.

#### VariableStore API

```javascript
var query = require('pelias-query');

// create a new variable store
var vs = new query.Vars();

// set a variable
vs.var('input:name', 'hackney city farm');

// or
vs.var('input:name').set('hackney city farm');

// get a variable
var a = vs.var('input:name');

// get the primitive value of a variable
a.get(); // hackney city farm
a.toString(); // hackney city farm
a.valueOf(); // hackney city farm
a.toJSON(); // hackney city farm

// check if a variable has been set
vs.isset('input:name'); // true
vs.isset('foo'); // false

// bulk set many variables
vs.set({
  'boundary:rect:top': 1,
  'boundary:rect:right': 2,
  'boundary:rect:bottom': 2,
  'boundary:rect:left': 1
});

// export variables for debugging
var dict = vs.export();
console.log( dict );
```

#### Default Variables

This library provides a dictionary of common [default values](https://github.com/pelias/query/blob/master/defaults.json) which can be used when instantiating a new variable store.

The defaults should be sufficient in the vast majority of cases, but you may elect to change them in order to modify how the queries execute for your specific installation.

**note:** You can override any of the defaults at runtime.

```javascript
var query = require('pelias-query');

// create a new variable store with the defaults
var vs = new query.Vars( query.defaults );

// print all set variables
console.log( vs.export() );
```

#### User Variables

Variables coming from user input should be set on the variable store **whenever they are available**. Below is a list of common user variables which can be set/unset to enable/disable query functionality.

**note:** This list is non-exhaustive; see the validation section of each view to confirm which specific variables it uses (explained below).

```
input:name: 'hackney city farm'

focus:point:lat: 1.1
focus:point:lon: 2.2

input:housenumber: 101
input:street: "hackney road"
input:postcode: "E81DN"

input:country_a: "GBR"
input:country: "hackney"
input:region: "hackney"
input:region_a: "hackney"
input:county: "hackney"
input:localadmin: "hackney"
input:locality: "hackney"
input:neighbourhood: "hackney"

input:categories: "food,education"

boundary:circle:lat: 1
boundary:circle:lon: 2
boundary:circle:radius: "50km"

boundary:rect:top: 1
boundary:rect:right: 2
boundary:rect:bottom: 2
boundary:rect:left: 1

boundary:country: "USA"
```
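
As a rough illustration of setting variables only when they are available, the sketch below maps already-validated request parameters onto variable names from the list above. The `params` object, its property names (`text`, `lat`, `lon`, `country`) and the `toVariables` helper are hypothetical and not part of this library; the resulting plain object is the kind of dictionary that could be handed to `vs.set()`.

```javascript
// Hypothetical request parameters, assumed to be validated elsewhere.
var params = { text: 'hackney city farm', lat: 51.5, lon: -0.06 };

// Build a plain object keyed by variable name. Inputs which are missing
// are simply left out, so any view which depends on them stays disabled.
function toVariables( params ){
  var vars = {};
  if( typeof params.text === 'string' && params.text.length > 0 ){
    vars['input:name'] = params.text;
  }
  if( typeof params.lat === 'number' && typeof params.lon === 'number' ){
    vars['focus:point:lat'] = params.lat;
    vars['focus:point:lon'] = params.lon;
  }
  if( typeof params.country === 'string' ){
    vars['boundary:country'] = params.country;
  }
  return vars;
}

var vars = toVariables( params );
console.log( vars );
```

Because no `country` was supplied, `boundary:country` is never set in this run.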

### Views

Complex queries can be composed of smaller 'views'. These are query blocks which are marked up with placeholder variables and later 'compiled' with the actual user variables in place.

A view is essentially just a function which takes one argument (the variable store `vs`) and returns either `null` (if the required variables are not available) *or* a JavaScript object which encapsulates the view.

```javascript
// example of a 'view'
function ngrams( vs ){

  // validate required params
  if( !vs.isset('input:name') ||
      !vs.isset('ngram:analyzer') ||
      !vs.isset('ngram:field') ||
      !vs.isset('ngram:boost') ){
    return null;
  }

  // base view
  var view = { "match": {} };

  // match query
  view.match[ vs.var('ngram:field') ] = {
    analyzer: vs.var('ngram:analyzer'),
    boost: vs.var('ngram:boost'),
    query: vs.var('input:name')
  };

  return view;
}
```

It's best practice to validate the variable(s) you are going to use at the top of your view so that:

1. it doesn't execute with unmet dependencies *and*
2. it is clear to other developers which variables are required to execute it
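
The validation pattern can be tried without Elasticsearch or even this library. The following is a minimal sketch under stated assumptions: `stubStore` is a hypothetical stand-in which mimics only the `isset()` and `var()` calls used by views, and `postcodeView` together with the `postcode:field` variable are made-up names for illustration.

```javascript
// Hypothetical stand-in for the variable store (illustration only).
function stubStore( values ){
  return {
    isset: function( key ){
      return Object.prototype.hasOwnProperty.call( values, key );
    },
    var: function( key ){
      return values[ key ];
    }
  };
}

// Hypothetical view: validates its variables first, then builds a match query.
function postcodeView( vs ){
  if( !vs.isset('input:postcode') || !vs.isset('postcode:field') ){
    return null;
  }
  var view = { match: {} };
  view.match[ vs.var('postcode:field') ] = { query: vs.var('input:postcode') };
  return view;
}

// unmet dependencies: the view opts out
console.log( postcodeView( stubStore({}) ) ); // null

// all variables present: the view renders
console.log( JSON.stringify( postcodeView( stubStore({
  'input:postcode': 'E81DN',
  'postcode:field': 'address.zip'
}) ) ) );
// {"match":{"address.zip":{"query":"E81DN"}}}
```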

#### View API

An example of the above view rendered would look like this:

```javascript
var query = require('pelias-query'),
    view = query.view.ngrams;

var vs = new query.Vars({
  'input:name': 'hackney city farm',
  'ngram:analyzer': 'standard',
  'ngram:field': 'name.default',
  'ngram:boost': 1
});

var rendered = view( vs );
```

```javascript
{
  "match": {
    "name.default": {
      "analyzer": "standard",
      "boost": 1,
      "query": "hackney city farm"
    }
  }
}
```

### Layouts

Just as with most MVC frameworks, the 'meta' view is called a 'layout'. This is the envelope which wraps all other views.

There is only one layout available in this library (at this time), named the `FilteredBooleanQuery`. This is essentially the most versatile type of Elasticsearch query; all other examples you find online are simplified versions of this `layout`.

```javascript
var query = require('pelias-query');

var q = new query.layout.FilteredBooleanQuery();
```

##### FilteredBooleanQuery API

The `FilteredBooleanQuery` has two different methods for **assigning conditional views** *and* one method for handling the **sorting of results**.

##### .score()

The `.score` method is used to assign views which **will affect the scoring** of the results.

In most cases you can assume that **records which match more of these conditions will appear higher in the results than those which match fewer**.

```javascript
var q = new query.layout.FilteredBooleanQuery();

// a 'should' condition, if a record matches, its score will be increased
q.score(view);

// this is simply a more explicit equivalent of the above ('should' is the default)
q.score(view, 'should');

// in this case we mark the view as a 'must' match condition.
// Matching results will affect the score **but** in this case
// **non-matching records will be removed from the results completely**
q.score(view, 'must');
```

##### .filter()

The `.filter` method is used to assign views which **do not affect the scoring** of results.

**note:** The more results you remove before sorting, using either this method *or* the `.score` method above (with 'must'), the better your query performance will be.

```javascript
var q = new query.layout.FilteredBooleanQuery();

// **non-matching records will be removed from the results completely**
q.filter(view);
```
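
To picture how these two methods relate to an Elasticsearch `bool` query, here is a toy layout. It is only a sketch of the idea and not this library's implementation: `ToyLayout`, the three example views and the `layer` field are all hypothetical. Views which return `null` are skipped, 'should' and 'must' views land in the scoring clauses, and filter views land in the non-scoring `filter` clause.

```javascript
// Toy layout for illustration only; not the library's implementation.
function ToyLayout(){
  this._score = [];
  this._filter = [];
}

// 'should' is the default operator; 'must' has to be requested explicitly
ToyLayout.prototype.score = function( view, operator ){
  this._score.push({ view: view, operator: operator === 'must' ? 'must' : 'should' });
  return this;
};

ToyLayout.prototype.filter = function( view ){
  this._filter.push({ view: view, operator: 'filter' });
  return this;
};

ToyLayout.prototype.render = function( vs ){
  var bool = {};
  this._score.concat( this._filter ).forEach( function( entry ){
    var rendered = entry.view( vs );
    if( rendered === null ){ return; } // view opted out, skip it
    bool[ entry.operator ] = bool[ entry.operator ] || [];
    bool[ entry.operator ].push( rendered );
  });
  return { query: { bool: bool } };
};

// Hypothetical views which ignore the variable store for brevity.
var nameView = function(){ return { match: { 'name.default': 'union square' } }; };
var layerView = function(){ return { term: { layer: 'venue' } }; };
var unavailableView = function(){ return null; };

var rendered = new ToyLayout()
  .score( nameView, 'must' )
  .score( unavailableView )
  .filter( layerView )
  .render( {} );

console.log( JSON.stringify( rendered, null, 2 ) );
```

In this run the rendered `bool` object contains a `must` and a `filter` array but no `should` array, because the only 'should' view returned `null`.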

##### .sort()

The `.sort` method is used to assign views which affect the sorting of results.

In practice this method is not as useful as it sounds; for the most part you should be using the `.score` method above to affect the sorting of results.

This function is only really useful in cases where a 'tiebreaker' is needed. For example: searching 'mcdonalds' may result in several records which scored the same value; in this case we can attempt to 'break the tie'.

**warning:** These functions are computed for every document which matches the above conditions. Adding many `.sort` conditions may have a negative effect on query performance.

```javascript
var q = new query.layout.FilteredBooleanQuery();

// this view is used to mediate 'tied' scoring situations
q.sort( view );
```
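
The tiebreaker idea can be illustrated in plain JavaScript, independent of Elasticsearch. In this sketch the hits, their scores and the choice of distance as the tiebreaker are all made up for illustration: records are ordered by score first, and the secondary criterion only matters when two scores are equal.

```javascript
// Made-up hits: two of them share the same score.
var hits = [
  { name: 'mcdonalds A', score: 2.5, distanceKm: 12 },
  { name: 'mcdonalds B', score: 2.5, distanceKm: 3 },
  { name: 'mcdonalds C', score: 3.1, distanceKm: 40 }
];

// Order by score (descending); fall back to distance (ascending)
// only when the scores are tied.
var sorted = hits.slice().sort( function( a, b ){
  if( a.score !== b.score ){ return b.score - a.score; }
  return a.distanceKm - b.distanceKm;
});

console.log( sorted.map( function( hit ){ return hit.name; } ) );
// → [ 'mcdonalds C', 'mcdonalds B', 'mcdonalds A' ]
```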

### Composing Complex Queries

Great! So with the building blocks above we can start to build composable, testable and re-usable queries.

#### Reverse Geocode

One of the simplest queries to build is a reverse geocoder. In this case we have indexed some documents with a `lat/lon` centroid and we would like to find the single nearest record to an arbitrary point.

```javascript
var query = require('pelias-query'),
    vs = new query.Vars( query.defaults );

// this is our focus point (somewhere in London)
var focus = { lat: 51.5, lon: -0.06 };

/**
  build a query with 2 conditions:
  - (optional) geographic bounds
  - sort results by distance
**/
var q = new query.layout.FilteredBooleanQuery()
  .filter( query.view.boundary_circle )
  .sort( query.view.sort_distance );

// we only want 1 result
vs.var('size', 1);

// set bounding variables
vs.set({
  'boundary:circle:lat': focus.lat,
  'boundary:circle:lon': focus.lon,
  'boundary:circle:radius': '5km'
});

// set focus point
vs.set({
  'focus:point:lat': focus.lat,
  'focus:point:lon': focus.lon
});

// render the query
var rendered = q.render( vs );
```

results in a query such as:

```javascript
{
  "query": {
    "bool": {
      "filter": [
        {
          "geo_distance": {
            "distance": "5km",
            "distance_type": "plane",
            "center_point": {
              "lat": 51.5,
              "lon": -0.06
            }
          }
        }
      ]
    }
  },
  "size": 1,
  "track_scores": true,
  "sort": [
    "_score",
    {
      "_geo_distance": {
        "order": "asc",
        "distance_type": "plane",
        "center_point": {
          "lat": 51.5,
          "lon": -0.06
        }
      }
    }
  ]
}
```

#### Linguistic Search with Local Bias

This example is the most commonly requested full-text search query. In this case we match *all* results but we also apply the following scoring:

1. better linguistic matches rank higher in the results
2. records near the 'focus' point also gain a localized 'boost'

In effect this means that we still show far away places but we also give more priority to local places.

```javascript
var query = require('pelias-query'),
    vs = new query.Vars( query.defaults );

// this is our focus point (somewhere in London)
var focus = { lat: 51.5, lon: -0.06 };

/**
  build a query with 2 conditions:
  - the linguistic matching strategy for scoring (phrase)
  - the geographic decay function (focus)
**/
var q = new query.layout.FilteredBooleanQuery()
  .score( query.view.phrase )
  .score( query.view.focus(query.view.phrase) );

/**
  configure implementation-specific settings (or simply use the defaults):
  - phrase settings
  - focus settings
**/
vs.set({
  'phrase:field': 'phrase.default',
  'phrase:analyzer': 'standard',
  'focus:function': 'gauss',
  'focus:offset': '10km',
  'focus:scale': '100km',
  'focus:decay': 0.4
});

/**
  set the user-specific variables:
  - the input text provided by the user
  - the input point to use for localization
**/
vs.var('input:name', 'union square');
vs.var('focus:point:lat', focus.lat);
vs.var('focus:point:lon', focus.lon);

// render the query
var rendered = q.render( vs );
```

results in a query such as:

```javascript
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "phrase.default": {
              "analyzer": "standard",
              "type": "phrase",
              "boost": 1,
              "slop": 2,
              "query": "union square"
            }
          }
        },
        {
          "function_score": {
            "query": {
              "match": {
                "phrase.default": {
                  "analyzer": "standard",
                  "type": "phrase",
                  "boost": 1,
                  "slop": 2,
                  "query": "union square"
                }
              }
            },
            "functions": [
              {
                "weight": 2,
                "gauss": {
                  "center_point": {
                    "origin": {
                      "lat": 51.5,
                      "lon": -0.06
                    },
                    "offset": "10km",
                    "scale": "100km",
                    "decay": 0.4
                  }
                }
              }
            ],
            "score_mode": "avg",
            "boost_mode": "replace"
          }
        }
      ]
    }
  },
  "size": 10,
  "track_scores": true,
  "sort": [
    "_score"
  ]
}
```
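
To get a feel for what the `focus:offset`, `focus:scale` and `focus:decay` settings do, the sketch below evaluates the Gaussian decay curve as described in the Elasticsearch function score documentation. It only covers the decay value itself, not the `weight` or the score/boost modes, and it assumes distances are already expressed in kilometres; `gaussDecay` is a hypothetical helper, not part of this library.

```javascript
// Gaussian decay as documented for Elasticsearch decay functions:
//   sigma^2 = -scale^2 / ( 2 * ln(decay) )
//   value   = exp( -max(0, distance - offset)^2 / ( 2 * sigma^2 ) )
function gaussDecay( distanceKm, offsetKm, scaleKm, decay ){
  var sigmaSquared = -( scaleKm * scaleKm ) / ( 2 * Math.log( decay ) );
  var d = Math.max( 0, distanceKm - offsetKm );
  return Math.exp( -( d * d ) / ( 2 * sigmaSquared ) );
}

// settings from the example above: offset 10km, scale 100km, decay 0.4
console.log( gaussDecay( 5, 10, 100, 0.4 ) );   // within the offset → 1
console.log( gaussDecay( 110, 10, 100, 0.4 ) ); // offset + scale → approximately 0.4
console.log( gaussDecay( 500, 10, 100, 0.4 ) ); // far away → close to 0
```

In other words, records within 10km of the focus point receive the full value, and a record 110km away (offset plus scale) receives roughly the configured `decay` of 0.4.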

#### More Examples

The above are examples of how you can compose queries which are testable, debuggable and re-usable. They can also be mixed and matched with other queries to build even more complex queries.

Rather than trying to document an exhaustive list of geospatial and linguistic queries here, we have added a bunch of examples in the [examples directory](https://github.com/pelias/query/tree/master/examples).

If you have any further questions please open an issue.

## Contributing

Please fork and pull request against upstream master on a feature branch.

Pretty please, provide unit tests and script fixtures in the `test` directory.

### Running Unit Tests

```bash
$ npm test
```

### Continuous Integration

Travis tests every release against all supported Node.js versions.

[![Build Status](https://travis-ci.org/pelias/query.png?branch=master)](https://travis-ci.org/pelias/query)

### Versioning

We rely on semantic-release and Greenkeeper to maintain our module and dependency versions.

[![Greenkeeper badge](https://badges.greenkeeper.io/pelias/query.svg)](https://greenkeeper.io/)