>This repository is part of the [Pelias](https://github.com/pelias/pelias)
>project. Pelias is an open-source, open-data geocoder built by
>[Mapzen](https://www.mapzen.com/) that also powers [Mapzen Search](https://mapzen.com/projects/search). Our
>official user documentation is [here](https://mapzen.com/documentation/search/).

# Pelias Query

Elasticsearch geospatial and linguistic matching queries used by Pelias.

## Installation

```bash
$ npm install pelias-query
```

[![NPM](https://nodei.co/npm/pelias-query.png?downloads=true&stars=true)](https://nodei.co/npm/pelias-query)

## NPM Module

The `pelias-query` npm module can be found here:

[https://npmjs.org/package/pelias-query](https://npmjs.org/package/pelias-query)

#### About

This repository contains all the *geospatial* and *linguistic* matching Elasticsearch queries used in the [Pelias geocoder](https://github.com/pelias/pelias).

An attempt has been made to provide the queries in a more general-purpose fashion. Only a few variables need to be changed in order to use the same queries with any Elasticsearch [schema](https://github.com/pelias/schema).

Feel free to fork the project; pull requests are welcome!

#### Motivation

As the complexity and variability of database queries grow in a large project, maintaining them becomes more and more difficult.

Changes to the controller layer can have a significant impact on the query layer and vice versa, making refactoring a chore.

Additionally, the controller code used to compose these queries becomes a horrible mix of user input validation and query composition logic.

In many cases, query logic is simply copy-pasted between queries to ensure validity when it could be reused instead.

This repo aims to solve some of these issues by providing:

- a logical boundary between query composition and input validation.
- a way to notate query variables which is distinct from the RESTful API.
- a method of composing complex queries from smaller components.
- a way of testing/debugging and re-using queries across repos/forks.
- a language which describes the problem domain rather than an individual implementation.

The composition workflow should be instantly familiar to anyone who has used an MVC-type framework before.

### Variables

Variables are used as placeholders so that queries can be pre-built before we know the final values the user will provide.

**note:** Variables can only be JavaScript primitive types (`string`, `number` or `boolean`) or arrays. Objects are not allowed.

#### VariableStore API

```javascript
var query = require('pelias-query');

// create a new variable store
var vs = new query.Vars();

// set a variable
vs.var('input:name', 'hackney city farm');

// or
vs.var('input:name').set('hackney city farm');

// get a variable
var a = vs.var('input:name');

// get the primitive value of a variable
a.get(); // hackney city farm
a.toString(); // hackney city farm
a.valueOf(); // hackney city farm
a.toJSON(); // hackney city farm

// check if a variable has been set
vs.isset('input:name'); // true
vs.isset('foo'); // false

// bulk set many variables
vs.set({
  'boundary:rect:top': 1,
  'boundary:rect:right': 2,
  'boundary:rect:bottom': 2,
  'boundary:rect:left': 1
});

// export variables for debugging
var dict = vs.export();
console.log( dict );
```
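To make the API above concrete, here is a minimal, hypothetical stand-in for a variable store. It is *not* the pelias-query implementation: `MiniVars`, `Variable` and `isAllowedValue` are names invented for this sketch, and it assumes that arrays may only contain primitive values.

```javascript
// Hypothetical stand-in for query.Vars; NOT the pelias-query source.

// Assumption: values are strings, numbers, booleans or arrays of those.
function isAllowedValue(value) {
  var primitives = ['string', 'number', 'boolean'];
  if (primitives.indexOf(typeof value) !== -1) { return true; }
  return Array.isArray(value) && value.every(function (item) {
    return primitives.indexOf(typeof item) !== -1;
  });
}

// A single variable; toJSON() lets it serialize as its primitive value.
function Variable() { this.value = undefined; }
Variable.prototype.set = function (value) {
  if (!isAllowedValue(value)) {
    throw new TypeError('only primitives or arrays of primitives are allowed');
  }
  this.value = value;
  return this;
};
Variable.prototype.get = function () { return this.value; };
Variable.prototype.valueOf = function () { return this.value; };
Variable.prototype.toString = function () { return String(this.value); };
Variable.prototype.toJSON = function () { return this.value; };

function MiniVars(initial) {
  this.vars = {};
  if (initial) { this.set(initial); }
}

// var(name) returns the Variable, creating an unset one if needed;
// var(name, value) also assigns the value.
MiniVars.prototype.var = function (name, value) {
  if (!Object.prototype.hasOwnProperty.call(this.vars, name)) {
    this.vars[name] = new Variable();
  }
  if (arguments.length > 1) { this.vars[name].set(value); }
  return this.vars[name];
};

// bulk assignment from a plain object
MiniVars.prototype.set = function (dict) {
  Object.keys(dict).forEach(function (name) { this.var(name, dict[name]); }, this);
  return this;
};

MiniVars.prototype.isset = function (name) {
  return Object.prototype.hasOwnProperty.call(this.vars, name) &&
    this.vars[name].get() !== undefined;
};

// plain-object snapshot of every variable that has a value
MiniVars.prototype.export = function () {
  var out = {};
  Object.keys(this.vars).forEach(function (name) {
    if (this.isset(name)) { out[name] = this.vars[name].get(); }
  }, this);
  return out;
};

var store = new MiniVars({ 'input:name': 'hackney city farm' });
console.log(store.isset('input:name'), store.isset('foo')); // true false
console.log(JSON.stringify(store.var('input:name')));       // "hackney city farm"
```

Creating an unset `Variable` on first access is a design choice of this sketch; it lets `vs.var(name).set(value)` work for names that have not been seen yet, matching the second assignment form shown above.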

#### Default Variables

This library provides a dictionary of common [default values](https://github.com/pelias/query/blob/master/defaults.json) which can be used when instantiating a new variable store.

The defaults should be sufficient in the vast majority of cases, but you may elect to change them in order to modify how the queries execute for your specific installation.

**note:** You can override any of the defaults at runtime.

```javascript
var query = require('pelias-query');

// create a new variable store with the defaults
var vs = new query.Vars( query.defaults );

// print all set variables
console.log( vs.export() );
```
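Conceptually, overriding a default means a later value replaces an earlier one for the same key. The sketch below shows that merge order with plain objects; `exampleDefaults` holds made-up values (the real ones live in `defaults.json`) and `withOverrides` is a hypothetical helper, not part of the library. With the variable store itself you would presumably reach the same result by calling `vs.var(...)` or `vs.set(...)` after construction.

```javascript
// Made-up subset of defaults, for illustration only.
var exampleDefaults = {
  'size': 10,
  'focus:function': 'gauss',
  'focus:scale': '100km'
};

// Hypothetical helper: keys in `overrides` win over keys in `defaults`.
// A new object is returned so the shared defaults are never mutated.
function withOverrides(defaults, overrides) {
  return Object.assign({}, defaults, overrides);
}

var settings = withOverrides(exampleDefaults, { 'focus:scale': '50km' });
console.log(settings['focus:scale']);        // 50km
console.log(exampleDefaults['focus:scale']); // 100km
```

Copying rather than mutating matters if the same defaults object is shared between many requests.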

#### User Variables

Variables coming from user input should be set on the variable store **whenever they are available**. Below is a list of common user variables which can be set or unset to enable or disable query functionality.

**note:** This list is non-exhaustive; see the validation section of each view (explained below) to confirm which specific variables it uses.

```
input:name: 'hackney city farm'

focus:point:lat: 1.1
focus:point:lon: 2.2

input:housenumber: 101
input:street: "hackney road"
input:postcode: "E81DN"

input:country_a: "GBR"
input:country: "hackney"
input:region: "hackney"
input:region_a: "hackney"
input:county: "hackney"
input:localadmin: "hackney"
input:locality: "hackney"
input:neighbourhood: "hackney"

input:categories: "food,education"

boundary:circle:lat: 1
boundary:circle:lon: 2
boundary:circle:radius: "50km"

boundary:rect:top: 1
boundary:rect:right: 2
boundary:rect:bottom: 2
boundary:rect:left: 1

boundary:country: "USA"
```
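Because views switch themselves off when their variables are missing, one reasonable pattern is to copy only the values the user actually supplied. In this sketch the request parameter names (`text`, `focus.point.lat`, and so on) and the `userVariables` helper are illustrative assumptions, not part of pelias-query.

```javascript
// Illustrative mapping from request parameter names to variable names.
var PARAM_TO_VAR = {
  'text': 'input:name',
  'focus.point.lat': 'focus:point:lat',
  'focus.point.lon': 'focus:point:lon',
  'boundary.country': 'boundary:country'
};

// Copy only the parameters that were actually supplied, so that views
// depending on absent variables return null instead of half-rendering.
function userVariables(params) {
  var vars = {};
  Object.keys(PARAM_TO_VAR).forEach(function (param) {
    var value = params[param];
    if (value !== undefined && value !== null && value !== '') {
      vars[PARAM_TO_VAR[param]] = value;
    }
  });
  return vars;
}

var supplied = userVariables({ 'text': 'hackney city farm', 'focus.point.lat': 51.5 });
console.log(supplied);
// the result could then be passed to vs.set( supplied )
```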

### Views

Complex queries can be composed of smaller 'views'. These are query blocks which are marked up with placeholder variables and later 'compiled' with the actual user variables in place.

Each view is essentially just a function which takes one argument (the variable store `vs`) and returns either `null` (if the required variables are not available) *or* a JavaScript object which encapsulates the view.

```javascript
// example of a 'view', exported from its own module
module.exports = function( vs ){

  // validate required params
  if( !vs.isset('input:name') ||
      !vs.isset('ngram:analyzer') ||
      !vs.isset('ngram:field') ||
      !vs.isset('ngram:boost') ){
    return null;
  }

  // base view
  var view = { "match": {} };

  // match query
  view.match[ vs.var('ngram:field') ] = {
    analyzer: vs.var('ngram:analyzer'),
    boost: vs.var('ngram:boost'),
    query: vs.var('input:name')
  };

  return view;
};
```

It's best practice to validate the variable(s) you are going to use at the top of your view so that:

1. it doesn't execute with unmet dependencies *and*
2. it is clear to other developers which variables are required to execute it
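Both goals can be met by keeping the list of required variables right next to the view. The sketch below uses a hypothetical `requires` wrapper (not a pelias-query export) to re-express the ngram view above, plus a hypothetical `stubStore` that returns plain values instead of variable objects so the example runs without the library.

```javascript
// Hypothetical helper (not a pelias-query export): the wrapped view only runs
// when every listed variable is set; otherwise the view returns null.
function requires(names, view) {
  return function (vs) {
    var ready = names.every(function (name) { return vs.isset(name); });
    return ready ? view(vs) : null;
  };
}

// The ngram view from above, with its dependencies declared up front.
var ngrams = requires(
  ['input:name', 'ngram:analyzer', 'ngram:field', 'ngram:boost'],
  function (vs) {
    var view = { match: {} };
    view.match[vs.var('ngram:field')] = {
      analyzer: vs.var('ngram:analyzer'),
      boost: vs.var('ngram:boost'),
      query: vs.var('input:name')
    };
    return view;
  }
);

// Minimal stub store returning plain values, so the sketch runs standalone.
function stubStore(values) {
  return {
    isset: function (name) { return values[name] !== undefined; },
    var: function (name) { return values[name]; }
  };
}

console.log(ngrams(stubStore({ 'input:name': 'hackney city farm' }))); // null
```

Unlike the hand-written `if` block, this wrapper cannot express optional variables; a view that reads optional variables would still need its own checks.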

#### View API

Rendering the view above with a set of example variables looks like this:

```javascript
var query = require('pelias-query'),
    view = query.view.ngrams;

var vs = new query.Vars({
  'input:name': 'hackney city farm',
  'ngram:analyzer': 'standard',
  'ngram:field': 'name.default',
  'ngram:boost': 1
});

var rendered = view( vs );
```

The rendered view is equivalent to:

```javascript
{
  "match": {
    "name.default": {
      "analyzer": "standard",
      "boost": 1,
      "query": "hackney city farm"
    }
  }
}
```
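The view stores variable objects rather than raw strings, yet the rendered output contains plain values. The `toString()` and `toJSON()` methods listed in the VariableStore API are presumably what make this work: in JavaScript a property key is converted with `toString()`, and `JSON.stringify` calls `toJSON()`. A minimal sketch with a hypothetical `Variable` constructor:

```javascript
// Hypothetical minimal variable object, mirroring two methods from the API above.
function Variable(value) { this.value = value; }
Variable.prototype.toString = function () { return String(this.value); };
Variable.prototype.toJSON = function () { return this.value; };

var field = new Variable('name.default');
var ngramView = { match: {} };

// Used as a property key, the Variable is converted via toString().
ngramView.match[field] = {
  analyzer: new Variable('standard'),
  boost: new Variable(1),
  query: new Variable('hackney city farm')
};

// JSON.stringify calls toJSON() on each nested Variable.
var serialized = JSON.stringify(ngramView);
console.log(serialized);
// {"match":{"name.default":{"analyzer":"standard","boost":1,"query":"hackney city farm"}}}
```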

### Layouts

Just as with most MVC frameworks, the 'meta' view is called a 'layout'; this is the envelope which wraps all other views.

There is only one layout available in this library (at this time), named `FilteredBooleanQuery`. This is essentially the most versatile type of Elasticsearch query; other examples you find online are typically simplified versions of this `layout`.

```javascript
var query = require('pelias-query');

var q = new query.layout.FilteredBooleanQuery();
```

##### FilteredBooleanQuery API

The `FilteredBooleanQuery` has two methods for **assigning conditional views** *and* one method for handling the **sorting of results**.

##### .score()

The `.score` method is used to assign views which **will affect the scoring** of the results.

In most cases you can assume that **records which match more of these conditions will appear higher in the results than those which match fewer**.

```javascript
var q = new query.layout.FilteredBooleanQuery();

// a 'should' condition: if a record matches, its score will be increased
q.score(view);

// this is simply a more explicit equivalent of the above ('should' is the default)
q.score(view, 'should');

// in this case we mark the view as a 'must' match condition.
// Matching records will affect the score, but in this case
// non-matching records will be removed from the results completely
q.score(view, 'must');
```
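As a rough mental model only (real Elasticsearch relevance scoring is far more involved), the difference between 'should' and 'must' can be simulated in memory: every matched condition adds to a document's score, but a failed 'must' condition removes the document. The `rank` function, the predicate-style conditions and the sample documents are all hypothetical.

```javascript
// Rough in-memory analogy, NOT how Elasticsearch computes relevance:
// each matched condition adds 1 to a document's score.
function rank(docs, conditions) {
  return docs
    // a failed 'must' condition removes the document entirely
    .filter(function (doc) {
      return conditions.every(function (c) {
        return c.type !== 'must' || c.test(doc);
      });
    })
    // every matched condition ('should' or 'must') adds to the score
    .map(function (doc) {
      var score = conditions.filter(function (c) { return c.test(doc); }).length;
      return { doc: doc, score: score };
    })
    // higher scores first
    .sort(function (a, b) { return b.score - a.score; });
}

var docs = [
  { name: 'hackney city farm', locality: 'london' },
  { name: 'hackney wick', locality: 'london' },
  { name: 'city farm', locality: 'bristol' }
];

var ranked = rank(docs, [
  { type: 'must', test: function (d) { return d.name.indexOf('farm') !== -1; } },
  { type: 'should', test: function (d) { return d.locality === 'london'; } }
]);

ranked.forEach(function (r) { console.log(r.score, r.doc.name); });
```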

##### .filter()

The `.filter` method is used to assign views which **do not affect the scoring** of results.

**note:** The more results you remove before sorting (using either this method *or* the `.score` method above with 'must'), the better your query performance will be.

```javascript
var q = new query.layout.FilteredBooleanQuery();

// non-matching records will be removed from the results completely
q.filter(view);
```
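The rendered reverse-geocode query further down places its `boundary_circle` filter under `bool.filter`. The hypothetical `renderBool` function below sketches that kind of bucketing, including skipping views that return `null`. It is not the library's `FilteredBooleanQuery` code, and the field name `country_a` is an assumption made for illustration.

```javascript
// Hypothetical renderer: groups rendered views by clause type
// ('must', 'should' or 'filter') inside an Elasticsearch bool query.
function renderBool(clauses, vs) {
  var bool = {};
  clauses.forEach(function (clause) {
    var rendered = clause.view(vs);
    if (rendered === null) { return; } // required variables missing: skip view
    if (!bool[clause.type]) { bool[clause.type] = []; }
    bool[clause.type].push(rendered);
  });
  return { query: { bool: bool } };
}

// Plain-object stand-in for a variable store.
var exampleVars = { 'input:name': 'union square', 'boundary:country': 'USA' };

var exampleBody = renderBool([
  { type: 'should', view: function (v) {
    return { match: { 'name.default': v['input:name'] } };
  } },
  { type: 'filter', view: function (v) {
    return v['boundary:country'] ? { match: { 'country_a': v['boundary:country'] } } : null;
  } },
  { type: 'filter', view: function (v) {
    // no circle variables were supplied, so this view opts out
    return v['boundary:circle:lat'] === undefined ? null : {
      geo_distance: {
        distance: v['boundary:circle:radius'],
        center_point: { lat: v['boundary:circle:lat'], lon: v['boundary:circle:lon'] }
      }
    };
  } }
], exampleVars);

console.log(JSON.stringify(exampleBody, null, 2));
```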

##### .sort()

The `.sort` method is used to assign views which affect the sorting of results.

In practice, this method is not as useful as it sounds; for the most part you should use the `.score` method above to influence the order of results.

This method is only really useful in cases where a 'tiebreaker' is needed. For example, searching 'mcdonalds' may return several records with the same score; in this case we can attempt to 'break the tie'.

**warning:** Sort views are computed for every document which matches the conditions above. Adding many `.sort` conditions may have a negative effect on query performance.

```javascript
var q = new query.layout.FilteredBooleanQuery();

// this view is used to mediate 'tied' scoring situations
q.sort( view );
```
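The reverse-geocode query further down sorts on `_score` first and distance second. The comparator below mimics that tiebreak in plain JavaScript on made-up hits, purely to show the ordering logic; note that it runs in application code, whereas `.sort` views run inside Elasticsearch.

```javascript
// Order by score (descending); only when scores tie, fall back to
// distance (ascending). Scores and distances are made-up sample data.
function byScoreThenDistance(a, b) {
  if (b.score !== a.score) { return b.score - a.score; }
  return a.distanceKm - b.distanceKm;
}

var hits = [
  { name: 'mcdonalds (far)', score: 3.2, distanceKm: 12.4 },
  { name: 'mcdonalds (near)', score: 3.2, distanceKm: 0.8 },
  { name: 'mcdonalds restaurant', score: 4.1, distanceKm: 30.0 }
];

// slice() so the original array keeps its order
var ordered = hits.slice().sort(byScoreThenDistance);
ordered.forEach(function (h) { console.log(h.name); });
```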

### Composing Complex Queries

Great! So with the building blocks above we can start to build composable, testable and re-usable queries.

#### Reverse Geocode

One of the simplest queries to build is a reverse geocoder. In this case, we have indexed some documents with a `lat/lon` centroid and we would like to find the single nearest record to an arbitrary point.

```javascript
var query = require('pelias-query'),
    vs = new query.Vars( query.defaults );

// this is our focus point (somewhere in London)
var focus = { lat: 51.5, lon: -0.06 };

/**
  build a query with 2 conditions:
  - (optional) geographic bounds
  - sort results by distance
**/
var q = new query.layout.FilteredBooleanQuery()
  .filter( query.view.boundary_circle )
  .sort( query.view.sort_distance );

// we only want 1 result
vs.var('size', 1);

// set bounding variables
vs.set({
  'boundary:circle:lat': focus.lat,
  'boundary:circle:lon': focus.lon,
  'boundary:circle:radius': '5km'
});

// set focus point
vs.set({
  'focus:point:lat': focus.lat,
  'focus:point:lon': focus.lon
});

// render the query
var rendered = q.render( vs );
```

This results in a query such as:

```javascript
{
  "query": {
    "bool": {
      "filter": [
        {
          "geo_distance": {
            "distance": "5km",
            "distance_type": "plane",
            "center_point": {
              "lat": 51.5,
              "lon": -0.06
            }
          }
        }
      ]
    }
  },
  "size": 1,
  "track_scores": true,
  "sort": [
    "_score",
    {
      "_geo_distance": {
        "order": "asc",
        "distance_type": "plane",
        "center_point": {
          "lat": 51.5,
          "lon": -0.06
        }
      }
    }
  ]
}
```
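To sanity-check what this query asks for, the same logic can be approximated in memory: keep documents within 5km of the focus point, sort by distance and keep one. This sketch swaps in the haversine great-circle formula (mean Earth radius of 6371 km) for Elasticsearch's `plane` distance type, and the sample points are made up. Because a filter-only query gives every hit the same score, the distance sort presumably decides the order, which is why `_score` is ignored here.

```javascript
// Great-circle distance in km using the haversine formula.
function haversineKm(a, b) {
  var toRad = Math.PI / 180;
  var dLat = (b.lat - a.lat) * toRad;
  var dLon = (b.lon - a.lon) * toRad;
  var h = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(a.lat * toRad) * Math.cos(b.lat * toRad) *
          Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

// In-memory analogue of the query: keep documents within the radius
// (boundary_circle), sort by distance (sort_distance), take `size` hits.
function reverse(docs, focus, radiusKm, size) {
  return docs
    .map(function (doc) { return { doc: doc, km: haversineKm(focus, doc) }; })
    .filter(function (hit) { return hit.km <= radiusKm; })
    .sort(function (x, y) { return x.km - y.km; })
    .slice(0, size);
}

var sample = [ // made-up sample points
  { name: 'a', lat: 51.51, lon: -0.06 },
  { name: 'b', lat: 51.5, lon: -0.05 },
  { name: 'c', lat: 52.0, lon: -0.06 }
];

var nearest = reverse(sample, { lat: 51.5, lon: -0.06 }, 5, 1);
console.log(nearest[0].doc.name, nearest[0].km.toFixed(2) + 'km');
```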

#### Linguistic Search with Local Bias

This example is the most commonly requested full-text search query. In this case, we match *all* results but also apply the following scoring:

1. better linguistic matches rank higher in the results
2. records near the 'focus' point also gain a localized 'boost'

In effect, this means that we still show far-away places but give more priority to local places.

```javascript
var query = require('pelias-query'),
    vs = new query.Vars( query.defaults );

// this is our focus point (somewhere in London)
var focus = { lat: 51.5, lon: -0.06 };

/**
  build a query with 2 conditions:
  - the linguistic matching strategy for scoring (phrase)
  - the geographic decay function (focus)
**/
var q = new query.layout.FilteredBooleanQuery()
  .score( query.view.phrase )
  .score( query.view.focus(query.view.phrase) );

/**
  configure implementation-specific settings (or simply use the defaults):
  - phrase settings
  - focus settings
**/
vs.set({
  'phrase:field': 'phrase.default',
  'phrase:analyzer': 'standard',
  'focus:function': 'gauss',
  'focus:offset': '10km',
  'focus:scale': '100km',
  'focus:decay': 0.4
});

/**
  set the user-specific variables:
  - the input text provided by the user
  - the input point to use for localization
**/
vs.var('input:name', 'union square');
vs.var('focus:point:lat', focus.lat);
vs.var('focus:point:lon', focus.lon);

// render the query
var rendered = q.render( vs );
```

This results in a query such as:

```javascript
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "phrase.default": {
              "analyzer": "standard",
              "type": "phrase",
              "boost": 1,
              "slop": 2,
              "query": "union square"
            }
          }
        },
        {
          "function_score": {
            "query": {
              "match": {
                "phrase.default": {
                  "analyzer": "standard",
                  "type": "phrase",
                  "boost": 1,
                  "slop": 2,
                  "query": "union square"
                }
              }
            },
            "functions": [
              {
                "weight": 2,
                "gauss": {
                  "center_point": {
                    "origin": {
                      "lat": 51.5,
                      "lon": -0.06
                    },
                    "offset": "10km",
                    "scale": "100km",
                    "decay": 0.4
                  }
                }
              }
            ],
            "score_mode": "avg",
            "boost_mode": "replace"
          }
        }
      ]
    }
  },
  "size": 10,
  "track_scores": true,
  "sort": [
    "_score"
  ]
}
```
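The `gauss` function above turns distance from the focus point into a multiplier between 0 and 1. The sketch below follows the Gaussian decay formula given in the Elasticsearch `function_score` documentation, using plain kilometre numbers instead of distance strings (a simplification); `gaussDecay` is a hypothetical helper. With these settings, places within 10km (the offset) get the full multiplier, and at 110km (offset plus scale) the multiplier drops to the decay value, 0.4.

```javascript
// Gaussian decay, per the Elasticsearch function_score documentation:
//   sigma^2 = -scale^2 / (2 * ln(decay))
//   score   = exp(-max(0, distance - offset)^2 / (2 * sigma^2))
function gaussDecay(distanceKm, offsetKm, scaleKm, decay) {
  var sigmaSq = -(scaleKm * scaleKm) / (2 * Math.log(decay));
  var d = Math.max(0, distanceKm - offsetKm);
  return Math.exp(-(d * d) / (2 * sigmaSq));
}

// Settings from the example above: offset 10km, scale 100km, decay 0.4.
[0, 10, 60, 110, 300].forEach(function (km) {
  console.log(km + 'km -> ' + gaussDecay(km, 10, 100, 0.4).toFixed(3));
});
```

How this multiplier combines with the phrase score is governed by `score_mode` and `boost_mode` (here `avg` and `replace`).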

#### More Examples

The examples above show how you can compose queries which are testable, debuggable and re-usable; they can also be mixed and matched with other queries to build even more complex queries.

Rather than trying to document an exhaustive list of geospatial and linguistic queries here, we have added a bunch of examples in the [examples directory](https://github.com/pelias/query/tree/master/examples).

If you have any further questions, please open an issue.

## Contributing

Please fork the repository and open a pull request against upstream master from a feature branch.

Pretty please, provide unit tests and script fixtures in the `test` directory.

### Running Unit Tests

```bash
$ npm test
```

### Continuous Integration

Travis tests every release against all supported Node.js versions.

[![Build Status](https://travis-ci.org/pelias/query.png?branch=master)](https://travis-ci.org/pelias/query)

### Versioning

We rely on semantic-release and Greenkeeper to maintain our module and dependency versions.

[![Greenkeeper badge](https://badges.greenkeeper.io/pelias/query.svg)](https://greenkeeper.io/)