_[Home](index.html) > Integration with GraphDB Lucene connector_

# Integration with GraphDB Lucene Connector

## Introduction

GraphDB provides an integration with Lucene using the [GraphDB Lucene Connector](http://graphdb.ontotext.com/documentation/free/lucene-graphdb-connector.html). This connector allows to create powerful full-text indexes of text properties in the graph, and write queries that combine both full-text and graph criteria, such as _"Give me the list of museums that display an artwork whose title contains word 'peace'"_

Sparnatural can be integrated with such a GraphDB Lucene index both to feed autocomplete fields and also for SPARQL query generation as described below.

## Create the index

Refer to the [GraphDB Lucene Connector documentation](http://graphdb.ontotext.com/documentation/free/lucene-graphdb-connector.html) for complete details. Here is the idea:

1. Determine the URI of the class for which you want to build the index;
2. List each literal property or path to store in the index;
    - Typically labels, names, titles, definitions, abstract, etc. 
    - but also the literal of _related entities_ in the graph, such as the name of the place where an event took place, the name of the author of a work, the labels of subject concepts, etc.
    - Each of these properties or path will go in a separate field in the index;
3. Create a "catchAll" field named `text` that will concatenate every other field in a single one;
4. Set a custom _language_ and _Analyzer_ if you need to have language-related indexing such as accent-folding in French;

## Example: create an index on SKOS Concepts in French

### Create individual fields

1. In GraphDB Workbench go to "Setup > Connectors", and create a new Lucene Connector;
2. Set its name to `ConceptIndex`
3. In field languages enter `fr`
3. In the "Types" field enter the full URI to index all SKOS Concepts : http://www.w3.org/2004/02/skos/core#Concept
4. Fill in the "Fields" entry to index `skos:prefLabel`s :
    - Field name = prefLabel
    - Property chain = http://www.w3.org/2004/02/skos/core#prefLabel
    - Uncheck the "facet" checkbox

![](/assets/images/graphdb-lucene-01.png)

5. Add a new "Field" entry by clicking on the "+" on the right, this time for `skos:altLabel`s :
    - Field name = altLabel
    - Property chain = http://www.w3.org/2004/02/skos/core#altLabel
    - Uncheck the "facet" checkbox
6. Repeat for `skos:hiddenLabel` if needed
7. `skos:definition` if needed
8. `skos:scopeNote` if needed
9. `skos:example` if needed

### Create catch-all field

Then we need to create a new field `text` that will aggregate the content of every other fields. To do that declare a new "virtual field" named `text/prefLabel` to indicate that the content or field `prefLabel` should be copied in field `text` (refer to the parts about [copy-fields](http://graphdb.ontotext.com/documentation/free/lucene-graphdb-connector.html#copy-fields) combined with [multiple property paths](http://graphdb.ontotext.com/documentation/free/lucene-graphdb-connector.html#multiple-property-chains-per-field) per fields for details):
1. Create new Field named `text/prefLabel`
2. In property path, enter `@prefLabel`; this must correspond to the name of one of the fields created earlier, so if you chose different names, adjust accordingly;
3. Uncheck "facet"
4. Repeat by adding a new Field again, named `text/altLabel` and property path `@altLabel`
5. Repeat with `text/hiddenLabel` and property path `@hiddenLabel`
6. Repeat with `text/definition` and property path `@definition`
7. Repeat with `text/scopeNote` and property path `@scopeNote`
8. Repeat with `text/example` and property path `@example`

Here is how this part looks like:

![](/assets/images/graphdb-lucene-02.png)

### Set language and Analyzer

- In the `Languages` field, set the value `fr`
- In the `Analyzer` field enter the value for the Lucene French Analyzer `org.apache.lucene.analysis.fr.FrenchAnalyzer`. If you don't do that the search will be accent-sensitive (e.g. a search on "metal" will not match "métal");

### Test a SPARQL query

Test a SPARQL search like the following :

```
PREFIX : <http://www.ontotext.com/connectors/lucene#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?uri ?label ?snippetField ?snippet WHERE  {
    # replace "ConceptIndex" with the name of the index at the end of this URI
    ?search a <http://www.ontotext.com/connectors/lucene/instance#ConceptIndex> .
    # replace with a string to search. Also try with different fields, e.g. "altLabel:foo" or "text:foo" or with no field at all e.g. "foo"
    ?search :query "prefLabel:foo" .
    ?search :entities ?uri .
    ?uri :snippets _:s .
    _:s :snippetField ?snippetField ;
        :snippetText ?snippet .
    ?uri skos:prefLabel ?prefLabel . FILTER(lang(?prefLabel) = $lang)
```

## Feeding an autocomplete field in Sparnatural with SPARQL using GraphDB Lucene Connector

The principle is the following:
1. As stated in the [Sparnatural configuration documentation for your own queries](JSON-based-configuration#your-own-sparql-query), the SPARQL query MUST return 2 variables : `?uri` and `?label`;
2. Set a custom SPARQL query string to the autocomplete field definition, using special `:query` operators to query the index;
3. Add a "*" after the search key to search for the beginning of words;
4. Query the catch-all field, so `text` in our case;
5. Generate a `label` variable by concatenating the skos:prefLabel predicate of the entity with its snippet from the search results

Here is an example query:

```
PREFIX : <http://www.ontotext.com/connectors/lucene#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?uri ?label WHERE  {
    # replace "ConceptIndex" with the name of the index at the end of this URI
    ?search a <http://www.ontotext.com/connectors/lucene/instance#ConceptIndex> .
    # Replace 'foo' with a test query, but keep the final "*" character
    ?search :query "text:foo*" .
    ?search :entities ?uri .
    ?uri :snippets _:s .
    _:s :snippetField ?snippetField ;
        :snippetText ?snippet .
    # get the skos:prefLabel predicate of the concept in the proper language
    ?uri skos:prefLabel ?prefLabel . FILTER(lang(?prefLabel) = "fr")
    # generate a "label" to display in autocomplete result, by concatenating skos:prefLabel, line break, and snippet in small font
    BIND(CONCAT(STR(?prefLabel), '<br /><small>', ?snippet ,'</small>') AS ?label)
}
```

And here is a query integrated in a [JSON-LD configuration of Sparnatural](JSON-based-configuration). Note how it uses the placeholders `$domain`, `$range`, `$property`, `$key` and `$lang` that gets replaced at query time with actual values:

```json
    {
      "@id" : "http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto#type-decouverte",
      "@type" : "ObjectProperty",
      "subPropertyOf" : "sparnatural:AutocompleteProperty",
      "label": [
        {"@value" : "discovery type","@language" : "en"},
        {"@value" : "type de découverte","@language" : "fr"}
      ],
      "domain": "http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto#Site",
      "range": "http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto#Concept",
      "sparqlString": "<http://www.cidoc-crm.org/cidoc-crm/P2_has_type>",
      "datasource": { 
        "queryString": "\
PREFIX : <http://www.ontotext.com/connectors/lucene#>\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\
SELECT DISTINCT ?uri ?label WHERE  {\
    ?something a $domain .\
    ?something $property ?uri .\
    ?search a <http://www.ontotext.com/connectors/lucene/instance#ConceptIndex> .\
    ?search :query \"text:$key*\" .\
    ?search :entities ?uri .\
    ?uri :snippets _:s .\
    _:s :snippetField ?snippetField ;\
        :snippetText ?snippet .\
    ?uri skos:prefLabel ?prefLabel . FILTER(lang(?prefLabel) = $lang)\
    BIND(CONCAT(STR(?prefLabel), '<br /><small>', ?snippet ,'</small>') AS ?label)\
}"
      }
```

## Integration with SPARQL query generation

You can configure Sparnatural to generate SPARQL queries that will query the full-text index. To do so:
1. Create a class in Sparnatural configuration **with the URI of index**, e.g. `http://www.ontotext.com/connectors/lucene/instance#ConceptIndex`. Create this class as a `subClassOf http://www.w3.org/2000/01/rdf-schema#Literal`, and give it a label like "Full-text Search...";
2. Create properties that correspond to the **fields in the index**. The property URIs **must end with the index field name, after the last "#" or last "/"**. e.g. `http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto/search#discovery` will query the field `discovery`
3. Set the property as `subPropertyOf sparnatural:GraphDBSearchProperty`; this parameter indicates to Sparnatural that the query to generate must use the GraphDB syntax and not the usual FILTER syntax;
4. Create multiple properties for each index field, so that the user can choose which field to query.

Here is an example configuration with 2 properties configured to query fields `commune` and `discovery`, and a third property named "all fields" / "tous les champs" to query the catch-all field:
```
    {
      "@id" : "http://www.ontotext.com/connectors/lucene/instance#SiteIndex",
      "@type" : "Class",
      "subClassOf" : "http://www.w3.org/2000/01/rdf-schema#Literal",
      "label": [
        {"@value" : "Full-text search...","@language" : "en"},
        {"@value" : "Recherche plein-texte...","@language" : "fr"}
      ],
      "faIcon":  "fad fa-search"
    },
...
     {
      "@id" : "http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto/search#commune",
      "@type" : "ObjectProperty",
      "subPropertyOf" : "sparnatural:GraphDBSearchProperty",
      "label": [
        {"@value" : "city","@language" : "en"},
        {"@value" : "commune","@language" : "fr"}
      ],
      "domain": "http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto#Site",
      "range": "http://www.ontotext.com/connectors/lucene/instance#SiteIndex"
    },
     {
      "@id" : "http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto/search#discovery",
      "@type" : "ObjectProperty",
      "subPropertyOf" : "sparnatural:GraphDBSearchProperty",
      "label": [
        {"@value" : "discovery","@language" : "en"},
        {"@value" : "type de découverte","@language" : "fr"}
      ],
      "domain": "http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto#Site",
      "range": "http://www.ontotext.com/connectors/lucene/instance#SiteIndex"
    },
     {
      "@id" : "http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto/search#text",
      "@type" : "ObjectProperty",
      "subPropertyOf" : "sparnatural:GraphDBSearchProperty",
      "label": [
        {"@value" : "any field","@language" : "en"},
        {"@value" : "tous les champs","@language" : "fr"}
      ],
      "domain": "http://labs.sparna.fr/sparnatural-demo-graphdb-openarchaeo/onto#Site",
      "range": "http://www.ontotext.com/connectors/lucene/instance#SiteIndex"
    },
```

Which will give the following end-user behavior:

![](/assets/images/graphdb-lucene-03.png)