Elasticsearch extensions for QPL

From wiki.searchtechnologies.com



Revision as of 17:11, 3 March 2017

Enterprise Add-On Feature

UNDER CONSTRUCTION:

This page outlines the basic Elasticsearch extensions for QPL, which are accessible through the "es" variable.

The "es" Variable

QPL scripts have access to a special "es" variable that exposes extensions specific to Elasticsearch. Use the "es." prefix to call them, as in the examples on this page.

Query Parsing

In Elasticsearch, the "es" variable is itself a tokenizer and supports the standard tokenization methods:

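 // Tokenize text without a field context
 def tokens = es.tokenize("this is some-text to tokenize");
 // Tokenize text as if it were in the "title" field
 def titleTokens = es.tokenize("title", "This is some-text to tokenize as if it were in the title field");
 // Look up a field's type, and check whether a field is valid
 def fieldT = es.getFieldType("customerCount");
 def hasField = es.validField("customerCount");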


In addition, the es variable can be sent directly to the standard QPL Query parser:

 def q = parseQuery("text_entry", es, "title:(george and washington)");

Or when using the makeParser utility (for re-usable query parsers):

 def myParser = makeParser(extended:true, tokenizer:es, customOps:true);
 def myQuery = myParser.parse("title:(george and washington)");

Document Level Security

(UNDER CONSTRUCTION, NOT AVAILABLE YET)

Option 1:

 QPLSecurityExpression = es.securityFilter("username")
 list = es.groupExpansion("username")
 return and(query, QPLSecurityExpression)

Group expansion

Security Filter

Option 2:

Turn on the document-level security filter with a plug-in option. All queries are then automatically filtered with the security filter.

The Elasticsearch Client

To fetch the standard Elasticsearch client, use:

 es.client

This returns an instance of the "Client" class, the Elasticsearch Java class that provides access to all standard Elasticsearch client functions.

Embedded JSON Type

The embedded JSON type project converts an array of JSON objects into XML, so that the resulting XML can be tokenized using the XML tokenizer in the 'lucene-extensions' project. For example, the following array of objects in JSON:

   "categories": [
       {
           "name": "category1",
           "description": "the quick brown fox jumped over the lazy dog"
       },
       {
           "name": "category2",
           "description": "hello world"
       }
   ]

will be converted to this XML:

   <?xml version="1.0" encoding="UTF-8"?>
   <root>
     <categories>
       <description>the quick brown fox jumped over the lazy dog</description>
       <name>category1</name>
     </categories>
     <categories>
       <description>hello world</description>
       <name>category2</name>
     </categories>
   </root>

A 'categories' field is added to the document with the XML string as its value. The search performance of the embedded type is expected to be better than that of the native nested type in Elasticsearch, although this has yet to be proven.
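
The conversion above can be sketched in a few lines of Python. This is an illustration only, not the plugin's implementation: the function name `embedded_json_to_xml` is hypothetical, and writing child elements in sorted key order is an assumption inferred from the sample output, where `description` precedes `name`.

```python
import json
import xml.etree.ElementTree as ET


def embedded_json_to_xml(field_name, objects):
    """Convert a list of flat JSON objects into one XML string.

    Hypothetical helper for illustration; not the plugin's actual code.
    """
    root = ET.Element("root")
    for obj in objects:
        # Each object in the array becomes one <field_name> element.
        item = ET.SubElement(root, field_name)
        # Assumption: keys are written in sorted order, matching the
        # sample output above (description before name).
        for key in sorted(obj):
            ET.SubElement(item, key).text = str(obj[key])
    return ET.tostring(root, encoding="unicode")


categories = json.loads("""
[
    {"name": "category1",
     "description": "the quick brown fox jumped over the lazy dog"},
    {"name": "category2", "description": "hello world"}
]
""")

xml_string = embedded_json_to_xml("categories", categories)
print(xml_string)
```

The sketch prints the XML on a single line and without the XML declaration; the element structure is the same as in the sample above.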

The embedded JSON type has been written for and tested against Elasticsearch 1.5.0, which is the current release at the time of writing. Although untested, the plugin may work with earlier versions of Elasticsearch.

Projects

Elasticsearch Lucene Extensions

A version of the 'lucene-extensions' project has been created to work with Elasticsearch (elasticsearch-lucene-extensions). The embedded JSON type project depends on the 'elasticsearch-lucene-extensions' project. The 'elasticsearch-lucene-extensions' project is packaged as an Elasticsearch plugin and registers the XML Analyzer (and in turn the XML Tokenizer) with Elasticsearch on startup.

Embedded JSON Type

The embedded JSON type is also an Elasticsearch plugin. It has a number of dependencies (including the 'elasticsearch-lucene-extensions' project), most of which are included in the package for convenience. The 'elasticsearch' and 'elasticsearch-lucene-extensions' dependencies are not included: Elasticsearch is already provided, and 'elasticsearch-lucene-extensions' is a plugin that must be installed independently.

The embedded JSON type plugin parses any field that has been declared with the type. During parsing it converts an array of objects to an XML string; if the value is anything other than an array of objects, it returns an error. The XML string is added to the document as a field. By default, the embedded JSON type is configured to use the XML Analyzer, and in turn the XML Tokenizer.
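
The "array of objects" rule can be illustrated with a small Python check. This is a sketch of the rule as stated, not the plugin's code: the function names and error messages are hypothetical, and treating an empty array as valid is an assumption, since the behavior for that case is not documented here.

```python
def check_embedded_json(value):
    """Return value unchanged if it is a list of JSON objects, else raise.

    Hypothetical helper; the real plugin reports its own error.
    """
    if not isinstance(value, list):
        raise ValueError(
            "expected an array of objects, got %s" % type(value).__name__
        )
    # Assumption: an empty array passes, because it contains no
    # element that violates the rule.
    for index, item in enumerate(value):
        if not isinstance(item, dict):
            raise ValueError(
                "element %d is %s, not an object" % (index, type(item).__name__)
            )
    return value


def is_valid(value):
    """True if value would be accepted by check_embedded_json."""
    try:
        check_embedded_json(value)
        return True
    except ValueError:
        return False


print(is_valid([{"name": "category1"}]))   # array of objects
print(is_valid({"name": "category1"}))     # single object, not an array
print(is_valid(["category1"]))             # array of strings
```

Only the first value is accepted; the single object and the array of strings are both rejected.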

Installation

Elasticsearch

Download and install Elasticsearch 1.5.0 from here. Create a ‘plugins’ directory within the Elasticsearch root directory. Within the plugins directory, create two sub-directories, ‘elasticsearch-lucene-extensions’ and ‘elasticsearch-embedded-json-type’:

   /elasticsearch
     /plugins
       /elasticsearch-lucene-extensions
       /elasticsearch-embedded-json-type
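
If you prefer to script this step, the layout above can be created as follows. This is a minimal sketch: the helper name `create_plugin_dirs` is hypothetical, and the demonstration runs in a temporary directory that stands in for the Elasticsearch root, since the real path depends on your installation.

```python
import tempfile
from pathlib import Path

# Plugin directory names taken from the layout above.
PLUGIN_DIRS = (
    "elasticsearch-lucene-extensions",
    "elasticsearch-embedded-json-type",
)


def create_plugin_dirs(es_root):
    """Create <es_root>/plugins/<name> for each plugin; return the paths."""
    created = []
    for name in PLUGIN_DIRS:
        plugin_dir = Path(es_root) / "plugins" / name
        # parents=True also creates 'plugins'; exist_ok makes reruns safe.
        plugin_dir.mkdir(parents=True, exist_ok=True)
        created.append(plugin_dir)
    return created


# Demonstration in a throwaway directory standing in for the
# Elasticsearch root directory.
with tempfile.TemporaryDirectory() as es_root:
    dirs = create_plugin_dirs(es_root)
    layout_ok = all(d.is_dir() for d in dirs)
    relative = [d.relative_to(es_root).as_posix() for d in dirs]

print(layout_ok)
print(relative)
```

To use it for real, call `create_plugin_dirs` with the path of your Elasticsearch root directory instead of the temporary one.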


Elasticsearch Lucene Extensions

To install the ‘elasticsearch-lucene-extensions’ plugin, check out the code from here. In a command prompt window, navigate to the project’s root directory and run the following Maven command:

   mvn install

This will build the project package and install the project into your local Maven repository (which is needed for the embedded JSON type project).

After the 'mvn install' command is completed, navigate to the target directory in the project and copy the ‘elasticsearch-lucene-extensions-0.0.1-SNAPSHOT.jar’ to the plugin directory ‘/elasticsearch/plugins/elasticsearch-lucene-extensions’.

Elasticsearch Embedded JSON Type

To install the ‘elasticsearch-embedded-json-type’ plugin, check out the code from here. In a command prompt window, navigate to the project’s root directory and run the following Maven command:

   mvn package

This will build the project package.

After the 'mvn package' command is completed, navigate to the ‘target/releases’ directory in the project and copy the ‘elasticsearch-embedded-json-type-0.0.1-SNAPSHOT.jar’ to the elasticsearch plugin directory ‘/elasticsearch/plugins/elasticsearch-embedded-json-type’.

Now that both of the required plugins are installed, start Elasticsearch.

Using the embedded JSON type

Create an index with a mapping that defines the ‘embedded_json’ type for a particular field:

   POST /test
   {
       "mappings": {
           "doc": {
               "properties": {
                   "categories": {
                       "type": "embedded_json"
                   }
               }
           }
       }
   }

Index a document with the field:

   POST /test/doc/1
   {
       "categories": [
           {
               "name": "category1",
               "description": "the quick brown fox jumped over the lazy dog"
           },
           {
               "name": "category2",
               "description": "hello world"
           }
       ]
   }

Now, if you run a query that searches for ‘fox’ in the 'categories' field, it will return the document:

   POST /test/_search
   {
       "query": {
           "match": {
               "categories": "fox"
           }
       }
   }
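
The three requests in this walkthrough can also be built and sent from a script. The sketch below is one possible way to do it with the Python standard library. The helper names are hypothetical, and the address `http://localhost:9200` assumes a local node listening on the default HTTP port. Only the request bodies are built when the script runs; the calls that contact Elasticsearch are shown as comments.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, default HTTP port


def mapping_body(doc_type, field):
    """Index-creation body declaring one field as 'embedded_json'."""
    return {
        "mappings": {
            doc_type: {"properties": {field: {"type": "embedded_json"}}}
        }
    }


def match_body(field, text):
    """Search body with a match query on one field."""
    return {"query": {"match": {field: text}}}


def post(path, body):
    """POST a JSON body to Elasticsearch and return the decoded reply."""
    request = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


mapping = mapping_body("doc", "categories")
search = match_body("categories", "fox")
print(json.dumps(search, indent=4))

# With a node running and both plugins installed, the walkthrough becomes:
#   post("/test", mapping)
#   post("/test/doc/1", {"categories": [...]})   # the document shown above
#   post("/test/_search", search)
```

Building the bodies as dictionaries keeps the field name and search text as parameters, which makes it easy to try the same query against other 'embedded_json' fields.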


TODO: provide better examples of the query possibilities using the 'embedded_json' type.

Processing elasticsearch results

(UNDER CONSTRUCTION, NOT AVAILABLE YET)