Writing QPL Scripts


This Page is Locked - Content no longer maintained - See QPL for the latest information.
Enterprise Add-On Feature

This section provides a general "getting started" introduction to writing QPL scripts.

One String = One Token

In QPL, a string can only ever be a single token. If the string contains spaces or punctuation, then it is assumed that it represents a single token which itself contains spaces or punctuation.

Sometimes, we'll see QPL code which looks like this:

def word1 = "hello"
def word2 = "world"
def phraseQ = phrase(word1 + " " + word2)   // WRONG

In the above example, QPL treats the concatenated string "hello world" as a single token that contains a space, not as two tokens. If this query is run against a field which is tokenized on white-space, it will never return any results.
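To see why the single-token form can never match, here is a minimal sketch in Python (not QPL) of a white-space tokenized field. The functions `index_field` and `phrase_matches` are hypothetical stand-ins invented for this illustration; they are not part of QPL or of any search engine.

```python
# Toy model of a white-space tokenized field. Not QPL, not a real engine.

def index_field(text):
    """Split on white-space, as a white-space tokenizer would at index time."""
    return text.split()

def phrase_matches(indexed_tokens, query_tokens):
    """Return True if query_tokens occur as a consecutive run of indexed tokens."""
    n = len(query_tokens)
    return any(indexed_tokens[i:i + n] == query_tokens
               for i in range(len(indexed_tokens) - n + 1))

indexed = index_field("say hello world again")

# One token that contains a space: no indexed token can ever equal it.
wrong = phrase_matches(indexed, ["hello world"])

# Two separate tokens: matches the consecutive tokens "hello", "world".
right = phrase_matches(indexed, ["hello", "world"])

print(wrong, right)
```

Because the index only ever holds tokens without spaces, a query token that contains a space has nothing to match against.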

Why is this the case? Why doesn't QPL automatically split apart all strings? Mostly this is to give you more control. Sometimes tokens can contain spaces (in Solr, for example, when the field is specified as a "string" type field).

In addition, this convention allows QPL to build queries for fields which use custom tokenizers. For example, we have a tokenizer which stores HTML tags as single tokens.

The correct method

Instead, we use multiple strings, one per token, to represent multiple tokens. For example:

def word1 = "hello"
def word2 = "world"
def phraseQ = phrase(word1, word2)   // CORRECT

Groovy lists can also be used:

def word1 = "hello"
def word2 = "world"
def tokens = [word1, word2]
def phraseQ = phrase(tokens)   // CORRECT

Simple Splitting: split()

Use the split() method to split a string into multiple strings.

 def markets = "Construction;IT;Finance;"
 split(markets, ";")  =>  ["Construction", "IT", "Finance"]

Using split() you can quickly construct filter queries from delimited lists of values:

 return and(or(split(markets,";")), phrase("hello", "world"))
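The shape of the query built above can be sketched in Python (not QPL). The helper `split_values` and the tuple representation of the query are hypothetical, used only for illustration. The sketch assumes that empty entries (such as the one after the trailing ";") are dropped and that the case of each value is left unchanged; QPL's actual split() may behave differently.

```python
# Stand-in model of split() feeding an or() inside an and(). Not QPL.

def split_values(text, separator):
    """Split text on separator and drop empty entries (assumed behavior)."""
    return [part for part in text.split(separator) if part]

markets = "Construction;IT;Finance;"
values = split_values(markets, ";")

# Each value becomes one operand of the "or"; the phrase keeps its two tokens.
query = ("and", ("or", *values), ("phrase", "hello", "world"))

print(values)
print(query)
```

Each entry of the split list ends up as its own single token inside the or() clause, which is what makes the result usable as a filter.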

Sophisticated Splitting: tokenize()

Tokenizers can be used by QPL scripts to split strings into lists of strings based on token boundaries. Some examples:

 tokenize(          "Hello World!!")  -->  ["Hello", "World!!"]
 tokenize(TO_LOWER, "Hello World!!")  -->  ["hello", "world!!"]
 tokenize(PUNCT,    "Hello World!!")  -->  ["Hello", "World"]

Flags can also be added together:

 tokenize(TO_LOWER+PUNCT, "Hello World!!")  -->  ["hello", "world"]
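Adding flags together works when each flag occupies its own bit. The following Python sketch (not QPL) reproduces the four results above under that assumption. The flag values, the function `sketch_tokenize`, and the rule that PUNCT strips punctuation characters from each token are all assumptions made for this illustration, not the actual QPL implementation.

```python
import re

# Hypothetical flag values: distinct bits, so adding two flags combines them.
TO_LOWER = 1
PUNCT = 2

def sketch_tokenize(flags, text):
    """Split on white-space, then apply the options selected by flags."""
    tokens = text.split()
    if flags & PUNCT:
        # Assumed behavior: remove punctuation characters, drop empty tokens.
        tokens = [re.sub(r"[^\w\s]", "", t) for t in tokens]
        tokens = [t for t in tokens if t]
    if flags & TO_LOWER:
        tokens = [t.lower() for t in tokens]
    return tokens

plain = sketch_tokenize(0, "Hello World!!")
lower = sketch_tokenize(TO_LOWER, "Hello World!!")
punct = sketch_tokenize(PUNCT, "Hello World!!")
both = sketch_tokenize(TO_LOWER + PUNCT, "Hello World!!")

print(plain, lower, punct, both)
```

Addition only behaves like a bitwise "or" when each flag is included at most once, so the same flag should not be added twice.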

See QPL Tokenizers for more information about tokenizers.

Field-Sensitive Splitting

Some search engines allow tokenization which depends on the type of the field being searched. The Solr plug-in for QPL leverages this capability.

 solr.tokenize("description", "Hello World!!")

The above uses the tokenizer defined for the "description" field to tokenize the text "Hello World!!". This ensures that the tokens created are the same kind of tokens stored for the description field, so they can be used to search it.
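The idea of field-sensitive tokenization can be sketched in Python (not QPL, and not the Solr plug-in) as a lookup from field name to tokenizer. The field names, the table `FIELD_TOKENIZERS`, and both analyzer functions are hypothetical; a real engine would take them from the field definitions in its schema.

```python
# Stand-in model: each field has its own tokenizer. Not QPL or Solr code.

def analyze_description(text):
    """Assumed text field: split on white-space, trim punctuation, lowercase."""
    tokens = (t.strip("!?.,;:").lower() for t in text.split())
    return [t for t in tokens if t]

def analyze_string_field(text):
    """Assumed "string" type field: the whole value is one token."""
    return [text]

FIELD_TOKENIZERS = {
    "description": analyze_description,
    "id": analyze_string_field,
}

def tokenize_for_field(field, text):
    """Tokenize text with the tokenizer registered for the given field."""
    return FIELD_TOKENIZERS[field](text)

description_tokens = tokenize_for_field("description", "Hello World!!")
id_tokens = tokenize_for_field("id", "Hello World!!")

print(description_tokens, id_tokens)
```

The same input text yields different tokens depending on the field, which is why the query should be tokenized with the field's own tokenizer rather than a generic one.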

See Solr Tokenizers for more information.