Solr Extensions

From wiki.searchtechnologies.com



Revision as of 17:07, 3 March 2017

Enterprise Add-On Feature

QPL includes several extensions that give scripts access to Solr functionality.

A special "solr" variable is available from the Solr Plugin for QPL. This variable provides access to all Solr extensions.

The Query String

The query string is available using the "query" variable. For example:

 return and(field("title",solr.tokenize("title",query)));

Tokenization

Tokenize a string using the default Solr field:

 solr.tokenize("string to tokenize")

Tokenize a string using a specified Solr field:

 solr.tokenize("title", "string to tokenize")

The result of solr.tokenize() is a list of strings. It can easily be turned into a query by wrapping the list in a QPL operator:

 myAndQuery = and(solr.tokenize("hello world"))

Tokenization With Phrasing

Tokenize a string using a specified Solr field, but keep text inside double quotes together (i.e., quoted text is returned as a phrase rather than as separate tokens):

 solr.tokenize("text", 'Labrador "Dog Kennels"', DQ_PHRASES)

The result of this method is a java.util.List<Object>. Each element is either a String or a QPL phrase operator, so the call above returns a list containing: string=Labrador, phrase(term(dog),term(kennels))
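To make the shape of this mixed List<Object> concrete, here is a small stand-alone Java sketch. It is not the plugin's implementation: the hypothetical class PhraseTokenizerSketch uses a stand-in analyzer that only splits on whitespace and lowercases, and it represents each phrase as a List<String> instead of a QPL phrase operator. In the plugin, the analyzer of the named Solr field decides the actual terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of DQ_PHRASES-style tokenization; not the Solr plugin's code.
// Stand-in analyzer: split on whitespace and lowercase.
class PhraseTokenizerSketch {

    // Returns a List<Object>: each unquoted token is a String, and each
    // double-quoted span is a List<String> standing in for a QPL phrase operator.
    static List<Object> tokenize(String text) {
        List<Object> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                flush(current, inQuotes, out);   // emit the text collected before this quote mark
                inQuotes = !inQuotes;
            } else {
                current.append(c);
            }
        }
        flush(current, inQuotes, out);           // an unclosed quote is treated as a phrase
        return out;
    }

    private static void flush(StringBuilder buf, boolean asPhrase, List<Object> out) {
        List<String> tokens = new ArrayList<>();
        for (String t : buf.toString().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        buf.setLength(0);
        if (tokens.isEmpty()) {
            return;
        }
        if (asPhrase) {
            out.add(tokens);      // the whole quoted span becomes one element
        } else {
            out.addAll(tokens);   // unquoted tokens become separate elements
        }
    }
}
```

For the input 'Labrador "Dog Kennels"', the sketch returns [labrador, [dog, kennels]]: one plain token followed by one phrase.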

Creating Tokenizers

[In Dev]

Some utilities (e.g., thesaurus expansion) require that you pass in the tokenizer to be used for tokenization.

You can use the Solr extensions to create tokenizers for Solr fields:

 def myTokenizer = solr.makeTokenizer("content");

Note that in the above, "content" is the default field.

Query Parameters

All parameters passed in the request URL are available inside QPL as "solr.<parameterName>".

For example, if the URL sent to Solr is:

 http://localhost:8983/solr/select?q=hello+world&firstName=jack&lastName=kennedy

Then you can access the "lastName" parameter in your QPL script using:

 println solr.lastName;

Other Query Parameter Methods

A number of convenience methods for accessing query parameters are also available:

  • get(String paramName, String defaultValue)
    Fetches the named parameter if it exists; returns defaultValue if it does not.
  • getAll(String paramName)
    Returns a list of all values of the specified parameter.
  • getBoolean(String param) / getBool(String param)
    Fetches the parameter and converts it to a boolean (true/false) value. Returns null if the parameter does not exist.
  • getBoolean(String param, boolean def) / getBool(String param, boolean def)
    Fetches the parameter and converts it to a boolean. Returns def if the parameter does not exist.
  • getDouble(String param)
    Fetches the parameter and converts it to a double. Returns null if the parameter does not exist.
  • getDouble(String param, double def)
    Fetches the parameter and converts it to a double. Returns def if the parameter does not exist.
  • getFloat(String param)
    Fetches the parameter and converts it to a float. Returns null if the parameter does not exist.
  • getFloat(String param, float def)
    Fetches the parameter and converts it to a float. Returns def if the parameter does not exist.
  • getInteger(String param) / getInt(String param)
    Fetches the parameter and converts it to an integer. Returns null if the parameter does not exist.
  • getInteger(String param, int def) / getInt(String param, int def)
    Fetches the parameter and converts it to an integer. Returns def if the parameter does not exist.
  • getParamNames()
    Returns an iterator over all parameter names.
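The null-versus-default behavior of these getters can be summarized with a small Java sketch. The class ParamsSketch is hypothetical and models only a few of the documented methods over a map of parameter names to values. Returning an empty list from getAll() for a missing parameter, and parsing booleans with Boolean.valueOf(), are assumptions that may differ from the plugin.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the documented getter semantics; not the plugin's implementation.
class ParamsSketch {
    private final Map<String, List<String>> params;

    ParamsSketch(Map<String, List<String>> params) {
        this.params = params;
    }

    // First value of the parameter, or null if it is absent.
    String get(String name) {
        List<String> values = params.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    String get(String name, String def) {
        String v = get(name);
        return v == null ? def : v;
    }

    // Assumption: a missing parameter yields an empty list.
    List<String> getAll(String name) {
        return params.getOrDefault(name, Collections.emptyList());
    }

    // Boxed return type so that a missing parameter can be reported as null.
    // Assumption: Boolean.valueOf() treats anything other than "true" as false.
    Boolean getBoolean(String name) {
        String v = get(name);
        return v == null ? null : Boolean.valueOf(v);
    }

    boolean getBoolean(String name, boolean def) {
        Boolean v = getBoolean(name);
        return v == null ? def : v;
    }

    // Throws NumberFormatException if the value is not an integer.
    Integer getInt(String name) {
        String v = get(name);
        return v == null ? null : Integer.valueOf(v.trim());
    }

    int getInt(String name, int def) {
        Integer v = getInt(name);
        return v == null ? def : v;
    }
}
```

The one-argument getters return boxed types (Boolean, Integer) so that "parameter not present" can be distinguished from a real value; the two-argument forms can return primitives because a default is always available.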

Using getParamNames()

If you wish to sequence through all parameter names, the getParamNames() method can be used as follows:

 solr.getParamNames().each { println("Param " + it) }

Setting Query Parameters

(only available inside the Solr search component plug-in)

You can set a query parameter from a QPL script using the following method:

  • solr.setParam(String paramName, String paramValue)

An example call is as follows:

 solr.setParam("normalizedLocation", "Pasadena, CA");

Note that this overwrites the current value. To add a value without replacing the existing ones (for example, to append to a multi-valued parameter), use the following method:

  • solr.addParam(String paramName, String paramValue)

For example, to add an 'fq' parameter, make the call from a search component that runs before the 'query' component, with isProcess=false:

 solr.addParam("fq", "type:good");
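The difference between setParam() and addParam() is the same overwrite-versus-append distinction that Solr's ModifiableSolrParams class makes with set() and add(). The hypothetical ModifiableParamsSketch class below models it with a map from parameter names to lists of values; it is an illustration, not the plugin's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of setParam() vs addParam(); not the plugin's implementation.
class ModifiableParamsSketch {
    private final Map<String, List<String>> params = new LinkedHashMap<>();

    // Overwrite: any existing values are replaced by the single new value.
    void setParam(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        params.put(name, values);
    }

    // Append: existing values are kept (e.g. several independent fq filters).
    void addParam(String name, String value) {
        params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // All current values of the parameter (empty if it was never set).
    List<String> getAll(String name) {
        return params.getOrDefault(name, new ArrayList<>());
    }
}
```

After two addParam("fq", ...) calls, both filters are present; a subsequent setParam("fq", ...) leaves only the new value.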

Working With the Solr Search Results XML

Note that these features are only available inside the Solr search component plug-in.

Adding new elements to the search response XML

You can add to the output XML response in several ways. This is typically needed when scripts want to pass additional information to later search stages or to the user interface (e.g., to be processed by an XSL transform).

Creating Lists:

  • solr.newList()

Returns a Solr NamedList; see http://lucene.apache.org/solr/4_0_0/solr-solrj/org/apache/solr/common/util/NamedList.html This method can be used when a QPL script needs to add an ordered list of name/value pairs to the Solr response; see also the solr.addResponse() description later in this section. QPL scripts may invoke any NamedList Java method, such as add() and get(), on the return value. The NamedList may also be manipulated using native Groovy syntax.

Example Usage:

 def myList = solr.newList();
 myList.add("pm", "Enda Kenny");
 
 myList.add("numList", [1, 2, 3, 4]);
 myList.add("pmMap", ["irish-pm":"Enda Kenny", "president":"Barack Obama", "british-pm": "David Cameron"]);
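Unlike a java.util.Map, a NamedList keeps its entries in insertion order and allows the same name to appear more than once; get() returns the value of the first entry with that name. The hypothetical NamedListSketch class below models just that behavior with standard Java collections so it can be tried outside Solr. Its method names mirror NamedList's add(), get(), getAll(), getName() and size(), but it is a sketch, not the Solr class.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the ordered name/value behavior of Solr's NamedList.
class NamedListSketch {
    // Entries are kept in insertion order; names may repeat (and may be null).
    private final List<Map.Entry<String, Object>> entries = new ArrayList<>();

    void add(String name, Object value) {
        entries.add(new AbstractMap.SimpleEntry<>(name, value));
    }

    // Value of the first entry with this name, or null if there is none.
    Object get(String name) {
        for (Map.Entry<String, Object> e : entries) {
            if (Objects.equals(name, e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    // Values of every entry with this name, in insertion order.
    List<Object> getAll(String name) {
        List<Object> values = new ArrayList<>();
        for (Map.Entry<String, Object> e : entries) {
            if (Objects.equals(name, e.getKey())) {
                values.add(e.getValue());
            }
        }
        return values;
    }

    String getName(int index) {
        return entries.get(index).getKey();
    }

    int size() {
        return entries.size();
    }
}
```

Adding "pm" twice keeps both entries: get("pm") returns the first value, and getAll("pm") returns both in order.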

Creating Maps:

  • solr.newMap()

Returns a Solr SimpleOrderedMap; see http://lucene.apache.org/solr/4_0_0/solr-solrj/org/apache/solr/common/util/SimpleOrderedMap.html This method can be used when a QPL script needs to add a keyed map of name/value pairs to the Solr response; see also the solr.addResponse() description later in this section. Since SimpleOrderedMap extends NamedList, QPL scripts may invoke any NamedList Java method, such as add() and get(), on the return value. The SimpleOrderedMap may also be manipulated using native Groovy syntax.

Example Usage:

 def myMap = solr.newMap();
 myMap.add("pm", "David Cameron");

Adding New Responses to the output:

  • solr.addResponse(String name, String element)

Adds a simple string element (with the specified name) to the response.

  • solr.addResponse(String name, NamedList element)

Adds a NamedList or SimpleOrderedMap to the response; these are typically created using the newList() and newMap() methods described earlier in this section. The added element is given the specified name.

  • solr.addResponse(NamedList element)

Adds a NamedList or SimpleOrderedMap to the response; these are typically created using the newList() and newMap() methods described earlier in this section. The added element is unnamed.

Example Usage:

 solr.addResponse("relatedSearchesList", myList);
 solr.addResponse("relatedSearchesMap", myMap);

Reading (and manipulating) response data from other search stages

You can read and manipulate response data from other (earlier) search stages using the following method:

  • solr.getResponse()

Returns a Solr NamedList of the response values; see http://lucene.apache.org/solr/4_0_0/solr-solrj/org/apache/solr/common/util/NamedList.html QPL scripts can manipulate the NamedList using its Java methods, such as add() and get(), or using native Groovy syntax.

Example Usage:

 def response = solr.getResponse()
 response.highlighting.each {
   println it.key
 }

Embedded Searches

QPL can perform a local search and then use the results of that search to create new queries. This is a useful technique when you have ancillary databases, such as locations or category headings, which can provide additional information to constrain or expand the user's query space.

Basic searches use this structure:

 def results = solr.search(<qpl-query>, <num-docs>)

For example:

 def results = solr.search(phrase('search','technologies'), 10);

The search engine uses the Lucene query builder to convert the QPL query into Lucene query objects, and then performs a direct-to-Lucene search. Because the search goes directly to Lucene, it is very fast (no network transfers, no Solr overhead, no RequestHandler or query pipeline, etc.). However, it also performs no caching, so keep that in mind.

Using the Results

The results object returned by solr.search() is a standard List, so it can be used like any other list:

 results[0]  //  Retrieves the first document in the list

 results.each { /* iterate over all documents in the results */ }

Fields can be accessed from a document in the results simply by using document.field, for example:

 results[0].title  // Fetch the title(s) for the first document

 results*.id       // Groovy shortcut:  Returns a list of ids for all documents in the results

Note: Title is defined as multi-valued in the default Solr schema. This means that results[0].title will be returned as an array of Strings (see below for more information).

Objects returned by fielded data access are true Java objects, of the appropriate type for the field in question. For example, if you have a 'boolean' field defined in Solr, you can do this:

 if(results[0].isInStock) {    // 'boolean' fields in Solr are Boolean objects
   /* do something */
 }

It's easy to use the results of a search to construct new queries:

 idsQuery = or(results*.id);  // Create an OR over all IDs returned by the query

 // Form all of the names returned into phrase queries,
 // then construct an or() across all names
 complexQuery1 = or(results.collect { phrase(solr.tokenize(it.name)) });

Note that multi-valued fields are returned as arrays of the field's value type.

 results[0].cat  // This is a String[] if the 'cat' field is defined as multi-valued

Fortunately, most arrays of strings are handled automatically by standard QPL syntax, for example:

 complexQuery2 = or(results.collect { or(it.cat) })

In the above example, even though "it.cat" returns a String[] array object, QPL will still turn it into the appropriate OR query of all of the members.
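The flattening step can be sketched in plain Java. The hypothetical MultiValuedOrSketch class below gathers the values of one field across all result documents, expanding String[] values from multi-valued fields, and renders them as a single Lucene-syntax OR string. QPL builds query objects rather than strings, so the string rendering is for illustration only; unlike the nested or() in the QPL example above, this sketch also removes duplicate values.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of flattening multi-valued fields into one OR query.
class MultiValuedOrSketch {
    // Collects the values of 'field' across all documents, in first-seen order.
    // String[] values (multi-valued fields) are expanded; duplicates are dropped.
    static Set<String> collect(List<Map<String, Object>> docs, String field) {
        Set<String> values = new LinkedHashSet<>();
        for (Map<String, Object> doc : docs) {
            Object v = doc.get(field);
            if (v instanceof String[]) {
                for (String s : (String[]) v) {
                    values.add(s);
                }
            } else if (v != null) {
                values.add(v.toString());
            }
        }
        return values;
    }

    // Renders field:"value" clauses joined by OR (Lucene query syntax),
    // escaping any double quotes inside the values.
    static String toOrQuery(String field, Set<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(field).append(":\"").append(v.replace("\"", "\\\"")).append('"');
        }
        return sb.toString();
    }
}
```

For two documents whose cat values are {electronics, memory} and {memory, hard drive}, toOrQuery("cat", ...) produces cat:"electronics" OR cat:"memory" OR cat:"hard drive".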

Searching over other Cores

Searches can also be sent to other Solr cores:

 def results = solr.search(<solr-core>, <qpl-query>, <num-docs>)

For example:

 def results = solr.search("mycore2", phrase('search','technologies'), 10);

In this way, you can use the results from one core to search another core. This allows a useful separation of indexes: for example, put your "supporting" indexes in one core and your "main document" index in a different core, then use QPL to build the query for the main core from the results of the supporting one.

Group searches

Group searches use this structure:

 // default Solr "rows" parameter specifies the number of documents
 def response = solr.groupingSearch(<solr-core>, <qpl-query>, <arg-map>);
 
 // explicitly specify number of documents
 def response = solr.groupingSearch(<solr-core>, <qpl-query>, <arg-map>, <num-documents>);

For example:

 def results = solr.groupingSearch("collection1", phrase('search','technologies'), ["group.field":['resourcename'],"group.limit":5,"group.ngroups":true]);

Returns a SolrIndexSearcher.QueryResult containing the group search result. The groupedResults field of this result is a NamedList<Object> of the response values; see http://lucene.apache.org/solr/4_0_0/solr-solrj/org/apache/solr/common/util/NamedList.html

QPL scripts can manipulate the NamedList using its Java methods, such as add() and get(), or using native Groovy syntax.

Example Usage:

 def response = solr.groupingSearch("collection1", or("*:*"), ["group.field":['resourcename'], "group.limit":5, "group.ngroups":true], 3); // just want 3 documents
 println response.groupedResults.resourcename
 println response.groupedResults.resourcename.matches
 println response.groupedResults.resourcename.groups
 response.groupedResults.resourcename.groups.each {
     println it.groupValue
     println it.results
     it.results.each {
         println it.id
         println it.title
     }
 }
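The meaning of the group.field and group.limit arguments can be illustrated with a small Java sketch over documents that are already in rank order. The hypothetical GroupingSketch class groups documents by the value of one field, keeps at most limit documents per group, and orders groups by their best-ranked document; the number of distinct groups corresponds to what group.ngroups=true reports. It models the semantics only, not the plugin's groupingSearch() implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of group.field / group.limit semantics; not the plugin's code.
class GroupingSketch {
    // 'docs' must already be in rank order (best match first).
    // Groups are keyed by the value of 'field' and ordered by their best-ranked
    // document; each group keeps at most 'limit' documents.
    static Map<Object, List<Map<String, Object>>> group(
            List<Map<String, Object>> docs, String field, int limit) {
        Map<Object, List<Map<String, Object>>> groups = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            // A document without a value for 'field' lands in the null group.
            List<Map<String, Object>> members =
                    groups.computeIfAbsent(doc.get(field), k -> new ArrayList<>());
            if (members.size() < limit) {
                members.add(doc);
            }
        }
        // groups.size() is the number of distinct groups (what group.ngroups reports).
        return groups;
    }
}
```

With group.limit=2, a third document for an already-full group is still counted as a match, but it does not appear in that group's document list.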

You can add the grouping result to the Solr response using:

  • solr.outputGroupsToResultsXML(String elementName, SolrIndexSearcher.QueryResult groupResults)

Adds the grouped results (with the specified name) to the response.

Example Usage:

 solr.outputGroupsToResultsXML("grouped", response)

Highlighting for the group result can be obtained using:

  • solr.getGroupingHighlighting(String coreName, Operator qplQuery, SolrIndexSearcher.QueryResult groupResults)

Returns a NamedList<Object> that can be added to the response using the solr.addResponse() method. The format of this element is identical to the standard Solr "highlighting" element.

Example Usage:

 def groupingHighlighting = solr.getGroupingHighlighting("collection1", phrase('search','technologies'), response)
 solr.addResponse("myGroupingHighlighting", groupingHighlighting )


Term Counts

You can get the number of documents which contain a term using:

 solr.getTermCount("field", "token");

The "token" must be a single token. It cannot be a phrase (sorry).

The result is a count of documents which contain the specified token in the specified field. The call is very fast. This can be useful for scripts which need to decide, for example, which terms to remove when doing an OR() expression (i.e. to remove the most common ones), or to look for words which should be spell-checked.

Note that "field" can be a comma-separated list of fields:

 solr.getTermCount("content,title,categories", "value");
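The "remove the most common terms" idea can be sketched as a simple filter. The hypothetical CommonTermFilterSketch class below assumes the document counts have already been fetched (for example, one solr.getTermCount() call per term) and that the total number of documents is known; it keeps only terms that occur in at most a given fraction of the index. The 0.5 threshold used in the usage note is arbitrary. It also lists terms with a count of zero, which are candidates for spell-checking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; docCounts stands in for per-term solr.getTermCount() results.
class CommonTermFilterSketch {
    // Keeps terms whose document count is at most maxFraction of numDocs.
    // Terms missing from docCounts are treated as having a count of 0.
    static List<String> dropCommonTerms(List<String> terms, Map<String, Long> docCounts,
                                        long numDocs, double maxFraction) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            long count = docCounts.getOrDefault(term, 0L);
            if (count <= maxFraction * numDocs) {
                kept.add(term);
            }
        }
        return kept;
    }

    // Terms that occur in no document at all: candidates for spell-checking.
    static List<String> zeroHitTerms(List<String> terms, Map<String, Long> docCounts) {
        List<String> missing = new ArrayList<>();
        for (String term : terms) {
            if (docCounts.getOrDefault(term, 0L) == 0L) {
                missing.add(term);
            }
        }
        return missing;
    }
}
```

With 1,000 documents and a 0.5 threshold, a term found in 950 documents is dropped from the OR, while a misspelled term with a count of 0 is kept and also flagged for spell-checking.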


Context

Items can be passed from one search component to the next using the Solr request "context":

 void setContextString(String name, String value);
 String getContextString(String name);
 void setContextObject(Object name, Object value);
 Object getContextObject(String name);
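Solr's per-request context (exposed in Java as SolrQueryRequest.getContext(), a Map<Object,Object>) is presumably what these methods wrap: values stored by one component are visible to later components handling the same request, and do not carry over to the next request. The hypothetical RequestContextSketch class below models the four methods listed above over such a map; returning null from getContextString() when the stored value is not a String is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a per-request context shared by search components.
class RequestContextSketch {
    // One map per request, like Solr's own Map<Object, Object> request context.
    private final Map<Object, Object> context = new HashMap<>();

    void setContextString(String name, String value) {
        context.put(name, value);
    }

    // Assumption: a non-String value stored under this name is reported as null.
    String getContextString(String name) {
        Object value = context.get(name);
        return (value instanceof String) ? (String) value : null;
    }

    void setContextObject(Object name, Object value) {
        context.put(name, value);
    }

    Object getContextObject(String name) {
        return context.get(name);
    }
}
```

Creating a fresh RequestContextSketch for each request mirrors the fact that context values are scoped to a single request: an earlier component sets a value, and a later component in the same request reads it back.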