Creating a New Builder




Revision as of 17:11, 3 March 2017

Enterprise Add-On Feature

Introduction

New builders for new search engines are built by creating a Groovy script. This Groovy script assigns Groovy "closures" (fragments of Groovy code) to each of the different operator types that your builder supports.

This is done with the "builder.opType" syntax:

 builder.and = {context, op, operands -> /* CODE HERE FOR BUILDING THE AND OPERATOR */ ; }

In the above code, your closure will be called whenever the builder needs to build the "and" operator. The builder supplies your closure with three arguments:

  • context - The context of the build operation.
    This can be used with various methods to extract things like the current field name and the current proximity window.
  • op - The QPL Operator object which needs to be built.
  • operands - A list of pre-built operands. This list contains all of the operands of op, already processed by the builder.

The job of your closure, then, is to take the pre-built operands and construct whatever string or object the target search engine needs to represent the operator.

Builders process the query tree from the bottom up: the lowest-level operands are built first, then their parents, then grandparents, and so on, with the root of the query built last. This default processing sequence can be adjusted (see Event Processing below).
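
The bottom-up flow described above can be sketched with a string-based builder. This is a minimal illustration, not the real implementation: the "builder" map, the "build" walker and the map-based operators are hypothetical stand-ins so the script runs on its own, and the "(a AND b)" output syntax is invented for the example.

```groovy
// Stand-in for the builder object that the real environment supplies (assumption).
def builder = [:]

// Leaf operator: there are no operands, so emit the term text itself.
builder.term = { context, op, operands -> op.term }

// "and" operator: the operands arrive already built, so just join them.
builder.and = { context, op, operands -> '(' + operands.join(' AND ') + ')' }

// Hypothetical walker that mimics bottom-up processing:
// build every child first, then hand the results to the parent's closure.
def build
build = { op ->
    def built = op.operands.collect { build(it) }
    builder[op.type].call(null, op, built)
}

// Operators are modelled as plain maps here, not as real QPL Operator objects.
def query = [type: 'and', operands: [
    [type: 'term', term: 'apple', operands: []],
    [type: 'term', term: 'pie', operands: []]
]]

def result = build(query)
println result
```

By the time the "and" closure runs, both terms have already been turned into strings, so it never needs to inspect its children.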

DO NOT Use Groovy Binding Variables in Builders

Use the keyword "def" for all variables in builders. This makes every variable a local variable. For more information, see the Groovy documentation on variable scope in scripts.

If you don't explicitly declare a variable in Groovy (and the language does not force you to), it becomes a binding variable. Binding variables belong to the shell used to parse the Groovy script, and all of a builder's closures are created by a single script. Thus, all closures share the same binding and hence the same binding variables.

The builder implementation stores closures against operator types and reuses the same closure, potentially in different threads. A change to a binding variable is visible in all threads, which can produce unexpected results.
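
The difference is easy to see in a standalone Groovy script. In this sketch the "builder" map and the operator names "leaky" and "safe" are hypothetical stand-ins; only the scoping behaviour is the point.

```groovy
// Stand-in for the real builder object (assumption, so the script runs alone).
def builder = [:]

// BAD: 'parts' is assigned without 'def', so it is stored in the script
// binding, which every closure (and every thread using them) shares.
builder.leaky = { context, op, operands ->
    parts = []
    operands.each { parts << it }
    parts.join(' ')
}

// GOOD: 'def' makes 'safeParts' local to each invocation of the closure.
builder.safe = { context, op, operands ->
    def safeParts = []
    operands.each { safeParts << it }
    safeParts.join(' ')
}

def leakyResult = builder.leaky.call(null, null, ['a', 'b'])
def safeResult = builder.safe.call(null, null, ['a', 'b'])

// The undeclared variable is now visible in the shared binding; the local is not.
println binding.hasVariable('parts')      // true
println binding.hasVariable('safeParts')  // false
```

Both closures return the same string when called once on a single thread, which is why this kind of bug tends to show up only under concurrent load.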

Context Methods

The "context" passed to your closure will be an instance of the QPLBuilder class. The following methods are the most useful (shown in Groovy-style notation):

  • context.field
    Gets the current 'enclosing field' in which this operator is contained. Returns null if there is no enclosing field specified by the query expression.
  • context.fieldOrDefault
    Gets the current 'enclosing field' unless it is null, in which case it returns the default field (as specified when the builder is first initialized).
  • context.window
    Gets the current enclosing proximity window.
  • context.defaultField
    Returns the default field name as specified when the builder was initialized.
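
As a rough sketch of how these accessors might be used, the term closure below prefixes each term with its field. The "contextFor" helper is a hypothetical stand-in that fakes a context with plain maps (the real context is a QPLBuilder), and the "field:term" output syntax is only illustrative.

```groovy
// Stand-in for the real builder object (assumption).
def builder = [:]

// Hypothetical fake context exposing the same property names as listed above.
def contextFor = { String field ->
    [field: field, defaultField: 'content', fieldOrDefault: field ?: 'content']
}

// Use fieldOrDefault so the closure never has to handle a null field itself.
builder.term = { context, op, operands ->
    context.fieldOrDefault + ':' + op.term
}

def inTitle = builder.term.call(contextFor('title'), [term: 'groovy'], [])
def noField = builder.term.call(contextFor(null), [term: 'groovy'], [])
println inTitle
println noField
```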

You can also use the context to manipulate the builder stack (see the javadoc for the QPLBuilder). We'll document that here once we figure out a reason why anyone would want to do that.

Functions

You can define functions in your code and use them in your closures:

 def myMethod(x) { ... }
 
 builder.and = {context, op, operands -> myMethod(x) . . . ; }
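
For instance, a shared helper can keep several operator closures short. The helper name "joinOperands", the stand-in "builder" map and the output syntax below are assumptions made for this sketch.

```groovy
// Stand-in for the real builder object (assumption).
def builder = [:]

// Hypothetical helper shared by several operator closures.
def joinOperands(List operands, String keyword) {
    '(' + operands.join(' ' + keyword + ' ') + ')'
}

builder.and = { context, op, operands -> joinOperands(operands, 'AND') }
builder.or = { context, op, operands -> joinOperands(operands, 'OR') }

def andResult = builder.and.call(null, null, ['a', 'b'])
def orResult = builder.or.call(null, null, ['a', 'b', 'c'])
println andResult
println orResult
```

Note that the helper receives everything it needs as arguments, so it does not rely on any shared script state.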

Lucene Example

A starter example for a Lucene builder is shown below:

import org.apache.lucene.index.*;
import org.apache.lucene.search.*;

// TODO:  Set proper field, set proper boost
// AND: every operand is required, so each clause is added as MUST
builder.and = {context, op, operands ->
  BooleanQuery bq = new BooleanQuery();
  operands.each{ bq.add(new BooleanClause(it, BooleanClause.Occur.MUST)) };
  return bq;
};

// TODO:  Set proper field, set proper boost
// OR: any operand may match, so each clause is added as SHOULD
builder.or = {context, op, operands -> 
  BooleanQuery bq = new BooleanQuery();
  operands.each{ bq.add(new BooleanClause(it, BooleanClause.Occur.SHOULD)) };
  return bq;
};

// TODO:  Set proper field, set proper boost
builder.term = {context, op, operands -> return new TermQuery(new Term("text", op.term))};

Event Processing

Builder processing is bottom-up. The children are all processed first, and then the parent is processed. Most of the time, this is exactly what you want.

Processing at the Parent

However, occasionally, you'll want to do all of the work at the parent before the children are touched, OR you'll want to do some pre-processing first before the children are processed. A special "event code" (@parent) is available for doing this.

Note that closures for 'type@parent' processing take only two arguments: ctx (the context) and op (the operator being built at the parent level).

In the following example, phrases are fully processed at the parent level:

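builder.'phrase@parent' = {ctx, op -> 
  if(op.numOperands > 0) {
    // processPhraseItem is not shown here; it is assumed to append a Term or Term[] per operand
    def operandsList = [];
    op.operands.each { processPhraseItem(operandsList, it) };

    // A Term[] entry holds several alternatives for one position, which requires MultiPhraseQuery
    def phraseOp;
    if( operandsList.any { it instanceof Term[] } )
      phraseOp = new MultiPhraseQuery();
    else
      phraseOp = new PhraseQuery();

    operandsList.each{ phraseOp.add(it); }

    if(op.hasBoost()) phraseOp.setBoost(op.boost);
    return phraseOp;
  }
};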

To tell the builder to continue processing as normal (i.e. to process the children and then come back to processing the parent), return the special code BUILDER_CONTINUE:

return BUILDER_CONTINUE;
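
The interaction between a 'type@parent' closure and BUILDER_CONTINUE can be sketched as follows. Everything except the closure signatures is an assumption: BUILDER_CONTINUE is replaced by a local stand-in object, the "build" walker only imitates what the real builder might do, and operators are plain maps.

```groovy
// Stand-ins so the sketch runs alone (assumptions, not the real API).
def builder = [:]
def BUILDER_CONTINUE = new Object()

// Handle multi-word phrases entirely at the parent; defer everything else.
builder.'phrase@parent' = { ctx, op ->
    if (op.operands.size() < 2) return BUILDER_CONTINUE
    '"' + op.operands*.term.join(' ') + '"'
}

// Normal bottom-up closures, used only when the parent handler defers.
builder.phrase = { ctx, op, operands -> operands[0] }
builder.term = { ctx, op, operands -> op.term }

// Hypothetical walker: try the @parent closure first, then fall back to
// the usual children-first processing.
def build
build = { op ->
    def parentHandler = builder[op.type + '@parent']
    if (parentHandler != null) {
        def result = parentHandler.call(null, op)
        if (!result.is(BUILDER_CONTINUE)) return result
    }
    def built = op.operands.collect { build(it) }
    builder[op.type].call(null, op, built)
}

def twoWords = build([type: 'phrase', operands: [
    [type: 'term', term: 'new', operands: []],
    [type: 'term', term: 'york', operands: []]]])
def oneWord = build([type: 'phrase', operands: [
    [type: 'term', term: 'solo', operands: []]]])
println twoWords
println oneWord
```

In the two-word case the term closures are never called, because the parent closure returned a finished result before the children were visited.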

The Finalizer

If you need to do some cleanup on your query expression before it is finally returned, you can write a finalizer. This is called after the entire tree is processed.

All finalizers look like this:

 builder.finalize = {context, qplRoot, engineRoot -> /* your finalization code goes here */ }

The finalizer takes three arguments:

  • context - The context, a QPLBuilder object.
    This is the same context as passed to all of the other builder closures.
  • qplRoot - This is the original QPL query which was passed in to be built.
  • engineRoot - This is the final, completed engine query which your builder constructed.

The following is the Lucene finalizer:

builder.finalize = {context, qplRoot, engineRoot -> 
  return toQuery(engineRoot);
}

In the Lucene finalizer, the top-level node could be a not() expression, which is a BooleanClause (not a fully realized BooleanQuery). The toQuery() method (defined elsewhere in the .groovy file) detects this situation and converts the clause to a query by AND'ing against a query which retrieves all documents.
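
The same idea can be sketched for a string-based builder. This is not the Lucene toQuery() method, whose code is not shown here; it is a hypothetical analogue in which a top-level "NOT ..." string is AND'ed with a match-all expression. The stand-in "builder" map and the "NOT " prefix convention are assumptions; "*:*" is the match-all form in Lucene query syntax.

```groovy
// Stand-in for the real builder object (assumption).
def builder = [:]

// If the finished query is a bare negation, AND it against a match-all
// expression so that it becomes a complete query; otherwise leave it alone.
builder.finalize = { context, qplRoot, engineRoot ->
    engineRoot.startsWith('NOT ') ? '*:* AND ' + engineRoot : engineRoot
}

def negated = builder['finalize'].call(null, null, 'NOT spam')
def plain = builder['finalize'].call(null, null, '(apple AND pie)')
println negated
println plain
```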