QPL Transformer

From wiki.searchtechnologies.com

Revision as of 17:07, 3 March 2017

Enterprise Add-On Feature

The QPL Transformer is a method for transforming hierarchical query trees.

Introduction

The purpose of the QPL transformer is to provide a compact method for manipulating hierarchical query trees. This is done by specifying operations (aka 'closures' in Groovy) which selectively operate on nodes in the query tree. If nodes are modified, an updated query tree is returned.

How It Works

The transform takes as input a map, where the key is a selector, and the value is an operation (a Groovy closure).

def newQuery = transform(oldQuery, 
    [ SELECTOR1:{ctx -> OPERATION1}, 
      SELECTOR2:{ctx -> OPERATION2}, 
      SELECTOR3:{ctx -> OPERATION3}
    ]
 );

The transformer then traverses all nodes of the tree. For every node, QPL checks whether any of the selectors match it. If a selector matches, its operation is performed on the node. If the operation returns a new node, a new query tree is returned with the new node inserted in place of the original.

All nodes are checked twice during the tree traversal:

  1. Once before the operands are processed
    • At this point, a node can be matched only by preProcess() selectors
  2. Once after the operands are processed
    • At this point, a node can be matched only by operator TYPE selectors or process() selectors
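
The two checks per node can be pictured with a small stand-alone Groovy sketch. This is only a toy model of the traversal order described above: QNode and walk() are hypothetical names invented for illustration and are not part of the QPL API.

```groovy
// Toy model only: QNode and walk() are hypothetical names, not part of QPL.
class QNode {
    String type
    String term
    List<QNode> operands = []
}

// Visit each node twice: 'pre' runs before its operands are walked, 'post' after.
def walk(QNode n, Closure pre, Closure post) {
    pre(n)
    n.operands.each { walk(it, pre, post) }
    post(n)
}

def tree = new QNode(type: 'AND', operands: [
    new QNode(type: 'TERM', term: 'a'),
    new QNode(type: 'TERM', term: 'b')])

// Record the order in which the two checks happen
def visits = []
walk(tree, { visits << ('pre:' + it.type) }, { visits << ('post:' + it.type) })
println visits
```

In this sketch the AND node is seen first in the "pre" position, then each TERM is seen in both positions, and the AND node is seen last in the "post" position.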

Examples

Tokenize all terms in the query which contain embedded punctuation. Replace them with a phrase() expression of all of the post-tokenized terms:

andQuery2 = 
  transform(andQuery1, 
    [ TERM:{ctx -> return phrase(tokenize(PUNCT, ctx.op.term))} ]
   );

Flatten nested AND structures. For example: and(and(a,b),and(c,d),and(and(e,f,g),z)) will be converted to and(a,b,c,d,e,f,g,z).

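// andQuery1 stands in here for the query to be flattened
optimizedAnd = transform(andQuery1,
  [(preProcess(operator:AND, MULTIPLE, WITH_OPERANDS)):
    { ctx -> 
      // Operands of this AND node which are themselves AND nodes
      def andSubOps = ctx.op.operands.findAll{ it.op == AND };
      if(andSubOps.size() == 0)  return ctx.op;
      // Keep the non-AND operands, then append the operands of each nested AND
      def newOps = ctx.op.operands.minus(andSubOps);
      andSubOps.each{ newOps += it.operands; }
      return ctx.op.setOperands(newOps);
    }
  ]
);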

Transform expressions tagged with the "talk" field into a custom BETWEEN operator expression. For example, convert: |talk:(abc or def)| into |between("<t>", "</t>", (abc or def))|

def newQ = transform(q, 
  [(preProcess(field:"talk",SINGLE,NO_OPERANDS)):
      {ctx -> 
        return new Operator("BETWEEN", operands:[term("<t>"), term("</t>"), ctx.op.setField(null)]);
      }
  ]
);

SINGLE and NO_OPERANDS are the defaults, so the above can be written more concisely as:

def newQ = transform(q, 
  [(preProcess(field:"talk")):
      {ctx -> 
         return new Operator("BETWEEN", operands:[term("<t>"), term("</t>"), ctx.op.setField(null)]);
      }
  ]
);

Transform all terms within the "exact" field to include the "O/" prefix ('O' = original). Convert everything else to lower case. Remove the "exact:" field.

def newQ = transform(q, 
  [(preProcess(operator:TERM)): {ctx -> 
     if(ctx.field == "exact")
        return term("O/" + ctx.op.term);
     else
        return term(ctx.op.term.toLowerCase());
   },
   
   (process(field:"exact")): { ctx -> 
     return ctx.op.setField(null);
   }
  ]
)

Various field mappings:

def newQ = transform(q,
  
  // Map from "myField" -> "engineField"
  
  [(preProcess(field:"myField")):{ctx ->
      return ctx.op.setField("engineField")
   },
   
   // map from "myNewsField" to and(field("source", "news"), (sub-expression))
   
   (preProcess(field:"myNewsField")): {ctx -> 
         return and(field("source","news"), ctx.op.setField(null))
   }
  ]
 );

Selectors

Selectors are used to select nodes to be processed. The following selectors are available:

  • Operator Type
    For example: AND, TERM, OR, or (OperatorType.getType("BETWEEN"))
    Executes the operation on the specified operator, AFTER all operands have been processed
  • (preProcess(args...))
    For example: (preProcess(field:"exact", operator:TERM, MULTIPLE, WITH_OPERANDS))
    Executes the operation when the specified conditions are met, BEFORE the operands have been processed.
    Arguments include:
    • operator:OPERATOR_TYPE
      Matches only nodes of the specified OPERATOR_TYPE (AND, OR, OperatorType.getType("BETWEEN"), etc.)
    • field:"fieldName"
      Matches only nodes whose field is the specified field name
    • MULTIPLE / SINGLE
      If MULTIPLE, the operation is executed repeatedly until the result no longer changes
      (useful for flattening operations, which may need several passes to fully flatten). SINGLE is the default.
    • WITH_OPERANDS / NO_OPERANDS
      If NO_OPERANDS (the default), the operands of this node are not processed after the operation is run. If WITH_OPERANDS, processing continues into the operands after the operation is run.
  • (process(args...))
    Is tested on nodes after their operands have been processed. If the node matches the conditions, it is processed.
    Arguments include:
    • operator:OPERATOR_TYPE
      Matches only nodes of the specified OPERATOR_TYPE (AND, OR, OperatorType.getType("BETWEEN"), etc.)
    • field:"fieldName"
      Matches only nodes whose field is the specified field name

Note that (process(operator:AND)) is the same as simply using AND.
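
The MULTIPLE behaviour described above (run the operation again and again until the result stops changing) can be sketched in stand-alone Groovy. This is a toy model under stated assumptions, not the QPL implementation: QNode, flattenOnce() and applyUntilStable() are hypothetical names, and "no change" is assumed to mean that the operation returned the very same object.

```groovy
// Toy model only: QNode, flattenOnce() and applyUntilStable() are hypothetical, not part of QPL.
class QNode {
    String type
    String term
    List<QNode> operands = []
    String toString() {
        type == 'TERM' ? term : type.toLowerCase() + '(' + operands.join(',') + ')'
    }
}

// One flattening pass: lift the operands of directly nested AND nodes.
// Returns the same object when there is nothing to lift.
def flattenOnce(QNode n) {
    if (n.type != 'AND' || !n.operands.any { it.type == 'AND' }) return n
    def ops = n.operands.collectMany { it.type == 'AND' ? it.operands : [it] }
    return new QNode(type: 'AND', operands: ops)
}

// MULTIPLE-style application: repeat the operation until the result stops changing.
def applyUntilStable(QNode n, Closure op) {
    def current = n
    while (true) {
        def result = op(current)
        if (result.is(current)) return current
        current = result
    }
}

def t = { String s -> new QNode(type: 'TERM', term: s) }
def andOf = { List ops -> new QNode(type: 'AND', operands: ops) }

// and(and(a,b),and(c,d),and(and(e,f,g),z))
def tree = andOf([andOf([t('a'), t('b')]),
                  andOf([t('c'), t('d')]),
                  andOf([andOf([t('e'), t('f'), t('g')]), t('z')])])

def flat = applyUntilStable(tree) { flattenOnce(it) }
println flat
```

A single pass leaves and(e,f,g) nested one level down; repeating the pass is what brings the tree to and(a,b,c,d,e,f,g,z), which is why a SINGLE application would not be enough for this kind of flattening.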

Operations

Operations are specified as a Groovy closure, for example:

 {ctx ->  ... script to execute here ... }

The operation will be executed whenever the selector is matched.

Closure Return Value

There are three possible closure return values:

  • A new operator
    In this case, the new operator is substituted for the original operator which was matched by the selector.
    A new query tree will be returned with the new operator substituted.
  • The same operator
    If the script returns |ctx.op| then the tree is left unmodified.
  • null
    If the script returns |null|, then this operator will be REMOVED from the tree.
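
The three outcomes can be illustrated with a stand-alone Groovy sketch. It is only a toy model: QNode and rewrite() are hypothetical names, and the rebuild-on-the-way-up strategy is an assumption made for illustration, not a description of how QPL is implemented.

```groovy
// Toy model only: QNode and rewrite() are hypothetical, not part of QPL.
class QNode {
    String type
    String term
    List<QNode> operands = []
    String toString() {
        type == 'TERM' ? term : type.toLowerCase() + '(' + operands.join(',') + ')'
    }
}

// Post-order rewrite: operands first, then the node itself.
// A null result from 'op' drops that node from its parent's operand list.
def rewrite(QNode n, Closure op) {
    def kids = n.operands.collect { rewrite(it, op) }.findAll { it != null }
    return op(new QNode(type: n.type, term: n.term, operands: kids))
}

def t = { String s -> new QNode(type: 'TERM', term: s) }
def tree = new QNode(type: 'AND', operands: [t('The'), t('Quick'), t('fox')])

def result = rewrite(tree) { node ->
    if (node.type != 'TERM') return node          // same node: left as it is
    if (node.term == 'The') return null           // null: removed from the tree
    if (node.term == 'Quick') return t('quick')   // new node: substituted for the original
    return node
}
println result
```

Here "The" is removed, "Quick" is replaced by a new lower-case term, and "fox" and the enclosing AND are returned as they were.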

The 'ctx' Variable

The "ctx" (i.e. "context") variable is passed to the Groovy closure. It contains contextual information which can be used within the operation.

The fields of "ctx" include the following:

  • ctx.field --> enclosing field
    This is the field specified on the closest ancestor node in the query tree.
  • ctx.window --> enclosing proximity window
    This is the proximity window specified on the closest ancestor node in the query tree.
  • ctx.weight --> enclosing weight
    This is the boost weight specified on the closest ancestor node in the query tree.
  • ctx.op --> The original version of the operator being modified
  • ctx.operands --> The new list of operands
    If the operation occurs after the operands have been processed (in other words, it is not a preProcess() selector), then ctx.operands contains the operands as they stand after processing, including any that were modified.
    If you need access to the original operands, use ctx.op.operands.
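
The difference between ctx.op.operands and ctx.operands can be seen in a stand-alone Groovy sketch. This is a toy model only: QNode and rewrite() are hypothetical, and the context is modelled as a plain map with just the "op" and "operands" entries.

```groovy
// Toy model only: QNode and rewrite() are hypothetical, not part of QPL.
class QNode {
    String type
    String term
    List<QNode> operands = []
}

// Post-order rewrite that passes a small context map to the operation:
//   ctx.op       -> the original node, with its original operands
//   ctx.operands -> the operands after they have been rewritten
def rewrite(QNode n, Closure operation) {
    def newOperands = n.operands.collect { rewrite(it, operation) }
    return operation([op: n, operands: newOperands])
}

def t = { String s -> new QNode(type: 'TERM', term: s) }
def tree = new QNode(type: 'AND', operands: [t('Foo'), t('Bar')])

def seen = [:]
def result = rewrite(tree) { ctx ->
    if (ctx.op.type == 'TERM') return t(ctx.op.term.toLowerCase())
    seen.original  = ctx.op.operands*.term    // operands before processing
    seen.processed = ctx.operands*.term       // operands after processing
    return new QNode(type: ctx.op.type, operands: ctx.operands)
}
println seen
```

When the AND node is reached, the original operands still read Foo and Bar, while the processed operands are already the lower-cased replacements.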

Re-using Transforms

If you have a transform you wish to re-use a number of times, you can create a transformer from your definition and then use it repeatedly in the following manner:

  optimizedAndTransformer = getTransformer(
    [(preProcess(AND, MULTIPLE)):
      {ctx -> 
        def andSubOps = ctx.op.operands.findAll{ it.op == AND };
        if (andSubOps.size() == 0) return ctx.op;
        def newOps = ctx.op.operands.minus(andSubOps);
        andSubOps.each{newOps += it.operands;};
        return ctx.op.setOperands(newOps);
      }
    ]
  );
  andQuery1 = and(and(and('a1','b1'),or('c1','d1'),'e1'),'f1');
  optimizedAndQry1 = optimizedAndTransformer.transform(andQuery1);

Or, using the definition of optimizedAndTransformer from above:

  andQuery2 = and(and(and('a2','b2'),or('c2','d2'),'e2'),'f2');
  optimizedAndQry2 = transform(andQuery2, optimizedAndTransformer);