Regex Splitter (Aspire 2)

From wiki.searchtechnologies.com

Factory Name  com.searchtechnologies.aspire:aspire-tools
subType  regexSplitter
Inputs  AspireObject that has xPath and delimiter elements indicating the input element to split
Outputs  AspireObject

The Regex Metadata Splitter stage parses fields that contain a semicolon-separated list and creates a <val> entry for each item, so that the values can feed multi-value fields. Unlike the default subtype, a value is only moved to the output element if it matches a regular expression.
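The split-and-filter behaviour described above can be sketched in a few lines of Python. This is an illustration only, not Aspire's implementation. The function name split_and_filter is hypothetical, and three details are assumptions: each value is trimmed of surrounding whitespace, empty values are dropped, and the pattern is applied with re.search, so a partial match is enough.

```python
import re
from typing import List


def split_and_filter(text: str, regex: str, delimiter: str = ";") -> List[str]:
    """Hypothetical sketch: split a delimited field and keep only matching values.

    Assumptions (not confirmed by the Aspire documentation): values are
    trimmed, empty values are dropped, and re.search is used, so the
    pattern only has to match part of a value.
    """
    pattern = re.compile(regex)
    values = (part.strip() for part in text.split(delimiter))
    return [value for value in values if value and pattern.search(value)]


if __name__ == "__main__":
    # "Welsh Terms" is made-up input that does not match the pattern.
    field = "Scottish Keywords;Welsh Terms;Scottish Keywords/GAELIC LANGUAGE"
    print(split_and_filter(field, r"^[A-Za-z ]*Keywords"))
```

Values that fail the pattern, such as "Welsh Terms" here, are silently discarded rather than copied to the output element.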

Configuration

Element          Type    Default  Description
xPath            string           The XPath of the element in the AspireObject to split, e.g., //category.
xPath/@regex     string           The regular expression that a split value must match to be moved to the output element.
xPath/@output    string           The name of the output element that will be created in the document.
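The table does not say whether xPath/@regex must match the whole value or only part of it. The example at the end of this page keeps "Scottish Keywords/GAELIC LANGUAGE", even though ^[A-Za-z ]*Keywords cannot match that entire string. This suggests partial-match semantics, as with Java's Matcher.find() rather than matches(), but it is an inference from the example. The Python sketch below only contrasts the two behaviours on those values; it is not Aspire code.

```python
import re

# Values taken from the example input at the end of this page.
VALUES = [
    "Scottish Keywords",
    "Scottish Keywords/GAELIC LANGUAGE",
    "Scottish Keywords/GAELIC LANGUAGE PROGRAMMES",
]
PATTERN = re.compile(r"^[A-Za-z ]*Keywords")

# search() accepts a match anywhere; the leading "^" pins it to the start of the value.
partial = [v for v in VALUES if PATTERN.search(v)]
# fullmatch() requires the pattern to consume the entire value.
full = [v for v in VALUES if PATTERN.fullmatch(v)]

print(partial)  # all three values, as in the example output
print(full)     # only "Scottish Keywords"
```

If whole-value matching is needed under partial-match semantics, end the pattern with $.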

Sample Configuration

 <component name="regexSplitter" subType="regex" factoryName="aspire-splitter">
   <xPath regex="^[A-Za-z ]*Keywords" output="pg_indexterm_classifications">/doc/pgClassifications_expanded</xPath>
   <xPath regex="^[A-Za-z ]*Keywords" output="p_indexterm_classifications">/doc/pClassifications_expanded</xPath>
   <xPath regex="^[A-Za-z ]*Keywords" output="pv_indexterm_classifications">/doc/pvClassifications_expanded</xPath>
   <xPath regex="^[A-Za-z ]*Keywords" output="ma_indexterm_classifications">/doc/maClassifications_expanded</xPath>
 </component>
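Each xPath element in the sample configuration defines one independent rule. The element text gives the field to read, @regex the filter, and @output the element to create. As a hypothetical illustration, the sketch below reads these rules out of a component configuration using Python's xml.etree.ElementTree, not an Aspire API. The Rule type and the load_rules name are invented for the sketch.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Rule(NamedTuple):
    """One xPath entry of the component configuration (hypothetical type)."""
    xpath: str   # element text: where to read the delimited field
    regex: str   # @regex: a value must match this to be kept
    output: str  # @output: name of the element to create


def load_rules(component_xml: str) -> List[Rule]:
    """Collect every xPath child of a <component> element as a Rule."""
    component = ET.fromstring(component_xml.strip())
    return [
        Rule(xpath=(node.text or "").strip(),
             regex=node.get("regex", ""),
             output=node.get("output", ""))
        for node in component.findall("xPath")
    ]


if __name__ == "__main__":
    sample = (
        '<component name="regexSplitter" subType="regex" factoryName="aspire-splitter">'
        '<xPath regex="^[A-Za-z ]*Keywords" output="ma_indexterm_classifications">'
        '/doc/maClassifications_expanded</xPath>'
        '</component>'
    )
    for rule in load_rules(sample):
        print(rule)
```

In the sample above, all four rules share the same regular expression and differ only in the source field and output element.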

Example (processed by the last xPath entry in the sample configuration)

Input fields:

 <maClassifications_expanded source="clasificationExpander">Scottish Keywords;Scottish Keywords/GAELIC LANGUAGE;Scottish Keywords/GAELIC LANGUAGE PROGRAMMES</maClassifications_expanded>

Output fields:

 <ma_indexterm_classifications source="RegexSplitter" tagName="maClassifications_expanded">
   <val>Scottish Keywords</val>
   <val>Scottish Keywords/GAELIC LANGUAGE</val>
   <val>Scottish Keywords/GAELIC LANGUAGE PROGRAMMES</val>
 </ma_indexterm_classifications>
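The example can be reproduced with a small, self-contained Python sketch that applies one rule to a document held in xml.etree.ElementTree. It is not the Aspire implementation, and the apply_rule name is hypothetical. It makes several assumptions:
- Only simple absolute paths such as /doc/maClassifications_expanded are resolved.
- Values are trimmed and empty values are dropped.
- No output element is created when nothing matches.
- The source and tagName attributes are copied from the example output above.

```python
import re
import xml.etree.ElementTree as ET


def apply_rule(doc: ET.Element, xpath: str, regex: str, output: str) -> None:
    """Hypothetical sketch: split the field at `xpath` and append an `output`
    element holding one <val> per value that matches `regex`.

    Only absolute paths of the form /root/child/... are handled.
    """
    parts = xpath.strip("/").split("/")
    if parts[0] != doc.tag:
        return  # the path does not start at this document's root
    sources = doc.findall("/".join(parts[1:])) if len(parts) > 1 else [doc]
    pattern = re.compile(regex)
    for source in sources:
        values = [v.strip() for v in (source.text or "").split(";")]
        matched = [v for v in values if v and pattern.search(v)]
        if not matched:
            continue  # assumption: skip the output element when nothing matches
        out = ET.SubElement(doc, output, source="RegexSplitter", tagName=source.tag)
        for value in matched:
            ET.SubElement(out, "val").text = value


if __name__ == "__main__":
    doc = ET.fromstring(
        '<doc><maClassifications_expanded source="clasificationExpander">'
        "Scottish Keywords;Scottish Keywords/GAELIC LANGUAGE;"
        "Scottish Keywords/GAELIC LANGUAGE PROGRAMMES"
        "</maClassifications_expanded></doc>"
    )
    apply_rule(doc, "/doc/maClassifications_expanded",
               r"^[A-Za-z ]*Keywords", "ma_indexterm_classifications")
    print(ET.tostring(doc.find("ma_indexterm_classifications"), encoding="unicode"))
```

Running it prints an ma_indexterm_classifications element with the same three <val> entries as the output shown above.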