Field Mapper (Aspire 2)

This page covers Aspire 2. For information on Aspire 3.1, see the Aspire 3.1 version of this page.

The Field Mapper component takes one or more source fields and maps the resulting value into a destination field. Six mapping mechanisms are available: Simple Mapping, Multiple Source Mapping, Template Mapping, Constant Mapping, Regular Expression Mapping, and Date Format Mapping.

Training Material

If you're interested in learning more, a recording of the Tech Talk on Field Mapping is available, along with its slides: Field Mapping video, Field Mapping slides.

Configuration

All mappings done through this component will be added to the AspireObject under the <searchFields> tag.


Element    Type      Default   Description
mappings   xml       none      XML that contains all the mappings.
debug      boolean   false     When true, debug logs are printed.


Mappings Configuration

Simple Mapping

Takes the value inside the source field and maps it to the destination field.

Element            Type     Description
sourceField        string   The source field of the mapping.
destinationField   string   Destination field of the mapping.


Multiple Source Mapping

Takes multiple source fields and maps their content into a destination field. This type of mapping has two modes: Fallback and Concatenate.

  • Fallback Mode: Maps the first available field from the given source field list.
  • Concatenate Mode: Appends all source field values into one string. There are three types of concatenation:
    • Blank Space: separates each value using a whitespace character.
    • Multivalue field: separates each value using a semicolon (;).
    • Custom separator: separates each value using a custom separator string.


Element                    Type     Description
sourceFields/sourceField   string   List of source fields.
destinationField           string   Destination field of the mapping.
multipleMapType            string   Mapping mode. Can be concatenationMode or fallbackMode.
concatenationType          string   Concatenation type. Can be separatorString, multivalued, or default.
separator                  string   String to use as the custom separator. Applies to the separatorString concatenation type.
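
As an illustration of the custom separator option, the fragment below is a minimal sketch that concatenates two fields with a pipe character. It assumes the separator element sits inside the mapping next to concatenationType, following the nesting used in the Configuration Example; sourceDescription is a hypothetical destination field name chosen for this sketch.

```xml
<multipleSourceMappings>
  <mapping>
    <sourceFields>
      <sourceField>sourceName</sourceField>
      <sourceField>sourceType</sourceField>
    </sourceFields>
    <!-- sourceDescription is a hypothetical destination field name -->
    <destinationField>sourceDescription</destinationField>
    <multipleMapType>concatenationMode</multipleMapType>
    <concatenationType>separatorString</concatenationType>
    <!-- Assumed placement: separator as a sibling of concatenationType -->
    <separator>|</separator>
  </mapping>
</multipleSourceMappings>
```

The separator element should only matter when concatenationType is separatorString, per the table above.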


Template Mapping

Takes a Groovy template, replaces the field names in it with their values, and maps the result to a destination field.

Element            Type     Description
sourceTemplate     string   Groovy template to map multiple fields (e.g. "Hello: ${title}, by ${author}").
destinationField   string   Destination field of the mapping.


Constant Mapping

Sets the destination field to a constant value.

Element            Type     Description
constantValue      string   The constant value to set on the destination field.
destinationField   string   Destination field of the mapping.


Regular Expression Mapping

Takes the source field value and tries to match it against a regular expression pattern. There are two modes: Replace and Extract.

  • Replace: Replaces the matching string with a new value.
  • Extract: Extracts the matching string and sets it on the destination field.


Element                        Type     Description
sourceField                    string   The source field of the mapping.
destinationField               string   Destination field of the mapping.
regularExpressionMappingType   string   Regex mode. Can be extract or replace.
regex                          string   Regular expression to match against the source field value (e.g. \.(?<=\.).*$).
replaceValue                   string   Replacement for the string that matches the regular expression. Used in replace mode.
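
The Configuration Example on this page only shows extract mode, so here is a minimal sketch of a replace-mode mapping. It is built from the elements in the table above and assumes the same nesting as the extract example; urlNoSpaces is a hypothetical destination field name. This page does not state whether every match or only the first one is replaced, so the exact result is left open.

```xml
<regularExpressionMappings>
  <mapping>
    <sourceField>url</sourceField>
    <!-- urlNoSpaces is a hypothetical destination field name -->
    <destinationField>urlNoSpaces</destinationField>
    <regularExpressionMappingType>replace</regularExpressionMappingType>
    <!-- Matches runs of whitespace in the source value -->
    <regex>\s+</regex>
    <!-- Text substituted for the matched string -->
    <replaceValue>_</replaceValue>
  </mapping>
</regularExpressionMappings>
```

Because the configuration is XML, any regular expression containing < or & needs to be escaped (for example &lt;) or wrapped in a CDATA section.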


Date Format Mapping

Takes the source field date value, formats it using an output format and sets the new value into the destination field.

Element            Type     Description
sourceField        string   The source field of the mapping.
destinationField   string   Destination field of the mapping.
inputFormat        string   The date format of the source field value (e.g. yyyy-MM-dd'T'HH:mm:ss'Z').
outputFormat       string   The date format of the destination field value (e.g. yyyy-MM-dd).


Configuration Example

<component name="FieldMapper" subType="default" factoryName="aspire-field-mapper">
  <mappings>
    <simpleMappings>
      <mapping>
        <sourceField>repItemType</sourceField>
        <destinationField>docTypeField</destinationField>
      </mapping>
    </simpleMappings>
    <multipleSourceMappings>
      <mapping>
        <sourceFields>
          <sourceField>url</sourceField>
          <sourceField>lastModified</sourceField>
          <sourceField>dataSize</sourceField>
        </sourceFields>
        <destinationField>concatField</destinationField>
        <multipleMapType>concatenationMode</multipleMapType>
        <concatenationType>multivalued</concatenationType>
      </mapping>
      <mapping>
        <sourceFields>
          <sourceField>fieldA</sourceField>
          <sourceField>FetchUrl</sourceField>
          <sourceField>lastModified</sourceField>
          <sourceField>fieldB</sourceField>
        </sourceFields>
        <destinationField>fallbackField</destinationField>
        <multipleMapType>fallbackMode</multipleMapType>
      </mapping>
    </multipleSourceMappings>
    <templateMappings>
      <mapping>
        <sourceTemplate>Source: ${sourceName} Type: ${sourceType}</sourceTemplate>
        <destinationField>templateField</destinationField>
      </mapping>
    </templateMappings>
    <constantMappings>
      <mapping>
        <destinationField>constantField</destinationField>
        <constantValue>constantValueField</constantValue>
      </mapping>
    </constantMappings>
    <regularExpressionMappings>
      <mapping>
        <sourceField>url</sourceField>
        <destinationField>regexField</destinationField>
        <regularExpressionMappingType>extract</regularExpressionMappingType>
        <regex>\.(?&lt;=\.).*$</regex>
      </mapping>
    </regularExpressionMappings>
    <dateFormatMappings>
      <mapping>
        <sourceField>lastModified</sourceField>
        <inputFormat>yyyy-MM-dd'T'HH:mm:ss'Z'</inputFormat>
        <destinationField>simpleModified</destinationField>
        <outputFormat>yyyy-MM-dd</outputFormat>
      </mapping>
    </dateFormatMappings>
  </mappings>
  <debug>false</debug>
</component>

Output Document Example

<doc>
  <url>C:\testdata\Search Engine Security Blog.docx</url>
  <snapshotUrl>002 C:\testdata\Search Engine Security Blog.docx</snapshotUrl>
  <docType>item</docType>
  <repItemType>aspire/file</repItemType>
  <fetchUrl>file:/C:/testdata/Search%20Engine%20Security%20Blog.docx</fetchUrl>
  <displayUrl>C:\testdata\Search Engine Security Blog.docx</displayUrl>
  <id>C:\testdata\Search Engine Security Blog.docx</id>
  <lastModified>2013-06-10T21:38:24Z</lastModified>
  <dataSize>194425</dataSize>
  <sourceName>FieldMapper-FSTest</sourceName>
  <sourceType>filesystem</sourceType>
  <connectorSource type="filesystem">
    <url>C:\testdata</url>
    <partialScan>false</partialScan>
    <subDirUrl />
    <indexContainers>false</indexContainers>
    <scanRecursively>false</scanRecursively>
    <useACLs>false</useACLs>
    <acls />
    <scanExcludedItems>false</scanExcludedItems>
    <fileNamePatterns />
    <displayName>FieldMapper-FSTest</displayName>
  </connectorSource>
  <action>add</action>
  <hierarchy>
    <item id="4CD52EBD58F51B94364D1CC77D878910" level="2" name="Search Engine Security Blog.docx" 
     url="C:\testdata\Search Engine Security Blog.docx">
      <ancestors>
        <ancestor id="7C070A1B17F7736EF883435C5AC053E2" level="1" name="FieldMapper-FSTest" parent="true" 
         type="aspire/filesystem" url="C:\testdata\" />
      </ancestors>
    </item>
  </hierarchy>
  <protocol source="FetchURLStage/protocol">file</protocol>
  <mimeType source="FetchURLStage/mimeType">content/unknown</mimeType>
  <extension source="FetchURLStage">
    <field name="modificationDate">2013-06-10T21:38:24Z</field>
    <field name="content-length">194425</field>
    <field name="last-modified">Mon, 10 Jun 2013 21:38:24 GMT</field>
    <field name="content-type">content/unknown</field>
  </extension>
  <searchFields>
    <constantField>constantValueField</constantField>
    <simpleModified>2013-06-10</simpleModified>
    <concatField>C:\testdata\Search Engine Security Blog.docx,2013-06-10T21:38:24Z,194425</concatField>
    <fallbackField>2013-06-10T21:38:24Z</fallbackField>
    <regexField>.docx</regexField>
    <docTypeField>aspire/file</docTypeField>
    <templateField>Source: FieldMapper-FSTest Type: filesystem</templateField>
  </searchFields>
</doc>