Read Only Hash Table Lookup (Aspire 2)


For information on Aspire 3.1, see the Aspire 3.1 version of this page.


Read Only Hash Table Lookup (Aspire 2)
Factory Name  com.searchtechnologies.aspire:aspire-hash-table
subType  readOnlyAttribute
Inputs  AspireObject (when used as a pipeline stage)
Outputs  Attributes on the target element

The Read Only Hash Table Lookup component loads an in-memory hash table for fast lookups and adds the retrieved data to the document being processed. The hash table is loaded automatically on start-up from a tabular file or a relational database select. When used as a pipeline stage, the component takes the key from an existing XML element, looks up the entry in the hash table, and then maps the elements of the entry's value array to attributes of the target element in the document being processed.


Configuration

Element Type Default Description
initialSize int 10000 (10 thousand) The estimated initial size for the hash table, used to specify its initial capacity. It is best to set this value large enough to contain all of the expected entries in the hash table. This will prevent additional hash table allocations and rehashing.
initializeFromTabularFile boolean false Set this flag to true if you are initializing the hash table from a tabular file (i.e. a comma-separated or tab-separated file).
fileName string none (requires initializeFromTabularFile = true) The file name where the tabular file can be located. If a relative path is specified, this is assumed to be relative to Aspire Home.
separator string tab (requires initializeFromTabularFile = true) This is either "comma", "tab" or a single character to specify the separator used for columns in the file. If a CSV file, use "comma".

Tabular files follow the Microsoft Excel conventions for specifying data. Specifically, data entries with embedded commas or tabs must be surrounded by double quotes, and a double-quote character inside a data entry must be escaped by writing it as a pair of double quotes.

Finally, if you want to use some other separator (for example, the pipe character or vertical bar, |, is popular), you can specify that single character in the <separator> tag as well.

hasColumnLabels boolean false (requires initializeFromTabularFile = true) Set this flag to true if the first row of the tabular file contains column labels.
keyColumn string column0 (requires initializeFromTabularFile = true) The name of the tabular file column which will be used for the hash table key.

If <hasColumnLabels> = false, then the column labels will be numbered starting with 1, as in "column1", "column2", "column3", etc.

<keyColumn> is also available when loading the hash table from the RDB. See below.

valueMap Nested list of <column label=""/> tags include all columns in the order in which they occur (requires initializeFromTabularFile = true) The value map parent tag allows users to choose exactly which columns are stored in the hash table (controlling memory usage) and the order of the columns in the value array.

Inside <valueMap>, list the desired columns with nested <column label=""> tags. Only columns specified in the value map are stored in the hash table, and the values in the hash table appear in the same order as the <column> tags inside the value map.

Column labels are either the labels specified in the file (if <hasColumnLabels> is true) or "column1", "column2", "column3", etc. otherwise.

initializeFromSQL boolean false Set this flag to true if you are initializing the hash table from a SQL select statement.
connectionPoolName string none (requires initializeFromSQL = true) The Aspire component name of the RDBMS Connection component which maintains the pool of RDB connections for the database to be queried.
sqlQuery string none (requires initializeFromSQL = true) The SQL query to use to access the data from the RDBMS to load the hash table. The order of the columns in the SQL table will be maintained in the list of values stored in the hash table.
keyColumn string none (requires initializeFromSQL = true) The name of the SQL column from the "sqlQuery" query which will be used for the hash table key.
targetElement string none (when used as a pipeline stage) The XML element from the document being processed which will be used as the key to look up the entry in the hash table. NOTE: Multiple target elements are allowed.
metadataMap Metadata Mapper none Specifies the mapping of fields or columns from the original hash table

Example Configurations

 <component name="refGetLifeCycleText" subType="readOnlyAttribute" factoryName="aspire-hash-table">
   <initializeFromSQL>true</initializeFromSQL>
   <connectionPoolName>/rdbConnections/reference</connectionPoolName>
   <targetElement>/doc/STATE_ID</targetElement>
   <targetElement>/doc/PG_STATE_ID</targetElement>
   <targetElement>/doc/P_STATE_ID</targetElement>
   <targetElement>/doc/PV_STATE_ID</targetElement>
   <targetElement>/doc/MA_STATE_ID</targetElement>
   <targetElement>/doc/MAR_STATE_ID</targetElement>
   <sqlQuery>
      <![CDATA[
         SELECT
             id   AS ID,
             name AS NAME
         FROM
             REF.ref_entity_state_type
       ]]>
   </sqlQuery>
   <keyColumn>id</keyColumn>
 </component>
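
The tabular-file options described in the Configuration table can be combined in the same way. The following is a minimal sketch only, not a tested configuration: the component name "stateNameLookup", the file path data/states.csv, and the column labels id, name and description are hypothetical and chosen purely for illustration. The sample rows in the comment show the quoting rules for embedded commas and double quotes.

```xml
<!-- Hypothetical sketch: component name, file path and column labels are assumptions. -->
<!-- Assumed contents of data/states.csv (first row holds the column labels):
       id,name,description
       1,Created,"Entered, not yet reviewed"
       2,Approved,"Marked ""final"" by a reviewer"
-->
<component name="stateNameLookup" subType="readOnlyAttribute" factoryName="aspire-hash-table">
  <!-- Large enough for all expected rows, to avoid reallocation and rehashing -->
  <initialSize>1000</initialSize>
  <initializeFromTabularFile>true</initializeFromTabularFile>
  <!-- A relative path is taken to be relative to Aspire Home -->
  <fileName>data/states.csv</fileName>
  <separator>comma</separator>
  <hasColumnLabels>true</hasColumnLabels>
  <!-- Column whose value becomes the hash table key -->
  <keyColumn>id</keyColumn>
  <!-- Only the listed columns are stored, in the order given here -->
  <valueMap>
    <column label="name"/>
    <column label="description"/>
  </valueMap>
  <!-- Element in the processed document whose content is used as the lookup key -->
  <targetElement>/doc/STATE_ID</targetElement>
</component>
```

By analogy with the SQL example, an element such as /doc/STATE_ID containing 1 would presumably receive one attribute per stored column, although the exact attribute names produced for a tabular file are an assumption here and should be checked against a running system.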

Also see the examples for the default subtype

For example, the SQL-based configuration above transforms the input fields as follows.

Input fields:

 <PG_STATE_ID source="RDBFeederImpl"><![CDATA[1]]></PG_STATE_ID>
 <P_STATE_ID source="RDBFeederImpl"><![CDATA[1]]></P_STATE_ID>
 <PV_STATE_ID source="RDBFeederImpl"><![CDATA[1]]></PV_STATE_ID>
 <MA_STATE_ID source="RDBFeederImpl"><![CDATA[1]]></MA_STATE_ID>
 <MAR_STATE_ID source="RDBFeederImpl"><![CDATA[1]]></MAR_STATE_ID>

Output fields:

 <PG_STATE_ID NAME="Created" source="RDBFeederImpl"><![CDATA[1]]></PG_STATE_ID>
 <P_STATE_ID NAME="Created" source="RDBFeederImpl"><![CDATA[1]]></P_STATE_ID>
 <PV_STATE_ID NAME="Created" source="RDBFeederImpl"><![CDATA[1]]></PV_STATE_ID>
 <MA_STATE_ID NAME="Created" source="RDBFeederImpl"><![CDATA[1]]></MA_STATE_ID>
 <MAR_STATE_ID NAME="Created" source="RDBFeederImpl"><![CDATA[1]]></MAR_STATE_ID>