RDB Row Extractor 0.4

From wiki.searchtechnologies.com
Jump to: navigation, search

For Information on Aspire 3.1 Click Here

Aspire / Aspire Components / RDB Row Extractor

RDB Row Extractor
Description: Adds data from an RDB to an aspire document.
Inputs: Aspire document object and RDB
Outputs: The input AspireDocument with extra information added. This information is taken from the RDB.
Factory: aspire-rdbfeeder
Sub Type: rdbRowExtractor
Object Type: Produces AspireDocument objects.


Operation

This stage takes an Aspire document and exectutes some SQL against an RDB. This data is then added to the document under an output element.

The SQL to execute is held in the configuration file. The configuration allows for parameters to the sql to be taken from the document. The location of the parameter is given as an xpath.

The extractor has three modes: Single Row, Multiple Row and Compress Multiple Rows. By default, Single Row Mode is used.

Single Row

The sql is exectuted and the first row of the results added to the document. Each column of the row will appear as an element under the output element. If there is more than one row, a warning message is issued.

Example document for Single Row mode

 <doc action="insert">
   <ID source="RDBFeederImpl">1</ID>
   <fetchUrl source="RDBFeederImpl">1</fetchUrl>
   .
   .
   <pgTitle>
      <TITLE source="RDBRowExtractor">commission title</TITLE>
      <TITLE_TYPE source="RDBRowExtractor">Group Working Title</TITLE_TYPE>
   </pgTitle>
 </doc>

Multiple Row

The sql is exectuted and the rows of results are added to the document. Each row will appear as an element under the output element. Each column of the row will appear as an element under that sub element.

Example document for Multiple Row mode

 <doc action="insert">
   <ID source="RDBFeederImpl">1</ID>
   <fetchUrl source="RDBFeederImpl">1</fetchUrl>
   .
   .
   <pgTitles>
     <pgTitle>
       <TITLE source="RDBRowExtractor">commission title</TITLE>
       <TITLE_TYPE source="RDBRowExtractor">Group Working Title</TITLE_TYPE>
     </pgTitle>
     <pgTitle>
       <TITLE source="RDBRowExtractor">Another commission title</TITLE>
       <TITLE_TYPE source="RDBRowExtractor">Group Working Title</TITLE_TYPE>
     </pgTitle>
   </pgTitles>
 </doc>

Compress Multiple Rows

The sql is exectuted and the rows of results are added to the document. Each column of each row will be concatenated together using the configured delimiter. The complete results set will appear under the output element.

Example document for Compress Multiple Row mode

 <doc action="insert">
   <ID source="RDBFeederImpl">1</ID>
   <fetchUrl source="RDBFeederImpl">1</fetchUrl>
   .
   .
   <pgIndexingTerm source="RDBRowExtractor">/multimedia/videos/uk/london;/multimedia/sound/uk/london;/multimedia/web;/multimedia/videos/uk/herefordshire/hendre;</pgIndexingTerm>
 </doc>

Configuration

Element Type Default Description
rdbLocation String rdb The component path within the system of the rdb connection pool component. Used for fetching database connections.
output String None The name of the element in the aspire document to write the data.
output/@row String None Turn on Multiple Row mode and use the given element name as the sub element for a row.
output/@rowDelim String None Turn on Compress Multiple Row mode and use the the given character as the delimiter
extract None The SQL to run to extract data. See below.


Extract SQL Configuration

The RDB Row Extractor uses configurable SQL to specify what to add to the document.

Element Type Default Description
extract/sql String None The SQL to be run to extract data. The SQL may include ? as place holders for parameters. These will then be subsituted in in order.
extract/param/@index Integer None The parameter number of this parameter, with parameter one replacing the first ? in the SQL, parameter two replacing the second ? in the SQL etc.
extract/param/@type String Integer The type of this parameter - either String or Integer
extract/param/@docPath String None The xpath in the Aspire document to the element that holds the value for this parameter.

Example Configurations

Simple Configuration

 <component name="EM3getPGTitles" subType="rdbRowExtractor" factoryName="aspire-rdbfeeder">
   <config>
     <rdbLocation>/common/EM3rdb</rdbLocation>
     <output row="pgTitle">pgTitles</output>
     <extract>
       <param index="1" type="Integer" docPath="/doc/PG_ID"/>
       <sql><![CDATA[
           select pgt.title as TITLE,
                  tt.name as TITLE_TYPE
           from   em3.product_group_title pgt,
                  em3.title_type tt
           where  pgt.product_group_id = ?
              and pgt.title_type_id = tt.id
                   ]]></sql>
     </extract>
   </config>
 </component>