XML Loader (Aspire 2)


This page covers Aspire 2; a separate page documents this component for Aspire 3.1.


XML Loader (Aspire 2)
Factory Name  com.searchtechnologies.aspire:aspire-xml-files (previously aspire.XML)
subType  loadXML
Inputs  Either object['contentStream'] (an InputStream which contains the XML file to be loaded) or object['contentBytes'] (an array of bytes which contains the XML file to be loaded).
Outputs  The XML read from the content stream or bytes is loaded into memory and stored as a sub-element of the <doc> element, the root of the AspireObject attached to the job.

The Load XML stage loads XML from a stream into the job's AspireObject. The XML is loaded as a sub-element. This stage is typically used after a Fetch URL stage (which creates the stream).


Configuration

Element Type Default Description
localResourceDir string null The directory on the local system where DTD files and other required XML resources are located. This directory is consulted for DTD files before they are fetched over the web, which also makes resolving them faster. It often works better for large and complex files from third parties, and on machines that are not connected to the internet (e.g., behind a firewall). If null (the default), DTD files are always fetched over the internet.
cleanse boolean false Set to true to strip non-readable characters (e.g., ASCII code 15) from the XML content.
encoding string null Specifies the XML character encoding to use. It is applied when reading any XML file whose encoding cannot be determined automatically from the input XML stream.

Example Configuration

Simple

 <component name="LoadXML" subType="loadXML" factoryName="aspire-xml-files"/>

With Locally Stored DTDs

Use this version if the XML file references DTDs that you cannot access over the internet.

  <component name="LoadXML" subType="loadXML" factoryName="aspire-xml-files">
    <localResourceDir>resources/dtds</localResourceDir>
  </component>
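
With Cleansing and an Explicit Encoding

The optional cleanse and encoding elements from the Configuration table can presumably be set the same way as localResourceDir. The following is a minimal sketch, assuming both are written as child elements of the component and that the boolean is spelled "true"; the UTF-8 value is only an illustrative choice and is used only when the encoding cannot be detected from the stream.

  <component name="LoadXML" subType="loadXML" factoryName="aspire-xml-files">
    <!-- Assumed illustrative values; adjust them to the XML being loaded -->
    <cleanse>true</cleanse>
    <encoding>UTF-8</encoding>
  </component>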

Example Use Within A Pipeline

  <pipeline name="process-feedOne-test">
    <stages>
      <stage component="FetchUrl" />
      <stage component="LoadXML" />
    </stages>
  </pipeline>

Example

In the following example, suppose that the URL "file:test.xml" points to a file which contains the following:

<testRootNode>
  <speech name="George Washington">The period for a new election of a citizen, 
    to administer the executive government of the United States, being not far distant, 
    and the time actually arrived...
  </speech>
  <speech name="Abraham Lincoln">Four score and seven years ago our forefathers 
    brought forth upon this country...
  </speech>
  <speech name="Thomas Jefferson">We hold these truths to be self-evident, 
    that all men are created equal, that they are endowed by their Creator 
    with certain unalienable Rights, that among these are Life, Liberty and 
    the pursuit of Happiness...
  </speech>
</testRootNode>

Further suppose that "file:test.xml" is read by the Fetch URL stage. After the Load XML stage executes, the AspireObject will contain the following structure. Notice how <testRootNode> is nested within the <doc> node, which is the root node of the AspireObject.

<doc>
  <fetchUrl>file:test.xml</fetchUrl>
  <protocol source="FetchURLStage/protocol">file</protocol>
  <mimeType source="FetchURLStage/mimeType">application/xml</mimeType>
  <extension source="FetchURLStage">
    <field name="modificationDate">2009-12-06T05:06:06Z</field>
    <field name="content-type">application/xml</field>
    <field name="content-length">618</field>
    <field name="last-modified">Sun, 06 Dec 2009 05:06:06 GMT</field>
  </extension>
  <testRootNode>
    <speech name="George Washington">The period for a new election of a citizen, 
      to administer the executive government of the United States, being not far distant, 
      and the time actually arrived...
    </speech>
    <speech name="Abraham Lincoln">Four score and seven years ago our forefathers 
      brought forth upon this country...
    </speech>
    <speech name="Thomas Jefferson">We hold these truths to be self-evident, 
      that all men are created equal, that they are endowed by their Creator 
      with certain unalienable Rights, that among these are Life, Liberty and 
      the pursuit of Happiness...
    </speech>
  </testRootNode>
</doc>