File System Staging Repository Connector App-bundle

From wiki.searchtechnologies.com

Latest revision as of 00:08, 9 December 2015

Feature only available with Aspire Enterprise
File System Staging Repository Connector App-bundle
AppBundle Name  File System Staging Repository Connector
Maven Coordinates  com.searchtechnologies.aspire:app-file-repo-connector
Versions  2.2.2
Type Flags  None
Inputs  N/A
Outputs  N/A

The File System Staging Repository Connector reads data from a File System Staging Repository and passes it to Aspire for processing. Like any other connector, it passes submitted jobs to workflow stages, where they are processed by workflow rules. Optionally, text can be extracted from the items in the repository using Apache Tika.
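Whether Tika is used is controlled by the required `enableTextExtract` property listed in the configuration table. The fragment below is a minimal sketch of a connector with extraction switched off, assuming the same `<application>`/`<property>` layout used in this page's configuration example and that every omitted property falls back to its table default.

```xml
<!-- Sketch: connector with Apache Tika text extraction disabled.
     Assumes the <application>/<property> layout of the configuration example. -->
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <!-- false: streams returned from the repository are not passed to Tika -->
  <property name="enableTextExtract">false</property>
</application>
```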

For details on the File System Staging Repository itself, see the File System Staging Repository page; for Staging Repository (Aspire 2) in general, see the Staging Repository (Aspire 2) page.

The bundle uses the following components:


This section lists all configuration parameters available when installing the File System Staging Repository Connector App-bundle.

General Application Configuration

Property Type Default Description
enableTextExtract boolean [Required] If true, any streams returned from the repository will be passed to Apache Tika for text extraction
jms boolean false Enable JMS updates
broker string [Required (JMS)] The JMS broker to connect to
channel string [Required (JMS)] The JMS channel (topic/queue) to connect to
useTopic boolean false The value in the channel is a topic
subJobThreads long 10 The number of threads to process the jobs
subJobQueue long 30 The size of the sub job queue
subJobTimeout long 5m The period to try to put a job on the queue before failing
workflowReloadPeriod long 15m The period after which the workflow will reload
workflowErrorTolerant boolean false When true, workflows continue after encountering an error and complete normally, regardless of which document fields are available
emitStartJob boolean true Emit a startCrawl job when the crawl starts
emitEndJob boolean true Emit an endCrawl job when the crawl stops
fullRecovery full/incremental The type of full recovery crawl
incrementalRecovery full/incremental The type of incremental recovery crawl
batchSize long 50 The maximum number of items submitted to a batch
batchTimeout long 60,000 The time, in ms, before a batch times out
enableAuditing boolean Enable auditing
snapshotDir String snapshots The directory for snapshot files.
debug Boolean false Controls whether debugging is enabled for the application. Debug messages will be written to the log files.
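The sub-job and batching properties above work together: `subJobThreads` workers take jobs from a queue holding up to `subJobQueue` entries, and items are grouped into batches of at most `batchSize` that time out after `batchTimeout` ms. The fragment below is a sketch of adjusting those settings; the numbers are purely illustrative, not recommendations, and it assumes the element layout of the configuration example and that omitted properties keep their table defaults.

```xml
<!-- Illustrative values only; omitted properties keep their table defaults -->
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <property name="enableTextExtract">true</property>
  <!-- 20 worker threads fed from a sub-job queue of up to 60 entries -->
  <property name="subJobThreads">20</property>
  <property name="subJobQueue">60</property>
  <!-- Keep trying to put a job on the queue for up to 10 minutes before failing -->
  <property name="subJobTimeout">10m</property>
  <!-- At most 100 items per batch; a batch times out after 30,000 ms -->
  <property name="batchSize">100</property>
  <property name="batchTimeout">30000</property>
</application>
```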

Configuration Example

To install the application bundle, add the following configuration to the <autoStart> section of the Aspire settings.xml.

<!-- Add this element inside the <autoStart> section of settings.xml; no XML declaration is needed here -->
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
    <property name="enableTextExtract">true</property>
    <property name="jms">true</property>
    <property name="broker">tcp://localhost:61616</property>
    <property name="channel">demoQueue</property>
    <property name="useTopic">true</property>
    <property name="generalConfiguration">true</property>
    <property name="snapshotDir">${dist.data.dir}/${app.name}/snapshots</property>
    <property name="subJobThreads">10</property>
    <property name="subJobQueue">30</property>
    <property name="subJobTimeout">10m</property>
    <property name="workflowReloadPeriod">15s</property>
    <property name="workflowErrorTolerant">false</property>
    <property name="emitStartJob">false</property>
    <property name="emitEndJob">false</property>
    <property name="fullRecovery">incremental</property>
    <property name="incrementalRecovery">incremental</property>
    <property name="batchSize">50</property>
    <property name="batchTimeout">60000</property>
    <property name="enableAuditing">true</property>
    <property name="debug">false</property>
</application>

Note: Any optional property can be removed from the configuration to use the default value described in the table above.
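When JMS updates are enabled, `broker` and `channel` become required, and `useTopic` decides whether the channel is treated as a topic or a queue. The sketch below reuses the broker URL and channel name from the configuration example but leaves `useTopic` at `false`, so `demoQueue` would be consumed as a queue; it assumes the same element layout and that all other properties keep their defaults.

```xml
<!-- Sketch: JMS updates read from a queue rather than a topic -->
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <property name="enableTextExtract">true</property>
  <!-- Enable JMS updates; broker and channel are then required -->
  <property name="jms">true</property>
  <property name="broker">tcp://localhost:61616</property>
  <property name="channel">demoQueue</property>
  <!-- false (the default): the channel value names a queue, not a topic -->
  <property name="useTopic">false</property>
</application>
```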