FAST Content API Listener (Aspire 2)


FAST Content API Listener (Aspire 2)
Factory Name  com.searchtechnologies.aspire:aspire-fast-listener
subType  default
Inputs  Jobs from the HTTP Feeder containing data sent by Fast clients via the Content API
Outputs  AspireObjects containing responses to be written back to the client, or new Aspire Jobs containing client data, including all name=value pairs from the API, to be processed by subsequent Aspire stages
Feature only available with Aspire Enterprise

The FAST Content API Listener stage allows connectors for Fast servers to input content to Aspire.

Fast clients that use the content API utilize HTTP for communication. The FAST Content API Listener sits behind a number of HTTP Feeders and imitates a Fast server, interpreting the requests and providing the appropriate responses. If the request is to process a batch of documents, the listener will create sub jobs for the documents submitted and publish them into Aspire via the Branch Handler.

Supported Servlets

The listener supports the following three Fast servlet paths. Note that when configuring an application, each path requires its own HTTP Feeder.

/is_master

/is_master is called by the client to establish which server it should communicate with. When it receives an is_master request, the listener returns a value indicating that it is the master server.

/session_factory

/session_factory is called to create, recreate and close sessions via requests to the paths /session_factory/create, /session_factory/recreate and /session_factory/close. On create and recreate requests, the listener receives the collection the connection is for and returns a unique session number to be used in subsequent requests.

/session

/session requests are made via four different paths:

  • /session/ping
    • a ping request - the listener returns an indication that it is alive
  • /session/get_system_ids
    • a request for the available services - the system responds with the configured system ids
  • /session/poll_callbacks
    • a request for the callbacks for a given session - the system responds with the appropriate callbacks (see Callback handling below)
  • /session/process
    • a request to process documents - the listener creates a batch and publishes the passed content (adding to the batch as it goes) to the configured pipeline via the onPublish event. The client is notified of completed batches later, when it polls for callbacks

Callback handling

Fast clients ask for status information about the documents they have submitted. The client can submit a single document or a batch of documents, but each submission (which equates to a call to the servlet /session/process) is a single entity for which status is reported. Thus if a client submits a batch of 20 documents, a callback for that batch will return the status of all 20 documents.

The listener uses one of two methods to gather the information required to satisfy callback requests. With both methods, callbacks are not returned to the client until all documents in the batch have finished processing.

External callbacks

To use external callbacks, you must configure the listener with the path of an Aspire component (such as Post Fast) that supports external callbacks. When a callback request is received, the listener requests the status of the callbacks from the configured component and returns it to the client. This allows the client to obtain the callback (in effect) from the real Fast indexer.
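
As a minimal sketch, the fragment below shows where that path might sit in the listener configuration. The path /myPipeline/myFastPublisher and the pipeline manager ../ProcessFastOperationPM are taken from the example configuration later on this page; the component the path points at is assumed to be a Post Fast stage defined elsewhere in the application.

  <component name="Listener" subType="default" factoryName="aspire-fast-listener">
    <callbacks>
      <!-- Path of the component that supplies callback status (assumed here to be a Post Fast stage) -->
      <component>/myPipeline/myFastPublisher</component>
    </callbacks>
    <branches>
      <branch event="onPublish" pipelineManager="../ProcessFastOperationPM"/>
    </branches>
  </component>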

Document counting

In this method, the listener notes the number of documents in the batch and then listens for the Aspire Job events, counting the number of successful and unsuccessful responses. When each document in the batch has been accounted for, the callback can be returned (when requested by the client). If there were no failures in the Aspire processing (i.e. all documents returned successful job events), the batch is noted as successful in the callback response. Otherwise, the batch is noted as failed.
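
A configuration for this method might look like the sketch below. It assumes that leaving out <callbacks>/<component> is what selects document counting, which this page implies but does not state outright, and that <processing> and <indexing> nest under <callbacks> as the element paths in the Configuration table suggest. The values shown are the documented defaults.

  <component name="Listener" subType="default" factoryName="aspire-fast-listener">
    <callbacks>
      <!-- No <component> element: assumed to make the listener count job events itself -->
      <!-- Callbacks sent for a completed batch when the client asks for processing callbacks -->
      <processing>secured,completed</processing>
      <!-- Callbacks sent for a completed batch when the client asks for indexing callbacks -->
      <indexing>secured,completed</indexing>
    </callbacks>
    <branches>
      <branch event="onPublish" pipelineManager="../ProcessFastOperationPM"/>
    </branches>
  </component>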

Configuration

Element Type Default Description
systemIds String processing:0:1,indexing:1:1 The system ids passed back to Fast clients.
timeout long 60 (seconds) The timeout passed back to Fast clients.
callbacks/component String The Aspire component that will provide status information that can be used when callback requests are received from the Fast client. Currently, only Post Fast components can supply this data.
callbacks/processing String secured,completed A string containing a list of callbacks that should be sent to the client when it asks for processing callbacks for a batch (assuming the batch is complete).
callbacks/indexing String secured,completed A string containing a list of callbacks that should be sent to the client when it asks for indexing callbacks for a batch (assuming the batch is complete).
contentTag String If specified, content from the Fast API is written to the document under this child of the <doc> element. If not specified, the content is written directly under the root <doc> element.
onTerminate String ERROR Used to choose which callback is sent to the client when a job is terminated in the Aspire pipeline. The options are:
  • SUCCESS
    • returns no error
  • ERROR
    • returns a processing error
  • DROP
    • returns a document dropped callback
  • CUSTOM
    • returns a processing error, unless a document attribute "fastCallback" is set. See below.
waitForSubJobsTimeout long 600 (seconds) The timeout used by the stage when waiting for sub jobs to complete.
branches None The configuration of the pipeline to publish to. See below.
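
To show how the elements above fit together, here is a sketch of a listener configuration that spells out the documented defaults. The contentTag value fastContent is a made-up name for illustration, and writing the timeouts as bare numbers of seconds is an assumption based on the "(seconds)" note in the table.

  <component name="Listener" subType="default" factoryName="aspire-fast-listener">
    <!-- System ids passed back to Fast clients (documented default) -->
    <systemIds>processing:0:1,indexing:1:1</systemIds>
    <!-- Timeout passed back to Fast clients, assumed to be given in seconds -->
    <timeout>60</timeout>
    <!-- Hypothetical tag name: Fast content would be written under <doc>/<fastContent> -->
    <contentTag>fastContent</contentTag>
    <!-- Callback sent when a job is terminated: SUCCESS, ERROR, DROP or CUSTOM -->
    <onTerminate>ERROR</onTerminate>
    <!-- How long the stage waits for sub jobs to complete, assumed to be given in seconds -->
    <waitForSubJobsTimeout>600</waitForSubJobsTimeout>
    <branches>
      <branch event="onPublish" pipelineManager="../ProcessFastOperationPM"/>
    </branches>
  </component>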

Branch Configuration

The FAST Content API Listener publishes jobs using the Branch Handler. It uses the onPublish event, so you need to include a <branches> element in the configuration to publish to a pipeline within a pipeline manager. See Branch Handler for more details.

Element Type Description
branches/branch/@event String The event to configure. At the very least, you should include the onPublish event.
branches/branch/@pipelineManager String The URL of the pipeline manager to publish to. Can be relative.
branches/branch/@pipeline String The name of the pipeline to publish to.
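
For instance, a <branches> element using all three attributes might look like the sketch below. The pipeline manager ProcessFastOperationPM and the pipeline process-doc are the names used in the example application at the end of this page; substitute the names from your own application.

  <branches>
    <!-- Publish each document sub job to the process-doc pipeline of a sibling pipeline manager -->
    <branch event="onPublish" pipelineManager="../ProcessFastOperationPM" pipeline="process-doc"/>
  </branches>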

Returning Custom Callbacks

When jobs fail (with an exception) or are terminated in the Aspire pipeline, it is possible to return custom callbacks to the FAST client.

Errors

Normally, exceptions in Aspire cause a processing error callback to be sent to the FAST client. The message from the exception is added to the error description:

 10:54:20,929 ERROR [DocumentumConnector] Processing Error[document id="09de75d180007187" code="999"
   description="Document "09de75d180007187" failed during Aspire processing (batch: 3_11_20, opId: 13) >>> Exception: Division by zero ### Exception whilst running script: Rule: TEST"
   suggested action="DROP" component="processing" processor="Aspire"]

However, you can choose what type of callback is sent by setting a fastCallback attribute on the document associated with the job that failed. Valid options are:

  • Success
  • Document Dropped
  • Document Error
  • Format Error
  • Indexing Error
  • Invalid Content
  • Operation Lost
  • Processing Error
  • Resource Error
  • Server Unavailable
  • UTF8 Error
  • Unknown Document
  • Xml Error

NOTE: The values are case and white space sensitive and must appear exactly as above. Choosing a value of Success will cause a successful callback to be sent, and no error is reported to the user.

You can also add a custom error message by setting a fastCallbackMessage attribute on the document. This text is then added to the error description before the Exception:

 10:54:21,734 ERROR [DocumentumConnector] Processing Error[document id="09de75d180007187" code="999"
   description="Document "09de75d180007187" failed during Aspire processing (batch: 3_11_20, opId: 13) - exception raised on purpose >>> Exception: Division by zero ### Exception whilst running script: Rule: TEST"
   suggested action="DROP" component="processing" processor="Aspire"]
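
As an illustration only, a document carrying both attributes might look like the fragment below when viewed as XML. This assumes that document attributes appear as XML attributes on the root <doc> element; how they are set depends on the stage that handles the failure and is not shown here. The callback value Document Error is one of the valid options listed above, and the message text is the one from the log example.

  <!-- Assumed XML form of a failed job's document with a custom callback and message -->
  <doc fastCallback="Document Error" fastCallbackMessage="exception raised on purpose">
    <!-- document content -->
  </doc>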

Terminations

The type of callback sent when a job is terminated in the Aspire pipeline is defined using the <onTerminate> option. This defaults to ERROR which causes a Processing Error to be returned. When this option is set to CUSTOM, the fastCallback and fastCallbackMessage attributes may be set to define the callback and message sent to the client.

 10:54:20,752 ERROR [DocumentumConnector] Document Dropped[document id="09de75d180002e9d" code="998"
   description="Document "09de75d180002e9d" was terminated during Aspire processing (batch: 3_1_10, opId: 4) - job terminated on purpose"
   suggested action="DROP" component="processing" processor="Aspire"]

Valid values for the callback are the same as for errors. Choosing Success will cause a successful callback to be sent, and no message will be reported to the user.
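
A sketch of a listener configured for custom termination callbacks is shown below. Only the <onTerminate> element differs from the default behaviour; the rest mirrors the example configuration on this page. With this setting, a terminated job whose document has fastCallback set to Document Dropped would be expected to produce a callback like the one in the log extract above.

  <component name="Listener" subType="default" factoryName="aspire-fast-listener">
    <!-- Use the fastCallback and fastCallbackMessage attributes of the terminated job's document -->
    <onTerminate>CUSTOM</onTerminate>
    <branches>
      <branch event="onPublish" pipelineManager="../ProcessFastOperationPM"/>
    </branches>
  </component>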

Example configuration

Below is an example configuration:

  <component name="Listener" subType="default" factoryName="aspire-fast-listener">
    <debug>true</debug>
    <systemIds>processing:0:1,indexing:1:1</systemIds>
    <callbacks>
      <component>/myPipeline/myFastPublisher</component>
    </callbacks>
    <branches>
      <branch event="onPublish" pipelineManager="../ProcessFastOperationPM"/>
    </branches>
  </component>

Example application

Below is an application configuration that utilises the HTTP Feeder and FAST Content API Listener to imitate a Fast indexing server:

<application name="FastCAPIListener">
  
  <components>
    <!-- FIRST FEEDER: Handles /is_master -->
    <component name="IsMasterHttpFeeder" factoryName="aspire-http-feeder" subType="default">
      <servletName>is_master</servletName>
      <outputMime>application/octet-stream</outputMime>
      <branches>
        <branch event="onPublish" pipelineManager="CAPIListenerPM" pipeline="listener"/>
      </branches> 
    </component>
   
    <!-- SECOND FEEDER: Handles /session_factory --> 
    <component name="SessionFactoryHttpFeeder" factoryName="aspire-http-feeder" subType="default">
      <servletName>session_factory</servletName>
      <outputMime>application/octet-stream</outputMime>
      <branches>
        <branch event="onPublish" pipelineManager="CAPIListenerPM" pipeline="listener"/>
      </branches> 
    </component>

    <!-- THIRD FEEDER: Handles /session  --> 
    <component name="SessionHttpFeeder" factoryName="aspire-http-feeder" subType="default">
      <servletName>session</servletName>
      <XMLContent>true</XMLContent>
      <outputMime>application/octet-stream</outputMime>
      <branches>
        <branch event="onPublish" pipelineManager="CAPIListenerPM" pipeline="listener"/>
      </branches> 
    </component>

   
    <component name="CAPIListenerPM" subType="pipeline"  factoryName="aspire-application">
      <pipelines>
        <pipeline name="listener" default="true">
          <stages>
            <stage component="Listener" />
          </stages>
        </pipeline>
      </pipelines>
        
      <components>
        <component name="Listener" subType="default" factoryName="aspire-fast-listener">
          <callbacks>
            <component>/myPipeline/myFastPublisher</component>
          </callbacks>
          <branches>
            <branch event="onPublish" pipelineManager="../ProcessFastOperationPM"/>
          </branches>
        </component>
      </components>
    </component>
    
    
    <component name="ProcessFastOperationPM" subType="pipeline"  factoryName="aspire-application">
      <pipelines>
        <pipeline name="process-doc" default="true">
          <stages>
            <stage component="ReceivedJobLogger" />
            <!--

              Add stages to do your processing here

            -->
          </stages>
        </pipeline>
      </pipelines>
        
      <components>
        <component name="ReceivedJobLogger" subType="jobLogger" factoryName="aspire-tools">
          <debug>${debug}</debug>
          <logFile>log/${app.name}/processed.jobs</logFile>
        </component>

        <!--

          Add components to be called from your pipeline here

        -->
      </components>
    </component>
    
  </components>
</application>