Job Error Handler (Aspire 2)


This page describes the Aspire 2 version of the component. Separate documentation exists for Aspire 3.1.

Feature deprecated: this feature has been deprecated.
Job Error Handler (Aspire 2)
Factory Name  com.searchtechnologies.aspire:aspire-job-errorhandler
subType  handler
Inputs  Jobs that have failed in other pipelines or stages
Outputs  New jobs submitted directly to a stage

Called from the onError branch of another pipeline, the Job Error Handler component keeps track of failed jobs and, after a period of time, resubmits each one to the stage at which it failed. After a configurable number of attempts, the job is quarantined and not submitted again.

  • Failed jobs are written immediately to disk in a retry directory and resubmitted on startup
  • Jobs may be resubmitted "manually" via the web interface
  • Jobs may be removed from quarantine via the web interface
  • By default this component will throw an exception so the feeder that submitted the job will see the error
  • The feeder can request that the error handler not process its job, or that it resubmit the job to the beginning of the pipeline rather than directly to the stage that failed.

Configuration

Element Type Default Description
raiseException boolean true Raise an exception when processing has finished so a feeder will see an error. This may be disabled so a job passed to the error handler is seen as successful.
retryDelay int 300000 (= 5m) The period (in ms) the handler should wait before resubmitting the job.
resubmitEnabled boolean true Whether jobs will be resubmitted. This can also be controlled from the UI. Usually this should be left as 'true' and only turned off if you know jobs will continue to fail. If 'false' (or turned off from the UI) jobs will continue to be added to the retry queue, but never resubmitted.
maxRetries int 3 The number of times the job is resubmitted before being quarantined.
retrySleep int 1000 (= 1s) The number of milliseconds to sleep between polling the job resubmission queue.
retryDirectory String The directory to which failed jobs on the resubmission queue are written.
quarantineDirectory String The directory to write quarantined jobs to.
jobs/directory A registered directory to display in the interface. Jobs in this directory may be submitted back into the system (similar to the quarantine directory). This may be pointed at the failedJobs directory of a feeder so any failed jobs may be easily resubmitted. Multiple directories are allowed.
jobs/directory/@label String Required The label displayed in the web interface for this directory.
jobs/directory/@path String Required The path to the directory.
jobs/directory/@url String If specified, when displayed in the interface, the label will be hyperlinked to this URL.

Process description

The error handler is a component that resubmits failed jobs to the stage that failed, making the indexing process more resilient. It should be configured in an application.xml file (see System Configuration) of its own, with its own pipeline and manager (see the application.xml file below).

The component is called by configuring the onError branch of other pipelines to jump to this pipeline containing the error handler (see the pipeline configuration below).

On receiving a job, the handler looks for properties on the job indicating the stage, pipeline and pipeline manager that failed; the stage, pipeline and pipeline manager that last caused the job to fail (if it has failed before); and a count of the retries for this job. If the stage that failed differs from the stage that previously caused the job to fail, the retry count is reset. Thus the maximum retries apply to each stage in the pipeline, and the job is only quarantined if it fails the same stage maxRetries times in succession.
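The reset rule described above can be sketched in a few lines of Java. This is only an illustration, not the component's code: the class RetryCounter, the FailurePoint type and the nextCount method are hypothetical names, and the sketch assumes that counting restarts at 1 after a reset, which this page does not state.

```java
import java.util.Objects;

/** Hypothetical illustration of the retry-count reset; not Aspire code. */
class RetryCounter {

    /** Where a job failed: stage, pipeline and pipeline manager. */
    static final class FailurePoint {
        final String stage;
        final String pipeline;
        final String pipelineManager;

        FailurePoint(String stage, String pipeline, String pipelineManager) {
            this.stage = stage;
            this.pipeline = pipeline;
            this.pipelineManager = pipelineManager;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof FailurePoint)) {
                return false;
            }
            FailurePoint that = (FailurePoint) other;
            return stage.equals(that.stage)
                    && pipeline.equals(that.pipeline)
                    && pipelineManager.equals(that.pipelineManager);
        }

        @Override
        public int hashCode() {
            return Objects.hash(stage, pipeline, pipelineManager);
        }
    }

    /**
     * Returns the count to record for the current failure.
     * previous is null when the job has not failed before.
     * ASSUMPTION: counting restarts at 1 after a reset.
     */
    static int nextCount(FailurePoint current, FailurePoint previous, int previousCount) {
        if (previous == null || !previous.equals(current)) {
            return 1; // first failure, or a different stage failed: reset
        }
        return previousCount + 1; // the same stage failed again
    }
}
```

With this model, a job that fails twice at one stage and then fails at a different stage starts again from a count of 1 for the new stage.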

If the retry count is greater than the maximum retries, the job is written to the quarantine directory and an exception is raised (unless the raiseException flag has been set to false).

If the retry count is less than the maximum retries, the job is written to the retry directory and added to a queue of jobs to be resubmitted, and an exception is raised (unless the raiseException flag has been set to false).
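A minimal Java sketch of this decision follows. The names RetryDecision, Outcome, decide and handle are hypothetical, and the exception type is a stand-in. The page only describes the "greater than" and "less than" cases, so the treatment of a count equal to maxRetries here (retry) is an assumption, not documented behaviour.

```java
/** Hypothetical illustration of the quarantine-or-retry decision; not Aspire code. */
class RetryDecision {

    enum Outcome { QUARANTINE, RETRY }

    /**
     * Quarantine when the count exceeds maxRetries, as the text states.
     * ASSUMPTION: a count equal to maxRetries still allows a retry; the
     * page does not say which branch the equal case takes.
     */
    static Outcome decide(int retryCount, int maxRetries) {
        return retryCount > maxRetries ? Outcome.QUARANTINE : Outcome.RETRY;
    }

    /**
     * Decides the outcome, then raises an exception unless raiseException
     * is false, mirroring the flag described in the configuration table.
     * IllegalStateException is a stand-in for whatever the handler throws.
     */
    static Outcome handle(int retryCount, int maxRetries, boolean raiseException) {
        Outcome outcome = decide(retryCount, maxRetries);
        if (raiseException) {
            throw new IllegalStateException("Job failed; outcome: " + outcome);
        }
        return outcome;
    }
}
```

In both branches the exception is what lets the feeder see the failure; with raiseException set to false the job appears successful to the feeder even though it has been queued or quarantined.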

The file written to the quarantine and retry directories contains details of the stage, pipeline and manager, the retry count and the time at which the job was received by the handler. An example of the file is shown under Example Failed Job xml. The file names used are random, unique and unrelated to the job that failed. A job that fails more than once will use a different filename each time it is presented to the error handler.
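One way to obtain file names that are random, unique and unrelated to the job is to use a random UUID, as in the sketch below. This is an assumption made for illustration: the page does not say how the handler generates its names, and the FailedJobFiles class, the write method and the .xml extension are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical illustration of random failed-job file names; not Aspire code. */
class FailedJobFiles {

    /**
     * Writes the failed-job XML to a new file in the given directory and
     * returns the file's path. The name comes from a random UUID, so it is
     * unrelated to the job id and differs on every call.
     * ASSUMPTION: UUID-based names and the ".xml" extension.
     */
    static Path write(Path directory, String failedJobXml) {
        try {
            Files.createDirectories(directory);
            Path file = directory.resolve(UUID.randomUUID() + ".xml");
            Files.write(file, failedJobXml.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling write twice for the same job yields two different files, which matches the statement that a job failing more than once uses a different filename each time.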

A separate thread wakes periodically (as configured by the retrySleep parameter) and examines the queue. Any job that was added to the queue more than retryDelay ms ago is resubmitted to the stage at which it previously failed and removed from the queue. Once the handler receives a job event for the resubmitted job (whether it succeeds or fails), the job's file is deleted from the retry directory. If the job fails again, the onError branch of the original pipeline sends it back to the error handler. If the job cannot be submitted for some reason, it remains on the queue.
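A single polling pass of the kind described above might look like the following sketch. It is not the handler's implementation: RetryQueueSketch, Entry, Submitter and poll are hypothetical names, jobs are represented only by an id string, and the time is passed in explicitly so the pass can be exercised without a real thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of the resubmission queue; not Aspire code. */
class RetryQueueSketch {

    /** A queued job and the time (in ms) at which it was queued. */
    static final class Entry {
        final String jobId;
        final long queuedAtMs;

        Entry(String jobId, long queuedAtMs) {
            this.jobId = jobId;
            this.queuedAtMs = queuedAtMs;
        }
    }

    /** Stand-in for resubmission; returns false if the job could not be submitted. */
    interface Submitter {
        boolean submit(String jobId);
    }

    private final List<Entry> queue = new ArrayList<>();
    private final long retryDelayMs;

    RetryQueueSketch(long retryDelayMs) {
        this.retryDelayMs = retryDelayMs;
    }

    synchronized void add(String jobId, long nowMs) {
        queue.add(new Entry(jobId, nowMs));
    }

    synchronized int size() {
        return queue.size();
    }

    /**
     * One polling pass: resubmits every job queued more than retryDelayMs ago
     * and removes it from the queue. A job whose submission fails stays queued.
     * Returns the number of jobs resubmitted.
     */
    synchronized int poll(long nowMs, Submitter submitter) {
        int resubmitted = 0;
        for (Iterator<Entry> it = queue.iterator(); it.hasNext();) {
            Entry entry = it.next();
            boolean due = nowMs - entry.queuedAtMs > retryDelayMs;
            if (due && submitter.submit(entry.jobId)) {
                it.remove();
                resubmitted++;
            }
        }
        return resubmitted;
    }
}
```

In a running system, a background thread would call poll with the current time and then sleep for retrySleep milliseconds before the next pass.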

Web interface

The component provides a web interface that allows the operator to view the queue and quarantine directories and resubmit and unquarantine items.

Resubmitting items via the web interface

From the web interface, choose the link to view the queue. When the queue is displayed, you can click the links to resubmit the first item or all items in the queue.

Unquarantining items via the web interface

From the web interface, choose the link to view the quarantine directory. When the quarantined items are displayed, you can click the links to resubmit individual items or all items in the directory.

Configuring a feeder to prevent job resubmission

If you specifically want jobs submitted by a feeder to be excluded from resubmission, then set the property on the job as below:

 // Create a job to hold the document
 Job j = new Job(doc,"TEST-1");
 
 // Ask the error handler not to resubmit this job if it fails
 j.setProperty(Job.ERROR_NO_RESUBMIT_PROP, Boolean.TRUE);

Configuring a feeder to resubmit to the beginning of a pipeline

If you want jobs submitted by a feeder to be resubmitted to the start of the pipeline on which they failed (rather than directly to the stage), then set the property on the job as below:

 // Create a job to hold the document
 Job j = new Job(doc,"TEST-1");
 
 // Ask the error handler to resubmit this job to the start of the pipeline
 j.setProperty(Job.ERROR_RESTART_PIPELINE_PROP, Boolean.TRUE);

Resubmission on startup

When the component is reloaded, all jobs that have files in the retry directory will be loaded and added to the queue. They will then be resubmitted.
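The reload step amounts to scanning the retry directory for files, which could be sketched as below. The RetryDirectoryLoader class and pendingFiles method are hypothetical, and the sketch assumes every regular file in the directory represents one queued job; parsing the files and adding the jobs back to the queue is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration of reloading queued jobs on startup; not Aspire code. */
class RetryDirectoryLoader {

    /**
     * Returns the files found in the retry directory, sorted by path.
     * ASSUMPTION: each regular file is one job waiting to be resubmitted.
     * A missing directory simply means there is nothing to reload.
     */
    static List<Path> pendingFiles(Path retryDirectory) {
        if (!Files.isDirectory(retryDirectory)) {
            return Collections.emptyList();
        }
        try (Stream<Path> files = Files.list(retryDirectory)) {
            return files.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```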

Example Configurations

Simple Component Configuration

 <components>
   <component name="errorHandler" subType="handler" factoryName="aspire-job-errorhandler">
     <retryDirectory>data/failed-jobs/retry</retryDirectory>
     <quarantineDirectory>data/failed-jobs/quarantine</quarantineDirectory>
   </component>
 </components>

Complex Component Configuration

 <components>
   <component name="errorHandler" subType="handler" factoryName="aspire-job-errorhandler">
     <retryDirectory>data/failed-jobs/retry</retryDirectory>
     <quarantineDirectory>data/failed-jobs/quarantine</quarantineDirectory>
     <raiseException>false</raiseException>
     <retryDelay>60000</retryDelay>
     <maxRetries>5</maxRetries>
     <retrySleep>5000</retrySleep>
     <jobs>
       <directory path="data/failed-jobs/FullFeeder1" label="FullFeeder1" url="/Feeder/FullFeeder1"/>
       <directory path="data/failed-jobs/FullFeeder2" label="FullFeeder2" url="/Feeder/FullFeeder2"/>
     </jobs>
   </component>
 </components>

Error Handler application.xml

 <?xml version="1.0" encoding="UTF-8"?>
 <application name="error">
   <components>
 
     <component name="errorHandler-pipeline" subType="pipeline" factoryName="aspire-application">
       <components>
         <component name="errorHandler" subType="handler" factoryName="aspire-job-errorhandler">
           <retryDirectory>data/failed-jobs/retry</retryDirectory>
           <quarantineDirectory>data/failed-jobs/quarantine</quarantineDirectory>
         </component>
       </components>
 
       <pipelines>
         <pipeline name="errorHandlerQueue" default="true">
           <stages>
             <stage component="errorHandler" />
           </stages>
         </pipeline>
       </pipelines>
     </component>
   </components>
 </application>

Pipeline Configuration to call the error handler

 <pipelines>
   <pipeline name="add-doc-pipeline" default="true">
     <stages>
       <stage component="getIndexingTerms" />
       <stage component="taxonomyExpander" />
       <stage component="printToFileAdd" />
       <stage component="feed2Solr" />
     </stages>
     <branches>
       <branch event="onError" pipeline="errorHandlerQueue" pipelineManager="/error/errorHandler-pipeline"/>
     </branches>
   </pipeline>
 </pipelines>

Example Failed Job xml

 <failedJob jobId="882dccce-1a3c-38c5-e044-00144fb7b326.882cf0e8-3e19-27bc-e044-00144fb7b326" timestamp="1276652164246">
   <retry count="3"/>
   <stage name="feed2Solr" pipeline="add-doc-pipeline" pipelineManager="/Feeder/pipe-manager"/>
   <doc action="insert">
     <ID source="RDBFeederImpl">882dccce-1a3c-38c5-e044-00144fb7b326.882cf0e8-3e19-27bc-e044-00144fb7b326</ID>
     <fetchUrl source="RDBFeederImpl">882dccce-1a3c-38c5-e044-00144fb7b326.882cf0e8-3e19-27bc-e044-00144fb7b326</fetchUrl>
     <rdbDocumentId source="RDBFeederImpl">882dccce-1a3c-38c5-e044-00144fb7b326.882cf0e8-3e19-27bc-e044-00144fb7b326</rdbDocumentId>
     <ENTITY_UUID source="RDBFeederImpl">882dccce-1a3c-38c5-e044-00144fb7b326.882cf0e8-3e19-27bc-e044-00144fb7b326</ENTITY_UUID>
     <ENTITY_TYPE source="RDBFeederImpl">p</ENTITY_TYPE>
 
 ...
   </doc>
 </failedJob>