Batch Job (Aspire 2)



A batch job is a job that is sent once a batch is completed. It carries useful information about the batch: error messages, the status and metadata of each job, and custom component data.

It is configured by using the following attributes of the Branch Handler:

  • batchPipelineManager - PipelineManager where the batch job will be sent.
  • batchPipeline - Pipeline where the batch job will be sent.
  • batchStage - Stage where the batch job will be sent.
  • batchMethod - Indicates how the batch job is sent: "process" or "enqueue" (default is "enqueue").

Example configuration:

  <branches>
    <branch event="onPublish" pipelineManager="/PostToSolr/Main" 
      batching="true"
      batchSize="50"
      simultaneousBatches="5"
      batchTimeout="1000"
      batchPipeline="batchCompletedPipeline" 
      batchPipelineManager="/PostToSolr/Main" />
  </branches>


The following is an example of a batch job sent when a batch is successful:

<job>
  <doc>
    <jobs>
      <job id="TestJob3" />
      <job id="TestJob2" />
      <job id="TestJob1" />
    </jobs>
    <messages/>
  </doc>
</job>

The <job> fields can contain custom metadata as needed (for example, to resubmit failed jobs). In addition, every job automatically logs its error, warning, and info messages into that field.

For example, if a <fetchUrl> field is needed inside each <job> field in order to resubmit the failed jobs, add a Groovy script stage with:

job.getBatch().addJobData(job,doc.get("fetchUrl"));

So the batch job will look like:

<job>
  <doc>
    <jobs>
      <job id="TestJob3">
        <fetchUrl>testJobUrl</fetchUrl> <!-- Field added in groovy script stage -->
      </job>
      <job id="TestJob2">
        <fetchUrl>testJobUrl</fetchUrl> <!-- Field added in groovy script stage -->
      </job>
      <job id="TestJob1"> 
        <fetchUrl>testJobUrl</fetchUrl> <!-- Field added in groovy script stage -->
        <messages>      
          <error code="aspire.aStage.anError"> <!-- ******* This job has an error ****** -->
            <message><![CDATA[Error message]]></message>
            <trace><![CDATA[<The exception trace>]]></trace>
          </error>
          <message type="info">An info message</message>
          <message type="warn">A warn message</message>
        </messages>
      </job>
    </jobs>
    <messages />
  </doc>
</job>

Batch-level errors and warnings are logged into doc/messages in the same way that job-level ones are logged into doc/jobs/job.
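
To illustrate how the per-job <messages> and the custom <fetchUrl> field could be used for resubmission, the following is a minimal sketch in plain Java. It assumes the batch job is available as an XML string shaped like the example above. It uses only the JDK's DOM parser, not the Aspire API. The class name BatchJobInspector, the method failedJobUrls, and the sample URLs are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Aspire: inspects a serialized batch job.
class BatchJobInspector {

    // Sample batch job with made-up URLs; only TestJob1 carries an <error>.
    static final String SAMPLE =
        "<job><doc><jobs>"
      + "<job id=\"TestJob2\"><fetchUrl>urlTwo</fetchUrl></job>"
      + "<job id=\"TestJob1\"><fetchUrl>urlOne</fetchUrl>"
      + "<messages><error code=\"aspire.aStage.anError\">"
      + "<message>Error message</message></error></messages></job>"
      + "</jobs><messages/></doc></job>";

    // Returns job id -> fetchUrl for every job under doc/jobs that logged an error.
    static Map<String, String> failedJobUrls(String batchJobXml) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(batchJobXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse batch job XML", e);
        }
        Map<String, String> failed = new LinkedHashMap<>();
        NodeList jobs = dom.getElementsByTagName("job");
        for (int i = 0; i < jobs.getLength(); i++) {
            Element job = (Element) jobs.item(i);
            // The root element is also named <job>; keep only children of <jobs>.
            if (!"jobs".equals(job.getParentNode().getNodeName())) {
                continue;
            }
            // A job counts as failed if its <messages> contain an <error>.
            if (job.getElementsByTagName("error").getLength() == 0) {
                continue;
            }
            NodeList urls = job.getElementsByTagName("fetchUrl");
            String url = urls.getLength() > 0 ? urls.item(0).getTextContent() : null;
            failed.put(job.getAttribute("id"), url);
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(failedJobUrls(SAMPLE)); // prints {TestJob1=urlOne}
    }
}
```

Jobs that only logged info or warn messages are not treated as failed in this sketch. Whether warnings should also trigger a resubmit is a decision for the application.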

Some components that support batching, such as aspire-post-xml, add custom metadata to the batch job, so other fields may appear. For example, with aspire-post-xml:

<job>
  <doc>
    <jobs>
      <job id="TestJob3" />
      <job id="TestJob2" />
      <job id="TestJob1" />
    </jobs>
    <server>http://localhost:8983/solr/update</server> <!-- This is a custom field added by aspire-post-xml to the batch object -->
    <messages/>
  </doc>
</job>
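
A component-specific field such as <server> sits directly under <doc>, next to <jobs> and <messages>. As a rough sketch, it could be read from the serialized batch job as shown below. This again uses only the JDK's DOM parser rather than the Aspire API. The class name BatchDocFields and the method docField are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Aspire: reads batch-level fields under <doc>.
class BatchDocFields {

    // Sample batch job shaped like the aspire-post-xml example above.
    static final String SAMPLE =
        "<job><doc><jobs><job id=\"TestJob1\"/></jobs>"
      + "<server>http://localhost:8983/solr/update</server>"
      + "<messages/></doc></job>";

    // Returns the text of the first direct child of <doc> with the given name,
    // or null if no such child exists.
    static String docField(String batchJobXml, String name) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(batchJobXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse batch job XML", e);
        }
        Node root = dom.getDocumentElement();
        for (Node d = root.getFirstChild(); d != null; d = d.getNextSibling()) {
            if (!"doc".equals(d.getNodeName())) {
                continue;
            }
            // Only direct children are checked, so fields nested inside
            // doc/jobs/job are never mistaken for batch-level fields.
            for (Node c = d.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (name.equals(c.getNodeName())) {
                    return c.getTextContent();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(docField(SAMPLE, "server"));
    }
}
```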

A complete application example is attached to this page. It is meant to index everything inside a directory into a Solr search engine. It handles batch jobs from failed batches.