Aspire Job Class

All processing in Aspire is handled through jobs. Feeders and connectors generate jobs, which then flow down pipelines and are processed by applications and components.

The Aspire Job class is a Java class which encapsulates all known information about a job.

Uses of Job

The Aspire Job class is used throughout Aspire:

Feeders and Connectors - Create Job objects which are sent to pipeline managers.
Pipeline Stages - Take in a Job, manipulate the data contained within the Job, and then pass it on to the next pipeline stage.
Sub Job Extractors - Take a parent job and split it into a number of smaller sub-jobs, which are all sent to other pipelines.

Basic Structure of an Aspire Job

Job Object Structure

All Jobs have an AspireObject

Every job in Aspire has an AspireObject attached to it. The object can be accessed using the job.get() method.

Jobs Also Have Variables

Jobs also have attached variables. These can be accessed using the job.getVariable(name) method, or more simply the get(String key) method. The Aspire Job object implements the Map<String,Object> interface for access to all Job variables.

Job variables are used for communicating any non-metadata type data down the pipeline. For example, if you want to open an InputStream in one component, and then use the stream in another component, this would be set on the Job as a variable.

Job variables are also available in Groovy scripting. They can be accessed directly as if they were global variables in Groovy (no need to specify the "job" prefix, although that's allowed too).
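
The relationship between a job, its data object and its variables can be pictured with a small stand-in model. The sketch below is not the Aspire Job class: SketchJob is hypothetical, the AspireObject is replaced by a plain Object, and the variable name "contentStream" is invented. Only the method names get(), set(), getVariable() and setVariable() are taken from this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for illustration; SketchJob is NOT the real Aspire Job class.
class JobVariablesDemo {

    static class SketchJob {
        private Object dataObject; // plays the role of the AspireObject
        private final Map<String, Object> variables = new HashMap<>();

        Object get() { return dataObject; }              // like job.get()
        void set(Object object) { dataObject = object; } // like job.set(aspireObject)

        Object getVariable(String name) { return variables.get(name); }
        void setVariable(String name, Object value) { variables.put(name, value); }

        // Shorthand forms mentioned in the text.
        Object get(String key) { return getVariable(key); }
        void set(String key, Object value) { setVariable(key, value); }
    }

    // First stage: opens a stream and stores it on the job as a variable.
    static void openStage(SketchJob job) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        job.setVariable("contentStream", new ByteArrayInputStream(bytes));
    }

    // Later stage: fetches the same stream from the job and reads it.
    static String readStage(SketchJob job) {
        try (InputStream in = (InputStream) job.get("contentStream")) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchJob job = new SketchJob();
        job.set("metadata for doc-1"); // the data object travels with the job
        openStage(job);
        System.out.println(readStage(job)); // prints the text stored by openStage
    }
}
```

The stream never becomes part of the data object; it only rides along on the job so that a later stage can pick it up.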

Job Life Cycle

Once a job is created, it is usually enqueued to a pipeline manager (PM). The PM will then send the job down the appropriate pipeline, passing it to each component in the pipeline in turn.

If The Job Completes the Entire Pipeline

If the job reaches the end of the pipeline, so that there is nothing more for it to do, then the job is said to be "complete" and "successful".

If The Job has an Exception

If a component throws an exception while processing a job, the job will be considered to be both "complete" and "error".

When a Job is Complete

When a job is complete, either successful or with an error, it will first notify all of its job listeners, and then it will be simply let go (all nested objects closed and cleared).

Creating a Job

If you are writing a feeder, then you may need to create a job from scratch. For example, if you receive a signal from the outside (perhaps over some sort of message queue) or if you periodically poll something, you may need to create jobs.

Jobs can be created using the JobFactory, with one of the newInstance() methods. This is available to all components as part of the Aspire Framework.

Do not use these methods to create sub-jobs. Sub-jobs must be created from their parent job (see below).

Job Listeners

Any object in Aspire can attach itself to a job as a "listener" using the job.registerListener() methods. Listeners will be notified when the job is complete before the job is let go by the pipeline manager.

Typically, job listeners are things like Feeders, which may wish to record the status of the job before the job is fully let go. For example, if the job encountered an error, the feeder may wish to save the job aside and re-submit it later, perhaps with multiple re-tries.
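
The notify-then-release order and the retry idea can be sketched with a small model. Everything here is hypothetical: the CompletionListener interface, its jobCompleted() callback, and the RetryingFeeder class are invented for illustration, and the real Aspire listener interface may look different. Only registerListener() is a name taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; the real Aspire listener interface and callbacks may differ.
class JobListenerDemo {

    interface CompletionListener {
        void jobCompleted(ListenedJob job);
    }

    static class ListenedJob {
        final String id;
        private final List<CompletionListener> listeners = new ArrayList<>();
        private boolean error;

        ListenedJob(String id) { this.id = id; }

        void registerListener(CompletionListener listener) { listeners.add(listener); }
        boolean isError() { return error; }
        int listenerCount() { return listeners.size(); }

        // Called by the imagined pipeline manager when processing has ended.
        void complete(boolean hadError) {
            error = hadError;
            for (CompletionListener listener : listeners) {
                listener.jobCompleted(this); // notify first ...
            }
            listeners.clear();               // ... then let the job go
        }
    }

    // A feeder that remembers failed jobs so they can be re-submitted later.
    static class RetryingFeeder implements CompletionListener {
        final List<String> retryQueue = new ArrayList<>();

        @Override
        public void jobCompleted(ListenedJob job) {
            if (job.isError()) {
                retryQueue.add(job.id);
            }
        }
    }

    public static void main(String[] args) {
        RetryingFeeder feeder = new RetryingFeeder();
        ListenedJob ok = new ListenedJob("job-1");
        ListenedJob failed = new ListenedJob("job-2");
        ok.registerListener(feeder);
        failed.registerListener(feeder);

        ok.complete(false);
        failed.complete(true);
        System.out.println("To retry: " + feeder.retryQueue);
    }
}
```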

Branching Jobs

Branching a job which is already active and being processed is handled differently from branching a newly created sub-job.

When branching an active job, you use job.setBranch(event). This type of branch does not have a branch handler defined inside your component.

Instead, what happens is that when the job returns to the pipeline manager (PM), the PM will detect that you've requested a branch. The job will then be sent on the <branch> configured within the <pipeline> tag. See Pipeline Branches for more details.
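
The hand-off between a component and the pipeline manager can be sketched as follows. This is a simplified, hypothetical model and not the Aspire pipeline manager: the class names, the event name "onLarge" and the pipeline names are invented. The sketch also assumes that the remaining stages of the current pipeline are skipped once a branch is requested, which should be checked against the Pipeline Branches page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of branch handling; not the real Aspire pipeline manager.
class BranchingDemo {

    static class BranchJob {
        final List<String> trace = new ArrayList<>(); // stages visited, for the demo
        private String branchEvent;

        void setBranch(String event) { branchEvent = event; } // like job.setBranch(event)

        // Returns the requested event (or null) and clears the request.
        String takeBranch() {
            String event = branchEvent;
            branchEvent = null;
            return event;
        }
    }

    static class SketchPipelineManager {
        private final Map<String, List<Consumer<BranchJob>>> pipelines = new HashMap<>();
        private final Map<String, String> branches = new HashMap<>(); // event -> pipeline

        void addPipeline(String name, List<Consumer<BranchJob>> stages) {
            pipelines.put(name, stages);
        }

        void addBranch(String event, String targetPipeline) {
            branches.put(event, targetPipeline);
        }

        void process(String pipelineName, BranchJob job) {
            String current = pipelineName;
            while (current != null) {
                String next = null;
                for (Consumer<BranchJob> stage : pipelines.get(current)) {
                    stage.accept(job);
                    // Back in the manager: did the stage request a branch?
                    String event = job.takeBranch();
                    if (event != null) {
                        next = branches.get(event); // assumption: rest of pipeline is skipped
                        break;
                    }
                }
                current = next;
            }
        }
    }

    static BranchJob runExample() {
        SketchPipelineManager pm = new SketchPipelineManager();

        List<Consumer<BranchJob>> mainStages = new ArrayList<>();
        mainStages.add(j -> j.trace.add("fetch"));
        mainStages.add(j -> { j.trace.add("inspect"); j.setBranch("onLarge"); });
        mainStages.add(j -> j.trace.add("index")); // not reached once the branch is taken

        List<Consumer<BranchJob>> largeStages = new ArrayList<>();
        largeStages.add(j -> j.trace.add("archive"));

        pm.addPipeline("main", mainStages);
        pm.addPipeline("large", largeStages);
        pm.addBranch("onLarge", "large");

        BranchJob job = new BranchJob();
        pm.process("main", job);
        return job;
    }

    public static void main(String[] args) {
        System.out.println(runExample().trace);
    }
}
```

Note that the component only names the event. Where the job goes next is decided by the manager's configuration, which is the point of this style of branching.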

Sub-Jobs

An important feature of Aspire is the ability for a job to spawn off multiple sub-jobs. This is an extremely useful technique which can be used for all sorts of nested processing:

  • Processing documents within zip files
  • Processing files within directories
  • Processing multiple records within a data file

Sub-jobs are always created from a parent job, and are forever attached to that parent job. The parent job will not be considered to be "complete" until all of its sub-jobs are complete. In this way, Aspire assures consistency of job processing.

Creating Sub-Jobs

Use any of the createSubJob() methods (such as job.createSubJob(aspireObject)) to create sub-jobs from a parent job.

Sub-jobs are also instances of the standard Aspire Job object.

After creating the sub-job, you can set the sub-job's AspireObject (with job.set(aspireObject)) and set variables (with job.setVariable(key, object) or just job.set(key, object)).

Enqueuing Sub-Jobs onto Other Pipeline Managers

Once created, you will likely want to enqueue your sub-job onto a pipeline manager. Generally this means you'll need to do the following:

  1. Configure a branch handler for your component
  2. Initialize the branch handler in the initialize() method of your component
  3. Use the branch handler's branchHandler.enqueue(job, event) method to enqueue the sub-job.

See BranchHandler for more information about enqueuing jobs.
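
The three steps above can be sketched with a stand-in model. SketchBranchHandler and ZipExtractorSketch are hypothetical classes, jobs are reduced to plain strings, and the event name "onSubJob" is an assumption. The real BranchHandler is configured through the component's configuration rather than through a configure() call.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical model; SketchBranchHandler is not the real Aspire BranchHandler.
class SubJobEnqueueDemo {

    static class SketchBranchHandler {
        private final Map<String, Queue<String>> targets = new HashMap<>();

        // Step 1: configuration maps an event name to a pipeline manager's queue.
        void configure(String event, Queue<String> pipelineManagerQueue) {
            targets.put(event, pipelineManagerQueue);
        }

        // Step 3: modelled on branchHandler.enqueue(job, event).
        void enqueue(String subJob, String event) {
            Queue<String> queue = targets.get(event);
            if (queue == null) {
                throw new IllegalArgumentException("No branch configured for event: " + event);
            }
            queue.add(subJob);
        }
    }

    static class ZipExtractorSketch {
        private SketchBranchHandler branchHandler;

        // Step 2: keep the configured handler when the component is initialized.
        void initialize(SketchBranchHandler configured) {
            branchHandler = configured;
        }

        void process(String parentJob, List<String> entries) {
            for (String entry : entries) {
                String subJob = parentJob + "!" + entry; // stands in for createSubJob()
                branchHandler.enqueue(subJob, "onSubJob");
            }
        }
    }

    public static void main(String[] args) {
        Queue<String> documentPipelineManager = new ArrayDeque<>();
        SketchBranchHandler handler = new SketchBranchHandler();
        handler.configure("onSubJob", documentPipelineManager);

        ZipExtractorSketch extractor = new ZipExtractorSketch();
        extractor.initialize(handler);
        extractor.process("archive.zip", Arrays.asList("a.txt", "b.txt"));
        System.out.println(documentPipelineManager);
    }
}
```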

Sub Job Statistics

Parent jobs maintain a list of all sub-jobs which are attached to them. As sub-jobs are completed, they will notify the parent job that they are complete.

You can check the parent job for the status of the sub-jobs:

  • job.getSubJobCount() - The total number of sub-jobs created on the parent job
  • job.getSubJobErrorCount() - The number of sub-jobs which encountered an exception error
  • job.getSubJobCompleteCount() - The number of sub-jobs which are complete (either successful or error)

In practice, you will rarely need to use these methods directly. The pipeline manager manages parent jobs, and will ensure that all of the sub-jobs are complete before it finishes the parent job.
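
To make the bookkeeping concrete, the sketch below models the three counters on a parent job. CountingJob is a hypothetical stand-in, and the finish() and allSubJobsComplete() methods are invented for the example. Only the three getter names are taken from the list above.

```java
// Hypothetical model of sub-job bookkeeping; not the real Aspire Job class.
class SubJobStatsDemo {

    static class CountingJob {
        private final CountingJob parent;
        private int subJobCount;
        private int subJobCompleteCount;
        private int subJobErrorCount;

        CountingJob() { this(null); }
        private CountingJob(CountingJob parent) { this.parent = parent; }

        CountingJob createSubJob() {
            subJobCount++;
            return new CountingJob(this);
        }

        // A finished sub-job reports back to its parent, with or without an error.
        void finish(boolean hadError) {
            if (parent != null) {
                parent.subJobCompleteCount++;
                if (hadError) {
                    parent.subJobErrorCount++;
                }
            }
        }

        int getSubJobCount() { return subJobCount; }
        int getSubJobCompleteCount() { return subJobCompleteCount; }
        int getSubJobErrorCount() { return subJobErrorCount; }

        // The condition the pipeline manager waits for before finishing the parent.
        boolean allSubJobsComplete() { return subJobCompleteCount == subJobCount; }
    }

    public static void main(String[] args) {
        CountingJob parent = new CountingJob();
        CountingJob first = parent.createSubJob();
        CountingJob second = parent.createSubJob();

        first.finish(false);
        System.out.println("All done after one? " + parent.allSubJobsComplete());
        second.finish(true); // errors still count as complete
        System.out.println("All done after two? " + parent.allSubJobsComplete());
        System.out.println("Errors: " + parent.getSubJobErrorCount());
    }
}
```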

Sub-Sub-Jobs / Grand Parent Jobs

Note that sub-jobs can also create sub-jobs of their own. These "sub-sub-jobs" are attached to the "sub-job" which is then attached to the "job". Sometimes you will hear Aspire programmers talking about "grandchildren jobs" and "grandparent jobs" - which refer to this situation.

For example, suppose you have a directory of files. Now suppose that all of those files are .ZIP files. Now suppose that each of the .ZIP files contains an XML file which contains multiple embedded data records. Aspire handles this situation naturally with jobs and sub-jobs no matter how deeply nested.
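
How completion ripples upward through several levels can be sketched with a small model. NestedJob, finishOwnWork() and the completion log are hypothetical and exist only for this example; the sketch assumes that a job at any level is complete once its own work is done and all of its sub-jobs are complete.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of nested sub-jobs; not the real Aspire Job class.
class NestedJobsDemo {

    static class NestedJob {
        final String name;
        private final NestedJob parent;
        private final List<String> completionLog; // shared log, for the demo only
        private int openSubJobs;
        private boolean ownWorkDone;
        private boolean complete;

        NestedJob(String name, List<String> completionLog) {
            this(name, null, completionLog);
        }

        private NestedJob(String name, NestedJob parent, List<String> completionLog) {
            this.name = name;
            this.parent = parent;
            this.completionLog = completionLog;
        }

        NestedJob createSubJob(String subName) {
            openSubJobs++;
            return new NestedJob(subName, this, completionLog);
        }

        // The job itself has left its pipeline; sub-jobs may still be running.
        void finishOwnWork() {
            ownWorkDone = true;
            completeIfReady();
        }

        boolean isComplete() { return complete; }

        private void completeIfReady() {
            if (!complete && ownWorkDone && openSubJobs == 0) {
                complete = true;
                completionLog.add(name);
                if (parent != null) {
                    parent.openSubJobs--;
                    parent.completeIfReady(); // completion ripples up one level
                }
            }
        }
    }

    static List<String> runExample() {
        List<String> log = new ArrayList<>();
        NestedJob directory = new NestedJob("directory", log);
        NestedJob zip = directory.createSubJob("archive.zip");
        NestedJob record1 = zip.createSubJob("record-1");
        NestedJob record2 = zip.createSubJob("record-2");

        directory.finishOwnWork(); // still waiting on archive.zip
        zip.finishOwnWork();       // still waiting on both records
        record1.finishOwnWork();
        record2.finishOwnWork();   // releases archive.zip, then directory
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runExample()); // [record-1, record-2, archive.zip, directory]
    }
}
```

The grandparent finishes last even though its own work finished first, which is the consistency guarantee described for parent jobs.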

Terminating Jobs

If a component wishes to immediately "terminate" a job, it can call job.terminate().
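
The difference between terminating a job and failing it can be sketched as follows. This is a hypothetical model, not the Aspire implementation: TerminableJob and run() are invented, a RuntimeException stands in for whatever exception type a real component would throw, and the sketch assumes that no further stages run after terminate() is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of job termination; not the real Aspire pipeline manager.
class TerminationDemo {

    static class TerminableJob {
        final List<String> visited = new ArrayList<>(); // stages visited, for the demo
        private boolean terminated;

        void terminate() { terminated = true; } // like job.terminate()
        boolean isTerminated() { return terminated; }
    }

    // Runs the stages in order and reports the final status of the job.
    static String run(TerminableJob job, List<Consumer<TerminableJob>> stages) {
        try {
            for (Consumer<TerminableJob> stage : stages) {
                if (job.isTerminated()) {
                    break; // assumption: later stages are skipped
                }
                stage.accept(job);
            }
            return "successful"; // a terminated job still ends as successful
        } catch (RuntimeException e) {
            return "error";      // only an exception marks the job as failed
        }
    }

    public static void main(String[] args) {
        List<Consumer<TerminableJob>> skipping = new ArrayList<>();
        skipping.add(j -> { j.visited.add("dedupe"); j.terminate(); });
        skipping.add(j -> j.visited.add("index"));

        List<Consumer<TerminableJob>> failing = new ArrayList<>();
        failing.add(j -> { throw new IllegalStateException("cannot parse"); });

        TerminableJob duplicate = new TerminableJob();
        System.out.println(run(duplicate, skipping) + " " + duplicate.visited);
        System.out.println(run(new TerminableJob(), failing));
    }
}
```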

"Terminated" jobs are considered to be successful. If you wish to indicate an error, you should throw an exception when processing the job.

Routing Tables

Beginning in Aspire 0.5, you can put a routing table onto a job. This defines where the job is routed next once it has finished processing in a pipeline manager.

Routing tables are specified as a sequence of component path names. Each component path is expected to represent a pipeline manager. Jobs can only be routed to the "default" pipeline of a pipeline manager.

Once a job is routed to a pipeline manager, it runs through the entire pipeline (including all nested branches, should they exist). Once the job is complete, it will then be sent to the next pipeline manager specified in its routing table.

Routing tables can also include "nested routes" for sub-jobs.
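
The hop-by-hop behaviour of a flat routing table can be sketched as below. The model is hypothetical: RoutedJob and route() are invented, the component paths "/Demo/ExtractPM" and "/Demo/IndexPM" are made up, and each pipeline manager is reduced to a single function standing for its default pipeline. Nested routes for sub-jobs are not modelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of a routing table; not the real Aspire routing classes.
class RoutingTableDemo {

    static class RoutedJob {
        final Deque<String> routingTable;              // remaining pipeline manager paths
        final List<String> visited = new ArrayList<>(); // work done, for the demo

        RoutedJob(List<String> route) {
            routingTable = new ArrayDeque<>(route);
        }
    }

    // Each map entry stands for the default pipeline of the manager at that path.
    static void route(RoutedJob job, Map<String, Consumer<RoutedJob>> pipelineManagers) {
        while (!job.routingTable.isEmpty()) {
            String path = job.routingTable.poll();
            Consumer<RoutedJob> defaultPipeline = pipelineManagers.get(path);
            if (defaultPipeline == null) {
                throw new IllegalStateException("Unknown pipeline manager: " + path);
            }
            defaultPipeline.accept(job); // runs to completion before the next hop
        }
    }

    public static void main(String[] args) {
        Map<String, Consumer<RoutedJob>> managers = new HashMap<>();
        managers.put("/Demo/ExtractPM", j -> j.visited.add("extract"));
        managers.put("/Demo/IndexPM", j -> j.visited.add("index"));

        RoutedJob job = new RoutedJob(Arrays.asList("/Demo/ExtractPM", "/Demo/IndexPM"));
        route(job, managers);
        System.out.println(job.visited); // [extract, index]
    }
}
```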

Result Information

Finally, all jobs contain additional information called the "result". The job's result holds information about the job, such as any errors or warnings which occurred while executing the job.

You can fetch the job's result with job.getResult(). You can add data to the result using job.addErrorMessage(), job.addErrorException(), or job.addInfoMessage(). Jobs which encounter exceptions will automatically have that exception added to the job by the PipelineManager (in other words, your component does not usually need to call job.addErrorMessage() directly).
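
The way messages accumulate on a result can be sketched with a stand-in model. SketchResult, ResultJob and runStage() are hypothetical, the stored message format is invented, and a RuntimeException stands in for the exception a real component would throw. The method names getResult(), addErrorMessage(), addErrorException() and addInfoMessage() are taken from the text, but their real signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of a job result; not the real Aspire result classes.
class JobResultDemo {

    static class SketchResult {
        final List<String> errors = new ArrayList<>();
        final List<String> infos = new ArrayList<>();

        boolean hasErrors() { return !errors.isEmpty(); }
    }

    static class ResultJob {
        private final SketchResult result = new SketchResult();

        SketchResult getResult() { return result; }
        void addErrorMessage(String message) { result.errors.add(message); }
        void addErrorException(Exception e) {
            result.errors.add(e.getClass().getSimpleName() + ": " + e.getMessage());
        }
        void addInfoMessage(String message) { result.infos.add(message); }
    }

    // Imagined pipeline manager step: exceptions are recorded on the job automatically.
    static void runStage(ResultJob job, Consumer<ResultJob> stage) {
        try {
            stage.accept(job);
        } catch (RuntimeException e) {
            job.addErrorException(e); // the component did not have to do this itself
        }
    }

    public static void main(String[] args) {
        ResultJob job = new ResultJob();
        runStage(job, j -> j.addInfoMessage("parsed 3 records"));
        runStage(job, j -> { throw new IllegalStateException("bad encoding"); });

        System.out.println("Infos:  " + job.getResult().infos);
        System.out.println("Errors: " + job.getResult().errors);
    }
}
```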