Changes to the Job Object in 0.5

From wiki.searchtechnologies.com



Job IDs are Now Computed Automatically

Job IDs are now used as a global method of tracking jobs across a cluster of Aspire machines. They must be unique across all machines in the network.

To make this happen, the framework now determines job IDs automatically. By default, an ID includes the server's IP address, port, and startup time. A sub-job, by default, gets its parent job's ID with "/{id}" appended, where {id} is a number.
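The ID scheme described above can be sketched as follows. This is only an illustration: the class name JobIdSketch, the separators, and the per-server sequence counter are assumptions, not the layout Aspire actually uses.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: the separators and the sequence counter are assumptions.
final class JobIdSketch {
    private final String prefix;                         // "ip:port:startupTime"
    private final AtomicLong counter = new AtomicLong(); // keeps IDs unique on one server

    JobIdSketch(String ip, int port, long startupMillis) {
        this.prefix = ip + ":" + port + ":" + startupMillis;
    }

    // Each top-level job gets the server prefix plus a local sequence number.
    String nextJobId() {
        return prefix + "-" + counter.incrementAndGet();
    }

    // A sub-job ID is the parent ID with "/{n}" appended.
    static String subJobId(String parentId, int n) {
        return parentId + "/" + n;
    }

    public static void main(String[] args) {
        JobIdSketch ids = new JobIdSketch("10.0.0.5", 50505, 1300000000000L);
        String parent = ids.nextJobId();
        System.out.println(parent);
        System.out.println(subJobId(parent, 1));
    }
}
```

Because the prefix differs for every server instance and every restart, two machines in the cluster cannot produce the same ID under this scheme.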

There are still Methods to Specify Your Own Job ID

But do not use them. They are intended only for experts (that is, for code inside the framework).


Variables have been Moved to Job

Previously, the AspireDocument contained the name/value map of variables, and you could access these with doc.getVariable() and doc.putVariable().

Since AspireDocument has gone away, these variables have been moved up to the job, which is a more natural place for them. So now:

  • job.getVariable()
  • job.getVariableMap()
  • job.getVariableMapForUse()
  • job.putVariable()

are all on Job. This makes AspireObject much more focused: it is now only a holder for XML/JSON structured metadata.

Further, Job now extends Map<String,Object>. All Map methods (put(key,obj), get(key), containsKey(), containsValue(), etc.) are therefore available on Job, and they operate on the map of variables attached to the job.

This also allows these variables to be accessed more directly from Groovy. For example, the following statements access job variables directly:

 job.myVariable = 12
 println job.myVariable

Note that these statements access the variable on the specified job object only, without hierarchical job traversal.

In Aspire, variables can also be accessed as follows:

 println myOtherVariable

This access will use hierarchical job traversal. In other words, if the variable doesn't exist on the current job, Aspire will automatically check the parent job (and then the grandparent job, and so on).
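The difference between direct access and hierarchical traversal can be sketched in plain Java. VariableScopeSketch and its lookup() method are stand-ins invented for this illustration; they are not part of the Aspire Job interface.

```java
import java.util.HashMap;

// Stand-in for illustration; not the Aspire Job interface.
class VariableScopeSketch extends HashMap<String, Object> {
    private final VariableScopeSketch parent;   // null for a top-level job

    VariableScopeSketch(VariableScopeSketch parent) {
        this.parent = parent;
    }

    // Hierarchical access: walk up the chain until some job defines the variable.
    Object lookup(String name) {
        for (VariableScopeSketch j = this; j != null; j = j.parent) {
            if (j.containsKey(name)) {
                return j.get(name);
            }
        }
        return null;   // not defined on this job or any ancestor
    }
}
```

In this sketch, get(name) plays the role of job.myVariable (current job only), while lookup(name) plays the role of the bare variable name (current job, then parent, then grandparent).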

Job Properties are Gone

Rarely used job properties have been merged with job variables, so what were formerly job "properties" have disappeared from the Job object.

CHANGE: job.getProperty() --> job.getVariable()

CHANGE: job.setProperty() --> job.putVariable()

CHANGE: job.removeProperty() --> job.putVariable("<name>", null)


Jobs now have Routing Tables

Previously, there was only one way to route jobs: configuring pipelines and branches (with the branch handler) inside system.xml (now application.xml).

In version 0.5, you can now route jobs with routing tables.

Routing is Handled Automatically by the Framework

In version 0.5, once a job has finished with a pipeline, including all branches specified by the <branches> branch handler tags, and has nowhere else to go, the Aspire framework checks whether the job has an attached routing table. If it does, the framework automatically sends the job to the next route in that table (see job.nextRoute()).
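The control flow described above might look roughly like the following. AutoRoutingSketch is a hypothetical stand-in: its fields and its run() loop are assumptions made for illustration, and only the idea of asking the job for its next route comes from the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stand-in for illustration; not Aspire's Job or framework code.
final class AutoRoutingSketch {
    final Deque<String> remainingRoutes = new ArrayDeque<>(); // the routing table
    final List<String> visited = new ArrayList<>();           // where the job has been

    // Returns the next destination, or null when the table is exhausted.
    String nextRoute() {
        return remainingRoutes.poll();
    }

    // What a framework might do each time a pipeline finishes.
    void run(String firstPipelineManager) {
        String current = firstPipelineManager;
        while (current != null) {
            visited.add(current);    // "process" the job in this pipeline manager
            current = nextRoute();   // nowhere else to go, so consult the routing table
        }
    }
}
```

A job with no routing table simply stops after its first pipeline, which matches the pre-0.5 behaviour.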

Specifying Routing Tables

Routing tables are attached to a job and can be specified differently, at run time, for each job. This allows jobs to be routed based on data, for example from routing tables stored in a relational database.

The Job class directly supports the creation of routing tables on jobs, using the following methods:

  • job.addRoute()
  • job.clearRoutes()

In addition, there is a new support class, JobRoute, which holds information about a single destination in the routing table.

A "destination" in a job routing table is, essentially, the name of a pipeline manager that will receive the job. Currently, the job is always routed to the default pipeline within that pipeline manager.

Destinations can also be specified as simple application names, for example "/CIFSConnector" or "/PostToGSA". In these cases, the default destination, "/Main", is appended to give the pipeline manager name: "/CIFSConnector/Main" and "/PostToGSA/Main".
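The name expansion described above can be sketched as a small helper. The class DestinationSketch and the rule "a single path segment means an application name" are assumptions for illustration; the text only gives the two examples.

```java
// Illustrative helper; the single-segment rule is an assumption.
final class DestinationSketch {
    static final String DEFAULT_PIPELINE_MANAGER = "Main";

    static String toPipelineManagerName(String destination) {
        // Drop the leading "/" so the remaining segments can be inspected.
        String trimmed = destination.startsWith("/") ? destination.substring(1) : destination;
        if (trimmed.contains("/")) {
            return "/" + trimmed;   // already names a pipeline manager
        }
        // A bare application name: append the default destination.
        return "/" + trimmed + "/" + DEFAULT_PIPELINE_MANAGER;
    }
}
```

For example, toPipelineManagerName("/CIFSConnector") yields "/CIFSConnector/Main", while "/PostToGSA/Main" is returned unchanged.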

Routing Sub Jobs

Job routing can specify the execution path not only for a single job but also for all sub-jobs created by that job. This means that a job routing table is actually a tree of nested routes.

For example, for a simple job that creates multiple sub-jobs, the routing table could look like this:

  • CIFSConnector
    • NormalizeMetadata
    • PostToGSA
  • CleanupOrphanDocuments

In this example, the CIFSConnector component creates sub-jobs. These sub-jobs will be automatically routed to "/NormalizeMetadata/Main" and then to "/PostToGSA/Main". The parent job, meanwhile, after completing "/CIFSConnector/Main", will next be routed to "/CleanupOrphanDocuments/Main".

In this way, the routing table can define the routing for an entire tree of jobs: a job and all of its descendants. The methods:

  • job.pushIntoNestedSubJobRoutes()
  • job.popFromNestedSubJobRoutes()

exist for the purpose of specifying sub-job routing tables.
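The tree shape of such a routing table can be sketched with a small data structure. RouteNode and RoutingTreeSketch are hypothetical names used only here; they are not Aspire's JobRoute class, and the sketch does not call the push/pop methods listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in data structure; not Aspire's JobRoute class.
final class RouteNode {
    final String destination;
    final List<RouteNode> subJobRoutes = new ArrayList<>(); // routes for sub-jobs created here

    RouteNode(String destination) {
        this.destination = destination;
    }

    RouteNode withSubJobRoutes(String... destinations) {
        for (String d : destinations) {
            subJobRoutes.add(new RouteNode(d));
        }
        return this;
    }
}

final class RoutingTreeSketch {
    // The example table from the text.
    static List<RouteNode> exampleTable() {
        List<RouteNode> table = new ArrayList<>();
        table.add(new RouteNode("CIFSConnector").withSubJobRoutes("NormalizeMetadata", "PostToGSA"));
        table.add(new RouteNode("CleanupOrphanDocuments"));
        return table;
    }

    // Pipeline managers visited, in order, by the job that owns this level of the table.
    static List<String> destinations(List<RouteNode> table) {
        List<String> out = new ArrayList<>();
        for (RouteNode r : table) {
            out.add("/" + r.destination + "/Main");
        }
        return out;
    }

    public static void main(String[] args) {
        List<RouteNode> table = exampleTable();
        System.out.println("parent:   " + destinations(table));
        System.out.println("sub-jobs: " + destinations(table.get(0).subJobRoutes));
    }
}
```

Each level of the tree is the flat routing table for one generation of jobs: the top level belongs to the parent, and the routes nested under CIFSConnector belong to the sub-jobs it creates.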


Jobs can be Written as JSON

Jobs can now be written as JSON so that they can be serialized and sent across the wire to remote servers.

You can also write jobs out as JSON for your own purposes, for example to communicate with browsers.


"Job" is now an Interface

"Job" is now an interface. This will improve the stability of Aspire across versions and allow more flexibility in implementing jobs, possibly including multiple implementations in the future.

This means that "new Job()" is no longer available:

 Job j = new Job();    // Only available in 0.4 and earlier

Jobs must now be constructed using JobFactory:

 Job j = JobFactory.newInstance();  // 0.5 and later

The other common factory method takes the job's data as an AspireObject:

 Job j = JobFactory.newInstance(data);  // data is an AspireObject

JobFactory is imported with:

 import com.searchtechnologies.aspire.framework.JobFactory;
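The general interface-plus-factory pattern behind this change can be sketched as follows. SketchJob, SketchJobFactory, and the String payload are stand-ins chosen for a self-contained example; Aspire's own Job and JobFactory are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in names; Aspire's own Job and JobFactory live in the framework.
interface SketchJob extends Map<String, Object> {
    String getData();   // stand-in for the job's main object
}

final class SketchJobFactory {
    private SketchJobFactory() {}   // static factory only, never instantiated

    static SketchJob newInstance() {
        return newInstance(null);
    }

    static SketchJob newInstance(String data) {
        return new DefaultSketchJob(data);
    }

    // The concrete class stays private, so it can change between versions
    // without breaking callers that only see the interface.
    private static final class DefaultSketchJob extends HashMap<String, Object> implements SketchJob {
        private final String data;

        DefaultSketchJob(String data) {
            this.data = data;
        }

        @Override
        public String getData() {
            return data;
        }
    }
}
```

Callers never name the implementing class, which is what allows the implementation to be swapped or multiplied later.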


Getting and Setting the Main Object / Document

Setting and getting the main job object (formerly the AspireDocument) has changed. The methods have been renamed, and the arguments and return values are now AspireObject rather than Object. If you need to pass any other type of object, either wrap it in an AspireObject (i.e. set it as the content of an AspireObject named Standards.Basic.OBJECT_WRAPPER) or simply store it as a job variable.

 void Job.setObject(Object) -> void Job.set(AspireObject)
 Object Job.getObject() -> AspireObject Job.get()