Introduction to Programming Components (Aspire 2)

From wiki.searchtechnologies.com

For information on Aspire 3.1, see the Aspire 3.1 documentation.

Aspire programming usually means creating custom components (pipeline stages) for processing content. These custom components can be written in Groovy (directly in the application.xml file) or in Java.

The following is intended to provide an orientation to programming with Aspire for those approaching it for the first time.


Know these Java Objects

Programming Aspire is mostly about these three fundamental Java classes:

  1. ComponentImpl - Introduction
This is your connection to Aspire at large. Use ComponentImpl to access the Aspire Application, other components, logging, and configuration data. All components (including pipeline stages) extend ComponentImpl.
  2. Job - Introduction
Job represents a unit of work in Aspire. Jobs are created by feeders (or sub-job extractors) and fed through pipelines. Each job contains metadata and other objects, which pipeline stages read and modify to do their processing.
  3. AspireObject - Introduction
The AspireObject is the primary holder of hierarchically structured, tagged data. It can hold either XML or JSON, and can even be used to convert between the two. It can be used with XPath and XSLT. Every Job has an AspireObject to hold metadata appropriate to the job.
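AspireObject's own methods are not shown here. As a rough illustration of what querying "hierarchically structured, tagged data" with XPath looks like, the sketch below uses only the JDK's standard DOM and XPath APIs on a plain XML string. The element names (doc, title, authors, author) and the class name are invented for the example; in Aspire you would work with the AspireObject held by the job instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class MetadataXPathSketch {
    // Hypothetical job metadata; element names are invented for illustration.
    static final String METADATA =
        "<doc><title>Quarterly Report</title>"
        + "<authors><author>Ann</author><author>Raj</author></authors></doc>";

    // Parses the XML and returns the string value of the XPath expression.
    public static String evaluate(String xml, String expression) throws Exception {
        // Parse the XML text into a W3C DOM document (standard JDK API).
        Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        // Evaluate the expression against the whole document.
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, document);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluate(METADATA, "/doc/title"));             // Quarterly Report
        System.out.println(evaluate(METADATA, "/doc/authors/author[2]")); // Raj
    }
}
```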

Groovy Scripting

Once you know the objects above, you can start creating your own Groovy scripts. Groovy scripts are written directly in your application.xml file, using the aspire-groovy component, like this:

 <component name="MyComponent" subType="default" factoryName="aspire-groovy">
   <script>
     <![CDATA[
       println "Hello World, my document looks like this:";
       println doc.toXmlString(true);
     ]]>
   </script>
 </component>

Inside Groovy scripts, you can use the "component" variable to call any method of ComponentImpl, the "job" variable to access the current job object, and the "doc" variable (as in the example above) to access the current AspireObject.

Go to Groovy Scripting to learn more about the Groovy scripting component.

Set Up Your Environment

When programming new Aspire components in Java, we recommend using Eclipse and Maven. Specifically:

  1. Install the Java JDK (choose the version required by the Aspire release you are targeting; see below).
  2. Install the Eclipse IDE.
  3. Install the Maven command-line tools.
  4. Install m2Eclipse (Maven integration for Eclipse).

The version of Java you should use depends on the Aspire version you are targeting:

  • Aspire 2.1.2 and earlier versions run on Java 1.6 or Java 1.7.
  • Aspire 2.2 and later versions require Java 1.7.
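To check which Java version a machine will run Aspire on, you can read the standard java.specification.version system property. The sketch below is a hypothetical helper, not part of Aspire: it normalizes both the old "1.7" numbering and the newer "11" numbering into a major version, then checks it against the versions listed above. Java versions newer than 1.7 are not covered by this page, so the helper reports them as not listed rather than guessing.

```java
public class JavaVersionCheck {
    // Converts "1.6"/"1.7" (old scheme) or "9"/"11" (new scheme) to a major version.
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    // Hypothetical helper encoding only the versions listed on this page:
    // Aspire 2.1.2 and earlier: Java 1.6 or 1.7; Aspire 2.2 and later: Java 1.7.
    static boolean isListedJavaVersion(boolean aspire22OrLater, int major) {
        if (aspire22OrLater) {
            return major == 7;
        }
        return major == 6 || major == 7;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        System.out.println("Java specification version: " + spec + " (major " + major + ")");
        System.out.println("Listed for Aspire 2.2 and later: " + isListedJavaVersion(true, major));
    }
}
```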

Search Technologies uses Subversion internally for source code control, but it is not required for Aspire programming.

See Developer Environment Setup for step-by-step details on setting up your environment.

Maven Repositories

Deploying Components with Aspire

Aspire development goes more smoothly if you have a Maven repository handy.

Components you develop can be deployed to your Maven repository. Once deployed, they can be used in any Aspire application simply by referring to them by their Maven coordinates. This allows you to deploy and combine components from multiple sources into a single, unified content processing application.

Even better, thanks to OSGi, components deployed to Maven as -SNAPSHOT versions can be updated dynamically inside a running Aspire deployment, without restarting the servers.

So, we recommend installing a Maven repository to serve as central storage for all components that you create. Artifactory (http://www.jfrog.com/home/v_artifactory_opensource_overview) is a perfectly functional, open source repository for holding your completed components, if you don't already have a repository of your own.
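Maven coordinates are commonly written in the shorthand form groupId:artifactId:version. The sketch below, using the invented coordinates com.example:my-stage:1.0-SNAPSHOT, splits that form and flags -SNAPSHOT versions, which are the ones the OSGi paragraph above describes as updatable without a restart. It only illustrates the coordinate format; how Aspire itself resolves and loads components from a repository is not shown.

```java
public class MavenCoordinates {
    final String groupId;
    final String artifactId;
    final String version;

    MavenCoordinates(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    // Parses the common "groupId:artifactId:version" shorthand.
    static MavenCoordinates parse(String text) {
        String[] parts = text.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected groupId:artifactId:version but got: " + text);
        }
        return new MavenCoordinates(parts[0], parts[1], parts[2]);
    }

    // Maven treats versions ending in "-SNAPSHOT" as changing development builds.
    boolean isSnapshot() {
        return version.endsWith("-SNAPSHOT");
    }

    public static void main(String[] args) {
        // Invented coordinates for a custom stage.
        MavenCoordinates dev = parse("com.example:my-stage:1.0-SNAPSHOT");
        MavenCoordinates release = parse("com.example:my-stage:1.0");
        System.out.println(dev.artifactId + " snapshot? " + dev.isSnapshot());         // my-stage snapshot? true
        System.out.println(release.artifactId + " snapshot? " + release.isSnapshot()); // my-stage snapshot? false
    }
}
```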

Component Class Hierarchy

Aspire is made up of a few basic Java interfaces: Component, Stage, ComponentManager (a group of components), PipelineManager (processes jobs through pipelines), and the Aspire Application itself.

These Java interfaces are arranged into a hierarchy like this:

[Figure: Aspire class hierarchy]

You will only ever need to concern yourself with "Stage" and "Component". All of the other interfaces are fully implemented by the Aspire framework.

Each of these interfaces has an associated "Impl" class. For example, StageImpl, ComponentImpl, etc.
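To make the relationship concrete, the sketch below models the two interfaces you work with using simplified, hypothetical stand-ins in plain Java. The real Aspire Component and Stage interfaces have more methods, and initialize() receives a W3C Element rather than the Map used here. The sketch only shows that a stage is a component that adds process(), and that an "Impl" base class lets a custom stage override just what it needs; the no-op defaults are an assumption of the sketch, not a description of Aspire's ComponentImpl.

```java
import java.util.HashMap;
import java.util.Map;

public class HierarchySketch {
    // Simplified, hypothetical stand-ins for Aspire's Component and Stage.
    interface Component {
        void initialize(Map<String, String> config);
        void close();
    }

    // A stage is a component that can also process jobs.
    interface Stage extends Component {
        void process(Map<String, Object> job);
    }

    // "Impl" base class: supplies default (no-op) lifecycle methods.
    static abstract class ComponentImpl implements Component {
        public void initialize(Map<String, String> config) { }
        public void close() { }
    }

    // Stage base class: inherits the component behavior, leaves process() abstract.
    static abstract class StageImpl extends ComponentImpl implements Stage { }

    // A custom stage only implements what it needs.
    static class UppercaseTitleStage extends StageImpl {
        public void process(Map<String, Object> job) {
            Object title = job.get("title");
            if (title != null) {
                job.put("title", title.toString().toUpperCase());
            }
        }
    }

    public static void main(String[] args) {
        Stage stage = new UppercaseTitleStage();
        stage.initialize(new HashMap<>());
        Map<String, Object> job = new HashMap<>();
        job.put("title", "hello");
        stage.process(job);
        stage.close();
        System.out.println(job.get("title")); // HELLO
    }
}
```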

The Anatomy of a Component

Most new components are pipeline stages, and all of these have the same basic structure:

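import org.w3c.dom.Element;

public class MyComponent extends StageImpl {
  public void initialize(Element config) throws AspireException {
    // Read settings from config; open files, connections, etc.
  }
  public void process(Job job) throws AspireException {
    // Read and update job.get(), the job's AspireObject
  }
  public void close() throws AspireException {
    // Release anything opened in initialize()
  }
}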

Those are the only methods that you will ever need to implement.

initialize(Element config)

  • Put code to initialize your component here, for example opening files, opening connections to databases, initializing data structures, etc.
  • "config" will be a W3C Element object that contains the XML that was specified in the application.xml for your component. Extract any configuration data you need from this Element.
  • initialize() is guaranteed to be called before any job is processed with process().
  • initialize() is guaranteed to be called by only a single thread.
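Extracting settings from the config Element uses the standard W3C DOM API that ships with the JDK. The sketch below is a hypothetical helper (getConfigString is not an Aspire method) that reads the text of a named element, falling back to a default when it is absent. The parse() method and the &lt;config&gt;, &lt;maxLength&gt;, and &lt;targetField&gt; element names exist only so the sketch can run on its own; in a real stage, Aspire passes the Element to initialize() for you.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ConfigSketch {
    // Returns the trimmed text of the first descendant element with the given
    // tag name, or defaultValue if no such element exists.
    static String getConfigString(Element config, String tag, String defaultValue) {
        NodeList nodes = config.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return defaultValue;
        }
        return nodes.item(0).getTextContent().trim();
    }

    // Builds an Element from XML text, standing in for the Element Aspire provides.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element config = parse("<config><maxLength> 500 </maxLength></config>");
        int maxLength = Integer.parseInt(getConfigString(config, "maxLength", "1000"));
        String field = getConfigString(config, "targetField", "content");
        System.out.println(maxLength + " " + field); // 500 content
    }
}
```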

process(Job j)

  • Put code here to actually process the job.
  • job.get() retrieves the associated AspireObject (aka the "document"). This is the primary metadata holder for the job.
  • Jobs can also hold any other object. In fact, Job implements Map<String,Object>, so you can literally store anything in a Job. Anything at all. (Note: Data stored in the map are called Job "variables")
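Because Job implements Map&lt;String,Object&gt;, one stage can leave a variable on the job for a later stage to pick up. The sketch below uses a hypothetical SketchJob class in place of Aspire's Job: like the real Job it is a Map&lt;String,Object&gt;, but its "document" is a simple String map rather than an AspireObject, and the two stage methods and the "tokens" and "wordCount" names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JobVariablesSketch {
    // Hypothetical stand-in for Aspire's Job: a Map of job variables
    // plus a document, which here is a String map, not an AspireObject.
    static class SketchJob extends HashMap<String, Object> {
        private final Map<String, String> document = new HashMap<>();

        // Mirrors job.get(): returns the job's document (metadata holder).
        Map<String, String> get() {
            return document;
        }
    }

    // Earlier stage: splits the content into words and stores them as a job variable.
    static void tokenizeStage(SketchJob job) {
        List<String> tokens = new ArrayList<>();
        for (String t : job.get().getOrDefault("content", "").split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        job.put("tokens", tokens); // any object can be stored as a variable
    }

    // Later stage: reads the variable back and writes a result into the document.
    @SuppressWarnings("unchecked")
    static void countStage(SketchJob job) {
        List<String> tokens = (List<String>) job.get("tokens");
        job.get().put("wordCount", String.valueOf(tokens == null ? 0 : tokens.size()));
    }

    public static void main(String[] args) {
        SketchJob job = new SketchJob();
        job.get().put("content", "Aspire processes jobs through pipelines");
        tokenizeStage(job);
        countStage(job);
        System.out.println(job.get().get("wordCount")); // 5
    }
}
```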

close()

  • Code for freeing any resources used by your component goes here, for example closing file pointers, releasing connections, releasing memory, etc.
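The sketch below shows where a resource is acquired and released over a stage's lifetime. AuditStage is a hypothetical plain-Java class, not an Aspire StageImpl, and a StringWriter stands in for a real file or database connection. In Aspire the framework calls initialize(), process(), and close() for you; here main() plays that role, using try/finally so close() runs even if processing fails.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class LifecycleSketch {
    // Hypothetical stage-like class showing the resource lifecycle.
    static class AuditStage {
        private Writer log;      // resource opened in initialize()
        private boolean closed;

        void initialize() {
            log = new StringWriter(); // stands in for opening a file or connection
        }

        void process(String jobId) throws IOException {
            if (log == null || closed) {
                throw new IllegalStateException("Stage not initialized or already closed");
            }
            log.write("processed " + jobId + "\n");
        }

        void close() throws IOException {
            if (log != null && !closed) {
                log.close(); // release the resource exactly once
                closed = true;
            }
        }

        String contents() {
            return log.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        AuditStage stage = new AuditStage();
        stage.initialize();
        try {
            stage.process("job-1");
            stage.process("job-2");
        } finally {
            stage.close(); // always release resources, even on failure
        }
        System.out.print(stage.contents());
    }
}
```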

The Stage Archetype

Creating new components is made much easier with a Maven archetype. This archetype will create a complete, working Java Maven project for a brand-new pipeline stage, complete with unit tests!

The stage archetype will prompt you to enter some data so it can create the stage properly. Specifically, you will be asked for:

  • The Maven coordinates (group ID, artifact ID, version) for your new pipeline stage
  • The Java class name to use for your new component
  • Some textual information to help annotate your Maven POM file (does not affect functionality)

See Creating a New Pipeline Stage for a detailed, step-by-step description for using the Stage archetype to create a new Aspire pipeline stage.

Where to Go From Here