Developing New Staging Repositories

From wiki.searchtechnologies.com

For Information on Aspire 3.1 Click Here

Feature only available with Aspire Enterprise

 (2.1 Release)  


To develop a new staging repository for Aspire, you must develop a number of different pieces of code. First, you must implement a Repository Access Layer to connect to your repository. Then you must develop a publisher to publish to your repository and a connector to crawl it.

If you need a complete example of a Staging Repository, have a look at the File System Staging Repository as described and used above. This can be found in svn here.

The Repository Access Layer

The Repository Access Layer is the main piece of any staging repository. It implements the methods required to store data in and retrieve data from the staging repository. It also implements methods that allow the crawler to ascertain what data is in the repository and what data has been updated since the last time a crawler was run.
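The responsibilities described above can be illustrated with a minimal in-memory sketch. Note that StagingStoreSketch, InMemoryStagingStore and all of their method names are hypothetical and exist only for this illustration; they are not the real RepositoryAccessLayer interface, whose methods should be taken from the interface in svn.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the real Aspire RepositoryAccessLayer interface.
// It only names the kinds of operation an access layer has to provide.
interface StagingStoreSketch {
    void store(String id, byte[] content, Instant when); // put data in
    byte[] retrieve(String id);                           // get data out
    List<String> listAll();                               // what is in the repository
    List<String> updatedSince(Instant lastCrawl);         // what changed since a crawl
}

// In-memory implementation, for illustration only.
class InMemoryStagingStore implements StagingStoreSketch {
    private final Map<String, byte[]> contents = new LinkedHashMap<>();
    private final Map<String, Instant> modified = new LinkedHashMap<>();

    @Override
    public void store(String id, byte[] content, Instant when) {
        contents.put(id, content.clone());
        modified.put(id, when);
    }

    @Override
    public byte[] retrieve(String id) {
        byte[] content = contents.get(id);
        return content == null ? null : content.clone();
    }

    @Override
    public List<String> listAll() {
        return new ArrayList<>(contents.keySet());
    }

    @Override
    public List<String> updatedSince(Instant lastCrawl) {
        // Collect the ids whose last modification is strictly after the crawl time.
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : modified.entrySet()) {
            if (entry.getValue().isAfter(lastCrawl)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }
}
```

A real implementation would replace the maps with calls to the underlying repository, but the split between storing, retrieving, listing and detecting updates is likely to stay the same.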

The Repository Access Layer should be implemented as a library and should implement the RepositoryAccessLayer Java interface, which can be found in svn here.

The method that opens the repository takes a Java Properties object, so publishers and scanners can pass in whatever properties the repository needs (url, encrypt, etc.).
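As a rough illustration of passing such properties, the sketch below builds a java.util.Properties object and reads it back. The class RepositoryPropertiesSketch and its two methods are hypothetical helpers invented for this example. The property names url and encrypt come from the text above, but treating encrypt as a boolean is an assumption.

```java
import java.util.Properties;

// Hypothetical helper class, for illustration only.
class RepositoryPropertiesSketch {

    // What a publisher or scanner might assemble before opening the repository.
    static Properties connectionProperties(String url, boolean encrypt) {
        Properties props = new Properties();
        props.setProperty("url", url);
        // Assumption: "encrypt" is a true/false flag.
        props.setProperty("encrypt", Boolean.toString(encrypt));
        return props;
    }

    // What an open method might do with the properties it receives.
    static String describe(Properties props) {
        String url = props.getProperty("url");
        if (url == null) {
            throw new IllegalArgumentException("url property is required");
        }
        boolean encrypt = Boolean.parseBoolean(props.getProperty("encrypt", "false"));
        return url + (encrypt ? " (encrypted)" : " (plain)");
    }
}
```

Because Properties holds plain string pairs, a repository-specific option can be added without changing the signature of the open method.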

For reference, the code for the File System Staging Repository can be found in svn here.

The Publisher

The staging repository publisher consists of two pieces: a component and an app-bundle. Since the app-bundle is really just a wrapper around the component, creating a new one should be relatively simple.

Creating the component should also be relatively easy. The aspire-publisher project contains an abstract class, AbstractContentRepositoryPublisher. Simply create a component that extends this class and implement its three methods to create the publisher. You’ll need to add a dependency on the Repository Access Layer library you created.

In the contentRepositoryPublisherInitialize(Element config) method you gather any configuration you need from the component configuration. The abstract content repository publisher will gather the url, domain, user, password and publishStream options from the component configuration for you.

In the contentRepositoryConnectionProperties() method you’ll set any properties that are required to open the repository. The abstract content repository publisher will add the url, domain, user and password properties (taken from similarly named tags in the component configuration) if you do not add them.

In the getContentRepository() method, simply return a new instance of the implementation class for your repository access layer.
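The three publisher methods described above might be laid out as in the following sketch. Because the real Aspire classes are not reproduced here, RepositoryAccessLayer and AbstractContentRepositoryPublisher are declared locally as empty stand-ins so the example compiles on its own. The return types, the use of org.w3c.dom.Element for the configuration, and the names MyRepositoryAccessLayer, MyRepositoryPublisher and the encrypt tag are all assumptions; check the real abstract class in the aspire-publisher project for the actual signatures.

```java
import java.util.Properties;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PublisherSketch {

    // Stand-in for the real interface; declared only so the sketch compiles.
    interface RepositoryAccessLayer { }

    // Stand-in for the real abstract class; signatures are assumptions.
    static abstract class AbstractContentRepositoryPublisher {
        public abstract void contentRepositoryPublisherInitialize(Element config);
        public abstract Properties contentRepositoryConnectionProperties();
        public abstract RepositoryAccessLayer getContentRepository();
    }

    // Hypothetical access layer implementation for your repository.
    static class MyRepositoryAccessLayer implements RepositoryAccessLayer { }

    // Hypothetical publisher component.
    static class MyRepositoryPublisher extends AbstractContentRepositoryPublisher {
        private String encrypt = "false"; // repository-specific option (assumed)

        @Override
        public void contentRepositoryPublisherInitialize(Element config) {
            // Gather only the extra configuration this repository needs;
            // the text says url, domain, user, password and publishStream
            // are collected by the abstract class.
            if (config == null) {
                return;
            }
            NodeList nodes = config.getElementsByTagName("encrypt");
            if (nodes.getLength() > 0) {
                encrypt = nodes.item(0).getTextContent().trim();
            }
        }

        @Override
        public Properties contentRepositoryConnectionProperties() {
            // Only repository-specific properties are set here.
            Properties props = new Properties();
            props.setProperty("encrypt", encrypt);
            return props;
        }

        @Override
        public RepositoryAccessLayer getContentRepository() {
            return new MyRepositoryAccessLayer();
        }
    }
}
```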

For reference, the code for the File System Staging Repository publisher (component and app-bundle) can be found in svn here.

The Connector

The staging repository connector consists of two pieces: a component and an app-bundle. Unlike the publisher, the app-bundle is not a simple wrapper. It contains all the control and workflow stages required. In the short term, the best solution is to take a copy of the File System Staging Repository connector app-bundle and modify it. That app-bundle can be found in svn here.

The scanner component is again based on an abstract Java class, AbstractContentRepositoryScanner. Extend this class and implement its four methods to create your scanner component.

In the contentRepositoryConnectionProperties(AspireObject propertiesXml) method you are passed the crawl properties from the control job. You should extract any properties you may need for your connector to connect to the repository or retrieve data from it. You should return a set of properties that contains all the configuration required to connect to the repository. The abstract content repository scanner will add the url, domain, user and password if you don’t set them.

In the contentRepositoryScannerInitialize(Element config) method you gather any configuration you need from the component configuration. The abstract content repository scanner will collect all of the standard connector configuration for you.

In the getContentRepository() method, simply return a new instance of the implementation class for your repository access layer, exactly the same as in the publisher.

The getSourceType() method should simply return the name of the class implementing the Repository Access Layer.

For reference, the code for the File System Staging Repository connector (component and app-bundle) can be found in svn here.
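The four scanner methods described above can be sketched in the same way. As before, RepositoryAccessLayer, AbstractContentRepositoryScanner and AspireObject are local stand-ins declared only so the example compiles; the stand-in AspireObject and its set and getText methods are placeholders and do not describe the real AspireObject API. The return types, the encrypt property, and the use of the fully qualified class name in getSourceType() are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.w3c.dom.Element;

class ScannerSketch {

    // Stand-in for the real interface; declared only so the sketch compiles.
    interface RepositoryAccessLayer { }

    // Placeholder for the crawl properties object; NOT the real AspireObject.
    static class AspireObject {
        private final Map<String, String> values = new HashMap<>();

        AspireObject set(String name, String value) {
            values.put(name, value);
            return this;
        }

        String getText(String name) {
            return values.get(name);
        }
    }

    // Stand-in for the real abstract class; signatures are assumptions.
    static abstract class AbstractContentRepositoryScanner {
        public abstract Properties contentRepositoryConnectionProperties(AspireObject propertiesXml);
        public abstract void contentRepositoryScannerInitialize(Element config);
        public abstract RepositoryAccessLayer getContentRepository();
        public abstract String getSourceType();
    }

    // Hypothetical access layer implementation for your repository.
    static class MyRepositoryAccessLayer implements RepositoryAccessLayer { }

    // Hypothetical scanner component.
    static class MyRepositoryScanner extends AbstractContentRepositoryScanner {

        @Override
        public Properties contentRepositoryConnectionProperties(AspireObject propertiesXml) {
            // Copy across only the repository-specific crawl properties;
            // the text says url, domain, user and password are added by the
            // abstract class when they are not set here.
            Properties props = new Properties();
            String encrypt = propertiesXml.getText("encrypt");
            if (encrypt != null) {
                props.setProperty("encrypt", encrypt);
            }
            return props;
        }

        @Override
        public void contentRepositoryScannerInitialize(Element config) {
            // No extra component configuration is needed in this sketch.
        }

        @Override
        public RepositoryAccessLayer getContentRepository() {
            return new MyRepositoryAccessLayer();
        }

        @Override
        public String getSourceType() {
            // Assumption: the fully qualified class name is what is wanted.
            return MyRepositoryAccessLayer.class.getName();
        }
    }
}
```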