Cache (Aspire 2)

From wiki.searchtechnologies.com

For information on Aspire 3.1, see the Aspire 3.1 version of this page.

Factory Name: com.searchtechnologies.aspire:aspire-cache
subType: default
Inputs: Servlet commands, access from external components, and jobs from processing pipelines
Outputs: Servlet commands and access from external components
Feature only available with Aspire Enterprise.

The cache stage/component provides caching services within Aspire, either as a stage through which jobs pass in a pipeline or as a service to external components.

Items are added to the cache and can then be retrieved from it. The cache component is configured with a validity in milliseconds. By default, items added to the cache expire after this validity period; if required, a different validity may be specified when caching an item.
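The following Java sketch illustrates the validity behaviour described above. It is not the Aspire implementation. The class ValidityCache, its methods, and the injectable clock are hypothetical, chosen only to show three things: a default validity, a per-item override, and expiry checks on access.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of a validity (time-to-live) cache; not the Aspire
// implementation. All times are in milliseconds.
class ValidityCache<V> {

    // A cached value plus the absolute time at which it stops being valid.
    private record Entry<T>(T value, long expiresAt) {}

    private final ConcurrentHashMap<String, Entry<V>> map = new ConcurrentHashMap<>();
    private final long defaultValidityMs;
    private final LongSupplier clock; // injectable so expiry can be tested

    ValidityCache(long defaultValidityMs) {
        this(defaultValidityMs, System::currentTimeMillis);
    }

    ValidityCache(long defaultValidityMs, LongSupplier clock) {
        this.defaultValidityMs = defaultValidityMs;
        this.clock = clock;
    }

    // Cache an item with the configured default validity.
    void put(String key, V value) {
        put(key, value, defaultValidityMs);
    }

    // Cache an item with an item-specific validity.
    void put(String key, V value, long validityMs) {
        map.put(key, new Entry<>(value, clock.getAsLong() + validityMs));
    }

    private boolean isValid(Entry<V> e) {
        return e.expiresAt() > clock.getAsLong();
    }

    // Expired items are never returned, even if they are still stored.
    V get(String key) {
        Entry<V> e = map.get(key);
        return (e != null && isValid(e)) ? e.value() : null;
    }

    // Only items still within their validity period are counted.
    int size() {
        return (int) map.values().stream().filter(this::isValid).count();
    }

    boolean isEmpty() {
        return size() == 0;
    }
}
```

With a default of 300,000 ms, `put("a", x)` keeps `x` for five minutes, while `put("b", y, 1000)` keeps `y` for one second.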

Optionally, a background thread purges the cache at a configured interval (see purgePeriod below), removing items whose validity period has passed. The background purge may be turned off. Either way, the validity of items is always checked when the cache is accessed, so expired items are never returned from gets and are not counted in size or isEmpty requests.
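The next sketch shows how the validity checks made on every access relate to the optional background purge. PurgingCache, purge() and storedEntries() are hypothetical names. The sketch assumes a ConcurrentHashMap, which the configuration table names as the underlying implementation, and it uses a ScheduledExecutorService for the background thread. Aspire may schedule its purge differently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch: expiry is checked on every access, and an optional
// background task physically removes expired entries.
class PurgingCache implements AutoCloseable {

    private record Entry(Object value, long expiresAt) {}

    private final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
    private final LongSupplier clock;
    private final ScheduledExecutorService purger; // null when purging is disabled

    // purgePeriodMs <= 0 disables the background purge (cf. purgePeriod = 0).
    PurgingCache(long purgePeriodMs, LongSupplier clock) {
        this.clock = clock;
        if (purgePeriodMs > 0) {
            purger = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-purge");
                t.setDaemon(true); // don't keep the JVM alive just to purge
                return t;
            });
            purger.scheduleAtFixedRate(this::purge, purgePeriodMs, purgePeriodMs,
                    TimeUnit.MILLISECONDS);
        } else {
            purger = null;
        }
    }

    void put(String key, Object value, long validityMs) {
        map.put(key, new Entry(value, clock.getAsLong() + validityMs));
    }

    private boolean isExpired(Entry e) {
        return e.expiresAt() <= clock.getAsLong();
    }

    // Validity is checked on every access, whether or not a purge has run.
    Object get(String key) {
        Entry e = map.get(key);
        return (e == null || isExpired(e)) ? null : e.value();
    }

    int size() {
        return (int) map.values().stream().filter(e -> !isExpired(e)).count();
    }

    // Physically removes expired entries; this only frees memory.
    void purge() {
        map.values().removeIf(this::isExpired);
    }

    // Number of stored entries, including expired ones not yet purged.
    int storedEntries() {
        return map.size();
    }

    @Override
    public void close() {
        if (purger != null) {
            purger.shutdownNow();
        }
    }
}
```

Because get() and size() check validity themselves, disabling the purge only affects memory use. Expired items still become invisible.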

Also see the Cache Get stage.

Operation as a Stage

When operating as a stage, the cache component puts items onto the cache as it processes jobs. If you need a stage that gets objects from the cache, see Cache Get.

As jobs pass through the stage, the component extracts a key and a value from the document attached to each job. The value is then stored in the cache against the key with the default validity. The key and value may be obtained in one of two ways, specified in the component configuration and described below.

Simple key/value extraction

Simple key and value extraction uses one path to indicate the location of the key within the document and another to indicate the location of the value. The key path should always resolve to a text value. The value cached is the AspireObject found at the value path. When an item is retrieved from the cache, the value is placed back in the document at the same location, overwriting any object at that position.
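The sketch below illustrates simple extraction. It makes two substitutions: a W3C DOM document stands in for the Aspire document, and the paths are evaluated as XPath expressions using the standard javax.xml.xpath API. Both are assumptions made for the example. AspireObject and Aspire's own path handling are not shown, and SimplePathCache is a hypothetical name. The sketch is single-threaded because XPath instances are not thread-safe.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical sketch of 'simple' key/value extraction over a DOM document.
class SimplePathCache {
    private final Map<String, Node> store = new HashMap<>();
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final String keyPath;
    private final String valuePath;

    SimplePathCache(String keyPath, String valuePath) {
        this.keyPath = keyPath;
        this.valuePath = valuePath;
    }

    // Put: the key is the text at keyPath; the value is a deep copy of the node at valuePath.
    void put(Document doc) {
        String key = text(doc, keyPath);
        Node value = node(doc, valuePath);
        if (!key.isEmpty() && value != null) {
            store.put(key, value.cloneNode(true));
        }
    }

    // Get: on a hit, write the cached node back at valuePath, overwriting any node there.
    boolean get(Document doc) {
        Node cached = store.get(text(doc, keyPath));
        if (cached == null) {
            return false;
        }
        Node copy = doc.importNode(cached, true); // copy into the target document
        Node existing = node(doc, valuePath);
        if (existing != null) {
            existing.getParentNode().replaceChild(copy, existing);
            return true;
        }
        // Assumption: if nothing is at valuePath yet, its parent element must exist.
        int slash = valuePath.lastIndexOf('/');
        Node parent = slash > 0 ? node(doc, valuePath.substring(0, slash)) : null;
        if (parent == null) {
            return false;
        }
        parent.appendChild(copy);
        return true;
    }

    String text(Document doc, String path) {
        try {
            return xpath.evaluate(path, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad path: " + path, e);
        }
    }

    private Node node(Document doc, String path) {
        try {
            return (Node) xpath.evaluate(path, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad path: " + path, e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable XML", e);
        }
    }
}
```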

Groovy key/value extraction

Groovy key and value extraction offers a more flexible way of constructing the cache key and value. Three scripts are used:

  • key script
This script is used to 'construct' the key for the cache. This script is used during both puts and gets. This script should return a string and might concatenate fields from the document.
  "##" + doc.myKey?.text() + "##"
  • value put script
This script is used to extract the value to add to the cache and is only executed during puts. The object returned by this script may be of any type and will be added to the cache against the key 'constructed' by the key script.
  doc.path?.myValue?.myValueFolder
  • value get script
This script is used to insert the value held in the cache back into a document and is only executed during gets, after a cache hit has been made. Any object returned by this script is ignored.
  if (doc.path == null) {
    doc.add("path");
  }
  doc.path.myValue=it
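The contract between the three scripts can be modelled in Java. In this model, functions stand in for the Groovy scripts and a Map stands in for the document. ScriptedCache and its methods are hypothetical. The model only shows which script runs when:

  • the key script runs on both puts and gets
  • the value put script runs on puts
  • the value get script runs only after a hit, receiving the cached object as it

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch: the key, value put and value get scripts modelled as
// Java functions. A Map stands in for the document.
class ScriptedCache {
    private final Map<String, Object> store = new HashMap<>();
    private final Function<Map<String, Object>, String> keyScript;
    private final Function<Map<String, Object>, Object> valuePutScript;
    private final BiConsumer<Map<String, Object>, Object> valueGetScript; // (doc, it)

    ScriptedCache(Function<Map<String, Object>, String> keyScript,
                  Function<Map<String, Object>, Object> valuePutScript,
                  BiConsumer<Map<String, Object>, Object> valueGetScript) {
        this.keyScript = keyScript;
        this.valuePutScript = valuePutScript;
        this.valueGetScript = valueGetScript;
    }

    // Put: the key script and the value put script both run.
    void put(Map<String, Object> doc) {
        store.put(keyScript.apply(doc), valuePutScript.apply(doc));
    }

    // Get: the key script runs; the value get script runs only on a hit.
    boolean get(Map<String, Object> doc) {
        String key = keyScript.apply(doc);
        if (!store.containsKey(key)) {
            return false;
        }
        valueGetScript.accept(doc, store.get(key));
        return true;
    }
}
```

A key function such as `doc -> "##" + doc.get("myKey") + "##"` mirrors the key script example above.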

Special variables

The following variables are available to the Groovy scripts:

  • doc
The document attached to the job
  • component
Access to the underlying component (for access to other components or for logging)
  • it (during gets only)
The object returned from the cache

See the Groovy Scripting page for more details on the doc and component variables.

Access via other Components

Two interfaces are available to components that need to access the cache, depending on the operation required. See Accessing Other Components for details on how to access the cache component programmatically.

The cache is able to store any type of Java object against a String key.

When caching an AspireObject, puts and gets use a copy (clone()) of the object, so the cached item remains unchanged until it is replaced or expires.

NOTE: If you cache any object other than an AspireObject, you are responsible for ensuring that the cached item is immutable. Otherwise, the cached item could be changed by later modifications to the object that was put, or to objects returned by gets.
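The copying rule can be sketched as follows. A java.util.List stands in for AspireObject, and a caller-supplied copier plays the role of clone(). CopyingCache is a hypothetical name and is not part of Aspire.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical sketch of copy-on-put / copy-on-get. The copier plays the
// role of AspireObject.clone().
class CopyingCache<V> {
    private final ConcurrentHashMap<String, V> map = new ConcurrentHashMap<>();
    private final UnaryOperator<V> copier;

    CopyingCache(UnaryOperator<V> copier) {
        this.copier = copier;
    }

    // Store a copy, so later changes to the caller's object do not leak in.
    void put(String key, V value) {
        map.put(key, copier.apply(value));
    }

    // Return a copy, so changes to the returned object do not leak back.
    V get(String key) {
        V v = map.get(key);
        return v == null ? null : copier.apply(v);
    }
}
```

Passing `UnaryOperator.identity()` as the copier mimics caching an object that is not copied. In that case, changes the caller makes after the put show up in later gets, which is why such objects should be immutable.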

Cache interface


The Cache interface allows other components to put items into and get items from the cache. Components may either specify the key, value and validity directly, or pass an AspireObject. In the latter case, the key and value are calculated using the Simple or Groovy methods described above.

Cache Component interface


The Cache Component interface allows separate put and get components to process documents in a common place. As a result, one consistent set of key and value extraction rules can be used across disparate components.

For example, the Cache Get component calls the processGet() method of a cache component via this interface, so Cache Get does not need its own key and value extraction configuration. Because the get is actually processed in the put component, it uses that component's configuration. The same key and value extraction rules therefore apply without the configuration being duplicated.
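The following Java sketch illustrates this delegation. Only the name processGet() comes from this page. The CacheComponentLike interface, processPut(), PutCache, GetStage and the Map-based document are hypothetical stand-ins. The point is that the extraction rules are configured once, in the put component, and the get stage simply delegates to it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Cache Component interface.
interface CacheComponentLike {
    void processPut(Map<String, Object> doc);
    boolean processGet(Map<String, Object> doc);
}

// The put component owns the key/value extraction rules.
class PutCache implements CacheComponentLike {
    private final Map<String, Object> store = new HashMap<>();
    private final String keyField;
    private final String valueField;

    PutCache(String keyField, String valueField) {
        this.keyField = keyField;
        this.valueField = valueField;
    }

    public void processPut(Map<String, Object> doc) {
        store.put(String.valueOf(doc.get(keyField)), doc.get(valueField));
    }

    // Uses the same rules as processPut, so the get side needs no configuration.
    public boolean processGet(Map<String, Object> doc) {
        String key = String.valueOf(doc.get(keyField));
        if (!store.containsKey(key)) {
            return false;
        }
        doc.put(valueField, store.get(key));
        return true;
    }
}

// The get stage has no extraction configuration of its own.
class GetStage {
    private final CacheComponentLike cache;

    GetStage(CacheComponentLike cache) {
        this.cache = cache;
    }

    boolean process(Map<String, Object> doc) {
        return cache.processGet(doc);
    }
}
```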

Configuration

  • validity (long, default 300,000 = 5 mins)
The default validity, in milliseconds, of items placed on the cache. After this period, the item expires and will be removed.
  • initialCapacity (int)
Sets the initial capacity of the cache. The default is that of the underlying implementation (ConcurrentHashMap).
  • purgePeriod (long, default 1,800,000 = 30 mins)
The interval, in milliseconds, at which the cache is purged in the background. Set to 0 to disable background purging.
  • key/@path (String)
Specifies the 'simple' path to the key within the document.
  • value/@path (String)
Specifies the 'simple' path to the value within the document.
  • key (String)
The Groovy script used to construct the key from the document. NOTE: Ignored if key/@path is specified.
  • value/put (String)
The Groovy script used to extract the value from the document before it is put on the cache. NOTE: Ignored if value/@path is specified.
  • value/get (String)
The Groovy script used to insert a cache hit back into the document. Only called if a cache hit is encountered. NOTE: Ignored if value/@path is specified.

Example configuration

Cache service only

If the cache is only required to provide services to other components, and will not be used as a stage in a pipeline, the following configuration could be used.

  <component name="myCache" subType="default" factoryName="aspire-cache">
    <validity>60000</validity>
    <purgePeriod>1800000</purgePeriod>
  </component>

Simple key/value extraction

If stage access is required, keys and values may be obtained using simple paths within the document, using the following configuration.

  <component name="myCache" subType="default" factoryName="aspire-cache">
    <validity>60000</validity>
    <purgePeriod>1800000</purgePeriod>
    <key path="/doc/myKey"/>
    <value path="/doc/path/myValue/myValueFolder"/>
  </component>

Groovy key/value extraction

Stage access to the cache can also be configured using Groovy (see the Groovy Scripting page for more information), which allows keys and values to be constructed more flexibly. The following configuration shows this approach.

  <component name="myCache" subType="default" factoryName="aspire-cache">
    <validity>60000</validity>
    <purgePeriod>1800000</purgePeriod>
    <key>"##" + doc.myKey?.text() + "##"</key>
    <value>
      <get>
        <![CDATA[
          if (doc.path == null) {
            doc.add("path");
          }
          // 'it' below is the item returned from the cache
          doc.path.myValue=it
        ]]>
      </get>
      <put>doc.path?.myValue?.myValueFolder</put>
    </value>
  </component>