Lucene Services (Aspire 2)

Factory Name: com.searchtechnologies.aspire:aspire-lucene
subType: default
Inputs: Method calls
Outputs: Lucene index (optional)

The Aspire Lucene component makes the Lucene classes available to other bundles and provides convenience methods for some commonly used Lucene functionality.

This component exists as a holder for the Lucene libraries and exports the Lucene classes for use in other components.

It also provides convenience methods for indexing into and searching an index controlled by the component. Configuring this index is optional; if it is not configured, these services are disabled.

Configuration

Element Type Default Description
indexDirectory string <none> The directory on disk of a Lucene index. The index will be created if it does not exist. If this parameter is not given, indexing and searching methods will not be available.
documentID string <none> The Lucene field to be used as the document id for deletes and updates. If not specified, documents may be added to the index, but updates and deletes will not be available.
luceneMaxFieldLength int 10000 The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory. This setting refers to the number of running terms, not to the number of different terms.

Note: this silently truncates large documents, excluding from the index all terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError.

By default, no more than 10,000 terms will be indexed for a field.

luceneMaxBufferedDocuments string -1 (disabled) Determines the minimum number of documents required before the buffered in-memory documents are flushed as a new segment. Large values generally give faster indexing.

When this is set, the writer will flush every luceneMaxBufferedDocuments added documents. Pass in -1 to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.

Disabled by default (writer flushes by RAM usage).

luceneMergeFactor int 2 Sets the index writer merge factor.
luceneRAMBufferSizeMB int 2048 Sets the index writer RAM buffer size in MB.
autoCommitMS long 0 (disabled) The time (in ms) between commits of the index. If set to 0, auto-commit based on time is disabled. The index is only committed if documents have been added since the last commit.
autoCommitDocs long 0 (disabled) The maximum number of documents that can be added between commits of the index. If set to 0, auto-commit based on document submission is disabled.
autoCommitSpinWait long 1000 (1 s) The spin wait time (in ms) for the thread performing auto-commits (if enabled). The thread wakes this often to check whether the time or document threshold has been passed, and commits if required.
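
The interaction between the time threshold, the document threshold, and the spin wait can be easier to follow as code. The following is a minimal sketch of the decision the auto-commit thread might make each time it wakes, assuming a commit is triggered as soon as either enabled threshold is reached. The class and method names (AutoCommitSketch, shouldCommit) are hypothetical and are not part of the Aspire API; the component's real implementation may differ.

```java
// Illustrative sketch only: not the Aspire Lucene component's actual code.
final class AutoCommitSketch {

    /**
     * Decides whether a commit is due at one wake-up of the auto-commit thread.
     * A threshold of 0 disables that trigger, as described for autoCommitMS
     * and autoCommitDocs. Nothing is committed if no documents were added.
     */
    static boolean shouldCommit(long autoCommitMs, long autoCommitDocs,
                                long msSinceLastCommit, long docsSinceLastCommit) {
        if (docsSinceLastCommit <= 0) {
            return false; // no documents added since the last commit
        }
        boolean timeDue = autoCommitMs > 0 && msSinceLastCommit >= autoCommitMs;
        boolean docsDue = autoCommitDocs > 0 && docsSinceLastCommit >= autoCommitDocs;
        return timeDue || docsDue; // assumption: whichever threshold is hit first wins
    }

    public static void main(String[] args) {
        // Thresholds of 30 minutes and 10,000 documents.
        System.out.println(shouldCommit(1_800_000, 10_000, 60_000, 500));    // false
        System.out.println(shouldCommit(1_800_000, 10_000, 1_800_000, 1));   // true (time)
        System.out.println(shouldCommit(0, 10_000, 10, 10_000));             // true (docs)
        System.out.println(shouldCommit(0, 0, 9_999_999, 50_000));           // false (both disabled)
        System.out.println(shouldCommit(1_800_000, 10_000, 5_000_000, 0));   // false (nothing added)
    }
}
```

Under this reading, autoCommitSpinWait only controls how often such a check runs, so a commit can lag a threshold by up to one spin wait interval.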

Example Configuration

Simple

    <component name="LuceneService" subType="default" factoryName="aspire-lucene"/>

Complex

    <component name="LuceneIndexer" subType="default" factoryName="aspire-lucene">
      <indexDirectory>data/index/lucene-index</indexDirectory>
      <documentID>url</documentID>
      <luceneMaxFieldLength>10000</luceneMaxFieldLength>
      <luceneMaxBufferedDocuments>100</luceneMaxBufferedDocuments>
      <autoCommitSpinWait>5000</autoCommitSpinWait>
      <autoCommitMS>1800000</autoCommitMS>
      <autoCommitDocs>10000</autoCommitDocs>
    </component>
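
The luceneMaxFieldLength value in the configuration above limits running terms, not distinct terms, which is easy to misread. The following minimal sketch illustrates that truncation. It assumes simple whitespace tokenization, whereas a real Lucene analyzer tokenizes differently, and the names MaxFieldLengthSketch and indexedTerms are hypothetical rather than part of Lucene or Aspire.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: mimics the effect of luceneMaxFieldLength,
// not Lucene's actual indexing code.
final class MaxFieldLengthSketch {

    /**
     * Returns the terms that would be kept for one field when at most
     * maxFieldLength running terms are indexed. Repeated terms count
     * toward the limit each time they occur.
     */
    static List<String> indexedTerms(String fieldText, int maxFieldLength) {
        String trimmed = fieldText.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        String[] tokens = trimmed.split("\\s+"); // assumption: whitespace tokenization
        int kept = Math.max(0, Math.min(tokens.length, maxFieldLength));
        // Everything after the first 'kept' tokens is silently dropped.
        return new ArrayList<>(Arrays.asList(tokens).subList(0, kept));
    }

    public static void main(String[] args) {
        String text = "to be or not to be that is the question";
        // Prints [to, be, or, not, to, be]
        System.out.println(indexedTerms(text, 6));
    }
}
```

With a limit of 6, the repeated "to" and "be" use up part of the budget, and "question" never reaches the index, so a search for it would not match this document.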

Accessing from External Components

In order to use the index and searching capabilities of this component, you must configure the <indexDirectory> parameter. Services are then provided using the AspireLucene.java interface.

Components wishing to access this functionality should maintain a service tracker to this component, get an instance, and then call the appropriate method.