Mahout Compare Vectors




Compare Mahout Vectors
Description: Reads the serialized file of all processed mahoutDocVectors, compares the newly read vector to each stored mahoutDocVector, and writes the document relation scores to a text file.
Inputs: A serialized data file that stores the vectors, and the xmlIdTag (vectorId)
Outputs: A text file of Mahout vector comparison scores
Factory: aspire-mahout
Sub Type: compare
Object Type: Implements the storage handler and requires a path for the output file
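
The comparison described above can be pictured with a minimal Java sketch. This page does not say which measure produces the document relation scores, so cosine similarity is assumed here purely for illustration. Plain double[] arrays stand in for Mahout vectors, and the class name, method names, and document IDs (A100, A200) are all hypothetical rather than part of the component.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VectorCompareSketch {

    // Cosine similarity between two dense vectors of equal length.
    // ASSUMPTION: the real component may use a different measure.
    // Returns 0.0 when either vector has zero magnitude.
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores the current vector against every stored vector, keyed by document ID.
    public static Map<String, Double> compareAll(double[] current, Map<String, double[]> stored) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> entry : stored.entrySet()) {
            scores.put(entry.getKey(), cosine(current, entry.getValue()));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical stored vectors, keyed by the value of the ID tag.
        Map<String, double[]> stored = new LinkedHashMap<>();
        stored.put("A100", new double[] {1.0, 0.0, 2.0});
        stored.put("A200", new double[] {0.0, 3.0, 0.0});

        double[] current = {2.0, 0.0, 4.0};

        // One "id <tab> score" line per stored document.
        compareAll(current, stored)
            .forEach((id, score) -> System.out.println(id + "\t" + score));
    }
}
```

In this sketch a score close to 1 means the two vectors point in nearly the same direction, and a score of 0 means they share no weighted terms. The layout of the text file the component actually writes is not documented here, so the tab-separated output is only a placeholder.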

Configuration

Element | Type | Default | Description
serialPath | string | <none> | The file location of the serialized storage of the processed Mahout document vectors.
xmlIdTag | string | <none> | XML ID tag that holds the key of the document vector, used to identify the current document when performing document comparisons. Example: APPLICANT_ID.

Sample Configuration

  <component name="compareMahoutDocVector" subType="compare" factoryName="aspire-mahout">
    <config>
      <serialPath>trainingModel/MODEL.dat</serialPath>
      <xmlIdTag>APPLICANT_ID</xmlIdTag>
    </config>
  </component>

Sample Storage Handler Configuration

  <open componentRef="../processDOC/compareMahoutDocVector" variableName="vectorCompareHandle" path="compareStats.txt" />

Usage

This stage is meant to be used after the Mahout Store Vector stage, because it compares against the Mahout vectors that Store Vector stores.
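
The ordering requirement can be illustrated with a small store-then-load round trip. This is only a sketch under stated assumptions: the actual on-disk format of the serialized file is not described on this page, so standard Java object serialization of an ID-to-vector map is assumed, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

public class StoreThenCompareSketch {

    // Hypothetical "store" step: writes the ID -> vector map to the serial path.
    // ASSUMPTION: plain Java serialization, not necessarily the real format.
    public static void store(Path serialPath, HashMap<String, double[]> vectors) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(serialPath))) {
            out.writeObject(vectors);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical "compare" side: reads the map back before any comparison.
    // Fails with an UncheckedIOException if the store step has not produced the file.
    @SuppressWarnings("unchecked")
    public static HashMap<String, double[]> load(Path serialPath) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(serialPath))) {
            return (HashMap<String, double[]>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("unexpected content in serialized file", e);
        }
    }
}
```

Calling load on a path that store has never written raises an exception in this sketch, which mirrors why the compare stage needs the store stage to have run first.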