
From wiki.searchtechnologies.com


 

Latest revision as of 00:17, 9 December 2015


Publish to Solr Application Bundle (Aspire 2)
AppBundle Name  Publish To Solr
Maven Coordinates  com.searchtechnologies.aspire:app-publish-to-solr
Versions  2.2.2
Type Flags  job-input
Inputs  AspireObject from a connector's subjob with metadata and content extracted from a specific file/folder.
Outputs  An XML transformation of the AspireObject, sent to Solr's XML feed URL.

The Publish to Solr application sends document feeds to the Solr index update servlet. Each feed carries the metadata and content of files extracted by Aspire connectors. The feed can be customized by editing the XSL transformation file or by supplying your own.


Configuration

This section lists the configuration parameters available when installing the Publish to Solr application bundle.

Property  Type  Default  Description
SolrNoUrl  boolean  true  Selects whether the publisher uses the complete URL in SolrUrl or builds one from the host and port entered.
SolrPort  integer  8983  Solr port to send the feeds to.
SolrHost  string  none  Solr hostname or IP address, e.g. solr.domain.com
SolrUrl  string  none  Complete URL to which the feeds are sent, e.g. http://localhost:8983/solr/core/update
aspireToSolrXsl  string  ${appbundle.home}/config/xsl/aspireToSolr.xsl  Location of the XSL used to transform the job data into a Solr feed. See Edit Xsl.
maxResults  int  1000000  (Index dump, 2.1 release) Maximum number of documents the search engine can fetch for a single query.
pageSize  int  10000  (Index dump, 2.1 release) Number of documents to fetch per page.
urlField  string  displayUrl  (Index dump, 2.1 release) Field used to store the URL in the search engine.
idField  string  id  (Index dump, 2.1 release) Field used to store the id in the search engine.
timestampField  string  submitTS  (Index dump, 2.1 release) Name of the field holding the index timestamp of every document.

Configuration Example

  <application config="com.searchtechnologies.aspire:app-publish-to-solr">
    <properties>
      <SolrNoUrl>true</SolrNoUrl>
      <SolrHost>localhost</SolrHost>
      <SolrPort>8983</SolrPort>
      <aspireToSolrXsl>${appbundle.home}/config/xsl/aspireToSolr.xsl</aspireToSolrXsl>
      <debug>false</debug>
    </properties>
  </application>

Note: Any optional property can be removed from the configuration, in which case the default value listed in the table above is used.
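
As a variation on the example above, the following sketch points the publisher at a complete update URL instead of a host and port. It assumes that setting SolrNoUrl to false makes the publisher read SolrUrl, which the property table suggests but does not state outright. The URL is the sample value from the table.

```xml
<!-- Sketch only: assumes SolrNoUrl=false tells the publisher to use SolrUrl as given -->
<application config="com.searchtechnologies.aspire:app-publish-to-solr">
  <properties>
    <SolrNoUrl>false</SolrNoUrl>
    <!-- Sample URL from the property table; replace "core" with your own core name -->
    <SolrUrl>http://localhost:8983/solr/core/update</SolrUrl>
    <aspireToSolrXsl>${appbundle.home}/config/xsl/aspireToSolr.xsl</aspireToSolrXsl>
  </properties>
</application>
```

If the feed does not reach Solr with this configuration, the assumption about SolrNoUrl is the first thing to check.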

Edit Xsl

The default XSL transformation file can be found in File:AspireToSolr.xsl.


The default transformation XSL file provided by the publisher expects metadata as described in Connector AspireObject Metadata.

Add metadata field

To add a new metadata field extracted by an Aspire connector, add an XSL element such as the following under the <doc> tag.

<field name="metafieldNameInSolr_t">
  <xsl:value-of select="metafieldNameFromAspireObject" />
</field>

Note that the example uses the dynamic field suffix _t. If your Solr schema defines the field explicitly, use the field name exactly as it appears in the schema.
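
The two cases above can be sketched side by side. The field name author is hypothetical and stands for any metadata element a connector places in the AspireObject. The xsl:if wrapper is an optional addition, not something the default XSL is known to do; it skips the field when the source element is missing or empty.

```xml
<!-- Sketch only: "author" is a hypothetical metadata element -->

<!-- Case 1: no matching schema field, so rely on the _t dynamic field -->
<xsl:if test="normalize-space(author) != ''">
  <field name="author_t">
    <xsl:value-of select="author" />
  </field>
</xsl:if>

<!-- Case 2: the Solr schema defines a field named "author" -->
<xsl:if test="normalize-space(author) != ''">
  <field name="author">
    <xsl:value-of select="author" />
  </field>
</xsl:if>
```

Use only one of the two cases for a given field, depending on your schema.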

Change the document ID

The id of a Solr document uniquely identifies a file in the index. By default, Publish To Solr uses the following fields from the Aspire document, in order of precedence (if one is missing, the next is used):

  • fetchUrl
  • url
  • displayUrl
  • id

To change this behavior, edit the XSL file, or create a new one, so that it contains the following element:

<field name="id">
  <xsl:value-of select="idFieldNameFromAspireObject" />
</field>
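
A custom id element does not have to rely on a single field. The following sketch keeps a fallback chain in the same order as the default list above, using xsl:choose. It is an illustration, not necessarily how the default XSL implements the precedence, and it assumes the four fields are direct children of the current context node.

```xml
<!-- Sketch only: fallback chain for the document id; reorder the branches to change precedence -->
<field name="id">
  <xsl:choose>
    <xsl:when test="normalize-space(fetchUrl) != ''">
      <xsl:value-of select="fetchUrl" />
    </xsl:when>
    <xsl:when test="normalize-space(url) != ''">
      <xsl:value-of select="url" />
    </xsl:when>
    <xsl:when test="normalize-space(displayUrl) != ''">
      <xsl:value-of select="displayUrl" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="id" />
    </xsl:otherwise>
  </xsl:choose>
</field>
```

Testing with normalize-space treats an empty or whitespace-only element the same as a missing one, so the chain moves on to the next candidate.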

Advanced Edit

More advanced changes are possible; consult the Solr Update XML Messages wiki for the message format.
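
For orientation, a Solr XML update message that adds one document has the general shape below. This is the standard Solr add format; the field names and values are illustrative only and are not taken from the default XSL.

```xml
<!-- Sketch only: illustrative values; the real feed is produced by the XSL transformation -->
<add>
  <doc>
    <field name="id">file://server/share/report.pdf</field>
    <field name="title_t">Quarterly report</field>
    <field name="content_t">Extracted text of the document</field>
  </doc>
</add>
```

The XSL transformation is responsible for producing output of this shape from the AspireObject, so any customization has to keep the add, doc, and field structure intact.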