RSS Feeder 0.4

RSS Feeder
Description: Periodically scans a list of RSS feeds held in the CCD. If a feed has been updated since the last poll (or has not been published before), the feed is retrieved and its list of items is scanned. If an individual item has not been published before, the URL of its page is published to the configured pipeline.
Inputs: URLs of RSS feeds from the CCD
Outputs: An AspireDocument object containing the <url> of an item from an RSS feed, published to the configured pipeline manager.
Factory: aspire-rssfeeder (previously aspire.RSSFeeder)
Sub Type: default
Object Type: Produces AspireDocument objects.
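
As an illustration of the output described above, a published document might look roughly like the sketch below. The <doc> root element and the example URL are assumptions made for illustration; only the <url> and <feederLabel> elements (and the default label CrawlRSSFeed) are named on this page.

  <!-- Hypothetical sketch: the root element and URL are assumed, not taken from this page -->
  <doc>
    <url>http://www.example.com/news/item-1.html</url>
    <feederLabel>CrawlRSSFeed</feederLabel>
  </doc>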

Other Notes

  • The URLs themselves are not fetched by the feeder; fetching is performed in the pipeline
  • This feeder is based on the Simple Feeder


Configuration

This feeder takes all parameters from the Simple Feeder plus the following:

Element | Type | Default | Description
ccdLocation | string | ccd | The location within the system of the content control database (CCD).
feederLabel | string | CrawlRSSFeed | The feeder label placed in the <feederLabel> element of the published document and used when querying the CCD.
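
Assuming both elements are placed inside <config> in the same way as <ccdLocation> in the examples on this page, overriding the two defaults might look like the following sketch. The label CrawlNewsFeeds is a hypothetical value chosen for illustration.

  <component name="RSSFeeder" subType="default" factoryName="aspire-rssfeeder">
    <config>
      <!-- Location of the CCD within the system (default: ccd) -->
      <ccdLocation>/systemCommon/ccd</ccdLocation>
      <!-- Hypothetical label; the default is CrawlRSSFeed -->
      <feederLabel>CrawlNewsFeeds</feederLabel>
      <branches>
        <branch event="onPublish" pipelineManager="standard-pipe-manager" />
      </branches>
    </config>
  </component>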


Metadata Mapper Configuration

The RSS feeder maps some metadata fields to fields in the AspireDocument XML.

Field | Default Output Field | Description
title | title | The title of the RSS feed entry.
author | author | The author of the RSS feed entry.
updatedDate | updatedDate | The time when the RSS feed entry was updated.
publishedDate | publishedDate | The time when the RSS item was published.
feedDate | feedDate | The time when the URL is published.
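
Assuming these default fields can be remapped with the same <metadataMap> element that appears in the Complex example on this page, sending two of them to different output fields might look like the sketch below. The output names rssTitle and rssAuthor are hypothetical.

  <metadataMap>
    <!-- Hypothetical output field names, chosen for illustration -->
    <map from="title" to="rssTitle"/>
    <map from="author" to="rssAuthor"/>
  </metadataMap>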

Example Configurations

Simple

  <component name="RSSFeeder" subType="default" factoryName="aspire-rssfeeder">
    <config>
      <autoStart>${autoFeedPages}</autoStart>
      <ccdLocation>/systemCommon/ccd</ccdLocation>
      <branches>
        <branch event="onPublish" pipelineManager="standard-pipe-manager" />
      </branches>
    </config>
  </component>

Complex

  <component name="RSSFeeder" subType="default" factoryName="aspire-rssfeeder">
    <config>
      <autoStart>${autoFeedPages}</autoStart>
      <loopWait>3600000</loopWait>
      <metadataMap>
        <map from="category" to="category"/>
        <map from="subCategory" to="subCategory"/>
        <map from="geographicArea" to="geographicArea"/>
        <map from="searchKeywords1" to="searchKeywords1"/>
        <map from="boost" to="boostTokens"/>
      </metadataMap>
      <ccdLocation>/systemCommon/ccd</ccdLocation>
      <branches>
        <branch event="onPublish" pipelineManager="standard-pipe-manager" />
      </branches>
    </config>
  </component>