Aspire for Hadoop Introduction (Aspire 2)

For information on Aspire 3.1, see the Aspire 3.1 documentation.

Feature only available with Aspire Enterprise

Features

  • Aspire for Hadoop allows Aspire instances to be executed on Hadoop Task Tracker nodes as map, reduce, and/or combine tasks.
  • Each time a task is executed, the Hadoop Task Tracker running it is responsible for launching and shutting down the Aspire instance the task requires.
  • A Hadoop Writable named AspireObjectWritable is available so that Mapper and Reducer tasks can read AspireObjects from Hadoop input files and write them to Hadoop output files.
  • External Aspire implementations can publish their output to HDFS via the Post to HDFS or Post to WebHDFS components, to be used later by Aspire for Hadoop map/reduce tasks.
  • The configuration of Aspire for Hadoop tasks follows a structure similar to the standard Aspire application.xml; the map, reduce, and combine applications are all defined in a single file.
  • Aspire components are provided to interact with Hadoop's Context and to read and write Key/Value pairs.
  • The Aspire Groovy component can also be used to interact directly with Hadoop's Context and Key/Value pair objects.
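
To illustrate what a Hadoop Writable such as AspireObjectWritable has to do, the sketch below shows the general serialization contract: a write method that streams the object to a DataOutput and a readFields method that rebuilds it from a DataInput. This is not the actual AspireObjectWritable implementation, whose internals are not described here. The names WritableLike, DocRecordWritable, and WritableSketch, as well as the sample field values, are hypothetical, and WritableLike is a local stand-in for org.apache.hadoop.io.Writable so that the example compiles without the Hadoop libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in for org.apache.hadoop.io.Writable, which declares
// these same two methods. Used so the sketch needs no Hadoop jars.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical document record holding name/value fields.
// It is NOT the real AspireObjectWritable.
class DocRecordWritable implements WritableLike {
    final Map<String, String> fields = new LinkedHashMap<>();

    @Override
    public void write(DataOutput out) throws IOException {
        // Field count first, then each name/value pair.
        // Note: writeUTF limits each string to 65535 encoded bytes.
        out.writeInt(fields.size());
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Hadoop commonly reuses Writable instances, so clear old state first.
        fields.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String name = in.readUTF();
            String value = in.readUTF();
            fields.put(name, value);
        }
    }
}

class WritableSketch {
    // Serializes a record to bytes and reads it back into a new instance,
    // which is roughly what happens between a map task and a reduce task.
    static DocRecordWritable roundTrip(DocRecordWritable source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                source.write(out);
            }
            DocRecordWritable copy = new DocRecordWritable();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DocRecordWritable doc = new DocRecordWritable();
        doc.fields.put("url", "hdfs://example/doc1"); // made-up sample value
        doc.fields.put("title", "Sample");
        System.out.println(roundTrip(doc).fields);
    }
}
```

In a real job, the class would implement org.apache.hadoop.io.Writable directly and be named as the Mapper or Reducer value type; the round trip above only mimics the serialization Hadoop performs when moving values between tasks and files.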

Supported Versions

  • Aspire 2.2.x for Hadoop works with the Cloudera CDH5 distribution.
  • Aspire 2.1.x for Hadoop works with the Cloudera CDH5 distribution.
  • Aspire 2.0.x for Hadoop works with the Cloudera CDH4 distribution.

High-Level Diagram: Aspire for Hadoop

Aspire for Hadoop diagram