Category:1.2 Release

From wiki.searchtechnologies.com

This page lists all of the updates in version 1.2 of Aspire, officially released on March 27, 2012.

Release 1.2 also contains all applicable fixes made to prior 1.0.x versions. Consult the Release Notes for individual versions for details.

Summary of Changes

New Features and Major Improvements

LDAP Services Component NEW

Mime Type Normalizer NEW

  • Normalizes mime types down to simpler types, for presentation and search.
  • Also provides a user-friendly name for the type.
    • For example, there are about 15 different mime types for PPT files, such as application/vnd.ms-powerpoint.
    • These are all normalized to a single type with the user-friendly name "PowerPoint."
  • Also does mapping of file extensions to mime types.
  • Uses a mime type file for the mapping in the resources directory.
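
The normalization described above can be sketched as a pair of lookup tables. This is only an illustration, not the Aspire implementation: the table contents, the normalized type names, and the function names `normalize` and `mime_for_extension` are assumptions, and the format of the real mime type file is not shown here.

```python
# Hypothetical tables; Aspire loads its mapping from a mime type file
# in the resources directory.
NORMALIZED = {
    "application/vnd.ms-powerpoint": ("powerpoint", "PowerPoint"),
    "application/vnd.openxmlformats-officedocument.presentationml.presentation":
        ("powerpoint", "PowerPoint"),
    "application/pdf": ("pdf", "PDF"),
}
EXTENSIONS = {
    "ppt": "application/vnd.ms-powerpoint",
    "pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "pdf": "application/pdf",
}


def normalize(mime_type):
    """Return (simple_type, friendly_name); unknown types pass through unchanged."""
    # Drop any parameters such as "; charset=..." and ignore case.
    key = mime_type.split(";")[0].strip().lower()
    return NORMALIZED.get(key, (key, key))


def mime_for_extension(filename):
    """Map a file extension to a mime type, or None if there is no match."""
    _, dot, ext = filename.rpartition(".")
    return EXTENSIONS.get(ext.lower()) if dot else None
```

Both PowerPoint variants resolve to the same simple type, which is what makes the normalized value useful as a search facet.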

Metadata structures for ACLs and Hierarchies within the connector framework IMPROVEMENT

  • Some tags and attributes were renamed for clarity.
  • Hierarchical structure was improved to allow for multiple hierarchies.
  • Pseudo-mime types were added for all connectors for hierarchy.
  • All connectors were normalized to write connector-specific metadata into a <connectorSpecific> tag.

Hierarchy Generator IMPROVEMENT

  • Adds in hierarchical nodes for sources that don't have a hierarchical structure.
  • Mostly used for relational databases. (In this case, the hierarchy is generated by metadata, and the hierarchy generator indexes the parent nodes as appropriate.)
  • Automatically computes the URLs for the parent as the aggregate of the URLs from all the children (most permissive aggregate).

Updates to Heritrix Connector IMPROVEMENT

  • Allows for incremental updates for web crawls.
    • Maintains a database of all URLs crawled.
    • Provides statistics on URLs crawled.
    • Automatically rechecks URLs that were previously reachable but are now unreachable to determine whether they are still accessible.
    • URLs that remain inaccessible are deleted after N tries.
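
The retry-then-delete behavior for unreachable URLs could look roughly like the following. This is a sketch under assumptions: the dictionary stands in for the connector's URL database, and `record_check` and `max_tries` (the N above) are hypothetical names rather than connector settings.

```python
def record_check(failures, url, reachable, max_tries):
    """Update failure counts for one URL and report what happened to it.

    failures maps URL -> consecutive failed checks (a stand-in for the
    crawl database). max_tries is a hypothetical name for the N tries.
    """
    if reachable:
        failures[url] = 0          # reachable again, reset the counter
        return "ok"
    failures[url] = failures.get(url, 0) + 1
    if failures[url] >= max_tries:
        del failures[url]          # give up and drop the URL
        return "deleted"
    return "retry"
```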

RDBMS Snapshot Connector IMPROVEMENT

  • Does incremental updates for RDB tables which do not have update tables.
    • Has a "Discover SQL" for listing all documents in an RDB table.
      • This creates a table of IDs and update dates, which is the "snapshot."
      • Snapshots are compared to determine documents created/updated/deleted.
    • Has an "Extraction SQL" for fetching the actual content of the records.
      • Done in batches of 50 for better performance.
    • The two SQL statements are kept separate to improve the performance of incremental checks.
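
As an illustration of the snapshot comparison, the sketch below diffs two snapshots (each a mapping of ID to update date) and splits the resulting IDs into batches of 50 for extraction. The function names and the in-memory dictionaries are assumptions made for the example; the connector's actual storage and SQL are not shown.

```python
def diff_snapshots(previous, current):
    """Compare two {id: update_date} snapshots.

    Returns (created, updated, deleted) lists of IDs.
    """
    created = [k for k in current if k not in previous]
    updated = [k for k in current if k in previous and current[k] != previous[k]]
    deleted = [k for k in previous if k not in current]
    return created, updated, deleted


def batches(ids, size=50):
    """Yield successive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Only the created and updated IDs would need to be passed to the extraction step, so unchanged rows are never fetched.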

New connector framework which manages more of the connector processing automatically IMPROVEMENT

  • This includes work such as pause, start, stop, resume.
  • Also includes reporting status (number of jobs, etc.).
  • This is the "HierarchicalScanner" for content sources that are naturally hierarchical; the new framework therefore also produces the hierarchy metadata automatically.
  • Also has methods for creating ACLs and other standard metadata structures, which means less manipulation of AspireObject metadata structures is required.

Complete Confluence connector IMPROVEMENT

  • For both version 3.x and 4.x of Confluence.
  • Correctly handles group expansion.
  • Correctly handles ACLs for "intersection ACLs" between spaces and documents.
  • Fully integrated into Aspire.

Group Expansion Cache Warmer IMPROVEMENT

  • Submits users for warming the group expansion caches to improve search performance for the initial search.

RightNow connector now available on the Aspire framework IMPROVEMENT

New Hibernate / Wake Framework Mechanism for Processing Batches IMPROVEMENT

  • Allows jobs to be "hibernated", which detaches them from their execution thread.
  • This is used for batched processing: a batch of jobs is gathered into a list, hibernated, and processed as a list; calling "wake()" on each job then re-instantiates the job, which continues down the pipeline.
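
The hibernate / wake flow can be pictured with the toy model below. It is not the Aspire API: the `Job` and `Batcher` classes, `batch_size`, and `process_batch` are hypothetical stand-ins, and only the method name wake() comes from the description above.

```python
class Job:
    """Hypothetical stand-in for an Aspire job."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.hibernated = False

    def hibernate(self):
        self.hibernated = True    # detached from its execution thread

    def wake(self):
        self.hibernated = False   # continues down the pipeline


class Batcher:
    """Gathers hibernated jobs and processes them as a list."""

    def __init__(self, batch_size, process_batch):
        self.batch_size = batch_size
        self.process_batch = process_batch
        self.pending = []

    def submit(self, job):
        job.hibernate()
        self.pending.append(job)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        batch, self.pending = self.pending, []
        if batch:
            self.process_batch(batch)   # one call for the whole list
            for job in batch:
                job.wake()
```

The point of the design is that no thread is held by a job while it waits for its batch to fill.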

Batch Jobs IMPROVEMENT

  • Jobs can now be fired in response to batches. These are called "batch jobs."
  • Basically, once a batch of individual jobs is totally complete, the batch itself can then be processed down a pipeline.
  • Job information can be copied to the batch using Groovy.
  • This allows for batches of jobs to be resubmitted or otherwise processed as a batch.
  • It also records the status of every job in the batch, so that the entire batch can be resubmitted if any single job in the batch throws an error.
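
A minimal model of the per-job status tracking is sketched below. The `Batch` class, its method names, and the status strings are assumptions chosen for the example and do not reflect the real batch job API.

```python
class Batch:
    """Hypothetical record of the status of every job in a batch."""

    def __init__(self, job_ids):
        self.status = {job_id: "pending" for job_id in job_ids}

    def record(self, job_id, status):
        """Store the final status ("ok" or "error") of one job."""
        self.status[job_id] = status

    def is_complete(self):
        """True once no job in the batch is still pending."""
        return all(s != "pending" for s in self.status.values())

    def needs_resubmit(self):
        """True if the batch is complete and at least one job failed."""
        return self.is_complete() and "error" in self.status.values()
```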

Allow the Aspire Heritrix Connector to crawl through NTLM-authenticated secure sites IMPROVEMENT

Post to HTTP JSON IMPROVEMENT

  • Allows Aspire to post JSON to external servers (previously only XML was supported).
  • Includes a "JSON Transformer" built on top of Groovy, which can transform AspireObjects into any JSON structure using a JSON template.
  • This allows users to transform the Aspire objects into any JSON as required by the destination servers.
  • Intended and tested for posting to ElasticSearch.
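
The idea of template-driven JSON output can be illustrated as follows. The real JSON Transformer is built on Groovy; this Python sketch only shows the general technique, and the "$field" placeholder syntax, the `render` function, and the sample field names are all assumptions.

```python
import json


def render(template, doc):
    """Fill a JSON-like template from a flat document dictionary.

    String values starting with "$" are replaced by the matching field
    of doc (None if absent); everything else is copied unchanged.
    """
    if isinstance(template, dict):
        return {key: render(value, doc) for key, value in template.items()}
    if isinstance(template, list):
        return [render(value, doc) for value in template]
    if isinstance(template, str) and template.startswith("$"):
        return doc.get(template[1:])
    return template


# Hypothetical template and document.
template = {"title": "$title", "meta": {"type": "$mimeType"}, "source": "aspire"}
doc = {"title": "Release notes", "mimeType": "text/html"}
body = json.dumps(render(template, doc), sort_keys=True)
```

The resulting `body` string is what would be sent as the HTTP request payload to the destination server.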

Minor Updates and Bug Fixes

  • "Target Exists" option for the conditional branch connector
  • DXF form option to choose components from pull-down lists
  • FetchURL can specify additional HTTP request headers, such as "user-agent"
  • Publisher XSL transforms have been updated to the new metadata structures
  • S3 Connector refactored for new connector framework
  • Documentum connector refactored for new connector framework
  • Fix JSON tokenization problems when reading JSON into Aspire
    • Parsing of numbers, hex digits, etc. fixed
  • Fix bug with running connectors every X minutes
  • Heritrix crawler now saves checkpoints, so it can be paused and restarted (including after a system crash) without having to start from scratch
  • Fixed Heritrix failure with invalid starting URLs
  • Fixed bug in Documentum where the count of documents was too high
  • New peekRoute() on job routing tables which allows you to see what's next in the table without actually affecting the table
  • HTTP Feeder uses MAX_FILE_SIZE parameter for multi-part file requests
  • (custom-plug-in) content types are now fetched by the Documentum connector
  • Aspire now works with the IBM JVM
  • 403 response codes now logged as errors by Post-XML
  • Fixed cases where error detail is missing in Admin UI
  • Add configuration options for LDAP searching for group expansion component
  • Fixed mapped values for PostToFS4SP
  • Fixed long labels on DXF causing input alignment issues
  • Aspire Object
    • Fixed a bug in JSON parsing where a numeric value followed by a NEWLINE caused the parser to fail and return "invalid numeric value"
    • Fixed a bug where removeChildren() removed attributes as well
      • Added removeAttributes() function
  • HTTPFeeder
    • Fixed a bug where the HTTP feeder gave a null pointer exception when routing to an unloaded application
  • Connector AspireObject Metadata
    • CIFS Connector, Documentum Connector, Filesystem Connector, SharePoint Connector, Publish to Solr, Publish to GSA and Publish to CloudSearch have been updated to produce and expect the metadata described in Connector AspireObject Metadata
  • aspire-post-fs4sp
    • When an AXPath query in the mappingFile returns AspireObjects, the .getContent() value of each returned AspireObject is now used.

Other Features Available in v1.2 with Custom Services

  • Business Rules
  • Query Caching through Aspire
  • Query Federation through Aspire
  • FAST Listeners
    • Listen and intercept FAST search engine protocols
      • ContentAPI Listener (receives new documents from FAST connectors)
      • SearchAPI Listener (receives new queries from FAST search applications)
    • These listeners allow Aspire to be a proxy server for communications to FAST.
    • Connectors and search applications send their requests to Aspire first, which can modify or handle the request before forwarding it to the FAST search engine itself.
