What is Aspire

What Is Aspire?

The focus of Aspire is to “structure unstructured data”. Aspire is a framework and back-office application platform for building solutions that acquire documents or data records from just about any content source, process and enrich that content, and then publish it to search engines or other content-consuming business applications.

Complete and consistent unstructured content, and the metadata that supports it, are increasingly important for implementing modern user interfaces and next-generation content analysis.

Aspire can acquire documents from various content management systems (CMS), relational databases, file systems, and many other repository types, both behind the firewall and offered as SaaS. All available metadata is captured along with the document content, including document-level access control lists. This metadata can then be normalized, enriched, combined, and reformatted as necessary before submission to search engines, document repositories, or business analytics applications. Aspire is independent of both search engine and content repository, and is in use with a range of leading search products, including the Google Search Appliance (GSA), Solr/Lucene, Amazon CloudSearch, and Microsoft SharePoint / FAST.

Content Processing

The processes needed to normalize, enrich, combine, divide, and reformat content to meet the requirements of search and other unstructured information systems are almost as varied as the content itself. Aspire has a rich set of task-specific components that can be combined to create larger units of logical processing, which in turn can be linked together into complete solutions that we call Content Sources.
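
The idea of combining small, task-specific components into a larger unit of processing can be sketched as follows. This is a minimal, language-neutral illustration in Python and not Aspire's API: the names `Document`, `Stage`, `normalize_title`, `add_language`, and `make_pipeline` are all hypothetical, and real Aspire components are Java or Groovy.

```python
from typing import Callable, Dict, List

# A "document" here is just a dict of metadata fields, and a "stage" is any
# function that takes a document and returns a new document.
# These names are illustrative assumptions, not part of Aspire.
Document = Dict[str, str]
Stage = Callable[[Document], Document]


def normalize_title(doc: Document) -> Document:
    """Trim the title and collapse internal runs of whitespace."""
    out = dict(doc)
    out["title"] = " ".join(doc.get("title", "").split())
    return out


def add_language(doc: Document) -> Document:
    """Enrich the document with a default language if none is present."""
    out = dict(doc)
    out.setdefault("language", "en")
    return out


def make_pipeline(stages: List[Stage]) -> Stage:
    """Combine small stages into one larger unit of processing."""
    def run(doc: Document) -> Document:
        for stage in stages:
            doc = stage(doc)
        return doc
    return run


# Link two task-specific stages into a single pipeline and run one document.
pipeline = make_pipeline([normalize_title, add_language])
result = pipeline({"title": "  Annual   Report  "})
```

Because a pipeline built this way is itself a stage, pipelines can be nested inside larger pipelines, which mirrors the "larger units of logical processing" described above.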

Search Technologies’ engineering staff is constantly developing larger units of processing for the clients they serve and publishing them in the Applications section of this wiki for the community to modify and use. The open nature of the Aspire community also supports the development and sharing of components by engineers outside of Search Technologies.

These new Content Sources are made available as either Groovy scripts or Java components that operate within Aspire's componentized architecture, which is based on OSGi. This makes it easy to develop and share components and applications across computers, development teams, and organizations. New applications and software updates can be dynamically loaded into running Aspire systems with just a couple of clicks. This deployment model improves usability and scalability and simplifies administration, all without losing flexibility.

See also A Brief History of Aspire.

Uses for Aspire

Aspire is being used in many types of customer applications. Here are some examples:

  • Enterprise search to enrich content with additional metadata to support advanced navigation
  • State government information site to extract metadata from OCR files and normalize the data prior to indexing
  • Records management to automatically categorize corporate data as it is migrated into SharePoint where content needs to be aggregated and categorized before searching
  • Legal research to find and analyze content for forward and reverse citations to other content to improve recall and analysis
  • Company intranet to automatically create enterprise wide site maps for browsing style investigation
  • Staffing and recruitment to provide search and match solutions between candidate CVs and job descriptions
  • Federal government information site to intelligently split up large single files pertaining to laws into searchable chapters and clauses

Aspire is extremely flexible. By pulling the data processing pipelines out of the search engine, Aspire can more easily manipulate content and metadata, can process it in multiple pipelines simultaneously (and over multiple machines), and then feed it to one or more engines for indexing.
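
As a rough illustration of processing content outside the search engine and then feeding it to more than one engine, the sketch below enriches documents on a thread pool and publishes each result to several targets. It is written in Python under stated assumptions and does not use Aspire's API: `enrich`, `InMemoryIndex`, and `process_and_publish` are hypothetical names, and the "engines" are simple in-memory stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Document = Dict[str, str]


def enrich(doc: Document) -> Document:
    """Hypothetical processing step: record the content length as metadata."""
    out = dict(doc)
    out["length"] = str(len(doc.get("content", "")))
    return out


class InMemoryIndex:
    """Stand-in for a search engine; it only stores what it receives."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: List[Document] = []

    def index(self, doc: Document) -> None:
        self.docs.append(doc)


def process_and_publish(docs: List[Document],
                        engines: List[InMemoryIndex],
                        workers: int = 4) -> List[Document]:
    """Process documents concurrently, then feed every result to every engine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        processed = list(pool.map(enrich, docs))
    for doc in processed:
        for engine in engines:
            engine.index(doc)
    return processed


engine_a = InMemoryIndex("engine-a")
engine_b = InMemoryIndex("engine-b")
published = process_and_publish(
    [{"id": "1", "content": "hello"}, {"id": "2", "content": "hi"}],
    [engine_a, engine_b],
)
```

The point of the design is that processing happens once, independently of any engine, so adding another target only means adding another entry to the list of engines.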

Aspire Features

Over time, Aspire has grown to encompass many types of content. In addition to processing documents, Aspire can:

  • Handle Proxy LDAP requests, including:
    • Authenticating users
    • Determining user group membership across a multitude of systems
  • Federate search requests
    • Distribute queries to multiple search engines
    • Merge search results
  • Process streams of tokens, for performing text analytics
    • Entity extraction
    • Latent Semantic Analysis
    • Document vector creation and comparison
    • Topic Analysis
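
To make "document vector creation and comparison" concrete, here is a minimal sketch that builds term-frequency vectors from token streams and compares them with cosine similarity. It is an assumption about one common way to do this, not a description of how Aspire implements it, and the function names `term_vector` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Tokenize on runs of letters and digits and count each lowercased term."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


cv = term_vector("Java developer with search engine experience")
job = term_vector("Seeking developer with Java and search experience")
score = cosine_similarity(cv, job)
```

A score near 1 means the two texts use the same terms in similar proportions, and a score of 0 means they share no terms. The same comparison underlies matching use cases such as pairing candidate CVs with job descriptions.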

Some other features of Aspire include:

  • Automatic threading of document processing jobs
  • Ability to split document processing jobs into sub jobs, and the management/coordination of those jobs
  • Dynamic configuration changes
  • Dynamic addition of new components
  • Dynamic refresh of component code
  • Rich built-in XML processing methods including XPath and XSLT
  • Hierarchical component configuration
  • Rich and comprehensive web-based administration and control interface
  • Tight integration with Maven repositories for sharing and loading component code
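
Two of the features above, splitting a job into sub jobs and XPath-based XML processing, can be illustrated together. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, to split one XML document into one sub job per chapter. The XML layout, the field names, and the function `split_into_sub_jobs` are hypothetical and are not Aspire's job model.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one large file containing several chapters.
SAMPLE = """<law id="example-act">
  <chapter number="1"><title>Definitions</title><text>Terms used in this act.</text></chapter>
  <chapter number="2"><title>Scope</title><text>Where this act applies.</text></chapter>
</law>"""


def split_into_sub_jobs(xml_text):
    """Return one sub job (a dict) per <chapter> child of the root element."""
    root = ET.fromstring(xml_text)
    parent_id = root.get("id")
    sub_jobs = []
    # "./chapter" is an XPath-style path selecting direct chapter children.
    for chapter in root.findall("./chapter"):
        sub_jobs.append({
            "parent": parent_id,  # lets the parent job track its children
            "id": f"{parent_id}#chapter-{chapter.get('number')}",
            "title": chapter.findtext("title", default=""),
            "content": chapter.findtext("text", default=""),
        })
    return sub_jobs


sub_jobs = split_into_sub_jobs(SAMPLE)
```

Each sub job keeps a reference to its parent, which is the minimum a coordinator would need in order to tell when all parts of the original document have been processed.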

These features are described in more detail in Key Concepts and other sections of this wiki.