Quick Start Tutorial for Creating Applications

From wiki.searchtechnologies.com

For information on Aspire 3.1, see the Aspire 3.1 documentation.

This quick start will get you going with Aspire in 20 minutes or less. You'll learn how to use the Aspire Debug Console to load applications, send content through a pipeline and review the results, and install a simple Groovy script component.

Prerequisites

Before you begin, register to use Aspire at http://aspire.searchtechnologies.com/ if you haven't already done so.

You will need your registration user name and password to complete this tutorial.


Step 1: Install Java

The version of Java you should use depends on the Aspire version you are targeting:

  • Aspire 2.1.2 and earlier runs on Java 1.6 or Java 1.7.
  • Aspire 2.2 and later requires Java 1.7.

We recommend installing the Java JDK (Java Development Kit) in case you want to create your own Aspire components later. However, only the JRE (Java Runtime Environment) is strictly required.

  1. Download and install the Java version listed above for your Aspire release, choosing the package appropriate for the system that will run Aspire: http://java.com/en/download/manual.jsp
    • If you have a 64-bit machine, we recommend installing the 64-bit version of Java. This allows you to create large-memory instances of Aspire.
      • The Aspire framework itself does not use much memory (about 100 MB), but some applications store large hash tables to improve performance, so it's best to have the 64-bit JVM (Java Virtual Machine) in case you need it later.
  2. Test that you can access the "java" command from your console.
    1. Open a new command prompt (on Windows, go to the Start menu, enter "cmd" in the "Run" or "Search for Programs" field, and run cmd.exe).
    2. At the prompt, enter java -version and press Enter.
    3. If Java is installed correctly, the command prints version information similar to the following.

For Aspire 2.1.2 and earlier:

 > java -version
 java version "1.6.0_18"
 Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
 Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

For Aspire 2.2 and later:

 > java -version
 java version "1.7.0_79"
 Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
 Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

Step 2: Download the Quick-Start Distribution

Download the quick-start distribution from https://wiki.searchtechnologies.com/binaries/ and unpack it. For this tutorial, we'll assume you unpacked Aspire into a directory named "aspire-distribution-2.2.2".

Note: This is not the best way to create a new Aspire distribution. The official method is to use the Distribution Archetype, which also requires downloading a Maven client. A separate tutorial covers that method: Aspire Quick Start with Distribution Archetype.

The download will create a directory structure similar to that described in Aspire Directory Structure.


Step 3: Edit the Aspire settings.xml File

Go to the directory where you unpacked Aspire (such as "aspire-distribution-2.2.2") and change to the config directory (for example, type "cd config"). Open the settings.xml file with a text or XML editor and find the <repository type="maven"> element. Replace the placeholder user name and password with the user name and password you used to register for Aspire.

 <repository type="maven">
   <defaultVersion>2.2.2</defaultVersion>
   <remoteRepositories>
     <remoteRepository>
       <id>stPublic</id>
       <url>
         http://repository.searchtechnologies.com/artifactory/simple/community-public/
       </url>
       <user>YOUR-USERNAME-HERE</user>
       <password>YOUR-PASSWORD-HERE</password>
     </remoteRepository>
   </remoteRepositories>
 </repository>

Once you've entered your user name and password, save the file.

Step 4: Start Up Aspire

First, make sure your machine has internet access so that Aspire can download components. Then, from the Aspire directory, change to the bin directory and type "startup" to launch Aspire.

Note that "startup" is a batch script (on Windows) or a shell script (on Unix) that can be modified as necessary if you need more memory or need to set other system properties.

Aspire may take a few seconds to load all of the necessary components.

NOTE: If you are using Aspire Community, ignore the error message about being unable to download the com.searchtechnologies:aspire-dcm-enterprise component. This component is available only with Enterprise systems (it is used for Distributed Processing).

Step 5: Go to the Aspire Debug Console


Leave the terminal window open, and open http://localhost:50505/aspire in a web browser.

We use the Debug Console because it gives you more visibility into new applications as you create them.

You should see an "empty" console with no applications loaded.


Step 6: Load the Sample Application


The quick-start distribution (like every distribution) has a sample application defined in the "config/application.xml" file.

To load this application, enter the following in the "Load a new application" box:

config/application.xml

Then click "start".

If your sample application has started correctly, you should see that the "FeedOneExample" configuration is loaded and running.

If something is not right, check the Aspire command console (the window where you ran "bin\startup.bat" to start Aspire) for errors.


Step 7: Run a Web Page Through the Pipeline and Review the Result

One of the surprising features of Aspire is that every component in the system has its own web page. This is very useful for debugging, testing, and manipulating the system through the Aspire Admin interface.

For this next step, we're going to feed a document through the configured Aspire pipeline and examine the result. To do this, go to the web page for the "FeedOne" component: http://localhost:50505/aspire/FeedOneExample/FeedOne/

Alternatively, click the "/FeedOneExample" link on the Aspire home page, and then the "FeedOne" link.

Enter a URL in the "URL to Feed" box (for example, "http://www.searchtechnologies.com") and then click "Feed".

The URL should take only a fraction of a second to process. Refresh the screen until the status for your job is "complete", then click the "detail" link.

You will see an XML document: the document that was passed down the Aspire pipeline.

 <doc>
   <fetchUrl>http://www.searchtechnologies.com</fetchUrl>
   <httpResponse code="200" source="FetchURLStage">OK</httpResponse>
   <protocol source="FetchURLStage/protocol">http</protocol>
   <host source="FetchURLStage/host">www.searchtechnologies.com</host>
   <mimeType source="FetchURLStage/mimeType">text/html</mimeType>
   <encoding source="FetchURLStage/encoding">utf-8</encoding>
   <extension source="FetchURLStage">
       <field name="status">HTTP/1.1 200 OK</field>
       <field name="Date">Mon, 07 Mar 2011 03:22:12 GMT</field>
       <field name="Server">Microsoft-IIS/6.0</field>
       <field name="Cache-Control">private</field>
       <field name="Content-Type">text/html; charset=utf-8</field>
       <field name="Content-Length">12485</field>
   </extension>
   <title source="ExtractTextStage/title">Search Technologies: The search engine implementation experts</title>
   <contentType source="ExtractTextStage/Content-Type">application/xhtml+xml</contentType>
   <description source="ExtractTextStage/description">We architect, implement and tune leading search engines. Our 80+ experts will help your search implementation exceed expectations regardless of which search engine you use</description>
   <extension source="ExtractTextStage">
       <field name="Content-Location">http://www.searchtechnologies.com</field>
       <field name="Content-Encoding">ISO-8859-1</field>
   </extension>
   <content source="ExtractTextStage"><![CDATA[
       
       Home
       
       About Us	Executive Team
       Careers
       Frequently Asked Questions
 .
 .
 .
 ]]></content>
   <domainName source="ExtractDomainImpl">searchtechnologies.com</domainName>
 </doc>


Note the metadata that the various pipeline stages added to this document:

  • FetchURLStage - HTTP Headers and server status
  • ExtractTextStage - Document content and document metadata
  • ExtractDomainImpl - The domain name from the URL


Step 8: Add a Groovy Component

For the final step, we're going to add a Groovy script component to the document processing pipeline to convert the title to ALL CAPS. This requires editing the Aspire system configuration file to define the new component and to add it to the pipeline.

Note that you do not need to shut down Aspire to make this change.

To add the Groovy component, edit the "config/system-example.xml" file in your Aspire installation directory (for example, "aspire-distribution-2.2.2") to include the following component:

   <component name="ExtractDomain" subType="default" factoryName="aspire-extract-domain" />
   
   <!-- vvvv ADD THESE LINES BELOW vvv -->
   <component name="UpperCaseTitle" subType="default" factoryName="aspire-groovy">
     <script>
       <![CDATA[
         doc.set("title",doc.getText("title").toUpperCase());
       ]]>
     </script>
   </component>
   <!-- ^^^ ADD THE LINES ABOVE ^^^ -->
   
   <component name="PrintToFile" subType="printToError" factoryName="aspire-tools">
     <outputFile>exampleDebug.out</outputFile>
   </component>

In the Groovy code above, the "doc" variable is a reference to the document object being processed by the Aspire document processing pipeline. See Groovy Scripting to learn more about Groovy scripting in general, and AspireObject to learn more about the Java object referenced by the "doc" variable.

Next, add the new Groovy component to the pipeline. It must come after the ExtractText stage, which creates the "title" field that the script converts:
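Some documents may have no title (for example, a file from which ExtractText extracts no title metadata). In that case the one-line script above may fail when it calls toUpperCase() on a missing value. The following is a minimal, more defensive sketch of the same component. It assumes that doc.getText("title") returns null when the document has no "title" field; check the AspireObject documentation for your Aspire version to confirm this behavior.

 <component name="UpperCaseTitle" subType="default" factoryName="aspire-groovy">
   <script>
     <![CDATA[
       // Assumption: getText() returns null when there is no "title" field.
       def title = doc.getText("title")
       if (title != null) {
         // Only rewrite the field when there is a title to convert.
         doc.set("title", title.toUpperCase())
       }
     ]]>
   </script>
 </component>

Because the component name is still UpperCaseTitle, the pipeline stage entry stays the same.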

       <pipelines>
         <pipeline name="doc-process" default="true">
           <stages>
             <stage component="FetchUrl" />
             <stage component="ExtractText" />
             <stage component="DateChooser" />
             <stage component="ExtractDomain" />
             <stage component="UpperCaseTitle" />  <!-- ADD THIS LINE HERE -->
             <stage component="PrintToFile" />
           </stages>
         </pipeline>
       </pipelines>


Now go to the Aspire home page (http://localhost:50505/aspire) and click the "reload" button.



Now go back to FeedOne and feed the http://www.searchtechnologies.com link again. In the document detail, the title should now be in all caps.


Step 9: Congratulate Yourself! (and Shut Down)

Congratulations!!

You have completed the 20-minute quick start for creating applications.

Next, to learn how to build Aspire distributions from scratch using Maven archetypes and Maven component repositories, try the Aspire Quick Start with Distribution Archetype.

To shut down Aspire, go to the home page (http://localhost:50505/aspire) and click the "Shutdown" link. Alternatively, go to the Aspire console window (where you started Aspire with "bin\startup"), type "shutdown", and press Return or Enter. Either way shuts down Aspire.

Cheers!

The Aspire Development Team