Connector Scanner Stage Test Harness

From wiki.searchtechnologies.com

Page currently under construction


 (2.2 Release)  

Introduction

In version 2.2, we have updated the scanner framework to allow testing of scanner stages without the need for Aspire. The test harness is still under development and is subject to change.

The test harness is a tool that allows developers to scan repositories or download users and groups without needing to start Aspire. It is run from the command line and uses a text-based menu to call scanner functionality. The output can then be sent to a file or to the screen.

The harness supports the "Hierarchical" scanners, the "Linear" scanners and the "Push" scanners, although the actual functionality supported depends on the scanner type.

Building the Test Harness

You can build the Test Harness from the source code of any connector (assuming you've got Maven installed). The pom files include a profile to build/download everything that is required to run the tester.

If you've checked out the code, go to the directory in which the pom.xml exists. Run the command:

 mvn clean install -Ptester

This command will run the scanner build, including unit tests (if you want to disable the unit tests, add -DskipTests to the end of the command line). If the build is successful, you should see:

 [INFO] ------------------------------------------------------------------------
 [INFO] BUILD SUCCESS
 [INFO] ------------------------------------------------------------------------ 

In the project target directory, you will then have a scannerTester directory and the connector jar file (aspire-documentum-connector-2.2-SNAPSHOT.jar, for instance). The scannerTester directory contains a bin directory with a batch file to run the tester and a lib directory that contains all the required dependencies for the connector.

Running the test harness

Standard usage

In most cases, the test harness can be run using a batch file extracted when the test harness is built. Instructions are below:

  • Change to the target/scannerTester directory
  • Invoke the tester using the batch file, passing the jar file of the scanner to be tested as a parameter
 bin\test.bat -jar ..\aspire-documentum-connector-2.2-SNAPSHOT.jar

Command line options

The scanner test harness supports the following options:

  • -jar <jar-file>
    • Specify the jar file which contains the Aspire scanner component. Not needed if -class is specified
  • -class <full-class-name>
    • Specify the full class name (e.g. com.searchtechnologies.aspire.components.FileSystemScanner) of the scanner class. Not needed if -jar is specified
  • -home <Aspire-home>
    • Specify the Aspire Home directory
  • -subType <subType>
    • Specify the sub-type for the scanner class. If not specified, 'default' is assumed
  • --help
    • Show usage

NOTE: One of -jar or -class must be specified.
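
The rule in the note above can be sketched in a few lines of Java. This is an illustration only, not the harness's actual argument parser: the class and method names (HarnessArgsSketch, parse, isValid) are hypothetical, and only the option names and the 'default' sub-type come from this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: this is not the harness's real parser.
class HarnessArgsSketch {

    // Collects "-name value" pairs; "--help" is stored with an empty value.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("--help")) {
                options.put("help", "");
            } else if (arg.startsWith("-") && i + 1 < args.length) {
                options.put(arg.substring(1), args[++i]);
            }
        }
        // The page states that 'default' is assumed when -subType is absent.
        options.putIfAbsent("subType", "default");
        return options;
    }

    // One of -jar or -class must be specified.
    static boolean isValid(Map<String, String> options) {
        return options.containsKey("jar") || options.containsKey("class");
    }

    public static void main(String[] args) {
        Map<String, String> options = parse(args);
        if (options.containsKey("help") || !isValid(options)) {
            System.out.println("usage: -jar <jar-file> | -class <full-class-name>"
                    + " [-home <Aspire-home>] [-subType <subType>]");
            return;
        }
        System.out.println("options: " + options);
    }
}
```

Checking the combination up front means a missing -jar or -class is reported before any scanner class is loaded.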


Advanced usage

See "Running the tester without the batch file" below for information on running the test harness without using the batch file.


Test Harness Menu Options

When the test harness runs, it displays a menu on the console. Some of the options are common to all scanner types; others are specific to individual scanner types.


Common Options

The following options are common to all scanner types:

  • Initialize or view scanner configuration
    • Shows the current scanner configuration and allows you to load a new one from an XML file. The scanner configuration is the set of properties passed to the scanner when the component is loaded into Aspire. The format varies from scanner to scanner, but you can use the parameters passed to the component from the application.xml file in the connector app-bundle as an example
  • Load or view content source job
    • Shows the current scanner content source job and allows you to load a new one from an XML file. The content source job is the set of properties passed to the scanner when a crawl is processed. The format varies from scanner to scanner, but you can use the content-source.xml file from the content-sources directory of an Aspire distribution as an example. The root node of the file should be <doc>, but if the test harness sees <connectorSource> it will wrap a <doc> around it
  • Download users and groups to the cache
    • Initiates the process to download the users and groups from the repository to the user/group cache (if the scanner supports this)
  • Dump the user-group cache
    • Dumps the contents of the user/group cache to a file or the screen (if the scanner supports this)
  • Lookup a single user in the user-group cache
    • Looks up a single user (entered by the user when prompted) in the user/group cache and outputs the results to the screen
  • Download special ACLs to special ACL cache
    • Not yet implemented
  • Dump the special ACLs cache
    • Not yet implemented
  • Dump ACL intersections (typically requires a full scan first)
    • Dumps the contents of the intersection ACL database to a file or the screen (if the scanner supports this). The database is populated by a full scan

If you wish to quit, press Q followed by return.

Scanner configuration file

You will need to provide a scanner configuration file containing the xml to be passed to the scanner when it is created. The format is (unfortunately) specific to the type of scanner (so the file system scanner is different to the Documentum scanner). However, you will notice some common pieces.

Knowing what to have in the configuration can seem difficult, but if you have the scanner running in Aspire, you can get the configuration of an installed scanner via the debug interface.

Navigate to the page for a scanner using a url of the form http://localhost:50505/aspire/SOURCE-NAME/Main/Scanner/ and view the page source. You should be able to view the configuration in the <config> tag for the component.

If you can't run Aspire, you can look in the application.xml file of the scanner app-bundle to work out the configuration.

Example scanner configuration file
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <debug>false</debug>
  <fullRecovery>incremental</fullRecovery>
  <incrementalRecovery>incremental</incrementalRecovery>
  <metadataMap>
    <map from="action" to="action"/>
    <map from="doc-type" to="docType"/>
    <map from="last-modified-date" to="lastModified"/>
    <map from="content-length-bytes" to="dataSize"/>
    <map from="owner" to="owner"/>
  </metadataMap>
  <snapshotDir>C:\Users\aspire\demo\target\demo-1.0-SNAPSHOT-distribution\data/FileScanner/snapshots</snapshotDir>
  <fileNamePatterns>
    <include pattern=".*"/>
    <exclude pattern=".*tmp$"/>
  </fileNamePatterns>
  <emitCrawlStartJobs>true</emitCrawlStartJobs>
  <emitCrawlEndJobs>false</emitCrawlEndJobs>
  <enableAuditing>true</enableAuditing>
</config>
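
The fileNamePatterns element in the example pairs an include pattern with an exclude pattern. The sketch below shows one plausible reading of how such patterns could be applied with java.util.regex. It is not taken from the scanner source: whether the real scanner matches the full path or only the file name, requires a full match, or gives excludes precedence are all assumptions here, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; the scanner's real matching rules may differ.
class FileNamePatternSketch {

    // Assumption: an exclude match always wins, and a pattern must match the whole name.
    static boolean accepted(String name, List<Pattern> includes, List<Pattern> excludes) {
        for (Pattern exclude : excludes) {
            if (exclude.matcher(name).matches()) {
                return false;
            }
        }
        for (Pattern include : includes) {
            if (include.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Patterns copied from the example configuration.
        List<Pattern> includes = Arrays.asList(Pattern.compile(".*"));
        List<Pattern> excludes = Arrays.asList(Pattern.compile(".*tmp$"));
        for (String name : new String[] {"report.doc", "scratch.tmp"}) {
            System.out.println(name + " accepted: " + accepted(name, includes, excludes));
        }
    }
}
```

Under these assumptions report.doc is accepted and scratch.tmp is rejected, because the exclude pattern is checked first.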

Content source job

The content source job is also specific to the connector. The easiest way to get an example job is by using a running Aspire instance. Ensure debugging is turned on for the connector and run a crawl. Then visit the url http://localhost:50505/aspire/SOURCE-NAME/Main/IncomingJobLogger?cmd=viewJobs. The jobs here should include the xml you need.

Example content source job
<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <connectorSource>
    <url>c:\testdata\11</url>
    <partialScan>false</partialScan>
    <subDirUrl/>
    <indexContainers>false</indexContainers>
    <scanRecursively>true</scanRecursively>
    <useACLs>false</useACLs>
    <acls/>
    <scanExcludedItems>false</scanExcludedItems>
    <fileNamePatterns/>
  </connectorSource>
</doc>
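
As noted under the common menu options, the harness accepts a file whose root is <connectorSource> and wraps a <doc> around it. A minimal sketch of that normalization using the JDK's DOM API follows. It shows the idea only and is not the harness's own code; the class and method names are hypothetical, and rejecting any other root element is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of the <doc> wrapping behaviour described on this page.
class ContentSourceWrapSketch {

    // Returns a document whose root is <doc>, wrapping a <connectorSource> root if needed.
    static Document normalize(String xml) {
        Document parsed;
        try {
            parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        Element root = parsed.getDocumentElement();
        if (root.getTagName().equals("doc")) {
            return parsed;
        }
        if (!root.getTagName().equals("connectorSource")) {
            // Assumption: any other root element is treated as an error.
            throw new IllegalArgumentException("unexpected root element: " + root.getTagName());
        }
        // Detach the old root, then re-attach it under a new <doc> element.
        parsed.removeChild(root);
        Element doc = parsed.createElement("doc");
        parsed.appendChild(doc);
        doc.appendChild(root);
        return parsed;
    }

    public static void main(String[] args) {
        Document d = normalize("<connectorSource><url>c:\\testdata\\11</url></connectorSource>");
        System.out.println("root element: " + d.getDocumentElement().getTagName());
    }
}
```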

Hierarchical Scanner Menu Options

The following options are available for Hierarchical scanners:

  • Browse Hierarchy
    • Allows you to (manually) traverse the repository hierarchy, listing "folders" and "documents" and going into "folders" in order to see their contents. The test harness will start at the initial URL given in the configuration and then list the contents. The user can then pick an item from the list and display the contents of that item
  • Scan a specified URL
    • Initiates a scan of a URL given in response to a prompt from the harness. The results are output to the screen or a file
  • Scan everything (automatically scans nested folders)
    • Initiates a recursive scan of the initial URL given in the connector source job file. The results are output to the screen or a file
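
The difference between browsing one level and scanning everything can be illustrated against a local directory tree with java.nio.file. This is only an analogy: the real harness calls the scanner under test rather than the file system API, and the class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical analogy using a local directory tree instead of a repository.
class HierarchySketch {

    // Comparable to "Browse Hierarchy": list only the direct children of one folder.
    static List<Path> browse(Path folder) {
        try (Stream<Path> children = Files.list(folder)) {
            return children.sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Comparable to "Scan everything": visit every descendant of the start folder.
    static List<Path> scanEverything(Path start) {
        try (Stream<Path> all = Files.walk(start)) {
            return all.filter(p -> !p.equals(start)).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("direct children: " + browse(start).size());
        System.out.println("all descendants: " + scanEverything(start).size());
    }
}
```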


Linear Scanner Menu Options

The following options are available for Linear scanners:

  • Scan everything
    • Initiates a scan of the initial url given in the connector source job file. The results are output to the screen or a file


Push Scanner Menu Options

The following options are available for Push scanners:

  • Crawl everything
    • Initiates a crawl of the initial url given in the connector source job file. The results are output to the screen or a file


Running the tester without the batch file

The harness can be invoked by entering a java command from the command line. The full java command must be specified, including the classpath, the full name of the test harness class and any options to pass to it.

The Java classpath must include:

  • org.osgi.core-4.2.0.jar & org.osgi.compendium-4.2.0.jar
    • The OSGi container jar files
  • aspire-core-<version>.jar
    • The Aspire core file containing services and framework
  • aspire-scanner-<version>.jar
    • The Aspire scanner framework
  • aspire-simple-group-expander-<version>.jar
    • The Aspire group expansion framework
  • aspire-<repository>-connector-<version>.jar
    • The connector under test. Sometimes this could be named aspire-<repository>-connector-<version>.jar

The classpath must also include any (other) jars on which the scanner is dependent. Typically these will be included in the aspire-<repository>-connector-<version>.jar file, but you must extract them and add them to the classpath manually (see below).

To run the test harness, you run Java to invoke a Java class. You must therefore also include the full name of the class to invoke: com.searchtechnologies.aspire.scanner.testtool.ScannerTester

Command line options

In addition to the standard Java command line options, you can specify any of the test harness command line options listed under "Standard usage" above.

NOTE: One of -jar or -class must be specified.

Example command line

Below is an example command line for testing the file system scanner

  java
    -cp .\aspire-scanner-2.2-SNAPSHOT.jar;
      .\aspire-core-2.2-SNAPSHOT.jar;
      .\aspire-simple-group-expander-2.2-SNAPSHOT.jar;
      .\aspire-filesystem-connector-2.2-SNAPSHOT.jar;
      .\org.osgi.core-4.2.0.jar;
      .\org.osgi.compendium-4.2.0.jar
    com.searchtechnologies.aspire.scanner.testtool.ScannerTester
    -jar .\aspire-filesystem-connector-2.2-SNAPSHOT.jar

NOTE: New lines have been added for readability in the above command. It is a single command and should be on a single line. The -cp parameter is a single value and should contain no spaces
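
Because the -cp value must be a single string with no spaces, it can be less error-prone to assemble it programmatically. The sketch below assumes all the required jars have been gathered in one directory and joins them with the platform path separator (; on Windows, : elsewhere). The class name and the default lib directory are hypothetical; only the ScannerTester class name comes from this page.

```java
import java.io.File;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper that builds a -cp value from the jars in one directory.
class ClasspathSketch {

    // Joins every .jar file in the directory into a single classpath string.
    static String classpathFor(File dir) {
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            throw new IllegalArgumentException("not a readable directory: " + dir);
        }
        return Arrays.stream(jars)
                .map(File::getPath)
                .sorted()
                .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) {
        // Assumption: the jars are in ./lib unless another directory is given.
        File dir = new File(args.length > 0 ? args[0] : "lib");
        System.out.println("java -cp " + classpathFor(dir)
                + " com.searchtechnologies.aspire.scanner.testtool.ScannerTester"
                + " -jar <connector-jar>");
    }
}
```

The printed command still needs the real connector jar substituted for the <connector-jar> placeholder, and jar paths that contain spaces would need quoting.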

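If all the jar files to be added to the classpath are in one directory, you can use the Java system property java.ext.dirs instead of the classpath. The extension mechanism behind this property was removed in Java 9, so this approach only works on Java 8 and earlier: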

  java
    -Djava.ext.dirs=./lib
    com.searchtechnologies.aspire.scanner.testtool.ScannerTester
    -jar .\aspire-filesystem-connector-2.2-SNAPSHOT.jar

NOTE: Again, new lines have been added for readability in the above command. It is a single command and should be on a single line.

Extracting embedded jar files

Assuming you have the JDK installed, you can use the jar utility to extract embedded jar files from the scanner jar file under test.

To extract the jar file aspire-scanner-2.2-SNAPSHOT.jar from the aspire-filesystem-connector-2.2-SNAPSHOT.jar use the command:

 jar xf aspire-filesystem-connector-2.2-SNAPSHOT.jar aspire-scanner-2.2-SNAPSHOT.jar

To inspect the contents of the file (so you know what to extract) use the command:

 jar tf aspire-filesystem-connector-2.2-SNAPSHOT.jar

Then look for the .jar files at the root level and extract each file in turn using the extract command above.
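
If you prefer not to read through the jar tf listing by hand, the root-level .jar entries can also be found with the JDK's java.util.jar API. A minimal sketch follows; the class and method names are hypothetical, and it only prints the matching jar xf commands rather than extracting anything.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper that lists the embedded jars of a connector jar.
class EmbeddedJarLister {

    // Returns the names of .jar entries stored at the root level of the given jar.
    static List<String> rootLevelJars(String jarPath) {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".jar") && !name.contains("/")) {
                    names.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: EmbeddedJarLister <connector-jar>");
            return;
        }
        // Print one extract command per embedded jar.
        for (String name : rootLevelJars(args[0])) {
            System.out.println("jar xf " + args[0] + " " + name);
        }
    }
}
```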