Workflow (Aspire 2)

From wiki.searchtechnologies.com

For information on Aspire 3.1, see the Aspire 3.1 documentation.

The workflow is the successor to the Routing Table. With this feature we can add, remove and change information, create branches in the flow of information, publish to different search engines, and more.

The Workflow is divided into two (2) parts, the Workflow Library and the Workflow Tree; see the Workflow section.

Workflow Library

CSC-Library.png

In the Workflow Library we find all the applications and rules used to create our own business rules, stored in a collapsible menu divided by category. The workflow has two (2) types of rules: application rules, which are downloadable Java applications such as the Publishers and Application Functions (the ones available depend on your publisher and application entitlements), and scripting rules, which are predefined templates of Groovy scripts.


Categories

Both application rules and scripting rules are divided into categories. Each category, and what we can expect to find in it, is described below.

Publishers

The Publishers are the same publishers as in older versions of Aspire. They take the jobs coming from the connector and publish them to a search engine. Each publisher must be configured before it is added as an application rule. They are all downloadable Java applications.

Application Functions

Like the Publishers, the Application Functions are the same as in older versions of Aspire. They are used to modify the jobs coming from the connector in more complex ways than a simple Groovy script could. They are all downloadable Java applications.

Base Functions

The Base Functions are Groovy-based scripting rules that perform simple functions on the jobs. This category also includes rules that skip or stop the processing of a job in the workflow. We can also create our own custom rules.

Choices

The Choices are Groovy-based scripting rules that create branches in the workflow based on a decision. Each branch is executed only by the jobs that satisfy its condition.

Share Libraries

The Shared Libraries are sets of rules and applications that we save to reuse in other content sources. If a rule or application is not in a Share Library, it cannot be used again in another content source.

How to Share a Rule or Application?

This section walks through the steps to share any rule or application for which the Share menu option is enabled.

Step 1: Open the Context Menu

We right-click the rule or application we want to share. If it can be shared, the Share option is enabled. Hovering over Share opens a submenu with the New Library option; clicking it opens a window where we can enter the name of the library.


Step 2: Create the Share Library

After entering the name of the library, we click the Share button. This creates the Share Library with the rule or application inside it. We can verify this by looking at the bottom of the Workflow Library, where the Share Library should appear; clicking it opens the library and shows the rule or application.

* The Share Library may change position if we have several Share Libraries, because they are sorted alphabetically.


Step 3: Share to an Existing Share Library

If we already have a Share Library and want to add a new rule or application to it, we repeat Step 1, but instead of clicking New Library we click the name of our library. This adds the rule or application automatically.


Step 3a: Unshare a Rule or Application

To unshare a rule or application, we click the trashcan icon (CSC-Trashcan.png) to the left of the rule or application name inside the Share Library.

Workflow Trees

CSC-Workflow-Tree.png

The Workflow Trees represent specific points in the information flow: they are stages in which the jobs are processed before being sent to the next stage. The root node of each tree is always named after the stage it represents. Each content source has five (5) trees:

  • After Scan: Process documents before their content is fetched. Typically used to terminate jobs to avoid fetching unwanted documents.
  • On Add/Update: Process documents to be added or updated in the index. Typically used to map/normalize metadata fields and values.
  • On Delete: Delete events go through this workflow. Typically empty, but it can also be used to update an external repository.
  • On Error: Any job which encounters an error goes through this workflow. Could be used to log the error or quarantine the document.
  • On Publish: Publish documents to a search engine. Put your publisher in this workflow.

* The workflow trees are saved with every change you make; they are independent of the Save button.

Controls

CSC-Workflow-Controls.png

The controls for the Workflow Tree are explained in the UI Introduction. The most important control for the Workflow Tree is the Tree Selection, which lets us select the tree we want to modify and shows it as part of a representation of the information flow. Hovering over a button displays the description of that tree. The current tree is highlighted in green.

CSC-Pipeline.png

Rules Restrictions

Some rules can only contain certain other rules. The list below shows which types of rules each container accepts.

  • Root
    • Folder
    • Publishers
    • Application Functions
    • Base Functions
    • Choices (but not Condition)
  • Choices
    • Condition (only)
  • Condition
    • Choices
    • Folder
    • Exit (the only Base Function allowed)
  • Folder
    • Folder
    • Publishers
    • Application Functions
    • Base Functions
    • Choices (but not Condition)
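The containment rules above can be captured in a small lookup table. The following is a hypothetical Python model for illustration, not Aspire code: the type labels (Folder, Publisher, ApplicationFunction, BaseFunction, Choice, Condition, Exit) are names chosen for this sketch. Exit is treated as a Base Function that a Condition accepts on its own.

```python
# Hypothetical model of the container rules; type labels are illustrative only.
CHILD_RULES = {
    "Root":      {"Folder", "Publisher", "ApplicationFunction", "BaseFunction", "Choice"},
    "Folder":    {"Folder", "Publisher", "ApplicationFunction", "BaseFunction", "Choice"},
    "Choice":    {"Condition"},
    "Condition": {"Choice", "Folder", "Exit"},
}


def can_contain(parent: str, child: str) -> bool:
    """Return True if a rule of type `child` may be dropped inside `parent`."""
    # Exit is itself a Base Function, so it is accepted wherever Base Functions
    # are; a Condition accepts Exit but no other Base Function.
    kinds = {child, "BaseFunction"} if child == "Exit" else {child}
    return bool(CHILD_RULES.get(parent, set()) & kinds)


print(can_contain("Condition", "Exit"))          # True
print(can_contain("Condition", "BaseFunction"))  # False
print(can_contain("Root", "Condition"))          # False
```

Keeping the restrictions as data rather than as nested if statements makes the table easy to compare line by line with the list above.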

How to add a rule or application to the Workflow Tree

This section walks through the steps necessary to add any rule or application to the Workflow Tree.

Step 1: Drag from the Library

You can add any rule or application by dragging it from the Workflow Library and dropping it on the part of the Workflow Tree where you want it.

CSC-Drag1.png
CSC-Drag2.png


Step 2: Fill the properties

If the rule needs to be configured, a window opens with all of its configuration properties; otherwise, the rule simply appears in the tree. If a window opens, fill in the necessary properties and click the Add button to create the rule.

CSC-Modal1.png

Step 2a: Update the properties

Double-clicking a rule in the tree opens the window with its current properties. We can then change the properties and save them by clicking the Update button.

Context Menu

CSC-Context-Menu.png

The Context Menu has several options for managing the business rules:

  • Cut: Cuts the current rule.
  • Copy: Copies the current rule.
  • Paste: After a Cut, it simply pastes (moves) the business rule; after a Copy, it creates a new rule with the same properties.
  • Paste Reference: (Enabled only after a Copy) Creates a reference to the copied rule. This means that if we change either the reference or the original rule, the original and every reference pointing to it change.
  • Delete: Deletes the reference to the rule; if the rule isn't in a Share Library, it also deletes the rule.
  • Disable/Enable: Disables or enables the current rule or reference. A disabled rule is shown in gray, and if the disabled rule is a parent, the whole branch is inaccessible; this is shown by the children having a gray description but a black icon.
  • Share: (Enabled if the rule isn't already shared) Opens a submenu with the sharing options.
    • New Library: Opens a window where you can specify the name of the new Share Library.
    • Other Libraries: (If any) Displays the names of the existing Share Libraries where we can put the rule.
  • Unshare: (Enabled if the rule is already shared) Removes the rule from the current Share Library.
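The difference between Paste after a Copy and Paste Reference is the difference between duplicating a rule and pointing at the same rule. A minimal Python sketch, assuming a rule is just a dict of properties (a simplification for illustration; the property names are hypothetical):

```python
import copy

# Hypothetical rule: a plain dict of properties.
original = {"description": "Set title", "field": "title", "value": "Untitled"}

pasted_copy = copy.deepcopy(original)  # Copy + Paste: an independent new rule
pasted_reference = original            # Copy + Paste Reference: the same rule

# Editing through the reference edits the original (and every other reference)...
pasted_reference["value"] = "No title"
print(original["value"])     # No title
# ...while the pasted copy keeps the properties it had when it was pasted.
print(pasted_copy["value"])  # Untitled
```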


Workflow Rules and Applications

For the Workflow we have added several scripting rules and applications. This section explains all the new rules introduced in Aspire 2.0.

Applications

The Applications include the Publishers, for which all related information can be found on this page, and the Application Functions, which currently include the Hierarchy Extractor and the Mime Type Normalizer.

Custom Applications/Publishers

Dragging and dropping the Custom option of the publishers or applications opens a window where we can choose between two methods of installing a custom application/publisher: Repository and Configuration Files. Both are shown as toggle buttons at the top of the window.

Repository

The Repository method is always the default. With this option we can download the custom application/publisher from a Maven repository. To install it, we need to fill in the following fields:

  • Name: The name of the application/publisher in the system. It must be unique; otherwise, an alert indicates that the name is already in use.
  • Description: The description displayed in the tree; it is the text we use to identify the application.
  • Group ID: e.g. com.searchtechnologies.aspire
  • Artifact ID: The ID of the artifact representing the connector, e.g. app-custom-connector
  • Version: (Optional) If the version of the artifact isn't specified, Aspire uses its own version.
CSC-Custom-Application1.png

After filling in all the necessary fields, we click Continue, and the application is loaded in the same window.

CSC-Custom-Application2.png

* All applications/publishers added using this method are added to their respective category in the Workflow Library.
** It is not recommended to use an older version of a connector if a newer version is available.
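The Group ID, Artifact ID and Version fields correspond to standard Maven coordinates (groupId:artifactId:version). The sketch below is a hypothetical Python model of the Repository form, not part of Aspire: the `installed_names` set, the `register_custom_app` function and the `ASPIRE_VERSION` value of "2.0" are assumptions for illustration.

```python
from typing import Optional

# Hypothetical state for illustration only.
ASPIRE_VERSION = "2.0"
installed_names = {"my-publisher"}


def register_custom_app(name: str, group_id: str, artifact_id: str,
                        version: Optional[str] = None) -> str:
    """Validate the Repository form and return Maven coordinates."""
    if name in installed_names:
        # The UI shows an alert in this case; the sketch raises instead.
        raise ValueError(f"The name '{name}' is already in use")
    installed_names.add(name)
    # An empty Version falls back to the version of Aspire itself.
    return f"{group_id}:{artifact_id}:{version or ASPIRE_VERSION}"


print(register_custom_app("custom-connector", "com.searchtechnologies.aspire",
                          "app-custom-connector"))
# com.searchtechnologies.aspire:app-custom-connector:2.0
```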

Configuration Files

Before the Configuration Files method opens, an alert indicates that applications/publishers added with this method are not included in their respective category in the Workflow Library.

CSC-Custom-Application4.png

The Configuration Files method requires both the application file and the dxf file to be on the Aspire server. To install a custom application/publisher with this method, we only have to specify the path of the application file.

CSC-Custom-Application3.png

After filling in all the necessary fields, we click Continue, and the application is loaded in the same window.


* If the dxf file doesn't use the new valid format for applications/publishers, it won't be possible to configure the application/publisher.

Scripting Rules

Folder

The Folder is a container for business rules. We can copy just the folder to another tree or branch, and it is copied with all the business rules inside it. If the content of the folder is modified, all its copies are modified too.

To create a folder we only need the description.

CSC-Modal-Folder.png

Exit

The Exit rule has no configurable properties; its only function is to stop further processing of the job in the tree.

Job Terminate

The Job Terminate rule has no configurable properties; its function is to terminate the job, which means that no further processing is done on the job at all.

Raise Exception

The Raise Exception rule raises an exception in the workflow. The only field this rule needs is the message of the exception.

CSC-Modal-RaiseException.png
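The differences between Exit, Job Terminate and Raise Exception can be modelled with a small runner that passes a job through a sequence of trees. This is a hypothetical Python sketch, not Aspire's engine: the job is a dict, each rule is a function, and sending a raised exception to the On Error tree is an assumption based on the tree descriptions earlier on this page.

```python
class ExitTree(Exception):
    """Exit: stop processing the job in the current tree only."""


class TerminateJob(Exception):
    """Job Terminate: stop processing the job altogether."""


def run_job(job, trees, on_error):
    """Run `job` through (name, rules) trees in order; return a status string."""
    for _name, rules in trees:
        try:
            for rule in rules:
                rule(job)
        except ExitTree:
            continue                  # skip the rest of this tree only
        except TerminateJob:
            return "terminated"       # no later tree sees the job
        except Exception as err:      # e.g. a Raise Exception rule
            job["error"] = str(err)
            for rule in on_error:     # assumption: errors go to On Error
                rule(job)
            return "error"
    return "completed"


def log(tag):
    """Hypothetical rule that records that it ran."""
    return lambda job: job.setdefault("log", []).append(tag)


def exit_rule(job):
    raise ExitTree()


def terminate_rule(job):
    raise TerminateJob()


def raise_exception_rule(job):
    raise RuntimeError("Document rejected")


trees = [("After Scan", [log("scan"), exit_rule, log("never runs")]),
         ("On Add/Update", [log("add")])]
job = {}
print(run_job(job, trees, on_error=[log("on error")]), job["log"])
# completed ['scan', 'add']
```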

Set String Value

The Set String Value rule sets the content of a chosen field to the string we specify. If the field doesn't exist, it is created; if it does exist, its content is overwritten.

CSC-Modal-SetStringValue.png
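In terms of a job's fields, Set String Value behaves like a create-or-overwrite assignment. A minimal sketch, assuming the job's metadata is modelled as a flat Python dict (a simplification; `set_string_value` and the field names are hypothetical):

```python
def set_string_value(job: dict, field: str, value: str) -> None:
    """Create `field` if it is missing, otherwise overwrite its content."""
    job[field] = value


job = {"title": "Draft"}
set_string_value(job, "title", "Final")   # existing field: overwritten
set_string_value(job, "source", "wiki")   # missing field: created
print(job)  # {'title': 'Final', 'source': 'wiki'}
```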

Custom Script

The Custom rule opens a window where we can enter our own Groovy script to be used as a rule in the workflow. You can also click the text area and press F11 to make it full screen. This rule only needs a description and the Groovy script we want to use.

CSC-Modal-Custom.png

For more information please see Using Groovy.

Condition

The Condition is a complement of the Choices: each Condition represents one possible result of the Choice. It has no configurable properties and can only be hosted by Choices.

Boolean

This Choice returns a boolean indicating whether the content of the chosen field equals the expected value. This type of Choice can have at most two (2) conditions (true, false).

CSC-Modal-Boolean.png
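A Boolean Choice evaluates a single equality test and sends the job down the branch of the matching condition. A hypothetical Python sketch, assuming the job is a dict and that a missing field counts as not equal (so the job takes the false branch):

```python
def boolean_choice(job: dict, field: str, expected: str) -> bool:
    """Return True when the field's content equals the expected value."""
    # Assumption: a missing field is simply "not equal", i.e. the false branch.
    return job.get(field) == expected


# Each of the (at most) two conditions leads to its own branch of rules.
branches = {True: "true branch", False: "false branch"}

job = {"repType": "document"}
print(branches[boolean_choice(job, "repType", "document")])  # true branch
print(branches[boolean_choice(job, "repType", "blog")])      # false branch
```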

Switch

This Choice makes a decision based on which of its conditions the content of the field matches. For example, if the field we choose is repType and its possible values are "document", "attachment" and "blog", we can add three (3) conditions: "document", "attachment" and "blog".


For this rule we only need the field we want to test.

CSC-Modal-Switch.png
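The Switch Choice can be pictured as a lookup of the field's value among the condition values. A hypothetical Python sketch using the repType example; the behavior when no condition matches (no branch runs, represented here by None) is an assumption not stated on this page.

```python
from typing import Optional


def switch_choice(job: dict, field: str, conditions: list) -> Optional[str]:
    """Return the condition equal to the field's content, or None if none is."""
    value = job.get(field)
    return value if value in conditions else None


conditions = ["document", "attachment", "blog"]
print(switch_choice({"repType": "attachment"}, "repType", conditions))  # attachment
print(switch_choice({"repType": "wiki"}, "repType", conditions))        # None
```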

Boolean (Byte array)

This Choice returns a boolean indicating whether the content of the chosen field equals the expected value (both the value and the content of the field must be byte arrays). This type of Choice can have at most two (2) conditions (true, false).

CSC-Modal-BooleanByteArray.png

Switch (Byte array)

This Choice makes a decision based on which of its conditions the content of the field matches (both the conditions and the content of the field must be byte arrays).


For this rule we only need the field we want to test.

CSC-Modal-SwitchByteArray.png
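The byte-array variants work the same way as Boolean and Switch, but compare raw bytes instead of strings. A hypothetical Python sketch; the `magic` field holding a file's first bytes is an illustrative assumption, and `bytes(...)` is used so that both `bytes` and `bytearray` values compare by content.

```python
def bytes_equal_choice(job: dict, field: str, expected: bytes) -> bool:
    """Boolean (Byte array): compare the field's bytes with `expected`."""
    value = job.get(field)
    return value is not None and bytes(value) == bytes(expected)


def bytes_switch_choice(job: dict, field: str, conditions: list):
    """Switch (Byte array): return the first condition equal to the field's bytes."""
    value = job.get(field)
    if value is None:
        return None
    return next((c for c in conditions if bytes(c) == bytes(value)), None)


job = {"magic": b"%PDF"}  # hypothetical field holding a file's first bytes
print(bytes_equal_choice(job, "magic", b"%PDF"))                    # True
print(bytes_switch_choice(job, "magic", [b"PK\x03\x04", b"%PDF"]))  # b'%PDF'
```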

Exclude-By-Name

This Choice checks whether the file name matches the pattern entered by the user.

For this rule, we check the matches field if we want to select files whose name matches the pattern, and leave it unchecked to select files whose name does not match it. In the pattern field, we set a regex with the pattern we want to filter.

CSC-Modal-ExcludeByFileName.png
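The matches flag simply inverts the outcome of the regex test. A hypothetical Python sketch; whether Aspire requires the pattern to match the whole file name (as Java's `String.matches()` does) or only part of it is an assumption here, and the sketch uses a whole-name match via `re.fullmatch`. The example pattern targets Office lock files, whose names start with `~$`.

```python
import re


def exclude_by_name(file_name: str, pattern: str, matches: bool) -> bool:
    """True when the name's match against `pattern` agrees with `matches`."""
    # Assumption: the whole name must match, like Java's String.matches().
    return (re.fullmatch(pattern, file_name) is not None) == matches


print(exclude_by_name("~$report.docx", r"~\$.*", matches=True))  # True
print(exclude_by_name("report.docx", r"~\$.*", matches=True))    # False
print(exclude_by_name("report.docx", r"~\$.*", matches=False))   # True
```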

Exclude-By-File-Ext

This Choice checks whether the file extension matches the pattern entered by the user.

For this rule, we check the matches field if we want to select files whose extension matches the pattern, and leave it unchecked to select files whose extension does not match it. In the pattern field, we set a regex with the pattern we want to filter.

CSC-Modal-ExcludeByFileExtension.png
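Exclude-By-File-Ext applies the same test to the extension only. A hypothetical Python sketch, assuming the extension is the text after the last dot (without the dot itself) and that the pattern must match the whole extension:

```python
import os
import re


def exclude_by_file_ext(file_name: str, pattern: str, matches: bool) -> bool:
    """True when the extension's match against `pattern` agrees with `matches`."""
    # Assumption: the extension is the text after the last dot, without the dot.
    extension = os.path.splitext(file_name)[1].lstrip(".")
    return (re.fullmatch(pattern, extension) is not None) == matches


print(exclude_by_file_ext("setup.exe", r"exe|dll|bin", matches=True))  # True
print(exclude_by_file_ext("notes.txt", r"exe|dll|bin", matches=True))  # False
```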

Exclude-By-File-Size

This Choice checks whether the file size is in the range specified by the user. The range values must be set in bytes.

For this rule, we set the min and max fields in bytes. A value of -1 in either field means that side of the range is unbounded (infinite).

For example, with min = 1000 and max = -1, the rule filters all files whose dataSize is greater than or equal to 1000 bytes.

CSC-Modal-ExcludeByFileSize.png
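The range check, with -1 standing for an unbounded side, can be sketched as follows. This is a hypothetical Python model: the inclusive lower bound follows the example above, while treating max as inclusive is an assumption.

```python
def in_size_range(data_size: int, min_bytes: int, max_bytes: int) -> bool:
    """True when data_size lies in [min_bytes, max_bytes]; -1 means unbounded."""
    # Assumption: both bounds are inclusive (the text confirms ">=" for min).
    above_min = min_bytes == -1 or data_size >= min_bytes
    below_max = max_bytes == -1 or data_size <= max_bytes
    return above_min and below_max


print(in_size_range(1000, 1000, -1))            # True: the example from the text
print(in_size_range(999, 1000, -1))             # False
print(in_size_range(5_000_000, -1, 1_048_576))  # False: above a 1 MiB cap
```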