RDBMS Connection (Aspire 2)

From wiki.searchtechnologies.com
Jump to: navigation, search

For Information on Aspire 3.1 Click Here


RDBMS Connection (Aspire 2)
Factory Name  com.searchtechnologies.aspire:aspire-rdb
subType  default
Inputs  None
Outputs  Document variable 'jdbc' (when used as a pipeline stage), holds a Java SQL "Connection" class.

The Aspire RDBMS Connection Pool Component maintains a connection pool to a relational database using the JDBC connector. When requesting a connection, one is returned from the connection pool if it exists. If there are no available connections in the pool, then a new one will be allocated. A separate connection pool is maintained for every configured component instance of the Aspire RDBMS Connection component in the configuration file. Optionally, after a configurable period of time, connections can be purged from the pool, releasing database resources. You may also specify a threshold of available connections, below which the pool will not be purged.

When used as a pipeline stage, the stage will fetch a connection from the connection pool and will store it in object['jdbc'] so it can be used by pipeline modules down-stream. It also puts a connection handler object on the "Closeable" list for the AspireObject, so that, when the job is closed, the connection will be automatically returned back to the connection pool.

When used as an independent component, the component supports the "com.searchtechnologies.aspire.rdb.RDBSConnectionPool" interface.

Note that the component can be used both as an interface and as an independent connection pool at the same time. That is, once you've configured the component to be a pipeline stage (inside of a pipeline manager), you can also access the component directly to get more connections from its pool.

Configuration

Configuration for Connecting to Remote Relational Databases

Element Type Default Description
jdbcDriverJar string null (Required for Remote Database) The location where the JDBC driver Jar file can be located. The driver jar is usually available for download from the Relational Database vendor (see below). The jar should be copied into a directory in your Aspire distribution and then referenced from here. The path specified for "jdbcDriverJar" is relative to the Aspire Home directory. See below for an example.
jdbcDriverClass string specified in the META-INF/services/java.sql.Driver file in the driver Jar file (Optional, but required for some database vendors) The name of the JDBC driver class if the class name from the META-INF/services/java.sql.Driver file in the driver Jar file should not be used, or that file does not exist in the driver jar file (Oracle).
jdbcUrl string null (Required for Remote Database) The JDBC connection URL for accessing the relational database. This is a standard JDBC URL string which specifies the connection protocol, database type, server, port, etc. See below for an example.
user string default (Optional) The user name to use when connecting to the relational database.
password string default (Optional) The password to use when connecting to the relational database.
jdbcProperties/property/@name
jdbcProperties/property
string N/A (Optional) Arbitrary JDBC connection properties (multiples allowed).
startup string default (Optional) SQL commands to execute on startup. Note that these commands are executed every time the component is started or initialized (i.e. restarted, configuration file reload, etc). See below for more details.


Configuration for controlling the connection pool

These parameters are for any type of database, remote or embedded. All of these parameters are optional.

Element Type Default Description
timeout int 0
(= connections never closed)
The time in ms after which a connection in the pool should be considered for closing and purging from the pool. NOTE: the connection will not actually be closed until the next purge, so still could be reused if a request for a connection is received before the next purge occurs.
purgeThreshold int 0
(= infinite connections)
The minimum number of available connections in the pool before a purge will take place. It is not guaranteed that the pool will ever have this many connections available, but the pool will not be purged unless it does.
purgePoll int 60000
(= 60s)
The period in ms between purges of the connection pool.

Example Configuration

Simple

  <component name="RDBConnection" subType="default" factoryName="aspire-rdb">
    <jdbcUrl>jdbc:sqlserver://localhost\SQLEXPRESS:1433</jdbcUrl>
    <jdbcDriverJar>lib/sqljdbc4.jar</jdbcDriverJar>
    <user>test</user>
    <password>test</password>
  </component>

Complex

  <component name="RDBConnection" subType="default" factoryName="aspire-rdb">
    <jdbcUrl>jdbc:sqlserver://localhost\SQLEXPRESS:1433</jdbcUrl>
    <jdbcDriverJar>lib/sqljdbc4.jar</jdbcDriverJar>
    <user>test</user>
    <password>test</password>
    <timeout>60000</timeout>
    <purgePoll>300000</purgePoll>
    <purgeThreshold>10</purgeThreshold>
    <jdbcProperties>
      <property name="jdbcCollection">NULLIDRA</property>
    </jdbcProperties>
    <startup>
      ignore errors;
      create table URLS (
          httpcode varchar(100),
          url varchar(100));
      notice errors;
      delete from URLs;
    </startup>
  </component>

Example of connection to existing Apache Derby RDB

Download driver JAR from Derby Download Page. Extract and get derbyclient.jar from "lib" folder.

<component name="CWSExternalRDBConnector" subType="default" factoryName="aspire-rdb">
  <debug>true</debug>
  <jdbcDriverClass>org.apache.derby.jdbc.ClientDriver</jdbcDriverClass>
  <enabled>true</enabled>
  <jdbcUrl>jdbc:derby://localhost:1527/data/CSManager/db/csManager</jdbcUrl>
  <jdbcDriverJar>C:\jdbc\derbyclient.jar</jdbcDriverJar>
  <user>app</user>
  <password></password>
  <timeout>60000</timeout>
  <purgePoll>300000</purgePoll>
</component>

The above configuration would connect you to the CSManager database (Aspire 1.X version). Note the use of the user name app. The following explanation is taken from here:

If a database is created in Derby using the embedded driver and no user name is specified, the default schema used becomes APP. Therefore any tables created in the database have a schema name of APP. However, when creating a Derby database using the Network Server, the value for the schema becomes the value of the username used to connect with as part of the database URL

Example Use Within A Pipeline

  <pipeline name="process-patent" default="true">
    <stages>
      <stage component="FetchUrl" />
      <stage component="LoadXML" />
      <stage component="RDBConnection"/>

      <!-- Now that the connection has been opened and attached to the job, you can
           use the connection inside of Groovy scripts with the "jdbc." variable.

           In the future, there may be other components which will use it as well.
      -->
    </stages>
  </pipeline>

Example use from within a Groovy scripting component

Typically, Groovy users will want to use the standard Groovy "Sql" object, this can be done as follows:

<component name="createCPAVars" subType="default" factoryName="aspire-groovy">
  <script>
    import groovy.sql.Sql;
    import com.searchtechnologies.aspire.services.AXML;
    doc.putVariable("gsql", new Sql(jdbc));
    gsql.execute("""insert into MyTable(country, pp_num, kind)
                      values(${country}, ${docNumber}, ${kind})""");
    gsql.commit();
    row = gsql.firstRow("select @@IDENTITY ident");
    println "The new identity = ${row.ident}";
  </script>
</component>

Note how, in the above, the JDBC connector can be accessed simply as the variable "jdbc". If you want to use this variable directly (it is the type "java.sql.Connection"), feel free.

Example Use from within a Custom Pipeline Stage

Getting and using the RDB connection from within your custom pipeline stage is easy:

  public void process(Job j) throws AspireException {
    AspireObject doc = (AspireObject) j.getObject();
    Connection rdbConn = doc.getObject("jdbc");
    .
    .  // Use the connection here
    .

    // Note:  No need to return the connection back to the connection pool.
    // It will be returned to the pool automatically when the job is completed
  }

Example For Requesting Connections from Non-Pipeline-Stage Custom Components

If you are writing your own component in Java and you need an RDB connection, you can get components from an aspire-rdb stage, as follows.

Notes:

  1. You will need to specify "com.searchtechnologies.aspire.rdb" in your <Import-Package> statement in your pom.xml file for your component (see How to Include Third-Party Jars in Your Component (Aspire 2) for more details).
  2. Your Maven project will also need to specify the "com.searchtechnologies.aspire:aspire-rdb" maven artifact as a dependency.
  3. Because your component will be dependent on "com.searchtechnologies.aspire.rdb", you will need to make sure that the RDB Component bundle is loaded before your component in the Component Manager (Aspire 2) configuration.
    • This can be done by loading the aspire-rdb bundle before the bundles for dependent components
    • It can also be done by <autoStart> the appropriate settings configuration file (one with an aspire-rdb bundle) before your own system configuration file. For more information on auto-starting, see the settings configuration.
  // Your component will need a service tracker variable:
  // Service tracker for the CCD.
  ServiceTracker rdbTracker = null;

  // *** Somewhere in your initialization code, do the following: ***
  {
    .
    .
    .
    // Probably you should get the name of the ''connection component'' from your configuration file
    String rdbConnectionComponentName = 
        getStringFromConfig(config,"connectionPoolName","/system/RDBConnectionPool");
    rdbTracker =  this.getComponentServiceTracker(rdbConnectionComponentName); 
    .
    .
    .
  }

  // ** Once the tracker is initialized, you can use it to get RDBMS connections:
  {
    .
    .
    .

    // Note: The following should be done *every time* you need to access a connection
    // Do not put the results into member variables for convenience. If the RDB connection is
    // shut down dynamically and then started back up, you want the following code to access
    // the latest and greatest RDBMS pool component.

    // Fortunately, the following executes very, very fast. So there are no performance issues.

    rdbPool = (RDBMSConnectionPool) rdbTracker.getService();
    if(rdbPool == null) {
      rdbPool = (RDBMSConnectionPool) 
         rdbTracker.waitForService(AspireConstants.STANDARD_COMPONENT_AVAILABLE_TIMEOUT);
    }
    if(rdbPool == null) {
      // throw an aspire connection - the pool was not available 
      // even after waiting a reasonable amount of time.
    }

    // Now, fetch a connection:
    Connection rdbConn = rdbPool.getConnection();

    .
    .  // Now, use it
    .

    // Now, return it to the pool
    rdbPool.returnConnectionToPool(rdbConn);

  }

Startup SQL Statements

Some limited capability is available to execute SQL statements whenever an aspire-rdb component is initialized. These statements are executed using the <startup> command:

 <startup>
   ignore errors;
   create table people(age int, name varchar(100));
   notice errors;
   delete from people;
   insert into people
   values(34, 'James Dean'), (92, 'Lady Godiva');
 </startup>


Startup commands are especially useful for in-memory databases such as aspire-derby, which are completely empty when first initialized.

The language between the <startup> tags is simply a list of SQL statements, each statement terminated with a semi-colon, with a few additional special commands.

Language Details:

  1. Statements can be any SQL statement other than SELECT (insert, update, drop, create, delete, etc.)
  2. Every statement must end with a semi-colon
  3. String constants are surrounded by single-quotes
    • Embedded single quotes are escaped with a pair of quotes, for example:
      • 'William ''Buddy'' Johnson'

Additional Commands Include:

  • ignore errors - Ignore any SQL errors which occur, for example creating a table which already exists.
  • notice errors - Errors in executing SQL statements are trapped and will cause initialization to fail.
    • Note that these errors will be considered to be 'fatal', meaning that all jobs sent to the component will fail with an exception error.
  • commit - Commit all changes.
    • Not typically needed, because auto commit is turned on by default.
  • rollback - Rollback all changes.
    • Probably that not useful a command.
  • auto commit off - Turn off auto commit.
  • auto commit on - Turn on auto commit.

Improving Performance with JDBC Properties

Many JDBC connectors have properties that you can modify to change behavior or performance metrics.

For example, in Oracle, you might try the following to improve fetching performance with SELECT statements that retrieve lots of rows:

     <jdbcProperties>
       <property name="defaultRowPrefetch">500</property>
     </jdbcProperties>

This will change the default number of rows fetched on a transaction from 10 to 500. This will result in fewer overall transactions and may result in better SELECT performance.