HTTP Feeder (Aspire 2)
Use the HTTP Feeder to receive RESTful requests and to feed these requests to an Aspire pipeline. This feeder can turn Aspire into a "RESTful Web Service", accepting requests from outside clients, processing jobs, and then returning results.
The HTTP feeder will register a brand new servlet URL, based on the Aspire server path. For example, if your servletName is "submitFiles", then the new URL will be http://server:50505/submitFiles. In other words, it is separate and apart from the standard Aspire admin user interface (which is under "/aspire").
The HTTP Feeder has two modes of operation: 1) input parameters specified on the URL, and 2) input data POSTed to the feeder. Parameters on the URL are added to the AspireObject that is fed down the pipeline. POSTed data may be either parameters from a form, which are added to the AspireObject that is fed down the pipeline, or data streamed to the servlet, which is attached to the published job as a stream.
The HTTP Feeder can also be used to upload files, using a Multipart form submission. See below for details.
Using the HTTP Feeder as a User Interface
The HTTP Feeder can be used as a user interface. See here for details
Parameters Specified on the URL
In the first mode, parameters are specified on the URL in param=value format. For example: http://server:50505/submitFiles?param1=value1&param2=value2 .
These parameters will be stored in the resulting AspireDocument passed down the pipeline as XML tags at the top level. For example:
<doc>
  <feederLabel>HttpFeeder</feederLabel>
  <param1 source="HTTPFeederServlet">value1</param1>
  <param2 source="HTTPFeederServlet">value2</param2>
</doc>
The pipeline is then responsible for processing the job as necessary (for example, via Groovy scripting). The results are returned as XML data.
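As a rough illustration of the request side, the sketch below builds a feeder URL from a dictionary of parameters and assembles the kind of document shown in the example above. It is written in Python purely for illustration: the helpers `build_feeder_url` and `expected_doc` are hypothetical and not part of Aspire, and the XML is generated only to mirror the documented layout, not by Aspire itself.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET


def build_feeder_url(base, servlet_name, params):
    """Return the feeder URL with params encoded as a query string."""
    return f"{base.rstrip('/')}/{servlet_name}?{urlencode(params)}"


def expected_doc(params, feeder_label="HttpFeeder", root_name="doc"):
    """Mirror the document layout shown above (illustration only)."""
    root = ET.Element(root_name)
    ET.SubElement(root, "feederLabel").text = feeder_label
    for name, value in params.items():
        # Each URL parameter becomes a top-level tag marked with its source.
        child = ET.SubElement(root, name, source="HTTPFeederServlet")
        child.text = value
    return ET.tostring(root, encoding="unicode")


params = {"param1": "value1", "param2": "value2"}
url = build_feeder_url("http://server:50505", "submitFiles", params)
doc = expected_doc(params)
```

Using `urlencode` rather than string concatenation keeps parameter values containing spaces or ampersands from corrupting the query string.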
Information from the Servlet
Information from the servlet is also added to the job published by the HTTPFeeder. This information is added as attributes and child elements of the <aspireHttpFeederServlet> tag:
<doc>
  <aspireHttpFeederServlet remotePort="52124" relativePath="/xml-search" serverName="localhost" source="HTTPFeederServlet" remoteHost="127.0.0.1" serverPort="50505" remoteAddr="127.0.0.1" fullPath="/cgi-bin/xml-search" servletPath="/cgi-bin">
    <queryString>param1=value1&param2=value2</queryString>
  </aspireHttpFeederServlet>
  .
  .
</doc>
The following information is available:
|source||The name of the HttpFeeder|
|remoteHost||The hostname of the client (e.g., browser).|
|remoteAddr||The IP address of the client (e.g., browser).|
|remotePort||The port used by the client (e.g., browser).|
|serverName||The name of the server running the HttpFeeder.|
|serverPort||The port the HttpFeeder is listening on.|
|servletPath||The path the HttpFeeder is responding to.|
|fullPath||The full path requested by the client.|
|relativePath||The path requested by the client relative to the servletPath.|
|queryString||The entire query string (i.e., everything after the ? in the URL).|
||The maximum size of file that can be uploaded (in bytes - defaults to 10,485,760 bytes - 10Mb). This may be specified using a suffix to specify bytes/kilobytes/megabytes/gigabytes (b/kb/mb/gb). If the suffix is not given, the parameter is in bytes.|
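The size-limit format described in the last row above (a number with an optional b/kb/mb/gb suffix) can be sketched as a small parser. This is a minimal illustration, not Aspire's implementation: the function name `parse_size` is hypothetical, and the use of 1024-based multiples is an assumption inferred from the stated default of 10,485,760 bytes for 10 MB.

```python
import re

# Assumption: binary multiples, since the documented 10 MB default
# equals 10 * 1024 * 1024 = 10,485,760 bytes.
_MULTIPLIERS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '10mb' or '2048' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(b|kb|mb|gb)?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = match.groups()
    # A missing suffix means the value is already in bytes.
    return int(number) * _MULTIPLIERS[(suffix or "b").lower()]
```

For example, `parse_size("10mb")` gives the same byte count as the documented default, and `parse_size("2048")` is treated as plain bytes.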
XML Data POSTed to the Service
If you wish to actually post data to the service, this can currently be done by setting the "XMLContent" parameter to TRUE below.
Despite its name, XMLContent does not actually require that the content be in XML. The content can be HTML, PDF, or anything. Perhaps the config parameter will be renamed in the future.
When XMLContent is true, data streamed to the servlet via POST will be set as an input stream attached to the job published by the feeder. You can access the data using the Standards.Basic.getContentStream(Job j) method in the package com.searchtechnologies.aspire.framework.
This also means that you can follow the HTTP feeder with any pipeline stage that uses the content stream. For example, XML Sub Job Extractor, Tabular Files Extractor_Aspire_2, XML File Loader, and Extract Text can all be the first pipeline stage to receive the job.
The "curl" command (available with http://www.cygwin.com or on most Linux installs) is a convenient way to test submitting data to the service. For example, to POST a document as the content to an Aspire servlet, you could do the following:
curl --data-binary "@data\full_text.xml" http://localhost:50515/submitFiles
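For clients that cannot shell out to curl, a roughly equivalent request can be prepared with Python's standard library. The sketch below only builds the request object; the helper name `build_post_request`, the sample payload, and the choice of Content-Type header are assumptions made for illustration, and sending it requires a running Aspire server with XMLContent enabled.

```python
import urllib.request


def build_post_request(url, payload, content_type="application/octet-stream"):
    """Prepare (but do not send) a POST carrying raw bytes as the body."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


request = build_post_request(
    "http://localhost:50515/submitFiles",
    b"<doc><title>example</title></doc>",  # hypothetical payload
    content_type="text/xml",
)

# To actually send it (requires a running Aspire server):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

In practice the payload would be read from a file in binary mode, which matches the behaviour of curl's `--data-binary` option of sending the bytes unmodified.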
Multipart Form Submissions
HTML supports submitting "multipart forms" made up of multiple parameters, some of which may represent uploaded file content.
In order for the HTTP feeder to receive multipart forms, you need to enable them and then specify how files are handled. You may choose to handle posted files as a stream (choose stream for the <fileHandler> option), or as files (choose file for the <fileHandler> option). If you choose to handle posted files as files, you must also specify the directory they are uploaded to.
NOTE: Setting the XMLContent option of the HttpFeeder automatically disables multipart form submission processing.
When the file handler is set to stream, only a single file may be uploaded at a time. All parameters received BEFORE the file are added to the job's AspireObject as XML tags; parameters received AFTER the file are ignored. The file itself is attached to the job as an InputStream, and subsequent stages can access the data using the Standards.Basic.getContentStream(Job j) method in the package com.searchtechnologies.aspire.framework, so data can be streamed directly from the client through whatever processing you need to do. The file is NOT stored locally on the Aspire server by the HttpFeeder.
<component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
  .
  .
  <multipartForm>
    <fileHandler>stream</fileHandler>
  </multipartForm>
</component>
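Because the stream handler ignores parameters that arrive after the file, a client that builds its own multipart body needs to place every ordinary field before the file part. The following sketch shows one way to do that by hand in Python. The function `encode_multipart` and the field names used in the usage note are hypothetical, and the part layout follows the general multipart/form-data conventions rather than anything specific to Aspire.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Build a multipart/form-data body with all fields placed before the file."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # Ordinary parameters go first so they are received before the file.
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The single file part comes last.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    content_type = f"multipart/form-data; boundary={boundary}"
    return b"".join(parts), content_type
```

A call such as `encode_multipart({"category": "news"}, "data", "a.xml", b"<x/>")` returns the body bytes and the Content-Type header value to send with the POST.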
When the file handler is set to file, multiple files may be uploaded by a single form submission. Using the file handler requires the HttpFeeder <uploadDir> to be configured. Any file submitted will be uploaded and saved to this directory. The uploaded file is saved using its original filename (filename only, not the complete path).
No streams are added to the Aspire job, and if you wish to reference the file, you will need to access the job's AspireObject and extract the value for the tag corresponding to the HTML form input that caused the file to be uploaded. This value is the full path to the saved copy of the uploaded file on the Aspire server.
For example, if the file was uploaded via the following form:
<form enctype="multipart/form-data" method=POST action="http://localhost:50505/xmlfeed">
  XML file to push: <input type="file" name="data">
  <input type="submit" value=">Submit<">
</form>
The AspireObject for the job would look similar to:
<doc>
  <aspireHttpFeederServlet remotePort="56494" serverName="localhost" source="HTTPFeederServlet" remoteHost="127.0.0.1" serverPort="50505" remoteAddr="127.0.0.1" fullPath="/xmlfeed" servletPath="/xmlfeed">
    <queryString/>
  </aspireHttpFeederServlet>
  <data>C:\tmp\1.2distroTest\distro-test\target\aspire-distribution-1.0-distribution/data/upload\htmlContentFeed.xml</data>
</doc>
All ordinary HTML form input parameters will be added to the job's AspireObject as XML tags.
<component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
  .
  .
  <multipartForm>
    <fileHandler>file</fileHandler>
    <uploadDir>data/upload</uploadDir>
  </multipartForm>
</component>
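The rule that an uploaded file is saved under its original filename only, without the client-side path, can be sketched as follows. This is an illustration under stated assumptions rather than Aspire's actual code: the helper `saved_upload_path` is hypothetical, and how a relative upload directory is resolved on the server is not covered here.

```python
import ntpath
import posixpath


def saved_upload_path(upload_dir, client_filename):
    """Return upload_dir joined with the bare filename sent by the client."""
    # Some browsers send a full client-side path. ntpath.basename splits on
    # both backslashes and forward slashes, so it handles either style.
    bare_name = ntpath.basename(client_filename)
    return posixpath.join(upload_dir, bare_name)
```

With an upload directory of `data/upload`, both `C:\docs\report.xml` and `/home/u/report.xml` map to `data/upload/report.xml`, which also means two uploads with the same filename would target the same saved path.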
Configuration
|Element||Type||Default||Description|
|branches||parent tag||None||The configuration of the pipeline to publish to. See below.|
|waitForJob||boolean||true||Indicates whether or not the component waits for the job to complete.|
|servletName||String||httpFeeder||Name of the servlet that will feed the files. For example, if servletName is "submitFiles", then you would send files to the httpFeeder using the "http://localhost:50505/submitFiles?params..." URL.|
|feederLabel||String||HttpFeeder||The <feederLabel> value to be included with the document as it is sent to the pipeline. For example, HttpFeeder.|
|XMLContent||boolean||true||Set this parameter to true if you will be POST-ing XML data to the HTTP Feeder. This XML data will be set as an input stream attached to the job published by the feeder. Subsequent stages can access the data using the Standards.Basic.getContentStream(Job j) method in the package com.searchtechnologies.aspire.framework.|
|xmlRootName||String||doc||The name of the root element, for example <root>. This will be the root element of the AspireDocument object which is passed down the pipeline.|
|xsltFileName||String||null||The path of the XSL transform file to be used to format the output xml. Path names will be relative to Aspire Home.|
|outputMime||String||text/xml||Specifies the mime type which the HTTP feeder will report back to the HTTP client. Change this to "text/html" if your transform creates HTML which should be shown by a browser.|
|resultMimeTypeField||String|| ||Set the mime type using the value found in the field specified. The field must exist as a child of the root (i.e., a parameter value of mimeType looks for the value in the /doc/mimeType field in the default AspireObject). If the field does not exist or is empty, then the mime type reverts to the value of the <outputMime> parameter. NOTE: The value is extracted before the transformation (if any) is applied.|
|multipartForm||parent tag|| ||Enable multi-part form submission, which allows for uploading files to the HTTP server through HTML forms, as well as other input elements.|
|multipartForm/fileHandler||String||stream||Specify the type of file handler to use for posted files. The stream (default) handler will attach an InputStream for the file to the job, and subsequent stages can access the data using the Standards.Basic.getContentStream(Job j) method in the package com.searchtechnologies.aspire.framework. The file handler will upload the file to the specified directory (see below). No input stream is attached to the job for the file handler. See above for more details and restrictions.|
|multipartForm/uploadDir||String|| ||Specify the location where files from multi-part forms will be uploaded when using the file handler. See above for more details.|
|saxonProcessor||boolean||false||Set to true if you want to use SAXON processors to transform using XSLT 2.0 files.|
|debugOutFile||String|| ||Specify the location where the XSLT processed output will be written to. This is used for debugging the transforms.|
|headers (2.1 Release)||parent tag||None||The configuration of the http headers. See below.|
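The interaction between resultMimeTypeField and outputMime described in the table can be sketched as a small fallback function. This is only an illustration of the documented rule, written in Python with a hypothetical helper name `resolve_mime_type`; it assumes the document is available as an XML string and, as noted above, looks at it before any transformation is applied.

```python
from xml.etree import ElementTree as ET


def resolve_mime_type(doc_xml, result_mime_type_field=None, output_mime="text/xml"):
    """Use the named field's value if present and non-empty, else output_mime."""
    if result_mime_type_field:
        root = ET.fromstring(doc_xml)
        # Only direct children of the root are considered, e.g. /doc/mimeType.
        value = root.findtext(result_mime_type_field)
        if value and value.strip():
            return value.strip()
    # Field not configured, missing, or empty: fall back to <outputMime>.
    return output_mime
```

For a document `<doc><mimeType>text/html</mimeType></doc>` and a field name of `mimeType`, the sketch returns `text/html`; if the field is missing, empty, or nested deeper than the root's children, it returns the configured default.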
Example Configurations for HTML Form-Style Parameters
This will handle either parameters specified on the URL with HTTP GET, or parameters POST'ed from an HTML <form>.
<component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
  <servletName>submitFiles</servletName>
  <feederLabel>HttpFeeder</feederLabel>
  <xsltFileName>config/categorizeOutput.xsl</xsltFileName>
  <branches>
    <branch event="onPublish" pipelineManager="CategorizeFolderOrFile" />
  </branches>
</component>
Example configuration for posting XML to Aspire
<component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
  <servletName>submitFiles</servletName>
  <feederLabel>HttpFeeder</feederLabel>
  <XMLContent>true</XMLContent>
  <xsltFileName>config/extractor.xsl</xsltFileName>
  <branches>
    <branch event="onPublish" pipelineManager="CategorizeFolderOrFile" />
  </branches>
</component>
Example configuration for configuring HTTP headers
You can specify required HTTP headers in the configuration as follows. The feeder will then add this header information to the response.
<component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
  .
  .
  .
  <headers>
    <header name="Authorisation">simple</header>
    <header name="Accept">text/plain</header>
  </headers>
</component>
The HTTPFeeder can also serve ordinary HTML files, so it can act as a more complete, end-to-end front end for simple user interfaces.
Files are stored inside the Aspire Home directory, in the "web/httpfeeder/<servlet-name>" directory.
For example, a request for:
Will access the file from:
Note that “index.html” is also supported. So, a request for:
If it exists.
NOTE: if porting from version 0.4, note that the position of the required directory on disk has changed from web/<servlet-name> to web/httpfeeder/<servlet-name>.
Returning Binary Data
Raw binary data can be returned from the HTTPFeeder. This will happen automatically if the following conditions are met:
- The output mime type is "application/octet-stream"
  - This can be set with either the <outputMime> or <resultMimeTypeField> configuration parameters.
- There is a job variable called "byteDataResults"
Note that the job variable must (currently) hold data of type ByteArrayOutputStream.
If the above situation occurs, the HTTPFeeder will do the following:
- Fetch the array of bytes from the ByteArrayOutputStream
- Set the returned content-length to the length of the array of bytes
- Write the byte data back to the client
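The conditions and steps above can be sketched in a few lines. This Python illustration is not Aspire code: the function `build_binary_response` is hypothetical, job variables are modelled as a plain dictionary, and `io.BytesIO` stands in for Java's ByteArrayOutputStream.

```python
import io

BINARY_MIME = "application/octet-stream"


def build_binary_response(mime_type, job_variables):
    """Return (headers, body) for a raw binary reply, or None if conditions are not met."""
    buffer = job_variables.get("byteDataResults")
    # Both conditions must hold: binary mime type and a byte buffer job variable.
    if mime_type != BINARY_MIME or not isinstance(buffer, io.BytesIO):
        return None
    body = buffer.getvalue()  # fetch the array of bytes
    headers = {
        "Content-Type": mime_type,
        "Content-Length": str(len(body)),  # length of the array of bytes
    }
    return headers, body
```

When either condition fails the sketch returns `None`, which corresponds to the feeder falling back to its normal XML output path.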