cf_webservice — Connector Framework Web service
cf_webservice is a Metaproxy filter which offers a Web service for the Connector Framework.
The Web service uses JSON content for responses. A future version may also support XML.
HTTP clients must use Content-Type application/json to post JSON content.
The Web service uses the same Content-Type for its JSON responses.
HTTP clients must use Content-Type text/xml to post XML.
The cf_webservice module handles only HTTP requests whose path starts with a
configured prefix. The default prefix is "connector" and is
used in the description that follows.
The following requests are offered by the Web service:
A POST to the prefix itself (/connector) creates a Connector Framework session. The content is a Connector Framework file (XML).
Session data passed as a JSON string in the X-CF-Args header will be decoded and made available to the connector in the $.session object.
If successful, the response includes a JSON object with a single member "id" whose value is an integer session identifier. This identifier must be used in subsequent requests to refer to this connector.
If the content is empty, no connector is loaded into the engine. In this case only the engine session is established. A connector may be loaded later with the load_cf operation (see below).
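A minimal sketch of creating an engine session from the shell and picking the session identifier out of the response. The base URL in CF_BASE, the function names, and the sample session data are assumptions made for illustration; they are not part of the Web service.

```shell
#!/bin/sh
# Assumed base URL: host, port and prefix depend on the Metaproxy configuration.
CF_BASE=${CF_BASE:-http://localhost:9000/connector}

# Extract the integer session id from a response such as {"id":12}.
# Assumes "id" is the only member of the object, as described above.
cf_session_id() {
    printf '%s' "$1" |
        sed -n 's/.*"id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Create an engine session without loading a connector (empty content).
# $1 is optional session data as a JSON string, sent in the X-CF-Args header.
cf_create_session() {
    if [ -n "$1" ]; then
        curl --silent --header "X-CF-Args: $1" --data-binary "" "$CF_BASE"
    else
        curl --silent --data-binary "" "$CF_BASE"
    fi
}

# Usage (requires a running service; the session data is hypothetical):
#   ID=$(cf_session_id "$(cf_create_session '{"user":"demo"}')")
```

Keeping the parsing in its own function makes it possible to check it against sample responses without contacting the service.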
One or more arguments may be given for the POST in the form of name=value pairs, separated by &.
proxy=IP
Specifies the HTTP proxy for the session.
thread=0|1
Enables threaded mode (value 1), or forked mode (value 0). If thread is not given, forked mode is used.
loglevel=level
Specifies the log level for the engine session. The following names are recognized: DEBUG, INFO, WARN, ERROR.
logmodules=modules
Restricts logging to a subset of modules. This is the output that is
later retrieved by the log webservice command. The modules list is a
comma-separated list of named modules. The available modules are:
runtime (JavaScript runtime logger),
engine (engine encapsulating the browser),
timing (timing for tasks),
stdout (unstructured text printed to standard output in various places).
By default, logging is enabled for all modules.
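The arguments above can be combined as in the following sketch. It assumes that the name=value pairs are carried in the URL query string (the same place where ?clear=1 is given for the log operation); CF_BASE and the function name are illustrative.

```shell
#!/bin/sh
# Assumed base URL: host, port and prefix depend on the Metaproxy configuration.
CF_BASE=${CF_BASE:-http://localhost:9000/connector}

# Join name=value arguments with "&" and append them to the base URL.
# Assumption: the arguments travel in the URL query string.
cf_session_url() {
    url=$CF_BASE
    sep='?'
    for arg in "$@"; do
        url="$url$sep$arg"
        sep='&'
    done
    printf '%s\n' "$url"
}

# Threaded mode, DEBUG log level, logging limited to two modules.
# This only prints the URL; no request is sent.
cf_session_url thread=1 loglevel=DEBUG logmodules=runtime,engine
```

The resulting URL would then be the target of the POST that creates the session.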
id/op/opargs
A POST to /connector/id/op/opargs performs operation op with
arguments opargs on the connector identified by id.
The following operations, op, are
supported: run_task, run_task_opt,
run_tests, screen_shot, dom_string,
load_cf and log.
For operations run_task and run_task_opt, opargs is the name of the
task to run and the POSTed content holds the task parameters. The POSTed
content must be JSON.
For operation run_tests, opargs names the test tasks to run.
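As a sketch of how the id/op/opargs path is put together, the helper below builds operation URLs and posts JSON task parameters. CF_BASE and the function names are assumptions for illustration, and the comma-separated form of the run_tests argument is one plausible way of naming several test tasks.

```shell
#!/bin/sh
# Assumed base URL: host, port and prefix depend on the Metaproxy configuration.
CF_BASE=${CF_BASE:-http://localhost:9000/connector}

# Build the URL for operation $2 on session $1; $3 (opargs) is optional.
cf_op_url() {
    if [ -n "$3" ]; then
        printf '%s/%s/%s/%s\n' "$CF_BASE" "$1" "$2" "$3"
    else
        printf '%s/%s/%s\n' "$CF_BASE" "$1" "$2"
    fi
}

# POST the JSON task parameters in $3 to task $2 of session $1.
cf_run_task() {
    curl --silent --header "Content-Type: application/json" \
        --data-binary "$3" "$(cf_op_url "$1" run_task "$2")"
}

# Show the URLs that would be used; no request is sent here.
cf_op_url 1 run_task search
cf_op_url 1 run_tests search,parse,next,parse
```

With a running service, a call such as cf_run_task "$ID" search '{"keyword":"water"}' would run the search task with a single parameter.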
Operation screen_shot returns a window dump
of the current browser in PNG format. The Content-Type of the HTTP response is
image/png.
Operation log returns the log for the connection
session as it is produced by the engine as well as the shared
JavaScript runtime. It may be limited to certain modules by the
logmodules argument when POSTing a connector.
The log operation may optionally be followed by ?clear=1,
which clears the log once it has been returned. A subsequent
log operation then returns only material logged since the most recent
log operation.
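The effect of ?clear=1 can be used to read the log incrementally, as in this sketch. CF_BASE and the function names are assumptions for illustration.

```shell
#!/bin/sh
# Assumed base URL: host, port and prefix depend on the Metaproxy configuration.
CF_BASE=${CF_BASE:-http://localhost:9000/connector}

# URL of the log operation for session $1. Pass "clear" as $2 to have
# the log cleared once it has been returned.
cf_log_url() {
    if [ "$2" = clear ]; then
        printf '%s/%s/log?clear=1\n' "$CF_BASE" "$1"
    else
        printf '%s/%s/log\n' "$CF_BASE" "$1"
    fi
}

# Fetch only what was logged since the previous call for session $1.
cf_log_since_last() {
    curl --silent --data-binary "" "$(cf_log_url "$1" clear)"
}

# Show the two URL forms; no request is sent here.
cf_log_url 1
cf_log_url 1 clear
```

Calling cf_log_since_last after each task would give the log output belonging to that task alone.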
log is a special operation, in that the POSTed
content and Content-Type are ignored.
Operation dom_string returns the current DOM for the session
rendered as a string. The POSTed content and Content-Type are ignored.
Operation load_cf loads the connector posted (XML).
Currently the Content-Type is ignored. It should be text/xml.
If an operation completes successfully (HTTP status 200), the
HTTP response contains the result. For run_task and
run_task_opt the response is a JSON document.
For run_tests the response is
simply a JSON object with a member "result" holding a boolean value:
true for success and false for
failure.
For operation log the response is text and the
Content-Type is set to text/plain.
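A script may want to turn the run_tests response into an exit status. The following is a minimal sketch that matches the response textually; the function name is illustrative, and a real JSON parser is preferable where one is available.

```shell
#!/bin/sh
# Succeed (exit status 0) if a run_tests response reports success.
# Assumes the body is a small JSON object such as {"result":true}.
cf_tests_passed() {
    printf '%s' "$1" | grep -Eq '"result"[[:space:]]*:[[:space:]]*true'
}

if cf_tests_passed '{"result":true}'; then
    echo "tests passed"
fi
```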
id
A DELETE request to /connector/id deletes the connector identified by id.
The Web service is implemented as a shared object for the Metaproxy server. The module ID is simply "cf".
The following elements may be given as part of the module configuration:
Specifies various settings for the environment in which the
module is run. These are the values that were previously controlled
by environment variables for cf-zserver. The env
element takes several attributes. These are:
Same as CF_TMP_DIR.
Same as CF_APP_PATH.
Same as CF_MODULE_PATH.
Same as CF_DISPLAY_LOCK.
Same as CF_DISPLAY_CMD.
Same as CF_BASE_PATH.
Same as CF_CONNECTOR_PATH.
Same as CF_REPO_AUTH_URL.
Same as CF_REPO_FETCH_URL.
Specifies the HTTP path prefix for the Web service (element url_prefix). By default it is
connector. If an HTTP request does not
use the given prefix, the cf module passes the request to the
next module in the chain of modules defined by the Metaproxy configuration.
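The dispatch rule can be pictured with the following sketch. It mirrors the documented behaviour only and is not the module's code; in particular, matching on whole path segments (so that /connectors is not handled) is an assumption, as is the function name.

```shell
#!/bin/sh
# Decide whether request path $1 would be handled by the cf module.
# $2 is the configured prefix ("connector" when omitted).
# Assumption: the prefix must match a whole path segment.
cf_handles_path() {
    prefix=${2:-connector}
    case $1 in
        "/$prefix" | "/$prefix/"* | "/$prefix?"*) return 0 ;;
        *) return 1 ;;
    esac
}

for p in /connector /connector/3/log /other/path; do
    if cf_handles_path "$p"; then
        echo "$p: handled by cf"
    else
        echo "$p: passed to the next module"
    fi
done
```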
Controls the Z39.50 server interface of the module.
This element takes one attribute, enable, whose value is
"false" to disable the Z39.50 server
(default) or "true" to enable it.
The following small Metaproxy configuration file loads the CF Web service module:
<?xml version="1.0"?>
<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">
  <dlpath>.</dlpath>
  <start route="start"/>
  <filters>
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
      <threads>50</threads>
    </filter>
  </filters>
  <routes>
    <route id="start">
      <filter refid="frontend"/>
      <filter type="log"><category user-access="true" apdu="true" /></filter>
      <filter type="cf">
        <env
          app_path="/var/cache/cf"
          module_path="/usr/share/cf/modules"
          display=":1.0"
          tmpdir="/tmp/cfengine"
        />
        <url_prefix>connector</url_prefix>
      </filter>
      <filter type="bounce"/>
    </route>
  </routes>
</metaproxy>
The dlpath must be set to the directory containing
Metaproxy modules - in particular the CF module
metaproxy_filter_cf.so.
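Before starting Metaproxy it can be useful to confirm that the directory intended for dlpath really contains the CF module. The check below is a small sketch; the function name is illustrative.

```shell
#!/bin/sh
# Succeed if the CF module is present in directory $1, which is the
# value intended for dlpath.
cf_module_present() {
    [ -f "$1/metaproxy_filter_cf.so" ]
}

# Check the current directory, matching <dlpath>.</dlpath>.
if cf_module_present .; then
    echo "CF module found"
else
    echo "CF module missing"
fi
```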
The following shell script exercises the Web service using curl:
#!/bin/sh
# Connector file to load; may be overridden by the first argument
C=/usr/share/cf/connectors/inactive/doaj.cf
if test "$1"; then
    C=$1
fi
# Base URL of the Web service (port 9000 as in the configuration above)
H=http://localhost:9000/connector
# Create session (empty content)
curl --output ws.log --data-binary "" "$H"
# Extract the session id (the digits) from the response, e.g. {"id":1}
ID=$(tr -cd '0-9' < ws.log)
# Load connector file
curl --header "Content-Type: text/xml" --data-binary "@$C" "$H/$ID/load_cf"
# Run a set of tests
curl --header "Content-Type: application/json; charset=UTF-8" --data-binary "{}" \
    "$H/$ID/run_tests/search,parse,next,parse"
# Run task search
curl --header "Content-Type: application/json" \
    --data-binary "{\"keyword\":\"water\"}" \
    "$H/$ID/run_task/search"
# Run task parse
curl --header "Content-Type: application/json" --data-binary "{}" \
    "$H/$ID/run_task/parse"
# Take screen shot (requires pnmtopng, xwdtopnm)
if test -x /usr/bin/pnmtopng; then
    curl --output screen.png \
        --data-binary "{}" "$H/$ID/screen_shot"
fi
# Run opt task init
curl --output init.log --header "Content-Type: application/json" --data-binary "{}" \
    "$H/$ID/run_task_opt/init"
# Get log (POSTed content and Content-Type are ignored)
curl --header "Content-Type: application/json" --data-binary "{}" \
    "$H/$ID/log"
# Get DOM (POSTed content and Content-Type are ignored)
curl --header "Content-Type: text/html" --data-binary "{}" \
    "$H/$ID/dom_string"
# Delete the connector
curl --request DELETE "$H/$ID"