Data Capture Server

While technically not an addon, the Data Capture server (or dcserver for short) is a separate product that can be deployed on a Windows or Linux server as a service or daemon.

The dcserver is used to schedule and perform bulk imports, freeing the Phaedra clients from this resource-intensive task. To facilitate this, the dcserver offers a web interface that allows users to:

  • Schedule new scan jobs that scan specific network locations for new data and trigger capture jobs to import it.
  • Track the queue of capture jobs. For each capture job, users can see the progress, and the submitter may choose to cancel the job.
  • Register themselves to receive e-mail notifications on various events, such as capture job completion or errors.

In addition, the dcserver can also be used to host a scriptable RESTful API, allowing other systems to consume data from Phaedra.

Setup

Technically, the dcserver is a product very similar to the Phaedra client, except that it runs in headless mode (optionally with a console attached) instead of presenting a user interface, i.e. a workbench.

Download the binaries from the download page, and extract them to a location of your choice. To configure your dcserver, locate the file dcserver.ini and modify the vmargs block:

-vmargs
-Xmx8G
-Ddatacapture.server=true
-Djetty.http.port=8080
-Dphaedra.config=smb://hostname/sharename/config.xml
-Dphaedra.env=EnvironmentName
-Dphaedra.username=test
-Dphaedra.password=test
-Dphaedra.fs.path=/path/to/fileshare
  • -Xmx: the maximum amount of memory the dcserver is allowed to use. Make sure to set this value sufficiently high, as capture jobs may run concurrently and consume large amounts of memory.
  • -Ddatacapture.server: set to true to enable all the server features such as the datacapture task queue and the web interface.
  • -Djetty.http.port: set the port on which the web interface will be exposed.
  • -Dphaedra.config: set the smb or file path pointing to the environment configuration file.
  • -Dphaedra.env: set the name of the environment to connect to.
  • -Dphaedra.username: set the username to log in to the environment.
  • -Dphaedra.password: set the password to log in to the environment.

If the dcserver runs on a Linux host, one additional setting is required: -Dphaedra.fs.path. In the config.xml file, the environment's file share component will specify an smb:// path. Due to a technical limitation, Phaedra instances running on Linux cannot access such smb paths with write access (which is needed to import data). The -Dphaedra.fs.path setting provides an alternative path to the file share, either via a writable mount point or a local file path.
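For example, assuming the share referenced in config.xml has been mounted on the Linux host, the ini file can point -Dphaedra.fs.path at that mount point (the mount location /mnt/phaedra-share below is illustrative):

```ini
-Dphaedra.config=smb://hostname/sharename/config.xml
-Dphaedra.fs.path=/mnt/phaedra-share
```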

In addition to the above server settings, you can also configure various other system settings via the ini file:

  • -Djava.io.tmpdir: set the directory that will be used to store temporary files.
  • -Djetty.https.port: set the SSL port on which the web interface will be exposed.
  • -Djavax.net.ssl.keyStore: the location of a Java keystore file (.jks) containing certificates, if you enable https.
  • -Djavax.net.ssl.keyStorePassword: the password that is needed to access the keystore file.
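Putting the SSL settings together, an https-enabled configuration could look like the following (the port, keystore path and password are placeholders):

```ini
-Djetty.https.port=8443
-Djavax.net.ssl.keyStore=/path/to/keystore.jks
-Djavax.net.ssl.keyStorePassword=changeit
```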

Note that all of these settings should come after the "-vmargs" line.

Running the server

To launch the server in a console, execute the dcserver binary in the main folder.

Note: This executable can be wrapped into a service just like any regular executable file. The configuration of a service or daemon is beyond the scope of this document.

Using the web interface

When the dcserver is up and running, it will log a message mentioning the HTTP port it is listening on, for example:

[jetty.server.AbstractConnector] Started ServerConnector@4fa5cc73{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}

You can then open a browser and point it to the following URL:

http://hostname:8080/ui/dc/index.html

Capture Jobs

The Capture Jobs page shows the activity of the dcserver: it lists data capture jobs that are currently running, or have recently finished. By default, the jobs from the past 2 weeks are listed. You can use the From and Until filters to show jobs from a different time window.

capturejobs1.png

The Status column shows the outcome of the job:

  • Queued: The job is in a queue, and will run as soon as the server has free resources to do so.
  • Running: The job is currently being executed.
  • Cancelled: The job has been cancelled by a user or administrator.
  • Error: The job has failed with an error message.
  • Finished: The job has completed and its results have been imported into Phaedra.

To view more details for a job (such as the error message of a failed job), click on the Details button to the right of the job.

capturejobs2.png

Launching Capture Jobs

New capture jobs can be launched manually, using the Submit new Capture Job button in the top right of the screen.

capturejobs3.png

To launch a capture job, enter the following information:

  • Source Path: The path to the folder that should be captured. Note that this should be a network path (e.g. a Windows UNC path) that the dcserver can access.
  • Protocol: The protocol in Phaedra where the data should be imported into. Selecting a protocol will automatically select a Capture Config as well (see below).
  • Capture Config: The capture configuration that should be used by this capture job. Usually, this is selected automatically when a Protocol is selected, and there is no need to set this field manually.
  • Experiment: A new or existing experiment in the selected Protocol where the plates will be imported into.

Scan Jobs

While launching capture jobs manually may be desirable in some cases, in other situations it is more convenient to automate this procedure. For example, when your team is performing the primary screening phase for a large assay, you may have instruments generating hundreds of plates every day, and you don't want to submit dozens of capture jobs by hand to get them all into Phaedra!

That is the purpose of Scan Jobs: they are defined once, and then run on a regular interval, e.g. every hour or every day. They will look for new data to import, and will automatically submit Capture Jobs for any new data that is found.

scanjobs1.png

To create a new Scan Job, select the Create New Scan Job button in the top right of the screen.

scanjobs2.png

Enter the required information for the new Scan Job:

  • Name: A unique name for this scan job
  • Description: A description for this scan job
  • Type: A scanner type. By default, Phaedra comes with two scanner types: folderScanner, which can be used to scan network locations such as shared drives, and columbusScanner, which can be used to scan the contents of a Columbus server.
  • Schedule: A scheduling interval that tells the dcserver how often to run the scan job. For more information about the scheduling syntax, see http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/crontrigger.html
  • Scanner configuration: Detailed XML configuration for the scan job, which depends on the selected scanner Type. Example:
    <config>
        <path>\\example.server.com\share1\dataToImport</path>
        <experimentPattern>AssayX-experiment[0-9]{4}.*</experimentPattern>
        <plateInProgressFlag>processing</plateInProgressFlag>
        <protocolId>123</protocolId>
    </config>
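For reference, the Quartz schedule mentioned above uses cron expressions with six mandatory fields (seconds, minutes, hours, day-of-month, month, day-of-week) and an optional year field; a "?" means "no specific value" and is used in one of the two day fields. Some illustrative schedules:

```text
0 0/30 * * * ?    Run every 30 minutes
0 0 6 * * ?       Run every day at 06:00
0 0 22 ? * FRI    Run every Friday at 22:00
```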

Below is a full list of available settings:

  • forceDuplicateCapture: Set to 'true' to force the dcserver to capture experiments, even if they have already been captured previously. Default: false.
  • createMissingWellFeatures: Set to 'true' to automatically add new well features to the target protocol if they are found in the data but do not exist in the protocol yet. Default: false.
  • createMissingSubWellFeatures: Set to 'true' to automatically add new subwell features to the target protocol if they are found in the data but do not exist in the protocol yet. Default: false.
  • protocolId: Specify the ID of the protocol to import new data into. This will also determine which capture configuration to use, as each protocol has a default capture configuration.
  • captureConfig: Specify a capture configuration to use to capture any data found by this scan job. Default: none (use the protocol's default, see above).
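These general settings are written as child elements of the same <config> block, alongside the scanner-specific settings. A sketch combining both (the values are illustrative):

```xml
<config>
    <path>\\example.server.com\share1\dataToImport</path>
    <forceDuplicateCapture>false</forceDuplicateCapture>
    <createMissingWellFeatures>true</createMissingWellFeatures>
    <protocolId>123</protocolId>
</config>
```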

Settings that are specific to folderScanners:

  • path: The network location to scan.
  • experimentPattern: The pattern that folder names must match to be considered for import. One capture job will be submitted per experiment folder. Default: .* (matches any name)
  • platePattern: The pattern that subfolders of experiment folders must match to be considered for import. If a plate folder name does not match, it will be skipped during the capture job. Default: .* (matches any name)
  • plateInProgressFlag: In an experiment folder, if a file or folder is found matching this name, the experiment folder will be considered incomplete, and skipped.
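The folderScanner patterns are regular expressions matched against folder names. A minimal sketch of the selection logic, using Python's re module and the experimentPattern from the earlier example (the folder names are hypothetical, and full-name matching is assumed):

```python
import re

# Pattern from the example scanner configuration above.
experiment_pattern = re.compile(r"AssayX-experiment[0-9]{4}.*")

folders = [
    "AssayX-experiment0042",        # matches: would be submitted as a capture job
    "AssayX-experiment0042_rerun",  # matches: trailing text is allowed by .*
    "AssayX-experiment42",          # no match: fewer than four digits
    "AssayY-experiment0042",        # no match: wrong assay prefix
]

# fullmatch mirrors "folder names must match to be considered for import".
selected = [f for f in folders if experiment_pattern.fullmatch(f)]
print(selected)  # prints ['AssayX-experiment0042', 'AssayX-experiment0042_rerun']
```

One capture job would then be submitted per selected experiment folder.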

Settings that are specific to columbusScanners:

  • host: The hostname of the Columbus server to scan.
  • port: The port number of the Columbus server to scan.
  • username: The username of the account to connect with.
  • password: The password of the account to connect with.
  • instanceIds: Instead of the four settings above, you can also specify a list of Columbus instances to scan that have been defined in the preferences of the dcserver.
  • screenPattern: The pattern that screen names must match to be considered for import. One capture job will be submitted per screen. Default: .* (matches any name)
  • platePattern: The pattern that plate names must match to be considered for import. If a plate name does not match, it will be skipped during the capture job. Default: .* (matches any name)