CWL

Common Workflow Language (CWL) is a workflow description language designed to meet the needs of bioinformatics analysis. The specification for CWL is described on the Common Workflow Language website.

Implementation details of the platform's CWL execution engine, and its extensions to the CWL specification, are described below.

TES ResourceRequirement

TES resources (e.g. resource type, size and hardware; see Requesting Resources) can be specified in the CWL workflow using a TES-specific namespace. These override the standard requirements (e.g. coresMin, ramMin, etc.), and the CWL workflow launches the TES task with the TES-specific resource requirements.

For instance, if the following namespace and hint are specified in the CWL workflow:

$namespaces:
  ilmn-tes: https://platform.illumina.com/rdf/ica/

hints:
- class: ResourceRequirement
  ilmn-tes:resources:
    tier: standard
    type: fpga
    size: small

the workflow launches a TES task with the following resources in the execution section of the task version:

{
  "resources": {
    "tier": "standard",
    "type": "fpga",
    "size": "small"
  }
}
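The mapping from the hint to the task's resources block can be sketched in Python. This sketch assumes the CWL document has already been parsed into a dictionary (for example by a YAML parser) and that the prefixed key ilmn-tes:resources is kept verbatim. The function name tes_resources_from_hints is hypothetical and is not part of the platform's API.

```python
import json
from typing import Any, Optional

# Prefixed key used by the hint in the CWL example above.
TES_KEY = "ilmn-tes:resources"


def tes_resources_from_hints(cwl_doc: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Hypothetical helper: build the TES 'resources' block from the hints."""
    hints = cwl_doc.get("hints", [])
    # CWL accepts hints either as a list of {class: ...} entries or as a map
    # keyed by class name; normalize the map form into the list form.
    if isinstance(hints, dict):
        hints = [{"class": cls, **body} for cls, body in hints.items()]
    for hint in hints:
        if hint.get("class") == "ResourceRequirement" and TES_KEY in hint:
            # The TES-specific block is used instead of coresMin, ramMin, etc.
            return {"resources": dict(hint[TES_KEY])}
    return None  # no TES-specific block found in the hints


doc = {
    "$namespaces": {"ilmn-tes": "https://platform.illumina.com/rdf/ica/"},
    "hints": [
        {
            "class": "ResourceRequirement",
            "ilmn-tes:resources": {"tier": "standard", "type": "fpga", "size": "small"},
        }
    ],
}
print(json.dumps(tes_resources_from_hints(doc), indent=2))
```

A hint that carries only standard fields such as ramMin yields None in this sketch, because no TES-specific block is present.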

Inputs

Streamable File

By default, input files in a TES task launched by a CWL workflow are in Download mode. To enable Stream input mode, set the streamable flag to true for at least one of the input files. For example:

inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
    doc: The file that will have its md5sum calculated.
    streamable: true

TES does not support a single task whose inputs mix Download and Stream modes. If any input file has its streamable flag set to false in the CWL workflow, all input files and folders are put in Download mode. To keep folder inputs in Stream mode, make sure the streamable flag is set to true for all file inputs.

Directory

inputs:
  input_folder:
    type: Directory
    inputBinding:
      position: 1
    doc: The directory to be processed.

The CWL specification does not provide a streamable (boolean) attribute on Directory inputs as it does on File inputs. By default, an input Directory in a TES task launched by a CWL workflow is in Stream mode. When a single task has inputs of several types and any input file has its streamable flag set to false, all inputs (files and directories) are put in the TES default mode (Download).
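The rules for choosing one mode per task can be summarized in a short Python sketch. The TaskInput type and resolve_input_mode function are hypothetical. The sketch also assumes that an omitted streamable flag behaves like false, which is the CWL default. The text above only describes flags that are explicitly set to false.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskInput:
    kind: str                          # "File" or "Directory"
    streamable: Optional[bool] = None  # only meaningful for File; None = not set


def resolve_input_mode(inputs: list[TaskInput]) -> str:
    """Return the single TES mode ("Stream" or "Download") used for all inputs."""
    files = [i for i in inputs if i.kind == "File"]
    if not files:
        # Directories have no streamable flag and default to Stream mode.
        return "Stream"
    # Assumption: an unset flag counts as false, so every File input must be
    # streamable for the whole task (files and directories) to use Stream mode.
    if all(f.streamable is True for f in files):
        return "Stream"
    return "Download"


print(resolve_input_mode([TaskInput("File", True), TaskInput("Directory")]))  # Stream
```

A single mode is returned for the whole task because TES does not allow Download and Stream inputs to be mixed in one task.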

Default TES Modes

Https Support

The HTTP and HTTPS protocols are supported for File inputs.

inputs:
  input_file:
    type: File
    location: "https://somehost/test.txt"

TES does not support the HTTPS protocol for Directory inputs. Currently, GDS and S3 folders are supported.

Location for Files and Directories

For compatibility with cwltool --make-template, both path and location are accepted when specifying File or Directory inputs. When both are specified, location is used.
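The precedence rule can be expressed as a small Python sketch. It operates on a File or Directory input object given as a dictionary. The function name resolve_input_uri is hypothetical.

```python
from typing import Any


def resolve_input_uri(file_or_dir: dict[str, Any]) -> str:
    """Pick the URI of a File/Directory input object: location wins over path."""
    if "location" in file_or_dir:
        return file_or_dir["location"]
    if "path" in file_or_dir:
        # path is accepted for compatibility with cwltool --make-template output.
        return file_or_dir["path"]
    raise ValueError("File/Directory input needs a 'location' or 'path'")


print(resolve_input_uri({"class": "File", "path": "test.txt",
                         "location": "https://somehost/test.txt"}))
# https://somehost/test.txt
```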

Outputs

Output Binding - Glob

This feature allows users to select the outputs of a workflow using a glob pattern (e.g. *.vcf).

outputs:
  output_file:
    type: File
    outputBinding:
      glob: "*.txt"

Currently the following glob patterns are supported through WES.

Folders currently do not support glob patterns.
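The list of supported patterns is not reproduced here, so the following Python sketch is only an approximation. It uses the standard fnmatch module to show how a pattern such as *.txt selects output files by name. Folders are left out, reflecting the note that folders do not support glob patterns. The helper select_outputs and its (name, is_dir) input shape are hypothetical, and the matching rules in WES may differ.

```python
from fnmatch import fnmatchcase


def select_outputs(entries: list[tuple[str, bool]], pattern: str) -> list[str]:
    """Return names of files (not folders) matching a glob pattern, in order."""
    return [
        name
        for name, is_dir in entries
        if not is_dir and fnmatchcase(name, pattern)
    ]


entries = [("result.txt", False), ("calls.vcf", False), ("logs", True), ("notes.txt", False)]
print(select_outputs(entries, "*.txt"))  # ['result.txt', 'notes.txt']
```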

stdout/stderr

In a CWL workflow, users can refer to the stdout and stderr log files of a tool by declaring outputs of type stdout and stderr, for instance:

outputs:
  stdout_file:
    type: stdout
  stderr_file:
    type: stderr

In the above example, stdout_file and stderr_file map to the corresponding log files in GDS and can therefore be referenced in downstream steps of the workflow.

Alternatively, users can add the stdout and stderr keywords and map them to strings (e.g. stdout.txt and stderr.txt in the following example):

requirements:
- class: InlineJavascriptRequirement

stdout: stdout.txt
stderr: stderr.txt

outputs:
  stdout_file:
    type: File
    outputBinding:
      glob: stdout.txt
  stderr_file:
    type: File
    outputBinding:
      glob: stderr.txt
  stdout_content:
    type: string
    outputBinding:
      glob: stdout.txt
      loadContents: true
      outputEval: $(self[0].contents)
  stderr_content:
    type: string
    outputBinding:
      glob: stderr.txt
      loadContents: true
      outputEval: $(self[0].contents)

This allows users to map stdout and stderr with the glob feature of outputBinding and to process them further in the same tool. In the above example, stdout_content and stderr_content contain the contents of the stdout and stderr logs. Although the value of the stdout and stderr keywords can be any string, it must be referred to uniquely and consistently as the value of glob in the outputBinding. Note also that these values (e.g. stdout.txt and stderr.txt) are only identifiers; they are not the actual names of the log files in GDS.
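The following Python sketch emulates, on a local directory, what glob plus loadContents plus outputEval: $(self[0].contents) produce for stdout_content. It follows the CWL v1.0 rule that loadContents reads at most the first 64 KiB of a file. It does not model the platform itself. On the platform, stdout.txt is only an identifier, whereas here it is a real local file. The eval_output name is hypothetical.

```python
import glob
import os
import tempfile

# CWL v1.0 loadContents reads at most the first 64 KiB of the file.
MAX_CONTENTS = 64 * 1024


def eval_output(workdir: str, pattern: str) -> str:
    """Emulate glob + loadContents + outputEval: $(self[0].contents)."""
    matches = sorted(glob.glob(os.path.join(workdir, pattern)))
    # self is the list of File objects produced by the glob.
    self_ = []
    for path in matches:
        with open(path, "rb") as fh:
            contents = fh.read(MAX_CONTENTS).decode("utf-8", errors="replace")
        self_.append({"class": "File", "path": path, "contents": contents})
    return self_[0]["contents"]  # raises IndexError if the glob matched nothing


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for the tool writing its standard output to stdout.txt.
    with open(os.path.join(workdir, "stdout.txt"), "w") as fh:
        fh.write("hello\n")
    print(repr(eval_output(workdir, "stdout.txt")))  # 'hello\n'
```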

InitialWorkDirRequirement

Listing

This CWL feature allows users to specify the list of files or subdirectories that must be placed in the designated output directory before the command line tool is executed. For more information, refer to the CWL documentation on the Common Workflow Language website.

Example

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
hints:
  ResourceRequirement:
    ramMin: 8

requirements:
  - class: DockerRequirement
    dockerPull: debian:stretch-slim
  - class: InitialWorkDirRequirement
    listing:
      - $(inputs.INPUT)

class: CommandLineTool

inputs:
  - id: INPUT
    type: File

outputs:
  - id: OUTPUT
    type: File
    outputBinding:
      glob: $(inputs.INPUT.basename)
    secondaryFiles:
      - .fai

arguments:
  - valueFrom: $(inputs.INPUT.basename).fai
    position: 0

baseCommand: [touch]
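In the example above, InitialWorkDirRequirement stages INPUT in the working directory. The touch command then creates $(inputs.INPUT.basename).fai next to it, and the .fai entry under secondaryFiles collects that file together with OUTPUT. The following Python sketch applies the CWL string rules for secondaryFiles patterns. A plain suffix is appended to the primary file name, and each leading caret (^) first strips one extension. The function name secondary_name and the file names in the examples are hypothetical.

```python
def secondary_name(primary_basename: str, pattern: str) -> str:
    """Apply a CWL secondaryFiles string pattern such as ".fai" or "^.bai"."""
    name = primary_basename
    while pattern.startswith("^"):
        pattern = pattern[1:]
        # Each caret removes the last extension (last "." onwards), if any.
        dot = name.rfind(".")
        if dot != -1:
            name = name[:dot]
    return name + pattern


print(secondary_name("ref.fa", ".fai"))  # ref.fa.fai
```

For the example tool, ".fai" gives the same name that the arguments entry passes to touch, so the secondary file is found.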

Load Listing

loadListing is a common input record attribute that specifies the desired behavior for loading the listing field of a Directory object used by expressions. loadListing is only valid when the type is Directory or an array of Directory items. For more information, see Common Input Record Fields on the Common Workflow Language website.

The following is an example of loadListing:

inputs:
  directory:
    type: Directory
    loadListing: shallow_listing

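The following are possible values for loadListing:

  • no_listing: do not load the directory listing.

  • shallow_listing: load only the top-level listing, without recursing into subdirectories.

  • deep_listing: load the directory listing and recursively load all subdirectories as well.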

Limitations

  • Copying files to certain directories (e.g. /usr) is not supported. The complete list of directories can be found at TES inputs.

  • Copying files to the current working directory (CWD) has limited support. Files can be placed in the CWD, but the same file cannot be used as an output: after the workflow completes, the file appears as a zero-byte file in GDS.

  • For very large directories with more than 10,000 files or subdirectories, loadListing is silently truncated at 10,000 files and 10,000 directories, using the default sort order of the GDS API.

  • loadListing is not supported at the volume level. Users must select at least one subfolder.
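The following Python sketch models the listing modes and the per-type truncation on a local directory tree. It uses the CWL mode names no_listing, shallow_listing and deep_listing. It returns a flat list of relative paths rather than CWL's nested File and Directory objects. Alphabetical order stands in for the GDS API's default sort order, which is an assumption, so the entries kept after truncation on the platform may differ. The function load_listing is hypothetical.

```python
import os
import tempfile

LIMIT = 10_000  # per-type cap described under Limitations
MODES = {"no_listing", "shallow_listing", "deep_listing"}


def load_listing(root: str, mode: str, limit: int = LIMIT) -> list[str]:
    """Return relative directory and file paths for a Directory input."""
    if mode not in MODES:
        raise ValueError(f"unknown loadListing mode: {mode}")
    if mode == "no_listing":
        return []
    dirs: list[str] = []
    files: list[str] = []
    for current, subdirs, filenames in os.walk(root):
        subdirs.sort()  # assumption: name order stands in for the GDS order
        rel = os.path.relpath(current, root)
        for d in subdirs:
            if len(dirs) < limit:  # silently drop directories past the cap
                dirs.append(os.path.normpath(os.path.join(rel, d)))
        for f in sorted(filenames):
            if len(files) < limit:  # silently drop files past the cap
                files.append(os.path.normpath(os.path.join(rel, f)))
        if mode == "shallow_listing":
            break  # do not recurse into subdirectories
    return dirs + files


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    open(os.path.join(root, "a.txt"), "w").close()
    open(os.path.join(root, "sub", "b.txt"), "w").close()
    print(load_listing(root, "shallow_listing"))  # ['sub', 'a.txt']
```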

Execution

A CWL step consists of evaluating parameters with JavaScript expressions, running a Docker task using the Task Execution Service (TES), and post-processing any output JavaScript expressions. Even after a TES task is submitted, it may spend a significant amount of time waiting for resources to become available. To get the exact time spent executing the Docker container, query the run history and look for the duration of the <stepname>_collect event.
