CWL
Common Workflow Language (CWL) is a workflow description language designed to meet the needs of bioinformatics analysis. The specification for CWL is described on the Common Workflow Language website.
The implementation details of the platform's CWL execution engine, and its extensions to the CWL specification, are described below.
TES ResourceRequirement
TES resources (e.g. resource type, size, and hardware; see Requesting Resources) can be specified in the CWL workflow together with a TES-specific namespace. These settings override the standard requirements (e.g. coresMin, ramMin). The CWL workflow then launches a TES task with the TES-specific resource requirements.
For instance, when the TES namespace and a TES resource hint are specified in the CWL workflow, the workflow launches a TES task with the corresponding resources in the execution section of the task version.
Inputs
Streamable File
By default, the input files in a TES task launched by the CWL workflow are in Download mode. To enable Stream input mode, set the streamable flag to true for at least one of the input files.
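A minimal sketch of a tool that requests Stream mode for a file input. The cwlVersion, the wc command, and the input name reads are illustrative assumptions, not taken from the platform documentation; only the streamable field itself comes from the CWL specification.

```yaml
cwlVersion: v1.1          # assumed version; use the one your workflows target
class: CommandLineTool
baseCommand: [wc, -l]     # placeholder command
inputs:
  reads:                  # hypothetical input name
    type: File
    streamable: true      # request Stream mode instead of the default Download mode
    inputBinding:
      position: 1
outputs: []
```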
TES does not support a single task with inputs in both Download and Stream mode. If any input file has the streamable flag set to false in the CWL workflow, all input files and folders are put in Download mode. To ensure that a folder input is in Stream mode, make sure the streamable flag is set to true for all file inputs.
Directory
The CWL spec doesn't provide a streamable (boolean) attribute on Directory inputs, as it does on File inputs. By default, an input Directory in a TES task launched by the CWL workflow is in Stream mode. When a single task has multiple input types, if any input file has the streamable flag set to false, all inputs (files and directories) are put in the TES default mode (Download).
Default TES Modes

| Input type | Default mode |
| --- | --- |
| File | Download |
| Directory | Stream |
| Multi-type inputs: files and/or directories (ANY file in Download mode) | Download |
| Multi-type inputs: files and/or directories (ALL files in Stream mode) | Stream |
Https Support
The http and https protocols are supported for inputs of type File.
TES does not support the https protocol for inputs of type Directory. Currently, only gds and s3 folders are supported.
Location for File and Directories
For compatibility with cwltool --make-template, both path and location are accepted when specifying file or directory inputs. When both path and location are specified, location is used.
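As an illustration, a job input file might look like the sketch below. The input names, file names, and URLs are hypothetical placeholders; only the class, path, and location fields come from the CWL specification.

```yaml
# Hypothetical job file; all names and URLs are placeholders.
reads:
  class: File
  path: sample.fastq                                # ignored here, because location is also set
  location: https://example.com/data/sample.fastq   # used, as described above
reference_dir:
  class: Directory
  location: gds://example-volume/reference          # placeholder folder URL
```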
Outputs
Output Binding - Glob
This feature allows users to select the outputs of a workflow using glob patterns (e.g. *.vcf).
Currently, the following glob patterns are supported through WES:

| Pattern | Supported | Description |
| --- | --- | --- |
| * | Yes | Matches all files in the working directory of the task. These may include TES-specific files such as _manifest.json |
| *.txt | Yes | Matches all files that end with .txt in the working directory |
| sample-?.txt | No | |
| sample-[0-9].txt | No | |
Folders currently do not support glob patterns.
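A minimal sketch of an output binding that uses one of the supported patterns. The command, the file name result.txt, and the output name text_files are assumptions made for illustration.

```yaml
cwlVersion: v1.1          # assumed version
class: CommandLineTool
baseCommand: [sh, -c]
arguments:
  - "echo hello > result.txt"   # placeholder command that writes one .txt file
inputs: []
outputs:
  text_files:             # hypothetical output name
    type: File[]
    outputBinding:
      glob: "*.txt"       # quoted, because a bare * is special in YAML
```

Because the pattern is matched against the working directory, prefer *.txt over * when TES-specific files such as _manifest.json should not be collected.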
stdout/stderr
In a CWL workflow, users can refer to the stdout and stderr log files of a tool by declaring outputs of type stdout and stderr. Outputs declared this way (for example, outputs named stdout_file and stderr_file) map to the corresponding log files in GDS, and can therefore be referred to in downstream steps of the workflow.
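A minimal sketch of such a tool, using the standard CWL stdout and stderr output types. The echo command and the message input are placeholders.

```yaml
cwlVersion: v1.1          # assumed version
class: CommandLineTool
baseCommand: echo         # placeholder command
inputs:
  message:                # hypothetical input name
    type: string
    inputBinding:
      position: 1
outputs:
  stdout_file:
    type: stdout          # captures the tool's standard output log
  stderr_file:
    type: stderr          # captures the tool's standard error log
```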
Alternatively, users can add the stdout and stderr keywords and map them to strings (e.g. stdout.txt and stderr.txt).
This allows users to apply the glob feature in outputBinding to stdout and stderr, and to process them further in the same tool. For example, outputs named stdout_content and stderr_content that glob on those strings can contain the content of the stdout and stderr logs. Note that, although the value of the stdout and stderr keywords can be any string, it must be referred to uniquely and consistently as the value of glob in the outputBinding. Note also that the values of the stdout and stderr keywords (e.g. stdout.txt and stderr.txt) are just identifiers. They are not the actual names of the log files in GDS.
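A sketch of this pattern, assuming the log content is wanted as a string output. The use of loadContents and outputEval is one way to read the content under the CWL specification and is an assumption about how the original example was written; the echo command and message input are placeholders.

```yaml
cwlVersion: v1.1          # assumed version
class: CommandLineTool
baseCommand: echo         # placeholder command
stdout: stdout.txt        # identifier only, not the log file name in GDS
stderr: stderr.txt        # identifier only, not the log file name in GDS
inputs:
  message:                # hypothetical input name
    type: string
    inputBinding:
      position: 1
outputs:
  stdout_content:
    type: string
    outputBinding:
      glob: stdout.txt    # must match the stdout keyword exactly
      loadContents: true  # CWL loads only a limited amount of content (64 KiB)
      outputEval: $(self[0].contents)
  stderr_content:
    type: string
    outputBinding:
      glob: stderr.txt    # must match the stderr keyword exactly
      loadContents: true
      outputEval: $(self[0].contents)
```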
InitialWorkDirRequirement
Listing
This CWL feature allows users to specify the list of files or subdirectories that must be placed in the designated output directory before the command line tool is executed. For more information, refer to the CWL documentation on the Common Workflow Language website.
Example
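A minimal sketch of a listing that stages one input file and one literal file before the command runs. The input name config, the file name greeting.txt, and the cat command are illustrative assumptions.

```yaml
cwlVersion: v1.1          # assumed version
class: CommandLineTool
baseCommand: cat          # placeholder command
requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.config)          # stage the input file in the working directory
      - entryname: greeting.txt   # create a small literal file
        entry: "hello"
inputs:
  config:                 # hypothetical input name
    type: File
arguments:
  - $(inputs.config.basename)
  - greeting.txt
outputs:
  combined:
    type: stdout          # the staged files themselves are not declared as outputs
```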
Load Listing
loadListing is a common input record attribute that specifies the desired behavior for loading the listing field of a Directory object used by expressions. loadListing is only valid when the type is Directory or an array of Directory items. For more information, see Common Input Record Fields on the Common Workflow Language website.
The following is an example of loadListing:
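A minimal sketch, assuming a hypothetical Directory input named sample_dir. It loads only the top-level listing and passes the number of entries to a placeholder echo command.

```yaml
cwlVersion: v1.1          # assumed version; loadListing on inputs needs v1.1 or later
class: CommandLineTool
baseCommand: echo         # placeholder command
inputs:
  sample_dir:             # hypothetical input name
    type: Directory
    loadListing: shallow_listing   # top level only, no recursion
arguments:
  - $(inputs.sample_dir.listing.length)   # number of top-level entries
outputs:
  entry_count:
    type: stdout
```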
The following are possible values for loadListing:

| Value | Description |
| --- | --- |
| no_listing | Do not load the directory listing. |
| shallow_listing | Only load the top level listing. Do not recurse into subdirectories. |
| deep_listing | Load the directory listing and recursively load all subdirectories as well. |
Limitations
Copying files to certain directories (e.g. /usr) is not supported. The complete list of directories can be found at TES inputs.
Copying files to the current working directory (CWD) has limited support. Files can be placed in the CWD; however, the same file cannot be used as an output. After the workflow is complete, the file appears as a zero-byte file in GDS.
For very large directories with more than 10,000 files or directories, the loadListing is silently truncated at 10,000 files and 10,000 directories, using the default sort order from the GDS API.
loadListing is not supported at the volume level. Users must select at least one subfolder.
Execution
A CWL step consists of evaluating parameters using JavaScript expressions, running a Docker task using the Task Execution Service (TES), and postprocessing any output JavaScript expressions. Even after a TES task is submitted, it may spend a significant amount of time waiting for resources to become available. To get the exact time spent executing the Docker container, query the run history and look for the duration on a <stepname>_collect event.