Engine Parameters

Engine parameters are only supported when launching an analysis through the command-line interface or the APIs.

The platform CWL execution engine supports parameters passed in at launch time to configure runtime settings for the workflow run. These engine parameters are passed alongside the workflow inputs in the launch request body.

{
    "name": "runName",
    "input": {},
    "engineParameters": {}
}
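The request body above can be assembled programmatically before it is sent. The following is a minimal sketch, not part of any platform SDK: the helper name build_launch_body is hypothetical, and only the three top-level keys shown above are taken from this page.

```python
import json


def build_launch_body(name, inputs, engine_parameters=None):
    """Assemble a launch request body with the three top-level keys shown above.

    Hypothetical helper for illustration; it only builds a dictionary and
    does not call any platform API.
    """
    return {
        "name": name,
        "input": inputs,
        "engineParameters": engine_parameters or {},
    }


# Engine parameter keys used here are described later on this page.
body = build_launch_body("runName", {}, {"outputSetting": "copy", "maxScatter": 23})
print(json.dumps(body, indent=4))
```

Keeping the engine parameters in their own dictionary makes it easy to reuse the same workflow inputs with different runtime settings.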

Output Directory

During CWL execution, intermediate and final output files are written to GDS, which serves as the file system that persists intermediate files passed between workflow steps. Calls to GDS are authorized with the token used to launch the workflow, so the workflow launcher requires permission to create resources in GDS.

Intermediate output files, log files, and final outputs of the workflow are stored in subdirectories of the working directory in GDS. The working directory can be changed with the workDirectory engine parameter, and the final output location with the outputDirectory engine parameter.

By default, the work directory is gds://<workflow-run-id>/<workflow-run-name>, and the final output directory, temporary output directories, and log directories are all subfolders of the work directory.

| Directory name | Default value | Engine parameter keyword |
|---|---|---|
| work directory | gds://<workflow-run-id>/<workflow-run-name> | workDirectory |
| output directory | gds://<workflow-run-id>/<workflow-run-name>/outputs | outputDirectory |
| root of temporary output directories | <workDirectory>/steps | relative to workDirectory |
| root of log directories | <workDirectory>/logs | relative to workDirectory |

Users can specify an alternative final output location with the outputDirectory keyword in the engine parameters, and an alternative work directory with the workDirectory keyword. The roots of the temporary output directories and log directories follow the work directory. The outputs of each step are stored under the temporary output directory <workDirectory>/steps/<step-name>/try-<try-number> for workflows with simple steps, and under <workDirectory>/steps/<subworkflow-name>/<subworkflow-index>/<step-name>/try-<try-number> for workflows with subworkflows (such as scatter-gather subworkflows). The log files of each step are stored under the log directory in a similar subdirectory structure.

{
    "engineParameters": {
        "outputDirectory": "gds://custom/output/path",
        "workDirectory": "gds://custom/workdir/path"
    }
}
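The temporary output directory layout described above can be sketched as a small path builder. This is an illustration only: the function name step_output_dir is hypothetical, and the concrete step names, subworkflow index, and try number used in the example are assumptions, since this page does not state how they are numbered.

```python
def step_output_dir(work_dir, step_name, try_number, subworkflow=None):
    """Return the temporary output directory for one attempt of a step.

    subworkflow is an optional (name, index) tuple for steps that run
    inside a subworkflow, such as a scatter-gather subworkflow.
    """
    parts = [work_dir.rstrip("/"), "steps"]
    if subworkflow is not None:
        sub_name, sub_index = subworkflow
        parts += [sub_name, str(sub_index)]
    parts += [step_name, f"try-{try_number}"]
    return "/".join(parts)


# Step and subworkflow names below are made up for illustration.
simple = step_output_dir("gds://custom/workdir/path", "md5sum", 1)
nested = step_output_dir("gds://custom/workdir/path", "md5sum", 1, ("scatter-sub", 0))
print(simple)
print(nested)
```

The same structure applies to the log directories if "steps" is replaced with "logs", according to the description above.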

Output Setting

In the engine parameters, the outputSetting keyword accepts the values leave, move, or copy. It specifies whether the workflow outputs are left in the temporary output locations written by each step (leave), or are moved (move) or copied (copy) to the final output location. The default value of outputSetting is move.

When outputSetting is set to move or copy, all output files (type File or File array) are moved or copied to the final output location in a "flattened" way, i.e. without subfolders. Output folders (type Directory or Directory array) are likewise moved or copied to the final output location, recursively, preserving the structure and names of the files and subfolders they contain. When the basenames of output files or folders collide, the duplicates are renamed with incremental suffixes (_2, _3, and so on) to resolve the conflict. This renaming is consistent with local cwltool behavior and applies to both file and folder outputs. If the work directory and the output directory are in the same GDS volume, contents are copied directly through AWS S3, which is much faster and requires a smaller compute instance.
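The collision handling described above can be illustrated with a short sketch. It assumes that the suffix is appended to the full basename, extension included, which this page does not specify; the function name resolve_basenames is hypothetical and this is not the platform's implementation.

```python
def resolve_basenames(basenames):
    """Return final names, renaming collisions with _2, _3, ... suffixes.

    Assumption: the suffix is appended after the complete basename
    (for example "out.txt_2"); verify against actual engine behavior.
    """
    used = set()
    result = []
    for name in basenames:
        candidate = name
        counter = 2
        # Keep incrementing the suffix until the name is free.
        while candidate in used:
            candidate = f"{name}_{counter}"
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


final_names = resolve_basenames(["out.txt", "out.txt", "report", "out.txt"])
print(final_names)
```

Because files are flattened into one location, two steps that each emit a file with the same basename are the typical source of such collisions.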

copyOutputTaskInstanceType

When outputSetting above is set to copy or move, the TES compute type used for the task run to perform the copy/move operation can be overridden using the copyOutputTaskInstanceType parameter. By default, the compute type is set to use a type of standardHiCpu with size set to medium.

{
    "engineParameters": {
        "copyOutputTaskInstanceType": {
            "type": "standardHiCpu",
            "size": "medium"
        }
    }
}

Launch Overrides

ResourceRequirements

As an extension of the CWL resource requirement overrides feature, the TES resource requirements of individual tools in a workflow can be overridden when a CWL workflow version is launched. Use the overrides keyword under the engineParameters field of the version launch request, and place the overridden ResourceRequirement under the requirements attribute of each entry, for instance:

{
    "engineParameters": {
        "overrides": {
            "#md5sum-tool.cwl": {
                "requirements": {
                    "ResourceRequirement": {
                        "coresMin": 4,
                        "ramMin": 2048,
                        "http://platform.illumina.com/rdf/ica/resources": {
                            "type": "standard",
                            "size": "small"
                        }
                    }
                }
            },
            "#processSamplesheet.cwl": {
                "requirements": {
                    "ResourceRequirement": {
                        "http://platform.illumina.com/rdf/ica/resources": {
                            "type": "fpga",
                            "size": "medium"
                        }
                    }
                }
            }
        }
    }
}

In this example, the resource override rules are applied to the specific tools (of class CommandLineTool) in the workflow version with ids #md5sum-tool.cwl and #processSamplesheet.cwl. For instance, the tool #md5sum-tool.cwl in the workflow version may declare a default resource requirement in its hints field:

{
    "class": "CommandLineTool",
    "id": "#md5sum-tool.cwl",
    "label": "Simple md5sum tool",
    "requirements": [
        {
            "class": "DockerRequirement",
            "dockerPull": "quay.io/agduncan94/my-md5sum"
        },
        {
            "class": "InlineJavascriptRequirement"
        }
    ],
    "hints": [
        {
            "class": "ResourceRequirement",
            "coresMin": 2,
            "ramMin": 1024,
            "outdirMin": 512,
            "http://platform.illumina.com/rdf/ica/resources": {
                "type": "standard",
                "size": "medium"
            }
        }
    ],
    "inputs": [ ... ]
    ...
}

As a result of the overrides, the TES task corresponding to a step with tool id #md5sum-tool.cwl is launched with resource requirement "size": "small" rather than "size": "medium".

Besides a tool id, users can also specify a step id, such as #main/md5sum, or the whole workflow id #main. The override rule is then applied to the tools that these ids map to in the CWL workflow.

When resource overrides are given at the workflow level (e.g. #main), the step level, and the tool level, the rule at the more specific level (closer to the tool level) takes precedence.

When a resource requirement is specified in the overrides field of engineParameters in the launch request as well as in the requirements and/or hints fields of the workflow version, overrides takes the highest precedence, followed by requirements and then hints. In each of these three cases, when TES-specific resource requirements (with keyword http://platform.illumina.com/rdf/ica/resources) and standard CWL resource requirements (with keywords such as coresMin and ramMin) are both present, the TES resource requirement takes precedence and is used.
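The precedence rules above can be summarized in a short sketch. It is an illustration under stated assumptions, not the engine's implementation: the function names are hypothetical, the #main override entry is made up for the example, and the sketch assumes that the winning source replaces lower-precedence sources entirely rather than being merged with them key by key, which this page does not specify.

```python
ICA_KEY = "http://platform.illumina.com/rdf/ica/resources"


def pick_override(overrides, tool_id, step_id, workflow_id="#main"):
    """Return the most specific override: tool level, then step, then workflow."""
    for key in (tool_id, step_id, workflow_id):
        req = overrides.get(key, {}).get("requirements", {}).get("ResourceRequirement")
        if req:
            return req
    return None


def resolve_resources(override_req, requirement_req, hint_req):
    """Apply overrides > requirements > hints, then prefer the TES block."""
    sources = (("overrides", override_req),
               ("requirements", requirement_req),
               ("hints", hint_req))
    for source, req in sources:
        if req:
            if ICA_KEY in req:
                # TES-specific settings win over coresMin, ramMin, etc.
                return source, dict(req[ICA_KEY])
            return source, {k: v for k, v in req.items() if k != "class"}
    return None, {}


overrides = {
    # Hypothetical workflow-level entry, added only to show specificity.
    "#main": {"requirements": {"ResourceRequirement": {
        ICA_KEY: {"type": "standard", "size": "medium"}}}},
    "#md5sum-tool.cwl": {"requirements": {"ResourceRequirement": {
        "coresMin": 4, "ramMin": 2048,
        ICA_KEY: {"type": "standard", "size": "small"}}}},
}
hint = {"class": "ResourceRequirement", "coresMin": 2, "ramMin": 1024,
        ICA_KEY: {"type": "standard", "size": "medium"}}

chosen = pick_override(overrides, "#md5sum-tool.cwl", "#main/md5sum")
source, resources = resolve_resources(chosen, None, hint)
print(source, resources)
```

With these inputs the tool-level override is selected over the workflow-level one, and its TES block determines the compute size.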

maxScatter

For workflows that use the scatter-gather feature, the system by default supports scattering up to a predefined maximum. To override the default, use the following engine parameter.

{
    "engineParameters": {
        "maxScatter": 23
    }
}

excludeTesSystemFiles

In the engine parameters, the excludeTesSystemFiles keyword accepts true or false and specifies whether system files unrelated to the workflow are excluded from the results of the CWL listdir command. By default these files are hidden; users can turn off the filtering by setting excludeTesSystemFiles to false.

{
    "engineParameters": {
        "excludeTesSystemFiles": "true"
    }
}

Dirent/Listing/Directory Input Mode Override

In CWL, the streamable keyword specifies the input mode of a file type input. However, the native CWL specification does not directly support controlling the input mode of directory inputs or of listing/dirent files. In addition, each ICA TES task supports only one input mode (either Download or Stream) for all the inputs of that task, including files, directories, and listing/dirent files. To control the input mode of directories and listing/dirent files, users can set engine parameters that specify the input mode of one or all steps or tools in the workflow. The rules below are listed in descending order of precedence.

When the engine parameters include the key inputModeOverrides, its value is a JSON object that maps a step id or tool id (matching the id of a step or tool in the packed CWL workflow) to either Download or Stream. The corresponding step or tool then uses the given input mode for all its inputs, including files, directories, and listing/dirent files. When both the step id and the tool id are present in inputModeOverrides, the step id takes precedence.

When inputModeOverrides is not present and the engine parameters include the key defaultInputMode with the value Download or Stream, all steps and tools in the workflow use that input mode. When both inputModeOverrides and defaultInputMode are present, inputModeOverrides takes precedence, and the tools or steps that are not listed in inputModeOverrides use the input mode given by defaultInputMode.

For example, with the following engine parameter, all inputs of tool tool1-id will be in Download mode, step step2-id in Stream mode, and all the other tools and steps in Download mode.

{
    "defaultInputMode": "Download",
    "inputModeOverrides": {
        "#tool1-id": "Download",
        "#step2-id": "Stream"
    }
}

When the input mode is not specified in the engine parameters, the input mode of a step or tool is Stream (for all input files, directories, and listing/dirent files) only if the step or tool has at least one file input with the explicit keyword streamable=true. When streamable is absent from all file inputs (or there is no input of type file), the CWL specification treats it as streamable=false, which implies Download mode for all inputs, including directories and listing/dirent files. Because the input mode in the engine parameters has higher precedence, when inputModeOverrides and/or defaultInputMode are used, the streamable setting in the workflow is overridden and therefore ignored.
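The resolution order described in this section can be sketched as a single function. This is an illustrative reading of the rules above, not the engine's code: the function name resolve_input_mode and the way the streamable flags are passed in are assumptions made for the example.

```python
def resolve_input_mode(engine_params, step_id, tool_id, file_inputs_streamable):
    """Return "Download" or "Stream" for one step.

    file_inputs_streamable lists the streamable value of each File input
    of the step (None where the keyword is absent).
    """
    overrides = engine_params.get("inputModeOverrides", {})
    # The step id wins over the tool id when both are listed.
    for key in (step_id, tool_id):
        if key in overrides:
            return overrides[key]
    if "defaultInputMode" in engine_params:
        return engine_params["defaultInputMode"]
    # No engine parameter applies: fall back to the CWL streamable keyword.
    if any(flag is True for flag in file_inputs_streamable):
        return "Stream"
    return "Download"


params = {
    "defaultInputMode": "Download",
    "inputModeOverrides": {"#tool1-id": "Download", "#step2-id": "Stream"},
}
# "#step1-id" and "#tool2-id" are placeholder ids for this example.
mode_step2 = resolve_input_mode(params, "#step2-id", "#tool2-id", [None])
mode_tool1 = resolve_input_mode(params, "#step1-id", "#tool1-id", [True])
mode_plain = resolve_input_mode({}, "#step1-id", "#tool1-id", [True, None])
print(mode_step2, mode_tool1, mode_plain)
```

Note that the second call returns Download even though one file input is streamable, because the engine parameters take precedence over the workflow's streamable settings.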
