Task Execution

The basic building blocks of pipelines are Tools: containerized applications executed on distributed cloud infrastructure with defined compute resources and execution environment conditions. The platform runs Tools through the Task Execution Service (TES), which hosts a suite of APIs for launching and monitoring the execution of containerized applications. The TES APIs operate on the Task resource model. During pipeline execution, the Tool definition of each step is translated to a task. Each task includes an execution specification that contains all the information TES needs to provision and launch the containerized application.

Requesting Resources

The Task Execution Service provisions compute resources based on the values provided in the execution.environment.resources section of the task version or task run body.

"execution": {
    "environment": {
        "resources": {
            "tier": "standard",
            "type": "standard",
            "size": "small"
        }
    }
}

ℹ️ Queued task runs are fulfilled as resources become available. Ordering is not guaranteed.

Type and Size

For the type and size fields, you can select from the following combinations:

Type            Size      CPU        Memory
standard        small     0.8 CPU    3 GB
standard        medium    1.3 CPU    4.5 GB
standard        large     2 CPU      7 GB
standard        xlarge    4 CPU      14 GB
standard        xxlarge   8 CPU      28 GB
standardHiCpu   small     15.5 CPU   28 GB
standardHiCpu   medium    35.5 CPU   68 GB
standardHiCpu   large     71.5 CPU   140 GB
standardHiMem   small     7.5 CPU    60 GB
standardHiMem   medium    15.5 CPU   124 GB
standardHiMem   large     47.5 CPU   380 GB
standardHiMem   xlarge    95.5 CPU   764 GB
fpga            small     7.5 CPU    118 GB
fpga            medium    15.5 CPU   240 GB
fpga            large     63.5 CPU   972 GB

FPGA compute types have limited system-wide availability. If the system is under heavy load, the run may not be scheduled for a long time, depending on the load.

The fpga large compute type is unavailable in the cac1 and aps2 regions.

The exact memory and CPU resources provisioned for a task run may vary for a given compute type in the table above. This variation improves availability so that a job is scheduled in a timely manner while still satisfying the minimum resources requested. As a result, a task run's performance and duration may vary slightly when it is executed on the same inputs multiple times.

If you do not specify a resource size and type, then the task is executed on the smallest instance available when the request is made.
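Because TES only guarantees the minimum resources listed in the table, one option is to pick the smallest type and size that covers a job's requirements. The following Python sketch encodes the table above; the helper names are hypothetical, and ranking candidates by CPU count and then memory is an illustrative choice that does not reflect pricing.

```python
# (type, size) -> (CPU, memory in GB), copied from the table above.
COMPUTE_TYPES = {
    ("standard", "small"): (0.8, 3),
    ("standard", "medium"): (1.3, 4.5),
    ("standard", "large"): (2, 7),
    ("standard", "xlarge"): (4, 14),
    ("standard", "xxlarge"): (8, 28),
    ("standardHiCpu", "small"): (15.5, 28),
    ("standardHiCpu", "medium"): (35.5, 68),
    ("standardHiCpu", "large"): (71.5, 140),
    ("standardHiMem", "small"): (7.5, 60),
    ("standardHiMem", "medium"): (15.5, 124),
    ("standardHiMem", "large"): (47.5, 380),
    ("standardHiMem", "xlarge"): (95.5, 764),
    ("fpga", "small"): (7.5, 118),
    ("fpga", "medium"): (15.5, 240),
    ("fpga", "large"): (63.5, 972),
}


def smallest_fit(min_cpu, min_mem_gb, allow_fpga=False):
    """Return the (type, size) with the fewest CPUs (ties broken by memory) meeting both minimums."""
    candidates = [
        (cpu, mem, key)
        for key, (cpu, mem) in COMPUTE_TYPES.items()
        if cpu >= min_cpu and mem >= min_mem_gb and (allow_fpga or key[0] != "fpga")
    ]
    if not candidates:
        raise ValueError("no compute type satisfies the requested minimums")
    return min(candidates)[2]


def resources_block(type_, size, tier="standard"):
    """Build the execution.environment.resources section of a task body."""
    return {"execution": {"environment": {"resources": {"tier": tier, "type": type_, "size": size}}}}
```

For example, a job needing at least 4 CPUs and 20 GB of memory maps to standardHiMem/small under this ranking.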

Tier

Compute resource tiers provide pricing options that save cost at the risk of runs being more susceptible to capacity limitations. Choosing a low-cost tier schedules the task run on a compute node that may be interrupted and re-provisioned when the system is under load. This works well for short-running jobs on smaller compute types and sizes that tolerate interruption. For long-running jobs on more powerful compute types and sizes, the chance of interruption increases and may severely impact the total run duration.

Tier       Description
economy    Lowest cost option. The run may be interrupted, and when it is restarted after an interruption or failure it continues to be rescheduled on interruptible nodes.
standard   Highest cost option. The run is scheduled on a non-interruptible node and is rescheduled on non-interruptible nodes when restarted after an interruption or failure.

FPGA compute types are limited to the standard tier in the regions below.

  • London (euw2)

  • Canada (cac1)

  • Singapore (aps1)

  • Frankfurt (euc1)
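The tier and region constraints above can be checked before a run is submitted. This is a hypothetical client-side sketch: the region codes come from this page, the function name is illustrative, and the service remains the authority on what it accepts.

```python
# fpga compute types are limited to the standard tier in these regions.
FPGA_STANDARD_ONLY = {"euw2", "cac1", "aps1", "euc1"}
# The fpga/large compute type is unavailable in these regions.
FPGA_LARGE_UNAVAILABLE = {"cac1", "aps2"}
TIERS = {"economy", "standard"}


def check_resources(region, tier, type_, size):
    """Return the documented constraints a resource request violates (empty list if none)."""
    problems = []
    if tier not in TIERS:
        problems.append(f"unknown tier {tier!r}")
    if type_ == "fpga":
        if tier != "standard" and region in FPGA_STANDARD_ONLY:
            problems.append(f"fpga requires the standard tier in {region}")
        if size == "large" and region in FPGA_LARGE_UNAVAILABLE:
            problems.append(f"fpga large is unavailable in {region}")
    return problems
```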

Environment Variables

Environment variables may be set in the container executing the task run.

"execution": {
    "environment": {
        "variables": {
            "ENV_VAR": "EXAMPLE_VAL"
        }
    }
}

Secure Environment Variables

Environment variables may be secured to hide them from log outputs and API responses. Use the SECURE_ prefix to mark an environment variable as secure.

"execution": {
    "environment": {
        "variables": {
            "SECURE_ENV_VAR": "EXAMPLE_SECURE_VAL"
        }
    }
}
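Because only the variable name marks a value as secure, a client can apply the same rule when printing a task body locally. The sketch below is hypothetical: the function names are illustrative, and reusing the "<hidden>" placeholder (which the platform uses for secret substitutions) as the local mask is an assumption.

```python
def environment_variables(variables):
    """Wrap variables in the execution.environment.variables section of a task body."""
    return {"execution": {"environment": {"variables": dict(variables)}}}


def redact_secure(variables, mask="<hidden>"):
    """Return a copy with every SECURE_-prefixed value replaced by mask, e.g. for local logging."""
    return {
        name: (mask if name.startswith("SECURE_") else value)
        for name, value in variables.items()
    }
```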

Substitution

Within the execution body, either a static value or a substitution can be provided for a field's value. A substitution is made using a string wrapped with {{<string>}} and allows the actual values to be specified at launch time using arguments. Substitutions can be reused for multiple fields and can be embedded within a string value.

Certain fields, like passwords or secrets, require substitutions to prevent secrets from being stored with the version. The value for the secret will be replaced in response bodies with "<hidden>" rather than the value itself.

The following is an example task execution specification leveraging substitutions:

{
    "name": "HelloWorld",
    "execution": {
        "image": {
            "name": "{{imageName}}",
            "tag": "{{imageTag}}",
            "credentials": {
                "username": "username",
                "password": "{{password}}"
            }
        },
        "command": "bash",
        "args": [ "-c", "echo 'Hi my name is {{name}}'" ],
        "systemFiles": {
            "url": "gds://taskruns"
        },
        "retryLimit": 0
    }
}

The values for the substitutions are then provided in the arguments of the version launch request:

{
    "imageName": "ubuntu",
    "imageTag": "latest",
    "password": "password",
    "name": "world"
}
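To preview what a task version will look like once launched, the substitution step can be approximated locally by replacing every {{name}} placeholder with the matching launch argument. This Python sketch is a local approximation, not the service's implementation; it assumes placeholder names consist of word characters (letters, digits, underscore) and treats a missing argument as an error.

```python
import re

# Assumption: placeholder names are word characters, e.g. {{imageName}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def substitute(value, arguments):
    """Recursively replace {{name}} placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in arguments:
                raise KeyError(f"missing launch argument: {name}")
            return str(arguments[name])
        # Placeholders may be embedded anywhere inside a string value.
        return PLACEHOLDER.sub(lookup, value)
    if isinstance(value, list):
        return [substitute(item, arguments) for item in value]
    if isinstance(value, dict):
        return {key: substitute(item, arguments) for key, item in value.items()}
    return value  # numbers, booleans, and null pass through unchanged
```

Applied to the HelloWorld body with the arguments above, "{{imageTag}}" becomes "latest" and the args string becomes "echo 'Hi my name is world'".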

Logging

Throughout a task run's life-cycle, several logs and system-related files are produced with information about the execution of the job. TES requires the user to provide an external location to serve as the file store for these files. A "systemFiles" field in the execution body is used to provide a URL and optional credentials (similar to an output mapping) for storing these files.

{
    "image": {
        "name": "ubuntu",
        "tag": "latest"
    },
    "command": "bash",
    "args": [ "-c", "echo test" ],
    "inputs": [ ],
    "outputs": [ ],
    "systemFiles": {
        "url": "gds://taskruns"
    },
    "retryLimit": 0
}

During execution of the task run, TES creates a folder named after the task run ID (i.e., trn.<uid>) directly under the final path component of the provided URL. For example, when the task run above is executed, a folder is created at gds://taskruns/trn.<uid>, where trn.<uid> is the unique ID assigned to the task run resource. All system-related files are stored within the trn.<uid> folder.

stdout/stderr

During task run execution, the stdout and stderr of executing processes are redirected to /var/log/tessystemlogs/task-stdouterr.log. Other container log artifacts are placed in the /var/log/tessystemlogs folder. These log files are uploaded every 3 seconds and can be accessed while the task run is executing.

To send the contents of /var/log/tessystemlogs to a location other than the URL in the "systemFiles" field, specify an output mapping with /var/log/tessystemlogs as the path.

The following is an example of a task execution specification that will send logs stored in /var/log/tessystemlogs to gds://volume1/myLogs. Any other system-related files produced by the task run job will be sent to the "systemFiles" URL, gds://taskruns:

{
    "image": {
        "name": "ubuntu",
        "tag": "latest"
    },
    "command": "bash",
    "args": [ "-c", "sleep 20; echo 'test'; /helloWorld.sh >> /outdir/result.txt" ],
    "inputs": [ ],
    "outputs": [
        {
            "path": "/var/log/tessystemlogs",
            "url": "gds://volume1/myLogs"
        }
    ],
    "systemFiles": {
        "url": "gds://taskruns"
    },
    "retryLimit": 0
}
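The folder naming rule and the log-redirect mapping can each be expressed as a small helper. This is a hypothetical sketch: the function names are illustrative, and the task run ID is whatever TES assigns to the run (trn.<uid>).

```python
LOG_DIR = "/var/log/tessystemlogs"


def system_files_folder(system_files_url, task_run_id):
    """Folder TES creates for a run: the task run ID directly under the systemFiles URL."""
    return f"{system_files_url.rstrip('/')}/{task_run_id}"


def with_log_output(execution, url):
    """Return a copy of the execution body that also uploads LOG_DIR to url."""
    outputs = list(execution.get("outputs", []))
    outputs.append({"path": LOG_DIR, "url": url})
    return {**execution, "outputs": outputs}
```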

Other logs

The following is a table of the files produced during a task run's life-cycle. Some files are only produced under certain conditions.

File                    Description
bsfs-stdouterr.log      Logs associated with mounting input files
logging-stdouterr.log   Logs associated with the logging container
output#-stdouterr.log   Logs associated with uploading to each output location (# is replaced with the index of the output in the execution body)
task-stdouterr.log      stdout/stderr of the application
_manifest.json          Records metadata for all uploaded files, including:
                          • Relative path where the file was uploaded
                          • md5 checksum
                          • File size in bytes
                          • UTC timestamp when the file was uploaded
_tags.json              Records the UTC timestamp when the uploads completed and the task run ID

Marshalling Data

Inputs

The task execution body provides an array of inputs that will be attached to volume mounts on the running instance of the task. Each object in the inputs array must contain a "path" and a "url". The path is the volume mount location in the container to which the input will be downloaded. The url is the external location the input will be downloaded from.

Input paths are made read-only at the last folder component in the path. In general, applications run on TES should use the /scratch path for intermediate files. The DRAGEN application should use the /ephemeral path for intermediate files.

Inputs must meet the following conditions:

  • The path must be absolute.

  • The same path must not be reused for multiple inputs.

  • The path must not lead to any of the following: /, /usr, /var, /log, /lib, /usr/bin.

  • HTTP-based URLs must not require authentication.

  • GDS-based URLs must be accessible by the token used to launch the task run.

  • S3-based URLs must provide valid credentials (see below).
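The rules that do not require contacting the platform (path shape, path reuse, and the presence of S3 credentials) can be checked before launch. The sketch below is a hypothetical client-side check; it cannot verify GDS token access or whether an HTTP URL requires authentication, so the service remains the final authority.

```python
import posixpath

# Paths that inputs must not be mounted at (from the list above).
FORBIDDEN_PATHS = {"/", "/usr", "/var", "/log", "/lib", "/usr/bin"}


def validate_inputs(inputs):
    """Return the rule violations found in an inputs array (empty list if none)."""
    problems = []
    seen = set()
    for i, item in enumerate(inputs):
        path, url = item.get("path"), item.get("url")
        if not path or not url:
            problems.append(f"input {i}: path and url are required")
            continue
        if not posixpath.isabs(path):
            problems.append(f"input {i}: path must be absolute")
        normalized = posixpath.normpath(path)  # "/usr/" and "/usr" are the same mount point
        if normalized in FORBIDDEN_PATHS:
            problems.append(f"input {i}: path {path} is not allowed")
        if normalized in seen:
            problems.append(f"input {i}: path {path} is reused")
        seen.add(normalized)
        if url.startswith("s3://") and not item.get("credentials"):
            problems.append(f"input {i}: S3 URL needs credentials")
    return problems
```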

TES currently supports a maximum of 20,000 input files, including files mounted with a folder input.

Input File Example

{
    "inputs": [
        {  
            "path":"/media/data/input/file.txt",
            "url":"gds://volume1/folder1/file.txt",
            "type": "file"
        }
    ]
}

Input Folder Example

{
    "inputs": [
        {  
            "path":"/media/data/input",
            "url":"gds://volume1/folder1",
            "type": "folder"
        }
    ],
    "systemFiles": {
        "url": "gds://taskruns"
    }
}

When you specify a folder as input, the "systemFiles" property must also be set.

AWS S3 Inputs

To read inputs from a private S3 bucket, the credentials to that bucket must be provided in the credentials field of the inputs, and the storageProvider must be set to aws. A substitution is required for each of the fields in credentials when defined in a task version. The following are the valid keys that can be provided in credentials:

  • AWS_ACCESS_KEY_ID

  • AWS_SECRET_ACCESS_KEY

  • AWS_SESSION_TOKEN

There are two ways to provide access keys. For permanent credentials, include the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY. For temporary credentials, include the AWS_ACCESS_KEY_ID, the AWS_SECRET_ACCESS_KEY, and the AWS_SESSION_TOKEN.

The following is an example of a task execution specification that reads inputs from a private S3 location:

{
    "image": {
        "name": "ubuntu"
    },
    "command": "bash",
    "args": [ "-c", "sleep 20" ],
    "inputs": [
        {
            "path": "/localfolderpath",
            "url": "s3://bucket/folder",
            "type": "folder",
            "storageProvider": "aws",
            "credentials": {
                "AWS_ACCESS_KEY_ID": "{{accessKeyId}}",
                "AWS_SECRET_ACCESS_KEY": "{{secretAccessKey}}",
                "AWS_SESSION_TOKEN": "{{sessionToken}}"
            }
        },
        {
            "path": "/localfilepath",
            "url": "s3://bucket/path/file.txt",
            "type": "file",
            "storageProvider": "aws",
            "credentials": {
                "AWS_ACCESS_KEY_ID": "{{accessKeyId}}",
                "AWS_SECRET_ACCESS_KEY": "{{secretAccessKey}}",
                "AWS_SESSION_TOKEN": "{{sessionToken}}"
            }
        }
    ],
    "outputs": [],
    "systemFiles": {
        "url": "gds://taskruns"
    },
    "retryLimit": 0
}
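When building task versions programmatically, the permanent/temporary distinction reduces to whether the session token key is present. In this hypothetical sketch the placeholder names (accessKeyId, secretAccessKey, sessionToken) are taken from the example above and the function names are illustrative; the same credentials block shape is documented for S3 outputs.

```python
def s3_credentials(temporary=True):
    """Credentials block using substitutions, which task versions require for these fields."""
    credentials = {
        "AWS_ACCESS_KEY_ID": "{{accessKeyId}}",
        "AWS_SECRET_ACCESS_KEY": "{{secretAccessKey}}",
    }
    if temporary:
        # Temporary credentials also need the session token.
        credentials["AWS_SESSION_TOKEN"] = "{{sessionToken}}"
    return credentials


def s3_input(path, url, input_type, temporary=True):
    """One entry of the inputs array for a private S3 file or folder."""
    return {
        "path": path,
        "url": url,
        "type": input_type,
        "storageProvider": "aws",
        "credentials": s3_credentials(temporary),
    }
```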

Download Mode

By default, input resources are streamed to the task run job during execution. It may be preferable to force the complete download of certain resources prior to executing the command. For example, applications that use a random access pattern need the complete file contents available. Inputs may be specified as requiring download using the "mode" field with a value of "download". Available options for the mode include "download" and "stream" (default).

{
    "inputs": [
        {  
            "path":"/media/data/input/file.txt",
            "url":"gds://volume1/folder1/file.txt",
            "mode": "download",
            "type": "file"
        }
    ]
}

Manifest type input

The inputs list of a TES task may contain at most 128 inputs of type file.

To launch a task with a very large number of inputs, use a single input of type manifest instead.

"inputs": [
    {
        "type": "manifest",
        "mode": "download|stream",
        "url": "https://<presignedURL_of_manifest_json>",
        "path": "/manifest/mount/path"
    }
]

Here, mode can be either download or stream and applies to all input files in the manifest. The url value is an HTTPS presigned URL of the manifest JSON file (for a manifest JSON stored in GDS, call the GDS API to get its presigned URL). The manifest JSON is a list of input items, each in the following format:

[
    {
        "url": "https://1000genomes-dragen.s3.amazonaws.com/reference/hg38_alt_aware_nohla.fa",
        "size": 3261550200,
        "path": "/reference/fasta/hg38_alt_aware_nohla.fa"
    },
    ...
]

Here, url is the presigned URL of each input file, size is the exact size of the input file in bytes, and path is the intended mount path inside the container, relative to the path of the manifest input. For instance, in the example above, the absolute mount path of hg38_alt_aware_nohla.fa is /manifest/mount/path/reference/fasta/hg38_alt_aware_nohla.fa.

GDS and S3 folders are not supported inside a manifest. Generate a presigned URL for each file under such a folder and add each file to the manifest JSON individually.

Only one input of type manifest is allowed in the inputs list of a task launch request. Any additional inputs (of type file or folder) are ignored, so include them in the manifest JSON instead.

The manifest JSON can be gzipped. The maximum size of the input manifest JSON is 1 GB (gzipped or uncompressed).
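The manifest format lends itself to a small builder. The sketch below is hypothetical: it assumes you already hold a presigned HTTPS URL and exact byte size for every file (generating them through GDS or S3 is out of scope), and treating the 1 GB limit as 2**30 bytes is an assumption.

```python
import gzip
import json
import posixpath

# 1 GB limit from the text; interpreting it as 2**30 bytes is an assumption.
MAX_MANIFEST_BYTES = 1024 ** 3


def build_manifest(files, compress=True):
    """Serialize (presigned_url, size_in_bytes, relative_path) tuples into manifest JSON bytes."""
    items = []
    for url, size, path in files:
        if not url.startswith("https://"):
            # Folders and gds:// or s3:// URLs are not supported inside a manifest.
            raise ValueError(f"expected a presigned https URL, got {url}")
        items.append({"url": url, "size": int(size), "path": path})
    data = json.dumps(items).encode("utf-8")
    if compress:
        data = gzip.compress(data)
    if len(data) > MAX_MANIFEST_BYTES:
        raise ValueError("manifest JSON exceeds the 1 GB limit")
    return data


def mount_path(manifest_mount, item_path):
    """Absolute container path of a manifest item; item paths are relative to the manifest input's path."""
    return posixpath.join(manifest_mount, item_path.lstrip("/"))
```

The resulting bytes would then be uploaded, and a presigned URL for that upload used as the url of the manifest input.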

Outputs

The task execution body provides an array of outputs to upload files and folders local to the task container to an external URL. Similar to inputs, each object in the outputs array must contain a "path" and a "url". The contents of the path will be uploaded to the mapped URL.

Requirements:

  • The path must be absolute.

  • The same path must not be reused for multiple outputs.

  • The path and URL must lead to a folder.

  • The URL scheme must match one of the following: gds://, s3://

  • GDS-based URLs must be accessible by the token used to launch the task run.
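The scheme, absoluteness, and reuse rules for outputs can be checked the same way before launch. This hypothetical sketch cannot confirm that a path or URL is a folder, or that a GDS URL is accessible with your token.

```python
import posixpath

ALLOWED_SCHEMES = ("gds://", "s3://")


def validate_outputs(outputs):
    """Return the output-rule violations that can be checked locally (empty list if none)."""
    problems = []
    seen = set()
    for i, item in enumerate(outputs):
        path, url = item.get("path", ""), item.get("url", "")
        if not posixpath.isabs(path):
            problems.append(f"output {i}: path must be absolute")
        normalized = posixpath.normpath(path)
        if normalized in seen:
            problems.append(f"output {i}: path {path} is reused")
        seen.add(normalized)
        if not url.startswith(ALLOWED_SCHEMES):
            problems.append(f"output {i}: URL must start with gds:// or s3://")
    return problems
```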

In addition to the outputs generated by the task execution, _manifest.json and _tags.json files are uploaded to each mounted output location. These files contain information about the files uploaded to that specific location.

File             Description
_manifest.json   Records metadata for all uploaded files, including the following:
                   • Relative path where the file was uploaded
                   • md5 checksum
                   • File size in bytes
                   • UTC timestamp when the file was uploaded
_tags.json       Records the UTC timestamp when the uploads completed and the task run ID

AWS S3 Outputs

To write outputs to a private S3 bucket, the credentials to that bucket must be provided in the "credentials" field of the "outputs", and the "storageProvider" must be set to "aws". A substitution is required for each of the fields in credentials when defined in a task version. The following are the valid keys that can be provided in credentials:

  • AWS_ACCESS_KEY_ID

  • AWS_SECRET_ACCESS_KEY

  • AWS_SESSION_TOKEN

There are two ways to provide access keys. For permanent credentials, include the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY. For temporary credentials, include the AWS_ACCESS_KEY_ID, the AWS_SECRET_ACCESS_KEY, and the AWS_SESSION_TOKEN.

The following is an example of an execution specification that, when launched, will output logs to a private S3 location:

{
    "image": {
        "name": "ubuntu"
    },
    "command": "bash",
    "args": [  "-c", "sleep 20" ],
    "inputs": [ ],
    "outputs": [
        {
            "path": "/var/log/tessystemlogs",
            "url": "s3://bucket/folder",
            "storageProvider": "aws",
            "credentials": {
                "AWS_ACCESS_KEY_ID": "{{accessKeyId}}",
                "AWS_SECRET_ACCESS_KEY": "{{secretAccessKey}}",
                "AWS_SESSION_TOKEN": "{{sessionToken}}"
            }
        }
    ],
    "systemFiles": {
        "url": "gds://taskruns"
    },
    "retryLimit": 0
}

Private Image Repositories

TES supports running Docker-based images from public or private repositories. For images stored in a private repository, such as a private Docker repo or a private AWS ECR, access must be provided through credentials or an AWS policy.

Private AWS ECR

For images in a private AWS ECR, a policy will need to be added to the ECR to allow TES to pull images. For instructions on creating AWS policies, see Amazon ECR Repository Policy Examples. The following policy should be added to an AWS ECR to allow TES to access images:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<platform_aws_account>:root"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:DescribeImages",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:ListImages"
      ]
    }
  ]
}

Substitute <platform_aws_account> with the platform AWS account ID: 079623148045.

Setting this policy allows any ICA user to specify the image for task runs, so make sure no private data is stored in the images in the AWS ECR.

Example AWS ECR Image

{
    "image": {
        "name": "<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/my-image",
        "tag": "latest"
    },
    "command": "bash",
    "args": [ "-c", "sleep 20" ],
    "systemFiles": {
        "url": "gds://taskruns"
    },
    "retryLimit": 0
}

Private Docker Hub

To provide access to private Docker Hub images, the username and password for the account hosting the image must be provided. For security, TES requires the password to be provided at launch time. Task versions requiring a password must use a substitution for the password so that it can be supplied in the launch arguments.

The following is an example of a task execution specification that can be provided with the image password at launch time:

{
    "image": {
        "name": "ubuntu",
        "credentials": {
            "username": "user1",
            "password": "{{password}}"
        }
    },
    "command": "bash",
    "args": [  "-c", "sleep 20" ],
    "systemFiles": {
        "url": "gds://taskruns"
    },
    "retryLimit": 0
}

When this task version is launched, the password is provided in the launch arguments as follows:

{
    "password": "myPassword"
}

Bring Your Own Docker

For information on how to create a docker image for execution in TES, see Create a Base Image.

For information on how to make your image available for TES to run, see Push Images to Docker Cloud.

TES follows the Kubernetes convention for launching containers: the Docker image's ENTRYPOINT is overridden by the "command" field, and the Docker image's CMD is overridden by the "args" field in the task execution body. The "image" field should match the image name in Docker Hub. To pull images from a private repository, provide the image credentials in the execution object.

Currently TES does not support the array syntax for "command"; only a string can be provided. If your Docker image requires the array syntax, define the ENTRYPOINT as an array in the image itself and do not specify "command" in the task execution specification.

{
    "image": {
        "name": "privateImage",
        "tag": "imageTag",
        "credentials": {
            "username": "username",
            "password": "{{password}}"
        }
    },
    "command": "ENTRYPOINT override",
    "args": [ "CMD 1 override", "CMD 2 override" ],
    "systemFiles": {
        "url": "gds://taskruns"
    },
    "retryLimit": 0
}
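Under the Kubernetes convention, the process that actually runs depends on which of "command" and "args" are set. The sketch below models those rules, including the Kubernetes behavior where setting a command without args also drops the image's CMD. Whether TES matches Kubernetes in that last case, and treating the command string as a single executable rather than splitting it on spaces, are assumptions; the function name is hypothetical.

```python
def effective_argv(entrypoint, cmd, command=None, args=None):
    """Return the argv a container would run under the Kubernetes launch convention.

    entrypoint and cmd are the image's ENTRYPOINT and CMD (exec form, as lists);
    command and args are the task execution body's "command" string and "args" list.
    """
    if command is not None:
        # command replaces ENTRYPOINT; as in Kubernetes, the image CMD is dropped too,
        # so only the task's args (if any) follow it.
        # Assumption: the command string is one executable, not split on spaces.
        return [command] + list(args or [])
    if args is not None:
        # args alone replace CMD and are appended to the image ENTRYPOINT.
        return list(entrypoint) + list(args)
    # Neither is set: the image defaults apply unchanged.
    return list(entrypoint) + list(cmd)
```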

Working Directory

The working directory of the task run may be set using the "workingDirectory" field within the execution body. The value must be an absolute path, and it overrides the WORKDIR set in the image's Dockerfile. For more information, see the Dockerfile reference.
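Setting the field amounts to adding one key to the execution body. In this hypothetical sketch the helper name is illustrative, and /scratch (the path recommended above for intermediate files) is used only as an example value.

```python
import posixpath


def with_working_directory(execution, directory):
    """Return a copy of the execution body with workingDirectory set; TES requires an absolute path."""
    if not posixpath.isabs(directory):
        raise ValueError(f"workingDirectory must be absolute, got {directory!r}")
    return {**execution, "workingDirectory": directory}
```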

Performance Optimizations

If a task container produces thousands of output files that are all uploaded under the same folder (for example, through 1,000 output mount paths that map to the same folder), AWS S3 may fail to save all of them. S3 partitions objects by key prefix, so the recommendation in such cases is to spread the files across different prefixes. This InfoQ article provides a detailed explanation.
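One way to follow the different-prefix recommendation is to have the application write each file into one of a fixed number of sub-folders chosen by hashing its name, with one output mapping per sub-folder. This is a hypothetical sketch, not a platform feature: the part-NN naming, the fan-out of 16, and the hashing scheme are illustrative choices.

```python
import hashlib


def shard_name(filename, fanout=16):
    """Deterministically assign a file to one of `fanout` sub-folders, e.g. "part-07"."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return f"part-{int(digest, 16) % fanout:02d}"


def sharded_outputs(local_root, url_root, fanout=16):
    """One output mapping per sub-folder, so uploads land under distinct prefixes."""
    return [
        {"path": f"{local_root}/part-{i:02d}", "url": f"{url_root.rstrip('/')}/part-{i:02d}"}
        for i in range(fanout)
    ]
```

The application would then write a file named name to f"{local_root}/{shard_name(name)}/{name}".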

Instance Retry

Task runs may be interrupted unexpectedly, for example when the compute instance hosting the task run job fails. Failure causes include the following:

  • Hardware failures

  • Instance eviction by the cloud infrastructure

  • The task run application exits with a non-zero exit code

To prevent unexpected task run failures, a retryLimit can be provided to specify how many times a task run is retried after an unexpected job failure. The retryLimit field is specified in the execution body as an integer between 0 and 6, with a default of 3. When developing and testing a task run, a retryLimit of 0 is recommended.

{
    "image": {
        "name": "ubuntu",
        "credentials": {
            "username": "user1",
            "password": "{{password}}"
        }
    },
    "command": "bash",
    "args": [  "-c", "sleep 20" ],
    "systemFiles": {
        "url": "gds://taskruns"
    },
    "retryLimit": 0
}
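A client can enforce the documented range before submitting a body. In this hypothetical sketch the helper name is illustrative; its default of 3 mirrors the service default, and passing 0 matches the recommendation for development and testing.

```python
def with_retry_limit(execution, retry_limit=3):
    """Return a copy of the execution body with retryLimit set, enforcing the documented 0-6 range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(retry_limit, bool) or not isinstance(retry_limit, int):
        raise TypeError("retryLimit must be an integer")
    if not 0 <= retry_limit <= 6:
        raise ValueError("retryLimit must be between 0 and 6")
    return {**execution, "retryLimit": retry_limit}
```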
