Task Execution
The basic building blocks of pipelines are Tools: containerized applications executed on a distributed cloud infrastructure with defined compute resources and execution environment conditions. The platform runs Tools through the Task Execution Service (TES), which hosts a suite of APIs for launching and monitoring the execution of the containerized applications. The TES APIs operate on the Task resource model. During pipeline execution, the Tool definition of each step is translated to a task. Each task's resource model includes an execution specification containing all the information TES needs to provision and launch the containerized application.
Requesting Resources
The Task Execution Service supports different compute types depending on the values provided in the `execution.environment.resources` section of the task version or task run body.
ℹ️ Queued task runs are fulfilled as resources become available. Ordering is not guaranteed.
Type and Size
For the type and size fields, you can select from the following combinations:
Type | Size | CPU | Memory |
---|---|---|---|
standard | small | 0.8 CPU | 3 GB |
standard | medium | 1.3 CPU | 4.5 GB |
standard | large | 2 CPU | 7 GB |
standard | xlarge | 4 CPU | 14 GB |
standard | xxlarge | 8 CPU | 28 GB |
standardHiCpu | small | 15.5 CPU | 28 GB |
standardHiCpu | medium | 35.5 CPU | 68 GB |
standardHiCpu | large | 71.5 CPU | 140 GB |
standardHiMem | small | 7.5 CPU | 60 GB |
standardHiMem | medium | 15.5 CPU | 124 GB |
standardHiMem | large | 47.5 CPU | 380 GB |
standardHiMem | xlarge | 95.5 CPU | 764 GB |
fpga | small | 7.5 CPU | 118 GB |
fpga | medium | 15.5 CPU | 240 GB |
fpga | large | 63.5 CPU | 972 GB |
FPGA compute types have limited system-wide availability. If the system is under heavy load, the run may not be scheduled for an extended period of time.
The `fpga`, `large` compute type is unavailable in the cac1 and aps2 regions.
The exact memory and CPU resources provisioned for a task run may vary for a given compute type in the table above. This is done to optimize for availability to ensure a job is scheduled in a timely manner while satisfying the minimum resources requested. This may result in slight variations in a task run's performance and duration when executed on the same inputs multiple times.
If you do not specify a resource size and type, then the task is executed on the smallest instance available when the request is made.
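As a sketch, the type and size are requested in the resources section of the execution body (field casing and the surrounding structure are assumed here and may differ from the actual schema):

```json
{
  "execution": {
    "environment": {
      "resources": {
        "type": "standardHiMem",
        "size": "medium"
      }
    }
  }
}
```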
Tier
Compute resource tiers provide pricing options to save cost at the risk of making runs more susceptible to capacity limitations. Choosing a low-cost tier schedules the task run on a compute node that may be interrupted and re-provisioned when the system is under load. This works well for short-running jobs on smaller compute types and sizes that are tolerant of interruption. For long-running jobs on more powerful compute types and sizes, the chance of interruption increases and may severely impact total run duration.
Tier | Description |
---|---|
economy | Lowest cost option. The run may be interrupted, and upon interruption or failure it will continue to be rescheduled on interruptible nodes. |
standard | Highest cost option. The run is scheduled on a non-interruptible node and is rescheduled to non-interruptible nodes when restarted upon interruption or failure. |
FPGA compute types are limited to the standard tier in the regions below.
London (euw2)
Canada (cac1)
Singapore (aps1)
Frankfurt (euc1)
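A minimal sketch of requesting a tier alongside the type and size (field casing and nesting are assumed; the exact schema may differ):

```json
{
  "execution": {
    "environment": {
      "resources": {
        "type": "standard",
        "size": "small",
        "tier": "economy"
      }
    }
  }
}
```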
Environment Variables
Environment variables may be set in the container executing the task run.
Secure Environment Variables
Environment variables may be secured to hide them from log outputs and API responses. Use the `SECURE_` prefix to mark an environment variable as secure.
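The snippet below sketches how environment variables might be declared; the `variables` field name and nesting are assumptions for illustration, and `SECURE_API_TOKEN` shows the `SECURE_` prefix applied to a variable name:

```json
{
  "execution": {
    "environment": {
      "variables": {
        "MY_SETTING": "some-value",
        "SECURE_API_TOKEN": "{{apiToken}}"
      }
    }
  }
}
```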
Substitution
Within the execution body, either a static value or a substitution can be provided for a field's value. A substitution is a string wrapped with `{{<string>}}` and allows the actual value to be specified at launch time using arguments. Substitutions can be reused for multiple fields and can be embedded within a string value.

Certain fields, like passwords or secrets, require substitutions to prevent secrets from being stored with the version. The value of the secret is replaced in response bodies with `"<hidden>"` rather than the value itself.
The following is an example task execution specification leveraging substitutions:
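(The image name, argument names, and substitution names below are hypothetical; the overall shape is a sketch and may differ from the actual schema.)

```json
{
  "execution": {
    "image": "myrepo/mytool:latest",
    "command": "mytool",
    "args": ["--sample", "{{sampleName}}"],
    "inputs": [
      { "path": "/data/input", "url": "{{inputUrl}}" }
    ]
  }
}
```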
The input arguments are then provided in the `arguments` of the version launch request.
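For instance, substitution values might be supplied like this (the argument names are hypothetical and the shape of `arguments` is an assumption):

```json
{
  "arguments": {
    "sampleName": "NA12878",
    "inputUrl": "gds://volume1/inputs/sample1"
  }
}
```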
Logging
Throughout a task run's life-cycle, several logs and system-related files are produced with information about the execution of the job. TES requires the user to provide an external location to serve as the file store for these files. A "systemFiles" field in the execution body is used to provide a URL and optional credentials (similar to an output mapping) for storing these files.
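A hedged sketch of the "systemFiles" field (optional credentials omitted; the exact schema may differ):

```json
{
  "systemFiles": {
    "url": "gds://taskruns"
  }
}
```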
During the execution of the task run, TES creates a folder whose name matches the task run ID (i.e., `trn.<uid>`) directly under the final path component of the provided URL. For example, if the above task run is executed, a folder is created at `gds://taskruns/trn.<uid>`, where `trn.<uid>` is the unique ID assigned to the task run resource. Any system-related files are stored within the `trn.<uid>` folder.
stdout/stderr
During task run execution, the stdout and stderr of executing processes are redirected to `/var/log/tessystemlogs/task-stdouterr.log`. Other container log artifacts are placed in the `/var/log/tessystemlogs` folder. These log files are uploaded every 3 seconds and can be accessed while the task run is executing.

An output mapping may be specified with `/var/log/tessystemlogs` as the path to send the folder contents to a location other than the URL specified in the "systemFiles" field.
The following is an example of a task execution specification that sends logs stored in `/var/log/tessystemlogs` to `gds://volume1/myLogs`. Any other system-related files produced by the task run job are sent to the "systemFiles" URL, `gds://taskruns`:
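A sketch of such a specification fragment (exact schema may differ):

```json
{
  "systemFiles": { "url": "gds://taskruns" },
  "outputs": [
    {
      "path": "/var/log/tessystemlogs",
      "url": "gds://volume1/myLogs"
    }
  ]
}
```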
Other logs
The following is a table of the files produced during a task run's life-cycle. Some files are only produced under certain conditions.
File | Description |
---|---|
bsfs-stdouterr.log | Logs associated with mounting input files |
logging-stdouterr.log | Logs associated with the logging container |
output#-stdouterr.log | Logs associated with uploading to each output location (# is replaced with the index of the output in the execution body) |
task-stdouterr.log | stdout/stderr of the application |
_manifest.json | Records metadata for all uploaded files |
_tags.json | Records the UTC timestamp when the uploads completed and the task run ID |
Marshalling Data
Inputs
The task execution body provides an array of inputs that will be attached to volume mounts on the running instance of the task. Each object in the inputs array must contain a "path" and a "url". The path is the volume mount location in the container to which the input will be downloaded. The url is the external location the input will be downloaded from.
Input paths are made read-only at the last folder component in the path. In general, applications run on TES should use the `/scratch` path for intermediate files. The DRAGEN application should use the `/ephemeral` path for intermediate files.
Inputs must meet the following conditions:
- The path must be absolute.
- The same path must not be reused for multiple inputs.
- The path must not lead to any of the following: `/`, `/usr`, `/var`, `/log`, `/lib`, `/usr/bin`.
- HTTP-based URLs must not require authentication.
- GDS-based URLs must be accessible by the token used to launch the task run.
- S3-based URLs must provide valid credentials (see below).
TES currently supports a maximum of 20,000 input files, including files mounted with a folder input.
Input File Example
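A hedged sketch of a single file input (the `type` field is an assumption; volume and file names are illustrative):

```json
{
  "inputs": [
    {
      "path": "/data/inputs/sample1.bam",
      "url": "gds://volume1/samples/sample1.bam",
      "type": "file"
    }
  ]
}
```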
Input Folder Example
When you specify a folder as input, the "systemFiles" property must also be set.
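A hedged sketch of a folder input with the required "systemFiles" property (field names partly assumed; volume and folder names illustrative):

```json
{
  "inputs": [
    {
      "path": "/data/reference",
      "url": "gds://volume1/reference",
      "type": "folder"
    }
  ],
  "systemFiles": { "url": "gds://taskruns" }
}
```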
AWS S3 Inputs
To read inputs from a private S3 bucket, the credentials to that bucket must be provided in the credentials field of the inputs, and the storageProvider must be set to aws. A substitution is required for each of the fields in credentials when defined in a task version. The following are the valid keys that can be provided in credentials:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
There are two ways to provide access keys. For permanent credentials, include the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY. For temporary credentials, include the AWS_ACCESS_KEY_ID, the AWS_SECRET_ACCESS_KEY, and the AWS_SESSION_TOKEN.
The following is an example of a task execution specification that reads inputs from a private S3 location:
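A sketch (the bucket name is illustrative, the substitution names are hypothetical, and the exact schema may differ):

```json
{
  "inputs": [
    {
      "path": "/data/input",
      "url": "s3://my-private-bucket/inputs",
      "storageProvider": "aws",
      "credentials": {
        "AWS_ACCESS_KEY_ID": "{{awsAccessKeyId}}",
        "AWS_SECRET_ACCESS_KEY": "{{awsSecretAccessKey}}"
      }
    }
  ]
}
```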
Download Mode
By default, input resources are streamed to the task run job during execution. It may be preferable to force the complete download of certain resources prior to executing the command. For example, applications that use a random access pattern need the complete file contents available. Inputs may be specified as requiring download using the "mode" field with a value of "download". Available options for the mode include "download" and "stream" (default).
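For example, a single input might be forced to download rather than stream (sketch; file and volume names illustrative):

```json
{
  "path": "/data/reference.fa",
  "url": "gds://volume1/reference/reference.fa",
  "mode": "download"
}
```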
Manifest type input
Each TES task allows a maximum of 128 files of type `file` in the `inputs` list. To launch a task with a very large number of inputs, use a single input of type `manifest`.
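A hedged sketch of a manifest-type input (the presigned URL is a placeholder; field names may differ from the actual schema):

```json
{
  "inputs": [
    {
      "path": "/manifest/mount/path",
      "url": "https://example.com/presigned/manifest.json",
      "type": "manifest",
      "mode": "download"
    }
  ]
}
```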
Here, `mode` can be either `download` or `stream` and is applied to all input files in the manifest. The value of `url` is an HTTPS-based presigned URL of the manifest JSON file (for a GDS-based manifest JSON, call the GDS API to get its presigned URL). The manifest JSON is a list of input items, each in the following format:
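A sketch of one manifest item (the presigned URL is a placeholder and the size value is purely illustrative):

```json
{
  "url": "https://example.com/presigned/hg38_alt_aware_nohla.fa",
  "size": 123456789,
  "path": "reference/fasta/hg38_alt_aware_nohla.fa"
}
```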
Here, `url` is the presigned URL of each input file, `size` is the exact size in bytes of the input file, and `path` is the intended mount path relative to the path of the manifest itself inside the container. For instance, in the above example, the absolute mount path of the file `hg38_alt_aware_nohla.fa` is `/manifest/mount/path/reference/fasta/hg38_alt_aware_nohla.fa`.
Note that GDS and S3 folders are not supported inside a manifest. The presigned URLs of all files under a folder must be generated individually before being added to the manifest JSON.
Only one input of type `manifest` is allowed in the `inputs` list of a task launch request. Any additional inputs (of type `file` or `folder`) are ignored; they should be included in the manifest JSON instead.
The manifest JSON can be gzipped. The maximum size of the input manifest JSON is 1 GB (gzipped or uncompressed).
Outputs
The task execution body provides an array of outputs to upload files and folders local to the task container to an external URL. Similar to inputs, each object in the outputs array must contain a "path" and a "url". The contents of the path will be uploaded to the mapped URL.
Requirements:
- The path must be absolute.
- The same path must not be reused for multiple outputs.
- The path and URL must lead to a folder.
- The URL scheme must be one of the following: `gds://`, `s3://`.
- GDS-based URLs must be accessible by the token used to launch the task run.
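A minimal sketch of an output mapping (volume and folder names illustrative; exact schema may differ):

```json
{
  "outputs": [
    {
      "path": "/output/results",
      "url": "gds://volume1/analysis/results"
    }
  ]
}
```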
In addition to the outputs generated by the task execution, a `_manifest.json` and a `_tags.json` file are uploaded to each mounted output location. These files contain information about the files uploaded to that specific mount location.
File | Description |
---|---|
_manifest.json | Records metadata for all uploaded files |
_tags.json | Records the UTC timestamp when the uploads completed and the task run ID |
AWS S3 Outputs
To write outputs to a private S3 bucket, the credentials to that bucket must be provided in the "credentials" field of the "outputs", and the "storageProvider" must be set to "aws". A substitution is required for each of the fields in credentials when defined in a task version. The following are the valid keys that can be provided in credentials:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
There are two ways to provide access keys. For permanent credentials, include the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY. For temporary credentials, include the AWS_ACCESS_KEY_ID, the AWS_SECRET_ACCESS_KEY, and the AWS_SESSION_TOKEN.
The following is an example of an execution specification that, when launched, will output logs to a private S3 location:
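A sketch (the bucket name is illustrative, the substitution names are hypothetical, and the exact schema may differ):

```json
{
  "outputs": [
    {
      "path": "/output/results",
      "url": "s3://my-private-bucket/results",
      "storageProvider": "aws",
      "credentials": {
        "AWS_ACCESS_KEY_ID": "{{awsAccessKeyId}}",
        "AWS_SECRET_ACCESS_KEY": "{{awsSecretAccessKey}}",
        "AWS_SESSION_TOKEN": "{{awsSessionToken}}"
      }
    }
  ]
}
```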
Private Image Repositories
TES supports running Docker-based images from public or private repositories. For images stored in a private repository, such as a private Docker repo or a private AWS ECR, access must be provided through credentials or an AWS policy.
Private AWS ECR
For images in a private AWS ECR, a policy will need to be added to the ECR to allow TES to pull images. For instructions on creating AWS policies, see Amazon ECR Repository Policy Examples. The following policy should be added to an AWS ECR to allow TES to access images:
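A typical ECR repository policy granting pull access looks like the following; the statement ID is arbitrary, and the exact actions TES requires may differ (this is a sketch based on standard ECR pull permissions):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPlatformPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<platform_aws_account>:root"
      },
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability"
      ]
    }
  ]
}
```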
Substitute `<platform_aws_account>` with the platform AWS account ID: `079623148045`.
Setting this policy allows the image to be specified for task runs by any ICA users, so it is important to ensure no private data is stored on the images in the AWS ECR.
Example AWS ECR Image
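An ECR image is referenced by its full registry URI; in the sketch below, the account ID, region, repository, and tag are all placeholders:

```json
{
  "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/myrepo/mytool:1.0"
}
```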
Private Docker Hub
To provide access to private Docker Hub images, the user name and password for the account hosting the image must be provided. For security, TES requires the password to be provided at launch time. Task versions requiring a password must use a substitution so the password can be provided in the launch arguments.
The following is an example of a task execution specification that can be provided with the image password at launch time:
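A hedged sketch (the credential field names are assumptions; the repository name is illustrative):

```json
{
  "execution": {
    "image": {
      "name": "myuser/private-tool:latest",
      "credentials": {
        "username": "myuser",
        "password": "{{dockerPassword}}"
      }
    }
  }
}
```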
When this task version is launched, the password is provided in the launch arguments as follows:
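For instance (the argument name is hypothetical and the shape of `arguments` is assumed):

```json
{
  "arguments": {
    "dockerPassword": "example-password"
  }
}
```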
Bring Your Own Docker
For information on how to create a docker image for execution in TES, see Create a Base Image.
For information on how to make your image available for TES to run, see Push Images to Docker Cloud.
TES adopted the Kubernetes convention for launching, as follows: the Docker image's ENTRYPOINT is overridden by the "Command" field, and the Docker image's CMD is overridden by the "Args" field in the task execution body. The "Image" field should match the image name in Docker Hub. To pull images from a private repository, you can provide the image credentials in the execution object.
Currently TES does not support the array syntax for "Command". Only a string can be provided. If your Docker image requires the array syntax, it must be enabled in the image itself by specifying the ENTRYPOINT as an array, and "Command" must not be specified in the task execution specification.
Working Directory
The working directory of the task run may be set using the "workingDirectory" field within the execution body. The value specified must be an absolute path, and will set the WORKDIR of the Dockerfile. For more information, see the Dockerfile reference.
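A sketch of setting the working directory (the exact placement within the schema may differ; `/scratch` is used here only as an illustrative absolute path):

```json
{
  "execution": {
    "workingDirectory": "/scratch"
  }
}
```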
Performance Optimizations
If the task container produces thousands of output files that are all mapped to the same folder, and therefore to the same output prefix, AWS S3 may fail to save all of them: S3 uses the first prefix as the partition key. The recommendation in such cases is to distribute files across different prefixes. This InfoQ article provides a detailed explanation.
Instance Retry
Task runs may experience an unexpected interruption where the compute instance hosting the task run job fails. Failure causes include the following:
Hardware failures
Instance eviction by the cloud infrastructure
The task run application fails with a non-zero exit code
To prevent unexpected task run failures, a retryLimit can be provided to specify the number of attempts a task run should be retried if an unexpected job failure occurs. The retryLimit field is specified in the execution body as an integer between 0 and 6 with a default of 3. When developing and testing a task run, it's recommended to use a retryLimit of 0.
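For example, to disable retries during development (sketch; the exact placement within the execution body may differ):

```json
{
  "execution": {
    "retryLimit": 0
  }
}
```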