Metadata Models

Illumina Connected Analytics allows you to create and assign metadata to capture additional information about samples.

Each tenant has one root metadata model that is accessible to all projects in the tenant. This allows an organization to collect the same piece of information for every sample in every project in the tenant, such as an ID number. Within this root model, you can configure multiple metadata submodels, even at different levels.

Illumina recommends that you limit the number of fields or field groups you add to the root model. Any misconfigured item in the root model carries over into all other metadata models in the tenant, and once a root model is published, the fields and groups defined within it cannot be deleted. Consider creating submodels first before adding anything to the root model.

When configuring a project, you can assign one published metadata model to all samples in the project. This can be any published metadata model in the tenant: the root model, a submodel of the root model, or a submodel of a submodel. When a metadata model is selected for a project, all fields configured for that model and all fields in any parent models are applied to the samples in the project.

❗️ Illumina recommends that you limit the number of fields or field groups you add to the root model. You should first consider creating submodels before adding anything to the root model.

Metadata concepts

The following terminology is used within this page:

  • Metadata fields = Metadata fields will be linked to a sample in the context of a project. They can be of various types and could contain single or multiple values.

  • Metadata groups = A group bundles fields that belong together (for example, fields that all relate to quality metrics), so that the user can see they are related.

  • Root model = Model that is linked to the tenant. Because the root model is a parent of all other models, every metadata model that you link to a project also contains the fields and groups specified in the root model. This is a subcategory of a project metadata model.

  • Child/Sub model = Any metadata model that is not the root model. Child models will inherit all fields and groups from their parent models. This is a subcategory of a project metadata model

  • Pipeline model = Model that is linked to a specific pipeline and not a project

Metadata in the context of ICA always gives information about a sample. It can be provided by the user, by a pipeline, or via the API. There are two general categories of metadata models: the project metadata model and the pipeline metadata model. Both are built from metadata fields and groups. A project metadata model is specific to a tenant, while a pipeline metadata model is linked to a pipeline and can be shared across tenants. These models are defined by users.

Each sample can have multiple metadata models. Whenever you link a project metadata model to your project, its groups and fields are present on each sample. The root model of the tenant is also present, because every metadata model inherits the groups and fields specified in its parent metadata model(s). When a pipeline that contains a metadata model is executed with a sample, its groups and fields are also present for each analysis that results from the pipeline execution.
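The inheritance rule described above can be illustrated with a minimal sketch. The `MetadataModel` class, the model names, and the field names are hypothetical stand-ins chosen for illustration; they are not part of any ICA API. The sketch only shows how the fields of a model combine with those of its parents, starting at the root.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataModel:
    """Hypothetical in-memory stand-in for a metadata model (not an ICA type)."""
    name: str
    fields: list[str] = field(default_factory=list)
    parent: Optional["MetadataModel"] = None


def effective_fields(model: MetadataModel) -> list[str]:
    """Return the fields a sample would show: root fields first, then each submodel."""
    chain = []
    current = model
    while current is not None:
        chain.append(current)
        current = current.parent
    collected = []
    for ancestor in reversed(chain):  # walk from the root down to the model
        collected.extend(ancestor.fields)
    return collected


# Assumed example hierarchy: root -> oncology -> oncology_wgs
root = MetadataModel("root", ["sample_id"])
oncology = MetadataModel("oncology", ["tumor_type"], parent=root)
wgs = MetadataModel("oncology_wgs", ["coverage"], parent=oncology)

print(effective_fields(wgs))  # ['sample_id', 'tumor_type', 'coverage']
```

Assigning `oncology_wgs` to a project in this sketch would expose all three fields on each sample, which is why a misconfigured root field affects every model in the tenant.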

Groups & fields

The following field types are used within ICA:

  • Text: Free text

  • Keyword: Text with auto-completion based on values that are already in use

  • Numeric: Only numbers

  • Boolean: True or false; cannot be configured as multiple value

  • Date: e.g. 23/02/2022

  • Date time: e.g. 23/02/2022 11:43:53, saved in UTC

  • Enumeration: Select a value from a drop-down list
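Because Date time values are saved in UTC, a local timestamp may need converting before it is entered. The following is a minimal sketch of that conversion. The format string is an assumption derived from the example above (23/02/2022 11:43:53), and the helper name `to_utc_string` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: day/month/year and 24-hour time, matching the example in the list above.
DATETIME_FORMAT = "%d/%m/%Y %H:%M:%S"


def to_utc_string(moment: datetime) -> str:
    """Convert a timezone-aware datetime to UTC and format it like the example."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime(DATETIME_FORMAT)


# A timestamp recorded at UTC+1 (illustrative value only).
local = datetime(2022, 2, 23, 12, 43, 53, tzinfo=timezone(timedelta(hours=1)))
print(to_utc_string(local))  # 23/02/2022 11:43:53
```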

The following properties can be selected for groups & fields:

  • Required: Pipeline can’t be started with this sample until the required group/field is filled in

  • Sensitive: Values of this group/field are only visible to project users within your own tenant. When a sample is shared across tenants, these fields are not visible

  • Filled by pipeline: Fields that need to be filled by pipeline should be part of the same group. This group will automatically be multiple value and values will be available after pipeline execution. This property is only available for groups

  • Multiple value: This group/field can consist of multiple (grouped) values

❗️ Fields cannot be both required and filled by pipeline
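The property constraints stated above can be expressed as a small consistency check. This is a sketch under stated assumptions: the `Item` class and `validate` function are hypothetical and simplified, they do not reflect how ICA stores or validates models, and they cover only the two conflicts named in this section.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical description of a field or group; not an ICA type."""
    name: str
    field_type: str = ""  # e.g. "Boolean"; left empty for groups
    required: bool = False
    filled_by_pipeline: bool = False
    multiple_value: bool = False


def validate(item: Item) -> list[str]:
    """Return a list of property conflicts described in this section."""
    problems = []
    if item.required and item.filled_by_pipeline:
        problems.append("cannot be both required and filled by pipeline")
    if item.field_type == "Boolean" and item.multiple_value:
        problems.append("Boolean cannot be multiple value")
    return problems


print(validate(Item("qc_passed", field_type="Boolean", multiple_value=True)))
print(validate(Item("mean_coverage", required=True, filled_by_pipeline=True)))
print(validate(Item("sample_id", field_type="Text", required=True)))
```

An empty list means that none of the conflicts listed here were found; it does not guarantee that ICA accepts the configuration.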

Project vs. Pipeline Metadata Models

A project metadata model has metadata linked to a specific project. Values are known upfront, general information is required for each sample in the project, and the model may include general mandatory company information.

A pipeline metadata model has metadata linked to a specific pipeline. Values are populated during pipeline execution, which requires the pipeline to produce an output file named 'metadata.response.json'.

❗️ Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled

Metadata Actions

Publish a Metadata Model

Newly created and updated metadata models are not available for use within the tenant until they are published. Once a metadata model is published, its fields and field groups cannot be deleted, but their names and descriptions can still be edited. A model can only be published after all of its parent models have been published.

Retire a Metadata Model

If a published metadata model is no longer needed, you can retire the model (except the root model).

  1. First, check if the model contains any submodels. A model cannot be retired if it contains any published submodels.

  2. When you are certain you want to retire a model and all submodels are retired, click on the three dots in the top right of the model window, and then select Retire Metadata Model.
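The publish and retire rules above can be summarized in a short sketch. The `Model` class, the status labels ("draft", "published", "retired"), and both helper functions are assumptions made for illustration; ICA enforces these rules itself, and this code does not call any ICA API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Model:
    """Hypothetical stand-in for a metadata model in a parent/child tree."""
    name: str
    status: str = "draft"  # assumed labels: "draft", "published", "retired"
    parent: Optional["Model"] = None
    children: list["Model"] = field(default_factory=list)

    def add_child(self, child: "Model") -> "Model":
        child.parent = self
        self.children.append(child)
        return child


def can_publish(model: Model) -> bool:
    """A model can be published only when every parent model is published."""
    current = model.parent
    while current is not None:
        if current.status != "published":
            return False
        current = current.parent
    return True


def has_published_submodel(model: Model) -> bool:
    """Check all submodels, at any depth, for a published one."""
    return any(
        child.status == "published" or has_published_submodel(child)
        for child in model.children
    )


def can_retire(model: Model) -> bool:
    """The root model cannot be retired, nor can a model with published submodels."""
    if model.parent is None:
        return False
    return model.status == "published" and not has_published_submodel(model)


root = Model("root", status="published")
clinical = root.add_child(Model("clinical"))
clinical_wgs = clinical.add_child(Model("clinical_wgs"))

print(can_publish(clinical))      # True: the root is published
print(can_publish(clinical_wgs))  # False: its parent is still a draft
```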

Assign a Metadata Model to a Project

To add metadata to your samples, you first need to assign a metadata model to your project.

  1. Go to Projects > your_project > Project Settings > Details.

  2. Select Edit.

  3. From the Metadata Model drop-down list, select the metadata model you want to use for the project.

  4. Select Save. All fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.

Add Metadata to Samples Manually

To manually add metadata to samples in your project, do as follows.

  1. Make sure that a metadata model is assigned to your project.

  2. Go to Projects > your_project > Samples > your_sample.

  3. Double-click your sample to open the sample details.

  4. Enter all metadata information as it applies to the selected sample. All required metadata fields must be populated or the pipeline cannot start.

  5. Select Save.
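Since a pipeline cannot start while required metadata fields are empty, it can help to check a sample's values before launching. The sketch below assumes the required field names and the sample values are already available as plain Python data; how they are obtained is outside its scope, and the function name and field names are hypothetical.

```python
def missing_required(required_fields: list[str], sample_values: dict) -> list[str]:
    """Return the required field names that have no value for this sample."""
    # None, an empty string, and an empty list count as missing; 0 and False do not.
    return [
        name
        for name in required_fields
        if sample_values.get(name) in (None, "", [])
    ]


# Illustrative values only.
required = ["sample_id", "collection_date", "qc_passed"]
values = {"sample_id": "S-001", "collection_date": "", "qc_passed": False}

print(missing_required(required, values))  # ['collection_date']
```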

Populating a Pipeline Metadata Model

To fill metadata by pipeline executions, a pipeline model must be created.

  1. In the Illumina Connected Analytics main navigation, go to Projects > your_project > Flow > Pipelines > your_pipeline.

  2. Double-click on your pipeline to open the pipeline details.

  3. Create or edit your model on the Metadata Model tab. Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.

In order for your pipeline to fill the metadata model, an output file with the name metadata.response.json must be generated. After adding your group fields to the pipeline model, click on Generate example JSON to view the required format for your pipeline.

❗️ The field names cannot have . in them, e.g. for the metric name Q30 bases (excl. dup & clipped bases) the . after excl must be removed.
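A pipeline step could prepare field names and write the output file along the following lines. This is a sketch only: the helper names are hypothetical, and the content of `payload` is deliberately left to the caller because the required JSON structure must be taken from the Generate example JSON view for your own model.

```python
import json
from pathlib import Path


def sanitize_field_name(name: str) -> str:
    """Remove '.' characters, which are not allowed in field names."""
    return name.replace(".", "")


def write_metadata_response(payload: dict, output_dir: str) -> Path:
    """Write the payload to metadata.response.json in the given output directory.

    The payload must already follow the format shown by Generate example JSON;
    this function does not build or validate that structure.
    """
    target = Path(output_dir) / "metadata.response.json"
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return target


print(sanitize_field_name("Q30 bases (excl. dup & clipped bases)"))
# Q30 bases (excl dup & clipped bases)
```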

Pushing Metadata Metrics to Base

Populating the metadata models of samples gives you a sample-centric view of all the metadata. You can also synchronize that data into your project's Base warehouse.

  1. In the Illumina Connected Analytics main navigation, select Projects.

  2. In your project menu, select Schedule.

  3. Select 'Add new', and then click on the Metadata Schedule option.

  4. Type a name for your schedule, optionally add a description, and select whether the metadata source is the current project or the entire tenant. You can also select whether ICA references are anonymized and whether sensitive metadata fields are included. As a reminder, values of sensitive metadata fields are not visible to users outside of the project.

  5. Select Save.

  6. Navigate to Tables under the BASE menu in your project.

  7. Two new table schemas should be added with your current metadata models.
