Base

Introduction to Base

Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution that provides information management and knowledge mining. Users can analyze, aggregate, and query data for new insights that can inform and improve diagnostic assay development, clinical trials, patient testing, and patient care. For this, all clinically relevant data generated from routine clinical testing needs to be extracted, and clinical questions need to be asked across all data and information sources. As a large data store, Base provides a secure and compliant environment in which to accumulate data, allowing for efficient exploration of the aggregated data. This data consists of test results, patient data, metadata, reference data, consent and QC data.

Base User Personas and Use Cases

Base can be used by different user personas supporting different use cases:

  • Clinical and Academic Researchers:

    • Big data storage solution housing all aggregated sample test outcomes

    • Analyze information by way of a convenient query formalism

    • Look for signals in combined phenotypic and genotypic data

    • Analyze QC patterns over large cohorts of patients

    • Securely share (sub)sets of data with other scientists

    • Generate reports and analyze trends in a straightforward manner

  • Bioinformaticians:

    • Access, consult, audit, and query all relevant data and QC information for tests run

    • Use all accumulated data and accessible pipelines to investigate and improve bioinformatics for clinical analysis

    • Warehouse metadata captured via automatic pipeline version tracking, including the individual tools and/or reference files used during processing for each sample analyzed, the duration of the pipeline, the execution path of the different analytical steps, and, in case of failure, the exit codes

  • Product Developers and Service Providers:

    • Better understand the efficiency of kits and tests

    • Analyze usage, understand QC data trends, improve products

    • Store and aggregate business intelligence data such as lab identification, consumption patterns, and frequency, as well as render trends in test result outcomes and much more

Base Action Possibilities

  • Data Warehouse Creation: Desired data sets can be selected and aggregated. Typical data sets include available VCF and other suitable (meta)data files generated by the ICA platform, which can be complemented by additional public (or privately built) databases.

  • Report and Export: Once created, a data warehouse can be mined using standard database query instructions. All Base data is stored in a structured and easily accessible way. An interface allows for the selection of specific datasets and conditional reporting. All queries can be stored, shared, and re-used in the future. This standard functionality supports most expected basic mining operations, such as variant frequency aggregation (see the example query after this list). All result sets can be downloaded or exported in various standard data formats for integration in other reporting or analytical applications.

  • Detect Signals and Patterns: Extensive and detailed selection of subsets of patients or samples adhering to any imaginable set of conditions is possible. Users can, for example, group and list subjects based on (several) specific genetic variants in combination with patient characteristics such as therapeutic (outcome) information. The built-in integration with public datasets allows users to retrieve all relevant publications, or clinically significant information, for a single individual or a group of samples with a specific variant. Virtually any combination of stored sample and patient information allows for detecting signals and patterns with a single, simple query on the big data set.

  • Profile/Cluster patients: Use and re-analyze patient cohort information based on specific sample or individual characteristics. For instance, a researcher might want to run the next iteration of a clinical trial with only the patients that responded. Through integrated and structured consent information allowing for time-boxed use, combined with the capability to group subjects with a simple query, patients can be stratified and all relevant individuals exported with their genotypic and phenotypic information for further research.

  • Share your data: Data sharing is subject to strict ethical and regulatory requirements. Base provides built-in functionality to securely share (sub)sets of your aggregated data with third parties. All data access can be monitored and audited, so Base data can be shared with people inside and outside of an organization in a compliant and controlled fashion.
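
As an illustration, a basic variant frequency aggregation could be expressed as a standard SQL query along the following lines. This is only a sketch: the table and column names (variants, chrom, pos, ref, alt, sample_name) are hypothetical and depend on the schema you define for your own tables.

    -- Hypothetical sketch: count in how many samples each variant occurs
    -- (table and column names are placeholders, not part of the product)
    SELECT chrom, pos, ref, alt,
           COUNT(DISTINCT sample_name) AS sample_count
    FROM variants
    GROUP BY chrom, pos, ref, alt
    ORDER BY sample_count DESC;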

Access

Base is a module that can be found in a project. It is shown in the menu bar of the project.

❗️Before users can access Base:

  • On the domain level, Base needs to be included in the subscription

  • On the project level, the project owner needs to enable Base

  • On the user level, the project administrator needs to enable workgroups to access the Base pages

Permission to Enable Base

Access to activate the Base module is controlled by the subscription chosen when registering the account (full and premium subscriptions give access to Base). This happens automatically after the first user logs into the system for that account, so from the moment the account is up and running, the Base module is ready to be enabled.

Enable Base

When a user has created a project, they can go to the Base pages and click the Enable button. From that moment on, every user who has the proper permissions has access to the Base module in that project.

Only the project owner can enable Illumina Connected Analytics Base. Make sure that your subscription for the domain includes Base.

  1. In the project, select any page under Base.

  2. Select a bundle. The bundles available depend on your Illumina Connected Analytics subscription.

  3. Select Enable.

Access Base pages

Access to the projects and all modules located within the project is provided via the Team page within the project.

Tables

All tables created within Base are gathered on the Tables page. New tables can be created and existing tables can be updated or deleted here.

Create new table

To create a new table, click + New table on the Tables page. Tables can be created from scratch or from a template that was previously saved. Once a table is saved, it is no longer possible to edit the schema; only new fields can be added. To edit an existing schema, the original schema can be copied as text and pasted into a new empty table, where the necessary changes can be made before saving this new table.

Empty Table

To create a table from scratch, complete all fields as described in the following sections and click the Save button. Once saved, a job is created to build the table. To view table creation progress, navigate to the Activity page.

Table information

The table name is a required field and must be unique. The first character of the table name must be a letter, followed by letters, numbers, or underscores. The description is optional.

References

Including or excluding references is done by checking or unchecking the Include reference checkbox. When references are included, additional columns are added to the schema which can contain references to the data on the platform (listed below, with an example query after the list):

  • data_reference: reference to the data element in the Illumina platform from which the record originates

  • data_name: original name of the data element in the Illumina platform from which the record originates

  • sample_reference: reference to the sample in the Illumina platform from which the record originates

  • sample_name: name of the sample in the Illumina platform from which the record originates

  • pipeline_reference: reference to the pipeline in the Illumina platform from which the record originates

  • pipeline_name: name of the pipeline in the Illumina platform from which the record originates

  • execution_reference: reference to the pipeline execution in the Illumina platform from which the record originates

  • account_reference: reference to the account in the Illumina platform from which the record originates

  • account_name: name of the account in the Illumina platform from which the record originates
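
Once references are included, these columns can be used directly in queries to trace records back to their originating sample or pipeline. A minimal sketch in standard SQL, assuming a hypothetical user-created table named myresults:

    -- Hypothetical sketch: count records per sample and originating pipeline
    -- (the table name myresults is a placeholder; the reference columns come from the schema above)
    SELECT sample_name, pipeline_name, COUNT(*) AS record_count
    FROM myresults
    GROUP BY sample_name, pipeline_name;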

Schema

In an empty table, users can create a schema by adding and defining a field for each column of the table. The + Add field button can be found in the upper right-hand corner of the schema. At any time during the creation process it is possible to switch to the ‘edit as text’ mode and back. The text mode shows the schema as JSON code, whereas the original view shows the fields in a table (a rough JSON sketch is shown after the field list below).

Each field requires:

  • a name – this has to be unique

  • a type

    • String – collection of characters

    • Bytes – raw binary data

    • Integer – whole numbers

    • Float – fractional numbers

    • Numeric – any number

    • Boolean – only options are “true” or “false”

    • Timestamp – stores the number of (milli)seconds passed since the Unix epoch

    • Date – stores a date in the format YYYY-MM-DD

    • Time – stores a time in the format HH:MI:SS

    • Datetime – stores date and time information in the format YYYY-MM-DD HH:MI:SS

    • Record – has a child field

  • a mode – indicates whether the field value

    • is required

    • can be nullable

    • can be repeated

  • it is also possible (but not required) to indicate that the value of the field matches a database ID from the drop-down list. This creates a hyperlink that brings you to the database item.
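
As a rough illustration of the ‘edit as text’ mode, a schema with two fields could be represented as JSON along the following lines. This is an assumption-laden sketch: the field names are hypothetical and the exact keys and casing may differ from what the text mode actually produces, so it is best to create a field in the table view first and switch to text mode to see the exact representation.

    [
      {
        "name": "sample_id",
        "type": "String",
        "mode": "Required"
      },
      {
        "name": "mean_coverage",
        "type": "Float",
        "mode": "Nullable"
      }
    ]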

From Template

Users can create their own template by making a schema (by creating a new empty table) and clicking “Save as template”. An overview of all saved table templates in your account can be found on the Base Management page.

If a template is created and available/active, it is possible to create a new table based on this template. The table information and references follow the rules of the empty table, but in this case the schema will be pre-filled. It is still possible to edit the schema that is based on the template.

Table status

The status of a table can be found on the Tables page. The possible statuses are:

  • Available: Ready to be used, with or without data

  • Pending: The system is still processing the table; there is probably a process running to fill the table with data

  • Deleted: The table is functionally deleted; it still exists and can be shown in the list again by clicking the “Show deleted tables” button

Additional Considerations

  • Tables created empty or from a template become “Available” faster.

  • Copying a table with data takes longer; these tables can remain in a “Pending” state for longer periods of time.

  • Clicking on the page's refresh button will update the list.

Table details

For any available table, the following details can be found:

  • Table information: Name, description, number of records and data size

  • Schema definition: An overview of the table schema, also available as text. Fields can be added to the schema but not deleted. To delete fields, copy the schema as text and paste it into a new empty table, where the schema is still editable.

  • Preview: A preview of the first 50 rows of the table (when data is uploaded into the table)

  • Data: The files that are currently uploaded into the table.

Table actions

From within the details of a table it is possible to perform the following actions related to the table:

  • Copy: Create a copy of this table in the same or a different project. To copy to another project, data sharing must be enabled in the details of the original project. The user also needs access to both the original and the target project.

  • Export as file: Export this table as a CSV, JSON, or PARQUET file. The exported file can be found in the project, where users with the appropriate access can download it.

  • Save as template: Save the schema or an edited form of it as a template.

  • Add data: Load additional data into the table manually. This can be done by selecting data files previously uploaded to the project, or by dragging and dropping files directly into the popup window for adding data to the table. Data can also be loaded into a table manually or automatically via a pre-configured job, which is set up on the Schedule page.

  • Delete: Delete the table.

Query

Queries can be used for data mining. On the Query page:

  • New queries can be created and executed

  • Already executed queries can be found in the query history

  • Saved queries and query templates are listed under the saved queries tab.

New Query

Available tables

All available tables and their details are listed on the New Query tab. There are three types of tables that can be used for querying:

  • Created tables: Tables created in the project by a user.

  • Metadata tables: Contain metadata from samples that are linked to a table in Base. They are created by syncing with the Base module; this synchronization is configured on the Details page within the project.

  • Public tables: Public databases that are made available within Base by Illumina

Create new query

Input

Queries are executed using standard SQL (e.g., Select * From table). The query is checked for errors while it runs; if errors are captured, they are shown above the input box. The query can be executed immediately or saved for future use.
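
For example, a filtered query against a user-created table might look as follows. This is a sketch only: the table and column names (myresults, gene, coverage) are hypothetical and depend on your own schema.

    -- Hypothetical sketch: select high-coverage records for one sample
    -- (table and column names are placeholders, not part of the product)
    SELECT sample_name, gene, coverage
    FROM myresults
    WHERE sample_name = 'sample1'
      AND coverage >= 30
    ORDER BY coverage DESC
    LIMIT 100;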

Result

If the query is valid for execution, the result will be shown as a table underneath the input box. From within the result page of the query, it is possible to save the result in two ways:

  • Download: As Excel or JSON file to the computer.

  • Export: As a new table, a view, or a file in the project, in CSV, JSON, or AVRO format.

Run a New Query

  1. Select a project.

  2. From the project menu, select Query.

  3. Enter the query to execute using SQL.

  4. Select Run Query.

  5. Select Save Query to add the query to your saved queries list.

Query history

The query history lists all queries that were executed. Historical queries are shown with their date, executing user, returned rows and duration of the run. For each historical query listed, it is possible to:

  • Open: This will open the query again in the “New query” tab.

  • Save: This will save the query, so it will be visible in the “Saved queries” tab.

  • View results: The results of an executed query are available for approximately 24 hours. To see the results after that time period, the query needs to be re-executed.

Run a Previous Query

If you have run a query, you can run the query again by selecting it from the Query History.

  1. Select a project.

  2. From the project menu, select Query.

  3. Select the Query History tab.

  4. Select a query.

  5. Perform one of the following actions:

    • Open Query—Open the query in the New Query tab. You can then select Run Query to execute the query again.

    • Save Query—Save the query to the saved queries list.

    • View Results—Download the results from a query or export results to a new table, view, or file in the project. Results are available for 24 hours after the query is executed. To view results after 24 hours, you need to execute the query again.

Saved queries

All queries saved within the project are listed under the “Saved Queries” tab together with the query templates.

The saved queries can be:

  • Opened: This will open the query again in the “New query” tab.

  • Saved as template: The saved query becomes a query template.

  • Deleted: The query is removed from the list and cannot be opened again.

The query templates can be:

  • Opened: This will open the query again in the “New query” tab.

  • Deleted: The query is removed from the list and cannot be opened again.

It is possible to edit the saved queries and templates by double-clicking on each query or template. The data classification of the templates can also be changed as follows:

  • Account: The query template will be available for everyone within the account

  • User: The query template will be available for the user who created it

Run a Saved Query

If you have saved a query, you can run the query again by selecting it from the list of saved queries.

  1. Select a project.

  2. From the project menu, select Query.

  3. Select the Saved Query tab.

  4. Select a query.

  5. Perform one of the following actions:

    • Open Query—Open the query in the New Query tab. You can edit the query, and then select Run Query to execute the query again.

    • Save as template—Save the query as a template for future queries.

    • Delete Query—Remove the query from the saved queries list.

Schedule

On the Schedule page within the Base module, it’s possible to create a job for importing different types of data you have access to into an existing table. You can schedule this job to run automatically and/or have it executed on demand:

  • Automatic import: Check the Active box within the configured schedule. The job will run on a daily basis.

  • Manual import: Select the schedule to run and click the Run button.

Configure a schedule

There are three types of schedules that can be set up.

Files

This type will load the content of specific files from this project into a table. When adding or editing this schedule you can define the following parameters:

  • Active: The job will run automatically if checked

  • Name – required field: The name of the scheduled job

  • Description: Extra information about the schedule

  • Source:

    • Project: All files with the correct naming from this project will be used.

  • Search for a part of a specific ‘Original Name’ or Tag – required field: Define in this field part or all of the file name or tag of the files you want to import. For example, if you want to import files named sample1_reads.txt, sample2_reads.txt, … you can fill in _reads.txt in this field to have all files that contain _reads.txt imported to the table.

  • Generated by Pipelines: Only files generated by the selected pipelines are taken into account. When left empty, files from all pipelines are used.

  • Target Base Table – required field: The table to which the information needs to be added. A drop-down list with all created tables is shown. This means the table needs to be created before the schedule can be created.

  • Write preference: Defines how the data is written, i.e., whether existing data can be overwritten

  • Data format – required field: CSV, TSV, JSON, AVRO, PARQUET

  • Delimiter: Indicates which delimiter is used in the delimiter-separated file. If the delimiter is not present in the list, it can be indicated as custom.

  • Custom delimiter: The custom delimiter that is used in the file.

  • Header rows to skip: Number of header rows in the file to skip

  • References: Choose which references must be added to the table

Delete schedule

Schedules can be deleted. Once deleted, they will no longer run, and they will not be shown in the list of schedules.

Run schedule

Clicking the Run button starts the job of importing the configured data into the correct tables; this way the schedule can be run manually. The result of the job can be seen in the tables.
