Cloud Plugin USER MANUAL



User manual for

Cloud Plugin 20.0

Windows, macOS and Linux

June 12, 2020

This software is for research purposes only.

QIAGEN Aarhus
Silkeborgvej 2
Prismet
DK-8000 Aarhus C
Denmark


Contents

1 Introduction

1.1 Prerequisites

2 Configuring the cloud connection

2.1 Settings for Amazon Web Services

2.2 Settings for CLC Genomics Cloud Engine

3 Importing data from the cloud

4 Exporting data to the cloud

5 Running workflows in the cloud

6 Cloud Job Search

6.1 Downloading all results

6.2 Selective download of results

7 Using the Cloud Plugin with the CLC Genomics Server

8 Configuring the Cloud Server Plugin

8.0.1 Configuring the AWS and CLC Genomics Cloud Engine settings

8.0.2 Configuring cloud presets

9 Troubleshooting

10 Installing and uninstalling Workbench plugins

10.1 Install

10.2 Uninstall


Chapter 1

Introduction

The Cloud Plugin provides functionality to:

• Import high-throughput sequencing data from Amazon Simple Storage Service (Amazon S3) to be used in analyses in the CLC Workbench.

• Export data from the CLC Workbench to Amazon S3.

• Run workflows in the cloud through the CLC Genomics Cloud Engine (GCE).

• Find jobs and download results from jobs run in the cloud through GCE.

1.1 Prerequisites

Requirements for importing and exporting data to Amazon S3

To import data from and export data to Amazon S3, you need:

• An Amazon Web Services (AWS) account.

• An AWS Identity and Access Management (IAM) user with credentials for programmatic access (access key ID, secret access key). Such a user can be set up for an AWS account in the AWS Management Console: Services → IAM → Users → Add User. Programmatic access should be granted to the user when setting the AWS access type (figure 1.1). Please note that the IAM user must only have access to the S3 locations where GCE, the CLC Genomics Server, and the CLC Workbench should be able to access files. The IAM user should, for example, not have API access to anything else in AWS.

• The IAM user should be granted the following permission policy: AmazonS3FullAccess. A more limiting policy can be used if there is a need to restrict access to Amazon S3 to some degree.
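
As an illustration of what a more limiting policy could look like, the following Python sketch builds an IAM policy document that restricts access to a single bucket. The bucket name `my-gce-data` and the function name are hypothetical, and the chosen set of actions is an assumption made for this example. The permissions the Cloud Plugin actually needs may differ, so any such policy should be checked against your own setup.

```python
import json


def restricted_s3_policy(bucket: str, partition: str = "aws") -> dict:
    """Build an IAM policy document limited to one S3 bucket.

    The action list is an assumption for illustration, not a list
    confirmed by the Cloud Plugin documentation.
    """
    bucket_arn = f"arn:{partition}:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Reading and writing apply to the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


def policy_as_json(bucket: str) -> str:
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(restricted_s3_policy(bucket), indent=2)
```

Calling `policy_as_json("my-gce-data")` returns JSON text that could be used as the starting point for a custom policy in the AWS Management Console.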

Requirements for running workflows and accessing results

In addition to the requirements above, you will need the following to be able to run workflows in the cloud, to find jobs or to download results from jobs that have been run in the cloud:


• The CLC Genomics Cloud Engine deployed on your AWS account. Read more about the product and request a quote here: https://www.qiagenbioinformatics.com/products/clc-genomics-cloud-engine/.

• At least one Amazon S3 bucket set up to be used for caching uploaded files. The CLC Genomics Cloud Engine includes a tool to set up such a bucket.

• The following permission policy granted to the IAM user: AWSResourceGroupsReadOnlyAccess. This policy grants the user the rights needed to automatically find the cache bucket in Amazon S3.

Figure 1.1: Enabling programmatic access for an AWS IAM user can be done in the AWS Management Console. Note: This screenshot is for illustration purposes only. Details may be changed at any time by AWS.


Chapter 2

Configuring the cloud connection

To configure the cloud connection, go to:

File | Cloud Connection

The same menu item is also available in the main toolbar, under the Cloud menu (figure 2.1).

Figure 2.1: The Cloud menu provides access to the cloud configuration dialog, as well as the Cloud Job Search functionality.

The cloud configuration dialog (figure 2.2) allows you to specify the AWS account credentials and the settings for CLC Genomics Cloud Engine (GCE).

For importing data from the cloud and exporting data to the cloud, only the AWS credentials need to be specified. These are described in section 2.1.

For running workflows in the cloud, the GCE settings must be filled in as well. These are described in section 2.2.

Connection status

The status of the cloud connection is shown using an icon in the status bar at the bottom of the CLC Workbench window:

• One icon indicates that a connection has been established to Amazon S3 but not to the CLC Genomics Cloud Engine. Import and export are possible with this status.

• A different icon indicates that a connection has been established to Amazon S3 and to the CLC Genomics Cloud Engine. Import and export can be carried out, and workflows can be run in the cloud with this status.


Figure 2.2: The cloud configuration dialog

2.1 Settings for Amazon Web Services

AWS access key ID: The access key ID for programmatic access, set up for the AWS IAM user as described in section 1.1.

AWS secret access key: The secret access key for programmatic access, set up for the AWS IAM user as described in section 1.1.

AWS partition: The partition under which the AWS user is registered.

The window continually validates the settings that have been entered. When the settings are valid, the Status box will contain the text "Valid" and a green icon will be shown (see figure 2.3).

Figure 2.3: A valid AWS configuration is indicated by the green icon and "Valid" in the Status box.

2.2 Settings for CLC Genomics Cloud Engine

For running workflows in the cloud, the use of the CLC Genomics Cloud Engine must be enabled by selecting the Enable connection checkbox.

URL: The URL pointing to the GCE service deployed on the specified AWS account, as described in section 1.1.

Accept untrusted certificate: This checkbox should be selected when GCE has been set up with a self-signed certificate. We recommend setting up GCE with a trusted certificate.


Log In: Pressing this button opens a web browser, where you can log in to GCE using your company credentials. When the authentication succeeds, the Status box will contain the name and region of the connected user, and a green icon will be shown (see figure 2.4).

Log Out: Pressing this button disconnects the CLC Workbench from GCE.

AWS S3 cache bucket: The cache bucket to be used when uploading data to the cloud, as described in section 1.1.
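
To make the effect of the Accept untrusted certificate option more concrete, the sketch below shows the general idea in Python using the standard `ssl` module. How the CLC Workbench implements this option internally is not described in this manual. The function name `make_tls_context` is hypothetical, and the code only illustrates the difference between verifying a server certificate and accepting a self-signed one.

```python
import ssl


def make_tls_context(accept_untrusted: bool) -> ssl.SSLContext:
    """Return a TLS context; optionally skip certificate verification."""
    # The default context verifies the server certificate and host name.
    context = ssl.create_default_context()
    if accept_untrusted:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

Skipping verification removes the protection against connecting to an impostor server, which is why a trusted certificate is the recommended setup.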

Figure 2.4: A valid GCE configuration is indicated by the green icon and the user details in the Status box.


Chapter 3

Importing data from the cloud

In the CLC Genomics Workbench, the Cloud Plugin supports the import of next-generation sequencing (NGS) data from an AWS S3 location.

To import NGS data into the CLC Genomics Workbench from a cloud location, simply start an NGS data importer from the Import drop-down menu. The wizard will allow you to use a cloud location for importing the data, as shown in figure 3.1. The selected files will first be downloaded to the temporary folder, and subsequently imported into the CLC Workbench.

Figure 3.1: A cloud location can be selected when importing NGS data.

It is also possible to specify a cloud location to import data from when launching workflows in a CLC Workbench and choosing to import the data on the fly. In the case of the CLC Genomics Workbench, standard NGS read formats and .clc format files are supported.


Chapter 4

Exporting data to the cloud

To export data to the cloud, launch the exporter, and in the final configuration step, select the "Cloud" tab when specifying the export location (see figure 4.1). New folders can be created by right-clicking the name of a folder or a bucket, and selecting "New folder".

Figure 4.1: A cloud location can be selected for exporting files.


Chapter 5

Running workflows in the cloud

To run workflows in the cloud using the CLC Genomics Cloud Engine, select "CLC Genomics Cloud Engine" in the first step when launching a workflow (see figure 5.1).

Figure 5.1: Select "CLC Genomics Cloud Engine" to run a workflow in the cloud.

Job priority and downloading results

The "Job priority" setting affects how quickly a job will be scheduled for execution:

• ASAP: The highest priority level. These jobs are always executed first.

• HIGH: The job will be placed in the priority queue with 60% probability.

• MEDIUM: The job will be placed in the priority queue with 30% probability.

• LOW: The job will be placed in the priority queue with 10% probability.
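
The probabilities listed above can be explored with a small simulation. The following Python sketch is only a model of the stated percentages, not a description of how the CLC Genomics Cloud Engine schedules jobs internally. Treating ASAP as a probability of 1.0 is an assumption made for this example, and all names are hypothetical.

```python
import random

# Probabilities taken from the list above; treating ASAP as 1.0 is an
# assumption made for this illustration.
PRIORITY_QUEUE_PROBABILITY = {"ASAP": 1.0, "HIGH": 0.6, "MEDIUM": 0.3, "LOW": 0.1}


def enters_priority_queue(priority: str, rng: random.Random) -> bool:
    """Decide at random whether a job is placed in the priority queue."""
    # rng.random() is in [0, 1), so a probability of 1.0 always succeeds.
    return rng.random() < PRIORITY_QUEUE_PROBABILITY[priority]


def simulate(priority: str, jobs: int = 10_000, seed: int = 0) -> float:
    """Return the fraction of simulated jobs placed in the priority queue."""
    rng = random.Random(seed)
    hits = sum(enters_priority_queue(priority, rng) for _ in range(jobs))
    return hits / jobs
```

With many simulated jobs, `simulate("HIGH")` should come out close to 0.6, which shows that a HIGH job is not guaranteed a place in the priority queue, only more likely to get one than a MEDIUM or LOW job.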

Select the Download result checkbox to download all results when the workflow is finished. Note that when analyzing large datasets, it may be desirable to leave this box unchecked. Results can be downloaded at a later time, including selective download of individual results, using the Cloud Job Search functionality, described in chapter 6.


Data inputs and outputs

Locally available data and data already in the cloud can be used as input to workflows run in the cloud. The following data sources are available when launching a workflow:

• Elements from the Navigation Area: Choose "Select from Navigation Area". The selected data will be uploaded to the AWS S3 cache bucket configured in the GCE settings described in section 2.2. If a given data element is already present in the cache bucket, as determined by the CLC URL of the data element and the time of its latest modification, then it will not be uploaded.

• Locally available data to be imported on the fly when the workflow is run: Choose "Select files for import", and select the "Browse locally" button. The data will be uploaded to the AWS S3 cache bucket. If a file is already present in the cache bucket, as determined by the full path of the file and the time of its latest modification, then it will not be uploaded.

• Data in the cloud, to be imported on the fly when running the workflow in the cloud: Choose "Select files for import", and select the "Browse data in the cloud" button. The analysis is carried out directly on the data in the cloud. It is not transferred to the CLC Workbench.
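
The rule for skipping uploads described in the first two items can be sketched as follows. This is a simplified Python model under stated assumptions: the cache is represented as an in-memory set, and the names `CacheKey`, `needs_upload` and `upload_if_needed` are hypothetical. The actual layout of the cache bucket is not described in this manual.

```python
from typing import NamedTuple, Set


class CacheKey(NamedTuple):
    """Identity of an uploaded item: its source and its last change."""
    source: str      # full file path, or the CLC URL of a data element
    modified: float  # time of latest modification (seconds since epoch)


def needs_upload(source: str, modified: float, cache: Set[CacheKey]) -> bool:
    """Return True when no entry with the same source and timestamp exists."""
    return CacheKey(source, modified) not in cache


def upload_if_needed(source: str, modified: float, cache: Set[CacheKey]) -> bool:
    """Record a new item in the cache; return True if it counted as uploaded."""
    if not needs_upload(source, modified, cache):
        return False
    cache.add(CacheKey(source, modified))
    return True
```

In this model, submitting the same unchanged file twice uploads it only once, while modifying the file changes its timestamp and triggers a new upload.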

In the Result handling step of the workflow wizard, the details of any data to be uploaded will be displayed, as shown in figure 5.2. If any of the data is already present in the cloud cache, and therefore will not be uploaded, this will also be indicated.

Figure 5.2: The data to be uploaded to the cloud is shown in the Result handling step of the workflow wizard.

In the last step of the wizard, a location must be selected for the outputs, even if the "Download result" checkbox was not checked in the first configuration step. Under some circumstances, log files are saved to this location, for example, if a workflow run fails for particular reasons.

Following the progress of workflow jobs on the cloud

Each workflow submitted to the cloud is submitted as a batch consisting of jobs. A batch may consist of just a single job. Multiple jobs are included in a batch when:


• The "Batch" checkbox is selected when submitting the workflow, and/or

• The workflow design includes control flow elements, as described in the CLC Genomics Workbench manual: https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Advanced_workflow_batching.html.

Each job within a batch is executed as a separate job in the cloud, potentially in parallel on separate server instances.

You can follow the progress of the workflow in the Processes area of the CLC Workbench (see figure 5.3). The icon next to the process indicates the status of the job submission:

• This icon indicates that data is being transferred to the cloud. Do not close your computer or disconnect from the cloud when this icon is displayed.

• This icon indicates the job submission is complete, including any data transfer. You can now close your computer or disconnect from the cloud if you wish.

Figure 5.3: The icon next to the cloud process in the Processes area indicates when the data upload is complete.

When the job submission is complete, right-clicking the arrow next to a process and selecting "Show in Cloud Job Search" will open the batch in the Cloud Job Search (see figure 5.4). This is described in more detail in chapter 6.

Figure 5.4: You can open an individual job in the Cloud Job Search by right-clicking the arrow next to a process in the Processes area. This option is only available when the job submission to the cloud is complete, including any data transfer.

Note: If the "Download result" checkbox was selected in the first configuration step, and you close your laptop without exiting the CLC Workbench, then the results will be downloaded when you open the laptop again, once the execution has completed. However, if you exit the CLC Workbench, then the results will not be downloaded automatically from the cloud when you start the CLC Workbench again, even if the "Download result" checkbox had been selected. Results can always be found in the Cloud Job Search, as described in chapter 6.


Chapter 6

Cloud Job Search

To open the Cloud Job Search, go to:

Edit | Search for Cloud Jobs

The same menu item is also available in the main toolbar, under the Cloud menu (figure 6.1).

Figure 6.1: The Cloud menu provides access to the cloud configuration dialog, as well as the Cloud Job Search functionality.

The Cloud Job Search provides functionality to:

• Search for jobs submitted to the CLC Genomics Cloud Engine, and inspect their status, progress and other properties

• Retrieve results produced by a job submitted to the CLC Genomics Cloud Engine

The Cloud Job Search lists the latest jobs submitted to the CLC Genomics Cloud Engine that fulfill specific search criteria. When no search criteria are specified, the latest submitted jobs will be shown, as in figure 6.2. The newest jobs are listed at the top of the table by default.

To add search criteria, press the "Add search parameter" button. To start the search, press "Start search". Note that the table does not refresh automatically. Therefore, the "Start search" button has to be pressed every time an updated status is required.

Checking the "Only from logged-in user" checkbox restricts the search to jobs submitted by the currently logged-in user. If this checkbox is deselected, jobs submitted by any user of the CLC Genomics Cloud Engine can be retrieved.

The search results will always contain at most the 50 latest batches fulfilling the search criteria. If more than 50 batches fulfilling the search criteria were submitted to the cloud, then some of them will not appear in the Cloud Job Search, and the search criteria need to be further restricted in order to find them.
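
The effect of the 50-batch limit can be illustrated with a short Python sketch. It models the behavior described above (filter by criteria, newest first, keep at most 50) and is not the Cloud Job Search implementation. The `Batch` fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List

MAX_BATCHES = 50  # limit stated in the text above


@dataclass(frozen=True)
class Batch:
    batch_id: str
    submitted_by: str
    submitted_at: datetime


def search_batches(batches: Iterable[Batch],
                   criteria: Callable[[Batch], bool] = lambda b: True) -> List[Batch]:
    """Return at most MAX_BATCHES matching batches, newest first."""
    matching = [b for b in batches if criteria(b)]
    matching.sort(key=lambda b: b.submitted_at, reverse=True)
    return matching[:MAX_BATCHES]
```

In this model, an old batch that is cut off by the limit only becomes visible once the criteria are narrow enough that fewer than 50 newer batches match, for example by filtering on the submitting user.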


Figure 6.2: The Cloud Job Search allows you to search for and inspect the properties of jobs that have been submitted to the cloud.

You can select the columns to be displayed in the Cloud Job Search side panel (see figure 6.3). Additionally, under "View settings", the "Collapse batches" option can be used to control whether batches or individual jobs are shown in the table. When this box is checked, each batch is shown as a single row in the table. When this box is unchecked, each individual job is shown in its own row.
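
The difference between the two views can be sketched in Python as a simple grouping step. The row format and the names used here are hypothetical and serve only to illustrate the "Collapse batches" behavior described above.

```python
from typing import Dict, List, Tuple

Job = Tuple[str, str]  # (batch ID, job ID); both are illustrative names


def table_rows(jobs: List[Job], collapse_batches: bool) -> List[str]:
    """Return one row per batch when collapsed, otherwise one row per job."""
    if not collapse_batches:
        return [f"{batch_id}/{job_id}" for batch_id, job_id in jobs]
    # Count jobs per batch; dicts keep insertion order in Python 3.7+.
    counts: Dict[str, int] = {}
    for batch_id, _ in jobs:
        counts[batch_id] = counts.get(batch_id, 0) + 1
    return [f"{batch_id}: {count} job(s)" for batch_id, count in counts.items()]
```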

Figure 6.3: The columns to be displayed can be specified in the Cloud Job Search side panel. It is also possible to display batches as individual jobs.

After a cloud job has completed, results can be retrieved via the Cloud Job Search tool by selecting the jobs of interest and then using the buttons at the bottom of the view (see figure 6.4):

• Download All Results: Download all results from the selected jobs, optionally including any exported files. See section 6.1 for further details.

• Download Metadata: Download only the Workflow Result Metadata table for the selected jobs. Results can then be downloaded selectively using the Workflow Result Metadata table, as described in section 6.2.


Figure 6.4: The Download All Results and Download Metadata buttons can be used to retrieve results from the cloud job submissions.

6.1 Downloading all results

To download all results, optionally including any exported data, select one or more rows in Cloud Job Search, and press the Download All Results button. Data elements will be downloaded into the Navigation Area in the folder structure specified by the workflow design for the output elements.

Check the Also download exported files checkbox in the Result handling step of the wizard to download exported files in addition to downloading the other results. You can specify the location to save the exported data to in the Export directory field. Note: Specifying a cloud location for placing exported files is not supported.

Figure 6.5: The Also download exported files checkbox in the wizard allows you to download exported files as well as results to be saved to the CLC Workbench Navigation Area.


In cases where a job failed, the Download All Results button can be used to download logs and other technical information. More information about troubleshooting is provided in chapter 9.

6.2 Selective download of results

When the amount of data generated by a workflow execution is so large that it is not practical to download all the results at once, individual results can be downloaded using the Workflow Result Metadata table, which is generated for each batch.

In the Cloud Job Search, download the Workflow Result Metadata table by pressing the Download Metadata button.

Each workflow output is shown in a row of the Workflow Result Metadata table. The path to the output in the AWS S3 location is indicated in the column titled "External path", as shown in figure 6.6. When one or more rows in the table are selected, the Find Associated Data button is enabled. Pressing this button opens a new table, where the external references (not yet downloaded) are listed, as well as any results already downloaded into the Navigation Area.

Figure 6.6: The "External path" column in the Workflow Result Metadata table shows the path to the result in AWS S3. Press the Find Associated Data button to open a second table, where any associated data, in the Navigation Area and in the cloud, will be shown.

Individual data elements can be downloaded by selecting the required row in the table, and pressing the Download button. Data elements are placed in the subfolder (if any) specified by the workflow design for the given output, and new folders are created as required. Therefore, we recommend selecting the same folder when downloading different results for the same workflow. This will ensure that the outputs are organized in the same folder structure that they would have been if the workflow had been executed in the CLC Workbench.
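
The reason for selecting the same folder each time can be seen in a small Python sketch of how a destination could be composed from the chosen folder and the subfolder defined by the workflow design. The function and the folder names are hypothetical, and the sketch does not reflect the internal logic of the CLC Workbench.

```python
from pathlib import PurePosixPath


def destination(download_folder: str, workflow_subfolder: str, name: str) -> PurePosixPath:
    """Combine the chosen folder with the subfolder set by the workflow design."""
    target = PurePosixPath(download_folder)
    if workflow_subfolder:  # an output may have no subfolder
        target = target / workflow_subfolder
    return target / name
```

When two outputs of the same workflow are downloaded to the same folder, those sharing a workflow subfolder end up side by side, as they would after a local run.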

When an element is downloaded, it is automatically associated with the relevant row of the Workflow Result Metadata table, and can be found when pressing the Find Associated Data button again, as shown in figure 6.7. External references will not be removed from the Workflow Result Metadata table by downloading a result, and results can be downloaded any number of times.

Figure 6.7: When the results have been downloaded, they will be automatically associated with the given row of the Workflow Result Metadata table, and can be found by pressing the "Find Associated Data" button again.

A connection to the CLC Genomics Cloud Engine is required to find the Workflow Result Metadata table itself and download it in the Cloud Job Search. However, only an AWS S3 connection is required for downloading the results through the metadata table. That is, by saving the Workflow Result Metadata table, you will be able to download the results from AWS S3 at any later point, as long as the data is still available in AWS S3, even without a connection to the CLC Genomics Cloud Engine, and without using the Cloud Job Search.

Note: Exported data cannot be downloaded selectively. To download exported data, you must use the Download All Results button; see section 6.1.


Chapter 7

Using the Cloud Plugin with the CLC Genomics Server

When connected to a CLC Genomics Server where the Cloud Server Plugin has been installed and configured by the server administrator, it is possible to import data from and export data to AWS S3 and submit jobs to the CLC Genomics Cloud Engine through the CLC Genomics Server. This can be particularly useful when a large amount of data is located in a server location or a file system location available to the server. In these cases, the data will be uploaded to the cloud directly from the server, thereby removing the need to keep the CLC Workbench running during the upload.

Submitting jobs to the cloud through the CLC Genomics Server works in a very similar way to submitting directly from a CLC Workbench. In the first wizard step when running the workflow, select "CLC Server Cloud". The drop-down menu contains the cloud presets that have been configured by the server administrator. A cloud preset determines the priority of the jobs submitted to the cloud through the server, as well as whether the results should be automatically downloaded or not. The server administrator also controls the cache bucket that will be used for submitting jobs through the CLC Genomics Server.

When submitting jobs to the cloud through the CLC Genomics Server, the user account registered for the submission will be the user logged in to the server, and not the user from the GCE configuration settings in the cloud configuration dialog in the CLC Workbench. In practice, these user accounts can be the same.

When a CLC Workbench is connected to a cloud-enabled CLC Genomics Server and no configuration details have been entered in the CLC Workbench itself, then the CLC Workbench will automatically obtain the AWS and GCE settings from the CLC Genomics Server.

Finding jobs and results using Cloud Job Search

To use Cloud Job Search to find jobs and the results of jobs submitted to CLC Genomics Cloud Engine via a CLC Genomics Server, the AWS and GCE settings specified in the CLC Workbench cloud configuration dialog should either be empty, in which case the details of the configuration settings for the Cloud Server Plugin will be used, or must match those specified in the Cloud Server Plugin configuration in the CLC Genomics Server.


The "Only from logged-in user" option in the Cloud Job Search tool will find jobs submitted by both the user authenticated through the cloud configuration dialog, and the server user when logged in to a CLC Genomics Server. This is particularly relevant if these user names differ.


Chapter 8

Configuring the Cloud Server Plugin

After installation on a CLC Genomics Server, the Cloud Server Plugin is configured through the server web administrative interface. Information about connections to AWS and CLC Genomics Cloud Engine is entered via plugin settings. Information about the priorities of jobs sent to CLC Genomics Cloud Engine and how results from those jobs should be handled is configured using cloud presets. Different presets can have different configurations. When submitting a job, you select the preset you wish to use.

8.0.1 Configuring the AWS and CLC Genomics Cloud Engine settings

To configure the Cloud Server Plugin, click on Plugins, and confirm that the Cloud Server Plugin is shown in the list. Then, click on the Edit Plugin settings... button at the bottom of the list (figure 8.1), followed by the Edit button next to the Cloud Server Plugin.

Figure 8.1: Configure the Cloud Server Plugin by clicking the Edit Plugin settings button.

This will allow you to configure the following parameters:

AWS access key ID: The access key ID for programmatic access, set up for the AWS IAM user as described in section 1.1.

AWS secret access key: The secret access key for programmatic access, set up for the AWS IAM user as described in section 1.1.

AWS region: We recommend using one of the geographically nearest regions.

GCE S3 cloud cache bucket name: The cache bucket to be used when uploading data to the cloud, as described in section 1.1.


GCE job mananger rest host URI: The URL pointing to the GCE instance deployed on the specified AWS account, as described in section 1.1. Note that this URL must be the same as the "URL" setting specified in the CLC Workbench, under File | Cloud Connection.

GCE http oauth2 client id: The client ID used by the CLC Genomics Server to authenticate using OAuth2 Client Credentials Grant.

GCE http oauth2 client secret: The client secret used by the CLC Genomics Server to authenticate using OAuth2 Client Credentials Grant.

GCE http oauth2 authorization server: The access token endpoint of the authentication server used by the CLC Genomics Cloud Engine.

Accept untrusted certificate: This checkbox should be selected when the GCE has been set up with a self-signed certificate. We recommend setting up the GCE with a trusted certificate.

Validate settings: This option, which is enabled by default, validates the settings when you press OK. Uncheck this box only if you need to temporarily store invalid settings.

GLOBAL_OVERRIDABLE: This setting has no effect for the Cloud Server Plugin and we recommend leaving it at the default settings.

Unlike the CLC Workbench, the CLC Genomics Server uses the OAuth2 Client Credentials Grant flow for authentication. This flow is suitable for server-to-server integration, and will ensure that the CLC Genomics Server will have valid credentials that do not need to be periodically renewed. CLC Workbench users who connect to the CLC Genomics Server will be granted access to the CLC Genomics Cloud Engine via the server's credentials, but jobs will be run under the user's username.
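
For readers unfamiliar with the Client Credentials Grant, the following Python sketch builds the kind of token request defined by the OAuth2 specification (RFC 6749, section 4.4), without sending it. The endpoint URL and the credentials in the usage note are placeholders. Passing the client ID and secret through HTTP Basic authentication is one of the options allowed by the specification and is an assumption here, since this manual does not state which method the authentication server used with GCE expects.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_endpoint: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth2 Client Credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=body,  # a request with a body is sent as POST
        headers={
            # Assumption: client authentication via HTTP Basic.
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

A call such as `build_token_request("https://auth.example.com/oauth2/token", "my-client", "my-secret")` returns a request object that could be passed to `urllib.request.urlopen` to obtain an access token. No end user is involved in this exchange, which is why the flow suits server-to-server integration.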

Please note the following security-related information:

• When the AWS settings are configured in the CLC Genomics Server, CLC Workbench users connecting to the server and using the Cloud Plugin will be able to see each other's data in AWS S3, as they will have access through the shared credentials stored in the server.

• There are no user permissions on jobs in GCE. This means that GCE users will be able to find each other's jobs, for example by using the Cloud Job Search functionality.

• It is possible to prevent a CLC Workbench user from being able to submit jobs to the cloud by configuring permissions for each cloud preset (Global permissions | Cloud presets).

8.0.2 Configuring cloud presets

To create or edit existing cloud presets, log into the CLC Genomics Server web administrative interface and go to:

Job distribution | Execution node settings | Cloud settings

Click on Add New Cloud Preset and fill in a preset name, job tag (optional), job priority and result handling strategy. See figure 8.2.

The "Job priority" setting affects how quickly a job will be scheduled for execution:

• ASAP: The highest priority level. These jobs are always executed first.


Figure 8.2: Options for configuring the cloud presets

• HIGH: The job will be placed in the priority queue with 60% probability.

• MEDIUM: The job will be placed in the priority queue with 30% probability.

• LOW: The job will be placed in the priority queue with 10% probability.
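To give a feel for what the probabilities listed above mean in practice, the following is a small simulation sketch. It does not reproduce the scheduler used by the CLC Genomics Cloud Engine: the function names are hypothetical, each job is treated as an independent random draw, and ASAP is modelled as a probability of 1.0, which is an assumption based on the statement that these jobs are always executed first.

```python
import random

# HIGH, MEDIUM and LOW are taken from the list above.
# ASAP = 1.0 is an assumption made for this illustration.
PRIORITY_QUEUE_PROBABILITY = {"ASAP": 1.0, "HIGH": 0.6, "MEDIUM": 0.3, "LOW": 0.1}


def lands_in_priority_queue(priority: str, rng: random.Random) -> bool:
    """Return True if a simulated job with this priority is placed in the priority queue."""
    return rng.random() < PRIORITY_QUEUE_PROBABILITY[priority]


def simulate(priority: str, jobs: int = 10_000, seed: int = 0) -> float:
    """Return the fraction of simulated jobs that were placed in the priority queue."""
    rng = random.Random(seed)
    hits = sum(lands_in_priority_queue(priority, rng) for _ in range(jobs))
    return hits / jobs


for level in PRIORITY_QUEUE_PROBABILITY:
    print(f"{level}: {simulate(level):.2f}")
```

Under these assumptions, a LOW job is not blocked from the priority queue. It is simply placed there much less often than a HIGH job.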

Permissions for each cloud preset can be configured under the Global permissions tab.

Chapter 9

Troubleshooting

If a workflow execution fails while the CLC Workbench is still connected to the CLC Genomics Cloud Engine, the workflow process will be shown in red in the Processes area, with the message: "Errors occurred: see log". An error message will also be displayed, which may contain information about the cause of failure.

If further information is required about a failure after the workflow submission is completed, we recommend finding the batch in the Cloud Job Search (see chapter 6), and downloading all results for the entire batch. This will download any available logs related to the failure. The exact files that will be downloaded may vary, depending on the cause of the failure, but an example is shown in figure 9.1.

• Workflow log files are CLC data elements that can be opened within a CLC Workbench. These correspond to the logs produced when running a workflow within a CLC Workbench or on a CLC Genomics Server.

• Result.json files are plain text files containing information about any outputs produced in AWS S3 for a particular job, including the exact paths to the outputs.

• gce.log files are plain text files containing information registered by the CLC Genomics Cloud Engine about the workflow execution process, and technical details about any errors that have occurred.

We recommend that the Workflow log is inspected first, as it may contain useful information about simple causes of failures, such as corrupt input data or errors in the workflow configuration. For more complicated cases, please email our Support team, attaching the files listed above ([email protected]).
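If the plain text files have been saved to a folder on disk, a short script can help with a first look before contacting Support. The sketch below is only an illustration: the function name and folder layout are hypothetical, no particular structure is assumed for the JSON content, and Workflow log files are skipped because they are CLC data elements that must be opened in a CLC Workbench.

```python
import json
from pathlib import Path


def summarize_downloaded_logs(folder: Path, tail_lines: int = 20) -> dict[str, str]:
    """Collect readable summaries of the plain text files found under a folder."""
    summaries: dict[str, str] = {}
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        if name.endswith(".json"):
            # Pretty-print whatever JSON is present; no particular schema is assumed.
            data = json.loads(path.read_text(encoding="utf-8"))
            summaries[str(path)] = json.dumps(data, indent=2, sort_keys=True)
        elif name.endswith(".log"):
            # Keep only the last lines of the log, where a final error is often reported.
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
            summaries[str(path)] = "\n".join(lines[-tail_lines:])
    return summaries
```

Calling `summarize_downloaded_logs(Path("downloaded_results"))`, where `downloaded_results` is a placeholder folder name, returns a dictionary mapping each file path to its summary text.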

Figure 9.1: The "Download All Results" functionality in the Cloud Job Search allows you to download logs and other technical information if a workflow execution failed. The logs are downloaded to the Navigation Area. Some of them will be plain text files, which can be opened in any external text-editing application.

Chapter 10

Installing and uninstalling Workbench plugins

The following sections describe the installation and removal of plugins, including Cloud Plugin, on a CLC Workbench. For information about installing plugins on a CLC Genomics Server, please refer to https://resources.qiagenbioinformatics.com/manuals/clcserver/current/admin/index.php?manual=Server_plugins.html and the information in this manual about configuring the Cloud Server Plugin, provided in chapter 8.0.1.

Note: In order to install plugins and modules, the Workbench must be run in administrator mode. On Linux and Mac, it means you must be logged in as an administrator. On Windows, you can do this by right-clicking the program shortcut and choosing "Run as Administrator".

Plugins are installed and uninstalled using the plugin manager.

Help in the Menu Bar | Plugins... or Plugins in the Toolbar

The plugin manager has two tabs at the top:

• Manage Plugins. This is an overview of plugins that are installed.

• Download Plugins. This is an overview of the plugins available on the QIAGEN Aarhus server.

10.1 Install

To install a plugin, click the Download Plugins tab. This will display an overview of the plugins that are available for download and installation (see figure 10.1).

Select Cloud Plugin to display additional information about the plugin on the right side of the dialog. Click Download and Install to add the plugin functionalities to your workbench.

Accepting the license agreement

The End User License Agreement (EULA) must be read and accepted as part of the installation process. Please read the EULA text carefully, and if you agree to it, check the box next to the text I accept these terms. If further information is requested from you, please fill this in before clicking on the Finish button.

Figure 10.1: The plugins that are available for download.

If Cloud Plugin is not shown on the server but you have the installer file on your computer (for example if you have downloaded it from our website), you can install the plugin by clicking the Install from File button at the bottom of the dialog and specifying the plugin *.cpa file saved on your computer.

When you close the dialog, you will be asked whether you wish to restart the workbench. The plugin will not be ready for use until you have restarted.

10.2 Uninstall

Plugins are uninstalled using the plugin manager:

Help in the Menu Bar | Plugins... or Plugins in the Toolbar

This will open the dialog shown in figure 10.2.

Figure 10.2: The plugin manager with plugins installed.

The installed plugins are shown in the Manage Plugins tab of the plugin manager. To uninstall, select Cloud Plugin and click Uninstall.

If you do not wish to completely uninstall the plugin, but you do not want it to be used next time you start the Workbench, click the Disable button.

When you close the dialog, you will be asked whether you wish to restart the workbench. The plugin will not be uninstalled until the workbench is restarted.