User Setup Guide - The University of Sydney

CLC Genomics Workbench

Version 12.0.3

User Setup Guide

4th October 2019

CLC Genomics Workbench User Guide

Sydney Informatics Hub 1 May 2018 1

Table of Contents

Introduction
Your subscription
Bookings on PPMS
Acknowledging the Sydney Informatics Hub
Publication Incentives
CLC Genomics Workbench Installation and License Server Access
Downloading and installing CLC Genomics Workbench on your desktop
Updating the license server address
Connecting to the CLC Server (Artemis)
Install the Server plugin
Connect to CLC Server (Artemis)
Pre-CLC on Artemis subscribers
Microbial Genomics Module Plugin Installation
Importing and Exporting Data
Mounting classic RDS to your local desktop
Downloading and installing FileZilla
Transferring small data from your local desktop or classic RDS to CLC-project
Transferring large data or data stored on RCOS to and from CLC-project
Data Analysis on CLC Server – Artemis
Data management
How secure is my data?
The import-export directory
Further assistance

Introduction

These instructions enable CLC Genomics Workbench subscribers to use one of five University-owned network licenses. Please note that each subscription covers one license, so only one user per subscribed group should access the license server at a time. This can be managed via the booking calendar (PPMS). The CLC Server was upgraded to use the University’s High-Performance Computer (Artemis) on the 30th of April, 2018. This gives users access to Artemis’ resources and streamlines connectivity with the Research Data Store (RDS). Artemis offers high computing power, higher throughput for CLC jobs and improved data security.

Your subscription

Thank you for purchasing a 6-month or 12-month subscription to use the CLC Genomics Workbench. During this time, please ensure that you manage your data efficiently and back up important data to the RDS regularly. At the end of your subscription, if you do not wish to re-subscribe, your data will automatically be archived by the Sydney Informatics Hub.

Bookings on PPMS

You should have created an account on the Sydney Informatics Hub PPMS when you purchased your CLC Genomics subscription. If you have not, please use your University login credentials and create an account by clicking “account creation request”. This will allow you to book and use a CLC Genomics Workbench/Microbial Genomics Module license. After logging into PPMS, please book time using the calendar on Sydney Informatics Hub PPMS, https://au.ppms.info/sydney/?SIH under ‘Book a system’. All CLC Genomics Workbench licenses grant access to the Microbial Genomics Module.

To ensure fair use of all CLC Genomics Licenses, bookings are essential, and you will not be able to log onto the CLC Genomics Workbench without a valid booking on SIH’s PPMS. You will receive an email 15 minutes before your booking ends to remind you to save your session and log off. If licenses are available in the next time slot, you may extend or add a new booking to continue using CLC Genomics Workbench.

Acknowledging the Sydney Informatics Hub

The CLC Genomics Workbench is a service that is supported and provided by the Sydney Informatics Hub. Please support us by acknowledging us in your work. This is vital for the ongoing funding of the Sydney Informatics Hub. Here is a suggestion for how you can acknowledge us:

“This research was supported by the Sydney Informatics Hub, funded by the University of Sydney.”

CLC Genomics Workbench Installation and License Server Access

You can use CLC Genomics Workbench as a stand-alone application on your desktop. The following instructions describe how to install the CLC Genomics Workbench and enable access to the University’s CLC Genomics Workbench licenses.

Downloading and installing CLC Genomics Workbench on your desktop

If you do not have the latest CLC Genomics Workbench installed on your computer:

1. Select the appropriate download for your computer:
   Windows: https://download.clcbio.com/clcgenomicswb/12.0.3/CLCGenomicsWorkbench_12_0_3_64.exe
   Mac: https://download.clcbio.com/clcgenomicswb/12.0.3/CLCGenomicsWorkbench_12_0_3.dmg
   Linux: https://download.clcbio.com/clcgenomicswb/12.0.3/CLCGenomicsWorkbench_12_0_3_64.sh

2. Follow the installation setup, leave the default options checked and click accept/next as required.

3. At the ‘You need a license’ window: click ‘Configure License Server Connection’, then ‘Next’, check ‘Enable license server connection’ and select ‘Manually specify license server’. Enter the following details (please type the address rather than copying and pasting it):
   Hostname/IP address: clcgenomics.lic.sydney.edu.au
   Port: 6200
   Username: Your unikey (e.g. abcd1234)

**Important note regarding Workbench upgrades: When a Workbench upgrade is available, you will be prompted to upgrade the next time you open your Workbench. Please do not upgrade before confirming with the Sydney Informatics Hub at [email protected] that the latest Workbench version is compatible with the current USyd CLC License Server and CLC Server versions.

4. Click ‘Next’ then ‘Finish’

Updating the license server address

If you already have CLC Genomics Workbench installed on your computer, you can update the license server details by following the instructions below.

1. For Windows, you must right click on the CLC icon and click ‘Run as administrator’. On Mac, start the CLC Genomics Workbench as usual.

2. Go to Help -> License Manager

3. Click ‘Upgrade Workbench License’

4. At the “You need a license” window: click ‘Configure License Server Connection’, then ‘Next’, check ‘Enable license server connection’ and select ‘Manually specify license server’. Enter the following details:
   Hostname/IP address: clcgenomics.lic.sydney.edu.au
   Port: 6200

5. Click ‘Finish’

Connecting to the CLC Server (Artemis)

This will let you run CLC jobs on Artemis instead of your local machine.

Install the Server plugin

1. For Windows, you must right click on the CLC icon and click ‘Run as administrator’. On Mac, start as usual.

2. Go to Help -> Plugins -> Download plugins

3. Scroll down the list of plugins and select ‘CLC Workbench Client Plugin’. Download and install.

4. Close the window and restart the workbench when prompted. You will now have a Server login option under the ‘File’ tab.

Connect to CLC Server (Artemis)

Each time you open the Workbench, you need to log in to the Server. If you routinely run all your analyses on the Server, you may choose to check the ‘Automatic login’ option on the CLC Server Login window.

1. Nominate a Researcher Dashboard (also known as DashR) project to have a directory in the CLC Genomics Server and send your project information as a PDF file to [email protected]. This PDF file can be downloaded from DashR (log in, click the relevant project under “Project Name” and click the PDF icon, then download and save the PDF). Note: all users on this project will be granted access to your new CLC project folder. Once we have processed your request, you can proceed to step 2.

2. Under ‘File’ select ‘CLC Server Login’

3. At “User name”, enter your unikey (e.g. abcd1234) followed by your password in the next field. Enter clcgenomics.hpc.sydney.edu.au into the server host field and change server port to 7777. Check ‘save username and password’ and ‘automatic login’ if desired

4. Once you have connected to the CLC Server (Artemis), on the left-hand panel in the navigation area, click “clc_data” and “projects”. Your new CLC-project folder, where you can import files, create sub-folders and conduct analyses, will be here.

Here the CLC-project used is called CLC-SIHsandbox. Data can now be imported to your CLC-project folder and jobs run on Artemis by selecting a ‘Grid’ option instead of ‘Workbench’ during run or import/export operations.
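
If the Workbench cannot reach the license server or the CLC Server, it can help to confirm that the two addresses given in this guide accept connections from your computer. The following is a minimal Python sketch and not part of the official setup. It only tests whether a TCP connection opens; it does not check your license, your PPMS booking or your login. The names `can_connect` and `check_all` and the 5-second timeout are illustrative choices. A failed check may simply mean that you are not connected to the University network.

```python
import socket

# Hostnames and ports as quoted in this guide.
ENDPOINTS = {
    "license server": ("clcgenomics.lic.sydney.edu.au", 6200),
    "CLC Server (Artemis)": ("clcgenomics.hpc.sydney.edu.au", 7777),
}


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed hostname lookups.
        return False


def check_all():
    """Print one line per endpoint saying whether it is reachable."""
    for name, (host, port) in ENDPOINTS.items():
        state = "reachable" if can_connect(host, port) else "not reachable"
        print(f"{name}: {host}:{port} is {state}")
```

Calling `check_all()` from a Python prompt prints the result for both servers. If either is not reachable, please check your network connection before contacting the Sydney Informatics Hub.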

Pre-CLC on Artemis subscribers

If you subscribed before CLC moved to Artemis, you will notice that your data on the old CLC Genomics Server has been moved to “clc_data” -> “legacy” -> “abcd1234”, where abcd1234 is your unikey. This may not include recent data created on Thursday the 26th of April. If any data was not transferred, you can still export it from the old CLC Genomics Server and import it into the Artemis CLC Genomics Server manually (see the Importing and Exporting Data section) until Friday the 11th of May. You are welcome to continue your analysis in the “legacy” folder. However, please note that data in the “legacy” directory is not completely secure: other users can access your data here. If you would like a folder with restricted access, please nominate a DashR project directory and we can set up a CLC-project folder for you (step 1 in “Connect to CLC Server (Artemis)”). Please contact [email protected] if you have any issues or further questions.

Microbial Genomics Module Plugin Installation

There are two Microbial Genomics Module (MGM) licenses. Any subscribed user can use MGM at no additional cost. Once you have installed the plugin, you will see ‘Microbial Genomics Module’ under your toolbox. Running an MGM tool takes up a license, so please only use an MGM tool if you have booked a license, to avoid interfering with other users’ booked MGM sessions.

1. For Windows, you must right click on the CLC icon and click ‘Run as administrator’. On Mac, start as usual.

2. In the workbench, go to Help -> Plugins and resources -> Download plugins

3. Scroll down the list of plugins and select ‘CLC Microbial Genomics Module’.

4. Download (this may take some time) and install

5. Close the window and restart the workbench when prompted. You will now have MGM in your toolbox.

To get you started:

• You can create a project folder by the standard process of ‘File’ -> ‘New’ -> ‘Folder’, or by clicking the ‘New’ icon near the top left of the workbench.

• To import data, similarly you can use ‘File’ -> ‘Import’ or use the ‘Import’ icon in the taskbar.

• For importing sequencing data, be sure to select the relevant sequencing platform and ensure the correct check boxes are selected, such as whether the data is paired end.

• Analyses are found under ‘Toolbox’. For detailed information on the analysis tools available, see the documentation at: http://www.clcbio.com/products/clc-genomics-workbench/?action=download_manual&product=1637

• Review the numerous specific workflow tutorials available at: http://www.clcbio.com/support/tutorials/

Importing and Exporting Data

Mounting classic RDS to your local desktop

The import-export folder (a classic RDS system, described in ‘Transferring large data or data stored on RCOS to and from CLC-project’) and your own group’s classic RDS systems can be mounted onto your local computer. This is useful for transferring data between different systems (e.g. RDS and Artemis), even when you are not using the CLC Genomics Workbench. If you are off-campus, you will need to connect to the University’s network using a VPN client. Please follow the instructions provided by ICT to map your classic RDS on Windows or macOS. The import-export server address is:

\\research-data-2.shared.sydney.edu.au\RDS-02\PRJ-CLC\import-export

Input your unikey login details when prompted.
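
The server address above is written in the Windows style, with backslashes. On macOS, network shares are usually entered as an smb:// address with forward slashes. The short Python sketch below shows the conversion. It assumes the share is a standard Windows (SMB) network share, which this guide does not state explicitly, and the function name `unc_to_smb_url` is illustrative. ICT’s mapping instructions remain the authoritative reference.

```python
def unc_to_smb_url(unc_path):
    r"""Convert a Windows-style path (\\server\share\folder) to an smb:// address."""
    if not unc_path.startswith("\\\\"):
        raise ValueError("expected a path starting with two backslashes")
    # Split on backslashes and drop the empty pieces left by the leading pair.
    parts = [part for part in unc_path.split("\\") if part]
    if len(parts) < 2:
        raise ValueError("expected at least a server name and a share name")
    return "smb://" + "/".join(parts)


# The import-export address quoted in this guide.
IMPORT_EXPORT = r"\\research-data-2.shared.sydney.edu.au\RDS-02\PRJ-CLC\import-export"

print(unc_to_smb_url(IMPORT_EXPORT))
# prints: smb://research-data-2.shared.sydney.edu.au/RDS-02/PRJ-CLC/import-export
```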

Downloading and installing FileZilla

FileZilla is a free application that will allow you to transfer files between your local desktop (including any drives mounted to your local desktop) and any other “remote” server, including Artemis and RCOS. For CLC Genomics, it is useful for uploading/downloading data between your local computer or mounted classic RDS (“local”) and the import-export directory (“remote”, described in ‘Transferring large data or data stored on RCOS to and from CLC-project’). If your DashR project uses RCOS and not classic RDS, you can mount the import-export directory instead (“local”) and connect to RCOS as a “remote” server. To download FileZilla, go to this website: https://filezilla-project.org/

Download the client, not the server. Install FileZilla by following the prompts.

Transferring small data from your local desktop or classic RDS to CLC-project

Please use these instructions for importing small files (<10 GB) from your desktop or from classic RDS that has been mounted onto your desktop. These instructions require you to keep your computer on, with a constant internet connection and a connection to the University network, which is why they are only recommended for transferring small files. For larger files, you will need to use the import-export directory, which is described in ‘Transferring large data or data stored on RCOS to and from CLC-project’.
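
If you are unsure which set of instructions applies to your data, you can add up the size of the files you plan to import and compare it with the 10-gigabyte threshold above. The Python sketch below does this. It reads the threshold as 10 × 10^9 bytes, which is an assumption, and the function names and route descriptions are illustrative only.

```python
from pathlib import Path

# Assumption: the guide's limit for "small" transfers, read as 10 x 10^9 bytes.
SMALL_TRANSFER_LIMIT_BYTES = 10 * 10**9


def total_size_bytes(folder):
    """Add up the sizes of all files under `folder`, including sub-folders."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def suggested_route(size_bytes, limit=SMALL_TRANSFER_LIMIT_BYTES):
    """Return which of the two import routes in this guide suits the given size."""
    if size_bytes < limit:
        return "Workbench import (small data)"
    return "import-export directory (large data)"


print(suggested_route(3 * 10**9))   # prints: Workbench import (small data)
print(suggested_route(40 * 10**9))  # prints: import-export directory (large data)
```

For a real folder, `suggested_route(total_size_bytes("InputFastq"))` gives the suggestion for everything inside it, where “InputFastq” stands for the path to your own data folder.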

1. Open the CLC Genomics Workbench and make sure you are connected to the CLC Server (see Connect to CLC Server (Artemis)).

2. If you are importing data from your classic RDS, you will need to have this mounted onto your local desktop (see Mounting classic RDS to your local desktop). You don’t need to do anything if you are importing from your local desktop. In the example provided here, the abbreviated DashR project name is ‘DNMSinDogs’ and it uses classic RDS. We will import data in the “InputFastq” folder into CLC Genomics.

3. Click the import icon and select the type of files you would like to import

4. Select ‘Workbench’ and click ‘Next’

5. You will then be asked to select the files to import. If you are importing from classic RDS, you will find your mounted drive under devices. Navigate to the folder containing your files. Hold the Shift key and click to select multiple files. Click ‘Next’.

6. Leave ‘Save’ selected and click ‘Next’

7. Now select where you want to save the data on the CLC Genomics Workbench. You should select your CLC-project folder. In this example, we will save the files in the CLC-project folder named CLC-SIHsandbox (yours will be called something different). Inside this folder, we previously created a folder called RawFastqs. We will select this folder.

8. Click ‘Finish’

9. The progress of the file transfer will appear in the Toolbox. Once complete, you should see your files in your chosen folder.

A similar process is used to export files from your CLC-project folder to your local desktop or classic RDS that has been mounted to your local desktop.

Transferring large data or data stored on RCOS to and from CLC-project

The next set of instructions will describe how to import large data or data that is stored on RCOS into your CLC-project folder. This requires the import-export directory to be mounted onto your local computer (see mounting classic RDS to your local desktop). You will also need FileZilla (see Downloading and installing FileZilla). In the example outlined below, we will import data from a DashR project named ‘Tracy’ to our CLC-project folder, here named CLC-SIHsandbox. The advantage of this process is that you can turn off your computer or CLC Genomics Workbench after you have started the transfer. The transfer will be running in the background on Artemis between the import-export directory and your CLC-project folder. This is handy for very large file transfers which may take hours to complete.

1. Ensure that the import-export directory is mounted to your local computer (see Mounting classic RDS to your local desktop).

2. Open FileZilla. We will use FileZilla to transfer between local (import-export mounted directory) and remote servers (RCOS – PRJ-Tracy). First type the following into the fields at the top:
   Host: sftp://rcos-int.sydney.edu.au
   Username: abcd1234
   Password: unikeypassword
   Replace abcd1234 with your own unikey and type your regular unikey password into the ‘Password’ field. Leave ‘Port’ empty (default). Click ‘Quickconnect’.

3. The left-hand panels show your local desktop. Navigate to the import-export directory. Double click on your unikey folder to enter it.

4. The right-hand panel shows your remote site (in this example RCOS). Navigate to the location of your files to import in your project’s RCOS directory (Your RCOS directory will be called /rds/PRJ-ProjectName. In this example the ProjectName is Tracy)

Note: You can navigate to a folder by typing in the “path” after “Local site:” (/Volumes/import-export/abcd1234) or “Remote site:” (/rds/PRJ-Tracy/InputFastq) as above. Alternatively, you can click into folders below. You can click on the folder that is called “..” to go back up to the parent folder.

5. Drag and drop the files you wish to copy from the right-hand (remote) panel to the left-hand (local) panel. You can select multiple files by holding the Shift key while clicking. Your files will now be in your unikey folder in the import-export directory.

6. Open the CLC Genomics Workbench and click ‘Import’. Select the appropriate file type.

7. Select ‘Grid’ and in the drop-down menu, select one of the “Artemis Data Transfer” options according to how large your data is (if you are unsure, make your best estimate). Click ‘Next’.

8. Select ‘On the server or a place that the server has access to’. Click ‘Next’

9. You will see your unikey directory under import-export. Select the files you wish to import (click and hold shift to select multiple files). Then click the blue arrow pointing right.

10. Once the files you wish to upload are in “Selected Elements”, click ‘Next’

11. Leave ‘Save’ selected and click ‘Next’

12. Select your CLC-project directory under clc_data. Here the CLC-project directory is called CLC-SIHsandbox and we have created a folder called ‘RawFastq2’. Click ‘Finish’.

13. On the bottom left corner, a message will appear to indicate that your transfer job is queued on Artemis. In this example, “2210196” is the job ID.

14. [Optional] For users who are experienced with Artemis, you can check the progress of the job transfer by typing the following into the command line: qstat <job id>

Under ‘S’ (Status), the letter ‘R’ indicates that the job is running. The letter ‘Q’ means that the job is in queue.

15. Email notifications will also advise you of the status of your job. You will receive one email when your job begins and another once it has completed.

Note: You will receive a third email stating that there was a “Post job file processing error”. Please ignore this email.

16. Your files that were imported from the import-export folder should now be in the selected sub-folder (‘RawFastq2’) in your CLC-project folder (CLC-SIHsandbox).

17. Please remove your data from the import-export directory. The import-export directory is not secure so it is important that the data is removed once your data transfer job has completed. Any data left in the import-export directory may be periodically deleted without warning.
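
Because step 17 is easy to forget, you may find it useful to list what is still sitting in your unikey folder once a transfer has completed. The Python sketch below only lists files; it does not delete anything, so you stay in control of what is removed. The function name `leftover_files` is illustrative, and the folder path in the usage note is the macOS example path shown earlier in this section, which will differ on Windows.

```python
from pathlib import Path


def leftover_files(unikey_folder):
    """Return every file still present under your import-export folder, sorted by path."""
    root = Path(unikey_folder)
    if not root.is_dir():
        # Most likely the import-export directory is not mounted.
        raise FileNotFoundError(f"{root} was not found; is import-export mounted?")
    return sorted(p for p in root.rglob("*") if p.is_file())


def report(unikey_folder):
    """Print the leftover files, or confirm that the folder is empty."""
    files = leftover_files(unikey_folder)
    if not files:
        print("No files left in", unikey_folder)
    for path in files:
        print("Still present:", path)
```

For example, `report("/Volumes/import-export/abcd1234")` lists anything that remains, where abcd1234 is your unikey.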

Data Analysis on CLC Server – Artemis

Perform a single analysis

1. Select an analytical task

2. Select ‘Grid’ and an appropriate compute node on Artemis in the drop-down menu to process your job. Select this according to the computational resources that your job requires (estimate to the best of your ability). ‘C’ queues do not send email notifications when the job begins and ends; ‘CE’ queues do.

3. Select the parameters for your task. See the CLC Manual for an explanation of the analysis tools and parameters. http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/702/index.php?manual=Introduction_CLC_Genomics_Workbench.html

Here all other options were left as default. Click ‘Next’ to continue.

4. Select a location within your CLC-project folder to save your results.

5. A message will indicate the CLC Server has submitted a job to the compute node and a job id has been allocated. You can also see the progress of your job in CLC Genomics on the bottom left corner.

6. You can check the status of the job in a command line terminal connected to Artemis HPC by typing qstat <job id>

The status of the job will also be displayed in the CLC Genomics Workbench and emailed to you. Note: Please ignore any “Post job file processing error” emails that you receive.

7. Your results should appear in the CLC Server folder location you specified.
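
For users comfortable with the command line, the qstat check in step 6 can also be scripted. The Python sketch below picks the status letter for one job out of qstat’s text output and translates the two letters described in this guide (‘R’ and ‘Q’). It assumes the default table layout, in which the status (S) column is second from last. The sample output is illustrative: the server suffix, job name and queue name are made up. The `run_qstat` helper will only work where the qstat command is available, i.e. in a terminal on Artemis.

```python
import subprocess

# The two status letters explained in this guide.
STATUS_MEANINGS = {"R": "running", "Q": "queued"}

# Illustrative output only: the server suffix, job name and queue name are made up.
SAMPLE_OUTPUT = """\
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2210196.pbsserver clc_import       abcd1234          00:01:12 R small
"""


def job_status(qstat_output, job_id):
    """Return the status letter for job_id, or None if the job is not listed.

    Assumes the default layout in which S is the second-to-last column.
    """
    for line in qstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].split(".")[0] == str(job_id):
            return fields[-2]
    return None


def describe(status):
    """Translate a status letter into words, for the letters this guide covers."""
    if status is None:
        return "not listed (the job may have finished)"
    return STATUS_MEANINGS.get(status, f"in state '{status}'")


def run_qstat(job_id):
    """Run qstat for one job and return its text output (only works on Artemis)."""
    result = subprocess.run(["qstat", str(job_id)], capture_output=True, text=True)
    return result.stdout


print(describe(job_status(SAMPLE_OUTPUT, 2210196)))  # prints: running
```

On Artemis, `describe(job_status(run_qstat(2210196), 2210196))` would report on a real job, with 2210196 replaced by your own job ID.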

Data management

The data management guidelines for data stored on the CLC Server are similar to those for the Artemis HPC. We recommend using the CLC Server for data analysis and data processing only, not for data storage. Data that is not accessed frequently should be moved/exported to your classic RDS or RCOS. If you have large datasets (>1 TB) that are accessed frequently and need to be stored on the CLC Server, please let the Sydney Informatics Hub know ([email protected] or +61 2 8627 6286).

How secure is my data?

Data that is in your CLC-Project folder is only accessible to the users in your DashR project. This data is not backed up. We recommend that you store valuable data in one of the RDS systems – classic RDS or RCOS. You can read more about each of these RDS systems here.

The import-export directory

The import-export directory is to be used purely for data transfers, as it is not completely secure. Once your transfer is complete, we recommend you delete your data from this directory immediately. Any data left in the import-export directory may be periodically deleted without warning.

Further assistance

For assistance with installation, support, or bioinformatics training, please contact the Sydney Informatics Hub at [email protected] or on +61 2 8627 6286.