
Tools and Algorithms in Bioinformatics

CLC Genomics Workbench

September 22, 2017

Dr. Matthew Cserhati

GCBA Guda lab

Logging on

• Open a Remote Desktop session (work in pairs)

• The IP address, user ID, and password will be handed out in class

• Right-click CLC GWB and select Run as administrator

• Dr. Guda and I will help each group with the password

Outline

• Introduction to CLC Genomics Workbench

• Guided genome assembly

• Workflows

• Plugins

• IPA analysis

CLC Genomics Workbench - Introduction

• Manual: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/User_Manual.pdf

• If your lab is interested, the software can be downloaded from https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/ (a paid license is required)


CLC Genomics Workbench - Introduction

• Widely used, cutting-edge, multifunctional desktop platform for NGS analysis and visualization (available for Windows, macOS, and Linux)

• Supports guided genome assembly, RNA-seq analysis, epigenomic analysis, de novo sequencing (see Genome Assembly class, week 15, Dec. 1), and microarray analysis

• Lets you import your own NGS data or download data from the Internet

• Workflow configuration (task automation)

• Multiple plugins provide extra functionality, e.g. IPA, ChIP-seq, MetaGeneMark

Guided genome assembly

• Unlike de novo assembly, guided assembly uses the genome of a related species as a reference to guide the assembly

• Task: assemble the genome of an unknown NucleoCytoplasmic Large DNA Virus (NCLDV) with ID “GD12”

• Guide genome: Paramecium bursaria chlorella virus 1 (PBCV-1), NCBI accession JF411744.1

• The paired-end reads are in the folder “guided_assembly”

PBCV-1

Guided genome assembly – NGS Core Tools

• Trim reads (5 minutes): NGS Core Tools, Trim Sequences

• Map reads to reference (10 minutes); the outputs are:

• A summary report

• The mapped reads

• The unmapped reads

• A log

• Extract Consensus Sequence from the mapped reads (5 minutes); this gives the assembled genome sequence
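To make the consensus step concrete, here is a minimal majority-vote sketch in Python. It assumes the reads are already mapped without gaps and are given as hypothetical (0-based start, sequence) pairs; positions covered by fewer than `min_coverage` reads are reported as `N`. This is an illustration of the idea only, not how the CLC Extract Consensus Sequence tool is implemented (the tool has its own options, e.g. for low-coverage regions, that are not modeled here).

```python
from collections import Counter


def consensus(ref_length: int, aligned_reads: list[tuple[int, str]],
              min_coverage: int = 1) -> str:
    """Majority-vote consensus from gap-free reads placed at 0-based start positions."""
    columns = [Counter() for _ in range(ref_length)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_length:  # ignore bases that overhang the reference
                columns[pos][base] += 1
    bases = []
    for counts in columns:
        if sum(counts.values()) < min_coverage:
            bases.append("N")  # too few reads to call a base here
        else:
            bases.append(counts.most_common(1)[0][0])  # most frequent base wins
    return "".join(bases)


# Toy example (made-up reads): four reads placed on an 8-bp reference.
reads = [(0, "ACGT"), (2, "GTTA"), (2, "GAT"), (4, "TAC")]
print(consensus(8, reads))                  # ACGTTACN
print(consensus(8, reads, min_coverage=2))  # NNGTTANN
```

Raising `min_coverage` trades completeness for reliability: poorly covered positions become `N` instead of trusting a single read.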

Guided genome assembly – ORF detection

• Classical Sequence Analysis, Nucleotide Analysis

• Find Open Reading Frames to predict genes in the newly assembled genome

• Choose genetic code #1 (Standard)

• Export the results as .txt or .xls files

• (Post-processing: sequence extraction, BLAST against known proteins, annotation)
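Genetic code #1 is NCBI's Standard code, whose stop codons are TAA, TAG, and TGA. The sketch below scans all six reading frames for ORFs running from ATG to the next in-frame stop codon. Using ATG as the only start codon and a minimum length in codons (`min_codons`, a hypothetical parameter) are simplifying assumptions; the CLC Find Open Reading Frames tool offers more options than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of genetic code #1 (Standard)
START_CODON = "ATG"                  # assumption: only ATG is used as a start codon


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for ATG..stop ORFs in all six frames.

    Coordinates are 0-based, end-exclusive, on the forward strand,
    and include the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:  # map reverse-strand coordinates back to the forward strand
                            orfs.append((strand, n - end, n - start))
                    start = None
    return sorted(orfs, key=lambda orf: orf[1])


toy = "CCATGAAATTTTAGCC"  # made-up sequence: ATG AAA TTT TAG in frame 3
print(find_orfs(toy, min_codons=4))                      # [('+', 2, 14)]
print(find_orfs(reverse_complement(toy), min_codons=4))  # [('-', 2, 14)]
```

In practice the predicted ORFs would then go into the post-processing steps listed above (extraction, BLAST, annotation).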

CLC GWB workflows

• Workflows in CLC GWB are predefined analysis pipelines that automate data input and output creation for NGS data

• In the CLC GWB graphical interface, the user can drag and drop elements (inputs and tasks) and connect them into a pipeline

• Workflows for common tasks (e.g. RNA-seq analysis) can be shared with other users and researchers

Resources

• Workflows: http://resources.qiagenbioinformatics.com/tutorials/Workflow-intro.pdf

• Data files: http://resources.qiagenbioinformatics.com/testdata/chrM-tutorial-data.zip [Download this!]

• Two sets of reads: normal tissue and cancer tissue

• Human mitochondrial genome and annotation: the mitochondrial genome sequence, genes and CDS (37 mitochondrial genes)

• SNV table


What will the workflow do?

• Alignment of reads to the reference genome

• Local realignment of reads to improve alignment quality

• Detect variants and filter them against known variants
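The last step, filtering detected variants against known ones, boils down to removing every detected variant that also appears in the known-variant set. A minimal sketch follows; matching on the exact (chromosome, position, reference allele, alternative allele) tuple is an assumption, and the `Variant` type and the records are hypothetical, made-up examples rather than data from the tutorial files.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position (assumption)
    ref: str
    alt: str


def filter_known(detected: list[Variant], known: list[Variant]) -> list[Variant]:
    """Keep only detected variants that are not in the known-variant list."""
    known_set = set(known)  # set lookup makes each check O(1)
    return [v for v in detected if v not in known_set]


# Made-up records for illustration only.
detected = [Variant("chrM", 100, "A", "G"), Variant("chrM", 150, "C", "T")]
known = [Variant("chrM", 100, "A", "G")]
print(filter_known(detected, known))
```

Because the match is exact, a known variant reported with a different allele at the same position is kept as novel.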

Selection of workflow elements and finished workflow

Click the ‘Add Element’ button at the bottom of the work panel to get the list of elements

Elements of the workflow

• Local Realignment tool: when reads are mapped to a genome, they sometimes misalign, especially around indels

• It uses information from other reads near an indel to realign reads across the indel

• Variant Detection:

• Basic Variant Detection: runs quickly, no error-model estimation

• Low Frequency Variant Detection: calls a subset of the variants found by Basic Variant Detection; slowest, uses error-model estimation

• Fixed Ploidy Variant Detection: calls a subset of the variants found by Low Frequency Variant Detection; the variants it does not call are likely due to mapping or sequencing errors
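To show what calling a variant without an error model means, here is a toy threshold-only caller for a single reference position: a non-reference allele is reported when the coverage, allele count, and allele frequency all pass fixed cut-offs. The thresholds and the function name are hypothetical and are not CLC's defaults; the real tools add further filters, and the Low Frequency and Fixed Ploidy tools use statistical models instead.

```python
from collections import Counter


def call_variants(ref_base: str, observed_bases: str, min_coverage: int = 10,
                  min_count: int = 2, min_frequency: float = 0.35):
    """Threshold-based calling at one position (no error model).

    observed_bases holds one base per read covering the position.
    Returns a list of (allele, count, frequency) tuples.
    """
    coverage = len(observed_bases)
    if coverage < min_coverage:
        return []  # too few reads to say anything
    calls = []
    for allele, count in Counter(observed_bases).items():
        freq = count / coverage
        if allele != ref_base and count >= min_count and freq >= min_frequency:
            calls.append((allele, count, round(freq, 3)))
    return calls


print(call_variants("A", "A" * 12 + "G" * 8))  # [('G', 8, 0.4)]
print(call_variants("A", "A" * 18 + "G" * 2))  # [] (frequency 0.1 is below 0.35)
```

A fixed frequency cut-off cannot tell a real low-frequency allele from sequencing noise, which is the gap that error-model estimation is meant to fill.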

Types of tracks (see page 536 of the CLC GWB manual)

• Sequence track: Displays the chromosomes of the reference genome.

• Reads track: Displays how the mapped reads align to the reference genome. Zoomable.

• Variant track: Displays information on allele variants at the base pair level. Variants can be SNV, MNV, replacement, insertion, or deletion. [double-click to visualize in table format]

• Annotation track: Displays the location and length of genetic elements such as genes, CDS, and peaks (ChIP-seq). [double-click to visualize in table format]

• The first four track types are present in the example

• Other track types: coverage graphs, expression tracks

Tracks: practical examples

• Add all outputs from the workflow to a track list (and add the genome as well)

• ‘Track tools’, ‘Create track list’

• Examine the created tracks

• Double-click the annotation tracks (genomeTracks, Gene) and find the ATP genes in the mitochondrial genome

• Search for genes whose Name contains ‘ATP’

• Filter the variants (normalData) for coverage >= 100

• Create a GC content graph for the NC_001807 genome: Track tools, Graphs, Create GC content graph

• This graph displays the GC% along the whole genome sequence
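A GC content graph is essentially the GC percentage computed in windows that slide along the sequence. The sketch below assumes a fixed window and step size (hypothetical values, not the settings of the CLC Create GC content graph tool) and reports each window by its 0-based start position.

```python
def gc_content_windows(seq: str, window: int = 100,
                       step: int = 50) -> list[tuple[int, float]]:
    """Return (window start, GC%) for each full window along the sequence."""
    seq = seq.upper()
    points = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        points.append((start, 100.0 * gc / window))
    return points


print(gc_content_windows("GGCCAATT", window=4, step=2))
# [(0, 100.0), (2, 50.0), (4, 0.0)]
```

Smaller windows show local GC fluctuations; larger windows smooth them out. Sequences shorter than one window give no points.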

IPA (Ingenuity Pathway Analysis) Plugin

• Ingenuity Pathway Analysis: https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis/

• A GUI tool for analyzing complex ‘omics’ data: gene network analysis and visualization, upstream regulators, disease analysis, pathway analysis

• The results of RNA-seq analysis (week 8) can be uploaded to the IPA server (week 11)

• For this we use the integrated IPA plugin

• Manual: http://resources.qiagenbioinformatics.com/manuals/ingenuitypathwayintegration/current/Ingenuity_Pathway_Analysis.pdf


IPA

Working with plugins

• Manage plugins:

• Manage existing plugins

• Update existing plugins

• Download plugins: Download and Install

• Install from file (.cpa file)

• Search for the Ingenuity Pathway Analysis plugin

IPA Plugin

• Used mainly for statistical comparison data generated with the RNA-seq tools, e.g. the Differential Expression for RNA-Seq tool

• This requires an IPA user ID and password

• You can either upload data only, or upload and analyze

• Bonferroni

• Select log2 fold change
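The two quantities above can be illustrated with a small worked example. The Bonferroni correction multiplies each p-value by the number of tests m and caps the result at 1; the log2 fold change is positive for up-regulated and negative for down-regulated genes. In this sketch the fold change is a simple ratio of mean expression values with a pseudocount, which is an assumption for illustration and not necessarily how the CLC Differential Expression for RNA-Seq tool computes it (that tool fits a statistical model).

```python
import math


def log2_fold_change(mean_treated: float, mean_control: float,
                     pseudocount: float = 1.0) -> float:
    """Simple log2 ratio of means; the pseudocount avoids division by zero."""
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: min(1, p * m) for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


print(log2_fold_change(31, 7))        # 2.0  (4-fold up)
print(log2_fold_change(7, 31))        # -2.0 (4-fold down)
print(bonferroni([0.01, 0.04, 0.5]))  # approximately [0.03, 0.12, 1.0]
```

Bonferroni is conservative: with tens of thousands of genes, only very small raw p-values survive the correction.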

Data sets uploaded in IPA

Data sets in CLC GWB

Thanks for your attention!
