Tools and Algorithms in Bioinformatics
CLC Genomics Workbench
September 22, 2017
Dr. Matthew Cserhati
GCBA Guda lab
Logging on
• Open a Remote Desktop session, working in pairs
• IP, password, and ID will be handed out in class
• Right-click CLC GWB and run as Administrator
• Let Dr. Guda and me help each group with the password
Outline
• Introduction to CLC Genomics Workbench
• Guided genome assembly
• Workflows
• Plugins
• IPA analysis
CLC Genomics Workbench - Introduction
• Manual: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/User_Manual.pdf
• Software downloadable at (if your lab is interested): https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/
• Must pay for license
Just For Your Information:
CLC Genomics Workbench - Introduction
• Widely used, cutting-edge, multifunctional Windows-based NGS analysis and visualization platform
• Allows you to do guided genome assembly, RNA-seq analysis, epigenomic analysis, de novo sequencing (see Genome Assembly class, week 15, Dec. 1), and microarray analysis
• Allows you to import your own NGS data or download data from the Internet
• Workflow configuration (task automation)
• Multiple plugins which add extra functionality
• E.g. IPA, ChIP-seq, MetaGeneMark
Guided genome assembly
• As opposed to de novo assembly, a genome from a related species is used to guide the assembly
• Task: assemble the genome of an unknown NucleoCytoplasmic Large DNA Virus (NCLDV), with ID “GD12”
• Guide genome: Paramecium bursaria chlorella virus 1 (PBCV-1) genome, NCBI ID: JF411744.1
• Find the paired-end reads in the folder “guided_assembly”
PBCV-1
Guided genome assembly
• NGS Core Tools
• Trim reads (5 minutes)
• NGS Core Tools, Trim Sequences
• Map reads to reference (10 minutes)
• Creates summary
• Mapped reads
• Unmapped reads
• plus log
• Extract Consensus Sequence from mapped reads (5 minutes)
• This way we get the assembled genome sequence
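The consensus step can be pictured with a toy example. The sketch below is not the algorithm CLC GWB uses; it assumes the reads have already been placed at 0-based reference positions without gaps, takes the most frequent base in each column, and writes "N" where no read covers a position. The function name, the `min_cov` parameter, and the reads are made up for illustration.

```python
from collections import Counter

def consensus(ref_len, mapped_reads, min_cov=1):
    """Majority-vote consensus from ungapped reads.

    mapped_reads: list of (start, sequence), start is 0-based on the reference.
    Positions covered by fewer than min_cov reads are reported as 'N'.
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in mapped_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    out = []
    for col in columns:
        if sum(col.values()) < min_cov:
            out.append("N")  # not enough coverage to call a base
        else:
            out.append(col.most_common(1)[0][0])
    return "".join(out)

# Hypothetical reads mapped to an 8 bp reference
reads = [(0, "ACGT"), (2, "GTTA"), (2, "GTTA"), (3, "TTAC")]
print(consensus(8, reads))
```

A real consensus caller also has to handle gaps, base qualities, and conflicting bases, which this sketch leaves out.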
Guided genome assembly – ORF detection
• Classical Sequence Analysis
• Nucleotide Analysis
• Find Open Reading Frames to predict genes in the newly assembled genome
• Choose Genetic code #1
• Export results to .txt, .xls
• (Post-processing: sequence extraction, blast against known proteins, annotation)
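To see what ORF detection does in principle, here is a minimal sketch under stated assumptions: genetic code #1 (the standard code) with ATG as the only start codon and TAA, TAG, TGA as stop codons, scanning all three frames on both strands. It is not the CLC GWB implementation, and the function names and `min_len` parameter are chosen for illustration only.

```python
STOPS = {"TAA", "TAG", "TGA"}          # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_len=30):
    """Return (strand, start, end, orf) tuples.

    Coordinates are 0-based on the scanned strand, end exclusive.
    An ORF runs from the first ATG in a frame to the next in-frame stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs

# Made-up sequence with one short ORF on the forward strand
print(find_orfs("CCATGAAATTTGGGTAACC", min_len=9))
```

The extracted ORF sequences are what would then go into the post-processing steps (translation, BLAST against known proteins, annotation).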
CLC GWB workflows
• Workflows in CLC GWB are pre-designed analysis pipelines that automate data input and output creation for NGS data
• Working in the CLC GWB GUI, the user can
• Drag and drop elements
• Inputs
• Tasks
• Connect them together into a pipeline
• Workflows can be made available to other users/researchers for common tasks
• E.g. RNA-seq analysis
Resources
• Workflows: http://resources.qiagenbioinformatics.com/tutorials/Workflow-intro.pdf
• Data files: http://resources.qiagenbioinformatics.com/testdata/chrM-tutorial-data.zip [Download this!]
• Two sets of reads
• Normal tissue
• Cancer tissue
• Human mitochondrial genome and annotation
• Mit. genome sequence
• (37 mitochondrial) Genes and CDS
• SNV table
What will the workflow do?
• Alignment of reads to the reference genome
• Re-alignment of reads for better quality
• Detect variants and filter them against known variants
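The last step, filtering against known variants, amounts to a set lookup. The sketch below is only an illustration of the idea, not what the workflow element does internally; it assumes a variant is identified by (chromosome, position, reference allele, alternative allele), and the coordinates are made up rather than taken from the tutorial data.

```python
def filter_known(called, known):
    """Keep only called variants that are absent from the known-variant list."""
    known_keys = set(known)
    return [v for v in called if v not in known_keys]

# Hypothetical variants: (chromosome, position, ref allele, alt allele)
called = [("chrM", 152, "T", "C"), ("chrM", 3010, "G", "A"), ("chrM", 9000, "A", "G")]
known = [("chrM", 152, "T", "C"), ("chrM", 3010, "G", "A")]

novel = filter_known(called, known)
print(novel)
```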
Selection of workflow elements and finished workflow
Click on ‘Add Element’ button at bottom of work panel to get list
Elements of the workflow
• Local Realignment Tool: When reads are mapped to a genome, they sometimes misalign
• Indels
• Uses information from other reads near indel to realign reads across indel
• Variant Detection:
• Basic Variant Detection: runs quickly, no error-model estimation
• Low Frequency Variant Detection: calls subset of basic variant detection; slowest, uses error-model estimation
• Fixed Ploidy Variant Detection: calls subset of variants from Low Freq Var Detection; difference is likely due to mapping or sequencing errors
Types of tracks (see page 536 of CLC GWB manual)
• Sequence track: Displays the chromosomes of the reference genome.
• Reads track: Displays how all of the (mapped) reads map to the reference genome. Zoomable.
• Variant track: Displays information on allele variants at the base pair level. Variants can be SNV, MNV, replacement, insertion, or deletion. [double-click to visualize in table format]
• Annotation track: Displays location and length of different genetic elements. [double-click to visualize in table format]
• Gene, CDS, peaks (ChIP-Seq)
• These first four are present in the example
• Other types of tracks: coverage graph, expression tracks
Tracks practical examples
• Add all outputs from workflow to track list (plus add Genome)
• ‘Track tools’, ‘Create track list’
• Examine the created tracks
• Double-click on the annotation tracks (genomeTracks, Gene)
• Find the ATP genes in the mitochondrial genome
• Search for Name contains ATP
• Filter for variants where the coverage >= 100 (normalData)
• Create a GC content graph for the NC_001807 genome
• Track tools, Graphs, Create GC content graph
• This graph displays the GC% all along the genome sequence
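What the GC content graph plots can be sketched in a few lines. This is a simplified stand-in, not the CLC GWB tool: it assumes fixed-size windows along the sequence, and the `window` and `step` parameters are illustrative choices.

```python
def gc_content_windows(seq, window=100, step=100):
    """Return (window_start, GC percent) pairs along the sequence."""
    seq = seq.upper()
    points = []
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        if not chunk:
            break  # empty input sequence
        gc = chunk.count("G") + chunk.count("C")
        points.append((start, 100.0 * gc / len(chunk)))
    return points

# Made-up 8 bp sequence, 4 bp windows
print(gc_content_windows("GGCCAATT", window=4, step=4))
```

Plotting the second value of each pair against the first gives the GC% curve along the genome.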
IPA (Ingenuity Pathway Analysis) Plugin
• Ingenuity Pathway Analysis: https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis/
• A GUI tool for analyzing complex ‘omics’ data
• Gene network analysis and visualization
• Upstream regulators
• Disease analysis
• Pathway analysis
• The results of RNA-seq analysis (week 8) can be uploaded to the IPA server (week 11)
• For this we use an integrated IPA-plugin
• Manual: http://resources.qiagenbioinformatics.com/manuals/ingenuitypathwayintegration/current/Ingenuity_Pathway_Analysis.pdf
IPA
Working with plugins
• Manage plugins
• Manage existing plugins
• Update existing plugins
• Download plugins
• Download and Install
• Install from file
• .cpa file
• Search for Ingenuity Pathway Analysis plugin
IPA Plug-in
• Used mainly for statistical comparison data generated with the RNA-seq tools
• Differential Expression for RNA-Seq tool
• For this you need a user ID and password in IPA
• You can either
• Upload data only
• Upload and analyze
• Bonferroni
• Select log2 fold change
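For reference, the two quantities named in the last two bullets can be computed as follows. This is a minimal sketch assuming the textbook definitions; CLC GWB and IPA may compute or normalize them differently, and the pseudocount and the numbers are illustrative only.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

def bonferroni(pvalues):
    """Bonferroni correction: multiply each p-value by the number of tests, cap at 1."""
    n = len(pvalues)
    return [min(p * n, 1.0) for p in pvalues]

# Made-up values for illustration
print(log2_fold_change(7, 1))
print(bonferroni([0.01, 0.5, 0.04]))
```

A positive log2 fold change means higher expression in the treated sample, a negative one means lower expression.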
Data sets uploaded in IPA
Data sets in CLC GWB
Thanks for your attention!