Fall, 2017 GCBA/MGCB/BMI 815
Tools and Algorithms in Bioinformatics
GCBA815/MGCB815/BMI815, Fall 2017
Week 10: Biological Data Visualization
Jasjit Banwait, Ph.D.
Bioinformatics Data Analyst (Guda Lab)
Department of Genetics, Cell Biology and Anatomy
University of Nebraska Medical Center
Outline
1. Introduction to Biological Data Visualization
2. Data Visualization Tools
   1. Cytoscape – Biological Network Visualization Tool
   2. Circos - Genome-wide Circular Plot
Data Deluge
5 V’s of Biological Data
Biological data has:
• Variety
• Variability
• Volume
It needs:
• Veracity
• Visualization
Schultze, J. (2015). Teaching 'big data' analysis to young immunologists. Nature Immunology, 16, 902-905. doi:10.1038/ni.3250
Why Is Visualization Important?
• Makes the data accessible
• Uses computational power to reveal the global picture
• Enables insight
• Enables integrative research
• Communicates findings
Adapted from: Alexander Lex, Harvard School of Engineering and Applied Sciences
http://selection.datavisualization.ch/
Introduction to Network Visualization with Cytoscape
• Cytoscape is a platform for visualizing molecular interaction networks and biological pathways
• Allows integrating networks with annotations, gene expression profiles, and other attribute data
• Open-source platform
• Current version (as of Fall 2017): 3.5.1
• http://www.cytoscape.org/
Networks
• A network is a mathematical structure composed of points (nodes) connected by lines (edges).
• Networks can be undirected or directed, depending on whether the interaction between two neighboring nodes proceeds in both directions or in only one direction, respectively.
Network vs. graph terminology:
• Nodes = Vertices (points)
• Links = Edges (lines)
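To make the directed/undirected distinction concrete, here is a minimal Python sketch (not part of Cytoscape) that stores a network as an adjacency map. The helper `build_adjacency` is hypothetical; in the undirected case each link is simply recorded in both directions.

```python
def build_adjacency(edges, directed=False):
    """Map each node to the set of nodes reachable from it in one step."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())  # keep nodes with no outgoing links
        if not directed:
            # An undirected link can be traversed in both directions.
            adjacency[target].add(source)
    return adjacency


edges = [("A", "B"), ("B", "C")]
print(build_adjacency(edges))                 # B links to both A and C
print(build_adjacency(edges, directed=True))  # B links only to C
```

With `directed=True`, node C ends up with an empty neighbor set because no edge leaves it.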
Networks as Graphs
• A network can be connected (a single component) or disconnected (several disjoint components).
• A connected network with no cycles is termed a tree. The more cycles a network has, the more complex it is.
[Figure: examples of connected and disconnected graphs, trees, and cyclic graphs]
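These definitions can be checked programmatically. The sketch below uses hypothetical helper names, assumes an undirected network whose edge endpoints all appear in `nodes`, finds connected components with breadth-first search, and tests the tree property: a network with n nodes is a tree exactly when it is connected and has n - 1 edges.

```python
from collections import deque


def connected_components(nodes, edges):
    """Group the nodes of an undirected network into connected components (BFS)."""
    neighbors = {node: set() for node in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def is_tree(nodes, edges):
    """A network is a tree iff it is connected and has exactly n - 1 edges."""
    return len(connected_components(nodes, edges)) == 1 and len(edges) == len(nodes) - 1


nodes = ["A", "B", "C", "D"]
print(is_tree(nodes, [("A", "B"), ("B", "C"), ("C", "D")]))              # True
print(is_tree(nodes, [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]))  # False (cycle)
print(len(connected_components(nodes, [("A", "B"), ("C", "D")])))        # 2
```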
Basic Types of Networks
• Path
• Stars
• Cycles
• Complete Graphs
• Bipartite Graphs
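Each basic type can be generated as an edge list over nodes numbered 0 to n - 1, which also shows how their edge counts differ: n - 1 for a path or star, n for a cycle, and n(n - 1)/2 for a complete graph. The function names are illustrative, and the bipartite example covers only the complete bipartite case, one particular kind of bipartite graph.

```python
from itertools import combinations


def path_graph(n):
    """0 - 1 - 2 - ... - (n-1)."""
    return [(i, i + 1) for i in range(n - 1)]


def star_graph(n):
    """Node 0 is the hub; nodes 1..n-1 are leaves attached only to it."""
    return [(0, i) for i in range(1, n)]


def cycle_graph(n):
    """A path whose last node links back to the first (needs n >= 3)."""
    if n < 3:
        raise ValueError("a simple cycle needs at least 3 nodes")
    return path_graph(n) + [(n - 1, 0)]


def complete_graph(n):
    """Every pair of distinct nodes is linked."""
    return list(combinations(range(n), 2))


def complete_bipartite_graph(a, b):
    """Nodes 0..a-1 form one part, a..a+b-1 the other; every cross pair is linked."""
    return [(u, v) for u in range(a) for v in range(a, a + b)]


for name, edges in [("path", path_graph(5)), ("star", star_graph(5)),
                    ("cycle", cycle_graph(5)), ("complete", complete_graph(5)),
                    ("bipartite K(2,3)", complete_bipartite_graph(2, 3))]:
    print(name, len(edges))
```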
Air Transportation Network
https://flowingdata.com/2016/05/31/air-transportation-network/
Social Network
http://smartblogs.com/wp-content/uploads/2014/03/social-network1.jpg
Infection Spread Risk Network
Gomes MFC, Pastore y Piontti A, Rossi L, Chao D, Longini I, Halloran ME, Vespignani A. Assessing the International Spreading Risk Associated with the 2014 West African Ebola Outbreak. PLOS Currents Outbreaks. 2014 Sep 2. Edition 1. doi: 10.1371/currents.outbreaks.cd818f63d40e24aef769dda7df9e0da5.
Network Biology
• Systems biology approach
• Represents molecular entities
• Represents interactions
• Data types
  • Pathways/reactions
  • Interaction networks
Biological Interaction Networks
• Nodes: genes or other molecules
• Edges: interactions, which can carry weights and directions
Magtanong et al. 2011 Nature
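One way to record that edges can carry weights and directions is a small record type per interaction. The `Interaction` class and `outgoing_weight` function below are hypothetical, and the gene names are placeholders rather than real interaction data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One edge of a biological interaction network (hypothetical record type)."""
    source: str
    target: str
    weight: float = 1.0     # e.g., a confidence score or interaction strength
    directed: bool = False  # True if the interaction acts only from source to target


def outgoing_weight(interactions, node):
    """Sum the weights of edges that can be followed away from `node`:
    directed edges that start at it, plus undirected edges that touch it."""
    total = 0.0
    for edge in interactions:
        if edge.source == node or (not edge.directed and edge.target == node):
            total += edge.weight
    return total


network = [
    Interaction("geneA", "geneB", weight=0.5),
    Interaction("geneC", "geneA", weight=0.25, directed=True),
    Interaction("geneA", "geneC", weight=0.25, directed=True),
]
print(outgoing_weight(network, "geneA"))  # 0.75
print(outgoing_weight(network, "geneB"))  # 0.5
```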
Biological Networks
• Intracellular networks
• Protein interaction networks
• Metabolic networks
• Signaling networks
• Gene regulatory networks
• Disease networks
• Intercellular networks
• Organ and tissue networks
• Ecological networks
• Evolution networks
Why Study Networks?
• It is increasingly recognized that complex systems cannot be described from a purely reductionist view.
• Understanding the behavior of such systems starts with understanding the topology of the corresponding network.
• Topological information is fundamental to constructing realistic models of how the network functions.
The Protein Network of Drosophila
K.G. Guruharsha et al. A Protein Complex Network of Drosophila melanogaster. Cell, Volume 147, Issue 3, 2011, Pages 690-703. ISSN 0092-8674.
Importance of Networks in Biology
• Which interactions and groups of interactions are likely to have equivalent functions across species?
• Based on these similarities, can we predict new functional information about proteins and interactions that are poorly characterized?
• What do these relationships tell us about the evolution of proteins, networks, and whole species?
Downloading Cytoscape
1. Go to http://www.cytoscape.org/
2. Click on “Download 3.5.1”.
3. On the downloads page, pick the version of Cytoscape that matches your operating system.
4. For this class, we will download Cytoscape 3.5.1 for Windows (32-bit). Note: this is not the recommended version. If you have access to a 64-bit machine, download the 64-bit version of the latest Cytoscape for smoother, faster data visualization.
5. This version requires Java 8 to be installed on your machine.
6. Once the download completes, install the software by double-clicking the downloaded file.
7. After installation finishes, open Cytoscape from the Start menu.
Welcome Screen
User Interface
[Figure: Cytoscape user interface, with the network management panel, main network view window, and attribute browser panel labeled]
Control Panel
The Network tab of the Control Panel lists the available networks by name and shows the number of nodes and edges in each.
The Style interface is a tab in the Control Panel and is divided into three tabs, for Node, Edge, and Network properties.
1. Save Session
• Go to File -> Save to save the session.
2.1 Load Data: Import from Public Databases
2.2 Load Data: Import from Public Databases
1. Enter search conditions and click Search.
2. Choose databases.
3. Click Import.
2.3 Load Data: Import Result
Alternate Ways to Load Data: Import from Files
• Data sources
  • Literature
  • Searches in other databases not listed in Cytoscape
  • Your own experiments (e.g., correlation between genes)
• Common formats
  • SIF
  • A table
Building & Storing Interaction Network
Network input file (edge list file):
Source  Target
A       B
A       C
A       D
C       D
D       B
[Figure: the network drawn from this edge list, with nodes A, B, C, and D]
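The edge list above can be read and summarized in a few lines. This sketch assumes a header line followed by one tab- or whitespace-separated Source/Target pair per line, and treats the network as undirected when counting degrees; the helper names are hypothetical.

```python
def read_edge_list(text):
    """Parse 'Source<TAB>Target' lines (the first line is a header) into edge tuples."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    edges = []
    for line in lines[1:]:  # skip the 'Source Target' header
        source, target = line.split()  # split() handles tabs and spaces alike
        edges.append((source, target))
    return edges


def degrees(edges):
    """Count how many edges touch each node (undirected degree)."""
    counts = {}
    for source, target in edges:
        counts[source] = counts.get(source, 0) + 1
        counts[target] = counts.get(target, 0) + 1
    return counts


EDGE_LIST = "Source\tTarget\nA\tB\nA\tC\nA\tD\nC\tD\nD\tB\n"
edges = read_edge_list(EDGE_LIST)
print(degrees(edges))  # {'A': 3, 'B': 2, 'C': 2, 'D': 3}
```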
SIF File Format
• Each line: InteractorA <space or tab> Interaction_Type <space or tab> InteractorB
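A simplified SIF reader, written as an illustration rather than a copy of Cytoscape's importer. Assumptions: a line containing a tab is split on tabs (so node names may contain spaces), otherwise on whitespace; a line may list several targets after the interaction type; and a line with a single name declares a node without edges.

```python
def parse_sif(text):
    """Return (nodes, edges) from SIF text; edges are (source, interaction, target)."""
    nodes = {}  # dict used as an insertion-ordered set
    edges = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if "\t" in raw:
            fields = [f.strip() for f in raw.split("\t") if f.strip()]
        else:
            fields = raw.split()
        source = fields[0]
        nodes.setdefault(source, None)
        # Every field after the interaction type is a target of the same source.
        for target in fields[2:]:
            nodes.setdefault(target, None)
            edges.append((source, fields[1], target))
    return list(nodes), edges


nodes, edges = parse_sif("A pp B\nA pp C D\nE\n")
print(nodes)  # ['A', 'B', 'C', 'D', 'E']
print(edges)  # [('A', 'pp', 'B'), ('A', 'pp', 'C'), ('A', 'pp', 'D')]
```

Node E appears in the node list but in no edge, which is how a lone node is represented under these assumptions.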
Load Data: Import from File
• From Excel files or tab-delimited text tables
Load Attributes: Import Table from File
Node and edge attributes can be imported into Cytoscape as a table.
[Figure: importing a table as node attributes]
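Importing an attribute table amounts to a join on node names. The sketch below uses Python's `csv.DictReader`, assumes the first line holds the column names and one column holds the node names, and keeps every value as a string; the helper names and the sample table are hypothetical.

```python
import csv
import io


def read_attribute_table(text, key_column, delimiter="\t"):
    """Map each row's key (e.g., node name) to a dict of its remaining columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    table = {}
    for row in reader:
        key = row.pop(key_column)
        table[key] = row
    return table


def attach_attributes(nodes, table):
    """Give every node its attribute row; nodes absent from the table get {}."""
    return {node: table.get(node, {}) for node in nodes}


# Hypothetical attribute table: first line holds the column names.
TABLE = "name\texpression\tsignificant\nA\t1.5\tyes\nB\t-0.7\tno\n"
attributes = attach_attributes(["A", "B", "C"], read_attribute_table(TABLE, "name"))
print(attributes["A"])  # {'expression': '1.5', 'significant': 'yes'}
print(attributes["C"])  # {}
```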
3. Save Session
• Go to File -> Save to save the session before we modify the networks.
4. Get a Subnetwork
4.1 Go to the “Uniprot” network.
4.2 Search for “TP53” by typing the gene name in the search box at the top of the network window.
4.3 This will select the node representing the TP53 gene.
If the searched node is present in the network, it is highlighted in yellow, and the table panel shows information about it.
4.4 Now, select the first neighbors of the selected node from the top menu bar.
Note: “Hide unselected nodes” will overwrite the existing large network.
4.5 Create a subnetwork using the option in the menu bar.
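Steps 4.2 to 4.5 amount to selecting a seed node, adding its first neighbors, and keeping the edges among the selected nodes. A plain-Python sketch on a toy network (hypothetical node names, not real interaction data), treating edges as undirected:

```python
def first_neighbor_subnetwork(edges, seed):
    """Return (selected_nodes, subnetwork_edges) for `seed` and its first neighbors.

    The subnetwork keeps every edge whose two endpoints were both selected."""
    selected = {seed}
    for u, v in edges:
        if u == seed:
            selected.add(v)
        elif v == seed:
            selected.add(u)
    sub_edges = [(u, v) for u, v in edges if u in selected and v in selected]
    return selected, sub_edges


# Toy network with hypothetical node names (not real interaction data).
EDGES = [("HUB", "N1"), ("HUB", "N2"), ("N1", "N2"), ("N2", "FAR"), ("FAR", "N3")]
nodes, sub_edges = first_neighbor_subnetwork(EDGES, "HUB")
print(sorted(nodes))  # ['HUB', 'N1', 'N2']
print(sub_edges)      # [('HUB', 'N1'), ('HUB', 'N2'), ('N1', 'N2')]
```

The edge between N1 and N2 is kept even though neither endpoint is the seed, because both are selected first neighbors.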
4.6 Resulting Subnetwork
4.7 Rename Network
4.7.1 Rename the subnetwork “TP53_Network” by right-clicking the subnetwork name in the Control Panel and selecting the Rename Network option.
4.7.2 Type in “TP53_Network” as the name of the network.
Layouts
• A network layout is an algorithm that positions the nodes and edges of a network.
• Cytoscape offers a large variety of layouts, and apps (plugins) can add new ones.
• All layouts appear under the Layout menu.
[Figure: example network in Grid Layout]
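Grid and circular layouts are simple enough to write directly. These functions are illustrative and are not Cytoscape's implementations; they only assign (x, y) coordinates to each node.

```python
import math


def grid_layout(nodes, spacing=1.0):
    """Place nodes row by row on a roughly square grid."""
    columns = max(1, math.ceil(math.sqrt(len(nodes))))
    return {node: ((i % columns) * spacing, (i // columns) * spacing)
            for i, node in enumerate(nodes)}


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly around a circle, starting at angle 0."""
    n = len(nodes)
    return {node: (radius * math.cos(2 * math.pi * i / n),
                   radius * math.sin(2 * math.pi * i / n))
            for i, node in enumerate(nodes)}


print(grid_layout(["A", "B", "C", "D"]))
# {'A': (0.0, 0.0), 'B': (1.0, 0.0), 'C': (0.0, 1.0), 'D': (1.0, 1.0)}
print(circular_layout(["A", "B", "C", "D"]))
```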
5. Change the Layout of the Subnetwork
5.1 Go to the Layout menu and change the layout of the subnetwork to “organic”.
• Try other layouts, such as Circular and Prefuse Force Directed, and see which one fits the network best.
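Force-directed layouts such as Prefuse Force Directed roughly treat edges as springs and nodes as mutually repelling particles. The sketch below implements a simplified Fruchterman-Reingold scheme, a classic force-directed algorithm, not Prefuse's or Cytoscape's code; it omits the drawing-frame boundary and uses a fixed random seed so results are reproducible.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Simplified Fruchterman-Reingold layout; returns {node: (x, y)}."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size / math.sqrt(max(len(nodes), 1))  # ideal distance between nodes
    temperature = size / 10                   # maximum step length, shrinks over time
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Every edge pulls its endpoints together with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, by at most `temperature`.
        for v in nodes:
            length = math.hypot(*disp[v])
            if length > 0:
                step = min(length, temperature)
                pos[v][0] += disp[v][0] / length * step
                pos[v][1] += disp[v][1] / length * step
        temperature *= 0.95  # cool down so the layout settles
    return {v: (x, y) for v, (x, y) in pos.items()}


square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
layout = force_directed_layout(["A", "B", "C", "D"], square)
for node, (x, y) in layout.items():
    print(node, round(x, 3), round(y, 3))
```

The pairwise repulsion loop is quadratic in the number of nodes, which is one reason production tools use approximations for large networks.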
6. Removing Duplicate Edges and Self Loops
Resulting network after removing self-loops and duplicate edges
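Removing self-loops and duplicate edges can be sketched as a single pass over the edge list. This hypothetical helper treats (A, B) and (B, A) as duplicates when the network is undirected and ignores interaction types.

```python
def clean_edges(edges, directed=False):
    """Drop self-loops (u == v) and keep only the first copy of each duplicate edge."""
    seen = set()
    cleaned = []
    for u, v in edges:
        if u == v:
            continue  # self-loop
        key = (u, v) if directed else tuple(sorted((u, v)))
        if key in seen:
            continue  # duplicate edge
        seen.add(key)
        cleaned.append((u, v))
    return cleaned


raw = [("A", "B"), ("B", "A"), ("A", "A"), ("A", "C"), ("A", "B")]
print(clean_edges(raw))                 # [('A', 'B'), ('A', 'C')]
print(clean_edges(raw, directed=True))  # [('A', 'B'), ('B', 'A'), ('A', 'C')]
```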
7. Change Style of the TP53 Network
• In the Control Panel, select the “Style” tab. You can remove previous styles by clicking the drop-down arrow next to the applied style.
[Figure callout: open and modify]
Change the Style
Example: node color
• Column: the information used for the mapping; each choice is a column in the node table.
• Discrete mapping: set a value (e.g., a color) for each distinct value in that column.
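A discrete mapping is essentially a lookup table from column values to visual property values, with a default for values that have no entry. A minimal sketch; the node table, column name, and colors are hypothetical.

```python
def discrete_mapping(node_table, column, value_to_style, default):
    """Map each node's value in `column` to a visual property value (e.g., a color).
    Nodes whose value has no entry in `value_to_style` get `default`."""
    return {node: value_to_style.get(row.get(column), default)
            for node, row in node_table.items()}


# Hypothetical node table and color scheme (illustration only).
node_table = {
    "A": {"type": "kinase"},
    "B": {"type": "transcription factor"},
    "C": {"type": "kinase"},
    "D": {},
}
colors = {"kinase": "#1B9E77", "transcription factor": "#D95F02"}
print(discrete_mapping(node_table, "type", colors, default="#CCCCCC"))
# {'A': '#1B9E77', 'B': '#D95F02', 'C': '#1B9E77', 'D': '#CCCCCC'}
```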
7. Change the Style
7.1 Change the node color to “Green” in the TP53 network.
7.2 Change the node shape to “Ellipse”.
7.3 Select “Lock node width and height”.
7.4 In the Edge tab, change the edge color to “Red”.
Note: previous mappings are not removed automatically when you remove a style. While customizing the style of the network, make sure earlier settings are removed by clicking the “Delete” icon next to the style.
Resulting Network in Organic Layout
8. Saving a Visualization
To save the network as a high-quality image:
Apps in Cytoscape
• Cytoscape has many add-on tools called ‘Apps’
• Install them from Apps -> App Manager
• Apps support:
  • Advanced analysis
  • Biological analysis
  • Data integration
  • Importing special data types
Cytoscape Tutorial Resource
Tutorial:
1. Go to http://opentutorials.cgl.ucsf.edu/index.php/Portal:Cytoscape3
2. Select “Basic Expression Analysis in Cytoscape 3” under Cytoscape 3 User Tutorials.
3. Finish the tutorial up to the “Observe the Network” step.
Dataset:
1. Go to http://wiki.cytoscape.org/Presentations/04_Expression_Data
2. Download two data files:
   1. galFiltered.sif
   2. galExpData.pval (same as “galExpData.csv” in the tutorial)
While uploading the “galExpData.pval” file in Cytoscape, click Advanced Options and set the delimiter to “Space”. If your data has a header, make sure “Use first line as column names” is checked. Click “OK”.
Circos
• A data visualization tool based on a circular layout, developed by Krzywinski, M. et al.
• The Circos software can be accessed at http://circos.ca/
• It is well suited to visualizing genomic data, although other data can also be used.
Circos (Examples)
• Linear layouts have limitations
http://www.nature.com/nrc/journal/v13/n7/full/nrc3537.html
Circos (Examples)
• Linear layouts have limitations
Source: Circos website
How Does Circos Work?
• Works from a single configuration file
• Karyotype file
  • Backbone of the Circos image
  • Chromosomes and their lengths/locations (optional: bands)
  • Format: chr - ID LABEL START END COLOR
  • The first two fields are always “chr” and “-”
• Data track files
  • Give meaning to the backbone ideogram
  • Show associations/characteristics of different data points
  • Track types: links, tiles, histograms, scatter plots, highlights, connectors, and text
• Other configurations
  • Define how the karyotype and data tracks are presented in the Circos image
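The line formats above can be generated programmatically. This sketch writes karyotype lines in the `chr - ID LABEL START END COLOR` layout, plus one link line and one histogram/scatter-style value line. The one-line link format (`chr1 start1 end1 chr2 start2 end2`) and the value format (`chr start end value`) are assumptions based on Circos' data file conventions, and the IDs, lengths, and colors are made up for illustration.

```python
def karyotype_line(chrom_id, label, length, color):
    """Format one karyotype line: chr - ID LABEL START END COLOR (START is 0 here)."""
    for field in (chrom_id, label, color):
        if not field or any(ch.isspace() for ch in field):
            raise ValueError("ID, LABEL and COLOR must be non-empty and contain no whitespace")
    if length <= 0:
        raise ValueError("chromosome length must be positive")
    return f"chr - {chrom_id} {label} 0 {length} {color}"


def link_line(chr1, start1, end1, chr2, start2, end2):
    """One link-track line connecting two genomic intervals (assumed one-line format)."""
    return f"{chr1} {start1} {end1} {chr2} {start2} {end2}"


def value_line(chrom, start, end, value):
    """One histogram/scatter-plot track line: chr start end value (assumed format)."""
    return f"{chrom} {start} {end} {value}"


# Made-up chromosomes; colors are assumed to exist in your Circos color configuration.
chromosomes = [("hs1", "1", 1_000_000, "red"), ("mm1", "1", 800_000, "blue")]
print("\n".join(karyotype_line(*c) for c in chromosomes))
print(link_line("hs1", 100_000, 150_000, "mm1", 400_000, 450_000))
print(value_line("hs1", 0, 50_000, 0.75))
```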
ClicO FS: an interactive web-based service for Circos
Tutorial
1. Go to http://codoncloud.com:3000/home
2. Click on the Demo tab.
3. Type in the name of the project, e.g., “ClassDemo”.
4. Enter the CAPTCHA.
5. Click on “Start a Demo Project”.
1. Select Karyotype
• Select both the Human and Mouse karyotypes for the demo.
2. Other Configurations/Settings
Repeat the color settings for the Human karyotype and proceed to the next step. Keep moving forward until you reach the Data Tracks window.
3. Data Tracks
Proceed to the next step. You can change the data track settings; we use the default settings for the demo.
4. Output
• Click the “Output” tab to visualize the Circos diagram.
Circos
• Complex Circos figures can be created using multiple configuration files, where each file defines a different layer, style, or figure type, such as a heatmap or box plot.
• Learn more about Circos at http://goo.gl/oeeKBw (full link: http://circos.ca/documentation/tutorials)