Contents
1. Introduction
2. User Guide
2.1. Installation (please check the Requirements section prior to this)
2.2. Importing data from file
2.3. Import data from database
2.4. Query data from database
2.5. Create the disease similarity network
2.6. Candidate Gene Prioritization
3. Developer Guide
3.1. Set up the IDE
3.2. Create the project in Eclipse
3.3. Run the plugin
4. Requirements
5. References
Cytoscape plugin: iCTNet User & Developer Manual
1. Introduction
iCTNet (integrated Complex Traits Networks) consists of a database and a tool to create and analyze human complex trait networks. It assembles and integrates information from genome-wide association studies (GWAS), protein-protein interactions, tissue expression, and drug targets, with the goal of identifying novel relationships across several domains that may help elucidate a new classification, pathogenic mechanism, or treatment for common human traits. To the best of our knowledge, iCTNet is the first effort to integrate multiple layers of information as multipartite networks, thus enabling systematic analysis of human complex traits. The Cytoscape [1] core is built on a plugin architecture, so it can be extended with tailored functions. The iCTNet plugin has been developed to visualize and analyze genetic relationships among the entities of the integrated human complex traits networks.
The database associated with iCTNet comprises five layers: trait-gene/protein, protein-protein, trait-tissue, tissue-gene and drug-gene interactions. Networks can be created either from the demo files available on our website (http://www.cs.queensu.ca/ictnet) or by importing data directly from our online database through iCTNet.
2. User Guide
2.1. Installation (please check the Requirements section prior to this)
2.1.1. The iCTNet plugin is distributed as a .jar file. The first step is to copy iCTNet.jar and the external library mysql-connector-java-3.2.0-alpha-bin.jar (the MySQL Java database connector [2]) into the plugins folder in the Cytoscape root folder, as shown in Figure 1. The second .jar file is a third-party library needed to connect to the MySQL database associated with iCTNet.
Figure 1
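Step 2.1.1 is a plain file copy. For readers who prefer to script it, a minimal Java sketch follows; the class and method names are hypothetical, it needs Java 7 or later for java.nio.file, and it assumes both jars were first downloaded into a single folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical helper for step 2.1.1: copies the plugin jar and the MySQL
 * connector jar into the "plugins" folder under the Cytoscape root folder.
 */
final class PluginInstaller {
    /** The two files named in step 2.1.1. */
    static final String[] JARS = {
        "iCTNet.jar",
        "mysql-connector-java-3.2.0-alpha-bin.jar"
    };

    private PluginInstaller() {}

    /** Copies both jars from downloadDir and returns the plugins folder they were copied into. */
    static Path install(Path downloadDir, Path cytoscapeRoot) throws IOException {
        Path plugins = cytoscapeRoot.resolve("plugins");
        Files.createDirectories(plugins); // does nothing if the folder already exists
        for (String jar : JARS) {
            Path source = downloadDir.resolve(jar);
            if (!Files.isRegularFile(source)) {
                throw new IOException("Missing file: " + source);
            }
            // Overwrite an older copy left over from a previous installation.
            Files.copy(source, plugins.resolve(jar), StandardCopyOption.REPLACE_EXISTING);
        }
        return plugins;
    }
}
```

Call install with the download folder and the Cytoscape root folder, then start Cytoscape as in step 2.1.2. REPLACE_EXISTING is used so that re-running the copy after downloading a newer iCTNet.jar overwrites the old file instead of failing.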
2.1.2. Once Cytoscape starts, iCTNet will be listed in the Plugins submenu, as shown in Figure 2.
Figure 2
2.2. Importing data from file
A total of six .txt files are available from our website: one network file, one node list file, and four node-attribute files (for diseases, genes, drugs and tissues, respectively). To generate the network, first import the network file “integrated_network.txt”, and then the node list and attribute files. Only the network file should be imported via iCTNet; the others are imported through the Cytoscape import menu. The node attribute files can be imported in any order.
2.2.1. Click iCTNet -> Import from local file.
2.2.2. Click the Select button in the Import Local Data dialog to read the iCTNet file (a text file in the specified format).
2.2.3. Once the local file is read, all disease names are shown in the PreView table. The user can choose which traits and interactions to import, as shown in Figure 3. Disease (or trait) interactions can be filtered by their association p-value (8.0 is the default value of –log(p) for our data). The distance from the associated genes to their neighbors in the protein-protein and protein-DNA interactions can be set with the drop-down menu next to each interaction type (see Section 2.3.4).
Figure 3
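The p-value filter of step 2.2.3 keeps an association only when its –log(p) is above the chosen value, so the default of 8.0 corresponds to p < 10^-8. The Java sketch below assumes a base-10 logarithm (the usual GWAS convention; the manual does not state the base) and a strict comparison, following the “greater than” wording of the Threshold option in Section 2.6.1. The class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the -log(p) filter on disease-gene associations (not iCTNet's code). */
final class PValueFilter {
    /** One disease-gene association with its GWAS p-value. */
    static final class Association {
        final String disease;
        final String gene;
        final double p;

        Association(String disease, String gene, double p) {
            this.disease = disease;
            this.gene = gene;
            this.p = p;
        }
    }

    private PValueFilter() {}

    /** Returns -log10(p); base 10 is an assumption. */
    static double minusLog10(double p) {
        if (!(p > 0.0 && p <= 1.0)) {
            throw new IllegalArgumentException("p-value must be in (0, 1]: " + p);
        }
        return -Math.log10(p);
    }

    /** Keeps associations whose -log10(p) is strictly greater than the threshold (8.0 by default in iCTNet). */
    static List<Association> filter(List<Association> all, double threshold) {
        List<Association> kept = new ArrayList<Association>();
        for (Association a : all) {
            if (minusLog10(a.p) > threshold) {
                kept.add(a);
            }
        }
        return kept;
    }
}
```

With the default of 8.0, an association with p = 1e-9 (–log10 p = 9) is kept, while one with p = 1e-5 (–log10 p = 5) is dropped.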
2.2.4. Click the Import button to load the network into Cytoscape.
2.3. Import data from database
2.3.1. Click iCTNet -> Import from database. The plugin will connect to the database on the server automatically.
2.3.2. Click the tabbed panel “Database Import”.
2.3.3. All the disease names will be listed on a panel similar to the one shown in Figure 3.
2.3.4. The user can set a value for the –log(p) filter to remove weak disease-gene associations; a higher –log(p) value indicates a more significant disease-gene association. The user can choose to download any of the following interaction types:
Disease-Gene
Disease-Tissue
Tissue-Gene
Drug-Gene
Protein-Protein
DNA-Protein
If Distance = 0 for Protein-Protein and DNA-Protein interactions, only genes directly connected to diseases will be imported. If Distance = 1, the first neighbors of the directly associated genes will also be included. Figure 4 shows examples of Distance 0 (left) and Distance 1 (right).
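The Distance option amounts to a breadth-first expansion from the genes directly associated with the selected diseases. A minimal Java sketch under that reading follows; the gene identifiers, the adjacency-map input and the class name are hypothetical, and although the code accepts any distance, the manual only describes 0 and 1.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the Distance option (not iCTNet's code). */
final class NeighborhoodExpansion {
    private NeighborhoodExpansion() {}

    /**
     * seeds: genes directly connected to the selected diseases.
     * interactions: undirected protein-protein / DNA-protein adjacency, gene -> partners.
     * Returns every gene within `distance` hops of a seed, using breadth-first search.
     */
    static Set<String> expand(Set<String> seeds, Map<String, Set<String>> interactions, int distance) {
        Set<String> included = new LinkedHashSet<String>(seeds);
        Set<String> frontier = new LinkedHashSet<String>(seeds);
        for (int hop = 0; hop < distance; hop++) {
            Set<String> next = new LinkedHashSet<String>();
            for (String gene : frontier) {
                Set<String> partners = interactions.get(gene);
                if (partners == null) {
                    continue; // gene has no interaction partners
                }
                for (String partner : partners) {
                    if (included.add(partner)) { // add() is false for genes already included
                        next.add(partner);
                    }
                }
            }
            frontier = next;
        }
        return included;
    }
}
```

With interactions A-B and B-C and seed A, Distance 0 returns {A} and Distance 1 returns {A, B}; C would only be reached at a distance of 2, which the manual does not describe.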
Figure 5 shows the disease-gene network for five common autoimmune diseases, Crohn’s disease, multiple sclerosis, psoriasis, type 1 diabetes and rheumatoid arthritis (Distance = 0; Tissue-Gene, Drug-Gene and Disease-Tissue unselected), displayed with the Cytoscape “Spring Embedded” layout.
Figure 4
Figure 5
Figure 6 shows the edge attributes of some connections in the network. There are three edge attributes in iCTNet: Flag, PubMed and interaction. The Flag attribute of a disease-gene connection is true if the disease node and the gene node share the same tissue node, and false otherwise. The PubMed attribute lists the PubMed IDs of the published articles reporting the corresponding disease-gene connection, downloaded from the GWAS catalog [3]. The interaction attribute gives the type of each connection in the network.
Figure 6
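The Flag rule reduces to a set-intersection test. A minimal sketch follows, assuming the disease-tissue and tissue-gene edges have already been collected into a map from each node to its tissue nodes; the map layout, class name and node names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the Flag edge attribute for one disease-gene edge. */
final class FlagAttribute {
    private FlagAttribute() {}

    /**
     * tissuesOf maps a disease or gene node to the tissue nodes it is connected to.
     * Returns true when the disease and the gene share at least one tissue node.
     */
    static boolean flag(String disease, String gene, Map<String, Set<String>> tissuesOf) {
        Set<String> diseaseTissues = tissuesOf.get(disease);
        Set<String> geneTissues = tissuesOf.get(gene);
        // A node without any tissue edge cannot share a tissue.
        return diseaseTissues != null && geneTissues != null
            && !Collections.disjoint(diseaseTissues, geneTissues);
    }
}
```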
Once a network is imported through iCTNet, it is straightforward to use other Cytoscape plugins for further analysis. Figure 7 shows the results of running the “Network Analysis” plugin (installed by default in Cytoscape 2.8) on the imported network.
Figure 7
2.4. Query data from database
2.4.1. Click iCTNet -> Import from database. The plugin will connect to the database located on the remote server automatically.
2.4.2. Enter a gene, disease, or tissue name in the text field and press the Search button. The % character is a wildcard. Figure 8 shows an example of searching for the genes associated with a disease.
Figure 8
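The manual does not say how the % wildcard is evaluated. Because the data live in a MySQL database, it plausibly behaves like the % of SQL LIKE, which matches any sequence of characters, including none. The sketch below reproduces those semantics locally; the class name, the case-insensitive matching and the exact-match behavior when no % is given are assumptions, and SQL's other wildcard, _, is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of '%' wildcard matching with SQL LIKE-style semantics. */
final class WildcardSearch {
    private WildcardSearch() {}

    /** Turns e.g. "type%diabetes" into a regex in which each '%' becomes ".*". */
    static Pattern toPattern(String query) {
        String[] literals = query.split("%", -1); // -1 keeps empty pieces around a leading/trailing '%'
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i])); // names may contain regex metacharacters
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns the names that match the whole query, in their original order. */
    static List<String> search(String query, List<String> names) {
        Pattern pattern = toPattern(query);
        List<String> hits = new ArrayList<String>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

For example, "type%diabetes" matches both “Type 1 diabetes” and “Type 2 diabetes”, while "%sis" matches names ending in “sis”, such as “Psoriasis”.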
2.5. Create the disease similarity network
2.5.1. Select a network containing disease nodes.
2.5.2. Click iCTNet -> Create Disease Similarity Net. The plugin will create the disease similarity network, in which two disease nodes are connected by an edge if they share at least one gene.
2.5.3. Figure 9 shows the similarity network created from the network of five common autoimmune diseases shown in Figure 5, where the color and width of the edges are proportional to the number of shared genes.
Figure 9
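Steps 2.5.2 and 2.5.3 connect two diseases whenever their gene sets overlap and use the overlap size to style the edge. A minimal Java sketch of that construction follows, assuming each disease's associated genes are already available as a set; the class and field names are hypothetical. It compares every pair of diseases, so its cost grows quadratically with the number of disease nodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the disease similarity network (not iCTNet's code). */
final class DiseaseSimilarityNetwork {
    /** An undirected disease-disease edge weighted by the number of shared genes. */
    static final class Edge {
        final String a;
        final String b;
        final int sharedGenes;

        Edge(String a, String b, int sharedGenes) {
            this.a = a;
            this.b = b;
            this.sharedGenes = sharedGenes;
        }
    }

    private DiseaseSimilarityNetwork() {}

    /** genesOf maps each disease to its associated genes; one edge per pair sharing at least one gene. */
    static List<Edge> build(Map<String, Set<String>> genesOf) {
        List<String> diseases = new ArrayList<String>(genesOf.keySet());
        Collections.sort(diseases); // deterministic edge order
        List<Edge> edges = new ArrayList<Edge>();
        for (int i = 0; i < diseases.size(); i++) {
            for (int j = i + 1; j < diseases.size(); j++) {
                Set<String> shared = new HashSet<String>(genesOf.get(diseases.get(i)));
                shared.retainAll(genesOf.get(diseases.get(j))); // intersection of the two gene sets
                if (!shared.isEmpty()) {
                    edges.add(new Edge(diseases.get(i), diseases.get(j), shared.size()));
                }
            }
        }
        return edges;
    }
}
```

The sharedGenes value is what a Cytoscape visual style would map to edge color and width.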
2.6. Candidate Gene Prioritization
2.6.1. Methods: Two algorithms to prioritize candidate genes have been implemented in iCTNet. Candidate gene prioritization is a useful strategy when searching for false negative hits in a GWAS. For example, a SNP with a p-value of 10^-5 could be dismissed because it does not reach the genome-wide significance cutoff, yet it could be a true association. The two algorithms we implemented are random walk with restarts [4] and network propagation [5]. In the first algorithm, all genes with GWAS p-values are classified as either “associated” or “candidate” based on a user-selected threshold. The algorithm measures the closeness of potentially associated genes (candidates) to confirmed (associated) genes within the global protein network, and ranks the candidate genes for further biological investigation. The core of the network propagation algorithm (also called PRINCE) is similar to that of the random walk. The algorithm takes as input a disease similarity matrix calculated from GWAS data (W) and a protein interaction network. It then uses network propagation to infer a strength-of-association scoring function that exploits prior information on causal genes for the same disease or similar ones. This scoring is then used in combination with the PPI network to infer protein complexes that are potentially involved in the given disease.
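Köhler et al. [4] define the random walk with restart as the iteration p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalized adjacency matrix of the protein network, p(0) spreads probability uniformly over the associated (seed) genes, and candidates are ranked by their final probability. The restart ratio r and the number of iterations correspond to the Ratio and Time options described below. The dense-matrix Java sketch here is not iCTNet's code; the class name, the index-based gene encoding and the L1 convergence test are assumptions.

```java
/**
 * Hypothetical sketch of random walk with restart (Kohler et al. [4]), not iCTNet's code.
 * Iterates p(t+1) = (1 - r) * W * p(t) + r * p(0), with W the column-normalized adjacency matrix.
 */
final class RandomWalkWithRestart {
    private RandomWalkWithRestart() {}

    /**
     * @param adjacency symmetric non-negative matrix; adjacency[i][j] > 0 if genes i and j interact
     * @param seeds     distinct indices of "associated" genes, where every restart lands
     * @param restart   restart ratio r in [0, 1]
     * @param maxSteps  maximum number of iterations (the time step)
     * @param tolerance stop early once the L1 change between iterations falls below this value
     * @return visiting probability of every gene after the last iteration; rank candidates by it
     */
    static double[] run(double[][] adjacency, int[] seeds, double restart, int maxSteps, double tolerance) {
        int n = adjacency.length;
        double[] colSum = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                colSum[j] += adjacency[i][j];
            }
        }
        double[] p0 = new double[n];
        for (int s : seeds) {
            p0[s] = 1.0 / seeds.length; // uniform start over the seed genes
        }
        double[] p = p0.clone();
        for (int t = 0; t < maxSteps; t++) {
            double[] next = new double[n];
            double change = 0.0;
            for (int i = 0; i < n; i++) {
                double walk = 0.0;
                for (int j = 0; j < n; j++) {
                    if (colSum[j] > 0) { // a gene without neighbours passes nothing on
                        walk += adjacency[i][j] / colSum[j] * p[j];
                    }
                }
                next[i] = (1.0 - restart) * walk + restart * p0[i];
                change += Math.abs(next[i] - p[i]);
            }
            p = next;
            if (change < tolerance) {
                break;
            }
        }
        return p;
    }
}
```

For a chain of three genes 0-1-2 seeded at gene 0 with r = 0.5, the scores converge to 7/12, 1/3 and 1/12, so genes closer to the seed rank higher. With r = 1 all probability stays on the seed, which is the “trapped” case described under the Ratio option.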
The original methods were modified to take full advantage of the multi-layered nature of our data. While the random walk algorithm works only on protein-protein interaction networks, we have extended network propagation to work across the entire network, with up to 5 different types of connections. Figure 10 shows the iCTNet analysis interface. The parameters to be set for network analysis, as seen in this figure, are as follows:
Symmetric: refers to the normalization of the disease similarity matrix, where symmetric normalization is performed using the diagonal row-sum matrix of W [5].
Candidates Only: if checked, only candidate genes will be shown in the results.
Disease similarity: given the query disease, iCTNet will calculate the similarity matrix between the query and all diseases in the network.
Threshold (disease similarity): only disease-gene associations with –log(metap) greater than this value are considered when calculating the similarity. For example, suppose multiple sclerosis (MS) is the query disease and the threshold is set to 5.0; in the selected network, MS and rheumatoid arthritis (RA) share only one gene with –log(metap) greater than 5.0, and MS and type 1 diabetes share two such genes. In this case 3 genes are connected with MS, and the similarity scores between MS and RA, MS and type 1 diabetes, and MS and itself are 1/3, 2/3 and 1, respectively.
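The Threshold example can be reproduced in a few lines: each disease keeps only the genes with –log(metap) above the threshold, and its similarity to the query is the number of such genes it shares with the query divided by the query's own count. The sketch also includes the standard symmetric normalization W' = D^(-1/2) W D^(-1/2), with D the diagonal row-sum matrix of W, which is our reading of the Symmetric option. This is a hypothetical sketch: the class and method names, the map-based input and the handling of zero rows are assumptions, not iCTNet's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the disease-similarity options (not iCTNet's code). */
final class DiseaseSimilarity {
    private DiseaseSimilarity() {}

    /** Genes of one disease whose -log(metap) is greater than the threshold. */
    static Set<String> significantGenes(Map<String, Double> minusLogMetaP, double threshold) {
        Set<String> genes = new HashSet<String>();
        for (Map.Entry<String, Double> e : minusLogMetaP.entrySet()) {
            if (e.getValue() > threshold) {
                genes.add(e.getKey());
            }
        }
        return genes;
    }

    /** assoc maps disease -> (gene -> -log(metap)); similarity = shared significant genes / query's significant genes. */
    static Map<String, Double> similarityToQuery(String query, Map<String, Map<String, Double>> assoc, double threshold) {
        Map<String, Double> queryAssoc = assoc.get(query);
        Set<String> queryGenes = queryAssoc == null ? new HashSet<String>() : significantGenes(queryAssoc, threshold);
        if (queryGenes.isEmpty()) {
            throw new IllegalArgumentException("No gene of " + query + " passes the threshold");
        }
        Map<String, Double> similarity = new HashMap<String, Double>();
        for (Map.Entry<String, Map<String, Double>> e : assoc.entrySet()) {
            Set<String> shared = significantGenes(e.getValue(), threshold);
            shared.retainAll(queryGenes);
            similarity.put(e.getKey(), (double) shared.size() / queryGenes.size());
        }
        return similarity;
    }

    /** Symmetric normalization W' = D^-1/2 W D^-1/2, where D holds the row sums of W. */
    static double[][] normalizeSymmetric(double[][] w) {
        int n = w.length;
        double[] rowSum = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSum[i] += w[i][j];
            }
        }
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double d = Math.sqrt(rowSum[i] * rowSum[j]);
                out[i][j] = d > 0 ? w[i][j] / d : 0.0; // entries touching an all-zero row stay zero
            }
        }
        return out;
    }
}
```

With the numbers of the MS example (three MS genes above 5.0, one shared with RA and two with type 1 diabetes), similarityToQuery returns 1/3, 2/3 and 1 for RA, type 1 diabetes and MS.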
Parameters:
Type: The node type with which the analysis starts. Currently iCTNet only performs the analysis starting
from the disease nodes.
Threshold: This value defines whether disease-gene associations will be considered true or candidate
associations.
Filtered by tissue: If checked, then only disease-gene associations in which diseases and genes
share the same tissues will be considered.
Ratio: the restart ratio, between 0 and 1. In both methods, the walker begins at the starting nodes and moves to randomly selected neighbors in the network. The restart ratio is the probability that the walker jumps back to the starting nodes at each time step. In other words, with a small restart ratio, the walker has a high probability of reaching nodes farther away in the network. If the restart ratio is 1, the walker is trapped at the starting nodes.
Time: time step. Both methods simulate iterative transitions from the current nodes to their neighbors in the network, and the time step counts the number of such iterations. In iCTNet, the user only needs to set the time step for “random walk ex” and “network propagation ex”; for “random walk” and “network propagation”, the time step is controlled by iCTNet.
Connection weights: par_dt, par_tg and par_gd are the weights for disease-tissue, tissue-gene and gene-drug connections, respectively. The default weight for protein-protein connections is 1. A weight reflects the importance of its connection type; users can set a value smaller than 1 if they assume that the specified connection type is less important than protein-protein interactions.
Run: A ranking list will be returned once the analysis on the selected query disease finishes.
Batch Run: The analysis will run on all diseases in the current network one by one.
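A minimal sketch of how the Connection weights can be attached to edges before propagation follows. It assumes edge types are labelled as in the interaction list of Section 2.3.4 and, because the manual gives no weight for disease-gene edges, it simply skips any type without a weight. The class names, type labels and skipping rule are all assumptions, not iCTNet's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: typed edges to a weighted, undirected adjacency map. */
final class WeightedNetwork {
    /** An edge with its interaction type, e.g. "Tissue-Gene" (labels are assumptions). */
    static final class TypedEdge {
        final String source;
        final String target;
        final String type;

        TypedEdge(String source, String target, String type) {
            this.source = source;
            this.target = target;
            this.type = type;
        }
    }

    private WeightedNetwork() {}

    /** Weight per connection type: protein-protein fixed at 1, the others from par_dt, par_tg and par_gd. */
    static Map<String, Double> connectionWeights(double parDt, double parTg, double parGd) {
        Map<String, Double> weights = new HashMap<String, Double>();
        weights.put("Protein-Protein", 1.0);
        weights.put("Disease-Tissue", parDt);
        weights.put("Tissue-Gene", parTg);
        weights.put("Drug-Gene", parGd);
        return weights;
    }

    /** Builds node -> (neighbour -> weight); edges whose type has no weight are skipped. */
    static Map<String, Map<String, Double>> build(List<TypedEdge> edges, Map<String, Double> weights) {
        Map<String, Map<String, Double>> adjacency = new HashMap<String, Map<String, Double>>();
        for (TypedEdge e : edges) {
            Double w = weights.get(e.type);
            if (w == null) {
                continue;
            }
            // Store both directions; a repeated node pair keeps the last weight seen.
            neighbours(adjacency, e.source).put(e.target, w);
            neighbours(adjacency, e.target).put(e.source, w);
        }
        return adjacency;
    }

    private static Map<String, Double> neighbours(Map<String, Map<String, Double>> adjacency, String node) {
        Map<String, Double> m = adjacency.get(node);
        if (m == null) {
            m = new HashMap<String, Double>();
            adjacency.put(node, m);
        }
        return m;
    }
}
```

Setting, for example, par_tg = 0.8 halves nothing by itself; it only makes tissue-gene steps proportionally less likely than protein-protein steps once the matrix is normalized.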
Figure 10 (callouts: Normalization; Connection weights; Time step; Restart ratio; Only candidate genes will be returned in the result list; -log(metap) for disease-gene association)
2.6.2. Results
After the candidate prioritization has run, a ranking list of candidate genes is returned, as shown in Figure 11.
Figure 11
To visualize interesting candidate genes from the result list, the user can select any genes in the list and right-click to open a pop-up menu that selects these genes, with or without the true candidates, in the network view for further manipulation. Figure 12 shows an example in which 8 genes from the result list, together with the true candidates defined by the analysis threshold, have been selected (highlighted in yellow) in the Cytoscape network view.
Figure 12
For the “Batch Run”, all prioritization results are saved as node attributes and can easily be accessed by viewing the node attributes in Cytoscape, as shown in Figure 13.
Figure 13
3. Developer Guide
Cytoscape is written in Java. This guide shows how to create and debug a Cytoscape plugin using Eclipse, a free development platform. More information about Eclipse can be found at http://www.eclipse.org.
3.1. Set up the IDE
3.1.1. Download the Java JDK from http://java.sun.com/javase/downloads/index.jsp (JDK 6 is recommended).
3.1.2. Install the JDK following the instructions.
3.1.3. Set up the system environment variables PATH and CLASSPATH (this configuration is needed only if you want to develop Java code from the command line):
PATH: C:\Program Files\Java\jdk1.6.0_18\bin;
CLASSPATH: .;C:\Program Files\Java\jdk1.6.0_18\lib\dt.jar;C:\Program Files\Java\jdk1.6.0_18\lib\tools.jar
To test the configuration, open Start menu -> Run -> cmd and type javac -version at the command line. If the version information of the Java JDK is shown, you can proceed.
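The same check can be made from Java code, which also reports on the Java installation that Eclipse actually runs your code with. javax.tools.ToolProvider.getSystemJavaCompiler(), available since Java 6, returns null when the running installation provides no compiler, that is, when it is a JRE rather than a JDK. The class name below is hypothetical.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical helper that reports on the running Java installation. */
final class JdkCheck {
    private JdkCheck() {}

    /** The java.version system property of the running JVM, e.g. "1.6.0_18". */
    static String javaVersion() {
        return System.getProperty("java.version");
    }

    /** True when the running installation ships a Java compiler, i.e. it is a JDK rather than a JRE. */
    static boolean hasCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }
}
```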
3.1.4. Download Eclipse for RCP/Plug-in Developers (http://www.eclipse.org/downloads/). You can also configure a different JDK for your project inside Eclipse, so step 3.1.3 is not necessary if you only use Eclipse for your Java coding.
3.2. Create the project in Eclipse
3.2.1. Go to File->New->Project... then select Java Project.
3.2.2. On the next dialog, choose a name for your project and click Finish.
3.2.3. Right-click the newly created project in the Package Explorer and select Properties.
3.2.4. Select Java Build Path in the left panel and the Libraries tab in the right panel, then choose “Add External JARs...”.
3.2.5. Add cytoscape.jar from the main Cytoscape folder, and then add all the .jar files from the
lib folder inside the main Cytoscape folder, as shown in Figure 14 (for Cytoscape 2.7).
Since there is no lib folder in Cytoscape 2.8, only cytoscape.jar needs to be added.
Figure 14
3.3. Run the plugin
3.3.1. To run your plugin, select Run -> Run Configurations in Eclipse and click New Java Application.
3.3.2. Choose a name for the run target and choose cytoscape.CyMain as the main class, as shown in Figure 15.
Figure 15
3.3.3. On the next tab ((x)= Arguments), enter the path to the plugin in your workspace as the program arguments, as shown in Figure 16.
3.3.4. Press the Run button; Cytoscape will then start.
Figure 16
4. Requirements
The iCTNet plugin requires Cytoscape version 2.6 or later. The current release of iCTNet has been tested on the latest released version of Cytoscape, 2.8. Further development of iCTNet with Cytoscape 2.8 may require updates for its new configuration. As a Cytoscape plugin, iCTNet has the same minimum Java version (Java SE 1.5 or later) and system requirements as Cytoscape. The actual hardware requirements depend on the size of the networks the user loads and manipulates; running Cytoscape with large networks requires at least 2 GB of memory.
5. References
[1] Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research 2003, 13(11):2498-2504.
[2] http://www.mysql.com/products/connector/
[3] Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA: Potential
etiologic and functional implications of genome-wide association loci for human diseases and traits.
Proceedings of the National Academy of Sciences USA 2009, 106(23):9362-9367.
[4] Kohler S, Bauer S, Horn D, Robinson PN: Walking the interactome for prioritization of candidate
disease genes. American Journal of Human Genetics 2008, 82(4):949-958.
[5] Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R: Associating genes and protein complexes with
disease via network propagation. PLoS Computational Biology 2010, 6(1):e1000641.