Topological Analysis and Interactive Visualization of Biological Networks and Protein Structures


    © 2012 Nature America, Inc. All rights reserved.

    PROTOCOL

    670 | VOL.7 NO.4 | 2012 | NATURE PROTOCOLS

    INTRODUCTION

    Current high-throughput techniques such as yeast two-hybrid screens for protein interaction partners produce great volumes of experimental data that can be integrated and explored to gain insight into biological processes performed by interacting molecules¹⁻⁶. Furthermore, structural biologists study the interactions of residues in protein structures to understand complex protein structure-function relationships⁷⁻¹¹. Commonly, large-scale interaction data are represented as networks and initially analyzed by graph-theoretic methods to characterize the topological network structure and its global and local interaction properties¹²⁻¹⁸.

    A number of software tools are available for the visual exploration and computational analysis of networks¹⁹⁻²¹. General software libraries for network analysis are the Java framework JUNG²², the C++ library LEDA²³, the Python package NetworkX²⁴ and R packages such as igraph²⁵, statnet²⁶, sna²⁷, tnet²⁸ and QuACN²⁹. However, they cannot be applied by users without programming expertise. In contrast, sophisticated free software platforms such as Pajek³⁰, VisANT³¹, ONDEX³² and BIANA³³ provide graphical user interfaces for the analysis of biological networks. In addition, the free and stand-alone Cytoscape platform has gained considerable interest in recent years because of its open-source code development and its rapidly growing community of users and developers³⁴,³⁵. In particular, its functionality is easily extendable by additional plug-ins that support specific network analysis tasks. For example, software protocols are already available for cluster analysis with the TransClust and ClusterExplorer plug-ins³⁶, as well as for the integration of physical and genetic interactions into module maps with the PanGIA plug-in³⁷.

    Here we demonstrate how to apply two of our Cytoscape plug-ins, NetworkAnalyzer³⁸ and RINalyzer³⁹, for the standard and advanced analysis of network topologies. NetworkAnalyzer performs a comprehensive analysis of network topologies without requiring advanced knowledge in graph theory or programming expertise³⁸. In particular, it supports the characterization of molecular networks in terms of scale-free and small-world properties, modularity and hierarchical structure⁵,¹²,¹³,⁴⁰, the identification of important network nodes and edges based on topological parameters¹¹,⁴¹⁻⁴³, and the comparison of networks with regard to their topology⁴⁴⁻⁴⁷. Since its initial release in 2007, NetworkAnalyzer has been extended by additional features and topological parameters and is widely used in academia and industry as indicated by thousands of software downloads. Recently, this plug-in became an integral part of each standard installation of Cytoscape, and its source code was published under the GNU Lesser General Public License.

    Basically, NetworkAnalyzer efficiently computes a number of topological parameters, including node degree, clustering and topological coefficient, characteristic path length and betweenness centrality (see Box 1 for detailed descriptions of all available parameters). The computed topological parameters are represented as single values, histograms or scatter plots and can be visualized in the Cytoscape network view by corresponding node and edge size as well as color choice. In addition, each pair of computed parameters can be plotted as a chart. As an exhaustive topological analysis of huge networks can be a computationally intensive task on a global scale, NetworkAnalyzer provides the option to calculate only local parameters, such as node degree, neighborhood connectivity and clustering coefficient, for a selected subset of nodes. This avoids the time-consuming computation of path-related network parameters. NetworkAnalyzer also offers batch processing of networks, which allows the automated topological analysis of a large number of networks.

    RINalyzer complements NetworkAnalyzer on the particular task of analyzing and visualizing residue interaction networks (RINs) interactively³⁹. A RIN consists of nodes that represent protein residues and edges that correspond to noncovalent interactions between residues. In particular, RINalyzer is currently the only tool that supports the simultaneous view of a RIN in 2D and the corresponding protein structure in 3D by connecting Cytoscape to the UCSF Chimera molecular structure viewer⁴⁸. RINalyzer also provides versatile user options, such as the computation of weighted network centrality measures to highlight biologically important residues and the network comparison of superimposed protein structures to study differing residue interactions. This new structure analysis approach can be very useful in a number of biological and medical application scenarios. Examples include the identification of key residues for protein folding and allostery, the investigation of residue interactions in protein-binding interfaces and active sites, and the detailed characterization of the molecular effects of residue mutations⁷⁻¹¹,³⁹.

    Topological analysis and interactive visualization of biological networks and protein structures

    Nadezhda T Doncheva¹, Yassen Assenov¹, Francisco S Domingues² & Mario Albrecht¹,³

    ¹Max Planck Institute for Informatics, Saarbrücken, Germany. ²Center for Biomedicine, EURAC research, Bolzano, Italy. ³Institute of Biometrics and Medical Informatics, University Medicine Greifswald, Greifswald, Germany. Correspondence should be addressed to M.A. ([email protected]).

    Published online 15 March 2012; doi:10.1038/nprot.2012.004

    Computational analysis and interactive visualization of biological networks and protein structures are common tasks for gaining insight into biological processes. This protocol describes three workflows based on the NetworkAnalyzer and RINalyzer plug-ins for Cytoscape, a popular software platform for networks. NetworkAnalyzer has become a standard Cytoscape tool for comprehensive network topology analysis. In addition, RINalyzer provides methods for exploring residue interaction networks derived from protein structures. The first workflow uses NetworkAnalyzer to perform a topological analysis of biological networks. The second workflow applies RINalyzer to study protein structure and function and to compute network centrality measures. The third workflow combines NetworkAnalyzer and RINalyzer to compare residue networks. The full protocol can be completed in ~2 h.


    Box 1 | Topological network parameters

    Connected components. In undirected networks, two nodes are connected if there is a path of edges between them. All nodes that are pairwise connected form a connected component. The number of connected components in a network is an indicator of the global connectivity of a network. A low number of connected components relates to strong network connectivity because many nodes are connected and form few connected components of large node size.
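The definition of connected components above can be illustrated with a short breadth-first search. This is only a minimal sketch and not NetworkAnalyzer's implementation; the adjacency-dictionary representation, the function name `connected_components` and the toy network are assumptions made for the example.

```python
from collections import deque

def connected_components(adj):
    """Return a list of node sets, one per connected component.

    adj maps each node to the set of its neighbors (undirected network).
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        components.append(component)
    return components

# Hypothetical toy network: a triangle, a single edge and an isolated node.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
components = connected_components(toy)
print(len(components), sorted(len(c) for c in components))
```

A network with few, large components would return a short list here, which matches the interpretation of strong connectivity given above.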

    Degree distributions. In undirected networks, the node degree of a node n is the number of edges linked to n (ref. 40). A self-loop of a node is counted like two edges for the node degree⁶³. A node with a high degree is referred to as a hub. The node degree distribution gives the number of nodes with degree k for k = 0, 1, …. In directed networks, the in-degree of a node n is the number of incoming edges and the out-degree of a node is the number of outgoing edges. As in undirected networks, there are in-degree and out-degree distributions. A network is called scale-free if its degree distribution approximates a power law $k^{-\gamma}$ with the degree exponent $\gamma$ (ref. 40). The topological role of network hubs depends on the value of $\gamma$. For $\gamma > 3$ the hubs are not relevant, for $3 > \gamma > 2$ the hubs are organized in a hierarchy, and for $\gamma = 2$ a hub-and-spoke model emerges, in which the largest hub is in contact with a large fraction of all nodes. For most biological networks, it has been observed that $2 < \gamma < 3$. Barabási and Albert used this network property to distinguish between random (as defined by Erdős and Rényi⁶⁴) and scale-free network topologies⁴⁰,⁶⁵. There are also continued discussions about the observed power law and scale-freeness⁶⁶,⁶⁷.
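As a rough illustration of the degree distribution and the power-law exponent, the sketch below counts degrees from an edge list (a self-loop contributes two) and estimates the exponent by a least-squares line through the log-log points. The fitting method is an assumption chosen for simplicity and is not claimed to be the procedure NetworkAnalyzer uses; all function names are hypothetical.

```python
import math
from collections import Counter

def node_degrees(edges):
    """Degree of each node in an undirected edge list; a self-loop counts twice."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # for a self-loop (u == v) this adds the second count
    return deg

def degree_distribution(deg):
    """Map each degree k to the number of nodes with that degree."""
    return Counter(deg.values())

def fit_power_law_exponent(dist):
    """Estimate gamma for count ~ k^(-gamma) by least squares on log-log points."""
    points = [(math.log(k), math.log(c)) for k, c in dist.items() if k > 0 and c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx  # the slope of the fitted line is -gamma

# Hypothetical edge list: node "a" has one ordinary edge and one self-loop.
degrees = node_degrees([("a", "b"), ("a", "a")])
print(degrees["a"], degrees["b"])
```

Nodes without any edge do not appear in an edge list, so a degree of 0 would have to be added separately if isolated nodes matter for the analysis.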

    Neighborhood-related parameters. The neighborhood of a node n is the set of its neighbors. The connectivity $k_n$ is the size of the neighborhood of n (ref. 68). The average number of neighbors is an indicator for the average connectivity of the nodes in the network. A normalized version of this parameter is the network density. The density is a value between 0 and 1. It measures how densely the network is populated with edges (self-loops and duplicated edges are ignored). A network without edges and with solely isolated nodes has a density of 0. In contrast, the density of a clique, which is a set of nodes that are connected to each other, is 1. Another related parameter is the network centralization⁶⁸. Networks whose topologies resemble a star have centralization close to 1, whereas more uniformly connected networks are characterized by centralization close to 0. The network heterogeneity reflects the tendency of a network to contain hub nodes⁶⁸. The neighborhood connectivity of a node n is the average connectivity of all neighbors of n (ref. 69). The neighborhood connectivity distribution gives the average of the neighborhood connectivities of all nodes n with k neighbors for k = 0, 1, …. In directed networks, a node has the following three types of neighborhood connectivity: only in, the average out-connectivity of all in-neighbors of n; only out, the average in-connectivity of all out-neighbors of n; and in and out, the average connectivity of all neighbors of n (edge direction is ignored). On the basis of these three definitions, there are three neighborhood connectivity distributions: only in, only out and in and out. If the neighborhood connectivity distribution is a decreasing function in k, edges between low connected and highly connected nodes prevail in the network⁶⁹.
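Two of the neighborhood-related parameters, density and neighborhood connectivity, are simple enough to sketch directly from their definitions. The code assumes an undirected network stored as a dictionary of neighbor sets, so duplicated edges cannot occur and self-loops are filtered out explicitly; the function names are hypothetical.

```python
def density(adj):
    """Fraction of possible undirected edges that are present (self-loops ignored)."""
    n = len(adj)
    if n < 2:
        return 0.0
    # Every undirected edge appears twice across the neighbor sets.
    twice_edges = sum(len(nbrs - {node}) for node, nbrs in adj.items())
    return twice_edges / (n * (n - 1))

def neighborhood_connectivity(adj, node):
    """Average connectivity (number of neighbors) of the neighbors of node."""
    nbrs = adj[node]
    if not nbrs:
        return 0.0
    return sum(len(adj[m]) for m in nbrs) / len(nbrs)

# Hypothetical examples: a star with three leaves and a clique of three nodes.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
clique = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(density(star), density(clique))
print(neighborhood_connectivity(star, "hub"), neighborhood_connectivity(star, "x"))
```

In the star, the hub's neighbors have connectivity 1 while each leaf's only neighbor has connectivity 3, which is the decreasing pattern described above for networks where edges between low connected and highly connected nodes prevail.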

    Shortest paths. The length of a path is the number of edges forming it. The length of the shortest path, the distance, between two nodes n and m is denoted by L(n,m). The shortest path length distribution gives the number of node pairs (n,m) with L(n,m) = k for k = 1, 2, …. The shortest path length distribution may indicate small-world properties of a network⁷⁰. The eccentricity of a node n is the maximum noninfinite length of a shortest path between n and another node in the network. The network diameter is the maximum node eccentricity. If a network is disconnected, its diameter is therefore the maximum of the diameters of its connected components. In contrast, the network radius is the minimum of the nonzero eccentricities of the nodes in the network. The average shortest path length, also known as the characteristic path length, indicates the expected distance between two connected nodes.
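The path-related parameters above can be derived from one breadth-first search per node. The following is a minimal sketch for unweighted, undirected networks; the helper names and the returned dictionary layout are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def path_statistics(adj):
    """Eccentricities, diameter, radius and characteristic path length."""
    ecc = {}
    total = 0   # sum of distances over ordered pairs of connected nodes
    pairs = 0   # number of ordered pairs of connected nodes
    for node in adj:
        dist = bfs_distances(adj, node)
        ecc[node] = max(dist.values())  # only reachable nodes, so never infinite
        total += sum(dist.values())
        pairs += len(dist) - 1
    nonzero = [e for e in ecc.values() if e > 0]
    return {
        "eccentricity": ecc,
        "diameter": max(ecc.values()),
        "radius": min(nonzero) if nonzero else 0,
        "characteristic_path_length": total / pairs if pairs else 0.0,
    }

# Hypothetical network: the path a-b-c-d plus an isolated node e.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set()}
stats = path_statistics(path)
print(stats["diameter"], stats["radius"], stats["characteristic_path_length"])
```

The isolated node has eccentricity 0 and is therefore skipped when the radius is taken as the minimum of the nonzero eccentricities.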

    Clustering coefficients. In undirected networks, the clustering coefficient $C_n$ of a node n is defined as $C_n = 2e_n / (k_n(k_n - 1))$, where $k_n$ is the number of neighbors of n and $e_n$ the number of edges between all neighbors of n (refs. 40,70). In directed networks, the definition is slightly different: $C_n = e_n / (k_n(k_n - 1))$. In both cases, the clustering coefficient constitutes a ratio N/M, where N is the number of edges between the neighbors of n, and M the maximum number of edges that could possibly exist between the neighbors of n. The clustering coefficient of a node is always a number between 0 and 1. The network clustering coefficient is the average of the clustering coefficients of all nodes in the network. The average clustering coefficient distribution gives the average of the clustering coefficients for all nodes n with k neighbors for k = 2, …. In particular, the average clustering coefficient distribution was used to identify a modular organization of metabolic networks⁷¹.
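A direct transcription of the undirected formula $C_n = 2e_n / (k_n(k_n - 1))$ may help to make the ratio concrete. The sketch assumes a dictionary of neighbor sets and, as an assumption not stated above, assigns a coefficient of 0 to nodes with fewer than two neighbors so that they can be included in the network average.

```python
def clustering_coefficient(adj, node):
    """C_n = 2 e_n / (k_n (k_n - 1)) for an undirected network."""
    nbrs = adj[node] - {node}
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: undefined cases are treated as 0
    # Each edge between two neighbors is seen from both of its endpoints,
    # so this sum equals 2 * e_n.
    twice_links = sum(len(adj[u] & nbrs) for u in nbrs)
    return twice_links / (k * (k - 1))

def network_clustering_coefficient(adj):
    """Average of the clustering coefficients of all nodes."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

# Hypothetical network: a triangle a-b-c with a pendant node d attached to a.
net = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_coefficient(net, "a"), network_clustering_coefficient(net))
```

Node a has three neighbors with only one edge (b-c) among them, so one of three possible edges is present.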

    Shared neighbors. P(n,m) is the number of interaction partners shared between the nodes n and m, that is, nodes that are neighbors of both n and m (ref. 38). The shared neighbors distribution gives the number of node pairs (n,m) with P(n,m) = k for k = 1, 2, ….

    Topological coefficients. The topological coefficient $T_n$ of a node n with $k_n$ neighbors is computed as follows: $T_n = \mathrm{avg}(J(n,m)) / k_n$ (ref. 72). Here J(n,m) is defined for all nodes m that share at least one neighbor with n. The value J(n,m) is the number of neighbors shared between the nodes n and m, plus 1 if there is an edge between n and m. The topological coefficient is a relative measure for the extent to which a node shares neighbors with other nodes. The chart of the topological coefficients indicates the tendency of the nodes in the network to have shared neighbors.
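The topological coefficient can be sketched by following the definition of J(n,m) literally. The code below is an illustration for undirected networks stored as neighbor sets; returning 0 for nodes without neighbors or without any partner m is an assumption, and the function name is hypothetical.

```python
def topological_coefficient(adj, node):
    """T_n = avg(J(n, m)) / k_n over all m sharing at least one neighbor with n."""
    nbrs = adj[node] - {node}
    k = len(nbrs)
    if k == 0:
        return 0.0  # assumption: no neighbors means no shared neighbors
    # Every node that shares a neighbor with n is a neighbor of a neighbor of n.
    partners = set()
    for nb in nbrs:
        partners |= adj[nb]
    partners.discard(node)
    j_values = []
    for m in partners:
        shared = len(nbrs & adj[m])
        if shared == 0:
            continue
        # Add 1 if n and m are themselves directly connected.
        j_values.append(shared + (1 if m in nbrs else 0))
    if not j_values:
        return 0.0
    return sum(j_values) / len(j_values) / k

# Hypothetical networks: a 4-cycle and a triangle with a pendant node.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
net = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(topological_coefficient(cycle, "a"), topological_coefficient(net, "a"))
```

In the 4-cycle, node a shares both of its neighbors with the opposite node c, so its coefficient is 2 / 2 = 1.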

    Stress centrality. The stress centrality of a node n is the number of shortest paths passing through n (refs. 73,74). The stress centrality distribution gives the number of nodes with stress s for different values of s.

    Betweenness centrality. The betweenness centrality $C_b(n)$ of a node n is defined as follows⁷³: $C_b(n) = \sum_{s \neq n \neq t} (\sigma_{st}(n) / \sigma_{st})$. Here s and t are nodes in the network different from n, $\sigma_{st}$ denotes the number of shortest paths from s to t, and $\sigma_{st}(n)$ is the number of shortest paths from s to t that n lies on. The betweenness value for each node n is normalized by dividing by the number of node pairs excluding n: $(N - 1)(N - 2) / 2$, where N is the total number of nodes in the connected component that n belongs to. Thus, the betweenness

    (continued)


    Further Cytoscape plug-ins offer complementary features for the analysis of biological networks. For instance, plug-ins such as ClusterMaker⁴⁹, GLay⁵⁰, MCODE⁵¹, MINE⁵² and NeMo⁵³ can be used to identify and visualize clusters in networks, whereas the DomainGraph plug-in has a special focus on the visual analysis of the effect of alternative splicing on gene and protein networks⁵⁴. Another Cytoscape plug-in with functionality related to NetworkAnalyzer and RINalyzer is CentiScaPe, which also computes network centrality measures⁵⁵. However, among other differences, it does not provide global measures of network topology as NetworkAnalyzer does, and it does not support weighted networks as RINalyzer does.

    By using NetworkAnalyzer and RINalyzer, this protocol describes three computational workflows of frequently applied network analysis steps (Fig. 1). The first workflow uses NetworkAnalyzer and shows how to conduct a typical topology analysis of biological networks such as protein interaction networks or RINs. The second workflow covers various aspects related to the use of RINalyzer for the visual exploration of RINs, the study of protein binding interfaces and the network centrality analysis. The third workflow details how to combine NetworkAnalyzer and RINalyzer for the comparison of multiple RINs. Below we outline each of the three workflows.

    Experimental design

    Topological analysis of biological networks. This workflow describes how to use the NetworkAnalyzer plug-in to perform a topological analysis on an unweighted network loaded into Cytoscape, as well as how to process and visualize the results. As described in the next section, RINalyzer also computes several centrality measures for weighted networks and provides further options for the visual exploration of the results.

    Basically, NetworkAnalyzer calculates many simple topological parameters, such as clustering coefficient, number of connected components, diameter and radius, centralization, number of shortest paths, average shortest path length, average number of neighbors, density, heterogeneity (only for undirected networks), number of isolated nodes, number of self-loops and number of multiedge node pairs. In addition, the following complex topological parameters are computed by NetworkAnalyzer: average clustering coefficient distribution, shortest path length distribution, betweenness centrality versus number of neighbors, closeness centrality versus number of neighbors and stress centrality distribution. The degree distribution, topological coefficients, shared neighbors distribution and neighborhood connectivity distribution are computed for undirected networks only, whereas the in-degree, out-degree and three different types of neighborhood connectivity are used for directed networks. More details on the definitions of all topological parameters are given in Box 1. The complete set of simple and complex parameters is referred to as network statistics in NetworkAnalyzer.

    In the network view, topological parameters computed for network nodes can be highlighted by changing size and color attributes. For example, the degree might correspond to the node size and the clustering coefficient might determine the node color (Step 2A(x)). Complex topological parameters are depicted as histograms or scatter plots. The user can easily customize various visual settings as well as switch between histograms or scatter plots of the computed distributions and between linear or logarithmic scales of the x and y axes. In addition, a power law can be fitted to the degree distribution to illustrate whether the analyzed network has scale-free properties (Step 2A(vi)). Finally, both displayed charts and network statistics can be saved to files (Steps 2A(ix) and (xi)).

    Interactive visual analysis of residue networks. This workflow explains the use of the Cytoscape plug-in RINalyzer and its features for analyzing and visualizing RINs. It is divided into the following major steps (Fig. 1): retrieving and loading RIN data into Cytoscape (Step 2B(iv)); customizing RIN and 3D structure views (Step 2B(vi–xi)); creating, managing and saving sets of residue nodes (Step 2B(xii–xviii)); performing centrality analysis, exploring and saving the results (Step 2B(xix–xxix)).

    [Figure 1: flowchart. Box labels: Start Cytoscape (1); Load any type of network data; Compute network parameters; Visualize parameters; Explore complex parameters; Save network statistics; Load RIN data; Customize view of RIN and protein structure; Create sets of residue nodes; Save node sets; Perform centrality analysis; Save results; Perform topological analysis on multiple RINs; Compare network statistics of RINs; Compare RINs using RINalyzer. Workflow labels: 2A, 2B, 2C. Legend: NetworkAnalyzer, RINalyzer.]

    Figure 1 | Outline of the protocol. This protocol starts with launching Cytoscape (Step 1) and consists of three major workflows: (Step 2A) topological analysis of biological networks; (Step 2B) interactive visual analysis of residue networks; (Step 2C) comparison of residue networks. Steps colored in blue are performed with NetworkAnalyzer and those in pink with RINalyzer. The dotted line represents an optional step that connects the two workflows, which is not described in detail in this protocol.

    Box 1 | Topological network parameters (continued)

    centrality of each node is a number between 0 and 1. The betweenness centrality of a node reflects the amount of control that this node exerts over the interactions of other nodes in the network⁷⁵.
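Stress and betweenness centrality, as defined in Box 1, can be sketched by counting shortest paths with breadth-first search. The version below favors readability over speed (it checks every node against every node pair) and is not NetworkAnalyzer's implementation. Counting each unordered pair {s, t} once is an assumption that matches the normalization by (N − 1)(N − 2) / 2; the function names are hypothetical.

```python
from collections import deque

def bfs_counts(adj, source):
    """Distances from source and number of shortest paths to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]  # node is a predecessor of nb
    return dist, sigma

def stress_and_betweenness(adj):
    """Stress (path counts) and normalized betweenness for every node."""
    info = {s: bfs_counts(adj, s) for s in adj}
    nodes = list(adj)
    stress = dict.fromkeys(nodes, 0)
    betweenness = dict.fromkeys(nodes, 0.0)
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:  # each unordered pair {s, t} once
            if t not in dist_s:
                continue
            for n in nodes:
                if n == s or n == t or n not in dist_s:
                    continue
                dist_n, sigma_n = info[n]
                # n lies on a shortest s-t path iff the distances add up.
                if t in dist_n and dist_s[n] + dist_n[t] == dist_s[t]:
                    through = sigma_s[n] * sigma_n[t]
                    stress[n] += through
                    betweenness[n] += through / sigma_s[t]
    for n in nodes:
        size = len(info[n][0])  # nodes in the connected component of n
        pairs = (size - 1) * (size - 2) / 2
        betweenness[n] = betweenness[n] / pairs if pairs > 0 else 0.0
    return stress, betweenness

# Hypothetical networks: the path a-b-c and a 4-cycle.
line = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(stress_and_betweenness(line))
print(stress_and_betweenness(cycle))
```

For large networks a more efficient scheme such as Brandes' algorithm would normally be preferred; the cubic loop here is only meant to mirror the formula.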

    Closeness centrality. The closeness centrality $C_c(n)$ of a node n is the reciprocal of the average shortest path length⁷⁶. The closeness centrality of each node is a number between 0 and 1. Closeness centrality is a measure of how quickly information spreads from a given node to other reachable nodes in the network⁷⁶.
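Closeness centrality follows directly from the shortest path lengths of a node. The sketch below assumes an unweighted, undirected network, averages over the nodes reachable from n, and returns 0 for isolated nodes; the last two choices are assumptions for illustration.

```python
from collections import deque

def closeness_centrality(adj, node):
    """Reciprocal of the average shortest path length from node to reachable nodes."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nb in adj[current]:
            if nb not in dist:
                dist[nb] = dist[current] + 1
                queue.append(nb)
    reachable = len(dist) - 1
    if reachable == 0:
        return 0.0  # assumption: isolated nodes get closeness 0
    return reachable / sum(dist.values())

# Hypothetical network: the path a-b-c.
line = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(closeness_centrality(line, "b"), closeness_centrality(line, "a"))
```

Because every shortest path has length at least 1, the average is at least 1 and the value stays between 0 and 1, in line with the statement above.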


    The workflow starts with the retrieval of residue interaction data for a protein of interest from the web interface to our RINdata database (Step 2B(i)). It contains RINs generated by means of the RINerator software (http://rinalyzer.de/rinerator.php) for over 50,000 protein structures from the Protein Data Bank (PDB)⁵⁶. In contrast to previous approaches that define residue interactions on the basis of spatial atomic distance between residues, RINerator distinguishes different residue interaction types and quantifies the strength of individual interactions, which results in an undirected weighted network with multiple interaction edges. To this end, RINerator first adds hydrogens to the 3D protein structure by using the Reduce tool⁵⁷ and then samples contacts on the van der Waals surface of each atom by using Probe⁵⁸.

    In a RIN, the nodes represent the protein residues and the edges between them represent the noncovalent interactions identified by Probe. The edges are labeled with an interaction type and subtype. Possible types are interatomic contact (cnt), hydrogen bond (hbond), overlapping van der Waals radii (ovl) and generic residue interaction (combi), whereas the subtypes indicate interactions between main chains (mc) and side chains (sc) of the amino acid residues. Each edge is weighted with the respective score for the interacting residues as computed by Probe and the weight is proportional to the strength of the interaction. The resulting RIN and additional information (such as edge weights) are stored in the Cytoscape default formats, the simple interaction format (SIF) for the network, and the edge attribute (EA) files for the edge weights³⁴. Thus, each RIN is accompanied by the original PDB file with hydrogens added, and two edge attribute files.

    Once both the RIN and the corresponding protein structure are imported (after Step 2B(iv)), RINalyzer establishes a bidirectional connection between Cytoscape and the 3D structure viewer UCSF Chimera. In particular, when the user selects nodes of a RIN in the Cytoscape network view, the corresponding residues in the protein structure are automatically highlighted in UCSF Chimera, and vice versa. RIN nodes can be colored according to secondary structure based on the data retrieved from UCSF Chimera, and the node colors can be synchronized with the residue colors in UCSF Chimera. In addition, the user is able to show or hide different types of interaction edges such as backbone and hydrogen bonds. The visual RIN settings that can be customized by the user are listed in Box 2. Notably, a RIN-specific 2D layout can be applied to the network view that takes the current 3D structure coordinates into account.

    The subsequent visual exploration of RINs often includes the study of the molecular interactions of active site residues and binding residues. For this purpose, RINalyzer offers a user interface for creating and modifying sets of residue nodes. In particular, the user can apply it to identify the interacting residues in the binding interface of two distinct protein domains (Step 2B(xv)) or to highlight different sets of residues such as active site residues (Step 2B(xvii)) in both the network and the 3D structure view.

    We also show how to use RINalyzer for the computation of weighted centrality measures and the identification of central nodes in a RIN (Step 2B(xxi–xxvii)). To this end, RINalyzer calculates the following centrality measures: weighted degree; shortest path closeness and betweenness; current flow closeness and betweenness; random walk closeness and betweenness (Box 3). Here a crucial point is the choice of the appropriate user settings for the centrality analysis. As the edge weights in a RIN are proportional to the strength of the represented residue interaction, the weights need to be converted to distance scores such that smaller values are assigned to edges that represent stronger interactions for the shortest path computation. For each computed

    Box 2 | Visual properties of residue networks

    RINalyzer facilitates the visualization of RINs by providing a user interface with default values for a selected set of visual node and edge properties (Supplementary Fig. 3). The properties are grouped as follows:

    Background color. The background color should contrast well with the colors of the nodes and edges.

    Node colors. RINalyzer can color the nodes according to the secondary structure of the represented residues. This option is particularly useful for the visual analysis of RINs. The secondary structure information can be loaded into the Cytoscape session either by importing a node attribute file or by opening the protein structure file of the RIN in UCSF Chimera via the RINalyzer menu. The node attribute file should contain the attribute name "SS" (for secondary structure). The attribute values must be strings: "Sheet" or the letter "S" or "E" for sheet, "Helix" or the letter "H" for helix, "Loop" or the letter "L" or "C" for loop, and an empty string, the minus symbol or the letter "U" for unknown secondary structure. When the protein structure is loaded with UCSF Chimera, RINalyzer automatically stores the secondary structure as a node attribute. The colors for the secondary structure elements can be changed by the user.
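When preparing a node attribute file by hand, it can be convenient to map the accepted secondary structure strings onto the four classes listed above. The lookup below is a small sketch of that mapping; the class names used as output, the function name `normalize_ss` and the case-insensitive matching are assumptions for this example and do not describe RINalyzer's internal handling.

```python
# Accepted attribute values (lowercased) mapped to one class each.
SS_ALIASES = {
    "sheet": "Sheet", "s": "Sheet", "e": "Sheet",
    "helix": "Helix", "h": "Helix",
    "loop": "Loop", "l": "Loop", "c": "Loop",
    "": "Unknown", "-": "Unknown", "u": "Unknown",
}

def normalize_ss(value):
    """Return the secondary structure class for an SS attribute value."""
    key = value.strip().lower()  # assumption: matching ignores case and padding
    if key not in SS_ALIASES:
        raise ValueError(f"unrecognized secondary structure value: {value!r}")
    return SS_ALIASES[key]

print(normalize_ss("E"), normalize_ss("Helix"), normalize_ss("-"))
```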

    Node labels and sizes. The RIN node label consists of the following elements: PDB identifier, chain identifier, residue index, insertion code and residue type. If one of these elements is missing, it is replaced by an underscore. Therefore, the full node label is rather long and might be inappropriate for visualization. Thus, the user can choose which label elements are displayed instead of the whole node label. An example is shown in Supplementary Figure 3. In addition, the sizes of the nodes and the fonts of their labels can be customized.

    Backbone edges. This option displays the backbone edges in the network. These edges are defined as connections between two residues with successive residue indices and have the interaction type "backbone".

    Visibility of edges. In most cases, RINs have multiple edges that represent different interaction types. The large number of multiple edges might lead to an unclear view of the RIN in Cytoscape. Thus, for all interaction types, the edges of one interaction type can be shown (or hidden) by (de-)selecting the respective check box.

    Edge colors. The edges can be colored with respect to their interaction type.

    Edge line type and width. For improved visualization of multiple edges, these can be drawn as straight, parallel lines that lie close to each other. The appearance of the lines is determined by the width of the edge lines and the space between them.


    Box 3 | Network centrality analysis

    One of the key features of RINalyzer is the computation of weighted centrality measures with respect to a set of selected nodes. In the following, we will refer to the set of currently selected nodes as the root set. The centrality measures can be divided into three main categories according to how the distance between two nodes is measured (Supplementary Fig. 6).

    Shortest path centralities. Here the distance between two nodes is given by the length of the shortest path between them. The shortest path degree is the number of neighbors of a node that are contained in the root set and lie within a given cutoff distance from it. Shortest path closeness is the inverted average distance from a node to all nodes in the root set⁷⁶. Shortest path betweenness is the fraction of shortest paths between pairs of root set nodes that pass through the node of interest⁷³.
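To make the role of the root set concrete, the sketch below computes a weighted shortest path closeness with Dijkstra's algorithm. It assumes that edge weights have already been converted to distances, that the node itself is excluded from the average when it belongs to the root set, and that unreachable root nodes are skipped. These choices and the function names are assumptions for illustration and may differ from RINalyzer's exact normalization.

```python
import heapq

def dijkstra(adj, source):
    """Shortest weighted distances from source; adj maps node -> {neighbor: distance}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nb, w in adj[node].items():
            nd = d + w
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return dist

def root_set_closeness(adj, node, root_set):
    """Inverse of the average distance from node to the reachable root set nodes."""
    dist = dijkstra(adj, node)
    targets = [r for r in root_set if r != node and r in dist]
    if not targets:
        return 0.0
    return len(targets) / sum(dist[r] for r in targets)

# Hypothetical weighted chain A-B-C with distances 1.0 and 2.0.
chain = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 2.0}, "C": {"B": 2.0}}
print(root_set_closeness(chain, "B", {"A", "C"}))
```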

    Current flow centralities. Here the distance between two nodes is computed as the effective electric resistance between them (i.e., the difference of their potentials required for generating one unit of electrical current between them⁷⁷). Current flow closeness is the inverted sum of the effective resistances between a node and all nodes in the root set. Current flow betweenness is the amount of current that passes through a node when a current unit flows from a source to a target node, over all source-target node pairs in the root set.

    Random walk centralities. Here the distance between two nodes is measured by the hitting time (i.e., the expected number of steps needed by a random walk from one node to the other). Random walk closeness is the mean hitting time over all random walks starting at a node from the root set and ending at the node of interest. Random walk betweenness is the expected number of visits to a node by a random walk between each pair of root set nodes relative to the hitting time of the random walk. The computation of hitting time and the expected number of visits is based on the relationship between random walks and the distribution of electrical current through the network⁷⁸.

    Normalization. Degree and closeness are normalized by the number of nodes in the root set. Betweenness is normalized by the number of pairs of root set nodes.

    For RINalyzer, the input network for the centrality analysis should be undirected and connected and can have multiple, unweighted or weighted edges. Self-loops are ignored by the centrality calculation. If the network edges are weighted, the weights need to have non-negative values and should be proportional to the strength of the represented residue interaction. However, for the shortest path computation, the weights have to be converted to distance scores so that smaller scores are assigned to edges that connect nodes with stronger interactions. Thus, before the network centrality analysis is initiated, the user has to customize the following analysis settings:

    Choose attribute as edge weight. This setting is used to assign weights to the network edges by selecting a numeric edge attribute from

    the edge attributes of the loaded network. Edge attributes can be easily created in Cytoscape or imported from an edge attribute file.

    The default option for the weight attribute is None and stands for no specific attribute (i.e., the default edge weight is assigned to all

    edges). Edges with a weight of zero will be ignored during the network centrality analysis, and missing weight values are replaced by

    the default weight.

    Handle multiple edges. RINs can have multiple edges that represent different interactions between residues such as hydrogen bonds or

    interatomic contacts. As the computation of centrality measures is based on a single edge type, either the user must select a specific

    edge type or the weights of multiple edges between directly connected nodes must be merged into a single combined weight. The user

    can choose from four alternatives to merge weights, taking the maximum, the minimum, the average or the sum of the weights of the

    multiple edges between directly connected nodes.
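
    The four merge alternatives can be sketched in a few lines of Python. This is only an illustration under stated assumptions: edges are given as (source, target, weight) tuples, and the names MERGERS and merge_multiple_edges as well as the toy weights are hypothetical rather than part of RINalyzer.

```python
from collections import defaultdict

# The four merge alternatives named in the text.
MERGERS = {
    "max": max,
    "min": min,
    "average": lambda ws: sum(ws) / len(ws),
    "sum": sum,
}


def merge_multiple_edges(edges, how="sum"):
    """Collapse parallel edges into one weight per unordered node pair.

    edges: iterable of (source, target, weight) tuples of an undirected network.
    how: one of "max", "min", "average" or "sum".
    """
    grouped = defaultdict(list)
    for source, target, weight in edges:
        if source == target:
            continue  # self-loops are ignored by the centrality calculation
        grouped[frozenset((source, target))].append(weight)
    merge = MERGERS[how]
    return {pair: merge(weights) for pair, weights in grouped.items()}


# Made-up example: two parallel edges between a and b, one edge between a and c.
edges = [("a", "b", 2.0), ("b", "a", 3.0), ("a", "c", 1.0)]
merged_sum = merge_multiple_edges(edges, "sum")
merged_avg = merge_multiple_edges(edges, "average")
print(merged_sum[frozenset(("a", "b"))], merged_avg[frozenset(("a", "b"))])
```

    Using an unordered pair as the key reflects that the input network is treated as undirected, so a-b and b-a count as the same connection.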

    Handle negative weights. As the computed centrality measures in RINalyzer are defined only for non-negative weights, negative edge

    weights have to be removed before the computation. This can be achieved either by ignoring them (Ignore) or by reverting them to

    their absolute value (Revert).

    Convert scores into distances. Shortest path centrality measures are computed under the assumption that weights represent distances

    (i.e., the smaller the weight, the stronger the interaction). Therefore, if the assigned weights follow a similarity function (i.e., larger

    weight values correspond to stronger interactions), it is necessary to convert them into distances. The first option is to invert each

    weight value (1/value), and the second option is to subtract each value from the largest (max) weight value found in the network

    (max - value). The max value is increased by 1 before subtraction to avoid weights equal to zero. Edge weights of zero cannot be inverted and are thus ignored in the subsequent analysis.
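
    The handling of negative weights and the two score-to-distance conversions can be written down as a short Python sketch. It assumes weights are stored in a dictionary keyed by edge; the function names, the mode and method strings and the example values are hypothetical. Dropping zero weights before either conversion is an assumption based on the statement that zero-weight edges are ignored in the analysis.

```python
def handle_negative(weights, mode="ignore"):
    """Drop negative weights ("ignore") or replace them by absolute values ("revert")."""
    if mode == "revert":
        return {edge: abs(w) for edge, w in weights.items()}
    return {edge: w for edge, w in weights.items() if w >= 0}


def scores_to_distances(weights, method="invert"):
    """Convert similarity scores (larger = stronger) into distances (smaller = stronger).

    "invert":   distance = 1 / value
    "subtract": distance = (max + 1) - value, so that no distance becomes zero
    Zero weights are dropped first, because such edges are ignored in the analysis.
    """
    nonzero = {edge: w for edge, w in weights.items() if w != 0}
    if not nonzero:
        return {}
    if method == "invert":
        return {edge: 1.0 / w for edge, w in nonzero.items()}
    shifted_max = max(nonzero.values()) + 1
    return {edge: shifted_max - w for edge, w in nonzero.items()}


# Made-up weights for three edges.
weights = {("a", "b"): 4, ("b", "c"): -2, ("a", "c"): 0}
reverted = handle_negative(weights, "revert")
by_subtraction = scores_to_distances(reverted, "subtract")
by_inversion = scores_to_distances(reverted, "invert")
print(by_subtraction, by_inversion)
```

    Both conversions reverse the ordering of the weights, so the strongest interaction receives the smallest distance; they differ in how strongly large scores are compressed.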

    Default edge weight (if missing). If weight values are missing for some of the edges, they are replaced by this default value. For

    instance, if no edge attribute is selected, the default edge weight is assigned to all edges. If the default weight is 1, the network is

    treated as unweighted.

    Cutoff for weighted degree. The weighted degree centrality measure is computed by counting the nodes that can be reached by paths of

    length up to a certain cutoff from the node of interest. This cutoff should be chosen in agreement with the specified edge weights.
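
    One way to read this definition is as a shortest-path search that stops at the cutoff. The following Python sketch makes that reading explicit; it assumes edge weights have already been converted to non-negative distances, counts only nodes from the root set, and leaves out the normalization by root set size. The function name weighted_degree and the toy network are hypothetical.

```python
import heapq


def weighted_degree(adj, node, root_set, cutoff):
    """Count root-set nodes whose shortest-path distance from `node` is at most `cutoff`.

    adj: dict node -> dict neighbor -> distance (non-negative edge weights).
    """
    best = {node: 0.0}
    queue = [(0.0, node)]
    while queue:
        dist, current = heapq.heappop(queue)
        if dist > best.get(current, float("inf")):
            continue  # stale queue entry
        for neighbor, length in adj[current].items():
            candidate = dist + length
            # Only keep paths that stay within the cutoff and improve on known ones.
            if candidate <= cutoff and candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return sum(1 for n in best if n != node and n in root_set)


# Made-up chain a - b - c - d with distances 1.0, 1.0 and 1.5.
adj = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.5},
    "d": {"c": 1.5},
}
everything = {"a", "b", "c", "d"}
print(weighted_degree(adj, "a", everything, 2.0))
```

    The example shows why the cutoff has to match the scale of the edge weights: with distances around 1, a cutoff of 2.0 reaches two nodes from a, whereas a cutoff of 0.5 would reach none.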

    For betweenness computation exclude paths between nodes within the same set(s). This option can be changed when the subnetwork

    formed by the currently selected nodes is disconnected. In this case, the user might want to compute the betweenness values with

    respect to only a subset of node pairs that are connected by paths over intermediate, unselected nodes. Therefore, RINalyzer checks

    whether the subnetwork defined by the selected nodes is disconnected by computing the number of its connected components.
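
    The connectivity check described here amounts to counting the connected components of the subnetwork induced by the selected nodes. A minimal Python sketch of such a count follows; the function name count_components and the toy network are hypothetical and not taken from RINalyzer.

```python
def count_components(adj, selected):
    """Number of connected components of the subnetwork induced by `selected` nodes.

    adj: dict mapping node -> iterable of neighbors in the full (undirected) network.
    """
    selected = set(selected)
    seen = set()
    components = 0
    for start in selected:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            for neighbor in adj[current]:
                # Paths over unselected nodes do not count inside the subnetwork.
                if neighbor in selected and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


# Made-up chain a - b - c - d; selecting a, b and d leaves out the linking node c.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(count_components(chain, {"a", "b", "d"}))
```

    A result greater than 1 corresponds to the situation in which the option becomes available: some selected nodes are connected only through intermediate, unselected nodes.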


    centrality measure, RINalyzer offers three different ways to exam-

    ine the results: (i) inspecting the raw values in a sortable table,

    (ii) highlighting selected nodes in the network view or (iii) saving

    the values in a tab-delimited format for further processing. The

    presented workflow particularly focuses on the second option (ii),

    which involves a filter to select nodes with centrality values in a

    given numerical range (Step 2B(xxvi)). This functionality allows

    the user to create sets of best-scoring residue nodes for further investigations of their functional and structural characteristics in

    both the network view and the 3D protein structure.
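
    For readers who prefer to post-process saved centrality values outside Cytoscape, the range filter and the tab-delimited export can be mimicked with a short Python sketch. The centrality values below are invented for illustration, and the column names and function names are hypothetical; the actual layout of the file written by RINalyzer may differ.

```python
import csv
import io


def select_in_range(centrality, low, high):
    """Return node identifiers whose centrality lies in [low, high], best first."""
    hits = [(value, node) for node, value in centrality.items() if low <= value <= high]
    return [node for value, node in sorted(hits, reverse=True)]


def to_tab_delimited(centrality):
    """Serialize node/value pairs as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["node", "centrality"])
    for node, value in sorted(centrality.items()):
        writer.writerow([node, value])
    return buffer.getvalue()


# Invented values keyed by RIN-style node identifiers.
scores = {"A:25:_:ASP": 0.9, "A:26:_:THR": 0.7, "A:40:_:GLY": 0.1}
best = select_in_range(scores, 0.5, 1.0)
table = to_tab_delimited(scores)
print(best)
print(table)
```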

    Comparison of residue networks. This workflow introduces

    one possible application scenario that combines NetworkAnalyzer

    and RINalyzer. We compiled a small data set consisting of

    four RINs that are generated from the four subunits of the

    deoxyhemoglobin structure59. First, the batch analysis option

    of NetworkAnalyzer is used to compute the network statistics

    of these RINs and to compare their topologies (Step 2C(i-iv)).

    Second, two RINs that represent the two different subunits of

    deoxyhemoglobin are compared with each other using RINalyzer

    (Step 2C(vi-xiv)).

    This comparison requires an additional structure alignment of

    the two 3D protein structures from the user and eventually results in a combined RIN. The comparison network contains different

    types of edges and nodes according to the preserved residue inter-

    actions and the aligned residues. The type of each node and edge

    is stored as an attribute, which can be used to visually adjust the

    network view. Thus, the user can easily highlight and investigate

    the identified similarities and differences between the two RINs

    and the corresponding protein structures.
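
    The idea of a combined network with typed edges can be sketched in Python as follows. The sketch assumes that the structure alignment is available as a dictionary from residues of the first RIN to residues of the second, and it classifies only edges; the labels "both", "only_a" and "only_b", the function name compare_rins and the toy residues are hypothetical and do not reproduce the attribute names used by RINalyzer.

```python
def compare_rins(edges_a, edges_b, alignment):
    """Classify edges of two RINs after mapping RIN A residues onto RIN B.

    edges_a, edges_b: sets of frozenset({u, v}) edges.
    alignment: dict mapping residues of RIN A to their aligned residues in RIN B.
    Returns a dict edge -> "both", "only_a" or "only_b"; RIN A edges are
    expressed in RIN B identifiers wherever an aligned residue exists.
    """
    mapped_a = {frozenset(alignment.get(n, n) for n in edge) for edge in edges_a}
    combined = {}
    for edge in mapped_a | edges_b:
        if edge in mapped_a and edge in edges_b:
            combined[edge] = "both"  # interaction preserved in both structures
        elif edge in mapped_a:
            combined[edge] = "only_a"
        else:
            combined[edge] = "only_b"
    return combined


# Made-up residues and alignment.
edges_a = {frozenset(("a1", "a2")), frozenset(("a2", "a3"))}
edges_b = {frozenset(("b1", "b2")), frozenset(("b2", "b4"))}
alignment = {"a1": "b1", "a2": "b2", "a3": "b3"}
combined = compare_rins(edges_a, edges_b, alignment)
print(sorted((sorted(edge), kind) for edge, kind in combined.items()))
```

    Storing the class of each edge as an attribute is what allows preserved and non-preserved interactions to be styled differently in a network view.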

    MATERIALS

    EQUIPMENT

    Hardware requirements

    Personal computer with Internet access and web browser (e.g., Mozilla Firefox, Microsoft Internet Explorer or Google Chrome); we also recommend a screen with a resolution of at least 1024 × 768 pixels and a three-button mouse

    Software requirements

    Java Standard Edition, version 6 (download from http://www.java.com/)

    Cytoscape, version 2.8 (Cytoscape can be installed following the steps provided in the Cytoscape protocol34 or the following web page: http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Launching_Cytoscape)

    NetworkAnalyzer (included in the Cytoscape 2.8 installation as a core plug-in)

    RINalyzer (download and installation instructions for RINalyzer are available at http://rinalyzer.de/docu/install.php)

    UCSF Chimera, version 1.5 (instructions for its installation are available at http://www.cgl.ucsf.edu/chimera/download.html)

    Data

    Sample data sets required for this protocol are provided as supplementary files. The human protein interaction network (Supplementary Data 1) was published in the recent interactome screening study by Yu et al.47. The set of four RINs (Supplementary Data 2) was generated using the RINerator package (http://rinalyzer.de/rinerator.php) and represents the four subunits of the deoxyhemoglobin structure with the PDB identifier 4HHB (ref. 59).

    PROCEDURE

    1| Start Cytoscape.

    ? TROUBLESHOOTING

    2| Follow option A for the topological analysis of biological networks, option B for the interactive visual analysis of residue networks or option C for the comparison of residue networks.

    (A) Topological analysis of biological networks

    (i) Download data. Here we perform the topological analysis of the protein-protein interaction network from Yu et al.47 (Supplementary Data 1). First, download the file Supplementary Data 1 to a local directory.

    (ii) Import network data (for details, see the Cytoscape protocol34). In the Cytoscape main window, go to the menu option File → Import → Network (multiple file types). Select the option Local for Data Source Type and click the Select button. Navigate to the directory that contains Supplementary Data 1 and select the file. Confirm the selection by clicking the Open button. Then click the Import button to import this network into the current Cytoscape session. When the network is successfully loaded, a summary window will appear. Click the Close button of this window and return to the Cytoscape main window.

    (iii) Apply network layout. To apply a specific layout to the network, go to the menu option Layouts → yFiles → Organic. The network view can be enlarged by clicking the Maximize button in the upper right corner of the network view window.

    (iv) Run NetworkAnalyzer. Go to the menu option Plugins → Network Analysis → Analyze Network. NetworkAnalyzer can perform topological analysis on directed networks as well as on undirected networks. Therefore, the user can choose how the edges should be interpreted. As this network is undirected, select the option Treat the network as undirected and click the OK button to start the analysis. A Progress dialog will appear. The analysis time depends on the size of the network and the amount of memory assigned to the Cytoscape application. The Cancel button can be used at any time to stop the analysis.

    ? TROUBLESHOOTING

    (v) View results. The results window appears after the analysis is completed (Supplementary Fig. 1). The first tab shows the computed simple parameters, e.g., the clustering coefficient and the average shortest path length.


    The remaining tabs display complex network parameters such as degree and shortest path distributions. All topological parameters are described in more detail in Box 1. Select the tab Node Degree Distribution. The node degree distribution is depicted in a log-log plot. The x axis enumerates the degrees of nodes in the network and the y axis shows the frequency of nodes with a given degree.

    (vi) Fit a power law. The degree distribution of many biological networks is known to approximate a power law. Click on the button Fit Power Law to fit a power law to the distribution. A warning message will inform you that only points with positive coordinate values are considered for the fit. Confirm this message by clicking the OK button. After a short delay, the dialog NetworkAnalyzer Fitted Function appears. It reports the fitted power law constants, the correlation between the given data points and the corresponding points on the fitted curve, and the R-squared value as a measure of fit quality between 0 and 1 (the higher the value, the better the fit). Click the OK button to close the dialog and see the fitted power law in the chart (Fig. 2).

    (vii) Explore charts. Click the button Enlarge Chart to open the distribution plot in a separate, enlarged window. Almost all nodes in the network have a degree of < 30. The dot near the lower right corner of the plot indicates that there is only one node with degree 151, which we hereafter call the hub node because of this exceptional number of protein interactions. Close the window.

    (viii) Customize charts. Click on the button Chart Settings to rename the axes in the tab Axes, show or hide the gridlines in the tab Gridlines, or change the shape and color of chart points in the tab Histogram. Click the OK button to apply changes of the settings or the Cancel button to close the dialog without saving the changes.

    (ix) Export charts. Every chart in the results window can be saved to a file. To save the current chart as an image, click the Export Chart button. Adjust the image size by entering your preferred values in the two displayed text fields and confirm it by the Save button. Navigate to the directory where you want to save the image and select the file type from the drop-down menu. Finally, click the Save button. In addition, it is possible to export the visualized data for further processing in a different application. For example, select the tab Betweenness centrality. This scatter plot displays the correlation between node degree and betweenness centrality in the studied network. Every node in the network is represented by a point. The x axis gives the node degree and the y axis the betweenness. Click the button Export data and enter a file name (including extension) to store the values of these topological parameters. After clicking the Save button, the newly created tab-separated text file will contain a table of the degree and betweenness centrality values for every node in the network. This file can be easily imported in external software applications such as a spreadsheet tool for further analysis or processed by other programs.

    (x) Visualize topological parameters. In Step 2A(vii), we identified a hub node in the network; now we are interested in locating it in the network view. Thus, we will visually map the node degree to node size in the network view. Click the button Visualize Parameters in the results dialog of NetworkAnalyzer. In the Map node size to drop-down menu, select Degree. Nodes with a low degree should be displayed as small circles in contrast to nodes with a high degree. To this end, select the option Low values to small sizes. In addition, it is possible to map the degree or any other computed topological parameter to the node color. Choose ClusteringCoefficient in the drop-down menu on the right side and select the option Low values to bright colors. Nodes with low clustering coefficient will now be green and nodes with high clustering coefficient will be red. Finally, confirm the mapping choice by clicking the Apply button. This results in a changed network visualization (Supplementary Fig. 2). If necessary, move the network statistics dialog to the right corner of your screen or close it in order to see the updated network view. The hub node is now clearly visible as the largest circle in the network view. The large number of green-colored nodes indicates that most nodes have a low clustering coefficient, i.e., the neighbors of most nodes do not tend to interact with each other. To obtain an even better view of the nodes, zoom in using the Zoom in button on the toolbar or the mouse scroll wheel.

    Figure 2 | Screenshot of network statistics computed by NetworkAnalyzer. The depicted node degree distribution is derived from the undirected

    protein-protein interaction network from Yu et al.47. The red line represents

    a fitted power law, which indicates that the analyzed network is scale-free.

    The tabs below the dialog title lead to the display of histograms or scatter

    plots of the complex topological parameters computed by NetworkAnalyzer.

    The buttons on the right side provide the user with a variety of options for

    customizing the view as well as for exporting the displayed charts and the

    underlying data.
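
    A power-law fit of the kind shown in the figure can be approximated outside Cytoscape with a least-squares line on log-transformed data. The Python sketch below is written under that assumption and is not taken from NetworkAnalyzer; it skips points without positive coordinates, mirrors the reported R-squared value on the log-log scale, and uses invented data that follow y = 100 x^-2 exactly. The function name fit_power_law is hypothetical.

```python
import math


def fit_power_law(points):
    """Least-squares fit of y = a * x**b on log-transformed data.

    points: iterable of (degree, count); only points with positive coordinates are used.
    Returns (a, b, r_squared), with r_squared computed on the log-log scale.
    """
    logs = [(math.log(x), math.log(y)) for x, y in points if x > 0 and y > 0]
    n = len(logs)
    mean_x = sum(lx for lx, _ in logs) / n
    mean_y = sum(ly for _, ly in logs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((ly - (log_a + b * lx)) ** 2 for lx, ly in logs)
    ss_tot = sum((ly - mean_y) ** 2 for _, ly in logs)
    r_squared = 1 - ss_res / ss_tot
    return math.exp(log_a), b, r_squared


# Invented degree distribution; the point with degree 0 is skipped by the fit.
distribution = [(0, 3), (1, 100), (2, 25), (4, 6.25), (5, 4)]
a, b, r_squared = fit_power_law(distribution)
print(round(a, 3), round(b, 3), round(r_squared, 3))
```

    A high R-squared on the log-log scale only shows that a straight line describes the transformed points well; it does not by itself establish that a network is scale-free.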


    (xi) Save network statistics. Close the analysis results window. A warning message appears that the computed network statistics have not been saved. Click the Yes button to close the statistics window without saving the results. To recompute the network statistics at a later time point, just run NetworkAnalyzer again. Alternatively, the results can be saved to and reloaded from a text file to avoid recomputation. For this purpose, click on the button Save Statistics. Enter a file name to store the network statistics in a file with the extension .netstats and click the Save button to confirm it.

    (xii) (Optional) Perform centrality analysis. In addition to NetworkAnalyzer, RINalyzer can be applied to perform centrality analysis on the loaded network. RINalyzer supports weighted networks and additionally computes several weighted centrality measures (Box 3). To use RINalyzer now, continue with Step 2B(xx).

    (B) Interactive visual analysis of residue networks

    (i) Retrieve RIN data. Identify a protein of interest with an experimentally determined 3D structure deposited in the PDB. For example, we have chosen the HIV-1 protease60 with the PDB identifier 1HIV. Start a web browser and go to the RINdata web page (http://rinalyzer.de/rindata.php) to download the corresponding RIN data. Enter the PDB identifier 1HIV in the search form and click the button Retrieve RIN data. If RIN data are available for this PDB identifier, a download link is provided. Click on this link and download the file to a local directory. The downloaded RIN data are a zipped archive that contains multiple files: a PDB file with the 3D protein structure of the original PDB file (as retrieved from the PDB) with added hydrogens (pdb1hiv_h.ent); a SIF file containing the RIN for all chains in the PDB file (pdb1hiv_h.sif); an edge attribute file with edge weights reflecting the strength of the interactions between residues (pdb1hiv_h_intsc.ea); and an edge attribute file with edge weights representing the number of interactions between residues (pdb1hiv_h_nrint.ea). Unzip all files from the archive.

    ? TROUBLESHOOTING

    (ii) Import network into Cytoscape. In the Cytoscape main window, go to the menu option File → Import → Network (multiple file types). Select the option Local for Data Source Type and click the Select button. Navigate to the directory that contains the extracted RIN files and select the network SIF file, e.g., pdb1hiv_h.sif. Confirm the selection by clicking the Open button and then click the Import button. When the network is successfully loaded, a summary window will appear. Click the Close button of this window and return to the Cytoscape main window. The network view can be enlarged by clicking the Maximize button in the upper right corner of the network view window.

    (iii) Import edge attributes into Cytoscape. Import the edge weights representing the number of interactions between residues, as they are needed in Step 2B(xxii) for the network centrality analysis. Go to the menu option File → Import → Edge Attributes. Navigate to the directory that contains the RIN files. Select the edge attribute file pdb1hiv_h_nrint.ea and click the Open button. When the attributes are successfully loaded, a summary window will appear. Click the Close button of this window and return to the Cytoscape main window.

    (iv) Open protein structure in UCSF Chimera. Go to the menu option Plugins → RINalyzer → Protein Structure → Open structure from file in the Cytoscape main window and navigate to the directory that contains the RIN files. Select the PDB file (pdb1hiv_h.ent) and click the Open button. It may take a while until UCSF Chimera is launched and the 3D structure is loaded. Afterwards, a summary window about the internally performed mapping between network nodes and structure entities will appear. Click the Close button of this window.

    ? TROUBLESHOOTING

    (v) Explore protein structure. Use the mouse to move and scale the protein structure in the main UCSF Chimera window. By default, the left mouse button controls rotation, the middle mouse button controls XY translation and the right mouse button controls scaling. While holding down the Ctrl key, use the left mouse button to select residues of interest by clicking on them or to drag out a selection area (sweep out an area before releasing the left mouse button).

    (vi) Show protein backbone. To see the protein backbone in the 3D structure and to add backbone edges to the RIN, select Plugins → RINalyzer → Protein Structure → Show backbone in the Cytoscape main window. This option automatically adds protein backbone edges to the RIN in Cytoscape and also invokes the display of the ribbon representation for the corresponding 3D protein structure in UCSF Chimera.

    (vii) Apply RIN layout. Lay out the RIN in Cytoscape according to the 3D structure view in UCSF Chimera by selecting the menu option Plugins → RINalyzer → Layout → RIN Layout. Click on the icon 1:1 in the Cytoscape toolbar to see the whole network. As the graphics details of the network view are normally not displayed when the network is zoomed out, select the menu option View → Show Graphics Details.

    ? TROUBLESHOOTING

    (viii) Customize RIN view. Go to the menu option Plugins → RINalyzer → Visual Properties to choose in the tab General & Nodes how the node label should be displayed. For example, if only residue index and type are selected, the node labels are updated accordingly (Supplementary Fig. 3). In the tab Edges, the visible edge types can be selected. The network view is updated automatically each time an edge type box is checked or unchecked. In the same tab, the option Straighten edge lines controls whether multiple edges are drawn as straight parallel lines or not. When satisfied with the customized settings, confirm them by clicking the Apply button and click the Close button of


    the dialog RIN Visual Properties. In the resulting network view, the nodes are colored according to secondary structure and the edges according to interaction type. More details about the different visual properties can be found in Box 2.

    (ix) Synchronize colors between views. After customizing the visual properties of a RIN, nodes are usually colored according to secondary structure. To transfer the node colors to the corresponding residues in UCSF Chimera, go to the menu option Plugins → RINalyzer → Protein Structure → Sync 3D view colors. The resulting network and 3D structure views should be the same as in Figure 3.

    (x) (Optional) Show only protein backbone. Now we want to look at only the protein backbone in both the network view and the 3D structure view. If the protein backbone is not yet visible in the 3D structure and the RIN, select Plugins → RINalyzer → Protein Structure → Show backbone in the Cytoscape main window. Then go to the menu option Actions → Atoms/Bonds → hide in the UCSF Chimera window to hide all atoms in the 3D structure view. In the Cytoscape main window, go to the menu option Plugins → RINalyzer → Visual Properties → Edges. Uncheck the boxes next to all edge types except for the backbone edges. The resulting RIN and the corresponding 3D protein structure should look as in Supplementary Figure 4. Show the edges again by checking the boxes next to the edge types in the dialog RIN Visual Properties. When the edges are added to the network, they are visualized as curved lines. Click the Apply button to straighten them. The atoms in the 3D structure can be depicted again by executing Actions → Atoms/Bonds → show in the UCSF Chimera window.

    (xi) (Optional) Hide protein backbone. The backbone can be hidden in both views by clicking on Plugins → RINalyzer → Protein Structure → Hide backbone in the Cytoscape main window.

    (xii) Create sets of residue nodes. RINalyzer provides an interface to manage node sets. To open it, click on the menu option Plugins → RINalyzer → Manage Node Sets. The RINalyzer Node Sets panel appears as the last tab in the Cytoscape Control Panel. New node sets can be created in different ways. For instance, to create a set that contains the currently selected residues in UCSF Chimera, switch to the UCSF Chimera window and click Select → Chain → A to select all residues in chain A. Selected residues are colored in green, and the corresponding nodes in Cytoscape are also selected automatically (yellow). In the panel RINalyzer Node Sets, go to the menu option File → New → Set from selected nodes to create a set that contains the nodes corresponding to currently selected residues in UCSF Chimera. Insert a name for the set to be created, e.g., Chain A, and click OK to confirm it. The same actions can be repeated to create a second set named Chain B that contains all nodes corresponding to residues in chain B.

    (xiii) Select set nodes in the network view. To see all set nodes selected in the network view, use the option Select nodes in the context menu of the set (right-click the set name). To clear the current node selection, click on the background in the network view window.

    (xiv) Add active site nodes to a set. It is known that the active site residues of the HIV-1 protease are ASP 25, THR 26 and GLY 27 in chains A and B (ref. 61). To create a set with the active site residues of the HIV-1 protease for use in the centrality analysis in Step 2B(xxvii), go to the menu option File → New → Empty set in the Cytoscape panel RINalyzer Node Sets. Enter the name Active site and click the OK button to confirm. Go to the Search field in the Cytoscape toolbar and start typing the node identifier a:25:_:asp. As a result, a single hit should appear in the drop-down menu of the search field. Press Enter to select the node. In the panel RINalyzer Node Sets, go to the menu option Edit → Add nodes, and the selected node will be added to the currently selected node set, which should

    Figure 3 | Simultaneous view of RIN and 3D protein structure by RINalyzer. The RIN of the HIV-1 protease (PDB identifier 1HIV (ref. 60)) is displayed in Cytoscape (top), whereas the molecular graphics visualization of the 3D protease structure is shown in UCSF Chimera (bottom). All RIN nodes and the corresponding residues are colored according to secondary structure: blue for helices and red for strands. The various types of noncovalent residue interactions correspond to different edge colors: interatomic contacts are in blue; hydrogen bonds in red; overlapping van der Waals radii in gray; and the backbone in black.


    be Active site. Repeat the same actions for the remaining five active site residues: a:26:_:thr; a:27:_:gly; b:25:_:asp; b:26:_:thr; and b:27:_:gly. The set Active site should eventually contain six nodes. It is possible to color the active site nodes and the corresponding residues in the 3D structure as will be shown in Step 2B(xvii).

    (xv) Identify residue nodes on the interface of chain A. We can use the interface RINalyzer Node Sets to identify which residues from chain B interact with chain A. Right-click the node set Chain A and execute the menu option Select Nodes. Afterwards, in the Cytoscape menu, go to the menu option Select → Nodes → First Neighbors of Selected Nodes. This operation may take several seconds and it is finished when the neighboring residues are highlighted in yellow. Back in the panel RINalyzer Node Sets, go to the menu option File → New → Set from selected nodes to create a set that contains all nodes corresponding to chain A and their neighbors. Enter the set name, e.g., Chain A and neighbors, and click the OK button to confirm it. Now, all nodes in this new set that do not belong to chain A are the nodes from chain B that interact with nodes from chain A. To extract these nodes, we need to build the intersection of the sets Chain B and Chain A and neighbors. The interface RINalyzer Node Sets supports typical set operations such as the union and intersection of sets. To create the intersection of two sets, select both by left-clicking while pressing the Ctrl key (or the Command key for Mac users) and go to the menu option Operations → Intersection. This action will create a new set that is the intersection of the two selected sets. Enter a name for the new set, e.g., Chain B Interface, and click the OK button to confirm it.

    (xvi) Identify residue nodes on the interface of chain B. To create a node set Chain A Interface, select the nodes in the set Chain B; then select their first neighbors using the Cytoscape option Select → Nodes → First Neighbors of Selected Nodes and create a node set Chain B and neighbors, and finally, build the intersection of the set Chain B and neighbors and the set Chain A to create the node set Chain A Interface as described in Step 2B(xv).

    (xvii) Color set nodes and corresponding residues. We can highlight different sets in the network view by changing the visual properties of the corresponding set nodes, e.g., by coloring them in a different color. Right-click the node set Chain A to access its context menu and select the menu option Visual Mapping Bypass → Node Color. Choose a color and click OK to color all set nodes in the network view. In addition, select the option Sync 3D view colors from the context menu to color the corresponding residues in UCSF Chimera with the same color. It is possible to repeat the same actions for the node sets Chain A Interface, Chain B and Chain B Interface. In the end, the network and 3D structure could look like the image shown in Supplementary Figure 5.

    (xviii) Save node sets for further analysis. Select all sets by left-clicking them while pressing the Ctrl key or by clicking the first set and then clicking the last set while holding the Shift key pressed. In the panel RINalyzer Node Sets, go to the menu option File → Save selected set(s). Enter a file name and click Save. Close the resulting dialog that informs you about the successfully performed action.

    (xix) Prepare network for centrality analysis. Make sure that the backbone edges in the network are hidden, as they are only meant to aid with the visual analysis of the RIN. To hide them, go to the menu option Plugins → RINalyzer → Protein Structure → Hide backbone. Hiding the backbone edges in the RIN will concomitantly hide the ribbons in the 3D structure view. Therefore, if you do not see the 3D structure any more, switch to UCSF Chimera and go to the menu option Actions → Atoms/Bonds → show to display the atoms. In addition, the 1HIV structure contains a third chain I that represents an inhibitor bound to the protease. One might want to remove or hide the corresponding RIN nodes before performing the centrality analysis. In order to select this chain I, go to the menu option Select → Chain → I in UCSF Chimera. Then switch to the Cytoscape main window and go to the menu option Edit → Delete Selected Nodes and Edges to delete the selected nodes.

    (xx) Handle disconnected network nodes. Make sure the network is connected. The HIV-1 protease RIN contains two nodes, A:40:_:GLY and B:37:_:SER, which are not connected to any other node in the network. Thus, when the centrality analysis is started, a warning message will appear that the network has more than one connected component. In such cases, shortest path centrality measures are computed for each connected component independently, but current flow and random walk centralities are not computed at all. There are two possible solutions to deal with this issue: proceed with the analysis by clicking the Yes button, keeping in mind that these nodes are disconnected from all other nodes in the network; alternatively, cancel the analysis by clicking the No button, select the two disconnected nodes in the network view and delete them by clicking Edit → Delete Selected Nodes and Edges.

    (xxi) Select root nodes for analysis. The centrality analysis can be started only if a set of nodes (root set) is selected in the network view. RINalyzer computes each centrality measure with respect to the root set (Box 3). For example, the weighted degree of a node is computed by counting its neighbors that are contained in the root set and that are within a given distance cutoff from the node of interest. The first-time user might just select all nodes by clicking Select → Nodes → Select all nodes in the Cytoscape main window. This action can take a few seconds because both the nodes in the network and the residues in the 3D structure are selected.

    (xxii) Perform centrality analysis. Start the analysis from the menu option Plugins → RINalyzer → Analyze Network. A dialog that contains different analysis settings will appear (Supplementary Fig. 6). The settings are described in detail in


    Open button to confirm it. As RINs are undirected, we do not need to consider all network interpretations. Select the option Consider networks as undirected.

    (iii) Perform batch analysis. Click the button Start Analysis (Supplementary Fig. 8a). A dialog appears that displays the progress of the batch analysis (Supplementary Fig. 8b). Depending on the number of networks and their size, this might be a very time-consuming step. The batch analysis can be canceled at any time by clicking the Cancel button in the progress dialog.

    (iv) View batch analysis results. After the analysis is complete, the button Show Results is enabled. Click on it to see the dialog Batch Analysis - Results. The dialog contains a table of all topological analyses performed. Every row in the results table lists the loaded network, its interpretation and the resulting network statistics file, which was saved to the output directory (Supplementary Fig. 8c).

    (v) Load network statistics in Cytoscape. Clicking on a network name loads the network in Cytoscape, and clicking on a statistics file name loads the corresponding topological analysis results. Load all four statistics files and compare the simple parameters computed for each network (Supplementary Fig. 9). Note that the two α subunits, networks A and C, are very similar to each other. This is also the case for the two β subunits, networks B and D. However, there are apparent differences between the network parameters for the RINs of the α and β subunits. Close the network statistics dialogs to finish the results inspection.

    (vi) Load networks into Cytoscape. Click on the networks pdb4hhb_h_A.sif and pdb4hhb_h_B.sif to load them into Cytoscape for the next steps. You can now close the dialog Batch Analysis - Results.

    (vii) Retrieve structure alignment file. RINalyzer offers the functionality to compare two RINs based on a superposition alignment of the corresponding 3D protein structures. Here we compare two of the networks loaded in the previous step, i.e., one α subunit and one β subunit of the human protein deoxyhemoglobin (PDB identifier 4HHB). Start a web browser and navigate to the RCSB PDB Protein Comparison Tool website (http://www.rcsb.org/pdb/workbench/workbench.do). Insert the PDB identifier 4HHB in the text field for ID 1 and choose chain A by selecting 4HHB.A in the drop-down menu. Insert the same identifier in the text field for ID 2 and select 4HHB.B in the drop-down menu. Then, in the drop-down menu Select Comparison Method, choose the jCE algorithm and click the Compare button. In the Structure Alignment View page, scroll down to the panel Download Alignment. Right-click the link Download XML and select the option Save Link As. Navigate to the directory where the file should be saved, enter a name for it (e.g., 4hhba_vs_4hhbb.xml), and click the Save button to confirm it. Close the Protein Comparison Tool.

    (viii) Perform RIN comparison. To compare RINs using RINalyzer, go to Plugins → RINalyzer → Compare RINs. Select pdb4hhb_h_A as first network, pdb4hhb_h_B as second network and then enter a name for the resulting comparison network (e.g., comparison). Click the button ... and navigate to the alignment file downloaded in Step 2C(vii). Confirm its selection by clicking the Open button. Next, click the Compare button to perform the actual comparison. A new network with 148 nodes and 2,405 edges is created. This combined RIN consists of three types of nodes: nodes that represent aligned residues according to the structure superposition, and two types of nodes that correspond to residues that were not aligned by the superposition and belong to the first or second network. The network also contains three different edge line styles: solid lines for interactions present in both networks, dashed lines for interactions from the first network and dotted lines for interactions from the second network. The type of each node and edge is stored as an attribute named BelongsTo and represented by one of the following three values: net1, net2 or both. The value net1 refers to the first RIN selected in the comparison, and the value net2 to the second RIN.

    Figure 4 | Comparison network generated by RINalyzer. The combined network resulted from the comparison of the two RINs that represent one of the α and one of the β subunits of human deoxyhemoglobin (PDB identifier 4HHB (ref. 59), chains A and B). Edge colors refer to the interaction type, i.e., interatomic contacts in blue; hydrogen bonds in red; and overlaps in gray. Edge line styles correspond to noncovalent residue interactions that are preserved in both subunits (solid lines), present only in the α subunit (dashed lines) or only in the β subunit (dotted lines).


    (ix) Adjust network view. Maximize the network view window and select View → Show Graphics Details from the Cytoscape menu.

    (x) Apply network layout. Apply the organic layout (Layout → yFiles → Organic) and the RIN visual properties (Plugins → RINalyzer → Visual Properties). The resulting network should look as in Figure 4.

    (xi) Hide interaction edges. First, we want to reduce the visual complexity by showing fewer edges. Go to the menu option Plugins → RINalyzer → Visual Properties and select the Edges tab. Hide all edges except combi:all_all by unchecking the boxes next to each edge type and close the dialog RIN Visual Properties by clicking the Close button.

    (xii) Color nodes and edges. Now, we color the nodes and edges according to the network they belong to. In the Cytoscape Control Panel, go to the tab VizMapper and double-click the field Edge Color. Select the edge attribute BelongsTo from the drop-down menu for edge color values and the mapping type Discrete Mapping from the mapping type drop-down menu. A list that contains the three BelongsTo attribute values net1, net2 and both will appear. For each attribute value, do the following: click the field next to the attribute value and the button ... will appear; click this button, select a color and click the OK button to confirm it. Repeat the same actions for mapping the node color using the BelongsTo node attribute.

    (xiii) Customize node labels. It is also possible to change the node labels by clicking the field next to the visual property Node Label. Then select the attribute CombinedLabel and the mapping type Passthrough Mapping. The node attribute CombinedLabel contains node labels composed of the labels of the aligned nodes from the compared networks.

    (xiv) Explore comparison network. After the mapping is applied to the network view, it should look as in Supplementary Figure 10. Zoom in using the + button in the Cytoscape toolbar to observe the residue interaction differences between the superimposed α and β subunits of deoxyhemoglobin.
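    The BelongsTo labeling produced in Step 2C(viii) can be pictured as a set operation on edges once the residues of the two networks have been paired by the structure alignment. The sketch below is a simplified illustration under stated assumptions: the alignment is given as a plain dictionary, edge types are ignored, and all names and toy data are hypothetical rather than taken from RINalyzer.

```python
from collections import Counter


def compare_networks(edges1, edges2, alignment):
    """Label edges of two undirected networks as 'net1', 'net2' or 'both'.

    `alignment` maps a node of network 1 to its aligned node in network 2.
    Combined nodes are (node1, node2) pairs; unaligned nodes carry None.
    """
    reverse = {n2: n1 for n1, n2 in alignment.items()}

    def combined1(node):
        return (node, alignment.get(node))

    def combined2(node):
        return (reverse.get(node), node)

    set1 = {frozenset((combined1(u), combined1(v))) for u, v in edges1}
    set2 = {frozenset((combined2(u), combined2(v))) for u, v in edges2}

    belongs_to = {}
    for edge in set1 | set2:
        if edge in set1 and edge in set2:
            belongs_to[edge] = "both"
        elif edge in set1:
            belongs_to[edge] = "net1"
        else:
            belongs_to[edge] = "net2"
    return belongs_to


# Toy data with invented residue labels; A4 and B9 have no aligned partner.
edges_a = [("A1", "A2"), ("A2", "A3"), ("A3", "A4")]
edges_b = [("B1", "B2"), ("B2", "B3"), ("B3", "B9")]
toy_alignment = {"A1": "B1", "A2": "B2", "A3": "B3"}

labels = compare_networks(edges_a, edges_b, toy_alignment)
summary = Counter(labels.values())
```

    In this picture, edges labeled both correspond to the solid lines of the comparison network, net1 to the dashed lines and net2 to the dotted lines.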

    ? TROUBLESHOOTING
    Troubleshooting advice for basic problems that may occur during the procedure is given in Table 1.

    Further information about using Cytoscape can be found in the documentation at http://www.cytoscape.org/documentation_users.html and via the helpdesk mailing list http://groups.google.com/group/cytoscape-helpdesk. Tutorials and documentation about UCSF Chimera are available at http://plato.cgl.ucsf.edu/chimera/docindex.html and questions can be addressed to the users mailing list ([email protected]). RINalyzer and NetworkAnalyzer documentation can be found at http://rinalyzer.de/documentation.php and at http://med.bioinf.mpi-inf.mpg.de/netanalyzer/help/2.7/index.html, respectively.

    TABLE 1 | Troubleshooting table.

    Step | Problem | Possible reason | Solution
    1 | Cytoscape does not start | Java is not installed properly | Make sure that Java version 6 is installed. Java can be downloaded from http://www.java.com/
    2A(iv), 2B(xxii) | The analysis takes very long or seems to be frozen | Cytoscape has run out of memory | Increase the memory for the Cytoscape program. One way to do this is to start Cytoscape from the command line and use the -Xmx option to set the memory size. To this end, open a command line window, navigate to the Cytoscape directory and type java -Xms10m -Xmx1500M -jar cytoscape.jar -p plugins to start Cytoscape with 1,500 MB of memory. For alternative ways to increase the memory, see the Cytoscape Wiki at http://cytoscape.wodaklab.org/wiki/How_to_increase_memory_for_Cytoscape
    2B(i) | There is no RIN data for a protein | The RINdata database does not contain precomputed RINs for all PDB identifiers | Download and apply the package RINerator (http://rinalyzer.de/rinerator.php) to generate the RIN data. Alternatively, the RING web server (http://protein.cribi.unipd.it/ring/; ref. 62) can be used to create different types of RINs

    (continued)


    TIMING
    The time required to execute this protocol primarily depends on the size of the analyzed data sets and the CPU power. By using the provided data sets, an experienced user can execute the protocol within about 1 h 30 min, whereas an inexperienced user may need more than 2 h. It takes 15–20 min to complete Step 2A; 1 h to 1 h 20 min for Step 2B; and 20–30 min for Step 2C.

    ANTICIPATED RESULTS
    Here we discuss the results obtained by following each of the three workflows described in this protocol.

    Step 2A: Topological analysis of biological networks

    The application of NetworkAnalyzer on the protein-protein interaction network from Yu et al. (ref. 47; Supplementary Data 1) produces a comprehensive set of topological network parameters. The network exhibits scale-free behavior because a power law k^(-α) with α = 1.62 can be fitted to the node degree distribution. Furthermore, such an α value is indicative of a hub-and-spoke network with one hub being connected to a large fraction of nodes. Indeed, the network contains one hub protein with an exceptionally high node degree (151 interactions). The visual exploration of the network view after mapping the clustering coefficient to node color suggests that only a few nodes have clustering coefficients larger than 0. This means that the proteins in the network do not tend to form clusters with their interaction partners.
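    For readers who want to see how a power-law exponent such as the value of 1.62 quoted above can be estimated from a degree distribution, the following sketch fits a line to the log-transformed degree counts by ordinary least squares. This is one common, simple approach and is offered as an assumption-laden illustration; the exact fitting procedure used by NetworkAnalyzer is not described here, and the toy degree list is synthetic.

```python
import math
from collections import Counter


def fit_power_law(degrees):
    """Fit count(k) ~ a * k**(-alpha) by least squares on log-log values.

    Returns (a, alpha). Nodes with degree 0 are ignored, and at least two
    distinct positive degrees are required.
    """
    counts = Counter(k for k in degrees if k > 0)
    xs = [math.log(k) for k in counts]
    ys = [math.log(counts[k]) for k in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic degree list following count(k) = 64 / k**2 exactly (not real data).
toy_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
a, alpha = fit_power_law(toy_degrees)
```

    On the synthetic list the fit recovers an exponent of 2 and a prefactor of 64 up to floating-point error; real interaction data are noisier, so the fitted exponent should be read together with the quality of the fit.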

    Step 2B: Interactive visual analysis of residue networks

    The RIN generated from the protein structure of the HIV-1 protease (PDB identifier 1HIV) contains 200 nodes and 2,199 edges. The nodes can be divided into three groups according to the protein chain: 99 nodes for residues in chain A; 99 nodes for residues in chain B; and two nodes for chain I. By using the interface RINalyzer Node Sets, we could identify the residues in the interface between chains A and B of the protein structure. In all, 35 residues from chain A interact with 35 residues from chain B (dark blue and red nodes in Supplementary Fig. 5, respectively).

    Furthermore, we performed a centrality analysis of the RIN of the HIV-1 protease to highlight central nodes. The best-scoring nodes according to weighted shortest path closeness (i.e., centrality values > 0.21) were saved in a node set. The overlap between this node set (seven nodes) and the node set representing the protease active site (six nodes) is four nodes. When studying the single centrality values in a table sorted from highest to lowest closeness, we observed that the four active site residues have the best ranks.
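    The node-set logic described above (threshold the centrality values, then intersect with a reference set) is simple to express in code. The sketch below uses invented node labels and closeness values purely for illustration; none of the numbers are results from the 1HIV analysis.

```python
# Hypothetical closeness values and node labels, invented for illustration only.
closeness = {"r1": 0.24, "r2": 0.23, "r3": 0.22, "r4": 0.20, "r5": 0.12}
reference_set = {"r1", "r2", "r9"}  # stand-in for a functional site node set


def top_scoring(scores, threshold):
    """Return the nodes whose score is strictly above the threshold."""
    return {node for node, value in scores.items() if value > threshold}


central = top_scoring(closeness, 0.21)
overlap = central & reference_set
# Ranking from highest to lowest closeness, as in a sorted results table.
ranking = sorted(closeness, key=closeness.get, reverse=True)
```

    Comparing the positions of the overlapping nodes in the ranking shows whether the reference residues are among the best-ranked ones, which is the observation reported for the active site residues.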

    TABLE 1 | Troubleshooting table (continued).

    Step | Problem | Possible reason | Solution
    2B(iv) | UCSF Chimera does not start | The path to UCSF Chimera is not configured properly | Open the dialog Cytoscape Preferences Editor (Edit → Preferences → Properties). Click the Add button and enter the name of the property: Chimera.chimeraPath. Click OK and enter the path to the UCSF Chimera application. On a Linux machine, this could be $HOME/chimera/bin; on a Windows machine, C:\Program Files\Chimera\bin; and on a Macintosh, /Applications/Chimera.app/Contents/MacOS. Save the new preferences by clicking the option Make Current Cytoscape Properties Default at the bottom of the dialog
    2B(vii) | RINLayout is not applied to the network | The 3D structure corresponding to the current network is not loaded in UCSF Chimera | Load the protein structure corresponding to the current network using the menu option Plugins → RINalyzer → Protein Structure → Open structure from file
    2B(vii) | RINLayout is not applied to the network | More than one protein structure is loaded in UCSF Chimera | Close all protein structures opened in UCSF Chimera except for the structure that corresponds to the current network, using the menu option Plugins → RINalyzer → Protein Structure → Close


    Step 2C: Comparison of residue networks

    The RINs (Supplementary Data 2) generated from the four subunits of human deoxyhemoglobin (chains A, B, C and D in the PDB structure with identifier 4HHB) are of similar size: 141 nodes and 1,885 edges for chain A; 146 nodes and 1,935 edges for chain B; 141 nodes and 1,887 edges for chain C; and 146 nodes and 1,971 edges for chain D. As one might expect, the analysis performed with NetworkAnalyzer (Supplementary Fig. 9) indicates that the RINs of chains A and C, the two α subunits, have almost identical simple network parameters such as clustering coefficient, network centralization, number of shortest paths, characteristic path length and network density. The same holds for the RINs of chains B and D, the two β subunits. However, the difference between the simple parameter values for chains A and B is, for most parameters, larger than the difference between subunits of the same type. The complete set of both simple and complex network parameters can be compared further using the network statistics files generated by NetworkAnalyzer.

    To compare the individual residue interactions in the two RINs of chains A and B, we used RINalyzer, which generates a combined comparison network based on the superposition alignment of the corresponding 3D structures. The comparison network contains 148 nodes and 2,405 edges. Of the 148 nodes, two represent unaligned residues in chain A and seven represent unaligned residues in chain B; the remaining 139 nodes correspond to the aligned residues. The number of edges that correspond to noncovalent interactions that are identical in both subunits (1,415 edges) is considerably higher than the number of nonidentical edges (470 and 520 for chains A and B, respectively). These numbers reflect structural similarities and differences of the two subunits. When visually exploring the simplified comparison network (Supplementary Fig. 10), we can recognize the large number of edges that represent noncovalent interactions identical in both subunits (523 solid edge lines) and the rather small number of interactions present either in the α subunit (68 dashed edge lines) or in the β subunit (89 dotted edge lines) of deoxyhemoglobin. The nonidentical edges can be seen mainly in the network part that contains nodes of unaligned residues. Dashed or dotted edges between aligned residue nodes indicate that the corresponding residues form functionally distinct interactions in the two homologous, structurally very similar subunits.
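    The node and edge counts reported above are mutually consistent, which can be verified with a few additions. The short check below uses only the numbers quoted in this section; the variable names are ours.

```python
# Counts quoted in the text for the comparison of chains A and B of 4HHB.
aligned_nodes, only_a_nodes, only_b_nodes = 139, 2, 7
shared_edges, only_a_edges, only_b_edges = 1415, 470, 520

# Totals of the combined comparison network.
comparison_nodes = aligned_nodes + only_a_nodes + only_b_nodes
comparison_edges = shared_edges + only_a_edges + only_b_edges

# Sizes of the individual RINs implied by the comparison network.
chain_a = (aligned_nodes + only_a_nodes, shared_edges + only_a_edges)
chain_b = (aligned_nodes + only_b_nodes, shared_edges + only_b_edges)

# Fraction of all comparison edges that are identical in both subunits.
shared_fraction = shared_edges / comparison_edges
```

    The totals reproduce the 148 nodes and 2,405 edges of the comparison network as well as the sizes of the individual RINs of chains A (141 nodes, 1,885 edges) and B (146 nodes, 1,935 edges); roughly 59% of the edges in the comparison network are shared by both subunits.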

    Note: Supplementary information is available via the HTML version of this article.

    ACKNOWLEDGMENTS We thank D. Buezas, T. Kacprowski and C. Weichenberger for their useful comments on the workflows and the manuscript. This study was partially funded by the BMBF through the German National Genome Research Network (NGFN) and the Greifswald Approach to Individualized Medicine (GANI_MED). It was also conducted in the context of the DFG-funded Cluster of Excellence for Multimodal Computing and Interaction.

    AUTHOR CONTRIBUTIONS N.T.D. conceived and drafted the workflows. Y.A. contributed to the workflows. N.T.D., Y.A., F.S.D. and M.A. wrote and approved the manuscript.

    COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.

    Published online at http://www.natureprotocols.com/.

    Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

    1. Barabasi, A.L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).
    2. Beyer, A., Bandyopadhyay, S. & Ideker, T. Integrating physical and genetic maps: from genomes to interaction networks. Nat. Rev. Genet. 8, 699–710 (2007).
    3. Frishman, D. et al. Protein-protein interactions: analysis and prediction. in Modern Genome Annotation: the Biosapiens Network 353–410 (Springer-Verlag, 2008).
    4. Przytycka, T.M., Singh, M. & Slonim, D.K. Toward the dynamic interactome: it's about time. Brief Bioinform. 11, 15–29 (2010).
    5. Yamada, T. & Bork, P. Evolution of biomolecular networks: lessons from metabolic and protein interactions. Nat. Rev. Mol. Cell Biol. 10, 791–803 (2009).
    6. Zhang, S., Jin, G., Zhang, X.S. & Chen, L. Discovering functions and revealing mechanisms at molecular level from biological networks. Proteomics 7, 2856–2869 (2007).
    7. Bode, C. et al. Network analysis of protein dynamics. FEBS Lett. 581, 2776–2782 (2007).
    8. Csermely, P. Creative elements: network-based predictions of active centres in proteins and cellular and social networks. Trends Biochem. Sci. 33, 569–576 (2008).
    9. Krishnan, A., Zbilut, J.P., Tomita, M. & Giuliani, A. Proteins as networks: usefulness of graph theory in protein science. Curr. Protein Pept. Sci. 9, 28–38 (2008).
    10. Vishveshwara, S., Ghosh, A. & Hansia, P. Intra and inter-molecular communications through protein structure network. Curr. Protein Pept. Sci. 10, 146–160 (2009).
    11. Welsch, C. et al. Molecular basis of telaprevir resistance due to V36 and T54 mutations in the NS3-4A protease of the hepatitis C virus. Genome Biol. 9, R16 (2008).
    12. Albert, R. Scale-free networks in cell biology. J. Cell Sci. 118, 4947–4957 (2005).
    13. Almaas, E. Biological impacts and context of network theory. J. Exp. Biol. 210, 1548–1558 (2007).
    14. Brohee, S., Faust, K., Lima-Mendez, G., Vanderstocken, G. & van Helden, J. Network analysis tools: from biological networks to clusters and pathways. Nat. Protoc. 3, 1616–1629 (2008).
    15. Junker, B.H. & Schreiber, F. Analysis of Biological Networks (John Wiley & Sons, 2008).
    16. Pavlopoulos, G.A. et al. Using graph theory to analyze biological networks. BioData Min. 4, 10 (2011).
    17. Przulj, N. Protein-protein interactions: making sense of networks via graph-theoretic modeling. BioEssays 33, 115–123 (2011).
    18. Zhu, X., Gerstein, M. & Snyder, M. Getting connected: analysis and principles of biological networks. Genes Dev. 21, 1010–1024 (2007).
    19. Chuang, H.Y., Hofree, M. & Ideker, T. A decade of systems biology. Annu. Rev. Cell Dev. Biol. 26, 721–744 (2011).
    20. Pavlopoulos, G.A., Wegener, A.-L. & Schneider, R. A survey of visualization tools for biological network analysis. BioData Min. 1, 12 (2008).
    21. Suderman, M. & Hallett, M. Tools for visually exploring biological networks. Bioinformatics 23, 2651–2659 (2007).
    22. O'Madadhain, J., Fisher, D., White, S. & Boey, Y.B. The JUNG (Java Universal Network/Graph) Framework. Techn. Rep. UCI-ICS 03-17 (2003).
    23. Mehlhorn, K. & Näher, S. LEDA: A Platform for Combinatorial and Geometric Computing (Cambridge University Press, 1999).
    24. Hagberg, A.A., Schult, D.A. & Swart, P.J. Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference 11–15 (2008).
    25. Csárdi, G. & Nepusz, T. The igraph software package for complex network research. InterJ. Complex Syst. 1695 (2006).
    26. Handcock, M.S., Hunter, D.R., Butts, C.T., Goodreau, S.M. & Morris, M. statnet: software tools for the representation, visualization, analysis and simulation of network data. J. Stat. Softw. 24, 1548–7660 (2008).


    27. Butts, C.T. Social network analysis with sna. J. Stat. Softw. 24 (2008).
    28. Opsahl, T., Agneessens, F. & Skvoretz, J. Node centrality in weighted networks: generalizing degree and shortest paths. Soc. Networks 32, 245–251 (2010).
    29. Mueller, L.A.J., Kugler, K.G., Dander, A., Graber, A. & Dehmer, M. QuACN: an R package for analyzing complex biological networks quantitatively. Bioinformatics 27, 140–141 (2011).
    30. Batagelj, V. & Mrvar, A. Pajek - program for large network analysis. Connections 21, 47–57 (1998).
    31. Hu, Z. et al. VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res. W352–W357 (2005).
    32. Köhler, J. et al. Graph-based analysis and visualization of experimental results with ONDEX. Bioinformatics 22, 1383–1390 (2006).
    33. Garcia-Garcia, J., Guney, E., Aragues, R., Planas-Iglesias, J. & Oliva, B. Biana: a software framework for compiling biological interactions and analyzing networks. BMC Bioinformatics 11, 56 (2010).
    34. Cline, M.S. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).
    35. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
    36. Wittkop, T. et al. Comprehensive cluster analysis with Transitivity Clustering. Nat. Protoc. 6, 285–295 (2011).
    37. Srivas, R. et al. Assembling global maps of cellular function through integrative analysis of physical and genetic networks. Nat. Protoc. 6, 1308–1323 (2011).
    38. Assenov, Y., Ramirez, F., Schelhorn, S.-E., Lengauer, T. & Albrecht, M. Computing topological parameters of biological networks. Bioinformatics 24, 282–284 (2008).
    39. Doncheva, N.T., Klein, K., Domingues, F.S. & Albrecht, M. Analyzing and visualizing residue networks of protein structures. Trends Biochem. Sci. 36, 179–184 (2011).
    40. Barabasi, A.L. & Oltvai, Z.N. Network biology: understanding the cell's functional organization. Nat. Rev. Genet. 5, 101–113 (2004).
    41. Astsaturov, I. et al. Synthetic lethal screen of an EGFR-centered network to improve targeted therapies. Sci. Signal 3, ra67 (2010).
    42. Ragusa, M. et al. Expression profile and specific network features of the apoptotic machinery explain relapse of acute myeloid leukemia after chemotherapy. BMC Cancer 10, 377 (2010).
    43. Lorenz, W. et al. Microarray analysis and scale-free gene networks identify candidate regulators in drought-stressed roots of loblolly pine (P. taeda L.). BMC Genomics 12, 264 (2011).
    44. Radrich, K. et al. Integration of metabolic databases for the reconstruction of genome-scale metabolic networks. BMC Syst. Biol. 4, 114 (2010).
    45. Choura, M. & Rebaï, A. Application of computational approaches to study signalling networks of nuclear and Tyrosine kinase receptors. Biol. Direct 5, 58 (2010).
    46. Gu, H., Zhu, P., Jiao, Y., Meng, Y. & Chen, M. PRIN: a predicted rice interactome network. BMC Bioinformatics 12, 161 (2011).
    47. Yu, H. et al. Next-generation sequencing to generate interactome datasets. Nat. Methods 8, 478–480 (2011).
    48. Pettersen, E.F. et al. UCSF Chimera: a visualization system for exploratory research and analy