Topological Analysis and Interactive Visualization of Biological Networks and Protein Structures


© 2012 Nature America, Inc. All rights reserved.

PROTOCOL

670 | VOL.7 NO.4 | 2012 | NATURE PROTOCOLS

INTRODUCTION

Current high-throughput techniques such as yeast two-hybrid screens for protein interaction partners produce great volumes of experimental data that can be integrated and explored to gain insight into biological processes performed by interacting molecules¹⁻⁶. Furthermore, structural biologists study the interactions of residues in protein structures to understand complex protein structure-function relationships⁷⁻¹¹. Commonly, large-scale interaction data are represented as networks and initially analyzed by graph-theoretic methods to characterize the topological network structure and its global and local interaction properties¹²⁻¹⁸.

A number of software tools are available for the visual exploration and computational analysis of networks¹⁹⁻²¹. General software libraries for network analysis include the Java framework JUNG²², the C++ library LEDA²³, the Python package NetworkX²⁴ and R packages such as igraph²⁵, statnet²⁶, sna²⁷, tnet²⁸ and QuACN²⁹. However, these libraries cannot be used without programming expertise. In contrast, sophisticated free software platforms such as Pajek³⁰, VisANT³¹, ONDEX³² and BIANA³³ provide graphical user interfaces for the analysis of biological networks. In addition, the free and stand-alone Cytoscape platform has gained considerable interest in recent years because of its open-source code development and its rapidly growing community of users and developers³⁴,³⁵. In particular, its functionality is easily extendable by additional plug-ins that support specific network analysis tasks. For example, software protocols are already available for cluster analysis with the TransClust and ClusterExplorer plug-ins³⁶, as well as for the integration of physical and genetic interactions into module maps with the PanGIA plug-in³⁷.

Here we demonstrate how to apply two of our Cytoscape plug-ins, NetworkAnalyzer³⁸ and RINalyzer³⁹, for the standard and advanced analysis of network topologies. NetworkAnalyzer performs a comprehensive analysis of network topologies without requiring advanced knowledge of graph theory or programming expertise³⁸. In particular, it supports the characterization of molecular networks in terms of scale-free and small-world properties, modularity and hierarchical structure⁵,¹²,¹³,⁴⁰, the identification of important network nodes and edges based on topological parameters¹¹,⁴¹⁻⁴³, and the comparison of networks with regard to their topology⁴⁴⁻⁴⁷. Since its initial release in 2007, NetworkAnalyzer has been extended with additional features and topological parameters, and it is widely used in academia and industry, as indicated by thousands of software downloads. Recently, this plug-in became an integral part of each standard installation of Cytoscape, and its source code was published under the GNU Lesser General Public License.

NetworkAnalyzer efficiently computes a number of topological parameters, including node degree, clustering and topological coefficients, characteristic path length and betweenness centrality (see Box 1 for detailed descriptions of all available parameters). The computed topological parameters are represented as single values, histograms or scatter plots and can be visualized in the Cytoscape network view through the size and color of the corresponding nodes and edges. In addition, any pair of computed parameters can be plotted against each other in a chart. Because an exhaustive topological analysis of very large networks can be computationally intensive, NetworkAnalyzer provides the option to calculate only local parameters, such as node degree, neighborhood connectivity and clustering coefficient, for a selected subset of nodes. This avoids the time-consuming computation of path-related network parameters. NetworkAnalyzer also offers batch processing, which allows the automated topological analysis of a large number of networks.
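The local parameters named above depend only on each node's immediate neighborhood, which is why they can be computed for a selected subset without any path search. The following minimal sketch illustrates this in plain Python. It is not NetworkAnalyzer's Java code. The dict-of-neighbor-sets graph representation and the function name `local_parameters` are our own assumptions, and the sketch expects a simple undirected network without self-loops.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected: node -> set of neighbors


def local_parameters(graph: Graph, selected: Iterable[Hashable]) -> Dict[Hashable, dict]:
    """Degree, neighborhood connectivity and clustering coefficient for selected nodes only.

    No shortest paths are computed, so the cost depends on the neighborhoods of
    the selected nodes rather than on the size of the whole network.
    """
    result = {}
    for n in selected:
        neighbors = graph[n]
        k = len(neighbors)
        # Neighborhood connectivity: mean number of neighbors of n's neighbors.
        nc = sum(len(graph[m]) for m in neighbors) / k if k else 0.0
        if k < 2:
            cc = 0.0
        else:
            # Edges among the neighbors; each one is seen from both ends, hence / 2.
            links = sum(1 for a in neighbors for b in graph[a] if b in neighbors) / 2
            cc = 2 * links / (k * (k - 1))
        result[n] = {"degree": k, "neighborhood_connectivity": nc, "clustering": cc}
    return result
```

For example, in a triangle of nodes A, B and C with an extra node D attached to C, querying only C and D gives C a degree of 3 and a clustering coefficient of 1/3, without computing any values for A and B.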

RINalyzer complements NetworkAnalyzer in the particular task of analyzing and visualizing residue interaction networks (RINs) interactively³⁹. A RIN consists of nodes that represent protein residues and edges that correspond to noncovalent interactions between residues. In particular, RINalyzer is currently the only tool that supports the simultaneous view of a RIN in 2D and the corresponding protein structure in 3D by connecting Cytoscape to the UCSF Chimera molecular structure viewer⁴⁸. RINalyzer also provides versatile user options, such as the computation of weighted network centrality measures to highlight biologically important residues and the network comparison of superimposed protein structures to study differing residue interactions. This new structure analysis approach can be very useful in a number of biological and medical application scenarios. Examples include the identification of key residues for protein folding and allostery, the investigation of residue interactions in protein-binding interfaces and active sites, and the detailed characterization of the molecular effects of residue mutations⁷⁻¹¹,³⁹.

Nadezhda T Doncheva¹, Yassen Assenov¹, Francisco S Domingues² & Mario Albrecht¹,³

¹Max Planck Institute for Informatics, Saarbrücken, Germany. ²Center for Biomedicine, EURAC research, Bolzano, Italy. ³Institute of Biometrics and Medical Informatics, University Medicine Greifswald, Greifswald, Germany. Correspondence should be addressed to M.A.

Published online 15 March 2012; doi:10.1038/nprot.2012.004

Computational analysis and interactive visualization of biological networks and protein structures are common tasks for gaining insight into biological processes. This protocol describes three workflows based on the NetworkAnalyzer and RINalyzer plug-ins for Cytoscape, a popular software platform for networks. NetworkAnalyzer has become a standard Cytoscape tool for comprehensive network topology analysis. In addition, RINalyzer provides methods for exploring residue interaction networks derived from protein structures. The first workflow uses NetworkAnalyzer to perform a topological analysis of biological networks. The second workflow applies RINalyzer to study protein structure and function and to compute network centrality measures. The third workflow combines NetworkAnalyzer and RINalyzer to compare residue networks. The full protocol can be completed in ~2 h.



Box 1 | Topological network parameters

Connected components. In undirected networks, two nodes are connected if there is a path of edges between them. All nodes that are pairwise connected form a connected component. The number of connected components in a network is an indicator of the global connectivity of a network. A low number of connected components indicates strong network connectivity, because most nodes are connected and form a few connected components of large size.
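Counting connected components amounts to a graph traversal. The following is a minimal sketch using breadth-first search. It assumes an undirected graph stored as a dict that maps each node to the set of its neighbors, which is our illustrative representation and not Cytoscape's data model.

```python
from collections import deque


def connected_components(graph):
    """Return the connected components of an undirected graph as a list of node sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return components
```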

Degree distributions. In undirected networks, the node degree of a node $n$ is the number of edges linked to $n$ (ref. 40). A self-loop of a node is counted like two edges for the node degree⁶³. A node with a high degree is referred to as a hub. The node degree distribution gives the number of nodes with degree $k$ for $k = 0, 1, \ldots$ In directed networks, the in-degree of a node $n$ is the number of incoming edges and the out-degree of a node is the number of outgoing edges. As in undirected networks, there are in-degree and out-degree distributions. A network is called scale-free if its degree distribution approximates a power law $k^{-\gamma}$ with the degree exponent $\gamma$ (ref. 40). The topological role of network hubs depends on the value of $\gamma$. For $\gamma > 3$ the hubs are not relevant, for $3 > \gamma > 2$ the hubs are organized in a hierarchy, and for $\gamma = 2$ a hub-and-spoke model emerges, in which the largest hub is in contact with a large fraction of all nodes. For most biological networks, it has been observed that $2 < \gamma < 3$. Barabási and Albert used this network property to distinguish between random (as defined by Erdős and Rényi⁶⁴) and scale-free network topologies⁴⁰,⁶⁵. There are also continued discussions about the observed power law and scale-freeness⁶⁶,⁶⁷.
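A minimal sketch of the undirected degree distribution in plain Python. It assumes the network is given as a node list plus an undirected edge list, which is our assumption and not a NetworkAnalyzer interface. It applies the self-loop convention stated above, and an isolated node therefore appears under k = 0.

```python
from collections import Counter


def degree_distribution(nodes, edges):
    """Return (degree of each node, Counter mapping degree k -> number of nodes).

    edges is a list of undirected (u, v) pairs; a self-loop (n, n) contributes
    2 to the degree of n.
    """
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1  # when u == v (self-loop) this adds the second count
    return degree, Counter(degree.values())
```

With edges A-B, A-C, A-D and a self-loop on B, both A and B have degree 3. C and D have degree 1, and an isolated node E has degree 0.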

Neighborhood-related parameters. The neighborhood of a node $n$ is the set of its neighbors. The connectivity $k_n$ is the size of the neighborhood of $n$ (ref. 68). The average number of neighbors indicates the average connectivity of the nodes in the network. A normalized version of this parameter is the network density. The density is a value between 0 and 1 that measures how densely the network is populated with edges (self-loops and duplicated edges are ignored). A network without edges, consisting solely of isolated nodes, has a density of 0. In contrast, the density of a clique, which is a set of nodes that are all connected to each other, is 1. Another related parameter is the network centralization⁶⁸. Networks whose topologies resemble a star have a centralization close to 1, whereas more uniformly connected networks are characterized by a centralization close to 0. The network heterogeneity reflects the tendency of a network to contain hub nodes⁶⁸.
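The box defines density, centralization and heterogeneity only in words. The sketch below uses formulas that we assume match those definitions: density 2E/(N(N − 1)), centralization N/(N − 2)·(max k/(N − 1) − density), and heterogeneity as the ratio of the standard deviation (population variance) to the mean of the node degrees. Check them against the NetworkAnalyzer documentation before relying on exact values. The graph is assumed to be simple and undirected, to have at least three nodes, and to be stored as a dict of neighbor sets.

```python
import math


def density_centralization_heterogeneity(graph):
    """Global neighborhood parameters of a simple undirected graph with N >= 3 nodes.

    Assumed formulas (see lead-in):
      density        = 2E / (N (N - 1))
      centralization = N / (N - 2) * (max_k / (N - 1) - density)
      heterogeneity  = sqrt(population variance of k) / mean of k
    """
    n = len(graph)
    degrees = [len(nbrs) for nbrs in graph.values()]
    edges = sum(degrees) / 2  # every edge is counted at both of its end nodes
    density = 2 * edges / (n * (n - 1))
    centralization = n / (n - 2) * (max(degrees) / (n - 1) - density)
    mean_k = sum(degrees) / n
    var_k = sum((k - mean_k) ** 2 for k in degrees) / n
    heterogeneity = math.sqrt(var_k) / mean_k if mean_k else 0.0
    return density, centralization, heterogeneity
```

Under these formulas a star gives a centralization of 1 and a clique gives 0, consistent with the description above.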

The neighborhood connectivity of a node $n$ is the average connectivity of all neighbors of $n$ (ref. 69). The neighborhood connectivity distribution gives the average of the neighborhood connectivities of all nodes $n$ with $k$ neighbors for $k = 0, 1, \ldots$ In directed networks, a node has the following three types of neighborhood connectivity: "only in", the average out-connectivity of all in-neighbors of $n$; "only out", the average in-connectivity of all out-neighbors of $n$; and "in and out", the average connectivity of all neighbors of $n$ (edge direction is ignored). On the basis of these three definitions, there are three neighborhood connectivity distributions: "only in", "only out" and "in and out". If the neighborhood connectivity distribution is a decreasing function of $k$, edges between low-connected and highly connected nodes prevail in the network⁶⁹.
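A minimal sketch of the neighborhood connectivity distribution for undirected networks. The directed "only in" and "only out" variants are omitted. The graph is again assumed to be a dict of neighbor sets. Isolated nodes are skipped because they have no neighbors to average over, which is our choice for the k = 0 case.

```python
from collections import defaultdict


def neighborhood_connectivity_distribution(graph):
    """Average neighborhood connectivity of all nodes with k neighbors, keyed by k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, nbrs in graph.items():
        k = len(nbrs)
        if k == 0:
            continue  # isolated nodes have no neighbors to average over
        # Neighborhood connectivity of node: mean connectivity of its neighbors.
        nc = sum(len(graph[m]) for m in nbrs) / k
        sums[k] += nc
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}
```

In a star with four leaves the result is {1: 4.0, 4: 1.0}. This is a decreasing function of k, the pattern that the box associates with edges between low-connected and highly connected nodes.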

Shortest paths. The length of a path is the number of edges forming it. The length of the shortest path between two nodes $n$ and $m$, also called their distance, is denoted by $L(n,m)$. The shortest path length distribution gives the number of node pairs $(n,m)$ with $L(n,m) = k$ for $k = 1, 2, \ldots$ The shortest path length distribution may indicate small-world properties of a network⁷⁰. The eccentricity of a node $n$ is the maximum noninfinite length of a shortest path between $n$ and another node in the network. The network diameter is the maximum node eccentricity. If a network is disconnected, its diameter is therefore the maximum of the diameters of its connected components. In contrast, the network radius is the minimum of the nonzero eccentricities of the nodes in the network. The average shortest path length, also known as the characteristic path length, indicates the expected distance between two connected nodes.
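The path-based parameters above can be sketched with one breadth-first search per node. This is a plain-Python illustration for unweighted, undirected graphs stored as dicts of neighbor sets, which is our assumption. Unreachable pairs are skipped, so for a disconnected network the diameter becomes the largest component diameter, as stated in the box.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def path_parameters(graph):
    """Eccentricities, diameter, radius and characteristic path length."""
    ecc = {}
    total, pairs = 0, 0
    for n in graph:
        dist = bfs_distances(graph, n)
        ecc[n] = max(dist.values())  # 0 for an isolated node
        total += sum(dist.values())
        pairs += len(dist) - 1  # ordered pairs (n, m) with m reachable and m != n
    diameter = max(ecc.values())
    nonzero = [e for e in ecc.values() if e > 0]
    radius = min(nonzero) if nonzero else 0
    characteristic_path_length = total / pairs if pairs else 0.0
    return ecc, diameter, radius, characteristic_path_length
```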

Clustering coefficients. In undirected networks, the clustering coefficient $C_n$ of a node $n$ is defined as $C_n = 2e_n / (k_n(k_n - 1))$, where $k_n$ is the number of neighbors of $n$ and $e_n$ is the number of edges between all neighbors of $n$ (refs. 40,70). In directed networks, the definition is slightly different: $C_n = e_n / (k_n(k_n - 1))$. In both cases, the clustering coefficient constitutes a ratio $N/M$, where $N$ is the number of edges between the neighbors of $n$, and $M$ is the maximum number of edges that could possibly exist between the neighbors of $n$. The clustering coefficient of a node is always a number between 0 and 1. The network clustering coefficient is the average of the clustering coefficients of all nodes in the network. The average clustering coefficient distribution gives the average of the clustering coefficients for all nodes $n$ with $k$ neighbors for $k = 2, 3, \ldots$ In particular, the average clustering coefficient distribution was used to identify a modular organization of metabolic networks⁷¹.
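The undirected formula above, the network clustering coefficient and the average clustering coefficient distribution can be sketched as follows in plain Python. The sketch assumes a simple undirected graph stored as a dict of neighbor sets. Assigning 0 to nodes with fewer than two neighbors, and including those nodes in the network average, is an assumption of this sketch.

```python
from collections import defaultdict


def clustering(graph):
    """Undirected clustering coefficients C_n = 2 e_n / (k_n (k_n - 1))."""
    coeff = {}
    for n, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            coeff[n] = 0.0  # assumed convention for nodes with fewer than two neighbors
            continue
        # e_n: edges between neighbors of n; each one is seen twice in the double loop.
        e = sum(1 for a in nbrs for b in graph[a] if b in nbrs) // 2
        coeff[n] = 2 * e / (k * (k - 1))
    return coeff


def clustering_summary(graph):
    """Network clustering coefficient and average clustering coefficient distribution."""
    coeff = clustering(graph)
    network_cc = sum(coeff.values()) / len(coeff)
    by_k = defaultdict(list)
    for n, c in coeff.items():
        k = len(graph[n])
        if k >= 2:  # the distribution is defined for k = 2, 3, ...
            by_k[k].append(c)
    distribution = {k: sum(v) / len(v) for k, v in sorted(by_k.items())}
    return network_cc, distribution
```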

Shared neighbors. $P(n,m)$ is the number of interaction partners shared between the nodes $n$ and $m$, that is, nodes that are neighbors of both $n$ and $m$ (ref. 38). The shared neighbors distribution gives the number of node pairs $(n,m)$ with $P(n,m) = k$ for $k = 1, 2, \ldots$
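A minimal sketch of the shared neighbors distribution. It assumes unordered node pairs and a dict-of-neighbor-sets graph, both of which are our assumptions. The set intersection directly gives P(n,m).

```python
from collections import Counter
from itertools import combinations


def shared_neighbors_distribution(graph):
    """Number of unordered node pairs (n, m) that share exactly k neighbors, for k >= 1."""
    dist = Counter()
    for n, m in combinations(graph, 2):
        shared = len(graph[n] & graph[m])  # P(n, m)
        if shared >= 1:
            dist[shared] += 1
    return dict(sorted(dist.items()))
```

In a four-node cycle A-B-C-D, only the two diagonal pairs (A, C) and (B, D) share neighbors, two each, so the result is {2: 2}.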

Topological coefficients. The topological coefficient $T_n$ of a node $n$ with $k_n$ neighbors is computed as follows: $T_n = \mathrm{avg}(J(n,m)) / k_n$ (ref. 72). Here $J(n,m)$ is defined for all nodes $m$ that share at least one neighbor with $n$. The value $J(n,m)$ is the number of neighbors shared between the nodes $n$ and $m$, plus 1 if there is an edge between $n$ and $m$. The topological coefficient is a relative measure for the extent to which a node shares neighbors with other nodes. The chart of the topological coefficients indicates the tendency of the nodes in the network to have shared neighbors.
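The topological coefficient definition translates almost literally into code. This plain-Python sketch is our own illustration and assumes a dict-of-neighbor-sets graph. It computes the coefficient for every node that shares at least one neighbor with another node and skips the remaining nodes, because the average over J(n,m) is undefined for them.

```python
def topological_coefficients(graph):
    """T_n = avg_m J(n, m) / k_n over nodes m != n that share >= 1 neighbor with n."""
    result = {}
    for n, nbrs in graph.items():
        j_values = []
        for m in graph:
            if m == n:
                continue
            shared = len(nbrs & graph[m])
            if shared == 0:
                continue  # J(n, m) is only defined when n and m share a neighbor
            # Shared neighbors, plus 1 if n and m are directly connected.
            j_values.append(shared + (1 if m in nbrs else 0))
        if j_values:
            result[n] = sum(j_values) / len(j_values) / len(nbrs)
    return result
```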

Stress centrality. The stress centrality of a node $n$ is the number of shortest paths passing through $n$ (refs. 73,74). The stress centrality distribution gives the number of nodes with stress $s$ for different values of $s$.
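Stress centrality requires the number of shortest paths between every node pair, and a breadth-first search can count these. The following is a minimal sketch for an unweighted, undirected graph stored as a dict of neighbor sets. It assumes that each unordered pair {s, t} is counted once. The sketch runs in roughly cubic time and is meant for small examples only.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Hop distances from source and the number of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def stress_centrality(graph):
    """Number of shortest paths through each node, counting each pair {s, t} once."""
    info = {v: bfs_counts(graph, v) for v in graph}
    stress = {v: 0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t lie in different components
        for n in graph:
            if n in (s, t) or n not in dist_s:
                continue
            dist_n, sigma_n = info[n]
            # n lies on a shortest s-t path iff d(s, n) + d(n, t) == d(s, t);
            # the number of such paths through n is sigma(s, n) * sigma(n, t).
            if dist_s[n] + dist_n[t] == dist_s[t]:
                stress[n] += sigma_s[n] * sigma_n[t]
    return stress
```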

Betweenness centrality. The betweenness centrality $C_b(n)$ of a node $n$ is defined as follows⁷³: $C_b(n) = \sum_{s \neq n \neq t} (\sigma_{st}(n) / \sigma_{st})$. Here $s$ and $t$ are nodes in the network different from $n$, $\sigma_{st}$ denotes the number of shortest paths from $s$ to $t$, and $\sigma_{st}(n)$ is the number of shortest paths from $s$ to $t$ that $n$ lies on. The betweenness value for each node $n$ is normalized by dividing it by the number of node pairs excluding $n$: $(N - 1)(N - 2) / 2$, where $N$ is the total number of nodes in the connected component that $n$ belongs to. Thus, the betweenness

(continued)




Further Cytoscape plug-ins offer complementary features for the analysis of biological networks. For instance, plug-ins such as ClusterMaker⁴⁹, GLay⁵⁰, MCODE⁵¹, MINE⁵² and NeMo⁵³ can be used to identify and visualize clusters in networks, whereas the DomainGraph plug-in focuses on the visual analysis of the effects of alternative splicing on gene and protein networks⁵⁴. Another Cytoscape plug-in with functionality related to NetworkAnalyzer and RINalyzer is CentiScaPe, which also computes network centrality measures⁵⁵. However, among other differences, it does not provide global measures of network topology as NetworkAnalyzer does, and it does not support weighted networks as RINalyzer does.

Using NetworkAnalyzer and RINalyzer, this protocol describes three computational workflows of frequently applied network analysis steps (Fig. 1). The first workflow uses NetworkAnalyzer and shows how to conduct a typical topology analysis of biological networks such as protein interaction networks or RINs. The second workflow covers various aspects of using RINalyzer for the visual exploration of RINs, the study of protein binding interfaces and network centrality analysis. The third workflow details how to combine NetworkAnalyzer and RINalyzer for the comparison of multiple RINs. Below we outline each of the three workflows.

Experimental design

Topological analysis of biological networks. This workflow describes how to use the NetworkAnalyzer plug-in to perform a topological analysis of an unweighted network loaded into Cytoscape, as well as how to process and visualize the results. As described in the next section, RINalyzer also computes several centrality measures for weighted networks and provides further options for the visual exploration of the results.

NetworkAnalyzer calculates many simple topological parameters, such as the clustering coefficient, number of connected components, diameter and radius, centralization, number of shortest paths, average shortest path length, average number of neighbors, density, heterogeneity (only for undirected networks), number of isolated nodes, number of self-loops and number of multiedge node pairs. In addition, NetworkAnalyzer computes the following complex topological parameters: average clustering coefficient distribution, shortest path length distribution, betweenness centrality versus number of neighbors, closeness centrality versus number of neighbors and stress centrality distribution. The degree distribution, topological coefficients, shared neighbors distribution and neighborhood connectivity distribution are computed for undirected networks only, whereas the in-degree, out-degree and three different types of neighborhood connectivity are used for directed networks. More details on the definitions of all topological parameters are given in Box 1. The complete set of simple and complex parameters is referred to as network statistics in NetworkAnalyzer.

In the network view, topological parameters computed for network nodes can be highlighted by changing node size and color attributes. For example, the degree might determine the node size and the clustering coefficient the node color (Step 2A(x)). Complex topological parameters are depicted as histograms or scatter plots. The user can easily customize various visual settings, switch between histograms and scatter plots of the computed distributions, and switch between linear and logarithmic scales of the x and y axes. In addition, a power law can be fitted to the degree distribution to illustrate whether the analyzed network has scale-free properties (Step 2A(vi)). Finally, both the displayed charts and the network statistics can be saved to files (Steps 2A(ix) and (xi)).
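One common way to fit a power law y = a·x^b to a degree distribution, assumed here, is ordinary least squares on the log-transformed data. For a scale-free network, the fitted exponent b then estimates −γ. This is a plain-Python sketch, not NetworkAnalyzer's fitting routine. Least squares on a log-log plot is a fairly crude estimator of γ, so the result should be read as indicative only.

```python
import math


def fit_power_law(distribution):
    """Fit y = a * x**b by least squares on ln y = ln a + b * ln x.

    distribution maps degree k -> number of nodes with that degree. Points with
    k <= 0 or a count <= 0 are skipped because their logarithm is undefined.
    Returns (a, b, r_squared), with r_squared computed on the log-log scale.
    """
    points = [(math.log(k), math.log(c)) for k, c in distribution.items() if k > 0 and c > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points with positive degree and count")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    b = sxy / sxx
    ln_a = mean_y - b * mean_x
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return math.exp(ln_a), b, r_squared
```

Applied to an exact power law such as {1: 1000, 2: 250, 4: 62.5, 8: 15.625}, the fit recovers a ≈ 1000 and b ≈ −2, that is, γ ≈ 2.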

Interactive visual analysis of residue networks. This workflow explains the use of the Cytoscape plug-in RINalyzer and its features for analyzing and visualizing RINs. It is divided into the following major steps (Fig. 1): retrieving and loading RIN data into Cytoscape (Step 2B(i–v)); customizing RIN and 3D structure views (Step 2B(vi–xi)); creating, managing and saving sets of residue nodes (Step 2B(xii–xviii)); and performing centrality analysis, exploring and saving the results (Step 2B(xix–xxix)).

[Figure 1 flowchart. Boxes: Start Cytoscape (Step 1); Load any type of network data; Compute network parameters; Visualize parameters; Explore complex parameters; Save network statistics; Load RIN data; Customize view of RIN and protein structure; Create sets of residue nodes; Save node sets; Perform centrality analysis; Save results; Perform topological analysis on multiple RINs; Compare network statistics of RINs; Compare RINs using RINalyzer. Panels: 2A, 2B, 2C. Tools: NetworkAnalyzer, RINalyzer.]

Figure 1 | Outline of the protocol. This protocol starts with launching Cytoscape (Step 1) and consists of three major workflows: (Step 2A) topological analysis of biological networks; (Step 2B) interactive visual analysis of residue networks; (Step 2C) comparison of residue networks. Steps colored in blue are performed with NetworkAnalyzer and those in pink with RINalyzer. The dotted line represents an optional step that connects the two workflows, which is not described in detail in this protocol.

Box 1 | Topological network parameters (continued)

centrality of each node is a number between 0 and 1. The betweenness centrality of a node reflects the amount of control that this node exerts over the interactions of other nodes in the network⁷⁵.
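A minimal sketch of the normalized betweenness centrality defined in this box, for an unweighted, undirected graph stored as a dict of neighbor sets (our assumption). It counts shortest paths with breadth-first search, sums over unordered pairs {s, t}, and divides by (N − 1)(N − 2)/2 for the connected component of each node. It is a cubic-time illustration rather than an optimized method such as Brandes' algorithm.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Hop distances from source and the number of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness_centrality(graph):
    """Sum of sigma_st(n) / sigma_st over pairs {s, t}, normalized per component."""
    info = {v: bfs_counts(graph, v) for v in graph}
    between = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # no path between s and t
        for n in graph:
            if n in (s, t) or n not in dist_s:
                continue
            dist_n, sigma_n = info[n]
            if dist_s[n] + dist_n[t] == dist_s[t]:
                # Fraction of shortest s-t paths that pass through n.
                between[n] += sigma_s[n] * sigma_n[t] / sigma_s[t]
    for n in graph:
        size = len(info[n][0])  # nodes in n's connected component, including n
        pairs = (size - 1) * (size - 2) / 2
        between[n] = between[n] / pairs if pairs else 0.0
    return between
```

In a star network, every shortest path between two leaves passes through the center, so the center receives the maximum value of 1.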

Closeness centrality. The closeness centrality $C_c(n)$ of a node $n$ is the reciprocal of the average shortest path length from $n$ to the other nodes it can reach⁷⁶. The closeness centrality of each node is a number between 0 and 1. Closeness centrality is a measure of how quickly information spreads from a given node to other reachable nodes in the network⁷⁶.
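A minimal sketch of closeness centrality as defined in this box. It assumes an unweighted graph stored as a dict of neighbor sets. Only reachable nodes enter the average, and isolated nodes are given 0, which is our convention.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closeness_centrality(graph):
    """Reciprocal of the average shortest path length to all reachable nodes."""
    closeness = {}
    for n in graph:
        dist = bfs_distances(graph, n)
        others = [d for m, d in dist.items() if m != n]
        # Every distance is >= 1, so the result lies between 0 and 1.
        closeness[n] = len(others) / sum(others) if others else 0.0
    return closeness
```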




The workflow starts with the retrieval of residue interaction data for a protein of interest from the web interface to our RINdata database (Step 2B(i)). This database contains RINs generated with the RINerator software (http://rinalyzer.de/rinerator.php) for over 50,000 protein structures from the Protein Data Bank (PDB)⁵⁶. In contrast to previous approaches that define residue interactions on the basis of the spatial atomic distance between residues, RINerator distinguishes different residue interaction types and quantifies the strength of individual interactions, which results in an undirected weighted network with multiple interaction edges. To this end, RINerator first adds hydrogens to the 3D protein structure using the Reduce tool⁵⁷ and then samples contacts on the van der Waals surface of each atom using Probe⁵⁸.

In a RIN, the nodes represent the protein residues and the edges between them represent the noncovalent interactions identified by Probe. The edges are labeled with an interaction type and subtype. Possible types are interatomic contact (cnt), hydrogen bond (hbond), overlapping van der Waals radii (ovl) and generic residue interaction (combi), whereas the subtypes indicate interactions between main chains (mc) and side chains (sc) of the amino acid residues. Each edge is weighted with the respective score for the interacting residues as computed by Probe, and the weight is proportional to the strength of the interaction. The resulting RIN and additional information (such as edge weights) are stored in the default Cytoscape formats: the simple interaction format (SIF) for the network and edge attribute (EA) files for the edge weights³⁴. Accordingly, each RIN is accompanied by the original PDB file with hydrogens added and two edge attribute files.
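The following plain-Python sketch makes the file formats concrete. It reads SIF lines of the form `source interaction target` and edge attribute lines of the form `source (interaction) target = value`, which is how we understand the classic Cytoscape SIF and edge attribute formats. The residue and interaction labels in the example (such as `1ABC:A:12:_:ALA` and `cnt:mc_sc`) are hypothetical, and the function names are ours. Cytoscape and RINalyzer import these files themselves, so the sketch only shows their structure.

```python
def parse_sif(lines):
    """Parse SIF lines 'source interaction target [target ...]' into triples.

    Parallel edges between the same two residues (different interaction types)
    are kept as separate (source, interaction, target) triples.
    """
    edges = []
    for line in lines:
        fields = line.split()  # assumes node labels contain no whitespace
        if len(fields) < 3:
            continue  # blank lines and single-node lines are skipped in this sketch
        source, interaction, *targets = fields
        for target in targets:
            edges.append((source, interaction, target))
    return edges


def parse_edge_attributes(lines):
    """Parse an edge attribute file: a header naming the attribute, then
    'source (interaction) target = value' lines. Returns (name, {edge_key: value})."""
    rows = iter(lines)
    name = next(rows).split()[0]  # e.g. 'weight' or 'weight (class=java.lang.Double)'
    values = {}
    for row in rows:
        if "=" not in row:
            continue
        key, value = row.rsplit("=", 1)
        values[key.strip()] = float(value)
    return name, values
```

A SIF triple (u, i, v) can then be matched to its weight through the key built as `f"{u} ({i}) {v}"`.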

Once both the RIN and the corresponding protein structure are imported (after Step 2B(iv)), RINalyzer establishes a bidirectional connection between Cytoscape and the 3D structure viewer UCSF Chimera. In particular, when the user selects nodes of a RIN in the Cytoscape network view, the corresponding residues in the protein structure are automatically highlighted in UCSF Chimera, and vice versa. RIN nodes can be colored according to secondary structure based on the data retrieved from UCSF Chimera, and the node colors can be synchronized with the residue colors in UCSF Chimera. In addition, the user is able to show or hide different types of interaction edges such as backbone and hydrogen bonds. The visual RIN settings that can be customized by the user are listed in Box 2. Notably, a RIN-specific 2D layout that takes the current 3D structure coordinates into account can be applied to the network view.

The subsequent visual exploration of RINs often includes the study of the molecular interactions of active site residues and binding residues. For this purpose, RINalyzer offers a user interface for creating and modifying sets of residue nodes. In particular, the user can apply it to identify the interacting residues in the binding interface of two distinct protein domains (Step 2B(xv)) or to highlight different sets of residues such as active site residues (Step 2B(xvii)) in both the network and the 3D structure view.
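Identifying an interface amounts to a set operation on the edge list: an edge belongs to the interface if its two residues lie in different domains. The following is a minimal sketch. It assumes that the residues of each domain are already known (for example, from residue index ranges) and that edges are (residue, interaction, residue) triples. It illustrates the idea and is not RINalyzer's implementation.

```python
def interface_edges(edges, domain_a, domain_b):
    """Edges (u, interaction, v) that connect a residue of domain_a to one of domain_b."""
    a, b = set(domain_a), set(domain_b)
    return [
        (u, kind, v)
        for u, kind, v in edges
        if (u in a and v in b) or (u in b and v in a)
    ]


def interface_residues(edges, domain_a, domain_b):
    """Residues of either domain that take part in at least one interface edge."""
    hits = interface_edges(edges, domain_a, domain_b)
    return {u for u, _, _ in hits} | {v for _, _, v in hits}
```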

We also show how to use RINalyzer for the computation of weighted centrality measures and the identification of central nodes in a RIN (Step 2B(xxi–xxvii)). To this end, RINalyzer calculates the following centrality measures: weighted degree; shortest path closeness and betweenness; current flow closeness and betweenness; and random walk closeness and betweenness (Box 3). Here a crucial point is the choice of appropriate user settings for the centrality analysis. As the edge weights in a RIN are proportional to the strength of the represented residue interaction, the weights need to be converted to distance scores for the shortest path computation, such that smaller values are assigned to edges that represent stronger interactions. For each computed

Box 2 | Visual properties of residue networks

RINalyzer facilitates the visualization of RINs by providing a user interface with default values for a selected set of visual node and

edge properties (Supplementary Fig. 3). The properties are grouped as follows:

Background color. The background color should contrast well with the colors of the nodes and edges.

Node colors. RINalyzer can color the nodes according to the secondary structure of the represented residues. This option is particularly useful for the visual analysis of RINs. The secondary structure information can be loaded into the Cytoscape session either by importing a node attribute file or by opening the protein structure file of the RIN in UCSF Chimera via the RINalyzer menu. The node attribute file should contain the attribute name "SS" (for secondary structure). The attribute values must be strings: "Sheet" or the letter "S" or "E" for sheet, "Helix" or the letter "H" for helix, "Loop" or the letter "L" or "C" for loop, and an empty string, the minus symbol "-" or the letter "U" for unknown secondary structure. When the protein structure is loaded with UCSF Chimera, RINalyzer automatically stores the secondary structure as a node attribute. The user can change the colors for the secondary structure elements.
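The accepted "SS" attribute values can be summarized as a lookup table. The following is a minimal sketch. It assumes exact, case-sensitive matching after trimming whitespace, although RINalyzer's own parsing may be more lenient. The function name is hypothetical.

```python
_SS_LOOKUP = {
    "Sheet": "sheet", "S": "sheet", "E": "sheet",
    "Helix": "helix", "H": "helix",
    "Loop": "loop", "L": "loop", "C": "loop",
    "": "unknown", "-": "unknown", "U": "unknown",
}


def normalize_secondary_structure(value):
    """Map an SS attribute string to 'sheet', 'helix', 'loop' or 'unknown'."""
    try:
        return _SS_LOOKUP[value.strip()]
    except KeyError:
        raise ValueError(f"unrecognized secondary structure value: {value!r}") from None
```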

Node labels and sizes. The RIN node label consists of the following elements: PDB identifier, chain identifier, residue index, insertion code

and residue type. If one of these elements is missing, it is replaced by an underscore. Therefore, the full node label is rather long and

might be inappropriate for visualization. Thus, the user can choose which label elements are displayed instead of the whole node label.

An example is shown in Supplementary Figure 3. In addition, the sizes of the nodes and the fonts of their labels can be customized.

Backbone edges. This option displays the backbone edges in the network. These edges are defined as connections between two residues

with successive residue indices and have the interaction type backbone.

Visibility of edges. In most cases, RINs have multiple edges that represent different interaction types. The large number of multiple

edges might lead to an unclear view of the RIN in Cytoscape. Thus, for all interaction types, the edges of one interaction type can be

shown (or hidden) by (de-)selecting the respective check box.

Edge colors. The edges can be colored with respect to their interaction type.

Edge line type and width. For improved visualization of multiple edges, these can be drawn as straight, parallel lines that lie close to

each other. The appearance of the lines is determined by the width of the edge lines and the space between them.




Box 3 | Network centrality analysis

One of the key features of RINalyzer is the computation of weighted centrality measures with respect to a set of selected nodes. In the

following, we will refer to the set of currently selected nodes as the root set. The centrality measures can be divided into three main

categories according to how the distance between two nodes is measured (Supplementary Fig. 6).

Shortest path centralities. Here the distance between two nodes is given by the length of the shortest path between them. The shortest path degree is the number of neighbors of a node that are contained in the root set and lie within a given cutoff distance from it. Shortest path closeness is the inverted average distance from a node to all nodes in the root set⁷⁶. Shortest path betweenness is the fraction of shortest paths between pairs of root set nodes that pass through the node of interest⁷³.

Current flow centralities. Here the distance between two nodes is computed as the effective electrical resistance between them (i.e., the difference of their potentials required for generating one unit of electrical current between them⁷⁷). Current flow closeness is the inverted sum of the effective resistances between a node and all nodes in the root set. Current flow betweenness is the amount of current that passes through a node when a unit of current flows from a source to a target node, over all source-target node pairs in the root set.

Random walk centralities. Here the distance between two nodes is measured by the hitting time (i.e., the expected number of steps needed by a random walk from one node to the other). Random walk closeness is the mean hitting time over all random walks starting at a node from the root set and ending at the node of interest. Random walk betweenness is the expected number of visits to a node by a random walk between each pair of root set nodes relative to the hitting time of the random walk. The computation of hitting time and the expected number of visits is based on the relationship between random walks and the distribution of electrical current through the network⁷⁸.

Normalization. Degree and closeness are normalized by the number of nodes in the root set. Betweenness is normalized by the number of pairs of root set nodes.
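For weighted networks, the distances used by the shortest path centralities come from a weighted shortest path search such as Dijkstra's algorithm. The sketch below computes shortest path closeness with respect to a root set as the inverted average distance described above. It assumes that the edge weights have already been converted to non-negative distances and that node labels are mutually comparable (for example, strings), because ties in the heap are broken by label. It does not attempt to reproduce RINalyzer's exact normalization.

```python
import heapq


def dijkstra(graph, source):
    """Weighted shortest-path distances from source; graph maps node -> {neighbor: distance}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # outdated heap entry
        for w, length in graph[v].items():
            candidate = d + length
            if candidate < dist.get(w, float("inf")):
                dist[w] = candidate
                heapq.heappush(heap, (candidate, w))
    return dist


def root_set_closeness(graph, root_set):
    """Inverted average weighted distance from each node to the other root-set nodes."""
    result = {}
    for n in graph:
        dist = dijkstra(graph, n)
        targets = [dist[r] for r in root_set if r != n and r in dist]
        total = sum(targets)
        result[n] = len(targets) / total if targets and total > 0 else 0.0
    return result
```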

For RINalyzer, the input network for the centrality analysis should be undirected and connected and can have multiple, unweighted or weighted edges. Self-loops are ignored by the centrality calculation. If the network edges are weighted, the weights need to have non-negative values and should be proportional to the strength of the represented residue interaction. However, for the shortest path computation, the weights have to be converted to distance scores so that smaller scores are assigned to edges that connect nodes with stronger interactions. Thus, before the network centrality analysis is initiated, the user has to customize the following analysis settings:

Choose attribute as edge weight. This setting is used to assign weights to the network edges by selecting a numeric edge attribute from the edge attributes of the loaded network. Edge attributes can be easily created in Cytoscape or imported from an edge attribute file. The default option for the weight attribute is "None" and stands for no specific attribute (i.e., the default edge weight is assigned to all edges). Edges with a weight of zero will be ignored during the network centrality analysis, and missing weight values are replaced by the default weight.

Handle multiple edges. RINs can have multiple edges that represent different interactions between residues such as hydrogen bonds or

interatomic contacts. As the computation of centrality measures is based on a single edge type, either the user must select a specific

edge type or the weights of multiple edges between directly connected nodes must be merged into a single combined weight. The user

can choose from four alternatives to merge weights, taking the maximum, the minimum, the average or the sum of the weights of the

multiple edges between directly connected nodes.

Handle negative weights. As the computed centrality measures in RINalyzer are defined only for non-negative weights, negative edge

weights have to be removed before the computation. This can be achieved either by ignoring them (Ignore) or by reverting them to

their absolute value (Revert).

Convert scores into distances. Shortest path centrality measures are computed under the assumption that weights represent distances (i.e., the smaller the weight, the stronger the interaction). Therefore, if the assigned weights follow a similarity function (i.e., larger weight values correspond to stronger interactions), they must be converted into distances. The first option is to invert each weight value (1/value), and the second option is to subtract each value from the largest (max) weight value found in the network (max − value). The max value is increased by 1 before subtraction to avoid weights equal to zero. Edge weights of zero cannot be inverted and are thus ignored in the subsequent analysis.

Default edge weight (if missing). If weight values are missing for some of the edges, they are replaced by this default value. For instance, if no edge attribute is selected, the default edge weight is assigned to all edges. If the default weight is 1, the network is treated as unweighted.

Cutoff for weighted degree. The weighted degree centrality measure is computed by counting the nodes that can be reached from the node of interest by paths of length up to a certain cutoff. This cutoff should be chosen in agreement with the specified edge weights.

For betweenness computation exclude paths between nodes within the same set(s). This option can be changed when the subnetwork formed by the currently selected nodes is disconnected. In this case, the user might want to compute the betweenness values with respect to only a subset of node pairs, namely those connected by paths over intermediate, unselected nodes. To determine whether this option applies, RINalyzer checks whether the subnetwork defined by the selected nodes is disconnected by computing the number of its connected components.
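The connectivity check described above amounts to counting connected components in the subgraph induced by the selected nodes. A breadth-first-search sketch in Python is shown below; it is a generic illustration, not RINalyzer's code, and the edge-list input format is an assumption. A result greater than 1 means that the selected nodes form a disconnected subnetwork.

```python
from collections import deque


def count_components(selected, edges):
    """Count connected components of the subnetwork induced by `selected`.

    Only edges whose two endpoints are both selected are taken into account.
    """
    selected = set(selected)
    adj = {n: set() for n in selected}
    for u, v in edges:
        if u in selected and v in selected:
            adj[u].add(v)
            adj[v].add(u)
    seen = set()
    components = 0
    for start in selected:
        if start in seen:
            continue
        components += 1  # every unvisited node opens a new component
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node] - seen:
                seen.add(nb)
                queue.append(nb)
    return components


edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(count_components({"a", "b", "c", "d", "e"}, edges))
print(count_components({"a", "c"}, edges))  # b is not selected
```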




centrality measure, RINalyzer offers three different ways to examine the results: (i) inspecting the raw values in a sortable table, (ii) highlighting selected nodes in the network view or (iii) saving the values in a tab-delimited format for further processing. The presented workflow focuses on the second option (ii), which involves a filter to select nodes with centrality values in a given numerical range (Step 2B(xxvi)). This functionality allows the user to create sets of best-scoring residue nodes for further investigation of their functional and structural characteristics in both the network view and the 3D protein structure.
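Outside Cytoscape, the range filter of option (ii) and the tab-delimited export of option (iii) reduce to a few lines of Python. This is a sketch under assumptions: centrality values are given as a dict from node identifier to value, the values are made up, and the column names node and value are hypothetical, not RINalyzer's actual file layout.

```python
import csv
import io


def select_in_range(values, low, high):
    """Return the nodes whose centrality lies in [low, high], highest first."""
    hits = [node for node, v in values.items() if low <= v <= high]
    return sorted(hits, key=lambda node: values[node], reverse=True)


def to_tab_delimited(values):
    """Serialize node/value pairs as a two-column tab-delimited table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["node", "value"])  # hypothetical column names
    for node in sorted(values):
        writer.writerow([node, values[node]])
    return buffer.getvalue()


# Made-up centrality values for three residue nodes.
values = {"A:25:_:ASP": 0.8, "A:26:_:THR": 0.3, "B:25:_:ASP": 0.9}
print(select_in_range(values, 0.5, 1.0))
print(to_tab_delimited(values))
```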

Comparison of residue networks. This workflow introduces one possible application scenario that combines NetworkAnalyzer and RINalyzer. We compiled a small data set consisting of four RINs that are generated from the four subunits of the deoxyhemoglobin structure (ref. 59). First, the batch analysis option of NetworkAnalyzer is used to compute the network statistics of these RINs and to compare their topologies (Step 2C(i–v)). Second, two RINs that represent an α and a β subunit of deoxyhemoglobin are compared with each other using RINalyzer (Step 2C(vi–xiv)).

This comparison requires the user to provide a structure alignment of the two 3D protein structures and results in a combined RIN. The comparison network contains different types of nodes and edges according to the aligned residues and the preserved residue interactions. The type of each node and edge is stored as an attribute, which can be used to adjust the network view visually. Thus, the user can easily highlight and investigate the identified similarities and differences between the two RINs and the corresponding protein structures.

MATERIALS

EQUIPMENT

Hardware requirements: personal computer with Internet access and a web browser (e.g., Mozilla Firefox, Microsoft Internet Explorer or Google Chrome); we also recommend a screen with a resolution of at least 1024 × 768 pixels and a three-button mouse

Software requirements:
Java Standard Edition, version 6 (download from http://www.java.com/)
Cytoscape, version 2.8 (Cytoscape can be installed following the steps provided in the Cytoscape protocol (ref. 34) or on the following web page: http://cytoscape.wodaklab.org/wiki/Cytoscape_User_Manual/Launching_Cytoscape)
NetworkAnalyzer (included in the Cytoscape 2.8 installation as a core plug-in)
RINalyzer (download and installation instructions for RINalyzer are available at http://rinalyzer.de/docu/install.php)
UCSF Chimera, version 1.5 (instructions for its installation are available at http://www.cgl.ucsf.edu/chimera/download.html)

Data

Sample data sets required for this protocol are provided as supplementary files. The human protein interaction network (Supplementary Data 1) was published in the recent interactome screening study by Yu et al. (ref. 47). The set of four RINs (Supplementary Data 2) was generated using the RINerator package (http://rinalyzer.de/rinerator.php) and represents the four subunits of the deoxyhemoglobin structure with the PDB identifier 4HHB (ref. 59).

PROCEDURE

1| Start Cytoscape.
? TROUBLESHOOTING

2| Follow option A for the topological analysis of biological networks, option B for the interactive visual analysis of residue networks or option C for the comparison of residue networks.

(A) Topological analysis of biological networks

(i) Download data. Here we perform the topological analysis of the protein-protein interaction network from Yu et al. (ref. 47; Supplementary Data 1). First, download the file Supplementary Data 1 to a local directory.

(ii) Import network data (for details, see the Cytoscape protocol (ref. 34)). In the Cytoscape main window, go to the menu option File → Import → Network (multiple file types). Select the option Local for Data Source Type and click the Select button. Navigate to the directory that contains Supplementary Data 1 and select the file. Confirm the selection by clicking the Open button. Then click the Import button to import this network into the current Cytoscape session. When the network is successfully loaded, a summary window will appear. Click the Close button of this window and return to the Cytoscape main window.

(iii) Apply network layout. To apply a specific layout to the network, go to the menu option Layout → yFiles → Organic. The network view can be enlarged by clicking the Maximize button in the upper right corner of the network view window.

(iv) Run NetworkAnalyzer. Go to the menu option Plugins → Network Analysis → Analyze Network. NetworkAnalyzer can perform topological analysis on directed as well as undirected networks. Therefore, the user can choose how the edges should be interpreted. As this network is undirected, select the option Treat the network as undirected and click the OK button to start the analysis. A Progress dialog will appear. The analysis time depends on the size of the network and the amount of memory assigned to the Cytoscape application. The Cancel button can be used at any time to stop the analysis.
? TROUBLESHOOTING

(v) View results. The results window appears after the analysis is completed (Supplementary Fig. 1). The first tab shows the computed simple parameters, e.g., the clustering coefficient and the average shortest path length.


The remaining tabs display complex network parameters such as degree and shortest path distributions. All topological parameters are described in more detail in Box 1. Select the tab Node Degree Distribution. The node degree distribution is depicted in a log-log plot. The x axis enumerates the degrees of nodes in the network and the y axis shows the frequency of nodes with a given degree.
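The data behind this chart are simply the number of nodes for each degree value. The following Python sketch computes them from an edge list; it assumes a simple undirected graph given as unique (u, v) pairs without self-loops, and isolated nodes (degree 0) are not represented because they do not appear in an edge list. It is not NetworkAnalyzer's code.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree k: number of nodes with degree k} for an undirected graph.

    edges: list of unique (u, v) pairs without self-loops (assumed).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(sorted(Counter(degree.values()).items()))


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(degree_distribution(edges))
```

Plotting the keys against the values with logarithmic axes gives the log-log view shown in the results window.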

(vi) Fit a power law. The degree distribution of many biological networks is known to approximate a power law. Click on the button Fit Power Law to fit a power law to the distribution. A warning message will inform you that only points with positive coordinate values are considered for the fit. Confirm this message by clicking the OK button. After a short delay, the dialog NetworkAnalyzer Fitted Function appears. It reports the fitted power law constants, the correlation between the given data points and the corresponding points on the fitted curve, and the R-squared value as a measure of fit quality between 0 and 1 (the higher the value, the better the fit). Click the OK button to close the dialog and see the fitted power law in the chart (Fig. 2).

(vii) Explore charts. Click the button Enlarge Chart to open the distribution plot in a separate, enlarged window. Almost all nodes in the network have a degree of < 30. The dot near the lower right corner of the plot indicates that there is only one node with degree 151, which we hereafter call the hub node because of its exceptional number of protein interactions. Close the window.
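NetworkAnalyzer's exact fitting procedure is not described in this section, so the following Python sketch only illustrates one common approach: fitting y = a·x^b by ordinary least squares on log-transformed data. It also shows why only points with positive coordinates can be used, since the logarithm of zero is undefined. R² is computed here on the log scale, which may differ from the value NetworkAnalyzer reports.

```python
import math


def fit_power_law(dist):
    """Fit y = a * x**b by linear least squares on (log x, log y).

    dist: dict mapping degree x to node count y. Points with x <= 0 or
    y <= 0 are skipped because their logarithm is undefined.
    Returns (a, b, r_squared), with R^2 computed on the log scale.
    """
    pts = [(math.log(x), math.log(y)) for x, y in dist.items() if x > 0 and y > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points with positive coordinates")
    mean_x = sum(lx for lx, _ in pts) / n
    mean_y = sum(ly for _, ly in pts) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in pts)
    syy = sum((ly - mean_y) ** 2 for _, ly in pts)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in pts)
    b = sxy / sxx                       # slope on the log-log scale
    a = math.exp(mean_y - b * mean_x)   # back-transformed intercept
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared


# Synthetic distribution following y = 100 * x**-2; the point at x = 0 is ignored.
dist = {0: 3, 1: 100.0, 2: 25.0, 4: 6.25, 5: 4.0, 10: 1.0}
print(fit_power_law(dist))
```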

(viii) Customize charts. Click on the button Chart Settings to rename the axes in the tab Axes, show or hide the gridlines in the tab Gridlines, or change the shape and color of chart points in the tab Histogram. Click the OK button to apply the changed settings or the Cancel button to close the dialog without saving the changes.

(ix) Export charts. Every chart in the results window can be saved to a file. To save the current chart as an image, click the Export Chart button. Adjust the image size by entering your preferred values in the two displayed text fields and confirm them with the Save button. Navigate to the directory where you want to save the image and select the file type from the drop-down menu. Finally, click the Save button. In addition, it is possible to export the visualized data for further processing in a different application. For example, select the tab Betweenness centrality. This scatter plot displays the correlation between node degree and betweenness centrality in the studied network. Every node in the network is represented by a point. The x axis gives the node degree and the y axis the betweenness. Click the button Export data and enter a file name (including extension) to store the values of these topological parameters. After you click the Save button, the newly created tab-separated text file will contain a table of the degree and betweenness centrality values for every node in the network. This file can easily be imported into external software applications such as a spreadsheet tool for further analysis or processed by other programs.

(x) Visualize topological parameters. In Step 2A(vii), we identified a hub node in the network; now we want to locate it in the network view. Thus, we will visually map the node degree to node size in the network view. Click the button Visualize Parameters in the results dialog of NetworkAnalyzer. In the Map node size to drop-down menu, select Degree. Nodes with a low degree should be displayed as small circles in contrast to nodes with a high degree. To this end, select the option Low values to small sizes. In addition, it is possible to map the degree or any other computed topological parameter to the node color. Choose ClusteringCoefficient in the drop-down menu on the right side and select the option Low values to bright colors. Nodes with a low clustering coefficient will now be green and nodes with a high clustering coefficient will be red. Finally, confirm the mapping choice by clicking the Apply button. This results in a changed network visualization (Supplementary Fig. 2). If necessary, move the network statistics dialog to the right corner of your screen or close it in order to see the updated network view. The hub node is now clearly visible as the largest circle in the network view. The large number of green nodes indicates that most nodes have a low clustering coefficient, i.e., the neighbors of most nodes do not tend to interact with each other. To obtain an even better view of the nodes, zoom in by clicking the Zoom in button on the toolbar or by using the mouse scroll wheel.
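The interpretation of the green nodes rests on the local clustering coefficient, which for a node n with k_n neighbors and e_n edges among those neighbors is usually defined as C_n = 2e_n / (k_n(k_n - 1)). The Python sketch below computes this standard quantity; Box 1 holds the definitions NetworkAnalyzer actually uses, and assigning 0 to nodes with fewer than two neighbors is a convention chosen for this example.

```python
from itertools import combinations


def clustering_coefficients(edges):
    """Local clustering coefficient of every node in an undirected graph.

    C(n) = 2 * e_n / (k_n * (k_n - 1)), where k_n is the number of neighbors
    of n and e_n the number of edges among those neighbors. Nodes with fewer
    than two neighbors get 0 in this sketch.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not contribute to clustering
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    coefficients = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coefficients[node] = 0.0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coefficients[node] = 2.0 * links / (k * (k - 1))
    return coefficients


# A triangle a-b-c with an extra node d attached to a.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(clustering_coefficients(edges))
```

A network in which most nodes have low values, like the interaction network in this step, is one where neighbors of a protein rarely interact with each other.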

Figure 2 | Screenshot of network statistics computed by NetworkAnalyzer. The depicted node degree distribution is derived from the undirected protein-protein interaction network from Yu et al. (ref. 47). The red line represents a fitted power law, which indicates that the analyzed network is scale-free. The tabs below the dialog title lead to the display of histograms or scatter plots of the complex topological parameters computed by NetworkAnalyzer. The buttons on the right side provide the user with a variety of options for customizing the view as well as for exporting the displayed charts and the underlying data.




(xi) Save network statistics. Close the analysis results window. A warning message appears stating that the computed network statistics have not been saved. Click the Yes button to close the statistics window without saving the results. To recompute the network statistics at a later time, just run NetworkAnalyzer again. Alternatively, the results can be saved to and reloaded from a text file to avoid recomputation. For this purpose, click on the button Save Statistics. Enter a file name to store the network statistics in a file with the extension .netstats and click the Save button to confirm it.

(xii) (Optional) Perform centrality analysis. In addition to NetworkAnalyzer, RINalyzer can be applied to perform centrality analysis on the loaded network. RINalyzer supports weighted networks and additionally computes several weighted centrality measures (Box 3). To use RINalyzer now, continue with Step 2B(xx).

(B) Interactive visual analysis of residue networks

(i) Retrieve RIN data. Identify a protein of interest with an experimentally determined 3D structure deposited in the PDB. For example, we have chosen the HIV-1 protease (ref. 60) with the PDB identifier 1HIV. Start a web browser and go to the RINdata web page (http://rinalyzer.de/rindata.php) to download the corresponding RIN data. Enter the PDB identifier 1HIV in the search form and click the button Retrieve RIN data. If RIN data are available for this PDB identifier, a download link is provided. Click on this link and download the file to a local directory. The downloaded RIN data are a zipped archive that contains multiple files: a PDB file with the 3D protein structure of the original PDB file (as retrieved from the PDB) with added hydrogens (pdb1hiv_h.ent); a SIF file containing the RIN for all chains in the PDB file (pdb1hiv_h.sif); an edge attribute file with edge weights reflecting the strength of the interactions between residues (pdb1hiv_h_intsc.ea); and an edge attribute file with edge weights representing the number of interactions between residues (pdb1hiv_h_nrint.ea). Unzip all files from the archive.
? TROUBLESHOOTING

(ii) Import network into Cytoscape. In the Cytoscape main window, go to the menu option File → Import → Network (multiple file types). Select the option Local for Data Source Type and click the Select button. Navigate to the directory that contains the extracted RIN files and select the network SIF file, e.g., pdb1hiv_h.sif. Confirm the selection by clicking the Open button and then click the Import button. When the network is successfully loaded, a summary window will appear. Click the Close button of this window and return to the Cytoscape main window. The network view can be enlarged by clicking the Maximize button in the upper right corner of the network view window.

(iii) Import edge attributes into Cytoscape. Import the edge weights representing the number of interactions between residues, as they are needed in Step 2B(xxii) for the network centrality analysis. Go to the menu option File → Import → Edge Attributes. Navigate to the directory that contains the RIN files. Select the edge attribute file pdb1hiv_h_nrint.ea and click the Open button. When the attributes are successfully loaded, a summary window will appear. Click the Close button of this window and return to the Cytoscape main window.
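Both the SIF network file and the .ea attribute files are plain text, so they can also be inspected outside Cytoscape. The Python sketch below assumes the Cytoscape 2.x layouts: in SIF, each line holds a source node, an interaction type and one or more target nodes; in an .ea file, the first line names the attribute (optionally followed by a class annotation in parentheses) and every further line reads "source (interaction) target = value". The example lines and the attribute name NrInteractions are made up for illustration and are not copied from the 1HIV files.

```python
def parse_sif(lines):
    """Parse SIF lines into (source, interaction, target) triples.

    Tab-delimited lines are split on tabs, all other lines on whitespace.
    Lines with fewer than three fields (e.g., lone nodes) are skipped here.
    """
    edges = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        parts = line.split("\t") if "\t" in line else line.split()
        if len(parts) < 3:
            continue
        source, interaction, *targets = parts
        edges.extend((source, interaction, t) for t in targets)
    return edges


def parse_edge_attributes(lines):
    """Parse an .ea file into (attribute name, {edge id: float value})."""
    it = iter(lines)
    name = next(it).split("(")[0].strip()  # drop an optional "(class=...)"
    values = {}
    for line in it:
        if "=" not in line:
            continue
        edge_id, value = line.rsplit("=", 1)
        values[edge_id.strip()] = float(value)
    return name, values


sif_lines = ["A:25:_:ASP combi:all_all A:26:_:THR\n",
             "A:26:_:THR\tcombi:all_all\tA:27:_:GLY\n", "\n"]
ea_lines = ["NrInteractions (class=Double)\n",
            "A:25:_:ASP (combi:all_all) A:26:_:THR = 3\n"]
print(parse_sif(sif_lines))
print(parse_edge_attributes(ea_lines))
```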

(iv) Open protein structure in UCSF Chimera. Go to the menu option Plugins → RINalyzer → Protein Structure → Open structure from file in the Cytoscape main window and navigate to the directory that contains the RIN files. Select the PDB file (pdb1hiv_h.ent) and click the Open button. It may take a while until UCSF Chimera is launched and the 3D structure is loaded. Afterwards, a summary window about the internally performed mapping between network nodes and structure entities will appear. Click the Close button of this window.
? TROUBLESHOOTING

(v) Explore protein structure. Use the mouse to move and scale the protein structure in the main UCSF Chimera window. By default, the left mouse button controls rotation, the middle mouse button controls XY translation and the right mouse button controls scaling. While holding down the Ctrl key, use the left mouse button to select residues of interest by clicking on them or to drag out a selection area (sweep out an area before releasing the left mouse button).

(vi) Show protein backbone. To see the protein backbone in the 3D structure and to add backbone edges to the RIN, select Plugins → RINalyzer → Protein Structure → Show backbone in the Cytoscape main window. This option automatically adds protein backbone edges to the RIN in Cytoscape and also invokes the display of the ribbon representation of the corresponding 3D protein structure in UCSF Chimera.

(vii) Apply RIN layout. Lay out the RIN in Cytoscape according to the 3D structure view in UCSF Chimera by selecting the menu option Plugins → RINalyzer → Layout → RIN Layout. Click on the icon 1:1 in the Cytoscape toolbar to see the whole network. As the graphics details of the network view are normally not displayed when the network is zoomed out, select the menu option View → Show Graphics Details.

? TROUBLESHOOTING

(viii) Customize RIN view. Go to the menu option Plugins → RINalyzer → Visual Properties and, in the tab General & Nodes, choose how the node labels should be displayed. For example, if only residue index and type are selected, the node labels are updated accordingly (Supplementary Fig. 3). In the tab Edges, the visible edge types can be selected. The network view is updated automatically each time an edge type box is checked or unchecked. In the same tab, the option Straighten edge lines controls whether multiple edges are drawn as straight parallel lines or not. When satisfied with the customized settings, confirm them by clicking the Apply button and click the Close button of the dialog RIN Visual Properties. In the resulting network view, the nodes are colored according to secondary structure and the edges according to interaction type. More details about the different visual properties can be found in Box 2.

(ix) Synchronize colors between views. After the visual properties of a RIN have been customized, nodes are usually colored according to secondary structure. To transfer the node colors to the corresponding residues in UCSF Chimera, go to the menu option Plugins → RINalyzer → Protein Structure → Sync 3D view colors. The resulting network and 3D structure views should look like those in Figure 3.

(x) (Optional) Show only protein backbone. Now we want to look at only the protein backbone in both the network view and the 3D structure view. If the protein backbone is not yet visible in the 3D structure and the RIN, select Plugins → RINalyzer → Protein Structure → Show backbone in the Cytoscape main window. Then go to the menu option Actions → Atoms/Bonds → hide in the UCSF Chimera window to hide all atoms in the 3D structure view. In the Cytoscape main window, go to the menu option Plugins → RINalyzer → Visual Properties → Edges. Uncheck the boxes next to all edge types except the backbone edges. The resulting RIN and the corresponding 3D protein structure should look as in Supplementary Figure 4. Show the edges again by checking the boxes next to the edge types in the dialog RIN Visual Properties. When the edges are added to the network, they are visualized as curved lines. Click the Apply button to straighten them. The atoms in the 3D structure can be displayed again by executing Actions → Atoms/Bonds → show in the UCSF Chimera window.

(xi) (Optional) Hide protein backbone. The backbone can be hidden in both views by clicking on Plugins → RINalyzer → Protein Structure → Hide backbone in the Cytoscape main window.

(xii) Create sets of residue nodes. RINalyzer provides an interface to manage node sets. To open it, click on the menu option Plugins → RINalyzer → Manage Node Sets. The RINalyzer Node Sets panel appears as the last tab in the Cytoscape Control Panel. New node sets can be created in different ways. For instance, to create a set that contains the currently selected residues in UCSF Chimera, switch to the UCSF Chimera window and click Select → Chain → A to select all residues in chain A. Selected residues are colored in green, and the corresponding nodes in Cytoscape are also selected automatically (yellow). In the panel RINalyzer Node Sets, go to the menu option File → New → Set from selected nodes to create a set that contains the nodes corresponding to the currently selected residues in UCSF Chimera. Insert a name for the set to be created, e.g., Chain A, and click OK to confirm it. The same actions can be repeated to create a second set named Chain B that contains all nodes corresponding to residues in chain B.

(xiii) Select set nodes in the network view. To select all nodes of a set in the network view, use the option Select nodes in the context menu of the set (right-click the set name). To clear the current node selection, click on the background in the network view window.

Figure 3 | Simultaneous view of RIN and 3D protein structure by RINalyzer. The RIN of the HIV-1 protease (PDB identifier 1HIV (ref. 60)) is displayed in Cytoscape (top), whereas the molecular graphics visualization of the 3D protease structure is shown in UCSF Chimera (bottom). All RIN nodes and the corresponding residues are colored according to secondary structure: blue for helices and red for strands. The various types of noncovalent residue interactions correspond to different edge colors: interatomic contacts are in blue; hydrogen bonds in red; overlapping van der Waals radii in gray; and the backbone in black.

(xiv) Add active site nodes to a set. The active site residues of the HIV-1 protease are ASP 25, THR 26 and GLY 27 in chains A and B (ref. 61). To create a set with the active site residues of the HIV-1 protease for use in the centrality analysis in Step 2B(xxvii), go to the menu option File → New → Empty set in the Cytoscape panel RINalyzer Node Sets. Enter the name Active site and click the OK button to confirm. Go to the Search field in the Cytoscape toolbar and start typing the node identifier a:25:_:asp. As a result, a single hit should appear in the drop-down menu of the search field. Press Enter to select the node. In the panel RINalyzer Node Sets, go to the menu option Edit → Add nodes, and the selected node will be added to the currently selected node set, which should be Active site. Repeat the same actions for the remaining five active site residues: a:26:_:thr; a:27:_:gly; b:25:_:asp; b:26:_:thr; and b:27:_:gly. The set Active site should eventually contain six nodes. The active site nodes and the corresponding residues in the 3D structure can be colored as shown in Step 2B(xvii).
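Node identifiers such as a:25:_:asp appear to encode the chain, the residue number, the insertion code (with _ for none) and the residue type, separated by colons; this reading is inferred from the examples in this protocol. The Python sketch below parses such identifiers and builds UCSF Chimera residue specifiers of the form :25.A (residue 25 of chain A), which could be used to address the same residues in Chimera. Upper-casing the fields and rejecting insertion codes are simplifications of this sketch; structures with lowercase chain identifiers would need different handling.

```python
from typing import NamedTuple


class Residue(NamedTuple):
    chain: str
    number: int
    icode: str  # insertion code; "_" in the node identifier means none
    name: str


def parse_node_id(node_id):
    """Split an identifier like 'a:25:_:asp' into its four fields."""
    chain, number, icode, name = node_id.split(":")
    return Residue(chain.upper(), int(number),
                   "" if icode == "_" else icode, name.upper())


def chimera_spec(res):
    """Chimera residue specifier, e.g. ':25.A' for residue 25 of chain A."""
    if res.icode:
        raise ValueError("insertion codes are not handled in this sketch")
    return f":{res.number}.{res.chain}"


active_site = ["a:25:_:asp", "a:26:_:thr", "a:27:_:gly",
               "b:25:_:asp", "b:26:_:thr", "b:27:_:gly"]
print([chimera_spec(parse_node_id(n)) for n in active_site])
```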

(xv) Identify residue nodes on the interface of chain A. We can use the RINalyzer Node Sets interface to identify which residues from chain B interact with chain A. Right-click the node set Chain A and execute the menu option Select Nodes. Afterwards, go to the Cytoscape menu option Select → Nodes → First Neighbors of Selected Nodes. This operation may take several seconds; it is finished when the neighboring residues are highlighted in yellow. Back in the panel RINalyzer Node Sets, go to the menu option File → New → Set from selected nodes to create a set that contains all nodes corresponding to chain A and their neighbors. Enter the set name, e.g., Chain A and neighbors, and click the OK button to confirm it. Now, all nodes in this new set that do not belong to chain A are the nodes from chain B that interact with nodes from chain A. To extract these nodes, we need to build the intersection of the sets Chain B and Chain A and neighbors. The RINalyzer Node Sets interface supports typical set operations such as the union and intersection of sets. To create the intersection of two sets, select both by left-clicking while pressing the Ctrl key (or the Command key for Mac users) and go to the menu option Operations → Intersection. This action will create a new set that is the intersection of the two selected sets. Enter a name for the new set, e.g., Chain B Interface, and click the OK button to confirm it.
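The set logic of this step can be reproduced in a few lines of Python: extend chain A by its first neighbors (keeping the original selection, as Cytoscape does), then intersect the result with chain B. The sketch below is an illustration with hypothetical toy node names, not RINalyzer code.

```python
def first_neighbors(nodes, edges):
    """Return `nodes` plus every node directly connected to one of them."""
    nodes = set(nodes)
    result = set(nodes)
    for u, v in edges:
        if u in nodes:
            result.add(v)
        if v in nodes:
            result.add(u)
    return result


def interface(partner_chain, chain, edges):
    """Residues of `partner_chain` that interact with residues of `chain`."""
    return first_neighbors(chain, edges) & set(partner_chain)


# Toy RIN with two short chains (hypothetical node names).
chain_a = {"A:1", "A:2"}
chain_b = {"B:1", "B:2", "B:3"}
edges = [("A:1", "A:2"), ("A:2", "B:1"), ("B:1", "B:2"), ("B:3", "A:1")]
print(sorted(interface(chain_b, chain_a, edges)))  # the Chain B Interface set
print(sorted(interface(chain_a, chain_b, edges)))  # the Chain A Interface set
```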

(xvi) Identify residue nodes on the interface of chain B. To create a node set Chain A Interface, select the nodes in the set Chain B; then select their first neighbors using the Cytoscape option Select → Nodes → First Neighbors of Selected Nodes and create a node set Chain B and neighbors; finally, build the intersection of the set Chain B and neighbors and the set Chain A to create the node set Chain A Interface, as described in Step 2B(xv).

(xvii) Color set nodes and corresponding residues. We can highlight different sets in the network view by changing the visual properties of the corresponding set nodes, e.g., by giving them a different color. Right-click the node set Chain A to access its context menu and select the menu option Visual Mapping Bypass → Node Color. Choose a color and click OK to color all set nodes in the network view. In addition, select the option Sync 3D view colors from the context menu to color the corresponding residues in UCSF Chimera with the same color. The same actions can be repeated for the node sets Chain A Interface, Chain B and Chain B Interface. In the end, the network and 3D structure could look like the image shown in Supplementary Figure 5.

(xviii) Save node sets for further analysis. Select all sets by left-clicking them while pressing the Ctrl key or by clicking the first set and then clicking the last set while holding the Shift key. In the panel RINalyzer Node Sets, go to the menu option File → Save selected set(s). Enter a file name and click Save. Close the resulting dialog that confirms the successfully performed action.

(xix) Prepare network for centrality analysis. Make sure that the backbone edges in the network are hidden, as they are only meant to aid the visual analysis of the RIN. To hide them, go to the menu option Plugins → RINalyzer → Protein Structure → Hide backbone. Hiding the backbone edges in the RIN concomitantly hides the ribbons in the 3D structure view. Therefore, if you no longer see the 3D structure, switch to UCSF Chimera and go to the menu option Actions → Atoms/Bonds → show to display the atoms. In addition, the 1HIV structure contains a third chain I that represents an inhibitor bound to the protease. One might want to remove or hide the corresponding RIN nodes before performing the centrality analysis. To select this chain I, go to the menu option Select → Chain → I in UCSF Chimera. Then switch to the Cytoscape main window and go to the menu option Edit → Delete Selected Nodes and Edges to delete the selected nodes.

(xx) Handle disconnected network nodes. Make sure the network is connected. The HIV-1 protease RIN contains two nodes, A:40:_:GLY and B:37:_:SER, which are not connected to any other node in the network. Thus, when the centrality analysis is started, a warning message will appear stating that the network has more than one connected component. In such cases, shortest path centrality measures are computed for each connected component independently, but current flow and random walk centralities are not computed at all. There are two possible ways to deal with this issue: proceed with the analysis by clicking the Yes button, keeping in mind that these nodes are disconnected from all other nodes in the network; or cancel the analysis by clicking the No button, select the two disconnected nodes in the network view and delete them by clicking Edit → Delete Selected Nodes and Edges.

(xxi) Select root nodes for analysis. The centrality analysis can be started only if a set of nodes (root set) is selected in the network view. RINalyzer computes each centrality measure with respect to the root set (Box 3). For example, the weighted degree of a node is computed by counting its neighbors that are contained in the root set and that are within a given distance cutoff from the node of interest. First-time users might simply select all nodes by clicking Select → Nodes → Select all nodes in the Cytoscape main window. This action can take a few seconds because both the nodes in the network and the residues in the 3D structure are selected.
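The weighted degree described above can be sketched as a Dijkstra search that stops at the distance cutoff and counts the root-set nodes it reaches. This Python example reflects our reading of the description; whether the node of interest counts itself and other details are assumptions, and Box 3 gives the definition RINalyzer actually uses. With all edge weights at the default of 1, the cutoff simply becomes a number of hops.

```python
import heapq


def weighted_degree(source, adj, root_set, cutoff):
    """Count root-set nodes within shortest-path distance `cutoff` of `source`.

    adj: dict mapping a node to a list of (neighbor, distance) pairs with
    non-negative distances. The source node itself is not counted here.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nb, w in adj.get(node, ()):
            nd = d + w
            if nd <= cutoff and nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return sum(1 for n in dist if n != source and n in root_set)


# Undirected path a - b - c - d with unit distances.
adj = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("c", 1.0)],
}
print(weighted_degree("a", adj, {"a", "b", "c", "d"}, cutoff=2.0))
```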

(xxii) Perform centrality analysis. Start the analysis from the menu option Plugins → RINalyzer → Analyze Network. A dialog that contains different analysis settings will appear (Supplementary Fig. 6). The settings are described in detail in




Open button to confirm it. As RINs are undirected, we do not need to consider all network interpretations. Select the option Consider networks as undirected.

(iii) Perform batch analysis. Click the button Start Analysis (Supplementary Fig. 8a). A dialog appears that displays the progress of the batch analysis (Supplementary Fig. 8b). Depending on the number of networks and their size, this might be a very time-consuming step. The batch analysis can be canceled at any time by clicking the Cancel button in the progress dialog.

(iv) View batch analysis results. After the analysis is complete, the button Show Results is enabled. Click on it to see the dialog Batch Analysis - Results. The dialog contains a table of all topological analyses performed. Every row in the results table lists the loaded network, its interpretation and the resulting network statistics file, which was saved to the output directory (Supplementary Fig. 8c).

(v) Load network statistics in Cytoscape. Clicking on a network name or a statistics file name will load the network or the topological analysis results, respectively, in Cytoscape. Load all four statistics files and compare the simple parameters computed for each network (Supplementary Fig. 9). The two α subunits, networks A and C, are very similar to each other. This is also the case for the two β subunits, networks B and D. However, there are apparent differences between the network parameters for the RINs of the α and β subunits. Close the network statistics dialogs to finish the results inspection.

(vi) Load networks into Cytoscape. Click on the networks pdb4hhb_h_A.sif and pdb4hhb_h_B.sif to load them into Cytoscape for the next steps. You can now close the dialog Batch Analysis - Results.

(vii) Retrieve structure alignment file. RINalyzer can compare two RINs based on a superposition alignment of the corresponding 3D protein structures. Here we compare two of the networks loaded in the previous step, i.e., one α subunit and one β subunit of human deoxyhemoglobin (PDB identifier 4HHB). Start a web browser and navigate to the RCSB PDB Protein Comparison Tool website (http://www.rcsb.org/pdb/workbench/workbench.do). Insert the PDB identifier 4HHB in the text field for ID 1 and choose chain A by selecting 4HHB.A in the drop-down menu. Insert the same identifier in the text field for ID 2 and select 4HHB.B in the drop-down menu. Then, in the drop-down menu Select Comparison Method, choose the jCE algorithm and click the Compare button. On the Structure Alignment View page, scroll down to the panel Download Alignment. Right-click the link Download XML and select the option Save Link As. Navigate to the directory where the file should be saved, enter a name for it (e.g., 4hhba_vs_4hhbb.xml), and click the Save button to confirm it. Close the Protein Comparison Tool.

(viii) Perform RIN comparison. To compare RINs using RINalyzer, go to Plugins → RINalyzer → Compare RINs. Select pdb4hhb_h_A as the first network and pdb4hhb_h_B as the second network, and then enter a name for the resulting comparison network (e.g., comparison). Click the button ... and navigate to the alignment file downloaded in Step 2C(vii). Confirm its selection by clicking the Open button. Next, click the Compare button to perform the actual comparison. A new network with 148 nodes and 2405 edges is created. This combined RIN consists of three types of nodes: nodes that represent aligned residues according to the structure superposition, and two types of nodes that correspond to residues that were not aligned by the superposition and belong to either the first or the second network. The network also contains three different edge line styles: solid lines for interactions present in both networks, dashed lines for interactions present only in the first network and dotted lines for interactions present only in the second network. The type of each node and edge is stored as an attribute named BelongsTo and represented by one of the following three values: net1, net2 or both. The value net1 refers to the first RIN selected in the comparison, and the value net2 to the second RIN.
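How the BelongsTo values could be derived from two RINs and a residue alignment is sketched below in Python. This is an illustrative reconstruction, not RINalyzer's algorithm: the alignment is assumed to be one-to-one, interaction types are ignored, and RINalyzer's actual rules (for example, whether an edge must have the same interaction type in both networks to count as both) may differ. Aligned residue pairs are merged into one combined node keyed (residue1, residue2); unaligned residues keep None in the other position.

```python
def compare_rins(nodes1, edges1, nodes2, edges2, alignment):
    """Label nodes and edges of a combined network as 'net1', 'net2' or 'both'.

    alignment: dict mapping a residue of network 1 to its aligned residue
    in network 2 (assumed one-to-one).
    """
    back = {r2: r1 for r1, r2 in alignment.items()}

    def key1(n):  # combined-node key for a residue of network 1
        return (n, alignment.get(n))

    def key2(n):  # combined-node key for a residue of network 2
        return (back.get(n), n)

    node_type = {}
    for n in nodes1:
        node_type[key1(n)] = "both" if n in alignment else "net1"
    for n in nodes2:
        node_type[key2(n)] = "both" if n in back else "net2"

    # Undirected edges as frozensets of combined-node keys.
    e1 = {frozenset((key1(u), key1(v))) for u, v in edges1}
    e2 = {frozenset((key2(u), key2(v))) for u, v in edges2}
    edge_type = {e: "both" for e in e1 & e2}
    edge_type.update({e: "net1" for e in e1 - e2})
    edge_type.update({e: "net2" for e in e2 - e1})
    return node_type, edge_type


# Toy example: residues a1, a2 are aligned to b1, b2; a3 and b3 are unaligned.
node_type, edge_type = compare_rins(
    ["a1", "a2", "a3"], [("a1", "a2"), ("a2", "a3")],
    ["b1", "b2", "b3"], [("b1", "b2"), ("b1", "b3")],
    {"a1": "b1", "a2": "b2"},
)
print(node_type)
print(sorted(edge_type.values()))
```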

Figure 4 | Comparison network generated by RINalyzer. The combined network resulted from the comparison of the two RINs that represent one of the α and one of the β subunits of human deoxyhemoglobin (PDB identifier 4HHB (ref. 59), chains A and B). Edge colors refer to the interaction type, i.e., interatomic contacts in blue, hydrogen bonds in red and overlaps in gray. Edge line styles correspond to noncovalent residue interactions that are preserved in both subunits (solid lines), present only in the α subunit (dashed lines) or present only in the β subunit (dotted lines).




(ix) Adjust network view. Maximize the network view window and select View → Show Graphics Details from the Cytoscape menu.

(x) Apply network layout. Apply the organic layout (Layout → yFiles → Organic) and the RIN visual properties (Plugins → RINalyzer → Visual Properties). The resulting network should look as in Figure 4.

(xi) Hide interaction edges. First, we reduce the visual complexity by showing fewer edges. Go to the menu option Plugins → RINalyzer → Visual Properties and select the Edges tab. Hide all edges except combi:all_all by unchecking the box next to each of the other edge types, and close the dialog RIN Visual Properties by clicking the Close button.
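Hiding every edge type except combi:all_all amounts to filtering the edge list on its interaction type before drawing or counting. A minimal Python sketch, assuming each comparison-network edge is available as a hypothetical (u, v, interaction type, BelongsTo) tuple; the interaction type name cnt:mc_sc and the node labels in the example are invented for illustration.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

# Hypothetical edge record: (residue u, residue v, interaction type, BelongsTo value)
CompEdge = Tuple[str, str, str, str]


def visible_edge_counts(edges: Iterable[CompEdge],
                        keep: FrozenSet[str] = frozenset({"combi:all_all"})) -> Counter:
    """Count edges per BelongsTo value among edges whose interaction type stays visible."""
    return Counter(belongs for _u, _v, itype, belongs in edges if itype in keep)


edges = [
    ("A:1|B:1", "A:2|B:2", "combi:all_all", "both"),
    ("A:1|B:1", "A:2|B:2", "cnt:mc_sc", "both"),   # hidden: not combi:all_all
    ("A:2|B:2", "A:3", "combi:all_all", "net1"),
    ("A:2|B:2", "B:4", "combi:all_all", "net2"),
]
print(dict(visible_edge_counts(edges)))  # {'both': 1, 'net1': 1, 'net2': 1}
```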

(xii) Color nodes and edges. Now, we color the nodes and edges according to the network they belong to. In the Cytoscape Control Panel, go to the tab VizMapper and double-click the field Edge Color. Select the edge attribute BelongsTo from the drop-down menu for edge color values and the mapping type Discrete Mapping from the mapping type drop-down menu. A list that contains the three BelongsTo attribute values net1, net2 and both will appear. For each attribute value, click the field next to it, click the button ... that appears, select a color and click the OK button to confirm. Repeat the same actions to map the node color using the BelongsTo node attribute.

(xiii) Customize node labels. It is also possible to change the node labels by clicking the field next to the visual property Node Label. Then select the attribute CombinedLabel and the mapping type Passthrough Mapping. The node attribute CombinedLabel contains node labels composed of the labels of the aligned nodes from the compared networks.
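Conceptually, a discrete mapping looks up each attribute value in a user-defined table, whereas a passthrough mapping uses the attribute value itself as the visual property. The sketch below mimics both mapping types in plain Python for illustration only; it does not use the Cytoscape API, and the color codes and the "res1|res2" format of the CombinedLabel values are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical color choices for the three BelongsTo values (step xii).
BELONGS_TO_COLOR = {"net1": "#0000FF", "net2": "#FF0000", "both": "#808080"}


def discrete_mapping(attr: Dict[str, str], table: Dict[str, str],
                     default: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each element's attribute value through a lookup table (discrete mapping)."""
    return {elem: table.get(value, default) for elem, value in attr.items()}


def passthrough_mapping(attr: Dict[str, object]) -> Dict[str, str]:
    """Use each attribute value directly as the visual property (passthrough mapping)."""
    return {elem: str(value) for elem, value in attr.items()}


belongs_to = {"n1": "both", "n2": "net1", "n3": "net2"}
combined_label = {"n1": "A:25|B:25", "n2": "A:3", "n3": "B:4"}  # hypothetical labels
print(discrete_mapping(belongs_to, BELONGS_TO_COLOR))
# {'n1': '#808080', 'n2': '#0000FF', 'n3': '#FF0000'}
print(passthrough_mapping(combined_label))
# {'n1': 'A:25|B:25', 'n2': 'A:3', 'n3': 'B:4'}
```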

(xiv) Explore comparison network. After the mapping is applied to the network view, it should look as in Supplementary Figure 10. Zoom in using the + button in the Cytoscape toolbar to observe the residue interaction differences between the superimposed α and β subunits of deoxyhemoglobin.

? TROUBLESHOOTING
Troubleshooting advice for basic problems that may occur during the procedure is given in Table 1.

Further information about using Cytoscape can be found in the documentation at http://www.cytoscape.org/documentation_users.html and via the helpdesk mailing list http://groups.google.com/group/cytoscape-helpdesk. Tutorials and documentation about UCSF Chimera are available at http://plato.cgl.ucsf.edu/chimera/docindex.html, and questions can be addressed to the users mailing list ([email protected]). RINalyzer and NetworkAnalyzer documentation can be found at http://rinalyzer.de/documentation.php and at http://med.bioinf.mpi-inf.mpg.de/netanalyzer/help/2.7/index.html, respectively.

TABLE 1 | Troubleshooting table.

Step | Problem | Possible reason | Solution
1 | Cytoscape does not start | Java is not installed properly | Make sure that Java version 6 is installed. Java can be downloaded from http://www.java.com/
2A(iv), 2B(xxii) | The analysis takes very long or seems to be frozen | Cytoscape has run out of memory | Increase the memory available to Cytoscape. One way to do this is to start Cytoscape from the command line and use the -Xmx option to set the memory size. To this end, open a command-line window, navigate to the Cytoscape directory and type java -Xms10m -Xmx1500M -jar cytoscape.jar -p plugins to start Cytoscape with 1,500 MB of memory. For alternative ways to increase the memory, see the Cytoscape Wiki at http://cytoscape.wodaklab.org/wiki/How_to_increase_memory_for_Cytoscape
2B(i) | There is no RIN data for a protein | The RINdata database does not contain precomputed RINs for all PDB identifiers | Download and apply the package RINerator (http://rinalyzer.de/rinerator.php) to generate the RIN data. Alternatively, the RING web server62 (http://protein.cribi.unipd.it/ring/) can be used to create different types of RINs

(continued)


TIMING
The time required to execute this protocol depends primarily on the size of the analyzed data sets and on the CPU power. Using the provided data sets, an experienced user can execute the protocol in about 1 h 30 min, whereas an inexperienced user may need more than 2 h. Step 2A takes 15–20 min, Step 2B takes 1 h to 1 h 20 min and Step 2C takes 20–30 min.
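Summing the per-step ranges gives the time needed to run the three workflows back to back. The short Python calculation below performs only this arithmetic on the ranges stated above; the function name is hypothetical.

```python
# Per-step durations in minutes, taken from the TIMING paragraph.
STEP_MINUTES = {"2A": (15, 20), "2B": (60, 80), "2C": (20, 30)}


def total_minutes(steps):
    """Sum the lower and upper bounds of the per-step time ranges."""
    low = sum(lo for lo, _hi in steps.values())
    high = sum(hi for _lo, hi in steps.values())
    return low, high


low, high = total_minutes(STEP_MINUTES)
print(f"{low}-{high} min")  # 95-130 min, i.e., about 1 h 35 min to 2 h 10 min
```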

ANTICIPATED RESULTS
Here we discuss the results obtained by following each of the three workflows described in this protocol.

Step 2A: Topological analysis of biological networks

The application of NetworkAnalyzer to the protein-protein interaction network from Yu et al.47 (Supplementary Data 1) produces a comprehensive set of topological network parameters. The network exhibits scale-free behavior because a power law P(k) ∝ k^−α with α = 1.62 can be fitted to the node degree distribution. Furthermore, such a small α value (below 2) is indicative of a hub-and-spoke network in which one hub is connected to a large fraction of the nodes. Indeed, the network contains one hub protein with an exceptionally high node degree (151 interactions). Visual exploration of the network view after mapping the clustering coefficient to node color suggests that only a few nodes have clustering coefficients larger than 0. This means that the proteins in the network do not tend to form clusters with their interaction partners.
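Both quantities discussed above can be recomputed outside Cytoscape for a quick check. The Python sketch below builds the degree distribution P(k) (fraction of nodes with degree k), fits P(k) = a·k^−α by least-squares regression in log-log space and computes the local clustering coefficient. It is a sketch under assumptions: NetworkAnalyzer's own fitting procedure and its handling of duplicate edges may differ, so exponents obtained this way need not match its reported value exactly, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Undirected adjacency sets; self-loops and duplicate edges are ignored."""
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def fit_power_law(degrees: List[int]) -> Tuple[float, float]:
    """Fit P(k) = a * k**(-alpha) by least squares on log P(k) versus log k.

    Returns (a, alpha). Needs at least two distinct positive degrees.
    """
    counts = Counter(k for k in degrees if k > 0)
    n = len(degrees)
    pts = [(math.log(k), math.log(c / n)) for k, c in counts.items()]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return math.exp(my - slope * mx), -slope


def clustering_coefficient(adj: Dict[Hashable, Set[Hashable]], v: Hashable) -> float:
    """Fraction of pairs of v's neighbors that are themselves linked (0 if degree < 2)."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


a, alpha = fit_power_law([1] * 16 + [2] * 4 + [4])
print(round(alpha, 6))  # 2.0
adj = adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
print([round(clustering_coefficient(adj, v), 3) for v in (1, 2, 3, 4)])
# [1.0, 1.0, 0.333, 0.0]
```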

Step 2B: Interactive visual analysis of residue networks

The RIN generated from the protein structure of the HIV-1 protease (PDB identifier 1HIV) contains 200 nodes and 2,199 edges. The nodes can be divided into three groups according to the protein chain: 99 nodes for residues in chain A, 99 nodes for residues in chain B and two nodes for chain I. Using the RINalyzer Node Sets interface, we identified the residues at the interface between chains A and B of the protein structure. In all, 35 residues from chain A interact with 35 residues from chain B (dark blue and red nodes in Supplementary Fig. 5, respectively).
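Selecting the residues at the interface between chains A and B amounts to keeping every edge whose two endpoints lie in different chains. A minimal Python sketch, assuming residue node IDs that start with the chain identifier; the ID format, the function names and the example residues are hypothetical, and the RINalyzer Node Sets interface may apply additional criteria.

```python
from typing import Iterable, Set, Tuple


def chain_of(node_id: str) -> str:
    # Hypothetical node ID format "<chain>:<residue number>:<residue name>", e.g. "A:25:ASP".
    return node_id.split(":", 1)[0]


def interface_residues(edges: Iterable[Tuple[str, str]],
                       chain1: str, chain2: str) -> Tuple[Set[str], Set[str]]:
    """Residues of chain1 and chain2 that share at least one edge across the two chains."""
    side1: Set[str] = set()
    side2: Set[str] = set()
    for u, v in edges:
        cu, cv = chain_of(u), chain_of(v)
        if (cu, cv) == (chain1, chain2):
            side1.add(u)
            side2.add(v)
        elif (cu, cv) == (chain2, chain1):
            side1.add(v)
            side2.add(u)
    return side1, side2


edges = [("A:25:ASP", "B:25:ASP"), ("A:26:THR", "A:25:ASP"),
         ("B:27:GLY", "A:50:ILE"), ("I:1:XXX", "A:25:ASP")]
a_side, b_side = interface_residues(edges, "A", "B")
print(sorted(a_side), sorted(b_side))
# ['A:25:ASP', 'A:50:ILE'] ['B:25:ASP', 'B:27:GLY']
```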

Furthermore, we performed a centrality analysis of the RIN of the HIV-1 protease to highlight central nodes. The best-scoring nodes according to weighted shortest path closeness (i.e., centrality values > 0.21) were saved in a node set. The overlap between this node set (seven nodes) and the node set representing the protease active site (six nodes) comprises four nodes. When studying the individual centrality values in a table sorted from highest to lowest closeness, we observed that these four active-site residues have the best ranks.
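The node-set overlap described above can be reproduced with a closeness score and a set intersection. The Python sketch below uses one common definition of weighted closeness (number of reachable nodes divided by the sum of shortest-path distances to them, with edge weights read as distances) computed with Dijkstra's algorithm. This definition, the function names, the toy graph and the threshold of 0.5 are assumptions, so its scores are not directly comparable with RINalyzer's values or its 0.21 cutoff.

```python
import heapq
import itertools
import math
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable, float]]) -> Graph:
    """Undirected weighted graph; weights are treated as distances (path lengths)."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


def shortest_distances(g: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Dijkstra's algorithm; a counter breaks ties so nodes are never compared."""
    tie = itertools.count()
    dist = {source: 0.0}
    heap = [(0.0, next(tie), source)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in g[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, next(tie), v))
    return dist


def closeness(g: Graph, v: Hashable) -> float:
    """(number of reachable nodes) / (sum of shortest-path distances to them)."""
    dist = shortest_distances(g, v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


g = build_graph([("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 2.0)])
scores = {v: closeness(g, v) for v in g}
central: Set[Hashable] = {v for v, s in scores.items() if s > 0.5}
active_site = {"b", "d"}  # hypothetical active-site node set
print(sorted(central), sorted(central & active_site))  # ['b', 'c'] ['b']
```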

TABLE 1 | Troubleshooting table (continued).

Step | Problem | Possible reason | Solution
2B(iv) | UCSF Chimera does not start | The path to UCSF Chimera is not configured properly | Open the dialog Cytoscape Preferences Editor (Edit → Preferences → Properties). Click the Add button and enter the name of the property: Chimera.chimeraPath. Click OK and enter the path to the UCSF Chimera application. On a Linux machine, this could be $HOME/chimera/bin; on a Windows machine, C:\Program Files\Chimera\bin; and on a Macintosh, /Applications/Chimera.app/Contents/MacOS. Save the new preferences by clicking the option Make Current Cytoscape Properties Default at the bottom of the dialog
2B(vii) | RINLayout is not applied to the network | The 3D structure corresponding to the current network is not loaded in UCSF Chimera | Load the protein structure corresponding to the current network using the menu option Plugins → RINalyzer → Protein Structure → Open structure from file
2B(vii) | RINLayout is not applied to the network | More than one protein structure is loaded in UCSF Chimera | Close all protein structures opened in UCSF Chimera except for the structure that corresponds to the current network, using the menu option Plugins → RINalyzer → Protein Structure → Close




Step 2C: Comparison of residue networks

The RINs (Supplementary Data 2) generated from the four subunits of human deoxyhemoglobin (chains A, B, C and D in the PDB structure with identifier 4HHB) are of similar size: 141 nodes and 1,885 edges for chain A; 146 nodes and 1,935 edges for chain B; 141 nodes and 1,887 edges for chain C; and 146 nodes and 1,971 edges for chain D. As one might expect, the analysis performed with NetworkAnalyzer (Supplementary Fig. 9) indicates that the RINs of chains A and C, the two α subunits, have almost identical simple network parameters, such as clustering coefficient, network centralization, number of shortest paths, characteristic path length and network density. The same holds for the RINs of chains B and D, the two β subunits. However, for most parameters, the difference between the values for chains A and B is larger than the difference between two subunits of the same type. The complete set of simple and complex network parameters can be compared further using the network statistics files generated by NetworkAnalyzer.
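Two of the simple parameters named above can be computed directly from an edge list. A minimal Python sketch, assuming the standard definitions: density 2m/(n(n−1)) for n nodes and m distinct neighbor pairs, and characteristic path length as the mean shortest-path length over connected node pairs. We have not verified that NetworkAnalyzer treats parallel RIN edges exactly this way; because a RIN can hold several edges (interaction types) between the same residue pair, the raw edge counts quoted above are not necessarily the m of this formula. The function name is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Set, Tuple


def simple_parameters(edges: Iterable[Tuple[Hashable, Hashable]]) -> Tuple[float, float]:
    """Return (network density, characteristic path length) of an undirected graph.

    Parallel edges (e.g., several interaction types between the same two residues)
    and self-loops are collapsed, so only distinct neighbor pairs are counted.
    """
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Characteristic path length: mean BFS distance over all connected ordered pairs.
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return density, (total / pairs if pairs else 0.0)


print(simple_parameters([("a", "b"), ("a", "b"), ("b", "c")]))
# (0.6666666666666666, 1.3333333333333333)
```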

To compare the individual residue interactions in the RINs of chains A and B, we used RINalyzer, which generates a combined comparison network based on the superposition alignment of the corresponding 3D structures. The comparison network contains 148 nodes and 2,405 edges. Of the 148 nodes, 2 represent unaligned residues of chain A and 7 represent unaligned residues of chain B; the remaining 139 nodes correspond to aligned residues. The number of edges that correspond to noncovalent interactions identical in both subunits (1,415 edges) is considerably higher than the number of nonidentical edges (470 and 520 for chains A and B, respectively). These numbers reflect the structural similarities and differences of the two subunits. When visually exploring the simplified comparison network (Supplementary Fig. 10), we can recognize the large number of edges that represent noncovalent interactions identical in both subunits (523 solid edge lines) and the rather small number of interactions present only in the α subunit (68 dashed edge lines) or only in the β subunit (89 dotted edge lines) of deoxyhemoglobin. The nonidentical edges are found mainly in the part of the network that contains nodes of unaligned residues. Dashed or dotted edges between aligned residue nodes indicate that the corresponding residues form functionally distinct interactions in the two homologous, structurally very similar subunits.
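The counts reported for the comparison network are internally consistent: each node and edge of the combined network is labeled net1, net2 or both, so the sizes of the two input RINs can be recovered from the three categories. The short Python calculation below checks this using only numbers stated in this section; the function name is hypothetical.

```python
def reconcile(aligned, unaligned1, unaligned2, shared, only1, only2):
    """Return (nodes, edges) for RIN 1, RIN 2 and the comparison network."""
    rin1 = (aligned + unaligned1, shared + only1)
    rin2 = (aligned + unaligned2, shared + only2)
    combined = (aligned + unaligned1 + unaligned2, shared + only1 + only2)
    return rin1, rin2, combined


# Chain A (alpha) versus chain B (beta) of 4HHB, numbers from the text above.
print(reconcile(139, 2, 7, 1415, 470, 520))
# ((141, 1885), (146, 1935), (148, 2405))
```

The first two tuples match the chain A (141 nodes, 1,885 edges) and chain B (146 nodes, 1,935 edges) RIN sizes given earlier in this section.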

Note: Supplementary information is available via the HTML version of this article.

ACKNOWLEDGMENTS We thank D. Buezas, T. Kacprowski and C. Weichenberger for their useful comments on the workflows and the manuscript. This study was partially funded by the BMBF through the German National Genome Research Network (NGFN) and the Greifswald Approach to Individualized Medicine (GANI_MED). It was also conducted in the context of the DFG-funded Cluster of Excellence for Multimodal Computing and Interaction.

AUTHOR CONTRIBUTIONS N.T.D. conceived and drafted the workflows. Y.A. contributed to the workflows. N.T.D., Y.A., F.S.D. and M.A. wrote and approved the manuscript.

COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.

Published online at http://www.natureprotocols.com/.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Barabasi, A.L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).

2. Beyer, A., Bandyopadhyay, S. & Ideker, T. Integrating physical and genetic maps: from genomes to interaction networks. Nat. Rev. Genet. 8, 699–710 (2007).

3. Frishman, D. et al. Protein-protein interactions: analysis and prediction. in Modern Genome Annotation: the Biosapiens Network 353–410 (Springer-Verlag, 2008).

4. Przytycka, T.M., Singh, M. & Slonim, D.K. Toward the dynamic interactome: it's about time. Brief. Bioinform. 11, 15–29 (2010).

5. Yamada, T. & Bork, P. Evolution of biomolecular networks: lessons from metabolic and protein interactions. Nat. Rev. Mol. Cell Biol. 10, 791–803 (2009).

6. Zhang, S., Jin, G., Zhang, X.S. & Chen, L. Discovering functions and revealing mechanisms at molecular level from biological networks. Proteomics 7, 2856–2869 (2007).

7. Bode, C. et al. Network analysis of protein dynamics. FEBS Lett. 581, 2776–2782 (2007).

8. Csermely, P. Creative elements: network-based predictions of active centres in proteins and cellular and social networks. Trends Biochem. Sci. 33, 569–576 (2008).

9. Krishnan, A., Zbilut, J.P., Tomita, M. & Giuliani, A. Proteins as networks: usefulness of graph theory in protein science. Curr. Protein Pept. Sci. 9, 28–38 (2008).

10. Vishveshwara, S., Ghosh, A. & Hansia, P. Intra and inter-molecular communications through protein structure network. Curr. Protein Pept. Sci. 10, 146–160 (2009).

11. Welsch, C. et al. Molecular basis of telaprevir resistance due to V36 and T54 mutations in the NS3-4A protease of the hepatitis C virus. Genome Biol. 9, R16 (2008).

12. Albert, R. Scale-free networks in cell biology. J. Cell Sci. 118, 4947–4957 (2005).

13. Almaas, E. Biological impacts and context of network theory. J. Exp. Biol. 210, 1548–1558 (2007).

14. Brohee, S., Faust, K., Lima-Mendez, G., Vanderstocken, G. & van Helden, J. Network analysis tools: from biological networks to clusters and pathways. Nat. Protoc. 3, 1616–1629 (2008).

15. Junker, B.H. & Schreiber, F. Analysis of Biological Networks (John Wiley & Sons, 2008).

16. Pavlopoulos, G.A. et al. Using graph theory to analyze biological networks. BioData Min. 4, 10 (2011).

17. Przulj, N. Protein-protein interactions: making sense of networks via graph-theoretic modeling. BioEssays 33, 115–123 (2011).

18. Zhu, X., Gerstein, M. & Snyder, M. Getting connected: analysis and principles of biological networks. Genes Dev. 21, 1010–1024 (2007).

19. Chuang, H.Y., Hofree, M. & Ideker, T. A decade of systems biology. Annu. Rev. Cell Dev. Biol. 26, 721–744 (2011).

20. Pavlopoulos, G.A., Wegener, A.-L. & Schneider, R. A survey of visualization tools for biological network analysis. BioData Min. 1, 12 (2008).

21. Suderman, M. & Hallett, M. Tools for visually exploring biological networks. Bioinformatics 23, 2651–2659 (2007).

22. O'Madadhain, J., Fisher, D., White, S. & Boey, Y.B. The JUNG (Java Universal Network/Graph) Framework. Techn. Rep. UCI-ICS 03-17 (2003).

23. Mehlhorn, K. & Näher, S. LEDA: A Platform for Combinatorial and Geometric Computing (Cambridge University Press, 1999).

24. Hagberg, A.A., Schult, D.A. & Swart, P.J. Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference 11–15 (2008).

25. Csárdi, G. & Nepusz, T. The igraph software package for complex network research. InterJ. Complex Syst. 1695 (2006).

26. Handcock, M.S., Hunter, D.R., Butts, C.T., Goodreau, S.M. & Morris, M. statnet: software tools for the representation, visualization, analysis and simulation of network data. J. Stat. Softw. 24, 1548–7660 (2008).



27. Butts, C.T. Social network analysis with sna. J. Stat. Softw. 24 (2008).

28. Opsahl, T., Agneessens, F. & Skvoretz, J. Node centrality in weighted networks: generalizing degree and shortest paths. Soc. Networks 32, 245–251 (2010).

29. Mueller, L.A.J., Kugler, K.G., Dander, A., Graber, A. & Dehmer, M. QuACN: an R package for analyzing complex biological networks quantitatively. Bioinformatics 27, 140–141 (2011).

30. Batagelj, V. & Mrvar, A. Pajek – program for large network analysis. Connections 21, 47–57 (1998).

31. Hu, Z. et al. VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res. W352–W357 (2005).

32. Köhler, J. et al. Graph-based analysis and visualization of experimental results with ONDEX. Bioinformatics 22, 1383–1390 (2006).

33. Garcia-Garcia, J., Guney, E., Aragues, R., Planas-Iglesias, J. & Oliva, B. Biana: a software framework for compiling biological interactions and analyzing networks. BMC Bioinformatics 11, 56 (2010).

34. Cline, M.S. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).

35. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

36. Wittkop, T. et al. Comprehensive cluster analysis with Transitivity Clustering. Nat. Protoc. 6, 285–295 (2011).

37. Srivas, R. et al. Assembling global maps of cellular function through integrative analysis of physical and genetic networks. Nat. Protoc. 6, 1308–1323 (2011).

38. Assenov, Y., Ramirez, F., Schelhorn, S.-E., Lengauer, T. & Albrecht, M. Computing topological parameters of biological networks. Bioinformatics 24, 282–284 (2008).

39. Doncheva, N.T., Klein, K., Domingues, F.S. & Albrecht, M. Analyzing and visualizing residue networks of protein structures. Trends Biochem. Sci. 36, 179–184 (2011).

40. Barabasi, A.L. & Oltvai, Z.N. Network biology: understanding the cell's functional organization. Nat. Rev. Genet. 5, 101–113 (2004).

41. Astsaturov, I. et al. Synthetic lethal screen of an EGFR-centered network to improve targeted therapies. Sci. Signal. 3, ra67 (2010).

42. Ragusa, M. et al. Expression profile and specific network features of the apoptotic machinery explain relapse of acute myeloid leukemia after chemotherapy. BMC Cancer 10, 377 (2010).

43. Lorenz, W. et al. Microarray analysis and scale-free gene networks identify candidate regulators in drought-stressed roots of loblolly pine (P. taeda L.). BMC Genomics 12, 264 (2011).

44. Radrich, K. et al. Integration of metabolic databases for the reconstruction of genome-scale metabolic networks. BMC Syst. Biol. 4, 114 (2010).

45. Choura, M. & Rebaï, A. Application of computational approaches to study signalling networks of nuclear and Tyrosine kinase receptors. Biol. Direct 5, 58 (2010).

46. Gu, H., Zhu, P., Jiao, Y., Meng, Y. & Chen, M. PRIN: a predicted rice interactome network. BMC Bioinformatics 12, 161 (2011).

47. Yu, H. et al. Next-generation sequencing to generate interactome datasets. Nat. Methods 8, 478–480 (2011).

48. Pettersen, E.F. et al. UCSF Chimera: a visualization system for exploratory research and analy
