Yaniv Frishman - Technion · Graph Drawing Algorithms in Information Visualization Research Thesis In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Graph Drawing Algorithms in Information Visualization

Yaniv Frishman

Technion - Computer Science Department - Ph.D. Thesis PHD-2009-02 - 2009


Graph Drawing Algorithms in Information Visualization

Research Thesis

In Partial Fulfillment of the Requirements for the

Degree of Doctor of Philosophy

Yaniv Frishman

Submitted to the Senate of the Technion - Israel Institute of Technology

Tevet, 5769 Haifa January, 2009


This Research Thesis Was Done Under The Supervision of

Prof. Ayellet Tal

in the Department of Computer Science.

The Generous Financial Help of the Technion is Gratefully Acknowledged.

Acknowledgements

Obtaining a Ph.D. is a great privilege. There are many people I would like to thank for helping me with this achievement.

I would like to express my gratitude to my advisor Prof. Ayellet Tal for supporting me during the different stages of this long journey. I would especially like to thank her for providing feedback and suggestions for improving my work. Thanks to her guidance, my presentation and writing skills have improved markedly, a valuable skill of its own.

I would like to thank my loving wife Maya for her support, encouragement and understanding, especially when I was occupied with my studies and consequently not there for her. It was very rewarding to share many joyful moments with her along the way.

This achievement would not have been possible without the help, support and encouragement of my parents Miriam and Dov. I would also like to thank them for all they have done for me. Special thanks go to my mother for taking an active part in producing my papers and accompanying videos. I would like to thank my brothers Etai and Ofri for their support. I would also like to mention my grandmother Bela who always helps me and encourages me. I would also like to dedicate this dissertation to the memory of my grandfather Solomon for his belief in the value of a higher education.

I would like to thank my parents-in-law Kitty and Arie and Maya’s grandmother for supporting me and providing an environment where I could concentrate on my studies.

Special thanks go to my friends Sivan Bercovici, Dr. Avi Steiner and Dr. Amit Mizrachi for their help and support along the way.


Contents

Abstract

List of Symbols and Abbreviations

1 Introduction
1.1 Information Visualization
1.2 Graph Drawing
1.3 Graphics Processing Units
1.4 Outline and Main Contributions

2 Related Work
2.1 Graph Drawing
2.2 General Purpose Computation on Graphics Processing Units (GPGPU)

3 Multi-Level Graph Layout on the GPU
3.1 Introduction
3.2 Related Work
3.3 Spectral Graph Partitioning
3.4 Multi-level layout Algorithm
3.5 GPU Implementation
3.6 Results
3.7 Visualization of ISP Router Networks
3.8 Conclusion and Future Work

4 Uncluttering Graph Layouts Using Anisotropic Diffusion and Mass Transport



4.1 Introduction
4.2 Related Work
4.3 The Algorithm
4.4 Computing an Optimal Mapping
4.5 Implementation on the GPU
4.6 Results

5 Online Dynamic Graph Drawing
5.1 Introduction
5.2 Related Work
5.3 Overview
5.4 Algorithm
5.4.1 Computing Dynamic Layouts
5.4.2 Computing the Initial Layout L0
5.5 Implementation
5.6 Results
5.7 Application to Discussion Thread Visualization
5.8 Application to Social Network Visualization

6 Dynamic Drawing of Clustered Graphs
6.1 Introduction
6.2 Problem Statement
6.3 The Algorithm
6.3.1 Overview
6.3.2 Supporting Clusters
6.3.3 Minimizing Visual Changes
6.3.4 Merging Graphs
6.3.5 Improving the Layout
6.3.6 Display and Animation
6.4 Visualizing Mobile Object Software




7 MOVIS: A system for Visualizing Distributed Mobile Object Environments
7.1 Introduction
7.2 Related Work
7.3 Requirements
7.4 Physical and Logical Visualization
7.5 Visualization Consistency
7.6 Visualization Scalability
7.6.1 Levels of Detail
7.6.2 Clustering
7.7 Implementation
7.7.1 Event Generation
7.7.2 Event Synchronization Component
7.8 Results

8 Conclusions
8.1 Contribution and Summary
8.2 Future Research

Bibliography




List of Figures

1.1 A straight-edge layout of an undirected, labeled graph.

1.2 A comparison of the peak floating-point calculation rate in giga floating point operations per second (GFLOPS) of Intel CPUs and ATI and NVIDIA GPUs. Image is reproduced from [164].

3.1 ISP router map. Each node represents a router. Edges link routers. Red nodes are external to the ISPs visualized. Other nodes are colored according to the ISP they belong to: green - Abovenet (US, 664 routers); blue - Exodus (US, 551 routers); black - Tiscali (Europe, 513 routers). A total of 5044 routers and 8043 connections are shown.

3.2 The power iteration algorithm

3.3 Algorithm overview

3.4 Representing a graph on the GPU. Left: A graph spatially partitioned into partitions; right: a corresponding location texture

3.5 Representing graph edges on the GPU. Node X has three neighbors: Y, Z and W.

3.6 Execution graph of GPU layout (rectangles = streams, ovals = kernels)

3.7 bcsstk31. Red: our layout, black: FM³ layout

3.8 Sierpinski_08. Red: our layout, black: FM³ layout

3.9 finan512. Red: our layout, black: FM³ layout

3.10 flower_B. Red: our layout, black: FM³ layout

3.11 4elt. Red: our layout, black: Kamada-Kawai layout



3.12 ISP router map. Each node represents a router. Edges link routers. Red nodes are external to the ISPs visualized. Other nodes are colored according to the ISP they belong to: blue - Abovenet (US, 665 routers); black - Exodus (US, 554 routers); yellow - Ebone (Europe, 314 routers); pink - Tiscali (Europe, 514 routers); brown - Telstra (Australia, 3756 routers). A total of 10895 routers and 15667 connections are shown. Top left - GRIP layout. Bottom right - our layout.

4.1 Protein graph (V=30727, E=1206654). (a) FM³ [91] layout. (b) Improved layout. Note how displacing nodes outwards allows more details to become visible, especially in the center of the drawing. Also note that the overall structure of the graph is maintained.

4.2 Comparison between node overlap removal and graph uncluttering. (a) is a layout produced using neato [79] of a reduced version of the bcsstk32 graph from [204]. In (b) the node overlap removal algorithm from [77] is used. Note that although the overlaps between nodes are eliminated, the structure of the graph is not maintained and the center of the layout is cluttered. In (c) our algorithm is used. Note how the cluttered right side of the input layout is expanded, thus increasing node separation, while the structure of the graph is maintained.

4.3 Algorithm steps. Higher intensity represents higher values. Values are scaled to improve contrast.

4.4 Execution graph of finding the best advancement direction on the GPU in Step 3 (rectangles = textures, ovals = kernels, θ is the current direction being tested)

4.5 ug_380 graph (V=1104, E=3231). Note how when using our algorithm the center expands, reducing node density while the outer ring is unchanged. When using [49] the layout is hardly changed.

4.6 Add32 graph (V=4960, E=9462). Note how in (c) each of the rings is expanded, showing more detail.

4.7 ISP router graph (V=5044, E=8043). Nodes are color-coded by the ISP they belong to. Note how in (c) the blue nodes are uncluttered.



4.8 Bcsstk32 graph (V=44609, E=985046). Note how in (c) reducing the node density allows more of the mesh structure of the graph to be uncovered in the top left, bottom and middle of the graph.

5.1 Snapshots from the threads1 graph sequence, visualizing discussion threads at http://www.dailytech.com, left to right. Node labels in red show user names, edges link users replying to posted comments. Up to 119 users are shown. Discussion topics, marked as blue A_n nodes, include GPUs (A_4864, A_4285), chipsets (A_4637, A_4425, A_4538 and A_4866) and CPUs (A_4589). A total of 144 messages are visualized.

5.2 Dynamic layout steps: (a) Previous layout, L_{i−1}. (b) Merged graph (Step 1), color coded according to the positioning score Γ(v). Brighter nodes have a higher Γ. Here, nodes with Γ ∈ {0.1, 0.25, 1} are shown. (c) Pinning weights w_pin(v) (Step 2). Brighter color corresponds to a higher w_pin(v). (d) Final layout (Step 5), color coded according to the partitioning (Step 4).

5.3 Parallel force directed layout algorithm

5.4 Partition size effect on layout, graph bcsstk31, |V| = 35588, |E| = 572916

5.5 Sorting nodes by pinning weight w_pin on the GPU. (a): A location texture separated to regions, color coded by the partition each node belongs to. (b): Nodes in each region are sorted from low w_pin to high w_pin.

5.6 Snapshots from layouts of the 3elt sequence (|V| ≈ 4000, |E| ≈ 10,500), left-to-right, top-to-bottom

5.7 Snapshots from the layouts of the newcomb fraternity data [152]. Left: our algorithm. Right: SoNIA algorithm [11, 12], used in [149].

5.8 Snapshots from the threads2 graph sequence, visualizing discussion threads at http://www.dailytech.com, left to right, top to bottom. 109 messages from 86 users in 5 discussion threads are shown. Discussion topics, marked as blue A_n nodes, include computer games (A_5054), nuclear fusion (A_5027), low-cost PCs (A_5060), Windows/Linux switch (A_5069) and Christmas e-shopping (A_5082)



5.9 Snapshots from the Rimzu graph sequence, visualizing the social network at http://www.rimzu.com, left to right, top to bottom. Nodes represent users and edges represent connections between users. In the visualization the graph grows from V=216, E=544 to V=962, E=1561. Nodes are colored by age in a red → yellow → green scale.

6.1 Snapshots from an animation sequence

6.2 Incremental vs. non-incremental layout (from left to right)

6.3 Algorithm overview in pseudo-code

6.4 3D view of a clustered graph

6.5 2D view of a clustered graph

6.6 Comparing the three layout algorithms

6.7 Sample animation sequence (from left to right and top to bottom)

6.8 Density metric

6.9 Sum of cluster displacements

6.10 Number of clusters with the same size

6.11 Running times

7.1 MOVIS user interface. Small rectangles represent mobile objects. Color stripes show their movement history. Big rectangles represent the cores the objects reside in. Dashed lines represent physical communication between cores. Higher communication frequency is indicated by a higher frequency of alternation in the lines. Solid lines represent logical connections between objects. The square in the middle of the figure represents several cores which have been collapsed. The rectangle with a double boundary was selected by the user as the current focus of attention core.

7.2 Levels of detail. Several visualizations of the same mobile object network are shown. Parts of the graph are progressively collapsed. Note the stability in the layouts and the conservation of the overall structure of the graph.

7.3 Focus-based clustering algorithm

7.4 Event synchronization algorithm



7.5 Sample animation sequence of the mobile object simulator (from left to right and top to bottom)

7.6 Mailbox mobility in the DEM system. (a) Before movement. (b) A new core was created. A mailbox migrated to it.

7.7 Sending an e-mail in the DEM system




List of Tables

3.1 Graph information and running time [sec.]. Runtime columns show total running times for computing a layout.

4.1 Graph information and running times. The left side of the table gives information about the graphs. V and E are the number of graph nodes and edges, respectively. The central part of the table gives the running times in seconds of the algorithm from [49], using the same machine used to run our algorithm. The right side of the table shows the results of our algorithm. The width and height in pixels of the density image used is equal to √P. ITRS is the number of iterations of Equation 4.7 in Step 5. CPU is the total running time of the algorithm in seconds when using only the CPU. CPU+GPU is the total running time of the algorithm in seconds when using the GPU to accelerate Step 3.

5.1 Layout quality - values are averages for a sequence of layouts

5.2 Graph sequence information.

5.3 Running times [sec.]. The running times of the CPU only and GPU-accelerated implementation of the algorithm are shown. All times shown are total running times for computing a layout. Dynamic layout times are averaged over a sequence of layouts.

6.1 Average results of an animation sequence

6.1 Average results of an animation sequence . . . . . . . . . . . . . . . . . . 121




Abstract

Information Visualization is the use of computer-supported, interactive, visual representations of data, and in particular, abstract data, to amplify cognition. Graph drawing addresses the problem of creating geometric representations of graphs.

This thesis addresses several related problems in graph drawing. First, we address the problem of quickly creating a layout of a large, general graph. We devise a multi-level algorithm which is based on spectral partitioning. The algorithm is able to produce aesthetic layouts in a fraction of the time of existing algorithms. Next, we discuss an algorithm for improving an existing layout. This is done by warping the coordinates of the nodes, thus making use of empty and sparse regions in the image of the layout. We then turn our attention to dynamic graph drawing. The challenge here is to compute a stable and aesthetic layout while still making it easy for the user to comprehend the changes. The first dynamic algorithm discussed is a multi-level online incremental algorithm for drawing general graphs. Assigning different movement flexibilities to the nodes of the graph allows a stable layout to be created efficiently. The second dynamic algorithm discussed is an online dynamic algorithm for drawing graphs which contain an inherent grouping into clusters. The algorithm uses node pinning, invisible spacer nodes and edge lengths and weights in order to minimize changes to the clustered structure of the graph.

In recent years, the programmability and computational power of commodity graphics processing units (GPUs) have increased tremendously. GPUs, traditionally used for graphics-related applications, are now being employed in many data-parallel problems. Unlike matrices and images, graphs are unstructured and hence graph layout does not seem to be suitable for acceleration on the GPU. In this research we present methods to accelerate both static and dynamic graph drawing using a GPU. In addition, using GPU-accelerated ray-casting, we are able to accelerate our graph improvement algorithm significantly. In all cases, using a GPU allows performing the required computation in a matter of seconds, even for large graphs. Accelerating static layout, dynamic layout and graph improvement problems by factors of 5.5, 17 and 135, respectively, is demonstrated.

The algorithms developed during this research have been used in different information visualization applications. We visualize the structure of the networks of Internet service providers using the static layout algorithm. Our graph improvement algorithm has been applied to improving graphs computed by various state-of-the-art algorithms on several applications, including bioinformatics, social interactions and finite element meshes. We study social networks and the evolution of discussion threads in Internet sites using our online dynamic graph drawing algorithm. Finally, we employ our dynamic clustered graph layout algorithm in order to visualize mobile object frameworks. In these frameworks objects migrate between hosts while the application is running. An innovative, graph-based, scalable, focus + context visualization is used to depict both the physical network of machines and the logical network of ties between mobile objects.


List of Symbols and Abbreviations

c core

C a set of clusters which form a partition of the vertex set V

Ci the i-th cluster

C i coarser graph of level i in the graph hierarchy

χ divergence free vector field

dG(u, v) the length of a shortest path between vertices u and v

Di set of nodes with a distance to modification equal to i

D(u, v) the distance between nodes u and v

Ddistorted(u, v) distorted distance between nodes u and v

Dfocal(u) the shortest distance between node u and the closest focal node

Dfocalavg (u, v) joint average distance of nodes u and v and a focal node

Dinitial initial density image of a graph layout

Dsmooth smoothed density image of a graph layout

Dtarget target density image

Du Jacobian matrix of the 2D function u

|Du| determinant of the Jacobian matrix of the 2D function u

E edges



e, e′, e′′ events

Ec set of cluster-cluster edges

Ev set of vertex-vertex edges

F force acting on a graph node

fracdone fraction of the iterations done

G=(V,E) graph

Gi the i-th graph in a series of graphs

Γ(v) positioning score of node v

K optimal geometric node distance

Li the i-th layout in a series of graph layouts

λ temperature decay constant

Lfinal final graph layout

Li layout of graph i

Linitial initial graph layout

lmax length of ray until image boundary is met

µ density image

N nodes

∇⊥ gradient rotated by 90 degrees

Ω0, Ω1 subdomains of R2

Pi the i-th partition of a graph

PN(v) the set of neighbors of the node v

pi, pi(x, y) position of node i



pos(v) position of node v

≺ precedes

r algorithm run

R the set of real numbers

t initial graph temperature

θ candidate advancement direction

θbest preferred advancement direction

U graph potential energy

u = (u1(x, y), u2(x, y)) image warp

V vertices

wij weight of edge between node i and node j in a graph

wpin(v) pinning weight of node v

W graph edge weights matrix

API application programming interface

CORBA common object request broker architecture

CPU central processing unit

DAG directed acyclic graph

FR Fruchterman-Reingold layout algorithm

GPGPU general purpose computation on graphics processing units

GPU graphics processing unit

GUI graphical user interface

ID identifier



IP internet protocol

ISP internet service provider

JDI java debug interface

KK Kamada-Kawai layout algorithm

MP mass-preservation

PDE partial differential equation

PVM parallel virtual machine

P2P peer to peer

RMI remote method invocation

SIMD single instruction multiple data

SPMD single program multiple data

UML unified modeling language

VLSI very large scale integration


Chapter 1

Introduction

This thesis addresses graph drawing in information visualization. In the implementation of some of the algorithms, graphics processing units (GPUs) are utilized in order to significantly reduce the running time. This chapter presents the background to the thesis and discusses the main contributions.

1.1 Information Visualization

Information Visualization is defined as the use of computer-supported, interactive, visual representations of data, and in particular, abstract data, to amplify cognition [29]. Using graphical representations of data makes use of the human visual system, which is able to rapidly process large amounts of data and has good pattern recognition abilities. Information visualization deals with abstract data, which has no inherent mapping to space. One of the challenges in information visualization is finding a way to map the data to an image in a way that makes it understandable. This is in contrast to scientific visualization, which deals with physically-based data that is inherently defined in a coordinate system. Using interactive and dynamic visual representations of data allows the user to modify the visualization. This allows the data to be analyzed by exploration. Users can develop an understanding of the structure and the connections inherent in the data by observing the effects of the interaction on the data. A few examples of visualization techniques include selective hiding of data, layering data, using 3D, scaling and warping techniques in order to use more screen space for important parts of the data (e.g. fisheye views) and using color and shading to convey information.
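As an illustration of the warping techniques mentioned above, the following is a minimal sketch of a radial fisheye distortion. It assumes the commonly used graphical fisheye function g(x) = (d + 1)x / (dx + 1) for a normalized distance x from the focus; the function names, the `radius` parameter and the distortion factor `d` are hypothetical choices made for this example and are not taken from the thesis.

```python
def fisheye(x, d):
    """Distort a normalized distance x in [0, 1] from the focus.

    d >= 0 is the (assumed) distortion factor; d = 0 leaves x unchanged,
    larger values push points near the focus further outwards.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be normalized to [0, 1]")
    return (d + 1.0) * x / (d * x + 1.0)


def fisheye_point(p, focus, radius, d):
    """Warp the 2D point p radially around focus.

    Points at the focus or at distance >= radius are returned unchanged,
    so the distortion stays inside a disc around the focus.
    """
    dx, dy = p[0] - focus[0], p[1] - focus[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist == 0.0 or dist >= radius:
        return p
    # New distance from the focus, divided by the old one.
    scale = fisheye(dist / radius, d) * radius / dist
    return (focus[0] + dx * scale, focus[1] + dy * scale)


if __name__ == "__main__":
    # A point at a quarter of the radius is moved to half of the radius.
    print(fisheye_point((1.0, 0.0), (0.0, 0.0), 4.0, 2.0))
```

Because g(0) = 0 and g(1) = 1, the focus and the boundary of the disc stay fixed while the region near the focus receives more screen space.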

In many cases the information is dynamic in nature. In these cases, it is important to maintain coherence in the visualization, thereby helping the user conserve his/her mental image of the evolving information. Simply piecing together a series of static snapshots is not sufficient in order to create a dynamic visualization. The challenge is to create a coherent sequence of images that tells a story. The user looking at a dynamic visualization should be able to note changes being unfolded while maintaining an overall understanding of the data.

Dynamic information visualization provides many interesting research challenges. These range from innovative ways of collecting data, through techniques for processing data to provide meaningful insights, to cognition-amplifying methods of displaying information. As the digital revolution continues, massive amounts of information are becoming available. The dynamic visualization challenge is to harness the ample sources of information available today in a way that enhances understanding and aids the human mind in getting insight into the evolving phenomena being studied.

In a world rich in communication, processing and display technologies, there is ample opportunity for innovative visualization techniques. Some applications of information visualization include graph and network visualization, security and network intrusion visualization, financial analysis, software visualization, text and document visualization and social network visualization.

1.2 Graph Drawing

Graphs are abstract mathematical objects that are designed for describing relations between objects. Graph drawing addresses the problem of finding the best way to draw a picture of a graph. One of the common methods used to draw a graph is the node-link diagram. In this visualization, nodes are drawn using dots, circles or other geometrical forms and edges are drawn using straight or curved lines. Arrows are used to show the orientation of directed edges. The information corresponding to the nodes and edges can be visualized using text labels at various positions in or next to a graph object, different colors (as on a subway map), or other visual elements such as thickness of lines, size of boxes, etc. A graph may be drawn in the plane or in three dimensions. It may be drawn completely, partially, or hierarchically, i.e., clusters are shrunk to a single node which can be expanded on request. Figure 1.1 shows an example of a drawing of an undirected graph which contains textual node names.

Figure 1.1: A straight-edge layout of an undirected, labeled graph.

Very different graph drawings or graph layouts can correspond to the same graph. In the abstract graph, all that matters is which vertices are connected to which others by how many edges. In the visual representation of the graph, however, the arrangement of these vertices and edges impacts understandability, usability, fabrication cost, and aesthetics. Therefore, graph drawing is a central problem in information visualization.

The criteria used to judge the quality of a graph layout depend on the application it is used for. In applications where the main goal is to produce layouts for human consumption, these criteria are appropriately called aesthetics criteria. However, in applications where graphs are drawn for other purposes, as in VLSI schematics for example, technical criteria such as wire length might be more important than aesthetics criteria. Some of the commonly used aesthetics criteria include: crossing and overlap minimization, bend minimization, area minimization, angle maximization, length minimization, symmetries and proper separation between nodes.
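To make two of these criteria concrete, the sketch below counts edge crossings and sums edge lengths for a straight-line layout, and applies both measures to two different layouts of the same graph. It is only an illustration under simple assumptions (nodes in general position, edges given as pairs of node identifiers); the function names and the example graph are hypothetical and not taken from the thesis.

```python
from itertools import combinations
from math import hypot


def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 properly cross (general position assumed)."""
    return (orientation(p1, p2, q1) * orientation(p1, p2, q2) < 0
            and orientation(q1, q2, p1) * orientation(q1, q2, p2) < 0)


def count_crossings(edges, pos):
    """Number of crossing edge pairs in the straight-line layout pos."""
    count = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue  # edges sharing an endpoint are not counted as crossing
        if segments_cross(pos[a], pos[b], pos[c], pos[d]):
            count += 1
    return count


def total_edge_length(edges, pos):
    """Sum of Euclidean edge lengths in the layout pos."""
    return sum(hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])
               for a, b in edges)


# The same abstract graph (a cycle on four nodes) in two different layouts.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
square = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
bowtie = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}

if __name__ == "__main__":
    for name, layout in (("square", square), ("bowtie", bowtie)):
        print(name, count_crossings(cycle, layout), total_edge_length(cycle, layout))
```

The square layout has no crossings, while the bowtie layout of the same graph has one crossing and longer edges, which is the sense in which layouts of one graph can differ in quality.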

Graph drawing has emerged in recent years as a very lively area in computer science. Various methods for graph drawing have been proposed, such as hierarchical, planar, circular, orthogonal, symmetric, spectral and force directed layouts [111, 112, 116, 154, 192, 199].

Graph drawing has been used for numerous applications. A few examples include VLSI circuit design, social networks, bioinformatics, train network maps, genealogy, state machines, function call graph visualization, software evolution visualization, network visualization, databases, data structures, computer security, software engineering (e.g. UML diagramming, class browsers) and workflow management (e.g. flow chart generation).

1.3 Graphics Processing Units

Commodity computer graphics chips, known generically as Graphics Processing Units or GPUs, are one of the most accessible high-performance computational platforms [89, 163, 178]. Intended initially for performing graphics related computations for applications such as computer games, computer-aided design and visualization, GPUs have evolved into programmable and economical parallel processing units. They exist in almost every PC sold today and are evolving at a rapid rate. Figure 1.2 compares the peak floating-point performance of CPUs and GPUs. Note the large increase of GPU performance over time. Also note that the peak GPU performance is much higher than the peak CPU performance.

The GPU’s arithmetic power is a result of a specialized architecture, which evolved over years to provide maximum performance on the highly parallel tasks of traditional computer graphics [134, 148]. Unlike CPUs, which are optimized for high-performance on sequential code, where many transistors are dedicated to instruction-level parallelism using techniques such as branch prediction and out-of-order execution, GPUs are optimized for data-parallel applications. This allows a larger portion of the die area of GPUs to be dedicated to computational units. Thus, using the same semiconductor technology, GPUs are able to achieve much higher computation speeds compared to CPUs under some conditions.
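The notion of a data-parallel computation can be sketched as follows: the same small function (a "kernel") is evaluated once per output element, reads only shared input data, and writes only its own output, so the elements can be processed in any order or concurrently. The example below is a CPU-only illustration in plain Python and does not use a GPU; the kernel computes a Fruchterman-Reingold-style repulsive force per node, which is an assumption chosen for the example and not the kernel used in the thesis. The names `repulsion_kernel` and `run_kernel` are hypothetical.

```python
def repulsion_kernel(i, positions, k=1.0):
    """Net repulsive force on node i, reading only the shared input positions.

    Each pair contributes a force of magnitude k^2 / dist directed away
    from the other node (an assumed, Fruchterman-Reingold-style model).
    """
    xi, yi = positions[i]
    fx = fy = 0.0
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        dx, dy = xi - xj, yi - yj
        dist2 = dx * dx + dy * dy
        if dist2 == 0.0:
            continue  # coincident nodes: direction undefined, skipped here
        # (dx, dy) / dist is the direction and k^2 / dist the magnitude.
        fx += k * k * dx / dist2
        fy += k * k * dy / dist2
    return (fx, fy)


def run_kernel(kernel, n, *args, order=None):
    """Evaluate kernel(i, *args) for every i in range(n).

    The evaluation order is irrelevant because each call writes only
    out[i]; a GPU would run many such calls at the same time.
    """
    out = [None] * n
    for i in (order if order is not None else range(n)):
        out[i] = kernel(i, *args)
    return out


if __name__ == "__main__":
    positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    forward = run_kernel(repulsion_kernel, len(positions), positions)
    backward = run_kernel(repulsion_kernel, len(positions), positions,
                          order=[2, 1, 0])
    print(forward == backward, forward[0])
```

The independence between calls is what lets hardware devote its resources to many arithmetic units rather than to speeding up a single sequential instruction stream.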

Unlike CPUs, which dedicate a large percentage of the die area to cache memory, in GPUs, handling the long memory access latency is achieved by quickly switching between multiple threads in hardware. This enables the GPU to hide long memory access latencies without sacrificing die area for large caches. Thus, the GPU is able to achieve good utilization of its computational units. Unlike a CPU, the caches of the GPU are designed for short-term reuse and are constructed to give a 2D access locality. This gives the GPU an advantage in image related applications where data is accessed in 2D patterns.

Figure 1.2: A comparison of the peak floating-point calculation rate in giga floating point operations per second (GFLOPS) of Intel CPUs and ATI and NVIDIA GPUs. Image is reproduced from [164].

As graphics hardware has become more powerful, one of the primary goals of each new

architecture has been to increase the visual realism of rendered images. This is achieved by

implementing increasingly complex rendering algorithms in real-time on the GPU. From

the fixed-function graphics pipelines of several years ago [60], the GPU architecture has

been steadily progressing towards a more general-purpose architecture. New versions of

the graphics APIs [16, 181] expose more programmable parts of the GPU while adding

more instructions and programming flexibility.

In parallel to the evolution of the graphics hardware, the languages used to program

GPUs have been evolving. Starting from small programs written in assembly language,

there are now several C-like programming environments in which GPUs can be pro-

grammed. Some examples include Cg [140], Sh [142] and Brook [26]. In an effort to

streamline the use of GPUs for non-graphics high-performance computation, the major
GPU vendors have released programming environments that do not use the traditional
method of accessing the GPU through the graphics driver. NVIDIA’s CUDA [156] and
ATI’s CTM [168] reduce the overhead of accessing the GPU, thus potentially improving
the efficiency of using the GPU.

The programmable units of the GPU are architected to follow the single program,

multiple data (SPMD) programming model, in which many independent elements are

processed in parallel using the same program. This model is well-suited for straight-line

programs in which many elements are processed in lockstep, running the exact same code.

Such code is single instruction, multiple data (SIMD). While today’s GPUs are capable of

executing code in which different execution paths are taken by each element, this results

in a performance penalty. Thus, GPU programs attempt to group elements into blocks,

in order to have coherent branches in each block. One of the main challenges in achieving

a speedup when using a GPU is understanding how to exploit the architecture of the

GPU effectively.
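
The cost of divergent branches, and the benefit of grouping elements into coherent blocks, can be illustrated with a toy cost model. The sketch below is only an illustration under simplifying assumptions: the function name `lockstep_cost`, the block size and the rule "a block pays one pass per distinct branch taken inside it" are hypothetical and do not model any particular GPU.

```python
def lockstep_cost(branch_ids, block_size):
    """Count lockstep passes under a toy model: every block of elements
    pays one pass for each distinct branch taken by its members."""
    passes = 0
    for start in range(0, len(branch_ids), block_size):
        block = branch_ids[start:start + block_size]
        passes += len(set(block))
    return passes

# Hypothetical workload: 16 elements, each taking branch 0 or branch 1.
interleaved = [0, 1] * 8          # neighbouring elements diverge
grouped = sorted(interleaved)     # elements regrouped so that blocks are coherent

print(lockstep_cost(interleaved, block_size=4))  # 8 passes
print(lockstep_cost(grouped, block_size=4))      # 4 passes
```

In this model, regrouping the same elements halves the number of passes, which is the effect that coherent branching within a block aims for.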

The combined advances in GPU hardware architecture and programmability have
given rise to a vibrant developer community of GPGPU (general-purpose computation
on graphics processing units) applications [89, 162]. Taking advantage of the much

higher peak computation ability and memory bandwidth of the GPU, many algorithms

have been successfully implemented on the GPU with high speedups compared to CPU

implementations. Some examples of algorithms accelerated on GPUs include solving par-

tial differential equations, linear algebra, image and signal processing, segmentation, and

geometric computing [58, 89, 153, 163, 169]. See Section 2.2 for a more comprehensive

review of algorithms which were accelerated on GPUs.

1.4 Outline and Main Contributions

In this thesis we address several related problems in graph drawing for information visu-

alization. It is based on the papers [13, 63–69]. We start with the problem of drawing a

large, general graph quickly and aesthetically. Next, an algorithm for improving a graph

layout is presented. This algorithm can be used as a post-processing step of any domain-

specific layout algorithm. Our improvement algorithm unclutters the given layout by

making more efficient use of available screen space. Next, we address dynamic graph

drawing algorithms for both general and clustered graphs. These algorithms attempt

to preserve the mental map [145] and to produce, online, stable layouts of time-varying


graphs.

Applications that process structured data, such as matrices and images, are well suited to
acceleration using a GPU. The challenge is how to use this hardware to accelerate
algorithms that utilize unstructured data, such as graphs. In this thesis we present two
GPU-accelerated graph drawing algorithms which are able to quickly compute aesthetic
layouts of large graphs. One is for the layout of a single graph and one is for computing
stable layouts of a sequence of graphs. Speedups by factors of 5.5 to 17 relative to a CPU
implementation are demonstrated. In addition, using a GPU, we are able to accelerate our
algorithm for improving graph layouts by a factor of over 100.

Throughout this thesis we provide practical applications of our algorithms to different

information visualization problems. We demonstrate how the structure of Internet service

provider (ISP) networks can be visualized and analyzed using our static graph drawing

algorithm. We show how the layout of graphs from different application domains such

as bioinformatics, VLSI, and finite element meshes can be improved by our uncluttering

algorithm. We apply our dynamic graph drawing algorithm to visualization of social

networks and Internet discussion threads. Finally, we have developed a system for the

visualization of mobile objects [38,102,130], which are an extension of distributed objects.

This system uses clustered graphs to show the structure and interactions in a network

of mobile objects. A hierarchical, scalable focus + context technique is used in the

visualization.

Chapters 3-7 present the main research results of this thesis. The main contributions
of each chapter are summarized below.

Chapter 3 presents an algorithm for static graph layout [65]. The algorithm is based

on the force-directed approach [18,70,113,199]. We propose a multi-level scheme which is

based on spectral partitioning. A technique to efficiently perform the layout on the GPU

is presented. The algorithm manages to compute high quality layouts of large graphs in

a fraction of the time required by existing algorithms of similar quality.

Chapter 4 discusses a technique for modifying an existing layout in order to reduce the

clutter in dense areas [69]. Using a physically-inspired evolution process, graph nodes are

dispersed more evenly in the available screen space. A mental-map preserving warping

process is used to displace the nodes. The complexity of the algorithm depends mainly on

the resolution of the image used for computing the density of information in the graph.


As such, the computation can be scaled according to the allotted running time. Using a

GPU, we are able to handle large graphs in a matter of seconds. Applications

to bioinformatics, VLSI and finite element meshes are demonstrated.

In Chapter 5 an algorithm for drawing a sequence of graphs online is presented [66,68].

While allowing arbitrary modifications to the graph, the algorithm strives to maintain

the global structure of the graph and thus the user’s mental map. The algorithm works

online and uses various execution culling methods in order to reduce the layout time and

handle large dynamic graphs. Techniques for representing graphs on the GPU allow a

speedup by a factor of up to 17 compared to the CPU implementation. Applications to

social networks and visualization of Internet discussion threads are presented.

Chapter 6 presents an algorithm for drawing a sequence of graphs that contain an

inherent grouping of their vertex set into clusters [63]. The algorithm works online and

allows arbitrary modifications to the graph. It uses node pinning and invisible nodes in

order to maintain the clustered structure of the graph during incremental layout. Several

metrics for measuring the quality of the dynamic layout of clustered graphs are discussed.

In Chapter 7, an application to the visualization of mobile objects is discussed.

In Chapter 7 a system for visualizing mobile object frameworks is presented [13,64,67].

In these frameworks, the objects migrate to remote hosts, along with their state and

behavior, while the application is running. A graph-based visualization is used to depict

both the physical layer (placement on hosts) and the logical layout (relations between

objects) of the system. The system is scalable and able to create a consistent visualization

of the distributed system.


Chapter 2

Related Work

In this chapter we discuss some research that is related to this thesis. We start with graph

drawing, which is a fundamental tool in many information visualization applications, such

as software visualization. Next, we review some applications of graphics processing units,

which are related to the research presented in this thesis. In the following chapters

additional, topic-specific references are discussed.

2.1 Graph Drawing

The general problem of drawing graphs, i.e., assigning coordinates to graph vertices,

edges and other elements, has been extensively studied [46, 111, 112, 116, 154, 192, 199].

In the following paragraphs we review different types of layout algorithms and give more

information about work related to this thesis.

Several classes of algorithms for drawing graphs have been developed. Selecting a spe-

cific algorithm depends both on the type of graph to be laid out and on the requirements

from the resulting layout.

Graphs that are planar, i.e., can be drawn with no edge crossings, are often drawn

using planar layout algorithms [90, 114, 116, 180]. Tree-like structures are drawn using

tree layout [33,41,42,80,116,175]. Directed graphs that contain an inherent hierarchy are

drawn using hierarchical layout algorithms [46,52,76,116,194]. These algorithms attempt

to find a source and sink within a directed graph and arrange the nodes in layers with most

edges starting from the source and flowing in the direction of the sink. Such algorithms

try to minimize the number of crossings or the area of the layout. In some cases, it is

required to draw edges in either the horizontal or vertical direction, while trying to reduce


edge crossings. Examples include drawings used for circuit board and integrated circuit

design. In these cases orthogonal layout algorithms [15,81,116,166,196,197] are used.

A variety of algorithms have been devised for drawing general graphs. Among them,
spectral layout and force-directed layout algorithms are widespread. In spectral
layout [19, 121–123] the node coordinates are extracted from the eigenvectors of a matrix,

such as the Laplacian matrix of the graph, which is derived from the adjacency (con-

nectivity) matrix of the graph. In force-directed layout [39, 43, 50, 62, 70, 113, 193] a

gradient-descent minimization of an energy function based on physical analogies is used

to compute the layout. In this thesis we focus on force-directed layout.
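
To make the spectral idea concrete, the following sketch derives one coordinate per node from an eigenvector of the graph Laplacian. It is a minimal illustration, not the method of any cited paper: it assumes a small connected undirected graph, stores the matrix densely, and approximates the eigenvector of the second-smallest eigenvalue with a shifted power iteration. The function names and the iteration count are hypothetical choices.

```python
def laplacian(n, edges):
    """Return the graph Laplacian L = D - A as a dense list of lists."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L

def spectral_coordinate(n, edges, iterations=500):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue.
    Assumes a connected graph with at least two vertices."""
    L = laplacian(n, edges)
    shift = 2.0 * max(L[i][i] for i in range(n))  # upper bound on the eigenvalues of L
    x = [float(i) for i in range(n)]              # deterministic start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]               # project out the constant eigenvector
        # Power iteration on (shift * I - L), whose dominant remaining
        # eigenvector belongs to the second-smallest eigenvalue of L.
        y = [shift * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return x

# A path 0 - 1 - 2 - 3 is laid out in order along one axis.
print(spectral_coordinate(4, [(0, 1), (1, 2), (2, 3)]))
```

A second coordinate would come from the next eigenvector; practical implementations use sparse matrices and proper eigensolvers rather than this dense power iteration.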

Force-directed layout. One of the most popular techniques for graph layout is force-directed
layout. It uses physical analogies in order to converge to an aesthetically pleasing
drawing [18,39,43,50,62,70,113,193,199]. In this class of algorithms, the graph is modeled
as a system of particles that exert forces on each other. Springs are used to model the
edges in the graph. The direction and magnitude of the force exerted on the two particles
connected by an edge depend on the distance between them and the “stiffness” of the
spring. These are parameters that can be modified to produce different effects. Starting
from an initial position, force-directed algorithms strive to converge to an equilibrium
position, which often produces a good layout of the graph. Note that, due to the
complexity of the problem, a local minimum is generally reached, rather than a global one.

The algorithm of Fruchterman and Reingold [70] is a well-known variant of the force-directed
layout technique. In this algorithm, the following repulsive force is defined between each
pair of vertices $p_u, p_v \in V$:

\[ f_{\mathrm{repulsive}}(p_u, p_v) = \frac{l^2}{\|p_u - p_v\|} \cdot \frac{p_v - p_u}{\|p_v - p_u\|}. \]

Here, $l$ is a parameter of the algorithm, which is used to denote the natural length of
a spring attaching two vertices connected by an edge. In addition, an attractive force
is defined between every pair of vertices $p_u, p_v$ which are connected by an edge (i.e.
$(p_u, p_v) \in E$):

\[ f_{\mathrm{attractive}}(p_u, p_v) = \frac{\|p_u - p_v\|^2}{l} \cdot \frac{p_v - p_u}{\|p_v - p_u\|}. \]


The running time of each iteration of this algorithm is $O(V^2 + E)$: all vertex pairs in the graph need

to be considered for calculating repulsive forces and all edges are considered in order

to calculate attractive forces. In order to prevent excessive changes, especially in later

stages of the iteration when the placement is close to a stable state, the algorithm uses a

time-dependent maximum displacement value which declines over time.
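
One iteration of such a scheme can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [70]: the function name `fr_step`, the 2D tuples, the sign convention (repulsion pushes a pair apart, attraction pulls an edge's endpoints together) and the 0.95 decay factor are all hypothetical choices. The force magnitudes are l^2/d for repulsion and d^2/l for attraction, where d is the distance between the two vertices.

```python
import math

def fr_step(pos, edges, l, max_disp):
    """One force-directed iteration; returns the new list of (x, y) positions."""
    n = len(pos)
    disp = [[0.0, 0.0] for _ in range(n)]

    def unit_and_dist(u, v):
        dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
        d = math.hypot(dx, dy) or 1e-9   # guard against coincident vertices
        return dx / d, dy / d, d

    # Repulsion, magnitude l^2 / d, over all O(V^2) vertex pairs.
    for u in range(n):
        for v in range(u + 1, n):
            ux, uy, d = unit_and_dist(u, v)
            f = l * l / d
            disp[v][0] += f * ux; disp[v][1] += f * uy   # v is pushed away from u
            disp[u][0] -= f * ux; disp[u][1] -= f * uy   # u is pushed away from v

    # Attraction, magnitude d^2 / l, over the O(E) edges.
    for u, v in edges:
        ux, uy, d = unit_and_dist(u, v)
        f = d * d / l
        disp[u][0] += f * ux; disp[u][1] += f * uy       # u is pulled towards v
        disp[v][0] -= f * ux; disp[v][1] -= f * uy       # v is pulled towards u

    # Limit the movement of every vertex to max_disp.
    new_pos = []
    for (x, y), (dx, dy) in zip(pos, disp):
        d = math.hypot(dx, dy)
        scale = min(d, max_disp) / d if d > 0 else 0.0
        new_pos.append((x + dx * scale, y + dy * scale))
    return new_pos

pos = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
max_disp = 0.5
for _ in range(100):
    pos = fr_step(pos, [(0, 1), (1, 2)], l=1.0, max_disp=max_disp)
    max_disp *= 0.95   # the allowed displacement declines over time
```

With these magnitudes, two vertices joined by an edge are in balance exactly when their distance equals l, which is why l acts as the natural spring length.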

Kamada and Kawai [113] introduced a different variant of the force-directed layout

algorithm. The idea here is to minimize the energy of the layout directly, instead of

reducing the forces acting on the vertices. In this algorithm, the ideal distance between

two nodes is set as the length of the shortest path between them, multiplied by the ideal

length of a single edge. The resulting objective function is the sum over the potential
energies of all $n(n-1)/2$ springs,

\[ U_{\mathrm{Kamada\ Kawai}} = \sum_{u,v \in V} \frac{c}{d_G(u,v)^2} \cdot \left( \|p_u - p_v\| - l \cdot d_G(u,v) \right)^2, \]

where $d_G(u,v)$ denotes the length of a shortest path between vertices $u$ and $v$, $c$ is a
scaling constant and $l$ is the ideal length of a single edge. To obtain a local minimum of
this objective function, a modified Newton-Raphson method is applied. In each iteration,
the vertex with the largest gradient is picked and displaced. The running time of this
algorithm is quite high, since it requires computing all-pairs shortest paths and, in each
iteration, it scans all vertices in the graph and then displaces only one vertex.
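
The objective function can be evaluated directly, which is useful for checking a layout. The sketch below is illustrative only: it assumes a connected, unweighted graph (so breadth-first search yields the shortest path lengths), sums over unordered vertex pairs, and uses hypothetical function names. It computes the energy but does not perform the Newton-Raphson minimization.

```python
import math
from collections import deque

def all_pairs_shortest_paths(n, edges):
    """Hop distances between all vertex pairs, by one BFS per source vertex."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for source in range(n):
        d = [None] * n
        d[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if d[w] is None:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist.append(d)
    return dist

def kk_energy(pos, edges, l=1.0, c=1.0):
    """Sum of spring energies over all n(n-1)/2 vertex pairs (connected graph assumed)."""
    n = len(pos)
    dist = all_pairs_shortest_paths(n, edges)
    energy = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            d_uv = dist[u][v]
            length = math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])
            energy += c / d_uv ** 2 * (length - l * d_uv) ** 2
    return energy

path = [(0, 1), (1, 2)]
print(kk_energy([(0, 0), (1, 0), (2, 0)], path))  # ideal layout of a path: 0.0
print(kk_energy([(0, 0), (1, 0), (3, 0)], path))  # stretched layout: 1.25
```

In the stretched layout, the pair at graph distance 2 contributes 1/4 and the overlong edge contributes 1, which shows how the weight c / d_G(u,v)^2 makes errors between distant vertices count less.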

Due to the high computational cost of force-directed algorithms, extending them for

drawing large graphs has been extensively studied [92]. One popular technique is to use a

multi-level approach [7,72,91,96,123,205]. The idea here is to recursively create reduced
graphs, until a sufficiently small graph is created. Next, a series of graph layout
problems is solved: starting from the coarsest graph, finer and finer approximations of
the final layout are created.
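
The coarsening half of this idea can be sketched in a few lines. The example below uses one simple, hypothetical coarsening rule (collapsing the edges of a greedy matching) purely for illustration; it is not the specific scheme of any cited algorithm, and the layout step performed at each level is omitted. The function names and the `min_size` parameter are assumptions.

```python
def coarsen(n, edges):
    """Collapse a greedy matching of edges.
    Returns (coarse_n, coarse_edges, parent), where parent[v] is the
    coarse vertex that fine vertex v was merged into."""
    parent = [-1] * n
    coarse_n = 0
    for u, v in edges:
        if u != v and parent[u] == -1 and parent[v] == -1:
            parent[u] = parent[v] = coarse_n      # merge both endpoints
            coarse_n += 1
    for v in range(n):
        if parent[v] == -1:                       # unmatched vertices survive as-is
            parent[v] = coarse_n
            coarse_n += 1
    coarse_edges = sorted({(min(parent[u], parent[v]), max(parent[u], parent[v]))
                           for u, v in edges if parent[u] != parent[v]})
    return coarse_n, coarse_edges, parent

def build_hierarchy(n, edges, min_size=2):
    """Coarsen repeatedly until the graph has at most min_size vertices."""
    levels = [(n, edges, None)]
    while n > min_size:
        coarse_n, coarse_edges, parent = coarsen(n, edges)
        if coarse_n == n:                         # nothing left to collapse
            break
        levels.append((coarse_n, coarse_edges, parent))
        n, edges = coarse_n, coarse_edges
    return levels

path = [(i, i + 1) for i in range(7)]             # a path with 8 vertices
print([level[0] for level in build_hierarchy(8, path)])  # [8, 4, 2]
```

A multi-level layout would then lay out the coarsest level first and use each `parent` array to place the vertices of the next finer level near their merged representative before refining.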

Various algorithms have been used to perform multi-level graph layout. In [205],

edge collapse operations, commonly used in computer graphics for mesh simplification,

are used. The algebraic multi-grid technique has also been used to successfully compute

high-quality layouts [123]. Creating “solar systems” by clustering nodes at a distance of

2 edges or less from a central node is described in [91]. Using a maximum-independent

set filtration in order to coarsen the graph is discussed in [72]. TopoLayout [7] is a


feature-based multi-level graph drawing algorithm. It creates a subgraph hierarchy by

recursively detecting topological features in the graph and replacing them with meta-

nodes. Chapter 3 describes a new multi-level force-directed algorithm, which is based on

spectral partitioning [59,170].

Dynamic graph drawing. As opposed to static graph drawing, where a single graph
is considered, dynamic graph drawing addresses the problem of computing layouts for a

sequence of related graphs. In offline dynamic graph drawing the entire sequence of

graphs to be laid out is known in advance. In contrast, in online dynamic graph drawing,

for each graph provided as input, a layout is computed. Thus, an online algorithm is not

able to take future changes to the graph into account when computing the layout.

If the sequence of graphs to be laid out is known in advance, different algorithms

can be employed in order to solve the incremental layout challenge. One algorithm to

address this problem is discussed in [47]. The algorithm constructs a super-graph that

combines information from several adjacent timeslots in the animation sequence in order

to produce a smooth animation. In [128], a stratified, abstracted version of the graph is

used. An offline algorithm for the visualization of the evolution of software over time is

presented in [40,56].

Online visualization of social networks is discussed in [149]. An approach based on

Bayesian networks is described in [20]. Online drawing of orthogonal and hierarchical

graphs is discussed in [86]. Chapters 5 and 6 present online algorithms for drawing

general and clustered graphs, respectively.

Incremental drawing of directed acyclic graphs is discussed in [155], which uses a mod-

ification of the Sugiyama algorithm [194] in order to draw ranked digraphs. A heuristic

that moves nodes between adjacent layers is employed. Although the algorithm performs

well, it is restricted to graphs that contain an inherent hierarchy of nodes.

Clustered graph drawing. Work on clustered graph drawing is less widespread. In [206],

a divide and conquer approach, in which each cluster is laid out separately and then the

clusters are composed to form the graph, is used. This approach has the drawback of

not taking edges between vertices belonging to different clusters into account. This may

result in many edge crossings or long edges. In [51], a method of drawing the clustering


hierarchies of the graph using different Z coordinates in a 3D view is discussed. Displaying
the graph in 3D makes it easier to present its recursive clustered structure. One
drawback of this work is that the entire structure is presented, which may be quite
complex. No means are provided to reduce the visual complexity of the graph.

Other research in clustered and compound graph layout includes [14, 25]. In Chapter 6

we present our online algorithm for drawing sequences of clustered graphs.

Commercial graph drawing software. Several companies offer graph drawing products.
These are used for different applications such as process and workflow

diagrams, business organizational charts, network management displays and supply-chain

diagrams.

The ILOG JViews Diagrammer package [107] includes a broad range of layout al-

gorithms implemented in Java. Algorithms supported include hierarchical layout, tree

layout, circular layout and a spring embedder. Nested subgraphs are supported, allowing

the user to expand the view of the contents of subgraphs. The graph can be annotated

with different line styles, labels and tooltips. The package includes some incremental lay-

out capability, ensuring that small changes do not force large diagram rearrangements.

Tom Sawyer offers graph layout software [200]. Algorithms supported include circular,
hierarchical, orthogonal and symmetric layout, as well as an interface for supplying
constraints on the layout. The package includes an edge router, which can help reduce
node-edge overlaps. The software has several interfaces, including ActiveX, C++, Java
and .Net. Emphasis is put on running the layout algorithms quickly. The package can interface

with analysis and visualization software.

yWorks offers the yFiles graph layout package [209]. Several layout algorithms are
supported, including circular, hierarchical, orthogonal and tree layout. In addition, an
“organic” layout algorithm exists, which seems to be based on force-directed layout.
Noteworthy is support for incremental layout of tree, hierarchical and circular drawings [117],
as well as support for incremental edge routing.

The ClearCase [37] software configuration and version management tool uses a hierarchical
graph layout algorithm [194] in order to display versioning information for files and
directories in a software project. This allows inspecting the modification history of each
software component in the project. In addition, the user interface allows running
comparison and merging queries on the graph being displayed. Using the graph visualization,
it is much easier for the user to understand the changes applied to a module.

While the commercial tools discussed above include implementations of several layout

algorithms, there is still ample room for research. Examples include incremental layout

of clustered graphs, incremental layout of multi-level clustered graphs, efficient handling

of large graphs, creating high-quality layouts and techniques for improving layout quality.

In this thesis, some of these challenges are addressed.

2.2 General Purpose Computation on Graphics Processing Units (GPGPU)

In recent years, graphics processing units (GPUs) have been used in many applications

not directly related to computer graphics [58,153,163,169]. A few examples include
classification using support vector machines [31], sequence alignment in biomedical
applications [139], image compression [55], visual tracking [147], probabilistic sequence
search in biomedical applications [104], tone mapping [84], particle systems [118],
histogram generation [179], neural networks [160], level-set segmentation [177], wavelet
transforms [28], database queries [87, 88], geometric pattern matching [5] and acoustic
simulation [176]. This has been termed GPGPU, or general purpose computation on
graphics processing units.

The website [89] lists several hundred papers and research applications of GPUs. In this

section we review work in linear algebra, ray tracing, solution of PDEs and computation

of forces between particles, which is more relevant to our work.

GPUs have been successfully used to perform linear algebra and matrix computa-

tions. In [73] a system for solving dense linear systems using LU decompositions and

other techniques is described. The computation is accelerated using GPU architectural

features devised initially for texture processing, such as coordinate interpolation units. A

system for performing general-purpose linear algebra calculations on matrices and vectors

is presented in [127]. Applications to multi-dimensional finite differences, such as the 2D

wave equation and incompressible Navier-Stokes equations are presented. In [57,131] al-

gorithms for dense matrix multiplication are presented and their performance is analyzed.

A recent paper accelerates sparse matrix-vector multiplication on the GPU by using
scan primitives to perform the calculation efficiently [182].
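
To show how a scan (prefix sum) can express sparse matrix-vector multiplication, here is a sequential sketch over a matrix stored in compressed sparse row (CSR) form. It is an assumption-laden illustration and not necessarily the formulation of [182]: the function name is hypothetical, and the three steps run sequentially here, whereas on a GPU each would be a data-parallel primitive.

```python
def spmv_csr_scan(row_ptr, col_idx, values, x):
    """Compute y = A * x for a CSR matrix using a prefix sum over the products."""
    # Step 1 (a parallel map on a GPU): one product per stored nonzero.
    products = [values[k] * x[col_idx[k]] for k in range(len(values))]
    # Step 2 (a scan on a GPU): running sums, with a leading zero.
    prefix = [0.0]
    for p in products:
        prefix.append(prefix[-1] + p)
    # Step 3 (a parallel gather): each row sum is a difference of two prefix values.
    return [prefix[row_ptr[i + 1]] - prefix[row_ptr[i]]
            for i in range(len(row_ptr) - 1)]

# The matrix [[1, 0, 2], [0, 0, 0], [0, 3, 4]] in CSR form.
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
print(spmv_csr_scan(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [7.0, 0.0, 18.0]
```

Because every step treats all nonzeros uniformly, the work per element does not depend on how many nonzeros a given row contains, which is what makes scan-based formulations attractive for irregular matrices.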


The multi-grid algorithm [24,44] is an advanced, fast and popular approach to solving

large boundary value problems. In [17] implementations of a sparse matrix conjugate
gradient solver and a regular-grid multi-grid solver on the GPU are discussed. Another

implementation of a multi-grid solver on the GPU is presented in [85]. In [109], an appli-

cation of creating marble-like textures on the GPU which uses the multi-grid technique

to solve PDEs (partial differential equations) in a fluid dynamics simulation, is presented.

In Chapter 4, a GPU is used to accelerate a ray casting algorithm. In [30] ray-triangle

intersections are performed on the GPU. The highest acceleration is achieved when caches

of coherent rays are processed. The algorithm is partitioned between the CPU and the

GPU, combining the strengths of both. In [171] all of the triangles in the scene are stored

on the GPU in a 3D grid. This allows the entire raytracer to run on the GPU, eliminating

the CPU-GPU communication bottleneck of [30]. More recent work, such as [61], uses
more advanced techniques, such as kd-trees, to accelerate the computation. Ray

casting on GPUs has also been used in order to perform volume rendering of 3D data.

In [126] a multi-pass ray casting technique, employing empty space skipping and early

termination is used. In [188] single-pass ray casting on the GPU, which improves accuracy

in the volume integral computation compared to texture slicing which uses framebuffer

precision of 8 to 16 bits, is introduced. In [137] an adaptive object and image-space

sampling density of multiresolution volumes is used to reduce running time.

Simulation of physical phenomena using PDEs is one of the many applications of

GPUs. In [99] a real-time simulation of fluid dynamics on the GPU, which uses Jacobi
iterations [44] to converge to a solution, is described. Simulation of the dynamics of clouds
on the GPU is discussed in [100]. There, methods to efficiently process a 3D volume
using the GPU’s 2D addressing capabilities are presented. Simulation in 3D of fluids in a

volume that contains obstacles is discussed in [136].
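
Jacobi iteration suits data-parallel hardware because every grid point is updated from the previous iterate only, so all points can be updated independently. The sketch below shows this on a deliberately tiny problem, the 1D Laplace equation with fixed end values; it is a toy under stated assumptions (the function name, the grid and the iteration count are hypothetical), not the 2D fluid solver of [99].

```python
def jacobi_laplace_1d(u, iterations):
    """Jacobi relaxation for the 1D Laplace equation with fixed end values.
    Each interior value is replaced by the average of its two neighbours,
    always read from the previous iterate."""
    for _ in range(iterations):
        old = u
        interior = [(old[i - 1] + old[i + 1]) / 2.0 for i in range(1, len(old) - 1)]
        u = [old[0]] + interior + [old[-1]]
    return u

# Boundary values 0 and 4; the solution is the straight line between them.
print(jacobi_laplace_1d([0.0, 0.0, 0.0, 0.0, 4.0], 200))
```

The result approaches 0, 1, 2, 3, 4. On a GPU the list comprehension would correspond to one pass in which every grid point is processed by the same small program.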

Chapters 3 and 5 show how the calculation of forces acting on nodes, used for graph

layout, can be accelerated on the GPU. There are many research problems in physics,
chemistry and astronomy where very similar calculations between interacting particles are
important. Hence, as in this research, researchers in these fields have turned to
the GPU in order to achieve high-performance computation on a cheap and accessible
platform. In [6, 135, 190] the simulation of the dynamics of molecules using

a GPU is described. In [165] a world-wide distributed system for the simulation of protein


folding on the GPU and other high-performance computational platforms is described.

The simulation of N-body gravitational forces on the GPU is discussed in [157].


Chapter 3

Multi-Level Graph Layout on the GPU

This chapter presents a new algorithm for force directed graph layout on the GPU. The

algorithm, whose goal is to compute layouts accurately and quickly, has two contributions.

The first contribution is proposing a general multi-level scheme, which is based on spectral

partitioning. The second contribution is computing the layout on the GPU. Since the

GPU requires a data parallel programming model, the challenge is devising a mapping

of a naturally unstructured graph into a well-partitioned structured one. This is done

by computing a balanced partitioning of a general graph. This algorithm provides a

general multi-level scheme, which has the potential to be used not only for computation

on the GPU, but also on emerging multi-core architectures. The algorithm manages to

compute high quality layouts of large graphs in a fraction of the time required by existing

algorithms of similar quality. An application for visualization of the topologies of ISP

(Internet Service Provider) networks is presented. This chapter is based on [65].

The rest of this chapter is structured as follows. Section 3.1 gives an introduction.

Related work is discussed in Section 3.2. Partitioning graphs using spectral methods is

reviewed in Section 3.3. Section 3.4 presents the layout algorithm. The GPU implemen-

tation of the algorithm is reviewed in Section 3.5. Results are presented in Section 3.6.

An application to the visualization of Internet service provider networks is discussed in

Section 3.7. Finally, Section 3.8 concludes.


Figure 3.1: ISP router map. Each node represents a router. Edges link routers. Red

nodes are external to the ISPs visualized. Other nodes are colored according to the ISP

they belong to: green - Abovenet (US, 664 routers); blue - Exodus (US, 551 routers);

black - Tiscali (Europe, 513 routers). A total of 5044 routers and 8043 connections are

shown.

3.1 Introduction

Rapidly producing aesthetically pleasing, high-quality graph layouts is still a challenging

problem. For instance, one of the most popular graph layout algorithms, the force di-

rected algorithm, is computationally expensive. The complexity of each iteration of the

algorithm is $O(V^2 + E)$. On large graphs, the layout procedure can take anywhere from a

few seconds to several minutes to complete, hindering the capability to use this algorithm

to explore large data sets.
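
A quick calculation shows which term of this complexity dominates in practice. The sketch below uses the size of the graph in Figure 3.1 (5044 routers and 8043 connections); the function name is a hypothetical helper.

```python
def interactions_per_iteration(num_vertices, num_edges):
    """Return (vertex pairs needing a repulsive force, edges needing an attractive force)."""
    pairs = num_vertices * (num_vertices - 1) // 2
    return pairs, num_edges

pairs, edges = interactions_per_iteration(5044, 8043)
print(pairs, edges)    # 12718446 8043
print(pairs // edges)  # 1581
```

For this graph the all-pairs repulsion involves roughly 1,500 times more interactions than the edge attraction, so accelerating the repulsive term is where most of the time can be saved.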

In recent years, a popular way to accelerate computations is to perform them on the

GPU (graphics processing unit) [58, 89, 163, 169]. This is due to the high computational

power, low cost, and ubiquity of GPUs in every modern PC. Please refer to Sections 1.3
and 2.2 for more information about accelerating computations using GPUs.

GPUs are geared towards repetitively performing the same computation on large

streams of data. Therefore, the GPU suits uniformly structured data, such as images

or matrices. Graphs do not possess a uniform structure; hence, they do not admit any
intuitive and natural representation that suits computation on the GPU.

This chapter proposes two ways in which force directed algorithms can be accelerated.

The first is a general multi-level scheme, which is based on spectral partitioning. The

second is computation of a graph layout on the GPU.

Multi-level graph layout algorithms have been proposed in the past [72,91,96,98,123,

172, 205]. In these algorithms, the given graph is recursively coarsened, to compute its

multi-level representation. In contrast, in our scheme, the algorithm works on a highly
detailed graph at all levels of the partitioning. Thus, a good hierarchical representation

of the graph is obtained. The scheme proposed in this chapter is a general multi-level

scheme, which is based on spectral partitioning. Using a coarse to fine approach, layouts

of increasing detail are computed. It is shown how coarse layouts of a graph can be

efficiently extended to the final high quality layout.

In addition, this chapter describes a method of representing graphs so as to make

efficient use of GPU resources. Partitioning is used to break the large problem into

smaller and similarly-sized problems that suit computation on the GPU or on other

data-parallel programming models. This algorithm exposes the underlying structure of

the graph, and thus can be used in a multi-level scheme.

Another algorithmic contribution of the chapter is devising a layout algorithm that

combines the strengths of two different well-known layout algorithms [70,113]. The pro-

duced layouts are as good as existing state of the art layouts [91, 92], yet computed at a

fraction of the running time. For example, a layout of the graph bcsstk31 is computed

using our approach in 5.8 seconds (using a GPU on a Core 2 machine) compared to 83

seconds in [91] (using a Pentium CPU).

Implementation-wise, the chapter elaborates on how force directed layout is acceler-

ated, by performing the time-consuming stages on the GPU. The data storage and the

stream processing are described.

Last but not least, the algorithm is applied to the visualization of the topologies of
Internet Service Provider (ISP) networks. In this application, illustrated in Figure 3.1,
nodes represent routers and edges represent the connections between them.

3.2 Related Work

Many algorithms have been proposed to perform graph layouts [116, 199]. This chapter

focuses on force directed layout [70, 113], which is based on simulating the graph as a

network of charged particles that repel each other, where edges are simulated by springs.

The algorithm is popular due to its ability to draw general undirected graphs, its ability

to be tailored according to specific requirements, and the aesthetically pleasing layouts

it produces. However, a major drawback of the algorithm is its high computational cost.

Some algorithms have been proposed to perform force directed layouts of large graphs

[92]. In [205] coarser representations of the graph are recursively built using the edge

collapse operation. Instead of computing all-pairs repulsion forces, only close-by nodes

are addressed. The algorithm in [96] creates coarse graphs using an approximation of

the k-center problem. A modified version of [113] is used to perform single level layout.

This algorithm requires $O(V^2)$ memory and $O(VE)$ time for a graph with $V$ nodes and $E$ edges. The algorithm in [9] computes repulsion forces in $O(N \log N)$ for $N$ nodes.

In [172] a quadtree is used to accelerate layout and to visualize the graph in multiple

levels of detail. In [72] a maximum independent set filtration is used to coarsen the

graph. At each level new nodes are placed in accordance with their neighbors. A local

force computation is performed using both [113] and [70]. FM³ [91] is a state-of-the-art

multi-level algorithm [92]. There, solar systems are created, which consist of nodes at a

distance of two edges or less from the center of the solar system. A clever O(N log N)

approximation of the all-pairs repulsive forces is used to accelerate layout.

In [123] a simplified energy function is used, which allows more robust mathematical

treatment. The layout problem is reduced to an eigenvalue computation problem, which

is solved using an algebraic multi-grid approach. Although the resulting algorithm is very

rapid, the quality of the layout is limited [92]. This may be attributed to the algorithm

defining forces only along edges of the graph. In [98] a high dimensional embedding of

the graph is computed and then projected into the drawing plane, allowing a linear time

O(E + V ) algorithm.

In the current chapter, instead of working on increasingly coarsened graphs, the input


graph is partitioned to smaller and smaller parts. This helps construct an accurate multi-

level representation of the graph.

In recent years, GPUs have been successfully applied to numerous problems outside of

classical computer graphics [163]. Some GPU usage examples include solving differential

equations [85], linear algebra [73,127], signal processing [150], visualization [95,108] and

simulation [100,118,136], to name a few.

Several other GPU applications are somewhat related to ours. In [83,198] simulation

of deformable bodies using mass-spring systems is performed. However, while the mass-

spring algorithms take only nodes connected by edges into account, the force directed

algorithm considers all the nodes when calculating the force exerted on a node. GPUs

have also been used to simulate gravitational forces [157], where an approximate force

field is used to calculate forces. Accelerating dynamic graph drawing on the GPU has

been addressed in [66]. The focus of that work was on creating stable layouts of changing

graphs, whereas the current chapter addresses static layouts.

3.3 Spectral Graph Partitioning

Directly computing the layout of a large graph is both time-consuming and difficult.

This is due to the sensitivity of force directed layout to the initial conditions given to the

algorithm. To address these problems, multi-level schemes have been used [72,91,96,98,

123, 172, 205]. The key idea is that a good representation of the overall structure of the

graph will yield a layout of the “skeleton”, which can be quickly computed, and which

can assist in drawing the large input graph.

We propose an algorithm for creating a series of resolution decreasing representations

of the graph by recursively partitioning it. We require the parts to have similar size and

have a minimal cut between them. The former requirement helps preserve the balance

between the nodes during layout, while the latter guarantees that different parts are

weakly coupled and hence can be treated relatively independently.

While existing multi-level graph layout algorithms recursively coarsen the graph in order to compute the multi-level representation, our algorithm works on a highly detailed graph at all levels of the partitioning. This allows us to obtain a high-quality representation of the graph, which does not suffer from the growing inaccuracy involved in repeatedly creating coarser and coarser representations of a reduced version of the graph.

To do so, we use spectral graph theory [36]. This theory has been used in the field of

parallel computation to partition computation dependency graphs, where the amount of

work between processors needs to be balanced [170]. It was also used in image segmenta-

tion, where normalized cuts were introduced [184]. The idea of using eigenvectors of the

Laplacian for finding partitions of graphs has a rich history [59].

Suppose that $w_{ij}$ is the weight of the edge $(i,j)$, $D$ is a diagonal matrix with $D(i,i) \equiv \sum_j w_{ij}$, and $W(i,j) \equiv w_{ij}$ is the graph edge weights matrix. The matrix $L = D - W$ is the Laplacian of graph $G$. The goal is to partition $G$ into two equal-sized partitions $A, B$. For node $i$, we define $q_i = 1$ if $i \in A$ and $q_i = -1$ if $i \in B$. It can be shown [170] that the cut size $J$ is:

$$J = \mathrm{CutSize} = \frac{1}{4} \sum_{i,j} w_{ij} (q_i - q_j)^2 = \frac{1}{2} q^T (D - W) q.$$

This is so since if nodes $i$ and $j$ are in the same partition, $q_i - q_j$ is zero. If not, $(q_i - q_j)^2$ evaluates to $2^2$. Hence, dividing by four achieves the desired result. The right-hand side of the equality stems from the characteristics of the Laplacian of the graph.
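The equality between the pairwise sum and the quadratic form can be checked numerically on a small graph. The following is a minimal sketch in plain Python; the 4-node graph, the function names and the dense list-of-lists representation are assumptions made for illustration only.

```python
def laplacian(weights):
    """Return L = D - W for a symmetric weight matrix given as a list of lists."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
            for i in range(n)]


def cut_size_pairwise(weights, q):
    """J = 1/4 * sum over all ordered pairs (i, j) of w_ij * (q_i - q_j)^2."""
    n = len(weights)
    return 0.25 * sum(weights[i][j] * (q[i] - q[j]) ** 2
                      for i in range(n) for j in range(n))


def cut_size_quadratic(weights, q):
    """J = 1/2 * q^T (D - W) q."""
    lap = laplacian(weights)
    n = len(q)
    return 0.5 * sum(q[i] * lap[i][j] * q[j] for i in range(n) for j in range(n))


# Hypothetical example: unit-weight edges 0-1, 1-2, 2-3 and 0-2.
W = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
q = [1, 1, -1, -1]          # A = {0, 1}, B = {2, 3}

j_pairwise = cut_size_pairwise(W, q)
j_quadratic = cut_size_quadratic(W, q)
```

Both expressions evaluate to 4 for this example: two edges (0-2 and 1-2) cross the partition, and the double sum over $i,j$ visits each of them twice.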

In order to minimize $J$, we can relax the indicators $q_i$ to continuous values and take the eigenvector corresponding to the second smallest eigenvalue of

$$(D - W) q = \lambda q.$$

This vector is known as the Fiedler vector [59]. (The eigenvector corresponding to the smallest eigenvalue, $\lambda_1 = 0$, is $q_1 = (1, \ldots, 1)^T$.)

To compute the Fiedler vector, we use the power iteration algorithm [208], shown in Figure 3.2. The input of the algorithm is a guess for the Fiedler vector, stored in v2. The computed Fiedler vector is returned in v2. The algorithm is iterative. In each iteration v2 is orthogonalized against the first eigenvector and multiplied by the matrix B = gI − L, which reverses the order of the eigenvalues (and hence of the eigenvectors). Here g is the Gershgorin bound, which bounds the magnitude of the largest eigenvalue of the Laplacian. This algorithm fits sparse matrices (i.e., graphs), since it requires only matrix-vector multiplications. A similar algorithm is used in [123] to directly compute the graph layout, whereas it is used here only to partition the graph.


v2 = random guess
L = Laplacian(G)
g = Gershgorin bound(L) = max_i ( L_ii + Σ_{j≠i} |L_ij| )
B = gI − L
v1 = (1/√N) · (1, ..., 1)    //first (known) eigenvector
do
    v2old = v2
    v2 = v2 − (v2^T · v1) v1
    v2 = B · v2
    v2 = v2 / ‖v2‖
until |v2old · v2^T − 1| < ε or max iteration count reached

Figure 3.2: The power iteration algorithm
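The loop of Figure 3.2 can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the implementation used in the thesis: the graph is passed as a dense symmetric weight matrix (a real implementation would use a sparse representation), the initial guess is normalized before the first iteration, and the function name, the `seed` parameter and the example graph are hypothetical.

```python
import math
import random


def fiedler_vector(weights, eps=1e-8, max_iter=10000, seed=0):
    """Approximate the Fiedler vector of a graph given by a symmetric weight matrix."""
    n = len(weights)
    # Laplacian L = D - W.
    lap = [[(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
           for i in range(n)]
    # Gershgorin bound on the largest eigenvalue of L.
    g = max(lap[i][i] + sum(abs(lap[i][j]) for j in range(n) if j != i)
            for i in range(n))
    # B = gI - L reverses the order of the eigenvalues of L.
    b = [[(g if i == j else 0.0) - lap[i][j] for j in range(n)] for i in range(n)]
    v1 = [1.0 / math.sqrt(n)] * n                      # first (known) eigenvector
    rng = random.Random(seed)
    v2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]    # random initial guess
    norm = math.sqrt(sum(x * x for x in v2))
    v2 = [x / norm for x in v2]
    for _ in range(max_iter):
        v2_old = v2
        dot = sum(a * c for a, c in zip(v2, v1))
        v2 = [a - dot * c for a, c in zip(v2, v1)]     # orthogonalize against v1
        v2 = [sum(b[i][j] * v2[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v2))
        v2 = [x / norm for x in v2]
        if abs(sum(a * c for a, c in zip(v2_old, v2)) - 1.0) < eps:
            break
    return v2


# Hypothetical example: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
fiedler = fiedler_vector(W)
side = [x > 0 for x in fiedler]
```

For this example the sign of each entry separates the two triangles, which is the minimum cut of the graph. Which triangle receives the positive sign depends on the initial guess.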

A drawback of the power iteration algorithm is its slow convergence rate. To accelerate

the convergence, a multi-grid algorithm is used. Instead of directly operating on the

largest Laplacian matrix, a series of coarsening operations is performed, until reaching a

minimal problem size. The coarsening algorithm is detailed in Section 3.4, Step 1. After

coarsening, the coarser problems are recursively solved and interpolated back, setting a

good initial guess for the next (finer) problem.

After computing the Fiedler vector v2, it is used to partition the graph. Each node in the graph has a corresponding value in v2. Unlike the discrete partitioning case, the values in v2 are continuous and range between -1 and 1. The value corresponding to a node determines the partition to which the node is assigned. The values in v2 are sorted from lowest to highest, which creates an ordering of the nodes of the graph. A set of k − 1 splitting values is determined by sampling the sorted vector at k − 1 uniformly spaced points. This splits the vector into k regions. A node is assigned to the partition whose region contains the node's value in v2.
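The splitting procedure can be sketched as follows. The exact sampling positions are not specified in the text; this sketch assumes that the splitting values are taken at ranks n/k, 2n/k, ... of the sorted vector, and the function name is hypothetical.

```python
from bisect import bisect_right


def partition_by_fiedler(values, k=3):
    """Assign each node to one of k parts from its Fiedler-vector value."""
    n = len(values)
    ordered = sorted(values)
    # k - 1 splitting values sampled at uniformly spaced ranks of the sorted vector.
    splits = [ordered[(i * n) // k] for i in range(1, k)]
    # A node's part is the index of the region its value falls into.
    return [bisect_right(splits, v) for v in values]


parts = partition_by_fiedler([0.3, -0.5, 0.0, 0.6, -0.4, -0.1], k=3)
```

With six values and k = 3, the two lowest values land in part 0, the two middle values in part 1 and the two highest in part 2, so the parts are of similar size.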

Since the graph is partitioned into more than two parts, some clusters may be disconnected. A post-processing stage that merges clusters is therefore performed: each cluster whose size is below a threshold is merged with its largest neighboring cluster. In our implementation, disconnected clusters smaller than $\frac{1}{9}$ of the graph are merged.

The partitioning algorithm continues repetitively, building finer and finer represen-

tations of the graph. The finer representations are then used in a multi-level scheme,

described in Section 3.4, to compute a globally pleasing layout of the original graph.

In our implementation, any eigenproblem of a size smaller than 128 nodes is directly solved, since coarsening it further is not time-effective. For each problem, a maximum of 10000 power iterations is allowed and an accuracy of $\varepsilon = 10^{-8}$ is used.

It should be noted that although the spectral partitioning algorithm was conceived to

split the graph into two partitions, we partition by default to three parts (k=3). This is a

heuristic that works well in practice and helps reduce the running time of the algorithm.

Our attempts to perform a more adaptive partitioning yielded lower-quality results.

3.4 Multi-level Layout Algorithm

Given an undirected weighted graph G = G0 = (V,E), the goal of the algorithm is

to compute a straight-line drawing of G, assigning 2D coordinates to each node. Our

algorithm is based on the force-directed approach [70, 113, 116, 199], which simulates a

system of forces defined on the input graph and converges towards a local minimum

energy position, starting from an initial placement of the vertices.

Our algorithm has several key ideas. First, a multi-level scheme is used to compute

the layout. Instead of directly computing a layout for the input graph, several coarsened

versions of it are created. Starting from the coarsest version, a series of increasingly

detailed layouts are computed. Care is taken to interpolate positions from each coarse

layout and use them as the starting point for the next finer layout.

Second, spectral partitioning methods are used to compute lower resolution represen-

tations of the graph, as discussed in Section 3.3. Using this approach the difficult graph

partitioning problem is transformed to a 1D partitioning problem. Breaking the graph

into increasingly finer parts allows us to produce a series of increasingly detailed graphs,

which are used in the multi-level scheme.

Third, a layout algorithm which combines the strengths of [70,113] is used. While [113]


is able to compute a good layout, given any starting point, it is time consuming. The

algorithm of [70] is faster and computes “smoother” layouts, but is more sensitive to the

initial conditions given to it. We propose an algorithm which combines the strengths of

both algorithms in order to produce the final layout.

The algorithm is composed of the following stages, shown in Figure 3.3. We elaborate on each stage below.

1. Initial coarsening: Given $G = G^0$, compute $G^1, G^2, \ldots, G^{coarsest}$, where $G^{k+1} = \mathrm{edge\_collapse}(G^k)$.

2. Partitioning initialization: set $P^{level=0}_{part\,num=0}$ to $G^{coarsest}$. Set $l = 0$.

3. Partitioning: try to partition each graph $P^l_n$. This creates a new set of graphs $P^{l+1}_0, P^{l+1}_1, \ldots$. If no graph $P^l_n$ could be partitioned, go to step 7.

4. Multi-level construction: construct $L^l$ out of $G^{coarsest}$, where each node in $L^l$ corresponds to a graph $P^l_n$.

5. Layout initialization: compute an initial layout for $L^l$, using interpolated initial positions from the coarser $L^{l-1}$.

6. Layout: compute the layout for $L^l$. This is the core step of the algorithm, which uses our variant of the force-directed approach. Set $l = l + 1$, go to step 3.

7. Compute a layout for $G^{coarsest}$ using interpolated initial positions from $L^{finest}$, the finest graph layout computed in step 6.

8. Final un-coarsening: Compute layouts for $G^{coarsest-1}, G^{coarsest-2}, \ldots, G^0$ by repeatedly interpolating from $G^i$ to $G^{i-1}$ and laying out $G^{i-1}$.

Figure 3.3: Algorithm overview

Initial coarsening (Step 1): In step 1, the graph is coarsened several times, as a

pre-processing stage that helps reduce computation time. At each level k, given a fine

graph Gk, a coarser representation Gk+1 is constructed using a series of edge collapse


operations [205]. A collapse operation replaces two connected nodes and the edge between

them by a single node, whose weight is the sum of the weights of the nodes being replaced.

The weights of the edges are updated accordingly. (The initial weight of a node/edge is

1.) The order of the edge collapse operations differs from that in [205]. First, candidate nodes for elimination are sorted by their degree, so as to eliminate low-degree nodes first. An adjacent edge of a low-degree node is then chosen for collapse by maximizing the following measure: $\frac{w(u,v)}{w(v)} + \frac{w(u,v)}{w(u)}$, where $w(x)$ is the weight of node $x$ and $w(x, y)$ is the weight of edge $(x, y)$. This function helps to preserve the topology of the graph by “uniformly” collapsing highly connected nodes.
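The edge selection rule can be illustrated with a small sketch. The dictionary-based graph representation, the function names and the example weights below are assumptions made for illustration; only the measure itself is taken from the text.

```python
def collapse_score(node_weight, edge_weight, u, v):
    """Measure w(u,v)/w(v) + w(u,v)/w(u) for the edge (u, v)."""
    w_uv = edge_weight[frozenset((u, v))]
    return w_uv / node_weight[v] + w_uv / node_weight[u]


def choose_collapse_edge(node_weight, edge_weight, adjacency, u):
    """Pick the neighbor v of u whose edge (u, v) maximizes the collapse measure."""
    return max(adjacency[u],
               key=lambda v: collapse_score(node_weight, edge_weight, u, v))


# Hypothetical state after some earlier collapses: node 'c' is already heavy.
node_weight = {"a": 1, "b": 1, "c": 4}
edge_weight = {frozenset(("a", "b")): 1, frozenset(("a", "c")): 1}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

best = choose_collapse_edge(node_weight, edge_weight, adjacency, "a")
```

Here the edge (a, b) scores 1/1 + 1/1 = 2, while (a, c) scores 1/4 + 1/1 = 1.25, so node a is merged with the light node b rather than with the already heavy node c. This matches the stated intent of collapsing nodes “uniformly”.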

In our implementation, three initial coarsening steps are performed. This significantly

reduces the computation time of spectral partitioning (Step 3), while maintaining a good

relation between the input graph G0 and Gcoarsest.

Partitioning initialization (Step 2): This step initializes the variables used in the

recursive partitioning of graph $G^{coarsest}$ in the next step. The graph $P^0_0$, which is set to $G^{coarsest}$, is created.

Partitioning (Step 3): The goal of this step is to create high quality coarser repre-

sentations of the graph Gcoarsest, which are used in the multi-level layout scheme.

Starting from the single graph $P^0_0$ at level 0, for each level $l$ the graphs $P^l_n$ in this level are partitioned as described in Section 3.3. Each graph $P^l_n$ is partitioned into graphs $P^{l+1}_m$, by adding the corresponding edges from $P^l_n$. As the level number $l$ increases, $G^{coarsest}$ is partitioned into a growing number of graphs decreasing in size.

Multi-level construction (Step 4): A series of graphs $L^0, L^1, \ldots, L^{finest}$ of increasing detail is created. At level $l$, the graph $L^l$ is created as follows. Each node $n_k$ in $L^l$ corresponds to a single graph $P^l_k$ in level $l$. The weight of a node $n_k$ in $L^l$ is the sum of the weights of the nodes in the graph $P^l_k$ it corresponds to. Edges $(n_k, n_j)$ in $L^l$ are created by summing the corresponding edges in $G^{coarsest}$ that connect the nodes in $G^{coarsest}$ corresponding to $P^l_k$ and $P^l_j$.
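The construction of a level graph amounts to summing node weights per part and summing the weights of edges that cross between parts. A minimal sketch follows; the dictionary-based inputs and all names are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_level_graph(node_weight, edges, part_of):
    """Build one level graph: one node per part, with summed node and edge weights.

    node_weight: {node: weight} for the graph being partitioned
    edges:       {(u, v): weight} undirected edges of that graph
    part_of:     {node: part index k} from the partitioning at this level
    """
    level_node_weight = defaultdict(float)
    for node, w in node_weight.items():
        level_node_weight[part_of[node]] += w
    level_edges = defaultdict(float)
    for (u, v), w in edges.items():
        pu, pv = part_of[u], part_of[v]
        if pu != pv:                       # edges inside one part do not appear
            level_edges[(min(pu, pv), max(pu, pv))] += w
    return dict(level_node_weight), dict(level_edges)


node_weight = {0: 1, 1: 1, 2: 2, 3: 1}
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 3, (2, 3): 1}
part_of = {0: 0, 1: 0, 2: 1, 3: 1}
level_nodes, level_edges = build_level_graph(node_weight, edges, part_of)
```

In this example the two parts receive weights 2 and 3, and the two crossing edges (1, 2) and (0, 2) are merged into a single edge of weight 4.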


Layout initialization (Step 5): The goal of this stage is to compute a good initial layout of $L^l$. This is done based on the layout of $L^{l-1}$, and proceeds as follows. Initially, each node $p_i \in L^l$ is placed at the position of its parent node in $L^{l-1}$, whose layout was already computed. Next, the position of each node is scaled, as follows:

$$p_i(x, y) = \sqrt{\frac{|V(L^l)|}{|V(L^{l-1})|}} \cdot p_i(x, y), \qquad (3.1)$$

where $V(L^k)$ is the set of nodes in $L^k$. The intuition behind Eq. 3.1 is that the scale should be proportional to the ratio between the number of nodes in the graphs $L^l$ and $L^{l-1}$. A square root is used since the area of the graph should be scaled linearly with the node ratio. Finally, an iterative algorithm is used to improve the placement. At each iteration, each node $i$ is placed at the average between its current position, $p_i$, and the average position of its neighbors, $N(i)$, as follows:

$$p_i = \frac{1}{2} \left( p_i + \frac{1}{\mathrm{degree}(i)} \sum_{j \in N(i)} p_j \right).$$

This procedure creates a good initial placement, which is used in the next step. In our

implementation 50 iterations are used.
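The initialization can be sketched as below. Two points are assumptions of this sketch and are not stated in the text: all nodes are updated simultaneously from the positions of the previous iteration, and isolated nodes are left in place. The function and parameter names are hypothetical.

```python
import math


def initial_layout(parent_of, parent_pos, neighbors, n_coarse, iterations=50):
    """Initial placement for a finer graph from the layout of the coarser one."""
    scale = math.sqrt(len(parent_of) / n_coarse)       # scaling of Eq. 3.1
    pos = {i: (scale * parent_pos[p][0], scale * parent_pos[p][1])
           for i, p in parent_of.items()}
    for _ in range(iterations):
        new_pos = {}
        for i, (x, y) in pos.items():
            nbrs = neighbors[i]
            if not nbrs:                               # isolated node: leave in place
                new_pos[i] = (x, y)
                continue
            ax = sum(pos[j][0] for j in nbrs) / len(nbrs)
            ay = sum(pos[j][1] for j in nbrs) / len(nbrs)
            new_pos[i] = (0.5 * (x + ax), 0.5 * (y + ay))
        pos = new_pos                                  # simultaneous update (assumption)
    return pos


# Hypothetical example: a path of 8 nodes whose two halves have parents A and B.
parent_pos = {"A": (0.0, 0.0), "B": (2.0, 0.0)}
parent_of = {i: ("A" if i < 4 else "B") for i in range(8)}
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}

start = initial_layout(parent_of, parent_pos, neighbors, n_coarse=2, iterations=0)
one_step = initial_layout(parent_of, parent_pos, neighbors, n_coarse=2, iterations=1)
```

With 8 fine nodes and 2 coarse nodes the scale factor is 2, so the children of B start at (4, 0). After one averaging iteration the two nodes next to the boundary, 3 and 4, have moved towards each other, to (1, 0) and (3, 0).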

Layout (Step 6): In this stage, a layout for Ll is computed, using our variant of the

force directed approach. This is done utilizing the multi-level scheme, until the final

layout of the finest graph, Lfinest, is computed. Using this scheme, it is possible to retain

important information about the overall structure of the graph from previous layouts,

which is extracted from the spectral partitioning of the graph.

There are two common approaches to performing force directed layout. The first common approach, exemplified by the Fruchterman-Reingold (FR) algorithm [70], computes the forces directly. Each node is moved according to the forces acting on it. It computes “smooth” layouts, but is sensitive to the initial conditions given to it. A

second common approach, used in the Kamada-Kawai (KK) algorithm [113], derives an

energy function from the forces and attempts to minimize the energy in order to create

the layout. The node that reduces the energy the most is moved in each step. This


algorithm is less sensitive to the initial conditions. However, it requires an expensive

all-pairs shortest path calculation and the computed layouts are less “smooth”.

In this chapter, an approach that combines the strengths of both algorithms is used.

The key idea is to use the KK approach, to give the overall structure of the graph and

reduce the sensitivity to initial conditions. Then, the computed layout is used as an input

to the FR-based algorithm. On finer graphs, only the faster FR layout is used. By doing

so, we get a good initial placement from the KK algorithm and a “smooth”, aesthetically

more pleasing layout from the FR algorithm. Note that a combined approach is used

in [97] in order to meet node-size constraints. In the current chapter, however, FR is

used to refine the layout of finer graphs in the multi-level hierarchy.

The most expensive step of the FR algorithm is the computation of all-pairs repulsive

forces between nodes, which is crucial for obtaining a good layout. This step is accelerated

in two ways. First, the graph is geometrically partitioned. Instead of calculating all-pairs

repulsive forces, as customary, approximate forces are calculated. An exact calculation

is performed only for nodes contained in the same partition, while an approximate cal-

culation is performed for nodes belonging to different partitions. Second, the calculation

of the forces is parallelized and performed on the GPU.

Graph Ll is now partitioned geometrically, according to the current layout, so as to

balance the number of nodes per partition. This is important in order to achieve good

load balance between the parallel processors of the GPU (Section 3.5). Moreover, since

the nodes in each partition are geometrically localized, it is possible to approximate the

partitions with a single “heavy” node, as discussed below.

Specifically, a KD-tree-type partitioning is created. The nodes are partitioned ac-

cording to their median, alternating between the X and Y coordinates. This recursive

subdivision terminates when the size of the subset is below the required partition size.
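A median-split subdivision of this kind can be sketched recursively. The sketch assumes that a subset is a leaf once it holds at most `max_size` nodes (with `max_size` of at least 1) and that the first split is along X; both choices, and the names used, are illustrative assumptions.

```python
def kd_partition(positions, max_size, axis=0):
    """Split node ids into geometrically local parts of at most max_size nodes.

    positions: {node: (x, y)}; max_size must be at least 1.
    """
    nodes = list(positions)
    if len(nodes) <= max_size:
        return [nodes]
    # Sort along the current axis and cut at the median.
    nodes.sort(key=lambda n: positions[n][axis])
    mid = len(nodes) // 2
    left = {n: positions[n] for n in nodes[:mid]}
    right = {n: positions[n] for n in nodes[mid:]}
    # Alternate between the X and Y coordinates on the way down.
    return (kd_partition(left, max_size, 1 - axis) +
            kd_partition(right, max_size, 1 - axis))


# Hypothetical layout: a 4 x 4 grid of 16 nodes, at most 4 nodes per partition.
grid = {4 * r + c: (float(c), float(r)) for r in range(4) for c in range(4)}
parts = kd_partition(grid, max_size=4)
```

Because every split is at the median, the resulting partitions differ in size by at most a small amount, which is the balance property that the GPU implementation relies on. For the grid above the result is four 2 x 2 blocks.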

The algorithm is iterative. In each iteration, the KD-tree is updated according to the current layout (when required). Then, the center of gravity is found for each partition and is used to replace the nodes it contains. Next, the forces applied to each node are computed. Finally, the nodes are displaced according to the forces acting on them, while bounding the allowed displacement according to an exponential convergence schedule, which resembles simulated annealing.

The key to achieving high performance is to perform these computations (i.e., finding


the center of gravity of the partitions, calculating the various forces acting on the nodes,

and calculating the displacements), in parallel on the GPU for each node/partition.

In particular, the repulsive and attractive forces that are computed in parallel for each

node are as follows. The difference from [70] is that the forces from distant partitions are

approximated using their center of gravity $CG$. For each node $v$ that belongs to partition $P_i$,

$$F^{repl}(v) = K^2 \left( \sum_{u \neq v,\, u \in P_i} \frac{pos(v) - pos(u)}{\|pos(v) - pos(u)\|^2} + \sum_{P_j \neq P_i} |P_j| \, \frac{pos(v) - CG(P_j)}{\|pos(v) - CG(P_j)\|^2} \right)$$

$$F^{attr}(v) = \sum_{u:(u,v) \in E} \frac{\|pos(u) - pos(v)\| \, (pos(u) - pos(v))}{K},$$

where $pos(u)$ is the 2D position vector of node $u$ and $CG(P_i)$ is the 2D position vector of the center of gravity of partition $P_i$.
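The two force expressions can be written out for a single node as a sequential sketch. On the GPU each node is handled by its own thread; the data layout and names below are hypothetical. The sketch also assumes that no two nodes, and no node and foreign center of gravity, share a position, since that would cause a division by zero.

```python
import math

K = 0.1  # constant from the text


def repulsive_force(v, pos, partitions, part_of):
    """Approximate F_repl(v): exact inside v's partition, centers of gravity elsewhere."""
    fx = fy = 0.0
    for idx, members in enumerate(partitions):
        if idx == part_of[v]:
            sources = [(pos[u], 1) for u in members if u != v]
        else:
            cg = (sum(pos[u][0] for u in members) / len(members),
                  sum(pos[u][1] for u in members) / len(members))
            sources = [(cg, len(members))]     # one "heavy" node of weight |P_j|
        for (sx, sy), weight in sources:
            dx, dy = pos[v][0] - sx, pos[v][1] - sy
            dist2 = dx * dx + dy * dy
            fx += weight * dx / dist2
            fy += weight * dy / dist2
    return (K * K * fx, K * K * fy)


def attractive_force(v, pos, neighbors):
    """F_attr(v): sum over edges (u, v) of ||d|| * d / K with d = pos(u) - pos(v)."""
    fx = fy = 0.0
    for u in neighbors[v]:
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        dist = math.hypot(dx, dy)
        fx += dist * dx / K
        fy += dist * dy / K
    return (fx, fy)


# Hypothetical example: four nodes on a line, in two partitions of two nodes each.
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (4.0, 0.0), 3: (6.0, 0.0)}
partitions = [[0, 1], [2, 3]]
part_of = {0: 0, 1: 0, 2: 1, 3: 1}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

f_repl = repulsive_force(0, pos, partitions, part_of)
f_attr = attractive_force(0, pos, neighbors)
```

For node 0, the node in its own partition contributes -1 and the other partition, represented by its center of gravity at (5, 0) with weight 2, contributes -0.4, giving a repulsive force of about -0.014 along X after scaling by $K^2$. The attractive force towards its single neighbor at distance 1 is 1/K = 10 along X.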

The attractive and repulsive forces are then summed up in parallel for every node, resulting in an approximation of the total force applied to each node, $F^{total}(v)$. Then, each node is displaced, in parallel, using a simulated annealing technique, which exponentially decreases the allowed displacement:

$$pos^{new}(v) = pos(v) + \frac{F^{total}(v)}{\|F^{total}(v)\|} \min\left(t, \|F^{total}(v)\|\right).$$

Here, $t$ is the bound for the maximum displacement, which is initialized to $K \cdot \sqrt{|V|}$ and decreases at each iteration by a factor $\lambda$. In our implementation, $K = 0.1$ and $\lambda = 0.9$.

This makes the scale of the graph proportional to the number of vertices it contains and

makes the annealing process stop after 50 iterations.
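The displacement rule and the cooling schedule can be sketched as follows. The function names are hypothetical, and leaving a node in place when the total force is exactly zero is an assumption added here to avoid a division by zero.

```python
import math


def displace(pos_v, force, t):
    """Move a node along its total force, by at most t."""
    magnitude = math.hypot(force[0], force[1])
    if magnitude == 0.0:
        return pos_v                      # no force, no movement (assumption)
    step = min(t, magnitude)
    return (pos_v[0] + force[0] / magnitude * step,
            pos_v[1] + force[1] / magnitude * step)


def temperature_schedule(num_nodes, k=0.1, lam=0.9, iterations=50):
    """Displacement bounds t for each iteration, starting at K * sqrt(|V|)."""
    t = k * math.sqrt(num_nodes)
    schedule = []
    for _ in range(iterations):
        schedule.append(t)
        t *= lam
    return schedule


schedule = temperature_schedule(num_nodes=10000)    # t starts at 0.1 * 100 = 10
moved = displace((0.0, 0.0), (30.0, 40.0), t=schedule[0])
```

In this example the force has magnitude 50, so the node moves only the allowed 10 units along the force direction. By the last of the 50 iterations the bound has shrunk to below 0.6% of its initial value, which is why later iterations make only local corrections.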

The simulated annealing technique makes the graph slowly freeze into position. Thus,

later iterations perform increasingly local corrections to the layout. Because of this be-

havior, it is possible to perform geometrical KD partitioning of the graph with decreasing

frequency.

In our implementation, re-partitioning is done on iterations 1-4 and then every 10

iterations. A total of 50 FR iterations are performed [205]. KK layout is performed on

graphs smaller than 1000 nodes. This constant was selected so the layout time will not

be dominated by KK layout which requires performing an expensive all-pairs shortest


path calculation. We use 2000 iterations in each KK layout.

Layout of $G^{coarsest}$ (Step 7): In this step, the layout of $L^{finest}$ is extended to a layout for $G^{coarsest}$. Here, the same method applied in Steps 4–6 is used. Instead of interpolating positions from $L^{i-1}$ to $L^i$, an initial placement for $G^{coarsest}$ is computed using the existing layout of $L^{finest}$. The mapping of nodes between $G^{coarsest}$ and $L^{finest}$ is performed similarly to Step 4: each graph $P^{finest}_n$ corresponds to several nodes in $G^{coarsest}$. After computing an initial placement for $G^{coarsest}$, layout proceeds as discussed in Steps 5–6.

Final un-coarsening (Step 8): This step extends the layout of Gcoarsest to a layout

of the original graph G = G0. In each iteration, the layout of Gi is used to compute an

initial placement for the nodes of the finer graph Gi−1, using the algorithm described in

Step 5. Then, the force directed algorithm of Step 6 is applied to the initial placement

of nodes in Gi−1.

In our implementation, we do not perform force directed layout of the final graph

G0, for which the layout is the most expensive. Instead, using the layout of G1 and the

interpolation algorithm for computing initial positions, we are able to get a good layout

for G0.

Complexity: The most time-consuming steps of the algorithm are spectral partitioning and the FR force directed layout. Assuming that each KD partition of the graph contains $C_s$ nodes, the asymptotic FR complexity is $O\left(|E| + |V| \cdot \left(C_s + \frac{|V|}{C_s}\right)\right)$, which is minimized to $O(|E| + |V|^{1.5})$ when $C_s = \sqrt{|V|}$. The spectral partitioning takes $O(|V|^{1.5})$ [184]. Therefore, the total complexity is $O(|E| + |V|^{1.5})$. When $|E| \approx |V|$, the dominating term is $|V|^{1.5}$. However, due to the calculation’s simplicity and its parallel implementation,

the actual running times are low, as discussed in Section 3.6.
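The choice $C_s = \sqrt{|V|}$ can be checked with a short calculation. The sketch below reads the cost expression as follows: each node interacts exactly with the $C_s$ nodes of its own partition and approximately with the $|V|/C_s$ partition centers. The function name $f$ is introduced here only for illustration.

```latex
f(C_s) = C_s + \frac{|V|}{C_s}, \qquad
f'(C_s) = 1 - \frac{|V|}{C_s^2} = 0 \;\Rightarrow\; C_s = \sqrt{|V|},
\qquad
|V| \cdot f\!\left(\sqrt{|V|}\right) = |V| \cdot 2\sqrt{|V|} = 2\,|V|^{1.5}.
```

Since $f''(C_s) = 2|V|/C_s^3 > 0$ for $C_s > 0$, this stationary point is a minimum, which gives the $O(|V|^{1.5})$ term of the repulsive force computation.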

3.5 GPU Implementation

This section describes how the GPU is utilized to accelerate the force-directed layout. It

elaborates on key details, which are briefly introduced in [66]. Figures that illustrate the

overall process are included.


The key to high performance on the GPU is using multiple processors, which operate

in parallel. The GPU schedules the execution of multiple threads, thus hiding memory

access latency. Each thread runs a small program called a kernel program, which computes

a single element of the output stream.

In the following, we first describe how the data is stored on the GPU and then how

the stream processing is performed [26].

Data Storage: On the GPU, input and output are represented as two-dimensional

arrays of data, called textures. The challenge is to map the graph and its elements onto

textures, even though graphs do not admit any intuitive and natural representation as

balanced arrays. Below, we describe the textures used to represent the graph.

To represent the graph layout, three textures are used: one texture for the nodes and

two textures for the edges.

The location texture holds the (x,y) positions of all the nodes in the graph. Each

graph node has a corresponding (u,v) index in the texture. As shown in Figure 3.4, the

nodes in each partition are stored at a rectangular region in the location texture. Recall

that Section 3.4 described how to partition a graph, so that the nodes in each partition are

geometrically close and the number of nodes in each partition is similar. This partitioning

is critical for the acceleration of the layout on the GPU for two reasons. First, storing

neighboring nodes (those that belong to the same partition) together maximizes memory

access locality. Thus, it makes efficient use of the GPU’s memory bandwidth, since

information regarding neighboring nodes will most likely reside in the cache. Second, since

the number of nodes in each partition is similar, the amount of computation performed on

each node is balanced. Thus, it makes efficient use of the GPU’s data parallel architecture,

which requires lock-step execution.

The location texture also holds the partition number of each node. Given a partition of maximum size $C_{sz}$, the height and width of each rectangular region representing a partition are set to $h_{partition} = \max(8, \sqrt{C_{sz}})$ and $\lceil C_{sz} / h_{partition} \rceil$, respectively.
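The region dimensions can be computed as in the following sketch. The text does not say how a non-integer square root is handled; rounding it up to a whole number of texels is an assumption of this sketch, as is the function name.

```python
import math


def partition_region_size(c_sz):
    """Height and width of the rectangle holding one partition of at most c_sz nodes."""
    # Rounding the square root up to a whole number of texels is an assumption.
    height = max(8, math.ceil(math.sqrt(c_sz)))
    width = math.ceil(c_sz / height)
    return height, width
```

For example, a partition of up to 100 nodes is stored in a 10 x 10 region, while a partition of up to 30 nodes uses the minimum height of 8 and a width of 4, leaving the last texels of the region unused.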

Graph edges are represented by a neighbors texture and by an adjacency texture, as

shown in Figure 3.5. The adjacency texture, whose size is O(|E|), contains lists of (u, v)

pointers into the location texture. These lists represent the neighbors of each node.

The neighbors texture holds for each node a pointer into the adjacency texture, to the coordinates of the first neighbor of the node. Pointers to additional neighboring nodes are stored in consecutive locations in the adjacency texture. Doing so improves access locality. The degree of each node is also stored in the neighbors texture. Its size is equal to that of the location texture.

Figure 3.4: Representing a graph on the GPU. Left: A graph spatially partitioned into partitions; right: a corresponding location texture

Figure 3.5: Representing graph edges on the GPU. Node X has three neighbors: Y, Z and W.
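The pair of edge textures behaves like a compressed adjacency list. The sketch below uses flat Python lists and one-dimensional indices instead of two-dimensional (u, v) texture coordinates, which is a simplification made for illustration; all names are hypothetical.

```python
def build_edge_arrays(adjacency):
    """Flatten per-node neighbor lists into (neighbors, adjacency_list) arrays.

    neighbors[i] = (offset of node i's first neighbor in adjacency_list, degree of i)
    """
    neighbors = []
    adjacency_list = []
    for node in range(len(adjacency)):
        neighbors.append((len(adjacency_list), len(adjacency[node])))
        adjacency_list.extend(adjacency[node])     # neighbors stored consecutively
    return neighbors, adjacency_list


def neighbors_of(node, neighbors, adjacency_list):
    """Follow the pointer of a node and read its degree-many consecutive entries."""
    offset, degree = neighbors[node]
    return adjacency_list[offset:offset + degree]


# Node X = 0 has three neighbors Y = 1, Z = 2 and W = 3, as in Figure 3.5.
graph = [[1, 2, 3], [0], [0], [0]]
neighbors, adjacency_list = build_edge_arrays(graph)
```

The adjacency array holds one entry per edge endpoint, which is the $O(|E|)$ size stated above, and the neighbors array holds one (pointer, degree) pair per node.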

The geometric (KD) partitions (described in Section 3.4, Step 6) are represented

using two textures: the partition information texture and the partition center of gravity

texture. The partition information texture holds the following information: (u0, v0) –

the coordinates in the location texture of the upper left corner of the partition, the width

and height of the partition rectangle in the location texture, the number of nodes in

the last row of the partition (which may be partially filled), and the number of nodes


in the partition. The partition center of gravity (C.G.) texture holds the current (x,y)

coordinates of the center of gravity of each partition. Two textures are used to represent

partitions not only because each texture is limited in the number of fields (to 4), but also

to separate between the constant information and the information modified during the

layout computation (i.e., center of gravity).

The forces computed during layout iteration are stored in two textures in a straightfor-

ward manner: the attractive force texture and the repulsive force texture. The attractive

force texture contains for each node the sum of the attractive forces F attr exerted on it by

its neighbors. The repulsive force texture holds the sum of repulsive forces, F repl: both

by nodes in the same partition and by the other partitions in the graph. Both textures

have the same dimensions as the location texture and contain the 2D components of the

forces, (Fx, Fy).

Stream processing: On the GPU, computation is performed by selecting the rendering

target, which is the stream, or the texture, to which the output should be written.

Next, an appropriate kernel program is loaded. Finally, graphics primitives such as

quadrilaterals, are rendered in order to invoke the computation. For each pixel in the

primitive (i.e., that the quadrilateral covers), the loaded kernel program is executed.

Below we describe the order of invocations of the kernel programs, and their input and

output textures. Figure 3.6 displays the execution graph of the algorithm.

The algorithm is composed of three main stages, each implemented as a separate foreach loop that is executed in parallel for all elements on the GPU. The first

loop calculates the center of gravity of each partition. The second loop calculates the

forces acting on each node. The third loop displaces nodes using simulated annealing.

The partition CG (center of gravity) kernel calculates the center of gravity of each

partition. The kernel reads information about each partition from the partition informa-

tion texture and from the location texture and writes its result into the partition center

of gravity texture. The GPU operates on all partitions in parallel.

The repulse kernel, which is the most time consuming kernel, calculates the repulsive

forces exerted on each node. The kernel reads information from the partition information,

the partition center of gravity, and the location textures. The output of the kernel is

written to the repulsive force texture. For each fragment, the kernel first calculates the internal forces (exerted by nodes contained in the partition that the node belongs to). Then, it approximates the forces exerted by all other partitions. Both of these calculations are performed using branching and looping instructions, in order to iterate over all other nodes in a partition and over all other partitions. Since the partitions are similarly sized, good branching consistency is maintained.

Figure 3.6: Execution graph of GPU layout (rectangles = streams, ovals = kernels)

The attract kernel calculates the attractive forces caused by graph edges. It reads

the neighbors, adjacency, and location textures and writes its output to the attractive

forces texture. For each node, the kernel accesses the neighbors texture in order to get

a pointer into the adjacency texture, which contains the (u,v) texture coordinates in the

location texture, of the node’s neighbors. For each neighboring node, the attractive force

is calculated and accumulated.

Finally, the anneal kernel calculates the total force on each node. It reads the at-

tractive force, repulsive force, and location textures and updates a second copy of the

location texture. This double-buffering technique is used due to the inability of the GPU

to read and write to the same stream. In the next iteration, the updated location texture

is bound as input to the different kernels, thus facilitating feedback in our computation.
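The double-buffering pattern can be illustrated on the CPU with two lists that swap roles after every iteration. This is a generic sketch of the technique, not the kernel code of the thesis; the update rule and all names are hypothetical.

```python
def run_iterations(positions, step, iterations):
    """Ping-pong between two buffers: read from one, write to the other, then swap."""
    read_buf = list(positions)
    write_buf = [None] * len(positions)
    for _ in range(iterations):
        for i in range(len(read_buf)):
            # Each output element may read any input element, never the output buffer.
            write_buf[i] = step(i, read_buf)
        read_buf, write_buf = write_buf, read_buf   # updated data becomes the next input
    return read_buf


# Hypothetical update rule: each value moves to the mean of itself and its left
# neighbor (the first element wraps around to the last one).
def mean_with_left(i, buf):
    return 0.5 * (buf[i] + buf[i - 1])


result = run_iterations([0.0, 4.0, 8.0], mean_with_left, iterations=1)
```

Because every element is computed from the old buffer only, the result is [4.0, 2.0, 6.0]. Updating a single buffer in place would instead let element 1 see the already updated element 0 and produce 4.0, which shows why the input and output streams must be separate.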

The anneal kernel also bounds the total displacement of each node according to the current temperature of the layout. This temperature exponentially decreases at every iteration, hence allowing the graph to “freeze” into its final layout.

graph          |V|      |E|      FM³ alg.   our alg.   our alg.     our alg.
                                 2.8GHz     3GHz       2.4GHz       2.4GHz Core 2 Duo
                                 Pentium    Pentium    Core 2 Duo   + 8800GTS GPU
flower B       9030     131241   11.9       3.25       2.21         1.59
4elt           14588    40176    N/A        8.094      4.973        3.237
crack          10240    30380    23.0       4.844      3.018        2.44
bcsstk31       35586    572913   83.6       25.329     14.199       5.754
bcsstk32       44609    985046   110.9      39.266     22.549       9.617
bcsstk33       8738     291583   23.8       5.141      2.986        2.486
fe pwt         36463    144794   69.0       22.985     13.48        5.44
finan512       74752    261120   158.2      79.268     43.645       12.267
fe ocean       143437   409593   355.9      158.849    86.32        15.536
Sierpinski 08  9843     19683    16.8       5.25       3.127        2.705

Table 3.1: Graph information and running time [sec.]. Runtime columns show total running times for computing a layout.

In total, the partition CG kernel performs O(|V|) operations; the repulse kernel performs O(|V|^1.5) operations; the attract kernel performs O(|E|) operations; and the anneal kernel O(|V|) operations. On the GPU, the computations executed in each kernel are run in parallel.
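The anneal step described earlier (sum the forces, bound the displacement by the temperature, write to a second buffer, then cool) can be sketched on the CPU as follows. This is an illustration under assumptions rather than the kernel used in the thesis: the displacement is taken to be the total force clamped to the temperature, and the decay factor `0.9` and the function names are hypothetical.

```python
import math

def anneal_step(locations, attractive, repulsive, temperature):
    """Return a NEW list of locations; the input list is left untouched,
    mirroring the double-buffered location texture."""
    updated = []
    for (x, y), (ax, ay), (rx, ry) in zip(locations, attractive, repulsive):
        fx, fy = ax + rx, ay + ry           # total force on the node
        norm = math.hypot(fx, fy)
        if norm > temperature:              # bound the displacement by the temperature
            scale = temperature / norm
            fx, fy = fx * scale, fy * scale
        updated.append((x + fx, y + fy))
    return updated

def cool(temperature, decay=0.9):
    """Exponential cooling: the temperature shrinks by a constant factor
    per iteration (the factor 0.9 is an assumed value)."""
    return temperature * decay
```

In a full iteration loop the list returned by `anneal_step` would become the input of the next pass, which plays the role of rebinding the updated location texture.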

3.6 Results

Our algorithm was tested on several well-known graphs, commonly used in the graph

drawing literature [204]. The bcsstk* graphs represent stiffness matrices. The Sierpinski

graph is a self-similar fractal composed of triangles. The finan512 graph is taken from

a linear programming matrix. The flower B graph is constructed by joining 6 circles of

length 50 at a single node before replacing each of the nodes by a complete subgraph

with 30 nodes (K30) [92]. The 4elt and crack graphs are 2D Finite–element meshes.

The fe * graphs are unstructured meshes related to fluid dynamics, structural mechanics,

or combinatorial optimization problems. Figures 3.7 - 3.11 show some of the layouts

computed by our algorithm, whereas Table 3.1 gives information about the graphs. Each

image is accompanied by a layout computed by another algorithm [75,92].

It can be seen that the layouts computed by our algorithm compare well with FM³ [91].



Figure 3.7: bcsstk31. Red: our layout, black: FM 3 layout

Figure 3.8: Sierpinski 08. Red: our layout, black: FM 3 layout



Figure 3.9: finan512. Red: our layout, black: FM 3 layout

Figure 3.10: flower B. Red: our layout, black: FM 3 layout



Figure 3.11: 4elt. Red: our layout, black: Kamada-Kawai layout

The bcsstk31 graph (Figure 3.7) has a high edge density: |E|/|V | = 16. Moreover, it

has a regular mesh-like structure. This regularity is extracted in our layout, as a result

of the good partitioning and interpolation of the graph. Figure 3.8 shows the Sierpinski

graph, which demonstrates that the symmetry of the graph is maintained, even though

the holes in the graph are challenging, compared to more uniform mesh graphs. Figure 3.9

demonstrates the layout of the topologically challenging finan512. It is of similar quality

to FM³ and better than the other algorithms compared in [92]. Figure 3.10 shows the

flower B graph, which has a relatively high edge density: |E|/|V | ≥ 14. Here, k = 6 is

used for partitioning the graph and KK layout is performed on graphs up to 128 nodes.

The 4elt graph, shown in Figure 3.11, exhibits large variations in node density and is thus

challenging for an algorithm that seeks to maintain equal edge lengths [205]. The layout

manages to show the interesting features of the graph – planarity and holes. Our layout

is more uniform and contains fewer overlaps than the Kamada-Kawai layout from [75].

For the performance tests, a PC equipped with a 2.4 GHz Intel Core 2 Duo CPU

and an NVIDIA 8800GTS GPU is used. Our algorithm was implemented in C++, Cg,

and OpenGL. Table 3.1 shows the running time of our algorithm when using only the



CPU and when using the GPU to accelerate the computation. It also shows the running times for the FM³ algorithm, produced on a 2.8 GHz Intel Pentium 4 CPU. In addition, it shows the running times of our algorithm on a slower machine (3.0 GHz Pentium 4), which is comparable to the machine used for the reported experiments of FM³ [92].

Compared to FM³ running on an older machine, our algorithm running on a new GPU-equipped machine achieves a speedup of up to a factor of 22. The GPU accelerates the total computation time by a factor of up to 5.5. Without the GPU, on comparable hardware, our algorithm runs 2-4 times faster than FM³.
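The speedup factors quoted above follow directly from the running times in Table 3.1. The short sketch below recomputes them for a few rows; the dictionary layout and the function name `speedups` are illustrative only, and the numbers are copied from the table.

```python
# Running times in seconds, copied from Table 3.1:
# (FM3 on 2.8 GHz Pentium 4, ours on 3 GHz Pentium 4,
#  ours on 2.4 GHz Core 2 Duo, ours on Core 2 Duo + 8800GTS GPU)
times = {
    "flower B": (11.9, 3.25, 2.21, 1.59),
    "bcsstk31": (83.6, 25.329, 14.199, 5.754),
    "finan512": (158.2, 79.268, 43.645, 12.267),
    "fe ocean": (355.9, 158.849, 86.32, 15.536),
}

def speedups(fm3, ours_p4, ours_c2d, ours_gpu):
    """Ratios behind the three claims in the text."""
    return {
        "fm3_vs_ours_gpu": fm3 / ours_gpu,         # old machine vs. GPU machine
        "ours_cpu_vs_ours_gpu": ours_c2d / ours_gpu,  # effect of the GPU alone
        "fm3_vs_ours_comparable_cpu": fm3 / ours_p4,  # comparable CPU hardware
    }

for name, row in times.items():
    print(name, {k: round(v, 2) for k, v in speedups(*row).items()})
```

For the largest graph in this subset, fe ocean, the first ratio is about 22.9 and the second about 5.6, which is where the "up to" figures in the text come from.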

3.7 Visualization of ISP Router Networks

We have applied our algorithm to the visualization of Internet Service Provider (ISP)

router networks. The router networks of ISPs are comprised of several points of presence

(POPs). In each POP, several routers are located. They are connected to the backbone

of the ISP and to routers connected to subscribers of the ISP. The data is taken from [2].

It was collected by using the traceroute tool to determine the route taken by packets

traversing the ISP’s network [186].

Figures 3.1 and 3.12 show layouts of the networks of several ISPs. Each node in the

graph corresponds to a router. Edges represent links between routers. Red nodes are not

associated with any ISP in the data – they are used to connect the ISP to the rest of the

Internet. The other nodes are color coded according to the ISP they belong to.

The layouts make evident some facts about these networks. First, most routers of

each ISP are clustered together. This can be seen from the large clusters of nodes having

the same color (excluding the red nodes). Second, two clusters are evident in Figure 3.12

– the brown cluster on the left, which represents an Australian ISP, and the rest of the

graph. The yellow and pink nodes represent European ISPs. The black and blue nodes

represent North American ISPs. The strongest connections exist between the two North

American ISPs. There are good connections between European and North American

ISPs. Connections between the Australian ISP and the other ISPs are sparser. Third,

the per-ISP clusters are further divided into small clusters of routers, perhaps in the

same city or nearby area. For instance, it can be seen that the brown routers belong to

a couple of clusters. Fourth, the red external routers, which do not belong to any ISP, are used to link to the external world (outside the ISPs visualized). Fifth, the number of external routers is about the same as the number of internal routers, hence each router has one link on average to the world outside the ISP it belongs to. Sixth, the routers have varying degrees. Some have high degree and are central points (such as the router connecting the brown ISP and the yellow ISP), while others have low degree.

Figure 3.12: ISP router map. Each node represents a router. Edges link routers. Red nodes are external to the ISPs visualized. Other nodes are colored according to the ISP they belong to: blue - Abovenet (US, 665 routers); black - Exodus (US, 554 routers); yellow - Ebone (Europe, 314 routers); pink - Tiscali (Europe, 514 routers); brown - Telstra (Australia, 3756 routers). A total of 10895 routers and 15667 connections are shown. Top left - GRIP layout. Bottom right - our layout.

Figure 3.12 also compares our layout to one computed by GRIP [72]. It can be

seen that GRIP’s layout does not display the overall, clustered structure of the graph.

Moreover, important edges, such as the ones connecting the brown cluster to the other

part of the graph, are not visible. However, the GRIP layout contains less overlap between

nodes. To compare the performance, both layouts were computed using only the CPU

on a 3GHz Pentium PC. Linux, required for GRIP, is not available on the PC with the

GPU. The running time of GRIP was 3 seconds and the running time of our algorithm

was 12 seconds. Trying to modify the parameters of GRIP resulted in a higher runtime,

but without an improvement in layout quality.

3.8 Conclusion and Future Work

This chapter has presented a new algorithm for multi-level force directed layout of graphs

on the GPU. The algorithm has several key ideas. First, the algorithm is multi-level and is based on spectral partitioning. Second, the algorithm combines the strengths of both the Kamada–Kawai and Fruchterman–Reingold approaches, in order to compute a good layout fast. Third, a geometric partitioning and interpolation method is proposed, which facilitates the generation of good initial layouts of the finer versions of the graph.

Moreover, the chapter has demonstrated how the GPU can be used to accelerate the

algorithm by a factor of up to 5.5 times compared to our CPU implementation.

Last but not least, it has been demonstrated that the algorithm computes meaningful high quality layouts, while requiring significantly lower running times than existing algorithms of similar quality. Moreover, the algorithm was applied to visualize ISP networks.

There are several avenues for future research. Using the stress majorization algorithm [74] can help improve the coarsest layout computed. Computing the Fiedler vector, which is used for the spectral partitioning, on the GPU can further accelerate the algorithm. This is a non-trivial task which requires sparse matrix multiplications, which are difficult to accelerate on the GPU. However, in our case we are tasked with computing many Fiedler vectors on different parts of the partitioned graph. Performing the computations in parallel on the GPU can help improve the results. Creating a more balanced graph hierarchy can help improve both runtime and layout quality. Currently, the algorithm does not attempt to balance the number of nodes in each part of the graph in the spectral partitioning phase. An improved graph partitioning algorithm, such as one based on [183], may further improve the layout quality. Finally, a better force approximation scheme may help improve the results.


Chapter 4

Uncluttering Graph Layouts Using Anisotropic Diffusion and Mass Transport

Many graph layouts include very dense areas, making the layout difficult to understand.

In this chapter, we propose a technique for modifying an existing layout in order to reduce

the clutter in dense areas. A physically-inspired evolution process, based on a modified

heat equation is used to create an improved layout density image, making better use of

available screen space. Using results from optimal mass transport problems, a warp to

the improved density image is computed. The graph nodes are displaced according to

the warp. The warp maintains the overall structure of the graph, thus preserving the

mental map, while reducing the clutter in dense areas of the layout. The complexity

of the algorithm depends mainly on the resolution of the image visualizing the graph

and is linear in the size of the graph. This allows scaling the computation according

to required running times. It is demonstrated how the algorithm can be significantly

accelerated using a graphics processing unit (GPU), resulting in the ability to handle

large graphs in a matter of seconds. Results on several layout algorithms and applications

are demonstrated. The material in this chapter is based on [69].

The rest of this chapter is structured as follows. Section 4.1 gives an introduction.

Related work is reviewed in Section 4.2. The algorithm is presented in Section 4.3. An

algorithm for the solution of mass transport problems and its connection to this chapter

is discussed in Section 4.4. Methods to accelerate the running time of the algorithm

on a GPU are presented in Section 4.5. Results are presented in Section 4.6. Finally,



Section 4.7 concludes.

4.1 Introduction

Graph layouts often contain a highly varying local density. While some regions in the

generated layouts are sparse or even empty, others are very dense, containing many close-

by or overlapping edges and nodes. This results in low efficiency in utilizing the available

screen space.

Instead of developing a new layout algorithm, this chapter describes an algorithm

that can improve a given graph layout. This allows the user to select a layout algorithm

that is suited for the application at hand. The clutter in the layout can then be reduced

by our algorithm, resulting in a layout with a smaller node density in the high-density

regions of the original layout. This is achieved while preserving the overall structure of

the graph. Figure 4.1(a) shows an example of a cluttered layout. The layout is difficult

to read and the available screen space is not used effectively. Figure 4.1(b) shows the

enhanced layout. Note how the screen space is more efficiently used, allowing more details

of the graph to become visible.

Some research has addressed the problem of reducing the visual clutter of graph

layouts in the past. Lyons et al. [138] use a combination of a Voronoi diagram and a

force-directed type approach [50, 70, 113] in order to disperse nodes clustered together.

Merrick and Gudmundsson [143] modify the layout based on properties of the structure

of the underlying graph. However, these algorithms employ schemes that are either

computationally expensive or perform local improvements to the graph. In contrast,

the algorithm in this chapter is able to operate on large graphs, making a more global

enhancement to the layout.

Instead of operating on the abstract graph representation, the algorithm proposed in

this chapter operates on an image of the density of the input layout. The density image

is modified, making use of low-density regions in order to reduce the visual complexity in

high-density regions of the layout. A physically-inspired evolution of the density image

using a modified heat diffusion process is used to create the target density image. Given

the target density, a warp of the 2D layout is computed, in which dense regions are allowed

to expand and make use of available screen space. The warp is computed using results



from optimal mass transport problems [10, 94, 115]. The evolution process attempts to

retain the overall structure of the input graph layout, thus preserving the user’s mental

map [145] of the layout.

This chapter makes a couple of contributions. First, a new algorithm for uncluttering

graph layouts in a mental-map preserving fashion is presented. Second, a method for

accelerating the computation of the target density, which is the most time-consuming

stage of the algorithm, using a graphics processing unit (GPU), is described. Several

examples, using various layout algorithms and applications, are provided to demonstrate

the capabilities of the algorithm.

(a) (b)

Figure 4.1: Protein graph (V=30727, E=1206654). (a) FM 3 [91] layout. (b) Improved

layout. Note how displacing nodes outwards allows more details to become visible, espe-

cially in the center of the drawing. Also note that the overall structure of the graph is

maintained.

4.2 Related Work

This work is related to three sub-fields: algorithms for graph uncluttering, node overlap

removal in graph drawing and overlap removal in areas outside of graph drawing. In this

section we discuss related work in these fields.



Several papers have addressed the graph uncluttering problem. Lyons et al. [138]

attempt to more evenly distribute the nodes while maintaining the user’s mental map of

the original layout. Two algorithms are presented. The first uses a Voronoi diagram in

order to move nodes. The second algorithm repositions nodes inside a region defined by

a Voronoi diagram, according to the forces acting on them, defined using a force-directed

approach [50,70,113]. Using a Voronoi diagram performs only local enhancements, which

may not be sufficient in order to reduce clutter in dense areas of the graph.

Merrick and Gudmundsson [143] propose a technique for enlarging dense areas of

a given graph layout and shrinking sparse areas. Their algorithm first determines the

important nodes, then calculates the desired edge lengths, and finally repositions vertices

using the algorithm of Shimizu and Inoue [185], which tries to minimize the change in

the angles of the edges. Determining the important nodes, called node centrality, is an

expensive operation, taking O(V · E) for V nodes and E edges. It is thus not scalable

to large graphs. Centrality is determined according to graph-theoretic properties of the

underlying graph, which do not take the actual layout into account. Therefore, the

algorithm is not effective at uncluttering dense areas of the graph with non-central nodes.

Our algorithm attempts to solve these problems.

There are two related, yet distinct, problems to graph uncluttering: graph overlap

removal and overlap removal in other fields such as map cartography. Hereafter we

describe some related work on these issues.

While most graph drawing algorithms assume that nodes are dimensionless (i.e.,

point-sized), in practice nodes may be labeled, and the labels may overlap. Several

algorithms have been developed to remove overlaps between nodes.

Chuang et al. [35] use potential fields in order to remove overlaps. Gansner and North [77] use an iterative Voronoi diagram method in order to tidy up the layout. Harel and Koren [97] use a combination of a Kamada–Kawai [113] method and a modified spring method, which takes node shapes into account when calculating forces in order to converge to an overlap-free layout. Marriott et al. [141] use a constrained optimization approach in order to remove overlaps. Eades and Nikolov [133] remove overlaps using spring algorithms, followed by displacement of nodes in a way that preserves the mental map as measured by the orthogonal node ordering model. Huang et al. [106] discuss the force-transfer algorithm, which pushes overlapping nodes away from each other. Dwyer et al. [49] use a constraint optimization problem for each dimension separately.

The graph uncluttering problem addressed in this chapter is different from the node

overlap removal problem. Overlap removal attempts to compute a minimal displacement

of nodes in order to avoid overlaps, but may result in graphs that are still difficult to

comprehend since they include very dense areas. Moreover, while the algorithms discussed

above deal with removing overlaps between a small number of large, labeled nodes, our

algorithm attempts to improve layouts of large, dense graphs in a mental-map preserving

fashion. Finally, graph uncluttering attempts to maintain the original structure of the

graph, while overlap removal does not necessarily have this aim.

Figure 4.2 shows a comparison between the results of using a node overlap removal

algorithm [77] and using our graph uncluttering algorithm. It can be seen that the overlap

removal algorithm not only modifies the structure of the graph, but also leaves some dense

areas (Figure 4.2(b)). Our uncluttering algorithm improves the layout in a mental-map

conserving manner by expanding the graph to empty regions (Figure 4.2(c)).

(a) Input layout (V=247, E=1230) (b) Removing node overlaps (c) Uncluttering using our algorithm

Figure 4.2: Comparison between node overlap removal and graph uncluttering. (a) is a

layout produced using neato [79] of a reduced version of the bcsstk32 graph from [204].

In (b) the node overlap removal algorithm from [77] is used. Note that although the

overlaps between nodes are eliminated, the structure of the graph is not maintained and

the center of the layout is cluttered. In (c) our algorithm is used. Note how the cluttered

right side of the input layout is expanded, thus increasing node separation, while the

structure of the graph is maintained.

Overlap removal problems also arise in fields other than graph drawing. Deussen et al. [45] present an extension of Lloyd’s method for distributing objects on the plane in order to create stipple drawings. Chan et al. [32] use a density constrained minimization formulation in order to compute overlap-free placements for components in integrated circuits. Hayashi et al. [101] present an O(n^2) algorithm for finding the minimum area layout of a set of n rectangles that avoids intersections and preserves the orthogonal ordering of the rectangles.

Map cartography attempts to create maps in which the size of regions is in proportion

to their population or some other analogous property. Gastner and Newman [82] perform

diffusion in order to create maps which have a uniform information density. There are

a couple of differences between their work and this chapter. First, in cartography an

attempt to conserve the area is made, while our algorithm tries to use sparse or empty

regions of the screen. Second, while in [82] isotropic diffusion is used, here anisotropic

diffusion is used in order to avoid ”collisions” between neighboring dense areas of the

graph.

4.3 The Algorithm

Given Linitial, which is a straight-edge layout of an un-directed graph G = (V,E), the

goal of the algorithm is to produce an enhanced layout Lfinal. This layout should make

better use of the available screen space by dispersing nodes from high density regions to

surrounding regions, while maintaining the structure of the original layout. The algorithm

utilizes several key ideas. First, for each pixel in the image of the layout, we compute

the density of the information it contains. Second, we perform an evolution process in

order to improve this density, making use of unused areas of the image and reducing

the density in congested areas. Third, a warp is computed between the initial and the

improved densities. This image warp is used to modify the graph layout in a mental-map

preserving way, resulting in an enhanced layout. Algorithm 1 gives an overview of the

steps of the algorithm. We elaborate on each of these steps below.

Computing the density image of the layout (Step 1): The first step of the

algorithm computes the density Dinitial of the given layout Linitial, as illustrated in Fig-

ure 4.3(a) and (c). The intensity of each pixel in the density image is proportional to the

number of graph elements that cover the pixel. Using the density image, the cluttered



Algorithm 1 Layout improvement algorithm

input: Linitial, layout of a graph G=(V,E)

output: Lfinal, modified layout of G

1. Compute Dinitial, the density image of the layout Linitial.

2. Calculate Dsmooth, a smoothed density image of Linitial, using the heat equation.

3. Calculate Dtarget, the target density image, using a modified heat evolution.

4. Calculate an optimal mapping u between Dsmooth and Dtarget.

5. Calculate Lfinal by displacing nodes according to the mapping u.

areas of the graph, which we wish to visualize more clearly, can be identified.

The density image can be computed using only the nodes or both the nodes and edges

of the graph. Our experiments indicate that using only the nodes produces better results.

This is since each edge has a rigid structure, while node concentrations consist of individ-

ual points which can be dispersed by our algorithm to generate a more understandable

layout. The resolution of the computed image is configurable by the user. While small

grids reduce the running time of the algorithm, the quality of the results can suffer, espe-

cially for large, dense graphs. In our experience, using a resolution of 257 by 257 pixels

gave good results at a reasonable running time for a large variety of graphs, and thus

was used as the default. (Note that the multigrid algorithm requires a resolution equal

to k · 2^m + 1 where k,m ∈ N (see Section 4.4) [24].)

In our implementation, the density is computed using OpenGL and the GPU. Since

we are interested in identifying areas where several graph elements (i.e. nodes) occupy

the same screen pixel (i.e. overlap), we use blending in order to accumulate the density.

This is achieved by using a rendering mode in which the color of different overlapping

rendered primitives is accumulated. Thus, pixels that contain more graph elements will

have a higher value in the density image. Anti-aliasing is used to render a smoother

image.
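The effect of additive blending can be mimicked on the CPU by counting how many nodes fall into each pixel. The sketch below is only a stand-in for the OpenGL rendering pass described above: it ignores anti-aliasing and edges, assumes node coordinates are already given in pixel units, and the name `density_image` is hypothetical.

```python
import math

def density_image(nodes, width, height):
    """Accumulate a node-only density image.

    nodes: list of (x, y) positions in pixel coordinates (assumed already scaled)
    Returns a height x width grid where each cell counts the nodes covering it.
    """
    image = [[0.0] * width for _ in range(height)]
    for x, y in nodes:
        col, row = math.floor(x), math.floor(y)
        if 0 <= col < width and 0 <= row < height:
            # Additive accumulation plays the role of blending:
            # overlapping elements raise the value of the pixel.
            image[row][col] += 1.0
    return image
```

With the default resolution mentioned in the text, the call would use `width = height = 257`.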

Note that in this chapter density images are used to compute an improved layout.

However, there can be other uses of density images. For instance, in [202] they have been



used to aid in visualization.

Smoothing the density image (Step 2): In this step the image Dinitial is smoothed in order to create the image Dsmooth. This is a pre-processing phase that creates a more suitable input for the warping algorithm in Step 4 and hence improves its numerical stability.

We base the smoothing algorithm on the heat equation [191]. This is a partial differ-

ential equation (PDE) that models the variation of the temperature in a region over time.

Intuitively, this PDE implies that the rate of change in temperature over time depends

on the temperature difference between a point and its neighbors. The PDE describes a

diffusion process that can be used for smoothing. In addition, it has the desirable prop-

erty that given a potentially discontinuous initial temperature, it very rapidly becomes

continuous.

Given a 2D domain Ω we define the temperature in each point in the domain as

u(x, y). The heat equation is

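\[
\frac{\partial u}{\partial t} = k\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) \equiv k\nabla^2 u, \tag{4.1}
\]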

where ∇2 is the Laplacian operator and k is a constant describing the rate of heat

diffusion. In our case, u(x, y) is set to the density Dinitial(x, y) computed in Step 1

and it is evolved to compute the smoother density Dsmooth(x, y). Appropriate boundary

conditions need to be set on the values of u. We define u = 0 on the boundary ∂Ω,

corresponding to setting a zero density at the boundary of the image of the layout.

To solve this equation numerically it is necessary to discretize the grid and use numerical approximations for derivatives [44]. This results in the following discrete approximation of Equation 4.1:

\[
\frac{u_{t+1}(i,j) - u_t(i,j)}{dt} = k\,\frac{u_t(i+1,j) - 2u_t(i,j) + u_t(i-1,j)}{(dx)^2} + k\,\frac{u_t(i,j+1) - 2u_t(i,j) + u_t(i,j-1)}{(dy)^2}, \tag{4.2}
\]

where u_t(i, j) is the value of the density at grid point (i, j) at time step t, dx and dy are the grid dimensions in the x and y directions, respectively, dt is the time step and u_0(x, y) = Dinitial(x, y). Thus, given the density at every grid point at time t we are able to compute the density at time t + 1. Figure 4.3 (c) and (d) shows the smoothing performed by the heat equation. The Laplacian operator on the right-hand side of Equation 4.2 can be represented by the following template [44]:

\[
\nabla^2 \approx \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \tag{4.3}
\]

which describes how the value at each grid point is updated, taking its neighbors into consideration.

Figure 4.3: Algorithm steps. (a) Input layout Linitial for the 3elt graph, V=4720, E=13722. (b) Output graph Lfinal. (c) Initial density Dinitial (Step 1). (d) Smoothed density Dsmooth (Step 2). (e) Target density Dtarget (Step 3). (f) x-component of the warp u (Step 4). (g) y-component of the warp u (Step 4). Higher intensity represents higher values. Values are scaled to improve contrast.

It should be noted that it is possible to perform the smoothing by performing a

convolution with the heat kernel. The iterative formulation discussed here serves as a

basis for the anisotropic case discussed in Step 3.

The algorithm uses several parameters. We use a square grid and therefore set dx = dy = 1. Using k = 1 in the heat equation results in a reasonable diffusion rate. In order to maintain numerical stability, it is required to have $dt \le \frac{1}{8}\,\frac{(dx)^2 + (dy)^2}{k}$ [34], which evaluates to dt ≤ 0.25 for these values. We use dt = 0.23. Thirty iterations of Equation 4.2 are run. This number represents a tradeoff.

If too few iterations are used, the smoothing will not be sufficient for Step 4. If too many

iterations are used, the image will be too smooth, potentially reducing the displacements

computed in Step 4.
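The smoothing step can be sketched as a direct transcription of Equation 4.2 with dx = dy = 1 and the zero boundary condition. This is a plain CPU illustration with the parameters quoted above (k = 1, dt = 0.23, thirty iterations); the function names `heat_step` and `smooth` are hypothetical and no claim is made that the thesis implementation is organized this way.

```python
def heat_step(u, dt=0.23, k=1.0):
    """One explicit time step of Equation 4.2 on a grid with dx = dy = 1."""
    rows, cols = len(u), len(u[0])
    new = [[0.0] * cols for _ in range(rows)]   # boundary pixels stay at u = 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Discrete Laplacian, i.e. the template of Matrix 4.3.
            laplacian = (u[i + 1][j] + u[i - 1][j]
                         + u[i][j + 1] + u[i][j - 1]
                         - 4.0 * u[i][j])
            new[i][j] = u[i][j] + dt * k * laplacian
    return new

def smooth(density, iterations=30, dt=0.23, k=1.0):
    """Evolve the initial density for a fixed number of steps."""
    rows, cols = len(density), len(density[0])
    # Copy the input and impose u = 0 on the image boundary.
    u = [[density[i][j] if 0 < i < rows - 1 and 0 < j < cols - 1 else 0.0
          for j in range(cols)] for i in range(rows)]
    for _ in range(iterations):
        u = heat_step(u, dt, k)
    return u
```

For a single unit spike, one step leaves 1 - 4 · 0.23 = 0.08 at the spike and moves 0.23 to each of its four neighbors, which shows why dt must not exceed 0.25: beyond that the center weight would become negative.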

Calculating the target density image (Step 3): Although the algorithm in

Step 2 has the advantage of creating a more uniform, evenly distributed density, it has

the disadvantage that the diffusion process takes into account only local properties of

the density, as governed by the heat equation. This is not desirable in our case since

it may lead to cases of ”collisions” between close-by high density regions. We would

like to take the topology of the given graph density into consideration when calculating

an alternative, more uniform density with lower maximal values, corresponding to a less

cluttered layout. The goal of this step is to compute Dtarget, which is an improved density

image, given Dinitial.

Creating an improved, shape-aware density image is achieved by modifying the evolution described by the heat equation (Equation 4.1). Instead of performing isotropic diffusion as governed by the discrete Laplacian operator (shown in Matrix 4.3), we modify the direction of the diffusion according to the shape of the density image. The diffusion is performed in a direction that makes use of empty and low-density regions of the image. This allows making more effective use of the screen space in the improved layout Lfinal.

To select the preferred direction θbest at each time step and for each pixel of the current

density image µ, a ray-shooting process is performed. For location (x, y) in the density

image, given a possible diffusion direction θ, we calculate the following score

\[
\mathrm{score}(x, y, \theta) = \int_{l=0}^{l=l_{max}} \mu(x + l\cos\theta,\; y + l\sin\theta)\, dl,
\]

where lmax corresponds to a point on the ray that is on the image boundary. The intuition

behind this formula is that we sum up the amount of material we encounter when traveling

in direction θ from (x, y) up to the boundary of the density image. In discrete form, the

score is

\[
\mathrm{score}(x, y, \theta) = \sum_{l=0}^{\lfloor l_{max} \rfloor} \mu(x + l\cos\theta,\; y + l\sin\theta). \tag{4.4}
\]

The final advancement direction is

\[
\theta_{best}(x, y) = \operatorname*{argmin}_{\theta \in [0, 2\pi]} \mathrm{score}(x, y, \theta), \tag{4.5}
\]

which corresponds to the direction in which the least amount of material is encountered,

hence making the best use of available screen space (since we disperse the material to the

emptiest regions).

Since there are potentially several nodes located in the same pixel of the density image

µ, it is required to use sub-pixel accuracy in the sampling performed in Equation 4.4.

This is efficiently handled by using bilinear interpolation for sampling µ. Using higher

fidelity kernels is also possible, but would result in a significant decrease in performance.
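The ray-shooting score of Equation 4.4 and the direction choice of Equation 4.5 can be sketched as follows. The sketch samples the density with bilinear interpolation at unit steps along the ray and stops at the image boundary. The 64 evenly spaced candidate angles follow the implementation note given later in this section, while the function names, the `mu[row][col]` indexing convention and the tie-breaking (first minimal angle wins) are assumptions.

```python
import math

def bilinear(mu, x, y):
    """Sample mu (indexed mu[row][col], i.e. mu[y][x]) at a real-valued point."""
    rows, cols = len(mu), len(mu[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = mu[y0][x0] * (1.0 - fx) + mu[y0][x1] * fx
    bottom = mu[y1][x0] * (1.0 - fx) + mu[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy

def score(mu, x, y, theta):
    """Discrete score of Equation 4.4: density summed along the ray from (x, y)."""
    rows, cols = len(mu), len(mu[0])
    total, step = 0.0, 0
    while True:
        sx = x + step * math.cos(theta)
        sy = y + step * math.sin(theta)
        if sx < 0 or sy < 0 or sx > cols - 1 or sy > rows - 1:
            break                      # the ray has left the image
        total += bilinear(mu, sx, sy)
        step += 1
    return total

def best_direction(mu, x, y, n_angles=64):
    """Equation 4.5 restricted to n_angles evenly spaced candidate directions."""
    angles = [2.0 * math.pi * a / n_angles for a in range(n_angles)]
    return min(angles, key=lambda theta: score(mu, x, y, theta))
```

Each call to `score` walks at most on the order of the image side length, which is the O(√P) per-pixel, per-direction cost used in the complexity analysis.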

Given θbest for every pixel in the current density image, we evolve the density according

to Equation 4.1, but replace the isotropic Laplacian operator in Matrix 4.3 with the

following anisotropic operator:

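\[
\nabla^2_{anisotropic} \approx \begin{bmatrix} 0 & 1 + \sin(\theta_{best}) & 0 \\ 1 + \cos(\theta_{best}) & -4 & 1 - \cos(\theta_{best}) \\ 0 & 1 - \sin(\theta_{best}) & 0 \end{bmatrix}. \tag{4.6}
\]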



The intuition behind this operator is that the averaging performed depends on the direction θbest, resulting in a new density that is biased in the required direction.
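A small sketch makes the structure of the anisotropic template concrete. It builds the 3 by 3 weights of Matrix 4.6 for a given θbest and applies them at one interior pixel. How the rows and columns of the template map to grid indices is an assumption here (top row applied to index i - 1, left column to index j - 1), and the function names are hypothetical.

```python
import math

def stencil_weights(theta_best):
    """The 3x3 template of Matrix 4.6 for one pixel."""
    s, c = math.sin(theta_best), math.cos(theta_best)
    return [
        [0.0, 1.0 + s, 0.0],
        [1.0 + c, -4.0, 1.0 - c],
        [0.0, 1.0 - s, 0.0],
    ]

def anisotropic_laplacian(u, i, j, theta_best):
    """Apply the template at interior pixel (i, j) of grid u.

    Assumed orientation: template row 0 multiplies u[i - 1][.],
    template column 0 multiplies u[.][j - 1].
    """
    w = stencil_weights(theta_best)
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            total += w[di + 1][dj + 1] * u[i + di][j + dj]
    return total
```

For every θbest the weights sum to zero, so a constant density is left unchanged, just as with the isotropic template of Matrix 4.3. Only the split between opposite neighbors changes with the direction.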

In summary, in this step, starting with µ = Dinitial, we iteratively compute Equa-

tion 4.5 and update µ using the anisotropic Laplacian Matrix 4.6, resulting in Dtarget.

In our implementation we calculate the best diffusion direction for 64 angles symmet-

rically distributed over the possible advancement directions (i.e. [0, 2π]). Five iterations

of the heat equation evolution (using Matrix 4.6) are performed between recalculations

of the best direction (Equation 4.5). This is a tradeoff between computation speed and

accuracy, which our experiments show produces good results. A total of 60 iterations of

the heat equation evolution are performed. This number is used in order to ensure that

the evolution of the target density Dtarget continues for more iterations than the evolution

of Dsmooth. Doing so allows the warp computed in Step 4 to expand the layout to unused

portions of the screen.

Computing an optimal warp (Step 4): After computing Dsmooth and Dtarget in

the previous steps, we are now ready to compute a warp u = (u1(x, y), u2(x, y)) that

maps location (x, y) in Dsmooth to location (u1(x, y), u2(x, y)) in Dtarget. Using u, we are

able to modify the layout, as discussed in Step 5, in order to compute Lfinal.

The warp procedure is based on the algorithm of Haker et al. [94], which is shown to

compute a warp that minimizes displacements. In our case this helps maintain the overall

structure of the graph, thus preserving the mental map. The key idea of the algorithm

is to iteratively converge to an optimal mapping by using a gradient descent technique.

More details are given in Section 4.4.

Computing the final layout (Step 5): In the final stage of the algorithm, the

positions of the nodes are modified in order to create the output layout Lfinal. Given the

optimal warping u = (u1(x, y), u2(x, y)) that was computed in Step 4, which is defined

over a discrete, regular grid, this step computes the updated positions of each node in

the graph, which are non-integral. Note that this stage modifies the node coordinates

and not the image of the layout.

The optimal warping u = (u1(x, y), u2(x, y)) gives for each pixel in the input density

a destination position in the image. Using the warp, new node positions are computed

using an iterative process. Given a node n with current position (xn, yn) (initialized to the node position in Linitial), its updated position is set to

$$
\begin{aligned}
x_n^{updated} &= x_n + \alpha\,\bigl(u_1(x_n, y_n) - x_n\bigr) \\
y_n^{updated} &= y_n + \alpha\,\bigl(u_2(x_n, y_n) - y_n\bigr).
\end{aligned}
\tag{4.7}
$$

The number of repetitions of Equation 4.7 is controlled by the user. Performing more

iterations results in a larger displacement, representing a tradeoff between node separation

and preserving the structure of the graph. The constant α, whose default value is 0.5, is

used to scale the displacement.

In order to compute the value of the functions u1 and u2 at the non-integral node

coordinates (xn, yn), bilinear interpolation is used. Using an interpolation method with

sub-pixel accuracy helps increase the separation between close-by nodes in the input

layout.
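A minimal sketch of this step is given below, assuming that node positions are expressed in the pixel coordinates of the warp grid and that `u1` and `u2` are stored as two arrays indexed `[row, column]`. The function names and the clamping at the image border are illustrative choices, not details taken from the thesis.

```python
import numpy as np


def bilinear(field: np.ndarray, x: float, y: float) -> float:
    """Sample field[row=y, col=x] at a non-integral position, clamped to the grid."""
    h, w = field.shape
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * field[y0, x0] + fx * field[y0, x1]
    bottom = (1 - fx) * field[y1, x0] + fx * field[y1, x1]
    return float((1 - fy) * top + fy * bottom)


def displace_nodes(positions, u1, u2, iterations=1, alpha=0.5):
    """Apply Equation 4.7 `iterations` times to an (N, 2) array of (x, y) positions."""
    pos = np.array(positions, dtype=float)
    for _ in range(iterations):
        for n, (xn, yn) in enumerate(pos):
            # Move each node a fraction alpha of the way towards its warped position.
            pos[n, 0] = xn + alpha * (bilinear(u1, xn, yn) - xn)
            pos[n, 1] = yn + alpha * (bilinear(u2, xn, yn) - yn)
    return pos
```

With an identity warp the nodes stay in place. With a warp that shifts every pixel by a constant amount, each repetition moves a node by α times that amount, which illustrates how the number of repetitions controls the total displacement.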

Complexity: Step 1 requires traversing the nodes and edges of the graph, which is

O(E + V ) for a graph with E edges and V nodes. In addition it requires rasterizing the

nodes and edges, which is performed quickly on the GPU. Step 2 performs a fixed number

of iterations, each of which takes O(P ) for an image containing P pixels. Step 3 uses

a fixed number of directions, each requiring $O(\sqrt{P})$ work for summing up the densities

along the ray emanating from each of the P pixels. The total here is $O(P^{1.5})$. As discussed

in Section 4.4, Step 4 requires O(P ). Finally, the last step is O(V ). Hence, the total

runtime is $O(E + V + P^{1.5})$. As shown in Section 4.6, it is dominated by the time spent

in Step 3, which can be controlled by changing P .
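As a rough illustration of the dominant term, take the image size listed in Table 4.1, where the density image is 257 pixels wide and high. The figures below are plain operation counts under that assumption and ignore constant factors such as the number of directions and iterations:

```latex
\[
P = 257^2 = 66{,}049, \qquad P^{1.5} = 257^3 = 16{,}974{,}593 \approx 1.7 \times 10^{7}.
\]
```

For comparison, the largest graph in Table 4.1 (protein) has E + V = 1,237,381, roughly an order of magnitude less than the image-based term.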

4.4 Computing an Optimal Mapping

In this section we describe a method, based on optimal mass transport, for finding a

mapping between the two density images Dsmooth and Dtarget in a way that minimizes

displacements, thus preserving the structure of the graph.

First, a brief introduction to the optimal mass transport problem, which was first

formulated by Monge in 1781 and later by Kantorovich [115], is provided. Next, the

application of this problem to improving graph layouts is discussed. The section con-

cludes by briefly describing how the mass-transport problem is efficiently solved using

the algorithm of Haker et al. [94].



Let Ω0 and Ω1 be two subdomains of R2, with smooth boundaries. Positive density

functions µ0(x, y) and µ1(x, y) are defined on these domains, respectively. We assume

that

$$
\iint_{\Omega_0} \mu_0(x, y)\, dx\, dy = \iint_{\Omega_1} \mu_1(x, y)\, dx\, dy, \tag{4.8}
$$

i.e. the same total mass is contained in both regions. In our case of density images of

graph layouts, we assume Ω0 = Ω1 = [0, 1]× [0, 1].

Our purpose is to construct a mapping between Dsmooth computed in Step 2 and

Dtarget computed in Step 3. Unlike the classical setting described above, the densities

used in our case can be zero in some regions of the image - the ones not occupied by the

input graph layout. We therefore equalize the mass (in order to ensure Equation 4.8 is

met) and add a constant ε to each of the input densities before computing the optimal

mapping u, using the following relations:

$$
\mu_0 = \varepsilon + D_{smooth}, \qquad
\mu_1 = \varepsilon + D_{target}\, \frac{\iint_{\Omega_0} D_{smooth}(x, y)\, dx\, dy}{\iint_{\Omega_1} D_{target}(x, y)\, dx\, dy}, \tag{4.9}
$$

where µ0, µ1 are the equalized and shifted densities which are used to compute the optimal

warp. In our implementation ε = 0.5.
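The equalization and shift of Equation 4.9 can be sketched on discrete density images as follows. The function name is hypothetical, pixel sums stand in for the double integrals, and the sketch assumes both images have the same size and that Dtarget has non-zero total mass.

```python
import numpy as np


def equalize_and_shift(d_smooth, d_target, eps=0.5):
    """Return (mu0, mu1) following Equation 4.9 for two same-sized density images."""
    d_smooth = np.asarray(d_smooth, dtype=float)
    d_target = np.asarray(d_target, dtype=float)
    # Discrete stand-ins for the two double integrals (the pixel area cancels in the ratio).
    scale = d_smooth.sum() / d_target.sum()
    mu0 = eps + d_smooth
    mu1 = eps + d_target * scale
    return mu0, mu1
```

Because both images cover the same domain, adding ε contributes the same mass to each, so the two outputs have equal total mass (Equation 4.8) and are strictly positive.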

Diffeomorphisms u = (u1(x, y), u2(x, y)) from Ω0 to Ω1, which map one density func-

tion to the other according to the following relation

$$
\mu_0(x, y) = |Du(x, y)|\, \mu_1(u(x, y)) \tag{4.10}
$$

are considered. Here Du is the Jacobian matrix and |Du| is its determinant [191].

Equation 4.10 is called the Mass Preservation (MP) property and accordingly u ∈ MP .

It implies, for example, that if a small region in Ω0 is mapped to a large region in Ω1,

there must be a corresponding decrease in density in order for the mass to be preserved.
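As a small worked example (an illustration, not taken from [94]), consider the uniform scaling u(x, y) = (2x, 2y), which maps every region of Ω0 to a region with four times the area:

```latex
\[
Du = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad |Du| = 4, \qquad
\mu_0(x, y) = 4\,\mu_1(2x, 2y) \;\Longrightarrow\; \mu_1(2x, 2y) = \tfrac{1}{4}\,\mu_0(x, y).
\]
```

The density drops by the same factor of four by which the area grows, so the mass of every small region is unchanged.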

Many mappings u that satisfy Equation 4.10 exist. We would like to choose an optimal

one for our application. We use the squared L2 Monge-Kantorovich distance, defined as

follows

$$
d_2^2(\mu_0, \mu_1) = \inf_{u \in MP} \iint \|u(x, y) - (x, y)\|^2\, \mu_0(x, y)\, dx\, dy. \tag{4.11}
$$



This distance places a penalty on the distance the map u moves each bit of material,

weighted by its mass. Hence, this distance fits our requirement of disturbing the input

graph layout as little as possible, in order to reduce changes to the structure of the layout,

thus conserving the user’s mental map.
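To make the penalty concrete, the sketch below evaluates the integral of Equation 4.11 for one given mapping on a discrete grid over the unit square. It computes the cost of a single candidate u, not the infimum over all mass-preserving maps. The function name and the pixel-centre sampling are assumptions made for illustration.

```python
import numpy as np


def transport_cost(u1, u2, mu0):
    """Discrete version of the integral in Equation 4.11 on the unit square.

    u1, u2 and mu0 are (H, W) arrays sampled at pixel centres; u1 and u2 hold the
    x and y components of the mapping in [0, 1] coordinates.
    """
    h, w = mu0.shape
    xs = (np.arange(w) + 0.5) / w
    ys = (np.arange(h) + 0.5) / h
    x, y = np.meshgrid(xs, ys)          # x varies along columns, y along rows
    sq_dist = (u1 - x) ** 2 + (u2 - y) ** 2
    return float(np.sum(sq_dist * mu0) / (h * w))   # 1 / (h * w) is the pixel area
```

The identity mapping has zero cost, while shifting every point by a distance d costs d² times the total mass, so maps that move dense regions far are penalized most.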

A fundamental theoretical result [10,22,119] states that there exists a unique optimal

mapping u that is a gradient of a convex function ω, i.e. u = ∇ω. In order to find

the optimal mapping u we use the algorithm of Haker et al. [94]. This algorithm has

two main stages. First, an initial mapping u0 is found. Next, the mapping is updated

iteratively in order to decrease the functional in Equation 4.11.

Finding an initial mapping is achieved by first solving a one-dimensional problem of

transporting mass in a direction parallel to the x-axis (Equation 4.12), followed by the

solution of a series of problems transporting mass parallel to the y-axis (Equation 4.13).

A function a = a(x) is implicitly defined by the equation

$$
\int_0^{a(x)} \int_0^1 \mu_1(\eta, y)\, dy\, d\eta = \int_0^x \int_0^1 \mu_0(\eta, y)\, dy\, d\eta. \tag{4.12}
$$

a(x) is determined by numerically calculating the integrals. Differentiating Equation 4.12

with respect to x gives

$$
a'(x) \int_0^1 \mu_1(a(x), y)\, dy = \int_0^1 \mu_0(x, y)\, dy.
$$

A function b = b(x, y) is now defined implicitly by the equation

$$
a'(x) \int_0^{b(x, y)} \mu_1(a(x), \rho)\, d\rho = \int_0^y \mu_0(x, \rho)\, d\rho. \tag{4.13}
$$

Given a(x), the function b(x, y) can be computed by numerically performing the integra-

tions in Equation 4.13. The initial mapping is set to be u0(x, y) = (a(x), b(x, y)).
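The x-direction stage (Equation 4.12) amounts to matching cumulative mass profiles. The sketch below is one possible discretization, assuming piecewise-constant densities on a regular grid and linear interpolation of the cumulative mass. The function name is hypothetical, and the y-direction stage (Equation 4.13), which repeats the same idea for each column, is omitted.

```python
import numpy as np


def initial_map_x(mu0, mu1):
    """Approximate a(x) from Equation 4.12 at the pixel edges x_k = k / W.

    mu0 and mu1 are (H, W) positive densities on the unit square with equal total mass.
    """
    h, w = mu0.shape
    edges = np.linspace(0.0, 1.0, w + 1)
    # Integrate over y first (column sums), then accumulate over x.
    cdf0 = np.concatenate(([0.0], np.cumsum(mu0.sum(axis=0))))
    cdf1 = np.concatenate(([0.0], np.cumsum(mu1.sum(axis=0))))
    # a(x) is the point where the cumulative mass of mu1 equals that of mu0 at x.
    return np.interp(cdf0, cdf1, edges)
```

The positivity of µ1, guaranteed here by the constant ε, makes the cumulative mass strictly increasing, so the inversion performed by `np.interp` is well defined.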

Considering u0 to be a vector field, the Helmholtz-Hodge decomposition [191] states

that u0 can be decomposed into the sum of a curl-free vector field ∇ω and a divergence

free vector field χ, i.e. u0 = ∇ω + χ. In the 2D case a divergence free vector field χ can

be written as χ = ∇⊥h for some scalar function h, where ⊥ represents rotation by 90°, so

$\nabla^{\perp} h = \left(-\frac{\partial h}{\partial y}, \frac{\partial h}{\partial x}\right)$. In this case the decomposition is u0 = ∇ω + ∇⊥h.
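That a field of the form ∇⊥h is divergence free can be checked directly, assuming h is twice continuously differentiable so that the mixed partial derivatives commute:

```latex
\[
\operatorname{div}\bigl(\nabla^{\perp} h\bigr)
= \frac{\partial}{\partial x}\!\left(-\frac{\partial h}{\partial y}\right)
+ \frac{\partial}{\partial y}\!\left(\frac{\partial h}{\partial x}\right)
= -\frac{\partial^2 h}{\partial x\,\partial y} + \frac{\partial^2 h}{\partial y\,\partial x} = 0.
\]
```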



In order to compute the optimal MP mapping u = ∇ω, the second step of the algo-

rithm removes the curl from u0. This is achieved by using an iterative gradient descent

method. In each iteration the current mapping u is modified in order to reduce the func-

tional in Equation 4.11. Note that at all stages, the mapping u is a valid solution to the

mass-transport problem. Setting u = ∇ω + ∇⊥f , f is found by solving the following

Poisson Equation with a Dirichlet-type boundary condition:

$$
\nabla^2 f = -\operatorname{div}(u^{\perp}), \qquad f = 0 \ \text{on } \partial\Omega_0. \tag{4.14}
$$

The boundary condition ensures that the mapping will remain constrained in the given

domain. It is shown in [94] that the functional in Equation 4.11 can be reduced by the

following evolution equation:

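$$
\frac{\partial u}{\partial t} = \frac{1}{\mu_0}\, Du\, \nabla^{\perp} f. \tag{4.15}
$$

The time step Δt is set as $\Delta t = \min_{x,i} \left\| \frac{1}{\mu_0} (\nabla^{\perp} f)_i \right\|^{-1}$, where the subscript i stands for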

the component of the vector. The algorithm iteratively solves the Poisson Equation and

updates the mapping u until the curl of u is below a given threshold. Our experiments

show that performing up to 30 iterations of Equation 4.15 is sufficient for obtaining a

high-quality warp.

A multi–grid method [24, 44] is used in order to quickly solve Equation 4.14. The

implementation uses the V-cycle algorithm to control the transition between grid levels,

Jacobi iterations for smoothing the solution and full weighting for downsampling solutions

between grids [24]. The complexity of the multi-grid method for an image containing P

pixels is O(P ), resulting in a rapid solution. Evaluating Equations 4.12, 4.13 and 4.15 takes time linear in the

image size. A fixed number of iterations of Equation 4.15 is performed. Hence, the total

complexity of this step in the algorithm is O(P ).
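For illustration, the sketch below applies plain Jacobi iterations to a Poisson equation with a zero Dirichlet boundary condition on a single grid. It shows only the smoothing component mentioned above. The V-cycle and the full-weighting transfer between grid levels are not reproduced, and the function name and grid conventions are assumptions.

```python
import numpy as np


def jacobi_poisson(rhs, iterations=2000):
    """Approximate the solution of laplace(f) = rhs on the unit square, f = 0 on the boundary.

    rhs is an (N, N) array sampled on a regular grid that includes the boundary.
    """
    n = rhs.shape[0]
    h2 = (1.0 / (n - 1)) ** 2
    f = np.zeros_like(rhs, dtype=float)
    for _ in range(iterations):
        new = f.copy()
        # 5-point stencil: each interior value becomes the neighbour average minus h^2 * rhs / 4.
        new[1:-1, 1:-1] = 0.25 * (
            f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:] - h2 * rhs[1:-1, 1:-1]
        )
        f = new   # boundary rows and columns stay at zero (Dirichlet condition)
    return f
```

On its own, Jacobi needs many sweeps because it damps smooth error components slowly. A multi-grid method applies only a few sweeps per level and removes the smooth error on coarser grids, which is what makes the O(P) bound possible.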

4.5 Implementation on the GPU

Computing the target density (Step 3) is the most time consuming stage of the algorithm

since we need to perform many computations for each pixel of the image. In this section



we describe how this step is implemented on the GPU, resulting in a significant speedup

of the running time of the algorithm, as shown in Section 4.6.

Please refer to Sections 1.3 and 2.2 for more information about accelerating compu-

tations using GPUs.

The GPU has several architectural characteristics that help improve the speed of

computation compared to the CPU. First, the GPU is highly parallel. It is able to run

hundreds of computational threads in parallel. In some cases, memory access latency

is hidden by switching to executing a different thread. Second, the GPU's memory system

is optimized for two-dimensional locality, as opposed to the one-dimensional locality

employed in CPUs. Our implementation on the GPU takes advantage of these properties.

Given the current density image µ as an input, the goal is to calculate for each pixel

the best advancement direction, θbest, as in Equation 4.5. This is done by finding for each

pixel the angle that minimizes the score in Equation 4.4.

Figure 4.4: Execution graph of finding the best advancement direction on the GPU in

Step 3 (rectangles = textures, ovals=kernels, θ is the current direction being tested)

Several textures, which are two-dimensional images or data arrays, are used to store

data on the GPU, as illustrated in Figure 4.4. The input density µ is stored in the density

texture. For each candidate direction θ, the current score for each pixel is stored in the

local metric texture. Two textures are used to store the current best angle for each pixel:



global metric #1 and global metric #2. We use two textures due to the GPU’s inability

to read and write to the same texture. At the end of the computation the global metric

texture holds the best advancement direction θbest for each pixel.

Computation on the GPU is achieved by running a kernel or fragment program for each

pixel in the image. The GPU is able to split the computation into hundreds of parallel

threads, thus achieving high performance. The computation, shown in Figure 4.4, is

performed using two kernels. The first kernel, called calc metric, calculates Equation 4.4

for each pixel in the image given the current direction θ. Given the coordinates of the

current pixel, lmax from Equation 4.4 is determined by calculating the closest intersection

of the ray in direction θ, starting at the current pixel, with a boundary of the image. Next,

the score is accumulated using Equation 4.4. During this process, bilinear interpolation

is used to access the density texture at non-integral coordinates.

The GPU is able to efficiently execute the calc metric kernel. For each direction θ,

when concurrently running the kernel on neighboring pixels, the accesses to the density

texture have 2D locality. This results in a good utilization of the caches and memory

bandwidth of the GPU, which are optimized for 2D operations.

A second kernel, the merge kernel, is used to update the current best advancement

direction per pixel. This kernel accepts as input the previous best direction, stored in

the global metric texture, and the value of the score calculated in the current direction

θ, stored in the local metric texture. The kernel compares the two scores and writes to

its output the merged best score. After iteratively running the calc metric and merge

kernels for the set of all candidate angles, the global metric texture contains the value of

the best angle θbest for each pixel.
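The two-kernel, multi-pass structure can be emulated on the CPU as sketched below. Equation 4.4 is not reproduced in this section, so `local_metric` uses a hypothetical stand-in score (the density accumulated along the ray until it leaves the image) with nearest-neighbour sampling instead of bilinear interpolation. Only the control flow, one metric pass and one merge pass per angle with two alternating buffers, is meant to mirror the description above.

```python
import numpy as np


def local_metric(density, theta):
    """Hypothetical score: density accumulated along the ray from each pixel in direction theta."""
    h, w = density.shape
    ys, xs = np.mgrid[0:h, 0:w]
    score = np.zeros((h, w))
    for step in range(1, int(np.hypot(h, w)) + 1):
        px = np.rint(xs + step * np.cos(theta)).astype(int)
        py = np.rint(ys + step * np.sin(theta)).astype(int)
        inside = (px >= 0) & (px < w) & (py >= 0) & (py < h)   # ray still in the image
        score[inside] += density[py[inside], px[inside]]
    return score


def best_directions(density, n_angles=64):
    """Multi-pass search: one local_metric pass and one merge pass per candidate angle."""
    shape = density.shape
    # Two (score, angle) buffers emulate the global metric #1 / #2 ping-pong textures.
    buffers = [(np.full(shape, np.inf), np.zeros(shape)) for _ in range(2)]
    src = 0
    for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        local = local_metric(density, theta)
        best_score, best_angle = buffers[src]            # read-only input
        better = local < best_score
        buffers[1 - src] = (                             # write-only output
            np.where(better, local, best_score),
            np.where(better, theta, best_angle),
        )
        src = 1 - src                                    # swap roles for the next angle
    return buffers[src]
```

The function returns the per-pixel minimum score together with the angle that achieved it. Each merge pass reads one buffer and writes the other, so no buffer is read and written in the same pass.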

It should be noted that it would be preferable to compute the best direction (Equation 4.5) in a

single pass, using the current density µ as the input and the best direction θbest(x, y) as

the output. This would remove the necessity for having a temporary texture for the local

result, performing the ping-pong algorithm between the two copies of the global metric

texture and running the merge kernel. However, in order to protect the system from fatal

errors, the graphics driver limits the amount of time a computational kernel is allowed to

run. The allotted time is insufficient to perform the computation in one pass, especially

in lower performance GPUs. Thus, we chose the multi-pass implementation discussed in

the previous paragraphs.



(a) Input FM³ layout (b) Removing node overlaps using [49] (c) Our improved layout

Figure 4.5: ug 380 graph (V=1104, E=3231). Note how when using our algorithm the

center expands, reducing node density while the outer ring is unchanged. When using [49]

the layout is hardly changed.

4.6 Results

Our algorithm was tested using the output of several state-of-the-art graph layout algo-

rithms in a variety of applications. Table 4.1 gives information about the graphs and the

parameters used in our algorithm. Below, we discuss the results of our algorithm and

compare them to the results obtained by the node overlap algorithm of Dwyer et al. [49].

Figures 4.1 and 4.5 show improvements of layouts computed by FM³ [91], which is

a multi-level force-directed algorithm. It uses solar systems, which consist of nodes at a

distance of two edges or less from the center of the solar system, in order to create the

graph hierarchy.

Figure 4.1 shows a layout of the protein graph, which is the unweighted version of

the protein homology graph presented in [3]. The layout contains a large, dense central

cluster. Applying our algorithm increases the percentage of screen space devoted to the

elements of the graph. This allows more of the fine details of the graph to become

visible, especially in the central region of the graph. Note how the overall structure of

the different elements of the graph, such as the different "spokes" it contains, is retained.

In comparison, the algorithm from [49] was not able to remove all of the overlaps and the

changes to the layout were small, similar to Figure 4.8 (b).

Figure 4.5 shows a layout of the ug 380 graph [1], which contains one node with a



very high degree. The layout contains a central core which is packed with many nodes.

In (b) the results of a node overlap removal algorithm [49] are shown. Since the input

layout contains hardly any overlaps, the result in (b) is very similar to (a) and the graph

remains cluttered. Applying our algorithm to this challenging case, shown in (c), results

in an increase in the radius of the central core, increasing the separation between the

nodes. The exterior nodes, which are sparser, are unaffected.

(a) Input layout by TopoLayout [7] (b) Removing node overlaps using [49] (c) Our improved layout

Figure 4.6: Add32 graph (V=4960, E=9462). Note how in (c) each of the rings is

expanded, showing more detail.

Figure 4.6 shows an improvement of the layout produced by TopoLayout, which is a

feature-based multi-level graph drawing algorithm [7]. It creates a subgraph hierarchy

by recursively detecting topological features in the graph and replacing them with meta-

nodes. Each feature is drawn using an algorithm tuned for the specific topology. The

graph hierarchy is drawn bottom-up using an area-aware algorithm. The figure shows

the add32 graph [204], which describes a 32-bit adder that contains many biconnected

components. In (b) the results of a node overlap removal algorithm [49] are shown.

Note that the structure of the input layout is significantly distorted, making it difficult

to comprehend the structure of the graph. Our improved layout, shown in (c), is able

to expand the circular clusters contained in the graph, better visualizing the intricate

details of the graph. For example, additional details about the composition of the inner

circle in the leftmost part of the graph become visible. Also, expanding the small circular

formation at the bottom right hand side of the graph allows more detail about the sub-



clusters it contains to become visible. Moreover, as opposed to (b), the layout in (c)

maintains the overall structure of the layout.

Figures 4.7 and 4.8 show improvements of the layouts produced by [65], which is a

multi-level force directed graph layout algorithm. Spectral partitioning is used to create

the graph hierarchy. KD-tree type partitioning is used to accelerate the computation and

allows for an efficient GPU implementation.

Figure 4.7 shows the ISP graph, which represents the router networks of several in-

ternet service providers (ISPs) [2]. In the layout, green, black and blue nodes represent

routers belonging to the ISPs visualized, while red nodes show other routers used to

connect to the Internet. The layout in (b), computed by the algorithm from [49], man-

ages to displace nodes in order to avoid overlaps, while generally maintaining the overall

structure of the graph. Unlike our algorithm, the resulting layout does not attempt to

make use of sparse regions of the layout. Instead, small displacements are used in order

to avoid overlaps. Applying our algorithm to this layout, as shown in (c), improves the

separation between the nodes of the graph, while maintaining important characteristics of

the graph, such as the separation into clusters (excluding the red nodes). This is especially

evident in the blue cluster at the bottom right and among the red nodes in the center

left part of the graph. Note how the algorithm is able to expand each of the clusters

into surrounding sparse areas, allowing more details to become visible inside the clusters,

while still preserving the overall clustered structure of the graph.

Figure 4.8 shows the bcsstk32 graph [204], which represents a stiffness matrix. It has

a very high edge density: E/V > 22. The layout in (b), computed by the algorithm

from [49], is nearly identical to the input layout. The algorithm is not able to remove all

of the overlaps of the graph, even when we change the size of the squares representing the

nodes. In (c) our uncluttering algorithm is used. It stretches the input layout, making

the mesh-like structure of the graph more evident. Note that the overall structure and

features of the graph are conserved after the uncluttering process. Also note that in

the improved layout there are fewer highly concentrated areas, where the edges are totally

hidden. This makes the mesh structure of the graph visible in a larger portion of the

layout.

For our performance tests, we used a PC running Windows XP equipped with 2GB RAM, an Intel Core 2 Duo E6750 2.66 GHz CPU and an NVIDIA 8800GTS GPU with 96 shader processors running at 1.2GHz. The algorithm was implemented using C++, OpenGL and Cg.

(a) Input layout from [65] (b) Removing node overlaps using [49] (c) Our improved layout

Figure 4.7: ISP router graph (V=5044, E=8043). Nodes are color-coded by the ISP they belong to. Note how in (c) the blue nodes are uncluttered.

(a) Input layout from [65] (b) Removing node overlaps using [49] (c) Our improved layout

Figure 4.8: Bcsstk32 graph (V=44609, E=985046). Note how in (c) reducing the node density allows more of the mesh structure of the graph to be uncovered in the top left, bottom and middle of the graph.

graph information             node overlap removal [49]   our algorithm
graph      V      E           CPU                         √P    ITRS   CPU   CPU+GPU
protein    30727  1206654     543                         257   8      643   6.62
add32      4960   9462        2.23                        257   15     641   4.86
bcsstk32   44609  985046      462                         257   4      642   5.84
ISP        5044   8043        0.9                         257   25     643   5.19
ug 380     1104   3231        0.03                        257   30     643   4.86

Table 4.1: Graph information and running times. The left side of the table gives information about the graphs. V and E are the number of graph nodes and edges, respectively. The central part of the table gives the running times in seconds of the algorithm from [49], using the same machine used to run our algorithm. The right side of the table shows the results of our algorithm. The width and height in pixels of the density image used is equal to √P. ITRS is the number of iterations of Equation 4.7 in Step 5. CPU is the total running time of the algorithm in seconds when using only the CPU. CPU+GPU is the total running time of the algorithm in seconds when using the GPU to accelerate Step 3.

Table 4.1 gives information about the graphs and the running times. It is evident that

the running time is relatively independent of the size of the graph and the number of

displacement iterations made. This is so since the bulk of the computation time is spent

working on the different images the algorithm operates on. More specifically, as can be

seen from comparing the CPU and CPU+GPU columns, most of the time is spent in

Step 3, which involves a computationally demanding ray-shooting process (Equations 4.4

and 4.5). Using the GPU results in a very large speedup of this step, accelerating the

total runtime by up to 130 times. This reduces the total runtime to a few seconds.
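The speedup figures follow directly from the two rightmost columns of Table 4.1. The short script below recomputes them; the timings are copied from the table, and the variable names are illustrative.

```python
# Running times in seconds copied from Table 4.1: (CPU only, CPU+GPU).
timings = {
    "protein": (643, 6.62),
    "add32": (641, 4.86),
    "bcsstk32": (642, 5.84),
    "ISP": (643, 5.19),
    "ug 380": (643, 4.86),
}

# Speedup of the total runtime obtained by moving Step 3 to the GPU.
speedups = {name: cpu / gpu for name, (cpu, gpu) in timings.items()}

for name, s in speedups.items():
    print(f"{name:9s} {s:6.1f}x")
```

The ratios range from about 97 (protein) to about 132 (ug 380), in line with the speedup of up to 130 times quoted above.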

Table 4.1 compares our running times to those of [49]. In the latter, there is a big

variation in the running time, since it depends on the number of overlaps. When there

are few overlaps (add32, ISP, ug 380), the algorithm runs quickly. Consequently, the

changes to the layout are small. In other cases (protein, bcsstk32), the running time is

higher. Due to the large variation in running times, in some cases it runs faster than our

GPU implementation while in others it runs slower.

There are several reasons why the GPU is able to accelerate Step 3 and therefore the



execution of the entire algorithm so significantly. First, since the amount of work per-

pixel is similar, there is good load balance between the different processors in the GPU.

Thus, the GPU is able to make efficient use of its computing power, which is much higher

than the CPU’s. Second, due to the 2D locality in the memory access pattern during

the ray-shooting process, the GPU is able to make efficient use of its caches. On the

CPU, however, accessing a 2D image requires lookups using pointers, which is inefficient.

Finally, as opposed to the CPU, the GPU contains built-in instructions for performing the

clamping operations needed for performing the interpolation of the values in the density

texture. In summary, this is a good example in which the architecture of the GPU is able

to provide a significant speedup compared to a CPU implementation.


This chapter proposes a new algorithm for reducing the cluttering commonly occurring

in graph layouts. Given any graph layout, the algorithm moves nodes to empty regions

of the screen in a mental-map preserving way.

The algorithm has several key ideas. First, the density image of the computed graph

layout is used to decide how nodes will be displaced. Second, a diffusion process that takes

the structure of the density image into account computes an alternative node distribution,

making better use of the available screen space. Third, an optimal and mental-map pre-

serving warp, based on results from mass-transport problems, determines how to displace

the nodes. Although the mathematical techniques used in this chapter require a great

deal of computation, the chapter demonstrates how improved layouts can be computed

in a matter of seconds, by using the GPU to significantly accelerate the algorithm.

It has been shown that our algorithm is able to improve layouts of large graphs,

produced by a variety of well-known algorithms.

There are several future research directions. First, more research into edge uncluttering

is required. Possible techniques include edge-bundling which is based on the actual layout

of the graph (as opposed to the graph-theoretical structure of the graph) and bending

some of the edges. Second, a model for edge repulsion can help better separate the edges,

improving the readability of the improved layout. Third, the algorithm can be integrated

into an interactive graph exploration system in which the areas to unclutter are selected



by the user. This will allow interactively expanding the current region of interest at the

expense of the other parts of the graph. Finally, the algorithm can be used to enhance

the visualization of changes in a dynamic graph sequence.

It may be possible to accelerate the algorithm further by moving more parts to the

GPU. These include the multi-grid solution of the Poisson equation [85] (Equation 4.14)

and the iterative mass-transport evolution [174] (Equation 4.15).




Chapter 5

Online Dynamic Graph Drawing

This chapter presents an algorithm for drawing a sequence of graphs online. The algo-

rithm strives to maintain the global structure of the graph and thus the user’s mental

map, while allowing arbitrary modifications between consecutive layouts. The algorithm

works online and uses various execution culling methods in order to reduce the layout time

and handle large dynamic graphs. Techniques for representing graphs on the GPU allow

a speedup by a factor of up to 17 compared to the CPU implementation. The scalability

of the algorithm across GPU generations is demonstrated. Applications of the algorithm

to the visualization of discussion threads in Internet sites and to the visualization of social

networks are provided. The material in this chapter is based on [66,68].

The rest of the chapter is organized as follows. Section 5.1 gives an introduction.

Section 5.2 discusses related work. Section 5.3 formally defines the problem and gives an

overview of key algorithm ideas. Section 5.4 presents the algorithm in detail. Section 5.5

discusses our implementation. Section 5.6 presents results. Section 5.7 discusses an ap-

plication to Internet discussion threads visualization. Section 5.8 presents an application

to the visualization of social networks. Section 5.9 concludes the chapter.

5.1 Introduction

Many applications require dynamic graph drawing, i.e., the ability to modify

the graph [47,116,155], as illustrated in Figure 5.1. Sample applications include financial

analysis, network visualization, security, social networks, and software visualization. The

challenge in dynamic graph drawing is to compute a new layout that is both aesthet-

ically pleasing as it stands and fits well into the sequence of drawings of the evolving



(a) (b) (c)

Figure 5.1: Snapshots from the threads1 graph sequence, visualizing discussion threads

at http://www.dailytech.com, left to right. Node labels in red show user names, edges

link users replying to posted comments. Up to 119 users are shown. Discussion topics,

marked as blue A n nodes, include GPUs (A 4864, A 4285), chipsets (A 4637, A 4425,

A 4538 and A 4866) and CPUs (A 4589). A total of 144 messages are visualized.

graph. The latter criterion has been termed preserving the mental map [145] or dynamic

stability [155].

Most existing algorithms address the problem of offline dynamic graph drawing, where

the entire sequence of graphs to be drawn is known in advance [47, 56, 128]. This gives

the layout algorithm information about future changes in the graph, which allows it to

optimize the layouts generated across the entire sequence. For instance, the algorithm

can leave space in order to accommodate a node that appears later in the sequence. In

contrast, very little research has addressed the problem of online dynamic graph drawing,

where the graph sequence to be laid out is not known in advance [63,132].

This chapter proposes an online algorithm for dynamic layout of graphs. It attempts

to maintain the user’s mental map, while computing fast layouts that take the global

graph structure into account. The algorithm, which is based on force directed layout

techniques, controls the displacement of nodes according to the structure and changes

performed on the graph. By taking special care in order to represent the graph in a GPU-

efficient manner, the algorithm is able to make use of the GPU to significantly accelerate

the layout.

This chapter makes the following contributions. First, a novel, efficient algorithm



for online dynamic graph drawing is presented. It spends most of the execution time

on the parts of the graph being modified. Second, it is shown how the heaviest part

of the algorithm, performing force directed layout, can be implemented in a manner

suitable for execution on the GPU. This allows us to significantly shorten the layout time.

For example, incremental drawing of a graph of 32,000 nodes takes 0.704 seconds per

layout. Finally, two information visualization applications of the algorithm are presented.

The first is the visualization of the evolution over time of discussion threads in Internet

sites. In this application, illustrated in Figure 5.1, nodes represent users and edges

represent messages sent between users in discussion forums. The second application is

the visualization of the growth of a social network, shown in Figure 5.9. Here, nodes

represent users and edges represent connections between friends.

5.2 Related Work

Several algorithms address the problem of offline dynamic graph drawing, where the entire

sequence is known in advance. In [47], a meta-graph, built using information from the

entire graph sequence, is used in order to maintain the mental map. In [128], a stratified,

abstracted version of the graph is used. The nodes are topologically sorted into a tree–like

structure (before layout) in order to expose interesting features. An offline force directed

algorithm is used in [56] in order to create 2D and 3D animations of evolving graphs.

Creating smooth animation between changing sequences of graphs is addressed in [19].

A few algorithms have been proposed to address the online dynamic graph drawing

problem, where the graph sequence is not known in advance. An approach based on

Bayesian networks is described in [20]. A cost function that takes both aesthetic and

stability considerations into account is defined in [132]. Unfortunately, computing this

function is very expensive (45 seconds for a 63 node graph). An algorithm for visualizing

dynamic social networks is discussed in [149]. Drawing constrained graphs has also been

addressed. Incremental drawing of DAGs (directed acyclic graphs) is discussed in [155].

In [63] dynamic drawing of clustered graphs is addressed. Dynamic drawing of orthogonal

and hierarchical graphs is discussed in [86]. The current chapter aims at producing online

layouts of general graphs efficiently.

In recent years, GPUs have been successfully applied to numerous problems outside



of classical computer graphics [163]. Protein folding [165] and simulation of deformable

bodies using mass-spring systems [83,198] are related to our application. However, while

the mass-spring algorithms take only nodes connected by edges into account, the force

directed algorithm considers all the nodes when calculating the force exerted on a node.

GPUs have also been used to simulate gravitational forces [157], where an approximate

force field is used to calculate forces. A GPU-based implementation of the MDS (multidi-

mensional scaling) algorithm is discussed in [189]. Accelerating static graph drawing on

the GPU has been addressed by several authors [8, 65, 93]. A GPU accelerated force di-

rected layout algorithm using an Euler method is presented in [8]. Although a very large

acceleration is achieved, the complexity of the underlying algorithm is $O(|E| + |V|^2)$ for

|E| edges and |V| nodes. Please refer to Sections 1.3 and 2.2 for more information about

accelerating computations using GPUs.

5.3 Overview

Given, online, a series of undirected graphs G0 = (V0, E0), G1 = (V1, E1), . . . , Gn =

(Vn, En), the goal of the algorithm is to produce a sequence of layouts L0, L1, . . . , Ln,

where Li is a straight-edge drawing of Gi. The updates Ui that can be performed between

successive graphs Gi−1 and Gi include adding or removing vertices and edges.

A key issue in dynamic graph drawing is the preservation of the mental map, i.e. the

stability of the layouts [145]. This is an important consideration since a user looking at a

graph drawing becomes gradually familiar with the structure of the graph. The quality of

the layout can be evaluated by measuring the movement of the nodes between successive

layouts, which should be small, especially in unchanged areas of the graph. In addition,

each layout in the sequence should satisfy the standard requirements from static graph

layouts, such as minimization of edge crossings, avoidance of node overlaps and layout

symmetry [116].

Among the different classes of graph drawing algorithms, the force directed algorithm

class [116, 199] is a natural choice in our case, for several reasons. First, different layout

criteria can be easily integrated into these algorithms. Second, in some of these algo-

rithms, it is possible to update node positions in parallel, thus making it possible to

efficiently employ the GPU’s parallel computation model. Finally, it is possible to use a



convergence scheme that resembles simulated annealing, in which nodes are slowly frozen

into position [70]. This is suitable for use in dynamic layout, where nodes have different

scales of movement.

Our algorithm utilizes several key ideas. In order to maintain the mental map, we

perform the following. First, nodes are initially placed using local graph properties and

information from the previous layout. Second, a movement flexibility degree is assigned

to each node, according to the changes in the graph. This allows the algorithm to “focus”

on nodes that may have large displacements. Third, an approach similar to simulated

annealing is used, where the graph slowly freezes into its final position. Fourth, the

changes between graphs are smoothly animated. In order to reduce the layout time while

maintaining layout quality, the graph is partitioned so that forces from distant nodes can

be approximated, and the GPU is used to accelerate the layout. Moreover, in order to

quickly compute aesthetic layouts, a multi-level force directed scheme is used.

5.4 Algorithm

Given a sequence of graphs G0, . . . Gn, our algorithm computes layouts L0, . . . Ln. This

section describes the algorithm in detail. We begin with describing how the online dy-

namic layouts Li, i ≥ 1 are computed, given Li−1 and Gi. Next, we discuss the algorithm

used to compute the initial layout L0.

5.4.1 Computing Dynamic Layouts

Given a set of undirected graphs G1, G2 . . . Gn, the goal of the dynamic algorithm is

to compute online layouts L1, L2, . . . Ln. Algorithm 2 is used to compute the layouts.

Figure 5.2 visualizes the main steps of the algorithm. We elaborate on these steps below.

Merging (Step 1): Computing a good initial position is vital for reducing the layout

time and maintaining dynamic stability [39,72]. The coordinates of nodes that exist both

in Gi−1 and in Gi are copied from Li−1. Nodes in Gi that do not exist in Gi−1 are assigned

coordinates while considering local graph properties, as follows.

Each unpositioned node v is examined in turn. Let PN(v) be the set of neighbors of

node v ∈ Vi that have already been assigned a position. If v has at least two positioned



Algorithm 2 Dynamic layout of graph Gi, i ≥ 1

input: Gi, Li−1 output: Li

1. Merging: Merge layout Li−1 and graph Gi to produce an initial layout.

2. Pinning: Assign pinning weights to the nodes, which control the allowed displace-

ment of each node.

3. Coarsening: Set $C^0 = G_i$. Compute $C^1, C^2, \ldots, C^{coarsest}$ where $C^{k+1} = \mathrm{edge\_collapse}(C^k)$. Set $l = coarsest$.

4. Compute a geometric partitioning of the nodes of $C^l$.

5. Perform incremental layout of $C^l$. If $l = 0$ goto step 7 and use the layout of $C^0$ as $L_i$ (the layout of $G_i$).

6. Interpolation: Update the initial layout of $C^{l-1}$ using the layout of $C^l$. Set $l = l - 1$, goto step 4.

7. Animation: Smoothly morph Li−1 into Li.

neighbors, v is placed at their weighted barycenter: $pos(v) = \frac{1}{|PN(v)|} \sum_{u \in PN(v)} pos(u)$. If v

has a single positioned neighbor, u, then v is positioned along the line between pos(u) and

the center of the bounding box of Li−1. This procedure is performed in a BFS (breadth–

first search) manner, starting from the positioned nodes. The nodes that cannot be placed

by this procedure are placed in a circle around the center of the bounding box of Li−1.

A positioning score Γ(v) ∈ [0, 1] is assigned to each node, based on the method used

to position it. These scores indicate the “confidence” in the node’s position. The higher

the positioning score, the better the initial placement is considered. The scores are used

to control the movement of nodes, as described in Step 2. The highest score is assigned

to nodes whose neighborhood has not changed between Gi−1 and Gi, since we are most

confident with their positions. A lower score is assigned to nodes that are positioned

according to two or more neighbors. An even lower score is assigned to nodes positioned

according to one neighbor. Finally, the lowest score is assigned to nodes for which no



good initial guess is known, and are therefore placed near the center of the bounding

box of the graph. In our implementation, scores of 1, 0.25, 0.1 and 0 are assigned to

nodes positioned according to their coordinates at Li−1, at the barycenter of two or more

neighbors, according to one neighbor (in a direction pointing away from the center of the

bounding box of the graph), and at the center of the bounding box of Li−1, respectively.

Figure 5.2 (b) shows an example of computing the positioning score Γ. Note that darker

nodes, with a lower Γ, are relatively localized. These changes are propagated to the rest

of the graph in the next step.
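The placement rules and positioning scores of the merging step can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the implementation used in this work: the function name `merge_layout`, the dictionary-based graph representation and the `step` parameter (the distance by which a node is offset from its single neighbor, and the radius of the fallback circle) are hypothetical, since the text does not specify these distances.

```python
import math
from collections import deque

def merge_layout(prev_pos, adj, step=1.0):
    """Place the nodes of G_i, reusing L_{i-1}; returns (pos, gamma).

    prev_pos: dict node -> (x, y), the previous layout L_{i-1}
    adj:      dict node -> set of neighbours in G_i
    step:     hypothetical offset/radius, not specified in the text
    """
    # Nodes present in both graphs keep their old coordinates (score 1).
    pos = {v: prev_pos[v] for v in adj if v in prev_pos}
    gamma = {v: 1.0 for v in pos}

    # Center of the bounding box of L_{i-1} (origin if L_{i-1} is empty).
    xs = [p[0] for p in prev_pos.values()] or [0.0]
    ys = [p[1] for p in prev_pos.values()] or [0.0]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2

    # BFS outward from the already positioned nodes.
    queue = deque(pos)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in pos:
                continue
            placed = [pos[w] for w in adj[v] if w in pos]
            if len(placed) >= 2:
                # Barycenter of the positioned neighbours PN(v).
                pos[v] = (sum(p[0] for p in placed) / len(placed),
                          sum(p[1] for p in placed) / len(placed))
                gamma[v] = 0.25
            else:
                # One positioned neighbour: offset away from the center.
                ux, uy = placed[0]
                dx, dy = ux - cx, uy - cy
                norm = math.hypot(dx, dy)
                if norm == 0.0:
                    dx, dy, norm = 1.0, 0.0, 1.0  # arbitrary direction
                pos[v] = (ux + step * dx / norm, uy + step * dy / norm)
                gamma[v] = 0.1
            queue.append(v)

    # Remaining nodes: circle around the bounding-box center (score 0).
    rest = [v for v in adj if v not in pos]
    for k, v in enumerate(rest):
        angle = 2 * math.pi * k / len(rest)
        pos[v] = (cx + step * math.cos(angle), cy + step * math.sin(angle))
        gamma[v] = 0.0
    return pos, gamma
```

The scores 1, 0.25, 0.1 and 0 are the ones quoted above; everything else about the data layout is illustrative.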

Pinning (Step 2): After all the nodes are placed, their pinning weights, wpin(v) ∈ [0, 1],

which reflect the stiffness in the positions of the nodes, are computed [20,63,128].

The position of a node with a pinning weight 1 is fixed during layout, while a node with

a pinning weight 0 is completely free to move during layout.

Pinning weights are assigned using two sweeps. The first sweep, which is local, uses

information regarding the positioning scores Γ of the node and its neighbors:

$w_{pin}(v) = \alpha \cdot \Gamma(v) + (1 - \alpha) \frac{1}{degree(v)} \sum_{u:(u,v) \in E} \Gamma(u).$

Taking the neighbors of v into account amounts to performing low pass filtering of the

pinning weights, according to graph connectivity information. This mimics the creation

of flexible ligaments in the graph around areas that were modified. Using a higher α

value will reduce the influence of the neighbors of a node on its displacement. In our

implementation α = 0.6.
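The first (local) sweep amounts to blending each node's score with the mean score of its neighbors. A minimal Python sketch follows; the function name `local_pinning` and the handling of isolated nodes (which simply reuse their own score) are assumptions made for illustration.

```python
def local_pinning(gamma, adj, alpha=0.6):
    """First sweep: w_pin(v) = alpha*Gamma(v) + (1-alpha)*mean(Gamma of neighbours).

    gamma: dict node -> positioning score in [0, 1]
    adj:   dict node -> set of neighbours
    """
    wpin = {}
    for v, nbrs in adj.items():
        if nbrs:
            nbr_mean = sum(gamma[u] for u in nbrs) / len(nbrs)
        else:
            nbr_mean = gamma[v]  # assumption: isolated nodes use their own score
        wpin[v] = alpha * gamma[v] + (1 - alpha) * nbr_mean
    return wpin
```

For a path a-b-c with scores 1, 1 and 0.25, node b receives 0.6 + 0.4 · 0.625 = 0.85, so the low score of c already softens its neighbor.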

In the second sweep, the local changes are propagated, in order to create a global

effect. A BFS-type algorithm assigns each node a distance-to-modification measure, as

follows. The distance-zero node set, D0, is defined as the union of the set of nodes with a

pinning weight of less than one and the set of nodes incident to an edge that was either

added to or removed from Gi−1. The distance-one set, D1, is defined as the subset of nodes

in V \D0 adjacent to a node in D0. In general, Di is the subset of nodes not yet marked,

which are adjacent to a node in Di−1. This process continues until all the nodes in V are

assigned to one of the sets D0, D1, · · · , Ddmax. Note that according to this definition, the

nodes in set Di, i ≥ 1 were assigned wpin ≡ 1 in the first sweep. In the second sweep, as

described below, some of these nodes are assigned a lower pinning weight. This gives the



Figure 5.2: Dynamic layout steps: (a) previous layout, Li−1; (b) merged graph (Step 1),

color coded according to the positioning score Γ(v). Brighter nodes have a higher Γ. Here,

nodes with Γ ∈ {0.1, 0.25, 1} are shown; (c) pinning weights wpin(v) (Step 2). Brighter

color corresponds to a higher wpin(v); (d) final layout (Step 5), color coded according to

the partitioning (Step 4).



layout algorithm more flexibility in adapting to changes in the graph.

Pinning weights are assigned to nodes based on their distance-to-modification. In

particular, nodes that are farther than some cutoff distance $d_{cutoff}$ are assigned a

pinning weight of one, thus remaining fixed, since they are far away from areas of the

graph that were changed. The movement of other nodes depends on the set Di they belong

to. This is done as follows. Given $d_{cutoff} = k \cdot d_{max}$, the nodes in Di, $i \in [1, d_{cutoff}]$,

are assigned pinning weights:

$w_{pin} = (w_{pin}^{initial})^{1 - \frac{i}{d_{cutoff}}}.$

This assignment creates a decaying effect in which nodes farther away from D0 are assigned

higher pinning weights. The constant $w_{pin}^{initial}$ is used to determine the decay in

pinning weight. The nodes in Dj+1 are assigned a pinning weight that is $(w_{pin}^{initial})^{-1/d_{cutoff}}$

times the pinning weight of nodes in Dj. Note that a larger k results in a more global

effect, possibly trading layout stability for better layout quality (since nodes are more

free to move). Setting a higher $w_{pin}^{initial}$ will make the graph more rigid, thus limiting the

displacement of nodes already existing in the previous layout. In our implementation

k = 0.5 and $w_{pin}^{initial} = 0.35$.
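The second (global) sweep can be sketched as a multi-source BFS followed by the exponential weight assignment. The sketch below is illustrative only: the name `propagate_pinning`, the `changed_nodes` argument (nodes incident to added or removed edges) and the treatment of nodes unreachable from D0 (kept fixed with weight 1) are assumptions. Nodes in D0 keep their first-sweep weight.

```python
from collections import deque

def propagate_pinning(wpin, adj, changed_nodes, k=0.5, w_initial=0.35):
    """Second sweep: assign weights by distance-to-modification.

    wpin:          dict node -> weight from the first sweep
    adj:           dict node -> set of neighbours
    changed_nodes: nodes incident to an added or removed edge
    """
    # D0: nodes with weight below one, plus nodes touching changed edges.
    d0 = {v for v, w in wpin.items() if w < 1.0} | set(changed_nodes)
    dist = {v: 0 for v in d0}
    queue = deque(d0)
    while queue:  # multi-source BFS yields the sets D1, D2, ...
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    d_max = max(dist.values(), default=0)
    d_cutoff = k * d_max

    out = dict(wpin)
    for v in wpin:
        i = dist.get(v)
        if i is None or i > d_cutoff:
            # Beyond the cutoff (or unreachable, by assumption): fixed,
            # except that D0 nodes always keep their first-sweep weight.
            if i != 0:
                out[v] = 1.0
        elif i >= 1:
            out[v] = w_initial ** (1 - i / d_cutoff)
    return out
```

With `w_initial = 0.35`, a node at distance `i = d_cutoff / 2` receives 0.35^0.5 ≈ 0.59, and a node exactly at the cutoff receives 0.35^0 = 1.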

Figure 5.2 (c) shows an example of computing the pinning weights. Note how the local

changes in (b) are propagated to a larger portion of the graph. Also note the decaying

effect as the distance from the modified part, in the middle of the graph, increases. This

reflects the requirement that nodes further from the changed areas should undergo fewer

modifications during layout.

While pinning weights were proposed in the past [128], the approach taken here is

different. In the current chapter pinning weights are used as part of setting the allowed

displacement of nodes, prior to computing the layout. This controls the movement flex-

ibility of each node. In [128], nodes are displaced according to a combination of two

different forces. The relative strength of the forces is determined by weights that are

modified as the layout iterations progress.

Coarsening (Step 3): In this step a series of reduced versions of the graph, which

include initial positions, are constructed. These are used to compute increasingly detailed

“skeletons” of the final layout. At each level, given a fine graph, a coarser representation



is constructed by performing a series of edge collapse operations. This is done by replacing

two connected nodes and the edge between them by a single node, whose weight is the

sum of the weights of the nodes being replaced. The pinning weight of the new node is

set to the geometric mean of the pinning weights of the replaced nodes. The new node is

placed at the weighted average position of the corresponding fine nodes, biased according

to their weights. The weights of the edges are updated accordingly. (The weight of a

node/edge in the finest graph is 1.)

The order of the edge collapse operations is determined as follows. First, the nodes that

are candidates for elimination are sorted by their degree (so as to eliminate low-degree

nodes first). An adjacent edge of an un-paired low-degree node is chosen for collapse by

maximizing the following measure: $\frac{w(u,v)}{w(v)} + \frac{w(u,v)}{w(u)}$, where w(x) is the weight of node x

and w(x, y) is the weight of edge (x, y). This function helps to preserve the topology of

the graph by “uniformly” collapsing highly connected nodes. Coarsening is used in [205],

where a different ordering of the edge collapse operations is used.
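One coarsening level, as described above, can be sketched in Python. This is a hedged illustration rather than the implementation used here: the name `coarsen_once`, the tuple-based identifiers of coarse nodes, and the rule for coarse edge weights (summing the weights of fine edges that connect the same pair of coarse nodes) are assumptions, since the text only states that edge weights are "updated accordingly". A simple graph without self-loops is assumed.

```python
import math

def coarsen_once(w_node, w_edge, pos, wpin):
    """Collapse a matching of edges; returns coarse graph data and the mapping.

    w_node: dict node -> weight         w_edge: dict (u, v) -> weight
    pos:    dict node -> (x, y)         wpin:   dict node -> pinning weight
    """
    adj = {v: {} for v in w_node}
    for (u, v), w in w_edge.items():
        adj[u][v] = w
        adj[v][u] = w

    parent = {}  # fine node -> coarse node (a tuple of fine nodes)
    for v in sorted(w_node, key=lambda n: len(adj[n])):  # low degree first
        if v in parent:
            continue
        cands = [u for u in adj[v] if u not in parent]
        if not cands:
            parent[v] = (v,)  # no unpaired neighbour: carried over as is
            continue
        # Maximize w(u,v)/w(v) + w(u,v)/w(u).
        u = max(cands, key=lambda n: adj[v][n] / w_node[v] + adj[v][n] / w_node[n])
        parent[v] = parent[u] = (v, u)

    c_node, c_pos, c_pin = {}, {}, {}
    for group in set(parent.values()):
        total = sum(w_node[x] for x in group)
        c_node[group] = total  # sum of the fine weights
        c_pos[group] = tuple(
            sum(w_node[x] * pos[x][d] for x in group) / total for d in (0, 1))
        # Geometric mean of the pinning weights.
        c_pin[group] = math.prod(wpin[x] for x in group) ** (1 / len(group))

    c_edge = {}
    for (u, v), w in w_edge.items():
        a, b = parent[u], parent[v]
        if a != b:  # assumption: parallel coarse edges are summed
            key = frozenset((a, b))
            c_edge[key] = c_edge.get(key, 0) + w
    return c_node, c_edge, c_pos, c_pin, parent
```

Calling `coarsen_once` repeatedly on its own output yields the sequence of coarser graphs; the returned `parent` map is what the interpolation step needs to relate fine nodes to coarse ones.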

In our implementation, the coarsening stops either when the graph is reduced to

several hundred nodes or after four coarsening steps. Coarsening further may lead to

diminishing results due to the inaccuracy in the computed pinning weights of the coarse

graph.

Geometric partitioning (Step 4): The partitioning step is used to accelerate the

layout step, discussed below. There are three requirements that should be satisfied by

partitioning. First, the partitions should be geometrically localized, i.e. the nodes in

each partition should be relatively close to each other. This will let us represent each

partition using a single “heavy” node. Second, the number of nodes in each partition

should be similar. This is important in order to achieve good load balance between the

parallel processors of the GPU, as discussed in Section 5.5. Third, the algorithm should

be fast.

We have chosen to use a KD-tree-type partitioning. The algorithm works top down.

Given the positions of all nodes, they are sorted according to the X coordinate and the

index of the median node is located. The nodes are partitioned into two sets: one with

indices below the median and one with indices greater than or equal to the median index.

The algorithm proceeds recursively with the two subsets. This time, sorting is performed



according to the Y coordinate. The algorithm alternates between computing the median

X and Y coordinates. The recursive subdivision terminates when the size of the subset

is below the required partition size. Figure 5.2 (d) shows an example of computing a

geometric partitioning of a graph.
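The recursive median split can be sketched in a few lines of Python. The function name `kd_partition` and the exact stopping rule (a subset is kept whole once it has at most `max_size` nodes) are assumptions; the text only says that recursion stops when the subset is below the required partition size.

```python
def kd_partition(nodes, pos, max_size, axis=0):
    """Split nodes at the median, alternating between X (axis 0) and Y (axis 1).

    nodes:    list of node ids
    pos:      dict node -> (x, y)
    max_size: largest allowed partition size (assumed stopping rule)
    """
    if len(nodes) <= max_size:
        return [list(nodes)]
    ordered = sorted(nodes, key=lambda v: pos[v][axis])
    mid = len(ordered) // 2          # index of the median node
    nxt = 1 - axis                   # alternate the splitting coordinate
    return (kd_partition(ordered[:mid], pos, max_size, nxt) +
            kd_partition(ordered[mid:], pos, max_size, nxt))
```

Because every split is at the median index, sibling partitions differ in size by at most one node, which is what gives the balanced load mentioned in the second requirement.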

Layout (Step 5): This step of the algorithm computes the layout. Our algorithm

builds on the basic Fruchterman-Reingold (FR) force directed algorithm [70], which is

modified, so as to make it suitable both for incremental layout and for efficient imple-

mentation on the GPU. The basic algorithm is thus modified in three ways. First, an

approximate force model is used in order to speed up the calculation. Second, node pinning

allows individual control over the movement of each node. Third, the algorithm is

reformulated in a manner suitable for efficient implementation on the GPU.

Figure 5.3 outlines our algorithm. The input is a graph G = (V,E) decomposed into

partitions Pi, nodes with initial placement pos(v), and their pinning weights wpin(v). The

output is the positions for all nodes. The key idea of the algorithm is to converge into a

minimal energy configuration, which usually leads to aesthetically pleasing layouts.

The initialization of the algorithm includes setting the optimal geometric node dis-

tance K (that affects the scale of the graph), the initial annealing temperature t, the

temperature decay constant λ, and the fraction of the iterations done fracdone ∈ [0, 1].

Partitioning is used to accelerate the algorithm. Instead of calculating all-pair repul-

sive forces, as is customary, approximate forces are calculated. An exact calculation is

performed only for nodes contained in the same partition, while an approximate calcu-

lation is performed for nodes belonging to different partitions. The center of gravity is

found for each partition Pi and is used to replace the nodes in Pi.

Our experiments show that there is flexibility in the number of nodes in each partition,

e.g. Figure 5.4 shows that using twenty times fewer nodes in each partition has little effect

on the final layout. Moreover, it is not necessary to re-partition at every iteration, except

for the initial iterations of the initial layout (Algorithm 3, Step 4), where the nodes may

have a high displacement. During the incremental layout, the merge stage (Algorithm 2,

Step 1) already gives a good approximation of the final layout. In cases where there

are large changes between consecutive graphs, performing several re-partitioning steps

may improve the results. These cases can be identified using the following formula:



$frac_{done} = 0$, $K = 0.1$, $t = K \cdot \sqrt{|V|}$, $\lambda = 0.9$
do iteration_count times,
  update partitioning (Alg. 2 Step 4, Alg. 3 Step 3) if required
  parallel foreach partition $P_i \in P$,
    (1) calculate partition center of gravity $CG(P_i) = \frac{\sum_{v \in P_i} pos(v)}{|P_i|}$
    parallel foreach node $v$, $v \in P_i$ where $frac_{done} > w_{pin}(v)$,
      (2) $F^{repl}_{int}(v) = \sum_{u \in P_i, u \neq v} K^2 \frac{pos(v) - pos(u)}{\|pos(v) - pos(u)\|^2}$
      (3) $F^{repl}_{ext}(v) = \sum_{P_j \in P, P_j \neq P_i} K^2 |P_j| \frac{pos(v) - CG(P_j)}{\|pos(v) - CG(P_j)\|^2}$
      (4) $F^{repl}_{tot}(v) = F^{repl}_{int}(v) + F^{repl}_{ext}(v)$
      (5) $F^{attr}(v) = \sum_{u:(u,v) \in E} \frac{\|pos(u) - pos(v)\| (pos(u) - pos(v))}{K}$
  parallel foreach node $v$ where $frac_{done} > w_{pin}(v)$,
    (6) $F^{total}(v) = F^{repl}_{tot}(v) + F^{attr}(v)$
    (7) $pos_{new}(v) = pos(v) + \frac{F^{total}(v)}{\|F^{total}(v)\|} \min(t, \|F^{total}(v)\|)$
  $t \leftarrow \lambda \cdot t$, $frac_{done} \leftarrow frac_{done} + iteration\_count^{-1}$

Figure 5.3: Parallel force directed layout algorithm

$\frac{1}{|V|} \sum_{v \in V} (1 - w_{pin}(v))$, whose value is proportional to the changes performed to the graph.

This is so since the number of iterations during which each node v moves, is proportional

to (1− wpin(v)) (see Figure 5.3).
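The change measure is simply the mean of (1 − wpin(v)) over all nodes. A small Python sketch follows; the helper `needs_repartitioning` and its `threshold` value are purely hypothetical, since the text does not state which value of the measure triggers additional re-partitioning.

```python
def change_fraction(wpin):
    """Mean of (1 - w_pin(v)) over all nodes; larger means more change."""
    return sum(1.0 - w for w in wpin.values()) / len(wpin)

def needs_repartitioning(wpin, threshold=0.25):
    """Hypothetical decision rule: re-partition when the change is large.

    The threshold is an illustrative assumption, not a value from the text.
    """
    return change_fraction(wpin) > threshold
```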

The key to efficient implementation of this algorithm on the GPU is deciding which

nodes will be processed by the parallel foreach loops. In order to reduce layout time and

maintain dynamic stability, only some of the nodes are displaced in each layout iteration.

For each node v, wpin(v) is compared to the current fraction of layout iterations done,

fracdone. Only nodes that satisfy fracdone > wpin(v) are processed. This makes it possible

to control the relative displacement of nodes. Nodes with a low pinning weight will be

displaced during more iterations of the algorithm. Thus, the pinning weight, assigned

according to the changes performed in the vicinity of each node, controls the stability of



Figure 5.4: Partition size effect on layout, graph bcsstk31, $|V| = 35588$, $|E| = 572916$: (a) $0.5\sqrt{|V|}$ partitions; (b) $10\sqrt{|V|}$ partitions.

node locations. Because the allowed displacement is decreased from one iteration to the

next, setting a higher pinning weight limits the total displacement of nodes.

Using this method, the algorithm spends computation time only on nodes which

should be displaced in each layout iteration. The amount of work done depends on the

changes performed to the graph. Areas which did not change are not processed, thereby

reducing the layout time. It is often possible to accelerate the incremental layout time

by a factor of two using this technique.

The algorithm computes the total force acting on each node in several steps. First,

the centers of gravity of all partitions are computed. Next, the set of active nodes, which

are allowed to be displaced in the current iteration, is determined. For each such node,

the repulsive forces $F^{repl}_{int}$, $F^{repl}_{ext}$ and the attractive force $F^{attr}$ acting on it, are calculated.

Finally, the nodes are displaced by an amount bounded by the current temperature of

the algorithm, which slowly decays, mimicking particles freezing into position.
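The steps of Figure 5.3 can be mirrored by a sequential, CPU-only Python sketch. It is meant only to make the control flow concrete and is not the GPU implementation: the function name `layout`, the dictionary-based data layout and the small constant used to avoid division by zero for coincident points are assumptions, and the "parallel foreach" loops are run one node at a time. Re-partitioning is omitted.

```python
import math

def layout(pos, adj, wpin, partitions, iterations=50, K=0.1, lam=0.9):
    """Sequential sketch of the partitioned force-directed iterations."""
    pos = dict(pos)
    t = K * math.sqrt(len(pos))          # initial temperature
    frac_done = 0.0
    part_of = {v: i for i, p in enumerate(partitions) for v in p}
    for _ in range(iterations):
        # (1) center of gravity of every partition
        cg = [(sum(pos[v][0] for v in p) / len(p),
               sum(pos[v][1] for v in p) / len(p)) for p in partitions]
        new_pos = dict(pos)              # double buffering, as on the GPU
        for v in pos:
            if not frac_done > wpin[v]:  # node is still pinned
                continue
            x, y = pos[v]
            fx = fy = 0.0
            i = part_of[v]
            # (2) exact repulsion from nodes in the same partition
            for u in partitions[i]:
                if u != v:
                    dx, dy = x - pos[u][0], y - pos[u][1]
                    d2 = dx * dx + dy * dy or 1e-12
                    fx += K * K * dx / d2
                    fy += K * K * dy / d2
            # (3) approximate repulsion from the other partitions
            for j, p in enumerate(partitions):
                if j != i:
                    dx, dy = x - cg[j][0], y - cg[j][1]
                    d2 = dx * dx + dy * dy or 1e-12
                    fx += K * K * len(p) * dx / d2
                    fy += K * K * len(p) * dy / d2
            # (5) attraction along edges
            for u in adj[v]:
                dx, dy = pos[u][0] - x, pos[u][1] - y
                d = math.hypot(dx, dy)
                fx += d * dx / K
                fy += d * dy / K
            # (6), (7) displacement bounded by the temperature
            f = math.hypot(fx, fy)
            if f > 0.0:
                s = min(t, f) / f
                new_pos[v] = (x + fx * s, y + fy * s)
        pos = new_pos
        t *= lam
        frac_done += 1.0 / iterations
    return pos
```

Note that a node with wpin = 1 never satisfies `frac_done > wpin`, so it stays fixed for the whole run, while a node with wpin = 0 starts moving from the second iteration onward.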

Interpolation (Step 6): In this stage the computed layout of graph $C^l$ is interpolated

and used to update the initial layout of the higher-resolution graph $C^{l-1}$. Given a

node $v \in C^{l-1}$, which was mapped to node $p \in C^l$, node v is displaced by the following

amount:

$(1 - w_{pin}(v)) \, \frac{A(Bbox_{old}(C^l))}{A(Bbox_{new}(C^l))} \, (pos_{new}(p) - pos_{old}(p)),$

where $A(Bbox_{old}(C^l))$ is the area of the bounding box of graph $C^l$ computed during the

coarsening step, $A(Bbox_{new}(C^l))$ is the area computed during the layout step, $pos_{old}(p)$ is

the position of node p computed during the coarsening step and $pos_{new}(p)$ is the position

of p computed during the layout step. The motivation for using this formula is as follows.

The amount 1 − wpin(v) is used to displace nodes according to their pinning weights.

Nodes with a higher pinning weight are allowed a smaller displacement. Doing so helps

maintain the stability of the graph. Nodes with a lower pinning weight are allowed

greater flexibility in order to compute a high-quality layout. The displacement is scaled

according to the change in the area of the coarser $C^l$ due to the layout step. Finally, node

v is displaced according to the movement of the corresponding lower-resolution node p.
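The interpolation rule can be sketched as follows. The names `bbox_area` and `interpolate`, and the `parent` dictionary mapping each fine node to its coarse node, are illustrative assumptions; the sketch also assumes that both bounding boxes have non-zero area.

```python
def bbox_area(positions):
    """Area of the axis-aligned bounding box of a collection of (x, y) points."""
    pts = list(positions)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def interpolate(fine_pos, fine_wpin, parent, coarse_old, coarse_new):
    """Displace each fine node by its coarse parent's movement.

    The displacement is scaled by (1 - w_pin(v)) and by the ratio of the
    bounding-box areas of the coarse graph before and after its layout.
    """
    scale = bbox_area(coarse_old.values()) / bbox_area(coarse_new.values())
    out = {}
    for v, (x, y) in fine_pos.items():
        p = parent[v]
        dx = coarse_new[p][0] - coarse_old[p][0]
        dy = coarse_new[p][1] - coarse_old[p][1]
        f = (1.0 - fine_wpin[v]) * scale
        out[v] = (x + f * dx, y + f * dy)
    return out
```

A fully pinned node (wpin = 1) is left where it is, regardless of how far its coarse parent moved.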

Morphing (Step 7): The old layout Li−1 is morphed into the new layout Li. The

animation, showing a gradual change, helps the user maintain the mental map of the

graph. Node positions are linearly interpolated. Removed nodes and edges fade out,

then the nodes and edges move to their new position and finally added nodes and edges

fade into view.
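The movement phase of the animation is a plain linear interpolation of node positions. A minimal sketch, assuming a hypothetical helper `morph_positions` and an animation parameter `s` that runs from 0 to 1; the fade-out and fade-in phases for removed and added elements are not shown.

```python
def morph_positions(old_pos, new_pos, s):
    """Positions at animation parameter s in [0, 1] for nodes in both layouts."""
    return {v: ((1 - s) * old_pos[v][0] + s * new_pos[v][0],
                (1 - s) * old_pos[v][1] + s * new_pos[v][1])
            for v in old_pos if v in new_pos}
```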

Complexity: The asymptotic complexity of the merging, pinning, coarsening and

interpolation steps is $O(|E| + |V|)$. The complexity of the partitioning step is $O(|V| \cdot \log |V|)$:

finding the median is linear at each level in the partition tree, which contains

$O(\log |V|)$ levels. Assuming that each partition contains $C_s$ nodes, the running time of

each layout iteration is $O(|E| + |V| \cdot (C_s + \frac{|V|}{C_s}))$. This expression is minimized when

$C_s = \sqrt{|V|}$, resulting in a total complexity of $O(|E| + |V|^{1.5})$. When $|E| \approx |V|$, the

dominating term is $|V|^{1.5}$. Although this may look relatively high, the simplicity of the

calculation and its parallel implementation on the GPU give good results, as discussed

in Section 5.6. We use 50 layout iterations [205].
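The choice of partition size can be checked with a short derivation. Each node interacts exactly with the (at most) $C_s$ nodes of its own partition and approximately with the $|V|/C_s$ partition centers, so the per-node cost is $f(C_s)$ below; minimizing it gives the stated bound.

```latex
\begin{align*}
f(C_s) &= C_s + \frac{|V|}{C_s}, \\
f'(C_s) &= 1 - \frac{|V|}{C_s^{2}} = 0
  \;\Longrightarrow\; C_s = \sqrt{|V|}, \\
f''(C_s) &= \frac{2|V|}{C_s^{3}} > 0
  \quad \text{(so this is a minimum)}, \\
f\bigl(\sqrt{|V|}\bigr) &= 2\sqrt{|V|}, \\
O\bigl(|E| + |V| \cdot 2\sqrt{|V|}\bigr) &= O\bigl(|E| + |V|^{1.5}\bigr).
\end{align*}
```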

5.4.2 Computing the Initial Layout L0

Algorithm 3 is used to compute a static layout of the first graph, G0. This algorithm

uses a multi-level force directed scheme in order to quickly compute an aesthetic layout.

Both the Kamada-Kawai (KK) [113] and Fruchterman-Reingold (FR) [70] algorithms are

employed. We elaborate on the steps of the algorithm below.



Algorithm 3 Static layout of the first graph, G0

input: G0 output: L0

1. Coarsening: Set $C^0 = G_0$. Compute $C^1, C^2, \ldots, C^{coarsest}$ where $C^{k+1} = \mathrm{edge\_collapse}(C^k)$. Set $l = coarsest$.

2. Perform KK layout of $C^{coarsest}$.

3. Compute a geometric partitioning of the graph nodes.

4. Perform layout of $C^l$. Update the partitioning (step 3) every few iterations. If $l = 0$ terminate and use the layout of $C^0$ as $L_0$ (the layout of $G_0$).

5. Interpolate the layout of $C^l$ to form an initial layout for $C^{l-1}$. Set $l = l - 1$, goto step 3.

Coarsening (Step 1): A similar method to Algorithm 2, Step 3 is utilized to create

a series of reduced versions of the graph, which are used to compute increasingly detailed

“skeletons” of the final layout. The coarsening continues recursively until a small graph

of several hundred nodes is created. This graph is then efficiently handled in the next

step and is used as a basis of a series of resolution-increasing layouts. Note that unlike

the incremental case, initial coordinates for the constructed graphs Ck, are not available.

KK layout (Step 2): The KK algorithm [113] is used to compute a force-directed

layout of the coarsest graph, Ccoarsest. This algorithm is used in conjunction with the

FR [70] force-directed algorithm (in Step 4) in order to produce an aesthetic layout.

While the KK algorithm reliably produces a good placement from an arbitrary initial

position, the FR algorithm produces a “smoother” layout and is quicker, but is more

sensitive to the initial conditions given to it. Hence, combining the algorithms gives a

fast and aesthetic result. In our implementation 2000 iterations of the KK algorithm are

performed. Note that during incremental layout (Section 5.4.1) combining our multi-level

approach while reusing the previous layout as a starting point gives fast and good results

without incurring KK’s performance penalty.

Geometric partitioning (Step 3): The same algorithm as in step 4 of Algorithm 2

(Section 5.4.1) is used here.



FR layout (Step 4): In this step we perform force-directed layout of the current

graph in the hierarchy, $C^l$. The algorithm is described in detail in Step 5 of Algorithm 2

(Section 5.4.1). Unlike the dynamic case, here pinning weights are not used and all nodes

are free to move in every layout iteration. In order to get improved results, we update

the node partitioning (Step 3) several times during the layout. The center of gravity of

each partition is updated every iteration, though. The algorithm terminates when the

layout of C0 = G0 is computed.

Interpolation (Step 5): In this stage the existing layout of $C^l$ is interpolated to

form an initial layout for the higher-resolution $C^{l-1}$. Nodes in $C^{l-1}$ are initially placed

near the position of their parent in $C^l$.

5.5 Implementation

This section discusses the implementation of the algorithm. As will be shown in Sec-

tion 5.6, performing incremental layout, i.e. Algorithm 2, Step 5, (and similarly Algo-

rithm 3, Step 4) on the GPU can significantly accelerate the overall running time of the

algorithm. Therefore, in this section we focus on describing the GPU implementation of

this step.

On the GPU, parallel computation is achieved by rendering graphics primitives that

cover several pixels. The GPU runs a program called a kernel program for each pixel

candidate, called a fragment. The key to high performance on the GPU is using multiple

fragment processors, which operate in parallel. The GPU suits uniformly structured

data, such as matrices. The challenge is representing graphs, which are unstructured, in

a manner that makes efficient use of GPU resources.

Implementing static force directed layout on the GPU has been discussed in [65].

While the algorithm used here for static layout is different, the GPU implementation is

similar. This section reviews the GPU implementation and focuses on the changes needed

for dynamic layout.

Several textures are used on the GPU to represent the graph: the textures represent

the nodes, the partitions, the edges, and the forces. The location texture holds the (x,y)

positions of all the nodes in the graph. Each graph node has a corresponding (u,v) index

in the texture. As shown in Figure 5.5 (a), the nodes in each partition are stored in a



rectangular region in the location texture.

Bucket-sort is performed on the pinning weights of the nodes in each partition. Nodes

are placed into the texture in a left to right, top to bottom order, according to the bucket

they belong to, as shown in Figure 5.5 (b). The number of buckets is set to the number

of iterations of the layout algorithm. Sorting creates contiguous regions of nodes with

similar wpin values. This allows the algorithm to control the set of nodes whose positions

are updated at every layout iteration. Using appropriate rendering commands, the GPU

is instructed to process only the relevant nodes in each iteration, as discussed below.
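The ordering of nodes inside a partition's texture region can be sketched on the CPU as follows. The function names `bucket_order` and `active_rows`, the mapping of a weight to a bucket index and the `width` of the region are illustrative assumptions; the sketch only shows why sorting by wpin makes the active nodes occupy a contiguous prefix of rows.

```python
def bucket_order(partition, wpin, num_buckets=50):
    """Order the nodes of one partition from low to high w_pin via bucket sort."""
    buckets = [[] for _ in range(num_buckets)]
    for v in partition:
        # Assumed mapping: weight in [0, 1] -> bucket index, clamped at the top.
        b = min(int(wpin[v] * num_buckets), num_buckets - 1)
        buckets[b].append(v)
    return [v for bucket in buckets for v in bucket]

def active_rows(ordered, wpin, frac_done, width):
    """Indices of the rows that contain at least one node with frac_done > w_pin."""
    rows = [ordered[i:i + width] for i in range(0, len(ordered), width)]
    return [r for r, row in enumerate(rows)
            if any(frac_done > wpin[v] for v in row)]
```

With one bucket per layout iteration, all nodes that become active in the same iteration end up next to each other in the region.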

Figure 5.5: Sorting nodes by pinning weight wpin on the GPU. (a) A location texture

separated into regions, color coded by the partition each node belongs to. (b) Nodes in

each region are sorted from low wpin to high wpin.

The partition center of gravity texture holds the current (x,y) coordinates of the center

of gravity of each partition. Graph edges are represented using the neighbors texture and

the adjacency texture. The adjacency texture contains lists of (u, v) pointers into the

location texture, representing the neighbors of each node. The neighbors texture holds

for each node v, a pointer into the adjacency texture, to the coordinates of the first

neighbor of the node. Pointers to additional neighboring nodes are stored in consecutive

locations in the adjacency texture. The neighbors texture also holds the degree of each

node. The forces computed during layout are stored in two textures: the attractive force

texture and the repulsive force texture. The attractive force texture contains for each

node the sum of the attractive forces $F^{attr}$ exerted on it by its neighbors. The repulsive

force texture holds the sum of repulsive forces, both by nodes in the same partition ($F^{repl}_{int}$)

and by the other partitions in the graph ($F^{repl}_{ext}$).
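The neighbors and adjacency textures together form a compressed adjacency-list layout. The Python sketch below shows the same idea with flat lists; it is a simplification under stated assumptions: the name `build_adjacency_arrays` is hypothetical, and one-dimensional indices stand in for the two-dimensional (u, v) texture coordinates used on the GPU.

```python
def build_adjacency_arrays(adj, order):
    """Flatten adjacency lists into two arrays.

    adj:   dict node -> iterable of neighbours
    order: list of nodes, giving each node its index in the location array
    Returns (neighbors, adjacency), where neighbors[i] = (offset, degree)
    and adjacency[offset:offset + degree] lists the neighbour indices.
    """
    index = {v: i for i, v in enumerate(order)}
    neighbors, adjacency = [], []
    for v in order:
        neighbors.append((len(adjacency), len(adj[v])))
        adjacency.extend(index[u] for u in sorted(adj[v], key=index.get))
    return neighbors, adjacency
```

Each undirected edge contributes two entries to `adjacency`, one per endpoint, which matches the statement that every edge is represented twice.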

The overall storage complexity is O(|V | + |E|): every node and edge is stored a

fixed number of times. Each node is represented as four 32-bit floating-point values

in the following textures: location (two textures), forces (two textures) and neighbors.

Each edge is represented twice in the adjacency texture (once for each of the nodes in its

endpoints), whose entries are also four 32-bit floating-point numbers. Due to performance

reasons, information about the graph partitions is stored in three textures holding four

32-bit floating-point numbers each. These textures have the same size as the textures

representing nodes.

Hence, in the current implementation, a total of 32 32-bit numbers are stored per node

and 8 32-bit numbers are stored per edge in the different textures. This amounts to about

8 MB of texture memory for the fe_pwt graph with (|V|, |E|) = (32045, 112395). Modern

graphics cards have hundreds of megabytes of texture memory, making accommodation

of very large graphs possible. Note that for implementation ease, textures holding four

32-bit numbers are used in all cases. This is not always required, and using narrower textures can further reduce

the memory footprint.
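The memory estimate can be reproduced with a few lines of arithmetic. The per-node count of 32 values follows from five node textures and three partition textures of four values each, and the per-edge count of 8 from two adjacency entries of four values each; the helper name `texture_bytes` is hypothetical.

```python
def texture_bytes(num_nodes, num_edges,
                  values_per_node=32, values_per_edge=8, bytes_per_value=4):
    """Total texture memory in bytes for 32-bit (4-byte) values."""
    return (num_nodes * values_per_node + num_edges * values_per_edge) * bytes_per_value

# Graph size quoted in the text: |V| = 32045, |E| = 112395.
total = texture_bytes(32045, 112395)
print(total, "bytes, i.e. about", round(total / 1e6, 1), "MB")
```

The result is 7,698,400 bytes, which is consistent with the figure of about 8 MB.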

Computing each layout iteration is done in several steps, which are implemented as

kernel programs that run on the GPU. The partition CG kernel calculates the center of

gravity of each partition, as shown in the line numbered (1) in Figure 5.3. The repulse

kernel calculates the repulsive forces exerted on each node. This kernel first calculates for

each fragment it processes, the internal forces, i.e. forces exerted by nodes contained in

the partition that the fragment belongs to. Then, it approximates the forces by all other

partitions. See lines (2)-(4) in Figure 5.3. The attract kernel is used to calculate the

attractive forces caused by graph edges. For each node, the kernel accesses the neighbors

texture in order to get a pointer into the adjacency texture, which contains the (u,v)

location texture coordinates of the node’s neighbors. For each neighboring node, the

attractive force is calculated and accumulated. This corresponds to line (5) in Figure 5.3.

Finally, the anneal kernel calculates the total force on each node, F total, and displaces

nodes accordingly, as shown in lines (6),(7) in Figure 5.3. This kernel updates a second

copy of the location texture. This double buffering is required since the GPU cannot

read from and write to the same texture.

In total, the partition CG kernel performs $O(|V|)$ operations; the repulse kernel performs

$O(|V|^{1.5})$ operations; the attract kernel performs $O(|E|)$ operations; and the anneal

kernel $O(|V|)$ operations. On the GPU, the computations executed in each kernel are

run in parallel. Since, as discussed below, only some of the nodes are operated on during

each layout iteration, in practice the average number of operations performed by each

kernel is lower than the maximum values presented above.

Recall that the nodes in each partition are sorted according to wpin, as shown in

Figure 5.5 (b). This allows us to control the nodes processed in each layout iteration, thus

spending GPU time only on the nodes which should move. Before each layout iteration,

for each rectangular texture region representing a partition of the graph, the rows which

contain nodes for which fracdone > wpin(v) are determined. A set of quadrilaterals which

cover the corresponding parts of each region are rendered. This instructs the GPU to

process only these nodes. OpenGL display lists are used in order to efficiently send these

rendering commands to the GPU. Note that this method operates on a per-row basis,

potentially causing a small amount of extra fragments to be processed for each region.

The processing of these extra fragments is avoided by conditionally updating the location

of a node only if fracdone > wpin(v).

Note that our implementation does not require copying data from GPU memory

(textures) to CPU memory while performing the layout iterations. Keeping the data on

the graphics card enables full utilization of the GPU's compute and memory bandwidth

resources.

5.6 Results

Two criteria are used to measure the quality of the resulting dynamic layouts: average

displacement of nodes between each pair of successive layouts and potential energy. The

first criterion measures the stability of the layout. The second criterion judges the quality

of the layout. Lower energy (in absolute value) implies low stress in the graph, corresponding to a good layout. The energy $U$ is derived from the relation $\vec{F} = -\nabla U$. Hence, given the force $\vec{F}$, the energy can be derived by integrating. Given two nodes at positions $\vec{u}, \vec{v}$, connected by an edge, the attractive force acting along the edge is

$$\vec{F}^{attr} = \frac{1}{K}\,\|\vec{u}-\vec{v}\|\,(\vec{u}-\vec{v}) = -\nabla U^{attr},$$

hence

$$U^{attr} = -\frac{1}{3K}\,\|\vec{u}-\vec{v}\|^{3}.$$

The repulsive force between two nodes is

$$\vec{F}^{repl} = \frac{-(\vec{u}-\vec{v})}{\|\vec{u}-\vec{v}\|^{2}}\,K^{2} = -\nabla U^{repl},$$

hence

$$U^{repl} = \frac{1}{2}\,K^{2}\log\left(\|\vec{u}-\vec{v}\|^{2}\right).$$

The total energy is computed by summing over all edges and over all node pairs: $U^{total} = U^{attr} + U^{repl}$, i.e.,

$$U^{total} = \sum_{u:(u,v)\in E} -\frac{1}{3K}\,\|\vec{u}-\vec{v}\|^{3} + \sum_{u,v\in V,\,u\neq v} \frac{1}{2}\,K^{2}\log\left(\|\vec{u}-\vec{v}\|^{2}\right).$$
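The two quality criteria can be evaluated with a few lines of code. The sketch below is an illustration rather than the code used for the experiments. It assumes that the repulsive sum runs over ordered pairs, as the formula is literally written, and that displacement is averaged over the nodes present in both layouts; the function names are hypothetical.

```python
import math


def total_energy(pos, edges, K):
    """U_total for a layout; pos maps node -> (x, y), edges is a list of pairs.

    Coincident nodes are not handled (log of zero distance is undefined).
    """
    def dist(a, b):
        return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

    # Attractive term: one contribution per edge.
    u_attr = sum(-dist(u, v) ** 3 / (3.0 * K) for u, v in edges)
    # Repulsive term: all ordered pairs u != v (assumption, see lead-in).
    nodes = list(pos)
    u_repl = sum(0.5 * K ** 2 * math.log(dist(u, v) ** 2)
                 for u in nodes for v in nodes if u != v)
    return u_attr + u_repl


def average_displacement(prev, curr):
    """Mean distance moved by nodes that appear in both layouts."""
    common = [n for n in curr if n in prev]
    moved = sum(math.hypot(curr[n][0] - prev[n][0], curr[n][1] - prev[n][1])
                for n in common)
    return moved / len(common)
```

The absolute value of `total_energy` corresponds to the energy measure, and `average_displacement` to the stability measure.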

Other static graph layout quality criteria are indirectly handled by the underlying force directed algorithm. Note that other criteria have also been used to measure mental map preservation, for example the orthogonal ordering of nodes [145].

The quality of the layout is compared to two algorithms. The first is a force-directed

non-incremental algorithm that lays each graph in the sequence independently. This

algorithm, which is expected to produce the best layouts since it has no constraints,

is used to check the quality of our dynamic layouts. The second is a variant of our dynamic algorithm which does not use pinning weights (i.e., $w_{pin} \equiv 0$). This algorithm

demonstrates that simply using the previous placement is insufficient for generating stable

layouts. Note that the running time of these two algorithms is much higher than the

running time of our algorithm since they process all nodes in each layout iteration.

Several well-known graphs (3elt, 4elt, fe_pwt, bcsstk31) are used to demonstrate our algorithm [204]. The dynamic sequences are generated by performing random changes on the graphs, modifying |E| and |V| by up to 15%. In addition, the sequences marked threads1, threads2 and Rimzu come from real data, discussed in Sections 5.7 and 5.8. In these graphs, there are cases in which the changes between consecutive graphs in the series are small (e.g., a few nodes are added). As discussed in Section 5.4.1, Step 5, the algorithm is able to efficiently handle such changes by performing computations only on the nodes which should be displaced in each layout iteration. Figure 5.6 shows a few snapshots from the dynamic graph layout of 3elt.

Figure 5.6: Snapshots from layouts of the 3elt sequence (|V| ≈ 4000, |E| ≈ 10,500), left-to-right, top-to-bottom

Another example is Newcomb’s fraternity data [152], which represents friendship rela-

tions between college students. This data was visualized using the SoNIA tool for social

network visualization [11,12,149]. As discussed in [149], the Newcomb data is best visual-

ized by the peer-influence (PI) algorithm of SoNIA, where nodes are displaced according

to forces exerted by neighbors.

Table 5.1 shows average results for the layout quality metrics (lower values are better). The ∆pos column shows the average displacement of nodes and the |U^total| column shows the absolute value of the potential energy of the graph. It is clear that our incremental algorithm outperforms the other algorithms and maintains dynamic stability. The potential energies achieved by all algorithms are similar, demonstrating that the quality of layouts computed by our algorithm is good. In some cases (like fe_pwt) the two incremental algorithms surprisingly perform better than the static one. This is due to the fact that the force-directed algorithm finds a local minimum which depends on the initial conditions, which are different for each algorithm used here. In summary, the results demonstrate that our algorithm computes aesthetic layouts while decreasing the movements of the nodes. This reduction does not come at the expense of layout quality. The algorithm tries to maintain the structure of the graph, using node pinning to propagate changes across the graph, allowing for new landmarks to be created, while at the same time maintaining the mental map. Note that compared to the algorithm of [66], using a multi-level incremental algorithm somewhat reduces the stability of the layout. However, this gives the algorithm an opportunity to calculate a higher quality layout.

graph        rimzu               threads1            threads2
metric       ∆pos    |U^total|   ∆pos    |U^total|   ∆pos    |U^total|
non-incr     31.4    4418        1.45    39.2        1.06    9.72
basic-incr   4.62    4435        0.333   40.4        0.297   9.81
ours         0.274   3418        0.042   30.3        0.048   5.55

graph        newcomb             3elt                fe_pwt
metric       ∆pos    |U^total|   ∆pos    |U^total|   ∆pos    |U^total|
non-incr     0.48    1.82        25.9    2.73x10^5   105.5   9.59x10^5
basic-incr   0.221   1.81        2.3     3.06x10^5   10.7    9.37x10^5
ours         0.099   1.94        0.968   2.79x10^5   3.62    8.1x10^5

Table 5.1: Layout quality - values are averages for a sequence of layouts

Figure 5.7 shows a comparison of the SoNIA layouts using the PI algorithm and our

layouts. As can be seen, one of the advantages of our algorithm is the greater stability in

node positions, especially when only the edges of the graph are modified. Although both

SoNIA and our algorithm are based on force-directed methods, the more sophisticated

initial placement and pinning algorithms help improve the results.

For our performance tests we used two computers. The first is a PC with a 3 GHz

Pentium IV CPU and an NVIDIA 7900GS GPU. The second is a newer PC with a 2.4

GHz Intel Core 2 Duo E6600 CPU and an NVIDIA 8800GTS GPU. Our algorithm was

implemented using C++, Cg and OpenGL.



Figure 5.7: Snapshots from the layouts of the newcomb fraternity data [152]. Left: our

algorithm. Right: SoNIA algorithm [11,12], used in [149].



Graph name   avg. |V|   avg. |E|
3elt         4097       10468
4elt         14588      40176
bcsstk31     32715      48495
fe_pwt       32045      112395

Table 5.2: Graph sequence information.

             3GHz Pentium + 7900GS GPU            2.4GHz Core 2 + 8800GTS GPU
             initial layout    dynamic layout     initial layout    dynamic layout
Graph name   CPU    CPU+GPU    CPU     CPU+GPU    CPU    CPU+GPU    CPU     CPU+GPU
3elt         2.72   1.49       0.764   0.249      1.72   1.27       0.436   0.2
4elt         17.6   2.98       5.91    0.777      10.4   2.22       3.38    0.39
bcsstk31     50.4   9.28       21.2    4.74       34     9.61       12.1    1.38
fe_pwt       47.7   6.03       21      2.1        28.8   4.27       12      0.704

Table 5.3: Running times [sec.]. The running times of the CPU only and GPU-accelerated implementation of the algorithm are shown. All times shown are total running times for computing a layout. Dynamic layout times are averaged over a sequence of layouts.

Table 5.2 gives information about the graph sequences and Table 5.3 shows running times, both when using only the CPU and when using the GPU to accelerate the computation. As can be seen in the table, our GPU implementation provides a significant speedup compared to the CPU. Using the older 7900GS GPU, a speedup of up to 10 times is achieved. Using the newer and faster 8800GTS GPU, the speedup increases to up to 17 times, compared to the newer CPU. Due to the high ratio of arithmetic operations to memory accesses, the algorithm is compute-bound rather than memory-bound. Therefore, as demonstrated in the comparison between the PCs, the GPU implementation of the algorithm is scalable.
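The quoted speedups can be checked directly against the dynamic layout columns of Table 5.3. The short script below copies the table values; the variable and function names are illustrative only.

```python
# Per graph: one tuple per machine, each holding
# (initial CPU, initial CPU+GPU, dynamic CPU, dynamic CPU+GPU) in seconds.
TIMES = {
    "3elt":     ((2.72, 1.49, 0.764, 0.249), (1.72, 1.27, 0.436, 0.2)),
    "4elt":     ((17.6, 2.98, 5.91, 0.777), (10.4, 2.22, 3.38, 0.39)),
    "bcsstk31": ((50.4, 9.28, 21.2, 4.74), (34.0, 9.61, 12.1, 1.38)),
    "fe_pwt":   ((47.7, 6.03, 21.0, 2.1), (28.8, 4.27, 12.0, 0.704)),
}


def dynamic_speedups(times):
    """CPU time divided by CPU+GPU time for the dynamic layout, per machine."""
    return {
        graph: tuple(cpu / gpu for (_, _, cpu, gpu) in machines)
        for graph, machines in times.items()
    }


speedups = dynamic_speedups(TIMES)
for graph, (old_pc, new_pc) in speedups.items():
    print(f"{graph:9s} 7900GS: {old_pc:5.1f}x   8800GTS: {new_pc:5.1f}x")
```

The largest ratios occur for the fe_pwt sequence, which is where the figures of roughly 10 and 17 times come from.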

Focusing on the part of the algorithm that runs on the GPU leads to interesting

insights. For the fe_pwt graph, the average time for computing the FR incremental

layout stage using the 7900GS GPU was 1.66 seconds. Using the 8800GTS GPU, the

time dropped to 0.417 seconds. This represents a significant performance increase between

GPU generations (∼ 4 times), which is larger than the performance increase between the

CPU generations [163]. The speedup is achieved while taking into account the overhead

of instructing the GPU to perform the layouts, which can be significant in the coarser

graphs. The speedup of performing the last layout stage (on the finest graph) is about 8

times.



There are several factors contributing to the increase in performance between the

GPUs. The new GPU has a different architecture, which is better suited for dealing with

graphs. Due to its smaller branch granularity, a smaller penalty is encountered when

dealing with non-uniform data, such as graphs. In addition, the 8800GTS uses a scalar

architecture, which is more efficient here, since the algorithm deals mostly with 2D and

1D quantities. Finally, the new GPU has more raw compute power.

5.7 Application to Discussion Thread Visualization

We applied our algorithm to the visualization of Internet discussion forums. We collected data from several discussion threads at http://www.dailytech.com. This site contains various hi-tech related news items. The discussion threads visualized contain the comments people make on the news items. In the graph, each node represents a user. Edges are constructed between the user adding a comment and the users who replied to that comment. Each discussion thread is represented by a node labeled A n where n is the discussion thread number (corresponding to a news item).

In order to create the visualization, shown in Figures 5.1 and 5.8, several steps are

executed. First, the graph is transformed into a connected graph, as required by the graph

layout algorithm. This is achieved by adding an invisible root node and connecting it with

invisible edges to all the A n nodes representing the discussion threads. The connected

graph is then handed to the incremental layout algorithm.
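The first step can be sketched as follows. This is a minimal illustration with made-up node names, not the thesis code; edges carry a visibility flag so that the added root and its edges can be skipped when drawing.

```python
from collections import deque


def add_invisible_root(nodes, edges, thread_nodes, root="__root__"):
    """Return (nodes, edges) with an invisible root linked to every thread node.

    Each edge is a triple (u, v, visible).
    """
    all_nodes = set(nodes) | {root}
    visible = [(u, v, True) for u, v in edges]
    invisible = [(root, t, False) for t in thread_nodes]
    return all_nodes, visible + invisible


def is_connected(nodes, edges):
    """Breadth-first search over the undirected graph."""
    adj = {n: set() for n in nodes}
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        n = queue.popleft()
        for m in adj[n] - seen:
            seen.add(m)
            queue.append(m)
    return seen == set(nodes)
```

Since every component of the graph is assumed to contain a thread node, linking the root to all thread nodes is enough to make the whole graph connected.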

Second, in order to improve the visualization of the computed layout sequence, overlapping between node labels is addressed. A set of bounding boxes of drawn node labels is maintained and updated after each label is drawn. If a new label to be drawn intersects any of the bounding boxes of already drawn labels, it is drawn in the background – farther away from the viewer and with a lighter color. Doing so prevents the new label from occluding the text of any previously drawn labels. If a new label does not intersect any of the existing labels, it is drawn in the foreground. Before each node label is drawn, a rectangle with the same color as the background is drawn behind the node label. This is done so that each pixel displays the text of a single label (preventing overlaps).

Third, during animation, the nodes are drawn in a specific order which is designed

to visualize the interesting features of the evolving graph sequence more clearly. The



labels of important nodes should receive priority when drawn. These include nodes with a high degree, acting as central nodes in the graph, and nodes whose neighborhood in the graph has changed. Each node is assigned a score. Nodes with a higher score are rendered before nodes with a lower score. This reduces the probability that an important node's label will be occluded. The score of each node v is set to score(v) = degree(v) + β · degree_change(v), where degree_change(v) is the change of the degree of node v between the current and previous graphs. The score helps emphasize the main features of the evolving graph sequence. The constant β can be changed by the user. Its default value is 2.
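The label handling of the second and third steps can be combined in a short sketch. It is illustrative only: the dictionary layout of a label is hypothetical, and the degree change is taken as an absolute value, which is an assumption since the text does not say whether the change is signed.

```python
def score(degree, prev_degree, beta=2.0):
    """score(v) = degree(v) + beta * degree_change(v), change taken as |delta|."""
    return degree + beta * abs(degree - prev_degree)


def boxes_intersect(a, b):
    """Axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def plan_labels(labels, beta=2.0):
    """Order labels by score and assign each to the foreground or background.

    labels: list of dicts with keys 'name', 'box', 'degree', 'prev_degree'.
    """
    ordered = sorted(
        labels,
        key=lambda l: score(l["degree"], l["prev_degree"], beta),
        reverse=True,
    )
    drawn_boxes, plan = [], []
    for label in ordered:
        overlaps = any(boxes_intersect(label["box"], b) for b in drawn_boxes)
        plan.append((label["name"], "background" if overlaps else "foreground"))
        drawn_boxes.append(label["box"])
    return plan
```

Drawing high-scoring labels first means that a label pushed to the background is always a less important one.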

Figure 5.1 shows a sample visualization of 7 discussion threads with 119 users. Al-

though during visualization the graph more than doubles, our layout manages to preserve

the mental map. Several insights can be gained from the visualization. Clusters are ev-

ident around the A n nodes, representing each discussion thread. As time progresses,

more clusters, representing new discussion threads, become visible. There are clusters of

various sizes – correlating to threads drawing different levels of attention. Some users

post messages on several threads while others discuss only one topic. Some users are

very active and post many messages, acting as central nodes in the graph. The degree of

nodes representing such users increases over time and they contribute to the connectivity

of the graph. Some users, who are drawn at the boundaries of the graph, contribute only

one comment.

As a second example we studied the latest headlines section of the website. We

selected five items, appearing over a span of three days, from seemingly unrelated fields:

computer games, nuclear fusion, low-cost PCs, Windows/Linux switch and Christmas

e-shopping. The number of comments for each article varied from 15 to 31. A total of 86

users contributed to the discussion threads. Figure 5.8 presents several snapshots from

the animation sequence showing the evolution of these discussion threads over time. A

movie showing the visualization is available in the supplementary material.

Looking at the visualization, several conclusions can be drawn. The graph is initially

partitioned into disconnected clusters, representing nuclear fusion, low-cost PCs and com-

puter games. Later, connections start to appear in the graph. The threads discussing

low-cost PCs and Windows/Linux switch are highly connected. Some connections exist

between these clusters and the computer game cluster. Surprisingly, several users discussing nuclear fusion join both the computer games and Windows/Linux switch threads. Good correlation also exists between nuclear fusion and the Christmas e-shopping discussion.

Figure 5.8: Snapshots from the threads2 graph sequence, visualizing discussion threads at http://www.dailytech.com, left to right, top to bottom. 109 messages from 86 users in 5 discussion threads are shown. Discussion topics, marked as blue A n nodes, include computer games (A 5054), nuclear fusion (A 5027), low-cost PCs (A 5060), Windows/Linux switch (A 5069) and Christmas e-shopping (A 5082).

5.8 Application to Social Network Visualization

Our algorithm was applied to the visualization of the growth of social networks. We used

data from the social network at http://www.rimzu.com. In this network, new users can

register after receiving an invitation from an existing user. Each user is able to list a set

of friends among the members of the network. In the visualization, users are represented

as nodes. Edges link each user to his/her friends.

Figure 5.9 shows a visualization of the growth of this network. The visualization shows a period in time where the network grew considerably, from 216 nodes to 962 nodes. The visualization was created by constructing the graph of the network at equally spaced intervals in time. As in the Internet threads visualization, a dummy invisible root node was added in order to make the graph connected.

Figure 5.9: Snapshots from the Rimzu graph sequence, visualizing the social network at http://www.rimzu.com, left to right, top to bottom. Nodes represent users and edges represent connections between users. In the visualization the graph grows from |V| = 216, |E| = 544 to |V| = 962, |E| = 1561. Nodes are colored by age in a red → yellow → green scale.

Several properties of the network are evident from the created visualization. The

graph has dozens of connected components. The fact that the graph is not connected is

surprising since members are able to join the network only after receiving an invitation.

There are many users who joined the network but did not list any friends. They are

represented as a cluster of nodes with degree zero (no edges). There are components of

varying complexity in the network. Some are very simple, connecting a handful of nodes,

while others are large and highly connected. Several tree-like components are visible.

These correspond to one user with several friends who are not linked between themselves.

There is one large component which exists from the beginning of the visualization.

Coloring the nodes by age reveals more information on the graph. Some components

of the graph were created in a relatively short time frame. Others, such as the large

component on the right, grow continuously.

Note how the algorithm manages to compute a stable, mental-map preserving layout

of the dynamic graph sequence while at the same time providing meaningful layouts

from which the insights discussed above can be extracted. This is especially challenging

due to the large growth of the network in the period visualized. A movie showing the

visualization is available in the supplementary material.


We have presented an online algorithm for dynamic layout of graphs, whose goal is to

efficiently compute stable and aesthetic layouts. The algorithm has several key ideas.

First, a good initial layout is computed. Second, the allowed displacement of nodes

is controlled according to the changes applied to the graph. In particular, each node

is assigned an individual convergence schedule. Third, the global interactions in the

graph are approximated in order to maintain the structure of the graph and compute an

aesthetic layout. Fourth, a multi-level scheme is used in order to compute high-quality

layouts. Last but not least, the GPU is used to accelerate the algorithm, requiring the



representation of unstructured graphs in an ordered manner that fits the GPU.

It has been demonstrated that the algorithm computes an aesthetic layout, while

reducing displacement and maintaining the user’s mental map between layout iterations.

Our GPU implementation of the algorithm performs up to 17 times faster than the CPU

version. We have applied our algorithm to the visualization of discussion threads on the

Internet and to social network visualization.

There are several avenues for future research. An interesting research direction is

the extension of the algorithm to drawing multi-level clustered graphs. Finding ways to

implement more parts of the algorithm on the GPU will help accelerate the computa-

tion. Improving the algorithm used for morphing between layouts can further help in

maintaining the mental map.


Chapter 6

Dynamic Drawing of Clustered Graphs

This chapter presents an algorithm for drawing a sequence of graphs that contain an

inherent grouping of their vertex set into clusters. It differs from previous work on

dynamic graph drawing in the emphasis that is put on maintaining the clustered structure

of the graph during incremental layout. The algorithm works online and allows arbitrary

modifications to the graph. It is generic and can be implemented using a wide range

of static force-directed graph layout tools. This chapter introduces several metrics for

measuring layout quality of dynamic clustered graphs. The performance of our algorithm

is analyzed using these metrics. The algorithm has been successfully applied to visualizing

mobile object software. This chapter is based on [63].

The rest of this chapter is structured as follows. An introduction is given in Sec-

tion 6.1. Section 6.2 defines the problem. Section 6.3 describes the algorithm. A software

visualization application is presented in Section 6.4. Finally, Section 6.5 concludes and

discusses future directions.

6.1 Introduction

In clustered graphs, the vertices are divided between a set of components called clusters,

which form a partition of the vertex set. In some applications, the graphs are inherently

clustered [25]. In other cases, clustering has been successfully used in order to aid in the

visualization of graphs [210].

Many applications require dynamic graph drawing, i.e., the ability to modify the graph being drawn [21, 47, 155]. Different types of graph modifications may be performed: adding vertices and clusters, moving vertices between clusters, removing edges, etc. The challenge in dynamic graph drawing is to compute a new layout that is both aesthetically pleasing as it stands and fits well into the sequence of drawings of the evolving graph. The latter criterion has been termed preserving the mental map [145] or dynamic stability [155]. A short animation sequence showing incremental layouts of clustered graphs computed by our algorithm is shown in Figure 6.1. In this dynamic scenario, vertices move between clusters and thus the size of clusters changes, edges are added, and clusters are added and removed. Yet, the relative locations of the clusters and the vertices are preserved, while allowing changes in the size of clusters when deemed necessary.

Figure 6.1: Snapshots from an animation sequence

One field in which clustered graphs arise is software visualization, and in particular,

visualization of mobile object frameworks [38, 102, 130]. Such frameworks extend the

distributed objects concept [158,195] in allowing the objects to migrate to remote hosts,



along with their state and behavior, while the application is executing (in order to speed

up interaction).

In these frameworks, the notion of a dynamic clustered graph arises quite naturally.

Every object is represented by a vertex in the graph. A machine is represented as a cluster

that contains the objects currently residing in it. The area occupied by a cluster is used

as a visual clue to the user regarding the number of objects located in the machine

represented by the cluster. Naturally, the graph being visualized evolves with time,

as objects migrate between machines and machines connect and disconnect from the

network. Our algorithm has been designed to show these interactions.

(a) Force-directed non-incremental layout

(b) Our incremental layout

Figure 6.2: Incremental vs. non-incremental layout (from left to right)

Work on drawing clustered graphs is less widespread than work on drawing graphs which are not clustered. In [206], a divide and conquer approach is used, in which each cluster is laid out separately and then the clusters are composed to form the graph. In [51], a method of drawing the clustering hierarchies of the graph using different Z coordinates in a 3D view is discussed. See also [14, 116] for a discussion of clustered and compound graph layout.

Several algorithms address the problem of offline dynamic graph drawing, where the

entire sequence is known in advance. In [47], a meta-graph built using information from the entire graph sequence is used in order to maintain the mental map. In [128], a

stratified, abstracted version of the graph is used to expose its underlying structure. An

offline force directed algorithm is used in [56] in order to create 2D and 3D animations



of evolving graphs. Creating smooth animation between changing sequences of graphs is

addressed in [19].

An online graph drawing algorithm is discussed in [132], where a cost function that takes both aesthetic and stability considerations into account is defined and used. Unfortunately, computing this function is very expensive (45 seconds for a 63 node graph).

Drawing constrained graphs has also been addressed. Incremental drawing of DAGs is

discussed in [155]. Dynamic drawing of orthogonal and hierarchical graphs is discussed

in [86].

The DA-TU system described in [105] allows navigating and interactively clustering

huge graphs. In [138] an algorithm that tries to improve the distribution of nodes in a

graph while maintaining the mental map is described. Finally, some commercial graph

layout packages such as [200,209] contain provisions for dynamic layout of graphs. As far

as we know, none of the above was designed to handle incremental drawing of clustered

graphs. Here, we wish to support adding and removing nodes, clusters and edges and

moving nodes between clusters.

In this chapter we propose a new algorithm for online incremental layout of clustered

graphs. The algorithm does not impose restrictions on the structure of the graph. It

allows drawing of edges not only between vertices but also between clusters, which is

used to convey information to the user. The algorithm provides a means of separating

the set of vertices in each cluster to a subset of vertices that stay in the same cluster

and a subset of vertices that might move to a different cluster. The layout of the vertices

inside the cluster is influenced by this separation.

The major design consideration of our algorithm is preserving the mental map while

the graph is being updated. We show that force directed layout techniques [18, 113,

199] can be used as a basic building block. However, they cannot be used as is, as

demonstrated in Figure 6.2(a), where clusters and vertices move considerably between

successive drawings. We propose a few enhancements to existing algorithms in order to

preserve the mental map, as shown in Figure 6.2(b), where only small variations in cluster

location and size are exhibited. Also note the stability of the vertices inside the clusters

as opposed to the non-incremental layout.

A key consideration in designing algorithms is the desirable properties of the results.

This chapter proposes several criteria for evaluating the quality of dynamic clustered



graphs. They include space compactness, minimization of the changes between frames

and run-time efficiency. We demonstrate that our algorithm performs well according to

these properties. Moreover, we show that this is the case when considering a software

visualization application.

6.2 Problem Statement

This section defines clustered graphs and possible graph updates. It also discusses criteria

by which the quality and stability of the layout is evaluated.

Definition 6.2.1. Partition: A k-way partition of a set $C$ is a family of subsets $(C_1, C_2, \ldots, C_k)$ such that $\bigcup_{i=1}^{k} C_i = C$ and $C_i \cap C_j = \emptyset$ for $i \neq j$.

Definition 6.2.2. Clustered Graph: A clustered graph is an ordered quadruple $G = (V, C, E_v, E_c)$, where $V$ is the vertex set, $C$ is a set of clusters which form a partition of the vertex set $V$, $E_v$ is the set of vertex-vertex edges $E_v \subseteq \{(v_i, v_j) \mid i \neq j,\ v_i, v_j \in V\}$ and $E_c$ is the set of cluster-cluster edges $E_c \subseteq \{(C_i, C_j) \mid i \neq j,\ C_i, C_j \in C\}$.
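The two definitions map naturally onto a small data structure. The following sketch is one possible encoding, with hypothetical class and field names, that checks the partition property and the edge constraints.

```python
from dataclasses import dataclass, field


@dataclass
class ClusteredGraph:
    vertices: set
    clusters: dict                      # cluster name -> set of member vertices
    vertex_edges: set = field(default_factory=set)   # pairs of vertices
    cluster_edges: set = field(default_factory=set)  # pairs of cluster names

    def validate(self):
        """Raise ValueError unless the clusters partition the vertex set
        and all edges join two distinct, existing endpoints."""
        seen = set()
        for name, members in self.clusters.items():
            if seen & members:
                raise ValueError(f"cluster {name!r} overlaps another cluster")
            seen |= members
        if seen != self.vertices:
            raise ValueError("clusters do not cover the vertex set exactly")
        for u, v in self.vertex_edges:
            if u == v or u not in self.vertices or v not in self.vertices:
                raise ValueError(f"invalid vertex edge {(u, v)!r}")
        for a, b in self.cluster_edges:
            if a == b or a not in self.clusters or b not in self.clusters:
                raise ValueError(f"invalid cluster edge {(a, b)!r}")
        return True
```

Moving a vertex between clusters then amounts to removing it from one member set and adding it to another, after which `validate` still holds.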

Given a series of clustered graphs $G_1, G_2, \ldots, G_n$, the goal of the algorithm is to produce a sequence of layouts $L_1, L_2, \ldots, L_n$, where $L_i$ is a drawing of $G_i$, such that the sets $V_i, C_i, E_{v_i}, E_{c_i}$ are assigned coordinates. Since the sequence of graphs $G_i$ is not known in advance, the algorithm is an online algorithm. The updates $U_i$ that can be performed between successive elements $G_{i-1}$ and $G_i$ are: adding or removing vertices, edges or clusters, and modifying the partition of vertices into clusters (i.e., moving vertices between clusters).

A key issue in incremental graph drawing is the stability of the layouts [145,155]. This

is important since a user looking at a graph drawing gradually becomes familiar with the

structure of the graph. We propose the following criteria for evaluating the quality of the

layout [23,145,155,199]:

1. The movement of clusters between successive drawings should be small. Specifically,

clusters that are not modified should remain in their previous position if possible.

The location of clusters plays an important role in the user’s mental map of the

graph.



2. The change in cluster size between successive drawings should be minimal when the

number of vertices in the cluster is similar. Unnecessarily large deviations in size

cause the user to be distracted.

3. Movement of vertices inside a cluster should be minimized. This improves layout

stability.

4. The size of each cluster Ci should be proportional to the number of vertices it

contains. This allows the user to quickly understand how the mobile objects are

distributed between cores.

5. In order to conserve screen space, the drawing of each cluster Ci should be compact.

6. In order to reduce graph cluttering, overlapping between vertices should be avoided

and overlapping between cluster boundaries should be minimal.

Our application to software visualization adds an additional requirement. The vertices in

each cluster are divided into two subsets, static objects that remain at the same cluster

throughout the animation and movable objects. This should become visually apparent.

Note that there are classical aesthetic criteria such as the number of edge crossings,

the total edge length, etc. which we ignore here. However, the underlying static algorithm

used addresses these criteria.

6.3 The Algorithm

Given a sequence of clustered graphs G1, G2, . . . , Gn, our goal is to compute a sequence of

graph layouts L1, L2, . . . , Ln, so as to adhere as much as possible to the criteria discussed

in Section 6.2. A possible approach is to develop an incremental algorithm for drawing

clustered graphs from the ground up. A different approach, which we have pursued, is to

use an existing non-incremental graph layout algorithm as a basic block, and build the

incremental layout capability on top.

Among the different classes of graph drawing algorithms, the force directed algorithm

class seems to be the natural choice in our case [18, 40, 48, 113, 199]. Roughly speaking,

this approach simulates a system of forces defined on the input graph and outputs a local

minimum energy configuration. An edge is simulated by a spring connecting its endpoint



vertices. Edge length influences the optimal spring length and edge weight determines its

stiffness. The algorithm converges towards a minimum energy position, starting from an

initial placement of the vertices. In our case, the previous layout, Li−1, can be used as a

starting position for the new layout, Li. Extending a force directed algorithm to perform

a layout of clustered graphs is discussed in Section 6.3.2.

Our algorithm's requirements from the underlying force-directed static layout algorithm are that there exist ways to assign initial coordinates to vertices, to restrict their movement, to set edge lengths and to add support for drawing clusters. Since few assumptions are made regarding the underlying layout algorithm, a wide variety of existing layout tools can be used. As such, our algorithm can add incremental layout capabilities to most existing packages.

In our implementation we use the GraphViz graph drawing package [53] and its force

directed layout component, Neato [78, 113]. Neato avoids overlaps between vertices and

allows setting preferred edge lengths and weights. It also allows pinning down vertices.

Pinned vertices are not moved while the algorithm converges by moving vertices according

to the forces acting on them. However, Neato neither supports clustered graphs nor does it

support controlling the repulsive forces between vertices. These deficiencies are addressed

by our algorithm, as will be described next.
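As a rough illustration of the Neato features mentioned above, the sketch below emits a DOT description using the standard Graphviz attributes `pos` (a trailing `!` pins the vertex), `len` (preferred edge length) and `weight`. The function name and input format are hypothetical, and this is not the code used in the thesis.

```python
def to_neato_dot(positions, pinned, edges):
    """Build a DOT graph for neato.

    positions: dict name -> (x, y) initial coordinates
    pinned:    set of names whose position must not change
    edges:     iterable of (u, v, length, weight)
    """
    lines = ["graph G {"]
    for name, (x, y) in positions.items():
        bang = "!" if name in pinned else ""
        lines.append(f'  "{name}" [pos="{x},{y}{bang}"];')
    for u, v, length, weight in edges:
        lines.append(f'  "{u}" -- "{v}" [len={length}, weight={weight}];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be written to a file and passed to the `neato` command, which starts from the given positions and leaves pinned vertices in place.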

We adopt the proposition made in [155] that vertex stability is more crucial than

edge stability. Specifically, we prefer changing edge lengths rather than moving vertices.

Moreover, in our case, cluster stability is more significant than vertex stability. Thus,

our algorithm utilizes the following key ideas.

First, dummy vertices and edges are used in order to create a clustered structure.

Since clusters are treated as vertices, their motion can be controlled. Second, invisible

place-holder vertices are used in order to minimize the movement of clusters and of

vertices within clusters. This is done while maintaining compactness and keeping the

size of the clusters proportional to the number of vertices they contain. Third, edge

length and weight are used as a means of controlling the changes made to the layout.

Fourth, to both achieve dynamic stability and distinguish between stable and movable vertices, the set of vertices is partitioned into two subsets – stable and movable. The

subsets are laid out in a structure that approximates two concentric circles around the

center of the cluster. Static objects are placed in the inner circle and movable objects in



the outer one.
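The target geometry of the fourth idea can be pictured with a small sketch that places static vertices on an inner circle and movable vertices on an outer one. It only shows the intended arrangement; in the algorithm itself the arrangement is approximated through the force-directed layout, and the radii and function names here are assumptions.

```python
import math


def ring(center, radius, names):
    """Spread names evenly on a circle of the given radius around center."""
    cx, cy = center
    n = len(names)
    return {
        name: (cx + radius * math.cos(2 * math.pi * k / n),
               cy + radius * math.sin(2 * math.pi * k / n))
        for k, name in enumerate(names)
    }


def concentric_layout(center, static, movable, r_inner=1.0, r_outer=2.0):
    """Static vertices on the inner circle, movable vertices on the outer one."""
    pos = ring(center, r_inner, static)
    pos.update(ring(center, r_outer, movable))
    return pos
```

Keeping movable vertices on the outside makes them visually distinct and leaves the static core of each cluster undisturbed when they migrate.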

These ideas are elaborated in this section. After outlining the algorithm, various

phases and aspects of the algorithm are discussed in detail, including cluster support,

minimization of visual changes, and animations of graph updates.

6.3.1 Overview

To compute layout Li, only the last layout, Li−1, and the new graph that needs to be

laid out, Gi, are used. This is a fast and simple approach that fits well with the view

that incremental layout performs some local changes in the graph. In other words, the

previous layout is considered as a good starting point for the new layout, with some

adjustments made according to the changes that occurred.

The first step in computing the new layout, described in Section 6.3.4, is a merge
stage, which merges layout Li−1 and graph Gi. In the second stage, an actual layout,
L1i, is computed using a static force directed layout algorithm with the modifications
described in Sections 6.3.2–6.3.3. In the third stage, the quality of this layout is checked,
as described in Section 6.3.5. If the layout is deemed satisfactory, it is accepted and
Li = L1i. Otherwise, a second layout attempt is performed, producing layout L2i. During
this attempt, more freedom is given to the layout algorithm in terms of moving vertices,
at the expense of weakening the connection between the old and the new layouts. The
better of L1i and L2i is selected as the final drawing Li. The final stage of the algorithm,
described in Section 6.3.6, animates the change between the drawings Li−1 and Li in a
smooth manner. The algorithm is summarized in Figure 6.3.

6.3.2 Supporting Clusters

Adding an invisible dummy attractor vertex to each cluster, to which all of the vertices in

the cluster are connected with invisible edges, is proposed in [25], where repulsive forces

are also used, in order to increase cluster separation. One of the approaches discussed is a

divide and conquer algorithm, in which the clusters are first laid out separately and then

the different layouts are composed together. A hybrid approach that solves the problem

of neglecting inter-cluster edges, caused by this algorithm, is discussed in [206].

We follow the approach of adding a dummy vertex to each cluster. However, separation
between the clusters and meeting the other requirements described in Section 6.2
is achieved differently. It is accomplished through proper settings of edge lengths and
weights, as described below.

procedure incremental_drawing ( Li−1, Gi )
    Gmi = merge_graphs ( Li−1, Gi )
    L1i = layout_graph ( Gmi )
    if ( L1i is good enough )
        Li = L1i
    else
        L2i = layout_graph ( modify_graph ( L1i ) )
        Li = better ( L2i, L1i )
    animate_change ( Li−1, Li )

Figure 6.3: Algorithm overview in pseudo-code

Five kinds of edge lengths are utilized and indicate the expected level of proximity

between their adjacent vertices. The shortest length is assigned to the invisible edges

connecting static vertices to the dummy vertex of the cluster they belong to. The edges

connecting movable vertices and the dummy vertex are assigned longer lengths. This

creates a layout that resembles two concentric circles. The next type of edges is the

edges between vertices. If both vertices at the endpoints of the edge are contained in the

same cluster, a shorter length is set than if the vertices are in different clusters. This

increases the separation between clusters. The last kind of edges are cluster-cluster edges.

The length of these edges is variable and depends on the requested proximity between

the different clusters, which is determined by the application, e.g., by the amount of

interaction between clusters.

Edge weights are also used in our algorithm. Higher edge weights instruct the un-

derlying force-directed algorithm to try harder to generate edges with lengths close to

the optimal lengths supplied to the algorithm (as discussed above). Inter-cluster edges

are assigned lower weights than intra-cluster edges. This is done in an attempt to give

inter-cluster edges less influence on the layout. This is important when vertices move

between clusters. In such cases, it is preferable to stretch or shorten the length of the

edges somewhat, rather than displace vertices.



In our implementation, the lengths assigned to the edges connecting a static vertex to

a dummy vertex, a movable vertex to a dummy vertex, two regular vertices in the same

cluster and two regular vertices located in different clusters, are 1, 2, 1.5 and 4 units of

length, respectively. The lengths assigned to cluster-cluster edges vary between 5 and

6 units, where the dummy vertices are used as endpoints for cluster-cluster edges. The

weight of inter-cluster edges is set to 1 unit and the weight of intra-cluster edges is set to

2.5 units. These values represent a compromise between stable layouts and aesthetic ones.

Allowing the user control over these parameters will tailor the visualization to the user’s

preferences.
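The edge-length settings listed above can be sketched as a small lookup, shown here in Python. This is only an illustration of the values quoted in the text, not our implementation: the names EdgeKind, vertex_edge_length, dummy_edge_length, cluster_edge_length and dot_edge are hypothetical, and the linear mapping from a requested proximity to a length between 5 and 6 units is an assumption. The last helper formats an edge using the Graphviz len attribute, which Neato reads as the preferred edge length.

```python
from enum import Enum


class EdgeKind(Enum):
    STATIC_TO_DUMMY = "static-dummy"
    MOVABLE_TO_DUMMY = "movable-dummy"
    INTRA_CLUSTER = "intra-cluster"
    INTER_CLUSTER = "inter-cluster"


# Preferred lengths quoted in the text, in units of length.
PREFERRED_LENGTH = {
    EdgeKind.STATIC_TO_DUMMY: 1.0,
    EdgeKind.MOVABLE_TO_DUMMY: 2.0,
    EdgeKind.INTRA_CLUSTER: 1.5,
    EdgeKind.INTER_CLUSTER: 4.0,
}


def dummy_edge_length(is_stable):
    """Length of the invisible edge from a vertex to its cluster's dummy vertex."""
    kind = EdgeKind.STATIC_TO_DUMMY if is_stable else EdgeKind.MOVABLE_TO_DUMMY
    return PREFERRED_LENGTH[kind]


def vertex_edge_length(cluster_of, u, v):
    """Length of a regular vertex-vertex edge; cluster_of maps vertex -> cluster id."""
    same = cluster_of[u] == cluster_of[v]
    kind = EdgeKind.INTRA_CLUSTER if same else EdgeKind.INTER_CLUSTER
    return PREFERRED_LENGTH[kind]


def cluster_edge_length(proximity):
    """Length of a cluster-cluster edge, between 5 and 6 units.

    proximity is in [0, 1], where 1 requests the closest placement.
    The linear mapping is an assumption made for this sketch.
    """
    if not 0.0 <= proximity <= 1.0:
        raise ValueError("proximity must lie in [0, 1]")
    return 6.0 - proximity


def dot_edge(u, v, length):
    """Format an undirected DOT edge carrying the preferred length for Neato."""
    return f'"{u}" -- "{v}" [len={length}];'
```

Keeping the lengths in one table makes it straightforward to expose them as user-adjustable parameters.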

6.3.3 Minimizing Visual Changes

Invisible vertices, called spacer vertices, are added to each cluster, in an attempt to reduce

the change in clusters’ outlines and minimize the movement of clusters between successive

layouts.

The spacer vertices are used as place-holders for regular vertices in a cluster. They are

connected with invisible edges to the dummy vertex of the cluster to which they belong,

like any other vertex in the cluster. When a vertex is removed from a cluster, a spacer

vertex is added to the cluster instead of it. The initial location of the spacer vertex is set

to be the location of the vertex that left the cluster. This is done in order to keep the

size of the cluster constant and in order to reserve space for a new vertex that might be

added to the cluster in the future. When a vertex moves (or is added) to a cluster, the

spacer vertex that is closest to its previous location is replaced by this new vertex.

However, when adding or removing spacers, the algorithm keeps the number of spacers

in a cluster between an upper and a lower fraction of the number of vertices in the cluster.

This is done in order to give the algorithm breathing room when modifying clusters.

Moreover, the limits are set so as to avoid a case in which a cluster with a very small

number of regular, visible vertices occupies a large area due to the many spacer vertices

it contains.
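The spacer bookkeeping described above can be sketched as follows. This is a simplified illustration under stated assumptions, not our implementation: a cluster is modeled as a dictionary with a "vertices" map and a "spacers" list, the value of UPPER_FRACTION is hypothetical (the text does not give the limits), and only the upper limit is enforced, by dropping the most recently added spacers. Creating extra spacers to satisfy the lower limit is omitted.

```python
import math

# Hypothetical upper limit on spacers, as a fraction of the regular vertices.
UPPER_FRACTION = 0.5


def remove_vertex(cluster, name):
    """Remove a regular vertex and leave a spacer at its last position."""
    position = cluster["vertices"].pop(name)
    cluster["spacers"].append(position)


def add_vertex(cluster, name, previous_pos):
    """Add a vertex, taking over the spacer closest to its previous location."""
    spacers = cluster["spacers"]
    if spacers:
        nearest = min(spacers, key=lambda p: math.dist(p, previous_pos))
        spacers.remove(nearest)
        cluster["vertices"][name] = nearest
    else:
        # No spacer available: the vertex gets no initial coordinates
        # and is placed by the layout algorithm.
        cluster["vertices"][name] = None


def enforce_upper_limit(cluster):
    """Drop surplus spacers so a small cluster does not occupy a large area."""
    limit = math.floor(UPPER_FRACTION * len(cluster["vertices"]))
    del cluster["spacers"][limit:]
```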

When calculating the outline of each cluster, which is often simply the bounding

box, the spacer vertices are taken into account as if they were regular visible vertices.

Obviously, this minimization of the movements comes at the expense of extra screen



space, which is occupied by the spacers.

6.3.4 Merging Graphs

The first step in performing the incremental layout is merging the new graph to be drawn,

Gi, and the previous graph drawing, Li−1. The result of the merge stage is a partially

laid out graph, Gmi , in which some of the vertices are assigned initial coordinates. After

merging, the graph Gmi is laid-out by the static layout algorithm. The quality of the

resulting incremental layout depends on the initial conditions computed by the merging

algorithm.

Merging is performed in several steps. Unchanged and dummy vertices are assigned

initial coordinates from Li−1. Then, clusters to which vertices were both added and

removed are handled. The added and removed vertices of a cluster are paired-up, and

the initial coordinates of an added vertex are set to the coordinates of a removed vertex.

Then, vertices that were added to a cluster or removed from it, but cannot be paired-

up, are handled, as discussed in Section 6.3.3. Next, the vertices in new clusters, that

is clusters that exist in Gi but not in Li−1, are inserted into the graph without initial

coordinates, along with new spacer vertices. The number of the latter is set to a constant

fraction of the number of vertices in the cluster.

The last stage of merging involves vertex pinning, which restricts vertex movement,

allowing it to move only as an indirect result of the movement of an unpinned vertex. We

have experimented with several strategies for computing the set of vertices to be pinned.

Our conclusion is that pinning all vertices that were assigned coordinates achieves good

results in terms of the dynamic stability of the layout. We have also observed that in

most cases the resulting layouts are aesthetically pleasing.
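The merge steps above can be sketched roughly as follows. This is a simplified illustration rather than our implementation: the function name and data model are hypothetical, added and removed vertices are paired in iteration order (an assumption, since the pairing rule is not specified here), and the handling of unpaired vertices and spacers is left out.

```python
def merge_graphs(prev_positions, prev_cluster_of, new_cluster_of):
    """Compute initial coordinates and the pinned set for the new graph.

    prev_positions:  vertex -> (x, y) in the previous layout
    prev_cluster_of: vertex -> cluster id in the previous layout
    new_cluster_of:  vertex -> cluster id in the new graph
    Returns (initial_positions, pinned_vertices).
    """
    initial = {}

    # Step 1: vertices that stay in the same cluster keep their coordinates.
    for v, cluster in new_cluster_of.items():
        if prev_cluster_of.get(v) == cluster and v in prev_positions:
            initial[v] = prev_positions[v]

    # Step 2: collect, per cluster, the vertices added to it and removed from it.
    added, removed = {}, {}
    for v, cluster in new_cluster_of.items():
        if prev_cluster_of.get(v) != cluster:
            added.setdefault(cluster, []).append(v)
    for v, cluster in prev_cluster_of.items():
        if new_cluster_of.get(v) != cluster:
            removed.setdefault(cluster, []).append(v)

    # Step 3: pair them up; an added vertex inherits a removed vertex's position.
    for cluster, new_vertices in added.items():
        for new_v, old_v in zip(new_vertices, removed.get(cluster, [])):
            initial[new_v] = prev_positions[old_v]

    # Final step: pin every vertex that was assigned coordinates.
    pinned = set(initial)
    return initial, pinned
```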

6.3.5 Improving the Layout

After computing the graph layout L1i, a cluster density metric determines whether the
layout is of satisfactory quality. For a cluster Ci, we define

\[ \text{density\_metric}(C_i) = \frac{\text{area}(\text{bounding\_box}(C_i))}{\text{number\_of\_vertices}(C_i)}. \]

That is, the density metric of a cluster is the ratio between the area of its bounding
box and the number of vertices it contains. Higher values imply that the vertices in the
cluster are spaced further apart, which is not desirable. For the entire graph G we define

\[ \text{density\_metric}(G) = \max_{C_i \in G} \text{density\_metric}(C_i). \]

Experience has shown that a correlation exists between high density metric values and

overlaps between clusters.
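The two definitions above translate directly into code. The sketch below is a minimal illustration that assumes each cluster is given as a list of vertex center points; vertex sizes are ignored when computing the bounding box, and the function names are hypothetical.

```python
def cluster_density(points):
    """Bounding-box area of a cluster divided by its number of vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return area / len(points)


def graph_density(clusters):
    """Density metric of the graph: the maximum over all of its clusters.

    clusters maps a cluster id to the list of (x, y) points it contains.
    """
    return max(cluster_density(points) for points in clusters.values())
```

For example, four vertices at the corners of a 2 by 3 rectangle give a cluster density of 6 / 4 = 1.5.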

A second layout, L2i, is computed if the value of the graph density metric exceeds
a threshold. To improve the layout, the restrictions on vertex movement are relaxed.
The layout algorithm is re-run with the positions of the vertices in L1i as the initial
condition. This time the vertices are not pinned down. This gives the layout algorithm
more freedom and allows it to converge to a better result. The new layout L2i still
resembles L1i because of the supplied initial condition. The final layout is selected as the
layout with the lower density metric between L1i and L2i. Clearly, the choice between
L1i and L2i demonstrates the tradeoff between preserving the mental map and creating
an aesthetically pleasing layout. It should be noted that initial attempts to use more
relaxed constraints when computing L2i, such as removing some of the assigned vertex
coordinates, were counterproductive.
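The two-attempt scheme described in this section can be sketched as a small control routine. It is an illustration only: layout_graph and density_metric are passed in as callables because the actual layout engine is external, and all names and signatures here are hypothetical.

```python
def choose_layout(graph, initial, pinned, layout_graph, density_metric, threshold):
    """Return the first layout if it is dense enough, else the better of two.

    layout_graph(graph, initial_positions, pinned_vertices) -> layout
    density_metric(layout) -> float, where lower values are better
    """
    # First attempt: vertices with assigned coordinates are pinned.
    first = layout_graph(graph, initial, pinned)
    if density_metric(first) <= threshold:
        return first

    # Second attempt: start from the first layout, with nothing pinned.
    second = layout_graph(graph, first, set())

    # Keep the layout with the lower density metric; a tie keeps the first.
    return min((first, second), key=density_metric)
```

Preferring the first layout on a tie keeps the result closer to the previous drawing.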

6.3.6 Display and Animation

We have investigated display in three dimensions, as illustrated in Figure 6.4, in order to

distinguish between vertex types and edge types. Vertex-vertex edges are drawn on the

lower plane, while cluster-cluster edges are drawn on the upper plane. In 3D, a cluster

is drawn as a semi-transparent pyramid with the cluster’s dummy vertex, which is the

endpoint of cluster-cluster edges, drawn at the apex of the pyramid. One of our guidelines

in creating this visualization is being able to collapse the 3D view into a 2D view in a

natural and comprehensible way, as illustrated in Figure 6.5, which shows a 2D drawing

of the graph from Figure 6.4. Color is also employed in order to help the user comprehend

the image – each cluster has a different color.

The transition between Li−1 and Li is performed using a sequence of intermediate

drawings generated by a linear interpolation of the coordinates of vertices, edges and

cluster boundaries.
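The interpolation of vertex coordinates can be sketched as below. This is a minimal illustration that handles only vertices present in both layouts; how appearing and disappearing vertices, edges and cluster boundaries are animated is not covered, and the function name is hypothetical.

```python
def interpolate_layouts(old, new, steps):
    """Return steps + 1 frames moving linearly from layout old to layout new.

    old and new map a vertex to its (x, y) position.
    """
    common = old.keys() & new.keys()
    frames = []
    for k in range(steps + 1):
        t = k / steps  # 0.0 at the old layout, 1.0 at the new one
        frame = {
            v: ((1 - t) * old[v][0] + t * new[v][0],
                (1 - t) * old[v][1] + t * new[v][1])
            for v in common
        }
        frames.append(frame)
    return frames
```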



Figure 6.4: 3D view of a clustered graph

Figure 6.5: 2D view of a clustered graph

6.4 Visualizing Mobile Object Software

Our layout algorithm has been used in the visualization of mobile object applications [38,

102, 130]. This framework extends the distributed objects concept, where objects can

migrate to remote hosts, along with their state and behavior, during the execution of the

application. The visualization should expose the connections, interactions and movements

of the objects that are distributed throughout a computer network. We discuss this

application in detail in Chapter 7. Here, we briefly review the visualization.

In our visualization, every object is depicted by a vertex. Connections between objects

are drawn as vertex–vertex edges. Each machine is represented by a cluster that contains



all of the objects currently residing on that machine. The set of cluster–cluster edges is

used to display physical connections between machines, as opposed to logical relations

that exist between objects.

Our algorithm is demonstrated in Figures 6.6 and 6.7 as well as in Figures 6.1 and 6.2.

A movie showing results is available at http://www.ee.technion.ac.il/∼ayellet/Movies/-

FrishmanTal-1.mov . The algorithm was tested on several graph sequences. Some of them

represent executions of real mobile object applications and others represent simulated

data.

To measure the quality of the resulting layouts, we identify several criteria. The first

is the density metric discussed in Section 6.3.5, which is used to measure the compactness

of the layout. The second is the sum of displacement of clusters between each pair of

successive layouts, which is used to measure the stability of the layout. The third is

the percentage of clusters with the same size between successive layouts, which helps to

demonstrate the effectiveness of using spacer vertices in minimizing visual changes to the

graph.
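The second and third criteria can be computed along the following lines. The sketch assumes that a cluster's position is represented by a single point, such as the center of its bounding box, and its size by a (width, height) pair; these representations and the function names are assumptions made for illustration.

```python
import math


def cluster_displacement(prev_centers, new_centers):
    """Sum of the distances moved by clusters present in both layouts."""
    common = prev_centers.keys() & new_centers.keys()
    return sum(math.dist(prev_centers[c], new_centers[c]) for c in common)


def fraction_same_size(prev_sizes, new_sizes, tol=1e-9):
    """Fraction of clusters whose (width, height) is unchanged between layouts."""
    common = prev_sizes.keys() & new_sizes.keys()
    if not common:
        return 0.0
    same = sum(
        1
        for c in common
        if all(abs(a - b) <= tol for a, b in zip(prev_sizes[c], new_sizes[c]))
    )
    return same / len(common)
```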

The performance of our algorithm is compared to two other algorithms. The first

is a non-incremental algorithm which computes each layout from scratch using force-

directed methods. The second is a variant of our incremental algorithm in which vertices

are assigned initial coordinates computed in the merge stage, but vertex pinning and

spacer vertices are not used. We use this second algorithm in order to show that simply

reusing the initial coordinates from the previous graph does not yield satisfactory results.

Figure 6.6 shows a comparison of the layouts computed by the three algorithms. Note

that only our algorithm manages to compute stable layouts.

Figures 6.8-6.11 present a quantitative comparison of our algorithm to the two other

algorithms. The density metric is plotted in Figure 6.8. Higher values in the graph

represent sparse clusters, which should be avoided. All three algorithms produce simi-

lar results, which means that the incremental algorithm manages to compute compact

layouts of the graph. Figure 6.9 shows the sum of the displacements of clusters between

each pair of successive layouts. Lower values imply higher stability in the location of

clusters. As can be seen, our algorithm outperforms the other algorithms. Figure 6.10

depicts the number of clusters that maintain their size between each pair of successive

layouts. Higher values imply that there are fewer modifications to cluster outlines. It is
clear from the graphs that our algorithm produces much better results than the other
algorithms. Finally, Figure 6.11 depicts the running times of the algorithms. Both incremental
algorithms take more time to compute than the non-incremental algorithm. This
is mostly due to the extra processing done in the merge stage.

Figure 6.6: Comparing the three layout algorithms. (a) Non-incremental layout. (b) Incremental layout without using pinning and spacers. (c) Our incremental layout.

Table 6.1 summarizes the average values of each of the above metrics. All algorithms

produce similar cluster densities. The cluster displacement of our algorithm is by far supe-

rior to the non-incremental algorithm, averaging about one twelfth of the non-incremental

algorithm. Reducing the movement of clusters has indeed been one of the main design

goals of the algorithm. The average percentage of clusters that remain with the same size

in our algorithm is about four times as much as the non-incremental algorithm. This is

facilitated by the spacer vertices that are used to minimize visual changes to the graph.

Finally, the running times of both incremental algorithms are about twice the running time

of the non-incremental algorithm, which is reasonable.
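The ratios quoted in this paragraph can be checked against the averages reported in Table 6.1. The worked arithmetic below uses only those table values, comparing the non-incremental algorithm with our algorithm (with vertex pinning) and, for running time, with both incremental variants:

```latex
\begin{align*}
\text{cluster displacement:} \quad & \frac{4.0193}{0.3311} \approx 12.1 \\
\text{fraction of clusters with the same size:} \quad & \frac{0.615}{0.1575} \approx 3.9 \\
\text{running time:} \quad & \frac{1084}{492} \approx 2.20, \qquad \frac{1076}{492} \approx 2.19
\end{align*}
```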




We have presented an online algorithm for incremental layout of clustered graphs. The

algorithm uses a force directed static layout tool as a basic building block. The key idea of

the algorithm is to establish priorities of avoiding changes. First and foremost, movement

of clusters should be avoided, because clusters give insight into the basic structure of the

graph. Then, movement of vertices should be avoided, since vertices convey information

regarding the size of the clusters and aid in navigating the graph. Movement of edges is

considered the least critical.

To achieve this, our algorithm incorporates a few novel concepts. First, crucial vertices

(dummy and old) are pinned down. Second, invisible place-holders are used to minimize

changes. Finally, lengths and weights of edges are used to control both vertex placement

and graph modifications.

It has been demonstrated that the algorithm computes a compact and space efficient

graph layout, while minimizing the displacement and changes to clusters between layout

iterations.

The algorithm has been applied to the visualization of mobile object environments,

where both real and simulated data have been tested. Good results have been achieved

at the expense of higher running times. This is due both to the added complexity of

the algorithm and to the fact that our implementation is only loosely coupled to the

underlying static layout tool.

In future research, we plan to investigate enhancements to our 3D display mode.

We would also like to extend the spacer vertices concept to drawing the cluster bound-

aries. Allowing some flexibility in fitting the boundary around the vertices in the cluster

might improve the layout. An additional layout stage where each cluster is modeled as a

non-uniform node could help improve cluster separation [35]. Finally, using stronger con-

straints when a second layout is necessary might further improve the dynamic stability

of the algorithm.



Average / Algorithm                       non-incremental   no vertex pinning   with vertex pinning
density metric [area/vertices]            1.2516 × 10^4     1.1994 × 10^4       1.1936 × 10^4
cluster displacement [distance]           4.0193            1.4118              0.3311
fraction of clusters with the same size   0.1575            0.23                0.615
running time [ms]                         492               1076                1084

Table 6.1: Average results of an animation sequence



Figure 6.7: Sample animation sequence (from left to right and top to bottom)



[Plot omitted: three panels, (a) Non-incremental, (b) Without vertex pinning, (c) With vertex pinning. x-axis: Layout number; y-axis: Density metric [area/nodes]. Legend: final, layout1, layout2.]

Figure 6.8: Density metric

[Plot omitted: three panels, (a) Non-incremental, (b) Without vertex pinning, (c) With vertex pinning. x-axis: Layout number; y-axis: Displacement.]

Figure 6.9: Sum of cluster displacements

[Plot omitted: three panels, including (b) Without vertex pinning. x-axis: Layout number; y-axis: Number of clusters.]

Figure 6.10: Number of clusters with the same size

[Plot omitted: three panels, including (b) Without vertex pinning. x-axis: Layout number; y-axis: Time [ms]. Legend: merge, layout1, layout2.]

Figure 6.11: Running times




Chapter 7

MOVIS: A system for Visualizing Distributed Mobile Object Environments

This chapter presents MOVIS – a system for visualizing mobile object frameworks. In

such frameworks, the objects can migrate to remote hosts, along with their state and

behavior, while the application is running. The graph–based visualization algorithm,

described in Chapter 6, is used to depict the physical and the logical connections in the

distributed object network. Scalability is achieved by using a focus+context technique

jointly with a user-steered clustering algorithm. In addition, an event synchronization

model for mobile objects is presented. The system has been applied to visualizing several

mobile object applications. This chapter is based on [13,64,67].

The rest of this chapter is structured as follows: Section 7.1 gives an introduction.

Section 7.2 discusses related work. In Section 7.3, the requirements of a mobile object vi-

sualization system are discussed. Our visualization is presented in Section 7.4. Section 7.5

addresses consistency. Section 7.6 discusses scalability. Section 7.7 discusses implemen-

tation issues. Results are presented in Section 7.8. Finally, Section 7.9 concludes and

discusses future directions.

7.1 Introduction

In recent years, distributed objects have become prominent in the design of distributed

applications [158]. Mobile objects are a natural evolution of the distributed objects

concept [4, 102, 103, 144, 159]. The mobile object paradigm allows programs to migrate
to remote hosts while they are running. It offers scalability, availability and flexibility
advantages compared to other methods of creating distributed applications. However,
such systems are more difficult to design and debug, two tasks in which visualization
can greatly assist. This chapter addresses the challenging problem of visualizing mobile
objects.

Figure 7.1: MOVIS user interface. Small rectangles represent mobile objects. Color
stripes show their movement history. Big rectangles represent the cores the objects reside
in. Dashed lines represent physical communication between cores. Higher communication
frequency is indicated by a higher frequency of alternation in the lines. Solid lines
represent logical connections between objects. The square in the middle of the figure
represents several cores which have been collapsed. The rectangle with a double boundary
was selected by the user as the current focus of attention core.

Mobile objects have two distinctive features. The first feature is code mobility : objects

can migrate to remote hosts, together with their state and behavior, while the application

is running. We refer to the processes hosting mobile objects as cores. The second feature

of mobile objects is location transparency, which allows the programmer to make calls to

objects regardless of their current location. Since the location of objects may change over



time, provisions must be supplied in order to track referenced objects. Unlike regular

distributed objects, in which the location of a remote object is fixed, when making a call

using a reference to a mobile object, the parameters may pass through several intermedi-

ary cores until reaching the called object. The introduction of intermediary cores allows

for a more scalable, lazy update of the location of a referenced object [102].

Although research on mobile objects is widespread, visualization of such frameworks

has hardly been done. As far as we know, the only work in this field includes [207,

211]. These systems fall short in several aspects, including the types of events generated
and visualized, visualization consistency, and scalability, which are addressed in this
chapter.

This chapter makes the following contributions: First, we present MOVIS (Mobile

Object Visualization), a system for visualization of distributed mobile object environ-

ments. Second, we discuss the requirements of a visualization system for mobile objects.

Third, a graph-based visualization that concurrently shows the physical connections in

the computer network as well as the logical relations between the mobile objects is pre-

sented (see Figure 7.1). Fourth, a context-sensitive focus+context fisheye type display

technique is suggested in order to provide hierarchical information display and support

scalability. Fifth, a clustering algorithm, which is affected by nodes of interest to the

user, is presented. Sixth, we present a model for event synchronization that is used to

guarantee visualization consistency. Finally, we propose a method in which events are

automatically generated, avoiding additional work by the programmer of the application.

7.2 Related Work

Several tools have been developed for visualizing parallel and distributed programs [124].

The PVanim system [201] is a toolkit for creating visualizations of the execution of PVM

programs. PARADE [187] is an environment for developing visualizations of parallel and

distributed programs. In [146], tracing of CORBA [158] remote procedure calls is used to

analyze runtime activities and look for anomalous behavior. Vade [151] is a distributed

algorithm animation system in which visualizations can be created and executed on a

web page on the client’s machine. Pablo [173] provides analysis and presentation of

performance data for massively parallel distributed memory systems. Jinsight [167] is a



system for the visual exploration of the run-time behavior of complex Java programs.

Although research on mobile objects is widespread, visualization of such frameworks

has hardly been done. In [207], a modification of the process-time diagram, adopted

from XPVM [120], is used as the means of visualization. The creation, destruction

and movement of mobile objects are visualized. Event synchronization is handled by

timestamps and ordering rules. This system has a few drawbacks. It requires manual

annotation of source code in order to generate events. The system does not visualize

the physical connections between machines nor does it display the logical connections

between objects. Finally, it is not scalable. In [211] a visualization tool used to debug

mobile objects is presented. This tool is concerned mainly with checking the mobility of

objects as a function of time and identifying movement hotspots. Visualizations offered

include an object location display and movement history for an object. The system does

not visualize communication between objects or between computers hosting the objects.

Visualization consistency and scalability to large numbers of objects are not addressed.

In this chapter we discuss a different approach to the visualization of mobile object

frameworks, attempting to solve these problems.

7.3 Requirements

This section discusses the requirements of a visualization system for mobile objects:

1. Physical and logical visualization: A mobile object application has two dis-

tinct, yet related facets. The first is the physical computer network with the interconnec-

tions between the cores. The second is the logical network of mobile objects that can be

used to show the connections and interactions between objects. The visualization should

display both of these facets. This is important in order to easily detect cases where

closely interacting system components are placed on distant nodes. Using the visualiza-

tion the system architect will detect this inefficiency and modify the logic and layout of

the application in order to place such objects close together.

2. Interesting events: In any visualization system, the events that need to be

visualized greatly affect the design of the system. In the case of mobile objects, the

following interesting events should be visualized:

• Object Movement: The movement of objects between cores while the application



is running is the main difference between mobile object frameworks and regular

distributed applications. Therefore, a clear and concise depiction of such activities

is of great importance.

• Construction/destruction: Being a dynamic, distributed application, both objects

and cores may be added or removed during the execution of the application.

• Communication: Being distributed in nature, the messages sent between the differ-

ent parts of the system play a paramount role during execution of the application

and therefore provisions to visualize them should be supplied.

Event generation should be transparent both to the programmer and to the user of

the application. Moreover, care must be taken in order to reduce the perturbation of the

application caused by generating the events.

Visualizing movement makes it easy to identify cases where objects migrate
too often, which is inefficient. Visualizing communication can help expose closely
interacting objects, which should be placed in close proximity.

3. Consistent depiction: In a distributed, asynchronous environment there is no

global clock that can be used to synchronize events. This may lead to inconsistent

visualizations in which, for example, a message is shown to be received before it is sent.

One of the challenges in visualizing distributed systems is creating an animation that

provides a consistent depiction of events. This is especially challenging for mobile objects,

since parts of the application change their physical location during execution.

4. Scalability: One of the main challenges in software visualization is building a scal-

able visualization. This is especially important when dealing with networks of computers,

which can potentially generate massive amounts of information. A visualization system

should be able to process large amounts of data. This should be done while avoiding

swamping the user with unnecessary information and without slowing the response of the

visualization system to a point where it is no longer useful.

The user should be able to steer the visualization system to display relevant and

interesting data out of the large amount of information collected. This control should

be interactive, allowing the user to feed back to the system new requests based on the

knowledge accumulated while viewing the unfolding visualization. This will allow the user



to easily study interesting parts of a large distributed system, for example ones which

require tuning.

5. Dynamic graph layout: A graph is a natural way to represent the structure of

a software system. In the context of mobile objects, a dynamic, clustered graph is used.

Since the graph is dynamic, it is important to produce stable layouts that help maintain

the user's mental map of the system. This is required in order to avoid distracting the
user with confusing changes to the way the graph looks each time it changes.

In the following sections we describe how MOVIS addresses these requirements.

7.4 Physical and Logical Visualization

As discussed in Section 7.3, two simultaneous networks are of interest: the physical

network of cores (machines) and the logical relations and interactions between mobile

objects. A graph is a natural choice for visualizing a distributed network. In our case,

we need to simultaneously visualize two graphs. This is done using a clustered graph, as

defined in Definition 6.2.2.

A clustered graph is a natural choice for displaying the simultaneous physical and

logical graphs, as demonstrated in Figure 7.1. Every mobile object is depicted by a node in

the graph. The logical connections between objects are shown using solid edges connecting

the nodes. In order to overlay the physical structure of the network, clusters are used.

Each core is represented by a cluster that contains all of the objects currently residing

in that core. Dashed cluster-cluster edges are used to represent physical connections

between cores (see Figure 7.1), as opposed to logical relations that exist between objects.

As discussed in Section 6.3.6 and demonstrated in Figures 6.4 and 6.5, the graph can

be displayed either in 3D or 2D. Color and transparency (in 3D) are used to help the

user comprehend the visualization.

We use several techniques and attributes in order to display information in this graph.

Each cluster boundary is drawn using a different color. This helps the user track the

different clusters while changes are performed to the graph during the visualization.

Each node is drawn using color strips, as shown in Figure 7.1. The strips are colored

according to the location history of the object. The bottom strip is the current location
(i.e., it is colored with the same color as the cluster the node currently resides in), the strip
above corresponds to the previous location, and so on. This "growing stacks" metaphor is
similar to the growing squares in [54].

In order to create a more scalable and meaningful display, we employ lazy construc-

tion of edges. Instead of cluttering the graph with node-node edges showing all of the

references between objects, an edge is drawn between two objects once a method call

between the objects is detected.

In addition to the existence of communication between objects or cores, the frequency

of this communication is of interest to the user. Line patterns are used to convey this

information. The higher the frequency of alternation in the dashed lines, the higher the

frequency of communication. See for example Figure 7.1. The sum of two weighted

averages is used to calculate the amount of communication between cores. The first is

the average number of objects moving between the cores connected by the edge. The

second is the average number of remote invocations performed between the two cores.

The averages are calculated using a weighted sliding window, taking the last N samples

into account.
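The edge weight computation described above can be sketched as follows. This is a minimal illustration, not the MOVIS implementation: the class name `WeightedWindow`, the function `communication_amount` and the linear recency weights are assumptions, since the text only states that a weighted sliding window over the last N samples is used.

```python
from collections import deque


class WeightedWindow:
    """Weighted average over the last `size` samples (hypothetical helper)."""

    def __init__(self, size):
        # deque with maxlen silently drops the oldest sample when full
        self.samples = deque(maxlen=size)

    def add(self, value):
        self.samples.append(value)

    def average(self):
        if not self.samples:
            return 0.0
        # Assumed weighting: linear in recency (oldest sample has weight 1).
        weights = range(1, len(self.samples) + 1)
        total = sum(w * s for w, s in zip(weights, self.samples))
        return total / sum(weights)


def communication_amount(moves, invocations):
    """Sum of the two weighted averages for one core-core edge."""
    return moves.average() + invocations.average()


# Example: N = 3, four sampling periods of (object moves, remote invocations)
moves = WeightedWindow(size=3)
calls = WeightedWindow(size=3)
for m, c in [(0, 2), (1, 4), (2, 6), (3, 8)]:
    moves.add(m)
    calls.add(c)
amount = communication_amount(moves, calls)
```

The resulting value could then be mapped to the dash frequency of the cluster-cluster edge; that mapping is not specified in the text and is left out here.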

Some mobile object frameworks [102] allow tagging of specific objects as stable, i.e.

objects that remain at the same location throughout their lifetime. This distinction

between stable and movable objects is visualized by laying out the objects in each cluster

using two concentric circles. The inner circle contains the stable objects while the outer

one contains movable ones.
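As an illustration of the two concentric circles, the following sketch places the stable objects of one cluster on an inner ring and the movable objects on an outer ring. The function name, the radii and the even angular spacing are assumptions made for the example; the text does not specify how positions on each circle are chosen.

```python
import math


def concentric_layout(stable, movable, center=(0.0, 0.0),
                      inner_radius=1.0, outer_radius=2.0):
    """Place stable objects on the inner circle and movable ones on the outer."""

    def ring(objects, radius):
        n = len(objects)
        positions = {}
        for i, obj in enumerate(objects):
            angle = 2.0 * math.pi * i / n  # assumed: evenly spaced on the circle
            positions[obj] = (center[0] + radius * math.cos(angle),
                              center[1] + radius * math.sin(angle))
        return positions

    layout = ring(stable, inner_radius)
    layout.update(ring(movable, outer_radius))
    return layout


layout = concentric_layout(stable=["db", "log"], movable=["agent1", "agent2", "agent3"])
```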

As discussed in Chapter 6, we have developed a special incremental graph layout

algorithm tailored for the requirements of mobile object visualization [63]. The algorithm

produces a dynamic display of clustered graphs, attempting to preserve the user’s mental

map of the graph, as it is being changed [145, 155]. The algorithm uses a static force-

directed layout algorithm as a basic building block [53,113,199]. It uses invisible dummy

nodes to create the clustered structure and place-holder nodes to maintain layout stability.

Edge length and weight are used as a means of controlling the changes made to the layout.

Animation is used in order to show different events. When a new graph layout is

performed, for example after an object moves between cores, the positions of nodes,

edges and clusters are linearly interpolated between the old and the new locations. A

method call between two remote objects is animated using a lightning bolt icon that

moves from the caller to the called object.
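The linear interpolation between the old and the new layout can be sketched as below. The function name and the handling of nodes that exist only in the new layout are assumptions for illustration.

```python
def interpolate_layout(old, new, t):
    """Linearly interpolate node positions between two layouts, 0 <= t <= 1."""
    frame = {}
    for node, (x1, y1) in new.items():
        # Assumption: a node that is new in this layout starts at its final position.
        x0, y0 = old.get(node, (x1, y1))
        frame[node] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return frame


old = {"a": (0.0, 0.0), "b": (4.0, 2.0)}
new = {"a": (2.0, 2.0), "b": (4.0, 6.0), "c": (1.0, 1.0)}
halfway = interpolate_layout(old, new, 0.5)
```

Calling the function for a sequence of values of `t` from 0 to 1 yields the animation frames.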



7.5 Visualization Consistency

One of the main challenges in visualizing distributed environments is the accurate depic-

tion of events. Since in asynchronous distributed systems there is no way of knowing the

real ordering of events, it is necessary to generate a visualization that is consistent with

the events [129].

We base our solution to event synchronization on [151], where consistency of dis-

tributed environments with static objects was addressed, and extend it to support mobile

object frameworks. In [151], the following is assumed:

1. There is a fixed (known) number of processes.

2. A process can perform two types of actions: sending a message to a different process

and an internal computation, possibly modifying the process’s local state. Receiving

a message is considered an internal action.

3. The communication network and processes are reliable.

4. Messages sent by a single process to another process arrive in the order they were

sent.

5. The network is asynchronous - there is no universal clock.

Since the visualization process is part of the distributed environment, it cannot know

the relative order of actions performed by different processes. A way to solve this difficulty

is to introduce semantic causality.

Definition 7.5.1. With respect to a given algorithm run r, we say that an event e in r

semantically causes e′, denoted by e→ e′, if one of the following holds:

1. e and e′ are on the same process, e occurs before e′ and the user specified that they

are semantically dependent.

2. e and e′ are on two different processes connected by a communication channel, e is

a send event and e′ is the corresponding receive event.

3. There is an event e′′ such that e→ e′′ and e′′ → e′.



Let e and e′ be two events of the algorithm. Let An(e) and An(e′) be the animation

segments of these events, respectively. We say that an animation An(e) precedes an

animation An(e′), denoted by An(e) ≺ An(e′), if An(e) completes before An(e′) starts.

The following theorem has been proved in [151]:

Theorem 7.5.1. An animation is consistent with the execution of the algorithm if and

only if for every two algorithm events e and e′, such that e→ e′ also An(e) ≺ An(e′).

That is, in order to ensure that the animation is consistent with the execution of

the algorithm, we have to ensure that for every two events e and e′, if e → e′ then

An(e) ≺ An(e′).
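The condition of Theorem 7.5.1 can be checked mechanically for a recorded animation. The sketch below assumes that the direct causality pairs and the start and end times of every animation segment are available; the function name and the data representation are hypothetical. Checking only the direct pairs is sufficient, because precedence of animation segments is transitive as long as no segment ends before it starts. A segment that ends exactly when the next one starts is treated as preceding it, which is also an assumption.

```python
def is_consistent(causes, animation):
    """causes: iterable of (e, e2) pairs with e -> e2 (direct dependencies).
    animation: maps each event to the (start, end) times of its segment."""
    for e, e2 in causes:
        _, end_e = animation[e]
        start_e2, _ = animation[e2]
        if end_e > start_e2:
            return False  # An(e) does not complete before An(e2) starts
    return True


causes = [("send", "recv"), ("recv", "reply")]
good = {"send": (0.0, 1.0), "recv": (1.0, 2.0), "reply": (3.0, 4.0)}
bad = {"send": (0.0, 1.0), "recv": (0.5, 2.0), "reply": (3.0, 4.0)}
```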

A possible implementation of this requirement is called receive synchronization. In this

method, reports of send and receive events are sent to the animation system immediately

after they take place and there is no delay in the execution of the algorithm. The

animation of the receive event is delayed until the corresponding send event has been

animated.

We now turn our attention to mobile object environments. The main differences

between this model and the distributed environments model, in the context of consistency,

are:

1. Assumption 1 is violated. Both cores and objects might join or leave the network.

2. Objects might move between cores.

3. Assumption 4 is violated. Since objects might move, messages sent by a single

object to another might be received out of order.

The first problem is addressed as follows. Dynamic creation and deletion of cores and

objects are modeled as internal messages. A core / object is introduced to the animation

system after its internal create event is received. A core / object is deleted from the

animation system once a deletion event is received and all preceding events have been

animated.

To solve the second problem, object movement between locations is modeled as a

method call between the sending and receiving cores. The parameters passed include the

state and behavior (code) of the object that is being moved from one core to the other.



This approach allows us to synchronize the events emitted by objects, even when they

migrate to remote cores. It is ensured that the movement event will be animated before

any events generated at the destination core are shown. Hence, when treating object

movement as a method call between cores, we are able to revert to using existing event

synchronization algorithms.

The third problem, out-of-order messages, should be solved by the middleware or the

application. It is not a visualization problem, but rather an inherent problem. When this

is solved, all that remains is to solve possible out-of-order reception of messages by the

visualization system. This can be done by adding an event counter to each object and

using the receive synchronization technique described above for visualization.
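The per-object event counter can be used to restore the emission order at the visualization side. The following sketch buffers events that arrive early and releases them in counter order. The class name and the convention that counters start at 0 are assumptions.

```python
class ReorderBuffer:
    """Releases the events of each object in the order of their counters."""

    def __init__(self):
        self.next_expected = {}  # object id -> next counter value to release
        self.pending = {}        # object id -> {counter: event}

    def receive(self, obj, counter, event):
        """Store the event and return all events of `obj` that are now in order."""
        waiting = self.pending.setdefault(obj, {})
        waiting[counter] = event
        expected = self.next_expected.get(obj, 0)  # assumed: counters start at 0
        released = []
        while expected in waiting:
            released.append(waiting.pop(expected))
            expected += 1
        self.next_expected[obj] = expected
        return released


buf = ReorderBuffer()
first = buf.receive("obj1", 1, "call B")   # arrives early, is held back
second = buf.receive("obj1", 0, "call A")  # fills the gap, both are released
third = buf.receive("obj1", 2, "call C")
```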

We have chosen to perform synchronization at the core level. Using a finer-grained

approach requires extensive profiling of the application, possibly considerably slowing

down the execution. Like regular distributed applications, each core is viewed as a sepa-

rate process. Events notifying about communication between cores and activities internal

to each core are emitted and synchronized. The internal events in each core are serialized.

This may add redundant dependencies between activities that are independent in a core

but is guaranteed to create a consistent visualization. The alternative of asking the user

to explicitly define dependencies is not viable in the context of our problem.

Messages sent between cores are modeled as messages sent between processes. The

dependency between receiving the parameters for a message call and forwarding the

parameters to the next core on the way to the destination core is handled automatically

since these are two events that occur at the same core, one after the other. This is also

true for messages sending the return value back to the caller core.

Events showing average information that is periodically updated are not synchronized.

For example, in our system events notifying the amount of communication between cores

are periodically generated, yet not synchronized.

7.6 Visualization Scalability

As the number of objects and cores increases, the visualization might get cluttered with

information. Gaining any insight from the visualization will become increasingly difficult.

In this section we present a context sensitive focus + context technique that alleviates



this problem.

7.6.1 Levels of Detail

The visualization should provide the user with an overview of the graph while at the

same time allowing focusing on specific, user-defined areas in order to get more detailed

information [29, 71]. To achieve these goals, a hierarchy of levels of detail is defined,

allowing different parts of the graph to be displayed in different levels of detail.

At the highest level, full information is displayed, as shown in Figure 7.2(a). The next

level of the hierarchy omits information about the objects residing in each core and the

logical connections between objects. Instead of displaying a cluster for each core in the

network, a single node is used to depict each core. As before, cluster–cluster edges are

used to convey the physical connections to other cores in the network, as demonstrated

in Figure 7.2(b). The final level combines several cores into one node in the display. This

allows collapsing un-interesting parts of the graph into a small display area while still

showing the user the overall structure of the graph. The size of such nodes is proportional

to the number of cores they depict. Figures 7.2(c) and (d) demonstrate graphs containing

nodes of various levels of detail. Note the stability of the layouts and the way nodes are

collapsed as the level of detail is decreased.

The user has several methods to control which parts of the graph will be displayed

in which level of detail. The first is selecting focal nodes (cores) that are of primary

interest to the user and thus should be displayed with full detail. The second method

is navigating the graph using zoom-in and zoom-out operations. The third is choosing

the total number of nodes to be displayed in the graph and letting the system cluster

the graph nodes accordingly. Once the user selects focus nodes, a clustering algorithm

is employed in order to decide at what level of detail each core will be displayed, as

described in the next subsection.

Zoom–in and zoom–out operations are animated smoothly. The old nodes fade out of

the graph while the new nodes fade in. Next, the new nodes smoothly move to their final

location. This helps maintain the mental map. A similar animation is performed when

re–clustering is performed. The locations of the new clusters are calculated by the layout

algorithm, which takes into account the previous locations of the nodes comprising the



(a) Original graph – 43 clusters (b) Clustering to 36 clusters

(c) Clustering to 25 clusters (d) Clustering to 15 clusters

Figure 7.2: Levels of detail. Several visualizations of the same mobile object network are

shown. Parts of the graph are progressively collapsed. Note the stability in the layouts

and the conservation of the overall structure of the graph.



cluster, thus maintaining layout stability.

In order to improve scalability, the synchronization scheme presented in Section 7.5

can be extended to a hierarchy of synchronization units, which is constructed according

to the hierarchical representation of the graph. Each level in the hierarchy contains a

synchronization unit. Events are forwarded to the next (higher) level only if they are not

contained in the current level in the hierarchy. Using this method, the number of events

reaching the higher levels of the hierarchy (which represent more cores) is significantly

reduced. In order to further reduce the volume of events, instead of showing movements

of objects using animation, this information can be time-averaged and visualized by

changing the frequency of the dashed lines connecting cores.

7.6.2 Clustering

A clustering algorithm is used in order to compute the hierarchical representation of

the graph. The clustering is influenced by the focal nodes, which are interesting nodes

selected by the user. A fisheye type effect is used, where nodes farther away from the focal

nodes are displayed with less detail. The algorithm, which is summarized in Figure 7.3,

is based on an extension of the agglomerative clustering algorithm.

Input: Set of focal nodes; distances between nodes; number of desired clusters

Algorithm:

1. Calculate shortest distance between each node and the closest focus node.

2. Update distances between nodes according to distance to focus node.

3. Perform hierarchical clustering.

Output: Clustering hierarchy of the nodes

Figure 7.3: Focus-based clustering algorithm

The algorithm has several inputs. The first is a set of focal nodes (e.g., cores of

interest), selected interactively by the user. The second is the distances between nodes,

designated D(u, v), which correspond to the weights of edges in the graph. They are

calculated according to the frequency of method calls and object moves between cores, as

described in Section 7.4. The third input is the desired number of clusters. The output

of the algorithm is a hierarchical clustering of the graph.



In the first step of the algorithm, the shortest distance between each node u and the

closest focal node, $D_{focal}(u)$, is calculated. This is done using Dijkstra’s algorithm on

the focal nodes. Additionally, the maximum of the minimal distances is computed as

$d_{max} = \max_{v \in V} D_{focal}(v)$, where $V$ is the set of nodes in the graph.

In the second step, the distances, D(u, v), between every pair of nodes u, v, are up-

dated according to their proximity to focal nodes. As opposed to the regular fisheye

technique, in which geometric distortion is used, our method moves the distortion to the

clustering phase. This results in a better layout since the graph is not distorted after

layout. We set the initial, joint average distance of nodes u and v and a focus node to

$$D_{focalavg}(u, v) = \frac{D_{focal}(u) + D_{focal}(v)}{2}.$$

It should be noted that the focal node used in $D_{focal}(u)$ may be different from the one

used in $D_{focal}(v)$. The distance $D(u, v)$ is distorted to form $D_{distorted}(u, v)$, the updated

distance between nodes $u$ and $v$, according to the following formula:

$$D_{distorted}(u, v) = \frac{D(u, v)}{1 + C \cdot \frac{D_{focalavg}(u, v)}{d_{max}}}.$$

The greater the average distance between the nodes and the closest focal node, the bigger

the distortion. This behavior mimics the fisheye effect. Nodes in the periphery are less

interesting and therefore have a higher probability of being clustered together, since they

are perceived to be close. In our implementation we use C = 3. Another option is to

have C depend on the size of the graph.
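The first two steps of the algorithm can be sketched as follows. This is a minimal illustration under several assumptions: the graph is given as a symmetric adjacency dictionary, it is connected, and the function names `focal_distances` and `distort` are hypothetical. The shortest distances are computed with a single run of Dijkstra's algorithm that is started from all focal nodes at once.

```python
import heapq


def focal_distances(adj, focal_nodes):
    """Shortest distance from every node to its closest focal node."""
    dist = {f: 0.0 for f in focal_nodes}
    heap = [(0.0, f) for f in focal_nodes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def distort(adj, focal_nodes, c=3.0):
    """Return the distorted distance for every (u, v) edge of the graph."""
    d_focal = focal_distances(adj, focal_nodes)
    d_max = max(d_focal.values())
    distorted = {}
    for u in adj:
        for v, w in adj[u].items():
            if d_max == 0.0:
                distorted[(u, v)] = w  # every node is focal: nothing to distort
                continue
            avg = (d_focal[u] + d_focal[v]) / 2.0
            distorted[(u, v)] = w / (1.0 + c * avg / d_max)
    return distorted


# Path a - b - c - d with unit distances; node a is the only focal node.
adj = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
distorted = distort(adj, ["a"])
```

In this example the edge next to the focal node keeps two thirds of its length, while the edge farthest from it shrinks to 1/3.5 of its length, so peripheral nodes appear closer to each other and are clustered first.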

In the last step of the algorithm the actual clustering is performed, using the distances

computed in the previous steps. A bottom–up, hierarchical clustering algorithm is used.

The algorithm starts with assigning every node to its own singleton cluster. It then

repeatedly and greedily joins the two closest clusters. The algorithm terminates when the

required number of clusters has been reached.

The distance between clusters $C_i$ and $C_j$ is calculated using a modified average distance

metric. Only edges (distances) in the set $E_{ij} = \{ e = (u, v) \in G \mid u \in C_i, v \in C_j \}$,

that is, edges directly connecting a node $u \in C_i$ and a node $v \in C_j$ (i.e., edges crossing

the boundary between the clusters), are taken into account. The distance is

$$Dist(C_i, C_j) = \frac{\sum_{(u,v) \in E_{ij}} D_{distorted}(u, v)}{|E_{ij}|},$$

i.e., the sum of the lengths of the edges divided by the number of edges. This formula is

a tradeoff between an exact calculation and a rapid, approximate calculation.
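The clustering step can be sketched as follows. The sketch is a simplified illustration: it returns only the flat clustering with the requested number of clusters rather than the full hierarchy, it uses a naive search over all cluster pairs, and it treats clusters without a connecting edge as infinitely far apart. These choices, as well as the function names, are assumptions.

```python
def cluster_distance(ci, cj, edges):
    """Average distorted length of the edges crossing between ci and cj."""
    crossing = [d for (u, v), d in edges.items()
                if (u in ci and v in cj) or (u in cj and v in ci)]
    if not crossing:
        return float("inf")  # assumption: unconnected clusters are never merged
    return sum(crossing) / len(crossing)


def agglomerate(nodes, edges, k):
    """Greedy bottom-up clustering until k clusters remain (k >= 1)."""
    clusters = [frozenset([n]) for n in nodes]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], edges)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d == float("inf"):
            break  # no connected pair of clusters remains
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Undirected edges with (already distorted) distances; each edge is listed once.
edges = {("a", "b"): 0.9, ("b", "c"): 0.5, ("c", "d"): 0.2}
result = agglomerate(["a", "b", "c", "d"], edges, k=2)
```

Here the peripheral pair c, d is merged first, followed by b, which leaves the node a in a cluster of its own.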

7.7 Implementation

MOVIS was implemented on top of FarGo [102], a Java-based mobile object framework.

FarGo contains extensive monitoring facilities [103] and uses a source–to–source compiler

called Fargoc for generating proxies and other code used to implement support for mobile

objects. Our implementation is Java based. We use the Java3D API for generating the

visualization.

Our system is composed of several components. In each core (machine), a special

local profiling object, used to collect events, is instantiated. This object listens both to

events generated by the Fargo monitor and to events generated by our modified Fargoc

compiler. The events generated by each core are forwarded to a main event collection

object. This object either stores the events for offline visualization or forwards them to

the event synchronization unit, described in Section 7.7.2. After creating a synchronized

event list, from which a consistent run of the application can be constructed, the events

are sent to the visualization component. Events generated by the user, such as requests

for re-clustering or zoom in / zoom out operations are fused together with the events

collected from the system, in order to form a unified event queue that is visualized.

7.7.1 Event Generation

One of the goals of a program visualization system is to generate events with minimal

effort by the programmer and the user of the application being visualized, while perturb-

ing the running application as little as possible. In this section we describe how this is

achieved.

The interesting events are related to communication between mobile objects and move-

ment of objects between cores. Since location transparency needs to be maintained when



communication is performed between mobile objects, some kind of proxy needs to be used

in order to forward the method call to the actual destination object. This proxy is gen-

erated either statically [102] or dynamically [4, 159]. This is where the event generation

code is (automatically) inserted.

In order to trace method calls, the Fargoc compiler was modified to transparently

generate an event each time execution enters an interface method of a mobile object.

Generating events for movement of objects between cores is implemented by piggybacking

onto the migration code supplied by the middleware. Other types of actions for which

events need to be generated include the creation and destruction of mobile objects and

cores (e.g., connecting/disconnecting from the application network). This is handled by

tapping into an existing profiling interface.
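The idea of generating events inside the proxy can be illustrated with a small sketch. It is written in Python for brevity and is not the code produced by Fargoc: the class `EventProxy`, the event fields and the `Mailbox` example are all hypothetical. The point is that the caller uses the proxy exactly like the target object, while every method call is reported as an event before it is forwarded.

```python
class EventProxy:
    """Forwards method calls to a target object and reports each call."""

    def __init__(self, target, name, report):
        self._target = target
        self._name = name
        self._report = report  # callable that receives the event

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself.
        member = getattr(self._target, attr)
        if not callable(member):
            return member

        def traced(*args, **kwargs):
            self._report({"type": "method_call",
                          "object": self._name,
                          "method": attr})
            return member(*args, **kwargs)

        return traced


class Mailbox:
    def __init__(self):
        self.messages = []

    def deliver(self, message):
        self.messages.append(message)
        return len(self.messages)


events = []
box = EventProxy(Mailbox(), "mailbox-1", events.append)
count = box.deliver("hello")
```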

7.7.2 Event Synchronization Component

The event synchronization component receives events from all of the event collection

objects located at the different cores that constitute the application to be monitored. It

reorders the events in order to generate a sequence of events that is consistent. This

stream of events is then visualized.

The implementation of the synchronization component follows several rules and

observations made in this chapter. The first is that all events generated at a core are

reported in FIFO order and each event depends on the previous event. The second is that

a send event must be reported to the visualization before the corresponding receive

event, since the receive event depends on it.

The algorithm is described in Figure 7.4.

For each core, the synchronization component maintains a queue of events. This queue

contains received events that cannot be forwarded to the visualization component, since

a dependent send event was not received yet by the synchronization component. We will

call the act of sending an event to the visualization component committing the event.

Committing a send event may be delayed since it is itself dependent on a previous

event that has not been committed yet.

When a new event is received by the synchronization component, the following is

done. First, a check is made if the core from which the event was sent is blocked, i.e.,

waiting for events. If this is the case, the event is added at the end of the event queue of



procedure handle_event(event e)
    if (e’s core is blocked)
        queue_event(e)
    else
        if (e is a send or internal event)
            commit_event(e)
        else // e is a receive event
            if (e depends on a committed event)
                commit_event(e)
            else
                queue_event(e)

procedure commit_event(event e)
    send_to_vis(e)
    if (e can unblock a core)
        BFS_unblock_core(e.getCore())

procedure BFS_unblock_core(core c)
    active_cores_list ← c
    while (active_cores_list ≠ ∅)
        c = remove_first(active_cores_list)
        if (c has more queued events)
            ec = c.nextEvent()
            if (ec can be sent)
                send_to_vis(ec)
                if (ec is a send event)
                    dest(ec) = ec destination core
                    if (dest(ec) blocked on ec)
                        add dest(ec) to active_cores_list
                add c to active_cores_list

Figure 7.4: Event synchronization algorithm



the core. If the core is not queuing events and the event is a send event - it is committed.

A check is made if there is another core that is blocked on this event. If this is the case,

events from the blocked core may be committed, according to their order in the queue.

If the newly received event is a receive event, a check is made to determine if the send

event that it depends on was already sent. If this is the case, the event is committed. If

this is not the case, the event is queued and its core enters the blocked state.

When a core unblocks, the queued events are committed. This, in turn, may cause

other cores to become unblocked (due to committing a send event that the blocked core

depends on). A list of active cores is maintained. Each time one event is committed from

a core, the activity switches over to the next core in the list. This is similar to advancing

in a graph using a BFS algorithm. The motivation for using this method is to create a

stream of events that will produce animation that is maximally parallel. Switching from

one core to another while committing events attempts to expose the possible parallelism

to the visualization component. The synchronization component can be modified to

produce a variety of interesting orderings, as described in [125].
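The following runnable sketch mirrors the structure of the algorithm in Figure 7.4. It is an illustration under stated assumptions rather than the MOVIS code: events are modeled as tuples carrying a core, a kind (`send`, `recv` or `internal`), a message identifier that pairs a receive with its send, and, for send events, the destination core. A core is considered blocked whenever its queue is not empty. The core that a committed send event may unblock is taken to be the destination of that send.

```python
from collections import deque, namedtuple

# Hypothetical event record; msg and dest default to None for internal events.
Event = namedtuple("Event", "core kind msg dest", defaults=(None, None))


class Synchronizer:
    def __init__(self):
        self.queues = {}              # core -> deque of uncommitted events
        self.committed_sends = set()  # message ids whose send was committed
        self.output = []              # events in the order sent to the visualization

    def _queue(self, core):
        return self.queues.setdefault(core, deque())

    def _can_send(self, e):
        return e.kind != "recv" or e.msg in self.committed_sends

    def _blocked_on(self, core, e):
        q = self._queue(core)
        return bool(q) and q[0].kind == "recv" and q[0].msg == e.msg

    def _send_to_vis(self, e):
        self.output.append(e)
        if e.kind == "send":
            self.committed_sends.add(e.msg)

    def handle_event(self, e):
        q = self._queue(e.core)
        if q or not self._can_send(e):
            q.append(e)  # the core is, or now becomes, blocked
        else:
            self._commit(e)

    def _commit(self, e):
        self._send_to_vis(e)
        if e.kind == "send" and self._blocked_on(e.dest, e):
            self._bfs_unblock(e.dest)

    def _bfs_unblock(self, core):
        active = deque([core])
        while active:
            c = active.popleft()
            q = self._queue(c)
            if q and self._can_send(q[0]):
                ec = q.popleft()
                self._send_to_vis(ec)  # commit one event, then switch cores
                if ec.kind == "send" and self._blocked_on(ec.dest, ec):
                    active.append(ec.dest)
                active.append(c)


# Reports arrive out of causal order: both receives are reported before their sends.
sync = Synchronizer()
for e in [Event("C", "recv", "m2"), Event("B", "recv", "m1"),
          Event("B", "send", "m2", "C"), Event("A", "send", "m1", "B")]:
    sync.handle_event(e)
order = [(e.core, e.kind, e.msg) for e in sync.output]
```

In the example, committing the send of `m1` unblocks core B, and committing B's queued send of `m2` in turn unblocks core C, so every receive appears after its send in `order`.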

7.8 Results

Our system has been used for visualizing several applications, including a mobile object

simulator, an e-commerce application [110] and a distributed e-mail system (abbreviated

DEM) [13]. We first present visualizations of our mobile object simulator and then

proceed to discuss the application of MOVIS to the DEM system.

Mobile object simulator In order to test our visualization system, a mobile object

simulator was implemented. The simulator uses a configuration file which governs the

activities of mobile objects it creates. The number of objects, their creation and destruc-

tion time and location, their movement and communication patterns are all specified in

the configuration file. Figure 7.5 demonstrates an animation sequence created with our

visualization algorithm. Note how the user’s mental map is maintained during the

animation sequence. Also note the stripes which show the location history of each mobile

object.



Figure 7.5: Sample animation sequence of the mobile object simulator (from left to right

and top to bottom)

Mobile-object based E-mail application E-mail is one of the most popular Internet

applications. Nowadays, e-mail architectures are governed by a server-centric design,

which implies a handful of weaknesses such as a single point of failure, storage and

processing stress, bottlenecks and inefficiency.

The goal of the DEM system is to overcome these drawbacks. Service is provided by

using the participants’ resources. Lightweight servers and users’ mailboxes scatter be-



tween participants’ computers instead of residing on a single server (or cluster). By using

the mobile objects paradigm, the mailboxes and servers are able to travel on the “live”

network, so that they continue their operation despite the fact that participants con-

stantly join and leave the network. Most of the communication is done directly between

users, thus removing the bottlenecks caused by mail servers. The system’s components

are replicated across numerous hosts, eliminating single point of failure problems. Storage

and processing stress is reduced as participants take an even share of the burden. This

yields a reliable and scalable system, with negligible operational and maintenance cost.

Visualization has been used during the development of this application – for debugging

purposes as well as for managing and monitoring its deployment across the network. Due

to the complexity of the architecture, its developer expressed a need for visualization

at the very early stages of implementation. Using visualization, several problems were

quickly discovered. For example, a case where an object does not flee from a core that is

shutting down was uncovered.

(a) (b)

Figure 7.6: Mailbox mobility in the DEM system. (a) Before movement. (b) A new core

was created. A mailbox migrated to it.

In this application, icons have been used to represent the objects. The mailboxes

are displayed using a mailbox icon. Servers are represented as gray disks. Yellow pools



Figure 7.7: Sending an e-mail in the DEM system

represent mailbox placeholders. Finally, the GUI is represented by a mailbox icon with

a white background.

Figure 7.6 shows a visualization of the movement of a mailbox between computers.

In Figure 7.6(a) there is one mailbox in each core. In Figure 7.6(b) a mailbox moved to

a new core that connected to the service, shown at the bottom.

Filtering of method calls was used in order to show specific interesting events. For

example, Figure 7.7 shows an e-mail message being sent from the source mailbox directly

to the destination mailbox. The message, in transit, is drawn inside a red circle. An

accompanying movie can be found at http://www.ee.technion.ac.il/~ayellet/Movies/FrishmanTal.mov.


We have presented MOVIS – a system for visualizing mobile object frameworks. The key

features of these frameworks – object mobility, location transparency, and distributed

operation – are addressed by our system. A clustered graph is used to concurrently show

the physical connections between cores and the logical connections between objects. A

clustering algorithm, which is influenced by the areas of interest to the user, is used to

provide a hierarchical, scalable context+focus visualization. The overall complexity of

the graph is user controlled. The visualization is dynamic: incremental graph layout and

animation depict changes in a smooth, comprehensible manner.

MOVIS has been used for monitoring, debugging and presenting system architectures.



It has been used in several scenarios, including simulators, e-commerce and distributed

e-mail.

There are several avenues of future research. Additional levels of detail can be inte-

grated into the visualization. The existing profiling infrastructure can be used to supply

object-specific information such as memory usage and creation time. Information about

the cores themselves, such as thread count, memory usage and CPU usage can also be

integrated into the visualization.


Chapter 8

Conclusions

In this thesis we have examined several problems in the field of graph drawing in infor-

mation visualization. In this chapter we summarize the main results and propose several

topics for extending this work in future research.

8.1 Contribution and Summary

The major contribution of this thesis is addressing several interconnected problems in the

field of graph drawing. First, a new algorithm for solving the basic problem of computing

a layout for a single graph is presented. The algorithm is able to quickly compute layouts

of large graphs. One of the difficulties with graph layouts is the variable information

density in different parts of the screen. An algorithm that improves a layout computed

by any algorithm is presented next. One of the goals of the algorithm is to maintain the

overall structure of the graph while it is improved. This is one instance of the problem of

maintaining the mental map [145], which is also addressed in this research. In addition to

computing a layout for a single graph, we study methods of creating sequences of graph

layouts. The challenge here is maintaining aesthetics while at the same time maintaining

stability, in a way that allows the user to comprehend the changes performed on the

graph without being distracted by unwanted, abrupt changes to the layout. Dynamic

algorithms for both clustered and un-clustered graphs are discussed.

In recent years, graphics processing units (GPUs) have become increasingly powerful

and programmable. Devised for quickly rendering high-quality images for graphics tasks,

GPUs are architected for working in parallel on large, structured data. On the other

hand, graphs are inherently unstructured and hence do not seem suitable for processing


on GPUs. In this thesis we have demonstrated how a variety of problems related to graph

layout can be restructured and thus efficiently handled by the GPU.

The algorithms developed in this thesis have been used in several information visual-

ization applications. The dynamic clustered graph drawing algorithm is used as a basic

building block for a system for the visualization of mobile object frameworks. Focus +

context techniques are used to create a scalable visualization system, showing both physi-

cal and logical interactions in the mobile object network. The static layout algorithm has

been used for visualization of the structure of the networks of internet service providers.

It is shown that the layout can provide meaningful insights about these networks. The

dynamic graph drawing algorithm has been used for two applications. The first

is the visualization of discussion threads occurring at an Internet news site. The second

is the visualization of the growth of an Internet-based social network.

The research presented in this thesis is based on the following papers [13, 63–69].

Below, we give more details about the contributions and main results of each chapter.

In Chapter 3, based on [65], a new algorithm for force directed graph layout on the

GPU was presented. The algorithm uses a multi-level scheme, which is based on spectral

partitioning. The strengths of the algorithms of [70,113] are combined in order to create

a high-quality layout of a simplified graph, which is the basis for the final layout. A new

scheme for extending coarse layouts to finer layouts, which creates good initial layouts

for coarse graphs is discussed. Finally, the algorithm presented is able to efficiently

use the GPU to accelerate the layout. Using spectral partitioning and KD-partitioning

techniques, we are able to restructure the graph layout problem in a manner suitable for

acceleration on the GPU or any other data-parallel architecture. Thus, this algorithm

can be efficiently implemented on future parallel architectures.

It has been demonstrated that the algorithm is able to quickly compute aesthetic lay-

outs of different types of graphs. Using the GPU the layout computation was accelerated

by up to 5.5 times compared to a CPU implementation of the algorithm. Combined with

the inherent speed of the algorithm, this resulted in being able to compute layouts with

similar quality to state-of-the-art force directed algorithms such as [91] in a fraction of

the running time. The algorithm has been applied to the visualization of the networks

of Internet service providers.

In Chapter 4 [69], the problem of reducing cluttering in graph layouts was addressed.


In many cases, graph layouts contain a non-uniform spatial density of information. While

some regions of the layout are highly congested, others are sparse or even empty. A new

algorithm for improving a given graph layout, computed by any layout algorithm, was

presented.

The algorithm is based on a physically-inspired evolution process, where the content

of dense areas of the layout is spread to surrounding empty areas. The evolution uses a

ray-casting approach in order to find a better distribution for the information contained

in the graph layout. An image warp, which is used to displace the nodes of the graph, is

computed. Results from optimal mass-transport problems are used in order to compute

this warp. Since the warp minimizes the displacement at each pixel of the image, the

algorithm is able to compute a mental-map preserving improvement of the layout.
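
Once a warp is available, applying it to the graph is straightforward. The sketch below assumes that the warp is stored as a regular grid of per-pixel displacement vectors and moves each node by the bilinearly interpolated displacement at its position. The grid representation and the function names (sample_bilinear, warp_nodes) are assumptions for illustration. The computation of the warp itself via mass transport is not shown.

```python
def sample_bilinear(grid, x, y):
    """Sample a grid of (dx, dy) displacements at continuous pixel coordinates.

    grid[row][col] holds the displacement of pixel (col, row).
    Coordinates outside the grid are clamped to its border.
    """
    h, w = len(grid), len(grid[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0

    def lerp(a, b, t):
        return a + (b - a) * t

    out = []
    for c in range(2):  # interpolate the dx and dy components separately
        top = lerp(grid[y0][x0][c], grid[y0][x1][c], fx)
        bottom = lerp(grid[y1][x0][c], grid[y1][x1][c], fx)
        out.append(lerp(top, bottom, fy))
    return tuple(out)


def warp_nodes(nodes, warp):
    """Displace every node (x, y) by the warp sampled at its position."""
    moved = []
    for x, y in nodes:
        dx, dy = sample_bilinear(warp, x, y)
        moved.append((x + dx, y + dy))
    return moved
```

Because nearby nodes sample nearly identical displacements, their relative arrangement changes smoothly, which is in the spirit of the mental-map preservation discussed above.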

Various acceleration techniques were used. The GPU was used to efficiently perform

the ray-casting, which is required to compute the updated layout density image. Using

the GPU reduced the total computation time by a factor of over 100 compared to our CPU

implementation. A multi-grid method was used to accelerate the computation of the

image warp. These techniques made it possible to compute an updated layout in a

matter of seconds, even for large input graphs.

The algorithm has been applied to unclutter layouts of both small and large graphs

computed by several well-known algorithms. It was demonstrated that the algorithm is

able to better utilize the available screen space while maintaining the user’s mental map.

This makes it possible, for example, to create animations of the improvement process, where the

structure of the graph is maintained while the readability of the graph improves.

In Chapter 5, based on [66, 68], we described a new, GPU-accelerated algorithm for

online dynamic graph drawing. The algorithm is able to efficiently compute stable and

aesthetic layouts of a series of graphs, which contain arbitrary modifications between

consecutive graphs. The algorithm uses various execution culling techniques in order to

reduce the layout time, while maintaining the layout quality. Nodes are assigned individual

movement flexibilities according to the changes to the graph. A multi-level scheme

for dynamic graphs is presented and used to improve the layout quality. The algorithm

has been applied to the visualization of several real datasets, including discussion threads

in Internet sites and visualization of social networks.
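
The idea of per-node movement flexibility can be illustrated with a small sketch. It assumes, purely for illustration, that flexibility falls off linearly with the hop distance from the nearest modified node and reaches zero at a cutoff distance. The exact rule used in Chapter 5 is not restated here, and the names distance_to_change, flexibility and cutoff are hypothetical.

```python
from collections import deque


def distance_to_change(adj, changed):
    """Multi-source BFS: hop distance from each reachable node to the
    nearest changed node. adj maps every node to a list of its neighbors."""
    dist = {v: 0 for v in changed}
    queue = deque(changed)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def flexibility(adj, changed, cutoff=3):
    """Assumed rule: 1.0 at a changed node, decreasing linearly to 0.0
    at `cutoff` hops or more (unreachable nodes also get 0.0)."""
    dist = distance_to_change(adj, changed)
    flex = {}
    for v in adj:
        d = min(dist.get(v, cutoff), cutoff)
        flex[v] = 1.0 - d / cutoff
    return flex
```

A layout step could then scale each node's displacement by its flexibility, so that nodes far from any modification stay close to their previous positions.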

The algorithm was shown to compute high-quality layouts while reducing node

displacement and preserving the user’s mental map. Implementation on the GPU allowed

for a speedup of the total running time of the algorithm (including parts running on the

CPU) by up to 17 times. Further, it was shown that using newer GPUs results in an

even larger acceleration of the layout.

In Chapter 6, based on [63], an online algorithm for dynamic drawing of graphs which

contain an inherent grouping into clusters was discussed. The algorithm is based on a

few concepts. First, in order to maintain stability, some of the nodes of the graph are

pinned down. Second, invisible place-holder vertices are used to minimize changes to the

structure of the graph. Finally, edge lengths and weights are used to control the placement

of vertices and the modifications of the graph. Several metrics for measuring dynamic

layout quality were introduced. The algorithm has been applied to the visualization of

mobile objects, which is discussed in Chapter 7.

In Chapter 7, based on [13, 64, 67], a system for the visualization of mobile object

environments was presented. During this research several visualization challenges were

encountered. First, the visualization needs to be consistent with the execution of the

asynchronous application. Second, the user’s mental map needs to be maintained while

the visualization unfolds. Third, the visualization must be scalable. Distributed systems

are naturally scalable and mobile object systems are even more so. Devising a

visualization method that can scale well to hundreds of objects and machines is much harder

than providing a tool that can display a few objects. Fourth, simultaneously showing

activities at the physical network communication level and at the logical object interaction

level is required. This is important since objects are mobile and the interaction between

machines changes over time due to object migration. Finally, the massive amounts of

information available need to be filtered in order to allow the user to focus on interesting

events.

In our work we devised and implemented algorithms to solve all of these difficulties.

We made the following contributions. First, we identified and discussed the requirements

of a mobile object visualization system. Second, we supplied an algorithm to maintain

the consistency between the execution of the algorithm and its visualization. We proved

the correctness of our synchronization algorithm. Third, we developed a focus+context

visualization algorithm [29, 71] which provides a system that is scalable and capable of

displaying large networks and many events. This was achieved by using a focus-based,


user-directed clustering algorithm and displaying information using different levels of

detail. Fourth, we displayed the logical and physical aspects of the system simultaneously

using a dynamic clustered graph. We used both 2D and 3D graphics in our visualization.

Finally, we devised a way to generate interesting events automatically, avoiding additional

work by the programmer of the application. We implemented our system on top of the

FarGo mobile object framework [102].

Our visualization system has been used in several scenarios, ranging from simulators

to distributed e-mail and e-commerce applications. It has been used for monitoring,

debugging, as well as for presenting system architectures. One byproduct of this work

has been the design and implementation of a distributed e-mail architecture that is based

on the mobile object paradigm [13].

8.2 Future Research

In our modern world, which is filled with different sources of information, being able to

visualize the large amounts of available information is an increasingly important task.

In this thesis, several algorithms for visualizing different types of information have been

presented. In this section we outline a few possible extensions of this thesis and related

research problems. More specific research ideas are included in the conclusions section of

each chapter.

In this research, an algorithm for dynamic drawing of clustered graphs has been

presented. Dynamic drawing of changing, nested hierarchical graphs of arbitrary depth

is an interesting research challenge. Adding more levels of detail and more information

while providing a consistent, mental-map preserving and understandable visualization is

significantly more complex. Some of the applications of such an algorithm include web

visualization and software visualization.

One of the major deficiencies of force-directed graph layout algorithms is the fact

that they converge to a local minimum position. This has two drawbacks. First, the final

outcome depends on the initial conditions used. Second, since the minimum is local, it is

not guaranteed that the optimal layout is computed. Hence, finding a high-quality layout

algorithm that converges to a global minimum is an interesting research problem.

Many graph drawing algorithms spend most of their effort optimizing the positions


of the nodes in the graph. Often, the edges are simply used to connect the nodes,

almost as an after-thought of the layout process. However, when looking at a layout, it

is evident that the edges are an important part of the layout, not only in terms of the

information they contain, but also in terms of the amount of screen space allocated for

drawing the edges. New static and dynamic algorithms that make the graph edges an

integral part of the layout process can generate improved and more readable layouts. An

especially challenging problem here is dynamic graph drawing. Updating node positions,

edge positions and edge shapes in a mental-map preserving way is an interesting future

research problem.

Another future research direction is applying the algorithms to diverse information

visualization problems. One application is visualization for computer security. In this field,

graphs and especially dynamic graphs can provide a meaningful visualization, helping

identify behaviors and patterns. For example, visualization of the time-varying changes

in network traffic between Internet hosts can help identify suspicious node groups with

highly variable traffic, possibly due to a breach of security. Another area that has received

little attention thus far is using visualization tools in order to perform application

management tasks. An initial attempt, performed as part of our previous work [13], yielded

some promising results. Visualization of the process of developing software [27, 161, 203]

is another interesting application. Software is composed of hierarchical modules: source

files, classes and directories. These modules, their relative locations and the connections

between them constantly change during the lifetime of the software. Here, visualization

can be applied for project management, complexity analysis and understanding software

structure. To meet this challenge, algorithms for hiding some of the information, creating

smooth animations, and display algorithms need to be devised. Biological networks are

immensely complex. Applying graph layout techniques in order to visualize activity in

such networks can help researchers better comprehend the large amounts of data they

face. The size and complexity of biological data make graph drawing applications in this

field especially challenging.

Clearly, there is a growing interest in using GPUs to accelerate computations [89,163].

While GPUs have been successfully applied in graphics and visualization tasks, the use

of GPUs for accelerating information visualization tasks is not as common. In this thesis

some progress was made in applying GPUs in the field of graph drawing, which is one


of the central problems in information visualization. Implementing other graph-related

problems such as graph partitioning and clustering on the GPU can provide a basic

building block for a variety of applications, extending outside the field of information

visualization.


References

[1] AT&T graph library. linked from http://www.graphdrawing.org/.

[2] Rocketfuel maps and data. http://www.cs.washington.edu/research/networking/rocketfuel/.

[3] A. T. Adai, S. V. Date, S. Wieland, and E. M. Marcotte. LGL: creating a map of protein

function with an algorithm for visualizing very large biological networks. J. Mol.

Biol., pages 179–190, 2004.

[4] A. Acharya, M. Ranganathan, and J. Saltz. Sumatra: A language for resource-aware

mobile programs. In J. Vitek and C. Tschudin, editors, Mobile Object Systems:

Towards the Programmable Internet, number 1222 in Lecture Notes in Computer

Science, LNCS, pages 111–130. Springer-Verlag, 1996.

[5] D. Aiger and K. Kedem. Applying graphics hardware to achieve extremely fast

geometric pattern matching in two and three dimensional transformation space.

Inf. Process. Lett, 105(6):224–230, 2008.

[6] J. A. Anderson, C. D. Lorenz, and A. Travesset. General purpose molecular

dynamics simulations fully implemented on graphics processing units. J. Comput.

Phys., 227(10):5342–5359, 2008.

[7] D. Archambault, T. Munzner, and D. Auber. TopoLayout: Multilevel graph layout

by topological features. IEEE Trans. on Visualization and Computer Graphics,

13(2):305–317, 2007.

[8] D. Auber and Y. Chiricota. Improved efficiency of spring embedders: Taking

advantage of GPU programming. In Visualization, Imaging, and Image Processing,

pages 169–175, 2007.


[9] J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature,

324(4):446–449, 1986.

[10] J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the

Monge–Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–

393, Oct. 2000.

[11] S. Bender-deMoll and D. McFarland. The art and science of dynamic network

visualization. Journal of Social Structure, 7(2), 2006.

[12] S. Bender-deMoll and D. A. McFarland. SoNIA - social network image animator.

http://www.stanford.edu/group/sonia/.

[13] S. Bercovici, Y. Frishman, I. Keidar, and A. Tal. Decentralized electronic mail. In

International Workshop on Dynamic Distributed Systems (IWDDS), 2006.

[14] F. Bertault and M. Miller. An algorithm for drawing compound graphs. In

J. Kratochvíl, editor, Proc. 7th Int. Symp. Graph Drawing (GD 1999), number 1731 in

Lecture Notes in Computer Science, LNCS, pages 197–204. Springer-Verlag, 2000.

[15] T. Biedl and G. Kant. A better heuristic for orthogonal graph drawings. In Proc.

2nd European Symp. on Algorithms (ESA’94), number 855 in LNCS, pages 24–35,

1994.

[16] D. Blythe. The Direct3D 10 system. ACM Trans. Graph, 25(3):724–734, 2006.

[17] J. Bolz, I. Farmer, E. Grinspun, and P. Schröder. Sparse matrix solvers on the

GPU: conjugate gradients and multigrid. ACM Trans. Graph, 22(3):917–924, 2003.

[18] U. Brandes. 4. drawing on physical analogies. Lecture Notes in Computer Science,

LNCS, 2025:71–86, 2001.

[19] U. Brandes, D. Fleischer, and T. Puppe. Dynamic spectral layout of small worlds.

In Proc. 13th Int. Symp. Graph Drawing, GD, pages 25–36, 2005.

[20] U. Brandes and D. Wagner. A Bayesian paradigm for dynamic graph layout. In

Proc. 5th Int. Symp. Graph Drawing, GD, number 1353 in LNCS, pages 85–99,

1997.


[21] J. Branke. 9. dynamic graph drawing. Lecture Notes in Computer Science, LNCS,

2025:228–246, 2001.

[22] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued

functions. Communications on Pure and Applied Mathematics, 44(4):375–417, 1991.

[23] S. S. Bridgeman and R. Tamassia. A user study in similarity measures for graph

drawing. J. Graph Algorithms Appl, 6(3):225–254, 2002.

[24] W. L. Briggs, V. E. Henson, and S. F. McCormick. A multigrid tutorial: second

edition. SIAM, 2000.

[25] R. Brockenauer and S. Cornelsen. 8. drawing clusters and hierarchies. Lecture Notes

in Computer Science, LNCS, 2025:193–227, 2001.

[26] I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and

P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. ACM Trans. on

Graphics, 23(3):777–786, 2004.

[27] M. Burch, S. Diehl, and P. Weißgerber. Visual data mining in software archives.

In ACM Symposium on Software Visualization, pages 37–46, May 2005.

[28] C. Tenllado, J. Setoain, M. Prieto, L. Piñuel, and F. Tirado. Parallel

implementation of the 2D discrete wavelet transform on graphics processing units:

Filter bank versus lifting. IEEE Transactions on Parallel and Distributed Systems,

19(3):299–310, 2008.

[29] S. K. Card, J. D. Mackinlay, and B. Shneiderman, editors. Readings in Information

Visualization: Using Vision to Think. Morgan Kaufmann, 1999.

[30] N. A. Carr, J. D. Hall, and J. C. Hart. The ray engine. In SIGGRAPH/Eurographics

Workshop on Graphics Hardware, pages 37–46, 2002.

[31] B. Catanzaro, N. Sundaram, and K. Keutzer. Fast support vector machine training

and classification on graphics processors. In Proceedings of the 25th Annual

International Conference on Machine Learning (ICML 2008), pages 104–111, 2008.


[32] T. F. Chan, J. Cong, and K. Sze. Multilevel generalized force-directed method for

circuit placement. In P. Groeneveld and L. Scheffer, editors, ISPD, pages 185–192.

ACM, 2005.

[33] T. M. Chan. A near-linear area bound for drawing binary trees. In Proceedings of

the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages

161–168, 1999.

[34] S. C. Chapra and R. P. Canale. Numerical methods for engineers: with programming

and software applications, 3rd edition. McGraw Hill, 1998.

[35] J. H. Chuang, C. C. Lin, and H. C. Yen. Drawing graphs with nonuniform nodes

using potential fields. In G. Liotta, editor, Proc. 11th Int. Symp. Graph Drawing

(GD 2003), number 2912 in Lecture Notes in Computer Science, LNCS, pages

460–465. Springer-Verlag, 2004.

[36] F. R. K. Chung. Spectral graph theory. Regional Conference Series in Mathematics,

American Mathematical Society, 92:1–212, 1997.

[37] IBM Rational ClearCase, 2008. Currently available at http://www-306.ibm.com/software/awdtools/clearcase/.

[38] W. R. Cockayne and M. Zyda, editors. Mobile Agents. Prentice Hall, 1998.

[39] J. D. Cohen. Drawing graphs to convey proximity: an incremental arrangement

method. ACM Trans. Comput.-Hum. Interact., 4(3):197–229, 1997.

[40] C. Collberg, S. Kobourov, J. Nagra, J. Pitts, and K. Wampler. A system for

graph-based visualization of the evolution of software. In Proceedings ACM 2003

Symposium on Software Visualization, pages 77–86. ACM, 2003.

[41] P. Crescenzi, G. D. Battista, and A. Piperno. A note on optimal area algorithms

for upward drawings of binary trees. Comput. Geom, 2:187–200, 1992.

[42] P. Crescenzi and A. Piperno. Optimal-area upward drawings of AVL trees. In

Proceedings of the DIMACS International Workshop, Graph Drawing, GD’94, volume

894 of LNCS, pages 307–317. 1995.


[43] R. Davidson and D. Harel. Drawing graphs nicely using simulated annealing.

ACM Transactions on Graphics, 15(4):301–331, Oct. 1996.

[44] J. W. Demmel. Applied Numerical Linear Algebra. SIAM, 1997.

[45] O. Deussen, S. Hiller, C. van Overveld, and T. Strothotte. Floating points: A

method for computing stipple drawings. Computer Graphics Forum, 19(3), Aug.

2000. ISSN 1067-7055.

[46] G. Di Battista, P. Eades, R. Tamassia, and I. G. Tollis. Algorithms for drawing

graphs: An annotated bibliography. Computational Geometry: Theory and

Applications, 4(5):235–282, 1994.

[47] S. Diehl and C. Görg. Graphs, They Are Changing - Dynamic Graph Drawing for

a Sequence of Graphs. In Proc. 10th Int. Symp. Graph Drawing, pages 23–31, 2002.

[48] T. Dwyer. Three dimensional UML using force directed layout. In P. Eades and

T. Pattison, editors, Australian Symposium on Information Visualisation, (invis.au

2001), volume 9 of Conferences in Research and Practice in Information Technology,

pages 77–85, Sydney, Australia, 2001. ACS.

[49] T. Dwyer, K. Marriott, and P. J. Stuckey. Fast node overlap removal. In P. Healy

and N. S. Nikolov, editors, Graph Drawing, volume 3843 of Lecture Notes in

Computer Science, pages 153–164. Springer, 2005.

[50] P. Eades. A heuristic for graph drawing. Congressus Numerantium, 42:149–160,

1984.

[51] P. Eades and Q. W. Feng. Multilevel visualization of clustered graphs. In S. C.

North, editor, Proc. 4th Int. Symp. Graph Drawing (GD 1996), number 1190 in

Lecture Notes in Computer Science, LNCS, pages 101–112. Springer-Verlag,

18–20 Sept. 1996.

[52] P. Eades and K. Sugiyama. How to draw a directed graph. J. Information

Processing, 13(4):424–437, 1990.


[53] J. Ellson, E. R. Gansner, L. Koutsofios, S. C. North, and G. Woodhull. Graphviz

— open source graph drawing tools. In Proc. 9th Int. Symp. Graph Drawing (GD

2001), number 2265 in LNCS, pages 483–484, 2002.

[54] N. Elmqvist and P. Tsigas. Growing squares: animated visualization of causal

relations. In S. Diehl, J. T. Stasko, and S. N. Spencer, editors, Proceedings ACM

2003 Symposium on Software Visualization, pages 17–26. ACM, 2003.

[55] U. Erra. Toward real time fractal image compression using graphics hardware. In

Advances in Visual Computing, pages 723–728, 2005.

[56] C. Erten, P. J. Harding, S. G. Kobourov, K. Wampler, and G. V. Yee. GraphAEL:

Graph animations with evolving layouts. In Proc. 11th Int. Symp. Graph Drawing,

pages 98–110, 2003.

[57] K. Fatahalian, J. Sugerman, and P. Hanrahan. Understanding the efficiency of GPU

algorithms for matrix-matrix multiplication. In SIGGRAPH/EUROGRAPHICS

Workshop On Graphics Hardware, pages 133–137, 2004.

[58] R. Fernando, editor. GPU Gems: Programming Techniques, Tips, and Tricks for

Real-Time Graphics. 2004.

[59] M. Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its

application to graph theory. Czechoslovak Mathematical Journal, 25(100):619–633,

1975.

[60] J. Foley, A. van Dam, S. Feiner, and J. Hughes. Computer Graphics: Principles

and Practice, second edition. Addison-Wesley Professional, 1990.

[61] T. Foley and J. Sugerman. KD-tree acceleration structures for a GPU raytracer.

In Graphics Hardware, pages 15–22, 2005.

[62] A. Frick, A. Ludwig, and H. Mehldau. A fast adaptive layout algorithm for

undirected graphs. In Graph Drawing, volume 894 of Lecture Notes in Computer Science,

pages 388–403. DIMACS, Oct. 1994.


[63] Y. Frishman and A. Tal. Dynamic drawing of clustered graphs. In Proc. of the

IEEE Symposium on Information Visualization, InfoVis, pages 191–198, 2004.

[64] Y. Frishman and A. Tal. Visualization of mobile object environments. In ACM

Symposium on Software Visualization, pages 145–154, 2005.

[65] Y. Frishman and A. Tal. Multi-level graph layout on the GPU. IEEE Trans. on

Visualization and Computer Graphics (Proc. InfoVis), 13(6):1310–1317, 2007.

[66] Y. Frishman and A. Tal. Online dynamic graph drawing. In EuroVis, pages 75–82,

2007.

[67] Y. Frishman and A. Tal. Movis: A system for visualizing distributed mobile object

environments. Journal of Visual Languages and Computing, 19(3):303–320, 2008.

[68] Y. Frishman and A. Tal. Online dynamic graph drawing. IEEE Transactions on

Visualization and Computer Graphics, 14(4):727–740, 2008.

[69] Y. Frishman and A. Tal. Uncluttering graph layouts using anisotropic diffusion and

mass transport. submitted for publication.

[70] T. M. J. Fruchterman and E. M. Reingold. Graph drawing by force-directed

placement. Software: Practice and Experience, 21(11):1129–1164, 1991.

[71] G. W. Furnas. Generalized fisheye views. In M. Mantei and P. Orbeton, editors,

Human Factors in Computing Systems, CHI’86 Conference Proceedings, pages

16–23. ACM/SIGCHI, Special Issue of ACM SIGCHI Bulletin, 1986.

[72] P. Gajer, M. T. Goodrich, and S. G. Kobourov. A multi-dimensional approach to

force-directed layouts of large graphs. Comput. Geom, 29(1):3–18, 2004.

[73] N. Galoppo, N. K. Govindaraju, M. Henson, and D. Manocha. LU-GPU: Efficient

algorithms for solving dense linear systems on graphics hardware. In ACM / IEEE

Supercomputing, 2005.

[74] E. R. Gansner, Y. Koren, and S. C. North. Graph drawing by stress majorization.

In J. Pach, editor, Graph Drawing, 12th International Symposium, GD, volume

3383 of Lecture Notes in Computer Science, pages 239–250. Springer, 2004.


[75] E. R. Gansner, Y. Koren, and S. C. North. Topological fisheye views for

visualizing large graphs. IEEE Transactions on Visualization and Computer Graphics,

11(4):457–468, 2005.

[76] E. R. Gansner, E. Koutsofios, S. C. North, and K.-P. Vo. A technique for drawing

directed graphs. IEEE Trans. Software Engineering, 19(3):214–230, Mar. 1993.

[77] E. R. Gansner and S. C. North. Improved force-directed layouts. In Graph Drawing,

volume 1547 of LNCS, pages 364–373, 1998.

[78] E. R. Gansner and S. C. North. Improved force-directed layouts. In S. Whitesides,

editor, Proc. 6th Int. Symp. Graph Drawing (GD 1998), number 1547 in Lecture

Notes in Computer Science, LNCS, pages 364–373. Springer-Verlag, 1998.

[79] E. R. Gansner and S. C. North. An open graph visualization system and its

applications to software engineering. Software: Practice and Experience,

30(11):1203–1234, 2000.

[80] A. Garg, M. T. Goodrich, and R. Tamassia. Planar upward tree drawings with

optimal area. Int. J. Comput. Geometry Appl, 6(3):333–356, 1996.

[81] A. Garg and R. Tamassia. A new minimum cost flow algorithm with applications

to graph drawing. In GD ’96: Proceedings of the Symposium on Graph Drawing,

pages 201–216, 1997.

[82] M. T. Gastner and M. E. J. Newman. Diffusion-based method for producing

density-equalizing maps. Proc. Nat. Acad. Sci. USA, 101(20):7499–7504, 2004.

[83] J. Georgii, F. Echtler, and R. Westermann. Interactive simulation of deformable

bodies on GPUs. In SimVis, pages 247–258, 2005.

[84] N. Goodnight, R. Wang, C. Woolley, and G. Humphreys. Interactive

time-dependent tone mapping using programmable graphics hardware. In Eurographics

Symposium on Rendering, pages 1–13, 2003.

[85] N. Goodnight, C. Woolley, G. Lewin, D. Luebke, and G. Humphreys. A multigrid

solver for boundary value problems using programmable graphics hardware. In

SIGGRAPH/Eurographics Workshop on Graphics Hardware, pages 102–111, 2003.


[86] C. Görg, P. Birke, M. Pohl, and S. Diehl. Dynamic graph drawing of sequences of

orthogonal and hierarchical graphs. In Proc. 12th Int. Symp. Graph Drawing, GD,

volume 3383 of LNCS, pages 228–238, 2004.

[87] N. K. Govindaraju, J. Gray, R. Kumar, and D. Manocha. GPUTeraSort: high

performance graphics co-processor sorting for large database management. In

SIGMOD Conference, pages 325–336, 2006.

[88] N. K. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation

of database operations using graphics processors. In Proceedings of the 2004 ACM

SIGMOD International Conference on Management of Data, pages 215–226, 2004.

[89] GPGPU. http://www.gpgpu.org.

[90] C. Gutwenger and P. Mutzel. Planar polyline drawings with good angular

resolution. In Proc. 6th Int. Symp. Graph Drawing, GD, volume 1547 of Lecture Notes

in Computer Science, LNCS, pages 167–182, 1998.

[91] S. Hachul and M. Jünger. Drawing large graphs with a potential-field-based

multilevel algorithm. In Graph Drawing, pages 285–295, 2004.

[92] S. Hachul and M. Jünger. An experimental comparison of fast algorithms for

drawing general large graphs. In Graph Drawing, volume 3843 of LNCS, pages 235–250,

2005.

[93] T. Hakamata, T. Caudell, and E. Angel. Force-directed graph layout using the GPU.

In Supercomputing ’06 Workshop "General-Purpose GPU Computing: Practice and

Experience", 2006.

[94] S. Haker, L. Zhu, A. Tannenbaum, and S. Angenent. Optimal mass transport for

registration and warping. International Journal of Computer Vision, 60(3):225–240,

2004.

[95] C. D. Hansen, J. M. Kniss, A. E. Lefohn, and R. T. Whitaker. A streaming

narrow-band algorithm: Interactive computation and visualization of level sets.

IEEE Transactions on Visualization and Computer Graphics, 10(4):422–433, 2004.


[96] D. Harel and Y. Koren. A Fast Multi-Scale Algorithm for Drawing Large Graphs.

J. Graph Algorithms Appl., 6(3):179–202, 2002.

[97] D. Harel and Y. Koren. Drawing graphs with non-uniform vertices. In Proc.

Working Conference on Advanced Visual Interfaces (AVI’02), pages 157–166. ACM

Press, 2002.

[98] D. Harel and Y. Koren. Graph drawing by high-dimensional embedding. J. Graph

Algorithms Appl, 8(2):195–214, 2004.

[99] M. J. Harris. GPU Gems: Programming Techniques, Tips, and Tricks for

Real-Time Graphics, chapter 38: Fast Fluid Dynamics Simulation on the GPU, pages

637–665. Addison-Wesley, 2004.

[100] M. J. Harris, W. Baxter, T. Scheuermann, and A. Lastra. Simulation of cloud

dynamics on graphics hardware. In SIGGRAPH/Eurographics Workshop on Graphics

Hardware, pages 92–101, 2003.

[101] K. Hayashi, M. Inoue, T. Masuzawa, and H. Fujiwara. A layout adjustment

problem for disjoint rectangles preserving orthogonal order. In S. Whitesides, editor,

Graph Drawing, volume 1547 of Lecture Notes in Computer Science, pages 183–197.

Springer, 1998.

[102] O. Holder, I. Ben-Shaul, and H. Gazit. Dynamic layout of distributed applications in

FarGo. In Proceedings of the 1999 International Conference on Software Engineering,

pages 163–173. IEEE Computer Society Press / ACM Press, 1999.

[103] O. Holder, I. Ben-Shaul, and H. Gazit. System support for dynamic layout of

distributed applications. In 19th International Conference on Distributed Computing

Systems (19th ICDCS’99), Austin, Texas, May 1999. IEEE.

[104] D. R. Horn, M. Houston, and P. Hanrahan. ClawHMMER: a streaming

HMMer-search implementation. In Proceedings of the 2005 ACM/IEEE Conference on

Supercomputing, Seattle, Washington, 2005.


[105] M. L. Huang and P. Eades. A fully animated interactive system for clustering

and navigating huge graphs. In Proc. 6th Int. Symp. Graph Drawing (GD 1998),

number 1547 in LNCS, pages 374–383, 1998.

[106] X. Huang, W. Lai, A. S. M. Sajeev, and J. Gao. A new algorithm for removing

node overlapping in graph visualization. Inf. Sci., 177(14):2821–2844, 2007.

[107] ILOG JViews diagrammer, 2008. Currently available at http://www.ilog.com/products/jviews/graphlayout.

[108] T. Jansen, B. von Rymon-Lipinski, N. Hanssen, and E. Keeve. Fourier volume

rendering on the GPU using a split-stream-FFT. In Vision, modeling and visualization,

pages 395–403, 2004.

[109] X. Jin, S. Chen, and X. Mao. Computer-generated marbling textures: A GPU-based

design system. IEEE Computer Graphics and Applications, 27(2):78–84, 2007.

[110] A. Joseph, R. Dar, and Y. Almog. Active Market Project Report, 2000. Available

at http://softlab.technion.ac.il/project/amarket/html/home.htm.

[111] M. Jünger and P. Mutzel, editors. Graph Drawing Software. Springer-Verlag, 2003.

[112] T. Kamada. Visualizing Abstract Objects and Relations. World Scientific, 1989.

[113] T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs.

Information Processing Letters, 31(1):7–15, 1989.

[114] G. Kant. Drawing planar graphs using the canonical ordering. Algorithmica,

16(1):4–32, July 1996.

[115] L. V. Kantorovich. On a problem of Monge. Uspekhi Mat. Nauk., 3(2):225–226,

1948.

[116] M. Kaufmann and D. Wagner, editors. Drawing Graphs: Methods and Models.

2001.

[117] M. Kaufmann and R. Wiese. Maintaining the mental map for circular drawings.

In Graph Drawing, 10th International Symposium, volume 2528 of Lecture Notes in

Computer Science, pages 12–22, 2002.


[118] P. Kipfer, M. Segal, and R. Westermann. UberFlow: A GPU-based particle

engine. In Eurographics/SIGGRAPH Workshop on Graphics Hardware, pages

115–122, 2004.

[119] M. Knott and C. S. Smith. On the optimal mapping of distributions. Journal of

Optimization Theory and Applications, 43(1):39–49, 1984.

[120] J. A. Kohl and G. A. Geist. The PVM 3.4 tracing facility and XPVM 1.1. In

H. El-Rewini and B. D. Shriver, editors, Proceedings of the Twenty-Ninth Hawaii

International Conference on System Sciences (HICSS-29), volume 1, pages

290–299. IEEE Computer Society Press, 1996.

[121] Y. Koren. Drawing graphs by eigenvectors: Theory and practice. In Computers

and Mathematics with Applications, volume 45, pages 1867–1888. Elsevier, 2005.

[122] Y. Koren, L. Carmel, and D. Harel. ACE: A fast multiscale eigenvectors

computation for drawing huge graphs. In INFOVIS, pages 137–144. IEEE Computer

Society, 2002.

[123] Y. Koren, L. Carmel, and D. Harel. Drawing huge graphs by algebraic multigrid

optimization. Multiscale Modeling & Simulation, 1(4):645–673, 2003.

[124] E. Kraemer and J. Stasko. The visualization of parallel systems: an overview.

Journal of Parallel and Distributed Computing, 18(2):105–117, 1993.

[125] E. Kraemer and J. T. Stasko. Creating an accurate portrayal of concurrent

executions. IEEE Concurrency, 6(1):36–46, 1998.

[126] J. Krüger and R. Westermann. Acceleration techniques for GPU-based volume

rendering. In IEEE Visualization, pages 287–292, 2003.

[127] J. Krüger and R. Westermann. Linear algebra operators for GPU

implementation of numerical algorithms. In Proc. ACM SIGGRAPH, volume 22(3) of ACM

Transactions on Graphics, pages 908–916, 2003.

[128] G. Kumar and M. Garland. Visual exploration of complex time-varying graphs.

IEEE Trans. on Visualization and Computer Graphics, Proc. InfoVis, 2006.



[129] L. Lamport. Time, clocks, and the ordering of events in a distributed system. In Communications of the ACM, pages 558–565, July 1978.

[130] D. Lange and M. Oshima. Seven Good Reasons for Mobile Agents. Communications of the ACM, 42(3):88–89, 1999.

[131] E. S. Larsen and D. McAllister. Fast matrix multiplies using graphics hardware. In ACM / IEEE Supercomputing, page 55, 2001.

[132] Y.-Y. Lee, C.-C. Lin, and H.-C. Yen. Mental Map Preserving Graph Drawing Using Simulated Annealing, volume 60 of Conferences in Research and Practice in Information Technology. 2006.

[133] W. Li, P. Eades, and N. Nikolov. Using spring algorithms to remove node overlapping. In Asia Pacific Symposium on Information Visualisation (APVIS2005), volume 45 of CRPIT, pages 131–140, 2005.

[134] E. Lindholm, M. J. Kilgard, and H. Moreton. A user-programmable vertex engine. In SIGGRAPH 2001, Computer Graphics Proceedings, pages 149–158, 2001.

[135] W. Liu, B. Schmidt, G. Voss, and W. Müller-Wittig. Molecular dynamics simulations on commodity GPUs with CUDA. In HiPC, volume 4873 of Lecture Notes in Computer Science, pages 185–196, 2007.

[136] Y. Liu, X. Liu, and E. Wu. Real-time 3D fluid simulation on GPU with complex obstacles. In Pacific Conference on Computer Graphics and Applications, pages 247–256, 2004.

[137] P. Ljung. Adaptive sampling in single pass, GPU-based raycasting of multiresolution volumes. In Eurographics/IEEE VGTC Workshop on Volume Graphics, pages 39–46, Boston, Massachusetts, USA, 2006.

[138] K. A. Lyons, H. Meijer, and D. Rappaport. Algorithms for cluster busting in anchored graph drawing. J. Graph Algorithms and Applications, 2(1):1–24, 1998.

[139] S. A. Manavski and G. Valle. CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics, 9(Suppl 2):S10, 2008.



[140] W. R. Mark, R. S. Glanville, K. Akeley, and M. J. Kilgard. Cg: a system for programming graphics hardware in a C-like language. ACM Transactions on Graphics, 22(3):896–907, July 2003.

[141] K. Marriott, P. J. Stuckey, V. Tam, and W. He. Removing node overlapping in graph layout using constrained optimization. Constraints, 8(2):143–171, 2003.

[142] M. D. McCool, S. D. Toit, T. Popa, B. Chan, and K. Moule. Shader algebra. ACM Transactions on Graphics, 23(3):787–795, 2004.

[143] D. Merrick and J. Gudmundsson. Increasing the readability of graph drawings with centrality-based scaling. In Asia Pacific Symposium on Information Visualisation (APVIS2006), volume 60 of CRPIT, pages 67–76, 2006.

[144] D. Milojicic, F. Douglis, and R. Wheeler, editors. Mobility: Processes, Computers and Agents. ACM Press, 1999.

[145] K. Misue, P. Eades, W. Lai, and K. Sugiyama. Layout adjustment and the mental map. J. Visual Languages and Computing, 6(2):183–210, 1995.

[146] J. Moe and D. A. Carr. Understanding distributed systems via execution trace data. In International Workshop on Program Comprehension, pages 60–69. IEEE Computer Society Press, 2001.

[147] A. S. Montemayor, R. Cabido, J. J. Pantrigo, and B. R. Payne. Bandwidth-improved GPU particle filter for visual tracking. In 3rd Ibero-American Symposium on Computer Graphics, SIACG, pages 874–881, 2006.

[148] J. Montrym and H. P. Moreton. The GeForce 6800. IEEE Micro, 25(2):41–51, 2005.

[149] J. Moody, D. McFarland, and S. Bender-deMoll. Dynamic network visualization. American Journal of Sociology, 110(4):1206–1241, 2005. http://www.journals.uchicago.edu/AJS/journal/issues/v110n4/080349/080349.html.

[150] K. Moreland and E. Angel. The FFT on a GPU. In SIGGRAPH/Eurographics Workshop on Graphics Hardware, pages 112–119, 2003.



[151] Y. Moses, Z. Polunsky, A. Tal, and L. Ulitsky. Algorithm visualization for distributed environments. Journal of Visual Languages and Computing, 15(1):97–123, 2004.

[152] T. M. Newcomb. The acquaintance process. Holt, Rinehart and Winston, 1961.

[153] H. Nguyen, editor. GPU Gems 3. Addison-Wesley, 2007.

[154] T. Nishizeki and M. S. Rahman. Planar graph drawing. World Scientific, 2004.

[155] S. C. North. Incremental layout in DynaDAG. In Proc. 3rd Int. Symp. Graph Drawing, number 1027 in LNCS, pages 409–418, 1995.

[156] NVIDIA. CUDA: Compute unified device architecture. http://www.nvidia.com/object/cuda_home.html.

[157] L. Nyland, M. Harris, and J. Prins. The rapid evaluation of potential fields using programmable graphics hardware. In ACM Workshop on General Purpose Computing on Graphics Hardware, 2004.

[158] Object Management Group. The Common Object Request Broker: Architecture and Specification. Revision 2.2, February 1998.

[159] ObjectSpace. ObjectSpace Voyager Core Package: Technical Overview, December 1997.

[160] K. Oh and K. Jung. GPU implementation of neural networks. Pattern Recognition, 37(6):1311–1314, June 2004.

[161] C. O’Reilly, D. Bustard, and P. Morrow. The war room command console - shared visualizations for inclusive team coordination. In ACM Symposium on Software Visualization, pages 57–65, May 2005.

[162] J. Owens, M. Houston, D. Luebke, S. Green, J. Stone, and J. Phillips. GPU computing. Proceedings of the IEEE, 96(5):879–899, 2008.

[163] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, and T. J. Purcell. A survey of general-purpose computation on graphics hardware. In Eurographics, pages 21–51, 2005.



[164] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26(1):80–113, 2007.

[165] V. Pande. Folding@home on ATI GPU’s, 2006. http://folding.stanford.edu/FAQ-ATI.html.

[166] A. Papakostas and I. G. Tollis. Orthogonal drawing of high degree graphs with small area and few bends. In WADS: 5th Workshop on Algorithms and Data Structures, 1997.

[167] W. D. Pauw, E. Jensen, N. Mitchell, G. Sevitsky, J. Vlissides, and J. Yang. Visualizing the execution of Java programs. In S. Diehl, editor, Proceedings of the International Seminar on Software Visualization, number 2269 in Lecture Notes in Computer Science, LNCS, pages 151–162. Springer-Verlag, 2001.

[168] M. Peercy, M. Segal, and D. Gerstmann. A performance-oriented data parallel virtual machine for GPUs. In ACM SIGGRAPH sketches. ACM Press, 2006.

[169] M. Pharr and R. Fernando, editors. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. 2005.

[170] A. Pothen, H. Simon, and K. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. and Appl., 11:430–452, 1990.

[171] T. J. Purcell, I. Buck, W. R. Mark, and P. Hanrahan. Ray tracing on programmable graphics hardware. ACM Transactions on Graphics, 21(3):703–712, 2002.

[172] A. J. Quigley and P. Eades. FADE: Graph drawing, clustering, and visual abstraction. In Graph Drawing, number 1984 in LNCS, pages 197–210, 2000.

[173] D. A. Reed, R. A. Aydt, R. J. Noe, P. C. Roth, K. A. Shields, B. W. Schwartz, and L. F. Tavera. Scalable Performance Analysis: The Pablo Performance Analysis Environment. In Proceedings of Scalable Parallel Libraries Conference, pages 104–113. IEEE Computer Society, 1993.



[174] T. Rehman, E. Haber, G. Pryor, J. Melonakos, and A. Tannenbaum. 3D nonrigid registration via optimal mass transport on the GPU. Accepted - Elsevier Journal of Medical Image Analysis, 2008.

[175] E. M. Reingold and J. S. Tilford. Tidier drawings of trees. IEEE Trans. on Softw. Eng., 7(2):223, Mar. 1981.

[176] N. Röber, U. Kaminski, and M. Masuch. Ray acoustics using computer graphics technology. In 10th International Conference on Digital Audio Effects (DAFx-07), pages 117–124, 2007.

[177] M. Rumpf and R. Strzodka. Level set segmentation in graphics hardware. In International Conference on Image Processing, pages 1103–1106, 2001.

[178] S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-M. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, pages 73–82, 2008.

[179] T. Scheuermann and J. Hensley. Efficient histogram generation using scattering on GPUs. In B. Gooch and P.-P. J. Sloan, editors, SI3D, pages 33–37. ACM, 2007.

[180] W. Schnyder. Embedding planar graphs on the grid. In SODA ’90: Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms, pages 138–148, Philadelphia, PA, USA, 1990. Society for Industrial and Applied Mathematics.

[181] M. Segal and K. Akeley. The OpenGL graphics system: A specification, version 2.0. www.opengl.org, 2004.

[182] S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens. Scan primitives for GPU computing. In Graphics Hardware, pages 97–106, San Diego, California, USA, 2007.

[183] E. Sharon, A. Brandt, and R. Basri. Fast multiscale image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-00), pages 70–77, Los Alamitos, June 13–15 2000. IEEE.



[184] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. on PAMI, 22(8):888–905, 2000.

[185] E. Shimizu and R. Inoue. Time-distance mapping: visualization of transportation level of service. In Proc. of Symposium on Environmental Issues Related to Infrastructure Development, pages 221–230, 2003.

[186] N. T. Spring, R. Mahajan, and D. Wetherall. Measuring ISP topologies with Rocketfuel. In SIGCOMM, pages 133–145, 2002.

[187] J. T. Stasko and E. Kraemer. A methodology for building application-specific visualizations of parallel programs. Journal of Parallel and Distributed Computing, 18(2):258–264, 1993.

[188] S. Stegmaier, M. Strengert, T. Klein, and T. Ertl. A simple and flexible volume rendering framework for graphics-hardware-based raycasting. In Eurographics/IEEE VGTC Workshop on Volume Graphics, pages 187–195, 2005.

[189] S. Ingram, T. Munzner, and M. Olano. Glimmer: Multilevel MDS on the GPU. Technical Report UBC CS TR-2007-15, University of British Columbia, 2007.

[190] J. E. Stone, J. C. Phillips, P. L. Freddolino, D. J. Hardy, L. G. Trabuco, and K. Schulten. Accelerating molecular modeling applications with graphics processors. Journal of Computational Chemistry, 28(16):2618–2640, 2007.

[191] G. Strang. Introduction to Applied Mathematics. Wellesley-Cambridge Press, 1986.

[192] K. Sugiyama. Graph Drawing and Applications for Software and Knowledge Engineers. World Scientific, 2002.

[193] K. Sugiyama and K. Misue. A simple and unified method for drawing graphs: Magnetic-spring algorithm. In Graph Drawing, volume 894 of Lecture Notes in Computer Science, pages 364–375. DIMACS, Oct. 1994.

[194] K. Sugiyama, S. Tagawa, and M. Toda. Methods for visual understanding of hierarchical system structures. IEEE Transactions on Systems, Man, and Cybernetics, SMC-11(2):109–125, Feb. 1981.



[195] Sun Microsystems, Inc. Java Remote Method Invocation (RMI) Specification, December 1997.

[196] R. Tamassia. On embedding a graph in the grid with the minimum number of bends. SIAM J. Comput., 16(3):421–444, 1987.

[197] R. Tamassia and I. Tollis. Planar grid embedding in linear time. IEEE Transactions on Circuits and Systems, 36:1230–1234, 1989.

[198] E. Tejada and T. Ertl. Large Steps in GPU-based Deformable Bodies Simulation. Simulation Modelling Practice and Theory, 13:703–715, 2005.

[199] I. G. Tollis, G. D. Battista, P. Eades, and R. Tamassia. Graph Drawing: Algorithms for the Visualization of Graphs. Prentice Hall, 1999.

[200] Tom Sawyer graph layout toolkit, 2004. Currently available at http://www.tomsawyer.com.

[201] B. Topol, J. T. Stasko, and V. Sunderam. PVaniM: A tool for visualization in network computing environments. Concurrency: Practice and Experience, 10(14):1197–1222, 1998.

[202] R. van Liere and W. C. de Leeuw. Graphsplatting: Visualizing graphs as continuous fields. IEEE Trans. Vis. Comput. Graph, 9(2):206–212, 2003.

[203] L. Voinea, A. Telea, and J. J. van Wijk. CVSscan: Visualization of code evolution. In ACM Symposium on Software Visualization, pages 47–56, May 2005.

[204] C. Walshaw. Graph collection. http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/.

[205] C. Walshaw. A Multilevel Algorithm for Force-Directed Graph Drawing. J. Graph Algorithms Appl., 7(3):253–285, 2003.

[206] X. Wang and I. Miyamoto. Generating customized layouts. In F.-J. Brandenburg, editor, Proc. 3rd Int. Symp. Graph Drawing (GD 1995), number 1027 in Lecture Notes in Computer Science, LNCS, pages 504–515. Springer-Verlag, 1996.



[207] Y. Wang and T. Kunz. Visualizing mobile agent executions. In E. Horlait, editor, Second International Workshop on Mobile Agents for Telecommunication Applications (MATA 2000), number 1931 in Lecture Notes in Computer Science, LNCS, pages 103–114. Springer-Verlag, 2000.

[208] D. S. Watkins. Fundamentals of Matrix Computations. John Wiley, 2002.

[209] R. Wiese, M. Eiglsperger, and M. Kaufmann. yFiles: Visualization and automatic layout of graphs. In P. Mutzel, M. Jünger, and S. Leipert, editors, Proc. 9th Int. Symp. Graph Drawing (GD 2001), number 2265 in Lecture Notes in Computer Science, LNCS, pages 453–454. Springer-Verlag, 2001.

[210] R. Wilson and R. Bergeron. Dynamic hierarchy specification and visualization. In Proc. IEEE Symp. Information Visualization, InfoVis, pages 65–72, 1999.

[211] A. Wong, T. Dillon, M. Ip, and W. Lin. A generic visualization framework to help debug mobile-object-based distributed programs running on large networks. In WORDS, pages 240–250. IEEE Computer Society, 2001.


Abstract

Information visualization is the use of computer-based, interactive, graphical methods for presenting information, especially abstract information, in order to help the user gain insight [29]. A graphical presentation makes it possible to harness the human visual system for understanding and analyzing the information. The visual system is characterized by its ability to take in large amounts of information and to identify patterns in it. One of the central challenges in information visualization is finding ways to map abstract information to an image.

In many cases, the information to be visualized is dynamic. The challenge here is to produce a coherent sequence of images that "tell a story", so that viewers can preserve their "mental map" of the information [139]. Composing several static images is not sufficient for visualizing dynamic information. A user watching a dynamic visualization must be able to notice the changes that occur in the information during the visualization and to understand them. All of this must be done while preserving the overall structure of the representation of the information.

The graph drawing problem is one of the central problems in information visualization [45, 110, 111, 115, 153, 190, 197]. Graphs are abstract mathematical objects that represent relations between objects. A graph contains a set of nodes and a set of edges that connect nodes. The graph drawing problem deals with finding a geometric representation of a graph. In this representation, a position in the plane is assigned to each node of the graph, and the edges are represented by connecting the nodes with curves or straight lines. Countless layouts fit a given abstract graph. In a graph layout, the placement of the nodes and edges in the plane determines our ability to understand the structure of the graph and to gain insight into it.

Graph drawing has many applications. Some examples are integrated circuit design, social networks, bioinformatics, state machines, software engineering (function call graphs, data structures, displaying the evolution of software), communication networks and control processes.

Graphics processing units (GPUs), which form the core of a computer's graphics card, have developed very rapidly in recent years [88, 162, 176]. In the past, the GPU was used only for applications directly related to graphics. As semiconductor manufacturing technology improved and the integration of devices on a piece of silicon increased, the computing power and flexibility of GPUs grew. As a result, advanced graphics algorithms that run in real time on GPUs were implemented.

In parallel with the development of GPU hardware architecture, GPU programming languages have also developed. Today, software for GPUs can be written in high-level languages. Each new generation of processors reduces the restrictions on the programs that run on GPUs. New runtime environments, which are not directly tied to graphics applications, give the programmer better control over running computations on the graphics card. This improves performance and reduces the former requirement to be familiar with the graphics interface in order to operate the GPU [26, 139, 141, 155, 167].

The combination of great computing power and flexible programmability of GPUs paved the way for implementing a wealth of scientific applications on the graphics card [88, 161]. The architecture of the GPU differs from that of a regular processor. Whereas the challenge for a regular processor is to run sequential code as fast as possible, the GPU excels in cases where identical steps must be performed in parallel on many pieces of data at the highest possible rate.

In this thesis, several related problems in the area of graph drawing for information visualization were studied. First, we focus on the problem of computing a fast, high-quality layout of a single general graph. Next, an algorithm for improving an existing layout of a graph is presented. This algorithm can be run to improve the output of any layout algorithm. After that, we address problems of dynamic graph layout, both for graphs that are partitioned into clusters and for graphs with a general structure. We show that the algorithms preserve the user's mental map of the graph well and produce, online, stable layouts for graphs that change over time.

In contrast to images or matrices, which have an ordered and uniform structure, graphs have a non-uniform structure. Therefore, at first glance it seems that solving graph-related problems does not suit the computational capabilities of GPUs, which are built to perform the same sequence of operations in parallel on different pieces of data. In this research we showed that several computations on graphs can be carried out efficiently on GPUs, with results obtained several times faster than is possible on regular processors.

This thesis presents various applications of the algorithms developed. We present static layouts of graphs that represent the networks of Internet service providers. We improve layouts of graphs from the fields of bioinformatics, circuit design and finite element meshes. We present the evolution over time of graphs that represent Internet discussion groups and online social networks. In addition, we developed a software system for the visualization of mobile object systems. In these systems, objects can move from computer to computer while the application is running. We present a scalable system that uses an algorithm for dynamic layout of a clustered graph to enable visualization of the mobile objects.

In Chapter 3 we address the problem of computing a layout of a general undirected graph in the plane. We propose a method in which the graph is represented at several levels of detail. This method yields a good layout while reducing the computation time. To build the different levels of detail of the graph, the algorithm recursively partitions it into smaller and smaller parts. The partitioning is performed by a spectral partitioning algorithm that operates on the Laplacian matrix of the graph, which is built from the adjacency relations between the nodes of the graph. The research makes use of properties of two existing layout algorithms in order to obtain an aesthetic layout. A new method is presented for converting a layout of a graph at a low level of detail into an initial layout of a more detailed graph. To accelerate the computation, the graph is divided into parts, and the interaction between distant parts of the graph is approximated. Since the parts created by the algorithm are of similar size, the problem can be parallelized in a form suitable for implementation on a GPU or on other future parallel architectures. We showed how using a GPU yields a significant speedup of the running time. The resulting algorithm is very fast and computes layouts of a quality similar to that of the best algorithms in the field.

In Chapter 4 we present an algorithm for reducing the clutter in graph layouts. In many cases the density of information in a graph layout is not uniform. While some parts of the image are sparse or even empty, other parts contain many nodes and edges. This makes it hard to extract information from the graph layout and is an inefficient use of the available screen space. Using an evolution process that simulates the diffusion of heat in the plane, the proposed algorithm computes a new distribution of the information in the graph so that the screen space is used more efficiently. The direction in which the information distribution evolves is determined by a ray casting process, which is accelerated significantly by using a GPU. The warp of the image of the information distribution in the graph is then computed. Finally, the position of each node in the graph is updated according to the computed warp.

In Chapter 5, an online algorithm for computing a sequence of layouts of a general dynamic graph is presented. Besides the requirement of obtaining an aesthetic graph, handling dynamic graphs requires preserving the "mental map" that the user builds. Several methods are presented for reducing the amount of computation required by focusing on the regions of the graph that have changed. The algorithm solves the dynamic layout problem by computing the layout at several levels of detail. The central idea of the algorithm is to set the mobility of each node in the graph according to the changes in its neighborhood in the graph. This method yields a stable and aesthetic layout of the graph. We also showed how the computation of the sequence of layouts can be accelerated by using a GPU, achieving an improvement of up to a factor of 17 in running time.

In Chapter 6 we discuss an algorithm for online computation of a sequence of layouts of a clustered graph that changes over time. The algorithm preserves the user's mental map of the graph. As part of the research we developed several metrics for measuring the quality of a dynamic layout. The algorithm uses several tools. First, invisible nodes and edges that are not allowed to move are used to preserve the structure of the graph. Second, additional nodes are used to minimize the changes between one layout and the next. Third, weights on the edges are used to control the shape of the graph. In addition, the algorithm makes it possible to distinguish between dynamic nodes and nodes that remain fixed in place.

In Chapter 7 we present a system for the visualization of mobile object systems [14, 61, 64]. In such systems, objects may move between computers while the application is running. A clustered graph is used to display the physical layer (placement on the computers that belong to the system) as well as the logical layer (relations between objects) of the system. The system is scalable and produces a consistent visualization of the distributed system.
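The spectral partitioning step that the abstract mentions for Chapter 3 can be illustrated with a small sketch. The code below is not the thesis implementation: the function names `laplacian` and `spectral_bisect` are hypothetical, the dense eigensolver is chosen only for brevity, and splitting at the median of the Fiedler vector is an assumption made here so that the two parts have similar size, as the abstract requires.

```python
import numpy as np

def laplacian(n, edges):
    """Build the Laplacian L = D - A of an undirected graph on nodes 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, v] -= 1.0   # off-diagonal entries: minus the adjacency
        L[v, u] -= 1.0
        L[u, u] += 1.0   # diagonal entries: node degrees
        L[v, v] += 1.0
    return L

def spectral_bisect(n, edges):
    """Split the nodes into two equal-sized parts using the Fiedler vector."""
    L = laplacian(n, edges)
    # eigh returns eigenvalues in ascending order, so column 1 holds the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    order = np.argsort(fiedler)      # nodes sorted by Fiedler value
    half = n // 2                    # median split keeps the parts balanced
    return sorted(order[:half].tolist()), sorted(order[half:].tolist())

# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part_a, part_b = spectral_bisect(6, edges)
```

Applying `spectral_bisect` recursively to each part would give a hierarchy of progressively smaller parts. For large graphs a sparse eigensolver would be a more practical choice than the dense one used in this sketch.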



This research was carried out under the supervision of Prof. Ayellet Tal in the Faculty of Computer Science.

I thank the Technion for its generous financial support of my studies.

Acknowledgements

Obtaining a Ph.D. is a great privilege. I would like to thank several people who helped me reach this achievement.

I would like to express my gratitude to my advisor, Prof. Ayellet Tal, for her support during the different stages of this long journey. I would especially like to thank her for providing feedback and suggestions for improving the research. Thanks to her guidance, my writing skills and my ability to present topics have improved markedly, a valuable skill in its own right.

I would like to thank my loving wife Maya for her support, encouragement and understanding, especially when I was occupied with my studies and therefore could not be with her. It was very rewarding to share joyful moments with her along the way.

This achievement would not have been possible without the help, support and encouragement of my parents Miriam and Dov. I would like to thank them for all they have done for me. Special thanks go to my mother for taking an active part in producing the papers and the videos that demonstrate my research. I would also like to thank my brothers Etai and Ofri for their support. I would like to mention my grandmother Bela, who always helps and encourages me. I would like to dedicate this research to the memory of my grandfather Solomon, who always believed in the value of higher education.

I would like to thank Maya's parents, Kitty and Arie, and Maya's grandmother for supporting me and for providing an environment in which I could focus on my research.

Special thanks go to my friends Sivan Berkovich, Dr. Avi Steiner and Dr. Amit Mizrahi for their help and support.



Graph Drawing Algorithms in Information Visualization

Research Thesis

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Yaniv Frishman

Submitted to the Senate of the Technion - Israel Institute of Technology

Tevet, 5769 Haifa January, 2009




