
Data Visualization with GEPHI

Vinod Gupta School of Management

IIT Kharagpur

Gephi has been dubbed the Photoshop of data analytics. It is open source software for visualizing and manipulating complex data networks in an intuitive manner. This user guide presents a walkthrough for the new user.

Luv Walia – 10BM60043

Prabhjot Singh Bhatia – 10BM60060

Class of 2012

Contents

Introduction
    What this tutorial is about and what it is not about
    Who This Tutorial Is For
    Prerequisites
    About Gephi
    Features
    Uses for Gephi in Business
Fundamentals
    Installing
    Opening a file
Graph Visualization
Layout Algorithms
Installing plugins

The cover page image was created with Gephi version 0.8.1 beta using the Force Atlas 2 layout algorithm.

Introduction

Data visualization is the representation of processed data by graphical means, so that information can be communicated clearly and effectively. There is a trade-off to be made between aesthetics and functionality. Gephi helps strike this balance with little effort.

What this tutorial is about and what it is not about

This tutorial is highly practical. It shows how to go about data visualization, but limits itself to Gephi, and within Gephi to the basic tools and techniques; it does not attempt to discuss every available technique.

The tutorial uses an example dataset to demonstrate the techniques, with screenshots included for each. It is based on the latest version available at the time of writing, 0.8.1 beta. Future versions may not contain the features listed here, or may implement them differently.

In addition, this tutorial does not specifically discuss the following topics:

o The concepts of data visualization
o The algorithms followed by the various plugins
o The internal working of the software

Who This Tutorial Is For

This tutorial is aimed at the budding business professional who is new to the software and wishes to get started with data visualization.

Prerequisites

A basic understanding of data analysis techniques is necessary. Additionally, one must know how the results of these analyses are to be interpreted to solve a real-life problem. However, no prior data visualization experience is necessary.

To try out some of the advanced techniques for live data capture and visualization, one must be comfortable with programming and with setting up a server connected to the internet.

About Gephi

[Pronounced: G-fai] Gephi, an open source network visualization platform, has a rich set of built-in functionalities and an intuitive user interface. The software provides a powerful, interactive visualization and exploration tool for all kinds of networks and complex systems, all with a smooth learning curve.

As software for Exploratory Data Analysis, Gephi provides a robust toolkit to explore, understand and manipulate graph structures and to reveal hidden insights. An analyst can form hypotheses, discover patterns and identify faults introduced during data collection, all through a slick visual interface that gives an overall perspective of things. Gephi is a complementary tool to statistics, now that the importance of visual thinking has been recognized. Additionally, Gephi has built-in tools for Social Network Analysis.

Features

o Real-time visualization: Gephi sports a fast graph visualization engine, which helps an analyst create and analyze a variety of scenarios and reach accurate decisions faster.
o SNA metrics: Gephi incorporates the major metrics currently used to perform a social network analysis (SNA), such as:
    Betweenness: an indicator of influence
    Diameter: the longest of the shortest paths between any two nodes, an indicator of the reach of the network
    Closeness: an indicator of how fast an individual can reach its entire network
    Clustering Coefficient: an indicator of how closely knit a particular group of nodes is
    Average shortest path: an indicator of how many steps it takes, on average, to get from one node to another
    PageRank: the importance of a page
    HITS: the social value of the links and content on a page
o Clustering and hierarchical graphs: Gephi helps us create clusters and sub-clusters out of the given network graphs.
o Support for large datasets: What differentiates Gephi from other similar software is its ability to work with very large datasets, up to 50,000 nodes.
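To make a few of the SNA metrics listed above more concrete, here is a minimal Python sketch that computes closeness, diameter and the clustering coefficient on a tiny undirected network. The network and all function names are hypothetical, chosen only for illustration. Definitions of these metrics vary between tools (closeness is taken here as the inverse of the average distance), so the numbers need not match what Gephi reports.

```python
from collections import deque

# Hypothetical toy friendship network (undirected); not the tutorial's dataset.
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("cat", "dan"), ("dan", "eve")]


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closeness(adj, node):
    """Assumed definition: inverse of the average distance to reachable nodes."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def diameter(adj):
    """Longest shortest path between any two mutually reachable nodes."""
    return max(max(bfs_distances(adj, n).values()) for n in adj)


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbs = sorted(adj[node])
    k = len(nbs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbs[j] in adj[nbs[i]])
    return 2 * links / (k * (k - 1))


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print("diameter:", diameter(adj))
    for name in sorted(adj):
        print(name, closeness(adj, name), clustering_coefficient(adj, name))
```

In this toy network "cat" bridges the tight ann-bob-cat group and the dan-eve chain, so it has the highest closeness but a low clustering coefficient.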

Uses for Gephi in Business

Gephi can help visualize any kind of network data. From a business viewpoint specifically, Gephi can be of help in a number of ways, as detailed below.

Marketing
o Segmentation:
    Gephi provides an inbuilt clustering tool to group customers from a product/service targeting perspective.
o Targeting:
    Whom to target and, more importantly, whom NOT to target.
    Gephi helps us find the users with the most influence, and hence identify them as potential targets for marketing communication.

Customer Relationship Management
o Identify the worth of a customer, based on his or her network
o Decide whether or not to go the extra mile to retain that customer

Organizational Development
o Just as we employ social network analysis for customers, a large organization can apply the same concepts to its own employees and generate meaningful insights that help run the organization more effectively.

Mergers & Acquisitions
o How successful is the merger? Gephi can help answer this question by analyzing the past and present scenarios.

Team Building
o What set of employees could bond well?
o Where can conflicts arise?
o Who are the unsung heroes/leaders?
o Where do the barriers to internal communication lie?

Human Resources
o Gephi can help us identify the candidates best suited for a particular position. It could also help us target a particular geography to hunt for potential candidates.

Gephi can help us answer all the above questions, given the right set of data.

Fundamentals

Installing

Get Gephi from this link: https://gephi.org/users/download/. Being Java-based, Gephi is available for Windows, Linux and Macintosh.

The installation is a simple process.

NOTE: Java must be installed and configured on the system before attempting to install Gephi. To get Java, visit this link: http://www.oracle.com/technetwork/java/javase/downloads/index.html. To just run Gephi, the Java Runtime Environment is sufficient. However, to build plugins for Gephi, the Java Development Kit must be installed.

Opening a file

Gephi cannot work on raw data. The data must first be converted into a graph format (for example, .gexf). Other enterprise-grade FOSS software, such as R, can help with this conversion. For the purpose of demonstration, however, we shall work with the sample datasets included in the Gephi toolkit, specifically the social network datasets available here:

http://wiki.gephi.org/index.php/Datasets
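As an illustration of what "processing raw data into a graph format" can look like, the following Python sketch turns a plain list of relationships into a minimal GEXF document using only the standard library. The function name and the sample names are hypothetical, and the output covers only the basic GEXF 1.2 elements (nodes and edges, with no attributes or visual data); treat it as a starting point rather than a complete exporter.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"  # namespace of the GEXF 1.2 draft


def edges_to_gexf(edges, directed=True):
    """Build a minimal GEXF document (as a string) from (source, target) pairs."""
    # Assign a numeric id to every distinct name, in order of first appearance.
    node_ids = {}
    for source, target in edges:
        for name in (source, target):
            if name not in node_ids:
                node_ids[name] = str(len(node_ids))

    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {
        "mode": "static",
        "defaultedgetype": "directed" if directed else "undirected",
    })

    nodes_el = ET.SubElement(graph, "nodes")
    for name, node_id in node_ids.items():
        ET.SubElement(nodes_el, "node", {"id": node_id, "label": name})

    edges_el = ET.SubElement(graph, "edges")
    for index, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {
            "id": str(index),
            "source": node_ids[source],
            "target": node_ids[target],
        })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical raw data: who follows whom.
    raw = [("alice", "bob"), ("bob", "carol"), ("carol", "alice")]
    with open("sample.gexf", "w", encoding="utf-8") as handle:
        handle.write(edges_to_gexf(raw))
```

The resulting sample.gexf file can then be opened in Gephi in the same way as the sample datasets.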

Open Graph File (File>Open…)

Import Report

When the file is opened, an import report is created. It summarizes the data and lists any issues found:

o Number of nodes
o Number of edges
o Type of graph
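The same three figures can be checked outside Gephi before importing a file. The sketch below is only an approximation of the import report, not Gephi's own code: it reads a GEXF 1.2 document with Python's standard library and counts nodes and edges. The sample document and the function name are made up for illustration, and falling back to "undirected" when the graph type is not declared is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://www.gexf.net/1.2draft"}

# Hypothetical three-node sample document.
SAMPLE_GEXF = """
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static" defaultedgetype="directed">
    <nodes>
      <node id="0" label="alice"/>
      <node id="1" label="bob"/>
      <node id="2" label="carol"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
      <edge id="1" source="1" target="2"/>
    </edges>
  </graph>
</gexf>
"""


def import_summary(gexf_text):
    """Return the node count, edge count and declared graph type."""
    root = ET.fromstring(gexf_text.strip())
    graph = root.find("g:graph", NS)
    return {
        "nodes": len(graph.findall("g:nodes/g:node", NS)),
        "edges": len(graph.findall("g:edges/g:edge", NS)),
        # Assumption: treat a missing declaration as undirected.
        "type": graph.get("defaultedgetype", "undirected"),
    }


if __name__ == "__main__":
    print(import_summary(SAMPLE_GEXF))
```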

Click on OK to validate and see the graph.

o Use the mouse to move and scale the visualization
    Zoom: Mouse Wheel
    Pan: Right Mouse Drag

Graph Visualization

o While the “Drag” mode is enabled, you can drag nodes by holding down the left mouse button and moving the mouse.
    Click on the area where “Dragging” is written
    Configure the “Diameter” with the slider

o You can change the edge thickness with the edge-weight slider.
o If you lose your graph, reset the position using the “Center On Graph” button.
o Autoselect neighbors
    This is an essential option for enhancing the readability of the network. The neighbors of selected nodes are automatically selected as well, making it easy to see who is connected to whom.
    Expand the visualization settings (bottom right corner of the graph)
    Check the “Autoselect neighbors” option
o Edge color
    By default, edges have the same color as their source node. This can be changed so that a single color is used instead.
    Expand the visualization settings and go to the “Edges” tab
    Uncheck “Source node color” and configure “Edge default color”

o Node shape and 3-D
    Although Gephi uses a 3-D rendering engine, networks are usually drawn in 2-D, and this is the default mode.
    Expand the visualization settings and go to the “Nodes” tab
    Select “Sphere 3d” instead of “Disk 2d”
o Display attributes
    Besides a label, nodes and edges have attributes, such as gender, age or relationship type in a social network. It is easy to display them instead of, or together with, the label.
    Click on the “Attributes” button in the visualization settings.
    A dialog appears and lists all attributes, separately for nodes and edges.
    Check all attributes you want to display, for instance “Code”.
    Click on OK to confirm

o Transform text color and size
    The Ranking module is used for this.
    Find the label color transformer and select which attribute to use for ranking. Here, “Degree” is chosen.
    Configure the ranking colors and click on “APPLY”.
    The text should now be colored. Also try using “Betweenness Centrality” instead of “Degree”.
    Now select the label size transformer.
    Select sizes between 0 and 1, as this value is multiplied by the default element size.
    Click on “APPLY” to see how the text size changes.

o Antialiasing option
    Antialiasing is a visualization option that makes edges look smoother. It is set to 4x by default and can be raised up to 16x.
    Go to the Gephi options in the “Tools” menu
    Select the “Visualization” tab and then the “OpenGL” tab.
    Here you can change the antialiasing option. Restart Gephi for the change to take effect.

Layout the graph

o Layout algorithms set the shape of the graph; applying one is the most essential action.
o Locate the Layout module on the left panel.
    Choose “Force Atlas 2” (it handles large networks while keeping very good quality).
    “RUN” the layout, applying the following settings step by step:
        LinLog mode = checked (linear attraction and logarithmic repulsion instead of the default lin-lin; makes clusters tighter)
        Scaling = 100 (increase to make the graph sparser)
        Edge weight influence = 0 (ranges from 0, no influence, to 1, normal; set to 0 to calculate forces without edge weight)
    Now “STOP” the algorithm.

Layout Algorithms

o The purpose of Layout Properties is to let you control the algorithm in order to produce an aesthetically pleasing representation.

There are several layout options available to the user, namely OpenOrd, ForceAtlas, Yifan Hu, Fruchterman-Reingold, Circular, Radial Axis and GeoLayout, each suited to a specific purpose.

LAYOUT                                        EMPHASIS
OpenOrd                                       Divisions/Clustering
ForceAtlas, Yifan Hu, Fruchterman-Reingold    Complementarities
Circular, Radial Axis                         Ranking
GeoLayout                                     Geographic Repartition
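To show why a circular layout emphasizes ranking, here is a small Python sketch that places nodes evenly on a circle, ordered from the most connected to the least connected. This is a simplified illustration and not Gephi's Circular layout implementation; the function name, the radius parameter and the choice of degree as the ordering attribute are all assumptions.

```python
import math


def circular_layout(adjacency, radius=100.0):
    """Place nodes evenly on a circle, highest degree first (counter-clockwise).

    adjacency maps each node to the collection of its neighbors.
    Returns a node -> (x, y) map.
    """
    # Sort by descending degree; ties are broken by node name for stability.
    ordered = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    if not ordered:
        return {}
    step = 2 * math.pi / len(ordered)
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(ordered)
    }


if __name__ == "__main__":
    # Hypothetical star-shaped network: one hub and three leaves.
    star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
    for node, (x, y) in circular_layout(star).items():
        print(f"{node}: ({x:.1f}, {y:.1f})")
```

Because position along the circle follows the rank, a reader can see at a glance which nodes sit at the top of the ordering, which is the emphasis the table attributes to the Circular and Radial Axis layouts.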

Ranking (color)

o The Ranking module lets you configure the color and size of nodes.
o Locate the Ranking module in the top left.
o Choose “Degree” as the rank parameter.
o You should now see the configuration panel.
o Configure the colors
    Move your mouse over the gradient component
    Double-click on the triangles to configure the colors
o Click on “APPLY” to see the result
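Conceptually, ranking by color maps each node's value onto a gradient between two chosen colors. The Python sketch below shows the idea with a plain linear interpolation. The function names and sample degrees are hypothetical, and Gephi's own gradient (which can have several color stops) is not reproduced here.

```python
def lerp_color(low, high, t):
    """Linearly blend two (r, g, b) colors; t=0 gives low, t=1 gives high."""
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


def rank_colors(values, low, high):
    """Map each node's value to a color between low and high."""
    smallest, largest = min(values.values()), max(values.values())
    span = largest - smallest
    return {
        node: lerp_color(low, high, (value - smallest) / span if span else 0.0)
        for node, value in values.items()
    }


def to_hex(rgb):
    """Format an (r, g, b) tuple as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    # Hypothetical degrees; yellow for the least connected, red for the most.
    degrees = {"a": 1, "b": 3, "c": 5}
    colors = rank_colors(degrees, low=(255, 255, 0), high=(255, 0, 0))
    for node in sorted(colors):
        print(node, to_hex(colors[node]))
```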

Ranking result table

o You can see the rank values by enabling the result table. ACARVIN has 252 links and is the most connected node in the network.
o Enable the table result view in the bottom toolbar
o Click on “APPLY” again

Metrics

o Calculate the average path length for the network. The algorithm computes the path length for all possible pairs of nodes and indicates how close the nodes are to each other.
o Click on “RUN” next to “Average Path Length”. The settings panel appears immediately.
o Select “Directed” and click on OK to compute the metric.
o When finished, the metric displays its result in a report.
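For readers curious about what this metric involves, the following Python sketch computes the average path length of a small directed graph with breadth-first search, along with each node's eccentricity. It is a simplified illustration under stated assumptions: the graph is unweighted, only pairs that are actually reachable are averaged, and the function names and sample graph are hypothetical. Gephi's report may treat unreachable pairs differently.

```python
from collections import deque


def shortest_distances(successors, source):
    """Hop counts from source to every node reachable along edge direction."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_path_length(successors):
    """Return (average shortest path over reachable pairs, eccentricity map)."""
    total = 0
    pairs = 0
    eccentricity = {}
    for source in successors:
        dist = shortest_distances(successors, source)
        reached = [d for node, d in dist.items() if node != source]
        total += sum(reached)
        pairs += len(reached)
        # Eccentricity: distance to the farthest node this source can reach.
        eccentricity[source] = max(reached, default=0)
    return (total / pairs if pairs else 0.0), eccentricity


if __name__ == "__main__":
    # Hypothetical directed graph: a cycle a -> b -> c -> a plus c -> d.
    graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
    average, ecc = average_path_length(graph)
    print("average path length:", round(average, 3))
    print("eccentricity:", ecc)
```

Choosing “Directed” in the settings panel corresponds to following edges only from source to target, as this sketch does; an undirected computation would also traverse each edge backwards.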

Ranking (size)

o Metrics generate general reports, but also results for each node. Thus three new values have been created by the “Average Path Length” algorithm we ran:
    Betweenness Centrality
    Closeness Centrality
    Eccentricity
o Go back to Ranking.
o Select “Betweenness Centrality” in the list. This metric identifies influential nodes: the higher the value, the more influential the node.
o The node size will be set now. Colors continue to indicate “Degree”.
o Select the diamond icon in the toolbar for size.
o Set a min size of 40 and a max size of 200.
o Click on “APPLY” to see the result.

Color: Degree. Size: Betweenness Centrality metric.
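The two steps above (compute betweenness, then map it to a size between 40 and 200) can be sketched in Python as follows. The betweenness part follows Brandes' algorithm for unweighted, undirected graphs. The size mapping assumes a plain linear interpolation, which may differ from the interpolation Gephi applies, and all names and the sample network are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm on an unweighted, undirected adjacency map."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        paths = dict.fromkeys(adj, 0)       # number of shortest paths
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


def rank_sizes(values, min_size=40.0, max_size=200.0):
    """Linearly map values onto [min_size, max_size] (assumed interpolation)."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    return {
        node: min_size + ((v - low) / span if span else 0.0) * (max_size - min_size)
        for node, v in values.items()
    }


if __name__ == "__main__":
    # Hypothetical chain a - b - c - d: the middle nodes carry the traffic.
    chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    scores = betweenness(chain)
    print(scores)
    print(rank_sizes(scores))
```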

Show labels

o Display node labels
o Set label size proportional to node size
o Set label size with the scale slider
o Set label color
    Locate the color chooser in the visualization settings
    Press the left mouse button to display the palette and pick a color. This sets the node label color.
    To configure the edge label color, expand the settings bar
o Label Adjust
    Go to the Layout panel
    Choose the “Label Adjust” layout in the list
    Click on “RUN” to proceed

Community detection

o The ability to detect and study communities is central to network analysis. We would like to colorize the clusters in our example.
o Gephi implements the Louvain method, available from the Statistics panel.
o Click on “RUN” next to the “Modularity” line.
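The Louvain method searches for a division of the network that maximizes a score called modularity. The Python sketch below does not implement Louvain itself; it only evaluates the modularity of a given division for an unweighted, undirected network, which is the quantity the method tries to increase. The function name and the sample network are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges is a list of (u, v) pairs; community maps each node to a label.
    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}     # edges with both endpoints in the community
    degree_sum = {}   # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


if __name__ == "__main__":
    # Hypothetical network: two triangles joined by a single bridge edge.
    edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
    two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
    one_group = {n: "A" for n in range(1, 7)}
    print("two communities:", modularity(edges, two_groups))
    print("one community:  ", modularity(edges, one_group))
```

Splitting the two triangles into separate communities scores higher than lumping all nodes together, which is the kind of improvement a community detection algorithm looks for.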

Partition

o The community detection algorithm created a “Modularity Class” value for each node. The Partition module can use this new data to colorize communities.
o Locate the Partition module on the left panel.
o Click on the “Refresh” button to populate the partition list.
o Select “Modularity Class” in the partition list.
o You can see that many communities were found, sorted in decreasing order by percentage; the exact result could be different for you. A random color has been set for each community identifier.
o Click on “APPLY” to colorize the nodes.
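The partition list is essentially a tally of how many nodes fall into each modularity class. A minimal Python sketch of that tally, using made-up class assignments and a hypothetical function name, might look like this:

```python
from collections import Counter


def partition_summary(modularity_class):
    """List (class, node count, percentage) rows, largest community first."""
    counts = Counter(modularity_class.values())
    total = len(modularity_class)
    return [
        (label, count, 100.0 * count / total)
        for label, count in counts.most_common()
    ]


if __name__ == "__main__":
    # Hypothetical assignment of six nodes to three communities.
    classes = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
    for label, count, percent in partition_summary(classes):
        print(f"class {label}: {count} nodes ({percent:.2f}%)")
```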

Filter

o The last manipulation step is filtering. You create filters that can hide nodes and edges on the network. We will create a filter to remove leaves and other weakly connected nodes, i.e. nodes with fewer than nine edges.
o Locate the Filters module on the right panel.
o Select “Degree Range” in the “Topology” category.
o Drag it to the Queries and drop it on “Drag filter here”.
o Click on “Degree Range” to activate the filter. The parameters panel appears.
o It shows a range slider and a chart that represents the data, here the degree distribution.
o Move the slider to set its lower bound to 9. Enable filtering by pushing the button.
o Nodes with a degree lower than 9 are now hidden.
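The effect of a degree range filter can be sketched in a few lines of Python. This is an illustration of the idea only: the function name is hypothetical, degrees are measured once on the unfiltered graph, and the sketch uses a lower bound of 2 on a tiny network rather than the bound of 9 used in the steps above.

```python
def degree_range_filter(adjacency, low, high=None):
    """Keep nodes whose degree in the original graph lies in [low, high].

    adjacency maps each node to the set of its neighbors. Edges to hidden
    nodes are dropped from the result.
    """
    keep = {
        node for node, neighbors in adjacency.items()
        if len(neighbors) >= low and (high is None or len(neighbors) <= high)
    }
    return {node: adjacency[node] & keep for node in keep}


if __name__ == "__main__":
    # Hypothetical network: a triangle (hub, x, y) plus one leaf attached to hub.
    network = {
        "hub": {"x", "y", "leaf"},
        "x": {"hub", "y"},
        "y": {"hub", "x"},
        "leaf": {"hub"},
    }
    visible = degree_range_filter(network, low=2)
    print(sorted(visible))  # the leaf is hidden
```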

Preview

o Before exporting your graph as an SVG or PDF file, go to the Preview to see exactly how the graph will look and to put the finishing touches on it.
o Select the “Preview” tab in the banner. Click on “Refresh” to see the preview.
o In the Node properties, find “Show Labels” and enable the option. Click on “REFRESH”.

Export as SVG

o From Preview, click on SVG next to Export. SVG files are vector graphics, like PDF: images scale smoothly to different sizes and can therefore be printed or integrated into high-resolution presentations. SVG files can be transformed and manipulated in Inkscape or Adobe Illustrator.

Save your project.

Installing plugins

As feature-rich and truly open source software, Gephi has attracted a lot of attention from developers and researchers all around the world. As a result, a plethora of plugins is available to extend its functionality. These plugins can be found at https://gephi.org/plugins/. The majority of them are developed by the community, and quite a few are under active development.

A few prominent ones are:

o Retweet Monitor: Used for monitoring live retweets. More details at https://gephi.org/plugins/retweet-monitor/
o Graphviz Layout: Used to make layouts suitable for the specialized Graphviz software. More details at https://gephi.org/plugins/graphviz-layout/
o Parallel Force Atlas: Used to speed up ForceAtlas by using multiple threads. More details at https://gephi.org/plugins/parallel-force-atlas/
o Social Network Analysis: Allows computation of various metrics used in social network analysis and influencer analysis. More details at https://gephi.org/plugins/social-network-analysis/
o Layered Layout: A specialized layout with nodes in different orbits, used especially in social network analysis. More details at https://gephi.org/plugins/layered-layout/
o HTTP Graph: Generates data based on the web browsing activity on the machine. More details at https://gephi.org/plugins/http-graph/
o Circular Layout, OpenOrd Layout, GeoLayout: These are layout algorithms, as described previously under Layout Algorithms.

To install a plugin:

1. Download the .zip file from the plugin's webpage.
2. Extract the file to a folder of your choice, to get a “.nbm” file.
3. Open Gephi.
4. Go to Tools>Plugins.
5. Click on the “Downloaded” tab.
6. Click on “Add Plugins”.
7. Browse to the folder where the file was extracted and select the “.nbm” file.
8. Click on OK and then on Install.
9. Follow the onscreen instructions.
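Steps 1 and 2 above can also be scripted. The Python sketch below extracts every “.nbm” file from a downloaded plugin archive using only the standard library. The archive and folder names are placeholders, and the function name is hypothetical; the remaining steps still have to be carried out inside Gephi.

```python
import zipfile
from pathlib import Path


def extract_nbm(zip_path, dest_dir):
    """Extract all .nbm files from a plugin archive; return their paths."""
    destination = Path(dest_dir)
    destination.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".nbm"):
                extracted.append(Path(archive.extract(name, destination)))
    return extracted


if __name__ == "__main__":
    # Placeholder paths: replace with the downloaded archive and a target folder.
    archive_path = Path("downloaded-plugin.zip")
    if archive_path.exists():
        for module in extract_nbm(archive_path, "gephi-plugins"):
            print("Ready to add in Tools>Plugins:", module)
    else:
        print("Archive not found:", archive_path)
```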
