Gephi’s project aims to bring the perfect tool for visualizing and manipulating networks.
Data Visualization with GEPHI
Vinod Gupta School of Management
IIT Kharagpur
Gephi has been dubbed the Photoshop of data analytics. It is
open source software for visualizing and manipulating
complex data networks in an intuitive manner. This user
guide presents a walkthrough for the new user.
Luv Walia – 10BM60043
Prabhjot Singh Bhatia – 10BM60060
Class of 2012
Contents
Introduction
  What this tutorial is about and what it is not about
  Who This Tutorial Is For
  Prerequisites
  About Gephi
  Features
  Uses for Gephi in Business
Fundamentals
  Installing
  Opening a file
Graph Visualization
Layout Algorithms
Installing plugins
The cover page image was created with Gephi Version 0.8.1 Beta using Force Atlas 2 layout algorithm
Introduction
Data visualization is the representation of processed data by graphical means, so that information
can be communicated clearly and effectively. There is a trade-off to be made between aesthetics and
functionality, and Gephi helps strike that balance.
What this tutorial is about and what it is not about
This tutorial is practice oriented. It guides one on how to go about data visualization, but
limits itself to Gephi. It covers only the basic tools and techniques available in Gephi, and
does not attempt to discuss all available techniques.
The tutorial uses an example dataset to show how the techniques are applied, with screenshots
for each step. The tutorial is based on the latest available version, 0.8.1 beta. Future
versions may or may not contain the features listed here, or may implement them in a manner
different from that described here.
In addition, this tutorial does not specifically discuss the following topics:
The concepts of data visualization
The algorithms followed by the various plugins
The internal working of the software
Who This Tutorial Is For
This tutorial is aimed at the budding business professional who is new to the software and wishes to
get started with data visualization.
Prerequisites
A basic understanding of data analysis techniques is necessary. Additionally, one must know how the
results of these analyses are to be interpreted for solving a real life problem. However, no prior data
visualization experience is necessary.
To try out some of the advanced techniques for live data capture and visualization, one must be
comfortable with programming and with setting up a server connected to the internet.
About Gephi
Gephi (pronounced: G-fai) is an open source network visualization platform with a rich set of built-in
functionalities and an intuitive user interface. The software provides a powerful and interactive
visualization and exploration tool for all kinds of networks and complex systems, all with a smooth
learning curve.
As software for Exploratory Data Analysis, Gephi provides a robust toolkit to explore,
understand and manipulate graph structures, and to reveal hidden insights. An analyst can form
hypotheses, discover patterns and identify faults in data collection, all with a slick visual
interface that gives an overall perspective. Gephi is a complementary tool to statistics, now that
the importance of visual thinking has been recognized. Additionally, Gephi has built-in tools
for Social Network Analysis.
Features
Realtime Visualization: Gephi sports the fastest graph visualization engine, which helps an
analyst create and analyze a variety of scenarios to make accurate decisions, faster.
SNA Metrics: Gephi incorporates all major metrics currently used to perform social network
analysis (SNA), such as:
Betweenness: An indicator of influence
Diameter: An indicator of the reach of the network (the longest of the shortest paths between
any two nodes)
Closeness: An indicator of how fast an individual can reach its entire network
Clustering Coefficient: An indicator of how closely knit a particular group of nodes is
Average shortest path: An indicator of how many hops it takes, on average, to get from one node
to another
PageRank: The importance of a page
HITS: Social value of links and content on a page
Clustering and Hierarchical graphs: Gephi helps us create clusters and sub clusters out of
the given network graphs.
Support for Large datasets: What differentiates Gephi from other similar software is its
ability to work with very large datasets, up to 50,000 nodes.
Uses for Gephi in Business
Gephi can help visualize any kind of network data. Specifically from a business viewpoint,
Gephi can be of help in a number of ways, as detailed:
Marketing
o Segmentation:
Gephi provides a built-in clustering tool to segment customers from a
product/service targeting perspective
o Targeting:
Whom to target. More importantly, whom NOT to target
Gephi helps us to find users with the most influence, and hence identify
them as potential targets for marketing communication.
Customer Relationship Management:
o Identify the worth of a customer, based on his network
o Whether or not to go the extra mile to retain that customer
Organizational Development
o Similar to the manner that we employ social network analysis for customers, a large
organization could also apply the same concepts to its own employees and generate
meaningful insights that could help in running the organization more effectively.
Mergers & Acquisition:
o How successful is the merger? Gephi can help answer this question by analyzing the
past and the present scenarios
Team Building:
o What set of employees could bond well?
o Where can conflicts arise?
o Who are the unsung heroes/leaders?
o Where do the barriers to internal communication lie?
Human Resources
o Gephi can help us identify potential candidates best suited for a particular position.
It could also help us target a particular geography to hunt for potential candidates
Gephi can help us answer all the above questions, given the right set of data.
Fundamentals
Installing
Get Gephi from this link: https://gephi.org/users/download/. Being Java based, Gephi is available for
all major platforms: Windows, Linux and Macintosh.
The installation is a simple process.
NOTE: One needs to have Java installed and configured on the system before attempting to install
Gephi. To get Java, visit this link:
http://www.oracle.com/technetwork/java/javase/downloads/index.html . To just run Gephi, the Java
Runtime Environment is sufficient. However, to build plugins for Gephi, one must have the Java
Development Kit installed.
Opening a file
Gephi cannot work on raw data. It needs the data to be processed into a graph format (for example,
.gexf). To accomplish this, we can use other enterprise grade FOSS software such as R. However, for
the purpose of demonstration, we shall be working with the sample datasets included in the Gephi
toolkit. More specifically, we shall be using the social network data sets, available here:
http://wiki.gephi.org/index.php/Datasets
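For readers who would rather prepare their own data than use a sample dataset, the following is a minimal sketch of turning raw relationship data into GEXF text using only the Python standard library. The names (build_gexf, people, follows) and the sample data are hypothetical, and the sketch assumes the GEXF 1.2 draft namespace; files produced by other tools may use a different GEXF version.

```python
import xml.etree.ElementTree as ET

# Assumption: the GEXF 1.2 draft namespace.
GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(nodes, edges, directed=True):
    """Return GEXF text for the given nodes and edges.

    nodes: dict mapping node id -> label
    edges: list of (source id, target id) pairs
    """
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {
        "mode": "static",
        "defaultedgetype": "directed" if directed else "undirected",
    })
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes.items():
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": label})
    edges_el = ET.SubElement(graph, "edges")
    for edge_id, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {
            "id": str(edge_id),
            "source": str(source),
            "target": str(target),
        })
    return ET.tostring(root, encoding="unicode")


# Hypothetical raw data: who follows whom.
people = {"a": "Alice", "b": "Bob", "c": "Carol"}
follows = [("a", "b"), ("b", "c"), ("c", "a")]

gexf_text = build_gexf(people, follows)
print(gexf_text)
```

Saving the printed text to a file with a .gexf extension should give a file that can be opened through File>Open, although this sketch has only been checked for well-formed XML, not against every Gephi version.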
Open Graph File (File>Open…)
Import Report
When the file is opened, a report is created that
summarizes the data and lists any issues:
o Number of nodes
o Number of edges
o Type of graph
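The three figures in the import report can also be checked outside Gephi. The sketch below counts nodes and edges in a small, made-up GEXF document and reads the graph type. It assumes the GEXF 1.2 draft namespace and treats a missing defaultedgetype attribute as undirected; SAMPLE and import_report are illustrative names, not part of Gephi.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the GEXF 1.2 draft namespace.
NS = {"g": "http://www.gexf.net/1.2draft"}

# A small made-up graph, used only for illustration.
SAMPLE = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static" defaultedgetype="undirected">
    <nodes>
      <node id="0" label="A"/>
      <node id="1" label="B"/>
      <node id="2" label="C"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
      <edge id="1" source="1" target="2"/>
    </edges>
  </graph>
</gexf>"""


def import_report(gexf_text):
    """Summarize a GEXF document: node count, edge count and graph type."""
    root = ET.fromstring(gexf_text)
    graph = root.find("g:graph", NS)
    return {
        "nodes": len(graph.findall("g:nodes/g:node", NS)),
        "edges": len(graph.findall("g:edges/g:edge", NS)),
        # Assumption: a missing attribute is treated as undirected.
        "type": graph.get("defaultedgetype", "undirected"),
    }


report = import_report(SAMPLE)
print(report)
```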
Click on OK to validate and see the graph:
o Use the mouse to move and scale the visualization
Zoom: Mouse Wheel
Pan: Right Mouse Drag
Graph Visualization
o While the “Drag” mode is enabled, you can drag nodes by keeping the left mouse button
pressed and moving the mouse.
Click on the area where “Dragging” is written
Configure the “Diameter” with the slider
o You can change the edge thickness by locating the edge-weight slider:
o If you lose your graph, reset the position, using “Center On Graph” button
o Autoselect neighbors
An essential option to enhance readability of the network. The neighbors of a
selected node are automatically selected as well, making it easy to see who is
connected to whom.
Expand the visualization settings (right bottom corner of the graph)
Check the “Autoselect neighbors” option
o Edge color
By default edges have the same color as their source node. This can be
configured and a single color can be used instead.
Expand the visualization settings and go to the “Edges” tab
Uncheck the “Source node color” and configure “Edge default color”
o Node shape and 3-D
Although Gephi uses a 3-D rendering engine, networks are usually in 2-D and
this is the default mode.
Expand the visualization settings and go to the “Nodes” tab
Select “Sphere 3d” instead of “Disk 2d”
o Display attributes
Besides a label, nodes and edges have attributes, like gender, age or
relationship type in a social network. It is easy to display them instead of,
or along with, the label.
Click on the “Attributes” button in the visualization settings.
A dialog appears and lists all attributes, separated for nodes and edges.
Check all attributes you want to display, for instance “Code”.
Click on OK to confirm
o Transform text color and size
The Ranking module will be used to do that.
Find the label color transformer and select which
attribute to use for ranking. Here the “Degree” is
chosen.
Configure the ranking colors and click on “APPLY”
The text should be colored now. Try also to use
“Betweenness Centrality” instead of “Degree”.
Now select the label size transformer
Select sizes between 0 and 1, as this size value is
multiplied with the default element size
Click on “APPLY” to see how the text size changes
o Antialiasing option
Antialiasing is a visualization option which makes edges look smoother. It is
set at 4x by default and can be set up to 16x.
Go to Gephi options in the “Tools” menu
Select the “Visualization” tab and then the “OpenGL” tab.
Here you can change the antialiasing option. Restart Gephi to apply the
changes.
Lay out the graph
o Layout algorithms set the shape of the graph; running one is the most essential action.
o Locate the Layout module, on the left panel.
Choose “Force Atlas 2” (it handles large networks while keeping very good quality).
“RUN” the layout by applying the following settings step by step:
LinLog mode = checked (Linear attraction & logarithmic repulsion
(lin-lin by default), makes clusters tighter)
Scaling = 100 (Increase to make the graph sparser)
Edge weight influence = 0 (From 0 (no influence) to 1 (normal). Set 0
to calculate forces without edge weight)
Now “STOP” the algorithm.
Layout Algorithms
o The purpose of Layout Properties is to let you control the algorithm in order to
make an aesthetically pleasing representation.
There are several layout options available to the user, namely OpenOrd, ForceAtlas, Yifan Hu,
Fruchterman-Reingold, Circular, Radial Axis and GeoLayout, each one being used for a specific
purpose.
LAYOUT                                        EMPHASIS
OpenOrd                                       Divisions/Clustering
ForceAtlas, Yifan Hu, Fruchterman-Reingold    Complementarities
Circular, Radial Axis                         Ranking
GeoLayout                                     Geographic Repartition
Ranking (color)
o The Ranking module lets you configure the color and size of nodes.
o Locate the Ranking module, in the top left.
o Choose “Degree” as the rank parameter.
o You should obtain the configuration panel below
o Configure colors
Move your mouse over the gradient component
Double-click on triangles to configure the color
o Click on apply to see the result
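To make the idea of ranking by degree concrete, here is a minimal sketch of the underlying arithmetic: count the edges touching each node, then blend between two colors in proportion to that count. It assumes a plain linear gradient between two RGB colors; the function names, the sample edge list and the white-to-red gradient are made up for illustration and are not Gephi's own code.

```python
def degrees(edges):
    """Count how many edges touch each node (undirected degree)."""
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    return degree


def gradient_color(value, low, high, start_rgb, end_rgb):
    """Linearly interpolate between two RGB colors for a value in [low, high]."""
    t = 0.0 if high == low else (value - low) / (high - low)
    return tuple(round(s + t * (e - s)) for s, e in zip(start_rgb, end_rgb))


# Hypothetical edge list.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
degree = degrees(edges)
low, high = min(degree.values()), max(degree.values())

# White for the least connected node, red for the most connected one.
colors = {
    node: gradient_color(value, low, high, (255, 255, 255), (255, 0, 0))
    for node, value in degree.items()
}
print(degree)
print(colors)
```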
Ranking result table
o You can see rank values by enabling the result table. ACARVIN has 252 links and is
the most connected node in the network
o Enable table result view at the bottom toolbar
o Click again on apply
Metrics
o Calculate the average path length for the network. It computes the path length for
all possible pairs of nodes and gives information about how close nodes are to
each other
o Click on “RUN” near “Average Path Length”. The settings panel immediately
appears
o Select “Directed” and click on OK to compute the metric
o When finished, the metric displays its result in a report
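The computation behind this metric can be sketched in a few lines: run a breadth-first search from every node, then average the hop counts over all reachable pairs. This is a simplified illustration, not Gephi's implementation. It assumes an unweighted graph given as a hypothetical adjacency dictionary, follows edges in their stored direction (matching the “Directed” choice above), and ignores unreachable pairs.

```python
from collections import deque


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: hop counts from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def path_metrics(adjacency):
    """Return (average path length, diameter) over all reachable ordered pairs."""
    total, pairs, diameter = 0, 0, 0
    for start in adjacency:
        for node, hops in shortest_path_lengths(adjacency, start).items():
            if node != start:
                total += hops
                pairs += 1
                diameter = max(diameter, hops)
    average = total / pairs if pairs else 0.0
    return average, diameter


# Hypothetical directed graph: a -> b -> c -> a
adjacency = {"a": ["b"], "b": ["c"], "c": ["a"]}
average, diameter = path_metrics(adjacency)
print(average, diameter)
```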
Ranking (size)
o Metrics generate general reports but also results for each node. Thus three new
values have been created by the “Average Path Length” algorithm we ran.
Betweenness Centrality
Closeness Centrality
Eccentricity
o Go back to Ranking
o Select “Betweenness Centrality” in the list. This metric indicates influential nodes:
the higher the value, the more influential the node.
o The node’s size will be set now. Colors remain the “Degree” indicator.
o Select the diamond icon in the toolbar for size.
o Set a min size at 40 and a max size at 200
o And click on “APPLY” to see the result.
Color: Degree; Size: Betweenness Centrality metric
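As an illustration of what this ranking does, the sketch below computes raw betweenness scores with Brandes' algorithm for unweighted graphs and then maps them linearly onto the 40 to 200 size range used above. It is a sketch under stated assumptions rather than Gephi's code: the adjacency dictionary is hypothetical, every neighbor must also appear as a key, ordered pairs are counted (so an undirected graph stored symmetrically gets doubled scores), and the linear mapping is assumed.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm for unweighted graphs (raw, unnormalized scores)."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        order = []  # nodes in order of non-decreasing distance from source
        predecessors = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}  # shortest-path counts
        distance = {node: -1 for node in adjacency}
        paths[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if distance[neighbor] < 0:  # first visit
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    paths[neighbor] += paths[node]
                    predecessors[neighbor].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    return score


def scale_sizes(values, min_size=40.0, max_size=200.0):
    """Map each value linearly onto the range [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        return {node: min_size for node in values}
    factor = (max_size - min_size) / (high - low)
    return {node: min_size + (value - low) * factor for node, value in values.items()}


# Hypothetical chain a - b - c, stored symmetrically.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = betweenness(adjacency)
sizes = scale_sizes(scores)
print(scores)
print(sizes)
```

In this toy chain only the middle node lies on a shortest path between two others, so it alone receives the maximum size.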
Show labels
o Display node labels
o Set label size proportional to node size
o Set label size with the scale slider
o Set label color
Locate the color chooser in the visualization settings
Press the left mouse to display the palette and pick a color. This sets node
label color.
To configure edge label color, expand the settings bar
o Label Adjust
Go to the Layout panel
Choose the “Label Adjust” layout in the list
Click on “RUN” to proceed
Community detection
o The ability to detect and study communities is central in network analysis. We
would like to colorize clusters in our example
o Gephi implements the Louvain method, available from the Statistics panel
o Click on “RUN” near the “Modularity” line
Partition
o The community detection algorithm created a “Modularity Class” value for each
node. The partition module can use this new data to colorize communities.
o Locate the Partition module on the left panel.
o Immediately click on the “Refresh” button to populate the partition list.
o Select “Modularity Class” in the partition list.
o You can see that many communities were found, sorted in decreasing order by
percentage. Your results could be different. A random color has been set for each
community identifier.
o Click on “APPLY” to colorize nodes
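The partition list is essentially a frequency table of the “Modularity Class” values. A minimal sketch of that tabulation follows, assuming the class of each node is already known. The node names and class numbers are invented for illustration, and the community detection itself is not reproduced here.

```python
from collections import Counter


def partition_summary(modularity_class):
    """Return (class, percentage of nodes) pairs, largest community first."""
    counts = Counter(modularity_class.values())
    total = len(modularity_class)
    return [(cls, 100.0 * count / total) for cls, count in counts.most_common()]


# Hypothetical result of a community detection run: node -> class id.
modularity_class = {"a": 0, "b": 0, "c": 0, "d": 1}
summary = partition_summary(modularity_class)
print(summary)
```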
Filter
o The last manipulation step is filtering. You create filters that can hide nodes and
edges on the network. We will create a filter to remove sparsely connected nodes, i.e.
nodes with fewer than nine edges.
o Locate the Filters module on the right panel.
o Select “Degree Range” in the “Topology” category.
o Drag it to the Queries, drop it to “Drag filter here”.
o Click on “Degree Range” to activate the filter. The parameters panel appears.
o It shows a range slider and the chart that represents the data, the degree
distribution.
o Move the slider to set its lower bound to 9. Enable filtering by pushing the button.
o Nodes with a degree lower than 9 are now hidden.
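The effect of a degree range filter can be sketched as follows: compute each node's degree on the full graph, keep the nodes whose degree falls within the range, and keep only the edges between kept nodes. This is an illustrative sketch with a hypothetical edge list and a lower bound of 2 to suit the tiny example; the walkthrough above uses 9.

```python
def degree_range_filter(edges, lower, upper=None):
    """Keep nodes whose degree lies in [lower, upper] and the edges among them."""
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    kept_nodes = {
        node for node, value in degree.items()
        if value >= lower and (upper is None or value <= upper)
    }
    kept_edges = [
        (source, target) for source, target in edges
        if source in kept_nodes and target in kept_nodes
    ]
    return kept_nodes, kept_edges


# Hypothetical undirected edge list; node "d" has a single edge.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
kept_nodes, kept_edges = degree_range_filter(edges, lower=2)
print(sorted(kept_nodes), kept_edges)
```

Note that degrees are computed once, on the unfiltered graph, so a kept node may end up with fewer visible edges than the lower bound after its neighbors are hidden.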
Preview
o Before exporting your graph as an SVG or PDF file, go to the Preview to see exactly
how the graph will look and to add the finishing touches.
o Select the “Preview” tab in the banner. Click on Refresh to see the preview.
o In the Node properties, find “Show Labels” and enable the option. Click on
“REFRESH”.
Export as SVG
o From Preview, click on SVG near Export. SVG files are vector graphics, like PDF:
images scale smoothly to different sizes and can therefore be printed or integrated
in high-resolution presentations. You can transform and manipulate SVG files in
Inkscape or Adobe Illustrator.
Save your project.
Installing plugins
As a feature-rich open source tool, Gephi has attracted a lot of attention from developers and
researchers all around the world. As a result, a plethora of plugins is available to extend its
functionality. These plugins can be found at https://gephi.org/plugins/ . A majority of these
plugins are developed by the community and quite a few are under active development.
A few prominent ones are:
o Retweet Monitor: Used for monitoring live retweets. More details at
https://gephi.org/plugins/retweet-monitor/
o Graphviz Layout: Used to make layouts suitable for the specialized graphviz
software. More details at https://gephi.org/plugins/graphviz-layout/
o Parallel Force Atlas: Used to speed up ForceAtlas, using multiple threads. More
details at https://gephi.org/plugins/parallel-force-atlas/
o Social Network Analysis: This plugin allows computation of various metrics used in
social network analysis and influencer analysis. More details at
https://gephi.org/plugins/social-network-analysis/
o Layered Layout: This is a specialized layout with nodes in different orbits, specially
used in Social Network Analysis. More details at
https://gephi.org/plugins/layered-layout/
o HTTP Graph: Generates data based on the web browsing activity on the machine.
Details at: https://gephi.org/plugins/http-graph/
o Circular Layout, OpenOrd Layout, GeoLayout : These are layout algorithms as
described previously in layouts
To install a plugin,
1. Download the .zip file from the respective webpage for the plugin.
2. Extract the file to a folder of your choice, to get a “.nbm” file.
3. Open Gephi.
4. Go to Tools>Plugins.
5. Click on “Downloaded” tab.
6. Click “Add Plugins”
7. Browse to the path where the file was extracted and select the “.nbm” file.
8. Click OK and then Install.
9. Follow the onscreen instructions.