Tutorial - Gephi


Gephi’s project aims to bring the perfect tool for visualizing and manipulating networks.


Data Visualization with GEPHI

Luv Walia 10BM60043
Prabhjot Singh Bhatia 10BM60060

Gephi has been dubbed the Photoshop of data analytics. It is open source software for visualizing and manipulating complex data networks in an intuitive manner. This user guide is a walkthrough for the new user.

Class of 2012 Vinod Gupta School of Management IIT Kharagpur

Contents

Introduction
    What this tutorial is about and what it is not about
    Who This Tutorial Is For
    Prerequisites
    About Gephi
    Features
    Uses for Gephi in Business
Fundamentals
    Installing
    Opening a file
Graph Visualization
Layout Algorithms
Installing plugins

The cover page image was created with Gephi Version 0.8.1 Beta using Force Atlas 2 layout algorithm


Introduction

Data visualization is the graphical representation of processed data, intended to communicate information clearly and effectively. There is a trade-off to be made between aesthetics and functionality, and Gephi helps strike that balance.

What this tutorial is about and what it is not about

This tutorial is highly practical. It shows how to go about data visualization, but limits itself to Gephi, and within Gephi to the basic tools and techniques; it does not attempt to discuss every available technique. An example dataset is used to demonstrate the techniques, with screenshots. The tutorial is based on the latest version available at the time of writing, 0.8.1 beta. Future versions may not contain the features listed here, or may implement them differently.

In addition, this tutorial does not specifically discuss the following topics:
• The concepts of data visualization
• The algorithms followed by the various plugins
• The internal workings of the software

Who This Tutorial Is For

This tutorial is aimed at the budding business professional who is new to the software and wishes to get started with data visualization.

Prerequisites

A basic understanding of data analysis techniques is necessary, along with an understanding of how the results of such analyses are interpreted to solve a real-life problem. No prior data visualization experience is required. To try out some of the advanced techniques for live data capture and visualization, one must be comfortable with programming and with setting up a server connected to the internet.

About Gephi

[Pronounced: G-fai] Gephi, an open source network visualization platform, has a rich set of built-in functionality and an intuitive user interface. The software provides a powerful, interactive visualization and exploration tool for all kinds of networks and complex systems, with a gentle learning curve. As software for Exploratory Data Analysis, Gephi provides a robust toolkit to explore, understand and manipulate graph structures and to reveal hidden insights. An analyst can form hypotheses, discover patterns and identify faults in data collection, with a visual interface that gives an overall perspective. Gephi is a complementary tool to statistics, since visual thinking is now recognized as important to analysis. Additionally, Gephi has built-in tools for Social Network Analysis.


Features

• Real-time visualization: Gephi has a fast graph visualization engine, which helps an analyst create and analyze a variety of scenarios and reach accurate decisions faster.
• SNA metrics: Gephi incorporates the major metrics currently used in social network analysis (SNA), such as:
    o Betweenness: an indicator of influence
    o Diameter: an indicator of reach, measured as the longest of the shortest paths between any two nodes
    o Closeness: an indicator of how fast an individual can reach its entire network
    o Clustering coefficient: an indicator of how closely knit a particular group of nodes is
    o Average shortest path: an indicator of how many nodes must be crossed to reach a particular node
    o PageRank: the importance of a page
    o HITS: the social value of links and content on a page
• Clustering and hierarchical graphs: Gephi helps create clusters and sub-clusters out of a given network graph.
• Support for large datasets: what differentiates Gephi from similar software is its ability to work with very large datasets, up to 50,000 nodes.
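To make a few of the metrics listed above concrete, the following minimal sketch computes diameter, average shortest path, closeness and clustering coefficient in plain Python on a small made-up network. The node names and the edge list are hypothetical and are not taken from the tutorial's datasets. The formulas are common textbook definitions; Gephi's own statistics may normalize the values differently.

```python
from collections import deque

# Hypothetical toy friendship network (undirected), for illustration only.
EDGES = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"),
         ("Cat", "Dan"), ("Dan", "Eve")]


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def distances_from(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(adj):
    """Longest shortest path between any two nodes (connected graph assumed)."""
    return max(max(distances_from(adj, n).values()) for n in adj)


def average_shortest_path(adj):
    """Mean hop count over all ordered pairs of distinct, reachable nodes."""
    total = pairs = 0
    for n in adj:
        for other, d in distances_from(adj, n).items():
            if other != n:
                total += d
                pairs += 1
    return total / pairs


def closeness(adj, node):
    """(n - 1) divided by the sum of distances to all other nodes."""
    dist = distances_from(adj, node)
    return (len(adj) - 1) / sum(dist.values())


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print("diameter:", diameter(adj))
    print("average shortest path:", average_shortest_path(adj))
    for name in sorted(adj):
        print(name, closeness(adj, name), clustering_coefficient(adj, name))
```

On this toy network the diameter is 3 (Ann to Eve) and the average shortest path is 1.7. Cat, who bridges the two halves of the network, has the highest closeness, while Ann's two neighbours know each other, giving her a clustering coefficient of 1.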

Uses for Gephi in Business

Gephi can help visualize any kind of network data. From a business viewpoint specifically, it can help in a number of ways:

• Marketing
    o Segmentation: Gephi provides a built-in clustering tool to segment customers from a product/service targeting perspective
    o Targeting: whom to target and, more importantly, whom NOT to target. Gephi helps find the users with the most influence, and hence identify them as potential targets for marketing communication
• Customer Relationship Management
    o Identify the worth of a customer, based on his or her network
    o Decide whether or not to go the extra mile to retain that customer
• Organizational Development
    o Just as social network analysis is applied to customers, a large organization could apply the same concepts to its own employees and generate insights that help run the organization more effectively
• Mergers & Acquisitions
    o How successful is the merger? Gephi can help answer this question by analyzing the past and present scenarios
• Team Building
    o What set of employees could bond well?
    o Where can conflicts arise?
    o Who are the unsung heroes/leaders?
    o Where do the barriers to internal communication lie?
• Human Resources
    o Gephi can help identify the candidates best suited for a particular position. It could also help target a particular geography in which to hunt for potential candidates

Gephi can help us answer all the above questions, given the right set of data.

Fundamentals

Installing

Get Gephi from this link: https://gephi.org/users/download/. Being Java-based, Gephi is available for Windows, Linux and Macintosh. Installation is a simple process.

NOTE: Java must be installed and configured on the system before attempting to install Gephi. To get Java, visit this link: http://www.oracle.com/technetwork/java/javase/downloads/index.html. The Java Runtime Environment is enough to just run Gephi; to build plugins for Gephi, however, the Java Development Kit must be installed.
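Before running the installer, it can be useful to confirm that Java is actually reachable from the command line. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and it assumes that `java` (from the JRE) and `javac` (from the JDK) are expected on the system PATH.

```python
import shutil
import subprocess


def tool_on_path(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def java_version_text(java_path):
    """Run `java -version` and return what it prints (Java writes it to stderr)."""
    result = subprocess.run([java_path, "-version"],
                            capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    java = tool_on_path("java")
    if java is None:
        print("Java not found on PATH; install a JRE before installing Gephi.")
    else:
        print(java_version_text(java))

    # javac ships with the JDK and is only needed for building plugins.
    if tool_on_path("javac") is None:
        print("javac not found; a JDK is required only for plugin development.")
```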

Opening a file

Gephi cannot work on raw data. The data must first be processed into a graph format (for example, .gexf). Other enterprise-grade FOSS software such as R can help with this. For the purpose of demonstration, however, we shall work with the sample datasets included in the Gephi toolkit; more specifically, the social network datasets available here: http://wiki.gephi.org/index.php/Datasets
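As an illustration of what processing raw data into a graph format can look like, the sketch below turns a small list of (source, target, weight) rows into a GEXF 1.2 document using only the Python standard library. The rows and the function name are hypothetical, and this is just one possible route; the tutorial itself does not prescribe a conversion tool beyond mentioning R.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

# Hypothetical raw data: who talks to whom, and how often.
RAW_ROWS = [("Ann", "Bob", 3), ("Ann", "Cat", 1), ("Cat", "Dan", 2)]


def edges_to_gexf(rows, directed=False):
    """Build a GEXF 1.2 ElementTree from (source, target, weight) rows."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {
        "mode": "static",
        "defaultedgetype": "directed" if directed else "undirected",
    })
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")

    # Assign each distinct name a numeric id in order of first appearance.
    ids = {}
    for source, target, _weight in rows:
        for name in (source, target):
            if name not in ids:
                ids[name] = str(len(ids))
                ET.SubElement(nodes_el, "node",
                              {"id": ids[name], "label": name})

    # One <edge> element per input row.
    for i, (source, target, weight) in enumerate(rows):
        ET.SubElement(edges_el, "edge", {
            "id": str(i),
            "source": ids[source],
            "target": ids[target],
            "weight": str(weight),
        })
    return ET.ElementTree(root)


if __name__ == "__main__":
    tree = edges_to_gexf(RAW_ROWS)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To save the result for opening in Gephi, the tree can be written to disk with `tree.write("sample.gexf", encoding="utf-8", xml_declaration=True)`, where the file name is an arbitrary choice.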

• Open the graph file (File > Open)
• Import Report: when the file is opened, a report is created that sums up the data and lists any issues:
    o Number of nodes
    o Number of edges
    o Type of graph
• Click OK to validate and see the graph
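The figures shown in the import report can be cross-checked outside Gephi. The sketch below reads a GEXF 1.2 document with the Python standard library and returns the same three items: node count, edge count and graph type. The embedded sample document and the function name are hypothetical, and the fallback to "undirected" when `defaultedgetype` is absent is an assumption based on the GEXF format's default.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://www.gexf.net/1.2draft"}

# Hypothetical GEXF content standing in for a real dataset file.
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static" defaultedgetype="directed">
    <nodes>
      <node id="0" label="Ann"/>
      <node id="1" label="Bob"/>
      <node id="2" label="Cat"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
      <edge id="1" source="1" target="2"/>
    </edges>
  </graph>
</gexf>"""


def import_report(gexf_text):
    """Summarize a GEXF document: node count, edge count, graph type."""
    root = ET.fromstring(gexf_text)
    graph = root.find("g:graph", NS)
    return {
        "nodes": len(graph.findall("g:nodes/g:node", NS)),
        "edges": len(graph.findall("g:edges/g:edge", NS)),
        # Assumed default when the attribute is missing.
        "type": graph.get("defaultedgetype", "undirected"),
    }


if __name__ == "__main__":
    for key, value in import_report(SAMPLE_GEXF).items():
        print(key, value)
```

If the numbers differ from what Gephi reports for the same file, the report's issue list is the first place to look, since it records any problems found during import.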

o Use the mouse to move and scale the visualization
    Zoom: mouse wheel
    Pan: right mouse drag

Graph Visualization

o While Drag mode is enabled, you can drag nodes by keeping the left mouse button pressed and moving the pointer. Click on the area labelled Dragging, then configure the diameter with the slider.

o You can change the edge thickness with the edge-weight slider.

o If you lose your graph, reset the position using the Center On Graph button.

o Autoselect neighbors: an essential option for enhancing the readability of the network. The neighbors of a selected node are automatically selected as well, making it easy to see who is connected to whom. Expand the visualization settings (bottom right corner of the graph) and check the Autoselect neighbors option.

o Edge color: by default, edges have the same color as their source node. This can be configured so that a single color is used instead.
