ICWSM’11 Tutorial: Exploratory Network Analysis with Gephi
Instructors: Sébastien Heymann, Julian Bilcke
July 17, 2011 | 1 PM - 4 PM


Page 2: Gephi icwsm-tutorial

Exploratory Network Analysis with Gephi

This tutorial is an introduction to Gephi, the open-source software for network visualization and manipulation.

Gephi aims to cover the complete chain, from data import to aesthetic refinement and interaction.

Users interact with the visualization and manipulate structures, shapes and colors to reveal hidden properties.

The goal is to help data analysts form hypotheses and intuitively discover patterns or errors in large data collections.

By the end, participants will have the practical knowledge they need to use Gephi for their own projects.


Page 3: Gephi icwsm-tutorial

Exploratory Network Analysis with Gephi

The tutorial starts with a brief introduction to the network exploration process and a hands-on demonstration of the essential functionalities of Gephi.

Participants are guided step by step through the complete chain of representation, manipulation, layout, analysis and aesthetic refinement. Next, teams work on real datasets.

Finally, they present their preliminary results. The tutorial concludes with a general question and answer session.


Page 4: Gephi icwsm-tutorial

Requirements

Bring your own laptop with Java and Gephi installed. Gephi should be up to date (menu Help > Check for Updates).

Bring a mouse with a wheel.

If you want, bring a dataset of your own and verify that it loads well in Gephi.[1]

[1] http://gephi.org/users/supported-graph-formats/

Page 5: Gephi icwsm-tutorial

Workshop Schedule - Part I

Exploratory Network Analysis

• Exploratory Data Analysis
• Exploratory Network Analysis
• Looking for Orderness in Data
• Examples
• Guideline

Introduction to Gephi

• Approach and Community
• Networked Data
• Quick Start Demo

* 30 min break *

Page 6: Gephi icwsm-tutorial

Workshop Schedule - Part II

Hands-On!

• Team Work on a Dataset
• Presentation of Preliminary Results

Q&A

Page 7: Gephi icwsm-tutorial

Exploratory Data Analysis

“The greatest value of a picture is when it forces us to notice what we never expected to see”

Exploratory Data Analysis started with John Tukey (1962).

Confirmatory → results
Exploratory → intuition
Serendipity → surprise

Page 8: Gephi icwsm-tutorial

Exploratory Data Analysis

Non-linear processing chain of Ben Fry in Computational Information Design (2004)

Page 9: Gephi icwsm-tutorial

Dummy Example

P2P file size distribution (Latapy et al., 2008)

Observation: visual saliences on specific file sizes

External knowledge: these sizes correspond to films

New hypothesis on data: films are heavily exchanged, so the study could dig in this direction
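The observation step above (spotting salient file sizes) can also be approximated numerically. The sketch below is only an illustration: the sizes are synthetic, and the function name `salient_sizes`, the 50 MB bucket width and the factor of 3 are assumptions, not values from the Latapy et al. study.

```python
from collections import Counter
from statistics import median


def salient_sizes(sizes_mb, bucket_mb=50, factor=3.0):
    """Return the bucket start values whose count exceeds
    `factor` times the median bucket count."""
    # Group file sizes into fixed-width buckets (hypothetical width).
    buckets = Counter((size // bucket_mb) * bucket_mb for size in sizes_mb)
    typical = median(buckets.values())
    return sorted(b for b, count in buckets.items() if count > factor * typical)


# Synthetic, made-up sizes in MB: a flat background plus two peaks.
sizes = [5, 60, 120, 180, 230, 310, 420, 480, 560, 610] + [700] * 8 + [1400] * 6
print(salient_sizes(sizes))  # buckets that stand out from the background
```

A picture of the distribution remains the primary tool here; a check like this only helps confirm which peaks are worth explaining with external knowledge.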

Page 10: Gephi icwsm-tutorial

Exploratory Network Analysis

1. See the network
   1st graph viz tool: Pajek (1996), Vladimir Batagelj, Andrej Mrvar

2. Interact in real time
   Gephi prototype (2008): group, filter, compute metrics...

3. Build a visual language
   size by rank, color by partition, label, curved edges, thickness...

Page 11: Gephi icwsm-tutorial

Looking for a “Simple Small Truth”?

Drew Conway, What Data Visualization Should Do:
1. Make complex things simple
2. Extract small information from large data
3. Present truth, do not deceive

http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/

Page 12: Gephi icwsm-tutorial

Looking for Orderness in Data

Vary three cursors simultaneously to extract meaningful patterns:

• at different levels: MICRO level to MACRO level

• on multiple dimensions: 1 dimension to N dimensions

• at different time scales: T+0 to T+N

Page 13: Gephi icwsm-tutorial

“Zoom” cursor on Quantitative Data

Global
• connectivity
• density
• centralization

Local
• communities
• bridges between communities
• local centers vs periphery

Individual
• centrality
• distances
• neighborhood
• location
• local authority vs hub

Cursor range: MICRO level to MACRO level
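To make a few of these quantities concrete, here is a minimal pure-Python sketch that computes one global metric (density) and two individual ones (degree centrality and distances) on the five-node example graph used later in the "Diversity of Network Encoding" slide. The function names are hypothetical, and treating edges as undirected for degree and distance is an assumption made for simplicity. Gephi computes such metrics itself in its Statistics panel; this is only meant to show what the numbers mean.

```python
from collections import deque

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "a"), ("c", "e")]


def density(nodes, edges):
    """Global level: share of possible directed edges that exist."""
    n = len(nodes)
    return len(edges) / (n * (n - 1))


def degree_centrality(nodes, edges):
    """Individual level: degree (ignoring direction) divided by n - 1."""
    degree = {v: 0 for v in nodes}
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return {v: d / (len(nodes) - 1) for v, d in degree.items()}


def distances_from(start, nodes, edges):
    """Individual level: hop distances from `start`, ignoring direction."""
    neighbors = {v: set() for v in nodes}
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


print(density(nodes, edges))
print(degree_centrality(nodes, edges))
print(distances_from("d", nodes, edges))
```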

Page 14: Gephi icwsm-tutorial

“Crossing” cursor on Qualitative Data

Social
• who with whom
• communities
• brokerage
• influence and power
• homophily

Semantic
• topics
• thematic clusters

Geographic
• spatial phenomena

Cursor range: 1 dimension to N dimensions

Page 15: Gephi icwsm-tutorial

“Timeline” cursor on Temporal Data

Evolution of social ties

Evolution of communities

Evolution of topics

Cursor range: T+0 to T+N
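One simple way to look at the evolution of ties is to compare two snapshots of the same network. The sketch below assumes each snapshot is a plain list of directed (source, target) pairs; the function name `tie_changes` and the sample data are hypothetical.

```python
def tie_changes(before, after):
    """Compare two edge snapshots and report added, removed and kept ties."""
    before, after = set(before), set(after)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
        "kept": sorted(before & after),
    }


# Made-up snapshots at T+0 and T+N.
snapshot_t0 = [("ann", "bob"), ("bob", "cat")]
snapshot_tn = [("ann", "bob"), ("cat", "dan")]
print(tie_changes(snapshot_t0, snapshot_tn))
```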

Page 16: Gephi icwsm-tutorial

Mapping an Innovation Center
Collaborations on projects at Images et Réseaux

Themes and content

Actors

Territory

Franck Ghitalla & Ecole de Design de Nantes

Page 17: Gephi icwsm-tutorial

Mapping Scientific Cooperations

Page 18: Gephi icwsm-tutorial

Network Map: a Series of Choices

corpus

data

algorithms

thresholds

graphical operations

communication goals

Page 19: Gephi icwsm-tutorial

Guideline

Guideline by number of nodes (# nodes):

1 - 100: lists, with edges as a bonus; focus on qualitative data

100 - 1,000: How do attributes explain the structure?
• easy to read, “obvious” patterns
• focus on entities (in context)
• metrics are tools to describe the graph (centrality, bridging...)
• links help to build and interpret categories of entities
Challenge: mix attribute crossing and connectivity

1,000 - 50,000: How does the structure explain attributes?
• hard to read, problem of “hidden signals”: track patterns with various layouts and filtering
• focus on structures
• metrics are tools to build the graph (cosine similarity...)
• categories help to understand the structure
Challenge: pattern recognition

> 50,000: requires high computational power
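The guideline mentions metrics used to build the graph, with cosine similarity as an example. As a rough sketch of that idea, the code below links two entities whenever the cosine similarity of their attribute vectors reaches a threshold. The profiles, the threshold of 0.5 and the function names are all illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0) for key, value in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_edges(profiles, threshold):
    """Create an undirected weighted edge for each pair above the threshold."""
    result = []
    for a, b in combinations(sorted(profiles), 2):
        weight = cosine(profiles[a], profiles[b])
        if weight >= threshold:
            result.append((a, b, round(weight, 3)))
    return result


# Made-up term counts per blog.
profiles = {
    "blog1": {"graph": 2, "layout": 1},
    "blog2": {"graph": 1, "layout": 1},
    "blog3": {"recipe": 3},
}
print(similarity_edges(profiles, threshold=0.5))
```

The threshold is one of the choices that shape the resulting map, so it is worth trying several values and watching how the structure changes.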

Page 20: Gephi icwsm-tutorial

Gephi now!

Page 21: Gephi icwsm-tutorial

Gephi in a Nutshell

« Like Photoshop™ for graphs. »

Helps data analysts reveal patterns and trends, highlight outliers and tell stories with their data.

• Network visualization platform

• Open source, supported by a community

• Built for performance and usability

• Extensible by plug-ins

• Windows, MacOS X, Linux

Page 22: Gephi icwsm-tutorial

Gephi Community

Communities

Contributors: Mathieu Bastian, Mathieu Jacomy, Eduardo Ramos Ibañez, Sébastien Heymann, Guillaume Ceccarelli, André Panisson, Antonio Patriarca, Cezary Bartosiak, Martin Škurla, Patrick McSweeney, Yi Du, Hélder Suzuki, Daniel Bernardes, Ernesto Aneiro, Keheliya Gallaba, Luiz Ribeiro, Urban Škudnik, Vojtech Bardiovsky, Yudi Xue

Nonprofit organization

Page 23: Gephi icwsm-tutorial

Community Mission

Provide a “sustainable” software

Maintain the technical ecosystem

Build a business ecosystem

Face cutting-edge technological challenges with a long-term vision

Distribute the software in Open Source

Page 24: Gephi icwsm-tutorial

Community Values

Open innovation: ideas and features come from the entire community.

Decisions are made transparently.

We consider this technology a public good, and will keep it open source.

Page 25: Gephi icwsm-tutorial

Diversity of Usages

business | leisure :-) | communication | academic | art

Page 26: Gephi icwsm-tutorial

Diversity of Network Encoding

Textual

V = { a, b, c, d, e }
E = { (a,b), (a,d), (b,c), (e,a), (c,e) }

Tabular

    a b c d e
  a - 1 - 1 -
  b - - 1 - -
  c - - - - 1
  d - - - - -
  e 1 - - - -

XML

  <graph>
    <nodes>
      <node id="a" />
      <node id="b" />
      <node id="c" />
      <node id="d" />
      <node id="e" />
    </nodes>
    <edges>
      <edge source="a" target="b" />
      <edge source="a" target="d" />
      <edge source="b" target="c" />
      <edge source="e" target="a" />
      <edge source="c" target="e" />
    </edges>
  </graph>

Graphical

and many others...
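The encodings above all describe the same graph, so one can be derived from another. The sketch below starts from the textual form (V and E) and produces the tabular and XML forms using only the Python standard library. The function names are hypothetical, and the XML mirrors the simplified structure shown on the slide; it is not claimed to be a complete, valid GEXF or GraphML file.

```python
import xml.etree.ElementTree as ET

V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "a"), ("c", "e")]


def to_matrix(nodes, edges):
    """Tabular encoding: rows are sources, columns are targets."""
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def to_xml(nodes, edges):
    """XML encoding following the simplified structure of the slide."""
    graph = ET.Element("graph")
    nodes_el = ET.SubElement(graph, "nodes")
    for v in nodes:
        ET.SubElement(nodes_el, "node", id=v)
    edges_el = ET.SubElement(graph, "edges")
    for source, target in edges:
        ET.SubElement(edges_el, "edge", source=source, target=target)
    return ET.tostring(graph, encoding="unicode")


for label, row in zip(V, to_matrix(V, E)):
    print(label, " ".join("1" if cell else "-" for cell in row))
print(to_xml(V, E))
```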

Page 27: Gephi icwsm-tutorial

Software I/O

Input
• files: CSV, Pajek NET, Guess GDF, GEXF, GraphML, Graphviz DOT, UCInet DL, Netdraw VNA, Tulip TLP, Excel Spreadsheet
• databases: MySQL, PostgreSQL, SQL Server, Neo4j
• graph streaming
• user input

Output
• files: CSV, Pajek NET, Guess GDF, GEXF, GraphML, Excel Spreadsheet, SVG, PDF, PNG

Page 28: Gephi icwsm-tutorial

Choosing a File Format

Table of features supported by Gephi (per-format support marks not recoverable)

Features (columns): Edge List/Matrix Structure, XML Structure, Edge Weight, Attributes, Visualization Attributes, Attribute Default Value, Hierarchical Graphs, Dynamics

Formats (rows): CSV, DL Ucinet, DOT Graphviz, GDF, GEXF, GML, GraphML, NET Pajek, TLP Tulip, VNA Netdraw, Spreadsheet*

* spreadsheets can be loaded in the Data Laboratory

Page 29: Gephi icwsm-tutorial

Do you need...

Formats shown: GEXF, Spreadsheet, GraphML, Guess GDF, GML, UCINet DL, Netdraw VNA, Graphviz DOT, Pajek NET, CSV, Tulip TLP

Scale: many features to few features

File type: XML, Tabular, Text

Page 30: Gephi icwsm-tutorial

Using Gephi

DEMO

Page 31: Gephi icwsm-tutorial

Team work

1. Create a team of 2 to 3 people.

2. Choose a dataset.

3. Explore it for one hour.

4. Two teams present their preliminary findings.

Page 32: Gephi icwsm-tutorial

Dataset #1: GitHub Software Repository

“GitHub is an application used by nearly a million people to store over two million code repositories, making GitHub the largest code host in the world.”

Started in 2008, it provides the features of an online social network and a software repository to lower the barriers to collaboration and make it easier to contribute code.

https://github.com

Page 33: Gephi icwsm-tutorial

Dataset #1: GitHub Software Repository

Data extracted by Franck Cuny at Linkfluence SAS

1st release in March 2010 -> this poster
2nd release in June 2011 -> your data

_____________Network of user profiles__________

Nodes: people with at least one repository who are followed by at least two other people
Edges: A follows B

_____________Network of repositories__________

Nodes: repositories
Edges: A shares a developer with B

Very few research publications on this online social network (OSN)!
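The repository network defined above links two repositories when they share a developer. A minimal sketch of that construction follows; the input mapping from developers to repositories, the function name `repository_network` and the use of the shared-developer count as a weight are assumptions for illustration, not a description of how the actual dataset was built.

```python
from itertools import combinations


def repository_network(contributions):
    """Link repositories A and B when at least one developer works on both.

    `contributions` maps a developer name to the set of repositories
    they contribute to. Returns {(repo_a, repo_b): shared developer count}.
    """
    shared = {}
    for repos in contributions.values():
        for a, b in combinations(sorted(repos), 2):
            shared[(a, b)] = shared.get((a, b), 0) + 1
    return shared


# Made-up contribution data.
contributions = {
    "alice": {"repo1", "repo2"},
    "bob": {"repo2", "repo3"},
    "carol": {"repo1", "repo2", "repo3"},
}
print(repository_network(contributions))
```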

Page 34: Gephi icwsm-tutorial

Dataset #1: GitHub Software Repository

Data extracted by a crawl using the GitHub API
Seed: 10 well-known contributors in the Perl community

Networks by country: Japan, France, United States
Networks by language: Perl, PHP, Python, Ruby

Node attributes:
• user country
• number of followers
• main programming language

Edges:
• directed
• weight = number of projects A has forked from B

Page 35: Gephi icwsm-tutorial

Dataset #1: GitHub Software Repository

Your mission (should you decide to accept it): find research hypotheses based on your exploration

Example question: are the Perl communities based on geography?
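A first, rough way to probe the example question is to measure how many edges connect two users from the same country. The sketch below assumes a dictionary of user countries and a list of directed edges; all names and data are made up. A high share is not a proof on its own, since it should be compared with what a random assignment of countries would give.

```python
def same_attribute_share(edges, attribute):
    """Fraction of edges whose two endpoints share the same attribute value.

    Edges with an endpoint missing from `attribute` are ignored.
    """
    known = [(s, t) for s, t in edges if s in attribute and t in attribute]
    if not known:
        return 0.0
    same = sum(1 for s, t in known if attribute[s] == attribute[t])
    return same / len(known)


# Made-up users and edges.
country = {
    "u1": "Japan",
    "u2": "Japan",
    "u3": "France",
    "u4": "France",
    "u5": "United States",
}
follows = [("u1", "u2"), ("u2", "u1"), ("u3", "u4"), ("u4", "u1"), ("u5", "u3")]
print(same_attribute_share(follows, country))
```

In Gephi itself, the equivalent exploratory step is to color nodes by country in the Partition panel and see whether the layout groups the colors together.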

Page 36: Gephi icwsm-tutorial

Dataset #2: The Irish Blogosphere

_______________Blogroll Network______________

Nodes: blogs with more than two blogroll links
Edges: blogroll link (in-link)

_______________Post-link Network_____________

Nodes: blogs with more than two blogroll links
Edges: hyperlink inside a post from one blog to another (post-link)

“Identifying Representative Textual Sources in Blog Networks”. K. Wade, D. Greene, C. Lee, D. Archambault, P. Cunningham (2011) http://mlg.ucd.ie/blogs

Page 37: Gephi icwsm-tutorial

Dataset #2: The Irish Blogosphere

Data extracted by a crawl at distance 2 from the seed for the in-links and Google Blog Search for the post-links.
Seed: 21 popular blogs, winners of the “2010 Irish Blog Awards”

Node attributes:
• post count = total number of posts by blog
• category = from the Irish blog index at www.irishblogdirectory.com, where available
• infomap_comm = community to which a node belongs (infomap algo)
• gce_comms = overlapping communities (GCE algo)
• moses_comms = overlapping communities (MOSES algo)

Edges:
• directed
• weight = number of hyperlinks in the Post-link network

crawl at distance 2 from the seed
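A crawl at distance 2 from the seed is essentially a breadth-first search that stops expanding after two hops. The sketch below works on an in-memory dictionary of links rather than on the live web; the link data, the function name `crawl` and the direction in which links are followed are assumptions, not a description of the authors' crawler.

```python
from collections import deque


def crawl(seeds, links, max_distance=2):
    """Breadth-first crawl that keeps pages within `max_distance` hops.

    `links` maps a page to the list of pages it links to.
    Returns {page: distance from the nearest seed}.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if distance[page] == max_distance:
            continue  # do not expand pages at the boundary
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Made-up link structure.
links = {
    "seed": ["blog1", "blog2"],
    "blog1": ["blog3"],
    "blog2": ["seed"],
    "blog3": ["blog4"],
}
print(crawl(["seed"], links))
```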

Page 38: Gephi icwsm-tutorial

Dataset #2: The Irish Blogosphere

Your mission: explore and try to confirm the official results

Page 39: Gephi icwsm-tutorial

Hands-On!

Start:

• Load a graph
• Apply a layout
• Color the nodes by a qualitative variable in Partition Panel
• Size the nodes by a quantitative variable in Ranking Panel
• Start to explore... compute metrics, filter the network

End:

• Export maps to PDF in Preview Tab
• Save

Page 40: Gephi icwsm-tutorial

Presentations

GitHub Repository | Irish Blogosphere

Page 41: Gephi icwsm-tutorial

Gephi Documentation

Web Site: http://gephi.org
Support: http://forum.gephi.org
Wiki: http://wiki.gephi.org
Source code: https://launchpad.net/gephi

Online Tutorials
http://gephi.org/users/quick-start/
http://gephi.org/users/tutorial-visualization/
http://gephi.org/users/tutorial-layouts/
http://wiki.gephi.org/index.php/Import_CSV_Data
http://wiki.gephi.org/index.php/Import_Dynamic_Data

Tutorial in Spanish
https://code.google.com/p/camon/wiki/Taller_Gephi

Supported Graph Formats
http://gephi.org/users/supported-graph-formats/

Page 42: Gephi icwsm-tutorial

Thank You!

Caspar David Friedrich - Wanderer Above the Sea of Fog

Page 43: Gephi icwsm-tutorial

Credits

[slide 11] images from Drew Conway

http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/

[slide 22 top left] Benoît Vidal at MFG Labs

[slide 22 bottom center] Franck Ghitalla at UTC

[slide 22 right] Studies in MA Digital Fashion at LCF by Peter Jeun Ho Tsang

http://jeunhotsang.com/blog/2010/12/07/prototype/

[slide 27] sketches from Ben Fry, Computational Information Design

Special thanks to Franck Ghitalla and Mathieu Jacomy for their insightful discussions.