slides from my talk at WebExpo Prague 2013
GraphAware™
(Big) Data Science and a bit of Graph Theory
by Michal Bachman
Data Science: “the sexiest job of the 21st century” (Harvard Business Review)
Data Science: by 2018 the United States could be short up to 190,000 people with the analytical skills ... to make wise use of virtual mountain ranges of data for critical decisions in business, energy, intelligence, health care, finance, and other fields. (McKinsey Global Institute, 2011)
Data Scientist: “hybrid computer scientist/software engineer/statistician” (The Times)
Big Data: a collection of data sets that are large and complex.
Data Complexity is a function of size, connectedness, and uniformity.
Network: a pattern of interconnections among a set of things.
Networks
Social ties
Information we consume
Technological and economic systems
...
Structure: who is linked to whom
Behaviour: implicit consequences of one’s actions for the outcomes of everyone in the system
Graph Theory is the study of network structure.
Leonhard Euler
Seven Bridges of Königsberg
Graph Theory
[Figure: graph with nodes A, B, C, D]
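Euler's answer to the bridge puzzle comes down to counting bridges per land mass: a walk that crosses every edge of a connected graph exactly once can exist only if zero or two nodes have an odd number of edges. The sketch below checks that condition. It assumes the usual textbook layout of the seven bridges; which letter on the slide stands for which land mass is an assumption, and the function names are made up for illustration.

```python
from collections import Counter

# Seven bridges as a multigraph edge list. Assumed layout (not stated on the
# slide): A is the central island, B and C are the river banks, D is the
# eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def has_euler_walk(edges):
    """Degree condition only; assumes the graph is connected."""
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(degrees(BRIDGES))
print(has_euler_walk(BRIDGES))
```

Under this layout every land mass has an odd degree, so no such walk exists.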
Connected Graph
[Figure: graph with nodes A, B, C, D]
Connected Components
[Figure: graph with nodes A, B, C, D, E, F]
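A graph is connected when every node can be reached from every other node; otherwise it splits into connected components. A minimal sketch of finding them with breadth-first search follows. The node names match the slide, but the edges are hypothetical because the figure itself did not survive.

```python
from collections import deque


def connected_components(adjacency):
    """Return the components of an undirected graph as a list of node sets."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def is_connected(adjacency):
    """A graph is connected if it has at most one component."""
    return len(connected_components(adjacency)) <= 1


# Hypothetical edges: A-B, A-C, B-C, C-D form one component, E-F another.
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": {"F"},
    "F": {"E"},
}

print(connected_components(GRAPH))
print(is_connected(GRAPH))
```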
Question: is the social network of the entire world connected?
No (probably :-))
Giant Components
Question: how many giant components are there in a large, complex network?
1. Why? A single link between two giant components would be enough to merge them into one.
Six Degrees of Separation
“I read somewhere that everybody on this planet is separated only by six other people. Six degrees of separation. Between us and everyone else on this planet.”
Six Degrees of Separation: A Play (John Guare)
Average Bacon number for all performers in the IMDb: 2.9
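A Bacon number is a shortest-path distance in the co-star graph, where two performers are linked if they appeared in the same film. The sketch below computes such distances with breadth-first search on a tiny invented data set; the film and performer names are placeholders, not IMDb data, and performers who cannot be reached from the hub are left out of the average.

```python
from collections import deque

# Hypothetical cast lists (placeholder names, not real credits).
CASTS = {
    "film_1": ["hub", "p1", "p2"],
    "film_2": ["p1", "p3"],
    "film_3": ["p3", "p4"],
    "film_4": ["p5"],
}


def co_star_graph(casts):
    """Link every pair of performers who share a film."""
    graph = {}
    for cast in casts.values():
        for performer in cast:
            others = (p for p in cast if p != performer)
            graph.setdefault(performer, set()).update(others)
    return graph


def distances_from(graph, source):
    """Breadth-first search: hop count from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


graph = co_star_graph(CASTS)
numbers = distances_from(graph, "hub")
reachable = [d for performer, d in numbers.items() if performer != "hub"]
average = sum(reachable) / len(reachable)
print(numbers)
print(average)
```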
Graphs Are Everywhere
Collaboration networks
Who-talks-to-whom graphs
Information linkage graphs
Technological networks
Natural world networks
Transport networks
...
Motivations for Study
Domain interest
Proxy for a related network
Look for domain-agnostic properties
Granovetter’s Experiment: People learned about new jobs through acquaintances rather than close friends.
Triadic Closure
[Figure: triangle of nodes A, B, C]
If two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.
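One way to make triadic closure concrete is to list the pairs it predicts: people who are not yet linked but share at least one friend. The sketch below counts common friends for each such pair. The function name and the three-person graph are illustrative assumptions, not something taken from the talk.

```python
from itertools import combinations


def closure_candidates(adjacency):
    """Map each unlinked pair to the number of friends they have in common."""
    counts = {}
    for neighbours in adjacency.values():
        # Every pair of this person's friends shares this person as a friend.
        for u, v in combinations(sorted(neighbours), 2):
            if v not in adjacency[u]:
                counts[(u, v)] = counts.get((u, v), 0) + 1
    return counts


# A knows B and C, but B and C do not know each other yet.
OPEN_TRIAD = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A"},
}

# The same three people after the B-C edge has formed.
CLOSED_TRIAD = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B"},
}

print(closure_candidates(OPEN_TRIAD))
print(closure_candidates(CLOSED_TRIAD))
```

Pairs with more common friends would be the stronger candidates for a future edge.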
Bridge
[Figure: graph with nodes A, B, C, D, E]
Local Bridge
[Figure: graphs with nodes A, B, C, D, E and A, B, C, D, E, F, G, H, J, K]
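The slides show bridges and local bridges only as pictures, so the sketch below relies on the standard definitions: an edge is a bridge if deleting it leaves its endpoints in different components, and a local bridge if its endpoints have no neighbour in common, so that deleting it pushes them more than two hops apart. The graph is hypothetical and the function names are invented for illustration.

```python
from collections import deque


def distance_without_edge(adjacency, u, v):
    """Shortest path length from u to v ignoring the direct u-v edge.

    Returns None if v cannot be reached without that edge.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if {node, neighbour} == {u, v}:
                continue  # pretend the edge under test has been deleted
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                if neighbour == v:
                    return dist[neighbour]
                queue.append(neighbour)
    return None


def classify_edge(adjacency, u, v):
    """Label an existing edge as 'bridge', 'local bridge' or 'ordinary'."""
    span = distance_without_edge(adjacency, u, v)
    if span is None:
        return "bridge"
    if span > 2:
        return "local bridge"
    return "ordinary"


# Hypothetical graph: two triangles (A-C-D and B-E-F) joined by A-B,
# a longer detour D-G-E, and a pendant node H hanging off F.
GRAPH = {
    "A": {"B", "C", "D"},
    "B": {"A", "E", "F"},
    "C": {"A", "D"},
    "D": {"A", "C", "G"},
    "E": {"B", "F", "G"},
    "F": {"B", "E", "H"},
    "G": {"D", "E"},
    "H": {"F"},
}

for edge in [("A", "B"), ("F", "H"), ("A", "C")]:
    print(edge, classify_edge(GRAPH, *edge))
```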
Strong Triadic Closure
[Figure: triangles of nodes A, B, C]
Local Bridge = Weak Tie
[Figure: graphs with nodes A, B, C, D, E and A, B, C, D, E, F, G, H, J, K]
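The link between local bridges and weak ties can be checked on a small example. The sketch assumes the usual statement of the Strong Triadic Closure property, which the slide does not spell out: a node violates it if it has strong ties to two others that have no tie at all between them. If A-B is a local bridge and A has another strong tie, labelling A-B as strong forces a violation, so A-B has to be weak. Names and data are illustrative.

```python
from itertools import combinations


def stc_violations(ties):
    """Find (node, b, c) where node has strong ties to b and c but b-c is missing.

    `ties` maps frozenset({u, v}) to either "strong" or "weak".
    """
    strong = {}
    for pair, strength in ties.items():
        if strength == "strong":
            u, v = tuple(pair)
            strong.setdefault(u, set()).add(v)
            strong.setdefault(v, set()).add(u)
    violations = []
    for node in sorted(strong):
        for b, c in combinations(sorted(strong[node]), 2):
            if frozenset((b, c)) not in ties:
                violations.append((node, b, c))
    return violations


# A-B is a local bridge (A and B share no neighbour); A also has a strong tie to C.
BRIDGE_STRONG = {frozenset("AB"): "strong", frozenset("AC"): "strong"}
BRIDGE_WEAK = {frozenset("AB"): "weak", frozenset("AC"): "strong"}

print(stc_violations(BRIDGE_STRONG))
print(stc_violations(BRIDGE_WEAK))
```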
Structural Balance
[Figure: triangles of nodes A, B, C, followed by graphs of nodes A, B, C, D]
The Balance Theorem
If a labelled complete graph is balanced, then either all pairs of nodes are friends, or else the nodes can be divided into two groups, X and Y, such that each pair of people in X likes each other, each pair of people in Y likes each other, and everyone in X is the enemy of everyone in Y.
[Figure: graphs with nodes A, B, C, D]
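The theorem can be tried out on a small signed graph. The sketch assumes the usual definition of balance, which the slides convey only through figures: every triangle has either three friendly edges or exactly one, which is the same as the product of its three signs being positive. The split into X and Y then follows from picking any node and grouping it with its friends. The edge signs are hypothetical.

```python
from itertools import combinations


def sign(signs, u, v):
    """Look up the label of the edge u-v: +1 for friends, -1 for enemies."""
    return signs[frozenset((u, v))]


def is_balanced(nodes, signs):
    """A labelled complete graph is balanced if every triangle's sign product is positive."""
    return all(
        sign(signs, a, b) * sign(signs, b, c) * sign(signs, a, c) > 0
        for a, b, c in combinations(nodes, 3)
    )


def two_camps(nodes, signs):
    """Split a balanced graph into X (first node and its friends) and Y (the rest)."""
    first = nodes[0]
    x = {first} | {n for n in nodes[1:] if sign(signs, first, n) > 0}
    y = set(nodes) - x
    return x, y


NODES = ["A", "B", "C", "D"]

# Hypothetical labels: A-B and C-D are friends, every other pair are enemies.
SIGNS = {
    frozenset("AB"): 1, frozenset("CD"): 1,
    frozenset("AC"): -1, frozenset("AD"): -1,
    frozenset("BC"): -1, frozenset("BD"): -1,
}

print(is_balanced(NODES, SIGNS))
print(two_camps(NODES, SIGNS))
```

Flipping a single edge, for example making B and C friends, leaves triangle A, B, C with two friendly edges and breaks the balance.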
Graph Partitioning
Neo4j is an open-source, fully transactional graph database. It manipulates data in the form of a directed property graph with labelled vertices and edges.
Neo4j
[Figure: example property graph]
Nodes:
name: "Drama", type: "genre"
name: "Thriller", type: "genre"
name: "Pulp Fiction", year: 1994, type: "movie"
name: "Quentin Tarantino", type: "person"
name: "Samuel L. Jackson", type: "person"
name: "Director", type: "occupation"
name: "Actor", type: "occupation"
Relationships:
DIRECTED
IS_OF_GENRE (2)
IS_A (3)
ACTED_IN (2), with role: "Jules Winnfield" and role: "Jimmie Dimmick"
Cypher Query Language
START a=node(*)
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, count(m)
ORDER BY count(m) DESC
LIMIT 5;
==> +-----------------------------+
==> | a.name           | count(m) |
==> +-----------------------------+
==> | "Tom Hanks"      | 12       |
==> | "Keanu Reeves"   | 7        |
==> | "Hugo Weaving"   | 5        |
==> | "Meg Ryan"       | 5        |
==> | "Jack Nicholson" | 5        |
==> +-----------------------------+
==> 5 rows
==>
==> 47 ms
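For readers new to Cypher, the query can be read as: find every ACTED_IN relationship, group by the actor's name, count the matches, sort by count and keep the top five. The sketch below is a rough plain-Python analogue run over part of the Pulp Fiction graph from the Neo4j slide. The in-memory layout and function name are illustrative, and the relationship endpoints are assumptions, since the slide text lists relationship types and roles but not which nodes they connect.

```python
from collections import Counter

# Nodes keyed by an arbitrary id, each with a property map.
NODES = {
    1: {"name": "Pulp Fiction", "year": 1994, "type": "movie"},
    2: {"name": "Quentin Tarantino", "type": "person"},
    3: {"name": "Samuel L. Jackson", "type": "person"},
}

# (start node, relationship type, end node, relationship properties).
# Endpoints are assumed; the slide text does not state them.
RELATIONSHIPS = [
    (2, "DIRECTED", 1, {}),
    (2, "ACTED_IN", 1, {"role": "Jimmie Dimmick"}),
    (3, "ACTED_IN", 1, {"role": "Jules Winnfield"}),
]


def top_actors(nodes, relationships, limit=5):
    """Count ACTED_IN relationships per start-node name, highest count first."""
    counts = Counter(
        nodes[start]["name"]
        for start, rel_type, _end, _props in relationships
        if rel_type == "ACTED_IN"
    )
    return counts.most_common(limit)


for name, count in top_actors(NODES, RELATIONSHIPS):
    print(name, count)
```

A graph database answers the same question by following stored relationships rather than scanning a list, which is what the MATCH pattern expresses.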
Thank You
www.graphaware.com
@graph_aware