(Big) Data Science and a bit of Graph Theory, by Michal Bachman (GraphAware)

slides from my talk at WebExpo Prague 2013

Page 1: (Big) Data Science

(Big) Data Science

and a bit of Graph Theory

by Michal Bachman

GraphAware™

Page 2: (Big) Data Science

Data Science

“the sexiest job in the 21st century”

HARVARD BUSINESS REVIEW

Page 3: (Big) Data Science

Data Science

by 2018 the United States could be short up to 190,000 people with the analytical skills ... to make wise use of virtual mountain ranges of data for critical decisions in business, energy, intelligence, health care, finance, and other fields.

McKinsey Global Institute (2011)

Page 4: (Big) Data Science

Page 5: (Big) Data Science

Data Scientist

“hybrid computer scientist/software engineer/statistician”

The Times

Page 6: (Big) Data Science
Page 7: (Big) Data Science
Page 8: (Big) Data Science

Big Data: a collection of data sets that are large and complex.

Page 9: (Big) Data Science

Data Complexity is a function of size, connectedness, and uniformity.

Page 10: (Big) Data Science

Network: a pattern of interconnections among a set of things.

Page 11: (Big) Data Science

Networks

Social ties

Information we consume

Technological and economic systems

...

Page 12: (Big) Data Science

Network: a pattern of interconnections among a set of things.

Page 13: (Big) Data Science
Page 14: (Big) Data Science

Structure: who is linked to whom

Behaviour: implicit consequences of one’s actions for the outcomes of everyone in the system

Page 15: (Big) Data Science

Graph Theory is the study of network structure.

Page 16: (Big) Data Science

[Chart: values on a 0 to 100 scale for the years 2007 to 2010]

Page 17: (Big) Data Science

Leonhard Euler

Page 18: (Big) Data Science

Seven Bridges of Königsberg

Page 19: (Big) Data Science

Graph Theory

[Diagram: graph with nodes A, B, C and D]

Page 20: (Big) Data Science

Graph Theory

[Diagram: graph with nodes A, B, C and D]

Page 21: (Big) Data Science

Graph Theory

[Diagram: graph with nodes A, B, C and D]

Page 22: (Big) Data Science

Connected Graph

[Diagram: graph with nodes A, B, C and D]

Page 23: (Big) Data Science

Connected Components

[Diagram: graph with nodes A, B, C, D, E and F]
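A connected component is a maximal set of nodes that can all reach one another. The sketch below finds the components with a breadth-first search. The edges are an assumption made for illustration, since the slide only names the nodes A to F.

```python
from collections import deque


def connected_components(adjacency):
    """Return the connected components as a list of sorted node lists."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical undirected edges (not taken from the slide).
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
    "E": {"F"},
    "F": {"E"},
}

components = connected_components(graph)
print(components)
```

With these assumed edges the graph is not connected: it splits into one component of four nodes and one of two.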

Page 24: (Big) Data Science

Question: is the social network of the entire world connected?

Page 25: (Big) Data Science

No.

(probably :-))

Page 26: (Big) Data Science

Giant Components

Page 27: (Big) Data Science

Question: how many giant components are there in a large, complex network?

Page 28: (Big) Data Science

1

why?

Page 29: (Big) Data Science

Six Degrees of Separation

“I read somewhere that everybody on this planet is separated only by six other people. Six degrees of separation. Between us and everyone else on this planet.”

Six Degrees of Separation: A Play. (John Guare)

Page 30: (Big) Data Science
Page 31: (Big) Data Science
Page 32: (Big) Data Science
Page 33: (Big) Data Science

2.9: average Bacon number for all performers in the IMDb.
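A Bacon number is a shortest-path distance in a co-starring graph, so an average like the one above can be computed with a breadth-first search from a single node. The sketch below uses a tiny, entirely hypothetical graph; the performer names are placeholders, not IMDb data.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return the hop count from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# Hypothetical co-starring graph: an edge means "appeared in a film together".
costars = {
    "Kevin Bacon": {"performer_1", "performer_2"},
    "performer_1": {"Kevin Bacon", "performer_3"},
    "performer_2": {"Kevin Bacon"},
    "performer_3": {"performer_1", "performer_4"},
    "performer_4": {"performer_3"},
}

distances = bfs_distances(costars, "Kevin Bacon")
others = [d for name, d in distances.items() if name != "Kevin Bacon"]
average_bacon_number = sum(others) / len(others)
print(distances, average_bacon_number)
```

Performers who cannot be reached at all never appear in `distances`, so they are left out of the average.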

Page 34: (Big) Data Science

Graphs Are Everywhere

Collaboration networks

Who-talks-to-whom graphs

Information linkage graphs

Technological networks

Natural world networks

Transport networks

...

Page 35: (Big) Data Science

Motivations for Study

Domain interest

Proxy for a related network

Look for domain-agnostic properties

Page 36: (Big) Data Science

Granovetter’s Experiment

People learned about new jobs through acquaintances rather than close friends.

Page 37: (Big) Data Science

Triadic Closure

[Diagram: graph with nodes A, B and C]

Page 38: (Big) Data Science

Triadic Closure

[Diagram: two graphs with nodes A, B and C]

Page 39: (Big) Data Science

Triadic Closure

If two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.
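One way to act on this principle is to list the pairs of people who are not yet friends but share at least one friend; these are the edges triadic closure predicts are more likely to form. The function name and the three-person graph below are illustrative assumptions.

```python
from itertools import combinations


def closure_candidates(adjacency):
    """Map each non-adjacent pair to the sorted list of friends they share."""
    candidates = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already friends, nothing to close
        common = adjacency[u] & adjacency[v]
        if common:
            candidates[(u, v)] = sorted(common)
    return candidates


# Hypothetical friendships: A knows B and C, but B and C do not know each other.
friends = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A"},
}

candidates = closure_candidates(friends)
print(candidates)
```

Here the only candidate is the pair B and C, with A as the common friend.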

Page 40: (Big) Data Science

Bridge

[Diagram: graph with nodes A, B, C, D and E]

Page 41: (Big) Data Science

Local Bridge

[Diagram: two graphs, one with nodes A to E and one with nodes A to H, J and K]
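The slides show bridges and local bridges only as diagrams. The sketch below assumes the usual definitions: an edge is a bridge if removing it leaves its endpoints in different components, and a local bridge if its endpoints have no neighbours in common. The edge list is hypothetical and does not reproduce the slide.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def is_local_bridge(adjacency, u, v):
    """True if u-v is an edge and its endpoints share no neighbours."""
    return v in adjacency[u] and not (adjacency[u] & adjacency[v])


def is_bridge(adjacency, u, v):
    """True if u-v is an edge and v is unreachable from u without it."""
    if v not in adjacency[u]:
        return False
    seen = {u}
    stack = [u]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if {node, neighbour} == {u, v}:
                continue  # pretend the edge u-v has been removed
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return v not in seen


# Hypothetical edges chosen so that A-B is a local bridge but not a bridge.
network = build_adjacency([
    ("A", "C"), ("A", "D"), ("C", "D"),
    ("A", "B"),
    ("B", "F"), ("B", "G"), ("F", "G"),
    ("D", "H"), ("H", "G"),
    ("B", "K"),
])

print(is_local_bridge(network, "A", "B"), is_bridge(network, "A", "B"))
print(is_bridge(network, "B", "K"))
```

In this example A and B stay connected through D, H and G after the edge is removed, while K hangs off B alone, so B-K is a true bridge.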

Page 42: (Big) Data Science

Strong Triadic Closure

[Diagram: two graphs with nodes A, B and C]

Page 43: (Big) Data Science

Local Bridge = Weak Tie

[Diagram: three graphs, one with nodes A to E and two with nodes A to H, J and K]

Page 44: (Big) Data Science

Structural Balance

[Diagram: graph with nodes A, B and C]

Page 45: (Big) Data Science

Structural Balance

[Diagram: three graphs with nodes A, B and C]

Page 46: (Big) Data Science

Structural Balance

[Diagram: two graphs with nodes A, B and C]

Page 47: (Big) Data Science

Structural Balance

[Diagram: four graphs with nodes A, B and C]

Page 48: (Big) Data Science

Structural Balance

[Diagram: two graphs with nodes A, B and C]

Page 49: (Big) Data Science

Structural Balance

[Diagram: two graphs with nodes A, B, C and D]

Page 50: (Big) Data Science

The Balance Theorem

If a labelled complete graph is balanced, then either all pairs of nodes are friends, or else the nodes can be divided into two groups, X and Y, such that each pair of people in X likes each other, each pair of people in Y likes each other, and everyone in X is the enemy of everyone in Y.
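The theorem can be checked on a small example. The sketch below assumes the usual definition of balance: every triangle has either three friendly edges or exactly one, which is the same as the product of its three signs being positive. The signs (+1 friend, -1 enemy) and the four-node graph are assumptions made for illustration.

```python
from itertools import combinations


def is_balanced(nodes, sign):
    """True if every triangle has a positive product of edge signs."""
    return all(
        sign[frozenset((a, b))] * sign[frozenset((b, c))] * sign[frozenset((a, c))] > 0
        for a, b, c in combinations(nodes, 3)
    )


def split_into_groups(nodes, sign):
    """Group X is the first node plus its friends; group Y is its enemies."""
    first = nodes[0]
    x = [first] + [n for n in nodes[1:] if sign[frozenset((first, n))] > 0]
    y = [n for n in nodes[1:] if sign[frozenset((first, n))] < 0]
    return x, y


nodes = ["A", "B", "C", "D"]

# Hypothetical labelling: A and B are friends, C and D are friends,
# and every pair across the two camps are enemies.
sign = {
    frozenset(("A", "B")): +1,
    frozenset(("C", "D")): +1,
    frozenset(("A", "C")): -1,
    frozenset(("A", "D")): -1,
    frozenset(("B", "C")): -1,
    frozenset(("B", "D")): -1,
}

balanced = is_balanced(nodes, sign)
groups = split_into_groups(nodes, sign)
print(balanced, groups)

# Making B and D friends leaves triangle A-B-D with two friendly edges.
unbalanced_sign = dict(sign)
unbalanced_sign[frozenset(("B", "D"))] = +1
print(is_balanced(nodes, unbalanced_sign))
```

The split is only meaningful when `is_balanced` returns true; for an unbalanced graph no such two-group division exists.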

Page 51: (Big) Data Science

The Balance Theorem

[Diagram: two graphs with nodes A, B, C and D]

Page 52: (Big) Data Science
Page 53: (Big) Data Science

Graph Partitioning

Page 54: (Big) Data Science

Neo4j is an open-source, fully transactional graph database. It manipulates data in the form of a directed property graph with labelled vertices and edges.

Page 55: (Big) Data Science

Neo4j

name: "Drama", type: "genre"

name: "Thriller", type: "genre"

name: "Pulp Fiction", year: 1994, type: "movie"

DIRECTED

IS_OF_GENRE

name: "Quentin Tarantino", type: "person"

name: "Director", type: "occupation"

name: "Actor", type: "occupation"

IS_OF_GENRE

ACTED_IN

name: "Samuel L. Jackson", type: "person"

IS_A

IS_A

IS_A

ACTED_IN

role: "Jules Winnfield"

role: "Jimmie Dimmick"
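A property graph can be pictured as nodes and directed, typed relationships that each carry a map of properties. The classes below are a hypothetical in-memory sketch of that idea, not Neo4j's API. The endpoints of each relationship are filled in from the film itself, since the slide text lists the labels without saying what connects to what.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict


@dataclass
class Relationship:
    start: Node          # relationships are directed: start -> end
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


pulp_fiction = Node({"name": "Pulp Fiction", "year": 1994, "type": "movie"})
tarantino = Node({"name": "Quentin Tarantino", "type": "person"})
jackson = Node({"name": "Samuel L. Jackson", "type": "person"})

relationships = [
    Relationship(tarantino, "DIRECTED", pulp_fiction),
    Relationship(tarantino, "ACTED_IN", pulp_fiction, {"role": "Jimmie Dimmick"}),
    Relationship(jackson, "ACTED_IN", pulp_fiction, {"role": "Jules Winnfield"}),
]

# Who acted in the movie, and in which role?
cast = {
    r.start.properties["name"]: r.properties["role"]
    for r in relationships
    if r.type == "ACTED_IN" and r.end is pulp_fiction
}
print(cast)
```

Keeping the role on the relationship rather than on the person is the point of the model: the same person can have a different role in every film.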

Page 56: (Big) Data Science

Cypher Query Language

MATCH (a)-[:ACTED_IN]->(m)

Page 57: (Big) Data Science

Cypher Query Language

MATCH (a)-[:ACTED_IN]->(m)

Page 58: (Big) Data Science

Cypher Query Language

START a=node(*)
MATCH (a)-[:ACTED_IN]->(m)

Page 59: (Big) Data Science

Cypher Query Language

START a=node(*)
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, count(m)

Page 60: (Big) Data Science

Cypher Query Language

START a=node(*)
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, count(m)
ORDER BY count(m) DESC

Page 61: (Big) Data Science

Cypher Query Language

START a=node(*)
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, count(m)
ORDER BY count(m) DESC
LIMIT 5;

Page 62: (Big) Data Science

Cypher Query Language

==> +-----------------------------+
==> | a.name           | count(m) |
==> +-----------------------------+
==> | "Tom Hanks"      | 12       |
==> | "Keanu Reeves"   | 7        |
==> | "Hugo Weaving"   | 5        |
==> | "Meg Ryan"       | 5        |
==> | "Jack Nicholson" | 5        |
==> +-----------------------------+
==> 5 rows
==>
==> 47 ms
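For readers who do not know Cypher, the query groups ACTED_IN relationships by actor, counts the movies in each group, sorts by that count in descending order and keeps the top five. The sketch below does the same over a plain list of pairs; the actor and movie names are placeholders, not the dataset behind the results shown in the talk.

```python
from collections import Counter

# Hypothetical (actor, movie) pairs, one per ACTED_IN relationship.
acted_in = [
    ("actor_a", "movie_1"),
    ("actor_a", "movie_2"),
    ("actor_a", "movie_3"),
    ("actor_b", "movie_1"),
    ("actor_b", "movie_2"),
    ("actor_c", "movie_3"),
]

# RETURN a.name, count(m): count relationships per actor.
movie_counts = Counter(actor for actor, _movie in acted_in)

# ORDER BY count(m) DESC LIMIT 5: most_common sorts by count, highest first.
top_five = movie_counts.most_common(5)
print(top_five)
```

The database runs the same grouping inside the query engine, so only the five result rows have to leave it.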

Page 63: (Big) Data Science

Thank You

www.graphaware.com

@graph_aware