Rule Mining and Applications in Social Data
Luis Galárraga, Télécom ParisTech
Presented at: International Workshop on Social Media and Culture 2014
Daejeon, Korea, April 4th, 2014



An overview of potential applications of rule mining on social graphs, presented at the International Workshop on Social Media and Culture 2014.



Natural Language vs Knowledge Bases (KBs)

● Natural Language: suitable for humans but difficult for computers.
● Knowledge Bases: understandable for computer programs.

[Figure: facts about Shakira drawn as a KB graph: Shakira isA Singer; Shakira bornOn Feb 2, 1977; Shakira performs "Hips don't lie".]

Some popular KBs

KBs in action

Social graphs are KBs

● Sources may be different, but both share:
  ● A natural graph-like structure.
  ● Incompleteness.
  ● Opportunities for data description and prediction:
    – 90% of computer scientists like political party X.
    – If you like Shakira, you are likely to buy her latest song.

[Figure: a small social graph with Luis Galárraga, Shamira (friendOf), Shakira, and "Hips don't lie", connected by likes and performs edges. A missing likes edge is annotated "Maybe nobody asked me? :(".]

Rule Mining and KBs

● Data mining is about finding interesting and non-obvious correlations in data.
● Correlations are rules that hold often:
  ● You probably live in the same city as your spouse.
  ● If you like an artist, you like her songs.
● They can be formulated as logical rules:

isMarriedTo(x, y) ^ livesIn(x, city) => livesIn(y, city)

likes(x, artist) ^ performs(artist, song) => likes(x, song)
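To make "rules that hold often" concrete, the sketch below measures the first rule on a made-up six-fact KB. It assumes the usual definitions (as in AMIE): the support is the number of head instantiations (y, city) for which both the body and the head are in the KB, and the standard confidence is the support divided by the number of body instantiations. The entities and the function name are hypothetical.

```python
# Hypothetical toy KB of (subject, relation, object) facts.
KB = {
    ("Ann", "isMarriedTo", "Bob"),
    ("Ann", "livesIn", "Paris"),
    ("Bob", "livesIn", "Paris"),
    ("Cid", "isMarriedTo", "Dee"),
    ("Cid", "livesIn", "Rome"),
    ("Dee", "livesIn", "Berlin"),  # Dee does not live with her spouse: a counterexample
}


def evaluate_spouse_rule(kb):
    """Support and standard confidence of
    isMarriedTo(x, y) ^ livesIn(x, city) => livesIn(y, city)."""
    # Head instantiations (y, city) for which the body holds.
    body = {
        (y, city)
        for x, r1, y in kb if r1 == "isMarriedTo"
        for x2, r2, city in kb if r2 == "livesIn" and x2 == x
    }
    # Of those, the ones whose head fact livesIn(y, city) is also in the KB.
    support = {(y, city) for y, city in body if (y, "livesIn", city) in kb}
    return len(support), len(support) / len(body)


supp, conf = evaluate_spouse_rule(KB)
print(f"support = {supp}, confidence = {conf}")  # support = 1, confidence = 0.5
```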

Applications for social data

● Recommendations

likes(x, artist) ^ performs(artist, song) => likes(x, song)

[Figure: a social graph with Luis Galárraga, Shamira (friendOf), Shakira, and "Hips don't lie", connected by likes and performs edges; applying the rule adds a predicted likes edge to the song.]
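A minimal sketch of the recommendation step, assuming a toy version of the graph in the figure (exactly which edges exist is an assumption): the rule is applied to every likes/performs pair, and only predictions that are not already in the graph are returned as recommendations. The graph contents and the name recommend are illustrative.

```python
# Hypothetical social graph as (subject, relation, object) triples.
KB = {
    ("Luis", "friendOf", "Shamira"),
    ("Luis", "likes", "Shakira"),
    ("Shamira", "likes", "Shakira"),
    ("Luis", "likes", "Hips don't lie"),
    ("Shakira", "performs", "Hips don't lie"),
}


def recommend(kb):
    """Apply likes(x, artist) ^ performs(artist, song) => likes(x, song)
    and return the predicted (x, song) pairs missing from the KB."""
    predictions = set()
    for x, rel, artist in kb:
        if rel != "likes":
            continue
        for subject, rel2, song in kb:
            if rel2 == "performs" and subject == artist and (x, "likes", song) not in kb:
                predictions.add((x, song))
    return predictions


print(recommend(KB))  # {('Shamira', "Hips don't lie")}
```

Luis already likes the song, so the rule only produces a new suggestion for Shamira.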

● Market basket analysis:
  ● People who buy laptops also buy laptop cases.
● Link and event prediction:
  ● Two people who attended the same high school in the same year might know each other.
  ● If you registered for this workshop, then you are coming to Daejeon (and need to book a flight and a hotel).
● Dealing with incompleteness:
  ● If you like German newspapers, fluency in German is perhaps missing from your profile.
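Market basket analysis uses the classic association-rule measures: support as the fraction of transactions that contain both sides of the rule, and confidence as the fraction of transactions containing the left-hand side that also contain the right-hand side. The sketch below scores "laptop => laptop case" on invented baskets; rule_stats is a hypothetical helper.

```python
# Invented shopping baskets, one set of items per transaction.
baskets = [
    {"laptop", "laptop case", "mouse"},
    {"laptop", "laptop case"},
    {"laptop", "charger"},
    {"phone", "phone case"},
]


def rule_stats(baskets, lhs, rhs):
    """Support and confidence of the association rule lhs => rhs."""
    n_lhs = sum(1 for basket in baskets if lhs <= basket)
    n_both = sum(1 for basket in baskets if (lhs | rhs) <= basket)
    support = n_both / len(baskets)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


support, confidence = rule_stats(baskets, {"laptop"}, {"laptop case"})
print(f"support = {support:.2f}, confidence = {confidence:.2f}")  # support = 0.50, confidence = 0.67
```

Unlike rule mining on KBs, where support is usually an absolute count of instantiations, basket support is a relative frequency.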

Challenges

Challenges of Rule Mining from KBs

● Scalability:
  ● State-of-the-art approaches for rule mining cannot handle the size of current KBs.
    – YAGO: 10M entities, 120M facts.
    – DBpedia 3.8: 24.9M entities, 1.98B facts.
    – Facebook Graph: 1.2B users.
  ● Rule mining requires an exhaustive search of the data.
● Solution:
  ● A language bias.
  ● A set of pruning heuristics.
  ● An optimized storage implementation.

AMIE: Association Rule Mining Under Incomplete Evidence

● AMIE is a system that learns Horn rules such as:

livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)

● Starting with all possible head relations r(x, y) and a minimum support threshold:
  – The system explores the search space by means of carefully designed mining operators.
  – The search space is restricted to closed Horn rules.
  – The monotonicity of support helps prune non-promising paths.
  – It relies on an optimized in-memory database.
  – Confidence gain is used to prune the output.

L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In WWW, 2013.

[Figure: AMIE's mining operators build a rule step by step, starting from the head atom livesIn(y, city).]

● Add dangling atom (OD): add an atom ?r(x, y) that introduces a new variable x; candidate relations for ?r include isMarriedTo and livesIn. Choosing isMarriedTo gives isMarriedTo(x, y) => livesIn(y, city).
● Add closing atom (OC): add an atom ?r(x, city) between two variables already in the rule; candidate relations include livesIn and diedIn. Choosing livesIn gives the closed rule:

livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)
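The two operators can be sketched as a tiny breadth-first miner. This is a simplified illustration under stated assumptions, not AMIE's implementation: it uses only OD and OC, a naive backtracking matcher instead of an optimized database, a made-up six-fact KB, at most three atoms per rule, and no confidence computation. It does show the pruning idea from the AMIE slide: adding an atom can only lower support, so a rule below the threshold is discarded together with all its refinements, and only closed rules are reported.

```python
from collections import Counter, deque

# Hypothetical toy KB of (subject, relation, object) facts.
KB = {
    ("Ann", "isMarriedTo", "Bob"), ("Ann", "livesIn", "Paris"), ("Bob", "livesIn", "Paris"),
    ("Cid", "isMarriedTo", "Dee"), ("Cid", "livesIn", "Rome"), ("Dee", "livesIn", "Rome"),
}
RELATIONS = sorted({r for _, r, _ in KB})
# A rule is (head, body); an atom is (relation, arg1, arg2) over variable names.


def variables(rule):
    head, body = rule
    return sorted({v for _, a, b in (head, *body) for v in (a, b)})


def add_dangling(rule):
    """OD: add an atom that links an existing variable to a fresh one."""
    head, body = rule
    fresh = f"v{len(variables(rule))}"
    for rel in RELATIONS:
        for v in variables(rule):
            for atom in ((rel, v, fresh), (rel, fresh, v)):
                yield head, body + (atom,)


def add_closing(rule):
    """OC: add an atom between two variables already in the rule."""
    head, body = rule
    for rel in RELATIONS:
        for a in variables(rule):
            for b in variables(rule):
                atom = (rel, a, b)
                if a != b and atom != head and atom not in body:
                    yield head, body + (atom,)


def matches(atoms, binding):
    """Yield every extension of `binding` that makes all atoms true in KB."""
    if not atoms:
        yield binding
        return
    rel, a, b = atoms[0]
    for s, r, o in KB:
        if r != rel:
            continue
        new = dict(binding)
        if new.setdefault(a, s) == s and new.setdefault(b, o) == o:
            yield from matches(atoms[1:], new)


def support(rule):
    """Number of distinct head bindings for which body and head both hold."""
    head, body = rule
    return len({(m[head[1]], m[head[2]]) for m in matches([*body, head], {})})


def is_closed(rule):
    """A rule is closed if every variable occurs at least twice."""
    head, body = rule
    counts = Counter(v for _, a, b in (head, *body) for v in (a, b))
    return all(c >= 2 for c in counts.values())


def mine(head_rel, min_support=2, max_atoms=3):
    """Breadth-first refinement of rules whose head is head_rel(x, y)."""
    queue = deque([((head_rel, "x", "y"), ())])
    seen, output = set(), []
    while queue:
        rule = queue.popleft()
        if rule[1] and is_closed(rule):
            output.append(rule)
        n_atoms = len(rule[1]) + 1
        if n_atoms == max_atoms:
            continue
        children = list(add_closing(rule))
        if n_atoms + 1 < max_atoms:  # a dangling atom needs room to be closed later
            children += add_dangling(rule)
        for child in children:
            key = (child[0], frozenset(child[1]))
            # Refinements can only lower support, so pruning here loses nothing.
            if key not in seen and support(child) >= min_support:
                seen.add(key)
                queue.append(child)
    return output


def show(rule):
    def fmt(atom):
        return f"{atom[0]}({atom[1]}, {atom[2]})"
    head, body = rule
    return " ^ ".join(map(fmt, body)) + " => " + fmt(head)


for rule in mine("livesIn"):
    print(show(rule), "| support:", support(rule))
# isMarriedTo(x, v2) ^ livesIn(v2, y) => livesIn(x, y) | support: 2
# isMarriedTo(v2, x) ^ livesIn(v2, y) => livesIn(x, y) | support: 2
```

On these facts the miner recovers the slide's rule (up to variable renaming) plus its mirror image, in which the spouse is the subject of isMarriedTo.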

AMIE: Association Rule Mining Under Incomplete Evidence

[Figure: AMIE architecture. Labels: RDF KB, minimum support threshold, k; concurrent mining implementation; tailored in-memory DB.]
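The slide does not detail the in-memory DB. One plausible design, assumed here purely for illustration, indexes every fact under all six orderings of subject, relation, and object, so that the lookups rule mining needs most (all subjects x with likes(x, Shakira), or whether a person has any known nationality) become dictionary accesses rather than scans. build_indexes and lookup are hypothetical names.

```python
from collections import defaultdict
from itertools import permutations

POSITION = {"s": 0, "r": 1, "o": 2}  # slot of each letter in an (s, r, o) fact


def build_indexes(facts):
    """Index (subject, relation, object) facts under all six orderings.

    indexes["rso"][r][s] is the set of objects o such that (s, r, o) is a fact,
    indexes["ros"][r][o] is the set of matching subjects, and so on.
    """
    indexes = {}
    for order in map("".join, permutations("sro")):
        table = defaultdict(lambda: defaultdict(set))
        for fact in facts:
            first, second, third = (fact[POSITION[p]] for p in order)
            table[first][second].add(third)
        indexes[order] = table
    return indexes


def lookup(indexes, order, first, second):
    """Values in the third slot of `order`, given the first two (no scan)."""
    return indexes[order].get(first, {}).get(second, set())


facts = [
    ("Luis", "likes", "Shakira"),
    ("Shamira", "likes", "Shakira"),
    ("Luis", "isCitizenOf", "Ecuador"),
]
idx = build_indexes(facts)
print(sorted(lookup(idx, "ros", "likes", "Shakira")))  # ['Luis', 'Shamira']
print(lookup(idx, "rso", "isCitizenOf", "Shamira"))    # set()
print(len(idx["rso"]["isCitizenOf"]))                  # 1 subject with a known nationality
```

Six copies of the data trade memory for speed; whether that trade-off pays off depends on the KB size and the query mix.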

AMIE: Association Rule Mining Under Incomplete Evidence

Dataset              Facts   Runtime      Rules
YAGO2                1M      3.62 min     138
YAGO2 (const)        1M      17.76 min    18K
DBpedia (2 atoms)    6.7M    2.89 min     6.9K

AMIE finds rules in medium-size ontologies in a few minutes.

Challenges of Rule Mining on KBs

● Incompleteness:
  ● Graph data often contains gaps.
● Open World Assumption (OWA):
  ● Absence of evidence is not evidence of absence.
● This makes it hard to estimate the confidence of a rule.

likes(x, Shakira) => isCitizenOf(x, Ecuador)

[Figure: Luis Galárraga and Shamira (friendOf) both like Shakira; only Luis Galárraga has a known citizenship (citizenOf Ecuador).]

Standard confidence uses a Closed World Assumption (CWA) and counts Shamira as a counterexample. Score = 0.5
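Spelled out, the score of 0.5 follows from the standard confidence, defined (as in AMIE) as the support divided by the number of body instantiations, where B is the rule body and z_1, ..., z_m are its other variables. Under the CWA, every person who likes Shakira but is not recorded as Ecuadorian counts against the rule.

```latex
\begin{align*}
\mathrm{supp}(B \Rightarrow r(x, y))
  &= \#\{(x, y) : \exists z_1, \dots, z_m : B \wedge r(x, y)\} \\
\mathrm{conf}(B \Rightarrow r(x, y))
  &= \frac{\mathrm{supp}(B \Rightarrow r(x, y))}{\#\{(x, y) : \exists z_1, \dots, z_m : B\}} \\
\mathrm{conf}\bigl(\mathit{likes}(x, \mathit{Shakira}) \Rightarrow \mathit{isCitizenOf}(x, \mathit{Ecuador})\bigr)
  &= \frac{\#\{x : \mathit{likes}(x, \mathit{Shakira}) \wedge \mathit{isCitizenOf}(x, \mathit{Ecuador})\}}{\#\{x : \mathit{likes}(x, \mathit{Shakira})\}}
   = \frac{1}{2} = 0.5
\end{align*}
```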

Challenges of Rule Mining on KBs

likes(x, Shakira) => isCitizenOf(x, Ecuador)

AMIE uses the Partial Completeness Assumption (PCA) to estimate the confidence of rules under the OWA: a KB knows either all or none of the nationalities of a person.

PCA confidence considers as counterexamples only those people whose nationality is known to be different from Ecuador. Shamira has no known nationality, so she is not counted as a counterexample. Score = 1.0
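A minimal sketch that computes both scores for the slide's example. It assumes the PCA confidence definition from the AMIE paper cited above: the denominator only counts people for whom the KB knows at least one nationality. The triples mirror the figure; confidences is a hypothetical name.

```python
# Facts from the slide's example, as (subject, relation, object) triples.
KB = {
    ("Luis Galárraga", "likes", "Shakira"),
    ("Shamira", "likes", "Shakira"),
    ("Luis Galárraga", "friendOf", "Shamira"),
    ("Luis Galárraga", "isCitizenOf", "Ecuador"),
}


def confidences(kb):
    """Standard and PCA confidence of likes(x, Shakira) => isCitizenOf(x, Ecuador)."""
    body = {s for s, r, o in kb if r == "likes" and o == "Shakira"}
    support = {x for x in body if (x, "isCitizenOf", "Ecuador") in kb}
    # CWA: every x satisfying the body is a potential counterexample.
    standard = len(support) / len(body)
    # PCA: only x with at least one known nationality can be a counterexample.
    known = {x for x in body if any(s == x and r == "isCitizenOf" for s, r, _ in kb)}
    pca = len(support) / len(known)
    return standard, pca


standard, pca = confidences(KB)
print(f"standard = {standard}, PCA = {pca}")  # standard = 0.5, PCA = 1.0
```

Adding a fact such as ("Shamira", "isCitizenOf", "Colombia") would make Shamira a counterexample under both assumptions, and both scores would drop to 0.5.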

AMIE: Association Rule Mining under Incomplete Evidence

● PCA confidence has better predictive behavior than standard confidence.
● Some rules mined by AMIE on YAGO:

isMarriedTo(x, y) ^ livesIn(x, z) => livesIn(y, z)
isCitizenOf(x, y) => livesIn(x, y)
hasAdvisor(x, y) ^ graduatedFrom(x, z) => worksAt(y, z)
hasWonPrize(x, Gottfried Wilhelm Leibniz Prize) => livesIn(x, Germany)

Research outlook

● Mine other types of logical rules for more applications.
  ● Numerical correlations for data description and prediction.
    – If you like Justin Bieber, then you are probably younger than 18.
  ● Rules involving temporal information for event prediction.
    – If a person bought a laptop today, then she will buy a hard disk in approximately one month.
    – If a person traveled to the same place for Christmas in the last two years, then she will probably do so again this year.