DBtrends Exploring Query Logs for Ranking RDF Data · 2019-09-19 · Edgard Marx, Gustavo Publio...

Preview:

Citation preview

CACAOConditional Spread Activationfor Keyword Query Interpretation

15th Internation Conference on Semantic WebKarlsruhe 2019

Edgard Marx, Gustavo Publio & Thomas Riechert

LiberAI

2

openQA

*https://github.com/AKSW/openQA

Previous Works

LiberAI

3

KBox

KBox

Is the Knowledge

Graph in KBox?

YES

NOResolve

User provides SPARQLquery and the target Knowledge Graph URI

ExecuteSPARQL Query

DDNS

User/Application

SELECT ?s ?p ?o ...

+http://knowledgegraph.uri

Dereference

*https://github.com/AKSW/KBoxLiberAI

Previous Works

4

NSpM

*https://github.com/AKSW/NSpMLiberAI

Previous Works

5

DBNQA

*https://github.com/AKSW/DBNQA

https://wikipedia.org/wiki/List_of_datasets_for_machine-learning_research

Previous Works

LiberAI

Outline

• Motivation

• Problem Statment

• Background

• Approach

• Evaluation

• Counclusion & Future Works

6LiberAI

Resource Description

Framework (RDF)

Unstructured

Linked Data

Structured

Crowd

Linked Data

LiberAI

7

Motivation

8

• Entity Retrieval• Question Answering• Entity Linking

Linked Data

Resource Description

Framework (RDF)Access (the information needed)

LiberAI

Motivation

9

Resource Description

Framework (RDF)Entity Retrieval

Formally, a top-k entity retrieval takes a keyword query Q,

an integer 0 < k, a set of entities E={e1, e2, . . . , e|E|},

and returns the top-k entities based on

a scoring function S(Q, e)

LiberAI

Problem statement

10

Resource Description

Framework (RDF)Entity Retrieval

Select * where {

?s dbo:birthPlace dbpedia:Leipzig

}

LiberAI

Problem statement

TF-IDF

11

𝑤𝑡, 𝑑= 𝑡𝑓𝑡, 𝑑 ∙ 𝑙𝑜𝑔

|𝐷|

| 𝑑′ ∈ 𝐷 𝑡 ∈ 𝑑′

LiberAI

Background

BM25

12

"This can opener can open any can that a can opener can and if this can opener cannot open any can that a can opener can, then this can opener is free"

“If this can opener does not open any can, then you can take it for free”

LiberAI

Background

BM25

13LiberAI

Background

BM25F

14LiberAI

Background

Field Weighted

15LiberAI

weight(p1) > weight(p2) > weight(p3) …

Background

Learning to Rank

16LiberAI

Background

17LiberAI

Entity Link in Queries

Entity Linking

Query

Entity Candidate Generation

Ranking

Entities

Query Expansion

Query

Linked Entities

Entity Retrieval

Background

General IR Architecture

18

Entity Retrieval, Entity Linking, Question Answering

Query

Entity Candidate Generation

Ranking

Result

Query Expansion

LiberAI

Background

Interaction

19

Entity Retrieval Entity Linking Question Answering

Query

Entity Candidate Generation

Ranking

Entities

Query Expansion

Query

Entity Candidate Generation

Ranking

Linked Entities

Query Expansion

Query

Entity Candidate Generation

Ranking

Answer

Query Expansion

LiberAI

Background

Lastest Architecture

20

Entity Retrieval Entity Linking

Query

Entity Candidate Generation

Ranking

Entities

Query Expansion

Query

Entity Candidate Generation

Ranking

Linked Entities

Query Expansion

LiberAI

Background

Lastest Architecture

21LiberAI

Background

Current Scenario

22

Entity Retrieval Entity Linking Question Answering

Query

Entity Candidate Generation

Ranking

Entities

Query Expansion

Query

Entity Candidate Generation

Ranking

Linked Entities

Query Expansion

Query

Entity Candidate Generation

Ranking

Answer

Query Expansion

LiberAI

Background

Return linked resources

23

Entity Retrieval Entity Linking Question Answering

Query

Entity Candidate Generation

Ranking

Linked Resources

Query Expansion

Query

Entity Candidate Generation

Ranking

Linked Entities

Query Expansion

Query

Entity Candidate Generation

Ranking

Answer

Query Expansion

LiberAI

1

Proposal

Return the answer

24

Entity Retrieval Entity Linking Question Answering

Query

Entity Candidate Generation

Ranking

Linked Resources +

Answer

Query Expansion

Query

Entity Candidate Generation

Ranking

Linked Entities

Query Expansion

Query

Entity Candidate Generation

Ranking

Answer

Query Expansion

LiberAI

1

2

Proposal

Challenge 1: *Local Term Frequency

25LiberAI

*not to confuse with global

dbr:Albert_Leadbr:Albert_Lea_Public_Library …

18

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …” …

7

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …”

Q: “Albert Lea”

Challenge 2: Cohesion

26LiberAI

Q: “Albert Lea”

dbr:Albert_Leadbr:Albert_Lea_Public_Library …

18

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …” …

7

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …”

A B

27LiberAI

Q: “Albert Lea”

dbr:Albert_Leadbr:Albert_Lea_Public_Library …

18

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …” …

7

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …”

Preposition 1 [1][2]: maximize the number of tokens and reduce the number of mapped entities

A B

[1] SWJ, Shekarpour et al. 2015, [2] NAACL, Sakor et al. 2019

Challenge 2: Cohesion

28LiberAI

Q: “Albert Lea”

dbr:Albert_Leadbr:Albert_Lea_Public_Library

18“Albert Lea …”

7“Albert Lea …”

Proposition 1 [1][2]: maximize the number of tokens and reduce the number of mapped entities

A B

Does NOT satisfy: Either A and B comply with the proposition 1

[1] SWJ, Shekarpour et al. 2015, [2] NAACL, Sakor et al. 2019

Challenge 2: Cohesion

29LiberAI

Q: “Albert Lea”

dbr:Albert_Leadbr:Albert_Lea_Public_Library …

18

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …” …

7

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …”

Proposition 2 [3][4]: use the context

A B

[3] ISWC, Usbeck et al. 2014, [4] KCap, Moussallem et al. 2017

Challenge 2: Cohesion

30LiberAI

Q: “Albert Lea”

Proposition 2 [3][4]: use the context

Does NOT satisfy: Either A and B comply with preposition 2

[3] ISWC, Usbeck et al. 2014, [4] KCap, Moussallem et al. 2017

dbr:Albert_Leadbr:Albert_Lea_Public_Library

18“Albert Lea …”

7“Albert Lea …”

A B

Challenge 2: Cohesion

31LiberAI

Q: “strawberry ice cream”

Proposition 3 [5][6]: Max Similarity (Levenshtein and Jaccard)

dbr:Strawberry_ice_cream

“Strawberry_ice_cream”A

dbr:Banana_split

“strawberry”B

“ice cream”

[5] AAAI, Zhang et al. 2016, [6] VLDB, Zheng et al. 2019

Challenge 2: Cohesion

32LiberAI

Q: “strawberry ice cream”

Proposition 3 [5][6]: Max Similarity (Levenshtein and Jaccard)

Does NOT satisfy: Either A and B comply with preposition 3

[5] AAAI, Zhang et al. 2016, [6] VLDB, Zheng et al. 2019

dbr:Strawberry_ice_cream

“Strawberry_ice_cream”A

dbr:Banana_split

“strawberry”B

“ice cream”

1

1/3

2/3

Challenge 2: Cohesion

Challenge 3: Scoring Function

33LiberAI

0

1

2

3

4

5

6

7

8

9

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

BM25F

Score

Where is the answer?

Approach

34LiberAI

Can we do better?

Conditional Spread Activation

35LiberAI

dbr:Albert_Leadbr:Albert_Lea_Public_Library …

18

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …” …

7

1

“Albert Lea …”

“Albert Lea …”

“Albert Lea …”

s = 0

s = 0s > 0

s > 0

Q: “Albert Lea”Challenge 1: Local Term Frequency

A B

Approach

Field Weight

36LiberAI

dbr:Albert_Leadbr:Albert_Lea_Public_Library

1“Albert Lea …”

1“Albert Lea …”

s > 0 s > 0

Q: “Albert Lea”Challenge 1: Local Term FrequencyChallenge 2: Cohesion

weight(ptype) > weight(plabel) > weight(pothers)

A B

Approach

Field Weight

37LiberAI

dbr:Albert_Leadbr:Albert_Lea_Public_Library

1“Albert Lea …”

1“Albert Lea …”

Q: “Albert Lea”Challenge 1: Local Term FrequencyChallenge 2: Cohesion

𝑆 𝑄𝑖, 𝐿𝑖 = ∑𝑄𝑖𝐿𝑖 if ∑𝑄𝑖𝐿𝑖 = 1

A B

Approach

Field Weight

38LiberAI

dbr:Albert_Leadbr:Albert_Lea_Public_Library

1“Albert Lea …”

1“Albert Lea …”

1 > j(𝑄𝑖, 𝐿𝑖)

Q: “Albert Lea”Challenge 1: Local Term FrequencyChallenge 2: Cohesion

𝑆(𝑄𝑖, 𝐿𝑖) = ∑𝑄𝑖𝐿𝑖

22 = 4

A B

𝑆 𝑄𝑖, 𝐿𝑖 = ∑𝑄𝑖𝐿𝑖 if ∑𝑄𝑖𝐿𝑖 = 1

Approach

Field Weight

39LiberAI

Challenge 1: Local Term FrequencyChallenge 2: CohesionChallenge 3: Scoring Function

0

500

1000

1500

2000

2500

1 2 3 4 5 6

*PThe answer!!

Approach

Pipeline

40

Query

Entity Candidate Generation

Ranking

Result

Query Expansion

LiberAI

Evaluation

Benchmark

41LiberAI

Evaluation

DBpedia-Entity

42LiberAI

Evaluation

QALD-4

43LiberAI

Evaluation

Conclusion & Future Works

44LiberAI

• We have shown a promising algorithm that outperform state-of-the-art ER engines on standard benchmarks by addressing:

o Local Term Frequency;o Cohesion, and;o Score Function problems

• It is still too soon to trace conclusions (law of small numbers)achieves that• Improve the algorithm runtime so it can be used at scale

Acknowledgements

45

LiberAI

@theLiberAI @HTWKLeipzig @edgardmarx @akswgroup

Acknowledgements

46

@theLiberAI @HTWKLeipzig @edgardmarx @akswgroup

Thank you!!!* 100+ GitHub stars

* 40+ forks

Recommended