Towards Efficient and Effective Semantic Table Interpretation Ziqi Zhang Department of Computer Science, University of Sheffield

Presentation given by Ziqi Zhang at ISWC2014 on "Towards Efficient and Effective Semantic Table Interpretation"

Page 1: Towards Efficient and Effective Semantic Table Interpretation

Towards Efficient and Effective Semantic Table Interpretation

Ziqi Zhang Department of Computer Science, University of Sheffield

Page 2: Towards Efficient and Effective Semantic Table Interpretation

Outline

• Define semantic table interpretation

• State-of-the-art and motivation

• The method – TableMiner

• Evaluation

Z. Zhang / Towards Efficient and Effective Semantic Table Interpretation

Page 3: Towards Efficient and Effective Semantic Table Interpretation

Semantic Table Interpretation

• Input

• Ontology

• Relational table

• Goals/Tasks

• Label columns by concepts

• Link cells to named entities

• Connect columns by relations

[Figure: ontology fragment showing the concepts Thing, Work, Artist, Location, Film, Actor/Actress and Country, the entities Ent:USA and Ent:UK, and the relation Rel:performIn, alongside the example table]

Table of Best Actor/Actress

Row  Name             Film          Country
1    Tom Hanks        Philadelphia  USA
2    Jamie Foxx       Ray           USA
3    Kate Winslet     The Reader    UK
…    …                …             …
99   Charlize Theron  Monster       South Africa

Page 4: Towards Efficient and Effective Semantic Table Interpretation

Semantic Table Interpretation

• Input

• Ontology

• Relational table

• Goals/Tasks

• Label columns by concepts (column classification / header disambiguation)

• Link cells to named entities (cell disambiguation)

• Connect columns by relations (relation interpretation)
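
The three tasks can be pictured as filling in three maps for a table. The sketch below is only an illustration: the class name `TableAnnotation` and its fields are hypothetical and not part of TableMiner, while the concept, entity and relation labels are taken from the example figure.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TableAnnotation:
    """Hypothetical container for the output of the three tasks."""
    # column index -> concept (column classification)
    column_concepts: Dict[int, str] = field(default_factory=dict)
    # (row index, column index) -> entity (cell disambiguation)
    cell_entities: Dict[Tuple[int, int], str] = field(default_factory=dict)
    # (subject column, object column) -> relation (relation interpretation)
    column_relations: Dict[Tuple[int, int], str] = field(default_factory=dict)


header = ["Name", "Film", "Country"]
rows = [
    ["Tom Hanks", "Philadelphia", "USA"],
    ["Jamie Foxx", "Ray", "USA"],
    ["Kate Winslet", "The Reader", "UK"],
]

annotation = TableAnnotation()
annotation.column_concepts = {0: "Actor/Actress", 1: "Film", 2: "Country"}
annotation.cell_entities[(0, 2)] = "Ent:USA"
annotation.cell_entities[(2, 2)] = "Ent:UK"
annotation.column_relations[(0, 1)] = "Rel:performIn"

for col, concept in annotation.column_concepts.items():
    print(f"{header[col]} -> {concept}")
```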

Page 5: Towards Efficient and Effective Semantic Table Interpretation

Motivation and State-of-the-art

• 154 mil. relational tables on the Web and growing [Cafarella2008]

• Classic Information Extraction methods do not work [Limaye2010, Lu2013]

• They cannot model the complex interdependence among table components

Page 6: Towards Efficient and Effective Semantic Table Interpretation

Motivation and State-of-the-art

• SoA semantic table interpretation methods, e.g. [Limaye2010, Venetis2011, Mulwad2013]

Limitation 1: Inference is ‘exhaustive’, which is unnecessary

[Figure: the example Table of Best Actor/Actress (99 rows), with one column to be labelled]

Goal: Assign a concept to this column

Hint: Content in the column gives useful clues

How much of the content do we need for inference (99 rows in this example)?

- Human: SOME (learn by example)

- SoA: ALL

Page 7: Towards Efficient and Effective Semantic Table Interpretation

Motivation and State-of-the-art

• SoA semantic table interpretation methods, e.g. [Limaye2010, Venetis2011, Mulwad2013]

Limitation 2: Contextual features for inference

[Figure: Table of Best Actor/Actress and the paragraph around it]

SoA: features only from within the table

Context outside the table also gives hints for interpretation. E.g., the words in the surrounding paragraph are often found in descriptions of actors

Page 8: Towards Efficient and Effective Semantic Table Interpretation

TableMiner

Page 9: Towards Efficient and Effective Semantic Table Interpretation

TableMiner

• Two tasks:

• Column classification

• Cell disambiguation

• Non-exhaustive inference in a bootstrapping pattern

• phase 1 – inference with partial content

• phase 2 – propagation and update

• Contextual features both inside and outside tables

Page 13: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Phase 1 I-Inf

• Incremental inference with stopping (I-Inf)

Tj – a column; Cj – candidate concepts for the column; Ei,j – candidate entities for a cell

Itr. 1 … (until stop)

Ei,j = {<e1,s1>, <e2,s2>, …}

concepts = {<c1,s1>, <c2,s2>, …}

Cj = {<c1,s1'>, <c2,s2'>}

|H(Cj) – H(prevCj)| < t? Yes – stop; No – next itr.

Page 14: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Phase 1 I-Inf

• Incremental inference with stopping (I-Inf)

Tj – a column; Cj – candidate concepts for the column; Ei,j – candidate entities for a cell

Itr. 2 … (until stop)

Ei,j = {<e1,s1>, <e2,s2>, …}

concepts = {<c1,s1>, <c3,s3>, …}

Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>}

|H(Cj) – H(prevCj)| < t? Yes – stop; No – next itr.

Page 15: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Phase 1 I-Inf

• Incremental inference with stopping (I-Inf)

Tj – a column; Cj – candidate concepts for the column; Ei,j – candidate entities for a cell

Itr. 3 … (until stop)

Ei,j = {<e1,s1>, <e2,s2>, …}

concepts = {<c11,s11>}

Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>, …, <c11,s11'>}

|H(Cj) – H(prevCj)| < t? Yes – stop; No – next itr.
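
The iterate-and-stop loop can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the TableMiner implementation: the slides do not define H, so it is assumed here to be the Shannon entropy of the normalised concept scores; concept scores are simply summed across cells; and `concepts_of` is a hypothetical callback standing in for cell disambiguation plus concept lookup.

```python
import math


def entropy(scores):
    """Shannon entropy of the scores after normalising them to sum to 1.

    Assumption: this is what H(Cj) denotes on the slides.
    """
    total = sum(scores.values())
    if total <= 0:
        return 0.0
    return -sum((s / total) * math.log2(s / total)
                for s in scores.values() if s > 0)


def incremental_inference(cells, concepts_of, t=0.01):
    """Consume cells one by one; stop once |H(Cj) - H(prevCj)| < t.

    cells:       cell texts of one column Tj
    concepts_of: hypothetical callback, cell text -> {concept: score}
    Returns (Cj, number of cells consumed).
    """
    Cj = {}
    prev_h = None
    for used, cell in enumerate(cells, start=1):
        # Merge this cell's concepts into the column's candidate set.
        for concept, score in concepts_of(cell).items():
            Cj[concept] = Cj.get(concept, 0.0) + score
        h = entropy(Cj)
        if prev_h is not None and abs(h - prev_h) < t:
            return Cj, used  # converged: remaining cells are not processed
        prev_h = h
    return Cj, len(cells)


# Toy column in which every cell yields the same concepts (made-up scores).
cells = ["Tom Hanks", "Jamie Foxx", "Kate Winslet", "Charlize Theron"]
Cj, used = incremental_inference(cells, lambda cell: {"Actor": 1.0, "Person": 0.5})
print(max(Cj, key=Cj.get), "after", used, "of", len(cells), "cells")
```

In this toy column the score distribution is already stable after the second cell, so the remaining cells are skipped; this is the saving that non-exhaustive inference aims for.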

Page 16: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Phase 1 I-Inf

• To compute scores of candidate named entities (e.g., <e1,s1>) and concepts (e.g., <c1,s1'>)

• Candidate NE

• Build a feature vector of a candidate using the ontology

• Build a feature vector of the cell/column header using its context

• Compute vector similarity

• Candidate concept: same principle, but also depends on score of contributing NEs
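
The scoring of candidate NEs can be illustrated with a toy example. The slides do not say which features or similarity measure are used, so the bag-of-words vectors and cosine similarity below are assumptions, and the candidate names and descriptions are made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very simple feature vector: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def score_candidates(cell_context, candidate_descriptions):
    """Score each candidate entity against the context of the cell.

    cell_context:           text around the cell (row, header, paragraph, ...)
    candidate_descriptions: {entity: text describing it in the ontology}
    """
    context_vector = bag_of_words(cell_context)
    return {entity: cosine(context_vector, bag_of_words(description))
            for entity, description in candidate_descriptions.items()}


# Hypothetical candidates for the cell "Ray" in the example table.
scores = score_candidates(
    "Jamie Foxx Ray USA best actor film",
    {
        "Ray (film)": "2004 film starring Jamie Foxx",
        "Ray (fish)": "cartilaginous fish related to sharks",
    },
)
print(max(scores, key=scores.get))
```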

Page 17: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Phase 2 Propagate, Update

• When I-Inf stops

• Select the highest scoring candidate concept c+ to label the column

• Propagate: use c+ as a constraint to disambiguate the remaining cells – candidate NEs not belonging to c+ are discarded

• Update:

• Re-compute c+ after all cells are disambiguated

• If the new c+ is different, revise disambiguation across the entire column with it as new constraint

• Repeat until no change

Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>, …, <c11,s11'>} → rank and select c+ → use as constraint to disambiguate cells
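
The propagate/update cycle might look like the sketch below. It is a simplified illustration under assumptions: recomputing c+ by summing the scores of the winning entities is a stand-in for the actual concept scoring, `max_rounds` is a safety cap not mentioned on the slides, and all entity names and scores are hypothetical.

```python
def disambiguate(cell_candidates, c_plus):
    """Propagate: per cell, drop candidates outside c_plus and keep the best one."""
    chosen = []
    for candidates in cell_candidates:
        kept = [c for c in candidates if c_plus in c[2]]
        chosen.append(max(kept, key=lambda c: c[1])[0] if kept else None)
    return chosen


def recompute_scores(cell_candidates, chosen):
    """Update: re-score concepts from the winning entity of each cell (assumed scheme)."""
    scores = {}
    for candidates, winner in zip(cell_candidates, chosen):
        for entity, score, concepts in candidates:
            if entity == winner:
                for concept in sorted(concepts):
                    scores[concept] = scores.get(concept, 0.0) + score
    return scores


def propagate_and_update(cell_candidates, column_scores, max_rounds=10):
    """cell_candidates: per cell, a list of (entity, score, set of concepts)."""
    c_plus = max(column_scores, key=column_scores.get)
    chosen = disambiguate(cell_candidates, c_plus)
    for _ in range(max_rounds):
        new_scores = recompute_scores(cell_candidates, chosen)
        new_c_plus = max(new_scores, key=new_scores.get) if new_scores else c_plus
        if new_c_plus == c_plus:
            break  # no change: stop
        c_plus = new_c_plus
        chosen = disambiguate(cell_candidates, c_plus)
    return c_plus, chosen


# Hypothetical candidates for the "Name" column of the example table.
cell_candidates = [
    [("Ent:Tom_Hanks", 0.9, {"Actor"})],
    [("Ent:Jamie_Foxx", 0.8, {"Actor"}), ("Ent:Jamie_Foxx_album", 0.85, {"Album"})],
    [("Ent:Kate_Winslet", 0.9, {"Actor"})],
]
c_plus, chosen = propagate_and_update(cell_candidates, {"Actor": 1.7, "Album": 0.85})
print(c_plus, chosen)
```

In the toy data the album candidate has the higher individual score for the second cell, but it is discarded because it does not belong to c+.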

Page 18: Towards Efficient and Effective Semantic Table Interpretation

Evaluation

Page 19: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Evaluation

• Data

• Freebase as reference ontology/background knowledge

• Limaye112 – 112 Web tables from [Limaye2010], originally annotated with Wikipedia

• Cells are automatically mapped to Freebase – some are unmapped

• Columns are manually annotated

• IMDB – 7,354 “cast” tables of films mapped to Freebase

Page 20: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Evaluation

• Baselines (both use exhaustive inference)

• Bfirst

- cell disambiguation: choose the top-ranked NE candidate in the Freebase search result

- column classification: each disambiguated cell casts a vote for each concept its NE belongs to, and the majority wins

• Bsim

- cell disambiguation: string similarity + feature vector similarity (in-table context only)

- column classification: the majority vote method as above + string similarity
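
The majority-vote column classification used by the baselines can be sketched as follows. The function name and the concept memberships in the toy data are hypothetical; only the voting rule itself comes from the description above.

```python
from collections import Counter


def majority_vote(cell_entities, concepts_of_entity):
    """Each disambiguated cell votes for every concept its NE belongs to.

    cell_entities:      one entity per cell, or None if the cell was not linked
    concepts_of_entity: {entity: list of concepts}
    Returns the concept with the most votes, or None if nothing voted.
    """
    votes = Counter()
    for entity in cell_entities:
        if entity is None:
            continue
        votes.update(concepts_of_entity.get(entity, ()))
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical concept memberships for the "Name" column.
concepts_of_entity = {
    "Ent:Tom_Hanks": ["Actor", "Producer"],
    "Ent:Jamie_Foxx": ["Actor", "Musician"],
    "Ent:Kate_Winslet": ["Actor"],
}
winner = majority_vote(
    ["Ent:Tom_Hanks", "Ent:Jamie_Foxx", None, "Ent:Kate_Winslet"],
    concepts_of_entity,
)
print(winner)
```

Because every cell has to be disambiguated before the votes are counted, this baseline is exhaustive by construction.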

Page 21: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Evaluation Results

• Cell disambiguation

Manual validation of 932 cell annotations in Limaye112 not covered by the above results (i.e., unmapped cells)

Considering only those cells where at least one system predicts correctly

Page 22: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Evaluation Results

• Column classification

best only – a column is labelled correctly only if the concept is suitable for the data in the column and is specific enough

best or ok – a column is labelled correctly if the concept is suitable for the data in the column, though not very specific

(E.g., ‘Film Actors’ may be the best, while ‘Artist’ or ‘Person’ is OK, but ‘Engineer’ is incorrect)
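
The two scoring modes can be made concrete with a small sketch. It assumes a gold standard that lists "best" and "ok" concepts per column and simply reports the fraction of columns labelled correctly; the function name, the data layout and the toy labels (beyond the slide's own ‘Film Actors’ / ‘Artist’ / ‘Person’ / ‘Engineer’ example) are assumptions.

```python
def fraction_correct(predictions, gold, mode="best only"):
    """Fraction of columns whose predicted concept is accepted under `mode`.

    predictions: {column: predicted concept}
    gold:        {column: {"best": set of concepts, "ok": set of concepts}}
    mode:        "best only" accepts only the best concepts;
                 "best or ok" also accepts the less specific ones.
    """
    keys = ("best",) if mode == "best only" else ("best", "ok")
    correct = 0
    for column, label in predictions.items():
        accepted = set().union(*(gold[column][k] for k in keys))
        correct += label in accepted
    return correct / len(predictions) if predictions else 0.0


gold = {
    "Name": {"best": {"Film Actors"}, "ok": {"Artist", "Person"}},
    "Film": {"best": {"Film"}, "ok": {"Work"}},
    "Country": {"best": {"Country"}, "ok": {"Location"}},
}
predictions = {"Name": "Person", "Film": "Film", "Country": "Engineer"}

strict = fraction_correct(predictions, gold, mode="best only")
tolerant = fraction_correct(predictions, gold, mode="best or ok")
print(f"best only: {strict:.2f}, best or ok: {tolerant:.2f}")
```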

Page 23: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Evaluation Results

• Efficiency – TableMiner is efficient because

• Column classification: processes only part of the content of a column (avg. 57% on Limaye112, 43% on IMDB)

• Cell disambiguation: constrained by column classification, resulting in a smaller NE candidate space (avg. 32% reduction on Limaye112, 24% on IMDB)

• Fewer candidates => less time spent on retrieval and feature space creation (typically >90% of CPU time in the pipeline [Limaye2010])

Page 24: Towards Efficient and Effective Semantic Table Interpretation

TableMiner – Conclusion

• TableMiner take-home messages

• Message 1: How can it be more effective?

• Use both context within and outside tables as features for inference

• Message 2: How can it be more efficient?

• Perform inference with partial data and follow the bootstrapping pattern of learning

Page 25: Towards Efficient and Effective Semantic Table Interpretation

References

• [Cafarella2008] Cafarella, M.J., Halevy, A., Wang, D.Z., Wu, E., Zhang, Y. 2008: Webtables: exploring the power of tables on the web. Proceedings of the VLDB Endowment 1(1), 538–549

• [Limaye2010] Limaye, G., Sarawagi, S., Chakrabarti, S. 2010: Annotating and searching web tables using entities, types and relationships. Proceedings of the VLDB Endowment 3(1-2), 1338–134

• [Lu2013] Lu, C., Bing, L., Lam, W., Chan, K., Gu, Y. 2013: Web entity detection for semi-structured text data records with unlabeled data. International Journal of Computational Linguistics and Applications

• [Mulwad2013] Mulwad, V., Finin, T., Joshi, A. 2013: Semantic message passing for generating linked data from tables. In: International Semantic Web Conference (1). pp. 363–378. Lecture Notes in Computer Science, Springer

• [Venetis2011] Venetis, P., Halevy, A., Madhavan, J., Paşca, M., Shen, W., Wu, F., Miao, G., Wu, C. 2011: Recovering semantics of tables on the web. Proceedings of the VLDB Endowment 4(9), 528–538

Page 26: Towards Efficient and Effective Semantic Table Interpretation

Thank you
