A Robust Approach to Aligning Heterogeneous Lexical Resources

Mohammad Taher Pilehvar, Roberto Navigli

MultiJEDI ERC 259234

Lexical Resource

• WordNet
• BabelNet
• UBY

Why combine resources?

• Improved word and concept coverage, e.g., named entities, new senses
• Improved domain coverage
• Improved multilinguality: dozens of new languages
• Expert-made relations preserved, e.g., hypernymy, meronymy, etc.
• Provides complementary knowledge

Applications:

• Semantic parsing (Shi and Mihalcea, 2005)
• Semantic role labeling (Palmer et al., 2010)
• WSD and entity linking (Moro et al., TACL 2014)

Difficulty of resource alignment

Fine granularity of lexical resources

Example: plant has 4 senses in one resource (WordNet) and 15 senses in the other.

How does resource alignment work?

• Usually measures the similarity of two concepts, e.g., WKT: plant#n#1 and WN: plant#n#1
• And aligns two concepts if their similarity exceeds a certain threshold
• Alignment approaches differ in the way they calculate this similarity
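
The threshold step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `align`, the sense identifiers, and the toy scores are all hypothetical, and the similarity function is left pluggable because that is exactly where alignment approaches differ.

```python
def align(source, candidates, similarity, threshold):
    """Return the candidate most similar to `source`, or None if no
    candidate's similarity exceeds `threshold`."""
    scored = [(similarity(source, cand), cand) for cand in candidates]
    above = [(score, cand) for score, cand in scored if score > threshold]
    if not above:
        return None
    return max(above, key=lambda pair: pair[0])[1]


# Hypothetical similarity scores between one WKT sense and two WN senses.
TOY_SCORES = {
    ("WKT:plant#n#1", "WN:plant#n#1"): 0.8,
    ("WKT:plant#n#1", "WN:plant#n#2"): 0.3,
}


def toy_similarity(a, b):
    return TOY_SCORES.get((a, b), 0.0)


wn_candidates = ["WN:plant#n#1", "WN:plant#n#2"]
aligned = align("WKT:plant#n#1", wn_candidates, toy_similarity, 0.5)
unaligned = align("WKT:plant#n#1", wn_candidates, toy_similarity, 0.9)
```

Returning only the single best candidate is a simplifying choice for the sketch; a system could equally keep every candidate above the threshold.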

Definitional similarity (gloss similarity)

• Strong baseline
• Falls short when
  – different wordings are used for the same concept
  – two words lack quality glosses

Example (different wordings):

plant -- Buildings for carrying on industrial labor.

plant -- The necessary infrastructure used in support and maintenance of a given facility.
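
To make the weakness concrete, here is a small sketch that scores the two plant glosses above with plain word overlap (Jaccard similarity over content words). Jaccard is used only as a simple stand-in for a gloss-similarity measure; the stopword list and function names are assumptions for this illustration.

```python
import re

# Tiny hand-picked stopword list, an assumption for this example.
STOPWORDS = {"a", "the", "for", "on", "in", "of", "and"}


def content_words(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(gloss_a, gloss_b):
    """Share of content words that the two glosses have in common."""
    words_a, words_b = content_words(gloss_a), content_words(gloss_b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


gloss_1 = "Buildings for carrying on industrial labor."
gloss_2 = ("The necessary infrastructure used in support and "
           "maintenance of a given facility.")
overlap = jaccard(gloss_1, gloss_2)
```

The two glosses share no content word, so a purely lexical measure gives them no credit even though the slide presents them as wordings of the same concept.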

Contributions

• A novel concept similarity measure
• A robust technique for alignment of heterogeneous resources
• An effective ontologization approach

Our approach: SemAlign

• Definition similarity
• Structural similarity

SemAlign: structural similarity

SemAlign: core

Modeling concepts through Semantic Signatures

Semantic Signature of a concept: a Personalized PageRank vector seeded at that concept, i.e., a distributional representation over all concepts in the semantic network
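
A semantic signature of this kind can be sketched with a few lines of power iteration. This is an illustrative toy, not the authors' code: the graph, the damping value of 0.85, the iteration count, and the function name are all assumptions.

```python
def personalized_pagerank(graph, seed, damping=0.85, iterations=50):
    """Power iteration for PageRank with all restart mass on `seed`.

    `graph` maps each node to the list of nodes it links to; every
    linked node must also appear as a key.
    """
    rank = {node: (1.0 if node == seed else 0.0) for node in graph}
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        # Restart: (1 - damping) of the mass jumps back to the seed concept.
        nxt[seed] = 1.0 - damping
        for node, neighbours in graph.items():
            if not neighbours:
                # Dangling node: return its mass to the seed.
                nxt[seed] += damping * rank[node]
                continue
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        rank = nxt
    return rank


# Hypothetical five-concept network with two disconnected components.
toy_graph = {
    "plant": ["factory", "building"],
    "factory": ["plant", "building"],
    "building": ["plant", "factory"],
    "river": ["bank"],
    "bank": ["river"],
}
signature = personalized_pagerank(toy_graph, "plant")
```

The resulting dictionary is a probability distribution over every concept in the network: concepts close to the seed receive most of the mass, while concepts that cannot be reached from it receive none.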

SemAlign: signature unification

[Figure: two semantic signatures over concept_1 to concept_6, one of them built on WordNet]

• Find concepts associated with monosemous words
• Truncate vectors to the overlapping concepts

The reliability of leveraging monosemous words

WordNet synsets containing at least one monosemous word: ~60% (72,000)
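
The two unification steps can be sketched as below, under stated assumptions: each signature is a dictionary from concept identifier to weight, and each resource comes with a map from its monosemous words to the single concept they denote. The identifiers, words, weights, and the function name `unify` are hypothetical.

```python
def unify(sig_a, sig_b, mono_a, mono_b):
    """Project two signatures onto the dimensions they can share.

    `mono_a` and `mono_b` map a monosemous word to its only concept in
    each resource. A word that is monosemous in both resources names one
    shared dimension; all other dimensions are dropped (truncation).
    """
    shared_words = sorted(set(mono_a) & set(mono_b))
    unified_a = {w: sig_a.get(mono_a[w], 0.0) for w in shared_words}
    unified_b = {w: sig_b.get(mono_b[w], 0.0) for w in shared_words}
    return unified_a, unified_b


# Hypothetical signatures over two resources with different concept ids.
sig_wn = {"wn:c1": 0.5, "wn:c2": 0.3, "wn:c3": 0.2}
sig_wkt = {"wkt:k1": 0.6, "wkt:k2": 0.4}
mono_wn = {"windmill": "wn:c1", "sailboat": "wn:c2"}
mono_wkt = {"windmill": "wkt:k1", "submarine": "wkt:k2"}

unified_wn, unified_wkt = unify(sig_wn, sig_wkt, mono_wn, mono_wkt)
```

After unification both vectors are indexed by the same keys, so they can be compared directly.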

Semantic Signature Comparison

Weighted Overlap (Pilehvar et al., ACL 2013)
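
A rank-based comparison in the spirit of Weighted Overlap can be sketched as follows. The formula used here, the sum of 1/(rank_a + rank_b) over shared dimensions divided by the best attainable value sum of 1/(2i), follows the usual description of the measure; ranking only over the shared dimensions and the tie-breaking rule are assumptions of this sketch.

```python
def ranks(signature, dims):
    """Rank `dims` by descending weight in `signature` (1 = heaviest).
    Ties are broken by dimension name to keep the result deterministic."""
    ordered = sorted(dims, key=lambda d: (-signature[d], d))
    return {d: i + 1 for i, d in enumerate(ordered)}


def weighted_overlap(sig_a, sig_b):
    """Compare two signatures by how similarly they rank shared dimensions."""
    shared = set(sig_a) & set(sig_b)
    if not shared:
        return 0.0
    rank_a, rank_b = ranks(sig_a, shared), ranks(sig_b, shared)
    numerator = sum(1.0 / (rank_a[d] + rank_b[d]) for d in shared)
    # Best case: both signatures rank every shared dimension identically.
    denominator = sum(1.0 / (2 * i) for i in range(1, len(shared) + 1))
    return numerator / denominator


same = weighted_overlap({"x": 0.7, "y": 0.3}, {"x": 0.6, "y": 0.4})
swapped = weighted_overlap({"x": 0.7, "y": 0.3}, {"x": 0.2, "y": 0.8})
```

Because only ranks matter, the measure rewards agreement on the top-ranked dimensions and is insensitive to the absolute scale of the weights.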


SemAlign: score combination
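
As a hedged illustration, one natural way to merge the definition and structural scores with a single score combination parameter 𝜷 is a convex combination. The exact form below, including which score 𝜷 weights, is an assumption made for this sketch:

```latex
\mathrm{Sim}(c_1, c_2) = \beta \cdot \mathrm{Sim}_{\mathrm{def}}(c_1, c_2)
  + (1 - \beta) \cdot \mathrm{Sim}_{\mathrm{str}}(c_1, c_2),
  \qquad 0 \le \beta \le 1
```

Under this form, 𝜷 = 1 uses definition similarity only, 𝜷 = 0 uses structural similarity only, and 𝜷 = 0.5 weights the two equally.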

Ontologization of lexical resources

Definition page for windmill

1. A machine which translates linear motion of wind to rotational motion by means of adjustable vanes called sails.

[Figure: windmill.n.1 linked to the concepts in its definition: machine.n.1, translate.v.3, linear motion.n.1, wind.n.1, rotational motion.n.1, adjustable vanes.v.4, sail.n.3]

Ontologization: similarity-based disambiguation

Definition page for windmill

1. A machine which translates linear motion of wind to rotational motion by means of adjustable vanes called sails.

Definition page for sail

1. A trip in a boat, especially a sailboat.
2. A sailing vessel; a vessel of any kind; a craft.
3. The blade of a windmill.
4. A tower-like structure found on the dorsal (topside) surface of submarines.
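
The idea of picking the sense of sail that best matches the windmill definition can be sketched with the definitions above. Plain word overlap is swapped in here for the similarity measure, purely to keep the example self-contained; the stopword list, the function names, and the choice to add the headword "windmill" to the context are assumptions.

```python
import re

# Tiny hand-picked stopword list, an assumption for this example.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "by", "which", "any"}


def content_words(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context, sense_definitions):
    """Return the sense number whose definition shares the most content
    words with `context`."""
    ctx = content_words(context)
    overlaps = {num: len(ctx & content_words(text))
                for num, text in sense_definitions.items()}
    return max(overlaps, key=overlaps.get)


windmill_context = (
    "windmill: A machine which translates linear motion of wind to "
    "rotational motion by means of adjustable vanes called sails."
)
sail_senses = {
    1: "A trip in a boat, especially a sailboat.",
    2: "A sailing vessel; a vessel of any kind; a craft.",
    3: "The blade of a windmill.",
    4: "A tower-like structure found on the dorsal (topside) surface of submarines.",
}
best_sense = disambiguate(windmill_context, sail_senses)
```

Only the third definition shares a content word with the windmill context, so it is selected. A measure that goes beyond exact word matches would be needed when the definitions share no surface vocabulary.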

Ontologization: evaluation

[Figure: F1 and accuracy of our method vs. WKTWSD (Meyer & Gurevych, 2012)]

Alignment Experiments: Datasets

(Matuschek & Gurevych, TACL 2013)

WN synsets manually mapped to their corresponding concepts in each target resource: 320, 484, and 315 synsets

Alignment Experiments: System configurations

Parameter 1: score combination parameter 𝜷

Parameter 2: similarity threshold 𝜽

Setting 𝜽 and 𝜷 (each in the range 0 to 1):

a. Unsupervised system: 0.5
b. Tuning on subset: 0.42, 0.53
c. Cross validation: 0.44, 0.51
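
Tuning the two parameters on labelled pairs can be sketched as a small grid search that maximises F1. This is an illustration under assumptions, not the authors' procedure: the convex combination of the two scores, the 0.1 grid step, the function names, and the toy data are all hypothetical.

```python
from itertools import product


def f1_score(predicted, gold):
    """F1 of boolean alignment decisions against boolean gold labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune(pairs, steps=11):
    """Grid-search beta and theta over [0, 1].

    `pairs` holds (definition_sim, structural_sim, is_aligned) triples.
    Returns (best_f1, beta, theta) for the first best grid point found.
    """
    grid = [i / (steps - 1) for i in range(steps)]
    gold = [label for _, _, label in pairs]
    best = (-1.0, None, None)
    for beta, theta in product(grid, grid):
        # Assumed combination: beta weights the definition score.
        predicted = [beta * d + (1 - beta) * s >= theta for d, s, _ in pairs]
        score = f1_score(predicted, gold)
        if score > best[0]:
            best = (score, beta, theta)
    return best


# Hypothetical tuning subset.
toy_pairs = [
    (0.9, 0.8, True),
    (0.2, 0.7, False),
    (0.6, 0.9, True),
    (0.1, 0.1, False),
]
best_f1, best_beta, best_theta = tune(toy_pairs)
```

For cross-validation the same search would be run on each training fold and the chosen values applied to the held-out fold.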

Comparison system: Dijkstra-WSA (Matuschek & Gurevych, TACL 2013)

Alignment Experiments

F1 by system configuration (the chart also marks the SB and SB+DWSA comparison systems):

                    WN-WP   WN-WT   WN-OW
Unsupervised        0.805   0.712   0.709
Tuning on subset    0.833   0.73    0.733
Cross-validation    0.84    0.722   0.749

Alignment Experiments

[Figure: F1 on WN-WP, WN-WT, and WN-OW for three configurations (Tuning on WN-WP, Tuning on WN-WT, Tuning on WN-OW), with the SB and SB+DWSA comparison systems marked; data labels: 0.824, 0.736, 0.824, 0.723, 0.684, 0.684]

SemAlign: structural similarity

Alignment Experiments (F1):

            WN-WP   WN-WT   WN-OW
DWSA        0.71    0.39    0.473
SemAlign    0.83    0.623   0.627

Conclusions

• A novel approach for aligning lexical resources
  – Accurate even in the absence of training data
  – Robust across different resources
• An effective ontologization approach
• Experiments on aligning WN to WP, WT, and OW

Future directions

• Integrating the approach into BabelNet for boosting the alignment accuracy

• Alignment across different languages

• Updating lexicons with novel terms


1. thanks -- an acknowledgment of appreciation

2. thanks -- with the help of or owing to;

2. thanks -- an expression of gratitude.

2. thanks -- because of; normally used with a positive connotation, though it can be used sarcastically.

WordNet

Thanks!