31
Word Re-Embedding via Manifold Learning Souleiman Hasan Lecturer, Maynooth University, School of Business and Hamilton Institute [email protected] Machine Learning Dublin Meetup, Dublin, Ireland, 26 Feb 2018

Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding via Manifold Learning

Souleiman Hasan Lecturer, Maynooth University, School of Business and

Hamilton Institute

[email protected]

Machine Learning Dublin Meetup, Dublin, Ireland, 26 Feb 2018

Page 2: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Based On

• Souleiman Hasan and Edward Curry. "Word Re-Embedding via Manifold Dimensionality Retention." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)

http://aclweb.org/anthology/D17-1033

2

Page 3: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Outline

• Motivation

– NLP tasks, Semantics, Word embeddings

• Background

– Mathematical Structures, Manifold Learning

• Word Re-embedding

– Methodology and Related Work

– Approach

– Results

3

Page 4: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

NLP Tasks

– Named entity recognition

4

Page 5: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

NLP Tasks

– Sentiment Analysis

5

Page 6: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Generic NLP Supervised Model

6

ML Training Data

Model W

ord

Fe

atu

re

Extr

acti

on

Page 7: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Generic NLP Supervised Model

7

ML Training Data

Model W

ord

Fe

atu

re

Extr

acti

on

Distributed Word Representations

E.g. (Collobert, 2011), (Turian et al., 2010), (Devlin et al., 2014)

Page 8: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Distributed Representations capture Semantics

• Semantics: “The relation between the words or expressions of a language and their meaning.” (Gardenfors, 2004)

Mean

ing

s

Sym

bo

ls

Orange Yellow

{sunset, the tray, my table, Joe’s laptop, that towel, …}

{my car, the sticker, her pen,

my bag, …}

Extensional e.g. Tarski model theory

Conceptual Spaces (Gardendors, 2004)

Orange Yellow

Distributional ESA (Gabrilovich & Markovitch, 2007)

8

Orange Yellow

http://en.wikipedia.org /wiki/Fire

http://en.wikipedia.org / wiki/Sun v1 v2

topology, statistical topology

Image CC BY-SA 3.0 by Michael Horvath https://commons.wikimedia.org/wiki/User:SharkD

Page 9: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Embeddings

• Word2Vec (Mikolov et al., 2013)

• GloVe (Pennington et al., 2014)

9

Page 10: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Embeddings

• Word2Vec (Mikolov et al., 2013)

• GloVe (Pennington et al., 2014)

10 Image is from work created and shared by Google and used according to terms described in the Creative Commons 3.0 Attribution License.

Page 11: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Why Called Embedding?

11 Image CC BY-SA 3.0 by Lars H. Rohwedder, https://commons.wikimedia.org/wiki/User:RokerHRO

Page 12: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Why Called Embedding?

12 Derivative image CC BY-SA 3.0 by Ryan Wilson from Jitse Niesen https://commons.wikimedia.org/wiki/User:Pbroks13; https://en.wikipedia.org/wiki/User:Jitse_Niesen

Page 13: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Mathematical Structures

13

Set

Topological Space

Metric Space

Vector Space

Mo

re Structu

re

{Dublin, Paris, LA}

{Dublin, Paris, LA} {{}, {Dublin, Paris}, {Dublin, Paris, LA}}

{Dublin, Paris, LA} Dublin Paris LA

Dublin 0 779 8314

Paris 779 0 9096

LA 8314 9096 0

{Dublin, Paris, LA} Dublin=53.3498° N+6.2603° W Paris= 48.8566° N+ 2.3522° E LA= 34.0522° N+ 118.2437° W

Page 14: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Mathematical Structures and ML

14

Set

Topological Space

Metric Space

Vector Space

{Dublin, Paris, LA}

{Dublin, Paris, LA} {{}, {Dublin, Paris}, {Dublin, Paris, LA}}

{Dublin, Paris, LA} Dublin Paris LA

Dublin 0 779 8314

Paris 779 0 9096

LA 8314 9096 0

{Dublin, Paris, LA} Dublin=53.3498° N+6.2603° W Paris= 48.8566° N+ 2.3522° E LA= 34.0522° N+ 118.2437° W N dim-> k dim

Dimensionality Reduction

Kernel Methods

Graph Kernels

Emb

edd

ing

Page 15: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

e.g. Earth Surface 2D Embedding

15 Image CC BY-SA 3.0 by Lars H. Rohwedder, https://commons.wikimedia.org/wiki/User:RokerHRO

Page 16: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

e.g. Shape of the Universe

16

Page 17: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Manifold Learning

• Embedding while preserving the neighbourhood

17 Saul, Lawrence K., and Sam T. Roweis. "An introduction to locally linear embedding.“ Available at: http://www. cs. toronto. edu/~ roweis/lle/publications. html(2000).

Page 18: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Manifold Learning for Dimensionality Reduction

18

Page 19: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Manifold Learning- Various Algorithms

19 http://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html

Page 20: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding: Problem

20

Wo

rd P

airs

Em

be

dd

ing

Sim

ilari

ty

bas

ed o

n G

loV

e 4

2B

30

0d

em

bed

din

g

Word Pairs Ground Truth Similarity By WS353 ground truth similarity score

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 1 2 3 4 5 6 7 8 9 10

Pair: “shore”

“woodland”

Embedding sim(“shore”, “woodland”) > sim(“physics”, “proton”)

Despite an opposite order with a big difference in the ground truth

Pair: “physics”

“proton”

Page 21: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding: Problem

21

Wo

rd P

airs

Em

be

dd

ing

Sim

ilari

ty

bas

ed o

n G

loV

e 4

2B

30

0d

em

bed

din

g

Word Pairs Ground Truth Similarity By WS353 ground truth similarity score

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 1 2 3 4 5 6 7 8 9 10

Pair: “shore”

“woodland”

Embedding sim(“shore”, “woodland”) > sim(“physics”, “proton”)

Despite an opposite order with a big difference in the ground truth

Pair: “physics”

“proton”

Can we improve an existing embedding

space?

Page 22: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding: Methodology

22

Page 23: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding: Related Work

– Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a)

– Unified metric recovery framework for word embedding and manifold learning (Hashimoto et al., 2016)

– Manifold learning for dimensionality reduction and embedding: Locally Linear Embedding (LLE) (Roweis and Saul, 2000), Isomap (Balasubramanian and Schwartz, 2002), t-SNE (Maaten and Hinton, 2008), etc.

– Word embedding post-processing: (Labutov and Lipson, 2013), (Lee et al., 2016), (Mu et al., 2017)

– Need for generic, unsupervised, nonlinear, and theoretically-founded model for post-processing

23

Page 24: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Approach

24 Original Word

Embedding Space

Sample

Manifold

Fitting Window

fit

Manifold

LearningNew

Embedding

tra

nd

form

window start

window

length

Test Vector

Re-Embedded

Test Vector

dimensionality d

dimensionality d

1

2

3

4

Vo

ca

bu

lary

(fr

eq

ue

ncy o

rde

red

)

Page 25: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Results

25

Wo

rd P

airs

Sim

ilari

ty

bas

ed o

n G

loV

e 4

2B

30

0d

em

bed

din

g, a

nd

n

orm

aliz

ed t

o u

nit

mea

ns

Ori

gin

al E

mb

edd

ing

Man

ifo

ld R

e-Em

bed

din

g

Word Pairs Ground Truth Similarity By WS353 ground truth similarity score

-0.005

-1E-17

0.005

0.01

0.015

0.02

0.025

0.03

0 1 2 3 4 5 6 7 8 9 10

Pair: “shore” “woodland”

Pair: “physics” “proton”

Re-embedding by Manifold learning increased the similarity estimation

between nearby words, and decreased the similarity estimation between distant words, resulting in a higher correlation with human judgement

Page 26: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding: Results

26

Page 27: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding: Results

27

Page 28: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding: Results

28

Page 29: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding: Results

29

Page 30: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Word Re-Embedding: Results

30

Page 31: Word Re-Embedding via Manifold Learning · 2020. 3. 13. · Word Re-Embedding: Related Work – Word embedding: Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014a) –

Conclusions

– Word re-embedding improves performance on word similarity tasks

– The sample window start should be chosen just after the stop words

– The sample length should be close or equal to the number of local neighbours, which in turn can be chosen from a wide range

– The dimensionality of the original embedding space should be retained

31