Jointly Learning Word and Phrase Embeddings Using
Neural Networks and Implicit Tensor Factorization
Kazuma Hashimoto
Tsuruoka Laboratory, University of Tokyo
19/06/2015 Talk@UCL Machine Reading Lab.

Self Introduction

• Name
– Kazuma Hashimoto (橋本 和真 in Japanese)
– http://www.logos.t.u-tokyo.ac.jp/~hassy/
• Affiliation
– Tsuruoka Laboratory, University of Tokyo
• April 2015 – present: Ph.D. student
• April 2013 – March 2015: Master's student
– National Centre for Text Mining (NaCTeM)
• Research Interests
– Word/phrase/document embeddings and their applications

Today's Agenda

1. Background
– Word and Phrase Embeddings
2. Jointly Learning Word and Phrase Embeddings
– General Idea
3. Our Methods Focusing on Transitive Verb Phrases
– Word Prediction (EMNLP 2014)
– Implicit Tensor Factorization (CVSC 2015)
4. Experiments and Results
5. Summary

Assigning Vectors to Words

• Word: string → index → vector
• Why vectors?
– Word similarities can be measured using distance metrics on the vectors (e.g., the cosine similarity)

[Figure: words embedded in a vector space; similar words such as cause/trigger, disease/disorder, and animal/mouse/rat cluster together]

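As a concrete illustration, here is a minimal Python sketch of measuring word similarity with the cosine metric; the 3-dimensional example vectors are made up for illustration only.

    import numpy as np

    def cosine_similarity(a, b):
        # cos(a, b) = (a . b) / (|a| |b|)
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Hypothetical embeddings, chosen only to illustrate the idea
    embeddings = {
        "cause":   np.array([0.9, 0.1, 0.0]),
        "trigger": np.array([0.8, 0.2, 0.1]),
        "mouse":   np.array([0.0, 0.9, 0.3]),
    }

    print(cosine_similarity(embeddings["cause"], embeddings["trigger"]))  # high
    print(cosine_similarity(embeddings["cause"], embeddings["mouse"]))    # low
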
Approaches to Word Representations

• Two approaches using large corpora (see Baroni+ (2014) for a systematic comparison):
– Count-based approach
• e.g., reducing the dimensionality of a word co-occurrence matrix using SVD
– Prediction-based approach
• e.g., predicting words from their contexts using neural networks
• We focus on the prediction-based approach
– Why?

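A minimal sketch of the count-based approach, assuming a toy word-by-context co-occurrence matrix; truncated SVD yields the low-dimensional word vectors.

    import numpy as np

    # Toy co-occurrence counts (rows: target words, columns: context words)
    counts = np.array([[10.0, 2.0, 0.0],
                       [ 8.0, 3.0, 1.0],
                       [ 0.0, 1.0, 9.0]])

    # Truncated SVD: keep the top-k singular vectors as word embeddings
    U, S, Vt = np.linalg.svd(counts, full_matrices=False)
    k = 2
    word_vectors = U[:, :k] * S[:k]  # scale each dimension by its singular value
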
Learning Word Embeddings

• Prediction-based approaches usually
– parameterize the word embeddings
– learn them based on co-occurrence statistics
• Word embeddings appearing in similar contexts get close to each other

[Figure: the SkipGram model (Mikolov+, 2013) in word2vec, where a target word's embedding is used to predict its surrounding words in text data such as "… the prevalence of drunken driving and accidents caused by drinking …"]

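A minimal sketch of one SkipGram training step with negative sampling; the vocabulary size, dimensionality, and learning rate are made-up values, and word2vec's real implementation adds details such as subsampling and a unigram noise distribution.

    import numpy as np

    rng = np.random.default_rng(0)
    V, d = 1000, 50                             # made-up vocabulary size and dimensionality
    W_in = rng.normal(scale=0.1, size=(V, d))   # target-word embeddings
    W_out = rng.normal(scale=0.1, size=(V, d))  # context-word embeddings

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_step(target, context, negatives, lr=0.025):
        # Push the target embedding toward its observed context word
        # (label 1) and away from randomly sampled words (label 0).
        v = W_in[target].copy()
        grad_v = np.zeros(d)
        for w, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
            g = label - sigmoid(np.dot(v, W_out[w]))
            grad_v += g * W_out[w]
            W_out[w] += lr * g * v
        W_in[target] += lr * grad_v

    sgns_step(target=3, context=17, negatives=[42, 99, 512])
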
Task-Oriented Word Embeddings

• Learning word embeddings for relation classification
– To appear at CoNLL 2015 (just advertising)

Beyond Word Embeddings

• Treating phrases and sentences as well as words
– gaining much attention recently!

[Figure: phrases embedded in a vector space; "make payment" and "pay money" lie close together]

Approaches to Phrase Embeddings

• Element-wise addition/multiplication (Lapata+, 2010)
– $v(\text{sentence}) = \sum_i v(w_i)$
• Recursive autoencoders (Socher+, 2011; Hermann+, 2013)
– Using parse trees
– $v(\text{parent}) = f(v(\text{left child}), v(\text{right child}))$
• Tensor/matrix-based methods
– $v(\text{adj noun}) = M(\text{adj})\, v(\text{noun})$ (Baroni+, 2010)
– $M(\text{verb}) = \sum_i v(\text{subj}_i) \otimes v(\text{obj}_i)$ (Grefenstette+, 2011)
• $M(\text{subj, verb, obj}) = \{v(\text{subj}) \otimes v(\text{obj})\} \odot M(\text{verb})$
• $v(\text{subj, verb, obj}) = \{M(\text{verb})\, v(\text{obj})\} \odot v(\text{subj})$ (Kartsaklis+, 2012)

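A sketch of these composition functions in NumPy, using random vectors as stand-ins for trained embeddings; the dimensionality is arbitrary.

    import numpy as np

    d = 4
    rng = np.random.default_rng(1)
    v_subj, v_verb, v_obj, v_noun = (rng.normal(size=d) for _ in range(4))

    # Element-wise addition (Lapata+, 2010)
    v_phrase = v_subj + v_verb + v_obj

    # Adjective as a matrix acting on the noun vector (Baroni+, 2010)
    M_adj = rng.normal(size=(d, d))
    v_adj_noun = M_adj @ v_noun

    # Verb matrix from outer products of its observed subject/object
    # pairs (Grefenstette+, 2011); summed over corpus occurrences in practice
    M_verb = np.outer(v_subj, v_obj)

    # Copy-subject composition (Kartsaklis+, 2012)
    v_svo = v_subj * (M_verb @ v_obj)
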
Which Word Embeddings are the Best?

• Co-occurrence matrix + SVD
• C&W (Collobert+, 2011)
• RNNLM (Mikolov+, 2013)
• SkipGram/CBOW (Mikolov+, 2013)
• vLBL/ivLBL (Mnih+, 2013)
• Dependency-based SkipGram (Levy+, 2014)
• GloVe (Pennington+, 2014)

Which word embeddings should we use for which composition methods?
→ Joint learning

Co-Occurrence Statistics of Phrases

• Word co-occurrence statistics → word embeddings
• How about phrase embeddings?
– Phrase co-occurrence statistics!

Example: "The importer made payment in his own domestic currency" and "The businessman pays his monthly fee in yen" appear in similar contexts → similar meanings?

How to Identify Phrase-Word Relations?

• Using Predicate-Argument Structures (PAS)
– Enju parser (Miyao+, 2008)
• Analyzes relations between phrases and words

[Figure: a parse of "The importer made payment in his own domestic currency", where the predicates (the verb "made" and the preposition "in") take noun phrases and a verb phrase as arguments]

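A sketch of extracting training tuples from parser output, assuming a deliberately simplified predicate-argument format; Enju's actual output format differs, so the tuple layout and type names here are hypothetical.

    # Hypothetical, simplified predicate-argument structures for one sentence
    pas = [
        ("make", "verb_arg12", "importer", "payment"),  # verb with subject and object
        ("in", "prep_arg12", "make", "currency"),       # preposition linking a verb phrase and a noun
    ]

    def extract_tuples(pas):
        # Collect (subj, verb, obj) tuples and prepositional adjuncts
        svos, adjuncts = [], []
        for pred, ptype, arg1, arg2 in pas:
            if ptype == "verb_arg12":
                svos.append((arg1, pred, arg2))
            elif ptype == "prep_arg12":
                adjuncts.append((pred, arg1, arg2))
        return svos, adjuncts

    print(extract_tuples(pas))
    # ([('importer', 'make', 'payment')], [('in', 'make', 'currency')])
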
Why Transitive Verb Phrases?

• The meanings of transitive verbs are affected by their arguments
– e.g., run, make, etc.
→ A good target for testing composition models

[Figure: "make" shifts its meaning with its object; "make payment" ≈ "pay", "make money" ≈ "earn", "make use (of)" ≈ "use"]

Possible Application: Semantic Search

• Embedding subject-verb-object tuples in a vector space
– Semantic similarities between SVO tuples can be used!

Training Data from Large Corpora

• Focusing on the role of prepositional adjuncts
– Prepositional adjuncts complement the meanings of verb phrases → they should be useful

[Figure: text data (English Wikipedia, BNC, etc.) is parsed, and the resulting predicate-argument structures are simplified into training tuples]

How do we model the relationships between predicates and arguments?

Word Prediction Model (like word2vec): PAS-CLBLM

• Predicting words in predicate-argument tuples
– e.g., given the tuple "[importer make payment] in currency", the word "currency" is predicted using the feature vector
$\mathbf{p} = \tanh(\mathbf{h}_{\text{arg1}}^{\text{prep}} \odot \mathbf{v}_{\text{arg1}} + \mathbf{h}_{\text{pred}}^{\text{prep}} \odot \mathbf{v}_{\text{pred}})$
where $\mathbf{v}_{\text{arg1}}$ embeds "[importer make payment]" and $\mathbf{v}_{\text{pred}}$ embeds "in"
– Cost function: $\max(0,\ 1 - s(\text{currency}) + s(\text{furniture}))$, where "furniture" is a negatively sampled word

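A minimal sketch of the feature vector and cost function above, assuming the score s(w) is the dot product between the feature vector p and the candidate word's embedding; random vectors stand in for trained parameters.

    import numpy as np

    d = 50
    rng = np.random.default_rng(2)
    v_arg1 = rng.normal(size=d)       # embedding of "[importer make payment]"
    v_pred = rng.normal(size=d)       # embedding of the preposition "in"
    h_arg1 = rng.normal(size=d)       # weight vector h_arg1^prep
    h_pred = rng.normal(size=d)       # weight vector h_pred^prep
    v_currency = rng.normal(size=d)   # observed word
    v_furniture = rng.normal(size=d)  # negative sample

    # Feature vector for predicting the missing word
    p = np.tanh(h_arg1 * v_arg1 + h_pred * v_pred)

    def s(v_word):
        # Assumed scoring function: dot product with the feature vector
        return np.dot(p, v_word)

    # Hinge cost: the observed word should outscore the negative
    # sample by a margin of 1
    cost = max(0.0, 1.0 - s(v_currency) + s(v_furniture))
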
How to Compute SVO Embeddings?

• Two methods:
– (a) assigning a parameterized vector to each SVO tuple
– (b) composing SVO embeddings from the subject, verb, and object vectors

[Figure: (a) one vector for the whole tuple "[importer make payment]" vs. (b) subj + verb + obj composed into a vector for "[importer make payment]"]

Weakness of PAS-CLBLM

• Only element-wise vector operations
– Pros: fast training
– Cons: poor interaction between predicates and arguments
• Interactions between predicates and arguments are important for transitive verbs

[Figure: again, "make payment" ≈ "pay", "make money" ≈ "earn", "make use (of)" ≈ "use"]

Focusing on Tensor-Based Approaches

• Tensor/matrix-based approaches (noun: vector)
– Adjective: matrix (Baroni+, 2010)
– Transitive verb: matrix (Grefenstette+, 2011; Van de Cruys+, 2013)

[Figure: a tensor of co-occurrence statistics, e.g. $\mathrm{PMI}(\text{importer}, \text{make}, \text{payment}) = 0.31$, is approximated with a $d \times d$ verb matrix and given, pre-trained subject/object vectors]

Implicit Tensor Factorization (1)

• Parameterizing
– the predicate matrices, and
– the argument embeddings

[Figure: the same factorization, now over (predicate, argument 1, argument 2) tuples, with a $d \times d$ matrix per predicate and a $d$-dimensional vector per argument]

Implicit Tensor Factorization (2)

• Calculating plausibility scores $T(i, j, k)$
– using the predicate matrices and argument embeddings: the score of a tuple $(i, j, k)$ is computed from predicate $i$'s matrix and the embeddings of arguments $j$ and $k$

Implicit Tensor Factorization (3)

• Learning the model parameters
– using a plausibility judgment task
• Observed tuple: (i, j, k)
• Collapsed tuples: (i', j, k), (i, j', k), (i, j, k')
– Negative sampling (Mikolov+, 2013) over the collapsed tuples defines the cost function

Example

• Discriminating between observed and collapsed tuples:
– (i, j, k) = (in, importer make payment, currency)   ← observed
– (i', j, k) = (on, importer make payment, currency)
– (i, j', k) = (in, child eat pizza, currency)
– (i, j, k') = (in, importer make payment, furniture)

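A sketch of the score and cost computation, assuming a bilinear form T(i, j, k) = a1_j^T M_i a2_k; the exact parameterization in the CVSC 2015 paper may differ, and the sizes below are made up.

    import numpy as np

    d, n_pred, n_arg = 50, 100, 1000
    rng = np.random.default_rng(3)
    M = rng.normal(scale=0.1, size=(n_pred, d, d))  # one d x d matrix per predicate
    A1 = rng.normal(scale=0.1, size=(n_arg, d))     # argument-1 embeddings
    A2 = rng.normal(scale=0.1, size=(n_arg, d))     # argument-2 embeddings

    def plausibility(i, j, k):
        # Assumed bilinear score T(i, j, k) of predicate i with arguments j, k
        return A1[j] @ M[i] @ A2[k]

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cost(observed, collapsed):
        # Negative sampling: the observed tuple should score high,
        # the collapsed (corrupted) tuples low
        total = -np.log(sigmoid(plausibility(*observed)))
        for tup in collapsed:
            total -= np.log(sigmoid(-plausibility(*tup)))
        return total
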
How to Compute SVO Embeddings?

• Two methods:
– (a) assigning a parameterized vector to each SVO tuple
– (b) composing SVO embeddings from the word vectors and the parameterized verb matrices, using the copy-subject function (Kartsaklis+, 2012)

Why the Copy-Subject Function?

• The function is presented in Kartsaklis+ (2012)
– using the verb matrices of Grefenstette+ (2011)
• Our verb matrices are related to those of Grefenstette+ (2011)
• The function can compute
– verb-object phrase embeddings
– subject-verb-object phrase embeddings

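A sketch of the copy-subject composition, following the Kartsaklis+ (2012) definition given earlier; the same learned verb matrix yields both phrase types.

    import numpy as np

    def verb_object(M_verb, v_obj):
        # Verb-object phrase embedding from the verb matrix
        return M_verb @ v_obj

    def copy_subject(v_subj, M_verb, v_obj):
        # v(subj, verb, obj) = (M(verb) v(obj)) elementwise-multiplied with v(subj)
        return verb_object(M_verb, v_obj) * v_subj
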
Experimental Settings

• Training corpus: English Wikipedia
– SVO data: 23.6 million instances
– SVO-preposition-noun data: 17.3 million instances
• Parameter initialization: random values
• Optimization: mini-batch AdaGrad (Duchi+, 2011)
• Embedding dimensionality
– PAS-CLBLM: 200
– Tensor method: 50
• The number of model parameters of PAS-CLBLM is slightly larger than that of the tensor method

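A minimal AdaGrad sketch; the learning rate and parameter shapes are illustrative, not the values used in the experiments.

    import numpy as np

    class AdaGrad:
        # Per-dimension learning rates that shrink with the
        # accumulated squared gradients (Duchi+, 2011)
        def __init__(self, shape, lr=0.05, eps=1e-8):
            self.lr = lr
            self.eps = eps
            self.sq_sum = np.zeros(shape)

        def update(self, params, grad):
            self.sq_sum += grad ** 2
            params -= self.lr * grad / (np.sqrt(self.sq_sum) + self.eps)

    # Usage: one optimizer per parameter block, fed mini-batch gradients
    params = np.zeros((1000, 50))
    opt = AdaGrad(shape=params.shape)
    opt.update(params, grad=np.full((1000, 50), 0.01))
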
Examples of Learned SVO Embeddings

• Case 1: assigning a vector to each SVO tuple
– Adjuncts seem to be helpful in learning the meanings of verb phrases
– However, this approach discards information about the individual words!

Examples of Learned SVO Embeddings

• Case 2: composing SVO embeddings
– Tensor method (CVSC 2015): more flexible!
– PAS-CLBLM (EMNLP 2014): strongly enhances the head word

Multiple Meanings in Verb Matrices

• In the latest approach, the learned verb matrices capture multiple meanings

Verb Sense Disambiguation Task

• Measuring semantic similarities of verb pairs taking the same subjects and objects (Grefenstette+, 2011)
– Evaluation: Spearman's rank correlation between similarity scores and human ratings

Verb pair (with subject & object)                 Human rating
student write name / student spell name           7
child show sign / child express sign              6
system meet criterion / system visit criterion    1

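As an illustration, a minimal sketch of computing the evaluation metric with SciPy; the model scores below are made up.

    from scipy.stats import spearmanr

    # Hypothetical system scores for the three verb pairs above,
    # compared against the human ratings
    model_scores = [0.92, 0.75, 0.10]
    human_ratings = [7, 6, 1]

    rho, _ = spearmanr(model_scores, human_ratings)
    print(rho)  # Spearman's rank correlation, the evaluation metric
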
Results

• State-of-the-art results on the disambiguation task
– Prepositional adjuncts improve the results
• How about other kinds of adjuncts?

Method                                Spearman's ρ
Tensor (only verb data)               0.480
Tensor (verb and preposition data)    0.614
PAS-CLBLM (this experiment)           0.374
Milajevs+, 2014                       0.456
Hashimoto+, 2014                      0.422

Future work: improving real-world applications using the method

Summary

• Word and phrase embeddings are jointly learned using large corpora parsed by syntactic parsers
– The tensor-based method is suitable for verb sense disambiguation
– Adjuncts are useful in learning verb phrase embeddings
• Future directions:
– improving the embedding methods
– applying them to real-world NLP applications
• What kind of information should be captured?