Deep Learning for Sentence Representation
Internship Project Summary
Yonatan Belinkov, IBM Research - Haifa, Summer 2015
Goals
• Develop deep learning methods for representing natural language sentences from text
• Acquire knowledge in deep learning tools and techniques
Background
• Vector representations (embeddings) for words and sentences
• Supervised vs unsupervised approaches
• Neural network architectures: Recursive (RecNN), Convolutional (CNN), Recurrent (RNN)
RecNN: [Figure: parse-tree composition over "The cat sat on the mat" with NP, PP, VP, S nodes]
CNN: [Figure: convolution over the word sequence "The cat sat on the mat"]
RNN: [Figure: recurrent left-to-right processing of "The cat sat on the mat"]
Autoencoder Formulation
• Given a sentence that is a sequence of word vectors w1...wn, each of dimension d:
  § Encode the sentence into a single vector representation
  § Decode the representation back into the sentence
• During training
  § Get feedback from the original sentence and propagate it through the network to learn parameters
• During testing
  § Compare the decoded sentence to the original one
Basic RNN Model
• LSTM encoder-decoder (from Li, 2015)
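As a rough illustration, a minimal Torch sketch of such an encoder-decoder, assuming the Element-Research rnn package; the sizes, variable names, and teacher-forcing setup are illustrative, not the project's exact configuration.

require 'nn'
require 'rnn'  -- Element-Research rnn package (assumed available)

local vocabSize, dim = 20000, 1000  -- 1000-dim vectors, as in the slides

-- Encoder: embed words, run an LSTM, keep the final hidden state.
-- input: seqLen tensor of word indices -> output: dim-sized sentence vector
local encoder = nn.Sequential()
  :add(nn.LookupTable(vocabSize, dim))
  :add(nn.SplitTable(1))                 -- seqLen x dim -> table of vectors
  :add(nn.Sequencer(nn.LSTM(dim, dim)))
  :add(nn.SelectTable(-1))               -- sentence vector = last LSTM output

-- Decoder: an LSTM that re-predicts the words, one softmax per step;
-- at training time it is fed the sentence vector and then the embedded
-- gold previous words (teacher forcing)
local decoder = nn.Sequential()
  :add(nn.Sequencer(nn.LSTM(dim, dim)))
  :add(nn.Sequencer(nn.Sequential()
    :add(nn.Linear(dim, vocabSize))
    :add(nn.LogSoftMax())))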
CNN Encoder
• Sentence = n × d matrix of word vectors (rows are words, columns are embedding dimensions); convolution can run along the time dimension (across words) or the word dimension (within each word):

  w11 w12 … w1d
  w21 w22 … w2d
  …   …   …  …
  wn1 wn2 … wnd

• Time dimension (coarse-grained): each filter spans all embedding dimensions (Kim 2014)
  Torch: nn.TemporalConvolution; #params = embeddingDim * numFilters * filterWidth
• Time dimension (fine-grained): convolve each embedding dimension independently (Kalchbrenner 2014)
  Torch: nn.SpatialConvolution; #params = 1 * numFilters * filterWidth
• Word dimension (fine-grained): convolve each word independently (???)
  Torch: nn.SpatialConvolution; #params = 1 * numFilters * filterWidth
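A minimal sketch of the three variants above in Torch, assuming the input is the n × d matrix of word vectors (reshaped to 1 × n × d for the spatial cases); the filter count and width are illustrative.

require 'nn'

local d, nFilters, width = 1000, 100, 3  -- illustrative sizes

-- Time, coarse-grained (Kim 2014): each filter sees all d embedding dims.
-- input: n x d, output: (n - width + 1) x nFilters
local coarse = nn.TemporalConvolution(d, nFilters, width)

-- Time, fine-grained (Kalchbrenner 2014): 1-dimensional filters slid down
-- each embedding dimension independently.
-- input: 1 x n x d, output: nFilters x (n - width + 1) x d
local fineTime = nn.SpatialConvolution(1, nFilters, 1, width)

-- Word dimension, fine-grained: filters slid across the dims of each word.
-- input: 1 x n x d, output: nFilters x n x (d - width + 1)
local fineWord = nn.SpatialConvolution(1, nFilters, width, 1)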
Loss Functions
• Log-likelihood of predicted words from the decoder
  § Penalizes every wrong word
  § Word order matters
• Cosine distance between bag-of-words representations of gold and predicted sentences
  § Representations are the size of the vocabulary
  § Word order doesn't matter
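A rough Torch sketch of the two losses (assumed, not the project's exact code): per-word negative log-likelihood over the decoder's log-softmax outputs, and a cosine criterion over vocabulary-sized bag-of-words vectors; the bagOfWords helper is hypothetical.

require 'nn'
require 'rnn'  -- for nn.SequencerCriterion (assumed available)

-- (1) Log-likelihood: penalize each predicted word at each time step
local wordLoss = nn.SequencerCriterion(nn.ClassNLLCriterion())

-- (2) Cosine loss between bag-of-words vectors; target y = 1 pulls the
-- gold and predicted bags together, ignoring word order
local bowLoss = nn.CosineEmbeddingCriterion()

-- hypothetical helper: word-index tensor -> |V|-sized count vector
local function bagOfWords(indices, vocabSize)
  local bow = torch.zeros(vocabSize)
  for i = 1, indices:size(1) do
    bow[indices[i]] = bow[indices[i]] + 1
  end
  return bow
end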
Implementation Details
• Torch
• Minimal preprocessing of sentences
• Optimization with AdaGrad
• Dropout
• 1000 dimensions for word and sentence vectors
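A minimal AdaGrad training step in Torch, assuming the optim package and a model/criterion built as above; the learning rate is illustrative. Any nn.Dropout layers in the model are active because the model is put in training mode.

require 'optim'

local params, gradParams = model:getParameters()
local adaConfig = {learningRate = 0.01}  -- illustrative setting

-- one AdaGrad step on a single (input, target) pair
local function trainStep(input, target)
  model:training()  -- enables nn.Dropout layers
  local function feval(p)
    if p ~= params then params:copy(p) end
    gradParams:zero()
    local output = model:forward(input)
    local loss = criterion:forward(output, target)
    model:backward(input, criterion:backward(output, target))
    return loss, gradParams
  end
  local _, fx = optim.adagrad(feval, params, adaConfig)
  return fx[1]  -- loss value
end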
Data
• Hotel reviews
  § “we were the only people on our floor who spoke english”
  § “first rate ! the rooms look like they have been recently renovated .”
  § “recently stayed at the colonnade .”
• Dataset sizes (# sentences):

  Train set   Validation set   Test set
  10K-1M      100              100
Quantitative Evaluation
Machine translation metrics: how well the decoded sentence “translates” the original sentence

Encoder                      Train size   BLEU   Meteor   Val error   Train error
LSTM                         100K         39.3   32.9     27.3         9.3
LSTM (drop 0.1)              1M           55.2   42.6     12.5        10.4
LSTM (drop 0.3)              1M           63.9   45.1     14.7         7.0
CNN (word)                   100K          0.8    5.3     62.9        55.2
CNN (time, fine-grained)     100K          0.6    6.2     53.8        49.1
CNN (time, coarse-grained)   100K         16.6   20.7     39.0        26.0
More Observations
• Bag-of-words based loss did not help
• Preliminary results on Wikipedia are much lower
• Possible explanations
  § Open domain, larger vocabulary, longer sentences

Model       Train size     BLEU   Meteor
LSTM-LSTM   1M sentences   18.5   21.8
Qualitative Evaluation
• Run the trained model on unseen sentences
• Compare original and decoded sentences

1 Gold:      we were the only people on our floor who spoke english
  Predicted: we were only the people who on our floor group seemed on top ,
2 Gold:      which was nice . the place needs updated ,
  Predicted: which was nice . the place needs updating ,
3 Gold:      but it's not horrible .
  Predicted: but it's not horrible .
4 Gold:      recently stayed at the colonnade .
  Predicted: recently stayed at the conrad .
5 Gold:      i must say i was extremely impressed with the staff and overall appearance of the hotel .
  Predicted: i must say i was extremely impressed with the cleanliness and helpfulness of the staff overall .
6 Gold:      i would definitely stay here again and would recommend this hotel to family and friends .
  Predicted: i would definitely stay here again and would recommend this hotel to friends and family
Qualitative Evaluation
• Run the trained model on train sentences
• Create vector representations for the train sentences
• Cluster the vectors with k-means (sketched below)
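A minimal clustering sketch, assuming the Torch unsup package and an N × 1000 tensor sentVecs of encoder outputs (both assumptions); k and the iteration count are illustrative.

require 'unsup'  -- provides unsup.kmeans (assumed available)

local k = 50  -- illustrative number of clusters
local centroids = unsup.kmeans(sentVecs, k, 100)  -- 100 iterations

-- assign each sentence to its nearest centroid
local assignments = torch.LongTensor(sentVecs:size(1))
for i = 1, sentVecs:size(1) do
  local diff = centroids - sentVecs[i]:view(1, -1):expandAs(centroids)
  local _, idx = diff:norm(2, 2):min(1)
  assignments[i] = idx[1][1]
end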
LSTM Encoder Clusters
'i would definitely stay here again . i love it !'
'but i would stay here again . '
'i think i would stay here again '
'i would ( and will ) stay here again . '
'i would 100% stay here again . '
'i would come here again . '
'i would consider staying here again . '
'would i stay here again ? '

'our staff was friendly and very fast to help us . '
'but the staff was very friendly and accommodating . '
'staff in the reception was very friendly . '
'the check in staff was very friendly and helpful . '
'the construction was complete . the staff is very friendly and helpful '
'the hotel staff was very friendly and open to helping make dinner '
'the internet was free and the staff was very friendly . '

'and the hotel is in an excellent location . '
'hotel is in a great location - nothing wrong with the neighbourhood . '
'the back bay hotel is in a great location '
'the edison hotel is in a perfect location '
'the hotel circle is in a good location i think '
'the hotel is a fine hotel in a great area '
'the hotel is huge and in a good downtown location . '

CNN Encoder Clusters
'great location !'
'location !'
'location location location !'
'cute and great location !'
'great stay and fabulous location !'
'staff and location !'
'wonderful location !'

'fresh fruit pastries etc .'
'coffee shops etc . '
'outback steak house etc . '
'dinner walk around etc . '
'french toast pancakes fresh fruit etc .'
'bread toast etc .'
'a whole foods grocery etc . '

'the room was very clean '
'the room was very big '
'the room was very spacious by new york hotel standards '
'the room i recieved was very spacious '
'the decor of the room was very nice and modern '
'cons : room was very small '
'the hotel room was very nice '
'i liked the location and the room was very nice '
'the king room at the back of the hotel was very quiet '
'the room as very modern '
Observations
• Clusters tend to differ by topic (hotel, location, staff)
• Certain bias towards the beginning of the sentence, especially in the pure LSTM model
• Sometimes fails to capture negation
• LSTM prefers full sentences; CNN also forms clusters of words and sentences
Qualitative Evaluation
[Figure: distances in the 10 most dense clusters]
Future Work
• General domain model from Wikipedia
• Improvements to LSTM implementations
  § Attention mechanism?
• Supervised tasks (question similarity, answer selection)
  § Use autoencoder representation as fixed features
  § Add a supervised classification layer
• Better CNN models, also on fine-grained levels
  § Deal with locality of convolution
• Combine LSTM and CNN during encoding
• Decode with CNNs → variable-length sentences?