Deep Learning for Sentence Representation
Internship Project Summary
Yonatan Belinkov, IBM Research - Haifa, Summer 2015


Goals

• Develop deep learning methods for representing natural language sentences from text
• Acquire knowledge of deep learning tools and techniques

Background

• Vector representations (embeddings) for words and sentences
• Supervised vs. unsupervised approaches
• Neural network architectures: recursive (RecNN), convolutional (CNN), recurrent (RNN)

RecNN

(figure: recursive composition over the parse tree of "The cat sat on the mat", with NP, PP, VP, and S nodes)

CNN

(figure: convolution over the word sequence "The cat sat on the mat")

RNN

(figure: recurrent processing of the word sequence "The cat sat on the mat")

Autoencoder Formulation

• Given a sentence that is a sequence of word vectors w1...wn, each of dimension d:
  § Encode the sentence into a single vector representation
  § Decode the representation back into the sentence
• During training
  § Get feedback from the original sentence and propagate it through the network to learn the parameters
• During testing
  § Compare the decoded sentence to the original one

(The formulation is restated compactly below.)
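Written out (a compact restatement of the bullets above; enc and dec stand for whichever encoder and decoder architecture is plugged in, and the per-word log-likelihood shown is one of the loss options discussed later):

    s = \mathrm{enc}(w_1, \dots, w_n) \in \mathbb{R}^{d}
    \hat{w}_1, \dots, \hat{w}_n = \mathrm{dec}(s)
    \mathcal{L}(\theta) = -\sum_{t=1}^{n} \log p_\theta(w_t \mid s, w_{<t})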

Basic RNN Model

• LSTM encoder-decoder (from Li, 2015), sketched below
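As a rough illustration, such an encoder-decoder can be wired up in Torch as follows. This is a sketch only: it assumes the Element-Research rnn package, the vocabulary size is hypothetical, the dimensions follow the talk's 1000-dim setup, and the wiring that feeds the sentence vector into the decoder's initial state is elided.

    require 'rnn'  -- Element-Research rnn package (assumption)

    local vocabSize, dim = 50000, 1000  -- vocabSize is hypothetical

    -- Encoder: embed each word, run an LSTM over the sequence, and
    -- keep the final hidden state as the sentence vector.
    local enc = nn.Sequential()
      :add(nn.LookupTable(vocabSize, dim))
      :add(nn.SplitTable(1))                -- tensor -> table of word vectors
      :add(nn.Sequencer(nn.LSTM(dim, dim)))
      :add(nn.SelectTable(-1))              -- last hidden state

    -- Decoder: a second LSTM that predicts one word per time step
    -- (teacher forcing during training); initializing its state from
    -- the sentence vector is omitted here.
    local dec = nn.Sequential()
      :add(nn.Sequencer(nn.LSTM(dim, dim)))
      :add(nn.Sequencer(nn.Linear(dim, vocabSize)))
      :add(nn.Sequencer(nn.LogSoftMax()))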

CNN Encoder

A sentence is an n x d matrix: the time dimension runs over the n words, the word dimension over the d embedding components.

  w11 w12 ... w1d
  w21 w22 ... w2d
  ...
  wn1 wn2 ... wnd

Three ways to convolve it (sketched in code after this list):

• Time dimension (coarse-grained): each filter spans all word embedding dimensions (Kim 2014). Torch: nn.TemporalConvolution. #params = embeddingDim * numFilters * filterWidth
• Time dimension (fine-grained): convolve each embedding dimension independently (Kalchbrenner 2014). Torch: nn.SpatialConvolution. #params = 1 * numFilters * filterWidth
• Word dimension (fine-grained): convolve each word independently (???). Torch: nn.SpatialConvolution. #params = 1 * numFilters * filterWidth
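The three variants map onto Torch modules roughly as below. A sketch only: the filter count and width are hypothetical, and the fine-grained variants assume the sentence matrix is fed as a single-plane spatial input.

    require 'nn'

    local d, numFilters, k = 1000, 500, 3  -- hypothetical sizes

    -- Coarse-grained over time: input is n x d; every filter sees all
    -- d embedding dimensions at once.
    -- #params = d * numFilters * k  (plus numFilters biases)
    local coarse = nn.TemporalConvolution(d, numFilters, k)

    -- Fine-grained over time: treat the n x d matrix as a one-plane
    -- "image" and slide a 1-wide, k-tall filter down each column, so
    -- every embedding dimension is convolved independently.
    -- #params = 1 * numFilters * k
    local fineTime = nn.SpatialConvolution(1, numFilters, 1, k)

    -- Word dimension: same module with the filter turned sideways
    -- (k wide, 1 tall), so every word vector is convolved independently.
    -- #params = 1 * numFilters * k
    local fineWord = nn.SpatialConvolution(1, numFilters, k, 1)

With these sizes the counts differ sharply: 1000 * 500 * 3 = 1.5M weights for the coarse filter bank versus 500 * 3 = 1,500 for either fine-grained variant.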

Loss Functions

• Log-likelihood of the words predicted by the decoder (see the sketch below)
  § Penalizes every wrong word
  § Word order matters
• Cosine distance between bag-of-words representations of the gold and predicted sentences
  § Representations are the size of the vocabulary
  § Word order doesn't matter
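In Torch terms, the two losses correspond roughly to the following criteria (a sketch; the bag-of-words vectors are assumed to be vocabulary-sized count vectors built by the caller):

    require 'nn'

    -- Order-sensitive loss: negative log-likelihood of the gold word
    -- at each decoder step.
    local nll = nn.ClassNLLCriterion()
    -- per-step loss, given logProbs (vocabSize) and the gold word index:
    --   local loss = nll:forward(logProbs, goldIndex)

    -- Order-insensitive loss: cosine distance between vocabulary-sized
    -- bag-of-words vectors of the gold and predicted sentences; the
    -- target 1 asks the criterion to pull the two vectors together
    -- (its loss for y = 1 is 1 - cos(x1, x2)).
    local cos = nn.CosineEmbeddingCriterion()
    --   local loss = cos:forward({predBow, goldBow}, 1)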

Implementation Details

• Torch
• Minimal preprocessing of the sentences
• Optimization with AdaGrad (update step sketched below)
• Dropout
• 1000 dimensions for word and sentence vectors
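An AdaGrad update with Torch's optim package looks roughly like this (a sketch: model, criterion, input, and target are placeholders, and the learning rate is a hypothetical setting):

    require 'optim'

    local params, gradParams = model:getParameters()
    local optimState = { learningRate = 0.1 }  -- hypothetical

    -- Closure evaluated once per minibatch: returns the loss and the
    -- gradient of the loss w.r.t. the flattened parameters.
    local function feval(p)
      if p ~= params then params:copy(p) end
      gradParams:zero()
      local output = model:forward(input)
      local loss = criterion:forward(output, target)
      model:backward(input, criterion:backward(output, target))
      return loss, gradParams
    end

    optim.adagrad(feval, params, optimState)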

Data

• Hotel reviews
  § "we were the only people on our floor who spoke english"
  § "first rate ! the rooms look like they have been recently renovated ."
  § "recently stayed at the colonnade ."
• Dataset sizes (# sentences)
  § Train set: 10K-1M
  § Validation set: 100
  § Test set: 100

Quantitative Evaluation

Machine translation metrics: how well the decoded sentence "translates" the original sentence.

  Encoder                     Train size  BLEU  Meteor  Val error  Train error
  LSTM                        100K        39.3  32.9    27.3        9.3
  LSTM (drop 0.1)             1M          55.2  42.6    12.5       10.4
  LSTM (drop 0.3)             1M          63.9  45.1    14.7        7.0
  CNN (word)                  100K         0.8   5.3    62.9       55.2
  CNN (time, fine-grained)    100K         0.6   6.2    53.8       49.1
  CNN (time, coarse-grained)  100K        16.6  20.7    39.0       26.0

More Observations

• A bag-of-words-based loss did not help
• Preliminary results on Wikipedia are much lower
• Possible explanations
  § Open domain, larger vocabulary, longer sentences

  Model      Train size    BLEU  Meteor
  LSTM-LSTM  1M sentences  18.5  21.8

Qualitative Evaluation

• Run the trained model on unseen sentences
• Compare the original and decoded sentences

Gold vs. predicted sentences:

1. Gold: we were the only people on our floor who spoke english
   Predicted: we were only the people who on our floor group seemed on top ,
2. Gold: which was nice . the place needs updated ,
   Predicted: which was nice . the place needs updating ,
3. Gold: but it's not horrible .
   Predicted: but it's not horrible .
4. Gold: recently stayed at the colonnade .
   Predicted: recently stayed at the conrad .
5. Gold: i must say i was extremely impressed with the staff and overall appearance of the hotel .
   Predicted: i must say i was extremely impressed with the cleanliness and helpfulness of the staff overall .
6. Gold: i would definitely stay here again and would recommend this hotel to family and friends .
   Predicted: i would definitely stay here again and would recommend this hotel to friends and family

Qualitative Evaluation

• Run the trained model on the training sentences
• Create vector representations for the training sentences
• Cluster the vectors with k-means (see the sketch after this list)
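For reference, a hand-rolled Lloyd-style k-means over the encoded sentence vectors might look like this (a sketch; vectors is assumed to be an N x dim tensor of encoder outputs, and the actual experiments may well have used an off-the-shelf implementation):

    require 'torch'

    -- Minimal k-means over sentence vectors.
    local function kmeans(vectors, k, iters)
      local n = vectors:size(1)
      -- initialize centroids from k random sentences
      local perm = torch.randperm(n):narrow(1, 1, k):long()
      local centroids = vectors:index(1, perm):clone()
      local assign = torch.LongTensor(n)
      for _ = 1, iters do
        -- assignment step: nearest centroid by Euclidean distance
        for i = 1, n do
          local diff = centroids - vectors[i]:view(1, -1):expandAs(centroids)
          local _, j = diff:norm(2, 2):min(1)
          assign[i] = j[1][1]
        end
        -- update step: centroid = mean of its assigned vectors
        for c = 1, k do
          local members = assign:eq(c):nonzero()
          if members:nElement() > 0 then
            centroids[c]:copy(vectors:index(1, members:view(-1)):mean(1))
          end
        end
      end
      return assign, centroids
    end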

LSTM Encoder Clusters

'i would definitely stay here again . i love it !'
'but i would stay here again .'
'i think i would stay here again'
'i would ( and will ) stay here again .'
'i would 100% stay here again .'
'i would come here again .'
'i would consider staying here again .'
'would i stay here again ?'

'our staff was friendly and very fast to help us .'
'but the staff was very friendly and accommodating .'
'staff in the reception was very friendly .'
'the check in staff was very friendly and helpful .'
'the construction was complete . the staff is very friendly and helpful'
'the hotel staff was very friendly and open to helping make dinner'
'the internet was free and the staff was very friendly .'

'and the hotel is in an excellent location .'
'hotel is in a great location - nothing wrong with the neighbourhood .'
'the back bay hotel is in a great location'
'the edison hotel is in a perfect location'
'the hotel circle is in a good location i think'
'the hotel is a fine hotel in a great area'
'the hotel is huge and in a good downtown location .'

CNN Encoder Clusters

'great location !'
'location !'
'location location location !'
'cute and great location !'
'great stay and fabulous location !'
'staff and location !'
'wonderful location !'

'fresh fruit pastries etc .'
'coffee shops etc .'
'outback steak house etc .'
'dinner walk around etc .'
'french toast pancakes fresh fruit etc .'
'dinner walk around etc .'
'bread toast etc .'
'a whole foods grocery etc .'

'the room was very clean'
'the room was very big'
'the room was very spacious by new york hotel standards'
'the room i recieved was very spacious'
'the decor of the room was very nice and modern'
'cons : room was very small'
'the hotel room was very nice'
'i liked the location and the room was very nice'
'the king room at the back of the hotel was very quiet'
'the room as very modern'

Observations

• Clusters tend to differ by topic (hotel, location, staff)
• There is a certain bias toward the beginning of the sentence, especially in the pure LSTM model
• The models sometimes fail to capture negation
• The LSTM prefers full sentences; the CNN also forms clusters of words and sentences

Qualitative Evaluation

(figure: distances in the 10 most dense clusters)

Future Work

• General-domain model from Wikipedia
• Improvements to LSTM implementations
  § Attention mechanism?
• Supervised tasks (question similarity, answer selection)
  § Use the autoencoder representation as fixed features
  § Add a supervised classification layer
• Better CNN models, also at fine-grained levels
  § Deal with the locality of convolution
• Combine LSTM and CNN during encoding
• Decode with CNNs → variable-length sentences?