Deep Learning in Natural Language Processing Tong Wang Advisor: Prof. Ping Chen Computer Science University of Massachusetts Boston


Outline
• Natural Language Processing
• Deep Learning in NLP
• My Research Projects
• My Path in Computer Science
• My Experience Finding an Internship

What is Natural Language Processing
• Natural Language Processing belongs to the area of human-computer interaction.
• Natural language understanding
• Natural language generation

Natural Language Processing

https://d396qusza40orc.cloudfront.net/nlangp/lectures/intro.pdf http://www.cs.nyu.edu/~petrov/lecture1.pdf

Natural Language Processing

http://www.slideshare.net/BenjaminBengfort/introduction-to-machine-learning-with-scikitlearn

NLP Applications
• Information Extraction
• Named Entity Recognition
• Machine Translation
• Question Answering
• Topic Modeling
• Summarization

Information Extraction

https://d396qusza40orc.cloudfront.net/nlangp/lectures/intro.pdf

Named Entity Recognition
• Classify elements in text into categories such as location, time, person name, or organization.
• Jim worked in Google corp. in 2012
• (Jim)[person] worked in (Google corp.)[organization] in (2012)[time]
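The tagging above can be imitated with a minimal dictionary-plus-regex sketch. This is a hypothetical toy, not a real NER system (the entity lists and the year heuristic are made up for illustration); production systems learn taggers from labeled data with sequence models.

```python
import re

# Toy gazetteers; a real NER system learns these from annotated corpora.
ENTITIES = {
    "person": {"Jim"},
    "organization": {"Google corp."},
}

def tag(text):
    """Wrap known entity mentions and 4-digit years in (span)[label] markers."""
    for label, names in ENTITIES.items():
        for name in names:
            text = text.replace(name, "({})[{}]".format(name, label))
    # Tag standalone 4-digit numbers as times (a very rough heuristic).
    return re.sub(r"\b(\d{4})\b", r"(\1)[time]", text)

print(tag("Jim worked in Google corp. in 2012"))
# (Jim)[person] worked in (Google corp.)[organization] in (2012)[time]
```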

Machine Translation

Machine Translation Difficulties
• Words together are more than the sum of their parts.
• Cannot be translated word by word
◦ e.g., fast food, light rain
• Needs a big dictionary with grammar rules in both languages: a large start-up cost
• Requires the computer to understand the text

Question Answering
• IBM Watson won Jeopardy! on 02/16/2011


NLP Tasks

https://class.coursera.org/nlp/lecture/124

Why NLP is Hard
• Text is fundamentally not computer-friendly
• Many different ways to represent the same thing
• Order and context are extremely important
• Language is very high dimensional and sparse, with tons of rare words
◦ B4 (before), IC (I see), cre8 (create)
• Ambiguity

Ambiguity
• “At last, a computer understands you like your mother”
◦ It understands you as well as your mother understands you
◦ It understands (that) you like your mother
◦ It understands you as well as it understands your mother

Ambiguity at Syntactic Level

https://d396qusza40orc.cloudfront.net/nlangp/lectures/intro.pdf

DEEP LEARNING IN NATURAL LANGUAGE PROCESSING

Deep Learning (Representation learning) in NLP

http://www.iro.umontreal.ca/~memisevr/dlss2015/DLSS2015-NLP-1.pdf

Deep Learning in NLP
• Word-level applications: word embedding, word2vec
• Sentence/paragraph-level applications: neural machine translation, doc2vec, etc.

Word Representation
• The majority of rule-based and statistical NLP work regarded words as atomic symbols
• In vector space terms, each word is a vector with a single 1 and many zeros; this is called a “one-hot” representation
◦ Condo: [0,0,0,0,1,0,0,…,0]
◦ Apartment: [0,1,0,0,0,0,0,…,0]
• Any two such vectors are orthogonal, so they express no similarity
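The orthogonality claim is easy to verify directly. Below is a minimal sketch with a toy 8-word vocabulary (the words and index positions are arbitrary choices for illustration):

```python
# Toy vocabulary; in practice this would be tens of thousands of words.
VOCAB = ["a", "apartment", "b", "c", "condo", "d", "e", "f"]

def one_hot(word):
    """Vector with a single 1 at the word's vocabulary index."""
    return [1 if w == word else 0 for w in VOCAB]

condo = one_hot("condo")
apartment = one_hot("apartment")

# The dot product of any two distinct one-hot vectors is 0: they are
# orthogonal, so this representation encodes no similarity at all.
dot = sum(a * b for a, b in zip(condo, apartment))
print(dot)  # 0
```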

Word2vec

http://www.iro.umontreal.ca/~memisevr/dlss2015/DLSS2015-NLP-1.pdf

Word Embedding

From word2vec Parameter Learning Explained

Word Embedding

From Distributed Representations of Words and Phrases and their Compositionality

Word Embedding
• W(‘woman’) – W(‘man’) ≈ W(‘queen’) – W(‘king’)
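The analogy above is vector arithmetic: the ‘woman’−‘man’ offset roughly matches the ‘queen’−‘king’ offset. A sketch with hypothetical 2-d vectors (hand-picked so that axis 0 ≈ gender and axis 1 ≈ royalty; real word2vec embeddings have hundreds of dimensions learned from text):

```python
# Hypothetical 2-d "embeddings" chosen for illustration only.
W = {
    "man":   [ 1.0, 0.0],
    "woman": [-1.0, 0.0],
    "king":  [ 1.0, 1.0],
    "queen": [-1.0, 1.0],
}

def sub(a, b):
    """Element-wise vector difference a - b."""
    return [x - y for x, y in zip(a, b)]

# The gender offset is the same in both pairs, so the analogy holds.
print(sub(W["woman"], W["man"]))   # [-2.0, 0.0]
print(sub(W["queen"], W["king"]))  # [-2.0, 0.0]
```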

Sentence Embedding

From Paragraph Vector - Stanford Computer Science

Recurrent Neural Network

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Neural Machine Translation

https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-2/

MY RESEARCH PROJECTS

Text Simplification
• Text simplification (TS) aims to reduce the lexical, grammatical, or structural complexity of text while retaining its semantic meaning
• It can help various groups of people, including children, non-native speakers, and people with cognitive disabilities

Lexical Simplification
• Substitute long and infrequent words with shorter and more frequent words
• Candidate selection
◦ Semantically similar
◦ Syntactically and grammatically correct
◦ The meaning of the sentence remains the same
• Disadvantage: operates only at the word level

Lexical Simplification
• Lexical Simplification webpage: http://158.121.178.171/

LS System
• For each word w in the text:
◦ Check the part-of-speech (POS) tag of w
◦ Retrieve the top 20 most similar words from word2vec
◦ For each of the 20 candidate words c:
◦ If c has the same POS as w, is not merely a different form of w (e.g., its past tense), and w is more difficult than c: put c in the sentence, then compute sentence similarity and n-gram fluency
◦ Otherwise continue
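The loop above can be sketched in a few lines of Python. The frequency list and candidate table here are hypothetical stand-ins for the real resources (a corpus frequency list and word2vec nearest neighbors), "difficulty" is approximated by frequency, and the POS, inflection, and fluency checks are elided:

```python
# Hypothetical stand-ins for a corpus frequency list and word2vec neighbors.
FREQ = {"purchase": 120, "buy": 5000, "domicile": 3, "home": 8000}
CANDIDATES = {"purchase": ["buy", "acquire"], "domicile": ["home", "residence"]}

def is_difficult(word):
    # Treat low-frequency words as difficult (a common LS heuristic).
    return FREQ.get(word, 0) < 1000

def simplify(sentence):
    words = []
    for w in sentence.split():
        if is_difficult(w):
            # Take the first candidate that is more frequent than w; a full
            # system would also check POS, inflection, sentence similarity,
            # and n-gram fluency before substituting.
            for c in CANDIDATES.get(w, []):
                if FREQ.get(c, 0) > FREQ.get(w, 0):
                    w = c
                    break
        words.append(w)
    return " ".join(words)

print(simplify("purchase a domicile"))  # buy a home
```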

TS using Neural Machine Translation
• Original English and simplified English can be thought of as two different languages.
• TS would then be the process of translating English into simplified English.

Text Simplification using Neural Machine Translation

AAAI 2016, Student abstract

Steps
• Collecting training data
◦ Pairs of sentences: an original sentence and its simplified version
◦ From English Wikipedia and Simple English Wikipedia
• Build an RNN encoder-decoder model
• Evaluation

Use Sentence Similarity to Collect Training Data

From Siamese Recurrent Architectures for Learning Sentence Similarity
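The referenced work uses a Siamese recurrent network; as a much simpler illustration of the idea, sentence pairs can be scored by the cosine similarity of averaged word vectors, with high-scoring pairs kept as original/simplified training pairs. The word vectors below are hypothetical toys, not trained embeddings:

```python
import math

# Hypothetical toy word vectors; a real system would use trained
# embeddings (or a learned Siamese LSTM, as in the cited work).
VEC = {
    "the":      [0.1, 0.1],
    "dog":      [0.9, 0.2],
    "ran":      [0.3, 0.8],
    "sprinted": [0.35, 0.85],
}

def sentence_vec(sentence):
    """Average the word vectors of a sentence."""
    vs = [VEC[w] for w in sentence.split()]
    return [sum(col) / len(vs) for col in zip(*vs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# High similarity -> keep as an original/simplified training pair.
sim = cosine(sentence_vec("the dog sprinted"), sentence_vec("the dog ran"))
print(round(sim, 2))
```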

Other Projects
• Extended topic model for word dependency
• Opinion mining for the chemical spill in West Virginia
◦ http://158.121.178.175/
• Compression and data mining

My Path in Computer Science
• Huazhong Agricultural University, Information and Computing Science, BS, China, 2006 – 2010
• Bioinformatics lab, Huazhong Agricultural University, 2010
• Northeastern University, Computer Systems Engineering, MS, 2011 – 2013
• IoMosaic, Software Engineer, 2013
• University of Massachusetts Boston, Computer Science, PhD, 2014 – present

Keep Healthy
• Play badminton almost every day, Monday through Friday
• Run 5 miles on weekends

Keys to Finding an Internship
• A good resume
• Do a lot of projects
• Networking (very important!)
◦ Go to conferences
◦ Ask for job referrals from professors, friends, alumni, even strangers on LinkedIn…

Prepare for the Interview
• Know the company
• Behavioral questions
• Technical questions
◦ Start practicing programming in your favorite language at least one month before the interview (LeetCode)

Thank you!