Natural Language Processing
Introduction Spring 2019
Based on slides from Chris Manning, Michael Collins, and Yoav Artzi
2001: A Space Odyssey
In case you didn’t notice
• The class will be in English
What is NLP?
• Goal: Develop methods for processing, analyzing and understanding the structure and meaning of language.
[Diagram: NLP connecting language, structure, and meaning]
• For all languages
• Many "levels": transcribed, newswire, social media…
• Application: Build systems that help people do stuff (with text):
• Question answering, virtual assistants, automatic translation, etc.
• There is a lot of language out there…
Applications are taking off!
• Search
• Translation
• Sentiment analysis
• Speech recognition
• Chatbots
• Virtual assistants
• Ad matching
• …
Levels of analysis
• Phonology: the sounds that make up language (more the focus of the speech community)
• daˈvɪd ben gurˈjo:n: דוד בן-גוריון (David Ben-Gurion)
• Morphology: internal structure of words
• ובבתיהם ("and in their houses": a single word combining a conjunction, a preposition, a noun, and a possessive suffix)
• Syntax: structure of phrases, how words modify one another
• pretty little girl’s school
• Semantics: meaning of language in the world
• students on the back row
• Discourse: relations between clauses and sentences
• I registered for the class on NLP because it is fascinating
• Pragmatics: the way we use language in context
• It’s hot in here
What’s so special about language?
• Created by humans for communication
• מותר האדם מן הבהמה ("the preeminence of man over beast")
• Learned from experience (!!)
• A symbolic/discrete system that pre-dates logic…
  • table: [image]
  • piano: [image]
Language is special
• Encoded via continuous signals:
• Sounds
• Gestures
• Images (writing)
• Brain encodings are continuous
Why is it hard?
• Ambiguity
• Variability
• Sparsity
• Grounding
Ambiguity

"Finally, a computer that understands you like your mother" (ad, 1985)

1. The computer understands you as well as your mother understands you.
2. The computer understands that you like your mother.
3. The computer understands you as well as it understands your mother.

And what a speech recognizer might hear: "Finally, a computer that understands your lie cured mother"

(Lee, 2004)
Syntactic ambiguity
[Two parse trees for "discuss violence on TV": the PP "on TV" attaches either to the verb "discuss" or to the noun "violence"]
Even short sentences have hundreds of analyses
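One way to appreciate the explosion (an illustration added here, not from the slides): under binary branching, the number of possible bracketings of an n-word sentence is a Catalan number. A short Python sketch:

```python
# Count the binary bracketings of an n-word sentence: the (n-1)-th
# Catalan number. Parse counts under a real grammar differ, but grow
# just as explosively.
from math import comb

def catalan(k: int) -> int:
    """k-th Catalan number: the number of binary trees with k internal nodes."""
    return comb(2 * k, k) // (k + 1)

for n in [2, 5, 10, 20]:
    print(f"{n} words -> {catalan(n - 1)} binary bracketings")
# 10 words already admit 4862 bracketings; 20 words, over 1.7 billion.
```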
Semantic ambiguity
• There’s a river that crosses every town
• Quantification ambiguity (not syntactic); the two readings are spelled out below
• חולצה מטיילת בואדי ("a hiker was rescued in the wadi" / "a shirt is hiking in the wadi")
• Lexical, syntactic and semantic ambiguity
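For the river sentence, the two scoped readings can be written as logical forms (notation added here for clarity, not from the slides):

```latex
% Surface scope: one particular river crosses all towns.
\exists r\,\big[\mathrm{river}(r) \land \forall t\,(\mathrm{town}(t) \rightarrow \mathrm{crosses}(r,t))\big]
% Inverse scope: every town is crossed by some, possibly different, river.
\forall t\,\big[\mathrm{town}(t) \rightarrow \exists r\,(\mathrm{river}(r) \land \mathrm{crosses}(r,t))\big]
```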
Ambiguity
• Headlines:
• Enraged Cow Injures Farmer with Ax
• Ban on Nude Dancing on Governor’s Desk
• Teacher Strikes Idle Kids
• Hospitals Are Sued by 7 Foot Doctors
• Iraqi Head Seeks Arms
• Stolen Painting Found by Tree
• Kids Make Nutritious Snacks
• Local HS Dropouts Cut in Half
Variability
• Crucial in semantics: the same meaning has many surface forms
  • Dow ends up 255 points
  • Dow climbs 255
  • All major stock markets surged
  • Dow gains 255 points
  • Stock market hits a record high
  • The Dow Jones Industrial Average closed up 255
[Diagram: "Variability of Semantic Expression" — all of the above map to the same event]
Sparsity

"I ate an apple"
• Unigrams: "I", "ate", "an", "apple"
• Bigrams: "I ate", "ate an", "an apple"
• Most longer sequences are rare or unseen in any corpus, no matter how large

Compositionality is key!
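A minimal sketch (added for illustration) of extracting the unigrams and bigrams above:

```python
def ngrams(tokens, n):
    """All contiguous n-token subsequences of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I ate an apple".split()
print(ngrams(tokens, 1))  # [('I',), ('ate',), ('an',), ('apple',)]
print(ngrams(tokens, 2))  # [('I', 'ate'), ('ate', 'an'), ('an', 'apple')]
```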
Grounding
• Humans do not learn language by observing an endless stream of text.
Interdisciplinary field
[Diagram: NLP at the intersection of CS, ML/statistics, linguistics, and AI]
Related fields
• Computational Linguistics
• Use computational models to learn about language (e.g., meaning change)
• Cognitive Science
• How does the human brain process language?
NLP applications
• Text categorization
• Information Extraction
• Search
• Question Answering
• Virtual assistants
• Machine translation
• Summarization
• Reading comprehension
Text categorization
[Figure: documents being routed to categories such as Sports, Politics, Science]
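A minimal bag-of-words categorizer sketch (assuming scikit-learn is available; the tiny training set and category names are purely illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled documents, one per category.
docs = ["the striker scored a late goal",
        "parliament passed the budget bill",
        "the probe sent back images of the comet"]
labels = ["Sports", "Politics", "Science"]

# Word counts fed into a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["parliament debated the new bill"]))  # -> ['Politics']
```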
Information Extraction
• Unstructured text to database entries
• SOTA: perhaps 80% accuracy for multi-sentence templates, 90%+ for single easy fields
• But remember: information is redundant!
New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent.
Person | Company | Post | State
Russell T. Lewis | New York Times newspaper | president and general manager | start
Russell T. Lewis | New York Times newspaper | executive vice president | end
Lance R. Primis | New York Times Co. | president and CEO | start
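The same extractions as structured records (a sketch; the field names simply mirror the table):

```python
# Target output of an IE system for the passage above.
records = [
    {"person": "Russell T. Lewis", "company": "New York Times newspaper",
     "post": "president and general manager", "state": "start"},
    {"person": "Russell T. Lewis", "company": "New York Times newspaper",
     "post": "executive vice president", "state": "end"},
    {"person": "Lance R. Primis", "company": "New York Times Co.",
     "post": "president and CEO", "state": "start"},
]
```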
Question Answering
• More than search
[Screenshots: question answering examples]
Critique
• Douglas Hofstadter: “just a text search algorithm connected to a database, just like Google search. It doesn’t understand what it’s reading.”
Virtual assistants
Move all my Wed meetings in April with John to 5pm
Machine Translation
[Screenshots: machine translation examples]
Summarization
• Condensing documents
  • Single or multiple docs
  • Extractive or abstractive
• Very context-dependent!
Reading Comprehension

"The rock was still wet. The animal was glistening, like it was still swimming," recalls Hou Xianguang. Hou discovered the unusual fossil while surveying rocks as a paleontology graduate student in 1984, near the Chinese town of Chengjiang. "My teachers always talked about the Burgess Shale animals. It looked like one of them. My hands began to shake." Hou had indeed found a Naraoia like those from Canada. However, Hou's animal was 15 million years older than its Canadian relatives.

It can be inferred that Hou Xianguang's "hands began to shake" because he was
(A) afraid that he might lose the fossil
(B) worried about the implications of his finding
(C) concerned that he might not get credit for his work
(D) uncertain about the authenticity of the fossil
(E) excited about the magnitude of his discovery
Some history

Turing test:

I propose to consider the question, "Can machines think?" This should begin with definitions of the meaning of the terms "machine" and "think"... If the meaning of the words "machine" and "think" are to be found by examining how they are commonly used... is to be sought in a statistical survey such as a Gallup poll. But this is absurd... I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.

[Turing, 1950]
Some history

A. Colorless green ideas sleep furiously
B. Furiously sleep ideas green colorless

"It is fair to assume that neither sentence (1) nor (2) … had ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally “remote” from English. Yet (1), though nonsensical, is grammatical, while (2) is not."

(Chomsky, Syntactic Structures, 1957)
History: rule-based
• 70s and 80s:
• Grammars (rules) of English syntax
• Small domains
• Substantial engineering effort
S → NP VP
NP → DET N
VP → Vt N
VP → Vi
N → man
N → woman
DET → the
DET → a
Vt → hugged
Vi → slept
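A minimal sketch (assuming the NLTK library) that runs this toy grammar through a chart parser; the grammar string copies the rules above:

```python
import nltk

# The slide's toy grammar in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> DET N
VP -> Vt N | Vi
N -> 'man' | 'woman'
DET -> 'the' | 'a'
Vt -> 'hugged'
Vi -> 'slept'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the man slept".split()):
    tree.pretty_print()  # (S (NP (DET the) (N man)) (VP (Vi slept)))
```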
Empirical revolution

"Whenever I fire a linguist, our system performance improves."
(attributed to Jelinek, 1988)

"Of course, we must not go overboard and mistakenly conclude that the successes of statistical NLP render linguistics irrelevant (rash statements to this effect have been made in the past, e.g., the notorious remark, 'Every time I fire a linguist, my performance goes up'). The information and insight that linguists, psychologists, and others have gathered about language is invaluable in creating high-performance broad domain language understanding systems; for instance, in the speech recognition setting described above, better understanding of language structure can lead to better language models." (Lillian Lee, 2001)
When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’
Weaver, 1955
Empirical Revolution
• 1990s: corpus-based statistical methods
• 2000s: rich structural formalisms
• 2010s: representation learning with neural nets
Class goals
• Appreciate the challenges of NLP
• Learn about NLP problems and their solutions
• Learn general purpose methods
• Be able to read modern research in NLP
Disclaimer
• This class borrows from NLP classes around the world
• Statistical NLP by Michael Collins
• Stanford 224n
• Others
• You are welcome to watch the videos or read the slides of those classes if you feel more comfortable with that
• However, slides are not self-contained. You need to be in class to fully understand their meaning
• Classes might be video-taped?
Plan
• Class 1: introduction + word embeddings
• Class 2: Word embeddings
• Class 3: Language models, contextualized word representations
• Class 4: RNNs
• Class 5: Tagging, log-linear models
• Class 6: Globally-normalized models, deep learning for tagging
• Technical foci: structured prediction and deep learning
Plan
• Class 7: Introduction to parsing, CKY
• Class 8: Lexical PCFG
• Class 9: Deep syntactic parsing
• Class 10: Semantic parsing, sequence-to-sequence models
• Class 11: Projects
• Class 12: Sequence-to-sequence models
Prerequisites
• Introduction to machine learning
• Proficiency in programming (some assignments require python)
• Implied from intro to ML:
• Probability
• Linear algebra
• Calculus
• Algorithms
Assignments (tentative)
• One assignment every two classes. Tentative plan:
• Assignment 1: Word embeddings
• Assignment 2: Language modeling
• Assignment 3: Sequence tagging
• Assignment 4: Deep learning
• Assignment 5: Syntax
Assignments
• Five homework assignments
• All involve coding
• Submit in pairs or triplets
• Submit by e-mail to the 2 TAs:
• E-mail title: Assignment <number>: <firstname1> <lastname1> <id1> <firstname2> <lastname2> <id2>
• Some simple written questions
Administration
• Instructor: Jonathan Berant
• TAs: Ohad Rubin, Omri Koshorek
• Time: Tue, 13-16
• Place: Dan David 001
• Office hours: coordinate by e-mail
• Website: https://www.cs.tau.ac.il/~joberant/teaching/advanced_nlp_spring_2019/index.html
• Forum: http://moodle.tau.ac.il/course/view.php?id=368307701
• Grades: 50% final project, 50% homework assignments
Final project
• Groups of 2-3 people (best to just keep the same team as for the homework assignments)
• There will be a default project - TBD
• Or you can do a research project
• Choose a research paper, implement it, extend it
• Build a model that does something you care about
• Define a research problem and work on it
• If you are taking Prof. Globerson's class (advanced ML), it is advised that you have one project for both classes. The standard for such a project will be higher. All members of the team must be enrolled in both classes.
Final project
• Research projects:
• Decide on project by roughly week 10.
• Get 1-bit approval
• Present in last class
• 2-3 top-tier papers result from the class per year.
• See class website for examples
Textbooks
• Martin and Jurafsky:
• Speech and Language Processing: An Introduction to Natural Language Processing
• Schütze and Manning:
• Foundations of Statistical Natural Language Processing
• Noah Smith:
• Linguistic structure prediction
• Yoav Goldberg:
• Neural Network Methods for Natural Language Processing