A Primer on Natural Language Processing Mohammad Taher Pilehvar TeIAS Summer School on Data Science 26 August 2019


A Primer on Natural Language Processing

Mohammad Taher Pilehvar

TeIAS Summer School on Data Science

26 August 2019


Artificial Intelligence

Design algorithms that make computers behave intelligently

But, what is intelligent behavior?

Image from threatpost.com

Artificial Intelligence

Scenario 1: Vision


To a non-intelligent computer, photos are nothing but sets of colored pixels


Artificial Intelligence

Scenario 1: Vision (face detection/recognition)


Artificial Intelligence

Scenario 1: Vision (autonomous cars)


Artificial Intelligence

Scenario 2: Motion/Manipulation (Robotics)


Artificial Intelligence

Scenario 3: Learning/Planning


Artificial Intelligence

Scenario 4: Natural language!

??!!

What’s the capital of Iran?


Artificial Intelligence

Khatam

01001011 01101000 01100001 01110100 01100001 01101101

K h a t a m

Scenario 4: Natural language!
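As a small illustration (not part of the original slides), the sketch below shows how a word such as "Khatam" looks to a machine once each character is stored as an 8-bit ASCII code. The helper name `to_bits` is made up for this example.

```python
def to_bits(text: str) -> str:
    """Return the 8-bit ASCII code of each character, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


print(to_bits("Khatam"))
# 01001011 01101000 01100001 01110100 01100001 01101101
```

Nothing in these bits says that Khatam is a name; any such meaning has to be learned or supplied from elsewhere.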


Artificial Intelligence

Scenario 4: Natural language!

Make computers understand and generate natural language


Natural Language Processing (Computational Linguistics)

[Diagram: NLP, ML, AI]


Natural Language Processing (Computational Linguistics)

• Natural Language Understanding
• Natural Language Generation


Difficulties of Language Understanding


Difficulty of Language Understanding

Common sense knowledge

• The trophy would not fit in the brown suitcase because it is too big.
  • What is too big?

• The town councilors refused to give the demonstrators a permit because they feared (advocated) violence.
  • Who feared (advocated) violence?


Difficulty of Language Understanding

Context

“It is raining outside. This is the reason why I won't go out.”

• What is the reason for not going outside?
• This?

Coreference resolution:
• I did not vote for Donald Trump because I think he is a liar!

Anaphora resolution:
• I bought a new ThinkPad, I have an old MacBook. I am going to give it away!


Difficulty of Language Understanding

Slang, idioms and sarcasm

• Those shoes are goat; She is busted; He is rather a frenemy

• In a nutshell; piece of cake; think outside the box; bad apple; get the picture

• That’s just what I needed today! (When something bad happens)


Difficulty of Language Understanding

Ambiguity

Illustration from IBM Watson


Difficulty of Language Understanding

Amazon fire!


Difficulty of Language Understanding

Metonymic Ambiguity

• London voted to stay in the EU

• The White House admits Trump is lying to manipulate his voters

• The kettle is boiling

• Iran beat Cuba after dropping first two sets


Difficulty of Language Understanding

Syntactic Ambiguity

I heard his cell phone ring in my office


WiC (Word-in-Context) dataset (Pilehvar and Camacho-Collados, 2019, nominated for IJCAI’s research excellence award)

Label | Target | Context-1 | Context-2
False | bed | There's a lot of trash on the bed of the river | I keep a glass of water next to my bed when I sleep
False | land | The pilot managed to land the airplane safely | The enemy landed several of our aircrafts
True | air | Air pollution | Open a window and let in some air
True | window | The expanded window will give us time to catch the thieves | You have a two-hour window of clear weather to finish working on the lawn
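The task can be read as binary classification: given a target word and two contexts, decide whether the word keeps the same meaning. The sketch below is only an illustration of that framing, using the four rows shown above; the class `WiCInstance` and the trivial `always_true` baseline are made-up names, not part of the dataset's tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class WiCInstance:
    target: str
    context_1: str
    context_2: str
    label: bool  # True if the target has the same meaning in both contexts


EXAMPLES: List[WiCInstance] = [
    WiCInstance("bed", "There's a lot of trash on the bed of the river",
                "I keep a glass of water next to my bed when I sleep", False),
    WiCInstance("land", "The pilot managed to land the airplane safely",
                "The enemy landed several of our aircrafts", False),
    WiCInstance("air", "Air pollution",
                "Open a window and let in some air", True),
    WiCInstance("window", "The expanded window will give us time to catch the thieves",
                "You have a two-hour window of clear weather to finish working on the lawn", True),
]


def accuracy(predict: Callable[[WiCInstance], bool], data: List[WiCInstance]) -> float:
    """Fraction of instances for which the prediction matches the gold label."""
    return sum(predict(x) == x.label for x in data) / len(data)


def always_true(instance: WiCInstance) -> bool:
    """Trivial baseline that ignores the contexts entirely."""
    return True


print(accuracy(always_true, EXAMPLES))  # 0.5 on these four rows
```

Any real system replaces `always_true` with a function that actually compares the two contexts.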


WiC (Word-in-Context) dataset

Team | System | Accuracy
Google | BERT++ | 69.9
Facebook AI | RoBERTa | 69.6
Stanford Hazy Research | Snorkel | 72.1
Performance upper bound | -- | 80.0


Difficulty of Language Generation

Massive vocabulary size

Dynamic word order

Syntax and grammar

Fluency


Applications of NLP


• Machine Translation
• Information Retrieval
• Document Summarisation
• Question Answering
• Plagiarism Detection
• Document Classification
• Spam Detection
• Fake News Detection
• Chatbots
• Social Media Analysis
• Sentiment Analysis
• Tip of the Tongue (ToT): reverse dictionary

NLP and Deep Learning

Source: XenonStack


NLP and Deep Learning

Word Sense Disambiguation

Conventional approach

Extract (hand-crafted) features:

• Surrounding words

• Part of speech tags

• Collocations
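A minimal sketch of what such hand-crafted feature extraction might look like, assuming whitespace tokenisation and a window of two words. The function name `wsd_features` and the feature names are invented for illustration; part-of-speech tags are left out because they would require a tagger.

```python
from typing import Dict, List


def wsd_features(tokens: List[str], target_index: int, window: int = 2) -> Dict[str, object]:
    """Hand-crafted features describing the context of tokens[target_index]."""
    target = tokens[target_index].lower()
    left = [w.lower() for w in tokens[max(0, target_index - window):target_index]]
    right = [w.lower() for w in tokens[target_index + 1:target_index + 1 + window]]

    # Surrounding words: one binary feature per word in the window.
    features: Dict[str, object] = {f"word_in_window={w}": True for w in left + right}

    # Collocations: the target joined with its immediate neighbours.
    if left:
        features["collocation_left"] = f"{left[-1]}_{target}"
    if right:
        features["collocation_right"] = f"{target}_{right[0]}"
    return features


tokens = "I keep a glass of water next to my bed when I sleep".split()
features = wsd_features(tokens, tokens.index("bed"))
print(features["collocation_left"])  # my_bed
```

In a conventional pipeline, a dictionary like this would be fed to a separate classifier that predicts the sense.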


NLP and Deep Learning

Word Sense Disambiguation

DL-based approach

• End-to-end model

• Input words, output classes

• No features involved

Figure from Kågebäck and Salomonsson (2016)


NLP and Deep Learning

Sentence Similarity Measurement

Figure from Google AI blog


NLP and Deep Learning

Sentence Similarity Measurement

Conventional approach

Extract features:

• String-based: if their words look similar (phone vs. telephone)

• Semantic: if their words have similar meanings (dozens of individual techniques)

• Style: ratio of function words, if they have overlapping numbers

• Phonetic: if they sound similar

• …
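A rough sketch of a few of these features, using only the Python standard library. The semantic and phonetic features are omitted because they need external resources; the function-word list is a tiny illustrative sample, and all names here are assumptions rather than an established feature set.

```python
import difflib
import re
from typing import Dict, List

# Tiny illustrative sample; a real system would use a much fuller list.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "to", "is", "are", "and", "or"}


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", sentence.lower())


def function_word_ratio(tokens: List[str]) -> float:
    return sum(t in FUNCTION_WORDS for t in tokens) / len(tokens) if tokens else 0.0


def similarity_features(s1: str, s2: str) -> Dict[str, object]:
    t1, t2 = tokenize(s1), tokenize(s2)
    set1, set2 = set(t1), set(t2)
    union = set1 | set2
    numbers1 = {t for t in t1 if t.isdigit()}
    numbers2 = {t for t in t2 if t.isdigit()}
    return {
        # String-based: character-level similarity of the raw sentences.
        "char_ratio": difflib.SequenceMatcher(None, s1.lower(), s2.lower()).ratio(),
        # String-based: overlap of the two word sets (Jaccard index).
        "word_jaccard": len(set1 & set2) / len(union) if union else 0.0,
        # Style: difference in the share of function words.
        "function_word_ratio_diff": abs(function_word_ratio(t1) - function_word_ratio(t2)),
        # Style: whether the sentences mention a common number.
        "shared_numbers": bool(numbers1 & numbers2),
    }


feats = similarity_features("The phone is on the desk", "The telephone is on the desk")
print(round(feats["word_jaccard"], 3))  # 0.667
```

Note that "phone" and "telephone" count as different words for the Jaccard feature; only the character-level feature picks up their resemblance.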


NLP and Deep Learning

Sentence Similarity Measurement

DL-based approach

Figure from Mueller and Thyagarajan (2016)


NLP and Deep Learning

Stance detection

Gibraltar source says the Iranian tanker Grace-1 will be allowed to leave

Agree: Iran says Britain might release seized Grace 1 oil tanker soon

Disagree: Iranian tanker continues to be detained by Gibraltar


NLP and Deep Learning

Stance detection

Conventional approach

Extract (hand-crafted) features:

• Word overlaps

• Word frequencies

• Count features

• …
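As a hedged sketch, the overlap and count features could be computed as below for the Grace-1 example. The list of refuting cue words and the name `stance_features` are assumptions made for illustration only.

```python
import re
from collections import Counter
from typing import Dict, List

# Illustrative cue words; an assumption, not a standard lexicon.
REFUTING_WORDS = {"not", "no", "deny", "denies", "fake", "hoax", "false"}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def stance_features(claim: str, headline: str) -> Dict[str, float]:
    claim_counts = Counter(tokenize(claim))        # word frequencies
    headline_counts = Counter(tokenize(headline))
    shared = claim_counts & headline_counts        # multiset intersection
    headline_length = sum(headline_counts.values())
    return {
        "overlap_count": sum(shared.values()),
        "overlap_ratio": sum(shared.values()) / max(1, headline_length),
        "refuting_count": sum(headline_counts[w] for w in REFUTING_WORDS),
        "headline_length": headline_length,
    }


CLAIM = "Gibraltar source says the Iranian tanker Grace-1 will be allowed to leave"
AGREE = "Iran says Britain might release seized Grace 1 oil tanker soon"
DISAGREE = "Iranian tanker continues to be detained by Gibraltar"

print(stance_features(CLAIM, AGREE)["overlap_count"])     # 4
print(stance_features(CLAIM, DISAGREE)["overlap_count"])  # 5
```

On this example the disagreeing headline shares more words with the claim than the agreeing one does, which hints at why surface features alone are a weak signal for stance.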


NLP and Deep Learning

Stance detection

DL-based approach

End-to-end


NLP and Deep Learning

Word embeddings (2013)

Example words: Khatam, pizza, desk, rain


NLP and Deep Learning

Word embeddings (2013)

Groups of related words from the figure:

• train, rail, station, passenger, railway, bus, terminal, transit
• flower, fruit, tree, seed, leaf
• university, education, library, studies
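The idea that related words sit close together can be sketched with cosine similarity. The 3-dimensional vectors below are invented toy values chosen for illustration; real embeddings are learned from text and typically have hundreds of dimensions.

```python
import math
from typing import Dict, List

# Toy vectors, made up for this example (not trained embeddings).
EMBEDDINGS: Dict[str, List[float]] = {
    "train": [0.9, 0.1, 0.0],
    "railway": [0.8, 0.2, 0.1],
    "flower": [0.1, 0.9, 0.0],
    "leaf": [0.0, 0.8, 0.2],
    "university": [0.1, 0.0, 0.9],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word: str) -> str:
    """Most similar other word in the toy vocabulary."""
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


print(nearest("train"))   # railway
print(nearest("flower"))  # leaf
```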


NLP and deep learning

Contextualised Models (since 2018)

A new turning point in NLP

Evolving very rapidly

Timeline (2013 to 2020):

• 2013: Word2vec
• 2014: GloVe
• 2018: ELMo, ULMFiT, GPT, BERT
• 2019: GPT-2, XLNet, RoBERTa


NLP and deep learning

Contextualised Models

One system for all tasks!


Natural Language Processing

Main Current Research Challenges


Existing challenges in NLP

Natural Language Understanding

• Learning language from the ground up

• Innate biases vs. learning from scratch

• Linguistics, cognitive and neuroscience aspects

• Reasoning


Existing challenges in NLP

NLP for low-resource languages

• Lack of data, for training and for evaluation

• Incentives

• Universal language models

• Cross-lingual representations


Existing challenges in NLP

Reasoning at scale

Current NLP systems struggle to analyze very long documents or to reason across multiple documents

A challenging task:

• NarrativeQA: questions about entire movie scripts and books


Existing challenges in NLP

Evaluation

Current evaluation benchmarks and performance metrics often themselves need re-evaluation!

• Machine Translation

• Dialogue

• Language Generation


Language Modeling

Language Model
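A language model assigns probabilities to the next word (or character) given what came before, and can be sampled to generate text. The sketch below uses a count-based bigram model, which is a deliberately simple stand-in and far simpler than neural language models; the function names and the toy corpus are made up for illustration.

```python
import random
from collections import Counter, defaultdict
from typing import DefaultDict


def train_bigram_model(text: str) -> DefaultDict[str, Counter]:
    """Count how often each word follows each other word."""
    tokens = ["<s>"] + text.split() + ["</s>"]
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probability(counts: DefaultDict[str, Counter], prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


def generate(counts: DefaultDict[str, Counter], max_words: int = 20, seed: int = 0) -> str:
    """Sample words one at a time until the end marker or the length limit."""
    rng = random.Random(seed)
    word, output = "<s>", []
    for _ in range(max_words):
        candidates = counts[word]
        if not candidates:
            break
        word = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
print(next_word_probability(model, "the", "cat"))  # 2 of the 3 words after "the" are "cat"
print(generate(model))
```

The same generate-one-token-at-a-time loop underlies neural language models; only the way the next-token probabilities are computed changes.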


Language Modeling

Persian poetry

https://www.darbare.com/Post/30084

Rumi's Masnavi and Ferdowsi's Shahnameh

Language Modeling

Wikipedia articles

http://karpathy.github.io

Language Modeling

XML

http://karpathy.github.io

Language Modeling

Scientific article

Generative models

Sunspring


Thanks

Up next:

Michael Zock on the tip-of-the-tongue problem!

MZ is not a computer scientist but a psycholinguist who has worked for decades on language production.

His goal is to build computational tools that help people speak and write, whether in their mother tongue or in a foreign language. To achieve this he relies on knowledge from psychology (psycholinguistics and neuroscience) and on engineering skills (NLP).

Those who are interested in more details may take a look at his website:

http://pageperso.lif.univ-mrs.fr/~michael.zock/