PROCESSING NATURAL LANGUAGE
• Getting information out of language
Example
Who made the first electric guitar?
• Query: string of characters
• How to represent meaning in suitable form to find answer?
• How to get from text to that?
• What sort of processes are needed?
• What does system need to know about language?
COURSE STRUCTURE
Each week:
Mon 10:15–12:00  Lecture  (Exactum, D122)
Thurs 9:15–11:00  Assignment help session  (Exactum, B221)
Fri 10:15–12:00  Lecture  (Chemicum, A127)
• Lectures: compulsory
• Assignment session: optional
COURSE OUTLINE
Date  Topic
1   13.1  Introduction to NLP
2   17.1  NLU pipeline and toolkits
3   20.1  Evaluation
4   24.1  Meaning and representations; FS methods
5   27.1  FS methods; statistical NLP
6   31.1  Syntax & parsing
7   3.2   Syntax & parsing
8   7.2   Lexical & distributional semantics

Mark Granroth-Wilding, Leo Leppänen, Lidia Pivovarova
COURSE OUTLINE
Date  Topic
9   10.2  Vector-space models
10  14.2  NLG subtasks & pipeline
11  17.2  NLG evaluation; discourse
12  21.2  Information extraction
13  24.2  Advanced statistical NLP; formal semantics
14  28.2  Semantics and pragmatics; the future

Mark Granroth-Wilding, Leo Leppänen, Lidia Pivovarova
ASSIGNMENTS
• Practical programming assignments
• One each week: released on Mon
• Due following Mon
• TAs available to help: Thursday session
• Includes:
  • Python programming
  • Use of NLP tools
  • Implementing algorithms & statistical models from lectures
  • Analysis of system output/behaviour
  • Relation to theory from lectures
  • Consideration of method uses, limitations, ...
ASSIGNMENTS
• Submit using Moodle
• Important part of learning: compulsory
  • Help session optional
  • Opportunity to ask questions, get help
• Submit code
  • Clean, readable
  • Not marked – for reference only
• Submit answers
  • Marked on 0-5 scale
  • Overall mark: average
FINAL PROJECT
• No exam
• Graded final project submitted after course
• Task: extend one of the assignments
  • Small extension – improvement, application, ...
  • Suggestions provided with assignments
• Submit code and short report (2-3 pages)
  • What you did
  • Why?
  • Did it work? How do you know?
  • What would you do next?
• Due 1 week after course
TEACHING ASSISTANTS
Assistance with assignments will be provided by:
Mark, Leo, Lidia
Eliel, Khalid, Elaine
ASSESSMENT
• Requirements to pass the course:
  • Attend all lectures (speak to me if problematic)
  • Attempt all assignments: >60% mark
  • Submit final project report: >60% mark
• We don’t expect state-of-the-art, amazing systems!
• We do expect you to
  • try everything
  • show understanding of lecture content
COURSE MATERIALS
Course homepage: https://g-w.fi/nlp2020
• Lecture slides
  • Further reading recommendations: end of lectures
• Assignment instructions & data
• Moodle
READING MATERIAL
• Provided at end of each lecture
• Not expected to read everything
• Further explanations of material
• More details
• Further reading to delve deeper
READING MATERIAL
Main course textbook:
Speech and Language Processing, Jurafsky & Martin, 2nd ed.
New draft: https://web.stanford.edu/~jurafsky/slp3/

References to online draft where possible: J&M3
Print version, 2nd edition: J&M2
READING MATERIAL
Foundations of Statistical NLP, Manning & Schütze, 1999. Good reference for statistical topics.
NLP with Python (‘The NLTK book’), Bird, Klein & Loper. https://www.nltk.org/book/
Natural Language Processing, Eisenstein. https://tinyurl.com/eisenstein-nlp
Linguistic Fundamentals of NLP, Bender, 2013. http://tinyurl.com/bender-nlp
PRE-COURSE QUESTIONNAIRE
https://presemo.helsinki.fi/nlp2020
• Quick questionnaire
• Your familiarity with topics
• Not a test! Anonymous
• Fine if all answers are 1! We’ll learn about everything
Rating  Meaning
1       Never heard of it
2       Name familiar
3       Basic familiarity (not much detail)
4       Studied/read about before
5       Studied in detail

Same link: ask questions during lecture
WHY NLP?
Why do we need computers to understand (or generate) human language?

• People expect interactive agents to communicate in NL
  • E.g. dialogue systems
• Huge amount of knowledge encoded in language
  • Hard to find: requires NLP
  • Automatic processing central to AI: knowledge acquisition bottleneck
  • Information extraction (more later)
WHY NLP?
• Search
  • corpora, libraries, medical datasets, ...
• Computational models of human processing
• Tools for studying:
  • language (corpus linguistics)
  • sociology
  • history
  • ...
• Analysing human behaviour
• And much more!
WHY EVEN SIMPLE NLP IS HARD
Example
What is the forecast mean daytime temperature for Kumpula tomorrow?
• Simple: answer in a database!
• No reasoning/computation: just query
Query
SELECT day_mean FROM daily_forecast
WHERE station = 'Helsinki Kumpula'
AND date = '2019-05-21';
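To make this concrete, here is a minimal sketch of how a rule-based system might map this one question pattern to the query above. It is not the course's actual system: the pattern, table, and column names are hypothetical, mirroring the slide's example. Anything outside the exact wording fails, which is precisely the problem the next slides illustrate.

import re
from datetime import date, timedelta

# Hypothetical single-pattern mapper: only this exact phrasing is handled.
PATTERN = re.compile(
    r"what is the forecast mean daytime temperature for (\w+) tomorrow\?",
    re.IGNORECASE,
)

def question_to_sql(question):
    match = PATTERN.match(question.strip())
    if match is None:
        return None  # any paraphrase falls through, unanswered
    station = "Helsinki " + match.group(1).capitalize()
    tomorrow = (date.today() + timedelta(days=1)).isoformat()
    # Parameterised form of the query on the slide
    return ("SELECT day_mean FROM daily_forecast WHERE station = ? AND date = ?",
            (station, tomorrow))

print(question_to_sql(
    "What is the forecast mean daytime temperature for Kumpula tomorrow?"))
print(question_to_sql(
    "What temperature is predicted in Kumpula during the day tomorrow?"))  # None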
WHY EVEN SIMPLE NLP IS HARD
What is the forecast mean daytime temperature for Kumpula tomorrow?

SELECT day_mean FROM daily_forecast
WHERE station = 'Helsinki Kumpula'
AND date = '2019-05-21';

What temperature is predicted in Kumpula during the day tomorrow?
How hot will it be in Arabia tomorrow?

• Many ways to say the same thing
WHY EVEN SIMPLE NLP IS HARD
What is the forecast mean daytime temperature for Kumpula tomorrow?

SELECT day_mean FROM daily_forecast
WHERE station = 'Helsinki Kumpula'
AND date = '2019-05-21';

What is the forecast mean salary for Kumpula tomorrow? → ?
What is the forecast mean salary for CEOs tomorrow? → ?

• Similar utterances mean very different things
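A small illustration of the point (mine, not from the lecture): a naive bag-of-words similarity rates the salary question as very close to the temperature question, while the genuine paraphrase from the previous slide scores lower.

def jaccard(a, b):
    # Word-overlap similarity: |intersection| / |union| of token sets
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

temperature = "What is the forecast mean daytime temperature for Kumpula tomorrow?"
salary = "What is the forecast mean salary for Kumpula tomorrow?"
paraphrase = "What temperature is predicted in Kumpula during the day tomorrow?"

print(jaccard(temperature, salary))      # ~0.73: high overlap, different meaning
print(jaccard(temperature, paraphrase))  # ~0.43: lower overlap, same meaning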
WHY EVEN SIMPLE NLP IS HARD
What is the mean temperature in Kumpula?

SELECT day_mean FROM daily_forecast
WHERE station = 'Helsinki Kumpula'
AND date = '2019-05-21';

SELECT day_mean FROM weekly_forecast
WHERE station = 'Helsinki Kumpula'
AND week = 'w22';

SELECT AVG(day_temp)
FROM weather_history
WHERE station = 'Helsinki Kumpula'
AND year = '2019';

...?

• Ambiguity
  • Many forms
  • Every level/step of analysis
• The big challenge of NLP
EXERCISE
In small groups
• Look at the sentences below
• Assume you:
  • are a computer
  • have a database of logical/factual world knowledge
  • have lots of rules/statistics about English
• What steps are involved in:
  • analysing this textual input?
  • extracting & encoding relevant information?
  • answering the question?
A robotic co-pilot developed under DARPA’s ALIAS programme has already flown a light aircraft.

What agency has created a computer that can pilot a plane?
NATURAL LANGUAGE PROCESSING
[Diagram: language (text, speech) linked to a knowledge representation, via NLU in one direction and NLG in the other]

Natural Language Understanding (NLU)
Natural Language Generation (NLG)
• Mostly different models/algorithms
• Some sharing possible
• Knowledge/meaning repr.: depends on application (more in lecture 4)
• This course: mostly NLU
• NLG : lectures 10 and 11
MACHINE TRANSLATION
[Diagram: Language 1 text → NLU → knowledge representation → NLG → Language 2 text]

Not the standard approach
MACHINE TRANSLATION
[Diagram: the MT pyramid. Direct translation between Language 1 text and Language 2 text at the base; phrases, syntax, semantics, ... at intermediate levels; an interlingua at the apex, reached by NLU on one side and left by NLG on the other]

MT pyramid: a variety of approaches translate at different levels

Large field: no more detail here. Plenty of courses available!
STEPS OF NLU
Task: given a sentence, get some representation a computer can use for question answering

John loves Mary

1. Divide into words (by spaces) [potentially tricky]
2. Identify John and Mary as names [~100k in English...]
3. Recognise main relation loves
4. Identify John as agent, Mary as patient [syntactic rules]
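The four steps could look like the toy sketch below. The name list and the agent-before-verb rule are hypothetical stand-ins for the real lexicons and syntactic rules a proper system would use.

KNOWN_NAMES = {"John", "Mary"}      # real systems: very large name gazetteers
KNOWN_RELATIONS = {"loves"}

def analyse(sentence):
    tokens = sentence.split()                        # 1. split on spaces
    names = [t for t in tokens if t in KNOWN_NAMES]  # 2. name lookup
    relation = next(t for t in tokens if t in KNOWN_RELATIONS)  # 3. relation
    verb_pos = tokens.index(relation)
    # 4. roles via a crude subject-verb-object rule: agent before the verb,
    # patient after it
    agent = [t for t in tokens[:verb_pos] if t in KNOWN_NAMES][0]
    patient = [t for t in tokens[verb_pos:] if t in KNOWN_NAMES][0]
    return {"relation": relation, "agent": agent, "patient": patient}

print(analyse("John loves Mary"))
# {'relation': 'loves', 'agent': 'John', 'patient': 'Mary'}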
STEPS OF NLU
The number of moped-related crimes rose from 827 in 2012 to more than 23,000 last year.
Extra difficulties:
• More difficult to segment words (see the sketch after this list)
• Varied vocabulary
• More complex syntax
• More complex meaning structure
• Vagueness/ambiguity in meaning
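On the segmentation point, compare naive whitespace splitting with a real tokenizer. This assumes NLTK is installed along with its tokenizer data (e.g. via nltk.download('punkt')).

from nltk.tokenize import word_tokenize

sentence = ("The number of moped-related crimes rose from 827 in 2012 "
            "to more than 23,000 last year.")

print(sentence.split())         # whitespace only: 'year.' keeps its full stop
print(word_tokenize(sentence))  # punctuation split off; '23,000' stays one token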
STEPS OF NLU
The list of people unhappy with this decision is decidedly longer and more comprehensive than the list that support the move.
• Reference to earlier context
• Repeated references
• Actual meaning requires inference!
And then... (more tomorrow)
• Ambiguity
• Noise
• Disfluency
• Multiple languages
LESS NATURAL LANGUAGE?
• Why not just write a more natural query language?
Query
Give me the daily mean from the forecast data
for the station called ‘Helsinki Kumpula’
for 21.5.2019.
• Still have to learn specialized language to interact
• Not good for non-expert users
• Natural interaction requires natural language
• Not just interaction
  • Extraction of information from (existing) text
A BRIEF HISTORY OF NLP
1600s Discussion of machine translation (MT), theoretical!
1930s Early proposals for MT using dictionaries
1950 Alan Turing: proposed ‘Turing Test’, depends on NLP
1954 Georgetown-IBM experiment: simple closed-domain MT, some grammatical rules
1957 Noam Chomsky: Syntactic Structures; formal grammars, NLP becomes computable!
A BRIEF HISTORY OF NLP
1960s-70s Algorithms for parsing, semantic reasoning; formal representations of syntax, semantics, logic; hand-written rules
1964 ELIZA: simple dialogue system
1970 SHRDLU: narrow-domain system with NL commands
1970 Augmented Transition Networks: automata for parsing text
1980s More sophisticated parsing, semantics, reasoning. Applications!
A BRIEF HISTORY OF NLP
1990s Statistical methods: statistical models for sub-tasks
1987 Probabilistic n-gram language models
1996 MT: IBM statistical, word-based models
1997 Parsing: probabilistic context-free grammars (PCFGs)
∼1998 Distributional semantics (≈ word embeddings)
1999 Probabilistic (unsupervised) topic models
A BRIEF HISTORY OF NLP
2010s More computing power, more data, more statistics: deep learning, neural networks, Bayesian models, ...
2013 word2vec: word embeddings from lots of data
2014 RNNs for MT
2015 RNNs for NLG
And so on...
WORD FREQUENCIES
Type          Tokens
,             11 341
the            5 792
I              5 087
and            4 708
...
unhappy            5
resolve            5
murderers          5
...
overwhelm          1
lamented           1
insufficient       1
WORD FREQUENCIES
[Plots: token frequency against rank, shown for the top 100, 500, 2,000 and 7,000 ranks, then on a log-log scale. Most frequent: ','. Next: 'the'. A 'long tail' of rare tokens; roughly linear on the log-log scale.]
ZIPF’S LAW
[Plot: token frequency against rank, as on the previous slide]
• Inverse log-log distribution of frequencies
• Power law
• Almost any linguistic phenomenon
• Zipf’s law / Zipfian distribution (see the sketch after this list)
  • A few things are very common
  • Many things are very rare (long tail)
• Many levels of linguistic analysis
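A quick way to see the Zipfian shape on any plain-text corpus: count tokens and check that rank × frequency stays roughly constant. 'corpus.txt' is a placeholder path; point it at any text file.

from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:
    tokens = f.read().lower().split()

counts = Counter(tokens)
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    # Under Zipf's law, freq is roughly proportional to 1/rank,
    # so rank * freq should stay within the same order of magnitude.
    print(f"{rank:>3} {word:<15} {freq:>7}  rank*freq = {rank * freq}")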
ZIPF’S LAW
Difficulty: rare things often carry the most information
the, a, for: frequent, little information
lamented, insufficient: rare, informative
ERRORS
• No tool is perfect!
• Language is ambiguous, variable, noisy: any system (or human) makes errors
• Often not a problem!
  • Corpus-wide statistical analysis
  • Some level of error OK, if reasonably randomly distributed
• Some errors may be more problematic, e.g.:
  • Affect meaning in important ways
  • Consistent across many analyses
ERRORS
• Quantitative evaluation important: lecture 3
• Error analysis: some errors worse than others
• Understand your tool’s weaknesses!
  • Zipf’s law: common things easy
  • Easily overrepresented in evaluation
  • Harder, rarer phenomena more important
CORPORA
Corpus (pl. corpora)
The body of written or spoken material upon which a linguistic analysis is based

• Why do we need corpora?
  • Test linguistic hypotheses (not so much on this course)
  • Evaluate tools: annotated/labelled corpus (evaluation: lecture 3)
  • Train statistical models (statistical NLP: lectures 5 and 13)
• Most often: collection of text
• Other types: speech (audio), video, numeric data, ...
• Often combinations
STATISTICAL MODELS
Why use statistical models for NLP?
• Older systems were rule-based
• Used long-studied linguistic knowledge, but:
  • Lots of rules
  • Complex interactions
  • Narrow domain
  • Hard to handle varied (“incorrect”) and changing language
• Hard to quantify uncertainty/ambiguity of analysis
• Statistics from data can help
SUMMARY
• Why do NLP?
• Some challenges of NLP
• NLU and NLG
• History
• Zipf’s law
• Breaking into subtasks: pipeline
• Corpora and statistical models
Tomorrow:
• Some current bigchallenges
• The NLU pipeline
READING MATERIAL
• J&M2, p. 35-43
• Eisenstein, Introduction (p. 1-10)
ASSIGNMENTS
• Released today, due next Mon (links from Moodle)
• TAs available to help: Thursday session
• Get started now, come and ask questions
• Submit code
  • Clean, readable
  • Not marked – for reference only
• Submit answers
  • Marked on 0-5 scale
  • Overall mark: average