Proseminar Natural Language Processing and the Web: Introduction Summer Semester 2017 Instructor: Michael Wiegand Spoken Language Systems Saarland University, Germany

Before we start:

Do not forget to register for the course

(link now available via course webpage; this is not the LSF-registration)

The deadline is 21st April (tomorrow!)

Outline

• The different relationships between NLP and the Web

• Using the Web to solve NLP-tasks

• Challenges for NLP-methods on the Web

• Using NLP to solve Web-specific problems

• Administrative matters

Properties of the World Wide Web

• The Web is huge.

• The Web is constantly increasing in size.

• It also contains (linguistic) knowledge that is freely accessible.

• Existing (non-web) corpora are comparatively sparse.

• Traditional corpora are missing important aspects of natural language (e.g. informal language).

• NLP should consider the Web as a data source for building applications.

Size of the Web

Web as a Resource

Wikipedia as a means for Word Sense Disambiguation

• Consider the articles for different concepts as sense definitions.

• Use these proxy sense definitions for supervised learning.
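As a rough illustration of the idea above, the following sketch treats each article as one sense label and trains a small Naive Bayes classifier on contexts of an ambiguous word. The sense labels, the training sentences, and the function names are all invented for this example; they are not taken from Wikipedia or from any paper discussed in the course.

```python
import math
from collections import Counter, defaultdict

# Invented toy data: each label stands for one (hypothetical) article title,
# each sentence is a context in which the ambiguous word "bank" occurs.
TRAIN = [
    ("Bank (finance)", "the bank approved the loan and opened an account"),
    ("Bank (finance)", "she deposited money at the bank"),
    ("Bank (river)", "they sat on the bank of the river and fished"),
    ("Bank (river)", "the river bank was muddy after the flood"),
]


def train(examples):
    """Collect per-sense word counts, sense frequencies, and the vocabulary."""
    word_counts = defaultdict(Counter)
    sense_counts = Counter()
    vocab = set()
    for sense, text in examples:
        tokens = text.split()
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def predict(model, text):
    """Return the sense with the highest Naive Bayes score (add-one smoothing)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for tok in text.split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][tok] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


model = train(TRAIN)
print(predict(model, "he fished from the bank of the river"))
```

With real data, the training contexts would come from text linked to the respective articles rather than from hand-written sentences.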

Wiktionary

• Multilingual web-based dictionary

• Collaboratively edited

• Contains entries representing informal language

Crosslingual NLP

[Figure: labeled English training data (+), unlabeled German test data (?), and a classifier trained on the English data]

Typical situation:

• test data in language A (here: German)

• training data in language B (here: English)

Steps:

• Train a classifier on the training data in language B.

• Translate test data from language A to language B.

• Now you can use the classifier in language B.
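The translate-then-classify procedure from the Crosslingual NLP slides can be sketched in a few lines. This is only a toy illustration under strong assumptions: the English training sentences and their labels are invented, the word-by-word dictionary `DE_TO_EN` is a hypothetical stand-in for a real machine translation system, and the classifier is a simple word-overlap counter rather than a properly trained model.

```python
from collections import Counter

# Invented English training data (language B) with polarity labels.
TRAIN_EN = [
    ("good great film", "+"),
    ("great wonderful book", "+"),
    ("bad boring film", "-"),
    ("terrible bad book", "-"),
]

# Hypothetical word-by-word dictionary standing in for a real MT system.
DE_TO_EN = {
    "guter": "good",
    "film": "film",
    "schlechtes": "bad",
    "buch": "book",
    "langweiliger": "boring",
    "wunderbares": "wonderful",
}


def train(examples):
    """Step 1: count how often each word occurs with each label (language B)."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts


def translate(text_de):
    """Step 2: map German test data to English; unknown words are kept as-is."""
    return " ".join(DE_TO_EN.get(word, word) for word in text_de.lower().split())


def classify(counts, text_en):
    """Step 3: pick the label whose training words overlap most with the input."""
    tokens = text_en.split()
    return max(sorted(counts), key=lambda label: sum(counts[label][t] for t in tokens))


model = train(TRAIN_EN)
for sentence in ["Guter Film", "Schlechtes Buch"]:
    print(sentence, "->", classify(model, translate(sentence)))
```

The quality of such a pipeline depends heavily on the translation step, which is why the toy dictionary here should not be mistaken for a realistic component.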

Twitter as a Source for Text Classification

• Lots of informal and subjective language.

• Learn classifiers from Twitter, for instance, an emotion classifier.

Emotion Classification

Crowdsourcing

• Portmanteau of crowd and outsourcing.

• Individuals/organizations use contributions from Web users to obtain services, such as manual annotation.

• Typical services are:

• Amazon Mechanical Turk (AMT)

• Prolific Academic

• CrowdFlower

Challenges for NLP-methods on the Web

• NLP is dependent on tools, such as

• Part-of-speech tagging

• Syntactic parsing

• Semantic role labeling

• Most publicly available NLP tools are typically trained on news corpora.

• The language on the Web may differ significantly from the language contained in news corpora.

Language Examples from the Web

• Boom! Ya ur website suxx bro

(from Eisenstein (2013))

• ikr smh he asked fir yo last name so he can add u on fb lololol

(from Owoputi et al. (2013))

Traditional NLP tools will utterly fail on this type of language; new tools are needed.
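One simple way to see why tools trained on news text struggle here is to count how many tokens of the two example sentences fall outside a standard-language vocabulary. The sketch below is an illustration only: `NEWS_VOCAB` is a tiny, hypothetical stand-in for the lexicon of such a tool, and the function names are made up for this example.

```python
import re

# Hypothetical stand-in for the lexicon of a tool trained on news text.
# A real tagger or parser would know far more words; this set is an assumption.
NEWS_VOCAB = {
    "boom", "website", "he", "asked", "last", "name", "so", "can", "add", "on",
    "your", "for", "you", "sucks", "yes", "brother",
}


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def out_of_vocabulary(text, vocab=NEWS_VOCAB):
    """Return the tokens of `text` that the vocabulary does not contain."""
    return [tok for tok in tokenize(text) if tok not in vocab]


def oov_rate(text, vocab=NEWS_VOCAB):
    """Return the fraction of tokens that are unknown to the vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(out_of_vocabulary(text, vocab)) / len(tokens)


EXAMPLES = [
    "Boom! Ya ur website suxx bro",
    "ikr smh he asked fir yo last name so he can add u on fb lololol",
]

for sentence in EXAMPLES:
    print(out_of_vocabulary(sentence), round(oov_rate(sentence), 2))
```

A high share of unknown tokens is only a crude indicator, but it hints at why a tagger or parser built on standard text has little to work with on such input.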

Deception Detection

• Many hotel reviews are fake (e.g. praising a bad hotel).

• Use NLP to filter this opinion spam.

Hate Speech

• Many social media platforms need to take action against posts representing hate speech.

• NLP may be a suitable method to automatically detect such posts.

Aims of this Course

• Learn something about NLP for the web.

• … from a linguistic point of view.

• Learn to read and understand scientific literature.

• Learn to present a paper.

• Learn to write a term paper.

• Jump in at the deep end:

• No textbook.

• We will read research publications.

• Differences from textbooks:

• Condensed language

• Terminology may vary.

• Basic research (task/evaluation may be a bit abstract)

• There may be things you will have to look up yourself.

• Establish connections between different papers yourself.

How is this course organized?

• Short introduction (given by instructor):

• this week

• next week

• Student presentations:

• From the third week onwards: one presentation every week

Admission Policy

• We only have space for about 10 students.

• We may have more registrations.

• BSc students (from CoLi) are given priority!

• This course will be held in German!

Requirements for Participation

• You should have attended the lectures:

• Einführung in die Computerlinguistik

• Mathematische Grundlagen III

• Some basic knowledge of linguistics (mostly syntax and semantics).

• Some basic knowledge of machine learning and evaluation; we will have a very brief recap on that next week.

Requirements for Passing the Course

• Oral presentation

• Term paper

• Reviewing a paper

• Constant active participation

• You are allowed to be absent without excuse for only one session!

• Oral exam (on demand)

Oral Presentation

• Length: up to 45 minutes (excluding discussion)

• In German or English

• Based on one paper (see course webpage).

• You are welcome to include more related papers!

• We have a discussion after each oral presentation:

• You will have to answer questions!

• Make sure you have really understood the problem discussed in your paper!

• Next week, I will give a few useful hints on how to give a presentation.

• We offer every student an additional (individual) meeting prior to the oral presentation:

• Discuss open questions.

• Feedback on presentation draft

• Arrangement of appointment:

• Send in draft at least 8 days prior to your presentation!

Term Paper (Hausarbeit)

• Write an essay on the paper you orally presented.

• Length: 10-15 pages (tentative!)

• In German or English

• A link to a detailed guide on how to write a term paper is on the course webpage (this guide is binding for this seminar!)

• Use LaTeX (template on webpage)!

• Deadline for term paper: 30th August

Reviewing a Paper

• Review a paper that you do not orally present.

• You are given the main responsibility for the discussion session after the student presentation of that paper:

• Ask questions about the paper.

• Get involved in the discussion.

• Show us that you have understood the paper and thought about the problem/task presented.

Oral Exams (CoLi only!)

• We offer oral exams for this course.

• Remember: passing 3 oral exams is one requirement for the CoLi-BSc (old study regulations, "alte Studienordnung")!

• Topic of exam: another paper from the seminar which you did not orally present.

• Length of exam: 15 minutes

• Please notify the instructors by April 30th in case you want to have an exam.

How is the final mark computed?

• Weighting:

• 40% oral presentation

• 40% term paper

• 20% reviewing

• In case an oral exam is requested:

• 10% oral exam

• 90% everything else, with the internal weighting from above

• If the final "score" is between two grades, course participation will be considered.
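The weighting described on this slide can be written out as a small calculation. The sketch below assumes that every component is already expressed as a number on one common scale; the function name `final_score` and the example values are made up for illustration, and the mapping from score to grade (including the participation tie-break) is not modelled.

```python
def final_score(presentation, term_paper, review, oral_exam=None):
    """Combine the component scores using the weights from the slide.

    Without an oral exam: 40% presentation, 40% term paper, 20% reviewing.
    With an oral exam: 10% exam, 90% the weighted combination above.
    """
    base = 0.4 * presentation + 0.4 * term_paper + 0.2 * review
    if oral_exam is None:
        return base
    return 0.1 * oral_exam + 0.9 * base


# Invented example values, only to show how the weights interact.
print(final_score(1.0, 2.0, 3.0))
print(final_score(1.0, 2.0, 3.0, oral_exam=1.0))
```

Note that with an oral exam the effective weights become 36% presentation, 36% term paper, 18% reviewing, and 10% exam.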

Timeline

• Next session:

• Recap on machine learning and evaluation

• How to present a paper

• From then onwards:

• One student presentation per week.

• Schedule on webpage.

What to do next?

• Register for the course as soon as possible.

• Tell me which paper you want to present.

• Give a list of 5 papers sorted by priority

• deadline: 22nd April

If you have further questions

• Ask now!

• Another good point in time: at the end of each session.

• Don't try to discuss complex issues via email!

• Ask for an appointment instead.

References

• [Eisenstein, 2013]:

What to do about bad language on the internet.

J. Eisenstein. Proceedings of NAACL, 2013.

• [Owoputi et al., 2013]:

Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters.

O. Owoputi, B. O’Connor, C. Dyer, K. Gimpel, N. Schneider, N. A. Smith. Proceedings of NAACL, 2013.