Introduction to Finite State Morphology (FSM)

Computational Linguistics, Session 1

Annette Hautli

October 26th, 2012




Outline

1 Introduction to CL

2 This course

3 Introduction to Finite State Morphology (FSM)


What is CL?

Computational Linguistics (CL)

CL is the science of making computers understand natural language. Computational linguists are interested in providing computational models for various kinds of linguistic phenomena. These models may be knowledge-based (“hand-crafted”) or data-driven (“statistical” or “empirical”).


Challenges for CL

1. Categorial ambiguity: A word can belong to more than one syntactic category.

Task #1

Where do problems with categorial ambiguity arise?

(1) a. I saw her duck.
    b. Time flies like an arrow.
    c. Cleaning fluids can be dangerous.
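
To make the problem in (1a) concrete, here is a minimal sketch of a toy lexicon in which a word form can carry several categories. The entries and the tag names (PRON, DET, N, V) are assumptions chosen for illustration, not part of the course material.

```python
# Toy lexicon: each word form maps to the set of syntactic categories it
# may belong to. Entries and tag names are illustrative assumptions.
LEXICON = {
    "i": {"PRON"},
    "saw": {"V", "N"},
    "her": {"PRON", "DET"},
    "duck": {"N", "V"},
}


def tokenize(sentence):
    """Lowercase the sentence, drop the final period, split on whitespace."""
    return sentence.lower().rstrip(".").split()


def ambiguous_words(sentence):
    """Return the words that have more than one category in the lexicon."""
    return [w for w in tokenize(sentence) if len(LEXICON.get(w, set())) > 1]


def count_tag_sequences(sentence):
    """Count the category sequences a parser would have to consider."""
    total = 1
    for w in tokenize(sentence):
        total *= max(1, len(LEXICON.get(w, set())))
    return total


print(ambiguous_words("I saw her duck."))
print(count_tag_sequences("I saw her duck."))
```

Even this four-word sentence leaves several category sequences open, which is why a parser needs more than a word list to pick the intended reading.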


2. Syntactic ambiguity: Whole phrases, typically prepositional phrases, can attach at more than one position in a sentence (attachment ambiguity).

Task #2

Where do you detect syntactic ambiguities? Draw trees for each of the interpretations.

(2) a. Students hate annoying professors.
    b. I saw the man with the telescope.


3. Semantic ambiguity: A clause can be interpreted in more than one way.

Task #3

Where can you detect semantic ambiguity?

(3) a. The chickens are too hot to eat.
    b. They don’t smoke or drink.


4. Real-world knowledge: Human knowledge about the concepts in the world and their characteristics.

Task #4

Where will an automatic CL system have problems when parsing the following sentences? Why?

(4) Put the paper in the printer. Then switch it on.


Research topics in CL

Searching large corpora for patterns of use and linguistic examples

Developing syntactic and semantic parsers

Automatic morphological analysis of words

Employing real-world knowledge

Anaphora resolution

Word sense disambiguation


Areas in CL

Computational linguistics is employed in the following areas:

Speech recognition

Machine translation

Information retrieval

Question-answering

Spell and grammar checkers

OCR (Optical Character Recognition)


Where do we need morphology?

English is a language with comparatively few morphological challenges.

But even then:

I saw her duck.
Time flies like an arrow.


Example: The English XLE grammar

Industry-standard deep syntactic parser:

Text breaker (FST)
↓
Tokenizer & Morphologies (FST)
↓
Syntax (XLE LFG)
↓
Semantics (XFR ordered rewriting)
↓
Abstract Knowledge Representation (XFR ordered rewriting)


This course

What we will do:

Learn the basics of computational linguistics, e.g. regular expressions

Take basic steps into programming

Build an automatic morphological analyzer for the language of your choice

Make use of your linguistic knowledge

Contribute an important part to your computational linguistics studies


Advice

Practice programming as much as possible

Read the literature before class (can be downloaded from ILIAS)

Beesley, Kenneth and Lauri Karttunen. 2003. Finite State Morphology.

Try to solve the assignments on your own

Come to the optional help session if you have problems

Ask if things are unclear!


Requirements

Regular attendance, at most two absences

Final grade is composed of the following parts:

Weekly exercises (30%)
Three quizzes (30%)
Final project (40%)

One joker exercise (not graded)

Final project

Solve a morphological challenge for the language of your choice
Maximum of two people per group
Presentation in the last session


Questions?


The light switch


The car fan


Introduction to FSM

Some unavoidable terminology:

Finite state morphologies are finite state networks

State: Substances or people are said to be in one state or another, e.g. a light switch is either on or off
Finite: The number of states is limited and can be fully specified
Network: Networks are graph-like structures of nodes linked together by arcs
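
The three terms can be illustrated with the light switch. The following is a minimal sketch; the state names, the input symbol "press" and the function name are assumptions made for this example.

```python
# A finite state network for a light switch.
# States are the nodes, ARCS are the labelled links between them.
# All names here are illustrative assumptions.
STATES = {"off", "on"}
ARCS = {
    ("off", "press"): "on",
    ("on", "press"): "off",
}


def run(start, inputs):
    """Follow one arc per input symbol and return the state reached."""
    state = start
    for symbol in inputs:
        state = ARCS[(state, symbol)]
    return state


print(run("off", ["press", "press", "press"]))
```

The network is finite because STATES can be listed in full, and it is a network because every move from one state to another follows a labelled arc.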


The Coke machine

Task #5

Imagine a coke costs 25 cents. Build a machine that accepts nickels (N = 5), dimes (D = 10) and quarters (Q = 25) in any order and then returns a coke.
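
One possible solution, sketched in code rather than as a drawing. It assumes that the machine gives no change, so only coin sequences that add up to exactly 25 cents are accepted; the states are named after the amount paid so far. These choices, and all function names, are assumptions.

```python
COINS = {"N": 5, "D": 10, "Q": 25}
PRICE = 25


def build_transitions():
    """Map (state, coin) to the next state; a state is the cents paid so far."""
    transitions = {}
    for paid in range(0, PRICE, 5):
        for coin, value in COINS.items():
            # Assumption: no change is given, so arcs that would
            # overshoot the price simply do not exist.
            if paid + value <= PRICE:
                transitions[(paid, coin)] = paid + value
    return transitions


TRANSITIONS = build_transitions()


def accepts(coins):
    """Return True if the coin sequence leads from state 0 to the final state."""
    state = 0
    for coin in coins:
        if (state, coin) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, coin)]
    return state == PRICE


for attempt in ["Q", "NDD", "DDD", "ND"]:
    print(attempt, accepts(attempt))
```

The machine has six states (0, 5, 10, 15, 20 and 25 cents), with 0 as the start state and 25 as the only final state.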


The connection to language

If the inputs to the coke machine are taken to be letter symbols, then:

The set of valid symbols the machine accepts is its alphabet.

The sequences of symbols are words.

The entire set of words is the language.
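
Read this way, the language of the coke machine can be listed in full. The sketch below assumes exact payment of 25 cents with no change given; the function name is hypothetical.

```python
# The alphabet: the valid symbols and the value each one stands for.
ALPHABET = {"N": 5, "D": 10, "Q": 25}


def language(target=25, prefix=""):
    """Yield every word (coin sequence) whose values add up to `target`."""
    if target == 0:
        yield prefix
        return
    for symbol, value in ALPHABET.items():
        if value <= target:
            yield from language(target - value, prefix + symbol)


WORDS = sorted(language())
print(len(WORDS), WORDS)
```

Every string in WORDS is a word of the language, and a string such as DDD is built from the alphabet but does not belong to it.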


Linguistic examples

A one-word language

The network has a start state, a final state and non-final states

The machine moves through a series of states, ending up in the final state


A three-word language

Expand this network to three million tokens → spell checker


Analysis of mesa

Enter a word to check whether the network contains it → lookup

lookup doesn’t know anything about language: it is simple matching
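
A minimal sketch of such a lookup over a small network. The word list is an assumption (mesa and canto appear on these slides, tigre is added to get three words), and the network is built as a simple prefix tree, so it is not necessarily the smallest network for this language.

```python
def build_network(words):
    """Build a network as ({state: {symbol: next_state}}, final_states)."""
    arcs = {0: {}}  # state 0 is the start state
    finals = set()
    for word in words:
        state = 0
        for symbol in word:
            if symbol not in arcs[state]:
                new_state = len(arcs)
                arcs[new_state] = {}
                arcs[state][symbol] = new_state
            state = arcs[state][symbol]
        finals.add(state)
    return arcs, finals


def lookup(network, word):
    """Pure symbol matching: follow the arcs and check for a final state."""
    arcs, finals = network
    state = 0
    for symbol in word:
        if symbol not in arcs[state]:
            return False
        state = arcs[state][symbol]
    return state in finals


# Assumed example words for the three-word language.
NETWORK = build_network(["canto", "mesa", "tigre"])

for word in ["mesa", "mes", "mesas"]:
    print(word, lookup(NETWORK, word))
```

Nothing in lookup refers to Spanish or to morphology: the same function works for any word list, which is what makes the approach reusable as a spell checker.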


A two-level transducer

Match the input symbols against the lower-side symbols on the arcs and find a path from the start state to the final state.

If successful, return the string of upper-side symbols on the path.

If the analysis is not successful, return nothing.


One path in a morphological analyzer for Spanish: canto

Spanish verb with the base form cantar

The verb is conjugated in the present indicative, first person singular
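
The path for canto can be sketched as a list of upper:lower symbol pairs and analyzed with the two-level lookup procedure (match the lower side, return the upper side). The tag names +Verb, +PresInd, +1P and +Sg and the exact pairing of symbols are assumptions in the style of Beesley and Karttunen (2003); the function names are hypothetical.

```python
# One path as (upper, lower) pairs; "" stands for the empty string.
# Tag names and the symbol alignment are illustrative assumptions.
PATH = [
    ("c", "c"), ("a", "a"), ("n", "n"), ("t", "t"),
    ("a", "o"), ("r", ""),
    ("+Verb", ""), ("+PresInd", ""), ("+1P", ""), ("+Sg", ""),
]


def build_path(pairs):
    """Turn a list of pairs into a transducer with one path."""
    arcs = {i: [(up, low, i + 1)] for i, (up, low) in enumerate(pairs)}
    arcs[len(pairs)] = []
    return arcs, {len(pairs)}


def analyze(transducer, word):
    """Match `word` against lower-side symbols; return upper-side strings."""
    arcs, finals = transducer
    results = []
    stack = [(0, 0, "")]  # (state, position in word, upper string so far)
    while stack:
        state, pos, upper = stack.pop()
        if state in finals and pos == len(word):
            results.append(upper)
        for up, low, nxt in arcs[state]:
            if low == "":
                # Empty lower side: move on without consuming input.
                # (A network with empty-lower loops would need a guard.)
                stack.append((nxt, pos, upper + up))
            elif word.startswith(low, pos):
                stack.append((nxt, pos + len(low), upper + up))
    return results


CANTO = build_path(PATH)
print(analyze(CANTO, "canto"))
print(analyze(CANTO, "canta"))
```

An unsuccessful analysis returns an empty list, which corresponds to returning nothing.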


Analyzing canto


Wrap-up

Automatic morphological analyzers play an important part in computational linguistics

Finite state morphology is an efficient way to encode linguistic information

Flexible system that allows for different languages, programming tastes and application areas

Next session: first implementations
