Introduction to cl
This course
Introduction to Finite State Morphology (fsm)
Computational Linguistics
Session 1
Annette Hautli
October 26th, 2012
Outline
1 Introduction to cl
2 This course
3 Introduction to Finite State Morphology (fsm)
What is cl?
Computational Linguistics (cl)
cl is the science of making computers understand natural language. Computational linguists are interested in providing computational models for various kinds of linguistic phenomena. These models may be knowledge-based (“hand-crafted”) or data-driven (“statistical” or “empirical”).
Challenges for cl
1. Categorial ambiguity: A word can belong to more than one syntactic category.
Task #1
Where do problems with categorial ambiguity arise?
(1) a. I saw her duck.
    b. Time flies like an arrow.
    c. Cleaning fluids can be dangerous.
Challenges for cl
2. Syntactic ambiguity: Whole phrases, typically prepositional phrases, can attach to more than one position in a sentence (attachment ambiguity).
Task #2
Where do you detect syntactic ambiguities? Draw trees for each of the interpretations.
(2) a. Students hate annoying professors.
    b. I saw the man with the telescope.
Challenges for cl
3. Semantic ambiguity: Semantic ambiguity arises when a clause can be interpreted in more than one way.
Task #3
Where can you detect semantic ambiguity?
(3) a. The chickens are too hot to eat.
    b. They don’t smoke or drink.
Challenges for cl
4. Real-world knowledge: human knowledge about the concepts in the world and their characteristics.
Task #4
Where will an automatic cl system have problems when parsing the following sentences? Why?
(4) Put the paper in the printer. Then switch it on.
Research topics in cl
Searching large corpora for patterns of use and linguistic examples
Developing syntactic and semantic parsers
Automatic morphological analysis of words
Employing real-world knowledge
Anaphora resolution
Word sense disambiguation
Areas in cl
Computational linguistics is employed in the following areas:
Speech recognition
Machine translation
Information retrieval
Question-answering
Spell and grammar checkers
OCR (Optical Character Recognition)
Where do we need morphology?
English is a language with comparatively few morphological challenges
But even then:
I saw her duck.
Time flies like an arrow.
Example: The English xle grammar
Industry-standard deep syntactic parser:
Text breaker (fst)
↓
Tokenizer & Morphologies (fst)
↓
Syntax (xle lfg)
↓
Semantics (xfr ordered rewriting)
↓
Abstract Knowledge Representation (xfr ordered rewriting)
This course
What we will do:
Learn the basics of computational linguistics, e.g. regular expressions
Basic steps into programming
Build an automatic morphological analyzer for the language of your choice
Make use of your linguistic knowledge
Contribute an important part to your computational linguistics studies
This course
Advice
Practice programming as much as possible
Read the literature before class (can be downloaded from ILIAS)
Beesley, Kenneth and Lauri Karttunen. 2003. Finite State Morphology.
Try to solve the assignments on your own
Come to the optional help session if you have problems
Ask if things are unclear!
Requirements
Regular attendance (at most two absences)
Final grade is composed of the following parts:
Weekly exercises (30%)
Three quizzes (30%)
Final project (40%)
One joker exercise (not graded)
Final project
Solve a morphological challenge for the language of your choice
Maximum of two people per group
Presentation in the last session
Questions?
The light switch
The car fan
Introduction to fsm
Some unavoidable terminology:
Finite state morphologies are finite state networks
State: Substances or people are said to be in one state or another, e.g. a light switch is either on or off
Finite: The number of states is limited and can be listed exhaustively
Network: Networks are graph-like structures of nodes linked together by arcs
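The three terms can be made concrete with the light switch. The following is a minimal sketch, not part of the course material: the state names, the "press" label and the `run` helper are assumptions chosen for illustration.

```python
# Hypothetical two-state network for a light switch.
# States are the nodes; each arc is labelled with the action "press".
ARCS = {
    ("off", "press"): "on",
    ("on", "press"): "off",
}


def run(start, actions):
    """Follow one arc per action and return the state that is reached."""
    state = start
    for action in actions:
        state = ARCS[(state, action)]
    return state


print(run("off", ["press"]))           # on
print(run("off", ["press", "press"]))  # off
```

The network is finite because `ARCS` mentions only two states, and it is a network because the dictionary is simply a list of labelled arcs between nodes.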
The Coke machine
Task #5
Imagine a coke costs 25 cents. Build a machine that accepts nickels (N = 5), dimes (D = 10) and quarters (Q = 25) in any order and then returns a coke.
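One possible solution to the task can be sketched in code. It assumes that each state records the amount inserted so far and that the machine gives no change, so a coin that would push the total above 25 cents has no arc and the sequence is rejected. The names `build_arcs` and `accepts` are made up for this sketch.

```python
COIN_VALUES = {"N": 5, "D": 10, "Q": 25}
PRICE = 25


def build_arcs():
    """Create arcs (amount so far, coin) -> new amount, never exceeding PRICE."""
    arcs = {}
    for amount in range(0, PRICE, 5):
        for coin, value in COIN_VALUES.items():
            if amount + value <= PRICE:
                arcs[(amount, coin)] = amount + value
    return arcs


ARCS = build_arcs()


def accepts(coins):
    """Return True if the coin sequence leads from state 0 to the final state 25."""
    state = 0
    for coin in coins:
        if (state, coin) not in ARCS:
            return False  # no arc: unknown coin or too much money
        state = ARCS[(state, coin)]
    return state == PRICE


print(accepts("Q"))    # True
print(accepts("DDN"))  # True
print(accepts("DN"))   # False, only 15 cents
```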
The connection to language
If the inputs to the coke machine are taken to be letter symbols,then:
The set of valid symbols the machine accepts is its alphabet.
The sequences of symbols are words.
The entire set of words is the language.
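For the coke machine, these three notions can be spelled out by enumeration. The sketch below assumes the no-change version of the machine (exactly 25 cents must be inserted); the generator `words` is a hypothetical helper.

```python
COIN_VALUES = {"N": 5, "D": 10, "Q": 25}


def words(remaining=25, prefix=""):
    """Yield every coin sequence whose values add up to exactly 25 cents."""
    if remaining == 0:
        yield prefix
        return
    for coin, value in COIN_VALUES.items():
        if value <= remaining:
            yield from words(remaining - value, prefix + coin)


alphabet = set(COIN_VALUES)   # the valid symbols: N, D, Q
language = sorted(words())    # the entire set of words

print(sorted(alphabet))
print(len(language))          # 9
print(language)
```

Under this assumption the language is finite: it contains nine words, among them `Q`, `DDN` and `NNNNN`.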
Linguistic examples
A one-word language
The network has a start state, a final state and non-final states
The machine moves through a series of states, ending up in the final state
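A one-word network can be sketched as a simple chain of states. The word shown on the slide is not recoverable here, so the sketch assumes the word canto, which appears later in this session; `accepts` is a made-up helper name.

```python
WORD = "canto"  # assumed example word

# One arc per letter: state i --letter--> state i + 1
ARCS = {(i, symbol): i + 1 for i, symbol in enumerate(WORD)}
START = 0
FINAL = len(WORD)  # every state before this one is non-final


def accepts(string):
    """Return True only if the string leads from the start to the final state."""
    state = START
    for symbol in string:
        state = ARCS.get((state, symbol))
        if state is None:
            return False  # no matching arc
    return state == FINAL


print(accepts("canto"))  # True
print(accepts("cant"))   # False, stops in a non-final state
```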
Linguistic examples
A three-word language
Expand this network to three million tokens → spell checker
Linguistic examples
Analysis of mesa
Enter a word to check whether the network contains it → lookup
lookup doesn’t know anything about language: it is simple matching
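Lookup in a small word network can be sketched as follows. The word list (canto, mesa, tigre) is an assumption, since only mesa is named on the slide, and `build_network` and `lookup` are hypothetical helpers. The sketch shares common prefixes only and makes no attempt to minimize the network.

```python
def build_network(words):
    """Build arcs (state, symbol) -> state and the set of final states."""
    arcs = {}
    finals = set()
    next_state = 1  # state 0 is the start state
    for word in words:
        state = 0
        for symbol in word:
            if (state, symbol) not in arcs:
                arcs[(state, symbol)] = next_state
                next_state += 1
            state = arcs[(state, symbol)]
        finals.add(state)
    return arcs, finals


def lookup(arcs, finals, string):
    """Pure symbol matching: is there a path for the string ending in a final state?"""
    state = 0
    for symbol in string:
        if (state, symbol) not in arcs:
            return False
        state = arcs[(state, symbol)]
    return state in finals


ARCS, FINALS = build_network(["canto", "mesa", "tigre"])
print(lookup(ARCS, FINALS, "mesa"))  # True
print(lookup(ARCS, FINALS, "mes"))   # False
```

Nothing in `lookup` refers to Spanish or to morphology. With a much larger word list, the same matching step behaves like a simple spell checker.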
Linguistic examples
A two-level transducer
Match the input symbols against the lower-side symbols on the arcs and find a path from the start state to the final state.
If successful, return the string of upper-side symbols on the path.
If the analysis is not successful, return nothing.
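The three steps above can be sketched with a tiny transducer. Everything specific in the sketch is an assumption made for illustration: the example word mesa, the tags `+Noun`, `+Fem` and `+Sg`, the arc table and the `lookup` function. An empty lower-side symbol stands for an arc that consumes no input.

```python
EPSILON = ""  # lower-side symbol that consumes no input

# state -> list of (upper symbol, lower symbol, target state)
ARCS = {
    0: [("m", "m", 1)],
    1: [("e", "e", 2)],
    2: [("s", "s", 3)],
    3: [("a", "a", 4)],
    4: [("+Noun", EPSILON, 5)],
    5: [("+Fem", EPSILON, 6)],
    6: [("+Sg", EPSILON, 7)],
}
FINAL_STATES = {7}


def lookup(word, state=0, output=""):
    """Return all upper-side strings on paths whose lower side spells `word`."""
    results = []
    if not word and state in FINAL_STATES:
        results.append(output)
    for upper, lower, target in ARCS.get(state, []):
        if lower == EPSILON:
            # follow the arc without consuming input
            results.extend(lookup(word, target, output + upper))
        elif word.startswith(lower):
            # consume the matched symbol and continue
            results.extend(lookup(word[len(lower):], target, output + upper))
    return results


print(lookup("mesa"))  # ['mesa+Noun+Fem+Sg']
print(lookup("mes"))   # []
```

An empty list corresponds to "return nothing". Returning a list rather than a single string leaves room for words with more than one analysis.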
Linguistic examples
One path in a morphological analyzer for Spanish — canto
Spanish verb with the base form cantar
The verb is conjugated in the present indicative, first person, singular
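Such a path can be written down as a sequence of upper:lower symbol pairs. The sketch below is an assumption about what the path might look like: the tag names `+Verb`, `+PresInd`, `+1P` and `+Sg` and the exact alignment of symbols are illustrative, not taken from the slide.

```python
EPSILON = ""  # empty symbol on one side of an arc

# Each pair is (upper-side symbol, lower-side symbol) on one arc of the path.
PATH = [
    ("c", "c"),
    ("a", "a"),
    ("n", "n"),
    ("t", "t"),
    ("a", "o"),
    ("r", EPSILON),
    ("+Verb", EPSILON),
    ("+PresInd", EPSILON),
    ("+1P", EPSILON),
    ("+Sg", EPSILON),
]

upper = "".join(u for u, _ in PATH)  # the analysis
lower = "".join(l for _, l in PATH)  # the surface word

print(lower)  # canto
print(upper)  # cantar+Verb+PresInd+1P+Sg
```

Reading the lower side gives the word as written, and reading the upper side gives the base form cantar followed by the tags for the properties listed above.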
Linguistic examples
Analyzing canto
Wrap-up
Automatic morphological analyzers play an important part in computational linguistics
Finite state morphology is an efficient way to encode linguistic information
Flexible system that allows for different languages, programming tastes and application areas
Next session: first implementations