Formal Models in NLP: Finite-State Automata
Nina Seemann
Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung
Pfaffenwaldring 5b, 70569 Stuttgart
May 15, 2012
Nina Seemann (IMS) Formal Models in NLP: Finite-State Automata May 15, 2012 1
Outline
1 Finite-State Automata: Characterization
   Finite-State Acceptors
   Finite-State Transducers
2 Closure Properties of Finite-State Acceptors
3 Closure Properties of Finite-State Transducers
4 Equivalence Transformations on Finite-State Acceptors
Finite-State Acceptors
Example (NFA A_lex accepting some animal names)
Finite-State Acceptors: Non-Deterministic Finite-State Acceptor
Definition (Non-deterministic finite-state acceptor (NFA))
A non-deterministic finite-state acceptor A is a 5-tuple (Q, Σ, q0, F, δ) where
Q is a finite set of states
Σ is a finite alphabet
q0 ∈ Q is the start state
F ⊆ Q is a set of final states
δ : Q × (Σ ∪ {ε}) → 2^Q is the transition function
Nondeterminism refers to the fact that an NFA can be in several states at once.
A transition may be labeled with ε.
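The 5-tuple can be mirrored directly in code. The Python sketch below is an illustration only: ε is encoded as the empty string, δ is a dictionary from (state, symbol) pairs to sets of states (a missing key stands for ∅), and the toy automaton for cat/cats is hypothetical, not A_lex. "Being in several states at once" is simulated by tracking the whole set of current states and closing it under ε-transitions after every step.

```python
from dataclasses import dataclass, field

EPS = ""  # assumption of this sketch: ε is encoded as the empty string


@dataclass
class NFA:
    states: set
    alphabet: set
    start: object
    finals: set
    # δ: (state, symbol or EPS) -> set of successor states; missing key = ∅
    delta: dict = field(default_factory=dict)

    def eps_closure(self, states):
        """All states reachable from `states` using only ε-transitions."""
        closure, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for r in self.delta.get((q, EPS), set()):
                if r not in closure:
                    closure.add(r)
                    stack.append(r)
        return closure

    def accepts(self, word):
        # The NFA is "in several states at once": track the whole set.
        current = self.eps_closure({self.start})
        for a in word:
            successors = set()
            for q in current:
                successors |= self.delta.get((q, a), set())
            current = self.eps_closure(successors)
        return bool(current & self.finals)


# Hypothetical toy NFA for "cat" and "cats" (the "s" branch is entered via ε).
toy = NFA(
    states={0, 1, 2, 3, 4, 5},
    alphabet=set("cats"),
    start=0,
    finals={3, 5},
    delta={(0, "c"): {1}, (1, "a"): {2}, (2, "t"): {3},
           (3, EPS): {4}, (4, "s"): {5}},
)
```

For example, `toy.accepts("cats")` is true, while `toy.accepts("ca")` is false because no final state is reached.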
Finite-State Acceptors: Deterministic Finite-State Acceptor
Definition (Deterministic finite-state acceptor (DFA))
A deterministic finite-state acceptor D is a 5-tuple (Q, Σ, q0, F, δ) where
Q is a finite set of states
Σ is a finite set, called the alphabet
q0 ∈ Q is the start state
F ⊆ Q is a set of final states
δ : Q × Σ → Q is the transition function
Determinism refers to the fact that, on each input symbol, a DFA moves to exactly one state.
DFAs are ε-free by definition.
DFAs and NFAs have the same expressive power, i.e. they are equivalent: every NFA can be converted into a DFA that accepts the same language.
Finite-State Acceptors
Example (DFA D_lex accepting some animal names)
Finite-State Acceptors: Extended Transition Function & Language
Definition (Extended transition function δ̂)
δ̂ describes what happens when we start in any state and follow any sequence of inputs.
δ̂(q, ε) = q.
δ̂(q, w) = δ(δ̂(q, x), a) for w = xa, where x ∈ Σ* and a ∈ Σ.
Definition (Language of a DFA A)
L(A) = {w ∈ Σ* | δ̂(q0, w) ∈ F}. We also say that L(A) is recognized by A.
Definition (Regular language)
A language is called regular if there exists some DFA which recognizes it.
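Because δ is a total function, δ̂ can be computed with a left-to-right loop, and membership in L(A) is a single check against F. A minimal Python sketch, assuming δ is stored as a dictionary with one entry per (state, symbol) pair; the automaton accepting words with an even number of a's is a hypothetical toy, not one of the lecture's examples.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: set
    alphabet: set
    start: object
    finals: set
    # δ: (state, symbol) -> exactly one state (δ is total, so every key exists
    # for symbols of the alphabet; other symbols raise KeyError)
    delta: dict

    def delta_hat(self, q, word):
        """Extended transition function, unfolded left to right:
        δ̂(q, ε) = q and δ̂(q, xa) = δ(δ̂(q, x), a)."""
        for a in word:
            q = self.delta[(q, a)]
        return q

    def accepts(self, word):
        # w ∈ L(A)  iff  δ̂(q0, w) ∈ F
        return self.delta_hat(self.start, word) in self.finals


# Hypothetical total DFA over {a, b}: even number of a's.
even_a = DFA(
    states={"even", "odd"},
    alphabet={"a", "b"},
    start="even",
    finals={"even"},
    delta={("even", "a"): "odd", ("odd", "a"): "even",
           ("even", "b"): "even", ("odd", "b"): "odd"},
)
```

Since L(even_a) is recognized by a DFA, it is regular in the sense of the definition above.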
Finite-State Acceptors: Extended Transition Function for DFA
Example (frog in DFA D_lex)
Claim: δ̂(0, frog) ∈ {26, 24, 22, 13, 11, 9, 8}, i.e. reading frog from state 0 ends in a final state of D_lex
δ̂(0, ε) = 0
δ̂(0, f) = δ(δ̂(0, ε), f) = δ(0, f) = 3
δ̂(0, fr) = δ(δ̂(0, f), r) = δ(3, r) = 6
δ̂(0, fro) = δ(δ̂(0, fr), o) = δ(6, o) = 7
δ̂(0, frog) = δ(δ̂(0, fro), g) = δ(7, g) = 8
Since 8 is in this set, frog ∈ L(D_lex).
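The recursive definition can be transcribed almost literally. This sketch encodes only the fragment of D_lex that is visible in the trace (states 0, 3, 6, 7, 8); the name DELTA and the convention of returning None for an undefined transition are choices of this illustration, not part of the lecture.

```python
# Only the four transitions used in the trace; the rest of D_lex is not
# reproduced here, so every other (state, symbol) pair is undefined.
DELTA = {(0, "f"): 3, (3, "r"): 6, (6, "o"): 7, (7, "g"): 8}
FINALS = {26, 24, 22, 13, 11, 9, 8}


def delta_hat(q, w):
    """Recursive extended transition function; None if the run blocks."""
    if w == "":                  # δ̂(q, ε) = q
        return q
    x, a = w[:-1], w[-1]         # w = xa
    p = delta_hat(q, x)
    if p is None:
        return None
    return DELTA.get((p, a))     # δ(δ̂(q, x), a)


# δ̂(0, ·) for every prefix of "frog", mirroring the slide line by line
trace = [delta_hat(0, "frog"[:i]) for i in range(len("frog") + 1)]
```

Here `trace` reproduces the values 0, 3, 6, 7, 8 from the slide, and `delta_hat(0, "frog") in FINALS` holds.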
Finite-State Acceptors: Extended Transition Function for NFA
Example (frog in NFA A_lex)
Claim: δ̂(31, frog) ∩ {2, 6, 9, 13, 18, 21, 30} ≠ ∅
For an NFA, δ̂ yields a set of states, and δ is applied to a set S of states elementwise: δ(S, a) = ⋃_{p∈S} δ(p, a).
δ̂(31, ε) = {31}
δ̂(31, f) = δ(δ̂(31, ε), f) = δ(31, f) = {3, 7, 10}
δ̂(31, fr) = δ(δ̂(31, f), r) = δ(3, r) ∪ δ(7, r) ∪ δ(10, r) = {4} ∪ ∅ ∪ ∅ = {4}
δ̂(31, fro) = δ(δ̂(31, fr), o) = δ(4, o) = {5}
δ̂(31, frog) = δ(δ̂(31, fro), g) = δ(5, g) = {6}
Since 6 is in this set, frog ∈ L(A_lex).
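The set-valued version differs from the DFA case only in the union over the current states. As before, this hypothetical sketch encodes just the fragment of A_lex that appears in the trace; no ε-closure is applied because the trace uses no ε-transitions.

```python
# Fragment of A_lex: only the transitions that appear in the trace.
# Missing keys (e.g. (7, "r") and (10, "r")) stand for the empty set.
DELTA = {(31, "f"): {3, 7, 10}, (3, "r"): {4}, (4, "o"): {5}, (5, "g"): {6}}
FINALS = {2, 6, 9, 13, 18, 21, 30}


def delta_hat(q, w):
    """δ̂ for an ε-free NFA: δ̂(q, xa) = union of δ(p, a) over p ∈ δ̂(q, x)."""
    if w == "":
        return {q}
    x, a = w[:-1], w[-1]
    result = set()
    for p in delta_hat(q, x):
        result |= DELTA.get((p, a), set())
    return result


trace = [delta_hat(31, "frog"[:i]) for i in range(len("frog") + 1)]
accepted = bool(delta_hat(31, "frog") & FINALS)
```

The branches through states 7 and 10 die out after "fr", exactly as the ∅ terms on the slide indicate.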
Finite-State Transducers: Definition
Definition ((Non-deterministic) finite-state transducer (NFST))
A (non-deterministic) finite-state transducer T is a 7-tuple (Q, Σ, ∆, q0, F, δ, σ) where
Q is a finite set of states
Σ is the input alphabet of T
∆ is the output alphabet of T
q0 ∈ Q is the start state
F ⊆ Q is a set of final states
δ : Q × (Σ ∪ {ε}) → 2^Q is the transition function
σ : Q × (Σ ∪ {ε}) × Q → ∆* is the output function
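Running an NFST means exploring all paths that consume the input while concatenating the σ-outputs along the way. A minimal Python sketch, assuming ε is the empty string, δ and σ are dictionaries, and there is no cycle of ε-transitions (otherwise the search below would not terminate); the cat/cats transducer is a hypothetical toy, not T_lex.

```python
EPS = ""  # assumption: ε is the empty string


def transduce(delta, sigma, start, finals, word):
    """Return every output string T can produce while accepting `word`.

    delta: (state, symbol or EPS) -> set of states          (δ)
    sigma: (state, symbol or EPS, state) -> output string   (σ)
    Assumes there is no cycle of ε-transitions, so the search terminates.
    """
    results = set()
    stack = [(start, 0, "")]  # (state, input position, output so far)
    while stack:
        q, i, out = stack.pop()
        if i == len(word) and q in finals:
            results.add(out)
        for r in delta.get((q, EPS), set()):          # ε-input moves
            stack.append((r, i, out + sigma[(q, EPS, r)]))
        if i < len(word):                             # consume word[i]
            a = word[i]
            for r in delta.get((q, a), set()):
                stack.append((r, i + 1, out + sigma[(q, a, r)]))
    return results


# Hypothetical toy transducer: "cat" -> "cat<Sg>", "cats" -> "cat<Pl>".
delta = {(0, "c"): {1}, (1, "a"): {2}, (2, "t"): {3},
         (3, EPS): {4}, (3, "s"): {4}}
sigma = {(0, "c", 1): "c", (1, "a", 2): "a", (2, "t", 3): "t",
         (3, EPS, 4): "<Sg>", (3, "s", 4): "<Pl>"}
```

Because T is non-deterministic, the result is a set: an ambiguous input can yield several outputs.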
Finite-State Transducers: Alternative Definition
Definition (Normalized finite-state transducer)
A normalized finite-state transducer T is a 6-tuple (Q, Σ, ∆, q0, F, E) where
Q is a finite set of states
Σ is a set, called the input alphabet of T
∆ is a set, called the output alphabet of T
q0 ∈ Q is the start state
F ⊆ Q is a set of final states
E ⊆ Q × (Σ ∪ {ε}) × (∆ ∪ {ε}) × Q is the set of transitions
Every transducer can be transformed into an equivalent normalized transducer, in which each transition reads at most one symbol and writes at most one symbol.
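One way to normalize is to split each transition whose σ-output has more than one symbol into a chain through fresh states, where only the first link reads the input symbol. The Python sketch below assumes output symbols are single characters and that the fresh state names ("fresh", k) do not clash with existing states; it is an illustration, not necessarily the construction used in the lecture.

```python
from itertools import count

EPS = ""  # assumption: ε is the empty string; output symbols are characters


def normalize(delta, sigma):
    """Turn a (δ, σ) transducer into an edge set E with |output| <= 1 per edge.

    A transition q --a:xyz--> r becomes the chain
    q --a:x--> n1 --ε:y--> n2 --ε:z--> r through fresh states.
    """
    fresh = count()
    edges = set()
    for (q, a), targets in delta.items():
        for r in targets:
            out = sigma[(q, a, r)]
            if len(out) <= 1:
                edges.add((q, a, out, r))
                continue
            prev = q
            for k, b in enumerate(out):
                nxt = r if k == len(out) - 1 else ("fresh", next(fresh))
                edges.add((prev, a if k == 0 else EPS, b, nxt))
                prev = nxt
    return edges


def transduce(edges, start, finals, word):
    """All outputs for `word`; assumes no cycle of ε-input edges."""
    results, stack = set(), [(start, 0, "")]
    while stack:
        q, i, out = stack.pop()
        if i == len(word) and q in finals:
            results.add(out)
        for (p, a, b, r) in edges:
            if p == q and a == EPS:
                stack.append((r, i, out + b))
            elif p == q and i < len(word) and a == word[i]:
                stack.append((r, i + 1, out + b))
    return results


# Hypothetical toy: "cats" -> "cat+PL", with a 3-symbol output on one transition.
delta = {(0, "c"): {1}, (1, "a"): {2}, (2, "t"): {3}, (3, "s"): {4}}
sigma = {(0, "c", 1): "c", (1, "a", 2): "a", (2, "t", 3): "t",
         (3, "s", 4): "+PL"}
edges = normalize(delta, sigma)
```

The transition 3 –s:+PL→ 4 becomes three edges, and the normalized transducer still maps "cats" to "cat+PL".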
Finite-State Transducers
Example (NFST T_lex mapping surface forms to morphological features)
Finite-State Transducers: Deterministic Finite-State Transducer
Definition (Deterministic finite-state transducer (DFST))
A deterministic finite-state transducer T is a 7-tuple (Q, Σ, ∆, q0, F, δ, σ) where
Q is a finite set of states
Σ is a set, called the input alphabet of T
∆ is a set, called the output alphabet of T
q0 ∈ Q is the start state
F ⊆ Q is a set of final states
δ : Q × Σ → Q is the (deterministic) transition function
σ : Q × Σ × Q → ∆* is the (deterministic) output function
Note: Not every NFST can be determinized.
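With a deterministic δ there is at most one path per input, so the output can be emitted symbol by symbol without any search. A minimal sketch, assuming dictionary-encoded δ and σ; the one-state upper-casing DFST is a hypothetical toy.

```python
def dfst_run(delta, sigma, start, finals, word):
    """Run a deterministic FST: at most one path, hence at most one output.

    delta: (state, symbol) -> state; sigma: (state, symbol, state) -> output.
    Returns None if the word is rejected.
    """
    q, out = start, []
    for a in word:
        if (q, a) not in delta:
            return None          # no transition: reject
        r = delta[(q, a)]
        out.append(sigma[(q, a, r)])
        q = r
    return "".join(out) if q in finals else None


# Hypothetical one-state DFST that upper-cases words over {a, b}.
delta = {(0, "a"): 0, (0, "b"): 0}
sigma = {(0, "a", 0): "A", (0, "b", 0): "B"}
```

As an illustration of the note above: a transducer that rewrites every a as x when the word ends in b, and as y when it ends in c, is easy to build non-deterministically, but no DFST of this form computes it, because the first output symbol depends on an input symbol arbitrarily far ahead.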
Closure Properties of Finite-State Acceptors
Finite-state acceptors are closed under:
Union
Concatenation
Closure (Kleene Star)
Reversal
Intersection
Complementation
Difference
Homomorphism / Inverse homomorphism
Closure Properties of Finite-State Acceptors: Union
Example (Union of two acceptors A1 and A2)
Figures: A1 and A2; their union A1 ∪ A2
Closure Properties of Finite-State Acceptors: Concatenation
Example (Concatenation of two acceptors A1 and A2)
Figures: A1 and A2; their concatenation A1 · A2
Closure Properties of Finite-State Acceptors: Closure (Kleene Star)
Example (Closure of acceptor A1)
Figures: A1 and its closure A1*
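The union, concatenation and closure examples can all be expressed with ε-transitions. The Python sketch below uses one standard (Thompson-style) variant, not necessarily the construction drawn in the figures; an NFA is a hypothetical tuple (states, start, finals, delta) with ε as the empty string, states are renamed apart before combining, and the new state names "u0" and "s0" are assumed not to clash.

```python
EPS = ""  # assumption: ε is the empty string


def tag(nfa, t):
    """Rename every state q to (t, q) so that two automata share no states."""
    states, start, finals, delta = nfa
    return ({(t, q) for q in states}, (t, start), {(t, q) for q in finals},
            {((t, q), a): {(t, r) for r in rs} for (q, a), rs in delta.items()})


def union(n1, n2):
    (s1, q1, f1, d1), (s2, q2, f2, d2) = tag(n1, 1), tag(n2, 2)
    start = "u0"  # new start state with ε-edges to both old start states
    return (s1 | s2 | {start}, start, f1 | f2,
            {**d1, **d2, (start, EPS): {q1, q2}})


def concat(n1, n2):
    (s1, q1, f1, d1), (s2, q2, f2, d2) = tag(n1, 1), tag(n2, 2)
    delta = {**d1, **d2}
    for f in f1:  # every final state of A1 gets an ε-edge to A2's start
        delta[(f, EPS)] = delta.get((f, EPS), set()) | {q2}
    return (s1 | s2, q1, f2, delta)


def star(n):
    s, q, f, d = tag(n, 1)
    start = "s0"  # new start state; it is final, so ε is accepted
    delta = {**d, (start, EPS): {q}}
    for x in f:  # loop back from each final state to the old start
        delta[(x, EPS)] = delta.get((x, EPS), set()) | {q}
    return (s | {start}, start, f | {start}, delta)


def accepts(nfa, word):
    states, start, finals, delta = nfa

    def close(current):
        """ε-closure of a set of states."""
        current, stack = set(current), list(current)
        while stack:
            q = stack.pop()
            for r in delta.get((q, EPS), set()):
                if r not in current:
                    current.add(r)
                    stack.append(r)
        return current

    current = close({start})
    for a in word:
        current = close({r for q in current for r in delta.get((q, a), set())})
    return bool(current & finals)


# Hypothetical toy acceptors: A1 accepts "ab", A2 accepts "c".
A1 = ({0, 1, 2}, 0, {2}, {(0, "a"): {1}, (1, "b"): {2}})
A2 = ({0, 1}, 0, {1}, {(0, "c"): {1}})
```

A fresh start state is used for the closure instead of simply making the old start state final, because the old start state may have incoming transitions that would otherwise let extra words through.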
Closure Properties of Finite-State Acceptors: Reversal
Example (Reversal of acceptor A2)
Figures: A2 and its reversal A2^R
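Reversal flips the direction of every transition and swaps the roles of start and final states. Since an automaton may have several final states, this sketch adds a new start state with ε-edges to all of them; the tuple encoding, the state name "r0" and the toy automaton are assumptions of the illustration.

```python
EPS = ""  # assumption: ε is the empty string


def reverse(nfa):
    """Reverse every transition; a new start state leads (via ε) to the old
    final states, and the old start state becomes the only final state."""
    states, start, finals, delta = nfa
    rdelta = {}
    for (q, a), targets in delta.items():
        for r in targets:
            rdelta.setdefault((r, a), set()).add(q)
    new_start = "r0"  # assumed not to clash with existing state names
    rdelta[(new_start, EPS)] = set(finals)
    return (states | {new_start}, new_start, {start}, rdelta)


def accepts(nfa, word):
    states, start, finals, delta = nfa

    def close(current):
        """ε-closure of a set of states."""
        current, stack = set(current), list(current)
        while stack:
            q = stack.pop()
            for r in delta.get((q, EPS), set()):
                if r not in current:
                    current.add(r)
                    stack.append(r)
        return current

    current = close({start})
    for a in word:
        current = close({r for q in current for r in delta.get((q, a), set())})
    return bool(current & finals)


# Hypothetical toy: A accepts "ab" and "abc".
A = ({0, 1, 2, 3}, 0, {2, 3}, {(0, "a"): {1}, (1, "b"): {2}, (2, "c"): {3}})
R = reverse(A)
```

The reversed automaton R accepts "ba" and "cba"; note that it is in general non-deterministic even when the input automaton is a DFA.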
Closure Properties of Finite-State Acceptors: Intersection
Let L and M be the languages of the deterministic automata A_L = (Q_L, Σ, q_L, F_L, δ_L) and A_M = (Q_M, Σ, q_M, F_M, δ_M). For L ∩ M we construct the automaton
A = (Q_L × Q_M, Σ, (q_L, q_M), F_L × F_M, δ)
where δ((p, q), a) = (δ_L(p, a), δ_M(q, a)) for p ∈ Q_L, q ∈ Q_M and a ∈ Σ. The set F_L × F_M of final states consists of all pairs (p, q) such that p ∈ F_L and q ∈ F_M.
The states of A are pairs of states of A_L and A_M. Suppose A is in state (p, q) and reads the input symbol a:
   A_L goes from p to s = δ_L(p, a)
   A_M goes from q to t = δ_M(q, a)
⇒ A goes to the new state pair (s, t).
Hence A accepts a word exactly when both A_L and A_M accept it, i.e. L(A) = L ∩ M.
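The product construction can be sketched in Python as follows. This variant builds only the pairs reachable from (q_L, q_M) instead of all of Q_L × Q_M, which gives the same language; DFAs are hypothetical (start, finals, delta) tuples, and the two toy automata are illustrations.

```python
def intersect(d1, d2, alphabet):
    """Product construction for two DFAs given as (start, finals, delta).

    Only pairs reachable from (q_L, q_M) are built; a pair (p, q) is final
    iff p ∈ F_L and q ∈ F_M.
    """
    (q1, f1, t1), (q2, f2, t2) = d1, d2
    start = (q1, q2)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            if (p, a) in t1 and (q, a) in t2:   # both automata must move
                nxt = (t1[(p, a)], t2[(q, a)])  # δ((p, q), a)
                delta[((p, q), a)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    finals = {(p, q) for (p, q) in seen if p in f1 and q in f2}
    return seen, start, finals, delta


def accepts(dfa, word):
    states, start, finals, delta = dfa
    q = start
    for a in word:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in finals


# Hypothetical toys over {a, b}: even number of a's / words ending in b.
EVEN_A = (0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
ENDS_B = (0, {1}, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1})
both = intersect(EVEN_A, ENDS_B, "ab")
```

For instance, "aab" is accepted (two a's, ends in b), while "ab" and "aa" are rejected.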
Closure Properties of Finite-State Acceptors: Intersection
Example (Intersection of two acceptors A1 and A3)
Figures: A1 and A3; their intersection A1 ∩ A3
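Closure Properties of Finite-State Acceptors: Complementation
Example (Complementation of acceptor A3)
Figures: A3 and its complement Ā3
Complementation requires a deterministic acceptor: the construction swaps final and non-final states, which complements the language only if every word has exactly one run.
If the acceptor is not total (δ(q, a) is undefined for some q and a), a non-final sink state has to be added first; after the swap, the sink becomes final.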
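The two steps (complete with a sink, then swap final and non-final states) translate directly into code. A minimal sketch, assuming DFAs are (states, start, finals, delta) tuples and that "sink" is not already a state name; the toy DFA accepting only "ab" is hypothetical.

```python
def complement(dfa, alphabet):
    """Complement a DFA (states, start, finals, delta) relative to Σ*.

    Missing transitions are first redirected to a non-final sink state so the
    automaton is total; then final and non-final states are swapped.
    """
    states, start, finals, delta = dfa
    sink = "sink"  # assumed not to be an existing state name
    total = dict(delta)
    all_states = set(states)
    for q in states:
        for a in alphabet:
            if (q, a) not in total:
                total[(q, a)] = sink
                all_states.add(sink)
    if sink in all_states:  # the sink loops on every symbol
        for a in alphabet:
            total[(sink, a)] = sink
    return all_states, start, all_states - set(finals), total


def accepts(dfa, word):
    states, start, finals, delta = dfa
    q = start
    for a in word:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in finals


# Hypothetical toy DFA over {a, b} that accepts only "ab" (not total).
D = ({0, 1, 2}, 0, {2}, {(0, "a"): 1, (1, "b"): 2})
C = complement(D, "ab")
```

Without the sink, a word such as "ba" would have no run at all in D and would therefore still be rejected after swapping final states; with the sink it ends in a (now final) state and is correctly accepted.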
Closure Properties of Finite-State Acceptors: Difference
Example (Difference of two acceptors A1 and A2)
Figures: A1 and A2; their difference A1 − A2
A1 − A2 = A1 ∩ Ā2, i.e. the intersection of A1 with the complement of A2
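Intersection with a complement can be done in a single product pass: A2 is completed with a sink on the fly, and a pair (p, q) is final iff p is final in A1 and q is not final in A2. This is a sketch under the same hypothetical (start, finals, delta) encoding as the intersection example, with "sink" assumed fresh.

```python
def difference(d1, d2, alphabet):
    """DFA for L(A1) − L(A2) = L(A1) ∩ complement of L(A2), as one product.

    DFAs are (start, finals, delta). A2 is completed on the fly with a sink,
    so a pair (p, q) is final iff p ∈ F1 and q ∉ F2.
    """
    (q1, f1, t1), (q2, f2, t2) = d1, d2
    sink = "sink"  # assumed fresh; means "A2 has already rejected"

    def step2(q, a):
        return t2.get((q, a), sink)  # missing transition (or q == sink) -> sink

    start = (q1, q2)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            if (p, a) in t1:  # A1 must still be able to move
                nxt = (t1[(p, a)], step2(q, a))
                delta[((p, q), a)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    finals = {(p, q) for (p, q) in seen if p in f1 and q not in f2}
    return seen, start, finals, delta


def accepts(dfa, word):
    states, start, finals, delta = dfa
    q = start
    for a in word:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in finals


# Hypothetical toys: A1 accepts {"a", "ab"}, A2 accepts {"ab"}, A2b accepts {"b"}.
A1 = (0, {1, 2}, {(0, "a"): 1, (1, "b"): 2})
A2 = (0, {2}, {(0, "a"): 1, (1, "b"): 2})
A2b = (0, {1}, {(0, "b"): 1})
```

Here the difference of A1 and A2 accepts only "a", while the difference of A1 and A2b keeps both words of A1.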
Closure Properties of Finite-State Transducers
Finite-state transducers are closed under
Union
Concatenation
Closure (Kleene Star)
Reversal
Projection (yields finite-state acceptors)
Composition
Inversion
Finite-state transducers are not closed under
Complementation
Intersection (but acyclic and ε-free transducers are)
Difference
Closure Properties of Finite-State Transducers: Projection
Example (Projection of transducer T)
Figures: transducer T; its projections π1(T) (input side) and π2(T) (output side)
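On a normalized transducer, projection simply forgets one side of every label, which turns each edge (p, a, b, q) into an acceptor edge. A minimal sketch; the edge-set encoding and the toy transducer (with "+PL" treated as a single output symbol) are assumptions of the illustration.

```python
def project(edges, side):
    """π1 (side=1) keeps input labels, π2 (side=2) keeps output labels.

    A transducer edge (p, a, b, q) becomes an acceptor edge (p, label, q).
    """
    if side not in (1, 2):
        raise ValueError("side must be 1 or 2")
    return {(p, a if side == 1 else b, q) for (p, a, b, q) in edges}


# Hypothetical toy transducer (normalized edge set) mapping "cats" to "cat+PL".
T = {(0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3), (3, "s", "+PL", 4)}
```

Start and final states are kept unchanged, so π1(T) accepts the input language of T and π2(T) its output language.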
Closure Properties of Finite-State Transducers: Composition
Definition (ε-free composition)
Let T1 = (Q1, Σ1, ∆1, q1, F1, E1) and T2 = (Q2, Σ2, ∆2, q2, F2, E2) be two normalized, ε-free FSTs. T1 ◦ T2 is the transducer
T = (Q1 × Q2, Σ1, ∆2, (q1, q2), F1 × F2, E)
where E = {((p, q), a, b, (p′, q′)) | ∃c ∈ ∆1 ∩ Σ2 : (p, a, c, p′) ∈ E1 ∧ (q, c, b, q′) ∈ E2}
How does composition work?
Whenever T1 contains a transition p –a:c→ p′ and T2 contains a transition q –c:b→ q′,
T will contain a transition (p, q) –a:b→ (p′, q′).
Thus T maps x to z whenever T1 maps x to some y and T2 maps y to z.
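The set E in the definition can be written almost verbatim as a set comprehension over pairs of edges that agree on the middle symbol c. This sketch builds edges from all matching pairs, so it may create pair states that are not reachable from (q1, q2); that does not change the relation computed. The (start, finals, edges) encoding and the two toy transducers are hypothetical.

```python
def compose(t1, t2):
    """ε-free composition T1 ◦ T2 of normalized FSTs (start, finals, edges).

    Edge ((p, q), a, b, (p2, q2)) exists iff T1 has (p, a, c, p2) and
    T2 has (q, c, b, q2) for some middle symbol c.
    """
    (s1, f1, e1), (s2, f2, e2) = t1, t2
    edges = {((p, q), a, b, (p2, q2))
             for (p, a, c, p2) in e1
             for (q, c2, b, q2) in e2
             if c == c2}
    finals = {(x, y) for x in f1 for y in f2}
    return (s1, s2), finals, edges


def transduce(fst, word):
    """All outputs of an ε-free normalized FST for `word`."""
    start, finals, edges = fst
    configs = {(start, "")}
    for sym in word:
        configs = {(r, out + b) for (q, out) in configs
                   for (p, a, b, r) in edges if p == q and a == sym}
    return {out for (q, out) in configs if q in finals}


# Hypothetical toys: T1 maps "ab" -> "xy", T2 maps "xy" -> "12".
T1 = (0, {2}, {(0, "a", "x", 1), (1, "b", "y", 2)})
T2 = (0, {2}, {(0, "x", "1", 1), (1, "y", "2", 2)})
T = compose(T1, T2)
```

The composed transducer maps "ab" directly to "12" without ever producing the intermediate string "xy".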
Closure Properties of Finite-State Transducers: Composition
Example (Composition)
Figures: two transducers and their composition
Closure Properties of Finite-State Transducers: Inversion
Example (Inversion)
FST T_Morph mapping words to morphological categories
FST T_Morph^−1 mapping morphological categories to words
Inversion swaps the input and the output label of every transition.
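On a normalized edge set, inversion is a one-line relabeling; start and final states stay the same. The toy edges below are a hypothetical fragment, not the lecture's T_Morph.

```python
def invert(edges):
    """T^−1: swap the input and output label of every edge (p, a, b, q)."""
    return {(p, b, a, q) for (p, a, b, q) in edges}


# Hypothetical fragment of a morphology transducer (normalized edges):
# surface form "cats" -> analysis "cat+PL".
T_morph = {(0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
           (3, "s", "+PL", 4)}
T_gen = invert(T_morph)  # analysis "cat+PL" -> surface form "cats"
```

Inverting twice gives back the original transducer, which is why one lexicon transducer can serve both analysis and generation.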
Equivalence Transformations on Finite-State Acceptors
Equivalence transformations are operations on automata which change the topology of an automaton but not its language.
They usually serve optimization purposes, i.e. they create smaller and/or faster automata.
Sometimes they are even necessary (e.g. determinization is crucial for complementation).
Finite-state acceptors admit the following transformations:
ε-Removal
Determinization
Minimization
Determinization: Subset Construction
A DFA can be constructed from an NFA by the subset construction.
In the worst case, the smallest equivalent DFA can have on the order of 2^n states, where n is the number of states of the NFA.
Example
. . .
Q_D is the power set of Q_N.
F_D is the set of subsets S of Q_N such that S ∩ F_N ≠ ∅.
For each set S ⊆ Q_N and for each input symbol a ∈ Σ: δ_D(S, a) = ⋃_{p∈S} δ_N(p, a)
Determinization: Subset Construction
Transition function δ_N of the example NFA:
δ_N(p0, 0) = {p0, p1}, δ_N(p0, 1) = {p0}, δ_N(p1, 1) = {p2}
Full subset construction (→ marks the start state, ∗ marks final states):
                  0           1
∅                 ∅           ∅           not accessible!
→ {p0}            {p0, p1}    {p0}
{p1}              ∅           {p2}        not accessible!
∗{p2}             ∅           ∅           not accessible!
{p0, p1}          {p0, p1}    {p0, p2}
∗{p0, p2}         {p0, p1}    {p0}
∗{p1, p2}         ∅           {p2}        not accessible!
∗{p0, p1, p2}     {p0, p1}    {p0, p2}    not accessible!
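The table can be reproduced by enumerating the whole power set, exactly as in the definition Q_D = 2^{Q_N}. A minimal Python sketch using frozensets as DFA states; the dictionary encoding is an assumption, while the start state p0 and final state p2 are read off the → and ∗ marks in the table.

```python
from itertools import combinations


def subset_construction(states, alphabet, delta, finals):
    """Full subset construction: Q_D = 2^{Q_N}.

    delta maps (state, symbol) to a set of states (missing key = ∅);
    returns (Q_D, δ_D, F_D) with frozensets as DFA states.
    """
    ordered = sorted(states)
    q_d = [frozenset(c) for k in range(len(ordered) + 1)
           for c in combinations(ordered, k)]
    # δ_D(S, a) = union of δ_N(p, a) over p ∈ S (∅ for S = ∅)
    delta_d = {(S, a): frozenset().union(*(delta.get((p, a), set()) for p in S))
               for S in q_d for a in alphabet}
    f_d = {S for S in q_d if S & finals}  # S ∩ F_N ≠ ∅
    return q_d, delta_d, f_d


# The example NFA: start state p0, final state p2 (as marked in the table).
DELTA_N = {("p0", "0"): {"p0", "p1"}, ("p0", "1"): {"p0"}, ("p1", "1"): {"p2"}}
Q_D, DELTA_D, F_D = subset_construction({"p0", "p1", "p2"}, "01", DELTA_N, {"p2"})
```

All 8 subsets are built, although only three of them are accessible from {p0}.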
Determinization: Subset Construction: Lazy Evaluation
Only the accessible subsets need to be constructed:
Basis: {q0}, the set containing the start state of the NFA N, is accessible.
Induction: If the set S of states is accessible, then for each input symbol a, compute the set of states δ_D(S, a); it is accessible as well.
Example
δ_D({p0}, 0) = {p0, p1} (new accessible state)
δ_D({p0}, 1) = {p0} (old state)
δ_D({p0, p1}, 0) = δ_N(p0, 0) ∪ δ_N(p1, 0) = {p0, p1} ∪ ∅ = {p0, p1} (old state)
δ_D({p0, p1}, 1) = δ_N(p0, 1) ∪ δ_N(p1, 1) = {p0} ∪ {p2} = {p0, p2} (new accessible state)
δ_D({p0, p2}, 0) = δ_N(p0, 0) ∪ δ_N(p2, 0) = {p0, p1} ∪ ∅ = {p0, p1} (old state)
δ_D({p0, p2}, 1) = δ_N(p0, 1) ∪ δ_N(p2, 1) = {p0} ∪ ∅ = {p0} (old state)
⇒ Converging! No new states appear, so only {p0}, {p0, p1} and {p0, p2} are built instead of all 8 subsets.
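The basis/induction scheme is a worklist algorithm: start from {q0} and keep processing subsets until no new one appears. A minimal sketch under the same assumed dictionary encoding as before; note that if ∅ were reached it would simply be kept as an ordinary (dead) state.

```python
def lazy_subset_construction(alphabet, delta, start, finals):
    """Subset construction restricted to accessible subsets (lazy evaluation).

    Basis: {start} is accessible. Induction: if S is accessible, so is
    δ_D(S, a) for every symbol a. Stops when no new subset appears.
    """
    start_d = frozenset({start})
    delta_d, seen, todo = {}, {start_d}, [start_d]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset().union(*(delta.get((p, a), set()) for p in S))
            delta_d[(S, a)] = T  # ∅ would be kept as an ordinary dead state
            if T not in seen:    # new accessible state
                seen.add(T)
                todo.append(T)
    return seen, delta_d, start_d, {S for S in seen if S & finals}


DELTA_N = {("p0", "0"): {"p0", "p1"}, ("p0", "1"): {"p0"}, ("p1", "1"): {"p2"}}
Q_D, DELTA_D, START_D, F_D = lazy_subset_construction("01", DELTA_N, "p0", {"p2"})
```

The result matches the derivation above: three states, six transitions, and {p0, p2} as the only final state.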
Determinization
Example (Determinized Version of A2)