
Formal Models in NLP: Finite-State Automata

Nina Seemann

Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung

Pfaffenwaldring 5b, 70569 Stuttgart

May 15, 2012

Nina Seemann (IMS) Formal Models in NLP: Finite-State Automata May 15, 2012 1

Outline

1 Finite-State Automata: Characterization

2 Closure Properties of Finite-State Acceptors

3 Closure Properties of Finite-State Transducers

4 Equivalence Transformations on Finite-State Acceptors


1 Finite-State Automata: Characterization

Finite-State Acceptors

Finite-State Transducers





Finite-State Acceptors

Example (NFA A_lex accepting some animal names)


Finite-State Acceptors: Non-Deterministic Finite-State Acceptor

Definition (Non-deterministic finite-state acceptor (NFA))

A non-deterministic finite-state acceptor A is a 5-tuple (Q, Σ, q0, F, δ) where

Q is a finite set of states

Σ is the alphabet

q0 ∈ Q is the start state

F ⊆ Q is a set of final states

δ : Q × (Σ ∪ {ε}) → 2^Q, the transition function

Nondeterminism refers to the fact that an NFA can be in several states at once.

A transition may be labeled with ε.


Finite-State Acceptors: Deterministic Finite-State Acceptor

Definition (Deterministic finite-state acceptor (DFA))

A deterministic finite-state acceptor D is a 5-tuple (Q, Σ, q0, F, δ) where

Q is a finite set of states

Σ is a finite set and called the alphabet

q0 ∈ Q is the initial state

F ⊆ Q is a set of final states


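δ : Q × Σ → Q, the transition function

Determinism refers to the fact that, from any state and for any input symbol, a DFA can go to one state only.

DFAs are ε-free by definition.

DFAs and NFAs have the same expressive power, i.e. they recognize the same class of languages and are equivalent in this sense.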


Finite-State Acceptors

Example (DFA D_lex accepting some animal names)


Finite-State Acceptors: Extended Transition Function & Language

Definition (Extended transition function δ̂)

δ̂ describes what happens when we start in any state and follow any sequence of inputs.

δ̂(q, ε) = q.

δ̂(q, w) = δ(δ̂(q, x), a) with w = xa, where x ∈ Σ* and a ∈ Σ.

Definition (Language of a DFA A)

L(A) = {w ∈ Σ* | δ̂(q0, w) ∈ F}

We also say that L(A) is recognized by A.

Definition (Regular language)

A language is called regular if there exists some DFA which recognizes it.


Finite-State Acceptors: Extended Transition Function for DFA

Example (frog in DFA D_lex)

Assumption: δ̂(0, frog) ∈ {26, 24, 22, 13, 11, 9, 8}, i.e. the state reached on frog is one of the final states.

δ̂(0, ε) = 0

δ̂(0, f) = δ(δ̂(0, ε), f) = δ(0, f) = 3

δ̂(0, fr) = δ(δ̂(0, f), r) = δ(3, r) = 6

δ̂(0, fro) = δ(δ̂(0, fr), o) = δ(6, o) = 7

δ̂(0, frog) = δ(δ̂(0, fro), g) = δ(7, g) = 8

State 8 is in the set above, so frog is accepted.
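The recursion of the extended transition function for a DFA can be traced in a few lines of Python. This is a minimal sketch, not the full automaton: the dictionary `DELTA` contains only the four transitions that appear in the frog example, the set `FINAL` is the set of states from the assumption line, and the names `DELTA`, `FINAL`, `delta_hat` and `accepts` are chosen here for illustration. Missing transitions are treated as "no successor", which is an assumption of the sketch.

```python
from typing import Dict, Optional, Set, Tuple

# Fragment of D_lex: only the transitions used in the frog example.
# The rest of the automaton is not reproduced here (assumption of this sketch).
DELTA: Dict[Tuple[int, str], int] = {
    (0, "f"): 3,
    (3, "r"): 6,
    (6, "o"): 7,
    (7, "g"): 8,
}
FINAL: Set[int] = {26, 24, 22, 13, 11, 9, 8}


def delta_hat(q: int, w: str) -> Optional[int]:
    """Extended transition function: state reached from q after reading w.

    Returns None if some transition along the way is missing in DELTA.
    """
    if w == "":                      # base case: reading ε stays in q
        return q
    x, a = w[:-1], w[-1]             # split w = xa
    p = delta_hat(q, x)              # state reached after the prefix x
    if p is None:
        return None
    return DELTA.get((p, a))         # one ordinary step on the last symbol a


def accepts(w: str) -> bool:
    """w is accepted if the state reached from start state 0 is final."""
    return delta_hat(0, w) in FINAL
```

Calling `delta_hat(0, "frog")` follows the same four steps as the worked example, and `accepts("fro")` is false because state 7 is not among the listed final states.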


Finite-State Acceptors: Extended Transition Function for NFA

Example (frog in NFA A_lex)

Assumption: δ̂(31, frog) ∩ {2, 6, 9, 13, 18, 21, 30} ≠ ∅

δ̂(31, ε) = {31}

δ̂(31, f) = δ(δ̂(31, ε), f) = δ(31, f) = {3, 7, 10}

δ̂(31, fr) = δ(δ̂(31, f), r) = δ(3, r) ∪ δ(7, r) ∪ δ(10, r) = {4} ∪ ∅ ∪ ∅ = {4}

δ̂(31, fro) = δ(δ̂(31, fr), o) = δ(4, o) = {5}

δ̂(31, frog) = δ(δ̂(31, fro), g) = δ(5, g) = {6}
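For an NFA the same computation works on sets of states. The following sketch assumes an ε-free fragment of the NFA that contains only the transitions visible in the frog example; the names `DELTA`, `FINAL`, `delta_hat` and `accepts` are illustrative, and ε-transitions are deliberately not handled.

```python
from typing import Dict, Set, Tuple

# Fragment of the NFA: only the transitions used in the frog example.
# State/symbol pairs that are not listed have no successor.
DELTA: Dict[Tuple[int, str], Set[int]] = {
    (31, "f"): {3, 7, 10},
    (3, "r"): {4},
    (4, "o"): {5},
    (5, "g"): {6},
}
FINAL: Set[int] = {2, 6, 9, 13, 18, 21, 30}


def delta_hat(q: int, w: str) -> Set[int]:
    """Set of states reachable from q after reading w (no ε-transitions)."""
    states = {q}
    for a in w:
        # Union of the successor sets of all current states on symbol a.
        states = set().union(*(DELTA.get((p, a), set()) for p in states))
    return states


def accepts(w: str) -> bool:
    """w is accepted if at least one reachable state is final."""
    return bool(delta_hat(31, w) & FINAL)
```

After reading `"fr"` only state 4 survives, because states 7 and 10 have no transition on r in this fragment, which mirrors the `{4} ∪ ∅ ∪ ∅` step of the example.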


Finite-State Transducers: Definition

Definition ((Non-deterministic) finite-state transducer (NFST))

A (non-deterministic) finite-state transducer T is a 7-tuple (Q, Σ, ∆, q0, F, δ, σ) where

Q is a set of states

Σ is the input alphabet of T

∆ is the output alphabet of T

q0 ∈ Q is the start state

F ⊆ Q is a set of final states

δ : Q × (Σ ∪ {ε}) → 2^Q, the transition function

σ : Q × (Σ ∪ {ε}) × Q → ∆*, the output function


Finite-State Transducers: Alternative Definition

Definition (Normalized finite-state transducer)

A normalized finite-state transducer T is a 6-tuple (Q, Σ, ∆, q0, F, E) where

Q is a set of states

Σ is a set and called the input alphabet of T

∆ is a set and called the output alphabet of T

q0 ∈ Q is the start state

F ⊆ Q is a set of final states

E ⊆ Q × (Σ ∪ {ε}) × (∆ ∪ {ε}) × Q, the set of transitions

Every transducer can be transformed into a normalized transducer.


Finite-State Transducers

Example (NFST T_lex mapping surface forms to morphological features)


Finite-State Transducers: Deterministic Finite-State Transducer

Definition (Deterministic finite-state transducer (DFST))

A deterministic finite-state transducer T is a 7-tuple (Q, Σ, ∆, q0, F, δ, σ) where

Q is a set of states

Σ is a set and called the input alphabet of T

∆ is a set and called the output alphabet of T

q0 ∈ Q is the start state

F ⊆ Q is a set of final states

δ : Q × Σ → Q, the (deterministic) transition function

σ : Q × Σ × Q → ∆*, the (deterministic) output function

Note: Not every NFST can be determinized.








Closure Properties of Finite-State Acceptors

Finite-state acceptors are closed under:

Union

Concatenation

Closure (Kleene Star)

Reversal

Intersection

Complementation

Difference

Homomorphism / Inverse homomorphism


Closure Properties of Finite-State Acceptors: Union

Example (Union of two acceptors A1 and A2)

A1 A2

A1 ∪ A2


Closure Properties of Finite-State Acceptors: Concatenation

Example (Concatenation of two acceptors A1 and A2)

A1 A2

A1 · A2


Closure Properties of Finite-State Acceptors: Closure (Kleene Star)

Example (Closure of acceptor A1)

A1

A1*


Closure Properties of Finite-State Acceptors: Reversal

Example (Reversal of acceptor A2)

A2

A2^R


Closure Properties of Finite-State Acceptors: Intersection

Intersection

Let L and M be the languages of the deterministic automata AL = (QL, Σ, δL, qL, FL) and AM = (QM, Σ, δM, qM, FM). For L ∩ M we construct an automaton

A = (QL × QM, Σ, δ, (qL, qM), FL × FM)

where δ((p, q), σ) = (δL(p, σ), δM(q, σ)) for p ∈ QL, q ∈ QM, and σ ∈ Σ. The set of final states FL × FM consists of all pairs (p, q) such that p ∈ FL and q ∈ FM.

The states of A are pairs of states, one from AL and one from AM. Suppose A is in state (p, q) and reads input symbol a:

What does AL do on input a? It moves from p to s.

What does AM do on input a? It moves from q to t.

⇒ The new state of A is the pair (s, t).
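The product construction can be sketched directly in Python. The sketch assumes that both DFAs are total over the same alphabet and represents a DFA as a tuple `(states, delta, start, finals)` with `delta` a dictionary; this representation and the two example automata `EVEN_A` and `ENDS_B` are hypothetical and not taken from the slides.

```python
from itertools import product


def intersect(dfa_l, dfa_m, alphabet):
    """Product automaton for L ∩ M; both DFAs must be total over alphabet."""
    states_l, delta_l, start_l, finals_l = dfa_l
    states_m, delta_m, start_m, finals_m = dfa_m
    states = set(product(states_l, states_m))
    # Both components move in parallel on the same input symbol.
    delta = {
        ((p, q), a): (delta_l[(p, a)], delta_m[(q, a)])
        for (p, q) in states
        for a in alphabet
    }
    finals = set(product(finals_l, finals_m))   # final in both components
    return states, delta, (start_l, start_m), finals


def accepts(dfa, w):
    """Run a total DFA on w and test whether it ends in a final state."""
    _, delta, state, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals


# Hypothetical example automata over the alphabet {a, b}.
# EVEN_A accepts strings with an even number of a's.
EVEN_A = ({0, 1},
          {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
          0, {0})
# ENDS_B accepts strings whose last symbol is b.
ENDS_B = ({"n", "y"},
          {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
          "n", {"y"})

BOTH = intersect(EVEN_A, ENDS_B, "ab")
```

With these example automata, `accepts(BOTH, "aab")` holds because the string has two a's and ends in b, while `accepts(BOTH, "ab")` fails on the first condition. The sketch builds all of QL × QM; restricting it to pairs reachable from the start pair would be a natural refinement.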


Closure Properties of Finite-State Acceptors: Intersection

Example (Intersection of two acceptors A1 and A3)

A1 A3

A1 ∩ A3


Closure Properties of Finite-State Acceptors: Complementation

Example (Complementation of acceptor A3)

A3 and its complement Ā3

Complementation requires a deterministic acceptor.

If the acceptor is not total, a sink state has to be added.
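The two remarks above translate into a short procedure: make the deterministic acceptor total by routing every missing transition to a sink state, then swap final and non-final states. The sketch below assumes a DFA given as `(states, delta, start, finals)` with a possibly partial dictionary `delta`; the representation, the sink name `"sink"` and the example automaton `ONLY_AB` are hypothetical.

```python
def complement(dfa, alphabet):
    """Complement of a (possibly partial) DFA over the given alphabet."""
    states, delta, start, finals = dfa
    sink = "sink"                      # assumed not to clash with a state name
    all_states = set(states) | {sink}
    total = dict(delta)
    # Totalize: every missing transition, including those of the sink itself,
    # leads to the sink. If the DFA was already total, the sink is unreachable.
    for p in all_states:
        for a in alphabet:
            total.setdefault((p, a), sink)
    # Swap final and non-final states.
    return all_states, total, start, all_states - set(finals)


def accepts(dfa, w):
    """Run a total DFA on w and test whether it ends in a final state."""
    _, delta, state, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals


# Hypothetical partial DFA over {a, b} that accepts exactly the string "ab".
ONLY_AB = ({0, 1, 2}, {(0, "a"): 1, (1, "b"): 2}, 0, {2})

NOT_AB = complement(ONLY_AB, "ab")
```

Without the sink, a string such as `"b"` would simply have no run in `ONLY_AB`, and flipping the final states alone would not make it accepted; with the sink it ends in a state that is final in the complement.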


Closure Properties of Finite-State Acceptors: Difference

Example (Difference of two acceptors A1 and A2)

A1 A2

A1 − A2 = A1 ∩ Ā2, where Ā2 is the complement of A2
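Since the difference is the intersection with a complement, it can be computed with one product construction whose final states pair a final state of the first automaton with a non-final state of the second. The sketch assumes two total DFAs over the same alphabet, given as `(states, delta, start, finals)`; the representation and the example automata `EVEN_A` and `ENDS_B` are hypothetical.

```python
def difference(dfa1, dfa2, alphabet):
    """Product automaton accepting what dfa1 accepts and dfa2 rejects.

    Both DFAs must be total over alphabet, so complementing dfa2 only
    requires swapping its final and non-final states.
    """
    states1, delta1, start1, finals1 = dfa1
    states2, delta2, start2, finals2 = dfa2
    non_finals2 = set(states2) - set(finals2)    # finals of the complement
    states = {(p, q) for p in states1 for q in states2}
    delta = {
        ((p, q), a): (delta1[(p, a)], delta2[(q, a)])
        for (p, q) in states
        for a in alphabet
    }
    finals = {(p, q) for p in finals1 for q in non_finals2}
    return states, delta, (start1, start2), finals


def accepts(dfa, w):
    """Run a total DFA on w and test whether it ends in a final state."""
    _, delta, state, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals


# Hypothetical example automata over {a, b}.
EVEN_A = ({0, 1},
          {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
          0, {0})                                  # even number of a's
ENDS_B = ({"n", "y"},
          {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
          "n", {"y"})                              # last symbol is b

DIFF = difference(EVEN_A, ENDS_B, "ab")
```

Here `DIFF` accepts strings with an even number of a's that do not end in b, for example `"aa"` and `"baa"`, but not `"aab"`.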








Closure Properties of Finite-State Transducers

Finite-state transducers are closed under

Union

Concatenation

Closure (Kleene Star)

Reversal

Projection (leads to FSAs)

Composition

Inversion

Finite-state transducers are not closed under

Complementation

Intersection (but acyclic and ε-free transducers are)

Difference


Closure Properties of Finite-State Transducers: Projection

Example (Projection of transducer T )

Transducer T

π1(T ) π2(T )
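On a normalized transducer, projection amounts to dropping one of the two labels of every transition, which leaves the transition set of an acceptor. The sketch below works on a set of transitions `(p, a, b, q)` as in the normalized definition; the function name `project` and the example transitions `E` are hypothetical, and ε is written as the empty string.

```python
def project(transitions, side):
    """Keep the input label (side=1) or the output label (side=2).

    transitions is a set of 4-tuples (source, input, output, target);
    the result is a set of acceptor transitions (source, label, target).
    """
    if side not in (1, 2):
        raise ValueError("side must be 1 (input) or 2 (output)")
    return {(t[0], t[side], t[3]) for t in transitions}


# Hypothetical transducer fragment: copies c, a, t and maps s to "+Pl".
E = {
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "s", "+Pl", 4),
}

PI_1 = project(E, 1)   # acceptor over the input side
PI_2 = project(E, 2)   # acceptor over the output side
```

States, start state and final states carry over unchanged, so only the transitions need to be rewritten. If a projected label is ε, the resulting acceptor has ε-transitions.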


Composition

Definition (ε-free composition)

Let T1 = (Q1, Σ1, ∆1, q1, F1, E1) and T2 = (Q2, Σ2, ∆2, q2, F2, E2) be two normalized, ε-free FSTs. T1 ◦ T2 is the transducer

T = (Q1 × Q2, Σ1, ∆2, (q1, q2), F1 × F2, E)

where E = {((p, q), a, b, (p′, q′)) | ∃c ∈ ∆1 ∩ Σ2 : (p, a, c, p′) ∈ E1 ∧ (q, c, b, q′) ∈ E2}

How does composition work?

Whenever T1 contains a transition (p, a, c, p′) and T2 contains a transition (q, c, b, q′),

T will contain a transition ((p, q), a, b, (p′, q′)).
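The definition of ε-free composition is a join of the two transition sets on the shared middle symbol. The sketch below assumes normalized, ε-free transducers represented as `(start, finals, edges)` with `edges` a set of 4-tuples; the representation, the helper `transduce` and the example transducers `T1` and `T2` are hypothetical.

```python
def compose(t1, t2):
    """ε-free composition of two normalized transducers."""
    start1, finals1, edges1 = t1
    start2, finals2, edges2 = t2
    # Pair every T1 transition with every T2 transition that reads
    # exactly the symbol c that the T1 transition writes.
    edges = {
        ((p, q), a, b, (p2, q2))
        for (p, a, c, p2) in edges1
        for (q, c2, b, q2) in edges2
        if c == c2
    }
    finals = {(f1, f2) for f1 in finals1 for f2 in finals2}
    return (start1, start2), finals, edges


def transduce(t, word):
    """All output sequences (as tuples of symbols) of an ε-free transducer."""
    start, finals, edges = t
    configs = {(start, ())}                 # pairs (state, output so far)
    for a in word:
        configs = {
            (q, out + (b,))
            for (p, out) in configs
            for (src, x, b, q) in edges
            if src == p and x == a
        }
    return {out for (q, out) in configs if q in finals}


# Hypothetical example: T1 upper-cases a and b, T2 maps A to 1 and B to 2.
T1 = (0, {0}, {(0, "a", "A", 0), (0, "b", "B", 0)})
T2 = (0, {0}, {(0, "A", "1", 0), (0, "B", "2", 0)})

T = compose(T1, T2)
```

In this example `transduce(T, "ab")` yields the single output `("1", "2")`: the intermediate symbols A and B never appear in the composed transducer. Transitions whose middle symbols do not match contribute nothing, which is why only symbols in ∆1 ∩ Σ2 matter.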


Closure Properties of Finite-State Transducers: Composition

Example (Composition)

[Figure: two transducers and the result of their composition]


Closure Properties of Finite-State Transducers: Inversion

Example (Inversion)

FST T_Morph mapping words to morphological categories

FST T_Morph^-1 mapping morphological categories to words
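For a normalized transducer, inversion is a matter of swapping the input and output label of every transition; states, start state and final states stay the same, and the input and output alphabets trade places. A minimal sketch on a set of transitions `(p, a, b, q)`, with a hypothetical function name `invert` and hypothetical example transitions `E`:

```python
def invert(transitions):
    """Swap input and output labels of every transition (p, a, b, q)."""
    return {(p, b, a, q) for (p, a, b, q) in transitions}


# Hypothetical fragment of a morphological transducer:
# it copies c, a, t and maps the surface symbol s to the feature "+Pl".
E = {
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "s", "+Pl", 4),
}

E_INV = invert(E)
```

The inverted fragment reads `+Pl` and writes `s`, so the same machine can be used for generation instead of analysis. Inverting twice gives back the original transition set.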








Equivalence Transformations on Finite-State Acceptors

Equivalence transformations are operations on automata which change the topology of an automaton but not its language.

They usually serve optimization purposes, i.e. they create smaller and/or faster automata.

Sometimes they are even necessary (e.g. determinization is crucial for complementation).

Finite-state acceptors admit the following transformations:

ε-Removal

Determinization

Minimization


Determinization: Subset Construction

A DFA can be constructed from an NFA by the subset construction.

In the worst case, the smallest equivalent DFA has 2^n states, where n is the number of states of the NFA.

Example

. . .

QD is the power set of QN

FD is the set of subsets S of QN such that S ∩ FN ≠ ∅.

For each set S ⊆ QN and for each input symbol a ∈ Σ:

δD(S, a) = ⋃_{p ∈ S} δN(p, a)


Determinization: Subset Construction

Transition function δ of the NFA (transition diagram not reproduced):

δ(p0, 0) = {p0, p1}
δ(p0, 1) = {p0}
δ(p1, 1) = {p2}

Transition table of the DFA built over all subsets (→ marks the start state, ∗ marks final states):

                     0            1
   ∅                 ∅            ∅            not accessible!
 → {p0}              {p0, p1}     {p0}
   {p1}              ∅            {p2}         not accessible!
 ∗ {p2}              ∅            ∅            not accessible!
   {p0, p1}          {p0, p1}     {p0, p2}
 ∗ {p0, p2}          {p0, p1}     {p0}
 ∗ {p1, p2}          ∅            {p2}         not accessible!
 ∗ {p0, p1, p2}      {p0, p1}     {p0, p2}     not accessible!
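The full subset construction, which builds a DFA state for every subset of NFA states, can be sketched as follows. The NFA is the three-state example above with start state p0 and final state p2 (the latter read off the ∗ marks in the table); ε-transitions are not handled, and the function and variable names are chosen here for illustration.

```python
from itertools import combinations


def powerset(states):
    """All subsets of states as frozensets (2^n of them)."""
    items = sorted(states)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def subset_construction(nfa_states, nfa_delta, nfa_start, nfa_finals, alphabet):
    """Full subset construction for an ε-free NFA."""
    q_d = powerset(nfa_states)
    delta_d = {}
    for s in q_d:
        for a in alphabet:
            # δD(S, a) is the union of δN(p, a) over all p in S.
            target = set()
            for p in s:
                target |= nfa_delta.get((p, a), set())
            delta_d[(s, a)] = frozenset(target)
    # A subset is final if it contains at least one final NFA state.
    f_d = {s for s in q_d if s & nfa_finals}
    return q_d, delta_d, frozenset({nfa_start}), f_d


# The example NFA from the slides.
NFA_STATES = {"p0", "p1", "p2"}
NFA_DELTA = {
    ("p0", "0"): {"p0", "p1"},
    ("p0", "1"): {"p0"},
    ("p1", "1"): {"p2"},
}

Q_D, DELTA_D, START_D, F_D = subset_construction(
    NFA_STATES, NFA_DELTA, "p0", {"p2"}, "01")
```

The result has eight states, matching the eight rows of the table, although only three of them are accessible from the start state {p0}.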


Determinization: Subset Construction with Lazy Evaluation

Lazy Evaluation

Basis: The set containing only the start state of the NFA N is accessible.

Induction: Suppose the set S of states is accessible. Then for each input symbol a, compute the set of states δD(S, a); these sets are accessible as well.

Example

δD({p0}, 0) = {p0, p1} (new accessible state)

δD({p0}, 1) = {p0} ('old' state)

δD({p0, p1}, 0) = δN(p0, 0) ∪ δN(p1, 0) = {p0, p1} ∪ ∅ = {p0, p1} ('old' state)

δD({p0, p1}, 1) = δN(p0, 1) ∪ δN(p1, 1) = {p0} ∪ {p2} = {p0, p2} (new accessible state)

δD({p0, p2}, 0) = δN(p0, 0) ∪ δN(p2, 0) = {p0, p1} ∪ ∅ = {p0, p1} ('old' state)

δD({p0, p2}, 1) = δN(p0, 1) ∪ δN(p2, 1) = {p0} ∪ ∅ = {p0} ('old' state)

⇒ Converging!
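Lazy evaluation is a worklist search that starts from the start set and only expands sets that are actually reached. The sketch below uses the same three-state example NFA, with p2 as the final state; it assumes an ε-free NFA, and the names are chosen here for illustration.

```python
from collections import deque


def lazy_subset_construction(nfa_delta, nfa_start, nfa_finals, alphabet):
    """Subset construction restricted to accessible subsets (ε-free NFA)."""
    start = frozenset({nfa_start})          # basis: the start set is accessible
    delta_d = {}
    seen = {start}
    queue = deque([start])
    while queue:                            # induction: expand accessible sets
        s = queue.popleft()
        for a in alphabet:
            target = frozenset(
                q for p in s for q in nfa_delta.get((p, a), ()))
            delta_d[(s, a)] = target
            if target not in seen:          # a new accessible state
                seen.add(target)
                queue.append(target)
    finals = {s for s in seen if s & nfa_finals}
    return seen, delta_d, start, finals


# The example NFA from the slides.
NFA_DELTA = {
    ("p0", "0"): {"p0", "p1"},
    ("p0", "1"): {"p0"},
    ("p1", "1"): {"p2"},
}

STATES_D, DELTA_D, START_D, FINALS_D = lazy_subset_construction(
    NFA_DELTA, "p0", {"p2"}, "01")
```

The loop terminates once no new set shows up, which is the convergence noted in the example. For this NFA it produces three states and six transitions instead of the eight states of the full construction.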

Determinization

Example (Determinized Version of A2)



