
(Co)-Algebraic and Analytical Aspects of Weighted Automata Minimisation and Equivalence

(Part 2)

Borja Balle

[CMCS Tutorial, Apr 2 2016]

Minimisation vs Approximation




[Figure: minimisation vs. approximation, illustrated on an automaton with states q1, q2, q3, q4]


• Minimisation is naturally related to (co-)algebraic properties

• Approximation questions need analytical tools (e.g. metrics)




Motivations for Approximation

1. Efficient (approximate) computation with WA

• Compute faster and pay a controlled price in terms of accuracy

2. Machine learning for WA

• Understand learning bias, design better algorithms

3. Interesting mathematical questions

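WA and Rational Functions

[Figure: weighted automaton A with two states. q1 has initial weight 1 and final weight 1; q2 has initial weight −1 and final weight −2. Transitions: q1 → q1 with a, 1 and b, 0; q1 → q2 with a, −1 and b, −2; q2 → q2 with a, 3 and b, 5; q2 → q1 with a, −2 and b, 0.]

$A = \langle \alpha, \beta, \{T_a\}_{a \in \Sigma} \rangle$ with $\alpha, \beta \in \mathbb{R}^n$, $T_a \in \mathbb{R}^{n \times n}$

$\alpha = \begin{bmatrix} 1 & -1 \end{bmatrix}$, $\beta = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$, $T_a = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}$, $T_b = \begin{bmatrix} 0 & -2 \\ 0 & 5 \end{bmatrix}$

$f_A : \Sigma^\star \to \mathbb{R}$ is the sum of the weights of all paths labelled by the input string:

$f_A(x) = \alpha T_{x_1} \cdots T_{x_t} \beta = \alpha T_x \beta$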
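The definition of $f_A$ can be made concrete with a few lines of NumPy. This is a minimal sketch, assuming the weights of the two-state example are read off the slide as $\alpha = (1, -1)$, $\beta = (1, -2)$ and the two transition matrices shown; the helper name `evaluate` is introduced here for illustration only.

```python
import numpy as np

# Weights as read from the slide's two-state example (alpha is a row vector).
alpha = np.array([1.0, -1.0])
beta = np.array([1.0, -2.0])
T = {
    "a": np.array([[1.0, -1.0], [-2.0, 3.0]]),
    "b": np.array([[0.0, -2.0], [0.0, 5.0]]),
}


def evaluate(alpha, beta, T, x):
    """Return f_A(x) = alpha T_{x_1} ... T_{x_t} beta for a string x."""
    v = alpha
    for symbol in x:
        v = v @ T[symbol]  # push the row vector through one transition matrix
    return float(v @ beta)


for x in ["", "a", "b", "ab"]:
    print(repr(x), evaluate(alpha, beta, T, x))
```

Propagating a row vector costs $O(n^2)$ per symbol, whereas forming the matrix product $T_x$ first would cost $O(n^3)$ per symbol.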

(Pseudo-)Metrics for Approximation

How do we measure the quality of WA approximations?

$\|f\|_p = \left( \sum_{x \in \Sigma^\star} |f(x)|^p \right)^{1/p}$

$\mathrm{dist}_p(A, B) = \|f_A - f_B\|_p$ (a pseudo-metric: distinct automata computing the same function are at distance 0)

Important linear subspaces of $\mathbb{R}^{\Sigma^\star}$:

• $\ell^p = \{ f : \Sigma^\star \to \mathbb{R} \mid \|f\|_p < \infty \}$ (Banach; Hilbert for $p = 2$)

• $\mathrm{Rat} = \{ f : \Sigma^\star \to \mathbb{R} \mid f \text{ rational} \}$

• $\ell^p_{\mathrm{Rat}} = \mathrm{Rat} \cap \ell^p = \{ f : \Sigma^\star \to \mathbb{R} \mid f \text{ rational}, \|f\|_p < \infty \}$ (not complete!)
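The following sketch illustrates why $\mathrm{dist}_p$ is only a pseudo-metric. It sums $|f_A(x) - f_B(x)|^p$ over strings up to a fixed length, which gives a lower bound on the true distance. The automaton weights, the matrix `Q`, and the helper names are hypothetical choices made for this illustration; they do not come from the slides.

```python
import itertools
import numpy as np


def evaluate(alpha, beta, T, x):
    """f_A(x) = alpha T_{x_1} ... T_{x_t} beta."""
    v = alpha
    for symbol in x:
        v = v @ T[symbol]
    return float(v @ beta)


def truncated_dist(A, B, alphabet, max_len, p=2):
    """Lower bound on dist_p(A, B) using only strings with |x| <= max_len."""
    total = 0.0
    for length in range(max_len + 1):
        for word in itertools.product(alphabet, repeat=length):
            total += abs(evaluate(*A, word) - evaluate(*B, word)) ** p
    return total ** (1.0 / p)


# Hypothetical two-state automaton with small weights, so that f_A is in l^2.
alpha = np.array([1.0, 0.5])
beta = np.array([0.5, -0.25])
T = {"a": np.array([[0.2, 0.1], [0.0, 0.3]]),
     "b": np.array([[0.1, -0.2], [0.3, 0.1]])}
A = (alpha, beta, T)

# B has different weights from A but computes the same function.
Q = np.array([[2.0, 1.0], [1.0, 1.0]])
Q_inv = np.linalg.inv(Q)
B = (alpha @ Q, Q_inv @ beta, {a: Q_inv @ Ta @ Q for a, Ta in T.items()})

# C computes 2 * f_A, so it is at positive distance from A.
C = (alpha, 2.0 * beta, T)

print(truncated_dist(A, B, "ab", 6))  # numerically zero although A != B
print(truncated_dist(A, C, "ab", 6))  # strictly positive
```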

An Approximation Result

There is an algorithm that, given a minimal WA $A$ with $n$ states and $k < n$, runs in time $O(|\Sigma| n^6)$ and returns a WA $\hat{A}$ with $k$ states such that

$\mathrm{dist}_2(A, \hat{A}) \le \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_n^2}$

where $\sigma_1 \ge \cdots \ge \sigma_n$ are the singular values of the Hankel matrix of $f_A$.

[Balle, Panangaden, Precup 2015, Kulesza, Jiang, Singh 2015]

The Plan

1. Hankel Matrices and Singular Value Automata (SVA)

2. Algorithms for Computing SVA

3. WA Approximation via SVA Truncation

The Hankel Matrix

For $f : \Sigma^\star \to \mathbb{R}$, the Hankel matrix $H_f \in \mathbb{R}^{\Sigma^\star \times \Sigma^\star}$ is defined by $H_f(p, s) = f(ps)$, with rows indexed by prefixes $p$ and columns by suffixes $s$:

$H_f = \begin{bmatrix} f(\varepsilon) & f(a) & f(b) & \cdots \\ f(a) & f(aa) & f(ab) & \cdots \\ f(b) & f(ba) & f(bb) & \cdots \\ \vdots & \vdots & \vdots & f(ps) \end{bmatrix}$ (rows and columns ordered $\varepsilon, a, b, \ldots$)

Theorem: $\mathrm{rank}(H_f) = n$ iff $f$ is computed by a minimal WA with $n$ states

[Kronecker 1881, Carlyle & Paz 1971, Fliess 1974]

Proof (upper bound $\mathrm{rank}(H_{f_A}) \le |A|$)

$H_f(p, s) = f(ps) = \alpha T_{p_1} \cdots T_{p_t} T_{s_1} \cdots T_{s_{t'}} \beta = \alpha_p \cdot \beta_s$

Structure of Low-rank Hankel Matrices

$H_{f_A} \in \mathbb{R}^{\Sigma^\star \times \Sigma^\star}$, $P \in \mathbb{R}^{\Sigma^\star \times n}$, $S \in \mathbb{R}^{n \times \Sigma^\star}$

$f_A(p_1 \cdots p_T s_1 \cdots s_{T'}) = \underbrace{\alpha T_{p_1} \cdots T_{p_T}}_{\alpha_A(p)} \, \underbrace{T_{s_1} \cdots T_{s_{T'}} \beta}_{\beta_A(s)}$

$\alpha_A(p) = P(p, \cdot)$, $\beta_A(s) = S(\cdot, s)$

$H_f = P \cdot S$
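The factorisation $H_f = P \cdot S$ can be checked numerically on a finite sub-block of the Hankel matrix. The sketch below is an illustration under assumptions: it uses the two-state example with weights as read from the slide, restricts prefixes and suffixes to strings of length at most 2, and introduces the helper names `transition` and `words`.

```python
import itertools
import numpy as np

# Two-state example, weights as read from the slide.
alpha = np.array([1.0, -1.0])
beta = np.array([1.0, -2.0])
T = {"a": np.array([[1.0, -1.0], [-2.0, 3.0]]),
     "b": np.array([[0.0, -2.0], [0.0, 5.0]])}
n = alpha.shape[0]


def transition(T, x, n):
    """Return T_x = T_{x_1} ... T_{x_t} (identity for the empty string)."""
    M = np.eye(n)
    for symbol in x:
        M = M @ T[symbol]
    return M


def words(alphabet, max_len):
    """All strings over `alphabet` of length <= max_len, shortest first."""
    return ["".join(w) for length in range(max_len + 1)
            for w in itertools.product(alphabet, repeat=length)]


basis = words("ab", 2)  # 7 prefixes and 7 suffixes

# Rows of P are alpha T_p; columns of S are T_s beta.
P = np.array([alpha @ transition(T, p, n) for p in basis])
S = np.array([transition(T, s, n) @ beta for s in basis]).T

# Hankel block computed directly from f_A(ps).
H = np.array([[alpha @ transition(T, p + s, n) @ beta for s in basis]
              for p in basis])

rank = np.linalg.matrix_rank(H, tol=1e-9 * np.linalg.norm(H, 2))
print(rank, np.allclose(H, P @ S))
```

The 7 x 7 block has rank 2, matching the number of states, as the upper bound in the theorem predicts.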



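Proof (equality $\mathrm{rank}(H_f) = \min_{f_A = f} |A|$)

1. Suppose $H_f = P \cdot S$ is a rank factorisation

2. For $a \in \Sigma$ define $H_f^a(p, s) = f(pas)$ and note $H_f^a = R_a H_f$, where $(R_a f)(x) = f(xa)$

3. Observe that, because $S$ has full row rank, $\mathrm{colspan}(R_a P) = \mathrm{colspan}(H_f^a) \subseteq \mathrm{colspan}(H_f) = \mathrm{colspan}(P)$

4. Hence there exists $T_a$ such that $R_a P = P T_a$

5. Let $A = \langle \alpha = P(\varepsilon, \cdot), \beta = S(\cdot, \varepsilon), \{T_a\} \rangle$

6. Note $f_A = f$ since $P(\varepsilon, \cdot) T_x S(\cdot, \varepsilon) = (R_x P)(\varepsilon, \cdot) S(\cdot, \varepsilon) = P(x, \cdot) S(\cdot, \varepsilon) = H_f(x, \varepsilon) = f(x)$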
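The construction in the proof has a finite-dimensional analogue that can be run directly. The sketch below is not taken from the slides. It assumes the prefix/suffix set {ε, a, b} is large enough for the sub-block of $H_f$ to have full rank, obtains a rank factorisation from an SVD of that block, and solves $H_f^a = P T_a S$ for $T_a$ with pseudo-inverses. The target function is the slide's two-state example, used here only as an oracle.

```python
import itertools
import numpy as np

# Oracle for f: the two-state example, weights as read from the slide.
alpha0 = np.array([1.0, -1.0])
beta0 = np.array([1.0, -2.0])
T0 = {"a": np.array([[1.0, -1.0], [-2.0, 3.0]]),
      "b": np.array([[0.0, -2.0], [0.0, 5.0]])}


def evaluate(alpha, beta, T, x):
    v = alpha
    for symbol in x:
        v = v @ T[symbol]
    return float(v @ beta)


def f(x):
    return evaluate(alpha0, beta0, T0, x)


basis = ["", "a", "b"]  # prefixes and suffixes; the empty string comes first

# Finite blocks of H_f and of the shifted matrices H_f^a(p, s) = f(pas).
H = np.array([[f(p + s) for s in basis] for p in basis])
H_shift = {a: np.array([[f(p + a + s) for s in basis] for p in basis])
           for a in "ab"}

# Step 1: rank factorisation H = P S, here obtained from an SVD.
U, sv, Vt = np.linalg.svd(H)
n = int(np.sum(sv > 1e-9 * sv[0]))  # numerical rank
P = U[:, :n] * np.sqrt(sv[:n])
S = np.sqrt(sv[:n])[:, None] * Vt[:n, :]

# Steps 2-4: H^a = P T_a S, hence T_a = P^+ H^a S^+.
P_pinv, S_pinv = np.linalg.pinv(P), np.linalg.pinv(S)
T_rec = {a: P_pinv @ H_shift[a] @ S_pinv for a in "ab"}

# Step 5: alpha is the row of P for epsilon, beta the column of S for epsilon.
alpha_rec, beta_rec = P[0, :], S[:, 0]

# Step 6: the recovered automaton agrees with f on all short strings.
tests = ["".join(w) for k in range(5) for w in itertools.product("ab", repeat=k)]
max_err = max(abs(evaluate(alpha_rec, beta_rec, T_rec, x) - f(x)) for x in tests)
print(n, max_err)
```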

A Useful Correspondence

$A = \langle \alpha, \beta, \{T_a\}_{a \in \Sigma} \rangle$, $Q \in \mathbb{R}^{n \times n}$ invertible

$A^Q = \langle \alpha Q, Q^{-1} \beta, \{Q^{-1} T_a Q\}_{a \in \Sigma} \rangle$

$(\alpha Q)(Q^{-1} T_{x_1} Q) \cdots (Q^{-1} T_{x_t} Q)(Q^{-1} \beta) = \alpha T_{x_1} \cdots T_{x_t} \beta$

Theorem: For any rational $f : \Sigma^\star \to \mathbb{R}$ there is a bijection between minimal WA $A$ computing $f$ and rank factorizations of the form $H_f = P \cdot S$

[Balle, Carreras, Luque, Quattoni 2014]
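The invariance of $f_A$ under a change of basis is easy to confirm numerically. In this sketch the automaton is the two-state example with weights as read from the slide, and the invertible matrix `Q` is an arbitrary choice made for illustration.

```python
import itertools
import numpy as np

alpha = np.array([1.0, -1.0])
beta = np.array([1.0, -2.0])
T = {"a": np.array([[1.0, -1.0], [-2.0, 3.0]]),
     "b": np.array([[0.0, -2.0], [0.0, 5.0]])}


def evaluate(alpha, beta, T, x):
    v = alpha
    for symbol in x:
        v = v @ T[symbol]
    return float(v @ beta)


def change_basis(alpha, beta, T, Q):
    """Return A^Q = <alpha Q, Q^{-1} beta, {Q^{-1} T_a Q}>."""
    Q_inv = np.linalg.inv(Q)
    return alpha @ Q, Q_inv @ beta, {a: Q_inv @ Ta @ Q for a, Ta in T.items()}


Q = np.array([[2.0, 1.0], [1.0, 1.0]])  # arbitrary invertible matrix (det = 1)
alpha_q, beta_q, T_q = change_basis(alpha, beta, T, Q)

strings = ["".join(w) for k in range(5) for w in itertools.product("ab", repeat=k)]
same = all(np.isclose(evaluate(alpha, beta, T, x),
                      evaluate(alpha_q, beta_q, T_q, x)) for x in strings)
print(same)
```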

SVD of Hankel Matrices

$H_f = U D V^\top = \begin{bmatrix} \vdots & & \vdots \\ u_1 & \cdots & u_n \\ \vdots & & \vdots \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix} \begin{bmatrix} \cdots & v_1^\top & \cdots \\ & \vdots & \\ \cdots & v_n^\top & \cdots \end{bmatrix}$

Theorem: $H_f$ with $f$ rational admits an SVD iff $\|f\|_2 < \infty$

[Balle, Panangaden, Precup 2015, 201?]

Key Idea: see the Hankel matrix as an operator in a Hilbert space

$H_f : \ell^2 \to \ell^2$, $g \mapsto H_f g$, where $(H_f g)(x) = \sum_{y \in \Sigma^\star} f(xy) g(y)$


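The Singular Value Automaton

From Correspondence Theorem: If $H_f$ admits an SVD $H_f = U D V^\top$, then there exists a WA $A = \langle \alpha, \beta, \{T_a\}_{a \in \Sigma} \rangle$ for $f$ inducing $P = U D^{1/2}$ and $S = D^{1/2} V^\top$. This $A$ is the singular value automaton (SVA).

[Balle, Panangaden, Precup 2015]

WA computing the singular vectors can be obtained from the SVA:

$U_i = \langle \alpha, e_i / \sqrt{\sigma_i}, \{T_a\} \rangle$ with $f_{U_i} = u_i$

$V_i = \langle e_i / \sqrt{\sigma_i}, \beta, \{T_a\} \rangle$ with $f_{V_i} = v_i$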


Consequences of SVA

1. For every $f \in \ell^2_{\mathrm{Rat}}$ the SVA provides a “unique” canonical WA representation

2. The vector subspace $\ell^2_{\mathrm{Rat}} \subset \mathbb{R}^{\Sigma^\star}$ is closed under taking left/right singular vectors of Hankel matrices





Two Important Matrices

Reachability Gramian: $G_P = P^\top \cdot P$

Observability Gramian: $G_S = S \cdot S^\top$

Theorem: Assuming $A$ is minimal, $G_P$ and $G_S$ are well-defined iff $\|f_A\|_2 < \infty$

For SVA: $P = U D^{1/2} \Rightarrow G_P = D$ and $S = D^{1/2} V^\top \Rightarrow G_S = D$

For $A^Q$: $G_P^Q = Q^\top G_P Q$ and $G_S^Q = Q^{-1} G_S Q^{-\top}$

Computing the SVA

Given minimal WA $A$ computing $f \in \ell^2_{\mathrm{Rat}}$ do:

1. Compute Gramians $G_P$ and $G_S$

2. Diagonalize $M = G_S G_P$ (i.e. find $Q$ s.t. $Q^{-1} M Q = D^2$)

3. Obtain SVA as $A^Q$

note: Step 2 is a finite eigenvalue problem :-) Step 1 involves products of infinite matrices :-o
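The three steps can be sketched in NumPy as follows. Two things in this sketch are assumptions rather than the slides' method. First, the Gramians are obtained by solving their fixed-point equations as a linear system. Second, instead of calling an eigen-solver on $M$, it uses a Cholesky-plus-SVD variant borrowed from balanced realisation, which returns a $Q$ with $Q^{-1} M Q = D^2$ and also fixes the column scaling so that both Gramians of $A^Q$ equal $D$. The example automaton and all function names are hypothetical.

```python
import numpy as np


def gramians(alpha, beta, T):
    """Solve G_S = beta beta^T + sum_a T_a G_S T_a^T (and the dual for G_P)."""
    n = alpha.shape[0]
    I = np.eye(n * n)
    K_S = sum(np.kron(Ta, Ta) for Ta in T.values())
    K_P = sum(np.kron(Ta.T, Ta.T) for Ta in T.values())
    G_S = np.linalg.solve(I - K_S, np.kron(beta, beta)).reshape(n, n)
    G_P = np.linalg.solve(I - K_P, np.kron(alpha, alpha)).reshape(n, n)
    return G_P, G_S


def to_sva(alpha, beta, T):
    """Return (alpha Q, Q^{-1} beta, {Q^{-1} T_a Q}, sigma) with both Gramians = diag(sigma)."""
    G_P, G_S = gramians(alpha, beta, T)
    L_P = np.linalg.cholesky(G_P)  # G_P = L_P L_P^T (needs a minimal automaton)
    L_S = np.linalg.cholesky(G_S)  # G_S = L_S L_S^T
    _, sigma, Vt = np.linalg.svd(L_P.T @ L_S)
    Q = (L_S @ Vt.T) / np.sqrt(sigma)  # Q = L_S V D^{-1/2}, columns rescaled
    Q_inv = np.linalg.inv(Q)
    T_q = {a: Q_inv @ Ta @ Q for a, Ta in T.items()}
    return alpha @ Q, Q_inv @ beta, T_q, sigma


# Hypothetical minimal two-state automaton with f in l^2.
alpha = np.array([1.0, 0.5])
beta = np.array([0.5, -0.25])
T = {"a": np.array([[0.2, 0.1], [0.0, 0.3]]),
     "b": np.array([[0.1, -0.2], [0.3, 0.1]])}

alpha_s, beta_s, T_s, sigma = to_sva(alpha, beta, T)
G_P_s, G_S_s = gramians(alpha_s, beta_s, T_s)
print(sigma)
print(np.round(G_P_s, 8))
print(np.round(G_S_s, 8))
```

With this $Q$, $Q^{-1} G_S Q^{-\top} = D$ and $Q^\top G_P Q = D$, so $Q^{-1} M Q = D^2$ as required by step 2.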

Computing the Gramians

Fixed-point Equations for Gramians

$G_S = \beta \cdot \beta^\top + \sum_{a \in \Sigma} T_a \cdot G_S \cdot T_a^\top$

$G_P = \alpha^\top \cdot \alpha + \sum_{a \in \Sigma} T_a^\top \cdot G_P \cdot T_a$

Closed-form Algorithm

$\mathrm{vec}(G_S) = \left( I - \sum_{a \in \Sigma} T_a \otimes T_a \right)^{-1} (\beta \otimes \beta)$

Total time $O(n^6 + |\Sigma| n^4)$

note: there is also an iterative algorithm with $O(|\Sigma| n^{2.4})$ flops per iteration
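The closed-form expression translates almost literally into NumPy. This is a minimal sketch on a hypothetical two-state automaton whose weights are small enough for the Gramians to exist. It solves the linear system instead of forming the inverse explicitly, and it uses the dual formula for $G_P$, obtained by replacing $\beta$ with $\alpha$ and $T_a$ with $T_a^\top$.

```python
import numpy as np


def gramian_S(beta, T):
    """Solve vec(G_S) = (I - sum_a T_a (x) T_a)^{-1} (beta (x) beta)."""
    n = beta.shape[0]
    K = sum(np.kron(Ta, Ta) for Ta in T.values())  # n^2 x n^2 matrix
    vec = np.linalg.solve(np.eye(n * n) - K, np.kron(beta, beta))
    return vec.reshape(n, n)


def gramian_P(alpha, T):
    """G_P satisfies the same equation with alpha and the transposed T_a."""
    return gramian_S(alpha, {a: Ta.T for a, Ta in T.items()})


# Hypothetical automaton (not from the slides).
alpha = np.array([1.0, 0.5])
beta = np.array([0.5, -0.25])
T = {"a": np.array([[0.2, 0.1], [0.0, 0.3]]),
     "b": np.array([[0.1, -0.2], [0.3, 0.1]])}

G_S = gramian_S(beta, T)
G_P = gramian_P(alpha, T)

# Residuals of the two fixed-point equations; both should be close to zero.
res_S = G_S - (np.outer(beta, beta) + sum(Ta @ G_S @ Ta.T for Ta in T.values()))
res_P = G_P - (np.outer(alpha, alpha) + sum(Ta.T @ G_P @ Ta for Ta in T.values()))
print(np.abs(res_S).max(), np.abs(res_P).max())

# Since G_S = sum_s (T_s beta)(T_s beta)^T, the squared l^2 norm of f_A
# equals alpha G_S alpha^T.
print(alpha @ G_S @ alpha)
```

The $n^2 \times n^2$ linear solve is what gives the $O(n^6)$ term in the stated running time.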





Approximation (Attempt 1)

Truncate the SVD of $H_f$ to its top $k$ singular values:

$H_f = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix} \begin{bmatrix} v_1^\top \\ \vdots \\ v_n^\top \end{bmatrix} \approx \begin{bmatrix} u_1 & \cdots & u_k \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_k \end{bmatrix} \begin{bmatrix} v_1^\top \\ \vdots \\ v_k^\top \end{bmatrix} = \hat{H}$

$\hat{H}$ is the best rank $k$ approximation of $H_f$.

Claim: If $\hat{f} : \Sigma^\star \to \mathbb{R}$ is such that $H_{\hat{f}} = \hat{H}$, then $\mathrm{rank}(\hat{f}) = k$ and $\|f - \hat{f}\|_2 \le \|H_f - \hat{H}\|_{\mathrm{op}} = \sigma_{k+1}$

Problem: $\hat{H}$ is not Hankel!
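The inequality in the claim holds because a function is the first column of its own Hankel matrix. A short derivation, writing $e_\varepsilon$ for the indicator vector of the empty string (notation introduced here) and assuming a function $\hat{f}$ with $H_{\hat{f}} = \hat{H}$ exists:

```latex
\begin{aligned}
f - \hat{f} &= (H_f - \hat{H})\, e_\varepsilon
  && \text{since } H_g(p, \varepsilon) = g(p) \text{ for any } g, \\
\|f - \hat{f}\|_2 &\le \|H_f - \hat{H}\|_{\mathrm{op}} \, \|e_\varepsilon\|_2
  = \|H_f - \hat{H}\|_{\mathrm{op}}
  = \sigma_{k+1}
  && \text{(best rank-$k$ approximation)}.
\end{aligned}
```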

Approximation (Attempt 2)

The first column of $H_f$ is $f$ itself, so the SVA factorisation gives

$H_f = \begin{bmatrix} f & \cdots \end{bmatrix} = \underbrace{\begin{bmatrix} \sqrt{\sigma_1} u_1 & \cdots & \sqrt{\sigma_n} u_n \end{bmatrix}}_{U D^{1/2}} \underbrace{\begin{bmatrix} \beta_1 & \cdots \\ \vdots & \\ \beta_n & \cdots \end{bmatrix}}_{D^{1/2} V^\top}$ and hence $f = \sum_{i=1}^{n} \beta_i \sqrt{\sigma_i} u_i$

Claim: Taking $\hat{f} = \sum_{i=1}^{k} \beta_i \sqrt{\sigma_i} u_i$ we have a rational function (a sum of $k$ orthogonal rational functions) and

$\|f - \hat{f}\|_2^2 = \sum_{i=k+1}^{n} \beta_i^2 \sigma_i \le \sum_{i=k+1}^{n} \sigma_i^2$

Problem: $\hat{f}$ has rank $n$, but we want $k$!
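Both parts of the claim follow from the SVA factorisation $P = U D^{1/2}$, $S = D^{1/2} V^\top$ in a few lines. The derivation below uses only the orthonormality of the singular vectors and the fact that $\beta$ is the column of $S$ indexed by the empty string:

```latex
\begin{aligned}
\|f - \hat{f}\|_2^2
  &= \Big\| \sum_{i=k+1}^{n} \beta_i \sqrt{\sigma_i}\, u_i \Big\|_2^2
   = \sum_{i=k+1}^{n} \beta_i^2 \sigma_i
  && \text{($u_1, \dots, u_n$ orthonormal)}, \\
\beta_i &= S(i, \varepsilon) = \sqrt{\sigma_i}\, v_i(\varepsilon),
  \qquad |v_i(\varepsilon)| \le \|v_i\|_2 = 1
  && \Longrightarrow \quad \beta_i^2 \le \sigma_i .
\end{aligned}
```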

SVA Truncation

[Figure: SVA with states q1, q2, q3, q4 split into a "Keep" group, corresponding to the k largest singular values, and a "Remove" group.]

Ordering the kept states first, each transition matrix splits into four blocks:

$T_\sigma = \begin{bmatrix} \text{keep to keep} & \text{keep to remove} \\ \text{remove to keep} & \text{remove to remove} \end{bmatrix}$

Error in SVA Truncation

Original: $A$, $f$, $n$ states, with full transition matrices $T_\sigma \in \mathbb{R}^{n \times n}$

Truncated: $\hat{A}$, $\hat{f}$, $k$ states, with $\hat{T}_\sigma \in \mathbb{R}^{k \times k}$ the "keep to keep" block of $T_\sigma$

Error Bounds:

$\|f - \hat{f}\|_2 \le \begin{cases} C_f (\sigma_{k+1} + \cdots + \sigma_n)^{1/4} \\ (\sigma_{k+1}^2 + \cdots + \sigma_n^2)^{1/2} \end{cases}$

[Balle, Panangaden, Precup 2015, 201?, Kulesza, Jiang, Singh 2015]
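The truncation step and its exact $\ell^2$ error can be computed end to end. The sketch below rests on several assumptions that are not in the slides. The SVA is obtained with a Cholesky-plus-SVD balancing step. The error is evaluated through a "difference automaton" (the direct sum of $A$ and $\hat{A}$ with the final weights of $\hat{A}$ negated), using $\|g\|_2^2 = \alpha G_S \alpha^\top$ for the function $g$ it computes. The example automaton and all function names are hypothetical.

```python
import numpy as np


def gramian_S(beta, T):
    n = beta.shape[0]
    K = sum(np.kron(Ta, Ta) for Ta in T.values())
    return np.linalg.solve(np.eye(n * n) - K, np.kron(beta, beta)).reshape(n, n)


def gramian_P(alpha, T):
    return gramian_S(alpha, {a: Ta.T for a, Ta in T.items()})


def to_sva(alpha, beta, T):
    """Change basis so that both Gramians equal diag(sigma), sigma descending."""
    L_P = np.linalg.cholesky(gramian_P(alpha, T))
    L_S = np.linalg.cholesky(gramian_S(beta, T))
    _, sigma, Vt = np.linalg.svd(L_P.T @ L_S)
    Q = (L_S @ Vt.T) / np.sqrt(sigma)
    Q_inv = np.linalg.inv(Q)
    return (alpha @ Q, Q_inv @ beta, {a: Q_inv @ Ta @ Q for a, Ta in T.items()}), sigma


def truncate(A, k):
    """Keep the first k states of an SVA (the 'keep to keep' blocks)."""
    alpha, beta, T = A
    return alpha[:k], beta[:k], {a: Ta[:k, :k] for a, Ta in T.items()}


def l2_distance(A, B):
    """Exact ||f_A - f_B||_2 via the Gramian of the difference automaton."""
    (a1, b1, T1), (a2, b2, T2) = A, B
    n1, n2 = len(a1), len(a2)
    alpha = np.concatenate([a1, a2])
    beta = np.concatenate([b1, -b2])  # negated final weights give f_A - f_B
    T = {}
    for a in T1:
        M = np.zeros((n1 + n2, n1 + n2))
        M[:n1, :n1], M[n1:, n1:] = T1[a], T2[a]
        T[a] = M
    sq = alpha @ gramian_S(beta, T) @ alpha
    return float(np.sqrt(max(sq, 0.0)))  # clip tiny negative rounding errors


# Hypothetical minimal two-state automaton with f in l^2.
A = (np.array([1.0, 0.5]), np.array([0.5, -0.25]),
     {"a": np.array([[0.2, 0.1], [0.0, 0.3]]),
      "b": np.array([[0.1, -0.2], [0.3, 0.1]])})

sva, sigma = to_sva(*A)
k = 1
A_hat = truncate(sva, k)
err = l2_distance(A, A_hat)
slide_bound = float(np.sqrt(np.sum(sigma[k:] ** 2)))
print(err, slide_bound)
```

Printing `err` next to `slide_bound` lets the second error bound stated above be compared with the measured error on a given instance; the sketch itself does not rely on that bound.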

Conclusion

• Tools from functional analysis must play a role in a theory of automata approximation

• Approximation theories can lead to useful algorithms for large-scale problems

• Change of basis in WA doesn’t change the function but yields automata with different properties (e.g. for truncation)

Open Problems

• How can this be extended to other types of automata, state machines, co-algebras, etc?

• How important is it that WA are closed under reversal for the error analysis? [see Rabusseau, Balle, Cohen 2016]

• Can the AAK theorems in control theory be generalised to WA?

• Can we obtain efficient approximation algorithms with extra constraints (e.g. sparsity, positivity)?
