(Co)-Algebraic and Analytical Aspects of Weighted Automata Minimisation and Equivalence
(Part 2)
Borja Balle
[CMCS Tutorial, Apr 2 2016]
Minimisation vs Approximation
16 novática Special English Edition - 2013/2014 Annual Selection of Articles
monograph Process Mining
monograph
Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.
Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.
16 novática Special English Edition - 2013/2014 Annual Selection of Articles
monograph Process Mining
monograph
Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.
Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.
minimisation
Minimisation vs Approximation
16 novática Special English Edition - 2013/2014 Annual Selection of Articles
monograph Process Mining
monograph
Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.
Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.
16 novática Special English Edition - 2013/2014 Annual Selection of Articles
monograph Process Mining
monograph
Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.
Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.
minimisation
q1 q2
q3 q4approximation
Minimisation vs Approximation
• Minimisation is naturally related to (co-)algebraic properties
• Approximation questions need analytical tools (eg. metrics)
16 novática Special English Edition - 2013/2014 Annual Selection of Articles
monograph Process Mining
monograph
Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.
Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.
16 novática Special English Edition - 2013/2014 Annual Selection of Articles
monograph Process Mining
monograph
Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.
Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.
minimisation
q1 q2
q3 q4approximation
Motivations for Approximation
1. Efficient (approximate) computation with WA
• Compute faster and pay a controlled price in terms of accuracy
2. Machine learning for WA
• Understand learning bias, design better algorithms
3. Interesting mathematical questions
WA and Rational Functions
A q1
11
q2
´1´2
a, 1b, 0
a,´1b,´2
a, 3b, 5
a,´2b, 0 A = ��, �, {Ta}a���
�, � � Rn, Ta � Rn�n
� =
1-1
�Tb =
�0 �20 5
�Ta =
�1 �1
�2 3
�� =
�1 �2
�
WA and Rational Functions
A q1
11
q2
´1´2
a, 1b, 0
a,´1b,´2
a, 3b, 5
a,´2b, 0 A = ��, �, {Ta}a���
�, � � Rn, Ta � Rn�n
fA : ⌃? ! Rsum of weights of all paths labelled by the
input string
fA(x) = �Tx1 · · · Txt�
= �Tx�
(Pseudo-)Metrics for ApproximationHow do we measure the quality of WA approximations?
(Pseudo-)Metrics for Approximation
�f�p =
��
x���
|f(x)|p�1/p
distp(A,B) = ∥fA − fB∥p
How do we measure the quality of WA approximations?
(Pseudo-)Metrics for Approximation
�f�p =
��
x���
|f(x)|p�1/p
distp(A,B) = ∥fA − fB∥p
How do we measure the quality of WA approximations?
pseudo-metric
(Pseudo-)Metrics for Approximation
�f�p =
��
x���
|f(x)|p�1/p
distp(A,B) = ∥fA − fB∥p
How do we measure the quality of WA approximations?
ℓp = {f : Σ⋆ → R | ∥f∥p < ∞}
Rat = {f : Σ⋆ → R | rational}
ℓpRat = Rat ∩ ℓp = {f : Σ⋆ → R | rational, ∥f∥p < ∞}
Important linearsubspaces of R��
(Pseudo-)Metrics for Approximation
�f�p =
��
x���
|f(x)|p�1/p
distp(A,B) = ∥fA − fB∥p
How do we measure the quality of WA approximations?
ℓp = {f : Σ⋆ → R | ∥f∥p < ∞}
Rat = {f : Σ⋆ → R | rational}
ℓpRat = Rat ∩ ℓp = {f : Σ⋆ → R | rational, ∥f∥p < ∞}
Important linearsubspaces of R��
Banach/Hilbert
not complete!
An Approximation Result
There is an algorithm that given a minimal WA A with nstates and k < n, runs in time O(|�|n6) and returns aWA A with k states such that
dist2(A, A) ��
�2k+1 + · · · + �2
n
[Balle, Panangaden, Precup 2015, Kulesza, Jiang, Singh 2015]
An Approximation Result
There is an algorithm that given a minimal WA A with nstates and k < n, runs in time O(|�|n6) and returns aWA A with k states such that
dist2(A, A) ��
�2k+1 + · · · + �2
n
singular values of theHankel matrix of fA
[Balle, Panangaden, Precup 2015, Kulesza, Jiang, Singh 2015]
The Plan
1. Hankel Matrices and Singular Value Automata (SVA)
2. Algorithms for Computing SVA
3. WA Approximation via SVA Truncation
The Hankel Matrix
Hf 2 R�?⇥�?
f : �? ! R
Hf(p, s) = f(ps)
»
——————————–
" a b ¨¨¨ s ¨¨¨
" fp"q fpaq fpbq ...
a fpaq fpaaq fpabq ...
b fpbq fpbaq fpbbq ......p ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ fppsq...
fi
����������fl
The Hankel Matrix
Hf 2 R�?⇥�?
f : �? ! R
Hf(p, s) = f(ps)
»
——————————–
" a b ¨¨¨ s ¨¨¨
" fp"q fpaq fpbq ...
a fpaq fpaaq fpabq ...
b fpbq fpbaq fpbbq ......p ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ fppsq...
fi
����������fl
The Hankel Matrix
[Kronecker 1881, Carlyle & Paz 1971, Fliess 1974]
Theorem: rank(Hf) = n iff f computed by minimal WA with n states
Hf 2 R�?⇥�?
f : �? ! R
Hf(p, s) = f(ps)
Hf(p, s) = f(ps) = αTp1 · · · TptTs1 · · · Tst′β = αp ·βs
Proof (upper bound rank(HfA) � |A|)
Structure of Low-rank Hankel Matrices
HfA R� � P R� n S Rn �
s
...
...
...p
...
p
s
fA p1 pT s1 sT �0 Ap1 ApT
↵A p
As1 AsT �
�A s
�A p P p, ⇥A s S , s
Hf(p, s) = f(ps) = αTp1 · · · TptTs1 · · · Tst′β = αp ·βs
Proof (upper bound rank(HfA) � |A|)
Structure of Low-rank Hankel Matrices
HfA R� � P R� n S Rn �
s
...
...
...p
...
p
s
fA p1 pT s1 sT �0 Ap1 ApT
↵A p
As1 AsT �
�A s
�A p P p, ⇥A s S , s
Hf = P · S
Hf(p, s) = f(ps) = αTp1 · · · TptTs1 · · · Tst′β = αp ·βs
Proof (upper bound rank(HfA) � |A|)
Proof (equality rank(Hf) = minfA=f |A|)1. Suppose Hf = P · S is rank factorisation
2. For a � � define Haf (p, s) = f(pas) and note
Haf = RaHf where (Raf)(x) = f(xa)
3. Observe that because S has full row rank colspan(RaP) =colspan(Ha
f ) � colspan(Hf) = colspan(P)
4. Hence there exists Ta such that RaP = PTa
5. Let A = �� = P(�, �), � = S(�, �), {Ta}�
6. Note fA = f since P(�, �)TxS(�, �) = RxP(�, �)S(�, �) =P(x, �)S(�, �) = Hf(x, �)
A Useful Correspondence
AQ = ��Q, Q�1�, {Q�1TaQ}a���
A = ��, �, {Ta}a��� Q � Rn�n invertible
A Useful Correspondence
(�Q)(Q�1Tx1Q) · · · (Q�1TxtQ)(Q�1�) = �Tx1 · · · Txt�
AQ = ��Q, Q�1�, {Q�1TaQ}a���
A = ��, �, {Ta}a��� Q � Rn�n invertible
A Useful Correspondence
(�Q)(Q�1Tx1Q) · · · (Q�1TxtQ)(Q�1�) = �Tx1 · · · Txt�
AQ = ��Q, Q�1�, {Q�1TaQ}a���
A = ��, �, {Ta}a��� Q � Rn�n invertible
Theorem:
bijection
For any rational f : �� � R
Minimal WA Acomputing f
Rank factorizationsof the form Hf = P · S
[Balle, Carreras, Luque, Quattoni 2014]
SVD of Hankel Matrices
Hf = UDV⊤ =
⎡
⎢⎢⎣
......
u1 · · · un
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σn
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · v⊤n · · ·
⎤
⎥⎦
SVD of Hankel Matrices
Hf = UDV⊤ =
⎡
⎢⎢⎣
......
u1 · · · un
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σn
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · v⊤n · · ·
⎤
⎥⎦
Theorem:
Hf with f rational admitsan SVD iff �f�2 < �
[Balle, Panangaden, Precup 2015, 201?]
SVD of Hankel Matrices
Hf = UDV⊤ =
⎡
⎢⎢⎣
......
u1 · · · un
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σn
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · v⊤n · · ·
⎤
⎥⎦
Hf : �2 � �2
g �� Hfg
(Hfg)(x) =�
y���
f(xy)g(y)
Key Idea: see Hankel matrix as an operator in a Hilbert spaceTheorem:
Hf with f rational admitsan SVD iff �f�2 < �
[Balle, Panangaden, Precup 2015, 201?]
The Singular Value AutomatonFrom Correspondence Theorem:
If Hf admits an SVD Hf = UDV�, then there exists a WAA for f inducing P = UD1/2 and S = D1/2V�
[Balle, Panangaden, Precup 2015]
The Singular Value AutomatonFrom Correspondence Theorem:
If Hf admits an SVD Hf = UDV�, then there exists a WAA for f inducing P = UD1/2 and S = D1/2V�
singular value automaton
[Balle, Panangaden, Precup 2015]
The Singular Value AutomatonFrom Correspondence Theorem:
If Hf admits an SVD Hf = UDV�, then there exists a WAA for f inducing P = UD1/2 and S = D1/2V�
WA computing sing. vect. can be obtained from SVA:
Ui = ⟨α, ei/√σi, {Ta}⟩
Vi = ⟨ei/√σi,β, {Ta}⟩
fUi= ui
fVi= vi
A = ��, �, {Ta}a���
singular value automaton
[Balle, Panangaden, Precup 2015]
Consequences of SVA
1. For every f � �2Rat the SVA provides a “unique”canonical WA representation
2. The vector subspace �2Rat � R��is closed under
taking left/right singular vectors of Hankel matrices
Consequences of SVA
1. For every f � �2Rat the SVA provides a “unique”canonical WA representation
2. The vector subspace �2Rat � R��is closed under
taking left/right singular vectors of Hankel matrices
The Plan
1. Hankel Matrices and Singular Value Automata (SVA)
2. Algorithms for Computing SVA
3. WA Approximation via SVA Truncation
Two Important Matrices
Reachability Gramian: GP = P> · P
Observability Gramian: GS = S · S>
Two Important Matrices
Reachability Gramian: GP = P> · P
Observability Gramian: GS = S · S>
Theorem:Assuming A is minimal, GP and GS are well-defined iff �fA�2 < �
[Balle, Panangaden, Precup 2015, 201?]
Two Important Matrices
Reachability Gramian: GP = P> · P
Observability Gramian: GS = S · S>
P = UD1/2 ) GP = D
S = D1/2V> ) GS = D
For SVA
Two Important Matrices
Reachability Gramian: GP = P> · P
Observability Gramian: GS = S · S>
GQP = Q>GPQ
GQS = Q-1GSQ
->
AQFor
P = UD1/2 ) GP = D
S = D1/2V> ) GS = D
For SVA
Computing the SVAGiven minimal WA A computing f � �2Rat do:
1. Compute Gramians GP and GS
2. Diagonalize M = GSGP
(i.e. find Q s.t. Q�1MQ = D2)
3. Obtain SVA as AQ
Computing the SVA
note:Step 2 is a finite eigenvalue problem :-)
Step 1 involves products of infinite matrices :-o
Given minimal WA A computing f � �2Rat do:
1. Compute Gramians GP and GS
2. Diagonalize M = GSGP
(i.e. find Q s.t. Q�1MQ = D2)
3. Obtain SVA as AQ
Computing the Gramians
Fixed-point Equations for Gramians
GS = β · β⊤ +∑
a∈Σ
Ta ·GS · T⊤a
GP = α⊤ · α+∑
a∈Σ
T⊤a ·GP · Ta
Computing the Gramians
Closed-form Algorithm
Total time O(n6 + |�|n4)vec(GS) =
�
I ��
a��
Ta � Ta
��1
(� � �)
Fixed-point Equations for Gramians
GS = β · β⊤ +∑
a∈Σ
Ta ·GS · T⊤a
GP = α⊤ · α+∑
a∈Σ
T⊤a ·GP · Ta
Computing the Gramians
Closed-form Algorithm
Total time O(n6 + |�|n4)vec(GS) =
�
I ��
a��
Ta � Ta
��1
(� � �)
Fixed-point Equations for Gramians
GS = β · β⊤ +∑
a∈Σ
Ta ·GS · T⊤a
GP = α⊤ · α+∑
a∈Σ
T⊤a ·GP · Ta
note: there is also an iterative algorithm with O(|�|n2.4) flops per iteration
The Plan
1. Hankel Matrices and Singular Value Automata (SVA)
2. Algorithms for Computing SVA
3. WA Approximation via SVA Truncation
Approximation (Attempt 1)⎡
⎢⎢⎣
......
u1 · · · un
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σn
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · v⊤n · · ·
⎤
⎥⎦ ≈
⎡
⎢⎢⎣
......
u1 · · · uk
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σk
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · vk
⊤ · · ·
⎤
⎥⎦
Hf H
Approximation (Attempt 1)⎡
⎢⎢⎣
......
u1 · · · un
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σn
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · v⊤n · · ·
⎤
⎥⎦ ≈
⎡
⎢⎢⎣
......
u1 · · · uk
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σk
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · vk
⊤ · · ·
⎤
⎥⎦
Hf H best rank k approximation
Claim:
If f : �� � R is such that Hf = H, then rank(f) = k
and �f � f�2 � �Hf � H�op = �k+1
Approximation (Attempt 1)⎡
⎢⎢⎣
......
u1 · · · un
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σn
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · v⊤n · · ·
⎤
⎥⎦ ≈
⎡
⎢⎢⎣
......
u1 · · · uk
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σk
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · vk
⊤ · · ·
⎤
⎥⎦
Hf H best rank k approximation
Claim:
If f : �� � R is such that Hf = H, then rank(f) = k
and �f � f�2 � �Hf � H�op = �k+1
Approximation (Attempt 1)⎡
⎢⎢⎣
......
u1 · · · un
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σn
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · v⊤n · · ·
⎤
⎥⎦ ≈
⎡
⎢⎢⎣
......
u1 · · · uk
......
⎤
⎥⎥⎦
⎡
⎢⎣σ1
. . .σk
⎤
⎥⎦
⎡
⎢⎣· · · v⊤1 · · ·
...· · · vk
⊤ · · ·
⎤
⎥⎦
Hf H best rank k approximation
not Hankel!
Approximation (Attempt 2)
Hf =
⎡
⎢⎢⎣
...f...
⎤
⎥⎥⎦ =
⎡
⎢⎢⎣
......√
σ1u1 · · · √σnun
......
⎤
⎥⎥⎦
⎡
⎢⎣β1...
βn
⎤
⎥⎦
UD1/2 D1/2V⊤
Approximation (Attempt 2)
f =n∑
i=1
βi√σiuiHf =
⎡
⎢⎢⎣
...f...
⎤
⎥⎥⎦ =
⎡
⎢⎢⎣
......√
σ1u1 · · · √σnun
......
⎤
⎥⎥⎦
⎡
⎢⎣β1...
βn
⎤
⎥⎦
UD1/2 D1/2V⊤
Claim:
Taking f =�k
i=1 �i�
�iui we have a rational functionsum of k orthogonal rational functions and �f � f�2
2 =�ni=k+1 �2
i�i � �ni=k+1 �2
i
Approximation (Attempt 2)
f =n∑
i=1
βi√σiuiHf =
⎡
⎢⎢⎣
...f...
⎤
⎥⎥⎦ =
⎡
⎢⎢⎣
......√
σ1u1 · · · √σnun
......
⎤
⎥⎥⎦
⎡
⎢⎣β1...
βn
⎤
⎥⎦
UD1/2 D1/2V⊤
Claim:
Taking f =�k
i=1 �i�
�iui we have a rational functionsum of k orthogonal rational functions and �f � f�2
2 =�ni=k+1 �2
i�i � �ni=k+1 �2
i
Approximation (Attempt 2)
f =n∑
i=1
βi√σiui
has rank n, but want k!
Hf =
⎡
⎢⎢⎣
...f...
⎤
⎥⎥⎦ =
⎡
⎢⎢⎣
......√
σ1u1 · · · √σnun
......
⎤
⎥⎥⎦
⎡
⎢⎣β1...
βn
⎤
⎥⎦
UD1/2 D1/2V⊤
SVA Truncation
q1 q2
q3 q4
Keep
Remove
SVA Truncation
q1 q2
q3 q4
Keep
Remove
corresponding to k largest singular values
SVA Truncation
q1 q2
q3 q4
Keep
Remove
corresponding to k largest singular values
A� =
2
664
• • • •• • • •• • • •• • • •
3
775
keep to keep
remove to removeremove to keep
keep to remove
Tσ
Error in SVA TruncationOriginal Truncated
A� =
2
664
• • • •• • • •• • • •• • • •
3
775 A� =
�• •• •
�Tσ Tσ
A, f, n states A, f, k states
Error in SVA TruncationOriginal Truncated
A� =
2
664
• • • •• • • •• • • •• • • •
3
775 A� =
�• •• •
�Tσ Tσ
A, f, n states A, f, k states
Error Bounds:
�f � f�2 = �
��
�
Cf(�k+1 + · · · + �n)1/4
(�2k+1 + · · · + �2
n)1/2
[Balle, Panangaden, Precup 2015, 201?, Kulesza, Jiang, Singh 2015]
Conclusion
Conclusion
• Tools from functional analysis must play a role in a theory of automata approximation
Conclusion
• Tools from functional analysis must play a role in a theory of automata approximation
• Approximation theories can lead to useful algorithms for large-scale problems
Conclusion
• Tools from functional analysis must play a role in a theory of automata approximation
• Approximation theories can lead to useful algorithms for large-scale problems
• Change of basis in WA doesn’t change the function but yields automata with different properties (eg. for truncation)
Open Problems
Open Problems• How can this be extended to other types of automata, state
machines, co-algebras, etc?
Open Problems• How can this be extended to other types of automata, state
machines, co-algebras, etc?
• How important is it that WA are closed under reversal for the error analysis? [see Rabusseau, Balle, Cohen 2016]
Open Problems• How can this be extended to other types of automata, state
machines, co-algebras, etc?
• How important is it that WA are closed under reversal for the error analysis? [see Rabusseau, Balle, Cohen 2016]
• Can the AAK theorems in control theory be generalised to WA?
Open Problems• How can this be extended to other types of automata, state
machines, co-algebras, etc?
• How important is it that WA are closed under reversal for the error analysis? [see Rabusseau, Balle, Cohen 2016]
• Can the AAK theorems in control theory be generalised to WA?
• Can we obtain efficient approximation algorithms with extra constraints (eg. sparsity, positivity)?
(Co)-Algebraic and Analytical Aspects of Weighted Automata Minimisation and Equivalence
(Part 2)
Borja Balle
[CMCS Tutorial, Apr 2 2016]