EE 580 — Linear Control Systems

I. Models of Dynamical Systems

Department of Electrical Engineering

Pennsylvania State University

Fall 2010

© 2010 Ji-Woong Lee

1.1 Introduction

We are concerned with the analysis and synthesis of dynamical systems. Analysis of a dynamical system involves obtaining its mathematical model and then qualifying and quantifying its behavior by solving or simulating the mathematical model. On the other hand, synthesis of a dynamical system involves interconnecting several subsystems such that the overall system meets desired stability and performance properties.

The behavior of a dynamical system is described by either differential equations or difference equations. A key feature of such a system is that its current behavior depends not only on the current input to the system but also on the past inputs to, and the initial condition of, the system. For example, if a mathematical description of a system is given by a differential equation

$$\dot{y}(t) = u(t), \quad t \ge t_0,$$

or a difference equation

$$y(t + 1) = y(t) + u(t), \quad t = t_0, t_0 + 1, \dots,$$

where y(t) is the response of the system at time t, u(t) the input to the system at time t, and t_0 the initial time, then the solution to such an equation reads

$$y(t) = \int_{t_0}^{t} u(\tau)\, d\tau + y(t_0), \quad t \ge t_0,$$

or

$$y(t) = \sum_{\tau = t_0}^{t-1} u(\tau) + y(t_0), \quad t = t_0 + 1, t_0 + 2, \dots.$$

Clearly, the system response y(t) is determined by the past inputs u(τ), τ < t. On the other hand, a trivial example of dynamical systems would be y(t) = u(t)^2, where the system response y(t) does not depend on the past inputs. Such a system belongs to a special type of dynamical systems called static systems, which are of less interest.
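The dependence on past inputs can be checked numerically. The following is a minimal sketch, assuming a hypothetical input sequence and initial value (neither is from the notes): it iterates the difference equation y(t + 1) = y(t) + u(t) and compares the result with the closed-form sum.

```python
def simulate_accumulator(u, y0):
    """Iterate y(t+1) = y(t) + u(t) starting from y(t0) = y0."""
    y = [y0]
    for u_t in u:
        y.append(y[-1] + u_t)
    return y


# Hypothetical input samples u(t0), ..., u(t0 + 3) and initial value y(t0).
u = [1.0, -2.0, 0.5, 3.0]
y0 = 10.0

y = simulate_accumulator(u, y0)

# Closed-form solution: y(t) = sum of u(tau) for tau = t0, ..., t-1, plus y(t0).
closed_form = [y0 + sum(u[:k]) for k in range(len(u) + 1)]

print(y)
print(closed_form)
```

Each output sample depends on every earlier input sample and on y(t0), but not on the current or future inputs.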

To understand the concept of synthesis, consider the simple problem of charging a capacitor by a current source as depicted in Fig. 1.1. Given the initial time t_0 and final time t_f > t_0, our objective is to charge the capacitor such that v(t) = v_f for all t ≥ t_f no matter what the initial voltage v(t_0) across the capacitor is. This is a control problem, where the RC circuit is the plant (or controlled system), the current i(t) is the control input to the plant, and the capacitor voltage v(t) is the controlled output from the plant. Since C v̇(t) = i(t) for all t ≥ t_0, any control signal i satisfying

$$C\,\big(v_f - v(t_0)\big) = \int_{t_0}^{t_f} i(\tau)\, d\tau$$

and i(t) = 0 for t ≥ t_f will solve this control problem. In particular, the control signal i given by

$$i(t) = \begin{cases} C\,\dfrac{v_f - v(t_0)}{t_f - t_0} & \text{if } t \in [t_0, t_f); \\ 0 & \text{if } t \ge t_f \end{cases}$$

will do. This control scheme is an example of open-loop control, where the entire control signal i is determined at the initial time based on the initial condition of the plant. However, open-loop control is not practical. A slight error in the initial condition v(t_0) or a slight modeling error (e.g., error in the value of capacitance C) will result in v(t_f) ≠ v_f under open-loop control. Moreover, even if v(t_f) = v_f and i(t) = 0 for t ≥ t_f, a slight disturbance entering the circuit at some time t ≥ t_f will make v(t) deviate from the desired voltage level v_f and stay deviated.

[Figure 1.1: An RC circuit.]

A practical approach, on the other hand, is to employ closed-loop control (or feedback control), where the control input at each time instant is corrected based on sensor measurements. For example, if measurements of the capacitor voltage v(t) are available online, then a closed-loop control that solves our problem of charging the capacitor is given by

$$i(t) = \begin{cases} C\,\dfrac{v_f - v(t)}{t_f - t} & \text{if } t \in [t_0, t_f); \\ K_0\,\big(v_f - v(t)\big) & \text{if } t \ge t_f, \end{cases} \tag{1.1}$$

where K_0 is any positive constant. It is readily seen that this closed-loop control scheme is insensitive to the initial condition or modeling error. Moreover, even if the system response v(t) deviates from the desired value v_f due to a disturbance at some time t ≥ t_f, the voltage v(t) will return to v_f under this feedback control scheme. A feedback control scheme is implemented by interconnecting the plant with another dynamical system, called the controller; the output of the controller is the control input to the plant, and the measured output from the plant, as well as the reference input (i.e., the desired value for the controlled output of the plant), is the input to the controller. In the RC circuit example, the feedback control law in (1.1) defines the controller in Fig. 1.2, where v is the measured output from the plant (which happens to be the same as the controlled output) and v_f the reference input. This controller happens to be a static system represented by the time-varying gain

$$K(t) = \begin{cases} \dfrac{C}{t_f - t} & \text{if } t \in [t_0, t_f); \\ K_0 & \text{if } t \ge t_f. \end{cases}$$

[Figure 1.2: The closed-loop system formed by the RC circuit in Fig. 1.1 and the controller in (1.1).]
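The difference between the two schemes can be illustrated with a rough simulation. The sketch below is an assumption-laden illustration, not part of the notes: it integrates C v̇(t) = i(t) by forward Euler, uses hypothetical numerical values, and lets the controller use a wrong capacitance value `C_model` in place of the true `C_true`.

```python
def charge(closed_loop, C_true=1.0, C_model=1.2, v0=0.0, vf=5.0,
           t0=0.0, tf=1.0, K0=5.0, dt=1e-3, t_end=3.0):
    """Return v(t_end) for the capacitor-charging problem.

    All numerical values are hypothetical. The controller computes the
    current with C_model, while the plant evolves with C_true.
    """
    n_charge = round((tf - t0) / dt)   # number of steps in [t0, tf)
    n_total = round((t_end - t0) / dt)
    v = v0
    for k in range(n_total):
        t = t0 + k * dt
        if k < n_charge:
            if closed_loop:
                i = C_model * (vf - v) / (tf - t)     # feedback law (1.1)
            else:
                i = C_model * (vf - v0) / (tf - t0)   # open-loop law
        else:
            i = K0 * (vf - v) if closed_loop else 0.0
        v += dt * i / C_true   # forward Euler step of C dv/dt = i
    return v


print("open loop  :", charge(closed_loop=False))
print("closed loop:", charge(closed_loop=True))
```

With a 20% error in the capacitance, the open-loop scheme overshoots the target by the same 20%, whereas the closed-loop scheme still ends up very close to v_f.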

The RC circuit example also illustrates two objectives of feedback control. One objective is that of performance, where the closed-loop system (or feedback control system) shown in Fig. 1.2 is required to reach a target output voltage v_f at the terminal time t_f. It turns out that the feedback control given by (1.1) not only drives the capacitor voltage from an initial value v(t_0) to the final value v(t_f) = v_f but also minimizes the total heat energy dissipated by the resistor

$$J(i) = \int_{t_0}^{t_f} R\, i(\tau)^2\, d\tau$$

in doing so. In this sense, this controller achieves an optimal performance. (Design of optimal controllers is a topic of EE 581 and beyond the scope of EE 580.) The other control objective is that of stability, where the closed-loop system in Fig. 1.2 is required to maintain the output voltage at, or near, the reference value v(t) = v_f for all time t ≥ t_f subject to disturbances. Under the feedback control law (1.1) with K_0 > 0, we have that i(t) < 0 whenever v(t) > v_f and i(t) > 0 whenever v(t) < v_f, and so the stability objective is fulfilled. The larger the feedback gain K_0 is, the more stable the closed-loop system becomes, in the sense that the error decays faster: since C v̇(t) = K_0 (v_f − v(t)) for t ≥ t_f, in the absence of further disturbances the error v(t) − v_f decays exponentially with time constant C/K_0.

1.2 Notation

The set of real numbers is denoted by ℝ, and the set of integers by ℤ. Given positive integers m and n, the Euclidean n-space (i.e., the set of all n-tuples (x_1, ..., x_n) of real numbers x_1, ..., x_n ∈ ℝ) is denoted by ℝ^n, and the set of all linear functions that map ℝ^n into ℝ^m is denoted by ℝ^{m×n}. Elements of ℝ^n and ℝ^{m×n} are often represented by n-dimensional column vectors and m-by-n matrices, respectively: If

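$$\begin{aligned} y_1 &= a_{11}x_1 + \cdots + a_{1n}x_n, \\ &\;\;\vdots \\ y_m &= a_{m1}x_1 + \cdots + a_{mn}x_n, \end{aligned}$$

then we write

$$y = Ax,$$

where

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} y_1 & \cdots & y_m \end{bmatrix}^T \in \mathbb{R}^m, \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}^T \in \mathbb{R}^n,$$

and

$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{m1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{mn} \end{bmatrix}^T \in \mathbb{R}^{m \times n}.$$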
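As a small illustration of this notation, the sketch below uses NumPy with a hypothetical 2-by-3 matrix (the numbers are arbitrary) and checks that the matrix-vector product y = Ax agrees with the componentwise sums.

```python
import numpy as np

# Hypothetical example with m = 2 and n = 3.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])

# Matrix-vector product y = A x.
y = A @ x

# The same result written out as y_i = a_i1 x_1 + ... + a_in x_n.
m, n = A.shape
y_by_hand = np.array([sum(A[i, j] * x[j] for j in range(n)) for i in range(m)])

print(y, y_by_hand)
```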

More generally, if

$$\begin{aligned} y_1 &= f_1(x_1, \dots, x_n), \\ &\;\;\vdots \\ y_m &= f_m(x_1, \dots, x_n), \end{aligned}$$

where f_1, ..., f_m : ℝ^n → ℝ are given functions, then we write

$$y = f(x)$$

with y = [y_1 ··· y_m]^T, x = [x_1 ··· x_n]^T, and f = [f_1 ··· f_m]^T : ℝ^n → ℝ^m understood.

If x is an ℝ^n-valued signal (i.e., vector-valued function of time) in the continuous-time domain, then we write x : ℝ → ℝ^n. Similarly, if x is an ℝ^n-valued signal in the discrete-time domain, then x : ℤ → ℝ^n. We will often use a single symbol, say, x to denote both a vector (in which case x ∈ ℝ^n) and a signal (in which case x(t) ∈ ℝ^n for each t); which is meant should be clear from context. To emphasize that x is a function of time, however, we may say that x(·) is a signal, or that x(t), t ∈ ℝ, (or x(t), t ∈ ℤ) is a signal. Note that it is common, but incorrect, to say that x(t) is a signal; x(t) is the value of signal x at time t.

1.3 Mathematical Descriptions of Dynamical Systems

1.3.1 An Inverted Pendulum [1, Exercise 1.13]

An inverted pendulum consists of a pendulum of mass m mounted on a moving cart of mass M as shown in Fig. 1.3; the distance between the pivot and the center of gravity of the pendulum is L. The gravitational acceleration is g, the moment of inertia with respect to the center of gravity J, and the damping factor associated with the cart D. The equation of motion for the inverted pendulum subject to the external force µ(t), t ≥ t_0, applied to the cart is given by the second-order ordinary differential equations

$$(J + mL^2)\,\ddot{\theta}(t) = mgL \sin\theta(t) - mL\,\ddot{\rho}(t) \cos\theta(t) \tag{1.2a}$$

and

$$(M + m)\,\ddot{\rho}(t) + D\,\dot{\rho}(t) = \mu(t) + mL\,\dot{\theta}(t)^2 \sin\theta(t) - mL\,\ddot{\theta}(t) \cos\theta(t), \tag{1.2b}$$

where θ(t) denotes the angular displacement of the pendulum from the vertical at time t, and ρ(t) the displacement of the cart from the origin at time t. Given the initial values θ(t_0), θ̇(t_0), ρ(t_0), and ρ̇(t_0), and given the input signal µ(t), t ≥ t_0, solving (1.2) yields the output values θ(t) and ρ(t) for all t ≥ t_0.

[Figure 1.3: An inverted pendulum.]

1.3.2 Input-Output Representation of Dynamical Systems

Suppose that we have a dynamical system that generates an output signal y(t), t ≥ t_0, according to an input signal u(t), t ≥ t_0, as depicted in Fig. 1.4, where t_0 is the initial time at which the system begins to operate. An example is the inverted pendulum described in Section 1.3.1: With zero initial values, there exist functions h_1 and h_2 such that

$$\theta(t) = h_1(\mu(s),\ t_0 \le s \le t), \qquad \rho(t) = h_2(\mu(s),\ t_0 \le s \le t)$$

for all t ≥ t_0.

[Figure 1.4: A dynamical system.]




In general, if a system of ordinary differential equations has m input variables u_1(t), ..., u_m(t) and l output variables y_1(t), ..., y_l(t), then, with “zero initial values,” there is a function h such that

$$y(t) = h(t, u(s),\ t_0 \le s \le t), \tag{1.3}$$

where u(s) is the m-tuple (u_1(s), ..., u_m(s)) of input variables, and y(t) the l-tuple (y_1(t), ..., y_l(t)) of output variables. Here, the output cannot be related to the input via y(t) = h(u(t)) for any (memoryless) function h : ℝ^m → ℝ^l. A dynamical system has an internal memory; that is, the output y(t) for each time instant t is a function of the input signal u(s), s ≥ t_0. For instance, the inverted pendulum in Section 1.3.1 is a dynamical system with one input (i.e., m = 1) and two outputs (i.e., l = 2). Denoting the entire signals u(t), t ≥ t_0, and y(t), t ≥ t_0, by u and y, respectively, the input-output representation (1.3) is often written

$$y = H(u) \quad \text{or} \quad y = Hu \tag{1.4}$$

where H is an operator (i.e., signal-valued function of signals) defined by a function h as in (1.3). Note that, according to (1.3), the output at time t does not depend on the input u(s) for any s > t; that is, the present output is determined solely by the past and present inputs. A dynamical system possessing this property is said to be causal.

1.3.3 State-Space Representation of Dynamical Systems

Consider the inverted pendulum described in Section 1.3.1. Let x_1(t) = θ(t), x_2(t) = θ̇(t), x_3(t) = ρ(t), and x_4(t) = ρ̇(t); let u(t) = µ(t), y_1(t) = θ(t), and y_2(t) = ρ(t). Then it is readily seen that there exist functions f_1, ..., f_4 such that

$$\dot{x}_i(t) = f_i(x_1(t), x_2(t), x_3(t), x_4(t), u(t)), \quad i = 1, \dots, 4, \tag{1.5}$$

for all t ≥ t_0, and functions g_1 and g_2 such that

$$y_i(t) = g_i(x_1(t), x_2(t), x_3(t), x_4(t), u(t)), \quad i = 1, 2, \tag{1.6}$$

for all t ≥ t_0; in this particular case, we have ẋ_1(t) = x_2(t), ẋ_3(t) = x_4(t), y_1(t) = x_1(t), and y_2(t) = x_3(t). Given initial values x_1(t_0), ..., x_4(t_0), and given an input signal u(t), t ≥ t_0, equations (1.5) and (1.6) define the output variables y_1(t) and y_2(t) for all t ≥ t_0.
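One way to obtain the functions f_1, ..., f_4 explicitly is to note that (1.2a) and (1.2b) are linear in the two accelerations, so they can be solved as a 2-by-2 linear system at each state. The sketch below does this numerically; the function names `pendulum_f` and `pendulum_g` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def pendulum_f(x, u, m=0.1, M=1.0, L=0.5, J=0.01, D=0.1, g=9.81):
    """State derivative for x = [theta, theta_dot, rho, rho_dot].

    Parameter values are hypothetical. Equations (1.2a)-(1.2b) are
    rearranged into a linear system in (theta_ddot, rho_ddot).
    """
    theta, theta_dot, rho, rho_dot = x
    s, c = np.sin(theta), np.cos(theta)
    # Coefficients of the accelerations on the left-hand side.
    lhs = np.array([[J + m * L**2, m * L * c],
                    [m * L * c, M + m]])
    # Remaining terms collected on the right-hand side.
    rhs = np.array([m * g * L * s,
                    u - D * rho_dot + m * L * theta_dot**2 * s])
    theta_ddot, rho_ddot = np.linalg.solve(lhs, rhs)
    return np.array([theta_dot, theta_ddot, rho_dot, rho_ddot])


def pendulum_g(x, u):
    """Output equation: y1 = x1 (angle), y2 = x3 (cart position)."""
    return np.array([x[0], x[2]])


# The upright rest position with zero force is an equilibrium.
print(pendulum_f(np.zeros(4), 0.0))
# A small tilt produces an angular acceleration away from the vertical.
print(pendulum_f(np.array([0.01, 0.0, 0.0, 0.0]), 0.0))
```

The first and third components of the returned vector are simply x_2 and x_4, matching the relations ẋ_1(t) = x_2(t) and ẋ_3(t) = x_4(t) stated above.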

In general, a system of high-order ordinary differential equations can be reduced to a system of first-order ordinary differential equations: If the former has n initial values x_1(t_0), ..., x_n(t_0), m input variables u_1(t), ..., u_m(t), and l output variables y_1(t), ..., y_l(t), then there exist functions f : ℝ × ℝ^n × ℝ^m → ℝ^n and g : ℝ × ℝ^n × ℝ^m → ℝ^l such that the latter is of the form

$$\dot{x}(t) = f(t, x(t), u(t)), \tag{1.7a}$$

$$y(t) = g(t, x(t), u(t)), \tag{1.7b}$$

where x(t) = [x_1(t) ··· x_n(t)]^T, ẋ(t) = [ẋ_1(t) ··· ẋ_n(t)]^T, u(t) = [u_1(t) ··· u_m(t)]^T, and y(t) = [y_1(t) ··· y_l(t)]^T for all t ≥ t_0. Equations (1.7) comprise the state-space representation (or state model) of a dynamical system that can be described by a system of ordinary differential equations; in particular, equation (1.7a) is called the state equation, and (1.7b) the output equation. The vectors x(t), u(t), and y(t) are called the state, input, and output, respectively.

The number n of state variables is called the order of the system (1.7). For instance, the inverted pendulum described by two second-order ordinary differential equations results in a fourth-order system represented by (1.5) and (1.6). Since n, m, and l are all finite, the system (1.7) is often said to be finite-dimensional. In particular, since m, l ≥ 1, the system is called a multiple-input multiple-output (MIMO) system; the special case of m = l = 1 is called a single-input single-output (SISO) system. If the function f, called the vector field, is independent of u (i.e., if there exists a function f̃ such that f(t, x, u) = f̃(t, x) for all t ∈ ℝ, x ∈ ℝ^n, and u ∈ ℝ^m), then the first-order ordinary differential equation (1.7a) is said to be homogeneous (or unforced); otherwise, it is called inhomogeneous (or forced).

1.3.4 Importance of State Concept

According to J. C. Willems [2], “the point of view which most control theorists adhere to is that, while actual systems are most naturally viewed in an input/output setting, when it comes to computations state models have superior properties.” One of the most important among such properties is what we call “finite-memory property.” Consider the state-space representation (1.7) of a dynamical system. The output equation (1.7b) describes the output y(t) in terms of the state x(t) (as well as the input u(t)) via a “memoryless” function g. This means that the “internal memory” of a dynamical system at time t is absorbed in the state x(t) at time t. Moreover, according to the state equation (1.7a), the number of state variables remains finite and constant throughout the operation of the plant; that is, the “size” of the internal memory of a finite-dimensional dynamical system remains finite and constant.

The finite-memory property of the state-space model is particularly useful in the context of feedback controller design. At each time instant t, a feedback controller generates the input u(t) based on the information available to it up to time t; if the output signal is measured online, then this information consists of y(s) for all s ∈ [t_0, t] and u(s) for all s ∈ [t_0, t). However, it is well-known that for many control objectives this information can be replaced with x(t) whenever the state is perfectly observed at time t (i.e., whenever y(t) = x(t)). Another advantage of state-space methods is that, unlike input-output models, state-space models provide a full description of the internal behavior of dynamical systems. For this reason, state-space models are called internal descriptions of dynamical systems. On the other hand, input-output models describe the system behavior under zero initial conditions and are not necessarily able to capture the evolution of internal state variables that are not controlled or observed. Thus input-output models are called external descriptions of dynamical systems. We will study these aspects of state-space approaches in more detail during this course.

1.4 Classification of Dynamical Systems

1.4.1 A Target Maneuver Model for Tracking Radar [3, Example V.B.1]

Let us consider a target maneuver model that tracking radar systems often use. With a sampling period T, a digital computer samples the position of the moving target as a point in two-dimensional Cartesian coordinates. Then the target position at sampling instant tT is given by the pair (ρ_1(t), ρ_2(t)) for all t = t_0, t_0 + 1, ..., where t_0 ∈ ℤ defines the initial sampling instant t_0 T. If the acceleration of the target at sampling instant tT is (µ_1(t), µ_2(t)), then Newton’s law says that the velocity (ν_1(t), ν_2(t)) of the target at sampling instant tT approximately satisfies

$$\rho_i(t + 1) = \rho_i(t) + T\,\nu_i(t),$$

$$\nu_i(t + 1) = \nu_i(t) + T\,\mu_i(t)$$



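for all t = t_0, t_0 + 1, .... This leads to the state equation

$$\begin{bmatrix} x_1(t+1) \\ x_2(t+1) \\ x_3(t+1) \\ x_4(t+1) \end{bmatrix} = \begin{bmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \\ x_4(t) \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ T & 0 \\ 0 & T \end{bmatrix} \begin{bmatrix} u_1(t) \\ u_2(t) \end{bmatrix} \tag{1.8a}$$

and output equation

$$\begin{bmatrix} y_1(t) \\ y_2(t) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \\ x_4(t) \end{bmatrix} \tag{1.8b}$$

where x_1(t) = ρ_1(t), x_2(t) = ρ_2(t), x_3(t) = ν_1(t), x_4(t) = ν_2(t), u_1(t) = µ_1(t), u_2(t) = µ_2(t), y_1(t) = ρ_1(t), and y_2(t) = ρ_2(t) for all t = t_0, t_0 + 1, .... (In a target model like this, the unknown acceleration inputs µ_1(t) and µ_2(t) are often considered as “noise.”)

Just like other dynamical systems, this target maneuver model has many different state-space representations. Let x_1(t) and x_2(t) be as above, but put x_3(t) = ρ_1(t − 1) − T^2 µ_1(t − 1) and x_4(t) = ρ_2(t − 1) − T^2 µ_2(t − 1). Then the state equation now reads

$$\begin{bmatrix} x_1(t+1) \\ x_2(t+1) \\ x_3(t+1) \\ x_4(t+1) \end{bmatrix} = \begin{bmatrix} 2 & 0 & -1 & 0 \\ 0 & 2 & 0 & -1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \\ x_4(t) \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ -T^2 & 0 \\ 0 & -T^2 \end{bmatrix} \begin{bmatrix} u_1(t) \\ u_2(t) \end{bmatrix}$$

for all t = t_0, t_0 + 1, ....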
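That the two state equations describe the same input-output behavior can be checked numerically. The sketch below is an illustration under stated assumptions: the sampling period, initial position and velocity, and the random acceleration sequence are all hypothetical, and the second representation is assumed to share the output equation (1.8b). Its initial state uses x_3(t_0) = ρ_1(t_0) − T ν_1(t_0), which follows from the definition of x_3 together with the difference equations for ρ_1 and ν_1 (and likewise for x_4).

```python
import numpy as np

T = 0.5  # hypothetical sampling period

# First representation (1.8a): state = (rho_1, rho_2, nu_1, nu_2).
A1 = np.array([[1, 0, T, 0], [0, 1, 0, T], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
B1 = np.array([[0, 0], [0, 0], [T, 0], [0, T]], dtype=float)

# Second representation: x3, x4 are shifted positions minus T^2 * acceleration.
A2 = np.array([[2, 0, -1, 0], [0, 2, 0, -1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
B2 = np.array([[0, 0], [0, 0], [-T**2, 0], [0, -T**2]], dtype=float)

# Output equation (1.8b), assumed common to both representations.
C = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)


def simulate(A, B, C, x0, u):
    """Return the outputs y(t0), y(t0+1), ... of x(t+1) = A x + B u, y = C x."""
    x, ys = np.asarray(x0, dtype=float), []
    for u_t in u:
        ys.append(C @ x)
        x = A @ x + B @ u_t
    return np.array(ys)


rng = np.random.default_rng(0)
u = rng.normal(size=(20, 2))          # hypothetical acceleration samples
rho0 = np.array([1.0, -2.0])          # hypothetical initial position
nu0 = np.array([0.3, 0.1])            # hypothetical initial velocity

y_first = simulate(A1, B1, C, np.concatenate([rho0, nu0]), u)
y_second = simulate(A2, B2, C, np.concatenate([rho0, rho0 - T * nu0]), u)

print(np.max(np.abs(y_first - y_second)))
```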

1.4.2 Continuous-Time and Discrete-Time Systems

A dynamical system represented by ordinary differential equations as in (1.7) is called a continuous-time system. For example, the inverted pendulum model described in Section 1.3.1 leads to a continuous-time system. On the other hand, a discrete-time system has the state-space representation consisting of first-order difference equations of the form

$$x(t + 1) = f(t, x(t), u(t)), \tag{1.9a}$$

$$y(t) = g(t, x(t), u(t)) \tag{1.9b}$$

for t = t_0, t_0 + 1, ..., where t_0 ∈ ℤ is the initial time. If u(t) and y(t) are the m-tuple of input variables and l-tuple of output variables, respectively, then the input-output representation of a discrete-time system is written as in (1.3); denoting the entire sequences u(t_0), u(t_0 + 1), ..., and y(t_0), y(t_0 + 1), ..., by u and y, respectively, we may write as in (1.4), where H is an operator defined by h. Examples of discrete-time systems include the state-space representations of the target maneuver model described in Section 1.4.1.

1.4.3 Time-Varying and Time-Invariant Systems

The system of ordinary differential equations (1.7) or that of difference equations (1.9) is said to be time-invariant (or autonomous) if the functions f and g are independent of t (i.e., if there exist f̃ and g̃ such that f(t, x, u) = f̃(x, u) and g(t, x, u) = g̃(x, u) for all t ∈ ℝ, x ∈ ℝ^n, and u ∈ ℝ^m). Time-invariance is characterized by the shift-invariance of the input-output representation (under zero initial conditions); that is, a time shift in the input signal leads to the same time shift in the output signal as depicted in Fig. 1.5. The inverted pendulum model in Section 1.3.1 and the target maneuver model in Section 1.4.1 both lead to time-invariant systems. A dynamical system which is not time-invariant is called time-varying. An example of time-varying systems is the closed-loop system in Fig. 1.2; since the controller gain K(t) is a function of time, the overall system is time-varying.

[Figure 1.5: A time-invariant system H.]
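Shift-invariance can be tested directly on a scalar discrete-time example. The sketch below is illustrative only: the two update rules `f_invariant` and `f_varying` and the input samples are hypothetical. It applies the same input sequence starting at two different initial times and compares the zero-initial-state responses.

```python
def respond(f, u, t0):
    """Zero-initial-state response of x(t+1) = f(t, x(t), u(t)), y(t) = x(t)."""
    x, y = 0.0, []
    for k, u_t in enumerate(u):
        y.append(x)
        x = f(t0 + k, x, u_t)
    return y


# Hypothetical update rules: the first ignores t, the second does not.
def f_invariant(t, x, u):
    return 0.5 * x + u


def f_varying(t, x, u):
    return 0.5 * x / (1 + t) + u


u = [1.0, 0.0, 2.0, -1.0]  # hypothetical input samples

# Same input applied from t0 = 0 and from t0 = 3 (a time shift of 3).
print(respond(f_invariant, u, 0), respond(f_invariant, u, 3))
print(respond(f_varying, u, 0), respond(f_varying, u, 3))
```

For the time-invariant rule the shifted input produces the identically shifted output, while for the time-varying rule the two responses differ.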

1.4.4 Linear and Nonlinear Systems

A continuous-time system (1.7) or a discrete-time system (1.9) is said to be linear if the functions f and g are of the form

$$f(t, x, u) = A(t)x + B(t)u,$$

$$g(t, x, u) = C(t)x + D(t)u$$

for all t ∈ ℝ, x ∈ ℝ^n, and u ∈ ℝ^m, where A(t) ∈ ℝ^{n×n}, B(t) ∈ ℝ^{n×m}, C(t) ∈ ℝ^{l×n}, and D(t) ∈ ℝ^{l×m} are matrices determined by f(t, ·, ·) and g(t, ·, ·). In particular, a linear time-invariant (LTI) system has

$$f(t, x, u) = Ax + Bu,$$

$$g(t, x, u) = Cx + Du$$

for all t ∈ ℝ, x ∈ ℝ^n, and u ∈ ℝ^m, where A, B, C, and D are constant matrices. If a system is linear but not time-invariant, then it is called a linear time-varying (LTV) system. If a system is not linear, it is called nonlinear. The inverted pendulum example in Section 1.3.1 leads to a nonlinear system. On the other hand, the target models in Section 1.4.1 are LTI systems, and the overall system in Fig. 1.2, where the reference input v_f is considered an input to the system, is an LTV system.

In general, linearity is characterized by the superposition principle, which says that a dynamical system is linear if and only if its input-output representation H as in (1.4) satisfies the following property under zero initial conditions: Whenever y = Hu and z = Hv, we have

$$\alpha y + \beta z = H(\alpha u + \beta v)$$

for all α, β ∈ ℝ.
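The superposition property is easy to verify numerically for a discrete-time LTI system. The following sketch assumes hypothetical matrices A, B, C, D and random input sequences; `H` computes the zero-initial-state response.

```python
import numpy as np

# Hypothetical LTI system with n = 2 states, m = 1 input, l = 1 output.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.5]])


def H(u):
    """Zero-initial-state response of x(t+1) = A x + B u, y = C x + D u."""
    x, y = np.zeros(2), []
    for u_t in u:
        y.append(C @ x + D @ u_t)
        x = A @ x + B @ u_t
    return np.array(y)


rng = np.random.default_rng(1)
u = rng.normal(size=(30, 1))
v = rng.normal(size=(30, 1))
alpha, beta = 2.0, -3.0

lhs = alpha * H(u) + beta * H(v)
rhs = H(alpha * u + beta * v)
print(np.max(np.abs(lhs - rhs)))
```

The two sides agree up to floating-point rounding; starting from a nonzero initial state would break the equality, which is why the principle is stated under zero initial conditions.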

References

[1] P. J. Antsaklis and A. N. Michel, A Linear Systems Primer. Boston, MA: Birkhäuser, 2007.

[2] J. C. Willems, “The concept of a dynamical system in control,” in Proceedings of the 33rd IEEE Conference on Decision and Control, vol. 3, 1994, p. 2099.

[3] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed. New York, NY: Springer, 1994.




II. Solution and Linearization of State-Space Models




2.1 Notation and Definitions

The Euclidean (vector) norm ‖x‖ of x ∈ ℝ^n is defined by

$$\|x\| = \sqrt{x^T x} = \Big( \sum_{i=1}^{n} x_i^2 \Big)^{1/2}.$$

The following properties of a vector norm can be easily checked: ‖x‖ = 0 if and only if x = 0 ∈ ℝ^n; ‖αx‖ = |α|‖x‖ for all x ∈ ℝ^n and α ∈ ℝ; and the triangle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖ holds for x, y ∈ ℝ^n. Moreover, the Cauchy-Schwarz inequality says that |x^T y| ≤ ‖x‖‖y‖ for all x, y ∈ ℝ^n [1, Lemma 5.4.13 & Eq. (5.4.15)]. This vector norm ‖·‖ on ℝ^n induces a matrix norm on ℝ^{m×n}: The spectral norm (or operator norm) of A ∈ ℝ^{m×n} is defined by

$$\|A\| = \max\{\|Ax\| : \|x\| = 1\} = \max\{\|Ax\|/\|x\| : x \ne 0\}.$$

For example, if

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix},$$

then ‖A‖ = 2 because, over all x = [x_1 x_2]^T ∈ ℝ^2 with ‖x‖ = (x_1^2 + x_2^2)^{1/2} = 1, the norm ‖Ax‖ = (x_1^2 + 4x_2^2)^{1/2} is maximized by x = [0 1]^T. Although we will use the same symbol ‖·‖ to denote both the vector norm and matrix norm, it should be clear from context which is meant. Similarly to the Euclidean vector norm, the spectral norm satisfies the following properties:

‖A‖ = 0 if and only if A = 0 ∈ ℝ^{m×n}; ‖αA‖ = |α|‖A‖ for all A ∈ ℝ^{m×n} and α ∈ ℝ; and ‖A + B‖ ≤ ‖A‖ + ‖B‖ for A, B ∈ ℝ^{m×n}. Moreover, the spectral norm is submultiplicative; that is, ‖BA‖ ≤ ‖B‖‖A‖ and ‖Ax‖ ≤ ‖A‖‖x‖ for all B ∈ ℝ^{l×m}, A ∈ ℝ^{m×n}, and x ∈ ℝ^n [1, Theorem 5.6.2]. The identity matrix I (i.e., the matrix representation of the identity map) has unity spectral norm: ‖I‖ = 1.

For r > 0 and x_0 ∈ ℝ^n, the set B(r, x_0) = {x ∈ ℝ^n : ‖x − x_0‖ < r} is called an open ball centered at x_0 and of radius r. If S ⊂ ℝ^n, a point x ∈ S is said to be an interior point of S whenever there exists some r > 0 such that B(r, x) ⊂ S; otherwise x is said to be a boundary point of S. The set of all interior points (resp. boundary points) of S is called the interior (resp. boundary) of S. The set S is said to be open if the interior of S is S itself; S is said to be closed if its complement is open. That is, an open set does not contain any of its boundary points, and a closed set contains all its boundary points. Simplest examples include open intervals (a, b), closed intervals [a, b], and intervals like (a, b] and [a, b) that are neither open nor closed. By convention, the empty set ∅ and the entire set ℝ^n are both open and closed. A set S ⊂ ℝ^n is said to be bounded if there exists some r > 0 such that S ⊂ B(r, 0) (or equivalently, ‖x‖ < r for all x ∈ S); if S is closed and bounded, then S is said to be compact.

A sequence (x_1, x_2, ...) of vectors x_k ∈ ℝ^n is said to converge to a vector x ∈ ℝ^n, which is called the limit of the sequence (x_k), if for any given ε > 0 there exists a nonnegative integer K such that ‖x_k − x‖ < ε for all k ≥ K; in this case, we write either x_k → x or lim_{k→∞} x_k = x. For example, while the sequence (x_k) with x_k = k/(1 + k) is convergent and has x_k → 1, the sequence with x_k = 1 + (−1)^k is not convergent. A sequence (f_1, f_2, ...) of functions f_k : D → ℝ^m is said to converge (pointwise) to a function f : D → ℝ^m if the sequence (f_1(x), f_2(x), ...) converges to f(x) for all x ∈ D. The sequence (f_k) is said to converge uniformly to a function f if for every ε > 0 there exists a nonnegative integer K such that ‖f_k(x) − f(x)‖ < ε whenever k ≥ K and x ∈ D; note that K does not depend on x in the case of uniform convergence. Geometrically, uniform convergence means that when k ≥ K the entire graph of f_k must lie within the strip bounded by the graphs of f − ε and f + ε. For example, let f_k, g_k, f : (0, 1) → ℝ be defined by f_k(x) = max{−x + 1/k, 0}, g_k(x) = max{−kx + 1, 0}, and f(x) = 0 for x ∈ (0, 1); while f_k → f uniformly (since 0 ≤ f_k(x) < 1/k for all x), we have g_k → f only pointwise (since g_k(x) comes arbitrarily close to 1 as x → 0 for every k).
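The difference between the two modes of convergence shows up in the largest deviation from the limit over the whole domain. The sketch below approximates that supremum on a fine grid of (0, 1); the grid itself and the chosen values of k are assumptions made for illustration.

```python
import numpy as np

# Fine grid approximating the open interval (0, 1).
x = np.linspace(1e-6, 1 - 1e-6, 200001)


def f_k(k, x):
    return np.maximum(-x + 1.0 / k, 0.0)


def g_k(k, x):
    return np.maximum(-k * x + 1.0, 0.0)


# The limit is f(x) = 0, so the deviation is just the function value.
for k in (1, 10, 100, 1000):
    print(k, f_k(k, x).max(), g_k(k, x).max())
```

The largest value of f_k shrinks like 1/k, whereas the largest value of g_k stays near 1 for every k, even though g_k(x) → 0 at each fixed x.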

A function f : D → ℝ^m, where D ⊂ ℝ^n, is said to be continuous at the point x_0 ∈ D if for every ε > 0 there exists a δ > 0 such that ‖f(x) − f(x_0)‖ < ε whenever ‖x − x_0‖ < δ and x ∈ D. If the function f is continuous at x_0, then we write lim_{x→x_0} f(x) = f(x_0). The function f is said to be continuous (on D) if it is continuous at all x_0 ∈ D. If for every ε > 0 there exists a δ > 0 such that ‖f(x) − f(x_0)‖ < ε whenever ‖x − x_0‖ < δ and x, x_0 ∈ D, then f is said to be uniformly continuous on D; note that δ does not depend on x or x_0 in the case of uniform continuity. For example, if f(x) = 2x for x ∈ ℝ^n, then f is uniformly continuous on ℝ^n because, for any ε > 0, we have ‖f(x) − f(x_0)‖ < ε whenever ‖x − x_0‖ < ε/2; on the other hand, if f(x) = ‖x‖^2, then f is continuous on ℝ^n but not uniformly so because, for any δ > 0, choosing an x_0 with ‖x_0‖ = 1/δ and setting x = (1 + δ^2/2)x_0 gives ‖x − x_0‖ = δ/2 < δ but

$$|f(x) - f(x_0)| = |x^T x - x_0^T x_0| = \big|(x + x_0)^T (x - x_0)\big| = \big|\big((2 + \delta^2/2)\,x_0\big)^T (\delta^2/2)\,x_0\big| > 1;$$

continuity of f follows from the fact that, for any ε > 0 and x_0 ∈ ℝ^n, inequality ‖x − x_0‖ < (ε + ‖x_0‖^2)^{1/2} − ‖x_0‖ implies |f(x) − f(x_0)| = |(x + x_0)^T (x − x_0)| ≤ ‖x + x_0‖‖x − x_0‖ < ε. It is a fact that, if f : D → ℝ^m is continuous on D and if D is compact, then f is uniformly continuous on D [2, Theorem 2.44] and bounded on D (i.e., there is some M > 0 such that ‖f(x)‖ ≤ M for all x ∈ D) [2, Corollary 2.40].

The set of every continuous function that maps D into ℝ^m is denoted by C(D, ℝ^m). For any positive integer k, we define C^k(D, ℝ^m) as the set of every function f ∈ C(D, ℝ^m) such that

$$\frac{\partial^j f_l}{\partial x_1^{i_1} \cdots \partial x_n^{i_n}} \in C(D, \mathbb{R})$$

holds for i_1 + ··· + i_n = j, for j = 1, ..., k, and for l = 1, ..., m. If f ∈ C^k(D, ℝ^m), then f is called a C^k-function. In particular, a C^1-function is said to be continuously differentiable, meaning that it is differentiable and its derivative is continuous. The Jacobian matrix of a C^1-function f : D → ℝ^m at x_0 ∈ D, where D ⊂ ℝ^n, is defined by

$$\frac{\partial f}{\partial x}(x_0) = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1}(x_0) & \cdots & \dfrac{\partial f_1}{\partial x_n}(x_0) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1}(x_0) & \cdots & \dfrac{\partial f_m}{\partial x_n}(x_0) \end{bmatrix} \in \mathbb{R}^{m \times n}.$$
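When partial derivatives are tedious to compute by hand, the Jacobian matrix can be approximated column by column with central differences. The sketch below is one possible approach, not a method prescribed by the notes; the helper `numerical_jacobian`, the step size h, and the test function f are all hypothetical.

```python
import numpy as np


def numerical_jacobian(f, x0, h=1e-6):
    """Approximate the m-by-n Jacobian of f at x0 by central differences."""
    x0 = np.asarray(x0, dtype=float)
    m = np.asarray(f(x0)).size
    jac = np.zeros((m, x0.size))
    for j in range(x0.size):
        e = np.zeros_like(x0)
        e[j] = h
        # Column j holds the partial derivatives with respect to x_j.
        jac[:, j] = (np.asarray(f(x0 + e)) - np.asarray(f(x0 - e))) / (2 * h)
    return jac


# Hypothetical C^1 function from R^2 to R^2.
def f(x):
    return np.array([x[0] ** 2 * x[1], np.sin(x[0]) + x[1]])


x0 = np.array([1.0, 2.0])
J_num = numerical_jacobian(f, x0)

# Jacobian computed by hand for comparison.
J_exact = np.array([[2 * x0[0] * x0[1], x0[0] ** 2],
                    [np.cos(x0[0]), 1.0]])
print(J_num)
print(J_exact)
```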

2.2 Solutions of State Equations [3, Section 1.5][4, Section 1.10]

2.2.1 Existence of Solutions

Given a function f : D → ℝ^n, where D is an open, nonempty, and connected subset of ℝ × ℝ^n, and given initial data (t_0, x_0) ∈ D, consider the homogeneous state equation

$$\dot{x}(t) = f(t, x(t)) \tag{2.1a}$$

subject to

$$x(t_0) = x_0; \tag{2.1b}$$

equivalently, consider the integral equation

$$x(t) = x_0 + \int_{t_0}^{t} f(s, x(s))\, ds,$$

where

$$\int_{t_0}^{t} f(s, x(s))\, ds = \begin{bmatrix} \int_{t_0}^{t} f_1(s, x(s))\, ds \\ \vdots \\ \int_{t_0}^{t} f_n(s, x(s))\, ds \end{bmatrix} = \begin{bmatrix} \int_{t_0}^{t} f_1(s, x_1(s), \dots, x_n(s))\, ds \\ \vdots \\ \int_{t_0}^{t} f_n(s, x_1(s), \dots, x_n(s))\, ds \end{bmatrix}.$$

A function φ : J → ℝ^n, where J is an open interval in ℝ, is said to be a solution of the initial-value problem (2.1) on the interval J if it satisfies the following conditions:

• φ ∈ C^1(J, ℝ^n);

• (t, φ(t)) ∈ D for all t ∈ J;

• φ̇(t) = f(t, φ(t)) for all t ∈ J;

• φ(t_0) = x_0.

The following result shows that the continuity of f is sufficient for the existence of a solution to the initial-value problem. Moreover, it shows that one can solve the initial-value problem both forward and backward in time.

Theorem 2.1 If f ∈ C(D, ℝ^n) and (t_0, x_0) ∈ D, then the initial-value problem (2.1) has at least one solution on the interval (t_0 − c, t_0 + c) for some c > 0.

2.2.2 Uniqueness of Solutions

For example, consider the initial-value problem

$$\dot{x}(t) = x(t)^{1/3}, \quad x(0) = 0. \tag{2.2}$$

Applying Theorem 2.1 to f(t, x) = x^{1/3} yields that there exists a solution to this problem. Indeed, a solution is given by

$$\phi(t) = \begin{cases} 0 & \text{if } t < 0; \\ (2t/3)^{3/2} & \text{if } t \ge 0. \end{cases} \tag{2.3a}$$

However, it is not unique since

$$\phi(t) = 0, \quad t \in \mathbb{R}, \tag{2.3b}$$

is also a solution. This example shows that continuity of f is not enough for the initial-value problem (2.1) to have a unique solution.

A function f : D → ℝ^n is said to satisfy a Lipschitz condition (or to be Lipschitz continuous in the second argument) if, for every compact subset K of D, there exists a constant L_K > 0 (called a Lipschitz constant with respect to the second argument) such that

$$\|f(t, x) - f(t, y)\| \le L_K \|x - y\|$$

for all (t, x), (t, y) ∈ K. Lipschitz continuity is a stronger condition than continuity, and is sufficient for the uniqueness of solutions to initial-value problems. We again assume that the domain D ⊂ ℝ × ℝ^n of the vector field f is open, nonempty, and connected.




Theorem 2.2 If f ∈ C(D, ℝ^n) and (t_0, x_0) ∈ D, and if f satisfies a Lipschitz condition, then the initial-value problem (2.1) has at most one solution on the interval (t_0 − c, t_0 + c) for any c > 0.
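Returning to the example (2.2), both candidate solutions (2.3a) and (2.3b) can be checked against the differential equation numerically, and the failure of the Lipschitz condition near x = 0 can be seen from the difference quotient of x^{1/3}. The sketch below is a rough numerical check; the time grid, the finite-difference step, and the helper names are assumptions.

```python
import numpy as np


def phi_a(t):
    """Solution (2.3a): zero for t < 0 and (2t/3)^(3/2) for t >= 0."""
    return np.where(t < 0, 0.0, (2 * np.maximum(t, 0.0) / 3) ** 1.5)


def phi_b(t):
    """Solution (2.3b): identically zero."""
    return np.zeros_like(t)


def residual(phi, t, h=1e-6):
    """Central-difference estimate of phi'(t) minus phi(t)^(1/3)."""
    dphi = (phi(t + h) - phi(t - h)) / (2 * h)
    return dphi - np.cbrt(phi(t))


t = np.linspace(-1.0, 2.0, 301)
res_a = np.max(np.abs(residual(phi_a, t)))
res_b = np.max(np.abs(residual(phi_b, t)))
print(res_a, res_b)

# Difference quotient |x^(1/3) - 0| / |x - 0| grows without bound as x -> 0.
xs = np.array([1e-3, 1e-6, 1e-9])
quotients = np.cbrt(xs) / xs
print(quotients)
```

Both residuals are small (up to finite-difference error), so both functions solve (2.2), while the unbounded quotient shows that no Lipschitz constant works on a neighborhood of x = 0.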

2.2.3 Continuation of Solutions

Once a solution to the initial-value problem has been found on a (possibly small) time interval containing the initial time t_0, the process of extending it to a larger time interval is called continuation of a solution.

As an example, consider the initial-value problem (2.2), and suppose one has obtained a solution φ(t) = 0 on the interval (−1, 0). In this case, φ is continuable in more than one way: Both (2.3a) and (2.3b) are extensions of this φ and are solutions to (2.2) on the entire time interval ℝ. As a second example, consider the initial-value problem

$$\dot{x}(t) = x(t)^2, \quad x(0) = 1,$$

which has a solution φ(t) = (1 − t)^{−1} on (−1, 1). Clearly, this solution is continuable to the left (i.e., backward in time) to −∞ but not to the right (i.e., forward in time).

Theorem 2.3 If f ∈ C(D, ℝ^n), where D = J × ℝ^n for some open interval J ⊂ ℝ, and if f satisfies a Lipschitz condition, then for any (t_0, x_0) ∈ D the initial-value problem (2.1) has a unique solution that exists on the entire interval J.

This continuation result will be used to show that the initial-value problem with a linear, possibly time-varying state equation has a unique solution on the entire time interval if the matrix coefficients are continuous, or piecewise continuous, in time.
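The second example above, with right-hand side x^2 and x(0) = 1, can be explored numerically. The sketch below integrates the equation with a classical fourth-order Runge-Kutta scheme (a choice made here for illustration; the step count is also an assumption) and compares the result with φ(t) = (1 − t)^{−1} both forward and backward in time.

```python
def rk4_step(f, t, x, h):
    """One classical Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h * k1 / 2)
    k3 = f(t + h / 2, x + h * k2 / 2)
    k4 = f(t + h, x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(t_end, n=10000):
    """Integrate dx/dt = x^2 from x(0) = 1 to t_end in n steps (h may be negative)."""
    h = t_end / n
    x = 1.0
    for k in range(n):
        x = rk4_step(lambda t, x: x ** 2, k * h, x, h)
    return x


for t_end in (-5.0, 0.5, 0.9):
    print(t_end, integrate(t_end), 1.0 / (1.0 - t_end))
```

Backward in time the solution decays smoothly, while forward in time it grows rapidly as t approaches 1, consistent with the solution not being continuable to the right beyond t = 1.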

2.2.4 Successive Approximations of Solutions

If the initial-value problem (2.1) has a unique solution φ, a classical method of constructing approximations to φ is the method of successive approximations (or Picard iterations). Given a function f ∈ C(D, ℝ^n), where D ⊂ ℝ × ℝ^n is open, nonempty, and connected, and given initial data (t_0, x_0) ∈ D, successive approximations for (2.1) are the sequence (φ_0, φ_1, ...) of functions φ_m given by

$$\phi_0(t) = x_0, \tag{2.4a}$$

$$\phi_{m+1}(t) = x_0 + \int_{t_0}^{t} f(s, \phi_m(s))\, ds, \quad m = 0, 1, 2, \dots, \tag{2.4b}$$

for all t in some interval J ⊂ ℝ containing t_0.

Theorem 2.4 If f ∈ C(D, ℝ^n) and (t_0, x_0) ∈ D, and if f satisfies a Lipschitz condition, then the successive approximations φ_m given by (2.4) exist on the interval (t_0 − c, t_0 + c) for some c > 0, are continuous on (t_0 − c, t_0 + c), and converge uniformly, as m → ∞, to the unique solution of (2.1) on (t_0 − c, t_0 + c).
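As a concrete instance of (2.4), consider the hypothetical scalar problem with f(t, x) = x, t_0 = 0, and x_0 = 1 (an example chosen here, not taken from the notes), whose unique solution is e^t. Each iterate is then a polynomial, so the integral in (2.4b) can be computed exactly with NumPy's `Polynomial` class.

```python
import numpy as np
from numpy.polynomial import Polynomial


def picard_iterates(num_iter):
    """Successive approximations (2.4) for dx/dt = x, x(0) = 1."""
    phi = Polynomial([1.0])        # phi_0(t) = x0 = 1
    iterates = [phi]
    for _ in range(num_iter):
        # Here f(s, phi_m(s)) = phi_m(s); integ() integrates from 0 to t.
        phi = 1.0 + phi.integ()
        iterates.append(phi)
    return iterates


iterates = picard_iterates(10)
print(iterates[3].coef)            # coefficients of 1 + t + t^2/2 + t^3/6
print(iterates[10](1.0), np.e)     # value at t = 1 versus the exact solution
```

In this example the m-th iterate is the degree-m Taylor polynomial of e^t, which makes the uniform convergence on bounded intervals easy to see.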

2.3 Linearization of State-Space Models

2.3.1 Linearized Equations [3, Section 1.6]

Consider the state-space model

$$\dot{x}(t) = f(t, x(t), u(t)), \tag{2.5a}$$

$$y(t) = g(t, x(t), u(t)), \tag{2.5b}$$

where f, g : ℝ × D_1 × D_2 → ℝ^n, D_1 ⊂ ℝ^n, and D_2 ⊂ ℝ^m. Let φ ∈ C(ℝ, D_1) and ψ ∈ C(ℝ, D_2) be given signals. If f, g ∈ C^1(ℝ × D_1 × D_2, ℝ^n) and if φ and ψ satisfy

$$\dot{\phi}(t) = f(t, \phi(t), \psi(t))$$

for all t ∈ ℝ, then we can linearize the state-space model (2.5) about (φ, ψ) in the following manner.

For each x ∈ C(ℝ, D_1) and u ∈ C(ℝ, D_2) satisfying the state equation (2.5a) for all t ∈ ℝ, define

$$\delta x(t) = x(t) - \phi(t), \qquad \delta u(t) = u(t) - \psi(t) \tag{2.6a}$$

for all t ∈ R. If

∂f

∂ x =

∂f 1∂x1

· · · ∂f 1∂xn

... . . .

...∂f n∂x1

· · · ∂f n∂xn

,

∂f

∂ u =

∂f 1∂u1

· · · ∂f 1∂um

... . . .

...∂f n∂u1

· · · ∂f n∂um

are the Jacobian matrices of f with respect to the second and third arguments, respectively, thenwe obtain

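$$\begin{aligned}
\dot{\delta x}(t) &= f(t, x(t), u(t)) - f(t, \phi(t), \psi(t)) \\
&= f(t, \delta x(t) + \phi(t), u(t)) - f(t, \phi(t), \psi(t)) + \underbrace{f(t, \phi(t), \delta u(t) + \psi(t)) - f(t, \phi(t), u(t))}_{=0} \\
&= \frac{\partial f}{\partial x}(t, \phi(t), \psi(t))\, \delta x(t) + \frac{\partial f}{\partial u}(t, \phi(t), \psi(t))\, \delta u(t) + F_1(t, \delta x(t), u(t)) + F_2(t, \delta u(t)),
\end{aligned}$$

where

$$\begin{aligned}
F_1(t, \delta x(t), u(t)) &= f(t, \delta x(t) + \phi(t), u(t)) - f(t, \phi(t), u(t)) - \frac{\partial f}{\partial x}(t, \phi(t), \psi(t))\, \delta x(t), \\
F_2(t, \delta u(t)) &= f(t, \phi(t), \delta u(t) + \psi(t)) - f(t, \phi(t), \psi(t)) - \frac{\partial f}{\partial u}(t, \phi(t), \psi(t))\, \delta u(t).
\end{aligned}$$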

Fix a $t \in \mathbb{R}$. What we want to have is that, as $\delta x(t) \to 0$ and $\delta u(t) \to 0$, the functions $F_1$ and $F_2$ vanish “faster” than $\delta x(t)$ and $\delta u(t)$, respectively. Standard terminology to denote such functions is $o(\cdot)$, the “little o” function: If a function $h : D \to \mathbb{R}^m$, with $D$ being a subset of $\mathbb{R}^n$ containing the origin, is such that $\lim_{x \to 0} \|h(x)\| / \|x\| = 0$, then $h$ is said to be $o(x)$. With this terminology, it can be shown that $F_1(t, \delta x(t), u(t))$ is $o(\delta x(t))$ for each $t \in \mathbb{R}$ and $u(t) \in \mathbb{R}^m$, and that $F_2(t, \delta u(t))$ is $o(\delta u(t))$ for each $t \in \mathbb{R}$. Letting

$$A(t) = \frac{\partial f}{\partial x}(t, \phi(t), \psi(t)) \quad \text{and} \quad B(t) = \frac{\partial f}{\partial u}(t, \phi(t), \psi(t)), \tag{2.6b}$$

we obtain

$$\dot{\delta x}(t) = A(t)\, \delta x(t) + B(t)\, \delta u(t) + o(\delta x(t)) + o(\delta u(t))$$

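for each $t \in \mathbb{R}$. Similarly, letting

$$\delta y(t) = g(t, x(t), u(t)) - g(t, \phi(t), \psi(t)), \tag{2.6c}$$

$$C(t) = \frac{\partial g}{\partial x}(t, \phi(t), \psi(t)) \quad \text{and} \quad D(t) = \frac{\partial g}{\partial u}(t, \phi(t), \psi(t)) \tag{2.6d}$$

yields

$$\delta y(t) = C(t)\, \delta x(t) + D(t)\, \delta u(t) + o(\delta x(t)) + o(\delta u(t))$$

for each $t \in \mathbb{R}$. As $\delta x(t) \to 0$ and $\delta u(t) \to 0$, we have

$$\begin{aligned}
\dot{\delta x}(t) &= A(t)\, \delta x(t) + B(t)\, \delta u(t), \\
\delta y(t) &= C(t)\, \delta x(t) + D(t)\, \delta u(t)
\end{aligned} \tag{2.7}$$

for each $t \in \mathbb{R}$. The state-space model (2.7) is called the linearization of (2.5) about the solution $\phi$ and the input signal $\psi$.

If the state-space model (2.5) is autonomous (i.e., time-invariant) and homogeneous (i.e., unforced), so that

$$\dot{x}(t) = f(x(t)) \quad \text{and} \quad y(t) = g(x(t)),$$

and if $\phi$ is a constant solution of it, so that $\phi(t) = x_0$ for all $t \in \mathbb{R}$ for some equilibrium point $x_0 \in \mathbb{R}^n$ satisfying $f(x_0) = 0$, then its linearization (2.7) reads

$$\dot{\delta x}(t) = A\, \delta x(t) \quad \text{and} \quad \delta y(t) = C\, \delta x(t),$$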

where $A$ and $C$ are constant matrices.

Discrete-time state equations of the form $x(t+1) = f(t, x(t), u(t))$ and $y(t) = g(t, x(t), u(t))$ can be linearized in a virtually identical manner to the continuous-time case, and their linearized models are of the form

$$\begin{aligned}
\delta x(t+1) &= A(t)\, \delta x(t) + B(t)\, \delta u(t), \\
\delta y(t) &= C(t)\, \delta x(t) + D(t)\, \delta u(t),
\end{aligned}$$

where $\delta x$, $\delta u$, $\delta y$, $A(t)$, $B(t)$, $C(t)$, and $D(t)$ are as in (2.6).
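To make the discrete-time linearization concrete, here is a minimal sketch for a hypothetical scalar system $x(t+1) = r\,x(t)(1 - x(t)) + u(t)$ with $r = 2.5$, linearized about the constant solution $\phi(t) = 1 - 1/r$ and the input $\psi(t) = 0$. The system, the value of `r`, and the perturbation size are assumptions chosen only for illustration; they do not come from the notes.

```python
def f(x, u, r=2.5):
    """Hypothetical nonlinear state equation x(t+1) = r x (1 - x) + u."""
    return r * x * (1.0 - x) + u

r = 2.5
x_star = 1.0 - 1.0 / r            # constant solution phi(t) = x_star with psi(t) = 0
A = r * (1.0 - 2.0 * x_star)      # df/dx evaluated at (x_star, 0)
B = 1.0                           # df/du evaluated at (x_star, 0)

dx0 = 0.01                        # small initial deviation delta x(0)
x = x_star + dx0                  # state of the nonlinear system
dx_lin = dx0                      # state of the linearized system
gaps = []
for t in range(10):
    x = f(x, 0.0, r)                  # nonlinear update with u = psi = 0
    dx_lin = A * dx_lin + B * 0.0     # linearized update with delta u = 0
    gaps.append(abs((x - x_star) - dx_lin))
```

The entries of `gaps` are of second order in the initial deviation, which is the little-o behavior that justifies dropping the remainder terms.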

2.3.2 Special Case: Taylor’s Theorem [2, Theorem 3.11][5, Theorem 28.2]

Let $f : [a, b] \to \mathbb{R}$ and its first $k$ derivatives be continuous on $[a, b]$ and differentiable on $(a, b)$, and let $x_0 \in [a, b]$. Then for each $x \in [a, b]$ with $x \neq x_0$ there exists a point $c$ between $x$ and $x_0$ such that

$$\begin{aligned}
f(x) = {} & f(x_0) + \frac{\partial f}{\partial x}(x_0)(x - x_0) + \frac{1}{2!} \frac{\partial^2 f}{\partial x^2}(x_0)(x - x_0)^2 + \cdots \\
& + \frac{1}{k!} \frac{\partial^k f}{\partial x^k}(x_0)(x - x_0)^k + \frac{1}{(k+1)!} \frac{\partial^{k+1} f}{\partial x^{k+1}}(c)(x - x_0)^{k+1}.
\end{aligned} \tag{2.8}$$

Here it is clear that all the terms on the right-hand side of equality (2.8), except for the first two, are $o(x - x_0)$; that is, as $x \to x_0$, we have $f(x) \to f(x_0) + \frac{\partial f}{\partial x}(x_0)(x - x_0)$. Moreover, if $f(x_0) = 0$ (i.e., if $x_0$ is an equilibrium of $\dot{x} = f(x)$), then we have a linearization of $f$ about $x_0$, which is given by $\frac{\partial f}{\partial x}(x_0)\, \delta x$.
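The little-o claim can be checked numerically. The sketch below uses $f = \sin$ and $x_0 = 0$ (an assumed example, with $f(x_0) = 0$ so that $x_0$ is an equilibrium of $\dot{x} = f(x)$) and evaluates the first-order remainder divided by $|x - x_0|$ for shrinking steps.

```python
import math

def remainder_ratio(f, df, x0, h):
    """|f(x0 + h) - f(x0) - f'(x0) h| / |h|; tends to 0 when the remainder is o(h)."""
    return abs(f(x0 + h) - f(x0) - df(x0) * h) / abs(h)

# Ratios for h = 1e-1, 1e-2, 1e-3, 1e-4 around the equilibrium x0 = 0 of x' = sin(x).
ratios = [remainder_ratio(math.sin, math.cos, 0.0, 10.0 ** (-k)) for k in range(1, 5)]
```

Each tenfold reduction of the step reduces the ratio by roughly a factor of one hundred, as expected from the quadratic and higher-order terms in (2.8).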

2.3.3 Example: Inverted Pendulum

The motion of an inverted pendulum is described by the ordinary differential equations

$$(J + mL^2)\ddot{\theta}(t) = mgL \sin\theta(t) - mL\ddot{\rho}(t) \cos\theta(t), \tag{2.9a}$$

$$(M + m)\ddot{\rho}(t) + D\dot{\rho}(t) = \mu(t) + mL\dot{\theta}(t)^2 \sin\theta(t) - mL\ddot{\theta}(t) \cos\theta(t), \tag{2.9b}$$

where $\theta(t)$ denotes the angular displacement of the pendulum from the vertical at time $t$, and $\rho(t)$ the displacement of the cart from the origin at time $t$. A state-space model of (2.9), with the argument $t$ omitted for simplicity, reads

$$\begin{aligned}
\dot{x}_1 &= x_2, \\
\dot{x}_2 &= \frac{(M + m)mgL \sin x_1 - mLu \cos x_1 - m^2L^2 x_2^2 \sin x_1 \cos x_1 + mLD x_4 \cos x_1}{(M + m)(J + mL^2) - m^2L^2 \cos^2 x_1}, \\
\dot{x}_3 &= x_4, \\
\dot{x}_4 &= \frac{(J + mL^2)u + mL(J + mL^2) x_2^2 \sin x_1 - m^2 g L^2 \sin x_1 \cos x_1 - D(J + mL^2) x_4}{(M + m)(J + mL^2) - m^2L^2 \cos^2 x_1}
\end{aligned}$$

and

$$y_1 = x_1, \qquad y_2 = x_3$$

with state variables $x_1 = \theta$, $x_2 = \dot{\theta}$, $x_3 = \rho$, and $x_4 = \dot{\rho}$; input variable $u = \mu$; and output variables $y_1 = \theta$ and $y_2 = \rho$. Using the procedure in Section 2.3.1, we linearize this model about the solution $(\phi_1(t), \phi_2(t), \phi_3(t), \phi_4(t)) = (0, 0, v_0 t, v_0)$, $t \in \mathbb{R}$, and the input $\psi(t) = Dv_0$, $t \in \mathbb{R}$, as

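$$\dot{\delta x}(t) = A\, \delta x(t) + B\, \delta u(t), \qquad \delta y(t) = C\, \delta x(t),$$

where, regardless of the constant $v_0$,

$$A = \frac{1}{(M + m)J + mML^2} \begin{bmatrix} 0 & (M + m)J + mML^2 & 0 & 0 \\ mgL(M + m) & 0 & 0 & mLD \\ 0 & 0 & 0 & (M + m)J + mML^2 \\ -m^2 g L^2 & 0 & 0 & -D(J + mL^2) \end{bmatrix},$$

$$B = \frac{1}{(M + m)J + mML^2} \begin{bmatrix} 0 \\ -mL \\ 0 \\ J + mL^2 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$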
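The linearized matrix $A$ can be cross-checked numerically. The sketch below codes the nonlinear state equations of the pendulum, approximates the Jacobian with respect to the state by central differences at the nominal point, and compares it with the closed-form entries. All parameter values are hypothetical and chosen only for illustration, and the $(2,4)$ entry is taken to be $mLD/((M+m)J + mML^2)$, which is what differentiating the expression for $\dot{x}_2$ with respect to $x_4$ gives.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
M, m, J, L, D, g, v0 = 1.0, 0.2, 0.006, 0.3, 0.1, 9.81, 0.5

def f(x, u):
    """Right-hand side of the pendulum state-space model."""
    x1, x2, x3, x4 = x
    s, c = math.sin(x1), math.cos(x1)
    den = (M + m) * (J + m * L**2) - (m * L * c) ** 2
    dx2 = ((M + m) * m * g * L * s - m * L * u * c
           - (m * L * x2) ** 2 * s * c + m * L * D * x4 * c) / den
    dx4 = ((J + m * L**2) * u + m * L * (J + m * L**2) * x2**2 * s
           - m**2 * g * L**2 * s * c - D * (J + m * L**2) * x4) / den
    return [x2, dx2, x4, dx4]

def jacobian_x(x, u, h=1e-6):
    """Central-difference approximation of df/dx at (x, u)."""
    cols = []
    for j in range(4):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(xp, u), f(xm, u))])
    return [[cols[j][i] for j in range(4)] for i in range(4)]  # transpose

x_nom = [0.0, 0.0, 0.0, v0]   # nominal state (0, 0, v0 t, v0) evaluated at t = 0
u_nom = D * v0                # nominal input psi(t) = D v0
den0 = (M + m) * J + m * M * L**2
A_closed = [
    [0.0, 1.0, 0.0, 0.0],
    [m * g * L * (M + m) / den0, 0.0, 0.0, m * L * D / den0],
    [0.0, 0.0, 0.0, 1.0],
    [-m**2 * g * L**2 / den0, 0.0, 0.0, -D * (J + m * L**2) / den0],
]
A_num = jacobian_x(x_nom, u_nom)
max_gap = max(abs(A_num[i][j] - A_closed[i][j]) for i in range(4) for j in range(4))
```

Since the right-hand side does not depend on $x_3$, evaluating at $t = 0$ loses nothing, which is consistent with $A$ being independent of $v_0$ and of time.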

2.3.4 Importance of Linearized Equations

A real-world process or environment can have several different models of varying levels of complexity, and linear equations may arise in a natural manner from the simplest possible model. However, more often than not, linear equations result from the process of linearizing nonlinear equations. Linear systems are important because of the following:

• Unlike nonlinear systems, linear systems are very well understood, and there exists a relatively complete theory for analyzing and synthesizing linear systems;

• Linearized equations may give us qualitative information about the underlying, often very complicated, nonlinear behavior.

2.4 Solutions of Linear State Equations

2.4.1 Existence and Uniqueness [3, Section 1.7]

Given matrices $A(t) \in \mathbb{R}^{n \times n}$, $B(t) \in \mathbb{R}^{n \times m}$ for $t \in \mathbb{R}$, and given a function $u : \mathbb{R} \to \mathbb{R}^m$ and initial data $(t_0, x_0) \in \mathbb{R} \times \mathbb{R}^n$, consider the inhomogeneous linear state equation

$$\dot{x}(t) = A(t)x(t) + B(t)u(t) \tag{2.10a}$$

with the initial state

$$x(t_0) = x_0. \tag{2.10b}$$

Theorem 2.5 Suppose that $A \in C(J, \mathbb{R}^{n \times n})$, $B \in C(J, \mathbb{R}^{n \times m})$, and $u \in C(J, \mathbb{R}^m)$, where $J$ is some open interval in $\mathbb{R}$. Then for any $t_0 \in J$ and any $x_0 \in \mathbb{R}^n$, the initial-value problem (2.10) has a unique solution on the entire interval $J$.

Proof. Since $A(t)$ and $B(t)u(t)$ are continuous in $t$, the function $f(t, x) = A(t)x + B(t)u(t)$ is continuous in $(t, x)$. Moreover, for any closed interval $J_0 \subset J$ there is a Lipschitz constant $L_0 \geq 0$ such that

$$\|f(t, x) - f(t, y)\| = \|A(t)(x - y)\| \leq \|A(t)\| \, \|x - y\| \leq L_0 \|x - y\|$$

for all $(t, x), (t, y) \in J_0 \times \mathbb{R}^n$; here, $L_0$ is taken to be $\max_{t \in J_0} \|A(t)\|$, which exists as $J_0$ is compact and $\|A(\cdot)\|$ is continuous on $J_0$ [2, Corollary 2.40]. (Here, continuity of $\|A(\cdot)\|$ follows from that of $\|\cdot\|$ and $A(\cdot)$.) Therefore, by Theorem 2.3, there exists a unique solution to (2.10) on $J_0$. Since this argument holds for any closed interval $J_0 \subset J$, a unique solution exists on $J$.

A function defined on an interval $J$ of $\mathbb{R}$ is said to be piecewise continuous if it has at most a finite number of discontinuities over any bounded interval in $J$. Theorem 2.5 extends to the case where $A(\cdot)$, $B(\cdot)$, and $u(\cdot)$ are piecewise continuous because one can solve the initial-value problem (2.10) piece by piece. For example, if $A(\cdot)$, $B(\cdot)$, and $u(\cdot)$ are continuous on $[t_0, t_f)$ except possibly at $t_1, \dots, t_{N-1} \in \mathbb{R}$ with $t_0 < t_1 < \cdots < t_{N-1} < t_f$, then we obtain a solution $\phi(t)$ over all $t \in [t_0, t_f)$ by solving $N$ initial-value problems as follows:

$$\begin{aligned}
\dot{x}(t) &= A(t)x(t) + B(t)u(t),\ t \in [t_0, t_1); & x(t_0) &= x_0 & &\Rightarrow\ \phi(t),\ t \in [t_0, t_1); \\
\dot{x}(t) &= A(t)x(t) + B(t)u(t),\ t \in [t_1, t_2); & x(t_1) &= \lim_{\tau \to t_1^-} \phi(\tau) & &\Rightarrow\ \phi(t),\ t \in [t_1, t_2); \\
&\ \,\vdots \\
\dot{x}(t) &= A(t)x(t) + B(t)u(t),\ t \in [t_{N-1}, t_f); & x(t_{N-1}) &= \lim_{\tau \to t_{N-1}^-} \phi(\tau) & &\Rightarrow\ \phi(t),\ t \in [t_{N-1}, t_f).
\end{aligned}$$
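The piece-by-piece construction can be sketched for a scalar, unforced example $\dot{x} = a(t)x$ with a piecewise-constant coefficient. The breakpoints, rates, and function names below are assumptions made for illustration; on each piece the closed-form solution is used, and the left limit at a breakpoint becomes the initial state of the next piece.

```python
import math

def solve_piecewise(x0, breakpoints, rates):
    """Solve x' = a(t) x with a(t) = rates[i] on [breakpoints[i], breakpoints[i+1])."""
    x = x0
    states = [x0]
    for i, a in enumerate(rates):
        dt = breakpoints[i + 1] - breakpoints[i]
        x = x * math.exp(a * dt)   # left limit of the solution at the next breakpoint
        states.append(x)
    return states

def euler(x0, breakpoints, rates, steps_per_piece=100000):
    """Forward-Euler cross-check that never steps across a breakpoint."""
    x = x0
    for i, a in enumerate(rates):
        h = (breakpoints[i + 1] - breakpoints[i]) / steps_per_piece
        for _ in range(steps_per_piece):
            x += h * a * x
    return x

# a(t) = -1 on [0, 1) and a(t) = 2 on [1, 2), so x(2) = exp(-1) * exp(2) = e.
states = solve_piecewise(1.0, [0.0, 1.0, 2.0], [-1.0, 2.0])
x_check = euler(1.0, [0.0, 1.0, 2.0], [-1.0, 2.0])
```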

2.4.2 Peano-Baker Series [3, Section 1.8]

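Application of Theorem 2.4 to (2.10) leads to the following result:

Theorem 2.6 Suppose that $A \in C(\mathbb{R}, \mathbb{R}^{n \times n})$, $B \in C(\mathbb{R}, \mathbb{R}^{n \times m})$, and $u \in C(\mathbb{R}, \mathbb{R}^m)$. Then for any $t_0 \in \mathbb{R}$ and any $x_0 \in \mathbb{R}^n$, the initial-value problem (2.10) has the unique, continuously differentiable solution

$$\underbrace{x(t)}_{\text{total solution}} = \underbrace{\Phi(t, t_0)x_0}_{\text{homogeneous solution}} + \underbrace{\int_{t_0}^{t} \Phi(t, s)B(s)u(s)\, ds}_{\text{particular solution}}$$

for all $t \in \mathbb{R}$, where the state transition matrix $\Phi(t, t_0)$ is given by the Peano-Baker series

$$\begin{aligned}
\Phi(t, t_0) = {} & I + \int_{t_0}^{t} A(s_1)\, ds_1 + \int_{t_0}^{t} A(s_1) \int_{t_0}^{s_1} A(s_2)\, ds_2\, ds_1 \\
& + \int_{t_0}^{t} A(s_1) \int_{t_0}^{s_1} A(s_2) \int_{t_0}^{s_2} A(s_3)\, ds_3\, ds_2\, ds_1 + \cdots,
\end{aligned}$$

which converges uniformly on any bounded interval in $\mathbb{R}$ containing $t$ and $t_0$.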




In the time-invariant case where $A(t) = A$ for all $t$, the Peano-Baker series simplifies to

$$\Phi(t, t_0) = \sum_{k=0}^{\infty} \frac{A^k (t - t_0)^k}{k!} = I + A(t - t_0) + \frac{1}{2} A^2 (t - t_0)^2 + \frac{1}{6} A^3 (t - t_0)^3 + \cdots$$

with the convention that $0! = 1$ and $A^0 = I$; in this case, we write $\Phi(t, t_0) = e^{A(t - t_0)}$, which is called the matrix exponential.
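A truncated version of this series is easy to evaluate. The following sketch sums the first 30 terms for the assumed example $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, for which $e^{At}$ is a rotation matrix with entries $\cos t$ and $\pm\sin t$; the helper names and the truncation length are choices made for this illustration, not a recommended numerical method.

```python
import math

def mat_mul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def expm_series(A, t, terms=30):
    """Partial sum of sum_k A^k t^k / k! using the first `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term: I
    term = [row[:] for row in result]
    for k in range(1, terms):
        # A^k t^k / k! = (A^(k-1) t^(k-1) / (k-1)!) * A * t / k
        term = [[v * t / k for v in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

A = [[0.0, 1.0], [-1.0, 0.0]]
Phi = expm_series(A, 1.0)   # approximates [[cos 1, sin 1], [-sin 1, cos 1]]
```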

2.4.3 Discrete-Time Case

Consider the initial-value problem determined by discrete-time linear systems of the form

$$x(t + 1) = A(t)x(t) + B(t)u(t), \tag{2.11a}$$

$$x(t_0) = x_0. \tag{2.11b}$$

Define the discrete-time state transition matrix by

$$\Phi(t, t_0) = \begin{cases} I & \text{if } t = t_0; \\ A(t - 1) \cdots A(t_0) & \text{if } t > t_0. \end{cases}$$

Then the unique solution to (2.11) is given by

$$\underbrace{x(t)}_{\text{total solution}} = \underbrace{\Phi(t, t_0)x_0}_{\text{homogeneous solution}} + \underbrace{\sum_{s=t_0}^{t-1} \Phi(t, s + 1)B(s)u(s)}_{\text{particular solution}}$$

for all $t = t_0, t_0 + 1, \dots$. (Note that, unlike the continuous-time case, solving (2.11) backward in time for $t < t_0$ is in general not possible.) In the time-invariant case where $A(t) = A$ for all $t \in \mathbb{Z}$, we have

$$\Phi(t, t_0) = A^{t - t_0}.$$
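The solution formula can be compared against direct iteration of (2.11a). The sketch below does this for a hypothetical scalar time-varying system; the coefficient functions `a`, `b`, and the input `u` are assumptions for illustration. In the scalar case the factors of $\Phi$ commute, so the ordering $A(t-1) \cdots A(t_0)$, which matters for matrices, is not exercised here.

```python
def a(t):
    return 0.5 + 0.1 * t      # hypothetical time-varying coefficient A(t)

def b(t):
    return 1.0                # hypothetical input coefficient B(t)

def u(t):
    return (-1.0) ** t        # hypothetical input signal

def Phi(t, t0):
    """Discrete-time state transition: 1 if t == t0, else a(t-1) * ... * a(t0)."""
    prod = 1.0
    for k in range(t0, t):
        prod *= a(k)
    return prod

def x_formula(t, t0, x0):
    """Homogeneous part plus particular part of the solution formula."""
    return Phi(t, t0) * x0 + sum(Phi(t, s + 1) * b(s) * u(s) for s in range(t0, t))

def x_recursive(t, t0, x0):
    """Direct iteration of x(t+1) = a(t) x(t) + b(t) u(t)."""
    x = x0
    for k in range(t0, t):
        x = a(k) * x + b(k) * u(k)
    return x
```

Calling `x_formula(t, 0, 2.0)` and `x_recursive(t, 0, 2.0)` for several values of `t` should give matching results up to rounding.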

References

[1] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.

[2] C. C. Pugh, Real Mathematical Analysis. New York, NY: Springer, 2002.

[3] P. J. Antsaklis and A. N. Michel, A Linear Systems Primer. Boston, MA: Birkhäuser, 2007.

[4] P. J. Antsaklis and A. N. Michel, Linear Systems, 2nd ed. Boston, MA: Birkhäuser, 2006.

[5] S. R. Lay, Analysis with an Introduction to Proof, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1990.




III. Input-Output Description of Linear Systems




3.1 Definitions and Facts About Matrix Inversion

A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is said to be one-to-one (or injective) if $f(x) \neq f(x')$ whenever $x \neq x'$; it is said to be onto (or surjective) if, for every $y \in \mathbb{R}^m$, there exists an $x \in \mathbb{R}^n$ such that $f(x) = y$. If $f$ is both one-to-one and onto, then it is called a one-to-one correspondence (or bijection). Bijections are invertible functions; that is, if $f$ is bijective, then there exists a unique $f^{-1}$ such that $f^{-1}(f(x)) = x$ for all $x \in \mathbb{R}^n$ and such that $f(f^{-1}(y)) = y$ for all $y \in \mathbb{R}^m$.

A matrix $A \in \mathbb{R}^{n \times n}$ is said to be nonsingular (or invertible) if there exists a matrix $A^{-1} \in \mathbb{R}^{n \times n}$, called the inverse of $A$, such that $A^{-1}A = I$. The inverse $A^{-1}$ is unique whenever it exists, and satisfies $AA^{-1} = I$ as well. A nonsingular matrix is a representation of a linear one-to-one correspondence. If $A, B \in \mathbb{R}^{n \times n}$ are both nonsingular, then it is readily seen that $(AB)^{-1} = B^{-1}A^{-1}$. The following are useful facts for computing the inverse of a matrix:

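• Inverse of a Partitioned Matrix [1, Appendix A.22]. Let $A \in \mathbb{R}^{n \times n}$ be nonsingular and partitioned as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

with $A_{11} \in \mathbb{R}^{n_1 \times n_1}$, $A_{22} \in \mathbb{R}^{n_2 \times n_2}$, and $n_1 + n_2 = n$. If $A_{11}^{-1}$ exists, then $A^{-1}$ is partitioned as

$$A^{-1} = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1} A_{12} \Delta_{11}^{-1} A_{21} A_{11}^{-1} & -A_{11}^{-1} A_{12} \Delta_{11}^{-1} \\ -\Delta_{11}^{-1} A_{21} A_{11}^{-1} & \Delta_{11}^{-1} \end{bmatrix}$$

with

$$\Delta_{11} = A_{22} - A_{21} A_{11}^{-1} A_{12}.$$

If $A_{22}^{-1}$ exists as well, then the $(1, 1)$ block of $A^{-1}$ can be replaced with $\Delta_{22}^{-1}$, where $\Delta_{22} = A_{11} - A_{12} A_{22}^{-1} A_{21}$; $\Delta_{11}$ (resp. $\Delta_{22}$) is known as the Schur complement of $A_{11}$ (resp. $A_{22}$).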

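• A Matrix Inversion Formula [2, Section 0.7.4]. Let $B \in \mathbb{R}^{n \times n}$ be nonsingular and given by

$$B = A + XRY$$

for some $A \in \mathbb{R}^{n \times n}$, $X \in \mathbb{R}^{n \times m}$, $R \in \mathbb{R}^{m \times m}$, and $Y \in \mathbb{R}^{m \times n}$. Then

$$B^{-1} = A^{-1} - A^{-1}X\left(R^{-1} + YA^{-1}X\right)^{-1}YA^{-1}$$

whenever $A$ and $R$ are nonsingular.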
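Both facts can be sanity-checked in the simplest setting where every block is a scalar. The numbers below are hypothetical and exact rational arithmetic is used; note that scalar blocks commute, so this sketch checks the algebra but not the ordering of the matrix products.

```python
from fractions import Fraction as Fr

# Scalar blocks (n1 = n2 = 1) of the hypothetical matrix A = [[2, 1], [1, 3]].
A11, A12, A21, A22 = Fr(2), Fr(1), Fr(1), Fr(3)

delta11 = A22 - A21 / A11 * A12          # Schur complement of A11
delta22 = A11 - A12 / A22 * A21          # Schur complement of A22
block_inverse = [
    [1 / A11 + (A12 / A11) * (A21 / A11) / delta11, -(A12 / A11) / delta11],
    [-(A21 / A11) / delta11, 1 / delta11],
]

# Inverse of the same 2x2 matrix via the adjugate formula, for comparison.
det = A11 * A22 - A12 * A21
direct_inverse = [[A22 / det, -A12 / det], [-A21 / det, A11 / det]]

# Matrix inversion formula with scalar A, X, R, Y (hypothetical values).
a, x, r, y = Fr(2), Fr(1), Fr(3), Fr(1)
b = a + x * r * y
b_inverse = 1 / a - (1 / a) * x / (1 / r + y / a * x) * y / a
```

Here `block_inverse` agrees with `direct_inverse`, the reciprocal of `delta22` equals the $(1,1)$ entry, and `b_inverse` equals `1 / b`.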





3.2 Impulse Response of Linear Systems [3, Section 2.4][4, Section 1.16]

3.2.1 Impulse Function

Figure 3.1: Functions $\phi_k$ (triangular pulses of height $k$ supported on $[-1/k, 1/k]$).

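Define a sequence $(\phi_k)$ of functions $\phi_k \in C(\mathbb{R}, \mathbb{R})$, $k = 1, 2, \dots$, by

$$\phi_k(t) = \begin{cases} k(1 - k|t|) & \text{if } |t| \leq 1/k; \\ 0 & \text{otherwise.} \end{cases} \tag{3.1}$$

Although the sequence $(\phi_k)$ does not converge to any function, it satisfies the following:

Lemma 3.1 Let $\phi_k \in C(\mathbb{R}, \mathbb{R})$, $k = 1, 2, \dots$, be as in (3.1). Then

$$\lim_{k \to \infty} \int_{-\infty}^{\infty} \phi_k(t - s) f(s)\, ds = f(t)$$

for all $f \in C(\mathbb{R}, \mathbb{R})$ and for all $t \in \mathbb{R}$.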

Proof. Choose an $f \in C(\mathbb{R}, \mathbb{R})$ and a $t \in \mathbb{R}$. Since $f$ is continuous, the minimum (resp. maximum) of $f(s)$ over all $s \in [t - 1/k, t + 1/k]$ is achieved at some point $\alpha_k$ (resp. $\beta_k$) in $[t - 1/k, t + 1/k]$ for any positive integer $k$. It follows from

$$\int_{-\infty}^{\infty} \phi_k(t - s)\, ds = \int_{t - 1/k}^{t + 1/k} \phi_k(t - s)\, ds = 1$$

that

$$f(\alpha_k) \leq \int_{-\infty}^{\infty} \phi_k(t - s) f(s)\, ds \leq f(\beta_k)$$

for each positive integer $k$. Since $\alpha_k$ and $\beta_k$ both converge to $t$ as $k \to \infty$ and since $f$ is continuous at $t$, we conclude that $\lim_{k \to \infty} f(\alpha_k) = \lim_{k \to \infty} f(\beta_k) = f(t)$ and hence that the sequence $\left( \int_{-\infty}^{\infty} \phi_k(t - s) f(s)\, ds \right)$ must converge to $f(t)$ as $k \to \infty$.
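Lemma 3.1 can be observed numerically. The sketch below approximates the integral by a midpoint rule for the assumed test function $f = \cos$ at $t = 0.3$; the function names, the quadrature rule, and the number of subintervals are choices made for this illustration.

```python
import math

def phi(k, t):
    """The triangular pulse phi_k from (3.1)."""
    return k * (1.0 - k * abs(t)) if abs(t) <= 1.0 / k else 0.0

def smoothed(f, t, k, n=2000):
    """Midpoint-rule approximation of the integral of phi_k(t - s) f(s) ds.

    Only [t - 1/k, t + 1/k] is sampled, since phi_k(t - s) vanishes elsewhere.
    """
    a, b = t - 1.0 / k, t + 1.0 / k
    h = (b - a) / n
    return sum(phi(k, t - (a + (i + 0.5) * h)) * f(a + (i + 0.5) * h)
               for i in range(n)) * h

# Distance from f(t) for k = 1, 10, 100.
errors = [abs(smoothed(math.cos, 0.3, k) - math.cos(0.3)) for k in (1, 10, 100)]
```

The errors decrease as $k$ grows, roughly in proportion to $1/k^2$ for this smooth test function.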

Based on Lemma 3.1, we write

$$f(t) = \int_{-\infty}^{\infty} \delta(t - s) f(s)\, ds \tag{3.2}$$

or simply

$$f(t) = (\delta * f)(t)$$

for $f \in C(\mathbb{R}, \mathbb{R})$ and $t \in \mathbb{R}$. Equation (3.2) is a symbolic representation, not an integral, for the following reasons: First of all, $(\phi_k)$ does not converge to a function; secondly, even if $(g_k)$ is a sequence of continuous functions converging to a function $g$, interchange of limit and integral (i.e., the equality $\lim \int g_k = \int \lim g_k$) is not guaranteed unless the convergence is uniform [5, Corollary 4.7]. The symbol $\delta$ is called the Dirac delta function (or unit impulse). In particular, if $f \in C([0, \infty), \mathbb{R})$, then

$$f(t) = \int_{0^-}^{\infty} \delta(t - s) f(s)\, ds$$

for all $t \geq 0$. (The lower limit of this integration starts from $0^-$ to include the case of $t = 0$.)




3.2.2 Integral Representation of Linear Systems

A SISO linear operator $H : C(\mathbb{R}, \mathbb{R}) \to C(\mathbb{R}, \mathbb{R})$ is said to admit an integral representation if there exists an integrable function $h : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ such that

$$(Hu)(t) = \int_{-\infty}^{\infty} h(t, s) u(s)\, ds$$

for all $u \in C(\mathbb{R}, \mathbb{R})$ and for all $t \in \mathbb{R}$; the function $h$ is called the kernel of the integral representation of $H$. For example, the differential equation $\dot{y}(t) = u(t)$, $t \geq t_0$, with the zero initial condition $y(t_0) = 0$, admits the integral representation $y(t) = (Hu)(t) = \int_{-\infty}^{\infty} h(t, s) u(s)\, ds$, where its kernel is given by $h(t, s) = 1$ for $s \in [t_0, t]$ and $h(t, s) = 0$ otherwise. If $\phi_k \in C(\mathbb{R}, \mathbb{R})$ is as in (3.1) for each $k$, then Lemma 3.1 leads to

$$\lim_{k \to \infty} \int_{-\infty}^{\infty} h(t, \tau) \phi_k(s - \tau)\, d\tau = \int_{-\infty}^{\infty} h(t, \tau) \delta(s - \tau)\, d\tau = h(t, s)$$

for all $t, s \in \mathbb{R}$. Thus we identify the kernel $h$ with the impulse response: $h(t, s)$ is the output at time $t$ due to the unit impulse applied at the input at time $s$.

Now consider a general MIMO linear operator $H : C(\mathbb{R}, \mathbb{R}^m) \to C(\mathbb{R}, \mathbb{R}^l)$. The linear operator $H$ is said to admit an integral representation if there exists an integrable function $H : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^{l \times m}$ (i.e., a function $H$ whose entry $h_{ij} : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ in row $i$ and column $j$ is integrable for all $i$ and $j$) such that

$$H(t, s) = \begin{bmatrix} h_{11}(t, s) & \cdots & h_{1m}(t, s) \\ \vdots & \ddots & \vdots \\ h_{l1}(t, s) & \cdots & h_{lm}(t, s) \end{bmatrix}$$

for $t, s \in \mathbb{R}$ and such that

$$(Hu)(t) = \int_{-\infty}^{\infty} H(t, s) u(s)\, ds = \begin{bmatrix} \int_{-\infty}^{\infty} \big( h_{11}(t, s) u_1(s) + \cdots + h_{1m}(t, s) u_m(s) \big)\, ds \\ \vdots \\ \int_{-\infty}^{\infty} \big( h_{l1}(t, s) u_1(s) + \cdots + h_{lm}(t, s) u_m(s) \big)\, ds \end{bmatrix} \tag{3.3}$$

for all $u = [u_1 \ \cdots \ u_m]^T \in C(\mathbb{R}, \mathbb{R}^m)$ and for all $t \in \mathbb{R}$. If we let all components of $u(s)$ be zero except for the $j$th component, then the $i$th component of the output $(Hu)(t)$ assumes the form

$$y_i(t) = \int_{-\infty}^{\infty} h_{ij}(t, s) u_j(s)\, ds.$$

Thus $h_{ij}(t, s)$, which is entry $(i, j)$ of $H(t, s)$, represents the $i$th component of the output at time $t$ due to a unit impulse applied at the $j$th component of the input at time $s$ (while all other components of the input are identically zero), and the matrix $H(t, s)$ is called the impulse response matrix of the system $H$.

3.2.3 Causality and Relaxedness

It is readily seen that a MIMO linear system $H$ is causal if and only if the impulse response matrix satisfies $H(t, s) = 0$ whenever $t < s$. For causal systems, (3.3) reduces to

$$(Hu)(t) = \int_{-\infty}^{t} H(t, s) u(s)\, ds.$$




Causality implies that the system output does not depend on the future input, and is a property that every physical system possesses.

As C.-T. Chen pointed out in [6], “in developing the input-output description, before an input is applied, the system must be assumed to be relaxed or at rest, and that the output is excited solely and uniquely by the input applied thereafter.” We have assumed so far that the system is at rest at $t = -\infty$: a MIMO linear system $H$ is said to be relaxed (or at rest) at $t = t_0$ if we have $(Hu)(t) = 0$ for $t \geq t_0$ whenever $u(t) = 0$ for $t \geq t_0$. If the system $H$ is causal and if it is relaxed at $t = t_0$, then we may write

$$(Hu)(t) = \int_{t_0}^{t} H(t, s) u(s)\, ds. \tag{3.4}$$

3.2.4 Time-Invariance

If a MIMO linear system $H$ is time-invariant, then we have $H(t, s) = H(t - s, 0)$ for all $t, s \in \mathbb{R}$, and so one often abuses notation and writes (3.3) as

$$(Hu)(t) = \int_{-\infty}^{\infty} H(t - s) u(s)\, ds.$$

The right-hand side of this equality is called the convolution integral of $H$ and $u$ and is written as

$$(Hu)(t) = (H * u)(t).$$

It is easily seen that a linear time-invariant system is causal if and only if $H(\tau) = 0$ for all $\tau < 0$.

3.2.5 Discrete-Time Systems

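The discrete-time impulse function (or unit pulse) $\delta$ is defined by

$$\delta(t) = \begin{cases} 1 & \text{if } t = 0; \\ 0 & \text{otherwise} \end{cases}$$

for all $t \in \mathbb{Z}$, and we may write

$$u(t) = \sum_{s=-\infty}^{\infty} \delta(t - s) u(s)$$

for all $u : \mathbb{Z} \to \mathbb{R}$ and for all $t \in \mathbb{Z}$. Suppose that a discrete-time SISO linear operator $H$ has a representation of the form

$$(Hu)(t) = \sum_{s=-\infty}^{\infty} h(t, s) u(s) \tag{3.5}$$

when $u : \mathbb{Z} \to \mathbb{R}$ and $h : \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}$ are such that the infinite sum on the right-hand side of (3.5) is well defined for all $t \in \mathbb{Z}$. For example, the sum is guaranteed to be finite if either of the following holds [3, Pages 60–61]:

• $\sum_{s=-\infty}^{\infty} h(t, s)^2 < \infty$ for each $t \in \mathbb{Z}$, and $\sum_{s=-\infty}^{\infty} u(s)^2 < \infty$ (due to the Schwarz inequality);

• $\sum_{s=-\infty}^{\infty} |h(t, s)| < \infty$ for each $t \in \mathbb{Z}$, and $\sup_{s \in \mathbb{Z}} |u(s)| < \infty$, where $\sup_{s \in \mathbb{Z}} |u(s)|$ denotes the supremum (i.e., least upper bound) on $|u(s)|$ over all $s \in \mathbb{Z}$.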




Then we have

$$\sum_{\tau=-\infty}^{\infty} h(t, \tau) \delta(s - \tau) = h(t, s)$$

for all $t, s \in \mathbb{Z}$, and so $h$ is said to be the discrete-time impulse response (or unit pulse response) of $H$: $h(t, s)$ represents the response of $H$ at time $t$ to a unit pulse occurring at time $s$.

If $H$ is a discrete-time MIMO linear operator that admits a representation analogous to (3.5), then (3.5) is generalized to

$$(Hu)(t) = \sum_{s=-\infty}^{\infty} H(t, s) u(s) = \begin{bmatrix} \sum_{s=-\infty}^{\infty} \big( h_{11}(t, s) u_1(s) + \cdots + h_{1m}(t, s) u_m(s) \big) \\ \vdots \\ \sum_{s=-\infty}^{\infty} \big( h_{l1}(t, s) u_1(s) + \cdots + h_{lm}(t, s) u_m(s) \big) \end{bmatrix} \tag{3.6}$$

for some $H : \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}^{l \times m}$ and for all $u : \mathbb{Z} \to \mathbb{R}^m$ such that the infinite sum is well defined. The matrix $H(t, s)$ is called the discrete-time impulse response matrix, and the entries of $H(t, s)$ have a similar interpretation to their continuous-time counterparts. The system $H$ is causal if and only if $H(t, s) = 0$ whenever $t < s$. On the other hand, the system is relaxed at $t = t_0$ if $(Hu)(t) = 0$ for $t \geq t_0$ whenever $u(t) = 0$ for $t \geq t_0$. If $H$ is causal and relaxed at $t_0$, then

$$(Hu)(t) = \sum_{s=t_0}^{t} H(t, s) u(s). \tag{3.7}$$

Finally, if $H$ is time-invariant, then $H(t, s) = H(t - s, 0)$; we abuse notation and write (3.6) as a convolution sum

$$(Hu)(t) = \sum_{s=-\infty}^{\infty} H(t - s) u(s).$$
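The convolution sum is straightforward to evaluate for a causal, time-invariant SISO system relaxed at $t_0 = 0$. In the sketch below the unit pulse response $h(t) = 0.5^t$ for $t \geq 0$ is a hypothetical choice, as are the function names.

```python
def h(t):
    """Hypothetical causal unit pulse response: 0.5**t for t >= 0, else 0."""
    return 0.5 ** t if t >= 0 else 0.0

def respond(u, t, t0=0):
    """(Hu)(t) = sum_{s=t0}^{t} h(t - s) u(s) for a causal LTI system relaxed at t0."""
    return sum(h(t - s) * u(s) for s in range(t0, t + 1))

def pulse(t):
    """Discrete-time unit pulse delta(t)."""
    return 1.0 if t == 0 else 0.0

def step(t):
    """Unit step input."""
    return 1.0 if t >= 0 else 0.0

pulse_response = [respond(pulse, t) for t in range(5)]  # reproduces h(0), ..., h(4)
step_response = [respond(step, t) for t in range(5)]    # running sums of h
```

Feeding the unit pulse returns the pulse response itself, which is the defining property of $h$; the step response accumulates its values.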

3.3 Transfer Function Description of Linear Time-Invariant Systems

3.3.1 Continuous-Time Case

The Laplace transform of a function $f : [0, \infty) \to \mathbb{R}$ (including the Dirac delta function) is denoted by $\hat{f}(s)$ and defined as

$$\hat{f}(s) = \int_{0^-}^{\infty} e^{-st} f(t)\, dt$$

where $s$ is a complex variable. (The lower limit $0^-$ is to include a possible impulse in $f$ at time 0.) Tables of Laplace transform pairs and properties are in [3, Page 92].

Suppose that a continuous-time MIMO linear time-invariant system $H$ is causal and relaxed at $t = 0$. Then taking the Laplace transform of both sides of its integral representation

$$y(t) = \int_{0}^{\infty} H(t - s) u(s)\, ds,$$

with $H(t - s) = 0$ for $t < s$, yields

$$\hat{y}(s) = \hat{H}(s) \hat{u}(s)$$

due to the convolution property of Laplace transforms, where

$$\hat{y}(s) = \begin{bmatrix} \hat{y}_1(s) \\ \vdots \\ \hat{y}_l(s) \end{bmatrix}, \qquad \hat{u}(s) = \begin{bmatrix} \hat{u}_1(s) \\ \vdots \\ \hat{u}_m(s) \end{bmatrix}, \qquad \hat{H}(s) = \begin{bmatrix} \hat{h}_{11}(s) & \cdots & \hat{h}_{1m}(s) \\ \vdots & \ddots & \vdots \\ \hat{h}_{l1}(s) & \cdots & \hat{h}_{lm}(s) \end{bmatrix}.$$

The matrix $\hat{H}(s)$, which is the Laplace transform of the impulse response matrix, is called the transfer function matrix of the system $H$.

3.3.2 Discrete-Time Case

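The z-transform of a function $f : \{0, 1, \dots\} \to \mathbb{R}$ is denoted by $\hat{f}(z)$ and defined as

$$\hat{f}(z) = \sum_{t=0}^{\infty} z^{-t} f(t).$$

Tables of z-transform pairs and properties can be found in [3, Page 113].

If $H$ is a discrete-time MIMO linear time-invariant system which is causal and relaxed at $t = 0$, then it can be shown that taking the z-transform of both sides of its representation

$$y(t) = \sum_{s=0}^{\infty} H(t - s) u(s),$$

with $H(t - s) = 0$ for $t < s$, leads to

$$\hat{y}(z) = \hat{H}(z) \hat{u}(z)$$

where

$$\hat{y}(z) = \begin{bmatrix} \hat{y}_1(z) \\ \vdots \\ \hat{y}_l(z) \end{bmatrix}, \qquad \hat{u}(z) = \begin{bmatrix} \hat{u}_1(z) \\ \vdots \\ \hat{u}_m(z) \end{bmatrix}, \qquad \hat{H}(z) = \begin{bmatrix} \hat{h}_{11}(z) & \cdots & \hat{h}_{1m}(z) \\ \vdots & \ddots & \vdots \\ \hat{h}_{l1}(z) & \cdots & \hat{h}_{lm}(z) \end{bmatrix}.$$

The matrix $\hat{H}(z)$ is called the discrete-time transfer function matrix of the system $H$.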

3.4 Response of Linear Systems

3.4.1 Continuous-Time Case [3, Section 3.4]

Consider the continuous-time linear system of the form

$$\dot{x}(t) = A(t)x(t) + B(t)u(t), \tag{3.8a}$$

$$y(t) = C(t)x(t) + D(t)u(t) \tag{3.8b}$$

with the initial state $x(t_0) = x_0$. If $A \in C(\mathbb{R}, \mathbb{R}^{n \times n})$, $B \in C(\mathbb{R}, \mathbb{R}^{n \times m})$, $C \in C(\mathbb{R}, \mathbb{R}^{l \times n})$, $D \in C(\mathbb{R}, \mathbb{R}^{l \times m})$, and $u \in C(\mathbb{R}, \mathbb{R}^m)$, then the total system response is given by

$$y(t) = \underbrace{C(t)\Phi(t, t_0)x_0}_{\text{zero-input response}} + \underbrace{\int_{t_0}^{t} C(t)\Phi(t, s)B(s)u(s)\, ds + D(t)u(t)}_{\text{zero-state response}}$$

for all $t \in \mathbb{R}$, where $\Phi(t, s)$ is the state transition matrix of the system (3.8). Setting $x_0 = 0$ and comparing this with the integral representation (3.4) for causal linear systems relaxed at $t_0$ yields the impulse response matrix

$$H(t, s) = \begin{cases} C(t)\Phi(t, s)B(s) + D(t)\delta(t - s) & \text{if } t \geq s; \\ 0 & \text{if } t < s. \end{cases}$$

If the system (3.8) is time-invariant, so that $A(t) = A$, $B(t) = B$, $C(t) = C$, and $D(t) = D$ for all $t$, then taking the Laplace transform of both sides of (3.8a) and (3.8b) with $t_0 = 0$ leads to

$$\begin{aligned}
s\hat{x}(s) - x(0) &= A\hat{x}(s) + B\hat{u}(s), \\
\hat{y}(s) &= C\hat{x}(s) + D\hat{u}(s),
\end{aligned}$$

where the first equality follows from the time differentiation property of Laplace transforms. Therefore, we obtain

$$\hat{y}(s) = C(sI - A)^{-1}x(0) + C(sI - A)^{-1}B\hat{u}(s) + D\hat{u}(s), \tag{3.9}$$

which, together with $x(0) = 0$, yields the expression for the transfer function matrix: $\hat{H}(s) = C(sI - A)^{-1}B + D$.

Similarly, equation (3.9), together with $u(t) = 0$ for all $t$ (i.e., $\hat{u}(s) = 0$), yields that the Laplace transform of the state transition matrix $\Phi(t, 0) = e^{At}$, $t \geq 0$, is given by the matrix $(sI - A)^{-1}$, which is known as the resolvent of $A$.

3.4.2 Discrete-Time Case [3, Section 3.5]

The total system response of the discrete-time linear system

$$x(t + 1) = A(t)x(t) + B(t)u(t), \tag{3.10a}$$

$$y(t) = C(t)x(t) + D(t)u(t) \tag{3.10b}$$

with the initial state $x(t_0) = x_0$ is given by

$$y(t) = \underbrace{C(t)\Phi(t, t_0)x_0}_{\text{zero-input response}} + \underbrace{\sum_{s=t_0}^{t-1} C(t)\Phi(t, s + 1)B(s)u(s) + D(t)u(t)}_{\text{zero-state response}}$$

for all $t = t_0, t_0 + 1, \dots$. Comparing this with (3.7) yields that the discrete-time impulse response matrix is as follows:

$$H(t, s) = \begin{cases} C(t)\Phi(t, s + 1)B(s) & \text{if } t > s; \\ D(t) & \text{if } t = s; \\ 0 & \text{if } t < s. \end{cases}$$

If the system (3.10) is time-invariant with the initial time $t_0 = 0$, then taking the z-transform of both sides of (3.10a) and (3.10b) leads to

$$\begin{aligned}
z\hat{x}(z) - zx(0) &= A\hat{x}(z) + B\hat{u}(z), \\
\hat{y}(z) &= C\hat{x}(z) + D\hat{u}(z),
\end{aligned}$$

where the first equality follows from the time advance property of z-transforms. Therefore, we obtain that the z-transform of the total response $y(t)$, $t = 0, 1, \dots$, is given by

$$\hat{y}(z) = Cz(zI - A)^{-1}x(0) + C(zI - A)^{-1}B\hat{u}(z) + D\hat{u}(z).$$

Putting $x(0) = 0$ yields that the discrete-time transfer function matrix is expressed as $\hat{H}(z) = C(zI - A)^{-1}B + D$.

On the other hand, putting $u(t) = 0$ for all $t$ (i.e., $\hat{u}(z) = 0$) yields that the z-transform of the state transition matrix $\Phi(t, 0) = A^t$, $t = 0, 1, \dots$, is given by $z(zI - A)^{-1}$.
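The link between the unit pulse response and the transfer function can be checked numerically. The sketch below simulates a hypothetical two-state, single-input, single-output system from rest with a unit pulse input, forms the partial sum $\sum_t z^{-t} h(t)$ at $z = 2$, and compares it with $C(zI - A)^{-1}B + D$. The matrices and the evaluation point are assumptions chosen so that the series converges.

```python
# Hypothetical stable system (both eigenvalues of A have modulus 0.5).
A = [[0.0, 1.0], [-0.25, 0.5]]
B = [0.0, 1.0]
C = [1.0, 0.0]
D = 0.0

def pulse_response(steps):
    """Simulate x(t+1) = A x(t) + B u(t), y(t) = C x(t) + D u(t) from x(0) = 0."""
    x = [0.0, 0.0]
    out = []
    for t in range(steps):
        u = 1.0 if t == 0 else 0.0            # unit pulse input
        out.append(C[0] * x[0] + C[1] * x[1] + D * u)
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]
    return out

def transfer_function(z):
    """C (zI - A)^{-1} B + D, using the 2x2 adjugate formula for the inverse."""
    m11, m12 = z - A[0][0], -A[0][1]
    m21, m22 = -A[1][0], z - A[1][1]
    det = m11 * m22 - m12 * m21
    inv = [[m22 / det, -m12 / det], [-m21 / det, m11 / det]]
    v = [inv[0][0] * B[0] + inv[0][1] * B[1], inv[1][0] * B[0] + inv[1][1] * B[1]]
    return C[0] * v[0] + C[1] * v[1] + D

z = 2.0
hs = pulse_response(80)
series = sum(hs[t] * z ** (-t) for t in range(80))   # truncated z-transform of h
```

For this example $h(0) = D = 0$, $h(1) = CB = 0$, and $h(2) = CAB = 1$, and the truncated series matches `transfer_function(z)` to rounding error.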

3.4.3 Equivalent Descriptions of LTI Systems

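Given a state-space model $(A, B, C, D)$ of a linear time-invariant system of the form

$$\begin{aligned} \dot{x}(t) &= Ax(t) + Bu(t), \\ y(t) &= Cx(t) + Du(t), \end{aligned} \qquad \text{or} \qquad \begin{aligned} x(t + 1) &= Ax(t) + Bu(t), \\ y(t) &= Cx(t) + Du(t), \end{aligned}$$

let us change the state vector $x$ to a new state vector $\bar{x}$ via

$$x = P\bar{x} \iff \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} p_{11}\bar{x}_1 + \cdots + p_{1n}\bar{x}_n \\ \vdots \\ p_{n1}\bar{x}_1 + \cdots + p_{nn}\bar{x}_n \end{bmatrix} \iff \bar{x} = P^{-1}x$$

for some nonsingular matrix $P = (p_{ij}) \in \mathbb{R}^{n \times n}$, whose entry $(i, j)$ is $p_{ij}$ for all $i$ and $j$. Then we obtain a new state-space model $(\bar{A}, \bar{B}, \bar{C}, \bar{D})$ of the form

$$\begin{aligned} \dot{\bar{x}}(t) &= \bar{A}\bar{x}(t) + \bar{B}u(t), \\ y(t) &= \bar{C}\bar{x}(t) + \bar{D}u(t), \end{aligned} \qquad \text{or} \qquad \begin{aligned} \bar{x}(t + 1) &= \bar{A}\bar{x}(t) + \bar{B}u(t), \\ y(t) &= \bar{C}\bar{x}(t) + \bar{D}u(t), \end{aligned}$$

where $\bar{A} = P^{-1}AP$, $\bar{B} = P^{-1}B$, $\bar{C} = CP$, and $\bar{D} = D$. It follows from

$$\begin{aligned}
\hat{y}(s) &= C(sI - A)^{-1}x(0) + C(sI - A)^{-1}B\hat{u}(s) + D\hat{u}(s) \\
&= \bar{C}(sI - \bar{A})^{-1}\bar{x}(0) + \bar{C}(sI - \bar{A})^{-1}\bar{B}\hat{u}(s) + \bar{D}\hat{u}(s),
\end{aligned}$$

and similar equations for the discrete-time case, that the two state-space models $(A, B, C, D)$ and $(\bar{A}, \bar{B}, \bar{C}, \bar{D})$ share the same external (input-output) descriptions. Therefore, they are called equivalent internal (state-space) descriptions of an LTI system.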
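The invariance of the transfer function under a change of state coordinates can be verified exactly on a small example. In the sketch below the matrices $A$, $B$, $C$, $D$, the transformation $P$, and the evaluation point $s = 1$ are hypothetical; rational arithmetic avoids rounding error.

```python
from fractions import Fraction as F

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def mul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transfer(A, B, C, D, s):
    """C (sI - A)^{-1} B + D for a two-state SISO model."""
    sI_minus_A = [[s - A[0][0], -A[0][1]], [-A[1][0], s - A[1][1]]]
    return mul(mul(C, inv2(sI_minus_A)), B)[0][0] + D

A = [[F(0), F(1)], [F(-2), F(-3)]]
B = [[F(0)], [F(1)]]
C = [[F(1), F(0)]]
D = F(0)
P = [[F(1), F(1)], [F(0), F(2)]]   # any nonsingular P would do

P_inv = inv2(P)
A_bar, B_bar, C_bar, D_bar = mul(mul(P_inv, A), P), mul(P_inv, B), mul(C, P), D

H_original = transfer(A, B, C, D, F(1))
H_transformed = transfer(A_bar, B_bar, C_bar, D_bar, F(1))
```

Both evaluations give $1/(s^2 + 3s + 2)$ at $s = 1$, that is, $1/6$.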

3.5 Example: A Quarter-Car Model [7, Example 2.11]

Figure 3.2: A quarter-car model.

Fig. 3.2 shows a simplified model of an automobile suspension system of one wheel. In this “quarter-car” model, $M_1$ is the mass of a quarter of the car body, $M_2$ the mass of a wheel, $B_1$ the damping factor of the shock absorber, $B_2$ the damping factor of the tire, $K_1$ the stiffness of the springs, and $K_2$ the stiffness of the tire. If $z_1(t)$ and $z_2(t)$ denote the vertical displacements of the car body and the wheel from their nominal heights, respectively, at time $t$, then the equations of motion for the system are given by

$$\begin{aligned}
M_1 \ddot{z}_1(t) &= f_1(t) - B_1\big(\dot{z}_1(t) - \dot{z}_2(t)\big) - K_1\big(z_1(t) - z_2(t)\big), \\
M_2 \ddot{z}_2(t) &= f_2(t) - B_1\big(\dot{z}_2(t) - \dot{z}_1(t)\big) - K_1\big(z_2(t) - z_1(t)\big) - B_2 \dot{z}_2(t) - K_2 z_2(t),
\end{aligned}$$

where $f_1(t)$ is the force at time $t$ due to aerodynamic load and load transfer effects, and $f_2(t)$ is the force at time $t$ due to the tire copying the road bumps. Under zero initial conditions, these equations give

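$$\begin{bmatrix} M_1 s^2 + B_1 s + K_1 & -B_1 s - K_1 \\ -B_1 s - K_1 & M_2 s^2 + (B_1 + B_2)s + (K_1 + K_2) \end{bmatrix} \begin{bmatrix} \hat{z}_1(s) \\ \hat{z}_2(s) \end{bmatrix} = \begin{bmatrix} \hat{f}_1(s) \\ \hat{f}_2(s) \end{bmatrix}$$

or, equivalently,

$$\begin{aligned}
\begin{bmatrix} \hat{z}_1(s) \\ \hat{z}_2(s) \end{bmatrix} &= \begin{bmatrix} M_1 s^2 + B_1 s + K_1 & -B_1 s - K_1 \\ -B_1 s - K_1 & M_2 s^2 + (B_1 + B_2)s + (K_1 + K_2) \end{bmatrix}^{-1} \begin{bmatrix} \hat{f}_1(s) \\ \hat{f}_2(s) \end{bmatrix} \\
&= \frac{1}{\Delta(s)} \begin{bmatrix} M_2 s^2 + (B_1 + B_2)s + (K_1 + K_2) & B_1 s + K_1 \\ B_1 s + K_1 & M_1 s^2 + B_1 s + K_1 \end{bmatrix} \begin{bmatrix} \hat{f}_1(s) \\ \hat{f}_2(s) \end{bmatrix},
\end{aligned}$$

where

$$\begin{aligned}
\Delta(s) = {} & M_1 M_2 s^4 + (B_1 M_1 + B_1 M_2 + B_2 M_1)s^3 \\
& + (K_1 M_1 + K_1 M_2 + K_2 M_1 + B_1 B_2)s^2 + (K_1 B_2 + K_2 B_1)s + K_1 K_2.
\end{aligned}$$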

Let $y = z_1$ be the output, and let $u = [f_1 \ f_2]^T$ be the input. Then the transfer function matrix for the quarter-car system is given by

$$\hat{H}(s) = \begin{bmatrix} \dfrac{M_2 s^2 + (B_1 + B_2)s + (K_1 + K_2)}{\Delta(s)} & \dfrac{B_1 s + K_1}{\Delta(s)} \end{bmatrix},$$

and we have the input-output description $\hat{y}(s) = \hat{H}(s)\hat{u}(s)$ in the Laplace-transform domain.

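While there is a unique external description of the quarter-car model, there are infinitely many internal descriptions for the model. For instance, if we define $x_1 = z_1$, $x_2 = \dot{z}_1$, $x_3 = z_2$, $x_4 = \dot{z}_2$, $u_1 = f_1$, $u_2 = f_2$, and $y = z_1$, then the resulting state-space model reads

$$\begin{aligned}
\dot{x}(t) &= Ax(t) + Bu(t), \\
y(t) &= Cx(t) + Du(t),
\end{aligned}$$

where $x = [x_1 \ x_2 \ x_3 \ x_4]^T$, $u = [u_1 \ u_2]^T$, and

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ -K_1/M_1 & -B_1/M_1 & K_1/M_1 & B_1/M_1 \\ 0 & 0 & 0 & 1 \\ K_1/M_2 & B_1/M_2 & -(K_1 + K_2)/M_2 & -(B_1 + B_2)/M_2 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 \\ 1/M_1 & 0 \\ 0 & 0 \\ 0 & 1/M_2 \end{bmatrix},$$

$$C = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}, \qquad D = \begin{bmatrix} 0 & 0 \end{bmatrix}.$$

Indeed, we recover the transfer function matrix via $\hat{H}(s) = C(sI - A)^{-1}B + D$.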
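This recovery can be checked numerically at a single complex frequency. In the sketch below the parameter values and the test point $s = 1 + 2j$ are hypothetical, the $(4,1)$ entry of $A$ is taken to be $K_1/M_2$ (as the equation of motion for the wheel gives), and a small Gaussian-elimination routine stands in for a linear algebra library.

```python
# Hypothetical parameter values, chosen only for illustration.
M1, M2, B1, B2, K1, K2 = 300.0, 40.0, 1500.0, 100.0, 20000.0, 150000.0

A = [[0.0, 1.0, 0.0, 0.0],
     [-K1 / M1, -B1 / M1, K1 / M1, B1 / M1],
     [0.0, 0.0, 0.0, 1.0],
     [K1 / M2, B1 / M2, -(K1 + K2) / M2, -(B1 + B2) / M2]]
B = [[0.0, 0.0], [1.0 / M1, 0.0], [0.0, 0.0], [0.0, 1.0 / M2]]
C = [1.0, 0.0, 0.0, 0.0]

def solve(Mat, rhs):
    """Solve Mat x = rhs by Gaussian elimination with partial pivoting."""
    n = len(Mat)
    aug = [list(Mat[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def H_state_space(s):
    """Row vector C (sI - A)^{-1} B; D is zero for this model."""
    sI_minus_A = [[(s if i == j else 0.0) - A[i][j] for j in range(4)] for i in range(4)]
    row = []
    for j in range(2):
        x = solve(sI_minus_A, [B[i][j] for i in range(4)])
        row.append(sum(C[i] * x[i] for i in range(4)))
    return row

def H_formula(s):
    """The two transfer functions written in terms of Delta(s)."""
    delta = (M1 * M2 * s**4 + (B1 * M1 + B1 * M2 + B2 * M1) * s**3
             + (K1 * M1 + K1 * M2 + K2 * M1 + B1 * B2) * s**2
             + (K1 * B2 + K2 * B1) * s + K1 * K2)
    return [(M2 * s**2 + (B1 + B2) * s + (K1 + K2)) / delta, (B1 * s + K1) / delta]

s = 1.0 + 2.0j
gap = max(abs(a - b) for a, b in zip(H_state_space(s), H_formula(s)))
```

A value of `gap` that is negligible relative to the entries of `H_formula(s)` indicates that the state-space model and the transfer function matrix describe the same input-output behavior at this test point.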

References

[1] T. Kailath, Linear Systems. Englewood Cliffs, NJ: Prentice-Hall, 1980.

[5] C. C. Pugh, Real Mathematical Analysis. New York, NY: Springer, 2002.

[6] C.-T. Chen, Linear System Theory and Design, 2nd ed. New York, NY: Oxford University Press, 1984.

[7] C. L. Phillips and R. D. Harbor, Feedback Control Systems, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2000.




IV. Finite-Dimensional Vector Spaces




4.1 Introduction

The space of complex numbers is denoted by $\mathbb{C}$. Depending on the context, we will use the $n$-tuple representation of the elements of $\mathbb{C}^n$ and write $x = (x_1, \dots, x_n) \in \mathbb{C}^n$, or use their column vector representations and write $x = [x_1 \ \cdots \ x_n]^T \in \mathbb{C}^n$. Since $\mathbb{R} \subset \mathbb{C}$, the same notational convention applies to the elements of the Euclidean space $\mathbb{R}^n$.

Some of the basic linear algebraic tools are required for rigorous analysis of linear dynamical systems. We will first study the properties of linear maps between finite-dimensional vector spaces. For our purposes, finite-dimensional vector spaces can be taken to be either $\mathbb{R}^n$ or $\mathbb{C}^n$ depending on the context. This study will prove useful in the analysis and synthesis of linear systems.

4.2 Vector Spaces [1, Appendix A.1]

4.2.1 Fields

Definition 4.1 Let $F$ be a set, and let $+$ and $\cdot$ be two binary operations that map $F \times F$ into $F$. The tuple $(F, +, \cdot)$, or the set $F$ (with $+$ and $\cdot$ understood), is called a field if the following axioms are satisfied:

(a) $\alpha + \beta = \beta + \alpha$ and $\alpha\beta = \beta\alpha$ for all $\alpha, \beta \in F$; (commutativity)
$\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$ and $\alpha(\beta\gamma) = (\alpha\beta)\gamma$ for all $\alpha, \beta, \gamma \in F$; (associativity)
$\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$ for all $\alpha, \beta, \gamma \in F$. (distributivity)

(b) There exists $0_F \in F$ such that $0_F + \alpha = \alpha$ for all $\alpha \in F$; (identity w.r.t. $+$)
There exists $1_F \in F$ with $1_F \neq 0_F$, such that $1_F \alpha = \alpha$ for all $\alpha \in F$. (identity w.r.t. $\cdot$)

(c) For each $\alpha \in F$ there exists $-\alpha \in F$ such that $\alpha + (-\alpha) = 0_F$; (inverse w.r.t. $+$)
For each $\alpha \in F$ with $\alpha \neq 0_F$, there exists $\alpha^{-1} \in F$ such that $\alpha\alpha^{-1} = 1_F$. (inverse w.r.t. $\cdot$)

If $(F, +, \cdot)$ is a field, then $+$ and $\cdot$ are called addition and multiplication, respectively.

The set of real numbers $\mathbb{R}$ and the set of complex numbers $\mathbb{C}$, equipped with the usual addition and multiplication, are fields. The set of rational functions (i.e., rational fractions over polynomials with real coefficients), denoted $\mathbb{R}(s)$, is a field. The set $\{0, 1\}$ does not form a field if we use the usual definition of addition and multiplication, because the element $1 + 1 = 2$ is not in the set $\{0, 1\}$. However, if we define $0 + 0 = 1 + 1 = 0$, $1 + 0 = 0 + 1 = 1$, and $1 \cdot 0 = 0 \cdot 1 = 0 \cdot 0 = 0$, $1 \cdot 1 = 1$, then it can be verified that $(\{0, 1\}, +, \cdot)$ is a field, called the field of binary numbers. The set of integers $\mathbb{Z}$ and the set of polynomials with real coefficients do not form a field under the usual addition and multiplication because not every nonzero element has a multiplicative inverse.
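The claim about the field of binary numbers can be verified by brute force, since there are only two elements. The sketch below encodes the addition and multiplication tables as arithmetic modulo 2 (an implementation choice that reproduces the tables stated above) and checks each axiom of Definition 4.1; the function names are made up for this example.

```python
from itertools import product

F2 = (0, 1)

def add(a, b):
    return (a + b) % 2   # 0+0 = 1+1 = 0 and 0+1 = 1+0 = 1

def mul(a, b):
    return (a * b) % 2   # 1*1 = 1 and every other product is 0

def is_field():
    """Check the axioms of Definition 4.1 for ({0, 1}, add, mul) exhaustively."""
    for a, b, c in product(F2, repeat=3):
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False                                   # commutativity
        if add(a, add(b, c)) != add(add(a, b), c):
            return False                                   # associativity of +
        if mul(a, mul(b, c)) != mul(mul(a, b), c):
            return False                                   # associativity of *
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):
            return False                                   # distributivity
    identities = all(add(0, a) == a and mul(1, a) == a for a in F2)
    additive_inverses = all(any(add(a, b) == 0 for b in F2) for a in F2)
    multiplicative_inverses = all(any(mul(a, b) == 1 for b in F2) for a in F2 if a != 0)
    return identities and additive_inverses and multiplicative_inverses
```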





4.2.2 Vector Spaces

Definition 4.2 Let $V$ be a set, let $F$ be a field, let $+$ be a binary operation that maps $V \times V$ into $V$, and let $\cdot$ be a binary operation that maps $F \times V$ into $V$. The tuple $(V, F, +, \cdot)$, or the pair $(V, F)$ (with $+$ and $\cdot$ understood) or the set $V$ (with $F$, $+$, and $\cdot$ understood), is called a vector space (or linear space) over $F$ if the following axioms are satisfied:

(a) $x + y = y + x$ for all $x, y \in V$;
$x + (y + z) = (x + y) + z$ for all $x, y, z \in V$;
There exists a unique $0_V \in V$ such that $0_V + x = x$ for all $x \in V$. (zero vector or origin)

(b) $(\alpha\beta)x = \alpha(\beta x)$ for all $\alpha, \beta \in F$ and for all $x \in V$;
$0_F x = 0_V$ and $1_F x = x$ for all $x \in V$.

(c) $\alpha(x + y) = \alpha x + \alpha y$ for all $\alpha \in F$ and for all $x, y \in V$;
$(\alpha + \beta)x = \alpha x + \beta x$ for all $\alpha, \beta \in F$ and for all $x \in V$.

If $(V, F, +, \cdot)$ is a vector space, then the elements of $V$ and $F$ are called vectors and scalars, respectively, and the binary operations $+$ and $\cdot$ are called vector addition and scalar multiplication.

If $F$ is a field, denote the set of $n$-tuples of elements of $F$ by $F^n$. With $x = (x_1, \dots, x_n) \in F^n$, $y = (y_1, \dots, y_n) \in F^n$, and $\alpha \in F$, define vector addition and scalar multiplication by

$$x + y = (x_1 + y_1, \dots, x_n + y_n) \quad \text{and} \quad \alpha x = (\alpha x_1, \dots, \alpha x_n). \tag{4.1}$$

Then $F^n$ is a vector space over the field $F$. In particular, $\mathbb{R}^n$ and $\mathbb{C}^n$ are vector spaces, and are called the $n$-dimensional real vector space and $n$-dimensional complex vector space, respectively. These are examples of finite-dimensional vector spaces. If $x = (x_1, \dots, x_n) \in F^n$ with either $F = \mathbb{R}$ or $F = \mathbb{C}$, we will use the column vector representation $x = [x_1 \ \cdots \ x_n]^T$ whenever necessary.

4.2.3 Linear Subspaces

Definition 4.3 A nonempty subset $W$ of a vector space $V$ over a field $F$ is called a (linear) subspace of $V$ if $\alpha x + \beta y \in W$ whenever $\alpha, \beta \in F$ and $x, y \in W$.

A linear subspace is a vector space in its own right. If $W_1$ and $W_2$ are subspaces of a vector space $V$, then $W_1 \cap W_2$ is also a linear subspace of $V$. To show that a set $V$ is a vector space, it suffices to show that it is a subspace of some vector space. For instance, the set of all points on a straight line in $\mathbb{R}^n$ passing through the origin is a subspace of $\mathbb{R}^n$. On the other hand, a straight line that does not pass through the origin is not a subspace of $\mathbb{R}^n$.

Let $W$ be a set in a linear space $V$. A vector $v \in V$ is said to be a linear combination of vectors in $W$ if there exists a finite set of elements $w_1, \dots, w_n \in W$ and a finite set of scalars $\alpha_1, \dots, \alpha_n \in F$ such that

$$v = \alpha_1 w_1 + \cdots + \alpha_n w_n.$$

Let $W$ be a nonempty subset of a linear space $V$, and denote by $\mathrm{span}(W)$ the set of all linear combinations of the vectors from $W$. Then $\mathrm{span}(W)$ is a linear subspace of $V$, called the linear subspace generated by the set $W$, or simply, the span of $W$; the set $W$ is said to span this linear subspace. For example, the set $\{(1, 1, 0), (0, 0, 1)\} \subset \mathbb{R}^3$ spans the plane $\{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 = x_2\}$, which is a subspace of $\mathbb{R}^3$. It is readily verified that $\mathrm{span}(W)$ is the smallest linear subspace of a vector space $V$ containing the subset $W$ of $V$.




4.3 Bases for Vector Spaces [1, Appendix A.2]

4.3.1 Linear Independence

Definition 4.4 Let W = {v1, . . . , vm} be a finite nonempty set in a vector space V . If there exist scalars α1, . . . , αm, not all zero, such that

α1v1 + · · · + αmvm = 0,

then the set W is said to be linearly dependent over F . If a set is not linearly dependent, then it is said to be linearly independent. An infinite set of vectors W in V is said to be linearly independent if every finite subset of W is linearly independent.

If U is a finite set in a linear space V, then it is easily shown that U is linearly independent if and only if there is no proper subset Z of U such that span(Z) = span(U). For example, while the set {(1, 0), (0, 1), (2, −1)} ⊂ R^2 is linearly dependent over the real field as 2(1, 0) − (0, 1) − (2, −1) = 0, the sets {(1, 0), (0, 1)}, {(1, 0), (2, −1)}, and {(0, 1), (2, −1)} are all linearly independent; in particular, span{(1, 0), (2, −1)} = span{(1, 0), (0, 1), (2, −1)} = R^2.
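The example above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; it tests linear independence by comparing the rank of the matrix whose columns are the given vectors with the number of vectors. The helper name `is_linearly_independent` is ours and does not appear in the notes.

```python
import numpy as np

def is_linearly_independent(vectors):
    """Return True if the given vectors (equal-length sequences) are linearly independent."""
    M = np.column_stack(vectors)              # the vectors become the columns of M
    return bool(np.linalg.matrix_rank(M) == len(vectors))

v1, v2, v3 = (1, 0), (0, 1), (2, -1)

# The full set is dependent because 2*v1 - v2 - v3 = 0.
all_three = is_linearly_independent([v1, v2, v3])

# Every pair taken from the set is independent.
pairs = [is_linearly_independent(p) for p in ([v1, v2], [v1, v3], [v2, v3])]
```

Here `all_three` should come out false and every entry of `pairs` true, in agreement with the text.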

4.3.2 Bases

Definition 4.5 A set W in a linear space V is called a basis for V if W is linearly independent and if span(W ) = V .

For example, each of the sets {(1, 0), (0, 1)}, {(1, 0), (2, −1)}, {(0, 1), (2, −1)}, . . ., is a basis for R^2, but none of the sets {(1, 0), (−2, 0)}, {(1, −1), (−2, 2)}, . . ., is. Let B = {v1, . . . , vn} be a basis for a linear space V. Then it is easily shown that for each vector v ∈ V, there exist unique scalars α1, . . . , αn such that v = α1v1 + · · · + αnvn. (Proof. If v = α1v1 + · · · + αnvn = β1v1 + · · · + βnvn, then (α1 − β1)v1 + · · · + (αn − βn)vn = 0, so by the linear independence of B we have α1 = β1, . . . , αn = βn.) These scalars are called the coordinates of v with respect to the basis B. Moreover, if {u1, . . . , um} is any linearly independent set of vectors in V, then m ≤ n; each basis of V consists of exactly n elements. If the number of elements in a basis for a linear space V is n < ∞, then V is said to be n-dimensional, or finite-dimensional, and we write dim V = n. Otherwise, V is said to be infinite-dimensional.

Suppose either F = R or F = C. Consider the n-dimensional vector space F^n over F and let B = {v1, . . . , vn} be a basis for F^n. Then associated with every x ∈ F^n is a unique n-tuple of scalars (α1, . . . , αn) relative to the basis B such that x = α1v1 + · · · + αnvn. The vector α = (α1, . . . , αn) is called the coordinate representation of the vector x with respect to the basis B. In particular, if

\[ e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]

are the columns of the identity matrix I ∈ F^{n×n}, then the set {e1, . . . , en} is called the natural basis (or the standard basis) for F^n; if x = (x1, . . . , xn) ∈ F^n, then the vector x = [x1 · · · xn]^T (i.e., x itself) is the coordinate representation of x with respect to the standard basis.

As another example, consider the vector space of all polynomials p : C → C of degree less than n with real coefficients over the field of real numbers. A basis for this space is then given by B = {1, s, . . . , s^{n−1}}, where s is a complex variable. Associated with each element p(s) = α0 + α1s + · · · + α_{n−1}s^{n−1} is the unique n-vector α = (α0, . . . , α_{n−1}) ∈ R^n, which constitutes the coordinate representation of p(s) with respect to the basis B.
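Finding a coordinate representation amounts to solving a linear system whose coefficient matrix has the basis vectors as its columns. The sketch below, assuming NumPy, does this for the basis {(1, 0), (2, −1)} of R^2 used earlier; the vector x = (3, 4) is a hypothetical choice made only for illustration.

```python
import numpy as np

# Columns of V are the basis vectors (1, 0) and (2, -1).
V = np.array([[1.0, 2.0],
              [0.0, -1.0]])
x = np.array([3.0, 4.0])          # hypothetical vector to be represented

# Solve V @ alpha = x for the coordinates alpha of x with respect to the basis.
alpha = np.linalg.solve(V, x)

# Reconstruct x from its coordinates as a sanity check.
x_rebuilt = alpha[0] * V[:, 0] + alpha[1] * V[:, 1]
```

With this data the coordinates are (11, −4), since 11(1, 0) − 4(2, −1) = (3, 4).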




4.4 Linear Transformations [1, Appendix A.3]

4.4.1 Matrix Representations

Definition 4.6 Let V and W be vector spaces over the same field F . A function (or map) H : V →W is said to be a linear transformation (or linear map) if

H (αx + βy) = αH (x) + βH (y)

for all α, β ∈ F and for all x, y ∈ V .

Let V and W be two vector spaces over the same field F. Let L(V, W) be the set of all linear transformations from V into W. Define the sum of S, T ∈ L(V, W) by

(S + T )(v) = S (v) + T (v)

for all v ∈ V . Also, define the multiplication of T ∈ L(V, W ) by a scalar α ∈ F by

(αT )(v) = αT (v)

for all v ∈ V. Then L(V, W) is a vector space over F, called the space of linear transformations. For example, a function f : R^2 → R is linear if and only if there exist a1, a2 ∈ R such that f(x1, x2) = a1x1 + a2x2 for all (x1, x2) ∈ R^2, and the set of all real-valued functions of real variables of the form f(x1, x2) = a1x1 + a2x2 (i.e., the set of all 1-by-2 real matrices [a1 a2]) is a vector space; obviously, this vector space is nothing but R^{1×2}.

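Theorem 4.7 Let (V, F) and (W, F) denote n-dimensional and m-dimensional vector spaces, respectively; let BV = {v1, . . . , vn} and BW = {w1, . . . , wm} be bases for V and W, respectively. If T : V → W is a linear transformation, then there exists a unique matrix representation A ∈ F^{m×n} of T with respect to the bases BV and BW such that, for any v ∈ V,

\[ v = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \alpha \quad \text{and} \quad T(v) = \begin{bmatrix} w_1 & \cdots & w_m \end{bmatrix} \beta \tag{4.2} \]

implies β = Aα with A = [a1 · · · an], where the columns a1, . . . , an ∈ F^m of A satisfy

\[ T(v_1) = \begin{bmatrix} w_1 & \cdots & w_m \end{bmatrix} a_1, \quad \ldots, \quad T(v_n) = \begin{bmatrix} w_1 & \cdots & w_m \end{bmatrix} a_n. \]

Proof. Suppose (4.2) holds with α = [α1 · · · αn]^T. Then, since T is linear, we have

\[ T(v) = T\bigl( \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \alpha \bigr) = T\Bigl( \sum_{i=1}^{n} \alpha_i v_i \Bigr) = \sum_{i=1}^{n} \alpha_i T(v_i) = \begin{bmatrix} T(v_1) & \cdots & T(v_n) \end{bmatrix} \alpha. \]

This, along with T(v) = [w1 · · · wm]β, implies that β = Aα, where

\[ A = \begin{bmatrix} w_1 & \cdots & w_m \end{bmatrix}^{-1} \begin{bmatrix} T(v_1) & \cdots & T(v_n) \end{bmatrix}. \]

Now the desired result follows from

\[ \begin{bmatrix} w_1 & \cdots & w_m \end{bmatrix} \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix} = \begin{bmatrix} T(v_1) & \cdots & T(v_n) \end{bmatrix}. \]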

A special case of Theorem 4.7 is as follows. With F being a field, let T = (T1, . . . , Tm) : F^n → F^m be a linear transformation; let A ∈ F^{m×n} be the unique matrix representation of T with respect to the standard bases for F^n and F^m. Then, for all x = (x1, . . . , xn) ∈ F^n, we have T(x) = Ax whenever x = [x1 · · · xn]^T is the column vector representation of x; that is,

\[ T(x) = \begin{bmatrix} T_1(x_1, \ldots, x_n) \\ T_2(x_1, \ldots, x_n) \\ \vdots \\ T_m(x_1, \ldots, x_n) \end{bmatrix} = \underbrace{\begin{bmatrix} T_1(1, 0, \ldots, 0) & T_1(0, 1, \ldots, 0) & \cdots & T_1(0, 0, \ldots, 1) \\ T_2(1, 0, \ldots, 0) & T_2(0, 1, \ldots, 0) & \cdots & T_2(0, 0, \ldots, 1) \\ \vdots & \vdots & & \vdots \\ T_m(1, 0, \ldots, 0) & T_m(0, 1, \ldots, 0) & \cdots & T_m(0, 0, \ldots, 1) \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{x}. \]

For example, if f : R^2 → R^2 is linear with f(1, 0) = [3 −1]^T and f(0, 1) = [1 0]^T, then we have f(x1, x2) = [3 −1]^T x1 + [1 0]^T x2 for all (x1, x2) ∈ R^2. Thus, this linear map f has matrix representation

\[ \begin{bmatrix} 3 & 1 \\ -1 & 0 \end{bmatrix} \]

with respect to the standard bases for the domain R^2 and the codomain R^2. On the other hand, if we use the basis B1 = {(1, −1), (1, 0)} for the domain and the basis B2 = {(1, 0), (0, −1)} for the codomain, then the linear map f has matrix representation

\[ \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}^{-1} \begin{bmatrix} 3 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 1 & 1 \end{bmatrix}. \]
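The computation in this example follows the formula A = [w1 · · · wm]^{-1}[T(v1) · · · T(vn)] from the proof of Theorem 4.7. A minimal sketch of it, assuming NumPy, is given below; the variable names `A_std`, `V`, `W`, and `A_new` are ours.

```python
import numpy as np

A_std = np.array([[3.0, 1.0],
                  [-1.0, 0.0]])      # matrix of f in the standard bases
V = np.array([[1.0, 1.0],
              [-1.0, 0.0]])          # columns: domain basis B1 = {(1, -1), (1, 0)}
W = np.array([[1.0, 0.0],
              [0.0, -1.0]])          # columns: codomain basis B2 = {(1, 0), (0, -1)}

# Columns of A_std @ V are f(v1), f(v2); solving with W expresses them in the basis B2.
A_new = np.linalg.solve(W, A_std @ V)     # equals W^{-1} A_std V
```

Using `np.linalg.solve` rather than forming the inverse of `W` explicitly is a common numerical habit; for this small example either approach gives the same matrix.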

4.4.2 Fundamental Theorem of Linear Equations

If T : V → W is a linear transformation, the null space (or kernel ) of T is defined as

N (T ) = {v ∈ V : T (v) = 0},

and the range (or image ) of T is defined to be the set

R(T ) = T (V ) = {w ∈ W : w = T (v), v ∈ V }.

It is readily seen that T is injective (i.e., one-to-one) if and only if N(T) = {0}, and that T is surjective (i.e., onto) if and only if R(T) = W.

Theorem 4.8 If T : V → W is a linear transformation and if V is finite-dimensional, then N(T) and R(T) are linear subspaces of V and W, respectively, and satisfy

dim N(T) + dim R(T) = dim V.

This result is called the fundamental theorem of linear equations [2, Theorem 3.4]; the dimension of R(T) is called the rank of T and denoted by rank T, and the dimension of N(T) is called the nullity of T. If F is a field and a linear map T : F^n → F^m has a matrix representation A ∈ F^{m×n} (with respect to the standard bases), so that T(x) = Ax whenever x is the column vector representation of x ∈ F^n, then the range R(A) and null space N(A) of A are defined to be equal to R(T) and N(T), respectively. The rank of A, denoted by rank A, is defined to be equal to rank T. The maximum number of linearly independent columns (resp. rows) in A is called the column rank (resp. row rank) of A; the column rank and the row rank of A both coincide with the rank of A. In fact, the span of the columns in A is equal to R(A). For example,

\[ A = \begin{bmatrix} 1 & -2 & 1 \\ -1 & 2 & 1 \end{bmatrix} \]

has rank 2 (i.e., two linearly independent rows and two linearly independent columns), and so its null space

\[ N(A) = \{ (x_1, x_2, x_3) \in C^3 : x_1 - 2x_2 + x_3 = 0,\ -x_1 + 2x_2 + x_3 = 0 \} = \operatorname{span}\{(2, 1, 0)\} \]

is of dimension 1, and its range

\[ R(A) = \{ (y_1, y_2) \in C^2 : y_1 = x_1 - 2x_2 + x_3,\ y_2 = -x_1 + 2x_2 + x_3,\ (x_1, x_2, x_3) \in C^3 \} = \operatorname{span}\{(1, -1), (1, 1)\} = C^2 \]

has dimension 2.
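The rank and nullity in this example can be reproduced numerically. The sketch below assumes NumPy and obtains a basis of the null space from the singular value decomposition, which is one common numerical approach and not something prescribed by the notes; the tolerance 1e-9 is an arbitrary choice.

```python
import numpy as np

A = np.array([[1.0, -2.0, 1.0],
              [-1.0, 2.0, 1.0]])

rank = int(np.linalg.matrix_rank(A))
nullity = A.shape[1] - rank              # Theorem 4.8: rank + nullity = n

# Rows of Vh beyond the first `rank` rows span the null space of A.
_, _, Vh = np.linalg.svd(A)
null_basis = Vh[rank:].T                 # shape (3, nullity)

# The computed null vector should be parallel to (2, 1, 0).
stacked = np.column_stack([null_basis[:, 0], [2.0, 1.0, 0.0]])
parallel = bool(np.linalg.matrix_rank(stacked, tol=1e-9) == 1)
```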




4.4.3 Direct Sums of Linear Subspaces

Let V be a vector space and let W and U be subsets of V . The sum of W and U is defined by

W + U = {w + u : w ∈ W, u ∈ U }.

In particular, if W and U are linear subspaces of V, then W + U is also a linear subspace. If W and U are linear subspaces of V such that U ∩ W = {0}, then the sum W + U is called a direct sum and denoted by W ⊕ U. It is easily verified that U ∩ W = {0} if and only if, for every v ∈ U + W, there exist unique elements w ∈ W and u ∈ U such that v = u + w. Let BU = {u1, . . . , um} and BW = {w1, . . . , wl} be bases for linear subspaces U and W, respectively. Then V = U ⊕ W if and only if BU ∪ BW is a basis for V. Consequently, dim(U ⊕ W) = dim U + dim W.

If x1, x2 ∈ R^n are such that x1^T x2 = 0, then we say x1 and x2 are orthogonal to each other. It is easy to see that mutually orthogonal nonzero vectors are linearly independent. Suppose W and U are linear subspaces of R^n. If we have w^T u = 0 for all w ∈ W and for all u ∈ U, then W and U are said to be orthogonal to each other. In particular, if W ⊕ U = R^n and if W and U are orthogonal to each other, then we write W^⊥ = U (and say U is the orthogonal complement of W) and U^⊥ = W. For example, if W = span{(1, 0)} and U = span{(1, 1)}, then we have that W + U = W ⊕ U = R^2; however, W and U are not orthogonal to each other because [1 0][1 1]^T = 1 ≠ 0. On the other hand, if W = span{(−1, 1)} and U = span{(1, 1)}, then we have not only W ⊕ U = R^2 but also W^⊥ = U.

Theorem 4.9 If A ∈ Rm×n, then the following hold:

(a) R(A) ⊕ N (AT) = Rm and R(A)⊥ = N (AT);

(b) R(AT) ⊕ N (A) = Rn and R(AT)⊥ = N (A);

(c) N (A) = N (ATA);

(d) R(A) = R(AAT).

Proof. To prove (a), choose y1 ∈ R(A) and y2 ∈ N(A^T). Then we have y1 = Ax for some x ∈ R^n, so y1^T y2 = (Ax)^T y2 = x^T(A^T y2) = 0. This implies R(A) is orthogonal to N(A^T). Now, choose a nonzero y ∈ N(A^T). Then A^T y = 0 and so y^T Ax = 0 for all x ∈ R^n. This implies y ≠ Ax for any x ∈ R^n (otherwise y^T y = 0, contradicting y ≠ 0); that is, y ∉ R(A). Hence R(A) ∩ N(A^T) = {0}. Since rank A = rank A^T, Theorem 4.8 applied to A^T implies dim R(A) + dim N(A^T) = dim R(A^T) + dim N(A^T) = m. This means that, if {v1, . . . , vk} is a basis for R(A) and if {w1, . . . , wl} is a basis for N(A^T), then k + l = m. Since vi and wj are orthogonal to each other for all i and j, we have that {v1, . . . , vk, w1, . . . , wl} is a linearly independent set of vectors. Since R(A) ⊕ N(A^T) is a subspace of R^m, this is possible only if R(A) ⊕ N(A^T) = R^m. Moreover, as R(A) is orthogonal to N(A^T), we conclude that R(A)^⊥ = N(A^T). This completes the proof of (a).

Replacing A in (a) with A^T yields (b) as well. To prove (c), suppose A^T Ax = 0. Then we have x^T A^T Ax = ‖Ax‖^2 = 0, and so Ax = 0. Thus N(A^T A) ⊂ N(A). Conversely, if Ax = 0, then A^T Ax = 0 as well, so N(A) ⊂ N(A^T A). This proves (c). Lastly, it follows from (a)–(c) that R(A) = N(A^T)^⊥ = N(AA^T)^⊥ = R(AA^T), so that (d) holds true.

By part (d) of Theorem 4.9, we have rank A = rank AA^T, and so A has full row rank (i.e., rank A = m) if and only if AA^T ∈ R^{m×m} is nonsingular. Similarly, we have rank A^T = rank A^T A, so A has full column rank (i.e., rank A = n) if and only if A^T A ∈ R^{n×n} is nonsingular.
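The statements of Theorem 4.9 can be illustrated on a small rank-deficient matrix. The following sketch assumes NumPy; the 3-by-3 matrix is a hypothetical example of rank 2, and the bases of the four subspaces are taken from the singular value decomposition, which is our choice of method.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])          # hypothetical example with rank 2

def rank(M):
    """Numerical rank with an explicit (arbitrary) tolerance."""
    return int(np.linalg.matrix_rank(M, tol=1e-9))

r = rank(A)
U, _, Vh = np.linalg.svd(A)
range_basis = U[:, :r]        # orthonormal basis of R(A)
left_null = U[:, r:]          # orthonormal basis of N(A^T)
right_null = Vh[r:].T         # orthonormal basis of N(A)

# (a): R(A) is orthogonal to N(A^T), and the dimensions add up to m.
orthogonal = np.allclose(range_basis.T @ left_null, 0.0)
dims_add_up = (range_basis.shape[1] + left_null.shape[1] == A.shape[0])

# (c), (d): A, A^T A, and A A^T all have the same rank.
ranks = (r, rank(A.T @ A), rank(A @ A.T))
```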




4.4.4 Linear Algebraic Equations

Consider the system of linear algebraic equations of the form

Ax = y, (4.3)

where A ∈ R^{m×n} and y ∈ R^m are given and x ∈ R^n is to be determined. For a given y, there exists at least one solution x to (4.3) if and only if y ∈ R(A). Every solution x to (4.3) can be expressed as a sum x = xp + xh, where xp is a particular solution of (4.3) and xh satisfies Axh = 0. Thus, a solution to (4.3), when one exists, is unique if and only if N(A) = {0}. The following are consequences of Theorem 4.9, which are presented without proof.

Theorem 4.10 (Least-Squares Solution) A matrix A ∈ R^{m×n} has full row rank (i.e., m ≤ n and rank A = m) if and only if it represents a surjective linear map. In this case, AA^T is nonsingular, and x0 = A^T(AA^T)^{-1}y is such that Ax0 = y and ‖x0‖ ≤ ‖x‖ for all solutions x to (4.3).

Theorem 4.11 (Least-Squares Approximation) A matrix A ∈ R^{m×n} has full column rank (i.e., m ≥ n and rank A = n) if and only if it represents an injective linear map. In this case, A^T A is nonsingular, and x0 = (A^T A)^{-1}A^T y is such that ‖y − Ax0‖ ≤ ‖y − Ax‖ for all x ∈ R^n.

Theorem 4.12 (Unique Solution) A matrix A ∈ R^{m×n} has full row and column ranks (i.e., m = n and rank A = n) if and only if A represents a bijective linear map. In this case, A is nonsingular, and x0 = A^{-1}y is the unique solution to (4.3).

In Theorem 4.10, because x0 = A^T(AA^T)^{-1}y is such that x0 − 0 ∈ R(A^T) = N(A)^⊥, we have x^T(x0 − 0) = 0 for all x ∈ N(A). For this reason, we say that the origin is the orthogonal projection of x0 onto N(A). On the other hand, in Theorem 4.11, because x0 = (A^T A)^{-1}A^T y is such that Ax0 − y ∈ N(A^T) = R(A)^⊥, we see that z^T(Ax0 − y) = 0 for all z ∈ R(A). For this reason, we say that Ax0 is the orthogonal projection of y onto R(A).
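The formulas of Theorems 4.10 and 4.11 translate directly into a few lines of code. The sketch below assumes NumPy; the matrix is the rank-2 example from Section 4.4.2, while the right-hand sides `y` and `z` are hypothetical data chosen only for illustration.

```python
import numpy as np

# Full row rank case (Theorem 4.10): minimum-norm solution of A x = y.
A = np.array([[1.0, -2.0, 1.0],
              [-1.0, 2.0, 1.0]])
y = np.array([1.0, 3.0])                       # hypothetical right-hand side
x0 = A.T @ np.linalg.solve(A @ A.T, y)         # x0 = A^T (A A^T)^{-1} y
x_other = x0 + np.array([2.0, 1.0, 0.0])       # add a vector from N(A): still a solution

# Full column rank case (Theorem 4.11): least-squares approximation.
B = A.T                                        # 3-by-2 with rank 2
z = np.array([1.0, 0.0, 2.0])                  # hypothetical data, not in R(B)
x_ls = np.linalg.solve(B.T @ B, B.T @ z)       # x_ls = (B^T B)^{-1} B^T z
residual = z - B @ x_ls                        # lies in N(B^T) = R(B)^perp
```

In practice one would usually call `np.linalg.lstsq` (or a QR or SVD factorization) instead of forming the normal equations, which can be poorly conditioned; the explicit formulas are kept here only to mirror the theorems.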

4.5 Equivalence and Similarity [1, Appendix A.4]

4.5.1 Change of Bases: Vector Case

Let A ∈ F^{m×n} be the matrix representation of a linear transformation that maps an n-dimensional vector space (V, F) into an m-dimensional vector space (W, F). Let {v1, . . . , vn} be a basis for V and let {v̄1, . . . , v̄n} be a set of vectors in V given by

\[ \begin{bmatrix} \bar v_1 & \cdots & \bar v_n \end{bmatrix} = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} P \tag{4.4a} \]

with

\[ P = \begin{bmatrix} p_1 & \cdots & p_n \end{bmatrix} = \begin{bmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{bmatrix}. \tag{4.4b} \]

The set {v̄1, . . . , v̄n} forms a basis for V if and only if the matrix P = (pij) ∈ F^{n×n} is nonsingular. (Proof. If P is nonsingular, then [v̄1 · · · v̄n] is nonsingular, and so sufficiency follows. If pn = α1p1 + · · · + α_{n−1}p_{n−1}, then direct computation shows v̄n = [v1 · · · vn]pn = α1v̄1 + · · · + α_{n−1}v̄_{n−1}, so {v̄1, . . . , v̄n} cannot be linearly independent. This proves necessity.) We call P the matrix of the basis {v̄1, . . . , v̄n} with respect to the basis {v1, . . . , vn}. Here, the jth column pj of P is the coordinate representation of v̄j with respect to the basis {v1, . . . , vn}. Given a v ∈ V, let B = {v1, . . . , vn} and B̄ = {v̄1, . . . , v̄n} be bases for the vector space V. Suppose that α and ᾱ are the coordinate representations of v with respect to the bases B and B̄, respectively; that is,

\[ v = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \alpha = \begin{bmatrix} \bar v_1 & \cdots & \bar v_n \end{bmatrix} \bar\alpha. \]

Then it follows from P = [v1 · · · vn]^{-1}[v̄1 · · · v̄n] that we have

Pᾱ = α,

where P is the matrix of the basis B̄ with respect to the basis B satisfying (4.4).

4.5.2 Change of Bases: Matrix Case

Let V and W be n-dimensional and m-dimensional vector spaces, respectively, over a common field F, and let {v1, . . . , vn} and {w1, . . . , wm} be bases for V and W, respectively. Suppose that {v̄1, . . . , v̄n} is another basis for V with P ∈ F^{n×n} being the matrix of {v̄1, . . . , v̄n} with respect to {v1, . . . , vn}, and that {w̄1, . . . , w̄m} is another basis for W with Q ∈ F^{m×m} being the matrix of {w1, . . . , wm} with respect to {w̄1, . . . , w̄m}. Then we have

\[ \begin{bmatrix} \bar v_1 & \cdots & \bar v_n \end{bmatrix} = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} P, \qquad \begin{bmatrix} \bar w_1 & \cdots & \bar w_m \end{bmatrix} Q = \begin{bmatrix} w_1 & \cdots & w_m \end{bmatrix}. \]

Let T : V → W be a linear transformation. If A ∈ F^{m×n} is the matrix of T with respect to the bases {v1, . . . , vn} and {w1, . . . , wm}, and if Ā ∈ F^{m×n} is the matrix of T with respect to the bases {v̄1, . . . , v̄n} and {w̄1, . . . , w̄m}, then it follows from y = T(x) with y = [w1 · · · wm]β = [w̄1 · · · w̄m]β̄, x = [v1 · · · vn]α = [v̄1 · · · v̄n]ᾱ, and β = Aα that we have β̄ = Āᾱ with

Ā = QAP. (4.5)

Definition 4.13 Let F be a field. A matrix Ā ∈ F^{m×n} is said to be equivalent to an A ∈ F^{m×n} if there exist nonsingular matrices Q ∈ F^{m×m} and P ∈ F^{n×n} such that (4.5) is satisfied.

It can be shown that two matrices A and B of the same size are equivalent if and only if rank A = rank B; that is, if A, B ∈ R^{m×n}, then rank A = rank B if and only if there exist nonsingular P ∈ R^{n×n} and Q ∈ R^{m×m} such that B = QAP.

4.5.3 Similarity Transformation

Let V be an n-dimensional vector space over a field F, and let {v1, . . . , vn} and {v̄1, . . . , v̄n} be bases for V. Let P ∈ F^{n×n} be the matrix of {v̄1, . . . , v̄n} with respect to {v1, . . . , vn}. If T : V → V is a linear transformation, and if A ∈ F^{n×n} is the matrix of T with respect to {v1, . . . , vn}, then the matrix Ā ∈ F^{n×n} of T with respect to {v̄1, . . . , v̄n} is given by

Ā = P^{-1}AP. (4.6)

Definition 4.14 Let F be a field. A matrix Ā ∈ F^{n×n} is said to be similar to an A ∈ F^{n×n} if there exists a nonsingular matrix P ∈ F^{n×n} such that (4.6) is satisfied. The function f : F^{n×n} → F^{n×n} defined by f(A) = P^{-1}AP for some nonsingular P and for all A is called a similarity transformation.




4.5.4 Polynomials of Linear Transformations

If T : C^n → C^n is a linear transformation, then we write T^0 = I (i.e., the identity map such that Ix = x for all x ∈ C^n), T^1 = T, T^2 = TT = T(T(·)), and so on. In general, if p(s) is a polynomial of degree m in s given by

p(s) = α0 + α1 s + · · · + αm s^m, (4.7)

then we write p(T) = α0 I + α1 T + · · · + αm T^m. It is readily seen that, whenever A ∈ C^{n×n} is the matrix of T with respect to the standard basis, then p(A) defined by

p(A) = α0 I + α1 A + · · · + αm A^m

is the matrix of p(T) with respect to the standard basis. For A, P ∈ C^{n×n}, if P is nonsingular and p is as in (4.7), we have

p(P^{-1}AP) = P^{-1} p(A) P;

that is, whenever B is similar to A under a similarity transformation, p(B) is similar to p(A) under the same similarity transformation.
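The identity p(P^{-1}AP) = P^{-1} p(A) P is easy to confirm numerically. The sketch below assumes NumPy; the helper `poly_of_matrix`, the matrices, and the polynomial p(s) = 2 − s + 3s^2 are hypothetical choices made for illustration.

```python
import numpy as np

def poly_of_matrix(coeffs, M):
    """Evaluate p(M) = coeffs[0] I + coeffs[1] M + ... + coeffs[m] M^m by Horner's rule."""
    n = M.shape[0]
    result = np.zeros_like(M)
    for c in reversed(coeffs):
        result = result @ M + c * np.eye(n)
    return result

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])               # nonsingular
coeffs = [2.0, -1.0, 3.0]                # p(s) = 2 - s + 3 s^2

B = np.linalg.solve(P, A @ P)            # B = P^{-1} A P is similar to A
lhs = poly_of_matrix(coeffs, B)          # p(P^{-1} A P)
rhs = np.linalg.solve(P, poly_of_matrix(coeffs, A) @ P)   # P^{-1} p(A) P
```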

4.5.5 Change of Bases for State-Space Models

Consider an LTI system of the form

\[ \begin{aligned} \dot x(t) &= A x(t) + B u(t), \\ y(t) &= C x(t) + D u(t), \end{aligned} \tag{4.8} \]

where, for each t ∈ R, the column vectors x(t) ∈ R^n, u(t) ∈ R^m, and y(t) ∈ R^l are the coordinate representations of the state, input, and output of the system with respect to the standard bases for the state space R^n, input space R^m, and output space R^l, respectively. If x̄(t), ū(t), and ȳ(t) are the coordinate representations of the state, input, and output with respect to new bases {v1, . . . , vn}, {w1, . . . , wm}, and {z1, . . . , zl}, respectively, then we have

Px̄(t) = x(t), Qū(t) = u(t), and Rȳ(t) = y(t)

for t ∈ R, where

P = [v1 · · · vn], Q = [w1 · · · wm], and R = [z1 · · · zl].

Thus, in terms of the new bases, the state-space model (4.8) is represented as

\[ \begin{aligned} \dot{\bar x}(t) &= (P^{-1}AP)\,\bar x(t) + (P^{-1}BQ)\,\bar u(t), \\ \bar y(t) &= (R^{-1}CP)\,\bar x(t) + (R^{-1}DQ)\,\bar u(t). \end{aligned} \tag{4.9} \]

If the transfer function matrix of the original model (4.8) is H(s) = C(sI − A)^{-1}B + D, then the transfer function matrix of the new model (4.9) is given by R^{-1}H(s)Q. As we have seen earlier, the two models are equivalent if Q = I ∈ R^{m×m} and R = I ∈ R^{l×l}. In general, since there exists a one-to-one correspondence between the two models, one can perform analysis and synthesis for (4.9) and transform the results back to the original spaces to obtain the corresponding results for (4.8).
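As a sanity check of the transfer function claim in the case Q = I and R = I, the sketch below, assuming NumPy, changes the state basis of a hypothetical second-order single-input single-output model and compares the two transfer functions at one test frequency. All numerical values and the helper `transfer` are our own illustrative choices.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])             # hypothetical state matrix
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

# New state basis: the columns of P are eigenvectors of A (eigenvalues -1 and -2).
P = np.array([[1.0, 1.0],
              [-1.0, -2.0]])

A_bar = np.linalg.solve(P, A @ P)        # P^{-1} A P
B_bar = np.linalg.solve(P, B)            # P^{-1} B   (Q = I)
C_bar = C @ P                            # C P        (R = I)
D_bar = D

def transfer(A, B, C, D, s):
    """Evaluate H(s) = C (sI - A)^{-1} B + D at a complex number s."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D

s0 = 1.0 + 2.0j                          # arbitrary test point, not an eigenvalue of A
H_original = transfer(A, B, C, D, s0)
H_new = transfer(A_bar, B_bar, C_bar, D_bar, s0)
```

Because the columns of `P` are eigenvectors, `A_bar` comes out diagonal here, which anticipates the diagonal form discussed in the next chapter.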

References


[2] S. Axler, Linear Algebra Done Right , 2nd ed. New York, NY: Springer, 1997.




V. Structural Properties of Matrices



Fall 2010

5.1 Definitions and Facts

5.1.1 Determinants

Given a square matrix A ∈ C^{n×n}, the determinant of A, denoted by det A, can be defined inductively in the following way. Define the determinant of a 1-by-1 matrix to be the value of its single entry. Assume the determinant is defined over C^{(n−1)×(n−1)} and let Aij ∈ C^{(n−1)×(n−1)} denote the submatrix of A = (aij) ∈ C^{n×n} resulting from the deletion of row i and column j. Then

\[ \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij} = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij} \]

for all i, j ∈ {1, . . . , n}, and this common value is defined to be the determinant of A. It is clear that det A^T = det A by definition. It is also easily verified that det I = 1 and det 0 = 0, and that det A ≠ 0 if and only if A is nonsingular. The following are useful facts [1, A.11 & 12]:

• If A, B ∈ C^{n×n}, then det AB = det A det B. (This does not hold for nonsquare A and B.)

• If A and D are square and if A is nonsingular, then

\[ \det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det A \, \det(D - CA^{-1}B). \]

(Recall that D − CA^{-1}B is the Schur complement of A.) In particular, if B = 0, then (regardless of the invertibility of A) we have

\[ \det \begin{bmatrix} A & 0 \\ C & D \end{bmatrix} = \det A \, \det D. \]

• If A ∈ C^{m×n} and B ∈ C^{n×m}, then det(I − AB) = det(I − BA), where the identity matrices are m-by-m and n-by-n, respectively.
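The block determinant formula and the identity det(I − AB) = det(I − BA) can be spot-checked as follows. This is only a sketch, assuming NumPy, with hypothetical blocks chosen so that the upper-left block is nonsingular.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])          # nonsingular 3-by-3 block (det A = 7)
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
C = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
D = np.array([[4.0, 1.0],
              [1.0, 2.0]])

M = np.block([[A, B], [C, D]])
schur = D - C @ np.linalg.solve(A, B)            # Schur complement of A
det_block = np.linalg.det(M)
det_factored = np.linalg.det(A) * np.linalg.det(schur)

# det(I - XY) = det(I - YX) with X = B (3-by-2) and Y = C (2-by-3).
det_large = np.linalg.det(np.eye(3) - B @ C)
det_small = np.linalg.det(np.eye(2) - C @ B)
```

The second identity is convenient in practice because it replaces a determinant of the larger size by one of the smaller size.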

5.1.2 Traces

If A ∈ C^{n×n}, then the trace of A, denoted by tr A, is the sum of the diagonal entries of A; that is, tr A = ∑_{i=1}^{n} aii for A = (aij). Clearly, trace is a linear operation: tr(αA + βB) = α tr A + β tr B for all α, β ∈ C and for all A, B ∈ C^{n×n}. If A ∈ C^{m×n} and B ∈ C^{n×m}, then direct calculation reveals that tr AB = tr BA.





5.1.3 Eigenvalues and Eigenvectors

Let (C^n, C) be the n-dimensional complex vector space (over the complex field). Let A ∈ C^{n×n} be a matrix representation of a linear transformation that maps C^n into C^n. If a complex number λ ∈ C and a nonzero complex vector v ∈ C^n are such that Av = λv, then λ is called an eigenvalue of A and v is called an eigenvector of A associated with the eigenvalue λ. The fundamental theorem of linear equations [2, Theorem 3.4] says that the sum of the rank and nullity of A equals n. Consequently, the linear equation Av = λv, or equivalently (λI − A)v = 0, has a nonzero solution v if and only if rank(λI − A) < n (i.e., λI − A is singular). Thus, the eigenvalues of A are the complex numbers s that satisfy

det(sI − A) = 0. (5.1)

We call det(sI − A), which is a polynomial of degree n in s, the characteristic polynomial of A, and (5.1) the characteristic equation of A. The well-known, but nontrivial, fundamental theorem of algebra [2, Theorem 4.7] implies that a polynomial of degree n with complex coefficients has exactly n roots, counting multiplicities, among the complex numbers. A consequence of this theorem is that an n-by-n matrix has exactly n eigenvalues, counting (algebraic) multiplicities. If λ1, . . . , λn ∈ C are these eigenvalues, then the following hold true [3, Theorem 1.2.12]:

\[ \operatorname{tr} A = \sum_{i=1}^{n} \lambda_i \quad \text{and} \quad \det A = \prod_{i=1}^{n} \lambda_i. \]

If λ1, . . . , λp are the distinct eigenvalues of A ∈ C^{n×n} (where p ≤ n), then the dimension of the linear subspace N(λiI − A) in C^n (i.e., the number of linearly independent eigenvectors associated with the eigenvalue λi over the complex field) is called the geometric multiplicity of the eigenvalue λi. On the other hand, if the characteristic polynomial is given by det(sI − A) = ∏_{i=1}^{p} (s − λi)^{mi}, where ∑_{i=1}^{p} mi = n, then mi is the algebraic multiplicity of the eigenvalue λi. If f(s) is a polynomial in s, then the roots of the characteristic polynomial of f(A) are f(λ1), . . . , f(λp), and det(sI − f(A)) = ∏_{i=1}^{p} (s − f(λi))^{mi}; that is, each f(λi) is an eigenvalue of f(A) of algebraic multiplicity at least mi (exactly mi when the values f(λ1), . . . , f(λp) are distinct), i = 1, . . . , p. Also, since det(sI − A) = det P^{-1}(sI − A)P = det(sI − P^{-1}AP) for any invertible P ∈ C^{n×n}, similar matrices have the same characteristic polynomial; that is, they have the same set of eigenvalues, counting (algebraic) multiplicities. Finally, the eigenvalues of A^T are the same as those of A, counting multiplicities, because det(λI − A^T) = det(λI − A)^T = det(λI − A).

5.1.4 An Example

To illustrate the definitions and facts presented so far, consider

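\[ A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \qquad P = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}^{-1}. \]

If {e1, e2, e3, e4} is the standard basis for C^4, then we have [e4 e2 e1 e3] = [e1 e2 e3 e4]P = P, so the change of basis defined by P is nothing but a reordering of the basis vectors. In particular, AP amounts to a reordering of the columns of A and P^{-1}A corresponds to a reordering of the rows of A. We have

\[ P^{-1}AP = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix} = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} \quad \text{with} \quad A_1 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}, \quad 0 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \]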




so it follows from det(sI − A) = det(sI − P^{-1}AP) = det(sI − A1) det(sI − A2) that the set of eigenvalues of A is the union of the sets of eigenvalues of A1 and A2. Since det(sI − A1) = s^2, we have that 0 and 0 are eigenvalues of A1. Also, since det(sI − A2) = (s − 1)^2 + 1, we have that 1 + i and 1 − i are eigenvalues of A2. Thus, 0, 1 + i, and 1 − i are the distinct eigenvalues of A; their algebraic multiplicities are 2, 1, and 1, respectively. While the geometric multiplicities of eigenvalues 1 + i and 1 − i are both equal to 1, it follows from N(0I − A) = {x ∈ C^4 : x1 + x3 = 0, x1 − x3 = 0, x2 = 0} = {x ∈ C^4 : x1 = x2 = x3 = 0} = span{(0, 0, 0, 1)} that the geometric multiplicity of eigenvalue 0 (i.e., the dimension of the subspace spanned by the eigenvectors of A corresponding to eigenvalue 0) is 1, which is less than its algebraic multiplicity. Finally, counting multiplicities, the set of all eigenvalues of A is {0, 0, 1 + i, 1 − i}, so indeed tr A = a11 + a22 + a33 + a44 = 1 + 0 + 1 + 0 = 0 + 0 + (1 + i) + (1 − i) and det A = det A1 det A2 = 0 · 2 = 0 · 0 · (1 + i) · (1 − i).
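The numbers in this example can be reproduced with a few NumPy calls, as in the sketch below. Since the eigenvalue 0 is defective, a numerical eigenvalue routine may return it with a small perturbation, so comparisons should use a tolerance; the value 1e-9 used for the rank is an arbitrary choice of ours.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [-1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

eigvals = np.linalg.eigvals(A)           # approximately 0, 0, 1 + i, 1 - i (in some order)
char_poly = np.poly(A)                   # coefficients of det(sI - A), highest power first

trace_A = np.trace(A)                    # should match the sum of the eigenvalues
det_A = np.linalg.det(A)                 # should match the product of the eigenvalues

# Geometric multiplicity of the eigenvalue 0 is dim N(0I - A) = n - rank(A).
geo_mult_zero = A.shape[0] - int(np.linalg.matrix_rank(A, tol=1e-9))
```

The characteristic polynomial is s^2((s − 1)^2 + 1) = s^4 − 2s^3 + 2s^2, so `char_poly` should be close to [1, −2, 2, 0, 0].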

5.2 Upper-Triangular Form of Matrices [2, Chapter 5]

5.2.1 Invariant Subspaces

Definition 5.1 Let V be a vector space; let T : V → V be a linear transformation. A linear subspace U of V is said to be invariant under T if u ∈ U implies T u ∈ U .

If A ∈ C^{n×n} is a matrix representation of a linear map T : C^n → C^n, and if a subspace U of C^n is invariant under T, then we say U is invariant under A. Clearly, the spaces {0}, C^n, N(A), and R(A) are all invariant under A. In general, if p(s) is a polynomial in s, then it follows immediately from p(A)A = A p(A) that N(p(A)) and R(p(A)) are both invariant under A.

5.2.2 Upper-Triangular Form

Suppose A ∈ C^{n×n} and {v1, . . . , vn} is a basis of C^n. Recall that P = [v1 · · · vn] is the matrix of the basis {v1, . . . , vn} with respect to the standard basis {e1, . . . , en} such that the representation of A with respect to {v1, . . . , vn} is given by the similarity transform Ā = (āij) = P^{-1}AP. The matrix Ā is called an upper-triangular matrix representation of A if all the entries below its diagonal are equal to zero (i.e., āij = 0 for i > j).

Lemma 5.2 Let A ∈ Cn×n; let {v1, . . . , vn} be a basis of Cn. The following are equivalent:

(a) P−1AP is upper triangular with P = [v1 · · · vn];

(b) Avk ∈ span{v1, . . . , vk} for each k = 1, . . . , n;

(c) span{v1, . . . , vk} is invariant under A for each k = 1, . . . , n.

Proof. We will show (a) ⇒ (b) ⇒ (c) ⇒ (a). Condition (a) is equivalent to the existence of a11, a12, a22, . . . , a1n, . . . , ann ∈ C such that

\[ A \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}. \tag{5.2} \]

Thus, if (a) holds, then Avk = a1kv1 + · · · + akkvk ∈ span{v1, . . . , vk} for all k, and so (b) holds. If (b) holds, then we have Av1 ∈ span{v1}, Av2 ∈ span{v1, v2}, and so on. This means Av1, . . . , Avk belong to span{v1, . . . , vk} for each k. Choose any x ∈ span{v1, . . . , vk}. Then x = α1v1 + · · · + αkvk for some α1, . . . , αk ∈ C, and hence we have Ax = α1Av1 + · · · + αkAvk ∈ span{v1, . . . , vk}. This implies (c) holds true. Finally, if (c) holds, then Av1 = a11v1, Av2 = a12v1 + a22v2, and so on for some scalars aij. This gives (5.2), so (a) holds.

Lemma 5.3 Let A ∈ C^{n×n}. Let V ∈ C^{n×k} have full column rank. Then the columns of V span an invariant subspace of C^n under A if and only if AV = VA_V for some A_V ∈ C^{k×k}. Moreover, all eigenvalues of A_V are eigenvalues of A.

Proof. We need to show necessity (i.e., the “only if” part) and sufficiency (i.e., the “if” part) separately. To show necessity, suppose V = [v1 · · · vk] has linearly independent columns that span an invariant subspace of C^n under A. Then for each j = 1, . . . , k there exist a1j, . . . , akj ∈ C such that Avj = a1jv1 + · · · + akjvk. Thus letting A_V be the matrix whose entry (i, j) is aij yields AV = VA_V. Conversely, to show sufficiency, suppose AV = VA_V. Then, for each u ∈ C^k, the vector Vu is a linear combination of v1, . . . , vk. Thus it follows from A(Vu) = V(A_V u) = Vw, with w = A_V u ∈ C^k, that the image of each linear combination of v1, . . . , vk under A is another linear combination of v1, . . . , vk, and hence that span{v1, . . . , vk} is invariant under A. Finally, if A_V v = λv for some λ ∈ C and some nonzero v ∈ C^k, then A(Vv) = V(A_V v) = V(λv) = λ(Vv), where Vv ≠ 0 because V has full column rank, so λ is an eigenvalue of A with Vv being a corresponding eigenvector of A.

Theorem 5.4 Every square matrix A ∈ C^{n×n} has an upper-triangular matrix representation Ā. Moreover, the eigenvalues of A consist precisely of the entries on the diagonal of Ā.

Proof. We will prove the result by induction on n. Clearly the desired result holds if n = 1. As an induction hypothesis, suppose that n > 1 and that the result holds for all A ∈ C^{1×1}, . . . , A ∈ C^{(n−1)×(n−1)}. We will complete the proof by showing that, under this hypothesis, the result holds for all A ∈ C^{n×n} as well. Choose any A ∈ C^{n×n}. Let λ be an eigenvalue of A. Since there exists a nonzero vector v (i.e., an eigenvector of A associated with λ) in N(λI − A), and since dim R(λI − A) + dim N(λI − A) = n, we have that dim R(λI − A) < n. Moreover, R(λI − A) is an invariant subspace of C^n under A. By Lemma 5.3, there exists a basis {u1, . . . , uk} of R(λI − A), with k < n, such that putting U = [u1 · · · uk] ∈ C^{n×k} yields AU = UA_U for some A_U ∈ C^{k×k}. Then, by the induction hypothesis and by Lemma 5.2, there exists a basis {ṽ1, . . . , ṽk} for C^k such that A_U ṽi ∈ span{ṽ1, . . . , ṽi} for each i = 1, . . . , k. Setting vi = Uṽi then gives a basis {v1, . . . , vk} of R(λI − A) with Avi = UA_U ṽi ∈ span{v1, . . . , vi} for each i = 1, . . . , k. Also, if we extend {v1, . . . , vk} to a basis {v1, . . . , vk, vk+1, . . . , vn} of C^n, then for each i = k + 1, . . . , n we have Avi = λvi − (λI − A)vi, where (λI − A)vi ∈ span{v1, . . . , vk}, and so Avi ∈ span{v1, . . . , vi}. Therefore, by Lemma 5.2, the matrix A has an upper-triangular representation Ā = P^{-1}AP with P = [v1 · · · vn] ∈ C^{n×n}. If the entries on the diagonal of Ā are λ1, . . . , λn, then the corresponding entries of λI − Ā equal λ − λ1, . . . , λ − λn. Since λI − Ā = P^{-1}(λI − A)P (i.e., λI − A and λI − Ā are similar), and since det(λI − Ā) = ∏_{i=1}^{n} (λ − λi) (due to Ā being upper triangular), the matrix λI − A is not invertible (i.e., λ is an eigenvalue of A) if and only if λ equals one of the λj’s. This completes the proof.

This result is called the Schur triangularization theorem. It will be used in Section 5.3.2 to prove the Cayley-Hamilton Theorem, which is one of the most important linear algebraic tools.

5.2.3 Diagonal Form

Theorem 5.5 If A ∈ C^{n×n} has p distinct eigenvalues λ1, . . . , λp ∈ C, and if v1, . . . , vp ∈ C^n are corresponding eigenvectors, then {v1, . . . , vp} is linearly independent.

Proof. To prove the result by contradiction, suppose {v1, . . . , vp} is linearly dependent. Let k be the smallest positive integer satisfying vk ∈ span{v1, . . . , vk−1}, so that vk = α1v1 + · · · + αk−1vk−1 for some α1, . . . , αk−1 ∈ C. Then we have both

Avk = α1Av1 + · · · + αk−1Avk−1 = α1λ1v1 + · · · + αk−1λk−1vk−1

and

Avk = λkvk = λkα1v1 + · · · + λkαk−1vk−1,

which give α1(λk − λ1)v1 + · · · + αk−1(λk − λk−1)vk−1 = 0. By construction, {v1, . . . , vk−1} is linearly independent, and, since λk differs from each of λ1, . . . , λk−1, we must have α1 = · · · = αk−1 = 0. However, this implies vk = 0, which contradicts the fact that vk is an eigenvector of A.

This theorem implies the following: If A ∈ C^{n×n} has n distinct eigenvalues λ1, . . . , λn, and if v1, . . . , vn ∈ C^n are corresponding eigenvectors of A, then {v1, . . . , vn} is a basis for C^n. That is,

C^n = span{v1, . . . , vn} = N(λ1I − A) ⊕ · · · ⊕ N(λnI − A)

over the complex field. Moreover, if we let

P = [v1 · · · vn],

then P is the matrix of the basis {v1, . . . , vn} with respect to the standard basis {e1, . . . , en} such that the representation of A with respect to {v1, . . . , vn} is given by

\[ \bar A = P^{-1}AP = \operatorname{diag}\{\lambda_1, \ldots, \lambda_n\} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}. \tag{5.3} \]

This is because the ith column of AP equals Avi, which equals λivi, which equals the ith column of PĀ. The matrix Ā in (5.3) is called a diagonal matrix representation of A, which is a special type of upper-triangular representation. The existence of n distinct eigenvalues is sufficient but not necessary for an n-by-n matrix to be diagonalizable like this. Some of the exact conditions for diagonalizability proved in [2, Theorem 5.21] are as follows:

Theorem 5.6 Let λ1, . . . , λ p ∈ C be the distinct eigenvalues of A ∈ Cn×n. Each of the following

is equivalent to the existence of a diagonal matrix representation of A:

(a) The geometric multiplicity is the same as the algebraic multiplicity for each λi, i = 1, . . . , p;

(b) Cn has a basis consisting of eigenvectors of A;

(c) Cn = N (λ1I − A) ⊕ · · · ⊕ N (λ pI − A).

5.2.4 An Example

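Let $A, P \in \mathbb{C}^{4\times 4}$ and $A_1, A_2 \in \mathbb{C}^{2\times 2}$ be as in Section 5.1.4. Then we have

\[ AP = P \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} \;\Rightarrow\; A \begin{bmatrix} e_4 & e_2 \end{bmatrix} = \begin{bmatrix} e_4 & e_2 \end{bmatrix} A_1 \ \text{ and } \ A \begin{bmatrix} e_1 & e_3 \end{bmatrix} = \begin{bmatrix} e_1 & e_3 \end{bmatrix} A_2, \]

so Lemma 5.3 implies that both $\operatorname{span}\{e_4, e_2\}$ and $\operatorname{span}\{e_1, e_3\}$ are invariant subspaces under $A$. Indeed, a direct computation also shows that $\operatorname{span}\{e_2, e_4\}$, for instance, is invariant under $A$ because $x = \alpha e_2 + \beta e_4 = [0\ \alpha\ 0\ \beta]^T$ gives $Ax = [0\ 0\ 0\ \alpha]^T = \alpha e_4 \in \operatorname{span}\{e_2, e_4\}$. While $A_1$ is upper triangular, $A_2$ is not, so even though $P^{-1}AP$ is block diagonal, it is not upper triangular. However, the two eigenvalues $1+i$ and $1-i$ of $A_2$ are distinct, and hence by Theorem 5.5 the eigenvectors of $A_2$ span $\mathbb{C}^2$ and $A_2$ is diagonalizable. More specifically, we have $A_2 v_1 = (1+i)v_1$ and $A_2 v_2 = (1-i)v_2$, where $v_1 = [1\ \ i]^T$ and $v_2 = [1\ \ {-i}]^T$ are linearly independent, so $V^{-1} A_2 V = \operatorname{diag}\{1+i,\ 1-i\}$ with $V = [v_1\ v_2]$. Therefore, as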

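\[ Q = P \begin{bmatrix} I & 0 \\ 0 & V \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & i & -i \\ 1 & 0 & 0 & 0 \end{bmatrix} \]

implies

\[ Q^{-1} A Q = \begin{bmatrix} I & 0 \\ 0 & V^{-1} \end{bmatrix} \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & V \end{bmatrix} = \begin{bmatrix} A_1 & 0 \\ 0 & V^{-1} A_2 V \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1+i & 0 \\ 0 & 0 & 0 & 1-i \end{bmatrix}, \]

we have that $Q^{-1}AQ$ is upper triangular. Note that, according to Theorem 5.6, the matrix $A$ is not diagonalizable because, while the eigenvalue 0 has an algebraic multiplicity of 2, its geometric multiplicity is only 1.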

5.3 Jordan Canonical Form of Matrices [2, Chapter 8]

5.3.1 Generalized Eigenvectors

Due to Theorem 5.6, if $A \in \mathbb{C}^{n\times n}$ does not have enough linearly independent eigenvectors, there is no diagonal matrix representation of $A$. In such cases, we generalize the concept of eigenvectors in order to generate a basis $B$ for $\mathbb{C}^n$ such that the representation of $A$ with respect to $B$ is upper triangular.

Definition 5.7 Let $\lambda$ be an eigenvalue of $A \in \mathbb{C}^{n\times n}$. A nonzero vector $v \in \mathbb{C}^n$ is called a generalized eigenvector of $A$ corresponding to $\lambda$ if $(\lambda I - A)^k v = 0$ for some positive integer $k$.

Every eigenvector is a generalized eigenvector, but the converse is not true. For example, the matrix $A \in \mathbb{C}^{4\times 4}$ considered in Sections 5.1.4 and 5.2.4 is such that $A^2 [0\ x_2\ 0\ x_4]^T = 0$ for all $x_2, x_4 \in \mathbb{C}$. While $(0, 1, 0, 0)$ and $(0, 0, 0, 1)$ are both generalized eigenvectors of $A$ corresponding to the eigenvalue 0, the former is not an eigenvector of $A$ because $Ax = 0$ only if $x \in \operatorname{span}\{(0, 0, 0, 1)\}$.

It is easily seen that, for any $A \in \mathbb{C}^{n\times n}$, we have

\[ \{0\} = \mathcal{N}(A^0) \subset \mathcal{N}(A^1) \subset \cdots \subset \mathcal{N}(A^n) = \mathcal{N}(A^{n+1}) = \cdots, \]

\[ \mathbb{C}^n = \mathcal{R}(A^0) \supset \mathcal{R}(A^1) \supset \cdots \supset \mathcal{R}(A^n) = \mathcal{R}(A^{n+1}) = \cdots. \]

If $\lambda$ is an eigenvalue of $A \in \mathbb{C}^{n\times n}$, then replacing $A$ with $\lambda I - A$ we see that the span of generalized eigenvectors of $A$ associated with $\lambda$ equals $\mathcal{N}((\lambda I - A)^n)$. It is shown in [2, Theorem 8.10] that the algebraic multiplicity of $\lambda$ is precisely equal to the dimension of $\mathcal{N}((\lambda I - A)^n)$. A matrix $A \in \mathbb{C}^{n\times n}$ is called nilpotent if $A^k = 0$ for some positive integer $k$. If $A$ is nilpotent, then every nonzero vector in $\mathbb{C}^n$ is a generalized eigenvector of $A$ corresponding to the eigenvalue 0, and thus $\mathcal{N}(A^n) = \mathbb{C}^n$. It is shown in [2, Lemma 8.26] that every nilpotent matrix is similar to a "strictly" upper-triangular matrix, whose entries on and below the diagonal are all 0's.

5.3.2 Cayley-Hamilton Theorem

The following result, called the Cayley-Hamilton Theorem , is very useful to us.

Theorem 5.8 If p(s) = det(sI − A) where A ∈ Cn×n, then p(A) = 0.




Proof. Suppose $\{v_1, \dots, v_n\}$ is a basis for $\mathbb{C}^n$ with respect to which the representation of $A$ has an upper-triangular form $\bar{A} = (a_{ij})$ as in (5.2). To prove $p(A) = 0$, we need to show $p(A) v_k = 0$ for all $k = 1, \dots, n$. Since $p(A)$ is a constant multiple of $\prod_{i=1}^{n} (\lambda_i I - A)$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$, and since the factors $\lambda_i I - A$ commute with one another, it suffices to show $\prod_{i=1}^{k} (\lambda_i I - A) v_k = 0$ for $k = 1, \dots, n$. To do this by induction, let $k = 1$. Then we have $A v_1 = a_{11} v_1$, where $a_{11} = \lambda_1$ for some eigenvalue $\lambda_1$ of $A$, giving $(\lambda_1 I - A) v_1 = 0$. Now, as an induction hypothesis, suppose there exists a positive integer $j$ such that $\prod_{i=1}^{k} (\lambda_i I - A) v_k = 0$ for $k = 1, \dots, j$. We will complete the proof by showing that, under this hypothesis, we have $\prod_{i=1}^{j+1} (\lambda_i I - A) v_{j+1} = 0$ as well. Equation (5.2), together with Theorem 5.4, gives $(a_{j+1,j+1} I - A) v_{j+1} \in \operatorname{span}\{v_1, \dots, v_j\}$, where $a_{j+1,j+1} = \lambda_{j+1}$ for some eigenvalue $\lambda_{j+1}$ of $A$. Thus, by the induction hypothesis, $\prod_{i=1}^{j} (\lambda_i I - A)$ applied to $(\lambda_{j+1} I - A) v_{j+1}$ gives 0. This completes the proof.

Corollary 5.9 If $A \in F^{n\times n}$ with either $F = \mathbb{R}$ or $F = \mathbb{C}$, and if $f(s)$ is a polynomial whose coefficients are in $F$, then there exist $\beta_0, \dots, \beta_{n-1} \in F$ such that

\[ f(A) = \beta_0 I + \beta_1 A + \cdots + \beta_{n-1} A^{n-1}. \]

Proof. If $p(s) = \det(sI - A)$ is the characteristic polynomial of $A$, then the coefficients of $p(s)$ are in $F$. Moreover, there exist two unique polynomials $q(s)$ and $r(s)$, whose coefficients are in $F$, such that $f(s) = p(s)q(s) + r(s)$, where the degree of $r(s)$ is at most $n - 1$. By the Cayley-Hamilton Theorem, we have $p(A) = 0$, and so $f(A) = p(A)q(A) + r(A) = r(A)$.

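For example, let us use the Cayley-Hamilton theorem and its corollary to evaluate $A^t$ for all $t = 0, 1, \dots$, where

\[ A = \begin{bmatrix} 1/2 & 1 \\ 0 & -1/2 \end{bmatrix}. \]

The characteristic polynomial of $A$ is given by $p(s) = \det(sI - A) = (s - 1/2)(s + 1/2)$. Let $f(s) = s^t$. Then by Corollary 5.9 there are $\beta_0(t), \beta_1(t) \in \mathbb{R}$ such that $f(A) = \beta_0(t) I + \beta_1(t) A$. Let $r(s) = \beta_0(t) + \beta_1(t) s$. Then $f(s) - r(s)$ has $p(s)$ as a factor if and only if $f(s) - r(s) = 0$ for $s = 1/2$ and for $s = -1/2$. That is, we have $f(A) = r(A)$ if

\[ \begin{bmatrix} 1 & 1/2 \\ 1 & -1/2 \end{bmatrix} \begin{bmatrix} \beta_0(t) \\ \beta_1(t) \end{bmatrix} = \begin{bmatrix} (1/2)^t \\ (-1/2)^t \end{bmatrix} \]

is satisfied. Solving this equation for $\beta_0(t)$ and $\beta_1(t)$, we obtain that

\[ A^t = 2^{-t} I = \begin{bmatrix} 2^{-t} & 0 \\ 0 & 2^{-t} \end{bmatrix} \ \text{ if $t$ is even, and } \ A^t = 2^{-(t-1)} A = \begin{bmatrix} 2^{-t} & 2^{-(t-1)} \\ 0 & -2^{-t} \end{bmatrix} \ \text{ if $t$ is odd.} \]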
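The computation above can be checked numerically. The following is a minimal sketch in Python using exact rational arithmetic; the helper names (`matmul`, `matpow`, `power_via_cayley_hamilton`) are illustrative and not part of the notes. It solves the 2-by-2 system for β0(t) and β1(t) in closed form and compares β0(t)I + β1(t)A with the directly computed power.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matpow(X, t):
    """X**t by repeated multiplication (t a nonnegative integer)."""
    n = len(X)
    R = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(t):
        R = matmul(R, X)
    return R

half = Fraction(1, 2)
A = [[half, Fraction(1)], [Fraction(0), -half]]

def power_via_cayley_hamilton(t):
    # Solve  b0 + b1/2 = (1/2)^t  and  b0 - b1/2 = (-1/2)^t  for b0, b1.
    f_plus, f_minus = half ** t, (-half) ** t
    b0 = (f_plus + f_minus) / 2
    b1 = f_plus - f_minus
    # A^t = b0*I + b1*A by Corollary 5.9.
    return [[b0 * int(i == j) + b1 * A[i][j] for j in range(2)]
            for i in range(2)]

agree = all(power_via_cayley_hamilton(t) == matpow(A, t) for t in range(10))
print(agree)
```

For even t the solution is β0 = 2^(-t), β1 = 0, and for odd t it is β0 = 0, β1 = 2^(-(t-1)), which matches the two cases stated above.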

5.3.3 Decomposition Theorem for Matrices

The following theorem, called the primary decomposition theorem for linear transformations, shows that every linear transformation on a complex finite-dimensional vector space is decomposed into pieces, each of which is a nilpotent matrix plus a scalar multiple of the identity matrix.

Theorem 5.10 Let $\lambda_1, \dots, \lambda_p$ be the distinct eigenvalues of $A \in \mathbb{C}^{n\times n}$. Let $m_k$ be the algebraic multiplicity of $\lambda_k$, and let $\mathcal{V}_k$ be the subspace of $\mathbb{C}^n$ spanned by the generalized eigenvectors of $A$ associated with $\lambda_k$ for $k = 1, \dots, p$. Then

\[ \mathbb{C}^n = \mathcal{V}_1 \oplus \cdots \oplus \mathcal{V}_p = \mathcal{N}((\lambda_1 I - A)^n) \oplus \cdots \oplus \mathcal{N}((\lambda_p I - A)^n), \]

where each $\mathcal{V}_k$ is invariant under $A$ with $\mathcal{V}_k = \mathcal{N}((\lambda_k I - A)^n)$ and $\dim \mathcal{V}_k = m_k$. Moreover, if $V_k$ is a full-column-rank matrix whose columns span $\mathcal{V}_k$, then $A V_k = V_k A_k$ for some $A_k \in \mathbb{C}^{m_k \times m_k}$ such that $\lambda_k I - A_k$ is nilpotent for $k = 1, \dots, p$.

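Proof. We have seen in Section 5.3.1 that $\mathcal{V}_k = \mathcal{N}((\lambda_k I - A)^n)$ and $\dim \mathcal{V}_k = m_k$ for each $k$. If $x \in \mathcal{V}_k$, then $(\lambda_k I - A)^n (Ax) = A (\lambda_k I - A)^n x = A0 = 0$, so each $\mathcal{V}_k$ is invariant under $A$. By Lemma 5.3, there exists an $A_k \in \mathbb{C}^{m_k \times m_k}$ such that $A V_k = V_k A_k$ for each $k$, so that $(\lambda_k I - A)^n V_k = V_k (\lambda_k I - A_k)^n = 0$. Since $V_k$ has full column rank, this implies $(\lambda_k I - A_k)^n = 0$. Thus $\lambda_k I - A_k$ is nilpotent for each $k$.

Now it remains to show $\mathbb{C}^n = \mathcal{V}_1 \oplus \cdots \oplus \mathcal{V}_p$. Let $\mathcal{V} = \mathcal{V}_1 + \cdots + \mathcal{V}_p$. Then $\mathcal{V}$ is a subspace of $\mathbb{C}^n$ with $\dim \mathcal{V} \le n$, where $n = \sum_{k=1}^{p} m_k = \sum_{k=1}^{p} \dim \mathcal{V}_k$, and so the proof is complete once we show $\dim \mathcal{V} = n$. Clearly $\mathcal{V}$ is invariant under $A$. If $V \in \mathbb{C}^{n \times \dim \mathcal{V}}$ is a full-column-rank matrix whose columns span $\mathcal{V}$, then Lemma 5.3 implies that there exists a matrix $A_{\mathcal{V}} \in \mathbb{C}^{\dim \mathcal{V} \times \dim \mathcal{V}}$ such that $AV = V A_{\mathcal{V}}$. Suppose $(\lambda I - A)^n v = 0$ for some $\lambda \in \mathbb{C}$ and $v \in \mathcal{V}$. Then $v = Vu$ for some $u \in \mathbb{C}^{\dim \mathcal{V}}$, giving $(\lambda I - A)^n V u = V (\lambda I - A_{\mathcal{V}})^n u = 0$. Because the columns of $V$ are linearly independent, we have $(\lambda I - A_{\mathcal{V}})^n u = 0$. That is, $A_{\mathcal{V}}$ has the same eigenvalues, with the same multiplicities, as $A$. This is possible only if $\dim \mathcal{V} = n$.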

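Corollary 5.11 Let $\lambda_1, \dots, \lambda_p$ be the distinct eigenvalues of $A \in \mathbb{C}^{n\times n}$. Then there is a basis $B$ of $\mathbb{C}^n$ consisting of generalized eigenvectors of $A$ such that, with respect to $B$, the matrix $A$ has a block diagonal form $\bar{A}$ as follows:

\[ \bar{A} = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_p \end{bmatrix}, \quad \text{where } A_k = \begin{bmatrix} \lambda_k & & * \\ & \ddots & \\ 0 & & \lambda_k \end{bmatrix}, \quad k = 1, \dots, p. \]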

Proof. For each $k$, let $\mathcal{V}_k$ be the subspace of generalized eigenvectors of $A$ corresponding to $\lambda_k$; let $V_k$ be a full-column-rank matrix whose columns span $\mathcal{V}_k$. Then by Theorem 5.10 we have $A V_k = V_k A_k$ for some matrix $A_k \in \mathbb{C}^{\dim \mathcal{V}_k \times \dim \mathcal{V}_k}$ such that $\lambda_k I - A_k$ is nilpotent. Choose an invertible matrix $U_k \in \mathbb{C}^{\dim \mathcal{V}_k \times \dim \mathcal{V}_k}$ such that $U_k^{-1} (\lambda_k I - A_k) U_k$ is strictly upper triangular (with zero diagonal entries) for each $k$. Then, because

\[ A V_k U_k = V_k A_k U_k = V_k U_k (U_k^{-1} A_k U_k) = V_k U_k \bigl( \lambda_k I - U_k^{-1} (\lambda_k I - A_k) U_k \bigr) \]

holds for each $k$, the matrix $\bar{A} = P^{-1} A P$ has the desired form with $P = [V_1 U_1\ \cdots\ V_p U_p]$.

5.3.4 Jordan Form

The primary decomposition theorem for linear transformations can be used to show that there is a basis of $\mathbb{C}^n$ with respect to which the matrix of a linear transformation on the finite-dimensional complex vector space contains zeros everywhere except possibly on the diagonal and the line directly above the diagonal. The following lemma is proved in [2, Lemma 8.40]:

Lemma 5.12 Let $N \in \mathbb{C}^{n\times n}$ be nilpotent. For each nonzero $v \in \mathbb{C}^n$, let $m(v)$ denote the largest nonnegative integer such that $N^{m(v)} v \neq 0$. Then there exist $v_1, \dots, v_k \in \mathbb{C}^n$ such that

(a) $\{v_1, N v_1, \dots, N^{m(v_1)} v_1, \dots, v_k, N v_k, \dots, N^{m(v_k)} v_k\}$ is a basis of $\mathbb{C}^n$;

(b) $\{N^{m(v_1)} v_1, \dots, N^{m(v_k)} v_k\}$ is a basis of $\mathcal{N}(N)$.

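For example, if $v_1 = [1\ 0\ 0\ 0\ 0]^T$ and $v_2 = [0\ 0\ 0\ 1\ 0]^T$, then the nilpotent matrix

\[ N = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix} \]

has $m(v_1) = 2$ and $m(v_2) = 1$. Moreover, $\{v_1, N v_1, N^2 v_1, v_2, N v_2\}$ is a basis of $\mathbb{C}^5$, and $\{N^2 v_1, N v_2\}$ is a basis of $\mathcal{N}(N)$, which has dimension 2.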
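As a quick sanity check of this example, the chains generated by v1 and v2 can be computed directly. This is a small sketch in plain Python; the helper names `apply` and `chain` are hypothetical and chosen only for illustration.

```python
# The 5-by-5 nilpotent matrix N from the example, as a list of rows.
N = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def chain(M, v):
    """Return [v, Mv, M^2 v, ...], stopping at the last nonzero vector."""
    out = []
    while any(v):
        out.append(v)
        v = apply(M, v)
    return out

v1 = [1, 0, 0, 0, 0]
v2 = [0, 0, 0, 1, 0]
c1, c2 = chain(N, v1), chain(N, v2)

# m(v) is the largest power of N that does not annihilate v.
m1, m2 = len(c1) - 1, len(c2) - 1
print(m1, m2)          # chain lengths minus one
print(c1 + c2)         # the five basis vectors of Lemma 5.12(a)
print(c1[-1], c2[-1])  # the two null-space basis vectors of Lemma 5.12(b)
```

Here the five chain vectors turn out to be the standard basis vectors in some order, so their linear independence is immediate.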

Given an $A \in \mathbb{C}^{n\times n}$, a basis of $\mathbb{C}^n$ is called a Jordan basis for $A$ if, with respect to this basis, $A$ has a block diagonal matrix

\[ J = \begin{bmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_m \end{bmatrix}, \quad \text{where } J_k = \begin{bmatrix} \lambda_k & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_k \end{bmatrix}, \quad k = 1, \dots, m. \]

This matrix $J$ is called a Jordan canonical form of $A$, and is both upper triangular and block diagonal. Each block $J_k$ on the diagonal of $J$ is called a Jordan block. A Jordan block $J_k$ may be a 1-by-1 block consisting of a single eigenvalue; otherwise, the diagonal of $J_k$ is filled with some eigenvalue $\lambda_k$, the line directly above the diagonal is filled with 1's, and all other entries are 0. If $p$ denotes the number of distinct eigenvalues of $A$, then the number $m$ of Jordan blocks satisfies $p \le m \le n$.

Theorem 5.13 There exists a Jordan basis for every A ∈ Cn×n.

Proof. First consider a nilpotent matrix $N$. Choose the vectors $v_1, \dots, v_k$ given by Lemma 5.12. Then, for each $j$, we have

\[ N \begin{bmatrix} N^{m(v_j)} v_j & \cdots & N v_j & v_j \end{bmatrix} = \begin{bmatrix} N^{m(v_j)} v_j & \cdots & N v_j & v_j \end{bmatrix} \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix}. \]

Thus a Jordan form of $N$ consists of $k$ Jordan blocks. Now consider a general $A \in \mathbb{C}^{n\times n}$. Let $\lambda_1, \dots, \lambda_p$ be the distinct eigenvalues of $A$, with $\mathcal{V}_1, \dots, \mathcal{V}_p$ the corresponding subspaces of generalized eigenvectors. Let $V_1, \dots, V_p$ be full-column-rank matrices such that the columns of $V_j$ span $\mathcal{V}_j$ for each $j$. Then by Theorem 5.10, we have $\mathbb{C}^n = \mathcal{V}_1 \oplus \cdots \oplus \mathcal{V}_p$ and $A V_j = V_j A_j$, $j = 1, \dots, p$, where each $\lambda_j I - A_j$ is nilpotent. Thus $A_j - \lambda_j I$ is nilpotent as well, and as we have just seen there is a Jordan basis for it for each $j$. Putting these bases together gives a basis of $\mathbb{C}^n$ that is a Jordan basis for $A$.

5.3.5 How to Obtain Jordan Canonical Form

We now know that a square matrix is always similar to its Jordan form, and the vectors that form a Jordan basis are generalized eigenvectors of the matrix. A vector $v \in \mathbb{C}^n$ is said to be a generalized eigenvector of grade $k$ of $A$ associated with the eigenvalue $\lambda$ if

\[ (A - \lambda I)^k v = 0 \quad \text{and} \quad (A - \lambda I)^{k-1} v \neq 0. \]

In this case, define

\[ \begin{aligned} v_k &= v, \\ v_{k-1} &= (A - \lambda I) v = (A - \lambda I) v_k, \\ v_{k-2} &= (A - \lambda I)^2 v = (A - \lambda I) v_{k-1}, \\ &\ \,\vdots \\ v_1 &= (A - \lambda I)^{k-1} v = (A - \lambda I) v_2. \end{aligned} \]

This set of vectors $\{v_1, \dots, v_k\}$ is called a chain of generalized eigenvectors of length $k$. It is clear that each $\mathcal{N}((A - \lambda I)^j)$ is a subspace of $\mathcal{N}((A - \lambda I)^{j+1})$. Moreover, each $v_j$ is in $\mathcal{N}((A - \lambda I)^j)$ but not in $\mathcal{N}((A - \lambda I)^{j-1})$. Therefore, the chain $\{v_1, \dots, v_k\}$ is linearly independent. The following example is from [4, Pages 39–41] and illustrates how to obtain a Jordan basis for $A$:

1. Choose a repeated eigenvalue $\lambda$ of algebraic multiplicity $m$. The task here is to find $m$ linearly independent generalized eigenvectors of $A$ associated with $\lambda$. This is achieved by searching for chains of generalized eigenvectors of various lengths.

2. Compute the ranks of $(A - \lambda I)^k$, $k = 0, 1, \dots$, until the nullity of $(A - \lambda I)^k$ equals $m$. Such a $k$ exists and is no more than $m$. Let $\mathcal{N}_j$ denote the null space of $(A - \lambda I)^j$, and let $\nu_j = \dim \mathcal{N}_j$. For the purpose of illustration, assume $n = 10$, $m = 8$, $k = 4$, and $\nu_0 = 0$, $\nu_1 = 3$, $\nu_2 = 6$, $\nu_3 = 7$, and $\nu_4 = 8$.

3. Find generalized eigenvectors of grade $k$ associated with $\lambda$ and generate chains of generalized eigenvectors based on them. By assumption, we have $\mathcal{N}_3 \subset \mathcal{N}_4$ and $\nu_4 - \nu_3 = 1$, and so we can find one and only one linearly independent vector $u$, which is in $\mathcal{N}_4$ but not in $\mathcal{N}_3$, such that $(A - \lambda I)^4 u = 0$ and $(A - \lambda I)^3 u \neq 0$. From this $u$, generate a chain of four generalized eigenvectors as $u_1 = (A - \lambda I)^3 u$, $u_2 = (A - \lambda I)^2 u$, $u_3 = (A - \lambda I) u$, and $u_4 = u$.

4. Find generalized eigenvectors of grade $k - 1$ associated with $\lambda$ and generate chains of generalized eigenvectors based on them. By assumption, we have $\mathcal{N}_2 \subset \mathcal{N}_3$ and $\nu_3 - \nu_2 = 1$, and so there is only one linearly independent vector which is in $\mathcal{N}_3$ but not in $\mathcal{N}_2$. The vector $u_3$ obtained in Step 3 above is such a vector, and the chain of generalized eigenvectors it generates is nothing but $\{u_1, u_2, u_3\}$, which we already have.

...

5. Find generalized eigenvectors of grade 2 associated with $\lambda$ and generate chains of generalized eigenvectors based on them. Because $\nu_2 - \nu_1 = 3$ by assumption, there are three linearly independent vectors that are in $\mathcal{N}_2$ but not in $\mathcal{N}_1$. Since $u_2$ is one of them, we can find two vectors $v$ and $w$ such that $\{u_2, v, w\}$ is linearly independent and such that $(A - \lambda I)^2 v = 0$, $(A - \lambda I) v \neq 0$, $(A - \lambda I)^2 w = 0$, and $(A - \lambda I) w \neq 0$. From $v$ and $w$, generate two chains of generalized eigenvectors of length 2, say $\{v_1, v_2\}$ and $\{w_1, w_2\}$.

6. Find eigenvectors (i.e., generalized eigenvectors of grade 1) associated with $\lambda$. By assumption, the number of linearly independent vectors in $\mathcal{N}_1$ is equal to $\nu_1 - \nu_0 = 3$. However, $u_1$, $v_1$, and $w_1$ are such vectors, and so there is no need to search for other vectors in $\mathcal{N}_1$.

7. Finally, form a linearly independent set of $m$ generalized eigenvectors. This completes the search for eight generalized eigenvectors of $A$ associated with $\lambda$. These vectors are $u_1, u_2, u_3, u_4, v_1, v_2, w_1$, and $w_2$. The generalized eigenvectors generated this way are guaranteed to be linearly independent [4, Theorem 2-10]. Moreover, the generalized eigenvectors associated with different eigenvalues are linearly independent due to Theorem 5.10.

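8. Obtain a Jordan-form matrix. For each eigenvalue of $A$, its geometric multiplicity is equal to the number of Jordan blocks associated with it, and its algebraic multiplicity is the sum of the orders of all Jordan blocks associated with it. Let $V = [u_1\ u_2\ u_3\ u_4\ v_1\ v_2\ w_1\ w_2]$. Because $(A - \lambda I) u_1 = 0$, $(A - \lambda I) u_2 = u_1$, $(A - \lambda I) u_3 = u_2$, $(A - \lambda I) u_4 = u_3$, $(A - \lambda I) v_1 = 0$, $(A - \lambda I) v_2 = v_1$, $(A - \lambda I) w_1 = 0$, and $(A - \lambda I) w_2 = w_1$, we have $AV = V A_V$, where

\[ A_V = \begin{bmatrix} \lambda & 1 & 0 & 0 & & & & \\ 0 & \lambda & 1 & 0 & & & & \\ 0 & 0 & \lambda & 1 & & & & \\ 0 & 0 & 0 & \lambda & & & & \\ & & & & \lambda & 1 & & \\ & & & & 0 & \lambda & & \\ & & & & & & \lambda & 1 \\ & & & & & & 0 & \lambda \end{bmatrix}, \]

with all blank entries equal to 0. Indeed, the number of Jordan blocks associated with $\lambda$ is 3, which equals the geometric multiplicity of $\lambda$. Also, these Jordan blocks form $A_V \in \mathbb{C}^{8\times 8}$, where 8 is the algebraic multiplicity of $\lambda$.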
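The block sizes found in the illustration can also be read off directly from the nullities dim N_j = 0, 3, 6, 7, 8. The sketch below assumes the standard counting fact that the number of Jordan blocks of order at least j equals dim N_j minus dim N_(j-1); this fact is consistent with the steps above but is not stated in that form in the notes, and the function name `jordan_block_sizes` is hypothetical.

```python
def jordan_block_sizes(nullities):
    """Jordan block orders for one eigenvalue.

    nullities[j] is the nullity of (A - lam*I)**j for j = 0, 1, ..., k,
    with nullities[0] == 0 and nullities[k] equal to the algebraic
    multiplicity of lam.
    """
    k = len(nullities) - 1
    # at_least[j-1] = number of blocks of order >= j (assumed counting fact).
    at_least = [nullities[j] - nullities[j - 1] for j in range(1, k + 1)]
    at_least.append(0)  # no block has order >= k + 1
    sizes = []
    for j in range(1, k + 1):
        exactly_j = at_least[j - 1] - at_least[j]
        sizes.extend([j] * exactly_j)
    return sorted(sizes, reverse=True)

sizes = jordan_block_sizes([0, 3, 6, 7, 8])
print(sizes)
```

For the assumed nullities this yields one block of order 4 and two blocks of order 2, in agreement with the chains {u1, u2, u3, u4}, {v1, v2}, and {w1, w2}.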

5.3.6 An Example

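Let $A \in \mathbb{C}^{4\times 4}$ be as in Section 5.1.4 and Section 5.2.4:

\[ A^0 = I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad A^1 = A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \quad A^2 = \begin{bmatrix} 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 \\ -2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \ \dots. \]

The roots of the characteristic polynomial $\det(sI - A) = s^2[(s-1)^2 + 1]$ are $0$, $0$, $1+i$, and $1-i$. The repeated eigenvalue $\lambda = 0$ has algebraic multiplicity $m = 2$. We have $\operatorname{rank}(A - \lambda I)^0 = \operatorname{rank} A^0 = 4$, $\operatorname{rank}(A - \lambda I)^1 = \operatorname{rank} A = 3$, and $\operatorname{rank}(A - \lambda I)^2 = \operatorname{rank} A^2 = 2$, so $k = 2$ is the smallest number such that $\dim \mathcal{N}((A - \lambda I)^k) = m = 2$. A generalized eigenvector $u$ such that $(A - \lambda I)^k u = A^2 u = 0$ and $(A - \lambda I)^{k-1} u = Au \neq 0$ is then $u = [0\ 1\ 0\ 0]^T$. From this $u$, we generate a chain of $k = 2$ generalized eigenvectors as $u_2 = u = [0\ 1\ 0\ 0]^T$ and $u_1 = (A - \lambda I) u = Au = [0\ 0\ 0\ 1]^T$. Since the eigenvalues $1+i$ and $1-i$ give two more (generalized) eigenvectors, namely $v_1 = [1\ 0\ \ i\ \ 0]^T$ and $v_2 = [1\ 0\ {-i}\ \ 0]^T$, respectively, we have obtained a Jordan basis $\{u_1, u_2, v_1, v_2\}$. Letting $Q = [u_1\ u_2\ v_1\ v_2]$, which happens to be the same as in Section 5.2.4, gives a Jordan canonical form $J$ of $A$ as

\[ Q = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & i & -i \\ 1 & 0 & 0 & 0 \end{bmatrix} \;\Rightarrow\; J = Q^{-1} A Q = \begin{bmatrix} J_1 & 0 & 0 \\ 0 & J_2 & 0 \\ 0 & 0 & J_3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1+i & 0 \\ 0 & 0 & 0 & 1-i \end{bmatrix}, \]

where the three Jordan blocks are $J_1 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $J_2 = 1 + i$, and $J_3 = 1 - i$.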
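The relation J = Q^(-1) A Q is equivalent to AQ = QJ, which avoids computing an inverse and is easy to verify. The following minimal sketch does so with Python's built-in complex numbers; the helper `matmul` is illustrative only.

```python
def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 0, 1, 0],
     [0, 0, 0, 0],
     [-1, 0, 1, 0],
     [0, 1, 0, 0]]

# Columns of Q are u1, u2, v1, v2 from the example.
Q = [[0, 0, 1, 1],
     [0, 1, 0, 0],
     [0, 0, 1j, -1j],
     [1, 0, 0, 0]]

J = [[0, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 1 + 1j, 0],
     [0, 0, 0, 1 - 1j]]

AQ = matmul(A, Q)
QJ = matmul(Q, J)
print(AQ == QJ)  # True exactly, since every entry is a small Gaussian integer
```

Reading AQ = QJ column by column recovers the chain relations A u1 = 0 and A u2 = u1, together with the two eigenvector equations.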

References

[1] T. Kailath, Linear Systems . Englewood Cliffs, NJ: Prentice-Hall, 1980.

[2] S. Axler, Linear Algebra Done Right , 2nd ed. New York, NY: Springer, 1997.


[4] C.-T. Chen, Linear System Theory and Design, 2nd ed. New York, NY: Oxford University Press, 1984.




VI. State Transition Matrix



Fall 2010

6.1 Introduction

Typical signal spaces are (infinite-dimensional) vector spaces. Consider the space $C(J, \mathbb{R}^n)$ of $\mathbb{R}^n$-valued continuous functions on an interval $J = (a, b) \subset \mathbb{R}$. For $f, g \in C(J, \mathbb{R}^n)$, define $f + g \in C(J, \mathbb{R}^n)$ to be the function satisfying

\[ (f + g)(t) = f(t) + g(t) \quad \text{for all } t \in J. \]

Also, for $\alpha \in \mathbb{R}$ and $f \in C(J, \mathbb{R}^n)$, define $\alpha f \in C(J, \mathbb{R}^n)$ to be the function satisfying

\[ (\alpha f)(t) = \alpha f(t) \quad \text{for all } t \in J. \]

Then it is straightforward to show that $C(J, \mathbb{R}^n)$ is a vector space over the real field, where the origin $0 \in C(J, \mathbb{R}^n)$ is the function which is identically zero. Under this definition, a function $f \in C(J, \mathbb{R}^n)$ is a linear combination of functions $g_1, \dots, g_m \in C(J, \mathbb{R}^n)$ if there are $\alpha_1, \dots, \alpha_m \in \mathbb{R}$ such that $f(t) = (\alpha_1 g_1 + \cdots + \alpha_m g_m)(t)$ for all $t \in J$. A nonempty finite subset $\{g_1, \dots, g_m\}$ of $C(J, \mathbb{R}^n)$ is then linearly independent if and only if $(\alpha_1 g_1 + \cdots + \alpha_m g_m)(t) = 0$ for all $t \in J$ implies $\alpha_1 = \cdots = \alpha_m = 0$.

For simplicity, we will focus on LTI systems of the form $\dot{x}(t) = Ax(t)$. However, all the theorems up to Section 6.3.2 inclusive carry over to LTV systems of the form $\dot{x}(t) = A(t)x(t)$ as long as $A(\cdot)$ is continuous; see [1, Sections 2.3 & 2.4] and [2, Pages 50–73]. These results hold true even if $A(\cdot)$ is piecewise continuous; a function is said to be piecewise continuous if, on each bounded interval in its domain, it is continuous except possibly for a finite number of discontinuities. If $A(\cdot)$ is piecewise continuous (or piecewise constant in particular), then one can solve $\dot{x}(t) = A(t)x(t)$ piece by piece and then put together the results to obtain an overall solution.

6.2 Fundamental Matrices [3, Section 3.2.1]

6.2.1 Solution Space

Consider the homogeneous linear time-invariant system

\[ \dot{x}(t) = A x(t), \quad t \in \mathbb{R}, \tag{6.1} \]

where $A \in \mathbb{R}^{n\times n}$ is a constant matrix. We have seen that this system, subject to any initial condition $x(t_0) = x_0$, has a unique solution $\varphi$ on $\mathbb{R}$; that is, there exists a unique $\varphi \in C(\mathbb{R}, \mathbb{R}^n)$ such that $\dot{\varphi}(t) = A\varphi(t)$ for all $t \in \mathbb{R}$ and such that $\varphi(t_0) = x_0$. It turns out that the set of solutions to (6.1) is a finite-dimensional subspace of $C(\mathbb{R}, \mathbb{R}^n)$, which is infinite dimensional.





Theorem 6.1 The set of all solutions to (6.1) forms an $n$-dimensional linear subspace of $C(\mathbb{R}, \mathbb{R}^n)$.

Proof. Let $\mathcal{V}$ be the space of all solutions of (6.1) on $\mathbb{R}$. It is easy to verify that $\mathcal{V}$ is a linear subspace of $C(\mathbb{R}, \mathbb{R}^n)$. Choose a set of $n$ linearly independent vectors $\{x_1, \dots, x_n\}$ in $\mathbb{R}^n$. If $t_0 \in \mathbb{R}$, then there exist $n$ solutions $\varphi_1, \dots, \varphi_n$ of (6.1) such that $\varphi_1(t_0) = x_1, \dots, \varphi_n(t_0) = x_n$. The linear independence of $\{\varphi_1, \dots, \varphi_n\}$ follows from that of $\{x_1, \dots, x_n\}$. It remains to show that $\{\varphi_1, \dots, \varphi_n\}$ spans $\mathcal{V}$. Let $\varphi$ be any solution of (6.1) on $\mathbb{R}$ such that $\varphi(t_0) = x_0$. There exist unique scalars $\alpha_1, \dots, \alpha_n \in \mathbb{R}$ such that $x_0 = \alpha_1 x_1 + \cdots + \alpha_n x_n$, and thus $\psi = \alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n$ is a solution of (6.1) on $\mathbb{R}$ such that $\psi(t_0) = x_0$. However, because of the uniqueness of a solution satisfying the given initial data, we have that $\varphi = \alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n$. This shows that every solution of (6.1) is a linear combination of $\varphi_1, \dots, \varphi_n$, and hence that $\{\varphi_1, \dots, \varphi_n\}$ is a basis of the solution space.

6.2.2 Matrix Differential Equations

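With $A \in \mathbb{R}^{n\times n}$ as before, consider the matrix differential equation

\[ \dot{X}(t) = A X(t). \tag{6.2} \]

If the columns of $X(t) \in \mathbb{R}^{n\times n}$ are denoted $x_1(t), \dots, x_n(t) \in \mathbb{R}^n$, then (6.2) is equivalent to

\[ \begin{bmatrix} \dot{x}_1(t) \\ \vdots \\ \dot{x}_n(t) \end{bmatrix} = \begin{bmatrix} A & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A \end{bmatrix} \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}, \]

so a matrix differential equation of the form (6.2) is a system of $n^2$ differential equations.

Definition 6.2 A set $\{\varphi_1, \dots, \varphi_n\} \subset C(\mathbb{R}, \mathbb{R}^n)$ of $n$ linearly independent solutions of (6.1) is called a fundamental set of solutions of (6.1), and the $\mathbb{R}^{n\times n}$-valued function

\[ \Phi = \begin{bmatrix} \varphi_1 & \cdots & \varphi_n \end{bmatrix} \]

is called a fundamental matrix of (6.1).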

Theorem 6.3 A fundamental matrix Φ of (6.1) satisfies the matrix equation (6.2).

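Proof. The result is immediate from

\[ \dot{\Phi}(t) = \begin{bmatrix} \dot{\varphi}_1(t) & \cdots & \dot{\varphi}_n(t) \end{bmatrix} = \begin{bmatrix} A\varphi_1(t) & \cdots & A\varphi_n(t) \end{bmatrix} = A \begin{bmatrix} \varphi_1(t) & \cdots & \varphi_n(t) \end{bmatrix} = A\Phi(t). \]

Therefore, a fundamental matrix is a matrix-valued function whose columns span the space of solutions to the system of $n$ differential equations (6.1).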

6.2.3 Characterizations of Fundamental Matrices

It can be shown that, if $\Phi$ is a solution of the matrix differential equation (6.2), then

\[ \det \Phi(t) = e^{(\operatorname{tr} A)(t - \tau)} \det \Phi(\tau) \tag{6.3} \]

for all $t, \tau \in \mathbb{R}$. This result is called Abel's formula and is proved in [1, Theorem 2.3.3]. Its immediate consequence is that, since $t, \tau$ are arbitrary, we have either $\det \Phi(t) \neq 0$ for all $t \in \mathbb{R}$ or $\det \Phi(t) = 0$ for all $t \in \mathbb{R}$. In fact, we have the following characterizations of fundamental matrices, which we obtain without using Abel's formula.




Theorem 6.4 A solution $\Phi$ of the matrix differential equation (6.2) is a fundamental matrix of (6.1) if and only if $\Phi(t)$ is nonsingular for all $t \in \mathbb{R}$.

Proof. To show necessity by contradiction, suppose that $\Phi = [\varphi_1\ \cdots\ \varphi_n]$ is a fundamental matrix for (6.1) and that $\det \Phi(t_0) = 0$ for some $t_0 \in \mathbb{R}$. Then, since $\Phi(t_0)$ is singular, the set $\{\varphi_1(t_0), \dots, \varphi_n(t_0)\} \subset \mathbb{R}^n$ is linearly dependent (over the real field), so that there are $\alpha_1, \dots, \alpha_n \in \mathbb{R}$, not all zero, such that $\sum_{i=1}^{n} \alpha_i \varphi_i(t_0) = 0$. Every linear combination of the columns of a fundamental matrix is a solution of (6.1), and so $\sum_{i=1}^{n} \alpha_i \varphi_i$ is a solution of (6.1). Due to the uniqueness of the solution, this, along with the initial condition $\sum_{i=1}^{n} \alpha_i \varphi_i(t_0) = 0$, implies that $\sum_{i=1}^{n} \alpha_i \varphi_i$ is identically zero, which contradicts the fact that $\varphi_1, \dots, \varphi_n$ are linearly independent. Thus, we conclude that $\det \Phi(t) \neq 0$ for all $t \in \mathbb{R}$. To show sufficiency, let $\Phi$ be a solution of (6.2) and suppose that $\det \Phi(t) \neq 0$ for all $t \in \mathbb{R}$. Then the columns of $\Phi$ form a linearly independent set of vectors for all $t \in \mathbb{R}$. Hence, $\Phi$ is a fundamental matrix of (6.1).

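For example, consider three $\mathbb{R}^{3\times 3}$-valued functions

\[ \Phi_1(t) = \begin{bmatrix} 1 & 2t & t^2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}, \quad \Phi_2(t) = \begin{bmatrix} 1 & t & t^2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}, \quad \Phi_3(t) = \begin{bmatrix} 1 & t & t^2 \\ 0 & 1 & t \\ 0 & 0 & t \end{bmatrix}, \quad t \in \mathbb{R}. \]

If we write $\Phi_1 = [\varphi_1\ \varphi_2\ \varphi_3]$, where $\varphi_i \in C(\mathbb{R}, \mathbb{R}^3)$ are given by $\varphi_1(t) = [1\ 0\ 0]^T$, $\varphi_2(t) = [2t\ 1\ 0]^T$, and $\varphi_3(t) = [t^2\ t\ 1]^T$ for all $t \in \mathbb{R}$, then $\{\varphi_1, \varphi_2, \varphi_3\}$ is a linearly independent set in $C(\mathbb{R}, \mathbb{R}^3)$ because $\sum_{i=1}^{3} \alpha_i \varphi_i$ being identically zero implies all $\alpha_i$ equal 0. Similarly, the columns of $\Phi_2$ are linearly independent in $C(\mathbb{R}, \mathbb{R}^3)$ and so are those of $\Phi_3$. Thus, $\Phi_1$, $\Phi_2$, and $\Phi_3$ are "potentially" fundamental matrices. We have $\det \Phi_1(t) = \det \Phi_2(t) = 1$ and $\det \Phi_3(t) = t$ for all $t \in \mathbb{R}$. Since $\det \Phi_3(0) = 0$, however, Theorem 6.4 tells us that $\Phi_3$ is not a fundamental matrix of (6.1) or even its time-varying version $\dot{x}(t) = A(t)x(t)$ (as the proof of Theorem 6.4 carries over to the time-varying case). On the other hand, since $\Phi_1(t)$ and $\Phi_2(t)$ are nonsingular for all $t$, it remains that $\Phi_1$ and $\Phi_2$ are fundamental matrices provided that they solve the matrix differential equation (6.2). Solving $\dot{\Phi}_1(t) = A\Phi_1(t)$ for $A \in \mathbb{R}^{3\times 3}$ yields that

\[ A = \dot{\Phi}_1(t) \Phi_1(t)^{-1} = \begin{bmatrix} 0 & 2 & 2t \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & -2t & t^2 \\ 0 & 1 & -t \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \]

so indeed $\Phi_1$ is a fundamental matrix of the LTI system (6.1). However, since

\[ \dot{\Phi}_2(t) \Phi_2(t)^{-1} = \begin{bmatrix} 0 & 1 & 2t \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & -t & 0 \\ 0 & 1 & -t \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & t \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \]

is a function of $t$, the matrix-valued function $\Phi_2$ is not a fundamental matrix of (6.1). Nevertheless, $\Phi_2$ is a fundamental matrix of the LTV system $\dot{x}(t) = A(t)x(t)$, where $A(t) = \dot{\Phi}_2(t) \Phi_2(t)^{-1}$.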
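The two matrix products in this example can be spot-checked at a few sample times. The sketch below is only an illustration in plain Python with exact rational arithmetic; the derivative and inverse matrices are typed in by hand from the example (they are not computed symbolically), and all function names are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, 2, 0], [0, 0, 1], [0, 0, 0]]

def phi1(t):      # Phi_1(t)
    return [[1, 2 * t, t * t], [0, 1, t], [0, 0, 1]]

def dphi1(t):     # entrywise time derivative of Phi_1
    return [[0, 2, 2 * t], [0, 0, 1], [0, 0, 0]]

def dphi2(t):     # entrywise time derivative of Phi_2
    return [[0, 1, 2 * t], [0, 0, 1], [0, 0, 0]]

def phi2_inv(t):  # inverse of Phi_2(t), as given in the example
    return [[1, -t, 0], [0, 1, -t], [0, 0, 1]]

def a2(t):        # candidate A(t) for Phi_2
    return matmul(dphi2(t), phi2_inv(t))

samples = [Fraction(-3, 2), Fraction(0), Fraction(1), Fraction(5, 3)]

# Phi_1 satisfies the LTI matrix equation with the constant A above.
phi1_solves_lti = all(dphi1(t) == matmul(A, phi1(t)) for t in samples)

# For Phi_2 the product changes with t, so no constant A can work.
a2_is_time_varying = a2(Fraction(1)) != a2(Fraction(2))
print(phi1_solves_lti, a2_is_time_varying)
```

A check at finitely many times is of course not a proof, but it catches sign or transcription errors in the hand-computed inverses.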

Theorem 6.5 Let $\Phi$ be a fundamental matrix of (6.1). Then $\Psi$ is a fundamental matrix of (6.1) if and only if there exists a nonsingular matrix $P \in \mathbb{R}^{n\times n}$ such that $\Psi(t) = \Phi(t)P$ for all $t \in \mathbb{R}$.

Proof. Suppose $P$ is invertible and $\Psi(t) = \Phi(t)P$ for all $t$. Then it is easy to check that $\Psi$ is a solution of (6.2). Moreover, since $\det \Phi(t) \neq 0$ for all $t$ and since $\det P \neq 0$, we have $\det \Phi(t)P = \det \Phi(t) \det P \neq 0$ for all $t$. Thus by Theorem 6.4, $\Psi$ is a fundamental matrix. Conversely, suppose that $\Psi$ is a fundamental matrix. As $\Phi(t)\Phi(t)^{-1} = I$ for all $t$, the product rule gives

\[ \frac{d}{dt}\bigl(\Phi \Phi^{-1}\bigr)(t) = 0 \;\Rightarrow\; \frac{d}{dt}\bigl(\Phi^{-1}\bigr)(t) = -\Phi(t)^{-1} \Bigl(\frac{d}{dt}\Phi\Bigr)(t)\, \Phi(t)^{-1} = -\Phi(t)^{-1} A \]

for all $t \in \mathbb{R}$. Using this equality, we obtain

\[ \frac{d}{dt}\bigl(\Phi^{-1} \Psi\bigr)(t) = \Phi(t)^{-1} \Bigl(\frac{d}{dt}\Psi\Bigr)(t) + \Bigl(\frac{d}{dt}\Phi^{-1}\Bigr)(t)\, \Psi(t) = \Phi(t)^{-1} A \Psi(t) - \Phi(t)^{-1} A \Psi(t) = 0 \]

for all $t$. Hence $\Phi(\cdot)^{-1}\Psi(\cdot)$ is constant, which implies the existence of a matrix $P$ such that $\Phi(t)^{-1}\Psi(t) = P$ for all $t$; since $\Phi(t)^{-1}$ and $\Psi(t)$ are both nonsingular, $P$ is nonsingular.

6.3 State Transition Matrix [3, Sections 3.2.2 & 3.3.1]

6.3.1 Definitions

Solving the matrix differential equation (6.2) subject to the initial condition $X(t_0) = I \in \mathbb{R}^{n\times n}$ is equivalent to solving the initial-value problems

\[ \dot{x}_i(t) = A x_i(t), \quad t \in \mathbb{R}; \qquad x_i(t_0) = e_i \in \mathbb{R}^n, \tag{6.4} \]

separately over all $i = 1, \dots, n$, where $e_i$ denotes the $i$th standard basis vector (i.e., the $i$th column of the identity matrix $I$). That is, if $\varphi_1, \dots, \varphi_n \in C(\mathbb{R}, \mathbb{R}^n)$ are the unique solutions to (6.4) such that $\varphi_i(t_0) = e_i$ and $\dot{\varphi}_i(t) = A\varphi_i(t)$ for all $t \in \mathbb{R}$ and for each $i = 1, \dots, n$, then the fundamental matrix $\Phi(\cdot, t_0) = [\varphi_1(\cdot)\ \cdots\ \varphi_n(\cdot)]$ is the unique solution to the matrix differential equation

\[ \frac{\partial}{\partial t} \Phi(t, t_0) = A \Phi(t, t_0) \]

subject to

\[ \Phi(t_0, t_0) = I. \]

The $\mathbb{R}^{n\times n}$-valued function $\Phi(\cdot, t_0)$ is called the state transition matrix of (6.1). Every $x_0 \in \mathbb{R}^n$ is a linear combination of the standard basis vectors $e_1, \dots, e_n$ (i.e., $x_0 = I x_0$), and so, as we already know, the system of differential equations (6.1) subject to the initial condition $x(t_0) = x_0$ has a unique solution given by $\varphi(\cdot) = \Phi(\cdot, t_0) x_0$, which is the same linear combination of the columns of the state transition matrix. Moreover, $\Phi(t, t_0)$ is determined by the Peano-Baker series, which in the case of LTI systems reduces to the matrix exponential given by

\[ \Phi(t, t_0) = e^{A(t - t_0)} = \sum_{k=0}^{\infty} \frac{A^k (t - t_0)^k}{k!} \]

for all $t, t_0 \in \mathbb{R}$ with the convention that $A^0 = I$ and $0! = 1$.

6.3.2 Properties of State Transition Matrix

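Theorem 6.6 Let $\Phi(t, t_0)$ be the state transition matrix of (6.1). If $P$ is invertible, then the state variable change $P\bar{x}(t) = x(t)$ leads to the equivalent state equation

\[ \dot{\bar{x}}(t) = \bar{A} \bar{x}(t), \quad t \in \mathbb{R}; \qquad \bar{A} = P^{-1} A P. \tag{6.5} \]

If $\Phi(\cdot, t_0)$ is the state transition matrix of (6.1), then the state transition matrix of (6.5) is

\[ \bar{\Phi}(t, t_0) = P^{-1} \Phi(t, t_0) P, \quad t, t_0 \in \mathbb{R}. \tag{6.6} \]

Proof. Equation (6.5) follows from $\bar{x}(t) = P^{-1} x(t)$ and $\frac{d}{dt}\bigl(P^{-1} x\bigr)(t) = (P^{-1} A P) P^{-1} x(t)$, and equation (6.6) from $\frac{\partial}{\partial t}\bigl(P^{-1} \Phi(t, t_0) P\bigr) = (P^{-1} A P) P^{-1} \Phi(t, t_0) P$ and $P^{-1} \Phi(t_0, t_0) P = I$.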




Theorem 6.7 Let $\Phi(\cdot, t_0)$ be the state transition matrix of (6.1). Then the following hold:

(a) If $\Psi$ is any fundamental matrix of (6.1), then $\Phi(t, t_0) = \Psi(t)\Psi(t_0)^{-1}$ for all $t, t_0 \in \mathbb{R}$;

(b) $\Phi(t, t_0)$ is nonsingular and $\Phi(t, t_0)^{-1} = \Phi(t_0, t)$ for all $t, t_0 \in \mathbb{R}$;

(c) $\Phi(t, t_0) = \Phi(t, s)\Phi(s, t_0)$ for all $t, s, t_0 \in \mathbb{R}$ (semigroup property).

Proof. Part (a) follows from $\frac{\partial}{\partial t}\bigl(\Psi(t)\Psi(t_0)^{-1}\bigr) = A\Psi(t)\Psi(t_0)^{-1}$ and $\Psi(t_0)\Psi(t_0)^{-1} = I$. Choose any fundamental matrix $\Psi$. Then $\det \Psi(t) \neq 0$ for all $t$, so (a) implies $\Phi(t, t_0)$ is nonsingular for all $t, t_0$ because $\det \Phi(t, t_0) = \det\bigl(\Psi(t)\Psi(t_0)^{-1}\bigr) = \det \Psi(t) \det \Psi(t_0)^{-1} \neq 0$. Moreover, $\Phi(t, t_0)^{-1} = \bigl(\Psi(t)\Psi(t_0)^{-1}\bigr)^{-1} = \Psi(t_0)\Psi(t)^{-1} = \Phi(t_0, t)$, so (b) holds. Finally, for any fundamental matrix $\Psi$, we have $\Phi(t, t_0) = \Psi(t)\Psi(t_0)^{-1} = \bigl(\Psi(t)\Psi(s)^{-1}\bigr)\bigl(\Psi(s)\Psi(t_0)^{-1}\bigr) = \Phi(t, s)\Phi(s, t_0)$, so (c) holds.

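As an example, consider the homogeneous LTI system $\dot{x}(t) = Ax(t)$ given by

\[ \begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \\ \dot{x}_3(t) \end{bmatrix} = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix}. \]

It follows readily from $\dot{x}_3(t) = 0$, $\dot{x}_2(t) = x_3(t)$, and $\dot{x}_1(t) = 2x_2(t)$ that $\varphi_1(t) = [1\ 0\ 0]^T$, $\varphi_2(t) = [2t\ 1\ 0]^T$, and $\varphi_3(t) = [t^2\ t\ 1]^T$ are solutions of the system, among infinitely many others. Moreover, $\{\varphi_1, \varphi_2, \varphi_3\}$ is a linearly independent set in $C(\mathbb{R}, \mathbb{R}^3)$. Thus $\Psi = [\varphi_1\ \varphi_2\ \varphi_3]$ is a fundamental matrix of the system. Then by part (a) of Theorem 6.7 we have that the state transition matrix of the system is

\[ \Phi(t, t_0) = \Psi(t)\Psi(t_0)^{-1} = \begin{bmatrix} 1 & 2t & t^2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -2t_0 & t_0^2 \\ 0 & 1 & -t_0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2(t - t_0) & (t - t_0)^2 \\ 0 & 1 & t - t_0 \\ 0 & 0 & 1 \end{bmatrix}. \]

Indeed, we have $\Phi(t, t_0) = e^{A(t - t_0)}$ for all $t, t_0 \in \mathbb{R}$. As another example, consider the homogeneous LTV system $\dot{x}(t) = A(t)x(t)$ given by

\[ \begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \\ \dot{x}_3(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 & t \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix}. \]

As $\varphi_1(t) = [1\ 0\ 0]^T$, $\varphi_2(t) = [t\ 1\ 0]^T$, and $\varphi_3(t) = [t^2\ t\ 1]^T$ are linearly independent solutions to this LTV system, the matrix-valued function $\Psi = [\varphi_1\ \varphi_2\ \varphi_3]$ is a fundamental matrix of the LTV system (but not of any LTI system). Therefore, the state transition matrix of the LTV system is

\[ \Phi(t, t_0) = \Psi(t)\Psi(t_0)^{-1} = \begin{bmatrix} 1 & t & t^2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -t_0 & 0 \\ 0 & 1 & -t_0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & t - t_0 & t(t - t_0) \\ 0 & 1 & t - t_0 \\ 0 & 0 & 1 \end{bmatrix}. \]

That is, $\frac{\partial}{\partial t} \Phi(t, t_0) = A(t)\Phi(t, t_0)$ for all $t \in \mathbb{R}$ with $\Phi(t_0, t_0) = I$.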
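Properties (b) and (c) of Theorem 6.7 can be spot-checked on the two closed-form state transition matrices just obtained. The following sketch, in plain Python with exact rational arithmetic, evaluates both at arbitrarily chosen times; the names `phi_lti` and `phi_ltv` are illustrative labels for the LTI and LTV examples, not notation from the notes.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def phi_lti(t, t0):
    """State transition matrix of the LTI example."""
    d = t - t0
    return [[1, 2 * d, d * d], [0, 1, d], [0, 0, 1]]

def phi_ltv(t, t0):
    """State transition matrix of the LTV example."""
    d = t - t0
    return [[1, d, t * d], [0, 1, d], [0, 0, 1]]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t, s, t0 = Fraction(7, 3), Fraction(-1, 2), Fraction(4)  # arbitrary times

checks = {}
for name, phi in (("lti", phi_lti), ("ltv", phi_ltv)):
    checks[name] = (
        matmul(phi(t, s), phi(s, t0)) == phi(t, t0)      # semigroup property
        and matmul(phi(t, t0), phi(t0, t)) == I3         # inverse property
        and phi(t0, t0) == I3                            # initial condition
    )
print(checks)
```

Note that the LTV transition matrix depends on t and t0 separately (through the entry t(t - t0)), whereas the LTI one depends only on the difference t - t0.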

6.3.3 Properties of Matrix Exponentials

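Theorem 6.8 Let $A \in \mathbb{R}^{n\times n}$. Then the following hold:

(a) $e^{At_1} e^{At_2} = e^{A(t_1 + t_2)}$ for all $t_1, t_2 \in \mathbb{R}$;

(b) $A e^{At} = e^{At} A$ for all $t \in \mathbb{R}$;

(c) $\bigl(e^{At}\bigr)^{-1} = e^{-At}$ for all $t \in \mathbb{R}$;

(d) $\det e^{At} = e^{(\operatorname{tr} A)t}$ for all $t \in \mathbb{R}$;

(e) If $AB = BA$, then $e^{At} e^{Bt} = e^{(A+B)t}$ for all $t \in \mathbb{R}$.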

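Proof. Part (a) holds because $e^{At_1} e^{At_2} = \Phi(t_1, 0)\Phi(0, -t_2) = \Phi(t_1, -t_2) = e^{A(t_1 + t_2)}$, where the second equality follows from Theorem 6.7(c). Part (b) follows from

\[ A \Bigl( \lim_{m\to\infty} \sum_{k=0}^{m} \frac{A^k t^k}{k!} \Bigr) = \lim_{m\to\infty} \sum_{k=0}^{m} \frac{A^{k+1} t^k}{k!} = \Bigl( \lim_{m\to\infty} \sum_{k=0}^{m} \frac{A^k t^k}{k!} \Bigr) A. \]

Part (c) is immediate from Theorem 6.7(b). If $\Psi$ is a fundamental matrix of (6.1), then Theorem 6.7(a) gives $\det e^{At} = \det\bigl(\Psi(t)\Psi(0)^{-1}\bigr) = \det \Psi(t) / \det \Psi(0)$, where the last equality follows from $1 = \det I = \det\bigl(\Psi(0)\Psi(0)^{-1}\bigr) = \det \Psi(0) \det \Psi(0)^{-1}$. Thus Abel's formula (6.3) gives (d). Finally, if $AB = BA$, then $A^i B^j = B^j A^i$ and hence we have

\[ \Bigl( \lim_{m\to\infty} \sum_{i=0}^{m} \frac{A^i t^i}{i!} \Bigr) \Bigl( \lim_{l\to\infty} \sum_{j=0}^{l} \frac{B^j t^j}{j!} \Bigr) = \Bigl( \lim_{l\to\infty} \sum_{j=0}^{l} \frac{B^j t^j}{j!} \Bigr) \Bigl( \lim_{m\to\infty} \sum_{i=0}^{m} \frac{A^i t^i}{i!} \Bigr), \]

which implies (e).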

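In general, matrices do not commute. For example, if

\[ A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \]

then $AB \neq BA$. In this case,

\[ e^{At} = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}, \quad e^{Bt} = \begin{bmatrix} 1 & 0 \\ t & 1 \end{bmatrix}, \quad e^{(A+B)t} = \begin{bmatrix} \tfrac{1}{2}(e^t + e^{-t}) & \tfrac{1}{2}(e^t - e^{-t}) \\ \tfrac{1}{2}(e^t - e^{-t}) & \tfrac{1}{2}(e^t + e^{-t}) \end{bmatrix}, \]

so $e^{At} e^{Bt} \neq e^{(A+B)t}$.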
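This counterexample is easy to reproduce numerically. The sketch below approximates each exponential by a truncated power series in plain Python; the helper `expm_series` and the choice of 30 terms are assumptions made for illustration and are adequate only for small matrices of modest norm like these.

```python
import math

def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def expm_series(M, t, terms=30):
    """Approximate exp(M*t) by the partial sum of its power series."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term M^k t^k / k!
    for k in range(1, terms):
        term = [[x * t / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
A_plus_B = [[0.0, 1.0], [1.0, 0.0]]

t = 1.0
product = matmul(expm_series(A, t), expm_series(B, t))  # e^{At} e^{Bt}
exp_sum = expm_series(A_plus_B, t)                      # e^{(A+B)t}
print(product)
print(exp_sum)
print(math.cosh(t), math.sinh(t))  # entries expected in exp_sum
```

At t = 1 the product has (1,1) entry 1 + t^2 = 2, whereas the (1,1) entry of the exponential of the sum is cosh 1, so the two matrices differ.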

6.4 How to Determine Matrix Exponentials [3, Section 3.3.2]

We have already seen that the state transition matrix of a linear (time-invariant or time-varying) system can be obtained from a fundamental matrix, whose columns are linearly independent solutions to the given homogeneous system. Other methods to obtain the state transition matrix of an LTI system are summarized in this section.

6.4.1 Infinite Series Method

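Given a matrix $A \in \mathbb{R}^{n\times n}$, evaluate the partial sum

\[ S_m(t) = \sum_{k=0}^{m}\frac{t^k}{k!}A^k = I + tA + \frac{t^2}{2}A^2 + \cdots + \frac{t^m}{m!}A^m \]

for $m = 0, 1, \dots$. Then, since $S_m(t) \to e^{At}$ uniformly on any bounded interval $J$ in $\mathbb{R}$, it is guaranteed that $e^{At} \approx S_m(t)$ for all $t \in J$ and for all sufficiently large $m$. If $A$ is nilpotent, then the partial sum $S_m(t)$, with $m = n - 1$, gives a closed-form expression for $e^{At}$. For example,

\[ A = \begin{bmatrix} 0 & \alpha \\ 0 & 0 \end{bmatrix} \;\Rightarrow\; A^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \;\Rightarrow\; e^{At} = I + tA = \begin{bmatrix} 1 & \alpha t \\ 0 & 1 \end{bmatrix}. \]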
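As a rough illustration of the infinite series method, the following Python sketch evaluates the partial sums $S_m(t)$. The function names, the test values $\alpha = 3$ and $t = 2$, and the rotation generator `R` used for comparison are assumptions chosen here for the demonstration.

```python
import math

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def partial_sum(A, t, m):
    """S_m(t) = sum_{k=0}^{m} t^k A^k / k!."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, m + 1):
        term = [[x * t / k for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def max_abs_diff(X, Y):
    """Largest entrywise difference between two matrices of equal size."""
    return max(abs(X[i][j] - Y[i][j])
               for i in range(len(X)) for j in range(len(X[0])))

alpha, t = 3.0, 2.0
N = [[0.0, alpha], [0.0, 0.0]]   # nilpotent: N^2 = 0, so S_1 is already exact
R = [[0.0, 1.0], [-1.0, 0.0]]    # not nilpotent: e^{Rt} is a rotation matrix

nilpotent_exact = partial_sum(N, t, 1)
rotation_exact = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
errors = [max_abs_diff(partial_sum(R, t, m), rotation_exact)
          for m in (2, 5, 10, 20)]

print(nilpotent_exact)
print(errors)
```

For the nilpotent matrix the series terminates after the linear term, while for `R` the truncation error shrinks as $m$ grows, consistent with uniform convergence on bounded intervals.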




6.4.2 Similarity Transformation Method

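Diagonalizable Case. If $A \in \mathbb{R}^{n\times n}$ has $n$ linearly independent eigenvectors $v_1, \dots, v_n$ corresponding to its (not necessarily distinct) eigenvalues $\lambda_1, \dots, \lambda_n$, then let $P = [v_1 \cdots v_n]$. Then, since $J = P^{-1}AP = \operatorname{diag}\{\lambda_1, \dots, \lambda_n\}$, Theorem 6.6 gives

\[ e^{At} = Pe^{Jt}P^{-1} = P\begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix}P^{-1}. \]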

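General Case. Generate $n$ linearly independent generalized eigenvectors $v_1, \dots, v_n$ of $A \in \mathbb{R}^{n\times n}$ such that $P = [v_1 \cdots v_n]$ takes $A$ into the Jordan canonical form $J = P^{-1}AP = \operatorname{diag}\{J_1, \dots, J_m\} \in \mathbb{C}^{n\times n}$, where $J_1, \dots, J_m$ are Jordan blocks of varying dimensions. Then

\[ e^{At} = Pe^{Jt}P^{-1} = P\begin{bmatrix} e^{J_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{J_m t} \end{bmatrix}P^{-1}. \]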

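A Jordan block is the sum of a diagonal matrix and a nilpotent matrix; that is, $J_k = \Lambda_k + N_k$ with

\[ \Lambda_k = \begin{bmatrix} \lambda_k & 0 & \cdots & 0 \\ 0 & \lambda_k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix} \quad\text{and}\quad N_k = \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix}, \]

where $\lambda_k$ is some eigenvalue of $A$ for each $k$. Let $J_k \in \mathbb{C}^{n_k\times n_k}$ for each $k = 1, \dots, m$, so that $\sum_{k=1}^{m} n_k = n$. Then we have $N_k^{n_k} = 0$, and so the series defining $e^{N_k t}$ terminates for each $k$. Moreover, direct computation yields $\Lambda_k N_k = N_k \Lambda_k$ for each $k$. Therefore, Theorem 6.8(e) yields

\[ e^{J_k t} = e^{\Lambda_k t}e^{N_k t} = e^{\lambda_k t}\begin{bmatrix} 1 & t & t^2/2 & \cdots & t^{n_k-1}/(n_k-1)! \\ 0 & 1 & t & \cdots & t^{n_k-2}/(n_k-2)! \\ 0 & 0 & 1 & \cdots & t^{n_k-3}/(n_k-3)! \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}, \quad k = 1, \dots, m. \tag{6.7} \]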

Note that, if λk in (6.7) is complex, then Euler’s formula gives

eλkt = e(σk+iωk)t = eσkt(cos ωkt + i sin ωkt).

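For example, if $\alpha \neq 0$, then

\[ A = \begin{bmatrix} 0 & 0 \\ \alpha & 0 \end{bmatrix} \;\Rightarrow\; J = P^{-1}AP = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\; P = \begin{bmatrix} 0 & 1 \\ \alpha & 0 \end{bmatrix} \;\Rightarrow\; e^{At} = P\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}P^{-1} = \begin{bmatrix} 1 & 0 \\ \alpha t & 1 \end{bmatrix}. \]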

6.4.3 Cayley-Hamilton Theorem Method

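In view of the Cayley-Hamilton Theorem, there exist $\beta_i(m; \cdot) \in C(\mathbb{R}, \mathbb{R})$ for $i = 0, \dots, n-1$ and $m = 1, 2, \dots$ such that, for $A \in \mathbb{R}^{n\times n}$,

\[ e^{At} = \lim_{m\to\infty}\sum_{k=0}^{m}\frac{t^k}{k!}A^k = \lim_{m\to\infty}\sum_{i=0}^{n-1}\beta_i(m; t)A^i = \sum_{i=0}^{n-1}\left(\lim_{m\to\infty}\beta_i(m; t)\right)A^i. \]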




Then, letting $\beta_i(t) = \lim_{m\to\infty}\beta_i(m; t)$ for $i = 0, \dots, n-1$ and $t \in \mathbb{R}$, we obtain

\[ e^{At} = \sum_{i=0}^{n-1}\beta_i(t)A^i, \quad t \in \mathbb{R}. \tag{6.8} \]

Thus one can determine $e^{At}$ by obtaining $\beta_i(t)$ for all $i$ and $t$. Let $f(s)$ and $g(s)$ be two analytic functions (i.e., functions locally given by a convergent power series; e.g., polynomials, exponential functions, trigonometric functions, logarithms, etc.). Let $p(s) = \prod_{j=1}^{p}(s - \lambda_j)^{m_j}$ be the characteristic polynomial of $A$. Then $f(A) = g(A)$ if

\[ \frac{d^k f}{ds^k}(\lambda_j) = \frac{d^k g}{ds^k}(\lambda_j), \quad k = 0, \dots, m_j - 1, \quad j = 1, \dots, p, \tag{6.9} \]

where $\sum_{j=1}^{p} m_j = n$. (Proof. Equations (6.9) imply that $f(s) - g(s)$ has $p(s)$ as a factor, so the result follows from the Cayley-Hamilton Theorem.) Thus the terms $\beta_i(t)$ in (6.8) can be determined by letting

\[ f(s) = e^{st} \quad\text{and}\quad g(s) = \beta_0(t) + \beta_1(t)s + \cdots + \beta_{n-1}(t)s^{n-1}. \]

For example, if $A = \begin{bmatrix} 0 & 0 \\ \alpha & 0 \end{bmatrix}$, then the characteristic polynomial of $A$ is $p(s) = \det(sI - A) = s^2$. Let $f(s) = e^{st}$ and $g(s) = \beta_0(t) + \beta_1(t)s$. Then, as $f(0) = g(0)$ and $\frac{\partial f}{\partial s}(0) = \frac{\partial g}{\partial s}(0)$ imply $\beta_0(t) = 1$ and $\beta_1(t) = t$, we obtain that $e^{At} = g(A) = I + tA$.
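The same procedure can be sketched in Python for a $2\times 2$ matrix with two distinct real eigenvalues, where conditions (6.9) reduce to $\beta_0(t) + \beta_1(t)\lambda_j = e^{\lambda_j t}$ for $j = 1, 2$. The function name and the test matrix (with eigenvalues $-1$ and $-2$) are assumptions chosen here for illustration; repeated or complex eigenvalues are deliberately not handled.

```python
import math

def expm_cayley_hamilton_2x2(A, t):
    """e^{At} = beta0(t) I + beta1(t) A for a 2x2 A with distinct real eigenvalues."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = trace * trace - 4.0 * det
    if disc <= 0.0:
        raise ValueError("this sketch only handles distinct real eigenvalues")
    lam1 = (trace + math.sqrt(disc)) / 2.0
    lam2 = (trace - math.sqrt(disc)) / 2.0
    # Conditions (6.9) with m_1 = m_2 = 1:
    #   beta0 + beta1 * lam_j = exp(lam_j * t),  j = 1, 2
    beta1 = (math.exp(lam1 * t) - math.exp(lam2 * t)) / (lam1 - lam2)
    beta0 = math.exp(lam1 * t) - beta1 * lam1
    return [[beta0 + beta1 * A[0][0], beta1 * A[0][1]],
            [beta1 * A[1][0], beta0 + beta1 * A[1][1]]]

A = [[0.0, 1.0], [-2.0, -3.0]]   # eigenvalues -1 and -2
t = 0.5
E = expm_cayley_hamilton_2x2(A, t)

# Closed form for this particular A, used only as a cross-check
e1, e2 = math.exp(-t), math.exp(-2.0 * t)
closed_form = [[2 * e1 - e2, e1 - e2], [-2 * e1 + 2 * e2, -e1 + 2 * e2]]

print(E)
print(closed_form)
```

Solving for $\beta_0$ and $\beta_1$ explicitly avoids any matrix inversion; for larger $n$ the conditions (6.9) form a linear system in the $\beta_i(t)$.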

6.4.4 Laplace Transform Method

We know that $e^{At}$ is the inverse Laplace transform of $(sI - A)^{-1}$ for $A \in \mathbb{R}^{n\times n}$. The partial fraction expansion of each entry in $(sI - A)^{-1}$ gives

\[ (sI - A)^{-1} = \sum_{j=1}^{p}\sum_{k=0}^{m_j-1}\frac{k!}{(s - \lambda_j)^{k+1}}A_{jk}, \]

where $\lambda_1, \dots, \lambda_p$ are the distinct eigenvalues of $A$, with corresponding algebraic multiplicities $m_1, \dots, m_p$ such that $\sum_{j=1}^{p} m_j = n$, and where each $A_{jk} \in \mathbb{C}^{n\times n}$ is a matrix of partial fraction expansion coefficients. Taking the inverse Laplace transform gives

\[ e^{At} = \sum_{j=1}^{p}\sum_{k=0}^{m_j-1} t^k e^{\lambda_j t}A_{jk}. \tag{6.10} \]

If some eigenvalues are complex, conjugate terms on the right side of (6.10) can be combined using Euler's formula to give a real representation. If $A_{jk}$ is nonzero, then $t^k e^{\lambda_j t}$ is called a mode of the system $\dot{x}(t) = Ax(t)$, $t \in \mathbb{R}$. It is easily seen that $x(t) \to 0$ as $t \to \infty$ under arbitrary initial conditions if each mode of the system converges to zero (i.e., each $\lambda_j$ has negative real part). For example, if $A = \begin{bmatrix} 0 & 0 \\ \alpha & 0 \end{bmatrix}$ with $\alpha \neq 0$, then $e^{At}$ is the inverse Laplace transform of

\[ (sI - A)^{-1} = \begin{bmatrix} 1/s & 0 \\ \alpha/s^2 & 1/s \end{bmatrix}. \]

The modes of the system $\dot{x}(t) = Ax(t)$ in this example are $1$ and $t$.

6.5 Discrete-Time Case

6.5.1 State Transition Matrix

Given an $A \in \mathbb{R}^{n\times n}$, consider the discrete-time initial-value problem of solving

\[ x(t+1) = Ax(t), \quad t = t_0, t_0+1, \dots; \qquad x(t_0) = x_0. \tag{6.11} \]

The discrete-time state transition matrix is defined to be the unique solution $\Phi(\cdot, t_0) \in \mathbb{R}^{n\times n}$ of the matrix difference equation

\[ \Phi(t+1, t_0) = A\Phi(t, t_0), \quad t = t_0, t_0+1, \dots, \]

subject to $\Phi(t_0, t_0) = I$. That is, $\Phi(t, t_0) = A^{t-t_0}$ for all $t, t_0 \in \mathbb{Z}$ with $t \ge t_0$, and as we have seen earlier the unique solution to (6.11) is given by $x(t) = \Phi(t, t_0)x_0$, $t \ge t_0$. Unlike the continuous-time case, the difference equation (6.11) cannot be solved backward in time unless $A$ is nonsingular. This is because, unless $A$ is invertible, the discrete-time state transition matrix $\Phi(t, t_0)$ is not invertible. An immediate consequence is that the semigroup property holds only forward in time; that is, $\Phi(t, t_0) = \Phi(t, s)\Phi(s, t_0)$ for $t \ge s \ge t_0$. Due to time-invariance, we have $\Phi(t, t_0) = \Phi(t - t_0, 0)$.

6.5.2 How to Determine Powers of Matrices

Given an $A \in \mathbb{R}^{n\times n}$, if $J = P^{-1}AP$ is a Jordan canonical form of $A$, then we have

\[ A^t = (PJP^{-1})^t = PJ^tP^{-1}. \]

In particular, if $J = \operatorname{diag}\{\lambda_1, \dots, \lambda_n\}$, then $\Phi(t, 0) = A^t = P\operatorname{diag}\{\lambda_1^t, \dots, \lambda_n^t\}P^{-1}$. Also, by the Cayley-Hamilton Theorem there exist $\beta_i(\cdot)$, $i = 0, \dots, n-1$, such that

\[ A^t = \sum_{i=0}^{n-1}\beta_i(t)A^i, \quad t = 0, 1, \dots. \tag{6.12} \]

Thus, as in the continuous-time case, one can determine $A^t$ by letting

\[ f(s) = s^t \quad\text{and}\quad g(s) = \beta_0(t) + \beta_1(t)s + \cdots + \beta_{n-1}(t)s^{n-1}, \]

and then solving (6.9) for the terms $\beta_i(t)$. Finally, one can use the fact that $\Phi(\cdot, 0)$ is the inverse z-transform of $z(zI - A)^{-1} = (I - z^{-1}A)^{-1}$ to determine $A^t$. If $\lambda_1, \dots, \lambda_p$ are the distinct eigenvalues of $A$, with corresponding algebraic multiplicities $m_1, \dots, m_p$ such that $\sum_{i=1}^{p} m_i = n$, we have

\[ z(zI - A)^{-1} = z\sum_{j=1}^{p}\sum_{k=0}^{m_j-1}\frac{k!}{(z - \lambda_j)^{k+1}}A_{jk} \]

and hence

\[ A^t = \sum_{j=1}^{p}\sum_{k=0}^{\min\{m_j-1,\,t\}}\frac{t!}{(t-k)!}\lambda_j^{t-k}A_{jk}, \]

where $A_{jk} \in \mathbb{C}^{n\times n}$ are the matrices of partial fraction expansion coefficients. If $A_{jk}$ is nonzero, then $\frac{t!}{(t-k)!}\lambda_j^{t-k}$ is called a mode of the system $x(t+1) = Ax(t)$. We have $x(t) \to 0$ under arbitrary initial conditions if each mode converges to zero (i.e., each $\lambda_j$ has a magnitude less than 1).
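A small Python sketch of the Cayley-Hamilton route (6.12) for a $2\times 2$ matrix with distinct real eigenvalues follows, using $f(s) = s^t$ so that $\beta_0(t) + \beta_1(t)\lambda_j = \lambda_j^t$. The function names and the test matrix (eigenvalues $0.5$ and $0.25$) are assumptions made here; the result is compared against repeated multiplication.

```python
import math

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def power_by_multiplication(A, t):
    """A^t for t = 0, 1, ... by repeated multiplication."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, A)
    return result

def power_cayley_hamilton_2x2(A, t):
    """A^t = beta0(t) I + beta1(t) A for a 2x2 A with distinct real eigenvalues."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = trace * trace - 4.0 * det
    if disc <= 0.0:
        raise ValueError("this sketch only handles distinct real eigenvalues")
    lam1 = (trace + math.sqrt(disc)) / 2.0
    lam2 = (trace - math.sqrt(disc)) / 2.0
    # Conditions (6.9) with f(s) = s^t: beta0 + beta1 * lam_j = lam_j ** t
    beta1 = (lam1 ** t - lam2 ** t) / (lam1 - lam2)
    beta0 = lam1 ** t - beta1 * lam1
    return [[beta0 + beta1 * A[0][0], beta1 * A[0][1]],
            [beta1 * A[1][0], beta0 + beta1 * A[1][1]]]

A = [[0.5, 1.0], [0.0, 0.25]]   # eigenvalues 0.5 and 0.25, both inside the unit circle
for t in (0, 1, 2, 5, 50):
    print(t, power_cayley_hamilton_2x2(A, t))
```

Since both eigenvalues have magnitude less than one, every mode $\lambda_j^t$ decays and the printed powers shrink toward the zero matrix.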

References

[1] P. J. Antsaklis and A. N. Michel, Linear Systems , 2nd ed. Boston, MA: Birkhauser, 2006.

[2] W. J. Rugh, Linear System Theory . Englewood Cliffs, NJ: Prentice-Hall, 1993.





VII. Stability of Linear Systems



Fall 2010

7.1 Definitions and Facts

We will see that internal stability of linear systems depends on the structural properties of the state matrix $A \in \mathbb{R}^{n\times n}$ that governs the dynamics via $\dot{x} = Ax$. The Jordan canonical form $J$ of $A$ reveals a complete picture of the structure of $A$. However, $J$ in general contains complex entries. Therefore, we need to extend our usual definitions of vector and matrix norms to complex vectors and complex matrices, respectively. If $A \in \mathbb{C}^{m\times n}$, then $A^* \in \mathbb{C}^{n\times m}$ denotes the conjugate transpose of $A$; that is, $A^*$ is equal to $A^{\mathrm{T}}$ with its entries replaced by their complex conjugates. The Euclidean norm $\|\cdot\|$ on $\mathbb{C}^n$ is then defined by

\[ \|x\| = \sqrt{x^*x} = \left(|x_1|^2 + \cdots + |x_n|^2\right)^{1/2} \]

for $x = [x_1 \cdots x_n]^{\mathrm{T}} \in \mathbb{C}^n$. It can be easily verified that $\|x\| > 0$ for all nonzero $x \in \mathbb{C}^n$, and that $\|\alpha x\| = |\alpha|\|x\|$ and $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$. The spectral norm $\|\cdot\|$ on $\mathbb{C}^{n\times n}$ is defined as before by $\|A\| = \max\{\|Ax\| : \|x\| = 1\}$ for all $A \in \mathbb{C}^{n\times n}$. It can be shown that

\[ \|A\| = \max\{\sqrt{\lambda} : \lambda \text{ is an eigenvalue of } A^*A\} = \max\{\sqrt{\lambda} : \lambda \text{ is an eigenvalue of } AA^*\} = \|A^*\|. \]

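By definition, $\|Ax\| \le \|A\|\|x\|$ for all $A \in \mathbb{C}^{n\times n}$ and $x \in \mathbb{C}^n$. Also, we have that $\|A\| > 0$ for all nonzero $A \in \mathbb{C}^{n\times n}$, and that $\|\alpha A\| = |\alpha|\|A\|$ and $\|A + B\| \le \|A\| + \|B\|$ for all $A, B \in \mathbb{C}^{n\times n}$ and $\alpha \in \mathbb{C}$. Another important property of the spectral norm is that it is submultiplicative; that is,

\[ \|AB\| \le \|A\|\|B\| \]

for $A, B \in \mathbb{C}^{n\times n}$. Another useful fact to remember is that we have

\[ \left\|\int_{t_0}^{t_1} x(t)\,dt\right\| \le \int_{t_0}^{t_1}\|x(t)\|\,dt \quad\text{and}\quad \left\|\int_{t_0}^{t_1} A(t)\,dt\right\| \le \int_{t_0}^{t_1}\|A(t)\|\,dt \]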

for all $t_0, t_1 \in \mathbb{R}$ with $t_0 \le t_1$ and for all $x(\cdot) \in C(\mathbb{R}, \mathbb{R}^n)$ and $A(\cdot) \in C(\mathbb{R}, \mathbb{R}^{m\times n})$.

It is obvious that all the above definitions and properties are valid for $A \in \mathbb{R}^{n\times n}$, for which $A^* = A^{\mathrm{T}}$. If $A = A^{\mathrm{T}} \in \mathbb{R}^{n\times n}$, then $A$ is said to be symmetric. For $A \in \mathbb{R}^{n\times n}$, we shall write $A \ge 0$ to mean that $A$ is symmetric and nonnegative definite (i.e., $x^{\mathrm{T}}Ax \ge 0$ for all $x \in \mathbb{R}^n$). Nonnegative definite matrices are also called positive semidefinite. In particular, we shall write $A > 0$ to indicate that $A$ is symmetric and positive definite (i.e., $x^{\mathrm{T}}Ax > 0$ for all nonzero $x \in \mathbb{R}^n$). All the eigenvalues of a real symmetric matrix are real. (Proof. If $A \in \mathbb{R}^{n\times n}$ is symmetric, and if $Av = \lambda v$ for some nonzero $v \in \mathbb{C}^n$, then $\lambda^*\|v\|^2 = (\lambda v)^*v = v^*A^*v = v^*Av = \lambda\|v\|^2$, so $\lambda^* = \lambda$.) Also, all the eigenvalues of a real symmetric, nonnegative definite (resp. positive definite) matrix are nonnegative (resp. positive). (Proof. If $A \ge 0$ and $Av = \lambda v$ for some nonzero $v \in \mathbb{C}^n$, then $\lambda\|v\|^2 = v^*Av = \Re[v]^{\mathrm{T}}A\,\Re[v] + \Im[v]^{\mathrm{T}}A\,\Im[v] \ge 0$; the case of $A > 0$ is similar.) If $A \in \mathbb{R}^{n\times n}$ is symmetric, and if $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the smallest and largest eigenvalues of $A$, respectively, then we have $\lambda_{\min}(A)\|x\|^2 \le x^{\mathrm{T}}Ax \le \lambda_{\max}(A)\|x\|^2$ for all $x \in \mathbb{R}^n$; that is, $\lambda_{\min}(A)I \le A \le \lambda_{\max}(A)I$. In particular, if $A$ is real and symmetric, then $\lambda_{\min}(A) \ge 0$ (resp. $\lambda_{\min}(A) > 0$) if and only if $A \ge 0$ (resp. $A > 0$). A consequence is that a matrix $A \ge 0$ is nonsingular if and only if $A > 0$. It is readily seen that, for any rectangular matrix $M \in \mathbb{R}^{m\times n}$, we have $M^{\mathrm{T}}M \ge 0$ and $MM^{\mathrm{T}} \ge 0$.

7.2 Internal Stability [1, Sections 4.3–4.6][2, Chapters 6 & 7]

Let $A \in \mathbb{R}^{n\times n}$. In this section, we will study boundedness properties and asymptotic behavior of solutions of the homogeneous differential equation

\[ \dot{x}(t) = Ax(t), \quad t \in \mathbb{R}, \tag{7.1} \]

whose state transition matrix is denoted by $\Phi(t, t_0) \in \mathbb{R}^{n\times n}$ for $t, t_0 \in \mathbb{R}$.

7.2.1 Lyapunov Stability

Definition 7.1 The system (7.1) is said to be (uniformly) stable (in the sense of Lyapunov) if there exists a $c > 0$ such that

\[ \|x(t)\| \le c\|x(t_0)\| \quad\text{for all } t, t_0 \in \mathbb{R} \text{ with } t \ge t_0 \text{ and for all } x(t_0) \in \mathbb{R}^n. \]

If a system is uniformly stable in the sense of Lyapunov, then one can guarantee the entire state trajectory $x(t)$, $t \ge t_0$, remains within an arbitrarily small neighborhood of the origin by ensuring that the initial state $x(t_0)$ is sufficiently close to the origin. That is, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $\|x(t)\| < \varepsilon$ whenever $t \ge t_0$ and $\|x(t_0)\| < \delta$. Since we focus on linear time-invariant systems, this apparently weaker requirement is in fact equivalent to what Definition 7.1 requires. In Definition 7.1, uniformity refers to the fact that the constant $c$ in the definition can be chosen independently of the initial time $t_0$; this uniformity-in-time property is of course automatic for time-invariant systems. The following result shows that Lyapunov stability is equivalent to the boundedness of the state transition matrix.

Lemma 7.2 The system (7.1) is uniformly stable in the sense of Lyapunov if and only if there exists a $c > 0$ such that

\[ \|\Phi(t, t_0)\| \le c \]

for all $t, t_0 \in \mathbb{R}$ with $t \ge t_0$.

Proof. Suppose that such a $c$ exists. Then $\|x(t)\| = \|\Phi(t, t_0)x(t_0)\| \le \|\Phi(t, t_0)\|\|x(t_0)\| \le c\|x(t_0)\|$ whenever $t \ge t_0$, so sufficiency holds. Conversely, suppose that $\|x(t)\| \le c\|x_0\|$ whenever $t \ge t_0$ and $x(t_0) = x_0 \in \mathbb{R}^n$. By the definition of spectral norm, there exists an $x_0 \in \mathbb{R}^n$ such that $\|x_0\| = 1$ and $\|\Phi(t, t_0)x_0\| = \|\Phi(t, t_0)\|$. Choosing such an $x_0$ and letting $x(t_0) = x_0$ gives $\|\Phi(t, t_0)\| = \|\Phi(t, t_0)x_0\| = \|x(t)\| \le c\|x_0\| = c$, so necessity holds.

This result is valid for linear time-varying systems as well. Since Lyapunov stability is a property of the state matrix $A$ alone, we shall say that the matrix $A$ is stable to mean that the system (7.1) is stable.




Theorem 7.3 The following are equivalent:

(a) The matrix $A$ is uniformly stable in the sense of Lyapunov;

(b) There exists a $c > 0$ such that $\|e^{A\tau}\| \le c$ for all $\tau \ge 0$;

(c) All eigenvalues of $A$ have negative or zero real parts, and the geometric multiplicity is the same as the algebraic multiplicity for each eigenvalue with zero real part.

Proof. By Lemma 7.2, the matrix $A$ is stable if and only if $\|\Phi(t, t_0)\| = \|e^{A(t-t_0)}\|$ is bounded over all $t - t_0 \in [0, \infty)$, so (a) and (b) are equivalent. Let $P$ be a nonsingular matrix such that $J = P^{-1}AP$ is in the Jordan form. Then it follows from $e^{J\tau} = P^{-1}e^{A\tau}P$ and $e^{A\tau} = Pe^{J\tau}P^{-1}$ that $\|e^{A\tau}\|$ is bounded on $[0, \infty)$ if and only if $\|e^{J\tau}\|$ is bounded on $[0, \infty)$. Every nonzero entry of $e^{J\tau}$ is of the form $\beta\tau^k e^{\lambda\tau}$, where $\beta$ is a positive number, $k$ a nonnegative integer, and $\lambda$ an eigenvalue of $A$. If $\Re[\lambda] < 0$, then it is readily seen that $\beta\tau^k e^{\lambda\tau}$ is bounded on $[0, \infty)$. If $\Re[\lambda] = 0$, then $\beta\tau^k e^{\lambda\tau}$ is bounded on $[0, \infty)$ if and only if $k = 0$ (i.e., if and only if the size of each of the Jordan blocks associated with the eigenvalue $\lambda$ is one-by-one). Since every Jordan block associated with $\lambda$ is one-by-one if and only if $\lambda$ has the same geometric and algebraic multiplicities, equivalence of (b) and (c) follows.

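For example, consider

\[ A_1 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \quad\text{and}\quad A_2 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}. \]

The two eigenvalues $\pm i$ of $A_1$ are distinct and their real parts are nonpositive, so $A_1$ is Lyapunov stable. On the other hand, the eigenvalues of $A_2$ are $0$ and $-1$, where only one Jordan block is associated with the repeated eigenvalue $0$, so $A_2$ is not Lyapunov stable. Indeed,

\[ e^{A_1\tau} = \begin{bmatrix} \cos\tau & \sin\tau \\ -\sin\tau & \cos\tau \end{bmatrix} \quad\text{and}\quad e^{A_2\tau} = \begin{bmatrix} 1 & \tau & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{-\tau} \end{bmatrix}; \]

while $\|e^{A_1\tau}\| = 1 \le 1$ for all $\tau \ge 0$, we have $\|e^{A_2\tau}\| \to \infty$ as $\tau \to \infty$.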
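The contrast between $A_1$ and $A_2$ can be observed numerically. The Python sketch below uses the closed-form exponentials given above and estimates the spectral norm by power iteration on $M^{\mathrm{T}}M$; the function names, the iteration count, and the starting vector are assumptions made for this illustration rather than a prescribed method.

```python
import math

def spectral_norm(M, iterations=200):
    """Estimate the largest singular value of M by power iteration on M^T M."""
    rows, cols = len(M), len(M[0])
    G = [[sum(M[k][i] * M[k][j] for k in range(rows)) for j in range(cols)]
         for i in range(cols)]
    v = [1.0] * cols  # assumed not orthogonal to the dominant eigenvector of G
    for _ in range(iterations):
        w = [sum(G[i][j] * v[j] for j in range(cols)) for i in range(cols)]
        length = math.sqrt(sum(x * x for x in w))
        if length == 0.0:
            return 0.0
        v = [x / length for x in w]
    Gv = [sum(G[i][j] * v[j] for j in range(cols)) for i in range(cols)]
    return math.sqrt(sum(v[i] * Gv[i] for i in range(cols)))  # Rayleigh quotient

def exp_A1(tau):
    """Closed-form e^{A1 tau} (a rotation matrix)."""
    return [[math.cos(tau), math.sin(tau)], [-math.sin(tau), math.cos(tau)]]

def exp_A2(tau):
    """Closed-form e^{A2 tau}; the entry tau comes from the 2x2 Jordan block at 0."""
    return [[1.0, tau, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, math.exp(-tau)]]

for tau in (0.0, 1.0, 10.0, 100.0):
    print(tau, spectral_norm(exp_A1(tau)), spectral_norm(exp_A2(tau)))
```

The first norm stays at one for every $\tau$, while the second grows roughly linearly in $\tau$, which is the unboundedness that rules out Lyapunov stability for $A_2$.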

7.2.2 Asymptotic Stability

Definition 7.4 The system (7.1) is said to be (uniformly) asymptotically stable if it is stable and if given any $c > 0$ there exists a $T > 0$ such that

\[ \|x(t)\| \le c\|x(t_0)\| \]

for all $t_0 \in \mathbb{R}$, for all $t \ge t_0 + T$, and for all $x(t_0) \in \mathbb{R}^n$.

Definition 7.5 The system (7.1) is said to be (uniformly) exponentially stable if there exist $c, \lambda > 0$ such that

\[ \|x(t)\| \le ce^{-\lambda(t-t_0)}\|x(t_0)\| \quad\text{for all } t, t_0 \in \mathbb{R} \text{ with } t \ge t_0 \text{ and for all } x(t_0) \in \mathbb{R}^n. \]

What exponential stability requires is that, no matter what the initial state is, one can guarantee the state $x(t)$ remains within an arbitrarily small neighborhood of the origin by ensuring that the time span $t - t_0$ is sufficiently large. That is, exponential stability guarantees asymptotic stability; the converse is not true for general dynamical systems because exponential stability requires not only that the state decays over time but also that the decay rate is exponential. However, for linear (time-varying and time-invariant) systems, uniform asymptotic stability and uniform exponential stability are equivalent. Note that we require uniformity in time (i.e., $T$ in Definition 7.4 and $c, \lambda$ in Definition 7.5 are independent of $t_0$), which is automatic for time-invariant systems.

Lemma 7.6 The following are equivalent:

(a) The system (7.1) is uniformly asymptotically stable;

(b) The system (7.1) is uniformly exponentially stable;

(c) There exist $c, \lambda > 0$ such that $\|\Phi(t, t_0)\| \le ce^{-\lambda(t-t_0)}$ for all $t, t_0 \in \mathbb{R}$ with $t \ge t_0$;

(d) There exists $\beta > 0$ such that $\int_{t_0}^{t}\|\Phi(t, \tau)\|\,d\tau \le \beta$ for all $t, t_0 \in \mathbb{R}$ with $t \ge t_0$.

Proof. We will show (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a). Suppose the system is asymptotically stable. Then, there exists a $T > 0$ such that $\|x(t_0 + T)\| \le 2^{-1}\|x(t_0)\|$, $\|x(t_0 + 2T)\| \le 2^{-1}\|x(t_0 + T)\|$, ..., for all $t_0$ and $x(t_0)$. This gives that, for $t \in [t_0 + kT, t_0 + (k+1)T)$ and $k = 0, 1, \dots$, we have

\[ \|x(t)\| \le \|\Phi(t, t_0 + kT)\|\|x(t_0 + kT)\| \le \|\Phi(t, t_0 + kT)\|\,2^{-k}\|x(t_0)\|. \]

Moreover, since the system is Lyapunov stable as well, Lemma 7.2 implies that there is a $c > 0$ such that $\|\Phi(t, t_0 + kT)\| \le c$ whenever $t \ge t_0 + kT$. Thus, for $t \in [t_0 + kT, t_0 + (k+1)T)$ and $k = 0, 1, \dots$, we have

\[ \|x(t)\| \le c\,2^{-k}\|x(t_0)\| \le 2c\,2^{-(t-t_0)/T}\|x(t_0)\| = 2c\,e^{\frac{t-t_0}{T}\ln\left(\frac12\right)}\|x(t_0)\|, \]

so putting $\lambda = (-1/T)\ln(1/2)$ establishes that (a) implies (b). Next, suppose the system is exponentially stable, so that $\|x(t)\| \le ce^{-\lambda(t-t_0)}\|x(t_0)\|$ whenever $t \ge t_0$ and $x(t_0) \in \mathbb{R}^n$. Choose an $x_0 \in \mathbb{R}^n$ such that $\|x_0\| = 1$ and $\|\Phi(t, t_0)x_0\| = \|\Phi(t, t_0)\|$. Then letting $x(t_0) = x_0$ gives

\[ \|\Phi(t, t_0)\| = \|\Phi(t, t_0)x_0\| = \|x(t)\| \le ce^{-\lambda(t-t_0)}\|x_0\| = ce^{-\lambda(t-t_0)}, \]

so (b) implies (c). Now, suppose $\|\Phi(t, t_0)\| \le ce^{-\lambda(t-t_0)}$ whenever $t \ge t_0$. Then we have

\[ \int_{t_0}^{t}\|\Phi(t, \tau)\|\,d\tau \le \int_{t_0}^{t} ce^{-\lambda(t-\tau)}\,d\tau = \frac{c}{\lambda}\left(1 - e^{-\lambda(t-t_0)}\right) \le \frac{c}{\lambda} \]

whenever $t \ge t_0$. Thus putting $\beta = c/\lambda$ establishes that (c) implies (d). Finally, suppose $\int_{t_0}^{t}\|\Phi(t, \tau)\|\,d\tau \le \beta$ holds whenever $t \ge t_0$. It is readily seen that $\int_{t_0}^{t}\frac{\partial}{\partial\tau}\Phi(t, \tau)\,d\tau = I - \Phi(t, t_0)$. Also, term-by-term differentiation of the Peano-Baker series yields $\frac{\partial}{\partial\tau}\Phi(t, \tau) = -\Phi(t, \tau)A$. Hence we have $\Phi(t, t_0) = I + \int_{t_0}^{t}\Phi(t, \tau)A\,d\tau$, which leads to

\[ \|\Phi(t, t_0)\| \le \|I\| + \int_{t_0}^{t}\|\Phi(t, \tau)\|\|A\|\,d\tau \le 1 + \beta\|A\|, \tag{7.2} \]

which implies the system is Lyapunov stable. Moreover, for $t > t_0$, we have

\[ \|\Phi(t, t_0)\| = \frac{1}{t - t_0}\int_{t_0}^{t}\|\Phi(t, t_0)\|\,d\tau \le \frac{1}{t - t_0}\int_{t_0}^{t}\|\Phi(t, \tau)\|\|\Phi(\tau, t_0)\|\,d\tau \le \frac{\beta(1 + \beta\|A\|)}{t - t_0}, \]

where the last inequality follows from (d) and (7.2). For any $c > 0$, putting $T = \beta(1 + \beta\|A\|)/c$ gives $\|\Phi(t, t_0)\| \le c$ for $t \ge t_0 + T$, or $\|x(t)\| \le \|\Phi(t, t_0)\|\|x(t_0)\| \le c\|x(t_0)\|$ for $t \ge t_0 + T$ and $x(t_0) \in \mathbb{R}^n$. Thus (d) implies (a).

This lemma is also valid for time-varying systems with bounded $A(\cdot)$. As in the case of Lyapunov stability, asymptotic stability of the system (7.1) is determined solely by the structure (i.e., the Jordan form) of $A$. Therefore, we shall speak of asymptotic stability and exponential stability of the matrix $A$.

Theorem 7.7 The following are equivalent:

(a) The matrix $A$ is uniformly asymptotically stable;

(b) The matrix $A$ is uniformly exponentially stable;

(c) There exist $c, \lambda > 0$ such that $\|e^{A\tau}\| \le ce^{-\lambda\tau}$ for all $\tau \ge 0$;

(d) There exists $\beta > 0$ such that $\int_0^\infty\|e^{A\tau}\|\,d\tau \le \beta$;

(e) All eigenvalues of $A$ have negative real parts;

(f) There exists a $Q > 0$ such that $A^{\mathrm{T}}Q + QA < 0$. (Lyapunov Inequality)

Proof. The equivalence of (a), (b), (c), and (d) is immediate from Lemma 7.6. To show (a) and (e) are equivalent, let $P$ be a nonsingular matrix such that $J = P^{-1}AP$ is in the Jordan form. It follows from $e^{J\tau} = P^{-1}e^{A\tau}P$ and $e^{A\tau} = Pe^{J\tau}P^{-1}$ that $\|e^{J\tau}\|$ is bounded and converges to zero as $\tau \to \infty$ if and only if $\|e^{A\tau}\|$ is bounded and converges to zero as $\tau \to \infty$. In particular, since every nonzero entry of $e^{J\tau}$ is of the form $\beta\tau^k e^{\lambda\tau}$, where $\beta$ is a positive number, $k$ a nonnegative integer, and $\lambda$ an eigenvalue of $A$, we obtain that $\|e^{A\tau}\| \to 0$ if and only if all eigenvalues of $A$ have negative real parts. Thus (a) and (e) are equivalent.

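We will complete the proof by showing that (c) and (f) are equivalent. Suppose $\|e^{A\tau}\| \le ce^{-\lambda\tau}$ for all $\tau \ge 0$. Let

\[ Q = \int_0^\infty e^{A^{\mathrm{T}}\tau}e^{A\tau}\,d\tau. \]

Since $\|e^{A^{\mathrm{T}}\tau}\| = \|(e^{A\tau})^{\mathrm{T}}\| = \|e^{A\tau}\|$, we have

\[ \left\|\int_0^\infty e^{A^{\mathrm{T}}\tau}e^{A\tau}\,d\tau\right\| \le \int_0^\infty\|e^{A^{\mathrm{T}}\tau}\|\|e^{A\tau}\|\,d\tau \le \int_0^\infty c^2e^{-2\lambda\tau}\,d\tau = \frac{c^2}{2\lambda}, \]

so $Q$ is well-defined. Using the fact that $\frac{d}{d\tau}e^{A\tau} = Ae^{A\tau} = e^{A\tau}A$, we obtain

\[ x^{\mathrm{T}}\left(A^{\mathrm{T}}Q + QA\right)x = \int_0^\infty x^{\mathrm{T}}\left(\left(\frac{d}{d\tau}e^{A^{\mathrm{T}}\tau}\right)e^{A\tau} + e^{A^{\mathrm{T}}\tau}\left(\frac{d}{d\tau}e^{A\tau}\right)\right)x\,d\tau = \int_0^\infty\frac{d}{d\tau}\|e^{A\tau}x\|^2\,d\tau = \|e^{A\tau}x\|^2\Big|_{\tau=0}^{\infty} = -\|x\|^2 \]

for all $x \in \mathbb{R}^n$, so $A^{\mathrm{T}}Q + QA = -I < 0$. Now, to show (c) implies (f), it remains to prove $Q > 0$. By definition, $Q = Q^{\mathrm{T}}$. Also, since

\[ \frac{d}{d\tau}\left(x^{\mathrm{T}}e^{A^{\mathrm{T}}\tau}e^{A\tau}x\right) = x^{\mathrm{T}}e^{A^{\mathrm{T}}\tau}\left(A^{\mathrm{T}} + A\right)e^{A\tau}x \ge -\|A^{\mathrm{T}} + A\|\|e^{A\tau}x\|^2 \ge -2\|A\|\,x^{\mathrm{T}}e^{A^{\mathrm{T}}\tau}e^{A\tau}x \]

for all $x \in \mathbb{R}^n$, we obtain

\[ x^{\mathrm{T}}Qx \ge -\frac{1}{2\|A\|}\int_0^\infty\frac{d}{d\tau}\left(x^{\mathrm{T}}e^{A^{\mathrm{T}}\tau}e^{A\tau}x\right)d\tau \ge -\frac{1}{2\|A\|}\int_0^\infty\frac{d}{d\tau}\|e^{A\tau}x\|^2\,d\tau = \frac{1}{2\|A\|}\|x\|^2 \]

for all $x \in \mathbb{R}^n$. That is, $Q \ge (1/(2\|A\|))I > 0$. This establishes the implication from (c) to (f). To show the converse, suppose $Q > 0$ and $A^{\mathrm{T}}Q + QA < 0$. Let $\eta = \lambda_{\min}(Q)$, $\rho = \lambda_{\max}(Q)$, and $\nu = \lambda_{\min}(-A^{\mathrm{T}}Q - QA)$. Then

\[ \frac{d}{dt}\left(x(t)^{\mathrm{T}}Qx(t)\right) = x(t)^{\mathrm{T}}(A^{\mathrm{T}}Q + QA)x(t) \le -\nu\|x(t)\|^2 \quad\text{and}\quad x(t)^{\mathrm{T}}Qx(t) \le \rho\|x(t)\|^2 \]

for $t \ge t_0$, so that

\[ \frac{d}{dt}\left(x(t)^{\mathrm{T}}Qx(t)\right) \le -\frac{\nu}{\rho}\,x(t)^{\mathrm{T}}Qx(t) \]

for $t \ge t_0$. If we put $V(t) = x(t)^{\mathrm{T}}Qx(t)$, then this implies after integrating from $t_0$ to $t$ that $V(t) \le e^{-\frac{\nu}{\rho}(t-t_0)}V(t_0)$, and hence that

\[ x(t)^{\mathrm{T}}Qx(t) \le e^{-\frac{\nu}{\rho}(t-t_0)}x(t_0)^{\mathrm{T}}Qx(t_0) \]

for $t \ge t_0$, which gives

\[ \|x(t)\|^2 \le \frac{1}{\eta}x(t)^{\mathrm{T}}Qx(t) \le \frac{1}{\eta}e^{-\frac{\nu}{\rho}(t-t_0)}x(t_0)^{\mathrm{T}}Qx(t_0) \le \frac{\rho}{\eta}e^{-\frac{\nu}{\rho}(t-t_0)}\|x(t_0)\|^2 \]

for $t \ge t_0$. Thus condition (c) holds with $c = \sqrt{\rho/\eta}$ and $\lambda = \nu/(2\rho)$. This completes the proof.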

For example, while neither of the matrices $A_1$ and $A_2$ considered at the end of Section 7.2.1 is asymptotically stable, the matrix $A = \begin{bmatrix} -1 & -1 \\ 1 & -1 \end{bmatrix}$ is asymptotically stable (and hence exponentially stable) because all its eigenvalues $-1 \pm i$ have negative real parts. Also, with $Q = I$, we have $A^{\mathrm{T}}Q + QA = A^{\mathrm{T}} + A = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix} < 0$ and $Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} > 0$, so the Lyapunov inequality is satisfied.

In the proof of Theorem 7.7, a function of the form $V(t) = x(t)^{\mathrm{T}}Qx(t)$ is used to show that the feasibility of the Lyapunov inequality implies asymptotic stability. Such a function defines, in a sense, the "total energy" of the system because there are continuous, strictly increasing functions $\alpha(\cdot)$ and $\beta(\cdot)$ with $\alpha(0) = \beta(0) = 0$ such that $\alpha(\|x(t)\|) \le V(t) \le \beta(\|x(t)\|)$. Thus (it can be shown that) the asymptotic stability of the system is guaranteed if there is a continuous, strictly increasing function $\gamma(\cdot)$ with $\gamma(0) = 0$ such that $\dot{V}(t) \le -\gamma(\|x(t)\|)$. Such a "total energy" function that can be used to show the asymptotic stability of a dynamical system is called a Lyapunov function. As shown in the proof of Theorem 7.7, the feasibility of the Lyapunov inequality guarantees the existence of a Lyapunov function and hence the asymptotic stability of the state matrix $A$. Indeed, for the matrix $A$ in the example above, we have $e^{A\tau} = \begin{bmatrix} e^{-\tau}\cos\tau & -e^{-\tau}\sin\tau \\ e^{-\tau}\sin\tau & e^{-\tau}\cos\tau \end{bmatrix}$, and so $\|e^{A\tau}\| \to 0$ as $\tau \to \infty$.
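The Lyapunov inequality and the resulting decay bound for this example can be checked with a short Python sketch. It is restricted to $2\times 2$ matrices, and the helper names `lyapunov_lhs`, `sym_eigs_2x2` and `V`, as well as the initial state used at the end, are assumptions introduced here for illustration.

```python
import math

def lyapunov_lhs(A, Q):
    """Return A^T Q + Q A for square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[k][i] * Q[k][j] + Q[i][k] * A[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def sym_eigs_2x2(M):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

A = [[-1.0, -1.0], [1.0, -1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]

L = lyapunov_lhs(A, Q)        # equals A^T + A when Q = I
nu = -sym_eigs_2x2(L)[1]      # lambda_min(-(A^T Q + Q A))
eta, rho = sym_eigs_2x2(Q)    # lambda_min(Q), lambda_max(Q)

def V(tau, x0):
    """V = x^T Q x along x(tau) = e^{A tau} x0, using the closed-form e^{A tau}."""
    c, s, d = math.cos(tau), math.sin(tau), math.exp(-tau)
    x = [d * (c * x0[0] - s * x0[1]), d * (s * x0[0] + c * x0[1])]
    return sum(x[i] * Q[i][j] * x[j] for i in range(2) for j in range(2))

x0 = [1.0, 2.0]
for tau in (0.0, 0.5, 1.0, 2.0):
    bound = math.exp(-(nu / rho) * tau) * V(0.0, x0)
    print(tau, V(tau, x0), bound)
```

With $Q = I$ the bound $V(t) \le e^{-(\nu/\rho)t}V(0)$ from the proof of Theorem 7.7 holds with equality for this matrix, because $e^{A\tau}$ is a rotation scaled by $e^{-\tau}$.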

7.3 Input-Output Stability [1, Section 4.7][2, Chapter 12]

In this section, we will study the boundedness property of the zero-state response of linear systems

\[ \begin{aligned} \dot{x}(t) &= Ax(t) + Bu(t), \\ y(t) &= Cx(t) + Du(t). \end{aligned} \tag{7.3} \]

The input-output behavior of (7.3), under the zero initial state, is specified by the impulse response matrix $H(t, t_0) = C\Phi(t, t_0)B + D\delta(t - t_0)$ for $t \ge t_0$, where $\delta$ is the Dirac delta function.




Definition 7.8 The system (7.3) is said to be (uniformly) bounded-input bounded-output stable (or BIBO stable) if there exists an $\eta > 0$ such that, whenever $x(t_0) = 0$, we have

\[ \|y(t)\| \le \eta\sup_{\tau\ge t_0}\|u(\tau)\| \]

for all $t, t_0 \in \mathbb{R}$ with $t \ge t_0$ and for all (piecewise continuous) $u$ with $\sup_{\tau\ge t_0}\|u(\tau)\| < \infty$.

Here, $\sup_{\tau\ge t_0}\|u(\tau)\|$ denotes the supremum (or least upper bound) of the set $\{\|u(\tau)\| : \tau \ge t_0\}$. That is, if $r = \sup_{\tau\ge t_0}\|u(\tau)\|$, then we have that $\|u(\tau)\| \le r$ for all $\tau \ge t_0$ and that $r \le r'$ whenever $\|u(\tau)\| \le r'$ for all $\tau \ge t_0$. Thus, the output of a BIBO stable system is bounded whenever the input signal is bounded. Moreover, the uniformity requirement says that the single constant $\eta$ in Definition 7.8 works over all initial times $t_0$. Of course, this uniformity requirement is automatic and hence redundant for time-invariant systems.

Lemma 7.9 The system (7.3) is uniformly bounded-input bounded-output stable if and only if there exists a $\rho > 0$ such that

\[ \int_{t_0}^{t}\|H(t, \tau)\|\,d\tau \le \rho \]

for all $t, t_0 \in \mathbb{R}$ with $t \ge t_0$.

Proof. Suppose $\int_{t_0}^{t}\|H(t, \tau)\|\,d\tau \le \rho$ for $t \ge t_0$. If $u$ is bounded and $x(t_0) = 0$, we have

\[ \|y(t)\| = \left\|\int_{t_0}^{t} H(t, \tau)u(\tau)\,d\tau\right\| \le \int_{t_0}^{t}\|H(t, \tau)\|\|u(\tau)\|\,d\tau \le \left(\int_{t_0}^{t}\|H(t, \tau)\|\,d\tau\right)\sup_{\tau\ge t_0}\|u(\tau)\| \le \rho\sup_{\tau\ge t_0}\|u(\tau)\| \]

for $t \ge t_0$, and so putting $\eta = \rho$ yields the sufficiency part of the desired result. To show necessity, we will first consider the SISO case where $H(t, \tau) = h(t, \tau)$ and $y(t) = \int_{t_0}^{t} h(t, \tau)u(\tau)\,d\tau$ whenever $x(t_0) = 0$ and $t \ge t_0$. Suppose the system is BIBO stable and yet no $\rho > 0$ satisfies $\int_{t_0}^{t}|h(t, \tau)|\,d\tau \le \rho$ for all $t, t_0 \in \mathbb{R}$ with $t \ge t_0$; that is, for each $\rho > 0$, there exist $t_\rho$ and $s_\rho$ with $t_\rho > s_\rho$ such that $\int_{s_\rho}^{t_\rho}|h(t_\rho, \tau)|\,d\tau > \rho$. Define the input signal $u(t)$, $t \ge s_\rho$, by

\[ u(t) = \begin{cases} 1 & \text{if } t \in [s_\rho, t_\rho) \text{ and } h(t_\rho, t) > 0; \\ -1 & \text{if } t \in [s_\rho, t_\rho) \text{ and } h(t_\rho, t) < 0; \\ 0 & \text{otherwise.} \end{cases} \]

Clearly, this $u$ is bounded with $\sup_{\tau\ge s_\rho}|u(\tau)| = 1$. However, when $x(s_\rho) = 0$, we have

\[ |y(t_\rho)| = \left|\int_{s_\rho}^{t_\rho} h(t_\rho, \tau)u(\tau)\,d\tau\right| = \int_{s_\rho}^{t_\rho}|h(t_\rho, \tau)|\,d\tau > \rho = \rho\sup_{\tau\ge s_\rho}|u(\tau)|. \]

Since this holds for each $\rho > 0$, there does not exist an $\eta > 0$ such that $|y(t)| \le \eta\sup_{\tau\ge t_0}|u(\tau)|$ for all $t, t_0 \in \mathbb{R}$ with $t \ge t_0$. This contradicts the BIBO stability assumption, and hence proves the necessity part of the result for SISO cases. It remains to generalize necessity to MIMO cases. Suppose no $\rho > 0$ satisfies $\int_{t_0}^{t}\|H(t, \tau)\|\,d\tau \le \rho$ for all $t, t_0 \in \mathbb{R}$ with $t \ge t_0$. Then, for each $\eta > 0$, there exist $i_\eta$, $j_\eta$, $s_\eta$, and $t_\eta$ with $t_\eta > s_\eta$ such that $\int_{s_\eta}^{t_\eta}|h_{i_\eta j_\eta}(t_\eta, \tau)|\,d\tau > \eta$, where $h_{ij}$ denotes entry $(i, j)$ of $H$. By mimicking the SISO case, we can define a bounded $u$ with $\sup_{\tau\ge s_\eta}\|u(\tau)\| = 1$, so that $\|y(t_\eta)\| > \eta$. This way, necessity is established for MIMO cases as well.

Theorem 7.10 If the system (7.3) is uniformly asymptotically stable, then it is uniformly bounded-input bounded-output stable as well.

Proof. By Lemma 7.6 we have that there are constants $c, \lambda > 0$ such that

\[ \int_{t_0}^{t}\|H(t, \tau)\|\,d\tau \le \int_{t_0}^{t}\|C\Phi(t, \tau)B\|\,d\tau + \|D\| \le \left(\int_{t_0}^{t} ce^{-\lambda(t-\tau)}\,d\tau\right)\|C\|\|B\| + \|D\| \le (c/\lambda)\|C\|\|B\| + \|D\| \]

whenever $t \ge t_0$. Thus Lemma 7.9 with $\rho = (c/\lambda)\|C\|\|B\| + \|D\|$ establishes the result.

The above results, namely Lemma 7.9 and Theorem 7.10, are also valid for LTV systems. Because $y(t) = \int_{t_0}^{t} C\Phi(t, \tau)Bu(\tau)\,d\tau + Du(t)$, $t \ge t_0$, whenever $x(t_0) = 0$, and $Du(\cdot)$ is bounded whenever $u(\cdot)$ is bounded, BIBO stability is independent of the "direct transmission" matrix $D$. Thus we will speak of the BIBO stability of triples $(A, B, C)$. According to Theorem 7.10, if $A$ is asymptotically stable, then the triple $(A, B, C)$ is BIBO stable regardless of $B$ or $C$.

Theorem 7.11 Let $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, and $C \in \mathbb{R}^{l\times n}$. The following are equivalent:

(a) The triple $(A, B, C)$ is uniformly bounded-input bounded-output stable;

(b) There exists $\rho > 0$ such that $\int_0^\infty\|Ce^{A\tau}B\|\,d\tau \le \rho$;

(c) All the poles of each entry of $C(sI - A)^{-1}B$ have negative real parts.

Proof. Equivalence of (a) and (b) is due to Lemma 7.9, so it remains to show that (b) and (c) are equivalent. Each entry of $C(sI - A)^{-1}B$ is a strictly proper rational function (i.e., the degree of the numerator polynomial is strictly less than the degree of the denominator polynomial), and hence can be expanded by partial fraction expansion into a sum of a finite number of terms of the form $\beta/(s - \lambda)^k$, where $\lambda$ is a pole. Consequently, each entry of the impulse response matrix is a sum of a finite number of terms of the form $\beta\tau^{k-1}e^{\lambda\tau}$, which is integrable if and only if $\lambda$ has a negative real part. Hence we conclude that conditions (b) and (c) are equivalent.

For example, consider $\dot{x}(t) = Ax(t) + Bu(t)$ and $y(t) = Cx(t)$, where

\[ A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad\text{and}\quad C = \begin{bmatrix} 1 & 0 \end{bmatrix}. \]

Since the eigenvalues of $A$ are $\pm 1$ and one of them has positive real part, the system is internally unstable. However, the transfer function of the system is $h(s) = C(sI - A)^{-1}B = 1/(s + 1)$, so the system is BIBO stable. While the system internally has two modes (namely, $e^{-\tau}$ and $e^{\tau}$), only the stable mode $e^{-\tau}$ is "visible" from outside; that is, the unstable mode $e^{\tau}$ does not manifest itself as a pole of the transfer function from $u$ to $y$. Thus, such a system cannot be stabilized by any feedback controller that generates the control input $u$ based on the measurement $y$. In this case, one might want to replace either the input matrix $B$ (i.e., the actuator) or the output matrix $C$ (i.e., the sensor) to make the unstable mode appear as a pole of the transfer function. For instance, if we replace $C$ with $C = [0 \;\; 1]$, then the transfer function becomes $1/(s - 1)$ and it is possible to stabilize the system via feedback control; in this case, the mode associated with the eigenvalue $-1$, which does not appear in the transfer function, is asymptotically stable and so it does not affect the closed-loop stability.
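To make the hidden-mode example concrete, the Python sketch below evaluates the impulse response $h(t) = Ce^{At}B$ for the diagonal $A$ above and approximates $\int_0^T |h(t)|\,dt$ with a midpoint rule, in the spirit of Theorem 7.11(b). The function names, the horizon $T = 20$, and the step count are assumptions chosen here for illustration.

```python
import math

def impulse_response(C, B, eigenvalues, t):
    """h(t) = C e^{At} B when A = diag(eigenvalues) (single input, single output)."""
    return sum(c * math.exp(lam * t) * b for c, lam, b in zip(C, eigenvalues, B))

def integral_abs(h, T, steps=20000):
    """Midpoint-rule approximation of the integral of |h(t)| over [0, T]."""
    dt = T / steps
    return dt * sum(abs(h((i + 0.5) * dt)) for i in range(steps))

eigenvalues = [-1.0, 1.0]    # A = diag(-1, 1)
B = [1.0, 1.0]
C_original = [1.0, 0.0]      # only the stable mode e^{-t} reaches the output
C_replaced = [0.0, 1.0]      # now the unstable mode e^{t} reaches the output

T = 20.0
area_original = integral_abs(lambda t: impulse_response(C_original, B, eigenvalues, t), T)
area_replaced = integral_abs(lambda t: impulse_response(C_replaced, B, eigenvalues, t), T)

print(area_original)   # stays near 1 as T grows
print(area_replaced)   # grows like e^T
```

With the original $C$ the integral stays bounded even though $A$ has an unstable eigenvalue, whereas with the replaced $C$ it grows without bound as the horizon increases.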




7.4 Stability of Discrete-Time Linear Systems [1, Section 4.8][3, Section 6.10]

Now, let us consider the internal stability and input-output stability of discrete-time linear time-invariant systems of the form

\[ \begin{aligned} x(t+1) &= Ax(t) + Bu(t), \\ y(t) &= Cx(t) + Du(t). \end{aligned} \tag{7.4} \]

The results and their proofs are analogous to the continuous-time cases, and thus we will present them without proof.

Definition 7.12 The system (7.4) is said to be (uniformly) exponentially stable if there exist $c > 0$ and $\lambda \in (0, 1)$ such that, whenever $u(t) = 0$ for all $t \ge t_0$,

\[ \|x(t)\| \le c\lambda^{t-t_0}\|x(t_0)\| \]

for all $t_0, t \in \mathbb{Z}$ with $t \ge t_0$ and for all $x(t_0) \in \mathbb{R}^n$.

Theorem 7.13 Let $A \in \mathbb{R}^{n\times n}$. The following are equivalent:

(a) The discrete-time linear time-invariant system (7.4) is uniformly exponentially stable;

(b) There exist $c > 0$ and $\lambda \in (0, 1)$ such that $\|A^k\| \le c\lambda^k$ for all $k = 0, 1, \dots$;

(c) The magnitude of every eigenvalue of $A$ is less than one;

(d) There exists a $Q > 0$ such that $A^{\mathrm{T}}QA - Q < 0$. (discrete-time Lyapunov inequality)
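One way to exhibit a feasible $Q$ in (d) is the discrete-time analogue of the integral used in the proof of Theorem 7.7, namely $Q = \sum_{k=0}^{\infty}(A^{\mathrm{T}})^kA^k$, which satisfies $A^{\mathrm{T}}QA - Q = -I$ when the sum converges. This construction is not stated in the notes and is offered only as a sketch; the function names, the truncation at 200 terms, and the test matrix are assumptions made here.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def discrete_lyapunov_series(A, terms=200):
    """Truncated sum Q = sum_{k<terms} (A^k)^T A^k."""
    n = len(A)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        PtP = mat_mul(transpose(power), power)
        Q = [[Q[i][j] + PtP[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, A)  # A^k -> A^{k+1}
    return Q

A = [[0.5, 1.0], [0.0, 0.25]]   # eigenvalues 0.5 and 0.25, magnitude less than one
Q = discrete_lyapunov_series(A)

AtQA = mat_mul(mat_mul(transpose(A), Q), A)
# Residual of A^T Q A - Q = -I; it should be close to the zero matrix
residual = [[AtQA[i][j] - Q[i][j] + float(i == j) for j in range(2)]
            for i in range(2)]

print(Q)
print(residual)
```

For a $2\times 2$ symmetric matrix, positive definiteness can be read off from a positive (1,1) entry and a positive determinant, which is how the result may be checked here.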

Definition 7.14 The system (7.4) is said to be (uniformly) bounded-input bounded-output stable if there exists an $\eta > 0$ such that, whenever $x(t_0) = 0$, we have

\[ \|y(t)\| \le \eta\sup_{\tau\ge t_0}\|u(\tau)\| \]

for all $t, t_0 \in \mathbb{Z}$ with $t \ge t_0$ and for all $u$ with $\sup_{\tau\ge t_0}\|u(\tau)\| < \infty$.


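Theorem 7.15 Let $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, and $C \in \mathbb{R}^{l\times n}$. The following are equivalent:

(a) The system (7.4) is uniformly bounded-input bounded-output stable;

(b) There exists $\rho > 0$ such that $\sum_{k=0}^{\infty}\|CA^kB\| \le \rho$;

(c) All the poles of each entry of $C(zI - A)^{-1}B$ have magnitude less than one.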

References



[3] P. J. Antsaklis and A. N. Michel, Linear Systems , 2nd ed. Boston, MA: Birkhauser, 2006.




VIII. Controllability of Linear Systems



Fall 2010

8.1 Introduction

Analogous to the fact that matrices are linear maps between finite-dimensional vector spaces, linear dynamical systems are linear operators between infinite-dimensional signal spaces. If $T : C(\mathbb{R}, \mathbb{R}^m) \to C(\mathbb{R}, \mathbb{R}^l)$ is a linear operator, for instance, then the null space (or kernel) of $T$ is defined as

\[ \mathcal{N}(T) = \{v \in C(\mathbb{R}, \mathbb{R}^m) : Tv = 0\}, \]

and the range (or image) of $T$ is defined as

\[ \mathcal{R}(T) = \{w \in C(\mathbb{R}, \mathbb{R}^l) : w = Tv,\ v \in C(\mathbb{R}, \mathbb{R}^m)\}. \]

Clearly, $\mathcal{N}(T)$ and $\mathcal{R}(T)$ are linear subspaces of $C(\mathbb{R}, \mathbb{R}^m)$ and $C(\mathbb{R}, \mathbb{R}^l)$, respectively. These subspaces for other types of linear operators, like $T : C(\mathbb{R}, \mathbb{R}^m) \to \mathbb{R}^n$ or $T : \mathbb{R}^n \to C(\mathbb{R}, \mathbb{R}^l)$, are defined in the obvious way. Again, our notational convention is as follows: If $u \in C(\mathbb{R}, \mathbb{R}^m)$ and $T : C(\mathbb{R}, \mathbb{R}^m) \to C(\mathbb{R}, \mathbb{R}^l)$, then, while $u = u(\cdot) \in C(\mathbb{R}, \mathbb{R}^m)$ and $Tu = (Tu)(\cdot) \in C(\mathbb{R}, \mathbb{R}^l)$, we have $u(t) \in \mathbb{R}^m$ and $(Tu)(t) \in \mathbb{R}^l$ for each $t \in \mathbb{R}$.

Consider the state equation

\[ \dot{x}(t) = Ax(t) + Bu(t), \tag{8.1} \]

where $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{n\times m}$ are constant matrices. We are concerned with the problem of steering the state $x$ of the system (8.1) to a desired value by generating an appropriate control input signal $u$. This is a "controllability" problem. Such a problem arises, for instance, in track-seeking control of a hard disk drive, where the magnetic head of the hard disk drive needs to be steered from one position to another within a given time span. (Once the magnetic head has been positioned on or near the desired track, a track-following control mode would kick in to maintain the head on the desired track to read the data on the disk; this is a "stability" problem.)

8.2 Reachable States

Definition 8.1 A state $x_1 \in \mathbb{R}^n$ is said to be reachable (or controllable from the origin) if there exists an input $u$ that transfers the state $x$ from $x(0) = 0$ to $x(T) = x_1$ for some $T \in (0, \infty)$.

For $t_0 \le t_1$, let $L_r(t_0, t_1) : C(\mathbb{R}, \mathbb{R}^m) \to \mathbb{R}^n$ be the linear operator defined by

\[ L_r(t_0, t_1)u = \int_{t_0}^{t_1} e^{A(t_1-\tau)}Bu(\tau)\,d\tau \]

for $u \in C(\mathbb{R}, \mathbb{R}^m)$. We have $x(t) = e^{At}x(0) + L_r(0, t)u$ for all $t \in \mathbb{R}$, so a state $x_1$ is reachable if there exists an input $u$ such that $x_1 = L_r(0, T)u$ for some $T \in (0, \infty)$. If we denote the set of all reachable states by $\mathcal{R}_r$, then $\mathcal{R}_r$ constitutes a linear subspace of the state space $\mathbb{R}^n$ given by

\[ \mathcal{R}_r = \{x_1 \in \mathbb{R}^n : x_1 \in \mathcal{R}(L_r(0, T)) \text{ for some } T \in (0, \infty)\}. \]

That is, $x_1$ is reachable if and only if $x_1 \in \mathcal{R}_r$.

Definition 8.2 The reachability Gramian of the system (8.1) is the matrix

\[ W_r(t_0, t_1) = \int_{t_0}^{t_1} e^{A(t_1-\tau)}BB^{\mathrm{T}}e^{A^{\mathrm{T}}(t_1-\tau)}\,d\tau \in \mathbb{R}^{n\times n}, \quad t_0 \le t_1. \]

Clearly, we have $W_r(t_0, t_1) \ge 0$ for all $t_0, t_1 \in \mathbb{R}$ with $t_0 \le t_1$.

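Lemma 8.3 $\mathcal{R}(L_r(t_0, t_1)) = \mathcal{R}(W_r(t_0, t_1))$ for all $t_0, t_1 \in \mathbb{R}$ with $t_0 \le t_1$.

Proof. For notational convenience, let us drop the arguments $t_0$ and $t_1$ whenever there is no ambiguity. To show $\mathcal{R}(W_r) \subset \mathcal{R}(L_r)$, choose an $x_1 \in \mathcal{R}(W_r)$. Then $x_1 = W_r\eta_1$ for some $\eta_1 \in \mathbb{R}^n$. If we put $u(\tau) = B^{\mathrm{T}}e^{A^{\mathrm{T}}(t_1-\tau)}\eta_1$, then

\[ L_ru = \int_{t_0}^{t_1} e^{A(t_1-\tau)}Bu(\tau)\,d\tau = \int_{t_0}^{t_1} e^{A(t_1-\tau)}BB^{\mathrm{T}}e^{A^{\mathrm{T}}(t_1-\tau)}\eta_1\,d\tau = W_r\eta_1 = x_1, \]

so $x_1 \in \mathcal{R}(L_r)$. This establishes the desired inclusion. Conversely, to show the reverse inclusion $\mathcal{R}(L_r) \subset \mathcal{R}(W_r)$, choose an $x_1 \in \mathcal{R}(L_r)$, so that there is a $u \in C(\mathbb{R}, \mathbb{R}^m)$ satisfying $L_ru = x_1$. Since $W_r = W_r^{\mathrm{T}}$, we have $\mathcal{R}(W_r) \oplus \mathcal{N}(W_r) = \mathbb{R}^n$, so there is a unique pair $(x_r, x_n)$ of $x_r \in \mathcal{R}(W_r)$ and $x_n \in \mathcal{N}(W_r)$ such that $x_1 = x_r + x_n$. Since $W_rx_n = 0$, we have $x_n^{\mathrm{T}}W_rx_n = 0$. Because

\[ x_n^{\mathrm{T}}W_rx_n = \int_{t_0}^{t_1} x_n^{\mathrm{T}}e^{A(t_1-\tau)}BB^{\mathrm{T}}e^{A^{\mathrm{T}}(t_1-\tau)}x_n\,d\tau = \int_{t_0}^{t_1}\left\|B^{\mathrm{T}}e^{A^{\mathrm{T}}(t_1-\tau)}x_n\right\|^2 d\tau, \]

where $\left\|B^{\mathrm{T}}e^{A^{\mathrm{T}}(t_1-\tau)}x_n\right\|^2$ is continuous in $\tau$ and nonnegative, this implies that $B^{\mathrm{T}}e^{A^{\mathrm{T}}(t_1-\tau)}x_n = 0$ for all $\tau \in [t_0, t_1]$. On the other hand, since $W_r = W_r^{\mathrm{T}}$, we have $\mathcal{R}(W_r)^{\perp} = \mathcal{N}(W_r)$, and so $x_r^{\mathrm{T}}x_n = 0$, which implies

\[ x_n^{\mathrm{T}}x_n = x_1^{\mathrm{T}}x_n = (L_ru)^{\mathrm{T}}x_n = \int_{t_0}^{t_1} u(\tau)^{\mathrm{T}}B^{\mathrm{T}}e^{A^{\mathrm{T}}(t_1-\tau)}x_n\,d\tau. \]

This, along with $B^{\mathrm{T}}e^{A^{\mathrm{T}}(t_1-\tau)}x_n = 0$ for all $\tau \in [t_0, t_1]$, implies that $x_n = 0$, so that $x_1 = x_r \in \mathcal{R}(W_r)$. This establishes the reverse inclusion and hence completes the proof.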

Theorem 8.4 A state x1 ∈ Rn is reachable if and only if

\[ x_1 \in \mathcal{R}(W_r(0, T)) \tag{8.2} \]

for some T ∈ (0, ∞). Moreover, if (8.2) holds and if η1 ∈ Rn is any vector such that Wr(0, T)η1 = x1, then an input u that transfers the state x from x(0) = 0 to x(T) = x1 is given by

\[ u(t) = B^T e^{A^T(T-t)} \eta_1, \quad t \in [0, T). \]

Proof. The result is immediate from Lemma 8.3 and the first half of its proof.

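As an example, consider

\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \]

which give the reachability Gramian

\[ W_r(0, T) = \int_0^T \begin{bmatrix} e^{T-\tau} & 0 \\ 0 & e^{T-\tau} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} e^{T-\tau} & 0 \\ 0 & e^{T-\tau} \end{bmatrix} d\tau = \begin{bmatrix} (e^{2T}-1)/2 & 0 \\ 0 & 0 \end{bmatrix}. \]

It is readily seen that $\mathcal{R}(W_r(0, T)) = \mathrm{span}\{[1\ 0]^T\}$ for T > 0. Let $x_1 = [1\ 0]^T$. Then x1 is reachable because x1 ∈ R(Wr(0, T)), and hence, for every T > 0, there is an input signal u that steers the state of the system $\dot{x}(t) = Ax(t) + Bu(t)$ from x(0) = 0 to x(T) = x1. Since $\eta_1 = [2/(e^{2T}-1)\ \ 0]^T$, among infinitely many others, solves Wr(0, T)η1 = x1, such an input is given by

\[ x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \ \Rightarrow\ u(t) = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} e^{T-t} & 0 \\ 0 & e^{T-t} \end{bmatrix} \begin{bmatrix} 2/(e^{2T}-1) \\ 0 \end{bmatrix} = \frac{2e^{T-t}}{e^{2T}-1}, \quad t \in [0, T). \]

Indeed, this u satisfies $L_r(0, T)u = \int_0^T e^{A(T-\tau)} B u(\tau)\, d\tau = x_1$.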
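The closing claim of this example can be checked numerically. The sketch below is only an illustration, not part of the original notes: it assumes the horizon T = 1 and uses a hand-written composite Simpson rule (the helper name `simpson` is hypothetical) to evaluate the first component of Lr(0, T)u and the (1,1) entry of Wr(0, T).

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

T = 1.0  # assumed horizon; any T > 0 works

def u(t):
    # Input from the example: u(t) = 2 e^{T-t} / (e^{2T} - 1)
    return 2.0 * math.exp(T - t) / (math.exp(2 * T) - 1.0)

# With A = I and B = [1, 0]^T we have e^{A(T-tau)} B = [e^{T-tau}, 0]^T,
# so only the first component of L_r(0, T)u can be nonzero.
x1_first = simpson(lambda tau: math.exp(T - tau) * u(tau), 0.0, T)
x1_second = 0.0

# (1,1) entry of the reachability Gramian, to compare with (e^{2T} - 1)/2.
gramian_11 = simpson(lambda tau: math.exp(2 * (T - tau)), 0.0, T)

print(x1_first, x1_second, gramian_11)
```

The first printed value should be close to 1 and the last close to (e² − 1)/2, in agreement with the closed-form expressions above.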

8.3 Controllable States

Definition 8.5 A state x0 ∈ Rn is said to be controllable (or controllable to the origin) if there exists an input u that transfers the state x from x(0) = x0 to x(T) = 0 for some T ∈ (0, ∞).

For t0 ≤ t1, let Lc(t0, t1) : C(R, Rm) → Rn be the linear operator defined by

\[ L_c(t_0, t_1)u = \int_{t_0}^{t_1} e^{A(t_0-\tau)} B u(\tau)\, d\tau \]

for u ∈ C(R, Rm). A state x0 is controllable if there exists an input u such that $-e^{AT}x_0 = L_r(0, T)u$, or $-x_0 = L_c(0, T)u$, for some T ∈ (0, ∞). Thus, the set of all controllable states is a linear subspace of the state space Rn given by

\[ \mathcal{R}_c = \{x_0 \in \mathbb{R}^n : x_0 \in \mathcal{R}(L_c(0, T)),\ 0 < T < \infty\}. \]

Definition 8.6 The controllability Gramian of the system (8.1) is the matrix

\[ W_c(t_0, t_1) = \int_{t_0}^{t_1} e^{A(t_0-\tau)} B B^T e^{A^T(t_0-\tau)}\, d\tau \in \mathbb{R}^{n\times n}, \quad t_0 \le t_1. \]

Clearly, we have Wc(t0, t1) ≥ 0 for all t0, t1 ∈ R with t0 ≤ t1. The controllability Gramian and the reachability Gramian are related via the identity

\[ W_r(t_0, t_1) = \Phi(t_1, t_0)\, W_c(t_0, t_1)\, \Phi(t_1, t_0)^T. \tag{8.3} \]

Lemma 8.7 R(Lc(t0, t1)) = R(Wc(t0, t1)) for all t0, t1 ∈ R with t0 ≤ t1.

Proof. The proof is virtually identical to that of Lemma 8.3.




Theorem 8.8 A state x0 ∈ Rn is controllable if and only if

\[ x_0 \in \mathcal{R}(W_c(0, T)) \tag{8.4} \]

for some T ∈ (0, ∞). If (8.4) holds and if η0 ∈ Rn is such that Wc(0, T)η0 = x0, then an input u that transfers the state x from x(0) = x0 to x(T) = 0 is given by

\[ u(t) = -B^T e^{-A^T t} \eta_0, \quad t \in [0, T). \]

Proof. The first part of the theorem is immediate from Lemma 8.7; the second part of the theorem follows from (8.3) and Theorem 8.4.

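As an example, consider $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ again. The controllability Gramian is

\[ W_c(0, T) = \int_0^T \begin{bmatrix} e^{-\tau} & 0 \\ 0 & e^{-\tau} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} e^{-\tau} & 0 \\ 0 & e^{-\tau} \end{bmatrix} d\tau = \begin{bmatrix} (1-e^{-2T})/2 & 0 \\ 0 & 0 \end{bmatrix}. \]

We have $\mathcal{R}(W_c(0, T)) = \mathrm{span}\{[1\ 0]^T\}$ for T > 0, so $x_0 = [1\ 0]^T$ is controllable, and, for every T > 0, there is an input u that steers the state of the system $\dot{x}(t) = Ax(t) + Bu(t)$ from x(0) = x0 to x(T) = 0. With $\eta_0 = [2/(1-e^{-2T})\ \ 0]^T$, we have Wc(0, T)η0 = x0, so such an input is given by

\[ x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \ \Rightarrow\ u(t) = -\begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} e^{-t} & 0 \\ 0 & e^{-t} \end{bmatrix} \begin{bmatrix} 2/(1-e^{-2T}) \\ 0 \end{bmatrix} = -\frac{2e^{-t}}{1-e^{-2T}}, \quad t \in [0, T). \]

Indeed, this u satisfies $e^{AT}x_0 + L_r(0, T)u = x_0 + L_c(0, T)u = x_0 + \int_0^T e^{-A\tau} B u(\tau)\, d\tau = 0$.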
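As a quick consistency check (a worked instance added here for illustration, using only the closed-form Gramians of the two examples with A = I), identity (8.3) can be verified directly. Here Φ(T, 0) = e^{AT} = e^{T} I, so only the (1,1) entries need to be compared:

\[ \Phi(T, 0)\, W_c(0, T)\, \Phi(T, 0)^T = e^{T} \begin{bmatrix} (1-e^{-2T})/2 & 0 \\ 0 & 0 \end{bmatrix} e^{T} = \begin{bmatrix} (e^{2T}-1)/2 & 0 \\ 0 & 0 \end{bmatrix} = W_r(0, T). \]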

8.4 Controllability of Continuous-Time LTI Systems

We will now investigate the conditions under which an LTI system is reachable or controllable.

Definition 8.9 The system (8.1), or the pair (A, B), is said to be reachable if the reachable subspace Rr = Rn. Similarly, it is said to be controllable if the controllable subspace Rc = Rn.

Definition 8.10 The controllability matrix (or reachability matrix) of the system (8.1) is

\[ \mathcal{C} = \begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix} \in \mathbb{R}^{n\times mn}. \]

Lemma 8.11 The system (8.1) satisfies R(Wr(0, T )) = R(C) for every T > 0.

Proof. To show R(Wr(0, T)) ⊂ R(C) for every T > 0, suppose that x1 ∈ R(Wr(0, T)) for some T > 0. Due to Lemma 8.3, we have x1 ∈ R(Lr(0, T)), so $x_1 = \int_0^T e^{A(T-\tau)} B u_1(\tau)\, d\tau$ for some u1. Due to the Cayley-Hamilton theorem, we may write $e^{A(T-\tau)} = \sum_{k=0}^{n-1} \beta_k(T-\tau) A^k$ for some $\beta_0(T-\tau), \ldots, \beta_{n-1}(T-\tau) \in \mathbb{R}$. Thus

\[ x_1 = \sum_{k=0}^{n-1} A^k B \int_0^T \beta_k(T-\tau) u_1(\tau)\, d\tau = \sum_{k=0}^{n-1} A^k B \alpha_k(T) = \mathcal{C} \begin{bmatrix} \alpha_0(T)^T & \alpha_1(T)^T & \cdots & \alpha_{n-1}(T)^T \end{bmatrix}^T, \]

where $\alpha_k(T) = \int_0^T \beta_k(T-\tau) u_1(\tau)\, d\tau \in \mathbb{R}^m$ for k = 0, . . . , n − 1. This implies x1 ∈ R(C) for all T > 0, and hence establishes the desired inclusion for all T > 0.

Conversely, to show R(C) ⊂ R(Wr(0, T)) for every T > 0, suppose that x1 ∈ R(C) and that x1 ∉ R(Wr(0, T)) for some T > 0. It follows from x1 ∈ R(C) that x1 = Cη1 for some η1 ∈ Rmn. On the other hand, as Wr(0, T) is symmetric, we have R(Wr(0, T)) ⊕ N(Wr(0, T)) = Rn, and so x1 ∉ R(Wr(0, T)) implies that x1 = xr + xn with xr ∈ R(Wr(0, T)), xn ∈ N(Wr(0, T)), and xn ≠ 0. As Wr(0, T)xn = 0, we have $x_n^T W_r(0, T) x_n = 0$. Hence it follows from

\[ x_n^T W_r(0, T) x_n = \int_0^T x_n^T e^{A(T-\tau)} B B^T e^{A^T(T-\tau)} x_n\, d\tau = \int_0^T \bigl\| B^T e^{A^T(T-\tau)} x_n \bigr\|^2 d\tau \]

and the nonnegativity and continuity of $\| B^T e^{A^T(T-\tau)} x_n \|^2$ in τ that $x_n^T e^{A(T-\tau)} B = 0$ for all τ ∈ [0, T]. Differentiating repeatedly with respect to τ and using the fact that $(d/dt)e^{At} = Ae^{At} = e^{At}A$, we obtain $x_n^T e^{A(T-\tau)} A^k B = 0$ for all k ≥ 0 and all τ ∈ [0, T]. In particular, evaluating these equalities at τ = T yields that

\[ x_n^T B = x_n^T A B = \cdots = x_n^T A^{n-1} B = 0, \quad\text{or}\quad x_n^T \mathcal{C} = 0. \]

However, x1 = Cη1 for some η1 and $x_n^T x_r = 0$, so $x_n^T x_n = x_n^T x_1 = x_n^T \mathcal{C} \eta_1 = 0$. This contradicts xn being nonzero. Therefore, we must have that x1 ∈ R(C) implies x1 ∈ R(Wr(0, T)) for all T > 0. This completes the proof that the equality R(C) = R(Wr(0, T)) holds for every T > 0.

A consequence of the above lemma is that, if the system is reachable, the time T in which the state is transferred to the desired state can be made arbitrarily small. Thus, it is theoretically possible to accomplish the desired state transfer in an arbitrarily short (though nonzero) time. However, in general, the faster the transfer, the larger the control magnitude required.
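The trade-off between transfer time and control magnitude can be made concrete with the example of Section 8.2, where the input u(t) = 2e^{T−t}/(e^{2T} − 1) steers the state from 0 to [1 0]^T. The following worked computation is added for illustration and uses only that closed-form input:

\[ \int_0^T u(t)^2\, dt = \frac{4}{(e^{2T}-1)^2} \int_0^T e^{2(T-t)}\, dt = \frac{4}{(e^{2T}-1)^2} \cdot \frac{e^{2T}-1}{2} = \frac{2}{e^{2T}-1}, \qquad u(0) = \frac{2e^{T}}{e^{2T}-1}. \]

Both quantities behave like 1/T as T → 0, so shrinking the transfer time makes the required input energy and peak amplitude grow without bound.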

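Lemma 8.12 The subspace R(C) is invariant under A (and hence under $e^{At}$ for all t ∈ R).

Proof. If x ∈ R(C), then x = Cη for some η ∈ Rmn, so $x = \sum_{k=0}^{n-1} A^k B \eta_k$ for some η0, . . . , ηn−1 ∈ Rm. By the Cayley-Hamilton theorem, $A^n = \sum_{k=0}^{n-1} \beta_k A^k$ for some β0, . . . , βn−1 ∈ R, and hence

\[ Ax = \sum_{k=1}^{n} A^k B \eta_{k-1} = \sum_{k=1}^{n-1} A^k B \eta_{k-1} + \sum_{k=0}^{n-1} \beta_k A^k B \eta_{n-1} = \mathcal{C} \begin{bmatrix} \tilde\eta_0^T & \tilde\eta_1^T & \cdots & \tilde\eta_{n-1}^T \end{bmatrix}^T, \]

where $\tilde\eta_0 = \beta_0 \eta_{n-1}$ and $\tilde\eta_k = \eta_{k-1} + \beta_k \eta_{n-1}$ for k = 1, . . . , n − 1, which implies Ax ∈ R(C). Applying the Cayley-Hamilton theorem once more gives that R(C) is invariant under $e^{At}$ as well regardless of t ∈ R.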

Lemma 8.13 Rr = Rc = R(Wr(0, T )) = R(Wc(0, T )) = R(C) for every T > 0.

Proof. By Lemmas 8.3 and 8.11, we have Rr = R(Wr(0, T)) = R(C) for every T > 0, and by Lemma 8.7 every controllable state lies in R(Wc(0, T)) for some T > 0, so it suffices to show that R(Wc(0, T)) = R(C) for every T > 0. Suppose x ∈ R(C). Using Lemma 8.12 and the Cayley-Hamilton theorem, we deduce that R(C) is invariant under $e^{At}$ for every t ∈ R, so that $e^{AT}x \in \mathcal{R}(\mathcal{C})$ for all T > 0. Then Lemma 8.11 implies $e^{AT}x \in \mathcal{R}(W_r(0, T))$ for all T > 0. Whenever $e^{AT}x = W_r(0, T)\eta$ for some η ∈ Rn, we have

\[ x = e^{-AT} W_r(0, T) \eta = e^{-AT} W_r(0, T) e^{-A^T T} e^{A^T T} \eta = W_c(0, T) e^{A^T T} \eta. \]

Thus x ∈ R(Wc(0, T)) for all T > 0. This establishes that R(C) ⊂ R(Wc(0, T)) for all T > 0. Conversely, suppose x ∈ R(Wc(0, T)) for some T > 0. Then reversing the above argument shows that $e^{AT}x \in \mathcal{R}(\mathcal{C})$. Since R(C) is invariant under $e^{-AT}$, we deduce x ∈ R(C). Thus we have R(Wc(0, T)) ⊂ R(C) for all T > 0. This completes the proof.

Theorem 8.14 For the system (8.1), the following are equivalent:

(a) The pair (A, B) is reachable;

(b) The pair (A, B) is controllable;

(c) The reachability Gramian Wr(0, T) > 0 for every T > 0;

(d) The controllability Gramian Wc(0, T) > 0 for every T > 0;

(e) The controllability matrix C is of full row rank; that is, rank C = n;

(f) For every T > 0, the rows of $e^{At}B$ (as functions of t) are linearly independent on [0, T];

(g) The rows of $(sI - A)^{-1}B$ are linearly independent.

Moreover, if any of these conditions holds, then for any x0, x1 ∈ Rn and any T ∈ (0, ∞), an input u that transfers the state x from x(0) = x0 to x(T) = x1 is given by

\[ u(t) = B^T e^{A^T(T-t)} W_r(0, T)^{-1} \bigl( x_1 - e^{AT} x_0 \bigr) = -B^T e^{-A^T t} W_c(0, T)^{-1} \bigl( x_0 - e^{-AT} x_1 \bigr), \quad t \in [0, T). \]

Proof. The equivalence of conditions (a)–(e) follows from Lemma 8.13. Since $(sI - A)^{-1}B$ is the Laplace transform of $e^{At}B$, t ≥ 0, the equivalence of (f) and (g) follows from the fact that the Laplace transformation is a linear one-to-one correspondence (which implies that a linear combination of the rows of $(sI - A)^{-1}B$ is zero if and only if that same linear combination of the rows of $e^{At}B$ is zero for all t). The forms of u in the last part of the theorem are from Theorems 8.4 and 8.8. It remains to show the equivalence of (a) and (f). Suppose (a) holds but (f) does not. Then there exists a T > 0 and an x1 ≠ 0 such that $x_1^T e^{At} B = 0$ for every t ∈ [0, T]. On the other hand, since every state is reachable within every time span, there exists a u such that $x_1 = \int_0^T e^{A(T-\tau)} B u(\tau)\, d\tau$. However, as $x_1^T x_1 = \int_0^T x_1^T e^{A(T-\tau)} B u(\tau)\, d\tau$ with $x_1^T e^{A(T-\tau)} B = 0$ for all τ ∈ [0, T], we must have x1 = 0, which is a contradiction. Thus (a) implies (f). Conversely, to show (f) implies (a), suppose (a) does not hold. Then, due to the equivalence of (a) and (c), there exists a T > 0 and an x1 ≠ 0 such that $x_1^T W_r(0, T) = 0$. This implies $x_1^T W_r(0, T) x_1 = \int_0^T \| B^T e^{A^T(T-\tau)} x_1 \|^2 d\tau = 0$, which in turn implies $x_1^T e^{At} B = 0$ for all t ∈ [0, T]. Thus (f) does not hold. That is, if (f) holds, then (a) must hold as well. This establishes the equivalence of (a) and (f), and hence completes the proof.

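For example, consider the pairs

\[ (A_1, B_1) = \left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) \quad\text{and}\quad (A_2, B_2) = \left( \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} \right). \]

Their controllability matrices are

\[ \mathcal{C}_1 = \begin{bmatrix} B_1 & A_1 B_1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad \mathcal{C}_2 = \begin{bmatrix} B_2 & A_2 B_2 & A_2^2 B_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}. \]

The pair (A1, B1) is not controllable because rank C1 = 1 < 2; indeed, only the states within $\mathcal{R}(\mathcal{C}_1) = \mathrm{span}\{[1\ 0]^T\}$ are controllable. On the other hand, we have rank C2 = 3, so the pair (A2, B2) is controllable. It follows from

\[ e^{A_2 t} = \begin{bmatrix} 1 & 0 & 0 \\ t & 1 & 0 \\ 0 & 0 & e^{t} \end{bmatrix}, \qquad e^{A_2^T t} = \bigl( e^{A_2 t} \bigr)^T = \begin{bmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{t} \end{bmatrix}, \]

\[ W_c(0, T) = \int_0^T e^{-A_2 \tau} B_2 B_2^T e^{-A_2^T \tau}\, d\tau = \begin{bmatrix} T & -T^2/2 & 0 \\ -T^2/2 & T^3/3 & 0 \\ 0 & 0 & (1-e^{-2T})/2 \end{bmatrix}, \]

\[ \text{and}\quad W_c(0, T)^{-1} = \begin{bmatrix} 4/T & 6/T^2 & 0 \\ 6/T^2 & 12/T^3 & 0 \\ 0 & 0 & -2/(e^{-2T}-1) \end{bmatrix} \]

that, for any x0, x1 ∈ R3 and any T > 0, the following input signal will drive the state of the system $\dot{x}(t) = A_2 x(t) + B_2 u(t)$ from x(0) = x0 to x(T) = x1:

\[ u(t) = -B_2^T e^{-A_2^T t} W_c(0, T)^{-1} \bigl( x_0 - e^{-A_2 T} x_1 \bigr) = \begin{bmatrix} -\dfrac{4}{T} + \dfrac{6t}{T^2} & -\dfrac{6}{T^2} + \dfrac{12t}{T^3} & 0 \\ 0 & 0 & \dfrac{2e^{-t}}{e^{-2T}-1} \end{bmatrix} \left( x_0 - \begin{bmatrix} 1 & 0 & 0 \\ -T & 1 & 0 \\ 0 & 0 & e^{-T} \end{bmatrix} x_1 \right), \quad t \in [0, T). \]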
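The rank test in condition (e) of Theorem 8.14 is easy to automate. The sketch below is an illustration only: the helper names `matmul`, `controllability_matrix`, and `rank` are hypothetical, and exact rational arithmetic from the standard library is used so that the rank is not affected by rounding. It rebuilds the two controllability matrices of this example.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def controllability_matrix(A, B):
    """Return [B, AB, ..., A^(n-1) B] as a list of rows."""
    n = len(A)
    blocks, M = [], B
    for _ in range(n):
        blocks.append(M)
        M = matmul(A, M)
    # Concatenate the blocks horizontally, row by row.
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

A1, B1 = [[1, 0], [0, 1]], [[1], [0]]
A2 = [[0, 0, 0], [1, 0, 0], [0, 0, 1]]
B2 = [[1, 0], [0, 0], [0, 1]]

C1 = controllability_matrix(A1, B1)
C2 = controllability_matrix(A2, B2)
print(rank(C1), rank(C2))  # ranks to compare with n = 2 and n = 3
```

A rank below n, as for (A1, B1), signals that only the states in the range of the controllability matrix can be reached.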

8.5 Controllability of Discrete-Time Linear Systems

In this section, we focus on the linear time-invariant state equation

\[ x(t + 1) = Ax(t) + Bu(t), \tag{8.5} \]

where A ∈ Rn×n and B ∈ Rn×m. The solution to this difference equation subject to x(0) = x0 can be written as

\[ x(t) = A^t x_0 + \sum_{\tau=0}^{t-1} A^{t-\tau-1} B u(\tau) = A^t x_0 + \mathcal{C}_t\, U(0, t) \]

for all t = 1, 2, . . . , where

\[ \mathcal{C}_t = \begin{bmatrix} B & AB & \cdots & A^{t-1}B \end{bmatrix}, \qquad U(0, t) = \begin{bmatrix} u(t-1)^T & u(t-2)^T & \cdots & u(0)^T \end{bmatrix}^T. \]

The definitions of reachability and controllability are analogous to their continuous-time counterparts. It is readily seen that the reachable subspace Rr and controllable subspace Rc for the system (8.5) are

\[ \mathcal{R}_r = \{x_1 \in \mathbb{R}^n : x_1 \in \mathcal{R}(\mathcal{C}_K),\ K = 1, 2, \ldots\}, \]

\[ \mathcal{R}_c = \{x_0 \in \mathbb{R}^n : A^K x_0 \in \mathcal{R}(\mathcal{C}_K),\ K = 1, 2, \ldots\}. \]

The discrete-time results presented in this section can be proved analogously to their continuous-time counterparts, so their proofs are omitted. The reachability Gramian of the system (8.5) is given by

\[ W_r(0, K) = \sum_{\tau=0}^{K-1} A^{K-(\tau+1)} B B^T \bigl( A^T \bigr)^{K-(\tau+1)} = \mathcal{C}_K \mathcal{C}_K^T. \]

In particular, $W_r(0, n) = \mathcal{C}\mathcal{C}^T$, where $\mathcal{C} = \mathcal{C}_n$ is the controllability matrix. Unlike the continuous-time case, the discrete-time state transition matrix $\Phi(t, t_0) = A^{t-t_0}$ may be singular. Therefore, unless A is nonsingular, the controllability Gramian for the system (8.5) is not defined. A consequence is that, although reachability implies controllability, controllability does not guarantee reachability in the case of discrete-time systems.




Theorem 8.15 A state x1 ∈ Rn is reachable if and only if

\[ x_1 \in \mathcal{R}(W_r(0, K)) \]

for some K ∈ {1, 2, . . . }. With such a K, an input u that transfers the state x from x(0) = 0 to x(K) = x1 is given by

\[ u(t) = B^T \bigl( A^T \bigr)^{K-t-1} \eta_1, \quad t = 0, \ldots, K-1, \]

where η1 ∈ Rn is such that Wr(0, K)η1 = x1, or by U(0, K) such that

\[ \mathcal{C}_K\, U(0, K) = x_1. \]

Theorem 8.16 A state x0 ∈ Rn is controllable if and only if

\[ A^K x_0 \in \mathcal{R}(W_r(0, K)) \]

for some K ∈ {1, 2, . . . }. With such a K, an input u that transfers the state x from x(0) = x0 to x(K) = 0 is given by

\[ u(t) = -B^T \bigl( A^T \bigr)^{K-t-1} \eta_0, \quad t = 0, \ldots, K-1, \]

where η0 ∈ Rn is such that $W_r(0, K)\eta_0 = A^K x_0$, or by U(0, K) such that

\[ \mathcal{C}_K\, U(0, K) = -A^K x_0. \]

Lemma 8.17 The system (8.5) satisfies Rr = R(Wr(0, K )) = R(C) for every K ≥ n.

A consequence of the above lemma is that, in the case of reachable discrete-time systems, the number of steps K required to transfer the state to the desired state can be taken to be equal to n.

Theorem 8.18 For the system (8.5), the following are equivalent:

(a) The pair (A, B) is reachable;

(b) The reachability Gramian Wr(0, K) > 0 (i.e., rank Wr(0, K) = n) for K = n, n + 1, . . . ;

(c) The controllability matrix C has full row rank (i.e., rank C = n).

Moreover, if any of these conditions holds, then there exists an input that will transfer any state x0 to any other state x1 in some finite time K. Such an input can be taken to be

\[ u(t) = B^T \bigl( A^T \bigr)^{n-t-1} W_r(0, n)^{-1} \bigl( x_1 - A^n x_0 \bigr), \quad t = 0, \ldots, n-1, \]

with K = n, or equivalently, to satisfy

\[ U(0, n) = \mathcal{C}^T \bigl( \mathcal{C}\mathcal{C}^T \bigr)^{-1} \bigl( x_1 - A^n x_0 \bigr). \]
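The n-step input formula U(0, n) = Cᵀ(CCᵀ)⁻¹(x1 − Aⁿx0) can be tried on a small case. The sketch below is illustrative only: the system (a discrete double integrator), the initial state, and the helper names `matmul` and `solve` are assumptions made for this example and do not come from the notes. Exact rational arithmetic keeps the check free of rounding error.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, b):
    """Solve M z = b for a square nonsingular M by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * p_ for a, p_ in zip(aug[i], aug[c])]
    return [row[-1] for row in aug]

# Hypothetical single-input example: x(t+1) = A x(t) + B u(t) with n = 2.
A = [[1, 1], [0, 1]]
B = [[0], [1]]
n = 2
x0, x1 = [1, 1], [0, 0]

# Controllability matrix C = [B, AB, ..., A^(n-1) B].
blocks, M = [], B
for _ in range(n):
    blocks.append(M)
    M = matmul(A, M)
C = [sum((blk[i] for blk in blocks), []) for i in range(n)]

# Right-hand side x1 - A^n x0.
An_x0 = [[v] for v in x0]
for _ in range(n):
    An_x0 = matmul(A, An_x0)
rhs = [x1[i] - An_x0[i][0] for i in range(n)]

# U(0, n) = C^T (C C^T)^{-1} rhs, stacked as [u(n-1), ..., u(0)].
Ct = [list(r) for r in zip(*C)]
z = solve(matmul(C, Ct), rhs)
U = [sum(Ct[i][j] * z[j] for j in range(n)) for i in range(len(Ct))]
inputs = U[::-1]  # time order u(0), ..., u(n-1); valid because m = 1 here

# Simulate the state equation to confirm that x(n) = x1.
x = x0[:]
for u in inputs:
    x = [sum(A[i][j] * x[j] for j in range(n)) + B[i][0] * u for i in range(n)]
print(inputs, x)
```

Reversing `U` to recover the time-ordered inputs works here only because the input is scalar; for m > 1 the stacked vector would have to be split into blocks of length m first.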

Theorem 8.19 For the system (8.5), we have Rr ⊂ Rc (i.e., reachability implies controllability). In particular, if A is nonsingular, then Rc = Rr.


IX. Observability of Linear Systems




9.1 Introduction

Consider the linear time-invariant system

\[ \dot{x}(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t) + Du(t), \tag{9.1} \]

where A ∈ Rn×n, B ∈ Rn×m, C ∈ Rl×n, and D ∈ Rl×m are constant matrices. For t, t0 ∈ R, the output y(t) of the system is given by

\[ y(t) = Ce^{A(t-t_0)} x(t_0) + \int_{t_0}^{t} Ce^{A(t-\tau)} B u(\tau)\, d\tau + Du(t). \tag{9.2} \]

We are concerned with the problem of determining the state of the system (9.1) based on the input-output data. This is an “observability” problem, which needs to be addressed before solving the controllability problem. Consider, for instance, the track-seeking control problem for a hard disk drive, where the magnetic head needs to be steered from an arbitrary initial state x(0) to a desired terminal state x1 in a time span of T seconds. One strategy to approach this problem is to partition the time interval [0, T) into two subintervals [0, T1) and [T1, T) with some T1 ∈ (0, T), and to solve the observability problem during the first subinterval and the controllability problem during the second subinterval. That is, during the first subinterval [0, T1), one takes the measurements y(t), t ∈ [0, T1], and determines x(0), and hence x(T1), based on them. Then during the second subinterval [T1, T), one steers the state from x(T1) to x(T) = x1 in the time span of T − T1 seconds.

9.2 Unobservable States

Definition 9.1 A state x0 ∈ Rn of the system (9.1) is said to be unobservable if

\[ Ce^{At} x_0 = 0 \]

for all t ≥ 0. That is, a state x0 is unobservable if the zero-input response of the system (9.1) is zero for all time t ≥ 0 whenever x(0) = x0.

Definition 9.2 The observability Gramian of the system (9.1) is the matrix

\[ W_o(t_0, t_1) = \int_{t_0}^{t_1} e^{A^T(\tau-t_0)} C^T C e^{A(\tau-t_0)}\, d\tau \in \mathbb{R}^{n\times n}, \quad t_0 \le t_1. \]

Clearly, we have that Wo(t0, t1) ≥ 0 (i.e., Wo(t0, t1) is symmetric and nonnegative definite) for all t0, t1 ∈ R with t0 ≤ t1.

Lemma 9.3 A state x0 ∈ Rn is unobservable if and only if x0 ∈ N (Wo(0, T )) for all T > 0.

Proof. If Wo(0, T)x0 = 0 for all T > 0, then

\[ x_0^T W_o(0, T) x_0 = \int_0^T \bigl\| Ce^{A\tau} x_0 \bigr\|^2 d\tau = 0 \]

for all T > 0. Because $\| Ce^{A\tau} x_0 \|^2$ is continuous in τ, this implies that $Ce^{A\tau} x_0 = 0$ for all τ ≥ 0, and hence that x0 is unobservable. Conversely, if x0 is unobservable, then $Ce^{At} x_0 = 0$ for all t ≥ 0, and so

\[ W_o(0, T) x_0 = \int_0^T e^{A^T \tau} C^T C e^{A\tau} x_0\, d\tau = 0 \]

regardless of T > 0. This implies x0 belongs to the null space of Wo(0, T) for every T > 0.

A consequence of Lemma 9.3 is that the unobservable subspace (i.e., the subspace of all unobservable states) is given by

\[ \mathcal{R}_o = \{x_0 \in \mathbb{R}^n : x_0 \in \mathcal{N}(W_o(0, T)) \text{ for all } T > 0\}. \]

Note that the unobservable subspace Ro is not equal to the set

\[ \{x_0 \in \mathbb{R}^n : x_0 \in \mathcal{N}(W_o(0, T)),\ T > 0\}, \]

which only requires that x0 ∈ N(Wo(0, T)) for “some” T > 0; for x0 to be unobservable, it is required that x0 ∈ N(Wo(0, T)) for “every” T > 0.

Theorem 9.4 If Wo(0, T) > 0 for some T ∈ (0, ∞), then the unobservable subspace Ro = {0} and the state x0 at time t = 0 is given by

\[ x_0 = W_o(0, T)^{-1} \int_0^T e^{A^T t} C^T \left( y(t) - \int_0^t Ce^{A(t-\tau)} B u(\tau)\, d\tau - Du(t) \right) dt. \tag{9.3} \]

Proof. If Wo(0, T) > 0, then Wo(0, T) is nonsingular with rank Wo(0, T) = n, and hence N(Wo(0, T)) = {0}. Lemma 9.3 implies that Ro = {0} in this case. It follows from (9.2) with t0 = 0 and x(t0) = x0 that

\[ \int_0^T e^{A^T t} C^T \left( y(t) - \int_0^t Ce^{A(t-\tau)} B u(\tau)\, d\tau - Du(t) \right) dt = \int_0^T e^{A^T t} C^T C e^{At} x_0\, dt = W_o(0, T) x_0, \]

which gives the desired result.

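Theorem 9.4 indicates that, if the observability Gramian Wo(0, T) > 0 for some T > 0, then the initial state x(0) can be uniquely determined based on the (present and future) output y(t) and input u(t) over t ∈ [0, T].

For example, consider the pair

\[ (A, C) = \left( \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) \ \Rightarrow\ W_o(0, T) = \int_0^T e^{A^T \tau} C^T C e^{A\tau}\, d\tau = \begin{bmatrix} T & T^2/2 & 0 \\ T^2/2 & T^3/3 & 0 \\ 0 & 0 & (e^{2T}-1)/2 \end{bmatrix}, \quad T > 0. \]

We have that $\det W_o(0, T) = T^4(e^{2T}-1)/24 > 0$ for T > 0, so Wo(0, T) is nonsingular, or equivalently (because Wo(0, T) ≥ 0), Wo(0, T) > 0 for T > 0. Based on the zero-input response y(t) of the system (9.1) over t ∈ [0, T] for any T > 0, one can uniquely determine the state x(0) as

\[ x(0) = W_o(0, T)^{-1} \int_0^T e^{A^T t} C^T y(t)\, dt = \begin{bmatrix} 4/T & -6/T^2 & 0 \\ -6/T^2 & 12/T^3 & 0 \\ 0 & 0 & 2/(e^{2T}-1) \end{bmatrix} \int_0^T \begin{bmatrix} 1 & 0 \\ t & 0 \\ 0 & e^{t} \end{bmatrix} y(t)\, dt. \]

If the zero-input response is

\[ y(t) = Ce^{At} x(0) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{t} \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} t - 1 \\ 2e^{t} \end{bmatrix} \]

for t ∈ R, then indeed we obtain

\[ x(0) = \begin{bmatrix} 4/T & -6/T^2 & 0 \\ -6/T^2 & 12/T^3 & 0 \\ 0 & 0 & 2/(e^{2T}-1) \end{bmatrix} \begin{bmatrix} T^2/2 - T \\ T^3/3 - T^2/2 \\ e^{2T} - 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} \]

using the above integral equation.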
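The reconstruction formula can also be exercised numerically, treating y(t) as measured data rather than a known expression. The sketch below is illustrative only: it assumes T = 1, uses a hand-written composite Simpson rule (the helper name `simpson` is hypothetical), and takes the closed-form inverse Gramian from this example.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

T = 1.0  # assumed observation window

def y(t):
    # Zero-input response of the example for x(0) = [-1, 1, 2]^T.
    return (t - 1.0, 2.0 * math.exp(t))

# e^{A^T t} C^T = [[1, 0], [t, 0], [0, e^t]], so the integrand has three components.
v = [
    simpson(lambda t: y(t)[0], 0.0, T),
    simpson(lambda t: t * y(t)[0], 0.0, T),
    simpson(lambda t: math.exp(t) * y(t)[1], 0.0, T),
]

# Closed-form inverse of the observability Gramian W_o(0, T) from the example.
Wo_inv = [
    [4 / T, -6 / T**2, 0.0],
    [-6 / T**2, 12 / T**3, 0.0],
    [0.0, 0.0, 2 / (math.exp(2 * T) - 1)],
]

x0_est = [sum(Wo_inv[i][j] * v[j] for j in range(3)) for i in range(3)]
print(x0_est)  # should be close to [-1, 1, 2]
```

In practice y(t) would be sampled and noisy, so the quadrature and the conditioning of Wo(0, T) both limit the accuracy of the estimate.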

9.3 Unconstructible States

Definition 9.5 A state x1 ∈ Rn of the system (9.1) is said to be unconstructible if

\[ Ce^{At} x_1 = 0 \]

for all t ≤ 0. That is, a state x1 is unconstructible if the zero-input response of the system (9.1) is zero for all time t ≤ 0 whenever x(0) = x1.

Definition 9.6 The constructibility Gramian of the system (9.1) is the matrix

\[ W_{cn}(t_0, t_1) = \int_{t_0}^{t_1} e^{A^T(\tau-t_1)} C^T C e^{A(\tau-t_1)}\, d\tau \in \mathbb{R}^{n\times n}, \quad t_0 \le t_1. \]

Clearly, we have that Wcn(t0, t1) ≥ 0 for all t0, t1 ∈ R with t0 ≤ t1. The observability Gramian and the constructibility Gramian are related via the identity

\[ W_o(t_0, t_1) = e^{A^T(t_1-t_0)}\, W_{cn}(t_0, t_1)\, e^{A(t_1-t_0)}. \]

Lemma 9.7 A state x1 ∈ Rn is unconstructible if and only if x1 ∈ N (Wcn(0, T )) for all T > 0.

Proof . The proof is analogous to that of Lemma 9.3.

By Lemma 9.7 we see that the unconstructible subspace is given by

Rcn = {x1 ∈ Rn : x1 ∈ N (Wcn(0, T )) for all T > 0}.

Theorem 9.8 If Wcn(0, T) > 0 for some T ∈ (0, ∞), then the unconstructible subspace Rcn = {0} and the state x1 at time t = T is given by

\[ x_1 = W_{cn}(0, T)^{-1} \int_0^T e^{A^T(t-T)} C^T \left( y(t) - \int_T^t Ce^{A(t-\tau)} B u(\tau)\, d\tau - Du(t) \right) dt. \tag{9.4} \]

Proof. If Wcn(0, T) > 0, then Wcn(0, T) is nonsingular with rank Wcn(0, T) = n, and hence N(Wcn(0, T)) = {0}. Lemma 9.7 implies that Rcn = {0} in this case. It follows from (9.2) with t0 = T and x(t0) = x1 that

\[ \int_0^T e^{A^T(t-T)} C^T \left( y(t) - \int_T^t Ce^{A(t-\tau)} B u(\tau)\, d\tau - Du(t) \right) dt = \int_0^T e^{A^T(t-T)} C^T C e^{A(t-T)} x_1\, dt = W_{cn}(0, T) x_1, \]

which gives the desired result.

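Theorem 9.8 says that, if the constructibility Gramian Wcn(0, T) > 0 for some T > 0, then the state x(T) at time T can be uniquely determined based on the (past and present) output y(t) and input u(t) over t ∈ [0, T]. To illustrate this result, consider the quadruple

\[ (A, B, C, D) = \left( \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right). \]

In this example, the constructibility Gramian is

\[ W_{cn}(0, T) = \int_0^T e^{A^T(\tau-T)} C^T C e^{A(\tau-T)}\, d\tau = \begin{bmatrix} T & -T^2/2 & 0 \\ -T^2/2 & T^3/3 & 0 \\ 0 & 0 & (1-e^{-2T})/2 \end{bmatrix}, \quad T > 0. \]

We have that $\det W_{cn}(0, T) = T^4(1-e^{-2T})/24 > 0$ for T > 0, so Wcn(0, T) is nonsingular as well as Wcn(0, T) ≥ 0, or equivalently, Wcn(0, T) > 0 for T > 0. Suppose

\[ x(0) = 0; \qquad u(t) = 1, \quad t \ge 0; \]

\[ y(t) = \int_0^t Ce^{A(t-\tau)} B u(\tau)\, d\tau = \int_0^t \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & t-\tau & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{t-\tau} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} d\tau = \begin{bmatrix} t^2/2 \\ 0 \end{bmatrix}, \quad t \ge 0. \]

Then the state x(T) is uniquely determined (without knowing x(0) = 0) as

\[ x(T) = W_{cn}(0, T)^{-1} \int_0^T e^{A^T(t-T)} C^T \left( y(t) - \int_T^t Ce^{A(t-\tau)} B u(\tau)\, d\tau \right) dt \]

\[ = \begin{bmatrix} 4/T & 6/T^2 & 0 \\ 6/T^2 & 12/T^3 & 0 \\ 0 & 0 & 2/(1-e^{-2T}) \end{bmatrix} \int_0^T \begin{bmatrix} 1 & 0 \\ t-T & 0 \\ 0 & e^{t-T} \end{bmatrix} \left( \begin{bmatrix} t^2/2 \\ 0 \end{bmatrix} - \int_T^t \begin{bmatrix} t-\tau \\ 0 \end{bmatrix} d\tau \right) dt = \begin{bmatrix} T^2/2 \\ T \\ 0 \end{bmatrix}. \]

Indeed, $x(T) = \int_0^T e^{A(T-\tau)} B u(\tau)\, d\tau = [T^2/2\ \ T\ \ 0]^T$.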

9.4 Observability of Continuous-Time LTI Systems

Definition 9.9 The system (9.1) is said to be observable if there exists T ∈ (0, ∞) such that x(0) can be uniquely determined based on y(t), u(t), t ∈ [0, T]. Similarly, it is said to be constructible if there exists T ∈ (0, ∞) such that x(T) can be uniquely determined based on y(t), u(t), t ∈ [0, T].

Definition 9.10 The observability matrix of the system (9.1) is

\[ \mathcal{O} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} \in \mathbb{R}^{ln\times n}. \]

Lemma 9.11 The system (9.1) satisfies N (Wo(0, T )) = N (O) for every T > 0.

Proof. Suppose x0 ∈ N(Wo(0, T)) for some T > 0, so that Wo(0, T)x0 = 0. Then

\[ x_0^T W_o(0, T) x_0 = \int_0^T \bigl\| Ce^{A\tau} x_0 \bigr\|^2 d\tau = 0 \]

for some T > 0. Because $\| Ce^{A\tau} x_0 \|^2$ is continuous in τ, this implies that $Ce^{A\tau} x_0 = 0$ for all τ ∈ [0, T]. Taking the derivatives of $Ce^{A\tau} x_0$ with respect to τ, we have that

\[ Ce^{A\tau} x_0 = CAe^{A\tau} x_0 = \cdots = CA^{n-1} e^{A\tau} x_0 = 0 \]

for all τ ∈ [0, T]. In particular, these equalities at τ = 0 give

\[ Cx_0 = CAx_0 = \cdots = CA^{n-1} x_0 = 0. \]

Thus Ox0 = 0, or x0 ∈ N(O). This shows that N(Wo(0, T)) ⊂ N(O) for each T > 0. Conversely, suppose x0 ∈ N(O). Then we have $CA^k x_0 = 0$ for k = 0, . . . , n − 1, which, along with the Cayley-Hamilton theorem, implies $CA^k x_0 = 0$ for all k = 0, 1, . . . . Since $e^{A\tau} = \sum_{k=0}^{\infty} (\tau^k/k!) A^k$, we have $Ce^{A\tau} x_0 = 0$ for all τ ∈ R, which implies Wo(0, T)x0 = 0 regardless of T > 0. Thus N(O) ⊂ N(Wo(0, T)) for every T > 0.

Lemma 9.12 The system (9.1) satisfies N (Wcn(0, T )) = N (O) for every T > 0.

Proof. Suppose Wcn(0, T)x1 = 0 for some T > 0. Then $x_1^T W_{cn}(0, T) x_1 = 0$, or equivalently $\int_0^T \| Ce^{A(\tau-T)} x_1 \|^2 d\tau = 0$. Since $\| Ce^{A(\tau-T)} x_1 \|^2$ is continuous in τ, this implies $Ce^{A(\tau-T)} x_1 = 0$ for all τ ∈ [0, T]. Taking the derivatives of both sides of this equation with respect to τ we obtain

\[ Ce^{A(\tau-T)} x_1 = CAe^{A(\tau-T)} x_1 = \cdots = CA^{n-1} e^{A(\tau-T)} x_1 = 0; \]

in particular, putting τ = T gives $Cx_1 = CAx_1 = \cdots = CA^{n-1} x_1 = 0$. Thus Ox1 = 0, which implies N(Wcn(0, T)) ⊂ N(O) for every T > 0. Conversely, suppose Ox1 = 0. Then, by the Cayley-Hamilton theorem, we have $CA^k x_1 = 0$ for all k = 0, 1, . . . , so with any T > 0 we have

\[ \int_{-T}^{0} e^{A^T t} C^T C e^{At} x_1\, dt = \int_0^T e^{A^T(\tau-T)} C^T C e^{A(\tau-T)}\, d\tau\; x_1 = W_{cn}(0, T) x_1 = 0. \]

Thus, Wcn(0, T)x1 = 0 for every T > 0.

Lemma 9.13 Ro = Rcn = N (Wo(0, T )) = N (Wcn(0, T )) = N (O) for every T > 0.

Proof. By Lemmas 9.3 and 9.11 we have Ro = N(Wo(0, T)) = N(O) for all T > 0. Similarly, by Lemmas 9.7 and 9.12 we have Rcn = N(Wcn(0, T)) = N(O) for all T > 0.




Lemma 9.14 The system (9.1) is observable if and only if Ro = {0}. Similarly, it is constructible if and only if Rcn = {0}.

Proof. Suppose Ro = {0}. Then by Lemma 9.13 we have Wo(0, T) > 0 for every T > 0, and hence by Theorem 9.4 the system is observable. Now, suppose Ro ≠ {0}. Then, since Ro is a subspace of Rn, there exists a nonzero $\tilde{x}_0 \in \mathbb{R}^n$ such that $Ce^{At}\tilde{x}_0 = 0$ for all t ≥ 0. Choose any x0 ∈ Rn. Then the zero-input response y(t), t ≥ 0, for x(0) = x0 will be exactly the same as that for $x(0) = x_0 + \tilde{x}_0$. Thus the system is not observable. This proves that the system is observable if and only if Ro = {0}. The fact that the system is constructible if and only if Rcn = {0} can be proved in a similar manner using Lemma 9.13 and Theorem 9.8.

A consequence of Lemmas 9.13 and 9.14, and Theorem 9.4 is that, if the system is observable, the time span T over which the input and output are observed and collected to determine the initial state can be made arbitrarily small. Thus, it is theoretically possible to accomplish state observation in an arbitrarily short (though nonzero) time. However, a very small T may lead to a nearly singular observability Gramian, which can lead to numerical difficulties. Moreover, Lemmas 9.13 and 9.14 suggest that observability and constructibility depend only on the matrices A and C. Therefore, one can speak of the observability and constructibility of the pair (A, C) to mean those of the system (9.1).
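The near-singularity for small T is visible in the example of Section 9.2, where det Wo(0, T) = T⁴(e^{2T} − 1)/24. The following expansion is added for illustration and uses only that closed form together with e^{2T} − 1 = 2T + O(T²):

\[ \det W_o(0, T) = \frac{T^4 (e^{2T}-1)}{24} = \frac{T^5}{12} + O(T^6) \quad \text{as } T \to 0, \]

while the entries 4/T, 6/T², and 12/T³ of the inverse Gramian grow without bound. Small errors in the measured output are therefore amplified more and more strongly as the observation window shrinks.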

Theorem 9.15 For the system (9.1), the following are equivalent:

(a) The pair (A, C) is observable;

(b) The pair (A, C) is constructible;

(c) The observability Gramian Wo(0, T ) > 0 for every T > 0;

(d) The constructibility Gramian Wcn(0, T ) > 0 for every T > 0;

(e) The observability matrix O is of full column rank; that is, rankO = n;

(f) For every T > 0, the columns of CeAt (as functions of t) are linearly independent on [0, T ];

(g) The columns of C(sI − A)−1 are linearly independent.

Moreover, if condition (c) holds, then the state x0 at t = 0 is given by (9.3) for each T > 0, and, if condition (d) holds, then the state x1 at t = T is given by (9.4) for each T > 0.

Proof. The equivalence of (a)–(e) follows from Lemmas 9.13 and 9.14. Since $C(sI - A)^{-1}$ is the Laplace transform of $Ce^{At}$, t ≥ 0, the equivalence of (f) and (g) follows from the fact that the Laplace transformation is a linear one-to-one correspondence. The form of x0 in the last part of the theorem is due to Theorem 9.4. Thus, to complete the proof, it suffices to show that (c) and (f) are equivalent. Suppose (c) holds but (f) does not. Then there exists a T > 0 and an x0 ≠ 0 such that $Ce^{At} x_0 = 0$ for all t ∈ [0, T]. On the other hand, since Wo(0, T) > 0 with such a T, the initial state x(0) = x0 is uniquely recovered based on the zero-input response y(t), t ∈ [0, T], via

\[ x_0 = W_o(0, T)^{-1} \int_0^T e^{A^T t} C^T y(t)\, dt = W_o(0, T)^{-1} \int_0^T e^{A^T t} C^T C e^{At} x_0\, dt, \]

which implies x0 = 0, which contradicts the supposition that x0 ≠ 0. This proves that (c) implies (f). To prove the converse, suppose (f) holds but not (c). Then, for some T > 0 and x0 ≠ 0, we have Wo(0, T)x0 = 0, and hence $x_0^T W_o(0, T) x_0 = \int_0^T \| Ce^{A\tau} x_0 \|^2 d\tau = 0$, which implies $Ce^{A\tau} x_0 = 0$ for all τ ∈ [0, T]. This contradicts (f), and hence proves that (f) implies (c).




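As an example, let us revisit

\[ (A, C) = \left( \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right). \]

This pair gives the observability matrix

\[ \mathcal{O} = \begin{bmatrix} C \\ CA \\ CA^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \]

whose rank is 3. Thus the pair (A, C) is observable (and constructible) as seen at the end of the previous sections.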

9.5 Observability of Discrete-Time Linear Systems

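Consider the state equation

\[ x(t + 1) = Ax(t) + Bu(t), \qquad y(t) = Cx(t) + Du(t), \tag{9.5} \]

where A ∈ Rn×n, B ∈ Rn×m, C ∈ Rl×n, and D ∈ Rl×m are constant matrices. The output y(t) of this system is given by

\[ y(t) = CA^{t-t_0} x(t_0) + \sum_{s=t_0}^{t-1} CA^{t-s-1} B u(s) + Du(t) \]

for all t, t0 ∈ Z with t ≥ t0. Thus we may write

\[ Y(0, t) = \mathcal{O}_t\, x(0) + M_t\, V(0, t) \]

for t > t0 = 0, where

\[ Y(0, t) = \begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix}, \qquad V(0, t) = \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix}, \]

and

\[ \mathcal{O}_t = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{t-1} \end{bmatrix}, \qquad M_t = \begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ CA^{t-2}B & CA^{t-3}B & \cdots & D \end{bmatrix}. \]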

Definition 9.16 The system (9.5), or the pair (A, C), is said to be observable in discrete time if there exists K ∈ {1, 2, . . . } such that x(0) can be uniquely determined based on y(t), u(t), t ∈ {0, . . . , K − 1}.




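The observability Gramian of the system (9.5) is given by

\[ W_o(0, K) = \sum_{\tau=0}^{K-1} \bigl( A^T \bigr)^{\tau} C^T C A^{\tau} = \mathcal{O}_K^T \mathcal{O}_K; \]

in particular, $W_o(0, n) = \mathcal{O}^T \mathcal{O}$, where $\mathcal{O} = \mathcal{O}_n$ is the observability matrix.

Theorem 9.17 The system (9.5) is observable if and only if

\[ \mathcal{O}_K^T \mathcal{O}_K > 0 \quad (\text{i.e., } \operatorname{rank} \mathcal{O}_K^T \mathcal{O}_K = n) \]

for some K ≤ n. In this case, the state x0 at time t = 0 is given by

\[ x_0 = \bigl( \mathcal{O}_K^T \mathcal{O}_K \bigr)^{-1} \mathcal{O}_K^T \bigl( Y(0, K) - M_K V(0, K) \bigr). \]

A consequence of the above theorem is that, in the case of observable discrete-time systems, the number of steps K required to determine the initial state can be taken to be at most equal to n.

Theorem 9.18 For the system (9.5), the following are equivalent:

(a) The pair (A, C) is observable in discrete time;

(b) The observability Gramian $\mathcal{O}_K^T \mathcal{O}_K > 0$ (i.e., $\operatorname{rank} \mathcal{O}_K^T \mathcal{O}_K = n$) for K = n, n + 1, . . . ;

(c) The observability matrix O has full column rank (i.e., rank O = n).

Moreover, if any of these conditions holds, then the state x0 at t = 0 is given by

\[ x_0 = \bigl( \mathcal{O}^T \mathcal{O} \bigr)^{-1} \mathcal{O}^T \bigl( Y(0, n) - M_n V(0, n) \bigr). \]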
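The last formula amounts to a least-squares solve on stacked output data. The sketch below is illustrative only: the single-input, single-output system, the "true" initial state, the input sequence, and the helper names `matmul`, `solve`, and `markov` are all assumptions made for this example. It simulates n output samples and then recovers x(0) from them in exact rational arithmetic.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, b):
    """Solve M z = b for a square nonsingular M by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * p_ for a, p_ in zip(aug[i], aug[c])]
    return [row[-1] for row in aug]

# Hypothetical SISO example with n = 2.
A, B, C, D = [[1, 1], [0, 1]], [[1], [1]], [[1, 0]], [[0]]
n = 2
x0_true = [2, -1]   # unknown to the estimator
u = [3, 0]          # known inputs u(0), u(1)

# Simulate the outputs y(0), ..., y(n-1).
x, Y = [[v] for v in x0_true], []
for t in range(n):
    Y.append(matmul(C, x)[0][0] + D[0][0] * u[t])
    Ax = matmul(A, x)
    x = [[Ax[i][0] + B[i][0] * u[t]] for i in range(n)]

# Observability matrix O_n with rows C, CA, ..., CA^(n-1).
O, row = [], C
for _ in range(n):
    O.append(row[0])
    row = matmul(row, A)

def markov(k):
    """Markov parameter C A^k B (a scalar in this SISO example)."""
    M = B
    for _ in range(k):
        M = matmul(A, M)
    return matmul(C, M)[0][0]

# Forced response M_n V(0, n), computed row by row.
MV = [D[0][0] * u[t] + sum(markov(t - s - 1) * u[s] for s in range(t)) for t in range(n)]

# x0 = (O^T O)^{-1} O^T (Y - M V).
resid = [Y[t] - MV[t] for t in range(n)]
Ot = [list(r) for r in zip(*O)]
Ot_resid = [sum(Ot[i][t] * resid[t] for t in range(n)) for i in range(n)]
x0_est = solve(matmul(Ot, O), Ot_resid)
print(x0_est)  # recovered initial state
```

The estimator only sees `Y`, `u`, and the system matrices; `x0_true` is used solely to generate the data and to compare against the result.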


X. Structural Properties of Linear Systems (Part 1)




10.1 Introduction

Recall that a linear map, or its matrix representation, between finite-dimensional vector spaces can be decomposed into Jordan blocks, which completely characterize the stability properties of the linear map. Similar things can be done to linear dynamical systems, or their state-space representations. Consider the linear time-invariant system

\[ \dot{x}(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t) + Du(t), \tag{10.1} \]

where A ∈ Rn×n, B ∈ Rn×m, C ∈ Rl×n, and D ∈ Rl×m are given matrices. If the system (10.1) is not controllable (i.e., not reachable), then it is possible to separate the controllable part of the system from the uncontrollable part. Similarly, if the system is not observable, then it is possible to separate the observable part from the unobservable part.

10.2 Standard Form for Uncontrollable Systems

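Lemma 10.1 Let $\mathcal{C}$ be the controllability matrix of the pair (A, B). If $\operatorname{rank} \mathcal{C} = n_r$, then there exists a nonsingular matrix Q ∈ Rn×n such that

\[ Q^{-1}AQ = \begin{bmatrix} A_1 & A_{12} \\ 0 & A_2 \end{bmatrix} \quad\text{and}\quad Q^{-1}B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}, \tag{10.2} \]

where $A_1 \in \mathbb{R}^{n_r\times n_r}$, $B_1 \in \mathbb{R}^{n_r\times m}$, and the pair (A1, B1) is controllable.

Proof. Let $\{v_1, \ldots, v_{n_r}\}$ be a basis for the controllable subspace $\mathcal{R}(\mathcal{C})$. Since $\mathcal{R}(\mathcal{C})$ is invariant under A, there exists a unique matrix $A_1 \in \mathbb{R}^{n_r\times n_r}$, whose columns are the coordinate representations of $Av_1, \ldots, Av_{n_r}$ with respect to $\{v_1, \ldots, v_{n_r}\}$, such that

\[ A \begin{bmatrix} v_1 & \cdots & v_{n_r} \end{bmatrix} = \begin{bmatrix} v_1 & \cdots & v_{n_r} \end{bmatrix} A_1. \]

On the other hand, if an x ∈ Rn is such that x = Bu for some u ∈ Rm, then

\[ x = \begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix} \begin{bmatrix} u^T & 0 & \cdots & 0 \end{bmatrix}^T, \]

so $\mathcal{R}(B) \subset \mathcal{R}(\mathcal{C})$. Thus there exists a unique $B_1 \in \mathbb{R}^{n_r\times m}$, whose columns are the coordinate representations of the columns of B with respect to $\{v_1, \ldots, v_{n_r}\}$, such that

\[ B = \begin{bmatrix} v_1 & \cdots & v_{n_r} \end{bmatrix} B_1. \]

Now, choosing any linearly independent vectors $q_{n_r+1}, \ldots, q_n$ such that

\[ Q = \begin{bmatrix} v_1 & \cdots & v_{n_r} & q_{n_r+1} & \cdots & q_n \end{bmatrix} \]

is nonsingular, we have

\[ A \begin{bmatrix} v_1 & \cdots & v_{n_r} \end{bmatrix} = Q \begin{bmatrix} A_1 \\ 0 \end{bmatrix} \quad\text{and}\quad A \begin{bmatrix} q_{n_r+1} & \cdots & q_n \end{bmatrix} = Q \begin{bmatrix} A_{12} \\ A_2 \end{bmatrix} \]

for some $A_{12} \in \mathbb{R}^{n_r\times(n-n_r)}$ and $A_2 \in \mathbb{R}^{(n-n_r)\times(n-n_r)}$; similarly

\[ B = Q \begin{bmatrix} B_1 \\ 0 \end{bmatrix}. \]

This leads to (10.2). It remains to show that (A1, B1) is controllable. The controllability matrix $\hat{\mathcal{C}}$ of the pair $(Q^{-1}AQ,\ Q^{-1}B)$ is given by

\[ \hat{\mathcal{C}} = \begin{bmatrix} B_1 & A_1 B_1 & \cdots & A_1^{n-1} B_1 \\ 0 & 0 & \cdots & 0 \end{bmatrix} = Q^{-1} \mathcal{C}. \tag{10.3} \]

Here, the first equation in (10.3) implies that the rank of $\hat{\mathcal{C}}$ is equal to the rank of

\[ \begin{bmatrix} B_1 & A_1 B_1 & \cdots & A_1^{n-1} B_1 \end{bmatrix}, \]

where n ≥ nr. By the Cayley-Hamilton theorem, however, the columns of this matrix are linear combinations of the columns of the controllability matrix of (A1, B1) given by

\[ \mathcal{C}_1 = \begin{bmatrix} B_1 & A_1 B_1 & \cdots & A_1^{n_r-1} B_1 \end{bmatrix} \in \mathbb{R}^{n_r\times mn_r}, \]

whose rank is at most nr. On the other hand, the second equation in (10.3) says that $\operatorname{rank} \hat{\mathcal{C}} = \operatorname{rank} \mathcal{C} = n_r$. Therefore, we have $\operatorname{rank} \mathcal{C}_1 = n_r$, and hence the pair (A1, B1) is controllable.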

If Q is as in Lemma 10.1, then the change of state variables given by $Q\hat{x}(t) = x(t)$ results in an equivalent system of the form

\[ \begin{bmatrix} \dot{\hat{x}}_1(t) \\ \dot{\hat{x}}_2(t) \end{bmatrix} = \begin{bmatrix} A_1 & A_{12} \\ 0 & A_2 \end{bmatrix} \begin{bmatrix} \hat{x}_1(t) \\ \hat{x}_2(t) \end{bmatrix} + \begin{bmatrix} B_1 \\ 0 \end{bmatrix} u(t), \qquad y(t) = \begin{bmatrix} C_1 & C_2 \end{bmatrix} \begin{bmatrix} \hat{x}_1(t) \\ \hat{x}_2(t) \end{bmatrix} + Du(t), \tag{10.4} \]

where $\hat{x}(t) = [\hat{x}_1(t)^T\ \ \hat{x}_2(t)^T]^T$ with $\hat{x}_1(t) \in \mathbb{R}^{n_r}$ and $\hat{x}_2(t) \in \mathbb{R}^{n-n_r}$, and where (A1, B1) is controllable. This form of state-space representations is called the standard form for uncontrollable systems. The eigenvalues of A1 (resp. A2) are called controllable eigenvalues (resp. uncontrollable eigenvalues) of the system (10.1). Recall that the inverse Laplace transform of $(sI - A)^{-1}$ yields

\[ e^{At} = \sum_{i=1}^{p} \sum_{k=0}^{m_i-1} A_{ik}\, t^k e^{\lambda_i t} \]

with

\[ A_{ik} = \frac{1}{k!\,(m_i-1-k)!} \lim_{s\to\lambda_i} \frac{d^{\,m_i-1-k}}{ds^{\,m_i-1-k}} \Bigl[ (s-\lambda_i)^{m_i} (sI - A)^{-1} \Bigr], \]

where λ1, . . . , λp are distinct eigenvalues of A and m1, . . . , mp are their algebraic multiplicities, and where the terms $t^k e^{\lambda_i t}$ are called the modes of the system (10.1). In particular, the modes of the system corresponding to controllable eigenvalues (resp. uncontrollable eigenvalues) are called controllable modes (resp. uncontrollable modes) of the system.




Lemma 10.2 The input-output description of the system (10.1) is determined solely by its controllable part; that is, if (10.1) and (10.4) are equivalent, then the transfer function matrix of (10.1) is given by
\[
H(s) = C(sI - A)^{-1}B + D = C_1(sI - A_1)^{-1}B_1 + D.
\]

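Proof. If $Q$ is as in Lemma 10.1, then we have
\[
e^{Q^{-1}AQ\tau} = Q^{-1}e^{A\tau}Q
\quad\text{and}\quad
\exp\!\left(\begin{bmatrix} A_1 & A_{12} \\ 0 & A_2 \end{bmatrix}\tau\right) = \begin{bmatrix} e^{A_1\tau} & * \\ 0 & e^{A_2\tau} \end{bmatrix},
\]
where the symbol $*$ denotes a block whose value is irrelevant here. Hence
\[
Ce^{A\tau}B = (CQ)\big(Q^{-1}e^{A\tau}Q\big)\big(Q^{-1}B\big)
= \begin{bmatrix} C_1 & C_2 \end{bmatrix}\begin{bmatrix} e^{A_1\tau} & * \\ 0 & e^{A_2\tau} \end{bmatrix}\begin{bmatrix} B_1 \\ 0 \end{bmatrix}
= C_1e^{A_1\tau}B_1,
\]
from which the result follows by taking Laplace transforms.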

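As a simple example, consider $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, and $C = \begin{bmatrix} 1 & 2 \end{bmatrix}$. Then $\mathcal{C} = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, so the reachable subspace is $\mathcal{R}(\mathcal{C}) = \operatorname{span}\{v_1\}$ with $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Choose $v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and let $Q = [v_1\ v_2] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ to obtain
\[
Q^{-1}AQ = \begin{bmatrix} A_1 & A_{12} \\ 0 & A_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix},\quad
Q^{-1}B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\quad
CQ = \begin{bmatrix} C_1 & C_2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \end{bmatrix}.
\]
Thus, among the two eigenvalues (namely, 1 and 0) of the system, the eigenvalue 1 is controllable and the eigenvalue 0 is uncontrollable. Indeed, the transfer function of the system is
\[
H(s) = C(sI - A)^{-1}B = \begin{bmatrix} \dfrac{1}{s-1} & \dfrac{2s-1}{s(s-1)} \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{s-1} = C_1(sI - A_1)^{-1}B_1.
\]
Since $\operatorname{rank}\mathcal{O} = \operatorname{rank}\begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} = 2$, both eigenvalues are observable. However, the uncontrollable eigenvalue 0 is never excited by the input, and hence does not appear as a system pole.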
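The example above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `ctrb` and `tf` are illustrative choices and do not come from the notes. It builds the controllability matrix, reads off $n_r$ as its rank, and compares the full transfer function with that of the controllable part at one sample point $s_0$.

```python
import numpy as np

def ctrb(A, B):
    """Controllability matrix [B, AB, ..., A^(n-1) B]."""
    blocks = [B]
    for _ in range(A.shape[0] - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def tf(A, B, C, s):
    """Evaluate C (sI - A)^{-1} B at a scalar s (D = 0 assumed)."""
    return C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B)

A = np.array([[1.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 2.0]])

n_r = np.linalg.matrix_rank(ctrb(A, B))   # dimension of the controllable subspace

# Q = [v1 v2]: v1 spans R(C), v2 completes a basis of R^2 (here Q = I).
Q = np.eye(2)
A_hat = np.linalg.solve(Q, A @ Q)          # Q^{-1} A Q
B_hat = np.linalg.solve(Q, B)              # Q^{-1} B
C_hat = C @ Q

# Controllable part (A1, B1, C1) is the leading n_r-dimensional block.
A1, B1, C1 = A_hat[:n_r, :n_r], B_hat[:n_r, :], C_hat[:, :n_r]

s0 = 3.0                                   # arbitrary test point, not an eigenvalue
H_full = tf(A, B, C, s0)
H_ctrl = tf(A1, B1, C1, s0)
```

Both evaluations agree with $1/(s_0 - 1)$, which illustrates Lemma 10.2 on this particular system; a single sample point is of course a sanity check rather than a proof.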

10.3 Standard Form for Unobservable Systems

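Lemma 10.3 Let $\mathcal{O}$ be the observability matrix of the pair $(A, C)$. If $\operatorname{rank}\mathcal{O} = n_o$, then there exists a nonsingular matrix $Q \in \mathbb{R}^{n\times n}$ such that
\[
Q^{-1}AQ = \begin{bmatrix} A_1 & 0 \\ A_{21} & A_2 \end{bmatrix}
\quad\text{and}\quad
CQ = \begin{bmatrix} C_1 & 0 \end{bmatrix}, \tag{10.5}
\]
where $A_1 \in \mathbb{R}^{n_o\times n_o}$, $C_1 \in \mathbb{R}^{l\times n_o}$, and the pair $(A_1, C_1)$ is observable.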

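Proof. Let $\{v_{n_o+1}, \dots, v_n\}$ be a basis for the unobservable subspace $\mathcal{N}(\mathcal{O})$. Since $\mathcal{N}(\mathcal{O})$ is invariant under $A$, there exists a unique matrix $A_2 \in \mathbb{R}^{(n-n_o)\times(n-n_o)}$, whose columns are the coordinate representations of $Av_{n_o+1}, \dots, Av_n$ with respect to $\{v_{n_o+1}, \dots, v_n\}$, such that
\[
A\begin{bmatrix} v_{n_o+1} & \cdots & v_n \end{bmatrix} = \begin{bmatrix} v_{n_o+1} & \cdots & v_n \end{bmatrix} A_2.
\]
On the other hand, if $\mathcal{O}x = 0$, then $Cx = 0$, so $\mathcal{N}(\mathcal{O}) \subset \mathcal{N}(C)$. Thus we have
\[
C\begin{bmatrix} v_{n_o+1} & \cdots & v_n \end{bmatrix} = 0.
\]
Choose linearly independent vectors $q_1, \dots, q_{n_o}$ such that
\[
Q = \begin{bmatrix} q_1 & \cdots & q_{n_o} & v_{n_o+1} & \cdots & v_n \end{bmatrix}
\]
is nonsingular. Then we have
\[
A\begin{bmatrix} q_1 & \cdots & q_{n_o} \end{bmatrix} = Q\begin{bmatrix} A_1 \\ A_{21} \end{bmatrix}
\quad\text{and}\quad
A\begin{bmatrix} v_{n_o+1} & \cdots & v_n \end{bmatrix} = Q\begin{bmatrix} 0 \\ A_2 \end{bmatrix}
\]
for some $A_1 \in \mathbb{R}^{n_o\times n_o}$ and $A_{21} \in \mathbb{R}^{(n-n_o)\times n_o}$. Similarly, $CQ = [C_1\ 0]$ for some $C_1 \in \mathbb{R}^{l\times n_o}$. This gives (10.5). Moreover, the observability matrix $\hat{\mathcal{O}}$ of the pair $(Q^{-1}AQ, CQ)$ is given by
\[
\hat{\mathcal{O}} = \begin{bmatrix} C_1 & 0 \\ C_1A_1 & 0 \\ \vdots & \vdots \\ C_1A_1^{n-1} & 0 \end{bmatrix} = \mathcal{O}Q.
\]
Here, the first equation, along with the Cayley-Hamilton theorem, implies that $\operatorname{rank}\hat{\mathcal{O}}$ is equal to the rank of the observability matrix
\[
\mathcal{O}_1 = \begin{bmatrix} C_1 \\ C_1A_1 \\ \vdots \\ C_1A_1^{n_o-1} \end{bmatrix} \in \mathbb{R}^{ln_o\times n_o}
\]
of $(A_1, C_1)$, which is at most $n_o$; however, the second equation gives $\operatorname{rank}\hat{\mathcal{O}} = \operatorname{rank}\mathcal{O} = n_o$. Therefore, we have $\operatorname{rank}\mathcal{O}_1 = n_o$, and hence $(A_1, C_1)$ is observable.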

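If $Q$ is as in Lemma 10.3, then the change of state variables given by $Q\hat{x}(t) = x(t)$ results in an equivalent system of the form
\[
\begin{aligned}
\begin{bmatrix} \dot{\hat{x}}_1(t) \\ \dot{\hat{x}}_2(t) \end{bmatrix}
&= \begin{bmatrix} A_1 & 0 \\ A_{21} & A_2 \end{bmatrix}\begin{bmatrix} \hat{x}_1(t) \\ \hat{x}_2(t) \end{bmatrix} + \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} u(t), \\
y(t) &= \begin{bmatrix} C_1 & 0 \end{bmatrix}\begin{bmatrix} \hat{x}_1(t) \\ \hat{x}_2(t) \end{bmatrix} + Du(t),
\end{aligned} \tag{10.6}
\]
where $\hat{x}(t) = [\hat{x}_1(t)^T\ \hat{x}_2(t)^T]^T$ with $\hat{x}_1(t) \in \mathbb{R}^{n_o}$ and $\hat{x}_2(t) \in \mathbb{R}^{n-n_o}$, and where $(A_1, C_1)$ is observable. This form of state-space representation is called the standard form for unobservable systems. The eigenvalues of $A_1$ (resp. $A_2$) and the corresponding modes are called observable eigenvalues (resp. unobservable eigenvalues) and observable modes (resp. unobservable modes) of the system (10.1).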

Lemma 10.4 The input-output description of the system (10.1) is determined solely by its observable part. That is, if (10.1) and (10.6) are equivalent, then the transfer function matrix of (10.1) is given by $H(s) = C(sI - A)^{-1}B + D = C_1(sI - A_1)^{-1}B_1 + D$.

Proof . The proof parallels that of Lemma 10.2.

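For example, suppose $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and $C = \begin{bmatrix} 1 & 1 \end{bmatrix}$. Since $\mathcal{O} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, the unobservable subspace is $\mathcal{N}(\mathcal{O}) = \operatorname{span}\{v_2\}$ with $v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Choose $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, and let $Q = [v_1\ v_2] = \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}$ to obtain
\[
Q^{-1}AQ = \begin{bmatrix} A_1 & 0 \\ A_{21} & A_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\quad
Q^{-1}B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix},\quad
CQ = \begin{bmatrix} C_1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \end{bmatrix}.
\]
Thus, among the two eigenvalues of the system, the eigenvalue 1 is observable and the eigenvalue 0 is unobservable. Indeed, the transfer function of the system is
\[
H(s) = C(sI - A)^{-1}B = \begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} \dfrac{s+1}{s(s-1)} \\[2mm] \dfrac{1}{s} \end{bmatrix} = \frac{2}{s-1} = C_1(sI - A_1)^{-1}B_1.
\]
Since $\operatorname{rank}\mathcal{C} = \operatorname{rank}\begin{bmatrix} 1 & 2 \\ 1 & 0 \end{bmatrix} = 2$, both eigenvalues are controllable. However, the unobservable eigenvalue 0 is never observed from the system output, and hence does not appear as a system pole.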
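The construction in the proof of Lemma 10.3 can be sketched numerically for this example. The code below is an illustration under stated assumptions: it uses NumPy, obtains a basis of $\mathcal{N}(\mathcal{O})$ from the singular value decomposition (so the basis vector is a scaled version of $v_2$, possibly with opposite sign), and the helper name `obsv` is hypothetical.

```python
import numpy as np

def obsv(A, C):
    """Observability matrix [C; CA; ...; CA^(n-1)]."""
    blocks = [C]
    for _ in range(A.shape[0] - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

A = np.array([[1.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])
n = A.shape[0]

O = obsv(A, C)
n_o = np.linalg.matrix_rank(O)

# Rows of Vt beyond the rank span the null space N(O), the unobservable subspace.
_, _, Vt = np.linalg.svd(O)
null_basis = Vt[n_o:].T                     # shape (n, n - n_o)

# Complete the basis with q1 = [1, 0]^T, as in the example.
q1 = np.array([[1.0], [0.0]])
Q = np.hstack([q1, null_basis])

A_hat = np.linalg.solve(Q, A @ Q)           # Q^{-1} A Q
B_hat = np.linalg.solve(Q, B)               # Q^{-1} B
C_hat = C @ Q

# Observable part (A1, B1, C1) and a transfer-function check at one point.
A1, B1, C1 = A_hat[:n_o, :n_o], B_hat[:n_o, :], C_hat[:, :n_o]
s0 = 3.0
H_full = (C @ np.linalg.solve(s0 * np.eye(n) - A, B)).item()
H_obs = (C1 @ np.linalg.solve(s0 * np.eye(n_o) - A1, B1)).item()
```

Because the zero blocks in (10.5) depend only on the subspace $\mathcal{N}(\mathcal{O})$, the scaling of the null-space vector returned by the SVD does not affect the structure of the result.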




10.4 Kalman Canonical Form of LTI Systems

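The following theorem is called the Kalman decomposition theorem. The theorem unifies the decomposition lemmas for uncontrollable and unobservable linear time-invariant systems.

Theorem 10.5 Let $\mathcal{C}$ and $\mathcal{O}$ be the controllability and observability matrices of the triple $(A, B, C)$. If $\operatorname{rank}\mathcal{C} = n_r$, $\operatorname{rank}\mathcal{O} = n_o$, and $\dim\big(\mathcal{R}(\mathcal{C}) \cap \mathcal{N}(\mathcal{O})\big) = n_{ro}$, then there exists a nonsingular matrix $Q \in \mathbb{R}^{n\times n}$ such that
\[
Q^{-1}AQ = \begin{bmatrix} A_{11} & 0 & A_{13} & 0 \\ A_{21} & A_{22} & A_{23} & A_{24} \\ 0 & 0 & A_{33} & 0 \\ 0 & 0 & A_{43} & A_{44} \end{bmatrix},\quad
Q^{-1}B = \begin{bmatrix} B_1 \\ B_2 \\ 0 \\ 0 \end{bmatrix},\quad
CQ = \begin{bmatrix} C_1 & 0 & C_3 & 0 \end{bmatrix}, \tag{10.7}
\]
where $A_{11} \in \mathbb{R}^{(n_r-n_{ro})\times(n_r-n_{ro})}$, $A_{22} \in \mathbb{R}^{n_{ro}\times n_{ro}}$, $A_{33} \in \mathbb{R}^{(n_o-(n_r-n_{ro}))\times(n_o-(n_r-n_{ro}))}$, and $A_{44} \in \mathbb{R}^{((n-n_o)-n_{ro})\times((n-n_o)-n_{ro})}$, and such that the following hold:

(a) The pair $(A_c, B_c)$ with
\[
A_c = \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix}
\quad\text{and}\quad
B_c = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}
\]
is controllable.

(b) The pair $(A_o, C_o)$ with
\[
A_o = \begin{bmatrix} A_{11} & A_{13} \\ 0 & A_{33} \end{bmatrix}
\quad\text{and}\quad
C_o = \begin{bmatrix} C_1 & C_3 \end{bmatrix}
\]
is observable.

(c) The triple $(A_{11}, B_1, C_1)$ is controllable and observable.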

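Proof. Choose a basis $\{v_1, \dots, v_{n_r}\}$ for $\mathcal{R}(\mathcal{C})$ such that $\{v_{n_r-n_{ro}+1}, \dots, v_{n_r}\}$ is a basis for $\mathcal{R}(\mathcal{C}) \cap \mathcal{N}(\mathcal{O})$. Choose vectors $v_{n_o+n_{ro}+1}, \dots, v_n$ such that
\[
\underbrace{\{v_{n_r-n_{ro}+1}, \dots, v_{n_r}, v_{n_o+n_{ro}+1}, \dots, v_n\}}_{n-n_o\ \text{vectors}}
\]
is a basis for $\mathcal{N}(\mathcal{O})$. Finally, choose $q_{n_r+1}, \dots, q_{n_o+n_{ro}}$ such that
\[
Q = \begin{bmatrix} v_1 & \cdots & v_{n_r} & q_{n_r+1} & \cdots & q_{n_o+n_{ro}} & v_{n_o+n_{ro}+1} & \cdots & v_n \end{bmatrix}
\]
is nonsingular. Then, since the first $n_r$ columns of $Q$ span $\mathcal{R}(\mathcal{C})$, (the proof of) Lemma 10.1 implies that $Q^{-1}AQ$, $Q^{-1}B$, and $CQ$ are of the following partitioned forms:
\[
Q^{-1}AQ = \begin{bmatrix} A_{11} & A_{12} & A_{13} & A_{14} \\ A_{21} & A_{22} & A_{23} & A_{24} \\ 0 & 0 & A_{33} & A_{34} \\ 0 & 0 & A_{43} & A_{44} \end{bmatrix},\quad
Q^{-1}B = \begin{bmatrix} B_1 \\ B_2 \\ 0 \\ 0 \end{bmatrix},\quad
CQ = \begin{bmatrix} C_1 & C_2 & C_3 & C_4 \end{bmatrix}. \tag{10.8}
\]
Let
\[
T = \begin{bmatrix} I_{n_r-n_{ro}} & 0 & 0 & 0 \\ 0 & 0 & I_{n_{ro}} & 0 \\ 0 & I_{n_o-(n_r-n_{ro})} & 0 & 0 \\ 0 & 0 & 0 & I_{(n-n_o)-n_{ro}} \end{bmatrix},
\]
where $I_k$ denotes the $k$-by-$k$ identity matrix. Then
\[
QT = \begin{bmatrix} v_1 & \cdots & v_{n_r-n_{ro}} & q_{n_r+1} & \cdots & q_{n_o+n_{ro}} & v_{n_r-n_{ro}+1} & \cdots & v_{n_r} & v_{n_o+n_{ro}+1} & \cdots & v_n \end{bmatrix},
\]
where the last $n - n_o$ columns of $QT$ span $\mathcal{N}(\mathcal{O})$. Thus (the proof of) Lemma 10.3, along with the fact that $T^{-1} = T^T$, implies that $Q^{-1}AQ$, $Q^{-1}B$, and $CQ$ are of the following partitioned forms:
\[
Q^{-1}AQ = T(QT)^{-1}A(QT)T^{-1}
= T\begin{bmatrix} F_{11} & F_{12} & 0 & 0 \\ F_{21} & F_{22} & 0 & 0 \\ F_{31} & F_{32} & F_{33} & F_{34} \\ F_{41} & F_{42} & F_{43} & F_{44} \end{bmatrix}T^{-1}
= \begin{bmatrix} F_{11} & 0 & F_{12} & 0 \\ F_{31} & F_{33} & F_{32} & F_{34} \\ F_{21} & 0 & F_{22} & 0 \\ F_{41} & F_{43} & F_{42} & F_{44} \end{bmatrix},
\]
\[
Q^{-1}B = T(QT)^{-1}B = T\begin{bmatrix} G_1 \\ G_2 \\ G_3 \\ G_4 \end{bmatrix} = \begin{bmatrix} G_1 \\ G_3 \\ G_2 \\ G_4 \end{bmatrix},
\]
\[
CQ = C(QT)T^{-1} = \begin{bmatrix} H_1 & H_2 & 0 & 0 \end{bmatrix}T^{-1} = \begin{bmatrix} H_1 & 0 & H_2 & 0 \end{bmatrix}.
\]
Comparing these equations with those in (10.8), we conclude that (10.7) holds. The pair $(A_c, B_c)$ is controllable by Lemma 10.1, and the pair $(A_o, C_o)$ is observable by Lemma 10.3. It follows from
\[
\operatorname{rank}\begin{bmatrix} B_c & A_cB_c & \cdots & A_c^{n_r-1}B_c \end{bmatrix}
= \operatorname{rank}\begin{bmatrix} B_1 & A_{11}B_1 & \cdots & A_{11}^{n_r-1}B_1 \\ * & * & \cdots & * \end{bmatrix} = n_r,
\]
along with the Cayley-Hamilton theorem, that
\[
\operatorname{rank}\begin{bmatrix} B_1 & A_{11}B_1 & \cdots & A_{11}^{n_r-1}B_1 \end{bmatrix}
= \operatorname{rank}\begin{bmatrix} B_1 & A_{11}B_1 & \cdots & A_{11}^{n_r-n_{ro}-1}B_1 \end{bmatrix} = n_r - n_{ro}.
\]
Thus the pair $(A_{11}, B_1)$ is controllable. Since $(A_c, B_c, [C_1\ 0])$ is already in the standard form for unobservable systems, we conclude that $(A_{11}, C_1)$ is observable as well.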

If $A_{11}$, $A_{22}$, $A_{33}$, and $A_{44}$ are as in Theorem 10.5, then we have
\[
\det(sI - A) = \det(sI - A_{11})\det(sI - A_{22})\det(sI - A_{33})\det(sI - A_{44}).
\]
The eigenvalues of $A_{11}$ are controllable and observable, those of $A_{22}$ are controllable and unobservable, those of $A_{33}$ are uncontrollable and observable, and those of $A_{44}$ are uncontrollable and unobservable.

Theorem 10.6 The input-output description of the system (10.1) is determined solely by its controllable and observable part. That is, if $Q$ is as in (10.7), then the transfer function matrix of (10.1) is given by $H(s) = C(sI - A)^{-1}B + D = C_1(sI - A_{11})^{-1}B_1 + D$.

Proof . The result is an immediate consequence of Lemmas 10.2 and 10.4.

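For example, consider
\[
A = \begin{bmatrix} -1 & -3 & -2 \\ 1 & 3 & 3 \\ 1 & 1 & 0 \end{bmatrix},\quad
B = \begin{bmatrix} -1 & 0 \\ 1 & 1 \\ 0 & -1 \end{bmatrix},\quad
C = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 1 \end{bmatrix}.
\]
The controllability and observability matrices are
\[
\mathcal{C} = \begin{bmatrix} -1 & 0 & -2 & -1 & -4 & -1 \\ 1 & 1 & 2 & 0 & 4 & 2 \\ 0 & -1 & 0 & 1 & 0 & -1 \end{bmatrix},\quad
\mathcal{O} = \begin{bmatrix} 1 & 2 & 0 & 1 & 1 & 2 \\ 1 & 2 & 0 & 1 & 1 & 2 \\ 0 & 1 & 1 & 2 & 0 & 1 \end{bmatrix}^T.
\]
The controllable (or reachable) subspace $R_r = \mathcal{R}(\mathcal{C})$, the unobservable subspace $R_o = \mathcal{N}(\mathcal{O})$, and their intersection $R_{ro} = \mathcal{R}(\mathcal{C}) \cap \mathcal{N}(\mathcal{O})$ are
\[
R_r = \operatorname{span}\left\{\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}\right\},\quad
R_o = \operatorname{span}\left\{\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}\right\},\quad\text{and}\quad
R_{ro} = \operatorname{span}\left\{\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}\right\},
\]
respectively. Then
\[
v_1 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix},\quad
v_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix},\quad
q_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\quad\Rightarrow\quad
Q = \begin{bmatrix} v_1 & v_2 & q_3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & -1 & 1 \\ -1 & 0 & 1 \end{bmatrix}.
\]
This $Q$ gives
\[
Q^{-1}AQ = \begin{bmatrix} -1 & 0 & -1 \\ -1 & 2 & -7 \\ 0 & 0 & 1 \end{bmatrix},\quad
Q^{-1}B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \\ 0 & 0 \end{bmatrix},\quad
CQ = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 0 & 5 \end{bmatrix}.
\]
Among the three eigenvalues of $A$, the eigenvalue $-1$ is both controllable and observable, the eigenvalue 2 is controllable but unobservable, and the eigenvalue 1 is observable but uncontrollable. Indeed, the transfer function matrix shows a single pole at $s = -1$:
\[
H(s) = C(sI - A)^{-1}B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\big(s - (-1)\big)^{-1}\begin{bmatrix} 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1/(s+1) \\ 0 & 1/(s+1) \end{bmatrix}.
\]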
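The Kalman decomposition in this example can be reproduced numerically. The sketch below assumes NumPy and simply applies the matrix $Q$ chosen above; the variable names (`A_hat`, `H_co`, and so on) are illustrative. It also evaluates the transfer function at one sample point to confirm that only the controllable and observable block contributes.

```python
import numpy as np

A = np.array([[-1.0, -3.0, -2.0], [1.0, 3.0, 3.0], [1.0, 1.0, 0.0]])
B = np.array([[-1.0, 0.0], [1.0, 1.0], [0.0, -1.0]])
C = np.array([[1.0, 1.0, 0.0], [2.0, 2.0, 1.0]])
Q = np.array([[0.0, 1.0, 1.0], [1.0, -1.0, 1.0], [-1.0, 0.0, 1.0]])  # [v1 v2 q3]

# Ranks of the controllability and observability matrices (n_r and n_o).
ctrb_mat = np.hstack([B, A @ B, A @ A @ B])
obsv_mat = np.vstack([C, C @ A, C @ A @ A])
n_r = np.linalg.matrix_rank(ctrb_mat)
n_o = np.linalg.matrix_rank(obsv_mat)

# Similarity transformation into the Kalman canonical form (10.7).
A_hat = np.linalg.solve(Q, A @ Q)   # Q^{-1} A Q
B_hat = np.linalg.solve(Q, B)       # Q^{-1} B
C_hat = C @ Q

# Transfer function at a sample point: full system vs. the (A11, B1, C1) part.
s0 = 3.0
H_full = C @ np.linalg.solve(s0 * np.eye(3) - A, B)
H_co = C_hat[:, :1] @ B_hat[:1, :] / (s0 - A_hat[0, 0])
```

Here the controllable and observable block is the scalar $A_{11} = -1$, so `H_co` is just $C_1 B_1/(s_0 + 1)$.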

10.5 Popov-Belevitch-Hautus (PBH) Tests

We know the system (10.1) is controllable if and only if its controllability matrix has full row rank. Similarly, the system is observable if and only if its observability matrix has full column rank. The decomposition results presented above reveal the structural properties of linear systems, and provide additional tests for controllability and observability. These tests facilitate further insights into the structural properties of linear systems, and will play an important role in controller synthesis problems.

Theorem 10.7 (PBH Eigenvector Tests)

(a) The pair $(A, B)$ is controllable if and only if no eigenvector $v$ of $A^T$ satisfies $B^Tv = 0$.

(b) The pair $(A, C)$ is observable if and only if no eigenvector $v$ of $A$ satisfies $Cv = 0$.

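Proof. Suppose there exists an eigenvector $v \in \mathbb{C}^n$ of $A^T$ such that $B^Tv = 0$. Let $\lambda \in \mathbb{C}$ be the eigenvalue associated with $v$. Then we have $v^TB = 0$, $v^TAB = \lambda v^TB = 0$, $v^TA^2B = \lambda v^TAB = 0$, ..., and so $v^T\mathcal{C} = 0$, where $\mathcal{C}$ is the controllability matrix of $(A, B)$. That is, $\Re(v)^T\mathcal{C} = \Im(v)^T\mathcal{C} = 0$. Since $v \neq 0$, we have $\Re(v) \neq 0$ or $\Im(v) \neq 0$, so $\operatorname{rank}\mathcal{C} < n$. This proves that the pair $(A, B)$ being controllable implies $B^Tv \neq 0$ for any eigenvector $v$ of $A^T$. Conversely, suppose $(A, B)$ is not controllable, so that there exists a nonsingular $Q$ satisfying (10.2), where $A_2 \in \mathbb{R}^{(n-n_r)\times(n-n_r)}$ with $n_r < n$. Choose an eigenvalue $\lambda$ of $A_2^T$, and let $\hat{v}$ be the corresponding eigenvector of $A_2^T$. Then putting $v = (Q^{-1})^T[0\ \ \hat{v}^T]^T \in \mathbb{C}^n$ gives
\[
A^Tv = (Q^{-1})^T(Q^{-1}AQ)^T\begin{bmatrix} 0 \\ \hat{v} \end{bmatrix}
= (Q^{-1})^T\begin{bmatrix} A_1^T & 0 \\ A_{12}^T & A_2^T \end{bmatrix}\begin{bmatrix} 0 \\ \hat{v} \end{bmatrix}
= (Q^{-1})^T\begin{bmatrix} 0 \\ \lambda\hat{v} \end{bmatrix} = \lambda v
\]
and
\[
B^Tv = B^T(Q^{-1})^T\begin{bmatrix} 0 \\ \hat{v} \end{bmatrix}
= (Q^{-1}B)^T\begin{bmatrix} 0 \\ \hat{v} \end{bmatrix}
= \begin{bmatrix} B_1^T & 0 \end{bmatrix}\begin{bmatrix} 0 \\ \hat{v} \end{bmatrix} = 0.
\]
This shows that the pair $(A, B)$ is controllable whenever no eigenvector $v$ of $A^T$ satisfies $B^Tv = 0$, and hence that part (a) of the theorem holds. The proof of part (b) is similar.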

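Corollary 10.8 (PBH Rank Tests)

(a) The pair $(A, B)$ is controllable if and only if
\[
\operatorname{rank}\begin{bmatrix} \lambda I - A & B \end{bmatrix} = n
\]
for all eigenvalues $\lambda$ of $A$ (and hence for all $\lambda \in \mathbb{C}$).

(b) A number $\lambda \in \mathbb{C}$ is an uncontrollable eigenvalue of $A$ if and only if
\[
\operatorname{rank}\begin{bmatrix} \lambda I - A & B \end{bmatrix} < n.
\]

(c) The pair $(A, C)$ is observable if and only if
\[
\operatorname{rank}\begin{bmatrix} \lambda I - A \\ C \end{bmatrix} = n
\]
for all eigenvalues $\lambda$ of $A$ (and hence for all $\lambda \in \mathbb{C}$).

(d) A number $\lambda \in \mathbb{C}$ is an unobservable eigenvalue of $A$ if and only if
\[
\operatorname{rank}\begin{bmatrix} \lambda I - A \\ C \end{bmatrix} < n.
\]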
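The PBH rank tests translate directly into a short numerical routine. The following is a sketch under stated assumptions: it uses NumPy, the function name `pbh_uncontrollable_eigs` and the tolerance `tol` are illustrative choices, and the unobservable eigenvalues are obtained by applying the same routine to the transposed pair. It is tried on the two 2-by-2 examples of Sections 10.2 and 10.3.

```python
import numpy as np

def pbh_uncontrollable_eigs(A, B, tol=1e-9):
    """Return the eigenvalues lam of A with rank [lam*I - A, B] < n."""
    n = A.shape[0]
    bad = []
    for lam in np.linalg.eigvals(A):
        M = np.hstack([lam * np.eye(n) - A, B])
        if np.linalg.matrix_rank(M, tol=tol) < n:
            bad.append(lam)
    return bad

A = np.array([[1.0, 1.0], [0.0, 0.0]])

# Example of Section 10.2: B = [1, 0]^T, the eigenvalue 0 is uncontrollable.
B1 = np.array([[1.0], [0.0]])
unctrl = pbh_uncontrollable_eigs(A, B1)

# Example of Section 10.3: C = [1, 1]; test observability via the pair (A^T, C^T).
C2 = np.array([[1.0, 1.0]])
unobs = pbh_uncontrollable_eigs(A.T, C2.T)
```

An absolute rank tolerance is adequate for these small, well-scaled examples; for badly scaled matrices a rank decision based on a fixed tolerance may be unreliable.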

10.6 Kalman Decomposition of Discrete-Time Systems

For discrete-time LTI systems of the form

x(t + 1) = Ax(t) + Bu(t), t = 0, 1, . . . ;

y(t) = Cx(t) + Du(t), t = 0, 1, . . . ,

even though reachability implies controllability, controllability does not imply reachability. Similarly, observability implies constructibility, but the converse does not necessarily hold. Thus, we will speak of the reachability (as well as observability) of discrete-time systems. Otherwise, all the theorems in Sections 10.4 and 10.5, with “controllability” replaced by “reachability” and the “Laplace transform” H(s) by the “z-transform” H(z), carry over to discrete-time LTI systems without further change.
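A small numerical illustration of the gap between controllability and reachability in discrete time follows. The system used here is a hypothetical example that does not appear in the notes: $A$ is nilpotent and $B = 0$, so every initial state reaches the origin in two steps (the system is controllable to the origin), while no nonzero state can be reached from the origin. NumPy is assumed.

```python
import numpy as np

# Hypothetical discrete-time system x(t+1) = A x(t) + B u(t).
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # nilpotent: A @ A = 0
B = np.zeros((2, 1))                     # the input has no effect on the state

# Reachability matrix [B, AB]; its rank is the dimension of the reachable subspace.
reach_rank = np.linalg.matrix_rank(np.hstack([B, A @ B]))

# With u = 0, any initial state is driven to the origin after two steps.
x0 = np.array([[3.0], [-2.0]])
x2 = A @ (A @ x0)
```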







XI. Structural Properties of Linear Systems (Part 2)




11.1 Introduction

In this section, we continue to study the structural properties of LTI systems. In particular, we delineate how controllability and observability are related to the system response and system stability. We will also learn that controllability and observability are dual notions, and hence that they are mathematically identical notions in a sense.

11.2 Controller Canonical Form

11.2.1 SISO Case

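Consider the single-input single-output linear time-invariant system
\[
\begin{aligned}
\dot{x}(t) &= Ax(t) + bu(t), \\
y(t) &= cx(t) + du(t),
\end{aligned} \tag{11.1}
\]
where $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^{n\times 1}$, and $c \in \mathbb{R}^{1\times n}$.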

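Theorem 11.1 Let $p(s) = s^n + \alpha_{n-1}s^{n-1} + \cdots + \alpha_1s + \alpha_0$ be the characteristic polynomial of $A$; let $\mathcal{C}$ be the controllability matrix of $(A, b)$. There exists a nonsingular matrix $Q \in \mathbb{R}^{n\times n}$ such that
\[
Q^{-1}AQ = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ -\alpha_0 & -\alpha_1 & \cdots & -\alpha_{n-1} \end{bmatrix},\quad
Q^{-1}b = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},\quad\text{and}\quad
cQ = \begin{bmatrix} \beta_0 & \beta_1 & \cdots & \beta_{n-1} \end{bmatrix} \tag{11.2}
\]
for some $\beta_0, \dots, \beta_{n-1} \in \mathbb{R}$ if and only if $\operatorname{rank}\mathcal{C} = n$.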

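Proof. Suppose $\operatorname{rank}\mathcal{C} = n$. The system being SISO implies that $\mathcal{C} \in \mathbb{R}^{n\times n}$, and hence that $\mathcal{C}$ is invertible. Let $q$ be the last row of $\mathcal{C}^{-1}$. Then $\mathcal{C}^{-1}\mathcal{C} = I$ implies
\[
q\begin{bmatrix} b & Ab & \cdots & A^{n-2}b & A^{n-1}b \end{bmatrix} = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \end{bmatrix};
\]
that is, $qb = qAb = \cdots = qA^{n-2}b = 0$ and $qA^{n-1}b = 1$. Define $R = [q^T\ (qA)^T\ \cdots\ (qA^{n-1})^T]^T$. It is clear that
\[
R\mathcal{C} = \begin{bmatrix} qb & \cdots & qA^{n-2}b & qA^{n-1}b \\ qAb & \cdots & qA^{n-1}b & qA^{n}b \\ \vdots & ⋰ & \vdots & \vdots \\ qA^{n-1}b & \cdots & qA^{2n-3}b & qA^{2n-2}b \end{bmatrix}
= \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & * \\ \vdots & ⋰ & \vdots & \vdots \\ 1 & \cdots & * & * \end{bmatrix}
\]
is invertible. Since $\mathcal{C}$ is invertible, $R$ is invertible as well. The Cayley-Hamilton theorem gives
\[
qA^n = q\Big(p(A) - \sum_{j=0}^{n-1}\alpha_jA^j\Big) = -\sum_{j=0}^{n-1}\alpha_j\,qA^j = -\begin{bmatrix} \alpha_0 & \alpha_1 & \cdots & \alpha_{n-1} \end{bmatrix}R,
\]
so we obtain
\[
RA = \begin{bmatrix} qA \\ \vdots \\ qA^{n-1} \\ qA^n \end{bmatrix}
= \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ -\alpha_0 & -\alpha_1 & \cdots & -\alpha_{n-1} \end{bmatrix}R
\quad\text{and}\quad
Rb = \begin{bmatrix} qb \\ \vdots \\ qA^{n-2}b \\ qA^{n-1}b \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.
\]
Letting $Q = R^{-1}$ yields that $Q^{-1}AQ$, $Q^{-1}b$, and $cQ$ are as in (11.2).

Conversely, suppose that there exists an invertible $Q$ such that (11.2) holds. Then
\[
Q^{-1}\mathcal{C} = \begin{bmatrix} Q^{-1}b & (Q^{-1}AQ)(Q^{-1}b) & \cdots & (Q^{-1}AQ)^{n-1}(Q^{-1}b) \end{bmatrix}
= \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & * \\ \vdots & ⋰ & \vdots & \vdots \\ 1 & \cdots & * & * \end{bmatrix}.
\]
Clearly, we have $\operatorname{rank}Q^{-1}\mathcal{C} = n$. Since $Q^{-1}$ is invertible, we conclude that $\operatorname{rank}\mathcal{C} = n$. ◻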

Theorem 11.2 If there exists a $Q \in \mathbb{R}^{n\times n}$ such that (11.2) holds, then the transfer function of the system (11.1) is given by
\[
h(s) = \frac{\beta_{n-1}s^{n-1} + \cdots + \beta_1s + \beta_0}{s^n + \alpha_{n-1}s^{n-1} + \cdots + \alpha_1s + \alpha_0} + d.
\]

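Proof. We have
\[
(sI - Q^{-1}AQ)\begin{bmatrix} 1 \\ s \\ \vdots \\ s^{n-1} \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ p(s) \end{bmatrix},
\quad\text{and hence}\quad
\frac{1}{p(s)}\begin{bmatrix} 1 \\ s \\ \vdots \\ s^{n-1} \end{bmatrix} = (sI - Q^{-1}AQ)^{-1}Q^{-1}b,
\]
where $p(s) = s^n + \alpha_{n-1}s^{n-1} + \cdots + \alpha_1s + \alpha_0$. Now the result is immediate from
\[
c(sI - A)^{-1}b = cQ(sI - Q^{-1}AQ)^{-1}Q^{-1}b = \frac{1}{p(s)}\begin{bmatrix} \beta_0 & \beta_1 & \cdots & \beta_{n-1} \end{bmatrix}\begin{bmatrix} 1 & s & \cdots & s^{n-1} \end{bmatrix}^T. \quad ◻
\]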
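The construction in the proof of Theorem 11.1 can be sketched in a few lines. The code below assumes NumPy; the function name `controller_form` and the 2-by-2 test system are illustrative and not taken from the notes. It forms $q$ as the last row of $\mathcal{C}^{-1}$, stacks $q, qA, \dots, qA^{n-1}$ into $R$, and uses $Q = R^{-1}$. The transfer function of Theorem 11.2 is then compared with $c(sI - A)^{-1}b$ at one sample point.

```python
import numpy as np

def controller_form(A, b, c):
    """Controller canonical form of a controllable SISO triple (A, b, c)."""
    n = A.shape[0]
    ctrb = np.hstack([np.linalg.matrix_power(A, k) @ b for k in range(n)])
    q = np.linalg.inv(ctrb)[-1:, :]            # last row of C^{-1}, shape (1, n)
    R = np.vstack([q @ np.linalg.matrix_power(A, k) for k in range(n)])
    Q = np.linalg.inv(R)                       # Q = R^{-1}
    return R @ A @ Q, R @ b, c @ Q             # Q^{-1} A Q, Q^{-1} b, c Q

# Hypothetical test system (not from the notes).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[1.0], [0.0]])
c = np.array([[1.0, 1.0]])
A_c, b_c, c_c = controller_form(A, b, c)

char_poly = np.poly(A)             # [1, alpha_{n-1}, ..., alpha_0]
beta = c_c.ravel()                 # [beta_0, ..., beta_{n-1}]

s0 = 2.0
h_state = (c @ np.linalg.solve(s0 * np.eye(2) - A, b)).item()
h_poly = np.polyval(beta[::-1], s0) / np.polyval(char_poly, s0)
```

For this system the characteristic polynomial is $s^2 - 5s - 2$, so the last row of `A_c` is $[2\ \ 5]$. Inverting the controllability matrix explicitly is acceptable for a small illustration, although it can be numerically delicate for larger or poorly conditioned systems.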

11.2.2 MIMO Case

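Consider the multiple-input multiple-output linear time-invariant system
\[
\begin{aligned}
\dot{x}(t) &= Ax(t) + Bu(t), \\
y(t) &= Cx(t) + Du(t),
\end{aligned} \tag{11.3}
\]
where $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, and $C \in \mathbb{R}^{l\times n}$.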

If $\mathcal{C}$ is the controllability matrix of the pair $(A, B)$ and if $b_1, \dots, b_m \in \mathbb{R}^n$ are the columns of $B$ such that $B = [b_1\ \cdots\ b_m]$, then we may write
\[
\mathcal{C} = \begin{bmatrix} b_1 & \cdots & b_m & Ab_1 & \cdots & Ab_m & \cdots & A^{n-1}b_1 & \cdots & A^{n-1}b_m \end{bmatrix}. \tag{11.4}
\]
Suppose $\operatorname{rank}\mathcal{C} = n_r$. Starting from the left and moving to the right, select the first $n_r$ linearly independent columns of (11.4). Reorder these columns and obtain
\[
M = \begin{bmatrix} b_1 & Ab_1 & \cdots & A^{\mu_1-1}b_1 & \cdots & b_m & Ab_m & \cdots & A^{\mu_m-1}b_m \end{bmatrix}. \tag{11.5}
\]
The integer $\mu_i$ is the number of linearly independent columns in $M$ that are associated with $b_i$; set $\mu_i = 0$ if no column of $M$ involves $b_i$. The integers $\mu_1, \dots, \mu_m$ are called the controllability indices of the pair $(A, B)$. Clearly, we have $\sum_{i=1}^m \mu_i = \operatorname{rank}\mathcal{C}$.

If the $m$ columns $b_1, \dots, b_m \in \mathbb{R}^n$ of $B$ are not linearly independent (i.e., if $\operatorname{rank}B = r < m$), then some of the controllability indices are zero. In this case, there exists a matrix $B_r \in \mathbb{R}^{n\times r}$ with linearly independent columns $v_1, \dots, v_r \in \mathbb{R}^n$ such that $b_j = p_{1j}v_1 + \cdots + p_{rj}v_r$ for some $p_{1j}, \dots, p_{rj} \in \mathbb{R}$ and for all $j = 1, \dots, m$. If we let $P_r = (p_{ij}) \in \mathbb{R}^{r\times m}$, then $B = B_rP_r$. Since $\operatorname{rank}B = \operatorname{rank}B_r$, we must have $\operatorname{rank}P_r = r$; that is, the rows of $P_r$ must be linearly independent. Now choose any $P_s \in \mathbb{R}^{(m-r)\times m}$ such that $P = [P_r^T\ P_s^T]^T$ is nonsingular. Then we have $B = [B_r\ 0]P$. Thus, whenever $\operatorname{rank}B = r < m$, there exist a matrix $B_r \in \mathbb{R}^{n\times r}$ with linearly independent columns and a nonsingular matrix $K \in \mathbb{R}^{m\times m}$ such that
\[
BK = \begin{bmatrix} B_r & 0 \end{bmatrix},
\]
which implies $Bu = [B_r\ 0]K^{-1}u$. That is, only the first $r$ entries of the transformed input $K^{-1}u$ affect the system state. Moreover, it follows from
\[
\operatorname{rank}\begin{bmatrix} B_r & AB_r & \cdots & A^{n-1}B_r \end{bmatrix}
= \operatorname{rank}\left(\begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix}\begin{bmatrix} K & & 0 \\ & \ddots & \\ 0 & & K \end{bmatrix}\right)
\]
that $(A, B_r)$ is controllable if and only if $(A, B)$ is controllable. Therefore, without loss of generality, we assume that the $m$ columns of $B$ are linearly independent.
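The column-selection procedure that defines the controllability indices can be sketched as follows. This is an illustrative implementation assuming NumPy: the function name `controllability_indices`, the rank tolerance, and the small test pair are hypothetical choices, not taken from the notes. Columns of (11.4) are scanned from left to right, and a column is kept whenever it increases the rank of the set selected so far.

```python
import numpy as np

def controllability_indices(A, B, tol=1e-9):
    """Scan [B, AB, ..., A^(n-1)B] left to right and count, for each input
    column b_j, how many of the selected independent columns are A^k b_j."""
    n, m = B.shape
    selected = np.zeros((n, 0))
    mu = [0] * m
    block = B.copy()                       # current block A^k B
    for _ in range(n):
        for j in range(m):
            candidate = np.hstack([selected, block[:, [j]]])
            if np.linalg.matrix_rank(candidate, tol=tol) > selected.shape[1]:
                selected = candidate       # column is independent: keep it
                mu[j] += 1
        block = A @ block
    return mu

# Hypothetical pair: b1 = e2, b2 = e3, A b1 = e1, A b2 = 0.
A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
mu = controllability_indices(A, B)
```

For this pair the selected columns are $b_1$, $b_2$, and $Ab_1$, so the indices are $\mu_1 = 2$ and $\mu_2 = 1$, and their sum equals $\operatorname{rank}\mathcal{C} = 3$.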

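Theorem 11.3 Let $\mathcal{C}$ be the controllability matrix of $(A, B)$. If $\operatorname{rank}B = m$ and $\operatorname{rank}\mathcal{C} = n$, and if $\mu_1, \dots, \mu_m$ are the controllability indices of $(A, B)$, then there exists a nonsingular matrix $Q \in \mathbb{R}^{n\times n}$ such that
\[
Q^{-1}AQ = \begin{bmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mm} \end{bmatrix} \tag{11.6a}
\]
with
\[
A_{ii} = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ * & * & \cdots & * \end{bmatrix} \in \mathbb{R}^{\mu_i\times\mu_i};\quad
A_{ij} = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \\ * & \cdots & * \end{bmatrix} \in \mathbb{R}^{\mu_i\times\mu_j},\ i \neq j; \tag{11.6b}
\]
and
\[
Q^{-1}B = \begin{bmatrix} B_{11} & \cdots & B_{1m} \\ \vdots & & \vdots \\ B_{m1} & \cdots & B_{mm} \end{bmatrix} \tag{11.6c}
\]
with
\[
B_{ii} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \in \mathbb{R}^{\mu_i\times 1};\quad
B_{ij} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ * \end{bmatrix} \in \mathbb{R}^{\mu_i\times 1},\ i < j;\quad
B_{ij} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix} \in \mathbb{R}^{\mu_i\times 1},\ i > j. \tag{11.6d}
\]
(The matrix $CQ$ does not have any particular structure.)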




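Sketch of Proof. Define $\sigma_k = \sum_{i=1}^k \mu_i$, $k = 1, \dots, m$. Let $q_k$ be the $\sigma_k$-th row of $M^{-1}$, where $M$ is as in (11.5); that is,
\[
M^{-1} = \begin{bmatrix} * & \cdots & * & q_1^T & \cdots & * & \cdots & * & q_m^T \end{bmatrix}^T.
\]
Use $q_1, \dots, q_m$ to form
\[
R = \begin{bmatrix} q_1^T & (q_1A)^T & \cdots & (q_1A^{\mu_1-1})^T & \cdots & q_m^T & (q_mA)^T & \cdots & (q_mA^{\mu_m-1})^T \end{bmatrix}^T.
\]
Then it can be shown, proceeding as in the proof of Theorem 11.1, that $R$ is nonsingular and that $Q = R^{-1}$ is the desired similarity transformation. ◻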

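Suppose that $Q$ is as in Theorem 11.3; suppose $\operatorname{rank}\mathcal{C} = n$ and $\operatorname{rank}B = m$. If $\mu_1, \dots, \mu_m$ are the controllability indices of $(A, B)$ with $\sum_{i=1}^m \mu_i = n$, let
\[
A_c = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{bmatrix} \in \mathbb{R}^{n\times n}
\quad\text{with}\quad
A_i = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \end{bmatrix} \in \mathbb{R}^{\mu_i\times\mu_i},\ i = 1, \dots, m; \tag{11.7a}
\]
\[
B_c = \begin{bmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_m \end{bmatrix} \in \mathbb{R}^{n\times m}
\quad\text{with}\quad
B_i = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \in \mathbb{R}^{\mu_i\times 1},\ i = 1, \dots, m. \tag{11.7b}
\]
Then we have
\[
Q^{-1}AQ = A_c + B_cA_q \quad\text{and}\quad Q^{-1}B = B_cB_q \tag{11.7c}
\]
for some $A_q \in \mathbb{R}^{m\times n}$ and $B_q \in \mathbb{R}^{m\times m}$. If $q_1, \dots, q_m$ and $Q$ are as in the sketch of the proof of Theorem 11.3, then it can be shown that
\[
A_q = \begin{bmatrix} q_1A^{\mu_1} \\ \vdots \\ q_mA^{\mu_m} \end{bmatrix}Q
\quad\text{and}\quad
B_q = \begin{bmatrix} q_1A^{\mu_1-1} \\ \vdots \\ q_mA^{\mu_m-1} \end{bmatrix}B.
\]
Furthermore, the matrix $B_q$ is in fact an upper triangular matrix with ones on the diagonal, and hence it is nonsingular. The following is the controllable version of the so-called structure theorem.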

Theorem 11.4 If $Q$ is as in Theorem 11.3, and if $A_q$ and $B_q$ are as in (11.7), then the transfer function matrix of the system (11.3) is given by $H(s) = N(s)D(s)^{-1} + D$, where
\[
D(s) = B_q^{-1}\big[\Lambda(s) - A_qS(s)\big],\quad N(s) = CQS(s),
\]
\[
\Lambda(s) = \begin{bmatrix} s^{\mu_1} & & & \\ & s^{\mu_2} & & \\ & & \ddots & \\ & & & s^{\mu_m} \end{bmatrix},\quad
S(s) = \operatorname{diag}\left\{\begin{bmatrix} 1 \\ s \\ \vdots \\ s^{\mu_1-1} \end{bmatrix}, \dots, \begin{bmatrix} 1 \\ s \\ \vdots \\ s^{\mu_m-1} \end{bmatrix}\right\}.
\]

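Proof. Since
\[
\begin{aligned}
Q^{-1}BD(s) &= B_cB_qB_q^{-1}\big[\Lambda(s) - A_qS(s)\big] = B_c\Lambda(s) - B_cA_qS(s), \\
\big(sI - Q^{-1}AQ\big)S(s) &= sS(s) - \big(A_c + B_cA_q\big)S(s) = \big(sI - A_c\big)S(s) - B_cA_qS(s) \\
&= B_c\Lambda(s) - B_cA_qS(s),
\end{aligned}
\]
we have $\big(sI - Q^{-1}AQ\big)S(s) = Q^{-1}BD(s)$. This gives
\[
H(s) = CQ\big(sI - Q^{-1}AQ\big)^{-1}Q^{-1}B + D = CQS(s)D(s)^{-1} + D = N(s)D(s)^{-1} + D. \quad ◻
\]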

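As an illustrative example, consider
\[
A = \begin{bmatrix}
0 & 0 & 3 & 1 & 2 & -1 \\
-4 & -3 & -4 & 2 & -1 & -3 \\
0 & 0 & -2 & -1 & 1 & 1 \\
-5 & -2 & -8 & 1 & -3 & -4 \\
0 & -1 & 2 & 2 & 1 & -1 \\
3 & 5 & 0 & -4 & -2 & 2
\end{bmatrix},\quad
B = \begin{bmatrix}
0 & 0 & -1 \\
0 & 1 & -2 \\
0 & 0 & 1 \\
1 & 0 & -5 \\
0 & 0 & -1 \\
1 & -2 & -1
\end{bmatrix},
\]
\[
C = \begin{bmatrix} 2 & 4 & 3 & -3 & 1 & 1 \\ 1 & 1 & -2 & -1 & -1 & 1 \end{bmatrix},\quad
D = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]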

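Let $b_1$, $b_2$, and $b_3$ be the columns of $B$ such that $B = [b_1\ b_2\ b_3]$. Then the controllability matrix of $(A, B)$ is given by $\mathcal{C} = [b_1\ b_2\ b_3\ Ab_1\ Ab_2\ Ab_3\ \cdots\ A^5b_1\ A^5b_2\ A^5b_3]$. Since $\operatorname{rank}\mathcal{C} = 6$ and $\operatorname{rank}B = 3$, we can proceed to obtain the controller-form representation of $(A, B, C)$. Taking the first 6 linearly independent columns of $\mathcal{C}$ and reordering them, we obtain
\[
M = \begin{bmatrix} b_1 & Ab_1 & A^2b_1 & b_2 & b_3 & Ab_3 \end{bmatrix}
= \begin{bmatrix}
0 & 0 & 1 & 0 & -1 & -3 \\
0 & -1 & 2 & 1 & -2 & 0 \\
0 & 0 & 2 & 0 & 1 & 1 \\
1 & -3 & 4 & 0 & -5 & 3 \\
0 & 1 & -2 & 0 & -1 & -6 \\
1 & -2 & 1 & -2 & -1 & 7
\end{bmatrix}
\ \Rightarrow\
M^{-1} = \begin{bmatrix}
-7 & 14 & 8 & -6 & 10 & 7 \\
5 & -30 & -9 & 15 & -14 & -15 \\
1 & -4 & -1 & 2 & -2 & -2 \\
-3 & 7 & 3 & -3 & 4 & 3 \\
-3 & 14 & 5 & -7 & 7 & 7 \\
1 & -6 & -2 & 3 & -3 & -3
\end{bmatrix}.
\]
The controllability indices are $\mu_1 = 3$, $\mu_2 = 1$, and $\mu_3 = 2$. Since $\sigma_1 = 3$, $\sigma_2 = 4$, and $\sigma_3 = 6$, taking the 3rd, 4th, and 6th rows of $M^{-1}$, and denoting them by $q_1$, $q_2$, and $q_3$, respectively, we obtain
\[
Q^{-1} = \begin{bmatrix} q_1 \\ q_1A \\ q_1A^2 \\ q_2 \\ q_3 \\ q_3A \end{bmatrix}
= \begin{bmatrix}
1 & -4 & -1 & 2 & -2 & -2 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & -1 & 0 & 1 & 2 & 0 \\
-3 & 7 & 3 & -3 & 4 & 3 \\
1 & -6 & -2 & 3 & -3 & -3 \\
0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}
\ \Rightarrow\
Q = \begin{bmatrix}
3 & 0 & 0 & 0 & -2 & -1 \\
6 & -1 & 0 & 1 & -3 & -2 \\
0 & 0 & 0 & 0 & 0 & 1 \\
6 & -3 & 1 & 1 & -3 & 0 \\
0 & 1 & 0 & 0 & 0 & -1 \\
-5 & -2 & 1 & -1 & 2 & 4
\end{bmatrix}.
\]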

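Now a controller-form representation $(A_c, B_c, C_c) = (Q^{-1}AQ, Q^{-1}B, CQ)$ of $(A, B, C)$ is given by
\[
A_c = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
24 & -2 & 0 & 5 & -10 & -9 \\
-25 & 0 & 0 & -2 & 15 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
-11 & 2 & 0 & -2 & 5 & 1
\end{bmatrix},\quad
B_c = \begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
1 & -1 & -5 \\
0 & 1 & 0 \\
0 & 0 & 0 \\
0 & 0 & 1
\end{bmatrix},\quad
C_c = \begin{bmatrix} 7 & 4 & -2 & 0 & -5 & -4 \\ -2 & -1 & 0 & -1 & 0 & 0 \end{bmatrix}.
\]
To obtain the transfer function matrix using the structure theorem, write $A_c = \bar{A}_c + \bar{B}_cA_q$ and $B_c = \bar{B}_cB_q$, where $\bar{A}_c$ and $\bar{B}_c$ denote the block-diagonal matrices of (11.7a) and (11.7b) for $\mu_1 = 3$, $\mu_2 = 1$, $\mu_3 = 2$:
\[
A_c = \underbrace{\begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}}_{\bar{A}_c}
+ \underbrace{\begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0 \\
0 & 0 & 1
\end{bmatrix}}_{\bar{B}_c}
\underbrace{\begin{bmatrix}
24 & -2 & 0 & 5 & -10 & -9 \\
-25 & 0 & 0 & -2 & 15 & 0 \\
-11 & 2 & 0 & -2 & 5 & 1
\end{bmatrix}}_{A_q:\ \text{3rd, 4th, and 6th rows of } A_c},
\]
\[
B_c = \underbrace{\begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0 \\
0 & 0 & 1
\end{bmatrix}}_{\bar{B}_c}
\underbrace{\begin{bmatrix}
1 & -1 & -5 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}}_{B_q:\ \text{3rd, 4th, and 6th rows of } B_c},
\]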

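and obtain
\[
\begin{aligned}
D(s) &= B_q^{-1}\big[\Lambda(s) - A_qS(s)\big] \\
&= \begin{bmatrix} 1 & -1 & -5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}^{-1}
\left(\begin{bmatrix} s^3 & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & s^2 \end{bmatrix}
- \begin{bmatrix} 24 & -2 & 0 & 5 & -10 & -9 \\ -25 & 0 & 0 & -2 & 15 & 0 \\ -11 & 2 & 0 & -2 & 5 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ s & 0 & 0 \\ s^2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & s \end{bmatrix}\right) \\
&= \begin{bmatrix}
s^3 - 8s + 56 & s + 7 & 5s^2 + 4s - 30 \\
25 & s + 2 & -15 \\
-2s + 11 & 2 & s^2 - s - 5
\end{bmatrix},
\end{aligned}
\]
\[
N(s) = CQS(s) = \begin{bmatrix} 7 & 4 & -2 & 0 & -5 & -4 \\ -2 & -1 & 0 & -1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ s & 0 & 0 \\ s^2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & s \end{bmatrix}
= \begin{bmatrix} -2s^2 + 4s + 7 & 0 & -4s - 5 \\ -s - 2 & -1 & 0 \end{bmatrix}.
\]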

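The transfer function matrix is then given by
\[
\begin{aligned}
H(s) &= N(s)D(s)^{-1} + D \\
&= \frac{1}{s^5 + s^4 - 5s^3 + 16s^2 + 44s + 115}
\begin{bmatrix}
-2s^4 + 2s^3 + 17s^2 - 43s - 46 & 2s^4 - 4s^3 - 13s^2 + 116s + 115 & 6s^4 - 5s^3 - 83s^2 + 46 \\
-s^3 - 3s^2 + 30s - 61 & -s^4 + 2s^3 + s^2 - 45s + 105 & 5s^3 + 9s^2 - 104s + 51
\end{bmatrix}.
\end{aligned}
\]
Here $\det D(s) = s\,(s^5 + s^4 - 5s^3 + 16s^2 + 44s + 115)$, and the factor $s$ cancels against the numerator $N(s)\operatorname{adj}D(s)$.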
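The structure theorem can be checked numerically on this example by evaluating both $C(sI - A)^{-1}B$ and $N(s)D(s)^{-1}$ at a sample point. The sketch below assumes NumPy; the helper names `S`, `Lam`, `D_of`, and `N_of` are illustrative. One caveat on the data: the entry of $A$ in row 3, column 5 is taken to be $1$, which is the value for which $M$, $Q^{-1}$, and $A_c$ of this example are reproduced.

```python
import numpy as np

A = np.array([
    [ 0,  0,  3,  1,  2, -1],
    [-4, -3, -4,  2, -1, -3],
    [ 0,  0, -2, -1,  1,  1],   # (3,5) entry taken as 1, see the note above
    [-5, -2, -8,  1, -3, -4],
    [ 0, -1,  2,  2,  1, -1],
    [ 3,  5,  0, -4, -2,  2]], dtype=float)
B = np.array([[0, 0, -1], [0, 1, -2], [0, 0, 1],
              [1, 0, -5], [0, 0, -1], [1, -2, -1]], dtype=float)
C = np.array([[2, 4, 3, -3, 1, 1], [1, 1, -2, -1, -1, 1]], dtype=float)

mu = [3, 1, 2]                                   # controllability indices
A_q = np.array([[24, -2, 0, 5, -10, -9],
                [-25, 0, 0, -2, 15, 0],
                [-11, 2, 0, -2, 5, 1]], dtype=float)
B_q = np.array([[1, -1, -5], [0, 1, 0], [0, 0, 1]], dtype=float)
C_c = np.array([[7, 4, -2, 0, -5, -4], [-2, -1, 0, -1, 0, 0]], dtype=float)

def S(s):
    """Block-diagonal S(s) with columns [1, s, ..., s^(mu_i - 1)]^T."""
    out = np.zeros((sum(mu), len(mu)))
    row = 0
    for j, m in enumerate(mu):
        out[row:row + m, j] = [s**k for k in range(m)]
        row += m
    return out

def Lam(s):
    """Lambda(s) = diag(s^mu_1, ..., s^mu_m)."""
    return np.diag([s**m for m in mu])

def D_of(s):
    """D(s) = B_q^{-1} [Lambda(s) - A_q S(s)]."""
    return np.linalg.solve(B_q, Lam(s) - A_q @ S(s))

def N_of(s):
    """N(s) = C Q S(s) = C_c S(s)."""
    return C_c @ S(s)

s0 = 2.0                                         # sample point, not a pole
H_state = C @ np.linalg.solve(s0 * np.eye(6) - A, B)
H_struct = N_of(s0) @ np.linalg.inv(D_of(s0))
```

Agreement at a single point does not prove the identity, but it is a quick way to catch transcription errors in matrices of this size.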

11.3 Observer Canonical Form

11.3.1 SISO Case

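Consider the single-input single-output linear time-invariant system (11.1). A dual result to Theorem 11.1 is stated as follows.

Theorem 11.5 Let $p(s) = s^n + \alpha_{n-1}s^{n-1} + \cdots + \alpha_1s + \alpha_0$ be the characteristic polynomial of $A$; let $\mathcal{O}$ be the observability matrix of $(A, c)$. There exists a nonsingular matrix $Q \in \mathbb{R}^{n\times n}$ such that
\[
Q^{-1}AQ = \begin{bmatrix} 0 & \cdots & 0 & -\alpha_0 \\ 1 & \cdots & 0 & -\alpha_1 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & -\alpha_{n-1} \end{bmatrix},\quad
Q^{-1}b = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{n-1} \end{bmatrix},\quad\text{and}\quad
cQ = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix} \tag{11.8}
\]
for some $\beta_0, \dots, \beta_{n-1} \in \mathbb{R}$ if and only if $\operatorname{rank}\mathcal{O} = n$.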

Sketch of Proof. The proof is analogous to that of Theorem 11.1, so it will only be sketched. Suppose $\operatorname{rank}\mathcal{O} = n$. Let $p$ be the last column of $\mathcal{O}^{-1}$. Then $\mathcal{O}\mathcal{O}^{-1} = I$, so $cp = cAp = \cdots = cA^{n-2}p = 0$ and $cA^{n-1}p = 1$. This implies that $\mathcal{O}Q$ is invertible, where $Q = [p\ Ap\ \cdots\ A^{n-1}p]$, and hence that $Q$ is invertible. The Cayley-Hamilton theorem then gives $A^np = -\sum_{j=0}^{n-1}\alpha_jA^jp$, so we obtain that $Q^{-1}AQ$, $Q^{-1}b$, and $cQ$ are of the form in (11.8). Conversely, suppose (11.8) holds. Then $\mathcal{O}Q$ can be shown to have rank $n$. Since $Q$ is invertible, we conclude that $\operatorname{rank}\mathcal{O} = n$. ◻

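Theorem 11.6 If there exists a $Q \in \mathbb{R}^{n\times n}$ such that (11.8) holds, then the transfer function of the system (11.1) is given by
\[
h(s) = \frac{\beta_{n-1}s^{n-1} + \cdots + \beta_1s + \beta_0}{s^n + \alpha_{n-1}s^{n-1} + \cdots + \alpha_1s + \alpha_0} + d.
\]
Proof. The proof is analogous to that of Theorem 11.2; the result follows from the fact that
\[
\begin{bmatrix} 1 & s & \cdots & s^{n-2} & s^{n-1} \end{bmatrix}(sI - Q^{-1}AQ) = \begin{bmatrix} 0 & 0 & \cdots & 0 & p(s) \end{bmatrix}. \quad ◻
\]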
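The dual construction sketched in the proof of Theorem 11.5 can be written out as follows. This is an illustrative sketch assuming NumPy; the function name `observer_form` and the 2-by-2 test system are hypothetical and not from the notes. It takes $p$ as the last column of $\mathcal{O}^{-1}$ and forms $Q = [p\ Ap\ \cdots\ A^{n-1}p]$.

```python
import numpy as np

def observer_form(A, b, c):
    """Observer canonical form of an observable SISO triple (A, b, c)."""
    n = A.shape[0]
    obsv = np.vstack([c @ np.linalg.matrix_power(A, k) for k in range(n)])
    p = np.linalg.inv(obsv)[:, -1:]            # last column of O^{-1}, shape (n, 1)
    Q = np.hstack([np.linalg.matrix_power(A, k) @ p for k in range(n)])
    Q_inv = np.linalg.inv(Q)
    return Q_inv @ A @ Q, Q_inv @ b, c @ Q     # Q^{-1} A Q, Q^{-1} b, c Q

# Hypothetical test system (not from the notes).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[1.0], [0.0]])
c = np.array([[1.0, 0.0]])
A_o, b_o, c_o = observer_form(A, b, c)
```

With characteristic polynomial $s^2 - 5s - 2$, the last column of `A_o` is $[2\ \ 5]^T$, and `b_o` holds $\beta_0 = -4$, $\beta_1 = 1$, in agreement with $c(sI - A)^{-1}b = (s - 4)/(s^2 - 5s - 2)$.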

11.3.2 MIMO Case

Consider the multiple-input multiple-output linear time-invariant system (11.3). If $\mathcal{O}$ is the observability matrix of the pair $(A, C)$ and if $c_1, \dots, c_l \in \mathbb{R}^{1\times n}$ are the $l$ rows of $C$, then we may write
\[
\mathcal{O} = \begin{bmatrix} c_1^T & \cdots & c_l^T & (c_1A)^T & \cdots & (c_lA)^T & \cdots & (c_1A^{n-1})^T & \cdots & (c_lA^{n-1})^T \end{bmatrix}^T. \tag{11.9}
\]
Suppose $\operatorname{rank}\mathcal{O} = n_o$. Starting from the top and moving to the bottom, select the first $n_o$ linearly independent rows of (11.9). Reorder these rows and obtain
\[
N = \begin{bmatrix} c_1^T & (c_1A)^T & \cdots & (c_1A^{\nu_1-1})^T & \cdots & c_l^T & (c_lA)^T & \cdots & (c_lA^{\nu_l-1})^T \end{bmatrix}^T. \tag{11.10}
\]
The integer $\nu_i$ is the number of linearly independent rows in $N$ that are associated with $c_i$; set $\nu_i = 0$ if no row of $N$ involves $c_i$. The integers $\nu_1, \dots, \nu_l$ are called the observability indices of the pair $(A, C)$. Clearly, we have $\sum_{i=1}^l \nu_i = \operatorname{rank}\mathcal{O}$.

If the $l$ rows $c_1, \dots, c_l$ of $C$ are not linearly independent (i.e., if $\operatorname{rank}C = r < l$), then some of the observability indices are zero. In this case, there exist a matrix $C_r \in \mathbb{R}^{r\times n}$ with linearly independent rows and a nonsingular matrix $L \in \mathbb{R}^{l\times l}$ such that
\[
LC = \begin{bmatrix} C_r \\ 0 \end{bmatrix},
\]
which implies $LCx = \begin{bmatrix} C_r \\ 0 \end{bmatrix}x$. That is, only the first $r$ entries of the transformed output $Ly$ depend on the system state. Moreover, $(A, C_r)$ is observable if and only if $(A, C)$ is observable. Therefore, without loss of generality, we assume that the $l$ rows of $C$ are linearly independent.

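Theorem 11.7 Let $\mathcal{O}$ be the observability matrix of the pair $(A, C)$. If $\operatorname{rank}C = l$ and $\operatorname{rank}\mathcal{O} = n$, and if $\nu_1, \dots, \nu_l$ are the observability indices of $(A, C)$, then there exists a nonsingular matrix $Q \in \mathbb{R}^{n\times n}$ such that
\[
Q^{-1}AQ = \begin{bmatrix} A_{11} & \cdots & A_{1l} \\ \vdots & & \vdots \\ A_{l1} & \cdots & A_{ll} \end{bmatrix} \tag{11.11a}
\]
with
\[
A_{ii} = \begin{bmatrix} 0 & \cdots & 0 & * \\ 1 & \cdots & 0 & * \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & * \end{bmatrix} \in \mathbb{R}^{\nu_i\times\nu_i};\quad
A_{ij} = \begin{bmatrix} 0 & \cdots & 0 & * \\ \vdots & \ddots & \vdots & * \\ 0 & \cdots & 0 & * \end{bmatrix} \in \mathbb{R}^{\nu_i\times\nu_j},\ i \neq j; \tag{11.11b}
\]
and
\[
CQ = \begin{bmatrix} C_{11} & \cdots & C_{1l} \\ \vdots & & \vdots \\ C_{l1} & \cdots & C_{ll} \end{bmatrix} \tag{11.11c}
\]
with
\[
C_{ii} = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix} \in \mathbb{R}^{1\times\nu_i}; \tag{11.11d}
\]
\[
C_{ij} = \begin{bmatrix} 0 & \cdots & 0 & 0 \end{bmatrix} \in \mathbb{R}^{1\times\nu_j},\ i < j;\quad
C_{ij} = \begin{bmatrix} 0 & \cdots & 0 & * \end{bmatrix} \in \mathbb{R}^{1\times\nu_j},\ i > j. \tag{11.11e}
\]
(The matrix $Q^{-1}B$ does not have any particular structure.)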

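Proof. The proof is similar to that of Theorem 11.3: Define $\sigma_k = \sum_{i=1}^k \nu_i$, $k = 1, \dots, l$, and let $p_k$ be the $\sigma_k$-th column of $N^{-1}$, where $N$ is as in (11.10); then use $p_1, \dots, p_l$ to form
\[
Q = \begin{bmatrix} p_1 & Ap_1 & \cdots & A^{\nu_1-1}p_1 & \cdots & p_l & Ap_l & \cdots & A^{\nu_l-1}p_l \end{bmatrix}. \quad ◻
\]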

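Suppose that $Q$ is as in Theorem 11.7; suppose $\operatorname{rank}\mathcal{O} = n$ and $\operatorname{rank}C = l$. If $\nu_1, \dots, \nu_l$ are the observability indices of $(A, C)$ with $\sum_{i=1}^l \nu_i = n$, let
\[
A_o = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_l \end{bmatrix} \in \mathbb{R}^{n\times n}
\quad\text{with}\quad
A_i = \begin{bmatrix} 0 & \cdots & 0 & 0 \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix} \in \mathbb{R}^{\nu_i\times\nu_i},\ i = 1, \dots, l; \tag{11.12a}
\]
\[
C_o = \begin{bmatrix} C_1 & & 0 \\ & \ddots & \\ 0 & & C_l \end{bmatrix} \in \mathbb{R}^{l\times n}
\quad\text{with}\quad
C_i = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix} \in \mathbb{R}^{1\times\nu_i},\ i = 1, \dots, l. \tag{11.12b}
\]
Then we have
\[
Q^{-1}AQ = A_o + A_pC_o \quad\text{and}\quad CQ = C_pC_o \tag{11.12c}
\]
for some $A_p \in \mathbb{R}^{n\times l}$ and $C_p \in \mathbb{R}^{l\times l}$. If $p_1, \dots, p_l$ and $Q$ are as in the proof of Theorem 11.7, then it can be shown that
\[
A_p = Q^{-1}\begin{bmatrix} A^{\nu_1}p_1 & \cdots & A^{\nu_l}p_l \end{bmatrix}
\quad\text{and}\quad
C_p = C\begin{bmatrix} A^{\nu_1-1}p_1 & \cdots & A^{\nu_l-1}p_l \end{bmatrix}.
\]
Furthermore, the matrix $C_p$ is in fact a lower triangular matrix with ones on the diagonal, and hence it is nonsingular. The following is the observable version of the structure theorem.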

Theorem 11.8 If $Q$ is as in Theorem 11.7, and if $A_p$ and $C_p$ are as in (11.12), then the transfer function matrix of the system (11.3) is given by $H(s) = D(s)^{-1}N(s) + D$, where
\[
D(s) = \big[\Lambda(s) - S(s)A_p\big]C_p^{-1},\quad N(s) = S(s)Q^{-1}B,
\]
\[
\Lambda(s) = \begin{bmatrix} s^{\nu_1} & & & \\ & s^{\nu_2} & & \\ & & \ddots & \\ & & & s^{\nu_l} \end{bmatrix},\quad
S(s) = \operatorname{diag}\Big\{\begin{bmatrix} 1 & s & \cdots & s^{\nu_1-1} \end{bmatrix}, \dots, \begin{bmatrix} 1 & s & \cdots & s^{\nu_l-1} \end{bmatrix}\Big\}.
\]
Proof. The proof is similar to that of Theorem 11.4, and the result follows from $D(s)CQ = S(s)\big(sI - Q^{-1}AQ\big)$. ◻

In the example presented at the end of Section 11.2.2, a state-space model $(A, B, C, D)$ was given, and its controller-form representation $(A_c, B_c, C_c, D)$ and transfer function matrix $H(s)$ were obtained. Now, suppose $(\bar{A}, \bar{B}, \bar{C}, \bar{D}) = (A^T, C^T, B^T, D^T)$. Then an observer-form representation of $(\bar{A}, \bar{B}, \bar{C}, \bar{D})$ turns out to be $(A_c^T, C_c^T, B_c^T, D^T)$. Moreover, by the structure theorem, the transfer function matrix of $(\bar{A}, \bar{B}, \bar{C}, \bar{D})$ is equal to $H(s)^T$.




11.4 Relation Between Stability, Controllability, and Observability

Controllability (or, more precisely, reachability) and observability are dual notions. Each result concerning observability can be obtained from its controllability counterpart, and vice versa, by invoking the following duality result.

Theorem 11.9 Let $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, and $C \in \mathbb{R}^{l\times n}$. The following hold:

(a) The matrix $A$ is (asymptotically) stable if and only if $A^T$ is (asymptotically) stable;

(b) The pair $(A, B)$ is controllable if and only if $(A^T, B^T)$ is observable;

(c) The pair $(A, C)$ is observable if and only if $(A^T, C^T)$ is controllable.

Proof. Part (a) follows from the fact that $A$ and $A^T$ share a common set of eigenvalues. Let $\mathcal{C}$ and $\mathcal{O}$ be the controllability and observability matrices, respectively, of the triple $(A, B, C)$. Part (b) follows from the fact that $\mathcal{C}$ has full row rank if and only if $\mathcal{C}^T$, which is the observability matrix of $(A^T, B^T)$, has full column rank. Similarly, part (c) holds because $\mathcal{O}$ has full column rank if and only if $\mathcal{O}^T$, which is the controllability matrix of $(A^T, C^T)$, has full row rank. ◻
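The key identity behind part (b), namely that the observability matrix of $(A^T, B^T)$ is the transpose of the controllability matrix of $(A, B)$, is easy to confirm numerically. The sketch below assumes NumPy and reuses the pair $(A, B)$ from the example in Section 10.4; the helper names `ctrb` and `obsv` are illustrative.

```python
import numpy as np

def ctrb(A, B):
    """Controllability matrix [B, AB, ..., A^(n-1) B]."""
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(A.shape[0])])

def obsv(A, C):
    """Observability matrix [C; CA; ...; CA^(n-1)]."""
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(A.shape[0])])

A = np.array([[-1.0, -3.0, -2.0], [1.0, 3.0, 3.0], [1.0, 1.0, 0.0]])
B = np.array([[-1.0, 0.0], [1.0, 1.0], [0.0, -1.0]])

ctrb_AB = ctrb(A, B)
obsv_dual = obsv(A.T, B.T)          # observability matrix of the dual pair

rank_ctrb = np.linalg.matrix_rank(ctrb_AB)
rank_obsv_dual = np.linalg.matrix_rank(obsv_dual)
```

Since the two matrices are transposes of each other, their ranks coincide for every pair $(A, B)$, not only for this one.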

We know that asymptotically stable systems are BIBO stable, but that the converse does not hold true in general. In view of Kalman’s decomposition theorem, it seems that the converse should hold true if the system is both controllable and observable. This is indeed the case due to the following result proved in [1, Theorem 12.8].

Theorem 11.10 Suppose the system (11.3) is controllable and observable. Then the system is bounded-input bounded-output stable if and only if it is asymptotically stable.

The following result facilitates further insights on the relation between stability, controllability, and observability. These structural properties are connected via Lyapunov equations.

Theorem 11.11 Let $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, and $C \in \mathbb{R}^{l\times n}$. For continuous-time LTI systems of the form (11.3) the following hold:

(a) Suppose $(A, B)$ is controllable. Then $A$ is asymptotically stable (i.e., all eigenvalues of $A$ have negative real parts) if and only if there exists $P \in \mathbb{R}^{n\times n}$ such that
\[
P > 0 \quad\text{and}\quad AP + PA^T = -BB^T.
\]

(b) Suppose $(A, C)$ is observable. Then $A$ is asymptotically stable if and only if there exists $Q \in \mathbb{R}^{n\times n}$ such that $Q > 0$ and $A^TQ + QA = -C^TC$.

For discrete-time LTI systems of the form

x(t + 1) = Ax(t) + Bu(t),

y(t) = Cx(t) + Du(t),

the following hold:

(c) Suppose $(A, B)$ is reachable. Then $A$ is asymptotically stable (i.e., all eigenvalues of $A$ have magnitudes less than one) if and only if there exists $P \in \mathbb{R}^{n\times n}$ such that
\[
P > 0 \quad\text{and}\quad APA^T - P = -BB^T.
\]

(d) Suppose $(A, C)$ is observable. Then $A$ is asymptotically stable if and only if there exists $Q \in \mathbb{R}^{n\times n}$ such that $Q > 0$ and $A^TQA - Q = -C^TC$.

Proof. To prove part (a), suppose $AP + PA^T = -BB^T$ for some $P > 0$. Choose any eigenvalue $\lambda$ of $A$, and let $v$ be a corresponding eigenvector of $A^T$, so that $A^Tv = \lambda v$ and $v^*A = \lambda^*v^*$. Then
\[ (\lambda^* + \lambda)\,v^*Pv = v^*(AP + PA^T)v = -v^*BB^Tv. \]
Since $P > 0$ (and $v \neq 0$), we have $v^*Pv > 0$. Also, since $(A, B)$ is controllable, we have $B^Tv \neq 0$ by the PBH eigenvector test for controllability, so $v^*BB^Tv > 0$. Thus
\[ \operatorname{Re}(\lambda) = \frac{\lambda^* + \lambda}{2} = -\frac{v^*BB^Tv}{2\,v^*Pv} < 0. \]
Conversely, suppose every eigenvalue of $A$ has negative real part. Let $P(t)$ be the observability Gramian of $(A^T, B^T)$ on $[0, t]$; that is,
\[ P(t) = \int_0^t e^{A\tau}BB^Te^{A^T\tau}\,d\tau. \]
Since $(A, B)$ is controllable, $(A^T, B^T)$ is observable, so $P(t) > 0$ for all $t > 0$ and $P(t) \geq P(s)$ whenever $t \geq s > 0$. On the other hand, since $A$ is exponentially stable in continuous time, there are $c, \lambda > 0$ such that $\|e^{A\tau}\| \leq ce^{-\lambda\tau}$, so we have $\|P(t)\| \leq c^2\|BB^T\|/(2\lambda) < \infty$. Therefore $P = \int_0^\infty e^{A\tau}BB^Te^{A^T\tau}\,d\tau$ is well-defined and $P > 0$. Moreover,
\[ AP + PA^T = \int_0^\infty \left( \frac{d}{d\tau}e^{A\tau}\,BB^Te^{A^T\tau} + e^{A\tau}BB^T\,\frac{d}{d\tau}e^{A^T\tau} \right) d\tau = \int_0^\infty \frac{d}{d\tau}\left( e^{A\tau}BB^Te^{A^T\tau} \right) d\tau = -BB^T. \]
This establishes part (a). Part (b) is a dual result, which follows from part (a) applied to $(A^T, C^T)$.

To prove part (c), suppose $APA^T - P = -BB^T$ for some $P > 0$. Choose any eigenvalue $\lambda$ of $A$, and let $v$ be a corresponding eigenvector of $A^T$. Then proceeding as in part (a) we obtain
\[ \lambda^*\lambda - 1 = -\frac{v^*BB^Tv}{v^*Pv} < 0, \]
which implies $|\lambda| < 1$. Conversely, suppose the magnitude of every eigenvalue of $A$ is less than one. Let
\[ P(K) = \sum_{\tau=0}^{K-1} A^\tau BB^T(A^T)^\tau; \]
that is, let $P(K)$ be the discrete-time observability Gramian of $(A^T, B^T)$ on $\{0, 1, \dots, K\}$. Since $(A, B)$ is reachable, we have that $(A^T, B^T)$ is observable, that $P(K) > 0$ for all $K \geq n$, and that $P(K+1) \geq P(K)$ for all positive integers $K$. On the other hand, since $A$ is exponentially stable in discrete time, there are $c > 0$ and $\lambda \in (0, 1)$ such that $\|A^\tau\| \leq c\lambda^\tau$, and so $\|P(K)\| \leq c^2\|BB^T\|/(1 - \lambda^2) < \infty$. Therefore $P = \sum_{\tau=0}^{\infty} A^\tau BB^T(A^T)^\tau$ is well-defined and $P > 0$. Moreover, $APA^T - P = (P - BB^T) - P = -BB^T$. This establishes part (c). Part (d) is dual to part (c), and follows from part (c) applied to $(A^T, C^T)$. ◻
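The series construction used in part (c) can be checked numerically. The following is a minimal sketch, assuming a hypothetical reachable pair (A, B) that is not taken from the notes; it accumulates the partial sums P(K) and confirms that the limit approximately satisfies the discrete-time Lyapunov equation and is positive definite.

```python
# Hypothetical example (not from the notes): A has eigenvalues 0.5 and -0.3,
# and (A, B) is reachable because [B, AB] is nonsingular.
A = [[0.5, 1.0], [0.0, -0.3]]
B = [[0.0], [1.0]]

def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def lin(X, Y, c=1.0):
    """Return X + c*Y entrywise."""
    return [[x + c * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

BBt = matmul(B, transpose(B))
At = transpose(A)

# Partial sum P(K) = sum_{tau=0}^{K-1} A^tau B B^T (A^T)^tau with K = 200.
P = [[0.0, 0.0], [0.0, 0.0]]
term = BBt
for _ in range(200):
    P = lin(P, term)
    term = matmul(matmul(A, term), At)

# Residual of the Lyapunov equation: A P A^T - P + B B^T should be ~0.
residual = lin(lin(matmul(matmul(A, P), At), P, -1.0), BBt)
max_residual = max(abs(v) for row in residual for v in row)

# Positive definiteness of a symmetric 2x2 matrix via leading principal minors.
is_pos_def = P[0][0] > 0 and P[0][0] * P[1][1] - P[0][1] * P[1][0] > 0
print(max_residual, is_pos_def)
```

Truncating the series at 200 terms is an arbitrary choice; the neglected tail decays geometrically because the eigenvalues of A lie inside the unit circle.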

References







XII. Eigenvalue Assignment and Stabilization




12.1 Introduction

Recall that a control system is a feedback interconnection of two dynamical systems, called the plant and the controller, as shown in Fig. 12.1. By closing the feedback loop, one aims to make the controlled output z exhibit desired properties with respect to the reference input r.

The single most important goal of feedback, or closed-loop, control is to cope with uncertainties. One of the very first control problems we have encountered is the controllability problem, where the control objective is to steer the initial state to a desired state. A particular solution u to this problem has been given as a function of time that is determined solely by the initial state as well as known system coefficients. However, this kind of control scheme, so-called open-loop control, is not satisfactory in real-world situations that involve uncertainties in the initial state, system coefficients, etc. What is required to make the control system perform satisfactorily in practice is to update the control input u(t) based on the information (about the plant) available (to the controller) at each time t using a feedback control law γ of the form u(t) = γ(y(s), t0 ≤ s ≤ t).

Restricting our attention to linear time-invariant feedback control laws, we describe γ by

\[ \dot{x}_K(t) = A_Kx_K(t) + B_Ky(t) + Fr(t), \]
\[ u(t) = C_Kx_K(t) + D_Ky(t) + Gr(t). \]

Controllers of this form are called dynamic output feedback controllers. A dynamic output feedback controller has to address two problems simultaneously: One is the problem of transferring the state of the plant to a desired value (i.e., the control problem), and the other is the problem of estimating the state of the plant based on the available information (i.e., the estimation problem). For many control objectives (e.g., eigenvalue assignment and stabilization), however, these two problems are separated. That is, to design a dynamic output feedback controller, one may design a state feedback controller and a state observer separately and combine them in a straightforward manner.

Figure 12.1: A feedback control system (block diagram with pre-filter, plant, and controller; signals r, u, y, and z)




12.1.1 State Feedback Controller

The problem of state feedback control is to find a linear static control law of the form

u(t) = Kx(t)

for the plant $\dot{x}(t) = Ax(t) + Bu(t)$ with $y(t) = x(t)$. Closing the feedback loop then gives
\[ \dot{x}(t) = (A + BK)x(t). \]

For example, the state feedback stabilization problem is to find a linear state feedback gain matrix $K$ such that the closed-loop plant state matrix $A + BK$ is asymptotically stable.

12.1.2 State Observer

A state estimation problem is to obtain an estimate $\hat{x}$ of the state $x$ of the plant using a dynamical system, called the Luenberger observer, of the form
\[ \dot{\hat{x}}(t) = A\hat{x}(t) + Bu(t) + L\big(C\hat{x}(t) + Du(t) - y(t)\big). \]

If we let the estimation error at time $t$ be $e(t) = x(t) - \hat{x}(t)$, and if the plant is described by $\dot{x}(t) = Ax(t) + Bu(t)$ and $y(t) = Cx(t) + Du(t)$, then the problem of designing a state observer reduces to that of obtaining the matrix $L$ such that
\[ \dot{e}(t) = (A + LC)e(t) \]
has the desired behavior. For example, an asymptotic state estimation problem is to find a linear output injection gain matrix $L$ such that the observer state matrix $A + LC$ is asymptotically stable.

12.1.3 Dynamic Output Feedback Controller

As we will see shortly, the separation principle for eigenvalue assignment states that one can assign the eigenvalues of A + BK and A + LC separately to place the closed-loop system poles at desired locations in the complex plane C. Similarly, the separation principle for stabilization states that, to asymptotically stabilize the closed-loop system, one only needs to obtain a stabilizing state feedback gain and a stabilizing output injection gain separately. Moreover, if K and L are the state feedback and output injection gains obtained separately, then the coefficients AK, BK, CK, and DK of a dynamic output feedback controller can be given in terms of K and L. Lastly, F and G are chosen to satisfy additional performance requirements.

12.2 Eigenvalue Assignment

Consider the linear time-invariant system

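\[ \dot{x}(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t) + Du(t), \tag{12.1} \]
where $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{l\times n}$, and $D \in \mathbb{R}^{l\times m}$. We first study the problem of state-feedback eigenvalue assignment.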

Lemma 12.1 If (A, B) is controllable and if rank B = m, then by state feedback u(t) = Kx(t) with K ∈ Rm×n, the eigenvalues of A + BK can be arbitrarily assigned provided that complex eigenvalues appear in conjugate pairs.




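Proof. Let $Q \in \mathbb{R}^{n\times n}$ be such that $(A_c, B_c) = (Q^{-1}AQ, Q^{-1}B)$ is in controller form; that is,
\[ A_c = \begin{bmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mm} \end{bmatrix} \quad\text{and}\quad B_c = \begin{bmatrix} B_{11} & \cdots & B_{1m} \\ \vdots & & \vdots \\ B_{m1} & \cdots & B_{mm} \end{bmatrix} \]
with
\[ A_{ii} = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ * & * & \cdots & * \end{bmatrix} \in \mathbb{R}^{\mu_i\times\mu_i}; \qquad A_{ij} = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \\ * & \cdots & * \end{bmatrix} \in \mathbb{R}^{\mu_i\times\mu_j},\ i \neq j; \]
\[ B_{ii} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \in \mathbb{R}^{\mu_i\times 1}; \qquad B_{ij} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ * \end{bmatrix} \in \mathbb{R}^{\mu_i\times 1},\ i < j; \qquad B_{ij} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix} \in \mathbb{R}^{\mu_i\times 1},\ i > j, \]
where $\mu_1, \dots, \mu_m$ are the controllability indices of $(A, B)$. Let $\lambda_1, \dots, \lambda_n$ be desired eigenvalues with complex conjugate eigenvalues appearing in pairs, so that the desired characteristic polynomial is
\[ p(s) = \prod_{i=1}^{n}(s - \lambda_i) = s^n + \beta_{n-1}s^{n-1} + \cdots + \beta_1s + \beta_0 \]
for some $\beta_0, \dots, \beta_{n-1} \in \mathbb{R}$. Since rank $B = m$, we have $\mu_i \geq 1$ for all $i = 1, \dots, m$, so $B_c$ has exactly $m$ nonzero rows. These rows form an $m$-by-$m$ nonsingular matrix $B_q$, which is upper triangular with ones on the diagonal. Because the corresponding rows of $A_c$ denoted by strings of $*$ form an $m$-by-$n$ matrix $A_q$, the corresponding rows of $A_c + B_cK_c$, forming an $m$-by-$n$ matrix $A_d$, can be arbitrarily assigned by a suitable choice of $K_c \in \mathbb{R}^{m\times n}$. That is, for any $A_d \in \mathbb{R}^{m\times n}$, the unique solution to $A_q + B_qK_c = A_d$ is given by $K_c = B_q^{-1}(A_d - A_q)$. In particular, we may choose $K_c$ such that
\[ A_c + B_cK_c = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ -\beta_0 & -\beta_1 & \cdots & -\beta_{n-1} \end{bmatrix}. \]
The characteristic polynomial of $A_c + B_cK_c$ equals $p(s)$. Then, it follows from $A_c + B_cK_c = Q^{-1}\big(A + BK_cQ^{-1}\big)Q$ that, with $K = K_cQ^{-1}$, the matrix $A + BK$ has the same characteristic polynomial and hence the desired eigenvalues.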

Theorem 12.2 By state feedback u(t) = Kx(t) with K ∈ Rm×n, the eigenvalues of A + BK can be arbitrarily assigned if and only if (A, B) is controllable, provided that complex eigenvalues appear in conjugate pairs.

Proof. To show sufficiency, suppose $(A, B)$ is controllable. If rank $B = m$, then arbitrary eigenvalue assignment is possible by Lemma 12.1. If rank $B = r < m$, on the other hand, then there exist a full-rank matrix $B_r \in \mathbb{R}^{n\times r}$ and a nonsingular matrix $F \in \mathbb{R}^{m\times m}$ such that
\[ BF = \begin{bmatrix} B_r & 0 \end{bmatrix}. \tag{12.2} \]
The pair $(A, B_r)$ is controllable because the columns of the controllability matrix of $(A, B)$ are linear combinations of the columns of the controllability matrix of $(A, B_r)$. Thus, by Lemma 12.1, there exists a $K_r \in \mathbb{R}^{r\times n}$ such that $A + B_rK_r$ has desired eigenvalues. However, if we let
\[ K = F\begin{bmatrix} K_r \\ 0 \end{bmatrix}, \]
then we have
\[ A + BK = A + BF\begin{bmatrix} K_r \\ 0 \end{bmatrix} = A + \begin{bmatrix} B_r & 0 \end{bmatrix}\begin{bmatrix} K_r \\ 0 \end{bmatrix} = A + B_rK_r, \]
so $A + BK$ has the desired eigenvalues. This proves the sufficiency part of the desired result.

To show necessity, suppose $(A, B)$ is not controllable. If $n_r$ denotes the rank of the controllability matrix of $(A, B)$, then we have $n_r < n$. Thus there exists a nonsingular $Q \in \mathbb{R}^{n\times n}$ such that, no matter what state feedback gain matrix $K \in \mathbb{R}^{m\times n}$ is used, we have
\[ Q^{-1}(A + BK)Q = Q^{-1}AQ + Q^{-1}B(KQ) = \begin{bmatrix} A_1 & A_{12} \\ 0 & A_2 \end{bmatrix} + \begin{bmatrix} B_1 \\ 0 \end{bmatrix}\begin{bmatrix} K_1 & K_2 \end{bmatrix} = \begin{bmatrix} A_1 + B_1K_1 & A_{12} + B_1K_2 \\ 0 & A_2 \end{bmatrix}, \]
where $A_1 \in \mathbb{R}^{n_r\times n_r}$ and $B_1 \in \mathbb{R}^{n_r\times m}$ are such that $(A_1, B_1)$ is controllable, and where $K_1 \in \mathbb{R}^{m\times n_r}$ and $K_2 \in \mathbb{R}^{m\times(n-n_r)}$ are such that $\begin{bmatrix} K_1 & K_2 \end{bmatrix} = KQ$. The eigenvalues of $A + BK$ are the same as the eigenvalues of $Q^{-1}(A + BK)Q$, which in turn are the eigenvalues of $A_1 + B_1K_1$ and those of $A_2$ combined. Since the eigenvalues of $A_2$ are unaffected by state feedback, arbitrary eigenvalue assignment is not possible. This shows necessity, and completes the proof.

According to Theorem 12.2, an exact condition for arbitrary eigenvalue assignment with state feedback is the controllability of the pair (A, B). In general, the proof of the theorem suggests that all controllable eigenvalues are arbitrarily assignable via state feedback and none of the uncontrollable eigenvalues is affected by state feedback. The problem of output-injection eigenvalue assignment can be addressed by invoking duality.

Theorem 12.3 By output injection with L ∈ Rn×l, the eigenvalues of A + LC can be arbitrarily assigned if and only if (A, C) is observable, provided that complex eigenvalues appear in conjugate pairs.

Proof. The result is immediate from Theorem 12.2, the duality result, and the fact that $A + LC$ and $(A + LC)^T = A^T + C^TL^T$ share the same set of eigenvalues.

12.3 Example of Eigenvalue Assignment

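Let
\[ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 2 & -1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]
Since rank $B = 2 < 3$, there are a full-rank matrix $B_r$ and a nonsingular matrix $F$ such that (12.2) holds. In particular, because the three columns $b_1, b_2, b_3$ of $B$ are such that $\{b_1, b_2\}$ is linearly independent and such that $b_1 - b_2 + b_3 = 0$, we may take
\[ B_r = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad F = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}. \]
The controllability matrix of the pair $(A, B_r)$ is
\[ \mathcal{C}_r = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 2 & 2 \\ 0 & 0 & 2 & 2 & -2 & -2 \end{bmatrix} \]
with rank $\mathcal{C}_r = 3$, so $(A, B_r)$ as well as $(A, B)$ is controllable. The first three columns of $\mathcal{C}_r$ are linearly independent, so
\[ \mu_1 = 2, \quad \mu_2 = 1, \quad M = \begin{bmatrix} b_1 & Ab_1 & b_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 2 & 0 \end{bmatrix}, \quad\text{and}\quad M^{-1} = \begin{bmatrix} -1 & 1 & 1/2 \\ 0 & 0 & 1/2 \\ 1 & 0 & -1/2 \end{bmatrix}. \]
From $\sigma_1 = \mu_1 = 2$ and $\sigma_2 = \mu_1 + \mu_2 = 3$, it follows that $q_1$ and $q_2$ are the second and third rows of $M^{-1}$, respectively:
\[ q_1 = \begin{bmatrix} 0 & 0 & 1/2 \end{bmatrix} \quad\text{and}\quad q_2 = \begin{bmatrix} 1 & 0 & -1/2 \end{bmatrix}. \]
Then the similarity transformation $Q$ determined by
\[ Q^{-1} = \begin{bmatrix} q_1 \\ q_1A \\ q_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1/2 \\ 0 & 1 & -1/2 \\ 1 & 0 & -1/2 \end{bmatrix} \quad\Rightarrow\quad Q = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 2 & 0 & 0 \end{bmatrix} \]
gives us the controller canonical form $(A_c, B_c)$ of $(A, B_r)$, where
\[ A_c = Q^{-1}AQ = \begin{bmatrix} 0 & 1 & 0 \\ 2 & -1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad\text{and}\quad B_c = Q^{-1}B_r = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}. \]
Suppose that we want to assign the closed-loop eigenvalues to the locations $-2$, $-1 + i$, and $-1 - i$; that is, suppose that the desired eigenvalues are the roots of the polynomial
\[ s^3 + \beta_2s^2 + \beta_1s + \beta_0 = (s + 2)(s^2 + 2s + 2) = s^3 + 4s^2 + 6s + 4. \]
Taking the second and third rows of $A_c$ and $B_c$ yields the following desired equality:
\[ \begin{bmatrix} 2 & -1 & 0 \\ 1 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}K_c = \begin{bmatrix} 0 & 0 & 1 \\ -\beta_0 & -\beta_1 & -\beta_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ -4 & -6 & -4 \end{bmatrix}. \]
This linear equation has a unique solution given by
\[ K_c = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^{-1}\left( \begin{bmatrix} 0 & 0 & 1 \\ -4 & -6 & -4 \end{bmatrix} - \begin{bmatrix} 2 & -1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \right) = \begin{bmatrix} 3 & 7 & 5 \\ -5 & -6 & -4 \end{bmatrix}. \]
(By the way, this does not mean that $K_c$ is the only state feedback gain that achieves the desired eigenvalue assignment; see [1, Example 9.11].) Because $A_c + B_cK_c$ has the desired eigenvalues, $A + B_rK_r$ has the desired eigenvalues as well, where
\[ K_r = K_cQ^{-1} = \begin{bmatrix} 5 & 7 & -9/2 \\ -4 & -6 & 5/2 \end{bmatrix}. \]
Finally, the state feedback gain is
\[ K = F\begin{bmatrix} K_r \\ 0 \end{bmatrix} = \begin{bmatrix} 5 & 7 & -9/2 \\ -4 & -6 & 5/2 \\ 0 & 0 & 0 \end{bmatrix} \quad\Rightarrow\quad A + BK = \begin{bmatrix} -4 & -5 & 5/2 \\ 1 & 1 & -1 \\ 0 & 2 & -1 \end{bmatrix}, \]
where the eigenvalues of $A + BK$ are indeed $-2$, $-1 + i$, and $-1 - i$.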
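The final claim of the example can be checked with exact arithmetic. The sketch below uses the matrices A, B, and K from the example; the helper names `matmul` and `charpoly3` are illustrative and not part of the notes. It compares the characteristic polynomial of A + BK with the desired polynomial s³ + 4s² + 6s + 4.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def charpoly3(M):
    """Coefficients [1, c2, c1, c0] of det(sI - M) for a 3x3 matrix M."""
    trace = M[0][0] + M[1][1] + M[2][2]
    minors = sum(M[i][i] * M[j][j] - M[i][j] * M[j][i]
                 for i, j in ((0, 1), (0, 2), (1, 2)))
    det = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
           - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
           + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    return [1, -trace, minors, -det]

A = [[0, 1, 0], [0, 0, 1], [0, 2, -1]]
B = [[0, 1, 1], [1, 1, 0], [0, 0, 0]]
K = [[5, 7, Fr(-9, 2)], [-4, -6, Fr(5, 2)], [0, 0, 0]]

BK = matmul(B, K)
A_cl = [[A[i][j] + BK[i][j] for j in range(3)] for i in range(3)]
coeffs = charpoly3(A_cl)
print(A_cl, coeffs)
```

Matching the characteristic polynomial is equivalent to matching the eigenvalues with multiplicity, so no numerical eigenvalue solver is needed here.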




12.4 Stabilization

12.4.1 Stabilizability and Detectability

Consider the linear time-invariant system (12.1) with A ∈ Rn×n, B ∈ Rn×m, C ∈ Rl×n, and D ∈ Rl×m.

Definition 12.4 The pair (A, B) is said to be stabilizable if there exists a state feedback gain matrix K ∈ Rm×n such that A + BK is asymptotically stable.

Theorem 12.5 The pair (A, B) is stabilizable if and only if all the uncontrollable eigenvalues of A have negative real parts.

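Proof. Suppose that all the uncontrollable eigenvalues of $A$ have negative real parts. Let $Q$ be a nonsingular matrix such that $(\bar{A}, \bar{B})$ is in the standard form for uncontrollable systems:
\[ \bar{A} = Q^{-1}AQ = \begin{bmatrix} A_1 & A_{12} \\ 0 & A_2 \end{bmatrix} \quad\text{and}\quad \bar{B} = Q^{-1}B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}, \tag{12.3} \]
where $(A_1, B_1)$, with $A_1 \in \mathbb{R}^{n_r\times n_r}$ and $B_1 \in \mathbb{R}^{n_r\times m}$, is controllable but all the eigenvalues of $A_2 \in \mathbb{R}^{(n-n_r)\times(n-n_r)}$ are uncontrollable and hence have negative real parts. By Theorem 12.2, there exists a $K_1 \in \mathbb{R}^{m\times n_r}$ such that all the eigenvalues of $A_1 + B_1K_1$ have negative real parts. Let
\[ \bar{K} = \begin{bmatrix} K_1 & 0 \end{bmatrix} \in \mathbb{R}^{m\times n}. \]
Then all the eigenvalues of the matrix
\[ \bar{A} + \bar{B}\bar{K} = \begin{bmatrix} A_1 & A_{12} \\ 0 & A_2 \end{bmatrix} + \begin{bmatrix} B_1 \\ 0 \end{bmatrix}\begin{bmatrix} K_1 & 0 \end{bmatrix} = \begin{bmatrix} A_1 + B_1K_1 & A_{12} \\ 0 & A_2 \end{bmatrix}, \]
which is of block triangular form, have negative real parts. The matrices $\bar{A} + \bar{B}\bar{K}$ and $Q(\bar{A} + \bar{B}\bar{K})Q^{-1}$ share the same set of eigenvalues, and $Q(\bar{A} + \bar{B}\bar{K})Q^{-1} = A + B\bar{K}Q^{-1}$. Thus
\[ K = \bar{K}Q^{-1} = \begin{bmatrix} K_1 & 0 \end{bmatrix}Q^{-1} \]
is a stabilizing state feedback gain matrix such that $A + BK$ is asymptotically stable.

Conversely, suppose that $(A, B)$ is stabilizable, so that $A + BK$ is asymptotically stable for some $K \in \mathbb{R}^{m\times n}$. Let $Q$ represent any similarity transformation such that $(\bar{A}, \bar{B})$ is in the form (12.3), where the pair $(A_1, B_1)$, with $A_1 \in \mathbb{R}^{n_r\times n_r}$ and $B_1 \in \mathbb{R}^{n_r\times m}$, is controllable and all the eigenvalues of $A_2 \in \mathbb{R}^{(n-n_r)\times(n-n_r)}$ are uncontrollable. Let $\begin{bmatrix} K_1 & K_2 \end{bmatrix}$, with $K_1 \in \mathbb{R}^{m\times n_r}$ and $K_2 \in \mathbb{R}^{m\times(n-n_r)}$, be a partition of $KQ$. Then, since the set of eigenvalues of
\[ Q^{-1}(A + BK)Q = \bar{A} + \bar{B}KQ = \begin{bmatrix} A_1 + B_1K_1 & A_{12} + B_1K_2 \\ 0 & A_2 \end{bmatrix} \]
is equal to that of $A + BK$, the eigenvalues of $A_2$ have negative real parts. That is, all uncontrollable eigenvalues have negative real parts. This completes the proof.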

The above theorem says that there exists a stabilizing state feedback gain for a linear time-invariant system if and only if uncontrollable eigenvalues of the system have negative real parts. An immediate consequence is that controllable linear time-invariant systems are stabilizable.

Corollary 12.6 The pair (A, B) is stabilizable if it is controllable.




Due to duality, all these results on stabilizing state feedback can be restated in terms of stabilizing output injection.

Definition 12.7 The pair (A, C) is said to be detectable if there exists an output injection gain matrix L ∈ Rn×l such that A + LC is asymptotically stable.

Theorem 12.8 The pair (A, C) is detectable if and only if all the unobservable eigenvalues of A have negative real parts.

Corollary 12.9 The pair (A, C) is detectable if it is observable.

12.4.2 Separation Principle

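For simplicity, suppose a linear time-invariant plant (12.1) has $D = 0$. Also, since the reference input is irrelevant to stabilization, assume $r$ is identically zero. We will consider dynamic output feedback controllers of the form
\[ \dot{x}_K(t) = A_Kx_K(t) + B_Ky(t), \qquad u(t) = C_Kx_K(t) + D_Ky(t), \tag{12.4} \]
where $A_K \in \mathbb{R}^{n_K\times n_K}$, $B_K \in \mathbb{R}^{n_K\times l}$, $C_K \in \mathbb{R}^{m\times n_K}$, and $D_K \in \mathbb{R}^{m\times l}$. Closing the feedback loop yields the homogeneous closed-loop system
\[ \begin{bmatrix} \dot{x}(t) \\ \dot{x}_K(t) \end{bmatrix} = \begin{bmatrix} A + BD_KC & BC_K \\ B_KC & A_K \end{bmatrix}\begin{bmatrix} x(t) \\ x_K(t) \end{bmatrix}. \tag{12.5} \]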

Theorem 12.10 There exists a dynamic output feedback controller (12.4) such that the closed-loop system (12.5) is asymptotically stable if and only if the triple $(A, B, C)$ is stabilizable and detectable. Moreover, if $K \in \mathbb{R}^{m\times n}$ and $L \in \mathbb{R}^{n\times l}$ are such that $A + BK$ and $A + LC$ are asymptotically stable, then a stabilizing dynamic output feedback controller can be taken to be one with $n_K = n$ and
\[ A_K = A + LC + BK, \quad B_K = -L, \quad C_K = K, \quad\text{and}\quad D_K = 0. \tag{12.6} \]

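Proof. Suppose that the closed-loop system (12.5) is asymptotically stable. Let
\[ \tilde{A} = \begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{(n+n_K)\times(n+n_K)}, \qquad \tilde{B} = \begin{bmatrix} 0 & B \\ I & 0 \end{bmatrix} \in \mathbb{R}^{(n+n_K)\times(n_K+m)}, \]
\[ \tilde{C} = \begin{bmatrix} 0 & I \\ C & 0 \end{bmatrix} \in \mathbb{R}^{(n_K+l)\times(n+n_K)}, \quad\text{and}\quad \tilde{K} = \begin{bmatrix} A_K & B_K \\ C_K & D_K \end{bmatrix} \in \mathbb{R}^{(n_K+m)\times(n_K+l)}. \]
Then (12.5) can be written as
\[ \dot{\tilde{x}}(t) = \big(\tilde{A} + \tilde{B}\tilde{K}\tilde{C}\big)\tilde{x}(t), \]
where $\tilde{x}(t) = [x(t)^T\ x_K(t)^T]^T \in \mathbb{R}^{n+n_K}$ is the closed-loop state. Thus $\tilde{A} + \tilde{B}\tilde{K}\tilde{C}$ being asymptotically stable implies that $(\tilde{A}, \tilde{B}, \tilde{C})$ is both stabilizable and detectable. We have
\[ \operatorname{rank}\begin{bmatrix} \lambda I - \tilde{A} & \tilde{B} \end{bmatrix} = \operatorname{rank}\begin{bmatrix} \lambda I - A & 0 & 0 & B \\ 0 & \lambda I & I & 0 \end{bmatrix} = \operatorname{rank}\begin{bmatrix} \lambda I - A & B \end{bmatrix} + n_K; \]
\[ \operatorname{rank}\begin{bmatrix} \lambda I - \tilde{A} \\ \tilde{C} \end{bmatrix} = \operatorname{rank}\begin{bmatrix} \lambda I - A & 0 \\ 0 & \lambda I \\ 0 & I \\ C & 0 \end{bmatrix} = \operatorname{rank}\begin{bmatrix} \lambda I - A \\ C \end{bmatrix} + n_K \]
for all $\lambda \in \mathbb{C}$. Moreover, every eigenvalue of $A$ is an eigenvalue of $\tilde{A}$. Thus, by the PBH rank test, we conclude that $(A, B, C)$ is both stabilizable and detectable as well. This proves the necessity part of the desired result.

Conversely, suppose that $(A, B, C)$ is both stabilizable and detectable, so that there exist $K$ and $L$ such that $A + BK$ and $A + LC$ are asymptotically stable. If the dynamic output feedback controller (12.4) has $n_K = n$ and satisfies (12.6), then we have
\[ \begin{bmatrix} I & 0 \\ I & -I \end{bmatrix}^{-1}\begin{bmatrix} A + BD_KC & BC_K \\ B_KC & A_K \end{bmatrix}\begin{bmatrix} I & 0 \\ I & -I \end{bmatrix} = \begin{bmatrix} A + BK & -BK \\ 0 & A + LC \end{bmatrix}, \]
and hence the closed-loop system (12.5) is asymptotically stable. This shows the sufficiency part holds true as well.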
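The block-triangularizing similarity transformation in the sufficiency argument can be verified on concrete data. The sketch below borrows A, B, C, K, and L from the worked examples in Sections 12.3 and 12.6 of these notes; the helper names (`matmul`, `lin`, `block`) are illustrative. It uses the fact that the transformation matrix T = [I 0; I −I] is its own inverse.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lin(X, Y, c=1):
    """Return X + c*Y entrywise."""
    return [[x + c * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block(M11, M12, M21, M22):
    """Assemble the 2x2 block matrix [[M11, M12], [M21, M22]]."""
    return ([r1 + r2 for r1, r2 in zip(M11, M12)]
            + [r1 + r2 for r1, r2 in zip(M21, M22)])

n = 3
I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
Z = [[0] * n for _ in range(n)]

A = [[0, 1, 0], [0, 0, 1], [0, 2, -1]]
B = [[0, 1, 1], [1, 1, 0], [0, 0, 0]]
C = [[2, 1, 0]]
K = [[5, 7, Fr(-9, 2)], [-4, -6, Fr(5, 2)], [0, 0, 0]]
L = [[Fr(1, 2)], [-9], [0]]

BK, LC = matmul(B, K), matmul(L, C)
A_K = lin(lin(A, LC), BK)                    # A_K = A + LC + BK, as in (12.6)
closed = block(A, BK, lin(Z, LC, -1), A_K)   # (12.5) with B_K = -L, C_K = K, D_K = 0

T = block(I, Z, I, lin(Z, I, -1))            # T = [I 0; I -I], and T*T = identity
transformed = matmul(matmul(T, closed), T)
expected = block(lin(A, BK), lin(Z, BK, -1), Z, lin(A, LC))
print(transformed == expected)
```

Because the transformed matrix is block upper triangular, its eigenvalues are those of A + BK together with those of A + LC, which is the content of the separation principle.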

Corollary 12.11 There exists a dynamic output feedback controller (12.4) such that the closed-loop system (12.5) is asymptotically stable if the triple (A, B, C) is controllable and observable.

12.5 Observer-Based Controllers

12.5.1 Full-Order Observers

It is readily seen that the controller structure in (12.6) corresponds to an observer-based controller of the form
\[ \dot{\hat{x}}(t) = A\hat{x}(t) + Bu(t) + L\big(C\hat{x}(t) + Du(t) - y(t)\big), \tag{12.7a} \]
\[ u(t) = K\hat{x}(t) + Gr(t) \tag{12.7b} \]
applied to the plant (12.1) (with possibly nonzero $D$). The separation principle in Theorem 12.10 says that, as far as dynamic output feedback stabilization is concerned, using an observer-based controller suffices. The order of the observer (12.7a) (i.e., the number of the observer state variables in $\hat{x}$) is equal to $n$, which is the same as the order of the plant. For this reason, this observer is called a full-order observer. The state estimation error $e = x - \hat{x}$ satisfies
\[ \dot{e}(t) = (A + LC)e(t), \]
so there exists an asymptotically stable state observer of the form (12.7a) for the plant (12.1) if and only if the pair $(A, C)$ is detectable. With the closed-loop state defined by $[x^T\ e^T]^T$, the state-space description of the closed-loop system reads
\[ \begin{bmatrix} \dot{x}(t) \\ \dot{e}(t) \end{bmatrix} = \begin{bmatrix} A + BK & -BK \\ 0 & A + LC \end{bmatrix}\begin{bmatrix} x(t) \\ e(t) \end{bmatrix} + \begin{bmatrix} BG \\ 0 \end{bmatrix}r(t), \]
\[ y(t) = \begin{bmatrix} C + DK & -DK \end{bmatrix}\begin{bmatrix} x(t) \\ e(t) \end{bmatrix} + DGr(t), \]
so the closed-loop transfer function matrix from $r$ to $y$ is given by
\[ H(s) = \begin{bmatrix} C + DK & -DK \end{bmatrix}\begin{bmatrix} (sI - A - BK)^{-1} & * \\ 0 & (sI - A - LC)^{-1} \end{bmatrix}\begin{bmatrix} BG \\ 0 \end{bmatrix} + DG = (C + DK)(sI - A - BK)^{-1}BG + DG. \]

The observer eigenvalues are uncontrollable, so they do not manifest themselves as system poles. This suggests how one can address certain model matching problems via observer-based control: If $(A, B, C)$ is controllable and observable, then choose $K$ to place the system poles at desired locations and choose $L$ such that the observer eigenvalues are asymptotically stable and much faster than the system poles so as not to affect the transient behavior of the overall system much.




12.5.2 Reduced-Order Observers

Suppose that the output matrix $C$ has full row rank (i.e., rank $C = l$). Let $Q \in \mathbb{R}^{n\times n}$ be a nonsingular matrix whose inverse is of the form
\[ Q^{-1} = \begin{bmatrix} C \\ R \end{bmatrix}. \tag{12.8} \]
Then we have
\[ CQ = \begin{bmatrix} I_l & 0 \end{bmatrix}, \]
where $I_l$ is the $l$-by-$l$ identity matrix. Then the similarity transformation $Q\bar{x}(t) = x(t)$ gives
\[ \dot{\bar{x}}(t) = Q^{-1}AQ\bar{x}(t) + Q^{-1}Bu(t) = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} \bar{x}_1(t) \\ \bar{x}_2(t) \end{bmatrix} + \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}u(t), \]
\[ y(t) = CQ\bar{x}(t) + Du(t) = \begin{bmatrix} I & 0 \end{bmatrix}\begin{bmatrix} \bar{x}_1(t) \\ \bar{x}_2(t) \end{bmatrix} + Du(t). \]
At each time $t$, we have $y(t) - Du(t) = \bar{x}_1(t)$ with $u(t)$ and $y(t)$ known, so the vector $\bar{x}_1(t)$ is perfectly observed and only $\bar{x}_2(t)$ needs to be estimated. If $D \neq 0$, assume $u(t)$ is continuously differentiable in $t$. Putting
\[ \bar{u}(t) = A_{21}\bar{x}_1(t) + B_2u(t) = A_{21}y(t) + (B_2 - A_{21}D)u(t), \tag{12.9a} \]
\[ \bar{y}(t) = \dot{\bar{x}}_1(t) - A_{11}\bar{x}_1(t) - B_1u(t) = \dot{y}(t) - D\dot{u}(t) - A_{11}y(t) - (B_1 - A_{11}D)u(t), \tag{12.9b} \]
we obtain
\[ \dot{\bar{x}}_2(t) = A_{22}\bar{x}_2(t) + \bar{u}(t), \qquad \bar{y}(t) = A_{12}\bar{x}_2(t). \tag{12.10} \]
Therefore, the problem of estimating the $n$-dimensional state vector $x(t)$ of the system $(A, B, C)$, where $C$ has full row rank, has been reduced to that of estimating the $(n - l)$-dimensional state vector $\bar{x}_2(t)$ of the system $(A_{22}, I, A_{12})$.

Observability is invariant under the order-reduction described above. This invariance property is a consequence of the PBH test for observability:

Lemma 12.12 The pair $(A_{22}, A_{12})$ is observable if and only if $(A, C)$ is observable.

Proof. If $(A, C)$ is observable, then so is $(Q^{-1}AQ, CQ)$. This implies that
\[ \begin{bmatrix} \lambda I - A_{11} & -A_{12} \\ -A_{21} & \lambda I - A_{22} \\ I & 0 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 0 \tag{12.11} \]
holds for some $\lambda \in \mathbb{C}$ only if $v_1 = 0$ and $v_2 = 0$. In particular, if $v_1 = 0$ and if $\lambda$ and $v_2 \neq 0$ are such that $A_{22}v_2 = \lambda v_2$, then we must have $A_{12}v_2 \neq 0$. Hence $(A_{22}, A_{12})$ is observable. Conversely, if $(A, C)$ is not observable, then there exist $\lambda \in \mathbb{C}$, $v_1 \in \mathbb{C}^l$ and $v_2 \in \mathbb{C}^{n-l}$ with $[v_1^T\ v_2^T]^T \neq 0$ such that (12.11) holds. Since (12.11) implies $v_1 = 0$, we must have that $v_2$ is an eigenvector of $A_{22}$ such that $A_{12}v_2 = 0$. Hence $(A_{22}, A_{12})$ is not observable.

Suppose that $(A, C)$ is observable, and that $L \in \mathbb{R}^{(n-l)\times l}$ is an output injection gain matrix such that the eigenvalues of the closed-loop matrix $A_{22} + LA_{12}$ have negative real parts. Then an asymptotically stable state observer for the system (12.10) is of the form
\[ \dot{\hat{x}}_2(t) = A_{22}\hat{x}_2(t) + \bar{u}(t) + L\big(A_{12}\hat{x}_2(t) - \bar{y}(t)\big), \tag{12.12} \]
where $\hat{x}_2(t)$ is the estimate of $\bar{x}_2(t)$. Plugging (12.9) into (12.12) and letting
\[ z(t) = \hat{x}_2(t) + L\big(y(t) - Du(t)\big) \]
gives
\[ \dot{z}(t) = (A_{22} + LA_{12})z(t) + (B_2 + LB_1)u(t) + \big((A_{21} + LA_{11}) - (A_{22} + LA_{12})L\big)\big(y(t) - Du(t)\big), \tag{12.13a} \]
which is a dynamical system of order $n - l$ that generates an estimate $z(t) - L(y(t) - Du(t))$ of $\bar{x}_2(t)$. Now, from $y(t) = \bar{x}_1(t) + Du(t)$, it follows that
\[ \begin{bmatrix} y(t) - Du(t) \\ z(t) - L(y(t) - Du(t)) \end{bmatrix} \]
is an estimate of $\bar{x}(t)$, and hence that
\[ \hat{x}(t) = \begin{bmatrix} C \\ R \end{bmatrix}^{-1}\begin{bmatrix} y(t) - Du(t) \\ z(t) - L(y(t) - Du(t)) \end{bmatrix} \tag{12.13b} \]
is an estimate of $x(t)$. In summary, we have the following result:

Theorem 12.13 Suppose that $(A, C)$ is detectable, and that $C$ has full row rank (i.e., rank $C = l$). Let $R \in \mathbb{R}^{(n-l)\times n}$ be any matrix such that the right-hand side of (12.8) is nonsingular; let $L \in \mathbb{R}^{(n-l)\times l}$ be such that all the eigenvalues of $A_{22} + LA_{12}$ have negative real parts. Then an asymptotically stable state observer of order $n - l$ for the plant (12.1) is given by (12.13).
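To illustrate the reduced-order observer (12.13), here is a minimal simulation sketch. The plant is a hypothetical double integrator (not from the notes) with y = x1 and D = 0, so that R = [0 1] makes Q the identity; the gain L = −2, the input sin(t), the initial conditions, and the forward-Euler step size are all assumptions chosen for illustration.

```python
import math

# Double integrator in the coordinates of (12.8)-(12.10) with Q = I:
# x1' = x2, x2' = u, y = x1, hence the scalar blocks below.
A11, A12, A21, A22 = 0.0, 1.0, 0.0, 0.0
B1, B2 = 0.0, 1.0
L = -2.0                      # places the observer eigenvalue A22 + L*A12 at -2

dt, steps = 1e-3, 5000        # forward-Euler integration over 5 seconds
x1, x2 = 1.0, -1.0            # true plant state (x2 is not measured)
z = 0.0                       # observer state of order n - l = 1
errors = []

for k in range(steps):
    u = math.sin(k * dt)      # arbitrary known input
    y = x1                    # measurement (D = 0)
    x2_hat = z - L * y        # estimate of x2 recovered from z, as in (12.13b)
    errors.append(abs(x2_hat - x2))

    # Observer dynamics (12.13a) with D = 0.
    dz = ((A22 + L * A12) * z + (B2 + L * B1) * u
          + ((A21 + L * A11) - (A22 + L * A12) * L) * y)
    # Plant dynamics.
    dx1 = A11 * x1 + A12 * x2 + B1 * u
    dx2 = A21 * x1 + A22 * x2 + B2 * u
    x1, x2, z = x1 + dt * dx1, x2 + dt * dx2, z + dt * dz

print(errors[0], errors[-1])
```

The estimation error decays at the rate set by A22 + LA12 regardless of the input, which is why the observer never needs the derivative of y that appears in (12.9b).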

12.6 Example of Observer-Based Control

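Let
\[ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 2 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 1 & 0 \end{bmatrix}, \quad\text{and}\quad D = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}. \]
The eigenvalues of $A$ are $0$, $1$, and $-2$. The PBH rank test tells us that the eigenvalues $0$ and $1$ are observable and that the eigenvalue $-2$ is unobservable:
\[ \operatorname{rank}\begin{bmatrix} -A \\ C \end{bmatrix} = \operatorname{rank}\begin{bmatrix} I - A \\ C \end{bmatrix} = 3 \quad\text{and}\quad \operatorname{rank}\begin{bmatrix} -2I - A \\ C \end{bmatrix} = 2. \]
Since the unobservable eigenvalue has negative real part, the pair $(A, C)$ is detectable and there exists an asymptotically stable state observer. The unobservable subspace is identified as follows:
\[ \mathcal{O} = \begin{bmatrix} C \\ CA \\ CA^2 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix} \quad\Rightarrow\quad \mathcal{N}(\mathcal{O}) = \operatorname{span}\{v_3\} \quad\text{where}\quad v_3 = \begin{bmatrix} 1 \\ -2 \\ 4 \end{bmatrix}. \]
Letting $Q = \begin{bmatrix} e_1 & e_2 & v_3 \end{bmatrix}$, we transform $(A, C)$ to its standard form for unobservable systems $(A_o, C_o) = (Q^{-1}AQ, CQ)$, where
\[ Q = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 4 \end{bmatrix} \quad\Rightarrow\quad A_o = \begin{bmatrix} A_1 & 0 \\ A_{21} & A_2 \end{bmatrix} = \begin{bmatrix} 0 & 1/2 & 0 \\ 0 & 1 & 0 \\ 0 & 1/2 & -2 \end{bmatrix}, \quad C_o = \begin{bmatrix} C_1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 \end{bmatrix}. \]
Now, we will shift the observable eigenvalues of $A$, which are the eigenvalues of $A_1$, to the left-half plane of $\mathbb{C}$. With $L_1 = \begin{bmatrix} l_1 & l_2 \end{bmatrix}^T$, the characteristic polynomial of $A_1 + L_1C_1$ is
\[ p(s) = s^2 - (2l_1 + l_2 + 1)s + (2l_1 - l_2). \]
If the desired locations for the observer eigenvalues are $-2$, $-2$, and $-5$, then, as the unobservable eigenvalue of $A$ is already at $-2$, we need to place the roots of $p(s)$ at $-2$ and $-5$. Solving $p(s) = (s + 2)(s + 5)$ for $l_1$ and $l_2$ gives $l_1 = 1/2$ and $l_2 = -9$. It follows from
\[ A + LC = QA_oQ^{-1} + LC_oQ^{-1} = Q\left( \begin{bmatrix} A_1 & 0 \\ A_{21} & A_2 \end{bmatrix} + Q^{-1}L\begin{bmatrix} C_1 & 0 \end{bmatrix} \right)Q^{-1} = Q\begin{bmatrix} A_1 + L_1C_1 & 0 \\ A_{21} + L_2C_1 & A_2 \end{bmatrix}Q^{-1}, \]
where $Q^{-1}L = \begin{bmatrix} L_1^T & L_2^T \end{bmatrix}^T$ with $L_1 \in \mathbb{R}^{2\times 1}$ and $L_2 \in \mathbb{R}^{1\times 1}$, that $L_2$ can be arbitrarily chosen, and that the observer gain $L \in \mathbb{R}^{3\times 1}$ is determined by
\[ L = Q\begin{bmatrix} L_1 \\ L_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 4 \end{bmatrix}\begin{bmatrix} 1/2 \\ -9 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/2 \\ -9 \\ 0 \end{bmatrix} \]
with $L_2 = 0$. If the feedback controller gain $K$ is as in Section 12.3, then the following observer-based controller places the closed-loop eigenvalues at $-5$, $-2$, $-2$, $-2$, and $-1 \pm i$:
\[ \dot{\hat{x}}(t) = A\hat{x}(t) + Bu(t) + L\big(C\hat{x}(t) + Du(t) - y(t)\big) = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 2 & -1 \end{bmatrix}\hat{x}(t) + \begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}u(t) + \begin{bmatrix} 1/2 \\ -9 \\ 0 \end{bmatrix}\left( \begin{bmatrix} 2 & 1 & 0 \end{bmatrix}\hat{x}(t) - y(t) \right), \]
\[ u(t) = K\hat{x}(t) + Gr(t) = \begin{bmatrix} 5 & 7 & -9/2 \\ -4 & -6 & 5/2 \\ 0 & 0 & 0 \end{bmatrix}\hat{x}(t) + Gr(t). \]
The closed-loop transfer function matrix with $G = I$ is then
\[ H(s) = (C + DK)(sI - A - BK)^{-1}BG + DG = \begin{bmatrix} 2 & 1 & 0 \end{bmatrix}\left( \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & s \end{bmatrix} - \begin{bmatrix} -4 & -5 & 5/2 \\ 1 & 1 & -1 \\ 0 & 2 & -1 \end{bmatrix} \right)^{-1}\begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]
\[ = \begin{bmatrix} \dfrac{s^2 - 5s + 4}{s^3 + 4s^2 + 6s + 4} & \dfrac{3s^2 - 4s + 7}{s^3 + 4s^2 + 6s + 4} & \dfrac{2s^2 + s + 3}{s^3 + 4s^2 + 6s + 4} \end{bmatrix}. \]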
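The closed-loop transfer function of this example can be spot-checked by evaluating both sides at a few real points. The sketch below assumes G = I and D = 0 as in the example and uses exact rational arithmetic; the function names `solve`, `H`, and `H_claimed`, and the sample points 1, 2, 3, are illustrative choices.

```python
from fractions import Fraction as Fr

A_cl = [[-4, -5, Fr(5, 2)], [1, 1, -1], [0, 2, -1]]   # A + BK from Section 12.3
B = [[0, 1, 1], [1, 1, 0], [0, 0, 0]]
C = [2, 1, 0]

def solve(M, rhs):
    """Solve M X = rhs by Gauss-Jordan elimination over exact fractions."""
    n = len(M)
    aug = [[Fr(v) for v in M[i]] + [Fr(v) for v in rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def H(s):
    """Evaluate C (sI - A - BK)^{-1} B at a point s that is not an eigenvalue."""
    M = [[(s if i == j else 0) - A_cl[i][j] for j in range(3)] for i in range(3)]
    X = solve(M, B)
    return [sum(C[k] * X[k][j] for k in range(3)) for j in range(3)]

def H_claimed(s):
    """The closed-form entries stated in the example."""
    p = s**3 + 4 * s**2 + 6 * s + 4
    return [Fr(s * s - 5 * s + 4, p), Fr(3 * s * s - 4 * s + 7, p),
            Fr(2 * s * s + s + 3, p)]

matches = [H(s) == H_claimed(s) for s in (1, 2, 3)]
print(matches)
```

Agreement at three points is only a spot check; since each numerator has degree two, agreement at three distinct points together with the known common denominator is nevertheless enough to pin the numerators down.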

References






XIII. State-Space Realization of Linear Systems




13.1 Introduction

Let $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{l\times n}$, and $D \in \mathbb{R}^{l\times m}$. We know that the external description (i.e., the input-output description) of the linear time-invariant system
\[ \dot{x}(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t) + Du(t) \tag{13.1} \]
is given by the transfer function matrix
\[ H(s) = C(sI - A)^{-1}B + D, \tag{13.2} \]
which is the Laplace transform of the impulse response matrix
\[ H(\tau) = \begin{cases} Ce^{A\tau}B + D\delta(\tau) & \text{if } \tau \geq 0; \\ 0 & \text{if } \tau < 0 \end{cases} \]
of the system (13.1). Now, the problem we want to address is to determine an internal description (i.e., the state-space description) of a given transfer function. That is, we want to implement a system that exhibits the desired input-output behavior specified by a transfer function matrix.

13.2 Existence of Realizations

13.2.1 Markov Parameters

A rational fraction of polynomials with real coefficients is called a rational function. The set of rational functions, denoted by R(s) and equipped with the usual addition and multiplication, is a field. Thus R(s)n is a vector space over the field of rational functions, and R(s)m×n is the set of linear transformations that map R(s)n into R(s)m. It follows from (13.2) that transfer functions of linear time-invariant systems belong to R(s)m×n. Therefore, without loss of generality, we shall restrict our attention to rational transfer function matrices.

Definition 13.1 The quadruple $(A, B, C, D)$ is said to be a (state-space) realization of $H(s) \in \mathbb{R}(s)^{l\times m}$ if (13.2) holds.

If the transfer function matrix $H(s) \in \mathbb{R}(s)^{l\times m}$ of a linear time-invariant system has a Laurent series expansion
\[ H(s) = H_0 + H_1s^{-1} + H_2s^{-2} + \cdots, \]
then the matrices $H_k$, $k = 0, 1, \dots$, are called the Markov parameters of the system. The Markov parameters can be determined by
\[ H_0 = \lim_{s\to\infty} H(s) \]
and
\[ H_{k+1} = \lim_{s\to\infty} s^{k+1}\big( H(s) - H_0 - H_1s^{-1} - \cdots - H_ks^{-k} \big) \]
for all $k = 0, 1, \dots$. A condition for a quadruple $(A, B, C, D)$ to be a realization of some $H(s)$ can be given in terms of the Markov parameters.

Lemma 13.2 The quadruple (A, B, C, D) is a realization of H(s) ∈ R(s)l×m if and only if

H0 = D and Hk+1 = CAkB, k = 0, 1, . . . .

Proof. If $\dot{x}(t) = Ax(t)$, then $s\hat{x}(s) - x(0) = A\hat{x}(s)$, so $(sI - A)^{-1}$ is the Laplace transform of $e^{A\tau}$, $\tau \geq 0$. Since $e^{A\tau}$ has the series expansion $\sum_{k=0}^{\infty} A^k\tau^k/k!$, and since the Laplace transform of $\tau^k/k!$ is $s^{-(k+1)}$ for each nonnegative integer $k$, we have $(sI - A)^{-1} = \sum_{k=0}^{\infty} A^ks^{-(k+1)}$. Thus
\[ H(s) = D + C(sI - A)^{-1}B = D + \sum_{k=0}^{\infty} CA^kBs^{-(k+1)} = D + CBs^{-1} + CABs^{-2} + \cdots. \]

13.2.2 Block Controller Form Realization

A rational matrix $H(s) \in \mathbb{R}(s)^{l\times m}$ is said to be proper if $\lim_{s\to\infty} H(s) < \infty$ entrywise. It is said to be strictly proper if $\lim_{s\to\infty} H(s) = 0$.

A straightforward way to write down a state-space realization of a proper rational matrix $H(s)$ is to make a multi-variable analog of the single-input controller form (or the single-output observer form) [1]. Let us write
\[ H(s) = N(s)/d(s), \tag{13.3a} \]
where the denominator polynomial
\[ d(s) = s^r + d_{r-1}s^{r-1} + \cdots + d_1s + d_0 \tag{13.3b} \]
is the monic least common multiple of the denominators of the entries of $H(s)$ and
\[ N(s) = N_rd(s) + N_{r-1}s^{r-1} + \cdots + N_1s + N_0. \tag{13.3c} \]
(A polynomial is called monic if its term of highest degree has a coefficient of one.) Note that $N_r = 0$ if $H(s)$ is strictly proper. Let
\[ A = \begin{bmatrix} 0 & I_m & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I_m \\ -d_0I_m & -d_1I_m & \cdots & -d_{r-1}I_m \end{bmatrix} \in \mathbb{R}^{rm\times rm}, \qquad B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ I_m \end{bmatrix} \in \mathbb{R}^{rm\times m}, \tag{13.4a} \]
\[ C = \begin{bmatrix} N_0 & N_1 & \cdots & N_{r-1} \end{bmatrix} \in \mathbb{R}^{l\times rm}, \qquad D = N_r \in \mathbb{R}^{l\times m}, \tag{13.4b} \]
where $I_m$ denotes the $m$-by-$m$ identity matrix. Then $(A, B, C, D)$ is a realization of $H(s)$.

Lemma 13.3 Let H(s) ∈ R(s)l×m be a proper rational matrix satisfying (13.3). Then (A, B, C, D) given by (13.4) is a realization of H(s).




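Proof. Let $\Gamma(s) = [\Gamma_0(s)^T \cdots \Gamma_{r-1}(s)^T]^T$ be the matrix formed by the last $m$ columns of $(sI - A)^{-1}$. Then it follows from
\[ (sI - A)\Gamma(s) = \begin{bmatrix} 0 & \cdots & 0 & I \end{bmatrix}^T \]
that $s\Gamma_0(s) = \Gamma_1(s)$, ..., $s\Gamma_{r-2}(s) = \Gamma_{r-1}(s)$, and $d_0\Gamma_0(s) + \cdots + d_{r-2}\Gamma_{r-2}(s) + (s + d_{r-1})\Gamma_{r-1}(s) = I$. Equivalently, we have
\[ s\Gamma_0(s) = \Gamma_1(s), \quad s^2\Gamma_0(s) = \Gamma_2(s), \quad \dots, \quad s^{r-1}\Gamma_0(s) = \Gamma_{r-1}(s), \quad\text{and}\quad d(s)\Gamma_0(s) = I. \]
Thus $H(s) = C(sI - A)^{-1}B + D$ because
\[ H(s) = N_r + N_{r-1}\frac{s^{r-1}}{d(s)} + \cdots + N_1\frac{s}{d(s)} + N_0\frac{1}{d(s)} = N_r + N_{r-1}\Gamma_{r-1}(s) + \cdots + N_1\Gamma_1(s) + N_0\Gamma_0(s) \]
\[ = \begin{bmatrix} N_0 & \cdots & N_{r-2} & N_{r-1} \end{bmatrix}\begin{bmatrix} * & \cdots & * & \Gamma_0(s) \\ \vdots & & \vdots & \vdots \\ * & \cdots & * & \Gamma_{r-2}(s) \\ * & \cdots & * & \Gamma_{r-1}(s) \end{bmatrix}\begin{bmatrix} 0 \\ \vdots \\ 0 \\ I \end{bmatrix} + N_r. \]

The quadruple $(A, B, C, D)$ of the form (13.4) shall be said to be in a block controller form. A block observer form is obtained similarly.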
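The construction (13.4) and the Markov-parameter test of Lemma 13.2 can be combined in a short sketch. The transfer matrix below, H(s) = [1/(s+1); 1/(s+2)], is a hypothetical example not taken from the notes; for it, d(s) = s² + 3s + 2, N1 = [1; 1], N0 = [2; 1], and N2 = 0. The function names `block_controller_form` and `markov` are illustrative.

```python
def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def block_controller_form(d, N, Nr):
    """Build (A, B, C, D) as in (13.4).

    d  = [d0, ..., d_{r-1}]   coefficients of the monic denominator d(s)
    N  = [N0, ..., N_{r-1}]   numerator coefficient matrices, each l x m
    Nr = l x m matrix         the direct feedthrough term
    """
    r, m = len(d), len(N[0][0])
    n = r * m
    A = [[0] * n for _ in range(n)]
    for i in range(n - m):
        A[i][i + m] = 1                      # identity blocks above the block diagonal
    for j in range(r):
        for i in range(m):
            A[n - m + i][j * m + i] = -d[j]  # last block row: -d_j * I_m
    B = [[0] * m for _ in range(n)]
    for i in range(m):
        B[n - m + i][i] = 1
    C = [sum((N[j][row] for j in range(r)), []) for row in range(len(Nr))]
    return A, B, C, Nr

def markov(A, B, C, count):
    """Return [C B, C A B, ..., C A^(count-1) B], i.e. H_1, ..., H_count."""
    out, AkB = [], B
    for _ in range(count):
        out.append(matmul(C, AkB))
        AkB = matmul(A, AkB)
    return out

d = [2, 3]                                   # d(s) = s^2 + 3s + 2
N = [[[2], [1]], [[1], [1]]]                 # N0, N1
A, B, C, D = block_controller_form(d, N, [[0], [0]])

# Expanding 1/(s+a) in powers of 1/s gives H_{k+1} = (-a)^k.
expected = [[[(-1) ** k], [(-2) ** k]] for k in range(5)]
print(markov(A, B, C, 5) == expected)
```

Comparing finitely many Markov parameters is a sanity check rather than a proof; Lemma 13.2 requires agreement for every k.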

Theorem 13.4 There exists a realization of H(s) ∈ R(s)l×m if and only if H(s) is proper.

Proof. If $(A, B, C, D)$ is a realization of $H(s)$, then Lemma 13.2 implies that the first Markov parameter $H_0 = D$, and hence that $\lim_{s\to\infty} H(s) < \infty$ entrywise. Conversely, if $H(s)$ is a proper rational matrix, then by Lemma 13.3 there exists a block controller form realization of $H(s)$.

13.3 Minimal Realizations

If $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{l\times n}$, and $D \in \mathbb{R}^{l\times m}$, then $n$ is called the order of the matrix quadruple $(A, B, C, D)$.

Definition 13.5 An n-th order realization (A, B, C, D) of H(s) ∈ R(s)l×m is said to be minimal (or irreducible) if every realization of H(s) is at least of order n.

The notions of controllability and observability play a crucial role in addressing the problem of minimal realization.

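Lemma 13.6 Let $(A_1, B_1, C_1, D_1)$ and $(A_2, B_2, C_2, D_2)$ be realizations of $H(s) \in \mathbb{R}(s)^{l\times m}$. Define
\[ \mathcal{C}_i(0) = B_i, \quad \mathcal{O}_i(0) = C_i, \quad \mathcal{C}_i(k+1) = \begin{bmatrix} B_i & A_iB_i & \cdots & A_i^kB_i \end{bmatrix}, \tag{13.5a} \]
\[ \mathcal{O}_i(k+1) = \begin{bmatrix} C_i \\ C_iA_i \\ \vdots \\ C_iA_i^k \end{bmatrix} \tag{13.5b} \]
for $i = 1, 2$ and for $k = 0, 1, \dots$. Then we have
\[ \mathcal{O}_1(k_1)\mathcal{C}_1(k_2) = \mathcal{O}_2(k_1)\mathcal{C}_2(k_2) \]
for all nonnegative integers $k_1$ and $k_2$.

Proof. Due to Lemma 13.2, the Markov parameters of the two realizations are the same: that is, $D_1 = D_2$ and $C_1A_1^kB_1 = C_2A_2^kB_2$ for $k = 0, 1, \dots$. Thus the result follows from
\[ \mathcal{O}_i(k_1)\mathcal{C}_i(k_2) = \begin{bmatrix} C_iB_i & C_iA_iB_i & \cdots & C_iA_i^{k_2-1}B_i \\ C_iA_iB_i & C_iA_i^2B_i & \cdots & C_iA_i^{k_2}B_i \\ \vdots & \vdots & & \vdots \\ C_iA_i^{k_1-1}B_i & C_iA_i^{k_1}B_i & \cdots & C_iA_i^{k_1+k_2-2}B_i \end{bmatrix}. \]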

Theorem 13.7 A realization (A, B, C, D) of H(s) ∈ R(s)l×m is minimal if and only if it is both controllable and observable.

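Proof. If a realization is not controllable and observable, then, using the Kalman decomposition method, one may obtain another realization of lower order that is both controllable and observable. Thus a minimal realization is necessarily both controllable and observable.

To show sufficiency, let $(A_1, B_1, C_1, D_1)$ and $(A_2, B_2, C_2, D_2)$ be realizations of $H(s)$; suppose that $(A_1, B_1, C_1, D_1)$ is of order $n_1$ and both controllable and observable, and that $(A_2, B_2, C_2, D_2)$ is of order $n_2$. If $\mathcal{C}_i(k)$ and $\mathcal{O}_i(k)$ are as in (13.5), then by Lemma 13.6 we have $\mathcal{O}_1(n_1)\mathcal{C}_1(n_1) = \mathcal{O}_2(n_1)\mathcal{C}_2(n_1)$. Since $(A_1, B_1, C_1, D_1)$ is controllable and observable, we have that $\mathcal{O}_1(n_1)$ has full column rank and $\mathcal{C}_1(n_1)$ has full row rank, so that rank $\mathcal{C}_1(n_1)$ = rank $\mathcal{O}_1(n_1) = n_1$. Choose any matrix $M \in \mathbb{R}^{ln_1\times(l-1)n_1}$ such that $\begin{bmatrix} \mathcal{O}_1(n_1) & M \end{bmatrix}$ is nonsingular. Then
\[ \begin{bmatrix} \mathcal{O}_1(n_1) & M \end{bmatrix}\begin{bmatrix} \mathcal{C}_1(n_1) \\ 0 \end{bmatrix} = \mathcal{O}_1(n_1)\mathcal{C}_1(n_1) \quad\text{and}\quad \operatorname{rank}\begin{bmatrix} \mathcal{C}_1(n_1) \\ 0 \end{bmatrix} = \operatorname{rank}\mathcal{C}_1(n_1) = n_1. \]
Thus we must have
\[ \operatorname{rank}\big(\mathcal{O}_1(n_1)\mathcal{C}_1(n_1)\big) = \operatorname{rank}\big(\mathcal{O}_2(n_1)\mathcal{C}_2(n_1)\big) = n_1. \]
Since $\mathcal{R}(\mathcal{C}_2(k)) \subset \mathcal{R}(\mathcal{C}_2(n_2))$ for all $k \leq n_2$, and since the Cayley-Hamilton theorem implies $\mathcal{R}(\mathcal{C}_2(k)) = \mathcal{R}(\mathcal{C}_2(n_2))$ for all $k \geq n_2$, we have rank $\mathcal{C}_2(n_1) \leq n_2$. Similarly, since $\mathcal{N}(\mathcal{O}_2(k)) \supset \mathcal{N}(\mathcal{O}_2(n_2))$ for all $k \leq n_2$, and since $\mathcal{N}(\mathcal{O}_2(k)) = \mathcal{N}(\mathcal{O}_2(n_2))$ for all $k \geq n_2$ due to the Cayley-Hamilton theorem, we have rank $\mathcal{O}_2(n_1) \leq n_2$ by the fundamental theorem of linear equations. Thus we must have rank$\big(\mathcal{O}_2(n_1)\mathcal{C}_2(n_1)\big) \leq n_2$. This implies $n_1 \leq n_2$ and establishes sufficiency.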

Theorem 13.7 suggests that one way to obtain a minimal realization of a proper transfer function matrix H(s) is to find any realization of H(s) (e.g., a block controller form realization) and then extract the controllable and observable part of it using the Kalman canonical decomposition. The following result characterizes all minimal realizations.

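Theorem 13.8 Let $(A_1, B_1, C_1, D_1)$ be a given minimal realization of $H(s) \in \mathbb{R}(s)^{l \times m}$. Then $(A_2, B_2, C_2, D_2)$ is another minimal realization of $H(s)$ if and only if there exists a nonsingular matrix $P \in \mathbb{R}^{n \times n}$ such that

\[
A_2 = P^{-1} A_1 P, \quad B_2 = P^{-1} B_1, \quad C_2 = C_1 P, \quad\text{and}\quad D_2 = D_1. \tag{13.6}
\]

Moreover, such a matrix $P$ is given by

\[
P = \mathcal{C}_1 \mathcal{C}_2^T \bigl(\mathcal{C}_2 \mathcal{C}_2^T\bigr)^{-1}
= \bigl(\mathcal{O}_1^T \mathcal{O}_1\bigr)^{-1} \mathcal{O}_1^T \mathcal{O}_2,
\]

where $\mathcal{C}_i$ and $\mathcal{O}_i$ are the controllability and observability matrices of $(A_i, B_i, C_i, D_i)$, respectively, for $i = 1, 2$.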




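Proof. If (13.6) holds true for some nonsingular $P$, then $(A_2, B_2, C_2, D_2)$ is equivalent to $(A_1, B_1, C_1, D_1)$, and hence it is a realization of $H(s)$ as well. Moreover, since $P^{-1} A_1 P \in \mathbb{R}^{n \times n}$ whenever $A_1 \in \mathbb{R}^{n \times n}$, we have that $(A_2, B_2, C_2, D_2)$ is of minimal order whenever $(A_1, B_1, C_1, D_1)$ is of minimal order. This proves the sufficiency part of the desired result.

To prove necessity, suppose that $(A_1, B_1, C_1, D_1)$ and $(A_2, B_2, C_2, D_2)$ are minimal realizations of $H(s)$. We will show that they are equivalent. Since the two realizations are minimal, they are of the same order, say $n$. Due to Lemma 13.6, we have $\mathcal{O}_1 \mathcal{C}_1 = \mathcal{O}_2 \mathcal{C}_2$, which gives

\[
\mathcal{O}_1^T \mathcal{O}_1 \mathcal{C}_1 = \mathcal{O}_1^T \mathcal{O}_2 \mathcal{C}_2
\quad\text{and}\quad
\mathcal{O}_1 \mathcal{C}_1 \mathcal{C}_2^T = \mathcal{O}_2 \mathcal{C}_2 \mathcal{C}_2^T. \tag{13.7}
\]

By Theorem 13.7, the two realizations are both controllable and observable, so in particular $\mathcal{C}_1 \mathcal{C}_2^T$, $\mathcal{C}_2 \mathcal{C}_2^T$, $\mathcal{O}_1^T \mathcal{O}_1$, and $\mathcal{O}_1^T \mathcal{O}_2$ are nonsingular. (This follows from Sylvester's rank inequality: If $M \in \mathbb{R}^{m \times n}$ and $N \in \mathbb{R}^{n \times k}$, then $\operatorname{rank} M + \operatorname{rank} N - n \le \operatorname{rank}(MN)$.) Define nonsingular matrices $P, Q \in \mathbb{R}^{n \times n}$ by $P = \mathcal{C}_1 \mathcal{C}_2^T (\mathcal{C}_2 \mathcal{C}_2^T)^{-1}$ and $Q = (\mathcal{O}_1^T \mathcal{O}_1)^{-1} \mathcal{O}_1^T \mathcal{O}_2$. Then (13.7) yields

\[
P = \mathcal{C}_1 \mathcal{C}_2^T \bigl(\mathcal{C}_2 \mathcal{C}_2^T\bigr)^{-1}
= \bigl(\mathcal{O}_1^T \mathcal{O}_1\bigr)^{-1} \mathcal{O}_1^T \mathcal{O}_2 \mathcal{C}_2 \mathcal{C}_2^T \bigl(\mathcal{C}_2 \mathcal{C}_2^T\bigr)^{-1}
= \bigl(\mathcal{O}_1^T \mathcal{O}_1\bigr)^{-1} \mathcal{O}_1^T \mathcal{O}_2 = Q. \tag{13.8}
\]

Let $\mathcal{C}_i(k)$ and $\mathcal{O}_i(k)$ be as in (13.5). Then $\mathcal{O}_i A_i \mathcal{C}_i$ are submatrices of $\mathcal{O}_i(n+1)\mathcal{C}_i(n+1)$, and hence Lemma 13.6 implies $\mathcal{O}_1 A_1 \mathcal{C}_1 = \mathcal{O}_2 A_2 \mathcal{C}_2$, which gives

\[
\mathcal{O}_1^T \mathcal{O}_1 A_1 \mathcal{C}_1 \mathcal{C}_2^T = \mathcal{O}_1^T \mathcal{O}_2 A_2 \mathcal{C}_2 \mathcal{C}_2^T. \tag{13.9}
\]

Now (13.7)–(13.9) yield $A_1 P = P A_2$, $\mathcal{C}_1 = P \mathcal{C}_2$, and $\mathcal{O}_1 P = \mathcal{O}_2$, from which the equivalence of the two realizations follows.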
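The formula for P in Theorem 13.8 can be sanity-checked numerically. The following Python sketch builds a second minimal realization from a known similarity transformation and then recovers that transformation from the controllability and observability matrices. The second-order example system, the matrix `T`, and the helper names (`matmul`, `transpose`, `inverse`, `ctrb`, `obsv`) are illustrative assumptions, not part of the notes; exact rational arithmetic is used so that the comparison is exact.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse(M):
    """Gauss-Jordan inverse over the rationals (M is assumed nonsingular)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def ctrb(A, B):
    """Controllability matrix [B, AB, ..., A^(n-1) B]."""
    blocks, X = [], B
    for _ in range(len(A)):
        blocks.append(X)
        X = matmul(A, X)
    return [sum((blk[i] for blk in blocks), []) for i in range(len(A))]

def obsv(A, C):
    """Observability matrix [C; CA; ...; C A^(n-1)]."""
    rows, X = [], C
    for _ in range(len(A)):
        rows.extend(X)
        X = matmul(X, A)
    return rows

# Hypothetical minimal realization of 1 / (s^2 + 3s + 2).
A1, B1, C1 = [[0, 1], [-2, -3]], [[0], [1]], [[1, 0]]

# Second realization obtained from an assumed similarity transformation T.
T = [[1, 1], [0, 1]]
Tinv = inverse(T)
A2 = matmul(matmul(Tinv, A1), T)
B2 = matmul(Tinv, B1)
C2 = matmul(C1, T)

Cc1, Cc2 = ctrb(A1, B1), ctrb(A2, B2)
Ob1, Ob2 = obsv(A1, C1), obsv(A2, C2)

# The two expressions for P in Theorem 13.8.
P_ctrb = matmul(matmul(Cc1, transpose(Cc2)), inverse(matmul(Cc2, transpose(Cc2))))
P_obsv = matmul(inverse(matmul(transpose(Ob1), Ob1)), matmul(transpose(Ob1), Ob2))
```

In this sketch both `P_ctrb` and `P_obsv` should reproduce `T`, and `matmul(A1, P_ctrb)` should equal `matmul(P_ctrb, A2)`, in line with the relation A1P = PA2 from the proof.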

13.4 Discrete-Time Case

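Let $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{l \times n}$, and $D \in \mathbb{R}^{l \times m}$. The input-output description of the discrete-time linear time-invariant system

\[
\begin{aligned}
x(t+1) &= A x(t) + B u(t), \\
y(t) &= C x(t) + D u(t)
\end{aligned}
\tag{13.10}
\]

is given by the transfer function matrix

\[
H(z) = \sum_{\tau=0}^{\infty} z^{-\tau} H(\tau) = C(zI - A)^{-1} B + D, \tag{13.11}
\]

where $H(\tau)$ is the impulse response matrix of the system (13.10) given by

\[
H(\tau) =
\begin{cases}
C A^{\tau-1} B & \text{if } \tau > 0; \\
D & \text{if } \tau = 0.
\end{cases}
\]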
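The impulse response formula can be checked against a direct simulation of (13.10). The sketch below is only an illustration: the second-order system and the function names `matvec`, `simulate_impulse`, and `impulse_formula` are assumptions made for this example. It applies a unit pulse at t = 0 with zero initial state and compares the output sequence with D for τ = 0 and with C A^(τ−1) B for τ > 0.

```python
def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def simulate_impulse(A, B, C, D, steps, channel=0):
    """Output of x(t+1) = Ax + Bu, y = Cx + Du for a unit pulse on one input, x(0) = 0."""
    n, m = len(A), len(B[0])
    x, out = [0] * n, []
    for t in range(steps):
        u = [1 if (t == 0 and j == channel) else 0 for j in range(m)]
        out.append([a + b for a, b in zip(matvec(C, x), matvec(D, u))])
        x = [a + b for a, b in zip(matvec(A, x), matvec(B, u))]
    return out

def impulse_formula(A, B, C, D, tau, channel=0):
    """Column `channel` of H(tau): D if tau == 0, else C A^(tau-1) B."""
    if tau == 0:
        return [row[channel] for row in D]
    v = [row[channel] for row in B]
    for _ in range(tau - 1):
        v = matvec(A, v)
    return matvec(C, v)

# Hypothetical single-input single-output example.
A = [[0, 1], [-2, 3]]
B = [[0], [1]]
C = [[1, 0]]
D = [[5]]

simulated = simulate_impulse(A, B, C, D, steps=6)
formula = [impulse_formula(A, B, C, D, tau) for tau in range(6)]
```

For this assumed system the two sequences agree term by term, which is the time-domain counterpart of the series expansion in (13.11).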

Definition 13.9 The quadruple $(A, B, C, D)$ is said to be a realization of $H(z) \in \mathbb{R}(z)^{l \times m}$ if (13.11) holds.

Theorem 13.10 There exists a discrete-time realization of $H(z) \in \mathbb{R}(z)^{l \times m}$ if and only if $H(z)$ is proper.

Theorem 13.11 A discrete-time realization $(A, B, C, D)$ of $H(z) \in \mathbb{R}(z)^{l \times m}$ is minimal if and only if it is both reachable and observable.




Theorem 13.12 Let $(A_1, B_1, C_1, D_1)$ be a given minimal realization of $H(z) \in \mathbb{R}(z)^{l \times m}$ in discrete time. Then $(A_2, B_2, C_2, D_2)$ is another minimal realization of $H(z)$ in discrete time if and only if there exists a nonsingular matrix $P \in \mathbb{R}^{n \times n}$ such that

\[
A_2 = P^{-1} A_1 P, \quad B_2 = P^{-1} B_1, \quad C_2 = C_1 P, \quad\text{and}\quad D_2 = D_1.
\]

13.5 An Example of Minimal Realizations

In this example, we will obtain a minimal realization of

\[
H(s) =
\begin{bmatrix}
\dfrac{1}{(s-1)^2} & \dfrac{1}{(s-1)(s+3)} \\[2ex]
\dfrac{-6}{(s-1)(s+3)^2} & \dfrac{s-2}{(s+3)^2}
\end{bmatrix}
\]

by applying the Kalman decomposition theorem on its block controller form realization. The least common multiple of the denominators of the entries of $H(s)$ is given by

\[
d(s) = s^r + d_{r-1} s^{r-1} + \cdots + d_1 s + d_0 = (s-1)^2 (s+3)^2 = s^4 + 4s^3 - 2s^2 - 12s + 9,
\]

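and we can write $H(s) = N(s)/d(s)$, where

\[
N(s) =
\begin{bmatrix}
(s+3)^2 & (s-1)(s+3) \\
-6(s-1) & (s-1)^2 (s-2)
\end{bmatrix}
=
\begin{bmatrix}
s^2 + 6s + 9 & s^2 + 2s - 3 \\
-6s + 6 & s^3 - 4s^2 + 5s - 2
\end{bmatrix}
= N_4 d(s) + N_3 s^3 + N_2 s^2 + N_1 s + N_0
\]

with

\[
N_4 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad
N_3 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad
N_2 = \begin{bmatrix} 1 & 1 \\ 0 & -4 \end{bmatrix}, \quad
N_1 = \begin{bmatrix} 6 & 2 \\ -6 & 5 \end{bmatrix}, \quad\text{and}\quad
N_0 = \begin{bmatrix} 9 & -3 \\ 6 & -2 \end{bmatrix}.
\]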

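Therefore, a block controller form realization $(A, B, C, D)$ of $H(s)$ is given by

\[
A =
\begin{bmatrix}
0 & I_2 & 0 & 0 \\
0 & 0 & I_2 & 0 \\
0 & 0 & 0 & I_2 \\
-d_0 I_2 & -d_1 I_2 & -d_2 I_2 & -d_3 I_2
\end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
-9 & 0 & 12 & 0 & 2 & 0 & -4 & 0 \\
0 & -9 & 0 & 12 & 0 & 2 & 0 & -4
\end{bmatrix},
\]

\[
B =
\begin{bmatrix} 0 \\ 0 \\ 0 \\ I_2 \end{bmatrix}
=
\begin{bmatrix}
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1
\end{bmatrix},
\]

\[
C = \begin{bmatrix} N_0 & N_1 & N_2 & N_3 \end{bmatrix}
=
\begin{bmatrix}
9 & -3 & 6 & 2 & 1 & 1 & 0 & 0 \\
6 & -2 & -6 & 5 & 0 & -4 & 0 & 1
\end{bmatrix},
\]

and

\[
D = N_4 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]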




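This realization is of order 8; it is controllable but not observable. Reducing $(A, B, C, D)$ to the standard form for unobservable systems and taking the controllable and observable part, we obtain a controllable and observable realization $(A_{co}, B_{co}, C_{co}, D_{co})$ of $H(s)$:

\[
A_{co} =
\begin{bmatrix}
17/33 & -70/33 & 31/99 & 280/99 & -34/297 \\
80/11 & -178/11 & -320/33 & 745/33 & -160/99 \\
19/11 & -20/11 & -76/33 & 80/33 & 61/99 \\
48/11 & -87/11 & -64/11 & 116/11 & -32/33 \\
-21/11 & -30/11 & 28/11 & 40/11 & 14/33
\end{bmatrix},
\quad
B_{co} =
\begin{bmatrix}
-17/297 & 70/297 \\
-80/99 & 178/99 \\
-19/99 & 20/99 \\
-16/33 & 29/33 \\
7/33 & 10/33
\end{bmatrix},
\]

\[
C_{co} =
\begin{bmatrix}
9 & -3 & 6 & 2 & 1 \\
6 & -2 & -6 & 5 & 0
\end{bmatrix},
\quad\text{and}\quad
D_{co} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]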

This fifth-order realization is of minimal order. Note that, in this example, the block controller form realization is not of minimal order, and that the minimal order is not equal to the degree of $d(s)$ either. Since $(A_{co}, B_{co}, C_{co}, D_{co})$ is both controllable and observable, one can in turn transform it to its controller and observer canonical forms. In general, if $H(s) \in \mathbb{R}(s)^{l \times m}$ and if the degree of the denominator polynomial is $r$, then the minimal order $n$ satisfies $r \le n \le mr$; see Theorem 13.13.
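The claim that the block controller form realization of this example is controllable but not observable can be checked with a short rank computation. The following Python sketch is one possible way to do so; the helper names (`matmul`, `rank`, `ctrb`, `obsv`) are assumptions made for illustration, and exact rational arithmetic is used so that no numerical rank tolerance is needed. The coefficient data are the d_k and N_k of this section.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def ctrb(A, B):
    """Controllability matrix [B, AB, ..., A^(n-1) B]."""
    blocks, X = [], B
    for _ in range(len(A)):
        blocks.append(X)
        X = matmul(A, X)
    return [sum((blk[i] for blk in blocks), []) for i in range(len(A))]

def obsv(A, C):
    """Observability matrix [C; CA; ...; C A^(n-1)]."""
    rows, X = [], C
    for _ in range(len(A)):
        rows.extend(X)
        X = matmul(X, A)
    return rows

# Data of Section 13.5: d(s) = s^4 + 4s^3 - 2s^2 - 12s + 9 and N_0, ..., N_3.
d = [9, -12, -2, 4]                      # d_0, d_1, d_2, d_3
N = [[[9, -3], [6, -2]],                 # N_0
     [[6, 2], [-6, 5]],                  # N_1
     [[1, 1], [0, -4]],                  # N_2
     [[0, 0], [0, 1]]]                   # N_3
r, m, l = 4, 2, 2
n = r * m

# Block controller form: shifted identity blocks, last block row -d_k I_m.
A = [[0] * n for _ in range(n)]
for i in range(n - m):
    A[i][i + m] = 1
for k in range(r):
    for j in range(m):
        A[n - m + j][k * m + j] = -d[k]
B = [[0] * m for _ in range(n - m)] + [[int(i == j) for j in range(m)] for i in range(m)]
C = [sum((N[k][i] for k in range(r)), []) for i in range(l)]

rank_ctrb = rank(ctrb(A, B))
rank_obsv = rank(obsv(A, C))
```

Under these assumptions `rank_ctrb` comes out as 8 (full, so the realization is controllable) and `rank_obsv` as 5, which matches the order of the controllable and observable part reported above.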

13.6 Other Realization Algorithms

13.6.1 Diagonal Realizations [2, Section 8.4.3][1, Section 6.1]

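Given a strictly proper $H(s) \in \mathbb{R}(s)^{l \times m}$, let $d(s)$ be the monic least common multiple of the denominators of the entries of $H(s)$. Suppose $d(s)$ has distinct real roots $\lambda_1, \ldots, \lambda_r$ such that

\[
d(s) = (s - \lambda_1) \cdots (s - \lambda_r).
\]

Then we can expand $H(s)$ into partial fractions and write

\[
H(s) = N(s)/d(s) = \sum_{i=1}^{r} \frac{R_i}{s - \lambda_i},
\]

where the residue matrices $R_i$ can be obtained as

\[
R_i = \lim_{s \to \lambda_i} (s - \lambda_i) H(s).
\]

If $\rho_i$ is the rank of $R_i$, then we can perform an LU factorization of $R_i$ and write

\[
R_i = C_i B_i
\]

for some full-column-rank matrix $C_i \in \mathbb{R}^{l \times \rho_i}$ and some full-row-rank matrix $B_i \in \mathbb{R}^{\rho_i \times m}$. Then it is immediate that $(A, B, C)$, with

\[
A =
\begin{bmatrix}
\lambda_1 I_{\rho_1} & & 0 \\
& \ddots & \\
0 & & \lambda_r I_{\rho_r}
\end{bmatrix}
\in \mathbb{R}^{\left(\sum_{j=1}^{r} \rho_j\right) \times \left(\sum_{j=1}^{r} \rho_j\right)},
\]

\[
B =
\begin{bmatrix} B_1 \\ \vdots \\ B_r \end{bmatrix}
\in \mathbb{R}^{\left(\sum_{j=1}^{r} \rho_j\right) \times m},
\quad
C =
\begin{bmatrix} C_1 & \cdots & C_r \end{bmatrix}
\in \mathbb{R}^{l \times \left(\sum_{j=1}^{r} \rho_j\right)},
\]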




is a realization of $H(s)$. The realization has order $\sum_{j=1}^{r} \rho_j$.

Let $n = \sum_{j=1}^{r} \rho_j$. The controllability matrix $\mathcal{C}$ of the diagonal realization can be written

\[
\mathcal{C} = B_d V,
\]

where

\[
B_d =
\begin{bmatrix}
B_1 & & 0 \\
& \ddots & \\
0 & & B_r
\end{bmatrix}
\quad\text{and}\quad
V =
\begin{bmatrix}
I_m & \lambda_1 I_m & \cdots & \lambda_1^{n-1} I_m \\
\vdots & \vdots & & \vdots \\
I_m & \lambda_r I_m & \cdots & \lambda_r^{n-1} I_m
\end{bmatrix}.
\]

The matrix $V$ is a block Vandermonde matrix, which is of full rank as long as $\lambda_i \ne \lambda_j$ for $i \ne j$. Also, by construction, the matrix $B_d$ has full rank, and hence the realization is controllable. The observability can be shown similarly. Therefore, the diagonal realization, if it exists, is minimal.
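The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the poles and residue matrices are a made-up example (they do not come from the notes), the helper names are hypothetical, and the factorization R_i = C_i B_i is computed here through the reduced row echelon form (a rank factorization) rather than through the LU factorization mentioned in the text; any full-rank factorization serves the same purpose.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def rref(M):
    """Return (nonzero rows of the reduced row echelon form, pivot columns)."""
    M = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]
        for i in range(len(M)):
            if i != r:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M[:r], pivots

def rank_factor(R):
    """R = Cf Bf with Cf the pivot columns of R and Bf the nonzero RREF rows."""
    Bf, pivots = rref(R)
    Cf = [[Fraction(row[c]) for c in pivots] for row in R]
    return Cf, Bf

def diagonal_realization(poles, residues):
    Cs, Bs, diag = [], [], []
    for lam, R in zip(poles, residues):
        Cf, Bf = rank_factor(R)
        Cs.append(Cf)
        Bs.append(Bf)
        diag += [lam] * len(Bf)          # lambda_i repeated rho_i times
    n = len(diag)
    A = [[diag[i] if i == j else 0 for j in range(n)] for i in range(n)]
    B = [row for Bf in Bs for row in Bf]
    C = [sum((Cf[i] for Cf in Cs), []) for i in range(len(residues[0]))]
    return A, B, C

def ctrb(A, B):
    blocks, X = [], B
    for _ in range(len(A)):
        blocks.append(X)
        X = matmul(A, X)
    return [sum((blk[i] for blk in blocks), []) for i in range(len(A))]

def obsv(A, C):
    rows, X = [], C
    for _ in range(len(A)):
        rows.extend(X)
        X = matmul(X, A)
    return rows

# Hypothetical example: H(s) = R_1/(s + 1) + R_2/(s + 2), rank R_1 = 1, rank R_2 = 2.
poles = [-1, -2]
residues = [[[1, 2], [2, 4]], [[1, 0], [0, 1]]]

A, B, C = diagonal_realization(poles, residues)
order = len(A)
rank_ctrb = len(rref(ctrb(A, B))[0])
rank_obsv = len(rref(obsv(A, C))[0])
```

For this assumed example the order is ρ1 + ρ2 = 3, and both rank tests return 3, consistent with the minimality argument above. The Markov parameters C A^(k-1) B of the result can also be compared with the sum of λ_i^(k-1) R_i as an additional check.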

13.6.2 Realization via Markov Parameters [3, Section 5.4D]

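As in the previous section, let

\[
d(s) = s^r + d_{r-1} s^{r-1} + \cdots + d_1 s + d_0
\]

be the denominator polynomial of a strictly proper $H(s) \in \mathbb{R}(s)^{l \times m}$. The transfer function matrix $H(s)$ can be written

\[
H(s) = H_1 s^{-1} + H_2 s^{-2} + \cdots,
\]

where $H_i$ are the Markov parameters of $H(s)$. Then $(A, B, C)$ given by

\[
A =
\begin{bmatrix}
0 & I_l & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & I_l \\
-d_0 I_l & -d_1 I_l & \cdots & -d_{r-1} I_l
\end{bmatrix}
\in \mathbb{R}^{rl \times rl},
\quad
B =
\begin{bmatrix} H_1 \\ \vdots \\ H_r \end{bmatrix}
\in \mathbb{R}^{rl \times m},
\]

\[
C = \begin{bmatrix} I_l & 0 & \cdots & 0 \end{bmatrix} \in \mathbb{R}^{l \times rl}
\]

is a realization of $H(s)$. The proof goes as follows:

It is easily seen that

\[
\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{r-1} \end{bmatrix}
=
\begin{bmatrix}
I_l & & 0 \\
& \ddots & \\
0 & & I_l
\end{bmatrix},
\tag{13.12}
\]

and hence that $C A^{i-1} B = H_i$ for $i = 1, \ldots, r$. Since $H(s)$ is strictly proper, there exist $R_1, \ldots, R_r \in \mathbb{R}^{l \times m}$ such that

\[
\begin{aligned}
d(s) H(s) &= (s^r + d_{r-1} s^{r-1} + \cdots + d_1 s + d_0)(H_1 s^{-1} + H_2 s^{-2} + \cdots) \\
&= R_1 s^{r-1} + \cdots + R_{r-1} s + R_r.
\end{aligned}
\]

Equating coefficients of equal powers of $s$, we obtain

\[
\begin{aligned}
H_1 &= R_1, \\
H_2 &= R_2 - d_{r-1} H_1, \\
&\;\;\vdots \\
H_r &= R_r - d_{r-1} H_{r-1} - \cdots - d_1 H_1, \\
H_{i+r} &= -d_{r-1} H_{i+r-1} - \cdots - d_1 H_{i+1} - d_0 H_i, \quad i = 1, 2, \ldots.
\end{aligned}
\]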




Using this, it can be shown that $C A^{i-1} B = H_i$ for all $i = 1, 2, \ldots$, which is equivalent to $(A, B, C)$ being a realization of $H(s)$.

The realization $(A, B, C)$ has order $rl$, and in view of (13.12) it is observable. Similarly, duality arguments lead to a dual-form controllable realization.
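The recursion for the Markov parameters and the resulting realization can be tried out on the transfer function matrix of Section 13.5, for which d(s)H(s) = N(s), so that R_1 = N_3, R_2 = N_2, R_3 = N_1, and R_4 = N_0. The Python sketch below is an illustration only; the function names `markov_parameters` and `markov_realization` are assumptions, not notation from the notes.

```python
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def markov_parameters(d, R, count):
    """H_1..H_count from d(s)H(s) = R_1 s^(r-1) + ... + R_r, d = [d_0, ..., d_(r-1)]."""
    r = len(d)
    l, m = len(R[0]), len(R[0][0])
    H = []
    for k in range(1, count + 1):
        # Start from R_k (zero for k > r), then subtract d_(r-j) H_(k-j).
        Hk = [row[:] for row in R[k - 1]] if k <= r else [[0] * m for _ in range(l)]
        for j in range(1, min(r, k - 1) + 1):
            prev = H[k - j - 1]
            for a in range(l):
                for b in range(m):
                    Hk[a][b] -= d[r - j] * prev[a][b]
        H.append(Hk)
    return H

def markov_realization(d, H, l):
    """(A, B, C) of order r*l built from d(s) and the first r Markov parameters."""
    r = len(d)
    n = r * l
    A = [[0] * n for _ in range(n)]
    for i in range(n - l):
        A[i][i + l] = 1                  # identity blocks above the last block row
    for k in range(r):
        for j in range(l):
            A[n - l + j][k * l + j] = -d[k]
    B = [row for Hk in H[:r] for row in Hk]
    C = [[int(j == i) for j in range(n)] for i in range(l)]
    return A, B, C

# Data of Section 13.5 (l = m = 2, r = 4).
d = [9, -12, -2, 4]
R = [[[0, 0], [0, 1]],                   # R_1 = N_3
     [[1, 1], [0, -4]],                  # R_2 = N_2
     [[6, 2], [-6, 5]],                  # R_3 = N_1
     [[9, -3], [6, -2]]]                 # R_4 = N_0

H = markov_parameters(d, R, 10)
A, B, C = markov_realization(d, H, l=2)

# Compare C A^(i-1) B with H_i for i = 1, ..., 10.
matches, X = [], B
for Hk in H:
    matches.append(matmul(C, X) == Hk)
    X = matmul(A, X)
```

Every entry of `matches` should be `True`, including the indices beyond r, which is the content of the last step of the proof. The realization has order rl = 8 here, so it is observable but not minimal for this example.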

13.7 Determination of Minimal Order [2, Section 8.3.3]

Let $H(s) \in \mathbb{R}(s)^{l \times m}$ be proper. The determinant of any square submatrix of $H(s)$ is called a minor of $H(s)$. The monic least common multiple of the denominators of “all nonzero minors” of $H(s)$ is called a pole polynomial of $H(s)$ (and the roots of the pole polynomial of $H(s)$ are called the poles of $H(s)$). The degree of the pole polynomial of $H(s)$ is called the McMillan degree of $H(s)$.

Theorem 13.13 [3, Chapter 5, Theorem 3.11] If (A, B, C, D) is a minimal realization of a transfer function matrix H(s), then the pole polynomial of H(s) is equal to the characteristic polynomial of A, and the McMillan degree of H(s) equals the order of (A, B, C, D).

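The Hankel matrix of a proper transfer function $H(s) \in \mathbb{R}(s)^{l \times m}$ is defined as

\[
M_H(i, j) =
\begin{bmatrix}
H_1 & H_2 & \cdots & H_j \\
H_2 & H_3 & \cdots & H_{j+1} \\
\vdots & \vdots & & \vdots \\
H_i & H_{i+1} & \cdots & H_{i+j-1}
\end{bmatrix},
\]

where $H_1, H_2, \ldots$ are the Markov parameters such that $H(s) = H_0 + H_1 s^{-1} + H_2 s^{-2} + \cdots$. If $(A, B, C, D)$ is any realization of $H(s)$, then we have $M_H(i, j) = \mathcal{O}(i)\,\mathcal{C}(j)$, where

\[
\mathcal{O}(i) =
\begin{bmatrix} C^T & (CA)^T & \cdots & \bigl(CA^{i-1}\bigr)^T \end{bmatrix}^T,
\quad
\mathcal{C}(j) =
\begin{bmatrix} B & AB & \cdots & A^{j-1} B \end{bmatrix}.
\]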

Theorem 13.14 [3, Chapter 5, Theorem 3.13] If $r$ is the degree of the denominator polynomial (i.e., the least common multiple of the denominators of the entries) of a proper transfer function $H(s)$, then the order of a minimal realization of $H(s)$ is equal to $\operatorname{rank} M_H(r, r)$.
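Theorem 13.14 lends itself to a direct computation. The sketch below, written under the same assumptions as the Markov-parameter recursion of Section 13.6.2, forms the Hankel matrix M_H(4, 4) for the transfer function matrix of Section 13.5 and computes its rank in exact rational arithmetic. The helper names (`markov_parameters`, `hankel`, `rank`) are illustrative choices, not notation from the notes.

```python
from fractions import Fraction

def markov_parameters(d, R, count):
    """H_1..H_count from d(s)H(s) = R_1 s^(r-1) + ... + R_r, d = [d_0, ..., d_(r-1)]."""
    r = len(d)
    l, m = len(R[0]), len(R[0][0])
    H = []
    for k in range(1, count + 1):
        Hk = [row[:] for row in R[k - 1]] if k <= r else [[0] * m for _ in range(l)]
        for j in range(1, min(r, k - 1) + 1):
            prev = H[k - j - 1]
            for a in range(l):
                for b in range(m):
                    Hk[a][b] -= d[r - j] * prev[a][b]
        H.append(Hk)
    return H

def hankel(H, i, j):
    """Block Hankel matrix M_H(i, j); H[k] holds the Markov parameter H_(k+1)."""
    rows = []
    for a in range(i):
        for p in range(len(H[0])):
            rows.append(sum((H[a + b][p] for b in range(j)), []))
    return rows

def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Data of Section 13.5: d(s) = s^4 + 4s^3 - 2s^2 - 12s + 9, R_k = N_(4-k).
d = [9, -12, -2, 4]
R = [[[0, 0], [0, 1]],
     [[1, 1], [0, -4]],
     [[6, 2], [-6, 5]],
     [[9, -3], [6, -2]]]

r = len(d)
H = markov_parameters(d, R, 2 * r - 1)   # M_H(r, r) needs H_1, ..., H_(2r-1)
M = hankel(H, r, r)
minimal_order = rank(M)
```

For this data `minimal_order` evaluates to 5, in agreement with the fifth-order minimal realization obtained in Section 13.5.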

For example, the pole polynomial of the transfer function matrix H(s) in Section 13.5 is

\[
p_H(s) = (s - 1)^2 (s + 3)^3,
\]
