
Control over Gaussian Channels With and Without Source–Channel Separation

Anatoly Khina, Member, IEEE, Elias Riedel Gårding, Gustav M. Pettersson, Victoria Kostina, Member, IEEE, and Babak Hassibi, Member, IEEE

Abstract— We consider the problem of controlling an unstable linear plant with Gaussian disturbances over an additive white Gaussian noise channel with an average transmit power constraint, where the signaling rate of communication may be different from the sampling rate of the underlying plant. Such a situation is quite common since sampling is done at a rate that captures the dynamics of the plant and that is often lower than the signaling rate of the communication channel. This rate mismatch offers the opportunity of improving the system performance by using coding over multiple channel uses to convey a single control action. In a traditional, separation-based approach to source and channel coding, the analog message is first quantized down to a few bits and then mapped to a channel codeword whose length is commensurate with the number of channel uses per sampled message. Applying the separation-based approach to control meets its challenges: first, the quantizer needs to be capable of zooming in and out to be able to track unbounded system disturbances, and second, the channel code must be capable of improving its estimates of the past transmissions exponentially with time, a characteristic known as anytime reliability. We implement a separated scheme by leveraging recently developed techniques for control over quantized-feedback channels and for efficient decoding of

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 708932. The work of E. Riedel Gårding was supported by the National Science Foundation (NSF) under Grant CCF-1566567 under the SURF program. The work of G. M. Pettersson was supported by The Boeing Company under the SURF program. The work of V. Kostina was supported in part by the NSF under Grant CCF-1566567. The work of B. Hassibi was supported in part by the NSF under grants CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by NASA’s Jet Propulsion Laboratory through the President and Director’s Fund, and by King Abdullah University of Science and Technology. This work was done in part while A. Khina and V. Kostina were visiting the Simons Institute for the Theory of Computing. The material in this paper was presented in part at the IEEE Conference on Decision and Control, Las Vegas, NV, USA, Dec. 2016.

A. Khina was with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA. He is now with the Department of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv 6997801, Israel (e-mail: [email protected]).

E. Riedel Gårding was with the Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, CB3 0WA, Cambridge, United Kingdom. He is now with the School of Engineering Sciences, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden (e-mail: [email protected]).

G. M. Pettersson is with the School of Engineering Sciences, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden (e-mail: [email protected]).

V. Kostina and B. Hassibi are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA (e-mails: {vkostina, hassibi}@caltech.edu).

anytime-reliable codes. We further propose an alternative, namely, to perform analog joint source–channel coding, thereby avoiding the digital domain altogether. For the case where the communication signaling rate is twice the sampling rate, we employ analog linear repetition as well as Shannon–Kotel’nikov maps to show a significant improvement in stability margins and linear-quadratic costs over separation-based schemes. We conclude that such analog coding performs better than separation, and can stabilize all moments as well as guarantee almost-sure stability.

Index Terms— Networked control systems, Gaussian channel, combined source–channel coding, quantization, Lloyd–Max algorithm, channel coding, tree codes.

I. INTRODUCTION

The current technological era of ubiquitous wireless connectivity and the Internet of Things exhibits an ever-growing demand for new and improved techniques for stabilizing cyber-physical and networked control systems, which as a result have been the subject of intense recent investigations [1]–[4]. Unlike traditional control with colocated plant, observer and controller, the components of such systems are separated by unreliable communication links. In many of these systems, the rate at which the output of the plant is sampled and observed, as well as the rate at which control inputs are applied to the plant, is different from the signaling rate with which communication occurs. The rate at which the plant is sampled and controlled is often governed by how fast the dynamics of the plant is, whereas the signaling rate of the communication depends on the bandwidth available, the noise levels, etc. As a result, there is no inherent reason why these two rates should be related and, in fact, the communication signaling rate is almost always higher than the control sampling rate.

This latter fact clearly gives us the opportunity to improve the performance of the system by conveying the information about each sampled output of the plant, and/or each control signal, through multiple uses of the communication channel.

The standard information-theoretic approach suggests quantizing the analog messages (the sampled output or control signal) and then protecting the quantized bits with a channel error-correcting code whose block length is commensurate with the number of channel uses available per sample. This approach relies on the source–channel separation principle, which proffers that quantization of the messages and channel coding of the quantized bits can be done independently of one another.


Nonetheless, while source–channel separation-based schemes become optimal in communication systems where large blocks of the message and the channel code are processed together (necessitating non-causal knowledge of all the message signals and entailing large delays)—a celebrated result [5], [6, Ch. 3.9]—this is no longer true for control systems that require real-time (low-delay) communication of causally available messages. Furthermore, since any error made in the past is magnified in each subsequent time step due to the unstable nature of the plant, the source–channel separation principle requires a stronger notion of error protection, termed anytime reliability by Sahai and Mitter [7]. Anytime reliability guarantees that the error probability of causally encoded quantized (“information”) bits decays faster than the inflation factor at each step. Sahai and Mitter [7] further observed that anytime-reliable codes have a natural tree code structure reminiscent of the codes developed by Schulman [8] for the related problem of interactive communication.

Sukhavasi and Hassibi [9] further showed that anytime reliability can be guaranteed with high probability by concentrating on the family of linear time-invariant (LTI) codes and choosing their coefficients at random. Unfortunately, maximum-likelihood (ML) (optimum) decoding of tree codes is infeasible.1 To overcome this problem, a sequential decoder [10], [11, Ch. 6.4], [12, Ch. 10], [13, Sec. 6.9], [14, Ch. 6], [15, Ch. 6] for tree codes was proposed in [16] and was shown to achieve anytime reliability with high probability while maintaining bounded expected decoding complexity, albeit with some loss of performance.

Tree codes transform the control task over a noisy channel to that over a noiseless channel with finite capacity C, implying that the channel code needs to be supplemented with an adequate fixed-rate quantizer (a.k.a. fixed-length lossy source coder). Such a quantizer compresses the analog signal to packets of exactly C bits to be communicated from the observer to the controller at every time step. As unstable systems with disturbances that have distributions with unbounded support cannot be stabilized by a static quantizer [17, Sec. III-A], adaptive uniform and logarithmic quantizers that establish stabilizability guarantees were devised by Yuksel [18] and Minero et al. [19], respectively. A greedy algorithm that recalculates, at every time step, the probability density function (PDF) of the source conditioned on the previously transmitted packets (à la sequential Bayesian filtering [20]) and applies Lloyd–Max quantization [21, Ch. 6] with respect to this PDF, was proposed by Bao et al. [22] (albeit without any optimality claims). For the scenario of disturbances with logarithmically concave (log-concave) distributions and scalar measurements, this algorithm has been recently shown to be greedily optimal in [23], i.e., it minimizes the linear-quadratic cost at each time step; it was further shown there to be nearly globally optimal.

An obvious alternative strategy to separated source/channel coding is to simply repeat the transmitted (analog) signal—this adds a linear factor to the SNR (3 dB for a single repetition). This strategy maps the analog control signals directly into

1Except over erasure channels, over which ML decoding amounts to solving linear equations [9].

analog communication signals, avoiding the digital domain altogether, and can therefore be viewed as a simple instance of joint source–channel coding (JSCC) [24].
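The 3 dB figure is easy to verify numerically. The snippet below is a minimal sketch (the SNR value, sample count, and variable names are our own illustrative choices, not from the paper): averaging two independently received copies of a unit-power analog sample halves the effective noise power.

```python
import numpy as np

# Analog repetition over two AWGN channel uses: averaging the two received
# copies halves the effective noise power, i.e., a 3 dB SNR gain.
rng = np.random.default_rng(0)
snr = 10.0                                # assumed transmit SNR (unit input power, noise power 1/snr)
s = rng.standard_normal(100_000)          # unit-power analog samples
n1 = rng.normal(0, 1 / np.sqrt(snr), s.size)
n2 = rng.normal(0, 1 / np.sqrt(snr), s.size)
s_hat = ((s + n1) + (s + n2)) / 2         # average the two noisy copies
noise_power = np.var(s_hat - s)           # empirically close to 1/(2*snr)
gain_db = 10 * np.log10((1 / snr) / noise_power)
```

Here `gain_db` comes out close to 3.01 dB, matching the "linear factor" claim above.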

Surprisingly, in the Gaussian rate-matched case, in which one additive white Gaussian noise (AWGN) channel use is available per one white Gaussian source sample, a simple amplifier achieves the Shannon limit with zero delay [25], [26]. The optimality of linear schemes extends further to the case where KC > 1 uses of an AWGN channel with perfect instantaneous feedback are available per one white Gaussian source sample [26]–[30]; the reason is that a Gaussian source is probabilistically matched to a Gaussian channel [31], an uncommon coincidence. Tatikonda and Mitter [32] exploited a special property of the erasure channel with feedback, in which a retransmission scheme attains its capacity without delay. A related example is control over a packet drop channel, considered by Sinopoli et al. [33]. There, a simple retransmission scheme attains the optimum, as long as the packet drop probability is not too high. Coding of Gauss–Markov sources over a packet erasure channel with feedback is studied in [34].

JSCC in the absence of probabilistic matching is challenging. In the Gaussian rate-mismatched case with no communication feedback, in which KC > 1 AWGN channel uses are available per one source sample, repetitive transmission of the source sample is suboptimal. Non-linear mappings are known to achieve better performance, as noted originally by Shannon [35] and Kotel’nikov [36], and in the context of control—in [37]–[39] (a setting akin to Witsenhausen’s celebrated counterexample [40]).
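To give a flavor of such non-linear 1:2 maps, the sketch below places a scalar on one of two interleaved Archimedean spiral arms (one per sign) and decodes by a nearest-point search on the curve. The parameterization, the stretch parameter `DELTA`, and the grid-search decoder are our own illustrative choices (power normalization is omitted); this is not the specific construction used later in the paper.

```python
import numpy as np

DELTA = 2.0  # assumed stretch parameter; trades resolution against robustness

def encode(s):
    """Map scalar s to a point on a sign-dependent Archimedean spiral arm."""
    theta = DELTA * abs(s)
    return np.sign(s) * np.array([theta * np.cos(theta), theta * np.sin(theta)]) / DELTA

def decode(b, grid=np.linspace(-6, 6, 24001)):
    """ML-like decoding: the grid source value whose image is closest to b."""
    pts = np.array([encode(s) for s in grid])
    return grid[np.argmin(np.sum((pts - b) ** 2, axis=1))]
```

In the noise-free case `decode(encode(s))` recovers `s` up to the grid resolution; under channel noise, small perturbations move the estimate along the curve, while large ones can jump between spiral turns, the threshold effect typical of such maps.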

In this work, we concentrate on the simple case of stabilizing a scalar discrete-time linear quadratic Gaussian (LQG) control system over an AWGN channel with KC = 2 channel uses per control sample, with a fixed signal-to-noise ratio (SNR). As we show in the sequel, this SNR imposes an upper limit on the magnitude of the largest unstable eigenvalue of the plant that can be stabilized.

We develop end-to-end separation-based and JSCC schemes and compare their LQG costs, as well as the minimum required SNR for stabilizing the system; to the best of our knowledge this is the first time the latter is devised for the scenario of LQG control over an AWGN channel (cf. [7]), whereas the former (in the absence of instantaneous feedback) was offered in the conference version of this work [41]. We show that JSCC schemes achieve far better performance while requiring far less computational and memory resources. We further observe that an inherent advantage of JSCC schemes is that they allow a graceful improvement in performance with the SNR, while the performance of separation-based schemes saturates due to their digital nature. Moreover, whereas separation-based schemes can guarantee only a finite number of bounded moments, certain JSCC schemes, e.g., linear ones (repetition-based included), can stabilize all moments (as well as guarantee almost-sure stability).

Another important product of this work is the implementation and simulation of all the aforementioned schemes and algorithms, along with a comparison of their performance; the Python 3 code is available online in [42].

An outline of the rest of the paper is as follows. We


Fig. 1. A scalar control system with an AWGN driving disturbance and an AWGN communication channel. The dashed line represents the assumption that the past control signals are available at the transmitter.

formulate the problem setup in Section II. Three different ingredients that are used to construct the schemes for control over noisy channels of this paper, namely, a quantizer for control over a noiseless finite-rate channel, an anytime-reliable tree code and an Archimedean bi-spiral-based JSCC map, are described in Sections III, IV and V, respectively. They are subsequently used in Sections VI and VII to develop source–channel separation-based and JSCC-based schemes for LQG control over an AWGN channel, and are compared in terms of their LQG cost in Section VIII. We conclude the paper with Section IX, by discussing the principal differences between the proposed schemes along with possible extensions.

II. PROBLEM SETUP

We now formulate the control–communications setting that will be treated in this work, depicted in Fig. 1. We concentrate on the simple case of a scalar linear fully-observable system. In contrast to traditional control settings, the observer and the controller are not colocated in this setting, and are connected instead via a scalar AWGN channel.

Remark II.1. The model and solutions can be extended to more complex cases of vector states and channels; see Section IX-B.

The control and transmission duration spans the time interval [T] ≜ {1, . . . , T}.

Plant: Scalar discrete-time linear system dynamics:

xt+1 = αxt + wt + ut, t ∈ [T − 1], (1)

where xt is the (scalar) state at time t, wt is an AWGN of power W, α is a known scalar satisfying |α| > 1, and ut is the control signal. We further assume that x0 = 0 for simplicity.2

Channel: We assume KC ∈ N channel uses are available per control sample. Hence, at each time instant t we can use the channel

bt;i = at;i + nt;i, i ∈ [KC ], t ∈ [T − 1], (2)

KC times, where bt;i is the i-th channel output corresponding to control sample t, at;i is the corresponding channel input subject to a unit power constraint

E[at;i^2] ≤ 1, (3)

2This assumption can be replaced with an initial state x0 that is distributed according to some known PDF, as is indeed the case in all subsequent steps.

and nt;i is an AWGN of power 1/SNR.3 We collect all the KC channel uses in a column vector and denote it, with a slight abuse of notation, by at ≜ (at;1, . . . , at;KC)^T, where ‘T’ denotes the transpose operation. The corresponding channel output vector is denoted by bt ≜ (bt;1, . . . , bt;KC)^T.
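For concreteness, the plant (1) and channel (2)–(3) can be sketched in Python, the language of the published code [42]; the parameter values below are illustrative and the function names are our own:

```python
import numpy as np

# Minimal sketch of the problem setup: plant (1), channel (2), power constraint (3).
ALPHA, W, SNR, KC = 2.0, 1.0, 10.0, 2   # illustrative values, not from the paper
rng = np.random.default_rng(0)

def plant_step(x, u):
    """x_{t+1} = alpha * x_t + w_t + u_t, with disturbance power W."""
    return ALPHA * x + rng.normal(0.0, np.sqrt(W)) + u

def channel(a):
    """b_{t;i} = a_{t;i} + n_{t;i}; the inputs must satisfy E[a^2] <= 1."""
    a = np.asarray(a, dtype=float)
    assert a.size == KC
    return a + rng.normal(0.0, 1.0 / np.sqrt(SNR), KC)
```

The transmitter and receiver maps (4)–(5) developed in the sequel plug in between `plant_step` and `channel`.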

Causal transmitter: At time t, generates KC channel inputs by applying a causal function Et : R^t × R^{t−1} → R^{KC} to the measured states x^t ≜ (x1, . . . , xt) and all past control signals u^{t−1} ≜ (u1, . . . , ut−1):

at = Et(x^t, u^{t−1}), (4)

where the input is subject to an average power constraint (3).

Remark II.2. In this work, we assume that the observer/transmitter knows all past control signals u^{t−1}; for a discussion of the scenario when such information is not available at the observer, see Section IX-C.

Causal receiver: At time t, observes KC channel outputs and generates a control signal ut by applying a causal function Dt : R^{tKC} → R to all the available channel outputs:

ut = Dt(b^t), (5)

where b^t ≜ (b1, . . . , bt)^T.

Cost: Similarly to the classical LQG control setting (in which the controller and the observer are colocated), we wish to minimize the average-stage LQG cost at the time horizon T:

JT ≜ (1/T) E[ QT xT^2 + Σ_{t=1}^{T−1} (Qt xt^2 + Rt ut^2) ],

for known non-negative weights {Qt} and {Rt}, by designing appropriate operations at the observer, which also plays the role of the transmitter over the channel (2); and the controller, which also serves as the receiver over the channel (2).

For the important special case of fixed parameters,

Qt ≡ Q, Rt ≡ R,

we further define the infinite-horizon cost,

J∞ ≜ lim_{T→∞} JT, (6)

assuming the limit exists.

We recall next recently developed schemes for quantization and channel coding for control, as well as results from information theory for JSCC design with low delay.

III. CONTROL WITH NOISELESS FINITE-RATE FEEDBACK

In this section we consider the model of Section II with the AWGN channel (2) replaced with a noiseless channel of finite capacity C, depicted in Fig. 2. That is, in this case, the channel, transmitter and receiver are as follows.

Channel: At time t, a packet ℓt ∈ {0, . . . , 2^C − 1} is sent over a noiseless channel of capacity C, meaning that the receiver obtains ℓt at time t.

Transmitter: The function Et (4), in this case, has a discrete codomain {0, . . . , 2^C − 1} (with no power constraints).

3This representation is without loss of generality since the case of an average power PC and noise power N can always be transformed to an equivalent channel with average power 1 and noise power N/PC ≜ 1/SNR by multiplying both sides of (2) by 1/√PC.


Receiver: The domain of Dt (5) is {0, . . . , 2^C − 1}.

Thus, the transmitter–receiver design amounts, in this case, to fixed-length sequential quantization.

A recent result [23] shows that an adaptive quantizer that successively calculates the PDF of xt given ℓ^t and applies Lloyd–Max quantization with respect to it is greedily optimal and close-to-globally optimal whenever fw is log-concave.

Definition III.1 (Log-concave function; see [43]). A function f : R → R≥0 is said to be log-concave if its logarithm log ◦ f is concave; we use the extended definition that allows f(x) to assign zero values, i.e., log f(x) ∈ R ∪ {−∞} is an extended real-valued function that can take the value −∞.

We recall the Lloyd–Max algorithm and its optimality guarantees in Section III-A; the appropriate adaptive networked control system is described in Section III-B.

A. Quantizer Design

The exposition in this section is based on [21, Part II].

Definition III.2 (Quantizer). A scalar quantizer Q of rate C is described by an encoder EQ : R → {0, . . . , 2^C − 1} and a decoder DQ : {0, . . . , 2^C − 1} → {c[0], . . . , c[2^C − 1]} ⊂ R. With a slight abuse of notation, we shall define the quantization operation Q : R → {c[0], . . . , c[2^C − 1]} as the composition of the encoding and decoding operations: Q = DQ ◦ EQ.4

The reproduction points are assumed, w.l.o.g., to be ordered:5

c[0] < c[1] < · · · < c[2^C − 1].

We denote by I[ℓ] the collection of all points that are mapped to index ℓ (equivalently, to the reproduction point c[ℓ]):

I[ℓ] ≜ {x | x ∈ R, EQ(x) = ℓ} = {x | x ∈ R, Q(x) = c[ℓ]}.

We shall concentrate on the following class of quantizers.

Definition III.3 (Regular quantizer). A scalar quantizer is regular if every cell I[ℓ] (ℓ = 0, . . . , 2^C − 1) is a contiguous interval that contains its reproduction point c[ℓ]:

c[ℓ] ∈ I[ℓ] = [p[ℓ], p[ℓ + 1]), ℓ = 0, . . . , 2^C − 1,

where p ≜ {p[0], . . . , p[2^C]} is the set of partition levels—the cell boundaries. Hence, a regular scalar quantizer can be represented by the input partition-level set p and the reproduction-point set c ≜ {c[0], . . . , c[2^C − 1]}.

Cost: The cost we wish to minimize is the mean squared error distortion between a source w with a given log-concave PDF fw and its quantization Q(w):

D ≜ E[(w − Q(w))^2] (7a)
  = Σ_{ℓ=0}^{2^C−1} ∫_{p[ℓ]}^{p[ℓ+1]} (w − c[ℓ])^2 fw(w) dw. (7b)

Denote by D∗ the minimal achievable distortion D; the optimal quantizer is the one that achieves D∗.

4The encoder and decoder that give rise to the same quantizer are unique up to a permutation of the labeling of the index ℓ.

5If some inequalities are not strict, then the quantizer can be reduced to another quantizer of lower rate.

Fig. 2. Scalar control system with a driving white Gaussian disturbance and a noiseless finite-capacity channel (“bit pipe”). The dashed line represents a bit-pipe of capacity C.

Remark III.1. Since fw is log-concave, it is continuous [43]. Hence, the inclusion/exclusion of the boundary points of each cell does not affect the distortion of the quantizer, meaning that ties at the boundary points can be broken systematically.

Remark III.2. We concentrate, in this work, on input PDFs with infinite support. Consequently p[0] = −∞ and p[2^C] = ∞, and the leftmost and rightmost intervals are open.

The optimal quantizer satisfies the following necessary conditions [21, Ch. 6.2].

Proposition III.1 (Nearest-neighbor condition). For a fixed reproduction-point set c (fixed decoder), the partition-level set p (encoder) that minimizes the distortion D (7) is

p[ℓ] = (c[ℓ − 1] + c[ℓ]) / 2, ℓ = 1, 2, . . . , 2^C − 1, (8)

and p[0] = −∞ and p[2^C] = ∞.

Proposition III.2 (Centroid condition). For a fixed partition-level set p (fixed encoder), the reproduction-point set c (decoder) that minimizes the distortion D (7) is

c[ℓ] = E[w | p[ℓ] ≤ w < p[ℓ + 1]], ℓ = 0, . . . , 2^C − 1. (9)

The optimal quantizer must simultaneously satisfy both (9) and (8); iterating between these two necessary conditions gives rise to the Lloyd–Max algorithm.

Algorithm III.1 (Lloyd–Max).

Initialization. Pick an initial reproduction-point set c.
Iteration. Repeat the two steps
1) Fix c and set p as in (8),
2) Fix p and set c as in (9),
interchangeably, until the decrease in the distortion D per iteration goes below a desired threshold.

Propositions III.2 and III.1 suggest that the distortion decreases at every iteration; since the distortion is bounded from below by zero, the Lloyd–Max algorithm is guaranteed to converge to a local optimum.

Unfortunately, multiple local optima may exist in general (e.g., for Gaussian mixtures with well-separated components), rendering the algorithm sensitive to the initial choice of c.

Nonetheless, sufficient conditions for the existence of a unique global optimum were established in [44]–[46]. These guarantee the convergence of the algorithm to the global optimum for any initial choice of c. An important class of PDFs that satisfy these conditions is that of the log-concave PDFs.
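As an illustration, Algorithm III.1 specializes neatly to a standard Gaussian source, for which the centroid condition (9) has a closed form via truncated-normal means. The following is a minimal sketch (the initialization and fixed iteration count are our own choices, in place of a distortion-decrease threshold):

```python
import numpy as np
from math import erf, exp, pi, sqrt

def phi(x):
    """Standard normal PDF."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def centroid(a, b):
    """E[w | a <= w < b] for w ~ N(0, 1): the centroid condition (9)."""
    return (phi(a) - phi(b)) / (Phi(b) - Phi(a))

def lloyd_max(rate, iters=200):
    """Lloyd-Max quantizer (Algorithm III.1) for a standard Gaussian source."""
    n = 2 ** rate
    c = np.linspace(-2.0, 2.0, n)  # initial reproduction points (our choice)
    for _ in range(iters):
        # Nearest-neighbor condition (8): cell boundaries at midpoints.
        p = np.concatenate(([-np.inf], (c[:-1] + c[1:]) / 2, [np.inf]))
        # Centroid condition (9): conditional mean of each cell.
        c = np.array([centroid(p[l], p[l + 1]) for l in range(n)])
    return p, c
```

For rate 1 this converges to c = ±√(2/π) ≈ ±0.798, and for rate 2 to approximately ±0.453 and ±1.510, the classical values for the Gaussian source; by Theorem III.1 below, log-concavity guarantees these are the unique optima.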


Theorem III.1 ([44]–[46]). Let the source PDF fw be log-concave. Then, the Lloyd–Max algorithm converges to a unique solution that minimizes the quadratic distortion (7).

B. Controller Design

We now describe the optimal greedy control policy, the implementation of which is available in [42, tree/master/code/separate/control]. To that end, we make use of the following lemma that extends the separation principle of estimation and control to networked control systems.

Lemma III.1 ([47], [29]). The optimal control law is given by

ut = −Lt x̂t,

where

Lt = α St+1 / (Rt + St+1) (10)

is the optimal linear quadratic regulator (LQR) control gain, x̂t ≜ E[xt | ℓ^t], and St satisfies the dynamic Riccati backward recursion [48] with ST = QT and ST+1 = LT = 0:

St = Qt + α^2 St+1 Rt / (St+1 + Rt).

Moreover, this controller achieves the cost6

JT = (1/T) Σ_{t=1}^{T} (St W + Gt E[(xt − x̂t)^2]),

with Gt = α^2 St+1 − St + Qt.

Remark III.3. Lemma III.1 holds true for more general channels, with x̂t = E[xt | b^t], where b^t is the channel output [29].
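The recursion in Lemma III.1 is straightforward to implement. Below is a minimal sketch for fixed weights Qt ≡ Q, Rt ≡ R (the function name and the chosen horizon are our own):

```python
import numpy as np

def lqr_gains(alpha, Q, R, T):
    """Backward Riccati recursion of Lemma III.1 for the scalar plant
    x_{t+1} = alpha * x_t + w_t + u_t, with fixed weights Q and R."""
    S = np.zeros(T + 2)   # S[T+1] = 0 by convention
    L = np.zeros(T + 1)   # L[T] = 0: no control action at the horizon
    S[T] = Q              # terminal condition S_T = Q_T
    for t in range(T - 1, 0, -1):
        # S_t = Q_t + alpha^2 * S_{t+1} R_t / (S_{t+1} + R_t)
        S[t] = Q + alpha ** 2 * S[t + 1] * R / (S[t + 1] + R)
        # L_t = alpha * S_{t+1} / (R_t + S_{t+1}), equation (10)
        L[t] = alpha * S[t + 1] / (R + S[t + 1])
    return S, L
```

For |α| > 1 and fixed Q, R, the early-time values converge quickly to the stationary solution of S = Q + α²SR/(S + R); e.g., for α = 2, Q = R = 1, S converges to 2 + √5 ≈ 4.236 and L to ≈ 1.618.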

The optimal greedy algorithm minimizes the estimation distortion E[(xt − x̂t)^2] at time t, without regard to its effect on future distortions. To that end, at time t, the transmitter and the receiver calculate the PDF of xt conditioned on ℓ^{t−1} and u^{t−1}, f(xt | ℓ^{t−1}, u^{t−1}), and apply the Lloyd–Max quantizer to this PDF.7 We refer to f(xt | ℓ^{t−1}, u^{t−1}) and to f(xt | ℓ^t, u^{t−1}) as the prior and posterior PDFs, respectively.

Although the optimal greedy algorithm does not achieve global optimality [49], its loss is negligible [23].

Algorithm III.2 (Optimal greedy control).

Initialization. Both the transmitter and the receiver set
1) ℓ0 = x̂0 = u0 = 0.
2) Prior PDF: f(x1 | ℓ0, u0) ≡ f(x1).

Observer/Transmitter. At time t ∈ [T − 1]:
1) Observes xt.
2) Runs the Lloyd–Max algorithm (Algorithm III.1) with respect to the prior PDF f(xt | ℓ^{t−1}, u^{t−1}) to obtain the quantizer Qt of rate C; we denote its partition-level and reproduction-point sets by pt and ct, respectively.

6We set RT = 0 and ℓT = 0 for the definition of x̂T, as no transmission or control action is performed at time T.

7Since u^{t−1} is a deterministic function of ℓ^{t−1}, it suffices to condition on ℓ^{t−1}. However, when incorporating Algorithm III.2 into a separation-based scheme, this distinction becomes useful since u^{t−1} becomes a function of possibly corrupted channel outputs in this case.

3) Quantizes the system state x_t [recall Definition III.2]:

ℓ_t = EQ_t(x_t),    x̂_t = Q_t(x_t) = DQ_t(ℓ_t).

4) Transmits the quantization index ℓ_t.
5) Calculates the posterior PDF f(x_t|ℓ^t):

f_{x_t|ℓ^t,u^{t−1}}(x_t|ℓ^t, u^{t−1}) = f_{x_t|ℓ^{t−1},u^{t−1}}(x_t|ℓ^{t−1}, u^{t−1}) / γ for x_t ∈ I[ℓ_t], and 0 otherwise,

where I[ℓ_t] ≜ [p_t[ℓ_t], p_t[ℓ_t + 1]) as in Definition III.3, and

γ ≜ ∫_{p_t[ℓ_t]}^{p_t[ℓ_t+1]} f_{x_t|ℓ^{t−1},u^{t−1}}(α|ℓ^{t−1}, u^{t−1}) dα.

6) Calculates the next prior PDF using (1) and u_t = −L_t x̂_t:

f_{x_{t+1}|ℓ^t,u^t}(x_{t+1}|ℓ^t, u^t) = (1/|α|) f_{x_t|ℓ^t,u^{t−1}}( (x_{t+1} − u_t)/α | ℓ^t, u^{t−1} ) ∗ f_w(x_{t+1}),

where '∗' denotes the convolution operation, and the two convolved terms correspond to the PDFs of the quantization error α(x_t − x̂_t) and the disturbance w_t.

Controller/Receiver. At time t ∈ [T − 1]:
1) Runs the Lloyd–Max algorithm (Algorithm III.1) with respect to the prior PDF f(x_t|ℓ^{t−1}) as in Step 2 of the observer/transmitter protocol.
2) Receives the index ℓ_t.
3) Reconstructs the quantized value: x̂_t = DQ_t(ℓ_t).
4) Applies the control actuation u_t = −L_t x̂_t to the system.
5) Calculates the posterior PDF f(x_t|ℓ^t) and the next prior PDF f(x_{t+1}|ℓ^t) as in Steps 5 and 6 of the observer/transmitter protocol.

Theorem III.2 ([23]). Let f_w be a log-concave PDF. Then Algorithm III.2 constitutes the optimal greedy control policy.

The following is an immediate consequence of the log-concavity of the Gaussian PDF.

Corollary III.1. Let f_w be a Gaussian PDF. Then Algorithm III.2 constitutes the optimal greedy control policy.
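Steps 5 and 6 of Algorithm III.2 (posterior truncation, then scaling, shifting, and convolving with the disturbance PDF) can be carried out numerically on a grid. The sketch below uses our own discretization choices and illustrative parameter values (α = 2, unit-variance Gaussian w_t); it is not the reference implementation:

```python
import numpy as np

alpha, W = 2.0, 1.0                     # plant gain and disturbance variance
grid = np.linspace(-30.0, 30.0, 6001)   # uniform PDF grid with 0 at the center
dx = grid[1] - grid[0]

def normalize(p):
    return p / (p.sum() * dx)

def posterior(prior, lo, hi):
    """Step 5: truncate the prior to the decoded cell [lo, hi) and rescale."""
    return normalize(np.where((grid >= lo) & (grid < hi), prior, 0.0))

def next_prior(post, u):
    """Step 6: PDF of x_{t+1} = alpha*x_t + u + w_t, i.e., the scaled and
    shifted posterior convolved with the disturbance PDF f_w."""
    scaled = np.interp((grid - u) / alpha, grid, post) / abs(alpha)
    f_w = normalize(np.exp(-grid ** 2 / (2 * W)))
    return normalize(np.convolve(scaled, f_w, mode='same') * dx)
```

Starting from a standard Gaussian prior and decoding the cell [0, ∞), the posterior mean is √(2/π), and the next prior has variance α²(1 − 2/π) + W, as the update predicts.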

IV. ANYTIME-RELIABLE CODES

We now describe causal error correcting codes that make it possible to meet the assumption of a noiseless finite-capacity channel of Section III over the AWGN channel (2).

Since any decoding mistake is multiplied by α (and the corresponding second moment, i.e., power, by α²) at every time step, the code should have an error probability that decays exponentially with time, with an exponent that is greater than α²;⁸ see [7], [9], [16] for further details and discussion.

We construct such codes, termed anytime-reliable codes [7], for memoryless binary-input output-symmetric (MBIOS) channels, and then apply these results to the AWGN channel by employing appropriate digital constellations.

8To stabilize higher moments one needs higher exponents. See the discus-sion in Section IX-A below.


[Fig. 3 block diagram: ı_t → Encoder → c_t → channel w(z|c) → z_t → Decoder → {ı̂_{i|t}}_{i=1}^t.]

Fig. 3. MBIOS channel with reconstructions of all past information bits.

Definition IV.1 (MBIOS channel). A binary-input channel is a system with binary input alphabet {0, 1}, output alphabet Z, and two probability transition functions: q(z|0) for input c = 0 and q(z|1) for input c = 1. The channel is said to be memoryless if the probability distribution of the output depends only on the input at that time and is conditionally independent of previous and future channel inputs and outputs. It is further said to be output-symmetric if there exists an involution π : Z → Z, i.e., a permutation that satisfies π⁻¹ = π, such that q(π(z)|0) = q(z|1) for all z ∈ Z.⁹

The encoder and the resulting code need to be causal in our case, due to the sequential nature of the information stream. That is, at time instant t, k new information bits ı_t are fed to the encoder; the encoder then produces n coded bits c_t by encoding all of the available information bits ı^t:

c_t = ℰ_t(ı^t),    (11)

using an encoding function ℰ_t : {0, 1}^{kt} → {0, 1}^n, agreed upon by the encoder and the decoder prior to transmission.

The sequential encoding operation can be conveniently viewed as advancing over a prefix tree (trie), and the corresponding codes are therefore referred to as tree codes.

At time t, the decoder recovers estimates {ı̂_{i|t}}_{i=1}^t of all the past information bits ı^t by applying a causal function 𝒟_t : Z^{nt} → {0, 1}^{kt} to all the received channel outputs z^t, to produce (see also Fig. 3)

(ı̂_{1|t}, ı̂_{2|t}, …, ı̂_{t|t}) = 𝒟_t(z^t).    (12)

One is then assigned the task of choosing a sequence of function pairs {(ℰ_t, 𝒟_t) | t ∈ N} that provides anytime reliability. We recall this definition, as stated in [9].

Definition IV.2 (Anytime reliability). Define the probability of the first error event at time t happening d steps back (delay) as

P_e(t, d) ≜ P( ı̂_{t−d|t} ≠ ı_{t−d}, ı̂_{t−δ|t} = ı_{t−δ} ∀δ > d ),

where the probability is over the randomness of the information bits {ı_t} and the channel noise. Suppose we are assigned a budget of n channel uses per time step of the evolution of the plant. Then, an encoder–decoder pair is called (R, β) anytime reliable if there exist A ∈ R and d_0 ∈ N such that

P_e(t, d) ≤ A 2^{−βnd},    ∀t, d ≥ d_0,    (13)

where β is called the anytime exponent.

Remark IV.1. The requirement d ≥ d_0 in (13) can always be dropped by replacing A with a larger constant. Conversely, A can be replaced with 1 by reducing β by ε > 0, however small, and taking a large enough d_0. Nonetheless, we use both A and d_0 in the definition for convenience.

⁹This also extends to additive noise channels, such as the binary-input AWGN channel.

A. LTI Anytime-Reliable Codes under ML Decoding

Following Sukhavasi and Hassibi [9], we now present a linear time-invariant (LTI) anytime-reliable code ensemble under ML decoding.

When restricted to an LTI ("tree") code, each function ℰ_t can be characterized by a set of matrices {G_1, …, G_t}, where G_i ∈ Z_2^{n×k}. The sequence of information bits available at time t, {ı_i}_{i=1}^t, is encoded as

c_t = G_t ı_1 + G_{t−1} ı_2 + · · · + G_1 ı_t,    (14)

or, equivalently, in matrix form:

c = G_{n,R} ı,

with

G_{n,R} =
[ G_1  0        0       · · ·
  G_2  G_1      0       · · ·
  ⋮             ⋱        ⋱
  G_t  G_{t−1}  · · ·   G_1   0
  ⋮    ⋮                 ⋮    ⋱ ],    (15a)

ı^T = [ ı_1^T  ı_2^T  · · ·  ı_t^T  · · · ],    (15b)

c^T = [ c_1^T  c_2^T  · · ·  c_t^T  · · · ].    (15c)

We now define the random LTI tree code ensemble.

Definition IV.3 (LTI tree code ensemble). The ensemble of LTI tree codes of rate R = k/n that map the kt information bits available at time step t into n coded bits, where the entries of all the blocks {G_i} of G_{n,R} in (15a) are i.i.d. and uniform over {0, 1}.
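The causal encoding rule can be sketched directly: draw i.i.d. uniform binary blocks G_i and form each output block as a GF(2) combination of all past input blocks, following the Toeplitz structure of (15a). The toy parameters and function names below are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, T = 2, 4, 8                        # rate R = k/n, horizon T
G = rng.integers(0, 2, size=(T, n, k))   # i.i.d. uniform blocks G_1, ..., G_T

def encode(blocks):
    """Causal LTI tree-code encoder: at time t, output over GF(2)
    c_t = G_t i_1 + G_{t-1} i_2 + ... + G_1 i_t  [cf. (15a)]."""
    out = []
    for t in range(1, len(blocks) + 1):
        c_t = np.zeros(n, dtype=np.int64)
        for i in range(1, t + 1):
            c_t ^= (G[t - i] @ blocks[i - 1]) % 2   # block G_{t-i+1} acts on i_i
        out.append(c_t)
    return out
```

Two sanity checks follow from the structure: the encoder is causal (prefixes of the input determine prefixes of the output), and it is linear over GF(2).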

Theorem IV.1 (Error exponent under ML decoding). Let q be an MBIOS channel. Let further ε > 0 and d_0 ∈ N. Then, the probability that a particular code from the random LTI tree code ensemble of Definition IV.3 has an anytime exponent (13) of E_G(R) − ε, for all t ∈ N and d > d_0, under optimal (ML) decoding, is bounded from below by

Pr( ⋂_{t=1}^∞ ⋂_{d=d_0}^t { P_e(t, d) ≤ 2^{−[E_G(R)−ε]nd} } ) ≥ 1 − 2^{−εnd_0} / (1 − 2^{−εn}),

where E_G is the block random-coding error exponent [50, Ch. 9], [13, Sec. 5.6], [12, Ch. 7]:

E_G(R) ≜ max_{0≤ρ≤1} [ E_0(ρ) − ρR ],    (16a)

E_0(ρ) ≜ − log ∑_{z∈Z} [ (1/2) q^{1/(1+ρ)}(z|0) + (1/2) q^{1/(1+ρ)}(z|1) ]^{1+ρ}.    (16b)

Thus, for any ε > 0, however small, this probability can be made arbitrarily close to 1 by taking d_0 to be large enough.
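For a concrete MBIOS channel, (16) is straightforward to evaluate numerically, e.g., for a binary symmetric channel with crossover probability p. The sketch below (logs taken in bits; the grid search over ρ is our own choice) is illustrative:

```python
import numpy as np

def E0(rho, p):
    """Gallager's E_0(rho) of (16b) for a BSC(p), logarithms in bits."""
    a = 1.0 / (1.0 + rho)
    s = sum((0.5 * q0 ** a + 0.5 * q1 ** a) ** (1 + rho)
            for q0, q1 in [(1 - p, p), (p, 1 - p)])    # outputs z = 0, 1
    return -np.log2(s)

def EG(R, p):
    """Block random-coding exponent (16a): maximize over 0 <= rho <= 1."""
    return max(E0(rho, p) - rho * R for rho in np.linspace(0.0, 1.0, 1001))
```

As expected, E_0(0) = 0, E_0(1) recovers the cutoff rate 1 − log₂(1 + 2√(p(1−p))) of the BSC, and E_G(R) is decreasing in R.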

Unfortunately, ML decoding requires searching over all possible codewords, the number of which grows exponentially fast with time, rendering it infeasible except over erasure channels [9]. We therefore turn to sequential decoding, which trades some performance for a feasible expected complexity.

Page 7: Control over Gaussian Channels With and Without Source ...anatolyk/papers/journal/... · source sample[26]–[30], the reason is that a Gaussian source is probabilistically matched

7

B. LTI Anytime-Reliable Codes under Sequential Decoding

Instead of an exhaustive search over all possible codewords, the complexity of which grows as O(2^{kt}), as is done in ML decoding, one may restrict the search to only the most likely codeword paths, such that the per-letter complexity does not grow significantly with time. Such algorithms are known collectively as sequential decoding algorithms.

In this work, we shall concentrate on one of the two popular variants of this algorithm, the Stack Algorithm, the other being the Fano Algorithm. The former achieves better time complexity and a smaller error probability, but is more expensive in terms of memory, compared to the latter. Nonetheless, the anytime exponent of both algorithms is the same; for a treatment of the Fano algorithm, which is similar to the one presented next, the reader is referred to [16].

We next summarize the relevant properties of the stack decoding algorithm when using the generalized Fano metric (see, e.g., [12, Ch. 10]) to compare possible codeword paths:

M(c_1, …, c_N) = ∑_{t=1}^N M(c_t),    (17a)

M(c_t) ≜ log [ q(z_t|c_t) / ∑_{c′∈{0,1}^n} (1/2)^n q(z_t|c′) ] − nB,    (17b)

where B is referred to as the metric bias; it penalizes longer paths when the metrics of different-length paths are compared.

In contrast to ML decoding, where at time t all possible paths (of length kt) are explored to determine the path with the maximal total metric,¹⁰ when using the stack sequential decoding algorithm, a list of partially explored paths is stored in a priority queue; at each step, the path with the highest metric is further explored and replaced with its immediate descendants and their metrics. The stack algorithm is outlined in Algorithm IV.1 and implemented in [42, tree/master/code/separate/coding], [51]; for a detailed description of the stack algorithm (as well as the Fano algorithm and variants thereof), see [11, Ch. 6.4], [12, Ch. 10], [13, Sec. 6.9], [14, Ch. 6], [15, Ch. 6].
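The priority-queue search of Algorithm IV.1 can be sketched in a few lines with a binary heap. The toy rate-1/2, memory-1 tree code and the BSC branch metric below are our own illustrative choices (for a BSC, the denominator of (17b) equals 2^{−n}, which simplifies the per-step metric):

```python
import heapq, itertools
import numpy as np

def stack_decode(z, branch_metric, k, depth):
    """Stack algorithm: pop the partial path with the largest Fano metric,
    extend it by all 2^k one-step input blocks, and stop as soon as a path
    of full length is popped."""
    tie = itertools.count()                    # FIFO tie-breaker for the heap
    heap = [(0.0, next(tie), ())]              # entries: (-metric, tie, path)
    while heap:
        neg_m, _, path = heapq.heappop(heap)
        if len(path) == depth:
            return list(path)                  # first full-length pop wins
        for blk in itertools.product((0, 1), repeat=k):
            ext = path + (blk,)
            m = -neg_m + branch_metric(ext, z[len(path)])
            heapq.heappush(heap, (-m, next(tie), ext))

# Toy rate-1/2 LTI tree code with memory 1: c_t = (i_t ^ i_{t-1}, i_t).
def enc_step(path):
    i_t = path[-1][0]
    i_p = path[-2][0] if len(path) > 1 else 0
    return (i_t ^ i_p, i_t)

def fano_branch(path, z_t, p=0.1, B=0.5, n=2):
    """Per-step generalized Fano metric (17b) for a BSC(p), logs in bits."""
    c_t = enc_step(path)
    agree = sum(ci == zi for ci, zi in zip(c_t, z_t))
    return agree * np.log2(1 - p) + (n - agree) * np.log2(p) + n - n * B
```

With this toy code, the decoder recovers the message from the clean codeword, and also corrects a single flipped channel bit.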

Theorem IV.2 (Error exponent under sequential decoding). Let q be an MBIOS channel. Let further ε > 0 and d_0 ∈ N. Then, the probability that a particular code from the random LTI tree code ensemble of Definition IV.3 has an anytime exponent (13) of E_J(R) − ε, for all t ∈ N and d > d_0, under sequential stack decoding, is bounded from below by

Pr( ⋂_{t=1}^∞ ⋂_{d=d_0}^t { P_e(t, d) ≤ A 2^{−[E_J(R)−ε]nd} } ) ≥ 1 − 2^{−εnd_0} / (1 − 2^{−εn}),

where E_J is Jelinek's sequential decoding exponent:

E_J(B, R) ≜ max_{0≤ρ≤1} ( ρ/(1+ρ) ) { E_0(ρ) + B − (1+ρ)R },

E_0 is given in (16b), and A is finite for B < E_0(1) and is bounded from above by¹¹

A ≤ (1 − e^{−t[E_0(ρ)−ρB]}) / (1 − e^{−[E_0(ρ)−ρB]}) ≤ 1 / (1 − e^{−[E_0(ρ)−ρB]}) < ∞.

¹⁰Note that optimizing (17a) in this case is equivalent to ML decoding.
¹¹Note that E_0(ρ)/ρ is a monotonically decreasing function of ρ; therefore, B < E_0(1) guarantees that E_0(ρ) − ρB > 0.

Thus, for any ε > 0, however small, this probability can be made arbitrarily close to 1 by taking d_0 to be large enough.

Since E_J(B, R) is a monotonically increasing function of B, choosing B = E_0(1) maximizes the exponential decay of P_e(t, d) in d.¹² Interestingly, for this choice of bias, we have E_J(E_0(1), R) = E_G(R) whenever E_G(R) is achieved by ρ = 1 in (16a), i.e., for rates below the critical rate. For other values of ρ, E_J(E_0(1), R) is strictly smaller than E_G(R).

The choice B = R, on the other hand, is known to minimize the expected computational complexity (which has a Pareto distribution; see [16] for details), and is therefore a popular choice in practice. Moreover, for rates below the cutoff rate, R < E_0(1), the expected number of metric evaluations (17b) at each time instant is finite and does not depend on d, for any B ≤ E_0(1) [13, Sec. 6.9], [12, Ch. 10]. Thus, the only increase in the expected complexity of this algorithm with d comes from an increase in the complexity of evaluating the metric of a single symbol (17b). Since the latter increases (at most) linearly with d, the total complexity of the algorithm grows polynomially in d. Furthermore, for rates above the cutoff rate, R > E_0(1), the expected complexity is known to grow rapidly with the code length for any metric [52], implying that the algorithm is applicable only for rates below the cutoff rate E_0(1).

C. Modulation

In order to support the transmission of more than one coded bit per channel use, we modulate the bits using pulse-amplitude modulation (PAM). Specifically, we map every n/K_C = k consecutive coded bits of c_t = (c_{t;1}, …, c_{t;n})^T into a point of a constellation of size 2^k, where k is the number of information bits that need to be conveyed at each time step t. We normalize the constellation to have an average unit power:

a_{t;i} = √( 3 / (2^{2k} − 1) ) ∑_{j=1}^k 2^{j−1} (−1)^{c_{t;j+ki}},    i ∈ [K_C].    (18)
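The constellation mapping can be sketched as follows; the bit-to-level rule below (signed sums of powers of two, producing the odd PAM levels, scaled so that equiprobable bits yield unit average power) is one natural reading of (18), and the grouping of bits is illustrative:

```python
import numpy as np

def pam_map(coded_bits, k):
    """Map each group of k coded bits to a 2^k-PAM point: signed sums of
    powers of two give the odd levels, scaled by sqrt(3/(2^(2k)-1)) so that
    equiprobable bits yield unit average power."""
    scale = np.sqrt(3.0 / (2 ** (2 * k) - 1))
    return np.array([scale * sum(2 ** j * (-1) ** b
                                 for j, b in enumerate(coded_bits[i:i + k]))
                     for i in range(0, len(coded_bits), k)])
```

For k = 2, the four possible bit pairs map to four distinct levels with unit average power.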

V. LOW-DELAY JOINT SOURCE–CHANNEL CODING

In this section, we review known results from information theory for transmitting an i.i.d. zero-mean Gaussian source s_t of power P_S over an AWGN channel (2). Following the problem setup of Section II, we consider the case where K_C ∈ N channel uses of (2) are available per each source sample s_t. We suppress the time index t throughout this section.

The goal of the transmitter is to convey the source s to the receiver with the minimal possible average distortion, where the appropriate distortion measure for our case of interest is the mean square error distortion.

Transmitter: Similarly to (4), generates K_C channel inputs by applying a function ℰ : R → R^{K_C} to the source sample s:

a = ℰ(s),

¹²For finite values of d, a lower choice of B may be better, since the constant A might be smaller in this case.


Algorithm IV.1 Sequential Decoding Stack Algorithm.

Q ← MaxPriorityQueue(node ↦ node.metric)    ▷ Leaf nodes, ordered by Fano metric
Q.push_with_priority(root)
while Q.top.depth < t do    ▷ Stop at the first sequence to reach full length
    node ← Q.pop()    ▷ Take the sequence with the largest metric
    for child ∈ node.create_children() do    ▷ Replace it with its 2^k extensions
        Q.push_with_priority(child)
return Q.top.input_sequence()    ▷ Reconstruct the input sequence by backtracking

where a is defined as in (4) and is subject to the average power constraint (3).

Receiver: Similarly to (4), observes the K_C channel outputs b [defined as in (4)] and constructs an estimate ŝ of s by applying a function 𝒟 : R^{K_C} → R:

ŝ = 𝒟(b).

Cost: The cost, commonly referred to as the average distortion in the context of JSCC, is defined by

D = E[(s − ŝ)²],

and the corresponding (source) signal-to-distortion ratio (SDR) is defined as

SDR ≜ P_S / D.

Our results here are more easily presented in terms of unbiased errors, as these can be regarded as uncorrelated additive noise in the sequel (when used as part of the developed control scheme). Therefore, we consider the use of (sample-wise) correlation-sense unbiased estimators (CUBE), namely, estimators that satisfy

E[s(s − ŝ)] = 0.

We note that any estimator ŝ_B can be transformed into a CUBE ŝ by multiplying by a suitable constant:

ŝ = ( E[s²] / E[s ŝ_B] ) ŝ_B;    (19)

for a further discussion of such estimators and their use in communications, the reader is referred to [24].

Shannon's celebrated result [5] states that the minimal achievable distortion, using any transmitter–receiver scheme, is dictated, in the case of a Gaussian source, by¹³

(1/2) log(1 + SDR) = R(D) ≤ K_C C = (K_C/2) log(1 + SNR),    (20)

where R(D) is the rate–distortion function of the source and C is the channel capacity [5]; this result remains true even in the presence of feedback, when the channel outputs are available at the transmitter [31, Ch. 1.5]. Thus, the optimal SDR, commonly referred to as the optimum performance theoretically achievable (OPTA) SDR, is given by

SDR_OPTA = (1 + SNR)^{K_C} − 1.    (21)

¹³The rate–distortion function here is written in terms of the unbiased SDR, in contrast to the more common biased-SDR expression (1/2) log(SDR).

While (21) is attainable via separation for K_S source samples and K_S K_C channel uses in the limit of large K_S, it is unknown, in general, how closely (21) can be approached at finite delay. Here we focus on the scenario of interest to control, namely, the zero-delay case, in which a single Gaussian sample is instantaneously mapped to K_C channel uses.

We next concentrate on the case of K_C > 1 and perfect instantaneous feedback in Section V-A. We further treat the case of K_C = 2 when no feedback is available in Section V-B.

A. With Feedback

When perfect instantaneous feedback is available, the following simple scheme, due to Elias [26] (see also [31, Ch. 3.5] and the references therein), is known to achieve SDR_OPTA for K_C ∈ N.

Scheme V.1 (JSCC with feedback).

Transmitter. At channel use i ∈ [K_C]:
• Calculates the MMSE estimation error of the source s given all past outputs (b_1, …, b_{i−1}) (available via the instantaneous feedback):

s̃^MMSE_{i−1} = s − ŝ^MMSE_{i−1},

where the MMSE estimate ŝ^MMSE_i of s given (b_1, …, b_i) is equal to (clearly, ŝ^MMSE_0 = 0)

ŝ^MMSE_i = ŝ^MMSE_{i−1} + SNR √( P_S / (1 + SNR)^{i+1} ) b_i,    (22)

• Transmits the estimation error s̃^MMSE_{i−1} after a suitable power adjustment:

a_i = √( (1 + SNR)^{i−1} / P_S ) s̃^MMSE_{i−1}.    (23)

Receiver. At channel use i ∈ [K_C]:
• Calculates the MMSE estimate ŝ^MMSE_i of s from (b_1, …, b_i) as in (22).
• Calculates the CUBE estimate ŝ_i of s from (b_1, …, b_i) using (19):

ŝ_i = ( (1 + SNR)^i / ((1 + SNR)^i − 1) ) ŝ^MMSE_i.

Theorem V.1 ([26]). Scheme V.1 achieves the OPTA SDR (21).

We provide a short proof for completeness.

Proof. The transmitter calculates the MMSE estimate from the channel outputs, which are available to it via the feedback, and transmits the estimation error with a proper power adjustment.


Clearly, ŝ^MMSE_0 = E[s] = 0.

At channel use i, the MMSE estimate is given by

ŝ^MMSE_i ≜ E[s | b_1, …, b_i]    (24a)
= E[ ŝ^MMSE_{i−1} + s̃^MMSE_{i−1} | b_1, …, b_i ]    (24b)
= ŝ^MMSE_{i−1} + E[ s̃^MMSE_{i−1} | b_1, …, b_i ]    (24c)
= ŝ^MMSE_{i−1} + ( SNR/(1 + SNR) ) √( P_S / (1 + SNR)^{i−1} ) b_i,    (24d)

where (24c) holds since (s̃^MMSE_{i−1}, b_i) are independent of b_1, …, b_{i−1}, due to the structure of a_i (23) and the fact that the MMSE estimation error is orthogonal to all the measurements, and hence also independent of them by Gaussianity; and (24d) holds since the MMSE estimator is linear in the Gaussian case.

The MMSE is equal to the conditional MMSE in the Gaussian case, and is given by

E[ (s̃^MMSE_i)² | b_1, …, b_i ] = E[ (s̃^MMSE_i)² ] = P_S / (1 + SNR)^i.

This concludes the proof. ∎
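The scheme and its analysis can be checked numerically. The Monte-Carlo sketch below is our own test harness (with the channel noise variance set to 1/SNR, so that the unit-power inputs see the stated SNR); it reproduces the OPTA SDR (21):

```python
import numpy as np

def elias_jscc(snr=1.0, Kc=3, Ps=1.0, trials=200_000, seed=0):
    """Monte-Carlo check of Scheme V.1: transmit the running MMSE error with
    the power adjustment (23), update the MMSE estimate via (22), and return
    the empirical unbiased SDR of the final CUBE estimate (19)."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, np.sqrt(Ps), trials)
    s_mmse = np.zeros(trials)
    for i in range(1, Kc + 1):
        a = np.sqrt((1 + snr) ** (i - 1) / Ps) * (s - s_mmse)        # (23)
        b = a + rng.normal(0.0, 1.0 / np.sqrt(snr), trials)          # AWGN use
        s_mmse = s_mmse + snr * np.sqrt(Ps / (1 + snr) ** (i + 1)) * b  # (22)
    s_hat = (1 + snr) ** Kc / ((1 + snr) ** Kc - 1) * s_mmse         # CUBE
    return Ps / np.mean((s - s_hat) ** 2)
```

For SNR = 1 and K_C = 3, the empirical SDR concentrates around (1 + SNR)^{K_C} − 1 = 7, i.e., the OPTA SDR.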

Remark V.1 (Non-Gaussian noise). For the case of an additive non-Gaussian noise channel with a given SNR, Scheme V.1 achieves an SDR of (1 + SNR)^{K_C} − 1. Since linear estimation is generally suboptimal in the non-Gaussian case, better performance can be attained using an appropriate scheme instead of Scheme V.1 (which is the optimal linear scheme); a notable attempt in this direction was made by Shayevitz and Feder [53]. In fact, for most noises, OPTA performance can be attained only in the limit of large information and code blocks, even in the presence of feedback [31, Ch. 3.5].

B. Without Feedback

We now turn to the more involved case of low-delay JSCC without feedback. We concentrate on the case of K_C = 2, that is, the case in which one source sample is conveyed over two channel uses.

A naïve approach is to send the source as is over both channel uses, up to a power adjustment. The corresponding unbiased SDR in this case is

SDR_lin = 2 SNR,

a linear improvement rather than an exponential one as in (21). This scheme approaches (21) for very low SNRs, but suffers great losses at high SNRs. Note that the linear factor 2 comes from the fact that the total power available over both channel uses has doubled; the same performance can be attained by allocating all of the available power to the first channel use while remaining silent during the second channel use.

This suggests that better mappings, that truly exploit the extra channel use, can be constructed. The first to propose an improvement for the 1:2 case were Shannon [35] and Kotel'nikov [36], in the late 1940s. In their works, the source sample is viewed as a point on a one-dimensional line, whereas the two channel uses correspond to a two-dimensional space (represented by a dashed line in Fig. 4).

Fig. 4. Linear and Archimedean bi-spiral curves.

In these terms, the linear scheme corresponds to mapping the one-dimensional source line to a straight line in the two-dimensional channel space (see Fig. 4), and hence clearly cannot provide any improvement, as AWGN is invariant to rotations. However, by mapping the one-dimensional source line into a two-dimensional curve that fills the space better, a great boost in performance can be attained, as was demonstrated in [35], [36], [11, Ch. 8.2], [54]–[58] and references therein, for different families of mappings.

In this work, we concentrate on such a family, based on the Archimedean spiral, which was considered in several works [54], [58]–[61] (represented by the solid line in Fig. 4):

a₁^reg(s) = c_reg s cos(ωs) = c_reg |s| cos(ω|s|) sign(s),
a₂^reg(s) = c_reg |s| sin(ωs) = c_reg |s| sin(ω|s|) sign(s),    (25)

where ω determines the rotation frequency, the factor c_reg is chosen to satisfy the power constraint, and the sign(s) term is needed to avoid overlap of the curve for positive and negative values of s (to each of which now corresponds a distinct spiral, the two meeting only at the origin). This (bi-)spiral effectively improves the resolution with respect to small noise values, since the one-dimensional source line is effectively stretched compared to the noise, and hence the noise magnitude shrinks when the source curve is mapped (contracted) back. However, for large noise values, a jump to a different branch, referred to as a threshold effect, may occur, incurring a large distortion. Thus, the value of ω needs to be chosen as large as possible, to allow maximal stretching of the curve for the same given power, while maintaining a low threshold-event probability. The SDRs for different values of ω are depicted in Fig. 5a.
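The bi-spiral mapping (25) and ML decoding over AWGN (picking the closest point on the curve) can be sketched as follows; the brute-force grid decoder is our own simplification of the ML rule, for illustration only:

```python
import numpy as np

def spiral_encode(s, omega, c):
    """Archimedean bi-spiral (25): one source sample -> two channel inputs."""
    r = c * np.abs(s)
    return np.array([r * np.cos(omega * np.abs(s)) * np.sign(s),
                     r * np.sin(omega * np.abs(s)) * np.sign(s)])

def spiral_ml_decode(b, omega, c, s_grid):
    """ML decoding over AWGN: the grid point whose image lies closest to b."""
    cand = spiral_encode(s_grid, omega, c)          # shape (2, len(s_grid))
    return s_grid[np.argmin(((b[:, None] - cand) ** 2).sum(axis=0))]
```

Note the antisymmetry of the mapping (the two mirrored spirals for positive and negative s), and that a noiseless channel output decodes back to the transmitted sample.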

Another ingredient that is used in conjunction with (25) is stretching s, prior to mapping it to a bi-spiral, using φ_λ(s) ≜ sign(s)|s|^λ:

a₁^stretch(s) = a₁^reg(φ_λ(s)) = c_stretch |s|^λ cos(ω|s|^λ) sign(s),
a₂^stretch(s) = a₂^reg(φ_λ(s)) = c_stretch |s|^λ sin(ω|s|^λ) sign(s).    (26)

The choice λ = 0.5 promises a great boost in performancein the high-SNR regime, as is seen in Fig. 5b. We further note


that although the optimal decoder E[s|b₁, b₂] is an MMSE estimator, in this case the ML decoder, which maximizes p(b₁, b₂|s), achieves similar performance for moderate and high SNRs. A joint optimization of λ and ω for each SNR, for both MMSE and ML decoding, was carried out in [59], with the latter depicted in Fig. 5.

A desired property of these analog JSCC schemes is that their SDR improves with the channel SNR ("SNR universality"). Such an improvement is not offered by the separation-based technique, which fails when the actual SNR is lower than the design SNR, and does not promise any improvement for SNRs above it. This has motivated much work on designing JSCC schemes whose performance improves with the SNR, even for the case of large block lengths [62]–[64]. The schemes in these works achieve the optimal performance (21) for a specific design SNR, and improve linearly for higher SNRs. Similar behavior is observed also in Fig. 5, where the optimal ω value varies with the (design) SNR and mimics closely the quadratic growth in the SDR. Above the design SNR, linear growth is achieved for a particular choice of ω.

We further note that the distortion component due to the threshold event grows with |s|. To avoid this behavior, instead of increasing the magnitude ‖a^stretch‖ proportionally to the phase ∠(a^stretch) as in (26), we propose to increase it slightly faster, at a pace that guarantees that the incurred distortion does not grow with |s|:

a₁^bounded(s) = c_bounded |s|^{λβ} cos(ω|s|^λ) sign(s),
a₂^bounded(s) = c_bounded |s|^{λβ} sin(ω|s|^λ) sign(s),    (27)

for some β > 1. This has only a slight effect on the resulting SDRs, as is illustrated in Fig. 5.

Finally, note that in no way do we claim that the spiral-based Shannon–Kotel'nikov (SK) scheme is optimal. Various other techniques exist, most using a hybrid of digital and analog components [56], [57], [65], which outperform the spiral-based scheme for various parameters. Nevertheless, this scheme is the earliest technique to be considered, and it gives good performance boosts, which suffice for our demonstration.

VI. CONTROL VIA SOURCE–CHANNEL SEPARATION

The separation-based control scheme, outlined next, applies Algorithm III.2 and sends the resulting quantization indices after encoding with a tree code generated as in Section IV. The observer/transmitter knowingly ignores any decoding errors made by the controller/receiver, by internally simulating the system without any decoding errors. The controller/receiver, on the other hand, upon detecting an error in the past, recalculates the steps of Algorithm III.2 starting from this error and corrects for it in the following steps. This scheme is implemented in [42, tree/master/code/separate].

Scheme VI.1 (Separation-based).

Initialization.
1) Selects the number of information bits k that are encoded at every time step t [recall (11), (14)].
2) Sets the size M of the PAM constellation to 2^k and the number of coded bits n to K_C k.
3) Generates G_1, …, G_T as in Definition IV.3.

Fig. 5. Performance of the JSCC linear repetition scheme, the OPTA bound, and the JSCC SK spiral scheme for optimized λ and ω, for the standard case (β = 1) and the distortion-bounded case; panel (a) shows λ = 1 and panel (b) shows λ = 0.5. The solid lines depict the performance of the standard spiral for various values of ω for two stretch parameters, λ = 0.5 and 1, which perform better at high and low SNRs, respectively.

4) Assigns the noiseless-channel capacity C of Algorithm III.2 to equal k.
5) Initializes Algorithm III.2.

Observer/Transmitter. At time t ∈ [T]:

1) Runs the Observer/Transmitter steps of Algorithm III.2, with the control signal u_t in Step 6 replaced by the signal generated by the controller in (28).
2) Maps the resulting quantization index ℓ_t into the k-bit input ı_t of the tree encoder.
3) Encodes ı_t into n coded bits c_t according to (14).
4) Maps c_t into K_C constellation points a_t as in (18).
5) Transmits the K_C constellation points a_t over the K_C channel uses.

Controller/Receiver.

1) Receives the K_C channel outputs b_t.
2) Recovers estimates of all information bits until time t, (ı̂_{1|t}, ı̂_{2|t}, …, ı̂_{t|t}), as in (12), using Algorithm IV.1.


3) Maps each ı̂_{τ|t} (for τ ∈ [t]) into a quantization index estimate ℓ̂^r_{τ|t}, where the superscript 'r' stands for 'receiver'.
4) Finds the earliest time τ ∈ [t − 1] for which ℓ̂^r_{τ|t} ≠ ℓ̂^r_{τ|t−1}. We denote this time instant by t₀. If no such time instant exists, sets t₀ = t.
5) Runs the Controller/Receiver steps of Algorithm III.2 for time instants τ = t₀, …, t − 1, with ℓ^t replaced with (ℓ̂^r_{1|t}, …, ℓ̂^r_{t|t}) and the used control signals u^{t−1}.
6) Runs Steps 1 and 3 of the Controller/Receiver of Algorithm III.2 for time instant t, with ℓ_t replaced with ℓ̂^r_{t|t}.
7) Applies the control signal

u_t = −L_t x̂^r_{t|t} + ∑_{τ=0}^{t−1} α^{t−1−τ} L_τ ( x̂^r_{τ|t} − x̂^r_{τ|t−1} )    (28a)
    = −L_t x̂^r_{t|t} + ∑_{τ=t₀}^{t−1} α^{t−1−τ} L_τ ( x̂^r_{τ|t} − x̂^r_{τ|t−1} ),    (28b)

to the system, where x̂^r_{τ|t} denotes the estimate of the state x_τ given (ℓ̂^r_{1|t}, ℓ̂^r_{2|t}, …, ℓ̂^r_{τ|t}) at the receiver.

VII. CONTROL VIA LOW-DELAY JSCC

In this section, we construct a Kalman-filter-like solution [48] by employing JSCC schemes. We note that the additional complication here is due to the communication channel (2) and its inherent input power constraint.

Denote by x̂^r_{t₁|t₂} the estimate of x_{t₁} at the receiver given b^{t₂}. Denote further its mean square error (MSE) by

P^r_{t₁|t₂} ≜ E[ (x̃^r_{t₁|t₂})² ],

where x̃^r_{t₁|t₂} ≜ x_{t₁} − x̂^r_{t₁|t₂}.

The scheme then works as follows. At time instant t, the controller constructs an estimate x̂^r_{t|t} of x_t. It then applies the control signal u_t = −L_t x̂^r_{t|t} to the plant, with L_t given in (10). Note that, since both the controller and the observer know the previously applied control signals u^t, they both know x̂^r_{t|t} and x̂^r_{t+1|t}.

Hence, in order to describe x_t, the observer can save transmit power by transmitting the error signal (x_t − x̂^r_{t|t−1}) instead of x_t. The controller can then add back x̂^r_{t|t−1} to the received signal to construct x̂^r_{t|t}.

Scheme VII.1.

Observer/Transmitter: At time t
• Generates the desired error signal

s_t = x̃^r_{t|t−1}    (29a)
    = x_t − x̂^r_{t|t−1}    (29b)

of average power P^r_{t|t−1} (determined in the sequel).
• Since the channel input is subject to a unit power constraint (3), s_t is normalized:

s̄_t = s_t / √(P^r_{t|t−1}).    (30)

• Maps s̄_t into K_C channel inputs, constituting the entries of a_t, using a bounded-distortion JSCC scheme of choice with (maximum given any input) average distortion 1/SDR₀ for the given channel SNR.
• Sends the K_C channel inputs a_t over the channel (2).

Controller/Receiver: At time t
• Receives the K_C channel outputs b_t.
• Recovers a CUBE ŝ̄_t of the normalized source signal s̄_t: ŝ̄_t = s̄_t + n^eff_t, where n^eff_t ⊥ s̄_t is an additive noise of power (at most) 1/SDR₀.
• Unnormalizes ŝ̄_t to construct an estimate of s_t:

ŝ_t = √(P^r_{t|t−1}) ŝ̄_t    (31a)
    = √(P^r_{t|t−1}) ( s̄_t + n^eff_t )    (31b)
    = x̃^r_{t|t−1} + √(P^r_{t|t−1}) n^eff_t.    (31c)

• Constructs an estimate x̂^r_{t|t} of x_t given b^t. Since ŝ_t ⊥ x̂^r_{t|t−1}, the linear MMSE estimate amounts to¹⁴

x̂^r_{t|t} = x̂^r_{t|t−1} + ( SDR₀ / (1 + SDR₀) ) ŝ_t,    (32)

with an MSE of

P^r_{t|t} = P^r_{t|t−1} / (1 + SDR₀).    (33)

• Generates the control signal u_t = −L_t x̂^r_{t|t}, where L_t is given as in Lemma III.1, and the receiver prediction of the next system state

x̂^r_{t|t−1} = α x̂^r_{t−1|t−1} + u_{t−1}.

Using (33) and (1), the prediction error at the receiver is given by the following recursion:

P^r_{t+1|t} = α² P^r_{t|t−1} / (1 + SDR₀) + W.    (34)
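The behavior of the recursion (34) is easy to probe numerically. The sketch below (parameter values are our own) converges to the fixed point W(1 + SDR₀)/(1 + SDR₀ − α²) exactly when α² < 1 + SDR₀, and diverges otherwise, in line with the stabilizability condition stated next:

```python
def prediction_error(alpha, sdr0, W, steps=500, P0=0.0):
    """Iterate (34): P_{t+1|t} = alpha^2 * P_{t|t-1} / (1 + SDR0) + W."""
    P = P0
    for _ in range(steps):
        P = alpha ** 2 * P / (1.0 + sdr0) + W
    return P
```

For example, with α = 2, SDR₀ = 7, and W = 1, the iteration settles at 8/4 = 2, whereas α = 3 (so that α² = 9 > 1 + SDR₀ = 8) makes the error grow without bound.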

The recursive relation (34) leads to the following condition for the stabilizability of the control system and bound on the achievable cost.

Theorem VII.1 (Achievable). The scalar control system of Section II is stabilizable using Scheme VII.1 if α² < 1 + SDR₀, and its infinite-horizon average-stage LQG cost J∞ (6) is bounded from above by

J∞ ≤ ( Q + (α² − 1) S ) W / (1 + SDR₀ − α²).    (35)

The following theorem is an adaptation of the lower bound in [66] to our setting of interest.

Theorem VII.2 (Lower bound). The scalar control system of Section II is stabilizable only if α² < 1 + SDR_OPTA, and the optimal achievable infinite-horizon average-stage LQG cost is bounded from below by

J∞ ≥ ( Q + (α² − 1) S ) W / (1 + SDR_OPTA − α²).    (36)

¹⁴If the resulting effective noise n^eff_t is not an AWGN with power that does not depend on the channel input, then a better estimator than that in (32) may be constructed.


By comparing (35) and (36), we see that the potential gap between the two bounds stems only from the gap between the bounds on the achievable SDR over the AWGN channel (2).

Remark VII.1 (Stabilizability). The stabilizability condition of Theorem VII.1 is distant from that of Theorem VII.2 in this case, since SDR₀ < SDR_OPTA. By improving SDR₀, one can improve the achievable stabilizability of the system.

It is interesting to note that in this case, in stark contrast to the classical LQG setting, in which the system is stabilizable for any values of α and W, low values of the SDR render the system unstable. Hence, the SDR dictates, among others, the minimal required transmit power for the system to remain stable. The difference from the classical LQG case stems from the additional input power constraint, which effectively couples the power of the effective observation noise with that of the estimation error, and was previously observed in, e.g., [28], [29], [66], [67]. The existence of a threshold SDR below which the system cannot be stabilized parallels the result of Sinopoli et al. [33] for control over packet-drop channels, which shows that the system cannot be stabilized if the packet-drop probability exceeds a certain threshold.

We next discuss the special cases of K_C = 1 in Section VII-A, K_C ∈ N with instantaneous perfect output feedback in Section VII-B, and K_C = 2 in Section VII-C.

A. Source–Channel Rate Match

In this subsection, we treat the case of KC = 1, namely, the case where the sample rate of the control system and the signaling rate of the communication channel match.

As we have seen in Section V, analog linear transmission of a Gaussian source over an AWGN channel achieves optimal performance (even when infinite delay is allowed), namely, the OPTA SDR (21), for any given input value. Thus, the JSCC scheme that we use in this case is linear transmission: the source is transmitted as is, up to a power adjustment [recall (29) and (30)]:

a_t = s̃_t = (1 / √P^r_{t|t−1}) s_t .

Since in this case SDR0 = SDR_OPTA, the upper and lower bounds of Theorems VII.1 and VII.2 coincide, establishing the optimum performance in this case.
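The linear scheme admits a minimal Monte Carlo sketch (variable names are ours, not the paper's): a unit-variance source is scaled to the power budget, sent over the unit-noise AWGN channel, and estimated with the linear MMSE receiver. Under the SDR convention assumed here, 1 + SDR = (source variance)/MSE, consistent with SDR0 = SNR for KC = 1, the empirical SDR matches the SNR.

```python
import numpy as np

rng = np.random.default_rng(0)
snr, n = 4.0, 200_000

s = rng.normal(size=n)                 # unit-variance source (already normalized)
x = np.sqrt(snr) * s                   # power adjustment: transmit power SNR, unit channel noise
y = x + rng.normal(size=n)             # AWGN channel
s_hat = np.sqrt(snr) / (1 + snr) * y   # linear MMSE estimate of s from y

mse = np.mean((s - s_hat) ** 2)        # theory: 1 / (1 + SNR)
sdr = 1 / mse - 1                      # assumed convention: 1 + SDR = source variance / MSE
assert abs(sdr - snr) < 0.1            # empirical SDR matches the SNR
```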

Corollary VII.1. The scalar control system of Section II with KC = 1 is stabilizable if and only if α² < 1 + SNR, and the optimal achievable infinite-horizon average-stage LQG cost satisfies (35) with equality, with SDR0 = SNR.

Remark VII.2. The stabilizability condition and optimum MMSE performance were previously established in [28], [67], and extend also to the noisy-observation case [23].

B. Source–Channel Rate Mismatch with Feedback

When the AWGN channel outputs bt (2) are available at the transmitter via instantaneous feedback, we can incorporate Scheme V.1 into Scheme VII.1 to attain the OPTA SDR and again establish the optimal LQG cost of this setting.

Corollary VII.2. The scalar control system of Section II with KC ∈ N is stabilizable if and only if α² < (1 + SNR)^KC, and the optimal achievable infinite-horizon average-stage LQG cost satisfies (35) with equality, with SDR0 = (1 + SNR)^KC − 1.
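The feedback gain of Corollary VII.2 can be illustrated with an elementary Schalkwijk–Kailath-style iteration (a sketch under our assumptions, not the paper's Scheme V.1): each channel use retransmits the receiver's current estimation error, scaled to full power via the feedback link, so the error variance shrinks by a factor of 1 + SNR per use, yielding SDR = (1 + SNR)^KC − 1 under the SDR convention 1 + SDR = (source variance)/MSE.

```python
import numpy as np

rng = np.random.default_rng(1)
snr, k_c, n = 2.0, 3, 100_000

s = rng.normal(size=n)     # unit-variance source
err = s.copy()             # receiver's estimation error, known to the transmitter via feedback
var = 1.0                  # its (theoretical) variance
for _ in range(k_c):
    x = np.sqrt(snr / var) * err                     # scale the error to full power
    y = x + rng.normal(size=n)                       # AWGN channel use
    err = err - np.sqrt(snr * var) / (snr + 1) * y   # linear MMSE update of the error
    var /= 1 + snr                                   # variance shrinks by (1 + SNR) per use

sdr = 1 / np.mean(err ** 2) - 1
assert abs(sdr - ((1 + snr) ** k_c - 1)) < 1.0       # SDR ≈ (1 + SNR)^{K_C} − 1 = 26
```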

Remark VII.3 (Non-Gaussian noise). Following Remark V.1, the achievability result of Theorem VII.1 holds with SDR0 = SNR even when the noise is non-Gaussian. In this case, however, the variance W in the lower bound of Theorem VII.2 should be replaced with its entropy power (which is strictly lower than the variance for non-Gaussian processes) [66], and therefore better performance might be achievable. We note that this lower bound is not achievable in general, with or without feedback, as is implied by [31, Ch. 3.5].

C. Source–Channel Rate Mismatch without Feedback

We now consider the case of KC = 2 channel uses per sample. As we saw in Section V, linear schemes are suboptimal outside the low-SNR region. Instead, by using non-linear maps, e.g., the (modified) Archimedean spiral-based SK maps (27), better performance can be achieved. This scheme is implemented in [42, tree/master/code/joint].
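A toy version of such a spiral map (illustrative constants, not the modified map (27) of the paper) and its grid-approximated ML decoder can be sketched as follows; the noiseless round trip confirms that the 1:2 bi-spiral map is invertible.

```python
import numpy as np

C = 3.0  # stretch factor: spiral angle (radians) per unit source value (a design knob)

def encode(s):
    """1:2 expansion: map a scalar onto a double Archimedean spiral (two channel uses)."""
    s = np.asarray(s, dtype=float)
    phi = C * np.abs(s)
    # The two branches (s > 0 and s < 0) are point reflections of each other,
    # so they never intersect away from the origin.
    return np.sign(s) * np.stack([phi * np.cos(phi), phi * np.sin(phi)])

def decode_ml(y, grid=np.linspace(-4, 4, 80_001)):
    # ML decoding over AWGN = nearest point on the curve, approximated by a dense grid.
    pts = encode(grid)                     # shape (2, len(grid))
    d2 = (pts[0] - y[0]) ** 2 + (pts[1] - y[1]) ** 2
    return grid[np.argmin(d2)]

# Noiseless round trip: the decoder recovers the source up to the grid resolution.
for s in [-2.5, -0.3, 0.0, 0.8, 1.7]:
    assert abs(decode_ml(encode(s)) - s) < 1e-3
```

With channel noise added to encode(s), the same decoder realizes (approximate) ML decoding; the stretch factor C trades off weak-noise accuracy against the probability of jumping to the wrong spiral arm.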

We note that the improvement in the SDR of the JSCC scheme is substantial: when the SDR of the linear scheme is close to α² − 1, using an improved scheme with a better SDR substantially improves the LQG cost. Unfortunately, the spiral-based SK schemes do not promise any improvement for SNRs below 5 dB under ML decoding.

Remark VII.4. By replacing the ML decoder with an MMSE one, strictly better performance can be achieved over the linear scheme for all SNR values.

Remark VII.5. The resulting effective noise at the output of the JSCC receiver is not necessarily Gaussian, and hence the resulting system state xt is not necessarily Gaussian either. Nevertheless, for the bounded-distortion scheme (27), this has no effect on the resulting performance, as is demonstrated next.

VIII. SIMULATIONS

A. Rate-Matched Case

The optimal average-stage LQG cost is illustrated in Fig. 6 for a system with α = 2 and two SNRs, 2 and 4. SNR = 4 satisfies the stabilizability condition α² < 1 + SNR, whereas SNR = 2 fails to do so. Unit LQG penalty coefficients Qt ≡ Rt ≡ 1 and unit driving-noise power W = 1 are used.

B. JSCC for the Rate-Mismatched Case

The effect of the SDR improvement is illustrated in Fig. 7 for a system with α = 3 and W = 1, for Qt ≡ 1 and Rt ≡ 0, by comparing the achievable costs of Theorem VII.1 with the lower bound of Theorem VII.2. We note that the JSCC scheme with (intermediate) feedback always achieves OPTA.

C. Comparison of Separation-Based and JSCC Schemes for the Rate-Mismatched Case

We now compare the performance of the separation-based scheme of Section VI with the JSCC schemes of Section VII. The implementations of these schemes are available in [42].


Fig. 6. Optimal average-stage LQG cost J of a single representative run for KC = KS = 1, α = 2, and SNRs 2 and 4, which correspond to a stabilizable and an unstabilizable system, respectively. The driving-noise and observation-noise powers and the LQG penalty coefficients are Qt ≡ Rt ≡ 1, W = 1.

Fig. 8 shows a comparison of the control costs Jt achieved by these schemes for a perfectly observed scalar plant with α = 1.2, W = 1, Qt ≡ 1 and Rt ≡ 0, over 256 runs.

Clearly, the JSCC-based schemes outperform the separation-based schemes by a large margin.

Note that while larger constellations perform better for highSNRs, the situation is reversed when the SNR is low.

We further include a simulation of a single run of each of the schemes in Fig. 9. For the separation-based scheme, decoding errors (times when r̂_{t|t} ≠ ℓ_t) are highlighted. Their impact is clear: while the decoder is in error, it applies the wrong control signal, causing the cost function to rapidly deteriorate. In the instance shown, these decoding errors are clearly the major factor degrading performance.

IX. DISCUSSION AND FUTURE RESEARCH

A. Separation-Based and JSCC Schemes Comparison

As is evident from the simulation results in Section VIII-C, in addition to demanding far less computation time and memory, and being considerably simpler to implement than separation-based schemes, the JSCC-based schemes also perform much better in terms of control cost.

A key component behind this improvement is the fact that the JSCC schemes of Section V allow the (rare) utilization of large (unbounded) excess transmission power. The separation-based schemes, on the other hand, are limited by the transmission power of the maximum constellation point, which increases as the square root of the average power.15 Namely, these schemes have a peak-power constraint, which is known to have a detrimental effect on performance [68].
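The limiting value in footnote 15 is easy to verify numerically (unit spacing assumed): the average power of M equiprobable, uniformly spaced constellation points is (M² − 1)/12, which approaches M²/12, the power of the continuous uniform density on [−M/2, M/2].

```python
import numpy as np

def pam_power(M):
    # M equiprobable PAM points with unit spacing, centered at zero
    pts = np.arange(M) - (M - 1) / 2
    return np.mean(pts ** 2)             # closed form: (M^2 - 1) / 12

for M in [2, 4, 1024]:
    assert abs(pam_power(M) - (M ** 2 - 1) / 12) < 1e-9
# For large M this approaches M^2 / 12, the power of a uniform PDF on [-M/2, M/2].
assert abs(pam_power(1024) / (1024 ** 2 / 12) - 1) < 1e-5
```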

Another unfortunate shortcoming of separation-based schemes is their inability to stabilize higher moments. As was noted already in the seminal work of Sahai and Mitter [7], in order to stabilize higher moments, increased error exponents are required that grow linearly with the moment's order; this behavior is manifested by the abrupt

15 In the limit of infinite-size constellations, the distribution of the constellation tends to a continuous uniform PDF over [−M/2, M/2], of power P = M²/12.

Fig. 7. Average-stage LQG costs when using the (distortion-bounded) SK Archimedean bi-spiral scheme, the repetition scheme, and the lower bound of Theorem VII.2, for α = 3, W = 1, Qt ≡ 1, Rt ≡ 0. The vertical dotted lines represent the minimum SNR below which the cost diverges to infinity.

jumps in the cost of this scheme in Fig. 9. In contrast, JSCC schemes (which in our case enjoy an implicit feedback via the control-system loop) can attain a super-exponential decay of the error probability when used to send bits [69] (cf. [68]). Thus, such schemes can stabilize more, and even all, moments, and guarantee almost-sure stability (by using, e.g., the simple linear/repetition-based scheme), as was noted already by Sahai and Mitter [7, Sec. III-C] in the context of anytime reliability.

On the other hand, the JSCC schemes developed in this work assume knowledge of the control objectives at the transmitter, in order to recover the estimates of the controller at the transmitter. The separation-based scheme, in contrast, does not require such knowledge and only assumes knowledge of the control signal ut.

B. Partially-Observable Vector Systems

In this paper, we focused on the simplest case of scalar systems and KC = 2. As implied by the JSCC theorem (20), an exponentially large (in KC) gain in the cost can be achieved. Furthermore, the results of Theorems VII.1 and VII.2 readily extend to systems with noisy observations [41], as well as to vector states xt and vector control signals ut with scalar observed outputs yt.

Interestingly, for the case of vector (observation, state, and control) signals, even if the signaling rate of the channel and the sample rate of the observer are equal (the rate-matched case), conveying several analog observations over a single channel input may be of the essence. This can be achieved by a compression JSCC scheme, e.g., by reversing the roles of the source and the channel inputs in the SK spiral-based scheme. Similarly to their expansion counterparts, such compression JSCC schemes provide gains that grow exponentially with the SNR and dimension [35], [36], [54], [57]–[59], [61], and promise better LQG costs than their linear counterparts [28].

The vector case is much more challenging in general. Beyond the difficulty of designing good higher-dimensional curves/surfaces (and more general geometric structures) [55], [57] and lower bounds [66], there is the challenge of even determining and prioritizing the subspaces of the state space


Fig. 8. Average-stage LQG costs, averaged over 256 runs, when using the (distortion-bounded) JSCC SK Archimedean bi-spiral scheme, the linear (repetition) scheme, the separation-based schemes for 2-PAM and 4-PAM, and the (OPTA) lower bound of Theorem VII.2, for α = 1.2, W = 1, Qt ≡ 1, Rt ≡ 0. The SNRs for which the separation-based schemes were simulated are marked with squares and circles for 2-PAM and 4-PAM, respectively.

of xt that need to be conveyed. To make this point clearer, consider a linear plant,

xt+1 = Axt + But + wt ,

with two-dimensional state (xt), output (yt), and i.i.d. Gaussian noise (wt) signals, a scalar control signal (ut), and known 2 × 2 and 2 × 1 matrices A and B, respectively. Since ut is one-dimensional, optimizing the cost of the next step mandates transmitting the information that is needed to reconstruct/approximate the optimal ut (say, a one-dimensional linear combination of xt: ut = −K_t^T xt, where Kt is a predetermined column vector). However, since A mixes both the subspace dictated by K_t^T and its perpendicular complement, better performance might be attained in future steps ("cost-to-go") by transmitting both subspaces.
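A tiny numeric illustration of this subspace mixing (the matrices are hypothetical): when A is a rotation, the component of xt orthogonal to Kt, which transmitting only K_t^T xt would discard, entirely determines K^T x_{t+1} one step later.

```python
import numpy as np

A = np.array([[0.0, -1.0],       # rotation by 90 degrees: mixes the two subspaces
              [1.0,  0.0]])
K = np.array([1.0, 0.0])         # u_t = -K^T x_t uses only the first coordinate

x = np.array([0.0, 3.0])         # state lying entirely in the subspace K^T x = 0
assert K @ x == 0.0              # transmitting K^T x_t alone conveys nothing about x_t here
x_next = A @ x                   # one step of the plant (B u_t and w_t omitted for clarity)
assert abs(K @ x_next + 3.0) < 1e-12   # ...yet x_t fully determines K^T x_{t+1} = -3
```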

C. Oblivious Transmitter

In this work, we assumed that the observer knows all past control signals. We note that such information is not needed for the JSCC schemes (without feedback) in the special case of variance control, i.e., Rt ≡ 0.

For the more general LQG-cost setting (Rt ≢ 0), this assumption can be viewed as a two-sided side-information scenario. Although this is a common situation in practice, there are scenarios in which the observer is oblivious of the applied control signal or has only a noisy measurement of the actuation signal generated by the controller. Such settings can be regarded as a JSCC problem with side information at the receiver (only), and can be treated using JSCC techniques designed for this case, some of which combine naturally with the JSCC schemes for rate mismatch [57], [64], [65]. In fact, this idea was recently applied to the related problem of communication over an AWGN channel with an AWGN feedback channel in [70].

We further note that for bounded noise (even worst-case/arbitrary), parallel results can be achieved.


Fig. 9. LQG cost comparison of the separation-based scheme with k = 2 (2-PAM constellation), the JSCC SK bi-spiral scheme, and the OPTA lower bound, for α = 1.2, SNR = 4.5 dB, W = 1, Qt ≡ 1, Rt ≡ 0. The schemes were simulated for the same disturbance and noise sequences, and their results are compared to the analytically derived cost of the JSCC scheme and the OPTA lower bound. Times when decoding errors in the separation-based scheme occur are marked by a thick line.

D. Packetized Transmission with Erasures

In the separation-based schemes, following the work of Sahai and Mitter [7], we used a decoder that must make a decision on all information bits transmitted until that time, even if its "belief" in a particular bit, quantified by an appropriate metric, say Fano's metric, is low.

An alternative to this approach is to allow declaring an erasure for bits of "low belief". This idea was advocated and explored in the celebrated work of Forney [71] for block codes, where a tradeoff between the achievable error and erasure exponents was established.

Furthermore, by substantially increasing the error exponent (lowering the error probability) at the expense of decreasing the erasure exponent (increasing the erasure probability), one can drive the separation-based scheme toward that of a noiseless channel with occasional packet erasures. Interestingly, Algorithm III.2 and the lower bound of Theorem VII.2 readily extend to this case due to their "greedy" nature [34].

ACKNOWLEDGMENT

The authors thank H. Yıldız for valuable discussions and help with parts of the simulation.

REFERENCES

[1] J. P. Hespanha, P. Naghshtabrizi, and Y. Xu, "A survey of recent results in networked control systems," Proc. IEEE, vol. 95, no. 1, pp. 138–162, Jan. 2007.

[2] V. Gupta, A. F. Dana, J. P. Hespanha, R. M. Murray, and B. Hassibi, "Data transmission over networks for estimation and control," IEEE Trans. Auto. Cont., vol. 54, no. 8, pp. 1807–1819, Aug. 2009.

[3] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla, and S. S. Sastry, "Foundations of control and estimation over lossy networks," Proc. IEEE, vol. 95, no. 1, pp. 163–187, Jan. 2007.

[4] S. Yuksel and T. Basar, Stochastic Networked Control Systems: Stabilization and Optimization Under Information Constraints. Boston: Birkhauser, 2013.


[5] C. E. Shannon, "A mathematical theory of communication," Bell Sys. Tech. J., vol. 27, pp. 379–423, Jul. 1948.

[6] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.

[7] A. Sahai and S. K. Mitter, "The necessity and sufficiency of anytime capacity for stabilization of a linear system over a noisy communication link—part I: Scalar systems," IEEE Trans. Inf. Theory, vol. 52, no. 8, pp. 3369–3395, Aug. 2006.

[8] L. J. Schulman, "Coding for interactive communication," IEEE Trans. Inf. Theory, vol. 42, pp. 1745–1756, 1996.

[9] R. T. Sukhavasi and B. Hassibi, "Linear time-invariant anytime codes for control over noisy channels," IEEE Trans. Auto. Cont., vol. 61, no. 12, pp. 3826–3841, Dec. 2016.

[10] J. M. Wozencraft, "Sequential decoding for reliable communications," MIT Research Lab. of Elect., Cambridge, MA, USA, Tech. Rep., 1957.

[11] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: John Wiley & Sons, 1965.

[12] F. Jelinek, Probabilistic Information Theory: Discrete and Memoryless Models. New York: McGraw-Hill, 1968.

[13] R. G. Gallager, Information Theory and Reliable Communication. New York: John Wiley & Sons, 1968.

[14] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979.

[15] R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional Coding. New York: Wiley-IEEE Press, 1999.

[16] A. Khina, W. Halbawi, and B. Hassibi, "(Almost) practical tree codes," in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Barcelona, Spain, Jul. 2016, pp. 2404–2408.

[17] G. N. Nair, F. Fagnani, S. Zampieri, and R. J. Evans, "Feedback control under data rate constraints: An overview," Proc. IEEE, vol. 95, no. 1, pp. 108–137, Jan. 2007.

[18] S. Yuksel, "Stochastic stabilization of noisy linear systems with fixed-rate limited feedback," IEEE Trans. Auto. Cont., vol. 55, no. 12, pp. 2847–2853, Dec. 2010.

[19] P. Minero, M. Franceschetti, S. Dey, and G. N. Nair, "Data rate theorem for stabilization over time-varying feedback channels," IEEE Trans. Auto. Cont., vol. 54, no. 2, pp. 243–255, Feb. 2009.

[20] S. Sarkka, Bayesian Filtering and Smoothing. Cambridge, UK: Cambridge University Press, 2013, vol. 3.

[21] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston: Kluwer Academic Pub., 1992.

[22] L. Bao, M. Skoglund, and K. H. Johansson, "Iterative encoder–controller design for feedback control over noisy channels," IEEE Trans. Auto. Cont., vol. 56, no. 2, Feb. 2011.

[23] A. Khina, Y. Nakahira, Y. Su, and B. Hassibi, "Algorithms for optimal control with fixed-rate feedback," in Proc. IEEE Conf. Decision and Control (CDC), Melbourne, VIC, Australia, Dec. 2017, pp. 6015–6020.

[24] Y. Kochman, A. Khina, U. Erez, and R. Zamir, "Rematch-and-forward: Joint source/channel coding for parallel relaying with spectral mismatch," IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 605–622, Jan. 2014.

[25] T. Goblick, "Theoretical limitations on the transmission of data from analog sources," IEEE Trans. Inf. Theory, vol. 11, pp. 558–567, 1965.

[26] P. Elias, "Channel capacity without coding," in Proc. IRE, vol. 45, no. 3, Jan. 1957, pp. 381–381.

[27] R. Bansal and T. Basar, "Simultaneous design of measurement and control strategies for stochastic systems with feedback," Automatica, vol. 25, no. 5, pp. 679–694, Sep. 1989.

[28] J. S. Freudenberg, R. H. Middleton, and V. Solo, "Stabilization and disturbance attenuation over a Gaussian communication channel," IEEE Trans. Auto. Cont., vol. 55, no. 3, pp. 795–799, Mar. 2010.

[29] S. Tatikonda, A. Sahai, and S. K. Mitter, "Stochastic linear control over a communication channel," IEEE Trans. Auto. Cont., vol. 49, no. 9, pp. 1549–1561, Sep. 2004.

[30] N. Elia, "When Bode meets Shannon: Control-oriented feedback communication schemes," IEEE Trans. Auto. Cont., vol. 49, no. 9, pp. 1477–1488, Sep. 2004.

[31] M. Gastpar, "To code or not to code," Ph.D. dissertation, School of Computer and Communication Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland, Dec. 2002.

[32] S. Tatikonda and S. K. Mitter, "Control over noisy channels," IEEE Trans. Auto. Cont., vol. 49, no. 7, pp. 1196–1201, Jul. 2004.

[33] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, M. I. Jordan, and S. S. Sastry, "Kalman filtering with intermittent observations," IEEE Trans. Auto. Cont., vol. 49, no. 9, pp. 1453–1464, Sep. 2004.

[34] A. Khina, V. Kostina, A. Khisti, and B. Hassibi, "Tracking and control of Gauss–Markov processes over packet-drop channels with acknowledgments," IEEE Trans. Cont. Network Sys., accepted, May 2018.

[35] C. E. Shannon, "Communication in the presence of noise," Proc. IRE, vol. 37, no. 1, pp. 10–21, Jan. 1949.

[36] V. A. Kotel'nikov, The Theory of Optimum Noise Immunity. New York: McGraw-Hill, 1959.

[37] S. Yuksel and S. Tatikonda, "A counterexample in distributed optimal sensing and control," IEEE Trans. Auto. Cont., vol. 54, no. 4, pp. 841–844, Apr. 2009.

[38] U. Kumar, V. Gupta, and J. N. Laneman, "On stability across a Gaussian product channel," in Proc. IEEE Conf. Decision and Control and Euro. Control (CDC-ECC), Orlando, FL, USA, Dec. 2011, pp. 3142–3147.

[39] A. A. Zaidi, T. J. Oechtering, S. Yuksel, and M. Skoglund, "Stabilization and control over Gaussian networks," in Information and Control in Networks, B. Bernhardsson, G. Como, and A. Rantzer, Eds. Springer, 2014, pp. 39–85.

[40] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM J. Control, vol. 6, no. 1, pp. 131–147, Feb. 1968.

[41] A. Khina, G. M. Pettersson, V. Kostina, and B. Hassibi, "Multi-rate control over AWGN channels via analog joint source–channel coding," in Proc. IEEE Conf. Decision and Control (CDC), Las Vegas, NV, USA, Dec. 2016, pp. 5968–5973.

[42] E. Riedel Gårding, "Control over AWGN channels with and without source–channel separation: Python implementation," Aug. 2017. [Online]. Available: https://github.com/eliasrg/SURF2017/

[43] S. Dharmadhikari and K. Joag-Dev, Unimodality, Convexity, and Applications, ser. Probability and Mathematical Statistics. Boston, MA: Academic Press, 1988.

[44] P. E. Fleischer, "Sufficient conditions for achieving minimum distortion in a quantizer," in IEEE Int. Conv. Rec., Part 1, Jan. 1964, pp. 104–111.

[45] J. C. Kieffer, "Uniqueness of locally optimal quantizer for log-concave density and convex error weighting function," IEEE Trans. Inf. Theory, vol. 29, no. 1, pp. 42–47, Jan. 1983.

[46] A. V. Trushkin, "Sufficient conditions for uniqueness of a locally optimal quantizer for a class of convex error weighting functions," IEEE Trans. Inf. Theory, vol. 28, no. 2, pp. 187–198, Mar. 1982.

[47] T. Fischer, "Optimal quantized control," IEEE Trans. Auto. Cont., vol. 27, no. 4, pp. 996–998, Aug. 1982.

[48] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed. Belmont, MA, USA: Athena Scientific, 2000, vol. I.

[49] M. Fu, "Lack of separation principle for quantized linear quadratic Gaussian control," IEEE Trans. Auto. Cont., vol. 57, no. 9, pp. 2385–2390, 2012.

[50] R. M. Fano, Transmission of Information. Cambridge: M.I.T. Press and New York: John Wiley and Sons, 1961.

[51] A. Khina, W. Halbawi, and B. Hassibi, "Sequential stack decoder: Python implementation," Jan. 2016. [Online]. Available: http://www.eng.tau.ac.il/~anatolyk/code/tree_codes.py

[52] E. Arıkan, "An upper bound on the cutoff rate of sequential decoding," IEEE Trans. Inf. Theory, vol. 34, no. 1, pp. 55–63, Jan. 1988.

[53] O. Shayevitz and M. Feder, "The posterior matching feedback scheme for joint source–channel coding with bandwidth expansion," in Proc. Data Comp. Conf. (DCC), Snowbird, UT, USA, 2009, pp. 83–92.

[54] S. Y. Chung, "On the construction of some capacity-approaching coding schemes," Ph.D. dissertation, Dept. EECS, Massachusetts Institute of Technology, Cambridge, MA, USA, 2000.

[55] P. A. Floor, "On the theory of Shannon–Kotelnikov mappings in joint source–channel coding," Ph.D. dissertation, Dept. Electron. and Telecom., Norwegian University of Science and Technology, May 2008.

[56] M. Kleiner and B. Rimoldi, "Asymptotically optimal joint source–channel coding with minimal delay," in Proc. IEEE GLOBECOM, Honolulu, HI, USA, Nov./Dec. 2009.

[57] E. Akyol, K. B. Viswanatha, K. Rose, and T. A. Ramstad, "On zero-delay source–channel coding," IEEE Trans. Inf. Theory, vol. 60, no. 12, pp. 7473–7489, Dec. 2014.

[58] F. Hekland, P. A. Floor, and T. A. Ramstad, "Shannon–Kotel'nikov mappings in joint source–channel coding," IEEE Trans. Comm., vol. 57, no. 1, pp. 94–105, Jan. 2009.

[59] Y. Hu, J. Garcia-Frias, and M. Lamarca, "Analog joint source–channel coding using non-linear curves and MMSE decoding," IEEE Trans. Comm., vol. 59, no. 11, pp. 3016–3026, Nov. 2011.

[60] I. Kvecher and D. Rephaeli, "An analog modulation using a spiral mapping," in Proc. IEEE Conv. Electrical and Electron. Engineers in Israel (IEEEI), Eilat, Israel, Nov. 2006.

[61] A. Ingber and M. Feder, "Power preserving 2:1 bandwidth reduction mappings," in Proc. of the Data Comp. Conf., Snowbird, UT, USA, Mar. 1997.


[62] U. Mittal and N. Phamdo, "Hybrid digital-analog (HDA) joint source–channel codes for broadcasting and robust communications," IEEE Trans. Inf. Theory, vol. 48, no. 5, pp. 1082–1102, May 2002.

[63] Z. Reznic, M. Feder, and R. Zamir, "Distortion bounds for broadcasting with bandwidth expansion," IEEE Trans. Inf. Theory, vol. 52, no. 8, pp. 3778–3788, Aug. 2006.

[64] Y. Kochman and R. Zamir, "Analog matching of colored sources to colored channels," IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3180–3195, June 2011.

[65] X. Chen and E. Tuncel, "Zero-delay joint source–channel coding using hybrid digital–analog schemes in the Wyner–Ziv setting," IEEE Trans. Comm., vol. 62, no. 2, pp. 726–735, Feb. 2014.

[66] V. Kostina and B. Hassibi, "Rate–cost tradeoffs in control," in Proc. Allerton Conf. on Comm., Control, and Comput., Monticello, IL, USA, Sep. 2016, pp. 1157–1164.

[67] J. S. Freudenberg, R. H. Middleton, and J. H. Braslavsky, "Stabilization and disturbance attenuation over a Gaussian communication channel," in Proc. IEEE Conf. Decision and Control (CDC), New Orleans, LA, USA, Dec. 2007, pp. 3958–3963.

[68] A. D. Wyner, "On the Schalkwijk–Kailath coding scheme with a peak energy constraint," IEEE Trans. Inf. Theory, vol. 14, no. 1, pp. 129–134, Jan. 1968.

[69] J. P. M. Schalkwijk and T. Kailath, "A coding scheme for additive noise channels with feedback–I: No bandwidth constraint," IEEE Trans. Inf. Theory, vol. 12, pp. 172–182, Apr. 1966.

[70] A. Ben-Yishai and O. Shayevitz, "Interactive schemes for the AWGN channel with noisy feedback," IEEE Trans. Inf. Theory, vol. 63, no. 4, pp. 2409–2427, Apr. 2017.

[71] G. D. Forney Jr., "Exponential error bounds for erasure, list, and decision feedback schemes," IEEE Trans. Inf. Theory, vol. 14, no. 2, pp. 206–220, Mar. 1968.

Anatoly Khina was born in Moscow, USSR, in 1984, and moved to Israel in 1991. He is currently a Senior Lecturer in the Department of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv, Israel, from which he holds B.Sc. (summa cum laude, 2006), M.Sc. (summa cum laude, 2010), and Ph.D. (2016) degrees, all in Electrical Engineering. He was a Postdoctoral Scholar in the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA, USA, from 2015 to 2018, and a Research Fellow at the Simons Institute for the Theory of Computing, University of California, Berkeley, Berkeley, CA, USA, during the Spring of 2018. His research interests include Information Theory, Control Theory, Signal Processing and Matrix Analysis.

In parallel to his studies, Dr. Khina has worked as an engineer in various roles focused on algorithms, software and hardware R&D. He is a recipient of the Simons-Berkeley and Qualcomm Research Fellowships; Fulbright, Rothschild and Marie Skłodowska-Curie Postdoctoral Fellowships; Clore Scholarship; Trotsky Award; Weinstein Prize in signal processing; Intel award for Ph.D. research; and the first prize for outstanding research work in the field of communication technologies from the Advanced Communication Center's Feder Family Award program.

Elias Riedel Gårding received his BSc degree in Engineering Physics from the Royal Institute of Technology (KTH), Stockholm, in 2017 and a MASt degree in Applied Mathematics, focused on theoretical physics, from the University of Cambridge in 2018. He contributed to this work as a Summer Undergraduate Research Fellow at the California Institute of Technology (2017). He was a summer student at the Joint Research Centre of Photonics at Zhejiang University, Hangzhou, in 2016, and contributed to NMR spectroscopy research at the Department of Biochemistry, University of Cambridge, in the summer of 2018. He is currently pursuing a MSc degree in Subatomic and Astrophysics at KTH, Stockholm.

Gustav M. Pettersson received the B.Sc. degree in Engineering Physics in 2017 and the M.Sc. degree in Aerospace Engineering in 2018, both from the KTH Royal Institute of Technology, Stockholm, Sweden. He is currently with the European Space Agency, Noordwijk, the Netherlands. He has previously been with the California Institute of Technology, Pasadena, CA, USA, in 2016, the Science for Life Laboratory, Solna, Sweden, in 2017, and the NASA Ames Research Center, Moffett Field, CA, USA, in 2018.

Victoria Kostina joined Caltech as an Assistant Professor of Electrical Engineering in the fall of 2014. She holds a Bachelor's degree in Applied Mathematics and Physics from the Moscow Institute of Physics and Technology (2004), where she was affiliated with the Institute for Information Transmission Problems of the Russian Academy of Sciences, a Master's degree in Electrical Engineering from the University of Ottawa (2006), and a PhD in Electrical Engineering from Princeton University (2013). She received the Natural Sciences and Engineering Research Council of Canada postgraduate scholarship (2009–2012), the Princeton Electrical Engineering Best Dissertation Award (2013), the Simons-Berkeley research fellowship (2015), and the NSF CAREER award (2017). Dr. Kostina's research spans information theory, coding, control and communications.

Babak Hassibi was born in Tehran, Iran, in 1967. He received the B.S. degree from the University of Tehran in 1989, and the M.S. and Ph.D. degrees from Stanford University in 1993 and 1996, respectively, all in electrical engineering.

He has been with the California Institute of Technology since January 2001, where he is currently the Mose and Lilian S. Bohn Professor of Electrical Engineering. From 2013–2016 he was the Gordon M. Binder/Amgen Professor of Electrical Engineering, and from 2008–2015 he was Executive Officer of Electrical Engineering, as well as Associate Director of Information Science and Technology. From October 1996 to October 1998 he was a research associate at the Information Systems Laboratory, Stanford University, and from November 1998 to December 2000 he was a Member of the Technical Staff in the Mathematical Sciences Research Center at Bell Laboratories, Murray Hill, NJ. He has also held short-term appointments at Ricoh California Research Center, the Indian Institute of Science, and Linkoping University, Sweden. His research interests include communications and information theory, control and network science, and signal processing and machine learning. He is the coauthor of the books (both with A. H. Sayed and T. Kailath) Indefinite Quadratic Estimation and Control: A Unified Approach to H2 and H∞ Theories (New York: SIAM, 1999) and Linear Estimation (Englewood Cliffs, NJ: Prentice Hall, 2000). He is a recipient of an Alborz Foundation Fellowship, the 1999 O. Hugo Schuck best paper award of the American Automatic Control Council (with H. Hindi and S. P. Boyd), the 2002 NSF Career Award, the 2002 Okawa Foundation Research Grant for Information and Telecommunications, the 2003 David and Lucille Packard Fellowship for Science and Engineering, the 2003 Presidential Early Career Award for Scientists and Engineers (PECASE), and the 2009 Al-Marai Award for Innovative Research in Communications, and was a participant in the 2004 National Academy of Engineering "Frontiers in Engineering" program.

He has been a Guest Editor for the IEEE TRANSACTIONS ON INFORMATION THEORY special issue on "space-time transmission, reception, coding and signal processing", was an Associate Editor for Communications of the IEEE TRANSACTIONS ON INFORMATION THEORY during 2004–2006, and is currently an Editor for the journal "Foundations and Trends in Information and Communication" and for the IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING.