University of Illinois at Chicago ECE 534, Natasha Devroye Chapter 7: Channel capacity



Page 1

Chapter 7: Channel capacity

Page 2

Chapter 7 outline

• Definition and examples of channel capacity

• Symmetric channels

• Channel capacity properties

• Definitions and jointly typical sequences

• Channel coding theorem: achievability and converse

• Hamming codes

• Channels with feedback

• Source-channel separation theorem

Page 3

Generic communication block diagram

Source → [Encoder: source coder → channel coder] → Channel (+ noise) → [Decoder: channel decoder → source decoder] → Destination

• Source coder: removes redundancy

• Channel coder: controlled adding of redundancy

• Channel decoder: decodes signals, detects/corrects errors

• Source decoder: restores the source

Page 4

Communication system

Message → Encoder → Channel (+ noise) → Decoder → Estimate of message

Page 5

Capacity: key ideas

• choose input set of codewords so they are “non-confusable” at the output

• the number of these that we can choose will determine the channel’s capacity!

• the number we can choose will depend on the distribution p(y|x), which characterizes the channel!

• for now we deal with discrete channels

Message → Encoder → Channel → Decoder → Estimate of message

Page 6

Discrete channel capacity

Channel

Page 7

Mathematical description of capacity

• Information channel capacity:

• Channel coding theorem says: information capacity = operational capacity

C = max_{p(x)} I(X; Y)

• Operational channel capacity:

Highest rate (bits/channel use) at which we can communicate reliably

Page 8

Capacity (bits/channel use):

C = max_{p(x)} I(X; Y),   where   I(X; Y) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]

X → Channel: p(y|x) → Y

Channel capacity: the hard part is to find the ``capacity-achieving input distribution.’’
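The maximization over p(x) rarely has a closed form; the standard numerical tool is the Blahut–Arimoto algorithm. Below is a minimal sketch (my own illustration, not from the slides) for a discrete memoryless channel given as a matrix W[x, y] = p(y|x); it assumes every output symbol is reachable:

```python
import numpy as np

def blahut_arimoto(W, iters=2000):
    """Capacity (bits/channel use) of a DMC with transition matrix W[x, y] = p(y|x)."""
    m = W.shape[0]
    p = np.full(m, 1.0 / m)                 # input distribution, start uniform
    for _ in range(iters):
        q = W * p[:, None]                  # unnormalized joint p(x) p(y|x)
        q /= q.sum(axis=0, keepdims=True)   # posterior q(x|y)
        # update: p(x) proportional to exp( sum_y p(y|x) log q(x|y) );
        # entries with p(y|x) = 0 contribute nothing to the sum
        logq = np.where(W > 0, np.log(np.where(q > 0, q, 1.0)), 0.0)
        p = np.exp((W * logq).sum(axis=1))
        p /= p.sum()
    Pxy = p[:, None] * W                    # joint distribution under this p
    Py = Pxy.sum(axis=0)
    mask = Pxy > 0
    C = (Pxy[mask] * np.log2(Pxy[mask] / (p[:, None] * Py[None, :])[mask])).sum()
    return C, p

C, p = blahut_arimoto(np.array([[0.9, 0.1], [0.1, 0.9]]))  # BSC with f = 0.1
print(C, p)   # ~0.531 bits/use = 1 - H(0.1), optimal input is uniform
```

For the symmetric channels on the following slides the optimum turns out to be the uniform input, but the same routine handles channels (such as the Z-channel) where it is not.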

Page 9

Noiseless channel

0 → 0
1 → 1

Capacity? 1 bit/channel use

Channel capacity

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [H(X) − H(X|Y)]
  = max_{p(x)} H(X) − 0
  = 1

Page 10

[Channel diagram: inputs 0 and 1; each input maps with probability 1/2 to one of two outputs, and the two inputs’ output sets do not overlap.]

Non-overlapping outputs

Capacity? 1 bit/channel use

Channel capacity

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [H(X) − H(X|Y)]
  = max_{p(x)} H(X) − 0
  = 1

Page 11

[Figure: noisy typewriter. The 27 characters (A–Z and ‘.’) are arranged in a circle; each received character is consistent with 3 possible inputs, so H(X|Y) = log2(3).]

Noisy typewriter

Capacity?

Channel capacity

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [H(X) − H(X|Y)]
  = max_{p(x)} H(X) − log2(3)
  = log2(27) − log2(3) = log2(9) bits/channel use

Page 12

0 → 0 and 1 → 1 with probability 1 − f; 0 → e and 1 → e with probability f

Binary erasure channel

Capacity? 1 − f bits/channel use

Channel capacity


Page 13


Conditional distributions:
p(y=0|x=0) = p(y=1|x=1) = 1 − f
p(y=1|x=0) = p(y=0|x=1) = f

Capacity? 1 − H(f) bits/channel use

Binary symmetric channel: channel capacity

[Cover+Thomas pg.187]
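A quick numerical check (my own sketch, not from the slides) of the last two capacities via the binary entropy function:

```python
import numpy as np

def H2(f):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    f = np.clip(f, 1e-12, 1 - 1e-12)
    return -f * np.log2(f) - (1 - f) * np.log2(1 - f)

f = 0.1
print(f"BSC capacity 1 - H(f) = {1 - H2(f):.4f} bits/channel use")  # ~0.5310
print(f"BEC capacity 1 - f   = {1 - f:.4f} bits/channel use")       # 0.9000
```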

Page 14

Transition probability matrices

Binary symmetric channel: X = {0, 1}, Y = {0, 1}

    [ 1 − f    f   ]
    [   f    1 − f ]

Binary erasure channel: X = {0, 1}, Y = {0, ?, 1}

    [ 1 − f    f     0   ]
    [   0      f   1 − f ]

Z-channel: X = {0, 1}, Y = {0, 1}

    [ 1      0   ]
    [ f    1 − f ]

[B. Smida (ES250), Channel Capacity, Fall 2008-09]
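The Z-channel has no symmetry to exploit, and its capacity-achieving input distribution is not uniform. A brute-force sketch (my own, with an assumed crossover f = 0.1) finds its capacity by a one-dimensional search over p = P(X = 1):

```python
import numpy as np

f = 0.1                                    # P(Y=0 | X=1); X=0 is received noiselessly
p = np.linspace(1e-6, 1 - 1e-6, 100_000)   # candidate values of P(X=1)

h2 = lambda t: -t * np.log2(t) - (1 - t) * np.log2(1 - t)   # binary entropy
py1 = p * (1 - f)                          # P(Y=1)
I = h2(py1) - p * h2(f)                    # I(X;Y) = H(Y) - H(Y|X), H(Y|X=0) = 0

i = I.argmax()
print(f"C = {I[i]:.4f} bits/use at P(X=1) = {p[i]:.3f}")    # optimum is below 1/2
```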


Page 15

Symmetric channels


Hence

C = max_{p(x)} H(Y) − H(α)                 (7.13)
  = max_π (1 − α)H(π) + H(α) − H(α)        (7.14)
  = max_π (1 − α)H(π)                      (7.15)
  = 1 − α,                                 (7.16)

where capacity is achieved by π = 1/2.

The expression for the capacity has some intuitive meaning: Since a proportion α of the bits are lost in the channel, we can recover (at most) a proportion 1 − α of the bits. Hence the capacity is at most 1 − α. It is not immediately obvious that it is possible to achieve this rate. This will follow from Shannon’s second theorem.

In many practical channels, the sender receives some feedback from the receiver. If feedback is available for the binary erasure channel, it is very clear what to do: If a bit is lost, retransmit it until it gets through. Since the bits get through with probability 1 − α, the effective rate of transmission is 1 − α. In this way we are easily able to achieve a capacity of 1 − α with feedback.

Later in the chapter we prove that the rate 1 − α is the best that can be achieved both with and without feedback. This is one of the consequences of the surprising fact that feedback does not increase the capacity of discrete memoryless channels.
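A toy simulation of this retransmission strategy (my own sketch, with an assumed α = 0.3) confirms that the effective rate approaches 1 − α:

```python
import random

random.seed(0)
alpha, n_bits, uses = 0.3, 100_000, 0    # erasure probability, bits to send

for _ in range(n_bits):
    while True:                          # feedback: retransmit until not erased
        uses += 1
        if random.random() >= alpha:
            break

print(f"effective rate = {n_bits / uses:.3f} bits/use (1 - alpha = {1 - alpha})")
```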

7.2 SYMMETRIC CHANNELS

The capacity of the binary symmetric channel is C = 1 − H(p) bits per transmission, and the capacity of the binary erasure channel is C = 1 − α bits per transmission. Now consider the channel with transition matrix:

p(y|x) = [ 0.3  0.2  0.5 ]
         [ 0.5  0.3  0.2 ]      (7.17)
         [ 0.2  0.5  0.3 ]

Here the entry in the xth row and the yth column denotes the conditional probability p(y|x) that y is received when x is sent. In this channel, all the rows of the probability transition matrix are permutations of each other and so are the columns. Such a channel is said to be symmetric. Another example of a symmetric channel is one of the form

Y = X + Z (mod c), (7.18)


where Z has some distribution on the integers {0, 1, 2, . . . , c − 1}, X has the same alphabet as Z, and Z is independent of X.

In both these cases, we can easily find an explicit expression for the capacity of the channel. Letting r be a row of the transition matrix, we have

I(X; Y) = H(Y) − H(Y|X)        (7.19)
        = H(Y) − H(r)          (7.20)
        ≤ log |Y| − H(r),      (7.21)

with equality if the output distribution is uniform. But p(x) = 1/|X| achieves a uniform distribution on Y, as seen from

p(y) = Σ_{x∈X} p(y|x) p(x) = (1/|X|) Σ_x p(y|x) = c (1/|X|) = 1/|Y|,      (7.22)

where c is the sum of the entries in one column of the probability transition matrix.

Thus, the channel in (7.17) has the capacity

C = max_{p(x)} I(X; Y) = log 3 − H(0.5, 0.3, 0.2),      (7.23)

and C is achieved by a uniform distribution on the input. The transition matrix of the symmetric channel defined above is doubly stochastic. In the computation of the capacity, we used the facts that the rows were permutations of one another and that all the column sums were equal.

Considering these properties, we can define a generalization of the concept of a symmetric channel as follows:

Definition. A channel is said to be symmetric if the rows of the channel transition matrix p(y|x) are permutations of each other and the columns are permutations of each other. A channel is said to be weakly symmetric if every row of the transition matrix p(·|x) is a permutation of every other row and all the column sums Σ_x p(y|x) are equal.

For example, the channel with transition matrix

p(y|x) = [ 1/3  1/6  1/2 ]
         [ 1/3  1/2  1/6 ]      (7.24)

is weakly symmetric but not symmetric.

Page 16

Capacity of weakly symmetric channels
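For a weakly symmetric channel the capacity is C = log |Y| − H(row of transition matrix), achieved by the uniform input distribution. A small check (my own sketch) on the two matrices from the previous page, (7.17) and (7.24):

```python
import numpy as np

def weakly_symmetric_capacity(W):
    """C = log|Y| - H(row); valid when every row is a permutation of every
    other row and all column sums are equal (uniform input is then optimal)."""
    row = W[0]
    return np.log2(W.shape[1]) + (row * np.log2(row)).sum()

W_sym  = np.array([[0.3, 0.2, 0.5],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.5, 0.3]])       # symmetric, eq. (7.17)
W_weak = np.array([[1/3, 1/6, 1/2],
                   [1/3, 1/2, 1/6]])       # weakly symmetric, eq. (7.24)

print(weakly_symmetric_capacity(W_sym))    # log2(3) - H(0.5,0.3,0.2) ~ 0.0995
print(weakly_symmetric_capacity(W_weak))   # log2(3) - H(1/3,1/6,1/2) ~ 0.1258
```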

Page 17

Properties of the channel capacity

• C ≥ 0, since I(X; Y) ≥ 0

• C ≤ log |X|, since C = max_{p(x)} I(X; Y) ≤ max_{p(x)} H(X) = log |X|

• C ≤ log |Y|, by the same argument applied to H(Y)

• I(X; Y) is a continuous concave function of p(x), so the maximum in the definition of C is attained

Page 18

Preview of the channel coding theorem

• What happens when we use the channel n times?

Page 19

Preview of the channel coding theorem


An average input sequence corresponds to about 2^{nH(Y|X)} typical output sequences.

There are a total of 2^{nH(Y)} typical output sequences.

For nearly error-free transmission, we select a set of input sequences whose corresponding sets of output sequences hardly overlap.

The maximum number of distinct sets of output sequences is 2^{n(H(Y)−H(Y|X))} = 2^{nI(X;Y)}.

[B. Smida (ES250), Channel Capacity, Fall 2008-09]
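A numeric version of this counting argument (my own illustration, assuming a BSC with crossover f = 0.11, uniform inputs, and n = 1000):

```python
import numpy as np

f, n = 0.11, 1000
H2 = lambda t: -t * np.log2(t) - (1 - t) * np.log2(1 - t)
HY, HYgX = 1.0, H2(f)    # uniform input on a BSC makes Y uniform: H(Y) = 1 bit

print(f"about 2^{n * HY:.0f} typical output sequences in total")
print(f"about 2^{n * HYgX:.0f} typical outputs per input sequence")
print(f"=> at most 2^{n * (HY - HYgX):.0f} non-confusable inputs, "
      f"rate ~ {HY - HYgX:.2f} bits/use")
```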

Page 20

Let’s make this rigorous!

Page 21

Definitions

Channel

Page 22

Definitions

Message → Encoder → Channel → Decoder → Estimate of message

Page 23

Definitions

Message → Encoder → Channel → Decoder → Estimate of message

Page 24

Definitions

Message → Encoder → Channel → Decoder → Estimate of message

Page 25

What’s our goal?

Page 26

Mathematical description of capacity

• Information channel capacity:

• Channel coding theorem says: information capacity = operational capacity

C = max_{p(x)} I(X; Y)

• Operational channel capacity:

Highest rate (bits/channel use) at which we can communicate reliably

Page 27

Recall the definition of typical sequences....

Let’s make this 2-D!

Page 28

Jointly typical sequences

Page 29

Joint Asymptotic Equipartition Theorem (AEP)

Page 30

Joint typicality images

Jointly Typical Diagram

There are about 2^{nH(X)} typical X in all.

Each typical Y is jointly typical with about 2^{nH(X|Y)} of those typical X’s.

The jointly typical pairs are a fraction 2^{−nI(X;Y)} of the inner rectangle.

[B. Smida (ES250), Channel Capacity, Fall 2008-09]

Page 31

Channel coding theorem

Page 32

Key ideas behind channel coding theorem

• Allow for arbitrarily small but nonzero probability of error

• Use channel many times in succession: law of large numbers!

• Probability of error calculated over a random choice of codebooks

• Joint typicality decoders used for simplicity of proof

• NOT constructive! Does NOT tell us how to code to achieve capacity!

Page 33

Key ideas behind the channel coding theorem

Page 34

Random codes

Message → Encoder → Channel → Decoder → Estimate of message

Page 35

Transmission

Message → Encoder → Channel → Decoder → Estimate of message

Page 36

Probability of error

Message → Encoder → Channel → Decoder → Estimate of message

Page 37

Probability of error

Message → Encoder → Channel → Decoder → Estimate of message

Page 38

Random codes?

Message → Encoder → Channel → Decoder → Estimate of message

Page 39

Analogy....


[Figure 10.2: The jointly-typical set. The horizontal direction represents A^N_X, the set of all input strings of length N. The vertical direction represents A^N_Y, the set of all output strings of length N. The outer box contains all conceivable input–output pairs. Each dot represents a jointly-typical pair of sequences (x, y). The total number of jointly-typical sequences is about 2^{NH(X,Y)}.]

10.3 Proof of the noisy-channel coding theorem

Analogy

Imagine that we wish to prove that there is a baby in a class of one hundred babies who weighs less than 10 kg. Individual babies are difficult to catch and weigh. Shannon’s method of solving the task is to scoop up all the babies and weigh them all at once on a big weighing machine. If we find that their average weight is smaller than 10 kg, there must exist at least one baby who weighs less than 10 kg; indeed there must be many! Shannon’s method isn’t guaranteed to reveal the existence of an underweight child, since it relies on there being a tiny number of elephants in the class. But if we use his method and get a total weight smaller than 1000 kg, then our task is solved.

[Figure 10.3: Shannon’s method for proving one baby weighs less than 10 kg.]

From skinny children to fantastic codes

We wish to show that there exists a code and a decoder having small probability of error. Evaluating the probability of error of any particular coding and decoding system is not easy. Shannon’s innovation was this: instead of constructing a good coding and decoding system and evaluating its error probability, Shannon calculated the average probability of block error of all codes, and proved that this average is small. There must then exist individual codes that have small probability of block error.

Random coding and typical-set decoding

Consider the following encoding–decoding system, whose rate is R′.

1. We fix P(x) and generate the S = 2^{NR′} codewords of a (N, NR′) = …

[Mackay textbook, pg. 164]

Page 40

Channel coding theorem

Page 41

Converse to the channel coding theorem

• Based on Fano’s inequality:

Page 42

Converse to the channel coding theorem

• Need one more result before we prove the converse:

• What does this mean?

Now let’s prove the channel coding converse

Page 43

Converse to the channel coding theorem

Page 44

Weak versus strong converses

• Weak converse:

• Strong converse:

• Channel capacity: a sharp dividing point, below which the probability of error goes to zero exponentially fast, and above which it goes to one exponentially fast.

Page 45

Practical coding schemes

Source → Source coder → Channel coder → Channel (Noise) → Channel decoder → Source decoder → Destination
(Encoder = source coder + channel coder; Decoder = channel decoder + source decoder)

Page 46

Example: channel coding

With permission from David J.C. Mackay

Page 47

Example: channel coding

With permission from David J.C. Mackay

Rate?

R = # source bits / # coded bits

Page 48

Example: channel coding

With permission from David J.C. Mackay

Use repetition code of rate R=1/n: 0 → 000...0 1 → 111...1

Decoder? Majority vote

[n = 2m + 1] Probability of error?

P_e = Σ_{i=m+1}^{n} (n choose i) f^i (1 − f)^{n−i}

Need n →∞ for reliable communication!
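A short sketch (my own, assuming a BSC with f = 0.1) that evaluates the error probability above for a few block lengths; P_e → 0 only as n → ∞, while the rate 1/n → 0 as well:

```python
from math import comb

def p_err(f, n):
    """Majority-vote error probability of the rate-1/n repetition code, n = 2m+1."""
    m = n // 2
    return sum(comb(n, i) * f**i * (1 - f)**(n - i) for i in range(m + 1, n + 1))

f = 0.1
for n in (1, 3, 5, 11, 61):
    print(f"n = {n:2d}: rate = 1/{n}, Pe = {p_err(f, n):.2e}")
```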

Page 49

Channel capacity

• Is capacity R = 0?

• No! just need better coding!

• Now, we’re more interested in determining capacity than in finding codes that achieve it

• Benchmarks

Page 50

Practical coding schemes

Source → Source coder → Channel coder → Channel (Noise) → Channel decoder → Source decoder → Destination
(Encoder = source coder + channel coder; Decoder = channel decoder + source decoder)

Page 51

Linear block codes

Page 52

Properties of linear block codes

Page 53

Properties of linear block codes

Page 54

Examples

Page 55

Hamming codes

• # of codewords?

• what are the codewords?


the block as a 1; otherwise, we decode it as 0. An error occurs if and only if more than three of the bits are changed. By using longer repetition codes, we can achieve an arbitrarily low probability of error. But the rate of the code also goes to zero with block length, so even though the code is “simple,” it is really not a very useful code.

Instead of simply repeating the bits, we can combine the bits in some intelligent fashion so that each extra bit checks whether there is an error in some subset of the information bits. A simple example of this is a parity check code. Starting with a block of n − 1 information bits, we choose the nth bit so that the parity of the entire block is 0 (the number of 1’s in the block is even). Then if there is an odd number of errors during the transmission, the receiver will notice that the parity has changed and detect the error. This is the simplest example of an error-detecting code. The code does not detect an even number of errors and does not give any information about how to correct the errors that occur.

We can extend the idea of parity checks to allow for more than one parity check bit and to allow the parity checks to depend on various subsets of the information bits. The Hamming code that we describe below is an example of a parity check code. We describe it using some simple ideas from linear algebra.

To illustrate the principles of Hamming codes, we consider a binary code of block length 7. All operations will be done modulo 2. Consider the set of all nonzero binary vectors of length 3. Arrange them in columns to form a matrix:

H = [ 0 0 0 1 1 1 1 ]
    [ 0 1 1 0 0 1 1 ]      (7.117)
    [ 1 0 1 0 1 0 1 ]

Consider the set of vectors of length 7 in the null space of H (the vectors which when multiplied by H give 000). From the theory of linear spaces, since H has rank 3, we expect the null space of H to have dimension 4. These 2^4 = 16 codewords are

0000000  0100101  1000011  1100110
0001111  0101010  1001100  1101001
0010110  0110011  1010101  1110000
0011001  0111100  1011010  1111111

Since the set of codewords is the null space of a matrix, it is linear in the sense that the sum of any two codewords is also a codeword. The set of codewords therefore forms a linear subspace of dimension 4 in the vector space of dimension 7.
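A small sketch (my own illustration) that recovers these 16 codewords as the null space of H over GF(2), then corrects a single bit error by syndrome decoding; it uses the fact that column j of H is the binary representation of j:

```python
import numpy as np
from itertools import product

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

# all length-7 binary vectors v with H v = 0 (mod 2)
codewords = [np.array(v) for v in product([0, 1], repeat=7)
             if not (H @ np.array(v) % 2).any()]
print(len(codewords))                # 16 = 2^4 codewords, as listed above

r = codewords[5].copy()
r[2] ^= 1                            # corrupt one bit
s = H @ r % 2                        # syndrome = error position, read in binary
pos = int("".join(map(str, s)), 2)   # 1-indexed position of the flipped bit
r[pos - 1] ^= 1                      # flip it back
assert (r == codewords[5]).all()     # corrected
```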


Page 56

Achieving capacity

• Linear block codes: not good enough...

• Convolutional codes: not good enough...


Page 57

Achieving capacity

• Linear block codes: not good enough...

• Convolutional codes: not good enough...

• Turbo codes: in 1993, Berrou et al. considered two interleaved convolutional codes with parallel cooperating decoders. Achieved close to capacity!

• LDPC codes: Low-Density Parity-Check codes, introduced by Gallager in his 1963 thesis, later kept alive by Michael Tanner (UIC’s former provost!) in the 80s, and then “re-discovered” in the 90s, when an iterative message-passing decoding algorithm for them was formulated. These also achieve rates close to capacity!

• An excellent survey is the paper below (linked to on the course website)

• UPDATE: in 2009, Arikan introduced polar codes, the first codes with an explicit construction that provably achieves the channel capacity of symmetric binary-input, discrete, memoryless channels


G. D. Forney and D. J. Costello, “Channel Coding: The Road to Channel Capacity,” Proceedings of the IEEE, vol. 95, no. 6, pp. 1150-1177, June 2007. DOI: 10.1109/JPROC.2007.895188.

Page 58

Feedback capacity

Channel without feedback:
Message → Encoder → Channel → Decoder → Estimate of message

Channel WITH feedback:
Message → Encoder → Channel → Decoder → Estimate of message, with the channel outputs also fed back to the encoder

Page 59

Feedback capacity

Page 60

Feedback capacity

Channel without feedback:
Message → Encoder → Channel → Decoder → Estimate of message

Channel WITH feedback:
Message → Encoder → Channel → Decoder → Estimate of message, with the channel outputs also fed back to the encoder

Page 61

Source-channel separation

• When are we allowed to design the source and channel coder separately AND remain optimal from an end-to-end perspective?

Source → Encoder → Channel (Noise) → Decoder → Destination

Source → Source coder → Channel coder → Channel (Noise) → Channel decoder → Source decoder → Destination
(Encoder = source coder + channel coder; Decoder = channel decoder + source decoder)

Page 62

Source-channel separation

Page 63

Source-channel separation: achievability

Source Encoder Channel Decoder Destination

Page 64

Source-channel separation: converse

Page 65

Source-channel separation

Source → Encoder → Channel (Noise) → Decoder → Destination

Source → Source coder → Channel coder → Channel (Noise) → Channel decoder → Source decoder → Destination
(Encoder = source coder + channel coder; Decoder = channel decoder + source decoder)

Page 66


FIGURE 7.15. Separate source and channel coding:
V^n → Source Encoder → X^n(V^n) → Channel p(y|x) → Y^n → Channel Decoder → Source Decoder → V̂^n

compression theorem is a consequence of the AEP, which shows that there exists a “small” subset (of size 2^{nH}) of all possible source sequences that contain most of the probability and that we can therefore represent the source with a small probability of error using H bits per symbol. The data transmission theorem is based on the joint AEP; it uses the fact that for long block lengths, the output sequence of the channel is very likely to be jointly typical with the input codeword, while any other codeword is jointly typical with probability ≈ 2^{−nI}. Hence, we can use about 2^{nI} codewords and still have negligible probability of error. The source–channel separation theorem shows that we can design the source code and the channel code separately and combine the results to achieve optimal performance.

SUMMARY

Channel capacity. The logarithm of the number of distinguishable inputs is given by

C = max_{p(x)} I(X; Y).

Examples

• Binary symmetric channel: C = 1 − H(p).

• Binary erasure channel: C = 1 − α.

• Symmetric channel: C = log |Y| − H(row of transition matrix).

Properties of C

1. 0 ≤ C ≤ min{log |X|, log |Y|}.
2. I(X; Y) is a continuous concave function of p(x).

Joint typicality. The set A_ϵ^{(n)} of jointly typical sequences {(x^n, y^n)} with respect to the distribution p(x, y) is given by

A_ϵ^{(n)} = { (x^n, y^n) ∈ X^n × Y^n :                  (7.151)
    | −(1/n) log p(x^n) − H(X) | < ϵ,                   (7.152)

Page 67


    | −(1/n) log p(y^n) − H(Y) | < ϵ,                   (7.153)
    | −(1/n) log p(x^n, y^n) − H(X, Y) | < ϵ },         (7.154)

where p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i).

Joint AEP. Let (X^n, Y^n) be sequences of length n drawn i.i.d. according to p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i). Then:

1. Pr((X^n, Y^n) ∈ A_ϵ^{(n)}) → 1 as n → ∞.
2. |A_ϵ^{(n)}| ≤ 2^{n(H(X,Y)+ϵ)}.
3. If (X̃^n, Ỹ^n) ∼ p(x^n)p(y^n) (independent with the same marginals), then Pr((X̃^n, Ỹ^n) ∈ A_ϵ^{(n)}) ≤ 2^{−n(I(X;Y)−3ϵ)}.

Channel coding theorem. All rates below capacity C are achievable, and all rates above capacity are not; that is, for all rates R < C, there exists a sequence of (2^{nR}, n) codes with probability of error λ^{(n)} → 0. Conversely, for rates R > C, λ^{(n)} is bounded away from 0.

Feedback capacity. Feedback does not increase capacity for discrete memoryless channels (i.e., C_FB = C).

Source–channel theorem. A stochastic process with entropy rate H cannot be sent reliably over a discrete memoryless channel if H > C. Conversely, if the process satisfies the AEP, the source can be transmitted reliably if H < C.

PROBLEMS

7.1 Preprocessing the output. One is given a communication channel with transition probabilities p(y|x) and channel capacity C = max_{p(x)} I(X; Y). A helpful statistician preprocesses the output by forming Ỹ = g(Y). He claims that this will strictly improve the capacity.
(a) Show that he is wrong.
(b) Under what conditions does he not strictly decrease the capacity?
