ECE 515 Information Theory
Channel Capacity and Coding
Information Theory Problems
• How to transmit or store information as efficiently as possible.
• What is the maximum amount of information that can be transmitted or stored reliably?
• How can information be kept secure?
Digital Communications System
[Block diagram: a data source emits message W, which is encoded as channel input X (distribution PX); the channel PY|X produces output Y (distribution PY), from which the estimate Ŵ is delivered to the data sink.]
Ŵ is an estimate of W
Communication Channel
Discrete Memoryless Channel
[Transition diagram: inputs x1, x2, ..., xK map probabilistically to outputs y1, y2, ..., yJ, with J ≥ K ≥ 2.]
Discrete Memoryless Channel
Channel Transition Matrix
$$P_{Y|X} = \begin{bmatrix} p(y_1|x_1) & \cdots & p(y_1|x_k) & \cdots & p(y_1|x_K) \\ \vdots & & \vdots & & \vdots \\ p(y_j|x_1) & \cdots & p(y_j|x_k) & \cdots & p(y_j|x_K) \\ \vdots & & \vdots & & \vdots \\ p(y_J|x_1) & \cdots & p(y_J|x_k) & \cdots & p(y_J|x_K) \end{bmatrix}$$
Binary Symmetric Channel
[Transition diagram: x1 → y1 and x2 → y2 with probability 1 − p; x1 → y2 and x2 → y1 with probability p.]
channel matrix
$$P_{Y|X} = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
Binary Errors with Erasure Channel
[Transition diagram: inputs x1 = 0, x2 = 1; outputs y1 = 0, y2 = e, y3 = 1. Each input is received correctly with probability 1 − p − q, erased (output e) with probability q, and received as the other bit with probability p.]
BPSK Modulation K = 2
$$x_1(t) = +\sqrt{\frac{2E_b}{T_b}}\cos(2\pi f_c t) \qquad \text{0 bit}$$
$$x_2(t) = -\sqrt{\frac{2E_b}{T_b}}\cos(2\pi f_c t) \qquad \text{1 bit}$$
Eb is the energy per bit, Tb is the bit duration, fc is the carrier frequency
BPSK Demodulation in AWGN
[Figure: the conditional pdfs p(y|x1) and p(y|x2), centered at +√Eb and −√Eb.]
$$p_{Y|X}(y|x_k) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y-x_k)^2}{2\sigma^2}\right)$$
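With hard decisions (J = 2) the decision threshold is at y = 0, and the induced crossover probability follows from this pdf as p = Q(√Eb/σ). A minimal numerical sketch (my own code, not from the slides):

```python
# BSC crossover probability for hard-decision BPSK (a sketch). With
# x_k = +/- sqrt(Eb) and Gaussian noise of variance sigma^2, the ML
# threshold is y = 0 and the error probability is Q(sqrt(Eb)/sigma).
from math import erfc, sqrt

def crossover(Eb: float, sigma: float) -> float:
    """Q(z) at z = sqrt(Eb)/sigma, using Q(z) = erfc(z/sqrt(2))/2."""
    return 0.5 * erfc(sqrt(Eb) / (sigma * sqrt(2.0)))

print(crossover(Eb=1.0, sigma=0.5))  # Q(2) ~ 0.0228
```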
BPSK Demodulation K = 2
[Figures: quantizing the demodulator output into J = 3 regions (y1, y2, y3), J = 4 regions (y1, ..., y4), and J = 8 regions (y1, ..., y8) around −√Eb and +√Eb.]
Mutual Information for a BSC
[Diagram: X → BSC → Y, crossover probability p.]
$$\bar{p} = 1 - p$$
$$p(x = 0) = w, \qquad p(x = 1) = 1 - w = \bar{w}$$
channel matrix
$$P_{Y|X} = \begin{bmatrix} \bar{p} & p \\ p & \bar{p} \end{bmatrix}$$
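With these definitions I(X;Y) = h(wp̄ + w̄p) − h(p); this is the surface of (w, p) plotted on the later "BSC I(X;Y)" slide. A small sketch (my own code):

```python
# I(X;Y) for the BSC in terms of w = p(x=0) and crossover probability p
# (a sketch of the quantities defined above; logs are base 2).
from math import log2

def h(q: float) -> float:
    """Binary entropy function in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def bsc_mutual_information(w: float, p: float) -> float:
    # p(y = 0) = w(1-p) + (1-w)p; I(X;Y) = H(Y) - H(Y|X) = h(p(y=0)) - h(p)
    return h(w * (1 - p) + (1 - w) * p) - h(p)

print(bsc_mutual_information(0.5, 0.01))  # ~0.919, the capacity for p = 0.01
```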
Convex Functions
[Figure: examples of a concave function and a convex function.]
Concave Function
[Figure: a concave function lies above its chords.]
Convex Function
[Figure: a convex function lies below its chords.]
Mutual Information
I(X;Y) is a concave function of the input probabilities p(xk) for fixed channel transition probabilities, and a convex function of the channel transition probabilities p(yj|xk) for fixed input probabilities.
BSC I(X;Y)
[Surface plot: I(X;Y) as a function of the input probability w and the crossover probability p.]
Channel Capacity
[Diagram: X → channel → Y.]
The capacity is the maximum of the mutual information over all input distributions:
$$C = \max_{p(x)} I(X;Y)$$
Binary Symmetric Channel
[Transition diagram as before, crossover probability p.]
C = 1 − h(p)
Properties of the Channel Capacity
• C ≥ 0 since I(X;Y) ≥ 0
• C ≤ log|X| = log(K) since C = max I(X;Y) ≤ max H(X) = log(K)
• C ≤ log|Y| = log(J) for the same reason
• I(X;Y) is a concave function of p(X), so a local maximum is a global maximum
Symmetric Channels
A discrete memoryless channel is said to be symmetric if the set of output symbols {yj}, j = 1, 2, ..., J, can be partitioned into subsets such that for each subset of the matrix of transition probabilities, each column is a permutation of the other columns, and each row is also a permutation of the other rows.
Binary Channels
Symmetric channel matrix
$$P = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
Non-symmetric channel matrix
$$P = \begin{bmatrix} 1-p_1 & p_1 \\ p_2 & 1-p_2 \end{bmatrix}, \qquad p_1 \neq p_2$$
Binary Errors with Erasure Channel
[Transition diagram repeated: correct reception with probability 1 − p − q, erasure with probability q, bit error with probability p.]
Symmetric Channels
• No partition required → strongly symmetric
• Partition required → weakly symmetric
Capacity of a Strongly Symmetric Channel
Theorem
$$I(X;Y) = \sum_{k=1}^{K}\sum_{j=1}^{J} p(x_k)\,p(y_j|x_k)\log p(y_j|x_k) + H(Y) = \sum_{j=1}^{J} p(y_j|x_k)\log p(y_j|x_k) + H(Y)$$
since the inner sum is the same for every input xk. For a strongly symmetric channel, equiprobable inputs make the outputs equiprobable, so H(Y) achieves its maximum log J and
$$C = \log J + \sum_{j=1}^{J} p(y_j|x_k)\log p(y_j|x_k)$$
Example J = K = 3
[Transition diagram: each input xk is received as yk with probability .7, and as the other two outputs with probabilities .1 and .2.]
Example
$$P_{Y|X} = \begin{bmatrix} .7 & .2 & .1 \\ .1 & .7 & .2 \\ .2 & .1 & .7 \end{bmatrix}$$
$$\sum_{k=1}^{K} p(x_k)\sum_{j=1}^{J} p(y_j|x_k)\log p(y_j|x_k)$$
$$= p(x_1)\left[.7\log .7 + .1\log .1 + .2\log .2\right] + p(x_2)\left[.2\log .2 + .7\log .7 + .1\log .1\right] + p(x_3)\left[.1\log .1 + .2\log .2 + .7\log .7\right]$$
$$= .7\log .7 + .2\log .2 + .1\log .1$$
Example
$$H(Y) = -\sum_{j=1}^{J} p(y_j)\log p(y_j)$$
$$p(y_1) = \sum_{k=1}^{K} p(y_1|x_k)\,p(x_k), \qquad p(y_2) = \sum_{k=1}^{K} p(y_2|x_k)\,p(x_k), \qquad \ldots, \qquad p(y_J) = \sum_{k=1}^{K} p(y_J|x_k)\,p(x_k)$$
$$p(y_1) = .7p(x_1) + .2p(x_2) + .1p(x_3)$$
$$p(y_2) = .1p(x_1) + .7p(x_2) + .2p(x_3)$$
$$p(y_3) = .2p(x_1) + .1p(x_2) + .7p(x_3)$$
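The example can be completed (my own arithmetic, using the capacity-achieving uniform input for this strongly symmetric channel):
$$p(x_k) = \tfrac{1}{3} \;\Rightarrow\; p(y_j) = \tfrac{1}{3}, \qquad H(Y) = \log_2 3$$
$$C = \log_2 3 + .7\log_2 .7 + .2\log_2 .2 + .1\log_2 .1 \approx 1.5850 - 1.1568 = 0.4282 \ \text{bits}$$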
r-ary Symmetric Channel
$$P = \begin{bmatrix} 1-p & \frac{p}{r-1} & \cdots & \frac{p}{r-1} \\ \frac{p}{r-1} & 1-p & \cdots & \frac{p}{r-1} \\ \vdots & & \ddots & \vdots \\ \frac{p}{r-1} & \frac{p}{r-1} & \cdots & 1-p \end{bmatrix}$$
r-ary Symmetric Channel
$$C = \log r + (1-p)\log(1-p) + (r-1)\frac{p}{r-1}\log\frac{p}{r-1}$$
$$= \log r + (1-p)\log(1-p) + p\log p - p\log(r-1)$$
$$= \log r - h(p) - p\log(r-1)$$
• If r = 2, C = 1 − h(p)
• If r = 3, C = log₂3 − h(p) − p
• If r = 4, C = 2 − h(p) − p log₂3 (these cases are checked numerically below)
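A quick numerical check of the general formula against the three special cases (my own sketch):

```python
# Check C = log2(r) - h(p) - p*log2(r-1) against the cases listed above.
from math import log2

def h(q):
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def c_rary(r, p):
    return log2(r) - h(p) - p * log2(r - 1)   # log2(1) = 0 handles r = 2

p = 0.1
print(c_rary(2, p), 1 - h(p))                  # r = 2
print(c_rary(3, p), log2(3) - h(p) - p)        # r = 3
print(c_rary(4, p), 2 - h(p) - p * log2(3))    # r = 4
```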
Binary Errors with Erasure Channel
[Transition diagram repeated; the example that follows uses p = .05 and q = .15.]
Binary Errors with Erasure Channel
$$P_{Y|X} = \begin{bmatrix} .8 & .05 \\ .15 & .15 \\ .05 & .8 \end{bmatrix}$$
$$P_Y = P_{Y|X} \times P_X = \begin{bmatrix} .8 & .05 \\ .15 & .15 \\ .05 & .8 \end{bmatrix}\begin{bmatrix} .5 \\ .5 \end{bmatrix} = \begin{bmatrix} .425 \\ .15 \\ .425 \end{bmatrix}$$
Capacity of a Weakly Symmetric Channel
• qi – probability of sub-channel i
• Ci – capacity of sub-channel i
$$C = \sum_{i=1}^{L} q_i C_i$$
Binary Errors with Erasure Channel
[Diagram: the channel is split into two sub-channels, a BSC on the non-erasure outputs {0, 1} and a single-output channel producing the erasure e.]
$$P_1 = \begin{bmatrix} .9412 & .0588 \\ .0588 & .9412 \end{bmatrix} \qquad P_2 = \begin{bmatrix} 1.0 & 1.0 \end{bmatrix}$$
Binary Errors with Erasure Channel
$$C = \sum_{i=1}^{L} q_i C_i$$
$$C_1 = .9412\log .9412 + .0588\log .0588 + \log 2 = .6773$$
$$C_2 = 1\log 1 + \log 1 = 0.0$$
$$C = .85(.6773) + .15(0.0) = .5757$$
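The same numbers can be reproduced directly (my own check of the computation above):

```python
# Weakly symmetric capacity of the binary-errors-with-erasure example.
from math import log2

a = 0.9412                 # = .8/.85, the non-erasure BSC transition probability
C1 = 1 + a * log2(a) + (1 - a) * log2(1 - a)   # strongly symmetric formula, J = 2
C2 = 0.0                   # the erasure-only sub-channel carries no information

C = 0.85 * C1 + 0.15 * C2
print(round(C1, 4), round(C, 4))   # 0.6773 0.5757
```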
Binary Erasure Channel
[Transition diagram: each input is received correctly with probability 1 − p and erased (output e) with probability p.]
C = 1 − p
Z Channel (Optical)
[Transition diagram: input 0 (light on) is always received as 0; input 1 (light off) is received as 1 with probability 1 − p and as 0 with probability p.]
Z Channel (Optical)
$$C = \log_2\left(1 + (1-p)\,p^{p/(1-p)}\right)$$
$$I(X;Y) = \sum_{k=1}^{2}\sum_{j=1}^{2} p(x_k)\,p(y_j|x_k)\log\frac{p(y_j|x_k)}{p(y_j)}$$
$$I(x_1;Y) = \log\frac{1}{w + (1-w)p}$$
$$I(x_2;Y) = p\log\frac{p}{w + (1-w)p} + (1-p)\log\frac{1-p}{(1-w)(1-p)}$$
Setting I(x1;Y) = I(x2;Y) gives the capacity-achieving input probability
$$w = 1 - \frac{1}{(1-p)\left(1 + 2^{h(p)/(1-p)}\right)}$$
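The closed form can be checked against a brute-force maximization of I(X;Y) over the input distribution (my own numerical sketch):

```python
# Z-channel capacity: closed form vs. brute-force search over w = p(x=0).
from math import log2

def h(q):
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def I_z(w, p):
    # Output 1 occurs with probability (1-w)(1-p); I(X;Y) = H(Y) - H(Y|X)
    return h((1 - w) * (1 - p)) - (1 - w) * h(p)

p = 0.1
C_closed = log2(1 + (1 - p) * p ** (p / (1 - p)))
C_search = max(I_z(w / 10000, p) for w in range(10001))
print(C_closed, C_search)   # both ~0.763
```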
Channel Capacity for the Z, BSC and BEC
[Plot: capacity versus p for the three channels.]
Blahut-Arimoto Algorithm
• An analytic solution for the capacity can be very difficult to obtain
• The alternative is a numerical solution
– Arimoto, Jan. 1972
– Blahut, Jul. 1972
• Exploits the fact that I(X;Y) is a concave function of p(xk)
$$I(X;Y) = \sum_{k=1}^{K} p(x_k)\sum_{j=1}^{J} p(y_j|x_k)\log\frac{p(y_j|x_k)}{\sum_{l=1}^{K} p(x_l)\,p(y_j|x_l)}$$
Blahut-Arimoto Algorithm
• Update the probabilities; with the current output distribution $q_j = \sum_l p(x_l)\,p(y_j|x_l)$, the standard update is
$$c_k = \exp\left(\sum_{j=1}^{J} p(y_j|x_k)\ln\frac{p(y_j|x_k)}{q_j}\right), \qquad p(x_k) \leftarrow \frac{p(x_k)\,c_k}{\sum_{l=1}^{K} p(x_l)\,c_l}$$
• At each iteration $\log\sum_k p(x_k)\,c_k$ is a lower bound on C that converges to the capacity; a runnable sketch follows
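To make the iteration concrete, here is a small NumPy sketch (my own implementation of the update rule above, not code from the slides; the function name and tolerances are illustrative):

```python
# Blahut-Arimoto capacity computation, P[j, k] = p(y_j | x_k) (J x K, as above).
import numpy as np

def blahut_arimoto(P, tol=1e-9, max_iter=10000):
    """Return (capacity in bits, capacity-achieving input distribution)."""
    J, K = P.shape
    p = np.full(K, 1.0 / K)                  # start from the uniform input
    for _ in range(max_iter):
        q = P @ p                            # current output distribution q_j
        # c_k = exp( sum_j p(y_j|x_k) ln( p(y_j|x_k) / q_j ) ), with 0 ln 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(P > 0, P * np.log(P / q[:, None]), 0.0)
        c = np.exp(terms.sum(axis=0))
        C_low = np.log2(p @ c)               # lower bound on C at this iterate
        p_new = p * c / (p @ c)
        if np.max(np.abs(p_new - p)) < tol:
            return C_low, p_new
        p = p_new
    return C_low, p

# The symmetric example from earlier: capacity ~0.4282 bits at uniform input
P = np.array([[.7, .2, .1],
              [.1, .7, .2],
              [.2, .1, .7]])
C, p_opt = blahut_arimoto(P)
print(C, p_opt)
```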
Symmetric Channel Example
[Numerical Blahut-Arimoto iterations for a symmetric channel; the input distribution converges to uniform.]
Non-Symmetric Channel Example
[Numerical Blahut-Arimoto iterations for a non-symmetric channel.]
Communication over Noisy Channels
[Block diagram of the end-to-end system.]
Binary Symmetric Channel
[Transition diagram: x1, x2 → y1, y2.]
channel matrix
$$P_{Y|X} = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
• Consider a block of N = 1000 bits
– if p = 0, 1000 bits are received correctly
– if p = 0.01, about 990 bits are received correctly
– if p = 0.5, about 500 bits are received correctly
• When p > 0, we do not know which bits are in error
– if p = 0.01, C = .919 bit
– if p = 0.5, C = 0 bit
Triple Repetition Code
• N = 3
message w → codeword c
0 → 000
1 → 111
Binary Symmetric Channel Errors
• If N bits are transmitted, the probability of an m bit error pattern is
$$p^m(1-p)^{N-m}$$
• The probability of exactly m errors is
$$\binom{N}{m} p^m (1-p)^{N-m}$$
• The probability of m or more errors is
$$\sum_{i=m}^{N} \binom{N}{i}\, p^i (1-p)^{N-i}$$
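These binomial expressions are direct to evaluate (my own sketch):

```python
# Probability of m or more errors in N uses of a BSC, per the formula above.
from math import comb

def prob_m_or_more(N: int, m: int, p: float) -> float:
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(m, N + 1))

# Triple repetition code: decoding fails when 2 or more of N = 3 bits flip
print(prob_m_or_more(3, 2, 0.01))   # 0.000298, as derived on the next slides
```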
Triple Repetition Code
• N = 3
• The probability of 0 errors is $(1-p)^3$
• The probability of 1 error is $3p(1-p)^2$
• The probability of 2 errors is $3p^2(1-p)$
• The probability of 3 errors is $p^3$
Triple Repetition Code – Decoding
Received Word   Codeword   Error Pattern
000             000        000
001             000        001
010             000        010
100             000        100
111             111        000
110             111        001
101             111        010
011             111        100
Triple Repetition Code
• The BSC bit error probability is p < 1/2
• majority vote or nearest neighbor decoding
000, 001, 010, 100 → 000
111, 110, 101, 011 → 111
• the probability of a decoding error is
$$P_e = 3p^2(1-p) + p^3 = 3p^2 - 2p^3 < p$$
• If p = 0.01, then Pe = 0.000298 and only one word in 3356 will be in error after decoding.
• A reduction by a factor of 33.
Code Rate
• After compression, the data is (almost) memoryless and uniformly distributed (equiprobable)
• Thus the entropy of the messages (codewords) is
$$H(W) = \log_b M$$
• The blocklength of a codeword is N
Code Rate
• The code rate is given by
$$R = \frac{\log_b M}{N} \ \text{bits per channel use}$$
• M is the number of codewords
• N is the block length
• For the triple repetition code
$$R = \frac{\log_2 2}{3} = \frac{1}{3}$$
[Plot: word error probability versus code rate R for repetition codes.]
Shannon’s Noisy Coding Theorem
For any ε > 0 and for any rate R less than the channel capacity C, there is an encoding and decoding scheme that can be used to ensure that the probability of decoding error is less than ε for a sufficiently large block length N.
[Plot: error probability versus code rate R, illustrating the theorem.]
Error Correction Coding N = 3
• R = 1/3, M = 2
0 → 000, 1 → 111
• R = 1, M = 8
000 → 000, 001 → 001, 010 → 010, 011 → 011, 111 → 111, 110 → 110, 101 → 101, 100 → 100
• Another choice: R = 2/3, M = 4
00 → 000, 01 → 011, 10 → 101, 11 → 110
Error Correction Coding N = 3
• BSC p = 0.01
• M is the number of codewords
• Tradeoff between code rate and error rate
Code Rate R   Pe          M = 2^NR
1             0.0297      8
2/3           0.0199      4
1/3           2.98×10⁻⁴   2
Codes for N = 3
[Cube diagrams of the codewords:
R = 1: all eight words 000, 001, 010, 100, 110, 101, 011, 111
R = 2/3: 000, 110, 011, 101
R = 1/3: 000, 111]
Error Correction Coding N = 5
• BSC p = 0.01
• Tradeoff between code rate and error rate
Code Rate R   Pe          M = 2^NR
1             0.0490      32
4/5           0.0394      16
3/5           0.0297      8
2/5           9.80×10⁻⁴   4
1/5           9.85×10⁻⁶   2
Error Correction Coding N = 7
• BSC p = 0.01
• Tradeoff between code rate and error rate (the uncoded and repetition rows are checked numerically below)
Code Rate R   Pe          M = 2^NR
1             0.0679      128
6/7           0.0585      64
5/7           0.0490      32
4/7           2.03×10⁻³   16
3/7           1.46×10⁻³   8
2/7           9.80×10⁻⁴   4
1/7           3.40×10⁻⁷   2
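The R = 1 (uncoded) and R = 1/N (repetition) rows of these tables can be verified directly; the intermediate rates depend on the particular best codes, which are not listed here. A sketch (my own code):

```python
# Check the uncoded and repetition rows of the N = 3, 5, 7 tables.
from math import comb

def tail(N, m, p):
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(m, N + 1))

p = 0.01
for N in (3, 5, 7):
    uncoded = 1 - (1 - p) ** N              # any bit error corrupts the word
    repetition = tail(N, (N + 1) // 2, p)   # majority vote fails on > N/2 errors
    print(N, round(uncoded, 4), f"{repetition:.2e}")
# 3 0.0297 2.98e-04
# 5 0.049  9.85e-06
# 7 0.0679 3.42e-07  (the table rounds the leading term to 3.40e-07)
```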
Best Code Comparison
• p = .01
• N = 3, R = 2/3, Pe = 2.0×10⁻²
• N = 12, R = 2/3, Pe = 6.2×10⁻³
• N = 30, R = 2/3, Pe = 3.3×10⁻³
• The code rates are the same but the error rate is decreasing
• Thus for fixed R, Pe can be decreased by increasing N
Code Matrix
[Figure: a code written as an M × N matrix whose rows are the codewords.]
Binary Codes
• For given values of M and N, there are $2^{MN}$ possible codes.
• Of these, some will be bad, some will be best (optimal), and some will be good, in terms of Pe
• An average code will be good.
Channel Capacity
• To prove that information can be transmitted reliably over a noisy channel at rates up to the capacity, Shannon used a number of new concepts
– Allowing an arbitrarily small but nonzero probability of error
– Using long codewords
– Calculating the average probability of error over a random choice of codes to show that at least one good code exists
Channel Coding Theorem
• Random coding used in the proof
• Joint typicality used as the decoding rule
• Shows that good codes exist which provide an arbitrarily small probability of error
• Does not provide an explicit way of constructing good codes
• If a long code (large N) is generated randomly, the code is likely to be good but is difficult to decode
Noisy Communication System
[Block diagram of the end-to-end system.]
The Data Processing Inequality – Cascaded Channels
[Diagram: W → X → Y with mutual informations I(W;X), I(X;Y), and I(W;Y).]
The mutual information I(W;Y) for the cascade cannot be larger than I(W;X) or I(X;Y), so that
I(W;Y) ≤ I(W;X)
I(W;Y) ≤ I(X;Y)
Channel Capacity: Weak Converse
For R > C, the decoding error probability is bounded away from 0
[Plot: lower bound on Pe versus R, positive for R > C.]
Channel Capacity: Weak Converse
• C = 0.3
[Plot: the lower bound 1 − C/R versus R for C = 0.3.]
Fano's Inequality
[Plot: h(Pe) + Pe·logb(r−1) versus Pe, rising from 0 at Pe = 0 to its maximum logb r at Pe = (r−1)/r, and equal to logb(r−1) at Pe = 1.]
Fano’s Inequality
H(X|Y) ≤ h(Pe)+Pelogb(r-1)
H(X|Y) ≤ 1+NRPe(N,R)
87
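Combining the second bound with H(W) = NR and the data processing inequality I(W;Y) ≤ NC yields the 1 − C/R lower bound plotted above (a standard completion of the argument):
$$NR = H(W) = H(W|Y) + I(W;Y) \le 1 + NRP_e(N,R) + NC$$
$$\Rightarrow \quad P_e(N,R) \ge 1 - \frac{C}{R} - \frac{1}{NR}$$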
Channel Capacity: Strong Converse
• For rates above capacity (R > C)
$$P_e(N,R) \ge 1 - 2^{-NE_A(R)}$$
• where EA(R) is Arimoto's error exponent
Arimoto’s Error Exponent EA(R)
89
EA(R) for a BSC with p=0.1
90
[Plot: Pe versus R for increasing N, showing the sharp transition at R = C.]
• The capacity is a very clear dividing point
• At rates below capacity, Pe → 0 exponentially as N → ∞
• At rates above capacity, Pe → 1 exponentially as N → ∞