ECE 515 Information Theory
Channel Capacity and Coding
Information Theory Problems
• How to transmit or store information as efficiently as possible.
• What is the maximum amount of information that can be transmitted or stored reliably?
• How can information be kept secure?
Digital Communications System
[Block diagram: a data source emits message W, which is encoded as channel input X (distribution PX); the channel PY|X produces output Y (distribution PY), from which the estimate Ŵ is delivered to the data sink.]
Ŵ is an estimate of W
Communication Channel
Discrete Memoryless Channel
[Transition diagram: inputs x1, x2, ..., xK map probabilistically to outputs y1, y2, ..., yJ, with J ≥ K ≥ 2.]
Discrete Memoryless Channel
Channel Transition Matrix
$$P_{Y|X} = \begin{bmatrix} p(y_1|x_1) & \cdots & p(y_1|x_k) & \cdots & p(y_1|x_K) \\ \vdots & & \vdots & & \vdots \\ p(y_j|x_1) & \cdots & p(y_j|x_k) & \cdots & p(y_j|x_K) \\ \vdots & & \vdots & & \vdots \\ p(y_J|x_1) & \cdots & p(y_J|x_k) & \cdots & p(y_J|x_K) \end{bmatrix}$$
Binary Symmetric Channel
[Transition diagram: x1 → y1 and x2 → y2 with probability 1 − p; x1 → y2 and x2 → y1 with probability p.]
channel matrix
$$P_{Y|X} = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
Binary Errors with Erasure Channel
[Transition diagram: inputs x1 = 0, x2 = 1; outputs y1 = 0, y2 = e, y3 = 1. Each input is received correctly with probability 1 − p − q, erased (output e) with probability q, and received as the other bit with probability p.]
BPSK Modulation K = 2
$$x_1(t) = +\sqrt{\frac{2E_b}{T_b}}\cos(2\pi f_c t) \qquad \text{0 bit}$$
$$x_2(t) = -\sqrt{\frac{2E_b}{T_b}}\cos(2\pi f_c t) \qquad \text{1 bit}$$
Eb is the energy per bit, Tb is the bit duration, fc is the carrier frequency
BPSK Demodulation in AWGN
[Figure: the conditional pdfs p(y|x1) and p(y|x2), centered at +√Eb and −√Eb.]
$$p_{Y|X}(y|x_k) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y-x_k)^2}{2\sigma^2}\right)$$
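With hard decisions (J = 2) the decision threshold is at y = 0, and the induced crossover probability follows from this pdf as p = Q(√Eb/σ). A minimal numerical sketch (my own code, not from the slides):

```python
# BSC crossover probability for hard-decision BPSK (a sketch). With
# x_k = +/- sqrt(Eb) and Gaussian noise of variance sigma^2, the ML
# threshold is y = 0 and the error probability is Q(sqrt(Eb)/sigma).
from math import erfc, sqrt

def crossover(Eb: float, sigma: float) -> float:
    """Q(z) at z = sqrt(Eb)/sigma, using Q(z) = erfc(z/sqrt(2))/2."""
    return 0.5 * erfc(sqrt(Eb) / (sigma * sqrt(2.0)))

print(crossover(Eb=1.0, sigma=0.5))  # Q(2) ~ 0.0228
```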
BPSK Demodulation K = 2
[Figures: quantizing the demodulator output into J = 3 regions (y1, y2, y3), J = 4 regions (y1, ..., y4), and J = 8 regions (y1, ..., y8) around −√Eb and +√Eb.]
Mutual Information for a BSC
[Diagram: X → BSC → Y, crossover probability p.]
$$\bar{p} = 1 - p$$
$$p(x = 0) = w, \qquad p(x = 1) = 1 - w = \bar{w}$$
channel matrix
$$P_{Y|X} = \begin{bmatrix} \bar{p} & p \\ p & \bar{p} \end{bmatrix}$$
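With these definitions I(X;Y) = h(wp̄ + w̄p) − h(p); this is the surface of (w, p) plotted on the later "BSC I(X;Y)" slide. A small sketch (my own code):

```python
# I(X;Y) for the BSC in terms of w = p(x=0) and crossover probability p
# (a sketch of the quantities defined above; logs are base 2).
from math import log2

def h(q: float) -> float:
    """Binary entropy function in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def bsc_mutual_information(w: float, p: float) -> float:
    # p(y = 0) = w(1-p) + (1-w)p; I(X;Y) = H(Y) - H(Y|X) = h(p(y=0)) - h(p)
    return h(w * (1 - p) + (1 - w) * p) - h(p)

print(bsc_mutual_information(0.5, 0.01))  # ~0.919, the capacity for p = 0.01
```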
Convex Functions
[Figure: examples of a concave function and a convex function.]
Concave Function
[Figure: a concave function lies above its chords.]
Convex Function
[Figure: a convex function lies below its chords.]
Mutual Information
I(X;Y) is a concave function of the input probabilities p(xk) for fixed channel transition probabilities, and a convex function of the channel transition probabilities p(yj|xk) for fixed input probabilities.
BSC I(X;Y)
[Surface plot: I(X;Y) as a function of the input probability w and the crossover probability p.]
Channel Capacity
[Diagram: X → channel → Y.]
The capacity is the maximum of the mutual information over all input distributions:
$$C = \max_{p(x)} I(X;Y)$$
Binary Symmetric Channel
[Transition diagram as before, crossover probability p.]
C = 1 − h(p)
Properties of the Channel Capacity
• C ≥ 0 since I(X;Y) ≥ 0
• C ≤ log|X| = log(K) since C = max I(X;Y) ≤ max H(X) = log(K)
• C ≤ log|Y| = log(J) for the same reason
• I(X;Y) is a concave function of p(X), so a local maximum is a global maximum
Symmetric Channels
A discrete memoryless channel is said to be symmetric if the set of output symbols {yj}, j = 1, 2, ..., J, can be partitioned into subsets such that for each subset of the matrix of transition probabilities, each column is a permutation of the other columns, and each row is also a permutation of the other rows.
Binary Channels
Symmetric channel matrix
$$P = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
Non-symmetric channel matrix
$$P = \begin{bmatrix} 1-p_1 & p_1 \\ p_2 & 1-p_2 \end{bmatrix}, \qquad p_1 \neq p_2$$
Binary Errors with Erasure Channel
[Transition diagram repeated: correct reception with probability 1 − p − q, erasure with probability q, bit error with probability p.]
Symmetric Channels
• No partition required → strongly symmetric
• Partition required → weakly symmetric
Capacity of a Strongly Symmetric Channel
Theorem
$$I(X;Y) = \sum_{k=1}^{K}\sum_{j=1}^{J} p(x_k)\,p(y_j|x_k)\log p(y_j|x_k) + H(Y) = \sum_{j=1}^{J} p(y_j|x_k)\log p(y_j|x_k) + H(Y)$$
since the inner sum is the same for every input xk. For a strongly symmetric channel, equiprobable inputs make the outputs equiprobable, so H(Y) achieves its maximum log J and
$$C = \log J + \sum_{j=1}^{J} p(y_j|x_k)\log p(y_j|x_k)$$
Example J = K = 3
[Transition diagram: each input xk is received as yk with probability .7, and as the other two outputs with probabilities .1 and .2.]
Example
$$P_{Y|X} = \begin{bmatrix} .7 & .2 & .1 \\ .1 & .7 & .2 \\ .2 & .1 & .7 \end{bmatrix}$$
$$\sum_{k=1}^{K} p(x_k)\sum_{j=1}^{J} p(y_j|x_k)\log p(y_j|x_k)$$
$$= p(x_1)\left[.7\log .7 + .1\log .1 + .2\log .2\right] + p(x_2)\left[.2\log .2 + .7\log .7 + .1\log .1\right] + p(x_3)\left[.1\log .1 + .2\log .2 + .7\log .7\right]$$
$$= .7\log .7 + .2\log .2 + .1\log .1$$
Example
$$H(Y) = -\sum_{j=1}^{J} p(y_j)\log p(y_j)$$
$$p(y_1) = \sum_{k=1}^{K} p(y_1|x_k)\,p(x_k), \qquad p(y_2) = \sum_{k=1}^{K} p(y_2|x_k)\,p(x_k), \qquad \ldots, \qquad p(y_J) = \sum_{k=1}^{K} p(y_J|x_k)\,p(x_k)$$
$$p(y_1) = .7p(x_1) + .2p(x_2) + .1p(x_3)$$
$$p(y_2) = .1p(x_1) + .7p(x_2) + .2p(x_3)$$
$$p(y_3) = .2p(x_1) + .1p(x_2) + .7p(x_3)$$
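The example can be completed (my own arithmetic, using the capacity-achieving uniform input for this strongly symmetric channel):
$$p(x_k) = \tfrac{1}{3} \;\Rightarrow\; p(y_j) = \tfrac{1}{3}, \qquad H(Y) = \log_2 3$$
$$C = \log_2 3 + .7\log_2 .7 + .2\log_2 .2 + .1\log_2 .1 \approx 1.5850 - 1.1568 = 0.4282 \ \text{bits}$$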
r-ary Symmetric Channel
$$P = \begin{bmatrix} 1-p & \frac{p}{r-1} & \cdots & \frac{p}{r-1} \\ \frac{p}{r-1} & 1-p & \cdots & \frac{p}{r-1} \\ \vdots & & \ddots & \vdots \\ \frac{p}{r-1} & \frac{p}{r-1} & \cdots & 1-p \end{bmatrix}$$
r-ary Symmetric Channel
$$C = \log r + (1-p)\log(1-p) + (r-1)\frac{p}{r-1}\log\frac{p}{r-1}$$
$$= \log r + (1-p)\log(1-p) + p\log p - p\log(r-1)$$
$$= \log r - h(p) - p\log(r-1)$$
• If r = 2, C = 1 − h(p)
• If r = 3, C = log₂3 − h(p) − p
• If r = 4, C = 2 − h(p) − p log₂3 (these cases are checked numerically below)
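A quick numerical check of the general formula against the three special cases (my own sketch):

```python
# Check C = log2(r) - h(p) - p*log2(r-1) against the cases listed above.
from math import log2

def h(q):
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def c_rary(r, p):
    return log2(r) - h(p) - p * log2(r - 1)   # log2(1) = 0 handles r = 2

p = 0.1
print(c_rary(2, p), 1 - h(p))                  # r = 2
print(c_rary(3, p), log2(3) - h(p) - p)        # r = 3
print(c_rary(4, p), 2 - h(p) - p * log2(3))    # r = 4
```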
Binary Errors with Erasure Channel
[Transition diagram repeated; the example that follows uses p = .05 and q = .15.]
Binary Errors with Erasure Channel
$$P_{Y|X} = \begin{bmatrix} .8 & .05 \\ .15 & .15 \\ .05 & .8 \end{bmatrix}$$
$$P_Y = P_{Y|X} \times P_X = \begin{bmatrix} .8 & .05 \\ .15 & .15 \\ .05 & .8 \end{bmatrix}\begin{bmatrix} .5 \\ .5 \end{bmatrix} = \begin{bmatrix} .425 \\ .15 \\ .425 \end{bmatrix}$$
Capacity of a Weakly Symmetric Channel
• qi – probability of sub-channel i
• Ci – capacity of sub-channel i
$$C = \sum_{i=1}^{L} q_i C_i$$
Binary Errors with Erasure Channel
[Diagram: the channel is split into two sub-channels, a BSC on the non-erasure outputs {0, 1} and a single-output channel producing the erasure e.]
$$P_1 = \begin{bmatrix} .9412 & .0588 \\ .0588 & .9412 \end{bmatrix} \qquad P_2 = \begin{bmatrix} 1.0 & 1.0 \end{bmatrix}$$
Binary Errors with Erasure Channel
$$C = \sum_{i=1}^{L} q_i C_i$$
$$C_1 = .9412\log .9412 + .0588\log .0588 + \log 2 = .6773$$
$$C_2 = 1\log 1 + \log 1 = 0.0$$
$$C = .85(.6773) + .15(0.0) = .5757$$
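The same numbers can be reproduced directly (my own check of the computation above):

```python
# Weakly symmetric capacity of the binary-errors-with-erasure example.
from math import log2

a = 0.9412                 # = .8/.85, the non-erasure BSC transition probability
C1 = 1 + a * log2(a) + (1 - a) * log2(1 - a)   # strongly symmetric formula, J = 2
C2 = 0.0                   # the erasure-only sub-channel carries no information

C = 0.85 * C1 + 0.15 * C2
print(round(C1, 4), round(C, 4))   # 0.6773 0.5757
```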
Binary Erasure Channel
[Transition diagram: each input is received correctly with probability 1 − p and erased (output e) with probability p.]
C = 1 − p
Z Channel (Optical)
[Transition diagram: input 0 (light on) is always received as 0; input 1 (light off) is received as 1 with probability 1 − p and as 0 with probability p.]
Z Channel (Optical)
$$C = \log_2\left(1 + (1-p)\,p^{p/(1-p)}\right)$$
$$I(X;Y) = \sum_{k=1}^{2}\sum_{j=1}^{2} p(x_k)\,p(y_j|x_k)\log\frac{p(y_j|x_k)}{p(y_j)}$$
$$I(x_1;Y) = \log\frac{1}{w + (1-w)p}$$
$$I(x_2;Y) = p\log\frac{p}{w + (1-w)p} + (1-p)\log\frac{1-p}{(1-w)(1-p)}$$
Setting I(x1;Y) = I(x2;Y) gives the capacity-achieving input probability
$$w = 1 - \frac{1}{(1-p)\left(1 + 2^{h(p)/(1-p)}\right)}$$
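The closed form can be checked against a brute-force maximization of I(X;Y) over the input distribution (my own numerical sketch):

```python
# Z-channel capacity: closed form vs. brute-force search over w = p(x=0).
from math import log2

def h(q):
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def I_z(w, p):
    # Output 1 occurs with probability (1-w)(1-p); I(X;Y) = H(Y) - H(Y|X)
    return h((1 - w) * (1 - p)) - (1 - w) * h(p)

p = 0.1
C_closed = log2(1 + (1 - p) * p ** (p / (1 - p)))
C_search = max(I_z(w / 10000, p) for w in range(10001))
print(C_closed, C_search)   # both ~0.763
```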
Channel Capacity for the Z, BSC and BEC
[Plot: capacity versus p for the three channels.]
Blahut-Arimoto Algorithm
• An analytic solution for the capacity can be very difficult to obtain
• The alternative is a numerical solution
– Arimoto, Jan. 1972
– Blahut, Jul. 1972
• Exploits the fact that I(X;Y) is a concave function of p(xk)
$$I(X;Y) = \sum_{k=1}^{K} p(x_k)\sum_{j=1}^{J} p(y_j|x_k)\log\frac{p(y_j|x_k)}{\sum_{l=1}^{K} p(x_l)\,p(y_j|x_l)}$$
Blahut-Arimoto Algorithm
• Update the probabilities; with the current output distribution $q_j = \sum_l p(x_l)\,p(y_j|x_l)$, the standard update is
$$c_k = \exp\left(\sum_{j=1}^{J} p(y_j|x_k)\ln\frac{p(y_j|x_k)}{q_j}\right), \qquad p(x_k) \leftarrow \frac{p(x_k)\,c_k}{\sum_{l=1}^{K} p(x_l)\,c_l}$$
• At each iteration $\log\sum_k p(x_k)\,c_k$ is a lower bound on C that converges to the capacity; a runnable sketch follows
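To make the iteration concrete, here is a small NumPy sketch (my own implementation of the update rule above, not code from the slides; the function name and tolerances are illustrative):

```python
# Blahut-Arimoto capacity computation, P[j, k] = p(y_j | x_k) (J x K, as above).
import numpy as np

def blahut_arimoto(P, tol=1e-9, max_iter=10000):
    """Return (capacity in bits, capacity-achieving input distribution)."""
    J, K = P.shape
    p = np.full(K, 1.0 / K)                  # start from the uniform input
    for _ in range(max_iter):
        q = P @ p                            # current output distribution q_j
        # c_k = exp( sum_j p(y_j|x_k) ln( p(y_j|x_k) / q_j ) ), with 0 ln 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(P > 0, P * np.log(P / q[:, None]), 0.0)
        c = np.exp(terms.sum(axis=0))
        C_low = np.log2(p @ c)               # lower bound on C at this iterate
        p_new = p * c / (p @ c)
        if np.max(np.abs(p_new - p)) < tol:
            return C_low, p_new
        p = p_new
    return C_low, p

# The symmetric example from earlier: capacity ~0.4282 bits at uniform input
P = np.array([[.7, .2, .1],
              [.1, .7, .2],
              [.2, .1, .7]])
C, p_opt = blahut_arimoto(P)
print(C, p_opt)
```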
Symmetric Channel Example
[Numerical Blahut-Arimoto iterations for a symmetric channel; the input distribution converges to uniform.]
Non-Symmetric Channel Example
[Numerical Blahut-Arimoto iterations for a non-symmetric channel.]
Communication over Noisy Channels
[Block diagram of the end-to-end system.]
Binary Symmetric Channel
[Transition diagram: x1, x2 → y1, y2.]
channel matrix
$$P_{Y|X} = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
• Consider a block of N = 1000 bits
– if p = 0, 1000 bits are received correctly
– if p = 0.01, about 990 bits are received correctly
– if p = 0.5, about 500 bits are received correctly
• When p > 0, we do not know which bits are in error
– if p = 0.01, C = .919 bit
– if p = 0.5, C = 0 bit
Triple Repetition Code
• N = 3
message w → codeword c
0 → 000
1 → 111
Binary Symmetric Channel Errors
• If N bits are transmitted, the probability of an m bit error pattern is
$$p^m(1-p)^{N-m}$$
• The probability of exactly m errors is
$$\binom{N}{m} p^m (1-p)^{N-m}$$
• The probability of m or more errors is
$$\sum_{i=m}^{N} \binom{N}{i}\, p^i (1-p)^{N-i}$$
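These binomial expressions are direct to evaluate (my own sketch):

```python
# Probability of m or more errors in N uses of a BSC, per the formula above.
from math import comb

def prob_m_or_more(N: int, m: int, p: float) -> float:
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(m, N + 1))

# Triple repetition code: decoding fails when 2 or more of N = 3 bits flip
print(prob_m_or_more(3, 2, 0.01))   # 0.000298, as derived on the next slides
```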
Triple Repetition Code
• N = 3
• The probability of 0 errors is $(1-p)^3$
• The probability of 1 error is $3p(1-p)^2$
• The probability of 2 errors is $3p^2(1-p)$
• The probability of 3 errors is $p^3$
Triple Repetition Code – Decoding
Received Word   Codeword   Error Pattern
000             000        000
001             000        001
010             000        010
100             000        100
111             111        000
110             111        001
101             111        010
011             111        100
Triple Repetition Code
• The BSC bit error probability is p < 1/2
• majority vote or nearest neighbor decoding
000, 001, 010, 100 → 000
111, 110, 101, 011 → 111
• the probability of a decoding error is
$$P_e = 3p^2(1-p) + p^3 = 3p^2 - 2p^3 < p$$
• If p = 0.01, then Pe = 0.000298 and only one word in 3356 will be in error after decoding.
• A reduction by a factor of 33.
Code Rate
• After compression, the data is (almost) memoryless and uniformly distributed (equiprobable)
• Thus the entropy of the messages (codewords) is
$$H(W) = \log_b M$$
• The blocklength of a codeword is N
Code Rate
• The code rate is given by
$$R = \frac{\log_b M}{N} \ \text{bits per channel use}$$
• M is the number of codewords
• N is the block length
• For the triple repetition code
$$R = \frac{\log_2 2}{3} = \frac{1}{3}$$
[Plot: word error probability versus code rate R for repetition codes.]
Shannon’s Noisy Coding Theorem
For any ε > 0 and for any rate R less than the channel capacity C, there is an encoding and decoding scheme that can be used to ensure that the probability of decoding error is less than ε for a sufficiently large block length N.
[Plot: error probability versus code rate R, illustrating the theorem.]
Error Correction Coding N = 3
• R = 1/3, M = 2
0 → 000, 1 → 111
• R = 1, M = 8
000 → 000, 001 → 001, 010 → 010, 011 → 011, 111 → 111, 110 → 110, 101 → 101, 100 → 100
• Another choice: R = 2/3, M = 4
00 → 000, 01 → 011, 10 → 101, 11 → 110
Error Correction Coding N = 3
• BSC p = 0.01
• M is the number of codewords
• Tradeoff between code rate and error rate
Code Rate R   Pe          M = 2^NR
1             0.0297      8
2/3           0.0199      4
1/3           2.98×10⁻⁴   2
Codes for N = 3
[Cube diagrams of the codewords:
R = 1: all eight words 000, 001, 010, 100, 110, 101, 011, 111
R = 2/3: 000, 110, 011, 101
R = 1/3: 000, 111]
Error Correction Coding N = 5
• BSC p = 0.01
• Tradeoff between code rate and error rate
Code Rate R   Pe          M = 2^NR
1             0.0490      32
4/5           0.0394      16
3/5           0.0297      8
2/5           9.80×10⁻⁴   4
1/5           9.85×10⁻⁶   2
Error Correction Coding N = 7
• BSC p = 0.01
• Tradeoff between code rate and error rate (the uncoded and repetition rows are checked numerically below)
Code Rate R   Pe          M = 2^NR
1             0.0679      128
6/7           0.0585      64
5/7           0.0490      32
4/7           2.03×10⁻³   16
3/7           1.46×10⁻³   8
2/7           9.80×10⁻⁴   4
1/7           3.40×10⁻⁷   2
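The R = 1 (uncoded) and R = 1/N (repetition) rows of these tables can be verified directly; the intermediate rates depend on the particular best codes, which are not listed here. A sketch (my own code):

```python
# Check the uncoded and repetition rows of the N = 3, 5, 7 tables.
from math import comb

def tail(N, m, p):
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(m, N + 1))

p = 0.01
for N in (3, 5, 7):
    uncoded = 1 - (1 - p) ** N              # any bit error corrupts the word
    repetition = tail(N, (N + 1) // 2, p)   # majority vote fails on > N/2 errors
    print(N, round(uncoded, 4), f"{repetition:.2e}")
# 3 0.0297 2.98e-04
# 5 0.049  9.85e-06
# 7 0.0679 3.42e-07  (the table rounds the leading term to 3.40e-07)
```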
Best Code Comparison
• p = .01
• N = 3, R = 2/3, Pe = 2.0×10⁻²
• N = 12, R = 2/3, Pe = 6.2×10⁻³
• N = 30, R = 2/3, Pe = 3.3×10⁻³
• The code rates are the same but the error rate is decreasing
• Thus for fixed R, Pe can be decreased by increasing N
Code Matrix
[Figure: a code written as an M × N matrix whose rows are the codewords.]
Binary Codes
• For given values of M and N, there are $2^{MN}$ possible codes.
• Of these, some will be bad, some will be best (optimal), and some will be good, in terms of Pe
• An average code will be good.
Channel Capacity
• To prove that information can be transmitted reliably over a noisy channel at rates up to the capacity, Shannon used a number of new concepts
– Allowing an arbitrarily small but nonzero probability of error
– Using long codewords
– Calculating the average probability of error over a random choice of codes to show that at least one good code exists
Channel Coding Theorem
• Random coding used in the proof
• Joint typicality used as the decoding rule
• Shows that good codes exist which provide an arbitrarily small probability of error
• Does not provide an explicit way of constructing good codes
• If a long code (large N) is generated randomly, the code is likely to be good but is difficult to decode
Noisy Communication System
[Block diagram of the end-to-end system.]
The Data Processing Inequality – Cascaded Channels
[Diagram: W → X → Y with mutual informations I(W;X), I(X;Y), and I(W;Y).]
The mutual information I(W;Y) for the cascade cannot be larger than I(W;X) or I(X;Y), so that
I(W;Y) ≤ I(W;X)
I(W;Y) ≤ I(X;Y)
Channel Capacity: Weak Converse
For R > C, the decoding error probability is bounded away from 0
[Plot: lower bound on Pe versus R, positive for R > C.]
Channel Capacity: Weak Converse
• C = 0.3
[Plot: the lower bound 1 − C/R versus R for C = 0.3.]
Fano's Inequality
[Plot: h(Pe) + Pe·logb(r−1) versus Pe, rising from 0 at Pe = 0 to its maximum logb r at Pe = (r−1)/r, and equal to logb(r−1) at Pe = 1.]
Fano’s Inequality
H(X|Y) ≤ h(Pe)+Pelogb(r-1)
H(X|Y) ≤ 1+NRPe(N,R)
87
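Combining the second bound with H(W) = NR and the data processing inequality I(W;Y) ≤ NC yields the 1 − C/R lower bound plotted above (a standard completion of the argument):
$$NR = H(W) = H(W|Y) + I(W;Y) \le 1 + NRP_e(N,R) + NC$$
$$\Rightarrow \quad P_e(N,R) \ge 1 - \frac{C}{R} - \frac{1}{NR}$$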
Channel Capacity: Strong Converse
• For rates above capacity (R > C)
$$P_e(N,R) \ge 1 - 2^{-NE_A(R)}$$
• where EA(R) is Arimoto's error exponent
Arimoto’s Error Exponent EA(R)
89
EA(R) for a BSC with p=0.1
90
[Plot: Pe versus R for increasing N, showing the sharp transition at R = C.]
• The capacity is a very clear dividing point
• At rates below capacity, Pe → 0 exponentially as N → ∞
• At rates above capacity, Pe → 1 exponentially as N → ∞