
Channel coding: finite blocklength results

Plan:

1. Overview on the example of BSC

2. Converse bounds

3. Achievability bounds

4. Channel dispersion

5. Applications: performance of real-world codes

6. Extensions: codes with feedback

Abstract communication problem

[Figure: data passes through a noisy channel.]

Goal: Decrease corruption of data caused by noise

Channel coding: principles

[Figure: data bits plus redundancy are sent through the noisy channel.]

Goal: Decrease corruption of data caused by noise

Solution: Code to diminish probability of error Pe.

Key metrics: Rate and Pe

Channel coding: principles

[Figure: the reliability-rate tradeoff: probability of error Pe vs. rate, with a "possible" region separated from an "impossible" region by a boundary curve.]

Decreasing Pe further:

1. More redundancy. Bad: loses rate.

2. Increase blocklength!  (n = 10, 100, 1000, ..., 10^6)
Channel coding: Shannon capacity

[Figure: the Pe vs. rate tradeoff with the channel capacity C marked on the rate axis.]

Shannon: Fix R < C. Then Pe ↘ 0 as n → ∞.

Question: For what n will Pe < 10⁻³?
Channel coding: Gaussian approximation

[Figure: the Pe vs. rate tradeoff with the channel capacity C and the channel dispersion V annotated.]

Shannon: Fix R < C. Then Pe ↘ 0 as n → ∞.

Question: For what n will Pe < 10⁻³?

Answer:

n ≳ const · V / C²
How to describe the evolution of the boundary?

[Figure: the Pe vs. rate tradeoff with capacity C marked.]

Classical results:

- Vertical asymptotics: fixed rate, reliability function (Elias, Dobrushin, Fano, Shannon-Gallager-Berlekamp)

- Horizontal asymptotics: fixed ε, strong converse, √n terms (Wolfowitz, Weiss, Dobrushin, Strassen, Kemperman)
How to describe the evolution of the boundary?

XXI century:

- Tight non-asymptotic bounds

- Remarkable precision of the normal approximation

- Extended results on horizontal asymptotics: AWGN, O(log n) terms, cost constraints, feedback, etc.
Finite blocklength fundamental limit

Definition

R*(n, ε) = max { (1/n) log M : ∃ an (n, M, ε)-code }

(maximal achievable rate for blocklength n and probability of error ε)

Note: the exact value is unknown (the search is doubly exponential in n).
Minimal delay and error exponents

Fix R < C. What is the smallest blocklength n* needed to achieve R*(n, ε) ≥ R?

Classical answer: approximation via the reliability function [Shannon-Gallager-Berlekamp'67]:

n* ≈ (1 / E(R)) · log(1/ε)

E.g., take the BSC(0.11), R = 0.9 C, and probability of error ε = 10⁻³:

n* ≈ 5000 (channel uses)

Difficulty: How to verify the accuracy of this estimate?
Bounds

- Bounds are implicit in Shannon's theorem:

lim_{n→∞} R*(n, ε) = C  ⟺  { R*(n, ε) ≤ C + o(1),  R*(n, ε) ≥ C + o(1) }

(Feinstein'54, Shannon'57, Wolfowitz'57, Fano)

- Reliability function: even better bounds (Elias'55, Shannon'59, Gallager'65, SGB'67)

- Problems: derived for asymptotics (need "de-asymptotization"); unexpected sensitivity:

ε ≤ e^{−n E_r(R)}   [Gallager'65]
ε ≤ e^{−n E_r(R − o(1)) + O(log n)}   [Csiszar-Korner'81]

For the BSC with n = 10³, δ = 0.11: o(1) ≈ 0.1 and e^{O(log n)} ≈ 10²⁴ (!)

Strassen'62: take n > 19600 ε⁻¹⁶ ... (!)

- Solution: derive the bounds from scratch.
New achievability bound

[Figure: transition diagram of the BSC(δ): each input bit is received correctly with probability 1 − δ and flipped with probability δ.]

Theorem (RCU)

For the BSC(δ) there exists a code with rate R, blocklength n and

ε ≤ Σ_{t=0}^{n} (n choose t) δ^t (1−δ)^{n−t} · min{ 1, Σ_{k=0}^{t} (n choose k) 2^{−n+nR} }.
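As a sanity check on the numbers that follow, here is a minimal numerical sketch of evaluating this bound (an assumption on tooling: Python with NumPy/SciPy; the function name rcu_bsc is ours, not from the talk). It works in the log domain so that the binomial coefficients do not overflow.

```python
import numpy as np
from scipy.stats import binom
from scipy.special import gammaln

def rcu_bsc(n, R, delta):
    """RCU upper bound on the error probability of a rate-R, blocklength-n code over BSC(delta)."""
    t = np.arange(n + 1)
    log_binom = gammaln(n + 1) - gammaln(t + 1) - gammaln(n - t + 1)   # ln C(n, t)
    log_ball = np.logaddexp.accumulate(log_binom)                      # ln of sum_{k<=t} C(n, k)
    log_inner = np.minimum(0.0, log_ball + n * (R - 1) * np.log(2))    # ln min{1, 2^{-n+nR} * ball}
    return float(np.exp(np.logaddexp.reduce(binom.logpmf(t, n, delta) + log_inner)))

delta = 0.11
C = 1 + delta * np.log2(delta) + (1 - delta) * np.log2(1 - delta)      # capacity, ~0.5 bit
print(rcu_bsc(1000, 0.9 * C, delta))   # error bound at 90% of capacity, n = 1000
```

Sweeping n until this quantity drops below 10⁻³ roughly reproduces the achievability side of the 2985 ≤ n* ≤ 3106 figure quoted later in the talk.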

Proof of the RCU bound for the BSC

[Figure: codewords c1, ..., cM scattered in the Hamming space F₂ⁿ; the received word Y lands near c1; decoding succeeds when c1 is the closest codeword and fails when some other codeword is at least as close.]

- Input space: A = {0, 1}ⁿ
- Let c1, ..., cM ∼ Bern(1/2)ⁿ i.i.d. (random codebook)
- Transmit c1
- Noise displaces c1 → Y:  Y = c1 + Z,  Z ∼ Bern(δ)ⁿ
- Decoder: find the closest codeword to Y
- Probability-of-error analysis:

P[error | Y, wt(Z) = t] = P[∃ j > 1 : cj ∈ Ball(Y, t)]
                        ≤ Σ_{j=2}^{M} P[cj ∈ Ball(Y, t)]
                        ≤ 2^{nR} Σ_{k=0}^{t} (n choose k) 2^{−n}

. . . cont'd . . .

- Conditional probability of error:

P[error | Y, wt(Z) = t] ≤ Σ_{k=0}^{t} (n choose k) 2^{−n+nR}

- Key observation: for large noise weight t the RHS exceeds 1. Tighten:

P[error | Y, wt(Z) = t] ≤ min{ 1, Σ_{k=0}^{t} (n choose k) 2^{−n+nR} }   (∗)

- Average over wt(Z) ∼ Binomial(n, δ)  ⟹  Q.E.D.

Note: Step (∗) tightens Gallager's ρ-trick  P[∪_j A_j] ≤ ( Σ_j P[A_j] )^ρ,  0 ≤ ρ ≤ 1.

Sphere-packing converse (BSC variation)

Theorem (Elias'55)

For any (n, M, ε) code over the BSC(δ):

ε ≥ f( 2ⁿ / M ),

where f(·) is a piecewise-linear, decreasing, convex function with knots

f( Σ_{j=0}^{t} (n choose j) ) = Σ_{j=t+1}^{n} (n choose j) δ^j (1−δ)^{n−j},   t = 0, ..., n.

Note: Convexity of f follows from general properties of β_α (below).
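A small numerical sketch of how this converse is applied (Python with SciPy assumed; sp_converse_rate_bsc is our name). It locates the largest radius t whose noise tail still exceeds ε and turns the corresponding ball volume into an upper bound on the rate; the piecewise-linear interpolation between the knots of f is ignored, so the printed bound is marginally weaker than the exact Elias bound.

```python
import numpy as np
from scipy.stats import binom
from scipy.special import gammaln

def sp_converse_rate_bsc(n, eps, delta):
    """Upper bound on (1/n) log2 M implied by eps >= f(2^n / M) for the BSC(delta)."""
    t = np.arange(n + 1)
    log2_binom = (gammaln(n + 1) - gammaln(t + 1) - gammaln(n - t + 1)) / np.log(2)
    log2_ball = np.logaddexp2.accumulate(log2_binom)   # log2 of sum_{j<=t} C(n, j)
    tail = binom.sf(t, n, delta)                       # value of f at the knot: P[wt(Z) > t]
    t_star = int(np.max(np.nonzero(tail >= eps)))      # largest knot with f >= eps
    # eps >= f(2^n / M) and f decreasing imply 2^n / M >= ball(t_star),
    # hence log2 M <= n - log2 ball(t_star):
    return 1.0 - log2_ball[t_star] / n

print(sp_converse_rate_bsc(1000, 1e-3, 0.11))   # converse on the rate at n = 1000
```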

Sphere-packing converse (BSC variation)

[Figure: the Hamming space F₂ⁿ partitioned into decoding regions D1, D2, ..., DM; each Dj is compared to a ball Bj of the same volume centered at 0.]

Proof:

- Denote the decoding regions Dj:  ⊔_j Dj = {0, 1}ⁿ
- The probability of error is

ε = (1/M) Σ_j P[cj + Z ∉ Dj] ≥ (1/M) Σ_j P[Z ∉ Bj]

- Bj = ball centered at 0 such that Vol(Bj) = Vol(Dj)
- Simple calculation: P[Z ∉ Bj] = f(Vol(Bj))
- f is convex, so apply Jensen:

ε ≥ f( (1/M) Σ_{j=1}^{M} Vol(Dj) ) = f( 2ⁿ / M )

Bounds: example BSC(0.11), ε = 10⁻³

[Figure: rate (bit/ch. use) vs. blocklength n up to 2000, comparing the Feinstein-Shannon and Gallager achievability bounds, the RCU bound, the converse, and capacity.]

Delay required to achieve 90% of C:  2985 ≤ n* ≤ 3106

Error-exponent estimate: n* ≈ 5000

Normal approximation

Theorem

For the BSC(δ) and 0 < ε < 1,

R*(n, ε) = C − √(V/n) Q⁻¹(ε) + (1/2) (log n)/n + O(1/n),

where

C(δ) = log 2 + δ log δ + (1−δ) log(1−δ)
V = δ(1−δ) log²((1−δ)/δ)
Q(x) = ∫_x^∞ (1/√(2π)) e^{−y²/2} dy

Proof: bounds + Stirling's formula.
Note: now we see the explicit dependence on ε!
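A minimal sketch of this approximation in code (Python with SciPy assumed; names are ours), working in bits per channel use. The last line scans for the smallest n at which the approximation reaches 90% of capacity, which is how the n* ≈ 3150 on the next slide can be reproduced up to rounding.

```python
import numpy as np
from scipy.stats import norm

def normal_approx_bsc(n, eps, delta):
    """C - sqrt(V/n) * Q^{-1}(eps) + (1/2) * log2(n)/n for the BSC(delta), in bits."""
    C = 1 + delta * np.log2(delta) + (1 - delta) * np.log2(1 - delta)
    V = delta * (1 - delta) * np.log2((1 - delta) / delta) ** 2
    return C - np.sqrt(V / n) * norm.isf(eps) + 0.5 * np.log2(n) / n

delta, eps = 0.11, 1e-3
C = 1 + delta * np.log2(delta) + (1 - delta) * np.log2(1 - delta)
n_star = next(n for n in range(100, 10_000) if normal_approx_bsc(n, eps, delta) >= 0.9 * C)
print(n_star)   # close to the n* ~ 3150 quoted on the next slide
```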

Normal approximation: BSC(0.11), ε = 10⁻³

[Figure: rate (bit/ch. use) vs. blocklength n up to 2000, showing capacity, the non-asymptotic bounds, and the normal approximation C − √(V/n) Q⁻¹(ε) + (1/2)(log n)/n.]

To achieve 90% of C:  n* ≈ 3150

Dispersion and minimal required delay

Delay needed to achieve R = ηC:

n* ≳ ( Q⁻¹(ε) / (1 − η) )² · V / C²

Note: V/C² is the "coding horizon". (A numerical sketch of this rule follows below.)

[Figure: behavior of V/C² for the BSC as a function of δ ∈ (0, 0.5), rising from about 10⁰ in the low-noise regime toward about 10⁶ as the noise grows, on a logarithmic scale.]
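The sketch referred to above (Python, SciPy assumed; coding_horizon_bsc is our name). It slightly overshoots the n* from the full normal approximation because the (log n)/n term is dropped.

```python
import numpy as np
from scipy.stats import norm

def coding_horizon_bsc(delta, eta=0.9, eps=1e-3):
    """Rule of thumb n* ~ (Q^{-1}(eps) / (1 - eta))^2 * V / C^2 for the BSC(delta)."""
    C = 1 + delta * np.log2(delta) + (1 - delta) * np.log2(1 - delta)
    V = delta * (1 - delta) * np.log2((1 - delta) / delta) ** 2
    return (norm.isf(eps) / (1 - eta)) ** 2 * V / C ** 2

print(coding_horizon_bsc(0.11))   # roughly 3.4e3 channel uses
```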

BSC summary

Delay required to achieve 90% of capacity:

- Error exponents: n* ≈ 5000
- True value: 2985 ≤ n* ≤ 3106
- Channel dispersion: n* ≈ 3150

Converse Bounds

Notation

- Take a random transformation A → B given by PY|X (think A = Aⁿ, B = Bⁿ, PY|X = PYⁿ|Xⁿ).
- An input distribution PX induces PY = PY|X ∘ PX and PXY = PX PY|X.
- Fix a code: W → encoder → X → Y → decoder → Ŵ, with W ∼ Unif[M] and M = # of codewords.
- Input distribution PX associated to a code: PX[·] ≜ (# of codewords in (·)) / M.
- Goal: upper bounds on log M in terms of ε ≜ P[error].

As a by-product: R*(n, ε) ≲ C − √(V/n) Q⁻¹(ε).

Fano's inequality

Theorem (Fano)

For any code W → X → Y → Ŵ over the channel PY|X with W ∼ Unif{1, ..., M}:

log M ≤ ( sup_{PX} I(X; Y) + h(ε) ) / (1 − ε),   ε = P[Ŵ ≠ W].

This implies the weak converse:

R*(n, ε) ≤ C / (1 − ε) + o(1).

Proof idea: ε small ⟹ H(W | Ŵ) small ⟹ I(X; Y) ≈ H(W) = log M.

A (very long) proof of Fano via channel substitution

Consider two distributions on (W, X, Y, Ŵ):

P:  P_{WXYŴ} = PW × PX|W × PY|X × PŴ|Y    (DAG: W → X → Y → Ŵ)
Q:  Q_{WXYŴ} = PW × PX|W × QY × PŴ|Y     (DAG: W → X,  Y → Ŵ)

Under Q the channel is useless:

Q[Ŵ = W] = Σ_{m=1}^{M} PW(m) QŴ(m) = (1/M) Σ_{m=1}^{M} QŴ(m) = 1/M.

Next step: data processing for the relative entropy D(·‖·).

Data processing for D(·‖·)

[Figure: a (random) transformation PB|A maps the input distributions PA and QA to the output distributions PB and QB.]

D(PA‖QA) ≥ D(PB‖QB)

Apply this to the transformation (W, X, Y, Ŵ) ↦ 1{Ŵ ≠ W}:

D(P_{WXYŴ} ‖ Q_{WXYŴ}) ≥ d( P[Ŵ = W] ‖ Q[Ŵ = W] ) = d( 1 − ε ‖ 1/M ),

where d(x‖y) = x log(x/y) + (1 − x) log((1 − x)/(1 − y)).

A proof of Fano via channel substitution

So far:  D(P_{WXYŴ} ‖ Q_{WXYŴ}) ≥ d( 1 − ε ‖ 1/M ).

Lower-bound the RHS:

d( 1 − ε ‖ 1/M ) ≥ (1 − ε) log M − h(ε)

Analyze the LHS:

D(P_{WXYŴ} ‖ Q_{WXYŴ}) = D(PXY ‖ QXY) = D(PX PY|X ‖ PX QY) = D(PY|X ‖ QY | PX)

(Recall: D(PY|X ‖ QY | PX) = E_{x∼PX}[ D(PY|X=x ‖ QY) ].)

A proof of Fano via channel substitution: last step

Putting it all together:

(1 − ε) log M ≤ D(PY|X ‖ QY | PX) + h(ε)   ∀ QY, ∀ code.

Two methods:

1. Compute sup_{PX} inf_{QY} and recall that inf_{QY} D(PY|X ‖ QY | PX) = I(X; Y).

2. Take QY = P*Y, the caod (capacity-achieving output distribution), and recall that
   D(PY|X ‖ P*Y | PX) ≤ sup_{PX} I(X; Y)   ∀ PX.

Conclude:  (1 − ε) log M ≤ sup_{PX} I(X; Y) + h(ε).

Important: the second method is particularly useful for FBL!

Tightening: from D(·‖·) to β_α(·, ·)

Question: how about replacing D(·‖·) with other divergences?

Answer:

  D(·‖·)     relative entropy (KL divergence)  →  weak converse
  D_λ(·‖·)   Renyi divergence                  →  strong converse
  β_α(·, ·)  Neyman-Pearson ROC curve          →  FBL bounds

Next: what is β_α?

Neyman-Pearson's β_α

Definition

For every pair of measures P, Q:

β_α(P, Q) ≜ inf_{E : P[E] ≥ α} Q[E].

Iterate over all "sets" E and plot the pairs (P[E], Q[E]):

[Figure: the set of achievable pairs (α, β) = (P[E], Q[E]) in the unit square; its lower boundary is the curve α ↦ β_α(P, Q).]

Important: like relative entropy, β_α satisfies data processing.

β_α = binary hypothesis testing

[Figure: a tester observes samples Y1, Y2, ... and must decide whether they come from PY or QY.]

Two types of errors:

P[tester says "QY"] ≤ ε
Q[tester says "PY"] → min

Hence: solving the binary HT problem  ⟺  computing β_α(PYⁿ, QYⁿ).

Stein's lemma: for many i.i.d. observations,

log β_{1−ε}(PYⁿ, QYⁿ) = −n D(PY‖QY) + o(n).

But in fact log β_α(PYⁿ, QYⁿ) can also be computed exactly!

How to compute β_α?

Theorem (Neyman-Pearson)

β_α is given parametrically by −∞ ≤ γ ≤ +∞:

P[ log P(X)/Q(X) ≥ γ ] = α
Q[ log P(X)/Q(X) ≥ γ ] = β_α(P, Q)

For product measures log Pⁿ(X)/Qⁿ(X) is a sum of i.i.d. terms, so from the CLT:

log β_α(Pⁿ, Qⁿ) = −n D(P‖Q) + √( n V(P‖Q) ) Q⁻¹(1 − α) + o(√n),

where

V(P‖Q) = Var_P[ log P(X)/Q(X) ].

Back to proving the converse

Recall the two measures:

P:  P_{WXYŴ} = PW × PX|W × PY|X × PŴ|Y   (DAG: W → X → Y → Ŵ),   P[Ŵ = W] = 1 − ε
Q:  Q_{WXYŴ} = PW × PX|W × QY × PŴ|Y    (DAG: W → X,  Y → Ŵ),    Q[Ŵ = W] = 1/M

Then, by the definition of β_α:

β_{1−ε}(P_{WXYŴ}, Q_{WXYŴ}) ≤ 1/M.

But log( dP_{WXYŴ} / dQ_{WXYŴ} ) = log( PX PY|X / (PX QY) )  ⟹

log M ≤ − log β_{1−ε}(PX PY|X, PX QY)   ∀ QY, ∀ code.

Meta-converse: minimax version

Theorem

Every (M, ε)-code for the channel PY|X satisfies

log M ≤ − log { inf_{PX} sup_{QY} β_{1−ε}(PX PY|X, PX QY) }.

Remarks:

- Finding a good QY for every PX is not needed:

inf_{PX} sup_{QY} β_{1−ε}(PX PY|X, PX QY) = sup_{QY} inf_{PX} β_{1−ε}(PX PY|X, PX QY)   (∗)

- The saddle-point property of β_α is similar to that of D(·‖·):

inf_{PX} sup_{QY} D(PX PY|X ‖ PX QY) = sup_{QY} inf_{PX} D(PX PY|X ‖ PX QY) = C

The bound is tight in two senses:

- There exist non-signalling-assisted (NSA) codes attaining the upper bound [Matthews, Trans. IT 2012].
- ISIT 2013: for any (M, ε)-code with an ML decoder,

log M = − log { sup_{QY} β_{1−ε}(PX PY|X, PX QY) }

(Vazquez-Vilar et al. [WeB4]).

In practice: evaluate the bound with a luckily guessed (suboptimal) QY. How to guess a good QY?

- Try the caod P*Y
- Analyze channel symmetries
- Use geometric intuition: a good QY ≈ the "center" of the family {PY|X=a}
- Exercise: redo the BSC.

[Figure: the space of distributions on Y, with the points PY|X=a1, PY|X=a2, ..., PY|X=aj and a candidate QY near their center.]

Example: converse for the AWGN channel

The AWGN channel:  Y = X + Z,  Z ∼ N(0, σ²).

Codewords Xⁿ ∈ ℝⁿ satisfy the power constraint

Σ_{j=1}^{n} |Xj|² ≤ nP.

Goal: upper-bound the number of codewords decodable with Pe ≤ ε.

Example: converse for the AWGN channel

- Given {c1, ..., cM} ⊂ ℝⁿ with P[Ŵ ≠ W] ≤ ε on the AWGN(1) channel.
- Yaglom-map trick: replacing n → n + 1 equalizes the powers:  ‖cj‖² = nP for all j ∈ {1, ..., M}.
- Take QYⁿ = N(0, 1 + P)ⁿ (the caod!).
- Optimal test "PXⁿYⁿ vs. PXⁿ QYⁿ" (Neyman-Pearson):

log( dPYⁿ|Xⁿ / dQYⁿ ) = nC + (log e / 2) · ( ‖Yⁿ‖²/(1 + P) − ‖Yⁿ − Xⁿ‖² ),

where C = (1/2) log(1 + P).

- Under P: Yⁿ = Xⁿ + N(0, Iₙ)  ⟹  the distribution of the LLR (CLT approximation) is

≈ nC + √(nV) Z,   Z ∼ N(0, 1).

Simple algebra: V = (log²e / 2) · ( 1 − 1/(1 + P)² ).

. . . cont'd . . .

- Under P: the distribution of the LLR (CLT approximation) is ≈ nC + √(nV) Z, Z ∼ N(0, 1).
- Take γ = nC − √(nV) Q⁻¹(ε)  ⟹  P[ log dP/dQ ≥ γ ] ≈ 1 − ε.
- Under Q: a standard change of measure shows  Q[ log dP/dQ ≥ γ ] ≈ exp{−γ}.
- By Neyman-Pearson:

log β_{1−ε}(PYⁿ|Xⁿ=c, QYⁿ) ≈ −nC + √(nV) Q⁻¹(ε)

- Punchline: for every (n, M, ε)-code,

log M ≲ nC − √(nV) Q⁻¹(ε).

N.B.: the RHS can be expressed exactly via a non-central χ² distribution ... and computed in MATLAB (without any CLT approximation).
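A numerical sketch of this punchline (Python, SciPy assumed; names ours). It evaluates the CLT form of the converse rather than the exact non-central χ² expression mentioned above.

```python
import numpy as np
from scipy.stats import norm

def awgn_converse_approx(n, eps, snr):
    """Approximate converse on the rate, (1/n)(nC - sqrt(nV) Q^{-1}(eps)), in bits/ch.use."""
    C = 0.5 * np.log(1 + snr)              # nats per channel use
    V = 0.5 * (1 - 1 / (1 + snr) ** 2)     # nats^2 per channel use
    return (C - np.sqrt(V / n) * norm.isf(eps)) / np.log(2)

print(awgn_converse_approx(1000, 1e-3, snr=1.0))   # SNR = 0 dB, as in the plot that follows
```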

AWGN: converse from β_α(P, Q) with QY = N(0, 1)ⁿ

[Figure: rate (bits/ch. use) vs. blocklength n up to 2000, showing capacity, the converse, the best achievability bound, and the normal approximation. Channel parameters: SNR = 0 dB, ε = 10⁻³.]

From one QY to many

[Figure: the space of distributions on Y, where points correspond to channel inputs. For a symmetric channel the inputs {PY|X=aj} are equidistant from the caod P*Y (with "distance" = −β_α(·, ·)), so the choice QY = P*Y is clear. For general channels the inputs cluster (by composition, power allocation, ...; clusters ⟺ orbits of the channel symmetry group), the caod is no longer equidistant to all inputs (read: the analysis becomes horrible), and there is a natural choice of QY for each cluster.]

Solution: take a different QY for each cluster, i.e. think of QY|X.

General meta-converse principle

Steps:

- Select an auxiliary channel QY|X (art). E.g.: QY|X=x = center of the cluster containing x.
- Prove a converse bound for the channel QY|X. E.g.: Q[Ŵ = W] ≲ (# of clusters) / M.
- Find β_α(P, Q), i.e. compare

P:  P_{WXYŴ} = PW × PX|W × PY|X × PŴ|Y
vs.
Q:  Q_{WXYŴ} = PW × PX|W × QY|X × PŴ|Y

- Amplify the converse for QY|X into a converse for PY|X:

β_{1 − Pe(PY|X)}(PX PY|X, PX QY|X) ≤ 1 − Pe(QY|X)   ∀ code.

Meta-converse theorem: point-to-point channels

Theorem

For any code, ε ≜ P[error] and ε′ ≜ Q[error] satisfy

β_{1−ε}(PX PY|X, PX QY|X) ≤ 1 − ε′.

Advanced examples of QY|X:

- General DMC: QY|X=x = PY|X ∘ Px.  Why? To reduce the DMC to a symmetric DMC.
- Parallel AWGN: QY|X=x = f(power allocation).  Why? Because water-filling is not FBL-optimal.
- Feedback: Q[Y ∈ · | W = w] = P[Y ∈ · | W ≠ w].  Why? To get bounds in terms of Burnashev's C1.
- PAPR of codes: QYⁿ|Xⁿ=xⁿ = f(peak power of x).  Why? To show that peaky codewords waste power.

Meta-converse generalizes many classical methods

Theorem

For any code, ε ≜ P[error] and ε′ ≜ Q[error] satisfy

β_{1−ε}(PX PY|X, PX QY|X) ≤ 1 − ε′.

Corollaries:

- Fano's inequality
- Wolfowitz's strong converse
- Shannon-Gallager-Berlekamp sphere-packing (+ improvements: [Valembois-Fossorier'04], [Wiechman-Sason'08])
- Haroutounian's sphere-packing
- List-decoding converses (e.g.: Q[W ∈ {list}] = |{list}| / M)
- Berlekamp's low-rate converse
- Verdu-Han and Poor-Verdu information-spectrum converses
- Arimoto's converse (+ extension to feedback)

Meta-converse in networks

[Figure: a network where messages W1 and W2 enter at nodes A and B, pass through intermediate nodes and a channel PY|X, and estimates Ŵ1, Ŵ2 are produced at nodes A1 and B1; an auxiliary channel QY|X replaces PY|X on the same topology.]

{error} = {Ŵ1 ≠ W1} ∪ {Ŵ2 ≠ W2}

- The probability of error depends on the channel:  P[error] = ε,  Q[error] = ε′.
- Same idea: use the code as a suboptimal binary HT for PY|X vs. QY|X ...
- ... and compare to the best possible test:

D(PXY ‖ QXY) ≥ d( 1 − ε ‖ 1 − ε′ )
β_{1−ε}(PX PY|X, PX QY|X) ≤ 1 − ε′

Example: MAC (weak converse)

P: the true two-user channel  (W1, W2) → (X1, X2) → Y → (Ŵ1, Ŵ2).
Q: the same network, but with the channel replaced by an auxiliary QY|X2 that ignores X1.

P[Ŵ1,2 = W1,2] = 1 − ε        Q[Ŵ1,2 = W1,2] = 1/M1

... apply data processing of D(·‖·) ...

d( 1 − ε ‖ 1/M1 ) ≤ D(PY|X1X2 ‖ QY|X2 | PX1 PX2)

Optimizing over QY|X2:

log M1 ≤ ( I(X1; Y | X2) + h(ε) ) / (1 − ε)

Also with X1 ↔ X2  ⟹  weak converse (the usual pentagon).

Example: MAC (FBL?)

P: the true two-user channel.  Q: the same network, but with X1 made independent of (X2, Y).

P[Ŵ1,2 = W1,2] = 1 − ε        Q[Ŵ1,2 = W1,2] = 1/M1

... use β_α(·, ·) ...

β_{1−ε}(PX1X2Y, PX1 QX2Y) ≤ 1/M1

Ongoing work: this β_α is highly non-trivial to compute. [Huang-Moulin, MolavianJazi-Laneman, Yagi-Oohama]

Achievability Bounds

Notation

- A random transformation A → B given by PY|X.
- (M, ε) codes: W → A → B → Ŵ, with W ∼ Unif{1, ..., M} and P[Ŵ ≠ W] ≤ ε.
- For every PXY = PX PY|X define the information density

ı_{X;Y}(x; y) ≜ log ( dPY|X=x / dPY )(y).

- E[ı_{X;Y}(X; Y)] = I(X; Y),  Var[ı_{X;Y}(X; Y) | X] = V.
- Memoryless channels: ı_{Aⁿ;Bⁿ}(Aⁿ; Bⁿ) is a sum of i.i.d. terms, so in distribution

ı_{Aⁿ;Bⁿ}(Aⁿ; Bⁿ) ≈ n I(A; B) + √(nV) Z,   Z ∼ N(0, 1).

- Goal: prove FBL bounds.  As a by-product: R*(n, ε) ≳ C − √(V/n) Q⁻¹(ε).

Achievability bounds: classical ideas

Goal: select codewords C1, ..., CM in the input space A.

Two principal approaches:

- Random coding: generate C1, ..., CM i.i.d. according to PX and compute the average probability of error [Shannon'48, Erdos'47].
- Maximal coding: choose the Cj one by one until the output space is exhausted [Gilbert'52, Feinstein'54, Varshamov'57].

Complication: there are many inequivalent ways to apply these ideas. Which ones are best for FBL?

Classical bounds

- Feinstein'55 bound: there exists an (M, ε)-code with

M ≥ sup_{γ≥0} γ ( ε − P[ı_{X;Y}(X; Y) ≤ log γ] ).

- Shannon'57 bound: there exists an (M, ε)-code with

ε ≤ inf_{γ≥0} { P[ı_{X;Y}(X; Y) ≤ log γ] + (M − 1)/γ }.

- Gallager'65 bound: there exists an (n, M, ε)-code over a memoryless channel with

ε ≤ exp{ −n E_r( (log M)/n ) }.

- Up to M ↔ (M − 1), the Feinstein and Shannon bounds are equivalent.

New bounds: RCU

Theorem (Random Coding Union bound)

For any PX there exists a code with M codewords and

ε ≤ E[ min{ 1, (M − 1) π(X, Y) } ],
π(a, b) = P[ ı_{X;Y}(X̄; Y) ≥ ı_{X;Y}(X; Y) | X = a, Y = b ],

where P_{X Y X̄}(a, b, c) = PX(a) PY|X(b|a) PX(c).

Proof:

- Reason as in the RCU bound for the BSC, with d_Ham(·, ·) ↔ −ı_{X;Y}(·, ·).
- For example, the ML decoder: Ŵ = argmax_j ı_{X;Y}(Cj; Y).
- Conditional probability of error:

P[error | X, Y] ≤ (M − 1) P[ ı_{X;Y}(X̄; Y) ≥ ı_{X;Y}(X; Y) | X, Y ].

- Same idea: take min{·, 1} before averaging over (X, Y).

Highlights:

- Strictly stronger than Feinstein-Shannon and Gallager.
- Not easy to analyze asymptotically.
- Computational complexity O(n^{2(|X|−1)|Y|}).

New bounds: DT

Theorem (Dependence Testing bound)

For any PX there exists a code with M codewords and

ε ≤ E[ exp{ −| ı_{X;Y}(X; Y) − log((M − 1)/2) |⁺ } ].

Highlights:

- Strictly stronger than Feinstein-Shannon ...
- ... and no optimization over γ!
- Easier to compute than RCU.
- Easier asymptotics:

ε ≤ E[ e^{ −n | (1/n) ı(Xⁿ; Yⁿ) − R |⁺ } ] ≈ Q( √(n/V) · (I(X; Y) − R) ).

- Has the form of an f-divergence: 1 − ε ≥ D_f(PXY ‖ PX PY).
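A minimal sketch of evaluating the DT bound for the BSC with the equiprobable input (Python, SciPy assumed; names ours). With PX = Bern(1/2)ⁿ the information density depends only on the Hamming weight t of the noise, ı = n log 2 + t log δ + (n − t) log(1 − δ), so the expectation collapses to a single sum; the threshold log((M − 1)/2) is approximated by (k − 1) log 2 for M = 2^k.

```python
import numpy as np
from scipy.stats import binom

def dt_bsc(n, k, delta):
    """DT bound E[exp{-|i(X;Y) - log((M-1)/2)|^+}] over BSC(delta), for M = 2^k codewords."""
    t = np.arange(n + 1)
    info = n * np.log(2) + t * np.log(delta) + (n - t) * np.log(1 - delta)   # nats
    thresh = (k - 1) * np.log(2)                 # ~ log((M - 1)/2) for large M
    exponent = -np.maximum(info - thresh, 0.0)   # -|i - log((M-1)/2)|^+
    return float(binom.pmf(t, n, delta) @ np.exp(exponent))

print(dt_bsc(1000, 400, 0.11))   # error bound for sending k = 400 bits in n = 1000 channel uses
```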

DT bound: proof

- Codebook: random C1, ..., CM ∼ PX i.i.d.
- Feinstein decoder: Ŵ = smallest j such that ı_{X;Y}(Cj; Y) > γ.
- j-th codeword's probability of error:

P[error | W = j] ≤ P[ı_{X;Y}(X; Y) ≤ γ]   (a)
                  + (j − 1) P[ı_{X;Y}(X̄; Y) > γ]   (b)

In (a): Cj is too far from Y.  In (b): some Ck with k < j is too close to Y.

- Average over W:

P[error] ≤ P[ı_{X;Y}(X; Y) ≤ γ] + ((M − 1)/2) P[ı_{X;Y}(X̄; Y) > γ].

DT bound: proof (cont'd)

- Recap: for every γ there exists a code with

ε ≤ P[ı_{X;Y}(X; Y) ≤ γ] + ((M − 1)/2) P[ı_{X;Y}(X̄; Y) > γ].

- Key step: closed-form optimization over γ. Note ı_{X;Y} = log( dPXY / d(PX PY) ), so the RHS equals

((M + 1)/2) · ( (2/(M + 1)) PXY[ dPXY/d(PX PY) ≤ e^γ ] + ((M − 1)/(M + 1)) (PX PY)[ dPXY/d(PX PY) > e^γ ] );

this is a Bayesian dependence test! The optimum threshold is the ratio of the priors  ⟹  γ* = log((M − 1)/2).

- Change-of-measure argument:

P[ dP/dQ ≤ τ ] + τ Q[ dP/dQ > τ ] = E_P[ exp{ −| log(dP/dQ) − log τ |⁺ } ].

Example: binary erasure channel BEC(0.5), ε = 10⁻³

[Figure: rate (bit/ch. use) vs. blocklength n up to 2000, showing the converse, the achievability bound, the normal approximation, and capacity (0.5 bit).]

At n = 1000 the best k → n code satisfies  (DT) 450 ≤ k ≤ 452 (meta-converse).

Converse: QYⁿ = truncated caod.

Input constraints: the κβ bound

Theorem

For all QY and τ there exists an (M, ε)-code with codewords inside F ⊂ A and

M ≥ κ_τ / sup_{x∈F} β_{1−ε+τ}(PY|X=x, QY),

where

κ_τ = inf_{ E : PY|X[E|x] ≥ τ ∀ x∈F } QY[E].

Highlights:

- Key for channels with cost constraints (e.g., AWGN).
- The bound is parameterized by the output distribution.
- Reduces coding to binary HT.

κβ bound: idea

Decoder:

- Take the received y.
- Test y against each codeword ci, i = 1, ..., M: run the optimal binary HT for H0: PY|X=ci vs. H1: QY, designed so that P[detect H0] = 1 − ε + τ and Q[detect H0] = β_{1−ε+τ}(PY|X=ci, QY).
- The first test that returns H0 becomes the decoded codeword.
- If all tests return H1, declare an error.

κβ bound: idea (cont'd)

Codebook:

- Pick codewords such that the "balls" are τ-disjoint:  P[Y ∈ Bx ∩ (other balls) | x] ≤ τ.
- Key step: once no more codewords can be picked, ∪_{j=1}^{M} {j-th decoding region} is the acceptance region of a composite HT:

H0: PY|X=x, x ∈ F     vs.     H1: QY.

- Performance of the best such test:  κ_τ = inf_{ E : PY|X[E|x] ≥ τ ∀ x∈F } QY[E].
- Thus:

κ_τ ≤ Q[union of all M "balls"] ≤ M sup_x β_{1−ε+τ}(PY|X=x, QY).

Hierarchy of achievability bounds (no cost constraints)

[Figure: a diagram ordering the bounds from looser to tighter (Gallager, Feinstein, DT, RCU, and RCU*), with typical performance improving together with the difficulty of computation and analysis, from easy to hard.]

- Arrows show logical implication.
- Performance ↔ computation: a rule of thumb.
- ISIT 2013: RCU* bound, Haim-Kochman-Erez [WeB4].

Channel Dispersion

Connection to the CLT

Recap:

- Let PYⁿ|Xⁿ = (PY|X)ⁿ be memoryless. The FBL fundamental limit is

R*(n, ε) = max { (1/n) log M : ∃ an (n, M, ε)-code }.

- Converse bounds (roughly):  R*(n, ε) ≲ ε-th quantile of (1/n) log( dPYⁿ|Xⁿ / dQYⁿ ).
- Achievability bounds (roughly):  R*(n, ε) ≳ ε-th quantile of (1/n) ı_{Xⁿ;Yⁿ}(Xⁿ; Yⁿ).
- Both random variables have the form (1/n)·(sum of i.i.d. terms), so by the CLT

R*(n, ε) = C + θ(1/√n).

This section: study the √n term.

General definition of channel dispersion

Definition

For any channel, the channel dispersion is

V = lim_{ε→0} limsup_{n→∞}  n (C − R*(n, ε))² / (2 ln(1/ε)).

The rationale is the expansion (see below)

R*(n, ε) = C − √(V/n) Q⁻¹(ε) + o(1/√n)   (∗)

and the fact that (Q⁻¹(ε))² ∼ 2 ln(1/ε) as ε → 0.

Recall: the approximation via (∗) is remarkably tight.

Heuristic connection to the error exponent E(R): near capacity,

E(R) = ((R − C)² / 2) · ∂²E(R)/∂R² + o((R − C)²),

and thus

V = ( ∂²E(R)/∂R² )⁻¹.

Dispersion of memoryless channels

- DMC [Dobrushin'61, Strassen'62]:

V = Var[ı_{X;Y}(X; Y)],   X ∼ capacity-achieving.

- AWGN channel [PPV'08]:

V = (log²e / 2) · [ 1 − ( 1/(1 + SNR) )² ].

- Parallel AWGN [PPV'09]:

V = Σ_{j=1}^{L} V_AWGN(Wj / σj²),   {Wj} = water-filling powers.

- DMC with input constraints [Hayashi'09, P.'10]:

V = Var[ı_{X;Y}(X; Y) | X],   X ∼ capacity-achieving.

(A numerical sketch of the DMC case follows below.)
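The sketch referred to above (Python/NumPy assumed; names ours): I(X;Y) and Var[ı(X;Y)|X] for a DMC from its transition matrix. Feeding in the capacity-achieving input (found separately, e.g. by Blahut-Arimoto) yields the dispersion; for the BSC(0.11) the uniform input is optimal, and the code recovers C ≈ 0.5 bit and V ≈ 0.89 bit².

```python
import numpy as np

def dmc_information_moments(px, W):
    """Mutual information I(X;Y) and Var[i(X;Y)|X] in bits, for input px and row-stochastic W."""
    py = px @ W                                          # output distribution
    safe_W = np.where(W > 0, W, 1.0)                     # avoid log(0); those terms get weight 0
    i_xy = np.where(W > 0, np.log2(safe_W / py), 0.0)    # information density (bits)
    I = float(np.sum(px[:, None] * W * i_xy))
    cond_mean = np.sum(W * i_xy, axis=1)
    V = float(px @ np.sum(W * (i_xy - cond_mean[:, None]) ** 2, axis=1))
    return I, V

d = 0.11
W = np.array([[1 - d, d], [d, 1 - d]])
print(dmc_information_moments(np.array([0.5, 0.5]), W))
```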

Dispersion of channels with memory

From [PPV'09, PPV'10, PV'11]:

- Non-white Gaussian noise with power spectral density N(f):

V = (log²e / 2) ∫_{−1/2}^{1/2} [ 1 − |N(f)|⁴ / (P² ξ²) ]⁺ df,   where ξ solves ∫_{−1/2}^{1/2} [ ξ − |N(f)|² / P ]⁺ df = 1.

- AWGN subject to a stationary fading process Hi (CSI at the receiver):

V = PSD_{(1/2) log(1 + P Hi²)}(0) + (log²e / 2) · ( 1 − E²[ 1/(1 + P H0²) ] ).

- State-dependent discrete additive noise (CSI at the receiver):

V = PSD_{C(Si)}(0) + E[V(S)].

- ISIT'13: quasi-static fading channels: V = 0 (!)  (Yang-Durisi-Koch-P. in [WeA4]).

Dispersion: product vs. generic channels

- Relation to alphabet size:

V ≤ 2 log²( min(|A|, |B|) ) − C².

- Dispersion is additive: combining DMC1: A1 → B1 and DMC2: A2 → B2 gives a product DMC A1 × A2 → B1 × B2 with

C = C1 + C2,   V_ε = V_{1,ε} + V_{2,ε}.

- ⟹ product DMCs have atypically low dispersion.

Dispersion and normal approximation

Let PY|X be a DMC and define

V_ε ≜ max_{PX} Var[ı(X; Y) | X]   for ε < 1/2,
V_ε ≜ min_{PX} Var[ı(X; Y) | X]   for ε > 1/2,

where the optimization is over all PX such that I(X; Y) = C.

Theorem (Strassen'62, PPV'10)

R*(n, ε) = C − √(V_ε/n) Q⁻¹(ε) + O((log n)/n),

unless the DMC is exotic, in which case the O((log n)/n) becomes O(n^{−2/3}).

Indeed [PPV'10]: there is a counter-example with

R*(n, ε) = C + Θ(n^{−2/3}).

Further results on the O((log n)/n) term

- For the BEC:

R*(n, ε) = C − √(V/n) Q⁻¹(ε) + 0 · (log n)/n + O(1/n).

- For most other symmetric channels (incl. the BSC and AWGN*):

R*(n, ε) = C − √(V/n) Q⁻¹(ε) + (1/2) (log n)/n + O(1/n).

- For most DMCs (under mild conditions):

R*(n, ε) ≥ C − √(V_ε/n) Q⁻¹(ε) + (1/2) (log n)/n + O(1/n).

- ISIT'13: for all DMCs,

R*(n, ε) ≤ C − √(V_ε/n) Q⁻¹(ε) + (1/2) (log n)/n + O(1/n)

(Tomamichel-Tan, Moulin in [WeA4]).

Applications

Evaluating the performance of real-world codes

- Comparing codes: the usual method is waterfall plots of Pe vs. SNR.
- Problem: not fair for codes of different rates.  ⟹  define a rate-invariant metric:

Definition (Normalized rate)

Given a rate-R code, find the SNR at which Pe = ε; then

R_norm = R / R*(n, ε, SNR).

- Convention: ε = 10⁻³ or 10⁻⁴.
- Take R*(n, ε, SNR) ≈ C − √(V/n) Q⁻¹(ε).
- A family of channels is needed (e.g., AWGN or BSC). (A small sketch of this computation follows below.)
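The sketch referred to above (Python, SciPy assumed). The operating point (n, R, SNR) below is purely hypothetical, and plain-AWGN formulas for C and V are used as a stand-in for whatever channel family (e.g. BIAWGN) an actual comparison would be run over.

```python
import numpy as np
from scipy.stats import norm

def normalized_rate_awgn(n, R, snr, eps=1e-4):
    """R / R*(n, eps, SNR) with R* taken as the normal approximation for the AWGN channel (bits)."""
    C = 0.5 * np.log2(1 + snr)
    V = (np.log2(np.e) ** 2 / 2) * (1 - 1 / (1 + snr) ** 2)
    R_star = C - np.sqrt(V / n) * norm.isf(eps)
    return R / R_star

# hypothetical code: rate 1/2, length 2048, reaching Pe = 1e-4 at SNR ~ 3 dB
print(normalized_rate_awgn(n=2048, R=0.5, snr=2.0))
```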

Codes vs. fundamental limits (from the 1970's to 2012)

[Figure: normalized rates of code families over the BIAWGN channel at Pe = 10⁻⁴, for blocklengths from roughly 10² to 10⁵: Turbo codes (R = 1/2, 1/3, 1/4, 1/6), Voyager, Galileo HGA and LGA, Cassini/Pathfinder, a Hermitian-curve [64,32] code (SDD), BCH (Koetter-Vardy), Polar+CRC R = 1/2 (list decoding), and ME LDPC R = 1/2 (BP); normalized rates range roughly from 0.6 to 1.]

Performance of short algebraic codes (BSC, ε = 10⁻³)

[Figure: normalized rate vs. blocklength (roughly 10 to 10³) for Hamming, Golay, BCH, extended BCH, and Reed-Muller codes of orders 1, 2, 3, and 4+; normalized rates range roughly from 0.75 to 1.05.]

Optimizing ARQ systems

- The end user wants Pe = 0.
- Usual method: automatic repeat request (ARQ), with

average throughput = rate × (1 − P[error]).

- Question: given k bits, what rate (equivalently, what ε) maximizes the throughput?
- Assume (C, V) is known. Then approximately

T*(k) ≈ max_R R · ( 1 − Q( √(kR/V) · (C/R − 1) ) ).

[Figure: the throughput as a function of rate, with a single interior maximum.]

- Solution: ε*(k) ∼ 1 / √( kt log(kt) ),   t = C/V.
- Punchline: for k ∼ 1000 bits and reasonable channels, ε ≈ 10⁻³ ... 10⁻². (A grid-search sketch of this optimization follows below.)
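The grid-search sketch referred to above (Python, SciPy assumed; names ours), with (C, V) in the ballpark of the BSC(0.11).

```python
import numpy as np
from scipy.stats import norm

def arq_optimum(k, C, V):
    """Maximize R * (1 - Q(sqrt(kR/V) * (C/R - 1))) over R; returns (R*, eps*, throughput*)."""
    R = np.linspace(1e-3, 0.999 * C, 5000)
    eps = norm.sf(np.sqrt(k * R / V) * (C / R - 1))   # per-attempt error probability
    T = R * (1 - eps)
    i = int(np.argmax(T))
    return R[i], eps[i], T[i]

print(arq_optimum(k=1000, C=0.5, V=0.89))   # the optimal per-attempt epsilon lands around 1e-2
```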

Benefits of feedback: from ARQ to hybrid ARQ

[Figure: encoder → channel → decoder, with the encoder also seeing the past channel outputs Y^{t−1} through a unit-delay feedback link.]

- Memoryless channels: feedback does not improve C [Shannon'56].
- Question: what about the higher-order terms?

Theorem

For any DMC with capacity C and 0 < ε < 1, codes with feedback and variable length achieve

R*_f(n, ε) = C / (1 − ε) + O((log n)/n).

Note: the dispersion is zero!

Stop-feedback bound (BSC version)

Theorem

For any γ > 0 there exists a stop-feedback code of rate R, average length ℓ = E[τ], and probability of error over the BSC(δ) satisfying

ε ≤ E[ f(τ) ],

where

f(n) ≜ E[ 1{τ ≤ n} 2^{ℓR − S_τ} ],
τ ≜ inf{ n ≥ 0 : S_n ≥ γ },
S_n ≜ n log(2 − 2δ) + log( δ/(1 − δ) ) · Σ_{k=1}^{n} Z_k,
Z_k ∼ i.i.d. Bernoulli(δ).

Feedback codes for the BSC(0.11), ε = 10⁻³

[Figure: rate (bits/ch. use) vs. average blocklength up to 1000, showing the converse and achievability for stop-feedback codes together with the no-feedback curve; the stop-feedback curves approach capacity at much shorter average blocklengths.]

Effects of flow control

[Figure: the feedback setup augmented with a separate reliable "STOP" signal from the decoder to the encoder.]

- Models packet termination.
- Often: the reliability of start/end signalling ≫ the reliability of the payload.

Theorem

If reliable termination is available, then there exist codes with variable length and feedback achieving

R*_t(n, 0) ≥ C + O(1/n).

Codes employing feedback + termination (BSC version)

Theorem

Consider a BSC(δ) with feedback and reliable termination. There exists a code sending k bits with zero error and average length

ℓ ≤ Σ_{n=0}^{∞} Σ_{t=0}^{n} (n choose t) δ^t (1−δ)^{n−t} · min{ 1, 2^{k−n} Σ_{j=0}^{t} (n choose j) }.
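A numerical sketch of this bound (Python, SciPy assumed; names ours). The outer sum over n is truncated once its terms (roughly, the probability that decoding is not yet possible at time n) have become negligible.

```python
import numpy as np
from scipy.stats import binom
from scipy.special import gammaln

def vlft_avg_length_bsc(k, delta, n_max=500):
    """Upper bound on the average blocklength to send k bits with zero error (feedback + termination)."""
    total = 0.0
    for n in range(n_max + 1):
        t = np.arange(n + 1)
        log_binom = gammaln(n + 1) - gammaln(t + 1) - gammaln(n - t + 1)
        log_ball = np.logaddexp.accumulate(log_binom)                 # ln of sum_{j<=t} C(n, j)
        log_inner = np.minimum(0.0, log_ball + (k - n) * np.log(2))   # ln min{1, 2^{k-n} * ball}
        total += float(np.exp(np.logaddexp.reduce(binom.logpmf(t, n, delta) + log_inner)))
    return total

print(vlft_avg_length_bsc(12, 0.11))   # average blocklength needed for 12 bits over BSC(0.11)
```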

Feedback + termination for the BSC(0.11)

[Figure: rate (bits/ch. use) vs. average blocklength up to 1000 for the BSC(0.11), comparing no feedback, stop feedback (VLF), and feedback + termination (VLFT); the VLFT curve sits essentially at capacity already at very short average blocklengths.]

Benefit of feedback

Delay to achieve 90% of the capacity of the BSC(0.11):

- No feedback:  n ≈ 3100   (penalty term: O(1/√n))
- Stop feedback + variable length:  n ≲ 200   (penalty term: O((log n)/n))
- Feedback + variable length + termination:  n ≲ 20   (penalty term: O(1/n))

Gaussian channel: energy per bit

Channel:  Yi = Xi + Zi,   Zi ∼ N(0, N0/2).

Problem: minimal energy per bit Eb vs. payload size k:

E[ Σ_{i=1}^{n} |Xi|² ] ≤ k Eb.

Asymptotically [Shannon'49]:

min(Eb/N0) → log 2 = −1.6 dB,   as k → ∞.

Energy per bit vs. number of information bits (ε = 10⁻³)

[Figure: Eb/N0 (dB) vs. the number of information bits k from 1 to 10⁶, comparing the non-feedback achievability and converse bounds, the stop-feedback curve, and the full-feedback (optimal) curve.]

Summary

Classical (n → ∞):   noisy channel + optimal coding  ⟹  C.

Finite blocklength (n finite):   noisy channel + optimal coding  ⟹  N(C, V/n).

What we had to skip

- Hypothesis-testing methods in quantum IT: [Wang-Colbeck-Renner'09], [Matthews-Wehner'12], [Tomamichel'12], [Kumagai-Hayashi'13]
- Channels with state: [Ingber-Feder'10], [Tomamichel-Tan'13], [Yang-Durisi-Koch-P.'12]
- FBL theory of lattice codes: [Ingber-Zamir-Feder'12]
- Feedback codes: [Naghshvar-Javidi'12], [Williamson-Chen-Wesel'12]
- Random coding bounds and approximations: [Martinez-Guillen i Fabregas'11], [Kosut-Tan'12]
- Other FBL questions: [Riedl-Coleman-Singer'11], [Varshney-Mitter-Goyal'12], [Asoodeh-Lapidoth-Wang'12], [P.-Wu'13]

... and many more (apologies!) ... (cf. References)

New results at ISIT 2013: two terminals

- Universal lossless compression: Kosut-Sankar [MoD5]
- Random number generation: Kumagai-Hayashi [WeA3]
- Quasi-static SIMO: Yang-Durisi-Koch-P. [WeA4]
- O(log n) = (1/2) log n: Tomamichel-Tan, Moulin [WeA4]
- Meta-converse is tight: Vazquez-Vilar et al. [WeB4]
- Meta-converse for unequal error protection: Shkel-Tan-Draper [WeB4]
- RCU* bound: Haim-Kochman-Erez [WeB4]
- Cost constraints: Kostina-Verdu [WeB4]
- Lossless compression: Kontoyiannis-Verdu [WeB5]
- Feedback: Chen-Williamson-Wesel [ThD6]

New results at ISIT 2013: multi-terminal

- Achievability bounds: Yassaee-Aref-Gohari [TuD1]
- Random binning: Yassaee-Aref-Gohari [ThA1]
- Interference channel: Le-Tan-Motani [ThA1]
- Gaussian line network: Subramanian-Vellambi-Land [ThA1]
- Slepian-Wolf for mixed sources: Nomura-Han [ThA7]
- One-help-one and Wyner-Ziv: Watanabe-Kuzuoka-Tan [FrC5]

References: Lossless compression

A. A. Yushkevich, "On limit theorems connected with the concept of entropy of Markov chains," Uspekhi Matematicheskikh Nauk, 8:5(57), pp. 177-180, 1953.

V. Strassen, "Asymptotische Abschatzungen in Shannon's Informationstheorie," Trans. Third Prague Conf. Information Theory, 1962, Czechoslovak Academy of Sciences, Prague, pp. 689-723. Translation: http://www.math.cornell.edu/~pmlut/strassen.pdf

I. Kontoyiannis, "Second-order noiseless source coding theorems," IEEE Trans. Inform. Theory, vol. 43, no. 3, pp. 1339-1341, July 1997.

M. Hayashi, "Second-order asymptotics in fixed-length source coding and intrinsic randomness," IEEE Trans. Inform. Theory, vol. 54, no. 10, pp. 4619-4637, October 2008.

W. Szpankowski and S. Verdu, "Minimum expected length of fixed-to-variable lossless compression without prefix constraints," IEEE Trans. Inform. Theory, vol. 57, no. 7, pp. 4017-4025, July 2011.

S. Verdu and I. Kontoyiannis, "Lossless Data Compression Rate: Asymptotics and Non-Asymptotics," 46th Annual Conference on Information Sciences and Systems, Princeton, New Jersey, March 21-23, 2012.

R. Nomura and T. S. Han, "Second-Order Resolvability, Intrinsic Randomness, and Fixed-Length Source Coding for Mixed Sources: Information Spectrum Approach," IEEE Trans. Inform. Theory, vol. 59, no. 1, pp. 1-16, Jan. 2013.

I. Kontoyiannis and S. Verdu, "Optimal lossless compression: Source varentropy and dispersion," Proc. 2013 IEEE Int. Symposium on Information Theory, Istanbul, Turkey, July 7-12, 2013.

Introduction Bounds: Converse Bounds: Achievability Channel dispersion Applications & Extensions Conclusion

References: multi-terminal compression

S. Sarvotham, D. Baron, and R. G. Baraniuk, “Non-asymptotic performance of symmetric Slepian-Wolf coding,” in Conference on Information Sciences and Systems, 2005.

D.-K. He, L. A. Lastras-Montano, E.-H. Yang, A. Jagmohan, and J. Chen, “On the redundancy of Slepian-Wolf coding,” IEEE Trans. Inform. Theory, vol. 55, no. 12, pp. 5607-5627, Dec. 2009.

V. Y. F. Tan and O. Kosut, “The dispersion of Slepian-Wolf coding,” in Proc. 2012 IEEE Int. Symp. Inform. Theory (ISIT), Jul. 2012, pp. 915-919.

S. Kuzuoka, “On the redundancy of variable-rate Slepian-Wolf coding,” in Proc. 2012 Int. Symp. Information Theory and its Applications (ISITA), Oct. 2012, pp. 155-159.

S. Verdu, “Non-asymptotic achievability bounds in multiuser information theory,” in Allerton Conference, 2012.

Introduction Bounds: Converse Bounds: Achievability Channel dispersion Applications & Extensions Conclusion

References: Channel coding (1948-2008)

A. Feinstein, “A new basic theorem of information theory,” IRE Trans. Inform. Theory, vol. 4, no. 4, pp. 2-22, 1954.

P. Elias, “Coding for two noisy channels,” Proc. Third London Symposium on Information Theory, pp. 61-76, Butterworths, Washington, DC, Sept. 1955.

C. E. Shannon, “Certain results in coding theory for noisy channels,” Inform. Control, vol. 1, pp. 6-25, 1957.

J. Wolfowitz, “The coding of messages subject to chance errors,” Illinois J. Math., vol. 1, pp. 591-606, 1957.

C. E. Shannon, “Probability of error for optimal codes in a Gaussian channel,” Bell System Tech. Journal, vol. 38, pp. 611-656, 1959.

R. L. Dobrushin, “Mathematical problems in the Shannon theory of optimal coding of information,” Proc. 4th Berkeley Symp. Mathematics, Statistics, and Probability, 1961, vol. 1, pp. 211-252.

V. Strassen, “Asymptotische Abschatzungen in Shannon’s Informationstheorie,” Trans. Third Prague Conf. Information Theory, 1962, Czechoslovak Academy of Sciences, Prague, pp. 689-723. Translation: http://www.math.cornell.edu/~pmlut/strassen.pdf

R. G. Gallager, “A simple derivation of the coding theorem and some applications,” IEEE Trans. Inform. Theory, vol. 11, no. 1, pp. 3-18, 1965.

C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels I,” Inform. Contr., vol. 10, pp. 65-103, 1967.

J. Wolfowitz, “Notes on a general strong converse,” Inform. Contr., vol. 12, pp. 1-4, 1968.

H. V. Poor and S. Verdu, “A lower bound on the error probability in multihypothesis testing,” IEEE Trans. Inform. Theory, vol. 41, no. 6, pp. 1992-1993, 1995.

A. Valembois and M. P. C. Fossorier, “Sphere-packing bounds revisited for moderate block lengths,” IEEE Trans. Inform. Theory, vol. 50, no. 12, pp. 2998-3014, 2004.

G. Wiechman and I. Sason, “An improved sphere-packing bound for finite-length codes over symmetric memoryless channels,” IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1962-1990, 2008.

Introduction Bounds: Converse Bounds: Achievability Channel dispersion Applications & Extensions Conclusion

References: Channel coding (2008-)

Y. Polyanskiy, H. V. Poor and S. Verdu, “New channel coding achievability bounds,” Proc. 2008 IEEE Int. Symp. Information Theory (ISIT), Toronto, Canada, 2008.

L. Wang, R. Colbeck and R. Renner, “Simple channel coding bounds,” Proc. 2009 IEEE Int. Symp. Information Theory (ISIT), Seoul, Korea, July 2009.

M. Hayashi, “Information spectrum approach to second-order coding rate in channel coding,” IEEE Trans. Inform. Theory, vol. 55, no. 11, pp. 4947-4966, Nov. 2009.

Y. Polyanskiy, H. V. Poor and S. Verdu, “Dispersion of Gaussian channels,” 2009 IEEE Int. Symp. Inf. Theory (ISIT), Seoul, Korea, Jul. 2009.

L. Varshney, S. K. Mitter and V. Goyal, “Channels that die,” Communication, Control, and Computing (Allerton), 2009.

Y. Polyanskiy, H. V. Poor and S. Verdu, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307-2359, May 2010.

Y. Polyanskiy, H. V. Poor and S. Verdu, “Dispersion of the Gilbert-Elliott channel,” IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 1829-1848, Apr. 2011.

Y. Polyanskiy and S. Verdu, “Channel dispersion and moderate deviations limits for memoryless channels,” 48th Allerton Conference 2010, Allerton Retreat Center, Monticello, IL, USA, Sep. 2010.

Y. Polyanskiy and S. Verdu, “Arimoto channel coding converse and Renyi divergence,” 48th Allerton Conference 2010, Allerton Retreat Center, Monticello, IL, USA, Sep. 2010.

A. Ingber and M. Feder, “Finite blocklength coding for channels with side information at the receiver,” 2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel (IEEEI), 2010.

S. Asoodeh, A. Lapidoth and L. Wang, “It takes half the energy of a photon to send one bit reliably on the Poisson channel with feedback,” arXiv:1010.5382, 2010.

T. J. Riedl, T. Coleman and A. Singer, “Finite block-length achievable rates for queuing timing channels,” Information Theory Workshop (ITW), 2011.

A. Martinez and A. Guillen i Fabregas, “Random-coding bounds for threshold decoders: Error exponent and saddlepoint approximation,” in Proc. Int. Symp. Inf. Theory, Aug. 2011.

Introduction Bounds: Converse Bounds: Achievability Channel dispersion Applications & Extensions Conclusion

References: Channel coding (2008-)

Y. Polyanskiy, H. V. Poor and S. Verdu, “Minimum energy to send k bits through the Gaussian channel with and without feedback,” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 4880-4902, Aug. 2011.

P.-N. Chen, H.-Y. Lin, and S. Moser, “Ultra-small block-codes for binary discrete memoryless channels,” in Proc. IEEE Inf. Theory Workshop, Paraty, Brazil, Oct. 2011.

Y. Polyanskiy and S. Verdu, “Scalar coherent fading channel: dispersion analysis,” 2011 IEEE Int. Symp. Inf. Theory (ISIT), St. Petersburg, Russia, Aug. 2011.

Y. Polyanskiy, H. V. Poor and S. Verdu, “Feedback in the non-asymptotic regime,” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 4903-4925, Aug. 2011.

W. Yang, G. Durisi, T. Koch, Y. Polyanskiy, “Diversity versus channel knowledge at finite block-length,” 2012 Inf. Theory Workshop (ITW), Lausanne, Switzerland, Sep. 2012.

Y. Polyanskiy and S. Verdu, “Empirical distribution of good channel codes with non-vanishing error probability,” IEEE Trans. Inf. Theory, submitted Dec. 2012. http://people.lids.mit.edu/yp/homepage/data/optcodes_journal.pdf

Y. Altug and A. Wagner, “Moderate deviations in channel coding,” arXiv:1208.1924, 2012.

J. Hoydis, R. Couillet, P. Piantanida and M. Debbah, “A random matrix approach to the finite blocklength regime of MIMO fading channels,” Proc. 2012 IEEE ISIT, 2012.

P. Moulin, “The log-volume of optimal constant-composition codes for memoryless channels, within O(1) bits,” Proc. 2012 IEEE Int. Symp. Information Theory (ISIT), 2012.

J. Scarlett and A. Martinez, “Mismatched Decoding: Finite-Length Bounds, Error Exponents and Approximations,” arXiv:1303.6166, 2013.

W. Matthews and S. Wehner, “Finite blocklength converse bounds for quantum channels,” arXiv:1210.4722, 2012.

M. Tomamichel, “A framework for non-asymptotic quantum information theory,” arXiv:1203.2142, 2012.

Introduction Bounds: Converse Bounds: Achievability Channel dispersion Applications & Extensions Conclusion

References: Channel coding (2008-)

A. Ingber, R. Zamir, M. Feder, “Finite-Dimensional Infinite Constellations,” IEEE Trans. Inform. Theory, vol. 59, no. 3, pp. 1630-1656, March 2013.

M. Tomamichel and V. Tan, “ε-Capacities and Second-Order Coding Rates for Channels with General State,” arXiv:1305.6789, 2013.

M. Naghshvar and T. Javidi, “Active sequential hypothesis testing,” arXiv:1203.4626, 2012.

A. Williamson, T.-Y. Chen and R. D. Wesel, “A rate-compatible sphere-packing analysis of feedback coding with limited retransmissions,” Proc. 2012 IEEE Int. Symp. Information Theory (ISIT), 2012.

Y. Polyanskiy, “Saddle point in the minimax converse for channel coding,” IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 2576-2595, May 2013.

Y. Polyanskiy and Y. Wu, “Peak-to-average power ratio of good codes for Gaussian channel,” arXiv:1302.0084, Jan. 2013.

Introduction Bounds: Converse Bounds: Achievability Channel dispersion Applications & Extensions Conclusion

References: Channel coding (multi-terminal)

T. S. Han, Information-Spectrum Methods in Information Theory. Springer Berlin Heidelberg, Feb 2003.

Y. Huang and P. Moulin, “Finite blocklength coding for multiple access channels,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Jul. 2012, pp. 831-835.

E. MolavianJazi and J. N. Laneman, “Simpler achievable rate regions for multiaccess with finite blocklength,” Proc. 2012 IEEE Int. Symp. Information Theory, July 2012.

V. Tan and O. Kosut, “On the dispersions of three network information theory problems,” Information Sciences and Systems (CISS), 2012 46th Annual Conference on, 2012.

P. Moulin, “A new metaconverse and outer region for finite-blocklength MACs,” Information Theory and Applications Workshop (ITA), 2013.

S. Verdu, “Non-asymptotic achievability bounds in multiuser information theory,” in Allerton Conference, 2012.

V. Tan, “On Dispersions of Discrete Memoryless Channels with Noncausal State Information at the Encoder,” arXiv:1204.0431, 2012.

V. Tan, “The Capacity of the General Gelfand-Pinsker Channel and Achievable Second-Order Coding Rates,” arXiv:1210.1091, 2012.

M. H. Yassaee, M. R. Aref, and A. Gohari, “A Technique for Deriving One-Shot Achievability Results in Network Information Theory,” arXiv:1303.0696, 2013.

Introduction Bounds: Converse Bounds: Achievability Channel dispersion Applications & Extensions Conclusion

References: Lossy compression, joint source-channel coding

I. Kontoyiannis, “Pointwise redundancy in lossy data compression and universal lossy data compression,” IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 136-152, Jan. 2000.

A. Ingber and Y. Kochman, “The dispersion of lossy source coding,” in Data Compression Conference (DCC), Snowbird, UT, Mar. 2011, pp. 53-62.

D. Wang, A. Ingber, and Y. Kochman, “The dispersion of joint source-channel coding,” in Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on, pp. 180-187.

A. Ingber, D. Wang and Y. Kochman, “Dispersion theorems via second order analysis of functions of distributions,” Information Sciences and Systems (CISS), 2012 46th Annual Conference on, 2012.

V. Kostina and S. Verdu, “Fixed-length lossy compression in the finite blocklength regime,” IEEE Trans. Inform. Theory, vol. 58, no. 6, Jun. 2012.

A. Tauste Campo, G. Vazquez-Vilar, A. Guillen i Fabregas and A. Martinez, “Converse bounds for finite-length joint source-channel coding,” in Proc. 50th Allerton Conf. on Comm., Cont. and Comp., Monticello, IL, USA, Oct. 2012.

D. Wang, A. Ingber and Y. Kochman, “A strong converse for joint source-channel coding,” Proc. 2012 IEEE Int. Symp. Information Theory (ISIT), 2012.

V. Kostina and S. Verdu, “To code or not to code: Revisited,” Information Theory Workshop (ITW), 2012 IEEE, 2012.

V. Kostina and S. Verdu, “Lossy Joint Source-Channel Coding in the Finite Blocklength Regime,” IEEE Trans. Inform. Theory, vol. 59, no. 5, pp. 2545-2575, May 2013.

N. Datta, J. M. Renes, R. Renner and M. M. Wilde, “One-shot lossy quantum data compression,” arXiv:1304.2336, 2013.

Introduction Bounds: Converse Bounds: Achievability Channel dispersion Applications & Extensions Conclusion

Thank you!

Do not hesitate to ask questions!

Yury Polyanskiy <yp@mit.edu>

Sergio Verdu <verdu@princeton.edu>
