§1 Entropy and mutual information
§1.1 Discrete random variables
§1.1.1 Discrete memoryless source and entropy
§1.1.2 Discrete memoryless channel and mutual information
§1.2 Discrete random vectors

§1.1.1 Discrete memoryless source and entropy
Example 1.1.1
Let X represent the outcome of a single roll of a fair die.
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{pmatrix}
\]

1. DMS (Discrete memoryless source)
Probability Space:
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 & \cdots & a_q \\ P(a_1) & P(a_2) & \cdots & P(a_q) \end{pmatrix},
\qquad \sum_{i=1}^{q} P(a_i) = 1
\]
2. self information
Example 1.1.2
§1.1.1 Discrete memoryless source and entropy
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 \\ 0.5 & 0.5 \end{pmatrix}
\ (\text{red, white}),
\qquad
\begin{pmatrix} Y \\ P(y) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix}
\ (\text{red, white, blue, black})
\]
Analyse the uncertainty of drawing a red ball from X and from Y.
2. self information
The self information I(ai) = f[p(ai)] should satisfy:
1) I(ai) is a monotone decreasing function of p(ai): if p(a1) > p(a2), then I(a1) < I(a2);
2) if p(ai) = 1, then I(ai) = 0;
3) if p(ai) = 0, then I(ai) → ∞;
4) if p(ai aj) = p(ai) p(aj), then I(ai aj) = I(ai) + I(aj).
§1.1.1 Discrete memoryless source and entropy
self information:
\[
I(a_i) = \log_r \frac{1}{p(a_i)} = -\log_r p(a_i)
\]
[Figure: I(a_i) is a decreasing function of p(a_i) on (0, 1].]
\[
p(a_1) > p(a_2) > 0 \Rightarrow I(a_1) < I(a_2); \qquad
p(a_i) \to 0 \Rightarrow I(a_i) \to \infty; \qquad
p(a_i) = 1 \Rightarrow I(a_i) = 0
\]
\[
I(ab) = I(a) + I(b) \quad \text{if } a \text{ and } b \text{ are statistically independent}
\]
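As a quick numeric check of the definition (a minimal sketch, assuming base-2 logarithms so the unit is the bit), the self information of drawing the red ball in Example 1.1.2 is 1 bit from X and 2 bits from Y:

```python
from math import log2

def self_information(p):
    """Self information I(a) = -log2 p(a), in bits."""
    return -log2(p)

# Example 1.1.2: probability of drawing a red ball
print(self_information(0.5))   # from X: 1.0 bit
print(self_information(0.25))  # from Y: 2.0 bits
```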
§1.1.1 Discrete memoryless source and entropy
Remark:
I(ai) is the measure of uncertainty of the outcome ai, and equally the measure of information that observing ai provides.
Units: bit (r = 2), nat (r = e), hartley (r = 10).
3. Entropy
Definition: Suppose X is a discrete random variable whose range R = {a1, a2, …} is finite or countable. Let p(ai) = P{X = ai}. The entropy of X is defined by
\[
H(X) = E[I(a_i)] = -\sum_{i=1}^{q} p(a_i) \log p(a_i)
\]
H(X) is a measure of the average amount of information provided by X, and of the uncertainty (or randomness) about X.
§1.1.1 Discrete memoryless source and entropy
Entropy: the average amount of "information" provided by an observation of X.
Example 1.1.3 100 balls in a bag, 80% is red, and remain is white. Now , we fetch out a ball. How about the information of every fetching?
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 \\ 0.8 & 0.2 \end{pmatrix}
\]
In N fetches there are about n_1 = N p(a_1) red results and n_2 = N p(a_2) white results, so the average information per fetch is
\[
\frac{I}{N} = \frac{n_1 I(a_1) + n_2 I(a_2)}{N}
= p(a_1) I(a_1) + p(a_2) I(a_2)
= -\sum_{i=1}^{2} p(a_i) \log p(a_i)
= 0.722 \ \text{bit/sig}
\]
§1.1.1 Discrete memoryless source and entropy
\[
H(X) = H(0.8, 0.2) = 0.722 \ \text{bit/sig}
\]
Entropy: the "uncertainty" or "randomness" about X.
Example 1.1.4
\[
\begin{pmatrix} X_1 \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 \\ 0.5 & 0.5 \end{pmatrix}, \qquad
\begin{pmatrix} X_2 \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 \\ 0.7 & 0.3 \end{pmatrix}, \qquad
\begin{pmatrix} X_3 \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 \\ 0.99 & 0.01 \end{pmatrix}
\]
\[
H(X_1) = 1 \ \text{bit/sig}, \qquad
H(X_2) = 0.88 \ \text{bit/sig}, \qquad
H(X_3) = 0.08 \ \text{bit/sig}
\]
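A minimal sketch that recomputes the entropies quoted in Examples 1.1.3 and 1.1.4 (base-2 logarithm, so the unit is bit/sig; the function name is our own):

```python
from math import log2

def entropy(probs):
    """H(X) = -sum p log2 p, skipping zero-probability terms (bit/sig)."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(round(entropy([0.8, 0.2]), 3))    # Example 1.1.3: 0.722
print(round(entropy([0.5, 0.5]), 3))    # Example 1.1.4: 1.0
print(round(entropy([0.7, 0.3]), 3))    # 0.881
print(round(entropy([0.99, 0.01]), 3))  # 0.081
```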
§1.1.1 Discrete memoryless source and entropy
3. Entropy
\[
H(X) = E[I(a_i)] = -\sum_{i=1}^{q} p(a_i) \log p(a_i)
\]
Note:
1) units: bit/sig, nat/sig, hart/sig
2) if p(ai) = 0, the term p(ai) log p(ai)^{-1} is taken to be 0
3) if R is infinite, H(X) may be +∞
§1.1.1 Discrete memoryless source and entropy
Example 1.1.5 entropy of BS
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} 0 & 1 \\ p & q \end{pmatrix}, \qquad q = 1 - p
\]
\[
H(X) = -p \log p - (1-p) \log (1-p)
\]
3. Entropy
\[
H(p_1, p_2, \ldots, p_q) = \sum_{i=1}^{q} p_i \log \frac{1}{p_i}
\]
H(\mathbf{p}) is called the entropy function; \mathbf{p} = (p_1, p_2, \ldots, p_q) is the probability vector.
§1.1.1 Discrete memoryless source and entropy
4. The properties of entropy
Theorem 1.1 Let X assume values in R = {x1, x2, …, xr}.
1) H(X) ≥ 0
2) H(X) = 0 iff pi = 1 for some i
3) H(X) ≤ log r, with equality iff pi = 1/r for all i
(This bound is the basis of data compression.)
§1.1.1 Discrete memoryless source and entropy
Proof:
(Theorem 1.1 in textbook)
Lemma: for x > 0, log x ≤ (x − 1) log e, i.e. ln x ≤ x − 1, with equality iff x = 1.
4) Symmetry: H(p_{i_1}, p_{i_2}, \ldots, p_{i_r}) = H(p_1, p_2, \ldots, p_r) for any permutation (i_1, i_2, \ldots, i_r) of (1, 2, \ldots, r).

Example 1.1.6  Let X, Y, Z be discrete random variables whose probability vectors are permutations of one another:
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 & a_3 \\ 1/3 & 1/6 & 1/2 \end{pmatrix}, \qquad
\begin{pmatrix} Y \\ P(y) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 & a_3 \\ 1/6 & 1/2 & 1/3 \end{pmatrix}, \qquad
\begin{pmatrix} Z \\ P(z) \end{pmatrix} =
\begin{pmatrix} b_1 & b_2 & b_3 \\ 1/3 & 1/2 & 1/6 \end{pmatrix}
\]
All three have the same entropy: H = H(1/2, 1/3, 1/6) = 1.46 bit/sig.
§1.1.1 Discrete memoryless source and entropy
5) If X,Y are independent , then H(XY) = H(X) + H(Y)
4. The properties of entropy
Proof:
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 & \cdots & a_q \\ p(a_1) & p(a_2) & \cdots & p(a_q) \end{pmatrix},
\quad \sum_{i} p(a_i) = 1;
\qquad
\begin{pmatrix} Y \\ P(y) \end{pmatrix} =
\begin{pmatrix} b_1 & b_2 & \cdots & b_r \\ p(b_1) & p(b_2) & \cdots & p(b_r) \end{pmatrix},
\quad \sum_{j} p(b_j) = 1
\]
\[
H(X) = \sum_{i=1}^{q} p(a_i) \log \frac{1}{p(a_i)}, \qquad
H(Y) = \sum_{j=1}^{r} p(b_j) \log \frac{1}{p(b_j)}
\]
§1.1.1 Discrete memoryless source and entropy
Proof:
Joint source:
\[
\begin{pmatrix} XY \\ P(xy) \end{pmatrix} =
\begin{pmatrix} a_1 b_1 & \cdots & a_i b_j & \cdots & a_q b_r \\ p(a_1 b_1) & \cdots & p(a_i b_j) & \cdots & p(a_q b_r) \end{pmatrix},
\qquad P_k = p(a_i b_j) = p(a_i)\, p(b_j)
\]
\[
\sum_{k=1}^{qr} P_k = \sum_{i=1}^{q} \sum_{j=1}^{r} p(a_i)\, p(b_j) = 1
\]
\[
H(XY) = \sum_{k=1}^{qr} P_k \log \frac{1}{P_k}
= \sum_{i=1}^{q} \sum_{j=1}^{r} p(a_i b_j) \log \frac{1}{p(a_i)\, p(b_j)}
\]
\[
= \sum_{i}\sum_{j} p(a_i)\, p(b_j) \log \frac{1}{p(a_i)}
+ \sum_{i}\sum_{j} p(a_i)\, p(b_j) \log \frac{1}{p(b_j)}
= H(X) + H(Y)
\]
§1.1.1 Discrete memoryless source and entropy
Theorem 1.2 The entropy function H(p1, p2, …, pr) is a concave (∩-convex) function of the probability vector (p1, p2, …, pr).
6) Convex properties
4. The properties of entropy
§1.1.1 Discrete memoryless source and entropy
Example 1.1.5 (continued): entropy of the BS
\[
H(X) = H(p) = -p \log p - (1-p) \log (1-p)
\]
[Figure: H(p) versus p; H(p) rises from 0 at p = 0 to its maximum of 1 bit at p = 1/2 and falls back to 0 at p = 1.]
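A minimal numeric sketch of the binary entropy function (base-2 logarithms assumed), tabulating H(p) and checking the concavity stated in Theorem 1.2 at one illustrative pair of points:

```python
from math import log2

def h(p):
    """Binary entropy H(p) = -p log2 p - (1-p) log2 (1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for i in range(11):
    p = i / 10
    print(f"p = {p:.1f}  H(p) = {h(p):.3f}")   # maximum of 1 bit at p = 0.5

# concavity: H lies above the chord between any two points
p1, p2 = 0.1, 0.6
assert h((p1 + p2) / 2) >= (h(p1) + h(p2)) / 2
```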
5. conditional entropy
X, Y are a pair of random variables, if (X,Y)~p(x,y)
Then the conditional entropy of X , given Y is defined by
Definition:
\[
H(X|Y) = E\!\left[\log \frac{1}{p(x|y)}\right]
= \sum_{X,Y} p(x,y) \log \frac{1}{p(x|y)}
\]
§1.1.1 Discrete memoryless source and entropy
Analysis:
\[
H(X | Y = y) = \sum_{X} p(x|y) \log \frac{1}{p(x|y)}
\]
\[
H(X|Y) = \sum_{Y} p(y)\, H(X | Y = y)
= \sum_{Y} \sum_{X} p(y)\, p(x|y) \log \frac{1}{p(x|y)}
= \sum_{X,Y} p(x,y) \log \frac{1}{p(x|y)}
\]
§1.1.1 Discrete memoryless source and entropy
Example 1.1.7
A binary source X with pX(0) = 2/3, pX(1) = 1/3 is observed through a channel with output alphabet Y = {0, ?, 1} and transition probabilities
p(0|0) = 3/4, p(?|0) = 1/4, p(1|1) = 1/2, p(?|1) = 1/2.
Find H(X), H(X|Y=0), H(X|Y=1), H(X|Y=?) and H(X|Y).

5. conditional entropy
H(X) = H(2/3, 1/3) = 0.9183 bit/sig
H(X|Y=0) = 0
H(X|Y=1) = 0
H(X|Y=?) = H(1/2, 1/2) = 1 bit/sig
H(X|Y) = 1/3 bit/sig
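The same numbers can be checked with a short sketch (base-2 logarithms assumed; the dictionaries are our own encoding of the channel of Example 1.1.7):

```python
from math import log2

# Example 1.1.7: p(x, y) = p(x) * p(y|x)
p_x = {0: 2/3, 1: 1/3}
p_y_given_x = {0: {'0': 3/4, '?': 1/4}, 1: {'1': 1/2, '?': 1/2}}

p_xy = {(x, y): p_x[x] * p for x, row in p_y_given_x.items() for y, p in row.items()}
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

# H(X|Y) = sum_{x,y} p(x,y) log2 [ p(y) / p(x,y) ]
h_x_given_y = sum(p * log2(p_y[y] / p) for (x, y), p in p_xy.items() if p > 0)
h_x = -sum(p * log2(p) for p in p_x.values())

print(round(h_x, 4))          # 0.9183 bit/sig
print(round(h_x_given_y, 4))  # 0.3333 bit/sig
```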
§1.1.1 Discrete memoryless source and entropy
5. conditional entropy
§1.1.1 Discrete memoryless source and entropy
Theorem 1.3 (conditioning reduces entropy)
\[
H(X|Y) \le H(X)
\]
with equality iff X and Y are independent.
Proof:
Review
Keywords: Measure of information
self information
entropy
properties of entropy
conditional entropy
Homework
1. P44: T1.1
2. P44: T1.4
3. P44: T1.6
4. Let X be a random variable taking on a finite number of values. What is the relationship between H(X) and H(Y) if (1) Y = 2X? (2) Y = cos X?
Homework
5. Let X be an ensemble of M points a_1, …, a_M, and let P_X(a_M) = λ. Prove that
\[
H(X) = \lambda \log \frac{1}{\lambda} + (1-\lambda) \log \frac{1}{1-\lambda} + (1-\lambda) H(Y)
\]
where Y is an ensemble of M − 1 points a_1, …, a_{M-1} with probabilities P_Y(a_j) = P_X(a_j)/(1−λ), 1 ≤ j ≤ M − 1. Prove also that
\[
H(X) \le \lambda \log \frac{1}{\lambda} + (1-\lambda) \log \frac{1}{1-\lambda} + (1-\lambda) \log (M-1)
\]
and determine the condition for equality.
Homework
6. Given a chessboard with 8×8 = 64 squares, a chessman is put randomly in a square and we guess its location. Find the uncertainty of the result.
If we mark every square by its row and column number and already know the row number of the chessman, what is the remaining uncertainty?
Coin flip. A fair coin is flipped until the first head occurs. Let X denote the number of flips required. Find the entropy H(X) in bits.
Thinking:
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} 1 & 2 & 3 & \cdots & n & \cdots \\ \tfrac{1}{2} & \tfrac{1}{2^2} & \tfrac{1}{2^3} & \cdots & \tfrac{1}{2^n} & \cdots \end{pmatrix},
\qquad
\sum_{n=1}^{\infty} n r^{n-1} = \frac{1}{(1-r)^2} \quad (0 < r < 1)
\]
§1 Entropy and mutual information
§1.1 Discrete random variables
§1.1.1 Discrete memoryless source and entropy
§1.1.2 Discrete memoryless channel and mutual information
§1.2 Discrete random vectors
Channel model:
input x ∈ X, X = {a_1, a_2, …, a_r}  →  [channel p(y|x)]  →  output y ∈ Y, Y = {b_1, b_2, …, b_s}
\[
p(y|x) = p(b_j | a_i), \qquad \sum_{j=1}^{s} p(b_j | a_i) = 1
\]
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
The model of DMC
[Figure: DMC with r input symbols 0, 1, …, r−1 and s output symbols 0, 1, …, s−1, connected by transition probabilities p(y|x).]
r input symbols, s output symbols
representation of DMC: x → p(y|x) → y
\[
p(y|x) \ge 0 \ \text{for all } x, y; \qquad \sum_{y} p(y|x) = 1 \ \text{for all } x
\]
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
transition probabilities
graph
representation of DMC
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
matrix
\[
P =
\begin{pmatrix}
p(b_1|a_1) & p(b_2|a_1) & \cdots & p(b_s|a_1) \\
p(b_1|a_2) & p(b_2|a_2) & \cdots & p(b_s|a_2) \\
\vdots & \vdots & & \vdots \\
p(b_1|a_r) & p(b_2|a_r) & \cdots & p(b_s|a_r)
\end{pmatrix}
=
\begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1s} \\
p_{21} & p_{22} & \cdots & p_{2s} \\
\vdots & \vdots & & \vdots \\
p_{r1} & p_{r2} & \cdots & p_{rs}
\end{pmatrix},
\qquad \sum_{j=1}^{s} p(b_j | a_i) = 1
\]
transition probability matrix
representation of DMC
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
formula
\[
p(y|x) = p(b_j | a_i)
\]
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 & \cdots & a_r \\ p(a_1) & p(a_2) & \cdots & p(a_r) \end{pmatrix},
\quad \sum_{i=1}^{r} p(a_i) = 1;
\qquad
\begin{pmatrix} Y \\ P(y) \end{pmatrix} =
\begin{pmatrix} b_1 & b_2 & \cdots & b_s \\ p(b_1) & p(b_2) & \cdots & p(b_s) \end{pmatrix},
\quad \sum_{j=1}^{s} p(b_j) = 1
\]
\[
\sum_{j=1}^{s} p(b_j | a_i) = 1
\]
Example 1.1.8: BSC (Binary Symmetric Channel)
r = s = 2
\[
P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix},
\qquad p(0|0) = p(1|1) = 1-p, \quad p(0|1) = p(1|0) = p
\]
[Figure: BSC transition graph; 0→0 and 1→1 with probability 1−p, 0→1 and 1→0 with probability p.]
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
§1.1.2 Discrete memoryless channel and mutual information
Example 1.1.9: BEC (Binary Erasure Channel)
1. DMC (Discrete Memoryless Channel)
[Illustration: some transmitted bits are received as erasures "?".]
r = 2, s = 3
p(0|0) = p, p(?|0) = 1-p
p(1|1) = q, p(?|1) = 1-q
\[
P = \begin{pmatrix} p & 1-p & 0 \\ 0 & 1-q & q \end{pmatrix}
\qquad \text{(outputs ordered } 0, ?, 1\text{)}
\]
[Figure: BEC transition graph; 0→0 with probability p, 0→? with probability 1−p, 1→1 with probability q, 1→? with probability 1−q.]
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
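A minimal sketch of the matrix representation (variable names are our own; the BEC of Example 1.1.9 is used with illustrative values p = 0.9 and q = 0.8 and a uniform input): every row of P must sum to 1, and the output distribution follows from P(y) = Σ_x P(x) p(y|x).

```python
# BEC of Example 1.1.9 with illustrative values p = 0.9, q = 0.8.
# Rows: inputs 0, 1.  Columns: outputs ordered (0, ?, 1).
p, q = 0.9, 0.8
P = [
    [p, 1 - p, 0.0],   # transitions from input 0
    [0.0, 1 - q, q],   # transitions from input 1
]

# every row of a transition probability matrix sums to 1
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)

# output distribution P(y) = sum_x P(x) p(y|x), here for a uniform input
p_x = [0.5, 0.5]
p_y = [sum(p_x[i] * P[i][j] for i in range(2)) for j in range(3)]
print(p_y)  # [0.45, 0.15, 0.4]
```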
2. average mutual information
definition
I(X;Y) = H(X) – H(X|Y)
§1.1.2 Discrete memoryless channel and mutual information
[Channel: input x ∈ X, X = {a_1, a_2, …, a_r}; output y ∈ Y, Y = {b_1, b_2, …, b_s}; transition probabilities p(y|x) = p(b_j | a_i), with Σ_{j=1}^{s} p(b_j | a_i) = 1.]
H(X): entropy.  H(X|Y): equivocation.  I(X;Y): average mutual information.
The reduction in uncertainty about X conveyed by the observations Y;
The information about X from Y.
2. average mutual information
definition
\[
I(X;Y) = H(X) - H(X|Y)
= \sum_{X} p(x) \log \frac{1}{p(x)} - \sum_{XY} p(x,y) \log \frac{1}{p(x|y)}
\]
\[
= \sum_{XY} p(x,y) \log \frac{p(x|y)}{p(x)}
= \sum_{XY} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}
= \sum_{XY} p(x,y) \log \frac{p(y|x)}{p(y)}
\]
2. average mutual information
definition
§1.1.2 Discrete memoryless channel and mutual information
I(X;Y) and I(x;y)
mutual information:
\[
I(x;y) = \log \frac{P(x|y)}{P(x)}, \qquad I(X;Y) = E_{XY}[I(x;y)]
\]
I(X;Y) and H(X)
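A minimal sketch showing that the two forms of the definition agree numerically (base-2 logarithms; the joint distribution used here is an illustrative one of our own, not from the slides):

```python
from math import log2

def marginals(p_xy):
    """Marginal distributions of X and Y from a joint distribution {(x, y): prob}."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return p_x, p_y

def mutual_information(p_xy):
    """I(X;Y) = sum_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]."""
    p_x, p_y = marginals(p_xy)
    return sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def cond_entropy_x_given_y(p_xy):
    """H(X|Y) = sum_{x,y} p(x,y) log2 [ p(y) / p(x,y) ]."""
    _, p_y = marginals(p_xy)
    return sum(p * log2(p_y[y] / p) for (x, y), p in p_xy.items() if p > 0)

# illustrative joint distribution (our own, not from the slides)
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x, _ = marginals(p_xy)

i_direct = mutual_information(p_xy)                    # sum form
i_diff = entropy(p_x) - cond_entropy_x_given_y(p_xy)   # H(X) - H(X|Y)
print(round(i_direct, 4), round(i_diff, 4))            # both give the same value
```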
properties
1) Non-negativity of average mutual information
Theorem 1.4 For any discrete random variables X and Y, I(X;Y) ≥ 0. Moreover, I(X;Y) = 0 iff X and Y are independent.
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
Proof:(Theorem 1.3 in textbook)
We do not expect to be misled, on average, by observing the output of the channel.
properties
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
A cryptosystem
[Figure: source S → encrypt (Key) → channel → decrypt → D; a listener-in observes Y′ on the channel between X and Y ("total loss" for the listener-in).]
message: arrive at four
ciphertext: duulyh dw irxu
I(X;Y) = I(Y;X)
I(X;Y) = H(Y) – H(Y|X)
I(X;Y) = H(X) – H(X|Y)
I(X;Y) = H(X) + H(Y) – H(XY)
Joint entropy
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
3) relationship between entropy and average mutual information
2) symmetry
[Mnemonic Venn diagram: two overlapping circles H(X) and H(Y); the overlap is I(X;Y), the remaining parts are H(X|Y) and H(Y|X), and the union is H(XY).]
properties
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
Recognising channels
[Figure: three channel graphs: (a) inputs a_1, …, a_r to outputs b_1, …, b_r; (b) inputs a_1, a_2 to outputs b_1, …, b_5; (c) inputs a_1, a_2, a_3 to outputs b_1, b_2.]
properties
4) Convex property
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
I(X;Y) = f[P(x), P(y|x)]
\[
I(X;Y) = \sum_{X,Y} P(xy) \log \frac{P(x|y)}{P(x)},
\qquad
P(y) = \sum_{X} P(xy) = \sum_{X} P(y|x) P(x),
\qquad
P(xy) = P(y|x) P(x)
\]
so I(X;Y) is determined entirely by the input probabilities P(x) and the transition probabilities P(y|x).
properties
4) Convex properties
Theorem 1.5 I(X;Y) is a concave (∩-convex) function of the input probabilities P(x).
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
I(X;Y) = f[P(x), P(y|x)]
Theorem 1.6 I(X;Y) is a convex (∪-convex) function of the transition probabilities P(y|x).
(Theorems 1.6 and 1.7 in textbook)
properties
Example 1.1.10 analyse the I(X;Y) of BSC
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
source (write P(X = 0) = ω):
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} 0 & 1 \\ \omega & 1-\omega \end{pmatrix}
\]
channel: BSC with crossover probability p (0→0 and 1→1 with probability 1−p; 0→1 and 1→0 with probability p).
\[
I(X;Y) = \sum_{X,Y} P(xy) \log \frac{P(x|y)}{P(x)}
= \sum_{X,Y} P(xy) \log \frac{P(y|x)}{P(y)}
= H(Y) - H(Y|X)
\]
Example 1.1.10 analyse the I(X;Y) of BSC
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
\[
H(Y|X) = \sum_{X,Y} p(xy) \log \frac{1}{p(y|x)}
= \bar{p} \log \frac{1}{\bar{p}} + p \log \frac{1}{p} = H(p), \qquad \bar{p} = 1 - p
\]
\[
H(Y) = H(\omega + p - 2\omega p)
\]
\[
I(X;Y) = H(\omega + p - 2\omega p) - H(p)
\]
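A numeric sketch checking the closed form against a direct computation from the joint distribution (ω is the symbol used above for P(X = 0); the test values ω = 0.3, p = 0.1 are illustrative):

```python
from math import log2

def h(p):
    """Binary entropy function (bits)."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_mutual_information(w, p):
    """I(X;Y) for a BSC computed directly from the joint distribution.
    w = P(X = 0), p = crossover probability."""
    p_xy = {(0, 0): w * (1 - p), (0, 1): w * p,
            (1, 0): (1 - w) * p, (1, 1): (1 - w) * (1 - p)}
    p_x = {0: w, 1: 1 - w}
    p_y = {0: p_xy[(0, 0)] + p_xy[(1, 0)], 1: p_xy[(0, 1)] + p_xy[(1, 1)]}
    return sum(q * log2(q / (p_x[x] * p_y[y])) for (x, y), q in p_xy.items() if q > 0)

w, p = 0.3, 0.1   # illustrative values
direct = bsc_mutual_information(w, p)
closed_form = h(w + p - 2 * w * p) - h(p)   # H(Y) - H(Y|X)
print(round(direct, 6), round(closed_form, 6))  # the two agree
```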
Review
Keywords: Channel and its information measures
channel model
equivocation
average mutual information
mutual information
properties of average mutual information
Thinking
Is I(X;Y) = 0 equivalent to Cov(X, Y) = 0?
Compare I(X;Y) with H(X) and H(Y).
§1.1.2 Discrete memoryless channel and mutual information
§1.1.2 Discrete memoryless channel and mutual information
Example 1.1.11
Let the source have alphabet A = {0, 1} with p0 = p1 = 0.5. Let encoder C have alphabet B = {0, 1, …, 7} and let the elements of B have binary representation b = (b_1 b_2 b_3)^T. The encoder is shown below. Find the entropy of the coded output, and find the output sequence if the input sequence is a(t) = {101001011000001100111011} and the initial contents of the registers are b(t = 0) = 5 = (101).
[Figure: the encoder is a three-stage shift register of D flip-flops b0, b1, b2 driven by a(t), shown together with the state-transition diagram between Y_t ∈ {0, …, 7} and Y_{t+1} ∈ {0, …, 7}.]

§1.1.2 Discrete memoryless channel and mutual information
a(t) = {101001011000001100111011}
b = {001242425124366675013666}
Homework
1. P45: T1.10
2. P46: T1.19 (except c)
3. Let the DMS
\[
\begin{pmatrix} X \\ p(x) \end{pmatrix} =
\begin{pmatrix} 0 & 1 \\ 0.6 & 0.4 \end{pmatrix}
\]
convey messages through the channel
\[
P = \begin{pmatrix} p(0|0) & p(1|0) \\ p(0|1) & p(1|1) \end{pmatrix}
= \begin{pmatrix} 5/6 & 1/6 \\ 3/4 & 1/4 \end{pmatrix}
\]
Calculate: (1) H(X) and H(Y); (2) the mutual information of xi and yj (i, j = 1, 2); (3) the equivocation H(X|Y) and the average mutual information.
Homework
4. Suppose that I(X;Y) = 0. Does this imply that I(X;Z) = I(X;Z|Y)?
5. In a joint ensemble XY, the mutual information I(x;y) is a random variable. In this problem we are concerned with the variance of that random variable, VAR[I(x;y)].
(1) Prove that VAR[I(x;y)] = 0 iff there is a constant α such that, for all x, y with P(xy) > 0, P(xy) = αP(x)P(y);
(2) Express I(X;Y) in terms of α and interpret the special case α = 1. (continued)
Homework
5. (3) For each of the channels in the figure below, find a probability assignment P(x) such that I(X;Y) > 0 and VAR[I(x;y)] = 0. Calculate I(X;Y).
[Figure: two channels. (a) Inputs a_1, a_2, a_3 and outputs b_1, b_2; each input goes to a single output with probability 1. (b) Inputs a_1, a_2, a_3 and outputs b_1, b_2, b_3; each input goes to two of the outputs with probability 1/2 each.]
§1 Entropy and mutual information
§1.1 Discrete random variables
§1.2 Discrete random vectors
§1.2.1 Extended source and joint entropy
§1.2.2 Extended channel and mutual information
§1.2.1 Extended source and joint entropy
1. Extended source
Source model
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 & \cdots & a_q \\ p_1 & p_2 & \cdots & p_q \end{pmatrix},
\qquad \sum_{i=1}^{q} p_i = 1
\]
N-times extended source:
\[
X^N = (X_1 X_2 \cdots X_N), \qquad X_i \in \{a_1, a_2, \ldots, a_q\}
\]
\[
\begin{pmatrix} X^N \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 a_1 \cdots a_1 & \cdots & a_q a_q \cdots a_q \\ p(a_1 a_1 \cdots a_1) & \cdots & p(a_q a_q \cdots a_q) \end{pmatrix},
\qquad \sum_{i=1}^{q^N} p_i = 1
\]
Example 1.2.1
2. Joint entropy
Definition:
The joint entropy H(XY) of a pair of discrete random variables (X, Y) with a joint distribution p(x, y) is defined as
\[
H(XY) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x,y)
\]
which can also be expressed as
\[
H(XY) = -E[\log p(x,y)]
\]
§1.2.1 Extended source and joint entropy
2. Joint entropy Extended DMS
\[
H(X^N) = -\sum_{i=1}^{q^N} p_i \log p_i
= -\sum_{i_1=1}^{q} \sum_{i_2=1}^{q} \cdots \sum_{i_N=1}^{q}
p(a_{i_1} a_{i_2} \cdots a_{i_N}) \log p(a_{i_1} a_{i_2} \cdots a_{i_N})
\]
\[
= -\sum_{i_1} \cdots \sum_{i_N} p(a_{i_1}) p(a_{i_2}) \cdots p(a_{i_N})
\big[ \log p(a_{i_1}) + \log p(a_{i_2}) + \cdots + \log p(a_{i_N}) \big]
\]
\[
= H(X) + H(X) + \cdots + H(X) = N\,H(X)
\]
§1.2.1 Extended source and joint entropy
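A small numeric check of H(X^N) = N·H(X) for a memoryless source (the three-symbol probabilities and N = 3 are illustrative choices of our own):

```python
from math import log2, prod
from itertools import product

def entropy(probs):
    """H = -sum p log2 p (bit/sig)."""
    return -sum(q * log2(q) for q in probs if q > 0)

# a memoryless source with q = 3 symbols (illustrative probabilities) and N = 3
p = [0.5, 0.3, 0.2]
N = 3

# each of the q**N sequences has probability p(a_i1) * p(a_i2) * ... * p(a_iN)
p_seq = [prod(seq) for seq in product(p, repeat=N)]

print(round(entropy(p_seq), 6))  # H(X^N)
print(round(N * entropy(p), 6))  # N * H(X) -- equal for a memoryless source
```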
2. Joint entropy for a source with memory
§1.2.1 Extended source and joint entropy
1) Conditional entropy
\[
H(X_2 | X_1) = -\sum_{i=1}^{q} \sum_{j=1}^{q} p(a_i a_j) \log p(a_j | a_i)
\]
2) Joint entropy
\[
H(X_1 X_2) = -\sum_{i=1}^{q} \sum_{j=1}^{q} p(a_i a_j) \log p(a_i a_j)
\]
3) (Per symbol) entropy
\[
H_2(X) = \frac{1}{2} H(X_1 X_2) \quad \text{bit/sig}
\]
3. Properties of joint entropy
Theorem 1.7 (Chain rule): H(XY) = H(X) + H(Y|X)
Proof:
\[
H(XY) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x,y)
= -\sum_{x}\sum_{y} p(x,y) \log \big[ p(x)\, p(y|x) \big]
\]
\[
= -\sum_{x}\sum_{y} p(x)\, p(y|x) \log p(x) - \sum_{x}\sum_{y} p(x,y) \log p(y|x)
= -\sum_{x} p(x) \log p(x) - \sum_{x}\sum_{y} p(x,y) \log p(y|x)
\]
\[
= H(X) + H(Y|X)
\]
§1.2.1 Extended source and joint entropy
Example 1.2.3
Let X be a random variable whose probability space is
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} 0 & 1 & 2 \\ 1/2 & 1/3 & 1/6 \end{pmatrix}
\]
Its joint probability P(a_i a_j):

P(a_i a_j)     a_j = 0    a_j = 1    a_j = 2
a_i = 0         1/4        1/4        0
a_i = 1         1/4        1/24       1/24
a_i = 2         0          1/24       1/8

and its conditional probability P(a_j | a_i):

P(a_j | a_i)   a_j = 0    a_j = 1    a_j = 2
a_i = 0         1/2        1/2        0
a_i = 1         3/4        1/8        1/8
a_i = 2         0          1/4        3/4

H(X) = ?
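A sketch that recomputes H(X1), H(X2|X1) and H(X1X2) from the joint table of Example 1.2.3 and checks the chain rule of Theorem 1.7 (base-2 logarithms; variable names are our own):

```python
from math import log2
from fractions import Fraction as F

# Example 1.2.3: joint distribution P(a_i a_j) over {0, 1, 2} x {0, 1, 2}
P = {
    (0, 0): F(1, 4), (0, 1): F(1, 4),  (0, 2): F(0),
    (1, 0): F(1, 4), (1, 1): F(1, 24), (1, 2): F(1, 24),
    (2, 0): F(0),    (2, 1): F(1, 24), (2, 2): F(1, 8),
}

p1 = {i: sum(P[(i, j)] for j in range(3)) for i in range(3)}  # marginal of X1

def H(values):
    return -sum(float(p) * log2(float(p)) for p in values if p > 0)

H_X1 = H(p1.values())                                     # H(X1) = H(1/2, 1/3, 1/6)
H_X1X2 = H(P.values())                                    # joint entropy H(X1 X2)
H_X2_given_X1 = sum(float(p) * log2(float(p1[i] / p))     # H(X2|X1)
                    for (i, j), p in P.items() if p > 0)

print(round(H_X1, 4), round(H_X2_given_X1, 4), round(H_X1X2, 4))
# chain rule (Theorem 1.7): H(X1 X2) = H(X1) + H(X2|X1)
assert abs(H_X1X2 - (H_X1 + H_X2_given_X1)) < 1e-9
```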
Relationship:
\[
H(X_1 X_2) = H(X_1) + H(X_2 | X_1)
\]
\[
H(X_2) \ge H(X_2 | X_1), \qquad H(X_1 X_2) \le 2 H(X_1), \qquad H_2(X) \le H(X)
\]

§1.2.1 Extended source and joint entropy
3. Properties of joint entropy
General stationary source
\[
\begin{pmatrix} X \\ P(x) \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 & \cdots & a_q \\ p(a_1) & p(a_2) & \cdots & p(a_q) \end{pmatrix},
\qquad \sum_{i=1}^{q} p(a_i) = 1
\]
Let X1, X2, …, XN be dependent; the joint probability is
\[
P(X_1 X_2 \cdots X_N) = p(a_{i_1} a_{i_2} \cdots a_{i_N}),
\qquad i_1, i_2, \ldots, i_N \in \{1, 2, \ldots, q\}
\]
\[
P(x_1 x_2 \cdots x_N) = P(x_1)\, P(x_2 | x_1) \cdots P(x_N | x_1 x_2 \cdots x_{N-1})
\]
3. Properties of joint entropy
• Joint entropy
\[
H(X_1 X_2 \cdots X_N) = -\sum_{i_1 \cdots i_N} p(a_{i_1} a_{i_2} \cdots a_{i_N}) \log p(a_{i_1} a_{i_2} \cdots a_{i_N})
\]
§1.2.1 Extended source and joint entropy
Definition of entropies
3. Properties of joint entropy
• Conditional entropy
\[
H(X_N | X_1 X_2 \cdots X_{N-1}) = -\sum_{i_1 \cdots i_N} p(a_{i_1} \cdots a_{i_N}) \log p(a_{i_N} | a_{i_1} \cdots a_{i_{N-1}})
\]
• (Per symbol) entropy
\[
H_N(X) = \frac{1}{N} H(X_1 X_2 \cdots X_N)
\]
Theorem 1.8 (Chain rule for entropy):
Let X1, X2, …, Xn be drawn according to p(x1, x2, …, xn). Then
\[
H(X_1 X_2 \cdots X_n) = \sum_{i=1}^{n} H(X_i | X_1 X_2 \cdots X_{i-1})
\]
Proof (do it by yourself)
§1.2.1 Extended source and joint entropy
3. Properties of joint entropy
Relation of entropies. If H(X) < ∞, then for a stationary source:
\[
H(X_N | X_1 X_2 \cdots X_{N-1}) \le H(X_{N-1} | X_1 \cdots X_{N-2})
\]
\[
H_N(X) \ge H(X_N | X_1 \cdots X_{N-1}), \qquad H_N(X) \le H_{N-1}(X)
\]
Entropy rate:
\[
H_\infty = \lim_{N \to \infty} H_N(X) = \lim_{N \to \infty} H(X_N | X_1 X_2 \cdots X_{N-1})
\]
(This limit is the basis of data compression.)
Theorem 1.9 (Independence bound on entropy):
Let X1, X2, …, Xn be drawn according to p(x1, x2, …, xn). Then
\[
H(X_1 X_2 \cdots X_n) \le \sum_{i=1}^{n} H(X_i)
\]
with equality iff the Xi are independent.
§1.2.1 Extended source and joint entropy
(P37(corollary) in textbook)
3. Properties of joint entropy
§1.2.1 Extended source and joint entropy
Example 1.2.4
Suppose a memoryless source with A={0,1} having equal probabilities emits a sequence of six symbols. Following the sixth symbol, suppose a seventh symbol is transmitted which is the sum modulo 2 of the six previous symbols. What is the entropy of the seven-symbol sequence?
3. Properties of joint entropy
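For Example 1.2.4, a brute-force sketch (our own construction, base-2 logarithms) enumerates the 2^6 equally likely prefixes, appends the modulo-2 sum, and evaluates the entropy of the resulting seven-symbol sequence:

```python
from math import log2
from itertools import product

# Example 1.2.4: six equiprobable binary symbols followed by their modulo-2 sum.
# Each of the 2^6 prefixes is equally likely and determines the seventh symbol,
# so every admissible seven-symbol sequence has probability 1/64.
sequences = set()
for prefix in product((0, 1), repeat=6):
    sequences.add(prefix + (sum(prefix) % 2,))

p = 1 / 64
H_seq = -sum(p * log2(p) for _ in sequences)
print(len(sequences), H_seq)  # 64 sequences, 6.0 bits: the parity symbol adds no entropy
```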
§1 Entropy and mutual information
§1.1 Discrete random variables
§1.2 Discrete random vectors
§1.2.1 Extended source and joint entropy
§1.2.2 Extended channel and mutual information
1. The model of extended channel
A general communication system:
[Figure: source (U1, U2, …, Uk) → encoder → X^N = (X1, X2, …, XN) → channel → Y^N = (Y1, Y2, …, YN) → decoder → (V1, V2, …, Vk).]
§1.2.2 Extended channel and mutual information
Extended channel
\[
X^N = (X_1 X_2 \cdots X_N), \qquad Y^N = (Y_1 Y_2 \cdots Y_N)
\]
\[
X_i \in \{a_1, a_2, \ldots, a_r\}, \qquad Y_j \in \{b_1, b_2, \ldots, b_s\}, \qquad P(y|x)
\]
For a DMC:
\[
P(y|x) = P(y_1 y_2 \cdots y_N | x_1 x_2 \cdots x_N) = \prod_{i=1}^{N} P(y_i | x_i)
\]
1. The model of extended channel
§1.2.2 Extended channel and mutual information
\[
P(Y^N | X^N) =
\begin{pmatrix}
p(b_1 \cdots b_1 | a_1 \cdots a_1) & \cdots & p(b_s \cdots b_s | a_1 \cdots a_1) \\
\vdots & & \vdots \\
p(b_1 \cdots b_1 | a_r \cdots a_r) & \cdots & p(b_s \cdots b_s | a_r \cdots a_r)
\end{pmatrix}
\]
an r^N × s^N matrix whose (k, h) entry is p(b_{h_1} b_{h_2} \cdots b_{h_N} | a_{k_1} a_{k_2} \cdots a_{k_N}).
2. Average mutual information
\[
I(X^N; Y^N) = H(X^N) - H(X^N | Y^N) = H(Y^N) - H(Y^N | X^N)
\]
\[
I(X^N; Y^N) = \sum_{x \in X^N} \sum_{y \in Y^N} P(xy) \log \frac{P(y|x)}{P(y)}
= \sum_{k=1}^{r^N} \sum_{h=1}^{s^N} P(\alpha_k \beta_h) \log \frac{p(\beta_h | \alpha_k)}{p(\beta_h)}
\]
where \alpha_k = (a_{k_1} \cdots a_{k_N}) and \beta_h = (b_{h_1} \cdots b_{h_N}).
§1.2.2 Extended channel and mutual information
Example 1.2.5
3. The properties
Theorem 1.11 If the components (X1, X2, …, XN) of X^N are independent, then
\[
I(X^N; Y^N) \ge \sum_{i=1}^{N} I(X_i; Y_i)
\]
§1.2.2 Extended channel and mutual information
(Theorem 1.8 in textbook)
3. The properties
Theorem 1.12 If X^N = (X1, X2, …, XN) and Y^N = (Y1, Y2, …, YN) are random vectors and the channel is memoryless, that is
\[
P(y_1, \ldots, y_N | x_1, \ldots, x_N) = \prod_{i=1}^{N} P(y_i | x_i)
\]
then
\[
I(X^N; Y^N) \le \sum_{i=1}^{N} I(X_i; Y_i)
\]
§1.2.2 Extended channel and mutual information
(Theorem 1.9 in textbook)
Example 1.2.6
Let X1, X2, …, X5 be independent identically distributed random variables with common entropy H. Also let T be a permutation of the set {1, 2, 3, 4, 5}, and let Yi = X_{T(i)}, where
T: (1, 2, 3, 4, 5) → (3, 2, 5, 1, 4).
Compare Σ_{i=1}^{5} I(X_i; Y_i) with I(X^5; Y^5).
§1.2.2 Extended channel and mutual information
Review
Keywords: Measures of information for random vectors
extended source, joint entropy
extended channel, stationary source
(per symbol) entropy, conditional entropy, entropy rate
Review
Conclusions:
chain rule for entropy
independence bound on entropy
conditioning reduces entropy
properties of I(X;Y)
Homework
1. P47: T1.23
2. P47: T1.24
3. Let X1, X2, …, X_{n-1} be i.i.d. random variables taking values in {0, 1}, with Pr{Xi = 1} = 1/2. Let Xn = 1 if Σ_{i=1}^{n-1} Xi is odd and Xn = 0 otherwise. Let n ≥ 3.
(1) Show that Xi and Xj are independent, for i ≠ j, i, j ∈ {1, 2, …, n};
(2) Find H(Xi Xj), for i ≠ j;
(3) Find H(X1 X2 … Xn); is this equal to nH(X1)?
4. Let X1, X2 be identically distributed random variables and let
\[
\rho = 1 - \frac{H(X_2 | X_1)}{H(X_1)}
\]
1) Show that 0 ≤ ρ ≤ 1.
2) When is ρ = 0?
3) When is ρ = 1?
Homework
5. Shuffles increase entropy. Argue that, for any distribution on shuffles T and any distribution on card positions X, H(TX) ≥ H(TX|T) if X and T are independent.
Homework
Thinking :