
§1 Entropy and mutual information

§1.1 Discrete random variables

§1.1.1 Discrete memoryless source and entropy

§1.1.2 Discrete memoryless channel and mutual information

§1.2 Discrete random vectors

§1.1.1 Discrete memoryless source and entropy

Example 1.1.1

Let X represent the outcome of a single roll of a fair die.

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{bmatrix}$$

1. DMS (Discrete Memoryless Source)

Probability space:

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & \cdots & a_q \\ P(a_1) & P(a_2) & \cdots & P(a_q) \end{bmatrix}, \qquad \sum_{i=1}^{q} P(a_i) = 1$$

2. self information

Example 1.1.2

§1.1.1 Discrete memoryless source and entropy

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ 0.5 & 0.5 \end{bmatrix} \;(\text{red, white}), \qquad \begin{bmatrix} Y \\ P(y) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \\ 0.25 & 0.25 & 0.25 & 0.25 \end{bmatrix} \;(\text{red, white, blue, black})$$

Analyse the uncertainty of drawing a red ball from X and from Y.

2. self information

I(a_i) = f[p(a_i)] should satisfy:

1) I(a_i) is a monotone decreasing function of p(a_i): if p(a_1) > p(a_2), then I(a_1) < I(a_2);
2) if p(a_i) = 1, then I(a_i) = 0;
3) if p(a_i) = 0, then I(a_i) → ∞;
4) if p(a_i a_j) = p(a_i) p(a_j), then I(a_i a_j) = I(a_i) + I(a_j).

§1.1.1 Discrete memoryless source and entropy

Self information:

$$I(a_i) = \log_r \frac{1}{p(a_i)} = -\log_r p(a_i)$$

[Figure: I(a_i) as a decreasing function of p(a_i) on (0, 1].]

This definition satisfies the requirements above:
if 1 ≥ p(a_i) > p(a_j) ≥ 0, then I(a_i) < I(a_j);
if p(a_i) → 0, then I(a_i) → ∞;
if p(a_i) = 1, then I(a_i) = 0;
I(ab) = I(a) + I(b) when a and b are statistically independent.

§1.1.1 Discrete memoryless source and entropy

Remark:

I(a_i) measures the uncertainty about the outcome a_i before observation, and equally the amount of information provided when a_i occurs.

Units: bit (base-2 logarithm), nat (base e), hartley (base 10).
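As a quick numerical companion (a minimal Python sketch added here, not part of the original notes; the function name is my own), the self information of drawing a red ball in Example 1.1.2 is 1 bit from X and 2 bits from Y:

```python
import math

def self_information(p, base=2):
    """Self information I(a) = -log(p(a)), in units set by the logarithm base."""
    return -math.log(p, base)

print(self_information(0.5))          # red ball from X: 1.0 bit
print(self_information(0.25))         # red ball from Y: 2.0 bits
print(self_information(0.5, math.e))  # the same event in nats: ~0.693
```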

3. Entropy

Definition: Suppose X is a discrete random variable whose range R = {a_1, a_2, …} is finite or countable, and let p(a_i) = P{X = a_i}. The entropy of X is defined by

$$H(X) = E[I(a_i)] = -\sum_{i=1}^{q} p(a_i)\log p(a_i)$$

Entropy is a measure of the average amount of information provided by an observation of X, and equivalently of the average uncertainty (or randomness) about X.

Entropy — the amount of "information" provided by an observation of X.

Example 1.1.3 A bag contains 100 balls, 80% red and the rest white. A ball is drawn at random. How much information does each drawing provide on average?

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ 0.8 & 0.2 \end{bmatrix}$$

In N independent drawings, about n_1 = N p(a_1) balls are red and n_2 = N p(a_2) are white, so the total information obtained is approximately n_1 I(a_1) + n_2 I(a_2). The average information per drawing is therefore

$$\bar I = \frac{n_1 I(a_1) + n_2 I(a_2)}{N} = p(a_1)I(a_1) + p(a_2)I(a_2) = -\sum_{i=1}^{2} p(a_i)\log p(a_i) = 0.722 \text{ bit/sig},$$

that is,

$$H(X) = H(0.8, 0.2) = 0.722 \text{ bit/sig}$$
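The value 0.722 bit/sig can be reproduced with a few lines of Python (a sketch; `entropy` is my own helper, not a function from the notes):

```python
import math

def entropy(probs, base=2):
    """H(X) = -sum p*log(p), with the convention 0*log(0) = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(round(entropy([0.8, 0.2]), 3))   # 0.722 bit/sig, as in Example 1.1.3
```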

Entropy-the “uncertainty” or “randomness” about X

Example 1.1.4 Consider three binary sources:

$$\begin{bmatrix} X_1 \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ 0.5 & 0.5 \end{bmatrix}, \qquad \begin{bmatrix} X_2 \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ 0.7 & 0.3 \end{bmatrix}, \qquad \begin{bmatrix} X_3 \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ 0.99 & 0.01 \end{bmatrix}$$

$$H(X_1) = 1 \text{ bit/sig}, \qquad H(X_2) = 0.88 \text{ bit/sig}, \qquad H(X_3) = 0.08 \text{ bit/sig}$$

The closer the distribution is to uniform, the larger the average uncertainty.
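The same comparison can be reproduced with scipy.stats.entropy (a sketch that assumes SciPy is available; the probabilities are those of Example 1.1.4):

```python
from scipy.stats import entropy   # entropy(pk, base=...) = -sum(pk * log(pk))

for name, p in [("X1", [0.5, 0.5]), ("X2", [0.7, 0.3]), ("X3", [0.99, 0.01])]:
    print(name, round(entropy(p, base=2), 2))
# X1 1.0, X2 0.88, X3 0.08 -- the closer to uniform, the larger the entropy
```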

§1.1.1 Discrete memoryless source and entropy

3. Entropy

$$H(X) = E[I(a_i)] = -\sum_{i=1}^{q} p(a_i)\log p(a_i)$$

Note:
1) Units: bit/sig, nat/sig, hart/sig.
2) If p(a_i) = 0, the term p(a_i) log p(a_i)^{-1} is taken to be 0.
3) If R is infinite, H(X) may be +∞.

§1.1.1 Discrete memoryless source and entropy

Example 1.1.5 Entropy of a binary source (BS)

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ p & q \end{bmatrix}, \qquad q = 1 - p$$

$$H(X) = -p\log p - (1-p)\log(1-p)$$

3. Entropy: the entropy function

$$H(p_1, p_2, \ldots, p_q) = \sum_{i=1}^{q} p_i \log\frac{1}{p_i}$$

H(\mathbf{p}) is called the entropy function of the probability vector \mathbf{p} = (p_1, p_2, \ldots, p_q).

§1.1.1 Discrete memoryless source and entropy

4. The properties of entropy

Theorem 1.1 Let X assume values in R = {x_1, x_2, …, x_r}. Then

1) H(X) ≥ 0;
2) H(X) = 0 iff p_i = 1 for some i;
3) H(X) ≤ log r, with equality iff p_i = 1/r for all i — the basis of data compression.

§1.1.1 Discrete memoryless source and entropy

Proof:

(Theorem 1.1 in textbook)

Lemma: For x > 0, log x ≤ (x − 1) log e (equivalently, ln x ≤ x − 1), with equality iff x = 1.
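A short sketch of how the lemma gives property 3 (filled in here; the textbook proof may differ in detail):

$$H(X) - \log r = \sum_{i:\,p_i>0} p_i\log\frac{1}{r p_i} \le \log e\sum_{i:\,p_i>0} p_i\left(\frac{1}{r p_i} - 1\right) = \log e\left(\sum_{i:\,p_i>0}\frac{1}{r} - 1\right) \le 0,$$

with equality iff r p_i = 1 for every i, i.e. p_i = 1/r for all i.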

4) Symmetry: H(p_{i_1}, p_{i_2}, \ldots, p_{i_r}) = H(p_1, p_2, \ldots, p_r) for any permutation (i_1, i_2, \ldots, i_r) of (1, 2, \ldots, r).

Example 1.1.6 Let X, Y, Z be discrete random variables with

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 \\ 1/3 & 1/6 & 1/2 \end{bmatrix}, \qquad \begin{bmatrix} Y \\ P(y) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 \\ 1/6 & 1/2 & 1/3 \end{bmatrix}, \qquad \begin{bmatrix} Z \\ P(z) \end{bmatrix} = \begin{bmatrix} b_1 & b_2 & b_3 \\ 1/3 & 1/2 & 1/6 \end{bmatrix}$$

All three have the same entropy: H = H(1/3, 1/6, 1/2) ≈ 1.46 bit/sig.

§1.1.1 Discrete memoryless source and entropy

5) If X, Y are independent, then H(XY) = H(X) + H(Y).

Proof: Let

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & \cdots & a_q \\ p(a_1) & p(a_2) & \cdots & p(a_q) \end{bmatrix}, \; \sum_{i} p(a_i) = 1, \qquad \begin{bmatrix} Y \\ P(y) \end{bmatrix} = \begin{bmatrix} b_1 & b_2 & \cdots & b_r \\ p(b_1) & p(b_2) & \cdots & p(b_r) \end{bmatrix}, \; \sum_{j} p(b_j) = 1,$$

so that

$$H(X) = \sum_{i=1}^{q} p(a_i)\log\frac{1}{p(a_i)}, \qquad H(Y) = \sum_{j=1}^{r} p(b_j)\log\frac{1}{p(b_j)}.$$

Joint source:

$$\begin{bmatrix} XY \\ P(xy) \end{bmatrix} = \begin{bmatrix} a_1b_1 & \cdots & a_ib_j & \cdots & a_qb_r \\ P_1 & \cdots & P_k & \cdots & P_{qr} \end{bmatrix}, \qquad P_k = p(a_ib_j) = p(a_i)p(b_j), \quad \sum_{k=1}^{qr} P_k = \sum_{i=1}^{q}\sum_{j=1}^{r} p(a_i)p(b_j) = 1.$$

Then

$$H(XY) = \sum_{k=1}^{qr} P_k\log\frac{1}{P_k} = \sum_{i=1}^{q}\sum_{j=1}^{r} p(a_i)p(b_j)\log\frac{1}{p(a_i)p(b_j)} = \sum_{i=1}^{q}\sum_{j=1}^{r} p(a_i)p(b_j)\left[\log\frac{1}{p(a_i)} + \log\frac{1}{p(b_j)}\right] = H(X) + H(Y).$$

§1.1.1 Discrete memoryless source and entropy

Theorem 1.2 The entropy function H(p_1, p_2, …, p_r) is a concave (convex-∩) function of the probability vector (p_1, p_2, …, p_r).

6) Convex properties

4. The properties of entropy

§1.1.1 Discrete memoryless source and entropy

Example 1.1.5 (continued) Entropy of the binary source:

$$H(X) = H(p) = -p\log p - (1-p)\log(1-p)$$

[Figure: the binary entropy function H(p) versus p, equal to 0 at p = 0 and p = 1 and reaching its maximum value 1 at p = 1/2.]
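A small Python sketch of the binary entropy function (the helper name is mine) reproduces the shape of the curve — zero at the endpoints and a maximum of 1 bit at p = 1/2:

```python
import math

def h_binary(p):
    """Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(p, round(h_binary(p), 3))
# 0.0 0.0, 0.1 0.469, 0.25 0.811, 0.5 1.0, 0.75 0.811, 1.0 0.0
```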

5. Conditional entropy

Definition: Let X, Y be a pair of random variables with (X, Y) ~ p(x, y). The conditional entropy of X given Y is defined by

$$H(X|Y) = E\left[\log\frac{1}{p(x|y)}\right] = \sum_{x,y} p(x,y)\log\frac{1}{p(x|y)}$$

§1.1.1 Discrete memoryless source and entropy

Analysis: for a particular value y,

$$H(X|Y=y) = \sum_{x} p(x|y)\log\frac{1}{p(x|y)}$$

Averaging over Y,

$$H(X|Y) = \sum_{y} p(y)\,H(X|Y=y) = \sum_{y} p(y)\sum_{x} p(x|y)\log\frac{1}{p(x|y)} = \sum_{x,y} p(x,y)\log\frac{1}{p(x|y)}$$

§1.1.1 Discrete memoryless source and entropy

Example 1.1.7

[Figure: binary input X ∈ {0, 1}, ternary output Y ∈ {0, ?, 1}; transitions 0 → 0 with probability 3/4, 0 → ? with probability 1/4, 1 → 1 with probability 1/2, 1 → ? with probability 1/2.]

p_X(0) = 2/3, p_X(1) = 1/3.

Find H(X), H(X|Y=0), H(X|Y=?), and H(X|Y).

H(X) = H(2/3, 1/3) = 0.9183 bit/sig
H(X|Y=0) = 0
H(X|Y=1) = 0
H(X|Y=?) = H(1/2, 1/2) = 1 bit/sig
H(X|Y) = 1/3 bit/sig
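These values follow directly from the joint distribution p(x, y) = p_X(x) p(y|x); a Python sketch (variable names are mine) reproduces H(X|Y) = 1/3 bit/sig:

```python
import math

# Example 1.1.7: p_X = (2/3, 1/3); channel 0 -> {0: 3/4, ?: 1/4}, 1 -> {1: 1/2, ?: 1/2}
p_x = {0: 2/3, 1: 1/3}
p_y_given_x = {0: {'0': 3/4, '?': 1/4}, 1: {'1': 1/2, '?': 1/2}}

p_xy = {(x, y): p_x[x] * p for x, row in p_y_given_x.items() for y, p in row.items()}
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

# H(X|Y) = sum_{x,y} p(x,y) * log2( p(y) / p(x,y) ), since p(x|y) = p(x,y)/p(y)
h_x_given_y = sum(p * math.log2(p_y[y] / p) for (x, y), p in p_xy.items() if p > 0)
print(round(h_x_given_y, 4))   # 0.3333 bit/sig
```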


Theorem 1.3 (conditioning reduces entropy)

$$H(X|Y) \le H(X),$$

with equality iff X and Y are independent.

Proof:
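One way to fill in the argument, using the same lemma as in Theorem 1.1 (a sketch added here; the textbook proof may differ in detail): summing over pairs with p(x, y) > 0,

$$H(X|Y) - H(X) = \sum_{x,y} p(x,y)\log\frac{p(x)}{p(x|y)} \le \log e\sum_{x,y} p(x,y)\left(\frac{p(x)}{p(x|y)} - 1\right) = \log e\left(\sum_{x,y} p(x)p(y) - 1\right) \le 0,$$

with equality iff p(x|y) = p(x) whenever p(x, y) > 0, i.e. iff X and Y are independent.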

Review

Keywords: Measure of information

self information

entropy

properties of entropy

conditional entropy

Homework

1. P44: T1.1
2. P44: T1.4
3. P44: T1.6
4. Let X be a random variable taking on a finite number of values. What is the relationship between H(X) and H(Y) if (1) Y = 2X? (2) Y = cos X?

Homework

5. Let X be an ensemble of M points a_1, \ldots, a_M, and let P(a_M) = \alpha. Prove that

$$H(X) = \alpha\log\frac{1}{\alpha} + (1-\alpha)\log\frac{1}{1-\alpha} + (1-\alpha)H(Y),$$

where Y is an ensemble of M − 1 points a_1, \ldots, a_{M-1} with probabilities P_Y(a_j) = P(a_j)/(1-\alpha), j ≤ M − 1. Prove further that

$$H(X) \le \alpha\log\frac{1}{\alpha} + (1-\alpha)\log\frac{1}{1-\alpha} + (1-\alpha)\log(M-1),$$

and determine the condition for equality.

Homework

6. A chessboard has 8×8 = 64 squares, and a chessman is placed on a square at random. We want to guess the location of the chessman. Find the uncertainty of the result.

If every square is identified by its row and column number and the row of the chessman is already known, what is the remaining uncertainty?

Coin flip. A fair coin is flipped until the first head occurs. Let X denote the number of flips required. Find the entropy H(X) in bits.

Thinking: the probability space of X is

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & \cdots & n & \cdots \\ 1/2 & 1/2^2 & 1/2^3 & \cdots & 1/2^n & \cdots \end{bmatrix}$$

and the identity

$$\sum_{n=1}^{\infty} n\,r^n = \frac{r}{(1-r)^2}, \qquad |r| < 1,$$

can be used to evaluate the entropy.
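A quick numeric check of the hinted series identity (a Python sketch; r = 1/2 is the value relevant to the coin-flip problem):

```python
# partial sums of sum_{n>=1} n * r**n converge to r / (1 - r)**2
r = 0.5
partial = sum(n * r**n for n in range(1, 60))
print(round(partial, 6), r / (1 - r) ** 2)   # 2.0  2.0
```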

§1 Entropy and mutual information

§1.1 Discrete random variables

§1.1.2 Discrete memoryless channel and mutual information

The channel takes an input x ∈ X = {a_1, a_2, \ldots, a_r} and produces an output y ∈ Y = {b_1, b_2, \ldots, b_s} according to the transition probabilities p(y|x), i.e. p(b_j|a_i), with

$$\sum_{j=1}^{s} p(b_j|a_i) = 1.$$

§1.1.2 Discrete memoryless channel and mutual information

1. DMC (Discrete Memoryless Channel)

The model of DMC

[Figure: DMC with input symbols 0, 1, …, r−1 and output symbols 0, 1, …, s−1 connected through p(y|x).]

r input symbols, s output symbols.

Representation of the DMC by its transition probabilities p(y|x):

p(y|x) ≥ 0 for all x, y;   $\sum_{y} p(y|x) = 1$ for all x.

§1.1.2 Discrete memoryless channel and mutual information

A DMC can be represented by a graph of its transition probabilities, or by the transition probability matrix:

$$P = \begin{bmatrix} p(b_1|a_1) & p(b_2|a_1) & \cdots & p(b_s|a_1) \\ p(b_1|a_2) & p(b_2|a_2) & \cdots & p(b_s|a_2) \\ \vdots & \vdots & & \vdots \\ p(b_1|a_r) & p(b_2|a_r) & \cdots & p(b_s|a_r) \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1s} \\ p_{21} & p_{22} & \cdots & p_{2s} \\ \vdots & \vdots & & \vdots \\ p_{r1} & p_{r2} & \cdots & p_{rs} \end{bmatrix}, \qquad \sum_{j=1}^{s} p(b_j|a_i) = 1$$

(the transition probability matrix)

Representation of the DMC by formula: p(y|x) = p(b_j|a_i), together with the input and output probability spaces

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & \cdots & a_r \\ p(a_1) & p(a_2) & \cdots & p(a_r) \end{bmatrix}, \; \sum_{i=1}^{r} p(a_i) = 1, \qquad \begin{bmatrix} Y \\ P(y) \end{bmatrix} = \begin{bmatrix} b_1 & b_2 & \cdots & b_s \\ p(b_1) & p(b_2) & \cdots & p(b_s) \end{bmatrix}, \; \sum_{j=1}^{s} p(b_j) = 1,$$

and

$$\sum_{j=1}^{s} p(b_j|a_i) = 1 \quad \text{for every } i.$$

Example 1.1.8: BSC (Binary Symmetric Channel)

r = s = 2;  p(0|0) = p(1|1) = 1 − p,  p(0|1) = p(1|0) = p.

$$P = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$

[Figure: BSC — 0 → 0 and 1 → 1 with probability 1 − p, crossovers 0 → 1 and 1 → 0 with probability p.]

§1.1.2 Discrete memoryless channel and mutual information

1. DMC (Discrete Memoryless Channel)

§1.1.2 Discrete memoryless channel and mutual information

Example 1.1.9: BEC (Binary Erasure Channel)

[Illustration: a transmitted binary sequence is received with some symbols erased and marked "?".]

r = 2, s = 3;  p(0|0) = p, p(?|0) = 1 − p;  p(1|1) = q, p(?|1) = 1 − q.

$$P = \begin{bmatrix} p & 1-p & 0 \\ 0 & 1-q & q \end{bmatrix}$$

[Figure: input 0 goes to output 0 with probability p and to the erasure symbol ? with probability 1 − p; input 1 goes to output 1 with probability q and to ? with probability 1 − q.]
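In code, a DMC is just its transition probability matrix. A numpy sketch of the two channels above (the numeric values of eps, p and q are illustrative, not from the notes) checks that every row sums to 1:

```python
import numpy as np

eps = 0.1          # BSC crossover probability (illustrative)
p, q = 0.9, 0.8    # BEC retention probabilities p(0|0) and p(1|1) (illustrative)

# BSC: rows = inputs {0, 1}, columns = outputs {0, 1}
P_bsc = np.array([[1 - eps, eps],
                  [eps, 1 - eps]])

# BEC: rows = inputs {0, 1}, columns = outputs {0, ?, 1}
P_bec = np.array([[p, 1 - p, 0.0],
                  [0.0, 1 - q, q]])

print(P_bsc.sum(axis=1), P_bec.sum(axis=1))   # [1. 1.] [1. 1.]
```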

§1.1.2 Discrete memoryless channel and mutual information

1. DMC (Discrete Memoryless Channel)

2. average mutual information

definition

I(X;Y) = H(X) – H(X|Y)

§1.1.2 Discrete memoryless channel and mutual information

The channel: input x ∈ X = {a_1, a_2, \ldots, a_r}, output y ∈ Y = {b_1, b_2, \ldots, b_s}, transition probabilities p(y|x) = p(b_j|a_i) with $\sum_{j=1}^{s} p(b_j|a_i) = 1$.

H(X) is the entropy of the input; H(X|Y) is the equivocation. Their difference, the average mutual information

I(X;Y) = H(X) − H(X|Y),

is the reduction in uncertainty about X conveyed by the observation of Y — the information about X obtained from Y.

2. Average mutual information — definition

$$I(X;Y) = H(X) - H(X|Y) = \sum_{x} p(x)\log\frac{1}{p(x)} - \sum_{x,y} p(x,y)\log\frac{1}{p(x|y)} = \sum_{x,y} p(x,y)\log\frac{1}{p(x)} - \sum_{x,y} p(x,y)\log\frac{1}{p(x|y)}$$

Equivalently,

$$I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x|y)}{p(x)} = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)} = \sum_{x,y} p(x,y)\log\frac{p(y|x)}{p(y)}$$
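The equivalence of these expressions is easy to check numerically. A Python sketch (names are mine) computes I(X;Y) two ways for the joint distribution of Example 1.1.7:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# joint distribution p(x, y) of Example 1.1.7 (x in {0, 1}, y in {'0', '?', '1'})
p_xy = {(0, '0'): 1/2, (0, '?'): 1/6, (1, '?'): 1/6, (1, '1'): 1/6}
p_x = {0: 2/3, 1: 1/3}
p_y = {'0': 1/2, '?': 1/3, '1': 1/6}

# I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) p(y)) )
direct = sum(v * math.log2(v / (p_x[x] * p_y[y])) for (x, y), v in p_xy.items())

# I(X;Y) = H(X) - H(X|Y), using H(X|Y) = H(XY) - H(Y)
via_entropies = entropy(p_x.values()) - (entropy(p_xy.values()) - entropy(p_y.values()))

print(round(direct, 4), round(via_entropies, 4))   # both ~0.585 bit
```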

I(X;Y) and I(x;y): the mutual information of a particular pair (x, y) is

$$I(x;y) = \log\frac{P(x|y)}{P(x)},$$

and the average mutual information is its expectation, I(X;Y) = E_{XY}[I(x;y)].

I(X;Y) and H(X): since H(X|Y) ≥ 0, I(X;Y) = H(X) − H(X|Y) ≤ H(X).

properties

1) Non-negativity of average mutual information

Theorem 1.4 For any discrete random variables X and Y, I(X;Y) ≥ 0; moreover, I(X;Y) = 0 iff X and Y are independent.

§1.1.2 Discrete memoryless channel and mutual information

2. Average mutual information

Proof: (Theorem 1.3 in textbook)

Interpretation: we do not expect to be misled, on average, by observing the output of the channel.

properties

§1.1.2 Discrete memoryless channel and mutual information

2. Average mutual information

X Y’ Y

listener-in

S encrypt

Key

channel decrypt D

total loss

message : arrive at four

ciphertext : duulyh dw irxu

A cryptosystem

2) Symmetry: I(X;Y) = I(Y;X).

3) Relationship between entropy and average mutual information:

I(X;Y) = H(X) − H(X|Y)
I(X;Y) = H(Y) − H(Y|X)
I(X;Y) = H(X) + H(Y) − H(XY)   (H(XY) is the joint entropy)

[Figure: mnemonic Venn diagram relating H(X), H(Y), H(X|Y), H(Y|X), I(X;Y), and H(XY).]

§1.1.2 Discrete memoryless channel and mutual information

2. Average mutual information

Recognising channels:

[Figure: three example channels — a one-to-one channel a_i → b_i (i = 1, …, r); a channel with two inputs a_1, a_2 and five outputs b_1, …, b_5; and a channel with three inputs a_1, a_2, a_3 and two outputs b_1, b_2.]

4) Convexity

With P(xy) = P(y|x)P(x) and P(y) = Σ_x P(y|x)P(x),

$$I(X;Y) = \sum_{X,Y} P(xy)\log\frac{P(x|y)}{P(x)},$$

so I(X;Y) = f[P(x), P(y|x)]: it depends only on the input distribution P(x) and the transition probabilities P(y|x).

Theorem 1.5 For a fixed channel P(y|x), I(X;Y) is a concave (convex-∩) function of the input probabilities P(x). (Theorem 1.6 in textbook)

Theorem 1.6 For a fixed input distribution P(x), I(X;Y) is a convex (convex-∪) function of the transition probabilities P(y|x). (Theorem 1.7 in textbook)

Example 1.1.10 Analyse I(X;Y) for the BSC.

Source: $\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ \omega & 1-\omega \end{bmatrix}$; channel: BSC with crossover probability p (0 → 0 and 1 → 1 with probability 1 − p, 0 → 1 and 1 → 0 with probability p).

$$I(X;Y) = \sum_{X,Y} P(xy)\log\frac{P(y|x)}{P(y)} = H(Y) - H(Y|X)$$

$$H(Y|X) = \sum_{X,Y} p(xy)\log\frac{1}{p(y|x)} = p\log\frac{1}{p} + (1-p)\log\frac{1}{1-p} = H(p)$$

$$H(Y) = H(\omega + p - 2\omega p)$$

$$I(X;Y) = H(\omega + p - 2\omega p) - H(p)$$
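A numeric sanity check of the closed form against the direct definition (a Python sketch; the values of ω and p are arbitrary choices of mine):

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

w, p = 0.3, 0.1   # source P(X=0) = w, BSC crossover probability p

# closed form I(X;Y) = H(w + p - 2wp) - H(p)
closed = h(w + p - 2 * w * p) - h(p)

# direct computation from the joint distribution
p_xy = {(0, 0): w * (1 - p), (0, 1): w * p, (1, 0): (1 - w) * p, (1, 1): (1 - w) * (1 - p)}
p_x = {0: w, 1: 1 - w}
p_y = {y: sum(v for (xx, yy), v in p_xy.items() if yy == y) for y in (0, 1)}
direct = sum(v * math.log2(v / (p_x[x] * p_y[y])) for (x, y), v in p_xy.items())

print(round(closed, 6), round(direct, 6))   # the two values agree
```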

Review

KeyWords: Channel and it’s information measure

channel model

equivocation

average mutual information

mutual information

properties of average mutual information

Thinking:

Compare I(X;Y) = 0 with Cov(X, Y) = 0.

Compare I(X;Y) with H(X) and H(Y).

§1.1.2 Discrete memoryless channel and mutual information

Example 1.1.11

Let the source have alphabet A = {0, 1} with p_0 = p_1 = 0.5. Let the encoder C have alphabet B = {0, 1, …, 7}, and let the elements of B have the binary representation b = (b_0 b_1 b_2). The encoder is shown below. Find the entropy of the coded output, and find the output sequence if the input sequence is a(t) = {101001011000001100111011} and the initial contents of the registers are b(t = 0) = 5 = (101).

[Figure: the encoder — a three-stage shift register of D flip-flops driven by a(t), with register contents b_0, b_1, b_2; state-transition diagram between states Y_t → Y_{t+1} ∈ {0, 1, …, 7}.]

§1.1.2 Discrete memoryless channel and mutual information

a(t)={101001011000001100111011}

b = {001242425124366675013666}

Homework

1. P45: T1.10,

2. P46: T1.19(except c)

3. Let the DMS

$$\begin{bmatrix} X \\ p(x) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0.6 & 0.4 \end{bmatrix}$$

convey messages through the channel

$$P = \begin{bmatrix} p(0|0) & p(1|0) \\ p(0|1) & p(1|1) \end{bmatrix} = \begin{bmatrix} 5/6 & 1/6 \\ 3/4 & 1/4 \end{bmatrix}.$$

Calculate: (1) H(X) and H(Y); (2) the mutual information of x_i and y_j (i, j = 1, 2); (3) the equivocation H(X|Y) and the average mutual information I(X;Y).

Homework

4. Suppose that I(X;Y) = 0. Does this imply that I(X;Z) = I(X;Z|Y)?

5. In a joint ensemble XY, the mutual information I(x;y) is a random variable. In this problem we are concerned with the variance of that random variable, VAR[I(x;y)].

(1) Prove that VAR[I(x;y)] = 0 iff there is a constant α such that, for all x, y with P(xy) > 0, P(xy) = αP(x)P(y).
(2) Express I(X;Y) in terms of α and interpret the special case α = 1. (continued)

Homework

5. (3) For each of the channels in Fig. 5, find a probability assignment P(x) such that I(X;Y) > 0 and VAR[I(x;y)] = 0, and calculate I(X;Y).

[Fig. 5: two channels — left: inputs a_1, a_2, a_3 are mapped to outputs b_1, b_2 with probability 1 (deterministic transitions); right: inputs a_1, a_2, a_3 and outputs b_1, b_2, b_3, each input going to two of the outputs with probability 1/2 each.]

§1 Entropy and mutual information

§1.2 Discrete random vectors

§1.2.1 Extended source and joint entropy

§1.2.2 Extended channel and mutual information

§1.2.1 Extended source and joint entropy

1. Extended source

Source model:

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & \cdots & a_q \\ p_1 & p_2 & \cdots & p_q \end{bmatrix}, \qquad \sum_{i=1}^{q} p_i = 1$$

N-times extended source: X^N = (X_1 X_2 \cdots X_N), with each X_i \in \{a_1, a_2, \ldots, a_q\}, and

$$\begin{bmatrix} X^N \\ P(x) \end{bmatrix} = \begin{bmatrix} \alpha_1 = (a_1 a_1 \cdots a_1) & \cdots & \alpha_{q^N} = (a_q a_q \cdots a_q) \\ p(\alpha_1) = p(a_1)p(a_1)\cdots p(a_1) & \cdots & p(\alpha_{q^N}) = p(a_q)p(a_q)\cdots p(a_q) \end{bmatrix}, \qquad \sum_{i=1}^{q^N} p(\alpha_i) = 1,$$

where each sequence α_i = (a_{i_1} a_{i_2} \cdots a_{i_N}) has probability p(\alpha_i) = p(a_{i_1})p(a_{i_2})\cdots p(a_{i_N}).

Example 1.2.1

2. Joint entropy

Definition:

The joint entropy H(XY) of a pair of discrete random variables (X, Y) with joint distribution p(x, y) is defined as

$$H(XY) = -\sum_{x\in X}\sum_{y\in Y} p(x,y)\log p(x,y),$$

which can also be expressed as

$$H(XY) = -E[\log p(x,y)].$$

§1.2.1 Extended source and joint entropy

2. Joint entropy of the extended DMS

$$H(X^N) = -\sum_{i=1}^{q^N} p(\alpha_i)\log p(\alpha_i) = -\sum_{i_1=1}^{q}\sum_{i_2=1}^{q}\cdots\sum_{i_N=1}^{q} p(a_{i_1}a_{i_2}\cdots a_{i_N})\log p(a_{i_1}a_{i_2}\cdots a_{i_N})$$

Since the source is memoryless, p(a_{i_1}a_{i_2}\cdots a_{i_N}) = p(a_{i_1})p(a_{i_2})\cdots p(a_{i_N}), so

$$H(X^N) = -\sum_{i_1=1}^{q}\cdots\sum_{i_N=1}^{q} p(a_{i_1})\cdots p(a_{i_N})\big[\log p(a_{i_1}) + \log p(a_{i_2}) + \cdots + \log p(a_{i_N})\big] = H(X) + H(X) + \cdots + H(X) = N\,H(X).$$

§1.2.1 Extended source and joint entropy

2. Joint entropy of a source with memory

1) Conditional entropy:

$$H(X_2|X_1) = -\sum_{i=1}^{q}\sum_{j=1}^{q} p(a_i a_j)\log p(a_j|a_i)$$

2) Joint entropy:

$$H(X_1X_2) = -\sum_{i=1}^{q}\sum_{j=1}^{q} p(a_i a_j)\log p(a_i a_j)$$

3) (Per-symbol) entropy:

$$H_2(X) = \frac{1}{2}H(X_1X_2) \quad \text{bit/sig}$$

3. Properties of joint entropy

Theorem 1.7 (Chain rule):

H(XY) = H(X) + H(Y|X)

Proof:

$$H(XY) = -\sum_{x\in X}\sum_{y\in Y} p(x,y)\log p(x,y) = -\sum_{x}\sum_{y} p(x,y)\log\big[p(x)p(y|x)\big]$$

$$= -\sum_{x}\sum_{y} p(x,y)\log p(x) - \sum_{x}\sum_{y} p(x,y)\log p(y|x) = -\sum_{x} p(x)\log p(x) - \sum_{x}\sum_{y} p(x,y)\log p(y|x)$$

$$= H(X) + H(Y|X).$$

§1.2.1 Extended source and joint entropy

Example 1.2.3 Let X be a random variable with probability space

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 2 \\ 1/2 & 1/3 & 1/6 \end{bmatrix}$$

and let the joint probabilities P(a_i a_j) of two successive symbols be

P(a_i a_j):      a_j = 0    a_j = 1    a_j = 2
  a_i = 0         1/4        1/4        0
  a_i = 1         1/4        1/24       1/24
  a_i = 2         0          1/24       1/8

The corresponding conditional probabilities P(a_j|a_i) are

P(a_j|a_i):      a_j = 0    a_j = 1    a_j = 2
  a_i = 0         1/2        1/2        0
  a_i = 1         3/4        1/8        1/8
  a_i = 2         0          1/4        3/4

Find H(X).

Relationship:

H(X_1X_2) = H(X_1) + H(X_2|X_1)
H(X_2|X_1) ≤ H(X_2)
H(X_1X_2) ≤ 2H(X_1)
H_2(X) ≤ H(X)
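These relations can be verified numerically on Example 1.2.3 (a Python sketch; the table is the joint distribution given above):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# joint probabilities p(a_i, a_j) from Example 1.2.3
p_joint = [[1/4, 1/4, 0],
           [1/4, 1/24, 1/24],
           [0, 1/24, 1/8]]

p1 = [sum(row) for row in p_joint]                              # marginal of X1: (1/2, 1/3, 1/6)
p2 = [sum(p_joint[i][j] for i in range(3)) for j in range(3)]   # marginal of X2

h1 = H(p1)
h12 = H([p for row in p_joint for p in row])
h2_given_1 = h12 - h1                                           # chain rule

print(round(h1, 3), round(h12, 3), round(h2_given_1, 3))        # 1.459  2.448  0.989
print(h2_given_1 <= H(p2), h12 <= 2 * h1, h12 / 2 <= h1)        # True True True
```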

General stationary source

$$\begin{bmatrix} X \\ P(x) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & \cdots & a_q \\ p(a_1) & p(a_2) & \cdots & p(a_q) \end{bmatrix}, \qquad \sum_{i=1}^{q} p(a_i) = 1$$

Let X_1, X_2, \ldots, X_N be dependent; the joint probability is

$$P(X_1X_2\cdots X_N) = p(a_{i_1}a_{i_2}\cdots a_{i_N}), \qquad i_1, i_2, \ldots, i_N \in \{1, 2, \ldots, q\}$$

§1.2.1 Extended source and joint entropy

$$P(x_1x_2\cdots x_N) = P(x_1)P(x_2|x_1)\cdots P(x_N|x_1x_2\cdots x_{N-1})$$

3. Properties of joint entropy — definition of the entropies

• Joint entropy:

$$H(X_1X_2\cdots X_N) = -\sum_{i_1}\sum_{i_2}\cdots\sum_{i_N} p(a_{i_1}a_{i_2}\cdots a_{i_N})\log p(a_{i_1}a_{i_2}\cdots a_{i_N})$$

• Conditional entropy:

$$H(X_N|X_1X_2\cdots X_{N-1}) = -\sum_{i_1}\cdots\sum_{i_N} p(a_{i_1}\cdots a_{i_N})\log p(a_{i_N}|a_{i_1}\cdots a_{i_{N-1}})$$

• (Per-symbol) entropy:

$$H_N(X) = \frac{1}{N}H(X_1X_2\cdots X_N)$$

Theorem 1.8 (Chain rule for entropy):

Let X_1, X_2, \ldots, X_n be drawn according to p(x_1, x_2, \ldots, x_n). Then

$$H(X_1X_2\cdots X_n) = \sum_{i=1}^{n} H(X_i|X_{i-1}\cdots X_1)$$

Proof (do it by yourself)

§1.2.1 Extended source and joint entropy

3. Properties of joint entropy

Relations of the entropies (if H(X) < ∞):

$$H(X_N|X_1X_2\cdots X_{N-1}) \le H(X_{N-1}|X_1\cdots X_{N-2})$$
$$H(X_N) \ge H(X_N|X_1\cdots X_{N-1})$$
$$H_N(X) \le H_{N-1}(X)$$

Entropy rate:

$$H_\infty = \lim_{N\to\infty} H_N(X) = \lim_{N\to\infty} H(X_N|X_1X_2\cdots X_{N-1})$$

— the basis of data compression.

Theorem 1.9 (Independence bound on entropy):

Let X_1, X_2, \ldots, X_n be drawn according to p(x_1, x_2, \ldots, x_n). Then

$$H(X_1X_2\cdots X_n) \le \sum_{i=1}^{n} H(X_i)$$

with equality iff the Xi are independent

§1.2.1 Extended source and joint entropy

(P37(corollary) in textbook)

3. Properties of joint entropy

§1.2.1 Extended source and joint entropy

Example 1.2.4

Suppose a memoryless source with A={0,1} having equal probabilities emits a sequence of six symbols. Following the sixth symbol, suppose a seventh symbol is transmitted which is the sum modulo 2 of the six previous symbols. What is the entropy of the seven-symbol sequence?
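A brute-force check of this example (a Python sketch): the seventh symbol is a deterministic function of the first six, so the seven-symbol sequence takes 2^6 = 64 equally likely values and its entropy is 6 bits.

```python
import math
from itertools import product

# all 2^6 equally likely source blocks, each extended by its mod-2 parity symbol
sequences = [bits + (sum(bits) % 2,) for bits in product((0, 1), repeat=6)]

p = 1 / len(sequences)                        # each 7-symbol sequence has probability 1/64
H7 = -sum(p * math.log2(p) for _ in sequences)
print(len(sequences), H7)                     # 64 6.0
```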


§1 Entropy and mutual information

§1.2 Discrete random vectors

§1.2.2 Extended channel and mutual information

1. The model of the extended channel

[Figure: a general communication system — the source output (U_1, U_2, \ldots, U_k) is encoded into X^N = (X_1, X_2, \ldots, X_N), transmitted over the channel as Y^N = (Y_1, Y_2, \ldots, Y_N), and decoded into (V_1, V_2, \ldots, V_k).]

§1.2.2 Extended channel and mutual information

Extended channel: X^N = (X_1 X_2 \cdots X_N) and Y^N = (Y_1 Y_2 \cdots Y_N), where each X_i = x_i \in \{a_1, a_2, \ldots, a_r\}, each Y_i = y_i \in \{b_1, b_2, \ldots, b_s\}, and the channel is described by P(y|x). For a DMC,

$$P(\mathbf{y}|\mathbf{x}) = P(y_1y_2\cdots y_N|x_1x_2\cdots x_N) = \prod_{i=1}^{N} P(y_i|x_i).$$
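As a concrete illustration (a Python sketch; the BSC parameters and the sequences are arbitrary choices of mine), the block transition probability of a memoryless channel is just the product of the per-symbol transition probabilities:

```python
import numpy as np

P = np.array([[0.9, 0.1],    # BSC transition matrix p(y|x) with crossover 0.1
              [0.1, 0.9]])

x = [0, 1, 1, 0, 1]          # transmitted block x^N
y = [0, 1, 0, 0, 1]          # received block y^N (one symbol flipped)

# memoryless channel: P(y^N | x^N) = prod_i p(y_i | x_i)
p_block = np.prod([P[xi, yi] for xi, yi in zip(x, y)])
print(p_block)               # 0.9**4 * 0.1 = 0.06561
```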

The N-th extension of the channel is described by an r^N × s^N transition matrix with entries

$$\pi_{kh} = P(\beta_h|\alpha_k) = P(b_{h_1}b_{h_2}\cdots b_{h_N}\,|\,a_{k_1}a_{k_2}\cdots a_{k_N}),$$

where α_k = (a_{k_1}\cdots a_{k_N}) ranges over the r^N input sequences of X^N and β_h = (b_{h_1}\cdots b_{h_N}) over the s^N output sequences of Y^N.

2. Average mutual information

$$I(X^N;Y^N) = H(X^N) - H(X^N|Y^N) = H(Y^N) - H(Y^N|X^N) = \sum_{\mathbf{x}\in X^N,\,\mathbf{y}\in Y^N} P(\mathbf{x}\mathbf{y})\log\frac{P(\mathbf{y}|\mathbf{x})}{P(\mathbf{y})} = \sum_{k=1}^{r^N}\sum_{h=1}^{s^N} p(\alpha_k\beta_h)\log\frac{p(\beta_h|\alpha_k)}{p(\beta_h)}$$

§1.2.2 Extended channel and mutual information

example 1.2.5

3. The properties

Theorem 1.11 If the components (X_1, X_2, \ldots, X_N) of X^N are independent, then

$$I(X^N;Y^N) \ge \sum_{i=1}^{N} I(X_i;Y_i).$$

§1.2.2 Extended channel and mutual information

(Theorem 1.8 in textbook)

3. The properties

Theorem 1.12 If X^N = (X_1, X_2, \ldots, X_N) and Y^N = (Y_1, Y_2, \ldots, Y_N) are random vectors and the channel is memoryless, that is,

$$P(y_1,\ldots,y_N|x_1,\ldots,x_N) = \prod_{i=1}^{N} P(y_i|x_i),$$

then

$$I(X^N;Y^N) \le \sum_{i=1}^{N} I(X_i;Y_i).$$

§1.2.2 Extended channel and mutual information

(Theorem 1.9 in textbook)

Example 1.2.6

Let X_1, X_2, \ldots, X_5 be independent identically distributed random variables with common entropy H. Also let T be the permutation of the set {1, 2, 3, 4, 5} given by

T: (1 2 3 4 5) → (3 2 5 1 4),

and let Y_i = X_{T(i)}. Show that

$$\sum_{i=1}^{5} I(X_i;Y_i) \le I(X^5;Y^5).$$

§1.2.2 Extended channel and mutual information

Review

Keywords: measure of information for random vectors — extended source, extended channel, stationary source, joint entropy, conditional entropy, (per-symbol) entropy, entropy rate.

Conclusions: chain rule for entropy; independence bound on entropy; conditioning reduces entropy; properties of I(X;Y).

Homework

1. P47: T1.23
2. P47: T1.24
3. Let X_1, X_2, \ldots, X_{n-1} be i.i.d. random variables taking values in {0, 1}, with Pr{X_i = 1} = 1/2. Let X_n = 1 if X_1 + X_2 + \cdots + X_{n-1} is odd and X_n = 0 otherwise, with n ≥ 3.
(1) Show that X_i and X_j are independent, for i ≠ j, i, j ∈ {1, 2, \ldots, n};
(2) Find H(X_i X_j), for i ≠ j;
(3) Find H(X_1 X_2 \cdots X_n); is this equal to nH(X_1)?

4. Let X_1, X_2 be identically distributed random variables, and let

$$\rho = 1 - \frac{H(X_2|X_1)}{H(X_1)}.$$

(1) Show that 0 ≤ ρ ≤ 1;
(2) When is ρ = 0?
(3) When is ρ = 1?

Homework

5. Shuffles increase entropy. Argue that for any distribution on shuffles T and any distribution on card positions X that

H(TX) ≥ H(TX|T) , if X and T are independent.

Homework

Thinking :

Recommended