Review of Statistics and Information Theory Basics
Lecture 2, COMP / ELEC / STAT 502   © E. Merényi
January 12, 2012



• Outline

Probability basics
  Random variables, CDF, pdf, moments
  Covariance, joint CDF and pdf

Information theory basics
  Information, entropy, differential entropy
  Maximum Entropy principle
  Further entropy definitions
  Mutual information and relation to entropy

Additional properties of random variables


• CDF of random variable

Let \xi, \eta be continuous scalar random variables (r.v.-s).

(Cumulative) probability distribution function (CDF):

    F_\xi(x) = \Pr(\xi \le x) = \int_{-\infty}^{x} f_\xi(u)\,du
    \quad\text{and}\quad
    F_\eta(y) = \Pr(\eta \le y) = \int_{-\infty}^{y} f_\eta(v)\,dv        (2.1)

where

    F_\xi'(x) = f_\xi(x) \quad\text{and}\quad F_\eta'(y) = f_\eta(y)        (2.2)

exist almost everywhere for continuous r.v.-s by definition. F_\xi(x) is monotonically non-decreasing, right-continuous, and F_\xi(-\infty) = 0, F_\xi(\infty) = 1.
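A minimal numerical sketch of (2.1) and (2.2), assuming SciPy is available and using the standard normal purely as an example distribution: a centered difference of the CDF recovers the pdf, and the pdf integrates to 1 (cf. (2.3) on the next slide).

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Illustrate F'(x) = f(x) for the standard normal distribution (eq. 2.2).
x = 1.3
h = 1e-5
deriv = (norm.cdf(x + h / 2) - norm.cdf(x - h / 2)) / h   # centered difference of the CDF
print(deriv, norm.pdf(x))                                 # the two values agree closely

# The pdf integrates to 1 over the whole real line.
total, _ = quad(norm.pdf, -np.inf, np.inf)
print(total)                                              # ~1.0
```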

• pdf of random variable

Probability density function (pdf): f_\xi(x) and f_\eta(y) are the pdfs of \xi and \eta, respectively. Then

    0 \le f_\xi(x) \quad\text{and}\quad 0 \le f_\eta(y),
    \qquad \int_{-\infty}^{\infty} f_\xi(x)\,dx = 1 \quad\text{and}\quad \int_{-\infty}^{\infty} f_\eta(y)\,dy = 1        (2.3)

and f_\xi(x) and f_\eta(y) are continuous almost everywhere.

Interpretation of f_\xi(x) as probability:

    f_\xi(x) = \lim_{h \to 0} \frac{F_\xi(x + h/2) - F_\xi(x - h/2)}{h}
             = \lim_{h \to 0} \frac{P[\xi \le x + h/2] - P[\xi \le x - h/2]}{h}
             = \lim_{h \to 0} \frac{P[x - h/2 < \xi \le x + h/2]}{h}        (2.4)

The larger f_\xi(x), the larger the probability that \xi falls close to x.

• Moments of random variable

The kth moment M_\xi^{(k)} of \xi is defined as the expected value of the kth power of the variable:

    M_\xi^{(k)} = E[\xi^k] = \int_{-\infty}^{\infty} x^k\,dF_\xi(x) = \int_{-\infty}^{\infty} x^k f_\xi(x)\,dx        (2.5)

The kth central moment m_\xi^{(k)} is defined similarly, except that the variable is shifted to obtain zero mean (so that m_\xi^{(1)} = 0):

    m_\xi^{(k)} = \int_{-\infty}^{\infty} (x - E[\xi])^k\,dF_\xi(x) = \int_{-\infty}^{\infty} (x - E[\xi])^k f_\xi(x)\,dx        (2.6)
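A sketch of (2.5) and (2.6), assuming NumPy/SciPy and using a Gaussian only as a convenient example: the moments are evaluated by numerical integration of x^k f(x).

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 2.0, 3.0
f = lambda x: norm.pdf(x, loc=mu, scale=sigma)

def moment(k):
    """k-th (raw) moment, eq. (2.5): integral of x^k f(x) dx."""
    return quad(lambda x: x**k * f(x), -np.inf, np.inf)[0]

def central_moment(k):
    """k-th central moment, eq. (2.6): integral of (x - E[xi])^k f(x) dx."""
    m1 = moment(1)
    return quad(lambda x: (x - m1)**k * f(x), -np.inf, np.inf)[0]

print(moment(1))          # ~2.0  (the mean mu)
print(central_moment(2))  # ~9.0  (the variance sigma^2)
```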

• Moments of random variable

Quantities of interest related to the first four moments, as estimated from a sample \xi_1, \ldots, \xi_n, are

    Mean = M_\xi^{(1)} = \frac{1}{n} \sum_{i=1}^{n} \xi_i        (2.7)

    Variance = m_\xi^{(2)} = \frac{1}{n-1} \sum_{i=1}^{n} (\xi_i - M_\xi^{(1)})^2        (2.8)

    Standard Deviation (STD) = \sqrt{Variance}        (2.9)
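The sample estimates (2.7)-(2.9) in NumPy, as a sketch; ddof=1 gives the 1/(n-1) denominator of (2.8).

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=3.0, size=10_000)   # n observations of xi

mean = sample.mean()                    # eq. (2.7)
variance = sample.var(ddof=1)           # eq. (2.8), with the 1/(n-1) factor
std = np.sqrt(variance)                 # eq. (2.9)
print(mean, variance, std)              # close to 2, 9, 3
```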

• Moments of random variable

    Skewness = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\xi_i - M_\xi^{(1)}}{STD} \right)^3 = \frac{m_\xi^{(3)}}{(m_\xi^{(2)})^{3/2}}        (2.10)

    (Normalized) Kurtosis = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\xi_i - M_\xi^{(1)}}{STD} \right)^4 - 3 = \frac{m_\xi^{(4)}}{(m_\xi^{(2)})^2} - 3        (2.11)
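A sketch of (2.10) and (2.11), assuming NumPy and using population-style 1/n averages as in the formulas. The uniform and Laplacian samples anticipate the sub/supergaussian classification on the next slide.

```python
import numpy as np

def skewness(x):
    """Eq. (2.10): average of ((x_i - mean)/std)^3."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**3)

def normalized_kurtosis(x):
    """Eq. (2.11): average of ((x_i - mean)/std)^4, minus 3."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

rng = np.random.default_rng(0)
print(normalized_kurtosis(rng.uniform(size=100_000)))    # < 0: subgaussian
print(normalized_kurtosis(rng.normal(size=100_000)))     # ~ 0: gaussian
print(normalized_kurtosis(rng.laplace(size=100_000)))    # > 0 (about +3): supergaussian
```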

• Moments of random variable

Kurtosis is often used as a quantitative measure of non-Gaussianity. If kurt(\xi) denotes the kurtosis of \xi,

    kurt(\xi) < 0  for a subgaussian or platykurtic distribution
    kurt(\xi) = 0  for a gaussian or mesokurtic distribution
    kurt(\xi) > 0  for a supergaussian or leptokurtic distribution        (2.12)

• Covariance of random variables

The covariance of random variables \xi and \eta is defined as

    c_{\xi\eta} = E[(\xi - M_\xi^{(1)})(\eta - M_\eta^{(1)})]        (2.13)

or in general, for the elements x_i of random vector x,

    c_{ij} = E[(x_i - M_{x_i}^{(1)})(x_j - M_{x_j}^{(1)})]        (2.14)

    C_x = E[(x - M_x^{(1)})(x - M_x^{(1)})^T]        (2.15)

Similarly, the cross-covariance of random vectors x, y is

    C_{xy} = E[(x - M_x^{(1)})(y - M_y^{(1)})^T]        (2.16)
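A sketch of (2.15) and (2.16) with NumPy; the data, the mixing matrix A, and the dimensions are illustrative. Covariance and cross-covariance appear as sample averages of outer products of mean-removed vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=(n, 3))                                  # n observations of a 3-dim random vector x
y = x @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))    # a 2-dim vector y correlated with x

xc = x - x.mean(axis=0)                      # remove M^(1)_x
yc = y - y.mean(axis=0)                      # remove M^(1)_y

Cx  = xc.T @ xc / n                          # eq. (2.15): E[(x - M_x)(x - M_x)^T]
Cxy = xc.T @ yc / n                          # eq. (2.16): cross-covariance
print(np.allclose(Cx, np.cov(x.T, bias=True)))   # matches NumPy's covariance estimate
print(Cxy.shape)                             # (3, 2)
```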

• Joint CDF and pdf of random variables

Joint CDF and pdf of (continuous) random variables \xi and \eta:

    F_{\xi,\eta}(x, y) = \Pr(\xi \le x, \eta \le y)        (2.17)

    F_{\xi,\eta}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{\xi,\eta}(u, v)\,dv\,du        (2.18)

where

    f_{\xi,\eta}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{\xi,\eta}(x, y)        (2.19)

exists almost everywhere, and it is the joint pdf of \xi and \eta. Properties of F_{\xi,\eta}(x, y) and f_{\xi,\eta}(x, y) are analogous to eqs. (2.1)-(2.3).

• Marginal distributions

    F_\xi(x) = F_{\xi,\eta}(x, \infty) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f_{\xi,\eta}(u, v)\,dv\,du        (2.20)

    F_\eta(y) = F_{\xi,\eta}(\infty, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{y} f_{\xi,\eta}(u, v)\,dv\,du        (2.21)

• Statistical independence

A pair of joint random variables \xi and \eta are statistically independent iff their joint CDF factorizes into the product of the marginal CDFs:

    F_{\xi,\eta}(x, y) = F_\xi(x)\,F_\eta(y)        (2.22)

For two jointly continuous random variables \xi and \eta, equivalently,

    f_{\xi,\eta}(x, y) = f_\xi(x)\,f_\eta(y)        (2.23)

since

    F_{\xi,\eta}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{\xi,\eta}(u, v)\,dv\,du
                       = \int_{-\infty}^{x} f_\xi(u)\,du \int_{-\infty}^{y} f_\eta(v)\,dv
                       = F_\xi(x)\,F_\eta(y)        (2.24)

(Prove the other direction at home.)

• Conditional probability

    F_{\xi|\eta}(x|y) = F_{\xi,\eta}(x, y) / F_\eta(y)        (2.25)
    f_{\xi|\eta}(x|y) = f_{\xi,\eta}(x, y) / f_\eta(y)        (2.26)

Obviously, F_{\xi|\eta}(x|y) = F_\xi(x) iff \xi and \eta are independent. Equivalently, f_{\xi|\eta}(x|y) = f_\xi(x) iff \xi and \eta are independent.

    f_\xi(x) = \int_{-\infty}^{\infty} f_{\xi|\eta}(x|y)\,f_\eta(y)\,dy

    E[\xi|\eta] = \int_{-\infty}^{\infty} x\,f_{\xi|\eta}(x|y)\,dx


• Information

Let X be a discrete random variable, taking its values from H = {x_1, x_2, \ldots, x_n}. Let p_i be the probability of x_i, with p_i \ge 0 and \sum_{i=1}^{n} p_i = 1. The amount of information in observing the event X = x_i was defined by Shannon as

    I(x_i) = -\log_a(p_i)        (2.27)

If a = 2 then I is given in bits, if a = e then I is given in nats, and if a = 10 then I is given in digits.        (2.28)

• Shannon entropy

The expected value of I(x_i) over the entire set of events H is the entropy of X:

    H(X) = -\sum_{i=1}^{n} p_i \log(p_i)        (2.29)

(Note that X in H(X) is not an argument of a function but the label of the random variable.) The convention 0 \log 0 = 0 is used. The entropy is the average amount of information gained by observing one event from all possible events for the random variable X (the average information per message), the average length of the shortest description of a random variable, the amount of uncertainty, and a measure of the randomness of X.
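A minimal sketch of (2.27)-(2.29) in Python, assuming NumPy; base-2 logarithms are used, so the results are in bits, and the 0 log 0 = 0 convention is handled explicitly. The last two lines anticipate properties (2.30) and (2.31) on the next slide.

```python
import numpy as np

def information(p, base=2.0):
    """Eq. (2.27): amount of information in an event of probability p."""
    return -np.log(p) / np.log(base)

def entropy(p, base=2.0):
    """Eq. (2.29): H(X) = -sum p_i log p_i, with the 0*log 0 = 0 convention."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz])) / np.log(base)

print(information(0.5))            # 1 bit
print(entropy([0.5, 0.5]))         # 1 bit: a fair coin
print(entropy([1.0, 0.0]))         # 0: no uncertainty
print(entropy([0.25] * 4))         # 2 bits = log2(4), the maximum for |H| = 4
```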

• Shannon entropy

Some basic properties of H(X):

    H(X) = 0 iff p_i = 1 for some i and p_k = 0 for all k \ne i.        (2.30)

(If an event occurs with probability 1 then there is no uncertainty.)

    0 \le H(X) \le \log|H|, with equality iff p_i = 1/|H| = 1/n for all i.        (2.31)

(The uniform distribution carries maximum uncertainty.)

• Differential entropy

The differential entropy of a continuous random variable \xi: a quantity analogous to the entropy of a discrete random variable can be defined for continuous random variables.

Reminder: the random variable \xi is continuous if its distribution function F_\xi(x) is continuous.

Definition: Let f_\xi(x) = F_\xi'(x) be the pdf of \xi. The set S : \{x \mid f_\xi(x) > 0\} is called the support set of \xi. The differential entropy of \xi is defined as

    h(\xi) = -\int_{S} f_\xi(x) \log f_\xi(x)\,dx = E[-\log f_\xi(x)]        (2.32)

where S is the support set of \xi. The lower case letter h is used to distinguish differential entropy from the discrete (or absolute) entropy H.
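A sketch of (2.32), assuming SciPy and natural logarithms (nats): numerical integration of -f log f for a Gaussian, compared with the known closed form (1/2) log(2 pi e sigma^2) and with SciPy's own entropy() method.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

sigma = 2.0
f = lambda x: norm.pdf(x, scale=sigma)

# Eq. (2.32): h(xi) = -integral_S f(x) log f(x) dx, in nats.
# Integrating over [-10 sigma, 10 sigma] covers essentially all of the support.
h_numeric, _ = quad(lambda x: -f(x) * np.log(f(x)), -10 * sigma, 10 * sigma)
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)    # known closed form for a Gaussian
print(h_numeric, h_closed, norm(scale=sigma).entropy()) # all three agree
```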

• Discrete vs differential entropy

The differential entropy of a continuous variable \xi can be derived as a limiting case of the discrete entropy of the quantized version of \xi. (See, for example, Cover and Thomas, p. 228.) If \Delta = 2^{-n} is the bin size of the quantization, and the quantized version of \xi is denoted by \xi^\Delta, then

    H(\xi^\Delta) + \log\Delta \to h(\xi) \quad \text{as} \quad n \to \infty \ (\Delta \to 0)        (2.33)

That is, the absolute entropy of an n-bit quantization of a continuous random variable is approximately h(\xi) - \log 2^{-n} = h(\xi) + n (with base-2 logarithms). It follows that the absolute entropy of a continuous (non-quantized) variable is \infty.
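A sketch of (2.33), assuming NumPy: a standard normal sample is histogram-quantized with bin size Delta = 2^{-n}, and the discrete entropy plus log Delta is seen to approach h(xi) = (1/2) log(2 pi e). Sample sizes and bin choices are illustrative; the agreement is approximate because the entropy is estimated from a finite sample.

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.normal(size=1_000_000)                     # standard normal: h(xi) = 0.5*log(2*pi*e)
h_true = 0.5 * np.log(2 * np.pi * np.e)

for n in (1, 2, 4, 6):
    delta = 2.0 ** (-n)                             # bin size of the quantization
    bins = np.arange(xi.min(), xi.max() + delta, delta)
    counts, _ = np.histogram(xi, bins=bins)
    p = counts[counts > 0] / counts.sum()
    H_discrete = -np.sum(p * np.log(p))             # absolute entropy of quantized xi, in nats
    print(n, H_discrete + np.log(delta), h_true)    # left side approaches h(xi) = 1.4189...
```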

• The Maximum Entropy principle

Consider the following: suppose we have a stochastic system with a set of known states but unknown probabilities, and we also know some constraints on the distribution of the states. How can we choose the probability model that is optimum in some sense among the possibly infinite number of models that may satisfy the constraints?

The Maximum Entropy principle (due to Jaynes, 1957) states that when an inference is made on the basis of incomplete information, it should be drawn from the probability distribution that maximizes the entropy, subject to constraints on the distribution.

• Finding the Maximum Entropy distribution

Solving this problem, in general, involves the maximization of the differential entropy

    h(\xi) = -\int_{-\infty}^{\infty} f_\xi(x) \log f_\xi(x)\,dx        (2.34)

over all pdfs f_\xi(x) of the r.v. \xi, subject to the following constraints:

    f_\xi(x) \ge 0        (2.35)

    \int_{-\infty}^{\infty} f_\xi(x)\,dx = 1        (2.36)

    \int_{-\infty}^{\infty} f_\xi(x)\,g_i(x)\,dx = \alpha_i \quad \text{for} \quad i = 2, \ldots, m        (2.37)

where f_\xi(x) = 0 holds outside of the support set of \xi, and the \alpha_i represent the prior knowledge about \xi.

• Finding the Maximum Entropy distribution

Using the Lagrange multipliers \lambda_i (i = 1, 2, \ldots, m), formulate the objective function

    J(f) = \int_{-\infty}^{\infty} \left[ -f_\xi(x)\log f_\xi(x) + \lambda_1 f_\xi(x) + \sum_{i=2}^{m} \lambda_i g_i(x) f_\xi(x) \right] dx        (2.38)

Differentiating the integrand with respect to f_\xi and setting the result to 0 produces

    -1 - \log f_\xi(x) + \lambda_1 + \sum_{i=2}^{m} \lambda_i g_i(x) = 0        (2.39)

• Finding the Maximum Entropy distribution

Solving this for f_\xi(x) yields the maximum entropy distribution for the given constraints:

    f_\xi(x) = \exp\left( -1 + \lambda_1 + \sum_{i=2}^{m} \lambda_i g_i(x) \right)        (2.40)

Example: If the prior knowledge consists of the mean \mu and the variance \sigma^2 of \xi, then

    \int_{-\infty}^{\infty} f_\xi(x)\,(x - \mu)^2\,dx = \sigma^2        (2.41)

and from the constraints it follows that g_2(x) = (x - \mu)^2 and \alpha_2 = \sigma^2. The Maximum Entropy distribution for this case is the Gaussian distribution, as shown on page 28 of Colin Fyfe 2 (CF2), or in Haykin, Chapter 10, page 491.
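A short worked sketch of the algebra the slide points to in CF2/Haykin, showing how (2.40) specializes to the Gaussian under the mean and variance constraints:

```latex
% Constraints: \int f_\xi(x)\,dx = 1 and \int f_\xi(x)(x-\mu)^2\,dx = \sigma^2, so g_2(x) = (x-\mu)^2.
% Substituting into (2.40):
f_\xi(x) = \exp\!\left(-1 + \lambda_1 + \lambda_2 (x-\mu)^2\right),
% an exponential of a quadratic in x. Normalizability requires \lambda_2 < 0; choosing
\lambda_2 = -\tfrac{1}{2\sigma^2}, \qquad \exp(-1+\lambda_1) = \tfrac{1}{\sqrt{2\pi}\,\sigma}
% satisfies both constraints and gives
f_\xi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2},
% i.e. the Gaussian density of eq. (2.59).
```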

• Joint differential entropy

Joint differential entropy:

    h(\xi, \eta) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{\xi,\eta}(u, v) \log f_{\xi,\eta}(u, v)\,du\,dv        (2.42)

For more details on joint differential entropy, see Cover and Thomas (C-T), p. 230.

• Conditional differential entropy

In CF 2/2 you have seen the formula for the conditional entropy of a discrete random variable X given Y. It represents the amount of uncertainty remaining about the system input X after the system output Y has been observed. Analogous to that is the conditional differential entropy for continuous random variables:

    h(\xi|\eta) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{\xi,\eta}(x, y) \log f_{\xi|\eta}(x|y)\,dx\,dy        (2.43)

Since f_{\xi|\eta}(x|y) = f_{\xi,\eta}(x, y)/f_\eta(y),

    h(\xi|\eta) = h(\xi, \eta) - h(\eta).        (2.44)

• Relative entropy (Kullback-Leibler distance)

The relative entropy is an entropy-based measure of the difference between two pdfs f_\xi(x) and g_\xi(x), defined as

    D(f_\xi \,\|\, g_\xi) = \int_{-\infty}^{\infty} f_\xi(x) \log \frac{f_\xi(x)}{g_\xi(x)}\,dx        (2.45)

Note that D(f_\xi \| g_\xi) is finite only if the support set of f_\xi is contained in the support set of g_\xi. The K-L distance is a measure of the penalty (error) for assuming g_\xi(x) when in reality the density function of \xi is f_\xi(x). The convention 0 \log(0/0) = 0 is used.

It follows (and can be proven) that D(f_\xi \| g_\xi) \ge 0, with equality iff f_\xi(x) = g_\xi(x) almost everywhere. The relative entropy or K-L distance is also called the K-L divergence.
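A sketch of (2.45), assuming SciPy and natural logarithms: the K-L distance between two Gaussian densities by numerical integration, checked against the known closed form for this pair (the closed form is used only as a check, it is not part of the slides).

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# D(f || g) for two Gaussian densities, eq. (2.45), in nats.
m1, s1 = 0.0, 1.0
m2, s2 = 1.0, 2.0
f = lambda x: norm.pdf(x, m1, s1)
g = lambda x: norm.pdf(x, m2, s2)

D_numeric, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), -12, 12)

# Known closed form for two Gaussians, used here only as a check.
D_closed = np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5
print(D_numeric, D_closed)   # both ~0.443, and D >= 0 as stated
```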

• Mutual information

Mutual information is the measure of the amount of information one random variable contains about another:

    I(\xi; \eta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{\xi,\eta}(x, y) \log \frac{f_{\xi,\eta}(x, y)}{f_\xi(x)\,f_\eta(y)}\,dx\,dy        (2.46)

The concept of mutual information is of profound importance in the design of self-organizing systems, where the objective is to develop an algorithm that can learn an input-output relationship of interest on the basis of the input patterns alone. The Maximum mutual information principle of Linsker (1988) states that the synaptic connections of a multilayered ANN develop so as to maximize the amount of information that is preserved when signals are transformed at each processing stage of the network, subject to certain constraints. This is based on earlier ideas that came from the recognition that encoding of data from a scene for the purpose of redundancy reduction is related to the identification of specific features in the scene (by the biological perceptual machinery).

• Properties of Mutual Information

    I(\xi; \eta) = \int\!\!\int f_{\xi,\eta}(x, y) \log \frac{f_{\xi,\eta}(x, y)}{f_\xi(x)\,f_\eta(y)}\,dx\,dy

               = \int\!\!\int f_{\xi,\eta}(x, y) \log \frac{f_{\xi,\eta}(x, y)}{f_\eta(y)}\,dx\,dy - \int\!\!\int f_{\xi,\eta}(x, y) \log f_\xi(x)\,dx\,dy

               = -h(\xi|\eta) - \int \left( \int f_{\xi,\eta}(x, y)\,dy \right) \log f_\xi(x)\,dx

               = -h(\xi|\eta) - \int f_\xi(x) \log f_\xi(x)\,dx

               = h(\xi) - h(\xi|\eta)
               = h(\eta) - h(\eta|\xi)
               = h(\xi) + h(\eta) - h(\xi, \eta)        (2.47)

• Properties of Mutual Information

    I(\xi; \eta) = I(\eta; \xi)        (2.48)

    I(\xi; \eta) \ge 0        (2.49)

    I(\xi; \eta) = D(f_{\xi,\eta}(x, y) \,\|\, f_\xi(x)\,f_\eta(y))        (2.50)

The last equation expresses mutual information as a special case of the K-L distance, with f_{\xi,\eta}(x, y) as density #1 and f_\xi(x)\,f_\eta(y) as density #2. The larger the K-L distance, the less independent \xi and \eta are, because f_{\xi,\eta}(x, y) and f_\xi(x)\,f_\eta(y) are farther apart, so I is large. See also the diagram in CF 2/2 or in Haykin p. 494 (Fig. 10.1) showing insightful relations among mutual information, entropy, and conditional and joint entropy.
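A sketch verifying the identities (2.47)-(2.50) on a small discrete joint distribution (sums replace the integrals; base-2 logarithms, so bits). The particular 2x2 table is illustrative.

```python
import numpy as np

# A 2x2 joint pmf P(xi, eta); rows index xi, columns index eta.
P = np.array([[0.30, 0.20],
              [0.10, 0.40]])
px = P.sum(axis=1)            # marginal of xi
py = P.sum(axis=0)            # marginal of eta

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Mutual information directly from eq. (2.46), equivalently the K-L distance of eq. (2.50).
I_direct = np.sum(P * np.log2(P / np.outer(px, py)))

# The same quantity from the entropy decomposition of eq. (2.47).
I_entropy = H(px) + H(py) - H(P.ravel())

print(I_direct, I_entropy)    # equal (~0.125 bits), and >= 0 as in eq. (2.49)
```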


• Sufficient condition of independence

Statistically independent random variables satisfy the following:

    E[g(\xi)\,h(\eta)] = E[g(\xi)]\,E[h(\eta)]        (2.51)

where g(\xi) and h(\eta) are any absolutely integrable functions of \xi and \eta, respectively. This follows from

    E[g(\xi)\,h(\eta)] = \int\!\!\int g(u)\,h(v)\,f_{\xi,\eta}(u, v)\,du\,dv
                       = \int g(u)\,f_\xi(u)\,du \int h(v)\,f_\eta(v)\,dv
                       = E[g(\xi)]\,E[h(\eta)]        (2.52)

• Uncorrelatedness vs independence

Statistical independence is a stronger property than uncorrelatedness. \xi and \eta are uncorrelated iff

    E[\xi\eta] = E[\xi]\,E[\eta]        (2.53)

In general, two random vectors x, y are uncorrelated iff

    C_{xy} = E[(x - M_x^{(1)})(y - M_y^{(1)})^T] = 0        (2.54)

or equivalently,

    R_{xy} = E[x y^T] = E[x]\,E[y^T] = M_x^{(1)} (M_y^{(1)})^T        (2.55)

where R_{xy} is the correlation matrix of x, y. For zero mean, the correlation matrix is the same as the covariance matrix.

• Uncorrelatedness, contd., and whiteness

As a special case, the components of a random vector x are mutually uncorrelated iff

    C_x = E[(x - M_x^{(1)})(x - M_x^{(1)})^T] = D = Diag(c_{11}, c_{22}, \ldots, c_{nn}) = Diag(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)        (2.56)

Random vectors with

    M_x^{(1)} = 0 \quad \text{and} \quad C_x = R_x = I        (2.57)

i.e., having zero mean and unit covariance (and hence unit correlation) matrix, are called white. Whiteness is a stronger property than uncorrelatedness but weaker than statistical independence.
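A sketch of whitening, i.e. of producing vectors that satisfy (2.57), via the eigendecomposition of the covariance matrix; this is one common choice of whitening transform, not the only one, and the data below are illustrative. After the transform the sample covariance is approximately the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
A = rng.normal(size=(3, 3))
x = rng.normal(size=(n, 3)) @ A.T          # correlated, non-white data (rows = observations)

xc = x - x.mean(axis=0)                    # zero mean: M^(1)_x = 0
Cx = xc.T @ xc / n                         # covariance matrix
evals, E = np.linalg.eigh(Cx)              # Cx = E diag(evals) E^T
V = E @ np.diag(evals ** -0.5) @ E.T       # whitening matrix V = Cx^(-1/2)
z = xc @ V.T                               # whitened vectors

print(np.round(z.T @ z / n, 3))            # ~ identity: Cz = Rz = I, as in eq. (2.57)
```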

• Density functions of frequently used distributions

Uniform:

    f_\xi(x) = \frac{1}{b - a} \ \text{for} \ x \in [a, b], \quad 0 \ \text{elsewhere}        (2.58)

Gaussian:

    f_\xi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}        (2.59)

where \mu and \sigma^2 denote the mean and variance, respectively.

Laplacian:

    f_\xi(x) = \frac{\lambda}{2}\, e^{-\lambda |x|}, \quad \lambda > 0        (2.60)

• Exponential power family of density functions

These density functions are special cases of the generalized gaussian or exponential power family, which has the general formula (for zero mean):

    f_\xi(x) = C \exp\left( -\frac{|x|^\alpha}{E[|x|^\alpha]} \right)        (2.61)

As is easy to verify, \alpha = 2 yields the Gaussian, \alpha = 1 the Laplacian, and \alpha \to \infty the uniform density function. Values of \alpha < 2 give rise to supergaussian densities, \alpha > 2 to subgaussian ones.

(Figure: the Gaussian (2.59) and Laplacian (2.60) densities plotted for comparison.)
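A sketch linking (2.61) to the kurtosis classification of (2.12), using SciPy's generalized normal family (scipy.stats.gennorm); its shape parameter plays the role of alpha here, and its density exp(-|x|^beta) differs from (2.61) only in scaling, which does not affect the normalized kurtosis.

```python
from scipy.stats import gennorm

# Excess (normalized) kurtosis of the exponential power family as a function of the exponent.
for alpha in (0.5, 1.0, 2.0, 4.0, 8.0):
    k = gennorm.stats(alpha, moments='k')
    print(alpha, float(k))
# alpha < 2: positive kurtosis (supergaussian); alpha = 2: 0 (Gaussian);
# alpha > 2: negative kurtosis (subgaussian), approaching the uniform's -1.2.
```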

• Uncorrelated but not independent example

Example of uncorrelated but not independent random variables:

Assume that \xi and \eta are discrete valued and their joint probabilities are 0.25 for each of (0, 1), (0, -1), (1, 0) and (-1, 0). Then it is easy to calculate that \xi and \eta are uncorrelated. However, they are not independent, since

    E[\xi^2 \eta^2] = 0 \ne 0.25 = E[\xi^2]\,E[\eta^2]        (2.62)
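A numerical sketch of this example, assuming NumPy: sample (xi, eta) uniformly from the four points and check both claims of (2.62).

```python
import numpy as np

rng = np.random.default_rng(0)
points = np.array([(0, 1), (0, -1), (1, 0), (-1, 0)], dtype=float)
xy = points[rng.integers(0, 4, size=200_000)]        # each point with probability 0.25
xi, eta = xy[:, 0], xy[:, 1]

print(np.mean(xi * eta))                             # exactly 0, and the means are 0: uncorrelated
print(np.mean(xi**2 * eta**2))                       # exactly 0
print(np.mean(xi**2) * np.mean(eta**2))              # ~0.25: so not independent, as in eq. (2.62)
```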
