Lecture 4Statistic for quality

Embed Size (px)

Citation preview

  • 8/20/2019 Lecture 4Statistic for quality

    1/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Mathematical Biostatistics Boot Camp: Lecture 4

    Vectors

    Brian Caffo

    Department of Biostatistics

    Johns Hopkins Bloomberg School of Public Health

    Johns Hopkins University

    August 3, 2012

  • 8/20/2019 Lecture 4Statistic for quality

    2/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Table of co

    1   Table of contents

    2   Random vectors

    3   IndependenceIndependent eventsIndependent random variablesIID random variables

    4   Correlation

    5   Variance and correlation properties

    6   Variances properties of sample means

    7   The sample variance

    8   Some discussion

  • 8/20/2019 Lecture 4Statistic for quality

    3/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Random v

    •  Random vectors are simply random variables collected into a v•   For example if  X   and Y   are random variables (X , Y ) is a rand

    •   Joint density  f   (x , y ) satisfies  f    > 0 and  

      f   (x , y )dxdy  = 1

    •   For discrete random variables

    f   (x , y ) = 1

    •  In this lecture we focus on   independent  random variables whf   (x , y ) = f   (x )g (y )

  • 8/20/2019 Lecture 4Statistic for quality

    4/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Independent e

    • Two events  A  and  B   are  independent  if 

    P (A ∩ B ) = P (A)P (B )

    •   Two random variables,  X   and  Y  are independent if for any tw

    P ([X  ∈  A] ∩ [Y  ∈ B ]) = P (X  ∈  A)P (Y  ∈ B )•   If  A   is independent of  B   then

    •   Ac  is independent of  B •   A  is independent of  B c 

    •   Ac  is independent of  B c 

  • 8/20/2019 Lecture 4Statistic for quality

    5/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Ex

    •  What is the probability of getting two consecutive heads?•   A = {Head on flip 1}   P (A) = .5•   B  = {Head on flip 2}   P (B ) = .5

    •  A

    ∩B  =

     {Head on flips 1 and 2

    }•   P (A ∩ B ) = P (A)P (B ) = .5 × .5 = .25

  • 8/20/2019 Lecture 4Statistic for quality

    6/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Ex

    •  Volume 309 of  Science  reports on a physician who was on triatestimony in a criminal trial

    •  Based on an estimated prevalence of sudden infant death synd8, 543, Dr Meadow testified that that the probability of a mot

    children with SIDS was   18,5432

    •  The mother on trial was convicted of murder•  What was Dr Meadow’s mistake(s)?

  • 8/20/2019 Lecture 4Statistic for quality

    7/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Example: cont

    •  For the purposes of this class, the principal mistake was to  asprobabilities of having SIDs within a family are independent

    •   That is,  P (A1 ∩ A2) is not necessarily equal to  P (A1)P (A2)•  Biological processes that have a believed genetic or familiar en

    component, of course, tend to be dependent within families

    •   In addition, the estimated prevalence was obtained from an  unon single cases; hence having no information about recurrencefamilies

  • 8/20/2019 Lecture 4Statistic for quality

    8/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Usefu

    • We will use the following fact extensively in this class:

    If a collection of random variables X 1, X 2, . . . , X n  are independtheir joint distribution is the product of their individual densit

    functions 

    That is, if f   i   is the density for random variable X i   we have tha

    f   (x 1, . . . , x n) =n

    i =1

    f  i (x i )

  • 8/20/2019 Lecture 4Statistic for quality

    9/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    IID random var

    •   In the instance where  f  1 = f  2  = . . . =  f  n  we say that the  X i   arindependent  and   identically distributed 

    •   iid random variables are the default model for random sample

    • Many of the important theories of statistics are founded on as

    variables are iid

  • 8/20/2019 Lecture 4Statistic for quality

    10/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Ex

    •  Suppose that we flip a biased coin with success probability  p the join density of the collection of outcomes?

    •  These random variables are iid with densities  p x i (1 − p )1−x i •   Therefore

    f   (x 1, . . . , x n) =

    ni =1

    p x i (1 − p )1−x i  = p  x i (1 − p )n

  • 8/20/2019 Lecture 4Statistic for quality

    11/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Corre

    •   The   covariance  between two random variables  X   and  Y   is de

    Cov(X , Y ) = E [(X  − µx )(Y  − µy )] = E [XY ] − E [X•  The following are useful facts about covariance

    1   Cov(X , Y ) =  Cov(Y , X )2   Cov(X , Y ) can be negative or positive3   |Cov(X , Y )| ≤

     Var(X )Var(y )

  • 8/20/2019 Lecture 4Statistic for quality

    12/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Corre

    •   The   correlation  between  X   and  Y   isCor(X , Y ) =  Cov(X , Y )/

     Var(X )Var(y )

    1   −1 ≤  Cor(X , Y ) ≤ 12   Cor(X , Y ) =

     ±1 if and only if  X   = a + bY   for some constant

    3   Cor(X , Y ) is unitless4   X   and  Y   are uncorrelated if  Cor(X , Y ) = 05   X   and  Y   are more positively correlated, the closer  Cor(X , Y ) 6   X   and  Y   are more negatively correlated, the closer  Cor(X , Y )

  • 8/20/2019 Lecture 4Statistic for quality

    13/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Some useful r

    •   Let {X i }ni =1  be a collection of random variables•   When the

     {X i 

    } are uncorrelated

    Var

      ni =1

    ai X i  + b 

     =

    ni =1

    a2i  Var(X i )

    •   Otherwise

    Var   n

    i =1ai X i  + b 

    =

    ni =1

    a2i  Var(X i ) + 2n−1i =1

    n j =i 

    ai a j Cov(X i , X

    •   If the  X i  are iid with variance  σ2 then  Var(X̄ ) = σ2/n  and  E [S

  • 8/20/2019 Lecture 4Statistic for quality

    14/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    ExampleProve that  Var(X  + Y ) =  Var(X ) +  Var(Y ) + 2Cov(X , Y )

    Var(X  + Y )

    =   E [(X  + Y )(X  + Y )] − E [X  + Y ]2

    =   E [X 2 + 2XY  + Y 2] − (µx  + µy )2

    =   E [X 2 + 2XY  + Y 2] − µ2x  − 2µx µy  − µ2y 

    = (E [X 2] − µ2x ) + (E [Y 2] − µ2y ) + 2(E [XY ] − µx

    =   Var(X ) +  Var(Y ) + 2Cov(X , Y )

  • 8/20/2019 Lecture 4Statistic for quality

    15/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    • A commonly used subcase from these properties is that  if a co

    random variables  {X i }  are uncorrelated , then the variance of tsum of the variances

    Var

      ni =1

    X i 

     =

    ni =1

    Var(X i )

    •  Therefore, it is sums of variances that tend to be useful, not sdeviations; that is, the standard deviation of the sum of bunchrandom variables is the square root of the sum of the varianceof the standard deviations

  • 8/20/2019 Lecture 4Statistic for quality

    16/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    The sample Suppose X i  are iid with variance  σ

    2

    Var(X̄ ) =   Var1n

    n

    i =1

    X i =

      1

    n2Var

      ni =1

    X i 

    =

      1

    n2

    ni =1

    Var

    (X i )

    =  1

    n2 × nσ2

    =  σ2

    n

    M h i l

  • 8/20/2019 Lecture 4Statistic for quality

    17/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Some com

    •   When  X i  are independent with a common variance  Var(X̄ ) =•   σ/√ n   is called  the standard error  of the sample mean•  The standard error of the sample mean is the standard deviat

    distribution of the sample mean

    •  σ   is the standard deviation of the distribution of a single obse

    •  Easy way to remember, the sample mean has to be less variabobservation, therefore its standard deviation is divided by a

     √

    M th ti l

  • 8/20/2019 Lecture 4Statistic for quality

    18/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    The sample va

    •   The   sample variance is defined as

    S 2 =

    ni =1(X i  −  X̄ )2

    n − 1•  The sample variance is an estimator of  σ2•  The numerator has a version that’s quicker for calculation

    ni =1

    (X i  −  X̄ )2 =n

    i =1

    X 2i  − nX̄ 2

    •  The sample variance is (nearly) the mean of the squared deviamean

    Mathematical

  • 8/20/2019 Lecture 4Statistic for quality

    19/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    The sample variance is unb

    E    n

    i =1 (X i  −

     ¯X )

    2  =

    ni =1

    2

    i − nE ¯X 2

    =n

    i =1

    Var(X i ) + µ

    2− n Var(X̄ 

    =

    ni =1

    σ

    2

    + µ2− n σ2/n + µ2

    =   nσ2 + nµ2 − σ2 − nµ2

    = (n − 1)σ2

    Mathematical

  • 8/20/2019 Lecture 4Statistic for quality

    20/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Hoping to avoid some con

    •   Suppose  X i  are iid with mean  µ and variance  σ2•   S 2 estimates σ2•  The calculation of  S 2 involves dividing by  n − 1•   S /√ n  estimates  σ/√ n  the standard error of the mean•   S /√ n   is called the sample standard error (of the mean)

    Mathematical

  • 8/20/2019 Lecture 4Statistic for quality

    21/21

    Mathematical

    Biostatistics

    Boot Camp:

    Lecture 4,

    Random

    Vectors

    Brian Caffo

    Table of 

    contents

    Random

    vectors

    Independence

    Independentevents

    Independentrandom variables

    IID randomvariables

    Correlation

    Variance and

    correlation

    properties

    Variances

    properties of 

    sample means

    The sample

    variance

    Some

    Ex

    •  In a study of 495 organo-lead workers, the following summariefor TBV in  cm3

    •   mean = 1151.281•  sum of squared observations = 662361978

    • sample sd =  (662361978 − 495 × 1151.2812)/494 = 112.62•  estimated se of the mean = 112.6215/√ 495 = 5.062