SDA 3E Chapter 4

  • 8/4/2019 SDA 3E Chapter 4

    2007 Pearson Education

    Chapter 4: Sampling and Estimation

  • Need for Sampling

    Very large populations

    Destructive testing

    Continuous production process

    The objective of sampling is to draw a valid inference about a population.

  • Sample Design

    Sampling plan: a description of the approach that will be used to obtain samples from a population

      Objectives

      Target population

      Population frame

      Method of sampling

      Operational procedures for data collection

      Statistical tools for analysis

  • Sampling Methods

    Subjective

      Judgment sampling

      Convenience sampling

    Probabilistic

      Simple random sampling: every subset of a given size has an equal chance of being selected
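    A minimal Python sketch of simple random sampling, assuming the population frame is an in-memory list. The helper name `simple_random_sample` and the 500-unit frame are hypothetical. `random.sample` draws without replacement, so every subset of size n is equally likely to be chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; every size-n subset is equally likely."""
    rng = random.Random(seed)      # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical frame of 500 numbered units
frame = list(range(1, 501))
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```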

  • PHStat Tool: Random Sample Generator

    PHStat menu > Sampling > Random Sample Generator

    Enter sample size

    Select sampling method

  • Excel Data Analysis Tool: Sampling

    Excel menu > Tools > Data Analysis > Sampling

    Specify input range of data

    Choose sampling method

    Select output option

  • Other Sampling Methods

    Systematic sampling

    Stratified sampling

    Cluster sampling

    Sampling from a continuous process
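    The first two methods can be sketched in Python as follows. This assumes the usual definitions (systematic sampling: every k-th unit after a random start; stratified sampling: a separate simple random sample from each stratum, here with proportional allocation). The helper names and the "urban"/"rural" strata are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start, with k = len(frame) // n."""
    rng = random.Random(seed)
    k = len(frame) // n                  # sampling interval
    start = rng.randrange(k)             # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, seed=None):
    """Proportional allocation: a simple random sample from each stratum."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)   # stratum's share of the sample
        sample[name] = rng.sample(units, n_h)
    return sample

frame = list(range(1, 101))
print(systematic_sample(frame, 10, seed=7))

strata = {"urban": list(range(60)), "rural": list(range(60, 100))}
print({name: len(units) for name, units in stratified_sample(strata, 20, seed=7).items()})
```

    With proportional allocation, rounding can make the stratum sizes sum to slightly more or less than n; this sketch does not adjust for that.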

  • Errors in Sampling

    Nonsampling error

      Poor sample design

    Sampling (statistical) error

      Depends on sample size

      Tradeoff between cost of sampling and accuracy of estimates obtained by sampling

  • Estimation

    Estimation: assessing the value of a population parameter using sample data

    Point estimate: a single number used to estimate a population parameter

    Confidence interval: a range of values between which a population parameter is believed to lie, along with the probability that the interval correctly estimates the true population parameter

  • Common Point Estimates

  • Theoretical Issues

    Unbiased estimator: one for which the expected value equals the population parameter it is intended to estimate

    The sample variance is an unbiased estimator for the population variance:

    $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}, \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
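    The unbiasedness claim can be checked by simulation. A minimal Python sketch, assuming independent draws from a normal population with $\sigma = 2$ (so $\sigma^2 = 4$); the function name and all parameters are illustrative. `statistics.variance` divides by $n-1$ and `statistics.pvariance` divides by $n$. Averaged over many samples of size 5, the first should land near 4 and the second near $(n-1)\sigma^2/n = 3.2$.

```python
import random
import statistics

def average_variance_estimates(sigma=2.0, n=5, trials=20_000, seed=1):
    """Average s^2 (divisor n - 1) and the divisor-n estimate over many samples."""
    rng = random.Random(seed)
    unbiased_total = 0.0
    biased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        unbiased_total += statistics.variance(sample)   # divides by n - 1
        biased_total += statistics.pvariance(sample)    # divides by n
    return unbiased_total / trials, biased_total / trials

mean_s2, mean_biased = average_variance_estimates()
print(f"true variance 4.0, average s^2 {mean_s2:.3f}, average divisor-n estimate {mean_biased:.3f}")
```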

  • Interval Estimates

    Interval estimate: a range within which we believe the true population parameter falls

    Example: a Gallup poll reports that 56% of voters favor a candidate, with a 3% margin of error. The interval estimate is [53%, 59%].

  • Confidence Intervals

    Confidence interval (CI): an interval estimate that specifies the likelihood that the interval contains the true population parameter

    Level of confidence ($1-\alpha$): the probability that the CI contains the true population parameter, usually expressed as a percentage (90%, 95%, and 99% are most common)

  • Sampling Distribution of the Mean

  • Interval Estimate Containing the True Population Mean

  • Interval Estimate Not Containing the True Population Mean
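    Whether a given interval contains the true mean can be explored by simulation: draw many samples, build a $\sigma$-known interval from each, and count how often the interval contains $\mu$. A minimal Python sketch; the population values ($\mu = 50$, $\sigma = 10$, $n = 45$) and the function name are hypothetical. At a 95% level, roughly 95% of the intervals should contain $\mu$, and the rest miss it.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu=50.0, sigma=10.0, n=45, level=0.95, trials=2000, seed=3):
    """Fraction of intervals x_bar +/- z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1                      # this interval contains the true mean
    return hits / trials

print(coverage_rate())
```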

  • Confidence Interval for the Mean, $\sigma$ Known

    A $100(1-\alpha)\%$ CI is: $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$

    $z_{\alpha/2}$ may be found from Table A.1 or by using the Excel function NORMSINV($1-\alpha/2$)

  • Example

    Compute a 95 percent confidence interval for the mean number of TV hours/week for the 18-24 age group in the file TV Viewing.xls.

    Assume that the population standard deviation is known to be 10.0. The sample mean for the $n = 45$ observations is computed to be 60.16. For a 95 percent CI, $z_{\alpha/2} = 1.96$. Therefore, the CI is $60.16 \pm 1.96(10/\sqrt{45}) = 60.16 \pm 2.92$, or [57.24, 63.08].
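    The same calculation as a minimal Python sketch. The function name `z_interval` is hypothetical; `statistics.NormalDist().inv_cdf` plays the role of Table A.1 or NORMSINV and gives $z_{0.025} \approx 1.95996$ rather than the rounded 1.96, which does not change the result at two decimals.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level=0.95):
    """100(1 - alpha)% CI for the mean when the population sigma is known."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # same role as NORMSINV(1 - alpha/2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# TV Viewing example values from the slide
low, high = z_interval(60.16, 10.0, 45)
print(f"[{low:.2f}, {high:.2f}]")   # prints [57.24, 63.08]
```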

  • Confidence Interval for the Mean, $\sigma$ Unknown

    A $100(1-\alpha)\%$ CI is: $\bar{x} \pm t_{\alpha/2,n-1}(s/\sqrt{n})$

    $t_{\alpha/2,n-1}$ is the value from a t-distribution with $n-1$ degrees of freedom, from Table A.2 or the Excel function TINV($\alpha$, $n-1$)

  • Relationship Between Normal Distribution and t-distribution

    The t-distribution has heavier tails than the normal distribution, so it yields wider confidence intervals, especially for small sample sizes; as $n$ grows, the t-distribution approaches the normal.

  • Example

    Compute a 95 percent confidence interval for the mean number of TV hours/week for the 18-24 age group in the file TV Viewing.xls. Assume that the population standard deviation is not known but is estimated from the sample as 10.095. A 95 percent CI corresponds to $\alpha/2 = 0.025$. With 45 observations, the t-distribution has $45 - 1 = 44$ df. Using Table A.2, we find that $t_{0.025,44} = 2.0154$, yielding a 95 percent CI for the mean of $60.16 \pm 2.0154(10.095/\sqrt{45}) = 60.16 \pm 3.03$, or [57.13, 63.19].
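    A minimal Python sketch of the $\sigma$-unknown interval. The Python standard library has no t-distribution quantile function, so this sketch assumes the caller supplies $t_{\alpha/2,n-1}$ from Table A.2 (or TINV), exactly as the example does; the function name is hypothetical.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """100(1 - alpha)% CI for the mean when sigma is unknown.

    t_crit is t_{alpha/2, n-1}, looked up in Table A.2 or with Excel's
    TINV(alpha, n - 1); it is passed in because the standard library
    has no t quantile function.
    """
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# TV Viewing example values from the slide
low, high = t_interval(60.16, 10.095, 45, t_crit=2.0154)
print(f"[{low:.2f}, {high:.2f}]")   # prints [57.13, 63.19]
```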

  • PHStat Tool: Confidence Intervals for the Mean

    PHStat menu > Confidence Intervals > Estimate for the mean, sigma known, or Estimate for the mean, sigma unknown

  • PHStat Tool: Confidence Intervals for the Mean - Dialog

    Enter the confidence level

    Choose specification of sample statistics

    Check Finite Population Correction box if appropriate

  • Sampling From Finite Populations

    When $n > 0.05N$, use a correction factor in computing the standard error:

    $$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
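    A minimal Python sketch of the standard error with the finite population correction applied only when $n > 0.05N$, as the rule above states. The function name and the sample and population sizes are hypothetical.

```python
from math import sqrt

def standard_error_of_mean(sigma, n, N=None):
    """sigma / sqrt(n), times sqrt((N - n) / (N - 1)) when n > 0.05 N."""
    se = sigma / sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))   # finite population correction factor
    return se

# Hypothetical numbers: a sample of 100 from a population of 500
print(standard_error_of_mean(10.0, 100))          # no correction
print(standard_error_of_mean(10.0, 100, N=500))   # corrected, smaller
```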

  • PHStat Tool: Confidence Intervals for the Mean - Results

  • Confidence Intervals for Proportions

    Sample proportion: $p = x/n$

      $x$ = number in sample having desired characteristic

      $n$ = sample size

    The sampling distribution of $p$ has mean $\pi$ and variance $\pi(1-\pi)/n$

    When $n\pi$ and $n(1-\pi)$ are at least 5, the sampling distribution of $p$ approaches a normal distribution

  • Confidence Intervals for Proportions

    A $100(1-\alpha)\%$ CI is: $p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$

    A PHStat tool is available under the Confidence Intervals option
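    A minimal Python sketch of the proportion interval. Because $\pi$ is unknown in practice, this sketch assumes the "at least 5" check is made with the sample proportion $p$ in place of $\pi$. The function name and the 50-of-100 sample are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, level=0.95):
    """100(1 - alpha)% CI for a population proportion, using p = x / n."""
    p = x / n
    # Normal-approximation check, with p standing in for the unknown pi
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation needs np and n(1 - p) of at least 5")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical sample: 50 of 100 respondents have the characteristic
print(proportion_interval(50, 100))
```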

  • Confidence Intervals and Sample Size

    CI for the mean, $\sigma$ known: the sample size needed for a half-width of at most $E$ is $n \ge (z_{\alpha/2})^2 \sigma^2 / E^2$

    CI for a proportion: the sample size needed for a half-width of at most $E$ is

    $$n \ge \frac{(z_{\alpha/2})^2 \pi(1-\pi)}{E^2}$$

    Use $p$ as an estimate of $\pi$, or 0.5 for the most conservative estimate
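    Both sample-size formulas as a minimal Python sketch, rounding up to the next whole observation so that the half-width is at most $E$. The function names and the inputs ($\sigma = 10$, $E = 2$; $E = 0.03$ with the conservative $p = 0.5$) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def z_value(level):
    """z_{alpha/2} for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def sample_size_mean(sigma, E, level=0.95):
    """Smallest n with n >= z^2 sigma^2 / E^2."""
    return ceil(z_value(level) ** 2 * sigma ** 2 / E ** 2)

def sample_size_proportion(E, p=0.5, level=0.95):
    """Smallest n with n >= z^2 p(1 - p) / E^2; p = 0.5 gives the largest n."""
    return ceil(z_value(level) ** 2 * p * (1 - p) / E ** 2)

print(sample_size_mean(sigma=10, E=2))   # prints 97
print(sample_size_proportion(E=0.03))    # prints 1068
```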

  • PHStat Tool: Sample Size Determination

    PHStat menu > Sample Size > Determination for the Mean or Determination for the Proportion

    Enter $\sigma$, $E$, and confidence level

    Check Finite Population Correction box if appropriate

  • Confidence Intervals for Population Total

    A $100(1-\alpha)\%$ CI is:

    $$N\bar{x} \pm t_{n-1,\alpha/2}\, N \frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$

    A PHStat tool is available under the Confidence Intervals option
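    A minimal Python sketch of the population-total interval. As with the other t-based intervals, the caller supplies $t_{n-1,\alpha/2}$ from Table A.2. The audit scenario, its numbers, and the function name are hypothetical; $t_{49,0.025} \approx 2.0096$ is the standard table value for 49 df.

```python
from math import sqrt

def total_interval(x_bar, s, n, N, t_crit):
    """N*x_bar +/- t * N * (s / sqrt(n)) * sqrt((N - n) / (N - 1)).

    t_crit is t_{n-1, alpha/2}, supplied by the caller from Table A.2.
    """
    total = N * x_bar                          # point estimate of the total
    half_width = t_crit * N * (s / sqrt(n)) * sqrt((N - n) / (N - 1))
    return total - half_width, total + half_width

# Hypothetical audit: 50 of 1,000 invoices sampled, mean $20, s = $5
print(total_interval(20.0, 5.0, 50, 1000, t_crit=2.0096))
```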

  • Confidence Intervals for Differences Between Means

                          Population 1    Population 2
    Mean                  $\mu_1$         $\mu_2$
    Standard deviation    $\sigma_1$      $\sigma_2$
    Point estimate        $\bar{x}_1$     $\bar{x}_2$
    Sample size           $n_1$           $n_2$

    Point estimate for the difference in means, $\mu_1 - \mu_2$, is given by $\bar{x}_1 - \bar{x}_2$

  • Independent Samples With Unequal Variances

    A $100(1-\alpha)\%$ CI is:

    $$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,df^*} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

    $$df^* = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$

    df*: fractional values are rounded down

  • Example

    In the Accounting Professionals.xls worksheet, find a 95 percent confidence interval for the difference in years of service between males and females.

  • Calculations

    $s_1 = 4.39$ and $n_1 = 14$ (females); $s_2 = 8.39$ and $n_2 = 13$ (males)

    $df^* = 17.81$, so use 17 as the degrees of freedom
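    The $df^*$ value above can be reproduced with a minimal Python sketch of the unequal-variance (Welch-Satterthwaite) degrees-of-freedom formula, using the slide's $s_1, n_1, s_2, n_2$. The function names are hypothetical, and the half-width helper assumes the caller looks up $t_{\alpha/2,df^*}$ in Table A.2 after rounding $df^*$ down ($t_{0.025,17} \approx 2.1098$).

```python
import math

def welch_df(s1, n1, s2, n2):
    """Fractional df* for the unequal-variance interval."""
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def unequal_variance_half_width(s1, n1, s2, n2, t_crit):
    """t * sqrt(s1^2/n1 + s2^2/n2); t_crit comes from Table A.2 at floor(df*)."""
    return t_crit * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

df_star = welch_df(4.39, 14, 8.39, 13)              # females vs. males, from the slide
print(round(df_star, 2), math.floor(df_star))       # prints 17.81 17
hw = unequal_variance_half_width(4.39, 14, 8.39, 13, t_crit=2.1098)
print(round(hw, 2))
```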

  • Independent Samples With Equal Variances

    A $100(1-\alpha)\%$ CI is:

    $$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$$

    where $s_p$ is a common pooled standard deviation. We must assume that the variances of the two populations are equal.
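    A minimal Python sketch of the pooled (equal-variance) interval. The function names are hypothetical; the caller supplies $t_{\alpha/2,n_1+n_2-2}$ from Table A.2. The usage line only computes $s_p$ from the slide's standard deviations, since the sample means are not given here.

```python
from math import sqrt

def pooled_sd(s1, n1, s2, n2):
    """s_p = sqrt(((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2))."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def equal_variance_interval(x1, x2, s1, n1, s2, n2, t_crit):
    """x1 - x2 +/- t_{alpha/2, n1+n2-2} * s_p * sqrt(1/n1 + 1/n2)."""
    half_width = t_crit * pooled_sd(s1, n1, s2, n2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - half_width, diff + half_width

# Pooled standard deviation for the Accounting Professionals s1, s2 values
print(round(pooled_sd(4.39, 14, 8.39, 13), 2))   # prints 6.62
```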

  • Example: Accounting Professionals

  • Paired Samples

    A $100(1-\alpha)\%$ CI is: $\bar{D} \pm t_{n-1,\alpha/2}\, s_D/\sqrt{n}$, where

    $$s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n-1}}$$

    $D_i$ = difference for each pair of observations

    $\bar{D}$ = average of differences

    A PHStat tool is available in the Confidence Intervals menu
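    A minimal Python sketch of the paired-samples interval. The data below are hypothetical (they are not the Pile Foundation.xls values), the function name is illustrative, and the caller supplies $t_{n-1,\alpha/2}$ from Table A.2 ($t_{0.025,4} \approx 2.776$ for five pairs). `statistics.stdev` uses the $n-1$ divisor, matching $s_D$.

```python
from math import sqrt
from statistics import mean, stdev

def paired_interval(first, second, t_crit):
    """D_bar +/- t_{n-1, alpha/2} * s_D / sqrt(n), with D_i = first_i - second_i."""
    d = [x - y for x, y in zip(first, second)]   # difference for each pair
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                               # divides by n - 1
    half_width = t_crit * s_d / sqrt(n)
    return d_bar - half_width, d_bar + half_width

# Hypothetical actual vs. estimated lengths for five pairs
actual = [12, 15, 11, 18, 14]
estimated = [11, 13, 8, 14, 9]
print(paired_interval(actual, estimated, t_crit=2.776))
```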

  • Example: Pile Foundation.xls

    A 95% CI for the average difference between the actual and estimated pile lengths is

  • Differences Between Proportions

    A $100(1-\alpha)\%$ CI is:

    $$p_1 - p_2 \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

    Applies when $n_i p_i$ and $n_i(1-p_i)$ are greater than 5

  • Example

    In the Accounting Professionals.xls worksheet, the proportion of females having a CPA is 8/14 = 0.57, while the proportion of males having a CPA is 6/13 = 0.46. A 95 percent confidence interval for the difference in proportions between females and males is
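    A minimal Python sketch of the difference-in-proportions interval, applied to the counts stated in this example (8 of 14 females, 6 of 13 males). The function name is hypothetical. It works from the exact counts rather than the rounded 0.57 and 0.46, and it enforces the "greater than 5" condition from the previous slide.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, level=0.95):
    """p1 - p2 +/- z_{alpha/2} * sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    for n, p in ((n1, p1), (n2, p2)):
        if n * p <= 5 or n * (1 - p) <= 5:
            raise ValueError("needs n_i p_i and n_i (1 - p_i) greater than 5")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - half_width, diff + half_width

# Counts from the slide: 8 of 14 females and 6 of 13 males hold a CPA
low, high = two_proportion_interval(8, 14, 6, 13)
print(f"[{low:.3f}, {high:.3f}]")
```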

  • Sampling Distribution of s

    The sample standard deviation, $s$, is a point estimate for the population standard deviation, $\sigma$

    For samples from a normal population, the sampling distribution of $(n-1)s^2/\sigma^2$ is a chi-square ($\chi^2$) distribution with $n-1$ df; see Table A.3

    CHIDIST(x, deg_freedom) returns the probability to the right of x

    CHIINV(probability, deg_freedom) returns the value of x for a specified right-tail probability

  • Confidence Intervals for the Variance

    A $100(1-\alpha)\%$ CI is:

    $$\left[\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}\right]$$

    Note the difference in the denominators!
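    A minimal Python sketch of the variance interval. The standard library has no chi-square quantile function, so the caller supplies both right-tail critical values from Table A.3 or CHIINV. The larger value, $\chi^2_{n-1,\alpha/2}$, produces the lower limit. The sample ($n = 10$, $s = 2$) and the function name are hypothetical; 19.023 and 2.700 are the standard 9-df table values at right-tail areas 0.025 and 0.975.

```python
def variance_interval(s, n, chi2_upper, chi2_lower):
    """[(n - 1)s^2 / chi2_{n-1, alpha/2}, (n - 1)s^2 / chi2_{n-1, 1-alpha/2}].

    chi2_upper = CHIINV(alpha/2, n - 1), the larger critical value;
    chi2_lower = CHIINV(1 - alpha/2, n - 1), the smaller one.
    """
    ss = (n - 1) * s ** 2
    return ss / chi2_upper, ss / chi2_lower   # larger denominator gives the lower limit

# Hypothetical sample: n = 10, s = 2, 95% confidence (9 df)
low, high = variance_interval(2.0, 10, chi2_upper=19.023, chi2_lower=2.700)
print(f"variance in [{low:.3f}, {high:.3f}]")
```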

  • PHStat Tool: Confidence Intervals for Variance - Dialog

    PHStat menu > Confidence Intervals > Estimate for the Population Variance

    Enter sample size, standard deviation, and confidence level

  • PHStat Tool: Confidence Intervals for Variance - Results

  • Time Series Data

    Confidence intervals only make sense for stationary time series data

  • Summary and Conclusions

    As the confidence level ($1-\alpha$) increases, the width of the confidence interval also increases.

    As the sample size increases, the width of the confidence interval decreases.

  • Probability Intervals

    A $100(1-\alpha)\%$ probability interval for a random variable $X$ is any interval $[a,b]$ such that $P(a \le X \le b) = 1-\alpha$

    Do not confuse a confidence interval with a probability interval; confidence intervals are probability intervals for sampling distributions, not for the distribution of the random variable.
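    A minimal Python sketch of a central probability interval for a normal random variable. The distribution $X \sim N(60, 10)$ and the function name are hypothetical. The interval uses $\sigma$ itself, so it describes where individual values of $X$ fall. By contrast, the $\sigma$-known confidence interval for the TV Viewing mean uses $\sigma/\sqrt{n}$ and is much narrower.

```python
from statistics import NormalDist

def normal_probability_interval(mu, sigma, level=0.95):
    """Central [a, b] with P(a <= X <= b) = level for X ~ Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    tail = (1 - level) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

# Hypothetical X ~ N(60, 10): a statement about single values of X,
# not about where the population mean lies
a, b = normal_probability_interval(60, 10)
print(round(a, 2), round(b, 2))
```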