Statistical Inference and Sample Size

Arindam Basu

2015-03-18

A Lecture on Sample Size and Statistical Inference for Health Researchers



What Shall We Learn?

Revise concepts on probability

Statistical Inference - Estimation

Concept of Hypothesis Testing

Concepts of Sampling and Sample Size


Approaches to Population Parameters

We'd like to know about population parameters

Parameters are unknown

As a result, we calculate or study statistics in samples

We estimate population parameters from sample statistics

We also test hypotheses about parameters using our samples


Concepts of Probability


Theory of Probability and Inference

A trial/experiment has a set of specified outcomes

The outcome of one trial does not influence the outcome of another trial

The trials are identical

Probabilities provide a link between a population and samples


Independence Law of Probability

Two outcomes are statistically independent if the probability of their joint occurrence is the product of the probabilities of occurrence of each outcome


C and D Are Independent

$P(C \cap D) = P(C) \times P(D)$, where $P(C \cap D)$ is the joint probability of event C and event D


Examples of Independent Events

If repetitions are independent, then they are from a random sample

A random sample is defined by the method that produces the sample


Law of Mutually Exclusive Outcomes

Two outcomes are mutually exclusive if at most one of them can occur at a time; that is, the outcomes do not overlap


C and D are Mutually Exclusive

P(C OR D) = P(C) + P(D)


Examples of Mutually Exclusive Events

Dead or Alive Outcomes

Head or Tail in a Toss

Vaginal OR Caesarian Section as Modes of Delivery

NZ European OR Asian OR Maori OR Pacific Islander

Others??


Not All Outcomes are Mutually Exclusive

Figure: Not all outcomes mutually exclusive


What is the Probability of Overweight OR Having High Blood Pressure?

P(Overwt only) + P(HTN only) + P(Overwt and HTN) = 0.2 + 0.1 + 0.1 = 0.4


Question: What is the sum of two marginal probabilities?

P(Overwt) + P(HTN) = 0.3 + 0.2 = 0.5


What Happens When we Remove the Joint Occurrences?

P(Overwt OR HTN) = P(Overwt) + P(HTN) - P(Overwt and HTN) = 0.3 + 0.2 - 0.1 = 0.4

Thus in this case overweight (O) and hypertension (H) are NOT mutually exclusive


Law of Addition

By the addition rule, for any two outcomes, the probability of occurrence of either outcome or both is the sum of the probabilities of each occurring minus the probability of their joint occurrence
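The addition rule can be checked numerically. The following is a minimal Python sketch using the overweight and hypertension probabilities from the preceding slides; the function name `prob_union` is only illustrative and is not part of the lecture.

```python
def prob_union(p_c, p_d, p_joint):
    """Addition rule: P(C or D) = P(C) + P(D) - P(C and D)."""
    return p_c + p_d - p_joint

# Marginal and joint probabilities taken from the slides
p_overweight = 0.3
p_hypertension = 0.2
p_both = 0.1

p_either = prob_union(p_overweight, p_hypertension, p_both)
print(round(p_either, 2))

# For mutually exclusive outcomes the joint probability is 0,
# so the rule reduces to a simple sum (illustrative coin toss).
p_head_or_tail = prob_union(0.5, 0.5, 0.0)
print(p_head_or_tail)
```

When the joint probability is zero, the same function reproduces the rule for mutually exclusive outcomes.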


Law of Conditional Probability

For any two outcomes C and D, the conditional probability of the occurrence of C given the occurrence of D, written P(C | D) and read as the probability of C GIVEN D, is given by


C is Conditional on D

$P(C \mid D) = P(C \cap D) / P(D)$
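As a rough illustration, the conditional probability formula can be applied to the overweight and hypertension figures used earlier in the lecture. This is a sketch only; the function name `conditional` is an assumption made for the example.

```python
def conditional(p_joint, p_given):
    """P(C | D) = P(C and D) / P(D); requires P(D) > 0."""
    if p_given <= 0:
        raise ValueError("P(D) must be positive")
    return p_joint / p_given

# Values from the earlier slides
p_overweight = 0.3
p_hypertension = 0.2
p_both = 0.1

# Probability of hypertension GIVEN overweight
p_htn_given_overweight = conditional(p_both, p_overweight)
print(round(p_htn_given_overweight, 3))

# Under independence, P(HTN | Overwt) would equal P(HTN);
# equivalently P(both) would equal P(Overwt) * P(HTN).
independent = abs(p_both - p_overweight * p_hypertension) < 1e-12
print(independent)
```

Because the conditional probability differs from the marginal probability of hypertension, the two outcomes in this example are not independent.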


Concepts of Randomness


What is a Random Variable?

A variable associated with a random sample

The process that generates that variable must be random

The likelihood of Person 1 being selected has nothing to do with the likelihood of Person 2 being selected

The empirical relative frequency of occurrence of a value of the variable becomes an estimate of the probability of occurrence of that value


Consider this: Number of Boys in Families of Eight

Figure: Number of Boys in Families of 8 Children


Calculate: What is the probability of

Finding Exactly Two Boys in that Family?

P(Number of Boys = 2) = 0.0993

Finding None, One, or Two Boys in the Family?

P(Number = 0) + P(Number = 1) + P(Number = 2) = 0.1310


Probability Distribution Function of this Data

Figure: Probability Distribution of Boys in Families of 8 Children


Types of Variables

Discrete

Nominal

Ordinal

Continuous

Interval

Ratio


Probability with Continuous Random Variable

What is the probability of finding someone with weight exactly 50 kg?

Answer = 0! (i.e., exactly 50.000 and not 50.001 kg)

We can find someone in the interval between 49.5 and 50.5 kg

Convert continuous variables into intervals -> treat midpoints like discrete values -> list the associated probabilities


Start with the Barplot of Relative Frequencies

Figure: Barplot of Relative Frequencies


The curve would take a smooth shape

Figure: Line Plot of Relative Frequencies


Probability Density Function

A probability density function is a curve that specifies, by means of the area under the curve over an interval, the probability that a continuous random variable falls within the interval. The total area under the curve is 1
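To make the "area over an interval" idea concrete, here is a small Python sketch. It assumes, purely for illustration, that body weight follows a normal distribution with mean 60 kg and standard deviation 10 kg; those numbers are hypothetical and do not come from the lecture.

```python
from statistics import NormalDist

# Hypothetical weight distribution (assumed values, not from the slides)
weight = NormalDist(mu=60, sigma=10)

# Probability of a weight between 49.5 and 50.5 kg:
# the area under the density curve over that interval.
p_interval = weight.cdf(50.5) - weight.cdf(49.5)

# Probability of a weight of exactly 50 kg:
# an interval of zero width has zero area.
p_point = weight.cdf(50.0) - weight.cdf(50.0)

print(p_interval)
print(p_point)
```

The interval has a small but positive probability, while the single point has probability zero, which matches the earlier slide on continuous random variables.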


How to Calculate the Average of a Discrete Random Variable?

$E(Y) = \sum_i p_i \, y_i$, where $E(Y)$ is the expected value of Y, $p_i$ is the proportion (probability) of the value $y_i$, and $y_i$ are the individual values
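The expected value formula is a probability-weighted sum. The sketch below uses a hypothetical distribution (number of boys in a two-child family with equally likely sexes), chosen only because its probabilities are easy to verify by hand.

```python
def expected_value(distribution):
    """E(Y) = sum of p * y over all (value, probability) pairs."""
    total_p = sum(distribution.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * y for y, p in distribution.items())

# Hypothetical example: value -> probability
boys_in_two_child_family = {0: 0.25, 1: 0.50, 2: 0.25}

mean_boys = expected_value(boys_in_two_child_family)
print(mean_boys)
```

The same function would apply to the relative frequencies of boys in families of eight children if those frequencies were entered as the dictionary.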


What is Normal Distribution?

Population = Set of All Possible Values of a Variable

Random selection of objects makes the variable a random variable

Challenge: find a model that has few parameters and can be applied to real data

Normal or Gaussian distribution is a Statistical model


Why is Normal Distribution Popular?

It works!

Central Limit Theorem

Practical


Central Limit Theorem

If a random variable Y has population mean $\mu$ and population variance $\sigma^2$, the sample mean $\bar{y}$, based on n observations, is approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$ (standard error $\sigma/\sqrt{n}$), for sufficiently large n
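A short simulation can illustrate the theorem. This sketch draws repeated samples from an exponential distribution (a skewed, clearly non-normal population with mean 1 and variance 1) and looks at the mean and variance of the sample means. The seed, sample size, and number of replicates are arbitrary choices for the example.

```python
import random
import statistics

rng = random.Random(2015)   # fixed seed so the run is reproducible
n = 50                      # observations per sample
replicates = 5000           # number of samples drawn

# Population: exponential with rate 1 (mean 1, variance 1)
sample_means = []
for _ in range(replicates):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(mean_of_means)   # should be close to the population mean, 1
print(var_of_means)    # should be close to sigma^2 / n = 1 / 50
```

A histogram of `sample_means` would look roughly bell shaped even though the underlying population is strongly skewed.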


Central Limit Theorem in Simple Terms

Means of sufficiently large random samples from almost any distribution (one with finite variance) will be approximately normally distributed

Reassuring even when we do not know the nature of the original distribution


CLT Helps Us to Calculate the Confidence Intervals

Figure: 95 pct confidence interval


A Table that Helps You to Calculate the 95% CI

Figure: z value table


Example of a Normal Distribution

Figure: Density Plot


Statistics Are Random

A statistic associated with a random sample is a random variable


Illustration with an Example of IQ distribution

Figure: IQ Distribution


Points to Note

Reducing the variability of the sample mean (its standard error) by a factor of 2 requires a 4-fold increase in sample size


Note: if you have 100 participants, and can add another 10, don't bother

Figure: Extra 10 pct not worth
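The two points above follow from the standard error formula, $\sigma/\sqrt{n}$. A minimal sketch, assuming an arbitrary standard deviation of 1 (the ratios do not depend on that choice):

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 1.0  # arbitrary; only the ratios matter here

# Quadrupling the sample size halves the standard error
ratio_fourfold = standard_error(sigma, 400) / standard_error(sigma, 100)

# Adding 10 participants to 100 shrinks it by under 5 percent
ratio_extra_ten = standard_error(sigma, 110) / standard_error(sigma, 100)

print(ratio_fourfold)
print(round(1 - ratio_extra_ten, 3))
```

The second ratio shows why a 10 percent increase in participants buys very little extra precision.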


Example: Birth Weight of Babies with SIDS (Sudden Infant Death Syndrome)

In one city, 78 babies died with a diagnosis of SIDS. Their birth certificates were obtained, and the mean birthweight of these 78 babies was found to be 2994 grams. It is also known that in this population the standard deviation of birthweight is about 800 grams.

What is the 95% confidence interval for the mean birthweight of infants with SIDS?


Answer to the Birth Weight Question

Lower limit: $2994 - 1.96 \times (800 / \sqrt{78}) \approx 2816$ g

Upper limit: $2994 + 1.96 \times (800 / \sqrt{78}) \approx 3172$ g

What if we wanted to be MORE confident? Say, 99% confident?


Answer to the 99% Confidence Interval

Lower limit: $2994 - 2.58 \times (800 / \sqrt{78}) \approx 2760$ g

Upper limit: $2994 + 2.58 \times (800 / \sqrt{78}) \approx 3228$ g
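The 95% and 99% intervals for the SIDS example can be reproduced with a few lines of Python. This is a sketch assuming the population standard deviation is known (800 g), as the example states; the function name `normal_ci` is illustrative.

```python
import math

def normal_ci(mean, sigma, n, z):
    """Confidence interval for a mean with known sigma: mean +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin

# SIDS example: n = 78, mean = 2994 g, sigma = 800 g
lower95, upper95 = normal_ci(2994, 800, 78, z=1.96)
lower99, upper99 = normal_ci(2994, 800, 78, z=2.58)

print(round(lower95), round(upper95))
print(round(lower99), round(upper99))
```

The 99% interval is wider because the larger z multiplier is applied to the same standard error.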


Interpretations of Confidence Intervals

As the confidence level increases, the interval gets wider.

Why can this be?

This is the price we pay for making sure we have straddled the population mean

As we decrease α, we increase the level of confidence

If we want to decrease the width, then we either decrease confidence or increase sample size


Steps of Estimation

Start with a sample statistic

Make a statement about the population parameter

We use a confidence interval to indicate that our interval straddles the parameter

Sort of flip it over, and get hypothesis testing


Steps of Hypothesis Testing

Start by assuming a parameter value

Make a probability statement about the value of statistic

Measure "how far" an observed statistic is from a hypothesised parameter

If the distance is GREAT, we argue the hypothesised parameter is INCONSISTENT with the data -> reject the hypothesis


Concepts of Distance in Hypothesis Testing

Take the basic variability of the observations (variance, $\sigma^2$)

Take the sample size (N)

If the observed value of the statistic is 2 or more standard errors from the hypothesised value of the parameter, question the truth of the hypothesis

This is because the data do not match the hypothesis


Example: Is the SIDS babies' birthweight different from that of the normal population?

Mean birthweight of our sample (N = 78) babies = 2994 g

We know standard deviation of population = 800 g

Therefore standard error = $800 / \sqrt{78}$ = 90.6 g

For general population, average birth weight = 3300 g.

Is our sample birthweight consistent with this?


How far is the SIDS birthweight from the average birthweight?

Figure: SIDS Birth Weight


Conclusions from SIDS Study

The observed difference = 3300 - 2994 = 306 g

This is 306/90.6 = 3.38 standard errors away from the hypothesised mean

It is a GREAT distance away by our rule

Hence the SIDS babies sample is inconsistent with what is expected!

The SIDS babies come from a DIFFERENT population, less

What are other challenges to this?
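The distance calculation for the SIDS example can be sketched as follows. The code assumes the population standard deviation is known (800 g) and uses the lecture's rule of thumb of 2 standard errors; the function name `z_statistic` is illustrative.

```python
import math

def z_statistic(sample_mean, hypothesised_mean, sigma, n):
    """Number of standard errors between the sample mean and the hypothesised mean."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error

# SIDS example: sample mean 2994 g, hypothesised mean 3300 g, sigma 800 g, n = 78
z = z_statistic(2994, 3300, 800, 78)
print(round(z, 2))

# Rule of thumb from the lecture: 2 or more standard errors is a "great" distance
far_from_hypothesis = abs(z) >= 2
print(far_from_hypothesis)
```

The negative sign only indicates that the sample mean lies below the hypothesised mean.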


Where in the Normal Distribution Does This Number of Standard Errors Fall?

Figure: Area of Observed Value


Can We Associate a Probability Value to this Tail Estimate?

The area in the tail beyond the observed number of standard errors (here 3.38) is the one-tailed p-value

We know for z = 1.96, one-tailed p-value = 0.025

We know for z = 2.58, one-tailed p-value = 0.005
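These tail areas can be computed directly rather than read from a table. The sketch below uses the complementary error function from Python's standard library to get the area of the standard normal curve beyond a given z; the function name `upper_tail` is illustrative.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_196 = upper_tail(1.96)
p_258 = upper_tail(2.58)
p_338 = upper_tail(3.38)   # the SIDS example

print(round(p_196, 3))
print(round(p_258, 3))
print(p_338)
```

For a two-tailed test the area would be doubled, since values equally far on either side of the hypothesised mean count as evidence against it.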


What if our statistic fell within the 2 standard errors?

We set it up before the data gathering as follows:


Figure: sample space


Concepts Related to Hypothesis Testing

Null Hypothesis - specifies the hypothesised real value for the parameter

Alternative Hypothesis - real value or range of values when the null hypothesis is rejected

Rejection Region - values of the statistic for which the null hypothesis is rejected


Key Table of Hypothesis Testing

Figure: Table of Hypothesis Testing


Applying this to our SIDS study

Figure: SIDS sample space


Rejection Regions - One tailed versus Two tailed

Figure: one tailed


For One-Tailed Tests with the Same Alpha, the Rejection Region in That Tail Is Wider

Figure:


Summary of Statistical Inference

Define population

Specify parameters

Take random sample from the population

Estimate the parameter from the sample statistic

Test Hypotheses about the sample statistic and the parameter


Review: Assumptions of Hypothesis Testing

We knew the population variance and used the sample mean to estimate the population mean

What Happens when:

We do not know either the population mean or the variance?

How do we compare two normal populations?

How do we estimate sample sizes?


Need for a Pivotal Variable

Think of a randomly selected sample whose mean is calculated

That mean follows a normal distribution and estimates the population mean

The variance (or standard deviation of that mean) estimates the variance of the population as well

Pivotal Variable is the link


Pivotal Variable

$Z = \dfrac{\bar{y} - \mu}{\sigma / \sqrt{n}}$

$\chi^2 = \dfrac{(n - 1)\, s^2}{\sigma^2}$, where $s$ is the sample standard deviation


Requirements of a Pivotal Variable

At least a statistic,

And a parameter

Distribution of Z or Chi-square is fixed

Confidence intervals need Z or chi-square

These quantities are known as Pivotal Variables


From Z to T

A random sample is picked from a normal distribution and we know the variance ($\sigma^2$)

Then, Z is our pivotal quantity, which has a Normal(0, 1) distribution.

What happens when we do not know the population variance but need to estimate the population mean from the sample?

The corresponding pivotal variable is 't', named after "Student" (William Gosset)

$T = \dfrac{\bar{y} - \mu}{s / \sqrt{N}}$

What is the distribution of 't'?


Properties of the 't' Distribution

Similar to Normal Distribution

Depends on N

Indexed by its degrees of freedom, n - 1, as is chi-square

Bell shaped, symmetrical about 0

As N approaches infinity, t becomes similar to Normal


Concept: SIDS problem now in terms of t-statistic

This time we do not know the population variance and would like to estimate the population mean

Sample mean birthweight $\bar{y}$ = 3199.8 g

Sample standard deviation s = 663 g


Challenge

Without assuming the population variance, can we

Obtain an interval estimate of the population mean?

Test the null hypothesis that the average birthweight of SIDS cases is 3300 g?

t-value for 14 df = 2.14

Hence, upper limit: $3199.8 + 2.14 \times (663 / \sqrt{15}) \approx 3566$ g

Lower limit: $3199.8 - 2.14 \times (663 / \sqrt{15}) \approx 2833$ g

Note that the confidence interval is wider
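The t-based interval can be sketched in the same way as the z-based one. The critical value 2.14 for 14 degrees of freedom is taken from the slide rather than computed, and the function name `t_interval` is illustrative.

```python
import math

def t_interval(sample_mean, sample_sd, n, t_critical):
    """Confidence interval for a mean when sigma is estimated by the sample SD."""
    margin = t_critical * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# SIDS example with unknown population variance: n = 15, 14 df
lower, upper = t_interval(3199.8, 663, 15, t_critical=2.14)
print(round(lower, 1), round(upper, 1))

# The hypothesised mean of 3300 g lies inside the interval,
# so the data are consistent with it at this confidence level.
consistent_with_3300 = lower <= 3300 <= upper
print(consistent_with_3300)
```

Checking whether the hypothesised value falls inside the interval answers the second question on the slide: with these data the null hypothesis of 3300 g would not be rejected.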


Hypothesis Testing for Paired Data

Paired Data = Repeated or multiple measurements on the same participants

Example: before-and-after measurement of pain following administration of analgesics

We want to look at differences between pairs

Does the mean of the sample differences come from a population of differences with mean 0?

Assume that this difference is normally distributed


Example: Aminophylline Challenge Study

Children with apnea were administered aminophylline

Apnea episodes were measured 16 hours after administration and compared with those in the 24 hours before administration

Average change for 13 children = 0.767

SD of the change for 13 children = 0.524

t-value for 12 df = 2.18

If we consider no change = 0, then,

Boundaries of the rejection region = $0 \pm 2.18 \times (0.524 / \sqrt{13})$ = -0.317 and 0.317

0.767 falls outside the interval from -0.317 to 0.317, that is, in the rejection region.

We reject the null hypothesis
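The paired comparison above reduces to a one-sample t-test on the differences. A minimal sketch using the summary figures from the slide, with the critical value 2.18 taken from the slide rather than computed:

```python
import math

mean_change = 0.767   # average change in apnea episodes
sd_change = 0.524     # standard deviation of the changes
n_children = 13
t_critical = 2.18     # from the slide, 12 degrees of freedom

standard_error = sd_change / math.sqrt(n_children)

# Half-width of the region around 0 where the null hypothesis is retained
margin = t_critical * standard_error

# Equivalent view: the observed t-statistic compared with the critical value
t_observed = (mean_change - 0) / standard_error
reject_null = abs(t_observed) > t_critical

print(round(margin, 3))
print(round(t_observed, 2))
print(reject_null)
```

Comparing the mean change with the margin and comparing the t-statistic with the critical value are two ways of stating the same decision.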


Sampling


Importance of Sampling

Save time and money

Measurements can be more accurate when done on smaller numbers

Therefore choose the method with the most accuracy and precision


Alternatives to Sampling

Census - Expensive

Volunteer based reporting

Early responders are different from late responders and both are different from members of the general public ("Worried Well")

Let the Interviewer Choose ("Choose those who are easiest to find")


Concepts of Sampling

Capture as many respondents as you can

Also try to capture data from nonrespondents

Response rates are 60% or less for postal questionnaires even after a 3rd posting, while they are 70–75% for interviewer-based sampling (Jennifer Kelsey et al., 2007)

For prevalence estimation, the completely healthy and those with disease may not want to participate

For common but untreatable conditions like back pain, people with intractable problems are over-represented, in the hope that "research" will solve their problems


Key Definitions of Terms of Sampling

Sampling Unit = the basic unit around which sampling is planned (households, persons)

Sampling Frame = the collection of sampling units

Probability Sampling = where each sampling unit has a nonzero probability of being included in the sample

Nonprobability Sampling = Convenience Sampling


Types of Probability Sampling

Simple Random Sampling

Systematic Sampling

Stratified Sampling

Cluster Sampling

Multistage Sampling


Simple Random Sampling

Each unit has EQUAL probability of being included

Uses Random Numbers Table

With Replacement and Without Replacement (see R examples)
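The lecture's own examples are in R and are not reproduced here. As a rough equivalent, the following Python sketch draws a simple random sample with and without replacement from a hypothetical sampling frame of 20 numbered units; the frame, seed, and sample size are assumptions made for the example.

```python
import random

rng = random.Random(42)          # fixed seed so the draw is reproducible
frame = list(range(1, 21))       # hypothetical sampling frame: units 1..20

# Without replacement: each unit can appear at most once
without_replacement = rng.sample(frame, k=5)

# With replacement: the same unit may be drawn more than once
with_replacement = rng.choices(frame, k=5)

print(without_replacement)
print(with_replacement)
```

In both cases every unit in the frame has the same chance of selection on each draw, which is the defining property of simple random sampling.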


Problems of Simple Random Sampling

Investigator needs to know the sampling frame before starting

If the randomising process is not robust or well done, there can be errors

Not suitable for all situations

Problem: if the investigator wants to find out family size by conducting simple random sampling of children in a school, there is a problem.

Children from larger families will be oversampled and this can lead to errors


Systematic Sampling

The sampling units are regularly spaced throughout the sampling frame

Investigator selects every kth unit

Advantages: investigator does not need to know the sampling frame in advance

Example: every 3rd newborn child in a hospital
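Systematic sampling can be sketched in a few lines. The example below assumes a hypothetical list of 30 newborns in order of birth and selects every 3rd one after a random start; the function name `systematic_sample` and the data are illustrative.

```python
import random

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k units, then take every kth unit."""
    start = rng.randrange(k)
    return frame[start::k]

rng = random.Random(7)                  # fixed seed for reproducibility
newborns = list(range(1, 31))           # hypothetical births, in order of arrival

selected = systematic_sample(newborns, k=3, rng=rng)
print(selected)
```

The random start is a common refinement; always starting at the first unit would give the same spacing but removes the only random element in the design.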


Advantages and Disadvantages of Systematic Sampling

Simple to implement (just select the nth sample unit)

Can capture patterns easily

If a cyclical pattern exists, systematic sampling can miss the pattern entirely, e.g., seasonal trends such as flu patterns

Cannot estimate the variance of the population reliably from a SINGLE sample; needs at least two samples


Stratified Sampling

Divide population into strata or uniform groups

Draw Sample from each stratum

Represents Each subgroup

Can get more precise estimates than a corresponding simple random sample

Can Assign Weights

Widely Used Strategy

Disadvantage: estimates suffer if too few units are selected from some strata compared with others


Cluster Sampling

Sample Clusters rather than individuals

In the sampling frame, identify clusters (say classrooms, or households, or similar units)

Then, in each cluster, examine everyone within these clusters

Want to study prevalence of dental caries in schoolchildren? Divide schools into classrooms, sample individual classrooms, and examine all children in the sampled classrooms


Advantages of Cluster Sampling

Need not enumerate entire population in advance

Economical Use of Resources


Multistage Sampling

Identify Primary Sampling Units that are Larger

From the Primary Sampling Units identify secondary sampling units

Sample from the secondary sampling units or extend the process further

Different from Cluster Sampling

In cluster sampling one selects everyone from the secondary unit; here the secondary unit is itself sampled

Can use different sampling procedures at different stages


Sample Size Calculations

We need to know at least:

How variable are the data

How willing you are to conclude incorrectly that there is an effect when there is none (Type I error)

What is the magnitude of effect you want to detect

What is the certainty with which you want to detect the effect (power)


Importance of these criteria

The more variation in data, the more observations you need

The more certain you want to be, the more observations you will need

If the difference is very large, you need fewer people

If the difference is very small, you need more people


The Formula

$N = \dfrac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}$

where $\Delta = (\mu_1 - \mu_2) / \sigma$ is the standardized difference
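The formula can be turned into a small calculator. The sketch below assumes that N is the number needed in each of two equal groups, which the slide does not state explicitly, and uses a two-sided α of 0.05 and 80% power as illustrative defaults; the function name `sample_size_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, alpha=0.05, power=0.80):
    """N = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2, rounded up.

    delta is the standardized difference (mu1 - mu2) / sigma.
    Assumes N refers to each of two equal-sized groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / delta ** 2)

n_half_sd = sample_size_per_group(0.5)       # detect a difference of half an SD
n_quarter_sd = sample_size_per_group(0.25)   # detect a difference of a quarter SD

print(n_half_sd)
print(n_quarter_sd)
```

Halving the standardized difference roughly quadruples the required sample size, because the difference enters the formula as a square.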


What is the significance of this formula

The standardized difference enters the formula as a square

The narrower the difference, the larger the sample size required; halving the difference quadruples the size


Summary

This brief tour provides a snapshot of core statistical thinking

We focused on relevant study design issues

We learned about basic probability

We learned about Distributions (Z, T)

We learned about principles of estimation and hypothesis testing

We learned about sampling and sample sizes