Inference
Chapter 10
What is a population?
A population is the complete set of patients (or subjects or observations) that we hope to learn about.
Examples: Mother-infant pairs, patients with a previous hernia repair, black or white patients on dialysis
Why not collect data from an entire population?
• $$$$
• Impossible
In order to learn about a population, we collect data on a subset of the population. The subset is a sample.
Selecting and collecting data from a subset of the population is sampling.
A few ways one may sample a population
+ Random sample
+ Systematic sample
+ Stratified sample
+ Convenience sample
Remember: Before one can sample, one must clearly define the population.
What is the consequence of collecting data from a subset instead of collecting data from the full population?
Sampling error or estimation error or variation or uncertainty or loss of precision
Embrace uncertainty
Communicate the degree of uncertainty with conclusions drawn from samples.
Examples: confidence intervals, probabilities, interquartile range
How do we learn from a sample?
Population and Sample
A parameter is a numerical measurement that describes a characteristic of a population.
A statistic is a numerical measurement that describes a characteristic of a sample.
In general, we will use a statistic to infer something about a parameter
• Previously in the NHANES study, we computed summary statistics such as sample means and variances or sample proportions to describe our sample.
• Now, not only do we want to describe the sample, we want to learn something about the population.
• Learn about population = inference about population
Population and Sample
• What might we learn about a population?
▫ Population mean (μ): Average value assumed by a random variable, also called the expected value.
▫ Population variance (σ²): Variability of the random variable. May also use the population standard deviation (σ).
▫ Other population parameters: median, 25th- or 75th-quantile
Statistical Inference
Statistical inference consists of:
• Estimation: Use sample to estimate population parameter(s) of interest.
▫ Proportion of low birthweight infants.
• Hypothesis testing: Use sample to evaluate population parameter(s).
▫ For now, we will concentrate on estimation.
Statistical Inference
• Estimation consists of point estimates and interval estimates:
▫ Point estimate: A “best guess” of the population parameter based on the sample. Example: the sample mean Ȳ is the best guess for the population mean.
▫ Interval estimate: A range of “reasonable values”, accounting for sampling variability of the point estimate. Example: a range of reasonable values for the mean, centered on Ȳ.
• IMPORTANT: The sample must be representative of the population of interest.
Example:
Inadvertent enterotomy (IE) occurs during abdominal repair
(like hernia repairs) when an incision is unintentionally made in
the intestine.
Example (cont):
Suppose: In a cohort of 3000 laparoscopic ventral hernia repairs, there were 120 IE. Among 2000 open ventral hernia repairs, there were 40 IE.
What is the estimated proportion (and 95% CI) in each group?
What is the difference in IE proportions (and 95% CI) between laparoscopic and open?
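One way to approach these two questions in R is sketched below. The variable names are illustrative, and the choice of `prop.test` without continuity correction is an assumption; the slides do not say which interval method is intended, and other methods give slightly different bounds.

```r
# Counts from the inadvertent enterotomy (IE) example
ie_lap  <- 120; n_lap  <- 3000   # laparoscopic repairs
ie_open <- 40;  n_open <- 2000   # open repairs

# Point estimates of the IE proportion in each group
p_lap  <- ie_lap / n_lap
p_open <- ie_open / n_open

# 95% CI for each proportion (prop.test returns a Wilson score interval
# for a single proportion; correct = FALSE drops the continuity correction)
ci_lap  <- prop.test(ie_lap, n_lap, correct = FALSE)$conf.int
ci_open <- prop.test(ie_open, n_open, correct = FALSE)$conf.int

# Difference in proportions (laparoscopic minus open) and its 95% CI
diff_est <- p_lap - p_open
diff_ci  <- prop.test(c(ie_lap, ie_open), c(n_lap, n_open),
                      correct = FALSE)$conf.int

print(c(p_lap = p_lap, p_open = p_open, diff = diff_est))
print(diff_ci)
```

The group estimates are 4% and 2%, so the estimated difference is 2 percentage points; the interval for the difference shows which differences the data support.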
So what? Why does one care about the difference in proportions and 95% CI?
Confidence Intervals & Inference about the Population
A confidence interval is a type of inference about the larger population. The interval represents the set of population parameters supported by the data.
Inference by CI - population mean
Example: What is the mean birthweight (and 95% CI) of infants
for mothers that do not smoke, calculated from the Bayside
hospital data?
Example (cont):
. ci means bwt if smoke == 0

    Variable |   Obs        Mean    Std. Err.    [95% Conf. Interval]
         bwt |   115    3054.957     70.1625     2915.965    3193.948
Inference by CI - population proportion
Example: What is the proportion and 95% CI of mothers that
smoke? (Respond on Top Hat.)
Example (cont):
. ci proportions smoke, wilson

                                                    Wilson
    Variable |   Obs    Proportion    Std. Err.    [95% Conf. Interval]
       smoke |   189      .3915344    .0355036      .324772    .4626181
Inference by CI – population rate
Example: What is the rate of death and 95% CI in the placebo
population in the Primary Biliary Cirrhosis Trial (liver.dta)?
Example (cont):
. use liver.dta
. gen ot100k = obstime /100000
. ci means status if tx == 0, poisson exposure(ot100k)

                                                      Poisson Exact
    Variable |   Exposure        Mean    Std. Err.    [95% Conf. Interval]
      status |    3.06523    19.57439    2.527043     14.93732    25.19612
Inference by CI – difference in population means
Example: Difference and CI of mean birthweight between mothers who do and do not smoke (lowbwt.dta).
Example (cont):
. ttest bwt, by(smoke) unequal

Two-sample t test with unequal variances
       Group |   Obs        Mean    Std. Err.    Std. Dev.    [95% Conf. Interval]
    nonsmoke |   115    3054.957     70.1625      752.409     2915.965    3193.948
      smoker |    74    2773.243    76.73218     660.0752     2620.316     2926.17
    combined |   189    2944.656    53.02858     729.0224     2840.049    3049.264
        diff |          281.7133    103.9741                  76.46677    486.9598
Inference by CI – difference in population proportions
See example of inadvertent enterotomy above.
One can also make inference about the population with hypothesis testing.
There is a connection with confidence intervals.
Generally, studies are designed to identify
• Conclusive differences
or
• Conclusive similarities
Because of noise or error, a study may also generate
• Conclusive differences
or
• Conclusive similarities
or
• Inconclusive results
To discover conclusive similarities
• A study will a priori establish an equivalence threshold.
Other related terms:
• Null region
• Region of practical equivalence
• A conclusive similarity is identified when the confidence interval for the difference falls within the equivalence threshold.
To discover conclusive differences
• A conclusive difference is identified when the confidence
interval for the difference falls outside the equivalence
threshold.
An inconclusive result
• An inconclusive result occurs when the confidence interval
for the difference straddles the equivalence threshold.
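The three rules above can be written as a small decision function. This is a minimal sketch, not part of the course materials: the function name `classify_ci` is hypothetical, and it assumes a symmetric equivalence region from -threshold to +threshold around a difference of zero.

```r
# Classify a confidence interval for a difference against a symmetric
# equivalence region [-threshold, +threshold] (hypothetical helper)
classify_ci <- function(lower, upper, threshold) {
  stopifnot(lower <= upper, threshold > 0)
  if (lower >= -threshold && upper <= threshold) {
    "conclusive similarity"    # CI lies entirely inside the region
  } else if (lower > threshold || upper < -threshold) {
    "conclusive difference"    # CI lies entirely outside the region
  } else {
    "inconclusive"             # CI straddles a boundary of the region
  }
}

# Usage with a 2.5 percentage point threshold
print(classify_ci(-1.7, 3.3, threshold = 2.5))
print(classify_ci(0.1, 1.6, threshold = 2.5))
print(classify_ci(3.0, 5.0, threshold = 2.5))
```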
Same framework applies to ratios
[Figure: confidence intervals for a ratio, labeled conclusive difference, conclusive similarity, and inconclusive.]
Example
Suppose surgeons wish to establish that the rate of surgical site infections is equivalent for robotic and laparoscopic hernia repair. The surgeons decide on a 2.5 percentage point difference as the threshold.
Example
If 89 in a cohort of 1000 laparoscopic repairs and 97 in a cohort of 1000 robotic repairs experience an infection, then …
Inconclusive: the CI for the difference, (-1.7, 3.3) percentage points, straddles the 2.5 percentage point threshold.
Example
However, if 892 in a cohort of 10000 laparoscopic repairs and 967 in a cohort of 10000 robotic repairs experience an infection, then …
Conclusive similarity: the CI for the difference, (-0.0, 1.6) percentage points, falls within the 2.5 percentage point threshold.
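The intervals in this example can be approximated with a Wald interval for a difference in proportions. The sketch below is an assumption about the method (the slides do not name it) and uses a hypothetical helper `wald_diff_ci`; a different interval method may shift the bounds slightly.

```r
# 95% Wald CI for a difference in proportions (group 1 minus group 2),
# reported in percentage points. Hypothetical helper for illustration.
wald_diff_ci <- function(x1, n1, x2, n2, level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z  <- qnorm(1 - (1 - level) / 2)
  100 * c(lower = (p1 - p2) - z * se, upper = (p1 - p2) + z * se)
}

threshold <- 2.5  # equivalence threshold in percentage points

# Robotic minus laparoscopic infection proportions
small <- wald_diff_ci(97, 1000, 89, 1000)      # cohorts of 1000
large <- wald_diff_ci(967, 10000, 892, 10000)  # cohorts of 10000

# TRUE when the whole interval sits inside [-threshold, threshold]
within_region <- function(ci, threshold) all(abs(ci) <= threshold)

print(round(small, 1)); print(within_region(small, threshold))
print(round(large, 1)); print(within_region(large, threshold))
```

The observed difference is nearly the same in both scenarios; only the larger cohorts make the interval narrow enough to fit inside the equivalence region.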
It is common to have a point null instead of a null region
[Figure: with a point null, a confidence interval yields either a conclusive difference or an inconclusive result.]
Notice that there is no possibility of identifying a conclusive similarity.
A common mistake is to interpret an inconclusive result as evidence for a conclusive similarity.
You might see in a manuscript: The rates of adverse events were the same in the intervention and placebo groups.
Why can this be misleading?
• There usually is no threshold for equivalence
• The statement is usually based on a point null
• Fails to communicate the degree of differences supported by the data
Better practice: Use the CI to communicate the degree of uncertainty.
Hypothesis Testing, formalized
• DEFINITIONS:
▫ Hypothesis: A statement about a population parameter. Example: mean bilirubin.
▫ Null Hypothesis (H0): The default claim.
Example: mean bilirubin = 2.
▫ Alternative Hypothesis (H1): The competing claim.
Example: mean bilirubin > 2 (one-sided)
Example: mean bilirubin < 2 (one-sided)
Example: mean bilirubin ≠ 2 (two-sided)
One-sided: only interested in detecting differences in one direction
• mean bilirubin > 2 (one-sided): Only care about means greater than 2; means less than 2 are ignored.
• mean bilirubin < 2 (one-sided): Only care about means less than 2; means greater than 2 are ignored.
Two-sided: the usual setup
• mean bilirubin ≠ 2 (two-sided): Care to detect mean differences in any direction.
In general, for a two-sided test with point null
H0: mean = 2
H1: mean ≠ 2
You will see the phrase “rejected the null” when a conclusive difference is identified.
In my opinion, it is more straightforward to write: “detected a difference in …”
Hypothesis Testing
ONE-SIDED OR TWO-SIDED HYPOTHESIS TESTS
Remember, if there is a point null: One cannot identify conclusive similarities. Therefore, one does not “accept the null” nor “declare it to be true”.
One can perform a hypothesis test using a confidence interval.
How to perform a hypothesis test with point null via CI
• Compute a 1-α confidence interval for the parameter of interest.
• Is the null hypothesis value in the CI?
▫ Yes: Fail to reject H0 (Inconclusive result)
▫ No: Reject H0 (Conclusive difference)
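The decision rule can be sketched as a short R function. The helper name `test_via_ci` and the birthweight values are made up for illustration; the function simply checks whether the null value falls inside the t-based confidence interval returned by `t.test`.

```r
# Point-null hypothesis test for a mean via a (1 - alpha) confidence interval
# (hypothetical helper; x is a numeric sample, null_value the H0 mean)
test_via_ci <- function(x, null_value, alpha = 0.05) {
  ci <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)
  in_ci <- ci[1] <= null_value && null_value <= ci[2]
  list(
    ci = ci,
    decision = if (in_ci) {
      "fail to reject H0 (inconclusive result)"
    } else {
      "reject H0 (conclusive difference)"
    }
  )
}

# Made-up birthweights in grams, for illustration only
x <- c(2600, 2900, 3100, 2750, 2500, 3300, 2850, 2700, 2950, 2650)

res_3100 <- test_via_ci(x, null_value = 3100)
res_2900 <- test_via_ci(x, null_value = 2900)
print(res_3100)
print(res_2900)
```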
Hypothesis Testing via CI
Example: Low Birthweight Study
Suppose there is a concern that birth weights of infants among mothers
who smoked during pregnancy may be below the national average.
Suppose the national average is 3100g, and we will consider it a
population mean.
data(birthwt, package = "MASS")
Hypothesis Testing via CI
Example (cont):
To assess our study question, we will express our test hypotheses as:
Among smoking mothers,
H0: mean birthweight = 3100g
H1: mean birthweight ≠ 3100g.
Hypothesis Testing via CI
Example (side note):
Why is H1 ≠ 3100g instead of < 3100g? Even though our concern is low
birthweight infants, if smoking mothers are having babies that are
much heavier than the national average, there may be some other
unrealized problem.
Hypothesis testing via CI
Example (cont):
Calculate the 95% CI:
library(dplyr)
birthwt %>% filter(smoke == 1) %>% pull(bwt) %>% t.test
...
95 percent confidence interval:
 2619.094 2924.744
...
What does one conclude from the CI?
How to perform a hypothesis test via CI
Example (cont):
95% CI: (2619, 2925)
Is the null hypothesis value (3100g) in the CI?
▫ Yes: Fail to reject H0
▫ No: Reject H0
Here 3100g is not in the CI, so we reject H0.
Could our inference be wrong? (Yes)
Hypothesis Testing: the ways we can be wrong, the ways we can be right

                      H0 true             H0 false
Reject H0             Type I Error (α)    Correct
Do not reject H0      Correct             Type II Error (β)

In the context of the low birthweight study:
H0 true: mean birthweight = 3100g
H0 false: mean birthweight ≠ 3100g
Example: Type I Error
A type I error is made if we conclude mean birthweight ≠ 3100g (the alternative hypothesis) when, in fact, the mean birthweight = 3100g (the null hypothesis).
Hypothesis test via CI
• The hypothesis test by 100(1-α)% CI has a type I error rate of α.
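A small simulation can illustrate this claim. The sketch below assumes normally distributed data with a true mean equal to the null value; the sample size, standard deviation, seed, and number of simulations are arbitrary choices for illustration.

```r
set.seed(1)          # arbitrary seed so the run is reproducible
alpha  <- 0.05
mu0    <- 3100       # null value, and also the true mean in this simulation
n_sims <- 2000

# For each simulated study, draw data under H0 and record whether the
# (1 - alpha) CI excludes mu0, i.e. whether we commit a type I error
rejected <- replicate(n_sims, {
  x  <- rnorm(50, mean = mu0, sd = 700)
  ci <- t.test(x, conf.level = 1 - alpha)$conf.int
  mu0 < ci[1] || mu0 > ci[2]
})

type1_rate <- mean(rejected)
print(type1_rate)    # should land close to alpha
```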
Example: Type II Error
A type II error is made if we fail to reject the null hypothesis that mean birthweight = 3100g when, in fact, the mean birthweight ≠ 3100g (the alternative hypothesis).
Notation
• α = Probability of a Type I Error
• β = Probability of a Type II Error
• 1 - β = Power
Hypothesis test via CI: Proportion
Example:
Suppose that the proportion of pregnant mothers that smoke in the US is known to be 15%. Is the proportion of mothers that smoke in the low birthweight study different from the national proportion?
1. Write the null and alternative hypotheses.
H0: p = .15
H1: p ≠ .15
2. Calculate the CI.
birthwt %>% pull(smoke) %>% table %>% rev %>% prop.test
3. Perform the hypothesis test. (What do you conclude?)
Hypothesis test via CI: Rate
Example: Is the incidence rate of death in the treatment arm the same as in the placebo arm in the Primary Biliary Cirrhosis Trial?
data(pbc, package = "survival")
1. Write the hypotheses in terms of the incidence rate ratio (IRR).
H0: IRR = 1
H1: IRR ≠ 1
2. Using the 95% CI, perform the hypothesis test.
pbc %>%
  filter(trt %in% 1:2) %>%
  mutate(death = 1*(status == 2)) %>%
  group_by(trt) %>%
  summarize(
    obstime = sum(time/365.25/10),
    events = sum(death)
  )
poisson.test(c(65, 60), c(87.2, 84.2))
What other CIs can you calculate for the purposes of performing a hypothesis test?
Difference in population means, difference in proportions, etc.
One-Sided Hypothesis Tests
The confidence interval method also works for one-sided hypotheses. One simply calculates a one-sided confidence interval.
One-Sided Hypothesis Tests
Example: Low Birthweight Study
Consider the one-sided hypothesis test for birthweight from smoking
mothers:
H0: mean birthweight ≥ 3100g
H1: mean birthweight < 3100g
One-Sided Hypothesis Tests
Example: Low Birthweight Study
Is the null region in the CI?
birthwt %>% filter(smoke == 1) %>% select(bwt) %>% t.test(alternative = "less")
One-Sided Hypothesis Tests
Words of Warning about One-Sided Tests:
• The choice between a one-sided and a two-sided test can be controversial.
• A one-sided test can sometimes achieve significance when a two-sided test does not.
• Studies designed under a one-sided setting have more power (or need fewer subjects) than the two-sided counterpart (will discuss this in greater detail later).
One-Sided Hypothesis Tests
• There are few studies based on one-sided hypotheses in the
medical literature.
• If you do encounter one, read it with a very critical eye.
Hypothesis testing via the p-value method
(Warning: p-values can be highly addictive. Use sparingly and with caution.)
A heuristic approach to p-values
Recall the following hypothesis from the Low Birthweight
Study:
Among smoking mothers,
H0: mean birthweight = 3100g
H1: mean birthweight ≠ 3100g.
A heuristic approach to p-values

Confidence interval                 α level of hypothesis test
The 95% CI: (2620, 2926)            0.05
The 99% CI: (2570, 2976)            0.01
The 99.9% CI: (2510, 3036)          0.001
The 99.999% CI: (2408, 3137)        0.00001

At what α level does the confidence interval tip from “reject the null” to “fail to reject the null”?
A heuristic approach to p-values
The p-value is the tipping-point α.
[Figure: α axis marked at 0.05, 0.01, 0.001, and 0.00001; the 100(1 – p-value)% CI marks the boundary between “reject” and “fail to reject”.]
Why is knowing the tipping point helpful?
It indicates the maximum confidence level for which one would reject the null.
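The tipping-point idea can be checked numerically: search for the α at which the null value sits exactly on the boundary of the 100(1 − α)% CI, then compare it with the p-value reported by `t.test`. This is a sketch with made-up birthweights; the helper name `boundary_gap` is hypothetical.

```r
# Made-up birthweights in grams, for illustration only
x   <- c(2600, 2900, 3100, 2750, 2500, 3300, 2850, 2700, 2950, 2650)
mu0 <- 3100   # null hypothesis value

# Negative when mu0 is inside the (1 - alpha) CI, positive when outside,
# and zero when mu0 sits exactly on a CI bound
boundary_gap <- function(alpha) {
  ci <- t.test(x, conf.level = 1 - alpha)$conf.int
  (ci[1] - mu0) * (ci[2] - mu0)
}

# Find the alpha where the CI tips from excluding mu0 to including it
tipping_alpha <- uniroot(boundary_gap, interval = c(1e-8, 0.99),
                         tol = 1e-12)$root

# Two-sided p-value from the usual one-sample t test
p_value <- t.test(x, mu = mu0)$p.value

print(c(tipping_alpha = tipping_alpha, p_value = p_value))
```

The two printed numbers agree up to the root-finding tolerance, which is the sense in which the p-value is the tipping-point α.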
A heuristic approach to p-values
Another example:
Recall the low birthweight study and the hypothesis regarding the
proportion of mothers that smoke during pregnancy. Suppose
H0: proportion = 0.3
H1: proportion ≠ 0.3
A heuristic approach to p-values
Another example (cont):
One might perform the 0.05 level test by calculating a 95% confidence interval.
Or, one might calculate the p-value.
> birthwt %>% pull(smoke) %>% table %>% rev %>% prop.test(p = 0.3)

        1-sample proportions test with continuity correction

data:  ., null probability 0.3
X-squared = 7.1111, df = 1, p-value = 0.007661
alternative hypothesis: true p is not equal to 0.3
95 percent confidence interval:
 0.3222615 0.4652911
sample estimates:
        p
0.3915344

What will be the bound for the 100(1-0.007661)% CI?
> birthwt %>% pull(smoke) %>% table %>% rev %>% prop.test(conf.level = 1-0.007661)

        1-sample proportions test with continuity correction

data:  ., null probability 0.5
X-squared = 8.4656, df = 1, p-value = 0.003619
alternative hypothesis: true p is not equal to 0.5
99.2339 percent confidence interval:
 0.3000003 0.4911500
sample estimates:
        p
0.3915344

The lower bound of this CI (0.3000003) is the null hypothesis value, 0.3.
Calculating p-values in R
• The same commands we used to generate confidence intervals for differences in means, proportions, and rates also generate p-values for hypothesis tests.
Family-wise error rates
● Rather than thinking about a single hypothesis test, consider a group of hypotheses.
– The family-wise type I error rate is the probability of making at least one type I error within the group of tests.
● Thought experiment (10.16.1 in text)
– Say we have 250 pennies, and we wish to determine whether any are unbalanced, i.e., have probability p of heads different from 0.5.
– Study design: (a) flip each coin 100 times. (b) perform a hypothesis test with the data from the coin flips
– Question: What is the family-wise type I error rate?
R <- 5000        # number of simulated experiments
pennies <- 250   # coins per experiment
flips <- 100     # flips per coin
# Reject "fair coin" when the number of heads is less than 40 or greater than 60
l40g60 <- function(x){x < 40 | x > 60}

fwe <- rep(NA, R)
for(i in 1:R){
  data <- rbinom(pennies, flips, 0.5)   # every penny is fair, so any rejection is a type I error
  type1errors <- l40g60(data)
  fwe[i] <- sum(type1errors) > 0        # TRUE if at least one type I error occurred
}
mean(fwe)        # estimated family-wise type I error rate
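The simulated value can be compared with an exact calculation. The sketch below assumes the 250 tests are independent (each penny is flipped separately) and uses the same rejection rule, fewer than 40 or more than 60 heads in 100 flips; the variable names are illustrative.

```r
pennies <- 250
flips   <- 100

# Type I error rate of a single test: P(X < 40) + P(X > 60)
# for X ~ Binomial(100, 0.5)
alpha_single <- pbinom(39, flips, 0.5) +
  pbinom(60, flips, 0.5, lower.tail = FALSE)

# Probability of at least one type I error among independent tests
fwer <- 1 - (1 - alpha_single)^pennies

print(c(alpha_single = alpha_single, fwer = fwer))
```

Each single test has a type I error rate of roughly 3.5%, yet across 250 fair pennies at least one false rejection is almost certain.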