Inference Chapter 10 - VUMC

What is a population?

A population is the complete set of patients (or subjects or observations) that we hope to learn about.

Examples: Mother-infant pairs, patients with a previous hernia repair, black or white patients on dialysis

Why not collect data from an entire population?

• $$$$
• Impossible

In order to learn about a population, we collect data on a subset of the population. The subset is a sample.

Selecting and collecting data from a subset of the population is sampling.

A few ways one may sample a population:
+ Random sample
+ Systematic sample
+ Stratified sample
+ Convenience sample

Remember: Before one can sample, one must clearly define the population.

What is the consequence of collecting data from a subset instead of collecting data from the full population?

Sampling error or estimation error or variation or uncertainty or loss of precision

Embrace uncertainty

Communicate the degree of uncertainty with conclusions drawn from samples.

Examples: confidence intervals, probabilities, interquartile range

How do we learn from a sample?

Population and Sample

A parameter is a numerical measurement that describes a characteristic of a population.

A statistic is a numerical measurement that describes a characteristic of a sample.

In general, we will use a statistic to infer something about a parameter







• Previously in the NHANES study, we computed summary statistics such as sample means and variances or sample proportions to describe our sample.

• Now, not only do we want to describe the sample, we want to learn something about the population.

• Learn about population = inference about population


• What might we learn about a population?

▫ Population mean (μ): Average value assumed by a random variable, also called the expected value.

▫ Population variance (σ²): Variability of the random variable. May also use the population standard deviation (σ).

▫ Other population parameters: median, 25th- or 75th-quantile

Statistical Inference

Statistical inference consists of:

• Estimation: Use sample to estimate population parameter(s) of interest.

▫ Proportion of low birthweight infants.

• Hypothesis testing: Use sample to evaluate population parameter(s).

▫ For now, we will concentrate on estimation.

Statistical Inference

• Estimation consists of point estimates and interval estimates:

▫ Point estimate: A “best guess” of the population parameter based on the sample.

(Figure: the sample mean, Ȳ, marked as the best guess for the population mean.)

▫ Interval estimate: A range of “reasonable values”, accounting for sampling variability of the point estimate.

• IMPORTANT: The sample must be representative of the population of interest.

(Figure: an interval around Ȳ showing the range of reasonable values for the mean.)

Example:

Inadvertent enterotomy (IE) occurs during abdominal repair (like hernia repairs) when an incision is unintentionally made in the intestine.

Example (cont):

Suppose: In a cohort of 3000 laparoscopic ventral hernia repairs, there were 120 IE. Among 2000 open ventral hernia repairs, there were 40 IE.

What is the estimated proportion (and 95% CI) in each group?

What is the difference in IE proportions (and 95% CI) between laparoscopic and open?
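The two questions above can be sketched in R. This is a minimal sketch, not necessarily the method used in class: it assumes `prop.test` without continuity correction (a Wilson score interval for each single proportion and a normal-approximation interval for the difference). The variable names are illustrative.

```r
# Counts taken from the example above
ie_lap  <- 120; n_lap  <- 3000   # laparoscopic repairs
ie_open <- 40;  n_open <- 2000   # open repairs

# Point estimates of the IE proportion in each group
p_lap  <- ie_lap / n_lap
p_open <- ie_open / n_open

# 95% CI for each proportion (Wilson score interval;
# correct = FALSE turns off the continuity correction)
ci_lap  <- prop.test(ie_lap,  n_lap,  correct = FALSE)$conf.int
ci_open <- prop.test(ie_open, n_open, correct = FALSE)$conf.int

# Difference in proportions (laparoscopic minus open) with a 95% CI
diff_test <- prop.test(c(ie_lap, ie_open), c(n_lap, n_open), correct = FALSE)
p_diff  <- p_lap - p_open
ci_diff <- diff_test$conf.int
```

Printing `ci_lap`, `ci_open`, and `ci_diff` shows the intervals; a different interval method (exact, or with continuity correction) would give slightly different bounds.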

So what? Why does one care about the difference in proportions and 95% CI?

Confidence Intervals & Inference about the Population

A confidence interval is a type of inference about the larger population. The interval represents the set of population parameters supported by the data.

Inference by CI - population mean

Example: What is the mean birthweight (and 95% CI) of infants for mothers that do not smoke, calculated from the Bayside hospital data?

Example (cont):

. ci means bwt if smoke == 0

    Variable |   Obs        Mean    Std. Err.    [95% Conf. Interval]
-------------+--------------------------------------------------------
         bwt |   115    3054.957     70.1625     2915.965    3193.948

Inference by CI - population proportion

Example: What is the proportion and 95% CI of mothers that smoke? (Respond on Top Hat.)

Example (cont):

. ci proportions smoke, wilson

                                                        Wilson
    Variable |   Obs   Proportion    Std. Err.    [95% Conf. Interval]
-------------+---------------------------------------------------------
       smoke |   189     .3915344    .0355036      .324772    .4626181

Inference by CI – population rate

Example: What is the rate of death and 95% CI in the placebo population in the Primary Biliary Cirrhosis Trial (liver.dta)?

Example (cont):

. use liver.dta

. gen ot100k = obstime /100000

. ci means status if tx == 0, poisson exposure(ot100k)

                                                     Poisson Exact
    Variable |  Exposure        Mean    Std. Err.    [95% Conf. Interval]
-------------+------------------------------------------------------------
      status |   3.06523    19.57439    2.527043     14.93732    25.19612

Inference by CI – difference in population means

Example: Difference and CI of mean birthweight between mothers who do and do not smoke (lowbwt.dta).

Example (cont):

. ttest bwt, by(smoke) unequal

Two-sample t test with unequal variances

   Group |   Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
nonsmoke |   115    3054.957     70.1625     752.409     2915.965    3193.948
  smoker |    74    2773.243    76.73218    660.0752     2620.316     2926.17
---------+--------------------------------------------------------------------
combined |   189    2944.656    53.02858    729.0224     2840.049    3049.264
---------+--------------------------------------------------------------------
    diff |          281.7133    103.9741                 76.46677    486.9598

Inference by CI – difference in population proportions

See example of inadvertent enterotomy above.

One can also make inference about the population with hypothesis testing.

There is a connection with confidence intervals.

Generally, studies are designed to identify conclusive differences or conclusive similarities.

Because of noise or error, a study may also generate inconclusive results.

To discover conclusive similarities

• A study will a priori establish an equivalence threshold.

Other related terms:
• Null region
• Region of practical equivalence

• A conclusive similarity is identified when the confidence interval for the difference falls within the equivalence threshold.

To discover conclusive differences

• A conclusive difference is identified when the confidence interval for the difference falls outside the equivalence threshold.

An inconclusive result

• An inconclusive result occurs when the confidence interval for the difference straddles the equivalence threshold.
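The three outcomes can be written as a small decision rule. The sketch below assumes a symmetric equivalence region (-threshold, threshold) around a difference of zero; the function name `classify_ci`, its arguments, and the example intervals are hypothetical and chosen only for illustration.

```r
# Classify a CI for a difference against a symmetric equivalence region
# (-threshold, threshold). Function name and arguments are hypothetical.
classify_ci <- function(lower, upper, threshold) {
  stopifnot(lower <= upper, threshold > 0)
  if (lower > -threshold && upper < threshold) {
    "conclusive similarity"      # CI entirely inside the region
  } else if (lower > threshold || upper < -threshold) {
    "conclusive difference"      # CI entirely outside the region
  } else {
    "inconclusive"               # CI straddles a boundary of the region
  }
}

# Made-up intervals (percentage points) with a 2.5 point threshold
inside   <- classify_ci(-0.5, 1.0, threshold = 2.5)
outside  <- classify_ci( 3.0, 6.0, threshold = 2.5)
straddle <- classify_ci(-1.0, 4.0, threshold = 2.5)
```

For ratios, the same logic would apply with a region around 1 instead of 0.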

Same framework applies to ratios

(Figure: confidence intervals for a ratio labeled conclusive difference, conclusive similarity, and inconclusive.)

Example

Suppose surgeons want to establish that the rate of surgical site infections is equivalent for robotic and laparoscopic hernia repair. The surgeons decide on a 2.5 percentage point difference as the equivalence threshold.

Example

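If 89 in a cohort of 1000 laparoscopic repairs and 97 in a cohort of 1000 robotic repairs experience an infection, then the 95% CI for the difference is (-1.7, 3.3) percentage points. The interval straddles the 2.5 percentage point threshold, so the result is inconclusive.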

Example

However, if 892 in a cohort of 10000 laparoscopic repairs and 967 in a cohort of 10000 robotic repairs experience an infection, then the 95% CI for the difference is (-0.0, 1.6) percentage points. The interval falls within the 2.5 percentage point threshold, so the result is a conclusive similarity.
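The intervals in the two surgical site infection examples can be approximated in R. The slides do not state which interval method was used, so this sketch assumes a Wald (normal approximation) interval for the difference in proportions, robotic minus laparoscopic; `wald_diff_ci` is a hypothetical helper written for this illustration, and its bounds agree with the reported values only to within rounding.

```r
# Wald (normal approximation) CI for a difference in proportions, p2 - p1.
# wald_diff_ci is a hypothetical helper, not a function from any package.
wald_diff_ci <- function(x1, n1, x2, n2, level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z  <- qnorm(1 - (1 - level) / 2)
  c(lower = (p2 - p1) - z * se, upper = (p2 - p1) + z * se)
}

# Robotic minus laparoscopic, converted to percentage points
ci_small <- 100 * wald_diff_ci(89, 1000, 97, 1000)
ci_large <- 100 * wald_diff_ci(892, 10000, 967, 10000)

threshold <- 2.5   # equivalence threshold in percentage points
straddles_small <- ci_small["lower"] < threshold && ci_small["upper"] > threshold
within_large    <- ci_large["lower"] > -threshold && ci_large["upper"] < threshold
```

The point estimates are nearly identical in the two cohorts (0.8 and 0.75 percentage points); only the tenfold larger sample narrows the interval enough to fit inside the threshold.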

It is common to have a point null instead of a null region


(Figure: with a point null, a confidence interval is classified as either a conclusive difference or inconclusive.)

Notice that there is no possibility of identifying a conclusive similarity.

A common mistake is to interpret an inconclusive result as evidence for a conclusive similarity.

You might see in a manuscript: The rates of adverse events were the same in the intervention and placebo groups.


Why can this be misleading?
• There usually is no threshold for equivalence
• The statement is usually based on a point null
• Fails to communicate the degree of differences supported by the data

Better practice: Use the CI to communicate the degree of uncertainty.


Hypothesis Testing, formalized

• DEFINITIONS:

▫ Hypothesis: A statement about a population parameter. Example: mean bilirubin.

▫ Null Hypothesis (H0): The default claim.

Example: mean bilirubin = 3.

▫ Alternative Hypothesis (H1): The competing claim.

Example: mean bilirubin > 3 (one-sided)

Example: mean bilirubin < 3 (one-sided)

Example: mean bilirubin ≠ 3 (double-sided or two-sided)

Only interested in detecting differences in one direction:

• mean bilirubin > 2 (one-sided): Only care about differences greater than 2; the other direction is ignored.
• mean bilirubin < 2 (one-sided): Only care about differences less than 2; the other direction is ignored.

The usual setup:

• mean bilirubin ≠ 2 (two-sided): Care to detect mean differences in any direction.

In general, for a two-sided test with point null
H0: mean = 2
H1: mean ≠ 2

You will see the phrase “rejected the null” when a conclusive difference is identified.

In my opinion, it is more straightforward to write: “detected a difference in …”

Hypothesis Testing

ONE-SIDED OR TWO-SIDED HYPOTHESIS TESTS

Remember, if there is a point null: One cannot identify conclusive similarities. Therefore, one does not “accept the null” nor “declare it to be true”.

One can perform a hypothesis test using a confidence interval.

How to perform a hypothesis test with point null via CI

• Compute a 1-α confidence interval for the parameter of interest.

• Is the null hypothesis value in the CI?
▫ Yes: Fail to reject H0 (Inconclusive result)
▫ No: Reject H0 (Conclusive difference)

We saw this earlier … now updated with hypothesis testing vocab.

Hypothesis Testing via CI

Example: Low Birthweight Study

Suppose there is a concern that birth weights of infants among mothers who smoked during pregnancy may be below the national average. Suppose the national average is 3100g, and we will consider it a population mean.

library(dplyr)  # provides %>%, filter(), select(), and pull() used in the code that follows
data(birthwt, package = "MASS")

Hypothesis Testing via CI

Example (cont):

To assess our study question, we will express our test hypotheses as:

Among smoking mothers,

H0: mean birthweight = 3100g

H1: mean birthweight ≠ 3100g.

Hypothesis Testing via CI

Example (side note):

Why is H1 ≠ 3100g instead of < 3100g? Even though our concern is low birthweight infants, if smoking mothers are having babies that are much heavier than the national average, there may be some other unrealized problem.

Hypothesis testing via CI

Example (cont):

Calculate the 95% CI:

birthwt %>% filter(smoke == 1) %>% select(bwt) %>% t.test

...
95 percent confidence interval:
 2619.094 2924.744
...

What does one conclude from the CI?

How to perform a hypothesis test via CI

Example (cont):

95% CI: (2619, 2925)

Is the null hypothesis value (3100g) in the CI?
• Yes: Fail to reject H0
• No: Reject H0

Because 3100g lies above the upper bound of the interval, the answer is No: reject H0.
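The yes/no decision can also be computed directly. This is a minimal sketch in base R that assumes the MASS package is installed (it supplies the `birthwt` data used in this example); the variable names are illustrative.

```r
# Birthweights (grams) of infants whose mothers smoked, from MASS::birthwt
bwt_smokers <- MASS::birthwt$bwt[MASS::birthwt$smoke == 1]

null_value <- 3100                        # H0: mean birthweight = 3100g
ci <- t.test(bwt_smokers, conf.level = 0.95)$conf.int

# Reject H0 exactly when the null value falls outside the CI
reject_h0 <- null_value < ci[1] || null_value > ci[2]
decision  <- if (reject_h0) "Reject H0" else "Fail to reject H0"
```

Changing `conf.level` changes the α of the test: a 99% interval corresponds to a 0.01 level test.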

Could our inference be wrong? (Yes)

Hypothesis Testing

The ways we can be wrong and the ways we can be right:

                     H0 true             H0 false
Reject H0            Type I Error (α)    Correct
Do not reject H0     Correct             Type II Error (β)

In the context of the low birthweight study:

                     H0 true                     H0 false
                     mean birthweight = 3100g    mean birthweight ≠ 3100g
Reject H0            Type I Error (α)            Correct
Do not reject H0     Correct                     Type II Error (β)


Example: Type I Error

A type I error is made if we conclude that mean birthweight ≠ 3100g (the alternative hypothesis) when, in fact, the mean birthweight = 3100g (the null hypothesis).

Hypothesis test via CI

• The hypothesis test by 100(1-α)% CI has a type I error rate of α.


Example: Type II Error

A type II error is made if we fail to reject the null hypothesis that mean birthweight = 3100g when, in fact, the mean birthweight ≠ 3100g (the alternative hypothesis).

Notation

• α = Probability of a Type I Error

• β = Probability of a Type II Error

• 1 - β = Power
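A small simulation can make α, β, and power concrete. Everything in this sketch is an assumption chosen for illustration: normally distributed birthweights, a population SD of 660g, 74 subjects per study, 2000 simulated studies, and a true mean of 2900g for the H0-false scenario.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n_sims <- 2000                   # simulated studies (assumed)
n      <- 74                     # subjects per study (assumed)
sigma  <- 660                    # population SD in grams (assumed)
null_value <- 3100

# TRUE when the 95% CI from one simulated study excludes the null value
rejects <- function(true_mean) {
  ci <- t.test(rnorm(n, true_mean, sigma), conf.level = 0.95)$conf.int
  null_value < ci[1] || null_value > ci[2]
}

# H0 true: every rejection is a type I error
alpha_hat <- mean(replicate(n_sims, rejects(3100)))

# H0 false (true mean 2900g, assumed): rejections are correct,
# so the rejection fraction estimates power
power_hat <- mean(replicate(n_sims, rejects(2900)))
beta_hat  <- 1 - power_hat       # estimated type II error rate
```

Under these assumptions `alpha_hat` should land near 0.05, while `power_hat` depends on how far the true mean is from 3100g and on the sample size.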


Hypothesis test via CI: Proportion

Example:

Suppose that the proportion of pregnant mothers that smoke in the US is known to be 15%. Is the proportion of mothers that smoke in the low birthweight study different than the national proportion?


Example (cont):

1. Write the null and alternative hypotheses.

H0: p = .15
H1: p ≠ .15

2. Calculate the CI.

birthwt %>% pull(smoke) %>% table %>% rev %>% prop.test

3. Perform the hypothesis test. (What do you conclude?)
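The three steps can be sketched in base R as follows, assuming the MASS package is installed for the `birthwt` data. `prop.test` applies a continuity correction by default, which matches the command shown in step 2; the variable names are illustrative.

```r
smoke <- MASS::birthwt$smoke          # 1 = mother smoked during pregnancy
x <- sum(smoke == 1)                  # number of smokers
n <- length(smoke)                    # number of mothers

null_p <- 0.15                        # H0: p = 0.15
ci <- prop.test(x, n, p = null_p, conf.level = 0.95)$conf.int

# Reject H0 when the null proportion falls outside the 95% CI
reject_h0 <- null_p < ci[1] || null_p > ci[2]
```

Here the interval lies well above 0.15, so the test rejects H0.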


Hypothesis test via CI: Rate

Example: Is the incidence rate of death in the treatment arm the same as in the placebo arm in the Primary Biliary Cirrhosis Trial?

data(pbc, package = "survival")


Example (cont):

1. Write the hypothesis in terms of the incidence rate ratio (IRR).

H0: IRR = 1
H1: IRR ≠ 1

2. Using the 95% CI, perform the hypothesis test.


pbc %>%
  filter(trt %in% 1:2) %>%
  mutate(death = 1*(status == 2)) %>%
  group_by(trt) %>%
  summarize(
      obstime = sum(time/365.25/10)
    , events = sum(death)
  )

poisson.test(c(65, 60), c(87.2, 84.2))
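To finish step 2, one can pull the IRR and its interval out of the `poisson.test` result and check whether 1 lies inside. This sketch reuses the event counts and person-time totals shown above; the variable names are illustrative.

```r
# Event counts and person-time (decades) as given in the example above
events     <- c(65, 60)
persontime <- c(87.2, 84.2)

irr_test <- poisson.test(events, persontime, r = 1, conf.level = 0.95)
irr <- unname(irr_test$estimate)      # estimated incidence rate ratio
ci  <- irr_test$conf.int              # 95% CI for the IRR

# Reject H0: IRR = 1 only when 1 lies outside the CI
reject_h0 <- 1 < ci[1] || 1 > ci[2]
```

With these counts the interval contains 1, so the test fails to reject H0. With a point null, that is an inconclusive result, not evidence that the rates are the same.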

What other CI can you calculate for the purposes of performing a hypothesis test?

Difference in population means, difference in proportions, etc.

One-Sided Hypothesis Tests

The confidence interval method also works for one-sided hypotheses. One simply calculates a one-sided confidence interval.



Consider the one-sided hypothesis test for birthweight from smoking mothers:

H0: mean birthweight ≥ 3100g

H1: mean birthweight < 3100g



Is the null region in the CI?

birthwt %>% filter(smoke == 1) %>% select(bwt) %>% t.test(alternative = "less")
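A one-sided interval has only one finite bound, so the check is whether that bound reaches the null region. A minimal sketch in base R, assuming the MASS package is installed for the `birthwt` data:

```r
bwt_smokers <- MASS::birthwt$bwt[MASS::birthwt$smoke == 1]

# alternative = "less" gives a one-sided 95% CI of the form (-Inf, upper)
one_sided <- t.test(bwt_smokers, mu = 3100, alternative = "less")
ci <- one_sided$conf.int

# The null region is [3100, Inf). Reject H0 when the CI does not reach it.
reject_h0 <- ci[2] < 3100
```

The upper bound of the one-sided interval is lower than the upper bound of the two-sided 95% interval, which is why a one-sided test can reject when a two-sided test does not.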


Words of Warning about One-Sided Tests:

• The choice between a one-sided and a two-sided test can be controversial.

• A one-sided test can sometimes achieve significance when a two-sided test does not.

• Studies designed under a one-sided setting have more power (or need fewer subjects) than the two-sided counterpart (will discuss this in greater detail later).

• There are few studies based on one-sided hypotheses in the medical literature.

• If you do encounter one, read it with a very critical eye.

Hypothesis testing via the p-value method

(Warning: p-values can be highly addictive. Use sparingly and with caution.)

A heuristic approach to p-values

Recall the following hypothesis from the Low Birthweight Study:

Among smoking mothers,

H0: mean birthweight = 3100g

H1: mean birthweight ≠ 3100g.

A heuristic approach to p-values

Confidence Interval               α level of hypothesis test
The 95% CI: (2620, 2926)          0.05
The 99% CI: (2570, 2976)          0.01
The 99.9% CI: (2510, 3036)        0.001
The 99.999% CI: (2408, 3137)      0.00001

At what α level does the confidence interval tip from “reject the null” to “fail to reject the null”?


The p-value is the tipping-point α.

(Figure: α axis marked at 0.05, 0.01, 0.001, and 0.00001; the 100(1 – p-value)% CI sits at the boundary between “reject” and “fail to reject”.)

Why is knowing the tipping point helpful?

It indicates the maximum confidence level for which one would reject the null.
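The tipping-point idea can be checked numerically: build the CI at confidence level 1 minus the p-value and see where its bound lands. A minimal sketch in base R for the birthweight hypothesis, assuming the MASS package is installed for the `birthwt` data:

```r
bwt_smokers <- MASS::birthwt$bwt[MASS::birthwt$smoke == 1]

# p-value for H0: mean birthweight = 3100g (two-sided)
p_value <- t.test(bwt_smokers, mu = 3100)$p.value

# CI at confidence level 1 - p-value: the tipping point
ci_tip <- t.test(bwt_smokers, conf.level = 1 - p_value)$conf.int

# The upper bound should land on the null value, up to numerical error
upper_bound <- ci_tip[2]
```

Any confidence level below `1 - p_value` gives an interval that excludes 3100g (reject); any level above it gives an interval that includes 3100g (fail to reject).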


Another example:

Recall the low birthweight study and the hypothesis regarding the proportion of mothers that smoke during pregnancy. Suppose

H0: proportion = 0.3

H1: proportion ≠ 0.3


Another example (cont):

One might perform the 0.05 level test by calculating a 95% confidence interval.

Or, one might calculate the p-value.

> birthwt %>% pull(smoke) %>% table %>% rev %>% prop.test(p = 0.3)

	1-sample proportions test with continuity correction

data:  ., null probability 0.3
X-squared = 7.1111, df = 1, p-value = 0.007661
alternative hypothesis: true p is not equal to 0.3
95 percent confidence interval:
 0.3222615 0.4652911
sample estimates:
        p
0.3915344





What will be the lower bound of the 100(1-0.007661)% = 99.2339% CI?



Null hypothesis value

> birthwt %>% pull(smoke) %>% table %>% rev %>% prop.test(conf.level = 1-0.007661)

	1-sample proportions test with continuity correction

data:  ., null probability 0.5
X-squared = 8.4656, df = 1, p-value = 0.003619
alternative hypothesis: true p is not equal to 0.5
99.2339 percent confidence interval:
 0.3000003 0.4911500
sample estimates:
        p
0.3915344

Calculating p-values in R

• The same commands we used to generate confidence intervals for differences in means, proportions, and rates also generate p-values for hypothesis tests.

Family-wise error rates


● Rather than thinking about a single hypothesis test, consider a group of hypotheses.
– The family-wise type I error rate is the probability of making at least one type I error within the group of tests.

● Thought experiment (10.16.1 in text)
– Say we have 250 pennies, and we wish to determine whether any are unbalanced, i.e., have probability p of heads different from 0.5.
– Study design: (a) flip each coin 100 times. (b) perform a hypothesis test with the data from the coin flips.
– Question: What is the family-wise type I error rate?

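R <- 5000        # number of simulated replications of the whole study
pennies <- 250
flips <- 100
# Reject H0: p = 0.5 when a penny shows fewer than 40 or more than 60 heads
l40g60 <- function(x){x < 40 | x > 60}

fwe <- rep(NA, R)
for(i in 1:R){
  data <- rbinom(pennies, flips, 0.5)   # all pennies are fair, so any rejection is a type I error
  type1errors <- l40g60(data)
  fwe[i] <- sum(type1errors) > 0        # TRUE if at least one type I error occurred
}

mean(fwe)   # simulated family-wise type I error rate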
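The simulated value can be compared with a closed-form calculation. This sketch assumes the 250 tests are independent and uses the same rejection rule as the simulation (fewer than 40 or more than 60 heads in 100 flips); the variable names are illustrative.

```r
pennies <- 250
flips   <- 100

# Per-test type I error rate for the rule "reject if heads < 40 or heads > 60"
alpha_single <- pbinom(39, flips, 0.5) +
  pbinom(60, flips, 0.5, lower.tail = FALSE)

# With independent tests, P(at least one type I error) = 1 - (1 - alpha)^m
fwer <- 1 - (1 - alpha_single)^pennies
```

Even though each individual test has a type I error rate below 0.05, with 250 tests at least one false rejection is nearly certain.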
