Introduction to Biostatistics for Clinical and Translational Researchers KUMC Departments of Biostatistics & Internal Medicine University of Kansas Cancer Center

Introduction to Biostatistics for Clinical and Translational Researchers



Page 1: Introduction to Biostatistics for Clinical and Translational Researchers

Introduction to Biostatistics for Clinical and Translational

Researchers

KUMC Departments of Biostatistics & Internal Medicine

University of Kansas Cancer Center

Page 2: Introduction to Biostatistics for Clinical and Translational Researchers

Course Information

Jo A. Wick, PhD

Office Location: 5028 Robinson

Email: [email protected]

Lectures are recorded and posted at http://biostatistics.kumc.edu under ‘Events and Lectures’

Page 3: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences: Hypothesis Testing

Page 4: Introduction to Biostatistics for Clinical and Translational Researchers

Last Week

Continuous outcome, compared between groups

2 groups

  Normal or large n
    Independent Samples: 2-sample t
    Dependent Samples: Paired t

  Non-normal or small n
    Independent Samples: Wilcoxon Rank-Sum
    Dependent Samples: Wilcoxon Signed-Rank

> 2 groups

  Normal or large n
    Independent Samples: ANOVA
    Dependent Samples: 2-way ANOVA

  Non-normal or small n
    Independent Samples: Kruskal-Wallis
    Dependent Samples: Friedman’s

Page 5: Introduction to Biostatistics for Clinical and Translational Researchers

Today

Yes/No or categorical outcome compared between groups? Chi-square tests

Time-to-event compared between groups? Survival Analysis

Association between two continuous outcomes? Correlation

What if we want to ‘adjust’ any of these for additional factors? Regression Methods

Page 6: Introduction to Biostatistics for Clinical and Translational Researchers

Chi-Square Tests

Page 7: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Proportions

What do we do when we have nominal (categorical) data on more than one factor?

  • Gender and hair color
  • Menopausal status and disease stage at diagnosis
  • ‘Handedness’ and gender
  • Tumor response and treatment
  • Presence/absence of disease and exposure

These types of tests are looking at whether two categorical variables are independent of one another (versus associated)—thus, tests of this type are often referred to as chi-square tests of independence.

Page 8: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Proportions

Remember, this is essentially looking at the association between two outcomes, where both are categorical (nominal or ordinal).

Assumptions?

  • Rule of thumb (ROT): No expected frequency should be less than 5 (i.e., every cell should satisfy nπ ≥ 5).
  • If not met, Fisher’s Exact test is appropriate.

Page 9: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Proportions

Example: Hair color and Gender

  • Gender: x1 = {M, F}
  • Hair Color: x2 = {Black, Brown, Blonde, Red}

Black Brown Blonde Red Total

Male 32 (32%) 43 (43%) 16 (16%) 9 (9%) 100

Female 55 (27.5%) 65 (32.5%) 64 (32%) 16 (8%) 200

Total 87 108 80 25 N = 300

What the data should look like in the actual dataset (one row per subject):

Gender Hair Color

Male Black

Female Red

Female Blonde

Page 10: Introduction to Biostatistics for Clinical and Translational Researchers

Hair Color and Gender

The researcher hypothesizes that hair color is not independent of sex.

H0: Hair color is independent of gender (i.e., the phenotypic ratio is the same within each gender).

H1: Hair color is not independent of gender (i.e., the phenotypic ratio is different between genders).

Page 11: Introduction to Biostatistics for Clinical and Translational Researchers

Hair Color and Gender

Chi-square statistics compute deviations between what is expected (under H0) and what is actually observed in the data:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

DF = (r – 1)(c – 1), where r is the number of rows and c is the number of columns.

Page 12: Introduction to Biostatistics for Clinical and Translational Researchers

Hair Color and Gender

Does it appear that this type of sample could have come from a population where the different hair colors occur with the same frequency within each gender?

OR does it appear that the distribution of hair color is different between men and women?

Black Brown Blonde Red Total

Male 32 (32%) 43 (43%) 16 (16%) 9 (9%) 100

Female 55 (27.5%) 65 (32.5%) 64 (32%) 16 (8%) 200

Total 87 108 80 25 N = 300

Page 13: Introduction to Biostatistics for Clinical and Translational Researchers

Hair Color and Gender

Conclusion: Reject H0: Gender and Hair Color are independent. It appears that the researcher’s hypothesis that the population phenotypic ratio is different between genders is correct (p = 0.029).

Black Brown Blonde Red Total

Male 32 (32%) 43 (43%) 16 (16%) 9 (9%) 100

Female 55 (27.5%) 65 (32.5%) 64 (32%) 16 (8%) 200

Total 87 108 80 25 N = 300

Critical value (α = 0.05): $\chi^2_3 = 7.815$
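The chi-square calculation for this table can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function names are hypothetical, and the p-value helper uses a closed form that is valid only for exactly 3 degrees of freedom.

```python
import math

# Observed counts from the hair color by gender table (rows: Male, Female).
observed = [
    [32, 43, 16, 9],
    [55, 65, 64, 16],
]

def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_upper_tail_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

stat, df = chi_square_independence(observed)
p_value = chi2_upper_tail_df3(stat)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

The statistic exceeds the critical value of 7.815, which agrees with the decision to reject H0. In practice a statistical package would handle tables of any size.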

Page 14: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Proportions

Special case: when you have a 2X2 contingency table, you are actually testing a hypothesis concerning two population proportions: H0: π1 = π2

(i.e., the proportion of males who are blonde is the same as the proportion of females who are blonde).

Blonde Non-blonde Total

Male 16 (16%) 84 (84%) 100

Female 64 (32%) 136 (68%) 200

Total 80 (26.7%) 220 (73.3%) N = 300
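As a rough check of the 2X2 special case, the sketch below compares a pooled two-proportion z statistic with the Pearson chi-square statistic (no continuity correction) on the same counts. The variable names are illustrative only. The point is that z squared equals the chi-square statistic.

```python
import math

# Counts from the 2X2 table: blonde vs non-blonde by gender.
blonde_m, n_m = 16, 100
blonde_f, n_f = 64, 200

p1 = blonde_m / n_m
p2 = blonde_f / n_f
pooled = (blonde_m + blonde_f) / (n_m + n_f)  # common proportion under H0

se = math.sqrt(pooled * (1 - pooled) * (1 / n_m + 1 / n_f))
z = (p1 - p2) / se

# Pearson chi-square on the same table, without continuity correction.
table = [[blonde_m, n_m - blonde_m], [blonde_f, n_f - blonde_f]]
n = n_m + n_f
col_totals = [sum(col) for col in zip(*table)]
chi2 = 0.0
for row in table:
    row_total = sum(row)
    for observed, col_total in zip(row, col_totals):
        expected = row_total * col_total / n
        chi2 += (observed - expected) ** 2 / expected

p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi-square = {chi2:.3f}, p = {p_value:.4f}")
```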

Page 15: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Proportions

When you have a single proportion and a small sample, substitute the Binomial test, which provides exact results.

The nonparametric Fisher Exact test can always be used in place of the chi-square test when you have contingency table-like data (i.e., two categorical factors whose association is of interest). It should be substituted for the chi-square test of independence when ‘cell’ sizes are small.

Page 16: Introduction to Biostatistics for Clinical and Translational Researchers

Survival Analysis

Page 17: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Time-to-Event

Survival Analysis is the class of statistical methods for studying the occurrence (categorical) and timing (continuous) of events.

The event could be

  • development of a disease
  • response to treatment
  • relapse
  • death

Survival analysis methods are most often applied to the study of deaths.

Page 18: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Time-to-Event

Survival Time: the time from a well-defined point in time (time origin) to the occurrence of a given event.

Survival data includes:

  • a time
  • an event ‘status’
  • any other relevant subject characteristics

Page 19: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Time-to-Event

In most clinical studies the length of study period is fixed and the patients enter the study at different times.

  • Lost-to-follow-up patients’ survival times are measured from the study entry until last contact (censored observations).
  • Patients still alive at the termination date will have survival times equal to the time from the study entry until study termination (censored observations).
  • When there are no censored survival times, the set is said to be complete.

Page 20: Introduction to Biostatistics for Clinical and Translational Researchers

Functions of Survival Time

Let T = the length of time until a subject experiences the event.

The distribution of T can be described by several functions:

  • Survival Function: the probability that an individual survives longer than some time, t:

S(t) = P(an individual survives longer than t) = P(T > t)

Page 21: Introduction to Biostatistics for Clinical and Translational Researchers

Functions of Survival Time

If there are no censored observations, the survival function is estimated as the proportion of patients surviving longer than time t:

$$\hat{S}(t) = \frac{\text{\# of patients surviving longer than } t}{\text{total \# of patients}}$$
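With complete (uncensored) data this estimate is a simple proportion. A minimal sketch follows. The event times are made up for illustration and the function name is hypothetical.

```python
def empirical_survival(times, t):
    """Proportion of subjects whose event time exceeds t (no censoring)."""
    return sum(1 for x in times if x > t) / len(times)

# Hypothetical event times in months; every subject experienced the event.
times = [2, 5, 5, 8, 12, 15, 20, 24, 30, 36]

for t in (0, 12, 36):
    print(f"S({t}) = {empirical_survival(times, t):.2f}")
```

Five of the ten hypothetical subjects survive longer than 12 months, so the estimate at t = 12 is 0.50.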

Page 22: Introduction to Biostatistics for Clinical and Translational Researchers

Functions of Survival Time

Density Function: The survival time T has a probability density function defined as the limit of the probability that an individual experiences the event in the short interval (t, t + Δt) per unit width Δt:

$$f(t) = \lim_{\Delta t \to 0} \frac{P(\text{an individual dying in the interval } (t, t + \Delta t))}{\Delta t}$$

Page 23: Introduction to Biostatistics for Clinical and Translational Researchers

Functions of Survival Time

Hazard Function: The hazard function h(t) of survival time T gives the conditional failure rate. It is defined as the probability of failure during a very small time interval, assuming the individual has survived to the beginning of the interval:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(\text{an individual of age } t \text{ fails in the time interval } (t, t + \Delta t))}{\Delta t}$$

Page 24: Introduction to Biostatistics for Clinical and Translational Researchers

Functions of Survival Time

The hazard is also known as the instantaneous failure rate, force of mortality, conditional mortality rate, or age-specific failure rate.

The hazard at any time t corresponds to the risk of event occurrence at time t:

  • For example, a patient’s hazard for contracting influenza is 0.015 with time measured in months.
  • What does this mean? This patient would expect to contract influenza 0.015 times over the course of a month, assuming the hazard stays constant.
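To make the influenza example concrete, the sketch below assumes the hazard really is constant at 0.015 per month. Under that assumption the survival function is exponential, S(t) = exp(-ht). The 12-month horizon is an arbitrary choice for illustration.

```python
import math

hazard_per_month = 0.015  # value from the influenza example
months = 12               # illustrative horizon, not from the slides

def survival_constant_hazard(h, t):
    """S(t) = exp(-h * t) when the hazard h does not change over time."""
    return math.exp(-h * t)

expected_events = hazard_per_month * months
prob_flu_free = survival_constant_hazard(hazard_per_month, months)
mean_time_to_event = 1 / hazard_per_month

print(f"Expected events in {months} months: {expected_events:.2f}")
print(f"P(no influenza in {months} months): {prob_flu_free:.3f}")
print(f"Mean time to event: {mean_time_to_event:.1f} months")
```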

Page 25: Introduction to Biostatistics for Clinical and Translational Researchers

Functions of Survival Time

If there are no censored observations, the hazard function is estimated as the proportion of patients dying in an interval per unit time, given that they have survived to the beginning of the interval:

$$\hat{h}(t) = \frac{\text{\# of patients dying in the interval beginning at time } t}{(\text{\# of patients surviving at } t) \times (\text{interval width})} = \frac{\text{\# of patients dying per unit time in the interval}}{\text{\# of patients surviving at } t}$$

Page 26: Introduction to Biostatistics for Clinical and Translational Researchers

Estimation of S(t)

Product-Limit Estimates (Kaplan-Meier): most widely used in biological and medical applications

Life Table Analysis (actuarial method): appropriate for large number of observations or if there are many unique event times
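A minimal sketch of the product-limit (Kaplan-Meier) calculation is shown below. The data are hypothetical and the function name is illustrative. At each distinct event time the running survival estimate is multiplied by (1 - deaths / number at risk), and censored subjects leave the risk set without counting as events.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: 6 subjects, two of them censored (event = 0).
times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```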

Page 27: Introduction to Biostatistics for Clinical and Translational Researchers

Methods for Comparing S(t)

If your question looks like: “Is the time-to-event different in group A than in group B (or C . . . )?” then you have several options, including:

  • Log-rank Test: weights effects over the entire observation period equally; best when the difference is constant over time
  • Weighted log-rank tests:
      • Wilcoxon Test: gives higher weights to earlier effects; better for detecting short-term differences in survival
      • Tarone-Ware: a compromise between log-rank and Wilcoxon
      • Peto-Prentice: gives higher weights to earlier events
      • Fleming-Harrington: flexible weighting method
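The unweighted log-rank test can be sketched as follows for two groups. This is an illustrative implementation with hypothetical data, not the software used for the analyses in these slides. At each event time it compares the observed number of events in group A with the number expected if both groups shared one survival curve.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up times
    events_* : 1 if the event was observed, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n = sum(1 for u, _, _ in data if u >= t)               # at risk overall
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail, 1 df
    return stat, p_value

# Hypothetical example: group A fails early, group B fails late.
stat, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

The weighted variants listed above multiply each event time's contribution by a weight; this sketch uses a weight of 1 throughout.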

Page 28: Introduction to Biostatistics for Clinical and Translational Researchers

Early? Late? Proportional?

Early difference that fades

Difference appears late

Difference is early and maintained

Page 29: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences for Time-to-Event

Example: survival in squamous cell carcinoma

A pilot study was conducted to compare Accelerated Fractionation Radiation Therapy versus Standard Fractionation Radiation Therapy for patients with advanced unresectable squamous cell carcinoma of the head and neck.

The researchers are interested in exploring any differences in survival between the patients treated with Accelerated FRT and the patients treated with Standard FRT.

Page 30: Introduction to Biostatistics for Clinical and Translational Researchers

Squamous Cell Carcinoma

  AFRT SFRT

Gender    

Male 28 (97%) 16 (100%)

Female 1 (3%) 0

Age    

Median 61 65

Range 30-71 43-78

Primary Site    

Larynx 3 (10%) 4 (25%)

Oral Cavity 6 (21%) 1 (6%)

Pharynx 20 (69%) 10 (63%)

Salivary Glands 0 1 (6%)

Stage    

III 4 (14%) 8 (50%)

IV 25 (86%) 8 (50%)

Tumor Stage    

T2 3 (10%) 2 (12%)

T3 8 (28%) 7 (44%)

T4 18 (62%) 7 (44%)

Page 31: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences for Time-to-Event

H0: S1(t) = S2(t) for all t

H1: S1(t) ≠ S2(t) for at least one t

[Figure: Overall Survival by Treatment. Survival Probability (0.00 to 1.00) versus Survival Time in months (0 to 120), with one curve each for AFRT and SFRT.]

Page 32: Introduction to Biostatistics for Clinical and Translational Researchers

Squamous Cell Carcinoma

[Figure: Overall Survival by Treatment. Survival Probability (0.00 to 1.00) versus Survival Time in months (0 to 120), with one curve each for AFRT and SFRT.]

Median Survival Time:

AFRT: 18.38 months (2 censored)

SFRT: 13.19 months (5 censored)

Page 33: Introduction to Biostatistics for Clinical and Translational Researchers

Squamous Cell Carcinoma

[Figure: Overall Survival by Treatment. Survival Probability (0.00 to 1.00) versus Survival Time in months (0 to 120), with one curve each for AFRT and SFRT.]

Log-Rank test p-value= 0.5421

Median Survival Time:

AFRT: 18.38 months (2 censored)

SFRT: 13.19 months (5 censored)

Page 34: Introduction to Biostatistics for Clinical and Translational Researchers

Squamous Cell Carcinoma

  AFRT SFRT

Gender    

Male 28 (97%) 16 (100%)

Female 1 (3%) 0

Age    

Median 61 65

Range 30-71 43-78

Primary Site    

Larynx 3 (10%) 4 (25%)

Oral Cavity 6 (21%) 1 (6%)

Pharynx 20 (69%) 10 (63%)

Salivary Glands 0 1 (6%)

Stage    

III 4 (14%) 8 (50%)

IV 25 (86%) 8 (50%)

Tumor Stage    

T2 3 (10%) 2 (12%)

T3 8 (28%) 7 (44%)

T4 18 (62%) 7 (44%)

Page 35: Introduction to Biostatistics for Clinical and Translational Researchers

Squamous Cell Carcinoma

Staging of disease is also prognostic for survival.

Shouldn’t we consider the analysis of the survival of these patients by stage as well as by treatment?

Page 36: Introduction to Biostatistics for Clinical and Translational Researchers

Squamous Cell Carcinoma

[Figure: Overall Survival by Treatment and Stage. Survival Probability (0.00 to 1.00) versus Survival Time in months (0 to 120), with curves for AFRT/Stage 3, AFRT/Stage 4, SFRT/Stage 3, and SFRT/Stage 4.]

Median Survival Time:

  • AFRT Stage 3: 77.98 mo.
  • AFRT Stage 4: 16.21 mo.
  • SFRT Stage 3: 19.34 mo.
  • SFRT Stage 4: 8.82 mo.

Log-Rank test p-value = 0.0792

Page 37: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Time-to-Event

Concerns a response that is both categorical (event?) and continuous (time)

There are several nonparametric methods that can be used—choice should be based on whether you anticipate a short-term or long-term benefit.

Log-rank test is optimal when the survival curves are approximately parallel.

Weight functions should be chosen based on clinical knowledge and should be pre-specified.

Page 38: Introduction to Biostatistics for Clinical and Translational Researchers

Publication Bias

From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)

Page 39: Introduction to Biostatistics for Clinical and Translational Researchers

Publication Bias

From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)

Table 4 Risk factors for time to publication using univariate Cox regression analysis

Characteristic           # not published   # published   Hazard ratio (95% CI)
Null                     29                23            1.00
Non-significant trend    16                4             0.39 (0.13 to 1.12)
Significant              47                99            2.32 (1.47 to 3.66)

Interpretation: Significant results have a 2-fold higher incidence of publication compared to null results.

Page 40: Introduction to Biostatistics for Clinical and Translational Researchers

Correlation

Page 41: Introduction to Biostatistics for Clinical and Translational Researchers

Linear Correlation

Linear regression assumes the linear dependence of one variable y (dependent) on a second variable x (independent).

Linear correlation also considers the linear relationship between two continuous outcomes but neither is assumed to be functionally dependent upon the other.

  • Interest is primarily in the strength of association, not in describing the actual relationship.

Page 42: Introduction to Biostatistics for Clinical and Translational Researchers

Scatterplot

Page 43: Introduction to Biostatistics for Clinical and Translational Researchers

Correlation

Pearson’s Correlation Coefficient is used to quantify the strength.

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

Note: If sample size is small or data is non-normal, use non-parametric Spearman’s coefficient.
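Pearson’s coefficient can be computed directly from its definition. The sketch below uses made-up data, and the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: sum of cross-products over the root of the sums of squares."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.4f}")
```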

Page 44: Introduction to Biostatistics for Clinical and Translational Researchers

Correlation

Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons. 3-56

[Figure panels: r < 0, r > 0, r = 0]

Page 45: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences on Correlation

H0: ρ = 0 (no linear association) versus

  • H1: ρ > 0 (positive linear relationship), or
  • H1: ρ < 0 (negative linear relationship), or
  • H1: ρ ≠ 0 (linear relationship in either direction)

Test statistic: t (df = n – 2)
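For reference, the usual textbook form of this test statistic (not shown on the slide) relates t to the sample correlation r and the sample size n, assuming the data are approximately bivariate normal:

$$t = r \sqrt{\frac{n - 2}{1 - r^2}}, \qquad df = n - 2$$

Larger samples therefore make even modest correlations statistically significant, which is why r itself should be reported alongside the p-value.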

Page 46: Introduction to Biostatistics for Clinical and Translational Researchers

Correlation

Page 47: Introduction to Biostatistics for Clinical and Translational Researchers

Correlation

* Excluding France

Page 48: Introduction to Biostatistics for Clinical and Translational Researchers

Regression Methods

Page 49: Introduction to Biostatistics for Clinical and Translational Researchers

What about adjustments?

There may be other predictors or explanatory variables that you believe are related to the response other than the actual factor (treatment) of interest.

Regression methods will allow you to incorporate these factors into the test of a treatment effect:

  • Logistic regression: when y is categorical and nominal binary
  • Multinomial logistic regression: when y is categorical with more than 2 nominal categories
  • Ordinal logistic regression: when y is categorical and ordinal

Page 50: Introduction to Biostatistics for Clinical and Translational Researchers

What about adjustments?

Regression methods will allow you to incorporate these factors into the test of a treatment effect:

  • Linear regression: when y is continuous and the factors are a combination of categorical and continuous (or just continuous)
  • Two- and three-way ANOVA: when y is continuous and the factors are all categorical

Page 51: Introduction to Biostatistics for Clinical and Translational Researchers

What about adjustments?

Regression methods will allow you to incorporate these factors into the test of a treatment effect:

  • Cox regression: when y is a time-to-event outcome

Page 52: Introduction to Biostatistics for Clinical and Translational Researchers

Linear Regression

The relationship between two variables may be one of functional dependence: the magnitude of one of the variables (dependent) is assumed to be determined by (dependent on) the magnitude of the second (independent), whereas the reverse is not true.

  • Example: blood pressure and age
  • Dependent does not equate to ‘caused by’

Page 53: Introduction to Biostatistics for Clinical and Translational Researchers

Linear Regression

In its most basic form, linear regression is a probabilistic model that accounts for unexplained variation in the relationship between two variables:

$$\begin{aligned} y &= \text{Deterministic Component} + \text{Random Error} \\ &= mx + b + \varepsilon \\ &= \beta_0 + \beta_1 x + \varepsilon \end{aligned}$$

This model is referred to as simple linear regression.

Page 54: Introduction to Biostatistics for Clinical and Translational Researchers

Simple Linear Regression

$$y = \beta_0 + \beta_1 x + \varepsilon$$

  • y: response variable
  • x: explanatory variable
  • β0: intercept
  • β1: slope
  • ε: 'error'

[Figure: two scatterplots of y versus x, both axes running from 0 to 10, labeled y = 0 + 1x + 0 and y = 0.78 + 0.89x + ε.]
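The intercept and slope of a simple linear regression are usually estimated by least squares. A minimal sketch with made-up data follows. The function name is illustrative, and a statistical package would also report standard errors.

```python
def fit_line(x, y):
    """Least-squares estimates (intercept, slope) for y = b0 + b1 * x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
print(f"estimated line: y = {b0:.2f} + {b1:.2f} x")
```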

Page 55: Introduction to Biostatistics for Clinical and Translational Researchers

Arm Circumference and Height

Data on anthropometric measures from a random sample of 150 Nepali children up to 12 months old

What is the relationship between average arm circumference and height?

Data:

  • Arm circumference: x̄ = 12.4 cm, s = 1.5 cm, R = (7.3 cm, 15.6 cm)
  • Height: x̄ = 61.6 cm, s = 6.3 cm, R = (40.9 cm, 73.3 cm)

Page 56: Introduction to Biostatistics for Clinical and Translational Researchers

Arm Circumference and Height

Treat height as continuous when estimating the relationship

Linear regression is a potential option: it allows us to associate a continuous outcome with a continuous predictor via a linear relationship

  • The line estimates the mean value of the outcome for each continuous value of height in the sample used
  • Makes a lot of sense, but only if a line reasonably describes the relationship

Page 57: Introduction to Biostatistics for Clinical and Translational Researchers

Visualizing the Relationship

Scatterplot

Page 58: Introduction to Biostatistics for Clinical and Translational Researchers

Visualizing the Relationship

Does a line reasonably describe the general shape of the relationship?

We can estimate a line using a statistical software package

The line we estimate will be of the form:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Here, $\hat{y}$ is the average arm circumference for a group of children all of the same height, x

Page 59: Introduction to Biostatistics for Clinical and Translational Researchers

Arm Circumference and Height

Page 60: Introduction to Biostatistics for Clinical and Translational Researchers

Arm Circumference and Height

Page 61: Introduction to Biostatistics for Clinical and Translational Researchers

Arm Circumference and Height

How do we interpret the estimated slope?

  • The average change in arm circumference for a one-unit (1 cm) increase in height
  • The mean difference in arm circumference for two groups of children who differ by one unit (1 cm) in height
  • These results estimate that the mean difference in arm circumferences for a one centimeter difference in height is 0.16 cm, with taller children having greater average arm circumference

Page 62: Introduction to Biostatistics for Clinical and Translational Researchers

Arm Circumference and Height

What is the estimated mean difference in arm circumference for children 60 cm versus 50 cm tall?
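One way to work this out, using the estimated slope of 0.16 cm per cm of height, is to note that the intercept cancels when two fitted means are subtracted:

$$\hat{y}(60) - \hat{y}(50) = \hat{\beta}_1 (60 - 50) = 0.16 \times 10 = 1.6 \text{ cm}$$

Both heights lie inside the observed range of 40.9 cm to 73.3 cm, so the comparison does not extrapolate beyond the data.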

Page 63: Introduction to Biostatistics for Clinical and Translational Researchers

Arm Circumference and Height

Our regression results only apply to the range of observed data

Page 64: Introduction to Biostatistics for Clinical and Translational Researchers

Arm Circumference and Height

How do we interpret the estimated intercept?

  • The estimated y when x = 0: the estimated mean arm circumference for children 0 cm tall.
  • Does this make sense given our sample?
  • Frequently, the scientific interpretation of the intercept is meaningless.
  • It is necessary for fully specifying the equation of a line.

Page 65: Introduction to Biostatistics for Clinical and Translational Researchers

Arm Circumference and Height

X = 0 isn’t even on the graph

Page 66: Introduction to Biostatistics for Clinical and Translational Researchers

Inferences using Linear Regression

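H0: β1 = 0 (no linear relationship) versus

  • H1: β1 > 0 (positive linear relationship), or
  • H1: β1 < 0 (negative linear relationship), or
  • H1: β1 ≠ 0 (linear relationship in either direction)

Test statistic: t (df = n – 2)

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad t = \frac{\hat{\beta}_1}{s / \sqrt{\sum (x_i - \bar{x})^2}}$$

where s is the residual standard deviation.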

Page 67: Introduction to Biostatistics for Clinical and Translational Researchers

Notes

  • Linear regression performed with a single predictor (one x) is called simple linear regression.
  • Correlation is a measure of the strength of the linear relationship between two continuous outcomes.
  • Linear regression with more than one predictor is called multiple linear regression.

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

Page 68: Introduction to Biostatistics for Clinical and Translational Researchers

Logistic Regression

When you are interested in describing the relationship between a dichotomous (categorical, nominal) outcome and a predictor x, logistic regression is appropriate.

Conceptually, the method is the same as linear regression MINUS the assumption of y being continuous.

$$\ln\left(\frac{\Pr(y = 1)}{1 - \Pr(y = 1)}\right) = \beta_0 + \beta_1 x + \varepsilon$$

Page 69: Introduction to Biostatistics for Clinical and Translational Researchers

Logistic Regression

Interpretation of regression coefficients is not straight-forward since they describe the relationship between x and the log-odds of y = 1.

We often use odds ratios to determine the relationship between x and y.

Page 70: Introduction to Biostatistics for Clinical and Translational Researchers

Odds of Death

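A logistic regression model was used to describe the relationship between treatment and death:

  • Y = {died, alive}
  • X = {intervention, standard of care}

$$\ln\left(\frac{\Pr(y = \text{death})}{1 - \Pr(y = \text{death})}\right) = \beta_0 + \beta_1 x + \varepsilon$$

where x = 1 if intervention and x = 0 if standard of care, so that standard of care is the reference group.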

Page 71: Introduction to Biostatistics for Clinical and Translational Researchers

Odds of Death

β1 was estimated to be -0.69. What does this mean?

  • If you exponentiate the estimate, you get the odds ratio relating treatment to the probability of death!
  • exp(-0.69) = 0.5: when treatment involves the intervention, the odds of dying decrease by 50% (relative to standard of care).
  • Notice the negative sign: it also indicates a decrease in the chances of death, but it is difficult to interpret without transformation.

Page 72: Introduction to Biostatistics for Clinical and Translational Researchers

Death

β1 was estimated to be 0.41. What does this mean?

  • If you exponentiate the estimate, you get the odds ratio relating treatment to the probability of death!
  • exp(0.41) = 1.5: when treatment involves the intervention, the odds of dying increase by 50% (relative to standard of care).
  • Notice the positive sign: it also indicates an increase in the chances of death, but it is difficult to interpret without transformation.

Page 73: Introduction to Biostatistics for Clinical and Translational Researchers

Logistic Regression

What about when x is continuous?

  • Suppose x is age and y is still representative of death during the study period.

$$\ln\left(\frac{\Pr(y = \text{death})}{1 - \Pr(y = \text{death})}\right) = \beta_0 + \beta_1 x + \varepsilon$$

where x = baseline age in years.

Page 74: Introduction to Biostatistics for Clinical and Translational Researchers

Death

β1 was estimated to be 0.095. What does this mean?

  • If you exponentiate the estimate, you get the odds ratio relating age to the probability of death!
  • exp(0.095) = 1.1: for every one-year increase in age, the odds of dying increase by 10%.
  • Notice the positive sign: it also indicates an increase in the chances of death, but it is difficult to interpret without transformation.
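The exponentiation step for the three estimates quoted on these slides can be checked with a few lines of Python. The helper name and the 10-year comparison are illustrative additions. For a continuous predictor, the odds ratio for a change of several units is exp(β1 × change).

```python
import math

def odds_ratio(beta, delta=1.0):
    """Odds ratio for a `delta`-unit increase in x, given log-odds coefficient beta."""
    return math.exp(beta * delta)

print(f"beta1 = -0.69 -> OR = {odds_ratio(-0.69):.2f}")   # intervention lowers the odds
print(f"beta1 =  0.41 -> OR = {odds_ratio(0.41):.2f}")    # intervention raises the odds
print(f"beta1 = 0.095 -> OR = {odds_ratio(0.095):.2f} per year of age")
print(f"beta1 = 0.095 -> OR = {odds_ratio(0.095, 10):.2f} per 10 years of age")
```

Note that the 10-year odds ratio is the one-year odds ratio raised to the tenth power, not ten times the one-year increase.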

Page 75: Introduction to Biostatistics for Clinical and Translational Researchers

Multiple Logistic Regression

In the same way that linear regression can incorporate multiple x’s, logistic regression can relate a categorical y response to several independent variables.

Interpretation of partial regression coefficients is the same.

Page 76: Introduction to Biostatistics for Clinical and Translational Researchers

Cox Regression

Cox regression and logistic regression are very similar

  • Both are trying to describe a yes/no outcome
  • Cox regression also attempts to incorporate the timing of the outcome in the modeling

Page 77: Introduction to Biostatistics for Clinical and Translational Researchers

Cox vs Logistic Regression

Distinction between rate and proportion:

  • Incidence (hazard) rate: number of “events” per population at-risk per unit time (or mortality rate, if outcome is death)
  • Cumulative incidence: proportion of “events” that occur in a given time period

Page 78: Introduction to Biostatistics for Clinical and Translational Researchers

Cox vs Logistic Regression

Distinction between hazard ratio and odds ratio:

  • Hazard ratio: ratio of incidence rates
  • Odds ratio: ratio of odds, each computed from a proportion

Logistic regression aims to estimate the odds ratio

Cox regression aims to estimate the hazard ratio

  • By taking into account the timing of events, more information is collected than just the binary yes/no.

Page 79: Introduction to Biostatistics for Clinical and Translational Researchers

Proportional Hazards Assumption

Early? Late? Proportional?

Early difference that fades

Difference appears late

Difference is early and maintained

Treatment interacts with time!

Page 80: Introduction to Biostatistics for Clinical and Translational Researchers

Cox Regression

Cox Regression is what we call semiparametric

  • Kaplan-Meier is nonparametric
  • There are also parametric methods which assume the distribution of survival times follows some type of probability model (e.g., exponential)

Can accommodate both discrete and continuous measures of event times.

Can accommodate multiple x’s.

Easy to incorporate time-dependent covariates: covariates that may change in value over the course of the observation period

Page 81: Introduction to Biostatistics for Clinical and Translational Researchers

Time Dependent Covariates

For example, evaluating the effect of taking oral contraceptives (OCs) on stress fracture risk in women athletes over two years: many women switch on or off OCs.

If you just examine risk by a woman’s OC-status at baseline, you can’t see much effect for OCs. But you can incorporate times of starting and stopping OCs.

Page 82: Introduction to Biostatistics for Clinical and Translational Researchers

Incidence and Prevalence

Page 83: Introduction to Biostatistics for Clinical and Translational Researchers

Incidence and Prevalence

An incidence rate of a disease is a rate that is measured over a period of time; e.g., 1/100 person-years.

For a given time period, incidence is defined as:

$$\text{incidence} = \frac{\text{\# of newly diagnosed cases of disease}}{\text{\# of individuals at risk}}$$

Only those free of the disease at time t = 0 can be included in numerator or denominator.

Page 84: Introduction to Biostatistics for Clinical and Translational Researchers

Incidence and Prevalence

A prevalence ratio is a rate that is taken at a snapshot in time (cross-sectional).

At any given point, the prevalence is defined as

$$\text{prevalence} = \frac{\text{\# with the illness}}{\text{\# of individuals in population}}$$

The prevalence of a disease includes both new incident cases and survivors with the illness.

Page 85: Introduction to Biostatistics for Clinical and Translational Researchers

Incidence and Prevalence

Prevalence is approximately equal to incidence multiplied by the average duration of the disease (assuming a steady state and a low prevalence).

Hence, prevalence is greater than incidence if the disease is long-lasting.

Page 86: Introduction to Biostatistics for Clinical and Translational Researchers

Measurement Error

To this point, we have assumed that the outcome of interest, x, can be measured perfectly.

However, mismeasurement of outcomes is common in the medical field due to fallible tests and imprecise measurement tools.

Page 87: Introduction to Biostatistics for Clinical and Translational Researchers

Diagnostic Testing

True Disease State

Diagnostic Test Result Present (D+) Absent (D-)

Positive (T+) True Positive (TP) False Positive (FP)

Negative (T-) False Negative (FN) True Negative (TN)

Page 88: Introduction to Biostatistics for Clinical and Translational Researchers

Sensitivity and Specificity

Sensitivity of a diagnostic test is the probability that the test will be positive among people that have the disease.

  • Pr(T+|D+) = TP/(TP + FN)
  • Sensitivity provides no information about people that do not have the disease.

Specificity is the probability that the test will be negative among people that are free of the disease.

  • Pr(T-|D-) = TN/(TN + FP)
  • Specificity provides no information about people that have the disease.

Page 89: Introduction to Biostatistics for Clinical and Translational Researchers

[Figure: diseased and healthy subjects with those diagnosed positive marked. Legend: Diseased, Non-Diseased, Positive Diagnosis, Negative Diagnosis.]

SN = 24/30 = 0.80

SP = 56/70 = 0.80

Prevalence = 30/100 = 0.30

Page 90: Introduction to Biostatistics for Clinical and Translational Researchers

[Figure: diseased and healthy subjects by diagnosis. Legend: Diseased, Non-Diseased, Positive Diagnosis, Negative Diagnosis.]

A perfect diagnostic test has SN = SP = 1

Page 91: Introduction to Biostatistics for Clinical and Translational Researchers

[Figure: diseased and healthy subjects by diagnosis. Legend: Diseased, Non-Diseased, Positive Diagnosis, Negative Diagnosis.]

A 100% inaccurate diagnostic test has SN = SP = 0

Page 92: Introduction to Biostatistics for Clinical and Translational Researchers

Sensitivity and Specificity

Example: 100 HIV+ patients are given a new diagnostic test for rapid diagnosis of HIV, and 80 of these patients are correctly identified as HIV+.

  • What is the sensitivity of this new diagnostic test?

Example: 500 HIV- patients are given a new diagnostic test for rapid diagnosis of HIV, and 50 of these patients are incorrectly specified as HIV+.

  • What is the specificity of this new diagnostic test? (Hint: How many of these 500 patients are correctly specified as HIV-?)
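One way to work through these two examples is to fill in the cells of the diagnostic table and apply the definitions. The function names below are illustrative.

```python
def sensitivity(tp, fn):
    """Pr(T+ | D+): true positives among everyone who has the disease."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Pr(T- | D-): true negatives among everyone free of the disease."""
    return tn / (tn + fp)

# 100 HIV+ patients, 80 correctly identified, so 20 false negatives.
sn = sensitivity(tp=80, fn=100 - 80)

# 500 HIV- patients, 50 incorrectly called HIV+, so 450 true negatives.
sp = specificity(tn=500 - 50, fp=50)

print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```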

Page 93: Introduction to Biostatistics for Clinical and Translational Researchers

Positive and Negative Predictive Value

Positive predictive value is the probability that a person with a positive diagnosis actually has the disease.

  • Pr(D+|T+) = TP/(TP + FP)
  • This is often what physicians want: the patient tests positive for the disease; does this patient actually have the disease?

Negative predictive value is the probability that a person with a negative test does not have the disease.

  • Pr(D-|T-) = TN/(TN + FN)
  • This is often what physicians want: the patient tests negative for the disease; is this patient truly disease free?

Page 94: Introduction to Biostatistics for Clinical and Translational Researchers

[Figure: diseased and healthy subjects with those diagnosed positive marked. Legend: Diseased, Non-Diseased, Positive Diagnosis, Negative Diagnosis.]

PPV = 24/38 = 0.63

NPV = 56/62 = 0.90
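Unlike sensitivity and specificity, the predictive values depend on prevalence. The sketch below applies Bayes’ rule to the values from the earlier diagram (SN = 0.80, SP = 0.80, prevalence = 0.30) and then to a hypothetical 1% prevalence chosen only for illustration. The function name is also illustrative.

```python
def predictive_values(sn, sp, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    tp = sn * prevalence              # diseased and test positive
    fn = (1 - sn) * prevalence        # diseased and test negative
    tn = sp * (1 - prevalence)        # healthy and test negative
    fp = (1 - sp) * (1 - prevalence)  # healthy and test positive
    return tp / (tp + fp), tn / (tn + fn)

ppv, npv = predictive_values(sn=0.80, sp=0.80, prevalence=0.30)
print(f"prevalence 30%: PPV = {ppv:.2f}, NPV = {npv:.2f}")

rare_ppv, rare_npv = predictive_values(sn=0.80, sp=0.80, prevalence=0.01)
print(f"prevalence  1%: PPV = {rare_ppv:.3f}, NPV = {rare_npv:.3f}")
```

With the same test, a rarer disease gives a much lower PPV because false positives from the large healthy group outnumber the true positives.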

Page 95: Introduction to Biostatistics for Clinical and Translational Researchers

[Figure: diseased and healthy subjects by diagnosis. Legend: Diseased, Non-Diseased, Positive Diagnosis, Negative Diagnosis.]

A perfect diagnostic test has PPV = NPV = 1

Page 96: Introduction to Biostatistics for Clinical and Translational Researchers

[Figure: diseased and healthy subjects by diagnosis. Legend: Diseased, Non-Diseased, Positive Diagnosis, Negative Diagnosis.]

A 100% inaccurate diagnostic test has PPV = NPV = 0

Page 97: Introduction to Biostatistics for Clinical and Translational Researchers

PPV and NPV

Example: 50 patients given a new diagnostic test for rapid diagnosis of HIV test positive, and 25 of these patients are actually HIV+.

  • What is the PPV of this new diagnostic test?

Example: 200 patients given a new diagnostic test for rapid diagnosis of HIV test negative, but 2 of these patients are actually HIV+.

  • What is the NPV of this new diagnostic test? (Hint: How many of these 200 patients testing negative for HIV are truly HIV-?)
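Applying the definitions to the two examples gives one possible worked solution:

$$\text{PPV} = \frac{TP}{TP + FP} = \frac{25}{50} = 0.50, \qquad \text{NPV} = \frac{TN}{TN + FN} = \frac{200 - 2}{200} = \frac{198}{200} = 0.99$$

These values describe the tested groups only; in a population with a different prevalence of HIV the same test would have different predictive values.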