Biostatistics course Part 9 Comparison between two means Dr. Sc Nicolas Padilla Raygoza Department Nursing and Obstetrics Division Health Sciences and Engineering Campus Celaya-Salvatierra University of Guanajuato, Mexico


Biostatistics course
Part 9: Comparison between two means

Dr. Sc. Nicolas Padilla Raygoza

Department of Nursing and Obstetrics

Division of Health Sciences and Engineering

Campus Celaya-Salvatierra

University of Guanajuato, Mexico


Biosketch

Medical Doctor, Autonomous University of Guadalajara.

Pediatrician, certified by the Mexican Council of Certification in Pediatrics.

Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London.

Master of Sciences with a focus on Epidemiology, Atlantic International University.

Doctorate of Sciences with a focus on Epidemiology, Atlantic International University.

Full-time Professor (Titular A), University of Guanajuato.

Level 1, National Researcher System.


Competencies

The reader will apply a Z test to make inferences from a comparison of two paired means.

He or she will apply a Z test to make inferences about two independent means.

He or she will apply a t test to make inferences about a mean of differences in a small sample.

He or she will apply a t test to make inferences about two independent means in small samples.

He or she will obtain a confidence interval for the difference between two independent means and for a mean of differences.


Introduction

Often we want to compare two groups. The statistical methods used to compare two means depend on how those means were obtained.

The data can come from paired or unpaired samples.


Paired data

How do we obtain paired data? Paired samples occur when a first measurement is matched with a second measurement on the same subject.

For quantitative data, this usually happens when measurements are repeated on the same person.


Example

In a study to determine whether birth weight measurements are adequate, we compared two measurements of the birth weight of newborns from a hospital in Celaya, Gto.

The measurements were made by two different observers. To control measurement bias, each observer was blinded to the other observer's measurement.


Non-paired data

How do we obtain non-paired data? We get non-paired data when the observations in one sample are independent of the observations in the other sample.


Example

To study the effect of a new drug on the parasitic burden of Ascaris lumbricoides, patients were randomized to receive either nitazoxanide (group A) or albendazole (group B).

The effect of the drug in each group was measured and compared.

In the analysis of paired data, by contrast, we calculate the difference between the first and second measurements. This gives a sample of differences, to which we apply the methods for analyzing quantitative data from a single mean.


Analysis of quantitative paired data

When analyzing paired data, we must first calculate the difference between the two measurements on the same subject.

Two observers measured the birth weights of newborns in Celaya.

| Patient | Observer 1 (g) | Observer 2 (g) | Difference d (g) |
| --- | --- | --- | --- |
| 1 | 2970 | 3010 | -40 |
| 2 | 3525 | 3650 | -125 |
| 3 | 3100 | 3125 | -25 |
| 4 | 2750 | 2550 | 200 |
| 5 | 4000 | 4050 | -50 |
| 6 | 3200 | 3300 | -100 |
| 7 | 3000 | 3000 | 0 |
| 8 | 2500 | 2700 | -200 |
| 9 | 3200 | 3400 | -200 |
| 10 | 3900 | 3700 | 200 |
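The differences and the summary statistics used in the rest of this part can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original course. Each difference is computed directly from the two columns (for patient 6, 3200 - 3300 = -100 g), which gives the mean of -34 g and the standard deviation of about 140.9 g used in the examples.

```python
from statistics import mean, stdev

# Birth weights (g) recorded by each observer for the same 10 newborns
observer_1 = [2970, 3525, 3100, 2750, 4000, 3200, 3000, 2500, 3200, 3900]
observer_2 = [3010, 3650, 3125, 2550, 4050, 3300, 3000, 2700, 3400, 3700]

# Paired analysis starts from the within-subject differences
differences = [a - b for a, b in zip(observer_1, observer_2)]

d_bar = mean(differences)  # mean of the differences
s = stdev(differences)     # sample standard deviation (n - 1 denominator)

print(differences)
print(f"mean = {d_bar:.1f} g, s = {s:.2f} g")
```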

To assess the difference between paired measurements, we can calculate the mean of the differences and its confidence interval; we can also test whether the mean of the differences is significantly different from 0.

The notation used for the mean and standard deviation of the differences in the population and in the sample is shown below:

| | Population | Sample |
| --- | --- | --- |
| Mean of differences | $\delta$ | $\bar{d}$ |
| Standard deviation | $\sigma$ | $s$ |


Confidence interval

If there is no difference between the paired measurements, the average of the differences will be 0.

To calculate the confidence interval for the mean of the differences in the sample, and to test the hypothesis that it is equal to 0, we need to know:

- The mean of the differences.
- The standard deviation of the differences.
- The standard error of the mean of the differences.


We can estimate the confidence interval around the mean of the differences in the sample in the same way as we did for one mean.

A 95% confidence interval tells us that we can be 95% confident that the true mean of the differences in the population lies within the interval on either side of the sample mean of the differences.

The general formula for a 95% confidence interval is:

$$\text{sample estimate} \pm 1.96 \times SE(\text{sample estimate})$$

The 95% confidence interval for the mean of the differences is therefore:

$$\bar{d} \pm 1.96 \times \frac{s}{\sqrt{n}}$$

where $\bar{d}$ is the mean of the differences in the sample, $s$ is the standard deviation of the differences and $n$ is the number of pairs.

1.96 is the multiplier used to calculate a 95% confidence interval. For a 90% confidence interval, the multiplier is 1.64.


Example

95% confidence interval for the birth weights:

$\bar{d} = -34.0$ g, $s = 140.94$ g

$SE = 140.94 / \sqrt{10} \approx 140.94 / 3.16 = 44.60$ g

$-34 \pm 1.96 \times 44.60 = -121.42$ to $53.42$ g

90% confidence interval for the same data:

$-34 \pm 1.64 \times 44.60 = -107.14$ to $39.14$ g
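The two intervals above can be checked with a short Python sketch. The function name `normal_ci` is hypothetical, and the multiplier is taken from the standard Normal distribution with `statistics.NormalDist` rather than rounded to 1.96 or 1.64, so the limits differ from the hand calculation by a fraction of a gram.

```python
from math import sqrt
from statistics import NormalDist

def normal_ci(estimate, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    se = sd / sqrt(n)
    # Two-sided multiplier: about 1.96 for 95% and about 1.645 for 90%
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se

# Mean and standard deviation of the birth-weight differences, n = 10 pairs
low95, high95 = normal_ci(-34.0, 140.94, 10, level=0.95)
low90, high90 = normal_ci(-34.0, 140.94, 10, level=0.90)

print(f"95% CI: {low95:.1f} to {high95:.1f} g")
print(f"90% CI: {low90:.1f} to {high90:.1f} g")
```

Both intervals include 0, which agrees with the conclusion that the two observers do not differ systematically.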

Hypothesis test for a mean of differences

A 95% confidence interval gives a range around the sample mean of the differences which, in 95% of samples, will include the mean of the differences in the population.

We can also use a hypothesis test to calculate the probability of observing a mean of differences at least as far from zero as ours if, on average, there is no difference between the paired observations in the population.

The null hypothesis is that the mean of the differences in the population is zero:

$$H_0: \delta = 0$$

This is equivalent to saying that the sampling distribution of the mean of the differences is Normal with mean 0 and a standard error that depends on the standard deviation of the differences in the population.

The alternative hypothesis is that the mean of the differences in the population is not zero:

$$H_a: \delta \neq 0$$

To test the null hypothesis, we calculate the Z test:

$$z = \frac{\text{sample mean of the differences} - \text{hypothesized mean of the differences}}{\text{standard error of the sample mean of the differences}} = \frac{\bar{d} - 0}{SE(\bar{d})}$$

where the hypothesized mean of the differences is zero.

The value of z calculated in the hypothesis test tells us how many standard errors the observed mean of the differences lies from the center of the distribution defined by the null hypothesis:

$$z = \frac{\bar{d} - 0}{s / \sqrt{n}}$$


Example

We have seen that the mean of the differences in weight in the 10 babies was -34 g, with s = 140.9 g and a 95% confidence interval of -121.42 to 53.42 g.

We want to find out whether the measurements taken by the two observers were really different.

We first state the null hypothesis: "On average, all possible measurements taken by the two observers are equal", that is, the mean of the differences in the population is zero.

The alternative hypothesis is that the mean of the differences in the population is not zero.

To test the hypothesis, we calculate:

$$z = \frac{-34 - 0}{44.60} = -0.76$$

Assuming that the mean of the differences is Normally distributed with mean zero, the test result says that the estimated mean of the differences lies 0.76 standard errors below the center of the distribution.

Looking up z = -0.76 in a two-tailed table of the Normal distribution gives a p-value of about 0.45.

We therefore do not reject the null hypothesis: sampling variation is a likely explanation for the observed mean of the differences.
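As an alternative to looking up the table, the z statistic and its two-tailed p-value can be computed directly. The sketch below is an illustration only; `z_test` is a hypothetical helper, and it uses the identity that `erfc(|z| / sqrt(2))` equals the two-tailed area of the standard Normal distribution.

```python
from math import erfc, sqrt

def z_test(estimate, se, null_value=0.0):
    """Return (z, two-tailed p-value) assuming a Normal sampling distribution."""
    z = (estimate - null_value) / se
    # erfc(|z| / sqrt(2)) is 2 * P(Z > |z|) for a standard Normal Z
    p = erfc(abs(z) / sqrt(2))
    return z, p

# Mean of the differences (-34 g) and its standard error (44.60 g)
z, p = z_test(-34.0, 44.60)
print(f"z = {z:.2f}, p = {p:.2f}")
```

A p-value this large means a mean difference of this size would be seen often under the null hypothesis, which is why the null hypothesis is not rejected.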

How to obtain the p-value

In the table of the Z (Normal) distribution, we look up the z value obtained in our test and read the corresponding p-value in the column to the right.

This table can be found in biostatistics textbooks.


Small paired samples

When the sample size is small, the sampling distribution of the mean is not exactly Normal; instead it follows the t distribution.

Therefore, if the sample size is small (fewer than 50), we use values from the t distribution to calculate the confidence interval and to carry out the hypothesis test.


Confidence interval for paired sample

The formula for the 95% confidence interval is:

$$\text{estimate} \pm t_{0.05} \times SE$$

where the estimate is the mean of the differences and $t_{0.05}$ is the value of the t distribution for a two-tailed p of 0.05 with $n - 1$ degrees of freedom.

The first column of the t table gives the degrees of freedom ($n - 1$). Move right along that row to the column for 0.05; the value found there is the multiplier used for the confidence interval.

Hypothesis test for small paired samples

The formula for the hypothesis test is:

$$t = \frac{\bar{d} - 0}{SE(\bar{d})}$$

The formula is the same as for the Z test, except that the p-value is looked up in the table of the t distribution.

In the row for the degrees of freedom ($n - 1$, first column), find the value closest to the calculated t; the p-value is read at the top of that column.
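Since the birth-weight example has only 10 pairs, it can also be analyzed with the t distribution. The sketch below is a hedged illustration: the multiplier 2.262 is an assumption taken from a standard t table (two-tailed 0.05, 9 degrees of freedom) and is not computed by the code, and the variable names are hypothetical.

```python
from math import sqrt

# Mean and standard deviation of the birth-weight differences, n = 10 pairs
d_bar, s, n = -34.0, 140.94, 10
se = s / sqrt(n)

# Assumption: two-tailed 5% point of t with n - 1 = 9 degrees of freedom,
# read from a standard t table rather than computed here.
T_CRIT_9DF = 2.262

t = (d_bar - 0) / se
ci = (d_bar - T_CRIT_9DF * se, d_bar + T_CRIT_9DF * se)
significant = abs(t) > T_CRIT_9DF  # True would mean p < 0.05

print(f"t = {t:.2f} with {n - 1} degrees of freedom")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f} g")
print("p < 0.05" if significant else "p > 0.05")
```

The interval is wider than the one obtained with the 1.96 multiplier, reflecting the extra uncertainty of a small sample; the conclusion is unchanged.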

Analysis of independent samples

This analysis differs from the analysis of paired data: we look at the difference between two independent means rather than the mean of the differences between paired observations.

Example: do smokers have a different blood pressure from non-smokers? In a sample of smokers and non-smokers, systolic blood pressure averaged 148 in smokers and 138 in non-smokers.

The difference between the averages is 148 - 138 = 10.

Notation: because we are looking at two independent populations and need two samples, we need additional notation, shown in the table below.

Remember that we use Greek letters for population parameters and Latin letters for sample estimates. The subscripts distinguish between sample 1 and sample 2, and between population 1 and population 2.

| | Population 1 | Population 2 | Sample 1 | Sample 2 |
| --- | --- | --- | --- | --- |
| Mean | $\mu_1$ | $\mu_2$ | $\bar{X}_1$ | $\bar{X}_2$ |
| Standard deviation | $\sigma_1$ | $\sigma_2$ | $s_1$ | $s_2$ |

Sampling distribution for two independent samples

The sampling distribution of the difference between two independent means is found using the same procedure as for a single sample.

We repeatedly take random samples of size $n_1$ and $n_2$ from the two populations, each time calculating the means ($\bar{X}_1$, $\bar{X}_2$) and standard deviations ($s_1$, $s_2$), and then measure the difference between the means for each pair of samples.

The result is a sampling distribution of the differences between two independent means.

Generating this distribution we see that:

1 .- The mean of the sampling distribution is the population value, that is, the difference between the two means in the population.

2 .- The standard deviation of the sampling distribution depends on $n_1$ and $n_2$, the sample sizes.

3 .- The shape of the distribution becomes closer to Normal as $n_1$ and $n_2$ increase.

We know that the sampling distribution of any sample estimate can be inferred from the data collected in a single sample.

The same principle applies here: the sampling distribution of the difference of means can be inferred from a single pair of samples. To do this, we need:

- The difference between the two sample means.
- The standard error of the difference between the two sample means.
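The repeated-sampling idea described above can be illustrated with a small simulation. This is only a sketch: the two populations are hypothetical Normal populations chosen for illustration, and the seed is fixed so that the run is reproducible. The simulated mean of the differences should be close to the population difference, and their standard deviation should be close to the standard error obtained by adding the two variances of the means.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical populations, chosen only for illustration
MU1, SIGMA1, N1 = 50.0, 10.0, 40
MU2, SIGMA2, N2 = 60.0, 12.0, 40

def one_difference():
    """Draw one sample from each population; return the difference of means."""
    sample1 = [random.gauss(MU1, SIGMA1) for _ in range(N1)]
    sample2 = [random.gauss(MU2, SIGMA2) for _ in range(N2)]
    return mean(sample1) - mean(sample2)

# Repeat the sampling many times to build the sampling distribution
diffs = [one_difference() for _ in range(2000)]

theoretical_se = sqrt(SIGMA1**2 / N1 + SIGMA2**2 / N2)
print(f"mean of differences: {mean(diffs):.2f} (population value {MU1 - MU2:.0f})")
print(f"SD of differences:   {stdev(diffs):.2f} (theoretical SE {theoretical_se:.2f})")
```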

Standard error for the distribution of differences of means

The standard error of the difference between two independent means combines the standard errors of the two independent sampling distributions.

We know that the standard error of a sample mean is:

$$SE = \frac{s}{\sqrt{n}}$$

The variance of the mean is the square of the standard error:

$$\text{Variance} = \frac{\sigma^2}{n}$$

It can be shown that the variance of the difference between two independent means is equal to the sum of the variances of the two sample means. With

$$SE(\bar{X}_1) = \frac{\sigma_1}{\sqrt{n_1}} \qquad SE(\bar{X}_2) = \frac{\sigma_2}{\sqrt{n_2}}$$

we have

$$\text{Variance}(\bar{X}_1 - \bar{X}_2) = \text{variance of } \bar{X}_1 + \text{variance of } \bar{X}_2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

The variances are added because each sample contributes sampling error to the distribution of differences.

The standard error of the difference between two independent sample means is therefore:

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

In most situations we do not know the population standard deviations ($\sigma_1$ and $\sigma_2$). In practice, we use the sample standard deviations ($s_1$ and $s_2$), so that:

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Confidence interval for the difference of two means

Assuming that the sampling distribution of $(\bar{X}_1 - \bar{X}_2)$ is Normal, we can calculate a confidence interval for the difference of two means using the general formula:

$$\text{difference of means} \pm 1.96 \times SE(\bar{X}_1 - \bar{X}_2)$$

For a 95% confidence interval, assuming a Normal distribution:

$$(\bar{X}_1 - \bar{X}_2) \pm 1.96 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$


Example

In a study to evaluate the efficacy of oral rehydration solution (ORS) in children with acute diarrhea, 40 children were in the treatment group and 40 children in the control group. We measured the mean duration of diarrhea, in hours, and its standard deviation.

| Group | n | Mean duration of diarrhea (hours) | s (hours) |
| --- | --- | --- | --- |
| Treatment | 40 | 72 | 10 |
| Control | 40 | 120 | 12 |

To calculate the 95% confidence interval for the difference between the means of the independent samples, we need the difference between the means and its standard error:

$$\bar{X}_1 - \bar{X}_2 = 72 - 120 = -48 \text{ hours}$$

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{10^2}{40} + \frac{12^2}{40}} = \sqrt{2.5 + 3.6} = 2.47$$

$$95\%\ CI = -48 \pm 1.96 \times 2.47 = -52.84 \text{ to } -43.16 \text{ hours}$$
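The same calculation can be written as a short Python sketch. The helper name `se_difference` is hypothetical; the inputs are the group sizes, means and standard deviations of the ORS example, and 1.96 is the Normal multiplier used in the text.

```python
from math import sqrt

def se_difference(s1, n1, s2, n2):
    """Standard error of the difference between two independent means."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

# Treatment: n = 40, mean 72 h, s = 10 h; control: n = 40, mean 120 h, s = 12 h
diff = 72 - 120
se = se_difference(10, 40, 12, 40)

# 95% confidence interval with the Normal multiplier 1.96
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

print(f"difference = {diff} h, SE = {se:.2f} h")
print(f"95% CI = {ci_low:.2f} to {ci_high:.2f} h")
```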

The difference between the means was -48 hours, with a standard error of 2.47.

The 95% confidence interval tells us that we can be 95% confident that the difference between the mean durations of diarrhea in the population lies between -52.84 hours and -43.16 hours.

Because the interval does not include zero, we can say that the difference of means is statistically significant.

Hypothesis test for two independent means

To calculate the probability (p-value) of observing our difference if the two independent means are equal, we use the Z test. We use it in the same way as for the mean of the differences in paired samples.

The null hypothesis is that the two means are equal:

$$H_0: \mu_1 - \mu_2 = 0$$

The alternative hypothesis is:

$$H_1: \mu_1 - \mu_2 \neq 0$$

The formula for the Z test is then:

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)}$$

where

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Example

We apply the hypothesis test to the oral rehydration solution study, with the null hypothesis that the duration of diarrhea is on average the same in the two groups. The difference between the means is -48 hours and the standard error is 2.47.

$$z = \frac{-48 - 0}{2.47} = -19.43$$

This tells us that the observed difference lies 19.43 standard errors below the center of the distribution (0).

The p-value for z = -19.43 is < 0.0001. If there were no difference in the duration of diarrhea, there would be only a very small chance (p < 0.0001) of observing a difference as extreme as this one.

We conclude that the means are very likely different: the mean duration of diarrhea in the ORS group differs significantly from that in the control group.
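For a z value this far into the tail, a printed table only allows us to say that p < 0.0001. A minimal Python sketch of the same test is shown below; it uses `math.erfc` because computing the tail area as one minus a cumulative probability would round to exactly 0 at this distance from the center.

```python
from math import erfc, sqrt

# Difference of means and standard error from the ORS example
diff, se = -48.0, 2.47

z = (diff - 0) / se
# Two-tailed p-value; erfc keeps precision in the extreme tail
p = erfc(abs(z) / sqrt(2))

print(f"z = {z:.2f}")
print("p < 0.0001" if p < 0.0001 else f"p = {p:.4f}")
```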

Small samples with two independent samples

When comparing two small independent samples, we use the t distribution instead of the Normal distribution to calculate confidence intervals and test hypotheses.

The procedure is similar to the one used with data from a single sample, with one exception: the calculation of the standard error.

The common variance: with small samples, we estimate a common variance using the data from the two independent samples. It is the average of the two sample variances, weighted by their degrees of freedom:

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$

The standard error of the difference between the sample means is:

$$SE(\bar{X}_1 - \bar{X}_2) = s \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$


Example

In a study of the treatment of iron deficiency anemia with two different types of iron, the students of a village school were randomized to receive one treatment or the other.

Initially, hemoglobin (Hb) levels, in g/dL, were similar in both groups.

Hemoglobin levels were measured again after 3 months of treatment.

| Hemoglobin | n | Mean (g/dL) | s (g/dL) |
| --- | --- | --- | --- |
| Iron A | 15 | 14.8 | 0.5 |
| Iron B | 13 | 12.1 | 1.1 |

95% confidence interval = difference of means ± multiplier $t_{0.05}$ × SE

The multiplier $t_{0.05}$ with $n_1 + n_2 - 2 = 26$ degrees of freedom is 2.056.

$$s^2 = \frac{(15 - 1) \times 0.5^2 + (13 - 1) \times 1.1^2}{(15 - 1) + (13 - 1)} = \frac{3.5 + 14.52}{26} = \frac{18.02}{26} = 0.69$$

$$SE = s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{0.69} \times \sqrt{\frac{1}{15} + \frac{1}{13}} = 0.83 \times 0.379 \approx 0.32$$

$$95\%\ CI = (14.8 - 12.1) \pm 2.056 \times 0.32 = 2.7 \pm 0.66 = 2.04 \text{ to } 3.36 \text{ g/dL}$$

Hypothesis test:

$$H_0: \mu_1 = \mu_2 \quad \text{or} \quad \mu_1 - \mu_2 = 0$$

$$H_A: \mu_1 \neq \mu_2 \quad \text{or} \quad \mu_1 - \mu_2 \neq 0$$

$$t = \frac{(14.8 - 12.1) - 0}{0.32} = 8.44$$

With $n_1 + n_2 - 2 = 26$ degrees of freedom, p < 0.05.
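The pooled-variance calculation for the iron example can be sketched in Python as follows. The helper `pooled_se` is hypothetical, and the multiplier 2.056 is taken from a t table (26 degrees of freedom), as in the worked example, rather than computed. Because the code does not round the standard error to 0.32, it gives a slightly larger t (about 8.6) and interval limits that differ from the hand calculation in the second decimal place.

```python
from math import sqrt

def pooled_se(s1, n1, s2, n2):
    """Return (standard error of the difference, common variance)."""
    # Common variance: the two variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    se = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    return se, pooled_var

# Iron A: n = 15, mean 14.8 g/dL, s = 0.5; Iron B: n = 13, mean 12.1 g/dL, s = 1.1
diff = 14.8 - 12.1
se, pooled_var = pooled_se(0.5, 15, 1.1, 13)

# Assumption: two-tailed 5% point of t with 15 + 13 - 2 = 26 degrees of
# freedom, taken from a t table rather than computed here.
T_CRIT_26DF = 2.056

t = (diff - 0) / se
ci = (diff - T_CRIT_26DF * se, diff + T_CRIT_26DF * se)

print(f"common variance = {pooled_var:.2f}, SE = {se:.2f}")
print(f"95% CI = {ci[0]:.2f} to {ci[1]:.2f} g/dL, t = {t:.2f}")
```

Since t is far larger than 2.056, the difference between the two types of iron is statistically significant at the 5% level.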
