
Measures of Central Tendency and their dispersion and applications

Acknowledgement: Dr Muslima Ejaz

9/24/2013

LEARNING OBJECTIVES:
• Compute and distinguish between the uses of measures of central tendency: mean, median and mode.
• Compute and list some uses for measures of variation or dispersion: range, variance and standard deviation.
• Understand the distinction between the population mean and the sample mean.
• Learn the empirical rule and its application.

REFERENCES:
• Jan W. Kuzma and Stephen E. Bohnenblust, Basic Statistics for the Health Sciences, Mayfield Publishing Company, 2001.
• Lyman Ott, An Introduction to Statistical Methods and Data Analysis, PWS-Kent Publishing Company, 1988.

• The average speed of a car crossing midtown Manhattan during the day is 5.3 miles/hr
• The average time an American father of a 4-year-old spends alone with his child each day is 42 minutes
• The average American man is 5 feet 9 inches tall and the average woman is 5 feet 3.6 inches tall
• The average American man is sick in bed seven days a year, missing 5 days of work

Measures of Central Tendency (center of the distribution)

• Find a single score that is most typical or most representative of the entire group
• Helpful in comparing groups
• No single measure is representative in every situation, so there are three ways of determining central tendency:
  • Mean
  • Median
  • Mode

Mean

• Also called the arithmetic mean or average
• The sum of all scores divided by the number of scores

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Sample Mean

• Add up all the observations given in the data, then divide by the sample size (n)
• The sample size n is the number of observations

Example: Mean

n = 5 systolic blood pressures (mmHg)

• X1 = 120
• X2 = 80
• X3 = 90
• X4 = 110
• X5 = 95

Example: Mean

Mean Systolic Blood Pressure:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{495}{5} = 99$$
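The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the helper name `sample_mean` is made up for illustration, and the data are the five systolic blood pressures (120, 80, 90, 110, 95 mmHg) from the example.

```python
# Systolic blood pressures (mmHg) from the worked example
pressures = [120, 80, 90, 110, 95]


def sample_mean(values):
    """Sum of all observations divided by the number of observations (n)."""
    return sum(values) / len(values)


total = sum(pressures)          # 495
mean_bp = sample_mean(pressures)
print(total, mean_bp)           # 495 99.0
```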

Pros and Cons of the Mean

Pros
• Mathematical center of a distribution.
• Just as far from scores above it as it is from scores below it.
• Does not ignore any information.

Cons
• Influenced by extreme scores and skewed distributions.
• One data point could make a great change in the sample mean.

Example

n = 5 systolic blood pressures (mmHg)

• X1 = 120
• X2 = 180
• X3 = 90
• X4 = 110
• X5 = 95

Mean Systolic Blood Pressure:

$$\bar{X} = \frac{595}{5} = 119$$
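To see how much a single extreme score moves the mean, the sketch below compares the earlier data set (X2 = 80) with this one (X2 = 180). It uses `statistics.mean` from the Python standard library; the variable names are illustrative only.

```python
from statistics import mean

original = [120, 80, 90, 110, 95]       # earlier example, X2 = 80
with_extreme = [120, 180, 90, 110, 95]  # this example, X2 = 180

# Changing one observation by 100 mmHg shifts the mean by 100 / 5 = 20 mmHg
shift = mean(with_extreme) - mean(original)
print(mean(original), mean(with_extreme), shift)
```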

Population Versus Sample Mean

Population: The entire group you want information about.

• For example: the blood pressure of all 18-year-old male medical college students at AKU

Cont…

Sample: A part of the population from which we actually collect information and draw conclusions about the whole population.

• For example: a sample of blood pressures from n = 5 18-year-old male college students at AKU

Mean

• Population (“mu”):

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

  “sigma” ($\sum$), the sum of X: add up all scores
  “N”, the total number of scores in a population

• Sample (“X bar”):

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

  “sigma” ($\sum$), the sum of X: add up all scores
  “n”, the total number of scores in a sample

The Median

• The score that divides the distribution exactly in half when observations are ordered
• The 50th percentile (50%)
• Goal: determine the exact midpoint
• The median sits at rank (n + 1)/2 of the ordered observations
• With the scores arranged in order (highest to lowest or lowest to highest), it is the middle score

Example: Median

Unordered: 110, 90, 80, 95, 120

Ordered: 80, 90, 95, 110, 120

• The median is the middle value when observations are ordered.
• To find the middle, count in (n + 1)/2 scores when observations are ordered lowest to highest.
• Median Systolic BP: (5 + 1)/2 = 3, so the median is the 3rd ordered value, 95 mmHg

Finding the median with an even number of scores

• With an even number of scores, the median is the average of the middle two observations when observations are ordered.

80, 90, 95, 110, 120, 125

• (95 + 110)/2 = 102.5

Example: Median

80, 90, 95, 110, 220

Median = 95 (the middle value; the extreme score 220 does not change it)
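The three median examples (odd count, even count, and an extreme score) can be reproduced with a short function. This is a sketch under the rank rule described above; `median_by_rank` is a hypothetical helper name, and the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median

odd = [110, 90, 80, 95, 120]
even = [80, 90, 95, 110, 120, 125]
extreme = [80, 90, 95, 110, 220]


def median_by_rank(values):
    """Middle ordered value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Rank (n + 1) / 2 corresponds to zero-based index n // 2
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


for data in (odd, even, extreme):
    # Both approaches should agree on every data set
    print(median_by_rank(data), median(data))
```

Replacing 120 with 220 leaves the result at 95, which illustrates why the median is described as resistant to extreme scores.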

Pros and Cons of Median

Pros
• Not influenced by extreme scores or skewed distributions
• Easier to compute than the mean.

Cons
• Doesn’t take actual values into account.
• As its value is determined solely by its rank, it provides no information about any of the other values within the distribution

The Mode

• The highest frequency/most frequently occurring score
• Applicable to qualitative and quantitative data
• Could be bi-modal or multi-modal

Central Tendency Example: Mode

75, 76, 90, 90, 95, 99, 100, 120, 120, 135, 135, 155, 170, 186, 196, 205, 220

• Mode: most frequent observation
• Mode(s) for Blood Pressure: 90, 120, 135
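A frequency count makes the multi-modal result easy to verify. The sketch below uses `collections.Counter` from the standard library; the function name `modes` is an illustrative choice, not something taken from the slides.

```python
from collections import Counter

blood_pressures = [75, 76, 90, 90, 95, 99, 100, 120, 120,
                   135, 135, 155, 170, 186, 196, 205, 220]


def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes(blood_pressures))  # [90, 120, 135]
```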

Pros and Cons of the Mode

Pros
• Easiest to compute and understand.
• The score comes from the data set.

Cons
• Ignores most of the information in a distribution
• Small samples may not have a mode

Using different measures of central tendency

Two factors are important in making the decision of which measure of central tendency should be used:

• Scale of measurement (ordinal or numerical)
• Shape of the distribution of observations.
  • A distribution can be symmetric, skewed to the right (positively skewed), or skewed to the left (negatively skewed).

Using different measures of central tendency

• In a normal distribution, the mean, median, and mode are the same.

[Figure: normal curve f(x) against x, with the mean, median, and mode all located at µ]

The effect of skew on average.

• In a skewed distribution, the mean is pulled toward the tail.

Using different measures of central tendency

The following guidelines help the researcher decide which measure is best with a given set of data:

• The mean is used for numerical data and for symmetric distributions.

[Figure: symmetric, bell-shaped distribution; frequency (0.0 to 0.3) against values (-4 to 4)]

Using different measures of central tendency

The following guidelines help the researcher decide which measure is best with a given set of data:

• The median is used for ordinal data or for numerical data whose distribution is skewed.

Using different measures of central tendency

The following guidelines help the researcher decide which measure is best with a given set of data:

• The mode is used primarily for nominal or ordinal data or for numerical data with a bimodal distribution.

[Figure: histogram of frequency (0 to 30) against stress rating (0 to 10)]

Measures of Variation
or
Measures of Dispersion

Measures of Variability

• A single summary figure that describes the spread of observations within a distribution.

[Figure: distributions centrally located at the same value on the horizontal axis, but with substantially different amounts of variability]

Measures of Variability

• Consider the following two data sets on the ages of all patients suffering from bladder cancer (BC) and prostatic cancer (PC).

BC:  39  45  36  40  35  38  47
PC:  27  52  18  33  70

• The mean age of both groups is 40 years.
• If we do not know the ages of individual patients and are told only that the mean age of the patients in the two groups is the same, we may assume that the patients in the two groups have a similar age distribution.
• The variation in the patients’ ages in each of these two groups is very different.
• The ages of the prostatic cancer patients have a much larger variation than the ages of the bladder cancer patients.
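A quick check of the two age lists shows the point numerically: the means agree, but the gap between the youngest and oldest patient does not. This is a minimal sketch; the variable names are illustrative, and the max-minus-min gap is used here only as a rough first look at spread.

```python
from statistics import mean

bc_ages = [39, 45, 36, 40, 35, 38, 47]  # bladder cancer patients
pc_ages = [27, 52, 18, 33, 70]          # prostatic cancer patients

bc_mean, pc_mean = mean(bc_ages), mean(pc_ages)

# Gap between oldest and youngest patient in each group
bc_gap = max(bc_ages) - min(bc_ages)
pc_gap = max(pc_ages) - min(pc_ages)

print(bc_mean, pc_mean)  # both 40
print(bc_gap, pc_gap)    # 12 versus 52
```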

Measures of Variability

• Measure the “spread” in the data
• Some important measures:
  • Range
  • Mean deviation
  • Variance
  • Standard Deviation
  • Coefficient of variation

Variability

• The purpose of the majority of medical, behavioural and social science research is to explain or account for variance or differences among individuals or groups.

Examples
1. What factors account for the variance (or difference) in IQ among individuals?
2. What factors account for the variance in treatment compliance among different groups of patients?

Range

• The range tells us the span over which the data are distributed, and is only a very rough measure of variability
• Range: the difference between the maximum and minimum scores

80, 90, 95, 110, 120

• Range = 120 – 80 = 40

Range

• Range is the simplest measure of dispersion
• It depends entirely on the extreme scores and doesn’t take into consideration the bulk of the observations

Variation

$X$    $X - \bar{X}$
5      0.00
5      0.00
5      0.00
5      0.00
5      0.00

$\sum X = 25$, $n = 5$, $\bar{X} = 5$

This is an example of data with no (i.e. zero) variability

Variation

$X$    $X - \bar{X}$
6      +1.00
4      -1.00
6      +1.00
5       0.00
4      -1.00

$\sum X = 25$, $n = 5$, $\bar{X} = 5$

This is an example of data with low variability

Variation

$X$    $X - \bar{X}$
8      +3.00
1      -4.00
9      +4.00
5       0.00
2      -3.00

$\sum X = 25$, $n = 5$, $\bar{X} = 5$

This is an example of data with high variability

Mean deviation

• The best measures of dispersion should:
  • take into account all the scores in the distribution
  • describe the average deviation of all observations from the mean.
• Normally, to find the average we would want to sum all deviations from the mean and then divide by n, i.e.,

$$\frac{\sum (X - \bar{X})}{n}$$

• Because positive and negative deviations cancel, this sum is always zero, so the mean deviation averages the absolute deviations $|X - \bar{X}|$ instead.

Mean Deviation

n = 6; $\sum X = 33$; $\bar{X} = \sum X / n = 33/6 = 5.50$

$X$    $|X - \bar{X}|$
3      |3 - 5.50| = 2.50
5      |5 - 5.50| = 0.50
9      |9 - 5.50| = 3.50
2      |2 - 5.50| = 3.50
8      |8 - 5.50| = 2.50
6      |6 - 5.50| = 0.50

$\sum |X - \bar{X}| = 13$

Mean Deviation = 13/6 = 2.167
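The table can be reproduced by averaging absolute deviations from the mean. The following is a small sketch with a hypothetical helper, `mean_deviation`, applied to the six scores above.

```python
scores = [3, 5, 9, 2, 8, 6]


def mean_deviation(values):
    """Average absolute distance of each observation from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


absolute_sum = sum(abs(x - 5.5) for x in scores)  # 13.0
md = mean_deviation(scores)
print(absolute_sum, round(md, 3))                 # 13.0 2.167
```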

Variance & Standard Deviation

• However, if we square each of the deviations from the mean, we obtain a sum that is not equal to zero
• This is the basis for the measures of variance and standard deviation, the two most common measures of variability (or dispersion) of data

Variance & Standard Deviation (cont)

$X$    $X - \bar{X}$    $(X - \bar{X})^2$
8      +3.00            9.00
1      -4.00            16.00
9      +4.00            16.00
5       0.00            0.00
2      -3.00            9.00

$\sum X = 25$, $\sum (X - \bar{X}) = 0.00$, $\sum (X - \bar{X})^2 = 50.00$

Note: $\sum (X - \bar{X})^2$ is called the Sum of Squares

Steps to calculate Variance

1. Compute the mean.
2. Subtract the mean from each observation.
3. Square each of the deviations.
4. Find the sum of the squares.
5. Divide the sum by N to get the variance (for a sample, divide by n - 1 instead).
6. Take the square root of the variance to get the standard deviation.
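The steps can be followed literally in code. The sketch below reuses the five scores 8, 1, 9, 5, 2 from the high-variability table and divides by N, as in the steps; the variable names are illustrative.

```python
import math

data = [8, 1, 9, 5, 2]

# Step 1: compute the mean
mean = sum(data) / len(data)

# Step 2: subtract the mean from each observation
deviations = [x - mean for x in data]

# Step 3: square each deviation
squared = [d ** 2 for d in deviations]

# Step 4: sum of squares
sum_of_squares = sum(squared)

# Step 5: divide by N (population form of the variance)
variance = sum_of_squares / len(data)

# Step 6: square root gives the standard deviation
sd = math.sqrt(variance)

print(mean, sum_of_squares, variance, round(sd, 3))
```

With these scores the sum of squares is 50 and the variance is 10, so the standard deviation is the square root of 10, roughly 3.16, in the same units as the raw scores.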

Few Facts

• The square root of the variance gives the standard deviation (SD), and squaring the SD gives the variance.
• Variance is the average of the squared distances of each value from the mean.
• Why the squared distances and not the actual ones? The sum of the distances from the mean will always be zero; when each value is squared, the negative sign is eliminated.
• Why take the square root? Since distances were squared, the units of the resultant numbers are the squares of the units of the original raw data. Finding the square root of the variance puts the SD in the same units as the raw data, i.e. the standard deviation expresses variability in the same units as the data.

Sample Variance

• The sum of squared deviations from the mean divided by n - 1 (an estimate of the population variance)

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Variance of a Population

• The sum of squared deviations from the mean divided by the number of scores (sigma squared):

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Standard Deviation Formulas

Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

Sample Standard Deviation

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

A standard deviation computed from a sample with n in the denominator usually underestimates the population standard deviation. Using n - 1 in the denominator corrects for this and gives us a better estimate of the population standard deviation.
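The difference between the two denominators can be seen with Python's standard `statistics` module, where `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by n - 1. The sketch treats the five scores 8, 1, 9, 5, 2 (sum of squares 50) first as a whole population and then as a sample; that dual reading is an assumption made purely for illustration.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [8, 1, 9, 5, 2]   # sum of squared deviations = 50

pop_var = pvariance(data)   # 50 / 5
samp_var = variance(data)   # 50 / (5 - 1)

pop_sd = pstdev(data)       # square root of the population variance
samp_sd = stdev(data)       # square root of the sample variance

print(pop_var, samp_var)
print(round(pop_sd, 3), round(samp_sd, 3))
```

The n - 1 version is always the larger of the two for the same data, which is the direction of the correction described above.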

• Sometimes it is of interest to compare the degree of variability in the distribution of a factor from two different populations, or of two different variables from the same population. For example: SBP (factor) among children and adults (two different populations), or, among adults, whether the distribution of SBP has more spread than that of DBP.

Coefficient of variation: expresses the SD as a proportion of the mean

• It is a dimensionless measure of the relative variation.
• Constructed by dividing the standard deviation by the mean and multiplying by 100: CV = (SD/mean) * (100)
• It depicts the size of the standard deviation relative to its mean
• Used to compare the variability in one data set with that in another when a direct comparison of standard deviations is not appropriate.

Coefficient of variation

• The formula is: $CV = (s / \bar{X}) \times 100$
• Suppose two samples of human males yield the following results:

            Children    Adults
Mean age    11 yrs      25 yrs
Mean wt     80 lbs      145 lbs
SD          10 lbs      10 lbs
CV          12.5%       6.9%
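The CV row of the table follows directly from the formula. Below is a minimal sketch; `coefficient_of_variation` is a hypothetical helper name, and the inputs are the mean weights and SDs from the table.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


cv_children = coefficient_of_variation(sd=10, mean=80)
cv_adults = coefficient_of_variation(sd=10, mean=145)

print(round(cv_children, 1), round(cv_adults, 1))  # 12.5 6.9
```

Both groups have the same SD of 10 lbs, yet the relative variation among children is almost twice that among adults because their mean weight is much lower.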

Using different measures of dispersion

The following guidelines help investigators decide which measure of dispersion is most appropriate for a given set of data:

• The standard deviation is used when the mean is used, i.e., with symmetric distributions of numerical data
• The range is used with numerical data when the purpose is to emphasize extreme values.
• The coefficient of variation is used when the intent is to compare two numerical distributions measured on different scales.

Empirical Rule

• Specifies the proportion of the spread in terms of the standard deviation
• It applies to the normal, symmetric or bell-shaped distribution
  • Approx 68% of the data values will fall within 1 SD of the mean
  • Approx 95% of the data values will fall within 2 SD of the mean
  • Approx 99.7% of the data values will fall within 3 SD of the mean

Empirical Rule

[Figure: approximate percentage of area within given standard deviations of the mean: 68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD]

Assume the distribution of the underlying variable is symmetric and bell shaped (Normal)

Example

• Scores on a National Achievement Exam have a mean of 480 and an SD of 90. If these scores are normally distributed, then
  • approximately 68% will fall between 390 & 570
  • approximately 95% will fall between 300 & 660
  • approximately 99.7% will fall between 210 & 750
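The three intervals are simply the mean plus or minus 1, 2 and 3 standard deviations. A small sketch, with `empirical_intervals` as an illustrative helper name:

```python
def empirical_intervals(mean, sd):
    """Intervals covering roughly 68%, 95% and 99.7% of a normal distribution."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {coverage[k]: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


exam = empirical_intervals(mean=480, sd=90)
for share, (low, high) in exam.items():
    print(f"approximately {share} fall between {low} and {high}")
```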

Application of the Empirical Rule

Women participating in a three-day experimental diet regime have been demonstrated to have normally distributed weight loss with a mean of 600 g and a standard deviation of 200 g.

a) What percentage of these women will have a weight loss between 400 and 800 g?

b) What percentage of women will lose weight too quickly on the diet (where too much weight loss is defined as > 1000 g)?

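a) X : (600, 200), i.e. mean 600 g and SD 200 g. About 68% of the women lose between 400 and 800 g, the area within 1 SD of the mean.

[Figure: normal curve over 0 to 1200 g, with the region from 400 to 800 g marked ~68%]

b) X : (600, 200). About 2.3% of the women lose more than 1000 g, the area more than 2 SD above the mean.

[Figure: normal curve over 0 to 1200 g, with the region above 1000 g marked 2.3%]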
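Both answers can be checked against the exact normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8). Note that the empirical rule alone gives (100% - 95%)/2 = 2.5% for part b; the exact tail area beyond 2 SD is slightly smaller.

```python
from statistics import NormalDist

# Weight loss in grams: mean 600, standard deviation 200
weight_loss = NormalDist(mu=600, sigma=200)

# a) Share of women losing between 400 g and 800 g (within 1 SD)
between_400_800 = weight_loss.cdf(800) - weight_loss.cdf(400)

# b) Share of women losing more than 1000 g (beyond 2 SD above the mean)
above_1000 = 1 - weight_loss.cdf(1000)

print(f"{between_400_800:.1%}", f"{above_1000:.1%}")
```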
