
Descriptive Statistics: Numerical Methods

Chapter III


Key Learning Objectives and Topics in this Chapter

Measures of Location: (Mean, Median, Mode, Percentiles, Quartiles)

Measures of Dispersion/Variability (Range, Variance, Standard Deviation, Coefficient of Variation)

Measures of distribution shape, and association between two variables

Important Note

In all cases:

Know the formulas, learn the computation procedures (i.e., apply the formulas), and know the meaning (interpretation) of the measures computed.

Use Excel. Practice! Practice! Practice!

3.1 Introduction

When describing data, we usually focus our attention on two types of measures:

• Central location (e.g., average)

• Variability or spread

These measures can be computed for a population (parameters) or for a sample (statistics).

3.2 Measures of Central Location

A center is a reference point. Thus a good measure of central location is expected to reflect the locations of all the actual points in the data.

How?

With one data point, clearly the central location is at the point itself.

With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).

If a third data point appears on the left-hand side of the center, it should “pull” the central location to the left.

Measures of Location

• Mean
• Median
• Mode
• Percentiles
• Quartiles

If the measures are computed for data from a sample, they are called sample statistics.

If the measures are computed for data from a population, they are called population parameters.

A sample statistic is referred to as the point estimator of the corresponding population parameter.

i) The Arithmetic Mean (µ)

This is the most popular and useful measure of central location.

Mean = Sum of the observations / Number of observations

Sample mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where the numerator is the sum of the values of the observations in the data and n is the number of observations in the sample (sample size).

Population mean:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

where the numerator is the sum of the values of the observations in the data and N is the number of observations in the population (population size).

• Example 1: The times (in hours) spent by 10 adults on the Internet are as follows: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22.

Based on these data, compute the mean (average) amount of time spent on the Internet.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0 + 7 + 12 + 5 + 33 + 14 + 8 + 0 + 9 + 22}{10} = \frac{110}{10} = 11 \text{ hours}$$

Based on these data, the average amount of time spent on the Internet by a typical adult is 11 hours.
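The computation in Example 1 can be checked with a short Python sketch. Python is used here only for illustration (the slides recommend Excel), and the variable names `hours`, `total`, and `sample_mean` are assumptions introduced for this sketch.

```python
# Hours spent on the Internet by the 10 adults in Example 1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Sample mean: sum of the observations divided by the number of observations.
total = sum(hours)
n = len(hours)
sample_mean = total / n

print(f"sum = {total}, n = {n}, mean = {sample_mean} hours")
```

The sum is 110 over 10 observations, so the sketch reproduces the mean of 11 hours.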

ii) The Median

The median of a set of observations is the value that falls in the middle when the data are arranged in order (ascending or descending).

It is the value that divides the observations into two equal halves.

To find the median, put the data in an array (in increasing or decreasing order).

If the total number of observations in the data set is ODD, the median is the middle value.

If the total number of observations in the data set is EVEN, the median is the AVERAGE of the middle two values.

Example 2a: Find the median for the following observations.

0, 7, 12, 5, 14, 8, 0, 9, 22

Step-1: Arrange the data in increasing/decreasing order:

0, 0, 5, 7, 8, 9, 12, 14, 22

Step-2: Count the total number of observations in the data (9). The number of observations is odd, so the median is the middle (5th) value.

Median = 8

Example 2b: Find the median for the following observations.

0, 7, 12, 5, 33, 14, 8, 0, 9, 22

Step-1: Arrange the data in increasing/decreasing order:

0, 0, 5, 7, 8, 9, 12, 14, 22, 33

Step-2: Count the total number of observations in the data (10). The number of observations is even, so the median is the average of the middle two (5th and 6th) values.

Median = (8 + 9)/2 = 8.5
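The odd/even rule for the median can be sketched in Python as follows. The function name `median` and the variable names are assumptions made for this illustration; the data are those of Examples 2a and 2b.

```python
def median(observations):
    """Return the median using the odd/even rule (expects at least one value)."""
    ordered = sorted(observations)   # Step 1: arrange in increasing order
    n = len(ordered)                 # Step 2: count the observations
    middle = n // 2
    if n % 2 == 1:
        # Odd number of observations: the middle value.
        return ordered[middle]
    # Even number of observations: average of the middle two values.
    return (ordered[middle - 1] + ordered[middle]) / 2

example_2a = [0, 7, 12, 5, 14, 8, 0, 9, 22]
example_2b = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(median(example_2a))
print(median(example_2b))
```

The two calls give 8 and 8.5, matching the medians found by hand.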

Note: The median of a data set with an odd number of observations (8 in Example 2a) is a member of the data values.

The median of a data set with an even number of observations (8.5 in Example 2b) is not necessarily a member of the set of values.

Unlike the mean, the median is not affected by extreme values (very large or very small observations) in the data set.

iii) The Mode

The mode is the value that occurs most frequently in the data. It is the value with the highest frequency.

In any data set there is only one value for the mean or the median. However, a data set may have more than one value for the mode.

[Figure: Histograms of income distribution, one with one modal class and one with two modal classes]

Example 3: What is the mode for the following data?

0, 7, 12, 5, 33, 14, 8, 0, 9, 22

Solution: All observations except “0” occur once. There are two “0” values. Thus, the mode is zero.

Is this a good measure of central location?

The value “0” does not reside at the center of this set (compare with the mean = 11.0 and the median = 8.5).
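Because a data set may have more than one mode, a sketch that returns every most-frequent value can be useful. The function name `modes` is an assumption for this illustration, and the second data set is made up purely to show the two-mode case.

```python
from collections import Counter

def modes(observations):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(observations)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

example_3 = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(modes(example_3))         # data of Example 3
print(modes([1, 1, 2, 2, 3]))   # made-up data with two modes
```

For the Example 3 data the only mode is 0; the made-up list has the two modes 1 and 2.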

Comparing Measures of Central Tendency: Mean, Median, Mode

• If mean = median = mode, the shape of the distribution is symmetric.

• If mode < median < mean, the distribution trails to the right and is positively skewed (“skewed to the right”).

• If mode > median > mean, the distribution trails to the left and is negatively skewed (“skewed to the left”).

[Figure: A positively skewed distribution and a negatively skewed distribution, each marking the positions of the mode, median, and mean]
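The three comparison rules can be applied mechanically. The sketch below is only an illustration: the function name `describe_shape` is an assumption, and any ordering not covered by the rules is reported as unclassified rather than guessed.

```python
def describe_shape(mean, median, mode):
    """Apply the three comparison rules; other orderings are left unclassified."""
    if mean == median == mode:
        return "symmetric"
    if mode < median < mean:
        return "positively skewed (trails to the right)"
    if mode > median > mean:
        return "negatively skewed (trails to the left)"
    return "no rule applies"

# Values from the Internet-hours data: mean = 11.0, median = 8.5, mode = 0.
print(describe_shape(mean=11.0, median=8.5, mode=0))
```

For the Internet-hours data the mode (0) is below the median (8.5), which is below the mean (11.0), so the rule for a positively skewed distribution applies.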

Percentiles

A percentile provides information about the relative location and spread of the data between the smallest and the largest value.

It is a measure of relative location, but not necessarily of central location.

A percentile tells us the proportion of observations that lie below or above a certain value in the data.

Example: Admission test scores for colleges and universities are frequently reported in terms of percentiles.

Definition:

The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

Computing Percentiles

Arrange the data in ascending order.

Compute the position i of the pth percentile:

$$i = \frac{p}{100} \times n$$

where n is the number of observations.

If i is not an integer, round up. The pth percentile is the value in the ith position.

If i is an integer, the pth percentile is the average of the values in positions i and i + 1.

Example: Compute the 75th percentile of the following data. The computation uses only the first 10 values (the first row), so n = 10.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

i = (p/100)n = (75/100) × 10 = 7.5

Because 7.5 is not an integer, we round up to 8. The 8th data value is 435.

The 75th Percentile = 435

Example: Compute the 50th percentile of the same data, again using the first 10 values (n = 10):

425 430 430 435 435 435 435 435 440 440

i = (p/100)n = (50/100) × 10 = 5

Because 5 is an integer, we average the 5th and 6th data values:

50th Percentile = (435 + 435)/2 = 435
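The position rule for percentiles (round up when i is not an integer, average positions i and i + 1 when it is) can be sketched as below. The function name `percentile` is an assumption, and the sketch restricts p to whole numbers strictly between 0 and 100 so that the position can be computed with integer arithmetic.

```python
def percentile(data, p):
    """pth percentile by the position rule (p a whole number, 0 < p < 100)."""
    ordered = sorted(data)   # arrange the data in ascending order
    n = len(ordered)
    # i = (p/100) * n, kept as quotient and remainder to avoid float rounding.
    whole, remainder = divmod(p * n, 100)
    if remainder != 0:
        # i is not an integer: round up to position whole + 1 (zero-based index whole).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2

first_row = [425, 430, 430, 435, 435, 435, 435, 435, 440, 440]
print(percentile(first_row, 75))
print(percentile(first_row, 50))
```

With the ten values used in the two examples, both calls give 435, in agreement with the hand computations.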


Quartiles

Quartiles are specific percentiles.

First Quartile = 25th Percentile

Second Quartile = 50th Percentile = the Median

Third Quartile = 75th Percentile

Quartiles divide a data set into four equal parts.

$$Q_1 = \frac{1(N+1)}{4}; \quad Q_2 = \frac{2(N+1)}{4}; \quad Q_3 = \frac{3(N+1)}{4}$$

where Q_i is the location of the ith quartile and N is the number of observations.
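As a quick check of the quartile location formulas i(N+1)/4, take the nine ordered observations of Example 2a (0, 0, 5, 7, 8, 9, 12, 14, 22), so N = 9. Reusing that data set here is an illustration added for this worked step, not part of the original slide.

$$Q_1 = \frac{1(9+1)}{4} = 2.5; \quad Q_2 = \frac{2(9+1)}{4} = 5; \quad Q_3 = \frac{3(9+1)}{4} = 7.5$$

Location 5 is the 5th ordered value, 8, which agrees with the median found in Example 2a. Locations 2.5 and 7.5 fall between two observations. The slides do not say how to resolve this; one common convention, assumed here, is to take the value halfway between the neighbours, giving a first quartile of (0 + 5)/2 = 2.5 and a third quartile of (12 + 14)/2 = 13.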

3.3 Measures of Variability

Measures of central location fail to tell the whole story about the distribution.

A question of interest that remains unanswered even after obtaining measures of central location is how spread out are the observations around the central (say, mean) value?

• Variability is important in business decisions.

• For example, in choosing between two suppliers A and B, we might consider not only the average delivery time for each, but also the variability in delivery time for each.


Measures of Variability

Range

Inter-Quartile Range

Variance

Standard Deviation

Coefficient of Variation

i) The Range

The range of a set of observations is the difference between the largest and smallest observations, i.e., the distance between the smallest and the largest data value in the set.

• Range = largest value – smallest value

Its major advantage is the ease with which it can be computed.

Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points.

It is also very sensitive to the smallest and largest data values.

ii) Inter-Quartile Range

This is a measure of the spread of the middle 50% of the observations.

A large value indicates a large spread of the observations.

It is not sensitive to extreme data values.

Inter-quartile range = Q3 – Q1
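A hedged sketch of both measures, reusing the Internet-hours data of Example 1 for illustration. The quartiles are taken from Python's `statistics.quantiles` with `method="exclusive"`, which places them at positions based on (n + 1) and interpolates linearly between neighbouring values. That interpolation is an assumption of this sketch; other conventions (such as the round-up percentile rule) can give somewhat different quartiles.

```python
import statistics

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]   # Example 1 data, reused for illustration

# Range = largest value - smallest value
data_range = max(hours) - min(hours)

# Quartiles at positions i*(n + 1)/4 with linear interpolation (assumed convention).
q1, q2, q3 = statistics.quantiles(hours, n=4, method="exclusive")
iqr = q3 - q1

print(f"range = {data_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Under this convention the range is 33 while the inter-quartile range is 12.25, which shows how much the single value 33 stretches the range.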

iii) The Variance

The variance is a measure of variability that utilizes all the data.

It is the average of the squared differences between each data value and the measure of central location (the mean).

It is calculated differently for a population and for a sample.

Variance of a population:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Variance of a sample:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

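Why square the difference?

The sum of the deviations from the mean is zero, so the unsquared deviations cannot be used to measure spread.

Why divide by n-1 instead of n?

Dividing by n-1 gives a better approximation of the population variance.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$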
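The claim that the deviations from the mean always sum to zero follows directly from the definition of the sample mean. A short derivation, added here as a supporting step:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0, \quad \text{since } \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Positive and negative deviations cancel exactly, which is why the deviations are squared before they are added.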

Example: Computing the Variance Based on Sample Data

Find the variance of the following sample observations:

9 11 8 12

Variance of a sample:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Computing Variance of a sample

Step-1: Find the mean

$$\bar{x} = \frac{9 + 11 + 8 + 12}{4} = \frac{40}{4} = 10$$

Step-2: Compute deviations from the mean

9 - 10 = -1
11 - 10 = +1
8 - 10 = -2
12 - 10 = +2

Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n-1

$$s^2 = \frac{(-1)^2 + 1^2 + (-2)^2 + 2^2}{4 - 1} = \frac{10}{3} = 3.33$$
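The three steps can be followed literally in Python. This is a sketch for the sample 9, 11, 8, 12; the variable names are assumptions, and the final line cross-checks the result against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import statistics

sample = [9, 11, 8, 12]

# Step 1: find the mean.
n = len(sample)
mean = sum(sample) / n

# Step 2: compute the deviations from the mean.
deviations = [x - mean for x in sample]

# Step 3: square the deviations, add them, and divide by n - 1.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

print(deviations)
print(round(sample_variance, 2))

# Cross-check against the standard library's sample variance.
assert abs(sample_variance - statistics.variance(sample)) < 1e-12
```

The deviations are -1, +1, -2, +2 and the variance rounds to 3.33, as in the hand computation.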

iv) Standard Deviation

The standard deviation of a set of observations is the square root of the variance.

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $s = \sqrt{s^2}$

Why Standard Deviation?

The standard deviation is expressed in the same unit of measure in which the data are recorded.

Thus it can be used to compare the variability of several distributions that are measured in the same units.

It can also be used to make a statement about the general shape of a distribution (Kurtosis).

Computing the standard deviation

Step-1: Find the mean

$$\bar{x} = \frac{9 + 11 + 8 + 12}{4} = \frac{40}{4} = 10$$

Step-2: Compute deviations from the mean

9 - 10 = -1
11 - 10 = +1
8 - 10 = -2
12 - 10 = +2

Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n-1

$$s^2 = \frac{(-1)^2 + 1^2 + (-2)^2 + 2^2}{4 - 1} = \frac{10}{3} = 3.33$$

Step-4: Take the square root of the variance

$$s = \sqrt{s^2} = \sqrt{10/3} \approx 1.826$$
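Step 4 can be checked with the standard library. The sketch below uses the same sample (9, 11, 8, 12); the variable names are assumptions. It also shows the population version for contrast, which divides by N instead of n - 1.

```python
import math
import statistics

sample = [9, 11, 8, 12]

sample_variance = statistics.variance(sample)   # divides by n - 1
sample_sd = math.sqrt(sample_variance)          # Step 4: square root of the variance

# statistics.stdev computes the sample standard deviation directly;
# statistics.pstdev is the population version (divides by N).
population_sd = statistics.pstdev(sample)

print(round(sample_sd, 3))
print(round(population_sd, 3))
```

The sample standard deviation is about 1.826; treating the same four values as a whole population would give about 1.581.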

v) Coefficient of Variation

The coefficient of variation is a measure of how large the standard deviation is relative to the mean.

The coefficient of variation is computed as follows:

$$CV = \left(\frac{s}{\bar{x}} \times 100\right)\% \quad \text{for a sample}$$

$$CV = \left(\frac{\sigma}{\mu} \times 100\right)\% \quad \text{for a population}$$

Why Coefficient of Variation?

Example: Is a standard deviation of 10 large?

A standard deviation of 10 may be perceived as large when the mean value is 100, but it is only moderately large if the mean value is 500.

The coefficient of variation can be used to compare variability in data sets that are measured in different units.
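The question about a standard deviation of 10 can be made concrete with a small sketch. The function name `coefficient_of_variation` is an assumption for this illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean

# The same standard deviation of 10 against two different means.
print(coefficient_of_variation(10, 100))
print(coefficient_of_variation(10, 500))
```

Against a mean of 100 the coefficient of variation is 10%, against a mean of 500 it is only 2%, so the same standard deviation represents very different relative variability.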

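Variance, Standard Deviation, and Coefficient of Variation

For the sample rent data on 70 efficiency apartments (mean $\bar{x} = 490.80$):

Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2996.16$$

Standard Deviation

$$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$$

Coefficient of Variation

$$\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$$

The standard deviation is about 11% of the mean.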

Compute every measure of central location and variability you have learned in this chapter for the following sample rent data on 70 efficiency apartments:

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
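One way to check hand or Excel results for this exercise is the Python sketch below. It relies only on the standard `statistics` module; the variable names are assumptions. Percentiles and quartiles are left out because their values depend on the position convention chosen.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = statistics.mean(rents)
median = statistics.median(rents)
mode = statistics.mode(rents)
data_range = max(rents) - min(rents)
variance = statistics.variance(rents)   # sample variance, divides by n - 1
std_dev = statistics.stdev(rents)       # sample standard deviation
cv = 100 * std_dev / mean               # coefficient of variation, in percent

print(f"n = {n}, mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"range = {data_range}, variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}, CV = {cv:.2f}%")
```

The mean (490.80), variance (2996.16), standard deviation (54.74), and coefficient of variation (11.15%) agree with the values quoted for this data set in the slides.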
