
Chapter 3, Part 1 Descriptive Statistics II: Numerical Methods

Uploaded by aletha

Page 1: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Chapter 3, Part 1 Descriptive Statistics II:

Numerical Methods

– Measures of Location
– Measures of Variability

Page 2: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Measures of Location

– Mean
– Median
– Mode
– Percentiles

Page 3: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents

Given below are the monthly rents ($) for a sample of 70 one-bedroom apartments in a particular city. The data are presented in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Page 4: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Mean

The mean of a data set is the average of all the data values.

If the data are from a sample, the mean is denoted by $\bar{x}$ (x-bar):

$$\bar{x} = \frac{\sum x_i}{n}$$

If the data are from a population, the mean is denoted by $\mu$ (mu):

$$\mu = \frac{\sum x_i}{N}$$

Page 5: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents Mean

$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$
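The sample mean for the apartment rents can be checked with a few lines of Python. This is a minimal sketch; the names `rents`, `total`, and `x_bar` are illustrative choices, not part of the slides.

```python
# The 70 monthly rents ($) from the example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)       # sample size
total = sum(rents)   # sum of all data values
x_bar = total / n    # sample mean: sum of x_i divided by n

print(f"n = {n}, sum = {total}, mean = {x_bar:.2f}")
```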


Page 6: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents Trimmed Mean

With n = 70, a 5% trimmed mean removes 0.05(70) = 3.5, rounded up to 4, values from each end of the ordered data set, leaving 62 values.

$$\text{5\% trimmed mean} = \frac{30{,}206}{62} = 487.19$$
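The trimming step can be sketched as follows. The function name `trimmed_mean` is hypothetical, and rounding the trim count up with `math.ceil` is an assumption taken from the slide's "3.5 = 4"; other texts and libraries may round differently.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def trimmed_mean(values, percent):
    """Drop ceil(percent% of n) values from each end, then average the rest."""
    ordered = sorted(values)
    k = math.ceil(percent * len(ordered) / 100)  # values removed per end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


print(f"5% trimmed mean = {trimmed_mean(rents, 5):.2f}")
```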


Page 7: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Median

The median of a data set is the value in the middle when the data items are arranged in ascending order.

If there is an odd number of items, the median is the value of the middle item.

If there is an even number of items, the median is the average of the values for the middle two items.

Page 8: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods


Example: Apartment Rents Median

Median = 50th percentile

i = (p/100)n = (50/100)70 = 35

Because i is an integer, the median is the average of the 35th and 36th data values:

Median = (475 + 475)/2 = 475
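The odd/even rule for the median can be written directly, as in the minimal sketch below. The helper name `median` is an illustrative choice; Python's standard `statistics.median` applies the same rule.

```python
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # For n = 70 these are the 35th and 36th values (0-based indices 34 and 35).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(f"median = {median(rents)}")
```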

Page 9: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Mode

The mode of a data set is the value that occurs with greatest frequency.

Page 10: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents

Mode

450 occurred most frequently (7 times)

Mode = 450
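Counting frequencies confirms the mode. The sketch below uses `collections.Counter`; returning a list of modes is a design choice made here so that ties, which the slides do not discuss, would also be reported.

```python
from collections import Counter

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

counts = Counter(rents)                # value -> number of occurrences
top = max(counts.values())             # greatest frequency
modes = sorted(v for v, c in counts.items() if c == top)

print(f"mode(s) = {modes}, each occurring {top} times")
```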

Page 11: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Percentiles

The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

– Arrange the data in ascending order.
– Compute index i, the position of the pth percentile: i = (p/100)n
– If i is not an integer, round up. The pth percentile is the value in the ith position.
– If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
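The percentile procedure above can be sketched in Python. The function name `percentile` is hypothetical, and restricting p to whole numbers strictly between 0 and 100 is an assumption made here so the integer test on i stays exact. Library routines such as NumPy's use interpolation rules by default and may return slightly different values.

```python
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """p-th percentile by the slide's rule; p is a whole number, 0 < p < 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    # i = (p/100) * n, kept as quotient and remainder so the
    # "is i an integer?" check does not depend on float rounding.
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (0-based index whole).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


for p in (25, 50, 75, 90):
    print(f"{p}th percentile = {percentile(rents, p)}")
```

With the rent data this gives 585 for the 90th percentile and 525 for the 75th percentile, matching the worked examples.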

Page 12: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents 90th Percentile

i = (p/100)n = (90/100)70 = 63

Averaging the 63rd and 64th data values:

90th Percentile = (580 + 590)/2 = 585

Page 13: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Quartiles

Quartiles are specific percentiles.

– First Quartile = 25th Percentile
– Second Quartile = 50th Percentile = Median
– Third Quartile = 75th Percentile

Page 14: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods


Example: Apartment Rents

Third Quartile

Third quartile = 75th percentile

i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53

Third quartile = 53rd data value = 525

Page 15: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Measures of Variability

– Range
– Variance
– Standard Deviation
– Coefficient of Variation

Page 16: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Range

The range of a data set is the difference between the largest and smallest data values.

It is the simplest measure of dispersion.

It is very sensitive to the smallest and largest data values.

Page 17: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents Range

Range = largest value - smallest value

Range = 615 - 425 = 190

Page 18: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Variance

The variance is the average of the squared differences between each data value and the mean.

If the data set is a sample, the variance is denoted by $s^2$:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

If the data set is a population, the variance is denoted by $\sigma^2$:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Page 19: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily comparable to the mean.

If the data set is a sample, the standard deviation is denoted by $s$:

$$s = \sqrt{s^2}$$

If the data set is a population, the standard deviation is denoted by $\sigma$ (sigma):

$$\sigma = \sqrt{\sigma^2}$$

Page 20: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Coefficient of Variation

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

If the data set is a sample, the coefficient of variation is computed as follows:

$$\frac{s}{\bar{x}} (100)$$

If the data set is a population, the coefficient of variation is computed as follows:

$$\frac{\sigma}{\mu} (100)$$

Page 21: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents

Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$

Standard Deviation

$$s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74$$

Coefficient of Variation

$$\frac{s}{\bar{x}} (100) = \frac{54.74}{490.80} (100) = 11.15\%$$
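These three measures can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) divisor. A minimal sketch; the variable names are illustrative.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)
s2 = statistics.variance(rents)   # sample variance, divides by n - 1
s = statistics.stdev(rents)       # positive square root of s2
cv = s / x_bar * 100              # coefficient of variation, in percent

print(f"s^2 = {s2:.2f}, s = {s:.2f}, CV = {cv:.2f}%")
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead.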

Page 22: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Chapter 3, Part 2 Descriptive Statistics II:

Numerical Methods

Measures of Relative Location and Locating Outliers

– z-Scores
– Chebyshev’s Theorem
– The Empirical Rule
– Detecting Outliers

Page 23: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

z-Scores

The z-score is often called the standardized value. It denotes the number of standard deviations a data value $x_i$ is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

A data value less than the sample mean will have a z-score less than zero. A data value greater than the sample mean will have a z-score greater than zero. A data value equal to the sample mean will have a z-score of zero.

Page 24: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents

z-Score of Smallest Value (425)

$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
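The standardized values can be regenerated from the raw rents, as in the sketch below. The list name `z_scores` is an illustrative choice, and the sample standard deviation (n - 1 divisor) is used, as in the example.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)
s = statistics.stdev(rents)

# z_i = (x_i - x_bar) / s for every data value
z_scores = [(x - x_bar) / s for x in rents]

print(f"smallest z = {z_scores[0]:.2f}, largest z = {z_scores[-1]:.2f}")
```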

Page 25: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Chebyshev’s Theorem

At least (1 - 1/k^2) of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1.

– At least 75% of the items must be within k = 2 standard deviations of the mean.
– At least 89% of the items must be within k = 3 standard deviations of the mean.
– At least 94% of the items must be within k = 4 standard deviations of the mean.

Page 26: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents

Chebyshev’s Theorem

Let k = 1.5 with $\bar{x}$ = 490.80 and s = 54.74

At least (1 - 1/(1.5)^2) = 1 - 0.44 = 0.56 or 56% of the rent values must be between

$\bar{x}$ - k(s) = 490.80 - 1.5(54.74) = 409

and

$\bar{x}$ + k(s) = 490.80 + 1.5(54.74) = 573

Page 27: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents

Chebyshev’s Theorem (continued)

Actually, 86% of the rent values (60 of 70) are between 409 and 573, well above the guaranteed minimum of 56%.
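The comparison between the Chebyshev lower bound and the observed share can be sketched as follows. The variable names (`bound`, `inside`, `share`) are illustrative, and k = 1.5 is taken from the example.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)
s = statistics.stdev(rents)

k = 1.5
bound = 1 - 1 / k**2                       # guaranteed minimum proportion
lower, upper = x_bar - k * s, x_bar + k * s

inside = sum(lower <= x <= upper for x in rents)  # values within k std devs
share = inside / len(rents)

print(f"Chebyshev guarantees at least {bound:.0%} in [{lower:.0f}, {upper:.0f}]")
print(f"Observed: {inside} of {len(rents)} = {share:.0%}")
```

The bound holds for any data set, so the observed share can only meet or exceed it.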

Page 28: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Empirical Rule

For data having a bell-shaped distribution:

Approximately 68% of the data values will be within one standard deviation of the mean.

Approximately 95% of the data values will be within two standard deviations of the mean.

Almost all of the items (about 99.7%) will be within three standard deviations of the mean.

Page 29: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents Empirical Rule

                 Interval            % in Interval
Within +/- 1s    436.06 to 545.54    48/70 = 69%
Within +/- 2s    381.32 to 600.28    68/70 = 97%
Within +/- 3s    326.58 to 655.02    70/70 = 100%
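The counts in this table can be verified by tallying how many rents fall within one, two, and three standard deviations of the mean. A minimal sketch; the dictionary name `within` is an illustrative choice.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)
s = statistics.stdev(rents)

within = {}  # k -> number of rents within k standard deviations of the mean
for k in (1, 2, 3):
    lo, hi = x_bar - k * s, x_bar + k * s
    within[k] = sum(lo <= x <= hi for x in rents)
    print(f"+/- {k}s: {lo:.2f} to {hi:.2f}  {within[k]}/{len(rents)}"
          f" = {within[k] / len(rents):.0%}")
```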

Page 30: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Detecting Outliers

An outlier is an unusually small or unusually large value in a data set.

A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

– It might be an incorrectly recorded data value.
– It might be a data value that was incorrectly included in the data set.
– It might be a correctly recorded data value that belongs in the data set!

Page 31: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents Detecting Outliers

The most extreme z-scores are -1.20 and 2.27.

Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
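The |z| > 3 screen can be sketched as a small function. The name `find_outliers` is hypothetical, and the rent of 1200 in the second call is an invented value, added only to show what a flagged observation looks like; it is not part of the data set.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def find_outliers(values, threshold=3):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [x for x in values if abs((x - x_bar) / s) > threshold]


print(find_outliers(rents))            # the actual sample
print(find_outliers(rents + [1200]))   # with one hypothetical extreme rent
```

A flagged value is only a candidate for review: as noted above, it may still be a correctly recorded value that belongs in the data set.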

Page 32: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Chapter 3, Part 3 Descriptive Statistics II:

Numerical Methods

– Measures of Association Between Two Variables
– Working with Grouped Data

Page 33: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Measures of Association Between Two Variables

– Covariance
– Correlation Coefficient

Page 34: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Covariance

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.

If the data sets are samples, the covariance is denoted by $s_{xy}$:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

If the data sets are populations, the covariance is denoted by $\sigma_{xy}$:

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$

Page 35: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

If the data sets are samples, the coefficient is denoted by $r_{xy}$:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

If the data sets are populations, the coefficient is denoted by $\rho_{xy}$:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

where $s_x$ and $s_y$ (or $\sigma_x$ and $\sigma_y$) are the standard deviations of the two variables.
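The slides give no worked example for these two measures, so the sketch below uses a small made-up pair of samples (`x` and `y` are hypothetical values chosen only for illustration). It computes the sample covariance and correlation coefficient directly from the formulas.

```python
import math

# Hypothetical paired sample data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 8, 10]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of cross-deviations divided by n - 1
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation coefficient
r_xy = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy:.2f}, r_xy = {r_xy:.3f}")
```

Here the covariance is positive and the coefficient is close to +1, indicating a strong positive linear relationship in this made-up sample.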

Page 36: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Mean for Grouped Data

Sample Data

$$\bar{x} = \frac{\sum f_i M_i}{n}$$

Population Data

$$\mu = \frac{\sum f_i M_i}{N}$$

where $f_i$ = frequency of class i and $M_i$ = midpoint of class i

Page 37: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents

Given below is the previous sample of monthly rents for one-bedroom apartments presented as grouped data in the form of a frequency distribution.

Rent ($)    Frequency
420-439      8
440-459     17
460-479     12
480-499      8
500-519      7
520-539      4
540-559      2
560-579      4
580-599      2
600-619      6

Page 38: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents

Mean for Grouped Data

Rent ($)    f_i    M_i      f_i M_i
420-439      8     429.5     3436.0
440-459     17     449.5     7641.5
460-479     12     469.5     5634.0
480-499      8     489.5     3916.0
500-519      7     509.5     3566.5
520-539      4     529.5     2118.0
540-559      2     549.5     1099.0
560-579      4     569.5     2278.0
580-599      2     589.5     1179.0
600-619      6     609.5     3657.0
Total                       34525.0

$$\bar{x} = \frac{34{,}525}{70} = 493.21$$

This approximation differs by $2.41 from the actual sample mean of $490.80.
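The grouped-data mean can be recomputed from the frequency distribution alone. A minimal sketch; the tuple layout `(lower, upper, frequency)` and the names `classes` and `grouped_mean` are illustrative choices.

```python
# Frequency distribution from the example: (lower limit, upper limit, frequency)
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

n = sum(f for _, _, f in classes)                       # total frequency
# Each class is represented by its midpoint M_i = (lower + upper) / 2
weighted_sum = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
grouped_mean = weighted_sum / n

print(f"n = {n}, sum f_i M_i = {weighted_sum}, mean = {grouped_mean:.2f}")
```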

Page 39: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Variance for Grouped Data

Sample Data

$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$

Population Data

$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$

Page 40: Chapter 3, Part 1  Descriptive Statistics II:   Numerical Methods

Example: Apartment Rents

Sample Variance for Grouped Data

$$s^2 = 3{,}017.89$$

Sample Standard Deviation for Grouped Data

$$s = \sqrt{3{,}017.89} = 54.94$$

This approximation differs by only $0.20 from the actual standard deviation of $54.74.
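The grouped-data variance and standard deviation follow the same pattern, with each squared deviation weighted by its class frequency. A minimal sketch under the same assumed `(lower, upper, frequency)` layout; all names are illustrative.

```python
import math

# Frequency distribution from the example: (lower limit, upper limit, frequency)
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

# Pair each class midpoint with its frequency
grouped = [((lo + hi) / 2, f) for lo, hi, f in classes]
n = sum(f for _, f in grouped)
grouped_mean = sum(f * m for m, f in grouped) / n

# Sample variance for grouped data: sum of f_i (M_i - x_bar)^2 over n - 1
grouped_var = sum(f * (m - grouped_mean) ** 2 for m, f in grouped) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(f"s^2 = {grouped_var:.2f}, s = {grouped_sd:.2f}")
```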