Lecture 5 Chebyshev’s Theorem and Exercises Introduction to Probability and Statistics I

Introduction to Probability and Statistics I (source: homes.ieu.edu.tr/~ytutuncu/Math211_2010_Lecture5.pdf)


Page 1: Lecture 5

Chebyshev’s Theorem and Exercises

Introduction to Probability and Statistics I

Page 2: Review Example

Cruise agency: number of weekly specials to the Caribbean: 20, 73, 75, 80, 82

Compute the mean, median and mode, and interpret your results.

Page 3: Review Example: Summary Statistics

Data: 20, 73, 75, 80, 82

Mean: $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{330}{5} = 66$

Median: middlemost observation = 75

Mode: no unique mode exists

The median best describes the data because of the outlier 20, which skews the distribution to the left. The manager should first check whether the value ‘20’ is correct.
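The summary statistics for this example can be checked with a short Python sketch that uses only the standard library. The variable names are illustrative and not part of the lecture; `statistics.multimode` returns every value tied for most frequent, so getting more than one value back is read here as "no unique mode".

```python
import statistics

# Weekly specials to the Caribbean (data from the slide)
specials = [20, 73, 75, 80, 82]

mean = statistics.mean(specials)        # sum of values / n = 330 / 5
median = statistics.median(specials)    # middle value of the sorted data
modes = statistics.multimode(specials)  # all values tied for most frequent

print("mean:", mean)
print("median:", median)
print("unique mode exists:", len(modes) == 1)
```

The gap between the mean and the median is a quick numerical hint that an outlier (here, 20) is pulling the mean down.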

Page 5: Review Example

Annual % returns:

common stocks: 4, 14.3, 19, -14.7, -26.5, 37.2, 23.8

treasury bills: 6.5, 4.4, 3.8, 6.9, 8, 5.8, 5.1

$\mu_{stocks} = \dfrac{\sum x_i}{N} = \dfrac{57.1}{7} = 8.16$

$\mu_{Tbills} = \dfrac{\sum x_i}{N} = \dfrac{40.50}{7} = 5.786$

The mean annual % return on stocks is higher than the return for U.S. Treasury bills.

Page 6: Review Example

common stocks: 4, 14.3, 19, -14.7, -26.5, 37.2, 23.8

treasury bills: 6.5, 4.4, 3.8, 6.9, 8, 5.8, 5.1

Population standard deviation of each series:

$\sigma_{stocks} = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}} = \sqrt{\dfrac{(4.0 - 8.16)^2 + (14.3 - 8.16)^2 + (19 - 8.16)^2 + (-14.7 - 8.16)^2 + (-26.5 - 8.16)^2 + (37.2 - 8.16)^2 + (23.8 - 8.16)^2}{7}} = 20.648$

$\sigma_{Tbills} = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}} = \sqrt{\dfrac{(6.5 - 5.8)^2 + (4.4 - 5.8)^2 + (3.8 - 5.8)^2 + (6.9 - 5.8)^2 + (8.0 - 5.8)^2 + (5.8 - 5.8)^2 + (5.1 - 5.8)^2}{7}} = 1.362$

The variability of U.S. Treasury bill returns is much smaller than that of stock returns.
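The figures for the two return series can be reproduced with the sketch below. It assumes the seven values are treated as a whole population (divisor N), which is what yields 20.648 and 1.362; the variable names are illustrative only.

```python
import statistics

# Annual % returns (data from the slides)
stocks = [4, 14.3, 19, -14.7, -26.5, 37.2, 23.8]
tbills = [6.5, 4.4, 3.8, 6.9, 8, 5.8, 5.1]

# Population means
mu_stocks = statistics.fmean(stocks)
mu_tbills = statistics.fmean(tbills)

# Population standard deviations: sqrt(sum((x - mu)^2) / N)
sigma_stocks = statistics.pstdev(stocks)
sigma_tbills = statistics.pstdev(tbills)

print(f"stocks:  mean = {mu_stocks:.2f}, std dev = {sigma_stocks:.3f}")
print(f"T-bills: mean = {mu_tbills:.3f}, std dev = {sigma_tbills:.3f}")
```

Using `statistics.stdev` instead would apply the sample divisor n − 1 and give slightly larger values.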

Page 8: Chebyshev’s Theorem

For any population with mean μ and standard deviation σ, and any k > 1, the percentage of observations that fall within the interval

[μ ± kσ]

is at least

$100\left[1 - \dfrac{1}{k^2}\right]\%$

Page 9: Chebyshev’s Theorem (continued)

Regardless of how the data are distributed, at least (1 − 1/k²) of the values will fall within k standard deviations of the mean (for k > 1)

Examples:

At least                within
(1 − 1/1²) = 0%         k = 1 (μ ± 1σ)
(1 − 1/2²) = 75%        k = 2 (μ ± 2σ)
(1 − 1/3²) = 89%        k = 3 (μ ± 3σ)
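The bound is simple enough to tabulate directly. The following is a minimal sketch; the function name `chebyshev_bound` is an illustrative choice, and k = 1 is accepted only so that the trivial 0% case from the examples can be reproduced.

```python
def chebyshev_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean."""
    if k < 1:
        # For k < 1 the formula goes negative and says nothing useful.
        raise ValueError("Chebyshev's bound is only informative for k >= 1")
    return 1 - 1 / k**2


for k in (1, 2, 3):
    print(f"k = {k}: at least {chebyshev_bound(k):.0%} within mu +/- {k} sigma")
```

Because the theorem makes no assumption about the shape of the distribution, these are guaranteed minimums rather than estimates.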

Page 10: The Empirical Rule

If the data distribution is bell-shaped, then the interval μ ± 1σ contains about 68% of the values in the population or the sample

[Figure: bell curve centered at μ, with the region from μ − 1σ to μ + 1σ marked 68%]

Page 11: The Empirical Rule

μ ± 2σ contains about 95% of the values in the population or the sample

μ ± 3σ contains about 99.7% of the values in the population or the sample

[Figure: two bell curves, one with μ ± 2σ marked 95% and one with μ ± 3σ marked 99.7%]
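The 68%, 95% and 99.7% figures can be compared with an exactly normal distribution. The sketch below assumes "bell-shaped" means normal, which is an idealisation of real data; `normal_coverage` is an illustrative name.

```python
from statistics import NormalDist


def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within mu +/- {k} sigma: {normal_coverage(k):.1%}")
```

For data that are only roughly mound-shaped, the rule gives approximations, whereas Chebyshev's bound remains a guaranteed minimum.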

Page 12: Coefficient of Variation

Measures relative variation

Always in percentage (%)

Shows variation relative to mean

Can be used to compare two or more sets of data measured in different units

$CV = \left(\dfrac{s}{\bar{x}}\right) \cdot 100\%$

Page 13: Review Example

A random sample of data has mean = 75 and variance = 25.

Use Chebyshev’s theorem to determine the percent of observations between 65 and 85.

If the data are mounded, use the empirical rule to find the approximate percent of observations between 65 and 85.

Page 14: Review Example

A random sample of data has mean = 75 and variance = 25.

Chebyshev’s theorem: the standard deviation is √25 = 5, so the interval from 65 to 85 is the mean ± 2 standard deviations (k = 2). The proportion must be at least

$100[1 - (1/k^2)]\% = 100[1 - (1/2^2)]\% = 75\%$

The empirical rule: approximately 95% of the observations are within 2 standard deviations of the mean.
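The step from the interval 65 to 85 to "2 standard deviations" can be made explicit in code. This sketch assumes the interval is symmetric about the mean, as it is here; the variable names are illustrative.

```python
import math

# Values given in the review example
mean, variance = 75, 25
low, high = 65, 85

sd = math.sqrt(variance)  # standard deviation = 5
k = (high - mean) / sd    # number of standard deviations to the upper limit

# Check the symmetry assumption: the lower limit must be the same distance away
if not math.isclose(k, (mean - low) / sd):
    raise ValueError("interval is not symmetric about the mean")

chebyshev_pct = 100 * (1 - 1 / k**2)  # guaranteed minimum, any distribution
print(f"k = {k:g}, Chebyshev: at least {chebyshev_pct:g}% between {low} and {high}")
```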

Page 15: Comparing Coefficient of Variation

Stock A:

Average price last year = $50

Standard deviation = $5

$CV_A = \left(\dfrac{s}{\bar{x}}\right) \cdot 100\% = \dfrac{\$5}{\$50} \cdot 100\% = 10\%$

Stock B:

Average price last year = $100

Standard deviation = $5

$CV_B = \left(\dfrac{s}{\bar{x}}\right) \cdot 100\% = \dfrac{\$5}{\$100} \cdot 100\% = 5\%$

Both stocks have the same standard deviation, but stock B is less variable relative to its price
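The stock comparison reduces to one division per stock. A minimal sketch follows; the function name is an illustrative choice, not something defined in the lecture.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Stock B

print(f"CV_A = {cv_a:g}%, CV_B = {cv_b:g}%")
```

Because the units cancel, the same function can compare series measured in different units.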

Page 16: Weighted Mean

The weighted mean of a set of data is

$\bar{x} = \dfrac{\sum_{i=1}^{n} w_i x_i}{\sum w_i} = \dfrac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i}$

where $w_i$ is the weight of the ith observation

Use when data is already grouped into n classes, with $w_i$ values in the ith class
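A weighted mean can be sketched as follows. The data are hypothetical, chosen only to make the arithmetic easy to follow, and `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: three class values with weights 1, 2 and 3
values = [10, 20, 30]
weights = [1, 2, 3]

# (1*10 + 2*20 + 3*30) / (1 + 2 + 3) = 140 / 6
print(f"weighted mean = {weighted_mean(values, weights):.3f}")
```

With equal weights the result reduces to the ordinary arithmetic mean.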

Page 17: Approximations for Grouped Data

Suppose a data set contains values m1, m2, . . ., mK, occurring with frequencies f1, f2, . . ., fK

For a population of N observations, the mean is

$\mu = \dfrac{\sum_{i=1}^{K} f_i m_i}{N}$, where $N = \sum_{i=1}^{K} f_i$

For a sample of n observations, the mean is

$\bar{x} = \dfrac{\sum_{i=1}^{K} f_i m_i}{n}$, where $n = \sum_{i=1}^{K} f_i$

Page 18: Approximations for Grouped Data

Suppose a data set contains values m1, m2, . . ., mK, occurring with frequencies f1, f2, . . ., fK

For a population of N observations, the variance is

$\sigma^2 = \dfrac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}$

For a sample of n observations, the variance is

$s^2 = \dfrac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$
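The grouped-data formulas for the mean and variance can be sketched in a few lines. The values and frequencies below are hypothetical, and the function names are illustrative.

```python
def grouped_mean(values, freqs):
    """Mean of grouped data: sum of f_i * m_i divided by the total frequency."""
    return sum(f * m for f, m in zip(freqs, values)) / sum(freqs)


def grouped_variance(values, freqs, sample=True):
    """Variance of grouped data; divisor is n - 1 for a sample, N for a population."""
    n = sum(freqs)
    mean = grouped_mean(values, freqs)
    squared_dev = sum(f * (m - mean) ** 2 for f, m in zip(freqs, values))
    return squared_dev / (n - 1 if sample else n)


# Hypothetical grouped data: values m_i with frequencies f_i
values = [5, 15, 25]
freqs = [2, 5, 3]

print("mean:", grouped_mean(values, freqs))
print("population variance:", grouped_variance(values, freqs, sample=False))
print("sample variance:", round(grouped_variance(values, freqs), 3))
```

When the m_i are class midpoints rather than exact values, both results are approximations to the statistics of the raw data.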

Page 19: The Sample Covariance

The covariance measures the strength of the linear relationship between two variables

The population covariance:

$Cov(x,y) = \sigma_{xy} = \dfrac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$

The sample covariance:

$Cov(x,y) = s_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Only concerned with the strength of the relationship

No causal effect is implied

Page 20: Interpreting Covariance

Covariance between two variables:

Cov(x,y) > 0: x and y tend to move in the same direction

Cov(x,y) < 0: x and y tend to move in opposite directions

Cov(x,y) = 0: x and y have no linear relationship (independent variables have zero covariance, but zero covariance does not by itself imply independence)
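One caveat about the zero-covariance case is worth checking numerically: covariance only detects linear association. In the hypothetical data below, y is completely determined by x (y = x²), yet the sample covariance is zero. The helper name `sample_covariance` is illustrative.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: y depends on x exactly, but not linearly
xs = [-2, -1, 0, 1, 2]
ys = [x**2 for x in xs]

cov = sample_covariance(xs, ys)
print("Cov(x, y) =", cov)
```

The positive and negative products cancel because the relationship is symmetric about x = 0, so a covariance of zero here says nothing about dependence in general.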

Page 21: Coefficient of Correlation

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

$\rho = \dfrac{Cov(x,y)}{\sigma_X \sigma_Y}$

Sample correlation coefficient:

$r = \dfrac{Cov(x,y)}{s_X s_Y}$

Page 22: Features of Correlation Coefficient, r

Unit free

Ranges between –1 and 1

The closer to –1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker the linear relationship

Page 23: Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, labelled r = -1, r = -.6, r = 0, r = +.3, r = +1 and r = 0]

Page 24: Interpreting the Result

r = .733

There is a relatively strong positive linear relationship between test score #1 and test score #2

Students who scored high on the first test tended to score high on the second test

[Figure: Scatter Plot of Test Scores; x-axis: Test #1 Score (70 to 100), y-axis: Test #2 Score (70 to 100)]

Page 25: Obtaining Linear Relationships

An equation can be fit to show the best linear relationship between two variables:

Y = β0 + β1X

where Y is the dependent variable and X is the independent variable

Page 26: Least Squares Regression

Estimates for coefficients β0 and β1 are found to minimize the sum of the squared residuals

The least-squares regression line, based on sample data, is

$\hat{y} = b_0 + b_1 x$

where b1 is the slope of the line and b0 is the y-intercept:

$b_1 = \dfrac{Cov(x,y)}{s_x^2} = r\,\dfrac{s_y}{s_x}$

$b_0 = \bar{y} - b_1 \bar{x}$

Page 27: Review Example

The following data give X, the price charged per piece of plywood ($), and Y, the quantity sold (in thousands):

(6,80) (7,60) (8,70) (9,40) (10,0)

Compute the covariance

Compute the correlation coefficient

Compute and interpret the regression coefficients.

What quantity of plywood is expected to be sold if the price were $7 per piece?

Page 28: Review Example

(6,80) (7,60) (8,70) (9,40) (10,0)

Compute the covariance = -45

Correlation coefficient = -.900. The correlation coefficient indicates the strength of the linear association between the two variables

Compute and interpret regression coefficients.

What quantity of plywood is expected to be sold if the price were $7 per piece?

  x     y    (x−x̄)   (x−x̄)²   (y−ȳ)   (y−ȳ)²   (x−x̄)(y−ȳ)
  6    80     -2       4       30      900       -60
  7    60     -1       1       10      100       -10
  8    70      0       0       20      400         0
  9    40      1       1      -10      100       -10
 10     0      2       4      -50     2500      -100
 40   250      0      10        0     4000      -180   (column totals)

$\bar{x} = 8.00$, $\bar{y} = 50.00$, $s_x^2 = 2.5$, $s_y^2 = 1000$, $Cov(x,y) = -45$

$s_x = 1.5811$, $s_y = 31.623$

Page 29: Review Example

(6,80) (7,60) (8,70) (9,40) (10,0)

Compute and interpret regression coefficients.

$b_1 = \dfrac{Cov(x,y)}{s_x^2} = \dfrac{-45}{2.5} = -18.0$

For a one dollar increase in the price per piece of plywood, the quantity sold of plywood is estimated to decrease by 18 thousand pieces

$b_0 = \bar{y} - b_1 \bar{x} = 50.0 - (-18)(8.0) = 194.00$

What quantity of plywood is expected to be sold if the price were $7 per piece?

$\hat{y} = b_0 + b_1 x = 194.00 - 18.0(7) = 68$ (thousand pieces)
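The whole plywood example, from covariance through to the prediction at $7, can be reproduced with the sketch below. It uses the sample formulas (divisor n − 1) and plain Python rather than any statistics library; the variable names are illustrative.

```python
import math

# Price per piece ($) and quantity sold (thousands), from the review example
prices = [6, 7, 8, 9, 10]
quantities = [80, 60, 70, 40, 0]

n = len(prices)
x_bar = sum(prices) / n
y_bar = sum(quantities) / n

# Sample covariance and variances (divisor n - 1)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(prices, quantities)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in prices) / (n - 1)
var_y = sum((y - y_bar) ** 2 for y in quantities) / (n - 1)

# Correlation coefficient: covariance scaled by both standard deviations
r = cov_xy / math.sqrt(var_x * var_y)

# Least-squares slope and intercept
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

# Predicted quantity (thousands) at a price of $7
predicted = b0 + b1 * 7

print(f"Cov(x, y) = {cov_xy:g}, r = {r:.3f}")
print(f"y_hat = {b0:g} + ({b1:g}) x")
print(f"predicted quantity at $7: {predicted:g} thousand pieces")
```

The prediction is an interpolation inside the observed price range of $6 to $10; the fitted line should not be trusted far outside that range.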

Page 30: Summary

Described measures of central tendency: mean, median, mode

Illustrated the shape of the distribution: symmetric, skewed

Described measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation

Discussed measures of grouped data

Calculated measures of relationships between variables: covariance and correlation coefficient