Chapter 5: The Standard Deviation and the Normal Model
68-95-99.7 rule
[Diagram: mean and standard deviation (numerical summary) and histogram (graphical summary), linked by the 68-95-99.7 rule]
The 68-95-99.7 rule (applies only to mound-shaped data):
approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$;
approximately 95% of the measurements are within 2 standard deviations of the mean, i.e., in $(\bar{y} - 2s, \bar{y} + 2s)$;
almost all (about 99.7%) of the measurements are within 3 standard deviations of the mean, i.e., in $(\bar{y} - 3s, \bar{y} + 3s)$.
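For an exactly Normal model the three percentages come straight from the Normal curve: the area within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. A minimal Python sketch using only the standard library (`fraction_within` is a hypothetical helper name of our own):

```python
import math

def fraction_within(k: float) -> float:
    """Area under a Normal curve within k standard deviations of its mean."""
    # P(|Z| < k) = erf(k / sqrt(2)) for a standard Normal Z
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The rule's 68%, 95% and 99.7% are these areas rounded; real mound-shaped data follow them only approximately.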
68-95-99.7 rule: 68% within 1 standard deviation of the mean
[Figure: Normal curve marked at $\bar{y} - s$, $\bar{y}$, $\bar{y} + s$; 34% of the area lies on each side of the mean, 68% in total]
68-95-99.7 rule: 95% within 2 standard deviations of the mean
[Figure: Normal curve marked at $\bar{y} - 2s$, $\bar{y}$, $\bar{y} + 2s$; 47.5% of the area lies on each side of the mean, 95% in total]
Example: textbook costs
$\bar{y} = 375.48$, $s = 42.72$, $n = 50$
286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480
Example: textbook costs (cont.)
1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (375.48 - 42.72, 375.48 + 42.72) = (332.76, 418.20)$
Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%
Example: textbook costs (cont.)
2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (375.48 - 85.44, 375.48 + 85.44) = (290.04, 460.92)$
Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%
Example: textbook costs (cont.)
3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (375.48 - 128.16, 375.48 + 128.16) = (247.32, 503.64)$
Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
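The three counts can be reproduced with a short Python sketch, assuming the 50 costs listed above and the sample standard deviation ($n - 1$ divisor) computed by the standard library's `statistics.stdev`:

```python
import statistics

# Textbook costs from the example (n = 50), already sorted
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

ybar = statistics.mean(costs)   # 375.48
s = statistics.stdev(costs)     # sample SD, about 42.72

for k in (1, 2, 3):
    lo, hi = ybar - k * s, ybar + k * s
    inside = sum(lo < y < hi for y in costs)   # values strictly inside the interval
    print(f"{k} SD: ({lo:.2f}, {hi:.2f}) holds {inside}/{len(costs)} = {inside / len(costs):.0%}")
```

Because the code does not round $s$ to 42.72, the printed endpoints can differ from the slides in the second decimal place; the counts 32, 48 and 50 are unaffected.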
The best estimate of the standard deviation of the men’s weights displayed in this dotplot is
[Dotplot of men's weights; chart of class responses: 4%, 9%, 71%, 16%]
1. 10
2. 15
3. 20
4. 40
Changing Units of Measurement
Shifting and rescaling data, and how these operations affect graphical and numerical summaries of the data.
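Shifting and rescaling: linear transformations
Original data: $x_1, x_2, \ldots, x_n$
Linear transformation: $x^* = a + bx$ (intercept $a$, slope $b$)
[Figure: the line $x^* = a + bx$ plotted against $x$; adding $a$ shifts the data by $a$, and multiplying by $b$ changes the scale]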
Linear Transformations $x^* = a + bx$
Examples of changing units:
1. From feet ($x$) to inches ($x^*$): $x^* = 12x$ ($a = 0$, $b = 12$)
2. From dollars ($x$) to cents ($x^*$): $x^* = 100x$ ($a = 0$, $b = 100$)
3. From degrees Celsius ($x$) to degrees Fahrenheit ($x^*$): $x^* = 32 + (9/5)x$ ($a = 32$, $b = 9/5$)
4. From ACT ($x$) to SAT ($x^*$): $x^* = 150 + 40x$ ($a = 150$, $b = 40$)
5. From inches ($x$) to centimeters ($x^*$): $x^* = 2.54x$ ($a = 0$, $b = 2.54$)
Shifting data only: $b = 1$, $x^* = a + x$
Adding the same value $a$ to each value in the data set changes the mean, median, Q1 and Q3 by $a$. The standard deviation, IQR and variance are NOT CHANGED: everything shifts together, so the spread of the values does not change.
Shifting data only: $b = 1$, $x^* = a + x$ (cont.)
Weights of 80 men age 19 to 24 of average height (5'8" to 5'10"): $\bar{x} = 82.36$ kg
NIH recommends a maximum healthy weight of 74 kg. To compare their weights to the recommended maximum, subtract 74 kg from each weight: $x^* = x - 74$ ($a = -74$, $b = 1$)
$\bar{x}^* = \bar{x} - 74 = 8.36$ kg
1. No change in shape
2. No change in spread
3. Shift left by 74 kg
Shifting and Rescaling data: $x^* = a + bx$, $b > 0$
Original $x$ data: $x_1, x_2, x_3, \ldots, x_n$
Summary statistics: mean $\bar{x}$, median $m$, 1st quartile $Q_1$, 3rd quartile $Q_3$, standard deviation $s$, variance $s^2$, IQR
$x^*$ data, $x_i^* = a + bx_i$: $x_1^*, x_2^*, x_3^*, \ldots, x_n^*$
Summary statistics:
new mean $\bar{x}^* = a + b\bar{x}$
new median $m^* = a + bm$
new 1st quartile $Q_1^* = a + bQ_1$
new 3rd quartile $Q_3^* = a + bQ_3$
new standard deviation $s^* = bs$
new variance $s^{*2} = b^2 s^2$
new $IQR^* = b \cdot IQR$
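These rules can be checked numerically by transforming every value and comparing with the rule-based shortcut. A minimal Python sketch using the standard library's `statistics` module; the helper names `summarize` and `predicted` are hypothetical, the data are the four student heights from the example that follows, and quartiles use the default method of `statistics.quantiles`, which is only one of several quartile conventions:

```python
import statistics

def summarize(data):
    """Mean, median, quartiles, SD, variance and IQR of a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "Q1": q1,
        "Q3": q3,
        "sd": statistics.stdev(data),
        "var": statistics.variance(data),
        "IQR": q3 - q1,
    }

def predicted(summary, a, b):
    """Apply the x* = a + bx rules (b > 0) to a summary, without the raw data."""
    return {
        "mean": a + b * summary["mean"],
        "median": a + b * summary["median"],
        "Q1": a + b * summary["Q1"],
        "Q3": a + b * summary["Q3"],
        "sd": b * summary["sd"],
        "var": b ** 2 * summary["var"],
        "IQR": b * summary["IQR"],
    }

x = [62, 64, 74, 72]   # student heights in inches
a, b = 0, 2.54         # inches -> centimeters
direct = summarize([a + b * xi for xi in x])   # transform every value first
rule = predicted(summarize(x), a, b)           # shortcut from the rules
for name in direct:
    print(f"{name:6s} direct = {direct[name]:9.4f}   rule = {rule[name]:9.4f}")
```

For any quartile method that interpolates linearly between ordered values, the two columns agree up to floating-point rounding whenever $b > 0$.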
Rescaling data: $x^* = a + bx$, $b > 0$ (cont.)
Weights of 80 men age 19 to 24 of average height (5'8" to 5'10"):
$\bar{x} = 82.36$ kg, min = 54.30 kg, max = 161.50 kg, range = 107.20 kg, $s = 18.35$ kg
Change from kilograms to pounds: $x^* = 2.2x$ ($a = 0$, $b = 2.2$)
$\bar{x}^* = 2.2(82.36) = 181.19$ pounds
min* = 2.2(54.30) = 119.46 pounds
max* = 2.2(161.50) = 355.30 pounds
range* = 2.2(107.20) = 235.84 pounds
$s^* = 2.2(18.35) = 40.37$ pounds
Example of $x^* = a + bx$
4 student heights in inches ($x$ data): 62, 64, 74, 72; $\bar{x} = 68$ inches, $s = 5.89$ inches
Suppose we want centimeters instead: $x^* = 2.54x$ ($a = 0$, $b = 2.54$)
4 student heights in centimeters: 157.48 = 2.54(62), 162.56 = 2.54(64), 187.96 = 2.54(74), 182.88 = 2.54(72)
$\bar{x}^* = 172.72$ centimeters, $s^* \approx 14.96$ centimeters (UNC method: converting every value first is not necessary!)
Note that $\bar{x}^* = 2.54\bar{x} = 2.54(68) = 172.72$ and $s^* = 2.54s = 2.54(5.89) = 14.9606$ (NCSU method: go directly to this.)
Example of $x^* = a + bx$
$x$ data: percent returns from 4 investments during 2003: 5%, 4%, 3%, 6%; $\bar{x} = 4.5\%$, $s = 1.29\%$
Inflation during 2003: 2%
$x^*$ data: inflation-adjusted returns, $x^* = x - 2\%$ ($a = -2$, $b = 1$)
$x^*$ data: 3% = 5% - 2%, 2% = 4% - 2%, 1% = 3% - 2%, 4% = 6% - 2%; $\bar{x}^* = 10\%/4 = 2.5\%$, $s^* = s = 1.29\%$ (not necessary!)
Go directly to this: $\bar{x}^* = \bar{x} - 2\% = 4.5\% - 2\% = 2.5\%$, $s^* = s = 1.29\%$ (note that $s^* \neq s - 2\%$!)
Example. Original data $x$: Jim Bob’s jumbo watermelons from his garden have the following weights (lbs):
23, 34, 38, 44, 48, 55, 55, 68, 72, 75
$s = 17.12$; $Q_1 = 37$, $Q_3 = 69$; IQR = 69 - 37 = 32
Melons over 50 lbs are priced differently; the amount each melon is over (or under) 50 lbs is
$x^* = x - 50$ ($x^* = a + bx$, $a = -50$, $b = 1$): -27, -16, -12, -6, -2, 5, 5, 18, 22, 25
$s^* = 17.12$; $Q_1^* = 37 - 50 = -13$, $Q_3^* = 69 - 50 = 19$
$IQR^* = 19 - (-13) = 32$
NOTE: $s^* = s$, $IQR^* = IQR$
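The melon example can be reproduced in Python. Quartile conventions differ between textbooks and software; this sketch assumes the default ('exclusive') method of the standard library's `statistics.quantiles`, which happens to give the $Q_1 = 37$ and $Q_3 = 69$ used above (`spread` is a hypothetical helper name):

```python
import statistics

weights = [23, 34, 38, 44, 48, 55, 55, 68, 72, 75]   # lbs
shifted = [w - 50 for w in weights]                    # x* = x - 50 (a = -50, b = 1)

def spread(data):
    """Return (s, Q1, Q3, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # 'exclusive' method (Python's default)
    return statistics.stdev(data), q1, q3, q3 - q1

for label, data in (("x", weights), ("x*", shifted)):
    s, q1, q3, iqr = spread(data)
    print(f"{label:2s}: s = {s:.2f}, Q1 = {q1:g}, Q3 = {q3:g}, IQR = {iqr:g}")
# x : s = 17.12, Q1 = 37, Q3 = 69, IQR = 32
# x*: s = 17.12, Q1 = -13, Q3 = 19, IQR = 32
```

The shift moves both quartiles by -50, so their difference (the IQR) and the standard deviation stay the same.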
SUMMARY: Linear Transformations $x^* = a + bx$
[Figure: two histograms of Frequency vs. Assembly Time, one in seconds and one in minutes, with the same shape]
Linear transformations do not affect the shape of the distribution of the data. For example, if the original data are right-skewed, the transformed data are right-skewed.
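SUMMARY: Shifting and Rescaling data, $x^* = a + bx$, $b > 0$
original data $x_1, x_2, x_3, \ldots$; transformed data $x_1^*, x_2^*, x_3^*, \ldots$
summary statistics of the original data; summary statistics of the transformed data:
mean $\bar{x}$; new mean $\bar{x}^* = a + b\bar{x}$
median $m$; new median $m^* = a + bm$
1st quartile $Q_1$; new 1st quartile $Q_1^* = a + bQ_1$
3rd quartile $Q_3$; new 3rd quartile $Q_3^* = a + bQ_3$
st dev $s$; new st dev $s^* = bs$
var. $s^2$; new var. $s^{*2} = b^2 s^2$
IQR; new $IQR^* = b \cdot IQR$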
Z-scores: Standardized Data Values
Measures the distance of a number from the mean in units of
the standard deviation
z-score corresponding to $y$:
$z = \dfrac{y - \bar{y}}{s}$
where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$
Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; exam 2 score: 92
Which score is better?
$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = .5$
$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = .4$
91 on exam 1 is better than 92 on exam 2.
If data $y$ have mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.
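The z-score formula is a one-line function. A minimal Python sketch (`z_score` is a hypothetical name), applied to the two exams above and to the SAT/ACT comparison that follows:

```python
def z_score(y: float, mean: float, sd: float) -> float:
    """How many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

z1 = z_score(91, 88, 6)     # exam 1
z2 = z_score(92, 88, 10)    # exam 2
print(z1, z2)               # 0.5 0.4

z_sat = z_score(680, 500, 100)  # Eleanor, SAT Math
z_act = z_score(27, 18, 6)      # Gerald, ACT Math
print(z_sat, z_act)             # 1.8 1.5
```

The larger z-score marks the relatively better performance, even when the raw score is lower.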
Comparing SAT and ACT Scores
SAT Math: Eleanor’s score 680; SAT mean = 500, sd = 100
ACT Math: Gerald’s score 27; ACT mean = 18, sd = 6
Eleanor’s z-score: $z = (680 - 500)/100 = 1.8$
Gerald’s z-score: $z = (27 - 18)/6 = 1.5$
Eleanor’s score is better.
Z-scores: a special linear transformation $a + bx$
$z = \dfrac{x - \bar{x}}{s} = -\dfrac{\bar{x}}{s} + \dfrac{1}{s}x = a + bx$, where $a = -\dfrac{\bar{x}}{s}$, $b = \dfrac{1}{s}$
Example. At a community college, if a student takes $x$ credit hours the tuition is $x^* = \$250 + \$35x$. The credit hours taken by students in an Intro Stats class have mean $\bar{x} = 15.7$ hrs and standard deviation $s = 2.7$ hrs.
Question 1. A student’s tuition charge is $\$941.25$. What is the z-score of this tuition?
$\bar{x}^* = \$250 + \$35(15.7) = \$799.50$; $s^* = \$35(2.7) = \$94.50$
$z = \dfrac{941.25 - 799.50}{94.50} = \dfrac{141.75}{94.50} = 1.5$
Z-scores: a special linear transformation $a + bx$ (cont.)
Question 2. Roger is a student in the Intro Stats class who has a course load of $x = 13$ credit hours. The z-score is $z = (13 - 15.7)/2.7 = -2.7/2.7 = -1$. What is the z-score of Roger’s tuition?
Roger’s tuition is $x^* = \$250 + \$35(13) = \$705$.
Since $\bar{x}^* = \$250 + \$35(15.7) = \$799.50$ and $s^* = \$35(2.7) = \$94.50$,
$z = \dfrac{705 - 799.50}{94.50} = \dfrac{-94.50}{94.50} = -1$
The linear transformation (with $b > 0$) did not change the z-score! This is why z-scores are so useful.
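A short Python check of this claim, assuming the tuition rule and class summary from the example (`tuition` is a hypothetical helper). For $b > 0$, $\dfrac{(a + bx) - (a + b\bar{x})}{bs} = \dfrac{x - \bar{x}}{s}$: the shift $a$ cancels in the numerator and the factor $b$ cancels between numerator and denominator.

```python
def tuition(hours: float) -> float:
    """Tuition rule from the example: x* = 250 + 35x (dollars)."""
    return 250 + 35 * hours

mean_x, sd_x = 15.7, 2.7                      # credit hours
mean_t, sd_t = tuition(mean_x), 35 * sd_x     # a + b*mean and b*sd, with b = 35 > 0

z_hours = (13 - mean_x) / sd_x                # Roger's credit hours
z_tuition = (tuition(13) - mean_t) / sd_t     # Roger's tuition
z_q1 = (941.25 - mean_t) / sd_t               # Question 1 tuition
print(round(z_hours, 6), round(z_tuition, 6), round(z_q1, 6))   # -1.0 -1.0 1.5
```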
Z-scores add to zero
Student/Institutional Support to Athletic Departments for the 8 Public ACC Schools, 2008 (millions of dollars)
School     Support   y - ybar   Z-score
Clemson       4.5    -3.7125    -0.8806
FSU           7.5    -0.7125    -0.1690
GaTech        6.0    -2.2125    -0.5248
Maryland     17.1     8.8875     2.1082
NCSU          5.5    -2.7125    -0.6434
UNC           6.4    -1.8125    -0.4299
UVA          11.9     3.6875     0.8747
VaTech        6.8    -1.4125    -0.3351
Mean = 8.2125, s = 4.216      Sum = 0    Sum = 0
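Deviations from the mean always sum to zero, so z-scores (the same deviations divided by one common $s$) do too. A Python sketch using the support figures from the table:

```python
import statistics

support = {  # 2008 support, millions of dollars
    "Clemson": 4.5, "FSU": 7.5, "GaTech": 6.0, "Maryland": 17.1,
    "NCSU": 5.5, "UNC": 6.4, "UVA": 11.9, "VaTech": 6.8,
}
values = list(support.values())
ybar = statistics.mean(values)
s = statistics.stdev(values)
z = {school: (y - ybar) / s for school, y in support.items()}

print(round(ybar, 4), round(s, 3))   # 8.2125 4.216
print(round(z["Maryland"], 4))       # 2.1082
print(sum(z.values()))               # zero up to floating-point rounding
```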
Average IQ by Browser
Nationally: mean IQ = 100, sd = 15
$z = \dfrac{81 - 100}{15} = \dfrac{-19}{15} = -1.27$
$z = \dfrac{127 - 100}{15} = \dfrac{27}{15} = 1.80$
The story was exposed as a hoax.
NORMAL PROBABILITY MODELS
The Most Important Model for Data in Statistics
A family of bell-shaped curves that differ only in their means and standard deviations.
$\mu$ = the mean
$\sigma$ = the standard deviation
[Figure: Normal curve with $\mu = 3$ and $\sigma = 1$ on an x-axis from 0 to 12]
Normal Probability Models
The mean, denoted $\mu$, can be any number.
The standard deviation $\sigma$ can be any positive number.
The total area under every normal model curve is 1.
There are infinitely many normal distributions.
Total area = 1; symmetric around $\mu$
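The effects of $\mu$ and $\sigma$
How does the standard deviation affect the shape of $f(x)$? [Figure: Normal curves with $\sigma = 2$, $\sigma = 3$, $\sigma = 4$]
How does the expected value affect the location of $f(x)$? [Figure: Normal curves with $\mu = 10$, $\mu = 11$, $\mu = 12$]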
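[Figure: Normal curves with $\mu = 3$, $\sigma = 1$ and $\mu = 6$, $\sigma = 1$ on an x-axis from 0 to 12; changing $\mu$ shifts the curve]
[Figure: Normal curves with $\mu = 6$, $\sigma = 2$ and $\mu = 6$, $\sigma = 1$; the larger $\sigma$ gives a lower, wider curve]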
Area under the density curve between 6 and 8 for the Normal model with $\mu = 6$ and $\sigma = 2$
[Figure: Normal curve on an x-axis from 0 to 12 with the area between 6 and 8 shaded]
Standardizing
Suppose $X \sim N(\mu, \sigma)$. Form a new random variable by subtracting the mean $\mu$ from $X$ and dividing by the standard deviation $\sigma$:
$\dfrac{X - \mu}{\sigma}$
This process is called standardizing the random variable $X$.
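Standardizing (cont.)
$(X - \mu)/\sigma$ is also a normal random variable; we will denote it by $Z$:
$Z = \dfrac{X - \mu}{\sigma}$ has mean 0 and standard deviation 1:
$E(Z) = \dfrac{E(X) - \mu}{\sigma} = \dfrac{\mu - \mu}{\sigma} = 0$; $SD(Z) = \dfrac{\sigma}{\sigma} = 1$
The probability distribution of $Z$ is called the standard normal distribution.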
Standardizing (cont.) If $X$ has mean $\mu$ and standard deviation $\sigma$, standardizing a particular value $x$ tells how many standard deviations $x$ is above or below the mean $\mu$.
Exam 1: $\mu = 80$, $\sigma = 10$; exam 1 score: 92
Exam 2: $\mu = 80$, $\sigma = 8$; exam 2 score: 90
Which score is better?
$z_1 = \dfrac{92 - 80}{10} = \dfrac{12}{10} = 1.2$
$z_2 = \dfrac{90 - 80}{8} = \dfrac{10}{8} = 1.25$
90 on exam 2 is better than 92 on exam 1.
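[Figure: the Normal curve with $\mu = 6$ and $\sigma = 2$ (x-axis 0 to 12) is transformed by $(X - 6)/2$ into the standard Normal curve with $\mu = 0$ and $\sigma = 1$ (z-axis -3 to 3); each curve has area .5 on either side of its mean]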
Standard Normal Model
$Z$ = standard normal random variable, with $\mu = 0$ and $\sigma = 1$
[Figure: standard Normal curve on -3 to 3, with area .5 to the left of 0 and .5 to the right]
Important Properties of Z
#1. The standard normal curve is symmetric around the mean 0
#2. The total area under the curve is 1;
so (from #1) the area to the left of 0 is 1/2, and the area to the right of 0 is 1/2
Finding Normal Percentiles by Hand (cont.)
Table Z is the standard Normal table. We have to convert our data to z-scores before using the table.
The figure shows us how to find the area to the left when we have a z-score of 1.80:
Areas Under the Z Curve: Using the Table
Proportion of area above the interval from 0 to 1 = .8413 - .5 = .3413
[Figure: standard Normal curve; area .50 to the left of 0, .3413 between 0 and 1, .1587 to the right of 1]
Standard normal areas have been calculated and are provided in Table Z.
The tabulated areas correspond to the area between $Z = -\infty$ and some $z_0$.
[Figure: standard Normal curve shaded from $-\infty$ up to $Z = z_0$]
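In Python 3.8 and later, the standard library's `statistics.NormalDist` can stand in for Table Z: its `cdf` method returns the area to the left of a value. A minimal sketch (`table_z` is a hypothetical helper name):

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal model: mu = 0, sigma = 1

def table_z(z0: float) -> float:
    """Area under the standard Normal curve from -infinity up to z0."""
    return Z.cdf(z0)

for z0 in (1.00, 1.25, 1.80, -1.85):
    print(f"z0 = {z0:5.2f}   area to the left = {table_z(z0):.4f}")
```

The results agree with Table Z to within rounding in the fourth decimal, e.g. about .8413 for $z_0 = 1$ and .8944 for $z_0 = 1.25$.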
Example: begin with a normal model with mean 60 and standard deviation 8
The proportion of the area to the left of 70 under the original curve is the proportion of the area to the left of $z_0 = \dfrac{70 - 60}{8} = 1.25$ under the standard normal $Z$ curve $= 0.8944$.
[Figure: the area 0.8944 shaded under both curves]
Example
Area between 0 and 1.27 = .8980 - .5 = .3980
[Figure: standard Normal curve shaded between 0 and 1.27]
Example
Area to the right of .55 = $A_1 = 1 - A_2 = 1 - .7088 = .2912$, where $A_2$ is the area to the left of .55
Example
Area between -2.24 and 0 = .5 - .0125 = .4875 (the area to the left of -2.24 is .0125)
Example
Area to the left of -1.85 = .0322
Example
Area between -1.18 and 2.73 = $A - A_1$ = .9968 - .1190 = .8778, where $A = .9968$ is the area to the left of 2.73 and $A_1 = .1190$ is the area to the left of -1.18
Example
Area between -1 and +1 = .8413 - .1587 = .6826
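The worked areas in these examples can be recomputed with `statistics.NormalDist` (a sketch; the dictionary labels are our own):

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # area to the left of z under the standard Normal curve

examples = {
    "left of 70, N(60, 8)": NormalDist(60, 8).cdf(70),
    "between 0 and 1.27": Phi(1.27) - Phi(0),
    "right of .55": 1 - Phi(0.55),
    "between -2.24 and 0": Phi(0) - Phi(-2.24),
    "left of -1.85": Phi(-1.85),
    "between -1.18 and 2.73": Phi(2.73) - Phi(-1.18),
    "between -1 and +1": Phi(1) - Phi(-1),
}
for label, area in examples.items():
    print(f"{label:24s} {area:.4f}")
```

Results can differ from the slides in the fourth decimal: for example, the area between -1 and +1 comes out near .6827, because the slide subtracts two rounded table entries.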
Example
Is k positive or negative? Consider the direction of the inequality and the magnitude of the probability.
Look up .2514 in the body of the table; the corresponding entry is -.67, so k = -.67.
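Reading the table backwards (from an area in the body to a z value) is the inverse CDF. A minimal sketch with `statistics.NormalDist.inv_cdf` (Python 3.8+), assuming .2514 is the area to the left of k:

```python
from statistics import NormalDist

# Find k with area .2514 to the left of k under the standard Normal curve
k = NormalDist().inv_cdf(0.2514)
print(round(k, 2))   # -0.67
```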
Example
Area to the right of 250 under the original $N(275, 43)$ curve = area to the right of $\dfrac{250 - 275}{43} = -.58$ under the standard normal curve $= 1 - .2810 = .7190$
Example
Area between 225 and 375 = area under the standard normal curve between $z = \dfrac{225 - 275}{43} = -1.16$ and $z = \dfrac{375 - 275}{43} = 2.33$; the area is .9901 - .1230 = .8671
$N(275, 43)$: find $k$ so that the area to the left is .9846
.9846 = area to the left of $k$ under the $N(275, 43)$ curve = area to the left of $z = \dfrac{k - 275}{43}$ under the $N(0, 1)$ curve
$\dfrac{k - 275}{43} = 2.16$ (from standard normal table)
$k = 2.16(43) + 275 = 367.88$
Area to the left of $z = 2.16$ = .9846
[Figure: standard Normal curve; area .5 to the left of 0 plus area .4846 between 0 and 2.16 gives .9846]
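The three $N(275, 43)$ examples can also be done without standardizing by hand, since `statistics.NormalDist(mu, sigma)` accepts any mean and standard deviation. A sketch:

```python
from statistics import NormalDist

X = NormalDist(mu=275, sigma=43)

right_of_250 = 1 - X.cdf(250)               # slides: .7190 (using z = -.58)
between_225_375 = X.cdf(375) - X.cdf(225)   # slides: .8671 (using z = -1.16 and 2.33)
k = X.inv_cdf(0.9846)                       # slides: 367.88 (using z = 2.16)

print(f"{right_of_250:.4f}  {between_225_375:.4f}  {k:.2f}")
```

Expect small differences from the slides (roughly .7195, .8675 and 367.86), because the slides round z to two decimals before using the table.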
Example
Regulate blue dye for mixing paint; the machine can be set to discharge an average of $\mu$ ml per can of paint.
Amount discharged: $N(\mu, .4)$, in ml. If more than 6 ml is discharged into a paint can, the shade of blue is unacceptable.
Determine the setting $\mu$ so that only 1% of the cans of paint will be unacceptable.
Solution
$X$ = amount of dye discharged into a can
$X \sim N(\mu, .4)$; determine $\mu$ so that the area to the right of 6 is .01
Solution (cont.)
.01 = area to the right of 6 = area to the right of $z = \dfrac{6 - \mu}{.4}$ under the standard normal curve
$\dfrac{6 - \mu}{.4} = 2.33$ (from standard normal table)
$\mu = 6 - 2.33(.4) = 5.068$
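The same calculation in Python, using the exact 99th percentile of $Z$ instead of the table's rounded 2.33 (a sketch with the standard library's `NormalDist`):

```python
from statistics import NormalDist

sigma = 0.4   # SD of the amount discharged (ml)
limit = 6.0   # more than 6 ml makes the shade unacceptable
tail = 0.01   # allowed fraction of unacceptable cans

# z with area .01 to its right (.99 to its left); Table Z gives about 2.33
z = NormalDist().inv_cdf(1 - tail)
mu = limit - z * sigma             # from (6 - mu) / .4 = z
print(round(z, 4), round(mu, 3))   # 2.3263 5.069
```

This gives a setting of about 5.069 ml, close to the 5.068 ml found with the rounded table value.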
Normal Distributions
A random variable $X$ with mean $\mu$ and standard deviation $\sigma$ is normally distributed if its probability density function is given by
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$
where $\pi = 3.14159\ldots$ and $e = 2.71828\ldots$
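The Shape of Normal Distributions
Normal distributions are bell shaped, and symmetrical around $\mu$.
Why symmetrical? Let $\mu = 100$. Suppose $x = 110$:
$f(110) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{110 - 100}{\sigma}\right)^2} = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{10}{\sigma}\right)^2}$
Now suppose $x = 90$:
$f(90) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{90 - 100}{\sigma}\right)^2} = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{-10}{\sigma}\right)^2}$
Since $(-10/\sigma)^2 = (10/\sigma)^2$, $f(110) = f(90)$.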
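The symmetry argument can be checked numerically by coding the density formula directly. A minimal sketch (`normal_pdf` is a hypothetical name; the values of $\sigma$ tried are arbitrary illustrations):

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-(1/2)((x - mu)/sigma)^2) / (sigma * sqrt(2*pi))."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu = 100
for sigma in (5, 10, 15):   # arbitrary illustrative values of sigma > 0
    # 110 and 90 are both 10 units from mu, so the squared term is identical
    print(sigma, normal_pdf(110, mu, sigma) == normal_pdf(90, mu, sigma))  # True
```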
Are You Normal? Normal Probability Plots
Checking your data to determine if a normal model is appropriate
Are You Normal? Normal Probability Plots
When you actually have your own data, you must check to see whether a Normal model is reasonable.
Looking at a histogram of the data is a good way to check that the underlying distribution is roughly unimodal and symmetric.
Are You Normal? Normal Probability Plots (cont)
A more specialized graphical display that can help you decide whether a Normal model is appropriate is the Normal probability plot.
If the distribution of the data is roughly Normal, the Normal probability plot approximates a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal.
Are You Normal? Normal Probability Plots (cont)
Nearly Normal data have a histogram and a Normal probability plot that look somewhat like this example:
[Figure: histogram and Normal probability plot of nearly Normal data]
Are You Normal? Normal Probability Plots (cont)
A skewed distribution might have a histogram and Normal probability plot like this:
[Figure: histogram and Normal probability plot of a skewed distribution]