1
Midterm Review
2
Econ 240A
Descriptive Statistics
Probability
Inference
Differences between populations
Regression
3
I. Descriptive Statistics
Telling stories with Tables and Graphs that are self-explanatory and esthetically appealing
Exploratory Data Analysis for random variables that are not normally distributed
Stem and Leaf diagrams
Box and Whisker Plots
4
Stem and Leaf Diagram
Example: Problem 2.24
Prices in thousands of $ of houses sold in a Los Angeles suburb in a given year
5
Subsample
Prices: 289, 208, 255, 215, 270, 222, 206, 221, 210, 224, 209, 250, 222, 213, 220, 250, 209
Problem 2.24
Prices in thousands $
Houses sold in a Los Angeles suburb
6
Sorted Data
Prices: 192, 195, 198, 200, 202, 205, 206, 206, 208, 208, 209, 209, 209, 209, 209, 210, 211
Problem 2.24
Prices in thousands $
Houses sold in a Los Angeles suburb
7
Prices
Mean 237.9882
Standard Error 3.314365
Median 230
Mode 222
Standard Deviation 30.55693
Sample Variance 933.7261
Kurtosis 1.620493
Skewness 1.164885
Range 149
Minimum 192
Maximum 341
Sum 20229
Count 85
Summary Statistics
Problem 2.24
Prices in thousands $
Houses sold in a Los Angeles suburb
8
Stem & Leaf Display
Stems  Leaves
19 -> 258
20 -> 025668899999
21 -> 01233457789
22 -> 0012222223346699
23 -> 00336
24 -> 012244467788
25 -> 00002255
26 -> 0569
27 -> 00235689
28 -> 69
29 ->
30 -> 68
31 ->
32 ->
33 ->
34 -> 01
Problem 2.24
Prices in thousands $
Houses sold in a Los Angeles suburb
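The display above can be rebuilt mechanically: the leading digits of each price form the stem and the last digit is the leaf. A minimal sketch, assuming integer prices and using only the 17 values shown on the Sorted Data slide (the function name `stem_and_leaf` is illustrative, not from the course materials):

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (value // 10) to a string of its leaves (value % 10)."""
    rows = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem] += str(leaf)
    # Keep empty stems so gaps in the data stay visible in the display.
    return {s: rows.get(s, "") for s in range(min(rows), max(rows) + 1)}

# The 17 sorted prices shown on the Sorted Data slide (thousands of $).
prices = [192, 195, 198, 200, 202, 205, 206, 206, 208,
          208, 209, 209, 209, 209, 209, 210, 211]

display = stem_and_leaf(prices)
for stem, leaves in display.items():
    print(f"{stem} -> {leaves}")
```

For these 17 values the stems 19 and 20 reproduce the first two rows of the display; stem 21 is incomplete because the slide shows only part of the sorted data.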
9
Box and Whiskers Plots
Example: Problem 4.30
Starting salaries by degree
10
Subsample
BA     BSc    BBA    Other
26819  28930  38968  34550
25797  36602  35187  30245
29115  35098  29452  31520
32877  36793  30943  26680
30015  36171  31610  29047
25090  28396  39738  35037
23163  26204  37444  26550
28225  37280  38403  35704
25103  37660  36459  32262
29742  24539  37963  34206
24587  27222  34138  26917
20780  39536  42062  26723
30353  32653  32700  36297
Problem 4.50
Starting salaries by degree
11
          BA       BSc       BBA     Other
Smallest  18719    23451     23401   21994
Q1        25730    29927     31316   28253.5
Median    27765    33396.5   34284   29950.5
Q3        29835.5  36745.25  39551   32905.25
Largest   37025    40105     47639   38812
IQR       4105.5   6818.25   8235    4651.75
Outliers  37025, 36345, 18719 (BA); none listed for BSc, BBA, Other
[Figure: four box plots (BA, BSc, BBA, Other), each on an axis from 0 to 50000]
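The slides list outliers without stating the rule used to flag them. A common convention, assumed here and not confirmed by the source, marks any point more than 1.5 IQR below Q1 or above Q3. A minimal sketch applying that rule to the BA quartiles (function names are illustrative):

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]

# BA quartiles as reported on the slide.
low, high = outlier_fences(25730, 29835.5)
print(low, high)  # 19571.75 35993.75

# The three listed BA outliers plus the median, which should not be flagged.
print(flag_outliers([18719, 27765, 36345, 37025], 25730, 29835.5))
```

Under this assumed rule the three BA values listed as outliers fall outside the fences, and the smallest and largest values for BSc, BBA and Other fall inside theirs, which agrees with the empty outlier lists.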
12
BA: Smallest = 18719, Q1 = 25730, Median = 27765, Q3 = 29835.5, Largest = 37025, IQR = 4105.5, Outliers: 37025, 36345, 18719
[Figure: box plot for BA, axis from 0 to 50000]
BSc: Smallest = 23451, Q1 = 29927, Median = 33396.5, Q3 = 36745.25, Largest = 40105, IQR = 6818.25, Outliers: none listed
[Figure: box plot for BSc, axis from 0 to 50000]
13
BBA: Smallest = 23401, Q1 = 31316, Median = 34284, Q3 = 39551, Largest = 47639, IQR = 8235, Outliers: none listed
[Figure: box plot for BBA, axis from 0 to 50000]
Other: Smallest = 21994, Q1 = 28253.5, Median = 29950.5, Q3 = 32905.25, Largest = 38812, IQR = 4651.75, Outliers: none listed
[Figure: box plot for Other, axis from 0 to 50000]
14
II. Probability
Concepts
Elementary outcomes
Bernoulli trials
Random experiments
Events
15
Probability (Cont.)
Rules or axioms:
Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Conditional probability: $P(A \mid B) = P(A \cap B)/P(B)$
Independence
16
Probability (Cont.)
Conditional probability: $P(A \mid B) = P(A \cap B)/P(B)$
Independence: $P(A) \cdot P(B) = P(A \cap B)$, so $P(A \mid B) = P(A)$
17
Probability (Cont.)
Discrete Binomial Distribution
$P(k) = C_n(k)\, p^k (1-p)^{n-k}$
n repeated independent Bernoulli trials
k successes and n-k failures
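The binomial formula translates directly into code, with $C_n(k)$ computed by the standard library's `math.comb`. A minimal sketch (the function name `binomial_pmf` is illustrative), evaluated at n = 50 and p = 0.5, the setting of the battleground-state example:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes in n independent Bernoulli trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of winning exactly 25 of 50 states when p = 0.5.
p25 = binomial_pmf(25, 50, 0.5)
print(round(p25, 4))  # 0.1123

# The probabilities over k = 0..n sum to 1 (up to floating-point rounding).
total = sum(binomial_pmf(k, 50, 0.5) for k in range(51))
print(total)
```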
18
Binomial Random Number Generator
Take 50 states
Suppose each state was a battleground state, with probability 0.5 of winning that state
What would the distribution of states look like?
How few could you win? How many could you win?
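One way to explore these questions is to simulate the experiment. The slides do not say which generator or how many draws were used, so the sketch below is an assumption throughout: it uses Python's `random` module, 100 simulated elections, and an arbitrary seed.

```python
import random

def simulate_states_won(n_states=50, p_win=0.5, n_elections=100, seed=240):
    """Simulate n_elections elections; each returns the number of states won.

    Every state is an independent Bernoulli trial with success probability p_win.
    The seed and the number of elections are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_win for _ in range(n_states))
        for _ in range(n_elections)
    ]

draws = simulate_states_won()
print("fewest states won:", min(draws))
print("most states won:  ", max(draws))
print("average states won:", sum(draws) / len(draws))
```

Each draw is a binomial random number with n = 50 and p = 0.5, so the average should sit near 25 and extreme counts should be rare.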
19
Subsample (states won): 24, 24, 28, 25, 18, 29, 25, 24, 24, 23, 25, 24, 29, 32, 28, 30, 23, 27, 21
20
[Figure: Histogram of States Won; horizontal axis: Bin; vertical axis: Frequency, 0 to 10]
21
[Figure: Discrete Probability Density, p=0.5; horizontal axis: States Won, 15 to 40; vertical axis: Probability, 0 to 0.12]
22
[Figure: Discrete Cumulative Distribution, p=0.5; horizontal axis: States Won, 0 to 40; vertical axis: Probability, 0 to 1.2]
23
[Figure: Discrete Cumulative Distribution for p=0.5 and p=0.48; horizontal axis: States Won, 0 to 40; vertical axis: Probability, 0 to 1.2]
24
Probability (Cont.)
Continuous normal distribution as an approximation to the binomial
$np > 5$, $n(1-p) > 5$
$f(z) = (1/\sqrt{2\pi}) \exp[-\tfrac{1}{2} z^2]$
$z = (x - \mu)/\sigma$
$f(x) = (1/\sigma)(1/\sqrt{2\pi}) \exp[-\tfrac{1}{2}\{(x-\mu)/\sigma\}^2]$
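To see how close the approximation is, the exact binomial probability can be compared with the normal probability using $\mu = np$ and $\sigma = \sqrt{np(1-p)}$. The sketch below does this for n = 50, p = 0.5. The half-unit continuity correction is a standard refinement that the slides do not mention, so treat it as an added assumption.

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    """Exact P(X <= k) for a binomial(n, p) variable."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal(mu, sigma) variable, via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

n, p = 50, 0.5
assert n * p > 5 and n * (1 - p) > 5      # rule of thumb from the slide

mu = n * p                                 # mean of the binomial
sigma = sqrt(n * p * (1 - p))              # standard deviation of the binomial

exact = binom_cdf(20, n, p)                # P(win 20 or fewer states)
approx = normal_cdf(20.5, mu, sigma)       # 20.5 applies the continuity correction
print(exact, approx)
```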
25
III. Inference
Rates and Proportions
Population Means and Sample Means
Population Variances and Sample Variances
Decision Theory
26
Decision Theory
In inference, i.e. hypothesis testing and confidence interval estimation, we can make mistakes because we are making guesses about unknown parameters
The objective is to minimize the expected cost of making errors
$E(C) = \alpha\, C(I) + \beta\, C(II)$, where $\alpha$ and $\beta$ are the probabilities of Type I and Type II errors
27
Sample Proportions from Polls
$\hat{p} = k/n$
where n is the sample size and k is the number of successes
$k \sim B(np,\; np(1-p))$
28
Sample Proportions
$E(\hat{p}) = E(k/n) = (1/n)\,E(k) = (1/n)\,np = p$
$VAR(\hat{p}) = VAR(k/n) = (1/n)^2\,Var(k) = (1/n)^2\, np(1-p) = p(1-p)/n$
$\hat{p} \sim N(p,\; (1/n)\,p(1-p))$
So estimated p-hat is approximately normal for large sample sizes
29
Sample Proportions
Where the sample size is large
30
Problem 9.38
A commercial for a household appliances manufacturer claims that less than 5% of all of its products require a service call in the first year. A consumer protection association wants to check the claim by surveying 400 households that recently purchased one of the company’s appliances.
31
Problem 9.38 (Cont.)
What is the probability that more than 10% require a service call in the first year?
What would you say about the commercial’s honesty if in a random sample of 400 households, 10% report at least one service call?
32
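Problem 9.38 Answer
Null Hypothesis: H0: p = 0.05
Alternative Hypothesis: HA: p > 0.05
Statistic (standard error evaluated at the null value p = 0.05):
$z = (\hat{p} - E[\hat{p}])/\sigma_{\hat{p}} = (\hat{p} - p)/\sqrt{(1/n)\,p(1-p)}$
$z = (0.10 - 0.05)/\sqrt{(1/400)(0.05)(0.95)}$
$z = 4.59$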
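The arithmetic for this statistic can be checked in a few lines. This is a minimal sketch with illustrative function names; the upper-tail probability comes from the standard normal distribution via `math.erf`.

```python
from math import erf, sqrt

def proportion_z(p_hat, p0, n):
    """z statistic for a sample proportion, with the standard error computed under H0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def upper_tail(z):
    """P(Z > z) for a standard normal variate."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

z = proportion_z(p_hat=0.10, p0=0.05, n=400)
print(round(z, 2))      # 4.59
print(upper_tail(z))    # far below 0.05, so H0: p = 0.05 is rejected at the 5% level
```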
33
[Figure: Continuous Density of the Standardized Normal Variate, Z; horizontal axis: Z, -4 to 4; vertical axis: density, 0.0 to 0.5; marked: Z critical = 1.645 (upper 5% tail) and z = 4.59]
34
Sample means and population means where the population variance is known
35
Problem 9.26, Sample Means
The dean of a business school claims that the average MBA graduate is offered a starting salary of $55,000. The standard deviation of the offers is $4600. What is the probability that in a sample of 38 MBA graduates, the mean starting salary is less than $53,000?
36
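Problem 9.26 (Cont.)
Null Hypothesis: H0: $\mu = 55000$
Alternative Hypothesis: HA: $\mu < 55000$
Statistic:
$z = (\bar{x} - E[\bar{x}])/\sigma_{\bar{x}} = (\bar{x} - \mu)/(\sigma/\sqrt{n})$
$z = (53000 - 55000)/(4600/\sqrt{38})$
$z = -2000/746.2 = -2.68$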
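The question on the previous slide asks for a probability, which is the lower-tail area of the standard normal distribution at this z value. A minimal sketch with illustrative function names:

```python
from math import erf, sqrt

def sample_mean_z(x_bar, mu, sigma, n):
    """z statistic for a sample mean when the population standard deviation is known."""
    return (x_bar - mu) / (sigma / sqrt(n))

def lower_tail(z):
    """P(Z < z) for a standard normal variate."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

z = sample_mean_z(x_bar=53000, mu=55000, sigma=4600, n=38)
prob = lower_tail(z)
print(round(z, 2))   # -2.68
print(prob)          # probability that the sample mean falls below $53,000
```

Because z lies below the 1% critical value of -2.33, the probability is smaller than 0.01.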
37
[Figure: Continuous Density of the Standardized Normal Variate, Z; horizontal axis: Z, -4 to 4; vertical axis: density, 0.0 to 0.5; marked: Zcrit(1%) = -2.33]
38
Sample means and population means when the population variance is unknown
39
Problems 12.33
A federal agency responsible for enforcing laws governing weights and measures routinely inspects packages to determine whether the weight of the contents is at least as great as that advertised on the package. A random sample of 18 containers whose packaging states that the contents weighs 8 ounces was drawn.
40
Problems 12.33 (Cont.)
Can we conclude that on average the containers are mislabeled? Use
$t = (\bar{x} - E[\bar{x}])/s_{\bar{x}} = (\bar{x} - \mu)/(s/\sqrt{n})$
41
[Figure: Density Function for Student's t-distribution, 17 Degrees of Freedom; horizontal axis: -2 to 2; vertical axis: density, 0.0 to 0.4; marked: t crit 5%]
42
Problems 12.33 (Cont.)
7.8 7.97 7.92
7.91 7.95 7.87
7.93 7.79 7.92
7.99 8.06 7.98
7.94 7.82 8.05
7.75 7.89 7.91
43
Mean 7.913888889
Standard Error 0.019969567
Median 7.92
Mode 7.91
Standard Deviation 0.084723695
Sample Variance 0.007178105
Kurtosis -0.24366084
Skewness -0.22739254
Range 0.31
Minimum 7.75
Maximum 8.06
Sum 142.45
Count 18
44
Problems 12.33 (Cont.)
Can we conclude that on average the containers are mislabeled? Use
$t = (\bar{x} - E[\bar{x}])/s_{\bar{x}} = (\bar{x} - \mu)/(s/\sqrt{n})$
$t = (7.914 - 8)/(0.0847/\sqrt{18}) = -0.086/0.020$
$t = -4.3$
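The same statistic can be computed directly from the 18 observations with Python's `statistics` module, where `stdev` is the sample standard deviation with n-1 in the denominator. A minimal sketch (the function name `one_sample_t` is illustrative):

```python
from math import sqrt
from statistics import mean, stdev

# Contents weights in ounces, as listed on the data slide for Problem 12.33.
weights = [7.80, 7.97, 7.92, 7.91, 7.95, 7.87, 7.93, 7.79, 7.92,
           7.99, 8.06, 7.98, 7.94, 7.82, 8.05, 7.75, 7.89, 7.91]

def one_sample_t(sample, mu0):
    """t statistic for H0: population mean equals mu0 (population variance unknown)."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error

t = one_sample_t(weights, 8.0)
print(round(mean(weights), 4), round(stdev(weights), 4))
print(round(t, 2))   # about -4.31, with 17 degrees of freedom
```

The one-tailed 5% critical value for 17 degrees of freedom is about -1.74 (a standard table value, not taken from the slides), so the statistic lies well inside the rejection region.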
45
Confidence Intervals for Variances
46
Problems 12.33 & 12.55
A federal agency responsible for enforcing laws governing weights and measures routinely inspects packages to determine whether the weight of the contents is at least as great as that advertised on the package. A random sample of 18 containers whose packaging states that the contents weighs 8 ounces was drawn.
47
Problems 12.33 & 12.55 (Cont.)
Estimate with 95% confidence the variance in contents’ weight.
A $\chi^2$ variable with n-1 degrees of freedom is $(n-1)s^2/\sigma^2$
48
[Figure: Chi Square Density for 17 Degrees of Freedom; horizontal axis: 5 to 25; vertical axis: density, 0.00 to 0.08]
49
Problems 12.33 &12.55(Cont.)
7.8 7.97 7.92
7.91 7.95 7.87
7.93 7.79 7.92
7.99 8.06 7.98
7.94 7.82 8.05
7.75 7.89 7.91
50
Mean 7.913888889
Standard Error 0.019969567
Median 7.92
Mode 7.91
Standard Deviation 0.084723695
Sample Variance 0.007178105
Kurtosis -0.24366084
Skewness -0.22739254
Range 0.31
Minimum 7.75
Maximum 8.06
Sum 142.45
Count 18
51
Problems 12.33 & 12.55 (Cont.)
$7.564 < (n-1)s^2/\sigma^2 < 30.191$
$7.564 < 17 \times 0.00718/\sigma^2 < 30.191$
$(1/7.564) \times 17 \times 0.00718 > \sigma^2 > (1/30.191) \times 17 \times 0.00718$
$0.0161 > \sigma^2 > 0.0040$
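The interval follows from inverting the chi-square inequality: dividing $(n-1)s^2$ by the upper quantile gives the lower bound, and dividing by the lower quantile gives the upper bound. A minimal sketch that takes the two quantiles (7.564 and 30.191 for 17 degrees of freedom) from the slide instead of computing them; the function name is illustrative.

```python
def variance_confidence_interval(s2, n, chi2_lower, chi2_upper):
    """Confidence interval for a population variance.

    s2 is the sample variance; chi2_lower and chi2_upper are the chi-square
    quantiles with n-1 degrees of freedom that bound the chosen confidence level.
    """
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower

low, high = variance_confidence_interval(
    s2=0.007178105, n=18, chi2_lower=7.564, chi2_upper=30.191
)
print(round(low, 4), round(high, 4))  # 0.004 0.0161
```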
52
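IV. Differences in Populations
Null Hypothesis: H0: $\mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$
Alternative Hypothesis: HA: $\mu_1 - \mu_2 \neq 0$
$t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/s_{\bar{x}_1 - \bar{x}_2}$
$Var[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]^2$
$Var[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \mu_1) - (\bar{x}_2 - \mu_2)]^2$
$Var[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \mu_1)^2 + (\bar{x}_2 - \mu_2)^2 - 2(\bar{x}_1 - \mu_1)(\bar{x}_2 - \mu_2)]$
$Var[\bar{x}_1 - \bar{x}_2] = Var[\bar{x}_1] + Var[\bar{x}_2] - 2\,Cov[\bar{x}_1, \bar{x}_2]$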
53
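IV. Differences in Populations
$t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/s_{\bar{x}_1 - \bar{x}_2}$
$Var[\bar{x}_1 - \bar{x}_2] = Var[\bar{x}_1] + Var[\bar{x}_2] - 2\,Cov[\bar{x}_1, \bar{x}_2]$
For independent samples the covariance term is zero, so
$t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/\sqrt{Var[\bar{x}_1] + Var[\bar{x}_2]}$
$t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}$
Reference Ch. 9 & Ch. 13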
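The last form of the statistic needs only the two sample means, sample variances and sample sizes. The slides give no data for this section, so the numbers in the sketch below are hypothetical and chosen only to make the arithmetic easy to follow; the function name is also illustrative.

```python
from math import sqrt

def two_sample_t(x1_bar, x2_bar, s1_sq, s2_sq, n1, n2, diff0=0.0):
    """t statistic for the difference between two independent sample means.

    diff0 is the hypothesized value of mu1 - mu2 (zero under H0: mu1 = mu2).
    """
    standard_error = sqrt(s1_sq / n1 + s2_sq / n2)
    return ((x1_bar - x2_bar) - diff0) / standard_error

# Hypothetical summary statistics, not taken from the course materials.
t = two_sample_t(x1_bar=10.0, x2_bar=8.0, s1_sq=4.0, s2_sq=9.0, n1=16, n2=36)
print(round(t, 3))  # 2.828
```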
Reference Ch. 9 & Ch. 13
54
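V. Regression
Model: $y_i = a + b\,x_i + e_i$
Fitted: $\hat{y}_i = \hat{a} + \hat{b}\,x_i$
Estimate: $\hat{b} = \sum_{i=1}^{n} [y_i - \bar{y}][x_i - \bar{x}] \,/\, \sum_{i=1}^{n} [x_i - \bar{x}]^2$
Estimate: $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$
Estimated error: $\hat{e}_i = (y_i - \hat{y}_i)$
Sum of Squared Residuals: $\sum_{i=1}^{n} \hat{e}_i^2$
ANOVA: Total Sum of Squares (TSS) $= \sum_{i=1}^{n} [y_i - \bar{y}]^2$
TSS = Explained Sum (ESS) + Unexplained Sum (USS)
$TSS = \hat{b}^2 \sum_{i=1}^{n} [x_i - \bar{x}]^2 + \sum_{i=1}^{n} \hat{e}_i^2$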
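These estimators and the sum-of-squares decomposition can be coded directly from the formulas. A minimal sketch in plain Python; the function name `ols_fit` and the five data points are hypothetical and serve only to show that TSS equals ESS plus USS.

```python
def ols_fit(x, y):
    """Bivariate least squares: return (a_hat, b_hat, tss, ess, uss)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx                      # slope estimate
    a_hat = y_bar - b_hat * x_bar          # intercept estimate
    residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
    tss = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    ess = b_hat ** 2 * sxx                     # explained sum of squares
    uss = sum(e ** 2 for e in residuals)       # unexplained sum of squares
    return a_hat, b_hat, tss, ess, uss

# Hypothetical data, not from the course materials.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 5.2, 6.8, 9.1, 11.0]
a_hat, b_hat, tss, ess, uss = ols_fit(x, y)
print(a_hat, b_hat)
print(tss, ess + uss)        # the two totals agree up to rounding
print(ess / tss)             # R squared
```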
55
[Figure (Lab Five): Fortune 500, 1999: Assets Vs. Revenue, In Logs; horizontal axis: Log Revenue, 10000 to 1000000; vertical axis: Log Assets, 1000 to 1000000; labeled firms: General Motors, Exxon Mobil, Wal-Mart, Kroger, Ingram Micro, Costco Wholesale, McKesson HBOC, General Electric, Citigroup, Bank of America, Fannie Mae, Chase Manhattan, Morgan Stanley, Merrill Lynch, Prudential, Bank One, American International, TIAA-CREF, State Farm, Allstate]
56
rank | firm | industry | revenue M$ | profits M$ | assets M$
5 | General Electric | Diversified Financials | 111630 | 10717 | 405200
7 | Citigroup | Diversified Financials | 82005 | 9867 | 716900
11 | Bank of America Corp. | Commercial Banks | 51392 | 7882 | 632574
26 | Fannie Mae | Diversified Financials | 36968.6 | 3911.9 | 575167.4
31 | Chase Manhattan Corp. | Commercial Banks | 33710 | 5446 | 406105
48 | Prudential Ins. Co. of America | Insurance: Life, Health (stock) | 26618 | 813 | 285094
50 | Bank One Corp. | Commercial Banks | 25986 | 3479 | 269425
30 | Morgan Stanley Dean Witter | Securities | 33928 | 4791 | 366967
29 | Merrill Lynch | Securities | 34879 | 2618 | 328071
19 | TIAA-CREF | Insurance: Life, Health (mutual) | 39410.2 | 1024.07 | 289247.99
17 | American International Group | Insurance: P&C (stock) | 40656.08 | 5055.44 | 268238
The Financials
57
[Figure (Excel Chart): The Financials: Eleven Firms; horizontal axis: ln Revenue M$, 10 to 11.8; vertical axis: ln Assets M$, 12.4 to 13.6; fitted line y = 0.4335x + 8.2535, R2 = 0.3039]
58
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.5512779
R Square 0.3039073
Adjusted R Square 0.2265636
Standard Error 0.3117374
Observations 11

ANOVA
           | df | SS          | MS       | F        | Significance F
Regression | 1  | 0.381851405 | 0.381851 | 3.929312 | 0.078773838
Residual   | 9  | 0.874622016 | 0.09718  |          |
Total      | 10 | 1.256473421 |          |          |

             | Coefficients | Standard Error | t Stat   | P-value  | Lower 95%    | Upper 95%
Intercept    | 8.2534951    | 2.33138973     | 3.540161 | 0.006313 | 2.979521108  | 13.52747
X Variable 1 | 0.4335105    | 0.218696259    | 1.982249 | 0.078774 | -0.061215204 | 0.928236

Excel Regression
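The entries in this output are linked by the ANOVA identities from the regression slide, so they can be cross-checked from the printed sums of squares alone. A minimal sketch using only numbers copied from the table; the variable names are illustrative.

```python
from math import sqrt

# Values copied from the Excel output above.
ss_regression, ss_residual, ss_total = 0.381851405, 0.874622016, 1.256473421
df_regression, df_residual = 1, 9
slope, slope_se = 0.4335105, 0.218696259
n_obs = 11

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
standard_error = sqrt(ss_residual / df_residual)   # standard error of the regression
t_slope = slope / slope_se

print(r_squared, adjusted_r_squared)
print(f_stat, t_slope ** 2)    # with one regressor, F equals the squared slope t statistic
print(standard_error)
```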
59
[Figure (Eviews Chart): Eleven Financial Firms; horizontal axis: LNSALES, 10.0 to 12.0; vertical axis: LNASSETS, 12.4 to 13.6]
60
Eviews Regression
61
[Figure (Eviews): Actual, Fitted & Residual for observations 1 to 11; residual scale -0.4 to 0.6; actual and fitted scale 12.4 to 13.6]