WEMBA, Regression Analysis
Market Intelligence
Julie Edell Britton
Session 9, October 9, 2009
Today’s Agenda
• Announcements
• WEMBA C
• Multiple Regression
  – Conjoint & dummy variable regression
  – Multiple R²Y|X1,X2 vs. r²Y|Xi
  – Uncorrelated predictors
  – Correlated predictors
  – Promotion analysis
• Course Evaluations
Announcements
• Submit Nestle Contadina slides by 8 am tomorrow, Sat., 10/10
4
A Model of School Choice
[Figure: Values, Perceptions, and Individual Differences & Constraints lead to "Become a Fuqua Student"]
Assumes that behavior is driven by differences in:
• Values (importance of key attributes)
• Perceptions (Duke and competition on key attributes)
• Individual differences & constraints (travel, cost, etc.)
5
The Funnel
[Figure: the applicant funnel]
• Attend Information Session / Do not attend Information Session
• Apply / Opt Out
• Admitted / Selected Out
• Matriculate / Opt Out
6
The Analysis Approach
Sample groups that differ in behavior
Compare the groups on relevant dimensions:
• Perceptions
• Values
• Individual differences & constraints
Infer that any differences found between the groups are partly responsible for the differences in behavior
7
Disproportionate Stratified Random Sample
[Figure: the funnel (Attend / Do not attend Information Session; Apply / Not Apply; Admitted / Selected Out; Matriculate / Opt Out), annotated with the sample drawn at each stage]
Sample sizes shown: n = 26 (of 158, 16.5%); n = 24 (of 173, 14%); n = 30 (of 60, 50%); n = 26 (of 60, 52%); n = 42 (of 56, 75%); n = 12 (of 56, 25%); n = 56
8
What Drives Application to Duke?
• Compare Applied to Did Not Apply on:
  – Perceptions (Duke – Competitor)
  – Constraints: financial assistance program, % cost, time to travel to Duke
• Can’t use attendance at an Information Session as a factor here without going to population data
• What did you learn?
9
What Drives Application to Duke?
Factor | Applied | Did Not Apply | t-stat | p-value (2-tail)
Perceptions (Duke – Competitor): | | | |
Faculty Reputation | .500 | .200 | -2.102 | .039
Cost | -.667 | -.273 | 1.777 | .080
Mix of Face to Face and Distance Learning | .09 | .36 | 1.737 | .087
Demographics: | | | |
% Cost You Pay | 52.16 | 72.60 | 2.771 | .007
How Long to Travel Duke | 2.34 | 3.05 | 1.911 | .059
Company Helps Pay | 85.7% | 72.0% | 3.038* | .081
* = chi-square test statistic
10
What Drives Acceptance?
• Compare accepted to did not accept, conditional on having applied, on:
  – Perceptions (Duke – Competitor)
  – Constraints: financial assistance program, % cost paid by employer, time to travel to Duke
• Can consider info session attendance here
• What did you learn?
11
What Drives Acceptance?
Factor | Accepted | Did Not Accept | t-stat | p-value (2-tail)
Perceptions (Duke – Competitor): | | | |
Faculty Reputation | .679 | .083 | 2.947 | .005
Teaching Quality | .462 | -.231 | 3.469 | .001
Core Quality | .464 | -.077 | 3.203 | .003
Electives | .421 | -.923 | 3.599 | .001
Student Quality | .759 | .167 | 2.614 | .013
Technology | .500 | .077 | 1.714 | .096
Advance Career | .667 | -.154 | 4.176 | .000
Cost | -.448 | -1.154 | 2.202 | .033
Student Reputation | .900 | .385 | 2.130 | .039
Demographics: | | | |
% Cost You Pay | 46.46 | 68.86 | 1.845 | .070
How Long to Travel Duke | 2.097 | 3.064 | 1.978 | .053
Attended an Information Session | 59.5% | 39.7% | 2.406* | .108
* = chi-square test statistic
12
Acceptance by Sponsorship
% paid by company:
Student = 53.5
NonStudent = 31.1
13
The Impact of Info Sessions?
• Compare those who attended to those who did not attend on perceptions of Duke
• What did you learn?
14
The Impact of Info Sessions?
Factor | Info Session | No Info Session | t-stat | p-value (2-tail)
Perceptions (Duke): | | | |
Ability to Shift Work | 3.07 | 3.35 | 1.887 | .063
Cost | 2.69 | 2.39 | 1.962 | .053
More than half the perceptual factors (9 of 17) were in the wrong direction!
Information sessions don’t seem to be doing much good at all.
• Only positive evidence: in the overall population, the probability of applying was 16.5% for those who attended an info session vs. 9.5% for those who did not, χ² = 8.70, p < .005.
• No significant effect on matriculation / acceptance: 59.5% of those who attended accepted vs. 39.7% of those who did not attend, χ² = 2.41, p = .11.
15
Who to Target?
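Applied vs. did not apply:
Factor | Applied | Did Not Apply | t-stat | p-value (2-tail)
% Cost You Pay | 52.16 | 72.60 | 2.771 | .007
How Long to Travel Duke | 2.34 | 3.05 | 1.911 | .059
Company Helps Pay | 85.7% | 72.0% | 3.038* | .081
Accepted vs. did not accept:
Factor | Accepted | Did Not Accept | t-stat | p-value (2-tail)
% Cost You Pay | 46.46 | 68.86 | 1.845 | .070
How Long to Travel Duke | 2.097 | 3.064 | 1.978 | .053
* = chi-square test statistic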
16
What Should be Emphasized in Information Sessions?
• Attributes that are important and where we do well and/or where our competition does not do well.
• Importance
  – Focus on attributes that predict applying & accepting
  – Rank order attribute importance
• Look at important attributes where perceptions (Duke – Comp) are positive.
17
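Quasi-MAAM for Communication Content: Demonstrated Importance
[Figure: 2×2 grid of Importance (Important / Unimportant) by Duke position (Duke Advantage / Duke Disadvantage)]
• Important, Duke Advantage: Brag
• Important, Duke Disadvantage: Misperception (fix Marcom) or Real Problem (fix Product)?
• Unimportant, Duke Advantage: Save Your Breath
• Unimportant, Duke Disadvantage: Ignore
Important attributes shown on the grid: Faculty Rep, Cost; Advance Career, Teaching Quality, Electives, Core Quality, Faculty Rep, Student Quality, School Rep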
18
Quasi-MAAM for Communication Content: Using Importance Ratings
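[Figure: 2×2 grid of Importance (Important / Unimportant) by Duke position (Duke Advantage / Duke Disadvantage)]
• Important, Duke Advantage: Brag
• Important, Duke Disadvantage: Misperception (fix Marcom) or Real Problem (fix Product)?
• Unimportant, Duke Advantage: Save Your Breath
• Unimportant, Duke Disadvantage: Ignore
Important attributes shown on the grid: Continue Career, School Rep, Forwarding Career, Teaching Quality, Faculty Reputation, Other Students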
19
WEMBA Takeaways
• Be backward in your analysis process too
• Outline your analysis before you begin
• Think about tables needed, then do the analysis to make them
• Real data is imperfect…do the best you can
• Survey data is correlational, not causal
• The funnel approach applies to many business problems
20
Multiple Regression
Simple linear regression, with more than one predictor.
$\hat{y}_i = a + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki}$
a = intercept: predicted value of y if x1 = x2 = …xk = 0
b1 = Slope of y on x1 given that x2…xk are already in equation
R = multiple correlation = correlation of Y-hat with Y (0 ≤ R ≤ 1)
R² = % variance in Y explained by the best-fitting (least-squares) linear regression equation.
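A minimal sketch of the equation above, assuming NumPy is available: ordinary least squares with an intercept, then R computed as the correlation of Y-hat with Y and R² as the share of variance in Y explained. The function name `fit_ols` is illustrative, and the data are the conjoint ratings used on the following slides.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk.

    X: (n, k) predictors; y: (n,) outcome. Returns (a, b, R, R2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    R = np.corrcoef(y_hat, y)[0, 1]  # multiple correlation = corr(Y-hat, Y)
    R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef[0], coef[1:], R, R2

# Conjoint ratings; columns are the dummies Size16oz, Pepsi, Caffeine
X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
     [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [7, 9, 3, 4, 8, 10, 5, 6]

a, b, R, R2 = fit_ols(X, y)
print(round(a, 3), np.round(b, 3), round(R, 3), round(R2, 3))
```

With an intercept in the model, R² computed from residuals equals the squared correlation of Y-hat with Y, so the two definitions on the slide agree.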
21
Conjoint as Dummy Variable Regression
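Ratings by profile (2 × 2 × 2 design: a = size, b = brand, c = caffeine):
Caffeine | a1 = 12 oz can, b1 = Coke | a1 = 12 oz can, b2 = Pepsi | a2 = 16 oz disposable, b1 = Coke | a2 = 16 oz disposable, b2 = Pepsi
c1 = decaff | 7 | 3 | 8 | 5
c2 = caff | 9 | 4 | 10 | 6

Dummy-coded data:
(Y) Rating | (X1) Size16oz | (X2) Pepsi | (X3) Caffeine
7 | 0 | 0 | 0
9 | 0 | 0 | 1
3 | 0 | 1 | 0
4 | 0 | 1 | 1
8 | 1 | 0 | 0
10 | 1 | 0 | 1
5 | 1 | 1 | 0
6 | 1 | 1 | 1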
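Because this design is balanced and fully crossed, each dummy's regression weight (part-worth) can also be read directly as a difference in mean ratings. A minimal NumPy sketch; the variable names are illustrative. In unbalanced designs, where the dummies become correlated, this shortcut need not match the regression, and the regression should be used instead.

```python
import numpy as np

# Coded conjoint data: columns Size16oz, Pepsi, Caffeine (1 = yes, 0 = no)
X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
              [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([7, 9, 3, 4, 8, 10, 5, 6], dtype=float)

# Part-worth = mean rating when the dummy is 1 minus mean rating when it is 0
names = ["Size16oz", "Pepsi", "Caffeine"]
part_worths = {
    name: float(y[X[:, j] == 1].mean() - y[X[:, j] == 0].mean())
    for j, name in enumerate(names)
}
print(part_worths)
```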
22
Bivariate Correlation Matrix
Pearson correlation (Sig. 2-tailed in parentheses); N = 8 for every pair
Variable | Rating | Size16oz | Pepsi | Caffeine
Rating | 1 | .327 (.429) | -.873** (.005) | .327 (.429)
Size16oz | .327 (.429) | 1 | .000 (1.000) | .000 (1.000)
Pepsi | -.873** (.005) | .000 (1.000) | 1 | .000 (1.000)
Caffeine | .327 (.429) | .000 (1.000) | .000 (1.000) | 1
** Correlation is significant at the 0.01 level (2-tailed).
23
$R^2_{Y|X_1,X_2,X_3} = r^2_{Y|X_1} + r^2_{Y|X_2} + r^2_{Y|X_3}$ because the dummies are uncorrelated:
.976 = (.327)² + (-.873)² + (.327)²
Model Summary (Model 1)
R | R Square | Adjusted R Square | Std. Error of the Estimate
.988 | .976 | .958 | .50000
Predictors: (Constant), Caffeine, Pepsi, Size16oz.
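A quick NumPy check of the additivity claim, using the coded ratings above: the correlations among the dummies are all zero, so the squared zero-order correlations sum to the full-model R². This is an illustrative sketch, not the SPSS run that produced the output.

```python
import numpy as np

X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
              [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([7, 9, 3, 4, 8, 10, 5, 6], dtype=float)

# Zero-order correlations of Rating with Size16oz, Pepsi, Caffeine
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(3)])

# Correlations among the dummies (off-diagonal entries only)
pred_corr = np.corrcoef(X, rowvar=False)
off_diag = pred_corr[~np.eye(3, dtype=bool)]

# R^2 of the full least-squares model
design = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef
R2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

print(np.round(r, 3), round(float(np.sum(r ** 2)), 3), round(float(R2), 3))
```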
24
Dummy Variable Regression
Coefficients (Model 1; dependent variable: Rating)
Predictor | B | Std. Error | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound
(Constant) | 7.000 | .354 | 19.799 | .000 | 6.018 | 7.982
Size16oz | 1.500 | .354 | 4.243 | .013 | .518 | 2.482
Pepsi | -4.000 | .354 | -11.314 | .000 | -4.982 | -3.018
Caffeine | 1.500 | .354 | 4.243 | .013 | .518 | 2.482
B = unstandardized coefficient.
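The standard errors, t statistics, and confidence limits in this table follow from the usual OLS formulas. A minimal NumPy sketch using the coded data above; the critical value 2.776445 is the two-tailed 5% t value for 4 residual degrees of freedom, hard-coded here as an assumption rather than looked up from a statistics library.

```python
import numpy as np

X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
              [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([7, 9, 3, 4, 8, 10, 5, 6], dtype=float)

design = np.column_stack([np.ones(len(y)), X])  # constant, Size16oz, Pepsi, Caffeine
n, p = design.shape                              # 8 cases, 4 parameters

coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef
mse = resid @ resid / (n - p)                    # residual variance on n - p = 4 df
se = np.sqrt(np.diag(mse * np.linalg.inv(design.T @ design)))
t = coef / se

t_crit = 2.776445                                # t(0.975, df = 4), assumed from a t table
lower, upper = coef - t_crit * se, coef + t_crit * se

for name, row in zip(["(Constant)", "Size16oz", "Pepsi", "Caffeine"],
                     zip(coef, se, t, lower, upper)):
    print(name, np.round(row, 3))
```

The square root of `mse` is the "Std. Error of the Estimate" (.500) in the Model Summary.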
25
A Framework for Understanding Multicollinearity
Major problem in multiple regression: assessing the (unique) contribution of the individual predictors.
[Figure: Venn diagram of the variance in y, in x1, and in x2, with overlap areas a, b, c and remainder e]
$R^2_{y \cdot x_1 x_2} = \dfrac{a + b + c}{a + b + c + e}$
where
a = variance in y uniquely shared with x1
b = variance in y uniquely shared with x2
c = variance in y shared with both x1 and x2
e = "error" variance in y shared by neither x1 nor x2
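Given the two squared zero-order correlations and the model R², the areas in the diagram can be backed out: a = R² − r²(y, x2), b = R² − r²(y, x1), c = r²(y, x1) + r²(y, x2) − R², and e = 1 − R², all as shares of the variance in y. A minimal sketch with a hypothetical helper `venn_areas`, fed the Milan food Model 1 figures (r = .43 and .40, R² = .205) reported later in the deck. One caveat: c can come out negative (a suppressor situation), at which point the Venn picture stops being literal.

```python
def venn_areas(r2_y1: float, r2_y2: float, R2: float) -> tuple[float, float, float, float]:
    """Split var(y) into the a, b, c, e areas, as proportions.

    r2_y1, r2_y2: squared zero-order correlations of y with x1 and with x2
    R2: squared multiple correlation of y on x1 and x2 together
    """
    a = R2 - r2_y2          # unique to x1: what x1 adds once x2 is in
    b = R2 - r2_y1          # unique to x2: what x2 adds once x1 is in
    c = r2_y1 + r2_y2 - R2  # overlap claimed by both (can be negative)
    e = 1 - R2              # variance in y explained by neither
    return a, b, c, e

# Milan food Model 1: y = FoodExp, x1 = HHSize, x2 = Kid6-18
a, b, c, e = venn_areas(0.43 ** 2, 0.40 ** 2, 0.205)
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}, e = {e:.4f}")
```

Here the shared area c (about .14) is larger than either unique area, which is exactly the ambiguity the next slide describes.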
26
Multicollinearity
The area c causes ambiguity in specifying the contributions of x1 and x2 to explaining y. Should we attribute all of this variance to x1? All to x2? Somehow split it?
There are two different ways of assessing the contribution of x1 to explaining y.
27
2 Ways to Deal with Overlap (see slide 26)
1. "Zero-order" correlation of x1 with y, $r_{yx_1}$: give x1 credit for both a and c.
$r^2_{yx_1} = \dfrac{a + c}{a + b + c + e}$
2. "(Semi-)partial" correlation of x1 with y, $r_{y(x_1 \cdot x_2)}$: give x1 credit only for its unique contribution, a.
$r^2_{y(x_1 \cdot x_2)} = \dfrac{a}{a + b + c + e}$
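Both quantities can be computed from the three pairwise correlations alone. A minimal sketch with hypothetical helper names, using the standard semipartial formula sr = (r_y1 − r_y2 · r_12) / √(1 − r_12²) and the Milan food correlations for HHSize (x1) and Kid6-18 (x2). With x1 and x2 correlated .70, x1's unique share a is far smaller than its zero-order share a + c.

```python
import math

def zero_order_r2(r_y1: float) -> float:
    """Share of var(y) credited to x1 when it gets both areas a and c."""
    return r_y1 ** 2

def semipartial_r2(r_y1: float, r_y2: float, r_12: float) -> float:
    """Share of var(y) credited to x1 for its unique area a only."""
    sr = (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)
    return sr ** 2

# Milan food: y = FoodExp, x1 = HHSize, x2 = Kid6-18
r_y1, r_y2, r_12 = 0.43, 0.40, 0.70
print(round(zero_order_r2(r_y1), 3), round(semipartial_r2(r_y1, r_y2, r_12), 3))
```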
28
Understand Expenditures in Milan Food
Run two regressions:
1. $\text{FoodExp} = a + b_1(\text{HHSize}) + b_2(\text{Kid6To18})$
2. $\text{FoodExp} = a' + b_1'(\text{HHSize}) + b_2'(\text{IncomeK})$
where FoodExp = weekly food expenditures, HHSize = total in household, Kid6To18 = any kids 6-18? (0 = no, 1 = yes), and IncomeK = annual income.
29
But Check Correlations Between Predictors
Variable | FoodExp$ | HHSize | Kid6-18? | IncomeK
FoodExp$ | 1 | | |
HHSize | 0.43 | 1 | |
Kid6-18? | 0.40 | 0.70 | 1 |
IncomeK | 0.37 | 0.16 | 0.19 | 1
30
Zero-order coefficients tell us:
• r_bc = +.43: more people in household, more weekly food expenditures
• r_bi = +.40: if any kids 6-18, more weekly food expenditures
• r_bj = +.37: higher income, more spent on food
All strongly statistically significant with 498 df.
Which two should predict weekly expenditures best?
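The significance claim can be checked with the usual test for a correlation, t = r√df / √(1 − r²). A minimal sketch with a hypothetical helper `t_for_r`; all three t values land between roughly 8.9 and 10.6, far beyond the cutoff of about 2 for a two-tailed .05 test at this many degrees of freedom.

```python
import math

def t_for_r(r: float, df: int) -> float:
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

for label, r in [("HHSize", 0.43), ("Kid6-18?", 0.40), ("IncomeK", 0.37)]:
    print(label, round(t_for_r(r, 498), 2))
```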
31
Multicollinearity, r2, and R2
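Variable | FoodExp$ | HHSize | Kid6-18? | IncomeK | Income$
FoodExp$ | 1 | | | |
HHSize | 0.43 | 1 | | |
Kid6-18? | 0.40 | 0.70 | 1 | |
IncomeK | 0.37 | 0.16 | 0.19 | 1 |
Income$ | 0.37 | | | 0.99 | 1

Model | Predictor 1 | Predictor 2 | Sum of r² | Actual R²
1 | HHSize | Kid 6-18 | .346 | .205
2 | HHSize | IncomeK | .327 | .282
3 | IncomeK | Income$ | .274 | .140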
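For two predictors, R² follows directly from the three correlations: R² = (r²_y1 + r²_y2 − 2 · r_y1 · r_y2 · r_12) / (1 − r²_12). When r_12 = 0 this reduces to the sum of the two r² values; the larger the overlap, the further R² falls below that sum. A minimal sketch with a hypothetical helper, using the two-decimal correlations in the matrix, so the results differ slightly from the slide's values, which were presumably computed from unrounded data.

```python
def r2_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 of y on x1 and x2, from the three pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# (r with FoodExp$ for predictor 1, for predictor 2, r between the predictors)
models = {
    "1: HHSize + Kid 6-18": (0.43, 0.40, 0.70),
    "2: HHSize + IncomeK": (0.43, 0.37, 0.16),
    "3: IncomeK + Income$": (0.37, 0.37, 0.99),
}
for name, (r_y1, r_y2, r_12) in models.items():
    sum_r2 = r_y1 ** 2 + r_y2 ** 2
    print(f"Model {name}: sum of r^2 = {sum_r2:.3f}, "
          f"R^2 = {r2_two_predictors(r_y1, r_y2, r_12):.3f}")
```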
32
33
“Partial Effect” Milan Food Problem
Run two regressions:
1. $\text{FoodExp} = a + b_2(\text{KidLT6})$
2. $\text{FoodExp} = a' + b_1'(\text{HHSize}) + b_2'(\text{KidLT6})$
where FoodExp = weekly food expenditures, HHSize = total in household, and KidLT6 = any children under 6? (0 = no, 1 = yes).
34
Coefficients (dependent variable: B = Weekly Food Expenditure Dollars)
Model | Predictor | B | Std. Error | Beta (standardized) | t | Sig. | Tolerance | VIF
1 | (Constant) | 40.680 | 1.154 | | 35.266 | .000 | |
1 | H = Any Children Under 6? | 7.278 | 1.923 | .167 | 3.786 | .000 | 1.000 | 1.000
2 | (Constant) | 25.288 | 1.841 | | 13.734 | .000 | |
2 | H = Any Children Under 6? | -4.875 | 2.119 | -.112 | -2.301 | .022 | .683 | 1.465
2 | C = Household Size | 5.455 | .536 | .496 | 10.179 | .000 | .683 | 1.465
B = unstandardized coefficient; Tolerance and VIF are collinearity statistics.
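The sign flip on the under-6 dummy has an exact least-squares explanation, assuming both models are fit to the same households: the simple slope equals the partial slope plus the household-size slope times δ, where δ is the slope from regressing household size on the under-6 dummy (for a 0/1 dummy, the difference in mean household size between the two groups). A minimal sketch backing δ out of the rounded coefficients reported above.

```python
# Coefficients from the table (dependent variable: weekly food expenditure)
b_kid_alone = 7.278     # Model 1: Any Children Under 6?
b_kid_partial = -4.875  # Model 2: Any Children Under 6?, household size held constant
b_hhsize = 5.455        # Model 2: Household Size

# OLS identity on a common sample: b_alone = b_partial + b_hhsize * delta,
# where delta = slope of household size on the kid dummy
# (= mean household size with a child under 6 minus mean without).
delta = (b_kid_alone - b_kid_partial) / b_hhsize
print(f"implied difference in household size: {delta:.2f}")
```

On this reading, homes with a child under 6 average roughly 2.2 more people, so Model 1 credits the dummy with spending that largely tracks household size.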
35
Doritos
•XL Models. Effects of own price (& price promotion) & price promotions of other sizes (SM, XXL, 3XL) on sales of XL size
36
Correlations (Pearson r, Sig. 2-tailed in parentheses; N = 104 for every pair)
Variable | Price/lb Small | Price/lb XL | Price/lb 2XL | Price/lb 3XL | Lbs Extra Large Size 9 Oz $2.19
Average Price Per Pound Small Size | 1.000 | .018 (.854) | .449** (.000) | .502** (.000) | .085 (.391)
Average Price Per Pound XL Size | .018 (.854) | 1.000 | .091 (.356) | .120 (.224) | -.801** (.000)
Average Price Per Pound 2XL Size | .449** (.000) | .091 (.356) | 1.000 | .950** (.000) | .107 (.279)
Average Price Per Pound 3XL Size | .502** (.000) | .120 (.224) | .950** (.000) | 1.000 | .067 (.500)
Lbs Extra Large Size 9 Oz $2.19 | .085 (.391) | -.801** (.000) | .107 (.279) | .067 (.500) | 1.000
** Correlation is significant at the 0.01 level (2-tailed).
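Variance inflation factors can be read off the inverse of the predictors' correlation matrix: VIF_j is the j-th diagonal element, and tolerance is 1/VIF. A minimal NumPy sketch using the four price correlations above (rounded to three decimals). Because 2XL and 3XL correlate .950, each of their VIFs is at least 1/(1 − .950²) ≈ 10.3, consistent with the large standard errors on those two coefficients in the full model.

```python
import numpy as np

# Correlations among the four price predictors (order: Small, XL, 2XL, 3XL)
R = np.array([
    [1.000, 0.018, 0.449, 0.502],
    [0.018, 1.000, 0.091, 0.120],
    [0.449, 0.091, 1.000, 0.950],
    [0.502, 0.120, 0.950, 1.000],
])

# VIF_j is the j-th diagonal element of the inverse correlation matrix
vif = np.diag(np.linalg.inv(R))
tolerance = 1 / vif  # share of each predictor's variance not explained by the others

for name, v, tol in zip(["Small", "XL", "2XL", "3XL"], vif, tolerance):
    print(f"{name}: VIF = {v:.2f}, tolerance = {tol:.3f}")
```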
37
Only Own XL Price Significant
Coefficients (Model 1; dependent variable: Lbs Extra Large Size 9 Oz $2.19)
Predictor | B | Std. Error | Beta (standardized) | t | Sig.
(Constant) | 316.844 | 2445.178 | | .130 | .897
Average Price Per Pound Small Size | 253.361 | 515.280 | .033 | .492 | .624
Average Price Per Pound XL Size | -1915.729 | 136.477 | -.813 | -14.037 | .000
Average Price Per Pound 2XL Size | 3590.806 | 2495.219 | .267 | 1.439 | .153
Average Price Per Pound 3XL Size | -1413.885 | 2574.564 | -.106 | -.549 | .584
38
Drop 3XL from the model
Coefficients (Model 1; dependent variable: Lbs Extra Large Size 9 Oz $2.19)
Predictor | B | Std. Error | Beta (standardized) | t | Sig.
(Constant) | 625.958 | 2371.187 | | .264 | .792
Average Price Per Pound Small Size | 175.976 | 493.905 | .023 | .356 | .722
Average Price Per Pound XL Size | -1924.665 | 135.029 | -.817 | -14.254 | .000
Average Price Per Pound 2XL Size | 2305.377 | 861.524 | .172 | 2.676 | .009
39
Promotion Models (p. 195)
• If the coupon dummy (1 = yes, 0 = no) and the promo dummy (1 = yes, 0 = no) are perfectly correlated and each is correlated, say, r = .5 with weekly sales,
• R² (sales | coupon, promo) = .25, far less than .25 + .25.
• The coefficients on coupon and promo would be indistinguishable from zero (nonsignificant), with huge standard errors. With exactly perfect correlation, the two coefficients cannot be estimated separately at all.
40
Regression Analysis Data
Week | Y = Sales | X1 = Coupon | X2 = Promotion
1 | 1000 | 0 | 0
2 | 2900 | 1 | 1
3 | 1100 | 0 | 0
… | … | … | …
52 | 3100 | 1 | 1
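With only the four weeks shown on the slide (the elided weeks are left out, not filled in), coupon and promotion are identical columns, so the design matrix loses a rank. A minimal NumPy sketch: R² with both dummies equals R² with promotion alone, and `np.linalg.lstsq` can only return one of infinitely many ways to split the effect (its minimum-norm solution), which is why the separate effects are not identified. The helper name `fit` is illustrative.

```python
import numpy as np

# Only the four weeks printed on the slide (weeks 1, 2, 3, 52)
sales = np.array([1000, 2900, 1100, 3100], dtype=float)
coupon = np.array([0, 1, 0, 1], dtype=float)
promo = np.array([0, 1, 0, 1], dtype=float)

def fit(design, y):
    """Least-squares fit; returns (coefficients, R^2)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

ones = np.ones_like(sales)
both = np.column_stack([ones, coupon, promo])  # constant, coupon, promo
promo_only = np.column_stack([ones, promo])    # constant, promo

coef_both, r2_both = fit(both, sales)
coef_promo, r2_promo = fit(promo_only, sales)

print("rank with both dummies:", np.linalg.matrix_rank(both))  # 2, not 3
print("R^2 both:", round(float(r2_both), 4), " R^2 promo only:", round(float(r2_promo), 4))
print("promo-only slope:", round(float(coef_promo[1]), 1))
```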
41
Promotion Models
• Omitted Variable Bias
  – Promo and coupons each boost sales by 1000 units, but
  – if coupon is omitted and has r = 1 with promo, the coefficient for promo will be 2000 (pp. 175-181).
• Multicollinearity & Overloaded Models
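A small simulation of the bias described above, under hypothetical, noise-free sales (base 1000 units, +1000 for a coupon, +1000 for a promotion) over 52 weeks. Regressing sales on promotion alone gives promotion's own effect plus 1000 × δ, where δ is the slope of coupon on promotion: 2000 when coupons always accompany promotions (r = 1), and 1500 under a hypothetical schedule where coupons also run in some non-promotion weeks (δ = .5). The schedules and the helper `slope_on_promo_only` are illustrative assumptions.

```python
import numpy as np

def slope_on_promo_only(coupon, promo):
    """Slope on promo when coupon is left out of the regression."""
    # Hypothetical noise-free sales: base 1000, +1000 for coupon, +1000 for promo
    sales = 1000 + 1000 * coupon + 1000 * promo
    design = np.column_stack([np.ones_like(promo), promo])
    coef, *_ = np.linalg.lstsq(design, sales, rcond=None)
    return coef[1]

weeks = np.arange(52)
promo = (weeks % 4 < 2).astype(float)           # promo runs 2 of every 4 weeks
coupon_same = promo.copy()                      # r = 1: coupon always with promo
coupon_partial = (weeks % 4 < 3).astype(float)  # coupon in promo weeks plus one more

print(slope_on_promo_only(coupon_same, promo))     # about 2000
print(slope_on_promo_only(coupon_partial, promo))  # about 1500
```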
42
Sales Promotion Analysis, p. 187
Period | Category Sales | Brand A Sales | All Others' Sales | Category - 1000 | Brand A - 300 | All Others - 700
1 | 1000 | 300 | 700 | 0 | 0 | 0
2 | 1000 | 300 | 700 | 0 | 0 | 0
3 | 1000 | 300 | 700 | 0 | 0 | 0
4 | 1200 | 600 | 600 | 200 | 300 | -100
5 | 850 | 200 | 650 | -150 | -100 | -50
6 | 1000 | 300 | 700 | 0 | 0 | 0
7 | 1000 | 300 | 700 | 0 | 0 | 0
8 | 1000 | 300 | 700 | 0 | 0 | 0
Total | 8050 | 2600 | 5450 | 50 | 200 | -150
The example shows that the 300 units of incremental volume in Week 4 came from:
a. 100 units from A's own sales in Week 5, so the net gain is only 200 units
b. 50 units of net category expansion
c. 150 units stolen from competitors (100 in Week 4, 50 in Week 5)
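The bookkeeping behind points a-c can be reproduced from the table by measuring every week against the normal week (1000 / 300 / 700). A minimal sketch in plain Python; the dictionary layout and variable names are illustrative.

```python
# Weekly sales from the table: (category, Brand A, all others)
sales = {
    1: (1000, 300, 700), 2: (1000, 300, 700), 3: (1000, 300, 700),
    4: (1200, 600, 600), 5: (850, 200, 650),
    6: (1000, 300, 700), 7: (1000, 300, 700), 8: (1000, 300, 700),
}
baseline = (1000, 300, 700)  # a normal, non-promoted week

# Deviation from the baseline week, per column
dev = {week: tuple(s - b for s, b in zip(row, baseline)) for week, row in sales.items()}
cat_net, a_net, others_net = (sum(d[i] for d in dev.values()) for i in range(3))

a_spike = dev[4][1]        # +300: Brand A's promotion-week bump
a_post_dip = -dev[5][1]    # 100: borrowed from Brand A's own Week 5 sales
expansion = cat_net        # 50: net growth of the whole category
stolen = -others_net       # 150: net loss to competitors (100 in Week 4, 50 in Week 5)

print(f"spike {a_spike} = own dip {a_post_dip} + expansion {expansion} + stolen {stolen}")
print(f"Brand A net gain: {a_net}")
```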
43
Takeaways
• Multiple Regression
  – Dummy variable regression for conjoint (uncorrelated predictors)
  – Correlated predictors make it difficult to assess each predictor’s unique contribution.
• Common in promotion analysis because it is common to pull multiple promotional levers simultaneously.
• 2 Solutions:
  – Drop a predictor (omitted variable bias, so reinterpret coefficients)
  – Leave both in (inflated standard errors, hard to assess impact of each)
Recommended