WEMBA, Regression Analysis

Market Intelligence
Julie Edell Britton

Session 9
October 9, 2009

Today’s Agenda

• Announcements
• WEMBA C
• Multiple Regression
  • Conjoint & dummy variable regression
  • Multiple R²(Y|X1, X2) vs. r²(Y|X(i))
  • Uncorrelated predictors
  • Correlated predictors
  • Promotion analysis
• Course Evaluations

Announcements

• Submit Nestle Contadina slides by 8 am tomorrow, Sat., 10/10

A Model of School Choice

[Figure: boxes labeled Values, Perceptions, Individual Differences & Constraints, and Become a Fuqua Student]

Assumes that behavior is driven by differences in:
• Values (importance of key attributes)
• Perceptions (Duke and competition on key attributes)
• Individual Differences & Constraints (travel, cost, etc.)

The Funnel

[Figure: funnel diagram. Labels: Attend Information Session, Do not attend Information Session, Opt Out, Selected Out, Admitted, Opt Out, Matriculate]

The Analysis Approach

• Sample groups that differ in behavior
• Compare the groups on relevant dimensions:
  • Perceptions
  • Values
  • Individual Differences & Constraints
• Infer that any differences found between groups are partly responsible for the differences in behavior

Disproportionate Stratified Random Sample

[Figure: the funnel annotated with stratum sample sizes. Stage labels: Attend Information Session, Do not attend Information Session, Not Apply, Selected Out, Admitted, Opt Out, Matriculate. Sample sizes as printed on the slide: n = 26 (of 158, 16.5%); n = 24 (of 173, 14%); n = 30 (of 60, 50%); n = 26 (of 60, 52%); n = 42 (of 56, 75%); n = 12 (of 56, 25%); n = 56]

What Drives Application to Duke?

• Compare Applied to Did Not Apply
  • Perceptions (Duke – Competitor)
  • Constraints
    • Financial assistance program
    • % Cost
    • Time to travel to Duke
• Can’t use attendance at Information Session as a factor here without going to population data
• What did you learn?

What Drives Application to Duke?

* = chi-square test statistic

FACTOR                                      Applied   Did Not Apply   t-stat    p-value (2-tail)

Perceptions (Duke – Competitor)
Faculty Reputation                          .500      .200            -2.102    .039
Cost                                        -.667     -.273           1.777     .080
Mix of Face to Face and Distance Learning   .09       .36             1.737     .087

Demographics
% Cost You Pay                              52.16     72.60           2.771     .007
How Long to Travel to Duke                  2.34      3.05            1.911     .059
Company Helps Pay                           85.7%     72.0%           3.038*    .081

What Drives Acceptance?

• Compare accepted to did not accept, conditional on having applied
  • Perceptions (Duke – Competitor)
  • Constraints
    • Financial assistance program
    • % Cost paid by employer
    • Time to travel to Duke
• Can consider info session attendance here

FACTOR                              Accepted   Did Not Accept   t-stat    p-value (2-tail)

Perceptions (Duke – Competitor)
Faculty Reputation                  .679       .083             2.947     .005
Teaching Quality                    .462       -.231            3.469     .001
Core Quality                        .464       -.077            3.203     .003
Electives                           .421       -.923            3.599     .001
Student Quality                     .759       .167             2.614     .013
Technology                          .500       .077             1.714     .096
Advance Career                      .667       -.154            4.176     .000
Cost                                -.448      -1.154           2.202     .033
Student Reputation                  .900       .385             2.130     .039

Demographics
% Cost You Pay                      46.46      68.86            1.845     .070
How Long to Travel to Duke          2.097      3.064            1.978     .053
Attended an Information Session     59.5%      39.7%            2.406*    .108

What Drives Acceptance?

* = chi-square test statistic

Acceptance by Sponsorship

% paid by company:

Student = 53.5

NonStudent = 31.1

The Impact of Info Sessions?

• Compare those who attended to those who did not attend on perceptions of Duke

The Impact of Info Sessions?

FACTOR                    Info Session   No Info Session   t-stat   p-value (2-tail)

Perceptions (Duke)
Ability to Shift Work     3.07           3.35              1.887    .063
Cost                      2.69           2.39              1.962    .053

Half the perceptual factors (9 of 17) were in the wrong direction!

Information sessions don’t seem to be doing much good at all.

* Only positive evidence is that, in the overall population, the probability of applying was 16.5% for those who attended an info session vs. 9.5% for those who did not, χ² = 8.70, p < .005.

* No significant effect on matriculation / acceptance: 59.5% of those who attended accepted vs. 39.7% of those who did not attend, χ² = 2.41, p = .11.

Who to Target?

FACTOR                        Applied    Did Not Apply    t-stat    p-value (2-tail)
% Cost You Pay                52.16      72.60            2.771     .007
How Long to Travel to Duke    2.34       3.05             1.911     .059
Company Helps Pay             85.7%      72.0%            3.038*    .081

FACTOR                        Accepted   Did Not Accept   t-stat    p-value (2-tail)
% Cost You Pay                46.46      68.86            1.845     .070
How Long to Travel to Duke    2.097      3.064            1.978     .053

What Should be Emphasized in Information Sessions?

• Attributes that are important and where we do well and/or where our competition does not do well.

• Importance
  • Focus on attributes that predict applying & accepting
  • Rank order attribute importance

• Look at important attributes where the perception difference (Duke – Comp) is positive.

Quasi-MAAM for Communication Content: Demonstrated Importance

              Duke Advantage     Duke Disadvantage
Important     Brag               Misperception (fix Marcom) or Real Problem (fix Product)?
Unimportant   Save Your Breath   Ignore

Important attributes listed on the slide:
• Faculty Rep, Cost
• Advance Career, Teaching Quality, Electives, Core Quality, Faculty Rep, Student Quality, School Rep

Quasi-MAAM for Communication Content: Using Importance Ratings

              Duke Advantage     Duke Disadvantage
Important     Brag               Misperception (fix Marcom) or Real Problem (fix Product)?
Unimportant   Save Your Breath   Ignore

Important attributes listed on the slide:
• Continue Career, School Rep, Forwarding Career, Teaching Quality, Faculty Reputation, Other Students

WEMBA Takeaways

• Be backward in your analysis process too

• Outline your analysis before you begin

• Think about tables needed, then do the analysis to make them

• Real data is imperfect…do the best you can

• Survey data is correlational, not causal

• The funnel approach applies to many business problems

Multiple Regression

Simple linear regression, with more than one predictor.

$$\hat{y}_i = a + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki}$$

a = intercept: predicted value of y if x1 = x2 = …xk = 0

b1 = Slope of y on x1 given that x2…xk are already in equation

R = multiple correlation = correlation of Y-hat with Y (0 ≤ R ≤ 1)

R² = proportion of variance in Y explained by the best linear regression equation.

Conjoint as Dummy Variable Regression

Ratings by profile (a = size, b = brand, c = caffeine):

                 a1 = 12 oz can           a2 = 16 oz disposable
                 b1 = Coke   b2 = Pepsi   b1 = Coke   b2 = Pepsi
c1 = decaf       7           3            8           5
c2 = caff        9           4            10          6

The same data coded with dummy variables:

(Y)      (X1)       (X2)    (X3)
Rating   Size16oz   Pepsi   Caffeine
7        0          0       0
9        0          0       1
3        0          1       0
4        0          1       1
8        1          0       0
10       1          0       1
5        1          1       0
6        1          1       1

Bivariate Correlation Matrix

Correlations (Pearson r, with 2-tailed significance in parentheses; N = 8 for every pair)

            Rating           Size16oz       Pepsi            Caffine
Rating      1                .327 (.429)    -.873** (.005)   .327 (.429)
Size16oz    .327 (.429)      1              .000 (1.000)     .000 (1.000)
Pepsi       -.873** (.005)   .000 (1.000)   1                .000 (1.000)
Caffine     .327 (.429)      .000 (1.000)   .000 (1.000)     1

** Correlation is significant at the 0.01 level (2-tailed).

Because the dummies are uncorrelated:

$$R^2_{Y|X_1,X_2,X_3} = r^2_{Y|X_1} + r^2_{Y|X_2} + r^2_{Y|X_3}$$

$$.976 = (.327)^2 + (-.873)^2 + (.327)^2$$

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .988a   .976       .958                .50000

a. Predictors: (Constant), Caffine, Pepsi, Size16oz

Dummy Variable Regression

Coefficients (Dependent Variable: Rating)

                                                           95% Confidence Interval for B
Model 1       B        Std. Error   t         Sig.   Lower Bound   Upper Bound
(Constant)    7.000    .354         19.799    .000   6.018         7.982
Size16oz      1.500    .354         4.243     .013   .518          2.482
Pepsi         -4.000   .354         -11.314   .000   -4.982        -3.018
Caffine       1.500    .354         4.243     .013   .518          2.482

B = unstandardized coefficient.
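As a check on the conjoint slides, the sketch below refits the dummy variable regression with ordinary least squares in NumPy (the slides use SPSS; NumPy is a substitution made here for illustration). It uses only the eight ratings and dummy codes from the data table, and it also compares R² with the sum of the squared zero-order correlations.

```python
import numpy as np

# Ratings and 0/1 dummy codes copied from the conjoint data table in these slides.
rating = np.array([7, 9, 3, 4, 8, 10, 5, 6], dtype=float)
size16oz = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
pepsi = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
caffeine = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)

# Design matrix with a leading column of ones for the intercept a.
X = np.column_stack([np.ones_like(rating), size16oz, pepsi, caffeine])
coef, *_ = np.linalg.lstsq(X, rating, rcond=None)

# R^2 = 1 - SS_residual / SS_total.
y_hat = X @ coef
ss_res = np.sum((rating - y_hat) ** 2)
ss_tot = np.sum((rating - rating.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Multiple R is the correlation between predicted and observed ratings.
multiple_r = np.corrcoef(y_hat, rating)[0, 1]

# Squared zero-order correlations of each dummy with the rating.
zero_order = np.array(
    [np.corrcoef(x, rating)[0, 1] for x in (size16oz, pepsi, caffeine)]
)
sum_r2 = float(np.sum(zero_order ** 2))

print("intercept and slopes:", np.round(coef, 3))
print("R^2:", round(r_squared, 3), " sum of r^2:", round(sum_r2, 3))
print("multiple R:", round(multiple_r, 3))
```

Because the three dummies are mutually uncorrelated in this balanced design, each slope is simply the difference between the mean rating at level 1 and at level 0 of that attribute, and R² matches the sum of the squared correlations.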

$$R^2_{y \cdot x_1 x_2} = a + b + c$$

where,

a = variance in y uniquely shared with x1

b = variance in y uniquely shared with x2

c = variance in y shared with both x1 and x2

("error") = variance in y shared by neither x1 nor x2

A Framework for Understanding Multicollinearity

Major Problem in Multiple Regression: Assessing the (unique) contribution of the individual predictors.

[Figure: Venn diagram of three overlapping circles labeled Variance in x1, Variance in x2, and Variance in y]

The area in c causes ambiguity in specifying the contributions of x1 and x2 to explaining y. Should we attribute all of this variance to x1? All to x2? Somehow split it?

Two different ways exist of assessing the contribution of x1 to explaining y.

Multicollinearity

2 Ways to Deal with Overlap (see slide 26)

1. "Zero-order" correlation of y with x1, $r_{y x_1}$: Give X1 credit for both a and c.

2. "(Semi-)Partial" correlation of y with x1, $r_{y(x_1 \cdot x_2)}$: Give X1 credit only for its unique contribution a.
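The two kinds of credit can be computed from three correlations. The sketch below is illustrative only: the function name `venn_areas` is hypothetical, and it relies on the standard two-predictor identity R² = (r²y1 + r²y2 − 2·ry1·ry2·r12) / (1 − r²12). The example inputs are the FoodExp, HHSize, and Kid6-18 correlations reported in the Milan food slides of this deck.

```python
def venn_areas(r_y1, r_y2, r_12):
    """Split R^2 for y on (x1, x2) into areas a, b, c of the Venn diagram.

    a = squared semi-partial correlation of x1 (unique to x1)
    b = squared semi-partial correlation of x2 (unique to x2)
    c = variance in y shared with both predictors
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    a = r_squared - r_y2**2  # what x1 adds once x2 is already in the equation
    b = r_squared - r_y1**2  # what x2 adds once x1 is already in the equation
    c = r_squared - a - b    # overlap; can be negative when suppression occurs
    return a, b, c, r_squared


# x1 = HHSize, x2 = Kid6-18, y = FoodExp (correlations from the Milan slides).
a, b, c, r_squared = venn_areas(r_y1=0.43, r_y2=0.40, r_12=0.70)

print("zero-order credit for x1 (a + c):", round(a + c, 4))
print("semi-partial credit for x1 (a):  ", round(a, 4))
print("R^2 = a + b + c:                 ", round(r_squared, 4))
```

With these inputs the zero-order credit for household size is several times larger than its unique credit, which is the ambiguity that area c creates.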

Understanding Food Expenditures in Milan

Run two regressions

1. $\text{FoodExp} = a + b_1(\text{HHSize}) + b_2(\text{Kid6To18})$

2. $\text{FoodExp} = a + b_1(\text{HHSize}) + b'_2(\text{IncomeK})$

where
FoodExp = weekly food expenditures
HHSize = total in household
Kid6To18 = any kids 6-18? (0 = no, 1 = yes)
IncomeK = annual income

But Check Correlations Between Predictors

            FoodExp$   HHSize   Kid6-18?   IncomeK
FoodExp$    1
HHSize      0.43       1
Kid6-18?    0.40       0.70     1
IncomeK     0.37       0.16     0.19       1

Zero order coefficients tell us:

• rbc = +.43: More people in household, more weekly food expenditures
• rbi = +.40: If any kids 6-18, more weekly food expenditures
• rbj = +.37: Higher income, more spent on food

All strongly statistically significant with 498 df

Which two should predict weekly expenditures best?

Multicollinearity, r2, and R2

            FoodExp$   HHSize   Kid6-18?   IncomeK
FoodExp$    1
HHSize      0.43       1
Kid6-18?    0.40       0.70     1
IncomeK     0.37       0.16     0.19       1
Income$     0.37                           0.99

Model   Predictor 1   Predictor 2   Sum of r²   Actual R²
1       HHSize        Kid 6-18      .346        .205
2       HHSize        IncomeK       .327        .282
3       IncomeK       Income$       .274        .140
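The gap between the sum of r² and the actual R² can be approximated directly from the correlation matrix. The sketch below applies the standard two-predictor formula to the two-decimal correlations printed above; the helper name `r2_two_predictors` is hypothetical, and because the inputs are rounded the results agree with the slide only to about two decimal places.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 for y regressed on two predictors, computed from correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# (r of y with predictor 1, r of y with predictor 2, r between the predictors)
models = {
    "HHSize + Kid6-18": (0.43, 0.40, 0.70),
    "HHSize + IncomeK": (0.43, 0.37, 0.16),
    "IncomeK + Income$": (0.37, 0.37, 0.99),
}

results = {}
for name, (r_y1, r_y2, r_12) in models.items():
    sum_r2 = r_y1**2 + r_y2**2
    actual_r2 = r2_two_predictors(r_y1, r_y2, r_12)
    results[name] = (sum_r2, actual_r2)
    print(f"{name:18s} sum of r^2 = {sum_r2:.3f}   R^2 = {actual_r2:.3f}")
```

The more strongly the two predictors are correlated with each other, the further R² falls below the sum of the squared zero-order correlations; with IncomeK and Income$ (r = .99) the second predictor adds almost nothing.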

“Partial Effect” Milan Food Problem

Run two regressions

1. $\text{FoodExp} = a + b_2(\text{KidLT6})$

2. $\text{FoodExp} = a + b_1(\text{HHSize}) + b'_2(\text{KidLT6})$

where
FoodExp = weekly food expenditures
HHSize = total in household
KidLT6 = any children under 6? (0 = no, 1 = yes)

Coefficients (Dependent Variable: B = Weekly Food Expenditure Dollars)

Model   Predictor                   B        Std. Error   Beta    t        Sig.   Tolerance   VIF
1       (Constant)                  40.680   1.154                35.266   .000
        H = Any Children Under 6?   7.278    1.923        .167    3.786    .000   1.000       1.000
2       (Constant)                  25.288   1.841                13.734   .000
        H = Any Children Under 6?   -4.875   2.119        -.112   -2.301   .022   .683        1.465
        C = Household Size          5.455    .536         .496    10.179   .000   .683        1.465

B = unstandardized coefficient; Beta = standardized coefficient; Tolerance and VIF = collinearity statistics.
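The collinearity columns in the table are linked by VIF = 1 / Tolerance, where Tolerance = 1 − R² from regressing one predictor on the other predictors. The sketch below only re-derives the printed values; the function name is hypothetical, and the implied correlation assumes the two-predictor case of Model 2, where that R² is the squared correlation between the two predictors.

```python
import math


def vif_from_tolerance(tolerance):
    """Variance inflation factor implied by a tolerance value."""
    return 1.0 / tolerance


tolerance = 0.683  # printed for both predictors in Model 2

vif = vif_from_tolerance(tolerance)

# With exactly two predictors, tolerance = 1 - r^2 between them,
# so the magnitude of their correlation can be recovered.
implied_r = math.sqrt(1.0 - tolerance)

# Standard errors grow by sqrt(VIF) relative to uncorrelated predictors.
se_inflation = math.sqrt(vif)

print("VIF:", round(vif, 3))
print("implied |r| between predictors:", round(implied_r, 3))
print("standard error inflation factor:", round(se_inflation, 3))
```

The small difference from the printed VIF of 1.465 comes from the tolerance being rounded to three decimals.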

Doritos

•XL Models. Effects of own price (& price promotion) & price promotions of other sizes (SM, XXL, 3XL) on sales of XL size

Correlations (Pearson r, with 2-tailed significance in parentheses; N = 104 for every pair)

                  Price/lb Small   Price/lb XL      Price/lb 2XL    Price/lb 3XL    Lbs XL
Price/lb Small    1.000            .018 (.854)      .449** (.000)   .502** (.000)   .085 (.391)
Price/lb XL       .018 (.854)      1.000            .091 (.356)     .120 (.224)     -.801** (.000)
Price/lb 2XL      .449** (.000)    .091 (.356)      1.000           .950** (.000)   .107 (.279)
Price/lb 3XL      .502** (.000)    .120 (.224)      .950** (.000)   1.000           .067 (.500)
Lbs XL            .085 (.391)      -.801** (.000)   .107 (.279)     .067 (.500)     1.000

Price/lb = Average Price Per Pound for the given size; Lbs XL = Lbs Extra Large Size 9 Oz $2.19.

** Correlation is significant at the 0.01 level (2-tailed).

Only Own XL Price Significant

Coefficients (Dependent Variable: Lbs Extra Large Size 9 Oz $2.19)

Model 1          B           Std. Error   Beta    t         Sig.
(Constant)       316.844     2445.178             .130      .897
Price/lb Small   253.361     515.280      .033    .492      .624
Price/lb XL      -1915.729   136.477      -.813   -14.037   .000
Price/lb 2XL     3590.806    2495.219     .267    1.439     .153
Price/lb 3XL     -1413.885   2574.564     -.106   -.549     .584

B = unstandardized coefficient; Beta = standardized coefficient.

Coefficients (Dependent Variable: Lbs Extra Large Size 9 Oz $2.19)

Model 1          B           Std. Error   Beta    t         Sig.
(Constant)       625.958     2371.187             .264      .792
Price/lb Small   175.976     493.905      .023    .356      .722
Price/lb XL      -1924.665   135.029      -.817   -14.254   .000
Price/lb 2XL     2305.377    861.524      .172    2.676     .009

Drop 3XL from the model

Promotion Models (p. 195)

• If the coupon dummy (1 = yes, 0 = no) and the promo dummy (1 = yes, 0 = no) are perfectly correlated and each is correlated, say, r = .5 with weekly sales,

• R² (sales | coupon, promo) = .25 << .25 + .25.

• Coefficients on coupon and promo would be indistinguishable from zero (nonsignificant), with huge standard errors.

Regression Analysis Data

Week   Y = Sales   X1 = Coupon   X2 = Promotion
1      1000        0             0
2      2900        1             1
3      1100        0             0
…      …           …             …
52     3100        1             1
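To see why such data cannot separate the two effects, the sketch below builds the design matrix from only the four weeks printed in the table (the elided weeks are not reconstructed) and checks its rank. NumPy is an assumption of this example, not a tool named in the slides.

```python
import numpy as np

# Only the four weeks shown on the slide (weeks 1, 2, 3 and 52).
sales = np.array([1000, 2900, 1100, 3100], dtype=float)
coupon = np.array([0, 1, 0, 1], dtype=float)
promo = np.array([0, 1, 0, 1], dtype=float)

# Intercept, coupon and promo columns; coupon and promo are identical.
X = np.column_stack([np.ones_like(sales), coupon, promo])

# Rank 2 with 3 columns means the individual slopes are not identified.
rank = np.linalg.matrix_rank(X)

# lstsq still returns an answer: the minimum-norm solution, which splits
# the joint lift evenly. Any split with the same sum fits equally well.
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
combined_lift = coef[1] + coef[2]

print("columns:", X.shape[1], " rank:", rank)
print("coupon and promo slopes:", np.round(coef[1:], 1))
print("combined lift:", round(float(combined_lift), 1))
```

Only the combined lift of running both levers together is pinned down by the data; the even split between coupon and promo is an artifact of the solver, not an estimate of either effect.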

Promotion Models

• Omitted Variable Bias
  • Promo and coupons each boost sales by 1000 units, but
  • if coupon is omitted and r = 1 with promo, the coefficient for promo will be 2000. (P. 175-181)

• Multicollinearity & Overloaded Models
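The omitted variable example can be reproduced with made-up data. In the sketch below every number is hypothetical except the two 1000-unit effects stated on the slide: sales are generated without noise from a baseline of 1000, and coupon is set equal to promo in every week.

```python
import numpy as np

# Hypothetical eight weeks in which coupon and promo always run together (r = 1).
promo = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
coupon = promo.copy()

baseline = 1000.0  # hypothetical baseline weekly sales
sales = baseline + 1000.0 * coupon + 1000.0 * promo  # true effects: 1000 each

# Regress sales on promo alone, omitting coupon from the model.
X = np.column_stack([np.ones_like(promo), promo])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, promo_slope = coef

print("intercept:", round(float(intercept), 1))
print("promo coefficient with coupon omitted:", round(float(promo_slope), 1))
```

The promo coefficient absorbs the coupon effect, so it has to be read as the effect of promo plus everything that always accompanies promo.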

Sales Promotion Analysis, p. 187

Period   Category Sales   Brand A Sales   All Others' Sales   Category - 1000   Brand A - 300   All Others - 700
1        1000             300             700                 0                 0               0
2        1000             300             700                 0                 0               0
3        1000             300             700                 0                 0               0
4        1200             600             600                 200               300             -100
5        850              200             650                 -150              -100            -50
6        1000             300             700                 0                 0               0
7        1000             300             700                 0                 0               0
8        1000             300             700                 0                 0               0
Total    8050             2600            5450                50                200             -150

Example shows that the 300 units of incremental volume in Week 4 came from:
a. 100 units from A's own sales in Week 5, so the net gain is only 200 units
b. 50 units net category expansion
c. 150 units stolen from competitors (100 in Week 4, 50 in Week 5)
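The decomposition is plain arithmetic against the baseline weeks, and the sketch below recomputes it from the table. The baselines (1000 for the category, 300 for Brand A, hence 700 for all others) are the ones implied by the column headers; the variable names are illustrative.

```python
# Weekly sales for periods 1-8, copied from the table.
category = [1000, 1000, 1000, 1200, 850, 1000, 1000, 1000]
brand_a = [300, 300, 300, 600, 200, 300, 300, 300]
others = [c - a for c, a in zip(category, brand_a)]

# Baseline (non-promoted) weekly levels taken from the column headers.
base_category = 1000
base_a = 300
base_others = base_category - base_a

# Incremental volume per period relative to baseline.
inc_category = [c - base_category for c in category]
inc_a = [a - base_a for a in brand_a]
inc_others = [o - base_others for o in others]

net_gain_a = sum(inc_a)                # Week 4 spike minus Week 5 dip
category_expansion = sum(inc_category)
stolen_from_others = -sum(inc_others)

print("net gain for Brand A:", net_gain_a)
print("net category expansion:", category_expansion)
print("units taken from competitors:", stolen_from_others)
```

Brand A's net gain equals category expansion plus the volume taken from competitors, which is the accounting identity behind items a to c.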

Takeaways

• Multiple Regression
  • Dummy Variable Regression for Conjoint (uncorrelated predictors)
  • Correlated predictors make it difficult to assess each predictor’s unique contribution.
    • Common in promotion analysis because it is common to pull multiple promotional levers simultaneously.
    • 2 Solutions:
      • Drop a predictor (omitted variable bias, so reinterpret coefficients)
      • Leave both in (inflated standard errors, hard to assess impact of each)
