Secondary Data, Measures, Hypothesis Formulation, Chi-Square Market Intelligence Julie Edell Britton...

Secondary Data, Measures, Hypothesis Formulation, Chi-Square

Market Intelligence
Julie Edell Britton

Session 3
August 21, 2009

Today’s Agenda

• Announcements
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square


Announcements

• National Insurance Case for Sat. 8/22
– Stephen will do a tutorial today, Friday, 8/21, from 1:00 to 2:15 in the MBA PC Lab and will be available tonight from 7 to 9 pm in the MBA PC Lab to answer questions
– Submit slides by 8:00 am on Sat. 8/22
– 2 slides with your conclusions; you may add Appendices to support your conclusions

Primary vs. Secondary Data

Primary: collected anew for current purposes
Secondary: exists already; was collected for some other purpose

Finding Secondary Data Online @ Fuqua http://library.fuqua.duke.edu


Evaluating Sources of Secondary Data

If you can’t find the source of a number, don’t use it. Look for further data.
Always give sources when writing a report.

Applies for Focus Group write-ups too

Be skeptical.

Secondary Data: Pros & Cons

Advantages
• cheap
• quick
• often sufficient
• there is a lot of data out there

Disadvantages
• there is a lot of data out there
• numbers sometimes conflict
• categories may not fit your needs

Types of Secondary Data

                                                        Internal               External
Database: Can Slice/Dice; Need more processing          WEMBA_C                IMS Health, Nielsen, IRI*
Summary: Can’t change categories, get new crosstabs     Knowledge Management   Conquistador, Simmons, IRI_factbook

*IRI = Information Resources, Inc. (http://us.infores.com/)

Secondary Data Quality: KAD p. 120 & “What’s Behind the Numbers?”

• Data consistent with other independent sources?
• What are the classifications? Do they fit needs?
• When were numbers collected? Obsolete?
• Who collected the numbers? Bias, resources?
• Why were the data collected? Self-interest?
• How were the numbers generated?
– Sample size
– Sampling method
– Measure type
– Causality (MBA Marketing Timing & Internship)

It is Hard to Infer Causality from Secondary Data

Took Core Marketing   Got Desired Marketing Internship   Did Not Get Desired Marketing Internship
Term 1                76%                                24%
Term 3                51%                                49%

Today’s Agenda

• Announcements
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square

Measure Types

Nominal: Unordered Categories

Male = 1; Female = 2

Ordinal: Ordered Categories, intervals can’t be assumed to be equal.

I-95 is east of I-85; I-80 is north of I-40; Preference data

Interval: Equally spaced categories, 0 is arbitrary and units arbitrary.

Fahrenheit temperature – each degree is equal, Attitudes

Ratio: Equally spaced categories, 0 on scale means 0 of underlying quantity.

$ Sales, Market Share

Meaningful Statistics & Permissible Transformations

Ratio
– Example: Q1 = Bottles of wine
– Permissible transform: Q2 = b*Q1, e.g., cases sold (b = 1/12)
– Meaningful stats: All below + % change

Interval
– Example: Wine Rating Scale, 1 = Very Bad to 20 = Very Good
– Permissible transform: Att2 = a + (b*Att1), e.g., 81 to 100 (a = 80, b = 1) or 80.5 to 90 (a = 80, b = .5)
– Meaningful stats: All below + mean

Ordinal
– Example: Rank order of wines, 1 = favorite, 2 = 2nd preferred, 3 = least preferred
– Permissible transform: Any order preserving, e.g., 100 = favorite, 90 = 2nd preferred, 0 = least preferred
– Meaningful stats: All below + median

Nominal
– Example: 1 = Pinot Noir, 2 = Merlot, 3 = Chardonnay
– Permissible transform: Any transformation is ok, e.g., 16 = Pinot Noir, 3 = Merlot, 13 = Chardonnay
– Meaningful stats: # of cases, mode
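A minimal Python sketch of why % change is listed for ratio scales only. The sales and rating numbers are made up for illustration; only the transforms (b = 1/12 for bottles to cases, a = 80 and b = .5 for the rating scale) come from the slide.

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old

# Ratio scale: bottles sold in two periods (hypothetical numbers).
bottles_old, bottles_new = 1200, 1500
b = 1 / 12  # permissible ratio transform: bottles -> cases
ratio_before = pct_change(bottles_old, bottles_new)
ratio_after = pct_change(b * bottles_old, b * bottles_new)

# Interval scale: ratings on the 1-20 wine scale (hypothetical numbers).
att_old, att_new = 10, 15
a, b_int = 80, 0.5  # permissible interval transform: Att2 = a + b*Att1
interval_before = pct_change(att_old, att_new)
interval_after = pct_change(a + b_int * att_old, a + b_int * att_new)

print(f"ratio:    {ratio_before:.1f}% before, {ratio_after:.1f}% after")
print(f"interval: {interval_before:.1f}% before, {interval_after:.1f}% after")
```

The ratio-scale % change stays at 25% after rescaling, while the interval-scale figure drops from 50% to roughly 3% under an equally permissible transform, so it carries no meaning.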

Means and Medians with Ordinal Data

Gender   Measure 1   Measure 2
M        1           1
M        2           2
F        3           3
F        4           4
F        5           5
F        6           6
M        7           107
M        8           108
M        9           109
F        10          110

Means
Measure 1: M = 5.4 < F = 5.6
Measure 2: M = 65.4 > F = 25.6

Medians
Measure 1: M = 7 > F = 5
Measure 2: M = 107 > F = 5
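The comparison above can be checked with a short Python sketch. The data rows are copied from the slide; the helper name `summarize` is only illustrative.

```python
from statistics import mean, median

# (gender, measure 1, measure 2) rows copied from the slide.
rows = [
    ("M", 1, 1), ("M", 2, 2), ("F", 3, 3), ("F", 4, 4), ("F", 5, 5),
    ("F", 6, 6), ("M", 7, 107), ("M", 8, 108), ("M", 9, 109), ("F", 10, 110),
]

def summarize(col):
    """Return {gender: (mean, median)} for column index col (1 or 2)."""
    out = {}
    for g in ("M", "F"):
        vals = [r[col] for r in rows if r[0] == g]
        out[g] = (mean(vals), median(vals))
    return out

m1 = summarize(1)
m2 = summarize(2)

# Measure 2 ranks every respondent exactly as Measure 1 does,
# so it is an order-preserving (permissible ordinal) transform.
same_order = sorted(rows, key=lambda r: r[1]) == sorted(rows, key=lambda r: r[2])

for name, s in (("Measure 1", m1), ("Measure 2", m2)):
    print(name, {g: (round(mu, 1), med) for g, (mu, med) in s.items()})
```

The direction of the mean comparison flips between the two measures while the median comparison does not, which is why the median, not the mean, is the meaningful statistic for ordinal data.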

Ratio Scales & Index Numbers

Index= 100* (Per Capita Segment i) / (Per Capita Ave)

Age Group   Population (000s)   Sales Units (000)   Per Capita Sales   Segment Index
<25         700                 1400                2.00               70
25-34       500                 1250                2.50               88
35-44       300                 900                 3.00               105
45-54       240                 960                 4.00               140
55+         260                 1196                4.60               161
Total       2000                5706                2.85               100
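A small Python sketch of the index calculation, using the population and sales figures from the table. Rounding to whole numbers is an assumption made to match how the slide displays the index.

```python
# Age group -> (population in 000s, unit sales in 000s), from the table.
segments = {
    "<25": (700, 1400),
    "25-34": (500, 1250),
    "35-44": (300, 900),
    "45-54": (240, 960),
    "55+": (260, 1196),
}

total_pop = sum(pop for pop, _ in segments.values())
total_sales = sum(sales for _, sales in segments.values())
per_capita_avg = total_sales / total_pop  # overall per capita sales

# Index = 100 * (per capita sales in segment i) / (per capita average)
index = {
    group: round(100 * (sales / pop) / per_capita_avg)
    for group, (pop, sales) in segments.items()
}
print(index)
```

An index above 100 marks a segment that buys more per person than the overall average; this ratio is meaningful only because per capita sales is a ratio-scaled measure.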

Today’s Agenda

• Announcements
• Southwestern Conquistador Beer Case
• Backward Market Research
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square

Cross Tabs of MBA Acceptance by Gender

A. Raw Frequencies

        Accept   Reject   Total
M       140      860      1000
F       60       740      800
Total   200      1600     1800

B. Cell Percentages

        Accept   Reject   Total
M       .078     .478     .556
F       .033     .411     .444
Total   .111     .889     1.0

C. Row Percentages

        Accept             Reject             Total
M       140/1000 = .140    860/1000 = .860    1.00
F       60/800 = .075      740/800 = .925     1.00

D. Column Percentages

        Accept            Reject
M       140/200 = .700    860/1600 = .538
F       60/200 = .300     740/1600 = .462
Total   1.00              1.00

Rule of Thumb

If a potential causal interpretation exists, make numbers add up to 100% at each level of the causal factor.

Above: it is possible that gender (row) causes or influences acceptance (column), but not that acceptance influences gender. Hence, row percentages (format C) would be desirable.

Hypothesis Formulation and Testing

Hypothesis: What you believe the relationship is between the measures. It can come from:
• Theory
• Empirical Evidence
• Beliefs
• Experience

Here: Believe that acceptance is related to gender

Null Hypothesis: Acceptance is not related to gender

Logic of hypothesis testing: Negative Inference
The null hypothesis is rejected by showing that the observed data would be quite improbable if the null hypothesis were true.

Want to see if we can reject the null.

Steps in Hypothesis Testing

1. State the hypothesis in Null and Alternative Form

– Ho: There is no relationship between gender and MBA acceptance

– Ha1: Gender and Acceptance are related (2-sided)

– Ha2: Fewer Women are Accepted (1-sided)

2. Choose a test statistic

3. Construct a decision rule

Chi-Square Test

Used for nominal data, to compare the observed frequency of responses to what would be “expected” under the null hypothesis.

Two types of tests

Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists between the two variables

Goodness of fit test – Compare whether the data sampled is proportionate to some standard

Chi-Square Test

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

With (r-1)*(c-1) degrees of freedom

$O_i$ = Observed number in cell i
$E_i$ = Expected number in cell i under independence
k = number of cells; r = number of rows; c = number of columns

$E_i$ = Column Proportion * Row Proportion * total number observed

MBA Acceptance Data Contingency

A. Observed Frequencies

        Accept   Reject   Total
M       140      860      1000
F       60       740      800
Total   200      1600     1800

B. Cell Percentages

        Accept   Reject   Total
M       .078     .478     .556
F       .033     .411     .444
Total   .111     .889     1.0

C. Expected Frequencies

        Accept                  Reject
M       .111*.556*1800 = 111    .889*.556*1800 = 890
F       .111*.444*1800 = 89     .889*.444*1800 = 710
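A short Python sketch of the expected-frequency step, working from the unrounded margins instead of the three-decimal proportions shown on the slide. The list layout is an assumption for illustration.

```python
observed = [[140, 860],   # M: accept, reject
            [60, 740]]    # F: accept, reject

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# E = (row proportion) * (column proportion) * n
#   = row_total * col_total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

With unrounded margins the expected counts come out near 111.1, 888.9, 88.9 and 711.1; the slide's 890 and 710 differ slightly because they are built from proportions rounded to three decimals.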

Chi-Square Test

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

With (r-1)*(c-1) degrees of freedom

$$\chi^2 = \frac{(140-111)^2}{111} + \frac{(860-890)^2}{890} + \frac{(60-89)^2}{89} + \frac{(740-710)^2}{710} = 19.30$$

So?
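The arithmetic can be reproduced with a few lines of Python. This sketch computes the statistic twice: once with the rounded expected counts used on the slide, and once with unrounded expected counts (row total * column total / n). The function name `chi_square` is illustrative, and no continuity correction is applied.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [140, 860, 60, 740]  # M-accept, M-reject, F-accept, F-reject

# Expected counts as rounded on the slide.
slide_expected = [111, 890, 89, 710]
chi2_slide = chi_square(observed, slide_expected)

# Expected counts from unrounded margins: row_total * col_total / n.
rows, cols, n = [1000, 800], [200, 1600], 1800
exact_expected = [r * c / n for r in rows for c in cols]
chi2_exact = chi_square(observed, exact_expected)

print(round(chi2_slide, 2), round(chi2_exact, 2))
```

The rounded inputs give about 19.30 and the unrounded ones about 19.01; the conclusion is the same either way.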

3. Construct a decision rule

Decision Rule
1. Significance level: α = .05, the probability of rejecting the null hypothesis when it is true.

2. Degrees of freedom: the number of unconstrained data used in calculating a test statistic. For Chi-Square it is (r-1)*(c-1), so here that would be 1. When the number of cells is larger, we need a larger test statistic to reject the null.

3. Two-tailed or one-tailed test: significance tables are (unless otherwise specified) two-tailed tables. Chi-Sq is on pg 517.
– Ha1: Gender and Acceptance are related (2-sided). Critical Value = 3.84
– Ha2: Fewer Women are Accepted (1-sided). Critical Value = 2.71

4. Decision rule: Reject Ho if the calculated Chi-sq value (19.3) > the test critical value, 3.84 for Ha1 or 2.71 for Ha2.
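A minimal Python sketch of the decision rule. The statistic and critical values are the ones quoted on the slide; the p-value line is an addition that relies on the standard fact that, for 1 degree of freedom, a chi-square variable is a squared standard normal.

```python
from math import erfc, sqrt

chi2_stat = 19.30  # value computed on the slide

# Critical values for 1 degree of freedom, as quoted on the slide.
critical = {"Ha1 (2-sided)": 3.84, "Ha2 (1-sided)": 2.71}

# Reject H0 when the statistic exceeds the critical value.
decisions = {name: chi2_stat > cv for name, cv in critical.items()}

# For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2_stat / 2))

for name, reject in decisions.items():
    print(name, "-> reject H0" if reject else "-> fail to reject H0")
print(f"p-value (1 df): {p_value:.2g}")
```

Both alternatives lead to rejecting the null here, and the p-value is far below .05. The same `erfc` expression evaluated at 3.84 returns approximately .05, which is where that critical value comes from.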

Chi-Square Table

Chi-Square Test

Used for nominal data, to compare the observed frequency of responses to what would be “expected” under some specific null hypothesis.

Two types of tests

Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists

Goodness of fit test – Compare whether the data sampled is proportionate to some standard

Goodness of fit – Chi-Square

Ho: Car color preferences have not shifted
Ha: Car color preferences have shifted

Color     Data   Historic Distribution   Expected # = Prob*n
Red       680    30%                     750
Green     520    25%                     625
Black     675    25%                     625
White     625    20%                     500
Tot (n)   2500

Do we observe what we expected?

Chi-Square Test

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

With (k-1) degrees of freedom

$$\chi^2 = \frac{(680-750)^2}{750} + \frac{(520-625)^2}{625} + \frac{(675-625)^2}{625} + \frac{(625-500)^2}{500} = 59.42$$

So?
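The goodness-of-fit arithmetic can be sketched in Python as follows, using the observed counts and historic proportions from the car color slide. The per-color breakdown is an addition that shows which categories drive the statistic.

```python
observed = {"Red": 680, "Green": 520, "Black": 675, "White": 625}
historic = {"Red": 0.30, "Green": 0.25, "Black": 0.25, "White": 0.20}

n = sum(observed.values())
# Expected count under H0: historic probability * sample size.
expected = {color: p * n for color, p in historic.items()}

# Each color's contribution (O - E)^2 / E to the statistic.
contributions = {
    color: (observed[color] - expected[color]) ** 2 / expected[color]
    for color in observed
}
chi2_stat = sum(contributions.values())
df = len(observed) - 1  # k - 1 for a goodness-of-fit test

print({c: round(v, 2) for c, v in contributions.items()})
print(round(chi2_stat, 2), df)
```

White contributes the most (31.25 of the 59.42), followed by Green, so the shift is mainly toward white cars and away from green ones.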

3. Construct a decision rule

Decision Rule
1. Significance level: α = .05, the probability of rejecting the null hypothesis when it is true.

2. Degrees of freedom: the number of unconstrained data used in calculating a test statistic. For a goodness of fit Chi-Square it is (k-1), so here that would be 3. When the number of cells is larger, we need a larger test statistic to reject the null.

3. Two-tailed or one-tailed test: significance tables are (unless otherwise specified) two-tailed tables. Chi-Sq is on pg 517.
– Ha: Preferences have changed (2-sided). Critical Value = 7.81

4. Decision rule: Reject Ho if the calculated Chi-sq value (59.42) > the test critical value (7.81).

Chi-Square Table

Recap

• Finding & Evaluating Secondary Data
• Measure Types
– Permissible transformations
– Meaningful statistics
• Index #s
• Crosstabs
– Casting right direction
• Chi-square statistic
– Contingency Test
– Goodness of Fit Test