Secondary Data, Measures, Hypothesis Formulation, Chi-Square
Market Intelligence
Julie Edell Britton
Session 3
August 21, 2009
Today’s Agenda
• Announcements
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square
Announcements
• National Insurance Case for Sat. 8/22
– Stephen will do a tutorial today, Friday, 8/21, from 1:00 to 2:15 in the MBA PC Lab and will be available tonight from 7 to 9 pm in the MBA PC Lab to answer questions
– Submit slides by 8:00 am on Sat. 8/22
– 2 slides with your conclusions; you may add appendices to support your conclusions
Primary vs. Secondary Data
Primary -- collected anew for current purposes
Secondary -- exists already, was collected for some other purpose
Finding Secondary Data Online @ Fuqua http://library.fuqua.duke.edu
Evaluating Sources of Secondary Data
If you can’t find the source of a number, don’t use it. Look for further data.
Always give sources when writing a report.
Applies for Focus Group write-ups too
Be skeptical.
Secondary Data: Pros & Cons
Advantages
– cheap
– quick
– often sufficient
– there is a lot of data out there
Disadvantages
– there is a lot of data out there
– numbers sometimes conflict
– categories may not fit your needs
Types of Secondary Data
                                                      Internal               External
Database: Can Slice/Dice; Need more processing        WEMBA_C                IMS Health, Nielsen, IRI*
Summary: Can’t change categories, get new crosstabs   Knowledge Management   Conquistador, Simmons, IRI_factbook
*IRI = Information Resources, Inc. (http://us.infores.com/)
Secondary Data Quality: KAD p. 120 & “What’s Behind the Numbers?”
• Data consistent with other independent sources?
• What are the classifications? Do they fit needs?
• When were numbers collected? Obsolete?
• Who collected the numbers? Bias, resources?
• Why were the data collected? Self-interest?
• How were the numbers generated?
– Sample size
– Sampling method
– Measure type
– Causality (MBA Marketing Timing & Internship)
It is Hard to Infer Causality from Secondary Data

Took Core Marketing   Got Desired Marketing Internship   Did Not Get Desired Marketing Internship
Term 1                76%                                24%
Term 3                51%                                49%
Measure Types
Nominal: Unordered Categories
– Male = 1; Female = 2
Ordinal: Ordered Categories, intervals can’t be assumed to be equal.
– I-95 is east of I-85; I-80 is north of I-40; Preference data
Interval: Equally spaced categories, 0 is arbitrary and units arbitrary.
– Fahrenheit temperature (each degree is equal); Attitudes
Ratio: Equally spaced categories, 0 on scale means 0 of underlying quantity.
– $ Sales, Market Share
Meaningful Statistics & Permissible Transformations

Ratio
– Examples: Q1 = Bottles of wine
– Permissible Transform: Q2 = b*Q1, e.g., cases sold (b = 1/12)
– Meaningful Stats: All below + % change
Interval
– Examples: Wine Rating Scale, 1 = Very Bad to 20 = Very Good
– Permissible Transform: Att2 = a + (b*Att1), e.g., 81 to 100 (a = 80, b = 1) or 80.5 to 90 (a = 80, b = .5)
– Meaningful Stats: All below + mean
Ordinal
– Examples: Rank order of wines: 1 = favorite, 2 = 2nd preferred, 3 = least preferred
– Permissible Transform: Any order preserving, e.g., 100 = favorite, 90 = 2nd preferred, 0 = least preferred
– Meaningful Stats: All below + median
Nominal
– Examples: 1 = Pinot Noir, 2 = Merlot, 3 = Chardonnay
– Permissible Transform: Any transformation is ok, e.g., 16 = Pinot Noir, 3 = Merlot, 13 = Chardonnay
– Meaningful Stats: # of cases, mode
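A minimal Python sketch of why % change is meaningful for ratio scales but not for interval scales. The transforms use the slide's constants (b = 1/12 for bottles to cases; a = 80, b = 1 for the rating scale). The input values (120 and 180 bottles, ratings of 10 and 15) and the helper name pct_change are illustrative assumptions, not data from the session.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old

# Ratio scale: bottles -> cases, Q2 = b * Q1 with b = 1/12.
bottles = (120, 180)                       # hypothetical sales figures
cases = tuple(q / 12 for q in bottles)
ratio_before = pct_change(*bottles)
ratio_after = pct_change(*cases)

# Interval scale: Att2 = a + b * Att1 with a = 80, b = 1.
ratings = (10, 15)                         # hypothetical 1-to-20 ratings
rescaled = tuple(80 + 1 * r for r in ratings)
interval_before = pct_change(*ratings)
interval_after = pct_change(*rescaled)

print("ratio scale % change:   ", ratio_before, ratio_after)
print("interval scale % change:", interval_before, interval_after)
```

The % change survives the ratio transform unchanged, while the same permissible interval transform turns a 50% change into roughly a 5.6% change, so the statistic carries no meaning on an interval scale.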
Means and Medians with Ordinal Data

Gender   Measure 1   Measure 2
M        1           1
M        2           2
F        3           3
F        4           4
F        5           5
F        6           6
M        7           107
M        8           108
M        9           109
F        10          110

Means
Measure 1: M = 5.4 < F = 5.6
Measure 2: M = 65.4 > F = 25.6

Medians
Measure 1: M = 7 > F = 5
Measure 2: M = 107 > F = 5
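The figures above can be reproduced with a short Python sketch using only the standard library. Measure 2 is an order-preserving recoding of Measure 1, so it is a permissible transform for ordinal data. The variable and function names (rows, summarize, summary) are illustrative.

```python
from statistics import mean, median

# (gender, measure 1, measure 2) rows copied from the slide.
rows = [
    ("M", 1, 1), ("M", 2, 2), ("F", 3, 3), ("F", 4, 4), ("F", 5, 5),
    ("F", 6, 6), ("M", 7, 107), ("M", 8, 108), ("M", 9, 109), ("F", 10, 110),
]

def summarize(gender, column):
    """Return (mean, median) of one measure for one gender."""
    values = [row[column] for row in rows if row[0] == gender]
    return mean(values), median(values)

summary = {(g, col): summarize(g, col) for g in ("M", "F") for col in (1, 2)}
for (g, col), (avg, med) in summary.items():
    print(f"{g}, measure {col}: mean = {avg}, median = {med}")
```

The comparison of means flips direction between the two measures, while the comparison of medians does not, which is why the median (and not the mean) is a meaningful statistic for ordinal data.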
Ratio Scales & Index Numbers
Index = 100 * (Per Capita Segment i) / (Per Capita Ave)

Age Group   Population (000s)   Sales Units (000)   Per Capita Sales   Segment Index
<25         700                 1400                2.00               70
25-34       500                 1250                2.50               88
35-44       300                 900                 3.00               105
45-54       240                 960                 4.00               140
55+         260                 1196                4.60               161
Total       2000                5706                2.85               100
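A small Python sketch of the index calculation, using the population and unit sales from the table. The dictionary layout and variable names are illustrative choices.

```python
# (population, unit sales), both in thousands, copied from the table.
segments = {
    "<25": (700, 1400),
    "25-34": (500, 1250),
    "35-44": (300, 900),
    "45-54": (240, 960),
    "55+": (260, 1196),
}

total_pop = sum(pop for pop, _ in segments.values())
total_units = sum(units for _, units in segments.values())
per_capita_avg = total_units / total_pop   # overall per capita sales

# Index = 100 * (per capita sales in segment) / (per capita average).
index = {
    age: round(100 * (units / pop) / per_capita_avg)
    for age, (pop, units) in segments.items()
}
print(index)
```

An index above 100 marks a segment that buys more per person than the population as a whole. Dividing one per capita figure by another only makes sense because sales are measured on a ratio scale.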
Today’s Agenda
• Announcements
• Southwestern Conquistador Beer Case
• Backward Market Research
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square
Cross Tabs of MBA Acceptance by Gender

A. Raw Frequencies
        Accept   Reject   Total
M       140      860      1000
F       60       740      800
Total   200      1600     1800

B. Cell Percentages
        Accept   Reject   Total
M       .078     .478     .556
F       .033     .411     .444
Total   .111     .889     1.0

C. Row Percentages
        Accept             Reject             Total
M       140/1000 = .140    860/1000 = .860    1.00
F       60/800 = .075      740/800 = .925     1.00

D. Column Percentages
        Accept            Reject
M       140/200 = .700    860/1600 = .538
F       60/200 = .300     740/1600 = .462
Total   1.00              1.00
Rule of Thumb
If a potential causal interpretation exists, make numbers add up to 100% at each level of the causal factor.
Above: it is possible that gender (row) causes or influences acceptance (column), but not that acceptance influences gender. Hence, row percentages (format C) would be desirable.
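The three ways of percentaging the same cross tab can be sketched in Python as follows. The counts come from the raw frequency table; the nested-dictionary layout and variable names are illustrative.

```python
# Observed counts: rows are gender, columns are the admission decision.
counts = {
    "M": {"Accept": 140, "Reject": 860},
    "F": {"Accept": 60, "Reject": 740},
}
columns = ("Accept", "Reject")

grand_total = sum(sum(row.values()) for row in counts.values())
col_totals = {c: sum(counts[g][c] for g in counts) for c in columns}

# Cell percentages: each cell divided by the grand total.
cell_pct = {g: {c: n / grand_total for c, n in row.items()}
            for g, row in counts.items()}
# Row percentages: each cell divided by its row total (sums to 1 per gender).
row_pct = {g: {c: n / sum(row.values()) for c, n in row.items()}
           for g, row in counts.items()}
# Column percentages: each cell divided by its column total.
col_pct = {g: {c: n / col_totals[c] for c, n in row.items()}
           for g, row in counts.items()}

for g in counts:
    print(g, {c: round(p, 3) for c, p in row_pct[g].items()})
```

Row percentages put the acceptance rates for men (.140) and women (.075) side by side, which is the comparison the rule of thumb asks for when gender is the possible causal factor.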
Hypothesis Formulation and Testing
Hypothesis: What you believe the relationship is between the measures.
– Theory
– Empirical Evidence
– Beliefs
– Experience
Here: Believe that acceptance is related to gender
Null Hypothesis: Acceptance is not related to gender
Logic of hypothesis testing: Negative Inference
The null hypothesis will be rejected by showing that a given observation would be quite improbable if the null hypothesis were true.
Want to see if we can reject the null.
Steps in Hypothesis Testing
1. State the hypothesis in Null and Alternative Form
– Ho: There is no relationship between gender and MBA acceptance
– Ha1: Gender and Acceptance are related (2-sided)
– Ha2: Fewer Women are Accepted (1-sided)
2. Choose a test statistic
3. Construct a decision rule
Chi-Square Test
Used for nominal data, to compare the observed frequency of responses to what would be “expected” under the null hypothesis.
Two types of tests
Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists between the two variables
Goodness of fit test – Compare whether the data sampled is proportionate to some standard
Chi-Square Test

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

With (r-1)*(c-1) degrees of freedom

$O_i$ = Observed number in cell i
$E_i$ = Expected number in cell i under independence
$k$ = number of cells
$r$ = number of rows
$c$ = number of columns

$E_i$ = Column Proportion * Row Proportion * total number observed
MBA Acceptance Data Contingency

A. Observed Frequencies
        Accept   Reject   Total
M       140      860      1000
F       60       740      800
Total   200      1600     1800

B. Cell Percentages
        Accept   Reject   Total
M       .078     .478     .556
F       .033     .411     .444
Total   .111     .889     1.0

C. Expected Frequencies
        Accept                  Reject
M       .111*.556*1800 = 111    .889*.556*1800 = 890
F       .111*.444*1800 = 89     .889*.444*1800 = 710
Chi-Square Test

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

With (r-1)*(c-1) degrees of freedom

$$\chi^2 = \frac{(140-111)^2}{111} + \frac{(860-890)^2}{890} + \frac{(60-89)^2}{89} + \frac{(740-710)^2}{710} = 19.30$$

So?
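As a cross-check, the contingency calculation can be sketched in Python. This version keeps the expected counts unrounded (row total * column total / n, which equals row proportion * column proportion * n), so it gives about 19.01 where the slide's hand calculation with rounded proportions and whole-number expected counts gives 19.30. The variable names are illustrative.

```python
observed = [
    [140, 860],   # M: accept, reject
    [60, 740],    # F: accept, reject
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum over all cells of (O - E)^2 / E.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_sq, 2), df)
```

The gap between 19.01 and 19.30 is rounding only. Either value is far above the usual critical values for 1 degree of freedom.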
3. Construct a decision rule
Decision Rule
1. Significance Level - α = .05. Probability of rejecting the Null Hypothesis, when it is true
2. Degrees of freedom - number of unconstrained data used in calculating a test statistic - for Chi Square it is (r-1)*(c-1), so here that would be 1. When the number of cells is larger, we need a larger test statistic to reject the null.
3. Two-tailed or One-tailed test – Significance tables are (unless otherwise specified) two-tailed tables. Chi-Sq is on pg 517
– Ha1: Gender and Acceptance are related (2-sided) Critical Value = 3.84
– Ha2: Fewer Women are Accepted (1-sided) Critical Value = 2.71
4. Decision Rule: Reject the Ho if calculated Chi-sq value (19.3) > the test critical value (3.84) for Ha1 or (2.71) for Ha2
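The decision rule can be sketched in Python without a printed table. The sketch relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(x/2)). This shortcut holds for 1 degree of freedom only. The function names are illustrative, not part of the course materials.

```python
import math

def chi_sq_tail_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def reject_null(statistic, critical_value):
    """Decision rule: reject Ho when the statistic exceeds the critical value."""
    return statistic > critical_value

chi_sq = 19.3                                   # value computed on the slide
decision_two_sided = reject_null(chi_sq, 3.84)  # Ha1
decision_one_sided = reject_null(chi_sq, 2.71)  # Ha2

tail_at_3_84 = chi_sq_tail_df1(3.84)   # close to .05
tail_at_2_71 = chi_sq_tail_df1(2.71)   # close to .10
p_value = chi_sq_tail_df1(chi_sq)

print(decision_two_sided, decision_one_sided)
print(round(tail_at_3_84, 3), round(tail_at_2_71, 3), p_value)
```

The tail area beyond 2.71 is about .10, which is why 2.71 serves as the one-sided critical value at the .05 level: half of that area belongs to the direction named in Ha2.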
Chi-Square Table
Chi-Square Test
Used for nominal data, to compare the observed frequency of responses to what would be “expected” under some specific null hypothesis.
Two types of tests
Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists
Goodness of fit test – Compare whether the data sampled is proportionate to some standard
Goodness of fit – Chi-Square
Ho: Car color preferences have not shifted
Ha: Car color preferences have shifted

Color     Data   Historic Distribution   Expected # = Prob*n
Red       680    30%                     750
Green     520    25%                     625
Black     675    25%                     625
White     625    20%                     500
Tot (n)   2500
Do we observe what we expected?
Chi-Square Test

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

With (k-1) degrees of freedom

$$\chi^2 = \frac{(680-750)^2}{750} + \frac{(520-625)^2}{625} + \frac{(675-625)^2}{625} + \frac{(625-500)^2}{500} = 59.42$$

So?
3. Construct a decision rule
Decision Rule
1. Significance Level - α = .05. Probability of rejecting the Null Hypothesis, when it is true
2. Degrees of freedom - number of unconstrained data used in calculating a test statistic - for Chi Square it is (k-1), so here that would be 3. When the number of cells is larger, we need a larger test statistic to reject the null.
3. Two-tailed or One-tailed test – Significance tables are (unless otherwise specified) two-tailed tables. Chi-Sq is on pg 517
– Ha: Preferences have changed (2-sided) Critical Value = 7.81
4. Decision Rule: Reject the Ho if calculated Chi-sq value (59.42) > the test critical value (7.81).
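The whole goodness of fit test for the car color data can be sketched in a few lines of Python. The observed counts, historic shares, and the critical value of 7.81 are taken from the slides; the variable names are illustrative.

```python
observed = {"Red": 680, "Green": 520, "Black": 675, "White": 625}
historic = {"Red": 0.30, "Green": 0.25, "Black": 0.25, "White": 0.20}

n = sum(observed.values())
# Expected count under Ho: historic probability * sample size.
expected = {color: historic[color] * n for color in observed}

chi_sq = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)
df = len(observed) - 1        # k - 1 degrees of freedom

critical_value = 7.81         # .05 level, 3 degrees of freedom (from the slide)
reject_h0 = chi_sq > critical_value
print(round(chi_sq, 2), df, reject_h0)
```

Because the statistic is far above the critical value, the data are inconsistent with the historic distribution at the .05 level, and Ho (preferences have not shifted) is rejected.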
Chi-Square Table
Recap
• Finding & Evaluating Secondary Data
• Measure Types
– Permissible transformations
– Meaningful statistics
• Index #s
• Crosstabs
– Casting right direction
• Chi-square statistic
– Contingency Test
– Goodness of Fit Test