The Chi Squared Procedure

MA 217 - Stephen Sawin

Fairfield University

August 8, 2017

The Chi Squared Procedure: Introduction

The Chi Squared Procedure is named after the χ²-distribution. χ is the Greek letter written chi, pronounced “kai,” which is the ancestor of our x. It is also called the Chi Square Procedure.

The Chi Squared Procedure tests whether two categorical variables (not necessarily binary) are associated rather than independent. Alternatively, it tests whether the proportions of the values of a categorical (not necessarily binary) variable differ among two or more populations. As such it generalizes the Two Sample Proportion Procedure, which does the same thing for binary variables and exactly two populations.

The Chi Squared Procedure: Initial Example

A group project asked for evidence that what brand of designer clothes you like affects what reality TV show you like among F.U. students. The population was Fairfield U. students, the explanatory variable was brand, and the response variable was TV show (both categorical, not binary). They stopped 50 students going into the library on Wednesday evening and asked these two questions. This is a convenience sample, favoring studious students who go to the library Wednesday evening, and students more like the questioners (unconscious bias). If you can argue either of these groups is more likely to prefer one particular brand or one particular show, you have identified sampling bias. Their results are

Brand            KUWTK   Jersey Shore   Teen Mom   Total
Louis Vuitton       13              8          6      27
Ed Hardy             1              2          1       4
A&F                  4             10          5      19
Total               18             20         12      50

Chi Squared Procedure: Review Association

Recall variables are independent if knowing the value of one gives no information on the likelihood of the other, and associated otherwise. For categorical variables check this with conditional proportions: the proportion of each value of the response variable among the individuals with a given value of the explanatory variable.

Conditional Proportions

Brand            KUWTK           Jersey Shore   Teen Mom   Total
Louis Vuitton    13/27 = 48.1%   29.6%          22.2%      100%
Ed Hardy         1/4 = 25%       50%            25%        100%
A&F              4/19 = 21.1%    52.6%          26.3%      100%
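The conditional proportions can be reproduced with a few lines of Python. This is a minimal sketch, not part of the course template: the names `counts`, `shows`, and `conditional_proportions` are made up for illustration, and the counts are the ones from the brand and TV show table.

```python
# Counts from the brand vs. reality-show table (totals left out).
shows = ["KUWTK", "Jersey Shore", "Teen Mom"]
counts = {
    "Louis Vuitton": [13, 8, 6],
    "Ed Hardy": [1, 2, 1],
    "A&F": [4, 10, 5],
}


def conditional_proportions(row):
    """Proportion of each response value within one explanatory group."""
    total = sum(row)
    return [count / total for count in row]


# One list of proportions per brand; each list sums to 1 (100%).
cond_props = {brand: conditional_proportions(row) for brand, row in counts.items()}

for brand, props in cond_props.items():
    cells = ", ".join(f"{show}: {p:.1%}" for show, p in zip(shows, props))
    print(f"{brand}: {cells}")
```

Each row is divided by its own total, so the proportions compare brands of very different sizes (27 Louis Vuitton wearers versus 4 Ed Hardy wearers) on the same scale.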

48.1% of Louis Vuitton wearers watch Keeping Up with the Kardashians.

25% of Ed Hardy wearers watch Keeping Up with the Kardashians.

21.1% of Abercrombie and Fitch wearers watch Keeping Up with the Kardashians.

So the chance of watching KUWTK differs depending on what you wear. The variables are related (in the sample).

If the conditional proportions in the first column were all equal, knowing what you wear would tell you nothing about your chances of watching Kim.

If the conditional proportions in each column are equal, the variables are independent.

Chi Squared Procedure

These two variables are associated in the sample. But are they associated in the population? If one more Ed Hardy wearer liked KUWTK, the difference with Louis Vuitton would essentially disappear. Chi Squared tells you whether apparent relationships in data are explainable by random variation or probably represent real relationships at the population level. The p-value gives the chance you’d see results at least as far from independence as the ones you got if the variables were independent. If it is small, the results are strong evidence that the variables are not independent.

Chi Squared Procedure: Process

- Enter the table of counts (not percentages) into the “data” tab of the Chi Squared Procedure template. Do not enter the totals, and rename the row and column labels if they are numerals. Delete excess rows or columns.

- Check that the number of rows and columns, the number of observations, and the row/column totals given at the top of the “expected” tab are correct.

- Read off the p-value from the top of the “calc” tab. Since there is no choice in H0 and HA, there is no need to set anything.

- Conclude this data [is / is not] significant evidence at the [α] significance level that [EXPLANATORY VARIABLE] and [RESPONSE VARIABLE] are related in [POPULATION]. Or conclude this data [is / is not] significant evidence at the [α] significance level that the proportions of [VARIABLE] are different among the populations [POPULATION 1, POPULATION 2, etc.].

Chi Squared Procedure: example again

Test at the 5% significance level whether the following sample is evidence that there is a relationship between favorite designer and favorite reality TV show among F.U. students.

Brand            KUWTK   Jersey Shore   Teen Mom   Total
Louis Vuitton       13              8          6      27
Ed Hardy             1              2          1       4
A&F                  4             10          5      19
Total               18             20         12      50

Enter the table (minus totals!) into the “data” tab of the template. Read off the p-value from the “calc” tab.

p-val = 0.395.

Since this is more than the significance level, this data is not significant evidence at the 5% level that favorite designer and favorite reality TV show are related in F.U. students.
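The decision step above can be written as a small function. This is only an illustrative sketch: `conclude` is a hypothetical helper, not part of the template, and it assumes the usual convention that the result is significant when the p-value is below the significance level.

```python
def conclude(p_value, alpha, explanatory, response, population):
    """Fill in the conclusion sentence from the p-value and significance level."""
    # Assumed convention: significant exactly when p-value < alpha.
    verdict = "is" if p_value < alpha else "is not"
    return (
        f"This data {verdict} significant evidence at the {alpha:.0%} "
        f"significance level that {explanatory} and {response} "
        f"are related in {population}."
    )


# The example from the lecture: p-value 0.395 tested at the 5% level.
print(conclude(0.395, 0.05, "favorite designer",
               "favorite reality TV show", "F.U. students"))
```

The only inputs are the p-value and α; the variable and population names are just filled into the sentence.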

Chi Squared Procedure: under the hood

- Chi Squared associates to your actual data a table of expected data. It is a table with the same row and column totals as your data but independent, so the conditional proportions along each column are equal. The expected table is at the bottom of the “expected” tab. It is what you would expect to get from the sample if H0 were true.

- Chi Squared measures how far the actual data is from being independent, which is how far it is from the expected data, by combining the differences into one number called the chi-squared statistic. The chi-squared statistic is found above the p-value on the “calc” page.

- If H0 is true, the chi-squared statistic follows a chi-squared distribution with (#rows − 1)(#columns − 1) degrees of freedom. The degrees of freedom are found on the “calc” page above the chi-squared value. The p-value comes from this distribution.
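The three steps above can be sketched in plain Python for the brand and TV show table. This is an illustration under stated assumptions, not the template's own implementation: it assumes the standard Pearson formulas (expected count = row total × column total / n, statistic = sum of (observed − expected)² / expected), and the function names are hypothetical. The p-value is the upper tail of the chi-squared distribution, computed here with a standard recurrence so that only the `math` module is needed.

```python
import math

# Observed counts, totals left out (rows: Louis Vuitton, Ed Hardy, A&F).
observed = [
    [13, 8, 6],
    [1, 2, 1],
    [4, 10, 5],
]


def expected_counts(table):
    """Table with the same row and column totals but exactly independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_statistic(table, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def chi_squared_p_value(stat, df):
    """Upper-tail probability of a chi-squared distribution with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    half = stat / 2
    if df % 2 == 0:
        k, p = 1.0, math.exp(-half)              # exact tail for df = 2
    else:
        k, p = 0.5, math.erfc(math.sqrt(half))   # exact tail for df = 1
    # Each pass adds two degrees of freedom until df is reached.
    while 2 * k < df:
        p += math.exp(k * math.log(half) - half - math.lgamma(k + 1))
        k += 1
    return p


expected = expected_counts(observed)
stat = chi_squared_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi_squared_p_value(stat, df)

print(f"chi-squared = {stat:.3f}, df = {df}, p-value = {p_value:.3f}")
# prints: chi-squared = 4.084, df = 4, p-value = 0.395
```

The result agrees with the p-value of 0.395 read off the template for this table.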

Chi Squared Procedure: Assumptions

1. SRS: The sample is an SRS, or if there are several separate samples, each is an SRS of its respective population and the samples are independent.

2. Large Pop: The population is at least 20 times the sample size, or each population is at least 20 times its respective sample size if there are separate samples.

3. Rule of 5: At least 80% of the expected cells must have at least 5 in them. This percentage is worked out for you at the bottom of the “use” tab.

Chi Squared Procedure: example assumptions

1. SRS: The sample was a convenience sample. Not Met.

2. Large Pop: n = 50, so need at least 20 × 50 = 1000 F.U. students. Met.

3. Rule of 5: The “use” tab says that only 55.6% of expected cells have at least 5. Not Met.
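The Large Pop and Rule of 5 checks for this example can be reproduced by hand. The sketch below is an assumption-laden illustration rather than the template's code: it assumes expected counts are row total × column total / n, and the helper names are hypothetical.

```python
# Observed counts, totals left out (rows: Louis Vuitton, Ed Hardy, A&F).
observed = [
    [13, 8, 6],
    [1, 2, 1],
    [4, 10, 5],
]


def expected_counts(table):
    """Expected count for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_of_cells_at_least(expected, threshold=5):
    """Fraction of expected cells whose value is at least the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e >= threshold for e in cells) / len(cells)


n = sum(sum(row) for row in observed)
required_population = 20 * n            # Large Pop: population >= 20 * sample size

share = share_of_cells_at_least(expected_counts(observed))
rule_of_5_met = share >= 0.80           # Rule of 5: at least 80% of cells

print(f"need at least {required_population} students")
print(f"{share:.1%} of expected cells are at least 5; met: {rule_of_5_met}")
```

Only 5 of the 9 expected cells reach 5 (the three Ed Hardy cells and the A&F Teen Mom cell fall short), which is where the 55.6% comes from.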

Chi Squared Procedure: another example

I surveyed my class about their gender and party affiliation and put the data on my website under Gender Partisan. Use this data to test the claim at the 1% level that gender and party affiliation are related.

H0: gender and party affiliation are independent.

HA: gender and party affiliation are related.

p-val = .878

This data is not significant evidence at the 1% level that gender and party affiliation are related in F.U. students.

Assumptions:

1. SRS: Not met. Convenience sample.

2. Large Pop: Met. More than 20 × 68 = 1360 students at Fairfield.

3. Rule of 5: Met. 83.3% are 5 or more.

Key Points

After watching this lecture you should be able to

- state the null (the two variables are independent) and alternative (the two variables are related) hypotheses for a Chi Squared Procedure hypothesis test.

- enter a table of data into the Chi Squared Procedure template correctly, get the p-value, and state your conclusion in an English sentence.

- use and understand the table of expected values and relate it to the actual data.

- check the assumptions.
