Cross-Tabulation Tables: Tables in R and Computing Chi Square

Page 1: Cross-Tabulation  Tables

Cross-Tabulation Tables

Tables in R and Computing Chi Square

Page 2: Cross-Tabulation  Tables

Kinds of Data
• Nominal or Ordinal (few categories)
• Interval if it is grouped
• Some tests ignore the ordering of the categories (e.g. Chi square)
• In R this means we are working with factors

Page 3: Cross-Tabulation  Tables

Kinds of Tables
1. One line per observation, e.g. data on Ernest Witte where each row is a single individual – table() and Rcmdr
2. One line per cell with a column of numbers representing the count for that cell – xtabs()

Page 4: Cross-Tabulation  Tables

Kinds of Tables
3. A row for each category of the first variable and a column for each category of the second variable with counts at the intersection of a row and column – Rcmdr (Enter table directly)

Page 5: Cross-Tabulation  Tables

Type 1
> EWG2[sample(rownames(EWG2), 6), c("Age", "Goods")]
             Age   Goods
159 Middle Adult  Absent
126        Child Present
075        Child  Absent
156    Old Adult Present
095        Adult  Absent
157    Old Adult  Absent

Page 6: Cross-Tabulation  Tables

Type 2
Age    Goods    Freq
Child  Absent     18
Adult  Absent     51
Child  Present    19
Adult  Present    55

Page 7: Cross-Tabulation  Tables

Type 3
       Absent  Present
Child      18       19
Adult      51       55
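
The three layouts hold the same information, so one can be rebuilt from another. The following is a minimal sketch using only the four counts shown on the Type 2 and Type 3 slides; the object names (cells, tab, obs, tab1) are made up for illustration and are not part of the original slides.

```r
# Type 2: one row per cell, with a count column (counts from the slides)
cells <- data.frame(
  Age   = factor(c("Child", "Adult", "Child", "Adult"),
                 levels = c("Child", "Adult")),
  Goods = factor(c("Absent", "Absent", "Present", "Present")),
  Freq  = c(18, 51, 19, 55)
)

# Type 2 -> Type 3: xtabs() sums Freq within each Age x Goods cell
tab <- xtabs(Freq ~ Age + Goods, data = cells)
print(tab)

# Type 3 -> Type 2: as.data.frame() on a table returns one row per cell
print(as.data.frame(tab))

# Type 2 -> Type 1: repeat each cell's row Freq times (one row per individual)
obs <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Age", "Goods")]
rownames(obs) <- NULL

# Type 1 -> Type 3: table() counts the rows that fall in each cell
tab1 <- table(obs$Age, obs$Goods, dnn = c("Age", "Goods"))
print(tab1)
```

The rebuilt Type 1 data has one row per individual but, of course, none of the other columns the real EWG2 data frame carries.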

Page 8: Cross-Tabulation  Tables

Factors in R
• Factors use integers to code for categorical data
• Each integer code is associated with a label, e.g. 1 could stand for "Absent" and 2 for "Present"
• Before R 4.0.0, R created factors from any character data columns by default; from 4.0.0 on, use stringsAsFactors=TRUE or convert with factor()
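
A short sketch of how the integer codes and labels fit together. The vector below is invented for illustration; only the "Absent"/"Present" labels come from the slides.

```r
# Character data (made-up values for illustration)
goods <- c("Absent", "Present", "Present", "Absent")

# factor() sorts the unique values to get the labels (levels)
f <- factor(goods)
levels(f)       # the labels
as.integer(f)   # the integer code stored for each observation
str(f)

# Asking data.frame() to convert character columns explicitly
df <- data.frame(Goods = goods, stringsAsFactors = TRUE)
is.factor(df$Goods)
```

Passing stringsAsFactors explicitly makes the result the same regardless of the R version's default.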

Page 9: Cross-Tabulation  Tables

Factors
• Regular factors are either equal or not equal (nominal)
• Ordered factors can be compared with >, ==, and <
• Rcmdr makes it easy to convert a numeric variable to a factor, to change the factor labels, to change the order of the factor levels, and to make the factor ordered
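
The same conversions can be done at the command line. This is a sketch rather than what Rcmdr generates; the age categories are taken from the Type 1 slide, but the ordering Child < Adult < Middle Adult < Old Adult is an assumption, and the numeric codes are invented.

```r
age <- c("Child", "Old Adult", "Adult", "Middle Adult")
age_levels <- c("Child", "Adult", "Middle Adult", "Old Adult")  # assumed order

nominal <- factor(age, levels = age_levels)                  # regular factor
ordinal <- factor(age, levels = age_levels, ordered = TRUE)  # ordered factor

nominal[1] == nominal[3]   # equality works for any factor
ordinal[1] < ordinal[2]    # < and > are only meaningful for ordered factors
# nominal[1] < nominal[2] would return NA with a warning

# Numeric codes to a labelled factor (hypothetical coding 1/2)
codes <- c(1, 2, 2, 1)
goods <- factor(codes, levels = c(1, 2), labels = c("Absent", "Present"))
goods
```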

Page 10: Cross-Tabulation  Tables

Tables in R
• Tables are basically matrices with labeling
• Transferring between data.frames and tables is possible but can lead to unexpected results
• Rcmdr does not recognize tables.
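
One example of an unexpected result, sketched with a tiny invented data set: converting a table with as.data.frame() does not give back the row-by-column layout, but one row per cell.

```r
# Made-up observations for illustration
tab <- table(Age   = c("Child", "Adult", "Adult", "Child", "Adult"),
             Goods = c("Absent", "Present", "Absent", "Absent", "Present"))

dim(tab)        # a table has dimensions like a matrix
dimnames(tab)   # plus labels for each dimension

# Long layout: columns Age, Goods, Freq (one row per cell)
long <- as.data.frame(tab)
long

# Wide layout: keeps the rows-by-columns shape of the table
wide <- as.data.frame.matrix(tab)
wide
```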

Page 11: Cross-Tabulation  Tables

Key table commands in R
• table() – create one and multi-way tables
• xtabs() – uses formulas (and optionally weights/counts)
• addmargins() – add row and column totals
• prop.table() – create table of proportions
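
A brief sketch of addmargins() and prop.table() on the 2 x 2 counts from the Type 3 slide. The table is typed in directly here so the example does not depend on the EWG2 data.

```r
# Counts from the Type 3 slide; matrix() fills column by column
tab <- as.table(matrix(c(18, 51, 19, 55), nrow = 2,
                       dimnames = list(Age   = c("Child", "Adult"),
                                       Goods = c("Absent", "Present"))))

addmargins(tab)      # adds a "Sum" row and a "Sum" column
prop.table(tab)      # each cell divided by the grand total
prop.table(tab, 1)   # row proportions (each row sums to 1)
prop.table(tab, 2)   # column proportions (each column sums to 1)
```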

Page 12: Cross-Tabulation  Tables

Key commands (cont.)
• ftable() – flatten a multidimensional table – but does not work with xtable()
• print(xtable(), type="html") – print an html version of the table.

Page 13: Cross-Tabulation  Tables

# Use Rcmdr to load ErnestWitte and create EWG2
# EWG2 <- subset(ErnestWitte, subset=Group==2)
table(EWG2$Age)
EWG2$Age <- factor(EWG2$Age)
Table1 <- table(EWG2$Age, EWG2$Goods, dnn=c("Age", "Goods"))
Table1
str(Table1)
Table2 <- xtabs(~Age+Goods, data=EWG2)
Table2
str(Table2)
DF1 <- data.frame(Table1)
DF1
names(DF1) <- c("Age", "Goods", "Freq")
DF1

Page 14: Cross-Tabulation  Tables

Table3 <- xtabs(Freq~Age+Goods, data=DF1)
Table3
addmargins(Table1)
prop.table(Table1)
prop.table(Table1, 1)
prop.table(addmargins(Table1, 1), 1)

# Included in Rcmdr
rowPercents(Table1)
colPercents(Table1)

Page 15: Cross-Tabulation  Tables

Table4 <- xtabs(~Adult+Goods+Pathology, data=EWG2)
Table4
str(Table4)
ftable(Table4, row.vars=c(1, 2), col.vars=3)
ftable(Table4, row.vars=c(3, 2), col.vars=1)

# tohtml() puts html code for table into Windows
# clipboard or a file
# named "clipboard" in Mac OsX or Linux
library(xtable)   # provides xtable()
tohtml <- function(x) print(xtable(x), type="html", file="clipboard")
tohtml(Table1)
# Paste clipboard into Microsoft Excel

Page 16: Cross-Tabulation  Tables

Null Hypothesis
• The usual null hypothesis is that the row and column variables are independent of one another – knowing one does not help us predict the other
• If the null hypothesis is false, the cell values will deviate from expected values

Page 17: Cross-Tabulation  Tables

E.g. Coin Flipping
• If I flip a coin twice, the chance that the first flip comes up heads is .5
• The chance that the second flip comes up heads is .5 as well
• But what if the chance of getting a head changed depending on the first toss? The probabilities would be conditional
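
The coin example can be written out as two small probability tables. This is only an illustrative sketch: the conditional probabilities 0.8 and 0.2 for the dependent coin are invented, not taken from the slides.

```r
p_first  <- c(H = 0.5, T = 0.5)   # first flip
p_second <- c(H = 0.5, T = 0.5)   # second flip

# Independent flips: each joint probability is the product of the margins
joint_indep <- outer(p_first, p_second)
joint_indep

# Hypothetical dependent coin: rows give P(second | first)
cond <- rbind(H = c(H = 0.8, T = 0.2),
              T = c(H = 0.2, T = 0.8))

# Joint probability = P(first) * P(second | first); row i is scaled by p_first[i]
joint_dep <- cond * p_first
joint_dep

colSums(joint_dep)   # the second flip is still heads half of the time overall
```

In the dependent case the margins are unchanged, but the cells no longer equal the product of the margins, which is exactly the kind of deviation the chi-square test looks for.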

Page 18: Cross-Tabulation  Tables

Expected Probabilities
• Under the null hypothesis the expected value for a cell is
  – (Row sum * Column sum)/Total count
• Deviations of the actual counts from the expected values are measured as
  – (Observed – Expected)^2/Expected
• Summing the deviations over all cells gives us a statistic that approximately follows a chi-square distribution with (rows – 1) * (columns – 1) degrees of freedom
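
The calculation can be followed step by step in R. The sketch below uses the 2 x 2 counts from the Type 3 slide and compares the hand computation with chisq.test(); correct=FALSE is set because chisq.test() otherwise applies a continuity correction to 2 x 2 tables, which the formula above does not include.

```r
# Observed counts from the Type 3 slide
observed <- matrix(c(18, 51, 19, 55), nrow = 2,
                   dimnames = list(Age   = c("Child", "Adult"),
                                   Goods = c("Absent", "Present")))

# Expected count = (row sum * column sum) / total count
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# Sum of (Observed - Expected)^2 / Expected over all cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom and upper-tail p-value
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

# Same test via chisq.test(), without the continuity correction
check <- chisq.test(observed, correct = FALSE)
c(by_hand = chi_sq, chisq.test = unname(check$statistic))
```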

Page 19: Cross-Tabulation  Tables

Chi-Square Test
• Compares observed counts to expected counts based on independence
• Rcmdr constructs the tables and computes the test, BUT deletes the results

Page 20: Cross-Tabulation  Tables

Two Options
• chisq.test()
  – Saves results in multiple tables
  – Performs Chi Square and simulation for p value
• CrossTable() and crosstab() in descr
  – SAS, SPSS style output with xtable()
  – More formatting options
  – Mosaic plot with crosstab()
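
To see what chisq.test() saves, the result can be stored and inspected. This sketch uses the counts from the Type 3 slide rather than the EWG2 data; set.seed(1) is an arbitrary choice so the simulated p-value is reproducible.

```r
tab <- as.table(matrix(c(18, 51, 19, 55), nrow = 2,
                       dimnames = list(Age   = c("Child", "Adult"),
                                       Goods = c("Absent", "Present"))))

res <- chisq.test(tab, correct = FALSE)
names(res)       # components stored in the result
res$observed     # observed counts
res$expected     # expected counts under independence
res$residuals    # Pearson residuals: (observed - expected) / sqrt(expected)

# Simulated p-value: no chi-square distribution is used, so df is NA
set.seed(1)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim$parameter
sim$p.value
```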

Page 21: Cross-Tabulation  Tables

Results <- chisq.test(xtabs(~Age+Pathology, data=EWG2), simulate.p.value=TRUE)
Results

        Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)

data:  xtabs(~Age + Pathology, data = EWG2)
X-squared = 31.2876, df = NA, p-value = 0.0004998

str(Results)
Results$expected
Results$residuals
fisher.test(xtabs(~Sex+Goods, data=EWG2))

Page 22: Cross-Tabulation  Tables

library(descr)   # provides CrossTable() and crosstab()
with(EWG2, CrossTable(Age, Pathology))
with(EWG2, CrossTable(Age, Pathology, prop.c=FALSE, prop.t=FALSE))
with(EWG2, crosstab(Age, Pathology))
with(EWG2, crosstab(Age, Pathology, expected=TRUE, resid=TRUE))
with(EWG2, crosstab(Sex, Goods))

Page 23: Cross-Tabulation  Tables
Page 24: Cross-Tabulation  Tables