Categorical Data Analysis Selected Solutions by Agresti


CATEGORICAL DATA ANALYSIS, 3rd edition

    Solutions to Selected Exercises

    Alan Agresti

Version August 3, 2012, © Alan Agresti 2012

This file contains solutions and hints to solutions for some of the exercises in Categorical Data Analysis, third edition, by Alan Agresti (John Wiley & Sons, 2012). The solutions given are partly those that are also available at the website www.stat.ufl.edu/~aa/cda2/cda.html for many of the odd-numbered exercises in the second edition of the book (some of which are now even-numbered). I intend to expand the document with additional solutions, when I have time.

Please report errors in these solutions to the author (Department of Statistics, University of Florida, Gainesville, Florida 32611-8545, e-mail [email protected]), so they can be corrected in future revisions of this site. The author regrets that he cannot provide students with more detailed solutions or with solutions of other exercises not in this file.

    Chapter 1

1. a. nominal, b. ordinal, c. interval, d. nominal, e. ordinal, f. nominal.

3. π varies from batch to batch, so the counts come from a mixture of binomials rather than a single bin(n, π). Var(Y) = E[Var(Y | π)] + Var[E(Y | π)] > E[Var(Y | π)] = E[nπ(1 − π)].

7. a. ℓ(π) = π^20, so it is not close to quadratic.

b. π̂ = 1.0. Wald statistic z = (1.0 − 0.5)/√[1.0(0)/20] = ∞. Wald CI is 1.0 ± 1.96√[1.0(0)/20] = 1.0 ± 0.0, or (1.0, 1.0). These are not sensible.
c. z = (1.0 − 0.5)/√[0.5(0.5)/20] = 4.47, P < 0.0001. Score CI is (0.839, 1.000).
d. Test statistic 2(20) log(20/10) = 27.7, df = 1. The CI is (exp(−1.96²/40), 1) = (0.908, 1.0).
e. P-value = 2(0.5)^20 = 0.00000191.
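As a numerical check (not part of the original solutions), the score interval, likelihood-ratio statistic, and exact P-value above can be reproduced with the Python standard library; the constants y = 20 successes in n = 20 trials come from the exercise.

```python
import math

n, y = 20, 20          # 20 successes in 20 trials (Exercise 7)
p_hat = y / n          # ML estimate = 1.0
z = 1.96               # approximate 97.5th normal percentile

# Score (Wilson) CI: endpoints solve (p_hat - p0)^2 = z^2 p0(1 - p0)/n
center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
half = (z / (1 + z**2 / n)) * math.sqrt(
    p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
lo, hi = center - half, center + half      # about (0.839, 1.000)

# Likelihood-ratio statistic for H0: pi = 0.5 when p_hat = 1
lr_stat = 2 * y * math.log(p_hat / 0.5)    # 2(20) log 2 = 27.7

# Exact two-sided P-value for H0: pi = 0.5 (part e)
p_exact = 2 * 0.5**20                      # about 0.0000019

print(round(lo, 3), round(hi, 3), round(lr_stat, 1), p_exact)
```

With π̂ = 1, the upper score endpoint is exactly 1, matching the interval (0.839, 1.000) reported above.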

9. The chi-squared goodness-of-fit test of the null hypothesis that the binomial proportions equal (0.75, 0.25) has expected frequencies (827.25, 275.75), and X² = 3.46 based on df = 1. The P-value is 0.063, giving moderate evidence against the null.

10. The sample mean is 0.61. Fitted probabilities for the truncated distribution are 0.543, 0.332, 0.102, 0.021, 0.003. The estimated expected frequencies are 108.5, 66.4, 20.3, 4.1, and 0.6, and the Pearson X² = 0.7 with df = 3 (0.3 with df = 2 if we truncate at 3 and above). The fit seems adequate.

11. With the binomial test the smallest possible P-value, from y = 0 or y = 5, is 2(1/2)^5 = 1/16. Since this exceeds 0.05, it is impossible to reject H0, and thus P(Type I error) = 0. With the large-sample score test, y = 0 and y = 5 are the only outcomes to give P ≤ 0.05 (e.g., with y = 5, z = (1.0 − 0.5)/√[0.5(0.5)/5] = 2.24 and P = 0.025). Thus, for that test, P(Type I error) = P(Y = 0) + P(Y = 5) = 1/16.

12. a. No outcome can give P ≤ 0.05, and hence one never rejects H0.
b. When T = 2, mid P-value = 0.04 and one rejects H0. Thus, P(Type I error) = P(T = 2) = 0.08.
c. P-values of the two tests are 0.04 and 0.02; P(Type I error) = P(T = 2) = 0.04 with both tests.
d. P(Type I error) = E[P(Type I error | T)] = (5/8)(0.08) = 0.05. Randomized tests are not sensible for practical application.

16. Var(π̂) = π(1 − π)/n decreases as π moves toward 0 or 1 from 0.5.

17. a. Var(Y) = nπ(1 − π), binomial.
b. Var(Y) = Σi Var(Yi) + 2 Σi<j Cov(Yi, Yj) > nπ(1 − π) when the pairwise covariances are positive.
c. Var(Y) = E[Var(Y | π)] + Var[E(Y | π)] = E[nπ(1 − π)] + Var(nπ) = nρ − nE(π²) + [n²E(π²) − n²ρ²] = nρ + (n² − n)[E(π²) − ρ²] − nρ² = nρ(1 − ρ) + (n² − n)Var(π) > nρ(1 − ρ), where ρ = E(π).

18. This is the binomial probability of y successes and k − 1 failures in y + k − 1 trials times the probability of a failure at the next trial.

19. Using results shown in Sec. 16.1, Cov(nj, nk)/√[Var(nj)Var(nk)] = −nπjπk/√[nπj(1 − πj)nπk(1 − πk)]. When c = 2, π1 = 1 − π2 and the correlation simplifies to −1.

20. a. For the binomial, m(t) = E(e^tY) = Σy (n choose y)(πe^t)^y (1 − π)^(n−y) = (1 − π + πe^t)^n, so m′(0) = nπ.

21. to = −2 log[(max. prob. under H0)/(max. prob. under Ha)], so (prob. under H0)/(prob. under Ha) = exp(−to/2).

22. a. ℓ(μ) = exp(−nμ)μ^(Σ yi), so L(μ) = −nμ + (Σ yi) log μ, and ∂L/∂μ = −n + (Σ yi)/μ = 0 yields μ̂ = (Σ yi)/n = ȳ.

b. (i) zw = (ȳ − μ0)/√(ȳ/n), (ii) zs = (ȳ − μ0)/√(μ0/n), (iii) LR statistic = 2[nμ0 − (Σ yi) log μ0 − nȳ + (Σ yi) log ȳ].
c. (i) ȳ ± zα/2 √(ȳ/n), (ii) all μ0 such that |zs| ≤ zα/2, (iii) all μ0 such that the LR statistic ≤ χ²1(α).

23. Conditional on n = y1 + y2, y1 has a bin(n, π) distribution with π = μ1/(μ1 + μ2), which is 0.5 under H0. The large-sample score test uses z = (y1/n − 0.5)/√[0.5(0.5)/n]. If (ℓ, u) denotes a CI for π (e.g., the score CI), then the CI for π/(1 − π) = μ1/μ2 is [ℓ/(1 − ℓ), u/(1 − u)].

25. g(π) = π(1 − π)/n is a concave function of π, so if π is random, g(Eπ) ≥ Eg(π) by Jensen's inequality. Here π̃ is the expected value of π for a distribution putting probability n/(n + z²α/2) at π̂ and probability z²α/2/(n + z²α/2) at 1/2.

26. a. The likelihood-ratio (LR) CI is the set of π0 for testing H0: π = π0 such that LR statistic = −2 log[(1 − π0)^n/(1 − π̂)^n] ≤ z²α/2, with π̂ = 0.0. Solving for π0, −n log(1 − π0) ≤ z²α/2/2, or (1 − π0) ≥ exp(−z²α/2/2n), or π0 ≤ 1 − exp(−z²α/2/2n). Using exp(−x) = 1 − x + ⋯ for small x, the upper bound is roughly 1 − (1 − z²0.025/2n) = z²0.025/2n = 1.96²/2n ≈ 2²/2n = 2/n.
b. Solve (π0 − π̂)/√[π0(1 − π0)/n] = ±zα/2 for π0.

27. If we form the P-value using the right tail, then for observed outcome j, mid P-value = πj/2 + πj+1 + ⋯. Thus, E(mid P-value) = Σj πj(πj/2 + πj+1 + ⋯) = (Σj πj)²/2 = 1/2.
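The identity E(mid P-value) = 1/2 can be illustrated numerically for a hypothetical discrete null, here bin(5, 0.5) with outcomes ordered by y (a check, not part of the original solutions):

```python
import math

# E(mid P-value) = 1/2 under the null for any discrete distribution.
n, pi = 5, 0.5
p = [math.comb(n, y) * pi**y * (1 - pi)**(n - y) for y in range(n + 1)]

# Right-tail mid P-value for outcome j: p_j/2 + p_{j+1} + ...
mid_p = [p[j] / 2 + sum(p[j + 1:]) for j in range(n + 1)]
expected = sum(p[j] * mid_p[j] for j in range(n + 1))
print(expected)   # 0.5, up to floating-point rounding
```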

28. The right-tail mid P-value equals P(T > to) + (1/2)p(to) = 1 − P(T ≤ to) + (1/2)p(to) = 1 − Fmid(to).

29. a. The kernel of the log likelihood is L(π) = n1 log π² + n2 log[2π(1 − π)] + n3 log(1 − π)². Take ∂L/∂π = 2n1/π + n2/π − n2/(1 − π) − 2n3/(1 − π) = 0 and solve for π̂.
b. Find the expectation using E(n1) = nπ², etc. Then, the asymptotic variance is the inverse information = π(1 − π)/2n, and thus the estimated SE = √[π̂(1 − π̂)/2n].
c. The estimated expected counts are [nπ̂², 2nπ̂(1 − π̂), n(1 − π̂)²]. Compare these to the observed counts (n1, n2, n3) using X² or G², with df = (3 − 1) − 1 = 1, since 1 parameter is estimated.

30. Since ∂²L/∂π² = −(2n11/π²) − n12/π² − n12/(1 − π)² − n22/(1 − π)², the information is its negative expected value, which is 2nπ²/π² + nπ(1 − π)/π² + nπ(1 − π)/(1 − π)² + n(1 − π)/(1 − π)², which simplifies to n(1 + π)/π(1 − π). The asymptotic standard error is the square root of the inverse information, or √[π̂(1 − π̂)/n(1 + π̂)].

32. c. Let π̂ = n1/n and 1 − π̂ = n2/n, and denote the null probabilities in the two categories by π0 and (1 − π0). Then X² = (n1 − nπ0)²/nπ0 + (n2 − n(1 − π0))²/n(1 − π0) = n[(π̂ − π0)²(1 − π0) + ((1 − π̂) − (1 − π0))²π0]/π0(1 − π0), which equals (π̂ − π0)²/[π0(1 − π0)/n] = z²S.

33. Let X be a random variable that equals πj0/π̂j with probability π̂j. By Jensen's inequality, since the negative log function is convex, E(− log X) ≥ − log(EX). Hence, E(− log X) = Σj π̂j log(π̂j/πj0) ≥ − log[Σj π̂j(πj0/π̂j)] = − log(Σj πj0) = − log(1) = 0. Thus G² = 2n E(− log X) ≥ 0.

35. If Y1 is χ² with df = ν1 and if Y2 is an independent χ² with df = ν2, then the mgf of Y1 + Y2 is the product of the mgfs, which is m(t) = (1 − 2t)^(−(ν1+ν2)/2), which is the mgf of a χ² with df = ν1 + ν2.

40. a. The Bayes estimator is (n1 + α)/(n + α + β), in which α > 0, β > 0. No proper prior leads to the ML estimate, n1/n. The ML estimator is the limit of Bayes estimators as α and β both converge to 0.
b. This happens with the improper prior proportional to [π1(1 − π1)]^(−1), which we get from the beta density by taking the improper settings α = β = 0.

Chapter 2

3. P(−|C) = 1/4. It is unclear from the wording, but presumably this means that P(C|+) = 2/3. Sensitivity = P(+|C) = 1 − P(−|C) = 3/4. Specificity = P(−|not C) = 1 − P(+|not C) can't be determined from the information given.

5. a. Relative risk.
b. (i) π1 = 0.55π2, so π1/π2 = 0.55. (ii) 1/0.55 = 1.82.

11. a. (0.847/0.153)/(0.906/0.094) = 0.574.
b. This is the interpretation for the relative risk, not the odds ratio. The actual relative risk = 0.847/0.906 = 0.935; i.e., 60% should have been 93.5%.

12. a. Relative risk: lung cancer, 14.00; heart disease, 1.62. (Cigarette smoking seems more highly associated with lung cancer.) Difference of proportions: lung cancer, 0.00130; heart disease, 0.00256. (Cigarette smoking seems more highly associated with heart disease.) Odds ratio: lung cancer, 14.02; heart disease, 1.62. E.g., the odds of dying from lung cancer for smokers are estimated to be 14.02 times those for nonsmokers. (Note the similarity to the relative risks.)
b. The difference of proportions describes excess deaths due to smoking. That is, if N = number of smokers in the population, we predict there would be 0.00130N fewer deaths per year from lung cancer if they had never smoked, and 0.00256N fewer deaths per year from heart disease. Thus elimination of cigarette smoking would have its biggest impact on deaths due to heart disease.

15. Marginal odds ratio = 1.84, but most conditional odds ratios are close to 1.0 except in Department A, where the odds ratio = 0.35. Note that males tend to apply in greater numbers to Departments A and B, in which admissions rates are relatively high, and females tend to apply in greater numbers to Departments C, D, E, F, in which admissions rates are relatively low. This results in the marginal association whereby the odds of admission for males are 84% higher than those for females.

17. a. 0.18 for males and 0.32 for females; e.g., for male children, the odds that a white was a murder victim were 0.18 times the odds that a nonwhite was a murder victim.
b. 0.21.

18. The age distribution is shifted toward older ages in Maine. Death rates are higher at older ages, and Maine tends to have an older population than South Carolina.

19. Kentucky: Counts are (31, 360 / 7, 50) when the victim was white and (0, 18 / 2, 106) when the victim was black. Conditional odds ratios are 0.62 and 0.0, whereas the marginal odds ratio is 1.42. Simpson's paradox occurs. Whites tend to kill whites and blacks tend to kill blacks, and killing a white is more likely to result in the death penalty.
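The conditional and marginal odds ratios for these counts can be checked directly (a sketch; the tables are the Kentucky counts quoted above, rows = defendant's race, columns = death penalty yes/no):

```python
# Simpson's paradox check for the Exercise 19 death-penalty counts.
white_victim = [[31, 360], [7, 50]]    # rows: white, black defendants
black_victim = [[0, 18], [2, 106]]

def odds_ratio(t):
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

# Marginal table: sum the two partial tables cell by cell.
marginal = [[31 + 0, 360 + 18], [7 + 2, 50 + 106]]
print(round(odds_ratio(white_victim), 2),   # 0.62
      round(odds_ratio(black_victim), 2),   # 0.0
      round(odds_ratio(marginal), 2))       # 1.42
```

Both conditional odds ratios are below 1 while the marginal odds ratio exceeds 1, which is the paradox.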

21. Yes, this would be an occurrence of Simpson's paradox. One could display the data as a 2 × 2 × K table, where rows = (Smith, Jones), columns = (hit, out) response for each time at bat, layers = (year 1, . . . , year K). This could happen if Jones tends to have relatively more observations (i.e., at bats) for years in which his average is high.

25. a. Let pos denote a positive diagnosis and dis denote that the subject has the disease.

P(dis|pos) = P(pos|dis)P(dis)/[P(pos|dis)P(dis) + P(pos|no dis)P(no dis)]

b. 0.95(0.005)/[0.95(0.005) + 0.05(0.995)] = 0.087. The joint probabilities are:

             Test +    Test −    Total
Reality +    0.00475   0.00025   0.005
Reality −    0.04975   0.94525   0.995

Nearly all (99.5%) subjects are not HIV+. The 5% errors for them swamp (in frequency) the 95% correct cases for subjects who truly are HIV+. The odds ratio = 361; i.e., the odds of a positive test result are 361 times higher for those who are HIV+ than for those not HIV+.
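A quick numerical check of the Bayes calculation and the odds ratio, using the sensitivity, specificity, and prevalence stated in the exercise (a sketch, not part of the original solutions):

```python
# Bayes' theorem check for the diagnostic-test numbers in Exercise 25.
sens, spec, prev = 0.95, 0.95, 0.005
p_pos = sens * prev + (1 - spec) * (1 - prev)
ppv = sens * prev / p_pos                  # P(disease | positive) = 0.087

# Joint probabilities of the 2 x 2 table, then its odds ratio.
a, b = sens * prev, (1 - sens) * prev              # reality +: test +, test -
c, d = (1 - spec) * (1 - prev), spec * (1 - prev)  # reality -: test +, test -
odds_ratio = (a * d) / (b * c)             # about 361
print(round(ppv, 3), round(odds_ratio))
```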

27. a. The numerator is the extra proportion that got the disease above and beyond what the proportion would be if no one had been exposed (which is P(D | not E)).
b. Use Bayes' Theorem and the result that RR = P(D | E)/P(D | not E).

29. a. For instance, if the first row becomes the first column and the second row becomes the second column, the table entries become

n11 n21
n12 n22

The odds ratio is the same as before. The difference of proportions and relative risk are only invariant to multiplication of cell counts within rows by a constant.

30. Suppose π1 > π2. Then 1 − π1 < 1 − π2, and OR = [π1/(1 − π1)]/[π2/(1 − π2)] > π1/π2 > 1. If π1 < π2, then 1 − π1 > 1 − π2, and OR = [π1/(1 − π1)]/[π2/(1 − π2)] < π1/π2 < 1.

31. This simply states that ordinary independence for a two-way table holds in each partial table.

36. This condition is equivalent to the conditional distributions of Y in the first I − 1 rows being identical to the one in row I. Equality of the I conditional distributions is equivalent to independence.

37. Use an argument similar to that in Sec. 1.2.5. Since Yi+ is a sum of independent Poissons, it is Poisson. In the denominator for the calculation of the conditional probability, the distribution of {Yi+} is a product of Poissons with means {μi+}. The multinomial distributions are obtained by identifying πj|i with μij/μi+.

40. If in each row the maximum probability falls in the same column, say column 1, then E[V(Y | X)] = Σi πi+(1 − π1|i) = 1 − π+1 = 1 − max{π+j}, so λ = 0. Since the maximum being the same in each row does not imply independence, λ = 0 can occur even when the variables are not independent.

    Chapter 3

14. b. Compare rows 1 and 2 (G² = 0.76, df = 1, no evidence of difference), rows 3 and 4 (G² = 0.02, df = 1, no evidence of difference), and the 3 × 2 table consisting of rows 1 and 2 combined, rows 3 and 4 combined, and row 5 (G² = 95.74, df = 2, strong evidence of differences).

16. a. X² = 8.9, df = 6, P = 0.18; the test treats the variables as nominal and ignores the information on the ordering.
b. Residuals suggest a tendency for aspirations to be higher when family income is higher.
c. The ordinal test gives M² = 4.75, df = 1, P = 0.03, and much stronger evidence of an association.

18. a. It is plausible that control of cancer is independent of treatment used. (i) P-value is the hypergeometric probability P(n11 = 21 or 22 or 23) = 0.3808, (ii) P-value = 0.638 is the sum of probabilities that are no greater than the probability (0.2755) of the observed table.
b. 0.3808 − 0.5(0.2755) = 0.243. With this type of P-value, the actual error probability tends to be closer to the nominal value, the sum of the two one-sided P-values is 1, and the null expected value is 0.50; however, it does not guarantee that the actual error probability is no greater than the nominal value.

25. For proportions π and 1 − π in the two categories for a given sample, the contribution to the asymptotic variance is [1/nπ + 1/n(1 − π)]. The derivative of this with respect to π is −1/nπ² + 1/n(1 − π)², which is less than 0 for π < 0.50 and greater than 0 for π > 0.50. Thus, the minimum is with proportions (0.5, 0.5) in the two categories.

29. Use formula (3.9), noting that the partial derivative of the measure with respect to πi is just πi/2.

30. For any reasonable significance test, whenever H0 is false, the test statistic tends to be larger and the P-value tends to be smaller as the sample size increases. Even if H0 is just slightly false, the P-value will be small if the sample size is large enough. Most statisticians feel we learn more by estimating parameters using confidence intervals than by conducting significance tests.

31. a. Note π = π1+ = π+1.
b. The log likelihood has kernel

L = n11 log π² + (n12 + n21) log[π(1 − π)] + n22 log(1 − π)²

∂L/∂π = 2n11/π + (n12 + n21)/π − (n12 + n21)/(1 − π) − 2n22/(1 − π) = 0 gives π̂ = (2n11 + n12 + n21)/2(n11 + n12 + n21 + n22) = (n1+ + n+1)/2n = (p1+ + p+1)/2.
c,d. Calculate estimated expected frequencies (e.g., μ̂11 = nπ̂²) and obtain Pearson X², which is 2.8. We estimated one parameter, so df = (4 − 1) − 1 = 2 (one higher than in testing independence without assuming identical marginal distributions). The free throws are plausibly independent and identically distributed.

32. By expanding the square and simplifying, one can obtain the alternative formula for X²,

X² = n[Σi Σj (n²ij/ni+n+j) − 1].

Since nij ≤ ni+, the double-sum term cannot exceed Σi Σj nij/n+j = J, and since nij ≤ n+j, the double sum cannot exceed Σi Σj nij/ni+ = I. It follows that X² cannot exceed n[min(I, J) − 1] = n min(I − 1, J − 1).

35. Because G² for the full table = G² for the collapsed table + G² for the table consisting of the two rows that are combined.
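The alternative formula and the bound can be illustrated on a small table (the 2 × 3 counts below are invented for illustration; the identity X² = n[ΣΣ n²ij/(ni+ n+j) − 1] is checked against the standard form):

```python
# Check of the X^2 identity and bound from Exercise 32 on a hypothetical table.
table = [[10, 20, 5], [30, 5, 10]]
n = sum(sum(row) for row in table)
row = [sum(r) for r in table]
col = [sum(table[i][j] for i in range(2)) for j in range(3)]

# Alternative formula: X^2 = n * [sum_ij n_ij^2/(n_i+ n_+j) - 1]
x2 = n * (sum(table[i][j]**2 / (row[i] * col[j])
              for i in range(2) for j in range(3)) - 1)

# Standard (observed - fitted)^2/fitted form, for comparison.
x2_std = sum((table[i][j] - row[i] * col[j] / n)**2 / (row[i] * col[j] / n)
             for i in range(2) for j in range(3))
print(round(x2, 3), round(x2_std, 3), x2 <= n * min(1, 2))  # bound n*min(I-1, J-1)
```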

37. Σj p+j rj = Σj p+j [Σk ⋯]

b. (0.5893/0.0650)² = 82.15. Need the log-likelihood value when β = 0.
c. Multiply the standard errors by √(535.896/171) = 1.77. There is still very strong evidence of a positive weight effect.

7. a. log(μ̂) = 1.6094 + 0.5878x. Since β = log(μB/μA), exp(β̂) = μ̂B/μ̂A = 1.80; i.e., the mean is predicted to be 80% higher for treatment B. (In fact, this estimate is simply the ratio of sample means.)
b. The Wald test gives z = 0.588/0.176 = 3.33, z² = 11.1 (df = 1), P < 0.001. The likelihood-ratio statistic equals 27.86 − 16.27 = 11.6 with df = 1, P < 0.001. There is strong evidence against H0 and of a higher defect rate for treatment B.
c. Exponentiate the 95% CI for β of 0.588 ± 1.96(0.176) to get the Wald CI exp(0.242, 0.934) = (1.27, 2.54).
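The Wald quantities in parts a–c can be reproduced from the reported estimate and standard error (a sketch using the rounded values 0.588 and 0.176 quoted above):

```python
import math

# Poisson loglinear treatment comparison (Exercise 7): beta-hat and its SE.
beta, se = 0.588, 0.176
ratio = math.exp(beta)                         # about 1.80 (mean B / mean A)
z = beta / se                                  # Wald z, about 3.3
ci = (math.exp(beta - 1.96 * se),
      math.exp(beta + 1.96 * se))              # about (1.27, 2.54)
print(round(ratio, 2), round(z, 2), round(ci[0], 2), round(ci[1], 2))
```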

d. The normal approximation to the binomial yields z = (50 − 70)/√[140(0.5)(0.5)] = −3.4 and very strong evidence against H0.

9. Since exp(0.192) = 1.21, a 1 cm increase in width corresponds to an estimated increase of 21% in the expected number of satellites. For estimated mean μ̂, the estimated variance is μ̂ + 1.11μ̂², considerably larger than the Poisson variance unless μ̂ is very small. The relatively small SE for k̂⁻¹ gives strong evidence that this model fits better than the Poisson model and that it is necessary to allow for overdispersion. The much larger SE of β̂ in this model also reflects the overdispersion.

11. a. The ratio of the rate for smokers to nonsmokers decreases markedly as age increases.
b. G² = 12.1, df = 4.
c. For age scores (1, 2, 3, 4, 5), G² = 1.5, df = 3. The interaction term = −0.309, with std. error = 0.097; the estimated ratio of rates is multiplied by exp(−0.309) = 0.73 for each successive increase of one age category.

15. The link function determines the function of the mean that is predicted by the linear predictor in a GLM. The identity link models the binomial probability directly as a linear function of the predictors. It is not often used, because probabilities must fall between 0 and 1, whereas straight lines provide predictions that can be any real number. When the probability is near 0 or 1 for some predictor values or when there are several predictors, it is not unusual to get estimated probabilities below 0 or above 1. With the logit link, any real number predicted value for the linear model corresponds to a probability between 0 and 1. Similarly, Poisson means must be nonnegative. If we use an identity link, we could get negative predicted values. With the log link, a predicted negative log mean still corresponds to a positive mean.

16. With a single predictor, log[π(x)] = α + βx. Since log[π(x + 1)] − log[π(x)] = β, the relative risk is π(x + 1)/π(x) = exp(β). A restriction of the model is that to ensure 0 < π(x) < 1, it is necessary that α + βx < 0.

17. a. ∂π(x)/∂x = βe^(α+βx)/[1 + e^(α+βx)]², which is positive if β > 0. Note that the general logistic cdf on p. 121 has mean μ and standard deviation τπ/√3. Writing α + βx in the form (x − (−α/β))/(1/β), we identify μ with −α/β and τ with 1/β, so the standard deviation is π/β√3 when β > 0.

19. For j = 1, xij = 0 for group B, and for observations in group A, ∂μA/∂ηi is constant, so the likelihood equation sets ΣA (yi − μA)/μA = 0, so μ̂A = ȳA. For j = 0, xij = 1 and the likelihood equation gives

ΣA [(yi − μA)/μA](∂μA/∂ηi) + ΣB [(yi − μB)/μB](∂μB/∂ηi) = 0.

The first sum is 0 from the first likelihood equation, and for observations in group B, ∂μB/∂ηi is constant, so the second sum sets ΣB (yi − μB)/μB = 0, so μ̂B = ȳB.

21. Letting f = F′, wi = [f(Σj βjxij)]²/[F(Σj βjxij)(1 − F(Σj βjxij))/ni].

22. a. Since F is symmetric, F(0) = 0.5. Setting α + βx = 0 gives x = −α/β.
b. The derivative of F(α + βx) at x = −α/β is βf(α + β(−α/β)) = βf(0). The logistic pdf has f(x) = e^x/(1 + e^x)², which equals 0.25 at x = 0; the standard normal pdf equals 1/√(2π) at x = 0.
c. F(α + βx) = F((x − (−α/β))/(1/β)).

23. a. Cauchy. You can see this by taking the derivative and noting it has the form of a Cauchy density. The GLM with Bernoulli random component, systematic component α + βx, and link function tan[π(π(x) − 1/2)] (where the first π = 3.14...) would work well when the rate of convergence of π(x) to 0 and 1 is slower than with the logit or probit link (recall that the Cauchy density has thick tails compared to the logistic and normal densities).

26. a. With the identity link the GLM likelihood equations simplify to, for each i, Σj=1..ni (yij − μi)/μi = 0, from which μ̂i = Σj yij/ni.
b. Deviance = 2 Σi Σj yij log(yij/ȳi).

31. For the log likelihood L(μ) = −nμ + (Σi yi) log μ, the score is u = (Σi yi − nμ)/μ, H = −(Σi yi)/μ², and the information is n/μ. It follows that the adjustment to μ^(t) in Fisher scoring is [μ^(t)/n][(Σi yi − nμ^(t))/μ^(t)] = ȳ − μ^(t), and hence μ^(t+1) = ȳ. For Newton–Raphson, the adjustment to μ^(t) is μ^(t) − (μ^(t))²/ȳ, so that μ^(t+1) = 2μ^(t) − (μ^(t))²/ȳ. Note that if μ^(t) = ȳ, then also μ^(t+1) = ȳ.
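The contrast between the two algorithms can be seen by iterating the Newton–Raphson update for a hypothetical sample mean; Fisher scoring reaches ȳ in one step, while Newton–Raphson approaches it iteratively (a sketch, with ȳ and the start value invented for illustration):

```python
# Newton-Raphson for the Poisson mean (Exercise 31):
# mu_(t+1) = 2*mu_t - mu_t^2 / ybar, with fixed point mu = ybar.
ybar = 3.0            # hypothetical sample mean
mu = 1.0              # hypothetical starting value
for _ in range(20):
    mu = 2 * mu - mu**2 / ybar
print(round(mu, 6))   # converges to ybar = 3.0
```

The error obeys e_(t+1) = −e_t²/ȳ, so convergence is quadratic once the iterate is close to ȳ.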

    Chapter 5

2. a. π̂ = e^(−3.7771+0.1449(8))/[1 + e^(−3.7771+0.1449(8))].
b. π̂ = 0.5 at −α̂/β̂ = 3.7771/0.1449 = 26.
c. At LI = 8, π̂ = 0.068, so the rate of change is β̂π̂(1 − π̂) = 0.1449(0.068)(0.932) = 0.009.
e. e^β̂ = e^0.1449 = 1.16.
f. The odds of remission at LI = x + 1 are estimated to fall between 1.029 and 1.298 times the odds of remission at LI = x.
g. Wald statistic = (0.1449/0.0593)² = 5.96, df = 1, P-value = 0.0146 for Ha: β ≠ 0.
h. Likelihood-ratio statistic = 34.37 − 26.07 = 8.30, df = 1, P-value = 0.004.
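Parts a–c can be verified numerically from the reported fit (α̂ = −3.7771, β̂ = 0.1449; a check, not part of the original solutions):

```python
import math

# Logistic-regression quantities for Exercise 2.
alpha, beta = -3.7771, 0.1449

def pi_hat(x):
    """Fitted probability exp(a + b*x)/(1 + exp(a + b*x))."""
    t = alpha + beta * x
    return math.exp(t) / (1 + math.exp(t))

p8 = pi_hat(8)                     # about 0.068 (part a/c)
rate = beta * p8 * (1 - p8)        # about 0.009 (part c)
x_half = -alpha / beta             # pi = 0.5 at x = 26 (part b)
print(round(p8, 3), round(rate, 3), round(x_half))
```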

  • 10

5. a. At x = 26.3, the estimated odds = exp[−12.351 + 0.497(26.3)] = 2.06, and at 27.3 the estimated odds = exp[−12.351 + 0.497(27.3)] = 3.38, and 3.38 = 1.64(2.06). For each 1-unit increase in x, the odds multiply by 1.64 (i.e., increase by 64%).
b. The approximate rate of change when π = 0.5 is β̂π(1 − π) = β̂/4. The 95% Wald CI for β of (0.298, 0.697) translates to one for β/4 of (0.07, 0.17).

7. logit(π̂) = −3.866 + 0.397(snoring). Fitted probabilities are 0.021, 0.044, 0.093, 0.132. The multiplicative effect on the odds equals exp(0.397) = 1.49 for a one-unit change in snoring, and 2.21 for a two-unit change. The goodness-of-fit statistic G² = 2.8 (df = 2) shows no evidence of lack of fit.

9. The Cochran–Armitage test uses the ordering of rows, has df = 1, and tends to give smaller P-values when there truly is a linear trend.

11. The estimated odds of contraceptive use for those with at least 1 year of college were e^0.501 = 1.65 times the estimated odds for those with less than 1 year of college. The 95% Wald CI for the true odds ratio is exp[0.501 ± 1.96(0.077)] = (e^0.350, e^0.652) = (1.42, 1.92).

14. The original variables c and x relate to the standardized variables zc and zx by zc = (c − 2.44)/0.80 and zx = (x − 26.3)/2.11, so that c = 0.80zc + 2.44 and x = 2.11zx + 26.3. Thus, the prediction equation is

logit(π̂) = −10.071 − 0.509[0.80zc + 2.44] + 0.458[2.11zx + 26.3].

The coefficients of the standardized variables are −0.509(0.80) = −0.41 and 0.458(2.11) = 0.97. Adjusting for the other variable, a one standard deviation change in x has more than double the effect of a one standard deviation change in c. At x = 26.3, the estimated logits at c = 1 and at c = 4 are 1.465 and −0.062, which correspond to estimated probabilities of 0.81 and 0.48.
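A numerical check of the standardized coefficients and the probabilities at c = 1 and c = 4 in Exercise 14, using the fitted equation logit = −10.071 − 0.509c + 0.458x with sd(c) = 0.80 and sd(x) = 2.11 quoted above:

```python
import math

a, b_c, b_x = -10.071, -0.509, 0.458
std_c, std_x = b_c * 0.80, b_x * 2.11       # about -0.41 and 0.97

def prob(c, x):
    """Fitted probability at (c, x)."""
    t = a + b_c * c + b_x * x
    return math.exp(t) / (1 + math.exp(t))

print(round(std_c, 2), round(std_x, 2),
      round(prob(1, 26.3), 2), round(prob(4, 26.3), 2))  # 0.81 and 0.48
```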

15. a. Black defendants with white victims had estimated probability e^(−3.5961+2.4044)/[1 + e^(−3.5961+2.4044)] = 0.23.
b. For a given defendant's race, the odds of the death penalty when the victim was white are estimated to be between e^1.3068 = 3.7 and e^3.7175 = 41.2 times the odds when the victim was black.
c. Wald statistic (0.8678/0.3671)² = 5.6, LR statistic = 5.0, each with df = 1. P-value = 0.025 for the LR statistic.
d. G² = 0.38, X² = 0.20, df = 1, so the model fits well.

19. R = 1: logit(π̂) = −6.7 + 0.1A + 1.4S. R = 0: logit(π̂) = −7.0 + 0.1A + 1.2S. The YS conditional odds ratio is exp(1.4) = 4.1 for blacks and exp(1.2) = 3.3 for whites. Note that 0.2, the coefficient of the cross-product term, is the difference between the log odds ratios 1.4 and 1.2. The coefficient of S of 1.2 is the log odds ratio between Y and S when R = 0 (whites), in which case the RS interaction does not enter the equation. The P-value of P < 0.01 for smoking represents the result of the test that the log odds ratio between Y and S for whites is 0.

22. The logit model gives the fit logit(π̂) = −3.556 + 0.053(income).

25. The derivative equals β exp(α + βx)/[1 + exp(α + βx)]² = βπ(x)(1 − π(x)).

  • 11

26. The odds ratio e^β is approximately equal to the relative risk when the probability is near 0 and the complement is near 1, since e^β = [π(x + 1)/(1 − π(x + 1))]/[π(x)/(1 − π(x))] ≈ π(x + 1)/π(x).

27. ∂π(x)/∂x = βπ(x)[1 − π(x)], and π(1 − π) ≤ 0.25 with equality at π = 0.5. For the multiple-explanatory-variable case, the rate of change as xi changes with the other variables held constant is greatest when π = 0.5.

28. The square of the denominator is the variance of logit(π̂) = α̂ + β̂x. For large n, the ratio of (α̂ + β̂x − logit(π0)) to its standard deviation is approximately standard normal, and (for fixed π0) all x for which the absolute ratio is no larger than zα/2 are not contradictory.

29. a. Since log[π/(1 − π)] = α + β log(d), exponentiating yields π/(1 − π) = e^α e^(β log d) = e^α d^β. Letting d = 1, e^α equals the odds for the first draft pick.
b. As a function of d, the odds decrease more quickly for pro basketball.

30. a. Let ρ = P(Y = 1). By Bayes' Theorem,

P(Y = 1|x) = ρ exp[−(x − μ1)²/2σ²]/{ρ exp[−(x − μ1)²/2σ²] + (1 − ρ) exp[−(x − μ0)²/2σ²]}
= 1/{1 + [(1 − ρ)/ρ] exp{[μ1² − μ0² + 2x(μ0 − μ1)]/2σ²}}
= 1/{1 + exp[−(α + βx)]} = exp(α + βx)/[1 + exp(α + βx)],

where β = (μ1 − μ0)/σ² and α = log[ρ/(1 − ρ)] + [μ0² − μ1²]/2σ².

32. a. Given {πi}, we can find parameters so the model holds exactly. With constraint βI = 0, log[πI/(1 − πI)] = α determines α. Since log[πi/(1 − πi)] = α + βi, it follows that

βi = log[πi/(1 − πi)] − log[πI/(1 − πI)].

That is, βi is the log odds ratio for rows i and I of the table. When all βi are equal, the logit is the same for each row, so πi is the same in each row, so there is independence.
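The Exercise 30a identity can be confirmed numerically: for any x, the normal-discriminant posterior equals the logistic form with the stated α and β. The parameter values below (ρ, μ0, μ1, σ) are invented for illustration:

```python
import math

rho, mu0, mu1, s = 0.3, 1.0, 2.5, 1.2   # hypothetical prior and normal parameters
beta = (mu1 - mu0) / s**2
alpha = math.log(rho / (1 - rho)) + (mu0**2 - mu1**2) / (2 * s**2)

def posterior(x):
    """Bayes posterior P(Y=1|x) from the two normal densities (common sigma)."""
    f1 = math.exp(-(x - mu1)**2 / (2 * s**2))
    f0 = math.exp(-(x - mu0)**2 / (2 * s**2))
    return rho * f1 / (rho * f1 + (1 - rho) * f0)

def logistic(x):
    """Logistic form with the alpha and beta derived in the exercise."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))

x = 1.8
print(abs(posterior(x) - logistic(x)) < 1e-12)
```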

35. d. When yi is 0 or 1, the log likelihood is Σi [yi log πi + (1 − yi) log(1 − πi)]. For the saturated model, π̃i = yi, and the log likelihood equals 0. So, in terms of the ML fit and the ML estimates {π̂i} for this linear trend model, the deviance equals

D = −2 Σi [yi log π̂i + (1 − yi) log(1 − π̂i)] = −2 Σi [yi log(π̂i/(1 − π̂i)) + log(1 − π̂i)]
= −2 Σi [yi(α̂ + β̂xi) + log(1 − π̂i)].

For this model, the likelihood equations are Σi yi = Σi π̂i and Σi xiyi = Σi xiπ̂i. So, the deviance simplifies to

D = −2[α̂ Σi π̂i + β̂ Σi xiπ̂i + Σi log(1 − π̂i)] = −2[Σi π̂i(α̂ + β̂xi) + Σi log(1 − π̂i)]
= −2 Σi π̂i log(π̂i/(1 − π̂i)) − 2 Σi log(1 − π̂i).

40. a. Expand log[p/(1 − p)] in a Taylor series for a neighborhood of points around p = π, and take just the term with the first derivative.
b. Let pi = yi/ni. The ith sample logit is

log[pi/(1 − pi)] ≈ log[πi^(t)/(1 − πi^(t))] + (pi − πi^(t))/πi^(t)(1 − πi^(t))
= log[πi^(t)/(1 − πi^(t))] + [yi − niπi^(t)]/niπi^(t)(1 − πi^(t)).

    Chapter 6

1. logit(π̂) = −9.35 + 0.834(weight) + 0.307(width).
a. Likelihood-ratio statistic = 32.9 (df = 2), P < 0.0001. There is extremely strong evidence that at least one variable affects the response.
b. Wald statistics are (0.834/0.671)² = 1.55 and (0.307/0.182)² = 2.85. These each have df = 1, and the P-values are 0.21 and 0.09. These predictors are highly correlated (Pearson corr. = 0.887), so this is the problem of multicollinearity.

12. The estimated odds of admission were 1.84 times higher for men than for women. However, the estimated AG conditional odds ratio is 0.90, so given department, the estimated odds of admission were 0.90 times as high for men as for women. Simpson's paradox strikes again! Men applied relatively more often to Departments A and B, whereas women applied relatively more often to Departments C, D, E, F. At the same time, admissions rates were relatively high for Departments A and B and relatively low for C, D, E, F. These two effects combine to give a relative advantage to men for admissions when we study the marginal association. The values of G² are 2.68 for the model with no G effect and 2.56 for the model with G and D main effects. For the latter model, the CI for the conditional AG odds ratio is (0.87, 1.22).

17. The CMH statistic simplifies to the McNemar statistic of Sec. 11.1, which in chi-squared form equals (14 − 6)²/(14 + 6) = 3.2 (df = 1). There is slight evidence of a better response with treatment B (P = 0.074 for the two-sided alternative).
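The McNemar calculation, for reference, using the discordant counts 14 and 6 from the exercise:

```python
# McNemar statistic (chi-squared form) for Exercise 17.
b, c = 14, 6                     # discordant pair counts
mcnemar = (b - c)**2 / (b + c)   # 3.2 with df = 1
print(mcnemar)
```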

27. logit(π̂) = −12.351 + 0.497x. The probability at x = 26.3 is 0.674; the probability at x = 28.4 (i.e., one std. dev. above the mean) is 0.854. The odds ratio is [(0.854/0.146)/(0.674/0.326)] = 2.83, so λ = log(2.83) = 1.04 and δ = 5.1. Then n = 75.

31. We consider the contribution to the X² statistic of its two components (corresponding to the two levels of the response) at level i of the explanatory variable. For simplicity, we use the notation of (4.21) but suppress the subscripts. Then, that contribution is (y − nπ̂)²/nπ̂ + [(n − y) − n(1 − π̂)]²/n(1 − π̂), where the first component is (observed − fitted)²/fitted for the success category and the second component is (observed − fitted)²/fitted for the failure category. Combining terms gives (y − nπ̂)²/nπ̂(1 − π̂), which is the square of the residual. Adding these chi-squared components therefore gives the sum of the squared residuals.

35. The noncentrality is the same for models (X + Z) and (Z), so the difference statistic has noncentrality 0. The conditional XY independence model has noncentrality proportional to n, so the power goes to 1 as n increases.

41. a. E(Y) = α + β1X + β2Z. The slope β1 of the line for the partial relationship between E(Y) and X is the same at all fixed levels of Z.
b. With dummy (indicator) variables for Z, one has parallelism of lines. That is, the slope of the line relating E(Y) and X is the same for each category of Z.
c. Use dummy variables for X and Z, but no interaction terms. The difference between E(Y) at two categories of X is the same at each fixed category of Z.
d. For logistic models, the odds ratio relating Y and X is the same at each category of Z.

    Chapter 7

21. The log likelihood for the probit model is

log Πi=1..N [Φ(Σj βjxij)]^yi [1 − Φ(Σj βjxij)]^(1−yi)
= Σi yi log Φ(Σj βjxij) + Σi (1 − yi) log[1 − Φ(Σj βjxij)]
= Σi yi log{Φ(Σj βjxij)/[1 − Φ(Σj βjxij)]} + Σi log[1 − Φ(Σj βjxij)].

Differentiating with respect to βj, with φ = Φ′ and πi = Φ(Σj βjxij),

∂L/∂βj = Σi yi xij φ(Σj βjxij)/{Φ(Σj βjxij)[1 − Φ(Σj βjxij)]} − Σi xij φ(Σj βjxij)/[1 − Φ(Σj βjxij)] = 0
= Σi yi xij φ(Σj βjxij)/πi(1 − πi) − Σi xij φ(Σj βjxij)πi/πi(1 − πi) = 0
= Σi (yi − πi)xij zi = 0, where zi = φ(Σj βjxij)/πi(1 − πi).

For logistic regression, from (4.28) with {ni = 1}, Σi (yi − πi)xij = 0.

25. (log π(x2))/(log π(x1)) = exp[β(x2 − x1)], so π(x2) = π(x1)^exp[β(x2−x1)]. For x2 − x1 = 1, π(x2) equals π(x1) raised to the power exp(β).

    Chapter 8


3. Both gender and race have significant effects. The logistic model with additive effects and no interaction fits well, with G² = 0.2 based on df = 2. The estimated odds of preferring Democrat instead of Republican are higher for females and for blacks, with estimated conditional odds ratios of 1.8 between gender and party ID and 9.8 between race and party ID.

7. For any collapsing of the response, for Democrats the estimated odds of response in the liberal direction are exp(0.975) = 2.65 times the estimated odds for Republicans. The estimated probability of a very liberal response equals exp(−2.469)/[1 + exp(−2.469)] = 0.078 for Republicans and exp(−2.469 + 0.975)/[1 + exp(−2.469 + 0.975)] = 0.183 for Democrats.

8. a. Four intercepts are needed for five response categories. For males in urban areas wearing seat belts, all dummy variables equal 0 and the estimated cumulative probabilities are exp(3.3074)/[1 + exp(3.3074)] = 0.965, exp(3.4818)/[1 + exp(3.4818)] = 0.970, exp(5.3494)/[1 + exp(5.3494)] = 0.995, exp(7.2563)/[1 + exp(7.2563)] = 0.9993, and 1.0. The corresponding response probabilities are 0.965, 0.005, 0.025, 0.004, and 0.0007.
b. The Wald CI is exp[−0.5463 ± 1.96(0.0272)] = (exp(−0.600), exp(−0.493)) = (0.549, 0.611). Given seat belt use and location, the estimated odds of injury below any fixed level for a female are between 0.549 and 0.611 times the estimated odds for a male.
c. The estimated odds ratio equals exp(−0.7602 − 0.1244) = 0.41 in rural locations and exp(−0.7602) = 0.47 in urban locations. The interaction effect −0.1244 is the difference between the two log odds ratios.
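The cumulative-logit arithmetic in part a can be reproduced directly from the four reported intercepts (a check, not part of the original solutions):

```python
import math

# Cumulative-logit fit for Exercise 8a: intercepts from the reported model.
alphas = [3.3074, 3.4818, 5.3494, 7.2563]
cum = [math.exp(a) / (1 + math.exp(a)) for a in alphas] + [1.0]

# Category probabilities are successive differences of the cumulative ones.
probs = [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, 5)]
print([round(p, 4) for p in probs])
```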

10. a. Setting up indicator variables (1, 0) for (male, female) and (1, 0) for (sequential, alternating), we get treatment effect = −0.581 (SE = 0.212) and gender effect = −0.541 (SE = 0.295). The estimated odds ratios are 0.56 and 0.58. The sequential therapy leads to a better response than the alternating therapy; the estimated odds of response with sequential therapy below any fixed level are 0.56 times the estimated odds with alternating therapy.
b. The main-effects model fits well (G² = 5.6, df = 7), and adding an interaction term does not give an improved fit (the interaction model has G² = 4.5, df = 6).

15. The estimated odds that a Democrat is classified in the more liberal instead of the more conservative of two adjacent categories are exp(0.435) = 1.54 times the estimated odds for a Republican. For the two extreme categories, the estimated odds ratio equals exp[4(0.435)] = 5.7.
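A quick check of the adjacent-categories calculation:

```python
import math

beta = 0.435  # Democrat vs. Republican effect in the adjacent-categories logit model

adjacent_or = math.exp(beta)      # odds ratio for two adjacent categories
extreme_or = math.exp(4 * beta)   # odds ratio for the two extreme of the 5 categories

print(adjacent_or, extreme_or)
```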

17. a. Using scores 3.2, 3.75, 4.5, 5.2, the proportional odds model has a treatment effect of 0.805 with SE = 0.206; for the treatment group, the estimated odds that ending cholesterol is below any fixed level are exp(0.805) = 2.24 times the odds for the control group. The psyllium treatment seems to have had a strong, beneficial effect.

18. The CMH statistic for the correlation alternative, using equally spaced scores, equals 6.3 (df = 1) and has P-value = 0.012. When there is roughly a linear trend, this statistic tends to be more powerful and give smaller P-values, since it focuses on a single degree of freedom. The LR statistic for the cumulative logit model with a linear effect of operation equals 6.7, df = 1, P = 0.01; strong evidence that operation has an effect on dumping, giving similar results as in (a). The LR statistic comparing this model to the model with four separate operation parameters equals 2.8 (df = 3), so the simpler model is adequate.

29. The multinomial mass function factors as the multinomial coefficient times

π_J^n exp[Σ_{i=1}^{J−1} n_i log(π_i/π_J)],

which has the form of a function of the data, times a function of the parameters (namely (1 − π_1 − ... − π_{J−1})^n), times an exponential function of a sum of the observations times the canonical parameters, which are the baseline-category logits.

32. ∂π_3(x)/∂x = −[β_1 exp(α_1 + β_1 x) + β_2 exp(α_2 + β_2 x)] / [1 + exp(α_1 + β_1 x) + exp(α_2 + β_2 x)]².
a. The denominator is positive, and the numerator is negative when β_1 > 0 and β_2 > 0.

36. The baseline-category logit model refers to individual categories rather than cumulative probabilities. There is no linear structure for baseline-category logits that implies identical effects for each cumulative logit.

37. a. For j < k, logit[P(Y ≤ j | X = x)] − logit[P(Y ≤ k | X = x)] = (α_j − α_k) + (β_j − β_k)x. This difference cannot be positive since P(Y ≤ j) ≤ P(Y ≤ k); however, if β_j > β_k then the difference is positive for large x, and if β_j < β_k then the difference is positive for small x.

39. a. df = I(J − 1) − [(J − 1) + (I − 1)] = (I − 1)(J − 2).
c. The full model has an extra I − 1 parameters.
d. The cumulative probabilities in row a are all smaller or all greater than those in row b, depending on whether μ_a > μ_b or μ_a < μ_b.

43. For a given subject, the model has the form

π_j = exp(β_j + γ_j x + u_j) / Σ_h exp(β_h + γ_h x + u_h).

For a given cost, the odds a female selects a over b are exp(γ_a − γ_b) times the odds for males. For a given gender, the log odds of selecting a over b depend on u_a − u_b.

    Chapter 9

1. a. G2 values are 2.38 (df = 2) for (GI, HI), and 0.30 (df = 1) for (GI, HI, GH).
b. The estimated log odds ratio is −0.252 (SE = 0.175) for the GH association, so the CI for the odds ratio is exp[−0.252 ± 1.96(0.175)]. Similarly, the estimated log odds ratio is 0.464 (SE = 0.241) for the GI association, leading to a CI of exp[0.464 ± 1.96(0.241)]. Since the intervals contain values rather far from 1.0, it is safest to use model (GH, GI, HI), even though simpler models fit adequately.
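These intervals can be computed directly (the `wald_ci_or` helper is illustrative, not from the text):

```python
import math

def wald_ci_or(log_or, se, z=1.96):
    """Wald CI for an odds ratio: exponentiate the endpoints of the log-odds-ratio CI."""
    return (math.exp(log_or - z * se), math.exp(log_or + z * se))

gh = wald_ci_or(-0.252, 0.175)  # GH association
gi = wald_ci_or(0.464, 0.241)   # GI association

print(gh)
print(gi)
```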

4. For either approach, from (8.14), the estimated conditional log odds ratio equals

λ̂_{11}^{AC} + λ̂_{22}^{AC} − λ̂_{12}^{AC} − λ̂_{21}^{AC}.

5. a. G2 = 31.7, df = 48. The data are sparse, but the model seems to fit well. It is plausible that the association between any two items is the same at each combination of levels of the other two items.
b. log(μ_{11cl}μ_{33cl}/μ_{13cl}μ_{31cl}) = log(μ_{11cl}) + log(μ_{33cl}) − log(μ_{13cl}) − log(μ_{31cl}). Substitute the model formula and simplify. The estimated odds ratio equals exp(2.142) = 8.5. There is a strong positive association. Given responses on C and L, the estimated odds of judging spending on E to be too much instead of too little are 8.5 times as high for those who judge spending on H to be too much than for those who judge spending on H to be too little. The 95% CI is exp[2.142 ± 1.96(0.523)], or (3.1, 24.4). Though it is very wide, it is clear that the true association is strong.

7. a. Let S = safety equipment, E = whether ejected, I = injury. Then G2(SE, SI, EI) = 2.85, df = 1. Any simpler model has G2 > 1000, so it seems there is an association for each pair of variables, and that association can be regarded as the same at each level of the third variable. The estimated conditional odds ratios are 0.091 for S and E (i.e., wearers of seat belts are much less likely to be ejected), 5.57 for S and I, and 0.061 for E and I.
b. Loglinear models containing SE are equivalent to logit models with I as the response variable and S and E as explanatory variables. The loglinear model (SE, SI, EI) is equivalent to a logit model in which S and E have additive effects on I. The estimated odds of a fatal injury are exp(2.798) = 16.4 times higher for those ejected (controlling for S), and exp(1.717) = 5.57 times higher for those not wearing seat belts (controlling for E).

8. Injury has estimated conditional odds ratios 0.58 with gender, 2.13 with location, and 0.44 with seat-belt use. "No" is category 1 of I, and female is category 1 of G, so the odds of no injury for females are estimated to be 0.58 times the odds of no injury for males (controlling for L and S); that is, females are more likely to be injured. Similarly, the odds of no injury for urban location are estimated to be 2.13 times the odds for rural location, so injury is more likely at a rural location, and the odds of no injury for no seat-belt use are estimated to be 0.44 times the odds for seat-belt use, so injury is more likely for no seat-belt use, other things being fixed. Since there is no interaction in this model, overall the most likely case for injury is females not wearing seat belts in rural locations.

9. a. (DVF, YD, YV, YF).
b. The model with Y as response and additive factor effects for D and V: logit(π) = α + β_i^D + β_j^V.
c. (i) (DVF, Y): logit(π) = α; (ii) (DVF, YF): logit(π) = α + β_i^F; (iii) (DVF, YDV, YF): add a term of the form β_{ij}^{DV} to the logit model.

13. The homogeneous association model (BP, BR, BS, PR, PS, RS) fits well (G2 = 7.0, df = 9). The model deleting the PR association also fits well (G2 = 10.7, df = 11), but we use the full model.
For the homogeneous association model, the estimated conditional BS odds ratio equals exp(1.147) = 3.15. For those who agree with birth control availability, the estimated odds of viewing premarital sex as wrong only sometimes or not wrong at all are about triple the estimated odds for those who disagree with birth control availability; there is a positive association between support for birth control availability and premarital sex. The CI is exp[1.147 ± 1.645(0.153)] = (2.45, 4.05).
Model (BPR, BS, PS, RS) has G2 = 5.8, df = 7, also a good fit.

17. b. log θ_{11(k)} = log μ_{11k} + log μ_{22k} − log μ_{12k} − log μ_{21k} = λ_{11}^{XY} + λ_{22}^{XY} − λ_{12}^{XY} − λ_{21}^{XY}; for zero-sum constraints, as in problem 16c this simplifies to 4λ_{11}^{XY}.
e. Use equations such as

λ = log(μ_{111}),  λ_i^X = log(μ_{i11}/μ_{111}),  λ_{ij}^{XY} = log(μ_{ij1}μ_{111}/μ_{i11}μ_{1j1}),

λ_{ijk}^{XYZ} = log([μ_{ijk}μ_{11k}/μ_{i1k}μ_{1jk}] / [μ_{ij1}μ_{111}/μ_{i11}μ_{1j1}]).

19. a. When Y is jointly independent of X and Z, π_{ijk} = π_{+j+}π_{i+k}. Dividing π_{ijk} by π_{++k}, we find that P(X = i, Y = j | Z = k) = P(X = i | Z = k)P(Y = j). But when π_{ijk} = π_{+j+}π_{i+k}, P(Y = j | Z = k) = π_{+jk}/π_{++k} = π_{+j+}π_{++k}/π_{++k} = π_{+j+} = P(Y = j). Hence, P(X = i, Y = j | Z = k) = P(X = i | Z = k)P(Y = j) = P(X = i | Z = k)P(Y = j | Z = k), and there is XY conditional independence.
b. For mutual independence, π_{ijk} = π_{i++}π_{+j+}π_{++k}. Summing both sides over k, π_{ij+} = π_{i++}π_{+j+}, which is marginal independence in the XY marginal table.
c. For instance, model (Y, XZ) satisfies this, but X and Z are dependent (the conditional association being the same as the marginal association in each case, for this model).

21. Use the definitions of the models, in terms of cell probabilities as functions of marginal probabilities. When one specifies sufficient marginal probabilities that have the required one-way marginal probabilities of 1/2 each, these specified marginal distributions then determine the joint distribution. Model (XY, XZ, YZ) is not defined in the same way; for it, one needs to determine cell probabilities for which each set of partial odds ratios do not equal 1.0 but are the same at each level of the third variable. In each table below, rows are levels of X, columns are levels of Y, and the left and right panels are Z = 1 and Z = 2.
a.

0.125  0.125  |  0.125  0.125
0.125  0.125  |  0.125  0.125

This is actually a special case of (X, Y, Z) called the equiprobability model.
b.

0.15  0.10  |  0.15  0.10
0.10  0.15  |  0.10  0.15

c.

1/4  1/24  |  1/12  1/8
1/8  1/12  |  1/24  1/4

d.

2/16  1/16  |  4/16  1/16
1/16  4/16  |  1/16  2/16

e. Any 2 × 2 × 2 table.

23. Number of terms = 1 + C(T, 1) + C(T, 2) + ... + C(T, T) = Σ_i C(T, i) 1^i 1^{T−i} = (1 + 1)^T = 2^T, by the binomial theorem.
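This count can be verified numerically:

```python
from math import comb

def n_terms(T):
    """1 (intercept) plus one term for each nonempty subset of the T variables."""
    return 1 + sum(comb(T, i) for i in range(1, T + 1))

# Matches 2**T, by the binomial theorem
for T in range(1, 8):
    assert n_terms(T) == 2 ** T

print([n_terms(T) for T in (2, 3, 4)])  # [4, 8, 16]
```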

25. a. The XY term does not appear in the model, so X and Y are conditionally independent. All terms in the saturated model that are not in model (WXZ, WYZ) involve X and Y, and so permit an XY conditional association.
b. (WX, WZ, WY, XZ, YZ)

27. For independent Poisson sampling,

L = Σ_i Σ_j n_{ij} log μ_{ij} − Σ_i Σ_j μ_{ij} = nλ + Σ_i n_{i+}λ_i^X + Σ_j n_{+j}λ_j^Y − Σ_i Σ_j exp(log μ_{ij}).

It follows that {n_{i+}}, {n_{+j}} are minimal sufficient statistics, and the likelihood equations are μ̂_{i+} = n_{i+}, μ̂_{+j} = n_{+j} for all i and j. Since the model is μ_{ij} = μ_{i+}μ_{+j}/n, the fitted values are μ̂_{ij} = μ̂_{i+}μ̂_{+j}/n = n_{i+}n_{+j}/n. The residual degrees of freedom are IJ − [1 + (I − 1) + (J − 1)] = (I − 1)(J − 1).

28. For this model, in a given row the J cell probabilities are equal. The likelihood equations are μ̂_{i+} = n_{i+} for all i. The fitted values that satisfy the model and the likelihood equations are μ̂_{ij} = n_{i+}/J.
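The fitted-value formula for the independence model can be illustrated with a small hypothetical table (the counts are made up for illustration):

```python
# Fitted values for the independence loglinear model: mu_ij = n_i+ * n_+j / n
table = [[20, 10], [30, 40]]  # hypothetical 2x2 counts

n = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]
col_tot = [sum(table[i][j] for i in range(2)) for j in range(2)]

fitted = [[row_tot[i] * col_tot[j] / n for j in range(2)] for i in range(2)]

# Likelihood equations: fitted margins match observed margins
assert all(abs(sum(fitted[i]) - row_tot[i]) < 1e-9 for i in range(2))
print(fitted)  # [[15.0, 15.0], [35.0, 35.0]]
```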

31. a. The formula reported in the table satisfies the likelihood equations μ̂_{h+++} = n_{h+++}, μ̂_{+i++} = n_{+i++}, μ̂_{++j+} = n_{++j+}, μ̂_{+++k} = n_{+++k}, and the fitted values satisfy the model, which has probabilistic form π_{hijk} = π_{h+++}π_{+i++}π_{++j+}π_{+++k}, so by Birch's results they are ML estimates.
b. Model (WX, YZ) says that the WX composite variable (having marginal frequencies {n_{hi++}}) is independent of the YZ composite variable (having marginal frequencies {n_{++jk}}). Thus, df = [no. categories of (WX) − 1][no. categories of (YZ) − 1] = (HI − 1)(JK − 1). Model (WXY, Z) says that Z is independent of the WXY composite variable, so the usual results apply to the two-way table having Z in one dimension and the HIJ levels of the WXY composite variable in the other; e.g., df = (HIJ − 1)(K − 1).

    Chapter 10

    2. a. For any pair of variables, the marginal odds ratio is the same as the conditionalodds ratio (and hence 1.0), since the remaining variable is conditionally independent ofeach of those two.b. (i) For each pair of variables, at least one of them is conditionally independent ofthe remaining variable, so the marginal odds ratio equals the conditional odds ratio. (ii)


    these are the likelihood equations implied by the AC term in the model.c. (i) Both A and C are conditionally dependent with M , so the association may changewhen one controls for M . (ii) For the AM odds ratio, since A and C are conditionallyindependent (given M), the odds ratio is the same when one collapses over C. (iii) Theseare likelihood equations implied by the AM and CM terms in the model.d. (i) no pairs of variables are conditionally independent, so collapsibility conditions arenot satisfied for any pair of variables. (ii) These are likelihood equations implied by thethree association terms in the model.

7. Model (AC, AM, CM) fits well. It has df = 1, and the likelihood equations imply fitted values equal observed in each two-way marginal table, which implies the difference between an observed and fitted count in one cell is the negative of that in an adjacent cell; their SE values are thus identical, as are the standardized Pearson residuals. The other models fit poorly; e.g., for model (AM, CM), in the cell with each variable equal to "yes," the difference between the observed and fitted counts is 3.7 standard errors.

    19. W and Z are separated using X alone or Y alone or X and Y together. W and Y areconditionally independent given X and Z (as the model symbol implies) or conditionalon X alone since X separates W and Y . X and Z are conditionally independent givenW and Y or given only Y alone.

20. a. Yes: let U be a composite variable consisting of combinations of levels of Y and Z; then, the collapsibility conditions are satisfied, as W is conditionally independent of U, given X.
b. No.

21. b. Using the Haberman result, it follows that

Σ_i μ̂_{1i} log(μ̂_{0i}) = Σ_i μ̂_{0i} log(μ̂_{0i}),

Σ_i n_i log(μ̂_{ai}) = Σ_i μ̂_{ai} log(μ̂_{ai}), a = 0, 1.

The first equation is obtained by letting {μ̂_{1i}} be the fitted values for M_0. The second pair of equations is obtained by letting M_1 be the saturated model. Using these, one can obtain the result.

25. From the definition, it follows that a joint distribution of two discrete variables is positively likelihood-ratio dependent if all odds ratios of the form π_{ij}π_{hk}/π_{ik}π_{hj} ≥ 1, when i < h and j < k.
a. For the L×L model, this odds ratio equals exp[β(u_h − u_i)(v_k − v_j)]. Monotonicity of scores implies u_i < u_h and v_j < v_k, so these odds ratios all are at least 1.0 when β ≥ 0. Thus, when β > 0, as X increases, the conditional distributions on Y are stochastically increasing; also, as Y increases, the conditional distributions on X are stochastically increasing. When β < 0, the variables are negatively likelihood-ratio dependent, and the conditional distributions on Y (X) are stochastically decreasing as X (Y) increases.
b. For the row effects model with j < k, μ_{hj}μ_{ik}/μ_{hk}μ_{ij} = exp[(μ_i − μ_h)(v_k − v_j)]. When μ_i − μ_h > 0, all such odds ratios exceed 1.0, since the scores on Y are monotone increasing. Thus, there is likelihood-ratio dependence for the 2 × J table consisting of rows i and h, and Y is stochastically higher in row i.

27. a. Note the derivative of the log likelihood with respect to β is Σ_i Σ_j u_i v_j (n_{ij} − μ_{ij}), which under independence estimates is n Σ_i Σ_j u_i v_j (p_{ij} − p_{i+}p_{+j}).
b. Use formula (3.9). In this context, θ = Σ_i Σ_j u_i v_j (π_{ij} − π_{i+}π_{+j}) and φ_{ij} = ∂θ/∂π_{ij} = u_i v_j − u_i(Σ_b v_b π_{+b}) − v_j(Σ_a u_a π_{a+}). Under H_0, π_{ij} = π_{i+}π_{+j}, and Σ_i Σ_j π_{ij}φ_{ij} simplifies to −(Σ_i u_i π_{i+})(Σ_j v_j π_{+j}). Also under H_0,

Σ_i Σ_j π_{ij}φ_{ij}² = Σ_i Σ_j u_i² v_j² π_{i+}π_{+j} + (Σ_j v_j π_{+j})²(Σ_i u_i² π_{i+}) + (Σ_i u_i π_{i+})²(Σ_j v_j² π_{+j}) + 2(Σ_i Σ_j u_i v_j π_{i+}π_{+j})(Σ_i u_i π_{i+})(Σ_j v_j π_{+j}) − 2(Σ_i u_i² π_{i+})(Σ_j v_j π_{+j})² − 2(Σ_j v_j² π_{+j})(Σ_i u_i π_{i+})².

Then σ² in (3.9) simplifies to

[Σ_i u_i² π_{i+} − (Σ_i u_i π_{i+})²][Σ_j v_j² π_{+j} − (Σ_j v_j π_{+j})²].

The asymptotic standard error is σ/√n, the estimate of which is the same formula with π_{ij} replaced by p_{ij}.
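The claimed simplification of σ² can be checked numerically for arbitrary (hypothetical) scores and marginal probabilities:

```python
# Numerical check of the simplification above, for hypothetical scores and margins
u = [1.0, 2.0, 4.0]          # row scores
v = [0.0, 1.0, 3.0, 6.0]     # column scores
pr = [0.2, 0.5, 0.3]         # row marginal probabilities
pc = [0.1, 0.4, 0.3, 0.2]    # column marginal probabilities

ubar = sum(ui * p for ui, p in zip(u, pr))
vbar = sum(vj * p for vj, p in zip(v, pc))

# phi_ij = u_i v_j - u_i vbar - v_j ubar, evaluated under independence pi_ij = pr_i pc_j
s1 = sum(pr[i] * pc[j] * (u[i] * v[j] - u[i] * vbar - v[j] * ubar) ** 2
         for i in range(3) for j in range(4))
s0 = sum(pr[i] * pc[j] * (u[i] * v[j] - u[i] * vbar - v[j] * ubar)
         for i in range(3) for j in range(4))
sigma2 = s1 - s0 ** 2

var_u = sum(ui ** 2 * p for ui, p in zip(u, pr)) - ubar ** 2
var_v = sum(vj ** 2 * p for vj, p in zip(v, pc)) - vbar ** 2

# sigma^2 equals the product of the score variances under independence
assert abs(sigma2 - var_u * var_v) < 1e-9
print(round(sigma2, 3))  # 5.022
```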

28. For Poisson sampling, the log likelihood is

L = nλ + Σ_i n_{i+}λ_i^X + Σ_j n_{+j}λ_j^Y + Σ_i μ_i[Σ_j n_{ij}v_j] − Σ_i Σ_j exp(λ + ...).

Thus, the minimal sufficient statistics are {n_{i+}}, {n_{+j}}, and {Σ_j n_{ij}v_j}. Differentiating with respect to the parameters and setting the results equal to zero gives the likelihood equations. For instance, ∂L/∂μ_i = Σ_j v_j n_{ij} − Σ_j v_j μ_{ij}, i = 1, ..., I, from which follow the I equations in the third set of likelihood equations.

30. a. These equations are obtained successively by differentiating with respect to λ^{XZ}, λ^{YZ}, and β. Note these equations imply that the correlation between the scores for X and the scores for Y is the same for the fitted and observed data. This model uses the ordinality of X and Y, and is a parsimonious special case of model (XY, XZ, YZ).
b. The third equation is replaced by the K equations,

Σ_i Σ_j u_i v_j μ̂_{ijk} = Σ_i Σ_j u_i v_j n_{ijk}, k = 1, ..., K.

This model corresponds to fitting the L×L model separately at each level of Z. The G2 value is the sum of G2 for the separate fits, and df is the sum of (IJ − I − J) values from the separate fits (i.e., df = K(IJ − I − J)).

31. Deleting the XY superscript to simplify notation,

log θ_{ij(k)} = (λ_{ij} + λ_{i+1,j+1} − λ_{i,j+1} − λ_{i+1,j}) + β(u_{i+1} − u_i)(v_{j+1} − v_j)w_k.

This has the form φ_{ij} + ψ_{ij}w_k, a linear function of the scores for the levels of Z. Thus, the conditional association between X and Y changes linearly across the levels of Z.


36. Suppose ML estimates did exist, and let c = μ̂_{111}. Then c > 0, since we must be able to evaluate the logarithm for all fitted values. But then μ̂_{112} = n_{112} − c, since the likelihood equations for the model imply that μ̂_{111} + μ̂_{112} = n_{111} + n_{112} (i.e., μ̂_{11+} = n_{11+}). Using similar arguments for the other two-way margins implies that μ̂_{122} = n_{122} + c, μ̂_{212} = n_{212} + c, and μ̂_{222} = n_{222} − c. But since n_{222} = 0, μ̂_{222} = −c < 0, which is impossible. Thus we have a contradiction, and it follows that ML estimates cannot exist for this model.

    37. That value for the sufficient statistic becomes more likely as the model parametermoves toward infinity.

    Chapter 11

6. a. Ignoring order, (A=1, B=0) occurred 45 times and (A=0, B=1) occurred 22 times. The McNemar z = 2.81, which has a two-tail P-value of 0.005 and provides strong evidence that the rate of successes is higher for drug A.
b. Pearson statistic = 7.8, df = 1.
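Numerically, using the discordant counts above:

```python
import math

# McNemar test from the discordant pair counts quoted above
n12, n21 = 45, 22  # (A=1, B=0) and (A=0, B=1)
z = (n12 - n21) / math.sqrt(n12 + n21)

print(round(z, 2))  # 2.81
```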

7. b. z² = (3 − 1)²/(3 + 1) = 1.0 = CMH statistic.
e. The P-value equals the binomial probability of 3 or more successes out of 4 trials when the success probability equals 0.5, which equals 5/16.
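The exact P-value in part (e) can be computed directly:

```python
from math import comb

# Exact binomial P-value: P(at least 3 successes in 4 trials with p = 1/2)
p_value = sum(comb(4, k) for k in (3, 4)) / 2 ** 4

print(p_value)  # 0.3125 = 5/16
```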

9. a. Symmetry has G2 = 22.5, X2 = 20.4, with df = 10. The lack of fit results primarily from the discrepancy between n_13 and n_31, for which the adjusted residual is (44 − 17)/√(44 + 17) = 3.5.
b. Compared to quasi symmetry, G2(S | QS) = 22.5 − 10.0 = 12.5, df = 4, for a P-value of 0.014. The McNemar statistic for the 2×2 table with row and column categories (High Point, Others) is z = (78 − 42)/√(78 + 42) = 3.3. The 95% CI comparing the proportions choosing High Point at the two times is 0.067 ± 0.039.
c. Quasi independence fits much better than independence, which has G2 = 346.4 (df = 16). Given a change in brands, the new choice of coffee brand is plausibly independent of the original choice.
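Both standardized statistics can be reproduced from the counts quoted above:

```python
import math

# Adjusted residual for the n_13 vs. n_31 discrepancy under symmetry
r = (44 - 17) / math.sqrt(44 + 17)

# McNemar comparison of High Point choice at the two times
z = (78 - 42) / math.sqrt(78 + 42)

print(round(r, 1), round(z, 1))  # 3.5 3.3
```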

12. a. The symmetry model has X2 = 0.59, based on df = 3 (P = 0.90). Independence has X2 = 45.4 (df = 4), and quasi independence has X2 = 0.01 (df = 1) and is identical to quasi symmetry. The symmetry and quasi-independence models fit well.
b. G2(S | QS) = 0.591 − 0.006 = 0.585, df = 3 − 1 = 2. Marginal homogeneity is plausible.
c. Kappa = 0.389 (SE = 0.060); weighted kappa equals 0.427 (SE = 0.0635).

    15. Under independence, on the main diagonal, fitted = 5 = observed. Thus, kappa =0, yet there is clearly strong association in the table.

    16. a. Good fit, with G2 = 0.3, df = 1. The parameter estimates for Coke, Pepsi, andClassic Coke are 0.580 (SE = 0.240), 0.296 (SE = 0.240), and 0. Coke is preferred toClassic Coke.b. model estimate = 0.57, sample proportion = 29/49 = 0.59.

17. G2 = 4.29, X2 = 4.65, df = 3. With the JRSS-B parameter = 0, the other estimates are −0.269 for Biometrika, −0.748 for JASA, and −3.218 for Communications in Statistics, so the prestige ranking is: 1. JRSS-B, 2. Biometrika, 3. JASA, 4. Commun. Stat.

26. The matched-pairs t test compares means for dependent samples, and McNemar's test compares proportions for dependent samples. The t test is valid for interval-scale data (with normally distributed differences, for small samples), whereas McNemar's test is valid for binary data.

28. a. This is a conditional odds ratio, conditional on the subject, but the other model is a marginal model, so its odds ratio is not conditional on the subject.
d. This is simply the mean of the expected values of the individual binary observations.
e. In the three-way representation, note that each partial table has one observation in each row. If each response in a partial table is identical, then each cross-product that contributes to the M-H estimator equals 0, so that table makes no contribution to the statistic. Otherwise, there is a contribution of 1 to the numerator or the denominator, depending on whether the first observation is a success and the second a failure, or the reverse. The overall estimator then is the ratio of the numbers of such pairs, or in terms of the original 2×2 table, this is n_12/n_21.

30. When {π_i} are identical, the individual trials for the conditional model are identical as well as independent, so averaging over them to get the marginal Y_1 and Y_2 gives binomials with the same parameters.

31. Since β_M = log[π_{+1}(1 − π_{1+})/(1 − π_{+1})π_{1+}], ∂β_M/∂π_{+1} = 1/π_{+1} + 1/(1 − π_{+1}) = 1/π_{+1}(1 − π_{+1}) and ∂β_M/∂π_{1+} = −1/(1 − π_{1+}) − 1/π_{1+} = −1/π_{1+}(1 − π_{1+}). The covariance matrix of √n(p_{+1}, p_{1+}) has variances π_{+1}(1 − π_{+1}) and π_{1+}(1 − π_{1+}) and covariance (π_{11}π_{22} − π_{12}π_{21}). By the delta method, the asymptotic variance of √n[log(p_{+1}p_{2+}/p_{+2}p_{1+}) − log(π_{+1}π_{2+}/π_{+2}π_{1+})] is

[1/π_{+1}(1 − π_{+1}), −1/π_{1+}(1 − π_{1+})] Cov[√n(p_{+1}, p_{1+})] [1/π_{+1}(1 − π_{+1}), −1/π_{1+}(1 − π_{1+})]′,

which simplifies to the expression given in the problem. Under independence, the last term in the variance expression drops out (since an odds ratio of 1.0 implies π_{11}π_{22} = π_{12}π_{21}) and the variance simplifies to (π_{1+}π_{2+})^{−1} + (π_{+1}π_{+2})^{−1}. Similarly, with the delta method the asymptotic variance of √n(β̂_C − β_C) is π_{12}^{−1} + π_{21}^{−1} (which leads to the SE in (11.10) for β̂_C); under independence, this is (π_{1+}π_{+2})^{−1} + (π_{+1}π_{2+})^{−1}. For each variance, combining the two parts to get a common denominator, then expressing the marginal probabilities in each numerator in terms of cell probabilities and comparing the two numerators gives the result.

34. Consider the 3×3 table with cell probabilities, by row, (0.20, 0.10, 0 / 0, 0.30, 0.10 / 0.10, 0, 0.20).

41. a. Since π_{ab} = π_{ba}, the table satisfies symmetry, which then implies marginal homogeneity and quasi symmetry as special cases. For a ≠ b, π_{ab} has the form α_a β_b, identifying β_b with π_b(1 − λ), so it also satisfies quasi independence.
c. λ = μ = 0 is equivalent to independence for this model, and λ = μ = 1 is equivalent to perfect agreement.


43. a. log(π_{ac}/π_{ca}) = β_a − β_c = (β_a − β_b) + (β_b − β_c) = log(π_{ab}/π_{ba}) + log(π_{bc}/π_{cb}).
b. No, this is not possible, since if a is preferred to b then β_a > β_b, and if b is preferred to c then β_b > β_c; it then follows that β_a > β_c, so a is preferred to c.

45. The kernel of the log likelihood simplifies to Σ_a


variance is

V = [Σ_i (∂μ_i/∂β)′[v(μ_i)]^{−1}(∂μ_i/∂β)]^{−1} = [Σ_i (1/μ)]^{−1} = μ/n.

Thus, the model-based asymptotic variance estimate is ȳ/n. The actual asymptotic variance that allows for variance misspecification is

V[Σ_i (∂μ_i/∂β)′[v(μ_i)]^{−1} Var(Y_i)[v(μ_i)]^{−1}(∂μ_i/∂β)]V = (μ/n)[Σ_i (1/μ)Var(Y_i)(1/μ)](μ/n) = (Σ_i Var(Y_i))/n²,

which is estimated by [Σ_i (Y_i − ȳ)²]/n². The model-based estimate tends to be better when the model holds, and the robust estimate tends to be better when there is severe overdispersion, so that the model-based estimate tends to underestimate the SE.

23. Since ∂μ_i/∂β = 1, u(β) = Σ_i (∂μ_i/∂β)[v(μ_i)]^{−1}(y_i − μ_i) = (1/σ²)Σ_i (y_i − μ_i) = (1/σ²)Σ_i (y_i − β). Setting this equal to 0, β̂ = (Σ_i y_i)/n = ȳ. Also,

V = [Σ_i (∂μ_i/∂β)′[v(μ_i)]^{−1}(∂μ_i/∂β)]^{−1} = [Σ_i (1/σ²)]^{−1} = σ²/n.

Also, the actual asymptotic variance that allows for variance misspecification is

V[Σ_i (∂μ_i/∂β)′[v(μ_i)]^{−1} Var(Y_i)[v(μ_i)]^{−1}(∂μ_i/∂β)]V = (σ²/n)[Σ_i (1/σ²)σ_i²(1/σ²)](σ²/n) = (Σ_i σ_i²)/n².

Replacing the true variance σ_i² in this expression by (y_i − ȳ)², the last expression simplifies (using μ̂_i = ȳ) to Σ_i (y_i − ȳ)²/n².

    26. a. GEE does not assume a parametric distribution, but only a variance function anda correlation structure.b. An advantage is being able to extend ordinary GLMs to allow for overdispersion, forinstance by permitting the variance to be some constant multiple of the variance for theusual GLM. A disadvantage is not having a likelihood function and related likelihood-ratio tests and confidence intervals.c. They are consistent if the model for the mean is correct, even if one misspecifies thevariance function and correlation structure. They are not consistent if one misspecifiesthe model for the mean.

31. b. For 0 < i < I, π_{i+1|i}(t) = π, π_{i−1|i}(t) = 1 − π, and π_{j|i} = 0 otherwise. For the absorbing states, π_{0|0} = 1 and π_{I|I} = 1.

    Chapter 13


3. For a given subject, the odds of having used cigarettes are estimated to equal exp[1.6209 − (−0.7751)] = 11.0 times the odds of having used marijuana. The large value of σ̂ = 3.5 reflects strong associations among the three responses.

6. a. β̂_B = 1.99 (SE = 0.35), β̂_C = 2.51 (SE = 0.37), with β̂_A = 0. E.g., for a given subject for any sequence, the estimated odds of relief for A are exp(−1.99) = 0.13 times the estimated odds for B (and odds ratio = 0.08 comparing A to C, and 0.59 comparing B and C). Taking into account SE values, B and C are better than A.

    b. Comparing the simpler model with the model in which treatment effects vary bysequence, double the change in maximized log likelihood is 13.6 on df = 10; P = 0.19 forcomparing models. The simpler model is adequate. Adding period effects to the simplermodel, the likelihood-ratio statistic = 0.5, df = 2, so the evidence of a period effect isweak.

7. a. For a given department, the estimated odds of admission for a female are exp(0.173) = 1.19 times the estimated odds of admission for a male. For the random effects model, for a given department, the estimated odds of admission for a female are exp(0.163) = 1.18 times the estimated odds of admission for a male.
b. The estimated mean log odds ratio between gender and admissions, given department, is 0.176, corresponding to an odds ratio of 1.19. Because of the extra variance component, permitting heterogeneity among departments, the estimate of β is not as precise. (Note that the marginal odds ratio of exp(−0.07) = 0.93 is in a different direction, corresponding to an odds of being admitted that is lower for females than for males. This is Simpson's paradox, and by the results in Chapter 9 on collapsibility it is possible when department is associated both with gender and with admissions.)
c. The random effects model assumes the true log odds ratios come from a normal distribution. It smooths the sample values, shrinking them toward a common mean.

11. When σ is large, subject-specific estimates from random effects models tend to be much larger than population-averaged estimates from marginal models. See Sec. 13.2.3.

13. β̂_{2M} − β̂_{1M} = 0.39 (SE = 0.09), β̂_{2A} − β̂_{1A} = 0.07 (SE = 0.06), with σ̂_1 = 4.1, σ̂_2 = 1.8, and estimated correlation 0.33 between the random effects.

15. b. When σ̂ is large, the log likelihood is flat and many N values are consistent with the sample. A narrower interval is not necessarily more reliable. If the model is incorrect, the actual coverage probability may be much less than the nominal probability.

23. When σ = 0, the model is equivalent to the marginal one deleting the random effect. Then, probability = odds/(1 + odds) = exp[logit(q_i) + β]/[1 + exp[logit(q_i) + β]]. Also, exp[logit(q_i)] = exp[log(q_i) − log(1 − q_i)] = q_i/(1 − q_i). The estimated probability is monotone increasing in q_i. Thus, as the Democratic vote in the previous election increases, so does the estimated Democratic vote in this election.
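The monotonicity claim can be illustrated numerically (the value β = 0.3 here is hypothetical):

```python
import math

def inv_logit(x):
    return math.exp(x) / (1 + math.exp(x))

def predicted_prob(q, beta):
    """Probability from adding beta on the logit scale: inv_logit(logit(q) + beta)."""
    logit_q = math.log(q / (1 - q))
    return inv_logit(logit_q + beta)

# Monotone increasing in q for fixed beta (hypothetical beta = 0.3)
probs = [predicted_prob(q, 0.3) for q in (0.2, 0.4, 0.6, 0.8)]
assert all(probs[k] < probs[k + 1] for k in range(3))

print([round(p, 3) for p in probs])  # [0.252, 0.474, 0.669, 0.844]
```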

25. a. P(Y_it = 1 | u_i) = Φ(x_it′β + z_it′u_i), so

P(Y_it = 1) = ∫ P(Z ≤ x_it′β + z_it′u_i) f(u; Σ) du_i,

where Z is a standard normal variate that is independent of u_i. Since Z − z_it′u_i has a N(0, 1 + z_it′Σz_it) distribution, the probability in the integrand is Φ(x_it′β[1 + z_it′Σz_it]^{−1/2}), which does not depend on u_i, so the integral is the same.
b. The parameters in the marginal model equal those in the GLMM divided by [1 + z_it′Σz_it]^{1/2}, which in the univariate case is √(1 + σ²).

27. b. Two terms drop out because μ̂_{11} = n_{11} and μ̂_{22} = n_{22}.
c. Setting β_0 = 0, the fit is that of the symmetry model, for which log(μ_{21}/μ_{12}) = 0.

    Chapter 14

    3. With exchangeable association, the same parameter applies for each associationterm. One can express each association term as that parameter times a product ofdummy variables for the capture outcome for each time. The predicted missing count is22.5, and the predicted population size is 90.5.

6. a. Including litter size as a predictor, its estimate is −0.102 with SE = 0.070. There is not strong evidence of a litter-size effect.
b. The estimates for the four groups are 0.32, 0.02, −0.03, and 0.02. Only the placebo group shows evidence of overdispersion.

7. a. The estimate should be minus infinity.
b. Using the QL approach with the beta-binomial form of variance, the estimated overdispersion parameter is 0.12.

10. a. No, because the Poisson distribution has variance equal to the mean and has mode equal to the integer part of the mean.
b. A difference of 0.308 in fitted log means corresponds to a fitted ratio of exp(0.308) = 1.36. The Wald CI is exp[0.308 ± 1.96(0.038)] = exp(0.2335, 0.3825) = (1.26, 1.47).
c. exp[0.308 ± 1.96(0.127)] = (1.06, 1.75). The CI is much wider for the negative binomial model, since the Poisson model exhibits considerable overdispersion and fits much worse.
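Both intervals follow from the same computation (the `ratio_ci` helper is illustrative, not from the text):

```python
import math

def ratio_ci(log_ratio, se, z=1.96):
    """Wald CI for a ratio of means: exponentiate endpoints of the log-scale CI."""
    return (math.exp(log_ratio - z * se), math.exp(log_ratio + z * se))

poisson_ci = ratio_ci(0.308, 0.038)  # Poisson model
nb_ci = ratio_ci(0.308, 0.127)       # negative binomial model

print(tuple(round(x, 2) for x in poisson_ci))  # (1.26, 1.47)
print(tuple(round(x, 2) for x in nb_ci))       # (1.06, 1.75)
```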

13. a. log μ̂ = 4.05 + 0.19x.
b. log μ̂ = 5.78 + 0.24x, with estimated standard deviation 1.0 for the random effect.

17. In the multinomial log likelihood,

Σ n_{y_1,...,y_T} log π_{y_1,...,y_T},

one substitutes

π_{y_1,...,y_T} = Σ_{z=1}^q [Π_{t=1}^T P(Y_t = y_t | Z = z)] P(Z = z).

    18. The null model falls on the boundary of the parameter space in which the weightsgiven the two components are (1, 0). For ordinary chi-squared distributions to apply, theparameter in the null must fall in the interior of the parameter space.

20. E(Y_it) = π_i = E(Y_it²), so Var(Y_it) = π_i − π_i² = π_i(1 − π_i). Also, Cov(Y_it, Y_is) = Corr(Y_it, Y_is)√[Var(Y_it)Var(Y_is)] = ρπ_i(1 − π_i). Then,

Var(Σ_t Y_it) = Σ_t Var(Y_it) + 2Σ_{s<t} Cov(Y_is, Y_it) = Tπ_i(1 − π_i)[1 + (T − 1)ρ].

Chapter 15

a. Predict disenrollment for ages between 70 and 83 if they have dementia or Parkinson's.
b. Predict disenrollment for no one.

4. a. Predict at least one satellite for the 88 crabs of width ≥ 25.85 cm and color in the three lowest categories, and for the 33 crabs of width < 25.85 cm and color in the two lowest categories. Predict no satellites for the other cases.
b. Those with the greater width (between 25.55 and 25.85) are predicted not to have satellites, whereas the logistic model predicts that width has a positive effect on the likelihood of a satellite (given the color).

10. b. For the Jaccard dissimilarity, the first step combines states with Jaccard dissimilarity 0, which is only California and Florida, and the other states stay as separate clusters. At the second step, New York and Texas are combined, having Jaccard dissimilarity 0.333. At subsequent steps, Illinois is joined with Massachusetts, then Michigan is joined with the California/Florida cluster, then the New York/Texas cluster is joined with the Illinois/Massachusetts cluster, then Washington is joined with the California/Florida/Michigan cluster, and then finally the two clusters left are joined to give a single cluster.

15. For such an application, the ratio d/p is very close to 1 (since for any given product, there is a high probability that neither person buys it). So the dissimilarity (b + c)/p would typically be very close to 0, even if there is no agreement between the people in what they buy.

    Chapter 16

1. a. The P-value is 0.22 for the Pearson test, 0.208 for the one-sided Fisher's exact test, and 0.245 for the two-sided Fisher's exact test based on summing all probabilities no greater than that observed.
b. The large-sample CI for the odds ratio is (0.51, 15.37), and the exact CI based on the Cornfield approach is (0.39, 31.04).

3. a. The margins all equal 1.0. Every table with these margins has each estimated expected frequency equal to 1/I, has I cell entries equal to 1 and (I² − I) entries equal to 0, and has the same probability (namely 1/I!, from formula 3.26). Thus the sum of probabilities of tables no more likely to occur than the observed table equals 1.0. This is the P-value for the test that orders the tables by their probabilities under H_0. (Also, each table has the same X2 value. Thus the probability of observing X2 at least as large as the given one is P = 1.0.)
b. There are I! different tables with the given margins, each equally likely. The largest possible value of C − D occurs for the observed table, so the null P[C − D ≥ observed] = 1/I!. Thus, using the information on ordering can result in a much different P-value than ignoring it.

9. Using the delta method, the asymptotic variance is (1 − 2π)²/4n. This vanishes when π = 1/2, and the convergence of the estimated standard deviation to the true value is then faster than the usual rate.

11.a. By the delta method with the square root function, √n[√(Tn/n) − √μ] is asymptotically normal with mean 0 and variance (1/2√μ)²(μ), or in other words √Tn − √(nμ) is asymptotically N(0, 1/4).
b. If g(p) = arcsin(√p), then g′(p) = [1/√(1 − p)][1/(2√p)] = 1/[2√(p(1 − p))], and the result follows using the delta method. Ordinary least squares assumes constant variance.
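Both variance-stabilizing claims can be checked by simulation. The sketch below assumes the standard setup for this kind of result (Tn Poisson with mean nμ for part a, a binomial proportion for part b); neither setup is spelled out in this file, so treat it as illustrative.

```python
# Monte Carlo check of the two variance-stabilizing transformations:
# sqrt(T_n) - sqrt(n*mu) should be approximately N(0, 1/4), and
# arcsin(sqrt(p_hat)) should have variance approximately 1/(4n).
import numpy as np

rng = np.random.default_rng(0)

# Part (a): assumed Poisson setup, with n*mu large so the limit applies
n_mu = 100.0
T = rng.poisson(n_mu, size=200_000)
var_sqrt = np.var(np.sqrt(T) - np.sqrt(n_mu))
print(var_sqrt)  # close to 0.25

# Part (b): binomial proportion with n = 400, p = 0.3
p_hat = rng.binomial(400, 0.3, size=200_000) / 400
var_arcsin = np.var(np.arcsin(np.sqrt(p_hat)))
print(var_arcsin)  # close to 1/(4*400) = 0.000625
```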

    17. The vector of partial derivatives, evaluated at the parameter value, is zero. Hencethe asymptotic normal distribution is degenerate, having a variance of zero. Using thesecond-order terms in the Taylor expansion yields an asymptotic chi-squared distribu-tion.

19.a. The vector ∂π/∂δ equals (2δ, 1 − 2δ, 1 − 2δ, −2(1 − δ)). Multiplying this by the diagonal matrix with elements [1/δ, [δ(1 − δ)]^(−1/2), [δ(1 − δ)]^(−1/2), 1/(1 − δ)] on the main diagonal shows that A is the 4×1 vector A = [2, (1 − 2δ)/[δ(1 − δ)]^(1/2), (1 − 2δ)/[δ(1 − δ)]^(1/2), −2]. Recall that δ̂ = (p1+ + p+1)/2. Since A′A = 8 + 2(1 − 2δ)²/[δ(1 − δ)] = 2/[δ(1 − δ)], the asymptotic variance of δ̂ is (A′A)^(−1)/n, which simplifies to δ(1 − δ)/2n. This is maximized when δ = 0.5, in which case the asymptotic variance is 1/8n. When δ = 0, then p1+ = p+1 = 0 with probability 1, so δ̂ = 0 with probability 1, and the asymptotic variance is 0. When δ = 1, δ̂ = 1 with probability 1, and the asymptotic variance is also 0. In summary, the asymptotic normality of δ̂ applies for 0 < δ < 1, that is, when δ is not on the boundary of the parameter space. This is one of the regularity conditions assumed in deriving results about asymptotic distributions.
b. The asymptotic covariance matrix is (∂π/∂δ)(A′A)^(−1)(∂π/∂δ)′ = [δ(1 − δ)/2][2δ, 1 − 2δ, 1 − 2δ, −2(1 − δ)]′[2δ, 1 − 2δ, 1 − 2δ, −2(1 − δ)].

23. X² and G² necessarily take very similar values when (1) the model holds, (2) the sample size n is large, and (3) the number of cells N is small compared to the sample size n, so that the expected frequencies in the cells are relatively large.
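The asymptotic variance δ(1 − δ)/2n in Exercise 19 can be verified by simulation. The sketch below assumes the cell probabilities π = (δ², δ(1 − δ), δ(1 − δ), (1 − δ)²) implied by the derivative vector above; that parameterization is my reading of the exercise, not stated explicitly in this file.

```python
# Monte Carlo check that Var(delta_hat) is about delta*(1-delta)/(2n),
# where delta_hat = (p1+ + p+1)/2, under the assumed cell probabilities
# pi = (delta^2, delta(1-delta), delta(1-delta), (1-delta)^2).
import numpy as np

rng = np.random.default_rng(1)
delta, n, reps = 0.3, 1000, 20000
pi = [delta**2, delta*(1-delta), delta*(1-delta), (1-delta)**2]

counts = rng.multinomial(n, pi, size=reps)  # reps tables of n observations
p = counts / n                              # sample cell proportions
# cells ordered (11, 12, 21, 22): p1+ = p11 + p12, p+1 = p11 + p21
delta_hat = ((p[:, 0] + p[:, 1]) + (p[:, 0] + p[:, 2])) / 2

emp = delta_hat.var()
theory = delta * (1 - delta) / (2 * n)      # = 1.05e-4 here
print(emp, theory)
```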

29. P(|P̂ − Po| ≤ B) = P(|P̂ − Po|/√(Po(1 − Po)/M) ≤ B/√(Po(1 − Po)/M)). By the approximate normality of P̂, this is approximately 1 − α if B/√(Po(1 − Po)/M) = z_(α/2). Solving for M gives the result.
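Solving B/√(Po(1 − Po)/M) = z_(α/2) for M gives M = z²_(α/2) Po(1 − Po)/B². A small helper (the function name is mine, not from the text):

```python
# Sample size so that P(|P_hat - Po| <= B) is approximately 1 - alpha,
# from M = z_{alpha/2}^2 * Po * (1 - Po) / B^2 as derived in Exercise 29.
import math
from scipy.stats import norm

def sample_size(B, Po, alpha=0.05):
    """Smallest M giving margin of error B at confidence level 1 - alpha."""
    z = norm.ppf(1 - alpha / 2)
    return math.ceil(z**2 * Po * (1 - Po) / B**2)

print(sample_size(B=0.05, Po=0.5))  # the familiar 385 for a 0.05 margin
```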

31. For every possible outcome the Clopper-Pearson CI contains 0.5. For example, when y = 5, the CI is (0.478, 1.0), since for π0 = 0.478 the binomial probability of y = 5 is 0.478⁵ = 0.025.
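The quoted lower endpoint can be checked directly. Assuming y = n = 5 (consistent with the probability 0.478⁵ above), the Clopper-Pearson lower limit is the π0 whose binomial probability of observing y = 5 equals 0.025, i.e. π0⁵ = 0.025, which equals the Beta(y, n − y + 1) quantile:

```python
# Clopper-Pearson lower endpoint for y = n = 5 successes, via the
# standard beta-quantile representation of the exact interval.
from scipy.stats import beta

y, n = 5, 5
lower = beta.ppf(0.025, y, n - y + 1)  # lower endpoint of the 95% CI
print(round(lower, 3))                 # 0.478
print(round(lower**5, 3))              # 0.025, the exact tail probability
```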

33. The conditional log-likelihood for the odds ratio $\theta$, given the margins, is based on the noncentral hypergeometric distribution. With $m_- \le n_{11} \le m_+$ the range of values of $n_{11}$ consistent with the margins,
\[
L(\theta) = \log\left[\frac{\binom{n_{1+}}{n_{11}}\binom{n-n_{1+}}{n_{+1}-n_{11}}\theta^{n_{11}}}
{\sum_{u=m_-}^{m_+}\binom{n_{1+}}{u}\binom{n-n_{1+}}{n_{+1}-u}\theta^{u}}\right]
= n_{11}\log\theta - \log\left[\sum_u \binom{n_{1+}}{u}\binom{n-n_{1+}}{n_{+1}-u}\theta^{u}\right] + \text{constant}.
\]
Then
\[
\frac{\partial L}{\partial\theta} = \frac{n_{11}}{\theta} -
\frac{\sum_u u\binom{n_{1+}}{u}\binom{n-n_{1+}}{n_{+1}-u}\theta^{u-1}}
{\sum_u \binom{n_{1+}}{u}\binom{n-n_{1+}}{n_{+1}-u}\theta^{u}},
\]
so, using the identity $u\binom{n_{1+}}{u} = n_{1+}\binom{n_{1+}-1}{u-1}$, $\hat\theta$ satisfies
\[
n_{11} = n_{1+}\,
\frac{\sum_u \binom{n_{1+}-1}{u-1}\binom{n-n_{1+}}{n_{+1}-u}\hat\theta^{u}}
{\sum_u \binom{n_{1+}}{u}\binom{n-n_{1+}}{n_{+1}-u}\hat\theta^{u}}.
\]
Now,
\[
E_\theta(n_{11}) =
\frac{\sum_u u\binom{n_{1+}}{u}\binom{n-n_{1+}}{n_{+1}-u}\theta^{u}}
{\sum_u \binom{n_{1+}}{u}\binom{n-n_{1+}}{n_{+1}-u}\theta^{u}}
= n_{1+}\,
\frac{\sum_u \binom{n_{1+}-1}{u-1}\binom{n-n_{1+}}{n_{+1}-u}\theta^{u}}
{\sum_u \binom{n_{1+}}{u}\binom{n-n_{1+}}{n_{+1}-u}\theta^{u}}.
\]
Thus the ML estimate $\hat\theta$ satisfies $E_{\hat\theta}(n_{11}) = n_{11}$.
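The estimating equation E(n11) = n11 can be solved numerically. The sketch below does so for a hypothetical 2×2 table, computing the noncentral hypergeometric mean of n11 over the support determined by the margins and root-finding in θ.

```python
# Conditional ML estimate of the odds ratio theta for a 2x2 table,
# obtained by solving E_theta(n11) = n11 over the noncentral
# hypergeometric distribution.  The table counts are hypothetical.
from math import comb
from scipy.optimize import brentq

def expected_n11(theta, n1p, np1, n):
    """E_theta(n11) given margins n1+ = n1p, n+1 = np1 and total n."""
    lo = max(0, n1p + np1 - n)   # m_-: smallest feasible n11
    hi = min(n1p, np1)           # m_+: largest feasible n11
    support = range(lo, hi + 1)
    weights = [comb(n1p, u) * comb(n - n1p, np1 - u) * theta**u
               for u in support]
    total = sum(weights)
    return sum(u * w for u, w in zip(support, weights)) / total

n11, n1p, np1, n = 6, 10, 8, 20  # hypothetical counts: n11 interior to support
theta_hat = brentq(lambda t: expected_n11(t, n1p, np1, n) - n11, 1e-6, 1e6)
print(theta_hat)
print(expected_n11(theta_hat, n1p, np1, n))  # equals n11 = 6
```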