
Exercises and Solutions in Biostatistical Theory (2010)



Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2011 by Taylor and Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-58488-722-5 (Paperback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Kupper, Lawrence L.
Exercises and solutions in biostatistical theory / Lawrence L. Kupper, Sean M. O’Brien, Brian H. Neelon.
p. cm. -- (Chapman & Hall/CRC texts in statistical science series)
Includes bibliographical references and index.
ISBN 978-1-58488-722-5 (pbk. : alk. paper)
1. Biometry--Problems, exercises, etc. I. O’Brien, Sean M. II. Neelon, Brian H. III. Title.

QH323.5.K87 2010
570.1’5195--dc22    2010032496

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com


To my wonderful wife Sandy, to the hundreds of students who have taken my

courses in biostatistical theory, and to the many students and colleagues who have

collaborated with me on publications involving both theoretical and applied

biostatistical research.

Lawrence L. Kupper

To Sara, Oscar, and my parents for their unwavering support, and to Larry,

a true mentor.

Brian H. Neelon

To Sarah and Avery, for support and inspiration.

Sean M. O’Brien


Contents

Preface
Acknowledgments
Authors

1. Basic Probability Theory
   1.1 Concepts and Notation
       1.1.1 Counting Formulas
           1.1.1.1 N-tuples
           1.1.1.2 Permutations
           1.1.1.3 Combinations
           1.1.1.4 Pascal’s Identity
           1.1.1.5 Vandermonde’s Identity
       1.1.2 Probability Formulas
           1.1.2.1 Definitions
           1.1.2.2 Mutually Exclusive Events
           1.1.2.3 Conditional Probability
           1.1.2.4 Independence
           1.1.2.5 Partitions and Bayes’ Theorem
   Exercises
   Solutions

2. Univariate Distribution Theory
   2.1 Concepts and Notation
       2.1.1 Discrete and Continuous Random Variables
       2.1.2 Cumulative Distribution Functions
       2.1.3 Median and Mode
       2.1.4 Expectation Theory
       2.1.5 Some Important Expectations
           2.1.5.1 Mean
           2.1.5.2 Variance
           2.1.5.3 Moments
           2.1.5.4 Moment Generating Function
           2.1.5.5 Probability Generating Function
       2.1.6 Inequalities Involving Expectations
           2.1.6.1 Markov’s Inequality
           2.1.6.2 Jensen’s Inequality
           2.1.6.3 Hölder’s Inequality
       2.1.7 Some Important Probability Distributions for Discrete Random Variables
           2.1.7.1 Binomial Distribution
           2.1.7.2 Negative Binomial Distribution
           2.1.7.3 Poisson Distribution
           2.1.7.4 Hypergeometric Distribution
       2.1.8 Some Important Distributions (i.e., Density Functions) for Continuous Random Variables
           2.1.8.1 Normal Distribution
           2.1.8.2 Lognormal Distribution
           2.1.8.3 Gamma Distribution
           2.1.8.4 Beta Distribution
           2.1.8.5 Uniform Distribution
   Exercises
   Solutions

3. Multivariate Distribution Theory
   3.1 Concepts and Notation
       3.1.1 Discrete and Continuous Multivariate Distributions
       3.1.2 Multivariate Cumulative Distribution Functions
       3.1.3 Expectation Theory
           3.1.3.1 Covariance
           3.1.3.2 Correlation
           3.1.3.3 Moment Generating Function
       3.1.4 Marginal Distributions
       3.1.5 Conditional Distributions and Expectations
       3.1.6 Mutual Independence among a Set of Random Variables
       3.1.7 Random Sample
       3.1.8 Some Important Multivariate Discrete and Continuous Probability Distributions
           3.1.8.1 Multinomial
           3.1.8.2 Multivariate Normal
       3.1.9 Special Topics of Interest
           3.1.9.1 Mean and Variance of a Linear Function of Random Variables
           3.1.9.2 Convergence in Distribution
           3.1.9.3 Order Statistics
           3.1.9.4 Method of Transformations
   Exercises
   Solutions

4. Estimation Theory
   4.1 Concepts and Notation
       4.1.1 Point Estimation of Population Parameters
           4.1.1.1 Method of Moments (MM)
           4.1.1.2 Unweighted Least Squares (ULS)
           4.1.1.3 Weighted Least Squares (WLS)
           4.1.1.4 Maximum Likelihood (ML)
       4.1.2 Data Reduction and Joint Sufficiency
           4.1.2.1 Joint Sufficiency
           4.1.2.2 Factorization Theorem
       4.1.3 Methods for Evaluating the Properties of a Point Estimator
           4.1.3.1 Mean-Squared Error (MSE)
           4.1.3.2 Cramér–Rao Lower Bound (CRLB)
           4.1.3.3 Efficiency
           4.1.3.4 Completeness
           4.1.3.5 Rao–Blackwell Theorem
       4.1.4 Interval Estimation of Population Parameters
           4.1.4.1 Exact Confidence Intervals
           4.1.4.2 Exact CI for the Mean of a Normal Distribution
           4.1.4.3 Exact CI for a Linear Combination of Means of Normal Distributions
           4.1.4.4 Exact CI for the Variance of a Normal Distribution
           4.1.4.5 Exact CI for the Ratio of Variances of Two Normal Distributions
           4.1.4.6 Large-Sample Approximate CIs
           4.1.4.7 Consistency
           4.1.4.8 Slutsky’s Theorem
           4.1.4.9 Construction of ML-Based CIs
           4.1.4.10 ML-Based CI for a Bernoulli Distribution Probability
           4.1.4.11 Delta Method
           4.1.4.12 Delta Method CI for a Function of a Bernoulli Distribution Probability
   Exercises
   Solutions

5. Hypothesis Testing Theory
   5.1 Concepts and Notation
       5.1.1 Basic Principles
           5.1.1.1 Simple and Composite Hypotheses
           5.1.1.2 Null and Alternative Hypotheses
           5.1.1.3 Statistical Tests
           5.1.1.4 Type I and Type II Errors
           5.1.1.5 Power
           5.1.1.6 Test Statistics and Rejection Regions
           5.1.1.7 P-Values
       5.1.2 Most Powerful (MP) and Uniformly Most Powerful (UMP) Tests
           5.1.2.1 Review of Notation
       5.1.3 Large-Sample ML-Based Methods for Testing the Simple Null Hypothesis H0: θ = θ0 (i.e., θ ∈ ω) versus the Composite Alternative Hypothesis H1: θ ∈ ω̄
           5.1.3.1 Likelihood Ratio Test
           5.1.3.2 Wald Test
           5.1.3.3 Score Test
       5.1.4 Large-Sample ML-Based Methods for Testing the Composite Null Hypothesis H0: θ ∈ ω versus the Composite Alternative Hypothesis H1: θ ∈ ω̄
           5.1.4.1 Likelihood Ratio Test
           5.1.4.2 Wald Test
           5.1.4.3 Score Test
   Exercises
   Solutions

Appendix: Useful Mathematical Results
   A.1 Summations
   A.2 Limits
   A.3 Important Calculus-Based Results
   A.4 Special Functions
   A.5 Approximations
   A.6 Lagrange Multipliers

References

Index


Preface

This exercises-and-solutions book contains exercises and their detailed solutions covering statistical theory (from basic probability theory through the theory of statistical inference) that is taught in courses taken by advanced undergraduate students, and first-year and second-year graduate students, in many quantitative disciplines (e.g., statistics, biostatistics, mathematics, engineering, physics, computer science, psychometrics, epidemiology, etc.).

The motivation for, and the contents of this book, stem mainly from the classroom teaching experiences of author Lawrence L. Kupper, who has taught graduate-level courses in biostatistical theory for almost four decades as a faculty member with the University of North Carolina Department of Biostatistics. These courses have been uniformly and widely praised by students for their rigor, clarity, and use of real-life settings to illustrate the practical utility of the theoretical concepts being taught. Several exercises in this book have been motivated by actual biostatistical collaborative research experiences (including those of the three authors), where theoretical biostatistical principles have been used to address complicated research design and analysis issues (especially in fields related to the health sciences). The authors strongly believe that the best way to obtain an in-depth understanding of the principles of biostatistical theory is to work through exercises whose solutions require nontrivial and illustrative utilization of relevant theoretical concepts. The exercises in this book have been prepared with this belief in mind. Mastery of the theoretical statistical strategies needed to solve the exercises in this book will prepare the reader for successful study of even higher-level statistical theory.

The exercises and their detailed solutions are divided into five chapters: Basic Probability Theory; Univariate Distribution Theory; Multivariate Distribution Theory; Estimation Theory; and Hypothesis Testing Theory. The chapters are arranged sequentially in the sense that a good understanding of basic probability theory is needed for exercises dealing with univariate distribution theory, and univariate distribution theory provides the basis for the extensions to multivariate distribution theory. The material in the first three chapters is needed for the exercises on statistical inference that constitute the last two chapters of the book. The exercises in each chapter vary in level of difficulty from fairly basic to challenging, with more difficult exercises identified with an asterisk. Each of the five chapters begins with a detailed introduction summarizing the statistical concepts needed to help solve the exercises in that chapter of the book. The book also contains a brief summary of some useful mathematical results (see Appendix A).

The main mathematical prerequisite for this book is an excellent working knowledge of multivariable calculus, along with some basic knowledge about matrices (e.g., matrix multiplication, the inverse of a matrix, etc.).

This exercises-and-solutions book is not meant to be used as the main textbook for a course on statistical theory. Some examples of excellent main textbooks on statistical theory include Casella and Berger (2002), Hogg, Craig, and McKean (2005), Kalbfleisch (1985), Ross (2006), and Wackerly, Mendenhall III, and Scheaffer (2008). Rather, our book should serve as a supplemental source of a wide variety of exercises and their detailed solutions both for advanced undergraduate and graduate students who take such courses in statistical theory and for the instructors of such courses. In addition, our book will be useful to individuals who are interested in enhancing and/or refreshing their own theoretical statistical skills. The solutions to all exercises are presented in sufficient detail so that users of the book can see how the relevant statistical theory is used in a logical manner to address important statistical questions in a wide variety of settings.

Lawrence L. Kupper
Brian H. Neelon
Sean M. O’Brien


Acknowledgments

Lawrence L. Kupper acknowledges the hundreds of students who have taken his classes in biostatistical theory. Many of these students have provided valuable feedback on the lectures, homework sets, and examinations that make up most of the material for this book. In fact, two of these excellent former students are coauthors of this book (Brian H. Neelon and Sean M. O’Brien). The authors want to personally thank Dr. Susan Reade-Christopher for helping with the construction of some exercises and solutions, and they want to thank the reviewers of this book for their helpful suggestions. Finally, the authors acknowledge the fact that some exercises may overlap in concept with exercises found in other statistical theory books; such conceptual overlap is unavoidable given the breadth of material being covered.


Authors

Lawrence L. Kupper, PhD, is emeritus alumni distinguished professor of biostatistics, School of Public Health, University of North Carolina (UNC), Chapel Hill, North Carolina. Dr. Kupper is a fellow of the American Statistical Association (ASA), and he received a Distinguished Achievement Medal from the ASA’s Environmental Statistics Section for his research, teaching, and service contributions. During his 40 academic years at UNC, Dr. Kupper has won several classroom teaching and student mentoring awards. He has coauthored over 160 papers in peer-reviewed journals, and he has published several coauthored book chapters. Dr. Kupper has also coauthored three textbooks, namely, Epidemiologic Research—Principles and Quantitative Methods, Applied Regression Analysis and Other Multivariable Methods (four editions), and Quantitative Exposure Assessment. The contents of this exercises-and-solutions book come mainly from course materials developed and used by Dr. Kupper for his graduate-level courses in biostatistical theory, taught over a period of more than three decades.

Brian H. Neelon, PhD, is a research statistician with the Children’s Environmental Health Initiative in the Nicholas School of the Environment at Duke University. He obtained his doctorate from the University of North Carolina, Chapel Hill, where he received the Kupper Dissertation Award for outstanding dissertation-based publication. Before arriving at Duke University, Dr. Neelon was a postdoctoral research fellow in the Department of Health Care Policy at Harvard University. His research interests include Bayesian methods, longitudinal data analysis, health policy statistics, and environmental health.

Sean M. O’Brien, PhD, is an assistant professor in the Department of Biostatistics & Bioinformatics at the Duke University School of Medicine. He works primarily on studies of cardiovascular interventions using large multicenter clinical registries. He is currently statistical director of the Society of Thoracic Surgeons National Data Warehouse at Duke Clinical Research Institute. His methodological contributions are in the areas of healthcare provider performance evaluation, development of multidimensional composite measures, and clinical risk adjustment. Before joining Duke University, he was a research fellow at the National Institute of Environmental Health Sciences. He received his PhD in biostatistics from the University of North Carolina at Chapel Hill in 2002.


1. Basic Probability Theory

1.1 Concepts and Notation

1.1.1 Counting Formulas

1.1.1.1 N-tuples

With sets {a1, a2, . . . , aq} and {b1, b2, . . . , bs} containing q and s distinct items, respectively, it is possible to form qs distinct pairs (or 2-tuples) of the form (ai, bj), i = 1, 2, . . . , q and j = 1, 2, . . . , s. Adding a third set {c1, c2, . . . , ct} containing t distinct items, it is possible to form qst distinct triplets (or 3-tuples) of the form (ai, bj, ck), i = 1, 2, . . . , q, j = 1, 2, . . . , s, and k = 1, 2, . . . , t. Extensions to more than three sets of distinct items are straightforward.

1.1.1.2 Permutations

A permutation is defined to be an ordered arrangement of r distinct items. The number of distinct ways of arranging n distinct items using r at a time is denoted $P^n_r$ and is computed as

$$P^n_r = \frac{n!}{(n-r)!},$$

where $n! = n(n-1)(n-2)\cdots(3)(2)(1)$ and where $0! \equiv 1$. If the n items are not distinct, then the number of distinct permutations is less than $P^n_r$.

1.1.1.3 Combinations

The number of ways of dividing n distinct items into k distinct groups, with the ith group containing $n_i$ items, where $n = \sum_{i=1}^{k} n_i$, is equal to

$$\frac{n!}{n_1! n_2! \cdots n_k!} = \frac{n!}{\left(\prod_{i=1}^{k} n_i!\right)}.$$


The above expression appears in the multinomial expansion

$$(x_1 + x_2 + \cdots + x_k)^n = \sum{}^{*}\, \frac{n!}{\left(\prod_{i=1}^{k} n_i!\right)}\, x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k},$$

where the summation symbol $\sum^{*}$ indicates summation over all possible values of $n_1, n_2, \ldots, n_k$, with $n_i$, $i = 1, 2, \ldots, k$, taking the set of possible values $\{0, 1, \ldots, n\}$ subject to the restriction $\sum_{i=1}^{k} n_i = n$.

With $x_1 = x_2 = \cdots = x_k = 1$, it follows that

$$\sum{}^{*}\, \frac{n!}{\left(\prod_{i=1}^{k} n_i!\right)} = k^n.$$

As an important special case, when k = 2, then

$$\frac{n!}{n_1! n_2!} = \frac{n!}{n_1!(n - n_1)!} = C^n_{n_1},$$

which is also the number of ways of selecting without replacement $n_1$ items from a set of n distinct items (i.e., the number of combinations of n distinct items selected $n_1$ at a time).

The above combinatorial expression appears in the binomial expansion

$$(x_1 + x_2)^n = \sum{}^{*}\, \frac{n!}{n_1! n_2!}\, x_1^{n_1} x_2^{n_2} = \sum_{n_1=0}^{n} C^n_{n_1}\, x_1^{n_1} x_2^{n - n_1}.$$

When $x_1 = x_2 = 1$, it follows that

$$\sum_{n_1=0}^{n} C^n_{n_1} = 2^n.$$

Example

As a simple example using the above counting formulas, if 5 cards are dealt from a well-shuffled standard deck of 52 playing cards, the number of ways in which such a 5-card hand would contain exactly 2 aces is equal to $qs = C^4_2 C^{48}_3 = 103{,}776$, where $q = C^4_2 = 6$ is the number of ways of selecting 2 of the 4 aces and where $s = C^{48}_3 = 17{,}296$ is the number of ways of selecting 3 of the remaining 48 cards.
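
These counts are easy to reproduce numerically; the short Python sketch below is an illustration (not part of the original text) using the standard library’s math.comb function, which computes the binomial coefficient $C^n_k$ directly.

```python
# Illustrative check of the 5-card example, using Python's built-in
# binomial coefficient (math.comb, available in Python 3.8+).
from math import comb

q = comb(4, 2)       # ways to choose 2 of the 4 aces: 6
s = comb(48, 3)      # ways to choose 3 of the remaining 48 cards: 17,296
print(q * s)         # 103776 distinct hands containing exactly 2 aces
```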

1.1.1.4 Pascal’s Identity

$$C^n_k = C^{n-1}_{k-1} + C^{n-1}_k$$

for any positive integers n and k, where $C^n_k \equiv 0$ if $k > n$.


1.1.1.5 Vandermonde’s Identity

$$C^{m+n}_r = \sum_{k=0}^{r} C^m_{r-k}\, C^n_k,$$

where m, n, and r are nonnegative integers satisfying $r \leq \min\{m, n\}$.
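
Both identities are straightforward to verify numerically for small arguments; the brute-force Python sketch below is offered only as an illustration.

```python
# Brute-force verification of Pascal's and Vandermonde's identities
# over a small grid of arguments (math.comb returns 0 when k > n).
from math import comb

# Pascal's identity: C(n, k) = C(n-1, k-1) + C(n-1, k)
assert all(comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k)
           for n in range(1, 15) for k in range(1, n + 1))

# Vandermonde's identity: C(m+n, r) = sum over k of C(m, r-k) C(n, k)
assert all(comb(m + n, r) == sum(comb(m, r - k) * comb(n, k)
                                 for k in range(r + 1))
           for m in range(1, 10) for n in range(1, 10)
           for r in range(min(m, n) + 1))
print("both identities hold on the tested grid")
```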

1.1.2 Probability Formulas

1.1.2.1 Definitions

Let an experiment be any process via which an observation or measurement is made. An experiment can range from a very controlled experimental situation to an uncontrolled observational situation. An example of the former situation would be a laboratory experiment where chosen amounts of different chemicals are mixed together to produce a certain chemical product. An example of the latter situation would be an epidemiological study where subjects are randomly selected and interviewed about their smoking and physical activity habits.

Let A1, A2, . . . , Ap be p (≥ 2) possible events (or outcomes) that could occur when an experiment is conducted. Then:

1. For i = 1, 2, . . . , p, the complement of the event $A_i$, denoted $\bar{A}_i$, is the event that $A_i$ does not occur when the experiment is conducted.

2. The union of the events $A_1, A_2, \ldots, A_p$, denoted $\cup_{i=1}^{p} A_i$, is the event that at least one of the events $A_1, A_2, \ldots, A_p$ occurs when the experiment is conducted.

3. The intersection of the events $A_1, A_2, \ldots, A_p$, denoted $\cap_{i=1}^{p} A_i$, is the event that all of the events $A_1, A_2, \ldots, A_p$ occur when the experiment is conducted.

Given these definitions, we have the following probabilistic results, where pr($A_i$), $0 \leq \text{pr}(A_i) \leq 1$, denotes the probability that event $A_i$ occurs when the experiment is conducted:

(i) $\text{pr}(\bar{A}_i) = 1 - \text{pr}(A_i)$. More generally,

$$\text{pr}\left(\overline{\cup_{i=1}^{p} A_i}\right) = 1 - \text{pr}\left(\cup_{i=1}^{p} A_i\right) = \text{pr}\left(\cap_{i=1}^{p} \bar{A}_i\right)$$

and

$$\text{pr}\left(\overline{\cap_{i=1}^{p} A_i}\right) = 1 - \text{pr}\left(\cap_{i=1}^{p} A_i\right) = \text{pr}\left(\cup_{i=1}^{p} \bar{A}_i\right).$$


(ii) The probability of the union of p events is given by

$$\text{pr}\left(\cup_{i=1}^{p} A_i\right) = \sum_{i=1}^{p} \text{pr}(A_i) - \sum_{i=1}^{p-1} \sum_{j=i+1}^{p} \text{pr}(A_i \cap A_j) + \sum_{i=1}^{p-2} \sum_{j=i+1}^{p-1} \sum_{k=j+1}^{p} \text{pr}(A_i \cap A_j \cap A_k) - \cdots + (-1)^{p-1}\, \text{pr}\left(\cap_{i=1}^{p} A_i\right).$$

As important special cases, we have, for p = 2,

$$\text{pr}(A_1 \cup A_2) = \text{pr}(A_1) + \text{pr}(A_2) - \text{pr}(A_1 \cap A_2)$$

and, for p = 3,

$$\text{pr}(A_1 \cup A_2 \cup A_3) = \text{pr}(A_1) + \text{pr}(A_2) + \text{pr}(A_3) - \text{pr}(A_1 \cap A_2) - \text{pr}(A_1 \cap A_3) - \text{pr}(A_2 \cap A_3) + \text{pr}(A_1 \cap A_2 \cap A_3).$$
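
The p = 3 formula can be illustrated concretely on the 36-point sample space generated by tossing a pair of balanced dice; the Python sketch below uses three hypothetical events chosen only for illustration.

```python
# Checking the p = 3 union formula on the two-dice sample space.
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
pr = lambda event: Fraction(len(event), len(omega))

A1 = {w for w in omega if sum(w) % 4 == 0}     # sum divisible by 4
A2 = {w for w in omega if sum(w) > 9}          # sum greater than 9
A3 = {w for w in omega if sum(w) % 2 == 0}     # sum is even

lhs = pr(A1 | A2 | A3)
rhs = (pr(A1) + pr(A2) + pr(A3)
       - pr(A1 & A2) - pr(A1 & A3) - pr(A2 & A3)
       + pr(A1 & A2 & A3))
assert lhs == rhs                              # inclusion-exclusion holds
```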

1.1.2.2 Mutually Exclusive Events

For $i \neq j$, two events $A_i$ and $A_j$ are said to be mutually exclusive if these two events cannot both occur (i.e., cannot occur together) when the experiment is conducted; equivalently, the events $A_i$ and $A_j$ are mutually exclusive when $\text{pr}(A_i \cap A_j) = 0$. If the p events $A_1, A_2, \ldots, A_p$ are pairwise mutually exclusive, that is, if $\text{pr}(A_i \cap A_j) = 0$ for every $i \neq j$, then

$$\text{pr}\left(\cup_{i=1}^{p} A_i\right) = \sum_{i=1}^{p} \text{pr}(A_i),$$

since pairwise mutual exclusivity implies that any intersection involving more than two events must necessarily have probability zero of occurring.

1.1.2.3 Conditional Probability

For $i \neq j$, the conditional probability that event $A_i$ occurs given that (or conditional on the fact that) event $A_j$ occurs when the experiment is conducted, denoted $\text{pr}(A_i|A_j)$, is given by the expression

$$\text{pr}(A_i|A_j) = \frac{\text{pr}(A_i \cap A_j)}{\text{pr}(A_j)}, \quad \text{pr}(A_j) > 0.$$


Using the above definition, we then have:

$$\text{pr}\left(\cap_{i=1}^{p} A_i\right) = \text{pr}\left(A_p \mid \cap_{i=1}^{p-1} A_i\right) \text{pr}\left(\cap_{i=1}^{p-1} A_i\right)$$
$$= \text{pr}\left(A_p \mid \cap_{i=1}^{p-1} A_i\right) \text{pr}\left(A_{p-1} \mid \cap_{i=1}^{p-2} A_i\right) \text{pr}\left(\cap_{i=1}^{p-2} A_i\right)$$
$$\vdots$$
$$= \text{pr}\left(A_p \mid \cap_{i=1}^{p-1} A_i\right) \text{pr}\left(A_{p-1} \mid \cap_{i=1}^{p-2} A_i\right) \cdots \text{pr}(A_2|A_1)\,\text{pr}(A_1).$$

Note that there would be p! ways of writing the above product of p probabilities. For example, when p = 3, we have

$$\text{pr}(A_1 \cap A_2 \cap A_3) = \text{pr}(A_3|A_1 \cap A_2)\,\text{pr}(A_2|A_1)\,\text{pr}(A_1)$$
$$= \text{pr}(A_2|A_1 \cap A_3)\,\text{pr}(A_1|A_3)\,\text{pr}(A_3)$$
$$= \text{pr}(A_1|A_2 \cap A_3)\,\text{pr}(A_3|A_2)\,\text{pr}(A_2), \text{ and so on.}$$

1.1.2.4 Independence

The events $A_i$ and $A_j$ are said to be independent events if and only if the following equivalent probability statements are true:

1. $\text{pr}(A_i|A_j) = \text{pr}(A_i)$;
2. $\text{pr}(A_j|A_i) = \text{pr}(A_j)$;
3. $\text{pr}(A_i \cap A_j) = \text{pr}(A_i)\,\text{pr}(A_j)$.

When the events $A_1, A_2, \ldots, A_p$ are mutually independent, so that the conditional probability of any event is equal to the unconditional probability of that same event, then

$$\text{pr}\left(\cap_{i=1}^{p} A_i\right) = \prod_{i=1}^{p} \text{pr}(A_i).$$
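
For a concrete illustration of statement 3, consider again the toss of a pair of balanced dice; the two hypothetical events in the Python sketch below satisfy the product rule exactly.

```python
# Independence on the two-dice sample space: "first die is even" and
# "second die shows 6" satisfy pr(A ∩ B) = pr(A)pr(B) = (1/2)(1/6).
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))
pr = lambda event: Fraction(len(event), len(omega))

A = {w for w in omega if w[0] % 2 == 0}   # pr(A) = 1/2
B = {w for w in omega if w[1] == 6}       # pr(B) = 1/6
assert pr(A & B) == pr(A) * pr(B)         # both sides equal 1/12
```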

1.1.2.5 Partitions and Bayes’ Theorem

When $\text{pr}\left(\cup_{i=1}^{p} A_i\right) = 1$, and when the events $A_1, A_2, \ldots, A_p$ are pairwise mutually exclusive, then the events $A_1, A_2, \ldots, A_p$ are said to constitute a partition of the experimental outcomes; in other words, when the experiment is conducted, exactly one and only one of the events $A_1, A_2, \ldots, A_p$ must occur. If B is any event and $A_1, A_2, \ldots, A_p$ constitute a partition, it follows that

$$\text{pr}(B) = \text{pr}\left[B \cap \left(\cup_{i=1}^{p} A_i\right)\right] = \text{pr}\left[\cup_{i=1}^{p} (B \cap A_i)\right] = \sum_{i=1}^{p} \text{pr}(B \cap A_i) = \sum_{i=1}^{p} \text{pr}(B|A_i)\,\text{pr}(A_i).$$

As an illustration of the use of the above formula, if the events $A_1, A_2, \ldots, A_p$ represent an exhaustive list of all p possible causes of some observed outcome B, where pr(B) > 0, then, given values for $\text{pr}(A_i)$ and $\text{pr}(B|A_i)$ for all $i = 1, 2, \ldots, p$, one can employ Bayes’ Theorem to compute the probability that $A_i$ was the cause of the observed outcome B, namely,

$$\text{pr}(A_i|B) = \frac{\text{pr}(A_i \cap B)}{\text{pr}(B)} = \frac{\text{pr}(B|A_i)\,\text{pr}(A_i)}{\sum_{j=1}^{p} \text{pr}(B|A_j)\,\text{pr}(A_j)}, \quad i = 1, 2, \ldots, p.$$

Note that $\sum_{i=1}^{p} \text{pr}(A_i|B) = 1$.
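
The partition formula and Bayes’ Theorem translate directly into a few lines of code; the Python sketch below uses arbitrary illustrative values for the pr(Ai) and pr(B|Ai), not values taken from the text.

```python
# Bayes' Theorem for a partition A1, ..., Ap: posterior probabilities
# pr(Ai|B) from priors pr(Ai) and conditional probabilities pr(B|Ai).
def bayes_posteriors(priors, likelihoods):
    joint = [p * l for p, l in zip(priors, likelihoods)]
    pr_B = sum(joint)                  # pr(B), via the partition formula
    return [j / pr_B for j in joint]

# Hypothetical values: pr(A1) = 0.30, pr(A2) = 0.45, pr(A3) = 0.25.
posts = bayes_posteriors([0.30, 0.45, 0.25], [0.02, 0.05, 0.01])
print(posts, sum(posts))               # the posteriors sum to 1
```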

As an important special case, suppose that the events $A_1, A_2, \ldots, A_p$ constituting a partition are elementary events in the sense that none of these p events can be further decomposed into smaller events (i.e., for $i = 1, 2, \ldots, p$, the event $A_i$ cannot be written as a union of mutually exclusive events each having a smaller probability than $A_i$ of occurring when the experiment is conducted). Then, any more complex event B (sometimes called a compound event) must be able to be represented as the union of two or more of the elementary events $A_1, A_2, \ldots, A_p$. In particular, with $2 \leq m \leq p$, if

$$B = \cup_{j=1}^{m} A_{i_j},$$

where the set of positive integers $\{i_1, i_2, \ldots, i_m\}$ is a subset of the set of positive integers $\{1, 2, \ldots, p\}$, then

$$\text{pr}(B) = \sum_{j=1}^{m} \text{pr}(A_{i_j}).$$

In the very special case when the elementary events $A_1, A_2, \ldots, A_p$ are equally likely to occur, so that $\text{pr}(A_i) = \frac{1}{p}$ for $i = 1, 2, \ldots, p$, then $\text{pr}(B) = \frac{m}{p}$.

Example

To continue an earlier example, there would be $p = C^{52}_5 = 2{,}598{,}960$ possible 5-card hands that could be dealt from a well-shuffled standard deck of 52 playing cards. Thus, each such 5-card hand has probability $\frac{1}{2{,}598{,}960}$ of occurring. If B is the event that a 5-card hand contains exactly two aces, then

$$\text{pr}(B) = \frac{m}{p} = \frac{103{,}776}{2{,}598{,}960} = 0.0399.$$
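
As before, this probability is easy to confirm numerically (an illustrative sketch):

```python
# pr(exactly two aces in a randomly dealt 5-card hand).
from math import comb

print(comb(4, 2) * comb(48, 3) / comb(52, 5))   # 0.0399...
```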


EXERCISES

Exercise 1.1. Suppose that a pair of balanced dice is tossed. Let Ex be the event that the sum of the two numbers obtained is equal to x, x = 2, 3, . . . , 12.

(a) Develop an explicit expression for pr(Ex).

(b) Let A be the event that “x is divisible by 4,” let B be the event that “x is greater than 9,” and let C be the event that “x is not a prime number.” Find the numerical values of the following probabilities: pr(A), pr(B), pr(C), pr(A ∩ B), pr(A ∩ C), pr(B ∩ C), pr(A ∩ B ∩ C), pr(A ∪ B ∪ C), pr(A ∪ B|C), and pr(A|B ∪ C).

Exercise 1.2. For any family in the United States, suppose that the probability of any child being male is equal to 0.50, and that the gender status of any child in a family is unaffected by the gender status of any other child in that same family. What is the minimum number, say n∗, of children that any U.S. couple needs to have so that the probability is no smaller than 0.90 of having at least one male child and at least one female child?

Exercise 1.3. Suppose that there are three urns. Urn 1 contains three white balls and four black balls. Urn 2 contains two white balls and three black balls. And, Urn 3 contains four white balls and two black balls. One ball is randomly selected from Urn 1 and is put into Urn 2. Then, one ball is randomly selected from Urn 2 and is put into Urn 3. Then, two balls are simultaneously selected from Urn 3. Find the exact numerical value of the probability that both balls selected from Urn 3 are white.

Exercise 1.4. In the National Scrabble Contest, suppose that the two players in the final match (say, Player A and Player B) play consecutive games, with the national champion being that player who is the first to win five games. Assuming that no game can end in a tie, the two finalists must necessarily play at least 5 games but no more than 9 games. Further, assume (probably somewhat unrealistically) that the outcomes of the games are mutually independent of one another, and also assume that π is the probability that Player A wins any particular game.

(a) Find an explicit expression for the probability that the final match between Player A and Player B lasts exactly 6 games.

(b) Given that Player A wins the first two games, find an explicit expression for the probability that Player A wins the final match in exactly 7 games.

(c) Find an explicit expression for the probability that Player B wins the final match.

Exercise 1.5. Suppose that there are two different diagnostic tests (say, Test A and Test B) for a particular disease of interest. In a certain large population, suppose that the prevalence of this disease is 1%. Among all those people who have this disease in this large population, 10% will incorrectly test negatively for the presence of the disease when given Test A; and, independently of any results based on Test A, 5% of these diseased people will incorrectly test negatively when given Test B. Among all those people who do not have the disease in this large population, 6% will incorrectly test positively when given Test A; and, independently of any results based on Test A, 8% of these nondiseased people will incorrectly test positively when given Test B.


(a) Given that both Tests A and B are positive when administered to a person selected randomly from this population, what is the numerical value of the probability that this person actually has the disease in question?

(b) Given that Test A is positive when administered to a person randomly selected from this population, what is the numerical value of the probability that Test B will also be positive?

(c) Given that a person selected randomly from this population actually has the disease in question, what is the numerical value of the probability that at least one of the two different diagnostic tests given to this particular person will be positive?

Exercise 1.6. A certain medical laboratory uses three machines (denoted M1, M2, and M3, respectively) to measure prostate-specific antigen (PSA) levels in blood samples selected from adult males; high PSA levels have been shown to be associated with the presence of prostate cancer. Assume that machine M1 has probability 0.01 of providing an incorrect PSA level, that machine M2 has probability 0.02 of providing an incorrect PSA level, and that machine M3 has probability 0.03 of providing an incorrect PSA level. Further, assume that machine M1 performs 20% of the PSA analyses done by this medical laboratory, that machine M2 performs 50% of the PSA analyses, and that machine M3 performs 30% of the PSA analyses.

(a) Find the numerical value of the probability that a PSA analysis performed by this medical laboratory will be done correctly.

(b) Given that a particular PSA analysis is found to be done incorrectly, what is the numerical value of the probability that this PSA analysis was performed either by machine M1 or by machine M2?

(c) Given that two independent PSA analyses are performed and that exactly one of these two PSA analyses is found to be correct, find the numerical value of the probability that machine M2 did not perform both of these PSA analyses.

Exercise 1.7. Suppose that two medical doctors, denoted Doctor #1 and Doctor #2, each examine a person randomly chosen from a certain population to check for the presence or absence of a particular disease. Let C1 be the event that Doctor #1 makes the correct diagnosis, let C2 be the event that Doctor #2 makes the correct diagnosis, and let D be the event that the randomly chosen patient actually has the disease in question; further, assume that the events C1 and C2 are independent conditional on disease status. Finally, let the prevalence of the disease in the population be θ = pr(D), let π1 = pr(C1|D) = pr(C2|D), and let π0 = pr(C1|D̄) = pr(C2|D̄).

(a) Develop an explicit expression for pr(C2|C1). Are the events C1 and C2 unconditionally independent? Comment on the more general implications of this particular example.

(b) For this particular example, determine specific conditions involving θ, π0, and π1 such that pr(C2|C1) = pr(C2).

Exercise 1.8. For a certain state lottery, 5 balls are drawn each day randomly without replacement from an urn containing 40 balls numbered individually from 1 to 40. Suppose that there are k (>1) consecutive days of such drawings. Develop an expression for the probability πk that there is at least one matching set of 5 numbers in those k drawings.

Exercise 1.9. In a certain small city in the United States, suppose that there are n (≥ 2) dental offices listed in that city’s phone book. Further, suppose that k (2 ≤ k ≤ n) people each independently and randomly call one of these n dental offices for an appointment.

(a) Find the probability α that none of these k people call the same dental office, and then find the numerical value of α when n = 7 and k = 4.

(b) Find the probability β that all of these k people call the same dental office, and then find the numerical value of β when n = 7 and k = 4.

Exercise 1.10. Suppose that the positive integers 1, 2, . . . , k, k ≥ 3, are arranged randomly in a horizontal line, thus occupying k slots. Assume that all arrangements of these k integers are equally likely. For j = 0, 1, . . . , (k − 2), develop an explicit expression for the probability θj that there are exactly j integers between the integers 1 and k.

Exercise 1.11. Suppose that a balanced die is rolled n (≥ 6) times. Find an explicit expression for the probability θn that each of the six numbers 1, 2, . . . , 6 appears at least once during the n rolls. Find the numerical value of θn when n = 10.

Exercise 1.12. An urn contains N balls numbered 1, 2, 3, . . . , (N − 1), N. A sample of n (2 ≤ n < N) balls is selected at random with replacement from this urn, and the n numbers obtained in this sample are recorded. Derive an explicit expression for the probability that the n numbers obtained in this sample of size n are all different from one another (i.e., no two or more of these n numbers are the same). If N = 10 and n = 4, what is the numerical value of this probability?

Exercise 1.13. Suppose that an urn contains N (N > 1) balls, each individually labeled with a number from 1 to N, where N is an unknown positive integer.

(a) If n (2 ≤ n < N) balls are selected one-at-a-time with replacement from this urn, find an explicit expression for the probability θwr that the ball labeled with the number N is selected.

(b) If n (2 ≤ n < N) balls are selected one-at-a-time without replacement from this urn, find an explicit expression for the probability θwor that the ball labeled with the number N is selected.

(c) Use a proof by induction to determine which method of sampling has the higher probability of selecting the ball labeled with the number N.

Exercise 1.14. A midwestern U.S. city has a traffic system designed to move morning rush-hour traffic from the suburbs into this city’s downtown area via three tunnels. During any weekday, there is a probability θ (0 < θ < 1) that there will be inclement weather. Because of the need for periodic maintenance, tunnel i (i = 1, 2, 3) has probability πi (0 < πi < 1) of being closed to traffic on any weekday. Periodic maintenance activities for any particular tunnel occur independently of periodic maintenance activities for any other tunnel, and all periodic maintenance activities for these three tunnels are performed independently of weather conditions.

The rate of rush-hour traffic flow into the downtown area on any weekday is considered to be excellent if there is no inclement weather and if all three tunnels are open to traffic. The rate of traffic flow is considered to be poor if either: (i) more than one tunnel is closed to traffic; or, (ii) there is inclement weather and exactly one tunnel is closed to traffic. Otherwise, the rate of traffic flow is considered to be marginal.

(a) Develop an explicit expression for the probability that exactly one tunnel is closed to traffic.

(b) Develop explicit expressions for the probability that the rate of traffic flow is excellent, for the probability that the rate of traffic flow is marginal, and for the probability that the rate of traffic flow is poor.

(c) Given that a particular weekday has a marginal rate of traffic flow, develop an explicit expression for the conditional probability that this particular weekday of marginal flow is due to inclement weather and not to a tunnel being closed to traffic.

Exercise 1.15. Bonnie and Clyde each independently toss the same unbalanced coin and count the number of tosses that it takes each of them to obtain the first head. Assume that the probability of obtaining a head with this unbalanced coin is equal to π, 0 < π < 1, with π ≠ 1/2.

(a) Find the probability that Bonnie and Clyde each require the same number of tosses of this unbalanced coin to obtain the first head.

(b) Find the probability that Bonnie will require more tosses of this unbalanced coin than Clyde to obtain the first head.

Exercise 1.16. Suppose that 15 senior math majors, 7 males and 8 females, at a major public university in the United States each take the same Graduate Record Examination (GRE) in advanced mathematics. Further, suppose that each of these 15 students has probability π, 0 < π < 1, of obtaining a score that exceeds the 80th percentile for all scores recorded for that particular examination. Given that exactly 5 of these 15 students scored higher than the 80th percentile, what is the numerical value of the probability θ that at least 3 of these 5 students were female?

Exercise 1.17. In the popular card game bridge, each of four players is dealt a hand of 13 cards from a well-shuffled deck of 52 standard playing cards. Find the numerical value of the probability that any randomly dealt hand of 13 cards contains all three face cards of the same suit, where a face card is a jack, a queen, or a king; note that it is possible for a hand of 13 cards to contain all three face cards in at least two different suits.

Exercise 1.18∗. In the game known as “craps,” a dice game played in casinos all around the world, a player competes against the casino (called “the house”) according to the following rules. If the player (called “the shooter” when rolling the dice) rolls either a 7 or an 11 on the first roll of the pair of dice, the player wins the game (and the house, of course, loses the game); if the player rolls either 2, 3, or 12 on the first roll, the player loses the game (and the house, of course, wins the game). If the player rolls any of the remaining numbers 4, 5, 6, 8, 9, or 10 on the first roll (such a number is called “the point”), the player keeps rolling the pair of dice until either the point is rolled again or until a 7 is rolled. If the point (e.g., 4) is rolled before a 7 is rolled, the player wins the game; if a 7 is rolled before the point (e.g., 4) is rolled, the player loses the game. Find the exact numerical value of the probability that the player wins the game.
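
A Monte Carlo simulation offers a useful check on the exact answer (it yields only an empirical estimate, not the requested exact derivation); the Python sketch below plays the game many times and reports the observed win rate.

```python
# Monte Carlo estimate of the shooter's win probability in craps.
import random

def play_craps(rng):
    roll = lambda: rng.randint(1, 6) + rng.randint(1, 6)
    first = roll()
    if first in (7, 11):
        return True                # win on the first roll
    if first in (2, 3, 12):
        return False               # lose on the first roll
    while True:                    # keep rolling for "the point"
        r = roll()
        if r == first:
            return True            # the point appears before a 7
        if r == 7:
            return False           # a 7 appears before the point

rng = random.Random(2010)
n = 200_000
print(sum(play_craps(rng) for _ in range(n)) / n)  # compare with the exact value
```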

Exercise 1.19∗. In a certain chemical industry, suppose that a proportion πh (0 < πh < 1) of all workers is exposed to a high daily concentration level of a certain potential carcinogen, that a proportion πm (0 < πm < 1) of all workers is exposed to a moderate daily concentration level, that a proportion πl (0 < πl < 1) of all workers is exposed to a low daily concentration level, and that a proportion πo (0 < πo < 1) of all workers receives no exposure to this potential carcinogen. Note that (πh + πm + πl + πo) = 1. Suppose that n workers in this chemical industry are randomly selected. Let θn be the probability that an even number of highly exposed workers is included in this randomly selected sample of n workers, where 0 is considered to be an even number.

(a) Find a difference equation of the form θn = f(πh, θn−1) that expresses θn as a function of πh and θn−1, where θ0 ≡ 1.

(b) Assuming that a solution to this difference equation is of the form θn = α + βγⁿ, find an explicit solution to this difference equation (i.e., find specific values for α, β, and γ), and then compute the numerical value of θ50 when πh = 0.05.

Exercise 1.20∗. In epidemiological research, a follow-up study involves enrolling randomly selected disease-free subjects with different sets of values of known or suspected risk factors for a certain disease of interest and then following these subjects for a specified period of time to investigate how these risk factors are related to the risk of disease development (i.e., to the probability of developing the disease of interest).

A model often used to relate a (row) vector of k risk factors x′ = (x1, x2, . . . , xk) to the probability of developing the disease of interest, where D is the event that a person develops the disease of interest, is the logistic model

$$\text{pr}(D|\mathbf{x}) = \left[1 + e^{-(\beta_0 + \sum_{j=1}^{k} \beta_j x_j)}\right]^{-1} = \left[1 + e^{-(\beta_0 + \boldsymbol{\beta}'\mathbf{x})}\right]^{-1} = \frac{e^{\beta_0 + \boldsymbol{\beta}'\mathbf{x}}}{1 + e^{\beta_0 + \boldsymbol{\beta}'\mathbf{x}}},$$

where the intercept β0 and the (row) vector β′ = (β1, β2, . . . , βk) constitute a set of (k + 1) regression coefficients.
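
For readers who want to experiment with this model, the short Python sketch below evaluates pr(D|x) for given coefficients; the numerical values shown are arbitrary illustrations, not part of the exercise.

```python
# Evaluating the logistic model pr(D|x) = 1 / (1 + exp(-(beta0 + beta'x))).
from math import exp

def logistic_risk(beta0, beta, x):
    eta = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + exp(-eta))

# Hypothetical coefficients and risk-factor values:
print(logistic_risk(-2.0, [0.5, 1.2], [1.0, 0.3]))   # about 0.24
```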

For certain rare chronic diseases like cancer, a follow-up study can take many years to yield valid and precise statistical conclusions because of the length of time required for sufficient numbers of disease-free subjects to develop the disease. Because of this limitation of follow-up studies for studying the potential causes of rare chronic diseases, epidemiologists developed the case–control study. In a case–control study, random samples of cases (i.e., subjects who have the disease of interest) and controls (i.e., subjects who do not have the disease of interest) are asked to provide information about their values of the risk factors x1, x2, . . . , xk. One problem with this outcome-dependent sampling design is that statistical models for the risk of disease will now depend on the probabilities of selection into the study for both cases and controls.


More specifically, let S be the event that a subject is selected to participate in a case–control study. Then, let

π1 = pr(S|D, x) = pr(S|D) and π0 = pr(S|D̄, x) = pr(S|D̄)

be the probabilities of selection into the study for cases and controls, respectively, where it is assumed that these selection probabilities do not depend on x.

(a) Assuming the logistic model for pr(D|x) given above, show that the risk of disease development for a case–control study, namely pr(D|S, x), can be written as a logistic model, but with an intercept that functionally depends on π1 and π0. Comment on this finding with regard to using a case–control study to estimate disease risk as a function of x.

(b) The risk odds ratio comparing the odds of disease for a subject with the set of risk factors (x∗)′ = (x∗1, x∗2, . . . , x∗k) to the odds of disease for a subject with the set of risk factors x′ = (x1, x2, . . . , xk) is defined as

$$\theta_r = \frac{\text{pr}(D|\mathbf{x}^*)/\text{pr}(\bar{D}|\mathbf{x}^*)}{\text{pr}(D|\mathbf{x})/\text{pr}(\bar{D}|\mathbf{x})}.$$

Show that

$$\theta_r = e^{\boldsymbol{\beta}'(\mathbf{x}^* - \mathbf{x})},$$

and then show that the risk odds ratio expression for a case–control study, namely,

$$\theta_c = \frac{\text{pr}(D|S, \mathbf{x}^*)/\text{pr}(\bar{D}|S, \mathbf{x}^*)}{\text{pr}(D|S, \mathbf{x})/\text{pr}(\bar{D}|S, \mathbf{x})},$$

is also equal to $e^{\boldsymbol{\beta}'(\mathbf{x}^* - \mathbf{x})}$. Finally, interpret these results with regard to the utility of case–control studies for epidemiological research.

Exercise 1.21∗. In a certain population of adults, the prevalence of inflammatory bowel disease (IBD) is θ, 0 < θ < 1. Suppose that three medical doctors each independently examine the same adult (randomly selected from this population) to determine whether or not this adult has IBD. Further, given that this adult has IBD, suppose that each of the three doctors has probability π1, 0 < π1 < 1, of making the correct diagnosis that this adult does have IBD; and, given that this adult does not have IBD, suppose that each of the three doctors has probability π0, 0 < π0 < 1, of making the correct diagnosis that this adult does not have IBD.

Consider the following two diagnostic strategies:

Diagnostic Strategy #1: The diagnosis is based on the majority opinion of the three doctors;

Diagnostic Strategy #2: One of the three doctors is randomly chosen and the diagnosis is based on the opinion of just that one doctor.

(a) Find ranges of values for π1 and π0 that jointly represent a sufficient condition for which Diagnostic Strategy #1 has a higher probability than Diagnostic Strategy #2 of providing the correct diagnosis. Comment on your findings.


(b) Under the stated assumptions, suppose a fourth doctor’s opinion is solicited. Would it be better to make a diagnosis based on the majority opinion of four doctors (call this Diagnostic Strategy #3) rather than on the majority opinion of three doctors (i.e., Diagnostic Strategy #1)? Under Diagnostic Strategy #3, note that no diagnosis will be made if two doctors claim that the adult has IBD and the other two doctors claim that the adult does not have IBD.

Exercise 1.22∗. Consider the following three events:

D: an individual has Alzheimer’s Disease;

E: an individual has diabetes;

M: an individual is male.

And, consider the following list of conditional probabilities:

π11 = pr(D|E ∩ M), π10 = pr(D|E ∩ M̄), π01 = pr(D|Ē ∩ M),
π00 = pr(D|Ē ∩ M̄), π1 = pr(D|E), and π0 = pr(D|Ē).

The risk ratio comparing the risk of Alzheimer’s Disease for a diabetic to that for a nondiabetic among males is equal to

$$RR_1 = \frac{\pi_{11}}{\pi_{01}};$$

the risk ratio comparing the risk of Alzheimer’s Disease for a diabetic to that for a nondiabetic among females is equal to

$$RR_0 = \frac{\pi_{10}}{\pi_{00}};$$

and, the crude risk ratio ignoring gender status that compares the risk of Alzheimer’s Disease for a diabetic to that for a nondiabetic is equal to

$$RR_c = \frac{\pi_1}{\pi_0}.$$

Assuming that RR1 = RR0 = RR (i.e., there is homogeneity [or equality] of the risk ratio across gender groups), then gender status is said to be a confounder of the true association between diabetes and Alzheimer’s Disease when RRc ≠ RR.

Under this homogeneity assumption, find two sufficient conditions for which gender status will not be a confounder of the true association between diabetes and Alzheimer’s Disease; that is, find two sufficient conditions for which RRc = RR.

Exercise 1.23∗. Consider a diagnostic test which is being used to diagnose the presence (D1) or absence (D̄1) of some particular disease in a population with pretest probability (or prevalence) of this particular disease equal to pr(D1) = π1; also, let pr(D̄1) = 1 − π1 = π2. Further, let θ1 = pr(T+|D1) and θ2 = pr(T+|D̄1), where T+ denotes the event that the diagnostic test is positive (i.e., the diagnostic test indicates the presence of the disease in question).


(a) Given that the diagnostic test is positive, prove that the posttest odds of an individual having, versus not having, the disease in question is given by the formula

$$\frac{\text{pr}(D_1|T^+)}{\text{pr}(\bar{D}_1|T^+)} = \left(\frac{\theta_1}{\theta_2}\right)\left(\frac{\pi_1}{\pi_2}\right) = LR_{12}\left(\frac{\pi_1}{\pi_2}\right),$$

where LR12 = θ1/θ2 is the so-called likelihood ratio for the diagnostic test and where π1/π2 is the pretest odds of the individual having, versus not having, the disease in question. Hence, knowledge of the likelihood ratio for a diagnostic test permits a simple conversion from pretest odds to posttest odds (Birkett, 1988).

(b) Now, suppose we wish to diagnose an individual as having one of three mutually exclusive diseases (i.e., the patient is assumed to have exactly one, but only one, of the three diseases in question). Thus, generalizing the notation in part (a), we have $\sum_{i=1}^{3} \pi_i = 1$, where pr(Di) = πi is the pretest probability of having disease i, i = 1, 2, 3. With θi = pr(T+|Di), i = 1, 2, 3, prove that

$$\frac{\text{pr}(D_1|T^+)}{\text{pr}(\bar{D}_1|T^+)} = \left[\sum_{i=2}^{3} \left(\frac{\pi_1}{\pi_i}\, LR_{1i}\right)^{-1}\right]^{-1},$$

where LR1i = θ1/θi, i = 2, 3. Further, prove that the posttest probability of having disease 1 is

$$\text{pr}(D_1|T^+) = \left[1 + \sum_{i=2}^{3} \left(\frac{\pi_1}{\pi_i}\, LR_{1i}\right)^{-1}\right]^{-1}.$$

(c) As a numerical example, consider an emergency room physician attending a patient presenting with acute abdominal pain. This physician is considering the use of a new diagnostic test which will be employed to classify patients into one of three mutually exclusive categories: non-specific abdominal pain (NS), appendicitis (A), or cholecystitis (C). The published paper describing this new diagnostic test reports that a positive test result gives a likelihood ratio for diagnosing NS versus A of 0.30, namely, pr(T+|NS)/pr(T+|A) = 0.30. Also, the likelihood ratio for diagnosing NS versus C is 0.50, and the likelihood ratio for diagnosing A versus C is 1.67. In addition, a study of a very large number of patients seen in emergency rooms revealed that the pretest probabilities for the three diseases were pr(NS) = 0.57, pr(A) = 0.33, and pr(C) = 0.10. Using all this information, calculate, for each of these three diseases, the posttest odds and the posttest probability of disease. Based on your numerical results, what is the most likely diagnosis (NS, A, or C) for an emergency room patient with a positive test result based on the use of this particular diagnostic test?

Exercise 1.24∗. In medicine, it is often of interest to assess whether two distinct diseases (say, disease A and disease B) tend to occur together. The odds ratio parameter ψ is defined as

$$\psi = \frac{\text{pr}(A|B)/\text{pr}(\bar{A}|B)}{\text{pr}(A|\bar{B})/\text{pr}(\bar{A}|\bar{B})},$$

and serves as one statistical measure of the tendency for diseases A and B to occur together. An observed value of ψ significantly greater than 1 may suggest that diseases A and B have a common etiology, which could lead to better understanding of disease processes and, ultimately, to prevention.

Suppose, however, that the diagnosis of the presence of diseases A and B involves the presence of a third factor (say, C). An example would be where a person with an abnormally high cholesterol level would then be evaluated more closely for evidence of both ischemic heart disease and hypothyroidism. In such a situation, one is actually considering the odds ratio

ψc = pr(A ∩ C|B ∩ C)/pr(A ∩ C|B ∩ C)

pr(A ∩ C|B ∩ C)/pr(A ∩ C|B ∩ C),

which is a measure of the association between diseases A and B each observedsimultaneously with factor C.

(a) Show that ψ and ψc are related by the equation

ψc = ψ

[pr(C|A ∩ B)pr(C|A ∩ B)

pr(C|A ∩ B)pr(C|A ∩ B)

][1 + pr(C)

pr(A ∩ B ∩ C)

].

(b) If A, B, and C actually occur completely independently of one another, how are ψ

and ψc related? Comment on the direction of the bias when using ψc instead of ψ

as the measure of association between diseases A and B.

Exercise 1.25∗. Suppose that a certain process generates a sequence of (s + t) outcomesof two types, say, s successes (denoted as S’s) and t failures (denoted as F’s). A runis a subsequence of outcomes of the same type which is both preceded and succeededby outcomes of the opposite type or by the beginning or by the end of the completesequence. For example, consider the sequence

SSFSSSFSFSFFS

of s = 8 successes and t = 5 failures. When rewritten as

SS|F|SSS|F|S|F|S|FF|S,

it is clear that this particular sequence contains a total of nine runs, namely, five S runs(three of length 1, one of length 2, and one of length 3) and four F runs (three of length1 and one of length 2).

Since the S runs and F runs alternate in occurrence, the number of S runs differs byat most one from the number of F runs.

(a) Assuming that all possible sequences of s successes and t failures are equally likelyto occur, derive an expression for the probability πx that any sequence contains atotal of exactly x runs. HINT: Consider separately the two situations where x is aneven positive integer and where x is an odd positive integer.

(b) For each year over a 7-year period of time, a certain cancer treatment centerrecorded the percentage of pancreatic cancer patients who survived at least 5years following treatment involving both surgery and chemotherapy. For each

Page 35: Exercises and Solutions in Biostatistical Theory (2010)

16 Basic Probability Theory

of the seven years, let the event S be the event that the survival percentage is atleast 20%, and let the event F = S. Suppose that the following sequence (orderedchronologically) is observed:

FFSFSSS.

Does this observed sequence provide evidence of a nonrandom pattern of 5-yearsurvival percentages over the 7-year period of time?For additional information about the theory of runs, see Feller (1968).

Exercise 1.26∗. Consider the following experiment designed to examine whether ahuman subject has extra-sensory perception (ESP). A set of R (R > 2) chips, numberedindividually from 1 to R, is arranged in random order by an examiner, and this randomorder cannot be seen by the subject under study. Then, the subject is given an identicalset of R chips and is asked to arrange them in exactly the same order as the randomorder constructed by the experimenter.

(a) Develop an expression for the probability θ(0, R) that the subject has no chipsin their correct positions (i.e., in positions corresponding to the chip positionsconstructed by the experimenter).Also, find the limiting value of θ(0, R) as R → ∞,and then comment on your finding.

(b) For r = 0, 1, 2, . . . , R, use the result in part (a) to develop an expression for theprobability θ(r, R) that the subject has exactly r out of R chips in their correctpositions.

(c) Assuming that R = 5, what is the probability that the subject places at least 3 chipsin their correct positions?

Exercise 1.27∗. Suppose that two players (denoted Player A and Player B) play a gamewhere they alternate flipping a balanced coin, with the winner of the game being thefirst player to obtain k heads (where k is a known positive integer).

(a) With a and b being positive integers, let (a, b, A) denote that specific game wherePlayer A needs a heads to win, where Player B needs b heads to win, and where itis Player A’s turn to flip the balanced coin. Similarly, let (a, b, B) denote that specificgame where Player A needs a heads to win, where Player B needs b heads to win,and where it is Player B’s turn to flip the balanced coin. Also, let π(a, b, A) be theprobability that Player A wins game (a, b, A), and let π(a, b, B) be the probabilitythat Player A wins game (a, b, B). Show that

π(a, b, A) =(

23

)π(a − 1, b, B) +

(13

)π(a, b − 1, A),

and that

π(a, b, B) =(

13

)π(a − 1, b, B) +

(23

)π(a, b − 1, A).

(b) Assuming that Player A goes first in any game, find the exact numerical values ofthe probabilities that A wins the game when k = 2 and when k = 3. In other words,find the exact numerical values of π(2, 2, A) and π(3, 3, A).

Page 36: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 17

Exercise 1.28∗. The first author (LLK) has been a University of North Carolina (UNC)Tar Heel basketball fan for close to 50 years. This exercise is dedicated to LLK’s alltimefavorite Tar Heel basketball player, Tyler Hansbrough; Tyler is the epitome of a student-athlete, and he led the Tar Heels to the 2009 NCAADivision I men’s basketball nationalchampionship. During his 4-year career, Tyler also set numerous UNC, ACC, andNCAA individual records.

In the questions to follow, assume that Tyler has a fixed probability π, 0 < π < 1,of making any particular free throw, and also assume that the outcome (i.e., either amake or a miss) for any one free throw is independent of the outcome for any otherfree throw.

(a) Given that Tyler starts shooting free throws, derive a general expression (as afunction of π, a, and b) for the probability θ(π, a, b) that Tyler makes a consecutivefree throws before he misses b consecutive free throws, where a and b are positiveintegers. For his 4-year career at UNC, Tyler’s value of π was 0.791; using thisvalue of π, compute the numerical value of the probability that Tyler makes 10consecutive free throws before he misses two consecutive free throws.

HINT: Let Aab be the event that Tyler makes a consecutive free throws beforehe misses b consecutive free throws, let Ba be the event that Tyler makes the firsta free throws that he attempts, and let Cb be the event that Tyler misses the first bfree throws that he attempts. Express α = pr(Aab|B1) as a function of both π andβ = pr(Aab|B1), express β as a function of both π and α, and then use the fact thatθ(π, a, b) = pr(Aab) = πα + (1 − π)β.

(b) Find the value of θ(π, a, b) when both π = 0.50 and a = b; also, find the valueof θ(π, a, b) when a = b = 1. For these two special cases, do these answers makesense? Also, comment on the reasonableness of any assumptions underlying thedevelopment of the expression for θ(π, a, b).

(c) If Tyler continues to shoot free throws indefinitely, show that he must eventuallyeither make a consecutive free throws or miss b consecutive free throws.

SOLUTIONS

Solution 1.1

(a) Let Dij be the event that die #1 shows the number i and that die #2 showsthe number j, i = 1, 2, . . . , 6 and j = 1, 2, . . . , 6. Clearly, these 36 events form thefinest partition of the set of possible experimental outcomes. Thus, it follows thatpr(Ex) =∑∗ pr(Dij), where pr(Dij) = 1

36 for all i and j, and where∑∗ indicates

summation over all (i, j) pairs for which (i + j) = x.For example,

pr(E6) = pr(D15) + pr(D51) + pr(D24) + pr(D42) + pr(D33) = 536

.

In general,

pr(Ex) = min {(x − 1), (13 − x)}36

, x = 2, 3, . . . , 12.

Page 37: Exercises and Solutions in Biostatistical Theory (2010)

18 Basic Probability Theory

(b) Note that

A = E4 ∪ E8 ∪ E12, B = E10 ∪ E11 ∪ E12, and

C = E4 ∪ E6 ∪ E8 ∪ E9 ∪ E10 ∪ E12,

so that C = E2 ∪ E3 ∪ E5 ∪ E7 ∪ E11.So, it follows directly that pr(A) = 1

4 , pr(B) = 16 , and pr(C) = 7

12 . Also, A ∩ B =E12, so that pr(A ∩ B) = 1

36 ; A ∩ C = E4 ∪ E8 ∪ E12, so that pr(A ∩ C) = 14 ; B ∩ C =

E10 ∪ E12, so that pr(B ∩ C) = 19 ; A ∩ B ∩ C = E12, so that pr(A ∩ B ∩ C) = 1

36 ; and,A ∪ B ∪ C = E4 ∪ E6 ∪ E8 ∪ E9 ∪ E10 ∪ E11 ∪ E12, so that pr(A ∪ B ∪ C) = 23

36 .Also,

pr(A ∪ B|C) = pr(A|C) + pr(B|C) − pr(A ∩ B|C)

= pr(A ∩ C)

pr(C)+ pr(B ∩ C)

pr(C)− pr(A ∩ B ∩ C)

pr(C)

=14712

+19712

−1

36712

= 47

.

Finally,

pr(A|B ∪ C) = pr[A ∩ (B ∪ C)]pr(B ∪ C)

= pr[(A ∩ B) ∪ (A ∩ C)]pr(B ∪ C)

= pr(A ∩ B) + pr(A ∩ C) − pr(A ∩ B ∩ C)

pr(B) + pr(C) − pr(B ∩ C).

Since pr(A ∩ C) = pr(A ∩ B ∩ C) = 0 and since pr(B ∩ C) = pr(E11) = 236 , we

obtain

pr(A|B ∪ C) = pr(A ∩ B)

pr(B) + pr(C) − pr(B ∩ C)=

136(

16 + 5

12 − 236

) = 119

.

Solution 1.2. Let θn be the probability that a family with n children has at least onemale child and at least one female child among these n children. Further, let Mn be theevent that all n children are male, and let Fn be the event that all n children are female.And, note that the events Mn and Fn are mutually exclusive. Then,

θn = 1 − pr(Mn ∪ Fn) = 1 − pr(Mn) − pr(Fn)

= 1 −(

12

)n−(

12

)n= 1 −

(12

)n−1.

So, we need to find the smallest value of n, say n∗, such that

θn = 1 −(

12

)n−1≥ 0.90.

It then follows that n∗ = 5.

Page 38: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 19

Solution 1.3. Define the following events:

W1: a white ball is selected from Urn 1;

W2: a white ball is selected from Urn 2;

W3: two white balls are selected from Urn 3;

B1: a black ball is selected from Urn 1;

B2: a black ball is selected from Urn 2.

Then,

pr(W3) = pr(W1 ∩ W2 ∩ W3) + pr(W1 ∩ B2 ∩ W3)

+ pr(B1 ∩ W2 ∩ W3) + pr(B1 ∩ B2 ∩ W3)

= pr(W1)pr(W2|W1)pr(W3|W1 ∩ W2) + pr(W1)pr(B2|W1)

× pr(W3|W1 ∩ B2) + pr(B1)pr(W2|B1)pr(W3|B1 ∩ W2)

+ pr(B1)pr(B2|B1)pr(W3|B1 ∩ B2)

= (3/7)(3/6)[(5/7)(4/6)] + (3/7)(3/6)[(4/7)(3/6)]+ (4/7)(2/6)[(5/7)(4/6)] + (4/7)(4/6)[(4/7)(3/6)] = 0.3628.

Solution 1.4

(a) pr(final match lasts exactly 6 games) = pr[(Player A wins 4 of first 5 games)∩ (Player A wins sixth game)] + pr[(Player B wins 4 of first 5 games) ∩ (Player

B wins sixth game)] =[C5

4π4(1 − π)](π) +

[C5

4(1 − π)4π](1 − π).

(b) pr[(Player A wins match in 7 games)|(Player A wins first 2 games)]

= pr[(Player A wins match in 7 games) ∩ (Player A wins first 2 games)]pr(Player A wins first 2 games)

Since

pr[(Player A wins match in 7 games) ∩ (Player A wins first 2 games)]

= pr[(Player A wins two of games #3 through #6)

∩ (Player A wins game #7) ∩ (Player A wins first 2 games)]

= pr(Player A wins two of games #3 through #6)

× pr(Player A wins game #7)pr(Player A wins first 2 games),

it follows that

pr[(Player A wins match in 7 games)|(Player A wins first 2 games)]

= pr(Player A wins two of games #3 through #6)

× pr(Player A wins game #7)

=[C4

2π2(1 − π)2](π) = C4

2π3(1 − π)2.

Page 39: Exercises and Solutions in Biostatistical Theory (2010)

20 Basic Probability Theory

(c)

pr(Player B wins final match)

= pr[∪9j=5(Player B wins match in j games)]

=9∑

j=5

pr[(Player B wins 4 of first(j − 1)games)

∩ (Player B wins jth game)]

=9∑

j=5

[Cj−14 (1 − π)4πj−5](1 − π)

=(

1 − π

π

)5 9∑

j=5

Cj−14 πj.

Solution 1.5

(a) Define the following events:

D: “a person has the disease of interest”

A+: “Test A is positive”

B+: “Test B is positive”

Then,

pr(D) = 0.01

pr(A+|D) = 1 − 0.10 = 0.90,

pr(B+|D) = 1 − 0.05 = 0.95,

pr(A+|D) = 0.06,

and

pr(B+|D) = 0.08.

So,

pr(D|A+ ∩ B+)

= pr(D ∩ A+ ∩ B+)

pr(A+ ∩ B+)

= pr(A+ ∩ B+|D)pr(D)

pr(A+ ∩ B+|D)pr(D) + pr(A+ ∩ B+|D)pr(D)

Page 40: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 21

= pr(A+|D)pr(B+|D)pr(D)

pr(A+|D)pr(B+|D)pr(D) + pr(A+|D)pr(B+|D)pr(D)

= (0.90)(0.95)(0.01)

(0.90)(0.95)(0.01) + (0.06)(0.08)(0.99)

= 0.00860.0086 + 0.0048

= 0.6418.

(b)

pr(B+|A+) = pr(A+ ∩ B+)

pr(A+)

= pr(A+ ∩ B+ ∩ D) + pr(A+ ∩ B+ ∩ D)

pr(A+ ∩ D) + pr(A+ ∩ D)

= pr(A+|D)pr(B+|D)pr(D) + pr(A+|D)pr(B+|D)pr(D)

pr(A+|D)pr(D) + pr(A+|D)pr(D)

= (0.90)(0.95)(0.01) + (0.06)(0.08)(0.99)

(0.90)(0.01) + (0.06)(0.99)

= 0.0086 + 0.00480.0090 + 0.0594

= 0.1959.

(c)

pr(A+ ∪ B+|D)

= pr[(A+ ∪ B+) ∩ D]pr(D)

= pr[(A+ ∩ D) ∪ (B+ ∩ D)]pr(D)

= pr(A+ ∩ D) + pr(B+ ∩ D) − pr(A+ ∩ B+ ∩ D)

pr(D)

= pr(A+|D)pr(D) + pr(B+|D)pr(D) − pr(A+|D)pr(B+|D)pr(D)

pr(D)

= (0.90)(0.01) + (0.95)(0.01) − (0.90)(0.95)(0.01)

0.01

= 0.0090 + 0.0095 − 0.00860.01

= 0.00990.01

= 0.9900.

Solution 1.6

(a) For i = 1, 2, 3, let Mi be the event that “machine Mi performs the PSA analysis”;and, let C be the event that “the PSA analysis is done correctly.” Then,

pr(C) = pr(C ∩ M1) + pr(C ∩ M2) + pr(C ∩ M3)

Page 41: Exercises and Solutions in Biostatistical Theory (2010)

22 Basic Probability Theory

= pr(C|M1)pr(M1) + pr(C|M2)pr(M2) + pr(C|M3)pr(M3)

= 0.99(0.20) + (0.98)(0.50) + 0.97(0.30) = 0.979.

(b)

pr(M1 ∪ M2|C) = pr[(M1 ∪ M2) ∩ C]pr(C)

= pr[(M1 ∩ C) ∪ (M2 ∩ C)]1 − pr(C)

= pr(M1 ∩ C) + pr(M2 ∩ C)

1 − pr(C)

= pr(C|M1)pr(M1) + pr(C|M2)pr(M2)

1 − pr(C)

= (0.01)(0.20) + (0.02)(0.50)

1 − 0.979= 0.5714.

Equivalently,

pr(M1 ∪ M2|C) = 1 − pr(M3|C) = 1 − pr(C|M3)pr(M3)

pr(C)

= 1 − (0.03)(0.30)

0.021= 0.5714.

(c)

pr(1 of 2 PSA analyses is correct) = C21 pr(C)pr(C)

= 2(0.979)(0.021) = 0.0411.

Now, pr(machine M2 did not perform both PSA analyses|1 of 2 PSA analyses iscorrect)

= 1 − pr(machine M2 performed both PSA analyses|1 of 2

PSA analyses is correct)

= 1 −[C2

1 pr(C|M2)pr(C|M2)][pr(M2)]2

0.0411

= 1 − 2(0.98)(0.02)(0.50)2

0.0411= 0.7616.

Solution 1.7

(a) First, pr(C2|C1) = pr(C1 ∩ C2)/pr(C1). Now,

pr(C1) = pr(C2) = pr(C2 ∩ D) + pr(C2 ∩ D)

= pr(C2|D)pr(D) + pr(C2|D)pr(D)

= π1θ + π0(1 − θ) = θ(π1 − π0) + π0.

Page 42: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 23

And, appealing to the conditional independence of the events C1 and C2 givendisease status, we have

pr(C1 ∩ C2) = pr(C1 ∩ C2|D)pr(D) + pr(C1 ∩ C2|D)pr(D)

= pr(C1|D)pr(C2|D)pr(D) + pr(C1|D)pr(C2|D)pr(D)

= π21θ + π2

0(1 − θ) = θ(π21 − π2

0) + π20.

Finally,

pr(C2|C1) = θ(π21 − π2

0) + π20

θ(π1 − π0) + π0,

so that, in this example, pr(C2|C1) = pr(C2). More generally, this particularexample illustrates the general principle that conditional independence betweentwo events does not allow one to conclude that they are also unconditionallyindependent.

(b) Now,

pr(C2|C1) = pr(C2) ⇔ θ(π21 − π2

0) + π20 = [θ(π1 − π0) + π0]2 ,

which is equivalent to the condition

θ(1 − θ)(π1 − π0)2 = 0.

So, pr(C2|C1) = pr(C2) when either θ = 0 (i.e., the prevalence of the disease in thepopulation is equal to zero, so that nobody in the population has the disease), θ = 1(i.e., the prevalence of the disease in the population is equal to one, so that every-body in the population has the disease), or the probability of a correct diagnosisdoes not depend on disease status [i.e., since pr(C1) = pr(C2) = θ(π1 − π0) + π0,the condition π1 = π0 gives pr(C1) = pr(C2) = π1 = π0].

Solution 1.8. Now, πk = 1 − pr(no matching sets of 5 numbers in k drawings), so that

πk = 1 −(

C405 − 1

C405

)(C40

5 − 2

C405

)· · ·(

C405 − (k − 1)

C405

)

= 1 −∏k−1

j=1

(C40

5 − j)

(C40

5

)(k−1).

Solution 1.9

(a) For i = 1, 2, . . . , k, let Ai be the event that the ith person calls a dental office that isdifferent from the dental offices called by the preceding (i − 1) people. Then,

α = pr(∩k

i=1Ai

)

Page 43: Exercises and Solutions in Biostatistical Theory (2010)

24 Basic Probability Theory

=(n

n

)(n − 1n

)(n − 2

n

)· · ·[

n − (k − 1)

n

]

= [n!/(n − k)!]nk

.

When n = 7 and k = 4, then α = 0.350.

(b) For j = 1, 2, . . . , n, let Bj be the event that all k people call the jth dental office. Then,

β = pr(∪n

j=1Bj

)=

n∑

j=1

pr(Bj)

=n∑

j=1

(1n

)k= 1

nk−1.

When n = 7 and k = 4, then β = 0.003.

Solution 1.10. For j = 0, 1, . . . , (k − 2), there are exactly (k − j − 1) pairs of slots forwhich the integer 1 precedes the integer k and for which there are exactly j integersbetween the integers 1 and k. Also, the integer k can precede the integer 1, and theother (k − 2) integers can be arranged in the remaining (k − 2) slots in (k − 2)! ways.So,

θj = 2(k − j − 1)[(k − 2)!]k! = 2(k − j − 1)

k(k − 1), j = 0, 1, . . . , (k − 2).

Solution 1.11. For i = 1, 2, . . . , 6, let Ai be the event that the number i does not appear

in n rolls of this balanced die. Then, θn = 1 − pr(∪6

i=1Ai

), where pr

(∪6

i=1Ai

)may be

calculated using Result (ii) on page 4. By symmetry, pr(A1) = pr(A2) = · · · = pr(A6)

and pr(Ai1 ∩ Ai2 ∩ · · · ∩ Aik ) = pr(∩ki=1Ai), (1 ≤ i1 < i2 < · · · < ik ≤ 6). Thus:

pr(∪6

i=1Ai

)= C6

1[pr(A1)] − C62[pr(∩2

i=1Ai)] + C63[pr(∩3

i=1Ai)]

− C64[pr(∩4

i=1Ai)] + C65[pr(∩5

i=1Ai)]

= 6(

56

)n− 15

(46

)n+ 20

(36

)n− 15

(26

)n+ 6

(16

)n.

When n = 10, θ10 ≈ (1 − 0.73) = 0.27.

Page 44: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 25

Solution 1.12. Let Ai denote the event that the first i numbers selected are differentfrom one another, i = 2, 3, . . . , n. Note that An ⊂ An−1 ⊂ · · · ⊂ A3 ⊂ A2. So,

pr(An) = pr(first n numbers selected are different from one another)

= pr

⎡⎣

n⋂

i=2

Ai

⎤⎦

= pr(A2)pr(A3|A2)pr(A4|A2 ∩ A3) · · · pr

⎡⎣An

∣∣∣∣∣∣n−1⋂

i=2

Ai

⎤⎦

=(

1 − 1N

)(1 − 2

N

)(1 − 3

N

)· · ·[

1 − (n − 1)

N

]

=(n−1)∏

j=1

(1 − j

N

)=(

N − 1N

)(N − 2

N

)· · ·[

N − (n − 1)

N

]

= N!(N − n)! Nn .

For N = 10 and n = 4,

pr(A4) =3∏

j=1

(1 − j

10

)=(

1 − 110

)(1 − 2

10

)(1 − 3

10

)

= 10!(10 − 4)!(10)4 = 0.504.

Solution 1.13

(a) We have

θwr = 1 − pr(all n numbers have values less than N)

= 1 −(

N − 1N

)n.

(b) We have

θwor = 1 − pr(all n numbers have values less than N)

= 1 − CN−1n C1

0CN

n

= 1 − (N − n)

N= n

N.

Page 45: Exercises and Solutions in Biostatistical Theory (2010)

26 Basic Probability Theory

(c) First, note that

δn = (θwor − θwr) = nN

−[

1 −(

N − 1N

)n]

=(

N − 1N

)n−(

N − nN

).

Now, for n = 2,

δ2 =(

N − 1N

)2−(

N − 2N

)

= 1N2

[(N − 1)2 − N(N − 2)

]= 1

N2 > 0.

Then, assuming δn > 0, we have

δn+1 =(

N − 1N

)n+1−[

N − (n + 1)

N

]

=(

N − 1N

)n (N − 1N

)−(

N − nN

)+ 1

N

=(

N − 1N

)n−(

N − 1N

)n ( 1N

)−(

N − nN

)+ 1

N

=[(

N − 1N

)n−(

N − nN

)]+ 1

N

[1 −

(N − 1

N

)n]

= δn +(

1N

)[1 −

(N − 1

N

)n]> 0,

which completes the proof by induction.Therefore, sampling without replacement has a higher probability than sam-

pling with replacement of selecting the ball labeled with the number N.

Solution 1.14

(a) Let T1 be the event that tunnel 1 is closed to traffic, let T2 be the event that tunnel2 is closed to traffic, and let T3 be the event that tunnel 3 is closed to traffic. If α

is the probability that exactly one tunnel is closed to traffic, then, since the eventsT1, T2, and T3 are mutually independent, it follows that

α = pr(T1)pr(T2)pr(T3) + pr(T1)pr(T2)pr(T3)

+ pr(T1)pr(T2)pr(T3)

= π1(1 − π2)(1 − π3) + (1 − π1)π2(1 − π3)

+ (1 − π1)(1 − π2)π3.

Page 46: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 27

(b) Define the following five mutually exclusive events:

A: no inclement weather and all tunnels open to traffic

B: inclement weather and all tunnels are open to traffic

C: inclement weather and at least one tunnel is closed to traffic

D: no inclement weather and exactly one tunnel is closed to traffic

E: no inclement weather and at least two tunnels are closed to traffic

If β is the probability of an excellent traffic flow rate, then

β = pr(A) = (1 − θ)(1 − π1)(1 − π2)(1 − π3).

And, if γ is the probability of a marginal traffic flow rate, then

γ = pr(B) + pr(D) = θ(1 − π1)(1 − π2)(1 − π3) + (1 − θ)α.

Finally, if δ is the probability of a poor traffic flow rate, then

δ = pr(C) + pr(E) = θ[1 − (1 − π1)(1 − π2)(1 − π3)]+ (1 − θ)[1 − (1 − π1)(1 − π2)(1 − π3) − α] = 1 − β − γ.

(c) Using the event definitions given in part (b), we have

pr(B|B ∪ D) = pr[B ∩ (B ∪ D)]pr(B ∪ D)

= pr[B ∪ (B ∩ D)]γ

= pr(B)

γ

= θ(1 − π1)(1 − π2)(1 − π3)

γ.

Solution 1.15

(a) Let Ax be the event that it takes x tosses of this unbalanced coin to obtain the firsthead. Then,

pr(Ax) = pr{[first (x − 1) tosses are tails] ∩ [xth toss is a head]}= (1 − π)x−1π, x = 1, 2, . . . , ∞.

Now, letting θ = pr(Bonnie and Clyde each require the same number of tosses toobtain the first head), we have

θ = pr{∪∞x=1[(Bonnie requires x tosses) ∩ (Clyde requires x tosses)]}

=∞∑

x=1

[pr(Ax)]2 =∞∑

x=1

[(1 − π)x−1π]2

Page 47: Exercises and Solutions in Biostatistical Theory (2010)

28 Basic Probability Theory

=(

π

1 − π

)2 ∞∑

x=1

[(1 − π)2]x =(

π

1 − π

)2[

(1 − π)2

1 − (1 − π)2

]

= π

(2 − π).

(b) By symmetry, pr(Bonnie requires more tosses than Clyde to obtain the first head) =pr(Clyde requires more tosses than Bonnie to obtain the first head) = γ, say. Thus,since (2γ + θ) = 1, it follows that

γ = (1 − θ)

2=

1 −(

π2−π

)

2= (1 − π)

(2 − π).

To illustrate a more complicated approach,

γ =∞∑

x=1

∞∑

y=x+1

pr[(Clyde needs x tosses for first head) ∩ (Bonnie needs

y tosses for first head)]

=∞∑

x=1

[(1 − π)x−1π]∞∑

y=x+1

[(1 − π)y−1π]

= π2

(1 − π)

∞∑

x=1

(1 − π)x[

(1 − π)x

1 − (1 − π)

]

= π

(1 − π)

∞∑

x=1

[(1 − π)2]x = π

(1 − π)

[(1 − π)2

1 − (1 − π)2

]

= (1 − π)

(2 − π).

Solution 1.16. For x = 0, 1, . . . , 5, let A be the event that exactly 5 of these 15 studentsscored higher than the 80-th percentile, and let Bx be the event that exactly x femalesand exactly (5 − x) males scored higher than the 80-th percentile.

So,

pr(Bx|A) = pr(A ∩ Bx)

pr(A)= pr(Bx)

pr(A)

={[

C8xπx(1 − π)8−x

] [C7

5−xπ5−x(1 − π)2+x]}

C155 π5(1 − π)10

= C8xC7

5−x

C155

, x = 0, 1, . . . , 5.

Thus,

θ =5∑

x=3

C8xC7

5−x

C155

= (1176 + 490 + 56)

3003= 0.5734.

Page 48: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 29

Solution 1.17. For i = 1, 2, 3, 4, let the event Ai be the event that a hand contains allthree face cards of the ith suit. Note that the event of interest is ∪4

i=1Ai. So,

pr(Ai) = C33C49

10C52

13, i = 1, 2, 3, 4.

For i = j,

pr(Ai ∩ Aj) = C66C46

7C52

13;

for i = j = k,

pr(Ai ∩ Aj ∩ Ak) = C99C43

4C52

13;

and, for i = j = k = l,

pr(Ai ∩ Aj ∩ Ak ∩ Al) = C1212C40

1C52

13.

Then, using Result (ii) on page 4, we have

pr(A) = pr(∪4

i=1Ai

)=∑4

m=1 C4m(−1)m−1C52−3m

13−3m

C5213

,

which is equal to 0.0513.

Solution 1.18∗. Let W denote the event that the player wins the game and let X denotethe number obtained on the first roll. So,

pr(W) =12∑

x=2

pr(W|X = x)pr(X = x)

=12∑

x=2

pr(W|X = x)

[min(x − 1, 13 − x)

36

]

= (0)136

+ (0)2

36+

6∑

x=4

pr(W|X = x)(x − 1)

36

+ (1)636

+10∑

x=8

pr(W|X = x)(13 − x)

36+ (1)

236

+ (0)1

36

= 29

+ 26∑

x=4

pr(W|X = x)(x − 1)

36,

Page 49: Exercises and Solutions in Biostatistical Theory (2010)

30 Basic Probability Theory

since the pairs of numbers “4 and 10,” “5 and 9,” and “6 and 8” lead to the same result.So, for x = 4, 5, or 6, let πx = pr(number x is rolled before number 7 is rolled). Then,

πx =∞∑

j=1

[pr(any number but x or 7 is rolled)](j−1) pr(number x is rolled)

=∞∑

j=1

[1 − (x − 1)

36− 6

36

](j−1) (x − 1)

36

= (x − 1)

36

∞∑

j=1

(31 − x

36

)(j−1)

= (x − 1)

36

[1 − (31 − x)

36

]−1= (x − 1)

(x + 5), x = 4, 5, 6.

So,

pr(W) = = 29

+ 118

6∑

x=4

πx(x − 1) = 29

+ 118

6∑

x=4

(x − 1)2

(x + 5)

= 29

+ 118

(99

+ 1610

+ 2511

)= 0.4931.

Thus, the probability of the house winning the game is (1−0.4931)=0.5069; so, asexpected with any casino game, the house always has the advantage. However, rel-ative to many other casino games (e.g., blackjack, roulette, slot machines), the houseadvantage of (0.5069 − 0.4931) = 0.0138 is relatively small.

Solution 1.19∗

(a) Let E be the event that the first worker randomly selected is a highly exposedworker. Then,

θn = (1 − θn−1)pr(E) + θn−1[1 − pr(E)]= (1 − θn−1)πh + θn−1(1 − πh)

= πh + θn−1(1 − 2πh), with θ0 ≡ 1.

(b) Now, assuming that θn = α + βγn and using the result in part (a), we have

α + βγn = πh + (α + βγn−1)(1 − 2πh)

= πh + (1 − 2πh)α + (1 − 2πh)βγn−1,

with the restriction that (α + β) = 1 since θ0 ≡ 1.Thus, we must have α = β = 1

2 and γ = (1 − 2πh), giving

θn = 12

+ 12(1 − 2πh)n, n = 1, 2, . . . , ∞.

Page 50: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 31

Finally, when πh = 0.05, θ50 = 12 + 1

2 [1 − 2(0.05)]50 = 0.5026.

Solution 1.20∗

(a) We have

pr(D|S, x) = pr(D ∩ S|x)

pr(S|x)= pr(S|D, x)pr(D|x)

pr(S|D, x)pr(D|x) + pr(S|D, x)pr(D|x)

=π1

[eβ0+β′x

1+eβ0+β′x

]

π0

[1

1+eβ0+β′x

]+ π1

[eβ0+β′x

1+eβ0+β′x

]

= π1eβ0+β′x

π0 + π1eβ0+β′x

=(

π1π0

)eβ0+β′x

1 +(

π1π0

)eβ0+β′x

= eβ∗0+β′x

1 + eβ∗0+β′x

,

where β∗0 = β0 + ln (π1/π0) .

So, for a case–control study, since β0 = β∗0 − ln (π1/π0), to estimate the risk

pr(D|x) of disease using logistic regression would necessitate either knowing (orbeing able to estimate) the ratio of selection probabilities, namely, the ratio π1/π0.

(b) Sincepr(D|x)

pr(D|x)= eβ0+β′x,

it follows directly that

θr = eβ0+β′x∗

eβ0+β′x = eβ′(x∗−x).

Analogously, sincepr(D|S, x)

pr(D|S, x)= eβ∗

0+β′x,

it follows directly that

θc = eβ∗0+β′x∗

eβ∗0+β′x = eβ′(x∗−x) = θr.

Hence, we can, at least theoretically, use case–control study data to estimate riskodds ratios via logistic regression, even though we cannot estimate the risk (orprobability) of disease directly without information about the quantity π1/π0.

Page 51: Exercises and Solutions in Biostatistical Theory (2010)

32 Basic Probability Theory

There are other potential problems with the use of case–control studies in epi-demiologic research. For further discussion about such issues, see Breslow andDay (1980) and Kleinbaum, Kupper, and Morgenstern (1982).

Solution 1.21∗

(a) Let A be the event that Diagnostic Strategy #1 provides the correct diagnosis, letB be the event that Diagnostic Strategy #2 provides the correct diagnosis, and letD be the event that the adult has IBD. Then,

pr(A) = pr(A ∩ D) + pr(A ∩ D) = pr(A|D)pr(D) + pr(A|D)pr(D)

= [3π21(1 − π1) + π3

1]θ + [3π20(1 − π0) + π3

0](1 − θ)

= (3π21 − 2π3

1)θ + (3π20 − 2π3

0)(1 − θ).

And,

pr(B) = pr(B|D)pr(D) + pr(B|D)pr(D) = π1θ + π0(1 − θ).

Now,

pr(A) − pr(B) = [(3π21 − 2π3

1)θ + (3π20 − 2π3

0)(1 − θ)] − [π1θ + π0(1 − θ)]= (3π2

1 − 2π31 − π1)θ + (3π2

0 − 2π30 − π0)(1 − θ)

= π1(1 − π1)(2π1 − 1)θ + π0(1 − π0)(2π0 − 1)(1 − θ).

So, a sufficient condition for the ranges of π1 and π0 so that pr(A)>pr(B) is

12

< π1 < 1 and12

< π0 < 1.

In other words, if each doctor has a better than 50% chance of making the correctdiagnosis conditional on disease status, then Diagnostic Strategy #1 is preferableto Diagnostic Strategy #2.

(b) Let C be the event that Diagnostic Strategy #3 provides the correct diagnosis. Then,

pr(C) = pr(C|D)pr(D) + pr(C|D)pr(D)

= [4π31(1 − π1) + π4

1]θ + [4π30(1 − π0) + π4

0](1 − θ)

= (4π31 − 3π4

1)θ + (4π30 − 3π4

0)(1 − θ).

Since

pr(A) − pr(C) = [(3π21 − 2π3

1) − (4π31 − 3π4

1)]θ+ [3π2

0 − 2π30) − (4π3

0 − 3π40)](1 − θ)

= 3π21(1 − π1)2θ + 3π2

0(1 − π0)2(1 − θ) > 0,

Page 52: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 33

Diagnostic Strategy #1 (using the majority opinion of three doctors) has a higherprobability than Diagnostic Strategy #3 (using the majority opinion of four doctors)of making the correct diagnosis.

Solution 1.22∗. First,

π1 = pr(D|E) = pr(D ∩ E)

pr(E)

= pr(D ∩ E ∩ M) + pr(D ∩ E ∩ M)

pr(E)

= π11pr(E ∩ M) + π10pr(E ∩ M)

pr(E)

= pr(M|E)π11 + pr(M|E)π10.

Similarly,

π0 = pr(M|E)π01 + pr(M|E)π00.

So,

RRc = π1π0

= pr(M|E)π11 + pr(M|E)π10

pr(M|E)π01 + pr(M|E)π00

= pr(M|E)π01RR1 + pr(M|E)π00RR0

pr(M|E)π01 + pr(M|E)π00

= RR

[pr(M|E)π01 + pr(M|E)π00

pr(M|E)π01 + pr(M|E)π00

].

Thus, a sufficient condition for RRc = RR is

pr(M|E)π01 + pr(M|E)π00 = pr(M|E)π01 + pr(M|E)π00,

or equivalently,

[pr(M|E) − pr(M|E)]π01 + [pr(M|E) − pr(M|E)]π00 = 0.

Using the relationships pr(M|E) = 1 − pr(M|E) and pr(M|E) = 1 − pr(M|E) in theabove expression, it follows that RRc = RR when

[pr(M|E) − pr(M|E)](π01 − π00) = 0.

Thus, the two sufficient conditions for no confounding are

pr(M|E) = pr(M|E) and π01 = π00.

Further, since

pr(M) = pr(M|E)pr(E) + pr(M|E)pr(E),

Page 53: Exercises and Solutions in Biostatistical Theory (2010)

34 Basic Probability Theory

the condition pr(M|E) = pr(M|E) means that pr(M) = pr(M|E), or equivalently, thatthe events E and M are independent events.

Finally, the two no confounding conditions are:

(i) The events E and M are independent events;

(ii) pr(D|E ∩ M) = pr(D|E ∩ M).

Solution 1.23∗

(a) First,

pr(D1|T+) = pr(D1 ∩ T+)

pr(T+)= pr(T+|D1)pr(D1)

pr(T+ ∩ D1) + pr(T+ ∩ D1)

= pr(T+|D1)pr(D1)

pr(T+|D1)pr(D1) + pr(T+|D1)pr(D1)

= θ1π1θ1π1 + θ2π2

.

And,

pr(D1|T+) = 1 − pr(D1|T+) = θ2π2θ1π1 + θ2π2

.

Finally,

pr(D1|T+)

pr(D1|T+)= θ1π1

θ2π2= LR12

(π1π2

).

(b) First,

pr(D1|T+) = pr(T+|D1)pr(D1)∑3i=1 pr(T+|Di)pr(Di)

= θ1π1∑3i=1 θiπi

,

and so

pr(D1|T+) = 1 − pr(D1|T+) =∑3

i=2 θiπi∑3i=1 θiπi

.

Finally,

pr(D1|T+)

pr(D1|T+)= θ1π1

θ2π2 + θ3π3= 1

θ2π2θ1π1

+ θ3π3θ1π1

=⎡⎣

3∑

i=2

(π1πi

LR1i

)−1⎤⎦

−1

.

Page 54: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 35

And,

pr(D1|T+) = θ1π1∑3i=1 θiπi

= 1

1 +(

θ1π1θ2π2

)−1 +(

θ1π1θ3π3

)−1

=⎡⎣1 +

3∑

i=2

(π1πi

LR1i

)−1⎤⎦

−1

.

(c) For notational convenience, let π1=pr(NS)=0.57, π2=pr(A)=0.33, π3 = pr (C) =0.10, LR12=pr(T+|NS)/pr(T+|A)=0.30, LR13=pr(T+|NS)/pr(T+|C) = 0.50, andLR23=pr(T+|A)/pr(T+|C)=1.67.

Following the developments given in part (b), it then follows directly that

pr(NS|T+)

pr(NS|T+)= 0.4385 and pr(NS|T+) = 0.3048,

pr(A|T+)

pr(A|T+)= 1.4293 and pr(A|T+) = 0.5883,

andpr(C|T+)

pr(C|T+)= 0.1196 and pr(C|T+) = 0.1069.

Thus, based on this particular diagnostic test, the most likely diagnosis isappendicitis for an emergency room patient with a positive test result.

Solution 1.24∗

(a) The four probabilities appearing in the expression for ψc can be rewritten asfollows:

pr(A ∩ C|B ∩ C) = pr(A ∩ B ∩ C)

pr(B ∩ C)= pr(C|A ∩ B)pr(A|B)pr(B)

pr(B ∩ C)

pr(A ∩ C|B ∩ C) = pr[(A ∪ C) ∩ (B ∩ C)]pr(B ∩ C)

= pr(A ∩ B ∩ C)

pr(B ∩ C)

= pr(C|A ∩ B)pr(A|B)pr(B)

pr(B ∩ C)

pr(A ∩ C|B ∩ C) = pr[(A ∩ C) ∩ (B ∪ C)]pr(B ∩ C)

= pr(A ∩ B ∩ C)

pr(B ∩ C)

= pr(C|A ∩ B)pr(A|B)pr(B)

pr(B ∩ C)

Page 55: Exercises and Solutions in Biostatistical Theory (2010)

36 Basic Probability Theory

and

pr(A ∩ C|B ∩ C) = pr[(A ∪ C) ∩ (B ∪ C)]pr(B ∩ C)

= pr[(A ∩ B) ∪ (A ∩ C) ∪ (B ∩ C) ∪ C]pr(B ∩ C)

= pr(A ∩ B) + pr(C) − pr(A ∩ B ∩ C)

pr(B ∩ C),

since

pr[(A ∩ B) ∪ (A ∩ C) ∪ (B ∩ C) ∪ C]= pr(A ∩ B) + pr(C) − pr(A ∩ B ∩ C)

via use of the general formula for the union of four events.Then, inserting these four expansions into the formula for ψc and simpli-

fying gives the desired result, since pr(A ∩ B) + pr(C) − pr(A ∩ B ∩ C) can berewritten as

pr[(A ∩ B) ∪ C] = pr[(A ∪ C) ∩ (B ∪ C)]= pr[(A ∪ C) ∩ (B ∪ C) ∩ (C ∪ C)]= pr[(A ∩ B ∩ C) ∪ C] = pr(A ∩ B ∩ C) + pr(C).

(b) If events A, B, and C occur completely independently of one another, then ψ =1, pr(C|A ∩ B) = pr(C), so on, so that

ψc = (1)(1)

[1 + pr(C)

pr(A)pr(B)pr(C)

]> 1.

Thus, using ψc instead of ψ introduces a positive bias. So, using ψc could lead tothe false conclusion that diseases A and B are related when, in fact, they are notrelated at all (i.e., ψ = 1).

Solution 1.25∗

(a) First, given that there is a total of (s + t) available positions in a sequence, then sof these (s + t) positions can be filled with the letter S in Cs+t

s ways, leaving theremaining positions to be filled by the letter F. Under the assumption of random-ness, each of these Cs+t

s sequences is equally likely to occur, so that each randomsequence has probability 1/Cs+t

s of occurring.Now, for x an even positive integer, let x = 2y. Since the S and F runs alternate,

there will be exactly y S runs and exactly y F runs, where y = 1, 2, . . . , min(s, t). Thenumber of ways of dividing the s available S letters into y S runs is equal to Cs−1

y−1,which is simply the number of ways of choosing (y − 1) spaces from the (s − 1)

spaces between the s available S letters. Analogously, the t available F letters can be

Page 56: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 37

divided into y runs in Ct−1y−1 ways. Thus, since the first run in the sequence can be

either an S run or an F run, the total number of sequences that each contain exactly2y runs is equal to 2Cs−1

y−1Ct−1y−1. Hence, under the assumption that all sequences

containing exactly s successes (the letter S) and t failures (the letter F) are equallylikely to occur, the probability of observing a sequence containing a total of exactlyx = 2y runs is equal to

π2y =2Cs−1

y−1Ct−1y−1

Cs+ts

, y = 1, 2, . . . , min(s, t).

Now, for x an odd positive integer, let x = (2y + 1). Either there will be (y + 1) Sruns and y F runs, or there will be y S runs and (y + 1) F runs, where y = min(s, t).In the former case, since the complete sequence must begin with an S run, thetotal number of runs will be Cs−1

y Ct−1y−1; analogously, in the latter case, the total

number of runs will be Cs−1y−1Ct−1

y . Hence, under the assumption that all sequencescontaining exactly s successes (the letter S) and t failures (the letter F) are equallylikely to occur, the probability of observing a sequence containing a total of exactlyx = (2y + 1) runs is equal to

π2y+1 =Cs−1

y Ct−1y−1 + Cs−1

y−1Ct−1y

Cs+ts

, y = 1, 2, . . . , min(s, t),

whereCs−1

y ≡ 0 when y = s and Ct−1y ≡ 0 when y = t.

(b) For the observed sequence, s = 4 and t = 3. Also, the observed total number ofruns x is equal to 4; in particular, there are two S runs, one of length 1 and one oflength 3, and there are two F runs, one of length 1 and one of length 2. Using theformula π2y with y = 2 gives

π4 = 2C31C2

1C7

4= 12

35= 0.343.

Since this probability is fairly large, there is no statistical evidence that the observedsequence represents a deviation from randomness.

Solution 1.26∗

(a) For i = 1, 2 . . . , R, let Ai be the event that the subject’s ith chip is in its correctposition. Then,

θ(0, R) = 1 − pr(∪R

i=1Ai

)= 1 −

R∑

i=1

pr(Ai) +R−1∑

i=1

R∑

j=i+1

pr(Ai ∩ Aj)

−R−2∑

i=1

R−1∑

j=i+1

R∑

k=j+1

pr(Ai ∩ Aj ∩ Ak) + · · · + (−1)Rpr(∩R

i=1Ai

).

Page 57: Exercises and Solutions in Biostatistical Theory (2010)

38 Basic Probability Theory

Now, for all i, pr(Ai) = 1/R = (R − 1)!/R!. For all i < j,

pr(Ai ∩ Aj) = pr(Ai)pr(Aj|Ai) =(

1R

)(1

R − 1

)= (R − 2)!

R! .

And, for i < j < k,

pr(Ai ∩ Aj ∩ Ak) = pr(Ai)pr(Aj|Ai)pr(Ak|Ai ∩ Aj)

=(

1R

)(1

R − 1

)(1

R − 2

)= (R − 3)!

R! .

In general, for r = 1, 2, . . . , R, the probability of the intersection of any subset of rof the R events A1, A2, . . . , AR is equal to (R − r)!/R!. Thus, we have

θ(0, R) = 1 − CR1

(R − 1)!R! + CR

2(R − 2)!

R!

− CR3

(R − 3)!R! + · · · + (−1)R

R!

= 1 − 1 + 12! − 1

3! + · · · + (−1)R

R!

=R∑

l=0

(−1)l

l! .

So,

limR→∞θ(0, R) = limR→∞R∑

l=0

(−1)l

l!

=∞∑

l=0

(−1)l

l! = e−1 ≈ 0.368,

which is a somewhat counterintuitive answer.

(b) For a particular set of r chips, let the event Br be the event that these r chips areall in their correct positions, and let CR−r be the event that none of the remaining(R − r) chips are in their correct positions. Then,

pr(Br ∩ CR−r) = pr(Br)pr(CR−r)

={(

1R

)(1

R − 1

)· · ·[

1R − (r − 1)

]}θ(0, R − r)

=[

(R − r)!R!

]R−r∑

l=0

(−1)l

l! .

Page 58: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 39

Finally, since there are CRr ways of choosing a particular set of r chips from a total

of R chips, it follows directly that

θ(r, R) = CRr

[(R − r)!

R!]R−r∑

l=0

(−1)l

l!

=∑R−r

l=0 (−1)l/l!r! , r = 0, 1, . . . , R.

(c) The probability of interest is

θ(3, 5) + θ(4, 5) + θ(5, 5) =5∑

r=3

5−r∑

l=0

(−1)l/l!r!

=(

13!)(

12!)

+ 0 +(

15!)

(1)

= 112

+ 1120

= 11120

= 0.0917.

Note, in general, that θ(r − 1, R) ≡ 0 and that∑R

r=0 θ(r, R) = 1.

Solution 1.27∗

(a) First, let HA be the event that Player A obtains a head before Player B when it isPlayer A’s turn to flip the balanced coin. In particular, if H is the event that a headis obtained when the balanced coin is flipped, and if T is the event that a tail isobtained, then

pr(HA) = pr(H) + pr(T ∩ T ∩ H) + pr(T ∩ T ∩ T ∩ T ∩ H) + · · ·

= 12

+ 18

+ 132

+ · · · = 23

.

And, if HB is the event that Player A obtains a head before Player B when it isPlayer B’s turn to flip the balanced coin, then

pr(HB) = pr(T ∩ H) + pr(T ∩ T ∩ T ∩ H)

+ pr(T ∩ T ∩ T ∩ T ∩ T ∩ H) + · · ·

= 14

+ 116

+ 164

+ · · · = 13

.

Then, we move from game (a, b, A) to game (a − 1, b, B) if Player A obtains the nexthead before Player B (an event that occurs with probability 2/3); and, we movefrom game (a, b, A) to game (a, b − 1, A) if Player B obtains the next head beforePlayer A (an event that occurs with probability 1/3).Thus, we have

π(a, b, A) =(

23

)π(a − 1, b, B) +

(13

)π(a, b − 1, A).

Page 59: Exercises and Solutions in Biostatistical Theory (2010)

40 Basic Probability Theory

Using analogous arguments, we obtain

π(a, b, B) =(

13

)π(a − 1, b, B) +

(23

)π(a, b − 1, A).

(b) First, note that the following boundary conditions hold:

π(0, b, A) = π(0, b, B) = 1, b = 1, 2, . . . , ∞and

π(a, 0, A) = π(a, 0, B) = 0, a = 1, 2, . . . , ∞.

From part (a), we know that

π(1, 1, A) = 23

and π(1, 1, B) = 13 .

Now,

π(2, 2, A) =(

23

)π(1, 2, B) +

(13

)π(2, 1, A),

so that we need to know the numerical values of π(1, 2, B) and π(2, 1, A).So,

π(1, 2, B) =(

13

)π(0, 2, B) +

(23

)π(1, 1, A)

=(

13

)(1) +

(23

)(23

)= 7

9;

and,

π(2, 1, A) =(

23

)π(1, 1, B) +

(13

)π(2, 0, A)

=(

23

)(13

)+(

13

)(0) = 2

9.

Finally,

π(2, 2, A) =(

23

)(79

)+(

13

)(29

)= 16

27= 0.593.

Now,

π(3, 3, A) =(

23

)π(2, 3, B) +

(13

)π(3, 2, A),

where

π(2, 3, B) =(

13

)π(1, 3, B) +

(23

)π(2, 2, A)

=(

13

)π(1, 3, B) +

(23

)(1627

)

Page 60: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 41

and

π(3, 2, A) =(

23

)π(2, 2, B) +

(13

)π(3, 1, A).

Now,

π(2, 2, B) =(

13

)π(1, 2, B) +

(23

)π(2, 1, A)

=(

13

)(79

)+(

23

)(29

)= 11

27;

so,

π(3, 2, A) =(

23

)(1127

)+(

13

)π(3, 1, A).

Since

π(2, 1, B) =(

13

)π(1, 1, B) +

(23

)π(2, 0, A)

=(

13

)(13

)+(

23

)(0) = 1

9,

we have

π(3, 1, A) =(

23

)π(2, 1, B) +

(13

)π(3, 0, A)

=(

23

)(19

)+(

13

)(0) = 2

27.

And, since

π(1, 2, A) =(

23

)π(0, 2, B) +

(13

)π(1, 1, A)

=(

23

)(1) +

(13

)(23

)= 8

9,

we have

π(1, 3, B) =(

13

)π(0, 3, B) +

(23

)π(1, 2, A)

=(

13

)(1) +

(23

)(89

)= 25

27.

Finally, since

π(2, 3, B) =(

13

)(2527

)+(

23

)(1627

)= 57

81

Page 61: Exercises and Solutions in Biostatistical Theory (2010)

42 Basic Probability Theory

and

π(3, 2, A) =(

23

)(1127

)+(

13

)(2

27

)= 24

81,

we have

π(3, 3, A) =(

23

)(5781

)+(

13

)(2481

)= 46

81= 0.568.

Clearly, this procedure can be programed to produce the numerical value ofπ(k, k, A) for any positive integer k. For example, the reader can verify thatπ(4, 4, A) = 0.556 and that π(5, 5, A) = 0.549. In general, π(k, k, A) monotonicallydecreases toward the value 1/2 as k becomes large, but the rate of decrease isrelatively slow.

Solution 1.28∗

(a) Now,

α = pr(Aab|B1) = pr(Aab ∩ Ba|B1) + pr(Aab ∩ Ba|B1)

= pr(Aab|Ba ∩ B1)pr(Ba|B1) + pr(Aab|Ba ∩ B1)pr(Ba|B1)

= (1)πa−1 + β[1 − πa−1]= πa−1 + β[1 − πa−1],

since the event “Aab given Ba ∩ B1” is equivalent to the event “Aab given B1.” Morespecifically, the event “Ba ∩ B1” means that the first free throw is made and thatthere is at least one missed free throw among the next (a − 1) free throws. And,when such a miss occurs, it renders irrelevant all the previous makes, and so thescenario becomes exactly that of starting with a missed free throw (namely, theevent “B1”).

Similarly,

β = pr(Aab|B1) = pr(Aab ∩ Cb|B1) + pr(Aab ∩ Cb|B1)

= pr(Aab|Cb ∩ B1)pr(Cb|B1) + pr(Aab|Cb ∩ B1)pr(Cb|B1)

= (0)(1 − π)b−1 + α[1 − (1 − π)b−1]= α[1 − (1 − π)b−1],

since the event “Aab given Cb ∩ B1” is equivalent to the event “Aab given B1.”Solving these two equations simultaneously, we have

α = πa−1 + β[1 − πa−1]= πa−1 + {α[1 − (1 − π)b−1]}[1 − πa−1],

giving

α = πa−1

πa−1 + (1 − π)b−1 − πa−1(1 − π)b−1

Page 62: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 43

and

β = πa−1[1 − (1 − π)b−1]πa−1 + (1 − π)b−1 − πa−1(1 − π)b−1

.

Finally, it follows directly that

θ(π, a, b) = πα + (1 − π)β = πa−1[1 − (1 − π)b]πa−1 + (1 − π)b−1 − πa−1(1 − π)b−1

.

When π = 0.791, a = 10, and b = 2, then θ(0.791, 10, 2) = 0.38.

(b) When both π = 0.50 and a = b, then θ(0.50, a, a) = θ(0.50, b, b) = 0.50; this answermakes sense because runs of makes and misses of the same length are equally likelywhen π = 0.50. When a = b = 1, then θ(π, 1, 1) = π; this answer also makes sensebecause the event A11 (i.e., the event that the first free throw is made) occurs withprobability π. Finally, once several consecutive free throws are made, the pressureto continue the run of made free throws will increase; as a result, the assumption ofmutual independence among the outcomes of consecutive free throws is probablynot valid and the value of π would tend to decrease.

(c) Since the probability of Tyler missing b consecutive free throws before making aconsecutive free throws is equal to

θ(1 − π, b, a) = (1 − π)b−1(1 − πa)

(1 − π)b−1 + πa−1 − (1 − π)b−1πa−1,

it follows directly that θ(π, a, b) + θ(1 − π, b, a) = 1.

Page 63: Exercises and Solutions in Biostatistical Theory (2010)
Page 64: Exercises and Solutions in Biostatistical Theory (2010)

2Univariate Distribution Theory

2.1 Concepts and Notation

2.1.1 Discrete and Continuous Random Variables

A discrete random variable X takes either a finite, or a countably infinite,number of values. A discrete random variable X is characterized by itsprobability distribution pX(x) = pr(X = x), which is a formula giving theprobability that X takes the (permissible) value x. Hence, a valid discreteprobability distribution pX(x) has the following two properties:

i. 0 ≤ pX(x) ≤ 1 for all (permissible) values of x andii.∑

all x pX(x) = 1.

Acontinuous random variable X can theoretically take all the real (and henceuncountably infinite) numerical values on a line segment of either finite orinfinite length. Acontinuous random variable X is characterized by its densityfunction fX(x). A valid density function fX(x) has the following properties:

i. 0 ≤ fX(x) < +∞ for all (permissible) values of x;ii.

∫all x fX(x) dx = 1;

iii. For −∞ < a < b < +∞, pr(a < X < b) = ∫ba fX(x) dx; and

iv. pr(X = x) = 0 for any particular value x, since∫x

x fX(x) dx = 0.

2.1.2 Cumulative Distribution Functions

In general, the cumulative distribution function (CDF) for a univariate ran-dom variable X is the function FX(x) = pr(X ≤ x), −∞ < x < +∞, whichpossesses the following properties:

i. 0 ≤ FX(x) ≤ 1, −∞ < x < +∞;ii. FX(x) is a monotonically nondecreasing function of x; and

iii. limx→−∞ FX(x) = 0 and limx→+∞ FX(x) = 1.

45

Page 65: Exercises and Solutions in Biostatistical Theory (2010)

46 Univariate Distribution Theory

For an integer-valued discrete random variable X, it follows that

i. FX(x) =∑all x∗≤x pX(x∗);ii. pX(x) = pr(X = x) = FX(x) − FX(x − 1); and

iii. [dFX(x)]/dx = pX(x) since FX(x) is a discontinuous function of x.

For a continuous random variable X, it follows that

i. FX(x) = ∫allx∗≤x fX(x∗) dx∗;

ii. For −∞ < a < x < b < +∞, pr(a < X < b) = FX(b) − FX(a); andiii. [dFX(x)]/dx = fX(x) since FX(x) is an absolutely continuous function

of x.

2.1.3 Median and Mode

For any discrete distribution pX(x) or density function fX(x), the populationmedian ξ satisfies the two inequalities

pr(X ≤ ξ) ≥ 12 and pr(X ≥ ξ) ≥ 1

2 .

For a density function fX(x), ξ is that value of X such that

∫ ξ

−∞fX(x) dx = 1

2.

The population mode for either a discrete probability distribution pX(x) ora density function fX(x) is a value of x that maximizes pX(x) or fX(x). Thepopulation mode is not necessarily unique, since pX(x) or fX(x) may achieveits maximum for several different values of x; in this situation, all these localmaxima are called modes.

2.1.4 Expectation Theory

Let g(X) be any scalar function of a univariate random variable X. Then, theexpected value E[g(X)] of g(X) is defined to be

E[g(X)] =∑all x

g(x)pX(x) when X is a discrete random variable,

and is defined to be

E[g(X)] =∫

all xg(x)fX(x) dx when X is a continuous random variable.

Page 66: Exercises and Solutions in Biostatistical Theory (2010)

Concepts and Notation 47

Note that E[g(X)] is said to exist if |E[g(X)]| < +∞; otherwise, E[g(X)] is saidnot to exist.

Some general rules for computing expectations are:

i. If C is a constant independent of X, then E(C) = C;ii. E[Cg(X)] = CE[g(X)];

iii. If C1, C2, . . . , Ck are k constants all independent of X, and ifg1(X), g2(X), . . . , gk(X) are k scalar functions of X, then

E

⎡⎣

k∑i=1

Cigi(X)

⎤⎦ =

k∑i=1

CiE[gi(X)];

iv. If k → ∞, then

E

[ ∞∑i=1

Cigi(X)

]=

∞∑i=1

CiE[gi(X)]

when |∑∞i=1 CiE[gi(X)]| < +∞.

2.1.5 Some Important Expectations

2.1.5.1 Mean

μ = E(X) is the mean of X.

2.1.5.2 Variance

σ2 = V(X) = E{[X − E(X)]2} is the variance of X, and σ = +√σ2 is the standard

deviation of X.

2.1.5.3 Moments

More generally, if r is a positive integer, a binomial expansion of [X − E(X)]r

gives

E{[X − E(X)]r} = E

⎧⎨⎩

r∑j=0

Crj Xj[−E(X)]r−j

⎫⎬⎭ =

r∑j=0

Crj (−1)r−jE(Xj)[E(X)]r−j,

where E{[X − E(X)]r} is the rth moment about the mean.For example, for r = 2, we obtain

E{[X − E(X)]2} = V(X) = E(X2) − [E(X)]2;

Page 67: Exercises and Solutions in Biostatistical Theory (2010)

48 Univariate Distribution Theory

and, for r = 3, we obtain

E{[X − E(X)]3} = E(X3) − 3E(X2)E(X) + 2[E(X)]3,

which is a measure of the skewness of the distribution of X.

2.1.5.4 Moment Generating Function

MX(t) = E(etX) is called the moment generating function for the random vari-able X, provided that MX(t) < +∞ for t in some neighborhood of 0 [i.e., forall t ∈ (−ε, ε), ε > 0]. For r a positive integer, and with E(Xr) defined as the rthmoment about the origin (i.e., about 0) for the random variable X, then MX(t)can be used to generate moments about the origin via the algorithm

drMX(t)dtr |t=0

= E(Xr).

More generally, for r a positive integer, the function

M∗X(t) = E

{et[X−E(X)]

}= e−tE(X)MX(t)

can be used to generate moments about the mean via the algorithm

drM∗X(t)

dtr |t=0= E{[X − E(X)]r}.

2.1.5.5 Probability Generating Function

If we let et equal s in MX(t) = E(etX), we obtain the probability generatingfunction PX(s) = E(sX). Then, for r a positive integer, and with

E[

X!(X − r)!

]= E[X(X − 1)(X − 2) · · · (X − r + 1)]

defined as the rth factorial moment for the random variable X, then PX(s) canbe used to generate factorial moments via the algorithm

drPX(s)dsr |s=1

= E[

X!(X − r)!

].

As an example, the probability generating function PX(s) can be used tofind the variance of X when V(X) is written in the form

V(X) = E[X(X − 1)] + E(X) − [E(X)]2.

Page 68: Exercises and Solutions in Biostatistical Theory (2010)

Concepts and Notation 49

2.1.6 Inequalities Involving Expectations

2.1.6.1 Markov’s Inequality

If X is a nonnegative random variable [i.e., pr(X ≥ 0) = 1], then pr(X > k) ≤E(X)/k for any constant k > 0. As a special case, for r > 0, if X = |Y − E(Y)|rwhen Y is any random variable, then, with νr = E [|Y − E(Y)|r], we have

pr[|Y − E(Y)|r > k

] ≤ νr

k,

or equivalently with k = trνr,

pr[|Y − E(Y)| > tν1/r

r

]≤ t−r, t > 0.

For r = 2, we obtain Tchebyshev’s Inequality, namely,

pr[|Y − E(Y)| > t

√V(Y)

]≤ t−2, t > 0.

2.1.6.2 Jensen’s Inequality

Let X be a random variable with |E(X)| < ∞. If g(X) is a convex function ofX, then E[g(X)] ≥ g[E(X)], provided that |E[g(X)]| < ∞. If g(X) is a concavefunction of X, then the inequality is reversed, namely, E[g(X)] ≤ g[E(X)].

2.1.6.3 Hölder’s Inequality

Let X and Y be random variables, and let p, 1 < p < ∞, and q, 1 < q < ∞,satisfy the restriction 1/p + 1/q = 1. Then,

E(|XY|) ≤ [E(|X|p)]1/p [E(|Y|q)]1/q .

As a special case, when p = q = 2, we obtain the Cauchy–Schwartz Inequality,namely,

E(|XY|) ≤√

E(X2)E(Y2).

2.1.7 Some Important Probability Distributions for DiscreteRandom Variables

2.1.7.1 Binomial Distribution

If X is the number of successes in n trials, where the trials are conductedindependently with the probability π of success remaining the same fromtrial to trial, then

pX(x) = Cnxπx(1 − π)n−x, x = 0, 1, . . . , n and 0 < π < 1.

Page 69: Exercises and Solutions in Biostatistical Theory (2010)

50 Univariate Distribution Theory

When X ∼ BIN(n, π), then E(X) = nπ, V(X) = nπ(1 − π), and MX(t) =[πet + (1 − π)]n.

When n = 1, X has the Bernoulli distribution.

2.1.7.2 Negative Binomial Distribution

If Y is the number of trials required to obtain exactly k successes, where k is aspecified positive integer, and where the trials are conducted independentlywith the probability π of success remaining the same from trial to trial, then

pY(y) = Cy−1k−1π

k(1 − π)y−k , y = k, k + 1, . . . , ∞ and 0 < π < 1.

When Y ∼ NEGBIN(k, π), then E(Y) = k/π, V(Y) = k(1 − π)/π2, and

MY(t) =[

πet

1 − (1 − π)et

]k

.

In the special case when k = 1, then Y has a geometric distribution, namely,

pY(y) = π(1 − π)y−1, y = 1, 2, . . . , ∞ and 0 < π < 1.

When Y ∼ GEOM(π), then E(Y) = 1/π, V(Y) = (1 − π)/π2, and MY(t) =πet/[1 − (1 − π)et].

When X ∼ BIN(n, π) and when Y ∼ NEGBIN(k, π), then pr(X < k) =pr(Y > n).

2.1.7.3 Poisson Distribution

As a model for rare events, the Poisson distribution can be derived as a limitingcase of the binomial distribution as n → ∞ and π → 0 with λ = nπ heldconstant; this limit is

pX(x) = λxe−λ

x! , x = 0, 1, . . . , ∞ and λ > 0.

When X ∼ POI(λ), then E(X) = V(X) = λ and MX(t) = eλ(et−1).

2.1.7.4 Hypergeometric Distribution

Suppose that a finite-sized population of size N(< +∞) contains a items ofType A and b items of Type B, with (a + b) = N. If a sample of n(< N) items israndomly selected without replacement from this population of N items, thenthe number X of items of Type A contained in this sample of n items has thehypergeometric distribution, namely,

pX(x) = CaxCb

n−x

Ca+bn

= CaxCN−a

n−x

CNn

, max(0, n − b) ≤ X ≤ min(n, a).

Page 70: Exercises and Solutions in Biostatistical Theory (2010)

Concepts and Notation 51

When X ∼ HG(a, N − a, n), then

E(X) = n( a

N

)and V(X) = n

( aN

)(N − aN

)(N − nN − 1

).

2.1.8 Some Important Distributions (i.e., Density Functions)for Continuous Random Variables

2.1.8.1 Normal Distribution

The normal distribution density function is

fX(x) = 1√2πσ

e−(x−μ)2/2σ2, −∞ < x < ∞, −∞ < μ < ∞, 0 < σ2 < ∞.

When X ∼ N(μ, σ2), then E(X) = μ, V(X) = σ2, and MX(t) = eμt+σ2t2/2. Also,when X ∼ N(μ, σ2), then the standardized variable Z = (X − μ)/σ ∼ N(0, 1),with density function

fZ(z) = 1√2π

e−z2/2, −∞ < z < ∞.

2.1.8.2 Lognormal Distribution

When X ∼ N(μ, σ2), then the random variable Y = eX has a lognormaldistribution, with density function

fY(y) = 1√2πσy

e−[ln(y)−μ]2/2σ2, 0 < y < ∞, −∞ < μ < ∞, 0 < σ2 < ∞.

When Y ∼ LN(μ, σ2), then E(Y) = eμ+(σ2/2) and V(Y) = [E(Y)]2(eσ2 − 1).

2.1.8.3 Gamma Distribution

The gamma distribution density function is

fX(x) = xβ−1e−x/α

Γ(β)αβ, 0 < x < ∞, 0 < α < ∞, 0 < β < ∞.

When X ∼ GAMMA(α, β), then E(X) = αβ, V(X) = α2β, and MX(t) = (1 −αt)−β. The Gamma distribution has two important special cases:

i. When α = 2 and β = ν/2, then X ∼ χ2ν (i.e., X has a chi-squared

distribution with ν degrees of freedom). When X ∼ χ2ν, then

fX(x) = xν2 −1e−x/2

Γ(

ν2

)2ν/2

, 0 < x < ∞ and ν a positive integer;

Page 71: Exercises and Solutions in Biostatistical Theory (2010)

52 Univariate Distribution Theory

also, E(X) = ν, V(X) = 2ν, and MX(t) = (1 − 2t)−ν/2. And, if Z ∼N(0, 1), then Z2 ∼ χ2

1.ii. When β = 1, then X has a negative exponential distribution with

density function

fX(x) = 1α

e−x/α, 0 < x < ∞, 0 < α < ∞.

When X ∼ NEGEXP(α), then E(X) = α, V(X) = α2, and MX(t) =(1 − αt)−1.

2.1.8.4 Beta Distribution

The Beta distribution density function is

fX(x) = Γ(α + β)

Γ(α)Γ(β)xα−1(1 − x)β−1, 0 < x < 1, 0 < α < ∞, 0 < β < ∞.

When X ∼ BETA(α, β), then E(X) = αα+β

and V(X) = αβ

(α+β)2(α+β+1).

2.1.8.5 Uniform Distribution

The Uniform distribution density function is

fX(x) = 1(θ2 − θ1)

, −∞ < θ1 < x < θ2 < ∞.

When X ∼ UNIF(θ1, θ2), then E(X) = (θ1+θ2)2 , V(X) = (θ2−θ1)

2

12 and MX(t) =(etθ2−etθ1 )

t(θ2−θ1).

EXERCISES

Exercise 2.1

(a) In a certain small group of seven people, suppose that exactly four of these peoplehave a certain rare blood disorder. If individuals are selected at random one-at-a-time without replacement from this group of seven people, find the numerical valueof the expected number of individuals that have to be selected in order to obtainone individual with this rare blood disorder and one individual without this rareblood disorder.

(b) Now, consider a finite-sized population of size N (< + ∞) in which there areexactly M (2 ≤ M < N) individuals with this rare blood disorder. Suppose thatindividuals are selected from this population at random one-at-a-time withoutreplacement. Let the random variable X denote the number of individuals selecteduntil exactly k (1 ≤ k ≤ M < N) individuals are selected who have this rare blood

Page 72: Exercises and Solutions in Biostatistical Theory (2010)

Exercises 53

disorder. Derive an explicit expression for the probability distribution of the randomvariable X.

(c) Given the conditions described in part (b), derive an explicit expression for theprobability that the third individual selected has this rare blood disorder.

Exercise 2.2. Suppose that the positive integers 1, 2, . . . , k, k ≥ 3, are arranged ran-domly in a horizontal line, thus occupying k slots. Assume that all arrangements ofthese k integers are equally likely.

(a) Derive the probability distribution pX(x) of the discrete random variable X, whereX is the number of integers between the integers 1 and k. Also, show directly thatpX(x) is a valid discrete probability distribution.

(b) Develop an explicit expression for E(X).

Exercise 2.3. Consider an urn that contains four white balls and two black balls.

(a) Suppose that pairs of balls are selected from this urn without replacement; in particu-lar, the first two balls selected (each ball selected without replacement) constitutethe first pair, the next two balls selected constitute the second pair, and so on.Find numerical values for E(Y) and V(Y), where Y is the number of black ballsremaining in the urn after the first pair of white balls is selected.

(b) Now, suppose that pairs of balls are selected from this urn with replacement in thefollowing manner: the first ball in a pair is randomly selected, its color is recorded,and then it is returned to the urn; then, the second ball making up this particularpair is randomly selected, its color is recorded, and then it is returned to theurn. Provide an explicit expression for the probability distribution of the randomvariable X, the number of pairs of balls that have to be selected in this manneruntil exactly two pairs of white balls are obtained (i.e., both balls in each of thesetwo pairs are white)?

Exercise 2.4. To estimate the unknown size N(< +∞) of a population (e.g., the numberof bass in a particular lake, the number of whales in a particular ocean, the number ofbirds of a specific species in a particular forest, etc.), a sampling procedure known ascapture–recapture is often employed. This capture–recapture sampling method worksas follows. For the first stage of sampling, m animals are randomly chosen (i.e., cap-tured) from the population of animals under study and are then individually markedto permit future identification. Then, these m marked animals are released back intothe population of animals under study. At the second stage of sampling, which occursat some later time, n(<m) animals are then randomly chosen (i.e., captured) from apopulation (of unknown size N) that now contains both m marked animals and anunknown number of unmarked animals.

(a) Assuming (for now) that the size N of the population under study is known,provide an explicit expression for the probability distribution of X, the numberof marked animals in the set of n(<m) randomly chosen animals obtained at thesecond stage of sampling.

(b) Again, assume (for now) that the size N of the population under study is known.At the first stage of sampling, if the marks on the m randomly chosen animals

Page 73: Exercises and Solutions in Biostatistical Theory (2010)

54 Univariate Distribution Theory

consist of the positive integers 1, 2, . . . , m, derive an explicit expression for theprobability π that a set of n(4 < n < m) animals randomly chosen at the secondstage of sampling contains at least two animals that were marked with any of thepositive integers 1, 2, 3, and 4.

(c) Since the value of N is actually unknown, the purpose of the capture–recapturesampling method is to provide an estimate of N using the observed value x of Xand the known sample sizes m and n. Using logical arguments, suggest a formulafor an estimate N of N that is a function of x, m, and n. If x = 22, m = 600, andn = 300, compute the numerical value of N. Do you notice any obvious problemsassociated with the use of the formula for N that you have developed?

Exercise 2.5. A researcher at the National Center for Health Statistics (NCHS) is interested in obtaining in-depth interviews from people in each of k (≥2) health status categories. In what follows, assume that:

(i) this researcher interviews exactly one randomly chosen person every day;

(ii) each person randomly chosen to be interviewed is equally likely to be in any one of the k health status categories.

This NCHS researcher is concerned that it will take her a considerable amount of time to interview at least one person in each of the k health status categories, and so she asks the following design-related question: "Given that I have interviewed people in exactly c different health status categories by the end of today, where 0 ≤ c ≤ (k − 1), what is the (conditional) probability that I will encounter a person in a new health status category exactly x (≥1) days from today?"

Develop an explicit expression for this conditional probability. Then, use this result to derive an expression for the expected total number of days required for this researcher to encounter at least one person in every health status category; also, find the numerical value of this expected value expression when k = 4.

Exercise 2.6. In a certain state lottery, suppose that the probability of buying a jackpot-winning ticket for a particular game is π = 0.0005.

(a) Suppose that a person wishes to buy n tickets for this particular game. What is the smallest value, say n∗, of the number of tickets n that this person needs to buy to have a probability of at least 0.90 of purchasing at least one jackpot-winning ticket? Use both the binomial and Poisson distributions to determine the value of n∗, and then comment on the numerical results.

(b) If each of the n∗ tickets purchased by this person costs $1.00, what should be the smallest dollar amount A of the jackpot so that this person's expected net profit E(P) after purchasing n∗ tickets is nonnegative?

(c) If, in fact, a total of N tickets are purchased for this game by lottery participants in this state, and if K (0 < K < N) of these tickets are actually jackpot-winning tickets, develop an expression for the probability that at least k (1 ≤ k ≤ n∗) of the n∗ (1 ≤ n∗ < N) tickets purchased by this person are jackpot-winning tickets.

Exercise 2.7. The Rhine Research Center, which studies parapsychology and related phenomena, is located near Duke University in Durham, North Carolina. It has been suggested by a certain parapsychologist employed by the Rhine Research Center that there could be extra-sensory perception (ESP) between monozygotic twins. To test this theory, this parapsychologist designs the following simple experiment. Each twin thinks of a particular whole number between 1 and k inclusive (namely, each twin picks one of the k numbers 1, 2, . . . , k), and then writes that number on a piece of paper. The two numbers that are written down are then compared to see whether or not they are the same. Let the random variable Y take the value 1 if the two numbers are the same, and let Y take the value 0 otherwise.

(a) Under the assumption that there is no ESP between a pair of monozygotic twins (i.e., each twin is picking his or her number totally at random), what is the exact probability distribution of the dichotomous random variable Y?

(b) Suppose that this parapsychologist is willing to declare that any pair of monozygotic twins possesses ESP if those twins choose numbers that are the same in one repetition of the experiment. However, this parapsychologist realizes that k must be large enough to make such a declaration appear statistically credible. Help this parapsychologist out by determining the smallest value of k required such that the probability of monozygotic twins with no ESP choosing matching numbers in one repetition of the experiment is no larger than 0.01.

(c) Using the value of k determined in part (b), suppose that this parapsychologist runs this experiment independently on n = 100 different sets of monozygotic twins. If none of these sets of monozygotic twins actually has ESP, how likely is it that this parapsychologist will incorrectly declare that at least one set of monozygotic twins actually has ESP? Comment on this finding.

(d) For a particular set of monozygotic twins, suppose that this experiment is independently repeated 10 times using the value of k determined in part (b). If these twins choose the same number in exactly 2 of the 10 independent repetitions of this parapsychological experiment, do you think that these data provide evidence of ESP or not?

(e) Suppose that only one repetition of the experiment is carried out using k = 4. Define the random variable S to be the sum of the two numbers chosen by the monozygotic twins under study. Given that the two numbers chosen by the twins are not the same and given that the twins are choosing their numbers totally at random, derive the exact probability distribution of S and find E(S) given the stated conditions.

Exercise 2.8. In order to have clinical expression of a mutagenic disease, it has been argued that two distinct steps have to occur. First, a mutagenic process starts with genetic damage. A mutagen (e.g., an agent like ionizing radiation) causes defects (or "breakpoints") in the DNA of human genetic material that produces the initial mutant cell. However, for a mutagenic process to be clinically expressed as a mutagenic disease, a second step is necessary, namely, the damaged (or mutant) cell (i.e., a cell with at least one breakpoint) must be able to clone (i.e., to reproduce its damaged self) effectively. A damaged cell that retains its ability to clone is said to be viable. In particular, the clinical expression of genetic damage (say, as a detectable cancer) cannot occur until the cell population cloned from the viable damaged cell is very large.

Suppose that we want to develop a statistical model for the above two-step mutagenic process involving a single cell exposed to ionizing radiation. To start, assume that the number Y of breakpoints in the initial damaged (or mutant) cell has the truncated Poisson distribution
$$p_Y(y) = \frac{\lambda^y}{y!\,(e^{\lambda} - 1)}, \quad y = 1, 2, \ldots, \infty \ \text{ and } \ \lambda > 0.$$

(a) Find an explicit expression for pr(Y ≤ 3|Y ≥ 2).

(b) For r = 1, 2, . . . , derive an explicit expression for
$$E\left[\frac{Y!}{(Y-r)!}\right] = E[Y(Y-1)\cdots(Y-r+1)],$$
and then use this expression to find E(Y) and V(Y).

(c) Now, as a simple model, let the probability that there is no loss of viability (i.e., no serious inhibition of the damaged cell's reproductive capability) due to any one breakpoint be equal to π, 0 < π < 1. Then, if V is the event that a damaged cell is viable, assume that $\text{pr}(V|Y = y) = \pi^y$, where $\text{pr}(V|Y = y)$ is the probability that a cell is viable given that it has y breakpoints, y = 1, 2, . . . , ∞. This assumption is meant to reflect the fact that the viability of a damaged cell will decrease as the number of breakpoints increases. Develop an explicit expression for the probability θ that a damaged cell is viable.

Exercise 2.9. A certain production process is designed to make electric light bulbs, with each light bulb intended to have an exact wattage value of 30 watts. However, because of problems with the production process, the actual wattage of a light bulb made by this production process can be considered to be a continuous random variable W that can be accurately modeled by the equation
$$W = 31 + (0.50)U,$$
where U ∼ N(0, 4). Find the exact numerical value of the probability that an electric light bulb made by this production process will have a wattage that does not deviate from the desired value of 30 watts by more than 0.50 watts.

Exercise 2.10. A certain company employs two manufacturing processes, Process 1 and Process 2, for producing very small square-shaped computer chips to be used in human hearing aids. Suppose that X, the diagonal of a computer chip in centimeters, is a continuous random variable with process-specific density functions defined as follows:

Process 1: $f_X(x) = 3.144e^{-x}$, 1.0 < x < 3.0;

Process 2: $f_X(x) = 2.574e^{-x}$, 0.8 < x < 2.8.

Only computer chips with diagonals between 1.0 and 2.0 centimeters are usable.

(a) Suppose that Process 1 produces 1000 computer chips per day, and that Process 2 produces 2000 computer chips per day. Further, at the end of each day, suppose that all 3000 computer chips are put into a large container and mixed together, thus making it impossible to tell which manufacturing process produced any particular computer chip. Suppose that two computer chips are selected randomly with replacement from this large container, and further suppose that one of these two computer chips is found to be usable and the other computer chip is found to be unusable. Determine the numerical value of the probability that both of these computer chips were produced by Process 1.

(b) If computer chips are selected randomly one-at-a-time with replacement from the container described in part (a), provide an explicit expression for the probability distribution $p_Y(y)$ of the discrete random variable Y, where Y is the number of computer chips that have to be selected until at least two usable computer chips and at least one unusable computer chip are obtained. Also, prove directly that $p_Y(y)$ is a valid discrete probability distribution.

Exercise 2.11. Racing car windshields made of a new impact-resistant glass are tested for breaking strength by striking them repeatedly with a mechanical device that simulates the stresses caused by high-speed crashes in automobile races. A statistician claims that it is obviously unrealistic to assume that the probability of a windshield breaking on a given strike is independent of the number of strikes previously survived. More specifically, since any windshield would be expected to become progressively more prone to breaking as the number of strikes increases, this statistician suggests using the following probability model: Let $A_x$ be the event that a windshield survives the xth strike; then, for 0 < θ < 1,
$$\theta = \text{pr}(A_1) \quad \text{and} \quad \theta^x = \text{pr}\left(A_x \,\middle|\, \cap_{i=1}^{x-1}A_i\right), \quad x = 2, 3, \ldots, \infty.$$

(a) Given this probability model, let the random variable X denote the number of strikes required to break a windshield made of this new impact-resistant glass. Derive, using precise arguments, a general formula for $p_X(x)$, the probability distribution of X, and carefully prove that $p_X(x)$ satisfies all the requirements to be a valid discrete probability distribution.

(b) If terms of the form $\theta^j$ for j > 3 can be neglected, develop a reasonable approximation for E(X).

Exercise 2.12. Suppose that the continuous random variable X has the uniform distribution
$$f_X(x) = 1, \quad 0 < x < 1.$$
Suppose that the continuous random variable Y is related to X via the equation $Y = [-\ln(1-X)]^{1/3}$. By relating $F_Y(y)$ to $F_X(x)$, develop explicit expressions for $f_Y(y)$ and $E(Y^r)$ for r ≥ 0.

Exercise 2.13. For a certain psychological test designed to measure work-related stress level, a score of zero is considered to reflect a normal level of work-related stress. Based on previous data, it is reasonable to assume that the score X on this psychological test can be accurately modeled as a continuous random variable with density function
$$f_X(x) = \frac{1}{288}(36 - x^2), \quad -6 < x < 6,$$
where negative scores indicate lower-than-normal work-related stress levels and positive scores indicate higher-than-normal work-related stress levels.

(a) Find the numerical value of the probability that a randomly chosen person taking this psychological test makes a test score within two units of a test score of zero.

(b) Develop an explicit expression for $F_X(x)$, the cumulative distribution function (CDF) for X, and then use this result to compute the exact numerical value of the probability that a randomly chosen person makes a test score greater than three in value given that this person's test score suggests a higher-than-normal work-related stress level.

(c) Find the numerical value of the probability (say, π) that, on any particular day, the sixth person taking this psychological test is at least the third person to make a test score greater than one in value.

(d) Use Tchebyshev's Inequality to find numbers L and U such that
$$\text{pr}(L < X < U) \geq \frac{8}{9}.$$
Comment on your findings.

Exercise 2.14. Suppose that the continuous random variable X has the mixture distribution
$$f_X(x) = \pi f_1(x) + (1 - \pi)f_2(x), \quad -\infty < x < +\infty,$$
where $f_1(x)$ is a normal density with mean $\mu_1$ and variance $\sigma_1^2$, where $f_2(x)$ is a normal density with mean $\mu_2$ and variance $\sigma_2^2$, where π is the probability that X has distribution $f_1(x)$, and where (1 − π) is the probability that X has distribution $f_2(x)$.

(a) Develop an explicit expression for $P_X(s)$, the probability generating function of the random variable X, and then use this result directly to find E(X).

(b) Let π = 0.60, $\mu_1 = 1.00$, $\sigma_1^2 = 0.50$, $\mu_2 = 1.20$, and $\sigma_2^2 = 0.40$. Suppose that one value of X is observed, and that value of X exceeds 1.10 in value. Find the numerical value of the probability that this observed value of X was obtained from $f_1(x)$.

(c) Now, suppose that π = 1, $\mu_1 = 0$, and $\sigma_1^2 = 1$. Find the numerical value of E(X|X > 1.00).

Exercise 2.15. If the random variable Y ∼ N(0, 1), develop an explicit expression for $E(|Y^r|)$ when r is an odd positive integer.

Exercise 2.16. Suppose that the discrete random variable Y has the negative binomial distribution
$$p_Y(y) = C^{y+k-1}_{k-1}\pi^k(1-\pi)^y, \quad y = 0, 1, \ldots, \infty, \ 0 < \pi < 1,$$
with k a known positive integer. Derive an explicit expression for $E[Y!/(Y-r)!]$ where r is a nonnegative integer. Then, use this result to find E(X) and V(X) when X = (Y + k).


Exercise 2.17. Suppose that X is the concentration (in parts per million) of a certain airborne pollutant, and suppose that the random variable Y = ln(X) has a distribution that can be adequately modeled by the double exponential density function
$$f_Y(y) = (2\alpha)^{-1}e^{-|y-\beta|/\alpha}, \quad -\infty < y < \infty, \ -\infty < \beta < \infty, \ 0 < \alpha < \infty.$$

(a) Find an explicit expression for $F_Y(y)$, the cumulative distribution function (CDF) associated with the density function $f_Y(y)$. If α = 1 and β = 2, use this CDF to find the numerical value of pr(X > 4|X > 2).

(b) For the density function $f_Y(y)$ given above, derive an explicit expression for a generating function $\phi_Y(t)$ that can be used to generate the absolute-value moments $\nu_r = E\{|Y - E(Y)|^r\}$ for r a nonnegative integer, and then use $\phi_Y(t)$ directly to find $\nu_1$ and $\nu_2 = V(Y)$.

Exercise 2.18. A certain statistical model describing the probability (or risk) Y of an adult developing leukemia as a function of lifetime cumulative exposure X to radiation (in microsieverts) is given by the equation
$$Y = g(X) = 1 - \alpha e^{-\beta X^2}, \quad 0 < X < +\infty, \ 0 < \alpha < 1, \ 0 < \beta < +\infty,$$
where the continuous random variable X has the distribution
$$f_X(x) = \left(\frac{2}{\pi\theta}\right)^{1/2}e^{-x^2/2\theta}, \quad 0 < x < +\infty, \ 0 < \theta < +\infty.$$
Find an explicit expression relating average risk E(Y) to average cumulative exposure E(X). Comment on how the average risk varies as a function of α, β, and E(X).

Exercise 2.19. A conceptually infinitely large population consists of a proportion $\pi_0$ of nonsmokers, a proportion $\pi_l$ of light smokers (no more than one pack per day), and a proportion $\pi_h$ of heavy smokers (more than one pack per day), where $(\pi_0 + \pi_l + \pi_h) = 1$. Consider the following three random variables based on three different sampling schemes:

1. $X_1$ is the number of subjects that have to be randomly selected sequentially from this population until exactly two heavy smokers are obtained.

2. $X_2$ is the number of subjects that have to be randomly selected sequentially from this population until at least one light smoker and at least one heavy smoker are obtained.

3. $X_3$ is the number of subjects that have to be randomly selected sequentially from this population until at least one subject from each of the three smoking categories (i.e., nonsmokers, light smokers, and heavy smokers) is obtained.

(a) Develop an explicit expression for the probability distribution $p_{X_1}(x_1)$ of $X_1$.

(b) Develop an explicit expression for the probability distribution $p_{X_2}(x_2)$ of $X_2$.

(c) Develop an explicit expression for the probability distribution $p_{X_3}(x_3)$ of $X_3$.


Exercise 2.20. If Y is a normally distributed random variable with mean μ and variance $\sigma^2$, then the random variable $X = e^Y$ is said to have a lognormal distribution. The lognormal distribution has been used in many important practical applications, one such important application being to model the distributions of chemical concentration levels to which workers are exposed in occupational settings.

(a) Using the fact that Y ∼ N(μ, $\sigma^2$) and that $X = e^Y$, derive explicit expressions for E(X) and V(X).

(b) If the lognormal random variable $X = e^Y$ defined in part (a) represents the average concentration (in parts per million, or ppm) of a certain toxic chemical to which a typical worker in a certain chemical manufacturing industry is exposed over an 8-hour workday, and if E(X) = V(X) = 1, find the exact numerical value of pr(X > 1), namely, the probability that such a typical worker will be exposed over an 8-hour workday to an average chemical concentration level greater than 1 ppm.

(c) To protect the health of workers in this chemical manufacturing industry, it is desirable to be highly confident that a typical worker will not be exposed to an average chemical concentration greater than c ppm over an 8-hour workday, where c is a known positive constant specified by federal guidelines.

Prove that
$$\text{pr}(X \leq c) \geq (1 - \alpha), \quad 0 < \alpha < 0.50,$$
if
$$E(X) \leq c\,e^{-0.50z_{1-\alpha}^2},$$
where $\text{pr}(Z \leq z_{1-\alpha}) = (1 - \alpha)$ when Z ∼ N(0, 1). The implication of this result is that it is possible to meaningfully reduce the chance that a worker will be exposed over an 8-hour workday to a high average concentration of a potentially harmful chemical by sufficiently lowering the mean concentration level E(X), given the assumption that Y = ln(X) ∼ N(μ, $\sigma^2$).

Exercise 2.21. Let X be a discrete random variable such that
$$\theta_x = \text{pr}(X = x) = \alpha\pi^x, \quad x = 1, 2, \ldots, +\infty, \ 0 < \pi < 1,$$
and let
$$\theta_0 = \text{pr}(X = 0) = 1 - \sum_{x=1}^{\infty}\alpha\pi^x.$$
Here, α is an appropriately chosen positive constant.

(a) Develop an explicit expression for $M_X(t) = E(e^{tX})$, and then use this expression to find E(X). Be sure to specify appropriate ranges for α and t.

(b) Verify your answer for E(X) in part (a) by computing E(X) directly.


Exercise 2.22. A popular dimensionless measure of the skewness (or "asymmetry") of a density function $f_X(x)$ is the quantity
$$\alpha_3 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{E\{[X - E(X)]^3\}}{[V(X)]^{3/2}}.$$
As a possible competitor to $\alpha_3$, a new dimensionless measure of asymmetry, denoted $\alpha_3^*$, is proposed, where
$$\alpha_3^* = \frac{E(X) - \theta}{\sqrt{V(X)}};$$
here, θ is defined as the mode of the density function $f_X(x)$, namely, that unique value of x (if it exists) that maximizes $f_X(x)$.

For the gamma density function
$$f_X(x) = \frac{x^{\beta-1}e^{-x/\alpha}}{\Gamma(\beta)\alpha^{\beta}}, \quad 0 < x < \infty, \ \alpha > 0, \ \beta > 0,$$
develop explicit expressions for $\alpha_3$ and $\alpha_3^*$, and comment on the findings.

Exercise 2.23∗. Environmental scientists typically use personal exposure monitors to measure the average daily concentrations of chemicals to which workers are exposed during 8-h work shifts. In certain situations, some average concentration levels are very low and so fall below a known detection limit L (>0) defined by the type of personal monitor being used; such unobservable average concentration levels are said to be left-censored.

To deal with this missing data problem, one suggested ad hoc approach is to replace such left-censored average concentration levels with some numerical function g(L) (>0) of L, say, $L/\sqrt{2}$, L/2, or even L itself. To study the statistical ramifications of such an ad hoc approach, let X (≥0) be a continuous random variable representing the average concentration level for a randomly chosen worker in a certain industrial setting; further, assume that X has the distribution $f_X(x)$ with mean E(X) and variance V(X). Then, define the random variable
$$U = X \ \text{ if } X \geq L \quad \text{and} \quad U = g(L) \ \text{ if } X < L.$$

(a) If $\pi = \text{pr}(X \geq L) = \int_L^{\infty}f_X(x)\,dx$, show that
$$E(U) = (1 - \pi)g(L) + \pi E(X|X \geq L)$$
and that
$$V(U) = \pi\left\{V(X|X \geq L) + (1 - \pi)\left[g(L) - E(X|X \geq L)\right]^2\right\}.$$

(b) Find an explicit expression for the optimal choice for g(L) such that E(U) = E(X), which is a very desirable equality when using U as a surrogate for X. If $f_X(x) = e^{-x}$, x ≥ 0, and L = 0.05, find the exact numerical value of this optimal choice for g(L).


Exercise 2.24∗. Suppose that X ∼ N(μ, $\sigma^2$). Develop an explicit expression for E(Y) when
$$Y = 1 - \alpha e^{-\beta X^2}, \quad 0 < \alpha < 1, \ 0 < \beta < +\infty.$$

Exercise 2.25∗. The cumulant generating function for a random variable X is defined as
$$\psi_X(t) = \ln[M_X(t)],$$
where $M_X(t) = E(e^{tX})$ is the moment generating function of X; and, the rth cumulant $\kappa_r$ is the coefficient of $t^r/r!$ in the series expansion
$$\psi_X(t) = \ln[M_X(t)] = \sum_{r=1}^{\infty}\kappa_r\frac{t^r}{r!}.$$

(a) If Y = (X − c), where c is a constant independent of X, what is the relationship between the cumulants of Y and the cumulants of X?

(b) Find the cumulants of X when X is distributed as:

(i) N(μ, $\sigma^2$);

(ii) POI(λ);

(iii) GAMMA(α, β).

(c) In general, show that $\kappa_1 = E(X)$, that $\kappa_2 = V(X)$, and that $\kappa_3 = E\{[X - E(X)]^3\}$.

Exercise 2.26∗. In the branch of statistics known as "survival analysis," interest concerns a continuous random variable T (0 < T < ∞), the time until an event (such as death) occurs. For example, in a clinical trial evaluating the effectiveness of a new remission induction chemotherapy treatment for leukemia, investigators may wish to model the time (in months) in remission (or, equivalently, the time to the reappearance of leukemia) for patients who have received this chemotherapy treatment and who have gone into remission. In such settings, rather than modeling T directly, investigators will often model the hazard function, h(t), defined as
$$h(t) = \lim_{\Delta t \to 0}\frac{\text{pr}(t \leq T \leq t + \Delta t\,|\,T \geq t)}{\Delta t}, \quad t > 0.$$

The hazard function, or "instantaneous failure rate," is the limiting value (as Δt → 0) of the probability per unit of time of the occurrence of the event of interest during a small time interval [t, t + Δt] of length Δt, given that the event has not occurred prior to time t.
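As a quick illustration of this definition (our addition, not part of the exercise): if T follows the exponential distribution with $f(t) = \lambda e^{-\lambda t}$, then $S(t) = e^{-\lambda t}$ and $h(t) = f(t)/S(t) = \lambda$ for all t > 0, so the exponential distribution is exactly the constant-hazard ("memoryless") special case.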

(a) If $f_T(t) \equiv f(t)$ is the density function of T and if $F_T(t) \equiv F(t)$ is the corresponding CDF, show that
$$h(t) = \frac{f(t)}{S(t)},$$
where S(t) = [1 − F(t)] is called the survival function and is the probability that the event of interest does not occur prior to time t.

(b) Using the result in part (a), show that
$$S(t) = e^{-H(t)},$$
where $H(t) = \int_0^t h(u)\,du$ is the cumulative hazard function.

(c) Prove that $E(T) = \int_0^{\infty}S(t)\,dt$.

(d) Due to funding restrictions, the chemotherapy clinical trial described above is to be terminated after a fixed period of time c (in months). Suppose that patients remain in the trial until either their leukemia reappears or the clinical trial ends (i.e., assume that there is no loss to follow-up, so that all patients either come out of remission or remain in remission until the trial ends). The observed time on study for each patient is therefore X = min(T, c), where T denotes the time in remission. Show that
$$E[H(X)] = F_T(c),$$
where H(·) is the cumulative hazard function for T.

For further details about survival analysis, see Hosmer, Lemeshow, and May (2008) and Kleinbaum and Klein (2005).

Exercise 2.27∗. A certain drug company produces and sells a popular insulin for the treatment of diabetes. At the beginning of each calendar year, the company produces a very large number of units of the insulin (where a unit is a dosage amount equivalent to one injection of the insulin), the production goal being to closely meet patient demand for the insulin during that year. The company makes a net gain of G dollars for each unit sold during the year, and the company suffers a net loss of L dollars for each unit left unsold during the year. Further, suppose that the total number X of units of insulin (if available) that patients would purchase during the year can be modeled approximately as a continuous random variable with probability density function $f_X(x)$, x > 0.

(a) If N is the total number of units of the insulin that should be produced at the beginning of the year to maximize the expected value of the profit P of the company for the entire year, show that N satisfies the equation
$$F_X(N) = \frac{G}{(G + L)},$$
where $F_X(x) = \text{pr}(X \leq x)$ is the CDF of the random variable X.

(b) Compute the value of N if G = 4, L = 1, and
$$f_X(x) = (2 \times 10^{-10})\,x\,e^{-(10^{-10})x^2}, \quad x > 0.$$


Exercise 2.28∗. Suppose that a particular automobile insurance company adopts the following strategy with regard to setting the value of yearly premiums for coverage. Any policy holder must pay a premium of $P_1$ dollars for the first year of coverage. If a policy holder has a perfect driving record during this first year of coverage (i.e., this policy holder is not responsible for any traffic accidents or for any traffic violations during this first year of coverage), then the premium for the second year of coverage will be reduced to $\alpha P_1$, where 0 < α < 1. However, if this policy holder does not have a perfect driving record during the first year of coverage, then the premium for the second year of coverage will be increased to $\beta P_1$, where 1 < β < +∞.

More generally, let π, 0 < π < 1, be the probability that any policy holder has a perfect driving record during any particular year of coverage, and assume that any policy holder's driving record during any one particular year of coverage is independent of his or her driving record during any other year of coverage. Then, in general, for k = 2, 3, . . . , ∞, let $P_{k-1}$ denote the premium for year (k − 1); thus, the premium $P_k$ for year k will equal $\alpha P_{k-1}$ with probability π, and will equal $\beta P_{k-1}$ with probability (1 − π).

(a) For k = 2, 3, . . . , ∞, develop an explicit expression for $E(P_k)$, the average yearly premium for the kth year of coverage for any policy holder.

(b) This insurance company cannot afford to let the average yearly premium for any policy holder be smaller than a certain value, say, $P^*$. Find an expression (as a function of $P_1$, $P^*$, β, and π) for the smallest value of α, say $\alpha^*$, such that the average yearly premium for year k is not less than $P^*$. Then, consider the limiting value of $\alpha^*$ as k → ∞; compute the numerical value of this limiting value of $\alpha^*$ when π = 0.90 and β = 1.05, and then comment on your findings.

Exercise 2.29∗. Suppose that the discrete random variable X has the probability distribution
$$p_X(x) = \text{pr}(X = x) = \frac{1}{x!}\sum_{l=0}^{R-x}\frac{(-1)^l}{l!}, \quad x = 0, 1, \ldots, R,$$
where R (>1) is a positive integer.

(a) Use an inductive argument to show that $\sum_{x=0}^{R}p_X(x) = 1$.

(b) Find explicit expressions for E(X) and V(X). Also, find $\lim_{R\to\infty}p_X(x)$. Comment on all these findings.

Exercise 2.30∗. Suppose that the number $X_T$ of incident (i.e., new) lung cancer cases developing in a certain disease-free population of size N during a time interval of length T (in years) has the Poisson distribution
$$p_{X_T}(x) = \frac{(NT\lambda)^x e^{-(NT\lambda)}}{x!}, \quad x = 0, 1, \ldots, \infty; \ N > 0, \ \lambda > 0, \ T > 0.$$
Here, N and T are known constants, and the parameter λ is the unknown rate of lung cancer development per person-year (a quantity often referred to as the "incidence density" by epidemiologists).


(a) Starting at time zero, let the continuous random variable $W_n$ be the length of time in years that passes until exactly n lung cancer cases have developed. $W_n$ is referred to as the "waiting time" until the nth lung cancer case has developed. By expressing the CDF $F_{W_n}(w_n)$ of the random variable $W_n$ in terms of a probability statement about the Poisson random variable $X_T$, develop an explicit expression for the density function of the random variable $W_n$.

(b) With $X_T \sim \text{POI}(NT\lambda)$, consider the standardized random variable $Z = [X_T - E(X_T)]/\sqrt{V(X_T)}$. Show that
$$\lim_{N\to\infty}E(e^{tZ}) = e^{t^2/2},$$
which is the moment generating function of a standard normal random variable. Then, if $N = 10^5$ and $\lambda = 10^{-4}$, use the above result to provide a reasonable value for the probability of observing no more than 90 new cases of lung cancer in any 10-year period of time.

Exercise 2.31∗. Important computational aids for the numerical evaluation of incomplete integrals of gamma and beta distributions involve expressing such integrals as sums of probabilities of particular Poisson and binomial distributions.

(a) Prove that
$$\int_c^{\infty}\frac{x^{\beta-1}e^{-x/\alpha}}{\Gamma(\beta)\alpha^{\beta}}\,dx = \sum_{j=0}^{\beta-1}e^{-c/\alpha}\frac{(c/\alpha)^j}{j!},$$
where α > 0 and c > 0 and where β is a positive integer.

(b) Prove that
$$\int_0^c\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}\,dx = \sum_{i=\alpha}^{\alpha+\beta-1}C^{\alpha+\beta-1}_i c^i(1-c)^{\alpha+\beta-1-i},$$
where α and β are positive integers and where 0 < c < 1.

Exercise 2.32∗. Suppose that the probability that a sea turtle nest contains n eggs is equal to $(1-\pi)\pi^{n-1}$, where n = 1, 2, . . . , ∞ and 0 < π < 1. Furthermore, each egg in any such nest has probability 0.30 of producing a live and healthy baby sea turtle, completely independent of what happens to any other egg in that same nest. Finally, because of predators (e.g., sea birds and other sea creatures) and other risk factors (e.g., shore erosion, harmful environmental conditions, etc.), each such live and healthy baby sea turtle then has probability 0.98 of NOT surviving to adulthood.

(a) Find the exact numerical value of the probability that any egg produces an adult sea turtle.

(b) Derive an explicit expression for the probability α that a randomly chosen sea turtle nest produces at least one adult sea turtle. Find the exact numerical value of α when π = 0.20.


(c) Suppose that a randomly chosen sea turtle nest is known to have produced exactly k adult sea turtles, where k ≥ 0. Derive an explicit expression for the probability $\beta_{nk}$ that this randomly chosen sea turtle nest originally contained exactly n eggs, n ≥ 1. Find the exact numerical value of $\beta_{nk}$ when π = 0.20, k = 2, and n = 6.

Exercise 2.33∗

(a) Prove Pascal's Identity, namely,
$$C^n_k = C^{n-1}_{k-1} + C^{n-1}_k$$
for any positive integers n and k such that $C^n_k \equiv 0$ if k > n.

(b) Prove Vandermonde's Identity, namely,
$$C^{m+n}_r = \sum_{k=0}^{r}C^m_{r-k}C^n_k,$$

where m, n, and r are nonnegative integers satisfying r ≤ min{m, n}.

(c) For y = 1, 2, . . . , min{s, t}, suppose that the discrete random variable X takes the value x = 2y with probability
$$\pi_{2y} = \frac{2C^{s-1}_{y-1}C^{t-1}_{y-1}}{C^{s+t}_s},$$
and takes the value x = (2y + 1) with probability
$$\pi_{2y+1} = \frac{C^{s-1}_{y}C^{t-1}_{y-1} + C^{s-1}_{y-1}C^{t-1}_{y}}{C^{s+t}_s},$$
where $C^{s-1}_y \equiv 0$ when y = s and $C^{t-1}_y \equiv 0$ when y = t.

Use Pascal's Identity and Vandermonde's Identity to show that X has a valid discrete probability distribution.

SOLUTIONS

Solution 2.1

(a) Let the random variable Y denote the number of individuals that must be selected until one individual with the rare blood disorder and one individual without the rare blood disorder are selected. (Note that Y can take the values 2, 3, 4, and 5.) If $D_i$ is the event that the ith individual selected has the rare blood disorder, then
$$\text{pr}(Y = 2) = \text{pr}(D_1 \cap \bar{D}_2) + \text{pr}(\bar{D}_1 \cap D_2) = \text{pr}(D_1)\text{pr}(\bar{D}_2|D_1) + \text{pr}(\bar{D}_1)\text{pr}(D_2|\bar{D}_1) = \left(\tfrac{4}{7}\right)\left(\tfrac{3}{6}\right) + \left(\tfrac{3}{7}\right)\left(\tfrac{4}{6}\right) = \tfrac{4}{7};$$
$$\text{pr}(Y = 3) = \text{pr}(D_1 \cap D_2 \cap \bar{D}_3) + \text{pr}(\bar{D}_1 \cap \bar{D}_2 \cap D_3) = \text{pr}(D_1)\text{pr}(D_2|D_1)\text{pr}(\bar{D}_3|D_1 \cap D_2) + \text{pr}(\bar{D}_1)\text{pr}(\bar{D}_2|\bar{D}_1)\text{pr}(D_3|\bar{D}_1 \cap \bar{D}_2) = \left(\tfrac{4}{7}\right)\left(\tfrac{3}{6}\right)\left(\tfrac{3}{5}\right) + \left(\tfrac{3}{7}\right)\left(\tfrac{2}{6}\right)\left(\tfrac{4}{5}\right) = \tfrac{10}{35}.$$
Similarly,
$$\text{pr}(Y = 4) = \left(\tfrac{4}{7}\right)\left(\tfrac{3}{6}\right)\left(\tfrac{2}{5}\right)\left(\tfrac{3}{4}\right) + \left(\tfrac{3}{7}\right)\left(\tfrac{2}{6}\right)\left(\tfrac{1}{5}\right)\left(\tfrac{4}{4}\right) = \tfrac{4}{35}; \ \text{and,}$$
$$\text{pr}(Y = 5) = \left(\tfrac{4}{7}\right)\left(\tfrac{3}{6}\right)\left(\tfrac{2}{5}\right)\left(\tfrac{1}{4}\right)\left(\tfrac{3}{3}\right) = \tfrac{1}{35} = 1 - \sum_{y=2}^{4}\text{pr}(Y = y) = 1 - \tfrac{34}{35}.$$
Finally, $E(Y) = 2\left(\tfrac{4}{7}\right) + 3\left(\tfrac{10}{35}\right) + 4\left(\tfrac{4}{35}\right) + 5\left(\tfrac{1}{35}\right) = 2.60$.

(b) Let A denote the event that "(k − 1) individuals have the rare blood disorder among the first (x − 1) individuals selected," and let B denote the event that "the xth individual selected has the rare blood disorder." Then,
$$p_X(x) = \text{pr}(X = x) = \text{pr}(A \cap B) = \text{pr}(A)\text{pr}(B|A) = \frac{C^M_{k-1}C^{N-M}_{(x-1)-(k-1)}}{C^N_{x-1}} \cdot \frac{[M - (k-1)]}{[N - (x-1)]} = \frac{C^M_{k-1}C^{N-M}_{x-k}}{C^N_{x-1}}\left(\frac{M-k+1}{N-x+1}\right) = \frac{C^{x-1}_{k-1}C^{N-x}_{M-k}}{C^N_M}, \quad 1 \leq k \leq x \leq (N - M + k).$$


(c) pr(third individual selected has the rare blood disorder)
$$= \text{pr}(D_1 \cap D_2 \cap D_3) + \text{pr}(D_1 \cap \bar{D}_2 \cap D_3) + \text{pr}(\bar{D}_1 \cap \bar{D}_2 \cap D_3) + \text{pr}(\bar{D}_1 \cap D_2 \cap D_3)$$
$$= \left(\tfrac{M}{N}\right)\left(\tfrac{M-1}{N-1}\right)\left(\tfrac{M-2}{N-2}\right) + \left(\tfrac{M}{N}\right)\left(\tfrac{N-M}{N-1}\right)\left(\tfrac{M-1}{N-2}\right) + \left(\tfrac{N-M}{N}\right)\left(\tfrac{N-M-1}{N-1}\right)\left(\tfrac{M}{N-2}\right) + \left(\tfrac{N-M}{N}\right)\left(\tfrac{M}{N-1}\right)\left(\tfrac{M-1}{N-2}\right) = \frac{M}{N}.$$

Solution 2.2. For x = 0, 1, . . . , (k − 2), there are exactly (k − x − 1) pairs of slots for which the integer 1 precedes the integer k and for which there are exactly x integers between the integers 1 and k. Also, the integer k can precede the integer 1, and the other (k − 2) integers can be arranged in the remaining (k − 2) slots in (k − 2)! ways. So,
$$p_X(x) = \frac{2(k-x-1)[(k-2)!]}{k!} = \frac{2(k-x-1)}{k(k-1)}, \quad x = 0, 1, \ldots, (k-2).$$
Clearly, $p_X(x) \geq 0$, x = 0, 1, . . . , (k − 2), and
$$\sum_{x=0}^{k-2}p_X(x) = \sum_{x=0}^{k-2}\frac{2(k-x-1)}{k(k-1)} = \frac{2}{k(k-1)}\sum_{x=0}^{k-2}[(k-1) - x] = \frac{2}{k(k-1)}\left[(k-1)^2 - \frac{(k-2)(k-1)}{2}\right] = \frac{2}{k}\left[(k-1) - \frac{(k-2)}{2}\right] = 1.$$
So, $p_X(x)$ is a valid discrete probability distribution.

(b) Now,
$$E(X) = \sum_{x=0}^{k-2}x\,p_X(x) = \frac{2}{k(k-1)}\sum_{x=0}^{k-2}x[(k-1) - x] = \frac{2}{k(k-1)}\left\{(k-1)\left[\frac{(k-2)(k-1)}{2}\right] - \frac{(k-2)(k-1)[2(k-2)+1]}{6}\right\} = \frac{2}{k}\left[\frac{(k-2)(k-1)}{2} - \frac{(k-2)(2k-3)}{6}\right] = \frac{(k-2)}{k}\left[\frac{3(k-1) - (2k-3)}{3}\right] = \frac{(k-2)}{3}, \quad k \geq 3.$$
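The expectation just derived is easy to corroborate by simulation. Here is a minimal Python sketch (our own addition, with an illustrative choice k = 10, so the target value is (10 − 2)/3 ≈ 2.667):

```python
# Monte Carlo check that E(X) = (k - 2)/3, where X is the number of
# integers lying strictly between 1 and k in a random arrangement.
import random

k, reps = 10, 200_000
total = 0
for _ in range(reps):
    slots = list(range(1, k + 1))
    random.shuffle(slots)            # all k! arrangements equally likely
    i, j = slots.index(1), slots.index(k)
    total += abs(i - j) - 1          # integers strictly between 1 and k
print(total / reps)                  # close to (k - 2)/3 = 2.6667
```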


Solution 2.3

(a)
$$\text{pr}(Y = 2) = \text{pr}(W_1 \cap W_2) = \left(\tfrac{4}{6}\right)\left(\tfrac{3}{5}\right) = \tfrac{2}{5},$$
$$\text{pr}(Y = 1) = \text{pr}(B_1 \cap W_2 \cap W_3 \cap W_4) + \text{pr}(W_1 \cap B_2 \cap W_3 \cap W_4) = \left(\tfrac{2}{6}\right)\left(\tfrac{4}{5}\right)\left(\tfrac{3}{4}\right)\left(\tfrac{2}{3}\right) + \left(\tfrac{4}{6}\right)\left(\tfrac{2}{5}\right)\left(\tfrac{3}{4}\right)\left(\tfrac{2}{3}\right) = \tfrac{4}{15},$$
$$\text{pr}(Y = 0) = \text{pr}(B_1 \cap W_2 \cap W_3 \cap B_4) + \text{pr}(B_1 \cap W_2 \cap B_3 \cap W_4) + \text{pr}(W_1 \cap B_2 \cap B_3 \cap W_4) + \text{pr}(W_1 \cap B_2 \cap W_3 \cap B_4) + \text{pr}(B_1 \cap B_2) = 4\left(\tfrac{2}{6}\right)\left(\tfrac{4}{5}\right)\left(\tfrac{3}{4}\right)\left(\tfrac{1}{3}\right) + \left(\tfrac{2}{6}\right)\left(\tfrac{1}{5}\right) = \tfrac{1}{3}.$$
Or, $\text{pr}(Y = 0) = 1 - \text{pr}(Y = 1) - \text{pr}(Y = 2) = 1 - \tfrac{4}{15} - \tfrac{2}{5} = \tfrac{1}{3}$.

Thus,
$$E(Y) = 0\left(\tfrac{1}{3}\right) + 1\left(\tfrac{4}{15}\right) + 2\left(\tfrac{2}{5}\right) = \tfrac{16}{15} = 1.0667.$$
Since
$$E(Y^2) = (0)^2\left(\tfrac{1}{3}\right) + (1)^2\left(\tfrac{4}{15}\right) + (2)^2\left(\tfrac{2}{5}\right) = \tfrac{28}{15},$$
$$V(Y) = \tfrac{28}{15} - \left(\tfrac{16}{15}\right)^2 = \tfrac{164}{225} = 0.7289.$$

(b) Clearly, pr(white ball) = 2/3, and this probability stays the same for each ball selected. So, pr(a pair contains 2 white balls) = $(2/3)^2 = 4/9$. Now, let X = number of pairs that have to be selected to obtain exactly two pairs of white balls. Since X ∼ NEGBIN(k = 2, π = 4/9), it follows that
$$p_X(x) = C^{x-1}_{2-1}\left(\tfrac{4}{9}\right)^2\left(\tfrac{5}{9}\right)^{x-2} = (x-1)\left(\tfrac{4}{9}\right)^2\left(\tfrac{5}{9}\right)^{x-2}, \quad x = 2, 3, \ldots, \infty.$$
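As a numerical sanity check (our addition, not part of the original solution), the following sketch verifies that this negative binomial pmf sums to 1 and that its mean equals k/π = 2/(4/9) = 4.5 pairs:

```python
# Verify the pmf p_X(x) = (x-1)(4/9)^2 (5/9)^(x-2), x = 2, 3, ...
from math import comb

total = mean = 0.0
for x in range(2, 400):              # truncate the infinite sum; the tail is negligible
    p = comb(x - 1, 1) * (4/9)**2 * (5/9)**(x - 2)
    total += p
    mean += x * p
print(round(total, 6), round(mean, 4))   # ~1.0 and ~4.5
```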

Solution 2.4

(a) At the second stage of sampling, we are sampling without replacement from a finite population of N animals, of which m are marked and (N − m) are unmarked. So, the hypergeometric distribution applies. In particular, the exact distribution of X is
$$p_X(x) = \frac{C^m_x C^{N-m}_{n-x}}{C^N_n}, \quad \max[0, n - (N - m)] \leq x \leq n.$$


(b)
$$\pi = \sum_{j=2}^{4}\frac{C^4_j C^{N-4}_{n-j}}{C^N_n}.$$

(c) Since X has the hypergeometric distribution given in part (a), it follows directly that E(X) = n(m/N). Since x, the observed value of X, is our best guess for E(X), it is logical to equate x to E(X), obtaining x = n(m/N). This leads to the expression $\hat{N} = mn/x$. When x = 22, m = 600, and n = 300, the computed value of $\hat{N}$ is 8181.82. Two obvious problems with the estimate $\hat{N}$ are that it does not necessarily take positive integer values, and it is not defined when x = 0.
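A minimal Python sketch of this estimator follows (our addition; the helper name lincoln_petersen is ours, chosen because this is the classical Lincoln–Petersen estimate):

```python
# Capture-recapture estimate N-hat = m*n/x from part (c).
def lincoln_petersen(m, n, x):
    if x == 0:
        raise ValueError("estimate undefined when no marked animals are recaptured")
    return m * n / x

print(lincoln_petersen(m=600, n=300, x=22))  # 8181.82 (note: not an integer)
```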

Solution 2.5. Since each health status category has probability 1/k of being encountered, pr(encountering a new health status category | c different health status categories have already been encountered) = (1 − c/k). Also, the daily outcomes are mutually independent of one another, and the probability (1 − c/k) remains the same from day to day.

So, pr(it takes exactly x days to encounter a new health status category | c different health status categories have already been encountered) = pr[not a new category in the first (x − 1) days] × pr[new category on the xth day]
$$= \left(\frac{c}{k}\right)^{x-1}\left(1 - \frac{c}{k}\right) = (k - c)k^{-x}c^{x-1}, \quad 0 \leq c \leq (k-1).$$
In other words, if X is the random variable denoting the number of days required to encounter a new health status category, then X has a geometric distribution, namely,
$$p_X(x) = (k - c)k^{-x}c^{x-1}, \quad x = 1, 2, \ldots, \infty.$$

(b) For 0 ≤ c ≤ (k − 1) and with q = c/k, we have E(X | c different health status categories have already been encountered)
$$= \sum_{x=1}^{\infty}x\left(\frac{c}{k}\right)^{x-1}\left(1 - \frac{c}{k}\right) = \left(1 - \frac{c}{k}\right)\sum_{x=1}^{\infty}xq^{x-1} = \left(1 - \frac{c}{k}\right)\sum_{x=1}^{\infty}\frac{d(q^x)}{dq} = \left(1 - \frac{c}{k}\right)\frac{d}{dq}\left\{\sum_{x=1}^{\infty}q^x\right\} = \left(1 - \frac{c}{k}\right)\frac{d}{dq}\left[\frac{q}{1-q}\right] = \left(1 - \frac{c}{k}\right)\left\{\frac{(1)(1-q) - q(-1)}{(1-q)^2}\right\} = \frac{\left(1 - \frac{c}{k}\right)}{\left(1 - \frac{c}{k}\right)^2} = \frac{k}{(k-c)},$$
which follows directly since X ∼ GEOM(1 − c/k).

So, the expected total number of days = $k\sum_{c=0}^{k-1}(k-c)^{-1}$.


When k = 4, we get $4\sum_{c=0}^{3}(4-c)^{-1} = 8.33$; in other words, it will take, on average, nine days to encounter people in all k = 4 health status categories.
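A one-line Python check of this coupon-collector-type expectation (our addition) confirms the k = 4 figure:

```python
# Expected total number of days: k * sum_{c=0}^{k-1} 1/(k - c).
k = 4
print(k * sum(1 / (k - c) for c in range(k)))  # 8.3333..., i.e., about nine days
```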

Solution 2.6

(a) If X ∼ BIN(n, π = 0.0005), then
$$\text{pr}(X \geq 1) = 1 - \text{pr}(X = 0) = 1 - (0.9995)^n \geq 0.90;$$
thus, we obtain
$$n\ln(0.9995) \leq \ln(0.10), \ \text{ or } \ n \geq 4605.17, \ \text{ or } \ n^* = 4606.$$
And, if Y ∼ POI(nπ), then
$$\text{pr}(Y \geq 1) = 1 - \text{pr}(Y = 0) = 1 - e^{-n\pi} = 1 - e^{-0.0005n} \geq 0.90;$$
thus, we obtain
$$e^{-0.0005n} \leq 0.10, \ \text{ or } \ -0.0005n \leq \ln(0.10), \ \text{ or } \ n \geq 4605.17,$$
which again gives n∗ = 4606. These numerical answers are the same because π is very close to zero in value.
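A quick Python sketch (our addition) confirms that n∗ = 4606 tickets meets the 0.90 requirement under both formulations:

```python
# Check that n* = 4606 achieves pr(at least one winner) >= 0.90.
from math import exp

pi, n_star = 0.0005, 4606
print(1 - (1 - pi) ** n_star)   # binomial: about 0.9001
print(1 - exp(-pi * n_star))    # Poisson approximation: about 0.9000
```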

(b) With X ∼ BIN(n∗, π), then P = (AX − n∗). Thus, requiring $E(P) = (An^*\pi - n^*) \geq 0$ gives
$$A\pi - 1 \geq 0, \ \text{ or } \ A \geq \pi^{-1}, \ \text{ or } \ A \geq (0.0005)^{-1} = \$2000.00.$$

(c) Let U be the discrete random variable denoting the number of the n∗ tickets purchased by this person that are jackpot-winning tickets. Then, U ∼ HG(N, K, n∗). So,
$$\text{pr}(k \leq U \leq 4606) = \sum_{u=k}^{4606}\frac{C^K_u C^{N-K}_{4606-u}}{C^N_{4606}}.$$

Solution 2.7

(a)
$$\text{pr}(Y = 1) = \sum_{j=1}^{k}\text{pr(both twins choose the number } j) = \sum_{j=1}^{k}\text{pr(one twin chooses } j)\,\text{pr(other twin chooses } j) = \sum_{j=1}^{k}\left(\frac{1}{k}\right)\left(\frac{1}{k}\right) = \frac{1}{k}.$$
So,
$$p_Y(y) = \left(\frac{1}{k}\right)^y\left(\frac{k-1}{k}\right)^{1-y}, \quad y = 0, 1.$$

(b) We wish to choose the smallest value of k such that $\frac{1}{k} \leq 0.01$, which requires k = 100.

(c) Let A be the event that "at least one set out of 100 sets of monozygotic twins chooses matching numbers" and let B be the event that "no set of monozygotic twins actually has ESP." Then,
$$\text{pr}(A|B) = 1 - \text{pr}(\bar{A}|B) = 1 - (0.99)^{100} = 1 - 0.366 = 0.634.$$
Thus, if this parapsychologist conducts this experiment on a reasonably large number of monozygotic twins, there is a very high probability of concluding incorrectly that one or more sets of monozygotic twins has ESP. Clearly, the chance of making this mistake increases as the number of sets of monozygotic twins studied increases.

(d) With X defined as the number of matches in n = 10 independent repetitions of the experiment, then X ∼ BIN(n = 10, π = 0.01). So,
$$\text{pr}(X \geq 2) = 1 - \text{pr}(X \leq 1) = 1 - \sum_{x=0}^{1}C^{10}_x(0.01)^x(0.99)^{10-x} = 1 - (0.99)^{10} - (10)(0.01)(0.99)^9 = 1 - 0.9044 - 0.0914 = 0.0042.$$
So, for this particular set of monozygotic twins, there is some statistical evidence for the presence of ESP. Perhaps further study about this pair of twins is warranted, hopefully using other more sophisticated ESP detection experiments.
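This binomial tail is simple to evaluate directly in Python (our addition); carrying full precision gives 0.0043, with the 0.0042 above reflecting the rounding of the two subtracted terms:

```python
# pr(X >= 2) for X ~ BIN(10, 0.01), computed without intermediate rounding.
from math import comb

n, pi = 10, 0.01
p_tail = 1 - sum(comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(2))
print(round(p_tail, 4))   # 0.0043
```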

(e) Let D be the event that "the two randomly chosen numbers are not the same." Then,
$$\text{pr}(S = 3|D) = \frac{\text{pr}[(S = 3) \cap D]}{\text{pr}(D)} = \frac{\text{pr}(1,2) + \text{pr}(2,1)}{\left(1 - \frac{1}{4}\right)} = \frac{\text{pr}(1)\text{pr}(2) + \text{pr}(2)\text{pr}(1)}{\left(\frac{3}{4}\right)} = \frac{\left(\frac{1}{4}\right)\left(\frac{1}{4}\right) + \left(\frac{1}{4}\right)\left(\frac{1}{4}\right)}{\left(\frac{3}{4}\right)} = \frac{1}{6};$$
$$\text{pr}(S = 4|D) = \frac{\text{pr}(1,3) + \text{pr}(3,1)}{\left(\frac{3}{4}\right)} = \frac{\left(\frac{1}{16} + \frac{1}{16}\right)}{\left(\frac{3}{4}\right)} = \frac{1}{6};$$
$$\text{pr}(S = 5|D) = \frac{\text{pr}(2,3) + \text{pr}(3,2) + \text{pr}(4,1) + \text{pr}(1,4)}{\left(\frac{3}{4}\right)} = \frac{\left(\frac{4}{16}\right)}{\left(\frac{3}{4}\right)} = \frac{1}{3};$$
$$\text{pr}(S = 6|D) = \frac{\text{pr}(2,4) + \text{pr}(4,2)}{\left(\frac{3}{4}\right)} = \frac{\left(\frac{2}{16}\right)}{\left(\frac{3}{4}\right)} = \frac{1}{6};$$
$$\text{pr}(S = 7|D) = \frac{\text{pr}(3,4) + \text{pr}(4,3)}{\left(\frac{3}{4}\right)} = \frac{\left(\frac{2}{16}\right)}{\left(\frac{3}{4}\right)} = \frac{1}{6}.$$
Hence, the probability distribution is
$$p_S(s|D) = \begin{cases} \frac{1}{6} & \text{if } s = 3, 4, 6, \text{ or } 7; \\ \frac{1}{3} & \text{if } s = 5; \\ 0 & \text{otherwise.} \end{cases}$$
Note that this is a type of "truncated" distribution. The expected value is
$$E(S|D) = 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{3}\right) + 6\left(\tfrac{1}{6}\right) + 7\left(\tfrac{1}{6}\right) = 5.$$


Solution 2.8

(a)
$$\text{pr}(Y \leq 3|Y \geq 2) = \frac{\text{pr}[(Y \geq 2) \cap (Y \leq 3)]}{\text{pr}(Y \geq 2)} = \frac{\text{pr}(Y = 2) + \text{pr}(Y = 3)}{1 - \text{pr}(Y = 1)} = \frac{\left[\frac{\lambda^2}{2!(e^{\lambda}-1)}\right] + \left[\frac{\lambda^3}{3!(e^{\lambda}-1)}\right]}{\left[1 - \frac{\lambda}{e^{\lambda}-1}\right]} = \frac{\lambda^2(\lambda + 3)}{6(e^{\lambda} - \lambda - 1)}.$$

(b) For r = 1, 2, . . . ,
$$E\left[\frac{Y!}{(Y-r)!}\right] = \sum_{y=1}^{\infty}\left[\frac{y!}{(y-r)!}\right]\frac{\lambda^y}{y!(e^{\lambda}-1)} = \sum_{y=r}^{\infty}\frac{\lambda^y}{(y-r)!(e^{\lambda}-1)} = \sum_{u=0}^{\infty}\frac{\lambda^{u+r}}{u!(e^{\lambda}-1)} = \frac{\lambda^r e^{\lambda}}{(e^{\lambda}-1)}.$$
So, for r = 1,
$$E(Y) = \frac{\lambda e^{\lambda}}{(e^{\lambda}-1)}.$$
And, for r = 2,
$$E[Y(Y-1)] = \frac{\lambda^2 e^{\lambda}}{(e^{\lambda}-1)},$$
so that
$$V(Y) = E[Y(Y-1)] + E(Y) - [E(Y)]^2 = \frac{\lambda e^{\lambda}(e^{\lambda} - \lambda - 1)}{(e^{\lambda}-1)^2}.$$

(c)
$$\theta = \text{pr}(V) = \text{pr}[V \cap (Y \geq 1)] = \text{pr}\left\{V \cap \left[\cup_{y=1}^{\infty}(Y = y)\right]\right\} = \text{pr}\left\{\cup_{y=1}^{\infty}[V \cap (Y = y)]\right\} = \sum_{y=1}^{\infty}\text{pr}[V \cap (Y = y)] = \sum_{y=1}^{\infty}\text{pr}(V|Y = y)\text{pr}(Y = y) = \sum_{y=1}^{\infty}(\pi^y)\frac{\lambda^y}{y!(e^{\lambda}-1)} = (e^{\lambda}-1)^{-1}\sum_{y=1}^{\infty}\frac{(\pi\lambda)^y}{y!} = (e^{\lambda}-1)^{-1}\left[\sum_{y=0}^{\infty}\frac{(\pi\lambda)^y}{y!} - 1\right] = \frac{(e^{\pi\lambda}-1)}{(e^{\lambda}-1)}.$$


Solution 2.9. First, note that U/2 = Z ∼ N(0, 1). Making use of this result, we then have
$$\text{pr}[|W - 30| < 0.50] = \text{pr}[-0.50 < (W - 30) < 0.50] = \text{pr}[29.5 < W < 30.5] = \text{pr}[29.5 < 31 + (0.50)U < 30.5] = \text{pr}[(29.5 - 31) < Z < (30.5 - 31)] = \text{pr}(-1.50 < Z < -0.50) = F_Z(-0.50) - F_Z(-1.50) = 0.3085 - 0.0668 = 0.2417.$$
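This normal-probability calculation can be reproduced with only the standard library (our addition), writing the standard normal CDF in terms of math.erf:

```python
# pr(-1.50 < Z < -0.50) for Z ~ N(0, 1).
from math import erf, sqrt

def phi(z):                               # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(-0.50) - phi(-1.50), 4))  # 0.2417
```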

Solution 2.10

(a) For Process 1, $\int_{1.0}^{2.0}3.144e^{-x}\,dx = 0.7313$. For Process 2, $\int_{1.0}^{2.0}2.574e^{-x}\,dx = 0.5987$.

Let A be the event that "both computer chips were produced by Process 1," let B be the event that "one of the computer chips is acceptable and the other computer chip is unacceptable," and let C be the event that "any computer chip is acceptable."

Clearly, $\text{pr}(A) = \left(\tfrac{1}{3}\right)^2 = 1/9 = 0.1111$. And, $\text{pr}(B|A) = 2(0.7313)(0.2687) = 0.3930$. Also, since
$$\text{pr}(C) = (0.7313)\left(\tfrac{1}{3}\right) + (0.5987)\left(\tfrac{2}{3}\right) = 0.6429,$$
it follows that
$$\text{pr}(B) = C^2_1(0.6429)(0.3571) = 0.4592.$$
Finally,
$$\text{pr}(A|B) = \frac{\text{pr}(A \cap B)}{\text{pr}(B)} = \frac{\text{pr}(B|A)\text{pr}(A)}{\text{pr}(B)} = \frac{(0.3930)(0.1111)}{0.4592} = 0.0951.$$
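A short Python sketch of this Bayes computation (our addition), using exact integrals in place of the rounded 0.7313 and 0.5987:

```python
# Posterior probability that both chips came from Process 1, given one
# usable and one unusable chip.
from math import exp

p1 = 3.144 * (exp(-1.0) - exp(-2.0))    # pr(usable | Process 1)
p2 = 2.574 * (exp(-1.0) - exp(-2.0))    # pr(usable | Process 2)
pr_A = (1/3) ** 2                        # both chips from Process 1
pr_B_given_A = 2 * p1 * (1 - p1)
p_usable = p1 / 3 + 2 * p2 / 3           # marginal pr(usable chip)
pr_B = 2 * p_usable * (1 - p_usable)
print(round(pr_B_given_A * pr_A / pr_B, 4))  # about 0.0951
```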

(b) For y = 3, 4, . . . , ∞, the event Y = y can occur in one of two mutually exclusive ways: (i) the first (y − 1) chips selected are all acceptable, and then the yth chip selected is unacceptable; or (ii) the first (y − 1) chips selected include one acceptable chip and (y − 2) unacceptable chips, and then the yth chip selected is acceptable. So, if C is the event that a computer chip is acceptable, and if $P_1$ is the event that a computer chip is produced by Process 1, and if θ denotes the probability of selecting an acceptable chip, then
$$\theta = \text{pr}(C|P_1)\text{pr}(P_1) + \text{pr}(C|\bar{P}_1)\text{pr}(\bar{P}_1) = (0.7313)\left(\tfrac{1}{3}\right) + (0.5987)\left(\tfrac{2}{3}\right) = 0.6429.$$
Thus, with θ = 0.6429, we have
$$p_Y(y) = \theta^{y-1}(1 - \theta) + (y-1)(1 - \theta)^{y-2}\theta^2, \quad y = 3, 4, \ldots, \infty.$$

Now, $0 \leq p_Y(y)$, y = 3, 4, . . . , ∞, and
$$\sum_{y=3}^{\infty}p_Y(y) = \sum_{y=3}^{\infty}\left[\theta^{y-1}(1-\theta) + (y-1)(1-\theta)^{y-2}\theta^2\right] = (1-\theta)\sum_{y=3}^{\infty}\theta^{y-1} + \theta\sum_{u=2}^{\infty}u(1-\theta)^{u-1}\theta = (1-\theta)\left(\frac{\theta^2}{1-\theta}\right) + \theta\left[\sum_{u=1}^{\infty}u(1-\theta)^{u-1}\theta - \theta\right] = \theta^2 + \theta\left(\frac{1}{\theta} - \theta\right) = 1,$$
so that $p_Y(y)$ is a valid discrete probability distribution.

Solution 2.11

(a)
$$\text{pr}(X > 1) = \text{pr}(A_1) = \theta;$$
$$\text{pr}(X > 2) = \text{pr}(A_1)\text{pr}(A_2|A_1) = \theta(\theta^2) = \theta^3;$$
$$\text{pr}(X > 3) = \text{pr}(A_1)\text{pr}(A_2|A_1)\text{pr}(A_3|A_1 \cap A_2) = \theta(\theta^2)(\theta^3) = \theta^6;$$
and, in general,
$$\text{pr}(X > x) = \prod_{i=1}^{x}\theta^i, \quad x = 1, 2, \ldots, \infty.$$
So,
$$p_X(x) = \text{pr}(X = x) = \text{pr}(X > x - 1) - \text{pr}(X > x) = \prod_{i=1}^{x-1}\theta^i - \prod_{i=1}^{x}\theta^i = \left(\prod_{i=1}^{x-1}\theta^i\right)(1 - \theta^x) = \left(\theta^{\sum_{i=1}^{x-1}i}\right)(1 - \theta^x) = \theta^{\frac{x(x-1)}{2}}(1 - \theta^x), \quad x = 1, 2, \ldots, \infty.$$
Since 0 < θ < 1, clearly $0 \leq p_X(x) \leq 1$ for x = 1, 2, . . . , ∞. And,
$$\sum_{x=1}^{\infty}p_X(x) = \sum_{x=1}^{\infty}\theta^{\frac{x(x-1)}{2}}(1 - \theta^x) = \sum_{x=1}^{\infty}\theta^{\frac{x(x-1)}{2}} - \sum_{x=1}^{\infty}\theta^{\frac{x(x+1)}{2}} = \sum_{y=0}^{\infty}\theta^{\frac{y(y+1)}{2}} - \sum_{x=1}^{\infty}\theta^{\frac{x(x+1)}{2}} = 1 + \sum_{y=1}^{\infty}\theta^{\frac{y(y+1)}{2}} - \sum_{x=1}^{\infty}\theta^{\frac{x(x+1)}{2}} = 1.$$

(b) We have
$$E(X) = \sum_{x=1}^{\infty}x\,\theta^{\frac{x(x-1)}{2}}(1 - \theta^x) = \sum_{x=1}^{\infty}x\,\theta^{\frac{x(x-1)}{2}} - \sum_{x=1}^{\infty}x\,\theta^{\frac{x(x+1)}{2}} = \sum_{y=0}^{\infty}(y+1)\theta^{\frac{y(y+1)}{2}} - \sum_{x=1}^{\infty}x\,\theta^{\frac{x(x+1)}{2}} = \sum_{y=0}^{\infty}\theta^{\frac{y(y+1)}{2}} + \sum_{y=0}^{\infty}y\,\theta^{\frac{y(y+1)}{2}} - \sum_{x=1}^{\infty}x\,\theta^{\frac{x(x+1)}{2}} = \sum_{y=0}^{\infty}\theta^{\frac{y(y+1)}{2}} = 1 + \theta + \theta^3 + \theta^6 + \theta^{10} + \cdots \approx (1 + \theta + \theta^3),$$
assuming terms of the form $\theta^j$ for j > 3 can be neglected.
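The quality of this truncation is easy to gauge numerically (our addition, with an illustrative θ = 0.4):

```python
# Compare the exact series E(X) = sum_{y>=0} theta^{y(y+1)/2} with the
# three-term truncation 1 + theta + theta^3 used above.
theta = 0.4
exact = sum(theta ** (y * (y + 1) // 2) for y in range(50))  # tail is negligible
approx = 1 + theta + theta ** 3
print(round(exact, 6), round(approx, 6))  # 1.468202 vs 1.464
```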

Solution 2.12. For y ≥ 0,
$$F_Y(y) = \text{pr}[Y \leq y] = \text{pr}\{[-\ln(1-X)]^{1/3} \leq y\} = \text{pr}[-\ln(1-X) \leq y^3] = \text{pr}[\ln(1-X) \geq -y^3] = \text{pr}[(1-X) \geq e^{-y^3}] = \text{pr}[X \leq (1 - e^{-y^3})] = F_X(1 - e^{-y^3}) = 1 - e^{-y^3},$$
since $F_X(x) = x$, 0 < x < 1.

So,
$$f_Y(y) = \frac{dF_Y(y)}{dy} = 3y^2e^{-y^3}, \quad 0 < y < \infty.$$
So, for r ≥ 0, and with $u = y^3$,
$$E(Y^r) = \int_0^{\infty}(y^r)3y^2e^{-y^3}\,dy = \int_0^{\infty}u^{r/3}e^{-u}\,du = \Gamma\left(\frac{r}{3}+1\right)\int_0^{\infty}\frac{u^{\left(\frac{r}{3}+1\right)-1}e^{-u}}{\Gamma\left(\frac{r}{3}+1\right)}\,du = \Gamma\left(\frac{r}{3}+1\right), \quad r \geq 0.$$
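A Monte Carlo sketch of this moment formula for r = 1 (our addition): draw X ∼ U(0, 1), set $Y = [-\ln(1-X)]^{1/3}$, and compare the sample mean against Γ(4/3):

```python
# Check E(Y) = Gamma(1/3 + 1) by simulation.
import random
from math import log, gamma

random.seed(1)
n = 200_000
total = sum((-log(1.0 - random.random())) ** (1.0 / 3.0) for _ in range(n))
print(total / n, gamma(1.0 / 3.0 + 1.0))  # both close to 0.893
```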

Solution 2.13

(a) $\text{pr}(-2 < X < 2) = \int_{-2}^{2}\frac{1}{288}(36 - x^2)\,dx = 0.4815$.

(b) $F_X(x) = \int_{-6}^{x}\frac{1}{288}(36 - t^2)\,dt = \frac{1}{288}\left(144 + 36x - \frac{x^3}{3}\right)$, −6 < x < 6. So,
$$\text{pr}(X > 3|X > 0) = \frac{\text{pr}[(X > 3) \cap (X > 0)]}{\text{pr}(X > 0)} = \frac{\text{pr}(X > 3)}{\text{pr}(X > 0)} = \frac{1 - F_X(3)}{1 - F_X(0)} = 0.3125.$$

(c) Now, $\text{pr}(X > 1) = 1 - F_X(1) = 0.3762$. So, using the negative binomial distribution,
$$\pi = \sum_{k=3}^{6}C^{6-1}_{k-1}(0.3762)^k(0.6238)^{6-k} = 0.2332.$$


(d) Using Tchebyshev's Inequality, we know that $L = E(X) - 3\sqrt{V(X)}$ and $U = E(X) + 3\sqrt{V(X)}$. Since $f_X(x)$ is symmetric about zero, we know that E(X) = 0. So, $V(X) = E(X^2) = \int_{-6}^{6}(x^2)\frac{1}{288}(36 - x^2)\,dx = 7.20$, so that L = −8.05 and U = 8.05. These findings clearly illustrate the very conservative nature of Tchebyshev's Theorem, since pr(−8.05 < X < 8.05) = 1.

Solution 2.14

(a) Although it is possible to find $P_X(s)$ directly, it is easier to make use of the connection between the moment generating function of X and the probability generating function of X. In particular,
$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty}e^{tx}[\pi f_1(x) + (1-\pi)f_2(x)]\,dx = \pi\int_{-\infty}^{\infty}e^{tx}f_1(x)\,dx + (1-\pi)\int_{-\infty}^{\infty}e^{tx}f_2(x)\,dx = \pi e^{(\mu_1 t + \sigma_1^2t^2/2)} + (1-\pi)e^{(\mu_2 t + \sigma_2^2t^2/2)}.$$
So, using the fact that $s = e^t$ and ln(s) = t, it follows directly that
$$P_X(s) = \pi s^{\mu_1}e^{\frac{\sigma_1^2[\ln(s)]^2}{2}} + (1-\pi)s^{\mu_2}e^{\frac{\sigma_2^2[\ln(s)]^2}{2}}.$$
So,
$$\frac{dP_X(s)}{ds} = \pi\left[\mu_1 s^{\mu_1-1}e^{(\sigma_1^2/2)[\ln(s)]^2} + s^{\mu_1}e^{(\sigma_1^2/2)[\ln(s)]^2}\sigma_1^2 s^{-1}\ln(s)\right] + (1-\pi)\left[\mu_2 s^{\mu_2-1}e^{(\sigma_2^2/2)[\ln(s)]^2} + s^{\mu_2}e^{(\sigma_2^2/2)[\ln(s)]^2}\sigma_2^2 s^{-1}\ln(s)\right].$$
Finally,
$$E(X) = \left.\frac{dP_X(s)}{ds}\right|_{s=1} = \pi\mu_1 + (1-\pi)\mu_2.$$

(b) Let A be the event that "X is from $f_1(x)$," so that $\bar{A}$ is the event that "X is from $f_2(x)$"; and, let B be the event that "X > 1.10." Then, as a direct application of Bayes' Theorem, we have
$$\text{pr}(A|B) = \frac{\text{pr}(B|A)\text{pr}(A)}{\text{pr}(B|A)\text{pr}(A) + \text{pr}(B|\bar{A})\text{pr}(\bar{A})} = \frac{\pi\,\text{pr}(B|A)}{\pi\,\text{pr}(B|A) + (1-\pi)\text{pr}(B|\bar{A})}.$$
Now, with Z ∼ N(0, 1), we have
$$\text{pr}(B|A) = \text{pr}\left(\frac{X - 1.00}{\sqrt{0.50}} > \frac{1.10 - 1.00}{\sqrt{0.50}}\right) = \text{pr}(Z > 0.1414) = 0.44$$
and
$$\text{pr}(B|\bar{A}) = \text{pr}\left(\frac{X - 1.20}{\sqrt{0.40}} > \frac{1.10 - 1.20}{\sqrt{0.40}}\right) = \text{pr}(Z > -0.1581) = 0.56.$$
Thus,
$$\text{pr}(A|B) = \frac{(0.60)(0.44)}{(0.60)(0.44) + (0.40)(0.56)} = 0.54.$$
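This posterior can be reproduced with unrounded normal tail probabilities (our addition; the text rounds them to 0.44 and 0.56):

```python
# Bayes' Theorem for the two-component normal mixture of part (b).
from math import erf, sqrt

def surv(z):                                # pr(Z > z) for Z ~ N(0, 1)
    return 0.5 * (1 - erf(z / sqrt(2)))

pi = 0.60
pB_A = surv((1.10 - 1.00) / sqrt(0.50))     # pr(X > 1.10 | f1)
pB_notA = surv((1.10 - 1.20) / sqrt(0.40))  # pr(X > 1.10 | f2)
print(round(pi * pB_A / (pi * pB_A + (1 - pi) * pB_notA), 2))  # 0.54
```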

(c) Since pr(X > 1) = 0.16, it follows that the appropriate truncated density function for X is
$$f_X(x|X > 1) = (0.16)^{-1}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \quad 1 < x < \infty.$$
So,
$$E(X|X > 1) = (0.16)^{-1}\int_1^{\infty}x\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx.$$
Letting $y = x^2/2$, so that dy = x dx, we have
$$E(X|X > 1) = \left[\sqrt{2\pi}(0.16)\right]^{-1}\int_{1/2}^{\infty}e^{-y}\,dy = 2.4934\left[-e^{-y}\right]_{1/2}^{\infty} = 2.4934\left(e^{-1/2}\right) = 1.5123.$$

Solution 2.15. For r an odd positive integer,
$$E(|Y^r|) = \int_{-\infty}^{\infty}|y^r|\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy = \int_{-\infty}^{0}(-y^r)\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy + \int_0^{\infty}y^r\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy = 2\int_0^{\infty}y^r\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy$$
$$= \int_0^{\infty}\frac{1}{\sqrt{2\pi}}\left(u^{1/2}\right)^{r-1}e^{-u/2}\,du = \int_0^{\infty}\frac{1}{\sqrt{2\pi}}u^{\left(\frac{r+1}{2}\right)-1}e^{-u/2}\,du = \frac{1}{\sqrt{2\pi}}\Gamma\left(\frac{r+1}{2}\right)2^{\left(\frac{r+1}{2}\right)}\int_0^{\infty}\frac{u^{\left(\frac{r+1}{2}\right)-1}e^{-u/2}}{\Gamma\left(\frac{r+1}{2}\right)2^{\left(\frac{r+1}{2}\right)}}\,du = \frac{1}{\sqrt{2\pi}}\Gamma\left(\frac{r+1}{2}\right)2^{\left(\frac{r+1}{2}\right)},$$
where the substitution $u = y^2$ has been used.


Solution 2.16
$$E\left[\frac{Y!}{(Y-r)!}\right] = \sum_{y=0}^{\infty}\frac{y!}{(y-r)!}C^{y+k-1}_{k-1}\pi^k(1-\pi)^y = \sum_{y=r}^{\infty}\frac{(y+k-1)!}{(k-1)!(y-r)!}\pi^k(1-\pi)^y = \sum_{u=0}^{\infty}\frac{(u+r+k-1)!}{(k-1)!u!}\pi^k(1-\pi)^{u+r} = \pi^{-r}(1-\pi)^r\frac{(k+r-1)!}{(k-1)!}\sum_{u=0}^{\infty}C^{u+(k+r)-1}_{(k+r)-1}\pi^{(k+r)}(1-\pi)^u = \pi^{-r}(1-\pi)^r\frac{(k+r-1)!}{(k-1)!} = \frac{(k+r-1)!}{(k-1)!}\left(\frac{1-\pi}{\pi}\right)^r, \quad r = 0, 1, \ldots, \infty.$$
So, r = 1 gives
$$E(Y) = k\left(\frac{1-\pi}{\pi}\right).$$
And, r = 2 gives
$$E[Y(Y-1)] = k(k+1)\left(\frac{1-\pi}{\pi}\right)^2,$$
so that
$$V(Y) = k(k+1)\left(\frac{1-\pi}{\pi}\right)^2 + k\left(\frac{1-\pi}{\pi}\right) - \left[k\left(\frac{1-\pi}{\pi}\right)\right]^2 = \frac{k(1-\pi)}{\pi^2}.$$
Since X = (Y + k),
$$E(X) = E(Y) + k = k\left(\frac{1-\pi}{\pi}\right) + k = \frac{k}{\pi}$$
and
$$V(X) = V(Y) = k(1-\pi)/\pi^2.$$
These are expected answers, since X ∼ NEGBIN(k, π).


Solution 2.17

(a) For −∞ < y ≤ β,
$$F_Y(y) = \int_{-\infty}^{y}(2\alpha)^{-1}e^{-(\beta-t)/\alpha}\,dt = \frac{e^{-\beta/\alpha}}{2}\int_{-\infty}^{y}\frac{1}{\alpha}e^{t/\alpha}\,dt = \frac{e^{-\beta/\alpha}}{2}\left(e^{y/\alpha}\right) = \frac{1}{2}e^{(y-\beta)/\alpha}.$$
Note that $F_Y(\beta) = \frac{1}{2}$; this is an expected result because the density function $f_Y(y)$ is symmetric around β.

For β < y < +∞,
$$F_Y(y) = F_Y(\beta) + \int_{\beta}^{y}\frac{1}{2\alpha}e^{-(t-\beta)/\alpha}\,dt = \frac{1}{2} + \frac{e^{\beta/\alpha}}{2}\int_{\beta}^{y}\frac{1}{\alpha}e^{-t/\alpha}\,dt = \frac{1}{2} + \frac{e^{\beta/\alpha}}{2}\left[e^{-\beta/\alpha} - e^{-y/\alpha}\right] = 1 - \frac{1}{2}e^{-(y-\beta)/\alpha}.$$
Thus,
$$F_Y(y) = \begin{cases} \frac{1}{2}e^{(y-\beta)/\alpha}, & -\infty < y \leq \beta; \\ 1 - \frac{1}{2}e^{-(y-\beta)/\alpha}, & \beta < y < +\infty. \end{cases}$$
Now, if α = 1 and β = 2,
$$\text{pr}(X > 4|X > 2) = \text{pr}(Y > \ln 4|Y > \ln 2) = \frac{\text{pr}[(Y > \ln 4) \cap (Y > \ln 2)]}{\text{pr}(Y > \ln 2)} = \frac{\text{pr}(Y > \ln 4)}{\text{pr}(Y > \ln 2)} = \frac{1 - F_Y(1.3863)}{1 - F_Y(0.6931)} = \frac{1 - \frac{1}{2}e^{(1.3863-2)}}{1 - \frac{1}{2}e^{(0.6931-2)}} = 0.8434.$$

(b) Now,
$$\phi_Y(t) = E\left\{e^{t|Y - E(Y)|}\right\} = E\left\{e^{t|Y-\beta|}\right\} = \int_{-\infty}^{\infty}e^{t|y-\beta|}(2\alpha)^{-1}e^{-|y-\beta|/\alpha}\,dy = \int_{-\infty}^{\infty}\frac{1}{2\alpha}e^{-|y-\beta|\left(\frac{1}{\alpha}-t\right)}\,dy = \int_{-\infty}^{\infty}\frac{1}{2\alpha}e^{-|y-\beta|/[\alpha(1-\alpha t)^{-1}]}\,dy = \frac{[\alpha/(1-\alpha t)]}{\alpha} = (1-\alpha t)^{-1}, \quad \alpha t < 1.$$
So,
$$\nu_1 = E\{|Y - E(Y)|\} = \left.\frac{d\phi_Y(t)}{dt}\right|_{t=0} = \left.\left[-(1-\alpha t)^{-2}(-\alpha)\right]\right|_{t=0} = \alpha.$$
And,
$$\nu_2 = E\{|Y - E(Y)|^2\} = V(Y) = \left.\frac{d^2\phi_Y(t)}{dt^2}\right|_{t=0} = \left.\left[\alpha(-2)(1-\alpha t)^{-3}(-\alpha)\right]\right|_{t=0} = 2\alpha^2.$$

Solution 2.18. First, with $y = x^2/2$ so that dy = x dx, we have
$$E(X) = \int_0^{\infty}x\left(\frac{2}{\pi\theta}\right)^{1/2}e^{-x^2/2\theta}\,dx = \int_0^{\infty}\left(\frac{2}{\pi\theta}\right)^{1/2}e^{-y/\theta}\,dy = \left(\frac{2}{\pi\theta}\right)^{1/2}(\theta) = \left(\frac{2\theta}{\pi}\right)^{1/2}.$$
Now, we have
$$E(Y) = E[g(X)] = 1 - \int_0^{\infty}\left(\alpha e^{-\beta x^2}\right)\left(\frac{2}{\pi\theta}\right)^{1/2}e^{-x^2/2\theta}\,dx = 1 - \alpha\left(\frac{2}{\pi\theta}\right)^{1/2}\int_0^{\infty}e^{-\left(\beta + \frac{1}{2\theta}\right)x^2}\,dx = 1 - \alpha\left(\frac{2}{\pi\theta}\right)^{1/2}\left(\frac{1}{2}\right)\int_{-\infty}^{\infty}e^{-x^2/\left[2\left(\frac{\theta}{2\theta\beta+1}\right)\right]}\,dx = 1 - \frac{\alpha}{\sqrt{\theta}}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-x^2/\left[2\left(\frac{\theta}{2\theta\beta+1}\right)\right]}\,dx = 1 - \frac{\alpha}{\sqrt{\theta}}\sqrt{\frac{\theta}{(2\theta\beta+1)}} = 1 - \frac{\alpha}{\sqrt{2\theta\beta+1}} = 1 - \frac{\alpha}{\sqrt{\pi\beta[E(X)]^2 + 1}}.$$
Note that the average risk increases as both β and E(X) increase, but the average risk decreases as α increases.

Solution 2.19

(a) Clearly, the distribution of $X_1$ is negative binomial, namely,
$$p_{X_1}(x_1) = C^{(x_1-1)}_{(2-1)}\pi_h^2(1-\pi_h)^{(x_1-2)}, \quad x_1 = 2, 3, \ldots, \infty.$$

(b) $p_{X_2}(x_2) = \text{pr}(X_2 = x_2) = \text{pr}\left[\cup_{j=1}^{x_2-1}(A_j \cap B)\right] + \text{pr}\left[\cup_{j=1}^{x_2-1}(C_j \cap D)\right]$, where $A_j$ is the event that "the first $(x_2 - 1)$ subjects selected consist of j heavy smokers and $(x_2 - 1 - j)$ nonsmokers," B is the event that "the $x_2$th subject selected is a light smoker," $C_j$ is the event that "the first $(x_2 - 1)$ subjects selected consist of j light smokers and $(x_2 - 1 - j)$ nonsmokers," and D is the event that "the $x_2$th subject selected is a heavy smoker."

So,
$$p_{X_2}(x_2) = \left[\sum_{j=1}^{(x_2-1)}C^{(x_2-1)}_j\pi_h^j(\pi_0)^{(x_2-1-j)}\right]\pi_l + \left[\sum_{j=1}^{(x_2-1)}C^{(x_2-1)}_j\pi_l^j(\pi_0)^{(x_2-1-j)}\right]\pi_h = \left[(\pi_h + \pi_0)^{(x_2-1)} - \pi_0^{(x_2-1)}\right]\pi_l + \left[(\pi_l + \pi_0)^{(x_2-1)} - \pi_0^{(x_2-1)}\right]\pi_h = \pi_l(1-\pi_l)^{(x_2-1)} + \pi_h(1-\pi_h)^{(x_2-1)} - (1-\pi_0)\pi_0^{(x_2-1)}, \quad x_2 = 2, 3, \ldots, \infty.$$

(c) Via a direct extension of the reasoning used in part (b), we obtain the following:
$$p_{X_3}(x_3) = \left[\sum_{j=1}^{(x_3-2)}C^{(x_3-1)}_j\pi_l^j\pi_h^{(x_3-1-j)}\right]\pi_0 + \left[\sum_{j=1}^{(x_3-2)}C^{(x_3-1)}_j\pi_0^j\pi_h^{(x_3-1-j)}\right]\pi_l + \left[\sum_{j=1}^{(x_3-2)}C^{(x_3-1)}_j\pi_0^j\pi_l^{(x_3-1-j)}\right]\pi_h$$
$$= \pi_0\left[(\pi_l+\pi_h)^{(x_3-1)} - \pi_l^{(x_3-1)} - \pi_h^{(x_3-1)}\right] + \pi_l\left[(\pi_0+\pi_h)^{(x_3-1)} - \pi_0^{(x_3-1)} - \pi_h^{(x_3-1)}\right] + \pi_h\left[(\pi_0+\pi_l)^{(x_3-1)} - \pi_0^{(x_3-1)} - \pi_l^{(x_3-1)}\right]$$
$$= \pi_0(1-\pi_0)^{(x_3-1)} + \pi_l(1-\pi_l)^{(x_3-1)} + \pi_h(1-\pi_h)^{(x_3-1)} - (1-\pi_0)\pi_0^{(x_3-1)} - (1-\pi_l)\pi_l^{(x_3-1)} - (1-\pi_h)\pi_h^{(x_3-1)}, \quad x_3 = 3, 4, \ldots, \infty.$$
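A numerical sanity check that this pmf sums to 1 (our addition; the smoking proportions used here are illustrative values only):

```python
# Verify sum of p_X3(x3) over x3 = 3, 4, ... equals 1.
p0, pl, ph = 0.5, 0.3, 0.2
total = sum(
    p0 * (1 - p0) ** (x - 1) + pl * (1 - pl) ** (x - 1) + ph * (1 - ph) ** (x - 1)
    - (1 - p0) * p0 ** (x - 1) - (1 - pl) * pl ** (x - 1) - (1 - ph) * ph ** (x - 1)
    for x in range(3, 400)           # truncate the infinite series; tail is negligible
)
print(round(total, 6))               # ~1.0
```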

Solution 2.20

(a) Since Y ∼ N(μ, $\sigma^2$) and $X = e^Y$, the moment generating function of Y gives
$$E(X^t) = E(e^{tY}) = e^{\left(\mu t + \frac{\sigma^2t^2}{2}\right)}, \quad -\infty < t < +\infty.$$
So, for t = 1,
$$E(X) = e^{(\mu + 0.50\sigma^2)}.$$
And, for t = 2,
$$V(X) = E(X^2) - [E(X)]^2 = e^{(2\mu + 2\sigma^2)} - \left[e^{(\mu + 0.50\sigma^2)}\right]^2 = e^{(2\mu + \sigma^2)}\left(e^{\sigma^2} - 1\right).$$


(b) Since E(X) = V(X) = 1, we have
$$\frac{V(X)}{[E(X)]^2} = \left(e^{\sigma^2} - 1\right) = 1,$$
which gives σ = 0.8326. And, the equation
$$[E(X)]^2 = e^{(2\mu + \sigma^2)} = e^{[2\mu + (0.8326)^2]} = 1 \ \text{ gives } \ \mu = -0.3466.$$
So,
$$\text{pr}(X > 1) = \text{pr}(Y > 0) = \text{pr}\left[\frac{Y - (-0.3466)}{0.8326} > \frac{0 - (-0.3466)}{0.8326}\right] = \text{pr}(Z > 0.4163) = 0.339, \ \text{ since } Z \sim N(0, 1).$$
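The same figure follows from first principles in a few lines of Python (our addition), using the exact parameter values $\sigma^2 = \ln 2$ and $\mu = -0.5\ln 2$:

```python
# pr(X > 1) for the lognormal with E(X) = V(X) = 1.
from math import log, sqrt, erf

sigma = sqrt(log(2))                 # from e^{sigma^2} - 1 = 1
mu = -0.5 * log(2)                   # from 2*mu + sigma^2 = 0
def surv(z):                         # pr(Z > z) for Z ~ N(0, 1)
    return 0.5 * (1 - erf(z / sqrt(2)))
print(round(surv((0 - mu) / sigma), 3))  # 0.339
```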

(c) Now,
$$\text{pr}(X \leq c) = \text{pr}[Y \leq \ln(c)] = \text{pr}\left[Z \leq \frac{\ln(c) - \mu}{\sigma}\right], \quad \text{where } Z = \frac{Y - \mu}{\sigma} \sim N(0, 1).$$
Thus, to satisfy pr(X ≤ c) ≥ (1 − α) requires $\{[\ln(c) - \mu]/\sigma\} \geq z_{1-\alpha}$.

And, since $E(X) = e^{(\mu + 0.50\sigma^2)}$, so that $\mu = \ln[E(X)] - 0.50\sigma^2$, the inequality $\{[\ln(c) - \mu]/\sigma\} \geq z_{1-\alpha}$ is equivalent to the inequality
$$\frac{\ln(c) - [\ln E(X) - 0.50\sigma^2]}{\sigma} \geq z_{1-\alpha},$$
which, in turn, can be written in the form
$$\ln\left[\frac{c}{E(X)}\right] \geq \sigma z_{1-\alpha} - 0.50\sigma^2 = 0.50z_{1-\alpha}^2 - 0.50(z_{1-\alpha} - \sigma)^2.$$
So, if $\ln[c/E(X)] \geq 0.50z_{1-\alpha}^2$, then the above inequality will be satisfied. Equivalently, we need to pick E(X) small enough so that
$$E(X) \leq c\,e^{-0.50z_{1-\alpha}^2}.$$

Solution 2.21

(a) Since $\theta_0 = 1 - \alpha[\pi/(1-\pi)]$ and $\theta_x = \alpha\pi^x$ for x ≥ 1, we require that $0 < \alpha < [(1-\pi)/\pi]$ so that $0 < \theta_x < 1$, x = 0, 1, 2, . . . , +∞. Now,
$$E(e^{tX}) = \sum_{x=0}^{\infty}e^{tx}\theta_x = \left[1 - \alpha\left(\frac{\pi}{1-\pi}\right)\right] + \sum_{x=1}^{\infty}e^{tx}(\alpha\pi^x) = \left[1 - \alpha\left(\frac{\pi}{1-\pi}\right)\right] + \alpha\sum_{x=1}^{\infty}(\pi e^t)^x = \left[1 - \alpha\left(\frac{\pi}{1-\pi}\right)\right] + \alpha\left[\frac{\pi e^t}{1 - \pi e^t}\right]$$
provided that $0 < \pi e^t < 1$, or that $-\infty < t < -\ln\pi$. So,
$$M_X(t) = \left[1 - \alpha\left(\frac{\pi}{1-\pi}\right)\right] + \alpha\left[\frac{\pi e^t}{1 - \pi e^t}\right], \quad 0 < \alpha < [(1-\pi)/\pi], \ -\infty < t < -\ln\pi.$$
So,
$$E(X) = \left.\frac{dM_X(t)}{dt}\right|_{t=0} = \left.\left\{\alpha\pi\left[\frac{e^t(1 - \pi e^t) - e^t(-\pi e^t)}{(1 - \pi e^t)^2}\right]\right\}\right|_{t=0} = \left.\alpha\pi\left\{\frac{e^t}{(1 - \pi e^t)^2}\right\}\right|_{t=0} = \frac{\alpha\pi}{(1-\pi)^2}.$$

(b)
$$E(X) = \sum_{x=0}^{\infty}x\theta_x = \sum_{x=1}^{\infty}x\alpha\pi^x = \alpha\pi\sum_{x=1}^{\infty}x\pi^{x-1} = \alpha\pi\sum_{x=1}^{\infty}\frac{d}{d\pi}(\pi^x) = \alpha\pi\frac{d}{d\pi}\left(\sum_{x=1}^{\infty}\pi^x\right) = \alpha\pi\frac{d}{d\pi}\left(\frac{\pi}{1-\pi}\right) = \frac{\alpha\pi}{(1-\pi)^2}.$$


Solution 2.22. For the gamma distribution,
$$E(X^r) = \int_0^{\infty}x^r\frac{x^{\beta-1}e^{-x/\alpha}}{\Gamma(\beta)\alpha^{\beta}}\,dx = \frac{\Gamma(\beta+r)}{\Gamma(\beta)}\alpha^r, \quad (\beta + r) > 0.$$
So,
$$\mu_3 = E\{[X - E(X)]^3\} = E(X^3) - 3E(X^2)E(X) + 2[E(X)]^3 = \beta(\beta+1)(\beta+2)\alpha^3 - 3[\beta(\beta+1)\alpha^2](\alpha\beta) + 2\alpha^3\beta^3 = 2\alpha^3\beta.$$
Thus,
$$\alpha_3 = \frac{2\alpha^3\beta}{(\alpha^2\beta)^{3/2}} = \frac{2}{\sqrt{\beta}}.$$
Now, to find the mode of the gamma distribution, we need to find that value of x, say θ, which maximizes $f_X(x)$, or equivalently, which maximizes the function
$$h(x) = \ln\left(x^{\beta-1}e^{-x/\alpha}\right) = (\beta-1)\ln(x) - \frac{x}{\alpha}.$$
So,
$$\frac{dh(x)}{dx} = \frac{(\beta-1)}{x} - \frac{1}{\alpha} = 0$$
gives θ = α(β − 1), which, for β > 1, maximizes $f_X(x)$; in particular, note that $[d^2h(x)]/dx^2 = (1-\beta)/x^2$, when evaluated at x = θ = α(β − 1), is negative for β > 1.

Finally, we have
$$\alpha_3^* = \frac{\alpha\beta - \alpha(\beta-1)}{\sqrt{\alpha^2\beta}} = \frac{1}{\sqrt{\beta}}.$$
Thus, we have $\alpha_3 = 2\alpha_3^*$, so that the two measures are essentially equivalent with regard to quantifying the degree of asymmetry for the gamma distribution.

NOTE: For the beta distribution,
$$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1, \ \alpha > 0, \ \beta > 0,$$
the interested reader can verify that the mode of the beta distribution is
$$\theta = \frac{(\alpha-1)}{(\alpha+\beta-2)}, \quad \alpha > 1, \ \beta > 1,$$
and that
$$\alpha_3 = \frac{2(\beta-\alpha)}{(\alpha+\beta+2)}\sqrt{\frac{(\alpha+\beta+1)}{\alpha\beta}} = \frac{2(\alpha+\beta-2)}{(\alpha+\beta+2)}\alpha_3^*.$$
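A Monte Carlo sketch of the gamma result $\alpha_3 = 2/\sqrt{\beta}$ (our addition; the shape and scale values are illustrative, and the sample skewness carries simulation noise):

```python
# Sample skewness of GAMMA(shape = 4, scale = 2) versus 2/sqrt(shape) = 1.0.
import random

random.seed(7)
shape, scale, n = 4.0, 2.0, 400_000
xs = [random.gammavariate(shape, scale) for _ in range(n)]
m = sum(xs) / n
m2 = sum((x - m) ** 2 for x in xs) / n
m3 = sum((x - m) ** 3 for x in xs) / n
print(m3 / m2 ** 1.5, 2 / shape ** 0.5)  # both close to 1.0
```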


Solution 2.23∗

(a) We have
$$E(U) = \int_0^L g(L)f_X(x)\,dx + \int_L^{\infty}xf_X(x)\,dx = g(L)\int_0^L f_X(x)\,dx + \pi\int_L^{\infty}x\left[\frac{f_X(x)}{\pi}\right]dx = (1-\pi)g(L) + \pi\int_L^{\infty}xf_X(x|X \geq L)\,dx = (1-\pi)g(L) + \pi E(X|X \geq L).$$
And, using a similar development, we have
$$E(U^2) = \int_0^L[g(L)]^2f_X(x)\,dx + \int_L^{\infty}x^2f_X(x)\,dx = (1-\pi)[g(L)]^2 + \pi\int_L^{\infty}x^2f_X(x|X \geq L)\,dx = (1-\pi)[g(L)]^2 + \pi E(X^2|X \geq L).$$
Thus,
$$V(U) = E(U^2) - [E(U)]^2 = (1-\pi)[g(L)]^2 + \pi E(X^2|X \geq L) - \left[(1-\pi)g(L) + \pi E(X|X \geq L)\right]^2$$
$$= (1-\pi)[g(L)]^2 + \pi E(X^2|X \geq L) - (1-\pi)^2[g(L)]^2 - 2\pi(1-\pi)g(L)E(X|X \geq L) - \pi^2[E(X|X \geq L)]^2$$
$$= \left[(1-\pi) - (1-\pi)^2\right][g(L)]^2 + \pi E(X^2|X \geq L) - 2\pi(1-\pi)g(L)E(X|X \geq L) - \pi^2[E(X|X \geq L)]^2 + \pi[E(X|X \geq L)]^2 - \pi[E(X|X \geq L)]^2$$
$$= \pi(1-\pi)[g(L)]^2 + \pi\left\{E(X^2|X \geq L) - [E(X|X \geq L)]^2\right\} - 2\pi(1-\pi)g(L)E(X|X \geq L) + \pi(1-\pi)[E(X|X \geq L)]^2$$
$$= \pi V(X|X \geq L) + \pi(1-\pi)\left[g(L) - E(X|X \geq L)\right]^2 = \pi\left\{V(X|X \geq L) + (1-\pi)\left[g(L) - E(X|X \geq L)\right]^2\right\}.$$


(b) Since

E(X) = ∫_0^∞ xfX(x) dx = ∫_0^L xfX(x) dx + ∫_L^∞ xfX(x) dx

     = (1 − π)∫_0^L x[fX(x)/(1 − π)] dx + π∫_L^∞ x[fX(x)/π] dx

     = (1 − π)E(X|X < L) + πE(X|X ≥ L),

it follows directly that choosing g(L) to be equal to E(X|X < L) will insure that E(U) = E(X).

When fX(x) = e^{−x}, x ≥ 0, and L = 0.05, then

(1 − π) = ∫_0^{0.05} e^{−x} dx = [−e^{−x}]_0^{0.05} = 0.0488.

Thus, using integration by parts with u = x and dv = e^{−x} dx, we find that the optimal choice for g(L) has the numerical value

E(X|X < L) = ∫_0^L xfX(x|X < L) dx = ∫_0^L x[fX(x)/(1 − π)] dx

           = ∫_0^{0.05} x(e^{−x}/0.0488) dx = (0.0488)^{−1}∫_0^{0.05} xe^{−x} dx

           = (20.4918){[−xe^{−x}]_0^{0.05} + ∫_0^{0.05} e^{−x} dx}

           = (20.4918)(−0.05e^{−0.05} + 0.0488) = 0.0254.

For information about a more rigorous statistical approach for dealing with this left-censoring issue, see Taylor et al. (2001).
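A direct numerical check (editorial, not in the original text) of the value 0.0254:

    import math
    from scipy.integrate import quad

    L = 0.05
    prob_below = 1 - math.exp(-L)                     # pr(X < L) = 0.0488
    num, _ = quad(lambda x: x * math.exp(-x), 0, L)   # integral of x*e^{-x}
    print(num / prob_below)                           # E(X | X < L) ~ 0.0254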

Solution 2.24∗. Now,

E(Y) = ∫_{−∞}^{∞} (1 − αe^{−βx²}) [1/(√(2π)σ)] e^{−(x−μ)²/2σ²} dx

     = 1 − [α/(√(2π)σ)] ∫_{−∞}^{∞} e^{−[βx² + (x−μ)²/(2σ²)]} dx.

And,

βx² + (x − μ)²/(2σ²) = [β + 1/(2σ²)]x² − (μ/σ²)x + μ²/(2σ²)

= [x√(β + 1/(2σ²)) − μ/(2σ²√(β + 1/(2σ²)))]² − μ²/[4σ⁴(β + 1/(2σ²))] + μ²/(2σ²)

= [(2βσ² + 1)/(2σ²)][x − μ/(2βσ² + 1)]² + βμ²/(2βσ² + 1).

Finally,

E(Y) = 1 − [α/(√(2π)σ)] e^{−βμ²/(2βσ²+1)} ∫_{−∞}^{∞} exp{−[x − μ/(2βσ² + 1)]²/[2(σ²/(2βσ² + 1))]} dx

     = 1 − (α/σ) e^{−βμ²/(2βσ²+1)} √[σ²/(2βσ² + 1)]

     = 1 − [α/√(2βσ² + 1)] e^{−βμ²/(2βσ²+1)}.
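The closed form is readily verified by simulation; the sketch below (an editorial addition, with illustrative values α = 0.7, β = 0.3, μ = 1.2, σ = 0.9) compares a Monte Carlo estimate of E(Y) with the formula:

    import numpy as np

    rng = np.random.default_rng(1)
    a, b, mu, sigma = 0.7, 0.3, 1.2, 0.9              # illustrative values
    x = rng.normal(mu, sigma, 10**6)
    print(np.mean(1 - a * np.exp(-b * x**2)))         # simulated E(Y)
    d = 2 * b * sigma**2 + 1
    print(1 - (a / np.sqrt(d)) * np.exp(-b * mu**2 / d))  # closed form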

Solution 2.25∗

(a) Now,

ψ_Y(t) = ln[E(e^{tY})] = ln[E(e^{t(X−c)})] = ln[e^{−tc}E(e^{tX})]

       = −tc + ln[E(e^{tX})] = −tc + Σ_{r=1}^{∞} κ_r t^r/r!

       = (κ1 − c)t + Σ_{r=2}^{∞} κ_r t^r/r!.

Hence, the cumulants of Y are identical to those for X, except for the first cumulant. In particular, if Y = (X − c), then the first cumulant of Y is (κ1 − c), where κ1 is the first cumulant of X.

(b)

(i) If X ∼ N(μ, σ²), then the moment generating function of X is M_X(t) = e^{μt+σ²t²/2}. So,

ψ_X(t) = μt + σ²t²/2.

Hence, κ1 = μ, κ2 = σ², and κ_r = 0 for r = 3, 4, …, ∞.

(ii) If X ∼ POI(λ), then M_X(t) = e^{λ(e^t−1)}. So,

ψ_X(t) = λ(e^t − 1) = λ Σ_{r=1}^{∞} t^r/r! = Σ_{r=1}^{∞} (λ)t^r/r!.

Thus, κ_r = λ for r = 1, 2, …, ∞.

(iii) If X ∼ GAMMA(α, β), then M_X(t) = (1 − αt)^{−β}. So, ψ_X(t) = −β ln(1 − αt). Now,

ln(1 + y) = Σ_{r=1}^{∞} (−1)^{r+1} y^r/r, −1 < y < +1.

If y = −αt, and t is chosen so that |αt| < 1, then

ln(1 − αt) = Σ_{r=1}^{∞} (−1)^{r+1} (−αt)^r/r

           = Σ_{r=1}^{∞} (−1)^{2r+1} α^r (r − 1)! t^r/r! = −Σ_{r=1}^{∞} [(r − 1)!α^r] t^r/r!.

So,

ψ_X(t) = −β ln(1 − αt) = Σ_{r=1}^{∞} [(r − 1)!α^r β] t^r/r!, |αt| < 1;

thus, κ_r = (r − 1)!α^r β for r = 1, 2, …, ∞.

(c) First, for r = 1, 2, …, ∞, since ψ_X(t) = Σ_{r=1}^{∞} κ_r t^r/r!, it follows directly that

κ_r = d^r ψ_X(t)/dt^r |_{t=0}.

So, since ψ_X(t) = ln[M_X(t)] and since d^r M_X(t)/dt^r |_{t=0} = E(X^r) for r = 1, 2, …, ∞, we have

κ1 = dψ_X(t)/dt |_{t=0} = {[M_X(t)]^{−1} dM_X(t)/dt}|_{t=0} = (1)^{−1}E(X) = E(X).

Next, since

d²ψ_X(t)/dt² = −[M_X(t)]^{−2}[dM_X(t)/dt]² + [M_X(t)]^{−1} d²M_X(t)/dt²,

it follows that

κ2 = d²ψ_X(t)/dt² |_{t=0} = −(1)^{−2}[E(X)]² + (1)^{−1}E(X²) = E(X²) − [E(X)]² = V(X).

Finally, since

d³ψ_X(t)/dt³ = 2[M_X(t)]^{−3}[dM_X(t)/dt]³ − 2[M_X(t)]^{−2}[dM_X(t)/dt][d²M_X(t)/dt²]
               − [M_X(t)]^{−2}[dM_X(t)/dt][d²M_X(t)/dt²] + [M_X(t)]^{−1}[d³M_X(t)/dt³],

we have

κ3 = d³ψ_X(t)/dt³ |_{t=0}

   = 2(1)^{−3}[E(X)]³ − 2(1)^{−2}[E(X)][E(X²)] − (1)^{−2}[E(X)][E(X²)] + (1)^{−1}E(X³)

   = E(X³) − 3E(X)E(X²) + 2[E(X)]³ = E{[X − E(X)]³}.
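These cumulant formulas can be spot-checked numerically; the sketch below (editorial, with illustrative values α = 2 and β = 3) compares κ_r = (r − 1)!α^rβ from part (b)(iii) against the gamma mean, variance, and third central moment from part (c):

    from math import factorial
    from scipy.stats import gamma

    a, b = 2.0, 3.0                                   # scale alpha, shape beta
    dist = gamma(b, scale=a)
    m1, m2, m3 = (dist.moment(r) for r in (1, 2, 3))
    kappa = [m1, m2 - m1**2, m3 - 3 * m1 * m2 + 2 * m1**3]
    for r, value in enumerate(kappa, start=1):
        print(value, factorial(r - 1) * a**r * b)     # pairs should match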

Solution 2.26∗

(a)

h(t) = lim_{Δt→0} pr(t ≤ T ≤ t + Δt | T ≥ t)/Δt

     = lim_{Δt→0} [pr(t ≤ T ≤ t + Δt)/Δt]/pr(T ≥ t)

     = [dF(t)/dt]/[1 − F(t)] = f(t)/S(t).

(b) From part (a), H(t) = ∫_0^t h(u) du = ∫_0^t [f(u)/S(u)] du. Since

dS(u) = d[1 − F(u)] = −f(u) du,

we have

H(t) = −∫_0^t [1/S(u)] dS(u) = −ln[S(t)] + ln[S(0)] = −ln[S(t)] + ln(1)

     = −ln[S(t)], or S(t) = e^{−H(t)}.


(c) Now,

E(T) = ∫_0^∞ tf(t) dt = ∫_0^∞ [∫_0^t du] f(t) dt = ∫_0^∞ [∫_0^∞ I(t > u) du] f(t) dt,

where I(A) is an indicator function taking the value 1 if event A holds and taking the value 0 otherwise. Hence,

E(T) = ∫_0^∞ [∫_0^∞ I(t > u) du] f(t) dt

     = ∫_0^∞ [∫_0^∞ I(t > u)f(t) dt] du

     = ∫_0^∞ [∫_u^∞ f(t) dt] du

     = ∫_0^∞ S(u) du.

(d) Note that

X = T if T < c; X = c if T ≥ c.

So, X = T when T < c (so that 0 < x < c), and X = c when T ≥ c, an event which occurs with probability [1 − FT(c)]. Thus,

fX(x) = fT(x)I(x < c) + [1 − FT(c)]I(x = c), 0 < x ≤ c,

where, as in part (c), I(·) denotes the indicator function. In other words, fX(x) is a mixture of a continuous density [namely, fT(x)] for x < c and a point mass at c occurring with probability [1 − FT(c)].

So,

E[H(X)] = E[H(X)I(X < c) + H(c)I(X = c)]

        = E[H(X)I(X < c)] + H(c)E[I(X = c)]

        = ∫_0^c H(x)fT(x) dx + H(c)pr(X = c)

        = ∫_0^c H(x)fT(x) dx + H(c)[1 − FT(c)].

Using integration by parts with u = H(x) and dv = fT(x) dx, we have

E[H(X)] = H(x)FT(x)|_0^c − ∫_0^c h(x)FT(x) dx + H(c) − H(c)FT(c)

        = H(c)FT(c) − 0 − ∫_0^c h(x)FT(x) dx + H(c) − H(c)FT(c)

        = −∫_0^c h(x)FT(x) dx + H(c) = −∫_0^c h(x)[1 − S(x)] dx + H(c)

        = −∫_0^c h(x)[1 − fT(x)/h(x)] dx + H(c)

        = −∫_0^c h(x) dx + ∫_0^c fT(x) dx + H(c)

        = −[H(c) − H(0)] + [FT(c) − 0] + H(c)

        = H(0) + FT(c) = −ln[S(0)] + FT(c) = −ln(1) + FT(c)

        = FT(c).
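For an exponential T with rate λ the result in part (d) is easy to simulate, since then H(x) = λx and X = min(T, c); the following editorial sketch (illustrative values λ = 0.5, c = 3) checks that E[H(X)] = FT(c):

    import numpy as np

    rng = np.random.default_rng(2)
    lam, c = 0.5, 3.0                                 # illustrative values
    t = rng.exponential(1 / lam, 10**6)
    x = np.minimum(t, c)                              # censor T at c
    print(np.mean(lam * x))                           # simulated E[H(X)]
    print(1 - np.exp(-lam * c))                       # F_T(c)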

Solution 2.27∗

(a) If N units are produced, then it follows that P = NG if X ≥ N, and P = [XG − (N − X)L] = [(G + L)X − NL] if X < N. Hence,

E(P) = ∫_0^N [(G + L)x − NL]fX(x) dx + ∫_N^∞ (NG)fX(x) dx

     = (G + L)∫_0^N xfX(x) dx − NLFX(N) + NG[1 − FX(N)]

     = (G + L)∫_0^N xfX(x) dx + NG − N(G + L)FX(N).

Now, via integration by parts,

∫_0^N xfX(x) dx = [xFX(x)]_0^N − ∫_0^N FX(x) dx = NFX(N) − ∫_0^N FX(x) dx,

so that we finally obtain

E(P) = NG − (G + L)∫_0^N FX(x) dx.

So,

dE(P)/dN = G − (G + L)FX(N) = 0,

which gives

FX(N) = G/(G + L);

since

d²E(P)/dN² = −(G + L)fX(N) < 0,

this choice for N maximizes E(P).

(b) Since fX(x) = 2kxe^{−kx²}, with k = 10^{−10}, FX(x) = 1 − e^{−kx²}. So, solving the equation

FX(N) = 1 − e^{−kN²} = G/(G + L)

gives

N = {ln[L/(G + L)]/(−k)}^{1/2}.

So, using the values G = 4, L = 1, and k = 10^{−10}, we obtain N = 126,860 units.
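A two-line numerical confirmation (editorial) of the optimal N:

    import math

    G, L, k = 4.0, 1.0, 1e-10
    N = math.sqrt(math.log((G + L) / L) / k)          # optimal production level
    print(round(N))                                   # ~126,860 units
    print(1 - math.exp(-k * N**2))                    # F_X(N) = G/(G+L) = 0.8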

Solution 2.28∗

(a) For k = 2, pr(P2 = αP1) = π and pr(P2 = βP1) = (1 − π), so that E(P2) = P1[απ + β(1 − π)].

For k = 3, pr(P3 = α²P1) = π², pr(P3 = αβP1) = 2π(1 − π), and pr(P3 = β²P1) = (1 − π)², so that

E(P3) = P1[α²π² + 2αβπ(1 − π) + β²(1 − π)²] = P1[απ + β(1 − π)]².

In general,

pr[Pk = α^j β^{(k−1)−j} P1] = C^{k−1}_j π^j (1 − π)^{(k−1)−j}, j = 0, 1, …, (k − 1),

so that

E(Pk) = Σ_{j=0}^{k−1} [α^j β^{(k−1)−j} P1] C^{k−1}_j π^j (1 − π)^{(k−1)−j}

      = P1 Σ_{j=0}^{k−1} C^{k−1}_j (απ)^j [β(1 − π)]^{(k−1)−j}

      = P1[απ + β(1 − π)]^{k−1}, k = 2, 3, …, ∞.

(b) For k = 2, 3, …, ∞, we consider the inequality

E(Pk) = P1[απ + β(1 − π)]^{k−1} ≥ P*,

or equivalently

α ≥ (1/π)[(P*/P1)^{1/(k−1)} − β(1 − π)],

which gives

α* = (1/π)[(P*/P1)^{1/(k−1)} − β(1 − π)].

Now,

lim_{k→∞} α* = (1/π)[1 − β(1 − π)] = β − (β − 1)/π.

Since 1 < β < +∞, this limiting value of α* varies directly with π (i.e., the larger is π, the larger is this limiting value of α*). In particular, when π = 1, so that every policy holder has a perfect driving record every year, then this insurance company should never reduce the yearly premium from its first-year value of P1.

If β = 1.05 and π = 0.90, then this limiting value equals 0.9944. So, for these particular values of β and π, this insurance company should never allow the yearly premium to be below 0.9944P1 in value.
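A brief editorial check of the limit, using β = 1.05, π = 0.90, and the illustrative ratio P*/P1 = 0.9 (any value in (0, 1] behaves similarly):

    beta, pi, ratio = 1.05, 0.90, 0.9      # ratio = P*/P1, illustrative
    for k in (2, 5, 50, 500):
        print(k, (ratio**(1 / (k - 1)) - beta * (1 - pi)) / pi)
    print(beta - (beta - 1) / pi)          # limiting value 0.9944...

The finite-k thresholds α* increase toward the limiting value 0.9944 as k grows.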

Solution 2.29∗

(a) For R = 2, we have

Σ_{x=0}^{2} (1/x!) Σ_{l=0}^{2−x} (−1)^l/l! = (1/0!)(1 − 1 + 1/2!) + (1/1!)(1 − 1) + (1/2!)(1) = 1.

Then, assuming that the result holds for the value R, we obtain

Σ_{x=0}^{R+1} (1/x!) Σ_{l=0}^{(R+1)−x} (−1)^l/l!

= Σ_{x=0}^{R} (1/x!) Σ_{l=0}^{(R−x)+1} (−1)^l/l! + 1/(R + 1)!

= Σ_{x=0}^{R} (1/x!) Σ_{l=0}^{R−x} (−1)^l/l! + Σ_{x=0}^{R} (1/x!) (−1)^{(R+1)−x}/[(R + 1) − x]! + 1/(R + 1)!

= Σ_{x=0}^{R} pX(x) + [1/(R + 1)!] Σ_{x=0}^{R} C^{R+1}_x (−1)^{(R+1)−x} + 1/(R + 1)!

= Σ_{x=0}^{R} pX(x) + [1/(R + 1)!] Σ_{x=0}^{R+1} C^{R+1}_x (1)^x(−1)^{(R+1)−x} − 1/(R + 1)! + 1/(R + 1)!

= Σ_{x=0}^{R} pX(x) + [1 + (−1)]^{R+1}/(R + 1)! = Σ_{x=0}^{R} pX(x) = 1,

which completes the proof by induction.

(b) We have

E(X) = Σ_{x=0}^{R} x(1/x!) Σ_{l=0}^{R−x} (−1)^l/l!

     = Σ_{x=1}^{R} [1/(x − 1)!] Σ_{l=0}^{R−x} (−1)^l/l!

     = Σ_{y=0}^{R−1} (1/y!) Σ_{l=0}^{(R−1)−y} (−1)^l/l! = 1.

And,

E[X(X − 1)] = Σ_{x=0}^{R} x(x − 1)(1/x!) Σ_{l=0}^{R−x} (−1)^l/l!

            = Σ_{x=2}^{R} [1/(x − 2)!] Σ_{l=0}^{R−x} (−1)^l/l!

            = Σ_{y=0}^{R−2} (1/y!) Σ_{l=0}^{(R−2)−y} (−1)^l/l! = 1,

so that V(X) = E[X(X − 1)] + E(X) − [E(X)]² = 1 + 1 − (1)² = 1. It seems counterintuitive that neither E(X) nor V(X) depends on the value of R.

Also,

lim_{R→∞} pX(x) = lim_{R→∞} [(1/x!) Σ_{l=0}^{R−x} (−1)^l/l!] = (1/x!) Σ_{l=0}^{∞} (−1)^l/l!

                = e^{−1}/x! = (1)^x e^{−1}/x!, x = 0, 1, …, ∞.

So, as R → ∞, the distribution of X becomes Poisson with E(X) = V(X) = 1.
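A direct enumeration (editorial, for the illustrative value R = 6) confirms that pX(x) sums to one with E(X) = V(X) = 1:

    from math import factorial

    R = 6                                             # illustrative value
    p = [sum((-1)**l / factorial(l) for l in range(R - x + 1)) / factorial(x)
         for x in range(R + 1)]
    mean = sum(x * px for x, px in enumerate(p))
    var = sum(x * x * px for x, px in enumerate(p)) - mean**2
    print(sum(p), mean, var)                          # 1.0, 1.0, 1.0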

Solution 2.30∗

(a) First,

1 − F_{Wn}(wn) = pr(Wn > wn)

             = pr[X_{wn} ≤ (n − 1) in the time interval (0, wn)]

             = Σ_{x=0}^{n−1} (Nwnλ)^x e^{−(Nwnλ)}/x!,

so that

F_{Wn}(wn) = 1 − Σ_{x=0}^{n−1} (Nwnλ)^x e^{−(Nwnλ)}/x!.

So,

f_{Wn}(wn) = dF_{Wn}(wn)/dwn

= −e^{−Nwnλ} Σ_{x=0}^{n−1} (1/x!)[xNλ(Nwnλ)^{x−1} − Nλ(Nwnλ)^x]

= −Nλe^{−Nwnλ}[Σ_{x=1}^{n−1} (Nwnλ)^{x−1}/(x − 1)! − Σ_{x=0}^{n−1} (Nwnλ)^x/x!]

= −Nλe^{−Nwnλ}[−(Nwnλ)^{n−1}/(n − 1)!]

= wn^{n−1} e^{−Nλwn}/[Γ(n)(Nλ)^{−n}], wn > 0.

So, Wn ∼ GAMMA[α = (Nλ)^{−1}, β = n].

(b) Note that E(XT) = V(XT) = NTλ. So,

E(e^{tZ}) = E[e^{t(XT−NTλ)/√(NTλ)}] = e^{−t√(NTλ)} E[e^{(t/√(NTλ))XT}]

          = e^{−t√(NTλ)} e^{NTλ(e^{t/√(NTλ)} − 1)}.

Now,

−t√(NTλ) + NTλ[Σ_{j=0}^{∞} (t/√(NTλ))^j/j! − 1]

= −t√(NTλ) + t√(NTλ) + t²/2 + Σ_{j=3}^{∞} [(t/√(NTλ))^j/j!](NTλ),

which converges to t²/2 as N → ∞. Thus,

lim_{N→∞} E(e^{tZ}) = e^{t²/2},

so that, for large N,

Z = (XT − NTλ)/√(NTλ) ∼ N(0, 1).

Then, if N = 10^5, λ = 10^{−4}, and T = 10, so that NTλ = 100, then

pr(XT ≤ 90 | NTλ = 100) = pr[(XT − 100)/√100 ≤ (90 − 100)/√100]

                        = pr(Z ≤ −1.00) = 0.16,

since Z = (XT − 100)/√100 ∼ N(0, 1) for large N.
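For comparison (an editorial addition), the exact Poisson probability alongside the normal approximation used above:

    from scipy.stats import norm, poisson

    print(poisson.cdf(90, 100))                       # exact: ~0.171
    print(norm.cdf((90 - 100) / 10))                  # approximation: ~0.159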

Solution 2.31∗

(a) With y = (x − c), we have

∫_c^∞ x^{β−1}e^{−x/α}/[Γ(β)α^β] dx = ∫_0^∞ (y + c)^{β−1}e^{−(y+c)/α}/[Γ(β)α^β] dy

= [e^{−c/α}/(Γ(β)α^β)] ∫_0^∞ (y + c)^{β−1}e^{−y/α} dy

= [e^{−c/α}/(Γ(β)α^β)] ∫_0^∞ [Σ_{j=0}^{β−1} C^{β−1}_j c^j y^{β−1−j}] e^{−y/α} dy

= [e^{−c/α}/(Γ(β)α^β)] Σ_{j=0}^{β−1} C^{β−1}_j c^j ∫_0^∞ y^{(β−j)−1}e^{−y/α} dy

= [e^{−c/α}/((β − 1)!α^β)] Σ_{j=0}^{β−1} [(β − 1)!/((β − j − 1)!j!)] c^j [Γ(β − j)α^{β−j}]

= Σ_{j=0}^{β−1} e^{−c/α}(c/α)^j/j!,

which is pr[X ≤ (β − 1)] when X ∼ POI(c/α).

(b) With x = c(1 − y), we have

∫_0^c [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1} dx

= ∫_0^1 [Γ(α + β)/(Γ(α)Γ(β))] [c(1 − y)]^{α−1}[1 − c(1 − y)]^{β−1}(c) dy

= [Γ(α + β)/(Γ(α)Γ(β))] c^α ∫_0^1 (1 − y)^{α−1}[Σ_{j=0}^{β−1} C^{β−1}_j (cy)^j(1 − c)^{β−1−j}] dy

= [Γ(α + β)/(Γ(α)Γ(β))] c^α Σ_{j=0}^{β−1} C^{β−1}_j c^j(1 − c)^{β−1−j} ∫_0^1 y^j(1 − y)^{α−1} dy.

Thus, since

∫_0^1 y^j(1 − y)^{α−1} dy = Γ(j + 1)Γ(α)/Γ(α + j + 1),

we have

[(α + β − 1)!/((α − 1)!(β − 1)!)] Σ_{j=0}^{β−1} [(β − 1)!/(j!(β − 1 − j)!)] c^{α+j}(1 − c)^{β−1−j} [Γ(j + 1)Γ(α)/Γ(α + j + 1)]

= Σ_{j=0}^{β−1} [(α + β − 1)!/((α + j)!(β − 1 − j)!)] c^{α+j}(1 − c)^{β−1−j}

= Σ_{i=α}^{α+β−1} C^{α+β−1}_i c^i(1 − c)^{α+β−1−i},

which is pr(X ≥ α) when X ∼ BIN(α + β − 1, c).
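Both identities are easy to verify numerically for integer shape parameters; the sketch below (editorial, with illustrative values) uses scipy's gamma, Poisson, beta, and binomial distributions:

    from scipy.stats import beta, binom, gamma, poisson

    a, b, c = 2.0, 5, 0.4          # scale alpha, integer shape beta, cutoff c
    print(gamma.sf(c, b, scale=a), poisson.cdf(b - 1, c / a))    # part (a)
    al, be = 3, 4
    print(beta.cdf(c, al, be), binom.sf(al - 1, al + be - 1, c)) # part (b)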

Solution 2.32∗

(a) Let A be the event that "any egg produces a live and healthy baby sea turtle," let B be the event that "a live and healthy baby sea turtle grows to adulthood," and let C be the event that "any egg produces an adult sea turtle." Then,

pr(C) = pr(A ∩ B) = pr(A)pr(B|A) = (0.30)(1 − 0.98) = (0.30)(0.02) = 0.006.

(b) Let T0 be the event that "any randomly chosen sea turtle nest produces no adult sea turtles" and let En be the event that "any randomly chosen sea turtle nest contains exactly n eggs." Then, with T̄0 denoting the complement of T0,

α = pr(T̄0) = 1 − pr(T0) = 1 − pr[∪_{n=1}^{∞}(T0 ∩ En)] = 1 − Σ_{n=1}^{∞} pr(T0|En)pr(En)

  = 1 − Σ_{n=1}^{∞} [(0.994)^n](1 − π)π^{n−1}

  = 1 − 0.994(1 − π) Σ_{n=1}^{∞} (0.994π)^{n−1}

  = 1 − 0.994(1 − π)[1/(1 − 0.994π)]

  = 1 − 0.994(1 − π)/(1 − 0.994π) = 0.006/(1 − 0.994π).

When π = 0.20, then α = 0.0075.

(c) Let Tk be the event that "a randomly chosen sea turtle nest produces exactly k adult sea turtles." Then, based on the stated assumptions, it follows that

pr(Tk|En) = C^n_k (0.006)^k(0.994)^{n−k}, k = 0, 1, …, n.

Then,

pr(En|Tk) = pr(En ∩ Tk)/pr(Tk) = pr(Tk|En)pr(En)/pr(Tk)

          = [C^n_k (0.006)^k(0.994)^{n−k}][(1 − π)π^{n−1}]/pr(Tk).

Now, for n ≥ k ≥ 1, we have

pr(Tk) = Σ_{n=k}^{∞} pr(Tk ∩ En) = Σ_{n=k}^{∞} pr(Tk|En)pr(En)

       = Σ_{n=k}^{∞} C^n_k (0.006)^k(0.994)^{n−k}(1 − π)π^{n−1}

       = (0.006/0.994)^k [(1 − π)/π] Σ_{n=k}^{∞} C^n_k (0.994π)^n

       = (0.006/0.994)^k [(1 − π)/π] Σ_{m=0}^{∞} C^{m+k}_k (0.994π)^{m+k}

       = (0.006)^k(1 − π)π^{k−1} Σ_{m=0}^{∞} C^{m+k}_k (0.994π)^m

       = (0.006)^k(1 − π)π^{k−1}(1 − 0.994π)^{−(k+1)}.

So,

βnk = pr(En|Tk) = [C^n_k (0.006)^k(0.994)^{n−k}][(1 − π)π^{n−1}]/[(0.006)^k(1 − π)π^{k−1}(1 − 0.994π)^{−(k+1)}]

    = C^n_k (1 − 0.994π)^{k+1}(0.994π)^{n−k}, 1 ≤ k ≤ n < ∞.

When k = 0,

βn0 = (0.994)^n[(1 − π)π^{n−1}]/[0.994(1 − π)/(1 − 0.994π)]

    = (0.994π)^{n−1}(1 − 0.994π), n = 1, 2, …, ∞.

For any fixed k ≥ 0, note that, as required, Σ_{n=k}^{∞} pr(En|Tk) = 1. Finally, when π = 0.20, k = 2, and n = 6, β62 = pr(E6|T2) = 0.0123.
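An editorial numerical check of α and βnk at π = 0.20 (the second print evaluates to roughly 0.012 with these formulas):

    from math import comb

    pi = 0.20
    print(0.006 / (1 - 0.994 * pi))                   # alpha ~ 0.0075
    k, n = 2, 6
    q = 0.994 * pi
    print(comb(n, k) * (1 - q)**(k + 1) * q**(n - k)) # beta_62 ~ 0.012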

Solution 2.33∗

(a) If k > n, the result is obvious since 0 = (0 + 0); so, we only need to consider the case when k ≤ n. Now,

C^{n−1}_{k−1} + C^{n−1}_k = (n − 1)!/[(k − 1)!(n − k)!] + (n − 1)!/[k!(n − k − 1)!]

= (n − 1)!{k/[k!(n − k)!] + (n − k)/[k!(n − k)!]}

= (n − 1)!{n/[k!(n − k)!]} = n!/[k!(n − k)!] = C^n_k,

which completes the proof.

(b) The left-hand side of Vandermonde's Identity is the number of ways of choosing r objects from a total of (m + n) objects. For k = 0, 1, …, r, this can be accomplished by choosing k objects from the set of n objects (which can be done in C^n_k ways) and by choosing (r − k) objects from the set of m objects (which can be done in C^m_{r−k} ways), giving the product C^n_k C^m_{r−k} as the total number of ways of choosing r objects from a total of (m + n) objects given that exactly k objects must be chosen from the set of n objects. Vandermonde's Identity follows directly by summing this product over all the values of k.

(c) Without loss of generality, assume that s ≤ t. Then, we wish to show that

Σ_{y=1}^{s} (π_{2y} + π_{2y+1}) = Σ_{y=1}^{s} [2C^{s−1}_{y−1}C^{t−1}_{y−1} + C^{s−1}_y C^{t−1}_{y−1} + C^{s−1}_{y−1}C^{t−1}_y]/C^{s+t}_s = 1,

or, equivalently, that the numerator N in the above ratio expression is equal to C^{s+t}_s.

Now, using Pascal's Identity, we have

N = Σ_{y=1}^{s} [2C^{s−1}_{y−1}C^{t−1}_{y−1} + C^{s−1}_y C^{t−1}_{y−1} + C^{s−1}_{y−1}C^{t−1}_y]

  = Σ_{y=1}^{s} {C^{s−1}_{y−1}[C^{t−1}_{y−1} + C^{t−1}_y] + C^{t−1}_{y−1}[C^{s−1}_{y−1} + C^{s−1}_y]}

  = Σ_{y=1}^{s} [C^{s−1}_{y−1}C^t_y + C^{t−1}_{y−1}C^s_y]

  = Σ_{y=1}^{s} {(s − 1)!t!/[(y − 1)!(s − y)!y!(t − y)!] + (t − 1)!s!/[(y − 1)!(t − y)!y!(s − y)!]}

  = (s + t) Σ_{y=1}^{s} (s − 1)!(t − 1)!/[(y − 1)!(s − y)!y!(t − y)!]

  = [(s + t)/s] Σ_{y=1}^{s} C^s_y C^{t−1}_{y−1}

  = [(s + t)/s] Σ_{k=0}^{s−1} C^s_{k+1}C^{t−1}_k

  = [(s + t)/s] Σ_{k=0}^{s−1} C^s_{(s−1)−k}C^{t−1}_k.

Then, in the above summation, if we let r = (s − 1), m = s, and n = (t − 1), in which case (s − 1) ≤ min{s, (t − 1)} since s ≤ t, then Vandermonde's Identity gives

Σ_{k=0}^{s−1} C^s_{(s−1)−k}C^{t−1}_k = Σ_{k=0}^{r} C^m_{r−k}C^n_k = C^{m+n}_r = C^{s+t−1}_{s−1}.

Finally,

N = [(s + t)/s] C^{s+t−1}_{s−1} = [(s + t)/s][(s + t − 1)!/((s − 1)!t!)] = (s + t)!/(s!t!) = C^{s+t}_s.

This completes the proof since it then follows that

0 ≤ π_{2y} ≤ 1 and 0 ≤ π_{2y+1} ≤ 1, y = 1, 2, …, min{s, t}.
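A brute-force editorial check of the identity N = C^{s+t}_s for a few (s, t) pairs:

    from math import comb

    for s, t in [(2, 2), (3, 5), (4, 9)]:
        N = sum(2 * comb(s - 1, y - 1) * comb(t - 1, y - 1)
                + comb(s - 1, y) * comb(t - 1, y - 1)
                + comb(s - 1, y - 1) * comb(t - 1, y)
                for y in range(1, s + 1))
        print(N == comb(s + t, s))                    # True, True, True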


3 Multivariate Distribution Theory

3.1 Concepts and Notation

3.1.1 Discrete and Continuous Multivariate Distributions

A discrete multivariate probability distribution for k discrete random variables X1, X2, …, Xk is denoted

pX1,X2,…,Xk(x1, x2, …, xk) = pr[∩_{i=1}^{k}(Xi = xi)] ≡ pX(x) = pr(X = x), x ∈ D,

where the row vector X = (X1, X2, …, Xk), the row vector x = (x1, x2, …, xk), and D is the domain (i.e., the set of all permissible values) of the discrete random vector X. A valid multivariate discrete probability distribution has the following properties:

(i) 0 ≤ pX(x) ≤ 1 for all x ∈ D;

(ii) ΣΣ···Σ_D pX(x) = 1;

(iii) If D1 is a subset of D, then pr[X ∈ D1] = ΣΣ···Σ_{D1} pX(x).

A continuous multivariate probability distribution (i.e., a multivariate density function) for k continuous random variables X1, X2, …, Xk is denoted

fX1,X2,…,Xk(x1, x2, …, xk) ≡ fX(x), x ∈ D,

where D is the domain of the continuous random vector X. A valid multivariate density function has the following properties:

(i) 0 ≤ fX(x) < +∞ for all x ∈ D;

(ii) ∫∫···∫_D fX(x) dx = 1, where dx = dx1 dx2 ··· dxk;

(iii) If D1 is a subset of D, then pr[X ∈ D1] = ∫∫···∫_{D1} fX(x) dx.

3.1.2 Multivariate Cumulative Distribution Functions

In general, the multivariate CDF for a random vector X is the scalar function

FX(x) = pr(X ≤ x) = pr[∩_{i=1}^{k}(Xi ≤ xi)].

For a discrete random vector, FX(x) is a discontinuous function of x. For a continuous random vector, FX(x) is an absolutely continuous function of x, so that

∂^k FX(x)/(∂x1∂x2···∂xk) = fX(x).

3.1.3 Expectation Theory

Let g(X) be a scalar function of X. If X is a discrete random vector with probability distribution pX(x), then

E[g(X)] = ΣΣ···Σ_D g(x)pX(x).

And, if X is a continuous random vector with density function fX(x), then

E[g(X)] = ∫∫···∫_D g(x)fX(x) dx.

Some important expectations of interest in the multivariate setting are:

3.1.3.1 Covariance

For i ≠ j, the covariance between the two random variables Xi and Xj is defined as

cov(Xi, Xj) = E{[Xi − E(Xi)][Xj − E(Xj)]} = E(XiXj) − E(Xi)E(Xj), −∞ < cov(Xi, Xj) < +∞.

3.1.3.2 Correlation

For i ≠ j, the correlation between the two random variables Xi and Xj is defined as

corr(Xi, Xj) = cov(Xi, Xj)/√[V(Xi)V(Xj)], −1 ≤ corr(Xi, Xj) ≤ +1.

3.1.3.3 Moment Generating Function

With the row vector t = (t1, t2, …, tk),

MX(t) = E(e^{tX′}) = E(e^{Σ_{i=1}^{k} tiXi})

is called the multivariate moment generating function for the random vector X. In particular, with r1, r2, …, rk being nonnegative integers satisfying the restriction Σ_{i=1}^{k} ri = r, we have

E[X1^{r1} X2^{r2} ··· Xk^{rk}] = ∂^r MX(t)/(∂t1^{r1} ∂t2^{r2} ··· ∂tk^{rk}) |_{t=0},

where the notation t = 0 means that ti = 0, i = 1, 2, …, k.

3.1.4 Marginal Distributions

When X is a discrete random vector, the marginal distribution of any proper subset of the k random variables X1, X2, …, Xk can be found by summing over all the random variables not in the subset of interest. In particular, for 1 ≤ j < k, the marginal distribution of the random variables X1, X2, …, Xj is equal to

pX1,X2,…,Xj(x1, x2, …, xj) = Σ_{all xj+1} Σ_{all xj+2} ··· Σ_{all xk−1} Σ_{all xk} pX(x).

When X is a continuous random vector, the marginal distribution of any proper subset of the k random variables X1, X2, …, Xk can be found by integrating over all the random variables not in the subset of interest. In particular, for 1 ≤ j < k, the marginal distribution of the random variables X1, X2, …, Xj is equal to

fX1,X2,…,Xj(x1, x2, …, xj) = ∫_{all xj+1} ∫_{all xj+2} ··· ∫_{all xk−1} ∫_{all xk} fX(x) dxk dxk−1 ··· dxj+2 dxj+1.


3.1.5 Conditional Distributions and Expectations

For X a discrete random vector, let X1 denote a proper subset of the k discrete random variables X1, X2, …, Xk, let X2 denote another proper subset of X1, X2, …, Xk, and assume that the subsets X1 and X2 have no elements in common. Then, the conditional distribution of X2 given that X1 = x1 is defined as the joint distribution of X1 and X2 divided by the marginal distribution of X1, namely,

pX2(x2|X1 = x1) = pX1,X2(x1, x2)/pX1(x1) = pr[(X1 = x1) ∩ (X2 = x2)]/pr(X1 = x1), pr(X1 = x1) > 0.

Then, if g(X2) is a scalar function of X2, it follows that

E[g(X2)|X1 = x1] = ΣΣ···Σ_{all x2} g(x2)pX2(x2|X1 = x1).

For X a continuous random vector, let X1 denote a proper subset of the k continuous random variables X1, X2, …, Xk, let X2 denote another proper subset of X1, X2, …, Xk, and assume that the subsets X1 and X2 have no elements in common. Then, the conditional density function of X2 given that X1 = x1 is defined as the joint density function of X1 and X2 divided by the marginal density function of X1, namely,

fX2(x2|X1 = x1) = fX1,X2(x1, x2)/fX1(x1), fX1(x1) > 0.

Then, if g(X2) is a scalar function of X2, it follows that

E[g(X2)|X1 = x1] = ∫∫···∫_{all x2} g(x2)fX2(x2|X1 = x1) dx2.

More generally, if g(X1, X2) is a scalar function of X1 and X2, then useful iterated expectation formulas are:

E[g(X1, X2)] = E_{x1}{E[g(X1, X2)|X1 = x1]} = E_{x2}{E[g(X1, X2)|X2 = x2]}

and

V[g(X1, X2)] = E_{x1}{V[g(X1, X2)|X1 = x1]} + V_{x1}{E[g(X1, X2)|X1 = x1]}
             = E_{x2}{V[g(X1, X2)|X2 = x2]} + V_{x2}{E[g(X1, X2)|X2 = x2]}.

Also,

pX(x) ≡ pX1,X2,…,Xk(x1, x2, …, xk) = pX1(x1) ∏_{i=2}^{k} pXi[xi | ∩_{j=1}^{i−1}(Xj = xj)]

and

fX(x) ≡ fX1,X2,…,Xk(x1, x2, …, xk) = fX1(x1) ∏_{i=2}^{k} fXi[xi | ∩_{j=1}^{i−1}(Xj = xj)].

Note that there are k! ways of writing each of the above two expressions.
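The iterated expectation formulas are easy to illustrate by simulation; the sketch below (an editorial addition) uses the hierarchy X ∼ Poisson(5) with Y | X = x ∼ Binomial(x, 0.3), for which E(Y) = E[E(Y|X)] = 1.5 and V(Y) = E[V(Y|X)] + V[E(Y|X)] = 1.05 + 0.45 = 1.5:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.poisson(5, 10**6)
    y = rng.binomial(x, 0.3)       # Y | X = x ~ BIN(x, 0.3)
    print(y.mean())                # ~1.5 = E[E(Y|X)]
    print(y.var())                 # ~1.5 = E[V(Y|X)] + V[E(Y|X)]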

3.1.6 Mutual Independence among a Set of Random Variables

The random vector X is said to consist of a set of k mutually independent random variables if and only if

FX(x) = ∏_{i=1}^{k} FXi(xi) = ∏_{i=1}^{k} pr(Xi ≤ xi)

for all possible choices of x1, x2, …, xk. Given mutual independence, then

pX(x) ≡ pX1,X2,…,Xk(x1, x2, …, xk) = ∏_{i=1}^{k} pXi(xi)

when X is a discrete random vector, and

fX(x) ≡ fX1,X2,…,Xk(x1, x2, …, xk) = ∏_{i=1}^{k} fXi(xi)

when X is a continuous random vector.

Also, for i = 1, 2, …, k, let gi(Xi) be a scalar function of Xi. Then, if X1, X2, …, Xk constitute a set of k mutually independent random variables, it follows that

E[∏_{i=1}^{k} gi(Xi)] = ∏_{i=1}^{k} E[gi(Xi)].

And, if X1, X2, …, Xk are mutually independent random variables, then any subset of these k random variables also constitutes a group of mutually independent random variables. Also, for i ≠ j, if Xi and Xj are independent random variables, then corr(Xi, Xj) = 0; however, if corr(Xi, Xj) = 0, it does not necessarily follow that Xi and Xj are independent random variables.


3.1.7 Random Sample

Using the notation Xi = (Xi1, Xi2, …, Xik), the random vectors X1, X2, …, Xn are said to constitute a random sample of size n from the discrete parent population pX(x) if the following two conditions hold:

(i) X1, X2, …, Xn constitute a set of mutually independent random vectors;

(ii) For i = 1, 2, …, n, pXi(xi) = pX(xi); in other words, Xi follows the discrete parent population distribution pX(x).

A completely analogous definition holds for a random sample from a continuous parent population fX(x).

Standard statistical terminology describes a random sample X1, X2, …, Xn of size n as consisting of a set of independent and identically distributed (i.i.d.) random vectors. In this regard, it is important to note that the mutual independence property pertains to the relationship among the random vectors, not to the relationship among the k (possibly mutually dependent) scalar random variables within a random vector.

3.1.8 Some Important Multivariate Discrete and Continuous Probability Distributions

3.1.8.1 Multinomial

The multinomial distribution is often used as a statistical model for the analysis of categorical data. In particular, for i = 1, 2, …, k, suppose that πi is the probability that an observation falls into the ith of k distinct categories, where 0 < πi < 1 and where Σ_{i=1}^{k} πi = 1. If the discrete random variable Xi is the number of observations out of n that fall into the ith category, then the k random variables X1, X2, …, Xk jointly follow a k-variate multinomial distribution, namely,

pX(x) ≡ pX1,X2,…,Xk(x1, x2, …, xk) = [n!/(x1!x2!···xk!)] π1^{x1} π2^{x2} ··· πk^{xk}, x ∈ D,

where D = {x : 0 ≤ xi ≤ n, i = 1, 2, …, k, and Σ_{i=1}^{k} xi = n}.

When (X1, X2, …, Xk) ∼ MULT(n; π1, π2, …, πk), then Xi ∼ BIN(n, πi) for i = 1, 2, …, k, and cov(Xi, Xj) = −nπiπj for i ≠ j.
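A quick editorial simulation of these multinomial moments (illustrative values n = 20 and (π1, π2, π3) = (0.2, 0.3, 0.5)):

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 20, [0.2, 0.3, 0.5]
    x = rng.multinomial(n, p, size=10**6)
    print(x[:, 0].mean(), n * p[0])                          # E(X1) = 4
    print(np.cov(x[:, 0], x[:, 1])[0, 1], -n * p[0] * p[1])  # cov = -1.2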

3.1.8.2 Multivariate Normal

The multivariate normal distribution is often used to model the joint behavior of k possibly mutually correlated continuous random variables. The multivariate normal density function for k continuous random variables X1, X2, …, Xk is defined as

fX(x) ≡ fX1,X2,…,Xk(x1, x2, …, xk) = [1/((2π)^{k/2}|Σ|^{1/2})] e^{−(1/2)(x−μ)Σ^{−1}(x−μ)′},

where −∞ < xi < ∞ for i = 1, 2, …, k, where μ = (μ1, μ2, …, μk) = [E(X1), E(X2), …, E(Xk)], and where Σ is the (k × k) covariance matrix of X with ith diagonal element equal to σi² = V(Xi) and with (i, j)th element σij equal to cov(Xi, Xj) for i ≠ j.

Also, when X ∼ MVNk(μ, Σ), then the moment generating function for X is

MX(t) = e^{tμ′ + (1/2)tΣt′}.

And, for i = 1, 2, …, k, the marginal distribution of Xi is normal with mean μi and variance σi².

As an important special case, when k = 2, we obtain the bivariate normal distribution, namely,

fX1,X2(x1, x2) = [1/(2πσ1σ2√(1 − ρ²))] exp{−[1/(2(1 − ρ²))][((x1 − μ1)/σ1)² − 2ρ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)²]},

where −∞ < x1 < ∞ and −∞ < x2 < ∞, and where ρ = corr(X1, X2).

When (X1, X2) ∼ BVN(μ1, μ2; σ1², σ2²; ρ), then the moment generating function for X1 and X2 is

MX1,X2(t1, t2) = e^{t1μ1 + t2μ2 + (1/2)(t1²σ1² + 2t1t2ρσ1σ2 + t2²σ2²)}.

The conditional distribution of X2 given X1 = x1 is normal with

E(X2|X1 = x1) = μ2 + ρ(σ2/σ1)(x1 − μ1) and V(X2|X1 = x1) = σ2²(1 − ρ²).

And, the conditional distribution of X1 given X2 = x2 is normal with

E(X1|X2 = x2) = μ1 + ρ(σ1/σ2)(x2 − μ2) and V(X1|X2 = x2) = σ1²(1 − ρ²).

These conditional expectation expressions for the bivariate normal distribution are special cases of a more general result. More generally, for a pair of either discrete or continuous random variables X1 and X2, if the conditional expectation of X2 given X1 = x1 is a linear (or straightline) function of x1, namely E(X2|X1 = x1) = α1 + β1x1, −∞ < α1 < +∞, −∞ < β1 < +∞, then corr(X1, X2) = ρ = β1√{[V(X1)]/[V(X2)]}. Analogously, if E(X1|X2 = x2) = α2 + β2x2, −∞ < α2 < +∞, −∞ < β2 < +∞, then ρ = β2√{[V(X2)]/[V(X1)]}.
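The bivariate normal conditional moments can be illustrated by simulation; the following editorial sketch conditions (approximately) on X1 lying near a fixed value, using illustrative parameter values:

    import numpy as np

    rng = np.random.default_rng(5)
    mu1, mu2, s1, s2, rho = 1.0, 2.0, 1.5, 0.8, 0.6   # illustrative values
    cov = [[s1 * s1, rho * s1 * s2], [rho * s1 * s2, s2 * s2]]
    x = rng.multivariate_normal([mu1, mu2], cov, size=10**6)
    sel = np.abs(x[:, 0] - 2.0) < 0.01                # condition on X1 ~ 2.0
    print(x[sel, 1].mean(), mu2 + rho * (s2 / s1) * (2.0 - mu1))
    print(x[sel, 1].var(), s2 * s2 * (1 - rho * rho))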


3.1.9 Special Topics of Interest

3.1.9.1 Mean and Variance of a Linear Function of Random Variables

For i = 1, 2, …, k, let gi(Xi) be a scalar function of the random variable Xi. Then, if a1, a2, …, ak are known constants, and if L = Σ_{i=1}^{k} ai gi(Xi), we have

E(L) = Σ_{i=1}^{k} ai E[gi(Xi)],

and

V(L) = Σ_{i=1}^{k} ai² V[gi(Xi)] + 2 Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} ai aj cov[gi(Xi), gj(Xj)].

In the special case when the random variables Xi and Xj are uncorrelated for all i ≠ j, then

V(L) = Σ_{i=1}^{k} ai² V[gi(Xi)].

3.1.9.2 Convergence in Distribution

A sequence of random variables U1, U2, …, Un, … converges in distribution to a random variable U if

lim_{n→∞} FUn(u) = FU(u)

for all values of u where FU(u) is continuous. Notationally, we write Un →D U.

As an important example, suppose that X1, X2, …, Xn constitute a random sample of size n from either a univariate discrete probability distribution pX(x) or a univariate density function fX(x), where E(X) = μ (−∞ < μ < +∞) and V(X) = σ² (0 < σ² < +∞). With X̄ = n^{−1}Σ_{i=1}^{n} Xi, consider the standardized random variable

Un = (X̄ − μ)/(σ/√n) = (Σ_{i=1}^{n} Xi − nμ)/(√n σ).

Then, it can be shown that lim_{n→∞} MUn(t) = e^{t²/2}, leading to the conclusion that Un →D Z, where Z ∼ N(0, 1). This is the well-known Central Limit Theorem.
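The Central Limit Theorem is easily illustrated by simulation; the sketch below (editorial) standardizes means of exponential samples, for which μ = σ = 1:

    import numpy as np

    rng = np.random.default_rng(6)
    n, reps = 200, 20000
    xbar = rng.exponential(1.0, (reps, n)).mean(axis=1)
    u = (xbar - 1.0) / (1.0 / np.sqrt(n))             # standardized means
    print(u.mean(), u.std())                          # ~0, ~1
    print((u <= 1.645).mean())                        # ~0.95 = Phi(1.645)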

3.1.9.3 Order Statistics

Let X1, X2, …, Xn constitute a random sample of size n from a univariate density function fX(x), −∞ < x < +∞, with corresponding cumulative distribution function FX(x) = ∫_{−∞}^{x} fX(t) dt. Then, the n order statistics X(1), X(2), …, X(n) satisfy the relationship

−∞ < X(1) < X(2) < ··· < X(n−1) < X(n) < +∞.

For r = 1, 2, …, n, the random variable X(r) is called the rth order statistic. In particular, X(1) = min{X1, X2, …, Xn}, X(n) = max{X1, X2, …, Xn}, and X((n+1)/2) = median{X1, X2, …, Xn} when n is an odd positive integer.

For r = 1, 2, …, n, the distribution of X(r) is

fX(r)(x(r)) = nC^{n−1}_{r−1}[FX(x(r))]^{r−1}[1 − FX(x(r))]^{n−r} fX(x(r)), −∞ < x(r) < +∞.

For 1 ≤ r < s ≤ n, the joint distribution of X(r) and X(s) is equal to

fX(r),X(s)(x(r), x(s)) = {n!/[(r − 1)!(s − r − 1)!(n − s)!]}[FX(x(r))]^{r−1}
                        × [FX(x(s)) − FX(x(r))]^{s−r−1}
                        × [1 − FX(x(s))]^{n−s} fX(x(r))fX(x(s)), −∞ < x(r) < x(s) < +∞.

And, the joint distribution of X(1), X(2), …, X(n) is

fX(1),X(2),…,X(n)(x(1), x(2), …, x(n)) = n! ∏_{i=1}^{n} fX(x(i)), −∞ < x(1) < x(2) < ··· < x(n−1) < x(n) < +∞.

3.1.9.4 Method of Transformations

With k = 2, let X1 and X2 be two continuous random variables with joint density function fX1,X2(x1, x2), (x1, x2) ∈ D. Let Y1 = g1(X1, X2) and Y2 = g2(X1, X2) be random variables, where the functions y1 = g1(x1, x2) and y2 = g2(x1, x2) define a one-to-one transformation from the domain D in the (x1, x2)-plane to the domain D* in the (y1, y2)-plane. Further, let x1 = h1(y1, y2) and x2 = h2(y1, y2) be the inverse functions expressing x1 and x2 as functions of y1 and y2. Then, the joint density function of the random variables Y1 and Y2 is

fY1,Y2(y1, y2) = fX1,X2[h1(y1, y2), h2(y1, y2)]|J|, (y1, y2) ∈ D*,

where the Jacobian J, J ≠ 0, of the transformation is the second-order determinant

J = | ∂h1(y1, y2)/∂y1   ∂h1(y1, y2)/∂y2 |
    | ∂h2(y1, y2)/∂y1   ∂h2(y1, y2)/∂y2 |.

For the special case k = 1 when Y1 = g1(X1) and X1 = h1(Y1), it follows that

fY1(y1) = fX1[h1(y1)]|dh1(y1)/dy1|, y1 ∈ D*.

It is a straightforward generalization to the situation when Yi = gi(X1, X2, …, Xk), i = 1, 2, …, k, with the Jacobian J being the determinant of a (k × k) matrix.

EXERCISES

Exercise 3.1. Two balls are selected sequentially at random without replacement from an urn containing N (>1) balls numbered individually from 1 to N. Let the discrete random variable X be the number on the first ball selected, and let the discrete random variable Y be the number on the second ball selected.

(a) Provide an explicit expression for the joint distribution of the random variables X and Y, and also provide explicit expressions for the marginal distributions of X and Y.

(b) Provide an explicit expression for pr[X ≥ (N − 1)|Y = y], where y is a fixed positive integer satisfying the inequality 1 ≤ y ≤ N.

(c) Derive an explicit expression for corr(X, Y), the correlation between X and Y. Find the limiting value of corr(X, Y) as N → ∞, and then comment on your finding.

Exercise 3.2. Consider an experiment consisting of n mutually independent Bernoulli trials, where each trial results in either a success (denoted by the letter S) or a failure (denoted by the letter F). For any trial, the probability of a success is equal to π, 0 < π < 1, and so the probability of a failure is equal to (1 − π). For any set of n trials with outcomes arranged in a linear sequence, a run is a subsequence of outcomes of the same type which is both preceded and succeeded by outcomes of the opposite type or by the beginning or by the end of the complete sequence. The number of successes in a success (or S) run is referred to as its length.

For any such sequence of n Bernoulli trial outcomes, let the discrete random variable Mn denote the length of the shortest S run in the sequence, and let the discrete random variable Ln denote the length of the longest S run in the sequence. For example, for the sequence of n = 12 outcomes given by

FFSFSSSFFFSS,

the observed value of M12 is m12 = 1 and the observed value of L12 is l12 = 3.

(a) If n = 5, find the joint distribution of the random variables M5 and L5.

(b) Find the marginal distribution of the random variable L5, and then find the numerical value of E(L5) when π = 0.90.

Exercise 3.3. Suppose that pU(u) = pr(U = u) = n^{−1}, u = 1, 2, …, n. Further, suppose that, given (or conditional on) U = u, X and Y are independent geometric random variables, with

pX(x|U = u) = u^{−1}(1 − u^{−1})^{x−1}, x = 1, 2, …, ∞

and

pY(y|U = u) = u^{−1}(1 − u^{−1})^{y−1}, y = 1, 2, …, ∞.

(a) Derive an explicit expression for corr(X, Y), the correlation between the random variables X and Y.

(b) Develop an expression for pr(X = Y). What is the numerical value of this probability when n = 4?

Exercise 3.4. Suppose that Z ∼ N(0, 1), that U ∼ χ²_ν, and that Z and U are independent random variables. Then, the random variable

Tν = Z/√(U/ν)

has a (central) t-distribution with ν degrees of freedom.

(a) By considering the conditional density function of Tν given U = u, develop an explicit expression for the density function of Tν.

(b) Find E(Tν) and V(Tν).

Exercise 3.5

(a) Suppose that Y is a random variable with conditional mean E(Y|X = x) = β0 + β1x and that X is a random variable with mean E(X) and variance V(X). Use conditional expectation theory to show that

corr(X, Y) = β1√[V(X)/V(Y)],

and then comment on this finding.

(b) Now, given the above assumptions, suppose also that E(X|Y = y) = α0 + α1y. Develop an explicit expression relating corr(X, Y) to α1 and β1, and then comment on this finding.

(c) Now, suppose that E(Y|X = x) = β0 + β1x + β2x². Derive an explicit expression for corr(X, Y), and then comment on how the addition of the quadratic term β2x² affects the relationship between corr(X, Y) and β1 given in part (a).

Exercise 3.6. Suppose that the amounts X and Y (in milligrams) of two toxic chemicals in a liter of water selected at random from a river near a certain manufacturing plant can be modeled by the bivariate density function

fX,Y(x, y) = 6θ^{−3}(x − y), 0 < y < x < θ.

(a) Derive an explicit expression for corr(X, Y), the correlation between the two continuous random variables X and Y.

(b) Set up appropriate integrals that are needed to find

pr[(X + Y) < θ | (X + 2Y) > θ/4].

Note that the appropriate integrals do not have to be evaluated, but the integrands and the limits of integration must be correctly specified for all integrals that are used.

(c) Let (X1, Y1), (X2, Y2), …, (Xn, Yn) constitute a random sample of size n from fX,Y(x, y), and let X̄ = n^{−1}Σ_{i=1}^{n} Xi and Ȳ = n^{−1}Σ_{i=1}^{n} Yi. Develop explicit expressions for E(L) and V(L) when L = (3X̄ − 2Ȳ).

Exercise 3.7. For a certain type of chemical reaction involving two chemicals A and B, let X denote the proportion of the initial amount (in grams) of chemical A that remains unreacted at equilibrium, and let Y denote the corresponding proportion of the initial amount (in grams) of chemical B that remains unreacted at equilibrium. The bivariate density function for the continuous random variables X and Y is assumed to be of the form

fX,Y(x, y) = [Γ(α + β + 3)/(Γ(α + 1)Γ(β + 1))] (1 − x)^α y^β, 0 < y < x, 0 < x < 1, α > −1, β > −1.

(a) Derive explicit expressions for fX(x) and fY(y), the marginal distributions of the random variables X and Y, and for fY(y|X = x), the conditional density function of Y given X = x.

(b) Use the results obtained in part (a) to develop an expression for ρX,Y = corr(X, Y), the correlation between the random variables X and Y. What is the numerical value of this correlation coefficient when α = 2 and β = 3?

(c) Let (X1, Y1), (X2, Y2), …, (Xn, Yn) constitute a random sample of size n from fX,Y(x, y). If X̄ = n^{−1}Σ_{i=1}^{n} Xi and Ȳ = n^{−1}Σ_{i=1}^{n} Yi, find the expected value and variance of the random variable L = (3X̄ − 5Ȳ) when n = 10, α = 2, and β = 3.

Exercise 3.8. A certain simple biological system involves exactly two independently functioning components. If one of these two components fails, then the entire system fails. For i = 1, 2, let Yi be the random variable representing the time (in weeks) to failure of the ith component, with the distribution of Yi being negative exponential, namely,

fYi(yi) = θi e^{−θi yi}, 0 < yi < ∞, θi > 0.

Further, assume that Y1 and Y2 are independent random variables. Clearly, if component 1 fails first, then Y1 is observable, but Y2 is not observable (i.e., Y2 is then said to be censored); conversely, if component 2 fails first, then Y2 is observable, but Y1 is not observable (i.e., Y1 is censored). Thus, if this biological system fails, then only two random variables, call them U and W, are observable, where U = min(Y1, Y2) and where W = 1 if Y1 < Y2 and W = 0 if Y2 < Y1.

(a) Develop an explicit expression for the joint distribution fU,W(u, w) of the random variables U and W.

(b) Find the marginal distribution pW(w) of the random variable W.

(c) Find the marginal distribution fU(u) of the random variable U.

(d) Are U and W independent random variables?

Exercise 3.9. It has been documented via numerous research studies that the eldest child in a family with multiple children generally has a higher IQ than his or her siblings. In a certain large population of U.S. families with two children, suppose that the random variable Y1 denotes the IQ of the older child and that the random variable Y2 denotes the IQ of the younger child. Assume that Y1 and Y2 have a joint bivariate normal distribution with parameter values E(Y1) = 110, E(Y2) = 100, V(Y1) = V(Y2) = 225, and ρ = corr(Y1, Y2) = 0.80.

(a) Suppose that three families are randomly chosen from this large population of U.S. families with two children. What is the probability that the older child has an IQ at least 15 points higher than the younger child for at least two of these three families?

(b) For a family randomly chosen from this population, if the older child is known to have an IQ of 120, what is the probability that the younger child has an IQ greater than 120?

Exercise 3.10. Discrete choice statistical models are useful in many situations, including transportation research. For example, transportation researchers may want to know why certain individuals choose to use public bus transportation instead of a car. As a starting point, the investigators typically assume that each mode of transportation carries with it a certain value, or "utility," that makes it more or less desirable to consumers. For instance, cars may be more convenient, but a bus may be more environmentally friendly. According to the "maximum utility principle," consumers select the alternative that has the greatest desirability or utility.

As a simple illustration of a discrete choice statistical model, suppose that there are only two possible discrete choices, A and B. Let the random variable Y take the value 1 if choice A is made, and let Y take the value 0 if choice B is made. Furthermore, let U and V be the utilities associated with the choices A and B, respectively, and assume that U and V are independent random variables, each having the same standard Gumbel (Type-I Extreme-Value) distribution. In particular, both U and V are assumed to have CDFs of the general form

FX(x) = pr(X ≤ x) = e^{−e^{−x}}, −∞ < x < ∞.

According to the maximum utility principle, Y = 1 if and only if U > V, or equivalently, if W = (U − V) > 0.

(a) Show that W follows a logistic distribution, with CDF FW(w) = 1/(1 + e^{−w}), −∞ < w < ∞.

(b) Suppose that U = α + E1 and V = E2, where E1 and E2 are independent error terms, each following the standard Gumbel CDF of the general form FX(x) given above. Here, α represents the average population difference between the two utilities. (More generally, U and V can be modeled as functions of covariates, although this extension is not considered here.) Again, assume that we observe Y = 1 if choice A is made and Y = 0 if choice B is made. Find an explicit expression as a function of α for pr(Y = 1) under the maximum utility principle.

Exercise 3.11. Let X1, X2, …, Xn constitute a random sample of size n (n ≥ 3) from the parent population

fX(x) = λe^{−λx}, 0 < x < +∞, 0 < λ < +∞.

(a) Find the conditional density function of X1, X2, …, Xn given that S = Σ_{i=1}^{n} Xi = s.

(b) Consider the (n − 1) random variables

Y1 = X1/S, Y2 = (X1 + X2)/S, …, Yn−1 = (X1 + X2 + ··· + Xn−1)/S.

Find the joint distribution of Y1, Y2, …, Yn−1 given that S = s.

(c) When n = 3 and when n = 4, find the marginal distribution of Y1 given that S = s, and then use these results to infer the structure of the marginal distribution of Y1 given that S = s for any n ≥ 3.

Exercise 3.12. Let X1, X2, …, Xn constitute a random sample of size n from a N(μ, σ²) population. Then, consider the n random variables Y1, Y2, …, Yn, where Yi = e^{Xi}, i = 1, 2, …, n. Finally, consider the following two random variables:

(i) The arithmetic mean Ȳa = n^{−1} Σ_{i=1}^{n} Yi;

(ii) The geometric mean Ȳg = (∏_{i=1}^{n} Yi)^{1/n}.

Develop an explicit expression for corr(Ȳa, Ȳg), the correlation between the two random variables Ȳa and Ȳg. Then, find the limiting value of this correlation as n → ∞, and comment on your finding.

Exercise 3.13. For a certain public health research study, an epidemiologist is interested in determining via blood tests which particular subjects in a random sample of N (= Gn) human subjects possess a certain antibody; here, G and n are positive integers. For the population from which the random sample of N subjects is selected, the proportion of subjects in that population possessing the antibody is equal to π (0 < π < 1), a known quantity. The epidemiologist is considering two possible blood testing plans:

Plan #1: Perform the blood test separately on each of the N subjects in the random sample;

Plan #2: Divide the N subjects in the random sample into G groups of n subjects each; then, for each group of size n, take a blood sample from each of the n subjects in that group, mix the n blood samples together, and do one blood test on the mixture; if the blood test on the mixture is negative (indicating that the antibody is not present in that mixture), then none of those n subjects possesses the antibody; however, if the blood test on the mixture is positive (indicating that the antibody is present), then the blood test will have to be performed on each of the n subjects in that group.

(a) Let T2 be the random variable denoting the number of blood tests required for Plan #2. Develop an explicit expression for E(T2).

(b) Clearly, the larger the value of π, the more likely it is that the blood test on a mixture of n blood samples will be positive, necessitating a blood test on every one of those n subjects. Determine the optimal value of n (say, n*) and the associated desired largest value of π (say, π*) for which E(T2) < N (i.e., for which Plan #2 is preferred to Plan #1).

Exercise 3.14. For the state of North Carolina (NC), suppose that the number Y of female residents who are homicide victims in any particular calendar year follows a Poisson distribution with mean E(Y) = Lλ, where L is the total number of person-months at risk for homicide for all female NC residents during that year and where λ is the rate of female homicides per person-month. Let π be the proportion of all homicide victims who were pregnant at the time of the homicide; more specifically, π = pr(woman was pregnant at the time of the homicide | woman was a homicide victim). It can be assumed that women in NC function independently of one another with regard to homicide-related and pregnancy-related issues.

Domestic violence researchers are interested in making statistical inferences about the true average (or expected value) of the number Yp of homicide victims who were pregnant at the time of the homicide and about the true average (or expected value) of the number Yp̄ = (Y − Yp) of homicide victims who were not pregnant at the time of the homicide.

Find the conditional joint moment generating function

M_{Yp,Yp̄}(s, t|Y = y) = M_{Yp,(Y−Yp)}(s, t|Y = y)

of Yp and Yp̄ = (Y − Yp) given Y = y, and then unconditionalize to determine the distributions of Yp and Yp̄. Are Yp and Yp̄ independent random variables? If L has a known value, and if estimates λ̂ and π̂ of λ and π are available, provide reasonable estimates of E(Yp) and E(Yp̄).

Exercise 3.15. A chemical test for the presence of a fairly common protein in human blood produces a continuous measurement X. Let the random variable D take the value 1 if a person's blood contains the protein in question, and let D take the value 0 if a person's blood does not contain the protein in question. Among all those people carrying the protein, X has a lognormal distribution with mean E(X|D = 1) = 2.00 and variance V(X|D = 1) = 2.60. Among all those people not carrying the protein, X has a lognormal distribution with mean E(X|D = 0) = 1.50 and variance V(X|D = 0) = 3.00. In addition, it is known that 60% of all human beings actually carry this particular protein in their blood.

(a) If a person is randomly chosen and is given the chemical test, what is the numerical value of the probability that this person's blood truly contains the protein in question given that the event "1.60 < X < 1.80" has occurred (i.e., it is known that the observed value of X for this person lies between 1.60 and 1.80)?

(b) Let the random variable X be the value of the chemical test for a person chosen completely randomly. Provide numerical values for E(X) and V(X).

(c) Suppose that the following diagnostic rule is proposed: "classify a randomly chosen person as carrying the protein if X > c, and classify that person as not carrying the protein if X ≤ c, where 0 < c < ∞." Thus, a carrier for which X ≤ c is misclassified, as is a noncarrier for which X > c. For this diagnostic rule, develop an expression (as a function of c) for the probability of misclassification θ of a randomly chosen human being, and then find the numerical value c* of c that minimizes θ. Comment on your finding.

Exercise 3.16. Let (X1, Y1), (X2, Y2), …, (Xn, Yn) constitute a random sample of size n from a bivariate population involving two random variables X and Y, where E(X) = μx, E(Y) = μy, V(X) = σx², V(Y) = σy², and ρ = corr(X, Y). Show that the random variable

U = (n − 1)^{−1} Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ)

has an expected value equal to the parametric function cov(X, Y) = ρσxσy.

Exercise 3.17. A certain large community in the United States receives its drinking water supply from a nearby lake, which itself is located in close proximity to a plant that uses benzene, among other chemicals, to manufacture styrene. Because this community has recently experienced elevated rates of leukemia, a blood cancer that has been associated with benzene exposure, the EPA decides to send a team to sample the drinking water used by this community and to determine whether or not this drinking water contains a benzene level exceeding the EPA standard of 5 parts of benzene per billion parts of water (i.e., a standard of 5 ppb). Suppose that the continuous random variable X represents the measured benzene concentration in ppb in the drinking water used by this community, and assume that X has a lognormal distribution. More specifically, assume that Y = ln(X) has a normal distribution with unknown mean μ and variance σ² = 2. The EPA decides to take n = 10 independently chosen drinking water samples and to measure the benzene concentration in each of these 10 drinking water samples. Based on the results of these 10 benzene concentration measurements (denoted X1, X2, …, X10), the EPA team has to decide whether the true mean benzene concentration in this community's drinking water is in violation of the EPA standard (i.e., exceeds 5 ppb). Three decision rules are proposed:

Decision Rule #1: Decide that the drinking water is in violation of the EPA standard if at least 3 of the 10 benzene concentration measurements exceed 5 ppb.

Decision Rule #2: Decide that the drinking water is in violation of the EPA standard if the geometric mean of the 10 benzene concentration measurements exceeds 5 ppb, where

X̄g = (∏_{i=1}^{10} Xi)^{1/10}.

Decision Rule #3: Decide that the drinking water is in violation of the EPA standard if the maximum of the 10 benzene concentration measurements, denoted X(10), exceeds 5 ppb.

(a) For each of these three different decision rules, develop, as a function of the unknown parameter μ, a general expression for the probability of deciding that the drinking water is in violation of the EPA standard. Also, if E(X) = 7, find the numerical value of each of these three probabilities.

(b) For Decision Rule #2, examine expressions for pr(X̄g > 5) and E(X̄g) to provide analytical arguments as to why Decision Rule #2 performs so poorly.

Exercise 3.18. Let X1, X2, …, Xn constitute a random sample of size n from a normal distribution with mean μ = 0 and variance σ² = 2. Determine the smallest value of n, say n*, such that pr[min{X1², X2², …, Xn²} ≤ 0.002] ≥ 0.80. [HINT: If Z ∼ N(0,1), then Z² ∼ χ²₁.]

Exercise 3.19. A large hospital wishes to determine the appropriate number of coronary bypass grafts that it can perform during the upcoming calendar year based both on the size of its coronary bypass surgery staff (e.g., surgeons, nurses, anesthesiologists, technicians, etc.) and on other logistical and space considerations. National data suggest that a typical coronary bypass surgery patient would require exactly one (vessel) graft with probability π1 = 0.54, would require exactly two grafts with probability π2 = 0.22, would require exactly three grafts with probability π3 = 0.15, and would require exactly four grafts with probability π4 = 0.09. Further, suppose that it is known that this hospital cannot feasibly perform more than about 900 coronary bypass grafts in any calendar year.

(a) An administrator for this hospital suggests that it might be reasonable to perform coronary bypass surgery on n = 500 different patients during the upcoming calendar year and still have a reasonably high probability (say, ≥0.95) of not exceeding the yearly upper limit of 900 coronary bypass grafts. Use the Central Limit Theorem to assess the reasonableness of this administrator's suggestion.

(b) Provide a reasonable value for the largest number n* of patients that can undergo coronary bypass surgery at this hospital during the upcoming year so that, with probability at least equal to 0.95, no more than 900 grafts will need to be performed.

Exercise 3.20. For the ith of k drug treatment centers (i = 1, 2, …, k) in a certain large U.S. city, suppose that the distribution of the number Xi of adult male drug users that have to be tested until exactly one such adult drug user tests positively for HIV is assumed to be geometric, namely

pXi(xi) = π(1 − π)^{xi−1}, xi = 1, 2, …, +∞; 0 < π < 1.

In all that follows, assume that X1, X2, …, Xk constitute a set of mutually independent random variables.

(a) If π = 0.05 and if S = Σ_{i=1}^{k} Xi, provide a reasonable numerical value for pr(S > 1,100) if k = 50.

(b) Use moment generating function (MGF) theory to show that the distribution of the random variable U = 2πS is, for small π, approximately chi-squared with 2k degrees of freedom.

(c) Use the result in part (b) to compute a numerical value for pr(S > 1,100) when π = 0.05 and k = 50, and then compare your answer to the one found in part (a).

Exercise 3.21. Let Y1, Y2, …, Yn constitute a random sample of size n (>1) from the Pareto density function

fY(y; θ) = θc^θ y^{−(θ+1)}, 0 < c < y < +∞ and θ > 0,

where c is a known positive constant and where θ is an unknown parameter. The Pareto density function has been used to model the distribution of family incomes in certain populations.

Consider the random variable

Un = θn[Y(1) − c]/c,

where Y(1) = min{Y1, Y2, …, Yn}. Directly evaluate lim_{n→∞} FUn(u), where FUn(u) is the CDF of Un, to find the asymptotic distribution of Un. In other words, derive an explicit expression for the CDF of U when Un converges in distribution to U.

Exercise 3.22. For a certain laboratory experiment involving mice, suppose that the random variable X, 0 < X < 1, represents the proportion of a fixed time period (in minutes) that it takes a mouse to locate food at the end of a maze, and further suppose that X follows a uniform distribution on the interval (0, 1), namely,

fX(x) = 1, 0 < x < 1.

Suppose that the experiment involves n randomly chosen mice. Further, suppose that x1, x2, …, xn are the n realized values (i.e., the n observed proportions) of the n mutually independent random variables X1, X2, …, Xn, which themselves can be considered to constitute a random sample of size n from fX(x). Let the random variable U be the smallest proportion based on the shortest time required for a mouse to locate the food, and let the random variable V be the proportion of the fixed time period still remaining based on the longest time required for a mouse to locate the food.

(a) Find an explicit expression for the joint distribution of the random variables U and V.

(b) Let R = nU and let S = nV. Find the asymptotic joint distribution of R and S. [HINT: Evaluate lim_{n→∞} {pr[(R > r) ∩ (S > s)]}.]

Exercise 3.23. Suppose that the total cost C (in millions of dollars) for repairs due to floods occurring in the United States in any particular year can be modeled by defining the random variable C as follows:

C = 0 if X = 0 and C = Σ_{j=1}^{X} Cj if X > 0;

here, the number of floods X in any particular year in the United States is assumed to have a Poisson distribution with mean E(X) = λ, and Cj is the cost (in millions of dollars) for repairs due to the jth flood in that particular year. Also, it is assumed that C1, C2, … are i.i.d. random variables, each with the same expected value μ, the same variance σ², and the same moment generating function M(t) = E(e^{tCj}). Note that the actual distribution of the random variables C1, C2, … has not been specified.

(a) Develop an explicit expression for corr(X, C), the correlation between the random variables X and C, and then comment on the structure of the expression that you obtained.

(b) Develop an explicit expression for MC(t) = E(e^{tC}), the moment generating function of the random variable C, and then use this result to find E(C).

Exercise 3.24. To evaluate the performance of a new cholesterol-lowering drug, a large drug company plans to enlist a randomly chosen set of k private medical practices to help conduct a clinical trial. Under the protocol proposed by the drug company, each private medical practice is to enroll into the clinical trial a set of n randomly chosen subjects with high cholesterol. The cholesterol level (in mg/dL) of each subject is to be measured both before taking the new drug and after taking the new drug on a daily basis for 6 months. The continuous response variable of interest is Y, the change in a subject's cholesterol level over the 6-month period. The following statistical model will be used:

Yij = μ + βi + εij, i = 1, 2, …, k and j = 1, 2, …, n.

Here, μ is the average change in cholesterol level for a typical subject with high cholesterol who takes this new cholesterol-lowering drug on a daily basis for a 6-month period, βi is the random effect associated with the ith private medical practice, and εij is the random effect associated with the jth subject in the ith private medical practice. Here, it is assumed that βi ∼ N(0, σβ²), that εij ∼ N(0, σε²), and that the sets {βi} and {εij} constitute a group of (k + kn) mutually independent random variables. Finally, let

Ȳ = (kn)^{−1} Σ_{i=1}^{k} Σ_{j=1}^{n} Yij

be the overall sample mean.

(a) Develop explicit expressions for E(Ȳ) and V(Ȳ).

(b) Suppose that it will cost Dc dollars for each clinic to enroll and monitor n subjects over the duration of the proposed clinical trial, and further suppose that each subject is to be paid Dp dollars for participating in the clinical trial. Thus, the total cost of the clinical trial is equal to C = (kDc + knDp). Suppose that this drug company can only afford to spend C* dollars to conduct the proposed clinical trial. Find specific expressions for n* and k*, the specific values of n and k that minimize the variance of Ȳ subject to the condition that C = (kDc + knDp) = C*.

(c) If C* = 100,000, Dc = 10,000, Dp = 100, σβ² = 4, and σε² = 9, find appropriate numerical values for n* and k*.

Exercise 3.25. For $i = 1, 2$, suppose that the conditional distribution of $Y_i$ given that $Y_3 = y_3$ is

$$p_{Y_i}(y_i | Y_3 = y_3) = \frac{y_3^{y_i} e^{-y_3}}{y_i!}, \quad y_i = 0, 1, \ldots, \infty.$$

Further, assume that the random variable $Y_3$ has the truncated Poisson distribution

$$p_{Y_3}(y_3) = \frac{\lambda_3^{y_3}}{y_3!\,(e^{\lambda_3} - 1)}, \quad y_3 = 1, 2, \ldots, \infty \text{ and } \lambda_3 > 0;$$

and, also assume that the random variables $Y_1$ and $Y_2$ are conditionally independent given that $Y_3 = y_3$.

Then, consider the random variables

$$R = (Y_1 + Y_3) \quad \text{and} \quad S = (Y_2 + Y_3).$$

Derive an explicit expression for the moment generating function $M_U(t)$ of the random variable $U = (R + S)$, and then use $M_U(t)$ directly to find an explicit expression for $E(U)$. Verify that your expression for $E(U)$ is correct by finding $E(U)$ directly.

Exercise 3.26. Suppose that $n(>1)$ balls are randomly tossed into $C(>1)$ cells, so that the probability is $1/C$ of any ball ending up in the $i$th cell, $i = 1, 2, \ldots, C$.

Find the expected value and the variance of the number $X$ of cells that will end up being empty (i.e., that will contain no balls). For the special case when $C = 6$ and $n = 5$, find the numerical values of $E(X)$ and $V(X)$.

Exercise 3.27. A researcher at the Federal Highway Administration (FHWA) proposes the following statistical model for traffic fatalities. Let the random variable $N$ be the number of automobile accidents occurring on a given stretch of heavily traveled interstate highway over a specified time period. For $i = 1, 2, \ldots, N$, let the random variable $Y_i$ take the value 1 if the $i$th automobile accident involved at least one fatality, and let $Y_i$ take the value 0 otherwise. Let $\text{pr}(Y_i = 1) = \pi$, $0 < \pi < 1$, and further assume that the $\{Y_i\}$ are mutually independent dichotomous random variables. Also, let the random variable $N$ have the geometric distribution

$$p_N(n) = \theta(1 - \theta)^{n-1}, \quad n = 1, 2, \ldots, \infty; \quad 0 < \theta < 1.$$

This researcher is interested in the random variable

$$T = Y_1 + Y_2 + \cdots + Y_N,$$


the total number of automobile accidents involving fatalities on that stretch of interstate highway during the specified time period.

(a) Find explicit expressions for E(T), V(T), and corr(N, T).

(b) Find an explicit expression for pr(T = 0).

Exercise 3.28. Let $X_1, X_2, \ldots, X_m$ constitute a random sample of size $m$ from a POI($\lambda_1$) population, and let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ from a POI($\lambda_2$) population. Consider the random variable

$$U = (\bar{X} - \bar{Y}) = m^{-1}\sum_{i=1}^{m} X_i - n^{-1}\sum_{i=1}^{n} Y_i.$$

(a) Find explicit expressions for $E(U)$ and $V(U)$.

(b) Use the Lagrange multiplier method to find expressions (which are functions of $N$, $\lambda_1$, and $\lambda_2$) for $m$ and $n$ that minimize $V(U)$ subject to the restriction $(m + n) = N$, where $N$ is the total sample size that can be selected from these two Poisson populations due to cost considerations. Provide an interpretation for your findings. If $N = 60$, $\lambda_1 = 2$, and $\lambda_2 = 8$, use these expressions to find numerical values for $m$ and $n$.

Exercise 3.29∗. Let the discrete random variables $X$ and $Y$ denote the numbers of AIDS cases that will be detected yearly in two different NC counties, one in the eastern part of the state and the other in the western part of the state. Further, assume that $X$ and $Y$ are independent random variables, and that they have the respective distributions

$$p_X(x) = (1 - \pi_x)\pi_x^x, \quad x = 0, 1, \ldots, \infty, \quad 0 < \pi_x < 1$$

and

$$p_Y(y) = (1 - \pi_y)\pi_y^y, \quad y = 0, 1, \ldots, \infty, \quad 0 < \pi_y < 1.$$

(a) Derive an explicit expression for $\theta = \text{pr}(X = Y)$.

(b) The absolute difference in the numbers of AIDS cases that will be detected yearly in both counties is the random variable $U = |X - Y|$. Derive an explicit expression for $p_U(u)$, the probability distribution of the random variable $U$.

(c) For a particular year, suppose that the observed values of $X$ and $Y$ are $x = 9$ and $y = 7$. Provide a quantitative answer regarding the question of whether or not these observed values of $X$ and $Y$ provide statistical evidence that $\pi_x \neq \pi_y$. For your calculations, you may assume that $\pi_x \leq 0.10$ and that $\pi_y \leq 0.10$.

Exercise 3.30∗. Let the random variable $Y$ denote the number of Lyme disease cases that develop in the state of NC during any one calendar year. The event $Y = 0$ is not observable since the observational apparatus (i.e., diagnosis) is activated only when $Y > 0$. Since Lyme disease is a rare disease, it seems appropriate to model the distribution of $Y$ by the zero-truncated Poisson distribution (ZTPD)

$$p_Y(y) = \left(e^{\theta} - 1\right)^{-1}\frac{\theta^y}{y!}, \quad y = 1, 2, \ldots, \infty,$$


where θ(>0) is called the “incidence parameter.”

(a) Find an explicit expression for

$$\psi(t) = E\left[(t + 1)^Y\right].$$

(b) Use $\psi(t)$ to show that

$$E(Y) = \frac{\theta e^{\theta}}{(e^{\theta} - 1)} \quad \text{and} \quad V(Y) = \frac{\theta e^{\theta}(e^{\theta} - \theta - 1)}{(e^{\theta} - 1)^2}.$$

(c) To lower the incidence of Lyme disease in NC, the state health department mounts a vigorous media campaign to educate NC residents about all aspects of Lyme disease (including information about preventing and dealing with tick bites, using protective measures such as clothing and insect repellents, recognizing symptoms of Lyme disease, treating Lyme disease, etc.) Assume that this media campaign has the desired effect of lowering $\theta$ to $\pi\theta$, where $0 < \pi < 1$. Let $Z$ be the number of Lyme disease cases occurring during a 1-year period after the media campaign is over. Assume that

$$p_Z(z) = \frac{(\pi\theta)^z e^{-\pi\theta}}{z!}, \quad z = 0, 1, \ldots, \infty,$$

and that $Y$ and $Z$ are independent random variables. There is interest in the random variable $X = (Y + Z)$, the total number of Lyme disease cases that occur altogether (namely, 1 year before and 1 year after the media campaign). Find an explicit expression for

$$p_X(x) = \text{pr}(X = x) = \text{pr}[(Y + Z) = x], \quad x = 1, 2, \ldots, \infty.$$

(d) Find E(X) and V(X).

Exercise 3.31∗. For patients receiving a double kidney transplant, let $X_i$ be the lifetime (in months) of the $i$th kidney, $i = 1, 2$. Also, assume that the density function of $X_i$ is negative exponential with mean $\alpha^{-1}$, namely,

$$f_{X_i}(x_i) = \alpha e^{-\alpha x_i}, \quad x_i > 0, \ \alpha > 0, \ i = 1, 2,$$

and further assume that $X_1$ and $X_2$ are independent random variables. As soon as one of the two kidneys fails, the lifetime $Y$ (in months) of the remaining functional kidney follows the conditional density function

$$f_Y(y|U = u) = \beta e^{-\beta(y - u)}, \quad 0 < u < y < \infty, \ \beta > 2\alpha,$$

where $U = \min(X_1, X_2)$.

(a) Show that the probability that both organs are still functioning at time $t$ is equal to

$$\pi_2(t) = e^{-2\alpha t}, \quad t \geq 0.$$


(b) Show that the probability that exactly one organ is still functioning at time $t$ is equal to

$$\pi_1(t) = \frac{2\alpha}{(\beta - 2\alpha)}\left(e^{-2\alpha t} - e^{-\beta t}\right), \quad t \geq 0.$$

(c) Using the results in parts (a) and (b), develop an explicit expression for $f_T(t)$, the density function of the length of life $T$ (in months) of the two-kidney system [i.e., $T$ is the length of time (in months) until both kidneys have failed].

(d) Develop an explicit expression for the marginal distribution $f_Y(y)$ of the random variable $Y$. How are the random variables $T$ and $Y$ related? Also, find explicit expressions for the expected value and variance of the length of life of the two-kidney system.

Exercise 3.32∗. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n(>3)$ from a N($\mu, \sigma^2$) parent population. Further, define

$$\bar{X} = n^{-1}\sum_{i=1}^{n} X_i, \quad S^2 = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2, \quad \text{and} \quad T_{(n-1)} = \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

(a) Develop an explicit expression for corr$[\bar{X}, T_{(n-1)}]$. Find the numerical value of this correlation when $n = 4$ and when $n = 6$.

(b) Using the fact that $\Gamma(x) \approx \sqrt{2\pi}\,e^{-x}x^{(x - 1/2)}$ for large $x$, find the limiting value of corr$[\bar{X}, T_{(n-1)}]$ as $n \to \infty$, and then interpret this limit in a meaningful way.

Exercise 3.33∗. Suppose that there are three identical-looking dice. Two of these three dice are perfectly balanced, so that the probability is $\frac{1}{6}$ of obtaining any one of the six numbers 1, 2, 3, 4, 5, and 6. The third die is an unbalanced die. For this unbalanced die, the probability of obtaining a 1 is equal to $\left(\frac{1}{6} - \epsilon\right)$ and the probability of obtaining a 6 is equal to $\left(\frac{1}{6} + \epsilon\right)$, where $\epsilon$, $0 < \epsilon < \frac{1}{6}$, has a known value; for this unbalanced die, the probability is $\frac{1}{6}$ of obtaining any of the remaining numbers 2, 3, 4, and 5.

In a simple attempt to identify which of these dice is the unbalanced one, it is decided that each of the three dice will be tossed $n$ times, and then that die producing the smallest number of ones in $n$ tosses will be identified as the unbalanced die. Develop an expression (which may involve summation signs) that can be used to find the minimum value of $n$ (say, $n^*$) required so that the probability of correctly identifying the unbalanced die will be at least 0.99.

Exercise 3.34∗

(a) If $X_1$ and $X_2$ are i.i.d. random variables, each with the same CDF

$$F_{X_i}(x_i) = \exp(-e^{-x_i}), \quad -\infty < x_i < +\infty, \ i = 1, 2,$$

prove that the random variable $Y = (X_1 - X_2)$ has CDF

$$F_Y(y) = (1 + e^{-y})^{-1}, \quad -\infty < y < +\infty.$$


(b) In extreme value theory, under certain validating conditions, the largest observation $X_{(n)}$ in a random sample $X_1, X_2, \ldots, X_n$ of size $n$ has a CDF which can be approximated for large $n$ by the expression

$$F_{X_{(n)}}(x_{(n)}) = \exp\{-\exp[-n\theta(x_{(n)} - \beta)]\}.$$

The parameters $\theta$ ($\theta > 0$) and $\beta$ ($-\infty < \beta < +\infty$) depend on the structure of the population being sampled. Using this large-sample approximation and the result from part (a), find an explicit expression for a random variable $U = g[X_{1(m)}, X_{2(m)}]$ such that $\text{pr}(\theta \leq U) \doteq (1 - \alpha)$, $0 < \alpha < 1$. Assume that there is a random sample of size $2m$ ($m$ large) available that has been selected from a population of unspecified structure satisfying the validating conditions, and consider the random variable

$$m\theta\left[X_{1(m)} - \beta\right] - m\theta\left[X_{2(m)} - \beta\right],$$

where $X_{1(m)}$ is the largest observation in the first set of $m$ observations and where $X_{2(m)}$ is the largest observation in the second set of $m$ observations.

Exercise 3.35∗. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the density function $f_X(x)$, $-\infty < x < +\infty$.

(a) For $i = 1, 2, \ldots, n$, let $U_i = F_X(X_i)$, where $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$. Find the distribution of the random variable $U_i$. The transformation $U_i = F_X(X_i)$ is called the Probability Integral Transformation.

(b) Let $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ be the $n$ order statistics corresponding to the i.i.d. random variables $U_1, U_2, \ldots, U_n$. For $1 \leq r < s \leq n$, prove that the random variable

$$V_{rs} = [U_{(s)} - U_{(r)}] \sim \text{BETA}(\alpha = s - r, \beta = n - s + r + 1).$$

(c) For $0 < \theta < 1$ and $0 < p < 1$, consider the probability statement

$$\theta = \text{pr}(V_{rs} \geq p) = \text{pr}\{[U_{(s)} - U_{(r)}] \geq p\} = \text{pr}\left[F_X(X_{(s)}) - F_X(X_{(r)}) \geq p\right].$$

The random interval $[X_{(r)}, X_{(s)}]$ is referred to as a 100$\theta$ percent tolerance interval for the density function $f_X(x)$. More specifically, this random interval has probability $\theta$ of containing at least a proportion $p$ of the total area (equal to 1) under $f_X(x)$, regardless of the particular structure of $f_X(x)$. As an example, find the numerical value of $\theta$ when $n = 10$, $r = 1$, $s = 10$, and $p = 0.80$.

Exercise 3.36∗. Clinical studies where several clinics participate, using a standardized protocol, in the evaluation of new drug therapies have become quite common. In what follows, assume that a statistical design is being used for which patients who meet protocol requirements are each randomly assigned to one of $t$ new drug therapies and to one of $c$ clinics, where the $c$ clinics participating in a particular study can be considered to represent a random sample from a conceptually very large population of clinics that might use the new drug therapies.


For $i = 1, 2, \ldots, t$, $j = 1, 2, \ldots, c$, and $k = 1, 2, \ldots, n_{ij}$, consider the linear model

$$Y_{ijk} = \mu_i + \beta_j + \gamma_{ij} + \epsilon_{ijk},$$

where $Y_{ijk}$ is a continuous random variable representing the response to the $i$th drug therapy of the $k$th patient at the $j$th clinic, $\mu_i$ is the fixed average effect of the $i$th drug therapy, $\beta_j$ is a random variable representing the random effect of the $j$th clinic, $\gamma_{ij}$ is a random variable representing the random effect due to the interaction between the $i$th drug therapy and the $j$th clinic, and $\epsilon_{ijk}$ is a random variable representing the random effect of the $k$th patient receiving the $i$th drug therapy at the $j$th clinic. The random variables $\beta_j$, $\gamma_{ij}$, and $\epsilon_{ijk}$ are assumed to be mutually independent random variables for all $i$, $j$, and $k$, each with an expected value equal to 0 and with respective variances equal to $\sigma^2_\beta$, $\sigma^2_\gamma$, and $\sigma^2_\epsilon$.

(a) Develop an explicit expression for $V(Y_{ijk})$, the variance of $Y_{ijk}$.

(b) Develop an explicit expression for the covariance between the responses of two different patients receiving the same drug therapy at the same clinic.

(c) Develop an explicit expression for the covariance between the responses of two different patients receiving different drug therapies at the same clinic.

(d) For $i = 1, 2, \ldots, t$, let $\bar{Y}_{ij} = n_{ij}^{-1}\sum_{k=1}^{n_{ij}} Y_{ijk}$ be the mean of the $n_{ij}$ responses for patients receiving drug therapy $i$ at clinic $j$. Develop explicit expressions for $E(\bar{Y}_{ij})$, for $V(\bar{Y}_{ij})$, and for cov$(\bar{Y}_{ij}, \bar{Y}_{i'j})$ when $i \neq i'$.

(e) Let

$$L = \sum_{i=1}^{t} a_i\bar{Y}_i,$$

where the $\{a_i\}_{i=1}^{t}$ are a set of known constants satisfying the constraint $\sum_{i=1}^{t} a_i = 0$ and where $\bar{Y}_i = c^{-1}\sum_{j=1}^{c}\bar{Y}_{ij}$. Develop explicit general expressions for $E(L)$ and for $V(L)$. For the special case when $a_1 = +1$, $a_2 = -1$, $a_3 = a_4 = \cdots = a_t = 0$, how do the general expressions for $E(L)$ and $V(L)$ simplify? More generally, comment on why $L$ can be considered to be an important random variable when analyzing data from multicenter clinical studies that simultaneously evaluate several drug therapies.

Exercise 3.37∗. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n(>1)$ from a parent population of unspecified structure, where $E(X_i) = \mu$, $V(X_i) = \sigma^2$, and $E[(X_i - \mu)^4] = \mu_4$, $i = 1, 2, \ldots, n$. Define the sample mean and the sample variance, respectively, as

$$\bar{X} = n^{-1}\sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$

(a) Prove that

$$V(S^2) = \frac{1}{n}\left[\mu_4 - \left(\frac{n - 3}{n - 1}\right)\sigma^4\right].$$


(b) How does the general expression in part (a) simplify if the parent population is POI($\lambda$) and if the parent population is N($\mu, \sigma^2$)?

Exercise 3.38∗. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from a parent population of unspecified structure, where $E(X_i) = \mu$, $V(X_i) = \sigma^2$, and $E[(X_i - \mu)^3] = \mu_3$, $i = 1, 2, \ldots, n$. Define the sample mean and the sample variance, respectively, as

$$\bar{X} = n^{-1}\sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$

(a) Show that cov$(\bar{X}, S^2)$ can be written as an explicit function of $n$ and $\mu_3$.

(b) Suppose that $X_1$ and $X_2$ constitute a random sample of size $n = 2$ from the parent population

$$p_X(x) = \left(\frac{1}{4}\right)^{|x|}\left(\frac{1}{2}\right)^{1 - |x|}, \quad x = -1, 0, 1.$$

Show directly that cov$(\bar{X}, S^2) = 0$, but that $\bar{X}$ and $S^2$ are dependent random variables. Comment on this finding relative to the general result developed in part (a).

SOLUTIONS

Solution 3.1

(a) The joint distribution of $X$ and $Y$ is

$$p_{X,Y}(x, y) = \text{pr}(X = x)\,\text{pr}(Y = y|X = x) = \left(\frac{1}{N}\right)\left(\frac{1}{N - 1}\right), \quad x = 1, 2, \ldots, N \text{ and } y = 1, 2, \ldots, N \text{ with } x \neq y.$$

Hence, the marginal distribution of $X$ is

$$p_X(x) = \sum_{\text{all } y,\, y \neq x} p_{X,Y}(x, y) = \frac{(N - 1)}{N(N - 1)} = \frac{1}{N}, \quad x = 1, 2, \ldots, N.$$

Analogously, the marginal distribution of $Y$ is

$$p_Y(y) = \frac{1}{N}, \quad y = 1, 2, \ldots, N.$$


(b)

$$\text{pr}[X \geq (N - 1)|Y = y] = \frac{\text{pr}\{[X \geq (N - 1)] \cap (Y = y)\}}{\text{pr}(Y = y)} = \frac{\text{pr}[(X = N - 1) \cap (Y = y)] + \text{pr}[(X = N) \cap (Y = y)]}{1/N}$$

$$= \begin{cases} \dfrac{2}{N - 1}, & y = 1, 2, \ldots, (N - 2); \\[2mm] \dfrac{1}{N - 1}, & y = (N - 1), N. \end{cases}$$

(c)

$$E(X) = E(Y) = \frac{1}{N}\sum_{i=1}^{N} i = \frac{N(N + 1)/2}{N} = \frac{(N + 1)}{2}.$$

$$V(X) = V(Y) = \frac{1}{N}\sum_{i=1}^{N} i^2 - \left[\frac{(N + 1)}{2}\right]^2 = \frac{N(N + 1)(2N + 1)/6}{N} - \frac{(N + 1)^2}{4} = \frac{(N^2 - 1)}{12}.$$

Now, $p_Y(y|X = x) = 1/(N - 1)$, $y = 1, 2, \ldots, N$ with $y \neq x$. So,

$$E(Y|X = x) = \left[\sum_{y=1}^{N}\frac{y}{(N - 1)}\right] - \frac{x}{(N - 1)} = \frac{N(N + 1)}{2(N - 1)} - \frac{x}{(N - 1)}.$$

So,

$$E(XY) = E_x\{E(XY|X = x)\} = E_x\{xE(Y|X = x)\} = E_x\left\{\frac{N(N + 1)}{2(N - 1)}x - \frac{x^2}{(N - 1)}\right\}$$

$$= \frac{N(N + 1)}{2(N - 1)}E(X) - \frac{E(X^2)}{(N - 1)} = \frac{N(N + 1)^2}{4(N - 1)} - \frac{\left[(N^2 - 1)/12 + (N + 1)^2/4\right]}{(N - 1)} = \frac{(N + 1)(3N + 2)}{12}.$$

So,

$$\text{cov}(X, Y) = \frac{(N + 1)(3N + 2)}{12} - \frac{(N + 1)^2}{4} = \frac{-(N + 1)}{12}.$$


Hence,

$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{V(X)V(Y)}} = \frac{-(N + 1)/12}{\sqrt{[(N^2 - 1)/12][(N^2 - 1)/12]}} = \frac{-1}{(N - 1)}.$$

As $N \to \infty$, corr$(X, Y) \to 0$ as expected, since the population of balls is becoming infinitely large.

Solution 3.2

(a) The most direct approach is to list all the $2^5 = 32$ possible sequences and their associated individual probabilities of occurring. If we let

$$\pi_{ml} = \text{pr}[(M_5 = m) \cap (L_5 = l)], \quad m = 0, 1, \ldots, 5 \text{ and } l = 0, 1, \ldots, 5,$$

it then follows directly that

$$\pi_{00} = (1 - \pi)^5, \quad \pi_{11} = [\pi^3(1 - \pi)^2 + 6\pi^2(1 - \pi)^3 + 5\pi(1 - \pi)^4],$$
$$\pi_{22} = [\pi^4(1 - \pi) + 4\pi^2(1 - \pi)^3], \quad \pi_{33} = 3\pi^3(1 - \pi)^2,$$
$$\pi_{44} = 2\pi^4(1 - \pi), \quad \pi_{55} = \pi^5, \quad \pi_{12} = 6\pi^3(1 - \pi)^2, \quad \pi_{13} = 2\pi^4(1 - \pi),$$

and $\pi_{ml} = 0$ otherwise.

(b) With $p_{L_5}(l) = \text{pr}(L_5 = l) = \pi_l$, $l = 0, 1, \ldots, 5$, then

$$\pi_0 = (1 - \pi)^5, \quad \pi_1 = [\pi^3(1 - \pi)^2 + 6\pi^2(1 - \pi)^3 + 5\pi(1 - \pi)^4],$$
$$\pi_2 = [\pi^4(1 - \pi) + 4\pi^2(1 - \pi)^3 + 6\pi^3(1 - \pi)^2],$$
$$\pi_3 = [3\pi^3(1 - \pi)^2 + 2\pi^4(1 - \pi)], \quad \pi_4 = 2\pi^4(1 - \pi), \quad \text{and} \quad \pi_5 = \pi^5.$$

Thus,

$$E(L_5) = \sum_{l=0}^{5} l\pi_l = 5\pi(1 - \pi)^4 + 14\pi^2(1 - \pi)^3 + 22\pi^3(1 - \pi)^2 + 16\pi^4(1 - \pi) + 5\pi^5.$$

When $\pi = 0.90$, then $E(L_5) = 4.1745$. For further details, see Makri, Philippou, and Psillakis (2007).
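Since the solution rests on enumerating all 32 sequences, a short Python sketch can reproduce $E(L_5)$ directly. It assumes (consistent with the run-statistics reference cited above) that $L_5$ is the length of the longest success run in the five trials; the enumeration itself mirrors the solution's approach.

```python
from itertools import product

def longest_run(seq):
    # length of the longest run of 1's in a 0/1 sequence
    best = cur = 0
    for s in seq:
        cur = cur + 1 if s == 1 else 0
        best = max(best, cur)
    return best

pi = 0.90
EL5 = 0.0
for seq in product([0, 1], repeat=5):    # all 2^5 = 32 sequences
    k = sum(seq)                         # number of successes
    EL5 += longest_run(seq) * pi**k * (1 - pi)**(5 - k)

print(round(EL5, 4))  # 4.1744, agreeing with E(L5) = 4.1745 up to rounding
```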


Solution 3.3

(a) Since $E(U) = (n + 1)/2$, it follows from conditional expectation theory that $E(X) = E_u[E(X|U = u)] = E_u(u) = E(U) = (n + 1)/2$. Completely analogously, $E(Y) = (n + 1)/2$. Also, $V(X|U = u) = (1 - u^{-1})/(u^{-1})^2 = u(u - 1)$, so that

$$V(X) = V_u[E(X|U = u)] + E_u[V(X|U = u)] = V_u(u) + E_u[u(u - 1)] = V(U) + E(U^2) - E(U) = 2V(U) + [E(U)]^2 - E(U) = \frac{5}{12}(n^2 - 1).$$

Completely analogously, $V(Y) = \frac{5}{12}(n^2 - 1)$. Also, $E(XY) = E_u[E(XY|U = u)] = E_u[E(X|U = u)E(Y|U = u)] = E_u(u^2) = V(U) + [E(U)]^2 = (n + 1)(2n + 1)/6$.

Thus, based on the above results,

$$\text{corr}(X, Y) = \frac{E(XY) - E(X)E(Y)}{\sqrt{V(X)V(Y)}} = \frac{1}{5}.$$

(b) Now,

$$\text{pr}(X \neq Y) = 1 - \text{pr}(X = Y) = 1 - \sum_{u=1}^{n}\text{pr}(X = Y|U = u)\,\text{pr}(U = u) = 1 - \frac{1}{n}\sum_{u=1}^{n}\text{pr}(X = Y|U = u).$$

And,

$$\text{pr}(X = Y|U = u) = \sum_{k=1}^{\infty}\text{pr}(X = k|U = u)\,\text{pr}(Y = k|U = u) = \sum_{k=1}^{\infty}u^{-1}(1 - u^{-1})^{k-1}u^{-1}(1 - u^{-1})^{k-1}$$

$$= u^{-2}\sum_{k=1}^{\infty}\left[(1 - u^{-1})^2\right]^{k-1} = u^{-2}\left[\frac{1}{1 - (1 - u^{-1})^2}\right] = \frac{1}{2u - 1}.$$

So,

$$\text{pr}(X \neq Y) = 1 - \frac{1}{n}\sum_{u=1}^{n}\frac{1}{2u - 1}.$$

And, when $n = 4$, $\text{pr}(X \neq Y) = 0.581$.
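A quick numerical check of part (b): the sketch below evaluates the exact sum and cross-checks it by simulation, assuming the setup implied by the solution ($U$ uniform on $\{1, \ldots, n\}$, with $X$ and $Y$ conditionally i.i.d. geometric with success probability $1/u$).

```python
import numpy as np

n = 4
exact = 1 - sum(1 / (2*u - 1) for u in range(1, n + 1)) / n
print(round(exact, 3))                    # 0.581

rng = np.random.default_rng(1)
u = rng.integers(1, n + 1, size=200_000)  # U uniform on {1, ..., n}
x = rng.geometric(1 / u)                  # X | U = u ~ GEOM(1/u)
y = rng.geometric(1 / u)                  # Y | U = u, independent of X
print(round(np.mean(x != y), 3))          # ~0.581
```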


Solution 3.4

(a) First, the conditional density function of $T_\nu$ given $U = u$ is N$(0, \nu/u)$. Since $U \sim \chi^2_\nu = \text{GAMMA}(\alpha = 2, \beta = \frac{\nu}{2})$, we have

$$f_{T_\nu}(t_\nu) = \int_0^\infty f_{T_\nu,U}(t_\nu, u)\,du = \int_0^\infty f_{T_\nu}(t_\nu|U = u)f_U(u)\,du$$

$$= \int_0^\infty\frac{u^{1/2}}{\sqrt{2\pi\nu}}e^{-ut_\nu^2/2\nu}\cdot\frac{u^{\frac{\nu}{2}-1}e^{-u/2}}{\Gamma\left(\frac{\nu}{2}\right)2^{\nu/2}}\,du = \frac{1}{\sqrt{2\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)2^{\nu/2}}\int_0^\infty u^{\left(\frac{\nu+1}{2}\right)-1}e^{-\left(\frac{t_\nu^2}{2\nu} + \frac{1}{2}\right)u}\,du$$

$$= \frac{\Gamma[(\nu + 1)/2]\left(\frac{t_\nu^2}{2\nu} + \frac{1}{2}\right)^{-[(\nu+1)/2]}}{\sqrt{2\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)2^{\nu/2}} = \frac{\Gamma[(\nu + 1)/2]}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t_\nu^2}{\nu}\right)^{-\left(\frac{\nu+1}{2}\right)}, \quad -\infty < t_\nu < \infty.$$

(b) Since $Z$ and $U$ are independent random variables, we have

$$E(T_\nu) = \sqrt{\nu}\,E\left(ZU^{-1/2}\right) = \sqrt{\nu}\,E(Z)E\left(U^{-1/2}\right) = 0 \quad \text{since } E(Z) = 0.$$

And,

$$V(T_\nu) = E(T_\nu^2) = \nu E\left(Z^2U^{-1}\right) = \nu E(Z^2)E(U^{-1}) = \nu(1)E(U^{-1}) = \nu E(U^{-1}).$$

Since $U \sim \text{GAMMA}(\alpha = 2, \beta = \nu/2)$, we know that

$$E(U^r) = \frac{\Gamma(\beta + r)}{\Gamma(\beta)}\alpha^r = \frac{\Gamma\left(\frac{\nu}{2} + r\right)}{\Gamma\left(\frac{\nu}{2}\right)}2^r, \quad \left(\frac{\nu}{2} + r\right) > 0.$$

Finally, with $r = -1$,

$$V(T_\nu) = \nu\frac{\Gamma\left(\frac{\nu}{2} - 1\right)}{\Gamma\left(\frac{\nu}{2}\right)}2^{-1} = \nu\frac{\Gamma\left(\frac{\nu}{2} - 1\right)}{\left(\frac{\nu}{2} - 1\right)\Gamma\left(\frac{\nu}{2} - 1\right)}2^{-1} = \frac{\nu}{(\nu - 2)}, \quad \nu > 2.$$
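The derived density and variance can be checked numerically against a standard library implementation of the $t$ distribution; a minimal sketch:

```python
import numpy as np
from scipy.stats import t
from scipy.special import gammaln

def f_tnu(x, nu):
    # density derived in part (a)
    logc = gammaln((nu + 1) / 2) - gammaln(nu / 2) - 0.5 * np.log(np.pi * nu)
    return np.exp(logc) * (1 + x**2 / nu) ** (-(nu + 1) / 2)

nu = 5
x = np.linspace(-4, 4, 9)
print(np.max(np.abs(f_tnu(x, nu) - t.pdf(x, df=nu))))  # ~0 (machine precision)
print(t.var(df=nu), nu / (nu - 2))                     # both equal 5/3
```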

Solution 3.5

(a) First, using conditional expectation theory, we have

$$E(Y) = E_x[E(Y|X = x)] = E_x[\beta_0 + \beta_1 x] = \beta_0 + \beta_1 E(X).$$

And, since

$$E(XY) = E_x[E(XY|X = x)] = E_x[xE(Y|X = x)] = E_x[x(\beta_0 + \beta_1 x)] = \beta_0 E(X) + \beta_1 E(X^2),$$

we have

$$\text{cov}(X, Y) = \beta_0 E(X) + \beta_1 E(X^2) - E(X)[\beta_0 + \beta_1 E(X)] = \beta_1 V(X).$$

Thus,

$$\text{corr}(X, Y) = \frac{\beta_1 V(X)}{\sqrt{V(X)V(Y)}} = \beta_1\sqrt{\frac{V(X)}{V(Y)}}.$$

Thus, corr$(X, Y) = k\beta_1$, where $k = \sqrt{V(X)/V(Y)} > 0$. In particular, when $\beta_1 = 0$, indicating no linear relationship between $X$ and $Y$ in the sense that $E(Y|X = x) = \beta_0$ does not depend on $x$, then corr$(X, Y) = 0$. When $\beta_1 < 0$, then corr$(X, Y) < 0$. And, when $\beta_1 > 0$, then corr$(X, Y) > 0$. In general, corr$(X, Y)$ is reflecting the strength of the linear, or straight-line, relationship between $X$ and $Y$.

(b) When $E(X|Y = y) = \alpha_0 + \alpha_1 y$, it follows, using arguments identical to those used in part (a), that

$$\text{corr}(X, Y) = \frac{\alpha_1 V(Y)}{\sqrt{V(X)V(Y)}} = \alpha_1\sqrt{\frac{V(Y)}{V(X)}}.$$

Thus, it follows directly that $\alpha_1$ and $\beta_1$ have the same sign and that $[\text{corr}(X, Y)]^2 = \alpha_1\beta_1$. In particular, when both $\alpha_1$ and $\beta_1$ are negative, then corr$(X, Y) = -\sqrt{\alpha_1\beta_1}$; and, when both $\alpha_1$ and $\beta_1$ are positive, then corr$(X, Y) = +\sqrt{\alpha_1\beta_1}$.

(c) If $E(Y|X = x) = \beta_0 + \beta_1 x + \beta_2 x^2$, then

$$E(Y) = E_x[E(Y|X = x)] = E_x[\beta_0 + \beta_1 x + \beta_2 x^2] = \beta_0 + \beta_1 E(X) + \beta_2 E(X^2).$$

And, since

$$E(XY) = E_x[E(XY|X = x)] = E_x[xE(Y|X = x)] = E_x[x(\beta_0 + \beta_1 x + \beta_2 x^2)] = \beta_0 E(X) + \beta_1 E(X^2) + \beta_2 E(X^3),$$

we have

$$\text{cov}(X, Y) = \beta_0 E(X) + \beta_1 E(X^2) + \beta_2 E(X^3) - E(X)[\beta_0 + \beta_1 E(X) + \beta_2 E(X^2)] = \beta_1 V(X) + \beta_2[E(X^3) - E(X)E(X^2)].$$

Finally,

$$\text{corr}(X, Y) = \beta_1\sqrt{\frac{V(X)}{V(Y)}} + \beta_2\left[\frac{E(X^3) - E(X)E(X^2)}{\sqrt{V(X)V(Y)}}\right].$$

Thus, unless $\beta_2 = 0$ or unless $E(X^3) = E(X)E(X^2)$, the direct connection between corr$(X, Y)$ and $\beta_1$ is lost.

Solution 3.6

(a)

$$f_X(x) = 6\theta^{-3}\int_0^x(x - y)\,dy = 6\theta^{-3}\left(x^2 - \frac{x^2}{2}\right) = \frac{3x^2}{\theta^3}, \quad 0 < x < \theta.$$

So,

$$E(X^r) = \int_0^\theta x^r\cdot3\theta^{-3}x^2\,dx = \frac{3\theta^r}{(r + 3)}, \quad r = 1, 2, \ldots;$$

thus,

$$E(X) = \frac{3\theta}{4}, \quad E(X^2) = \frac{3\theta^2}{5}, \quad \text{and} \quad V(X) = \frac{3\theta^2}{5} - \left(\frac{3\theta}{4}\right)^2 = \frac{3\theta^2}{80}.$$

Now,

$$f_Y(y|X = x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{2(x - y)}{x^2}, \quad 0 < y < x.$$

So,

$$E(Y^r|X = x) = \int_0^x y^r\left(\frac{2}{x} - \frac{2y}{x^2}\right)dy = \frac{2x^r}{(r + 1)(r + 2)}, \quad r = 1, 2, \ldots.$$

So,

$$E(Y|X = x) = \frac{x}{3} \quad \text{(which is a linear function of } x\text{)}.$$

Also,

$$E(Y^2|X = x) = \frac{x^2}{6}, \quad \text{so that} \quad V(Y|X = x) = \frac{x^2}{6} - \left(\frac{x}{3}\right)^2 = \frac{x^2}{18}.$$

Since corr$(X, Y) = \left(\frac{1}{3}\right)\sqrt{V(X)/V(Y)}$, we need $V(Y)$. Now,

$$V(Y) = V_x[E(Y|X = x)] + E_x[V(Y|X = x)] = V\left(\frac{X}{3}\right) + E\left(\frac{X^2}{18}\right) = \frac{3\theta^2}{80} = V(X), \quad \text{so that} \quad \text{corr}(X, Y) = \frac{1}{3}.$$


Equivalently,

$$E(Y) = E_x[E(Y|X = x)] = E\left(\frac{X}{3}\right) = \frac{\theta}{4}$$

and

$$E(XY) = E_x[xE(Y|X = x)] = E\left(\frac{X^2}{3}\right) = \frac{\theta^2}{5},$$

so that

$$\text{cov}(X, Y) = E(XY) - E(X)E(Y) = \frac{\theta^2}{5} - \left(\frac{3\theta}{4}\right)\left(\frac{\theta}{4}\right) = \frac{\theta^2}{80}.$$

Hence,

$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{V(X)V(Y)}} = \frac{\theta^2/80}{\sqrt{(3\theta^2/80)(3\theta^2/80)}} = \frac{1}{3}$$

as before.

(b)

$$\text{pr}\left[(X + Y) < \theta\,\middle|\,(X + 2Y) > \frac{\theta}{4}\right] = \frac{\text{pr}\left\{[(X + Y) < \theta] \cap \left[(X + 2Y) > \frac{\theta}{4}\right]\right\}}{\text{pr}\left[(X + 2Y) > \frac{\theta}{4}\right]}.$$

So,

$$\text{pr}\left\{[(X + Y) < \theta] \cap \left[(X + 2Y) > \frac{\theta}{4}\right]\right\}$$

$$= \int_{\theta/12}^{\theta/4}\int_{\left(\frac{\theta - 4x}{8}\right)}^{x} f_{X,Y}(x, y)\,dy\,dx + \int_{\theta/4}^{\theta/2}\int_0^x f_{X,Y}(x, y)\,dy\,dx + \int_{\theta/2}^{\theta}\int_0^{(\theta - x)} f_{X,Y}(x, y)\,dy\,dx$$

$$= \int_0^{\theta/12}\int_{\left(\frac{\theta - 8y}{4}\right)}^{(\theta - y)} f_{X,Y}(x, y)\,dx\,dy + \int_{\theta/12}^{\theta/2}\int_y^{(\theta - y)} f_{X,Y}(x, y)\,dx\,dy.$$

And,

$$\text{pr}\left[(X + 2Y) > \frac{\theta}{4}\right] = \int_{\theta/12}^{\theta/4}\int_{\left(\frac{\theta - 4x}{8}\right)}^{x} f_{X,Y}(x, y)\,dy\,dx + \int_{\theta/4}^{\theta}\int_0^x f_{X,Y}(x, y)\,dy\,dx$$

$$= \int_0^{\theta/12}\int_{\left(\frac{\theta - 8y}{4}\right)}^{\theta} f_{X,Y}(x, y)\,dx\,dy + \int_{\theta/12}^{\theta}\int_y^{\theta} f_{X,Y}(x, y)\,dx\,dy,$$

where $f_{X,Y}(x, y) = 6\theta^{-3}(x - y)$, $0 < y < x < \theta$.


(c) From part (a), we know that $E(X_i) = \frac{3\theta}{4}$, $V(X_i) = \frac{3\theta^2}{80}$, $E(Y_i) = \frac{\theta}{4}$, $V(Y_i) = \frac{3\theta^2}{80}$, and cov$(X_i, Y_i) = \frac{\theta^2}{80}$. So,

$$E(L) = E(3\bar{X} - 2\bar{Y}) = 3E(\bar{X}) - 2E(\bar{Y}) = \frac{7\theta}{4}, \quad \text{since } E(\bar{X}) = \frac{1}{n}\sum_{i=1}^n E(X_i) \text{ and } E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^n E(Y_i).$$

And,

$$V(L) = V(3\bar{X} - 2\bar{Y}) = V\left[(3)\frac{1}{n}\sum_{i=1}^n X_i - (2)\frac{1}{n}\sum_{i=1}^n Y_i\right] = \frac{1}{n^2}\sum_{i=1}^n V(3X_i - 2Y_i),$$

since the pairs are mutually independent. Now,

$$V(3X_i - 2Y_i) = 9V(X_i) + 4V(Y_i) + 2(3)(-2)\text{cov}(X_i, Y_i) = 9\left(\frac{3\theta^2}{80}\right) + 4\left(\frac{3\theta^2}{80}\right) - 12\left(\frac{\theta^2}{80}\right) = \frac{27\theta^2}{80}.$$

Thus,

$$V(L) = \frac{1}{n^2}\sum_{i=1}^n\left(\frac{27\theta^2}{80}\right) = \frac{27\theta^2}{80n}.$$

Solution 3.7

(a) Since

$$f_X(x) = \frac{\Gamma(\alpha + \beta + 3)}{\Gamma(\alpha + 1)\Gamma(\beta + 1)}(1 - x)^\alpha\int_0^x y^\beta\,dy = \frac{\Gamma(\alpha + \beta + 3)}{\Gamma(\alpha + 1)\Gamma(\beta + 2)}x^{\beta+1}(1 - x)^\alpha, \quad 0 < x < 1,$$

it follows that

$$f_Y(y|X = x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = (\beta + 1)y^\beta x^{-\beta-1}, \quad 0 < y < x < 1.$$

Also,

$$f_Y(y) = \frac{\Gamma(\alpha + \beta + 3)}{\Gamma(\alpha + 1)\Gamma(\beta + 1)}y^\beta\int_y^1(1 - x)^\alpha\,dx = \frac{\Gamma(\alpha + \beta + 3)}{\Gamma(\alpha + 2)\Gamma(\beta + 1)}y^\beta(1 - y)^{\alpha+1}, \quad 0 < y < 1.$$

(b) It is clear that $f_X(x)$ and $f_Y(y)$ are beta distributions, with variances

$$V(X) = \frac{(\beta + 2)(\alpha + 1)}{(\alpha + \beta + 3)^2(\alpha + \beta + 4)} \quad \text{and} \quad V(Y) = \frac{(\beta + 1)(\alpha + 2)}{(\alpha + \beta + 3)^2(\alpha + \beta + 4)}.$$

And,

$$E(Y|X = x) = \int_0^x yf_Y(y|X = x)\,dy = \frac{(\beta + 1)}{x^{\beta+1}}\int_0^x y^{\beta+1}\,dy = \left(\frac{\beta + 1}{\beta + 2}\right)x.$$

Thus, appealing to the mathematical relationship between the correlation coefficient and the slope for a simple linear (i.e., straight-line) regression model, we have

$$\rho_{X,Y} = \left(\frac{\beta + 1}{\beta + 2}\right)\sqrt{\frac{V(X)}{V(Y)}} = \left[\frac{(\alpha + 1)(\beta + 1)}{(\alpha + 2)(\beta + 2)}\right]^{1/2}.$$

When $\alpha = 2$ and $\beta = 3$, $\rho_{X,Y} = 0.7746$. Alternatively, $\rho_{X,Y}$ can be computed using the formula

$$\rho_{X,Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{V(X)V(Y)}},$$

where, for example,

$$E(XY) = E_x[E(XY|X = x)] = E_x[xE(Y|X = x)] = \left(\frac{\beta + 1}{\beta + 2}\right)E(X^2),$$

and

$$E(X^2) = V(X) + [E(X)]^2 = \frac{(\beta + 2)(\alpha + 1)}{(\alpha + \beta + 3)^2(\alpha + \beta + 4)} + \left[\frac{\beta + 2}{\alpha + \beta + 3}\right]^2.$$

(c) Since

$$E(X_i) = E(X) = \frac{(\beta + 2)}{(\alpha + \beta + 3)} = \frac{5}{8} \quad \text{and} \quad E(Y_i) = E(Y) = \frac{(\beta + 1)}{(\alpha + \beta + 3)} = \frac{1}{2},$$

it follows that $E(L) = 3\left(\frac{5}{8}\right) - 5\left(\frac{1}{2}\right) = -\frac{5}{8}$. And,

$$V(L) = V(3\bar{X} - 5\bar{Y}) = V\left[\frac{3}{n}\sum_{i=1}^n X_i - \frac{5}{n}\sum_{i=1}^n Y_i\right] = \frac{1}{n^2}V\left[\sum_{i=1}^n(3X_i - 5Y_i)\right]$$

$$= \frac{1}{n}V(3X_i - 5Y_i) = \frac{1}{n}\left[9V(X_i) + 25V(Y_i) - 2(3)(5)\rho_{X,Y}\sqrt{V(X_i)V(Y_i)}\right].$$

When $n = 10$, $\alpha = 2$, and $\beta = 3$, we then find that

$$V(L) = \frac{1}{10}\left[9\left(\frac{5}{192}\right) + 25\left(\frac{1}{36}\right) - 30(0.7746)\sqrt{\left(\frac{5}{192}\right)\left(\frac{1}{36}\right)}\right] = 0.0304.$$

Solution 3.8

(a) Now, (the event labels below are written so that $W = 0$ corresponds to $Y_1$ being the minimum, which makes them consistent with the joint densities that follow)

$$\text{pr}[(U \leq u) \cap (W = 0)] = \text{pr}[(Y_1 \leq u) \cap (Y_1 < Y_2)] = \int_0^u\int_{y_1}^\infty\left(\theta_1 e^{-\theta_1 y_1}\right)\left(\theta_2 e^{-\theta_2 y_2}\right)dy_2\,dy_1$$

$$= \frac{\theta_1}{(\theta_1 + \theta_2)}\left[1 - e^{-(\theta_1 + \theta_2)u}\right], \quad 0 < u < \infty, \ w = 0.$$

So,

$$f_{U,W}(u, 0) = \theta_1 e^{-(\theta_1 + \theta_2)u}, \quad 0 < u < \infty, \ w = 0.$$

And,

$$\text{pr}[(U \leq u) \cap (W = 1)] = \text{pr}[(Y_2 \leq u) \cap (Y_2 < Y_1)] = \int_0^u\int_{y_2}^\infty\left(\theta_1 e^{-\theta_1 y_1}\right)\left(\theta_2 e^{-\theta_2 y_2}\right)dy_1\,dy_2$$

$$= \frac{\theta_2}{(\theta_1 + \theta_2)}\left[1 - e^{-(\theta_1 + \theta_2)u}\right], \quad 0 < u < \infty, \ w = 1.$$

So,

$$f_{U,W}(u, 1) = \theta_2 e^{-(\theta_1 + \theta_2)u}, \quad 0 < u < \infty, \ w = 1.$$

So, we can compactly combine the above two results notationally as follows:

$$f_{U,W}(u, w) = \theta_1^{(1-w)}\theta_2^w e^{-(\theta_1 + \theta_2)u}, \quad 0 < u < \infty, \ w = 0, 1.$$

(b) We have

$$p_W(w) = \int_0^\infty f_{U,W}(u, w)\,du = \theta_1^{(1-w)}\theta_2^w\int_0^\infty e^{-(\theta_1 + \theta_2)u}\,du = \theta_1^{(1-w)}\theta_2^w(\theta_1 + \theta_2)^{-1} = \left(\frac{\theta_1}{\theta_1 + \theta_2}\right)^{(1-w)}\left(\frac{\theta_2}{\theta_1 + \theta_2}\right)^w, \quad w = 0, 1.$$

(c) We have

$$f_U(u) = \sum_{w=0}^1 f_{U,W}(u, w) = e^{-(\theta_1 + \theta_2)u}\sum_{w=0}^1\theta_1^{(1-w)}\theta_2^w = (\theta_1 + \theta_2)e^{-(\theta_1 + \theta_2)u}, \quad 0 < u < \infty.$$

(d) Since $f_{U,W}(u, w) = f_U(u)p_W(w)$, $0 < u < \infty$, $w = 0, 1$, it follows that $U$ and $W$ are independent random variables.

Solution 3.9

(a) First, since $Y_1$ and $Y_2$ have a joint bivariate normal distribution, it follows that the random variable $(Y_1 - Y_2)$ is normally distributed. Also, $E(Y_1 - Y_2) = E(Y_1) - E(Y_2) = 110 - 100 = 10$. And,

$$V(Y_1 - Y_2) = V(Y_1) + V(Y_2) - 2\rho\sqrt{V(Y_1)V(Y_2)} = 225 + 225 - 2(0.80)(15)(15) = 90.$$

Thus, we have $(Y_1 - Y_2) \sim N(10, 90)$. Hence,

$$\text{pr}(Y_1 - Y_2 > 15) = \text{pr}\left[\frac{(Y_1 - Y_2) - 10}{\sqrt{90}} > \frac{15 - 10}{\sqrt{90}}\right] = \text{pr}(Z > 0.527),$$

where $Z \sim N(0, 1)$, so that $\text{pr}(Y_1 - Y_2 > 15) \approx 0.30$.

So, using the BIN($n = 3, \pi = 0.30$) distribution, the probability that the older child has an IQ at least 15 points higher than the younger child for at least two of three randomly chosen families is equal to

$$C_2^3(0.30)^2(0.70)^1 + C_3^3(0.30)^3(0.70)^0 = 0.216.$$

(b) From general properties of the bivariate normal distribution, we have

$$E(Y_2|Y_1 = y_1) = E(Y_2) + \rho\sqrt{\frac{V(Y_2)}{V(Y_1)}}\left[y_1 - E(Y_1)\right] \quad \text{and} \quad V(Y_2|Y_1 = y_1) = V(Y_2)(1 - \rho^2).$$

Also, $Y_2$ given $Y_1 = y_1$ is normally distributed. In our particular situation,

$$E(Y_2|Y_1 = 120) = 100 + (0.80)\sqrt{\frac{225}{225}}(120 - 110) = 108, \quad \text{and} \quad V(Y_2|Y_1 = 120) = 225[1 - (0.80)^2] = 81.$$

So,

$$\text{pr}(Y_2 > 120|Y_1 = 120) = \text{pr}\left[\frac{Y_2 - 108}{\sqrt{81}} > \frac{120 - 108}{\sqrt{81}}\right] = \text{pr}(Z > 1.333),$$

where $Z \sim N(0, 1)$, so that $\text{pr}(Y_2 > 120|Y_1 = 120) \approx 0.09$.
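The three numerical answers are easy to confirm with standard normal and binomial routines; a quick check:

```python
from scipy.stats import norm, binom

# part (a): Y1 - Y2 ~ N(10, 90)
print(round(norm.sf(15, loc=10, scale=90**0.5), 3))   # ~0.30

# at least two of three families: BIN(3, 0.30)
print(round(binom.sf(1, 3, 0.30), 3))                 # 0.216

# part (b): Y2 | Y1 = 120 ~ N(108, 81)
print(round(norm.sf(120, loc=108, scale=9), 3))       # ~0.09
```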


Solution 3.10

(a) Now,

$$F_W(w) = \text{pr}(W \leq w) = \text{pr}[(U - V) \leq w] = E_v[\text{pr}(U - v \leq w|V = v)] = E_v[\text{pr}(U \leq w + v|V = v)] = \int_{-\infty}^\infty F_U(w + v|V = v)f_V(v)\,dv,$$

where $f_V(v)$ is the density for $V$. Thus, we obtain

$$F_W(w) = \int_{-\infty}^\infty e^{-e^{-(w+v)}}e^{-v}e^{-e^{-v}}\,dv = \int_{-\infty}^\infty e^{-(e^{-w}e^{-v})}e^{-v}e^{-e^{-v}}\,dv = \int_{-\infty}^\infty e^{-v}e^{-[(1+e^{-w})e^{-v}]}\,dv.$$

Letting $z = 1 + e^{-w}$, we obtain

$$F_W(w) = \int_{-\infty}^\infty e^{-v}e^{-ze^{-v}}\,dv = \frac{1}{z}\int_{-\infty}^\infty ze^{-v}e^{-ze^{-v}}\,dv = z^{-1}.$$

Thus, $F_W(w) = 1/(1 + e^{-w})$, $-\infty < w < \infty$, and hence $W$ has a logistic distribution.

(b) Now, $\text{pr}(Y = 1) = \text{pr}(U > V) = \text{pr}(\alpha + E_1 > E_2) = \text{pr}(E_1 - E_2 > -\alpha) = 1 - F_W(-\alpha) = F_W(\alpha) = 1/(1 + e^{-\alpha})$. This expression is exactly $\text{pr}(Y = 1)$ for an ordinary logistic regression model with a single intercept term $\alpha$. Thus, logistic regression, and, more generally, multinomial logistic regression, can be motivated via a random utility framework, where the utilities involve i.i.d. standard Gumbel error terms. Likewise, probit regression can be motivated by assuming i.i.d. standard normal error terms for the utilities.
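A Monte Carlo sketch of part (a): simulate two independent standard Gumbel samples via the inverse CDF and compare the empirical CDF of their difference with the logistic CDF.

```python
import numpy as np
from scipy.stats import logistic

rng = np.random.default_rng(0)
u1, u2 = rng.uniform(size=(2, 200_000))
# standard Gumbel via inverse CDF: F(x) = exp(-exp(-x))  =>  x = -ln(-ln(u))
w = -np.log(-np.log(u1)) + np.log(-np.log(u2))   # W = U - V

for q in (-2.0, 0.0, 1.5):
    print(round(np.mean(w <= q), 3), round(logistic.cdf(q), 3))
# the empirical CDF of W matches 1/(1 + e^{-w}) at each point
```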

Solution 3.11

(a) Since $X_1, X_2, \ldots, X_n$ constitute a set of i.i.d. negative exponential random variables with $E(X_i) = \lambda^{-1}$, $i = 1, 2, \ldots, n$, it follows directly that $S \sim \text{GAMMA}(\alpha = \lambda^{-1}, \beta = n)$. Hence, with $s = \sum_{i=1}^n x_i$, we have

$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n|S = s) = \frac{f_{X_1,\ldots,X_n,S}(x_1, \ldots, x_n, s)}{f_S(s)} = \frac{f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)}{f_S(s)}$$

$$= \frac{\prod_{i=1}^n\lambda e^{-\lambda x_i}}{\lambda^ns^{n-1}e^{-\lambda s}/(n - 1)!} = \frac{(n - 1)!}{s^{n-1}}, \quad x_i > 0, \ i = 1, 2, \ldots, n, \ \text{and} \ \sum_{i=1}^n x_i = s.$$

(b) The inverse functions for this transformation are

$$X_1 = SY_1, \ X_2 = S(Y_2 - Y_1), \ \ldots, \ X_{n-1} = S(Y_{n-1} - Y_{n-2});$$

hence, it follows that the Jacobian is equal to $S^{n-1}$, since it is the determinant of the $(n - 1) \times (n - 1)$ matrix with $(i, j)$th element equal to $\partial X_i/\partial Y_j$, $i = 1, 2, \ldots, (n - 1)$ and $j = 1, 2, \ldots, (n - 1)$. Thus, using the result from part (a), we have

$$f_{Y_1,\ldots,Y_{n-1}}(y_1, \ldots, y_{n-1}|S = s) = \frac{(n - 1)!}{s^{n-1}}\left|s^{n-1}\right| = (n - 1)!, \quad 0 < y_1 < y_2 < \cdots < y_{n-1} < 1.$$

(c) When $n = 3$,

$$f_{Y_1}(y_1) = \int_{y_1}^1(2!)\,dy_2 = 2(1 - y_1), \quad 0 < y_1 < 1.$$

When $n = 4$,

$$f_{Y_1}(y_1) = \int_{y_1}^1\int_{y_2}^1(3!)\,dy_3\,dy_2 = 3(1 - y_1)^2, \quad 0 < y_1 < 1.$$

In general,

$$f_{Y_1}(y_1) = (n - 1)(1 - y_1)^{n-2}, \quad 0 < y_1 < 1.$$

Solution 3.12. From moment generating function theory, $E(Y_i^r) = E\left(e^{rX_i}\right) = e^{r\mu + r^2\sigma^2/2}$, $-\infty < r < \infty$, since $X_i \sim N(\mu, \sigma^2)$. So, $E(Y_i) = e^{\mu + \sigma^2/2}$ and $E(Y_i^2) = e^{2\mu + 2\sigma^2}$, $i = 1, \ldots, n$; also, $Y_1, Y_2, \ldots, Y_n$ are mutually independent random variables.

So, $E(\bar{Y}_a) = e^{\mu + \sigma^2/2}$ and, by mutual independence,

$$V(\bar{Y}_a) = \frac{1}{n^2}\sum_{i=1}^n V(Y_i) = \frac{e^{2\mu + 2\sigma^2} - \left(e^{\mu + \frac{\sigma^2}{2}}\right)^2}{n} = \frac{e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)}{n}.$$

Also, by mutual independence,

$$E(\bar{Y}_g) = \prod_{i=1}^n E\left(Y_i^{1/n}\right) = e^{\mu + \frac{\sigma^2}{2n}} \quad \text{and} \quad E\left(\bar{Y}_g^2\right) = \prod_{i=1}^n E\left(Y_i^{2/n}\right) = e^{2\mu + 2\sigma^2/n},$$

so that

$$V(\bar{Y}_g) = e^{2\mu + 2\sigma^2/n} - \left(e^{\mu + \sigma^2/2n}\right)^2 = e^{2\mu + \sigma^2/n}\left(e^{\sigma^2/n} - 1\right).$$

Finally,

$$E(\bar{Y}_a\bar{Y}_g) = E\left[\frac{1}{n}\sum_{i=1}^n Y_i\bar{Y}_g\right] = \frac{1}{n}\sum_{i=1}^n E(Y_i\bar{Y}_g).$$

Now,

$$E(Y_i\bar{Y}_g) = E\left[Y_i\left(\prod_{i=1}^n Y_i\right)^{1/n}\right] = E\left[Y_i^{(1 + \frac{1}{n})}\prod_{\text{all } j \neq i} Y_j^{1/n}\right] = E\left(Y_i^{1 + \frac{1}{n}}\right)\prod_{\text{all } j \neq i} E\left(Y_j^{1/n}\right)$$

$$= e^{\left(\frac{n+1}{n}\right)\mu + \frac{(n+1)^2\sigma^2}{2n^2}}\cdot\left[e^{\frac{\mu}{n} + \frac{\sigma^2}{2n^2}}\right]^{(n-1)} = e^{2\mu + \frac{(n+3)\sigma^2}{2n}}.$$

So,

$$\text{corr}(\bar{Y}_a, \bar{Y}_g) = \frac{E(\bar{Y}_a\bar{Y}_g) - E(\bar{Y}_a)E(\bar{Y}_g)}{\sqrt{V(\bar{Y}_a)V(\bar{Y}_g)}} = \frac{e^{2\mu + \frac{(n+3)\sigma^2}{2n}} - \left(e^{\mu + \frac{\sigma^2}{2}}\right)\left(e^{\mu + \frac{\sigma^2}{2n}}\right)}{\sqrt{\left[\frac{e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)}{n}\right]\left[e^{2\mu + \frac{\sigma^2}{n}}\left(e^{\frac{\sigma^2}{n}} - 1\right)\right]}}$$

$$= \frac{e^{2\mu + \left(\frac{n+1}{2n}\right)\sigma^2}\left(ne^{\frac{\sigma^2}{n}} - n\right)^{1/2}}{\sqrt{e^{4\mu + \left(\frac{n+1}{n}\right)\sigma^2}\left(e^{\sigma^2} - 1\right)}} = \frac{\left(ne^{\frac{\sigma^2}{n}} - n\right)^{1/2}}{\sqrt{e^{\sigma^2} - 1}},$$

which does not depend on $\mu$.


Since

$$ne^{\sigma^2/n} - n = n\sum_{j=0}^\infty\frac{\left(\sigma^2/n\right)^j}{j!} - n = n + \sigma^2 + n\sum_{j=2}^\infty\frac{\left(\sigma^2/n\right)^j}{j!} - n = \sigma^2 + \sum_{j=2}^\infty\frac{\sigma^{2j}n^{1-j}}{j!},$$

it follows that

$$\lim_{n\to\infty}\left[\text{corr}(\bar{Y}_a, \bar{Y}_g)\right] = \frac{\sigma}{\sqrt{e^{\sigma^2} - 1}},$$

which monotonically goes to 0 as $\sigma^2 \to \infty$. Hence, the larger is $\sigma^2$, the smaller is the correlation.

Solution 3.13

(a) For the $i$th group, $i = 1, 2, \ldots, G$, if $Y_i$ denotes the number of blood tests required for Plan #2, then

$$E(Y_i) = (1)(1 - \pi)^n + (n + 1)[1 - (1 - \pi)^n] = (n + 1) - n(1 - \pi)^n.$$

Then, since $T_2 = \sum_{i=1}^G Y_i$, it follows that

$$E(T_2) = \sum_{i=1}^G E(Y_i) = G[(n + 1) - n(1 - \pi)^n] = N + G[1 - n(1 - \pi)^n].$$

(b) For $N - E(T_2) = G[n(1 - \pi)^n - 1] > 0$, we require $n(1 - \pi)^n > 1$, or equivalently,

$$\frac{\ln(n)}{n} > \ln\left(\frac{1}{1 - \pi}\right).$$

Now, it is clear that we want to pick the value of $n$, say $n^*$, that maximizes the quantity $\ln(n)/n$, thus providing the desired largest value of $\pi$, say $\pi^*$, for which $E(T_2) < N$. It is straightforward to show that $n^* = 3$, which then gives $\pi^* = 0.3066$. So, if we use groups of size three, then the expected number of blood tests required under Plan #2 will be smaller than the number $N$ of blood tests required under Plan #1 for all values of $\pi$ less than 0.3066.
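A small numerical check of part (b): scan candidate group sizes for the maximizer of $\ln(n)/n$, recover $\pi^*$, and confirm that $E(T_2) < N$ for a prevalence just below $\pi^*$. The group count $G$ is an arbitrary illustrative value.

```python
import math

best_n = max(range(2, 21), key=lambda n: math.log(n) / n)
pi_star = 1 - math.exp(-math.log(best_n) / best_n)
print(best_n, round(pi_star, 4))   # 3 0.3066

G, n, pi = 100, 3, 0.30            # G groups of size n = 3, pi < pi*
ET2 = G * ((n + 1) - n * (1 - pi)**n)
print(round(ET2, 1), ET2 < G * n)  # 297.1 < N = 300 -> True
```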

Solution 3.14. Since the conditional distribution of $Y_p$, given $Y = y$, is BIN($y, \pi$), we have (writing $\bar{Y}_p = Y - Y_p$ for the complementary count)

$$M_{Y_p,\bar{Y}_p}(s, t|Y = y) = E\left[e^{sY_p + t\bar{Y}_p}|Y = y\right] = E\left[e^{sY_p + t(Y - Y_p)}|Y = y\right]$$

$$= e^{ty}E\left[e^{(s-t)Y_p}|Y = y\right] = e^{ty}\left[\pi e^{(s-t)} + (1 - \pi)\right]^y = \left[\pi e^s + (1 - \pi)e^t\right]^y.$$

Hence, letting $\theta = \left[\pi e^s + (1 - \pi)e^t\right]$ and recalling that $Y \sim \text{POI}(L\lambda)$, we have

$$M_{Y_p,\bar{Y}_p}(s, t) = E_y\left[M_{Y_p,\bar{Y}_p}(s, t|Y = y)\right] = E(\theta^Y) = \sum_{y=0}^\infty\theta^y\frac{(L\lambda)^ye^{-L\lambda}}{y!} = e^{-L\lambda}\sum_{y=0}^\infty\frac{(L\lambda\theta)^y}{y!}$$

$$= e^{L\lambda(\theta - 1)} = e^{L\lambda[\pi e^s + (1-\pi)e^t - 1]} = e^{L\lambda\pi(e^s - 1)}e^{L\lambda(1-\pi)(e^t - 1)} = M_{Y_p}(s)M_{\bar{Y}_p}(t).$$

Hence, we have shown that $Y_p \sim \text{POI}(L\lambda\pi)$, that $\bar{Y}_p \sim \text{POI}[L\lambda(1 - \pi)]$, and that $Y_p$ and $\bar{Y}_p$ are independent random variables. Finally, reasonable estimates of $E(Y_p)$ and $E(\bar{Y}_p)$ are $L\lambda\pi$ and $L\lambda(1 - \pi)$, respectively.
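This Poisson-thinning result is easy to see by simulation; a minimal sketch with illustrative values $L\lambda = 10$ and $\pi = 0.3$:

```python
import numpy as np

rng = np.random.default_rng(2)
L_lam, pi = 10.0, 0.3
y = rng.poisson(L_lam, size=200_000)   # Y ~ POI(L*lambda)
yp = rng.binomial(y, pi)               # Y_p | Y = y ~ BIN(y, pi)
ybar = y - yp                          # complementary count

print(round(yp.mean(), 2), round(yp.var(), 2))      # both ~3.0: POI(3)
print(round(ybar.mean(), 2), round(ybar.var(), 2))  # both ~7.0: POI(7)
print(round(np.corrcoef(yp, ybar)[0, 1], 3))        # ~0, consistent with independence
```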

Solution 3.15

(a) In general, if $Y = \ln(X) \sim N(\mu, \sigma^2)$, then

$$E(X) = e^{\mu + \sigma^2/2} \quad \text{and} \quad V(X) = [E(X)]^2(e^{\sigma^2} - 1),$$

so that

$$\sigma^2 = \ln\left[1 + \frac{V(X)}{[E(X)]^2}\right] \quad \text{and} \quad \mu = \ln[E(X)] - \frac{\sigma^2}{2}.$$

So, since $E(X|D = 1) = 2.00$ and $V(X|D = 1) = 2.60$, it follows that $E(Y|D = 1) = 0.443$ and $V(Y|D = 1) = 0.501$. Also, since $E(X|D = 0) = 1.50$ and $V(X|D = 0) = 3.00$, we have $E(Y|D = 0) = -0.018$ and $V(Y|D = 0) = 0.847$. Thus,

$$\text{pr}(D = 1|1.60 < X < 1.80) = \frac{\text{pr}[(D = 1) \cap (1.60 < X < 1.80)]}{\text{pr}(1.60 < X < 1.80)}$$

$$= \frac{\text{pr}(1.60 < X < 1.80|D = 1)\text{pr}(D = 1)}{\text{pr}(1.60 < X < 1.80|D = 1)\text{pr}(D = 1) + \text{pr}(1.60 < X < 1.80|D = 0)\text{pr}(D = 0)}.$$

Now,

$$\text{pr}(1.60 < X < 1.80|D = 1) = \text{pr}\left(\frac{0.470 - 0.443}{0.708} < Z < \frac{0.588 - 0.443}{0.708}\right) = \text{pr}(0.038 < Z < 0.205) = 0.070,$$

where $Z \sim N(0, 1)$. And,

$$\text{pr}(1.60 < X < 1.80|D = 0) = \text{pr}\left(\frac{0.470 - (-0.018)}{0.920} < Z < \frac{0.588 - (-0.018)}{0.920}\right) = \text{pr}(0.530 < Z < 0.659) = 0.043.$$

So,

$$\text{pr}(D = 1|1.60 < X < 1.80) = \frac{0.070(0.60)}{0.070(0.60) + 0.043(0.40)} = \frac{0.042}{0.042 + 0.017} = \frac{0.042}{0.059} = 0.712.$$

(b) Clearly, $X$ is a mixture of lognormal densities, namely,

$$f_X(x) = 0.60\left[\frac{1}{\sqrt{2\pi}(0.708)x}e^{\frac{-[\ln(x) - 0.443]^2}{2(0.501)}}\right] + 0.40\left[\frac{1}{\sqrt{2\pi}(0.920)x}e^{\frac{-[\ln(x) - (-0.018)]^2}{2(0.847)}}\right], \quad 0 < x < +\infty.$$

So,

$$E(X) = E(X|D = 1)\text{pr}(D = 1) + E(X|D = 0)\text{pr}(D = 0) = (2.00)(0.60) + (1.50)(0.40) = 1.20 + 0.60 = 1.80.$$

And,

$$E(X^2) = E(X^2|D = 1)\text{pr}(D = 1) + E(X^2|D = 0)\text{pr}(D = 0) = \left[(2.60) + (2.00)^2\right](0.60) + \left[(3.00) + (1.50)^2\right](0.40) = (6.60)(0.60) + (5.25)(0.40) = 3.96 + 2.10 = 6.06.$$

So,

$$V(X) = 6.06 - (1.80)^2 = 6.06 - 3.24 = 2.82.$$

(c)

$$\theta = \text{pr(misclassification)} = \text{pr}[(X \leq c) \cap (D = 1)] + \text{pr}[(X > c) \cap (D = 0)]$$

$$= \text{pr}(X \leq c|D = 1)\text{pr}(D = 1) + \text{pr}(X > c|D = 0)\text{pr}(D = 0)$$

$$= \text{pr}\left(Z \leq \frac{\ln(c) - 0.443}{0.708}\right)(0.60) + \text{pr}\left(Z > \frac{\ln(c) + 0.018}{0.920}\right)(0.40)$$

$$= (0.60)F_Z\left(\frac{\ln(c) - 0.443}{0.708}\right) + 0.40\left[1 - F_Z\left(\frac{\ln(c) + 0.018}{0.920}\right)\right],$$

where $F_Z(z) = \text{pr}(Z \leq z)$ when $Z \sim N(0, 1)$.

So, with $k = \ln(c)$, we have

$$\frac{d\theta}{dk} = \left(\frac{0.60}{0.708}\right)\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{k - 0.443}{0.708}\right)^2} - \left(\frac{0.40}{0.920}\right)\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{k + 0.018}{0.920}\right)^2} = 0$$

$$\Rightarrow \ln(0.848) - \frac{(k^2 - 0.886k + 0.196)}{1.003} - \ln(0.435) + \frac{(k^2 + 0.036k + 0.0003)}{1.693} = 0$$

$$\Rightarrow \left(\frac{1}{1.693} - \frac{1}{1.003}\right)k^2 + \left(\frac{0.886}{1.003} + \frac{0.036}{1.693}\right)k + \left[\frac{0.0003}{1.693} - 0.165 - \frac{0.196}{1.003} + 0.832\right] = 0$$

$$\Rightarrow 0.406k^2 - 0.905k - 0.472 = 0.$$

The two roots of this quadratic equation are

$$\frac{0.905 \pm \sqrt{(-0.905)^2 - 4(0.406)(-0.472)}}{2(0.406)} = \frac{0.905 \pm 1.259}{0.812},$$

or $-0.436$ and $2.665$. The value $c^* = e^{-0.436} = 0.647$ minimizes $\theta$.

Note that

$$c^* < E(X|D = 0) < E(X|D = 1),$$

which appears to be a counterintuitive finding. However, note that the value of $c^*$ is inversely related to the value of the prevalence of the protein (i.e., the higher the prevalence of the protein, the lower the value of $c^*$). In the extreme, if the prevalence is 0%, then the value of $c^*$ is $+\infty$; and, if the prevalence is 100%, then the value of $c^*$ is 0. In our particular situation, the prevalence is 60% (a fairly high value), so that a "low" value of $c^*$ would be anticipated.
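The minimizing cutoff can be confirmed by minimizing $\theta(c)$ numerically; a sketch:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def theta(c):
    # misclassification probability derived above
    return (0.60 * norm.cdf((np.log(c) - 0.443) / 0.708)
            + 0.40 * norm.sf((np.log(c) + 0.018) / 0.920))

res = minimize_scalar(theta, bounds=(0.1, 5.0), method="bounded")
print(round(res.x, 3), round(theta(res.x), 3))  # ~0.647, matching c* = e^{-0.436}
```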

Solution 3.16. Now,

$$\sum_{i=1}^n(X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^n(X_iY_i - X_i\bar{Y} - \bar{X}Y_i + \bar{X}\bar{Y}) = \sum_{i=1}^n X_iY_i - n\bar{X}\bar{Y}$$

$$= \sum_{i=1}^n X_iY_i - n^{-1}\left[\left(\sum_{i=1}^n X_i\right)\left(\sum_{i=1}^n Y_i\right)\right] = \sum_{i=1}^n X_iY_i - n^{-1}\left[\sum_{i=1}^n X_iY_i + \sum_{\text{all } i \neq j} X_iY_j\right]$$

$$= (1 - n^{-1})\sum_{i=1}^n X_iY_i - n^{-1}\sum_{\text{all } i \neq j} X_iY_j.$$


Since

$$E\left(\sum_{i=1}^n X_iY_i\right) = nE(X_iY_i) = n[\text{cov}(X_i, Y_i) + \mu_x\mu_y] = n(\rho\sigma_x\sigma_y + \mu_x\mu_y),$$

we have

$$E(U) = (n - 1)^{-1}\{(1 - n^{-1})n(\rho\sigma_x\sigma_y + \mu_x\mu_y) - n^{-1}[n(n - 1)\mu_x\mu_y]\} = (n - 1)^{-1}[(n - 1)(\rho\sigma_x\sigma_y + \mu_x\mu_y) - (n - 1)\mu_x\mu_y] = \rho\sigma_x\sigma_y.$$

Solution 3.17

(a) Let $X_i$ denote the $i$th benzene concentration measurement, $i = 1, 2, \ldots, 10$. Then, we know that $Y_i = \ln X_i \sim N(\mu, \sigma^2 = 2)$ and that the $\{Y_i\}$ are mutually independent.

Decision Rule #1:

$$\text{pr}(X_i > 5) = \text{pr}[\ln X_i > \ln 5] = \text{pr}\left[\frac{Y_i - \mu}{\sqrt{2}} > \frac{1.6094 - \mu}{\sqrt{2}}\right] = \text{pr}\left[Z > \frac{1.6094 - \mu}{1.4142}\right] = 1 - F_Z\left(\frac{1.6094 - \mu}{1.4142}\right), \quad Z \sim N(0, 1).$$

So, if $\theta_1$ = pr(Decision that drinking water violates EPA standard | Decision Rule #1), then

$$\theta_1 = \sum_{j=3}^{10} C_j^{10}\left[1 - F_Z\left(\frac{1.6094 - \mu}{1.4142}\right)\right]^j\left[F_Z\left(\frac{1.6094 - \mu}{1.4142}\right)\right]^{10-j} = 1 - \sum_{j=0}^{2} C_j^{10}\left[1 - F_Z\left(\frac{1.6094 - \mu}{1.4142}\right)\right]^j\left[F_Z\left(\frac{1.6094 - \mu}{1.4142}\right)\right]^{10-j}.$$

Now, with $E(X) = e^{\mu + \frac{\sigma^2}{2}} = e^{\mu + 1} = 7$, then $\mu = \ln(7) - 1 = 1.9459 - 1 = 0.9459$. Thus, with $\mu = 0.9459$, we have $F_Z\left(\frac{1.6094 - 0.9459}{1.4142}\right) = F_Z(0.4692) \approx 0.680$, so that

$$\theta_1 = 1 - \sum_{j=0}^{2} C_j^{10}(0.320)^j(0.680)^{10-j} = 1 - 0.0211 - 0.0995 - 0.2107 = 0.6687.$$


Decision Rule #2: Since $\bar{Y} = \ln\bar{X}_g = \frac{1}{10}\sum_{i=1}^{10} Y_i$, then $\bar{Y} \sim N(\mu, 2/10)$. So, with $\theta_2$ = pr(Decision that drinking water violates EPA standard | Decision Rule #2), then

$$\theta_2 = \text{pr}(\bar{X}_g > 5) = \text{pr}\left[\frac{\bar{Y} - \mu}{\sqrt{0.20}} > \frac{\ln 5 - \mu}{\sqrt{0.20}}\right] = 1 - F_Z\left(\frac{1.6094 - \mu}{0.4472}\right), \quad Z \sim N(0, 1).$$

With $\mu = 0.9459$, we have

$$\theta_2 = 1 - F_Z\left(\frac{1.6094 - 0.9459}{0.4472}\right) = 1 - F_Z(1.4837) \approx 1 - 0.931 = 0.069.$$

Decision Rule #3: With $\theta_3$ = pr(Decision that drinking water violates EPA standard | Decision Rule #3), we have

$$\theta_3 = \text{pr}\left[X_{(10)} > 5\right] = 1 - \text{pr}\left[\cap_{i=1}^{10}(X_i \leq 5)\right] = 1 - \{\text{pr}(Y_i \leq \ln 5)\}^{10} = 1 - \left\{\text{pr}\left(\frac{Y_i - \mu}{\sqrt{2}} \leq \frac{1.6094 - \mu}{\sqrt{2}}\right)\right\}^{10} = 1 - \left[\text{pr}\left(Z \leq \frac{1.6094 - \mu}{1.4142}\right)\right]^{10}, \quad Z \sim N(0, 1).$$

With $\mu = 0.9459$, we have

$$\theta_3 = 1 - \left[\text{pr}\left(Z \leq \frac{1.6094 - 0.9459}{1.4142}\right)\right]^{10} = 1 - [F_Z(0.4692)]^{10} = 1 - (0.680)^{10} = 1 - 0.0211 = 0.9789.$$

(b) With $E(X) = 7$, so that $\mu = 0.9459$, we have

$$\text{pr}(\bar{X}_g > 5) = \text{pr}(\bar{Y} > \ln 5) = \text{pr}\left[\frac{\bar{Y} - 0.9459}{\sqrt{2/n}} > \frac{\ln 5 - 0.9459}{\sqrt{2/n}}\right] = \text{pr}\left(Z > 0.4692\sqrt{n}\right), \quad Z \sim N(0, 1).$$


Thus, $\text{pr}(\bar{X}_g > 5)$ gets smaller as $n$ increases! The reason for this phenomenon can be determined by examining $E(\bar{X}_g)$. In general,

$$E(\bar{X}_g) = E\left[\left(\prod_{i=1}^n X_i\right)^{1/n}\right] = \prod_{i=1}^n E\left(X_i^{1/n}\right) = \left[E\left(e^{Y_i/n}\right)\right]^n = \left[e^{\frac{\mu}{n} + \frac{\sigma^2(1/n)^2}{2}}\right]^n = e^{\mu + \frac{\sigma^2}{2n}}.$$

Hence, for $n > 1$,

$$E(\bar{X}_g) < E(X) = e^{\mu + \sigma^2/2},$$

with the size of the bias increasing as $n$ increases. In particular,

$$\lim_{n\to+\infty} E(\bar{X}_g) = e^{\mu},$$

which is the median, not the mean, of the lognormal distribution of the random variable $X$.

Solution 3.18

$$\text{pr}\left[\min\{X_1^2, X_2^2, \ldots, X_n^2\} \leq 0.002\right] = 1 - \text{pr}\left[\min\{X_1^2, X_2^2, \ldots, X_n^2\} > 0.002\right]$$

$$= 1 - \text{pr}\left[\cap_{i=1}^n(X_i^2 > 0.002)\right] = 1 - \prod_{i=1}^n\text{pr}\left[\left(\frac{X_i}{\sqrt{2}}\right)^2 > \frac{0.002}{2}\right] = 1 - [\text{pr}(U_i > 0.001)]^n = 1 - (0.975)^n,$$

since $U_i \sim \chi_1^2$, $i = 1, 2, \ldots, n$.

So, $n^*$ is the smallest positive integer such that

$$1 - (0.975)^n \geq 0.80,$$

or

$$n \geq \frac{\ln(0.20)}{\ln(0.975)} = \frac{1.6094}{0.0253} = 63.6,$$

so that $n^* = 64$.


Solution 3.19

(a) Let the random variable $X_i$ denote the number of coronary bypass grafts needed by the $i$th patient, $i = 1, 2, \ldots, 500$. It is reasonable to assume that $X_1, X_2, \ldots, X_{500}$ constitute a set of 500 i.i.d. random variables. Also, $E(X_i) = \sum_{j=1}^4 j\pi_j = 1.79$ and $V(X_i) = \sum_{j=1}^4 j^2\pi_j - (1.79)^2 = 1.0059$. Thus, with the random variable $T = \sum_{i=1}^{500} X_i$ denoting the total number of coronary bypass grafts to be performed during the upcoming year, it follows that $E(T) = 500(1.79) = 895.00$ and $V(T) = 500(1.0059) = 502.95$. Thus, by the Central Limit Theorem, the standardized random variable $Z = [T - E(T)]/\sqrt{V(T)} \sim N(0, 1)$ for large $n$. Hence, we have

$$\text{pr}(T \leq 900) = \text{pr}\left[\frac{T - E(T)}{\sqrt{V(T)}} \leq \frac{900 - E(T)}{\sqrt{V(T)}}\right] \approx \text{pr}(Z \leq 0.223) \approx 0.59.$$

Thus, the hospital administrator's suggestion is not reasonable.

(b) In general, if this hospital plans to perform coronary bypass surgery on $n$ patients during the upcoming year, then $E(T) = 1.79n$, $V(T) = 1.0059n$, and $\sqrt{V(T)} = 1.0029\sqrt{n}$. Again, by the Central Limit Theorem, the standardized random variable $Z = (T - 1.79n)/(1.0029\sqrt{n}) \sim N(0, 1)$ for large $n$. Hence, we have

$$\text{pr}(T \leq 900) = \text{pr}\left[\frac{T - 1.79n}{1.0029\sqrt{n}} \leq \frac{900 - 1.79n}{1.0029\sqrt{n}}\right] \approx \text{pr}\left[Z \leq \frac{897.3975}{\sqrt{n}} - 1.7848\sqrt{n}\right].$$

Hence, for $\text{pr}(T \leq 900) \geq 0.95$, $n^*$ is the largest value of $n$ satisfying the inequality

$$\frac{897.3975}{\sqrt{n^*}} - 1.7848\sqrt{n^*} \geq 1.645.$$

It is straightforward to show that $n^* = 482$.
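The value $n^* = 482$ comes from a one-line search over the inequality; a sketch:

```python
import math

# largest n with 897.3975/sqrt(n) - 1.7848*sqrt(n) >= 1.645
n_star = max(n for n in range(1, 1000)
             if 897.3975 / math.sqrt(n) - 1.7848 * math.sqrt(n) >= 1.645)
print(n_star)  # 482
```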

Solution 3.20

(a) Since $S = \sum_{i=1}^k X_i$, where the $\{X_i\}$ are i.i.d. random variables, the Central Limit Theorem allows us to say that

$$\frac{S - E(S)}{\sqrt{V(S)}} \sim N(0, 1)$$

for large $k$. So, for $\pi = 0.05$ and $k = 50$,

$$E(S) = \frac{k}{\pi} = \frac{50}{0.05} = 1000$$

and

$$V(S) = \frac{k(1 - \pi)}{\pi^2} = \frac{50(0.95)}{(0.05)^2} = 19{,}000;$$

thus, $\sqrt{V(S)} = 137.84$. So, with $Z \sim N(0, 1)$ for large $k$, we have

$$\text{pr}[S > 1100] = \text{pr}\left[\frac{S - E(S)}{\sqrt{V(S)}} > \frac{1100 - E(S)}{\sqrt{V(S)}}\right] = \text{pr}\left[Z > \frac{1100 - 1000}{137.84}\right] = \text{pr}(Z > 0.7255) = 0.235.$$

(b)

$$M_U(t) = E(e^{tU}) = E\left[e^{t(2\pi S)}\right] = E\left[e^{2\pi t\sum_{i=1}^k X_i}\right] = E\left[\prod_{i=1}^k e^{2\pi tX_i}\right] = \prod_{i=1}^k M_{X_i}(2\pi t),$$

so that

$$\lim_{\pi\to0} M_U(t) = \prod_{i=1}^k\left[\lim_{\pi\to0} M_{X_i}(2\pi t)\right].$$

Now,

$$\lim_{\pi\to0} M_{X_i}(2\pi t) = \lim_{\pi\to0}\left\{\frac{\pi e^{2\pi t}}{1 - (1 - \pi)e^{2\pi t}}\right\} = \frac{0}{0},$$

so we can employ L'Hôpital's Rule. So,

$$\frac{\partial(\pi e^{2\pi t})}{\partial\pi} = e^{2\pi t} + 2\pi te^{2\pi t} \quad \text{and} \quad \frac{\partial\left[1 - (1 - \pi)e^{2\pi t}\right]}{\partial\pi} = e^{2\pi t} - (1 - \pi)(2t)e^{2\pi t}.$$

So,

$$\lim_{\pi\to0} M_{X_i}(2\pi t) = \lim_{\pi\to0}\left\{\frac{e^{2\pi t} + 2\pi te^{2\pi t}}{e^{2\pi t} - (1 - \pi)(2t)e^{2\pi t}}\right\} = \lim_{\pi\to0}\left\{\frac{1 + 2\pi t}{1 - (1 - \pi)2t}\right\} = (1 - 2t)^{-1},$$

so that

$$\lim_{\pi\to0} M_U(t) = \prod_{i=1}^k\left[(1 - 2t)^{-1}\right] = (1 - 2t)^{-k},$$

which is the MGF for a GAMMA$[\alpha = 2, \beta = k]$, or $\chi^2_{2k}$, random variable. So, for small $\pi$, $U = 2\pi S \sim \chi^2_{2k}$.

(c) For small $\pi$,

$$\text{pr}[S > 1100] = \text{pr}[2\pi S > (2\pi)(1100)] = \text{pr}[U > (2\pi)(1100)],$$

where $U \sim \chi^2_{2k}$. When $\pi = 0.05$ and $k = 50$, we have

$$\text{pr}[S > 1100] = \text{pr}[U > (2)(0.05)(1100)] = \text{pr}(U > 110) \approx 0.234,$$

since $U \sim \chi^2_{2k} = \chi^2_{100}$. This number agrees quite well with the numerical answer computed in part (a).
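Both approximations can be evaluated directly; a quick check:

```python
from scipy.stats import chi2, norm

print(round(norm.sf((1100 - 1000) / 137.84), 3))  # part (a): ~0.234
print(round(chi2.sf(110, df=100), 3))             # part (c): ~0.234
```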

Solution 3.21

$$F_{U_n}(u) = \text{pr}(U_n \leq u) = 1 - \text{pr}(U_n > u).$$

Now,

$$\text{pr}(U_n > u) = \text{pr}\left[\frac{\theta n[Y_{(1)} - c]}{c} > u\right] = \text{pr}\left\{Y_{(1)} > \frac{uc}{\theta n} + c\right\} = \text{pr}\left\{\cap_{i=1}^n\left(Y_i > \frac{uc}{\theta n} + c\right)\right\}$$

$$= \prod_{i=1}^n\text{pr}\left(Y_i > \frac{uc}{\theta n} + c\right) = \left[1 - F_Y\left(\frac{uc}{\theta n} + c; \theta\right)\right]^n,$$

where

$$F_Y(y; \theta) = \int_c^y\theta c^\theta t^{-(\theta+1)}\,dt = c^\theta\left[-t^{-\theta}\right]_c^y = c^\theta\left[c^{-\theta} - y^{-\theta}\right] = 1 - \left(\frac{y}{c}\right)^{-\theta}, \quad 0 < c < y < +\infty.$$

So, $F_{U_n}(u) = 1 - \text{pr}(U_n > u)$, where

$$\text{pr}(U_n > u) = \left\{1 - \left[1 - \left(\frac{\frac{uc}{\theta n} + c}{c}\right)^{-\theta}\right]\right\}^n = \left[\left(1 + \frac{u}{\theta n}\right)^{-\theta}\right]^n = \left[\left(1 + \frac{u}{\theta n}\right)^n\right]^{-\theta}.$$

So,

$$\lim_{n\to\infty}\text{pr}(U_n > u) = \lim_{n\to\infty}\left[\left(1 + \frac{u}{\theta n}\right)^n\right]^{-\theta} = \left(e^{u/\theta}\right)^{-\theta} = e^{-u}.$$

So,

$$\lim_{n\to\infty} F_{U_n}(u) = 1 - e^{-u},$$

so that $f_U(u) = e^{-u}$, $0 < u < +\infty$.
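A Monte Carlo sketch of the limit: simulate Pareto samples via the inverse CDF and compare the empirical CDF of $U_n$ with $1 - e^{-u}$. The parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, c, n, reps = 2.0, 1.0, 200, 20_000
# Pareto draws via inverse CDF: Y = c * V**(-1/theta), V ~ U(0, 1)
y = c * rng.uniform(size=(reps, n)) ** (-1 / theta)
u_n = theta * n * (y.min(axis=1) - c) / c

for q in (0.5, 1.0, 2.0):
    print(round(np.mean(u_n <= q), 3), round(1 - np.exp(-q), 3))
# the empirical CDF of U_n tracks the limiting EXP(1) CDF
```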

Solution 3.22

(a) First, note that $U = X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$ and that $V = (1 - X_{(n)})$, where $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$. Since $F_X(x) = x$, $0 < x < 1$, direct application of the general formula for the joint distribution of any two order statistics based on a random sample of size $n$ from $f_X(x)$ (see the introductory material for this chapter) gives

$$f_{X_{(1)},X_{(n)}}(x_{(1)}, x_{(n)}) = n(n - 1)\left(x_{(n)} - x_{(1)}\right)^{n-2}, \quad 0 < x_{(1)} < x_{(n)} < 1.$$

For the transformation $U = X_{(1)}$ and $V = (1 - X_{(n)})$, with inverse functions $X_{(1)} = U$ and $X_{(n)} = (1 - V)$, the absolute value of the Jacobian is equal to 1; so, it follows directly that

$$f_{U,V}(u, v) = n(n - 1)(1 - u - v)^{n-2}, \quad 0 < u < 1, \ 0 < (u + v) < 1.$$

(b) Now,

$$\theta_n = \text{pr}[(R > r) \cap (S > s)] = \text{pr}[(nU > r) \cap (nV > s)] = \text{pr}\left[\left(U > \frac{r}{n}\right) \cap \left(V > \frac{s}{n}\right)\right]$$

$$= \int_{r/n}^{1 - s/n}\int_{s/n}^{1-u} n(n - 1)(1 - u - v)^{n-2}\,dv\,du = \left(1 - \frac{r}{n} - \frac{s}{n}\right)^n.$$

So, we have

$$\lim_{n\to\infty}\theta_n = \lim_{n\to\infty}\left(1 - \frac{r}{n} - \frac{s}{n}\right)^n = \lim_{n\to\infty}\left\{1 + \frac{[-(r + s)]}{n}\right\}^n = e^{-(r+s)} = e^{-r}e^{-s}, \quad 0 < r < \infty, \ 0 < s < \infty.$$

So, asymptotically, $R$ and $S$ are independent random variables with exponential distributions, namely,

$$f_R(r) = e^{-r}, \ 0 < r < \infty \quad \text{and} \quad f_S(s) = e^{-s}, \ 0 < s < \infty.$$

Solution 3.23

(a)

$$E(C) = E_x[E(C|X = x)] = \sum_{x=0}^\infty E(C|X = x)\text{pr}(X = x) = E(C|X = 0)\text{pr}(X = 0) + \sum_{x=1}^\infty E(C|X = x)\text{pr}(X = x)$$

$$= 0 + \sum_{x=1}^\infty E\left[\sum_{j=1}^X C_j\middle|X = x\right]\text{pr}(X = x) = \sum_{x=1}^\infty(x\mu)\text{pr}(X = x) = \mu\sum_{x=0}^\infty x\,\text{pr}(X = x) = \mu E(X) = \mu\lambda;$$

and,

$$E(C^2) = \sum_{x=0}^\infty E(C^2|X = x)\text{pr}(X = x) = 0 + \sum_{x=1}^\infty E\left[\left(\sum_{j=1}^X C_j\right)^2\middle|X = x\right]\text{pr}(X = x)$$

$$= \sum_{x=1}^\infty\left\{E\left[\sum_{j=1}^x C_j^2 + 2\sum_{\text{all } j<k} C_jC_k\middle|X = x\right]\right\}\text{pr}(X = x) = \sum_{x=1}^\infty\left[x(\sigma^2 + \mu^2) + x(x - 1)\mu^2\right]\text{pr}(X = x)$$

$$= \sum_{x=0}^\infty(x\sigma^2 + x^2\mu^2)\text{pr}(X = x) = \sigma^2E(X) + \mu^2E(X^2) = \sigma^2\lambda + \mu^2(\lambda + \lambda^2).$$

Thus, $V(C) = \sigma^2\lambda + \mu^2(\lambda + \lambda^2) - (\mu\lambda)^2 = \lambda(\sigma^2 + \mu^2)$. Alternatively, $V(C) = V_x[E(C|X = x)] + E_x[V(C|X = x)] = V(X\mu) + E(X\sigma^2) = \lambda(\mu^2 + \sigma^2)$.

Now,

$$E(XC) = \sum_{x=0}^\infty E(XC|X = x)\text{pr}(X = x) = \sum_{x=1}^\infty xE(C|X = x)\text{pr}(X = x) = \sum_{x=1}^\infty x(x\mu)\text{pr}(X = x) = \mu E(X^2) = \mu(\lambda + \lambda^2).$$

So,

$$\text{corr}(X, C) = \frac{\text{cov}(X, C)}{\sqrt{V(X)V(C)}} = \frac{E(XC) - E(X)E(C)}{\sqrt{V(X)V(C)}} = \frac{\mu(\lambda + \lambda^2) - (\lambda)(\mu\lambda)}{\sqrt{\lambda\left[\lambda(\sigma^2 + \mu^2)\right]}} = \frac{\mu}{(\sigma^2 + \mu^2)^{1/2}},$$

which does not depend on $\lambda$.

(b)

$$M_C(t) = E(e^{tC}) = E_x[E(e^{tC}|X = x)] = \sum_{x=0}^\infty E(e^{tC}|X = x)\text{pr}(X = x) = E(e^{tC}|X = 0)\text{pr}(X = 0) + \sum_{x=1}^\infty E(e^{tC}|X = x)\text{pr}(X = x)$$

$$= (1)(e^{-\lambda}) + \sum_{x=1}^\infty E\left[e^{t\sum_{j=1}^X C_j}\middle|X = x\right]\text{pr}(X = x) = e^{-\lambda} + \sum_{x=1}^\infty E\left[\prod_{j=1}^x e^{tC_j}\right]\text{pr}(X = x)$$

$$= e^{-\lambda} + \sum_{x=1}^\infty[M(t)]^x\text{pr}(X = x) = e^{-\lambda} + \sum_{x=0}^\infty[M(t)]^x\frac{\lambda^xe^{-\lambda}}{x!} - e^{-\lambda} = e^{-\lambda}\sum_{x=0}^\infty\frac{[\lambda M(t)]^x}{x!} = e^{-\lambda}e^{\lambda M(t)} = e^{\lambda[M(t) - 1]}.$$

So,

$$E(C) = \left.\frac{dM_C(t)}{dt}\right|_{t=0} = \left.\left\{e^{\lambda[M(t) - 1]}\cdot\lambda\frac{dM(t)}{dt}\right\}\right|_{t=0} = e^{\lambda[M(0) - 1]}\cdot\lambda\left[\left.\frac{dM(t)}{dt}\right|_{t=0}\right] = e^{\lambda(1 - 1)}\cdot\lambda\cdot E(C_j) = \lambda\mu,$$

which agrees with the result derived in part (a).

Solution 3.24

(a) Since $E(Y_{ij}) = \mu$, it follows directly that $E(\bar{Y}) = \mu$. Now, $\bar{Y} = k^{-1}\sum_{i=1}^k\bar{Y}_i$, where

$$\bar{Y}_i = n^{-1}\sum_{j=1}^n Y_{ij} = n^{-1}\sum_{j=1}^n(\mu + \beta_i + \epsilon_{ij}) = \mu + \beta_i + \bar{\epsilon}_i,$$

where $\bar{\epsilon}_i = n^{-1}\sum_{j=1}^n\epsilon_{ij}$. Thus, $V(\bar{Y}_i) = \sigma^2_\beta + \sigma^2_\epsilon/n$. Since $\{\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_k\}$ constitute a set of $k$ mutually independent random variables, it follows that

$$V(\bar{Y}) = k^{-2}\sum_{i=1}^k V(\bar{Y}_i) = \frac{\sigma^2_\beta}{k} + \frac{\sigma^2_\epsilon}{kn}.$$

(b) We can employ the method of Lagrange multipliers to solve this problem. In particular, consider the function

$$Q = \frac{\sigma^2_\beta}{k} + \frac{\sigma^2_\epsilon}{kn} + \lambda\left[(kD_c + knD_p) - C^*\right],$$

where $\lambda$ is the Lagrange multiplier. Now, consider the following three equations:

$$\frac{\partial Q}{\partial k} = \frac{-\sigma^2_\beta}{k^2} - \frac{\sigma^2_\epsilon}{k^2n} + \lambda(D_c + nD_p) = 0; \quad \frac{\partial Q}{\partial n} = \frac{-\sigma^2_\epsilon}{kn^2} + \lambda kD_p = 0; \quad \frac{\partial Q}{\partial\lambda} = kD_c + knD_p - C^* = 0.$$

Solving these three equations gives

$$n^* = \left(\frac{\sigma_\epsilon}{\sigma_\beta}\right)\left(\frac{D_c}{D_p}\right)^{1/2} \quad \text{and} \quad k^* = \frac{C^*}{(D_c + n^*D_p)}.$$

(c) Since

$$n^* = \frac{3}{2}\left(\frac{10{,}000}{100}\right)^{1/2} = 15 \quad \text{and} \quad k^* = \frac{100{,}000}{10{,}000 + 15(100)} = 8.70,$$

the clinical trial should involve 9 private medical practices, with each private medical practice being required to enroll 15 patients.
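A numerical check of part (c): compute $(n^*, k^*)$ and confirm that nearby budget-feasible allocations give a larger $V(\bar{Y})$.

```python
sig_b2, sig_e2 = 4.0, 9.0
Dc, Dp, C_star = 10_000.0, 100.0, 100_000.0

n_opt = (sig_e2 / sig_b2) ** 0.5 * (Dc / Dp) ** 0.5
k_opt = C_star / (Dc + n_opt * Dp)
print(n_opt, round(k_opt, 2))          # 15.0 8.7

for n in (5, 10, 15, 20, 30):          # budget-feasible alternatives
    k = C_star / (Dc + n * Dp)
    print(n, round(sig_b2 / k + sig_e2 / (k * n), 4))  # smallest at n = 15
```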

Solution 3.25. First, we have

$$M_{Y_3}(t) = E(e^{tY_3}) = \sum_{y_3=1}^\infty e^{ty_3}\frac{\lambda_3^{y_3}}{y_3!(e^{\lambda_3} - 1)} = \frac{e^{\lambda_3}}{(e^{\lambda_3} - 1)}\sum_{y_3=0}^\infty\frac{e^{ty_3}\lambda_3^{y_3}e^{-\lambda_3}}{y_3!} - \frac{1}{(e^{\lambda_3} - 1)}$$

$$= \frac{e^{\lambda_3}\left[e^{\lambda_3(e^t - 1)}\right]}{(e^{\lambda_3} - 1)} - \frac{1}{(e^{\lambda_3} - 1)} = \frac{e^{\lambda_3e^t} - 1}{(e^{\lambda_3} - 1)}.$$

So,

$$M_U(t) = E[e^{t(R+S)}] = E[e^{t(Y_1 + Y_2 + 2Y_3)}] = E[e^{tY_1}e^{tY_2}e^{2tY_3}] = E_{y_3}\left\{E(e^{tY_1}e^{tY_2}e^{2tY_3}|Y_3 = y_3)\right\}$$

$$= E_{y_3}\left\{e^{2ty_3}E(e^{tY_1}|Y_3 = y_3)E(e^{tY_2}|Y_3 = y_3)\right\} = E_{y_3}\left\{e^{2ty_3}e^{y_3(e^t - 1)}e^{y_3(e^t - 1)}\right\} = E_{y_3}\left\{e^{2(e^t + t - 1)y_3}\right\} = \frac{e^{\lambda_3\left[e^{2(e^t + t - 1)}\right]} - 1}{(e^{\lambda_3} - 1)}.$$

So,

$$E(U) = \left.\frac{dM_U(t)}{dt}\right|_{t=0} = \left.\left[\frac{e^{\lambda_3\left[e^{2(e^t + t - 1)}\right]}\cdot\lambda_3e^{2(e^t + t - 1)}\cdot2(e^t + 1)}{(e^{\lambda_3} - 1)}\right]\right|_{t=0} = \frac{4\lambda_3e^{\lambda_3}}{(e^{\lambda_3} - 1)}.$$

And, since

$$E(Y_3) = \left.\frac{dM_{Y_3}(t)}{dt}\right|_{t=0} = \frac{\lambda_3e^{\lambda_3}}{(e^{\lambda_3} - 1)},$$

we have

$$E(U) = E(R + S) = E(Y_1 + Y_2 + 2Y_3) = E_{y_3}[E(Y_1|Y_3 = y_3)] + E_{y_3}[E(Y_2|Y_3 = y_3)] + 2E(Y_3) = E(Y_3) + E(Y_3) + 2E(Y_3) = \frac{4\lambda_3e^{\lambda_3}}{(e^{\lambda_3} - 1)}.$$

Solution 3.26. Consider the random variable $X = \sum_{i=1}^C X_i$, where the dichotomous random variable $X_i$ takes the value 1 if the $i$th cell is empty and takes the value 0 if the $i$th cell contains at least one ball. Now,

$$\text{pr}(X_i = 1) = \left(1 - \frac{1}{C}\right)^n = \pi, \text{ say},$$

so that $E(X_i) = \pi$ and $V(X_i) = \pi(1 - \pi)$, $i = 1, 2, \ldots, C$. Hence,

$$E(X) = \sum_{i=1}^C E(X_i) = C\pi = C\left(\frac{C - 1}{C}\right)^n.$$

Now,

$$V(X) = \sum_{i=1}^C V(X_i) + 2\sum_{i=1}^{C-1}\sum_{j=i+1}^C\text{cov}(X_i, X_j).$$

And,

$$\text{cov}(X_i, X_j) = E(X_iX_j) - E(X_i)E(X_j), \quad \text{with} \quad E(X_iX_j) = \text{pr}[(X_i = 1) \cap (X_j = 1)] = \left(\frac{C - 2}{C}\right)^n,$$

so that

$$\text{cov}(X_i, X_j) = \left(\frac{C - 2}{C}\right)^n - \left(\frac{C - 1}{C}\right)^{2n}.$$

Finally,

$$V(X) = C\left(\frac{C - 1}{C}\right)^n\left[1 - \left(\frac{C - 1}{C}\right)^n\right] + C(C - 1)\left[\left(\frac{C - 2}{C}\right)^n - \left(\frac{C - 1}{C}\right)^{2n}\right].$$

When $C = 6$ and $n = 5$, it follows that $E(X) = 2.4113$ and $V(X) = 0.5477$.
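A Monte Carlo sketch of the occupancy result for $C = 6$ and $n = 5$:

```python
import numpy as np

rng = np.random.default_rng(4)
C, n, reps = 6, 5, 100_000
balls = rng.integers(0, C, size=(reps, n))            # cell index of each ball
occupied = np.zeros((reps, C), dtype=bool)
occupied[np.arange(reps)[:, None], balls] = True      # mark occupied cells
empty = C - occupied.sum(axis=1)                      # empty cells per replicate
print(round(empty.mean(), 3), round(empty.var(), 3))  # ~2.411 and ~0.548
```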

Solution 3.27

(a) First,

$$E(Y_i) = \pi \quad \text{and} \quad V(Y_i) = \pi(1 - \pi), \quad i = 1, 2, \ldots, N.$$

So,

$$E(T) = E_n[E(T|N = n)] = E_n\left[E\left(\sum_{i=1}^N Y_i\middle|N = n\right)\right] = E_n[nE(Y_i)] = E_n(n\pi) = \pi E(N).$$

Since $E(N) = \theta^{-1}$ and $V(N) = (1 - \theta)/\theta^2$, we have

$$E(T) = \pi/\theta.$$

Now,

$$V(T) = V_n[E(T|N = n)] + E_n[V(T|N = n)] = V_n(n\pi) + E_n[n\pi(1 - \pi)] = \pi^2V(N) + \pi(1 - \pi)E(N)$$

$$= \pi^2\frac{(1 - \theta)}{\theta^2} + \frac{\pi(1 - \pi)}{\theta} = \frac{\pi(\pi + \theta - 2\pi\theta)}{\theta^2}.$$

Now,

$$E(NT) = E_n[E(NT|N = n)] = E_n[nE(T|N = n)] = E_n(n^2\pi) = \pi E(N^2) = \pi\left[\frac{(1 - \theta)}{\theta^2} + \frac{1}{\theta^2}\right] = \pi(2 - \theta)/\theta^2.$$

So,

$$\text{corr}(N, T) = \frac{\frac{\pi(2 - \theta)}{\theta^2} - \left(\frac{\pi}{\theta}\right)\left(\frac{1}{\theta}\right)}{\sqrt{\frac{\pi(\pi + \theta - 2\pi\theta)}{\theta^2}\cdot\frac{(1 - \theta)}{\theta^2}}} = \sqrt{\frac{\pi(1 - \theta)}{\pi(1 - \theta) + \theta(1 - \pi)}}.$$

(b)

$$\text{pr}(T = 0) = \sum_{n=1}^\infty\text{pr}[(T = 0) \cap (N = n)] = \sum_{n=1}^\infty\text{pr}(T = 0|N = n)\text{pr}(N = n).$$

Now, since the $\{Y_i\}$ are i.i.d. Bernoulli random variables, we know that $T \sim \text{BIN}(n, \pi)$ given $N = n$. So,

$$p_T(t|N = n) = C_t^n\pi^t(1 - \pi)^{n-t}, \quad t = 0, 1, 2, \ldots, n.$$

So,

$$\text{pr}(T = 0) = \sum_{n=1}^\infty(1 - \pi)^n\theta(1 - \theta)^{n-1} = \frac{\theta}{(1 - \theta)}\sum_{n=1}^\infty[(1 - \pi)(1 - \theta)]^n = \frac{\theta}{(1 - \theta)}\left\{\frac{(1 - \pi)(1 - \theta)}{1 - (1 - \pi)(1 - \theta)}\right\} = \frac{\theta(1 - \pi)}{\pi + \theta(1 - \pi)}.$$

Solution 3.28

(a) Now, since $E(X_i) = \lambda_1$ and $E(Y_i) = \lambda_2$, we have

$$E(U) = E(\bar{X}) - E(\bar{Y}) = m^{-1}\sum_{i=1}^m E(X_i) - n^{-1}\sum_{i=1}^n E(Y_i) = (\lambda_1 - \lambda_2).$$

And, since $\{X_1, X_2, \ldots, X_m; Y_1, Y_2, \ldots, Y_n\}$ constitute a set of $(m + n)$ mutually independent random variables with $V(X_i) = \lambda_1$ and $V(Y_i) = \lambda_2$, it follows that

$$V(U) = V(\bar{X} - \bar{Y}) = V(\bar{X}) + V(\bar{Y}) = \frac{\lambda_1}{m} + \frac{\lambda_2}{n}.$$

(b) We wish to minimize $V(U)$ subject to the restriction $(m + n) = N$. So, consider the function

$$H(m, n) = \frac{\lambda_1}{m} + \frac{\lambda_2}{n} + \gamma(m + n - N),$$

where $\gamma$ is a Lagrange multiplier. So,

$$\frac{\partial H(m, n)}{\partial m} = -\frac{\lambda_1}{m^2} + \gamma = 0 \text{ gives } \lambda_1 = \gamma m^2, \quad \text{and} \quad \frac{\partial H(m, n)}{\partial n} = -\frac{\lambda_2}{n^2} + \gamma = 0 \text{ gives } \lambda_2 = \gamma n^2.$$

Thus,

$$\frac{\lambda_2}{\lambda_1} = \frac{\gamma n^2}{\gamma m^2} = \frac{n^2}{m^2} \text{ gives } \frac{n}{m} = \sqrt{\theta}, \quad \text{where } \theta = \lambda_2/\lambda_1.$$

Hence, $n = m\sqrt{\theta}$ gives $N = (m + n) = m(1 + \sqrt{\theta})$, so that

$$m = \frac{N}{(1 + \sqrt{\theta})} \quad \text{and} \quad n = \frac{N\sqrt{\theta}}{(1 + \sqrt{\theta})}.$$

Note that

$$\frac{n}{m} = \sqrt{\theta} = \sqrt{\frac{\lambda_2}{\lambda_1}} = \sqrt{\frac{V(Y_i)}{V(X_i)}},$$

indicating that a larger sample should be selected from the more variable Poisson population. In particular, $V(X_i) < V(Y_i)$ requires $n > m$, $V(X_i) > V(Y_i)$ requires $m > n$, and $V(X_i) = V(Y_i)$ requires $m = n$.

When $N = 60$, $\lambda_1 = 2$, and $\lambda_2 = 8$, so that $\sqrt{\theta} = 2$, we obtain $m = 20$ and $n = 40$.
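The optimality of $(m, n) = (20, 40)$ can be confirmed by brute force over all integer splits of $N = 60$:

```python
N, lam1, lam2 = 60, 2.0, 8.0
# V(U) = lam1/m + lam2/n over all splits m + n = N
best = min((lam1 / m + lam2 / (N - m), m) for m in range(1, N))
print(best)  # (0.3, 20): V(U) = 0.3 attained at m = 20, n = 40
```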

Solution 3.29∗

(a)

$$\theta = \text{pr}(X = Y) = \sum_{s=0}^\infty\text{pr}[(X = s) \cap (Y = s)] = \sum_{s=0}^\infty\text{pr}(X = s)\text{pr}(Y = s) = \sum_{s=0}^\infty(1 - \pi_x)\pi_x^s(1 - \pi_y)\pi_y^s$$

$$= (1 - \pi_x)(1 - \pi_y)\sum_{s=0}^\infty(\pi_x\pi_y)^s = (1 - \pi_x)(1 - \pi_y)\left[\frac{1}{(1 - \pi_x\pi_y)}\right] = \frac{(1 - \pi_x)(1 - \pi_y)}{(1 - \pi_x\pi_y)}.$$

(b) First, $p_U(0) = \text{pr}(U = 0) = \text{pr}(X = Y) = \theta$. And, for $u = 1, 2, \ldots, \infty$,

$$p_U(u) = \text{pr}(|X - Y| = u) = \text{pr}[(X - Y) = u] + \text{pr}[(X - Y) = -u].$$

So, for $u = 1, 2, \ldots, \infty$, we have

$$\text{pr}(X - Y = u) = \sum_{k=0}^\infty\text{pr}[(X = k + u) \cap (Y = k)] = \sum_{k=0}^\infty\text{pr}(X = k + u)\text{pr}(Y = k) = \sum_{k=0}^\infty(1 - \pi_x)\pi_x^{k+u}(1 - \pi_y)\pi_y^k$$

$$= (1 - \pi_x)(1 - \pi_y)\pi_x^u\sum_{k=0}^\infty(\pi_x\pi_y)^k = \frac{(1 - \pi_x)(1 - \pi_y)}{(1 - \pi_x\pi_y)}\pi_x^u = \theta\pi_x^u.$$

And,

$$\text{pr}(X - Y = -u) = \sum_{k=0}^\infty\text{pr}(X = k)\text{pr}(Y = k + u) = \sum_{k=0}^\infty(1 - \pi_x)\pi_x^k(1 - \pi_y)\pi_y^{k+u} = (1 - \pi_x)(1 - \pi_y)\pi_y^u\sum_{k=0}^\infty(\pi_x\pi_y)^k = \frac{(1 - \pi_x)(1 - \pi_y)}{(1 - \pi_x\pi_y)}\pi_y^u = \theta\pi_y^u.$$

Hence, we have

$$p_U(0) = \theta \quad \text{and} \quad p_U(u) = \theta\left(\pi_x^u + \pi_y^u\right), \quad u = 1, 2, \ldots, \infty.$$

It can be shown directly that $\sum_{u=0}^\infty p_U(u) = 1$.

(c) For the available data, the observed value of $U$ is $u = 2$. So, under the assumption that $\pi_x = \pi_y = \pi$, say, it follows that

$$\text{pr}(U \geq 2|\pi_x = \pi_y = \pi) = \sum_{u=2}^\infty 2\theta\pi^u = 2\left(\frac{1 - \pi}{1 + \pi}\right)\sum_{u=2}^\infty\pi^u = 2\left(\frac{1 - \pi}{1 + \pi}\right)\left(\frac{\pi^2}{1 - \pi}\right) = \frac{2\pi^2}{(1 + \pi)}.$$


Given the restrictions $\pi_x \leq 0.10$ and $\pi_y \leq 0.10$, the largest possible value of $\text{pr}(U \geq 2|\pi_x = \pi_y = \pi) = 2\pi^2/(1 + \pi)$ is $2(0.10)^2/(1 + 0.10) = 0.018$. So, these data provide fairly strong statistical evidence that $\pi_x \neq \pi_y$.
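A one-line check of the bound in part (c): $2\pi^2/(1 + \pi)$ is increasing in $\pi$, so its maximum over $\pi \leq 0.10$ occurs at the boundary.

```python
for p in (0.02, 0.05, 0.08, 0.10):
    print(p, round(2 * p**2 / (1 + p), 4))
# increases to 0.0182 at p = 0.10, the ~0.018 bound used above
```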

Solution 3.30∗

(a)

$$\psi(t) = E[(t + 1)^Y] = \sum_{y=1}^\infty(t + 1)^y\frac{\left(e^\theta - 1\right)^{-1}\theta^y}{y!} = (e^\theta - 1)^{-1}\sum_{y=1}^\infty\frac{[\theta(t + 1)]^y}{y!} = (e^\theta - 1)^{-1}\left\{\sum_{y=0}^\infty\frac{[\theta(t + 1)]^y}{y!} - 1\right\} = \frac{e^{\theta(t+1)} - 1}{(e^\theta - 1)}.$$

(b) For $Y$ a positive integer,

$$E\left[(t + 1)^Y\right] = E\left[\sum_{j=0}^Y C_j^Yt^j(1)^{Y-j}\right] = E\left\{1 + tY + \frac{t^2}{2}Y(Y - 1) + \cdots\right\} = 1 + tE(Y) + \frac{t^2}{2}E[Y(Y - 1)] + \cdots.$$

So,

$$\left.\frac{d\psi(t)}{dt}\right|_{t=0} = E(Y), \quad \left.\frac{d^2\psi(t)}{dt^2}\right|_{t=0} = E[Y(Y - 1)].$$

Thus,

$$\left.\frac{d}{dt}\left[\frac{e^{\theta(t+1)} - 1}{(e^\theta - 1)}\right]\right|_{t=0} = \left.\left[\frac{\theta e^{\theta(t+1)}}{(e^\theta - 1)}\right]\right|_{t=0} = \frac{\theta e^\theta}{(e^\theta - 1)} = E(Y).$$

And,

$$\left.\frac{d^2}{dt^2}\left[\frac{e^{\theta(t+1)} - 1}{(e^\theta - 1)}\right]\right|_{t=0} = \left.\frac{d}{dt}\left[\frac{\theta e^{\theta(t+1)}}{(e^\theta - 1)}\right]\right|_{t=0} = \left.\left[\frac{\theta^2e^{\theta(t+1)}}{(e^\theta - 1)}\right]\right|_{t=0} = \frac{\theta^2e^\theta}{(e^\theta - 1)} = E[Y(Y - 1)].$$

Finally,

$$V(Y) = E[Y(Y - 1)] + E(Y) - [E(Y)]^2 = \frac{\theta^2e^\theta}{(e^\theta - 1)} + \frac{\theta e^\theta}{(e^\theta - 1)} - \frac{\theta^2e^{2\theta}}{\left(e^\theta - 1\right)^2} = \frac{\theta e^\theta(e^\theta - \theta - 1)}{(e^\theta - 1)^2}.$$

(c)

$$p_X(x) = \text{pr}(X = x) = \text{pr}[(Y + Z) = x] = \sum_{l=0}^{x-1}\text{pr}(Y = x - l)\text{pr}(Z = l)$$

$$= \sum_{l=0}^{x-1}\frac{(e^\theta - 1)^{-1}\theta^{x-l}}{(x - l)!}\cdot\frac{(\pi\theta)^le^{-\pi\theta}}{l!} = \left[e^{\pi\theta}(e^\theta - 1)\right]^{-1}\theta^x\sum_{l=0}^{x-1}\frac{\pi^l}{l!(x - l)!}$$

$$= \left[e^{\pi\theta}(e^\theta - 1)\right]^{-1}\frac{\theta^x}{x!}\sum_{l=0}^{x-1}C_l^x\pi^l = \left[e^{\pi\theta}(e^\theta - 1)\right]^{-1}\frac{\theta^x}{x!}\left[\sum_{l=0}^xC_l^x\pi^l(1)^{x-l} - \pi^x\right]$$

$$= \left[e^{\pi\theta}(e^\theta - 1)\right]^{-1}\frac{\theta^x}{x!}\left[(\pi + 1)^x - \pi^x\right], \quad x = 1, 2, \ldots, \infty.$$

(d) Since $Z \sim \text{POI}(\pi\theta)$,

$$E(X) = E(Y + Z) = E(Y) + E(Z) = \frac{\theta e^\theta}{(e^\theta - 1)} + \pi\theta = \theta\left[\frac{e^\theta}{(e^\theta - 1)} + \pi\right].$$

And,

$$V(X) = V(Y) + V(Z) = \frac{\theta e^\theta(e^\theta - \theta - 1)}{(e^\theta - 1)^2} + \pi\theta = \theta\left[\frac{e^\theta(e^\theta - \theta - 1)}{(e^\theta - 1)^2} + \pi\right].$$

Solution 3.31*

(a) pr(both kidneys are still functioning at time $t$) $=$
\[
\operatorname{pr}[(X_1 \ge t) \cap (X_2 \ge t)] = \operatorname{pr}(X_1 \ge t)\operatorname{pr}(X_2 \ge t)
= \left[\int_t^{\infty}\alpha e^{-\alpha x}\,dx\right]^2 = e^{-2\alpha t}.
\]

(b) pr(exactly one kidney is functioning at time $t$) $= \operatorname{pr}[(U < t) \cap (Y \ge t)]$. Now, from part (a),
\[
F_U(u) = \operatorname{pr}(U \le u) = 1 - e^{-2\alpha u},
\]
so that
\[
f_U(u) = 2\alpha e^{-2\alpha u}, \quad u > 0.
\]
Hence,
\[
f_{U,Y}(u, y) = f_U(u)f_Y(y \mid U = u) = \left(2\alpha e^{-2\alpha u}\right)\left[\beta e^{-\beta(y-u)}\right],
\quad 0 < u < y < \infty.
\]
Thus,
\[
\operatorname{pr}[(U < t) \cap (Y \ge t)] = \int_0^t\int_t^{\infty}(2\alpha e^{-2\alpha u})[\beta e^{-\beta(y-u)}]\,dy\,du
= \frac{2\alpha}{(\beta - 2\alpha)}\left(e^{-2\alpha t} - e^{-\beta t}\right), \quad t \ge 0.
\]

(c) $F_T(t) = \operatorname{pr}(T \le t) = 1 - \pi_0(t) - \pi_1(t)$, so that
\[
f_T(t) = \frac{d}{dt}[F_T(t)] = \frac{2\alpha\beta}{(\beta - 2\alpha)}\left(e^{-2\alpha t} - e^{-\beta t}\right), \quad t \ge 0.
\]

(d) The marginal density of $Y$ is given by the expression
\[
f_Y(y) = \int_0^y(2\alpha e^{-2\alpha u})[\beta e^{-\beta(y-u)}]\,du
= \frac{2\alpha\beta}{(\beta - 2\alpha)}\left(e^{-2\alpha y} - e^{-\beta y}\right), \quad y \ge 0.
\]
So, as expected, $T$ and $Y$ have exactly the same distribution (i.e., $T = Y$). Finally,
\[
\mathrm{E}(T) = \mathrm{E}(Y) = \mathrm{E}_u[\mathrm{E}(Y \mid U = u)] = \mathrm{E}_u\left(U + \frac{1}{\beta}\right)
= \frac{1}{2\alpha} + \frac{1}{\beta},
\]
and
\[
V(T) = V(Y) = V_u[\mathrm{E}(Y \mid U = u)] + \mathrm{E}_u[V(Y \mid U = u)]
= V_u\left(U + \frac{1}{\beta}\right) + \mathrm{E}_u\left(\frac{1}{\beta^2}\right)
= \frac{1}{4\alpha^2} + \frac{1}{\beta^2}.
\]
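A Monte Carlo sketch of part (d), assuming (as in the solution) that the first failure time is $U$, exponential with rate $2\alpha$, and that, given $U = u$, the additional survival time of the remaining kidney is exponential with rate $\beta$; the rates $\alpha = 0.5$ and $\beta = 2.0$ are arbitrary illustrative values.

import random

random.seed(1)
alpha, beta, nsim = 0.5, 2.0, 200_000
ts = [random.expovariate(2 * alpha) + random.expovariate(beta) for _ in range(nsim)]

mean_t = sum(ts) / nsim
var_t = sum((t - mean_t) ** 2 for t in ts) / (nsim - 1)
print(mean_t, 1 / (2 * alpha) + 1 / beta)        # E(T) = 1/(2 alpha) + 1/beta
print(var_t, 1 / (4 * alpha**2) + 1 / beta**2)   # V(T) = 1/(4 alpha^2) + 1/beta^2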


Solution 3.32∗

(a) Since $\bar{X}$ and $S^2$ are independent random variables, and since $\bar{X} \sim \mathrm{N}(\mu, \sigma^2/n)$, it follows that $\mathrm{E}[T_{(n-1)}] = \sqrt{n}\,\mathrm{E}(\bar{X} - \mu)\mathrm{E}(S^{-1}) = 0$. Thus,
\[
\operatorname{cov}\left[\bar{X}, \frac{\sqrt{n}(\bar{X} - \mu)}{S}\right]
= \mathrm{E}\left\{\bar{X}\left[\frac{\sqrt{n}(\bar{X} - \mu)}{S}\right]\right\}
= \sqrt{n}\,\mathrm{E}[\bar{X}(\bar{X} - \mu)]\mathrm{E}(S^{-1})
= \sqrt{n}\,\mathrm{E}(\bar{X}^2 - \mu^2)\mathrm{E}(S^{-1})
= \sqrt{n}\,V(\bar{X})\mathrm{E}(S^{-1})
= \frac{\sigma^2}{\sqrt{n}}\mathrm{E}(S^{-1}).
\]
Now, since
\[
U = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{(n-1)} = \text{GAMMA}\left[\alpha = 2, \beta = \frac{(n - 1)}{2}\right],
\]
it follows that
\[
\mathrm{E}(U^r) = \int_0^{\infty} u^r\,\frac{u^{\left(\frac{n-1}{2}\right)-1}e^{-u/2}}{\Gamma\left(\frac{n-1}{2}\right)2^{\left(\frac{n-1}{2}\right)}}\,du
= \frac{\Gamma\left(\frac{n-1}{2} + r\right)}{\Gamma\left(\frac{n-1}{2}\right)}2^r, \quad \left(\frac{n - 1}{2}\right) + r > 0.
\]
So,
\[
\mathrm{E}(U^{-1/2}) = \mathrm{E}\left\{\left[\frac{(n - 1)S^2}{\sigma^2}\right]^{-1/2}\right\}
= \frac{\sigma}{\sqrt{n - 1}}\mathrm{E}(S^{-1})
= \frac{\Gamma\left(\frac{n-1}{2} - \frac{1}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}2^{-1/2},
\]
so that
\[
\mathrm{E}(S^{-1}) = \frac{\Gamma\left(\frac{n-2}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}\sqrt{\frac{(n - 1)}{2\sigma^2}}, \quad n > 2.
\]
Thus,
\[
\operatorname{cov}[\bar{X}, T_{(n-1)}] = \sigma\sqrt{\frac{(n - 1)}{2n}}\,\frac{\Gamma\left(\frac{n-2}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}, \quad n > 2.
\]
Now,
\[
V[T_{(n-1)}] = \mathrm{E}\left[T^2_{(n-1)}\right] = \mathrm{E}\left[\frac{n(\bar{X} - \mu)^2}{S^2}\right]
= n\mathrm{E}[(\bar{X} - \mu)^2]\mathrm{E}(S^{-2}) = \sigma^2\mathrm{E}(S^{-2}).
\]
And, since
\[
\mathrm{E}(U^{-1}) = \frac{\sigma^2}{(n - 1)}\mathrm{E}(S^{-2})
= \frac{\Gamma\left(\frac{n-1}{2} - 1\right)}{\Gamma\left(\frac{n-1}{2}\right)}2^{-1} = (n - 3)^{-1}, \quad n > 3,
\]
it follows that
\[
\mathrm{E}(S^{-2}) = \frac{(n - 1)}{(n - 3)\sigma^2},
\]
and hence
\[
V[T_{(n-1)}] = \frac{(n - 1)}{(n - 3)}, \quad n > 3.
\]
Finally,
\[
\operatorname{corr}[\bar{X}, T_{(n-1)}] = \sqrt{\frac{(n - 3)}{2}}\,\frac{\Gamma\left(\frac{n-2}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}, \quad n > 3.
\]
When $n = 4$, $\operatorname{corr}[\bar{X}, T_{(n-1)}] = \sqrt{2/\pi} = 0.798$; for $n = 6$, $\operatorname{corr}[\bar{X}, T_{(n-1)}] = 2\sqrt{2/3\pi} = 0.921$.

(b) Using the stated "large $x$" approximation for $\Gamma(x)$, we have
\[
\operatorname{corr}[\bar{X}, T_{(n-1)}] \approx \sqrt{\frac{(n - 3)}{2}}\,
\frac{\sqrt{2\pi}\,e^{-\left(\frac{n-2}{2}\right)}\left(\frac{n-2}{2}\right)^{\left[\left(\frac{n-2}{2}\right)-\frac{1}{2}\right]}}
{\sqrt{2\pi}\,e^{-\left(\frac{n-1}{2}\right)}\left(\frac{n-1}{2}\right)^{\left[\left(\frac{n-1}{2}\right)-\frac{1}{2}\right]}}
= \left[\frac{e(n - 3)(n - 2)^{(n-3)}}{(n - 1)^{(n-2)}}\right]^{1/2},
\]
so that $\lim_{n\to\infty}\operatorname{corr}[\bar{X}, T_{(n-1)}] = 1$.

As $n \to \infty$, the distribution of $T_{(n-1)}$ becomes that of a standard normal random variable $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$, and the random variable $Z = -\sqrt{n}\mu/\sigma + (\sqrt{n}/\sigma)\bar{X}$ is a straight-line function of the random variable $\bar{X}$.
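The closed-form correlation is easy to evaluate numerically; the sketch below (using log-gamma values for numerical stability) reproduces the quoted values 0.798 and 0.921 and illustrates the convergence to 1:

import math

def corr_xbar_t(n):
    # corr[Xbar, T_(n-1)] = sqrt((n-3)/2) * Gamma((n-2)/2) / Gamma((n-1)/2), n > 3
    return math.sqrt((n - 3) / 2) * math.exp(
        math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2))

for n in (4, 6, 10, 30, 100, 1000):
    print(n, round(corr_xbar_t(n), 4))  # 0.7979 at n = 4, 0.9213 at n = 6, -> 1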

Solution 3.33∗. Let $X$ and $Y$ denote the numbers of ones obtained when the two balanced dice are each tossed $n$ times, and let $Z$ be the number of ones obtained when the unbalanced die is tossed $n$ times. Further, let $U = \min(X, Y)$. Then, $n^*$ is the smallest value of $n$ such that $\operatorname{pr}(Z < U) \ge 0.99$.

Now, for $u = 0, 1, \ldots, n$,
\[
\operatorname{pr}(U = u) = \operatorname{pr}[(X = u) \cap (Y > u)] + \operatorname{pr}[(X > u) \cap (Y = u)] + \operatorname{pr}[(X = u) \cap (Y = u)]
\]
\[
= \operatorname{pr}(X = u)\operatorname{pr}(Y > u) + \operatorname{pr}(X > u)\operatorname{pr}(Y = u) + \operatorname{pr}(X = u)\operatorname{pr}(Y = u)
\]
\[
= 2C_u^n\left(\frac{1}{6}\right)^u\left(\frac{5}{6}\right)^{n-u}\left[\sum_{j=u+1}^{n} C_j^n\left(\frac{1}{6}\right)^j\left(\frac{5}{6}\right)^{n-j}\right]
+ \left[C_u^n\left(\frac{1}{6}\right)^u\left(\frac{5}{6}\right)^{n-u}\right]^2.
\]
Finally, determine $n^*$ as the smallest value of $n$ such that
\[
\operatorname{pr}(Z < U) = \sum_{z=0}^{n-1}\sum_{u=z+1}^{n}\operatorname{pr}(Z = z)\operatorname{pr}(U = u)
= \sum_{z=0}^{n-1}\sum_{u=z+1}^{n} C_z^n\left(\frac{1}{6} - \varepsilon\right)^z\left(\frac{5}{6} + \varepsilon\right)^{n-z}\operatorname{pr}(U = u) \ge 0.99.
\]
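A direct numerical search for $n^*$ is straightforward; the sketch below implements the two displayed formulas, with $\varepsilon = 0.10$ as an assumed illustrative value (the exercise treats $\varepsilon$ as given); smaller $\varepsilon$ simply makes the search run longer.

from math import comb

def pr_Z_less_U(n, eps):
    p = 1 / 6
    b = [comb(n, u) * p**u * (1 - p)**(n - u) for u in range(n + 1)]  # balanced die
    tailX = [0.0] * (n + 2)                 # tailX[u] = pr(X >= u)
    for u in range(n, -1, -1):
        tailX[u] = tailX[u + 1] + b[u]
    pU = [2 * b[u] * tailX[u + 1] + b[u] ** 2 for u in range(n + 1)]  # pr(U = u)
    tailU = [0.0] * (n + 2)                 # tailU[u] = pr(U >= u)
    for u in range(n, -1, -1):
        tailU[u] = tailU[u + 1] + pU[u]
    q = p - eps                             # unbalanced-die probability of a one
    return sum(comb(n, z) * q**z * (1 - q)**(n - z) * tailU[z + 1]
               for z in range(n))

eps, n = 0.10, 1
while pr_Z_less_U(n, eps) < 0.99:
    n += 1
print(n)  # n*, the smallest n with pr(Z < U) >= 0.99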

Solution 3.34

(a)
\[
F_Y(y) = \operatorname{pr}(Y \le y) = \operatorname{pr}[(X_1 - X_2) \le y] = \operatorname{pr}[X_1 \le (X_2 + y)]
= \int_{-\infty}^{\infty}\operatorname{pr}[X_1 \le (x_2 + y) \mid X_2 = x_2]f_{X_2}(x_2)\,dx_2
\]
\[
= \int_{-\infty}^{\infty} e^{-e^{-(x_2+y)}}e^{-e^{-x_2}}e^{-x_2}\,dx_2
= \int_{-\infty}^{\infty} e^{-e^{-x_2}(1+e^{-y})}e^{-x_2}\,dx_2.
\]
Let $u = -e^{-x_2}(1 + e^{-y})$, so that $du = e^{-x_2}(1 + e^{-y})\,dx_2$. So,
\[
F_Y(y) = \int_{-\infty}^{0} e^u(1 + e^{-y})^{-1}\,du
= (1 + e^{-y})^{-1}\int_{-\infty}^{0} e^u\,du
= (1 + e^{-y})^{-1}\left[e^u\right]_{-\infty}^{0}
= (1 + e^{-y})^{-1}, \quad -\infty < y < +\infty.
\]

(b) Let $X_{1(m)}$ denote the largest observation in the first $m$ observations, and let $X_{2(m)}$ denote the largest observation in the second $m$ observations. Then, from part (a), the variable
\[
m\theta[X_{1(m)} - \beta] - m\theta[X_{2(m)} - \beta] = m\theta[X_{1(m)} - X_{2(m)}]
\]
has the CDF $\left[1 + e^{-m\theta(X_{1(m)}-X_{2(m)})}\right]^{-1}$. So,
\[
\operatorname{pr}\left\{\left|m\theta(X_{1(m)} - X_{2(m)})\right| \le k\right\}
= \operatorname{pr}\left\{\theta \le \frac{k}{m\left|X_{1(m)} - X_{2(m)}\right|}\right\}
= \operatorname{pr}\left\{-k \le m\theta\left(X_{1(m)} - X_{2(m)}\right) \le k\right\}
\]
\[
= \frac{1}{(1 + e^{-k})} - \frac{1}{(1 + e^{k})}
= \frac{\left(e^k - 1\right)}{\left(e^k + 1\right)}, \quad k > 0.
\]
So, if $k_{1-\alpha}$ is chosen so that
\[
\frac{\left(e^{k_{1-\alpha}} - 1\right)}{\left(e^{k_{1-\alpha}} + 1\right)} = (1 - \alpha),
\]
then
\[
U = \frac{k_{1-\alpha}}{m\left|X_{1(m)} - X_{2(m)}\right|}.
\]
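Since $(e^k - 1)/(e^k + 1) = (1 - \alpha)$ rearranges to $e^{k_{1-\alpha}} = (2 - \alpha)/\alpha$, the cutoff has the closed form $k_{1-\alpha} = \ln[(2 - \alpha)/\alpha]$; a one-line Python check, with the illustrative level $\alpha = 0.05$:

import math

alpha = 0.05
k = math.log((2 - alpha) / alpha)
print(k)                                      # 3.6636...
print((math.exp(k) - 1) / (math.exp(k) + 1))  # 0.95 = 1 - alpha, as required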

Solution 3.35∗

(a) Clearly, $0 < U_i = F_X(X_i) < 1$. And,
\[
F_{U_i}(u_i) = \operatorname{pr}(U_i \le u_i) = \operatorname{pr}[F_X(X_i) \le u_i]
= \operatorname{pr}\left\{F_X^{-1}[F_X(X_i)] \le F_X^{-1}(u_i)\right\}
= \operatorname{pr}\left[X_i \le F_X^{-1}(u_i)\right]
= F_X\left[F_X^{-1}(u_i)\right] = u_i.
\]
So, since $dF_{U_i}(u_i)/du_i = f_{U_i}(u_i) = 1$, $0 < u_i < 1$, it follows that $U_i = F_X(X_i)$ has a uniform distribution on the interval $(0, 1)$.

(b) Given the result in part (a), it follows that $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ can be considered to be the order statistics based on a random sample $U_1, U_2, \ldots, U_n$ of size $n$ from a uniform distribution on the interval $(0, 1)$. Hence, from the theory of order statistics, it follows directly that
\[
f_{U_{(r)},U_{(s)}}(u_{(r)}, u_{(s)}) = \frac{n!}{(r - 1)!(s - r - 1)!(n - s)!}\,u_{(r)}^{r-1}(u_{(s)} - u_{(r)})^{s-r-1}(1 - u_{(s)})^{n-s},
\quad 0 < u_{(r)} < u_{(s)} < 1.
\]
Now, using the method of transformations, let $V_{rs} \equiv V = [U_{(s)} - U_{(r)}]$ and $W = U_{(r)}$, so that $U_{(s)} = (V + W)$ and $U_{(r)} = W$. Then, the Jacobian $J = 1$, and so
\[
f_{V,W}(v, w) = \frac{n!}{(r - 1)!(s - r - 1)!(n - s)!}\,w^{r-1}v^{s-r-1}(1 - v - w)^{n-s},
\quad 0 < (v + w) < 1.
\]
Then, using the relationship $y = w/(1 - v)$, so that $dy = dw/(1 - v)$, and making use of properties of the beta distribution, we have
\[
f_V(v) = \int_0^{1-v}\frac{n!}{(r - 1)!(s - r - 1)!(n - s)!}\,w^{r-1}v^{s-r-1}(1 - v - w)^{n-s}\,dw
\]
\[
= \int_0^1\frac{n!}{(r - 1)!(s - r - 1)!(n - s)!}\,[(1 - v)y]^{r-1}v^{s-r-1}[(1 - v) - (1 - v)y]^{n-s}(1 - v)\,dy
\]
\[
= v^{s-r-1}(1 - v)^{[(r-1)+(n-s)+1]}\int_0^1\frac{n!}{(r - 1)!(s - r - 1)!(n - s)!}\,y^{r-1}(1 - y)^{n-s}\,dy
\]
\[
= \frac{\Gamma(n + 1)}{\Gamma(s - r)\Gamma(n - s + r + 1)}\,v^{s-r-1}(1 - v)^{n-s+r}, \quad 0 < v < 1,
\]
so that $V_{rs} \sim \text{BETA}(\alpha = s - r, \beta = n - s + r + 1)$.

(c) If $n = 10$, $r = 1$, $s = 10$, and $p = 0.80$, then $f_{V_{rs}}(v) = 90v^8(1 - v)$, $0 < v < 1$, so that
\[
\theta = \operatorname{pr}(V_{1n} \ge 0.80) = \int_{0.80}^1 90v^8(1 - v)\,dv = 0.6242.
\]
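Part (c) can be verified two ways, via the BETA(9, 2) survival function and via the antiderivative $10v^9 - 9v^{10}$ of $90v^8(1 - v)$; a minimal sketch:

from scipy.stats import beta

print(beta.sf(0.80, 9, 2))           # pr(V >= 0.80) = 0.6242...

F = lambda v: 10 * v**9 - 9 * v**10  # antiderivative of 90 v^8 (1 - v)
print(F(1.0) - F(0.80))              # same value by direct integration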

Solution 3.36∗

(a) $V(Y_{ijk}) = V(\beta_j) + V(\gamma_{ij}) + V(\varepsilon_{ijk}) = \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_\varepsilon^2$.

(b) For fixed $i$ and $j$, and for $k \neq k'$, we have
\[
\operatorname{cov}(Y_{ijk}, Y_{ijk'}) = \operatorname{cov}[\mu_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk},\; \mu_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk'}]
= \operatorname{cov}(\beta_j, \beta_j) + \operatorname{cov}(\gamma_{ij}, \gamma_{ij})
= V(\beta_j) + V(\gamma_{ij}) = \sigma_\beta^2 + \sigma_\gamma^2.
\]

(c) For $i \neq i'$, for fixed $j$, and for $k \neq k'$, we have
\[
\operatorname{cov}(Y_{ijk}, Y_{i'jk'}) = \operatorname{cov}[\mu_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk},\; \mu_{i'} + \beta_j + \gamma_{i'j} + \varepsilon_{i'jk'}]
= \operatorname{cov}(\beta_j, \beta_j) = V(\beta_j) = \sigma_\beta^2.
\]

(d) Now,
\[
\bar{Y}_{ij} = n_{ij}^{-1}\sum_{k=1}^{n_{ij}}[\mu_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}]
= \mu_i + \beta_j + \gamma_{ij} + \bar{\varepsilon}_{ij},
\]
where $\bar{\varepsilon}_{ij} = n_{ij}^{-1}\sum_{k=1}^{n_{ij}}\varepsilon_{ijk}$. So,
\[
\mathrm{E}(\bar{Y}_{ij}) = \mu_i \quad \text{and} \quad V(\bar{Y}_{ij}) = \sigma_\beta^2 + \sigma_\gamma^2 + \frac{\sigma_\varepsilon^2}{n_{ij}}.
\]
Also, for $i \neq i'$,
\[
\operatorname{cov}(\bar{Y}_{ij}, \bar{Y}_{i'j}) = \operatorname{cov}[\mu_i + \beta_j + \gamma_{ij} + \bar{\varepsilon}_{ij},\; \mu_{i'} + \beta_j + \gamma_{i'j} + \bar{\varepsilon}_{i'j}]
= \operatorname{cov}(\beta_j, \beta_j) = V(\beta_j) = \sigma_\beta^2.
\]

(e) First,
\[
L = \sum_{i=1}^t a_i\bar{Y}_i = \sum_{i=1}^t a_i\left(c^{-1}\sum_{j=1}^c\bar{Y}_{ij}\right)
= c^{-1}\sum_{i=1}^t a_i\sum_{j=1}^c\bar{Y}_{ij}
= c^{-1}\sum_{i=1}^t a_i\sum_{j=1}^c[\mu_i + \beta_j + \gamma_{ij} + \bar{\varepsilon}_{ij}]
\]
\[
= \sum_{i=1}^t a_i\mu_i + c^{-1}\sum_{i=1}^t\sum_{j=1}^c a_i\gamma_{ij} + c^{-1}\sum_{i=1}^t\sum_{j=1}^c a_i\bar{\varepsilon}_{ij},
\]
since $\left(\sum_{i=1}^t a_i\right)\left(\sum_{j=1}^c\beta_j\right) = 0$. So, we clearly have
\[
\mathrm{E}(L) = \sum_{i=1}^t a_i\mu_i.
\]
And,
\[
V(L) = c^{-2}\sum_{i=1}^t\sum_{j=1}^c a_i^2\sigma_\gamma^2 + c^{-2}\sum_{i=1}^t\sum_{j=1}^c a_i^2\frac{\sigma_\varepsilon^2}{n_{ij}}
= \frac{\sigma_\gamma^2}{c}\sum_{i=1}^t a_i^2 + \frac{\sigma_\varepsilon^2}{c^2}\sum_{i=1}^t a_i^2\sum_{j=1}^c n_{ij}^{-1}.
\]
For the special case when $a_1 = +1$, $a_2 = -1$, $a_3 = a_4 = \cdots = a_t = 0$, we obtain
\[
\mathrm{E}(L) = (\mu_1 - \mu_2),
\]
which is the true difference in average effects for drug therapies 1 and 2; and,
\[
V(L) = \frac{2\sigma_\gamma^2}{c} + \frac{\sigma_\varepsilon^2}{c^2}\left(\sum_{j=1}^c n_{1j}^{-1} + \sum_{j=1}^c n_{2j}^{-1}\right).
\]
The random variable $L = \sum_{i=1}^t a_i\bar{Y}_i$ is called a contrast since $\sum_{i=1}^t a_i = 0$, and $L$ can be used to estimate unbiasedly important comparisons among the set $\{\mu_1, \mu_2, \ldots, \mu_t\}$ of $t$ drug therapy average effects. For example, if $a_1 = +1$, $a_2 = -\frac{1}{2}$, $a_3 = -\frac{1}{2}$, $a_4 = a_5 = \cdots = a_t = 0$, then $\mathrm{E}(L) = \mu_1 - \frac{1}{2}(\mu_2 + \mu_3)$, which is a comparison between the average effect of drug therapy 1 and the mean of the average effects of drug therapies 2 and 3.
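The part (e) variance formula is simple to evaluate; in the sketch below, the variance components and cell sizes $n_{ij}$ are hypothetical illustrative values, not taken from the exercise.

t, c = 4, 3
sigma2_gamma, sigma2_eps = 0.5, 2.0               # hypothetical components
n = [[5, 4, 6], [5, 5, 5], [4, 4, 4], [6, 5, 4]]  # hypothetical n_ij values
a = [1, -1, 0, 0]                                 # contrast comparing mu_1, mu_2

assert sum(a) == 0                                # contrast condition
V_L = (sigma2_gamma / c) * sum(ai**2 for ai in a) \
    + (sigma2_eps / c**2) * sum(a[i]**2 * sum(1 / nij for nij in n[i])
                                for i in range(t))
print(V_L)                                        # V(L) from the formula above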

Solution 3.37∗

(a) Now, $V(S^2) = (n - 1)^{-2}V\left[\sum_{i=1}^n(X_i - \bar{X})^2\right]$. So,
\[
V\left[\sum_{i=1}^n(X_i - \bar{X})^2\right]
= \mathrm{E}\left\{\left[\sum_{i=1}^n(X_i - \bar{X})^2\right]^2\right\} - \left\{\mathrm{E}\left[\sum_{i=1}^n(X_i - \bar{X})^2\right]\right\}^2
= \mathrm{E}\left\{\left[\sum_{i=1}^n(X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]^2\right\} - (n - 1)^2\sigma^4.
\]
Now,
\[
\mathrm{E}\left\{\left[\sum_{i=1}^n(X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]^2\right\}
= \mathrm{E}\left\{\left[\sum_{i=1}^n(X_i - \mu)^2\right]^2\right\} + n^2\mathrm{E}\left[(\bar{X} - \mu)^4\right]
- 2n\mathrm{E}\left[(\bar{X} - \mu)^2\sum_{i=1}^n(X_i - \mu)^2\right].
\]
Now,
\[
\mathrm{E}\left\{\left[\sum_{i=1}^n(X_i - \mu)^2\right]^2\right\}
= \mathrm{E}\left[\sum_{i=1}^n(X_i - \mu)^4 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^n(X_i - \mu)^2(X_j - \mu)^2\right]
= n\mu_4 + n(n - 1)\sigma^4.
\]
And,
\[
\mathrm{E}\left[(\bar{X} - \mu)^4\right]
= \mathrm{E}\left\{\left[\frac{1}{n}\sum_{i=1}^n(X_i - \mu)\right]^4\right\}
= n^{-4}\mathrm{E}\left\{\left[\sum_{i=1}^n(X_i - \mu)\right]^4\right\}
\]
\[
= n^{-4}\mathrm{E}\left\{\sum\sum\cdots\sum\frac{4!}{\left(\prod_{i=1}^n\alpha_i!\right)}\prod_{i=1}^n(X_i - \mu)^{\alpha_i}\right\}
= n^{-4}\sum\sum\cdots\sum\frac{4!}{\left(\prod_{i=1}^n\alpha_i!\right)}\prod_{i=1}^n\mathrm{E}\left[(X_i - \mu)^{\alpha_i}\right],
\]
where the notation $\sum\sum\cdots\sum$ denotes the summation over all nonnegative integer value choices for $\alpha_1, \alpha_2, \ldots, \alpha_n$ such that $\sum_{i=1}^n\alpha_i = 4$.

Noting that $\mathrm{E}[(X_i - \mu)^{\alpha_i}] = 0$ when $\alpha_i = 1$, we only have to consider two types of terms: (i) $\alpha_i = 4$ for some $i$ and $\alpha_j = 0$ for all $j\,(\neq i)$; and, (ii) $\alpha_i = 2$ and $\alpha_j = 2$ for $i \neq j$, and $\alpha_k = 0$ for all $k\,(\neq i$ or $j)$. There are $n$ of the former terms, each with expectation $\mu_4$, and there are $n(n - 1)/2$ of the latter terms, each with expectation $6\sigma^4$. Thus,
\[
\mathrm{E}\left[(\bar{X} - \mu)^4\right]
= n^{-4}\left[n\mu_4 + \frac{n(n - 1)}{2}(6\sigma^4)\right]
= n^{-3}\left[\mu_4 + 3(n - 1)\sigma^4\right].
\]
And,
\[
\mathrm{E}\left[(\bar{X} - \mu)^2\sum_{i=1}^n(X_i - \mu)^2\right]
= \mathrm{E}\left\{\left[\frac{1}{n}\sum_{i=1}^n(X_i - \mu)\right]^2\left[\sum_{i=1}^n(X_i - \mu)^2\right]\right\}
\]
\[
= \frac{1}{n^2}\mathrm{E}\left\{\left[\sum_{i=1}^n(X_i - \mu)^2\right]^2
+ 2\left[\sum_{k=1}^n(X_k - \mu)^2\right]\sum_{i=1}^{n-1}\sum_{j=i+1}^n(X_i - \mu)(X_j - \mu)\right\}
\]
\[
= \frac{1}{n^2}\mathrm{E}\left\{\left[\sum_{i=1}^n(X_i - \mu)^2\right]^2\right\}
= \frac{1}{n^2}\left[n\mu_4 + n(n - 1)\sigma^4\right]
= n^{-1}\left[\mu_4 + (n - 1)\sigma^4\right].
\]
So, we have
\[
V\left[\sum_{i=1}^n(X_i - \bar{X})^2\right]
= \left[n\mu_4 + n(n - 1)\sigma^4\right] + n^2\left\{n^{-3}\left[\mu_4 + 3(n - 1)\sigma^4\right]\right\}
- 2n\left\{n^{-1}\left[\mu_4 + (n - 1)\sigma^4\right]\right\} - (n - 1)^2\sigma^4
\]
\[
= \frac{(n - 1)^2}{n}\mu_4 - \frac{(n - 1)(n - 3)}{n}\sigma^4.
\]
Finally,
\[
V(S^2) = (n - 1)^{-2}V\left[\sum_{i=1}^n(X_i - \bar{X})^2\right]
= \frac{1}{n}\left[\mu_4 - \left(\frac{n - 3}{n - 1}\right)\sigma^4\right].
\]

(b) For the $\text{POI}(\lambda)$ distribution, $\sigma^2 = \lambda$ and $\mu_4 = \lambda(1 + 3\lambda)$, giving
\[
V(S^2) = \frac{1}{n}\left\{[\lambda(1 + 3\lambda)] - \left(\frac{n - 3}{n - 1}\right)\lambda^2\right\}
= \frac{\lambda}{n}\left[1 + \left(\frac{2n}{n - 1}\right)\lambda\right].
\]
For the $\mathrm{N}(\mu, \sigma^2)$ distribution, $\mu_4 = 3\sigma^4$, giving
\[
V(S^2) = \frac{1}{n}\left[3\sigma^4 - \left(\frac{n - 3}{n - 1}\right)\sigma^4\right]
= \frac{2\sigma^4}{(n - 1)}.
\]
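A Monte Carlo sketch checking the Poisson case of part (b); $\lambda = 2$ and $n = 10$ are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 2.0, 10, 200_000
x = rng.poisson(lam, size=(reps, n))
s2 = x.var(axis=1, ddof=1)                      # one S^2 per simulated sample
print(s2.var(ddof=1))                           # simulated V(S^2)
print((lam / n) * (1 + 2 * n * lam / (n - 1)))  # formula above, ~1.089 here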

Solution 3.38∗

(a) First,
\[
\operatorname{cov}(\bar{X}, S^2) = \mathrm{E}\{[\bar{X} - \mathrm{E}(\bar{X})][S^2 - \mathrm{E}(S^2)]\}
= \mathrm{E}(\bar{X}S^2) - \mathrm{E}(\bar{X})\mathrm{E}(S^2)
= \frac{1}{n(n - 1)}\mathrm{E}\left[\left(\sum_{i=1}^n X_i\right)\left(\sum_{j=1}^n(X_j - \bar{X})^2\right)\right] - \mu\sigma^2.
\]
Now,
\[
\mathrm{E}\left[\left(\sum_{i=1}^n X_i\right)\left(\sum_{j=1}^n(X_j - \bar{X})^2\right)\right]
= \mathrm{E}\left[\sum_{i=1}^n X_i(X_i - \bar{X})^2\right] + \mathrm{E}\left[\sum_{\text{all } i \neq j} X_i(X_j - \bar{X})^2\right].
\]
And,
\[
X_i(X_i - \bar{X})^2 = (X_i - \mu)[(X_i - \mu) - (\bar{X} - \mu)]^2 + \mu(X_i - \bar{X})^2
\]
\[
= (X_i - \mu)[(X_i - \mu)^2 - 2(\bar{X} - \mu)(X_i - \mu) + (\bar{X} - \mu)^2] + \mu(X_i - \bar{X})^2
\]
\[
= (X_i - \mu)^3 - 2(\bar{X} - \mu)(X_i - \mu)^2 + (\bar{X} - \mu)^2(X_i - \mu) + \mu(X_i - \bar{X})^2
\]
\[
= (X_i - \mu)^3 - \left[\frac{2}{n}\sum_{l=1}^n(X_l - \mu)\right](X_i - \mu)^2
+ \left[\frac{1}{n}\sum_{l=1}^n(X_l - \mu)\right]^2(X_i - \mu) + \mu(X_i - \bar{X})^2
\]
\[
= (X_i - \mu)^3 - \frac{2}{n}(X_i - \mu)^3 - \frac{2}{n}(X_i - \mu)^2\sum_{\text{all } l(\neq i)}(X_l - \mu)
+ \frac{1}{n^2}(X_i - \mu)^3 + \frac{1}{n^2}(X_i - \mu)\sum_{\text{all } l(\neq i)}(X_l - \mu)^2
+ \frac{1}{n^2}(X_i - \mu)\sum_{\text{all } l \neq l'}(X_l - \mu)(X_{l'} - \mu) + \mu(X_i - \bar{X})^2.
\]
Finally,
\[
\mathrm{E}\left[\sum_{i=1}^n X_i(X_i - \bar{X})^2\right]
= n\mu_3 - 2\mu_3 - 0 + \frac{\mu_3}{n} + 0 + 0 + \mu(n - 1)\sigma^2
= \left[\frac{(n - 1)^2}{n}\right]\mu_3 + (n - 1)\mu\sigma^2.
\]
Also, for $i \neq j$,
\[
X_i(X_j - \bar{X})^2 = (X_i - \mu)[(X_j - \mu) - (\bar{X} - \mu)]^2 + \mu(X_j - \bar{X})^2
\]
\[
= (X_i - \mu)(X_j - \mu)^2 - 2(\bar{X} - \mu)(X_i - \mu)(X_j - \mu) + (\bar{X} - \mu)^2(X_i - \mu) + \mu(X_j - \bar{X})^2
\]
\[
= (X_i - \mu)(X_j - \mu)^2 - \frac{2}{n}(X_i - \mu)(X_j - \mu)\sum_{l=1}^n(X_l - \mu)
+ \frac{1}{n^2}\left[\sum_{l=1}^n(X_l - \mu)^2 + \sum_{\text{all } l \neq l'}(X_l - \mu)(X_{l'} - \mu)\right](X_i - \mu) + \mu(X_j - \bar{X})^2
\]
\[
= (X_i - \mu)(X_j - \mu)^2 - \frac{2}{n}(X_i - \mu)(X_j - \mu)\sum_{l=1}^n(X_l - \mu)
+ \frac{1}{n^2}(X_i - \mu)^3 + \frac{1}{n^2}(X_i - \mu)\sum_{\text{all } l(\neq i)}(X_l - \mu)^2
+ \frac{1}{n^2}(X_i - \mu)\sum_{\text{all } l \neq l'}(X_l - \mu)(X_{l'} - \mu) + \mu(X_j - \bar{X})^2.
\]
Hence, we have
\[
\mathrm{E}\left[\sum_{\text{all } i \neq j} X_i(X_j - \bar{X})^2\right]
= 0 - 0 + \frac{1}{n^2}[n(n - 1)\mu_3] + 0 + 0 + \mu(n - 1)^2\sigma^2
= \left(\frac{n - 1}{n}\right)\mu_3 + (n - 1)^2\mu\sigma^2.
\]
Finally, we have
\[
\operatorname{cov}(\bar{X}, S^2) = \frac{1}{n(n - 1)}\left\{\left[\frac{(n - 1)^2}{n}\right]\mu_3 + (n - 1)\mu\sigma^2
+ \left(\frac{n - 1}{n}\right)\mu_3 + (n - 1)^2\mu\sigma^2\right\} - \mu\sigma^2
= \mu_3/n.
\]

(b) The joint distribution of $X_1$ and $X_2$ is equal to
\[
p_{X_1,X_2}(x_1, x_2) = p_{X_1}(x_1)p_{X_2}(x_2)
= \left(\frac{1}{4}\right)^{|x_1|}\left(\frac{1}{2}\right)^{1-|x_1|}\left(\frac{1}{4}\right)^{|x_2|}\left(\frac{1}{2}\right)^{1-|x_2|}
= \left(\frac{1}{2}\right)^{2+|x_1|+|x_2|},
\]
$x_1 = -1, 0, 1$ and $x_2 = -1, 0, 1$.

Hence, it follows directly that the following pairs of $(\bar{X}, S^2)$ values occur with the following probabilities: $(-1, 0)$ with probability 1/16, $(-1/2, 1/2)$ with probability 1/4, $(0, 0)$ with probability 1/4, $(0, 2)$ with probability 1/8, $(1/2, 1/2)$ with probability 1/4, and $(1, 0)$ with probability 1/16.

Hence, it is easy to show by direct computation that $\operatorname{cov}(\bar{X}, S^2) = 0$. However, since
\[
\operatorname{pr}(S^2 = 0 \mid \bar{X} = 1) = 1 \neq \operatorname{pr}(S^2 = 0) = \tfrac{3}{8},
\]
it follows that $\bar{X}$ and $S^2$ are dependent random variables. Clearly, $p_X(x)$ is a discrete distribution that is symmetric about $\mathrm{E}(X) = 0$, so that $\mu_3 = 0$. Thus, it follows from part (a) that, as shown directly, $\operatorname{cov}(\bar{X}, S^2) = 0$. More generally, the random variables $\bar{X}$ and $S^2$ are independent when selecting a random sample from a normally distributed parent population, but are generally dependent when selecting a random sample from a nonnormal parent population.
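The $n = 2$ example in part (b) can be enumerated exactly, confirming that $\operatorname{cov}(\bar{X}, S^2) = 0$ while $\bar{X}$ and $S^2$ remain dependent; a minimal sketch:

from itertools import product

p = {-1: 0.25, 0: 0.5, 1: 0.25}                   # pmf of each X_i
joint = {}
for x1, x2 in product(p, repeat=2):
    xbar, s2 = (x1 + x2) / 2, (x1 - x2) ** 2 / 2  # S^2 with n = 2
    joint[(xbar, s2)] = joint.get((xbar, s2), 0) + p[x1] * p[x2]

E = lambda f: sum(f(xb, s2) * pr for (xb, s2), pr in joint.items())
print(E(lambda xb, s2: xb * s2) - E(lambda xb, s2: xb) * E(lambda xb, s2: s2))  # 0.0

pr_s2_0 = sum(pr for (xb, s2), pr in joint.items() if s2 == 0)
pr_s2_0_given_xbar_1 = joint[(1.0, 0.0)] / sum(
    pr for (xb, s2), pr in joint.items() if xb == 1.0)
print(pr_s2_0, pr_s2_0_given_xbar_1)  # 0.375 versus 1.0, so dependent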


4
Estimation Theory

4.1 Concepts and Notation

4.1.1 Point Estimation of Population Parameters

Let the random variables $X_1, X_2, \ldots, X_n$ constitute a sample of size $n$ from some population with properties depending on a row vector $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$ of $p$ unknown parameters, where the parameter space is the set $\Omega$ of all possible values of $\theta$. In the most general situation, the $n$ random variables $X_1, X_2, \ldots, X_n$ are allowed to be mutually dependent and to have different distributions (e.g., different means and different variances).

A point estimator or a statistic is any scalar function $U(X_1, X_2, \ldots, X_n) \equiv U(\boldsymbol{X})$ of the random variables $X_1, X_2, \ldots, X_n$, but not of $\theta$. A point estimator or statistic is itself a random variable since it is a function of the random vector $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$. In contrast, the corresponding point estimate or observed statistic $U(x_1, x_2, \ldots, x_n) \equiv U(\boldsymbol{x})$ is the realized (or observed) numerical value of the point estimator or statistic that is computed using the realized (or observed) numerical values $x_1, x_2, \ldots, x_n$ of $X_1, X_2, \ldots, X_n$ for the particular sample obtained.

Some popular methods for obtaining a row vector $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_p)$ of point estimators of the elements of the row vector $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$, where $\hat{\theta}_j \equiv \hat{\theta}_j(\boldsymbol{X})$ for $j = 1, 2, \ldots, p$, are the following:

4.1.1.1 Method of Moments (MM)

For $j = 1, 2, \ldots, p$, let
\[
M_j = \frac{1}{n}\sum_{i=1}^n X_i^j \quad \text{and} \quad \mathrm{E}(M_j) = \frac{1}{n}\sum_{i=1}^n\mathrm{E}(X_i^j),
\]
where $\mathrm{E}(M_j)$, $j = 1, 2, \ldots, p$, is a function of the elements of $\theta$. Then, $\hat{\theta}_{mm}$, the MM estimator of $\theta$, is obtained as the solution of the $p$ equations
\[
M_j = \mathrm{E}(M_j), \quad j = 1, 2, \ldots, p.
\]


4.1.1.2 Unweighted Least Squares (ULS)

Let $Q_u = \sum_{i=1}^n[X_i - \mathrm{E}(X_i)]^2$. Then, $\hat{\theta}_{uls}$, the ULS estimator of $\theta$, is chosen to minimize $Q_u$ and is defined as the solution of the $p$ equations
\[
\frac{\partial Q_u}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, p.
\]

4.1.1.3 Weighted Least Squares (WLS)

Let $Q_w = \sum_{i=1}^n w_i[X_i - \mathrm{E}(X_i)]^2$, where $w_1, w_2, \ldots, w_n$ are weights. Then, $\hat{\theta}_{wls}$, the WLS estimator of $\theta$, is chosen to minimize $Q_w$ and is defined as the solution of the $p$ equations
\[
\frac{\partial Q_w}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, p.
\]

4.1.1.4 Maximum Likelihood (ML)

Let $L(\boldsymbol{x}; \theta)$ denote the likelihood function, which is often simply the joint distribution of the random variables $X_1, X_2, \ldots, X_n$. Then, $\hat{\theta}_{ml}$, the ML estimator (MLE) of $\theta$, is chosen to maximize $L(\boldsymbol{x}; \theta)$ and is defined as the solution of the $p$ equations
\[
\frac{\partial\ln L(\boldsymbol{x}; \theta)}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, p.
\]
If $\tau(\theta)$ is a scalar function of $\theta$, then $\tau(\hat{\theta}_{ml})$ is the MLE of $\tau(\theta)$; this is known as the invariance property of MLEs.

4.1.2 Data Reduction and Joint Sufficiency

The goal of any statistical analysis is to quantify the information contained in a sample of size $n$ by making valid and precise statistical inferences using the smallest possible number of point estimators or statistics. This data reduction goal leads to the concept of joint sufficiency.

4.1.2.1 Joint Sufficiency

The statistics $U_1(\boldsymbol{X}), U_2(\boldsymbol{X}), \ldots, U_k(\boldsymbol{X})$, $k \ge p$, are jointly sufficient for the parameter vector $\theta$ if and only if the conditional distribution of $\boldsymbol{X}$ given $U_1(\boldsymbol{X}) = U_1(\boldsymbol{x}), U_2(\boldsymbol{X}) = U_2(\boldsymbol{x}), \ldots, U_k(\boldsymbol{X}) = U_k(\boldsymbol{x})$ does not in any way depend on $\theta$. More specifically, the phrase "in any way" means that the conditional distribution of $\boldsymbol{X}$, including the domain of $\boldsymbol{X}$, given the $k$ sufficient statistics is not a function of $\theta$. In other words, the jointly sufficient statistics $U_1(\boldsymbol{X}), U_2(\boldsymbol{X}), \ldots, U_k(\boldsymbol{X})$ utilize all the information about $\theta$ that is contained in the sample $\boldsymbol{X}$.


4.1.2.2 Factorization Theorem

To demonstrate joint sufficiency, the Factorization Theorem (Halmos and Savage, 1949) is quite useful: Let $\boldsymbol{X}$ be a discrete or continuous random vector with distribution $L(\boldsymbol{x}; \theta)$. Then, $U_1(\boldsymbol{X}), U_2(\boldsymbol{X}), \ldots, U_k(\boldsymbol{X})$ are jointly sufficient for $\theta$ if and only if there are nonnegative functions $g[U_1(\boldsymbol{x}), U_2(\boldsymbol{x}), \ldots, U_k(\boldsymbol{x}); \theta]$ and $h(\boldsymbol{x})$ such that
\[
L(\boldsymbol{x}; \theta) = g[U_1(\boldsymbol{x}), U_2(\boldsymbol{x}), \ldots, U_k(\boldsymbol{x}); \theta]h(\boldsymbol{x}),
\]
where, given $U_1(\boldsymbol{X}) = U_1(\boldsymbol{x}), U_2(\boldsymbol{X}) = U_2(\boldsymbol{x}), \ldots, U_k(\boldsymbol{X}) = U_k(\boldsymbol{x})$, the function $h(\boldsymbol{x})$ in no way depends on $\theta$. Also, any one-to-one function of a sufficient statistic is also a sufficient statistic.

As an important example, a family $F_d = \{p_X(x; \theta), \theta \in \Omega\}$ of discrete probability distributions is a member of the exponential family of distributions if $p_X(x; \theta)$ can be written in the general form
\[
p_X(x; \theta) = h(x)b(\theta)e^{\sum_{j=1}^k w_j(\theta)v_j(x)},
\]
where $h(x) \ge 0$ does not in any way depend on $\theta$, $b(\theta) \ge 0$ does not depend on $x$, $w_1(\theta), w_2(\theta), \ldots, w_k(\theta)$ are real-valued functions of $\theta$ but not of $x$, and $v_1(x), v_2(x), \ldots, v_k(x)$ are real-valued functions of $x$ but not of $\theta$. Then, if $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from $p_X(x; \theta)$, so that $p_{\boldsymbol{X}}(\boldsymbol{x}; \theta) = \prod_{i=1}^n p_X(x_i; \theta)$, it follows that
\[
p_{\boldsymbol{X}}(\boldsymbol{x}; \theta) = \left\{[b(\theta)]^n e^{\sum_{j=1}^k w_j(\theta)\left[\sum_{i=1}^n v_j(x_i)\right]}\right\}\left\{\prod_{i=1}^n h(x_i)\right\};
\]
so, by the Factorization Theorem, the $k$ statistics $U_j(\boldsymbol{X}) = \sum_{i=1}^n v_j(X_i)$, $j = 1, 2, \ldots, k$, are jointly sufficient for $\theta$. The above results also hold when considering a family $F_c = \{f_X(x; \theta), \theta \in \Omega\}$ of continuous probability distributions. Many important families of distributions are members of the exponential family; these include the binomial, Poisson, and negative binomial families in the discrete case, and the normal, gamma, and beta families in the continuous case.

4.1.3 Methods for Evaluating the Properties of a Point Estimator

For now, consider the special case of one unknown parameter θ.

4.1.3.1 Mean-Squared Error (MSE)

The mean-squared error of $\hat{\theta}$ as an estimator of the parameter $\theta$ is defined as
\[
\text{MSE}(\hat{\theta}, \theta) = \mathrm{E}[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + [\mathrm{E}(\hat{\theta}) - \theta]^2,
\]
where $V(\hat{\theta})$ is the variance of $\hat{\theta}$ and $[\mathrm{E}(\hat{\theta}) - \theta]^2$ is the squared bias of $\hat{\theta}$ as an estimator of the parameter $\theta$. An estimator with small MSE has both a small variance and a small squared bias.

Using MSE as the criterion for choosing among a class of possible estimators of $\theta$ is problematic because this class is too large. Hence, it is common practice to limit the class of possible estimators of $\theta$ to those estimators that are unbiased estimators of $\theta$. More formally, $\hat{\theta}$ is an unbiased estimator of the parameter $\theta$ if $\mathrm{E}(\hat{\theta}) = \theta$ for all $\theta \in \Omega$. Then, if $\hat{\theta}$ is an unbiased estimator of $\theta$, we have $\text{MSE}(\hat{\theta}, \theta) = V(\hat{\theta})$, so that the criterion for choosing among competing unbiased estimators of $\theta$ is based solely on variance considerations.

4.1.3.2 Cramér–Rao Lower Bound (CRLB)

Let $L(\boldsymbol{x}; \theta)$ denote the distribution of the random vector $\boldsymbol{X}$, and let $\hat{\theta}$ be any unbiased estimator of the parameter $\theta$. Then, under certain mathematical regularity conditions, it can be shown (Rao, 1945; Cramér, 1946) that
\[
V(\hat{\theta}) \ge \frac{1}{\mathrm{E}_x\left[\left(\partial\ln L(\boldsymbol{x}; \theta)/\partial\theta\right)^2\right]}
= \frac{1}{-\mathrm{E}_x\left[\partial^2\ln L(\boldsymbol{x}; \theta)/\partial\theta^2\right]}.
\]
In the important special case when $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the discrete probability distribution $p_X(x; \theta)$, so that $L(\boldsymbol{x}; \theta) = \prod_{i=1}^n p_X(x_i; \theta)$, then we obtain
\[
V(\hat{\theta}) \ge \frac{1}{n\mathrm{E}_x\left\{\left(\partial\ln[p_X(x; \theta)]/\partial\theta\right)^2\right\}}
= \frac{1}{-n\mathrm{E}_x\left\{\partial^2\ln[p_X(x; \theta)]/\partial\theta^2\right\}}.
\]
A completely analogous result holds when $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the density function $f_X(x; \theta)$. For further discussion, see Lehmann (1983).

4.1.3.3 Efficiency

The efficiency of any unbiased estimator $\hat{\theta}$ of $\theta$ relative to the CRLB is defined as
\[
\text{EFF}(\hat{\theta}, \theta) = \frac{\text{CRLB}}{V(\hat{\theta})}, \quad 0 \le \text{EFF}(\hat{\theta}, \theta) \le 1,
\]
and the corresponding asymptotic efficiency is $\lim_{n\to\infty}\text{EFF}(\hat{\theta}, \theta)$.

There are situations when no unbiased estimator of $\theta$ achieves the CRLB. In such a situation, we can utilize the Rao–Blackwell Theorem (Rao, 1945; Blackwell, 1947) to aid in the search for that unbiased estimator with the smallest variance (i.e., the minimum variance unbiased estimator or MVUE).

First, we need to introduce the concept of a complete sufficient statistic:


4.1.3.4 Completeness

The family $F_u = \{p_U(u; \theta), \theta \in \Omega\}$, or $F_u = \{f_U(u; \theta), \theta \in \Omega\}$, for the sufficient statistic $U$ is called complete (or, equivalently, $U$ is a complete sufficient statistic) if the condition $\mathrm{E}[g(U)] = 0$ for all $\theta \in \Omega$ implies that $\operatorname{pr}[g(U) = 0] = 1$ for all $\theta \in \Omega$.

As an important special case, for an exponential family with $U_j(\boldsymbol{X}) = \sum_{i=1}^n v_j(X_i)$ for $j = 1, 2, \ldots, k$, the vector of sufficient statistics
\[
U(\boldsymbol{X}) = [U_1(\boldsymbol{X}), U_2(\boldsymbol{X}), \ldots, U_k(\boldsymbol{X})]
\]
is complete if $\{w_1(\theta), w_2(\theta), \ldots, w_k(\theta) : \theta \in \Omega\}$ contains an open set in $\mathbb{R}^k$.

4.1.3.5 Rao–Blackwell Theorem

Let $U^* \equiv U^*(\boldsymbol{X})$ be any unbiased point estimator of $\theta$, and let $U \equiv U(\boldsymbol{X})$ be a sufficient statistic for $\theta$. Then, $\hat{\theta} = \mathrm{E}(U^* \mid U = u)$ is an unbiased point estimator of $\theta$, and $V(\hat{\theta}) \le V(U^*)$. If $U$ is a complete sufficient statistic for $\theta$, then $\hat{\theta}$ is the unique (with probability one) MVUE of $\theta$.

It is important to emphasize that the variance of the MVUE of $\theta$ may not achieve the CRLB.

4.1.4 Interval Estimation of Population Parameters

4.1.4.1 Exact Confidence Intervals

An exact $100(1 - \alpha)\%$ confidence interval (CI) for a parameter $\theta$ involves two random variables, $L$ (called the lower limit) and $U$ (called the upper limit), defined so that
\[
\operatorname{pr}(L < \theta < U) = (1 - \alpha),
\]
where typically $0 < \alpha \le 0.10$.

The construction of exact CIs often involves the properties of statistics based on random samples from normal populations. Some illustrations are as follows.

4.1.4.2 Exact CI for the Mean of a Normal Distribution

Let $X_1, X_2, \ldots, X_n$ constitute a random sample from a $\mathrm{N}(\mu, \sigma^2)$ parent population. The sample mean is $\bar{X} = n^{-1}\sum_{i=1}^n X_i$ and the sample variance is $S^2 = (n - 1)^{-1}\sum_{i=1}^n(X_i - \bar{X})^2$. Then,
\[
\bar{X} \sim \mathrm{N}\left(\mu, \frac{\sigma^2}{n}\right), \qquad
\frac{(n - 1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1},
\]
and $\bar{X}$ and $S^2$ are independent random variables.

In general, if $Z \sim \mathrm{N}(0, 1)$, $U \sim \chi^2_\nu$, and $Z$ and $U$ are independent random variables, then the random variable $T_\nu = Z/\sqrt{U/\nu} \sim t_\nu$; that is, $T_\nu$ has a $t$-distribution with $\nu$ degrees of freedom (df). Thus, the random variable
\[
T_{n-1} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{[(n - 1)S^2/\sigma^2]/(n - 1)}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.
\]
With $t_{n-1,1-\alpha/2}$ defined so that $\operatorname{pr}(T_{n-1} < t_{n-1,1-\alpha/2}) = 1 - \alpha/2$, we then have
\[
(1 - \alpha) = \operatorname{pr}(-t_{n-1,1-\alpha/2} < T_{n-1} < t_{n-1,1-\alpha/2})
= \operatorname{pr}\left[-t_{n-1,1-\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{n-1,1-\alpha/2}\right]
= \operatorname{pr}\left[\bar{X} - t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}\right].
\]
Thus,
\[
L = \bar{X} - t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}} \quad \text{and} \quad U = \bar{X} + t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}},
\]
giving
\[
\bar{X} \pm t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}
\]
as the exact $100(1 - \alpha)\%$ CI for $\mu$ based on a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from a $\mathrm{N}(\mu, \sigma^2)$ parent population.
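As a computational aside (not part of the original text), the sketch below evaluates this interval for an arbitrary illustrative sample:

import numpy as np
from scipy.stats import t

x = np.array([4.1, 5.0, 3.8, 4.6, 5.2, 4.4])  # illustrative data
n, alpha = len(x), 0.05
half = t.ppf(1 - alpha / 2, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
print(x.mean() - half, x.mean() + half)       # exact 95% CI for mu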

4.1.4.3 Exact CI for a Linear Combination of Means of Normal Distributions

More generally, for $i = 1, 2, \ldots, k$, let $X_{i1}, X_{i2}, \ldots, X_{in_i}$ constitute a random sample of size $n_i$ from a $\mathrm{N}(\mu_i, \sigma_i^2)$ parent population. Then,

i. For $i = 1, 2, \ldots, k$, $\bar{X}_i = n_i^{-1}\sum_{j=1}^{n_i} X_{ij} \sim \mathrm{N}\left(\mu_i, \frac{\sigma_i^2}{n_i}\right)$;

ii. For $i = 1, 2, \ldots, k$, $\frac{(n_i - 1)S_i^2}{\sigma_i^2} = \frac{\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2}{\sigma_i^2} \sim \chi^2_{n_i-1}$;

iii. The $2k$ random variables $\{\bar{X}_i, S_i^2\}_{i=1}^k$ are mutually independent.

Now, assuming $\sigma_i^2 = \sigma^2$ for all $i$ (i.e., assuming variance homogeneity), if $c_1, c_2, \ldots, c_k$ are known constants, then the random variable
\[
\sum_{i=1}^k c_i\bar{X}_i \sim \mathrm{N}\left[\sum_{i=1}^k c_i\mu_i,\; \sigma^2\left(\sum_{i=1}^k\frac{c_i^2}{n_i}\right)\right];
\]
and, with $N = \sum_{i=1}^k n_i$, the random variable
\[
\frac{\sum_{i=1}^k(n_i - 1)S_i^2}{\sigma^2} = \frac{\sum_{i=1}^k\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2}{\sigma^2} \sim \chi^2_{N-k}.
\]
Thus, the random variable
\[
T_{N-k} = \frac{\sum_{i=1}^k c_i\bar{X}_i - \sum_{i=1}^k c_i\mu_i}{S_p\sqrt{\sum_{i=1}^k\frac{c_i^2}{n_i}}} \sim t_{N-k},
\]
where the pooled sample variance is $S_p^2 = \sum_{i=1}^k(n_i - 1)S_i^2/(N - k)$.

This gives
\[
\sum_{i=1}^k c_i\bar{X}_i \pm t_{N-k,1-\alpha/2}\,S_p\sqrt{\sum_{i=1}^k\frac{c_i^2}{n_i}}
\]
as the exact $100(1 - \alpha)\%$ CI for the parameter $\sum_{i=1}^k c_i\mu_i$.

In the special case when $k = 2$, $c_1 = +1$, and $c_2 = -1$, we obtain the well-known two-sample CI for $(\mu_1 - \mu_2)$, namely,
\[
(\bar{X}_1 - \bar{X}_2) \pm t_{n_1+n_2-2,1-\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.
\]

4.1.4.4 Exact CI for the Variance of a Normal Distribution

For $i = 1, 2, \ldots, k$, since $(n_i - 1)S_i^2/\sigma_i^2 \sim \chi^2_{n_i-1}$, we have
\[
(1 - \alpha) = \operatorname{pr}\left[\chi^2_{n_i-1,\alpha/2} < \frac{(n_i - 1)S_i^2}{\sigma_i^2} < \chi^2_{n_i-1,1-\alpha/2}\right]
= \operatorname{pr}(L < \sigma_i^2 < U),
\]
where
\[
L = \frac{(n_i - 1)S_i^2}{\chi^2_{n_i-1,1-\alpha/2}} \quad \text{and} \quad U = \frac{(n_i - 1)S_i^2}{\chi^2_{n_i-1,\alpha/2}},
\]
and where $\chi^2_{n_i-1,\alpha/2}$ and $\chi^2_{n_i-1,1-\alpha/2}$ are, respectively, the $100(\alpha/2)$ and $100(1 - \alpha/2)$ percentiles of the $\chi^2_{n_i-1}$ distribution.

4.1.4.5 Exact CI for the Ratio of Variances of Two Normal Distributions

In general, if $U_1 \sim \chi^2_{\nu_1}$, $U_2 \sim \chi^2_{\nu_2}$, and $U_1$ and $U_2$ are independent random variables, then the random variable
\[
F_{\nu_1,\nu_2} = \frac{U_1/\nu_1}{U_2/\nu_2} \sim f_{\nu_1,\nu_2};
\]
that is, $F_{\nu_1,\nu_2}$ follows an $f$-distribution with $\nu_1$ numerator df and $\nu_2$ denominator df. As an example, when $k = 2$, the random variable
\[
F_{n_1-1,n_2-1} = \frac{\left\{[(n_1 - 1)S_1^2]/\sigma_1^2\right\}/(n_1 - 1)}{\left\{[(n_2 - 1)S_2^2]/\sigma_2^2\right\}/(n_2 - 1)}
= \left(\frac{S_1^2}{S_2^2}\right)\left(\frac{\sigma_2^2}{\sigma_1^2}\right) \sim f_{n_1-1,n_2-1}.
\]
So, since $f_{n_1-1,n_2-1,\alpha/2} = f^{-1}_{n_2-1,n_1-1,1-\alpha/2}$, we have
\[
(1 - \alpha) = \operatorname{pr}\left[f^{-1}_{n_2-1,n_1-1,1-\alpha/2} < \left(\frac{S_1^2}{S_2^2}\right)\left(\frac{\sigma_2^2}{\sigma_1^2}\right) < f_{n_1-1,n_2-1,1-\alpha/2}\right]
= \operatorname{pr}\left[L < \left(\frac{\sigma_2^2}{\sigma_1^2}\right) < U\right],
\]
where
\[
L = f^{-1}_{n_2-1,n_1-1,1-\alpha/2}\left(\frac{S_2^2}{S_1^2}\right) \quad \text{and} \quad
U = f_{n_1-1,n_2-1,1-\alpha/2}\left(\frac{S_2^2}{S_1^2}\right),
\]
and where $f_{n_1-1,n_2-1,1-\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of the $f$-distribution with $(n_1 - 1)$ numerator df and $(n_2 - 1)$ denominator df.

4.1.4.6 Large-Sample Approximate CIs

By an approximate CI for a parameter $\theta$, we mean that the random variables $L$ and $U$ satisfy
\[
\operatorname{pr}(L < \theta < U) \approx (1 - \alpha),
\]
where typically $0 < \alpha \le 0.10$.


The concepts of convergence in distribution (discussed in the front material for Chapter 3: Multivariate Distribution Theory) and consistency, coupled with the use of Slutsky's Theorem (see Serfling, 2002), are typically used for the development of ML-based approximate CIs.

4.1.4.7 Consistency

A point estimator $\hat{\theta}$ is a consistent estimator of a parameter $\theta$ if, for every $\varepsilon > 0$,
\[
\lim_{n\to\infty}\operatorname{pr}(|\hat{\theta} - \theta| > \varepsilon) = 0.
\]
In this case, we say that $\hat{\theta}$ converges in probability to $\theta$, and we write $\hat{\theta} \xrightarrow{P} \theta$. Two sufficient conditions so that $\hat{\theta} \xrightarrow{P} \theta$ are
\[
\lim_{n\to\infty}\mathrm{E}(\hat{\theta}) = \theta \quad \text{and} \quad \lim_{n\to\infty} V(\hat{\theta}) = 0.
\]

4.1.4.8 Slutsky's Theorem

If $V_n \xrightarrow{P} c$, where $c$ is a constant, and if $W_n \xrightarrow{D} W$, then
\[
V_nW_n \xrightarrow{D} cW \quad \text{and} \quad (V_n + W_n) \xrightarrow{D} (c + W).
\]

To develop ML-based large-sample approximate CIs, we make use of the following properties of the MLE $\hat{\theta}_{ml} \equiv \hat{\theta}$ of $\theta$, assuming $L(\boldsymbol{x}; \theta)$ is the correct likelihood function and assuming that certain regularity conditions hold:

i. For $j = 1, 2, \ldots, p$, $\hat{\theta}_j$ is a consistent estimator of $\theta_j$. More generally, if the scalar function $\tau(\theta)$ is a continuous function of $\theta$, then $\tau(\hat{\theta})$ is a consistent estimator of $\tau(\theta)$.

ii. $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{D} \text{MVN}_p[\boldsymbol{0}, n\boldsymbol{I}^{-1}(\theta)]$, where $\boldsymbol{I}(\theta)$ is the $(p \times p)$ expected information matrix, with $(j, j')$ element equal to
\[
-\mathrm{E}_x\left[\frac{\partial^2\ln L(\boldsymbol{x}; \theta)}{\partial\theta_j\partial\theta_{j'}}\right],
\]
and where $\boldsymbol{I}^{-1}(\theta)$ is the large-sample covariance matrix of $\hat{\theta}$ based on expected information. In particular, the $(j, j')$ element of $\boldsymbol{I}^{-1}(\theta)$ is denoted $v_{jj'}(\theta) = \operatorname{cov}(\hat{\theta}_j, \hat{\theta}_{j'})$, $j = 1, 2, \ldots, p$ and $j' = 1, 2, \ldots, p$.


4.1.4.9 Construction of ML-Based CIs

As an illustration, properties (i) and (ii) will now be used to construct a large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the parameter $\theta_j$.

First, with the $(j, j)$ diagonal element $v_{jj}(\theta)$ of $\boldsymbol{I}^{-1}(\theta)$ being the large-sample variance of $\hat{\theta}_j$ based on expected information, it follows that
\[
\frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\theta)}} \xrightarrow{D} \mathrm{N}(0, 1) \quad \text{as } n \to \infty.
\]
Then, with $\boldsymbol{I}^{-1}(\hat{\theta})$ denoting the estimated large-sample covariance matrix of $\hat{\theta}$ based on expected information, and with the $(j, j)$ diagonal element $v_{jj}(\hat{\theta})$ of $\boldsymbol{I}^{-1}(\hat{\theta})$ being the estimated large-sample variance of $\hat{\theta}_j$ based on expected information, it follows by Slutsky's Theorem that
\[
\frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\hat{\theta})}} = \sqrt{\frac{v_{jj}(\theta)}{v_{jj}(\hat{\theta})}}\left[\frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\theta)}}\right] \xrightarrow{D} \mathrm{N}(0, 1) \quad \text{as } n \to \infty,
\]
since $v_{jj}(\hat{\theta})$ is a consistent estimator of $v_{jj}(\theta)$.

Thus, it follows from the above results that
\[
\frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\hat{\theta})}} \sim \mathrm{N}(0, 1) \quad \text{for large } n.
\]
Finally, with $Z_{1-\alpha/2}$ defined so that $\operatorname{pr}(Z < Z_{1-\alpha/2}) = (1 - \alpha/2)$ when $Z \sim \mathrm{N}(0, 1)$, we have
\[
(1 - \alpha) = \operatorname{pr}(-Z_{1-\alpha/2} < Z < Z_{1-\alpha/2})
\approx \operatorname{pr}\left[-Z_{1-\alpha/2} < \frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\hat{\theta})}} < Z_{1-\alpha/2}\right]
= \operatorname{pr}\left[\hat{\theta}_j - Z_{1-\alpha/2}\sqrt{v_{jj}(\hat{\theta})} < \theta_j < \hat{\theta}_j + Z_{1-\alpha/2}\sqrt{v_{jj}(\hat{\theta})}\right].
\]
Thus,
\[
\hat{\theta}_j \pm Z_{1-\alpha/2}\sqrt{v_{jj}(\hat{\theta})}
\]
is the large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the parameter $\theta_j$ based on expected information.


In practice, instead of the estimated expected information matrix, the estimated observed information matrix $\boldsymbol{I}(\boldsymbol{x}; \hat{\theta})$ is used, with its $(j, j')$ element equal to
\[
-\left[\frac{\partial^2\ln L(\boldsymbol{x}; \theta)}{\partial\theta_j\partial\theta_{j'}}\right]_{\theta=\hat{\theta}}.
\]
Then, with $\boldsymbol{I}^{-1}(\boldsymbol{x}; \hat{\theta})$ denoting the estimated large-sample covariance matrix of $\hat{\theta}$ based on observed information, and with the $(j, j)$ diagonal element $v_{jj}(\boldsymbol{x}; \hat{\theta})$ of $\boldsymbol{I}^{-1}(\boldsymbol{x}; \hat{\theta})$ being the estimated large-sample variance of $\hat{\theta}_j$ based on observed information, it follows that
\[
\hat{\theta}_j \pm Z_{1-\alpha/2}\sqrt{v_{jj}(\boldsymbol{x}; \hat{\theta})}
\]
is the large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the parameter $\theta_j$ based on observed information.

4.1.4.10 ML-Based CI for a Bernoulli Distribution Probability

As a simple one-parameter ($p = 1$) example, let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the Bernoulli parent population
\[
p_X(x; \theta) = \theta^x(1 - \theta)^{1-x}, \quad x = 0, 1 \text{ and } 0 < \theta < 1,
\]
and suppose that it is desired to develop a large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the parameter $\theta$. First, the appropriate likelihood function is
\[
L(\boldsymbol{x}; \theta) = \prod_{i=1}^n\left[\theta^{x_i}(1 - \theta)^{1-x_i}\right] = \theta^s(1 - \theta)^{n-s},
\]
where $s = \sum_{i=1}^n x_i$ is a sufficient statistic for $\theta$.

Now,
\[
\ln L(\boldsymbol{x}; \theta) = s\ln\theta + (n - s)\ln(1 - \theta),
\]
so that the equation
\[
\frac{\partial\ln L(\boldsymbol{x}; \theta)}{\partial\theta} = \frac{s}{\theta} - \frac{(n - s)}{(1 - \theta)} = 0
\]
gives $\hat{\theta} = \bar{X} = n^{-1}\sum_{i=1}^n X_i$ as the MLE of $\theta$.

And,
\[
\frac{\partial^2\ln L(\boldsymbol{x}; \theta)}{\partial\theta^2} = \frac{-s}{\theta^2} - \frac{(n - s)}{(1 - \theta)^2},
\]
so that
\[
-\mathrm{E}\left[\frac{\partial^2\ln L(\boldsymbol{x}; \theta)}{\partial\theta^2}\right]
= \frac{n\theta}{\theta^2} + \frac{(n - n\theta)}{(1 - \theta)^2} = \frac{n}{\theta(1 - \theta)}.
\]
Hence,
\[
v_{11}(\hat{\theta}) = \left\{-\mathrm{E}\left[\frac{\partial^2\ln L(\boldsymbol{x}; \theta)}{\partial\theta^2}\right]\right\}^{-1}_{\mid\theta=\hat{\theta}}
= v_{11}(\boldsymbol{x}; \hat{\theta}) = \left\{-\frac{\partial^2\ln L(\boldsymbol{x}; \theta)}{\partial\theta^2}\right\}^{-1}_{\mid\theta=\hat{\theta}}
= \frac{\bar{X}(1 - \bar{X})}{n},
\]
so that the large-sample ML-based approximate $100(1 - \alpha)\%$ CI for $\theta$ is equal to
\[
\bar{X} \pm Z_{1-\alpha/2}\sqrt{\frac{\bar{X}(1 - \bar{X})}{n}}.
\]
In this simple example, the same CI is obtained using either expected information or observed information. In more complicated situations, this will typically not happen.
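A minimal computational sketch of this interval, using the assumed illustrative outcome of $s = 12$ successes in $n = 40$ trials:

import math
from scipy.stats import norm

n, s, alpha = 40, 12, 0.05
xbar = s / n                                 # MLE of theta
half = norm.ppf(1 - alpha / 2) * math.sqrt(xbar * (1 - xbar) / n)
print(xbar - half, xbar + half)              # approximate 95% CI for theta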

4.1.4.11 Delta Method

Let $Y = g(\boldsymbol{X})$, where $\boldsymbol{X} = (X_1, X_2, \ldots, X_k)$, $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_k)$, $\mathrm{E}(X_i) = \mu_i$, $V(X_i) = \sigma_i^2$, and $\operatorname{cov}(X_i, X_j) = \sigma_{ij}$ for $i \neq j$, $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, k$. Then, a first-order (or linear) multivariate Taylor series approximation to $Y$ around $\boldsymbol{\mu}$ is
\[
Y \approx g(\boldsymbol{\mu}) + \sum_{i=1}^k\frac{\partial g(\boldsymbol{\mu})}{\partial X_i}(X_i - \mu_i),
\quad \text{where} \quad \frac{\partial g(\boldsymbol{\mu})}{\partial X_i} = \left.\frac{\partial g(\boldsymbol{X})}{\partial X_i}\right|_{\boldsymbol{X}=\boldsymbol{\mu}}.
\]
Thus, using the above linear approximation for $Y$, it follows that $\mathrm{E}(Y) \approx g(\boldsymbol{\mu})$ and that
\[
V(Y) \approx \sum_{i=1}^k\left[\frac{\partial g(\boldsymbol{\mu})}{\partial X_i}\right]^2\sigma_i^2
+ 2\sum_{i=1}^{k-1}\sum_{j=i+1}^k\left[\frac{\partial g(\boldsymbol{\mu})}{\partial X_i}\right]\left[\frac{\partial g(\boldsymbol{\mu})}{\partial X_j}\right]\sigma_{ij}.
\]

The delta method for MLEs is as follows. For $q \le p$, suppose that the $(1 \times q)$ row vector
\[
\Phi(\theta) = [\tau_1(\theta), \tau_2(\theta), \ldots, \tau_q(\theta)]
\]
involves $q$ scalar parametric functions of the parameter vector $\theta$. Then,
\[
\Phi(\hat{\theta}) = [\tau_1(\hat{\theta}), \tau_2(\hat{\theta}), \ldots, \tau_q(\hat{\theta})]
\]
is the MLE of $\Phi(\theta)$.

Then, the $(q \times q)$ large-sample covariance matrix of $\Phi(\hat{\theta})$ based on expected information is
\[
[\Delta(\theta)]\boldsymbol{I}^{-1}(\theta)[\Delta(\theta)]',
\]
where the $(i, j)$ element of the $(q \times p)$ matrix $\Delta(\theta)$ is equal to $\partial\tau_i(\theta)/\partial\theta_j$, $i = 1, 2, \ldots, q$ and $j = 1, 2, \ldots, p$.

Hence, the corresponding estimated large-sample covariance matrix of $\Phi(\hat{\theta})$ based on expected information is
\[
[\Delta(\hat{\theta})]\boldsymbol{I}^{-1}(\hat{\theta})[\Delta(\hat{\theta})]'.
\]
Analogous expressions based on observed information are obtained by substituting $\boldsymbol{I}^{-1}(\boldsymbol{x}; \theta)$ for $\boldsymbol{I}^{-1}(\theta)$ and by substituting $\boldsymbol{I}^{-1}(\boldsymbol{x}; \hat{\theta})$ for $\boldsymbol{I}^{-1}(\hat{\theta})$ in the above two expressions.

The special case $q = p = 1$ gives
\[
V[\tau_1(\hat{\theta}_1)] \approx \left[\frac{\partial\tau_1(\theta_1)}{\partial\theta_1}\right]^2 V(\hat{\theta}_1).
\]
The corresponding large-sample ML-based approximate $100(1 - \alpha)\%$ CI for $\tau_1(\theta_1)$ based on expected information is equal to
\[
\tau_1(\hat{\theta}_1) \pm Z_{1-\alpha/2}\sqrt{\left[\frac{\partial\tau_1(\theta_1)}{\partial\theta_1}\right]^2_{\mid\theta_1=\hat{\theta}_1} v_{11}(\hat{\theta}_1)}.
\]
The corresponding CI based on observed information is obtained by substituting $v_{11}(\boldsymbol{x}; \hat{\theta}_1)$ for $v_{11}(\hat{\theta}_1)$ in the above expression.

4.1.4.12 Delta Method CI for a Function of a Bernoulli Distribution Probability

As a simple illustration, for the Bernoulli population example considered earlier, suppose that it is now desired to use the delta method to obtain a large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the "odds"
\[
\tau(\theta) = \frac{\theta}{(1 - \theta)} = \frac{\operatorname{pr}(X = 1)}{[1 - \operatorname{pr}(X = 1)]}.
\]
So, by the invariance property, $\tau(\hat{\theta}) = \bar{X}/(1 - \bar{X})$ is the MLE of $\tau(\theta)$ since $\hat{\theta} = \bar{X}$ is the MLE of $\theta$. And, via the delta method, the large-sample estimated variance of $\tau(\hat{\theta})$ is equal to
\[
\hat{V}\left[\tau(\hat{\theta})\right] \approx \left[\frac{\partial\tau(\theta)}{\partial\theta}\right]^2_{\mid\theta=\hat{\theta}}\hat{V}(\hat{\theta})
= \left[\frac{1}{(1 - \bar{X})^2}\right]^2\left[\frac{\bar{X}(1 - \bar{X})}{n}\right]
= \frac{\bar{X}}{n(1 - \bar{X})^3}.
\]
Finally, the large-sample ML-based approximate $100(1 - \alpha)\%$ CI for $\tau(\theta) = \theta/(1 - \theta)$ using the delta method is equal to
\[
\frac{\bar{X}}{(1 - \bar{X})} \pm Z_{1-\alpha/2}\sqrt{\frac{\bar{X}}{n(1 - \bar{X})^3}}.
\]
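A companion sketch for the delta-method odds interval, continuing the illustrative $n = 40$, $s = 12$ Bernoulli outcome used above:

import math
from scipy.stats import norm

n, s, alpha = 40, 12, 0.05
xbar = s / n
odds = xbar / (1 - xbar)                      # MLE of theta/(1 - theta)
se = math.sqrt(xbar / (n * (1 - xbar) ** 3))  # delta-method standard error
z = norm.ppf(1 - alpha / 2)
print(odds - z * se, odds + z * se)           # approximate 95% CI for the odds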

EXERCISES

Exercise 4.1. Suppose that $Y_x \sim \mathrm{N}(x\mu, x^3\sigma^2)$, $x = 1, 2, \ldots, n$. Further, assume that $\{Y_1, Y_2, \ldots, Y_n\}$ constitute a set of $n$ mutually independent random variables, and that $\sigma^2$ is a known positive constant. Consider the following three estimators of $\mu$:

1. $\hat{\mu}_1$, the method of moments estimator of $\mu$;

2. $\hat{\mu}_2$, the unweighted least squares estimator of $\mu$;

3. $\hat{\mu}_3$, the MLE of $\mu$.

(a) Derive expressions for $\hat{\mu}_1$, $\hat{\mu}_2$, and $\hat{\mu}_3$. (These expressions can involve summation signs.) Also, determine the exact distribution of each of these estimators of $\mu$.

(b) If $n = 5$, $\sigma^2 = 2$, and $y_x = (x + 1)$ for $x = 1, 2, 3, 4$, and $5$, construct what you believe to be the "best" exact 95% CI for $\mu$.

Exercise 4.2. An epidemiologist gathers data $(x_i, Y_i)$ on each of $n$ randomly chosen noncontiguous cities in the United States, where $x_i$ ($i = 1, 2, \ldots, n$) is the known population size (in millions of people) in city $i$, and where $Y_i$ is the random variable denoting the number of people in city $i$ with liver cancer. It is reasonable to assume that $Y_i$ ($i = 1, 2, \ldots, n$) has a Poisson distribution with mean $\mathrm{E}(Y_i) = \theta x_i$, where $\theta\,(>0)$ is an unknown parameter, and that $Y_1, Y_2, \ldots, Y_n$ constitute a set of mutually independent random variables.

(a) Find an explicit expression for the unweighted least-squares estimator $\hat{\theta}_{uls}$ of $\theta$. Also, find explicit expressions for $\mathrm{E}(\hat{\theta}_{uls})$ and $V(\hat{\theta}_{uls})$.

(b) Find an explicit expression for the method of moments estimator $\hat{\theta}_{mm}$ of $\theta$. Also, find explicit expressions for $\mathrm{E}(\hat{\theta}_{mm})$ and $V(\hat{\theta}_{mm})$.

(c) Find an explicit expression for the MLE $\hat{\theta}_{ml}$ of $\theta$. Also, find explicit expressions for $\mathrm{E}(\hat{\theta}_{ml})$ and $V(\hat{\theta}_{ml})$.

(d) Find an explicit expression for the CRLB for the variance of any unbiased estimator of $\theta$. Which (if any) of the three estimators $\hat{\theta}_{uls}$, $\hat{\theta}_{mm}$, and $\hat{\theta}_{ml}$ achieve this lower bound?

Exercise 4.3. Suppose that $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of an unknown parameter $\theta$. Further, suppose that the variance of $\hat{\theta}_1$ is $\sigma_1^2$, that the variance of $\hat{\theta}_2$ is $\sigma_2^2$, and that $\operatorname{corr}(\hat{\theta}_1, \hat{\theta}_2) = \rho$, $-1 < \rho < +1$. Define the parameter $\lambda = \sigma_1/\sigma_2$, and assume (without loss of generality) that $0 < \sigma_1 \le \sigma_2 < +\infty$, so that $0 < \lambda \le 1$. Consider the unbiased estimator of $\theta$ of the general form
\[
\hat{\theta} = k\hat{\theta}_1 + (1 - k)\hat{\theta}_2,
\]
where the quantity $k$ satisfies the inequality $-\infty < k < +\infty$.

(a) Develop an explicit expression (as a function of $\lambda$ and $\rho$) for that value of $k$ (say, $k^*$) that minimizes the variance of the unbiased estimator $\hat{\theta}$ of $\theta$. Discuss the special cases when $\rho > \lambda$ and when $\lambda = 1$.

(b) Let $\hat{\theta}^* = k^*\hat{\theta}_1 + (1 - k^*)\hat{\theta}_2$, where $k^*$ was determined in part (a). Develop a sufficient condition (as a function of $\lambda$ and $\rho$) for which
\[
V(\hat{\theta}^*) < \sigma_1^2 = V(\hat{\theta}_1) \le \sigma_2^2 = V(\hat{\theta}_2).
\]

Exercise 4.4. Suppose that the random variable $X_i \sim \mathrm{N}(\beta a_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$. Further, assume that $\{X_1, X_2, \ldots, X_n\}$ constitute a set of mutually independent random variables, that $\{a_1, a_2, \ldots, a_n\}$ constitute a set of known constants, and that $\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\}$ constitute a set of known variances. A biostatistician suggests that the random variable
\[
\hat{\beta} = \sum_{i=1}^n c_iX_i
\]
would be an excellent estimator of the unknown parameter $\beta$ if the constants $c_1, c_2, \ldots, c_n$ are chosen so that the following two conditions simultaneously hold: (1) $\mathrm{E}(\hat{\beta}) = \beta$; and, (2) $V(\hat{\beta})$ is a minimum.

Find explicit expressions for $c_1, c_2, \ldots, c_n$ (as functions of the $a_i$'s and $\sigma_i^2$'s) such that these two conditions simultaneously hold. Using these "optimal" choices of the $c_i$'s, what then is the exact distribution of this "optimal" estimator of $\beta$?

Exercise 4.5. For $i = 1, 2, \ldots, k$, let $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$ constitute a random sample of size $n_i\,(>1)$ from a $\mathrm{N}(\mu_i, \sigma^2)$ parent population. Further,
\[
\bar{Y}_i = n_i^{-1}\sum_{j=1}^{n_i} Y_{ij} \quad \text{and} \quad
S_i^2 = (n_i - 1)^{-1}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_i)^2
\]
are, respectively, the sample mean and sample variance of the $n_i$ observations from this $\mathrm{N}(\mu_i, \sigma^2)$ parent population. Further, let $N = \sum_{i=1}^k n_i$ denote the total number of observations.

(a) Consider estimating $\sigma^2$ with the estimator
\[
\hat{\sigma}^2 = \sum_{i=1}^k w_iS_i^2,
\]
where $w_1, w_2, \ldots, w_k$ are constants satisfying the constraint $\sum_{i=1}^k w_i = 1$. Prove rigorously that $\mathrm{E}(\hat{\sigma}^2) = \sigma^2$, namely, that $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.

(b) Under the constraint $\sum_{i=1}^k w_i = 1$, find explicit expressions for $w_1, w_2, \ldots, w_k$ such that $V(\hat{\sigma}^2)$, the variance of $\hat{\sigma}^2$, is a minimum.

Exercise 4.6. Suppose that a professor in the Maternal and Child Health Department at the University of North Carolina at Chapel Hill administers a questionnaire (consisting of $k$ questions, each of which is to be answered "yes" or "no") to each of $n$ randomly selected mothers of infants less than 6 months of age in Chapel Hill. The purpose of this questionnaire is to assess the quality of maternal infant care in Chapel Hill, with "yes" answers indicating good care and "no" answers indicating bad care.

Suppose that this professor asks you, the consulting biostatistician on this research project, the following question: Is it possible for you to provide me with a "good" estimator of the probability that a randomly selected new mother in Chapel Hill will respond "yes" to all $k$ items on the questionnaire, reflecting "perfect care"?

As a start, assume that the number $X$ of "yes" answers to the questionnaire for a randomly chosen new mother in Chapel Hill follows a binomial distribution with sample size $k$ and probability parameter $\pi$, $0 < \pi < 1$. Then, the responses $X_1, X_2, \ldots, X_n$ of the $n$ randomly chosen mothers can be considered to be a random sample of size $n$ from this binomial distribution. Your task as the consulting biostatistician is to find the minimum variance unbiased estimator (MVUE) $\hat{\theta}$ of $\theta = \operatorname{pr}(X = k) = \pi^k$. Once you have found an explicit expression for $\hat{\theta}$, demonstrate by direct calculation that $\mathrm{E}(\hat{\theta}) = \theta$.

Exercise 4.7. Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ ($n \ge 2$) from a $\mathrm{N}(0, \sigma^2)$ population.

(a) Develop an explicit expression for an unbiased estimator $\hat{\theta}$ of the unknown parameter $\theta = \sigma^r$ ($r$ a known positive integer) that is a function of a sufficient statistic for $\theta$.

(b) Derive an explicit expression for the CRLB for the variance of any unbiased estimator of the parameter $\theta = \sigma^r$. Find a particular value of $r$ for which the variance of $\hat{\theta}$ actually achieves the CRLB.

Exercise 4.8. In a certain laboratory experiment, the time $Y$ (in milliseconds) for a certain blood clotting agent to show an observable effect is assumed to have the negative exponential distribution
\[
f_Y(y) = \alpha^{-1}e^{-y/\alpha}, \quad y > 0, \quad \alpha > 0.
\]
Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ from $f_Y(y)$, and let $y_1, y_2, \ldots, y_n$ be the corresponding observed values (or realizations) of $Y_1, Y_2, \ldots, Y_n$. One can think of $y_1, y_2, \ldots, y_n$ as the set of observed times for the blood clotting agent to show an observable effect based on $n$ repetitions of the laboratory experiment.

It is of interest to make statistical inferences about the unknown parameter $\theta = V(Y) = \alpha^2$ using the available data $\boldsymbol{y} = \{y_1, y_2, \ldots, y_n\}$.

(a) Develop an explicit expression for the MLE $\hat{\theta}$ of $\theta$. If the observed value of $S = \sum_{i=1}^n Y_i$ is the value $s = 40$ when $n = 50$, compute an appropriate large-sample 95% CI for the parameter $\theta$.

(b) Develop an explicit expression for the MVUE $\hat{\theta}^*$ of $\theta$, and then develop an explicit expression for $V(\hat{\theta}^*)$, the variance of the MVUE of $\theta$.

(c) Does $\hat{\theta}^*$ achieve the CRLB for the variance of any unbiased estimator of $\theta$?

(d) For any finite value of $n$, develop explicit expressions for $\text{MSE}(\hat{\theta}, \theta)$ and $\text{MSE}(\hat{\theta}^*, \theta)$, the mean squared errors of $\hat{\theta}$ and $\hat{\theta}^*$ as estimators of the unknown parameter $\theta$. Using this MSE criterion, which estimator do you prefer for finite $n$, and which estimator do you prefer asymptotically (i.e., as $n \to +\infty$)?

Exercise 4.9. Suppose that a laboratory test is conducted on a blood sample from each of $n$ randomly chosen human subjects in a certain city in the United States. The purpose of the test is to detect the presence of a particular biomarker reflecting recent exposure to benzene, a known human carcinogen. Let $\pi$, $0 < \pi < 1$, be the unknown probability that a randomly chosen subject in this city has been recently exposed to benzene. When a subject has been recently exposed to benzene, the biomarker will be correctly detected with known probability $\gamma$, $0 < \gamma < 1$; when a subject has not been recently exposed to benzene, the biomarker will be incorrectly detected with known probability $\delta$, $0 < \delta < \gamma < 1$. Let $X$ be the random variable denoting the number of the $n$ subjects who are classified as having been recently exposed to benzene (or, equivalently, who provide a blood sample in which the biomarker is detected).

(a) Find an unbiased estimator $\hat{\pi}$ of the parameter $\pi$ that is an explicit function of the random variable $X$, and also derive an explicit expression for $V(\hat{\pi})$, the variance of the estimator $\hat{\pi}$.

(b) If $n = 50$, $\delta = 0.05$, $\gamma = 0.90$, and if the observed value of $X$ is $x = 20$, compute an appropriate 95% large-sample CI for the unknown parameter $\pi$.

Exercise 4.10. A scientist at the National Institute of Environmental Health Sciences (NIEHS) is studying the teratogenic effects of a certain chemical by injecting a group of pregnant female rats with this chemical and then observing the number of abnormal (i.e., dead or malformed) fetuses in each litter.

Suppose that $\pi$, $0 < \pi < 1$, is the probability that a fetus is abnormal. Further, for the $i$th of $n$ litters, each litter being of size two, let the random variable $X_{ij}$ take the value 1 if the $j$th fetus is abnormal and let $X_{ij}$ take the value 0 if the $j$th fetus is normal, $j = 1, 2$.

Since the two fetuses in each litter have experienced the same gestational conditions, the dichotomous random variables $X_{i1}$ and $X_{i2}$ are expected to be correlated. To allow for such a correlation, the following correlated binomial model is proposed: for $i = 1, 2, \ldots, n$,
\[
\operatorname{pr}[(X_{i1} = 1) \cap (X_{i2} = 1)] = \pi^2 + \theta,
\]
\[
\operatorname{pr}[(X_{i1} = 1) \cap (X_{i2} = 0)] = \operatorname{pr}[(X_{i1} = 0) \cap (X_{i2} = 1)] = \pi(1 - \pi) - \theta,
\]
and
\[
\operatorname{pr}[(X_{i1} = 0) \cap (X_{i2} = 0)] = (1 - \pi)^2 + \theta.
\]
Here, $\operatorname{cov}(X_{i1}, X_{i2}) = \theta$, $-\min[\pi^2, (1 - \pi)^2] \le \theta \le \pi(1 - \pi)$.

(a) Let the random variable $Y_{11}$ be the number of litters out of $n$ for which both fetuses are abnormal, and let the random variable $Y_{00}$ be the number of litters out of $n$ for which both fetuses are normal. Show that the MLEs $\hat{\pi}$ of $\pi$ and $\hat{\theta}$ of $\theta$ are, respectively,
\[
\hat{\pi} = \frac{1}{2} + \frac{(Y_{11} - Y_{00})}{2n} \quad \text{and} \quad \hat{\theta} = \frac{Y_{11}}{n} - \hat{\pi}^2.
\]

(b) Develop explicit expressions for $\mathrm{E}(\hat{\pi})$ and $V(\hat{\pi})$.

(c) If $n = 30$, and if the observed values of $Y_{11}$ and $Y_{00}$ are $y_{11} = 3$ and $y_{00} = 15$, compute an appropriate large-sample 95% CI for $\pi$.

For a more general statistical treatment of such a correlated binomial model, see Kupper and Haseman (1978).

Exercise 4.11. A popular epidemiologic study design is the pair-matched case–control study design, where a case (i.e., a diseased person, denoted D) is "matched" (on covariates such as age, race, and sex) to a control (i.e., a nondiseased person, denoted D̄). Each member of the pair is then interviewed as to the presence (E) or absence (Ē) of a history of exposure to some potentially harmful substance (e.g., cigarette smoke, asbestos, benzene, etc.). The data from such a study involving $n$ case–control pairs can be presented in tabular form, as follows:

                       D̄ (control)
                       E          Ē
    D (case)   E      Y11        Y10
               Ē      Y01        Y00
                                        n

Here, $Y_{11}$ is the number of pairs for which both the case and the control are exposed (i.e., both have a history of exposure), $Y_{10}$ is the number of pairs for which the case is exposed but the control is not, and so on. Clearly, $\sum_{i=0}^1\sum_{j=0}^1 Y_{ij} = n$.

In what follows, assume that the $\{Y_{ij}\}$ have a multinomial distribution with sample size $n$ and associated cell probabilities $\{\pi_{ij}\}$, where
\[
\sum_{i=0}^1\sum_{j=0}^1\pi_{ij} = 1.
\]
For example, then, $\pi_{10}$ is the probability of obtaining a pair in which the case is exposed and its matched control is not. In such a study, the parameter measuring the association between exposure status and disease status is the odds ratio $\text{OR} = \pi_{10}/\pi_{01}$; the estimator of OR is $\widehat{\text{OR}} = Y_{10}/Y_{01}$.

(a) Under the assumed multinomial model for the $\{Y_{ij}\}$, use the delta method to develop an appropriate estimator $\hat{V}(\ln\widehat{\text{OR}})$ of $V(\ln\widehat{\text{OR}})$, the variance of the random variable $\ln\widehat{\text{OR}}$. What is the numerical value of your variance estimator when $n = 100$ and when the observed cell counts are $y_{11} = 15$, $y_{10} = 25$, $y_{01} = 15$, and $y_{00} = 45$?

(b) Assuming that
\[
\frac{\ln\widehat{\text{OR}} - \ln\text{OR}}{\sqrt{\hat{V}(\ln\widehat{\text{OR}})}} \sim \mathrm{N}(0, 1)
\]
for large $n$, use the observed cell counts given in part (a) to construct an appropriate 95% CI for OR.

Exercise 4.12. Actinic keratoses are small skin lesions that serve as precursors for skin cancer. It has been theorized that adults who are residents of U.S. cities near the equator are more likely to develop actinic keratoses, and hence to be at greater risk for skin cancer, than are adults who are residents of U.S. cities distant from the equator. To test this theory, suppose that dermatology records for a random sample of $n_1$ adult residents of a particular U.S. city (say, City 1) near the equator are examined to determine the number of actinic keratoses that each of these $n_1$ adults has developed. In addition, dermatology records for a random sample of $n_0$ adult residents of a particular U.S. city (say, City 0) distant from the equator are examined to determine the number of actinic keratoses that each of these adults has developed.

As a statistical model for evaluating this theory, for adult resident $j$ ($j = 1, 2, \ldots, n_i$) in City $i$ ($i = 0, 1$), suppose that the random variable $Y_{ij} \sim \text{POI}(L_{ij}\lambda_i)$, where $L_{ij}$ is the length of time (in years) that adult $j$ has resided in City $i$ and where $\lambda_i$ is the rate of development of actinic keratoses per year (i.e., the expected number of actinic keratoses that develop per year) for an adult resident of City $i$. So, the pair $(L_{ij}, y_{ij})$ constitutes the observed information for adult resident $j$ in City $i$.

(a) Develop an appropriate ML-based large-sample $100(1 - \alpha)\%$ CI for the log rate ratio $\ln\psi = \ln(\lambda_1/\lambda_0)$.

(b) If $n_1 = n_0 = 30$, $\sum_{j=1}^{30} y_{1j} = 40$, $\sum_{j=1}^{30} L_{1j} = 350$, $\sum_{j=1}^{30} y_{0j} = 35$, and $\sum_{j=1}^{30} L_{0j} = 400$, compute a 95% CI for the rate ratio $\psi$. Comment on your findings.

Exercise 4.13. The time $T$ (in months) in remission for leukemia patients who have completed a certain type of chemotherapy treatment is assumed to have the negative exponential distribution
\[
f_T(t; \theta) = \theta e^{-\theta t}, \quad t > 0, \quad \theta > 0.
\]
Suppose that monitoring a random sample of $n$ leukemia patients who have completed this chemotherapy treatment leads to the $n$ observed remission times $t_1, t_2, \ldots, t_n$. In formal statistical terms, $T_1, T_2, \ldots, T_n$ represent a random sample of size $n$ from $f_T(t; \theta)$, and $t_1, t_2, \ldots, t_n$ are the observed values (or realizations) of the $n$ random variables $T_1, T_2, \ldots, T_n$.

(a) Using the available data, derive an explicit expression for the large-sample variance (based on expected information) of the MLE $\hat{\theta}$ of $\theta$.

(b) A biostatistician responsible for analyzing this data set realizes that it is not possible to know with certainty the exact number of months that each patient is in remission after completing the chemotherapy treatment. So, this biostatistician suggests the following alternative procedure for estimating $\theta$: "After some specified time period (in months) of length $t^*$ (a known positive constant) after completion of the chemotherapy treatment, let $Y_i = 1$ if the $i$th patient is still in remission after $t^*$ months and let $Y_i = 0$ if not, where $\operatorname{pr}(Y_i = 1) = \operatorname{pr}(T_i > t^*)$, $i = 1, 2, \ldots, n$. Then, use the $n$ mutually independent dichotomous random variables $Y_1, Y_2, \ldots, Y_n$ to find an alternative MLE $\hat{\theta}^*$ of the parameter $\theta$." Develop an explicit expression for $\hat{\theta}^*$.

(c) Use expected information to compare the large-sample variances of $\hat{\theta}$ and $\hat{\theta}^*$. Assuming $t^* \ge \mathrm{E}(T)$, which of these two MLEs has the smaller variance, and why should this be the anticipated finding? Are there circumstances where the MLE with the larger variance might be preferred?

Exercise 4.14. For a typical woman in a certain high-risk population of women, suppose that the number $Y$ of lifetime events of domestic violence involving emergency room treatment is assumed to have the Poisson distribution
\[
p_Y(y; \lambda) = \lambda^y e^{-\lambda}/y!, \quad y = 0, 1, \ldots, +\infty \text{ and } \lambda > 0.
\]
Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ (where $n$ is large) from this Poisson population (i.e., $n$ women from this high-risk population are randomly sampled and then each woman in the random sample is asked to recall the number of lifetime events of domestic violence involving emergency room treatment that she has experienced).

(a) Find an explicit expression for the CRLB for the variance of any unbiased estimator of the parameter $\theta = \operatorname{pr}(Y = 0)$. Does there exist an unbiased estimator of $\theta$ that achieves this CRLB for all finite values of $n$?

(b) Suppose that a certain domestic violence researcher believes that reported values of $Y$ greater than zero are not very accurate (although a reported value greater than zero almost surely indicates at least one domestic violence experience involving emergency room treatment), but that reported values of $Y$ equal to zero are accurate. Because of this possible data inaccuracy problem, this researcher wants to analyze the data by converting each $Y_i$ to a two-valued (or dichotomous) random variable $X_i$, where $X_i$ is defined as follows: if $Y_i \ge 1$, then $X_i = 1$; and, if $Y_i = 0$, then $X_i = 0$. Using the $n$ mutually independent dichotomous random variables $X_1, X_2, \ldots, X_n$, find an explicit expression for the MLE $\hat{\lambda}^*$ of $\lambda$ and then find an explicit expression for the large-sample variance of $\hat{\lambda}^*$.

(c) This domestic violence researcher is concerned that she may be doing something wrong by using the dichotomous variables $X_1, X_2, \ldots, X_n$ (instead of the original Poisson variables $Y_1, Y_2, \ldots, Y_n$) to estimate the unknown parameter $\lambda$. To address her concern, make a quantitative comparison between the properties of $\hat{\lambda}^*$ and $\hat{\lambda}$, where $\hat{\lambda}$ is the MLE of $\lambda$ obtained by using $Y_1, Y_2, \ldots, Y_n$. Also, comment on issues of validity (i.e., bias) and precision (i.e., variability) as they relate to the choice between $\hat{\lambda}$ and $\hat{\lambda}^*$.

Exercise 4.15. For a certain African village, available data strongly suggest that the expected number of new cases of AIDS developing in any particular year is directly proportional to the expected number of new AIDS cases that developed during the immediately preceding year. An important statistical goal is to estimate the value of this unknown proportionality constant $\theta$ ($\theta > 1$), which is assumed not to vary from year to year, and then to find an appropriate 95% CI for $\theta$.

To accomplish this goal, the following statistical model is to be used: For $j = 0, 1, \ldots, n$ consecutive years of data, let $Y_j$ be the random variable denoting the number of new AIDS cases developing in year $j$. Further, suppose that the $(n + 1)$ random variables $Y_0, Y_1, \ldots, Y_n$ are such that the conditional distribution of $Y_{j+1}$, given $Y_k = y_k$ for $k = 0, 1, \ldots, j$, depends only on $y_j$ and is Poisson with $\mathrm{E}(Y_{j+1} \mid Y_j = y_j) = \theta y_j$, $j = 0, 1, \ldots, (n - 1)$. Further, assume that the distribution of the random variable $Y_0$ is Poisson with $\mathrm{E}(Y_0) = \theta$, where $\theta > 1$.

(a) Using all $(n + 1)$ random variables $Y_0, Y_1, \ldots, Y_n$, develop an explicit expression for the MLE $\hat{\theta}$ of the unknown proportionality constant $\theta$.

(b) If $n = 25$ and $\hat{\theta} = 1.20$, compute an appropriate ML-based 95% CI for $\theta$.

Exercise 4.16. In a certain clinical trial, suppose that the outcome variable $X$ represents the 6-month change in cholesterol level (in milligrams per deciliter) for subjects in the treatment (T) group who will be given a certain cholesterol-lowering drug, and suppose that $Y$ represents this same outcome variable for subjects in the control (C) group who will be given a placebo. Further, suppose that it is reasonable to assume that $X \sim \mathrm{N}(\mu_t, \sigma_t^2)$ and $Y \sim \mathrm{N}(\mu_c, \sigma_c^2)$, and that $\sigma_t^2$ and $\sigma_c^2$ have known values such that $\sigma_t^2 \neq \sigma_c^2$.

Let $X_1, X_2, \ldots, X_{n_t}$ constitute a random sample of size $n_t$ from $\mathrm{N}(\mu_t, \sigma_t^2)$; namely, these $n_t$ observations represent the set of outcomes to be measured on the $n_t$ subjects who have been randomly assigned to the T group. Similarly, let $Y_1, Y_2, \ldots, Y_{n_c}$ constitute a random sample of size $n_c$ from $\mathrm{N}(\mu_c, \sigma_c^2)$; namely, these $n_c$ observations represent the set of outcomes to be measured on the $n_c$ subjects who have been randomly assigned to the C group.

Because of monetary and logistical constraints, suppose that a total of only $N$ subjects can participate in this clinical trial, so that $n_t$ and $n_c$ are constrained to satisfy the relationship $(n_t + n_c) = N$. Based on the stated assumptions (namely, random samples from two normal populations with known, but unequal, variances), determine the "optimal" partition of $N$ into values $n_t$ and $n_c$ that will produce the most "precise" exact 95% CI for $(\mu_t - \mu_c)$. When $N = 100$, $\sigma_t^2 = 4$, and $\sigma_c^2 = 9$, find the optimal choices for $n_t$ and $n_c$. Comment on your findings.

Exercise 4.17. Suppose that the random variable Y = ln(X), where X is the ambient carbon monoxide (CO) concentration (in parts per million) in a certain highly populated U.S. city, is assumed to have a normal distribution with mean E(Y) = μ and variance V(Y) = σ². Let Y1, Y2, . . . , Yn constitute a random sample from this N(μ, σ²) population. Practically speaking, Y1, Y2, . . . , Yn can be considered to be ln(CO concentration) readings taken on days 1, 2, . . . , n, where these n days are spaced far enough apart so that Y1, Y2, . . . , Yn can be assumed to be mutually independent random variables. It is of interest to be able to predict with some accuracy the value of the random variable Y_{n+1}, namely, the value of the random variable representing the ln(CO concentration) on day (n + 1), where day (n + 1) is far enough in time from day n so that Y_{n+1} can reasonably be assumed to be independent of the random variables Y1, Y2, . . . , Yn. It can further be assumed that Y_{n+1} ~ N(μ, σ²).

If Ȳ = n⁻¹ ∑_{i=1}^n Yi and if S² = (n − 1)⁻¹ ∑_{i=1}^n (Yi − Ȳ)², determine explicit expressions for random variables L and U (involving Ȳ and S) such that

pr[L < Y_{n+1} < U] = (1 − α), 0 < α ≤ 0.10.

In other words, rigorously derive an exact 100(1 − α)% prediction interval for the random variable Y_{n+1}. If n = 5, and if Yi = i, i = 1, 2, 3, 4, 5, compute an exact 95% prediction interval for Y6. As a hint, construct a statistic involving the random variable (Ȳ − Y_{n+1}) that has a t-distribution.
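
Following the hint, the standard pivot (Ȳ − Y_{n+1})/[S√(1 + 1/n)] has a t-distribution with (n − 1) df under the stated assumptions; the sketch below uses that well-known fact to compute the n = 5 interval requested above.

import numpy as np
from scipy import stats

# Data from the exercise: Yi = i for i = 1,...,5; predict Y6.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n, alpha = len(y), 0.05

ybar, s = y.mean(), y.std(ddof=1)          # sample mean and SD
t = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t_{n-1, 1-alpha/2}
half = t * s * np.sqrt(1 + 1 / n)          # prediction half-width
print(f"95% prediction interval for Y6: ({ybar - half:.3f}, {ybar + half:.3f})")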

Exercise 4.18. Let X1, X2, . . . , Xn constitute a random sample of size n from a N(μ, σ²) population. Let

X̄ = n⁻¹ ∑_{i=1}^n Xi and S² = (n − 1)⁻¹ ∑_{i=1}^n (Xi − X̄)².

Under the stated assumptions, the most appropriate 100(1 − α)% CI for μ is

X̄ ± t_{n−1,1−α/2} S/√n,

where t_{n−1,1−α/2} is the 100(1 − α/2)th percentile point of Student's t-distribution with (n − 1) df. The width Wn of this CI is

Wn = 2 t_{n−1,1−α/2} S/√n.

(a) Under the stated assumptions, derive an explicit expression for E(Wn), the expected width of this CI. What is the exact numerical value of E(Wn) if n = 4, α = 0.05, and σ² = 4?

(b) Suppose that it is desired to find the smallest sample size n* such that

pr(W_{n*} ≤ δ) = pr{2 t_{n*−1,1−α/2} S/√n* ≤ δ} ≥ (1 − γ),

where δ (> 0) and γ (0 < γ < 1) are specified positive numbers. Under the stated assumptions, prove rigorously that n* should be chosen to be the smallest positive integer satisfying the inequality

n*(n* − 1) ≥ (2σ/δ)² χ²_{n*−1,1−γ} f_{1,n*−1,1−α},

where χ²_{n*−1,1−γ} and f_{1,n*−1,1−α} denote, respectively, the 100(1 − γ) and 100(1 − α) percentile points for a chi-square distribution with (n* − 1) df and for an f-distribution with 1 numerator, and (n* − 1) denominator, df.
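
A quick way to exercise this sample-size rule numerically is to scan n upward until the inequality holds; the sketch below does this for illustrative values δ = 1, γ = 0.05, α = 0.05, σ = 1 (hypothetical choices, not part of the exercise).

from scipy import stats

# Find the smallest n satisfying n(n-1) >= (2*sigma/delta)^2 * chi2 * f,
# the part (b) inequality; parameter values below are illustrative only.
sigma, delta, alpha, gamma = 1.0, 1.0, 0.05, 0.05

n = 2
while True:
    chi2 = stats.chi2.ppf(1 - gamma, df=n - 1)
    f = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 1)
    if n * (n - 1) >= (2 * sigma / delta) ** 2 * chi2 * f:
        break
    n += 1
print("smallest n* =", n)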

Exercise 4.19. Suppose that an epidemiologist desires to make statistical inferences about the true mean diastolic blood pressure levels for adult residents in three rural North Carolina cities. As a starting model, suppose that she assumes that the true underlying distribution of diastolic blood pressure measurements for adults in each city is normal, and that these three normal distributions have a common variance (say, σ²), but possibly different means (say, μ1, μ2, and μ3). This epidemiologist decides to obtain her blood pressure study data by randomly selecting ni adult residents from city i, i = 1, 2, 3, and then measuring their diastolic blood pressures.

Using more formal statistical notation, for i = 1, 2, 3, let Yi1, Yi2, . . . , Y_{i,ni} constitute a random sample of size ni from a N(μi, σ²) population. Define the random variables

Ȳi = ni⁻¹ ∑_{j=1}^{ni} Yij, i = 1, 2, 3,

and

Si² = (ni − 1)⁻¹ ∑_{j=1}^{ni} (Yij − Ȳi)², i = 1, 2, 3.

(a) Consider the parameter

θ = (2μ1 − 3μ2 + μ3).

Using all the available data (in particular, all three sample means and all three sample variances), construct a random variable that has a Student's t-distribution.

(b) If n1 = n2 = n3 = 4, ȳ1 = 80, ȳ2 = 75, ȳ3 = 70, s1² = 4, s2² = 3, and s3² = 5, find an exact 95% CI for θ given the stated assumptions.

(c) Now, suppose that governmental reviewers of this study are skeptical about both the epidemiologist's assumptions of normality and homogeneous variance, claiming that her sample sizes were much too small to provide reliable information about the appropriateness of these assumptions or about the parameter θ. To address these criticisms, this epidemiologist goes back to these same three rural North Carolina cities and takes blood pressure measurements on large random samples of adult residents in each of the three cities; she obtains the following data:

n1 = n2 = n3 = 50; ȳ1 = 85, ȳ2 = 82, ȳ3 = 79; s1² = 7, s2² = 2, s3² = 6.

Retaining the normality assumption for now, find an appropriate 95% CI for σ1²/σ2², and then comment regarding the appropriateness of the homogeneous variance assumption.

(d) Using the data in part (c), compute an appropriate large-sample 95% CI for θ. Comment on the advantages of increasing the sizes of the random samples selected from each of the three populations.
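
For part (d), one plausible large-sample calculation treats θ̂ = 2Ȳ1 − 3Ȳ2 + Ȳ3 as approximately normal with variance estimated by 4s1²/n1 + 9s2²/n2 + s3²/n3; the sketch below applies this to the part (c) data. This is an illustrative approach, not necessarily the only defensible one.

import numpy as np
from scipy import stats

# Part (c) data; theta-hat = 2*y1bar - 3*y2bar + y3bar.
n = 50
ybar = np.array([85.0, 82.0, 79.0])
s2 = np.array([7.0, 2.0, 6.0])
c = np.array([2.0, -3.0, 1.0])           # contrast coefficients

theta_hat = c @ ybar
se = np.sqrt(np.sum(c**2 * s2 / n))      # large-sample standard error
z = stats.norm.ppf(0.975)
print(f"theta-hat = {theta_hat:.2f}, "
      f"95% CI = ({theta_hat - z*se:.2f}, {theta_hat + z*se:.2f})")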

Exercise 4.20. Let X1, X2, . . . , X_{n1} constitute a random sample of size n1 (> 2) from a normal parent population with mean 0 and variance θ. Also, let Y1, Y2, . . . , Y_{n2} constitute a random sample of size n2 (> 2) from a normal parent population with mean 0 and variance θ⁻¹. The set of random variables {X1, X2, . . . , X_{n1}} is independent of the set of random variables {Y1, Y2, . . . , Y_{n2}}, and θ (> 0) is an unknown parameter.

(a) Derive an explicit expression for E(√L) when L = ∑_{i=1}^{n1} Xi².

(b) Using all (n1 + n2) available observations, derive an explicit expression for an exact 100(1 − α)% CI for the unknown parameter θ. If n1 = 8, n2 = 5, ∑_{i=1}^8 xi² = 30, and ∑_{i=1}^5 yi² = 15, compute a 95% confidence interval for θ.

Exercise 4.21. In certain types of studies called crossover studies, each of n randomly chosen subjects is administered both a treatment T (e.g., a new drug pill) and a placebo P (e.g., a sugar pill). Typically, neither the subject nor the person administering the pills knows which pill is T and which pill is P (namely, the study is a so-called double-blind study). Also, the two possible pill administration orderings "first T, then P" and "first P, then T" are typically allocated randomly to subjects, and sufficient time is allowed between administrations to avoid so-called "carry-over" effects. One advantage of a crossover study is that a comparison between the effects of T and P can be made within (or specific to) each subject (since each subject supplies information on the effects of both T and P), thus eliminating subject-to-subject variability in each subject-specific comparison.

For the ith subject (i = 1, 2, . . . , n), suppose that Di = (YTi − YPi) is the continuous random variable representing the difference between a continuous response (YTi) following T administration and a continuous response (YPi) following P administration. So, Di is measuring the effect of T relative to P for subject i. Since YTi and YPi are responses for the same subject (namely, subject i), it is very sensible to expect that YTi and YPi will be correlated to some extent. To allow for this potential intra-subject response correlation, assume in what follows that YTi and YPi jointly follow a bivariate normal distribution with E(YTi) = μT, V(YTi) = σT², E(YPi) = μP, V(YPi) = σP², and with corr(YTi, YPi) = ρ, i = 1, 2, . . . , n. Further, assume that the n differences D1, D2, . . . , Dn are mutually independent of one another.

(a) Assuming that σT², σP², and ρ have known values, use the n mutually independent random variables D1, D2, . . . , Dn to derive an exact 100(1 − α)% CI for the unknown parameter θ = (μT − μP), the true difference between the expected responses for the T and P administrations. In particular, find explicit expressions for random variables L and U such that pr(L < θ < U) = (1 − α), 0 < α ≤ 0.10. If there are available data for which n = 10, ȳT = 15.0, ȳP = 14.0, σT² = 2.0, σP² = 3.0, ρ = 0.30, and α = 0.05, use this numerical information to compute exact numerical values for L and U. Interpret these numerical results with regard to whether or not the available data provide statistical evidence that μT and μP have different values.

(b) Now, assume that treatment effectiveness is equivalent to the inequality θ > 0 (or, equivalently, μT > μP). If α = 0.05, σT² = 2.0, σP² = 3.0, ρ = 0.30, and θ = 1.0 (so that T is truly effective compared to P), what is the minimum number n* of subjects that should be enrolled in this crossover study so that the random variable L determined in part (a) exceeds the value zero with probability at least equal to 0.95? The motivation for finding n* is that, if the treatment is truly effective, it is highly desirable for the lower limit L of the CI for θ to have a high probability of exceeding zero in value, thus providing statistical evidence in favor of a real treatment effect relative to the placebo effect.
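
Under the stated known-variance model, V(Di) = σT² + σP² − 2ρσTσP, and the part (a) interval rests on D̄ and this known variance; a numerical sketch of that calculation, using the data given in part (a), follows.

import math
from scipy import stats

# Part (a) data: known variances and correlation, so the CI uses a normal pivot.
n, ybar_t, ybar_p = 10, 15.0, 14.0
var_t, var_p, rho, alpha = 2.0, 3.0, 0.30, 0.05

var_d = var_t + var_p - 2 * rho * math.sqrt(var_t * var_p)  # V(Di)
theta_hat = ybar_t - ybar_p
z = stats.norm.ppf(1 - alpha / 2)
half = z * math.sqrt(var_d / n)
print(f"L = {theta_hat - half:.3f}, U = {theta_hat + half:.3f}")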

Exercise 4.22. For i = 1, 2, . . . , n, let the random variables Xi and Yi denote, respectively, the diastolic blood pressure (DBP) and systolic blood pressure (SBP) for the ith of n (> 1) randomly chosen hypertensive adult males. Assume that the pairs (Xi, Yi), i = 1, 2, . . . , n, constitute a random sample of size n from a bivariate normal population, where E(Xi) = μx, E(Yi) = μy, V(Xi) = V(Yi) = σ², and corr(Xi, Yi) = ρ. The goal is to develop an exact 95% CI for the correlation coefficient ρ. To accomplish this goal, consider the following random variables. Let Ui = (Xi + Yi) and Vi = (Xi − Yi), i = 1, 2, . . . , n. Further, let nŪ = ∑_{i=1}^n Ui, nV̄ = ∑_{i=1}^n Vi, (n − 1)Su² = ∑_{i=1}^n (Ui − Ū)², and (n − 1)Sv² = ∑_{i=1}^n (Vi − V̄)².

(a) Derive explicit expressions for the means and variances of the random variables Ui and Vi, i = 1, 2, . . . , n.

(b) Prove rigorously that cov(Ui, Vi) = 0, i = 1, 2, . . . , n, so that, in this situation, it will follow that Ui and Vi are independent random variables, i = 1, 2, . . . , n.

(c) Use rigorous arguments to prove that the random variable

W = (1 − ρ)Su² / [(1 + ρ)Sv²]

has an f-distribution.

(d) If n = 10, and if the realized values of Su² and Sv² are 1.0 and 2.0, respectively, use these data, along with careful arguments, to compute an exact 95% CI for ρ.

Exercise 4.23. An economist postulates that the distribution of income (in thousands of dollars) in a certain large U.S. city can be modeled by the Pareto density function

fY(y; γ, θ) = θγ^θ y^{−(θ+1)}, 0 < γ < y < ∞ and 2 < θ < ∞,

where γ and θ are unknown parameters. Let Y1, Y2, . . . , Yn constitute a random sample of size n from fY(y; γ, θ).

(a) If n = 50, ȳ = n⁻¹ ∑_{i=1}^n yi = 30, and s² = (n − 1)⁻¹ ∑_{i=1}^n (yi − ȳ)² = 10, find exact numerical values for the method of moments estimators γ̂mm and θ̂mm, respectively, of γ and θ.

(b) A consulting biostatistician suggests that the smallest order statistic Y(1) = min{Y1, Y2, . . . , Yn} is also a possible estimator for γ. Is Y(1) a consistent estimator of γ?

(c) Now, assume that θ = 3, so that the only unknown parameter is γ. It is desired to use the random variable Y(1) to compute an exact upper one-sided CI for γ. In particular, derive an explicit expression for a random variable U = cY(1), 0 < c < 1, such that pr(γ < U) = (1 − α), 0 < α ≤ 0.10. If n = 5, α = 0.10, and the observed value of Y(1) is y(1) = 20, use this information to compute an upper one-sided 90% CI for the unknown parameter γ.

Exercise 4.24. Let X1, X2, . . . , Xn constitute a random sample of size n from the parent population fX(x), −∞ < x < +∞. Further, let X(1), X(2), . . . , X(n) be the set of corresponding order statistics, where −∞ < X(1) < X(2) < · · · < X(n−1) < X(n) < +∞.

(a) Let Ur be the random variable defined as

Ur = pr[X ≤ X(r)] = ∫_{−∞}^{X(r)} fX(x) dx = FX(X(r)), r = 1, 2, . . . , n,

so that Ur is the amount of area under fX(x) to the left of X(r). Develop an explicit expression for E(Ur).

(b) For 0 < p < 1, define the pth quantile of fX(x) to be θp = FX⁻¹(p); in particular, θp is that value of x such that an amount p of area under fX(x) is to the left of x. Describe how the result in part (a) can be used to develop a reasonable estimator of θp.

Exercise 4.25. Suppose that X1, X2, . . . , Xn constitute a random sample of size n from the density function fX(x; θ), where −∞ < x < ∞. It is desired to construct an appropriate CI for the median ξ of fX(x; θ), where ξ is defined as the population parameter satisfying the relationship ∫_{−∞}^{ξ} fX(x; θ) dx = 1/2. As one possible CI for ξ, consider using X(1) = min{X1, X2, . . . , Xn} for the lower limit and X(n) = max{X1, X2, . . . , Xn} for the upper limit.

(a) If fX(x; θ) = θx^{θ−1}, 0 < x < 1 and θ > 0, derive an explicit expression for the expected value of the width W of the proposed CI [X(1), X(n)].

(b) Now, suppose that the structure of fX(x; θ) is completely unknown. Again consider the proposed CI [X(1), X(n)]. Derive an explicit expression for pr[X(1) < ξ < X(n)], and then comment on this result with regard to the utility of this particular CI for ξ.


Exercise 4.26. Let X1, X2, . . . , Xn constitute a random sample from the uniform density fX(x) = 1, 0 < x < 1. Then, G = (∏_{i=1}^n Xi)^{1/n} is the geometric mean. Develop explicit expressions for E(G) and V(G), and then use these results to determine to what quantity G converges in probability.
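
As an empirical companion (not a substitute for the derivation), the simulation sketch below draws many samples for increasing n and shows the distribution of G tightening around a single point.

import numpy as np

# Simulate the geometric mean G of n Uniform(0,1) draws for increasing n;
# the shrinking spread illustrates convergence in probability.
rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    g = np.exp(np.log(rng.uniform(size=(5000, n))).mean(axis=1))
    print(f"n = {n:5d}: mean(G) = {g.mean():.4f}, sd(G) = {g.std():.4f}")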

Exercise 4.27. Let Y be a continuous random variable with density fY(y) = e^{−y}, y > 0. Consider the sequence of random variables Xn = e^n I(Y > n), n = 1, 2, . . . , where the indicator function I(Y > n) takes the value 1 if Y > n and takes the value 0 otherwise. Working directly with the definition of "convergence in probability," prove that Xn converges in probability to the value 0.

Exercise 4.28. Suppose that a continuous response Y is to be measured on each of n subjects during a two-group clinical trial comparing a new drug therapy to a standard drug therapy. Without loss of generality, suppose that the first n1 subjects (i = 1, 2, . . . , n1) constitute the treatment group (i.e., the group of subjects receiving the new drug therapy) and the remaining n0 = (n − n1) subjects (i = n1 + 1, n1 + 2, . . . , n) constitute the comparison group (i.e., the group of subjects receiving the standard therapy).

Further, suppose that the following multiple linear regression model defines the true underlying relationship between the continuous response and relevant covariates:

E(Yi|Ti, Ai) = α + βTi + γAi, i = 1, 2, . . . , n,

where Ti equals 1 if the ith subject is a member of the treatment group and equals 0 if the ith subject is a member of the comparison group, where Ai is the age of the ith subject, and where α ≠ 0, β ≠ 0, and γ ≠ 0. Note that the key parameter of interest is β, which measures the effect of the new drug therapy relative to the standard drug therapy, adjusting for the possible confounding effect of the differing ages of study subjects.

Consider the unfortunate situation where the researchers running the clinical trial lose that subset {Ai}_{i=1}^n of the complete data set {Yi, Ti, Ai}_{i=1}^n which gives the age of each subject. Suppose that these researchers then decide to fit the alternative incorrect straight-line model E(Yi|Ti) = α* + β*Ti, i = 1, 2, . . . , n, to the available data by the method of unweighted least squares, thus obtaining

β̂* = ∑_{i=1}^n (Ti − T̄)(Yi − Ȳ) / ∑_{i=1}^n (Ti − T̄)²

as their suggested estimator of β, where T̄ = n⁻¹ ∑_{i=1}^n Ti and Ȳ = n⁻¹ ∑_{i=1}^n Yi. Rigorously derive an explicit expression for E(β̂*|{Ti}, {Ai}), and then provide a sufficient condition involving A1, A2, . . . , An such that this conditional expected value is equal to β. Although these researchers do not know the ages of the n subjects in the clinical trial, suppose that they did decide to assign these subjects randomly to the treatment and comparison groups. Discuss how such a randomization procedure could possibly affect the degree of bias in β̂* as an estimator of β.

For further details about multiple linear regression, see Kleinbaum et al. (2008) and Kutner et al. (2004).

Exercise 4.29. For i = 1, 2, . . . , n, suppose that the dichotomous random variable Yi takes the value 1 if the ith subject in a certain clinical trial experiences a particular outcome of interest and takes the value 0 if not, and assume that Y1, Y2, . . . , Yn constitute a set of n mutually independent random variables. Further, given p covariate values xi0 (≡ 1), xi1, . . . , xip associated with the ith subject, make the assumption that πi = pr(Yi = 1|xij, j = 0, 1, . . . , p) has the logistic model form, namely,

πi = pr(Yi = 1|xij, j = 0, 1, . . . , p) = e^{∑_{j=0}^p βjxij} / (1 + e^{∑_{j=0}^p βjxij}).

(a) If L(y; β) denotes the appropriate likelihood function for the random vector Y′ = (Y1, Y2, . . . , Yn), where y′ = (y1, y2, . . . , yn) and β′ = (β0, β1, . . . , βp), show that the (p + 1) equations that need to be simultaneously solved to obtain the vector β̂ = (β̂0, β̂1, . . . , β̂p)′ of MLEs of β can be written compactly in matrix notation as

X′[y − E(Y)] = 0,

where E(Y) = [E(Y1), E(Y2), . . . , E(Yn)]′, where 0 is a [(p + 1) × 1] column vector of zeros, and where X is an appropriately specified [n × (p + 1)] matrix.

(b) For the likelihood function L(y; β), show that both the observed and expected information matrices are identical and that each can be written as the same function of X and V, where V is the covariance matrix for the random vector Y. Then, use this result to describe how to obtain an estimate of the covariance matrix of β̂.
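
For intuition, the score equation X′[y − E(Y)] = 0 is typically solved by Newton–Raphson; since the observed and expected information coincide here (part (b)), the update uses X′VX with V = diag[πi(1 − πi)]. A minimal sketch with simulated data follows; all numerical values are hypothetical.

import numpy as np

# Newton-Raphson for the logistic score equation X'(y - pi) = 0,
# using the information matrix X'VX with V = diag[pi_i(1 - pi_i)].
rng = np.random.default_rng(0)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # xi0 = 1 column
beta_true = np.array([-0.5, 1.0, 0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta = np.zeros(p + 1)
for _ in range(25):
    pi = 1 / (1 + np.exp(-X @ beta))
    score = X.T @ (y - pi)                        # X'(y - E(Y))
    info = X.T @ (X * (pi * (1 - pi))[:, None])   # X'VX
    step = np.linalg.solve(info, score)
    beta += step
    if np.max(np.abs(step)) < 1e-10:
        break
print("beta-hat:", beta.round(3))
print("estimated covariance matrix of beta-hat:\n", np.linalg.inv(info).round(4))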

Exercise 4.30*. Research investigators from the Division of Marine Fisheries in a certain U.S. state are interested in evaluating possible causes of ulcerative lesions in fish inhabiting a large coastal estuary. The investigators hypothesize that fish born in nesting sites rich in Pfiesteria, a toxic alga, are more susceptible to such lesions than are fish born in nesting sites without Pfiesteria. The goal of the research is to estimate the mean number of lesions for fish born in each of these two types of nesting sites, as well as the proportion π of coastal estuary fish actually born in Pfiesteria-rich sites. The only available data to estimate these three parameters consist of lesion counts on n randomly chosen young adult fish residing in this estuary, each of which is known to have been born in one of these two types of nesting sites. Unfortunately, for each of these n fish, the type of nesting site (Pfiesteria-rich or non-Pfiesteria) in which that fish was born is not known.

To analyze these data in order to estimate the three parameters of interest, a biostatistician consulting with these investigators proposes the following statistical model. For fish born in Pfiesteria-rich sites, the number Y of lesions is assumed to follow a Poisson distribution with mean μ1; for fish born in sites without Pfiesteria, Y is assumed to follow a Poisson distribution with mean μ2. The statistical goal is to estimate the unknown parameters π, μ1, and μ2 using the data set y = (y1, y2, . . . , yn)′, where the type of birth site for the ith young adult fish with observed lesion count yi is unknown, i = 1, 2, . . . , n.

This consulting biostatistician recommends the following method for obtaining the MLEs of the model parameters:

(1) Introduce an unobserved (or "latent") indicator variable Zi that takes the value 1 (with probability π) if the ith of the n fish was born in a Pfiesteria-rich site, and takes the value 0 otherwise;

(2) Define Lc(y, z; π, μ1, μ2) to be the joint (or "complete-data") likelihood for y = (y1, y2, . . . , yn)′ and z = (z1, z2, . . . , zn)′; and

(3) Use this complete-data likelihood, along with the expectation-maximization (EM) algorithm described below (Dempster et al., 1977), to derive the desired MLEs.

Starting from well-chosen initial values (specified at iteration t = 0), the EM algorithm computes the MLEs by iterating between two steps: the "E-step," which evaluates the conditional expectation of the complete-data log-likelihood with respect to the unobservable vector Z = (Z1, Z2, . . . , Zn)′, given the observed data y and the current parameter estimates; and the "M-step," in which this conditional expectation is maximized with respect to the model parameters. Under certain regularity conditions, the EM algorithm will converge to (at least) a local maximum of the observed-data likelihood L(y; π, μ1, μ2), which, if the vector Z was known, could be used directly to estimate the three unknown parameters of interest.

Develop explicit expressions (as functions of y, π, μ1, and μ2) for the quantities obtained for the E-step and for the M-step at iteration t (t ≥ 1). In particular, for the E-step at iteration t, derive an explicit expression for

Q^{(t)}(y; π, μ1, μ2) ≡ Q^{(t)} = E_Z{ln[Lc(y, z; π, μ1, μ2)] | y, π^{(t−1)}, μ1^{(t−1)}, μ2^{(t−1)}}.

Then, for the M-step, use Q^{(t)} to find the MLEs of π, μ1, and μ2 at iteration t.
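
A compact numerical sketch of this EM scheme for the two-component Poisson mixture is given below; the E-step posterior weights and the closed-form M-step updates shown are the standard ones for this model, and the data and starting values are simulated and hypothetical.

import numpy as np
from scipy import stats

# EM for a two-component Poisson mixture: pi, mu1 (Pfiesteria-rich), mu2.
rng = np.random.default_rng(7)
z = rng.binomial(1, 0.4, size=500)            # latent site indicators (unseen)
y = rng.poisson(np.where(z == 1, 6.0, 2.0))   # observed lesion counts

pi, mu1, mu2 = 0.5, 4.0, 1.0                  # initial values (iteration t = 0)
for t in range(500):
    # E-step: w_i = posterior probability that Z_i = 1 given y_i.
    f1 = stats.poisson.pmf(y, mu1)
    f2 = stats.poisson.pmf(y, mu2)
    w = pi * f1 / (pi * f1 + (1 - pi) * f2)
    # M-step: weighted closed-form updates.
    pi, mu1, mu2 = w.mean(), (w @ y) / w.sum(), ((1 - w) @ y) / (1 - w).sum()
print(f"pi = {pi:.3f}, mu1 = {mu1:.3f}, mu2 = {mu2:.3f}")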

Exercise 4.31*. The number X of colds per year for a resident in Alaska is assumed to have the discrete distribution pX(x; θ) = θ⁻¹, x = 1, 2, . . . , θ, where the parameter θ is an unknown positive integer. It is desired to find a reasonable candidate for the minimum variance unbiased estimator (MVUE) of θ using the information contained in a random sample X1, X2, . . . , Xn from pX(x; θ).

(a) Prove that U = max{X1, X2, . . . , Xn} is a sufficient statistic for the parameter θ. Also, show that U* = (2X1 − 1) is an unbiased estimator of the parameter θ.

(b) Given that U is a complete sufficient statistic for θ, use the Rao–Blackwell Theorem to derive an explicit expression for the MVUE θ̂ of θ, where θ̂ = E(U*|U = u). Then, show directly that E(θ̂) = θ. Do you notice any undesirable properties of the estimator θ̂?

Exercise 4.32*. For children with autism, it is postulated that the time X (in minutes) for such children to complete a certain manual dexterity test follows the distribution

fX(x; θ) = 1, 0 < θ < x < (θ + 1) < +∞.

Let X1, X2, . . . , Xn constitute a random sample of size n (> 1) from fX(x; θ). Let X(1) = min{X1, X2, . . . , Xn} and let X(n) = max{X1, X2, . . . , Xn}. Then, consider the following two estimators of the unknown parameter θ:

θ̂1 = (1/2)[X(1) + X(n) − 1]

and

θ̂2 = (n − 1)⁻¹[nX(1) − X(n)].

(a) Show that θ̂1 and θ̂2 are both unbiased estimators of the parameter θ and find explicit expressions for V(θ̂1) and V(θ̂2).

(b) More generally, consider the linear function W = (c0 + c1U1 + c2U2), where V(U1) = V(U2) = σ², where cov(U1, U2) = σ12, and where c0, c1, and c2 are constants with (c1 + c2) = 1. Determine values for c1 and c2 that minimize V(W), and explain how this general result relates to a comparison of the variance expressions obtained in part (a).

(c) Show that X(1) and X(n) constitute a set of jointly sufficient statistics for θ. Do X(1) and X(n) constitute a set of complete sufficient statistics for θ?

Exercise 4.33*. Reliable estimation of the numbers of subjects in the United States living with different types of medical conditions is important to both public health and health policy professionals. In the United States, disease-specific registries have been established for a variety of medical conditions including birth defects, tuberculosis, HIV, and cancer. Such registries are very often only partially complete, meaning that the number of registry records for a particular medical condition generally provides an underestimate of the actual number of subjects with that particular medical condition.

When two registries exist for the same medical condition, statistical models can be used to estimate the degree of under-ascertainment for each registry and to produce an improved estimate of the actual number of subjects having the medical condition of interest. The simplest statistical model for this purpose is based on the assumption that membership status for one registry is statistically independent of membership status for the other registry.

Let the parameter N denote the true unknown number of subjects who have a certain medical condition of interest. Define the random variables

Xyy = number of subjects listed in both Registry 1 and Registry 2,
Xyn = number of subjects listed in Registry 1 but not in Registry 2,
Xny = number of subjects listed in Registry 2 but not in Registry 1,
Xnn = number of subjects listed in neither of the two registries,

and the corresponding probabilities

πyy = pr(a subject is listed in both Registry 1 and Registry 2),
πyn = pr(a subject is listed in Registry 1 only),
πny = pr(a subject is listed in Registry 2 only),
πnn = pr(a subject is listed in neither registry).

It is reasonable to assume that the data arise from a multinomial distribution of the form

p(xyy, xyn, xny, xnn) = [N!/(xyy! xyn! xny! xnn!)] πyy^{xyy} πyn^{xyn} πny^{xny} πnn^{xnn},

where 0 ≤ xyy ≤ N, 0 ≤ xyn ≤ N, 0 ≤ xny ≤ N, 0 ≤ xnn ≤ N, and (xyy + xyn + xny + xnn) = N.

It is important to note that the random variable Xnn is not observable.

(a) Let π1 = (πyy + πyn) denote the marginal probability that a subject is listed in Registry 1, and let π2 = (πyy + πny) denote the marginal probability that a subject is listed in Registry 2. Under the assumption of statistical independence [i.e., πyy = π1π2, πyn = π1(1 − π2), etc.], develop an estimator N̂ of N by equating observed cell counts to their expected values under the assumed model. What is the numerical value of N̂ when xyy = 12,000, xyn = 6,000, and xny = 8,000?
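
For orientation, a standard two-source capture-recapture argument (equating xyy, xyy + xyn, and xyy + xny to Nπ1π2, Nπ1, and Nπ2, respectively) leads to the familiar form used in the quick numerical check below; treat this as one plausible route, not a full solution to part (a).

# Two-source capture-recapture estimate under the independence assumption.
x_yy, x_yn, x_ny = 12_000, 6_000, 8_000

n1 = x_yy + x_yn              # listed in Registry 1
n2 = x_yy + x_ny              # listed in Registry 2
N_hat = n1 * n2 / x_yy        # implicitly estimates the unobservable x_nn cell
print(f"N-hat = {N_hat:,.0f}")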

(b) For j = 1, 2, let Ej denote the event that a subject with the medical condition is listed in Registry j, and let Ēj denote the event that this subject is not listed in Registry j. In part (a), it was assumed that the events E1 and E2 are independent. As an alternative to this independence assumption, assume that membership in one of the two registries increases or decreases the odds of membership in the other registry by a factor of k; in other words,

odds(E1|E2)/odds(E1|Ē2) = odds(E2|E1)/odds(E2|Ē1) = k, 0 < k < +∞,

where, for two events A and B, odds(A|B) = pr(A|B)/[1 − pr(A|B)]. Note that k > 1 implies a positive association between the events E1 and E2, that k < 1 implies a negative association between the events E1 and E2, and that k = 1 implies no association (i.e., independence) between the events E1 and E2.

Although k is not known in practice, it is of interest to determine whether estimates of N would meaningfully change when plugging in various plausible values for k. Toward this end, develop an explicit expression for the method-of-moments estimator N̂(k) of N that would be obtained under the assumption that k is a known constant. Using the data from part (a), calculate numerical values of N̂(1/2), N̂(2), and N̂(4). Comment on your findings. In particular, is the estimate of N sensitive to different assumptions about the direction and magnitude of the association between membership status for the two registries (i.e., to the value of k)?

Exercise 4.34*. University researchers are conducting a study involving n infants to assess whether infants placed in day care facilities are more likely to be overweight than are infants receiving care at home. Infants are defined as "overweight" if they fall within the 85th or higher percentile on the official Centers for Disease Control and Prevention (CDC) age-adjusted and sex-adjusted body mass index (BMI) growth chart.

Let Yi = 1 if the ith infant (i = 1, 2, . . . , n) is overweight, and let Yi = 0 otherwise. It is assumed that Yi has the Bernoulli distribution

pYi(yi; πi) = πi^{yi}(1 − πi)^{1−yi}, yi = 0, 1 and 0 < πi < 1.

Also, Y1, Y2, . . . , Yn are assumed to be mutually independent random variables. To make statistical inferences about the association between type of care and the probability of being overweight, the researchers propose the following logistic regression model:

πi ≡ π(xi) = pr(Yi = 1|xi) = e^{α+βxi}/(1 + e^{α+βxi}), or equivalently,

logit[π(xi)] = ln[π(xi)/(1 − π(xi))] = α + βxi, i = 1, . . . , n,

where xi = 1 if the ith infant is in day care and xi = 0 if the ith infant is at home, and where α and β are unknown parameters to be estimated. Here, the parameter α represents the "log odds" of being overweight for infants in home care (xi = 0), and the parameter β represents the difference in log odds (or "log odds ratio") of being overweight for infants placed in day care (xi = 1) compared to infants receiving care at home (xi = 0).

Suppose that n pairs (y1, x1 = 0), (y2, x2 = 0), . . . , (y_{n0}, x_{n0} = 0), (y_{n0+1}, x_{n0+1} = 1), (y_{n0+2}, x_{n0+2} = 1), . . . , (yn, xn = 1) of observed data are collected during the study, where the first n0 data pairs are associated with the infants receiving home care, where the last n1 data pairs are associated with the infants placed in day care, and where (n0 + n1) = n.

(a) Show that the MLEs of α and β are

α̂ = ln[p0/(1 − p0)] and β̂ = ln{[p1/(1 − p1)]/[p0/(1 − p0)]},

where p0 = n0⁻¹ ∑_{i=1}^{n0} yi is the sample proportion of overweight infants receiving home care and p1 = n1⁻¹ ∑_{i=n0+1}^{n} yi is the sample proportion of overweight infants in day care.

(b) Develop an explicit expression for the large-sample variance-covariance matrix of α̂ and β̂ based on both expected and observed information.

(c) Suppose that there are 100 infants receiving home care, 18 of whom are overweight, and that there are 100 infants in day care, 26 of whom are overweight. Use these data to compute large-sample 95% CIs for α and β. Based on these CI results, do the data supply statistical evidence that infants placed in day care facilities are more likely to be overweight than are infants receiving care at home?

For further details about logistic regression, see Breslow and Day (1980), Hosmer and Lemeshow (2000), Kleinbaum and Klein (2002), and Kleinbaum et al. (1982).
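
Assuming the familiar large-sample results for this saturated two-group model, namely Var(α̂) ≈ [n0p0(1 − p0)]⁻¹ and Var(β̂) ≈ [n0p0(1 − p0)]⁻¹ + [n1p1(1 − p1)]⁻¹ (which is where part (b) leads), the part (c) numbers can be checked with the following sketch.

import math
from scipy import stats

# Part (c) data: 18/100 overweight at home, 26/100 in day care.
n0, y0, n1, y1 = 100, 18, 100, 26
p0, p1 = y0 / n0, y1 / n1

alpha_hat = math.log(p0 / (1 - p0))
beta_hat = math.log((p1 / (1 - p1)) / (p0 / (1 - p0)))
se_alpha = math.sqrt(1 / (n0 * p0 * (1 - p0)))
se_beta = math.sqrt(1 / (n0 * p0 * (1 - p0)) + 1 / (n1 * p1 * (1 - p1)))

z = stats.norm.ppf(0.975)
print(f"alpha-hat = {alpha_hat:.3f}, "
      f"95% CI = ({alpha_hat - z*se_alpha:.3f}, {alpha_hat + z*se_alpha:.3f})")
print(f"beta-hat  = {beta_hat:.3f}, "
      f"95% CI = ({beta_hat - z*se_beta:.3f}, {beta_hat + z*se_beta:.3f})")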

Exercise 4.35*. In April of 1986, a reactor exploded at the Chernobyl Nuclear Power Plant in Chernobyl, Ukraine (then part of the Soviet Union). There were roughly 14,000 permanent residents of Chernobyl who were exposed to varying levels of radioactive iodine, as well as to other radioactive substances. It took about 3 days before these permanent residents, and other persons living in nearby areas, could be evacuated. As a result, many children and adults have since developed various forms of cancer.

In particular, many young children developed thyroid cancer. As a model for the development of thyroid cancer in such children, the following statistical model is proposed. Let T be a continuous random variable representing the time (in years) from childhood radioactive iodine exposure caused by the Chernobyl explosion to the diagnosis of thyroid cancer, and let the continuous random variable X be the level (in Joules per kilogram) of radioactive iodine exposure. Then, it is assumed that the conditional distribution of T given X = x is

fT(t|X = x) = θx e^{−θxt}, t > 0, x > 0, θ > 0.

Further, assume that the distribution of X is GAMMA(α = 1, β), so that

fX(x) = x^{β−1}e^{−x}/Γ(β), x > 0, β > 1.

Suppose that an epidemiologist locates n children with thyroid cancer who were residents of Chernobyl at the time of the explosion. For each of these children, this epidemiologist determines the time in years (i.e., the so-called latency period) from exposure to the diagnosis of thyroid cancer. In particular, let t1, t2, . . . , tn denote these observed latency periods. Since it is impossible to determine the true individual level of radioactive iodine exposure for each of these n children, the only data available to this epidemiologist are the n observed latency periods.

Based on the use of the observed latency periods for a random sample of n = 300 children, if the MLEs of θ and β are θ̂ = 0.32 and β̂ = 1.50, compute an appropriate large-sample 95% CI for γ = E(T), the true average latency period for children who developed thyroid cancer as a result of the Chernobyl nuclear reactor explosion.

Exercise 4.36*. Using n mutually independent data pairs of the general form (x, Y), it is desired to use the method of unweighted least squares to fit the model

Y = β0 + β1x + β2x² + ε,

where E(ε) = 0 and V(ε) = σ². Suppose that the three possible values of the predictor x are −1, 0, and +1. What proportion of the n data points should be assigned to each of the three values of x so as to minimize V(β̂2), the variance of the unweighted least-squares estimator of β2?

To proceed, assume that n = nπ1 + nπ2 + nπ3, where π1 (0 < π1 < 1) is the proportion of the n observations to be assigned to the x-value of −1, where π2 (0 < π2 < 1) is the proportion of the n observations to be assigned to the x-value of 0, and where π3 (0 < π3 < 1) is the proportion of the n observations to be assigned to the x-value of +1. Further, assume that n can be chosen so that n1 = nπ1, n2 = nπ2, and n3 = nπ3 are positive integers.

(a) With

β̂ = (β̂0, β̂1, β̂2)′ = (X′X)⁻¹X′Y,

show that X′X can be written in the form

X′X = n ⎡ 1  b  a ⎤
        ⎢ b  a  b ⎥
        ⎣ a  b  a ⎦ ,

where a = (π1 + π3) and b = (π3 − π1).

(b) Show that

V(β̂2) = {[(π1 + π3) − (π3 − π1)²]/(4nπ1π2π3)} σ².

(c) Use the result from part (b) to find the values of π1, π2, and π3 that minimize V(β̂2) subject to the constraint (π1 + π2 + π3) = 1.
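
Rather than solving part (c) analytically here, one can probe the part (b) variance expression numerically; the grid-search sketch below scans allocations (π1, π2, π3) on the simplex and reports the minimizer of the bracketed variance factor.

import itertools

# Grid search over allocations (pi1, pi2, pi3) summing to 1, minimizing the
# part (b) factor [(pi1 + pi3) - (pi3 - pi1)^2] / (4 * pi1 * pi2 * pi3).
step = 0.01
best = None
for i, j in itertools.product(range(1, 100), repeat=2):
    p1, p2 = i * step, j * step
    p3 = 1.0 - p1 - p2
    if p3 <= 0:
        continue
    v = ((p1 + p3) - (p3 - p1) ** 2) / (4 * p1 * p2 * p3)
    if best is None or v < best[0]:
        best = (v, p1, p2, p3)
print("minimizing allocation (pi1, pi2, pi3):", best[1:],
      " variance factor:", round(best[0], 4))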

Exercise 4.37*. Given appropriate data, one possible (but not necessarily optimal) algorithm for deciding whether or not there is statistical evidence that p (≥ 2) population means are not all equal to the same value is the following: compute a 100(1 − α)% CI for each population mean and decide that there is no statistical evidence that these population means are not all equal to the same value if these p CIs have at least one value in common (i.e., if there is at least one value that is simultaneously contained in all p CIs); otherwise, decide that there is statistical evidence that these p population means are not all equal to the same value. To evaluate some statistical properties of this proposed algorithm, consider the following scenario.

For i = 1, 2, . . . , p, let Xi1, Xi2, . . . , Xin constitute a random sample of size n from a N(μi, σ²) population. Given the stated assumptions, the appropriate exact 100(1 − α)% CI for μi, using only the data {Xi1, Xi2, . . . , Xin} from the ith population, involves the t-distribution with (n − 1) df and takes the form

X̄i ± kSi/√n, where k = t_{(n−1),1−α/2},

where

X̄i = n⁻¹ ∑_{j=1}^n Xij and Si² = (n − 1)⁻¹ ∑_{j=1}^n (Xij − X̄i)²,

and where, for 0 < α < 0.50,

pr(Tν > t_{ν,1−α/2}) = α/2

when the random variable Tν has a t-distribution with ν df. For notational convenience, let Ii denote the set of values included in the ith computed CI; and, for i ≠ i′, let the event Eii′ = {Ii ∩ Ii′ = ∅}, where ∅ denotes the empty (or null) set; in other words, Eii′ is the event that the CIs X̄i ± kSi/√n and X̄i′ ± kSi′/√n have no values in common (i.e., do not overlap).

(a) Show that

πii′ = pr(Eii′) = pr[|X̄i − X̄i′| > k(Si/√n + Si′/√n)].

(b) Under the condition (say, Cp) that all p population means are actually equal to the same value (i.e., μ1 = μ2 = · · · = μp = μ, say), use the result from part (a) to show that, for i ≠ i′,

π*ii′ = pr(Eii′|Cp) ≤ pr[|T_{2(n−1)}| > k | Cp] ≤ α.

(c) When p = 3 and under the condition C3 that μ1 = μ2 = μ3 = μ, say, find a crude upper bound for the probability that there are no values common to all three CIs. Comment on this finding and, in general, on the utility of this algorithm.
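
A small simulation makes the part (c) question concrete: under C3, draw repeated samples, form the three t-intervals, and estimate the probability that no single value lies in all three (equivalently, that the largest lower limit exceeds the smallest upper limit). A sketch, with hypothetical settings n = 10 and α = 0.05:

import numpy as np
from scipy import stats

# Monte Carlo check of the CI-overlap algorithm under equal means (C3).
rng = np.random.default_rng(42)
p, n, alpha, reps = 3, 10, 0.05, 20_000
k = stats.t.ppf(1 - alpha / 2, df=n - 1)

x = rng.normal(size=(reps, p, n))           # mu = 0, sigma = 1 for all groups
xbar = x.mean(axis=2)
half = k * x.std(axis=2, ddof=1) / np.sqrt(n)
lower, upper = xbar - half, xbar + half
no_common = lower.max(axis=1) > upper.min(axis=1)   # empty triple intersection
print(f"estimated pr(no value common to all {p} CIs) = {no_common.mean():.4f}")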

Exercise 4.38*. A highway safety researcher theorizes that the number Y of automobile accidents per year occurring on interstate highways in the United States is linearly related to a certain measure x of traffic density. To evaluate his theory, this highway safety researcher gathers appropriate data from n independently chosen locations across the United States. More specifically, for the ith of n independently chosen locations (i = 1, 2, . . . , n), the data point (xi, yi) is recorded, where xi (the measure of traffic density at location i) is assumed to be a known positive constant and where yi is the observed value (or "realization") of the random variable Yi. Here, the random variable Yi is assumed to have a Poisson distribution with E(Yi) = E(Yi|xi) = θ0 + θ1xi. You can assume that E(Yi) > 0 for all i and that the set {Y1, Y2, . . . , Yn} constitutes a set of n mutually independent Poisson random variables. The goal is to use the available n pairs of data points (xi, yi), i = 1, 2, . . . , n, to make statistical inferences about the unknown parameters θ0 and θ1.

(a) Derive explicit expressions for the unweighted least-squares (ULS) estimators θ̂0 and θ̂1, respectively, of θ0 and θ1. Also, derive expressions for the expected values and variances of these two ULS estimators.

(b) For a set of n = 100 data pairs (xi, yi), i = 1, 2, . . . , 100, suppose that each of 25 data pairs has an x value equal to 1.0, that each of 25 data pairs has an x value equal to 2.0, that each of 25 data pairs has an x value equal to 3.0, and that each of 25 data pairs has an x value equal to 4.0. If the MLEs of θ0 and θ1 are, respectively, θ̂0 = 2.00 and θ̂1 = 4.00, compute an appropriate large-sample 95% CI for the parameter

ψ = E(Y|x = 2.5) = θ0 + (2.5)θ1.

Exercise 4.39*. Let X1, X2, . . . , Xn constitute a random sample of size n (> 1) from an N(μ, σ²) population, and let Y1, Y2, . . . , Yn constitute a random sample of size n (> 1) from a completely different N(μ, σ²) population. Hence, the set {X1, X2, . . . , Xn; Y1, Y2, . . . , Yn} is made up of a total of 2n mutually independent random variables, with each random variable in the set having a N(μ, σ²) distribution. Consider the following random variables:

X̄ = n⁻¹ ∑_{i=1}^n Xi, Sx² = (n − 1)⁻¹ ∑_{i=1}^n (Xi − X̄)²,

Ȳ = n⁻¹ ∑_{i=1}^n Yi, Sy² = (n − 1)⁻¹ ∑_{i=1}^n (Yi − Ȳ)².

(a) For any particular value of i (1 ≤ i ≤ n), determine the exact distribution of the random variable Di = (Xi − X̄).

(b) For particular values of i (1 ≤ i ≤ n) and j (1 ≤ j ≤ n), where i ≠ j, derive an explicit expression for corr(Di, Dj), the correlation between the random variables Di and Dj. Also, find the limiting value of corr(Di, Dj) as n → ∞, and then provide an argument as to why this result makes sense.

(c) Prove rigorously that the density function of the random variable R = Sx²/Sy² is

fR(r) = [Γ(n − 1)]{Γ[(n − 1)/2]}⁻² r^{(n−3)/2}(1 + r)^{−(n−1)}, 0 < r < ∞.

Also, find an explicit expression for E(R).

(d) Now, consider the following two estimators of the unknown parameter μ:

(i) μ̂1 = (X̄ + Ȳ)/2;

(ii) μ̂2 = [X̄Sy² + ȲSx²]/(Sx² + Sy²).

Prove rigorously that both μ̂1 and μ̂2 are unbiased estimators of μ.

(e) Derive explicit expressions for V(μ̂1) and V(μ̂2). Which estimator, μ̂1 or μ̂2, do you prefer and why?

Exercise 4.40*. A certain company manufactures stitches for coronary bypass graft surgeries. The distribution of the length Y (in feet) of defect-free stitches manufactured by this company is assumed to have the uniform density

fY(y; θ) = θ⁻¹, 0 < y < θ, θ > 0.

Clearly, the larger is θ, the better is the quality of the manufacturing process. Suppose that Y1, Y2, . . . , Yn constitute a random sample of size n (n > 2) from fY(y; θ). A statistician proposes three estimators of the parameter μ = E(Y) = θ/2, the true average length of defect-free stitches manufactured by this company. These three estimators are as follows:

(1) μ̂1 = k1Ȳ = k1 n⁻¹ ∑_{i=1}^n Yi, where k1 is to be chosen so that E(μ̂1) = μ;

(2) μ̂2 = k2Y(n), where Y(n) is the largest order statistic based on this random sample and where k2 is to be chosen so that E(μ̂2) = μ;

(3) μ̂3 = k3[Y(1) + Y(n)]/2, the so-called "midpoint" of the data, where Y(1) is the smallest order statistic based on this random sample and where k3 is to be chosen so that E(μ̂3) = μ.

(a) Find the value of k1, and then find V(μ̂1).

(b) Find the value of k2, and then find V(μ̂2).

(c) Find the value of k3, and then find V(μ̂3).

(d) Compare the variances of μ̂1, μ̂2, and μ̂3 for both finite n and as n → ∞. Which estimator do you prefer and why?
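
Without deriving the constants k1, k2, and k3 here, a simulation can already hint at the part (d) comparison by examining the sampling spread of the three base statistics Ȳ, Y(n), and [Y(1) + Y(n)]/2; the settings below are hypothetical.

import numpy as np

# Compare sampling variability of the three base statistics for Uniform(0, theta).
rng = np.random.default_rng(3)
theta, n, reps = 2.0, 20, 50_000
y = rng.uniform(0, theta, size=(reps, n))

base_stats = {
    "Ybar": y.mean(axis=1),
    "Y(n)": y.max(axis=1),
    "midpoint": (y.min(axis=1) + y.max(axis=1)) / 2,
}
for name, s in base_stats.items():
    print(f"{name:9s}: mean = {s.mean():.4f}, sd = {s.std():.5f}")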

Exercise 4.41*. For adult males with incurable malignant melanoma who have lived at least 25 consecutive years in Arizona, an epidemiologist theorizes that the true mean time to death differs between those adult males with a family history of skin cancer and those adult males without a family history of skin cancer. To test this theory, this epidemiologist selects a random sample of n adult males in Arizona with incurable malignant melanoma, each of whom has lived in Arizona for at least 25 consecutive years. Then, this epidemiologist and a collaborating biostatistician agree to consider the following statistical model for two random variables X and Y. Here, X is a dichotomous random variable taking the value 1 for an adult male in the random sample without a family history of skin cancer and taking the value 0 for an adult male in the random sample with a family history of skin cancer; and, Y is a continuous random variable representing the time (in months) to death for an adult male in the random sample with incurable malignant melanoma. More specifically, assume that the marginal distribution of X is Bernoulli (or point-binomial), namely,

pX(x; θ) = θ^x(1 − θ)^{1−x}, x = 0, 1; 0 < θ < 1.

Moreover, assume that the conditional distribution of Y, given X = x, is negative exponential with conditional mean E(Y|X = x) = μ(x) = e^{α+βx}, namely,

fY(y|X = x; α, β) = [μ(x)]⁻¹ e^{−y/μ(x)}, 0 < y < +∞.

This two-variable model involves three unknown parameters, namely, θ (0 < θ < 1), α (−∞ < α < +∞), and β (−∞ < β < +∞). Let (X1, Y1), (X2, Y2), . . . , (Xn, Yn) constitute a random sample of size n from the joint distribution fX,Y(x, y; θ, α, β) of the random variables X and Y, where this joint distribution is given by the product

fX,Y(x, y; θ, α, β) = pX(x; θ) fY(y|X = x; α, β).

Now, suppose that the available data contain n1 adult males without a family history of skin cancer and n0 = (n − n1) adult males with a family history of skin cancer. Further, assume (without loss of generality) that the n observed data pairs (i.e., the n realizations) are arranged (for notational simplicity) so that the first n1 pairs (1, y1), (1, y2), . . . , (1, y_{n1}) are the observed data for the n1 adult males without a family history of skin cancer, and the remaining n0 data pairs (0, y_{n1+1}), (0, y_{n1+2}), . . . , (0, yn) are the observed data for the n0 adult males with a family history of skin cancer.

(a) Develop explicit expressions for the MLEs θ̂, α̂, and β̂ of θ, α, and β, respectively. In particular, show that these three ML estimators can be written as explicit functions of one or more of the sample means x̄ = n1/n, ȳ1 = n1⁻¹ ∑_{i=1}^{n1} yi, and ȳ0 = n0⁻¹ ∑_{i=n1+1}^{n} yi.

(b) Using expected information, develop an explicit expression for the (3 × 3) large-sample covariance matrix I⁻¹ for the three ML estimators θ̂, α̂, and β̂.

(c) If n = 50, θ̂ = 0.60, α̂ = 0.50, and β̂ = 0.40, use appropriate CI calculations to determine whether this numerical information supplies statistical evidence that the true mean time to death differs between adult males with a family history of skin cancer and adult males without a family history of skin cancer, all of whom developed incurable malignant melanoma and lived at least 25 consecutive years in Arizona.

Exercise 4.42*. For a certain laboratory experiment, the concentration Yx (in milligrams per cubic centimeter) of a certain pollutant produced via a chemical reaction taking place at temperature x (conveniently scaled so that −1 ≤ x ≤ +1) has a normal distribution with mean E(Yx) = θx = (β0 + β1x + β2x²) and variance V(Yx) = σ². Also, the temperature x is nonstochastic (i.e., is not a random variable) and is known without error.

Suppose that an environmental scientist runs this experiment N times, with each run involving a different temperature setting. Further, suppose that these N runs produce the N pairs of data (x1, Y_{x1}), (x2, Y_{x2}), . . . , (xN, Y_{xN}). Assume that the random variables Y_{x1}, Y_{x2}, . . . , Y_{xN} constitute a set of mutually independent random variables, and that x1, x2, . . . , xN constitute a set of known constants.

Further, let μk = N⁻¹ ∑_{i=1}^N xi^k, k = 1, 2, 3; and, assume that the environmental scientist chooses the N temperature values x1, x2, . . . , xN so that μ1 = μ3 = 0.

Suppose that this environmental scientist decides to estimate the parameter θx using the straight-line estimator θ̂x = (B0 + B1x), where B0 = N⁻¹ ∑_{i=1}^N Y_{xi} and where B1 = ∑_{i=1}^N xiY_{xi} / ∑_{i=1}^N xi². Note that θ̂x is a straight-line function of x, but that the true model relating the expected value of Yx to x actually involves a squared term in x. Hence, the wrong model is being fit to the available data.

(a) Develop explicit expressions for E(θ̂x) and V(θ̂x). What is the exact distribution of the estimator θ̂x?

(b) Consider the expression

Q = ∫_{−1}^{+1} [E(θ̂x) − θx]² dx.

Since [E(θ̂x) − θx]² is the squared bias when θ̂x is used to estimate θx at temperature setting x, Q is called the integrated squared bias. The quantity Q can be interpreted as being the cumulative bias over all values of x such that −1 ≤ x ≤ +1. It is desirable to choose the temperature settings x1, x2, . . . , xN to make Q as small as possible. More specifically, find the numerical value of μ2 that minimizes Q. Then, given this result, if N = 4, find a set of values for x1, x2, x3, and x4 such that Q is minimized and that μ1 = μ3 = 0.

Exercise 4.43*. For the ith of n (i = 1, 2, . . . , n) U.S. military families, suppose that there are yi1 events of child abuse during a period of Li1 months when the soldier-father is not at home (i.e., is deployed to a foreign country), and suppose that there are yi0 events of child abuse during a period of Li0 months when the soldier-father is at home (i.e., is not deployed). To assess whether the rate of child abuse when the soldier-father is deployed is different from the rate of child abuse when the soldier-father is not deployed, the following statistical model is proposed.

Let αi ~ N(0, σα²). For j = 0, 1, and given αi fixed, suppose that the random variable Yij, with realization (or observed value) yij, is assumed to have a Poisson distribution with conditional mean E(Yij|αi) = Lijλij, where

ln(λij) = αi + βDij + ∑_{l=1}^p γlCil;

here, λi0 and λi1 denote the respective nondeployment and deployment rates of child abuse per month for the ith family, Dij takes the value 1 if j = 1 (i.e., if the soldier-father is deployed) and Dij takes the value 0 if j = 0 (i.e., if the soldier-father is not deployed), and Ci1, Ci2, . . . , Cip are the values of p covariates C1, C2, . . . , Cp specific to the ith family. Further, conditional on αi being fixed, Yi0 and Yi1 are assumed to be independent random variables, i = 1, 2, . . . , n.

(a) Find an explicit expression for cov(Yi0, Yi1), and comment on the rationale for including the random effect αi in the proposed statistical model.

(b) Let the random variable Yi = (Yi0 + Yi1) be the total number of child abuse events for the ith family. Show that the conditional distribution p_{Yi1}(yi1|Yi = yi, αi) of Yi1, given Yi = yi and αi fixed, is BIN(yi, πi), where

πi = Li1θ/(Li0 + Li1θ);

here,

θ = λi1/λi0 = e^β

is the rate ratio comparing the deployment and nondeployment rates of child abuse per month for the ith family. Note that the rate ratio parameter θ does not vary with i (i.e., does not vary across families), even though the individual rate parameters are allowed to vary with i.

(c) Under the reasonable assumption that families behave independently of one another, use the conditional likelihood function

L = ∏_{i=1}^n p_{Yi1}(yi1|Yi = yi, αi)

to show that the conditional MLE θ̂ of θ satisfies the equation

θ̂ ∑_{i=1}^n [yiLi1/(Li0 + Li1θ̂)] = ∑_{i=1}^n yi1.

(d) Using expected information, develop a general expression for a large-sample 95% CI for the rate ratio parameter θ.
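
The part (c) equation has no closed form in general, but it is easy to solve numerically; a sketch with simulated data (all values hypothetical) using a standard root-finder:

import numpy as np
from scipy.optimize import brentq

# Solve the part (c) conditional-MLE equation for theta numerically.
rng = np.random.default_rng(11)
n = 200
L0, L1 = rng.uniform(6, 24, n), rng.uniform(6, 24, n)  # months at home/deployed
theta_true = 1.8
lam0 = rng.lognormal(-3.0, 0.5, n)                     # family-specific base rates
y0 = rng.poisson(L0 * lam0)
y1 = rng.poisson(L1 * lam0 * theta_true)
y = y0 + y1

def score(theta):
    # theta * sum_i yi*Li1/(Li0 + Li1*theta) - sum_i yi1
    return theta * np.sum(y * L1 / (L0 + L1 * theta)) - y1.sum()

theta_hat = brentq(score, 1e-6, 100.0)
print(f"conditional MLE theta-hat = {theta_hat:.3f} (true value {theta_true})")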

Exercise 4.44*. Let Y be a continuous response variable and let X be a continuous predictor variable. Also, assume that

E(Y|X = x) = (β0 + β1x) and X ~ N(μx, σx²).

Further, suppose that the predictor variable X is very expensive to measure, but that a surrogate variable X* is available and can be measured fairly inexpensively. Further, for i = 1, 2, . . . , n, assume that Xi* and Xi are related by the measurement error model Xi* = (Xi + Ui), where Ui ~ N(0, σu²) and where Xi and Ui are independent random variables. Suppose that it is decided to use X* instead of X as the predictor variable when estimating the slope parameter β1 by fitting a straight-line regression model via unweighted least squares. In particular, suppose that the n mutually independent pairs (Xi*, Yi) = (Xi + Ui, Yi), i = 1, 2, . . . , n, are used to construct an estimator β̂1* of β1 of the form

β̂1* = ∑_{i=1}^n (Xi* − X̄*)Yi / ∑_{i=1}^n (Xi* − X̄*)²,

where X̄* = n⁻¹ ∑_{i=1}^n Xi*.

Using conditional expectation theory, derive an explicit expression for E(β̂1*), and then comment on how E(β̂1*) varies as a function of the ratio λ = σu²/σx², 0 < λ < ∞. In your derivation, use the fact that Xi and Xi* have a bivariate normal distribution and employ the assumption that E(Yi|Xi = xi, Xi* = xi*) = E(Yi|Xi = xi), i = 1, 2, . . . , n. This assumption is known as the nondifferential error assumption and states that Xi* contributes no further information regarding Yi if Xi is available.

For an excellent book on measurement error and its effects on the validity of statistical analyses, see Fuller (2006).

Exercise 4.45*. Let the random variable Y take the value 1 if a person develops a certain rare disease, and let Y take the value 0 if not. Consider the following exponential regression model, namely,

pr(Y = 1|X, C) = e^{β0+β1X+γ′C},

where X is a continuous exposure variable, C′ = (C1, C2, . . . , Cp) is a row vector of p covariates, and γ′ = (γ1, γ2, . . . , γp) is a row vector of p regression coefficients. Here, β1 (> 0) is the key parameter of interest; in particular, β1 measures the effect of the exposure X on the probability (or risk) of developing the rare disease after adjusting for the effects of the covariates C1, C2, . . . , Cp. Since the disease in question is rare, it is reasonable to assume that

pr(Y = 1|X, C) = e^{β0+β1X+γ′C} < 1.

Now, suppose that the exposure variable X is very expensive to measure, but that a surrogate variable X* for X is available and can be measured fairly inexpensively. Further, assume that X and X* are related via the Berkson measurement error model (Berkson, 1950)

X = α0 + α1X* + δ′C + U,

where α1 > 0, where U ~ N(0, σu²), where δ′ = (δ1, δ2, . . . , δp) is a row vector of p regression coefficients, and where the random variables U and X* are independent given C.

(a) Show that corr(X, X*|C) < 1, in which case X* is said to be an imperfect surrogate for X (since it is not perfectly correlated with X).

(b) Determine the structure of fX(x|X*, C), the conditional density function of X given X* and C.

(c) Now, suppose that an epidemiologist decides to use X* instead of X for the exposure variable in the exponential regression model given above. To assess the implications of this decision, show that pr(Y = 1|X*, C) has the structure

pr(Y = 1|X*, C) = e^{θ0+θ1X*+ξ′C},

where θ0, θ1, and ξ′ are specific parametric functions of one or more of the quantities β0, β1, α0, α1, σu², γ′, and δ′. In your derivation, assume that

pr(Y = 1|X, X*, C) = pr(Y = 1|X, C);

this is known as the nondifferential error assumption and states that X* contributes no further information regarding Y if X is available. In particular, show that θ1 = α1β1, and then comment on the implication of this result with regard to the estimation of β1 using X* instead of X in the stated exponential regression model.

For an application of this methodology, see Horick et al. (2006).

Exercise 4.46*. Suppose that Y is a dichotomous outcome variable taking the values 0 and 1, and that X is a dichotomous predictor variable also taking the values 0 and 1. Further, for x = 0, 1, let

μx = pr(Y = 1|X = x) and let δ = pr(X = 1).

(a) Suppose that X is unobservable, and that a surrogate dichotomous variable X* is used in place of X. Further, assume that X and X* are related via the misclassification probabilities

π_{xx*} = pr(X* = x*|X = x), x = 0, 1 and x* = 0, 1.

Find an explicit expression for corr(X, X*). For what values of π00, π10, π01, and π11 will corr(X, X*) = 1, in which case X* is said to be a perfect surrogate for X? Comment on your findings.

(b) Now, consider the risk difference parameter θ = (μ1 − μ0). With μ*_{x*} = pr(Y = 1|X* = x*), prove that |θ*| ≤ |θ|, where

θ* = (μ1* − μ0*).

Then, comment on this finding with regard to the misclassification bias resulting from the use of X* instead of X for estimating θ. In your proof, assume that

pr[Y = 1|(X = x) ∩ (X* = x*)] = pr(Y = 1|X = x);

this nondifferential error assumption states that X* contributes no further information regarding Y if X is available.


SOLUTIONS

Solution 4.1

(a) Method of Moments:

Ȳ = n⁻¹ ∑_{x=1}^n Yx is equated to E(Ȳ) = n⁻¹ ∑_{x=1}^n E(Yx) = (μ/n) ∑_{x=1}^n x,

so that

μ̂1 = nȲ / ∑_{x=1}^n x = nȲ / [n(n + 1)/2] = [2/(n + 1)]Ȳ.

Unweighted Least Squares:

Q = ∑_{x=1}^n [Yx − E(Yx)]² = ∑_{x=1}^n (Yx − xμ)².

So,

∂Q/∂μ = 2 ∑_{x=1}^n (Yx − xμ)(−x) = 0 ⟹ μ ∑_{x=1}^n x² = ∑_{x=1}^n xYx,

so that

μ̂2 = ∑_{x=1}^n xYx / ∑_{x=1}^n x² = ∑_{x=1}^n xYx / [n(n + 1)(2n + 1)/6] = 6 ∑_{x=1}^n xYx / [n(n + 1)(2n + 1)].

Maximum Likelihood:

L = ∏_{x=1}^n {[2π(x³σ²)]^{−1/2} exp[−(yx − xμ)²/(2x³σ²)]}

  = (2π)^{−n/2} σ^{−n} (∏_{x=1}^n x^{−3/2}) exp[−(1/2σ²) ∑_{x=1}^n x^{−3}(yx − xμ)²].

So,

ln L = −(n/2) ln(2π) − n ln(σ) − (3/2) ∑_{x=1}^n ln x − (1/2σ²) ∑_{x=1}^n x^{−3}(yx − xμ)².

Thus,

∂ ln L/∂μ = −(1/σ²) ∑_{x=1}^n x^{−3}(yx − xμ)(−x) = 0

gives

μ ∑_{x=1}^n x^{−1} = ∑_{x=1}^n x^{−2}yx,

so that

μ̂3 = ∑_{x=1}^n x^{−2}Yx / ∑_{x=1}^n x^{−1}.

Now, since μ̂1, μ̂2, and μ̂3 are each a linear combination of mutually independent normal random variables, all three of these estimators are normally distributed. Now,

E(μ̂1) = [2/(n + 1)] E(Ȳ) = [2/(n + 1)] n⁻¹ ∑_{x=1}^n E(Yx) = [2/(n + 1)] (μ/n) ∑_{x=1}^n x = μ,

V(μ̂1) = [2/(n + 1)]² n⁻² ∑_{x=1}^n x³σ² = 4σ² ∑_{x=1}^n x³ / [n²(n + 1)²] = σ².

E(μ̂2) = ∑_{x=1}^n x(xμ) / ∑_{x=1}^n x² = μ,

V(μ̂2) = ∑_{x=1}^n x²(x³σ²) / (∑_{x=1}^n x²)² = [36σ² / (n²(n + 1)²(2n + 1)²)] ∑_{x=1}^n x⁵.

E(μ̂3) = ∑_{x=1}^n x^{−2}(xμ) / ∑_{x=1}^n x^{−1} = μ,

V(μ̂3) = ∑_{x=1}^n x^{−4}(x³σ²) / (∑_{x=1}^n x^{−1})² = σ² / ∑_{x=1}^n x^{−1}.

(b) Clearly, since all these estimators are unbiased estimators of μ, we want to use the estimator with the smallest variance. We could analytically compare V(μ̂1), V(μ̂2), and V(μ̂3), but there is a more direct way. Since

∂ ln L/∂μ = (1/σ²) ∑_{x=1}^n x^{−2}yx − (μ/σ²) ∑_{x=1}^n x^{−1},

and since

E(∂² ln L/∂μ ∂σ²) = 0,

so that the expected information matrix is a diagonal matrix, the Cramér–Rao lower bound for the variance of any unbiased estimator of μ using {Y1, Y2, . . . , Yn} is

1 / [−E(∂² ln L/∂μ²)] = σ² / ∑_{x=1}^n x^{−1},

which is achieved by μ̂3 for any finite n. So, the "best" exact 95% CI for μ should be based on μ̂3, the minimum variance bound unbiased estimator (MVBUE) of μ. Since

(μ̂3 − μ)/√V(μ̂3) ~ N(0, 1),

the "best" exact 95% CI for μ is μ̂3 ± 1.96√V(μ̂3). For the given data,

μ̂3 = ∑_{x=1}^5 x^{−2}(x + 1) / ∑_{x=1}^5 x^{−1} = (2/1 + 3/4 + 4/9 + 5/16 + 6/25) / (1 + 1/2 + 1/3 + 1/4 + 1/5) = 1.641,

and

V(μ̂3) = 2/2.283 = 0.876,

so that the computed exact 95% CI for μ is

1.641 ± 1.96√0.876 = 1.641 ± 1.835 = (−0.194, 3.476).

Solution 4.2

(a) The unweighted least-squares estimator θ̂uls is the value of θ that minimizes

Q = ∑_{i=1}^n (Yi − θxi)².

Solving

∂Q/∂θ = −2 ∑_{i=1}^n xi(Yi − θxi) = 0

yields

θ̂uls = ∑_{i=1}^n xiYi / ∑_{i=1}^n xi².

Since

∂²Q/∂θ² = 2 ∑_{i=1}^n xi² > 0,

θ̂uls minimizes Q. Also,

E(θ̂uls) = ∑_{i=1}^n xiE(Yi) / ∑_{i=1}^n xi² = ∑_{i=1}^n xi(θxi) / ∑_{i=1}^n xi² = θ,

and

V(θ̂uls) = ∑_{i=1}^n xi²V(Yi) / (∑_{i=1}^n xi²)² = ∑_{i=1}^n xi²(θxi) / (∑_{i=1}^n xi²)² = θ ∑_{i=1}^n xi³ / (∑_{i=1}^n xi²)².

(b) The method of moments estimator is obtained by solving for θ using the equation Ȳ = E(Ȳ), where

Ȳ = n⁻¹ ∑_{i=1}^n Yi

and

E(Ȳ) = n⁻¹ ∑_{i=1}^n E(Yi) = n⁻¹ ∑_{i=1}^n (θxi) = θ n⁻¹ ∑_{i=1}^n xi = θx̄.

Hence, the equation

Ȳ = E(Ȳ) = θx̄

gives

θ̂mm = Ȳ/x̄.

Obviously, E(θ̂mm) = θ, and

V(θ̂mm) = V(Ȳ)/(x̄)² = V[n⁻¹ ∑_{i=1}^n Yi]/(x̄)² = n⁻² ∑_{i=1}^n V(Yi)/(x̄)² = n⁻² ∑_{i=1}^n (θxi)/(x̄)² = n⁻¹θx̄/(x̄)² = θ/(nx̄) = θ / ∑_{i=1}^n xi.

(c) Now, with y = (y1, y2, . . . , yn), we have

L(y; θ) = ∏_{i=1}^n [(θxi)^{yi} e^{−θxi} / yi!] = θ^{∑_{i=1}^n yi} (∏_{i=1}^n xi^{yi}) e^{−θ ∑_{i=1}^n xi} / ∏_{i=1}^n yi!,

so that

ln L(y; θ) = (∑_{i=1}^n yi) ln θ + ∑_{i=1}^n yi ln xi − θ ∑_{i=1}^n xi − ∑_{i=1}^n ln yi!.

So,

∂ ln L(y; θ)/∂θ = (∑_{i=1}^n yi)/θ − ∑_{i=1}^n xi = 0

gives

θ̂ml = ∑_{i=1}^n Yi / ∑_{i=1}^n xi = Ȳ/x̄ (= θ̂mm).

So,

E(θ̂ml) = E(θ̂mm) = θ and V(θ̂ml) = V(θ̂mm) = θ/(nx̄).

Note that one can use exponential family theory to show that θ̂ml (= θ̂mm) is the MVBUE of θ. In particular,

L(y; θ) = exp{θ̂ml (∑_{i=1}^n xi)(ln θ) − θ ∑_{i=1}^n xi + ln[∏_{i=1}^n xi^{yi} / ∏_{i=1}^n yi!]}.

(d) From part (c),

∂² ln L(y; θ)/∂θ² = −∑_{i=1}^n yi / θ²,

so that

−E_y[∂² ln L(y; θ)/∂θ²] = ∑_{i=1}^n E(Yi)/θ² = θ ∑_{i=1}^n xi / θ² = nx̄/θ.

Hence,

CRLB = 1/(nx̄/θ) = θ/(nx̄),

which is achieved by the estimators θ̂ml and θ̂mm (which are identical), but which is not achieved by θ̂uls.

Solution 4.3

(a) Since V(θ) = k2σ21 + (1 − k)2σ2

2 + 2k(1 − k)ρσ1σ2, it follows that

∂V(θ)

∂k= 2kσ2

1 − 2(1 − k)σ22 + 2(1 − 2k)ρσ1σ2 = 0.

Solving the above equation gives

k∗ = σ22 − ρσ1σ2

σ21 + σ2

2 − 2ρσ1σ2=

σ2σ1

− ρ

σ1σ2

+ σ2σ1

− 2ρ= (1 − ρλ)

(1 + λ2 − 2ρλ), k∗ > 0,

which minimizes V(θ).Interestingly, if ρ > λ, then k∗ > 1, so that the unbiased estimator θ2 gets negative

weight. And, when λ = 1, so that σ1 = σ2, k∗ = 12 , regardless of the value of ρ.

(b) In general, since σ2 = σ1/λ,

V(θ) = k2σ21 + (1 − k)2 σ2

1λ2 + 2k(1 − k)ρ

σ21λ

= σ21

[k2 + (1 − k)2 1

λ2 + 2k(1 − k)ρ

λ

].

So, after substituting k∗ for k in the above expression and doing some algebraicsimplification, we obtain

V(θ∗) = σ21

[1 − (λ − ρ)2

(1 − 2ρλ + λ2)

].

Thus, if λ = ρ, V(θ∗) < σ21.

For further discussion, see Samuel-Cahn (1994).

Solution 4.4. Since

E(β) =n∑

i=1

ciE(Xi) =n∑

i=1

ci(βai) = β

n∑

i=1

ciai,

we require that∑n

i=1 ciai = 1. Now,

V(β) =n∑

i=1

c2i V(Xi) =

n∑

i=1

c2i σ2

i .

Page 249: Exercises and Solutions in Biostatistical Theory (2010)

230 Estimation Theory

So, we need to minimize∑n

i=1 c2i σ2

i subject to the constraint∑n

i=1 ciai = 1. Althoughthe Lagrange Multiplier Method could be used, we will do this minimization directly.So,

V(β) =n∑

i=1

c2i σ2

i

=n−1∑

i=1

c2i σ2

i + (cnan)2

(σ2

n

a2n

)

=n−1∑

i=1

c2i σ2

i +⎛⎝1 −

n−1∑

i=1

ciai

⎞⎠

2 (σ2

n

a2n

).

So, for i = 1, 2, . . . , (n − 1),

0 = dV(β)

dci= 2ciσ

2i + 2

⎛⎝1 −

n−1∑

i=1

ciai

⎞⎠ (−ai)

(σ2

n

a2n

)

= 2ciσ2i + 2(cnan)(−ai)

(σ2

n

a2n

)

⇒ 0 = ciσ2i − aicnσ2

nan

⇒ 0 = aici − a2i cn

an

(σ2

n

σ2i

)

⇒ 0 =n∑

i=1

aici −n∑

i=1

a2i cn

an

(σ2

n

σ2i

)

⇒ 0 = 1 − cnσ2n

an

n∑

i=1

a2i

σ2i

,

so that

cn = (an/σ2n)∑n

i=1(a2i /σ2

i ).

Substituting this result into the above equations yields

ci = (ai/σ2i )

∑ni=1(ai/σ

2i )

, i = 1, 2, . . . , n.

For this choice of the ci’s,

β =n∑

i=1

[(ai/σ

2i )

∑ni=1(ai/σ

2i )

]Xi.

Page 250: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 231

Since β is a linear combination of mutually independent normal variates,

β ∼ N[E(β), V(β)],

with

E(β) = β

and with

V(β) =n∑

i=1

[(ai/σ

2i )

∑ni=1(ai/σ

2i )

]2

σ2i

=∑n

i=1(a2i /σ2

i )

[∑ni=1(a2

i /σ2i )]2

=⎡⎣

n∑

i=1

(a2i /σ2

i )

⎤⎦

−1

.

Solution 4.5

(a) Since

(ni − 1)S2i

σ2 ∼ χ2ni−1 = GAMMA

[α = 2, βi = (ni − 1)

2

],

we have

E

[(ni − 1)S2

iσ2

]= 2 · (ni − 1)

2,

so that(ni − 1)

σ2 E(S2i ) = (ni − 1),

and hence E(S2i ) = σ2, i = 1, 2, . . . , k.

Thus,

E(σ2) = E

⎡⎣

k∑

i=1

wiS2i

⎤⎦ =

k∑

i=1

wiE(S2i )

= σ2

⎛⎝

k∑

i=1

wi

⎞⎠ = σ2.

(b) Now, since S21, S2

2, . . . , S2k constitute a set of k mutually independent random

variables, we have

V(σ2) =k∑

i=1

w2i V(S2

i ).

Page 251: Exercises and Solutions in Biostatistical Theory (2010)

232 Estimation Theory

And, since

V

[(ni − 1)S2

iσ2

]= (ni − 1)2

σ4 V(S2i )

= (2)2 (ni − 1)

2= 2(ni − 1),

it follows that V(S2i ) = 2σ4/(ni − 1), so that

V(σ2) =k∑

i=1

w2i

2σ4

(ni − 1).

So,

V(σ2) ∝k−1∑

i=1

w2i (ni − 1)−1 +

⎛⎝1 −

k−1∑

i=1

wi

⎞⎠

2

(nk − 1)−1.

Thus,

∂V(σ2)

∂wi= 2wi

(ni − 1)− 2(1 −∑k−1

i=1 wi)

(nk − 1)= 0, i = 1, 2, . . . , (k − 1),

so that

wi(ni − 1)

− wk(nk − 1)

= 0, i = 1, 2, . . . , k.

Hence,

(nk − 1)

k∑

i=1

wi = wk

k∑

i=1

(ni − 1),

or

wk = (nk − 1)

(N − k).

And, since wi = [(ni − 1)/(nk − 1)]wk , we have, in general,

wi = (ni − 1)

(N − k)= (ni − 1)∑k

i=1(ni − 1), i = 1, 2, . . . , k.

Using these optimal choices for the weights w1, w2, . . . , wk , the estimator σ2 takesthe specific form

σ2 =k∑

i=1

[(ni − 1)∑ki=1(ni − 1)

S2i

]=∑k

i=1∑ni

j=1(Yij − Yi)2

(N − k),

Page 252: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 233

which is recognizable as a pooled variance estimator often encountered whenusing analysis of variance (ANOVA) methods.

Solution 4.6. The joint distribution of X1, X2, . . . , Xn is

pX1,X2,...,Xn(x1, x2, . . . , xn; π) =n∏

i=1

{Ck

xiπxi (1 − π)k−xi

}

=⎛⎝

n∏

i=1

Ckxi

⎞⎠π

∑ni=1 xi (1 − π)nk−∑n

i=1 xi .

Substituting θ = πk and u =∑ni=1 xi in the above expression, we have

pX1,X2,...,Xn (x1, x2, . . . , xn; θ) =[(θ1/k)u(1 − θ1/k)nk−u

]⎛⎝

n∏

i=1

Ckxi

⎞⎠ ,

which has the form g(u; θ) · h(x1, x2, . . . , xn), where h(x1, x2, . . . , xn) does not (in anyway) depend on θ. Hence, by the Factorization Theorem, U =∑n

i=1 Xi is sufficientfor θ. Note that U ∼ BIN(nk, π). To show that this binomial distribution represents acomplete family of distributions, let g(U) denote a generic function of U, and note that

E[g(U)] =nk∑

u=0

g(u)Cnku πu(1 − π)nk−u

= (1 − π)nknk∑

u=0

[g(u)Cnk

u

]( π

1 − π

)u.

Using this result and appealing to the theory of polynomials, we find that the condition

E[g(U)] = 0 ∀π, 0 < π < 1,

implies that g(u) = 0, u = 0, 1, . . . , nk. Hence, U is a complete sufficient statistic for θ.

Let U∗ ={

1 if X1 = k,0 otherwise.

Then, E(U∗) = πk . Thus, by the Rao–Blackwell Theorem,

θ = E(U∗|U = u) = pr(U∗ = 1|U = u) = pr(X1 = k|U = u)

is the MVUE of θ. Clearly, since U =∑ni=1 Xi, θ = 0 for u = 0, 1, . . . , (k − 1). So, for

u = k, (k + 1), . . . , nk,

θ = E(U∗|U = u

)

= pr(U∗ = 1|U = u)

Page 253: Exercises and Solutions in Biostatistical Theory (2010)

234 Estimation Theory

= pr(X1 = k|U = u)

= pr[(X1 = k) ∩ (U = u)]pr(U = u)

= pr(X1 = k) × pr(∑n

i=2 Xi = u − k)

pr(∑n

i=1 Xi = u)

= (πk) × Ck(n−1)u−k πu−k(1 − π)k(n−1)−(u−k)

Cnku πu(1 − π)nk−u

= Ck(n−1)u−k

Cnku

,

where the next-to-last line follows because∑n

i=2 Xi ∼ BIN[k(n − 1), π] and U ∼BIN(nk, π).

So, θ =

⎧⎪⎨⎪⎩

0, u = 0, 1, 2, . . . , (k − 1),Ck(n−1)

u−k

Cnku

, u = k, (k + 1), . . . , nk,

where u =∑ni=1 xi.

To demonstrate that E(θ) = θ, note that

E(θ) =nk∑

u=k

Ck(n−1)u−k

Cnku

Cnku πu(1 − π)nk−u

=nk∑

u=k

Ck(n−1)u−k πu(1 − π)nk−u

=k(n−1)∑

z=0

Ck(n−1)z πz+k(1 − π)nk−(z+k)

= πkk(n−1)∑

z=0

Ck(n−1)z πz(1 − π)k(n−1)−z

= πk[π + (1 − π)]k(n−1) = πk .

Solution 4.7

(a) Now,

L(y; σr) =n∏

i=1

{1√2πσ

e−y2i /2σ2

}= (2π)−n/2θ−n/r exp

{−∑n

i=1 y2i

2θ2/r

},

Page 254: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 235

so that U =∑ni=1 Y2

i is a sufficient statistic for θ = σr . Also,

Uσ2 =

n∑

i=1

(Yiσ

)2=

n∑

i=1

Z2i ∼ χ2

n = GAMMA(α = 2, β = n2),

since Zi ∼ N(0, 1) and the {Zi}ni=1 are mutually independent. So, E

(U/σ2

)= n, so

that E(U) = nσ2. So, we might consider some function of Ur/2. Thus,

E

[(Uσ2

)r/2]

=E(

Ur/2)

σr = Γ(n/2 + r/2)

Γ(n/2)2r/2

= Γ[(n + r)/2]Γ(n

2 )2r/2.

So,

θ = 2−r/2 Γ(n/2)

Γ[(n + r)/2]Ur/2

is a function of a sufficient statistic (namely, U) that is an unbiased estimator of θ.As a special case, when r = 2,

θ = 2−1 Γ(n/2)

Γ(n/2 + 1)U = 2−1

(2n

)U = U

n,

as expected.

(b) Since

L(y; θ) ≡ L = (2π)−n/2θ−n/r exp

{−∑n

i=1 y2i

2θ2/r

},

we have

ln L = −n2

ln(2π) − nr

ln θ − u2θ2/r , where u =

n∑

i=1

y2i .

So,

∂ ln L∂θ

= − nrθ

−(−2

r

)θ−2/r−1u

2= −n

rθ+ θ−2/r−1u

r,

and

∂2 ln L∂θ2 = n

rθ2 +(−2

r− 1)

θ−2/r−2ur

= nrθ2 − (2 + r)θ−2/r−2u

r2 .

Page 255: Exercises and Solutions in Biostatistical Theory (2010)

236 Estimation Theory

Thus,

−E

(∂2 ln L

∂θ2

)= −n

rθ2 + (2 + r)θ−2/r−2(nσ2)

r2

= −nrθ2 + (2 + r)θ−2/r−2nθ2/r

r2

= −nrθ2 + n(2 + r)

r2θ2 = 2nr2θ2 .

So, the CRLB is

CRLB = r2θ2

2n= r2σ2r

2n.

When r = 2, we obtain

CRLB = 4σ4

2n= 2σ4

n,

which is achieved by θ since

V(θ)

= V(

Un

)= V

(σ2

n· Uσ2

)= σ4

n2 V(χ2n) = σ4

n2 (2n) = 2σ4

n.

Solution 4.8

(a) First,

L(y; θ) =n∏

i=1

[θ−1/2e−yi/√

θ] = θ−n/2e−s/√

θ,

where s =∑ni=1 yi. Solving for θ in the equation

∂ ln L(y; θ)∂θ

= −n2θ

+ s2θ3/2 = 0,

yields the MLE θ = Y2. Now,

∂2 ln L(y; θ)∂θ2 = n

2θ2 − 3s4θ5/2 ,

so that

−Ey

[∂2 ln L(y; θ)

∂θ2

]= −n

2θ2 + 3(nθ1/2)

4θ5/2 = −n2θ2 + 3n

4θ2 = n4θ2 .

Page 256: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 237

So, an appropriate large-sample 95% CI for θ is

Y2 ± 1.96

√4θ2

n,

or, equivalently,

Y2 ± 3.92Y2√

n.

When s = 40 and n = 50, this CI is

(4050

)2±

(3.92)(

4050

)2

√50

or (0.285, 0.995).

(b) Clearly, S =∑ni=1 Yi is a sufficient statistic for θ = α2. And, E(S) = n

√θ and V(S) =

nθ. Since E(S2) = nθ + n2θ = n(n + 1)θ, it follows that

S2

n(n + 1)= θ∗

is the MVUE of θ (because S is a complete sufficient statistic for θ from exponentialfamily theory). Now,

V(θ∗) = V(S2)

n2(n + 1)2 = E(S4) − [E(S2)]2n2(n + 1)2 .

Since S ∼ GAMMA(α = √θ, β = n), it follows that

E(Sr) = Γ(n + r)Γ(n)

αr = Γ(n + r)Γ(n)

θr/2, r ≥ 0.

So,

E(S4) = Γ(n + 4)

Γ(n)θ2 = n(n + 1)(n + 2)(n + 3)θ2.

So,

V(θ∗) = n(n + 1)(n + 2)(n + 3)θ2 − n2(n + 1)2θ2

n2(n + 1)2

= θ2

n(n + 1)[n2 + 5n + 6 − n2 − n]

= θ2

n(n + 1)(4n + 6)

= 2(2n + 3)

n(n + 1)θ2.

Page 257: Exercises and Solutions in Biostatistical Theory (2010)

238 Estimation Theory

(c) From part (a),

−Ey

[∂2 ln L(y; θ)

∂θ2

]= n

4θ2 ,

so that the CRLB is

CRLB = 4θ2

n.

However,

V(θ∗) = 2(2n + 3)θ2

n(n + 1)=[

2n + 32n + 2

][4θ2

n

]>

4θ2

n,

so θ∗ does not achieve the CRLB.

(d) In general,

MSE(θ∗, θ) = V(θ∗) + [E(θ∗) − θ]2= V(θ∗) + 0

= 2(2n + 3)

n(n + 1)θ2.

Since θ = Y2,

E(

Y2)

= V(Y)+ [E (Y)]2 = θ

n+ θ.

And,

V(

Y2)

= V(S2)

n4

= n(n + 1)(n + 2)(n + 3)θ2 − n2(n + 1)2θ2

n4

= (n + 1)θ2

n3 [n2 + 5n + 6 − n2 − n]

= 2(n + 1)(2n + 3)θ2

n3 .

So,

MSE(θ, θ) = 2(n + 1)(2n + 3)θ2

n3 + θ2

n2

= θ2

n3 [2(n + 1)(2n + 3) + n]

= (4n2 + 11n + 6)

n3 θ2.

Page 258: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 239

Since

θ∗ = S2

n(n + 1)=(

nn + 1

)θ,

we have

MSE(θ∗, θ) = V(θ∗) =(

nn + 1

)2V(θ) < V(θ) < MSE(θ, θ),

for all finite n, so that θ∗ is preferable to θ for finite n. However,

limn→∞

[MSE(θ∗, θ)

MSE(θ, θ)

]= 1,

so that there is no difference asymptotically.

Solution 4.9

(a) Let C be the event that a subject is classified as having been recently exposed tobenzene, and let E be the event that a subject has truly been recently exposed tobenzene. Then, pr(C) = pr(C|E)pr(E) + pr(C|E)pr(E), so that pr(C) = γπ + δ(1 −π). Since X has a binomial distribution with mean E(X) = n[pr(C)], equating X toE(X) via the method of moments gives

π =Xn − δ

γ − δ,

as the unbiased estimator of π.Since V(X) = n[pr(C)][1 − pr(C)], the variance of the estimator π is

V(π) = V (X/n)

(γ − δ)2

= [γπ + δ(1 − π)][1 − γπ − δ(1 − π)]n(γ − δ)2 .

(b) Since n is large, the standardized random variable (π − π)/

√V(π) ∼ N(0, 1) by

Slutsky’s Theorem, where

V(π) = [γπ + δ(1 − π)][1 − γπ − δ(1 − π)]n(γ − δ)2 .

Thus, an appropriate large-sample 95% CI for π is

π ± 1.96√

V(π).

When n = 50, δ = 0.05, γ = 0.90, and x = 20, the computed 95% interval for π is0.412 ± 1.96(0.0815) = (0.252, 0.572).

Page 259: Exercises and Solutions in Biostatistical Theory (2010)

240 Estimation Theory

Solution 4.10

(a) Since

(Y11, Y00, n − Y11 − Y00)

∼ MULT{

n; (π2 + θ), [(1 − π)2 + θ], 2[π(1 − π) − θ]}

,

it follows directly that

(π2 + θ) = Y11n

and [(1 − π)2 + θ] = Y00n

.

Solving these two equations simultaneously gives the desired expressions for π

and θ.

(b) Appealing to properties of the multinomial distribution, we have

E(π) = 12

+ [E(Y11) − E(Y00)]2n

= 12

+ n(π2 + θ) − n[(1 − π)2 + θ]2n

= 12

+ 12(π2 − 1 + 2π − π2) = π,

so that π is an unbiased estimator of the parameter π.And, with β11 = (π2 + θ) and β00 = [(1 − π)2 + θ], it follows that

V(π) = (4n2)−1[V(Y11) + V(Y00) − 2cov(Y11, Y00)]= (4n2)−1[nβ11(1 − β11) + nβ00(1 − β00) − 2nβ11β00]= (4n)−1[β11(1 − β11) + β00(1 − β00) − 2β11β00].

(c) Since β11 = Y11/n and β00 = Y00/n, it follows that the estimator V(π) of V(π) isequal to

V(π) = (4n)−1[

Y11n

(1 − Y11

n

)+ Y00

n

(1 − Y00

n

)− 2

(Y11n

)(Y00

n

)].

When n = 30, y11 = 3, and y00 = 15, then the estimated value π of π is equal to

π = 12

+ (3 − 15)

30= 0.50 − 0.40 = 0.10.

And, the estimated variance of π is equal to

V(π) = [4(30)]−1[(

330

)(2730

)+(

1530

)(1530

)− 2

(330

)(1530

)]

= 0.0020.

Page 260: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 241

Thus, the computed 95% CI for π is equal to

π ± 1.96√

V(π) = 0.10 ± 1.96√

0.0020 = 0.10 ± 0.0877,

or (0.0123, 0.1877).

Solution 4.11

(a) With Y = g(X), where X ′ = (X1, X2, . . . , Xk) and μ′ = (μ1, μ2, . . . , μk), and withE(Xi) = μi, V(Xi) = σ2

i , and cov(Xi, Xj) = σij for all i = j, i = 1, 2, . . . , k and j =1, 2, . . . , k, then the delta method gives E(Y) ≈ g(μ) and

V(Y) ≈k∑

i=1

[∂g(μ)

∂Xi

]2σ2

i + 2k−1∑

i=1

k∑

j=i+1

[∂g(μ)

∂Xi

][∂g(μ)

∂Xj

]σij,

where∂g(μ)

∂Xi= ∂g(X)

∂Xi∣∣X=μ

.

In our particular situation, k = 2; and, with X1 ≡ Y10 and X2 ≡ Y01, then Y =g(X1, X2) = ln(X1/X2) = ln OR = ln X1 − ln X2. So,

∂g(X1, X2)

∂X1= 1

X1and

∂g(X1, X2)

∂X2= − 1

X2.

Now, E(X1) = nπ10, E(X2) = nπ01, V(X1) = nπ10(1 − π10), and V(X2) = nπ01(1 −π01). Also, cov(X1, X2) = −nπ10π01. Finally,

V(ln OR) ≈(

1nπ10

)2nπ10(1 − π10) +

( −1nπ01

)2nπ01(1 − π01)

+ 2(

1nπ10

)( −1nπ01

)(−nπ10π01)

= (1 − π10)

nπ10+ (1 − π01)

nπ01+ 2

n

= 1nπ10

− 1n

+ 1nπ01

− 1n

+ 2n

= 1nπ10

+ 1nπ01

.

Since E(Y10) = nπ10 and E(Y01) = nπ01, we have

V(ln OR) ≈ 1Y10

+ 1Y01

.

Page 261: Exercises and Solutions in Biostatistical Theory (2010)

242 Estimation Theory

For the given set of data, the estimate of the variance of ln OR is

V(ln OR) ≈ 125

+ 115

= 0.107.

(b) Assume Z ∼ N(0, 1). Then,

0.95 = pr{−1.96 < Z < +1.96}

≈ pr

⎧⎪⎨⎪⎩

−1.96 <ln OR − ln OR√

V(ln OR)

< 1.96

⎫⎪⎬⎪⎭

= pr{

ln OR − 1.96√

V(ln OR) < ln OR

< ln OR + 1.96√

V(ln OR)

}

= pr

{(OR)e−1.96

√V(ln OR)

< OR < (OR)e+1.96√

V(ln OR).

}

So, the 95% CI for OR is[(OR)e−1.96

√V(ln OR), (OR)e+1.96

√V(ln OR)

].

For the data in part (a), we obtain

[(2515

)e−1.96

√0.107,

(2515

)e+1.96

√0.107

]= (0.878, 3.164).

Solution 4.12

(a) The appropriate likelihood function L is

L =1∏

i=0

ni∏

j=1

⎡⎢⎣

(Lijλi

)yije−Lijλi

yij!

⎤⎥⎦ ,

so that ln L can be written as

ln L =1∑

i=0

⎡⎣

ni∑

j=1

yij ln Lij +⎛⎝

ni∑

j=1

yij

⎞⎠ ln λi − λi

ni∑

j=1

Lij −ni∑

j=1

yij!⎤⎦ .

So, for i = 0, 1,

∂ ln L∂λi

=∑n1

j=1 yij

λi−

ni∑

j=1

Lij = 0

Page 262: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 243

gives

λi =∑ni

j=1 Yij∑ni

j=1 Lij

as the MLE of λi.Also, for i = 0, 1,

∂2 ln L∂λ2

i

=−∑n1

j=1 yij

λ2i

,

so that, with E(Yij) = Lijλi and ∂2 ln L/∂λ1∂λ2 = 0, we have

V(λi) ={

−E

(∂2 ln L

∂λ2i

)}−1

= λi∑nij=1 Lij

.

Now, by the invariance principle, the MLE ln ψ of ln ψ is

ln ψ = ln λ1 − ln λ0.

And, using the delta method, we have

V(ln ψ) = V(ln λ1) + V(ln λ0)

≈(

1λ1

)2V(λ1) +

(1λ0

)2V(λ0)

= 1

λ1∑n1

j=1 L1j+ 1

λ0∑n0

j=1 L0j.

Hence, from ML theory, the random variable

ln ψ − ln ψ√V(ln ψ)

= ln ψ − ln ψ(

1∑n1j=1 y1j

+ 1∑n0j=1 y0j

)1/2 ∼ N(0, 1) for large samples,

so that a ML-based large-sample 100(1 − α)% CI for ln ψ is

ln ψ ± Z1−α/2

√V(ln ψ) = (ln λ1 − ln λ0)

± Z1−α/2

⎛⎝ 1∑n1

j=1 y1j+ 1∑n0

j=1 y0j

⎞⎠

1/2

,

where pr(Z > Z1−α/2) = α/2 when Z ∼ N(0, 1).

Page 263: Exercises and Solutions in Biostatistical Theory (2010)

244 Estimation Theory

(b) Based on the CI for ln ψ developed in part (a), an appropriate ML-based large-sample 100(1 − α)% CI for the rate ratio ψ is

(ψ)exp

⎡⎢⎣±Z1−α/2

⎛⎝ 1∑n1

j=1 y1j+ 1∑n0

j=1 y0j

⎞⎠

1/2⎤⎥⎦ .

For the given data, the computed 95% CI for ψ is

(40/35035/400

)exp

[±1.96

(1

40+ 1

35

)1/2]

= (1.306)e±0.454,

or (0.829, 2.056).Since the number 1 is contained in this 95% CI, these data provide no evidence

in favor of the proposed theory. Of course, there could be several reasons whythere were no significant findings. In particular, important individual-specific riskfactors for skin cancer and related skin conditions were not considered, some ofthese important risk factors being skin color (i.e., having fair skin), having a familyhistory of skin cancer, having had a previous skin cancer, being older, being male,and so on.

Solution 4.13

(a) The likelihood function L(t1, t2, . . . , tn) ≡ L is

L =n∏

i=1

fT(ti; θ) =n∏

i=1

[θe−θti

]= θne−θ

∑ni=1 ti .

So,

ln L = n ln θ − θ

n∑

i=1

ti,

∂L∂θ

= nθ

−n∑

i=1

ti,

and

∂2L∂θ2 = −n

θ2 .

Thus, the large-sample variance of θ is

V(θ) =[−E

(∂2 ln L

∂θ2

)]−1

= θ2

n.

Page 264: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 245

(b) Now,

pr(Ti > t∗) =∫∞

t∗θe−θti dti =

[−e−θti

]∞t∗

= e−θt∗ .

So, the likelihood function L∗(y1, y2, . . . , yn) ≡ L∗ is

L∗ =n∏

i=1

{(e−θt∗

)yi(

1 − e−θt∗)1−yi

}

= e−θt∗∑n

i=1 yi (1 − e−θt∗)n−∑ni=1 yi .

So,

ln L∗ = −θt∗ny + n(1 − y) ln(1 − e−θt∗),

and

∂ ln L∗∂θ

= −t∗ny + n(1 − y)t∗e−θt∗

(1 − e−θt∗).

So,

∂ ln L∗∂θ

= 0 ⇒ n(1 − y)t∗e−θt∗ = nt∗y(1 − e−θt∗)

⇒ (1 − y)e−θt∗ = y(1 − e−θt∗)

⇒ e−θt∗ = y

⇒ θ∗ = − ln yt∗ = 1

t∗ ln(

1y

).

(c) Now,

∂2 ln L∗∂θ2 = nt∗(1 − y)

[−t∗e−θt∗(1 − e−θt∗) − e−θt∗(t∗e−θt∗)

(1 − e−θt∗)2

]

= −nt∗(1 − y)

(1 − e−θt∗)2 (t∗e−θt∗),

so that

−E

(∂2 ln L∗

∂θ2

)= n(t∗)2e−θt∗E(1 − Y)

(1 − e−θt∗)2

= n(t∗)2e−θt∗

(1 − e−θt∗)2 (1 − e−θt∗)

= n(t∗)2

(eθt∗ − 1).

Page 265: Exercises and Solutions in Biostatistical Theory (2010)

246 Estimation Theory

So, the large-sample variance of θ∗ is (eθt∗ − 1)/n(t∗)2.Hence, with t∗ ≥ E(T) = θ−1, we have

V(θ)

V(θ∗)= θ2/n

(eθt∗ − 1)/n(t∗)2 = θ2(t∗)2

(eθt∗ − 1)< 1,

so that θ is preferred based solely on large-sample variance considerations.This finding reflects the fact that we have lost information by categorizing{T1, T2, . . . , Tn} into dichotomous data {Y1, Y2, . . . , Yn}. However, if the remissiontimes are measured with error, then θ∗ would be preferred to θ on validity grounds;in other words, if the remission times are measured with error, then θ would be anasymptotically biased estimator of the unknown parameter θ.

Solution 4.14

(a) The parameter of interest is

θ = pr(Y = 0) = e−λ,

so that

λ = −ln θ.

Now, with y = (y1, y2, . . . , yn),

L(y; θ) =n∏

i=1

{(−ln θ)yiθ

yi!}

= θn(−ln θ)s∏n

i=1 yi!,

where s =∑ni=1 yi. So,

ln L(y; θ) = n ln θ + s ln(−ln θ) −n∑

i=1

ln(yi!);

∂ ln L(y; θ)∂θ

= nθ

+ sθ ln θ

;

∂2 ln L(y; θ)∂θ2 = −n

θ2 − s(ln θ + 1)

(θ ln θ)2 .

So, since S ∼ POI(nλ),

−E

[∂2 ln L(y; θ)

∂θ2

]= n

θ2 + (−n ln θ)(ln θ + 1)

(θ ln θ)2

= nθ2 − n

θ2 − nθ2 ln θ

= −nθ2 ln θ

.

Page 266: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 247

So, the CRLB is

CRLB = θ2(−ln θ)

n= e−2λλ

n= λ

ne2λ.

Consider the estimator

θ =(

n − 1n

)S.

Since θ is an unbiased estimator of θ and is a function of a complete sufficientstatistic for θ, it is the MVUE of θ. Since

V(θ) = (eλ/n − 1)

e2λ>

λ

ne2λ,

there is no unbiased estimator that attains the CRLB for all finite values of n.

(b) Note that

pr(Xi = 0) = pr(Yi = 0) = e−λ,

and that

pr(Xi = 1) = pr(Yi ≥ 1) = 1 − pr(Yi = 0) = 1 − e−λ.

So, with x = (x1, x2, . . . , xn),

pXi(xi; λ) = (1 − e−λ)xi (e−λ)1−xi , xi = 0, 1.

Thus,

L(x; λ) =n∏

i=1

{(1 − e−λ)xi e−λ(1−xi)

}

= (1 − e−λ)nxe−nλ(1−x),

where

x = n−1n∑

i=1

xi.

So,

ln L(x; λ) = nx ln(1 − e−λ) − nλ(1 − x).

The equation

∂ ln L(x; λ)

∂λ= nxe−λ

(1 − e−λ)− n(1 − x) = 0

⇒ nxe−λ − n(1 − x)(1 − e−λ) = 0

⇒ xe−λ − 1 + e−λ + x − xe−λ = 0

Page 267: Exercises and Solutions in Biostatistical Theory (2010)

248 Estimation Theory

⇒ e−λ = (1 − x) ⇒ −λ = ln(1 − x)

⇒ λ∗ = − ln(1 − x).

This result also follows because X is the MLE of pr(Xi = 1) = (1 − e−λ).And,

∂2 ln L(x; λ)

∂λ2 = nx

[−e−λ(1 − e−λ) − e−λe−λ

(1 − e−λ)2

]

= −nxe−λ

(1 − e−λ)2 .

So,

−E

[∂2 ln L(x; λ)

∂λ2

]= ne−λE(X)

(1 − e−λ)2

= ne−λ(1 − e−λ)

(1 − e−λ)2

= ne−λ

(1 − e−λ).

Thus, for large n,

V(λ∗) = (1 − e−λ)

ne−λ= (eλ − 1)

n.

(c) There are two scenarios to consider:Scenario 1: Assume that Y1, Y2, . . . , Yn are accurate. Then, λ = Y is the MLE (and

MVBUE) of λ, with E(λ) = λ and V(λ) = λ/n. Since, for large n, λ∗ is essentiallyunbiased, a comparison of variances is appropriate. Now,

EFF(λ∗, λ) = λ/n(eλ − 1)/n

= λ

(eλ − 1)= λ

λ +∑∞j=2

λj

j!< 1,

so that λ∗ always has a larger variance than λ (which is an expected result sincewe are losing information by categorizing Yi into the dichotomous variable Xi).In fact,

limλ→∞ EFF(λ∗, λ) = 0,

so the loss in efficiency gets worse as λ gets larger (and this loss of information isnot affected by increasing n).

Scenario 2: Assume that Y1, Y2, . . . , Yn are inaccurate. In this case, using λ = Y toestimate λ could lead to a severe bias problem. Assuming that X1, X2, . . . , Xn areaccurate, then λ∗ is essentially unbiased for large n and so would be the preferred

Page 268: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 249

estimator. Since validity takes preference over precision, λ∗ would be preferred toλ when {Y1, Y2, . . . , Yn} are inaccurate but {X1, X2, . . . , Xn} are correct.

Solution 4.15

(a) For the assumed statistical model, and with y = (y0, y1, . . . , yn), the correspondinglikelihood function L(y; θ) ≡ L is

L = pY0(y0; θ)

n−1∏

j=0

pYj+1(yj+1|Yk = yk , k = 0, 1, . . . , j; θ)

= pY0(y0; θ)

n−1∏

j=0

pYj+1(yj+1|Yj = yj; θ)

=(

θy0 e−θ

y0!

) n−1∏

j=0

(θyj)yj+1 eθyj

yj+1! .

Thus,

ln(L) ∼⎛⎝

n∑

j=0

yj

⎞⎠ ln(θ) − θ

⎛⎝1 +

n−1∑

j=0

yj

⎞⎠ ,

so that the equation

∂ ln(L)

∂θ= θ−1

n∑

j=0

yj −⎛⎝1 +

n−1∑

j=0

yj

⎞⎠ = 0

gives

θ =∑n

j=0 Yj

1 +∑n−1j=0 Yj

as the MLE of θ.

(b) Now,

∂2 ln(L)

∂θ2 =−∑n

j=0 yj

θ2 , so that − E

(∂2 ln(L)

∂θ2

)=∑n

j=0 E(Yj)

θ2 .

And, E(Y0) = θ, E(Y1) =Ey0 [E(Y1|Y0 = y0)] = Ey0(θy0) = θ2, E(Y2) = Ey1 [E(Y2|Y1= y1)] = Ey1(θy1) = θ3 and so on, so that, in general, E(Yj) = θ(j+1), j = 0, 1, . . . , n.

Finally,

−E

(∂2 ln(L)

∂θ2

)=∑n

j=0 θ(j+1)

θ2 = θ−1n∑

j=0

θj = θ−1

(1 − θ(n+1)

1 − θ

).

Page 269: Exercises and Solutions in Biostatistical Theory (2010)

250 Estimation Theory

So, for large n, V(θ).= [θ(1 − θ)]/[1 − θ(n+1)], and a ML-based 95% CI for θ is

θ ± 1.96√

V(θ) = θ ± 1.96

√θ(1 − θ)

1 − θ(n+1).

When n = 25 and θ = 1.20, the computed 95% CI for θ is (1.11, 1.29).

Solution 4.16. Given the stated assumptions, the appropriate CI for (μt − μc) using(X − Y) is:

(X − Y) ± 1.96

√σ2

tnt

+ σ2c

nc,

where X = n−1t∑nt

i=1 Xi and Y = n−1c∑nc

i=1 Yi.The optimal choices for nt and nc, subject to the constraint (nt + nc) = N, would

minimize the width of the above CI.So, we want to minimize the function

(σ2

t /nt + σ2t /nc

)subject to the constraint

(nt + nc) = N, or, equivalently, we want to minimize the function

Q = σ2t

nt+ σ2

c(N − nt)

,

with respect to nt.So,

dQdnt

= −σ2t

n2t

+ σ2c

(N − nt)2 = 0

⇒ (σ2t − σ2

c )n2t − 2Nσ2

t nt + N2σ2t = 0.

So, via the quadratic formula, the two roots of the above quadratic equation are

2Nσ2t ±

√4N2σ4

t − 4(σ2t − σ2

c )N2σ2t

2(σ2t − σ2

c )= N

[σt(σt ± σc)

(σt + σc)(σt − σc)

].

If the positive sign is used, the possible answer is Nσt/(σt − σc), which cannot becorrect. If the negative sign is used, the answer is

nt = N(

σtσt + σc

),

so that

nc = N(

σc

σt + σc

).

This choice for nt minimizes Q since dQ2

dn2t∣∣nt= Nσt

(σt+σc)

> 0.

Page 270: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 251

When N = 100, σ2t = 4, and σ2

c = 9, then nt = 40 and nc = 60. Note that theseanswers make sense, since more data are required from the more variable population.

Solution 4.17. Consider the random variable (Y − Yn+1), which is a linear combinationof independent N(μ, σ2) variates. Since

E(Y − Yn+1) = E(Y) − E(Yn+1) = μ − μ = 0,

and since

V(Y − Yn+1) = V(Y) + V(Yn+1) = σ2

n+ σ2 =

(n + 1

n

)σ2,

it follows that

(Y − Yn+1) ∼ N[

0,(

n + 1n

)σ2]

.

Hence,

(Y − Yn+1)√(n + 1

n

)σ2

∼ N(0, 1).

Also, we know that

(n − 1)S2

σ2 ∼ χ2n−1.

So, ⎡⎢⎢⎣

(Y − Yn+1)√(n + 1

n

)σ2

⎤⎥⎥⎦

√(n − 1)S2

σ2

/(n − 1)

= (Y − Yn+1)

S

√(n + 1)

n

∼ tn−1,

since (Y − Yn+1) and S2 are independent random variables. So,

(1 − α) = pr

⎧⎪⎪⎨⎪⎪⎩

−tn−1,1−α/2 <(Y − Yn+1)

S

√(n + 1)

n

< tn−1,1−α/2

⎫⎪⎪⎬⎪⎪⎭

= pr

{Y − tn−1,1−α/2S

√(n + 1)

n< Yn+1 < Y + tn−1,1−α/2S

√(n + 1)

n

}.

Thus,

L = Y − tn−1,1−α/2S

√(n + 1)

n

Page 271: Exercises and Solutions in Biostatistical Theory (2010)

252 Estimation Theory

and

U = Y + tn−1,1−α/2S

√(n + 1)

n.

For the given data, the realized values of Y and S2 are y = 3 and s2 = 2.50, so thatthe computed 95% prediction interval for the random variable Y6 is

y ± tn−1,1−α/2s

√(n + 1)

n= 3 ± t0.975,4

√2.50

√65

= 3 ± 2.776√

3

= (−1.8082, 7.8082).

Solution 4.18

(a) We know that

U = (n − 1)S2

σ2 ∼ χ2n−1 = GAMMA

(α = 2, β = n − 1

2

).

If Y ∼ GAMMA(α, β), then

E(Yr) =∫∞

0yr yβ−1e−y/α

Γ(β)αβdy = Γ(β + r)

Γ(β)αr , (β + r) > 0.

So,

E(Ur) = Γ [(n − 1)/2 + r]Γ [(n − 1)/2]

2r .

So,

E[U1/2

]= E

⎡⎣√

(n − 1)S2

σ2

⎤⎦ =

√n − 1σ

E(S)

= Γ[(n − 1)/2 + 1/2]Γ[(n − 1)/2] 21/2 = Γ(n/2)

Γ[(n − 1)/2]√

2

⇒ E(S) = Γ(n/2)

Γ[(n − 1)/2]

√2

(n − 1)σ

⇒ E(W) = 2tn−1,1−α/2E(S)√

n= 23/2tn−1,1−α/2

Γ(n/2)

Γ[(n − 1)/2]σ√

n(n − 1).

If α = 0.05, n = 4, and σ2 = 4, then

E(W) = 23/2t3,.975Γ(2)

Γ(3/2)

2√4(4 − 1)

= 23/2(3.182)1

(√

π/2)

1√3

= 5.8633.

Page 272: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 253

(b)

(1 − γ) ≤ pr{

2tn∗−1,1− α2

S√n∗ ≤ δ

}= pr

{4f1,n∗−1,1−α

S2

n∗ ≤ δ2

}

= pr

{S2 ≤ n∗δ2

4f1,n∗−1,1−α

}

= pr

{(n∗ − 1)S2

σ2 ≤ n∗(n∗ − 1)δ2

4σ2f1,n∗−1,1−α

}

= pr

{χ2

n∗−1 ≤ n∗(n∗ − 1)δ2

4σ2f1,n∗−1,1−α

}.

So, we require

n∗(n∗ − 1)δ2

4σ2f1,n∗−1,1−α

≥ χ2n∗−1,1−γ,

or

n∗(n∗ − 1) ≥(

δ

)2χ2

n∗−1,1−γ f1,n∗−1,1−α.

For further details, see Kupper and Hafner (1989).

Solution 4.19

(a) Let θ = 2Y1 − 3Y2 + Y3; so, E(θ) = θ, and

V(θ) = 4

(σ2

n1

)+ 9

(σ2

n2

)+(

σ2

n3

)= σ2

(4

n1+ 9

n2+ 1

n3

).

So,

Z = θ − E(θ)√V(θ)

=(2Y1 − 3Y2 + Y3

)− (2μ1 − 3μ2 + μ3)

σ√

4/n1 + 9/n2 + 1/n3∼ N(0, 1).

Now, (ni − 1)S2i /σ2 ∼ χ2

ni−1, i = 1, 2, 3, and the S2i ’s are mutually independent

random variables. Thus, by the additivity property of mutually independentgamma random variables,

U = (n1 − 1)S21 + (n2 − 1)S2

2 + (n3 − 1)S23

σ2 ∼ χ2(n1+n2+n3−3)

and

E(U) = E

[ ∑3i=1(ni − 1)S2

i(n1 + n2 + n3 − 3)

]= σ2,

Page 273: Exercises and Solutions in Biostatistical Theory (2010)

254 Estimation Theory

where∑3

i=1(ni − 1)S2i /(n1 + n2 + n3 − 3) is called a “pooled estimator”

of σ2.So, noting that the numerators and denominators in each of the following

expressions are independent, we have

T(n1+n2+n3−3) = Z√U/(n1 + n2 + n3 − 3)

= (θ − E(θ))/

√V(θ)√∑3

i=1(ni−1)S2i

σ2

/(n1 + n2 + n3 − 3)

=(2Y1 − 3Y2 + Y3

)− θ√ ∑3i=1(ni − 1)S2

i(n1 + n2 + n3 − 3)

√4

n1+ 9

n2+ 1

n3

∼ t( 3∑i=1

ni−3

).

(b) Let θ = 2Y1 − 3Y2 + Y3 and S2p =∑3

i=1(ni − 1)S2i /(n1 + n2 + n3 − 3). From part (a),

θ − θ

Sp√

4/n1 + 9/n2 + 1/n3∼ t(n1+n2+n3−3).

So,

(1 − α) = pr

{−t(∑3

i=1 ni−3, 1−α/2) <

θ − θ

Sp√

4/n1 + 9/n2 + 1/n3

< t(∑3i=1 ni−3, 1−α/2

)}

,

and hence an exact 100(1 − α)% CI for θ is

θ ± t(∑3i=1 ni−3, 1−α/2

)Sp

√4

n1+ 9

n2+ 1

n3.

For these data, we have:

[2(80) − 3(75) + 70] ± t9,0.975

√3(4 + 3 + 5)

9

√44

+ 94

+ 14

= 5 ± 8.46,

or (−3.46, 13.46).

(c) An exact 100(1 − α)% CI for σ21/σ2

2 is

(1 − α) = pr

{S2

1/S22

fn1−1, n2−1, 1−α/2<

σ21

σ22

<S2

1/S22

1/fn2−1, n1−1, 1−α/2

}.

Page 274: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 255

Now, f49, 49, 0.975 = 1.76. So,

lower limit = 7/21.76

= 1.99,

and

upper limit =(

72

)(1.76) = 6.16;

thus, our 95% CI for σ21/σ2

2 is (1.99, 6.16). Note that the value 1 is not included inthis interval, suggesting variance heterogeneity.

(d) Consider the statistic

θ − θ√4S2

1/n1 + 9S22/n2 + S2

3/n3

=[

4σ21/n1 + 9σ2

2/n2 + σ23/n3

4S21/n1 + 9S2

2/n2 + S23/n3

]1/2

×⎡⎢⎣ θ − θ√

4σ21/n1 + 9σ2

2/n2 + σ23/n3

⎤⎥⎦ .

The expression in the first set of brackets converges to 1, since S2i is consistent

for σ2i , i = 1, 2, 3, while the expression in the second set of brackets converges in

distribution to N(0,1) by the Central Limit Theorem. So, by Slutsky’s Theorem,

θ − θ√4S2

1/n1 + 9S22/n2 + S2

3/n3

∼ N(0, 1) for large n1, n2, n3.

Thus, an approximate large-sample 95% CI for θ is

θ ± 1.96

√4S2

1n1

+ 9S22

n2+ S2

3n3

.

For the data in part (c), θ = 2(85) − 3(82) + 79 = 3. So, our large-sample CI is

3 ± 1.96

√4(7)

50+ 9(2)

50+ 6

50= 3 ± 2.00 or (1.00, 5.00).

The advantage of selecting large random samples from each of the three pop-ulations is that the assumptions of exactly normally distributed populations andhomogenous variance across populations can both be relaxed.

Solution 4.20

(a) Since Xi/√

θ ∼ N(0, 1), then L/θ ∼ χ2n1

, or equivalently GAMMA (2, n1/2), sinceX1, X2, . . . , Xn1 constitute a set of mutually independent random variables. If

Page 275: Exercises and Solutions in Biostatistical Theory (2010)

256 Estimation Theory

U ∼ GAMMA(α, β), then E(Ur) = [Γ(β + r)/Γ(β)]αr , (β + r) > 0. Thus, for r = 12 ,

we have

E

(√Lθ

)= θ−1/2E(

√L) = Γ (n1/2 + 1/2)

Γ (n1/2)21/2,

so that

E(√

L) = Γ [(n1 + 1)/2]Γ (n1/2)

√2θ.

(b) The random variable

Fn1,n2 =∑n1

i=1

(Xi/

√θ)2

/n1

∑n2i=1

(√θYi

)2/n2

= θ−2(

n2n1

)(∑n1i=1 X2

i∑n2i=1 Y2

i

)∼ fn1,n2 .

So,

(1 − α) = pr(fn1,n2,α/2 < Fn1,n2 < fn1,n2,1−α/2

)

= pr(

f −1n1,n2,1−α/2 < F−1

n1,n2< fn2,n1,1−α/2

)= pr(L < θ < U),

where

L =(

n2n1

)1/2(∑n1

i=1 X2i∑n2

i=1 Y2i

)1/2

f −1/2n1,n2,1−α/2

and

U =(

n2n1

)1/2(∑n1

i=1 X2i∑n2

i=1 Y2i

)1/2

f 1/2n2,n1,1−α/2.

For the available data, since f8,5,0.975 = 6.76 and f5,8,0.975 = 4.82, the computedexact 95% CI for θ is (0.430,2.455).

Solution 4.21

(a) The best point estimator of θ is D = n−1∑ni=1 Di. Since E(Di) =E(YTi − YPi) =

(μT − μP) = θ, and V(Di) = V(YTi) + V(YPi) − 2ρ√

V(YTi)V(YPi) = (σ2T + σ2

P −2ρσTσP), it follows that E(D) = θ and V(D) = (σ2

T + σ2P − 2ρσTσP)/n. Since

[D − E(D)]/√

V(D) ∼N(0, 1), it follows that

pr

[−z1−α/2 <

D − E(D)√V(D)

< z1−α/2

]= (1 − α) = pr(L < θ < U)

where

L = D − z1−α/2

√V(D) and U = D + z1−α/2

√V(D).

Page 276: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 257

Given the available data, the realized value of L is 0.02, and the realized value ofU is 1.98, so that the computed 95% CI for θ is (0.02, 0.98). This computed 95% CIdoes not include the value zero, indicating that there is statistical evidence thatθ = 0 (or, equivalently, that μT = μP).

(b) Now,

pr(L > 0|θ = 1.0) = pr[

D − z0.975

√V(D) > 0|θ = 1.0

]

= pr

[D − 1.0√

V(D)>

1.96√

V(D) − 1.0√V(D)

]

= pr

[Z > 1.96 − 1.0√

V(D)

],

where Z ∼ N(0, 1).So, to achieve pr(L > 0|θ = 1.0) ≥ 0.95, we require 1.96 − 1.0/

√V(D) ≤ −1.645, or,

equivalently,

1.0√V(D)

= 1.0√

n√σ2

T + σ2P − 2ρσTσP

≥ (1.96 + 1.645),

which gives n∗ = 46.

Solution 4.22

(a)

E(Ui) = E(Xi + Yi) = E(Xi) + E(Yi) = (μx + μy),

E(Vi) = E(Xi − Yi) = E(Xi) − E(Yi) = (μx − μy),

V(Ui) = V(Xi + Yi) = V(Xi) + V(Yi) + 2cov(Xi, Yi)

= σ2 + σ2 + 2ρσ2 = 2σ2(1 + ρ), and

V(Vi) = V(Xi − Yi) = V(Xi) + V(Yi) − 2cov(Xi, Yi)

= σ2 + σ2 − 2ρσ2 = 2σ2(1 − ρ).

(b)

cov(Ui, Vi) = E(UiVi) − E(Ui)E(Vi)

= E[(Xi + Yi)(Xi − Yi)] − (μx + μy)(μx − μy)

= E(X2i − Y2

i ) − (μ2x − μ2

y)

= [E(X2i ) − μ2

x] − [E(Y2i ) − μ2

y] = σ2 − σ2 = 0.

Page 277: Exercises and Solutions in Biostatistical Theory (2010)

258 Estimation Theory

(c) Given the bivariate normal assumption, it follows that U1, U2, . . . , Un arei.i.d. N[(μx + μy), 2σ2(1 + ρ)] random variables; and, V1, V2, . . . , Vn are i.i.d.N[(μx − μy), 2σ2(1 − ρ)] random variables. Hence,

(n − 1)S2u

2σ2(1 + ρ)∼ χ2

(n−1),(n − 1)S2

v2σ2(1 − ρ)

∼ χ2(n−1),

and S2u and S2

v are independent random variables because of the result in part (b).So,

[(n − 1)S2

u2σ2(1 + ρ)

]/(n − 1)

[(n − 1)S2

v2σ2(1 − ρ)

]/(n − 1)

= (1 − ρ)S2u

(1 + ρ)S2v

∼ f(n−1),(n−1).

(d) If fn−1,n−1,1− α2

is defined such that

pr[Fn−1,n−1 > fn−1,n−1,1−α/2

] = α

2,

then

(1 − α) = pr[fn−1,n−1,α/2 < Fn−1,n−1 < fn−1,n−1,1−α/2

]

= pr

[1

fn−1,n−1,1−α/2<

(1 − ρ)S2u

(1 + ρ)S2v

< fn−1,n−1,1−α/2

]

= pr

[(S2

v

S2u

)1

fn−1,n−1,1−α/2<

2(1 + ρ)

− 1

<

(S2

v

S2u

)fn−1,n−1,1−α/2

]

= pr[(

2W

− 1)

< ρ <

(2V

− 1)]

,

where

V =[

1 +(

S2v

S2u

)1

fn−1,n−1,1−α/2

]

and

W =[

1 +(

S2v

S2u

)fn−1,n−1,1−α/2

].

Page 278: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 259

In our situation, n = 10, s2u = 1, s2

v = 2, α = 0.05, and f9,9,0.975 = 4.03. So,

v = 1 + 21(4.03)−1 = 1.4963,

w = 1 + 21(4.03) = 9.06,

(2w

− 1)

= 29.06

− 1 = −0.7792,

(2v

− 1)

= 21.4963

− 1 = 0.3366.

Hence, the computed exact 95% CI for ρ is

(−0.7792, 0.3366).

Solution 4.23

(a) First, note that

μ′r = E(Yr) =

∫∞γ

yrθγθy−(θ+1) dy = θγθ

[yr−θ

(r − θ)

]∞

γ

= θγr

(θ − r), θ > r.

The method of moments estimators are found by solving for γ and θ using thefollowing two equations:

μ′1 = y = θγ

(θ − 1)

and

μ′2 = 1

n

n∑

i=1

y2i = E(Y2) = θγ2

(θ − 2).

The above equations imply that

μ′2

y2 = θγ2/(θ − 2)

θ2γ2/(θ − 1)2 = (θ − 1)2

θ(θ − 2).

Hence,

(θ − 1)2

θ(θ − 2)− 1 = 1

θ(θ − 2)= μ′

2y2 − 1 = (μ′

2 − y2)

y2

=1n∑n

i=1(yi − y)2

y2 =(

n − 1n

)s2

y2 .

Page 279: Exercises and Solutions in Biostatistical Theory (2010)

260 Estimation Theory

So,

θ(θ − 2) =(

nn − 1

)y2

s2 =(

5049

)(90010

)= 91.8367.

The roots of the quadratic equation θ2 − 2θ − 91.8367 = 0 are

2 ±√

(−2)2 + 4(91.8367)

2,

or −8.6352 and 10.6352. Since θ > 2, we take the positive root and use θmm =10.6352. Finally,

γmm = (θmm − 1)

θmmy =

(9.6352

10.6352

)(30) = 27.1793.

So, γmm = 27.1793.

(b) Now,

F(y; γ, θ) =∫y

γθγθt−(θ+1) dt = γθ

[−t−θ

]y

γ

= γθ[γ−θ − y−θ

]= 1 −

y

, 0 < γ < y < ∞.

So,

fY(1)(y(1); γ, θ) = n

[1 − FY(y(1); γ, θ)

]n−1 fY(y(1); γ, θ)

= n

[(γ

y(1)

)θ]n−1

θγθy(1)−(θ+1)

= nθγnθy(1)−(nθ+1), 0 < γ < y(1) < ∞.

Using this density, we have

E[Y(1)

r] =∫∞γ

y(1)rnθγnθy(1)

−(nθ+1) dy(1) = nθγr

(nθ − r), nθ > r.

So,

E[Y(1)

] = nθγ

(nθ − 1),

and

limn→∞ E

[Y(1)

] = limn→∞

θγ

(θ − 1n )

= θγ

θ= γ,

Page 280: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 261

so that Y(1) is an asymptotically unbiased estimator of γ. Also,

V[Y(1)

] = nθγ2

(nθ − 2)−[

nθγ

(nθ − 1)

]2

= nθγ2[

1(nθ − 2)

− nθ

(nθ − 1)2

]

= nθγ2

(nθ − 1)2(nθ − 2).

Since limn→∞ V[Y(1)

] = 0, and since Y(1) is asymptotically unbiased, it followsthat Y(1) is a consistent estimator of γ.

(c) For 0 < c < 1, we wish to find c such that pr[γ < cY(1)

] = (1 − α). Now,

pr[γ < cY(1)

] = pr[γ

c< Y(1)

]=

∫∞γ/c

nθγnθy(1)−(nθ+1) dy(1)

= γnθ[−y(1)

−nθ]∞γ/c

= γnθ(γ

c

)−nθ = cnθ = (1 − α).

So, c = (1 − α)1/(nθ). Thus, since θ = 3, we have U = cY(1) = (1 − α)1/3nY(1). Whenn = 5, α = 0.10, and y(1) = 20, the computed value of U is u = (1 − 0.10)1/15(20) =19.860. So, the upper 90% CI for γ is (0, 19.860).

Solution 4.24

(a) From order statistics theory, we know that, for r = 1, 2, . . . , n,

fX(r) (x(r)) = nCn−1r−1

[FX(x(r))

]r−1 [1 − FX(x(r))]n−r fX(x(r)), −∞ < x(r) < +∞.

Hence, letting u = FX(x(r)), so that du = fX(x(r)) dx(r), and then appealing toproperties of the beta distribution, we have

E(Ur) =∫∞−∞

[FX(X(r))

]nCn−1

r−1[FX(x(r))

]r−1 [1 − FX(x(r))]n−r fX(x(r)) dx(r)

=∫1

0

n!(r − 1)!(n − r)!ur(1 − u)n−r du

=∫1

0

Γ(n + 1)

Γ(r)Γ(n − r + 1)ur(1 − u)n−r du

=[

Γ(n + 1)

Γ(r)Γ(n − r + 1)

] [Γ(r + 1)Γ(n − r + 1)

Γ(n + 2)

]

= r(n + 1)

.

Page 281: Exercises and Solutions in Biostatistical Theory (2010)

262 Estimation Theory

(b) For any particular value of p, we can pick a pair of values for r and n such that

E(Ur) = E[FX(X(r))

] = r(n + 1)

≈ p.

For these particular choices for r and n, the amount of area under fX(x) to the leftof X(r) is, on average (i.e., on expectation), equal to p.

Thus, for these values of r and n, it is reasonable to use X(r) as the estimator ofthe pth quantile θp; in particular, X(r) is called the pth sample quantile.

Solution 4.25

(a) E(W) = E[X(n) − X(1)] = E[X(n)] − E[X(1)]. Since fX(x; θ) = θxθ−1, FX(x; θ) = xθ,0 < x < 1, θ > 0.So,

fX(1)(x(1); θ) = n

(1 − xθ

(1)

)n−1θxθ−1

(1), 0 < x(1) < 1

and

fX(n)(x(n); θ) = n

[xθ(n)

]n−1θxθ−1

(n), 0 < x(n) < 1.

So,

E[X(n)] =∫1

0x(n)nxθ(n−1)

(n)θxθ−1

(n)dx(n) =

(nθ

nθ + 1

).

And, with u = xθ(1)

and du = θxθ−1(1)

dx(1), we have

E(X(1)) =∫1

0x(1)n

(1 − xθ

(1)

)n−1θxθ−1

(1)dx(1)

= n∫1

0u1/θ(1 − u)n−1 du

= n∫1

0u

(1θ+1)−1

(1 − u)n−1 du

= nΓ (1/θ + 1) Γ(n)

Γ (1/θ + 1 + n)= Γ (1/θ + 1) Γ(n + 1)

Γ (1/θ + 1 + n).

So,

E(W) =(

nθ + 1

)− Γ (1/θ + 1) Γ(n + 1)

Γ (1/θ + 1 + n).

(b) Let A be the event “X(1) < ξ,” and let B be the event “X(n) > ξ.” Then,

pr[X(1) < ξ < X(n)

] = pr{[

X(1) < ξ] ∩ [X(n) > ξ

]}

= pr(A ∩ B) = 1 − pr(A ∩ B) = 1 − pr(A ∪ B)

= 1 − [pr(A) + pr(B) − pr(A ∩ B)].

Page 282: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 263

Now,

pr(A) = pr[X(1) > ξ] = pr[∩ni=1(Xi > ξ)] =

n∏

i=1

pr(Xi > ξ) =(

12

)n;

similarly,

pr(B) = pr[X(n) < ξ] = pr[∩ni=1(Xi < ξ)] =

n∏

i=1

pr(Xi < ξ) =(

12

)n,

and

pr(A ∩ B) = pr[(X(1) > ξ) ∩ (X(n) < ξ)

] = 0.

So,

pr[X(1) < ξ < X(n)

] = 1 − 2(

12

)n= 1 − 1

2n−1 .

So, the confidence coefficient for the interval [X(1), X(n)] varies with n, which isa highly undesirable property.

Solution 4.26. First, if X has a uniform distribution on the interval (0, 1), then, forr ≥ 0, we have

E(Xr) =∫1

0xr(1) dx = (1 + r)−1.

So,

E(G) = E

⎡⎢⎣⎛⎝

n∏

i=1

Xi

⎞⎠

1/n⎤⎥⎦ =

n∏

i=1

E(

X1/ni

)

=n∏

i=1

(1 + 1

n

)−1=[(

1 + 1n

)n]−1,

so that limn→∞ E(G) = e−1.And, similarly,

E(G2) = E

⎡⎢⎣⎛⎝

n∏

i=1

Xi

⎞⎠

2/n⎤⎥⎦ =

n∏

i=1

E(

X2/ni

)

=n∏

i=1

(1 + 2

n

)−1=[(

1 + 2n

)n]−1,

so that limn→∞ E(G2) = e−2.

Page 283: Exercises and Solutions in Biostatistical Theory (2010)

264 Estimation Theory

Thus,

limn→∞ V(G) = lim

n→∞{

E(G2) − [E(G)]2}

= e−2 − (e−1)2 = 0.

Hence, since limn→∞ E(G) = e−1 and limn→∞ V(G) = 0, it follows that the randomvariable G converges in probability to (i.e., is a consistent estimator of) the quantitye−1 = 0.368.

Solution 4.27. We wish to prove that limn→∞ pr{|Xn − 0| > ε} = 0 ∀ε > 0. Sincepr(Y > n) = ∫∞

n e−y dy = e−n, we have

Xn = enI(Y > n) ={

en with probability e−n,0 with probability

(1 − e−n) .

Thus, for any ε > 0,

p{|Xn| > ε} = pr{Xn > ε} ={

0 if en ≤ ε,e−n if en > ε.

So,

limn→∞ pr{|Xn| > ε} ≤ lim

n→∞ e−n = 0, ∀ε > 0.

Note that E(Xn) = 1 and V(Xn) = (en − 1), so that limn→∞ V(Xn) = +∞; hence, adirect proof of convergence in probability is required.

Solution 4.28. Now,

β∗ =∑n

i=1(Ti − T)Yi∑ni=1(Ti − T)2

,

where T = n−1∑ni=1 Ti = n1/n. Also, define A1 = n−1

1∑n1

i=1 Ai, and A0 = n−10∑n

i=(n1+1) Ai.Now, since E(Yi|Ti, Ai) = α + βTi + γAi, it follows that

E(β∗|{Ti}, {Ai}) =∑n

i=1(Ti − T)(α + βTi + γAi)∑ni=1(Ti − T)2

= β + γ∑n

i=1(Ti − n1n )Ai∑n

i=1(Ti − n1n )2 .

Now,

n∑

i=1

(Ti − n1n

)Ai = (1 − n1n

)n1A1 + (0 − n1n

)n0A0 = n0n1n

(A1 − A0).

And,n∑

i=1

(Ti − n1n

)2 = n1(1 − n1n

)2 + n0(−n1n

)2 = n0n1n

.

Page 284: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 265

So,

E(β∗|{Ti}, {Ai}) = β + γ(A1 − A0).

Thus, since γ = 0, a sufficient condition for E(β∗|{Ti}, {Ai}) = β is A1 = A0 (i.e., theaverage age of the n1 subjects in the treatment group is equal to the average age of then0 subjects in the comparison group).

If the n subjects are randomly selected from a large population of subjects andthen randomization is employed in assigning these n subjects to the treatment andcomparison groups, then it follows that E(A0) = E(A1), so that

E(β∗|{Ti}) = E{Ai}[E(β∗|{Ti}, {Ai}] = β + γE(A1 − A0) = β.

So, on expectation, randomization is sufficient to insure that β∗ is an unbiasedestimator of β.

Solution 4.29

(a) For i = 1, 2, . . . , n, note that E(Yi) = πi, V(Yi) = πi(1 − πi), and

L(y; β) =n∏

i=1

πyii (1 − πi)

1−yi ,

so that

ln L(y; β) =n∑

i=1

[yi ln πi + (1 − yi) ln(1 − πi)

].

So, for j = 0, 1, . . . , p, we have

∂ ln L(y; β)

∂βj=

n∑

i=1

[(yiπi

)∂πi∂βj

− (1 − yi)

(1 − πi)

∂πi∂βj

]

=n∑

i=1

∂πi∂βj

[(yi − πi)

πi(1 − πi)

].

And,

∂πi∂βj

=xije

∑pj=0 βjxij

(1 + e

∑pj=0 βjxij

)− xije

2∑p

j=0 βjxij

(1 + e

∑pj=0 βjxij

)2

= xije∑p

j=0 βjxij

(1 + e

∑pj=0 βjxij

)2 = xijπi(1 − πi).

Page 285: Exercises and Solutions in Biostatistical Theory (2010)

266 Estimation Theory

Thus,

∂ ln L(y; β)

∂βj=

n∑

i=1

xijπi(1 − πi)

[(yi − πi)

πi(1 − πi)

]

=n∑

i=1

xij[yi − E(Yi)

] = x′j[y − E(Y)

],

where x′j = (x1j, x2j, . . . , xnj).

Finally, with the [nx(p + 1)] matrix X defined so that its ith row isx′

i = (1, xi1, xi2, . . . , xip), i = 1, 2, . . . , n, and with the [(p + 1) × 1] column vector[∂ ln L(y; β)]/∂β defined as

∂ ln L(y; β)

∂β=[

∂ ln L(y; β)

∂β0,∂ ln L(y; β)

∂β1, . . . ,

∂ ln L(y; β)

∂βp

]′,

we have∂ ln L(y; β)

∂β= X ′ [y − E(Y)

],

which gives the desired result.

(b) Since ∂ ln L(y; β)/∂βj =∑ni=1 xij(yi − πi), it follows that

−∂2 ln L(y; β)

∂βj∂βj′=

n∑

i=1

xij∂πi∂βj′

=n∑

i=1

xijxij′πi(1 − πi)

=n∑

i=1

xijxij′V(Yi),

which does not functionally depend on Y .Since Y has a diagonal covariance matrix of the simple form

V = diag [V(Y1), V(Y2), . . . , V(Yn)]

= diag [π1(1 − π1), π2(1 − π2), . . . , πn(1 − πn)] ,

it follows directly that the observed information matrix I(y; β) equals the expectedinformation matrix I(β), which can be written in matrix notation as

I(β) = X ′VX .

Finally, the estimated covariance matrix V(β) of β is equal to

V(β) = I−1(β) =(

X ′VX)−1

,

Page 286: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 267

where

V = diag [π1(1 − π1), π2(1 − π2), . . . , πn(1 − πn)] ,

and where

πi = e∑p

j=0 βjxij

1 + e∑p

j=0 βjxij, i = 1, 2, . . . , n.

Solution 4.30*. At iteration t, the E-step requires that we evaluate

Q(t)(y; π, μ1, μ2) ≡ Q(t) = EZ

{ln[Lc(y, z; π, μ1, μ2)] ∣∣y, π(t−1), μ(t−1)

1 , μ(t−1)2

},

where the complete-data likelihood is given by

Lc(y, z; π, μ1, μ2) =n∏

i=1

[πpYi(yi; μ1)]zi [(1 − π)pYi

(yi; μ2)](1−zi).

So,

Q(t)(y; π, μ1, μ2) =n∑

i=1

EZi

{zi ln[π(t−1)pYi

(yi; μ(t−1)1 )]

+ (1 − zi) ln[(1 − π(t−1))pYi(yi; μ

(t−1)2 )] ∣∣ yi

}

=n∑

i=1

[(C1i − C2i)E(Zi|yi) + C2i

],

where C1i = ln[π(t−1)pYi(yi; μ

(t−1)1 )] and C2i = ln[(1 − π(t−1))pYi

(yi; μ(t−1)2 )] are con-

stants with respect to Zi.Now,

E(Zi|yi) = pr(Zi = 1|yi)

=[pYi

(yi; μ(t−1)1 )

]pr(Zi = 1)

[pYi

(yi; μ(t−1)1 )

]pr(Zi = 1) +

[pYi

(yi; μ(t−1)2 )

]pr(Zi = 0)

=π(t−1)

[pYi

(yi; μ(t−1)1 )

]

π(t−1)[pYi

(yi; μ(t−1)1 )

]+ (1 − π(t−1))

[pYi

(yi; μ(t−1)2 )

] = Z(t)i , say.

Note that Z(t)i is the tth iteration estimate of the probability that the ith fish was

born in a Pfiesteria-rich site. Also, when t = 1, π(0), μ(0)1 , and μ

(0)2 are the well-chosen

initial values that must be specified to start the EM algorithm iteration process.

Page 287: Exercises and Solutions in Biostatistical Theory (2010)

268 Estimation Theory

Thus,

Q(t)(y; π, μ1, μ2) ≡ Q(t) =n∑

i=1

[(C1i − C2i)Z

(t)i + C2i

]

=n∑

i=1

{Z(t)

i ln[π(t−1)pYi(yi; μ

(t−1)1 )] + (1 − Z(t)

i )

× ln[(1 − π(t−1))pYi(yi; μ

(t−1)2 )]

}.

For the M-step, maximizing Q(t) with respect to π yields

∂Q(t)

∂π= ∂

( n∑

i=1

{Z(t)

i[

ln π + ln pYi(yi; μ1)

]+ (1 − Z(t)i )[

ln(1 − π)

+ ln pYi(yi; μ2)

]})/∂π

⇒∑n

i=1 Z(t)i

π−[n −∑n

i=1 Z(t)i

]

1 − π= 0

⇒ π(t) =∑n

i=1 Z(t)i

n.

Thus, π(t) is the sample average estimated probability that a randomly selected fishwas born in a Pfiesteria-rich site.And,

∂Q(t)

∂μ1= ∂

( n∑

i=1

{Z(t)

i

[ln π + ln pYi

(yi; μ1)

]+ (1 − Z(t)

i )

×[

ln(1 − π) + ln pYi(yi; μ2)

]})/∂μ1

=∂{∑n

i=1 Z(t)i[yi ln μ1 − μ1 − ln yi!

]}

∂μ1

=∑n

i=1 Z(t)i yi

μ1−

n∑

i=1

Z(t)i = 0

⇒ μ(t)1 =

∑ni=1 Z(t)

i yi∑ni=1 Z(t)

i

.

Note that μ(t)1 is a weighted estimate of the average number of ulcerative lesions for

fish born in Pfiesteria-rich sites.Similarly, it can be shown that μ

(t)2 =∑n

i=1(1 − Z(t)i )yi/

∑ni=1(1 − Z(t)

i ) is a weightedestimate of the average number of ulcerative lesions for fish born in Pfiesteria-free sites.

Page 288: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 269

Solution 4.31*

(a) Let IE(x) be the indicator function for the set E, so that IE(x) equals 1 if x ∈ E andIE(x) equals 0 otherwise. Then, letting A= {1, 2, . . . , θ} and letting B={1, 2, . . . , ∞},we have

pX1,X2,...,Xn(x1, x2, . . . , xn; θ) =

n∏

i=1

{θ−1IA(xi)

}

= [(θ−n)IA(x(n))] ·⎡⎣

n∏

i=1

IB(xi)

⎤⎦

= g(u; θ) · h(x1, x2, . . . , xn),

where u = x(n) = max{x1, x2, . . . , xn}. And, given X(n) = x(n), h(x1, x2, . . . , xn)

does not in any way depend on θ, so that X(n) is a sufficient statistic for θ.Also,

E(U∗) = E[(2X1 − 1)] = 2E(X1) − 1

= 2θ∑

x1=1

x1θ−1 − 1 = 2θ

θ∑

x1=1

x1 − 1

= 2θ

[θ(θ + 1)

2

]− 1

= θ.

(b) For notational ease, let X(n) = U. Now, θ = E(U∗|U = u) = E(2X1 − 1|U = u) =2E(X1|U = u) − 1, so we need to evaluate E(X1|U = u). To do so, we need to firstfind

pX1(x1|U = u) = pX1,U(x1, u)

pU(u)= pr[(X1 = x1) ∩ (U = u)]

pU(u).

Now,

pr(U = u) = pr(U ≤ u) − pr(U ≤ u − 1)

= pr

⎡⎣

n⋂

i=1

(Xi ≤ u)

⎤⎦− pr

⎡⎣

n⋂

i=1

(Xi ≤ u − 1)

⎤⎦

=n∏

i=1

pr(Xi ≤ u) −n∏

i=1

pr(Xi ≤ u − 1)

=(u

θ

)n −(

u − 1θ

)n, u = 1, 2, . . . , θ.

Page 289: Exercises and Solutions in Biostatistical Theory (2010)

270 Estimation Theory

And,

pr[(X1 = x1) ∩ (U = u)] =

⎧⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎩

0, x1 > u,

(uθ

)n−1, x1 = u,

[(uθ

)n−1 −(

u − 1θ

)n−1]

, x1 < u.

In the above expression, note that the equality “x1 = u” implies consideration ofthe event {(X1 = u) ∩ [⋂n

i=2(Xi ≤ u)]}, and that the inequality “x1 < u” impliesconsideration of the event

{(X1 = x1) ∩ [max(X2, X2, . . . , Xn) = u]}.

So,

pX1(x1|U = u) =

⎧⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎩

0, x1 > u,

un−1

un − (u − 1)n , x1 = u,

un−1 − (u − 1)n−1

un − (u − 1)n , x1 = 1, 2, . . . u − 1,

which cannot (and does not) depend in any way on θ by the sufficiency principle.So,

E(X1|U = u) =u−1∑

x1=1

x1

[un−1 − (u − 1)n−1

un − (u − 1)n

]+ u

[un−1

un − (u − 1)n

]

=[

(u − 1)u2

][un−1 − (u − 1)n−1

un − (u − 1)n

]+[

un

un − (u − 1)n

]

= un+1 − u(u − 1)n + un

2 [un − (u − 1)n].

Thus,

2E(X1|U = u) − 1 = un+1 − u(u − 1)n + un

un − (u − 1)n − 1

= un+1 − (u − 1)n+1

un − (u − 1)n ,

so that the MVUE of θ is

θ = Un+1 − (U − 1)n+1

Un − (U − 1)n .

Page 290: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 271

As a simple check, when n = 1, so that U = X1, we obtain

θ = U2 − (U − 1)2

U − (U − 1)= 2U − 1 = 2X1 − 1, as desired .

Note that

E(θ) =θ∑

u=1

θ pr(U = u) =θ∑

u=1

[un+1 − (u − 1)n+1

un − (u − 1)n

][un − (u − 1)n

θn

]

= 1θn

θ∑

u=1

[un+1 − (u − 1)n+1

]

= 1θn {(1 − 0) + (2n+1 − 1) + (3n+1 − 2n+1) + · · · + [θn+1 − (θ − 1)n+1]}

= θn+1

θn = θ.

As a numerical example, if n = 5 and u = 2, then θ = (26 − 16)/(25 − 15) = 2.0323.So, one disadvantage of θ is that it does not necessarily take positive integer values,even though the parameter θ is a positive integer.

Solution 4.32*

(a) First, note that

FX(x) =∫x

θ(1)dx = (x − θ), 0 < θ < x < (θ + 1) < +∞.

Hence, from order statistics theory, it follows directly that

fX(1)(x(1)) = n

[1 − FX(x(1))

]n−1 fX(x(1))

= n[1 − (x(1) − θ)

]n−1 , 0 < θ < x(1) < (θ + 1) < +∞.

Then, with u = (1 + θ) − x(1), so that du = −dx(1), we have, for r a non-negativeinteger,

E[Xr

(1)

]=

∫θ+1

θxr(1)n[(1 + θ) − x(1)]n−1 dx(1)

=∫1

0[(1 + θ) − u]rnun−1du

= n∫1

0

⎡⎣

r∑

j=0

Crj (1 + θ)j(−u)r−j

⎤⎦un−1 du

Page 291: Exercises and Solutions in Biostatistical Theory (2010)

272 Estimation Theory

= nr∑

j=0

Crj (1 + θ)j(−1)r−j

∫1

0un+r−j−1 du

= nr∑

j=0

Crj (1 + θ)j(−1)r−j(n + r − j)−1.

When r = 1, we obtain E(X(1)) = θ + 1/(n + 1). Also, for r = 2, we obtain

E(X2(1)) = n

[1

(n + 2)+ 2(1 + θ)(−1)

(1

n + 1

)+ (1 + θ)2

(1n

)],

so that

V(X(1)) = E(X2(1)) − [E(X(1))]2 = n

(n + 1)2(n + 2).

By symmetry,

E(X(n)) = (θ + 1) − 1(n + 1)

and V(X(n)) = n(n + 1)2(n + 2)

.

Or, more directly, one can use fX(n)(x(n)) = n

[x(n) − θ

]n−1 ,0 < θ < x(n) < (θ + 1) < +∞, to show that

E(X(n)r) = n

r∑

j=0

Crj θ

r−j(n + j)−1.

Thus,

E(θ1) = 12

[E(X(1)) + E(X(n)) − 1

]

= 12

{[θ + 1

(n + 1)

]+[(θ + 1) − 1

(n + 1)

]− 1}

= θ.

And,

E(θ2) = 1(n − 1)

[nE(X(1)) − E(X(n))

]

= 1(n − 1)

{n[θ + 1

(n + 1)

]−[(θ + 1) − 1

(n + 1)

]}= θ.

To find the variances of the estimators θ1 and θ2, we need to find cov[X(1), X(n)].If we let Yi = (Xi − θ), i = 1, 2, . . . , n, then fYi (yi) = 1, 0 < yi < 1. Also, Y(1) =min{Y1, Y2, . . . , Yn} = (X(1) − θ) and Y(n) = max{Y1, Y2, . . . , Yn} = (X(n) − θ), sothat cov[X(1), X(n)]=cov[Y(1), Y(n)].

Page 292: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 273

Now, since fY(1),Y(n)(y(1), y(n)) = n(n − 1)(y(n) − y(1))

n−2, 0 < y(1) < y(n) < 1, wehave

E(Y(1)Y(n)) =∫1

0

∫1

y(1)

[y(1)y(n)]n(n − 1)(y(n) − y(1))n−2 dy(n) dy(1).

So, using the relationship w = (y(n) − y(1)), so that dw = dy(n), and appealing toproperties of the beta distribution, we obtain

E(Y(1)Y(n)) = n(n − 1)

∫1

0

∫1−y(1)

0y(1)(w + y(1))w

n−2 dw dy(1)

= n(n − 1)

∫1

0

∫1−y(1)

0

[y(1)w

n−1 + y2(1)w

n−2]

dw dy(1)

= n(n − 1)

∫1

0

[y(1)(1 − y(1))

n

n+

y2(1)

(1 − y(1))n−1

(n − 1)

]dy(1)

= n(n − 1)

[Γ(2)Γ(n + 1)/Γ(n + 3)

n+ Γ(3)Γ(n)/Γ(n + 3)

(n − 1)

]

= n(n − 1)

[1

n(n + 1)(n + 2)+ 2

(n − 1)n(n + 1)(n + 2)

]

= 1(n + 2)

.

Finally,

cov(X(1), X(n)) = cov(Y(1), Y(n)) = E(Y(1)Y(n)) − E(Y(1))E(Y(n))

= 1(n + 2)

−(

1n + 1

)(n

n + 1

)= 1

(n + 1)2(n + 2).

So,

V(θ1) =(

12

)2 [V(X(1)) + V(X(n)) + 2cov(X(1), X(n))

]

= 14

[n

(n + 1)2(n + 2)+ n

(n + 1)2(n + 2)+ 2

(1

(n + 1)2(n + 2)

)]

= 12(n + 1)(n + 2)

.

And,

V(θ2) = 1(n − 1)2

[n2V(X(1)) + V(X(n)) − 2ncov(X(1), X(n))

]

= 1(n − 1)2

[n3

(n + 1)2(n + 2)+ n

(n + 1)2(n + 2)− 2n

(n + 1)2(n + 2)

]

= n(n − 1)(n + 1)(n + 2)

= n(n2 − 1)(n + 2)

.

Page 293: Exercises and Solutions in Biostatistical Theory (2010)

274 Estimation Theory

Thus, V(θ1) < V(θ2), n > 1.

(b) Now,

V(W) = c21σ2 + c2

2σ2 + 2c1c2σ12 = [c21 + (1 − c1)2]σ2

+ 2c1(1 − c1)σ12.

So,

∂V(W)

∂c1= [2c1 − 2(1 − c1)] + 2(1 − 2c1)σ12 = 0

gives

2(2c1 − 1)(σ2 − σ12) = 0.

Thus, if c1 = c2 = 12 , then V(W) is minimized as long as σ2 > σ12. Note that these

conditions are met by the estimator θ1, but not by the estimator θ2. Also, anotherdrawback associated with the estimator θ2 is that it can take a negative value, eventhough θ > 0.

(c) Let IA(x) be the indicator function for the set A, so that IA(x) equals 1 if x ∈ A andIA(x) equals 0 otherwise. Then, with A equal to the open interval (θ, θ + 1), thejoint distribution of X1, X2, . . . , Xn can be written in the form

(1)nn∏

i=1

I(θ,θ+1)(xi) = (1)n {I(θ,θ+1)[x(1)]} {

I(θ,θ+1)[x(n)]}

,

since 0 < θ < x(1) ≤ xi ≤ x(n) < (θ + 1) < +∞, i = 1, 2, . . . , n. Hence, by theFactorization Theorem, X(1) and X(n) are jointly sufficient for θ.However, X(1) and X(n) do not constitute a set of complete sufficient statistics for θ

since E[g(X(1), X(n))

] = 0 for all θ, 0 < θ < +∞, where

g(X(1), X(n)) = X(1) − X(n) +(

n − 1n + 1

).

This finding raises the interesting question about whether or not the estimator θ1could be the MVUE of θ, even though it is not a function of complete sufficientstatistics. For more general discussion about this issue, see Bondesson (1983).

Solution 4.33*

(a) Under the independence assumption,

E

⎛⎝

XyyXynXny

⎞⎠ =

⎛⎝

Nπ1π2Nπ1(1 − π2)

N(1 − π1)π2

⎞⎠ .

Page 294: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 275

Hence, the method-of-moments estimator of N is obtained by solving for N usingthe three equations

Xyy = Nπ1π2, (4.1)

Xyn = Nπ1(1 − π2), (4.2)

Xny = (1 − π1)π2. (4.3)

The operations [(Equation 4.1) + (Equation 4.2)] and [(Equation 4.1) + (Equation4.3)] give

(Xyy + Xyn) = Nπ1, (4.4)

(Xyy + Xny) = Nπ2. (4.5)

Finally, the operation [(Equation 4.4) × (Equation 4.5)] / (Equation 4.1) produces

N = (Xyy + Xyn)(Xyy + Xny)

Xyy.

Note that N does not necessarily take integer values. Also, E[N] and V[N] areundefined since Xyy = 0 occurs with non-zero probability. When xyy = 12, 000,xyn = 6000, and xny = 8000, we have

N = [(12, 000) + (6000)][(12, 000) + (8000)](12, 000)

= 30, 000.

(b) Since

odds(E1|E2) = pr(E1|E2)

1 − pr(E1|E2)= πyy/(πyy + πny)

πny/(πyy + πny)= πyy

πny

and

odds(E1|E2) = pr(E1|E2)

1 − pr(E1|E2)= πyn/(πyn + πnn)

πnn/(πyn + πnn)= πyn

πnn,

the assumption that

odds(E1 | E2)

odds(E1 | E2)= k

implies that

πyy/πny

πyn/πnn= k

or, equivalently,

πyy(1 − πyy − πyn − πny) = kπynπny.

Page 295: Exercises and Solutions in Biostatistical Theory (2010)

276 Estimation Theory

So, a method-of-moments estimator of N is obtained by simultaneously solvingthe four equations

Xyy = Nπyy, (4.6)

Xyn = Nπyn, (4.7)

Xny = Nπny, (4.8)

πyy(1 − πyy − πyn − πny) = kπynπny. (4.9)

Equations 4.6, 4.7, and 4.8 imply that πyy = Xyy/N, πyn = Xyn/N, and πny =Xny/N. Substituting these expressions into Equation 4.9 yields

(Xyy

N

)[(N − Xyy − Xyn − Xny)

N

]= k

(Xyn

N

)(Xny

N

),

giving

N(k) = X2yy + XyyXyn + XyyXny + kXynXny

Xyy.

When xyy = 12, 000, xyn = 6000, and xny = 8000, we have

N(1/2) = (12, 000)2 + (12, 000)(6000) + (12, 000)(8000) + (1/2)(6000)(8000)

12, 000

= 28, 000,

N(2) = (12, 000)2 + (12, 000)(6000) + (12, 000)(8000) + (2)(6000)(8000)

12, 000

= 34, 000,

N(4) = (12, 000)2 + (12, 000)(6000) + (12, 000)(8000) + (4)(6000)(8000)

12, 000

= 42, 000.

Apparently, the estimator N, which assumes independence, has a tendency toover-estimate N when k < 1 and to under-estimate N when k > 1.

Solution 4.34*

(a) Let π(xi) ≡ πi = eα+βxi/(1 + eα+βxi ). The likelihood function L is equal to

L =n∏

i=1

πyii (1 − πi)

1−yi ,

so that

ln L =n∑

i=1

yi ln(πi) + (1 − yi) ln(1 − πi).

Page 296: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 277

By the chain rule,

∂ ln L∂α

=n∑

i=1

∂ ln L∂πi

· ∂πi∂α

and∂ ln L

∂β=

n∑

i=1

∂ ln L∂πi

· ∂πi∂β

.

Now,

∂πi∂α

=∂

[eα+βxi

(1 + eα+βxi )

]

∂α= eα+βxi

(1 + eα+βxi )2 = πi(1 − πi)

and

∂πi∂β

=∂

[eα+βxi

(1 + eα+βxi )

]

∂β= xieα+βxi

(1 + eα+βxi )2 = xiπi(1 − πi).

Thus,

∂ ln L∂α

=n∑

i=1

∂ ln L∂πi

· ∂πi∂α

=n∑

i=1

[yiπi

− 1 − yi1 − πi

]πi(1 − πi)

=n∑

i=1

(yi − πi) = 0;

⇒n∑

i=1

yi =n∑

i=1

πi =n∑

i=1

eα+βxi

(1 + eα+βxi )

=n0∑

i=1

(1 + eα)+

n∑

i=n0+1

eα+β

(1 + eα+β)

⇒n∑

i=1

yi = n0eα

(1 + eα)+ n1

eα+β

(1 + eα+β).

Similarly,

∂ ln L∂β

=n∑

i=1

∂ ln L∂πi

· ∂πi∂β

=n∑

i=1

[yiπi

− 1 − yi1 − πi

]xiπi(1 − πi)

=n∑

i=1

xi(yi − πi) =n∑

i=n0+1

(yi − πi) = 0;

⇒n∑

i=n0+1

yi =n∑

i=n0+1

πi =n∑

i=n0+1

eα+β

(1 + eα+β)= n1

eα+β

(1 + eα+β).

Page 297: Exercises and Solutions in Biostatistical Theory (2010)

278 Estimation Theory

Via subtraction, we obtain

n∑

i=1

yi −n∑

i=n0+1

yi = n0eα

(1 + eα)

⇒n0∑

i=1

yi = n0eα

(1 + eα)

⇒ α = ln(

p01 − p0

),

where p0 = n−10∑n0

i=1 yi is the sample proportion of overweight infants receivinghome care.

Then, it follows directly that

n∑

i=n0+1

yi = n1eα+β

1 + eα+β

⇒ β = ln(

p11 − p1

)− α

= ln[

p11 − p1

]− ln

[p0

1 − p0

]

= ln[

p1/(1 − p1)

p0/(1 − p0)

],

where p1 = n−11∑n

i=n0+1 yi is the sample proportion of overweight infants in daycare.

Thus, α is the sample log odds of being overweight for infants receiving homecare, while β is the sample log odds ratio comparing the estimated odds of beingoverweight for infants in day care to the estimated odds of being overweight forinfants receiving home care. The estimators α and β make intuitive sense, sincethey are the sample counterparts of the population parameters α and β.

(b)

−∂2 ln L∂α2 = −∂

∑ni=1(yi − πi)

∂πi· ∂πi

∂α

=n∑

i=1

πi(1 − πi) =n∑

i=1

eα+βxi

(1 + eα+βxi )2

=n0∑

i=1

(1 + eα)2 +n∑

i=n0+1

eα+β

(1 + eα+β)2

= n0π0(1 − π0) + n1π1(1 − π1),

where π0 = eα/(1 + eα) and π1 = eα+β/(1 + eα+β).

Page 298: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 279

Also,

−∂2 ln L∂β2 = −∂

∑ni=1 xi(yi − πi)

∂πi· ∂πi

∂β

=n∑

i=1

x2i πi(1 − πi)

=n∑

i=n0+1

eα+β

(1 + eα+β)2 = n1π1(1 − π1).

Finally,

−∂2 ln L∂α∂β

= −∂∑n

i=1(yi − πi)

∂πi· ∂πi

∂β

=n∑

i=1

xiπi(1 − πi)

= n1π1(1 − π1) = −∂2 ln L∂β∂α

.

So, with y′ = (y1, y2, . . . , yn), it follows that

I(y; α, β) = I(α, β)

=[

n0π0(1 − π0) + n1π1(1 − π1) n1π1(1 − π1)

n1π1(1 − π1) n1π1(1 − π1)

].

Hence, using either observed or expected information, the large-samplevariance–covariance matrix for α and β is equal to

I−1(α, β) =

⎧⎪⎨⎪⎩

[n0

(1+eα)2

]−1 −[n0

(1+eα)2

]−1

−[n0

(1+eα)2

]−1 [n0

(1+eα)2

]−1 +[n1

eα+β

(1+eα+β)2

]−1

⎫⎪⎬⎪⎭

.

(c) The estimated large-sample 95% CI for α is

α ± 1.96

√√√√[

n0eα

(1 + eα)2

]−1

,

and the estimated large-sample 95% CI for β is

β ± 1.96

√√√√[

n0eα

(1 + eα)2

]−1

+[

n1eα+β

(1 + eα+β)2

]−1

.

Page 299: Exercises and Solutions in Biostatistical Theory (2010)

280 Estimation Theory

For the given data, α = −1.52 and β = 0.47, so that the corresponding large-sample 95% CIs for α and β are (−2.03, −1.01) and (−0.21, 1.15), respectively. Sincethe CI for β includes the value 0, there is no statistical evidence using these datathat infants placed in day care are more likely to be overweight than are infantsreceiving home care.

Solution 4.35*. First, the parameter of interest is

γ = E(T) = Ex[E(T|X = x)] = Ex[(θx)−1] = 1θ

E(X−1) = 1θ(β − 1)

, β > 1.

Since information on the random variable X is unavailable, the marginal distributionfT(t) of the random variable T must be used to make ML-based inferences about theparameters θ, β, and γ. In particular, the observed latency periods t1, t2, . . . , tn can thenbe considered to be the realizations of a random sample T1, T2, . . . , Tn of size n from

fT(t) =∫∞

0fT,X(t, x) dx =

∫∞0

fT(t|X = x)fX(x) dx

=∫∞

0θxe−θxt · xβ−1e−x

Γ(β)dx

= θ

Γ(β)

∫∞0

xβe−(1+θt)x dx

= θ

Γ(β)Γ(β + 1)(1 + θt)−(β+1)

= θβ(1 + θt)−(β+1), t > 0, θ > 0, β > 1.

To produce a large-sample CI for the unknown parameter γ = [θ(β − 1)]−1, we willemploy the delta method. First, the appropriate likelihood function L is

L =n∏

i=1

[θβ(1 + θti)

−(β+1)]

= θnβnn∏

i=1

(1 + θti)−(β+1),

so that

ln L = n ln θ + n ln β − (β + 1)

n∑

i=1

ln(1 + θti).

So, we have

∂ ln L∂θ

= nθ

− (β + 1)

n∑

i=1

ti(1 + θti)

,

and

∂ ln L∂β

= nβ

−n∑

i=1

ln(1 + θti),

Page 300: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 281

so that

∂2 ln L∂θ2 = −n

θ2 + (β + 1)

n∑

i=1

t2i

(1 + θti)2 ,

∂2 ln L∂β2 = −n

β2 ,

and∂2 ln L∂θ∂β

= ∂2lnL∂β∂θ

= −n∑

i=1

ti(1 + θti)

.

Now, using integration-by-parts with u = t, du = dt, dv = θ(1 + θt)−(β+2) dt, andv = −(1 + θt)−(β+1)/(β + 1), we have

E[

T(1 + θT)

]=

∫∞0

t(1 + θt)

θβ(1 + θt)−(β+1)dt = 1θ(β + 1)

.

And, using integration-by-parts with u = t2, du = 2t dt, dv = θ(1 + θt)−(β+3) dt, andv = −(1 + θt)−(β+2)/(β + 2), we have

E

[T2

(1 + θT)2

]=

∫∞0

t2

(1 + θt)2 θβ(1 + θt)−(β+1) dt = 2θ2(β + 1)(β + 2)

.

Thus, it follows directly that the expected information matrix I is equal to

I =

⎡⎢⎢⎢⎢⎣

−E

(∂2lnL∂θ2

)−E

(∂2lnL∂θ∂β

)

−E

(∂2lnL∂θ∂β

)−E

(∂2lnL∂β2

)

⎤⎥⎥⎥⎥⎦

=

⎡⎢⎢⎣

βn(β + 2)θ2

nθ(β+1)

nθ(β + 1)

nβ2

⎤⎥⎥⎦ ,

and hence that

I−1 =

⎡⎢⎢⎢⎣

θ2(β + 1)2(β + 2)

βn−θβ(β + 1)(β + 2)

n

−θβ(β + 1)(β + 2)

nβ2(β + 1)2

n

⎤⎥⎥⎥⎦ .

Now, with

δ′ =[

∂γ

∂θ,∂γ

∂β

]

=[ −1

θ2(β − 1),

−1θ(β − 1)2

],

Page 301: Exercises and Solutions in Biostatistical Theory (2010)

282 Estimation Theory

use of the delta method gives, for large n,

V(γ) ≈ δ′I−1δ

= (β + 1)

nθ2(β − 1)2

[(β + 1)(β + 2)

β− 2β(β + 2)

(β − 1)+ β2(β + 1)

(β − 1)2

].

Then, with n = 300, θ = 0.32, and β = 1.50, we obtain γ = [0.32(1.50 − 1)]−1 = 6.25and V(γ) = 2.387, so that the 95% large-sample CI for γ is 6.25 ± 1.96

√2.387 =

6.25 ± 3.03, or (3.22,9.28).

Solution 4.36*

(a) For i = 1, 2, 3, let1ni = (1, 1, . . . , 1)′

denote the (ni × 1) column vector of ones, and let

0n2 = (0, 0, . . . , 0)′

denote the (n2 × 1) column vector of zeros.Then, the (n × 3) design matrix X can be written as

X =⎡⎣

1n1 −1n1 1n11n2 0n2 0n21n3 1n3 1n3

⎤⎦ ,

so that

X ′X =⎡⎣

n (n3 − n1) (n1 + n3)

(n3 − n1) (n1 + n3) (n3 − n1)

(n1 + n3) (n3 − n1) (n1 + n3)

⎤⎦

= n

⎡⎣

1 b ab a ba b a

⎤⎦ ,

where a = (π1 + π3) and b = (π3 − π1).

(b) From standard unweighted least-squares theory, we know that V(β2) = c22σ2,where c22 is the last diagonal entry in the matrix (X ′X)−1 = ((cll′)), l =0, 1, 2 and l′ = 0, 1, 2. We can use the theory of cofactors to find an explicitexpression for c22. In particular, the cofactor needed for determining c22 is equalto

(−1)(2+2)n2∣∣∣∣

1 bb a

∣∣∣∣ = n2(a − b2),

so that c22 = n2(a − b2)/|X ′X|. And,

|X ′X| = n3(a2 + ab2 + ab2 − a3 − b2 − ab2) = n3(1 − a)(a2 − b2)

= n3[1 − (π1 + π3)][(π1 + π3)2 − (π3 − π1)2] = 4n3π1π2π3.

Page 302: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 283

Finally,

V(β2) = c22σ2 = n2(a − b2)σ2

|X ′X|

={

[(π1 + π3) − (π3 − π1)2]4nπ1π2π3

}σ2.

(c) We wish to find values for π1, π2, and π3 that minimize V(β2) subject to theconstraint (π1 + π2 + π3) = 1. Instead of considering V(β2), we can equivalentlyconsider the quantity

Q = [(π1 + π3) − (π3 − π1)2]π1π2π3

,

which can be rewritten as

Q = (π1 + π3) − [(π1 + π3)2 − 4π1π3]π1π2π3

= (π1 + π3)[1 − (π1 + π3)]π1π2π3

+ 4π1π3π1π2π3

= (π1 + π3)

π1π3+ 4

(1 − π1 − π3),

since π2 = (1 − π1 − π3).Now,

∂Q∂π1

= 0 gives (1 − π1 − π3)2 = 4π21

and∂Q∂π3

= 0 gives (1 − π1 − π3)2 = 4π23.

Since π1, π2, and π3 are positive, these two equations imply that π1 = π3. Then,from the equation (1 − π1 − π3)2 = 4π2

1, we obtain (1 − 2π1)2 = 4π21, or π1 = 1

4 .Thus, the values for π1, π2, andπ3 that minimize V(β2) subject to the constraint∑3

i=1 πi = 1 are

π1 = 14

, π2 = 12

and π3 = 14

.

Note that the Lagrange Multiplier method can also be used to obtain this answer.

Solution 4.37*

(a) For the CIs Xi ± kSi/√

n and Xi′ ± kSi′/√

n not to have at least one value in common(i.e., not to overlap), it is required that either(

Xi + kSi√

n

)<

(Xi′ − k

Si′√n

), which implies (Xi − Xi′) < −k

(Si√

n+ Si′√

n

)

Page 303: Exercises and Solutions in Biostatistical Theory (2010)

284 Estimation Theory

or

(Xi − k

Si√n

)>

(Xi′ + k

Si′√n

), which implies (Xi − Xi′) > k

(Si√

n+ Si′√

n

).

Thus, these two inequalities together can be written succinctly as the event Eii′ ={|Xi − Xi′ | > k(Si/

√n + Si′/

√n)}

, which gives the desired result.

(b) First, note that

(Si√

n+ Si′√

n

)2=(

Si√n

)2+(

Si′√n

)2+ 2

(Si√

n

)(Si′√

n

)≥(

Si√n

)2+(

Si′√n

)2,

so that(Si/

√n + Si′/

√n) ≥

√(Si/

√n)2 + (Si′/

√n)2. So, using the result from part

(a), we have

π∗ii′ = pr

[|Xi − Xi′ | > k

(Si√

n+ Si′√

n

) ∣∣Cp

]

≤ pr

⎡⎣|Xi − Xi′ | > k

√(Si√

n

)2+(

Si′√n

)2∣∣Cp

⎤⎦

= pr

⎡⎢⎢⎣

|Xi − Xi′ |√(Si√

n

)2 +(

Si′√n

)2> k∣∣Cp

⎤⎥⎥⎦ .

Now,

Z = (Xi − Xi′)√2σ2/n

∼ N(0, 1),

U = (n − 1)(S2i + S2

i′)

σ2 ∼ χ22(n−1),

and Z and U are independent random variables.So, since

Z√U/2(n − 1)

= (Xi − Xi′)√(Si√

n

)2 +(

Si′√n

)2∼ t2(n−1),

and since

pr[T2(n−1) > k] ≤ α

2,

it follows that

π∗ii′ ≤ pr

[|T2(n−1)| > k∣∣Cp] ≤ α.

Page 304: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 285

(c) For p = 3 and given condition C3, let θ3 be the conditional probability that thereare no values common to all three CIs, or equivalently, that at least two of the threeCIs have no values in common. Hence,

θ3 = pr(E12 ∪ E13 ∪ E23|C3)

≤ pr(E12|C3) + pr(E13|C3) + pr(E23|C3) = 3π∗ii′ .

Finally, from part (b), since π∗ii′ ≤ α, we obtain θ3 ≤ 3α.

Note that θ3 is the probability of incorrectly deciding statistically that the threepopulation means are not equal to the same value when, in fact, they are equalto the same value; that is, θ3 is analogous to a Type I error rate when testing thenull hypothesis H0 that all three population means are equal to the same valueversus the alternative hypothesis H1 that they are all not equal to the same value.Since 3α > α, this CI-based algorithm can lead to an inflated Type I error rate. Forexample, when α = 0.05, then this Type I error rate could theoretically be as highas 0.15. Given the stated assumptions, one-way analysis of variance would be anappropriate method for testing H0 versus H1.

Solution 4.38*

(a) We wish to choose θ0 and θ1 to minimize

Q =n∑

i=1

[Yi − (θ0 + θ1xi)]2 .

Now, the equation

∂Q∂θ0

= −2n∑

i=1

[Yi − (θ0 + θ1xi)] = 0

implies that

θ0 = Y − θ1x,

where

Y = 1n

n∑

i=1

Yi and x = 1n

n∑

i=1

xi.

And,

∂Q∂θ1

= −2n∑

i=1

xi [Yi − (θ0 + θ1xi)] = 0

implies that

n∑

i=1

xiYi = θ0

n∑

i=1

xi + θ1

n∑

i=1

x2i = (Y − θ1x)

n∑

i=1

xi + θ1

n∑

i=1

x2i .

Page 305: Exercises and Solutions in Biostatistical Theory (2010)

286 Estimation Theory

Hence,

θ1 =∑n

i=1 xiYi − Y∑n

i=1 xi∑ni=1 x2

i − x∑n

i=1 xi

=∑n

i=1(xi − x)(Yi − Y)∑ni=1(xi − x)2

=∑n

i=1(xi − x)Yi∑ni=1(xi − x)2 .

Now,

E(θ1) =∑n

i=1(xi − x)(θ0 + θ1xi)∑ni=1(xi − x)2

= θ0

∑ni=1(xi − x)∑n

i=1(xi − x)2 + θ1

∑ni=1(xi − x)xi∑ni=1(xi − x)2

= θ1.

And,

V(θ1) =∑n

i=1(xi − x)2(θ0 + θ1xi)[∑ni=1(xi − x)2

]2

= θ0∑ni=1(xi − x)2 + θ1

∑ni=1 xi(xi − x)2

[∑ni=1(xi − x)2

]2 .

Also,

E(θ0) = E(Y) − xE(θ1)

= 1n

n∑

i=1

(θ0 + θ1xi) − xθ1

= θ0 + θ1x − xθ1 = θ0.

And, since

θ0 = 1n

n∑

i=1

Yi − x∑n

i=1(xi − x)Yi∑ni=1(xi − x)2 =

n∑

i=1

ciYi,

where

ci =[

1n

− x(xi − x)∑ni=1(xi − x)2

],

Page 306: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 287

and where the {Yi} are mutually independent,

V(θ0) =n∑

i=1

c2i V(Yi) =

n∑

i=1

c2i (θ0 + θ1xi)

= θ0

n∑

i=1

c2i + θ1

n∑

i=1

c2i xi,

where ci is defined as above.

(b) The likelihood function L has the structure

L =n∏

i=1

{(θ0 + θ1xi)

yi e−(θ0+θ1xi)/yi!}

=⎧⎨⎩

n∏

i=1

(θ0 + θ1xi)yi

⎫⎬⎭{

e−∑ni=1(θ0+θ1xi)

}⎧⎨⎩

n∏

i=1

(yi!)−1

⎫⎬⎭ ;

lnL =n∑

i=1

yi ln(θ0 + θ1xi) −n∑

i=1

(θ0 + θ1xi) +n∑

i=1

ln[(yi!)−1

];

∂ ln L∂θ0

=n∑

i=1

yi(θ0 + θ1xi)

− n;

∂2 ln L∂θ2

0= −

n∑

i=1

yi

(θ0 + θ1xi)2 ;

−E

[∂2 ln L

∂θ20

]=

n∑

i=1

(θ0 + θ1xi)

(θ0 + θ1xi)2 =

n∑

i=1

(θ0 + θ1xi)−1 = A;

∂2 ln L∂θ0∂θ1

= −n∑

i=1

xiyi

(θ0 + θ1xi)2 ;

−E

[∂2 ln L∂θ0∂θ1

]=

n∑

i=1

xi(θ0 + θ1xi)−1 = B;

∂ ln L∂θ1

=n∑

i=1

xiyi(θ0 + θ1xi)

−n∑

i=1

xi;

∂2 ln L∂θ2

1= −

n∑

i=1

x2i yi

(θ0 + θ1xi)2 ;

−E

[∂2 ln L

∂θ21

]=

n∑

i=1

x2i (θ0 + θ1xi)

−1 = C.

Page 307: Exercises and Solutions in Biostatistical Theory (2010)

288 Estimation Theory

So, the expected information matrix is

I(θ0, θ1) =[

A BB C

].

For the available data, we compute I(θ0, θ1) using A, B, and C as follows:

A = 25{[2 + 4(1)]−1 + [2 + 4(2)]−1 + [2 + 4(3)]−1 + [2 + 4(4)]−1

}= 9.8425,

B = 25[

16

+ 210

+ 314

+ 418

]= 20.0800,

C = 25[

16

+ 410

+ 914

+ 1618

]= 52.4625.

So,

I(θ0, θ1) =[

9.8425 20.080020.0800 52.4625

]=[

A BB C

],

and hence

I−1(θ0, θ1) = (AC − B2)−1

[C −B

−B A

]

= [(9.8425)(52.4625) − (20.0800)2]−1[

52.4625 −20.0800−20.0800 9.8425

]

=[

0.4636 −0.1775−0.1775 0.0870

].

Now,

ψ = θ0 + (2.5)θ1,

with

V(ψ) = V(θ0) + (2.5)2V(θ1) + 2(1)(2.5)cov(θ0, θ1).

Since

ψ − ψ√V(ψ)

∼ N(0, 1)

for large n, our 95% CI for ψ is

ψ ± 1.96√

V(ψ) = [2 + (2.5)(4)] ± 1.96 [0.4636 + 6.250(0.0870) + 5(−0.1775)]1/2

= 12 ± 1.96√

0.1199

= 12 ± 0.6787

= (11.3213, 12.6787).

Page 308: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 289

Solution 4.39*

(a)

Di = (Xi − X) = Xi − 1n

n∑

i=1

Xi =(

1 − 1n

)Xi − 1

n

j =i

Xj.

Since Xi ∼ N(μ, σ2)∀i and the {Xi} are mutually independent, then Di is itselfnormal since Di is a linear combination of mutually independent normal variates.Also,

E(Di) = E(Xi − X) = E(Xi) − E(X) = μ − μ = 0,

and

V(Di) =(

1 − 1n

)2σ2 + (n − 1)

n2 σ2

=[

(n − 1)2

n2 + (n − 1)

n2

]σ2 = (n − 1)

n2 [(n − 1) + 1]σ2

=(

n − 1n

)σ2.

So,

Di ∼ N[

0,(

n − 1n

)σ2]

.

(b) Now,

cov(Di, Dj) = E(DiDj)

= E[(Xi − X)(Xj − X)]= E(XiXj) − E(XiX) − E(XjX) + E(X2)

= μ2 − E(XiX) − E(XjX) +(

σ2

n+ μ2

).

Now,

E(XiX) = E

⎡⎣Xi

⎛⎝ 1

n

n∑

i=1

Xi

⎞⎠⎤⎦

= 1n

E[X1Xi + X2Xi + · · · + Xi−1Xi + X2i + Xi+1Xi + . . . + XnXi]

= 1n

[(n − 1)μ2 + (μ2 + σ2)]

= 1n

(nμ2 + σ2)

= μ2 + σ2

n.

Page 309: Exercises and Solutions in Biostatistical Theory (2010)

290 Estimation Theory

An identical argument shows that E(XjX) = μ2 + σ2/n. So

cov(Di, Dj) = μ2 −(

μ2 + σ2

n

)−(

μ2 + σ2

n

)+(

σ2

n+ μ2

)= −σ2

n.

Finally,

corr(Di, Dj) = cov(Di, Dj)√V(Di) · V(Dj)

= −σ2/n√(n−1

n

)σ2 ·

(n−1

n

)σ2

= −1(n − 1)

.

Clearly,lim

n→∞[corr(Di, Dj)] = 0.

Since V(X) → 0 as n → +∞, corr(Di, Dj) → corr(Xi, Xj) = 0 as n → +∞.

(c)

R = S2x

S2y

= (n − 1)S2x/σ2

(n − 1)S2y/σ2

= Ux

Uy,

where Ux ∼ χ2n−1, Uy ∼ χ2

n−1, and Ux and Uy are independent.So,

fUx ,Uy (ux, uy) = u[(n−1)/2]−1x e−ux/2

Γ[(n − 1)/2] · 2[(n−1)/2] · u[(n−1)/2]−1y e−uy/2

Γ[(n − 1)/2] · 2[(n−1)/2] ,

ux > 0, uy > 0.

Let R = Ux/Uy and S = Uy, so that Ux = RS and Uy = S; and, R > 0, S > 0.Also,

J =

∣∣∣∣∣∣∣∣

∂Ux

∂R∂Ux

∂S∂Uy

∂R

∂Uy

∂S

∣∣∣∣∣∣∣∣=∣∣∣∣S R0 1

∣∣∣∣ = S.

So,

fR,S(r, s) = fUx ,Uy (rs, s) × | J|

= (rs)[(n−3)/2]e−rs/2

Γ[(n − 1)/2] · 2[(n−1)/2] · s[(n−3)/2]e−s/2

Γ[(n − 1)/2] · 2[(n−1)/2] · s

= r[(n−3)/2]s(n−2)e−(1+r)s/2

[Γ[(n − 1)/2]]2 · 2(n−1), r > 0, s > 0.

Page 310: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 291

So,

fR(r) = r[(n−3)/2][Γ[(n − 1)/2]]2 · 2(n−1)

∫∞0

s(n−1)−1e−s/[ 2(1+r) ] ds

= r[(n−3)/2][Γ[(n − 1)/2]]2 · 2(n−1)

· Γ(n − 1)

[2

(1 + r)

](n−1)

= [Γ(n − 1)][Γ

(n − 1

2

)]−2r[(n−3)/2](1 + r)−(n−1), r > 0.

(d) Clearly,

E(μ1) = 12[E(X) + E(Y)] = (μ + μ)

2= μ.

And,

E(μ2) = E

[XS2

y + YS2x

S2x + S2

y

]= E

[X + RY(1 + R)

],

where

R = S2x/S2

y ∼ F(n−1),(n−1).

Now, since X, Y, S2x, and S2

y are mutually independent random variables,

E(μ2) = Er{E(μ2|R = r)} = Er

{E(X|R = r) + rE(Y|R = r)

(1 + r)

}

= Er

{E(X) + rE(Y)

(1 + r)

}

= Er

{μ + rμ(1 + r)

}

= Er

{μ(1 + r)(1 + r)

}

= Er(μ) = μ.

Page 311: Exercises and Solutions in Biostatistical Theory (2010)

292 Estimation Theory

(e)

V(μ1) = V[

12(X + Y)

]= 1

4[V(X) + V(Y)]

= 14

[σ2

n+ σ2

n

]= σ2

2n.

V(μ2) = Vr{E(μ2|R = r)} + Er{V(μ2)|R = r}

= Vr(μ) + Er

{V

[X + RY(1 + R)

∣∣∣∣R = r

]}

= 0 + Er

{V(X|R = r) + r2V(Y|R = r)

(1 + r)2

}

= Er

{V(X) + r2V(Y)

(1 + r)2

}

= Er

{(σ2/n) + r2(σ2/n)

(1 + r)2

}

= σ2

nEr

[(1 + r2)

(1 + r)2

].

So, to find V(μ2), we need to find E[(1 + R2)/(1 + R)2

], where fR(r) is given in

part (c). So,

E

[(1 + R2)

(1 + R)2

]= E

[1 − 2

R(1 + R)2

]= 1 − 2E

[R

(1 + R)2

].

Now, with u = r/(1 + r),

E[

R(1 + R)2

]=

∫∞0

r(1 + r)2

Γ(n − 1)

[Γ[(n − 1)/2]]2 r[(n−3)/2](1 + r)−(n−1) dr

= Γ(n − 1)

[Γ[(n − 1)/2]]2∫∞

0

r[(n−1)/2](1 + r)−(n−1)

(1 + r)2 dr

= Γ(n − 1)

[Γ(n − 1/2)]2∫1

0u[(n+1)/2]−1(1 − u)[(n+1)/2]−1 du

= Γ(n − 1)

[Γ[(n − 1)/2]]2 · [Γ[(n + 1)/2]]2Γ(n + 1)

= [(n − 1)/2]2n(n − 1)

= (n − 1)

4n.

Page 312: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 293

Finally,

V(μ2) = σ2

n

{1 − 2E

[R

(1 + R)2

]}

= σ2

n

{1 − 2

[(n − 1)

4n

]}

= σ2

n

[1 − (n − 1)

2n

]

= σ2

n

(2n − n + 1

2n

)

=(

n + 12n2

)σ2.

Since V(μ1) = (1/2n) σ2 < V(μ2) =[(n + 1)/2n2

]σ2 when n > 1, we prefer μ1

(a result which actually follows from the theory of sufficiency).

Solution 4.40*

(a) Now,E(μ1) = E(k1Y) = k1μ,

so that k1 = 1. Then,

V(μ1) = V(Y) = (θ2/12)

n= θ2

12n.

(b) Since

fY(n)(y(n); θ) = n

[y(n)

θ

]n−1θ−1 = nθ−nyn−1

(n), 0 < y(n) < θ,

we have, for r ≥ 0,

E[Yr(n)] = nθ−n

∫θ

0y(n+r)−1(n)

dy(n) =(

nn + r

)θr .

So,

E[Y(n)] =(

nn + 1

)θ.

Thus,

E(μ2) = E[k2Y(n)]

= k2

(n

n + 1

= 2k2

(n

n + 1

)μ,

Page 313: Exercises and Solutions in Biostatistical Theory (2010)

294 Estimation Theory

so that k2 = (n + 1)/2n. Since

V[Y(n)] = E[Y2(n)] − {E[Y(n)]

}2

=(

nn + 2

)θ2 −

(n

n + 1

)2θ2

= nθ2

(n + 1)2(n + 2),

it follows that

V(μ2) = V(k2Y(n))

= k22V(Y(n))

= (n + 1)2

4n2 · nθ2

(n + 1)2(n + 2)

= θ2

4n(n + 2).

(c) Since

fY(1),Y(n)(y(1), y(n); θ) = n(n − 1)θ−n(y(n) − y(1))

n−2, 0 < y(1) < y(n) < θ,

we have, for r ≥ 0 and s ≥ 0,

E[Yr

(1)Ys(n)

]= n(n − 1)θ−n

∫θ

0

∫y(n)

0yr(1)y

s(n)(y(n) − y(1))

n−2 dy(1) dy(n)

= n(n − 1)θ−n∫θ

0ys(n)

[∫y(n)

0yr(1)(y(n) − y(1))

n−2 dy(1)

]dy(n)

= n(n − 1)θ−n∫θ

0ys(n)

[∫1

0(y(n)u)r(y(n) − y(n)u)n−2y(n) du

]dy(n)

= n(n − 1)θ−n∫θ

0y(n+r+s)−1(n)

[∫1

0u(r+1)−1(1 − u)(n−1)−1 du

]dy(n)

= n(n − 1)

(n + r + s)· Γ(r + 1)Γ(n − 1)

Γ(n + r)θ(r+s).

Since

E[Y(1)] = n(n − 1)

(n + 1 + 0)· Γ(1 + 1)Γ(n − 1)

Γ(n + 1)θ(1+0) = θ

(n + 1),

Page 314: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 295

it follows that

E(μ3) = k32

[E(Y(1)) + E(Y(n))]

= k32

(n + 1)+ nθ

(n + 1)

]

= k3

2

)= k3μ,

so that k3 = 1. Since V[Y(1)] = V[Y(n)], by symmetry, and since

cov [Y(1), Y(n)] = n(n − 1)

(n + 1 + 1)· Γ(1 + 1)Γ(n − 1)

Γ(n + 1)θ(1+1)

−[

θ

(n + 1)

] [nθ

(n + 1)

]

= θ2

(n + 2)− nθ2

(n + 1)2

= θ2

(n + 1)2(n + 2),

it follows that

V(μ3) = V(

12

[Y(1) + Y(n)

])

= 14[V(Y(1)) + V(Y(n)) + 2cov(Y(1), Y(n))]

= 12

[nθ2

(n + 1)2(n + 2)+ θ2

(n + 1)2(n + 2)

]

= θ2

2(n + 1)(n + 2).

(d) We have shown that

V(μ1) = θ2

12n,

V(μ2) = θ2

4n(n + 2),

V(μ3) = θ2

2(n + 1)(n + 2).

Now,

4n(n + 2) − 12n = 4n2 − 4n = 4n(n − 1) > 0 for n > 1,

4n(n + 2) − 2(n + 1)(n + 2) = 2(n + 2)(n − 1) > 0 for n > 1,

Page 315: Exercises and Solutions in Biostatistical Theory (2010)

296 Estimation Theory

and

2(n + 1)(n + 2) − 12n = 2(n − 1)(n − 2) > 0 for n > 2.

So, for n > 2, we haveV(μ2) < V(μ3) < V(μ1).

Now,

limn→∞

V(μ2)

V(μ1)= lim

n→∞V(μ3)

V(μ1)= 0,

so that μ1 has an asymptotic efficiency of 0 relative to μ2 and μ3.Since

limn→∞

V(μ2)

V(μ3)= lim

n→∞2(n + 1)(n + 2)

4n(n + 2)= 1

2,

this implies that μ3 is asymptotically 50% as efficient as μ2. That μ2 is theestimator of choice should not be surprising since Y(n) is a (complete) sufficientstatistic for θ (and hence for μ), and so μ2 is the minimum variance unbiasedestimator (MVUE) of μ.

Solution 4.41*

(a) The likelihood function L has the structure

L =n∏

i=1

{[θxi (1 − θ)1−xi

] [ 1μ(xi)

]e−yi/μ(xi)

}

= θ∑n

i=1 xi (1 − θ)n−∑ni=1 xi e−∑n

i=1(α+βxi)e−∑ni=1 e−(α+βxi)yi .

Since∑n

i=1 xi = n1, we have

ln L = n1 ln θ + n0 ln(1 − θ) − nα − βn1 −n∑

i=1

e−(α+βxi)yi.

So,

∂ ln L∂θ

= n1θ

− n0(1 − θ)

= 0

⇒ n1(1 − θ) − n0θ = 0

⇒ θ = n1/(n0 + n1) = n1/n = x.

And,

∂ ln L∂α

= −n +n∑

i=1

e−(α+βxi)yi = 0

⇒ −n + e−α[e−βn1y1 + n0y0

]= 0. (4.10)

Page 316: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 297

Also,

∂ ln L∂β

= −n1 +n∑

i=1

e−(α+βxi)xiyi = 0

⇒ −n1 + e−αe−βn1y1 = 0 ⇒ e−(α+β)y1 = 1. (4.11)

So, using (Equation 4.11) in (Equation 4.10), we obtain

−n + n1 + e−αn0y0 = 0 ⇒ e−α = n − n1n0y0

= n0n0y0

= 1y0

so that

−α = ln(1/y0

), or α = ln(y0).

Finally, since α = ln(y0), it follows from (Equation 4.11) that e−(α+β)y1 = 1, or

eβ = y1/eα, or β = ln(y1/y0

).

So, in summary,

θ = x, α = ln(y0), and β = ln(

y1y0

).

(b) Now,

∂2 ln L∂θ2 = −n1

θ2 − n0(1 − θ)2 ,

so that

−E

(∂2 ln L

∂θ2

)= E(n1)

θ2 + E(n0)

(1 − θ)2

= nθ

θ2 + n(1 − θ)

(1 − θ)2 = nθ

+ n(1 − θ)

= nθ(1 − θ)

.

Clearly,

∂2 ln L∂θ∂α

= ∂2lnL∂θ∂β

= 0.

Now, with X = (X1, X2, . . . , Xn) and x = (x1, x2, . . . , xn), we have

∂2 ln L∂α2 = −e−α

n∑

i=1

e−βxi yi,

Page 317: Exercises and Solutions in Biostatistical Theory (2010)

298 Estimation Theory

so that

−E

(∂2 ln L

∂α2

)= −Ex

{E

[∂2 ln L

∂α2

∣∣∣∣X = x

]}

= e−αEx

⎡⎣

n∑

i=1

e−βxi eα+βxi

⎤⎦ = n.

And,

∂2 ln L∂β2 = −e−α

n∑

i=1

e−βxi x2i yi,

so that

−E

(∂2 ln L

∂β2

)= −Ex

{E

[∂2 ln L

∂β2

∣∣∣∣X = x

]}

= e−αEx

⎡⎣

n∑

i=1

e−βxi x2i eα+βxi

⎤⎦

=n∑

i=1

Ex(

x2i

)=

n∑

i=1

[θ(1 − θ) + θ2

]= nθ.

Finally,

∂2 ln L∂α∂β

= −e−αn∑

i=1

e−(α+βxi)xiyi,

so that

−E

(∂2 ln L∂α∂β

)= −Ex

{E

[(∂2 ln L∂α∂β

) ∣∣∣∣X = x

]}

= e−αEx

⎡⎣

n∑

i=1

e−βxi xieα+βxi

⎤⎦ =

n∑

i=1

Ex(xi) = nθ.

So,

I =⎡⎢⎣

n nθ 0nθ nθ 0

0 0n

θ(1 − θ)

⎤⎥⎦ ,

Page 318: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 299

and

I−1 =

⎡⎢⎢⎢⎢⎢⎣

1n(1 − θ)

−1n(1 − θ)

0

−1n(1 − θ)

1nθ(1 − θ)

0

0 0θ(1 − θ)

n

⎤⎥⎥⎥⎥⎥⎦

.

So,

V(α) ≈ 1

n(1 − θ), V(β)≈ 1

nθ(1 − θ), cov(α, β)≈ −1

n(1 − θ),

cov(α, θ) = cov(β, θ) = 0, and V(θ) = θ(1 − θ)

n.

(c) The parameter of interest is β, and statistical evidence that β = 0 suggests thatthe true mean time to death differs between adult males with advanced malignantmelanoma depending on whether or not these adult males have a family historyof skin cancer. An appropriate large-sample 95% CI for β is (for large n):

β ± 1.96

√1

nθ(1 − θ).

With n = 50, θ = 0.60 and β = 0.40, we obtain

0.40 ± 1.96

√1

50(0.60)(1 − 0.60)or (−0.1658, 0.9658).

So, these data provide no evidence that β = 0, since 0 is contained in the computedCI.

Solution 4.42*

(a) Note that

θx = B0 + B1x = 1N

N∑

i=1

Yxi + xN∑

i=1

⎛⎝ xi∑N

j=1 x2j

⎞⎠Yxi

=N∑

i=1

⎡⎣ 1

N+ xxi∑N

j=1 x2j

⎤⎦Yxi =

N∑

i=1

ciYxi ,

where

ci =⎡⎣ 1

N+ xxi∑N

j=1 x2j

⎤⎦ , i = 1, 2, . . . , N.

Page 319: Exercises and Solutions in Biostatistical Theory (2010)

300 Estimation Theory

So,

E(θx) =N∑

i=1

ciE(Yxi ) =N∑

i=1

ci(β0 + β1xi + β2x2i )

= β0

N∑

i=1

ci + β1

N∑

i=1

cixi + β2

N∑

i=1

cix2i .

Now,N∑

i=1

ci =N∑

i=1

⎡⎣ 1

N+ xxi∑N

j=1 x2j

⎤⎦ = 1 +

⎛⎝ x∑N

j=1 x2j

⎞⎠

N∑

i=1

xi = 1,

since μ1 = 0. Also,

N∑

i=1

cixi =N∑

i=1

⎡⎣ 1

N+ xxi∑N

j=1 x2j

⎤⎦ xi = x,

since μ1 = 0. Finally,

N∑

i=1

cix2i =

⎡⎣ 1

N+ xxi∑N

j=1 x2j

⎤⎦ x2

i = μ2,

since μ3 = 0. So, E(θx) = β0 + β1x + β2μ2, where μ2 = N−1∑Ni=1 x2

i .And, finally,

V(θx) =N∑

i=1

c2i V(Yxi ) = σ2

N∑

i=1

⎡⎣ 1

N+ xxi∑N

j=1 x2j

⎤⎦

2

= σ2N∑

i=1

⎡⎢⎣ 1

N2 + 2xxi

N∑N

j=1 x2j

+ x2x2i(∑N

j=1 x2j

)2

⎤⎥⎦

= σ2

N+ 0 + σ2x2

⎡⎢⎣

N∑

i=1

x2i

/⎛⎝

N∑

i=1

x2i

⎞⎠

2⎤⎥⎦ = σ2

N

[1 + x2

μ2

].

Finally, θx is normally distributed since it is a linear combination of mutuallyindependent and normally distributed random variables.

(b) Since E(θx) = β0 + β1x + β2μ2, it follows that E(θx) − θx = β2(μ2 − x2). So,

Q =∫1

−1

[β2(μ2 − x2)

]2dx = β2

2

∫1

−1(μ2

2 − 2μ2x2 + x4) dx

= 2β22

[(μ2 − 1

3

)2+ 4

45

],

Page 320: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 301

which is minimized when μ2 = 13 . So, an optimal design for minimizing the inte-

grated squared bias Q chooses the temperature spacings x1, x2, . . . , xN such thatμ2 = (1/N)

∑Ni=1 x2

i = 13 , given that μ1 = μ3 = 0. Note that μ1 = μ3 = 0 will be

satisfied if the xi are chosen to be symmetric about zero. For example, when N = 4,we can choose x1 = −x4 and x2 = −x3. Then, to satisfy μ2 = 2(x2

1 + x22) = 1/3

we can choose x1 to be any number in the interval (0,√

16 ) and then choose

x2 =√

16 − x2

1. For example, x1 =√

218 , x2 =

√118 , x3 = −

√118 , x4 = −

√218 .

Solution 4.43*

(a)

cov(Yi0, Yi1) = E(Yi0Yi1) − E(Yi0)E(Yi1)

= Eαi [E(Yi0Yi1|αi)] − Eαi [E(Yi0|αi)]Eαi [E(Yi1|αi)].

Now,

E(Yij|αi) = Lije(αi+βDij+

∑pl=1 γlCil).

So, since αi ∼ N(0, σ2α), it follows from moment generating function theory that

E(etαi ) = et2σ2α/2, −∞ < t < +∞.

Thus,

E(Yij) = Lije(0.50σ2

α+βDij+∑p

l=1 γlCil).

And, using the assumption that Yi0 and Yi1 are independent given αi fixed, wehave

E(Yi0Yi1) = Eαi [E(Yi0Yi1|αi)] = Eαi [E(Yi0|αi)E(Yi1|αi)]

= Eαi

[(Li0e(αi+

∑pl=1 γlCil))(Li1e(αi+β+∑p

l=1 γlCil))

]

= Li0Li1e(2σ2α+β+2

∑pl=1 γlCil).

Thus,

cov(Yi0, Yi1) = Li0Li1e(σ2α+β+2

∑pl=1 γlCil)

(eσ2

α − 1)

.

The inclusion of the random effect αi in the proposed statistical model serves twopurposes: (1) to allow for families to have different (baseline) tendencies towardchild abuse, and (2) to account for the anticipated positive correlation betweenYi0 and Yi1 for the ith family [in particular, note that cov(Yi0, Yi1) = 0 only whenσ2α = 0 and is positive when σ2

α > 0].

Page 321: Exercises and Solutions in Biostatistical Theory (2010)

302 Estimation Theory

(b) Now, pYi1(yi1|Yi = yi, αi) = pr(Yi1 = yi1|Yi = yi, αi)

= pr[(Yi1 = yi1) ∩ (Yi = yi)|αi]pr(Yi = yi|αi)

= pr[(Yi1 = yi1) ∩ (Yi0 = yi − yi1)|αi]pr(Yi = yi|αi)

= pr(Yi1 = yi1|αi)pr[Yi0 = (yi − yi1)|αi]pr(Yi = yi|αi)

=[

(Li1λi1)yi1 e−Li1λi1yi1!

] [(Li0λi0)

(yi−yi1)e−Li0λi0(yi−yi1)!

][

(Li0λi0+Li1λi1)yi e−(Li0λi0+Li1λi1)

yi!]

= Cyiyi1

(Li1θ

Li0 + Li1θ

)yi1(

Li0Li0 + Li1θ

)(yi−yi1)

, yi1 = 0, 1, . . . , yi,

where

θ = λi1λi0

= e(αi+β+∑pl=1 γlCil)

e(αi+∑p

l=1 γlCil)= eβ.

(c) Since

L =n∏

i=1

Cyiyi1

(Li1θ

Li0 + Li1θ

)yi1(

Li0Li0 + Li1θ

)(yi−yi1)

,

it follows that

ln(L) ∝n∑

i=1

[yi1 ln(θ) − yi ln(Li0 + Li1θ)].

Thus,

∂ ln(L)

∂θ=

n∑

i=1

[yi1θ

− yiLi1(Li0 + Li1θ)

]= 0,

so that the MLE θ of θ satisfies the equation

θ

n∑

i=1

(yiLi1

Li0 + Li1θ

)=

n∑

i=1

yi1.

(d) Since

∂2 ln(L)

∂θ2 =n∑

i=1

[−yi1θ2 + yiL2

i1(Li0 + Li1θ)2

],

Page 322: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 303

it follows that

−E

(∂2 ln(L)

∂θ2 |{yi}, {αi})

=n∑

i=1

⎡⎣yi

(Li1θ

Li0+Li1θ

)

θ2 − yiL2i1

(Li0 + Li1θ)2

⎤⎦

= 1θ

n∑

i=1

yiLi0Li1(Li0 + Li1θ)2 .

Thus, a large-sample 95% CI for the rate ratio parameter θ is

θ ± 1.96√

θ

⎡⎣

n∑

i=1

yiLi0Li1

(Li0 + Li1θ)2

⎤⎦

−1/2

.

For an important application of this methodology, see Gibbs et al. (2007).

Solution 4.44*. With X∗ = (X∗1 , X∗

2 , . . . , X∗n) and x∗ = (x∗

1, x∗2, . . . , x∗

n), we have

E(β∗1|X∗ = x∗) =

∑ni=1(x∗

i − x∗)E(Yi|X∗ = x∗)∑ni=1(x∗

i − x∗)2 .

Now, using the nondifferential measurement error assumption, we have, fori = 1, 2, . . . , n,

E(Yi|X∗ = x∗) = E(Yi|X∗i = x∗

i ) = EXi|X∗i =x∗

i

[E(Yi|X∗

i = x∗i , Xi = xi)

]

= EXi|X∗i =x∗

i[E(Yi|Xi = xi)] = EXi|X∗

i =x∗i(β0 + β1xi)

= β0 + β1E(Xi|X∗i = x∗

i ).

And, since (Xi, X∗i = Xi + Ui) ∼ BVN[μx, μx, σ2

x, (σ2x + σ2

u), ρ], where

ρ = cov(Xi, X∗i )√

V(Xi)V(X∗i )

= cov(Xi, Xi + Ui)√σ2

x(σ2x + σ2

u)

= V(Xi)√σ2

x(σ2x + σ2

u)

= σ2x√

σ2x(σ2

x + σ2u)

= 1√1 + σ2

u

σ2x

= 1√1 + λ

,

Page 323: Exercises and Solutions in Biostatistical Theory (2010)

304 Estimation Theory

it follows from bivariate normal distribution theory that

E(Xi|X∗i = x∗

i ) = μx + ρ

√V(Xi)

V(X∗i )

(x∗i − μx)

= μx +(

1√1 + λ

)√σ2

x

(σ2x + σ2

u)(x∗

i − μx)

= μx +(

11 + λ

)(x∗

i − μx).

Hence, we have

E(Yi|X∗ = x∗) = β0 + β1

[μx +

(1

1 + λ

)(x∗

i − μx)

]= β∗

0 + β∗1x∗

i ,

where

β∗0 = β0 + β1μx

1 + λ

)and β∗

1 = β1(1 + λ)

.

Finally, we obtain

E(β∗1|X∗ = x∗) =

∑ni=1(x∗

i − x∗)(β∗0 + β∗

1x∗i )∑n

i=1(x∗i − x∗)2

= β∗1 = β1

(1 + λ).

So, since 0 < λ < ∞, |β∗1| < |β1|, indicating a somewhat common detrimental mea-

surement error effect called attenuation. Because the predictor variable is measuredwith error, the estimator β∗

1 tends, on average, to underestimate the true slope β1 (i.e.,the estimator β∗

1 is said to be attenuated). As λ = σ2u/σ2

x increases in value, the amountof attenuation increases.

For more complicated measurement error scenarios, attenuation should not alwaysbe the anticipated measurement error effect; in particular, an estimator could actuallyhave a tendency to overestimate a particular parameter of interest [e.g., see Kupper(1984)].

Solution 4.45*

(a) First,

cov(X, X∗|C) = cov(α0 + α1X∗ + δ′C + U, X∗|C) = α1V(X∗|C).

And,V(X|C) = α2

1V(X∗|C) + σ2u.

So,

corr(X, X∗|C) = cov(X, X∗|C)√V(X|C)V(X∗|C)

Page 324: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 305

= α1V(X∗|C)√[α2

1V(X∗|C) + σ2u]V(X∗|C)

= 1√1 + σ2

u/[α21V(X∗|C)]

< 1.

(b) With X = α0 + α1X∗ + δ′C + U and given X∗ and C, it follows directly that X has anormal distribution with E(X|X∗, C) = α0 + α1X∗ + δ′C and V(X|X∗, C) = V(U) =σ2

u.

(c) Now, appealing to the nondifferential error assumption, we have

pr(Y = 1|X∗, C) = E(Y|X∗, C) = EX|X∗,C[E(Y|X, X∗, C)

]

= EX|X∗,C [E(Y|X, C)] = EX|X∗,C[pr(Y = 1|X, C)

]

= EX|X∗,C

[e(β0+β1X+γ′C)

]= e(β0+γ′C)EX|X∗,C

(eβ1X

).

Thus, from moment generating theory and the results in part (b), we have

pr(Y = 1|X∗, C) = e(β0+γ′C)e[β1(α0+α1X∗+δ′C)]+(β21σ

2u)/2

= e(θ0+θ1X∗+ξ′C),

where θ0 = (β0 + β1α0 + (β21σ2

u)/2), θ1 = β1α1, and ξ′ = (γ′ + β1δ′).Since θ1/β1 = α1, it follows that 0 < θ1 ≤ β1 when 0 < α1 ≤ 1 and that θ1 > β1

when α1 > 1. So, the use of X∗ instead of X will result in biased estimation of theparameter β1. In particular, if 0 < α1 < 1, the tendency will be to underestimateβ1; and, if α1 > 1, the tendency will be to overestimate β1.

Solution 4.46*

(a) First, E(X) = δ and V(X) = δ(1 − δ). So,

E(X∗) = E[(X∗)2] = E[E(X∗|X = x)

] = E[pr(X∗ = 1|X = x)

]

= E(πx1) = π11δ + π01(1 − δ),

and

V(X∗) = E(X∗)[1 − E(X∗)

].

Also,

E(XX∗) = E[E(XX∗|X = x)

] = E[(x)pr(X∗ = 1|X = x)

]

= E [(x)πx1] = (1)π11δ = π11δ.

Page 325: Exercises and Solutions in Biostatistical Theory (2010)

306 Estimation Theory

Thus,

corr(X, X∗) = cov(X, X∗)√V(X)V(X∗)

= E(XX∗) − E(X)E(X∗)√V(X)V(X∗)

= π11δ − δE(X∗)√δ(1 − δ)E(X∗) [1 − E(X∗)]

.

So, corr(X, X∗) = 1 when π11 = 1 and π01 = 0 (or, equivalently, when π10 = 0and π00 = 1), since then E(X∗) = δ. When π11 = pr(X∗ = 1|X = 1) < 1 and/orπ01 = pr(X∗ = 1|X = 0) > 0, so that corr(X, X∗) < 1, then X∗ is an imperfectsurrogate for X.

(b) Now, using the nondifferential error assumption given earlier, we have

μ∗x∗ = pr(Y = 1|X∗ = x∗) =

1∑

x=0

pr[(Y = 1) ∩ (X = x)|X∗ = x∗]

=1∑

x=0

pr(Y = 1|(X = x) ∩ (X∗ = x∗)]pr(X = x|X∗ = x∗)

=1∑

x=0

pr(Y = 1|X = x)pr(X = x|X∗ = x∗)

=1∑

x=0

μxγx∗x,

where γx∗x = pr(X = x|X∗ = x∗).So, since (γ00 + γ01) = (γ10 + γ11) = 1, we have

θ∗ = (μ∗1 − μ∗

0) =1∑

x=0

μxγ1x −1∑

x=0

μxγ0x

= (μ0γ10 + μ1γ11) − (μ0γ00 + μ1γ01)

= μ0(1 − γ11) + μ1γ11 − μ0(1 − γ01) − μ1γ01

= (μ1 − μ0)(γ11 − γ01)

= θ(γ11 − γ01),

so that |θ∗| ≤ |θ| since |γ11 − γ01| ≤ 1.Hence, under the assumption of nondifferential error, the use of X∗ instead of

X tends, on average, to lead to underestimation of the risk difference parameter θ,a phenomenon known as attenuation.

For more detailed discussion about the effects of misclassification error on thevalidity of analyses of epidemiologic data, see Gustafson (2004) and Kleinbaumet al. (1982).

Page 326: Exercises and Solutions in Biostatistical Theory (2010)

5Hypothesis Testing Theory

5.1 Concepts and Notation

5.1.1 Basic Principles

5.1.1.1 Simple and Composite Hypotheses

A statistical hypothesis is an assertion about the distribution of one or morerandom variables. If the statistical hypothesis completely specifies the dis-tribution (i.e., the hypothesis assigns numerical values to all unknownpopulation parameters), then it is called a simple hypothesis; otherwise, itis called a composite hypothesis.

5.1.1.2 Null and Alternative Hypotheses

In the typical statistical hypothesis testing situation, there are two hypothesesof interest: the null hypothesis (denoted H0) and the alternative hypothesis(denoted H1). The statistical objective is to use the information in a samplefrom the distribution under study to make a decision about whether H0 orH1 is more likely to be true (i.e., is more likely to represent the true “state ofnature”).

5.1.1.3 Statistical Tests

Astatistical test of H0 versus H1 consists of a rule which, when operationalizedusing the available information in a sample, leads to a decision either to reject,or not to reject, H0 in favor of H1. It is important to point out that a decision notto reject H0 does not imply that H0 is, in fact, true; in particular, the decisionnot to reject H0 is often due to data inadequacies (e.g., too small a sample size,erroneous and/or missing information, etc.)

5.1.1.4 Type I and Type II Errors

For any statistical test, there are two possible decision errors that can be made.A “Type I” error occurs when the decision is made to reject H0 in favor of

307

Page 327: Exercises and Solutions in Biostatistical Theory (2010)

308 Hypothesis Testing Theory

H1 when, in fact, H0 is true; the probability of a Type I error is denoted asα = pr(test rejects H0|H0 true). A “Type II” error occurs when the decision ismade not to reject H0 when, in fact, H0 is false and H1 is true; the probabilityof a Type II error is denoted as β = pr(test does not reject H0|H0 false).

5.1.1.5 Power

The power of a statistical test is the probability of rejecting H0 when, in fact,H0 is false and H1 is true; in particular,

POWER = pr(test rejects H0|H0 false) = (1 − β).

Type I error rateα is controllable and is typically assigned a value satisfying theinequality 0 < α ≤ 0.10. For a given value of α, Type II error rate β, and hencethe power (1 − β), will generally vary as a function of the values of populationparameters allowable under a composite alternative hypothesis H1.

In general, for a specified value of α, the power of any reasonable statisticaltesting procedure should increase as the sample size increases. Power is typi-cally used as a very important criterion for choosing among several statisticaltesting procedures in any given situation.

5.1.1.6 Test Statistics and Rejection Regions

A statistical test of H0 versus H1 is typically carried out by using a test statistic.A test statistic is a random variable with the following properties: (i) its dis-tribution, assuming the null hypothesis H0 is true, is known either exactly orto a close approximation (i.e., for large sample sizes); (ii) its numerical valuecan be computed using the information in a sample; and, (iii) its computednumerical value leads to a decision either to reject, or not to reject, H0 in favorof H1. More specifically, for a given statistical test and associated test statistic,the set of all possible numerical values of the test statistic under H0 is dividedinto two disjoint subsets (or “regions”), the rejection region R and the non-rejection region R. The statistical test decision rule is then defined as follows:if the computed numerical value of the test statistic is in the rejection regionR, then reject H0 in favor of H1; otherwise, do not reject H0. The rejectionregion R is chosen so that, under H0, the probability that the test statistic fallsin the rejection region R is equal to (or approximately equal to) α (in whichcase the rejection region and the associated statistical test are both said to beof “size” α).

Almost all popular statistical testing procedures use test statistics that,under H0, follow (either exactly or approximately) well-tabulated distri-butions such as the standard normal distribution, the t-distribution, thechi-squared distribution, and the f-distribution.

Page 328: Exercises and Solutions in Biostatistical Theory (2010)

Concepts and Notation 309

5.1.1.7 P-Values

The P-value for a statistical test is the probability of observing a test statisticvalue at least as rare as the value actually observed under the assumptionthat the null hypothesis H0 is true. Thus, for a size α test, when the decisionis made to reject H0, then the P-value is less than α; and, when the decision ismade not to reject H0, then the P-value is greater than α.

5.1.2 Most Powerful (MP) and Uniformly Most Powerful (UMP) Tests

Let X = (X1, X2, . . . , Xn) be a random row vector with likelihood function (orjoint distribution) L(x; θ) depending on a row vector θ = (θ1, θ2, . . . , θp) of punknown parameters. LetRdenote some subset of all the possible realizationsx = (x1, x2, . . . , xn) of the random vector X . Then, R is the most powerful (orMP) rejection region of size α for testing the simple null hypothesis H0 : θ = θ0versus the simple alternative hypothesis H1 : θ = θ1 if, for every subset A ofall possible realizations x of X for which pr(X ∈ A|H0 : θ = θ0) = α, we have

pr(X ∈ R|H0 : θ = θ0) = α

and

pr(X ∈ R|H1 : θ = θ1) ≥ pr(X ∈ A|H1 : θ = θ1).

Given L(x; θ), the determination of the structure of the MP rejection regionR of size α for testing H0 : θ = θ0 versus H1 : θ = θ1 can be made using theNeyman–Pearson Lemma (Neyman and Pearson, 1933).

Neyman–Pearson Lemma

Let X = (X1, X2, . . . , Xn) be a random row vector with likelihood function(or joint distribution) of known form L(x; θ) that depends on a row vectorθ = (θ1, θ2, . . . , θp) of p unknown parameters. Let R be a subset of all possi-ble realizations x = (x1, x2, . . . , xn) of X . Then, R is the most powerful (MP)rejection region of size α (and the associated test using R is the most pow-erful test of size α) for testing the simple null hypothesis H0 : θ = θ0 versusthe simple alternative hypothesis H1 : θ = θ1 if, for some k > 0, the followingthree conditions are satisfied:

L(x; θ0)

L(x; θ1)< k for every x ∈ R;

L(x; θ0)

L(x; θ1)≥ k for every x ∈ R;

Page 329: Exercises and Solutions in Biostatistical Theory (2010)

310 Hypothesis Testing Theory

and

pr(X ∈ R|H0 : θ = θ0) = α.

A rejection region R is a uniformly most powerful (UMP) rejection rejection ofsize α (and the associated test using R is a uniformly most powerful test ofsize α) for testing a simple null hypothesis H0 versus a composite alternativehypothesis H1 if the region R is a most powerful region of size α for everysimple alternative hypothesis contained in H1.

5.1.2.1 Review of Notation

In the subsections to follow, we will utilize the following quantities, whichwere introduced in Section 4.1.

θ = (θ1, θ2, . . . , θp), the MLE of θ = (θ1, θ2, . . . , θp) based on the likelihoodL(x; θ);

I(θ), the estimated expected information matrix based on the likelihoodL(x; θ);

I−1(θ), the estimated large-sample covariance matrix of θ based onexpected information for the likelihood L(x; θ);

I(x; θ), the estimated observed information matrix based on the likeli-hood L(x; θ);

I−1(x; θ), the estimated large-sample covariance matrix of θ based onobserved information for the likelihood L(x; θ).

5.1.3 Large-Sample ML-Based Methods for Testing the Simple NullHypothesis H0 : θ = θ0 (i.e., θ ∈ ω) versus the Composite AlternativeHypothesis H1 : θ ∈ ω

In general, a null hypothesis places a set of restrictions on the unrestrictedparameter space Ω, where Ω is the set of all possible values of the parametervector θ. Let ω denote the restricted parameter space, where ω ⊂ Ω. Then, forthe simple null hypothesis H0 : θ = θ0, it follows that ω = {θ : θ = θ0}, andΩ = ω ∪ ω, where ω is the complement of ω.

5.1.3.1 Likelihood Ratio Test

The likelihood ratio test statistic λ, 0 < λ < 1, for testing H0 : θ = θ0 (i.e., θ ∈ ω)versus H1 : θ ∈ ω is defined as

λ =maxθ∈ω

L(x; θ)

maxθ∈Ω

L(x; θ)= L(x; θ0)

L(x; θ)≡ Lω

.

Page 330: Exercises and Solutions in Biostatistical Theory (2010)

Concepts and Notation 311

Clearly, small values of λ favor H1, and a size α likelihood ratio testof H0 versus H1 using λ rejects H0 in favor of H1 when λ < kα, wherepr(λ < kα|H0) = α.

Since the exact distribution of λ is often difficult to determine (either underH0 or under H1), the following large-sample approximation is typically used(Neyman and Pearson, 1928).

Under certain regularity conditions, for large n and under H0 : θ = θ0,

−2 ln λ = 2[ln L(x; θ) − ln L(x; θ0)

]∼χ2

p.

Thus, for a likelihood ratio test of approximate size α, one would reject H0 :θ = θ0 in favor of H1 : θ = θ0 when −2 ln λ > χ2

p,1−α.

5.1.3.2 Wald Test

The Wald test statistic W, 0 < W < +∞, for testing H0 : θ = θ0 versus H1 :θ ∈ ω is defined as

W = (θ − θ0)I(θ)(θ − θ0)′

when using expected information, and is defined as

W = (θ − θ0)I(x; θ)(θ − θ0)′

when using observed information.Under certain regularity conditions (e.g., see Wald, 1943), for large n and

under H0 : θ = θ0, W∼χ2p. Thus, for a Wald test of approximate size α, one

would reject H0 : θ = θ0 in favor of H1 : θ ∈ ω when W > χ2p,1−α

.

5.1.3.3 Score Test

With the row vector S(θ) defined as

S(θ) =[∂ ln L(x; θ)

∂θ1,∂ ln L(x; θ)

∂θ2, . . . ,

∂ ln L(x; θ)∂θp

],

the score test statistic S, 0 < S < +∞, for testing H0 : θ = θ0 versus H1 : θ ∈ ω

is defined as

S = S(θ0)I−1(θ0)S′(θ0)

when using expected information, and is defined as

S = S(θ0)I−1(x; θ0)S′(θ0)

Page 331: Exercises and Solutions in Biostatistical Theory (2010)

312 Hypothesis Testing Theory

when using observed information. For the simple null hypothesis H0 : θ = θ0,note that the computation of the value of S involves no parameter estimation.

Under certain regularity conditions (e.g., see Rao, 1947), for large n andunder H0 : θ = θ0, S∼χ2

p. Thus, for a score test of approximate size α, one

would reject H0 : θ = θ0 in favor of H1 : θ ∈ ω when S > χ2p,1−α

.For further discussion concerning likelihood ratio, Wald, and score tests,

see Rao (1973).

Example

As an example, let X1, X2, . . . , Xn constitute a random sample of size n from theparent population pX (x ; θ) = θx (1 − θ)1−x , x = 0, 1 and 0 < θ < 1. Consider test-ing H0 : θ = θ0 versus H1 : θ = θ0. Then, with θ = X = n−1∑n

i=1 Xi , it can beshown that

−2 ln λ = 2n

[X ln

(Xθ0

)+ (1 − X ) ln

(1 − X1 − θ0

)]

that

W =[

(X − θ0)√X (1 − X )/n

]2

,

and that

S =[

(X − θ0)√θ0(1 − θ0)/n

]2

.

This simple example highlights an important general difference betweenWald tests and score tests. Wald tests use parameter variance estimates assum-ing that θ ∈ Ω is true (i.e., assuming no restrictions on the parameter spaceΩ), and score tests use parameter variance estimates assuming that θ ∈ ω (i.e.,assuming that H0 is true).

5.1.4 Large Sample ML-Based Methods for Testing the Composite NullHypothesis H0 : θ ∈ ω versus the Composite Alternative HypothesisH1 : θ ∈ ω

Let Ri(θ) = 0, i = 1, 2, . . . , r, represent r (<p) independent restrictions placedon the parameter vector θ, and consider the null hypothesis H0 : θ ∈ ω,where ω = {θ : Ri(θ) = 0, i = 1, 2, . . . , r}. For example, with θ = (θ1, θ2, θ3, θ4)

for p = 4, consider the r = 3 linearly independent linear restrictions

R1(θ) = (θ1 − θ2) = 0, R2(θ) = (θ1 − θ3) = 0

Page 332: Exercises and Solutions in Biostatistical Theory (2010)

Concepts and Notation 313

and

R3(θ) = (θ1 − θ4) = 0.

Then, the null hypothesis H0 : Ri(θ) = 0, i = 1, 2, 3, is equivalent to the nullhypothesis H0 : θ1 = θ2 = θ3 = θ4.

In what follows, let θω denote the restricted MLE of θ under the nullhypothesis H0 : θ ∈ ω.

5.1.4.1 Likelihood Ratio Test

The likelihood ratio test statistic λ, 0 < λ < 1, for testing H0 : θ ∈ ω versusH1 : θ ∈ ω is defined as

λ =maxθ∈ω

L(x; θ)

maxθ∈Ω

L(x; θ)= L(x; θω)

L(x; θ)≡ Lω

.

Under certain regularity conditions, for large n and under H0 : θ ∈ ω,

−2 ln λ = 2[ln L(x; θ) − ln L(x; θω)

]∼χ2

r .

Thus, for a likelihood ratio test of approximate size α, one would reject H0 :θ ∈ ω in favor of H1 : θ ∈ ω when −2ln λ > χ2

r,1−α.

5.1.4.2 Wald Test

Let the (1 × r) row vector R(θ) be defined as

R(θ) = [R1(θ), R2(θ), . . . , Rr(θ)] .

Also, let the (r × p) matrix T(θ) have (i, j) element equal to [∂Ri(θ)]/∂θj, i =1, 2, . . . , r and j = 1, 2, . . . , p.

And, let the (r × r) matrix Λ(θ) have the structure

Λ(θ) = T(θ)I−1(θ)T ′(θ)

when using expected information, and have the structure

Λ(x; θ) = T(θ)I−1(x; θ)T ′(θ)

when using observed information.Then, the Wald test statistic W, 0 < W < +∞, for testing H0 : θ ∈ ω versus

H1 : θ ∈ ω is defined as

W = R(θ)Λ−1(θ)R′(θ)

Page 333: Exercises and Solutions in Biostatistical Theory (2010)

314 Hypothesis Testing Theory

when using expected information, and is defined as

W = R(θ)Λ−1(x; θ)R′(θ)

when using observed information.Under certain regularity conditions, for large n and under H0 : θ ∈ ω,

W∼χ2r . Thus, for a Wald test of approximate size α, one would reject H0 : θ ∈ ω

in favor of H1 : θ ∈ ω when W > χ2r,1−α

.

5.1.4.3 Score Test

The score test statistic S, 0 < S < +∞, for testing H0 : θ ∈ ω versus H1 : θ ∈ ω

is defined asS = S(θω)I−1(θω)S′(θω)

when using expected information, and is defined as

S = S(θω)I−1(x; θω)S′(θω)

when using observed information.Under certain regularity conditions, for large n and under H0 : θ ∈ ω, S∼χ2

r .Thus, for a score test of approximate size α, one would reject H0 : θ ∈ ω infavor of H1 : θ ∈ ω when S > χ2

r,1−α.

Example

As an example, let X1, X2, . . . , Xn constitute a random sample of size n froma N(μ, σ2) parent population. Consider testing the composite null hypothesis

H0 : μ = μ0, 0 < σ2 < +∞, versus the composite alternative hypothesis H1 : μ =μ0, 0 < σ2 < +∞. Note that this test is typically called a test of H0 : μ = μ0 versusH1 : μ = μ0.

It is straightforward to show that the vector θ of MLEs of μ and σ2 for theunrestricted parameter space Ω is equal to

θ = (μ, σ2) =[X ,(

n − 1n

)S2]

,

where X = n−1∑ni=1 Xi and S2 = (n − 1)−1∑n

i=1(Xi − X )2.Then, it can be shown directly that

−2 ln λ = n ln

[1 + T 2

n−1(n − 1)

],

where

Tn−1 = (X − μ0)

S/√

n∼ tn−1 under H0 : μ = μ0;

Page 334: Exercises and Solutions in Biostatistical Theory (2010)

Exercises 315

thus, the likelihood ratio test is a function of the usual one-sample t -test in thissimple situation.

In this simple situation, the Wald test is also a function of the usual one-samplet -test since

W =(

nn − 1

)T 2

n−1.

In contrast, the score test statistic has the structure

S =[

(X − μ0)

σω/√

n

]2

,

where

σ2ω = n−1

n∑

i=1

(Xi − μ0)2

is the estimator of σ2 under the null hypothesis H0 : μ = μ0.

Although all three of these ML-based hypothesis-testing methods (the likeli-hood ratio test, the Wald test, and the score test) are asymptotically equivalent,their use can lead to different conclusions in some actual data-analysisscenarios.

EXERCISES

Exercise 5.1. Consider sampling from the parent population

fX(x; θ) = θxθ−1, 0 < x < 1, θ > 0.

(a) Based on a random sample X1 of size n = 1 from this parent population, what is thepower of the MP test of H0 : θ = 1 versus H1 : θ = 2 if α = pr(Type I error) = 0.05?

(b) If X1 and X2 constitute a random sample of size n = 2 from this parent population,derive the exact structure of the rejection region of size α = 0.05 associated with theMP test of H0 : θ = 1 versus H1 : θ = 2. Specifically, find the numerical value ofthe dividing point kα between the rejection and non-rejection regions.

Exercise 5.2. Let Y1, Y2, . . . , Yn constitute a random sample of size n from the parentdensity

fY(y; θ) = (1 + θ)(y + θ)−2, y > 1, θ > −1.

(a) Develop an explicit expression for the form of the MP rejection region R for testingH0 : θ = 0 versus H1 : θ = 1 when pr(Type I error) = α.

(b) If n = 1 and α = 0.05, find the numerical value of the dividing point between therejection and non-rejection regions for this MP test.

Page 335: Exercises and Solutions in Biostatistical Theory (2010)

316 Hypothesis Testing Theory

(c) If, in fact, θ = 1, what is the exact numerical value of the power of this MP test ofH0 : θ = 0 versus H1 : θ = 1 when α = 0.05 and n = 1?

Exercise 5.3. Let Y1, Y2, . . . , Yn constitute a random sample of size n from a N(0, σ2)

population. Develop the structure of the rejection region for a uniformly most powerful(UMP) test of H0 : σ2 = 1 versus H1 : σ2 > 1. Then, use this result to find a reasonablevalue for the smallest sample size (say, n∗) that is needed to provide a power of at least0.80 for rejecting H0 in favor of H1 when α = 0.05 and when the actual value of σ2 isno smaller than 2.0 in value.

Exercise 5.4. Let X1, X2, . . . , Xn constitute a random sample of size n from

pX(x; θ1) = θx1(1 − θ1)1−x, x = 0, 1, and 0 < θ1 < 1;

and, let Y1, Y2, . . . , Yn constitute a random sample of size n from

pY(y; θ2) = θy2(1 − θ2)1−y, y = 0, 1, and 0 < θ2 < 1.

(a) If n = 30, derive a reasonable numerical value for the power of a size α = 0.05 MPtest of H0 : θ1 = θ2 = 0.50 versus H1 : θ1 = θ2 = 0.60.

(b) Now, suppose that it is of interest to test H0 : θ1 = θ2 = θ0 (where θ0 is a specifiedconstant, 0 < θ0 < 1) versus H1 : θ1 > θ2 at the α = 0.05 level using a test statisticthat is an explicit function of (X − Y), where X = n−1∑n

i=1 Xi = n−1Sx and Y =n−1∑n

i=1 Yi = n−1Sy. Provide a reasonable value for the smallest sample size (say,n∗) needed so that the power for testing H0 versus H1 is at least 0.90 when θ0 = 0.10and when (θ1 − θ2) ≥ 0.20.

Exercise 5.5. An epidemiologist gathers data (xi, Yi) on each of n randomly chosennoncontiguous and demographically similar cities in the United States, where xi(i =1, 2, . . . , n) is the known population size (in millions of people) in city i, and whereYi is the random variable denoting the number of people in city i with colon can-cer. It is reasonable to assume that Yi(i = 1, 2, . . . , n) has a Poisson distribution withmean E(Yi) = θxi, where θ(>0) is an unknown parameter, and that Y1, Y2, . . . , Yn aremutually independent random variables.

(a) Using the available data (xi, Yi), i = 1, 2, . . . , n, construct a UMP test of H0: θ = 1versus H1: θ > 1.

(b) If∑n

i=1 xi = 0.82, what is the power of this UMP test for rejecting H0: θ = 1 versusH1: θ > 1 when the probability of a Type I error α = 0.05 and when, in reality,θ = 5?

Exercise 5.6. For i = 1, 2, suppose that it is desired to select a random sampleXi1, Xi2, . . . , Xini of size ni from a N(μi, σ2

i ) population, where μ1 and μ2 are unknownparameters and where σ2

1 and σ22 are known parameters.

For testing H0 : μ1 = μ2 versus H1 : μ1 − μ2 = δ(> 0), the test statistic

Z = (X1 − X2) − 0√V

Page 336: Exercises and Solutions in Biostatistical Theory (2010)

Exercises 317

is to be used, where

Xi =ni∑

j=1

Xij for i = 1, 2, and V = σ21

n1+ σ2

2n2

.

(a) If the null hypothesis is to be rejected when Z > Z1−α, show that the two conditionspr(Type I error)= α and pr(Type II error)= β are simultaneously satisfied when

V =(

δ

Z1−α + Z1−β

)2

= θ, say.

(b) Subject to the constraint V = θ, find (as a function of σ21 and σ2

2) that value of n1/n2which minimizes the total sample size N = (n1 + n2). Due to logistical constraints,suppose that it is only possible to select a total sample size of N = (n1 + n2) = 100.If N = 100, σ2

1 = 9, and σ22 = 4, find the appropriate values of n1 and n2.

(c) Again, subject to the constraint V = θ, develop expressions for n1 and n2 (in termsof θ, σ1, and σ2) that will minimize the total sampling cost if the cost of selectingan observation from Population 1 is four times the cost of selecting an observationfrom Population 2. What are the specific sample sizes needed if σ1 = 5, σ2 = 4, α =0.05, β = 0.10, and δ = 3?

Exercise 5.7. Let X1, X2, . . . , Xn constitute a random sample of size n from the parentpopulation

fX(x; θ) = θ−1, 0 < x < θ,

where θ is an unknown parameter.Suppose that a statistician proposes the following test of H0 : θ = θ0 versus H1 : θ >

θ0: “reject H0 in favor of H1 if X(n) > c, where X(n) is the largest observation in the setX1, X2, . . . , Xn and where c is a specified positive constant.”

(a) If θ0 = 12 , find that specified value of c, say c∗, such that pr(Type I error) = α. Note

that c∗ will be a function of both n and α.

(b) If the true value of θ is actually 34 , find the smallest value of n (say, n∗) required so

that the power of the statistician’s test is at least 0.98 when α = 0.05 and θ0 = 12 .

Exercise 5.8. For the ith of n independently selected busy intersections in a certainheavily populated U.S. city, the number Xi of automobile accidents in any given year isassumed to have a Poisson distribution with mean μi, i = 1, 2, . . . , n. It can be assumedthat these n intersections are essentially the same with respect to the rate of traffic flowper day. It is of interest to test the null hypothesis H0 : μi = μ, i = 1, 2, . . . , n, versusthe (unrestricted) alternative hypothesis H1 that the μi’s are not necessarily all equalto one another (i.e., that they are completely free to vary in value). In other words, wewish to use the n mutually independent Poisson random variables X1, X2, . . . , Xn toassess whether or not the true average number of accidents in any given year is the

Page 337: Exercises and Solutions in Biostatistical Theory (2010)

318 Hypothesis Testing Theory

same at each of the n intersections. Note that testing H0 versus H1 is equivalent totesting “homogeneity” versus “heterogeneity” among the μi’s.

(a) Develop an explicit expression for the likelihood ratio statistic −2 ln(λ) for testingH0 versus H1. If, in a sample of n = 40 intersections in a particular year, there were20 intersections each with a total of 5 accidents, 10 intersections each with a totalof 6 accidents, and 10 intersections each with a total of 8 accidents, demonstratethat H0 is not rejected at the α = 0.05 level based on these data.

(b) Based on the data and the hypothesis test results for part (a), construct what youdeem to be an appropriate 95% CI for μ.

Exercise 5.9. It is of interest to compare two cities (say, City 1 and City 2) with regardto their true rates (λ1 and λ2, respectively) of primary medical care utilization, wherethese two rates are expressed in units of the number of out-patient doctor visits perperson-year of community residence. For i = 1, 2, suppose that n adult residents arerandomly selected from City i; further, suppose that the values of the two variables Xijand Lij are recorded for the jth person (j = 1, 2, . . . , n) in this random sample from Cityi, where Xij is the total number of out-patient doctor visits made by this person whileresiding in City i, and where Lij is the length of residency (in years) in City i for thisperson. Hence, for i = 1, 2, the data for City i consist of the n mutually independentpairs (Xi1, Li1), (Xi2, Li2), . . . , (Xin, Lin). In what follows, it is to be assumed that thedistribution of Xij is POI(Lijλi), so that E(Xij) = V(Xij) = Lijλi. Furthermore, the Lij’sare to be considered as fixed known constants.

(a) Develop an explicit expression for the likelihood function for all 2n observations(n from City 1 and n from City 2), and find two statistics which are jointly sufficientfor λ1 and λ2.

(b) Using the likelihood function in part (a), prove that the MLE of λi is

λi =∑n

j=1 Xij∑nj=1 Lij

, i = 1, 2.

(c) Suppose that it is of interest to test the composite null hypothesis H0 : λ1 = λ2(= λ, say) versus the composite alternative hypothesis H1 : λ1 = λ2. Assumingthat H0 is true, find the MLE λ of λ.

(d) Develop an explicit expression for the likelihood ratio statistic which can be usedto test H0 : λ1 = λ2 versus H1 : λ1 = λ2.

(e) Suppose that n = 25, λ1 = 0.02, λ2 = 0.03,∑n

j=1 L1j = 200, and∑n

j=1 L2j = 300.Use the likelihood ratio statistic developed in part (d) to test H0 : λ1 = λ2 versusH1 : λ1 = λ2 at the α = 0.10 level. What is the P-value of your test?

Exercise 5.10. Suppose that X and Y are continuous random variables representingthe survival times (in years) for patients following two different types of surgicalprocedures for the treatment of advanced colon cancer. Further, suppose that thesesurvival time distributions are assumed to be of the form

fX(x; α) = αe−αx, x > 0, α > 0 and fY(y; β) = βe−βy, y > 0, β > 0.

Page 338: Exercises and Solutions in Biostatistical Theory (2010)

Exercises 319

Let X1, X2, . . . , Xn and Y1, Y2, . . . , Yn denote random samples of size n from fX(x; α)

and fY(y; β), respectively. Also, let X = n−1∑ni=1 Xi and let Y = n−1∑n

i=1 Yi.

(a) For the likelihood ratio test of H0 : α = β versus H1 : α = β, show that the like-lihood ratio statistic λ can be written in the form λ = [4u(1 − u)]n, where u =x/(x + y).

(b) If n = 100, x = 1.25 years, and y = 0.75 years, use a P-value computation to decidewhether or not to reject H0 in favor of H1, and then interpret your finding withregard to these two surgical procedures for the treatment of advanced colon cancer.

Exercise 5.11. The number X of speeding tickets issued to a typical teenage driverduring a specified two-year period in a certain community (say, Community #1) havingmandatory teenage driver education classes is assumed to have the distribution

pX(x; θ1) = θ1(1 − θ1)x, x = 0, 1, . . . , ∞; 0 < θ1 < 1.

The number Y of speeding tickets issued to a typical teenage driver during that same2-year period in another community with similar sociodemographic characteristics(say, Community #2), but not having mandatory teenage driver education classes, isassumed to have the distribution

pY(y; θ2) = θ2(1 − θ2)y, y = 0, 1, . . . , ∞; 0 < θ2 < 1.

Let X1, X2, . . . , Xn constitute a random sample of size n from pX(x; θ1), and letx1, x2, . . . , xn denote the corresponding n realizations (i.e., the actual set of observednumbers of speeding tickets) for the set of n randomly chosen teenage drivers selectedfrom Community #1. Further, let Y1, Y2, . . . , Yn constitute a random sample of size nfrom pY(y; θ2), with y1, y2, . . . , yn denoting the corresponding realizations.

(a) Using the complete set of observed data {x1, x2, . . . , xn; y1, y2, . . . , yn}, develop an explicit expression for the likelihood ratio test statistic λ̂ for testing the null hypothesis H0 : θ1 = θ2 (= θ, say) versus the alternative hypothesis H1 : θ1 ≠ θ2. If n = 25, x̄ = 1.00, and ȳ = 2.00, is there sufficient evidence to reject H0 in favor of H1 at the α = 0.05 level of significance?

(b) Using observed information, use the data in part (a) to compute the numerical value of S, the score statistic for testing H0 versus H1. How do the conclusions based on the score test compare with those based on the likelihood ratio test?

(c) A highway safety researcher contends that the data do suggest that the teenage driver education classes might actually be beneficial, and he suggests that increasing the sample size n might actually lead to a highly statistically significant conclusion that these mandatory teenage driver education classes do lower the risk of speeding by teenagers. Making use of the available data, comment on the reasonableness of this researcher's contention.

Exercise 5.12. Suppose that n randomly selected adult male hypertensive patients are administered a new blood pressure lowering drug during a clinical trial designed to assess the efficacy of this new drug for promoting long-term remission of high blood pressure. Further, once each patient's blood pressure returns to a normal range, suppose that each patient is examined monthly to see if the hypertension returns. For the ith patient in the study, let xi denote the age of the patient at the start of the clinical trial, and let Yi be the random variable denoting the number of months of follow-up until the hypertension returns for the first time. It is reasonable to assume that Yi has the geometric distribution

$$p_{Y_i}(y_i; \theta_i) = (1 - \theta_i)^{y_i - 1}\,\theta_i, \quad y_i = 1, 2, \ldots, \infty; \; 0 < \theta_i < 1; \; i = 1, 2, \ldots, n.$$

It is well-established that age is a risk factor for hypertension. To take into account the differing ages of the patients at the start of the trial, it is proposed that θi be expressed as the following function of age:

θi = βxi/(1 + βxi), β > 0.

Given the n pairs (xi, yi), i = 1, 2, . . . , n, of data points, the analysis goal is to obtain the MLE β̂ of β, and then to use β̂ to make statistical inferences about β.

(a) Prove that the MLE β̂ of β satisfies the equation

$$\hat{\beta} = \frac{n}{\sum_{i=1}^{n} x_i y_i \left(1 + \hat{\beta} x_i\right)^{-1}}.$$

(b) Prove that the asymptotic variance of β̂ is

$$V(\hat{\beta}) = \frac{\beta^2}{\sum_{i=1}^{n} (1 + \beta x_i)^{-1}}.$$

(c) If the clinical trial involves 50 patients of age 30 and 50 patients of age 40 at the start of the trial, find a large-sample 95% CI for β if β̂ = 0.50.

(d) For the data in part (c), carry out a Wald test of H0 : β = 1 versus H1 : β ≠ 1 using α = 0.05. Do you reject H0 or not? What is the P-value of your test?

(e) To test H0 : β = 1 versus H1 : β > 1, consider the test statistic

$$U = \frac{(\hat{\beta} - 1)}{\sqrt{V_0(\hat{\beta})}},$$

where

$$V_0(\hat{\beta}) = \frac{1}{\sum_{i=1}^{n} (1 + x_i)^{-1}}$$

is the large-sample variance of β̂ when H0 : β = 1 is true. Assuming that

$$\frac{(\hat{\beta} - \beta)}{\sqrt{V(\hat{\beta})}} \sim N(0, 1)$$

for large n, where V(β̂) is given in part (b), and using the age data in part (c), what is the approximate power of U to reject H0 : β = 1 in favor of H1 : β > 1 when α = 0.025 and when the true value of β is equal to 1.10?

Exercise 5.13. A random sample of 1000 disease-free heavy smokers is followed for a 20-year period. At the end of this 20-year follow-up period, it is found that exactly 100 of these 1000 heavy smokers developed lung cancer during the follow-up period. It is of interest to make statistical inferences about the population parameter ψ = θ/(1 − θ), where θ is the probability that a member of the population from which this random sample came develops lung cancer during this 20-year follow-up period. The parameter ψ is the odds of developing lung cancer during this 20-year follow-up period, namely, the ratio of the probability of developing lung cancer to the probability of not developing lung cancer over this 20-year period.

(a) Using the available numerical information, construct an appropriate 95% CI for the parameter ψ.

(b) Carry out Wald and score tests of the null hypothesis H0: ψ = 0.10 versus the alternative hypothesis H1: ψ > 0.10. What are the P-values of these two tests? Interpret your findings.

Exercise 5.14. An environmental scientist postulates that the distributions of the concentrations X and Y (in parts per million) of two air pollutants can be modeled as follows: the conditional density of Y, given X = x, is postulated to have the structure

$$f_Y(y|X = x; \alpha, \beta) = \frac{1}{(\alpha + \beta)x}\, e^{-y/(\alpha+\beta)x}, \quad y > 0,\ x > 0,\ \alpha > 0,\ \beta > 0;$$

and, the marginal density of X is postulated to have the structure

$$f_X(x; \beta) = \frac{1}{\beta}\, e^{-x/\beta}, \quad x > 0,\ \beta > 0.$$

Let (X1, Y1), (X2, Y2), . . . , (Xn, Yn) constitute a random sample of size n from the joint density fX,Y(x, y; α, β) of X and Y.

(a) Derive explicit expressions for two statistics U1 and U2 that are jointly sufficient for α and β, and then prove that corr(U1, U2) = 0.

(b) Using the random sample (Xi, Yi), i = 1, . . . , n, derive explicit expressions for the MLEs α̂ and β̂ of the unknown parameters α and β. Then, if n = 30, α̂ = 2, and β̂ = 1, find the P-value for a Wald test (based on expected information) of H0 : α = β versus H1 : α ≠ β. Also, use the available data to compute an appropriate 95% CI for the parameter (α − β), and then comment on any numerical connection between the confidence interval result and the P-value.

Exercise 5.15. For the ith of two large formaldehyde production facilities located in two different southern cities in the United States, the expected amount E(Yij) in pounds of formaldehyde produced by a certain chemical reaction, expressed as a function of the amount xij (>0) in pounds of catalyst used to promote the reaction, is given by the equation

$$E(Y_{ij}) = \beta_i x_{ij}^2, \quad \beta_i > 0,\ x_{ij} > 0,\ i = 1, 2,\ j = 1, 2, \ldots, n.$$

Let (xi1, Yi1), (xi2, Yi2), . . . , (xin, Yin) be n independent pairs of data points from the ith production facility, i = 1, 2. Assume that Yij has a negative exponential distribution with mean $\alpha_{ij} = E(Y_{ij}) = \beta_i x_{ij}^2$, that the xij's are known constants, and that the Yij's are a set of 2n mutually independent random variables.

(a) Provide an explicit expression for the joint distribution (i.e., the unconditional likelihood function) of the 2n Yij's, and then provide explicit expressions for two statistics that are jointly sufficient for β1 and β2.

(b) Under the stated assumptions given earlier, develop an explicit expression (using expected information) for the score statistic S for testing H0: β1 = β2 versus H1: β1 ≠ β2. In particular, show that S can be expressed solely as a function of n, β̂1, and β̂2, where β̂1 and β̂2 are the MLEs of β1 and β2, respectively, in the unrestricted parameter space. If n = 25, β̂1 = 2, and β̂2 = 3, do you reject H0 in favor of H1 at the α = 0.05 level?

Exercise 5.16. Consider a clinical trial involving two different treatments for Stage IV malignant melanoma, namely, Treatment A and Treatment B. Let X1, X2, . . . , Xn denote the mutually independent survival times (in months) for the n patients randomly assigned to Treatment A. As a statistical model, consider X1, X2, . . . , Xn to constitute a random sample of size n from

fX(x; θ) = θ−1e−x/θ, x > 0, θ > 0.

Further, let Y1, Y2, . . . , Yn denote the mutually independent survival times (in months) for the n patients randomly assigned to Treatment B. As a statistical model, consider Y1, Y2, . . . , Yn to constitute a random sample of size n from

fY(y; λ, θ) = (λθ)−1e−y/λθ, y > 0, λ > 0, θ > 0.

Clearly, E(X) = θ and E(Y) = λθ, so that statistical inferences about the parameter λ can be used to decide whether or not the available data provide evidence of a difference in true average survival times for Treatment A and Treatment B.

(a) Find explicit expressions for statistics that are jointly sufficient for making statistical inferences about the unknown parameters λ and θ.

(b) Derive explicit expressions for λ̂ and θ̂, the MLEs of the unknown parameters λ and θ.

(c) Using expected information, derive an explicit expression for the score statistic S for testing H0 : λ = 1 versus H1 : λ ≠ 1. Also, show directly how a variance estimated under H0 enters into the explicit expression for S. For a particular data set where n = 50, $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i = 30$, and $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i = 40$, what is the approximate P-value when the score statistic S is used to test H0 versus H1?

Exercise 5.17. An oncologist reasons that the survival time X (in years) for advanced-stage colorectal cancer follows an exponential distribution with unknown parameter λ; that is,

fX(x|λ) = λe−λx, x > 0, λ > 0.

Although this oncologist does not know the exact value of λ, she is willing to assume a priori that λ also follows an exponential distribution with known parameter β, namely,

π(λ) = βe−βλ, λ > 0, β > 0.

In the Bayesian paradigm, fX(x|λ) is called the likelihood function, and π(λ) is called the prior distribution of λ (i.e., the distribution assigned to λ before observing a value for X).

(a) Find the marginal distribution fX(x) of X (i.e., the distribution of X averaged over all possible values of λ).

(b) Find the posterior distribution π(λ|X = x) of λ.

(c) A Bayesian measure of evidence against a null hypothesis (H0), and in favor of an alternative hypothesis (H1), is the Bayes Factor, denoted BF10. In particular,

$$BF_{10} = \frac{pr(H_1|X = x)/pr(H_0|X = x)}{pr(H_1)/pr(H_0)} = \frac{pr(H_1|X = x)\,pr(H_0)}{pr(H_0|X = x)\,pr(H_1)},$$

where pr(Hk) and pr(Hk|X = x) denote, respectively, the prior and posterior probabilities of hypothesis Hk, k = 0, 1. Hence, BF10 is the ratio of the posterior odds of H1 to the prior odds of H1. According to Kass and Raftery (1995), 1 < BF10 ≤ 3 provides "weak" evidence in favor of H1, 3 < BF10 ≤ 20 provides "positive" evidence in favor of H1, 20 < BF10 ≤ 150 provides "strong" evidence in favor of H1, and BF10 > 150 provides "very strong" evidence in favor of H1.

If β = 1 and x = 3, what is the Bayes factor for testing H0 : λ > 1 versus H1 : λ ≤ 1? Using the scale proposed by Kass and Raftery (1995), what is the strength of evidence in favor of H1?
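As a numerical sketch of this computation, note that combining the stated exponential likelihood and exponential prior gives a Gamma(2, β + x) posterior for λ (an assumption here, since it is what parts (a) and (b) ask you to derive):

```python
# Bayes factor sketch for Exercise 5.17(c), assuming lambda | x ~ Gamma(2, beta + x).
from scipy.stats import gamma, expon

beta, x = 1.0, 3.0

# Prior: lambda ~ Exponential(rate beta); H0 is lambda > 1.
pr_H0_prior = expon.sf(1.0, scale=1.0 / beta)
pr_H1_prior = 1.0 - pr_H0_prior

# Posterior: lambda | x ~ Gamma(shape 2, rate beta + x).
pr_H0_post = gamma.sf(1.0, a=2, scale=1.0 / (beta + x))
pr_H1_post = 1.0 - pr_H0_post

# BF10 = (posterior odds of H1) / (prior odds of H1).
BF10 = (pr_H1_post / pr_H0_post) / (pr_H1_prior / pr_H0_prior)
print(BF10)  # about 5.8: "positive" evidence on the Kass-Raftery scale
```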

Exercise 5.18∗. A controlled clinical trial was designed to compare the survival times (in years) of HIV patients receiving once daily dosing of the new drug Epzicom (a combination of 600 mg of Ziagen and 300 mg of Epivir) to the survival times (in years) of HIV patients receiving once daily dosing of the new drug Truvada (a combination of 300 mg of Viread and 200 mg of Emtriva). Randomly chosen HIV patients were paired together based on the values of several important factors, including age, current HIV levels, general health status, and so on. Then, one member of each pair was randomly selected to receive Epzicom, with the other member then receiving Truvada. For the ith pair, i = 1, 2, . . . , n, let Xi denote the survival time of the patient receiving Epzicom, and let Yi denote the survival time of the patient receiving Truvada. Further, assume that Xi and Yi are independent random variables with respective distributions

fXi (xi) = (θφi)−1e−xi/θφi , xi > 0,

and

fYi (yi) = φ−1i e−yi/φi , yi > 0.

Here, φi (>0) is a parameter pertaining to characteristics of the ith pair, and θ (>0) is the parameter reflecting any difference in true average survival times for the two drugs Epzicom and Truvada. Hence, the value θ = 1 indicates no difference between the two drugs with regard to average survival time.

(a) Provide an explicit expression for the joint distribution (i.e., the likelihood) of the 2n random variables X1, X2, . . . , Xn and Y1, Y2, . . . , Yn. How many parameters would have to be estimated by the method of ML? Comment on this finding.

(b) A consulting statistician points out that the only parameter of real interest is θ. She suggests that an alternative analysis be based just on the n ratios Ri = Xi/Yi, i = 1, 2, . . . , n. In particular, this statistician claims that the distributions of these ratios depend only on θ and not on the {φi}, and that the {φi} are so-called nuisance parameters (i.e., parameters that appear in assumed statistical models, but that are not of direct relevance to the particular research questions of interest). Prove that this statistician is correct by showing that

$$f_{R_i}(r_i) = \frac{\theta}{(\theta + r_i)^2}, \quad 0 < r_i < +\infty,\ i = 1, 2, \ldots, n.$$

(c) Using the n mutually independent random variables R1, R2, . . . , Rn, it is of interest to test H0 : θ = 1 versus H1 : θ > 1 at the α = 0.025 level. What is the smallest sample size n∗ required so that the power of an appropriate large-sample test is at least 0.80 when, in fact, the true value of θ is 1.50?

Exercise 5.19∗. In many important practical data analysis situations, the statistical models being used involve several parameters, only a few of which are relevant for directly addressing the research questions of interest. The irrelevant parameters, generally referred to as "nuisance parameters," are typically employed to ensure that the statistical models make scientific sense, but are generally unimportant otherwise. One method for eliminating the need to estimate these nuisance parameters, and hence generally to improve both statistical validity and precision, is to employ a conditional inference approach, whereby a conditioning argument is used to produce a conditional likelihood function that only involves the relevant parameters. For an excellent discussion of methods of conditional inference, see McCullagh and Nelder (1989).

As an example, suppose that it is of interest to evaluate whether current smokers tend to miss more days of work due to illness than do nonsmokers. For a certain manufacturing industry, suppose that n mutually independent matched pairs of workers, one a current smoker and one a nonsmoker, are formed, where the workers in each pair are chosen (i.e., are matched) to have the same set of general risk factors (e.g., age, current health status, type of job, etc.) for illness-related work absences. These 2n workers are then followed for a year, and the number of days missed due to illness during that year is recorded for each worker.

For the ith pair of workers, i = 1, 2, . . . , n, let Yij ∼ POI(φiλj), j = 0, 1, where j = 0 pertains to the nonsmoking worker and where j = 1 pertains to the worker who currently smokes. Further, assume that Yi1 and Yi0 are independent random variables. It is of interest to test H0 : λ1 = λ0 versus H1 : λ1 > λ0. If H0 is rejected in favor of H1, then this finding would supply statistical evidence that current smokers, on average, tend to miss more days of work due to illness than do nonsmokers. The n parameters {φ1, φ2, . . . , φn} are parameters reflecting inherent differences across the matched pairs with regard to general risk factors for illness-related work absences, and these n nuisance parameters are not of primary interest. The statistical analysis goal is to use a conditional inference approach that eliminates the need to estimate these nuisance parameters and that still produces an appropriate statistical procedure for testing H0 versus H1.

(a) Develop an explicit expression for the conditional distribution pYi1(yi1|Yi1 +Yi0 = Si = si) of the random variable Yi1 given that (Yi1 + Yi0) = Si = si.

(b) Use the result in part (a) to develop an appropriate ML-based large-sample test of H0 versus H1 that is based on the parameter θ = λ1/(λ0 + λ1). For n = 50, $\sum_{i=1}^{n} s_i = 500$, and $\sum_{i=1}^{n} y_{i1} = 275$, is there statistical evidence for rejecting H0 in favor of H1? Can you detect another advantage of this conditional inference procedure?

Exercise 5.20∗. Let X1, X2, . . . , Xn constitute a random sample of size n from a Poisson distribution with parameter λx. Furthermore, let Y1, Y2, . . . , Yn constitute a random sample of the same size n from a different Poisson population with parameter λy.

(a) Use these 2n mutually independent observations to develop an explicit expression for the score test statistic S (based on expected information) for testing the null hypothesis H0 : λx = λy versus the alternative hypothesis H1 : λx ≠ λy. Suppose that n = 30, $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i = 8.00$, and $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i = 9.00$; do you reject H0 or not using S?

(b) Now, suppose that n = 1, so that only the independent observations X1 and Y1 are available. By considering the conditional distribution of X1 given that (X1 + Y1) = s1, develop a method for testing the null hypothesis H0 : λy = δλx versus the alternative hypothesis H1 : λy > δλx, where δ (> 0) is a known constant. Suppose that δ = 0.60, x1 = 4, and y1 = 10. What is the exact P-value of your test of H0 versus H1?
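A brief numerical sketch of this exact test, assuming the standard result that, given (X1 + Y1) = s1, the distribution of X1 is BIN(s1, π) with π = λx/(λx + λy), so that π = 1/(1 + δ) under H0:

```python
# Exact conditional P-value sketch for Exercise 5.20(b).
from scipy.stats import binom

delta, x1, y1 = 0.60, 4, 10
s1 = x1 + y1
pi0 = 1.0 / (1.0 + delta)  # = 0.625 under H0: lambda_y = delta*lambda_x

# H1: lambda_y > delta*lambda_x corresponds to pi < pi0, so small values
# of X1 are evidence against H0; the exact P-value is a lower-tail sum.
p_value = binom.cdf(x1, s1, pi0)
print(p_value)  # about 0.011
```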

Exercise 5.21∗. For older adults with symptoms of Alzheimer's disease, the distribution of the time X (in hours) required to complete a verbal aptitude test designed to measure the severity of dementia is assumed to have the distribution

fX(x) = 1, 0.50 ≤ θ < x < (θ + 1) < +∞.

Let X1, X2, . . . , Xn constitute a random sample of size n(> 1) from fX(x). Further, define

X(1) = min{X1, X2, . . . , Xn} and X(n) = max{X1, X2, . . . , Xn}.


It is of interest to test H0 : θ = 1 versus H1 : θ > 1. Suppose that the following decision rule is proposed: reject H0 : θ = 1 in favor of H1 : θ > 1 if and only if the event A ∪ B occurs, where A is the event that X(1) > k, where B is the event that X(n) > 2, and where k is a positive constant.

(a) Find a specific expression for k, say kα, such that this particular decision rule has a Type I error rate exactly equal to α, 0 < α ≤ 0.10.

(b) Find the power function for this decision rule; in particular, consider the power of this decision rule for appropriately chosen disjoint sets of values of θ, 1 < θ < +∞.

Exercise 5.22∗. Suppose that Y11, Y12, . . . , Y1n constitute a set of n random variables representing the responses to a certain lung function test for n farmers living in the same small neighborhood located very near to a large hog farm in rural North Carolina. Since these n farmers live in the same small neighborhood and so experience roughly the same harmful levels of air pollution from hog waste, it is reasonable to believe that the responses to this lung function test for these n farmers will not be independent. In particular, assume that Y1j ∼ N(μ1, σ²), j = 1, 2, . . . , n, and that corr(Y1j, Y1j′) = ρ (> 0) for every j ≠ j′, j = 1, 2, . . . , n and j′ = 1, 2, . . . , n.

Similarly, suppose that Y21, Y22, . . . , Y2n constitute a set of n random variables representing responses to the same lung function test for n farmers living in a different small rural North Carolina neighborhood that experiences only minimal levels of air pollution from hog waste. In particular, assume that Y2j ∼ N(μ2, σ²), j = 1, 2, . . . , n, and that corr(Y2j, Y2j′) = ρ (> 0) for every j ≠ j′, j = 1, 2, . . . , n and j′ = 1, 2, . . . , n.

Further, assume that the parameters σ² and ρ have known values, that the sets of random variables $\{Y_{1j}\}_{j=1}^{n}$ and $\{Y_{2j}\}_{j=1}^{n}$ are independent of each other, and that the two sample means $\bar{Y}_1 = n^{-1}\sum_{j=1}^{n} Y_{1j}$ and $\bar{Y}_2 = n^{-1}\sum_{j=1}^{n} Y_{2j}$ are each normally distributed.

(a) Find E(Ȳ1 − Ȳ2) and develop an explicit expression for V(Ȳ1 − Ȳ2) that is a function of n, σ², and ρ.

(b) Given the stated assumptions, provide a hypothesis testing procedure involving the standard normal distribution for testing H0 : μ1 = μ2 versus H1 : μ1 > μ2 using a Type I error rate of α = 0.05.

(c) Now, suppose that an epidemiologist with minimal statistical training incorrectly ignores the positive intra-neighborhood correlation among responses and thus uses a test (based on the standard normal distribution) which incorrectly involves the assumption that ρ = 0. If this incorrect test is based on an assumed Type I error rate of 0.05, and if n = 10, σ² = 2, and ρ = 0.50, compute the exact numerical value of the actual Type I error rate associated with the use of this incorrect test. There is an important lesson to be learned here; what is it?
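A minimal sketch of the part (c) calculation, assuming the part (a) result that V(Ȳ1 − Ȳ2) = (2σ²/n)[1 + (n − 1)ρ] under the stated equicorrelation structure:

```python
# Actual Type I error sketch for Exercise 5.22(c).
from math import sqrt
from scipy.stats import norm

n, rho = 10, 0.50  # sigma^2 = 2 cancels out of the ratio below
z_crit = norm.ppf(0.95)  # 1.645, the nominal alpha = 0.05 cutoff

# The incorrect test standardizes by sqrt(2*sigma^2/n); the true standard
# deviation is larger by the factor sqrt(1 + (n - 1)*rho).
inflation = sqrt(1.0 + (n - 1) * rho)
actual_alpha = norm.sf(z_crit / inflation)
print(actual_alpha)  # about 0.24, far above the nominal 0.05
```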

Exercise 5.23∗. The normally distributed random variables X1, X2, . . . , Xn are said to follow a first-order autoregressive process when

Xi = θXi−1 + εi, i = 1, 2, . . . , n,

where X0 ≡ 0, where θ (−∞ < θ < ∞) is an unknown parameter, and where ε1, ε2, . . . , εn are mutually independent N(0,1) random variables.


(a) Determine the conditional density fX2(x2|X1 = x1) of X2 given X1 = x1.

(b) Develop an explicit expression for fX1,X2(x1, x2), the joint density of X1 and X2.

(c) Let f∗ denote the joint density of X1, X2, . . . , Xn, where, in general,

$$f^* = f_{X_1}(x_1)\prod_{i=2}^{n} f_{X_i}(x_i | X_1 = x_1, X_2 = x_2, \ldots, X_{i-1} = x_{i-1}).$$

Using a sample (X1, X2, . . . , Xn) from the joint density f∗, show that a likelihood ratio test of H0 : θ = 0 versus H1 : θ ≠ 0 can be expressed explicitly as a function of the statistic

$$\frac{\left(\sum_{i=2}^{n} x_{i-1}x_i\right)^2}{\left(\sum_{i=1}^{n-1} x_i^2\right)\left(\sum_{i=1}^{n} x_i^2\right)}.$$

For n = 30, if $\sum_{i=2}^{n} x_{i-1}x_i = 4$, $\sum_{i=1}^{n} x_i^2 = 15$, and xn = 2, would you reject H0 : θ = 0 at the α = 0.05 level using this likelihood ratio test?

Exercise 5.24∗. For lifetime residents of rural areas in the United States, suppose that it is reasonable to assume that the distribution of the proportion X of a certain biomarker of benzene exposure in a cubic centimeter of blood taken from such a rural resident has a beta distribution with parameters α = θr and β = 1, namely,

$$f_X(x; \theta_r) = \theta_r x^{\theta_r - 1}, \quad 0 < x < 1,\ \theta_r > 0.$$

Let X1, X2, . . . , Xn constitute a random sample of size n from fX(x; θr). Analogously, for lifetime residents of United States urban areas, let the distribution of Y, the proportion of this same biomarker of benzene exposure in a cubic centimeter of blood taken from such an urban resident, be

$$f_Y(y; \theta_u) = \theta_u y^{\theta_u - 1}, \quad 0 < y < 1,\ \theta_u > 0.$$

Let Y1, Y2, . . . , Ym constitute a random sample of size m from fY(y; θu).

(a) Using all (n + m) available observations, find two statistics that are jointly sufficient for θr and θu.

(b) Show that a likelihood ratio test of H0 : θr = θu (= θ, say) versus H1 : θr ≠ θu can be based on the test statistic

$$W = \frac{\sum_{i=1}^{n}\ln(X_i)}{\left[\sum_{i=1}^{n}\ln(X_i) + \sum_{i=1}^{m}\ln(Y_i)\right]}.$$

(c) Find the exact distribution of the test statistic W under H0 : θr = θu (= θ, say), and then use this result to construct a likelihood ratio test of H0: θr = θu (= θ, say) versus H1: θr ≠ θu with an exact Type I error rate of α = 0.10 when n = m = 2.


Exercise 5.25∗. For two states in the United States with very different distributions of risk factors for AIDS (say, Maine and California), suppose that the number Yij of new cases of AIDS in county j (j = 1, 2, . . . , n) of state i (i = 1, 2) during a particular year is assumed to have the negative binomial distribution

$$p_{Y_{ij}}(y_{ij}; \theta_i) = C^{\,k+y_{ij}-1}_{\,k-1}\,\theta_i^{y_{ij}}(1 + \theta_i)^{-(k+y_{ij})}, \quad y_{ij} = 0, 1, \ldots, \infty \ \text{and}\ \theta_i > 0;$$

here, θ1 and θ2 are unknown parameters, and k is a known positive constant. For i = 1, 2, let Yi1, Yi2, . . . , Yin denote n mutually independent random variables representing the numbers of new AIDS cases developing during this particular year in n randomly chosen non-adjacent counties in state i. It is desired to use the 2n mutually independent observations {Y11, Y12, . . . , Y1n} and {Y21, Y22, . . . , Y2n} to make statistical inferences about the unknown parameters θ1 and θ2.

(a) Using these 2n mutually independent observations, develop an explicit expression for the likelihood ratio test statistic −2 ln(λ̂) for testing the null hypothesis H0 : θ1 = θ2 (= θ, say) versus the alternative hypothesis H1 : θ1 ≠ θ2. For n = 50 and k = 3, if the observed data are such that $\sum_{j=1}^{n} y_{1j} = 5$ and $\sum_{j=1}^{n} y_{2j} = 10$, use the likelihood ratio statistic to test H0 versus H1 at the α = 0.05 significance level. What is the P-value associated with this particular test?

(b) Using the observed data information given in part (a), what is the numerical value of the score statistic S for testing H0 versus H1? Use observed information in your calculations. What is the P-value associated with the use of S for testing H0 versus H1?

(c) For i = 1, 2, let $\bar{Y}_i = n^{-1}\sum_{j=1}^{n} Y_{ij}$. A biostatistician suggests that a test of H0 : θ1 = θ2 versus H1 : θ1 ≠ θ2 can be based on a test statistic, involving (Ȳ1 − Ȳ2), that is approximately N(0, 1) for large n under H0. Develop the structure of such a large-sample test statistic. For k = 3 and α = 0.05, if the true parameter values are θ1 = 2.0 and θ2 = 2.4, provide a reasonable value for the minimum value of n (say, n∗) so that the power of this large-sample test is at least 0.80 for rejecting H0 in favor of H1.

Exercise 5.26∗. Consider an investigation in which each member of a random sample of patients contributes a pair of binary (0−1) outcomes, with the possible outcomes being (1,1), (1,0), (0,1), and (0,0). Data such as these arise when a binary outcome (e.g., the presence or absence of a particular symptom) is measured on the same patient under two different conditions or at two different time points. Interest focuses on statistically testing whether the marginal probability of the occurrence of the outcome of interest differs for the two conditions or time points. To statistically analyze such data appropriately, it is necessary to account for the statistical dependence between the two outcomes measured on the same patient.

For a random sample of n patients, let the discrete random variables Y11, Y10, Y01, and Y00 denote, respectively, the numbers of patients having the response patterns (1,1), (1,0), (0,1), and (0,0), where 1 denotes the presence of a particular symptom, 0 denotes the absence of that symptom, and the two outcome measurements are made before and after a particular therapeutic intervention. Assuming that patients respond independently of one another, the observed data {y11, y10, y01, y00} may be assumed to arise from a multinomial distribution with corresponding probabilities {π11, π10, π01, π00}, where $\sum_{i=0}^{1}\sum_{j=0}^{1}\pi_{ij} = 1$. Note that the random variable (Y11 + Y10) is the number of patients who have the symptom prior to the intervention, and that the random variable (Y11 + Y01) is the number of patients who have the symptom after the intervention. Let

δ = (π11 + π10) − (π11 + π01) = (π10 − π01), −1 < δ < 1,

denote the difference in the probabilities of having the symptom before and after the intervention. Interest focuses on testing H0 : δ = 0 versus H1 : δ ≠ 0.

(a) Given observed counts y11, y10, y01, and y00, develop an explicit expression for the MLE δ̂ of δ.

(b) Using expected information, derive an explicit expression for the Wald chi-squared test statistic for testing H0 : δ = 0 versus H1 : δ ≠ 0. What is the P-value of the Wald chi-squared test if y11 = 22, y10 = 3, y01 = 7, and y00 = 13?

(c) For testing H0 : δ = 0 versus H1 : δ ≠ 0, the testing procedure known as McNemar's Test is based on the test statistic

$$Q_M = \frac{(Y_{01} - Y_{10})^2}{(Y_{01} + Y_{10})}.$$

Under H0, the statistic QM follows an asymptotic χ²₁ distribution, and so a two-sided test at the 0.05 significance level rejects H0 in favor of H1 when QM > χ²₁,₀.₉₅. Prove that McNemar's test statistic is identical to the score test statistic used to test H0 : δ = 0 versus H1 : δ ≠ 0. Also, show that the Wald chi-squared statistic is always at least as large in value as the score chi-squared statistic.

(d) For the study in question, the investigators plan to enroll patients until (y10 + y01) is equal to 10. Suppose that these investigators decide to reject H0 if QM > χ²₁,₀.₉₅ and decide not to reject H0 if QM ≤ χ²₁,₀.₉₅. What is the exact probability (i.e., the power) that H0 will be rejected if π11 = 0.80, π10 = 0.10, π01 = 0.05, and π00 = 0.05?

SOLUTIONS

Solution 5.1

(a) To find the form of the MP rejection region, we need to employ the Neyman–Pearson Lemma. Now, with x = (x1, x2, . . . , xn), we have

$$L(\mathbf{x}; \theta) = \prod_{i=1}^{n}\left(\theta x_i^{\theta-1}\right) = \theta^n\left(\prod_{i=1}^{n} x_i\right)^{\theta-1}.$$

In particular, for n = 1, $L(x; \theta) = \theta x_1^{\theta-1}$. So,

$$\frac{L(x; 1)}{L(x; 2)} = \frac{(1)x_1^{1-1}}{(2)x_1^{2-1}} = (2x_1)^{-1} \leq k.$$

Thus, x1 ≥ kα is the form of the MP rejection region.


Under H0 : θ = 1, fX1(x1; 1) = 1, 0 < x1 < 1, so that kα = 0.95; i.e., we reject H0 if x1 > 0.95.

So,

$$\text{POWER} = pr\{X_1 > 0.95 \mid \theta = 2\} = \int_{0.95}^{1} 2x_1^{2-1}\,dx_1 = 0.0975.$$

(b) For n = 2, $L(\mathbf{x}; \theta) = \theta^2(x_1x_2)^{\theta-1}$. So,

$$\frac{L(\mathbf{x}; 1)}{L(\mathbf{x}; 2)} = \frac{1}{4x_1x_2} \leq k.$$

Thus, x1x2 ≥ kα is the form of the MP rejection region. Under H0 : θ = 1,

$$f_{X_1,X_2}(x_1, x_2; 1) = f_{X_1}(x_1; 1)f_{X_2}(x_2; 1) = (1)(1) = 1, \quad 0 < x_1 < 1,\ 0 < x_2 < 1.$$

So, we need to pick kα such that pr[(X1, X2) ∈ R | H0 : θ = 1] = 0.05. In other words, we must choose kα such that

$$\int_{k_\alpha}^{1}\int_{k_\alpha/x_1}^{1}(1)\,dx_2\,dx_1 = 0.05 \Rightarrow \left[x_1 - k_\alpha\ln x_1\right]_{k_\alpha}^{1} = 0.05 \Rightarrow 1 - [k_\alpha - k_\alpha\ln k_\alpha] = 0.05 \Rightarrow k_\alpha \approx 0.70.$$

Solution 5.2

(a) With y = (y1, y2, . . . , yn),

$$L(\mathbf{y}; \theta) = (1+\theta)^n\prod_{i=1}^{n}(y_i+\theta)^{-2}.$$

The MP rejection region has the form

$$\frac{L(\mathbf{y}; 0)}{L(\mathbf{y}; 1)} = \frac{\prod_{i=1}^{n}(y_i+0)^{-2}}{2^n\prod_{i=1}^{n}(y_i+1)^{-2}} = 2^{-n}\prod_{i=1}^{n}\left(\frac{y_i+1}{y_i}\right)^2 \leq k$$

or, equivalently,

$$\prod_{i=1}^{n}\left(1+y_i^{-1}\right)^2 \leq 2^n k.$$

So,

$$R = \left\{(y_1, y_2, \ldots, y_n) : \prod_{i=1}^{n}\left(1+y_i^{-1}\right)^2 \leq k_\alpha\right\},$$

where kα is chosen so that pr{(Y1, Y2, . . . , Yn) ∈ R | H0 : θ = 0} = α.


(b) If n = 1, we need to find kα such that

$$pr\left\{\left(1 + Y_1^{-1}\right)^2 < k_\alpha \,\middle|\, H_0 : \theta = 0\right\} = 0.05.$$

Since y1 > 1, we have kα > 1, so that

$$pr\left\{\left(1 + Y_1^{-1}\right)^2 < k_\alpha \,\middle|\, H_0\right\} = pr\left\{1 + Y_1^{-1} < \sqrt{k_\alpha} \,\middle|\, H_0\right\} = pr\left\{Y_1 > \left(\sqrt{k_\alpha} - 1\right)^{-1} \,\middle|\, H_0\right\} = \int_{(\sqrt{k_\alpha}-1)^{-1}}^{\infty} y_1^{-2}\,dy_1 = \left[-y_1^{-1}\right]_{(\sqrt{k_\alpha}-1)^{-1}}^{+\infty} = \left(\sqrt{k_\alpha} - 1\right) = 0.05,$$

so that kα = (1.05)² = 1.1025, or k′α = (√kα − 1)⁻¹ = 1/0.05 = 20.

(c)

$$\text{POWER} = pr(Y_1 > 20 \mid \theta = 1) = \int_{20}^{\infty} 2(y_1+1)^{-2}\,dy_1 = 2\left[-(y_1+1)^{-1}\right]_{20}^{\infty} = \frac{2}{21} = 0.0952.$$

The power is very small because n = 1.

Solution 5.3. For any particular σ₁² > 1, the optimal rejection region for a most powerful (MP) test of H0 : σ² = 1 versus H1 : σ² = σ₁² has the form L(y; 1)/L(y; σ₁²) ≤ k, where y = (y1, y2, . . . , yn). Since

$$L(\mathbf{y}; \sigma^2) = \prod_{i=1}^{n}\left\{\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y_i^2/2\sigma^2}\right\} = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\, e^{-(1/2\sigma^2)\sum_{i=1}^{n} y_i^2},$$

the optimal rejection region has the structure

$$\frac{L(\mathbf{y}; 1)}{L(\mathbf{y}; \sigma_1^2)} = \frac{(2\pi)^{-n/2}\, e^{-\frac{1}{2}\sum_{i=1}^{n} y_i^2}}{(2\pi)^{-n/2}(\sigma_1^2)^{-n/2}\, e^{-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n} y_i^2}} = (\sigma_1^2)^{n/2}\, e^{\left(\frac{1}{2\sigma_1^2} - \frac{1}{2}\right)\sum_{i=1}^{n} y_i^2} \leq k.$$

Since $\left(1/2\sigma_1^2 - 1/2\right) < 0$ when σ₁² > 1, the MP test rejects when $\sum_{i=1}^{n} y_i^2$ is large, that is, when $\sum_{i=1}^{n} y_i^2 \geq k'$ for some appropriately chosen k′. Because we obtain the same optimal rejection region for all σ₁² > 1, we have a UMP test. Under H0 : σ² = 1, $\sum_{i=1}^{n} Y_i^2 \sim \chi^2_n$; so, the appropriate critical value k′ for an α-level test is $k' = \chi^2_{n,1-\alpha}$ because $pr\left(\sum_{i=1}^{n} Y_i^2 \geq \chi^2_{n,1-\alpha} \,\middle|\, H_0 : \sigma^2 = 1\right) = \alpha$. Now,

$$\text{POWER} = pr\left[\sum_{i=1}^{n} Y_i^2 \geq \chi^2_{n,0.95} \,\middle|\, \sigma^2 = 2\right] = pr\left[\sum_{i=1}^{n}\left(\frac{Y_i}{\sqrt{2}}\right)^2 \geq \frac{\chi^2_{n,0.95}}{2} \,\middle|\, \sigma^2 = 2\right] = pr\left[\chi^2_n \geq \frac{\chi^2_{n,0.95}}{2}\right],$$

since $Y_i/\sqrt{2} \sim N(0, 1)$ when σ² = 2.

We want to find the smallest n (say, n∗) such that this probability is at least 0.80. By inspection of chi-square tables, we find n∗ = 25. Also, by the Central Limit Theorem, since $Z_i = Y_i/\sqrt{2} \sim N(0, 1)$ when σ² = 2,

$$\text{POWER} = pr\left[\sum_{i=1}^{n} Z_i^2 \geq \frac{\chi^2_{n,0.95}}{2}\right] = pr\left[\frac{\sum_{i=1}^{n} Z_i^2 - n}{\sqrt{2n}} \geq \frac{\chi^2_{n,0.95}/2 - n}{\sqrt{2n}}\right] \approx pr\left[Z \geq \frac{\chi^2_{n,0.95}/2 - n}{\sqrt{2n}}\right],$$

where $E(Z_i^2) = 1$, $V(Z_i^2) = 2$, and Z ∼ N(0, 1) for large n. Since Z₀.₂₀ = −0.842, POWER ≥ 0.80 when

$$\frac{(\chi^2_{n,0.95}/2) - n}{\sqrt{2n}} \leq -0.842,$$

or, equivalently, when $\chi^2_{n,0.95} \leq 2\left[n - 0.842\sqrt{2n}\right]$, which is satisfied by a minimum value of n∗ = 25.
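A quick sketch verifying n∗ = 25 directly from the exact power expression derived above (the exact expression, not the normal approximation, is assumed here):

```python
# Smallest n with power >= 0.80 for Solution 5.3.
from scipy.stats import chi2

n = 1
while chi2.sf(chi2.ppf(0.95, df=n) / 2.0, df=n) < 0.80:
    n += 1
print(n)  # 25
```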

Solution 5.4. (a) With x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn), we have

$$L(\mathbf{x}, \mathbf{y}; \theta_1, \theta_2) = \prod_{i=1}^{n}\theta_1^{x_i}(1-\theta_1)^{1-x_i}\cdot\prod_{i=1}^{n}\theta_2^{y_i}(1-\theta_2)^{1-y_i} = \theta_1^{s_x}(1-\theta_1)^{n-s_x}\,\theta_2^{s_y}(1-\theta_2)^{n-s_y},$$


where $s_x = \sum_{i=1}^{n} x_i$ and $s_y = \sum_{i=1}^{n} y_i$. So, using the Neyman–Pearson Lemma,

$$\frac{L(\mathbf{x}, \mathbf{y}; 0.50, 0.50)}{L(\mathbf{x}, \mathbf{y}; 0.60, 0.60)} = \frac{(0.50)^{s_x}(0.50)^{n-s_x}(0.50)^{s_y}(0.50)^{n-s_y}}{(0.60)^{s_x}(0.40)^{n-s_x}(0.60)^{s_y}(0.40)^{n-s_y}} = \frac{(0.50)^{2n}}{(0.60)^{(s_x+s_y)}(0.40)^{2n-(s_x+s_y)}} = \left(\frac{2}{3}\right)^{(s_x+s_y)}\left(\frac{5}{4}\right)^{2n} \leq k \Longrightarrow (s_x + s_y) \geq k'$$

is the structure of the MP region. When θ1 = θ2 = θ, S = (Sx + Sy) ∼ BIN(2n, θ). So, by the Central Limit Theorem, under H0 : θ = 1/2,

$$\frac{S - n}{\sqrt{n/2}} \sim N(0, 1)$$

for large n. So, for a size α = 0.05 test of H0 versus H1,

$$\text{POWER} = pr\left\{\frac{S - n}{\sqrt{n/2}} > 1.645 \,\middle|\, H_1\right\} = pr\left\{S > 1.645\sqrt{\frac{n}{2}} + n \,\middle|\, H_1\right\} = pr\left\{\frac{S - 2n(0.60)}{\sqrt{2n(0.60)(0.40)}} > \frac{1.645\sqrt{\frac{n}{2}} + n - 2n(0.60)}{\sqrt{2n(0.60)(0.40)}} \,\middle|\, H_1\right\} \approx pr\left\{Z > \frac{1.645\sqrt{\frac{30}{2}} + 30 - 2(30)(0.60)}{\sqrt{2(30)(0.60)(0.40)}} \,\middle|\, H_1\right\},$$

where Z ∼ N(0, 1). So

$$\text{POWER} \approx pr(Z > 0.0978) = 1 - \Phi(0.0978) = 0.46.$$

(b) Under H0 : θ1 = θ2 = θ0,

$$\frac{(\bar{X} - \bar{Y})}{\sqrt{\dfrac{2\theta_0(1-\theta_0)}{n}}} \sim N(0, 1)$$

for reasonably large n. So,

$$\text{POWER} = pr\left\{\frac{(\bar{X} - \bar{Y})}{\sqrt{\frac{2\theta_0(1-\theta_0)}{n}}} > 1.645 \,\middle|\, (\theta_1 - \theta_2) \geq 0.20\right\} \geq pr\left\{\frac{(\bar{X} - \bar{Y})}{\sqrt{\frac{2\theta_0(1-\theta_0)}{n}}} > 1.645 \,\middle|\, (\theta_1 - \theta_2) = 0.20\right\}$$

$$= pr\left\{(\bar{X} - \bar{Y}) > 1.645\sqrt{\frac{2\theta_0(1-\theta_0)}{n}} \,\middle|\, (\theta_1 - \theta_2) = 0.20\right\} = pr\left\{\frac{(\bar{X} - \bar{Y}) - 0.20}{\sqrt{\frac{\theta_1(1-\theta_1)}{n} + \frac{\theta_2(1-\theta_2)}{n}}} > \frac{1.645\sqrt{\frac{2\theta_0(1-\theta_0)}{n}} - 0.20}{\sqrt{\frac{\theta_1(1-\theta_1)}{n} + \frac{\theta_2(1-\theta_2)}{n}}}\right\} \approx pr\left\{Z > \frac{1.645\sqrt{\frac{2\theta_0(1-\theta_0)}{n}} - 0.20}{\sqrt{\frac{\theta_1(1-\theta_1)}{n} + \frac{\theta_2(1-\theta_2)}{n}}}\right\}$$

where Z ∼ N(0, 1). So, for POWER ≥ 0.90, we require

$$\frac{1.645\sqrt{2\theta_0(1-\theta_0)} - 0.20\sqrt{n}}{\sqrt{\theta_1(1-\theta_1) + \theta_2(1-\theta_2)}} \leq -1.282,$$

or

$$n \geq \left[\frac{1.645\sqrt{2\theta_0(1-\theta_0)} + 1.282\sqrt{\theta_1(1-\theta_1) + \theta_2(1-\theta_2)}}{0.20}\right]^2.$$

Now, given that θ1 = (θ2 + 0.20), the quantity [θ1(1 − θ1) + θ2(1 − θ2)] is maximized at θ1 = 0.60 and θ2 = 0.40. So, for θ0 = 0.10 and to cover all (θ1, θ2) values, choose

$$n \geq \left[\frac{1.645\sqrt{2(0.10)(0.90)} + 1.282\sqrt{0.60(0.40) + 0.40(0.60)}}{0.20}\right]^2 = 62.88;$$

so, n∗ = 63.

Solution 5.5

(a) Consider the simple null hypothesis H0: θ = 1 versus the simple alternative hypothesis H1: θ = θ1 (> 1), where θ1 is any specific value of θ greater than 1. Then, from the Neyman–Pearson Lemma and with y = (y1, y2, . . . , yn), the form of the rejection region for a MP test is based on the inequality

$$\frac{L(\mathbf{y}; 1)}{L(\mathbf{y}; \theta_1)} \leq k,$$

or

$$\frac{\left(\prod_{i=1}^{n} x_i^{y_i}\right) e^{-\sum_{i=1}^{n} x_i}\Big/\left(\prod_{i=1}^{n} y_i!\right)}{\theta_1^{\sum_{i=1}^{n} y_i}\left(\prod_{i=1}^{n} x_i^{y_i}\right) e^{-\theta_1\sum_{i=1}^{n} x_i}\Big/\left(\prod_{i=1}^{n} y_i!\right)} \leq k,$$

or

$$\theta_1^{-\sum_{i=1}^{n} y_i}\, e^{(\theta_1 - 1)\sum_{i=1}^{n} x_i} \leq k,$$

or

$$\left(-\sum_{i=1}^{n} y_i\right)(\ln\theta_1) \leq k',$$

or

$$\sum_{i=1}^{n} y_i \geq k'', \quad\text{since } \ln\theta_1 > 0.$$

So, the MP rejection region is R = {S : S ≥ kα}, where $S = \sum_{i=1}^{n} Y_i$. Since S is a discrete random variable, kα is a positive integer chosen so that pr{S ≥ kα | H0: θ = 1} = α. Since this same form of rejection region is obtained for any value θ1 > 1, R is the UMP region for a test of H0: θ = 1 versus H1: θ > 1.

(b) Since the test statistic is $S = \sum_{i=1}^{n} Y_i$, we need to know the distribution of S. Since Yi ∼ POI(θxi), i = 1, 2, . . . , n, and since the {Yi} are mutually independent,

$$M_S(t) = E[e^{tS}] = E\left[e^{t\sum_{i=1}^{n} Y_i}\right] = E\left[\prod_{i=1}^{n} e^{tY_i}\right] = \prod_{i=1}^{n}\left[E(e^{tY_i})\right] = \prod_{i=1}^{n}\left[e^{\theta x_i(e^t - 1)}\right] = e^{\left(\theta\sum_{i=1}^{n} x_i\right)(e^t - 1)} = e^{0.82\theta(e^t - 1)},$$

so that S ∼ POI(0.82θ). So, under H0: θ = 1, S ∼ POI(0.82). So, we need to find k₀.₀₅ such that

$$pr(S \geq k_{0.05} \mid \theta = 1) = 1 - \sum_{s=0}^{k_{0.05}-1}\frac{(0.82)^s e^{-0.82}}{s!} = 0.05,$$

or such that

$$\sum_{s=0}^{k_{0.05}-1}\frac{(0.82)^s e^{-0.82}}{s!} = 0.95.$$

By trial-and-error, k₀.₀₅ = 3. So, for α = 0.05, we reject H0: θ = 1 in favor of H1: θ > 1 when $S = \sum_{i=1}^{n} Y_i \geq 3$. Now, when θ = 5, S ∼ POI(4.10), so that

$$\text{POWER} = pr(S \geq 3 \mid \theta = 5) = 1 - pr(S < 3 \mid \theta = 5) = 1 - \sum_{s=0}^{2}\frac{(4.10)^s e^{-4.10}}{s!} = 1 - 0.2238 = 0.7762.$$

Solution 5.6. (a) Now, with k a constant,

$$\alpha = pr[(\bar{X}_1 - \bar{X}_2) > k \mid H_0] = pr\left[Z > \frac{k}{\sqrt{V}} \,\middle|\, H_0\right],$$

so that we require $k/\sqrt{V} = Z_{1-\alpha}$, or $k = \sqrt{V}\,Z_{1-\alpha}$.

And,

$$(1 - \beta) = pr[(\bar{X}_1 - \bar{X}_2) > k \mid H_1] = pr\left[\frac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{V}} > \frac{(k - \delta)}{\sqrt{V}} \,\middle|\, H_1\right],$$

which, since $[(\bar{X}_1 - \bar{X}_2) - \delta]/\sqrt{V} \sim N(0, 1)$ when H1 is true, requires that

$$\frac{(k - \delta)}{\sqrt{V}} = -Z_{1-\beta}, \quad\text{or}\quad k = -\sqrt{V}\,Z_{1-\beta} + \delta.$$

Finally, the equation

$$\sqrt{V}\,Z_{1-\alpha} = -\sqrt{V}\,Z_{1-\beta} + \delta$$

gives the requirement V = θ.

(b) Since the goal is to minimize N = (n1 + n2) with respect to n1 and n2, subject to the constraint V = θ, consider the function

$$Q = (n_1 + n_2) + \lambda\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - \theta\right),$$

where λ is a Lagrange multiplier. Then, simultaneously solving the two equations

$$\frac{\partial Q}{\partial n_1} = 1 - \frac{\lambda\sigma_1^2}{n_1^2} = 0 \quad\text{and}\quad \frac{\partial Q}{\partial n_2} = 1 - \frac{\lambda\sigma_2^2}{n_2^2} = 0$$


gives

$$\frac{n_1^2}{n_2^2} = \frac{\sigma_1^2}{\sigma_2^2} \quad\text{or}\quad \frac{n_1}{n_2} = \frac{\sigma_1}{\sigma_2}.$$

Finally, if N = 100, σ₁² = 9, and σ₂² = 4, then n1/n2 = 1.5, and the equation (n1 + n2) = (1.5n2 + n2) = 2.5n2 = 100 gives n2 = 40 and n1 = 60.

(c) Let C denote the cost of selecting an observation from Population 2, so that the total sampling cost is (4Cn1 + Cn2) = C(4n1 + n2). So, we want to minimize the function C(4n1 + n2) with respect to n1 and n2, subject to the constraint V = θ. Again using Lagrange multipliers, if

$$Q = C(4n_1 + n_2) + \lambda\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - \theta\right),$$

then the equation

$$\frac{\partial Q}{\partial n_1} = 0 \ \text{gives}\ \lambda = \frac{4Cn_1^2}{\sigma_1^2}, \quad\text{and the equation}\quad \frac{\partial Q}{\partial n_2} = 0 \ \text{gives}\ \lambda = \frac{Cn_2^2}{\sigma_2^2},$$

implying that n1/n2 = σ1/2σ2. And, since the equation

$$\frac{\partial Q}{\partial\lambda} = 0 \quad\text{gives}\quad V = \left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) = \theta,$$

so that

$$n_1 = \frac{\left[\sigma_1^2 + (n_1/n_2)\,\sigma_2^2\right]}{\theta},$$

we obtain

$$n_1 = \frac{\left[\sigma_1^2 + (\sigma_1/2\sigma_2)\,\sigma_2^2\right]}{\theta} = \frac{\left(\sigma_1^2 + (\sigma_1\sigma_2)/2\right)}{\theta}$$

and

$$n_2 = \left(\frac{2\sigma_2}{\sigma_1}\right)n_1 = \frac{\left(2\sigma_1\sigma_2 + \sigma_2^2\right)}{\theta}.$$

Then, with σ1 = 5, σ2 = 4, α = 0.05, β = 0.10, and δ = 3, we have Z1−α = Z0.95 = 1.645, Z1−β = Z0.90 = 1.282, and V = (3)²/(1.645 + 1.282)² = 1.0505. Using these values, we obtain n1 = 33.3175 and n2 = 53.3079; in practice, one would use n1 = 34 and n2 = 54.

Solution 5.7

(a) Note that the CDF of X is

$$F_X(x) = pr(X \leq x) = \int_0^x \theta^{-1}\,dt = \frac{x}{\theta}, \quad 0 < x < \theta.$$

Hence, it follows that

$$\alpha = pr(\text{Type I error}) = pr\left[X_{(n)} > c^* \,\middle|\, H_0 : \theta = \tfrac{1}{2}\right] = 1 - pr\left\{\bigcap_{i=1}^{n}(X_i \leq c^*) \,\middle|\, H_0 : \theta = \tfrac{1}{2}\right\} = 1 - \prod_{i=1}^{n}\left[pr\left(X_i \leq c^* \,\middle|\, H_0 : \theta = \tfrac{1}{2}\right)\right] = 1 - \left[\frac{c^*}{(1/2)}\right]^n = 1 - (2c^*)^n$$

$$\Rightarrow (2c^*)^n = (1 - \alpha) \Rightarrow c^* = \frac{(1 - \alpha)^{1/n}}{2}.$$

For 0 < α < 1, note that 0 < c∗ < 1/2.

(b) When α = 0.05, c∗ = (0.95)^{1/n}/2. So,

$$0.98 \leq \text{POWER} = pr\left\{X_{(n)} > c^* \,\middle|\, \theta = \tfrac{3}{4}\right\} = 1 - pr\left\{\bigcap_{i=1}^{n}(X_i \leq c^*) \,\middle|\, \theta = \tfrac{3}{4}\right\} = 1 - \prod_{i=1}^{n} pr\left\{X_i \leq \frac{(0.95)^{1/n}}{2} \,\middle|\, \theta = \tfrac{3}{4}\right\} = 1 - \left[\frac{(0.95)^{1/n}/2}{(3/4)}\right]^n = 1 - (0.95)\left(\frac{2}{3}\right)^n$$

$$\Rightarrow -0.02 \leq -(0.95)\left(\frac{2}{3}\right)^n \Rightarrow \left(\frac{2}{3}\right)^n \leq 0.0211 \Rightarrow n^* = 10.$$

Solution 5.8

(a) Under H1, and with x = (x1, x2, . . . , xn) and μ = (μ1, μ2, . . . , μn), the (unrestricted) likelihood and log-likelihood functions are

$$L(\mathbf{x}; \boldsymbol{\mu}) = \prod_{i=1}^{n}\left[\frac{\mu_i^{x_i}\, e^{-\mu_i}}{x_i!}\right] = \left(\prod_{i=1}^{n}\mu_i^{x_i}\right) e^{-\sum_{i=1}^{n}\mu_i}\left(\prod_{i=1}^{n} x_i!\right)^{-1}$$

and

$$\ln L(\mathbf{x}; \boldsymbol{\mu}) = \sum_{i=1}^{n} x_i\ln\mu_i - \sum_{i=1}^{n}\mu_i - \sum_{i=1}^{n}\ln x_i!.$$

Solving

$$\frac{\partial\ln L(\mathbf{x}; \boldsymbol{\mu})}{\partial\mu_i} = \frac{x_i}{\mu_i} - 1 = 0$$

yields the (unrestricted) MLEs μ̂i = xi, i = 1, 2, . . . , n. Thus, with μ̂ = x, we have

$$L(\mathbf{x}; \hat{\boldsymbol{\mu}}) = \left(\prod_{i=1}^{n} x_i^{x_i}\right) e^{-\sum_{i=1}^{n} x_i}\left(\prod_{i=1}^{n} x_i!\right)^{-1}.$$

Under H0, the likelihood and log-likelihood functions are

$$L(\mathbf{x}; \mu) = \mu^{\sum_{i=1}^{n} x_i}\, e^{-n\mu}\left(\prod_{i=1}^{n} x_i!\right)^{-1}$$

and

$$\ln L(\mathbf{x}; \mu) = \left(\sum_{i=1}^{n} x_i\right)\ln\mu - n\mu - \sum_{i=1}^{n}\ln x_i!.$$

Solving

$$\frac{\partial\ln L(\mathbf{x}; \mu)}{\partial\mu} = \frac{\sum_{i=1}^{n} x_i}{\mu} - n = 0$$

yields the (restricted) MLE $\hat{\mu} = \bar{x} = n^{-1}\sum_{i=1}^{n} x_i$. Thus,

$$L(\mathbf{x}; \hat{\mu}) = (\bar{x})^{n\bar{x}}\, e^{-n\bar{x}}\left(\prod_{i=1}^{n} x_i!\right)^{-1}.$$

So, the likelihood ratio statistic is

$$\hat{\lambda} = \frac{L(\mathbf{x}; \hat{\mu})}{L(\mathbf{x}; \hat{\boldsymbol{\mu}})} = \frac{(\bar{x})^{n\bar{x}}\, e^{-n\bar{x}}\left(\prod_{i=1}^{n} x_i!\right)^{-1}}{\left(\prod_{i=1}^{n} x_i^{x_i}\right) e^{-n\bar{x}}\left(\prod_{i=1}^{n} x_i!\right)^{-1}} = \frac{(\bar{x})^{n\bar{x}}}{\left(\prod_{i=1}^{n} x_i^{x_i}\right)}.$$

So,

$$\ln\hat{\lambda} = (n\bar{x})\ln\bar{x} - \sum_{i=1}^{n} x_i\ln x_i = (\ln\bar{x})\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i\ln x_i = \sum_{i=1}^{n} x_i(\ln\bar{x} - \ln x_i) = \sum_{i=1}^{n} x_i\ln\left(\frac{\bar{x}}{x_i}\right),$$

so that

$$-2\ln\hat{\lambda} = 2\sum_{i=1}^{n} x_i\ln\left(\frac{x_i}{\bar{x}}\right).$$

Under H0: μ1 = μ2 = · · · = μn, $-2\ln\hat{\lambda} \sim \chi^2_{(n-1)}$ for large n. For the given data set,

$$\bar{x} = \frac{20(5) + 10(6) + 10(8)}{40} = \frac{240}{40} = 6,$$

so that

$$-2\ln\hat{\lambda} = 2\left\{20\left[5\ln\left(\frac{5}{6}\right)\right] + 10\left[6\ln\left(\frac{6}{6}\right)\right] + 10\left[8\ln\left(\frac{8}{6}\right)\right]\right\} = 2[100(-0.1823) + 0 + 80(0.2877)] = 2(23.015 - 18.230) = 9.570.$$

Since χ²₃₉,₀.₉₅ > 50, we do not reject H0.
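A quick numerical check of this statistic for the given data (20 counts of 5, 10 counts of 6, and 10 counts of 8):

```python
# Likelihood ratio statistic check for Solution 5.8(a).
from math import log

data = [5] * 20 + [6] * 10 + [8] * 10
xbar = sum(data) / len(data)  # = 6
stat = 2 * sum(x * log(x / xbar) for x in data)
print(stat)  # about 9.57, well below the chi-square(39) critical value
```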

(b) Based on the results in part (a), there is no evidence to reject H0. Hence, L(x; μ) is the appropriate likelihood to use. Since

$$\frac{\partial\ln L(\mathbf{x}; \mu)}{\partial\mu} = \frac{\sum_{i=1}^{n} x_i}{\mu} - n = n\left(\frac{\bar{x}}{\mu} - 1\right) = \frac{n}{\mu}(\bar{x} - \mu),$$

it follows from exponential family theory that X̄ is the MVBUE of μ. Hence, a CI based on X̄ would be an appropriate choice.


From ML theory (or Central Limit Theorem theory),

$$\frac{\bar{X} - \mu}{\sqrt{V(\bar{X})}} = \frac{\bar{X} - \mu}{\sqrt{\mu/n}} \sim N(0, 1)$$

for large n. Since X̄ is consistent for μ, by Slutsky's Theorem,

$$\frac{\bar{X} - \mu}{\sqrt{\bar{X}/n}} \sim N(0, 1)$$

for large n. Thus, an appropriate large-sample 100(1 − α)% CI for μ is $\bar{X} \pm Z_{1-\alpha/2}\sqrt{\bar{X}/n}$. For the given data set, and for α = 0.05, we have $6 \pm 1.96\sqrt{6/40} = 6 \pm 0.759$, giving (5.241, 6.759) as the computed 95% CI for μ.

Solution 5.9

(a) With x = (x11, x12, . . . , x1n; x21, x22, . . . , x2n),

$$L(\mathbf{x}; \lambda_1, \lambda_2) = \prod_{i=1}^{2}\prod_{j=1}^{n}\left\{\frac{(L_{ij}\lambda_i)^{x_{ij}}\, e^{-L_{ij}\lambda_i}}{x_{ij}!}\right\} = \left\{\lambda_1^{\sum_{j=1}^{n} x_{1j}}\,\lambda_2^{\sum_{j=1}^{n} x_{2j}}\, e^{-\lambda_1\sum_{j=1}^{n} L_{1j}}\, e^{-\lambda_2\sum_{j=1}^{n} L_{2j}}\right\}\times\left\{\left(\prod_{i=1}^{2}\prod_{j=1}^{n}(x_{ij}!)^{-1}\right)\left(\prod_{i=1}^{2}\prod_{j=1}^{n} L_{ij}^{x_{ij}}\right)\right\},$$

so $\sum_{j=1}^{n} X_{1j}$ and $\sum_{j=1}^{n} X_{2j}$ are jointly sufficient for λ1 and λ2 by the Factorization Theorem.

(b)

$$\ln L(\mathbf{x}; \lambda_1, \lambda_2) = \text{constant} + \left(\sum_{j=1}^{n} x_{1j}\right)\ln\lambda_1 + \left(\sum_{j=1}^{n} x_{2j}\right)\ln\lambda_2 - \lambda_1\sum_{j=1}^{n} L_{1j} - \lambda_2\sum_{j=1}^{n} L_{2j}.$$


Solving for λi in the equation

$$\frac{\partial\ln L(\mathbf{x}; \lambda_1, \lambda_2)}{\partial\lambda_i} = \frac{\sum_{j=1}^{n} x_{ij}}{\lambda_i} - \sum_{j=1}^{n} L_{ij} = 0$$

yields the MLE

$$\hat{\lambda}_i = \sum_{j=1}^{n} X_{ij}\bigg/\sum_{j=1}^{n} L_{ij}, \quad i = 1, 2.$$

(c) Under H0 : λ1 = λ2 (= λ, say),

$$\ln L(\mathbf{x}; \lambda) = \text{constant} + \left(\sum_{j=1}^{n} x_{1j} + \sum_{j=1}^{n} x_{2j}\right)\ln\lambda - \lambda\left(\sum_{j=1}^{n} L_{1j} + \sum_{j=1}^{n} L_{2j}\right).$$

Solving the equation

$$\frac{\partial\ln L(\mathbf{x}; \lambda)}{\partial\lambda} = \frac{\left(\sum_{j=1}^{n} x_{1j} + \sum_{j=1}^{n} x_{2j}\right)}{\lambda} - \left(\sum_{j=1}^{n} L_{1j} + \sum_{j=1}^{n} L_{2j}\right) = 0$$

yields the MLE

$$\hat{\lambda} = \frac{\sum_{j=1}^{n}(x_{1j} + x_{2j})}{\sum_{j=1}^{n}(L_{1j} + L_{2j})}.$$

(d) Now,

$$\hat{\lambda} = \frac{L_{\hat{\omega}}}{L_{\hat{\Omega}}} = \frac{\prod_{i=1}^{2}\left\{\left(\prod_{j=1}^{n} L_{ij}^{x_{ij}}\right)\hat{\lambda}^{\sum_{j=1}^{n} x_{ij}}\, e^{-\hat{\lambda}\sum_{j=1}^{n} L_{ij}}\Big/\prod_{j=1}^{n} x_{ij}!\right\}}{\prod_{i=1}^{2}\left\{\left(\prod_{j=1}^{n} L_{ij}^{x_{ij}}\right)\hat{\lambda}_i^{\sum_{j=1}^{n} x_{ij}}\, e^{-\hat{\lambda}_i\sum_{j=1}^{n} L_{ij}}\Big/\prod_{j=1}^{n} x_{ij}!\right\}} = \frac{\hat{\lambda}^{\sum_{j=1}^{n}(x_{1j}+x_{2j})}\, e^{-\hat{\lambda}\left(\sum_{j=1}^{n} L_{1j}+\sum_{j=1}^{n} L_{2j}\right)}}{\hat{\lambda}_1^{\sum_{j=1}^{n} x_{1j}}\,\hat{\lambda}_2^{\sum_{j=1}^{n} x_{2j}}\, e^{-\hat{\lambda}_1\sum_{j=1}^{n} L_{1j}}\, e^{-\hat{\lambda}_2\sum_{j=1}^{n} L_{2j}}}.$$

And, for large n,

$$-2\ln\left(\frac{L_{\hat{\omega}}}{L_{\hat{\Omega}}}\right) \sim \chi^2_1$$

under H0 : λ1 = λ2.


(e) From part (d),

$$-2\ln(L_{\hat{\omega}}/L_{\hat{\Omega}}) = -2\left\{\left(\sum_{j=1}^{n} x_{1j} + \sum_{j=1}^{n} x_{2j}\right)\ln\hat{\lambda} - \hat{\lambda}\left(\sum_{j=1}^{n} L_{1j} + \sum_{j=1}^{n} L_{2j}\right) - \left(\sum_{j=1}^{n} x_{1j}\right)\ln\hat{\lambda}_1 - \left(\sum_{j=1}^{n} x_{2j}\right)\ln\hat{\lambda}_2 + \hat{\lambda}_1\sum_{j=1}^{n} L_{1j} + \hat{\lambda}_2\sum_{j=1}^{n} L_{2j}\right\}$$

$$= -2\left\{(4+9)\ln\left(\frac{4+9}{200+300}\right) - \left(\frac{4+9}{200+300}\right)(200+300) - (4)\ln(0.02) - (9)\ln(0.03) + 0.02(200) + 0.03(300)\right\} = 0.477.$$

Since χ²₁,₀.₉₀ = 2.706, we do not reject H0 : λ1 = λ2, and the

$$\text{P-value} = pr\left\{\chi^2_1 > 0.477 \,\middle|\, H_0 : \lambda_1 = \lambda_2\right\} \approx 0.50.$$

Solution 5.10

(a) Under H0 : α = β (= γ, say), the restricted likelihood is $L_\omega = \gamma^{2n}\, e^{-n\gamma(\bar{x}+\bar{y})}$. So,

$$\frac{\partial\ln(L_\omega)}{\partial\gamma} = \frac{2n}{\gamma} - n(\bar{x}+\bar{y}) = 0 \quad\text{gives}\quad \hat{\gamma}_\omega = 2(\bar{x}+\bar{y})^{-1}.$$

Thus,

$$L_{\hat{\omega}} = \hat{\gamma}_\omega^{2n}\, e^{-n\hat{\gamma}_\omega(\bar{x}+\bar{y})} = \left[\frac{2}{(\bar{x}+\bar{y})}\right]^{2n} e^{-2n}.$$

Under H1 : α ≠ β, the unrestricted likelihood is $L_\Omega = \alpha^n e^{-n\alpha\bar{x}}\,\beta^n e^{-n\beta\bar{y}}$. Thus,

$$\frac{\partial\ln(L_\Omega)}{\partial\alpha} = \frac{n}{\alpha} - n\bar{x} = 0 \ \text{gives}\ \hat{\alpha}_\Omega = (\bar{x})^{-1}, \quad\text{and}\quad \frac{\partial\ln(L_\Omega)}{\partial\beta} = \frac{n}{\beta} - n\bar{y} = 0 \ \text{gives}\ \hat{\beta}_\Omega = (\bar{y})^{-1}.$$

Thus,

$$L_{\hat{\Omega}} = \left(\bar{x}^{-1}\right)^n e^{-n\bar{x}^{-1}\bar{x}}\left(\bar{y}^{-1}\right)^n e^{-n\bar{y}^{-1}\bar{y}} = (\bar{x}\bar{y})^{-n} e^{-2n}.$$


Finally, the likelihood ratio statistic λ̂ can be written as

$$\hat{\lambda} = \frac{L_{\hat{\omega}}}{L_{\hat{\Omega}}} = \left[\frac{4\bar{x}\bar{y}}{(\bar{x}+\bar{y})^2}\right]^n = [4u(1-u)]^n, \quad\text{with}\quad u = \frac{\bar{x}}{(\bar{x}+\bar{y})}.$$

(b) For the given set of data, λ̂ = 0.0016. For large n and under H0 : α = β, the random variable $-2\ln(\hat{\lambda}) \sim \chi^2_1$. So,

$$\text{P-value} \approx pr[-2\ln(\hat{\lambda}) > -2\ln(0.0016)] < 0.0005.$$

Since E(X) = 1/α and E(Y) = 1/β, the available data provide strong statistical evidence that the two surgical procedures lead to different true average survival times for patients with advanced colon cancer.

Solution 5.11

(a) With x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn), the unconstrained likelihood has the form

$$L_\Omega = L(\mathbf{x}, \mathbf{y}; \theta_1, \theta_2) = \prod_{i=1}^{n}\left[\theta_1(1-\theta_1)^{x_i}\,\theta_2(1-\theta_2)^{y_i}\right] = \theta_1^n(1-\theta_1)^{s_x}\,\theta_2^n(1-\theta_2)^{s_y},$$

where $s_x = \sum_{i=1}^{n} x_i$ and $s_y = \sum_{i=1}^{n} y_i$. So, ln LΩ = n ln θ1 + sx ln(1 − θ1) + n ln θ2 + sy ln(1 − θ2). Solving ∂ln LΩ/∂θ1 = 0 and ∂ln LΩ/∂θ2 = 0 yields the unconstrained MLEs θ̂1 = 1/(1 + x̄) and θ̂2 = 1/(1 + ȳ). Substituting these unrestricted MLEs for θ1 and θ2 into the expression for LΩ gives

$$L_{\hat{\Omega}} = \left(\frac{1}{1+\bar{x}}\right)^n\left(\frac{\bar{x}}{1+\bar{x}}\right)^{n\bar{x}}\left(\frac{1}{1+\bar{y}}\right)^n\left(\frac{\bar{y}}{1+\bar{y}}\right)^{n\bar{y}}.$$

Also, when θ1 = θ2 = θ, the constrained likelihood has the form $L_\omega = \theta^{2n}(1-\theta)^{(s_x+s_y)}$, so that ln Lω = 2n ln θ + (sx + sy) ln(1 − θ). Solving ∂ln Lω/∂θ = 0 yields the constrained MLE θ̂ = 2/(2 + x̄ + ȳ). Then, substituting this restricted MLE for θ into the expression for Lω gives

$$L_{\hat{\omega}} = \left(\frac{2}{2+\bar{x}+\bar{y}}\right)^{2n}\left(\frac{\bar{x}+\bar{y}}{2+\bar{x}+\bar{y}}\right)^{n(\bar{x}+\bar{y})}.$$

So,

$$\hat{\lambda} = \frac{L_{\hat{\omega}}}{L_{\hat{\Omega}}} = \frac{2^{2n}(1+\bar{x})^{n(1+\bar{x})}(1+\bar{y})^{n(1+\bar{y})}(\bar{x}+\bar{y})^{n(\bar{x}+\bar{y})}}{(\bar{x})^{n\bar{x}}(\bar{y})^{n\bar{y}}(2+\bar{x}+\bar{y})^{n(2+\bar{x}+\bar{y})}}.$$

When n = 25, x̄ = 1.00, and ȳ = 2.00, −2 ln λ̂ = 3.461. Under H0, $-2\ln\hat{\lambda} \sim \chi^2_1$ when n is large. Since χ²₁,₀.₉₅ = 3.841, we do not reject H0 at the α = 0.05 level of significance.


(b) First, with θ = (θ1, θ2),

$$\mathbf{S}'(\boldsymbol{\theta}) = \begin{bmatrix}\dfrac{\partial\ln L_\Omega}{\partial\theta_1}\\[2mm]\dfrac{\partial\ln L_\Omega}{\partial\theta_2}\end{bmatrix} = \begin{bmatrix}\dfrac{n}{\theta_1} - \dfrac{s_x}{(1-\theta_1)}\\[2mm]\dfrac{n}{\theta_2} - \dfrac{s_y}{(1-\theta_2)}\end{bmatrix} = \begin{bmatrix}\dfrac{n - n\theta_1(1+\bar{x})}{\theta_1(1-\theta_1)}\\[2mm]\dfrac{n - n\theta_2(1+\bar{y})}{\theta_2(1-\theta_2)}\end{bmatrix}.$$

Now, θ̂ = 2/(2 + x̄ + ȳ) = 2/(2 + 1 + 2) = 0.40. So, when n = 25, x̄ = 1.00, ȳ = 2.00, and θ̂ω = (θ̂, θ̂) = (0.40, 0.40), then S(θ̂ω) = (20.8333, −20.8333). Now,

$$\frac{\partial^2\ln L_\Omega}{\partial\theta_1^2} = -\frac{n}{\theta_1^2} - \frac{s_x}{(1-\theta_1)^2} = -\frac{n}{\theta_1^2} - \frac{n\bar{x}}{(1-\theta_1)^2},$$

so that

$$-\frac{\partial^2\ln L_\Omega}{\partial\theta_1^2}\bigg|_{\theta_1=\hat{\theta}=0.40,\ n=25,\ \bar{x}=1.0} = 225.6944.$$

Also,

$$-\frac{\partial^2\ln L_\Omega}{\partial\theta_1\partial\theta_2} = -\frac{\partial^2\ln L_\Omega}{\partial\theta_2\partial\theta_1} = 0.$$

And,

$$\frac{\partial^2\ln L_\Omega}{\partial\theta_2^2} = -\frac{n}{\theta_2^2} - \frac{s_y}{(1-\theta_2)^2} = -\frac{n}{\theta_2^2} - \frac{n\bar{y}}{(1-\theta_2)^2},$$

so that

$$-\frac{\partial^2\ln L_\Omega}{\partial\theta_2^2}\bigg|_{\theta_2=\hat{\theta}=0.40,\ n=25,\ \bar{y}=2.0} = 295.1389.$$

Finally,

$$S = \mathbf{S}(\hat{\boldsymbol{\theta}}_\omega)\,\mathbf{I}^{-1}(\mathbf{x}, \mathbf{y}; \hat{\boldsymbol{\theta}}_\omega)\,\mathbf{S}'(\hat{\boldsymbol{\theta}}_\omega) = (20.8333,\ -20.8333)\begin{bmatrix}\dfrac{1}{225.6944} & 0\\[1mm] 0 & \dfrac{1}{295.1389}\end{bmatrix}\begin{pmatrix}20.8333\\ -20.8333\end{pmatrix} = 3.394.$$

Under H0, $S \sim \chi^2_1$ for large n. Since χ²₁,₀.₉₅ = 3.841, we again do not reject H0 at the α = 0.05 level of significance. Although the numerical values of −2 ln λ̂ and S agree closely in this particular example, this will not always be the case.


(c) First, the actual P-value for either the likelihood ratio test or the score test satisfies the inequality 0.05 < P-value < 0.10. Also, since X̄ and Ȳ are unbiased estimators of E(X) and E(Y), respectively, and since x̄ = 1.00 is half the size of ȳ = 2.00, the data do provide some evidence suggesting that the teenage driver education classes are beneficial. So, the suggestion by the highway safety researcher to increase the sample size is very reasonable; power calculations can be used to choose an appropriate sample size.

Solution 5.12

(a) The unconditional likelihood L(β) is

$$L(\beta) = \prod_{i=1}^{n}\left\{\left(\frac{1}{1+\beta x_i}\right)^{y_i-1}\left(\frac{\beta x_i}{1+\beta x_i}\right)\right\}$$

$$\Rightarrow \ln L(\beta) = \sum_{i=1}^{n}\left\{(y_i - 1)\ln\left(\frac{1}{1+\beta x_i}\right) + \ln\left(\frac{\beta x_i}{1+\beta x_i}\right)\right\} = \sum_{i=1}^{n}\left[\ln(\beta x_i) - y_i\ln(1+\beta x_i)\right]$$

$$\Rightarrow \frac{d\ln L(\beta)}{d\beta} = \sum_{i=1}^{n}\left[\frac{1}{\beta} - \frac{x_i y_i}{(1+\beta x_i)}\right] = 0 \Rightarrow \frac{n}{\beta} = \sum_{i=1}^{n}\frac{x_i y_i}{(1+\beta x_i)} \Rightarrow \hat{\beta} = \frac{n}{\sum_{i=1}^{n} x_i y_i\left(1+\hat{\beta}x_i\right)^{-1}}.$$

(b) From part (a), we know that

$$\frac{d\ln L(\beta)}{d\beta} = \frac{n}{\beta} - \sum_{i=1}^{n}\frac{x_i y_i}{(1+\beta x_i)} \Rightarrow \frac{d^2\ln L(\beta)}{d\beta^2} = -\frac{n}{\beta^2} + \sum_{i=1}^{n}\frac{x_i^2 y_i}{(1+\beta x_i)^2} \Rightarrow -E\left[\frac{d^2\ln L(\beta)}{d\beta^2}\right] = \frac{n}{\beta^2} - \sum_{i=1}^{n}\frac{x_i^2 E(Y_i)}{(1+\beta x_i)^2}.$$

Now, since $E(Y_i) = \theta_i^{-1} = (1+\beta x_i)/\beta x_i$, it follows that

$$I(\beta) = -E\left[\frac{d^2\ln L(\beta)}{d\beta^2}\right] = \frac{n}{\beta^2} - \sum_{i=1}^{n}\frac{x_i^2(1+\beta x_i)}{\beta x_i(1+\beta x_i)^2} = \frac{n}{\beta^2} - \sum_{i=1}^{n}\frac{x_i}{\beta(1+\beta x_i)} = \frac{1}{\beta^2}\left[\sum_{i=1}^{n}\left(1 - \frac{\beta x_i}{(1+\beta x_i)}\right)\right] = \frac{1}{\beta^2}\sum_{i=1}^{n}(1+\beta x_i)^{-1}.$$

So,

$$V(\hat{\beta}) = \left\{-E\left[\frac{d^2\ln L(\beta)}{d\beta^2}\right]\right\}^{-1} = \frac{\beta^2}{\sum_{i=1}^{n}(1+\beta x_i)^{-1}}.$$

(c) For these data,

$$\sum_{i=1}^{n}(1+\hat{\beta}x_i)^{-1} = 50\left[1 + \tfrac{1}{2}(30)\right]^{-1} + 50\left[1 + \tfrac{1}{2}(40)\right]^{-1} = 5.5060,$$

so that

$$\hat{V}(\hat{\beta}) = \left(\tfrac{1}{2}\right)^2\big/\,5.5060 = 0.0454.$$

So, a large-sample 95% CI for β is

$$\hat{\beta} \pm 1.96\sqrt{\hat{V}(\hat{\beta})} = 0.50 \pm 1.96\sqrt{0.0454} = (0.0824,\ 0.9176).$$

(d)

$$I(\beta) = \frac{1}{\beta^2}\sum_{i=1}^{n}(1+\beta x_i)^{-1},$$

so that I(β̂) = I(1/2) = 4(5.5060) = 22.0240. So, $W = \left(\tfrac{1}{2} - 1\right)^2(22.0240) = 5.5060$, since W = (β̂ − β0)I(β̂)(β̂ − β0) and β0 = 1. Since χ²₁,₀.₉₅ = 3.84, we reject H0; P-value = 0.02.

(e)

$$\text{POWER} = pr\{U > 1.96 \mid \beta = 1.10\} = pr\left\{\frac{\hat{\beta} - 1}{\sqrt{V_0(\hat{\beta})}} > 1.96 \,\middle|\, \beta = 1.10\right\} = pr\left\{\hat{\beta} > 1 + 1.96\sqrt{V_0(\hat{\beta})} \,\middle|\, \beta = 1.10\right\}$$

$$= pr\left\{\frac{\hat{\beta} - 1.10}{\sqrt{V(\hat{\beta})}} > \frac{1 + 1.96\sqrt{V_0(\hat{\beta})} - 1.10}{\sqrt{V(\hat{\beta})}}\right\} = pr\left\{Z > \frac{1.96\sqrt{V_0(\hat{\beta})} - 0.10}{\sqrt{V(\hat{\beta})}}\right\},$$

where Z ∼ N(0, 1) for large n. Now, when β = 1,

$$V_0(\hat{\beta}) = \frac{1}{50(1+30)^{-1} + 50(1+40)^{-1}} = 0.3531;$$

and, when β = 1.10,

$$V(\hat{\beta}) = \frac{\beta^2}{\sum_{i=1}^{n}(1+\beta x_i)^{-1}} = \frac{(1.10)^2}{50[1+1.10(30)]^{-1} + 50[1+1.10(40)]^{-1}} = 0.4687.$$

So,

$$\text{POWER} = pr\left\{Z > \frac{1.96\sqrt{0.3531} - 0.10}{\sqrt{0.4687}}\right\} = pr(Z > 1.5552) = 0.06.$$

Solution 5.13

(a) If X is the random variable denoting the number of lung cancer cases developing over this 20-year follow-up period in a random sample of n = 1000 heavy smokers, it is reasonable to assume that X ∼ BIN(n, θ). The maximum likelihood estimator of θ is

$$\hat{\theta} = \frac{X}{n}, \quad\text{with}\quad E(\hat{\theta}) = \theta \quad\text{and}\quad V(\hat{\theta}) = \frac{\theta(1-\theta)}{n}.$$

Since n is large, by the Central Limit Theorem and by Slutsky's Theorem,

$$0.95 \doteq pr\left\{-1.96 < \frac{\hat{\theta} - \theta}{\sqrt{\dfrac{\hat{\theta}(1-\hat{\theta})}{n}}} < 1.96\right\} = pr\{L < \theta < U\},$$

where

$$L = \hat{\theta} - 1.96\sqrt{\frac{\hat{\theta}(1-\hat{\theta})}{n}} \quad\text{and}\quad U = \hat{\theta} + 1.96\sqrt{\frac{\hat{\theta}(1-\hat{\theta})}{n}}.$$

Since ψ = θ/(1 − θ), so that θ = ψ/(1 + ψ) and θ⁻¹ = 1 + ψ⁻¹, and with 0 < L < U,

$$0.95 \doteq pr\left\{U^{-1} < \theta^{-1} < L^{-1}\right\} = pr\left\{U^{-1} < 1 + \psi^{-1} < L^{-1}\right\} = pr\left\{\left(\frac{1}{L} - 1\right)^{-1} < \psi < \left(\frac{1}{U} - 1\right)^{-1}\right\}.$$

Since

$$L = 0.10 - 1.96\sqrt{\frac{0.10(0.90)}{1000}} = 0.0814 \quad\text{and}\quad U = 0.10 + 1.96\sqrt{\frac{0.10(0.90)}{1000}} = 0.1186,$$

a large-sample 95% CI for ψ is

$$\left[\left(\frac{1}{0.0814} - 1\right)^{-1},\ \left(\frac{1}{0.1186} - 1\right)^{-1}\right] = (0.0886,\ 0.1346).$$

Or, we can use ML methods directly. Since

$$L(x; \theta) \propto \theta^x(1-\theta)^{n-x},$$

so that

$$\ln L(x; \psi) \propto \ln\left[\left(\frac{\psi}{1+\psi}\right)^x\left(\frac{1}{1+\psi}\right)^{n-x}\right] = x\ln\left(\frac{\psi}{1+\psi}\right) + (n-x)\ln\left(\frac{1}{1+\psi}\right) = x[\ln\psi - \ln(1+\psi)] - (n-x)\ln(1+\psi),$$

we have

$$\frac{\partial\ln L(x; \psi)}{\partial\psi} = \frac{x}{\psi} - \frac{x}{(1+\psi)} - \frac{(n-x)}{(1+\psi)}.$$


So,

$$\frac{\partial\ln L(x; \psi)}{\partial\psi} = 0 \quad\text{gives}\quad \hat{\psi} = \frac{X}{(n - X)} = \frac{\hat{\theta}}{(1-\hat{\theta})},$$

as expected. Now,

$$\frac{\partial^2\ln L(x; \psi)}{\partial\psi^2} = -\frac{x}{\psi^2} + \frac{x}{(1+\psi)^2} + \frac{(n-x)}{(1+\psi)^2} = -\frac{x}{\psi^2} + \frac{n}{(1+\psi)^2}.$$

So, using observed information, a large-sample 95% CI for ψ is

$$\hat{\psi} \pm 1.96\left[\frac{x}{\hat{\psi}^2} - \frac{n}{(1+\hat{\psi})^2}\right]^{-1/2} = 0.1111 \pm 1.96\left[\frac{100}{(0.1111)^2} - \frac{1000}{(1+0.1111)^2}\right]^{-1/2} = 0.1111 \pm 1.96(8101.6202 - 810.0162)^{-1/2} = 0.1111 \pm 0.0230 = (0.0881,\ 0.1341).$$

Or, since

$$-E_x\left[\frac{\partial^2\ln L(x; \psi)}{\partial\psi^2}\right] = \frac{n\theta}{\psi^2} - \frac{n}{(1+\psi)^2} = \frac{n}{\psi(1+\psi)} - \frac{n}{(1+\psi)^2},$$

a 95% CI for ψ using expected information is

$$\hat{\psi} \pm 1.96\left[\frac{n}{\hat{\psi}(1+\hat{\psi})} - \frac{n}{(1+\hat{\psi})^2}\right]^{-1/2} = 0.1111 \pm 1.96\left[\frac{1000}{0.1111(1+0.1111)} - \frac{1000}{(1+0.1111)^2}\right]^{-1/2} = 0.1111 \pm 0.0230 = (0.0881,\ 0.1341).$$


(b) Clearly, testing H0: ψ = 0.10 versus H1: ψ > 0.10 is equivalent to testing H0: θ = 0.0909 versus H1: θ > 0.0909. Since

$$\frac{\partial\ln L(x; \theta)}{\partial\theta} = \frac{x}{\theta} - \frac{(n-x)}{(1-\theta)},$$

we have

$$\frac{\partial^2\ln L(x; \theta)}{\partial\theta^2} = -\frac{x}{\theta^2} - \frac{(n-x)}{(1-\theta)^2};$$

hence, the estimated observed information is

$$\frac{100}{(0.10)^2} + \frac{900}{(1-0.10)^2} = 10{,}000 + 1111.111 = 11{,}111.111.$$

And, since

$$-E_x\left[\frac{\partial^2\ln L(x; \theta)}{\partial\theta^2}\right] = \frac{n\theta}{\theta^2} + \frac{(n - n\theta)}{(1-\theta)^2} = \frac{n}{\theta(1-\theta)},$$

the estimated expected information is 1000/[0.10(1 − 0.10)] = 11,111.111. So, the Wald statistic is

$$W = (0.10 - 0.0909)^2(11{,}111.111) = 0.9201,$$

with

$$\text{P-value} = pr\left(\sqrt{W} > \sqrt{0.9201} \,\middle|\, H_0: \theta = 0.0909\right) \doteq pr(Z > 0.9592) \doteq 0.17,$$

where Z ∼ N(0, 1). Since

$$\frac{\partial\ln L(x; \theta)}{\partial\theta}\bigg|_{\theta=0.0909} = \frac{100}{0.0909} - \frac{(1000 - 100)}{(1 - 0.0909)} = 1100.11 - 989.9901 = 110.1199,$$

the score statistic is

$$S = \frac{(110.1199)^2}{1000/[0.0909(0.9091)]} = 1.0021,$$

with

$$\text{P-value} = pr\left(\sqrt{S} > \sqrt{1.0021} \,\middle|\, H_0: \theta = 0.0909\right) \doteq pr(Z > 1.0010) \doteq 0.16,$$

where Z ∼ N(0, 1). The results of these Wald and score tests imply that H0: θ = 0.0909 cannot be rejected given the available data.


Of course, we can equivalently work with the parameter ψ, and directly test H0: ψ = 0.10 versus H1: ψ > 0.10 using appropriate Wald and score tests. From part (a), the appropriate estimated observed information is (8101.6202 − 810.0162) = 7291.6040; so, the Wald test statistic is

$$W = (0.1111 - 0.10)^2(7291.6040) = 0.8984,$$

with

$$\text{P-value} = pr\left(\sqrt{W} > \sqrt{0.8984} \,\middle|\, H_0: \psi = 0.10\right) \doteq pr(Z > 0.9478) = 0.17,$$

where Z ∼ N(0, 1). Since

$$\frac{\partial\ln L(x; \psi)}{\partial\psi}\bigg|_{\psi=0.10} = \frac{100}{0.10} - \frac{1000}{(1+0.10)} = 1000 - 909.0909 = 90.9091,$$

the score test statistic is

$$S = \frac{(90.9091)^2}{1000/[0.10(1+0.10)^2]} = 1.0000,$$

with

$$\text{P-value} = pr\left(\sqrt{S} > \sqrt{1.000} \,\middle|\, H_0: \psi = 0.10\right) \doteq pr(Z > 1.0000) = 0.16,$$

where Z ∼ N(0, 1). As before, there is not sufficient evidence to reject H0: ψ = 0.10.

Solution 5.14

(a) The unrestricted likelihood function LΩ has the structure

$$L_\Omega = \prod_{i=1}^{n} f_{X,Y}(x_i, y_i; \alpha, \beta) = \prod_{i=1}^{n} f_X(x_i; \beta)\, f_Y(y_i|X = x_i; \alpha, \beta) = \prod_{i=1}^{n}\left\{\frac{1}{\beta}\, e^{-x_i/\beta}\,\frac{1}{(\alpha+\beta)x_i}\, e^{-y_i/(\alpha+\beta)x_i}\right\} = \beta^{-n}\, e^{-\sum_{i=1}^{n} x_i/\beta}\,(\alpha+\beta)^{-n}\, e^{-\sum_{i=1}^{n}\left(\frac{y_i}{x_i}\right)/(\alpha+\beta)}\cdot\left(\prod_{i=1}^{n} x_i\right)^{-1}.$$

By the Factorization Theorem, $U_1 = \sum_{i=1}^{n} X_i$ and $U_2 = \sum_{i=1}^{n}(Y_i/X_i)$ are jointly sufficient for α and β. If we can show that Xi and Yi/Xi are uncorrelated, then U1 and U2 will be uncorrelated. Now, E(Xi) = β and E(Yi) = Exi[E(Yi|Xi = xi)] = E[(α + β)xi] = (α + β)β.


And,

$$E\left(\frac{Y_i}{X_i}\right) = E_{x_i}\left[E\left(\frac{Y_i}{X_i}\,\middle|\,X_i = x_i\right)\right] = E_{x_i}\left[\frac{1}{x_i}\,E(Y_i|X_i = x_i)\right] = E\left[\frac{1}{x_i}(\alpha+\beta)x_i\right] = (\alpha+\beta).$$

Since

$$E\left(X_i\cdot\frac{Y_i}{X_i}\right) = E(Y_i) = (\alpha+\beta)\beta,$$

$$\text{cov}\left(X_i,\ \frac{Y_i}{X_i}\right) = E\left(X_i\cdot\frac{Y_i}{X_i}\right) - E(X_i)E\left(\frac{Y_i}{X_i}\right) = (\alpha+\beta)\beta - \beta(\alpha+\beta) = 0,$$

and hence U1 and U2 are uncorrelated.

(b) Now,

$$\ln L_\Omega = -n\ln\beta - \frac{\sum_{i=1}^{n} x_i}{\beta} - n\ln(\alpha+\beta) - \frac{\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)} - \sum_{i=1}^{n}\ln x_i.$$

So,

$$\frac{\partial\ln L_\Omega}{\partial\alpha} = \frac{-n}{(\alpha+\beta)} + \frac{\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^2} = 0 \Rightarrow (\hat{\alpha}+\hat{\beta}) = \frac{\sum_{i=1}^{n}(y_i/x_i)}{n}.$$

And,

$$\frac{\partial\ln L_\Omega}{\partial\beta} = \frac{-n}{\beta} + \frac{\sum_{i=1}^{n} x_i}{\beta^2} - \frac{n}{(\alpha+\beta)} + \frac{\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^2} = 0 \Rightarrow \hat{\beta} = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad\text{and}\quad \hat{\alpha} = \frac{\sum_{i=1}^{n}(Y_i/X_i)}{n} - \bar{X}.$$

Now,

$$\frac{\partial^2\ln L_\Omega}{\partial\alpha^2} = \frac{n}{(\alpha+\beta)^2} - \frac{2\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^3},$$

so that

$$-E\left(\frac{\partial^2\ln L_\Omega}{\partial\alpha^2}\right) = \frac{-n}{(\alpha+\beta)^2} + \frac{2n(\alpha+\beta)}{(\alpha+\beta)^3} = \frac{n}{(\alpha+\beta)^2}.$$

Also,

$$\frac{\partial^2\ln L_\Omega}{\partial\beta^2} = \frac{n}{\beta^2} - \frac{2\sum_{i=1}^{n} x_i}{\beta^3} + \frac{n}{(\alpha+\beta)^2} - \frac{2\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^3},$$

so that

$$-E\left(\frac{\partial^2\ln L_\Omega}{\partial\beta^2}\right) = \frac{-n}{\beta^2} + \frac{2n\beta}{\beta^3} - \frac{n}{(\alpha+\beta)^2} + \frac{2n(\alpha+\beta)}{(\alpha+\beta)^3} = \frac{n}{\beta^2} + \frac{n}{(\alpha+\beta)^2}.$$

And,

$$\frac{\partial^2\ln L_\Omega}{\partial\alpha\partial\beta} = \frac{n}{(\alpha+\beta)^2} - \frac{2\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^3},$$

so that

$$-E\left(\frac{\partial^2\ln L_\Omega}{\partial\alpha\partial\beta}\right) = \frac{-n}{(\alpha+\beta)^2} + \frac{2n(\alpha+\beta)}{(\alpha+\beta)^3} = \frac{n}{(\alpha+\beta)^2}.$$

Thus, the expected information matrix I(α, β) is equal to

$$\mathbf{I}(\alpha, \beta) = \begin{bmatrix}\dfrac{n}{(\alpha+\beta)^2} & \dfrac{n}{(\alpha+\beta)^2}\\[2mm]\dfrac{n}{(\alpha+\beta)^2} & \dfrac{n}{\beta^2} + \dfrac{n}{(\alpha+\beta)^2}\end{bmatrix},$$

and so

$$\mathbf{I}^{-1}(\alpha, \beta) = \begin{bmatrix}\dfrac{(\alpha+\beta)^2}{n} + \dfrac{\beta^2}{n} & -\dfrac{\beta^2}{n}\\[2mm] -\dfrac{\beta^2}{n} & \dfrac{\beta^2}{n}\end{bmatrix}.$$

For H0 : α = β, or equivalently H0 : R = (α − β) = 0, we have

$$\mathbf{T} = \left[\frac{\partial R}{\partial\alpha},\ \frac{\partial R}{\partial\beta}\right] = (1,\ -1),$$

and so

$$\hat{\Lambda} = \mathbf{T}\,\hat{\mathbf{I}}^{-1}(\hat{\alpha}, \hat{\beta})\,\mathbf{T}' = \frac{(\hat{\alpha}+\hat{\beta})^2}{n} + \frac{4\hat{\beta}^2}{n} = \hat{V}(\hat{R}) = \hat{V}(\hat{\alpha}-\hat{\beta}) = \hat{V}(\hat{\alpha}) + \hat{V}(\hat{\beta}) - 2\widehat{\text{cov}}(\hat{\alpha}, \hat{\beta}).$$

So, the Wald test statistic W takes the form

$$W = \frac{\hat{R}^2}{\hat{\Lambda}} = \frac{(\hat{\alpha}-\hat{\beta})^2}{\dfrac{(\hat{\alpha}+\hat{\beta})^2}{n} + \dfrac{4\hat{\beta}^2}{n}} = \left[\frac{(\hat{\alpha}-\hat{\beta}) - 0}{\sqrt{\hat{V}(\hat{\alpha}-\hat{\beta})}}\right]^2, \quad\text{as expected.}$$


For n = 30, α̂ = 2, and β̂ = 1,

$$W = \frac{(2-1)^2}{\dfrac{(2+1)^2}{30} + \dfrac{4(1)^2}{30}} = 2.31.$$

Since $W \sim \chi^2_1$ under H0 : α = β for large n, the P-value = pr(χ²₁ > 2.31 | H0 : α = β) = 0.14. So, for the given data, there is not sufficient evidence to reject H0 : α = β.

A (large-sample) 95% confidence interval for (α − β) is

$$(\hat{\alpha}-\hat{\beta}) \pm 1.96\sqrt{\hat{V}(\hat{\alpha}-\hat{\beta})} = (\hat{\alpha}-\hat{\beta}) \pm 1.96\sqrt{\frac{(\hat{\alpha}+\hat{\beta})^2}{n} + \frac{4\hat{\beta}^2}{n}} = (2-1) \pm 1.96\sqrt{\frac{(2+1)^2}{30} + \frac{4(1)^2}{30}} = (-0.29,\ 2.29).$$

The computed 95% CI contains the value 0, which agrees with the conclusion based on the Wald test.

Solution 5.15

(a) With y = (y11, y12, . . . , y1n; y21, y22, . . . , y2n), we have

$$L(\mathbf{y}; \beta_1, \beta_2) = \prod_{i=1}^{2}\prod_{j=1}^{n}\left\{\frac{1}{\beta_i x_{ij}^2}\, e^{-y_{ij}/\beta_i x_{ij}^2}\right\} = \prod_{i=1}^{2}\left\{\frac{1}{\beta_i^n\prod_{j=1}^{n} x_{ij}^2}\, e^{-\beta_i^{-1}\sum_{j=1}^{n} x_{ij}^{-2} y_{ij}}\right\}.$$

Hence, by the Factorization Theorem,

n∑

j=1

y1j

x21j

is sufficient for β1,

andn∑

j=1

y2j

x22j

is sufficient for β2.

(b) Now,

ln L(y; β1, β2) =2∑

i=1

n∑

j=1

⎧⎨⎩− ln βi − ln x2

ij − yij

βix2ij

⎫⎬⎭

= −n2∑

i=1

ln βi −2∑

i=1

n∑

j=1

ln x2ij −

2∑

i=1

β−1i

n∑

j=1

yij

x2ij

.

Page 375: Exercises and Solutions in Biostatistical Theory (2010)

356 Hypothesis Testing Theory

So, for i = 1, 2,

Si(β1, β2) = ∂ ln L(y; β1, β2)

∂βi= −n

βi+ 1

β2i

n∑

j=1

yij

x2ij

= 0

gives

βi = 1n

n∑

j=1

yij

x2ij

, i = 1, 2.

And,

∂2 ln L(y; β1, β2)

∂β2i

= n

β2i

− 2

β3i

n∑

j=1

yij

x2ij

, i = 1, 2,

so that

−E

{∂2 ln L(y; β1, β2)

∂β2i

}= −n

β2i

+ 2

β3i

n∑

j=1

E(Yij)

x2ij

= −n

β2i

+ 2

β3i

n∑

j=1

βix2ij

x2ij

= −n

β2i

+ 2n

β2i

= n

β2i

.

Also,

∂2 ln L(y; β1, β2)

∂β1∂β2= ∂2 ln L(y; β1, β2)

∂β2∂β1= 0,

so that the expected information matrix is

I(β1, β2) =[

n/β21 0

0 n/β22

].

Under H0: β1 = β2 (= β, say),

ln L(y; β) = −2n ln β −2∑

i=1

n∑

j=1

ln x2ij − 1

β

2∑

i=1

n∑

j=1

yij

x2ij

,

so that the equation

∂ ln L(y; β)

∂β= −2n

β+ 1

β2

2∑

i=1

n∑

j=1

yij

x2ij

= 0

Page 376: Exercises and Solutions in Biostatistical Theory (2010)

Solutions 357

gives

β =∑2

i=1∑n

j=1yij

x2ij

2n= 1

2(β1 + β2).

So, with S(β) = [S1(β, β), S2(β, β)],

S = S(β)I −1(β, β)S′(β)

=[

−n

β+ nβ1

β2,−n

β+ nβ2

β2

][β 2/n 0

0 β 2/n

]⎡⎢⎢⎢⎣

−n

β+ nβ1

β2

−n

β+ nβ2

β2

⎤⎥⎥⎥⎦

= [−β + β1, −β + β2]

⎡⎢⎢⎢⎣

−n

β+ nβ1

β2

−n

β+ nβ2

β2

⎤⎥⎥⎥⎦

= [(β1 − β), (β2 − β)][(β1 − β)

(β2 − β)

](n

β2

)

= n[(β1 − β)2 + (β2 − β)2]β2

=n[

14(β1 − β2)2 + 1

4(β1 − β2)2

]

14(β1 + β2)2

= 2n(β1 − β2)2

(β1 + β2)2.

Under H0: β1 = β2, S ∼χ21 for large n. For the given data,

S = 2(25)(2 − 3)2

(2 + 3)2 = 5025

= 2.

Since χ20.95,1 = 3.841, we do not reject H0 at the α = 0.05 level.
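A two-line numerical check of the score statistic against the 5% critical value (assuming SciPy is available):

```python
from scipy.stats import chi2

n, b1, b2 = 25, 2.0, 3.0
S = 2 * n * (b1 - b2) ** 2 / (b1 + b2) ** 2
print(S, round(chi2.ppf(0.95, df=1), 3))  # 2.0 versus 3.841, so H0 is not rejected
```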

Solution 5.16

(a) The unrestricted likelihood function is

$$L_\Omega=\prod_{i=1}^{n}\theta^{-1}e^{-x_i/\theta}\cdot\prod_{i=1}^{n}(\lambda\theta)^{-1}e^{-y_i/\lambda\theta}
=\theta^{-n}\exp\left\{-\theta^{-1}\sum_{i=1}^{n}x_i\right\}(\lambda\theta)^{-n}\exp\left\{-(\lambda\theta)^{-1}\sum_{i=1}^{n}y_i\right\};$$

so, by the Factorization Theorem,

$$S_x=\sum_{i=1}^{n}X_i\quad\text{and}\quad S_y=\sum_{i=1}^{n}Y_i$$

are jointly sufficient for $\lambda$ and $\theta$.

(b) From part (a),

$$\ln L_\Omega=-n\ln\theta-\frac{\sum_{i=1}^{n}x_i}{\theta}-n\ln(\lambda\theta)-\frac{\sum_{i=1}^{n}y_i}{\lambda\theta}
=-2n\ln\theta-n\ln\lambda-\frac{\sum_{i=1}^{n}x_i}{\theta}-\frac{\sum_{i=1}^{n}y_i}{\lambda\theta}.$$

So,

$$\frac{\partial\ln L_\Omega}{\partial\theta}=\frac{-2n}{\theta}+\frac{\sum_{i=1}^{n}x_i}{\theta^2}+\frac{\sum_{i=1}^{n}y_i}{\lambda\theta^2},
\qquad
\frac{\partial\ln L_\Omega}{\partial\lambda}=\frac{-n}{\lambda}+\frac{\sum_{i=1}^{n}y_i}{\lambda^2\theta}.$$

Now,

$$\frac{\partial\ln L_\Omega}{\partial\lambda}=0\ \Longrightarrow\ \frac{-n}{\theta}+\frac{\sum_{i=1}^{n}y_i}{\lambda\theta^2}=0\ \Longrightarrow\ \frac{\sum_{i=1}^{n}y_i}{\lambda\theta^2}=\frac{n}{\theta}.$$

Thus, $\partial\ln L_\Omega/\partial\theta=0$ gives

$$\frac{-2n}{\theta}+\frac{\sum_{i=1}^{n}x_i}{\theta^2}+\frac{n}{\theta}=\frac{-n}{\theta}+\frac{\sum_{i=1}^{n}x_i}{\theta^2}=0,\quad\text{or}\quad \hat\theta=\bar x.$$

Then,

$$\frac{-n}{\lambda}+\frac{n\bar y}{\lambda^2\bar x}=0\quad\text{gives}\quad \hat\lambda=\frac{\bar y}{\bar x}.$$

(c) Let $\boldsymbol\theta=(\theta,\lambda)$ denote the set of unknown parameters. From part (b),

$$S(\boldsymbol\theta)=\left[\frac{-2n}{\theta}+\frac{n\bar x}{\theta^2}+\frac{n\bar y}{\lambda\theta^2},\ \frac{-n}{\lambda}+\frac{n\bar y}{\lambda^2\theta}\right].$$

Under $H_0\!:\lambda=1$, the restricted log-likelihood is

$$\ln L_\omega=-2n\ln\theta-\frac{n(\bar x+\bar y)}{\theta};
\quad\text{so}\quad
\frac{\partial\ln L_\omega}{\partial\theta}=\frac{-2n}{\theta}+\frac{n(\bar x+\bar y)}{\theta^2}=0
\quad\text{gives}\quad \hat\theta_\omega=\frac{(\bar x+\bar y)}{2}.$$

Thus, $\hat{\boldsymbol\theta}_\omega=[(\bar x+\bar y)/2,\ 1]$. Now,

$$\frac{-2n}{\hat\theta_\omega}+\frac{n\bar x}{\hat\theta_\omega^2}+\frac{n\bar y}{(1)\hat\theta_\omega^2}
=\frac{-4n}{(\bar x+\bar y)}+\frac{4n\bar x}{(\bar x+\bar y)^2}+\frac{4n\bar y}{(\bar x+\bar y)^2}=0,$$

and

$$\frac{-n}{(1)}+\frac{2n\bar y}{(1)^2(\bar x+\bar y)}=\frac{-n(\bar x+\bar y)+2n\bar y}{(\bar x+\bar y)}=\frac{n(\bar y-\bar x)}{(\bar x+\bar y)},$$

so that

$$S(\hat{\boldsymbol\theta}_\omega)=\left[0,\ \frac{n(\bar y-\bar x)}{(\bar x+\bar y)}\right].$$

Finally, we need $I^{-1}(\hat{\boldsymbol\theta}_\omega)$. Now,

$$\frac{\partial^2\ln L_\Omega}{\partial\theta^2}=\frac{2n}{\theta^2}-\frac{2n\bar x}{\theta^3}-\frac{2n\bar y}{\lambda\theta^3},
\quad\text{so that}\quad
-E\left(\frac{\partial^2\ln L_\Omega}{\partial\theta^2}\right)=\frac{-2n}{\theta^2}+\frac{2n\theta}{\theta^3}+\frac{2n\lambda\theta}{\lambda\theta^3}=\frac{2n}{\theta^2}.$$

And,

$$\frac{\partial^2\ln L_\Omega}{\partial\theta\,\partial\lambda}=\frac{-n\bar y}{\lambda^2\theta^2},
\quad\text{so that}\quad
-E\left(\frac{\partial^2\ln L_\Omega}{\partial\theta\,\partial\lambda}\right)=\frac{n\lambda\theta}{\lambda^2\theta^2}=\frac{n}{\lambda\theta}.$$

Also,

$$\frac{\partial^2\ln L_\Omega}{\partial\lambda^2}=\frac{n}{\lambda^2}-\frac{2n\bar y}{\lambda^3\theta},
\quad\text{so that}\quad
-E\left(\frac{\partial^2\ln L_\Omega}{\partial\lambda^2}\right)=\frac{-n}{\lambda^2}+\frac{2n\lambda\theta}{\lambda^3\theta}=\frac{n}{\lambda^2}.$$

So,

$$I(\boldsymbol\theta)=\begin{bmatrix}\dfrac{2n}{\theta^2} & \dfrac{n}{\lambda\theta}\\[2mm]\dfrac{n}{\lambda\theta} & \dfrac{n}{\lambda^2}\end{bmatrix},
\quad\text{and hence}\quad
I^{-1}(\boldsymbol\theta)=\begin{bmatrix}\dfrac{\theta^2}{n} & -\dfrac{\lambda\theta}{n}\\[2mm]-\dfrac{\lambda\theta}{n} & \dfrac{2\lambda^2}{n}\end{bmatrix}.$$

So,

$$I^{-1}(\hat{\boldsymbol\theta}_\omega)=\begin{bmatrix}\dfrac{(\bar x+\bar y)^2}{4n} & -\dfrac{(\bar x+\bar y)}{2n}\\[2mm]-\dfrac{(\bar x+\bar y)}{2n} & \dfrac{2}{n}\end{bmatrix}.$$

Finally,

$$S=\left[0,\ \frac{n(\bar y-\bar x)}{(\bar x+\bar y)}\right]I^{-1}(\hat{\boldsymbol\theta}_\omega)\left[0,\ \frac{n(\bar y-\bar x)}{(\bar x+\bar y)}\right]'
=\frac{2n(\bar y-\bar x)^2}{(\bar x+\bar y)^2}=\left[\frac{(\bar y-\bar x)}{\sqrt{\hat V_0(\bar Y-\bar X)}}\right]^2,$$

since

$$V_0(\bar Y-\bar X)=\frac{\theta^2}{n}+\frac{[(1)\theta]^2}{n}=\frac{2\theta^2}{n}\quad\text{and}\quad \hat\theta_\omega=\frac{(\bar x+\bar y)}{2},
\quad\text{so that}\quad \hat V_0(\bar Y-\bar X)=\frac{2\hat\theta_\omega^2}{n}=\frac{(\bar x+\bar y)^2}{2n}.$$

For $n=50$, $\bar x=30$, and $\bar y=40$,

$$S=\frac{2(50)(40-30)^2}{(30+40)^2}=2.04.$$

So,

$$\text{P-value}=\text{pr}\left(\chi^2_1>2.04\,\big|\,H_0\!:\lambda=1\right)\doteq 0.15.$$

So, there is not sufficient evidence with these data to reject $H_0\!:\lambda=1$.
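The computed score statistic and its P-value can be confirmed as follows (a sketch assuming SciPy is available):

```python
from scipy.stats import chi2

n, xbar, ybar = 50, 30.0, 40.0
S = 2 * n * (ybar - xbar) ** 2 / (xbar + ybar) ** 2
print(round(S, 2), round(chi2.sf(S, df=1), 2))  # 2.04 0.15
```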

Solution 5.17

(a) The marginal cumulative distribution function (CDF) of $X$, $F_X(x)$, is given by

$$F_X(x)=E_\lambda[F_X(x|\lambda)]=\int_0^\infty F_X(x|\lambda)\pi(\lambda)\,d\lambda
=\int_0^\infty\left(1-e^{-\lambda x}\right)\beta e^{-\beta\lambda}\,d\lambda$$

$$=1-\int_0^\infty\beta e^{-(x+\beta)\lambda}\,d\lambda
=1-\frac{\beta}{(x+\beta)}\int_0^\infty(x+\beta)e^{-(x+\beta)\lambda}\,d\lambda
=1-\left(1+\frac{x}{\beta}\right)^{-1},\quad x>0.$$

Thus,

$$f_X(x)=\frac{1}{\beta}\left(1+\frac{x}{\beta}\right)^{-2},\quad x>0,\ \beta>0,$$

which is a generalized Pareto distribution with scale parameter equal to $\beta$, shape parameter equal to 1, and location parameter equal to 0.

(b) Now,

$$\pi(\lambda|X=x)=\frac{f_{X,\lambda}(x,\lambda)}{f_X(x)}=\frac{f_X(x|\lambda)\pi(\lambda)}{f_X(x)}
=\frac{\lambda\beta e^{-(x+\beta)\lambda}}{\frac{1}{\beta}\left(1+\frac{x}{\beta}\right)^{-2}}
=\lambda\beta^2\left(1+\frac{x}{\beta}\right)^2 e^{-(x+\beta)\lambda}=\lambda(x+\beta)^2 e^{-(x+\beta)\lambda},\quad\lambda>0.$$

Thus, the posterior distribution for $\lambda$ is GAMMA$[(x+\beta)^{-1},2]$. Since $\pi(\lambda)$ is GAMMA$(\beta^{-1},1)$, the prior and posterior distributions belong to the same distributional family, and hence $\pi(\lambda)$ is known as a conjugate prior.

(c) For a given value of $\lambda$ ($\lambda^*$, say), $\text{pr}(\lambda<\lambda^*)=1-e^{-\beta\lambda^*}$ based on the prior distribution $\pi(\lambda)$. And, given an observed value $x$ of $X$,

$$\text{pr}(\lambda<\lambda^*|X=x)=\int_0^{\lambda^*}\pi(\lambda|X=x)\,d\lambda=\int_0^{\lambda^*}\lambda(x+\beta)^2 e^{-(x+\beta)\lambda}\,d\lambda.$$

Using integration by parts with $u=\lambda$ and $dv=(x+\beta)^2 e^{-(x+\beta)\lambda}\,d\lambda$, we have

$$\text{pr}(\lambda<\lambda^*|X=x)=\left[-\lambda(x+\beta)e^{-(x+\beta)\lambda}\right]_0^{\lambda^*}+\int_0^{\lambda^*}(x+\beta)e^{-(x+\beta)\lambda}\,d\lambda
=-\lambda^*(x+\beta)e^{-(x+\beta)\lambda^*}+1-e^{-(x+\beta)\lambda^*}$$

$$=1-\left[\lambda^*(x+\beta)+1\right]e^{-(x+\beta)\lambda^*}.$$

With $\lambda^*=1$, $\beta=1$, and $x=3$: $\text{pr}(H_1)=1-e^{-1}=0.6321$, $\text{pr}(H_0)=1-\text{pr}(H_1)=0.3679$, $\text{pr}(H_1|X=x)=1-5e^{-4}=0.9084$, and $\text{pr}(H_0|X=x)=1-\text{pr}(H_1|X=x)=0.0916$. Thus,

$$\text{BF}_{10}=\frac{\text{pr}(H_1|X=x)\,\text{pr}(H_0)}{\text{pr}(H_0|X=x)\,\text{pr}(H_1)}=\frac{(0.9084)(0.3679)}{(0.0916)(0.6321)}=5.77.$$

Hence, observing a survival time of $x=3$ years yields "positive," but not "strong," evidence in favor of $H_1$.
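The prior and posterior probabilities and the resulting Bayes factor are readily checked numerically:

```python
import math

lam_star, beta, x = 1.0, 1.0, 3.0

p_h1_prior = 1 - math.exp(-beta * lam_star)  # pr(lambda < 1) under the prior
p_h1_post = 1 - (lam_star * (x + beta) + 1) * math.exp(-(x + beta) * lam_star)

bf10 = (p_h1_post * (1 - p_h1_prior)) / ((1 - p_h1_post) * p_h1_prior)
print(round(bf10, 2))  # 5.77
```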


Solution 5.18∗

(a) With $\mathbf x=(x_1,x_2,\ldots,x_n)$ and $\mathbf y=(y_1,y_2,\ldots,y_n)$, the likelihood function has the structure

$$L(\mathbf x,\mathbf y;\theta,\phi_1,\phi_2,\ldots,\phi_n)=\prod_{i=1}^{n}f_{X_i}(x_i)f_{Y_i}(y_i)
=\prod_{i=1}^{n}\left\{(\theta\phi_i)^{-1}e^{-x_i/\theta\phi_i}\cdot\phi_i^{-1}e^{-y_i/\phi_i}\right\}$$

$$=\theta^{-n}\left(\prod_{i=1}^{n}\phi_i\right)^{-2}e^{\left(-\sum_{i=1}^{n}\left[\frac{x_i}{\theta\phi_i}+\frac{y_i}{\phi_i}\right]\right)},
\quad 0<x_i<\infty,\ 0<y_i<\infty,\ i=1,2,\ldots,n.$$

Using this particular likelihood would entail the estimation of $(n+1)$ parameters, namely, $\theta,\phi_1,\phi_2,\ldots,\phi_n$. Note that there are only $2n$ data points, so that the number of parameters to be estimated is more than half the number of data values; this type of situation often leads to unreliable statistical inferences.

(b) We need to use the method of transformations. We know that

$$f_{X_i,Y_i}(x_i,y_i)=f_{X_i}(x_i)\cdot f_{Y_i}(y_i)=\frac{1}{\theta\phi_i}e^{-x_i/\theta\phi_i}\cdot\frac{1}{\phi_i}e^{-y_i/\phi_i}
=\frac{1}{\theta\phi_i^2}e^{-\left(\frac{x_i}{\theta\phi_i}+\frac{y_i}{\phi_i}\right)},\quad 0<x_i<+\infty,\ 0<y_i<+\infty.$$

Let $R_i=X_i/Y_i$ and $S_i=Y_i$, so that $X_i=R_iS_i$ and $Y_i=S_i$. Clearly, $0<R_i<+\infty$ and $0<S_i<+\infty$. And,

$$J=\begin{vmatrix}\dfrac{\partial X_i}{\partial R_i} & \dfrac{\partial X_i}{\partial S_i}\\[2mm]\dfrac{\partial Y_i}{\partial R_i} & \dfrac{\partial Y_i}{\partial S_i}\end{vmatrix}
=\begin{vmatrix}S_i & R_i\\ 0 & 1\end{vmatrix}=S_i=|J_i|.$$

So,

$$f_{R_i,S_i}(r_i,s_i)=\frac{1}{\theta\phi_i^2}e^{-\left(\frac{r_is_i}{\theta\phi_i}+\frac{s_i}{\phi_i}\right)}(s_i),\quad 0<r_i<+\infty,\ 0<s_i<+\infty.$$

Finally,

$$f_{R_i}(r_i)=\frac{1}{\theta\phi_i^2}\int_0^\infty s_i\,e^{-\frac{s_i}{\phi_i}\left(\frac{r_i}{\theta}+1\right)}\,ds_i
=\frac{1}{\theta\phi_i^2}\left[\frac{\phi_i}{\left(\frac{r_i}{\theta}+1\right)}\right]^2=\frac{\theta}{(\theta+r_i)^2},\quad 0<r_i<+\infty.$$

(c) With $\mathbf r=(r_1,r_2,\ldots,r_n)$, we have

$$L(\mathbf r;\theta)\equiv L=\prod_{i=1}^{n}\frac{\theta}{(\theta+r_i)^2}=\theta^n\prod_{i=1}^{n}(\theta+r_i)^{-2}.$$

So,

$$\ln L=n\ln\theta-2\sum_{i=1}^{n}\ln(\theta+r_i),\qquad
\frac{\partial\ln L}{\partial\theta}=\frac{n}{\theta}-2\sum_{i=1}^{n}(\theta+r_i)^{-1},\qquad
\frac{\partial^2\ln L}{\partial\theta^2}=\frac{-n}{\theta^2}+2\sum_{i=1}^{n}(\theta+r_i)^{-2}.$$

And,

$$E[(\theta+r_i)^{-2}]=\int_0^\infty(\theta+r_i)^{-2}\frac{\theta}{(\theta+r_i)^2}\,dr_i
=\int_0^\infty\frac{\theta}{(\theta+r_i)^4}\,dr_i=\theta\left[\frac{-(\theta+r_i)^{-3}}{3}\right]_0^\infty=\frac{1}{3\theta^2}.$$

So,

$$-E\left(\frac{\partial^2\ln L}{\partial\theta^2}\right)=\frac{n}{\theta^2}-2\sum_{i=1}^{n}\left(\frac{1}{3\theta^2}\right)=\frac{n}{\theta^2}-\frac{2n}{3\theta^2}=\frac{n}{3\theta^2}.$$

Hence, if $\hat\theta$ is the MLE of $\theta$, then

$$\frac{\hat\theta-\theta}{\sqrt{3\theta^2/n}}\,\dot\sim\,\text{N}(0,1)$$

for large $n$. To test $H_0\!:\theta=1$ versus $H_1\!:\theta>1$, we would reject $H_0$ if $(\hat\theta-1)/\sqrt{3/n}>1.96$ for a size $\alpha=0.025$ test; note that this is a score test. So, when $\theta=1.50$,

$$\text{POWER}=\text{pr}\left\{\frac{\hat\theta-1}{\sqrt{3/n}}>1.96\,\Big|\,\theta=1.50\right\}
=\text{pr}\left\{\frac{\hat\theta-1.50}{\sqrt{3(1.50)^2/n}}>\frac{\left(1+1.96\sqrt{3/n}-1.50\right)}{\sqrt{3(1.50)^2/n}}\right\}
\approx\text{pr}\left[Z>\frac{1.96}{1.50}-\frac{\sqrt n}{3\sqrt 3}\right],$$

where $Z\sim\text{N}(0,1)$. So, we should choose $n^*$ as the smallest positive integer value of $n$ such that

$$-\frac{\sqrt n}{3\sqrt 3}+\frac{1.96}{1.50}\le -0.84,$$

or, equivalently,

$$-\sqrt n\le -3\sqrt 3\left(\frac{1.96}{1.50}+0.84\right)=-11.1546\ \Longrightarrow\ n^*=125.$$
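The sample-size calculation at the end of part (c) can be reproduced directly:

```python
import math

# smallest n with -sqrt(n)/(3*sqrt(3)) + 1.96/1.50 <= -0.84
bound = 3 * math.sqrt(3) * (1.96 / 1.50 + 0.84)
print(round(bound, 4), math.ceil(bound ** 2))  # about 11.154, n* = 125
```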

Solution 5.19∗

(a) First, we know that $S_i\sim\text{POI}[\phi_i(\lambda_1+\lambda_0)]$. So, for $i=1,2,\ldots,n$,

$$p_{Y_{i1}}(y_{i1}|S_i=s_i)=\frac{\text{pr}[(Y_{i1}=y_{i1})\cap(S_i=s_i)]}{\text{pr}(S_i=s_i)}
=\frac{\text{pr}[(Y_{i1}=y_{i1})\cap(Y_{i0}=s_i-y_{i1})]}{\text{pr}(S_i=s_i)}$$

$$=\frac{\left[\dfrac{(\phi_i\lambda_1)^{y_{i1}}e^{-\phi_i\lambda_1}}{y_{i1}!}\right]\left[\dfrac{(\phi_i\lambda_0)^{(s_i-y_{i1})}e^{-\phi_i\lambda_0}}{(s_i-y_{i1})!}\right]}
{\left\{\dfrac{[\phi_i(\lambda_1+\lambda_0)]^{s_i}e^{-\phi_i(\lambda_1+\lambda_0)}}{s_i!}\right\}}
=C^{s_i}_{y_{i1}}\left(\frac{\lambda_1}{\lambda_1+\lambda_0}\right)^{y_{i1}}\left(\frac{\lambda_0}{\lambda_1+\lambda_0}\right)^{s_i-y_{i1}},\quad y_{i1}=0,1,\ldots,s_i.$$

So, the conditional distribution of $Y_{i1}$ given $S_i=s_i$ is BIN$[s_i,\lambda_1/(\lambda_1+\lambda_0)]$.

(b) Based on the result found in part (a), an appropriate (conditional) likelihood function is

$$L_c=\prod_{i=1}^{n}C^{s_i}_{y_{i1}}\theta^{y_{i1}}(1-\theta)^{s_i-y_{i1}},\quad\text{where}\ \theta=\lambda_1/(\lambda_1+\lambda_0).$$

Thus,

$$\ln L_c\propto\ln\theta\sum_{i=1}^{n}y_{i1}+\ln(1-\theta)\left(\sum_{i=1}^{n}s_i-\sum_{i=1}^{n}y_{i1}\right),$$

so that

$$\frac{\partial\ln L_c}{\partial\theta}=\frac{\sum_{i=1}^{n}y_{i1}}{\theta}-\frac{\left(\sum_{i=1}^{n}s_i-\sum_{i=1}^{n}y_{i1}\right)}{(1-\theta)}=0
\quad\text{gives}\quad \hat\theta=\sum_{i=1}^{n}y_{i1}\Big/\sum_{i=1}^{n}s_i.$$

And, with $\mathbf S=(S_1,S_2,\ldots,S_n)$ and $\mathbf s=(s_1,s_2,\ldots,s_n)$,

$$\frac{\partial^2\ln L_c}{\partial\theta^2}=-\frac{\sum_{i=1}^{n}y_{i1}}{\theta^2}-\frac{\left(\sum_{i=1}^{n}s_i-\sum_{i=1}^{n}y_{i1}\right)}{(1-\theta)^2}$$

gives

$$V(\hat\theta|\mathbf S=\mathbf s)=-\left[E\left(\frac{\partial^2\ln L_c}{\partial\theta^2}\right)\Big|\mathbf S=\mathbf s\right]^{-1}
=\left[\frac{\theta\sum_{i=1}^{n}s_i}{\theta^2}+\frac{(1-\theta)\sum_{i=1}^{n}s_i}{(1-\theta)^2}\right]^{-1}
=\frac{\theta(1-\theta)}{\sum_{i=1}^{n}s_i}.$$

So, given $\mathbf S=\mathbf s$, under $H_0\!:\theta=1/2$ (or, equivalently, $\lambda_1=\lambda_0$) and for large $n$, it follows that

$$U=\frac{\hat\theta-\frac12}{\sqrt{\dfrac{(1/2)(1/2)}{\left(\sum_{i=1}^{n}s_i\right)}}}\,\dot\sim\,\text{N}(0,1);$$

so, we would reject $H_0\!:\theta=1/2$ in favor of $H_1\!:\theta>1/2$ (or, equivalently, $\lambda_1>\lambda_0$) when the observed value $u$ of $U$ exceeds 1.645. Note that this is a score-type test statistic.

When $n=50$, $\sum_{i=1}^{n}s_i=500$, and $\sum_{i=1}^{n}y_{i1}=275$, then $\hat\theta=275/500=0.55$, so that

$$u=\frac{0.55-0.50}{\left(\frac12\right)\sqrt{1/500}}=2.236;$$

so, these data provide strong evidence (P-value $=0.0127$) for rejecting $H_0\!:\theta=1/2$ in favor of $H_1\!:\theta>1/2$. Another advantage of this conditional inference procedure is that its use avoids the need to estimate the parameters $\lambda_1$ and $\lambda_0$ separately.
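The standardized statistic and its one-sided P-value are easily verified (a sketch assuming SciPy is available):

```python
from scipy.stats import norm

sum_s, sum_y = 500, 275
theta_hat = sum_y / sum_s
u = (theta_hat - 0.5) / (0.5 * (1 / sum_s) ** 0.5)
print(round(u, 3), round(norm.sf(u), 4))  # 2.236 0.0127
```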

Solution 5.20∗

(a) With $\mathbf x=(x_1,x_2,\ldots,x_n)$ and $\mathbf y=(y_1,y_2,\ldots,y_n)$,

$$L\equiv L(\mathbf x,\mathbf y;\lambda_x,\lambda_y)=\prod_{i=1}^{n}\left[\frac{\lambda_x^{x_i}e^{-\lambda_x}}{x_i!}\cdot\frac{\lambda_y^{y_i}e^{-\lambda_y}}{y_i!}\right]
=\frac{\lambda_x^{n\bar x}e^{-n\lambda_x}}{\prod_{i=1}^{n}x_i!}\cdot\frac{\lambda_y^{n\bar y}e^{-n\lambda_y}}{\prod_{i=1}^{n}y_i!}.$$

So, $\ln L\propto n\bar x\ln\lambda_x+n\bar y\ln\lambda_y-n(\lambda_x+\lambda_y)$. Thus,

$$\frac{\partial\ln L}{\partial\lambda_x}=\frac{n\bar x}{\lambda_x}-n,\qquad
\frac{\partial\ln L}{\partial\lambda_y}=\frac{n\bar y}{\lambda_y}-n,\qquad
\frac{\partial^2\ln L}{\partial\lambda_x^2}=\frac{-n\bar x}{\lambda_x^2},\qquad
\frac{\partial^2\ln L}{\partial\lambda_y^2}=\frac{-n\bar y}{\lambda_y^2},$$

and $\partial^2\ln L/\partial\lambda_x\partial\lambda_y=\partial^2\ln L/\partial\lambda_y\partial\lambda_x=0$. Hence,

$$-E\left[\frac{\partial^2\ln L}{\partial\lambda_x^2}\right]=\frac{n\lambda_x}{\lambda_x^2}=\frac{n}{\lambda_x},\qquad
-E\left[\frac{\partial^2\ln L}{\partial\lambda_y^2}\right]=\frac{n\lambda_y}{\lambda_y^2}=\frac{n}{\lambda_y}.$$

So,

$$I(\lambda_x,\lambda_y)=\begin{bmatrix}\dfrac{n}{\lambda_x} & 0\\ 0 & \dfrac{n}{\lambda_y}\end{bmatrix}
\quad\text{and}\quad
I^{-1}(\lambda_x,\lambda_y)=\begin{bmatrix}\dfrac{\lambda_x}{n} & 0\\ 0 & \dfrac{\lambda_y}{n}\end{bmatrix}.$$

Under $H_0\!:\lambda_x=\lambda_y\ (=\lambda,\ \text{say})$, $\ln L_\omega\propto n\bar x\ln\lambda+n\bar y\ln\lambda-2n\lambda$. Solving

$$\frac{\partial\ln L_\omega}{\partial\lambda}=\frac{n\bar x}{\lambda}+\frac{n\bar y}{\lambda}-2n=0
\quad\text{gives}\quad \hat\lambda=\frac{(\bar x+\bar y)}{2}.$$

So,

$$S=\begin{bmatrix}\dfrac{n\bar x}{\hat\lambda}-n\\[2mm]\dfrac{n\bar y}{\hat\lambda}-n\end{bmatrix}'
\begin{bmatrix}\hat\lambda/n & 0\\ 0 & \hat\lambda/n\end{bmatrix}
\begin{bmatrix}\dfrac{n\bar x}{\hat\lambda}-n\\[2mm]\dfrac{n\bar y}{\hat\lambda}-n\end{bmatrix}.$$

Since $\hat\lambda=(8.00+9.00)/2=8.500$ and $n=30$, the two score components are $30(8)/8.5-30=-1.7647$ and $30(9)/8.5-30=+1.7647$, so that

$$S=\frac{8.5}{30}\left[(-1.7647)^2+(1.7647)^2\right]=1.7645.$$

Since P-value $=\text{pr}(\chi^2_1>1.7645)>0.15$, we would not reject $H_0$ at any conventional $\alpha$-level.

(b) Now, since $(X_1+Y_1)\sim\text{POI}(\lambda_x+\lambda_y)$, we have

$$p_{X_1}(x_1|X_1+Y_1=s_1)=\text{pr}(X_1=x_1|X_1+Y_1=s_1)
=\frac{\text{pr}[(X_1=x_1)\cap(X_1+Y_1=s_1)]}{\text{pr}[X_1+Y_1=s_1]}
=\frac{\text{pr}(X_1=x_1)\,\text{pr}(Y_1=s_1-x_1)}{\text{pr}(X_1+Y_1=s_1)}$$

$$=\frac{\left[\dfrac{\lambda_x^{x_1}e^{-\lambda_x}}{x_1!}\right]\left[\dfrac{\lambda_y^{s_1-x_1}e^{-\lambda_y}}{(s_1-x_1)!}\right]}
{\left[\dfrac{(\lambda_x+\lambda_y)^{s_1}e^{-(\lambda_x+\lambda_y)}}{s_1!}\right]}
=\frac{s_1!}{x_1!(s_1-x_1)!}\left(\frac{\lambda_x}{\lambda_x+\lambda_y}\right)^{x_1}\left(1-\frac{\lambda_x}{\lambda_x+\lambda_y}\right)^{s_1-x_1}
=C^{s_1}_{x_1}\pi^{x_1}(1-\pi)^{s_1-x_1}$$

for $x_1=0,1,\ldots,s_1$ and $\pi=\lambda_x/(\lambda_x+\lambda_y)$. So, given $(X_1+Y_1)=s_1$,

$$X_1\sim\text{BIN}\left[n=s_1,\ \pi=\frac{\lambda_x}{(\lambda_x+\lambda_y)}\right].$$

If $\delta=0.60$, then $H_0\!:\lambda_y=0.60\lambda_x$ is equivalent to testing

$$H_0'\!:\pi=\frac{\lambda_x}{\lambda_x+0.60\lambda_x}=\frac{1}{1.60}=0.625,$$

and $H_1\!:\lambda_y>0.60\lambda_x$ is equivalent to testing

$$H_1'\!:\pi<\frac{\lambda_x}{\lambda_x+0.60\lambda_x}=0.625.$$

So, for the given data, the exact P-value is

$$\text{P-value}=\text{pr}(X_1\le 4\,|\,S_1=14,\pi=0.625)
=\sum_{x_1=0}^{4}C^{14}_{x_1}(0.625)^{x_1}(0.375)^{14-x_1}\doteq 0.0105.$$

So, given the observed values of $x_1=4$ and $y_1=10$, one would reject $H_0\!:\lambda_y=0.60\lambda_x$ in favor of $H_1\!:\lambda_y>0.60\lambda_x$ using this conditional test.
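The exact conditional P-value is a binomial tail probability, which can be checked with one line (assuming SciPy is available):

```python
from scipy.stats import binom

# pr(X1 <= 4) with X1 ~ BIN(14, 0.625) under H0'
print(round(binom.cdf(4, 14, 0.625), 4))  # 0.0105
```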

Solution 5.21∗

(a) From standard order statistics theory, it follows directly that

$$f_{X_{(1)}}(x_{(1)})=n[1-x_{(1)}+\theta]^{n-1},\quad 0.50\le\theta<x_{(1)}<(\theta+1)<+\infty,$$

that

$$f_{X_{(n)}}(x_{(n)})=n[x_{(n)}-\theta]^{n-1},\quad 0.50\le\theta<x_{(n)}<(\theta+1)<+\infty,$$

and that

$$f_{X_{(1)},X_{(n)}}(x_{(1)},x_{(n)})=n(n-1)[x_{(n)}-x_{(1)}]^{n-2},\quad 0.50\le\theta<x_{(1)}<x_{(n)}<(\theta+1)<+\infty.$$

Now, $\text{pr}(B|H_0\!:\theta=1)=\text{pr}(X_{(n)}>2|H_0\!:\theta=1)=0$ since $1<X_{(n)}<2$ when $\theta=1$. Thus, it follows that the probability of a Type I error is equal to

$$\text{pr}(A|H_0\!:\theta=1)=\text{pr}(X_{(1)}>k|\theta=1)=\int_k^2 n(2-x_{(1)})^{n-1}\,dx_{(1)}
=\left[-(2-x_{(1)})^n\right]_k^2=(2-k)^n;$$

thus, solving the equation $(2-k)^n=\alpha$ gives

$$k_\alpha=2-\alpha^{1/n},\quad 0<\alpha\le 0.10.$$

(b) First, consider the power for values of $\theta$ satisfying $\theta>k_\alpha>1$. In this situation, $X_{(1)}>\theta>k_\alpha$, so that $\text{pr}(A|\theta>k_\alpha)=\text{pr}(X_{(1)}>k_\alpha|\theta>k_\alpha)=1$, and hence the power is 1 for $\theta>k_\alpha$.

For values of $\theta$ satisfying $1<\theta\le k_\alpha$,

$$\text{POWER}=\text{pr}(A|1<\theta\le k_\alpha)+\text{pr}(B|1<\theta\le k_\alpha)-\text{pr}(A\cap B|1<\theta\le k_\alpha).$$

Now, with $k_\alpha=2-\alpha^{1/n}$,

$$\text{pr}(A|1<\theta\le k_\alpha)=\int_{k_\alpha}^{\theta+1}n[1-x_{(1)}+\theta]^{n-1}\,dx_{(1)}
=(1-k_\alpha+\theta)^n=\left[\theta-1+\alpha^{1/n}\right]^n.$$

And,

$$\text{pr}(B|1<\theta\le k_\alpha)=\int_2^{\theta+1}n[x_{(n)}-\theta]^{n-1}\,dx_{(n)}=1-(2-\theta)^n.$$

Finally,

$$\text{pr}(A\cap B|1<\theta\le k_\alpha)=\int_2^{\theta+1}\int_{k_\alpha}^{x_{(n)}}n(n-1)(x_{(n)}-x_{(1)})^{n-2}\,dx_{(1)}\,dx_{(n)}
=\int_2^{\theta+1}n\left[-(x_{(n)}-x_{(1)})^{n-1}\right]_{x_{(1)}=k_\alpha}^{x_{(1)}=x_{(n)}}\,dx_{(n)}$$

$$=\int_2^{\theta+1}n(x_{(n)}-k_\alpha)^{n-1}\,dx_{(n)}=\left[(x_{(n)}-k_\alpha)^n\right]_2^{\theta+1}
=(\theta+1-k_\alpha)^n-(2-k_\alpha)^n=\left(\theta-1+\alpha^{1/n}\right)^n-\alpha.$$

So, for $1<\theta\le k_\alpha=2-\alpha^{1/n}$,

$$\text{POWER}=\left[\theta-1+\alpha^{1/n}\right]^n+\left[1-(2-\theta)^n\right]-\left[\theta-1+\alpha^{1/n}\right]^n+\alpha=1+\alpha-(2-\theta)^n.$$

As required, the above expression equals $\alpha$ when $\theta=1$ and equals 1 when $\theta=k_\alpha=2-\alpha^{1/n}$.
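The closed-form power expression is easy to sanity-check at its endpoints; in the sketch below the values $\alpha=0.05$ and $n=10$ are illustrative choices only, not values given in the exercise:

```python
alpha, n = 0.05, 10  # illustrative values only
k_alpha = 2 - alpha ** (1 / n)

def power(theta):
    # valid for 1 < theta <= k_alpha
    return 1 + alpha - (2 - theta) ** n

print(power(1.0))      # alpha = 0.05 at theta = 1
print(power(k_alpha))  # 1.0 at theta = k_alpha
```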


Solution 5.22∗

(a) Clearly, $E(\bar Y_1-\bar Y_2)=E(\bar Y_1)-E(\bar Y_2)=(\mu_1-\mu_2)$. Now,

$$V(\bar Y_i)=V\left(\frac{1}{n}\sum_{j=1}^{n}Y_{ij}\right)=\frac{1}{n^2}\left\{\sum_{j=1}^{n}V(Y_{ij})+2\sum_{\text{all }j<j'}\text{cov}(Y_{ij},Y_{ij'})\right\}
=\frac{1}{n^2}\left\{n\sigma^2+2\,\frac{n(n-1)}{2}\rho\sigma^2\right\}=\frac{\sigma^2}{n}[1+(n-1)\rho],\quad i=1,2.$$

So,

$$V(\bar Y_1-\bar Y_2)=V(\bar Y_1)+V(\bar Y_2)=\frac{2\sigma^2}{n}[1+(n-1)\rho].$$

(b) Under the stated assumptions,

$$(\bar Y_1-\bar Y_2)\sim\text{N}\left\{(\mu_1-\mu_2),\ \frac{2\sigma^2}{n}[1+(n-1)\rho]\right\}.$$

So,

$$Z=\frac{(\bar Y_1-\bar Y_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{2\sigma^2}{n}[1+(n-1)\rho]}}\sim\text{N}(0,1).$$

Thus, to test $H_0\!:\mu_1=\mu_2$ versus $H_1\!:\mu_1>\mu_2$ at the $\alpha=0.05$ level, we reject $H_0$ in favor of $H_1$ when

$$\frac{(\bar Y_1-\bar Y_2)-0}{\sqrt{\dfrac{2\sigma^2}{n}[1+(n-1)\rho]}}>1.645.$$

(c) If one incorrectly assumes that $\rho=0$, one would use (under the stated assumptions) the test statistic $(\bar Y_1-\bar Y_2)/\sqrt{2\sigma^2/n}$, and reject $H_0\!:\mu_1=\mu_2$ in favor of $H_1\!:\mu_1>\mu_2$ when

$$\frac{(\bar Y_1-\bar Y_2)}{\sqrt{2\sigma^2/n}}>1.645.$$

Thus, the actual Type I error rate using this incorrect testing procedure (when $n=10$, $\sigma^2=2$, and $\rho=0.50$) is:

$$\text{pr}\left[\frac{(\bar Y_1-\bar Y_2)-0}{\sqrt{2\sigma^2/n}}>1.645\ \Big|\ n=10,\sigma^2=2,\rho=0.50\right]
=\text{pr}\left[\frac{(\bar Y_1-\bar Y_2)-0}{\sqrt{\dfrac{2\sigma^2}{n}[1+(n-1)\rho]}}>\frac{1.645\sqrt{2\sigma^2/n}}{\sqrt{\dfrac{2\sigma^2}{n}[1+(n-1)\rho]}}\ \Big|\ n=10,\sigma^2=2,\rho=0.50\right]$$

$$=\text{pr}(Z>0.7014)=0.24.$$

This simple example illustrates that ignoring positive "intra-cluster" (in our case, intra-neighborhood) response correlation can lead to inflated Type I error rates and, more generally, to invalid statistical inferences.
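The inflated Type I error rate in part (c) depends only on the variance-inflation factor, as the following sketch (assuming SciPy is available) confirms:

```python
from scipy.stats import norm

n, rho = 10, 0.50
# the factor 2*sigma^2/n cancels; only 1 + (n-1)*rho matters
z_cut = 1.645 / (1 + (n - 1) * rho) ** 0.5
print(round(z_cut, 4), round(norm.sf(z_cut), 2))  # 0.7014 0.24
```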

Solution 5.23∗

(a) Given $X_1=x_1$, where $x_1$ is a fixed constant, $X_2=\theta x_1+\epsilon_2\sim\text{N}(\theta x_1,\sigma^2)$.

(b)

$$f_{X_1,X_2}(x_1,x_2)=f_{X_1}(x_1)f_{X_2}(x_2|X_1=x_1)
=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-x_1^2/2\sigma^2}\cdot\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x_2-\theta x_1)^2/2\sigma^2},
\quad -\infty<x_1<\infty,\ -\infty<x_2<\infty.$$

(c) For $i=2,3,\ldots,n$, since $X_i=\theta X_{i-1}+\epsilon_i$, it follows from part (a) that

$$f_{X_i}(x_i|X_j=x_j,\ j=1,2,\ldots,i-1)=f_{X_i}(x_i|X_{i-1}=x_{i-1}),$$

where the conditional density of $X_i$ given $X_{i-1}=x_{i-1}$ is $\text{N}(\theta x_{i-1},\sigma^2)$. So,

$$f^*=f_{X_1}(x_1)\prod_{i=2}^{n}f_{X_i}(x_i|X_{i-1}=x_{i-1})
=(2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\left[x_1^2+\sum_{i=2}^{n}(x_i-\theta x_{i-1})^2\right]\right\},$$

$-\infty<x_i<\infty$, $i=1,2,\ldots,n$. So,

$$\ln f^*=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln\sigma^2-\frac{1}{2\sigma^2}\left[x_1^2+\sum_{i=2}^{n}(x_i-\theta x_{i-1})^2\right].$$

So, in the unrestricted parameter space $\Omega$,

$$\frac{\partial\ln f^*}{\partial\theta}=\frac{1}{\sigma^2}\sum_{i=2}^{n}x_{i-1}(x_i-\theta x_{i-1})=0
\ \Rightarrow\ \hat\theta_\Omega=\frac{\sum_{i=2}^{n}x_{i-1}x_i}{\sum_{i=1}^{n-1}x_i^2}.$$

And,

$$\frac{\partial\ln f^*}{\partial(\sigma^2)}=\frac{-n}{2\sigma^2}+\frac{1}{2\sigma^4}\left[x_1^2+\sum_{i=2}^{n}(x_i-\theta x_{i-1})^2\right]
\ \Rightarrow\ \hat\sigma^2_\Omega=\frac{1}{n}\left[x_1^2+\sum_{i=2}^{n}(x_i-\hat\theta_\Omega x_{i-1})^2\right]
=\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\theta_\Omega x_{i-1})^2\ \text{since}\ x_0\equiv 0.$$

So,

$$L_{\hat\Omega}=f^*\big|_{\theta=\hat\theta_\Omega,\,\sigma^2=\hat\sigma^2_\Omega}=(2\pi)^{-n/2}\left(\hat\sigma^2_\Omega\right)^{-n/2}e^{-n/2}.$$

And, in the restricted parameter space $\omega$ (i.e., where $\theta=0$),

$$\ln f^*\big|_{\theta=0}=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln\sigma^2-\frac{1}{2\sigma^2}\sum_{i=1}^{n}x_i^2,$$

$$\frac{\partial\ln f^*}{\partial(\sigma^2)}=\frac{-n}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{i=1}^{n}x_i^2=0
\ \Rightarrow\ \hat\sigma^2_\omega=\frac{\sum_{i=1}^{n}x_i^2}{n},$$

so that

$$L_{\hat\omega}=f^*\big|_{\theta=0,\,\sigma^2=\hat\sigma^2_\omega}=(2\pi)^{-n/2}\left(\hat\sigma^2_\omega\right)^{-n/2}e^{-n/2}.$$

Thus,

$$\lambda=\frac{L_{\hat\omega}}{L_{\hat\Omega}}=\left(\frac{\hat\sigma^2_\Omega}{\hat\sigma^2_\omega}\right)^{n/2}
\ \Rightarrow\ \lambda^{2/n}=\frac{\hat\sigma^2_\Omega}{\hat\sigma^2_\omega}
=\frac{\sum_{i=1}^{n}(x_i-\hat\theta_\Omega x_{i-1})^2}{\sum_{i=1}^{n}x_i^2}
=\frac{\sum_{i=1}^{n}x_i^2-2\hat\theta_\Omega\sum_{i=1}^{n}x_{i-1}x_i+\hat\theta_\Omega^2\sum_{i=1}^{n}x_{i-1}^2}{\sum_{i=1}^{n}x_i^2}.$$

Note that $\sum_{i=1}^{n}x_{i-1}x_i=\sum_{i=2}^{n}x_{i-1}x_i$ and $\sum_{i=1}^{n}x_{i-1}^2=\sum_{i=1}^{n-1}x_i^2$ since $x_0\equiv 0$. Thus, we have

$$\lambda^{2/n}=1-\left\{\frac{2\left[\left(\sum_{i=2}^{n}x_{i-1}x_i\right)^2\Big/\sum_{i=1}^{n-1}x_i^2\right]-\left[\left(\sum_{i=2}^{n}x_{i-1}x_i\right)^2\Big/\sum_{i=1}^{n-1}x_i^2\right]}{\sum_{i=1}^{n}x_i^2}\right\}
=1-\frac{\left(\sum_{i=2}^{n}x_{i-1}x_i\right)^2}{\left(\sum_{i=1}^{n-1}x_i^2\right)\left(\sum_{i=1}^{n}x_i^2\right)}.$$

For the given data,

$$\lambda^{2/30}=1-\frac{(4)^2}{(15-4)(15)}=0.9030
\ \Rightarrow\ \lambda=(0.9030)^{15}=0.2164
\ \Rightarrow\ -2\ln\lambda=3.0610.$$

Since $\chi^2_{1,0.95}=3.84$, these data do not provide sufficient evidence to reject $H_0\!:\theta=0$.
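For the given summary data, the likelihood-ratio computation reduces to a few lines:

```python
import math

lam_2n = 1 - 4 ** 2 / ((15 - 4) * 15)  # lambda^(2/30)
lam = lam_2n ** 15
print(round(lam, 4), round(-2 * math.log(lam), 3))  # 0.2164 and about 3.06
```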

Solution 5.24∗

(a) Now, with $\mathbf x=(x_1,x_2,\ldots,x_n)$ and $\mathbf y=(y_1,y_2,\ldots,y_m)$, we have

$$L(\mathbf x,\mathbf y;\theta_r,\theta_u)=\left\{\prod_{i=1}^{n}\left(\theta_r x_i^{\theta_r-1}\right)\right\}\left\{\prod_{i=1}^{m}\left(\theta_u y_i^{\theta_u-1}\right)\right\}
=\theta_r^n\left(\prod_{i=1}^{n}x_i\right)^{\theta_r-1}\theta_u^m\left(\prod_{i=1}^{m}y_i\right)^{\theta_u-1}.$$

So, by the Factorization Theorem, $\prod_{i=1}^{n}X_i$ and $\prod_{i=1}^{m}Y_i$ are jointly sufficient for $\theta_r$ and $\theta_u$.

(b) From part (a), the unrestricted log-likelihood is

$$\ln L_\Omega=n\ln\theta_r+(\theta_r-1)\sum_{i=1}^{n}\ln x_i+m\ln\theta_u+(\theta_u-1)\sum_{i=1}^{m}\ln y_i.$$

So,

$$\frac{\partial\ln L_\Omega}{\partial\theta_r}=\frac{n}{\theta_r}+\sum_{i=1}^{n}\ln x_i=0
\quad\text{gives}\quad \hat\theta_r=-n\left(\sum_{i=1}^{n}\ln x_i\right)^{-1}.$$

Similarly, by symmetry,

$$\frac{\partial\ln L_\Omega}{\partial\theta_u}=\frac{m}{\theta_u}+\sum_{i=1}^{m}\ln y_i=0
\quad\text{gives}\quad \hat\theta_u=-m\left(\sum_{i=1}^{m}\ln y_i\right)^{-1}.$$

So,

$$L_{\hat\Omega}=\hat\theta_r^n\left(\prod_{i=1}^{n}x_i\right)^{\hat\theta_r-1}\hat\theta_u^m\left(\prod_{i=1}^{m}y_i\right)^{\hat\theta_u-1}.$$

Now, under $H_0\!:\theta_r=\theta_u\ (=\theta,\ \text{say})$, the restricted log-likelihood is

$$\ln L_\omega=(n+m)\ln\theta+(\theta-1)\left(\sum_{i=1}^{n}\ln x_i+\sum_{i=1}^{m}\ln y_i\right),$$

so that

$$\frac{\partial\ln L_\omega}{\partial\theta}=\frac{(n+m)}{\theta}+\left(\sum_{i=1}^{n}\ln x_i+\sum_{i=1}^{m}\ln y_i\right)=0
\quad\text{gives}\quad \hat\theta=-(n+m)\left(\sum_{i=1}^{n}\ln x_i+\sum_{i=1}^{m}\ln y_i\right)^{-1}.$$

So,

$$L_{\hat\omega}=\hat\theta^{(n+m)}\left(\prod_{i=1}^{n}x_i\cdot\prod_{i=1}^{m}y_i\right)^{\hat\theta-1}.$$

So, with $W=\sum_{i=1}^{n}\ln x_i\Big/\left(\sum_{i=1}^{n}\ln x_i+\sum_{i=1}^{m}\ln y_i\right)$ (see part (c)), so that $\hat\theta/\hat\theta_r=\frac{(n+m)}{n}W$ and $\hat\theta/\hat\theta_u=\frac{(n+m)}{m}(1-W)$,

$$\lambda=\frac{L_{\hat\omega}}{L_{\hat\Omega}}=\frac{\hat\theta^{(n+m)}\left(\prod_{i=1}^{n}x_i\cdot\prod_{i=1}^{m}y_i\right)^{\hat\theta-1}}
{\hat\theta_r^n\hat\theta_u^m\left(\prod_{i=1}^{n}x_i\right)^{\hat\theta_r-1}\left(\prod_{i=1}^{m}y_i\right)^{\hat\theta_u-1}}
=\left(\frac{\hat\theta}{\hat\theta_r}\right)^n\left(\frac{\hat\theta}{\hat\theta_u}\right)^m\left(\prod_{i=1}^{n}x_i\right)^{\hat\theta-\hat\theta_r}\left(\prod_{i=1}^{m}y_i\right)^{\hat\theta-\hat\theta_u}$$

$$=\left(\frac{n+m}{n}\right)^n W^n\left(\frac{n+m}{m}\right)^m(1-W)^m\left(\prod_{i=1}^{n}x_i\right)^{\hat\theta-\hat\theta_r}\left(\prod_{i=1}^{m}y_i\right)^{\hat\theta-\hat\theta_u}.$$

Thus, using $\hat\theta\sum_{i=1}^{n}\ln x_i=-(n+m)W$, $\hat\theta_r\sum_{i=1}^{n}\ln x_i=-n$, $\hat\theta\sum_{i=1}^{m}\ln y_i=-(n+m)(1-W)$, and $\hat\theta_u\sum_{i=1}^{m}\ln y_i=-m$,

$$\ln\lambda=n\ln\left(\frac{n+m}{n}\right)+m\ln\left(\frac{n+m}{m}\right)+n\ln W+m\ln(1-W)
+(\hat\theta-\hat\theta_r)\sum_{i=1}^{n}\ln x_i+(\hat\theta-\hat\theta_u)\sum_{i=1}^{m}\ln y_i$$

$$=n\ln\left(\frac{n+m}{n}\right)+m\ln\left(\frac{n+m}{m}\right)+n\ln W+m\ln(1-W)-(n+m)W+n-(n+m)(1-W)+m$$

$$=n\ln\left(\frac{n+m}{n}\right)+m\ln\left(\frac{n+m}{m}\right)+\ln[W^n(1-W)^m],$$

since the last four terms in the middle expression sum to zero. Finally,

$$-2\ln\lambda=-2n\ln\left(\frac{n+m}{n}\right)-2m\ln\left(\frac{n+m}{m}\right)-2\ln[W^n(1-W)^m].$$

Under $H_0\!:\theta_r=\theta_u$, we know that $-2\ln\lambda\,\dot\sim\,\chi^2_1$ for large $n$ and $m$. Since $0<W<1$, $-2\ln\lambda$ will be large (and hence favor rejecting $H_0$) when either $W$ is close to 0 or $W$ is close to 1.

(c) Under $H_0\!:\theta_r=\theta_u\ (=\theta,\ \text{say})$, $f_X(x;\theta)=\theta x^{\theta-1}$, $0<x<1$, and $f_Y(y;\theta)=\theta y^{\theta-1}$, $0<y<1$. Now, let $U=-\ln X$, so that $X=e^{-U}$ and $dX=-e^{-U}dU$. Hence,

$$f_U(u;\theta)=\theta(e^{-u})^{\theta-1}e^{-u}=\theta e^{-\theta u},\quad 0<u<\infty,$$

so that $U=-\ln X\sim\text{GAMMA}(\alpha=\theta^{-1},\beta=1)$. Thus,

$$\sum_{i=1}^{n}(-\ln X_i)=-\sum_{i=1}^{n}\ln X_i\sim\text{GAMMA}(\alpha=\theta^{-1},\beta=n),$$

and, analogously,

$$\sum_{i=1}^{m}(-\ln Y_i)=-\sum_{i=1}^{m}\ln Y_i\sim\text{GAMMA}(\alpha=\theta^{-1},\beta=m).$$

Thus,

$$W=\frac{\sum_{i=1}^{n}\ln X_i}{\sum_{i=1}^{n}\ln X_i+\sum_{i=1}^{m}\ln Y_i}
=\frac{-\sum_{i=1}^{n}\ln X_i}{-\sum_{i=1}^{n}\ln X_i-\sum_{i=1}^{m}\ln Y_i}=\frac{R}{(R+S)},$$

where $R\sim\text{GAMMA}(\alpha=\theta^{-1},\beta=n)$, $S\sim\text{GAMMA}(\alpha=\theta^{-1},\beta=m)$, and $R$ and $S$ are independent random variables. So,

$$f_{R,S}(r,s;\theta)=\left[\frac{\theta^n r^{n-1}e^{-\theta r}}{\Gamma(n)}\right]\left[\frac{\theta^m s^{m-1}e^{-\theta s}}{\Gamma(m)}\right]
=\theta^{(n+m)}r^{n-1}s^{m-1}e^{-\theta(r+s)}/\Gamma(n)\Gamma(m),\quad r>0,\ s>0.$$

So, let $W=R/(R+S)$ and $P=(R+S)$; hence, $R=PW$ and $S=(P-PW)=P(1-W)$. Clearly, $0<W<1$ and $0<P<+\infty$. Also,

$$J=\begin{vmatrix}\dfrac{\partial R}{\partial P} & \dfrac{\partial R}{\partial W}\\[2mm]\dfrac{\partial S}{\partial P} & \dfrac{\partial S}{\partial W}\end{vmatrix}
=\begin{vmatrix}W & P\\ (1-W) & -P\end{vmatrix}=-P,$$

so that $|J|=P$. Finally,

$$f_{W,P}(w,p;\theta)=\frac{\theta^{(n+m)}(pw)^{n-1}[p(1-w)]^{m-1}e^{-\theta[pw+p(1-w)]}(p)}{\Gamma(n)\Gamma(m)}
=\left[\frac{\Gamma(n+m)}{\Gamma(n)\Gamma(m)}w^{n-1}(1-w)^{m-1}\right]\left[\frac{\theta^{n+m}p^{(n+m)-1}e^{-\theta p}}{\Gamma(n+m)}\right],$$

$0<w<1$, $0<p<\infty$. So, $W\sim\text{BETA}(\alpha=n,\beta=m)$, $P\sim\text{GAMMA}(\alpha=\theta^{-1},\beta=n+m)$, and $W$ and $P$ are independent random variables. When $n=m=2$,

$$f_W(w)=\frac{\Gamma(4)}{\Gamma(2)\Gamma(2)}w(1-w)=6w(1-w),\quad 0<w<1,$$

when $H_0\!:\theta_r=\theta_u$ is true. So, we want to choose $k_{.05}$ such that

$$\int_0^{k_{.05}}6t(1-t)\,dt=0.05,$$

or $(3k_{.05}^2-2k_{.05}^3)=0.05$, or (by trial and error) $k_{.05}=0.135$. So, for $n=m=2$, reject $H_0\!:\theta_r=\theta_u$ when either $W<0.135$ or $W>0.865$ for $\alpha=0.10$.
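Rather than trial and error, a numerical root-finder gives the cutoff directly (a sketch assuming SciPy is available):

```python
from scipy.optimize import brentq

# solve 3k^2 - 2k^3 = 0.05 for k in (0, 0.5)
k = brentq(lambda t: 3 * t ** 2 - 2 * t ** 3 - 0.05, 0.0, 0.5)
print(round(k, 3))  # 0.135
```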

Solution 5.25∗

(a) For the unrestricted parameter space,

$$L_\Omega=\prod_{i=1}^{2}\prod_{j=1}^{n}\left\{C^{k+y_{ij}-1}_{k-1}\theta_i^{y_{ij}}(1+\theta_i)^{-(k+y_{ij})}\right\},$$

and

$$\ln L_\Omega=\sum_{i=1}^{2}\sum_{j=1}^{n}\left\{\ln C^{k+y_{ij}-1}_{k-1}+y_{ij}\ln\theta_i-(k+y_{ij})\ln(1+\theta_i)\right\},$$

so that

$$\frac{\partial\ln L_\Omega}{\partial\theta_i}=\sum_{j=1}^{n}\left[\frac{y_{ij}}{\theta_i}-\frac{(k+y_{ij})}{(1+\theta_i)}\right]
=\frac{n\bar y_i}{\theta_i}-\frac{n(k+\bar y_i)}{(1+\theta_i)}=0,
\quad\text{where}\quad \bar y_i=n^{-1}\sum_{j=1}^{n}y_{ij}.$$

Thus,

$$\hat\theta_i=\frac{\bar y_i}{k}=\frac{\sum_{j=1}^{n}y_{ij}}{nk},\quad i=1,2,$$

and

$$L_{\hat\Omega}=\prod_{i=1}^{2}\prod_{j=1}^{n}\left\{C^{k+y_{ij}-1}_{k-1}\hat\theta_i^{y_{ij}}(1+\hat\theta_i)^{-(k+y_{ij})}\right\}.$$

For the restricted parameter space,

$$L_\omega=\prod_{i=1}^{2}\prod_{j=1}^{n}\left\{C^{k+y_{ij}-1}_{k-1}\theta^{y_{ij}}(1+\theta)^{-(k+y_{ij})}\right\}
=\left(\prod_{i=1}^{2}\prod_{j=1}^{n}C^{k+y_{ij}-1}_{k-1}\right)\theta^s(1+\theta)^{-(2nk+s)},
\quad\text{where}\ s=\sum_{i=1}^{2}\sum_{j=1}^{n}y_{ij}.$$

So,

$$\frac{\partial\ln L_\omega}{\partial\theta}=\frac{s}{\theta}-\frac{(2nk+s)}{(1+\theta)}=0
\quad\text{gives}\quad \hat\theta=\frac{\bar y}{k},\ \text{where}\ \bar y=\frac12(\bar y_1+\bar y_2).$$

Thus,

$$L_{\hat\omega}=\left(\prod_{i=1}^{2}\prod_{j=1}^{n}C^{k+y_{ij}-1}_{k-1}\right)\hat\theta^s(1+\hat\theta)^{-(2nk+s)}.$$

Hence,

$$-2\ln\lambda=-2\ln\left\{\frac{\hat\theta^s(1+\hat\theta)^{-(2nk+s)}}{\prod_{i=1}^{2}\prod_{j=1}^{n}\hat\theta_i^{y_{ij}}(1+\hat\theta_i)^{-(k+y_{ij})}}\right\}
=-2\left\{s\ln\hat\theta-(2nk+s)\ln(1+\hat\theta)-\sum_{i=1}^{2}\left[n\bar y_i\ln\hat\theta_i-n(k+\bar y_i)\ln(1+\hat\theta_i)\right]\right\}.$$

Now, $s=15$, $n=50$, $k=3$, $\bar y_1=\frac{5}{50}=0.10$, $\bar y_2=\frac{10}{50}=0.20$, $\hat\theta=\frac{\bar y}{k}=\frac{(\bar y_1+\bar y_2)}{2k}=\frac{0.10+0.20}{2(3)}=0.05$, $\hat\theta_1=\frac{0.10}{3}=0.0333$, and $\hat\theta_2=\frac{0.20}{3}=0.0667$.

So, $-2\ln\lambda=1.62$. Since $\chi^2_{1,0.95}=3.841$, we do not reject $H_0$; the P-value $\doteq 0.20$.

(b) From part (a),

$$S'(\theta_1,\theta_2)=\begin{bmatrix}\dfrac{\partial\ln L_\Omega}{\partial\theta_1}\\[2mm]\dfrac{\partial\ln L_\Omega}{\partial\theta_2}\end{bmatrix}
=\begin{bmatrix}\dfrac{n\bar y_1}{\theta_1}-\dfrac{n(k+\bar y_1)}{1+\theta_1}\\[2mm]\dfrac{n\bar y_2}{\theta_2}-\dfrac{n(k+\bar y_2)}{1+\theta_2}\end{bmatrix}.$$

Under $H_0\!:\theta_1=\theta_2\ (=\theta,\ \text{say})$, $\hat\theta=\frac{(\bar y_1+\bar y_2)}{2k}=0.05$. So,

$$S'(\hat\theta,\hat\theta)=\begin{bmatrix}\dfrac{(50)(0.10)}{0.05}-\dfrac{(50)(3+0.10)}{(1+0.05)}\\[2mm]\dfrac{(50)(0.20)}{0.05}-\dfrac{(50)(3+0.20)}{(1+0.05)}\end{bmatrix}
=\begin{bmatrix}-47.6190\\ +47.6190\end{bmatrix}.$$

Now,

$$\frac{\partial^2\ln L_\Omega}{\partial\theta_1^2}=\frac{-n\bar y_1}{\theta_1^2}+\frac{n(k+\bar y_1)}{(1+\theta_1)^2},\qquad
\frac{\partial^2\ln L_\Omega}{\partial\theta_2^2}=\frac{-n\bar y_2}{\theta_2^2}+\frac{n(k+\bar y_2)}{(1+\theta_2)^2},$$

and $\partial^2\ln L_\Omega/\partial\theta_1\partial\theta_2=\partial^2\ln L_\Omega/\partial\theta_2\partial\theta_1=0$. So, with $\mathbf y=(y_{11},y_{12},\ldots,y_{1n};y_{21},y_{22},\ldots,y_{2n})$, we have

$$I(\mathbf y;\hat\theta)=\begin{bmatrix}\dfrac{n\bar y_1}{\hat\theta^2}-\dfrac{n(k+\bar y_1)}{(1+\hat\theta)^2} & 0\\[2mm] 0 & \dfrac{n\bar y_2}{\hat\theta^2}-\dfrac{n(k+\bar y_2)}{(1+\hat\theta)^2}\end{bmatrix}
=\begin{bmatrix}\dfrac{(50)(0.10)}{(0.05)^2}-\dfrac{(50)(3.10)}{(1.05)^2} & 0\\[2mm] 0 & \dfrac{(50)(0.20)}{(0.05)^2}-\dfrac{(50)(3.20)}{(1.05)^2}\end{bmatrix}
=\begin{bmatrix}1{,}859.4104 & 0\\ 0 & 3{,}854.8753\end{bmatrix}.$$

So,

$$S=S(\hat\theta,\hat\theta)\,I^{-1}(\mathbf y;\hat\theta)\,S'(\hat\theta,\hat\theta)
=\frac{(-47.6190)^2}{1859.4104}+\frac{(47.6190)^2}{3854.8753}=1.81.$$

Since $\chi^2_{1,0.95}=3.84$, we do not reject $H_0$; the P-value $\doteq 0.18$.

(c) With $X_{ij}=k+Y_{ij}$, we have

$$p_{X_{ij}}(x_{ij};\theta_i)=C^{x_{ij}-1}_{k-1}\left(\frac{1}{1+\theta_i}\right)^k\left(\frac{\theta_i}{1+\theta_i}\right)^{x_{ij}-k},\quad x_{ij}=k,k+1,\ldots,\infty.$$

So,

$$E(Y_{ij})=E(X_{ij})-k=k(1+\theta_i)-k=k\theta_i
\quad\text{and}\quad
V(Y_{ij})=k\left(\frac{\theta_i}{1+\theta_i}\right)(1+\theta_i)^2=k\theta_i(1+\theta_i).$$

In general,

$$\frac{(\bar Y_1-\bar Y_2)-k(\theta_1-\theta_2)}{\sqrt{\dfrac{k\theta_1(1+\theta_1)}{n}+\dfrac{k\theta_2(1+\theta_2)}{n}}}\,\dot\sim\,\text{N}(0,1)$$

for large $n$ by the Central Limit Theorem. Under $H_0\!:\theta_1=\theta_2\ (=\theta,\ \text{say})$, then

$$\frac{(\bar Y_1-\bar Y_2)-0}{\sqrt{\dfrac{2k\hat\theta(1+\hat\theta)}{n}}}\,\dot\sim\,\text{N}(0,1)\ \text{for large}\ n.$$

Thus, via Slutsky's Theorem, we could reject $H_0\!:\theta_1=\theta_2$ at the $\alpha$-level for large $n$ when

$$\left|\frac{(\bar Y_1-\bar Y_2)-0}{\sqrt{2k\hat\theta(1+\hat\theta)/n}}\right|>Z_{1-\alpha/2}.$$

Now, for large $n$,

$$\text{POWER}=\text{pr}\left\{\left|\frac{\bar Y_1-\bar Y_2}{\sqrt{2k\hat\theta(1+\hat\theta)/n}}\right|>Z_{1-\alpha/2}\ \Big|\ \theta_1\ne\theta_2\right\}
=\text{pr}\left\{\frac{(\bar Y_1-\bar Y_2)}{\sqrt{2k\hat\theta(1+\hat\theta)/n}}<-Z_{1-\alpha/2}\right\}
+\text{pr}\left\{\frac{(\bar Y_1-\bar Y_2)}{\sqrt{2k\hat\theta(1+\hat\theta)/n}}>Z_{1-\alpha/2}\right\}.$$

For $\theta_1=2.0$ and $\theta_2=2.4$, the contribution of the second term will be negligible. So,

$$\text{POWER}=\text{pr}\left\{\frac{(\bar Y_1-\bar Y_2)}{\sqrt{2k\theta(1+\theta)/n}}<-Z_{1-\alpha/2}\ \Big|\ \theta_1=2.0,\theta_2=2.4\right\}
=\text{pr}\left\{Z<\frac{-Z_{1-\alpha/2}\sqrt{\dfrac{2k\theta(1+\theta)}{n}}-k(\theta_1-\theta_2)}{\sqrt{\dfrac{k\theta_1(1+\theta_1)}{n}+\dfrac{k\theta_2(1+\theta_2)}{n}}}\ \Big|\ \theta_1=2.0,\theta_2=2.4\right\},$$

where $Z\sim\text{N}(0,1)$ for large $n$. So, with $\theta_1=2.0$, $\theta_2=2.4$, $\alpha=0.05$, $k=3$, $(1-\beta)=0.80$, and $\theta=(\theta_1+\theta_2)/2=2.2$, we require the smallest $n$ (say, $n^*$) such that

$$\frac{-1.96\sqrt{2(3)(2.2)(1+2.2)}-\sqrt n(3)(2.0-2.4)}{\sqrt{3(2.0)(1+2.0)+3(2.4)(1+2.4)}}\ge 0.842,$$

giving $n^*=231$.
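The sample-size inequality at the end of part (c) can be solved for $n$ directly:

```python
import math

za, zb = 1.96, 0.842
k, t1, t2 = 3, 2.0, 2.4
t = (t1 + t2) / 2  # common theta under H0

num = za * math.sqrt(2 * k * t * (1 + t)) + zb * math.sqrt(k * t1 * (1 + t1) + k * t2 * (1 + t2))
print(math.ceil((num / (k * abs(t1 - t2))) ** 2))  # n* = 231
```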

Solution 5.26∗

(a) The multinomial likelihood function $L$ is given by

$$L=\frac{n!}{y_{11}!y_{10}!y_{01}!y_{00}!}\pi_{11}^{y_{11}}\pi_{10}^{y_{10}}\pi_{01}^{y_{01}}\pi_{00}^{y_{00}},$$

and so

$$\ln L\propto\sum_{i=0}^{1}\sum_{j=0}^{1}y_{ij}\ln\pi_{ij}.$$

To maximize $\ln L$ subject to the constraint $(\pi_{11}+\pi_{10}+\pi_{01}+\pi_{00})=1$, we can use the method of Lagrange multipliers. Define

$$U=\sum_{i=0}^{1}\sum_{j=0}^{1}y_{ij}\ln\pi_{ij}+\lambda\left(1-\sum_{i=0}^{1}\sum_{j=0}^{1}\pi_{ij}\right).$$

The equations

$$\frac{\partial U}{\partial\pi_{ij}}=\frac{y_{ij}}{\pi_{ij}}-\lambda=0,\quad i=0,1\ \text{and}\ j=0,1,$$

imply that

$$\frac{y_{11}}{\pi_{11}}=\frac{y_{10}}{\pi_{10}}=\frac{y_{01}}{\pi_{01}}=\frac{y_{00}}{\pi_{00}}=\lambda,
\quad\text{or equivalently,}\quad
\frac{y_{ij}}{\lambda}=\pi_{ij},\ i=0,1\ \text{and}\ j=0,1.$$

Additionally,

$$\sum_{i=0}^{1}\sum_{j=0}^{1}\pi_{ij}=1\ \Longrightarrow\ \sum_{i=0}^{1}\sum_{j=0}^{1}\frac{y_{ij}}{\lambda}=1
\ \Longrightarrow\ \lambda=\sum_{i=0}^{1}\sum_{j=0}^{1}y_{ij}=n.$$

Hence,

$$\hat\pi_{ij}=\frac{y_{ij}}{n},\quad i=0,1\ \text{and}\ j=0,1.$$

By the invariance property for MLEs, it follows that the MLE of $\delta$ is equal to

$$\hat\delta=(\hat\pi_{11}+\hat\pi_{10})-(\hat\pi_{11}+\hat\pi_{01})=(\hat\pi_{10}-\hat\pi_{01})=\frac{(y_{10}-y_{01})}{n}.$$

(b) Using the equality $\pi_{00}=(1-\pi_{11}-\pi_{10}-\pi_{01})$, the log-likelihood function can be written as

$$\ln L\propto y_{11}\ln\pi_{11}+y_{10}\ln\pi_{10}+y_{01}\ln\pi_{01}+y_{00}\ln(1-\pi_{11}-\pi_{10}-\pi_{01}).$$

Now,

$$\frac{\partial\ln L}{\partial\pi_{11}}=\frac{y_{11}}{\pi_{11}}-\frac{y_{00}}{(1-\pi_{11}-\pi_{10}-\pi_{01})},\qquad
\frac{\partial\ln L}{\partial\pi_{10}}=\frac{y_{10}}{\pi_{10}}-\frac{y_{00}}{(1-\pi_{11}-\pi_{10}-\pi_{01})},\qquad
\frac{\partial\ln L}{\partial\pi_{01}}=\frac{y_{01}}{\pi_{01}}-\frac{y_{00}}{(1-\pi_{11}-\pi_{10}-\pi_{01})}.$$

So, for $(i,j)$ equal to $(1,1)$, $(1,0)$, or $(0,1)$, we have

$$\frac{\partial^2\ln L}{\partial\pi_{ij}^2}=-\frac{y_{ij}}{\pi_{ij}^2}-\frac{y_{00}}{(1-\pi_{11}-\pi_{10}-\pi_{01})^2},
\quad\text{and hence}\quad
-E\left[\frac{\partial^2\ln L}{\partial\pi_{ij}^2}\right]=\left(\frac{n}{\pi_{ij}}+\frac{n}{\pi_{00}}\right).$$

In addition,

$$\frac{\partial^2\ln L}{\partial\pi_{11}\partial\pi_{10}}=\frac{\partial^2\ln L}{\partial\pi_{11}\partial\pi_{01}}=\frac{\partial^2\ln L}{\partial\pi_{10}\partial\pi_{01}}=-\frac{y_{00}}{(1-\pi_{11}-\pi_{10}-\pi_{01})^2},$$

and so

$$-E\left[\frac{\partial^2\ln L}{\partial\pi_{11}\partial\pi_{10}}\right]=-E\left[\frac{\partial^2\ln L}{\partial\pi_{11}\partial\pi_{01}}\right]=-E\left[\frac{\partial^2\ln L}{\partial\pi_{10}\partial\pi_{01}}\right]=\frac{n}{\pi_{00}}.$$

Hence, with $\boldsymbol\pi=(\pi_{11},\pi_{10},\pi_{01})$, the expected Fisher information matrix $I(\boldsymbol\pi)$ is

$$I(\boldsymbol\pi)=n\begin{bmatrix}\left(\dfrac{1}{\pi_{11}}+\dfrac{1}{\pi_{00}}\right) & \dfrac{1}{\pi_{00}} & \dfrac{1}{\pi_{00}}\\[2mm]
\dfrac{1}{\pi_{00}} & \left(\dfrac{1}{\pi_{10}}+\dfrac{1}{\pi_{00}}\right) & \dfrac{1}{\pi_{00}}\\[2mm]
\dfrac{1}{\pi_{00}} & \dfrac{1}{\pi_{00}} & \left(\dfrac{1}{\pi_{01}}+\dfrac{1}{\pi_{00}}\right)\end{bmatrix},$$

with $\pi_{00}=(1-\pi_{11}-\pi_{10}-\pi_{01})$. So,

$$I^{-1}(\boldsymbol\pi)=\frac{1}{n}\begin{bmatrix}\pi_{11}(1-\pi_{11}) & -\pi_{11}\pi_{10} & -\pi_{11}\pi_{01}\\
-\pi_{11}\pi_{10} & \pi_{10}(1-\pi_{10}) & -\pi_{10}\pi_{01}\\
-\pi_{11}\pi_{01} & -\pi_{10}\pi_{01} & \pi_{01}(1-\pi_{01})\end{bmatrix}.$$

The null hypothesis of interest is $H_0\!:R(\boldsymbol\pi)\equiv R=(\pi_{10}-\pi_{01})=0$. Hence, with $T(\boldsymbol\pi)\equiv T=[0,1,-1]$,

$$\Lambda(\boldsymbol\pi)\equiv\Lambda=TI^{-1}(\boldsymbol\pi)T'
=\frac{\pi_{10}(1-\pi_{10})}{n}+\frac{\pi_{01}(1-\pi_{01})}{n}+\frac{2\pi_{10}\pi_{01}}{n}
=\frac{(\pi_{10}+\pi_{01})}{n}-\frac{(\pi_{10}-\pi_{01})^2}{n}.$$

So, the Wald test statistic $W$ takes the form

$$W=\frac{\hat R^2}{\hat\Lambda}=\frac{(\hat\pi_{10}-\hat\pi_{01})^2}{\dfrac{(\hat\pi_{10}+\hat\pi_{01})}{n}-\dfrac{(\hat\pi_{10}-\hat\pi_{01})^2}{n}}
=\frac{(y_{10}-y_{01})^2/n^2}{\left[\dfrac{(y_{10}+y_{01})}{n}-\left(\dfrac{y_{10}-y_{01}}{n}\right)^2\right]\Big/n}
=\frac{(y_{10}-y_{01})^2}{(y_{10}+y_{01})-\dfrac{(y_{10}-y_{01})^2}{n}}.$$

A simpler way to derive this test statistic is to note that

$$W=\frac{\hat\delta^2}{\hat V(\hat\delta)},$$

where $\hat V(\hat\delta)$ denotes the MLE of $V(\hat\delta)$. Now,

$$V(\hat\delta)=V\left[\frac{(Y_{10}-Y_{01})}{n}\right]
=\frac{V(Y_{10})+V(Y_{01})-2\,\text{cov}(Y_{10},Y_{01})}{n^2}
=\frac{n\pi_{10}(1-\pi_{10})+n\pi_{01}(1-\pi_{01})+2n\pi_{10}\pi_{01}}{n^2}
=\frac{(\pi_{10}+\pi_{01})-(\pi_{10}-\pi_{01})^2}{n}.$$

By the invariance property, it follows that

$$\hat V(\hat\delta)=\frac{(\hat\pi_{10}+\hat\pi_{01})-(\hat\pi_{10}-\hat\pi_{01})^2}{n}
=\frac{\dfrac{(y_{10}+y_{01})}{n}-\dfrac{(y_{10}-y_{01})^2}{n^2}}{n}.$$

Finally, the Wald test statistic is given by

$$W=\frac{(\hat\delta)^2}{\hat V(\hat\delta)}
=\frac{\dfrac{(y_{10}-y_{01})^2}{n^2}}{\left[\dfrac{(y_{10}+y_{01})}{n}-\dfrac{(y_{10}-y_{01})^2}{n^2}\right]\Big/n}
=\frac{(y_{10}-y_{01})^2}{(y_{10}+y_{01})-\dfrac{(y_{10}-y_{01})^2}{n}}.$$

When $y_{11}=22$, $y_{10}=3$, $y_{01}=7$, and $y_{00}=13$, so that $n=45$, the Wald test statistic is equal to

$$W=\frac{(3-7)^2}{(3+7)-\dfrac{(3-7)^2}{45}}=1.6590.$$

An approximate P-value is

$$\text{P-value}=\text{pr}\left(\chi^2_1>1.6590\,\big|\,H_0\right)=0.1977.$$

(c) The score vector has the form $S(\boldsymbol\pi)=(s_1,s_2,s_3)$, where

$$s_1=\frac{\partial\ln L}{\partial\pi_{11}}=\frac{y_{11}}{\pi_{11}}-\frac{y_{00}}{\pi_{00}},\qquad
s_2=\frac{\partial\ln L}{\partial\pi_{10}}=\frac{y_{10}}{\pi_{10}}-\frac{y_{00}}{\pi_{00}},\qquad
s_3=\frac{\partial\ln L}{\partial\pi_{01}}=\frac{y_{01}}{\pi_{01}}-\frac{y_{00}}{\pi_{00}}.$$

Under $H_0\!:\pi_{10}=\pi_{01}\ (=\pi,\ \text{say})$, the restricted log-likelihood is

$$\ln L_\omega\propto y_{11}\ln\pi_{11}+y_{10}\ln\pi+y_{01}\ln\pi+y_{00}\ln\pi_{00}.$$

Using the Lagrange multiplier method with

$$U=y_{11}\ln\pi_{11}+y_{10}\ln\pi+y_{01}\ln\pi+y_{00}\ln\pi_{00}+\lambda(1-\pi_{11}-2\pi-\pi_{00}),$$

we have

$$\frac{\partial U}{\partial\pi_{11}}=\frac{y_{11}}{\pi_{11}}-\lambda=0,\qquad
\frac{\partial U}{\partial\pi}=\frac{(y_{10}+y_{01})}{\pi}-2\lambda=0,\qquad
\frac{\partial U}{\partial\pi_{00}}=\frac{y_{00}}{\pi_{00}}-\lambda=0.$$

Since $\lambda=n$, the restricted MLEs are

$$\hat\pi_{\omega 11}=\frac{y_{11}}{n},\qquad
\hat\pi_\omega=\frac{(y_{10}+y_{01})}{(2n)}\ (=\hat\pi_{\omega 10}=\hat\pi_{\omega 01}),\qquad
\hat\pi_{\omega 00}=\frac{y_{00}}{n}.$$

Thus, with $\hat{\boldsymbol\pi}_\omega=(\hat\pi_{\omega 11},\hat\pi_{\omega 10},\hat\pi_{\omega 01})=(\hat\pi_{\omega 11},\hat\pi_\omega,\hat\pi_\omega)$, we have

$$I^{-1}(\hat{\boldsymbol\pi}_\omega)=\frac{1}{n^3}
\begin{bmatrix}y_{11}(n-y_{11}) & -\dfrac{y_{11}(y_{10}+y_{01})}{2} & -\dfrac{y_{11}(y_{10}+y_{01})}{2}\\[2mm]
-\dfrac{y_{11}(y_{10}+y_{01})}{2} & \dfrac{(y_{10}+y_{01})(n+y_{11}+y_{00})}{4} & -\dfrac{(y_{10}+y_{01})^2}{4}\\[2mm]
-\dfrac{y_{11}(y_{10}+y_{01})}{2} & -\dfrac{(y_{10}+y_{01})^2}{4} & \dfrac{(y_{10}+y_{01})(n+y_{11}+y_{00})}{4}\end{bmatrix}.$$

Now,

$$\left.\frac{\partial\ln L}{\partial\pi_{11}}\right|_{\boldsymbol\pi=\hat{\boldsymbol\pi}_\omega}=0,\qquad
\left.\frac{\partial\ln L}{\partial\pi_{10}}\right|_{\boldsymbol\pi=\hat{\boldsymbol\pi}_\omega}=n\left(\frac{y_{10}-y_{01}}{y_{10}+y_{01}}\right),\qquad
\left.\frac{\partial\ln L}{\partial\pi_{01}}\right|_{\boldsymbol\pi=\hat{\boldsymbol\pi}_\omega}=n\left(\frac{y_{01}-y_{10}}{y_{10}+y_{01}}\right).$$

So,

$$S(\hat{\boldsymbol\pi}_\omega)=[0,\ s,\ -s],\quad\text{where}\quad s=n\left(\frac{y_{10}-y_{01}}{y_{10}+y_{01}}\right).$$

Finally, it can be shown with some algebra that

$$S(\hat{\boldsymbol\pi}_\omega)\,I^{-1}(\hat{\boldsymbol\pi}_\omega)\,S'(\hat{\boldsymbol\pi}_\omega)=Q_M.$$

When comparing the Wald and score test statistics, the nonnegative numerators of the two test statistics are identical. Since the nonnegative denominator of the score statistic is always at least as large as the denominator of the Wald statistic, it follows that the Wald statistic will always be at least as large in value as the score statistic.

(d) Since $\chi^2_{1,0.95}=3.84$, $H_0$ will be rejected if $Q_M>3.84$ and will not be rejected if $Q_M\le 3.84$. Let $Q(y_{10};10)$ denote the value of $Q_M$ when $Y_{10}=y_{10}$ and when $(y_{10}+y_{01})=10$. Note that

$$Q(0;10)=Q(10;10)=10.0;\quad Q(1;10)=Q(9;10)=6.4;\quad Q(2;10)=Q(8;10)=3.6;$$
$$Q(3;10)=Q(7;10)=1.6;\quad Q(4;10)=Q(6;10)=0.4;\quad Q(5;10)=0.0.$$

Thus, the null hypothesis will be rejected if $Y_{10}$ takes any of the four values 0, 1, 9, or 10, and will not be rejected otherwise. For each randomly selected subject who has a discordant response pattern [i.e., (0,1) or (1,0)], the conditional probability of a (1,0) response [given that the response is either (1,0) or (0,1)] is equal to $\pi_{10}/(\pi_{10}+\pi_{01})$. This probability remains constant and does not depend on the number of subjects who have a concordant [(0,0) or (1,1)] response, and so the binomial distribution applies. Under the assumption that $\pi_{10}=0.10$ and $\pi_{01}=0.05$, the probability of rejecting the null hypothesis is equal to

$$\text{POWER}=\sum_{y\in\{0,1,9,10\}}C^{10}_y\left(\frac{0.10}{0.10+0.05}\right)^y\left(\frac{0.05}{0.10+0.05}\right)^{10-y}
=0.0000169+0.000339+0.0867+0.01734=0.1044.$$

Thus, there is roughly a 10% chance that the null hypothesis will be rejected. A larger sample size is needed in order to achieve reasonable power for testing $H_0\!:\delta=0$ versus $H_1\!:\delta\ne 0$ when $\pi_{10}=0.10$ and $\pi_{01}=0.05$.
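The power calculation in part (d) is a sum of four binomial probabilities (a sketch assuming SciPy is available):

```python
from scipy.stats import binom

p = 0.10 / (0.10 + 0.05)  # pr[(1,0) | discordant response]
power = sum(binom.pmf(y, 10, p) for y in (0, 1, 9, 10))
print(round(power, 4))  # 0.1044
```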

Appendix
Useful Mathematical Results

A.1 Summations

a. Binomial: $\sum_{j=0}^{n}C^n_j a^j b^{(n-j)}=(a+b)^n$, where $C^n_j=\frac{n!}{j!(n-j)!}$.

b. Geometric:

i. $\sum_{j=0}^{\infty}r^j=\frac{1}{1-r}$, $|r|<1$.

ii. $\sum_{j=1}^{\infty}r^j=\frac{r}{1-r}$, $|r|<1$.

iii. $\sum_{j=0}^{n}r^j=\frac{1-r^{(n+1)}}{1-r}$, $-\infty<r<+\infty$, $r\ne 1$.

c. Negative Binomial: $\sum_{j=0}^{\infty}C^{j+k}_k\pi^j=(1-\pi)^{-(k+1)}$, $0<\pi<1$, $k$ a positive integer.

d. Exponential: $\sum_{j=0}^{\infty}\frac{x^j}{j!}=e^x$, $-\infty<x<+\infty$.

e. Sums of Integers:

i. $\sum_{i=1}^{n}i=\frac{n(n+1)}{2}$.

ii. $\sum_{i=1}^{n}i^2=\frac{n(n+1)(2n+1)}{6}$.

iii. $\sum_{i=1}^{n}i^3=\left[\frac{n(n+1)}{2}\right]^2$.

A.2 Limits

a. $\lim_{n\to\infty}\left(1+\frac{a}{n}\right)^n=e^a$, $-\infty<a<+\infty$.

A.3 Important Calculus-Based Results

a. L'Hôpital's Rule: For differentiable functions $f(x)$ and $g(x)$ and an "extended" real number $c$ (i.e., $c\in\mathbb{R}^1$ or $c=\pm\infty$), suppose that $\lim_{x\to c}f(x)=\lim_{x\to c}g(x)=0$, or that $\lim_{x\to c}f(x)=\lim_{x\to c}g(x)=\pm\infty$. Suppose also that $\lim_{x\to c}f'(x)/g'(x)$ exists [in particular, $g'(x)\ne 0$ near $c$, except possibly at $c$]. Then,

$$\lim_{x\to c}\frac{f(x)}{g(x)}=\lim_{x\to c}\frac{f'(x)}{g'(x)}.$$

L'Hôpital's Rule is also valid for one-sided limits.

b. Integration by Parts: Let $u=f(x)$ and $v=g(x)$, with differentials $du=f'(x)\,dx$ and $dv=g'(x)\,dx$. Then,

$$\int u\,dv=uv-\int v\,du.$$

c. Jacobians for One- and Two-Dimensional Change-of-Variable Transformations: Let $X$ be a scalar variable with support $A\subseteq\mathbb{R}^1$. Consider a one-to-one transformation $U=g(X)$ that maps $A\to B\subseteq\mathbb{R}^1$. Denote the inverse of $U$ as $X=h(U)$. Then, the corresponding one-dimensional Jacobian of the transformation is defined as

$$J=\frac{d[h(U)]}{dU},$$

so that

$$\int_A f(X)\,dX=\int_B f[h(U)]\,|J|\,dU.$$

Similarly, consider scalar variables $X$ and $Y$ defined on a two-dimensional set $A\subseteq\mathbb{R}^2$, and let $U=g_1(X,Y)$ and $V=g_2(X,Y)$ define a one-to-one transformation that maps $A$ in the $xy$-plane to $B\subseteq\mathbb{R}^2$ in the $uv$-plane. Define $X=h_1(U,V)$ and $Y=h_2(U,V)$. Then, the Jacobian of the (two-dimensional) transformation is given by the second-order determinant

$$J=\begin{vmatrix}\dfrac{\partial h_1(U,V)}{\partial U} & \dfrac{\partial h_1(U,V)}{\partial V}\\[2mm]\dfrac{\partial h_2(U,V)}{\partial U} & \dfrac{\partial h_2(U,V)}{\partial V}\end{vmatrix},$$

so that

$$\iint_A f(X,Y)\,dX\,dY=\iint_B f[h_1(U,V),h_2(U,V)]\,|J|\,dU\,dV.$$

A.4 Special Functions

a. Gamma Function:

i. For any real number $t>0$, the Gamma function is defined as $\Gamma(t)=\int_0^\infty y^{t-1}e^{-y}\,dy$.

ii. For any real number $t>0$, $\Gamma(t+1)=t\Gamma(t)$.

iii. For any positive integer $n$, $\Gamma(n)=(n-1)!$.

iv. $\Gamma(1/2)=\sqrt\pi$; $\Gamma(3/2)=\sqrt\pi/2$; $\Gamma(5/2)=(3\sqrt\pi)/4$.

b. Beta Function:

i. For $\alpha>0$ and $\beta>0$, the Beta function is defined as $B(\alpha,\beta)=\int_0^1 y^{\alpha-1}(1-y)^{\beta-1}\,dy$.

ii. $B(\alpha,\beta)=\dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$.

c. Convex and Concave Functions: A real-valued function $f(\cdot)$ is said to be convex if, for any two points $x$ and $y$ in its domain and any $t\in[0,1]$, we have

$$f[tx+(1-t)y]\le tf(x)+(1-t)f(y).$$

Likewise, $f(\cdot)$ is said to be concave if

$$f[tx+(1-t)y]\ge tf(x)+(1-t)f(y).$$

Also, $f(x)$ is concave on $[a,b]$ if and only if $-f(x)$ is convex on $[a,b]$.

A.5 Approximations

a. Stirling's Approximation: For $n$ a nonnegative integer, $n!\approx\sqrt{2\pi n}\left(\frac{n}{e}\right)^n$.
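As a quick illustration of the accuracy of Stirling's approximation, the following sketch compares it with the exact factorial for a few values of n:

```python
import math

for n in (5, 10, 20):
    approx = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(n, math.factorial(n), round(approx, 1), round(approx / math.factorial(n), 4))
# the ratio approaches 1 as n grows (already about 0.9917 at n = 10)
```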

b. Taylor Series Approximations:

i. Univariate Taylor Series: If $f(x)$ is a real-valued function of $x$ that is infinitely differentiable in a neighborhood of a real number $a$, then a Taylor series expansion of $f(x)$ around $a$ is equal to

$$f(x)=\sum_{k=0}^{\infty}\frac{f^{(k)}(a)}{k!}(x-a)^k,
\quad\text{where}\quad
f^{(k)}(a)=\left[\frac{d^kf(x)}{dx^k}\right]_{|x=a},\ k=0,1,\ldots,\infty.$$

When $a=0$, the infinite series expansion above is called a Maclaurin series.

As examples, a first-order (or linear) Taylor series approximation to $f(x)$ around the real number $a$ is equal to

$$f(x)\approx f(a)+\left[\frac{df(x)}{dx}\right]_{|x=a}(x-a),$$

and a second-order Taylor series approximation to $f(x)$ around the real number $a$ is equal to

$$f(x)\approx f(a)+\left[\frac{df(x)}{dx}\right]_{|x=a}(x-a)+\frac{1}{2!}\left[\frac{d^2f(x)}{dx^2}\right]_{|x=a}(x-a)^2.$$

ii. Multivariate Taylor Series: For $p\ge 2$, if $f(x_1,x_2,\ldots,x_p)$ is a real-valued function of $x_1,x_2,\ldots,x_p$ that is infinitely differentiable in a neighborhood of $(a_1,a_2,\ldots,a_p)$, where $a_i$, $i=1,2,\ldots,p$, is a real number, then a multivariate Taylor series expansion of $f(x_1,x_2,\ldots,x_p)$ around $(a_1,a_2,\ldots,a_p)$ is equal to

$$f(x_1,x_2,\ldots,x_p)=\sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_p=0}^{\infty}
\frac{f^{(k_1+k_2+\cdots+k_p)}(a_1,a_2,\ldots,a_p)}{k_1!k_2!\cdots k_p!}\prod_{i=1}^{p}(x_i-a_i)^{k_i},$$

where

$$f^{(k_1+k_2+\cdots+k_p)}(a_1,a_2,\ldots,a_p)
=\left[\frac{\partial^{(k_1+k_2+\cdots+k_p)}f(x_1,x_2,\ldots,x_p)}{\partial x_1^{k_1}\partial x_2^{k_2}\cdots\partial x_p^{k_p}}\right]_{|(x_1,x_2,\ldots,x_p)=(a_1,a_2,\ldots,a_p)}.$$

As examples, when $p=2$, a first-order (or linear) multivariate Taylor series approximation to $f(x_1,x_2)$ around $(a_1,a_2)$ is equal to

$$f(x_1,x_2)\approx f(a_1,a_2)+\sum_{i=1}^{2}\left[\frac{\partial f(x_1,x_2)}{\partial x_i}\right]_{|(x_1,x_2)=(a_1,a_2)}(x_i-a_i),$$

and a second-order multivariate Taylor series approximation to $f(x_1,x_2)$ around $(a_1,a_2)$ is equal to

$$f(x_1,x_2)\approx f(a_1,a_2)+\sum_{i=1}^{2}\left[\frac{\partial f(x_1,x_2)}{\partial x_i}\right]_{|(x_1,x_2)=(a_1,a_2)}(x_i-a_i)
+\frac{1}{2!}\sum_{i=1}^{2}\left[\frac{\partial^2 f(x_1,x_2)}{\partial x_i^2}\right]_{|(x_1,x_2)=(a_1,a_2)}(x_i-a_i)^2
+\left[\frac{\partial^2 f(x_1,x_2)}{\partial x_1\partial x_2}\right]_{|(x_1,x_2)=(a_1,a_2)}(x_1-a_1)(x_2-a_2).$$

A.6 Lagrange Multipliers

The method of Lagrange multipliers provides a strategy for finding stationary points $\mathbf x^*$ of a differentiable function $f(\mathbf x)$ subject to the constraint $\mathbf g(\mathbf x)=\mathbf c$, where $\mathbf x=(x_1,x_2,\ldots,x_p)'$, where $\mathbf g(\mathbf x)=[g_1(\mathbf x),g_2(\mathbf x),\ldots,g_m(\mathbf x)]'$ is a set of $m\ (<p)$ constraining functions, and where $\mathbf c=(c_1,c_2,\ldots,c_m)'$ is a vector of known constants. The stationary points $\mathbf x^*=(x_1^*,x_2^*,\ldots,x_p^*)'$ can be (local) maxima, (local) minima, or saddle points. The Lagrange multiplier method involves consideration of the Lagrange function

$$\Lambda(\mathbf x,\boldsymbol\lambda)=f(\mathbf x)-[\mathbf g(\mathbf x)-\mathbf c]'\boldsymbol\lambda,$$

where $\boldsymbol\lambda=(\lambda_1,\lambda_2,\ldots,\lambda_m)'$ is a vector of scalars called "Lagrange multipliers." In particular, the stationary points $\mathbf x^*$ are obtained as the solutions for $\mathbf x$ using the $(p+m)$ equations

$$\frac{\partial\Lambda(\mathbf x,\boldsymbol\lambda)}{\partial\mathbf x}=\frac{\partial f(\mathbf x)}{\partial\mathbf x}-\left\{\frac{\partial[\mathbf g(\mathbf x)-\mathbf c]'}{\partial\mathbf x}\right\}\boldsymbol\lambda=\mathbf 0
\quad\text{and}\quad
\frac{\partial\Lambda(\mathbf x,\boldsymbol\lambda)}{\partial\boldsymbol\lambda}=-[\mathbf g(\mathbf x)-\mathbf c]=\mathbf 0,$$

where $\partial f(\mathbf x)/\partial\mathbf x$ is a $(p\times 1)$ column vector with $i$th element equal to $\partial f(\mathbf x)/\partial x_i$, $i=1,2,\ldots,p$, where $\partial[\mathbf g(\mathbf x)-\mathbf c]'/\partial\mathbf x$ is a $(p\times m)$ matrix with $(i,j)$th element equal to $\partial g_j(\mathbf x)/\partial x_i$, $i=1,2,\ldots,p$ and $j=1,2,\ldots,m$, and where $\mathbf 0$ denotes a column vector of zeros.

Note that the second matrix equation gives $\mathbf g(\mathbf x)=\mathbf c$. As an example, consider the problem of finding the stationary points $(x^*,y^*)$ of the function $f(x,y)=(x^2+y^2)$ subject to the constraint $g(x,y)=g_1(x,y)=(x+y)=1$. Here, $p=2$, $m=1$, and the Lagrange multiplier function is given by

$$\Lambda(x,y,\lambda)=(x^2+y^2)-\lambda(x+y-1).$$

The stationary points $(x^*,y^*)$ are obtained by solving the system of equations

$$\frac{\partial\Lambda(x,y,\lambda)}{\partial x}=2x-\lambda=0,\qquad
\frac{\partial\Lambda(x,y,\lambda)}{\partial y}=2y-\lambda=0,\qquad
\frac{\partial\Lambda(x,y,\lambda)}{\partial\lambda}=x+y-1=0.$$

Solving these three equations yields the solution $x^*=y^*=1/2$. Since

$$\frac{\partial^2\Lambda(x,y,\lambda)}{\partial x^2}=\frac{\partial^2\Lambda(x,y,\lambda)}{\partial y^2}>0
\quad\text{and}\quad
\frac{\partial^2\Lambda(x,y,\lambda)}{\partial x\,\partial y}=0,$$

this solution yields a minimum subject to the constraint $x+y=1$.
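The worked example can also be checked numerically; the sketch below (assuming SciPy is available) minimizes f(x, y) = x^2 + y^2 subject to x + y = 1 and recovers the stationary point (1/2, 1/2):

```python
from scipy.optimize import minimize

res = minimize(
    lambda v: v[0] ** 2 + v[1] ** 2,  # f(x, y)
    x0=[0.0, 1.0],
    constraints={"type": "eq", "fun": lambda v: v[0] + v[1] - 1},  # g(x, y) = 1
)
print(res.x)  # approximately [0.5, 0.5]
```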
