
International Statistical Review (2008), 76, 1, 140–156 doi:10.1111/j.1751-5823.2007.00039.x

Short Book Reviews
Editor: Simo Puntanen

A Statistical Approach to Neural Networks for Pattern Recognition
Robert A. Dunne
Wiley, 2007, xvii + 268 pages, £ 47.50 / € 67.70, hardcover
ISBN: 978-0-471-74108-4

Table of contents

1. Introduction
2. The multi-layer perceptron model
3. Linear discriminant analysis
4. Activation and penalty functions
5. Model fitting and evaluation
6. The task-based MLP
7. Incorporating spatial information into an MLP classifier
8. Influence curves for the multi-layer perceptron classifier
9. The sensitivity curves of the MLP classifier
10. A robust fitting procedure for MLP models
11. Smoothed weights
12. Translation invariance
13. Fixed-slope training
A. Function minimization
B. Maximum values of the influence curve

Readership: Students and professionals in mathematics, statistics, computer science, and electrical engineering.

In the preface, Robert Dunne describes neural networks or multilayer perceptrons as statistical concepts under a different terminology. I think this is exactly right. Although, in their early days, emphasis was on the network representation, various statisticians rapidly made it clear that this was just another way of looking at what were effectively rather complicated nonlinear statistical models. The statistical input also led to a deeper understanding of issues such as overfitting, which bedevilled early presentations by neural network enthusiasts. (I recall attending more than one in which ‘100% predictive accuracy’ was reported with enthusiasm.)

This book arose from the author’s efforts in recasting descriptions of neural networks in statistical terms. The first five chapters describe neural network models and relate them to earlier statistical models. Chapters 6 and 7 describe how to apply them to problems with large numbers of classes and some image problems; these are the sorts of problems most often encountered by the author. Chapters 8 through 10 explore the robustness of the models. This is always an important issue for highly flexible modeling methods. The last three chapters describe further extensions to the basic approach.

The book provides an excellent introduction to neural networks from a statistical perspective. It would make ideal reading for a graduate student or researcher about to enter the area, or someone who wished to have a very sound grasp of this class of models in order to apply them effectively.

David J. Hand: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK

© 2008 The Author. Journal compilation © 2008 International Statistical Institute. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.


Data Clustering: Theory, Algorithms, and Applications
Guojun Gan, Chaoqun Ma, Jianhong Wu
SIAM, 2007, xxii + 466 pages, £ 60.00 / US$ 114, softcover
ISBN: 978-0-898716-23-8

Table of contents

Part I. Clustering, Data and Similarity Measures
1. Data clustering
2. Data types
3. Scale conversion
4. Data standardization and transformation
5. Data visualization
6. Similarity and dissimilarity measures
Part II. Clustering Algorithms
7. Hierarchical clustering techniques
8. Fuzzy clustering algorithms
9. Center-based clustering algorithms
10. Search-based clustering algorithms
11. Graph-based clustering algorithms
12. Grid-based clustering algorithms
13. Density-based clustering algorithms
14. Model-based clustering algorithms
15. Subspace clustering
16. Miscellaneous algorithms
17. Evaluation of clustering algorithms
Part III. Applications of Clustering
18. Clustering gene expression data
Part IV. MATLAB and C++ for Clustering
19. Data clustering in MATLAB
20. Clustering in C/C++
A. Some clustering algorithms
B. The kd-tree data structure
C. MATLAB codes
D. C++ codes

Readership: Applied statisticians, engineers, scientists; researchers in pattern recognition, artificial intelligence, machine learning and data mining; applied mathematicians. Suitable for a graduate-level introduction.

Cluster analysis seeks to decompose a population of objects into homogeneous groups. Many different clustering algorithms have been developed. This is partly a consequence of the different areas of data analysis which have looked at the problem (statistics, data mining, machine learning, pattern recognition, etc.) and partly because of the large number of application areas which make use of such methods. The amount of work in the area is probably now such that a comprehensive review would be impossible. Perhaps for this reason, the authors of this book claim to concentrate on a small number of important algorithms. However, since this ‘small number’ does include divisive and agglomerative hierarchical methods, fuzzy methods, centre-based methods such as k-means and k-modes, density-based methods, graph-based methods, grid-based methods, and model-based methods, I would imagine it would be comprehensive enough to satisfy all but the most exacting of standards.
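Centre-based clustering is easy to convey in a few lines; the following is a minimal sketch of Lloyd's k-means algorithm (my illustration, not code from the book), assuming numeric data held in a NumPy array:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # initialise centroids with k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its cluster
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centroids = kmeans(X, k=2)
```

The k-modes variant mentioned above follows the same alternating scheme, replacing means with modes and Euclidean distance with a simple matching dissimilarity for categorical data.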

The book also has excellent discussions of the basic elements which feed into cluster analysis tools, covering such things as data types, scale conversion, standardisation, visualisation, and distance and dissimilarity measures.

The book is slightly marred by the fact that the Author Index is faulty (I checked a dozen names, and none of the cited pages I checked were correct). I suspect that the page numbers either referred to an earlier version, or that a software glitch occurred. In either case, the publishers should have detected it. The quality of the book is such that I would expect it to be reprinted, or to appear in new editions, so I encourage the publishers to correct this problem. Fortunately, the Subject Index seems to be accurate.

Apart from this, the book provides excellent coverage of the area, and I thoroughly recommend it. I have already ordered another copy for a student of mine who is beginning to work in clustering.

David J. Hand: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK


Understanding Complex Datasets: Data Mining with Matrix Decompositions
David B. Skillicorn
Chapman & Hall/CRC Press, 2007, xxi + 236 pages, £ 39.99 / US$ 69.95, hardcover
ISBN: 978-1-584-88832-1

Table of contents

1. Data mining
2. Matrix decompositions
3. Singular value decomposition (SVD)
4. Graph analysis
5. Semidiscrete decomposition (SDD)
6. Using SVD and SDD together
7. Independent component analysis (ICA)
8. Non-negative matrix factorization (NNMF)
9. Tensors
10. Conclusion
Appendix: Matlab scripts

Readership: Researchers who want to model complex data sets, researchers in computing, graduate or advanced undergraduate students in data mining.

This is a rather idiosyncratic but in some ways rather elegant book. It would make excellent and eye-opening reading for a newcomer to research in data mining methods, and would also provide stimulating and thought-provoking reading for PhD students working in multivariate statistics.

The book draws a distinction between business data sets (characterised, the author claims, by the definitions and meanings being relatively clear) and scientific data sets (where each measurement is often a combination of multiple factors or components). Then ‘matrix decompositions use the relationships among large amounts of data and the probable relationships between the components’ to separate the components. The author decomposes the raw n by p data matrix in various ways, and shows that the various decompositions are related. By ‘matrix decomposition’, the author means ‘a way of expressing a dataset matrix . . . as the product of a new set of matrices’. Singular value decomposition and independent components analysis are examples of such decompositions.
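To fix ideas, the kind of decomposition the author means can be sketched with NumPy's SVD (an illustrative sketch of mine, not one of the book's Matlab scripts): the dataset matrix factors exactly into a product of matrices, and truncating the factors yields a low-rank summary of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 5))          # an n-by-p "dataset matrix"

# SVD: A = U diag(s) V^T -- the product of a new set of matrices
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# keep only the k largest singular values for a rank-k approximation
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

The rank-k matrix A_k is the best approximation to A (in least-squares sense) among all matrices of that rank, which is what makes such decompositions useful summaries.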

The author illustrates the ideas with a very wide range of modern data mining applications, including the PageRank algorithm used by Google, walks through graphs, recommender systems, microarray analysis, detecting spam emails, content analysis, citation analysis, detecting terrorist cliques, mineral exploration, and others.

David J. Hand: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK

The Statistical Analysis of Recurrent Events
Richard J. Cook, Jerald F. Lawless
Springer, 2007, xx + 403 pages, € 69.95, hardcover
ISBN: 978-0-387-69809-0

Table of contents

1. Introduction
2. Models and frameworks for analysis of recurrent events
3. Methods based on counts and rate functions
4. Analysis of gap times
5. General intensity-based models
6. Multi-type recurrent events
7. Observation schemes giving incomplete or selective data
8. Other topics
A. Estimation and statistical inference
B. Computational methods
C. Code and remarks on selected examples
D. Datasets


Readership: Graduate students, researchers, and applied statisticians working in industry, government, or academia.

This book describes models and tools for analysing processes in which events occur repeatedly over time. Such processes are ubiquitous, arising in medicine (attacks of some recurrent illness, seizures, relapses, etc.), breakdowns followed by repair of systems, insurance claims, and a huge number of other areas.

The book distinguishes between situations involving few processes, each generating many events, and systems involving a large number of processes, and concentrates mainly on the latter. It covers methods based on counts and rate functions, gap times, intensity models, multitype recurrent events, marked processes, and others. It goes beyond merely presenting models for straightforward idealised situations, and also describes issues such as modelling with incomplete data and selectivity bias. While, of course, there are doubtless special topics that it does not cover, it is remarkably comprehensive, covering such topics as tests for multiplicative covariate effects and time-varying covariates. It illustrates the ideas and methods with a wide range of examples from many different areas. An appendix gives information on software.
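The simplest "counts and rate functions" setting can be conveyed in a few lines (a toy sketch of mine, not the authors' methods): for a homogeneous Poisson event process the count over a follow-up period is Poisson, the gap times between events are exponential, and the common rate is estimated by total events over total follow-up time.

```python
import numpy as np

rng = np.random.default_rng(42)
m, tau, rate = 200, 10.0, 1.5      # processes, follow-up time, true event rate

# for a homogeneous Poisson process, the count over [0, tau] is Poisson(rate * tau)
counts = rng.poisson(rate * tau, size=m)

# moment estimator of the rate: total events / total follow-up time
rate_hat = counts.sum() / (m * tau)

# gap times between events are exponential with mean 1/rate
gaps = rng.exponential(1 / rate, size=10_000)
mean_gap = gaps.mean()
```

The "many processes, few events each" situation the book concentrates on corresponds to large m with modest counts per process, where pooling across processes is what makes the rate estimable.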

I imagine that this book will justifiably become the main source of reference for researchers working on or applying such methods.

David J. Hand: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK

Random Dynamical Systems: Theory and Applications
Rabi Bhattacharya, Mukul Majumdar
Cambridge University Press, 2007, xv + 463 pages, US$ 39.99, softcover
ISBN: 978-0-521-53272-3

Table of contents

1. Dynamical systems
2. Markov processes
3. Random dynamical systems
4. Random dynamical systems: special structures
5. Invariant distributions: estimation and computation
6. Discounted dynamic programming under uncertainty
A. Appendix

Readership: Graduate students or advanced undergraduates of stochastic processes and dynamical systems.

The book is concerned with discrete-time dynamic processes developing over an infinite time horizon. The state of such a system at a given time is determined from its state at the previous time by means of a stationary rule. Even with deterministic rules, chaotic behaviour can result. When random perturbations are added to the mix, random dynamical systems result. Of key concern with such systems is their long-term stability. Chapter 1, covering over 100 pages, provides an accessible introduction to deterministic systems. The authors have used this chapter alone as a course in the subject. Chapter 2, another 100-page chapter, provides an excellent introduction to Markov processes. This also would form the basis for a complete course in this material. Chapter 3 then extends the material to random dynamical systems, with Chapter 4 looking at some special structures. Chapters 5 and 6 round things off by looking at estimation and computation of invariant distributions, and discounted dynamic programming under uncertainty.
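The basic object is easy to demonstrate (an illustrative sketch of mine, not an example from the book): iterate a stationary rule x_{t+1} = f(x_t), here the logistic map in its chaotic regime, and add a random perturbation at each step to turn the deterministic system into a random dynamical one.

```python
import numpy as np

def iterate(f, x0, n, noise_sd=0.0, seed=0):
    """Iterate x_{t+1} = f(x_t) + eps_t with eps_t ~ N(0, noise_sd^2)."""
    rng = np.random.default_rng(seed)
    xs = [x0]
    for _ in range(n):
        x = f(xs[-1]) + rng.normal(0.0, noise_sd)
        xs.append(min(max(x, 0.0), 1.0))   # keep the state in [0, 1]
    return np.array(xs)

logistic = lambda x: 3.9 * x * (1.0 - x)   # chaotic regime of the logistic map

det  = iterate(logistic, 0.2, 500)                 # deterministic orbit
rand = iterate(logistic, 0.2, 500, noise_sd=0.01)  # randomly perturbed orbit
```

The long-term stability question the book addresses concerns the distribution of such perturbed orbits: whether, and how fast, they settle into an invariant distribution regardless of the starting state.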


This is a beautiful book. Although mostly oriented towards economic applications, there are sufficient references to and illustrations of other application scenarios to make this of interest to a far wider readership than merely econometricians. Engineers, statisticians, and physicists would find the material of interest and value.

The Nobel Prize-winning economist Paul Samuelson has been quoted in a blurb on the back of the book, saying ‘as I turn the pages of this new book, I come to realize how lucky graduate students of today are.’ I think he is right.

David J. Hand: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK

Evaluating Clinical Research: All That Glitters is not Gold, 2nd Edition
Bengt D. Furberg, Curt D. Furberg
Springer, 2007, v + 165 pages, € 24.95 / US$ 29.95, softcover
ISBN: 978-0-387-72898-8

Table of contents

1. What is the purpose of this book?
2. Why is benefit-to-harm balance essential to treatment decisions?
3. What are the strengths of randomized controlled clinical trials?
4. What are the weaknesses of randomized controlled clinical trials?
5. Do meta-analyses provide the ultimate truth?
6. What are the strengths of observational studies?
7. What are the weaknesses of observational studies?
8. Were the scientific questions specified in advance?
9. Were the treatment groups comparable initially?
10. Why is blinding/masking so important?
11. How is symptomatic improvement measured?
12. Is it really possible to assess quality of life?
13. What is the value of biologic markers in drug evaluation?
14. How are adverse drug reactions measured?
15. How representative are study subjects in clinical trials?
16. What happened to the study subjects who disappeared from the analysis?
17. How reliable are active-control trials?
18. How informative are composite outcomes?
19. Do changes in biologic markers predict clinical benefit?
20. How trustworthy are the authors?
21. Does publication in a reputable scientific journal guarantee quality?
22. Is it necessary to be a biostatistician to interpret scientific data?
23. Are all drugs of a class interchangeable?
24. How much confidence can be placed on economic analysis?
25. How should I handle the massive flow of information?
26. How well is research translated into clinical care?
Appendix A: Glossary
Appendix B: Explanations for checklist questions

Readership: Health care professionals, pharmaceutical company employees, and others who wish to be able to understand the strengths and weaknesses of clinical studies.

This has the potential to be a really valuable book, but in my opinion this version, an updated and expanded second edition of a book first published in 1994, does not achieve its potential.

Its aim is to enable a non-statistical reader to distinguish sound methodology from weak methodology in clinical trials. This is a highly laudable and valuable aim. Moreover, the book has the merit of containing no mathematics or statistics – necessary if it is to appeal to the intended audience. It also has the merit of being short – again necessary if it is to be read by busy health care workers. Finally, it has a heavy sprinkling of cartoons, which certainly appeals to me.


However, given the wide range of issues it seeks to cover, its brevity has led to a superficiality which I think means it often runs the risk of being misleading. It would have been improved by more material relating the different topics, and setting them each in the context of the others. Elaboration in this way might have helped avoid such oversimplifications as (p. 39) ‘A clinical trial determines prospectively whether [a] hypothesis is true. . . .’ Or (p. 112) ‘For example, a 95% [confidence interval] provides information about the upper and lower boundaries of the observed treatment difference. . . .’ I think that to understand what the authors mean in such cases one would already have to understand the ideas.
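The confidence-interval quotation illustrates the problem: a 95% interval bounds plausible values of the unknown true treatment difference, not ‘the observed treatment difference’, which by construction sits exactly at the interval’s centre. A minimal sketch (my example, using simulated data and a normal approximation):

```python
import numpy as np

rng = np.random.default_rng(0)
treat   = rng.normal(1.0, 2.0, size=120)   # simulated outcomes, treatment arm
control = rng.normal(0.0, 2.0, size=120)   # simulated outcomes, control arm

diff = treat.mean() - control.mean()       # observed treatment difference
se = np.sqrt(treat.var(ddof=1) / len(treat) + control.var(ddof=1) / len(control))

# the 95% CI is a range of plausible values for the TRUE difference;
# the observed difference always lies at its centre, never at its boundaries
lo, hi = diff - 1.96 * se, diff + 1.96 * se
```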

Pages 13–14 refer to the Hawthorne effect, describing the original study in the Western Electric Company factory in Chicago where it was carried out. The authors remark that ‘the special attention given to the workers who participated in the study explains the improvement in overall performance.’ That is certainly the popular myth. The truth, however, is that, in this particular study, the improvement in performance probably owed more to the changing management style as the study progressed, as well as to the replacement of two of the workers who seemed to be working least hard halfway through the study. One of the replacements (who had recently become her family’s sole breadwinner) was very highly motivated, and since rewards depended on overall group output she was also motivated to encourage the others to work hard. The original Hawthorne effect study is a nice case study of trial weaknesses of the kinds that the authors describe elsewhere in their book.

As I said, I think the aim of this book is praiseworthy, and that it could achieve that aim, albeit at the cost of slightly extending its length. And if and when the authors do produce such a revision, they might take note of the blurb on the back of the book, which says ‘the authors have acquired much of their knowledge about clinical studies through the “trial and error” method.’ I am afraid that does not fill me with confidence, since it inevitably invites the question about what errors remain. Perhaps that sentence should be removed from the blurb in the next edition.

David J. Hand: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK

Statistical Genetics of Quantitative Traits: Linkage, Maps and QTL
Rongling Wu, Chang-Xing Ma, George Casella
Springer, 2007, xvi + 365 pages, € 69.95 / £ 54.00, hardcover
ISBN: 978-0-387-20334-8

Table of contents

1. Basic genetics
2. Basic statistics
3. Linkage analysis and map construction
4. A general model for linkage analysis in controlled crosses
5. Linkage analysis with recombinant inbred lines
6. Linkage analysis for distorted and misclassified markers
7. Special considerations in linkage analysis
8. Marker analysis of phenotypes
9. The structure of QTL mapping
10. Interval mapping with regression analysis
11. Interval mapping by maximum likelihood approach
12. Threshold and precision analysis
13. Composite QTL mapping
14. QTL mapping in outbred pedigrees
Appendix A: General statistical results and algorithms
Appendix B: R programs

Readership: Researchers in Statistical Genetics.


The authors note that statistical methods for QTL (quantitative trait loci) mapping are scattered around a huge volume of literature. Their purpose here is to give a coherent, accessible account spanning the gap between genetics and statistics. Thus they aim to generate interest and encourage joint research between the disciplines.

There are 14 chapters that fall naturally into three parts. The first part (chapters 1 and 2) contains introductory accounts of genetics and statistics. The latter just sets out a bare minimum of what is needed here, though there is some additional material in Appendix A. In part 2 (chapters 3 to 7) detailed accounts of various aspects of linkage analysis are given. The third part (chapters 8 to 14) covers statistical models and computational algorithms for QTL mapping: in this part the power of modern statistical methodology is well exemplified.

Each chapter ends with some exercises, Appendix B lists R code for some of the examples used in the text, and a web site is given where both R and Matlab code can be found for all the examples.

My impression is that this is an ideal book for a young researcher looking for an exciting and developing field to get into.

M.J. Crowder: [email protected]
530, Mathematics Department, Imperial College London

Huxley Building, 180 Queen’s Gate, London SW7 2BZ, UK

Multiscale Modeling: A Bayesian Perspective
Marco A.R. Ferreira, Herbert K.H. Lee
Springer, 2007, xii + 245 pages, € 62.95 / US$ 79.95, hardcover
ISBN: 978-0-387-70897-3

Table of contents

1. Introduction
2. Models for spatial data
3. Illustrative example
4. Convolution methods
5. Wavelet methods
6. Overview on explicit multiscale models
7. Gaussian multiscale models on trees
8. Hidden Markov models on trees
9. Mass balanced multiscale models on trees
10. Multiscale random fields
11. Multiscale time series
12. Change of support models
13. Implicit computationally linked model overview
14. Metropolis-coupled methods
15. Genetic algorithms
16. Soil permeability estimation
17. Single photon emission computed tomography example
18. Conclusions

Readership: Students and practitioners of multiscale modeling and analysis by Bayesian methods.

Both the natural and human worlds abound in phenomena or processes involving multiple scales. For example, observations may be made weekly, monthly or annually. Alternatively, the problem could be one of automated recognition of a place, face or letter, in which the scales would range from the coarsest, the landmarks, to the finest, the small details. This is a wonderfully written review of what is known about multiscale modelling and associated Bayesian inference. The modelling can be viewed as part of the Bayesian analysis. A substantial part of this work grew out of a large NSF-funded interdisciplinary project in which the authors of this monograph participated.


The authors distinguish between three very broad categories. The first category is where the object under study is quite complex, e.g., a society or organization or the flow of a river, and must be studied at different levels to understand the main sources of variation at different scales. Typically these will be different at different levels, but sometimes there will be complex interrelationships. A second category is where the observations are taken at the finest levels and then aggregated at coarser levels to smooth out the finer features and provide a clear picture of the most significant features at different levels. The third category of problems is one where the scales arise somewhat less naturally, as part of computation. For example, this is done in likelihood or Bayesian analysis if the likelihood is multimodal. Coarser scales are introduced to smooth out the uninteresting, small modes.
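The second category is the easiest to sketch (an illustration of mine, not one of the authors' models): repeatedly averaging adjacent pairs of fine-scale observations, as in a Haar-wavelet coarsening, produces a ladder of increasingly smooth coarse-scale series.

```python
import numpy as np

def coarsen(x):
    """One level of dyadic aggregation: average adjacent pairs of observations."""
    return x.reshape(-1, 2).mean(axis=1)

# a noisy fine-scale signal of dyadic length
rng = np.random.default_rng(0)
fine = np.sin(np.linspace(0, 2 * np.pi, 64)) + rng.normal(0, 0.3, 64)

# build the full ladder of scales: 64 -> 32 -> 16 -> ... -> 1
levels = [fine]
while len(levels[-1]) > 1:
    levels.append(coarsen(levels[-1]))
```

Because each coarse value is the mean of the fine values it covers, the overall mean is preserved exactly at every level, the "mass balance" property that some of the tree models in the book enforce.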

Professors Ferreira and Lee discuss in detail what is to be done for each category of problems. The models are very clearly described and discussed with a lot of insight. The computational details are also discussed well.

The book is very well written, but this complex subject deserves a sequel which provides further details on a few well-chosen methods, and a few simulated and real-life examples to illustrate the application of those methods. The examples that are partially worked out in the book are good, but they are not treated at the kind of depth that is needed to get hands-on experience. That is what a sequel could do: ideally, there would be a good discussion of the priors chosen and what is learnt from the posterior (and its significance), as well as open questions and possibly ambiguities and uncertainties that almost always arise in real-life analysis.

Jayanta K. Ghosh: [email protected]
Department of Statistics, Purdue University

West Lafayette, IN 47909, USA

Introduction to Bayesian Statistics
Karl-Rudolf Koch
Springer, 2007, xii + 249 pages, € 89.95 / US$ 119.00, hardcover
ISBN: 978-3-540-72723-1

Table of contents

1. Introduction
2. Probability
3. Parameter estimation, confidence regions and hypothesis testing
4. Linear model
5. Special models and applications
6. Numerical methods

Readership: Students and practitioners applying Bayesian statistics, especially to geophysical and geodetic problems.

Statistics has an old historical connection with Geodesy that goes back to the eighteenth century. Though not specifically written for geodesists or engineers, this is a well-written introduction to Bayesian Analysis that contains many applications to Geodesy and Engineering at the cutting edge of these topics. The book begins from basics but takes the reader all the way to Hierarchical Bayesian Analysis, based on MCMC and other Monte Carlo techniques for calculating posteriors.
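The Monte Carlo machinery the book builds up to can be conveyed by its simplest instance, a random-walk Metropolis sampler for a one-parameter posterior (a generic sketch of mine under a normal model with a flat prior, not one of Professor Koch's geodetic applications):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=50)     # observations with known unit variance

def log_post(mu):
    """Log posterior for mu under a flat prior: just the normal log likelihood."""
    return -0.5 * np.sum((data - mu) ** 2)

mu, chain = 0.0, []
for _ in range(5000):
    prop = mu + rng.normal(0.0, 0.5)                 # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop                                    # accept; otherwise keep mu
    chain.append(mu)

posterior_mean = np.mean(chain[1000:])               # discard burn-in
```

With a flat prior the posterior mean coincides with the sample mean, which gives a simple check that the chain has converged to the right target.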

There is a good treatment of Bayesian Analysis of Linear Models, which is arguably one of the most important topics in applications, and many extremely interesting special applications that include Variance Components, Tomography, Image Reconstruction, Classification, and of course Geodesy. Each of these applications is sufficiently complex that only a very condensed overview is possible, but many examples are drawn from Professor Koch’s own work and that of his colleagues. Professor Koch has done a very difficult job quite well. The references are very interesting in that they introduce us to a world of sophisticated applications of Bayesian Analysis by a group of scientists of whose work many of us in the statistical profession may not be aware.

The strength of the book lies in its coverage, careful mathematics and many contemporary applications. But one tends to miss the sort of intuitive discussion of methods and ideas, generally and as applied to different examples, which gives Statistics in general, and Bayesian Analysis in particular, its depth and appeal to scientists. I guess such an extensive coverage starting from scratch involves both a quick pace and some neglect of the subtleties, e.g., in the choice of priors in the Components of Variance problems, where there have been relatively recent discussions by Berger and Deely (JASA, 1988) and Gelman (Bayesian Analysis, 2006). A student or reader would do well to supplement this book with the one by Gelman, Carlin, Stern and Rubin (Bayesian Data Analysis, Second Edition, Chapman & Hall/CRC, 2003), which is certainly in Professor Koch’s extensive bibliography.

Jayanta K. Ghosh: [email protected]
Department of Statistics, Purdue University

West Lafayette, IN 47909, USA

Statistical Thinking in Sports
Jim Albert, Ruud H. Koning (Editors)
Chapman and Hall/CRC, 2008, xii + 298 pages, US$ 49.95 / £ 29.99, hardcover
ISBN: 978-1-58488-868-0

Table of contents

1. Introduction
2. Modelling the development of world records in running
3. The physics and evolution of Olympic winning performances
4. Competitive balance in national European soccer competitions
5. Statistical analysis of the effectiveness of the FIFA World Rankings
6. Forecasting scores and results and testing the efficiency of the fixed-odds betting market in Scottish league football
7. Hitting in the pinch
8. Does momentum exist in a baseball game?
9. Inference about batter-pitcher matchups in baseball from small samples
10. Outcome uncertainty measures: how closely do they predict a close game?
11. The impact of post-season play-off systems on the attendance at regular season games
12. Measurement and interpretation of home advantage
13. Myths in Tennis
14. Back to back evaluations on the gridiron
15. Optimal drafting in hockey pools

Readership: Sports enthusiasts with some background in statistics, whether students, teachers, researchers, practitioners or sports policy makers.

Publications on statistical thinking in sports have been around for several decades now, and this book is an international collection of current research in statistics and sports, written by authors from a variety of areas. The book is full of interesting and useful examples to use when teaching statistics. The chapters contain a summary/conclusion at the end and a wealth of references for the reader to pursue. It is not a textbook with questions and answers to be used in a classroom, and so educators will have to use the material in the chapters to provide their own questions to assess learning.

The book has a website, http://www.statistical-thinking-in-sports.com, which has additional material in the form of appendices, references, tables and other data. The authors request that, if you use the information from the website in a paper or a project, you send it to them so they can include it on the website.

This is a book that is intended for sports enthusiasts who have a background in statistics. The chapters are written so that readers can follow the text without having in-depth knowledge of the particular sport; however, a background in statistics is essential to being able to follow the text.

Susan Starkings: [email protected]
London South Bank University

103 Borough Road, London SE1 0AA, UK

Correspondence Analysis in Practice, Second Edition

Michael Greenacre
Chapman and Hall/CRC, 2007, xiii + 280 pages, £ 39.99 / US$ 79.95, hardcover
ISBN: 978-1-58488-616-7

Table of contents

1. Scatterplots and maps
2. Profiles and the profile space
3. Masses and centroids
4. Chi-square distance and inertia
5. Plotting chi-square distances
6. Reduction of dimensionality
7. Optimal scaling
8. Symmetry of row and column analyses
9. Two-dimensional maps
10. Three more examples
11. Contributions to inertia
12. Supplementary points
13. Correspondence analysis biplots
14. Transition and regression relationships
15. Clustering rows and columns
16. Multiway tables
17. Stacked tables
18. Multiple correspondence analysis
19. Joint correspondence analysis
20. Scaling properties of MCA
21. Subset correspondence analysis
22. Analysis of square tables
23. Data recoding
24. Canonical correspondence analysis
25. Aspects of stability and inference
App. A: Theory of correspondence analysis
App. B: Computation of correspondence analysis
App. C: Bibliography of correspondence analysis
App. D: Glossary of terms
App. E: Epilogue

Readership: Everyone interested in visualization of categorical data.

This is a brilliant book written by an experienced writer. It is the second edition, quite thoroughly rewritten and reorganized. The preface includes a nice comparison of the correspondences between the first and second editions, expressed by means of the method described in the book, namely, correspondence analysis.

The book is somewhat unusual in many respects. For example, each chapter is exactly eight pages in length! There is a clear didactic idea behind this, and it has strongly guided the writing process. The marginal notes also play a big role, as they are used to summarize each chapter at the end. In addition, there is a large number of tables and graphs with very informative captions. Indeed, this is a smooth book to read.


There are 25 chapters of the same length, summing to exactly 200 pages. They cover a broad range of topics, beginning with very simple scatterplots, proceeding through the fundamentals of correspondence analysis and its variations, and finally ending with questions of special interest such as aspects of stability and inference.

And the book does not end there. There are over 70 extra pages, organized as five appendices. The first two give valuable information on the theoretical grounds of the method and on computational issues, with implementations in R and Excel. The more demanding mathematics is confined here, so the chapters themselves are not mathematically demanding. The computational details do not spoil the text either, as they are all given here, in a 46-page appendix. Altogether, this makes the book very easy to read.

Correspondence analysis has an extremely interesting history. An annotated bibliography as well as a glossary of terms shed light on the sometimes complicated phases of development that have taken place in several application areas. As this development still goes on, and the author of the book is one of the experts in the field, the epilogue is quite stimulating to read. It elaborates certain aspects of the method that are frequently discussed and adds some personal thoughts of the author. In my opinion, this kind of insight is something that practically every book could have.

I would truly recommend this book for everyone who is interested in analysing and visualizing categorical data. Especially those interested in correspondence analysis techniques will surely find lots of use for this book.

Kimmo Vehkalahti: [email protected]
Department of Mathematics and Statistics
FI-00014 University of Helsinki, Finland

Nonparametric Statistics with Applications to Science and Engineering

Paul H. Kvam, Brani Vidakovic
Wiley, 2007, 420 pages, £ 57.95 / € 77.00, hardcover
ISBN: 978-0-470-08147-1

Table of contents

1. Introduction
2. Probability basics
3. Statistics basics
4. Bayesian statistics
5. Order statistics
6. Goodness of fit
7. Rank tests
8. Designed experiments
9. Categorical data
10. Estimating distribution functions
11. Density estimation
12. Beyond linear regression
13. Curve fitting techniques
14. Wavelets
15. Bootstrap
16. EM algorithm
17. Statistical learning
18. Nonparametric Bayes
A. MATLAB
B. WinBUGS

Readership: Undergraduate and graduate students in applied mathematics, statistics, computer science, and engineering; also researchers and practitioners working with traditional and modern nonparametric techniques.

If a parametric model is used to describe the observed data, a fixed number of parameters determines the distribution completely. If this is not realistic, robust inference techniques can be used. Another possibility is to assume a nonparametric model and then use corresponding nonparametric procedures to analyse the data. In classical regression analysis, for example, either the error distribution or the regression function may be assumed to be nonparametric.

The book presents a practical approach to nonparametric statistical analysis. First, some background material on fundamental concepts of probability theory and statistical inference is provided. The presentation then continues with classical nonparametric procedures (order statistics, goodness-of-fit tests, sign and rank tests under different designs) but discusses, later in the book, modern nonparametric density estimation, curve fitting, and statistical learning techniques as well. In fact, the book covers a huge amount of material for practical data analysis, even for the analysis of categorical or survival data. Even Bayesian inference is discussed. As computational tools, descriptions of bootstrap techniques and the EM algorithm are included. The book is integrated with the MATLAB computing language (and WinBUGS for Bayesian calculations).

The book is clearly written and well organised. I very much liked the photos and historical details of statisticians in the middle of the text. The book certainly works well as a reference text for practitioners and users of modern statistical procedures, and as a textbook for graduate courses in engineering and the physical sciences.

Hannu Oja: [email protected]
Tampere School of Public Health

FI-33014 University of Tampere, Finland

Introduction to Modern Time Series Analysis

Gebhard Kirchgässner, Jürgen Wolters
Springer, 2007, x + 274 pages, € 79.95, hardcover
ISBN: 978-3-540-73290-7

Table of contents

1. Introduction and basics
2. Univariate stationary processes
3. Granger causality
4. Vector autoregressive processes
5. Nonstationary processes
6. Cointegration
7. Autoregressive conditional heteroskedasticity

Readership: Graduate students and researchers in economics who want an introduction to recent developments in time series econometrics and their applications.

The book presents methods that are currently in wide use in time series econometrics. After introducing standard stationary autoregressive moving average models, it discusses topics more typical of time series econometrics. Both univariate and multivariate time series are considered, with special emphasis on testing Granger causality, on the use and interpretation of vector autoregressions, and on methods developed for nonstationary trending time series during the past two or three decades. A chapter on modelling conditional heteroskedasticity, relevant for financial applications, is also included. This chapter is the only one related to nonlinear time series analysis. Other topics not covered include spectral analysis and nonparametric methods in general.

The book is aimed at readers interested in applications. It introduces models and procedures in a fairly nontechnical fashion which mostly excludes details about parameter estimation and statistical inference. To appreciate this kind of exposition and the many empirical examples, basic knowledge of the linear model and maximum likelihood estimation or, more generally, of econometrics and statistics at the level of introductory textbooks is required.


The book can be used as a text for a course in time series econometrics for graduate students in economics or econometrics. For graduate students in statistics it may be used as supplementary material to highlight economic applications of time series analysis.

Pentti Saikkonen: [email protected]
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FI-00014 University of Helsinki, Finland

Applied Nonparametric Statistical Methods, Fourth Edition

Peter Sprent, Nigel C. Smeeton
Chapman & Hall/CRC, 2007, x + 530 pages, £ 34.99 / US$ 79.95, hardcover
ISBN: 978-1-58488-701-0

Table of contents

1. Some basic concepts
2. Fundamentals of nonparametric methods
3. Location inference for single samples
4. Other single-sample inferences
5. Methods for paired samples
6. Methods for two independent samples
7. Basic tests for three or more samples
8. Analysis of structured data
9. Analysis of survival data
10. Correlation and concordance
11. Bivariate linear regression
12. Categorical data
13. Association in categorical data
14. Robust estimation
15. Modern nonparametrics

Readership: Undergraduate and graduate students in statistics and students majoring in other disciplines; also for self-study and as reference material.

The book begins with a brief summary of general statistical concepts and the basic ideas of nonparametric and distribution-free methods. The classical sign- and rank-based inference methods are then presented and discussed for the one-sample, two-sample, and several-sample cases as well as for correlation and (univariate) regression analysis. Analysis tools for categorical data and robust estimation techniques are given as well. The book also expands its coverage of the analysis of survival data and the bootstrap method. In the very last chapter, the new edition also covers some modern developments. Each chapter ends with a list of fields of application and a nice selection of exercises; solutions to selected exercises are given in an appendix. The formal testing procedures are illustrated in a nice way with realistic examples leading to final conclusions, comments, and a discussion of computational aspects.

The book has a clear style with well organised material. It works well as a reference book for users of nonparametric methods in different research areas. It is also a good textbook for undergraduate courses in statistics as well as for courses for students majoring in other disciplines.

Hannu Oja: [email protected]
Tampere School of Public Health

FI-33014 University of Tampere, Finland


Reminiscences of a Statistician: The Company I Kept
Erich L. Lehmann
Springer, 2008, xii + 309 pages, € 34.94 / US$ 44.95, softcover
ISBN: 978-0-387-71596-4

Table of contents

1. Mathematical preparation
2. Becoming a statistician
3. Early collaborators
4. Mathematical statistics at other universities
5. The Annals
6. The Berkeley Statistics Department I: Establishment and first generation
7. The Berkeley Statistics Department II: The second generation
8. The Stanford Statistics Department
9. Nonparametrics and robustness
10. Foundations I: The frequentist approach
11. Foundations II: Bayesianism and data analysis
12. Statistics comes of age
13. New tasks and relationships
14. England
15. Contacts abroad

Readership: For anyone interested in the people and ideas from the academic side of our profession over the last 75 years.

This book is a gem, a must-read. It is a thoroughly enjoyable mixture of autobiographical and broader historical material, presented as mini-biographies of over 60 mathematicians and statisticians connected to the author, grouped by time, place, or a scientific or personal theme. Lehmann is extremely well known in the statistical world for his research contributions, and for his books on testing, estimation, nonparametrics, and introductory statistics, which have been widely used for decades and translated into many languages other than their original English. The wonderful writing style which makes his texts so popular shines through in this one. He suggests early in the book that as a teenager his love was German literature, and that he would perhaps have become a writer had circumstances in his native Germany not dictated otherwise. Instead he became a mathematician, and later a statistician. A reader of this volume will come away with the distinct impression that he realized this early ambition, though perhaps not quite as he originally imagined.

There are so many fascinating pieces in this book that it is probably unwise to single any out, but I cannot resist doing so. The portrait of the German mathematician Edmund Landau is a delight, showing a side to him and recounting his final, sad years in a way that cannot fail to amuse and move the reader. The book contains many anecdotes relating to what we might call career development, implicitly emphasizing the role of chance in life. It contains much of interest on academic research, administration, teaching, the writing of papers and textbooks, collaboration, advising and, most prominently, friendship. It is also full of photos of those who people the book.

Terry Speed: [email protected]
Department of Statistics, 367 Evans Hall #3860

University of California, Berkeley, CA 94720-3860, USA


Statistical Test Theory for the Behavioral Sciences
Dato N. M. de Gruijter, Leo J. Th. van der Kamp
Chapman & Hall/CRC, 2007, xvi + 264 pages, £ 39.99 / US$ 69.95, hardcover
ISBN: 978-1-58488-958-8

Table of contents

1. Measurement and scaling
2. Classical test theory
3. Classical test theory and reliability
4. Estimating reliability
5. Generalizability theory
6. Models for dichotomous items
7. Validity and validation of tests
8. Principal component analysis, factor analysis, and structural equation modeling: A very brief introduction
9. Item response models
10. Applications of item response theory
11. Test equating

Readership: Advanced undergraduate and graduate students in the behavioral sciences.

The book gives an introduction to the area called 'test theory' in the behavioral sciences. Here the word 'test' refers to a psychological test, a standardized procedure administered to a group of persons. The measurements related to the tests lead to test scores, which may then be assessed by statistical procedures.

The aim of the series 'Statistics in the Social and Behavioral Sciences' is to 'capture new developments in statistical methodology with particular relevance to applications' in these fields. Indeed, the book mentions certain newer developments, but the focus is mostly on the older developments of psychometrics: classical test theory, the concept of reliability and its estimation under restricted one-dimensional models, generalizability theory, and item response theory.

Thinking about the readership and the applications, for example in psychology and education, I would have liked more emphasis on measurement. Although the need seems to be clearly stated in the preface, beginning with the words 'What would science be without measurement?', the coverage in the first chapter is short and cursory. It merely repeats the usual classification of nominal, ordinal, interval, and ratio scales, describing Celsius and other temperature scales as examples. Temperatures may serve well in statistics books, but here I would certainly have preferred a more thorough discussion of real-world examples of psychological measurement.

As its title suggests, the book is quite statistically oriented, although written by experts in psychology. It goes through the classical procedures at a rather abstract level, with a style adopted from statistics books: 'Suppose that we obtained a measurement x.' This is followed by a collection of historical formulas, in which the concept of reliability gets a great deal of attention. Validity is discussed later, but the weight is again on the statistical aspects.

Factor analysis is mentioned only very briefly, in connection with principal components and structural equation modeling, which is said to 'play an important role in test theory'. However, the whole chapter consists of only a few pages. Unfortunately, I am almost sure that after reading this chapter students will not have a very clear idea of any of these important methods.

In contrast, the following 65 pages are devoted to item response theory, where a huge amount of statistical apparatus, ranging from maximum likelihood estimation to Markov chain Monte Carlo methods, is developed around simple, usually dichotomous measurements. In their concluding remarks on these issues the authors present a very good and critical discussion. Another is found at the end of the chapter on generalizability theory. One more could have been added on classical test theory as well.

Kimmo Vehkalahti: [email protected]
Department of Mathematics and Statistics
FI-00014 University of Helsinki, Finland

Fifty Years of Human Genetics: A Festschrift and Liber Amicorum to Celebrate the Life and Work of George Robert Fraser
Oliver Mayo and Carolyn Leach (Editors)
Wakefield Press, 2007, xvi + 568 pages, € 62.00 / US$ 93.50, hardcover
ISBN: 978-1-86254753-7

Table of contents

1. Introduction
2. History of human and medical genetics
3. Biochemical and molecular genetics
4. Cancer genetics
5. Childhood sensory defects
6. Clinical genetics
7. Cytogenetics
8. Ethics
9. Statistical and population genetics

Readership: Statisticians interested in genetics.

Statistics and genetics have been intertwined at least since the time of Francis Galton and Karl Pearson, who connected the two when studying heritable human traits before the rediscovery of Gregor Mendel's researches. R.A. Fisher deepened and strengthened the connexion by building on Mendelian inheritance, along the way inventing the analysis of variance and making numerous other contributions to statistics inspired by his efforts to analyse genetic data, human or otherwise. It seems that there is something about the way geneticists and statisticians think which draws them together, even more than the need for statistics in genetics, and the need for applications in statistics, would suggest. This is the first reason why readers of this Review should be interested in Mayo and Leach's "Festschrift and Liber Amicorum". Furthermore, there are more contributions under the heading "Statistical and Population Genetics" than in any other area.

A second reason is the book itself. Over the years I have come to view festschrifts as rather mixed bags, collecting disparate contributions from contributors whose common theme is a professional or personal friendship with the subject, which does not in itself make such a book entirely readable. The present volume is a striking exception, and we owe an enormous debt of gratitude to the editors. It is very clearly a labour of love, and they have succeeded brilliantly. Beautifully presented and superbly edited, the contributions are not only interesting and well-written, they present viewpoints not much heard these days. As friends and associates of a person celebrating his 75th year, the book's contributors come from a time before the Brave New World of genetic technologies: the Human Genome Sequence, HapMaps, and SNP chips. It is great to read their viewpoints, and I encourage you to do so too.

The final reason is the subject of the book, George Robert Fraser. I surmise that hardly any statisticians have heard of him, and they will end up wondering why after reading the tributes paid to him in this book by the many eminent geneticists of whom they will have heard. Suffice it to say that he was graduated in 1953 with the first undergraduate degree in genetics from the University of Cambridge, under R.A. Fisher, that he later graduated in medicine from that university, that he has diplomas in the French, Russian, and Hungarian languages and in Medical Statistics, a written and speaking knowledge of 8 other languages, not counting English, and a medical syndrome and a gene named after him. A remarkable man, and a remarkable book, one that should be of great interest to readers of this Review.

Terry Speed: [email protected]
Department of Statistics, 367 Evans Hall #3860

University of California, Berkeley, CA 94720-3860, USA
