Graybill 2011

The Department of Statistics at Colorado State University

June 22-24, 2011

Fort Collins, Colorado

June 22: SEMIPARAMETRIC REGRESSION: A short-course

June 23-24: Graybill Conference



Foreword

On behalf of the Department of Statistics at Colorado State University, I am delighted to welcome all the participants to the Graybill 2011 Conference on Modern Nonparametric Methods. Our sincere thanks to all the keynote and invited speakers, the instructor for the short course, and the poster presenters for their participation in this conference. We also thank our sponsors, both financial and institutional, for their assistance in making this conference possible. In addition to the main theme of nonparametric methods, this year's Graybill Conference marks the 50th anniversary of the Statistical Laboratory at Colorado State University, an event that will be celebrated during the conference banquet. I wish all participants an instructive, useful and fun stay in Fort Collins!

Jean Opsomer

Department of Statistics

Colorado State University

June 2011


Professor Franklin A. Graybill

Department of Statistics

Colorado State University


The following graduates of the Department of Statistics at Colorado State University completed their degrees under the guidance of Professor Franklin A. Graybill:

Mohamed H. Albohali (MS '79) Robert A. Ahlbrandt (MS '87) Carmen E. Arteaga (MS '80) James H. Baylis (MS '77) David C. Bowden (MS '65, PhD '68) Brent D. Burch (MS '93, PhD '96) James A. Calvin (PhD '85) Terrence L. Connell (MS '63, PhD '66) Ruth Ann Daniel (MS '80) Ali Mashat Deeb (MS '81) Richard M. Engeman (MS '75) Rana S. Fayyad (PhD '95) Mark J. Grassl (MS '80) Rongde Gui (PhD '92) Paul A. Hatab (MS '77) William C. Heiny (MS '81) Sakthivel Jeyaratnam (PhD '78) Dallas E. Johnson (PhD '71) Thomas A. Jones (MS '67) Yongsang Ju (MS '92) Adam Kahn (MS '78) M. Kazem Kazempour (PhD '88)

Albert Kingman (PhD '69) Stephen L. Kozarich (PhD '71) Ricardo A. Leiva (MS '82) Tai-Fang Chen Lu (MS '79, PhD '85) Sandra Mader (MS '77) Farooq Maqsood (MS '84) Louise R. Meiman (MS '67) Ronald R. Miller (MS '76) George A. Milliken (MS '68, PhD '69) Michael E. Mosier (PhD '92) William B. Owen (PhD '65) Antonio Reverter-Gomez (MS '94) Robert C. Rounding (PhD '65) Bhabesh Sen (PhD '88) Jeanne Simpson (MS '78) Syamala Srinivasan (MS '84, PhD '86) R. Kirk Steinhorst (MS '69, PhD '71) Naitee Ting (PhD '87) N. Scott Urquhart (MS '63) Antonia Wang (MS '82) Chih-Ming (Jack) Wang (PhD '78)


Past Conferences:

Graybill Conference 2001 – June 13-15

Inaugural Graybill Conference on Linear, Nonlinear, and Generalized Linear Models Organizers: Hari Iyer & Jim zumBrunnen

Graybill Conference 2003 – June 18-20

Microarrays, Bioinformatics, and Related Topics Organizers: Hari Iyer & Jim zumBrunnen Short course on "Microarray Data Analysis" by Dr. Steen Knudsen

Graybill Conference 2004 – June 16-18

Spatial Statistics: Agricultural, Ecological & Environmental Applications Program Chair: Scott Urquhart Short Course on "Applied Spatial Statistics" by Dr. Jay Ver Hoef

Graybill Conference 2005 – June 1-2

Statistics in Information Technology Program Committee: Bin Yu (Chair), Thomas Lee (Co-Chair), Mark Hansen, and Hari Iyer Short Course on “Minimum Description Length” by Professors Bin Yu and Mark Hansen

Graybill Conference 2006 – June 11-13

Multi-scale methods and Statistics – A Productive Marriage Program Committee: Thomas Lee (Chair), Xiao-Li Meng, and Patrick Wolfe Short Course on “Multiscale methods” by Professors Xiao-Li Meng, Patrick Wolfe, and Thomas Lee

Graybill Conference 2007 – June 12-15

A Workshop on Bioinformatics and a Symposium on Applied Probability and Time Series in honor of Professor Peter J. Brockwell. Program Committee: Duane Boes (honorary chair), Richard Davis, Jay Breidt, Asa Ben-Hur, and Hari Iyer Short Course on BLAST by Professor Warren Ewens

Graybill Conference 2008 – June 11-13

Biopharmaceutical Statistics Program Committee: Alfred Balch, Scott Evans, Brian Wiens, and Jim Whitmore Short Course on Hot Topics in Clinical Trials by Professors L.J. Wei, Marvin Zelen, Scott Evans and Lingling Li, Harvard University

Graybill Conference/EVA 2009 – June 22-26

Extreme Value Analysis Program Committee: Dan Cooley, Richard Davis, Paul Embrechts, Anne-Laure Fougères, Ivette Gomes, Jürg Hüsler, Rick Katz, Claudia Klüppelberg, Thomas Mikosch, Philippe Naveau, Liang Peng, and Holger Rootzén Short Course: An Introduction to the Analysis of Extreme Values using R and extRemes, by Eric Gilleland (NCAR, Boulder, CO) and Mathieu Ribatet (EPFL, Lausanne, Switzerland)


Lenae Andersen Barb Andre

Grant Dornan Amber Hackstadt Wade Herndon

Kelly McConville Ben Prytherch

Libo Sun

Asma Tahir John Tipton

Lu Wang Lulu Wang Jiwen Wu

Gabriel Young Yue Yang

Program Committee:

Mary Meyer (Co-Chair), Colorado State University

Jean Opsomer (Co-Chair), Colorado State University

Rui Song, Colorado State University

Ray Carroll, Texas A&M

Thomas Lee, University of California Davis

Matt Wand, University of Technology, Sydney, Australia

Jane-Ling Wang, University of California Davis

Organizing Committee:

Mary Meyer (Co-Chair), Colorado State University

Jean Opsomer (Co-Chair), Colorado State University

Rui Song, Colorado State University

Jim zumBrunnen, Colorado State University

CSU Graduate Student Volunteers


Table of Contents

Program and Schedule of Talks

Abstracts for Talks (arranged in order of presentation)

Abstracts for Posters

Floor Plan of Hilton Fort Collins Conference Facilities

Map of Fort Collins Area


Graybill 2011

PROGRAM

June 22-24, 2011 Hilton Fort Collins, Fort Collins, CO

Wednesday, June 22, 2011

8:00-9:30 AM Registration, Pre-Convention Area

9:30 AM to 12:50 PM Short Course, Salon I

12:50 PM to 2:00 PM Lunch, Atrium

2:00 PM to 4:30 PM Short Course, Salon I

SEMIPARAMETRIC REGRESSION

A short-course by Professor Matt Wand (University of Technology, Sydney, Australia)

SHORT-COURSE OVERVIEW

4:00-7:00 PM Registration, Pre-Convention Area

6:30 PM Poster Set-up, Salon II

6:30-8:30 PM Opening Mixer, Salon II

Semiparametric regression is concerned with the flexible incorporation of nonlinear functional relationships in regression analyses. Assuming only a basic familiarity with ordinary regression, this short-course explains the techniques and benefits of semiparametric regression in a concise and modular fashion. Spline functions, linear mixed models and Bayesian hierarchical models are shown to play an important role in semiparametric regression. There will be a strong emphasis on implementation in R and BUGS. The short-course is based on the book 'Semiparametric Regression' by D. Ruppert, M.P. Wand and R.J. Carroll (Cambridge University Press, 2003) and a 2009 Electronic Journal of Statistics paper by the same authors which describes more recent developments on the topic.
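A minimal penalized spline sketch of the kind of model the course covers, written in Python purely for illustration (the course itself works in R and BUGS). It assumes a linear truncated-power basis with knots at sample quantiles and a smoothing parameter `lam` fixed by hand. In the linear mixed model formulation, the ridge penalty on the knot coefficients corresponds to treating them as random effects, with `lam` playing the role of the variance ratio σ²_ε/σ²_u; here it is not estimated. The function names and simulation settings are hypothetical.

```python
import numpy as np


def truncated_line_basis(x, knots):
    """Unpenalized columns [1, x] and penalized columns (x - k)_+ for each knot k."""
    X = np.column_stack([np.ones_like(x), x])
    Z = np.clip(x[:, None] - knots[None, :], 0.0, None)
    return X, Z


def penalized_spline_fit(x, y, n_knots=20, lam=1.0):
    """Minimize ||y - X b - Z u||^2 + lam * ||u||^2 and return a prediction function."""
    # Assumption: knots at equally spaced sample quantiles.
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    X, Z = truncated_line_basis(x, knots)
    C = np.hstack([X, Z])
    # The ridge penalty acts only on the knot coefficients u, not on the line b.
    D = np.diag([0.0, 0.0] + [1.0] * n_knots)
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)

    def predict(x_new):
        Xn, Zn = truncated_line_basis(np.asarray(x_new, dtype=float), knots)
        return np.hstack([Xn, Zn]) @ coef

    return predict


# Hypothetical simulated data: a smooth curve plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 300))
f_true = np.sin(2 * np.pi * x)
y = f_true + rng.normal(scale=0.3, size=x.size)

fit = penalized_spline_fit(x, y, n_knots=20, lam=1.0)
rmse = np.sqrt(np.mean((fit(x) - f_true) ** 2))
print(f"RMSE against the true curve: {rmse:.3f}")
```

Increasing `lam` shrinks the knot coefficients toward zero, so the fit moves toward the unpenalized straight line.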


Thursday, June 23, 2011

7:00-8:00 Continental Breakfast, Atrium
7:30 AM-6:00 PM Registration/Poster Set-up, Pre-Convention Area
8:00-8:30 Welcoming Remarks, Salon I
Jean Opsomer, Chair, Department of Statistics, Colorado State University
Bill Farland, Vice President for Research, Colorado State University
8:30-10:00 Session 1: Rui Song, Chair, Salon I
8:30-9:20 Keynote: Jianqing Fan, Nonparametric Independence Screening for ultrahigh-dimensional sparse modeling
9:20-10:00 Yazhen Wang, Quantum Computation and Quantum Simulation
10:00-10:15 Coffee Break, Atrium
10:15-12:00 Session 2: Goeran Kauermann, Chair
10:15-10:50 Hui Zou, Estimating Large Bandable Covariance Matrices
10:50-11:25 Gerda Claeskens, Focused model selection using penalization methods
11:25-12:00 Xihong Lin, Efficient Adaptive Score Test for Gene/SNP-Set Effects
12:00-1:00 Lunch, Atrium
1:00-2:30 Session 3: Thomas Lee, Chair, Salon I
1:00-1:50 Keynote: Jon Wellner, Nonparametric estimation of log-concave densities
1:50-2:30 Jiayang Sun, nFCA: Numerical Formal Concept Analysis
2:30-2:50 Coffee Break, Atrium
2:50-4:35 Session 4: Mary Meyer, Chair
2:50-3:25 Bodhisattva Sen, Nonparametric least squares estimation of a multivariate convex regression function
3:25-4:00 Mouli Banerjee, Likelihood inference for current status data on a grid: a boundary phenomenon and an adaptive inference procedure
4:00-4:35 Geurt Jongbloed, L1-type test for monotonicity of a hazard
4:45-6:00 Poster Session, Salon II
6:00 Cash Bar, Salon IV
6:30-8:30 Banquet and Presentation, 50 Years of Graybill Statistical Laboratory, Salon IV


Friday, June 24, 2011

8:30-10:00 Session 5: Marc Hallin, Chair, Salon I
8:30-9:20 Keynote: Ingrid Van Keilegom, Boundary estimation in the presence of measurement error with unknown variance
9:20-10:00 Wenceslao Gonzalez-Manteiga, Presmoothing and testing in functional linear models
10:00-10:15 Coffee Break, Atrium
10:15-12:00 Session 6: Jean Opsomer, Chair
10:15-10:50 Marc Hallin and Yvik Swan, Rank-Based Inference in Linear Models with α-Stable Errors
10:50-11:25 Joshua Habiger, Randomized p-Values and Nonparametric Procedures in Multiple Testing
11:25-12:00 Reza Modarres, A Triangle Test for Equality of Distribution Functions and the Lens Depth Function
12:00-1:00 Lunch, Atrium

1:00-2:30 Session 7: Jay Breidt, Chair
1:00-1:50 Keynote: David Ruppert, Guilt by Association: Finding Cosmic Ray Sources
1:50-2:30 Aurore Delaigle, Nonparametric regression from group testing data
2:30-2:45 Coffee Break, Atrium
2:45-4:30 Session 8: Myung-Hee Lee, Chair
2:45-3:20 Yuhong Yang, Adaptive Minimax Estimation over Sparse $l_q$-Hulls
3:20-3:55 Goeran Kauermann, Penalized Splines: A Statistical Idea with Numerous Applications
3:55-4:30 Naisyin Wang, Functional Linear Model with Zero-value Coefficient Function at Sub-regions
4:30-4:45 Late Afternoon Snack, Atrium


Abstracts


Thursday, June 23

8:30-9:20 Keynote #1 Jianqing Fan, Princeton University

Nonparametric Independence Screening for ultrahigh-dimensional sparse modeling

A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend correlation learning to marginal nonparametric learning. Our nonparametric independence screening (NIS) is a specific member of the sure independence screening family. Several closely related variable screening procedures are proposed. It is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, an iterative nonparametric independence screening (INIS) procedure is also proposed to enhance the finite-sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods.

(Joint work with Yang Feng, Columbia University; Rui Song, Colorado State University)
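A minimal sketch of marginal nonparametric screening, assuming a cubic truncated-power spline basis with three quantile knots and the reduction in residual sum of squares as the marginal utility. These are illustrative choices, not necessarily those of the paper, and the helper names (`spline_basis`, `nis_rank`, `nis_select`) are hypothetical. In the simulated data the response depends on `X[:, 1]` only through `X[:, 1] ** 2`, which is uncorrelated with `X[:, 1]` in the population, so this is a signal that linear correlation screening can miss.

```python
import numpy as np


def spline_basis(x, n_knots=3):
    """Cubic truncated-power basis with knots at sample quantiles (illustrative choice)."""
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def nis_rank(X, y, n_knots=3):
    """Order covariates by the RSS reduction of a marginal spline fit of y on each column."""
    n, p = X.shape
    rss0 = np.sum((y - y.mean()) ** 2)
    utility = np.empty(p)
    for j in range(p):
        B = spline_basis(X[:, j], n_knots)
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        resid = y - B @ coef
        utility[j] = (rss0 - resid @ resid) / n
    return np.argsort(utility)[::-1], utility


def nis_select(X, y, d, n_knots=3):
    """Keep the d covariates with the largest marginal utility."""
    order, _ = nis_rank(X, y, n_knots)
    return np.sort(order[:d])


# Hypothetical sparse additive model with p = 50 covariates, two of them active.
rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.uniform(-2.0, 2.0, size=(n, p))
y = 3 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)

selected = nis_select(X, y, d=5)
print("selected covariates:", selected)
print("sample corr(X1, y):", np.corrcoef(X[:, 1], y)[0, 1].round(3))
```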

9:20-10:00 Plenary Talk Yazhen Wang, University of Wisconsin-Madison

Quantum Computation and Quantum Simulation

Quantum computation and quantum information are of great current interest in computer science, mathematics, physical sciences and engineering. They will likely lead to a new wave of technological innovations in communication, computation and cryptography. As the theory of quantum physics is fundamentally stochastic, randomness and uncertainty are deeply rooted in quantum computation, quantum simulation and quantum information. Consequently, quantum algorithms are random in nature, and quantum simulation utilizes Monte Carlo techniques extensively. Thus statistics can play an important role in quantum computation and quantum simulation. This talk will give a brief review of quantum computation, quantum simulation and quantum information. I will present some recent work on statistical analysis of quantum computation and quantum simulation.

Session on Model Selection 10:15-10:50 Hui Zou, School of Statistics, University of Minnesota

Estimating Large Bandable Covariance Matrices

Covariance matrix estimation has attracted a lot of attention in recent years. In this talk I will present some theoretical and empirical results on estimating large bandable covariance matrices, including a general minimax theorem and a tuning method based on Stein’s unbiased risk estimation.
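Bandable covariance estimation can be sketched with a tapering estimator, assuming tapering weights equal to 1 within distance k/2 of the diagonal and decaying linearly to 0 at distance k, and an AR(1)-type true covariance chosen for illustration. The bandwidth k is fixed by hand here; choosing it from data is what the SURE-based tuning mentioned in the abstract addresses, and that step is not implemented. Function names and settings are hypothetical.

```python
import numpy as np


def taper_weights(p, k):
    """Weights 1 for |i-j| <= k/2, decaying linearly to 0 at |i-j| = k, then 0."""
    d = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.clip(2.0 - d / (k / 2.0), 0.0, 1.0)


def tapered_covariance(X, k):
    """Return (tapered estimate, sample covariance) for an n x p data matrix X."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]
    return taper_weights(X.shape[1], k) * S, S


# Hypothetical setting with p equal to n, where the sample covariance is poor.
rng = np.random.default_rng(2)
n, p, rho = 100, 100, 0.5
idx = np.arange(p)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))  # bandable: entries decay off the diagonal
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

Sigma_taper, S = tapered_covariance(X, k=10)
err_taper = np.linalg.norm(Sigma_taper - Sigma, "fro")
err_sample = np.linalg.norm(S - Sigma, "fro")
print(f"Frobenius error: tapered {err_taper:.2f}, sample {err_sample:.2f}")
```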


Thursday, June 23

Session on Model Selection continued

10:50-11:25 Gerda Claeskens, Katholieke Universiteit Leuven

Focused model selection using penalization methods

The quest for a good estimator of a certain focus or target is present regardless of the dimensionality of the data. Obtaining such an estimator with low mean squared error, or a prediction with low prediction error, often proceeds via a variable selection or model selection search. Estimators can also be averaged to enlarge the space of possible estimators in an attempt to further lower the mean squared error. While these methods have mostly been studied for unpenalized estimation in situations where the number of variables is much smaller than the sample size, this work concentrates on the additional difficulties and challenges of applying them to penalized estimation.

11:25-12:00 Xihong Lin, Department of Biostatistics, Harvard University

Efficient Adaptive Score Test for Gene/SNP-Set Effects

In recent years, genome-wide studies have generated a large number of valuable datasets for assessing how genetic variations relate to disease outcomes. With such datasets, it is often of interest to assess the overall effect of a set of genetic markers, assembled based on a pathway or multiple single nucleotide polymorphisms (SNPs) in a gene. Gene-set analyses have been advocated as more reliable and powerful approaches compared to the traditional marginal analysis of single markers. Statistical procedures for testing the overall effect of a gene-set have been well studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework have been proposed as powerful alternatives to the standard Rao's p-degree-of-freedom score test. The advantages of these EB-based procedures are most apparent when the genes are highly correlated, owing to the reduction in the degrees of freedom.

In this paper, we propose an adaptive score test which upweights or downweights the contributions from each member of the gene-set based on the Z-scores of their effects. Such an adaptive procedure gains power over existing procedures when the signal is sparse and the correlation among the genes is not high. By combining evidence from the EB-based score test and the adaptive test, we further construct an omnibus test that attains good power in most settings. The null distributions of the proposed test statistics can be approximated well either via simple perturbation procedures or via χ² approximations. Via extensive simulation studies, we demonstrate that the proposed procedures perform well in finite samples. We apply the tests to a breast cancer study to assess the overall effect of the FGFR2 gene on the risk of breast cancer.


Thursday, June 23

1:00-1:50 Keynote #2 Jon A. Wellner, Department of Statistics, University of Washington

Nonparametric estimation of log-concave densities

I will review recent progress concerning nonparametric estimation of log-concave densities in $R^1$ and $R^d$. In the case of $R^1$, I will present limit theory for the estimators at fixed points at which the population density has a non-zero second derivative, and for the resulting natural mode estimator under a corresponding hypothesis. In the case of $R^d$ with $d\ge 2$, I will briefly discuss some recent progress and sketch a variety of open problems.

1:50-2:30 Plenary Talk Jiayang Sun, Case Western Reserve University

nFCA: Numerical Formal Concept Analysis

Joint work with Junheng Ma and GQ Zhang

In this talk, we introduce our Numerical Formal Concept Analysis (nFCA) technique. Formal Concept Analysis (FCA) is a powerful method in computer science (CS) for identifying overall inherent structures within and between the row and column variables (called objects and attributes in CS) of a binary data set. It is like lifting up the overall hierarchical structure of a forest from a superposition based on simple local information, i.e., pairwise relationships between variables of the data. The objective of nFCA is to combine FCA and statistics to extend what FCA offers for binary data to numerical data. The end product of nFCA is a pair of graphs: the H-graph is a clustered lattice graph indicating inherent hierarchical and clustered relations, and the I-graph is a complementary tree plot indicating the strength and directions of each of the relations and additional network relationships. nFCA performs better than conventional hierarchical clustering methods in terms of the cophenetic correlation coefficient and the relational structure. Its application to a social network and to cardiovascular (CV) trait data will be demonstrated.
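nFCA builds on classical Formal Concept Analysis of a binary object-by-attribute table. As background only (this is classical FCA, not nFCA), a brute-force sketch: a formal concept is a pair (extent, intent) in which the intent is exactly the set of attributes shared by all objects in the extent, and the extent is exactly the set of objects having every attribute in the intent. The toy context and helper names are hypothetical, and enumerating all object subsets is exponential, so this only suits tiny examples.

```python
from itertools import combinations


def intent(objects, context, attributes):
    """Attributes shared by every object in `objects` (all attributes if `objects` is empty)."""
    return frozenset(m for m in attributes if all(context[g][m] for g in objects))


def extent(attrs, context):
    """Objects that have every attribute in `attrs`."""
    return frozenset(g for g in context if all(context[g][m] for m in attrs))


def formal_concepts(context):
    """Brute-force set of (extent, intent) pairs; exponential, so only for tiny contexts."""
    objects = list(context)
    attributes = list(next(iter(context.values())))
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            B = intent(subset, context, attributes)
            concepts.add((extent(B, context), B))
    return concepts


# Hypothetical binary context: rows are objects, columns are attributes.
context = {
    "frog": {"aquatic": 1, "has_legs": 1, "lays_eggs": 1},
    "fish": {"aquatic": 1, "has_legs": 0, "lays_eggs": 1},
    "dog": {"aquatic": 0, "has_legs": 1, "lays_eggs": 0},
}
for ext, itt in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(itt))
```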

Session on Shape-Restricted Estimation and Inference

2:50-3:25

Bodhisattva Sen, Columbia University

Nonparametric least squares estimation of a multivariate convex regression function

The talk will deal with the nonparametric least squares estimator of a convex regression function when the predictor is multidimensional. We characterize this estimator and discuss its computation via the solution of certain quadratic and linear programs. Mild sufficient conditions for the consistency of this estimator and its subdifferentials in both fixed and stochastic design regression settings will also be presented.


Thursday, June 23

3:25-4:00 Mouli Banerjee, University of Michigan

Likelihood inference for current status data on a grid: a boundary phenomenon and an adaptive inference procedure

We consider isotonic regression estimation and hypothesis testing for a survival distribution function at a point in the current status model with an equally spaced grid distribution for the covariate. The grid resolution is specified as $cn^{-\gamma}$, with $c>0$ a scaling constant and $\gamma>0$ determining the order of the spacing. The asymptotic results fall into three cases according to the value of $\gamma$. The case $\gamma=1/3$ is the boundary case, whose limit distributions converge weakly to those of the other two cases, $\gamma\in(0,1/3)$ and $\gamma\in(1/3,\infty)$, as $c$ goes to $\infty$ and $0$, respectively. Further, we propose an adaptive procedure for statistical inference that requires neither estimation nor specification of $\gamma$, which is of practical interest.
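In the current status model one observes an inspection time C and the indicator δ = 1{T ≤ C}, and the nonparametric MLE of the distribution function F at the inspection times is the isotonic regression of δ on C, computable with the pool-adjacent-violators algorithm (PAVA). The sketch below assumes inspection times on an equally spaced grid with spacing c·n^(-γ), using the hypothetical values c = 1, γ = 1/3 and an Exp(1) event time. It computes only the point estimator, not the adaptive inference procedure from the talk, and the function names are hypothetical.

```python
import numpy as np


def pava(values, weights):
    """Weighted pool-adjacent-violators: least-squares nondecreasing fit."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge with the previous block while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, s2 = blocks.pop()
            m1, w1, s1 = blocks.pop()
            w_new = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w_new, w_new, s1 + s2])
    return np.concatenate([np.full(size, mean) for mean, _, size in blocks])


def current_status_npmle(c, delta):
    """NPMLE of F at the distinct inspection times: isotonic regression of delta on c."""
    grid, inverse, counts = np.unique(c, return_inverse=True, return_counts=True)
    means = np.bincount(inverse, weights=delta) / counts
    return grid, pava(means, counts)


# Hypothetical simulation settings.
rng = np.random.default_rng(3)
n, c_scale, gamma = 20_000, 1.0, 1.0 / 3.0
spacing = c_scale * n ** (-gamma)             # grid resolution c * n^(-gamma)
grid_points = spacing * np.arange(1, 31)      # 30 equally spaced inspection times
C = rng.choice(grid_points, size=n)           # inspection time for each subject
T = rng.exponential(1.0, size=n)              # event time, never observed directly
delta = (T <= C).astype(float)                # current status indicator

grid, F_hat = current_status_npmle(C, delta)
print(np.round(F_hat[:5], 3), np.round(1 - np.exp(-grid[:5]), 3))
```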

4:00-4:35

Geurt Jongbloed, TU Delft, EURANDOM

L1-type test for monotonicity of a hazard


Friday, June 24

Keynote #3 8:30-9:20 Ingrid Van Keilegom, Université catholique de Louvain; Institute of Statistics, Biostatistics and Actuarial Sciences

Boundary estimation in the presence of measurement error with unknown variance

Boundary estimation appears naturally in economics in the context of productivity analysis. The performance of a firm is measured by the distance between its achieved output level (quantity of goods produced) and an optimal production frontier, which is the locus of the maximal achievable output given the level of the inputs (labor, energy, capital, etc.). Frontier estimation becomes difficult if the outputs are measured with noise, and most approaches rely on restrictive parametric assumptions. This paper contributes to the development of nonparametric approaches. A slightly simplified version of the general problem can be written as $Y = XZ$, where $Y$ is the observable output, $X$ is the unobserved variable of interest with support $[0,\tau]$ and density $f$, and $Z$ is the noise. Suppose that $f(\tau)>0$, and that $Z$ is independent of $X$ and log-normally distributed with $\log Z \sim N(0,\sigma^2)$ for some unknown variance $\sigma^2$. The novelty of our approach consists in proposing a method for the simultaneous estimation of $\tau$ and $\sigma$. The asymptotic consistency and the rate of convergence of the estimators are established, and simulations are carried out to verify the performance of the estimators for small samples. We briefly describe how the approach could be extended to the problem of estimating a frontier function.
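The difficulty described above can be seen in a quick simulation, assuming the hypothetical values τ = 1, σ = 0.2 and X uniform on [0, τ] (so f(τ) > 0). Without noise, the sample maximum of Y is a natural estimator of τ and can never exceed it; with multiplicative log-normal noise it typically overshoots. This sketch only illustrates the problem and does not implement the simultaneous estimator of τ and σ proposed in the talk. Taking logarithms turns the multiplicative model into an additive one, log Y = log X + log Z.

```python
import numpy as np

# Hypothetical simulation settings.
rng = np.random.default_rng(4)
n, tau, sigma = 2000, 1.0, 0.2
X = rng.uniform(0.0, tau, size=n)            # support [0, tau] with f(tau) > 0
Z = np.exp(rng.normal(0.0, sigma, size=n))   # log Z ~ N(0, sigma^2), independent of X
Y = X * Z                                    # observed, noisy output

# Without noise, max(Y) = max(X) can never exceed tau; with noise it typically does.
print(f"tau = {tau}, max(X) = {X.max():.3f}, max(Y) = {Y.max():.3f}")
```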


Friday, June 24

Plenary Talk 9:20-10:00 Wenceslao Gonzalez-Manteiga, Universidade de Santiago de Compostela

Presmoothing and testing in functional linear models


Friday, June 24

Journal of Nonparametric Statistics Session 10:15-10:50 Marc Hallin and Yvik Swan, Université Libre de Bruxelles

Rank-Based Inference in Linear Models with α-Stable Errors


Friday, June 24

10:50-11:25 Joshua Habiger, Oklahoma State University

Randomized p-Values and Nonparametric Procedures in Multiple Testing

The validity of many multiple hypothesis testing procedures for False Discovery Rate (FDR) control relies on the assumption that p-value statistics are uniformly distributed under the null hypotheses. However, this assumption fails if the test statistics have discrete distributions or if the distributional model for the observables is misspecified. A stochastic process framework is introduced that, with the aid of a uniform variate, enables p-value statistics to satisfy the uniformity condition even when test statistics have discrete distributions. This allows nonparametric tests to be used to generate p-value statistics satisfying the uniformity condition. The resulting multiple testing procedures are therefore endowed with robustness properties. Simulation studies suggest that nonparametric randomized test p-values allow these FDR methods to perform better when the model for the observables is misspecified.
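A minimal sketch of the randomized p-value idea for a discrete test, assuming an upper-tailed sign test whose statistic T is Binomial(m, 1/2) under the null. The randomized p-value P(T > t) + U·P(T = t), with U an independent Uniform(0, 1) variate, is exactly uniform under the null, whereas the conventional p-value P(T ≥ t) is conservative. The Benjamini-Hochberg step-up procedure is included as a standard FDR method. The stochastic process framework from the abstract is not reproduced, and the helper names and settings are hypothetical.

```python
import math

import numpy as np


def binom_pmf(m, prob=0.5):
    """Binomial(m, prob) probabilities for k = 0, ..., m."""
    return np.array([math.comb(m, k) * prob ** k * (1 - prob) ** (m - k) for k in range(m + 1)])


def randomized_pvalues(t, m, u, prob=0.5):
    """Upper-tail randomized p-values P(T > t) + u * P(T = t) for T ~ Binomial(m, prob)."""
    pmf = binom_pmf(m, prob)
    tail = np.cumsum(pmf[::-1])[::-1]          # tail[k] = P(T >= k)
    t = np.asarray(t)
    return tail[t] - (1.0 - u) * pmf[t]


def benjamini_hochberg(pvals, alpha=0.05):
    """Rejection indicators from the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()         # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


# Hypothetical check of the null behaviour with m = 10 signs per test.
rng = np.random.default_rng(5)
m, n_tests = 10, 20_000
t_null = rng.binomial(m, 0.5, size=n_tests)    # sign-test statistics under the null
tail = np.cumsum(binom_pmf(m)[::-1])[::-1]
p_conventional = tail[t_null]                  # P(T >= t)
p_randomized = randomized_pvalues(t_null, m, rng.uniform(size=n_tests))
print("P(p <= 0.05): conventional", np.mean(p_conventional <= 0.05),
      "randomized", np.mean(p_randomized <= 0.05))
```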

11:25-12:00

Reza Modarres, George Washington University

A Triangle Test for Equality of Distribution Functions and the Lens Depth Function


Friday, June 24

Keynote #4 1:00-1:50 David Ruppert, Cornell University

Guilt by Association: Finding Cosmic Ray Sources

The Earth is continuously showered by cosmic rays, atomic nuclei moving with velocities close to that of light. In 2008 the most sensitive cosmic ray detector to date, the Pierre Auger Observatory (PAO), began operation. Roughly 70 Ultra High Energy Cosmic Rays (UHECRs) have been detected by PAO. Each is approximately ten million times more energetic than the most energetic particles produced at the Large Hadron Collider. Astrophysical questions include: what phenomenon accelerates particles to such high energies, which astronomical objects host the accelerators, and what sorts of nuclei are energized? The magnetic deflection of the trajectories of UHECRs makes them potential probes of galactic and intergalactic magnetic fields. The data consist of precise arrival times and estimated energies and directions of the detected UHECRs, measurement uncertainties, and characterization of the observatory detection capabilities. We compare models with different source populations, including a "null" model assigning all cosmic rays to unresolved sources. We aim to (1) ascertain which cosmic rays may be associated with specific sources; (2) estimate luminosity function parameters for astrophysical sources; (3) estimate the proportion of detected cosmic rays generated by each population; (4) estimate parameters describing the effects of cosmic magnetic fields; and (5) investigate whether cosmic rays from a single source are scattered independently (which we call a "buckshot model") or share part of their scattering history (an exchangeable "radiant model"). We use Bayes factors to compare rival models and compare a number of approaches for marginal likelihood computation, including the harmonic mean estimator (which behaves poorly), Chib's method, an enumerative algorithm, and importance sampling.
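The contrast between marginal likelihood estimators can be seen in a toy conjugate model where the exact answer is available, assuming y_i | θ ~ N(θ, 1) and θ ~ N(0, τ²); this is a hypothetical stand-in, not the cosmic-ray model. The sketch compares the harmonic mean estimator, computed from posterior draws, with importance sampling from a normal proposal twice as wide as the posterior. Chib's method and the enumerative algorithm are not shown.

```python
import numpy as np


def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    a_max = np.max(a)
    return a_max + np.log(np.mean(np.exp(a - a_max)))


def log_normal_pdf(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var


# Hypothetical toy data and prior variance.
rng = np.random.default_rng(6)
n, tau2 = 20, 1.0
y = rng.normal(0.7, 1.0, size=n)


def log_lik(theta):
    """log p(y | theta) for an array of theta draws, with y_i | theta ~ N(theta, 1)."""
    return log_normal_pdf(y[None, :], theta[:, None], 1.0).sum(axis=1)


# Exact log marginal likelihood: y ~ N(0, I + tau2 * 11').
s, ss = y.sum(), np.sum(y ** 2)
log_m_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2)
               - 0.5 * (ss - tau2 * s ** 2 / (1 + n * tau2)))

# Conjugate posterior: theta | y ~ N(mu_n, v_n).
v_n = 1.0 / (n + 1.0 / tau2)
mu_n = v_n * s
S = 100_000

# Harmonic mean estimator: reciprocal of the posterior mean of 1 / p(y | theta).
theta_post = rng.normal(mu_n, np.sqrt(v_n), size=S)
log_m_hm = -log_mean_exp(-log_lik(theta_post))

# Importance sampling from N(mu_n, 4 v_n), a proposal wider than the posterior.
q_var = 4.0 * v_n
theta_q = rng.normal(mu_n, np.sqrt(q_var), size=S)
log_w = (log_lik(theta_q) + log_normal_pdf(theta_q, 0.0, tau2)
         - log_normal_pdf(theta_q, mu_n, q_var))
log_m_is = log_mean_exp(log_w)

print(f"exact {log_m_exact:.4f}, importance sampling {log_m_is:.4f}, "
      f"harmonic mean {log_m_hm:.4f}")
```

Because the harmonic mean estimator averages 1/p(y | θ) over the posterior, it is driven by rare low-likelihood draws and can have infinite variance in general. Here the importance weights stay bounded because the proposal has the posterior's mean and four times its variance.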
Plenary Talk 1:50-2:30 Aurore Delaigle, University of Melbourne

Nonparametric regression from group testing data

To reduce cost and increase speed of large screening studies, data are often pooled in groups. In these cases, instead of carrying out a test (say, a blood test) on every individual in the study to see whether they are infected, one only tests the pooled blood of all individuals in each group. We consider this problem when a covariate is also observed and one is interested in estimating the conditional probability of contamination. We show how to estimate this conditional probability using a simple nonparametric estimator. We illustrate the procedure on data from the NHANES study.
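A deliberately simplified sketch of regression from pooled tests, not the estimator from the talk. It assumes pools are formed from individuals with nearly identical covariate values, so that a pool of size k tests positive with probability 1 - (1 - p(x))^k; the pool results are smoothed with a Nadaraya-Watson estimator and that relationship is then inverted. The pool size, bandwidth and the probability curve `p_true` are hypothetical simulation choices.

```python
import numpy as np


def nw_smooth(x0, x, z, h):
    """Nadaraya-Watson estimate of E[z | x = x0] with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w @ z) / w.sum(axis=1)


def p_true(t):
    """Hypothetical conditional contamination probability used to simulate data."""
    return 0.05 + 0.1 * t


rng = np.random.default_rng(7)
n, k = 20_000, 5                              # individuals and (hypothetical) pool size
x = np.sort(rng.uniform(0.0, 1.0, size=n))    # sorting makes each pool covariate-homogeneous
status = rng.uniform(size=n) < p_true(x)      # individual statuses, not observed

x_pool = x.reshape(-1, k).mean(axis=1)        # one covariate summary per pool
z_pool = status.reshape(-1, k).any(axis=1).astype(float)  # pool positive if any member is

# With a common covariate value x, P(pool positive) = 1 - (1 - p(x))^k: smooth, then invert.
x_grid = np.linspace(0.1, 0.9, 9)
pi_hat = nw_smooth(x_grid, x_pool, z_pool, h=0.05)
p_hat = 1.0 - (1.0 - pi_hat) ** (1.0 / k)
print(np.round(p_hat, 3))
print(np.round(p_true(x_grid), 3))
```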


Friday, June 24

Session on Semiparametric Methods and Applications 2:45-3:20 Yuhong Yang, School of Statistics, University of Minnesota

Adaptive Minimax Estimation over Sparse $l_q$-Hulls

For high-dimensional linear regression, both $l_0$ and $l_1$ norms on the coefficients have been used for sparse modeling of the regression function. In this work, we identify the minimax rates of convergence for regression estimation under $l_q$ constraints on the coefficients for $0<q<1$, for both random and fixed designs. Furthermore, our estimators based on model combination/selection are shown to simultaneously achieve the optimal rates over the whole range $0\leq q \leq 1$. Our results also permit model misspecification. The work is joint with Zhan Wang, Sandra Paterlini and Fuchang Gao.

3:20-3:55 Göran Kauermann, Universität Bielefeld

Penalized Splines: A Statistical Idea with Numerous Applications

The estimation method of penalized splines was proposed by Eilers & Marx (Statistical Science, 1996) and has proven remarkably flexible, with applications in numerous fields of statistics, as the recent survey by Ruppert, Wand & Carroll (Electronic Journal of Statistics, 2009) shows. The underlying idea is simple: in high-dimensional estimation problems, a penalty controls numerical and statistical stability. The talk starts with an introduction to the field of penalized splines and lists relevant results. The second part of the talk shows how to use penalized splines to model complex data structures, including copula estimation and functional data analysis.
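A minimal P-spline sketch in the spirit of Eilers and Marx, assuming a cubic B-spline basis on equally spaced knots (built with the Cox-de Boor recursion), a second-order difference penalty on adjacent coefficients, and a smoothing parameter `lam` fixed by hand; in practice `lam` would be chosen from the data. Function names and simulation settings are hypothetical.

```python
import numpy as np


def bspline_basis(x, xl, xr, n_segments=20, degree=3):
    """B-spline basis on equally spaced knots over [xl, xr] via the Cox-de Boor recursion."""
    dx = (xr - xl) / n_segments
    t = xl + dx * np.arange(-degree, n_segments + degree + 1)
    # Degree-0 basis: indicator of each knot interval [t_i, t_{i+1}).
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x[:, None] - t[None, : -(d + 1)]) / (d * dx)
        right = (t[None, d + 1 :] - x[:, None]) / (d * dx)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (len(x), n_segments + degree)


def pspline_fit(x, y, lam=1.0, n_segments=20, degree=3, diff_order=2):
    """Fitted values minimizing ||y - B a||^2 + lam * ||D a||^2 (difference penalty)."""
    B = bspline_basis(x, x.min(), x.max(), n_segments, degree)
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    a = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ a


# Hypothetical simulated data.
rng = np.random.default_rng(8)
x = np.sort(rng.uniform(0.0, 1.0, size=300))
f_true = np.sin(2 * np.pi * x)
y = f_true + rng.normal(scale=0.3, size=x.size)

fitted = pspline_fit(x, y, lam=1.0)
rmse = np.sqrt(np.mean((fitted - f_true) ** 2))
print(f"RMSE against the true curve: {rmse:.3f}")
```

With a second-order difference penalty, a very large `lam` forces the coefficients onto a straight line, so the fit approaches the ordinary least squares line.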


Friday, June 24

3:55-4:30 Naisyin Wang, University of Michigan

Functional Linear Model with Zero-value Coefficient Function at Sub-regions

We propose a two-stage shrinkage method to estimate the coefficient function in a functional linear regression model when the value of the coefficient function is zero within certain sub-regions. Theoretically, we show that our estimator enjoys the desired oracle property: it identifies the null region with probability tending to 1, and it achieves the same asymptotic normality for the estimated coefficient function on the non-null region as regular functional linear model estimation when the non-null region is known. Numerically, owing to the additional refinement stage, we only need to use a large, but not extremely large, number of knots in both stages, yet achieve superior numerical performance. Our refined estimator overcomes a shortcoming of most shrinkage estimators, which tend to underestimate the absolute scale of non-zero coefficients. The numerical performance of the proposed method is illustrated in a simulation study and an analysis of data collected by the Johns Hopkins Precursors Study. This is joint work with Jianhui Zhou and Nae-Yuh Wang.


Posters

Wednesday 6:30 p.m. Set-up, Salon II

Thursday 4:45-6:00 p.m. Session, Salon II


Student Poster Competition Winners

Daniel Bonnery, ENSAI

Asymptotic properties of nonparametric distribution estimators under informative sampling from a finite population

Consider informative selection of a sample from a finite population. Responses are realized as independent and identically distributed (iid) random variables with a probability density function (pdf) $f$, referred to as the superpopulation model. The selection is informative in the sense that the sample responses, given that they were selected, are not iid $f$. In general, the informative selection mechanism may induce dependence among the selected observations. The impact of such dependence on the behavior of two basic distribution estimators, the (unweighted) empirical cumulative distribution function (cdf) and the kernel density estimator of the pdf, is studied. An asymptotic framework and weak conditions on the informative selection mechanism are developed under which these statistics computed on sample responses behave as if they were computed from an iid sample of observations from a weighted version of the superpopulation pdf. In particular, the empirical cdf converges uniformly, in $L_2$ and almost surely, to a weighted version of the superpopulation cdf, yielding an analogue of the Glivenko-Cantelli theorem. Further, we compute the rate of convergence of the kernel density estimator to the weighted superpopulation pdf. A series of examples, motivated by real problems in surveys and other observational studies, shows that the conditions are verifiable for specified designs.

Gang Cheng, Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA

The Nonparametric Least-Squares Estimation Method for Tumor Growth Function with Interval-Censored Data

Gang Cheng, MS and Ying Zhang, PhD

We study a model-free nonparametric estimation method for the tumor growth function when tumor onset time is subject to case 1 interval censoring, in which each subject is assumed to be screened only once during the study. In our estimation method, the nonparametric maximum likelihood estimate (NPMLE) of the tumor onset probability function is first calculated from the tumor presence at diagnosis data for each subject. A nonparametric least-squares objective function is then constructed with the NPMLE of the tumor onset probability function and the data of tumor size at diagnosis. The nonparametric least-squares estimate (NPLSE) of the tumor growth function is obtained by minimizing the nonparametric least-squares objective function. We study asymptotic properties of the proposed estimator under mild regularity conditions and show that the NPLSE of the growth function is asymptotically consistent. Simulation studies are performed to demonstrate the asymptotic consistency of the NPLSE of the growth function. The proposed method can also be readily applied to doubly interval-censored data that often occur in HIV studies, such as in estimation of the distribution function of the HIV transmission time between heterosexual couples and estimation of the distribution function of the incubation time from HIV to AIDS.
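For Bonnery's poster on informative sampling, a small simulation shows the weighted limit in the simplest case, assuming Poisson sampling with inclusion probability proportional to the response. Units are then selected independently, so the selected responses behave like draws from the size-biased density y f(y)/E[Y]; with f the Exp(1) density, that weighted cdf is the Gamma(2, 1) cdf 1 - e^(-t)(1 + t). The design and sampling rate are hypothetical, and the dependence among selected units that the poster handles does not arise here.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 200_000
y_pop = rng.exponential(1.0, size=N)          # finite population drawn from f = Exp(1)

# Hypothetical informative design: Poisson sampling with inclusion probability
# proportional to the response (capping at 1 is essentially never active here).
incl_prob = np.minimum(0.05 * y_pop, 1.0)
sample = y_pop[rng.uniform(size=N) < incl_prob]

t = np.linspace(0.0, 6.0, 61)
ecdf = (sample[:, None] <= t[None, :]).mean(axis=0)   # unweighted empirical cdf
pop_cdf = 1.0 - np.exp(-t)                            # superpopulation cdf
weighted_cdf = 1.0 - np.exp(-t) * (1.0 + t)           # cdf of y f(y) / E[Y]: Gamma(2, 1)

print("sample size:", sample.size)
print("max |ECDF - weighted cdf|:", round(float(np.abs(ecdf - weighted_cdf).max()), 3))
print("max |ECDF - population cdf|:", round(float(np.abs(ecdf - pop_cdf).max()), 3))
```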


Student Poster Competition Winners

Kris De Brabanter, K.U. Leuven


Student Poster Competition Winners

Amber Hackstadt, Department of Statistics, Colorado State University Bayesian Shape-Restricted Regression Spline Model Including Changepoints

Amber Hackstadt, Mary Meyer, and Jennifer Hoeting, Colorado State University

We propose a Bayesian approach to function estimation for generalized linear models using shape-restricted fixed- and free-knot regression splines and extend this model to allow for one or more changepoints. Along with vague priors, the Bayesian shape-restricted regression spline model can be used to estimate functions while providing posterior distributions that facilitate inference. For example, inference concerning parametrically modeled covariates can be accomplished using approximate marginal distributions. For the estimation of free-knot splines and the changepoint model, we use a reversible jump Markov chain Monte Carlo algorithm. Simulation studies and examples are used to evaluate the performance of the function estimation procedures.

Zonglin He, Department of Statistics, Colorado State University

Generalized Kernel Smoothing

We are interested in fitting a nonparametric regression model to data in which the covariate is an ordered categorical variable. We extend the Nadaraya-Watson estimator, which normally requires a continuous covariate. We derive the leading bias and variance terms for the generalized kernel estimator under the assumption that the categories correspond to quantiles of an unobserved continuous latent variable. We also investigate the asymptotic normality of the generalized kernel estimator. Finally, we extend the generalized Nadaraya-Watson estimator to the class of local linear regression estimators.
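
A minimal sketch of the idea, assuming (for illustration only) that each ordered category is mapped to the midpoint of its empirical cumulative-probability interval on the latent scale and that a Gaussian kernel is used. The score mapping and function names are hypothetical, not the estimator studied in the poster.

```python
import numpy as np

def latent_scores(categories):
    """Map ordered categories to latent quantile midpoints in (0, 1).

    Assumption for illustration: category k is placed at the midpoint of
    its empirical cumulative-probability interval.
    """
    cats = np.asarray(categories)
    levels, counts = np.unique(cats, return_counts=True)  # levels sorted
    cum = np.cumsum(counts) / counts.sum()
    lower = np.concatenate([[0.0], cum[:-1]])
    mid = (lower + cum) / 2.0
    lookup = dict(zip(levels.tolist(), mid.tolist()))
    return np.array([lookup[c] for c in cats.tolist()]), lookup

def nadaraya_watson(u_train, y_train, u_eval, h):
    """Nadaraya-Watson estimate with a Gaussian kernel on the latent scale."""
    u_train = np.asarray(u_train, dtype=float)
    u_eval = np.atleast_1d(np.asarray(u_eval, dtype=float))
    w = np.exp(-0.5 * ((u_eval[:, None] - u_train[None, :]) / h) ** 2)
    return (w @ np.asarray(y_train, dtype=float)) / w.sum(axis=1)
```

A local linear version would replace the kernel-weighted mean by a kernel-weighted straight-line fit around each evaluation point.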

Geraldine Laurent, HEC - Ecole de gestion de l'Université de Liège

Morgan Lennon, North Carolina State University, Department of Statistics

Spatial Bayesian Nonparametric Approach for Extreme Temperatures

Extreme temperature trends across space and time, especially those of minimum temperature, are not well understood, yet they are important for studying climate change. We extend common statistical models for extreme data to the spatial setting using a nonparametric approach, introducing a Dirichlet-type mixture model whose marginals have generalized extreme value (GEV) distributions. The GEV parameters are allowed to vary spatially and temporally; however, this may not explain all of the spatial correlation. Our proposed methodology captures the unexplained spatial dependence after accounting for the spatially varying GEV parameters, while also allowing for nonstationarity. This modeling approach provides the flexibility to characterize complex spatial dependence between the extreme values at the sites. Our approach is computationally efficient, since it avoids the matrix inversions used in common copula frameworks for spatial extremes. The performance of our nonparametric spatial methodology is investigated through a simulation study, and the method is applied to minimum temperature data from the Midwest region of the United States.
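
For reference, the GEV marginal distribution function with location μ, scale σ > 0, and shape ξ is F(y) = exp{-[1 + ξ(y - μ)/σ]^(-1/ξ)} where 1 + ξ(y - μ)/σ > 0, with the Gumbel limit exp{-exp[-(y - μ)/σ]} as ξ → 0. The function below evaluates it; the name is hypothetical, and in the spatial model μ, σ, and ξ would vary by site and time. It does not implement the Dirichlet-type mixture.

```python
import math

def gev_cdf(y, mu, sigma, xi):
    """CDF of the generalized extreme value distribution GEV(mu, sigma, xi)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

Minima such as minimum temperatures are commonly handled through the identity min_i X_i = -max_i(-X_i), that is, by fitting the GEV to negated values.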

Atul Mallik, Department of Statistics, Univ. of Michigan, Ann Arbor, MI

Threshold estimation using p-values

We use p-values as a discrepancy criterion for identifying the threshold level at which a regression function takes off from its baseline value, a problem motivated by applications in toxicological and pharmacological dose-response studies and in environmental statistics. We study the problem in two sampling settings: one in which multiple responses can be obtained at each of a number of covariate levels, and the standard regression setting (a limited number of responses at each covariate value). Our procedure involves testing the hypothesis that the regression function is at its baseline at each covariate value and then computing the (potentially approximate) p-value of the test. An estimate of the threshold is obtained by fitting a piecewise constant function with a single jump discontinuity (a stump) to these observed p-values (or their surrogates), since they behave in markedly different ways on the two sides of the threshold. The estimate is shown to be consistent, and its large-sample properties are studied. Our approach is computationally simple and extends to estimation of the baseline value of the regression function, to heteroscedastic errors, and to time series. It is illustrated with real data applications. This is joint work with Bodhisattva Sen, Moulinath Banerjee, and George Michailidis.
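
The stump-fitting step can be sketched as an ordinary least-squares search over split points, using cumulative sums so that each candidate split costs O(1). The p-values are taken as given here (computing them requires the per-covariate tests described above), the covariate values are assumed sorted and distinct, and the function name is hypothetical. Both levels are estimated freely; one could instead fix them, e.g. at 1/2 (the mean of a uniform null p-value) and 0.

```python
import numpy as np

def fit_stump(x, p):
    """Least-squares stump fit: p ~ a for x <= d and p ~ b for x > d.

    Returns the split point d (a covariate value) and the two levels.
    Assumes x is sorted in increasing order with distinct values.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    n = len(p)
    csum = np.cumsum(p)
    csq = np.cumsum(p ** 2)
    best = None
    for i in range(1, n):  # left block is p[:i], right block is p[i:]
        left_sum, left_n = csum[i - 1], i
        right_sum, right_n = csum[-1] - csum[i - 1], n - i
        # RSS = sum(p^2) - left_sum^2 / left_n - right_sum^2 / right_n
        rss = csq[-1] - left_sum ** 2 / left_n - right_sum ** 2 / right_n
        if best is None or rss < best[0]:
            best = (rss, x[i - 1], left_sum / left_n, right_sum / right_n)
    _, d, a, b = best
    return d, a, b
```

On the baseline side, p-values behave roughly like uniform draws with mean near 1/2, while beyond the threshold they collapse toward 0, which is what makes a single jump a natural summary.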

Kelly McConville, Colorado State University

Design Asymptotics of a Penalized Spline Survey Regression Estimator

Nonparametric survey regression estimation procedures are applicable in natural resource surveys, where population-level auxiliary information is available from remote sensing or other sources. Ideally, these estimators will have good design properties regardless of the accuracy of any assumed superpopulation model. The asymptotic properties of the penalized spline regression estimator are considered in the case where the number of knots goes to infinity and the locations of the knots are allowed to change. The estimator is shown to be design consistent and asymptotically design unbiased. A variance estimator is proposed and shown to be design consistent for the asymptotic mean squared error. Simulation results demonstrate the usefulness of the asymptotic approximations. This research was supported in part by the US National Science Foundation (SES-0922142).

Elizabeth Ogburn, Department of Biostatistics, Harvard University

Doubly Robust Estimation of the Local Average Treatment Effect Curve

Elizabeth L. Ogburn*, Andrea Rotnitzky*†, and James Robins*

We address estimation of the causal effect of a binary treatment D on an outcome Y, conditional on covariates V, from observational studies or natural experiments that record a binary instrumental variable Z. We describe a doubly robust, locally efficient estimator of the parameters indexing a model for the local average treatment effect conditional on V (LATE(V)) when randomization of the instrument Z holds only conditional on a high-dimensional vector of covariates X, which may be larger than V. We derive our results after recognizing that inference should be identical to inference for the parameters of a model for an additive treatment effect on the treated conditional on V that assumes no treatment-instrument interaction, as considered in Robins (1994) for the case V = X and in Tan (2010) when V is a strict subset of X. We illustrate our methods by estimating the local average effect of participating in 401(k) retirement programs on savings, using data from the U.S. Census Bureau's 1991 Survey of Income and Program Participation.

*Department of Biostatistics, Harvard University, Boston, MA; †Department of Economics, Di Tella University, Buenos Aires, Argentina
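
As background only, the unconditional LATE is identified by the Wald ratio of the instrument's effect on Y to its effect on D, assuming Z is randomized unconditionally (plus the usual monotonicity and exclusion conditions). The sketch below computes that simple ratio; it is not the doubly robust, locally efficient estimator of LATE(V) described above, which handles randomization that holds only conditional on X. The function name is hypothetical.

```python
import numpy as np

def wald_late(y, d, z):
    """Wald (instrumental-variable) estimate of the unconditional LATE."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    itt_y = y[z == 1].mean() - y[z == 0].mean()  # effect of Z on Y
    itt_d = d[z == 1].mean() - d[z == 0].mean()  # effect of Z on D (compliance)
    if abs(itt_d) < 1e-12:
        raise ValueError("instrument has no effect on treatment")
    return itt_y / itt_d
```

The conditional version LATE(V) replaces these marginal contrasts by contrasts within levels of V, which is where modeling of the instrument and outcome given X, and hence double robustness, becomes relevant.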

Huan Wang, Department of Statistics, Colorado State University

Yuan Wang, Department of Statistics, Colorado State University

Nonparametric Regression Model with Tree-structured Response

Yuan Wang (1), J.S. Marron (2), Burcu Aydin (3), Alim Ladha (2), Elizabeth Bullitt (2), and Haonan Wang (1)

(1) Colorado State University; (2) University of North Carolina, Chapel Hill; (3) HP Labs

Advances in science and technology over the last two decades have motivated the study of complex data objects. In this work, we consider the topological properties of a population of tree-structured objects. Our interest centers on modeling the relationship between a tree-structured response and other covariates. Nonlinear regression analysis is a powerful and widely used toolkit for such purposes. For tree objects, however, this poses serious challenges, since most regression methods rely on linear operations in Euclidean space. We generalize the notion of nonparametric regression to the case of a tree-structured response variable, and we provide a fast algorithm with theoretical justification. We apply the proposed method to a data set of human brain artery trees and show that smoothing in tree space reveals deeper scientific insight than smoothing summary statistics in Euclidean space.

Regular Posters

Mohamed Amezziane, DePaul University

Semiparametric Density Estimation

Several problems arise when estimating a function nonparametrically. The difficulty of selecting smoothing parameters and of choosing appropriate measures of discrepancy combine to make these estimators unattractive to practitioners and to limit their usefulness to data exploration. In addition, nonparametric estimators of certain functions, such as the density or the distribution function, leave no room for incorporating available information. These considerations motivate us to explore new means of constructing semiparametric function estimators using shrinkage methodologies, with special emphasis on the density function. We investigate the properties of the shrinkage coefficient, study the optimality and limiting distribution of the new density estimators, and show that bandwidth selection becomes less crucial with this new approach, making the proposed estimator less sensitive to the curse of dimensionality. Theoretical results are supported by a Monte Carlo simulation study.

Jose Chacon, Departamento de Matematicas, Universidad de Extremadura, SPAIN

Kernel estimation of multivariate density derivatives

In a recent paper, kernel estimators of multivariate density derivative functions using unconstrained (i.e., symmetric positive definite) bandwidth matrices were introduced. These density derivative estimators have been less well researched than their density estimator analogues, but they have considerable potential for applications, including clustering and the detection of other significant features. Here we present cross-validation and plug-in methods that allow an automatic (data-dependent) selection of the bandwidth matrix, within the class of unconstrained matrices, to be used in the kernel estimator. We illustrate the usefulness of our results with an application to gradient estimation, leading to an automatic clustering method via the mean shift algorithm. This poster is based on previous joint work with Tarn Duong (Institut Curie, Paris) and Matt Wand (University of Technology, Sydney).
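
The mean shift step can be sketched as follows for a Gaussian kernel with an unconstrained bandwidth matrix H: setting the estimated density gradient to zero gives a fixed-point update that moves each point to a kernel-weighted mean of the data, and points that converge to the same mode form a cluster. Here H is fixed by hand as an assumption; the cross-validation and plug-in selectors of the poster are not implemented, and the function names are hypothetical.

```python
import numpy as np

def mean_shift(data, H, n_iter=200, tol=1e-8):
    """Gaussian-kernel mean shift with an unconstrained bandwidth matrix H.

    The fixed point of x = sum_i w_i X_i / sum_i w_i, with weights
    w_i = exp(-0.5 (x - X_i)^T H^{-1} (x - X_i)), is a stationary point of
    the kernel density estimate. Every data point is iterated to its mode.
    """
    X = np.asarray(data, dtype=float)
    H_inv = np.linalg.inv(np.asarray(H, dtype=float))
    modes = X.copy()
    for _ in range(n_iter):
        diff = modes[:, None, :] - X[None, :, :]               # (m, n, d)
        q = np.einsum("mnd,de,mne->mn", diff, H_inv, diff)     # quadratic forms
        w = np.exp(-0.5 * q)
        new = (w @ X) / w.sum(axis=1, keepdims=True)
        shift = np.max(np.abs(new - modes))
        modes = new
        if shift < tol:
            break
    return modes

def cluster_labels(modes, merge_tol=1e-3):
    # Points whose modes coincide (up to merge_tol) share a cluster label.
    labels, centers = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_tol:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return np.array(labels), np.array(centers)
```

Using a full matrix H rather than a scalar bandwidth lets the kernel follow elongated or tilted clusters, which is why data-driven selection of the whole matrix matters for this application.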

Carolina Franco, University of Maryland at College Park

Nels Grevstad, Metropolitan State College of Denver

Daniel Hernandez-Stumpfhauser, Department of Statistics, Colorado State University

Nonparametric Bayesian Circular Density Estimation and Inference Using Mixtures

We illustrate Bayesian inference in models for circular density estimation using mixtures. The number of mixture components is unknown a priori and is to be inferred from the data. The clustering property of Dirichlet processes provides a nonparametric prior for the number of mixture components. Density estimation is then done by modeling the data as a sample from a mixture of projected normal distributions. This allows direct inference on uncertainty about density estimates, assessment of modality, and inference on the number of components. Finally, groups of data will be used to model the parameters of the base measure of the Dirichlet processes in a small area estimation problem.

Toshio Honda, Graduate School of Economics, Hitotsubashi University, Japan

Nonparametric Quantile Regression with Heavy-Tailed and Strongly Dependent Errors

We consider nonparametric estimation of the conditional qth quantile for stationary time series with strong time dependence and heavy tails, under a random design setting. We estimate the conditional qth quantile by local linear regression and investigate the asymptotic properties of the estimators. It is shown that the asymptotic properties are affected by both the strong time dependence and the tail index of the errors. The results of a small simulation study are also presented. The full version of this paper is available at http://gcoe.ier.hit-u.ac.jp/research/discussion/2008/pdf/gd10-157.pdf

Gabriela Nane, Delft Institute of Applied Mathematics, Delft University of Technology, The Netherlands

Shape Constrained Nonparametric Baseline Estimators in the Cox Proportional Hazards Model

Within survival analysis, the Cox proportional hazards model is one of the most widely used approaches to modeling right-censored time-to-event data in the presence of covariates. Different functionals of the lifetime distribution are commonly investigated. The hazard function is of particular interest, as it represents an important feature of the time course of a process under study, e.g., death or a certain disease. Numerous survival studies provide explicit evidence of monotone baseline hazard or density functions. The main objective is therefore to derive nonparametric baseline hazard and density estimators under monotonicity constraints and to investigate their asymptotic behavior. Through the classical graphical representation, our first approach starts from the maximum likelihood estimator of the baseline cumulative hazard, namely the Breslow (1972) estimator. For a nondecreasing baseline hazard, we define the least-squares (LS) baseline hazard estimator as the left-hand slope of the greatest convex minorant (GCM) of the Breslow estimator. This estimator can be viewed as a least-squares projection onto the space of all distributions with nondecreasing baseline hazards. Subsequently, a maximum likelihood estimator (MLE) of a nondecreasing baseline hazard is derived by maximizing the (log-)likelihood function over the set of all distributions with nondecreasing baseline hazards. Similarly, a nonincreasing LS baseline density estimator is derived and its asymptotic properties are explored.
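
The least-squares construction can be sketched in two steps: compute the Breslow estimate at the event times from given risk scores exp(β'Z_j) (β is assumed already fitted), then take left-hand slopes of the greatest convex minorant of the resulting cumulative hazard, which equal a weighted isotonic regression of the raw slopes. No tied event times are assumed, and the function names are hypothetical; the MLE and the density estimators are not covered.

```python
import numpy as np

def breslow(times, events, risk_scores):
    """Breslow baseline cumulative hazard at the event times.

    risk_scores holds exp(beta' Z_j) for a previously fitted beta (assumed
    given). Assumes no tied event times.
    """
    order = np.argsort(times, kind="stable")
    t = np.asarray(times, dtype=float)[order]
    ev = np.asarray(events, dtype=int)[order]
    r = np.asarray(risk_scores, dtype=float)[order]
    at_risk = np.cumsum(r[::-1])[::-1]   # sum of scores over {j: T_j >= T_i}
    cum = np.cumsum(ev / at_risk)        # censored times add zero increments
    mask = ev == 1
    return t[mask], cum[mask]

def gcm_left_slopes(t, cum):
    """Left-hand slopes of the greatest convex minorant of (0,0), (t_i, cum_i).

    t must be strictly increasing and positive. Returns one slope per
    interval (t_{i-1}, t_i], i.e. a nondecreasing hazard estimate.
    """
    tt = np.concatenate([[0.0], np.asarray(t, dtype=float)])
    cc = np.concatenate([[0.0], np.asarray(cum, dtype=float)])
    dt, dc = np.diff(tt), np.diff(cc)
    # Pool adjacent violators on raw slopes dc/dt with weights dt.
    blocks = []  # each block: [sum of dc, sum of dt, number of intervals]
    for a, b in zip(dc, dt):
        blocks.append([a, b, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            a2, b2, k2 = blocks.pop()
            blocks[-1][0] += a2
            blocks[-1][1] += b2
            blocks[-1][2] += k2
    return np.concatenate([np.full(k, a / b) for a, b, k in blocks])
```

With all risk scores equal to 1, the Breslow estimate reduces to the Nelson-Aalen estimator, which is a convenient sanity check.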

Jing Wang, University of Illinois at Chicago

Determination of linear components in additive models

Additive models have been widely used in nonparametric regression, mainly because they avoid the 'curse of dimensionality'. When some of the additive components are linear, the model can be further simplified and higher convergence rates can be achieved for the estimation of these linear components. In this paper, we propose a testing procedure for determining the linear components in nonparametric additive models. We adopt the penalised spline approach for modelling the nonparametric functions, and the test is a chi-square-type test based on finite-order penalised spline estimators. The limiting behaviour of the test statistic is investigated. To obtain critical values for finite-sample problems, we use resampling techniques to construct a bootstrap test. The performance of the proposed tests is studied through simulation experiments and a real-data example. This is joint work with Prof. Rong Chen and Prof. Hua Liang.
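
A generic version of this idea, for a single covariate, can be sketched as a residual-bootstrap test of linearity against a penalised spline alternative. The statistic (RSS_linear - RSS_spline)/RSS_spline, the truncated-linear basis, the fixed penalty lam, and the function names are assumptions for illustration; this is not the chi-square-type statistic of the poster, and it does not handle the additive multi-covariate case.

```python
import numpy as np

def pspline_fit(x, y, n_knots=10, lam=1.0):
    """Penalised truncated-linear spline; only knot coefficients are penalised."""
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    X = np.column_stack([np.ones_like(x), x,
                         np.maximum(x[:, None] - knots[None, :], 0.0)])
    D = np.diag([0.0, 0.0] + [1.0] * n_knots)
    coef = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return X @ coef

def linear_fit(x, y):
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

def linearity_test(x, y, n_boot=199, seed=0, **spline_args):
    """Residual-bootstrap test of H0: E[y | x] is linear in x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    def stat(yy):
        rss0 = np.sum((yy - linear_fit(x, yy)) ** 2)
        rss1 = np.sum((yy - pspline_fit(x, yy, **spline_args)) ** 2)
        return (rss0 - rss1) / rss1  # >= 0, since the linear fit has zero penalty

    t_obs = stat(y)
    fitted0 = linear_fit(x, y)
    resid0 = y - fitted0
    resid0 = resid0 - resid0.mean()
    # Regenerate data under H0: linear fit plus resampled residuals.
    t_boot = np.array([stat(fitted0 + rng.choice(resid0, size=len(y), replace=True))
                       for _ in range(n_boot)])
    p_value = (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)
    return t_obs, p_value
```

In an additive model, the same comparison would be made for one component at a time, holding the other components in their spline form.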

Department of Statistics

Co-sponsors