NSF Career Award – 07

Congratulations to Dr. Hao “Helen” Zhang on receiving a CAREER Award from NSF! Only a limited number of these are awarded each year. It is a great honor and a recognition of her current and proposed future work. The award is for $400k over a five-year period.

Award Abstract #0645293

CAREER: Nonparametric Models Building, Estimation, and Selection with Applications to High Dimensional Data Mining

Nonparametric methods are increasingly applied to regression, classification and density estimation, both in statistics and in related areas such as data mining and machine learning. However, a key difficulty with nonparametric models is model fitting for high dimensional data, due to the curse of dimensionality. Another difficulty is model inference and interpretation, i.e., how to evaluate or test individual variable effects on the complex surface fit. For heterogeneous data with complicated covariance structure, nonparametric model estimation is even more challenging. The objectives of this proposal are to develop novel and widely applicable procedures for simultaneous model selection and estimation for nonparametric models and their related paradigms in data mining. In the framework of reproducing kernel Hilbert spaces (RKHS), the PI proposes a host of new regularization techniques for several families of models: smoothing spline ANOVA models for correlated data, semiparametric regression models, and support vector machines for supervised and semi-supervised learning. The proposed methodologies constitute key advances over standard methods through their unified framework for achieving model sparsity and function smoothing together, their tractable theoretical properties, and their easy adaptation to high dimensional problems. The PI will study the asymptotic behavior of the proposed estimators, explore data-driven procedures for tuning regularization parameters, and develop computational algorithms and software to implement the proposed procedures. The PI will also examine the finite sample performance of the new methods via extensive simulation studies and real data analysis.

In the current information era, the volume and complexity of scientific and industrial databases have been expanding exponentially. As a consequence, the data are of ever higher dimensionality. Analysis of such data poses new challenges to statisticians and is becoming one of the most important research topics in modern statistics. The purpose of this project is to significantly increase the available tools for analyzing complex high dimensional data. In this project, the PI aims to accomplish the following three goals: (1) meet the challenges of nonparametric model estimation and selection within a unified mathematical framework; (2) develop flexible methods with desired statistical properties and high-performance statistical software for mining massive data; (3) integrate research opportunities and findings from the above two activities into disciplinary and interdisciplinary statistical education at the graduate, undergraduate and high school levels. This research will broaden the traditional understanding of nonparametric inference and model selection, provide a broad range of researchers and practitioners in various fields, including sociology, economics, and the environmental, biological and medical sciences, with state-of-the-art data analysis tools, and help to prepare the next generation of students with the necessary modern statistical perspectives.