Statistical Machine Learning: An Integrated Approach
We introduce a new data analysis framework, called Integrated Statistical Learning (ISL) theory,
which, for the first time, blends parametric statistical modeling and
algorithmic machine learning into a coherent whole by establishing a link between them. This new
integrated statistical method provides novel solutions to conditional density estimation,
goodness-of-fit evaluation, quantile regression, and much more.
Heterogeneity, Relevance and Customized Inference
In modern Large-Scale Inference problems, it is important to take extra covariate information into account when performing inference on heterogeneous data.
I am developing, with Prof. Deep Mukhopadhyay, a new paradigm of statistical modeling called "Global-to-Local Inference", which will provide the necessary theory and algorithms for
this Individualized Inference. By borrowing strength from the full ensemble, this method generates simulated relevance samples to power subsequent analysis.
Single-cell Data Cytokine Clustering in Vaccine Studies
Flow cytometry single-cell data are commonly used in modern studies of immune-related diseases.
So far, various models have been proposed for processing the signal intensity to identify
cell subsets with different marker combinations, yet little has been said about what to do with the
resulting count data. This is not a trivial matter, as the resulting data set consists of sparse,
high-dimensional count data, and analyzing it using traditional means is challenging.
We believe that an important step toward understanding this data set is finding out which subsets
have similar responses to the stimuli. To this end, I am developing a framework using mixture
models and graph theory that can quickly cluster together marker combinations with similar
reactions to stimuli for a given time period.
Nonparametric Approach to High-dimensional K-sample Comparison Problems
The multivariate k-sample comparison problem appears frequently in a wide range of data-rich scientific fields. In this project, I developed an approach based on modern LP-nonparametric
tools and unexplored connections with spectral graph theory, which demonstrated impressive robustness on noise-contaminated data sets. Furthermore, this method comes
with an exploratory interface, which not only provides more insight into the problem but can also be used to develop a better predictive model at the next phase of data modeling.
High-dimensional Nonparametric Change-point Detection
A large class of applied problems arising in fields ranging from health to military to environmental monitoring can be formulated as tracking massive amounts of data collected from a large number
of sensors, which can be summarized as high-dimensional change-point detection problems. Approaching this kind of problem in a brute-force classical manner is known to be extremely
challenging. I've developed a spectral graph analysis approach for this purpose, which significantly reduces the memory footprint and speeds up the computation.
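
To make the general idea of graph-based change-point detection concrete, here is a minimal Python sketch. It is not the method developed in this project, only a generic illustration under stated assumptions: the rows of X are time-ordered observations with a single mean shift, the similarity graph uses a Gaussian kernel with a median-distance bandwidth, and the change location is read off the Fiedler vector of the graph Laplacian with a CUSUM-style scan. The function names fiedler_vector and estimate_change_point are hypothetical, and the dense n-by-n matrices used here would not scale to the massive data sets described above.

```python
import numpy as np


def fiedler_vector(X):
    """Fiedler vector of a Gaussian-kernel similarity graph built on the rows of X."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between observations.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from round-off

    # Assumption: bandwidth chosen by the median heuristic on off-diagonal distances.
    sigma2 = np.median(d2[~np.eye(n, dtype=bool)])

    # Weighted adjacency matrix without self-loops, and unnormalized Laplacian.
    W = np.exp(-d2 / sigma2)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W

    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1]


def estimate_change_point(X):
    """Estimate how many observations precede a single change point."""
    v = fiedler_vector(X)
    n = len(v)
    t = np.arange(1, n)              # candidate split sizes 1, ..., n-1
    left_sum = np.cumsum(v)[:-1]     # sum of v[:t] for each candidate t
    mean_left = left_sum / t
    mean_right = (v.sum() - left_sum) / (n - t)
    # CUSUM-style statistic: weighted gap between segment means of the Fiedler vector.
    stat = np.sqrt(t * (n - t) / n) * np.abs(mean_left - mean_right)
    return int(t[np.argmax(stat)])


if __name__ == "__main__":
    # Synthetic example: 100 observations in 20 dimensions, mean shift after the 60th.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    X[60:] += 3.0
    print("estimated change point:", estimate_change_point(X))
```

The sketch reduces a high-dimensional sequence to a single one-dimensional graph summary before scanning for the change, which is the design choice that makes the final scan cheap. A scalable version would need a sparse graph and an iterative eigensolver, which are left out here.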