Ignoring measurement errors in conventional regression analyses can lead to biased estimation and inference. Reducing such bias is challenging when the error-prone covariate is a functional curve. In this paper, we propose a new corrected loss function for a partially functional linear quantile model with function-valued measurement errors. We establish the asymptotic properties of both the functional and parametric coefficient estimators. We also demonstrate the finite-sample performance of the proposed method through simulation studies, and illustrate its advantages by applying it to data from a childhood obesity study.
High-dimensional classification is an important statistical problem with applications in many areas. One widely used classifier is Linear Discriminant Analysis (LDA). In recent years, many regularized LDA classifiers have been proposed for high-dimensional classification. However, these methods rely on inverting a large matrix or solving large-scale optimization problems to render classification rules, which is computationally prohibitive when the dimension is ultra-high. With the emergence of big data, it is increasingly important to develop more efficient algorithms for the high-dimensional LDA problem. In this paper, we propose an efficient greedy search algorithm that depends solely on closed-form formulae to learn a high-dimensional LDA rule. We establish theoretical guarantees of its statistical properties in terms of variable selection and error rate consistency; in addition, we provide an explicit interpretation of the extra information brought by an additional feature in an LDA problem under some mild distributional assumptions. We demonstrate that this new algorithm drastically improves computational speed compared with other high-dimensional LDA methods, while maintaining comparable or even better classification performance.
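For context, the classical two-class LDA rule is itself available in closed form from plug-in estimates. A minimal low-dimensional sketch follows; the function names are illustrative, and this is the textbook rule, not the paper's greedy high-dimensional algorithm:

```python
import numpy as np

def fit_lda(X0, X1):
    """Closed-form plug-in LDA for two classes (low-dimensional sketch)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-class covariance estimate.
    S = (np.cov(X0, rowvar=False) * (n0 - 1) +
         np.cov(X1, rowvar=False) * (n1 - 1)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, mu1 - mu0)              # discriminant direction
    b = -0.5 * w @ (mu0 + mu1) + np.log(n1 / n0)   # threshold with prior odds
    return w, b

def lda_predict(w, b, X):
    # Classify to class 1 when the linear score exceeds zero.
    return (X @ w + b > 0).astype(int)
```

The greedy algorithm in the abstract avoids inverting a large `S` by building the rule up feature by feature with such closed-form quantities.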
In Bayesian data analysis, it is often important to evaluate quantiles of the posterior distribution of a parameter of interest (e.g., to form posterior intervals). In multi-dimensional problems, when non-conjugate priors are used, this is often difficult, generally requiring either an analytic or a sampling-based approximation, such as Markov chain Monte Carlo (MCMC), approximate Bayesian computation (ABC), or variational inference. We discuss a general approach that reframes this as a multi-task learning problem and uses recurrent deep neural networks (RNNs) to approximately evaluate posterior quantiles. As RNNs carry information along a sequence, this approach is particularly useful for time-series applications. An advantage of this risk-minimization approach is that we do not need to sample from the posterior or calculate the likelihood. We illustrate the proposed approach in several examples.
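The standard risk for learning a quantile is the check (pinball) loss, whose expectation is minimized at the tau-quantile of the target distribution. A minimal sketch under that standard loss, with a grid-search demonstration in place of a neural network (the paper's exact training setup may differ):

```python
import numpy as np

def pinball_loss(y, q_hat, tau):
    """Check (pinball) loss: its expectation over y is minimized when
    q_hat equals the tau-quantile of y's distribution."""
    u = y - q_hat
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

# Minimizing the empirical pinball loss over candidate values recovers
# (approximately) the empirical tau-quantile of the sample.
def argmin_pinball(y, tau, grid):
    losses = [pinball_loss(y, q, tau) for q in grid]
    return grid[int(np.argmin(losses))]
```

A network trained to minimize this loss, one output per quantile level, learns to emit the corresponding posterior quantiles without sampling from the posterior.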
High-dimensional vector autoregression with measurement error is frequently encountered in a wide variety of scientific and business applications. In this article, we study statistical inference of the transition matrix under this model. While there has been a large body of literature studying sparse estimation of the transition matrix, there is a paucity of inference solutions, especially in the high-dimensional scenario. We develop inferential procedures for both global and simultaneous testing of the transition matrix. We first develop a new sparse expectation-maximization algorithm to estimate the model parameters, and carefully characterize their estimation precision. We then construct a Gaussian matrix, after proper bias and variance corrections, from which we derive the test statistics. Finally, we develop the testing procedures and establish their asymptotic guarantees. We study the finite-sample performance of our tests through intensive simulations, and illustrate the method with a brain connectivity analysis example.
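As a point of contrast, when the series is observed without error, the VAR(1) transition matrix can be estimated by ordinary least squares in closed form. A minimal sketch of that naive baseline (illustrative only; under measurement error this estimate is biased, which is what motivates corrected procedures such as the sparse EM approach above):

```python
import numpy as np

def var1_ols(Y):
    """Naive least-squares estimate of A in Y_t = A @ Y_{t-1} + e_t,
    given a (T, p) array of observations with no measurement error."""
    X, Z = Y[:-1], Y[1:]                        # lagged and current values
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)   # solves X @ B ~= Z, B = A.T
    return B.T
```

When `Y` is contaminated by additive measurement error, the regressor `Y[:-1]` is correlated with that error, producing the attenuation bias that the corrected inference procedure accounts for.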
In this work, we propose a longitudinal quantile regression framework that enables a robust characterization of heterogeneous covariate-response associations in the presence of high-dimensional compositional covariates and repeated measurements of both response and covariates. We develop a globally adaptive penalization procedure, which can consistently identify covariate sparsity patterns across a continuum of quantile levels. The proposed estimation procedure properly aggregates longitudinal observations over time, and enforces the sum-zero coefficient constraint needed for proper interpretation of the effects of compositional covariates. We establish the oracle rate of uniform convergence and weak convergence of the resulting estimators, and further justify the proposed uniform selector of the tuning parameter in terms of achieving global model selection consistency. We derive an efficient algorithm by incorporating existing R packages to facilitate stable and fast computation. Our extensive simulation studies confirm the theoretical findings. We apply the proposed method to a longitudinal study of children with cystic fibrosis, where the association between gut microbiome and other diet-related biomarkers is of interest.
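The sum-zero constraint is what makes a log-contrast model on compositional covariates invariant to the arbitrary total of each composition. A minimal sketch of the constraint and why it matters (the projection shown is a generic device, not the paper's penalized estimation procedure):

```python
import numpy as np

def project_sum_zero(beta):
    """Euclidean projection onto {b : sum(b) = 0}, the sum-zero
    constraint required for log-contrast regression on compositions."""
    return beta - beta.mean()

# With log-transformed compositional covariates Z = log(X), the linear
# predictor Z @ beta is unchanged by rescaling each composition,
# because log(c * X) @ beta = log(X) @ beta + log(c) * beta.sum(),
# and the last term vanishes exactly when beta sums to zero.
```

This is why interpretable effect estimates for microbiome (relative-abundance) covariates require the constraint to hold at every quantile level considered.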