Vector autoregressive (VAR) models have become a popular choice for modeling multivariate time series data because of their simplicity and ease of use. Efficient estimation of VAR coefficients is therefore an important problem. The envelope technique for VAR models has been shown to have the potential to yield substantial gains in efficiency and accuracy by accounting for linear combinations of the response vector that are essentially immaterial to the estimation of the VAR coefficients. However, inferences based on envelope VAR (EVAR) models are neither invariant nor equivariant under rescaling of the VAR responses, which limits their application to time series measured in the same or similar units. When VAR responses are measured on different scales, the efficiency improvements promised by envelopes are no longer guaranteed. To address this limitation, we introduce the scaled envelope VAR (SEVAR) model, which preserves the efficiency-boosting capabilities of standard envelope techniques while remaining invariant to scale changes. The asymptotic properties of the proposed estimators are established under different error assumptions. Simulation studies and real-data analysis demonstrate the efficiency and effectiveness of the proposed model, and the numerical results corroborate our theoretical findings.
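The scale issue is easiest to see against an estimator that does have the equivariance property. The sketch below assumes a bivariate VAR(1) with no intercept, fitted by ordinary least squares. It is not the EVAR or SEVAR estimator, whose details the abstract does not give, and the names `ols_var1` and `rescale` are hypothetical. It shows that rescaling the responses by a diagonal matrix D maps the OLS estimate Â to D Â D^{-1}.

```python
# Minimal sketch, assuming a bivariate VAR(1) without intercept fitted by
# ordinary least squares. This is NOT the EVAR or SEVAR estimator; helper
# names are hypothetical.


def ols_var1(series):
    """OLS estimate of A in y_t = A y_{t-1} + e_t for a list of 2-vectors."""
    # Accumulate S_yx = sum_t y_t y_{t-1}^T and S_xx = sum_t y_{t-1} y_{t-1}^T.
    s_yx = [[0.0, 0.0], [0.0, 0.0]]
    s_xx = [[0.0, 0.0], [0.0, 0.0]]
    for prev, cur in zip(series[:-1], series[1:]):
        for i in range(2):
            for j in range(2):
                s_yx[i][j] += cur[i] * prev[j]
                s_xx[i][j] += prev[i] * prev[j]
    # Invert the 2x2 matrix S_xx in closed form.
    det = s_xx[0][0] * s_xx[1][1] - s_xx[0][1] * s_xx[1][0]
    inv = [[s_xx[1][1] / det, -s_xx[0][1] / det],
           [-s_xx[1][0] / det, s_xx[0][0] / det]]
    # A_hat = S_yx @ S_xx^{-1}.
    return [[sum(s_yx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rescale(series, factors):
    """Multiply response coordinate i by factors[i] (e.g. a change of units)."""
    return [[d * v for d, v in zip(factors, y)] for y in series]
```

Under rescaling, entry (i, j) of the OLS estimate is multiplied by d_i/d_j, so conclusions drawn from it do not depend on the units chosen. The abstract's point is that EVAR estimators lack this property, and SEVAR is designed to restore scale invariance.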
This paper proposes a test for cross-sectional independence in high-dimensional panel data. It uses the random-matrix-theory-based approach of Srivastava (2005) in the presence of large numbers of cross-sectional units and time series observations. Because the errors are unobservable, the residuals from the panel data regression model are used instead. We develop a bias-corrected test that adjusts for the contribution from the regressors. Using the martingale central limit theorem, we prove that the limiting null distribution of the proposed test statistic is normal under mild conditions as the cross-sectional and time dimensions go to infinity jointly. We further study the asymptotic relative efficiency of the proposed test with respect to the state-of-the-art Lagrange multiplier (LM) test. An interesting finding is that the new test can have a substantial power gain when the underlying variances are not identical across units.
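The benchmark here is a Lagrange multiplier (LM) test built from residuals. The sketch below computes two standard statistics from an N x T residual panel: the classical Breusch-Pagan LM statistic T Σ_{i<j} ρ̂²_ij and Pesaran's scaled version. It is not the proposed random-matrix-based statistic. The abstract does not say which LM variant the paper compares against, and the function names are hypothetical.

```python
import math
from itertools import combinations

# Sketch of residual-based LM statistics for cross-sectional independence.
# Not the bias-corrected test proposed in the abstract; names are hypothetical.


def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def lm_statistics(residuals):
    """Breusch-Pagan LM and Pesaran's scaled LM from an N x T residual panel.

    residuals[i] holds the T residuals of cross-sectional unit i.
    """
    n_units, t_len = len(residuals), len(residuals[0])
    # Squared pairwise correlations over all N(N-1)/2 unit pairs.
    rho_sq = [pearson(residuals[i], residuals[j]) ** 2
              for i, j in combinations(range(n_units), 2)]
    lm = t_len * sum(rho_sq)
    scaled_lm = math.sqrt(1.0 / (n_units * (n_units - 1))) * sum(
        t_len * r - 1.0 for r in rho_sq)
    return lm, scaled_lm
```

For fixed N, the LM statistic is compared with a chi-squared distribution with N(N-1)/2 degrees of freedom. The scaled version targets large N but can be distorted when N is large relative to T. That distortion is one motivation for bias-corrected alternatives.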
A useful property of independent samples is that their correlation remains the same after applying marginal transforms. This invariance property plays a fundamental role in statistical inference but does not hold in general for dependent samples. In this paper, we study this invariance property for the Pearson correlation coefficient and its applications. A multivariate random vector is said to have an invariant correlation if its pairwise correlation coefficients remain unchanged under any common marginal transform. In the bivariate case, we characterize all models of such a random vector via a certain combination of comonotonicity (the strongest form of positive dependence) and independence. In particular, we show that the class of exchangeable copulas with invariant correlation is precisely described by what we call positive Fréchet copulas. In the general multivariate case, we characterize the set of all invariant correlation matrices via the clique partition polytope. We also propose a positive regression dependent model that admits any prescribed invariant correlation matrix. Finally, we show that all our characterization results for invariant correlation, except in one special case, remain the same if the common marginal transforms are restricted to increasing ones.
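The mechanism behind the bivariate characterization can be checked exactly on a finite marginal. This sketch assumes a mixture in which Y equals X with probability p (comonotone) and is otherwise an independent copy of X. That construction is our illustrative assumption, and the paper's precise definition of positive Fréchet copulas is not reproduced. The function names are hypothetical. Because Cov(f(X), f(Y)) = p Var f(X) for every non-constant f, the correlation equals p whatever common transform is applied.

```python
from fractions import Fraction
from itertools import product

# Illustrative mixture of comonotonicity and independence (an assumption,
# not the paper's definition); exact arithmetic via Fraction.


def mixture_pmf(support, probs, p):
    """Joint pmf of (X, Y): with prob. p, Y = X (comonotone); otherwise Y is
    an independent copy of X. Support values must be distinct."""
    pmf = {}
    for (i, x), (j, y) in product(enumerate(support), repeat=2):
        mass = (1 - p) * probs[i] * probs[j]   # independent component
        if i == j:
            mass += p * probs[i]               # comonotone (diagonal) component
        pmf[(x, y)] = mass
    return pmf


def correlation_after(pmf, f):
    """Exact Pearson correlation of (f(X), f(Y)) under the joint pmf.

    X and Y share the same marginal, so Var f(X) = Var f(Y) and the
    correlation is Cov / Var; f must be non-constant on the support."""
    ef_x = sum(m * f(x) for (x, _), m in pmf.items())
    ef_y = sum(m * f(y) for (_, y), m in pmf.items())
    ef2 = sum(m * f(x) ** 2 for (x, _), m in pmf.items())
    efxy = sum(m * f(x) * f(y) for (x, y), m in pmf.items())
    return (efxy - ef_x * ef_y) / (ef2 - ef_x ** 2)
```

With p = 3/10, the identity, x**3, 2**x, and even the decreasing transform -x all return exactly 3/10. A generic dependent pair does not have this property.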
Gini distance correlation (GDC) was recently proposed to measure the dependence between a categorical variable and a numerical random vector, and it characterizes independence between the two. In this article, we use the GDC to construct a feature screening procedure for ultrahigh-dimensional discriminant analysis, where the response variable is categorical. It can be used to screen individual features as well as grouped features. The proposed procedure has several appealing properties. It is model-free: no model specification is needed. It enjoys the sure independence screening property and the ranking consistency property. The proposed screening method can also handle a response whose number of categories diverges. Several Monte Carlo simulation studies examine the finite-sample performance of the proposed screening procedure, and two real-life datasets are analyzed for illustration.
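The screening step can be sketched for univariate features. This assumes the standard form of the Gini distance correlation, (Δ − Σ_k p_k Δ_k) / Δ, where Δ = E|X − X'| is the overall Gini mean difference and Δ_k is its within-class counterpart, each estimated by a U-statistic. The function names are hypothetical. Grouped features would replace |a − b| with a Euclidean distance between vectors.

```python
from itertools import combinations

# Sketch of GDC-based marginal screening for univariate features
# (assumed GDC form; names are hypothetical).


def gini_mean_difference(values):
    """U-statistic estimate of E|X - X'| for i.i.d. copies X, X'."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def gini_distance_correlation(x, y):
    """Sample GDC between a numeric feature x and class labels y."""
    n = len(x)
    overall = gini_mean_difference(x)
    if overall == 0:
        raise ValueError("feature is constant")
    within = 0.0
    for label in set(y):
        group = [xi for xi, yi in zip(x, y) if yi == label]
        if len(group) < 2:
            raise ValueError("each class needs at least two observations")
        # Class proportion p_k times within-class Gini mean difference.
        within += len(group) / n * gini_mean_difference(group)
    return (overall - within) / overall


def screen(features, y, keep):
    """Return indices of the `keep` features with the largest sample GDC."""
    scores = [gini_distance_correlation(f, y) for f in features]
    ranked = sorted(range(len(features)), key=lambda k: scores[k], reverse=True)
    return ranked[:keep]
```

A feature that is constant within each class but differs across classes scores 1. The sample version can be slightly negative for an uninformative feature, which is harmless for ranking.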
The partial least squares (PLS) algorithm retains the combinations of predictors that maximize the covariance with the outcome. Cook et al. (2013) showed that PLS results in a predictor envelope, which is the smallest reducing subspace of the predictors' covariance that contains the coefficient. However, PLS and the predictor envelope both target a space that contains the regression coefficients, so they may sometimes be too conservative to reduce the dimension of the predictors. In this paper, we propose a new method that may improve the estimation efficiency of the regression coefficients when both PLS and the predictor envelope fail to do so. Specifically, our method results in the largest reducing subspace of the predictors' covariance that is contained in the coefficient matrix space. Interestingly, the moment-based algorithm of our proposed method can be obtained by changing the max in PLS to a min. We call the modified PLS the inner PLS and the resulting space the inner predictor envelope space. We derive the theoretical properties of the proposed methods and demonstrate their use on data from the China Health and Nutrition Survey.
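The "maximize the covariance" step of standard PLS has a closed form for a univariate response. Over unit vectors w, the sample covariance between Xw and y is maximized at w = s_xy / ||s_xy||, by the Cauchy-Schwarz inequality. The sketch below computes only this first standard PLS weight vector. Later components, which require deflation, are omitted, and the inner-PLS modification is not reproduced. The function name is hypothetical.

```python
import math

# First standard PLS weight vector for a univariate response (sketch;
# later components and the inner-PLS variant are not shown).


def first_pls_direction(X, y):
    """Unit vector w maximizing the sample covariance between X w and y."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # s_xy[j] = sample covariance between predictor j and the response.
    s_xy = [sum((row[j] - x_means[j]) * (yi - y_mean) for row, yi in zip(X, y)) / (n - 1)
            for j in range(p)]
    norm = math.sqrt(sum(s * s for s in s_xy))
    return [s / norm for s in s_xy]
```

For X = [[1, 0], [2, 1], [3, 0], [4, 1]] and y = [1, 2, 3, 4], the covariances are (5/3, 1/3), so the first direction is (5, 1)/√26.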
In this article, the cross projection test (CPT) technique, first proposed by Wang and Cui (2024), is extended to high-dimensional two-sample mean tests. A data-splitting strategy is used to find projection directions that reduce the data from a high-dimensional space to a low-dimensional one, which mitigates the curse of dimensionality. Once both samples are randomly split, two correlated cross projection statistics can be constructed following the CPT mechanism; likewise, test statistics constructed from multiple random splits are correlated with one another. To handle this dependence and to improve empirical power by removing the randomness of a single data split, we further use a Cauchy combination test based on multiple data splits. Theoretically, we establish the asymptotic properties of the proposed test statistic. Furthermore, for sparse alternatives, we apply the power enhancement technique, based on marginal screening of the full data, to the ensemble Cauchy combination test. Monte Carlo simulations and two real-data examples illustrate the utility of the proposed ensemble algorithm.
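The combination step can be sketched with the standard Cauchy combination rule. This assumes that each random data split yields a p-value p_i and uses equal weights w_i = 1/m by default; the paper's weighting is not stated in the abstract. The sketch does not compute the CPT statistics themselves, and the function name is hypothetical.

```python
import math

# Standard Cauchy combination of p-values across data splits (sketch;
# equal weights by default, assumed rather than taken from the paper).


def cauchy_combination(p_values, weights=None):
    """Combine p-values in (0, 1) via T = sum_i w_i tan((0.5 - p_i) pi).

    Weights should sum to one; the combined p-value is approximately
    0.5 - arctan(T) / pi, i.e. the standard Cauchy upper tail at T."""
    m = len(p_values)
    if weights is None:
        weights = [1.0 / m] * m
    t_stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(t_stat) / math.pi
```

The Cauchy tail is largely insensitive to dependence among the components, so the combined p-value stays approximately valid for small p-values even though statistics from different splits are correlated. Identical inputs return the same p-value, and one very small p-value dominates the combination.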
A Bayesian covariance structure model (BCSM) is proposed for interval-censored multi-way nested survival data. This flexible modeling framework generalizes mixed effects survival models by allowing positive and negative associations among clustered observations. Conjugate shifted-inverse gamma priors are proposed for the covariance parameters, implying inverse gamma priors for the eigenvalues of the covariance matrix, which ensures a positive definite covariance matrix under posterior analysis. A numerically efficient Gibbs sampling procedure is defined for balanced nested designs. This requires sampling latent variables from their marginal full conditional distributions, which are derived through a recursive formula. This makes the estimation procedure suitable for interval-censored data with large cluster sizes. For unbalanced nested designs, a novel (balancing) data augmentation procedure is introduced to improve the efficiency of the Gibbs sampler. The Gibbs sampling procedure is validated in two simulation studies. The linear transformation BCSM (LT-BCSM) was applied to two-way nested interval-censored event times to analyze differences in adverse events between three groups of patients who were randomly allocated to treatment with different stents (BIO-RESORT). The parameters of the structured covariance matrix represented unobserved heterogeneity in treatment effects and were examined to detect differential treatment effects. A comparison was made with inference results under a random effects linear transformation model. The LT-BCSM was found to yield inferences with higher posterior credibility, to provide a more thorough way of quantifying evidence for risk equivalence of the three treatments, and to be more robust to prior specification.
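The eigenvalue argument can be sketched in the simplest case. This assumes a single level of nesting with an m x m compound-symmetry covariance σ²I + τJ, where J is the all-ones matrix; the multi-way BCSM is more general. It also assumes one reading of "shifted-inverse gamma": τ + σ²/m ~ Inverse-Gamma(a, b). Under that reading, the eigenvalue σ² + mτ = m(τ + σ²/m) is itself inverse gamma with scale mb, so τ may be negative (a negative association) while the matrix stays positive definite. The parametrization and the function names are hypothetical, not the paper's.

```python
import random

# Compound-symmetry covariance sigma2*I + tau*J and an assumed
# shifted-inverse-gamma prior on tau (illustrative, not the paper's).


def cs_eigenvalues(sigma2, tau, m):
    """Eigenvalues of the m x m matrix sigma2*I + tau*J.

    The ones vector has eigenvalue sigma2 + m*tau; every contrast
    (entries summing to zero) has eigenvalue sigma2."""
    return sigma2 + m * tau, sigma2


def sample_shifted_inverse_gamma(shape, scale, sigma2, m, rng):
    """Draw tau = G - sigma2/m with G ~ Inverse-Gamma(shape, scale).

    1/G ~ Gamma(shape, rate=scale); random.gammavariate takes a scale
    argument, hence 1/scale. Then sigma2 + m*tau = m*G > 0 always."""
    g = 1.0 / rng.gammavariate(shape, 1.0 / scale)
    return g - sigma2 / m
```

The shift is what lets the prior put mass on negative τ without ever producing a covariance matrix that fails to be positive definite.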
The family of multivariate unified skew-normal (SUN) distributions has recently been shown to possess fundamental conjugacy properties. When used as priors for the vector of coefficients in probit, tobit, and multinomial probit models, these distributions yield posteriors that still belong to the SUN family. Although this result has led to important advances in Bayesian inference and computation, its applicability beyond likelihoods associated with fully observed, discretized, or censored realizations from multivariate Gaussian models remains unexplored. This article fills this gap by proving that the wider family of multivariate unified skew-elliptical (SUE) distributions, which extends SUNs to more general perturbations of elliptical densities, guarantees conjugacy for broader classes of models beyond those relying on fully observed, discretized, or censored Gaussians. The result leverages the closure of the SUE family under linear combinations, conditioning, and marginalization to prove that this family is conjugate to the likelihood induced by regression models for fully observed, censored, or dichotomized realizations from skew-elliptical distributions. This advance enlarges the set of models that admit conjugate Bayesian inference to general formulations arising from elliptical and skew-elliptical families, including the multivariate Student's t and skew-t, among others.
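"Perturbation of a symmetric density" can be made concrete with the simplest member of the SUN family, the univariate skew-normal density 2φ(x)Φ(αx). The sketch below illustrates that construction only, not the SUE conjugacy result. It checks numerically that the perturbed density still integrates to one and has the known mean δ√(2/π), where δ = α/√(1 + α²). The function names are hypothetical.

```python
import math

# Univariate skew-normal as a perturbed symmetric density (illustration
# only; not the SUE conjugacy result). Names are hypothetical.


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def skew_normal_pdf(x, alpha):
    """2 * phi(x) * Phi(alpha * x): a symmetric density perturbed by a
    symmetric CDF evaluated at alpha * x."""
    return 2.0 * normal_pdf(x) * normal_cdf(alpha * x)


def midpoint_integral(g, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))
```

Skew-elliptical constructions follow the same pattern, with other symmetric densities and symmetric CDFs in place of φ and Φ.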
Estimation of the mean and covariance parameters of functional data is a critical task, with local linear smoothing being a popular choice. In recent years, many scientific domains have produced multivariate functional data in which the number of curves per subject is often much larger than the sample size. In this setting of high-dimensional functional data, much of the existing methodology relies on preliminary estimates of the unknown mean functions and the auto- and cross-covariance functions. This paper investigates the convergence rates of local linear estimators in terms of the maximal error across components (for mean functions) and across pairs of components (for covariance functions), under both an integrated metric and the uniform metric. The local linear estimators use a generic weighting scheme that can adjust for differing numbers of discrete observations across curves and subjects, where these numbers are allowed to vary. Particular attention is given to the equal weight per observation (OBS) and equal weight per subject (SUBJ) weighting schemes. The theoretical results rely on novel applications of concentration inequalities for functional data and demonstrate that, as with univariate functional data, the order of the numbers of observations per curve relative to the other problem dimensions divides high-dimensional functional data into three regimes (sparse, dense, and ultra-dense), with the high-dimensional parametric convergence rate attainable in the latter two.
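The two weighting schemes can be sketched for a single component; the abstract's setting has many components and pairs of components. This assumes an Epanechnikov kernel and the usual descriptions of the schemes. Under OBS, every observation gets weight 1/(total observations). Under SUBJ, each subject gets total weight 1/n, spread over its N_i points. The paper's generic weighting scheme may differ, and all names are hypothetical.

```python
# Weighted local linear mean estimator with OBS / SUBJ weights for one
# component (sketch; kernel and names are assumptions).


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def subject_weights(subjects, scheme):
    """Per-observation weight for each subject under OBS or SUBJ."""
    n = len(subjects)
    total = sum(len(s) for s in subjects)
    if scheme == "OBS":
        return [1.0 / total for _ in subjects]
    if scheme == "SUBJ":
        return [1.0 / (n * len(s)) for s in subjects]
    raise ValueError("scheme must be 'OBS' or 'SUBJ'")


def local_linear_mean(subjects, t0, h, scheme="OBS"):
    """Weighted local linear estimate of the mean function at t0.

    subjects is a list of curves, each a list of (t, y) observations."""
    s0 = s1 = s2 = r0 = r1 = 0.0
    for w, obs in zip(subject_weights(subjects, scheme), subjects):
        for t, y in obs:
            d = t - t0
            k = w * epanechnikov(d / h) / h
            s0 += k
            s1 += k * d
            s2 += k * d * d
            r0 += k * y
            r1 += k * d * y
    # Intercept of the weighted least-squares line fitted around t0.
    return (s2 * r0 - s1 * r1) / (s0 * s2 - s1 * s1)
```

Local linear fits reproduce linear mean functions exactly under any weights. For noiseless data y = 2 + 3t, both schemes therefore return 3.5 at t0 = 0.5, even when subjects contribute different numbers of points. The schemes differ only in how bias and variance are traded off across unevenly sampled subjects.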