Mahalanobis distance of covariate means between treatment and control groups is often adopted as a balance criterion when implementing a rerandomization strategy. However, this criterion may not work well in high-dimensional cases because it balances all orthogonalized covariates equally. We propose using principal component analysis (PCA) to identify proper subspaces in which the Mahalanobis distance should be calculated. Not only can PCA effectively reduce the dimensionality of high-dimensional covariates, but it also provides computational simplicity by focusing on the top orthogonal components. The PCA rerandomization scheme has desirable theoretical properties for balancing covariates and thereby improving the estimation of average treatment effects. This conclusion is supported by numerical studies using both simulated and real examples.
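The rerandomization idea above can be sketched in a few lines: project the covariates onto the top principal components, then redraw the treatment assignment until the Mahalanobis distance of the component means falls below a threshold. This is a minimal illustrative sketch, not the paper's implementation; the function name, the threshold, and the choice of k components are all assumptions for illustration.

```python
import numpy as np

def pca_rerandomize(X, n_treat, k=2, threshold=2.0, max_iter=10000, seed=0):
    """Redraw a complete randomization of n_treat treated units until the
    Mahalanobis distance of the top-k principal-component means between
    groups is below `threshold`. Illustrative sketch only; `threshold`
    and `k` are placeholder choices, not values from the paper."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Standardize covariates and extract top-k principal component scores.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:k].T                # n x k PC scores
    var = scores.var(axis=0, ddof=1)     # PC scores are uncorrelated
    n_ctrl = n - n_treat
    for _ in range(max_iter):
        idx = rng.permutation(n)
        treat = np.zeros(n, dtype=bool)
        treat[idx[:n_treat]] = True
        diff = scores[treat].mean(axis=0) - scores[~treat].mean(axis=0)
        # Mahalanobis distance of mean differences; diagonal covariance
        # because the component scores are mutually orthogonal.
        M = (n_treat * n_ctrl / n) * np.sum(diff ** 2 / var)
        if M < threshold:
            return treat, M
    raise RuntimeError("no acceptable assignment found within max_iter draws")
```

Working in the k-dimensional component space keeps each acceptance check to a diagonal quadratic form, which is the computational simplification the abstract refers to.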
General or case II interval-censored data are commonly encountered in practice. We develop methods for model-checking and goodness-of-fit testing for the additive hazards model with case II interval-censored data. We propose test statistics based on the supremum of the stochastic processes derived from the cumulative sum of martingale-based residuals over time and covariates. We approximate the distribution of the stochastic process via a simulation technique, which yields a class of graphical and numerical procedures for evaluating various aspects of model fit. Simulation studies are conducted to assess the finite-sample performance of the proposed method. A real dataset from an AIDS observational study is analyzed for illustration.
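The simulation technique for a supremum-of-cumulative-sum statistic can be illustrated generically: compare the observed supremum of cumulated residuals against suprema of the same process perturbed by Gaussian multipliers. The sketch below shows only this resampling idea for generic residuals; it is not the paper's martingale-residual process for interval-censored data, and the function name is hypothetical.

```python
import numpy as np

def sup_cusum_pvalue(resid, x, n_sim=1000, seed=0):
    """Multiplier-resampling p-value for a supremum-of-cumulative-sum
    statistic over a covariate. Generic sketch of the simulation idea
    only, not the additive-hazards martingale residual test."""
    rng = np.random.default_rng(seed)
    r = resid[np.argsort(x)]             # cumulate residuals along x
    n = len(r)
    obs = np.max(np.abs(np.cumsum(r))) / np.sqrt(n)
    # Perturb residuals with standard normal multipliers and recompute
    # the supremum to approximate the null distribution.
    g = rng.standard_normal((n_sim, n))
    sims = np.max(np.abs(np.cumsum(g * r, axis=1)), axis=1) / np.sqrt(n)
    return float(np.mean(sims >= obs))
```

Plotting a handful of the simulated paths against the observed one gives the graphical check; the p-value above gives the numerical one.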
Heterogeneity exists in populations, and people may benefit differently from the same treatments or services. Correctly identifying subgroups corresponding to outcomes such as treatment response plays an important role in data-based decision making. Because subgroup analysis in the presence of measurement error has received little attention, we propose a new estimation method that handles both components simultaneously under the linear regression model. First, we develop an objective function based on unbiased estimating equations with two repeated measurements and a concave penalty on pairwise differences between coefficients. The proposed method can identify subgroups and estimate coefficients simultaneously while accounting for measurement error. Second, we derive an algorithm based on the alternating direction method of multipliers (ADMM) and demonstrate its convergence. Third, we prove that the proposed estimators are consistent and asymptotically normal. The performance and asymptotic properties of the proposed method are evaluated through simulation studies. Finally, we apply our method to data from the Lifestyle Education for Activity and Nutrition study and identify two subgroups, of which one has a significant treatment effect.
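A concave penalty on pairwise coefficient differences shrinks similar coefficients to common values, so subgroups can be read off the fitted coefficients by linking units whose estimates (nearly) coincide. The following sketch shows only that post-processing step under the assumption that `mu` holds coefficients from such a penalized fit; the function and tolerance are illustrative, not from the paper.

```python
import numpy as np

def group_by_fusion(mu, tol=1e-3):
    """Assign subgroup labels by chaining together units whose fused
    coefficient estimates differ by at most `tol` along the sorted
    order. Illustrative post-processing sketch; `tol` is a placeholder."""
    order = np.argsort(mu)
    labels = np.empty(len(mu), dtype=int)
    g = 0
    labels[order[0]] = 0
    for prev, cur in zip(order[:-1], order[1:]):
        if mu[cur] - mu[prev] > tol:   # gap in sorted coefficients: new group
            g += 1
        labels[cur] = g
    return labels
```

In the penalized fit itself the grouping happens automatically, because the concave penalty sets pairwise differences exactly to zero rather than merely shrinking them.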
Data collected from distributed sources or sites commonly have different distributions or contaminated observations. Active learning procedures allow us to assess observations as they are recruited into model building. Combining several active learning procedures is therefore promising, even when the collected data set is contaminated. Here, we study how to run several adaptive sequential procedures simultaneously, on multiple machines or within a parallel-computing framework, and integrate them to produce a valid result. To avoid distraction by complicated modelling processes, we use confidence set estimation for linear models to illustrate the proposed method and discuss the approach's statistical properties. We then evaluate its performance using both synthetic and real data. We have implemented our method in Python and made it available on GitHub at https://github.com/zhuojianc/dsep.
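One common way to integrate linear-model estimates from several sites is inverse-variance weighting of the per-site estimators, from which a combined confidence set follows. The sketch below shows that generic combination step as a stand-in for the integration idea; it is not the paper's sequential procedure or the dsep implementation, and both function names are hypothetical.

```python
import numpy as np

def site_estimate(X, y):
    """OLS coefficients and their estimated covariance for one site."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n, p = X.shape
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, cov

def combine(estimates):
    """Inverse-variance-weighted combination of per-site (beta, cov)
    pairs. Generic meta-analytic stand-in for the integration step."""
    precisions = [np.linalg.inv(c) for _, c in estimates]
    P = sum(precisions)
    b = sum(Pk @ bk for (bk, _), Pk in zip(estimates, precisions))
    return np.linalg.solve(P, b), np.linalg.inv(P)
```

The combined covariance `inv(P)` defines an ellipsoidal confidence set for the coefficients, which is the kind of object the abstract's confidence set estimation targets.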
In multigroup data settings with small within-group sample sizes, standard