Schematic orthogonal arrays are closely related to association schemes. Which orthogonal arrays are schematic and how to classify them is an open problem posed by Hedayat et al. (1999). Using Hamming distances, this paper presents general methods for constructing schematic symmetric and mixed orthogonal arrays of high strength. As applications of these methods, we construct association schemes and many new schematic orthogonal arrays, including several infinite classes of such arrays. Examples are provided to illustrate the construction methods. The paper thus gives a partial solution to the problem of Hedayat et al. (1999) for symmetric and mixed orthogonal arrays of high strength.
A frequent challenge in real-world applications is data with a high proportion of zeros. Focusing on ecological abundance data, much attention has been given to zero-inflated count data, whereas models for non-negative continuous abundance data with an excess of zeros are rarely discussed. The work presented here creates a point mass at zero through a left-censoring approach or through a hurdle approach, and incorporates both mechanisms to capture the analog of zero inflation for count data. Additionally, primary attention has been given to univariate zero-inflated modeling (e.g., a single species), whereas data often arise jointly (e.g., a collection of species). With multivariate abundance data, a key issue is capturing dependence among the species at a site, in terms of both positive abundance and absence. Our contribution is therefore a model for multivariate zero-inflated continuous data that are non-negative. Working in a Bayesian framework, we discuss the issue of separating the two sources of zeros and offer model comparison metrics for multivariate zero-inflated data. In an application, we model the total biomass of five tree species from plots established in the Forest Inventory Analysis database in the Northeast region of the United States.
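To make the hurdle mechanism concrete, the following is a minimal simulation sketch, not the paper's fitted model: a Bernoulli "hurdle" governs presence/absence, and conditional on presence the abundance is drawn from a log-normal, so zeros arise only from the hurdle while positive values are strictly positive and continuous. All function names and parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hurdle_abundance(n, p_present=0.6, mu=1.0, sigma=0.5):
    """Simulate non-negative continuous abundances with a point mass at zero.

    A Bernoulli hurdle decides presence; conditional on presence the
    abundance is log-normal. (Illustrative sketch, not the paper's model.)
    """
    present = rng.random(n) < p_present          # hurdle: species present?
    positive = rng.lognormal(mu, sigma, n)       # abundance given presence
    return np.where(present, positive, 0.0)      # exact zeros when absent

y = simulate_hurdle_abundance(10_000)
```

A censoring-based version would instead draw a latent continuous value for every site and set negative draws to zero; the paper's point is that the two mechanisms imply different interpretations of an observed zero.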
The Cox model with unspecified baseline hazard is often used to model survival data. In the case of correlated event times, this model can be extended by introducing random effects, also called frailty terms, leading to the frailty model. Few methods have been put forward to estimate parameters of such frailty models, and they often consider only a particular distribution for the frailty terms and specific correlation structures. In this paper, a new efficient method is introduced to perform parameter estimation by maximizing the integrated partial likelihood. The proposed stochastic estimation procedure can deal with frailty models with a broad choice of distributions for the frailty terms and with any kind of correlation structure between the frailty components, also allowing random interaction terms between the covariates and the frailty components. The almost sure convergence of the stochastic estimation algorithm towards a critical point of the integrated partial likelihood is proved. Numerical convergence properties are evaluated through simulation studies, and comparisons with existing methods are performed. In particular, the robustness of the proposed method with respect to different parametric baseline hazards and misspecified frailty distributions is demonstrated through simulation. Finally, the method is applied to a mastitis dataset and a bladder cancer dataset.
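For orientation, a standard shared-frailty extension of the Cox model has the following schematic form; the notation here is generic and not taken from the paper:

```latex
% Hazard for subject j in cluster i, with cluster-level frailty w_i:
h_{ij}(t) = h_0(t)\, \exp\!\left( x_{ij}^{\top}\beta + w_i \right),
\qquad w = (w_1,\dots,w_q) \sim F_w ,
```

where \(h_0\) is the unspecified baseline hazard and \(F_w\) is the joint frailty distribution. The abstract's point is that the proposed procedure does not restrict \(F_w\) to a particular family or to independent components, and also accommodates random interactions \(x_{ij}^{\top} w_i\)-type terms between covariates and frailties.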
The paper considers a multiple testing problem for multivariate normal means under sparsity. First, the Bayes risk of the multivariate Bayes oracle is derived. Then, a hierarchical Bayesian approach is taken with global–local shrinkage priors, where the global parameter is either treated as a tuning parameter or given a specific prior. The method is shown to attain the asymptotically Bayes optimal under sparsity (ABOS) property. Finally, an empirical Bayes procedure is proposed that involves estimating the global shrinkage parameter; this approach is also shown to attain the ABOS property.
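As a reminder of the general template, a global–local shrinkage prior in the univariate case can be written schematically as follows; this is the standard generic form, not the paper's specific prior:

```latex
\beta_i \mid \lambda_i, \tau \;\sim\; N\!\left(0,\; \lambda_i^2 \tau^2\right),
\qquad \lambda_i \;\overset{iid}{\sim}\; \pi(\lambda_i),
\qquad i = 1, \dots, n,
```

where the local scales \(\lambda_i\) allow individual signals to escape shrinkage while the global scale \(\tau\) adapts to the overall sparsity level. In the abstract's setting \(\tau\) is either tuned directly, given its own prior, or estimated empirically.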
Treatment initiation guidelines are essential in healthcare, dictating when patients begin therapy. These guidelines are typically assessed through randomized controlled trials (RCTs) to measure their average effect on a population. However, this method may not fully account for patient heterogeneity. We introduce a refined analysis methodology that accounts for diverse times to treatment initiation (TTI) arising from these guidelines. We offer a more detailed perspective on the guidelines’ impact by analyzing homogeneous subpopulations based on their TTI. We develop a longitudinal regression model with smooth time functions to capture dynamic changes in average guideline effects on subpopulations (AGES). A unique weighting mechanism creates pseudo-subpopulations from RCT data, enabling consistent and precise estimation of smooth functions. The efficacy of our approach is validated through theoretical and numerical studies, underscoring its capacity to provide insightful statistical inferences. We exemplify the utility of our methodology by applying it to an RCT of the World Health Organization (WHO) guideline for adults with HIV. This analysis promises to enhance the evaluation of treatment initiation guidelines, leading to more personalized and efficient patient care.
We propose a new algorithm for solving the graph-fused lasso (GFL), a regularized model that operates under the assumption that the signal tends to be locally constant over a predefined graph structure. The proposed method applies a novel decomposition of the objective function for the alternating direction method of multipliers (ADMM) algorithm. While ADMM has been widely used in fused lasso problems, existing works such as the network lasso decompose the objective function into the loss function component and the total variation penalty component. In contrast, based on the graph matching technique in graph theory, we propose a new decomposition that separates the objective function into two components, where one component is the loss function plus part of the total variation penalty, and the other component is the remaining total variation penalty. We derive an exact convergence rate of the proposed algorithm by developing a general theory on the local convergence of ADMM. Compared with the network lasso algorithm, ours has a faster exact linear convergence rate (though of the same order as the network lasso). It also has a smaller computational cost per iteration, and thus converges faster overall in most numerical examples.
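To fix notation, the GFL objective and the two decomposition strategies contrasted in the abstract can be sketched as follows; the squared-error loss and edge partition shown here are illustrative:

```latex
% Graph-fused lasso over a graph G = (V, E):
\min_{\beta \in \mathbb{R}^{|V|}} \;
  \ell(\beta) \;+\; \lambda \sum_{(i,j) \in E} \left| \beta_i - \beta_j \right| .

% Network-lasso-style ADMM splitting (loss vs. full penalty):
f_1(\beta) = \ell(\beta), \qquad
f_2(\beta) = \lambda \sum_{(i,j) \in E} |\beta_i - \beta_j| .

% Matching-based splitting sketched in the abstract: partition E = E_1 \cup E_2,
g_1(\beta) = \ell(\beta) + \lambda \sum_{(i,j) \in E_1} |\beta_i - \beta_j|, \qquad
g_2(\beta) = \lambda \sum_{(i,j) \in E_2} |\beta_i - \beta_j| .
```

Keeping part of the total variation penalty inside the same component as the loss is what distinguishes the proposed splitting from the loss-versus-penalty splitting used by the network lasso.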
Generalized linear mixed models are powerful tools for analyzing clustered data, where the unknown parameters are classically (and most commonly) estimated by the maximum likelihood and restricted maximum likelihood procedures. However, since the likelihood-based procedures are known to be highly sensitive to outliers, M-estimators have become popular as a means to obtain robust estimates under possible data contamination. In this paper, we prove that for sufficiently smooth general loss functions defining the M-estimators in generalized linear mixed models, the tail probability of the deviation between the estimated and the true regression coefficients has an exponential bound. This implies an exponential rate of consistency of these M-estimators under appropriate assumptions, generalizing the existing exponential consistency results from univariate to multivariate responses. We further illustrate this theoretical result for the special examples of the maximum likelihood estimator and the robust minimum density power divergence estimator, a popular model-based M-estimator, in the settings of linear and logistic mixed models, comparing the theoretical rate with the empirical rate of convergence through simulation studies.
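A tail bound of the kind described in the abstract typically takes the following schematic shape; the constants, norm, and conditions below are placeholders rather than the paper's precise statement:

```latex
\Pr\!\left( \left\| \hat{\beta}_n - \beta_0 \right\| > \varepsilon \right)
\;\le\; c_1 \exp\!\left( - c_2\, n\, \varepsilon^2 \right),
\qquad \varepsilon > 0,
```

for positive constants \(c_1, c_2\) depending on the loss function and model. Such an exponential bound immediately yields an exponential rate of consistency for the M-estimator \(\hat{\beta}_n\).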
A criterion is constructed to identify the largest homoscedastic region in a Gaussian dataset. The problem reduces to one-sided non-parametric break detection: up to a certain index the output is governed by a linear homoscedastic model, while after this index it differs (e.g., a different model, different variables, or different volatility). We show the convergence of the estimator of this index, with asymptotic concentration inequalities that can be exponential. A criterion and convergence results are also derived when the linear homoscedastic zone is bounded by breaks on both sides, and a criterion for choosing between zero, one, or two breaks is proposed. Monte Carlo experiments confirm the very good numerical performance of the method.