The variable sampling interval (VSI) exponentially weighted moving average (EWMA) chart, which varies the chart's sampling interval according to the value of the current plotting statistic, detects shifts faster than the standard EWMA chart. Joint monitoring schemes use a single combined statistic for the mean and variance in process monitoring. To simultaneously monitor the mean and variance of a normally distributed process, two VSI EWMA schemes with unknown process parameters, based on (i) Maximum (Max) and (ii) Distance (Dis) type combining functions, are proposed in this paper. Each of these schemes uses a single plotting statistic. The effects of parameter estimation on the performance of the proposed VSI Max EWMA and VSI Dis EWMA schemes are studied using Monte Carlo simulation, in terms of the average time to signal, the standard deviation of the time to signal, the expected average time to signal and the median time to signal. The results show that the proposed schemes identify process shifts more quickly than the existing Max/Dis Shewhart (SH), Max/Dis cumulative sum (CUSUM) and Max/Dis EWMA schemes. The implementation of the proposed schemes is demonstrated using a commercial dataset.
"Proposed variable sampling interval maximum EWMA and distance EWMA charts with unknown process parameters" by R. Parvin, M. Khoo, S. Saha and W. L. Teoh. Stat, 2023-08-16. https://doi.org/10.1002/sta4.605
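The mechanics of a VSI EWMA chart can be sketched as follows. This is a minimal single-chart sketch for monitoring the mean only, not the authors' Max/Dis combined statistic; the function name, parameter values and warning-region rule are illustrative assumptions.

```python
import math

def vsi_ewma(data, mu0=0.0, sigma0=1.0, lam=0.2, L=3.0,
             d_short=0.1, d_long=1.9, warn=0.5):
    """Run a VSI EWMA chart on a stream of observations.

    Returns (signal_index, total_time): the index of the first sample
    whose EWMA statistic exceeds the +/- L control limits (None if no
    signal), and the accumulated sampling time under the VSI rule.
    """
    z = mu0
    t = 0.0
    # steady-state control limit of the EWMA statistic
    limit = L * sigma0 * math.sqrt(lam / (2 - lam))
    for i, x in enumerate(data):
        z = lam * x + (1 - lam) * z
        if abs(z - mu0) > limit:
            return i, t                      # out-of-control signal
        # VSI rule: sample sooner when z falls in the warning region
        t += d_short if abs(z - mu0) > warn * limit else d_long
    return None, t
```

The variable interval is what speeds up detection: after a suspicious point the next sample is taken after `d_short` time units instead of `d_long`.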
Support vector machine (SVM) is one of the most prevalent classification techniques due to its excellent performance. The standard binary SVM has been well studied; however, many real-world multicategory classification problems equally merit attention. In this paper, focusing on the computationally efficient multicategory angle-based SVM model, we first study the statistical properties of model coefficient estimation. Noting the new challenges posed by the widespread presence of distributed data, this paper further develops a distributed smoothed estimation for the multicategory SVM and establishes its theoretical guarantees. The derived asymptotic properties show that our distributed smoothed estimation achieves the same statistical efficiency as the global estimation. Numerical studies demonstrate the highly competitive performance of the proposed distributed smoothed method.
"Statistical inference and distributed implementation for linear multicategory SVM" by Gaoming Sun, Xiaozhou Wang, Yibo Yan and Riquan Zhang. Stat, 2023-08-14. https://doi.org/10.1002/sta4.611
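The two ingredients of such an estimator, a smoothed (differentiable) SVM loss and a distributed combination of local fits, can be sketched for the binary linear case. This is a simplification assuming one-shot averaging and a quadratically smoothed hinge loss, not the authors' angle-based multicategory formulation or their exact estimator; all function names are hypothetical.

```python
import numpy as np

def local_fit(X, y, lam=0.1, h=0.5, lr=0.1, steps=500):
    """Fit a linear SVM on one machine's data by gradient descent on a
    quadratically smoothed hinge loss (differentiable everywhere)."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        u = y * (X @ w)
        # derivative of the smoothed hinge surrogate of max(0, 1-u):
        # 0 for u >= 1, -1 for u <= 1-h, linear interpolation between
        g = np.where(u >= 1, 0.0,
                     np.where(u <= 1 - h, -1.0, -(1 - u) / h))
        w -= lr * ((X.T @ (g * y)) / n + lam * w)
    return w

def distributed_fit(splits, **kw):
    """One-shot averaging: each machine fits locally, estimates are averaged."""
    return np.mean([local_fit(X, y, **kw) for X, y in splits], axis=0)
```

Smoothing is what makes the distributed step well behaved: the non-differentiable hinge would make local gradients, and hence the averaged estimate, unstable near the margin.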
Motivated by a genome‐wide association study on the glomerular filtration rate, we develop a new robust test for longitudinal data to detect the effects of biomarkers in high‐dimensional quantile regression, in the presence of prespecified control variables. The test is based on the sum of score‐type statistics deduced from conditional quantile regression. The test statistic is constructed in a working‐independent manner, but the calibration reflects the intrinsic within‐subject correlation. Therefore, the test takes advantage of the feature of longitudinal data and provides more information than tests based on only one measurement per subject. Asymptotic properties of the proposed test statistic are established under both the null and local alternative hypotheses. Simulation studies show that the proposed test controls the family‐wise error rate well while providing competitive power. The proposed method is applied to the motivating glomerular filtration rate data to test the overall significance of a large number of candidate single‐nucleotide polymorphisms that are possibly associated with Type 1 diabetes, conditioning on the patients' demographics.
"Score‐based test in high‐dimensional quantile regression for longitudinal data with application to a glomerular filtration rate data" by Yinfeng Wang, H. Wang and Yanlin Tang. Stat, 2023-08-14. https://doi.org/10.1002/sta4.610
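The core of a sum-of-scores quantile test can be sketched in a few lines. This is a cross-sectional simplification with an intercept-only null fit; the paper's longitudinal calibration, within-subject correlation adjustment and high-dimensional control-variable fit are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def quantile_score_stat(y, Z, tau=0.5):
    """Sum-of-squares score-type statistic for testing that the candidate
    biomarkers Z have no effect at quantile tau. The null model here is
    intercept-only (the sample quantile); a real analysis would first fit
    the prespecified control variables."""
    q = np.quantile(y, tau)
    psi = tau - (y < q)                    # quantile score of each residual
    scores = Z.T @ psi / np.sqrt(len(y))   # one score per biomarker
    return float(np.sum(scores ** 2))
```

Under the null each standardized score behaves like a mean-zero variable with variance near tau(1 - tau), so the statistic stays small; a biomarker that shifts the conditional quantile inflates its score.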
For sensitivity analysis with stochastic counterfactuals, we introduce a methodology to characterize uncertainty in causal inference from natural experiments. Our sensitivity parameters are standardized measures of variation in propensity and prognosis probabilities, and one minus their geometric mean is an intuitive measure of randomness in the data generating process. Within our latent propensity‐prognosis model, we show how to compute, from contingency table data, a threshold of sufficient randomness for causal inference. If the actual randomness of the data generating process is greater than this threshold, then causal inference is warranted. We demonstrate our methodology with two example applications.
"An asymptotic threshold of sufficient randomness for causal inference" by B. Knaeble, B. Osting and P. Tshiaba. Stat, 2023-08-01. https://doi.org/10.1002/sta4.609
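The "one minus the geometric mean" randomness measure and the threshold comparison can be expressed directly. The paper's computation of the threshold itself from contingency table data is not reproduced here; the function names and the assumption that both variation measures lie in [0, 1] are illustrative.

```python
import math

def randomness_measure(v_propensity, v_prognosis):
    """One minus the geometric mean of the two standardized variation
    measures (each assumed to lie in [0, 1]); values near 1 indicate a
    highly random data-generating process, values near 0 a nearly
    deterministic one."""
    return 1.0 - math.sqrt(v_propensity * v_prognosis)

def inference_warranted(v_propensity, v_prognosis, threshold):
    # causal inference is warranted when the actual randomness exceeds
    # the sufficiency threshold computed from the contingency table
    return randomness_measure(v_propensity, v_prognosis) > threshold
```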
Deep neural network (DNN) models have achieved state‐of‐the‐art predictive accuracy in a wide range of applications. However, it remains a challenging task to accurately quantify the uncertainty in DNN predictions, especially those of continuous outcomes. To this end, we propose the Bayesian deep noise neural network (B‐DeepNoise), which generalizes standard Bayesian DNNs by extending the random noise variable from the output layer to all hidden layers. Our model is capable of approximating highly complex predictive density functions and fully learning the possible random variation in the outcome variables. For posterior computation, we provide a closed‐form Gibbs sampling algorithm that circumvents tuning‐intensive Metropolis–Hastings methods. We establish a recursive representation of the predictive density and perform theoretical analysis on the predictive variance. Through extensive experiments, we demonstrate the superiority of B‐DeepNoise over existing methods in terms of density estimation and uncertainty quantification accuracy. A neuroimaging application is included to show our model's usefulness in scientific studies.
"Density regression and uncertainty quantification with Bayesian deep noise neural networks" by Daiwei Zhang, Tianci Liu and Jian Kang. Stat, 2023-08-01. https://doi.org/10.1002/sta4.604
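The key idea, injecting noise at every layer rather than only at the output, can be sketched with a stochastic forward pass. This is a generic illustration, not the authors' exact B-DeepNoise parameterization or their Gibbs sampler; the architecture, noise placement and scales are assumptions.

```python
import numpy as np

def deep_noise_forward(x, weights, biases, noise_scales, rng):
    """One stochastic forward pass with Gaussian noise injected after
    every layer, hidden and output alike, so that repeated passes draw
    from a flexible predictive density rather than a point prediction."""
    h = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b
        if l < len(weights) - 1:
            h = np.maximum(h, 0.0)           # ReLU on hidden layers
        h = h + rng.normal(0.0, noise_scales[l], size=h.shape)
    return h

def predictive_samples(x, weights, biases, noise_scales, n=2000, seed=0):
    """Monte Carlo draws from the implied predictive density at input x."""
    rng = np.random.default_rng(seed)
    return np.array([deep_noise_forward(x, weights, biases, noise_scales, rng)
                     for _ in range(n)])
```

Because noise passes through subsequent nonlinear layers, the resulting predictive density need not be Gaussian, which is what lets the model capture complex outcome variation.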
Although there is a huge literature on feature selection for the Cox model, none of the existing approaches can control the false discovery rate (FDR) unless the sample size tends to infinity. In addition, there is no formal power analysis of the knockoffs framework for survival data in the literature. To address these issues, in this paper, we propose a novel controlled feature selection approach using knockoffs for the Cox model. We establish that the proposed method enjoys FDR control in finite samples regardless of the number of covariates. Moreover, under mild regularity conditions, we also show that the power of our method is asymptotically one as the sample size tends to infinity. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure in the survival setting. Simulation studies confirm that our method has appealing finite-sample performance with desired FDR control and high power. We further demonstrate the performance of our method through a real data example.
"CoxKnockoff: Controlled feature selection for the Cox model using knockoffs" by Daoji Li, Jinzhao Yu and Hui Zhao. Stat, 2023-07-31. https://doi.org/10.1002/sta4.607
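The finite-sample FDR guarantee comes from the selection step of the knockoff filter, which can be sketched given feature statistics W (large positive W favours a real feature over its knockoff). This shows only the generic knockoff+ thresholding rule; the construction of knockoff variables and the Cox-specific statistics are not shown, and the function names are illustrative.

```python
import numpy as np

def knockoff_threshold(W, q=0.1, offset=1):
    """Knockoff+ data-dependent threshold: the smallest t > 0 such that
    (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf

def knockoff_select(W, q=0.1):
    """Indices of features whose statistic clears the threshold."""
    return np.where(W >= knockoff_threshold(W, q))[0]
```

The ratio inside the threshold is an estimate of the false discovery proportion, using the symmetry of null W statistics around zero; no asymptotic argument is needed, which is why the FDR control holds in finite samples.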
"A trinomial difference autoregressive model and its applications" by Huaping Chen, Jiayue Zhang and Fukang Zhu. Stat, 2023-07-31. https://doi.org/10.1002/sta4.596
"Dirichlet process mixture models using matrix‐generalized half‐t distribution" by Sanghyun Lee and C. Kim. Stat, 2023-07-18. https://doi.org/10.1002/sta4.599
In block designs, the responses of plots are potentially influenced by the treatments of neighbouring plots and by the surrounding environment. Many researchers place two guard plots next to the edge plots and apply certain treatments to them to control these environmental effects. Thus, a design is presented as a collection of treatment sequences. For the estimation of total effects, existing results consider circular designs, whose constraints are unnecessary in common applications. In this paper, we construct optimal or highly efficient non-circular designs under interference models. It is observed that the optimal non-circular designs for the total effects outperform the optimal circular designs in many instances. In fact, a design containing a circular sequence cannot be optimal for