Pub Date: 2024-10-23 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2418476
Wisdom Aselisewine, Suvra Pal, Helton Saulo
The mixture cure rate model (MCM) is the most widely used model for the analysis of survival data with a cured subgroup. In this context, the most common strategy to model the cure probability is to assume a generalized linear model with a known link function, such as the logit link function. However, the logit model can only capture simple effects of covariates on the cure probability. In this article, we propose a new MCM in which the cure probability is modeled using a decision tree-based classifier and the survival distribution of the uncured is modeled using an accelerated failure time structure. To estimate the model parameters, we develop an expectation maximization (EM) algorithm. Our simulation study shows that the proposed model captures nonlinear classification boundaries better than the logit-based MCM and the spline-based MCM. This results in more accurate and precise estimates of the cure probabilities, which in turn improves the predictive accuracy of cure. We further show that capturing the nonlinear classification boundary also improves the estimation results for the survival distribution of the uncured subjects. Finally, we apply our proposed model and the EM algorithm to analyze an existing bone marrow transplant dataset.
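As a rough illustration (not the authors' tree-based implementation), the E-step of an EM algorithm for a mixture cure model computes, for each censored subject, the posterior probability of belonging to the uncured group; the exponential latency below is a hypothetical stand-in for the paper's AFT specification:

```python
import math

def estep_uncured_prob(pi_cure, t, event, rate=1.0):
    """E-step weight: posterior probability that a subject is uncured.

    pi_cure : probability of being cured
    t       : observed (possibly censored) time
    event   : 1 if the event was observed, 0 if censored
    rate    : rate of an assumed exponential latency distribution
              (a stand-in for the paper's AFT structure)
    """
    if event == 1:
        return 1.0                 # observed events can only come from the uncured
    s_u = math.exp(-rate * t)      # survival of the uncured at time t
    return (1 - pi_cure) * s_u / (pi_cure + (1 - pi_cure) * s_u)
```

The weight shrinks toward zero as a censored subject's follow-up time grows, which is exactly how the EM algorithm lets long-term censored survivors count as likely cured.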
A semiparametric accelerated failure time-based mixture cure tree. Journal of Applied Statistics, 52(6), 1177-1194.
Pub Date: 2024-10-18 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2417978
Elisa Frutos-Bernal, José Luis Vicente-Villardón
Biplots are useful tools because they provide a visual representation of both individuals and variables simultaneously, making it easier to explore relationships and patterns within multidimensional datasets. This paper extends their use to examine the relationship between a set of predictors and a set of response variables using Principal Covariates Regression analysis (PCovR). The PCovR biplot provides a simultaneous graphical representation of individuals, predictor variables and response variables. It also provides the ability to examine the relationship between both types of variables in the form of the regression coefficient matrix.
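A minimal sketch of the PCovR component scores, assuming the standard closed-form eigen-solution (the weighting parameter `alpha` trades off reconstructing the predictors against predicting the responses; `alpha` and the helper name are illustrative, not the paper's notation):

```python
import numpy as np

def pcovr_scores(X, Y, n_comp, alpha=0.5):
    """Orthonormal component scores T for Principal Covariates Regression.

    Uses the closed-form eigen-solution: T holds the leading eigenvectors of
    G = alpha * X X^T / ||X||^2 + (1 - alpha) * H Y Y^T H / ||Y||^2,
    where H is the projection onto the column space of X.
    """
    H = X @ np.linalg.pinv(X.T @ X) @ X.T           # projection onto col(X)
    G = (alpha * X @ X.T / np.sum(X**2)
         + (1 - alpha) * H @ Y @ Y.T @ H / np.sum(Y**2))
    vals, vecs = np.linalg.eigh(G)                  # ascending eigenvalues
    return vecs[:, ::-1][:, :n_comp]                # leading eigenvectors
```

The biplot then plots individuals at the rows of T and overlays predictor loadings (T'X) and response loadings (T'Y) as arrows, so the regression structure is read off from projections.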
The PCovR biplot: a graphical tool for principal covariates regression. Journal of Applied Statistics, 52(5), 1144-1159.
Pub Date: 2024-10-14 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2415412
Pao-Sheng Shen, Huai-Man Li
Field data provide important information on product reliability. Interval sampling is widely used for the collection of field data, which typically report incident cases during a certain time period. Such a sampling scheme induces doubly truncated (DT) data if the exact failure time is known. In many situations, the exact failure date is known only to fall within an interval, leading to doubly truncated and interval-censored (DTIC) data. This article considers the analysis of DTIC data under parametric failure time models. We consider a conditional likelihood approach and propose interval estimation for the parameters and the cumulative distribution function. Simulation studies show that the proposed method performs well for finite sample sizes.
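The conditional likelihood idea can be sketched as follows: each DTIC record contributes the probability of the censoring interval divided by the probability of the truncation window. This toy version assumes an exponential lifetime (a stand-in parametric choice) and fits the rate by a crude grid search:

```python
import math

def cond_loglik(rate, data):
    """Conditional log-likelihood for doubly truncated, interval-censored data
    under an exponential lifetime model. Each record is (l, r, u, v): the
    failure is only known to lie in [l, r], and the unit enters the sample
    only if the failure falls inside the truncation window [u, v]."""
    F = lambda t: 1.0 - math.exp(-rate * t)
    return sum(math.log(F(r) - F(l)) - math.log(F(v) - F(u))
               for l, r, u, v in data)

def fit_rate(data, grid=None):
    """Maximize the conditional likelihood over a coarse grid of rates."""
    grid = grid or [0.05 * i for i in range(1, 101)]
    return max(grid, key=lambda lam: cond_loglik(lam, data))
```

A real analysis would replace the grid search with a proper optimizer and add interval estimation, but the likelihood contributions are the essential ingredient.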
Reliability analysis based on doubly-truncated and interval-censored data. Journal of Applied Statistics, 52(5), 1128-1143.
Pub Date: 2024-10-11 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2414357
Yasir Khan, Said Farooq Shah, Syed Muhammad Asim
Missing data is a common problem in many domains that rely on data analysis. The k Nearest Neighbors imputation method has been widely used to address this issue, but it has limitations in accurately imputing missing values, especially for datasets with small pairwise correlations and small values of k. In this study, we propose Ranked k Nearest Neighbors imputation, which follows an approach similar to k Nearest Neighbors but uses the concept of ranked set sampling to select the most relevant neighbors for imputation. Our results show that the proposed method outperforms the standard k nearest neighbors method in imputation accuracy under both the Missing Completely at Random and Missing at Random mechanisms, as demonstrated by consistently lower MSIE and MAIE values across all datasets. This suggests that the proposed method is a promising alternative for imputing missing values in datasets with small pairwise correlations and small values of k. Thus, the proposed Ranked k Nearest Neighbors method has important implications for data imputation in various domains and can contribute to the development of more efficient and accurate imputation methods without adding any computational complexity to the algorithm.
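For context, the standard kNN imputation baseline that the ranked variant refines looks like this (a simplified sketch; the paper's contribution is in how the donor neighbours are chosen via ranked set sampling, which is not reproduced here):

```python
import math

def knn_impute(rows, target_row, col, k=3):
    """Impute rows[target_row][col] as the mean of that column over the k
    nearest rows with the value present (Euclidean distance on the
    remaining observed columns). Missing cells are encoded as None."""
    missing = rows[target_row]
    feats = [j for j in range(len(missing)) if j != col and missing[j] is not None]
    def dist(row):
        return math.sqrt(sum((row[j] - missing[j]) ** 2 for j in feats))
    donors = [r for i, r in enumerate(rows)
              if i != target_row and r[col] is not None]
    donors.sort(key=dist)                 # closest complete rows first
    nearest = donors[:k]
    return sum(r[col] for r in nearest) / len(nearest)
```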
A novel ranked k-nearest neighbors algorithm for missing data imputation. Journal of Applied Statistics, 52(5), 1103-1127.
Pub Date: 2024-10-11 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2414346
Jyrki Möttönen, Tero Lähderanta, Janne Salonen, Mikko J Sillanpää
Lasso is a popular and efficient approach to simultaneous estimation and variable selection in high-dimensional regression models. In this paper, a robust fused LAD-lasso method for multiple outcomes is presented that addresses the challenges of non-normal outcome distributions and outlying observations. Covariates measured over space or time, or across spectral bands or genomic positions, often have a natural correlation structure arising from the distances between them. The proposed multi-outcome approach handles such covariate blocks with a group fusion penalty, which encourages similarity between neighboring regression coefficient vectors by penalizing their differences, for example, in sequential-data settings. Properties of the proposed approach are illustrated by extensive simulations using BIC-type criteria for model selection. The method is also applied to real-life skewed data on retirement behavior with longitudinal heteroscedastic explanatory variables.
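The pieces of the criterion can be made concrete with a small sketch of the objective being minimized (an illustration with plain lists; the paper's exact scaling of the penalties may differ):

```python
def fused_lad_lasso_obj(B, X, Y, lam1, lam2):
    """Objective of a group-fused LAD-lasso for multi-outcome regression:
    least-absolute-deviations loss + lasso penalty + a fusion penalty on
    the Euclidean norm of differences between neighbouring coefficient
    vectors. B[j] is the coefficient vector of covariate j across outcomes."""
    n, p, q = len(X), len(B), len(Y[0])
    lad = sum(abs(Y[i][m] - sum(X[i][j] * B[j][m] for j in range(p)))
              for i in range(n) for m in range(q))
    lasso = lam1 * sum(abs(b) for row in B for b in row)
    fuse = lam2 * sum(sum((B[j][m] - B[j + 1][m]) ** 2 for m in range(q)) ** 0.5
                      for j in range(p - 1))
    return lad + lasso + fuse
```

The LAD loss gives robustness to outlying and skewed outcomes, while the fusion term is what ties adjacent covariates (e.g. neighbouring genomic positions) together.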
Robust multi-outcome regression with correlated covariate blocks using fused LAD-lasso. Journal of Applied Statistics, 52(5), 1081-1102.
Pub Date: 2024-10-10 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2411608
Wei Zhang, Antonietta Mira, Ernst C Wit
COVID-19 has led to excess deaths around the world. However, the impact on mortality rates from other causes of death during this time remains unclear. To understand the broader impact of COVID-19 on other causes of death, we analyze Italian official data covering monthly mortality counts from January 2015 to December 2020. To handle the high-dimensional nature of the data, we develop a model that combines Poisson regression with tensor train decomposition to explore the lower-dimensional residual structure of the data. Our Bayesian approach incorporates prior information on model parameters and utilizes an efficient Metropolis-Hastings-within-Gibbs algorithm for posterior inference. Simulation studies were conducted to validate our approach. Our method not only identifies differential effects of interventions on cause-specific mortality rates through Poisson regression but also provides insights into the relationship between COVID-19 and other causes of death.
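The tensor-train idea can be sketched by contracting a chain of low-rank cores into a full multiway array of Poisson rates (a minimal illustration of the decomposition, not the paper's Bayesian sampler; exponentiation to keep rates positive is an assumption here):

```python
import numpy as np

def tt_rates(cores):
    """Contract tensor-train cores into a full tensor of Poisson rates.
    cores[d] has shape (r_{d-1}, n_d, r_d) with boundary ranks r_0 = r_D = 1;
    the contracted TT output is exponentiated to give positive rates."""
    out = cores[0]                                        # (1, n_1, r_1)
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))   # chain over TT ranks
    return np.exp(out.squeeze(axis=(0, -1)))              # drop boundary ranks

def poisson_loglik(counts, rates):
    """Poisson log-likelihood up to the additive log(y!) constant."""
    return float(np.sum(counts * np.log(rates) - rates))
```

In the mortality setting the tensor modes would index, e.g., cause of death, demographic group, and month, so small TT ranks capture shared low-dimensional structure across all three.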
Bayesian Poisson regression tensor train decomposition model for learning mortality pattern changes during COVID-19 pandemic. Journal of Applied Statistics, 52(5), 1017-1039.
Pub Date: 2024-10-08 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2411614
Iana Michelle L Garcia, Michael Daniel C Lucagbo
Reference ranges are invaluable in laboratory medicine, as they are indispensable tools for the interpretation of laboratory test results. When assessing measurements on a single analyte, univariate reference intervals are required. In many cases, however, measurements on several analytes are needed by medical practitioners to diagnose more complicated conditions such as kidney function or liver function. For such cases, it is recommended to use multivariate reference regions, which account for the cross-correlations among the analytes. Traditionally, multivariate reference regions (MRRs) have been constructed as ellipsoidal regions. The disadvantage of such regions is that they are unable to detect component-wise outlying measurements. Because of this, rectangular reference regions have recently been put forward in the literature. In this study, we develop methodologies to compute rectangular MRRs that incorporate covariate information, which is often necessary in evaluating laboratory test results. We construct the reference region using tolerance-based criteria so that the resulting region possesses the multiple-use property. Results show that the proposed regions yield coverage probabilities that are accurate and robust to the sample size.
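To see why rectangular regions flag component-wise outliers, here is a much simpler stand-in than the paper's tolerance-based construction: componentwise normal limits with a Bonferroni split of the error rate, which guarantees joint coverage of at least 1 - alpha (the function name and the known-parameter assumption are illustrative):

```python
from statistics import NormalDist

def rectangular_region(means, sds, alpha=0.05):
    """Componentwise rectangular region with Bonferroni-adjusted normal
    quantiles: each analyte gets alpha/p of the error budget, so the joint
    coverage is at least 1 - alpha regardless of cross-correlations."""
    p = len(means)
    z = NormalDist().inv_cdf(1 - alpha / (2 * p))   # per-analyte quantile
    return [(m - z * s, m + z * s) for m, s in zip(means, sds)]
```

A measurement is then flagged simply when it falls outside the interval of any single analyte, which an ellipsoidal region cannot report componentwise.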
Regression-based rectangular tolerance regions as reference regions in laboratory medicine. Journal of Applied Statistics, 52(5), 1040-1062.
The devastating impact of COVID-19 on the United States has been profound since its onset in January 2020. Predicting the trajectory of epidemics accurately and devising strategies to curb their progression are currently formidable challenges. In response to this crisis, we propose COVINet, which combines the architecture of Long Short-Term Memory and Gated Recurrent Unit, incorporating actionable covariates to offer high-accuracy prediction and explainable response. First, we train COVINet models for confirmed cases and total deaths with five input features, and compare Mean Absolute Errors (MAEs) and Mean Relative Errors (MREs) of COVINet against ten competing models from the United States CDC in the last four weeks before April 26, 2021. The results show COVINet outperforms all competing models for MAEs and MREs when predicting total deaths. Then, we focus on prediction for the most severe county in each of the top 10 hot-spot states using COVINet. The MREs are small for all predictions made in the last 7 or 30 days before March 23, 2023. Beyond predictive accuracy, COVINet offers high interpretability, enhancing the understanding of pandemic dynamics. This dual capability positions COVINet as a powerful tool for informing effective strategies in pandemic prevention and governmental decision-making.
COVINet: a deep learning-based and interpretable prediction model for the county-wise trajectories of COVID-19 in the United States. Journal of Applied Statistics, 52(5), 1063-1080. Pub Date: 2024-10-08 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2412284
Pub Date: 2024-09-25 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2405111
Xuelong Hu, Yixuan Ma, Jiening Zhang, Jiujun Zhang, Ali Yeganeh, Sandile Charles Shongwe
Current monitoring technologies emphasize and address the issue of monitoring high-volume production processes. The high flexibility and diversity of modern industrial production make monitoring technology for small-batch processes even more important. In multivariate process monitoring, schemes based on the multivariate coefficient of variation (MCV) have broader applicability because they place fewer restrictions on the process. Given the effectiveness of MCV monitoring, and aiming to further improve current MCV monitoring schemes in finite-horizon production, we introduce two one-sided cumulative sum (CUSUM) MCV schemes. For both deterministic and random shifts, the design parameters of the proposed schemes are obtained via an optimization procedure built on the Markov chain method, and the corresponding performance is analysed through run length (RL) characteristics, including the mean and the standard deviation. Simulation comparisons with existing exponentially weighted moving average (EWMA) MCV schemes show that the proposed CUSUM MCV schemes are more efficient at detecting most shifts, both deterministic and random.
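The one-sided CUSUM recursion at the heart of such schemes is compact enough to state directly (a generic sketch for any monitored statistic; the paper charts the MCV, and `k` and `h` would come from its Markov-chain optimization):

```python
def cusum_upper(xs, target, k, h):
    """Upper one-sided CUSUM: C_t = max(0, C_{t-1} + x_t - target - k).
    k is the reference (allowance) value and h the control limit.
    Returns the first index t with C_t > h, or -1 if no signal."""
    c = 0.0
    for t, x in enumerate(xs):
        c = max(0.0, c + x - target - k)   # accumulate upward deviations only
        if c > h:
            return t
    return -1
```

Because the statistic accumulates small deviations over time, CUSUM charts react to sustained moderate shifts faster than charts that look at each sample in isolation, which is the intuition behind the reported efficiency gains.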
The efficiency of CUSUM schemes for monitoring the multivariate coefficient of variation in short runs process. Journal of Applied Statistics, 52(4), 966-992.
Pub Date: 2024-09-25 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2024.2399574
Daryan Naatjes, Stephen A Sedory, Sarjinder Singh
We develop a collection of unbiased estimators for the proportion of a population bearing a sensitive characteristic, using a randomized response technique with two decks of cards, for any choice of weights. The efficiency of the estimator depends on the weights, and we demonstrate how to find an optimal choice. The coefficients of skewness and kurtosis are introduced. We support our findings with a simulation study that models a real survey dataset. We suggest that a careful choice of such weights can also ensure that all estimates of the proportion lie in [0, 1]. In addition, we illustrate the use of the estimators in a recent study that estimates the proportion of students, 18 years and over, who had returned to campus and tested positive for COVID-19.
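The weighted-combination idea can be sketched for a Warner-type two-deck design (a simplified illustration, not the authors' weighted-squared-distance class: each deck directs a statement about the sensitive group with its own known probability, and any convex weight across decks keeps the estimator unbiased):

```python
def pi_hat_deck(theta_hat, p):
    """Moment estimator from one deck: a respondent answers about the
    sensitive trait with probability p and its complement otherwise, so the
    'yes' probability is p*pi + (1-p)*(1-pi); invert for pi (needs p != 0.5)."""
    return (theta_hat - (1 - p)) / (2 * p - 1)

def pi_hat_weighted(theta1, p1, theta2, p2, w):
    """Weighted combination across two decks. Each deck estimator is
    unbiased, so the combination is unbiased for any weight w; the paper
    studies how to choose w optimally."""
    return w * pi_hat_deck(theta1, p1) + (1 - w) * pi_hat_deck(theta2, p2)
```

Plugging in the exact 'yes' probabilities implied by a true proportion recovers that proportion for any weight, which is the unbiasedness property the class of estimators is built on.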
A class of infinite number of unbiased estimators using weighted squared distance for two-deck randomized response model. Journal of Applied Statistics, 52(4), 868-893.