
Journal of Applied Statistics: Latest Articles

A semiparametric accelerated failure time-based mixture cure tree.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-10-23 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2418476
Wisdom Aselisewine, Suvra Pal, Helton Saulo

The mixture cure rate model (MCM) is the most widely used model for the analysis of survival data with a cured subgroup. In this context, the most common strategy to model the cure probability is to assume a generalized linear model with a known link function, such as the logit link function. However, the logit model can only capture simple effects of covariates on the cure probability. In this article, we propose a new MCM where the cure probability is modeled using a decision tree-based classifier and the survival distribution of the uncured is modeled using an accelerated failure time structure. To estimate the model parameters, we develop an expectation maximization (EM) algorithm. Our simulation study shows that the proposed model performs better in capturing nonlinear classification boundaries when compared to the logit-based MCM and the spline-based MCM. This results in more accurate and precise estimates of the cure probabilities, which in turn results in improved predictive accuracy of cure. We further show that capturing nonlinear classification boundaries also improves the estimation results corresponding to the survival distribution of the uncured subjects. Finally, we apply our proposed model and the EM algorithm to analyze an existing bone marrow transplant dataset.
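As a rough illustration of the mixture structure described in this abstract, the population survival function can be written as the cure probability plus the uncured share times the survival function of the uncured. The sketch below is not the authors' implementation: the function names are hypothetical, the cure probability is passed in as a plain number (in the paper it would come from the tree-based classifier), and the exponential baseline is an assumption made only so the example runs, whereas the paper describes a semiparametric model.

```python
import math


def aft_survival(t, x, beta, baseline_survival):
    """Survival of an uncured subject under an accelerated failure time
    structure: covariates rescale time, S_u(t | x) = S_0(t * exp(-x'beta))."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return baseline_survival(t * math.exp(-linear_predictor))


def population_survival(t, cure_prob, x, beta, baseline_survival):
    """Mixture cure model: cured subjects never fail, uncured subjects
    follow the AFT survival function."""
    if not 0.0 <= cure_prob <= 1.0:
        raise ValueError("cure_prob must lie in [0, 1]")
    uncured = aft_survival(t, x, beta, baseline_survival)
    return cure_prob + (1.0 - cure_prob) * uncured


def exponential_baseline(t):
    """Hypothetical baseline S_0(t) = exp(-t), assumed for illustration."""
    return math.exp(-t)
```

As time grows, the population survival curve levels off at the cure probability instead of dropping to zero, which is the defining feature of a cure model.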

Journal of Applied Statistics, vol. 52, no. 6, pp. 1177-1194. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12035937/pdf/
Citations: 0
The PCovR biplot: a graphical tool for principal covariates regression.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-10-18 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2417978
Elisa Frutos-Bernal, José Luis Vicente-Villardón

Biplots are useful tools because they provide a visual representation of both individuals and variables simultaneously, making it easier to explore relationships and patterns within multidimensional datasets. This paper extends their use to examine the relationship between a set of predictors X and a set of response variables Y using Principal Covariates Regression analysis (PCovR). The PCovR biplot provides a simultaneous graphical representation of individuals, predictor variables and response variables. It also provides the ability to examine the relationship between both types of variables in the form of the regression coefficient matrix.

Journal of Applied Statistics, vol. 52, no. 5, pp. 1144-1159. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951325/pdf/
Citations: 0
Reliability analysis based on doubly-truncated and interval-censored data.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-10-14 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2415412
Pao-Sheng Shen, Huai-Man Li

Field data provide important information on product reliability. Interval sampling is widely used for the collection of field data, which typically report incident cases during a certain time period. Such a sampling scheme induces doubly truncated (DT) data if the exact failure time is known. In many situations, the exact failure date is known only to fall within an interval, leading to doubly truncated and interval censored (DTIC) data. This article considers the analysis of DTIC data under parametric failure time models. We consider a conditional likelihood approach and propose interval estimation for the parameters and the cumulative distribution functions. Simulation studies show that the proposed method performs well for finite sample sizes.
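To make the conditional likelihood idea concrete, here is a minimal sketch under stated assumptions; it is not taken from the paper. It assumes an exponential lifetime distribution and the textbook form of the conditional contribution: a unit whose failure time is known to lie in (l, r], and which is sampled only if it fails inside the truncation window (u, v], contributes [F(r) - F(l)] / [F(v) - F(u)]. The function names and the record layout are hypothetical.

```python
import math


def exponential_cdf(t, rate):
    """CDF of an exponential lifetime; assumed here purely for illustration."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0


def dtic_log_likelihood(rate, records):
    """Conditional log-likelihood for doubly truncated, interval-censored data.

    Each record is (l, r, u, v) with u <= l < r <= v: the failure time is
    known to lie in (l, r], and the unit is observed only because it failed
    inside the truncation window (u, v].
    """
    total = 0.0
    for l, r, u, v in records:
        censored_mass = exponential_cdf(r, rate) - exponential_cdf(l, rate)
        window_mass = exponential_cdf(v, rate) - exponential_cdf(u, rate)
        total += math.log(censored_mass) - math.log(window_mass)
    return total
```

A parametric fit would maximize this function over `rate`, for example with a one-dimensional grid or a numerical optimizer.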

Journal of Applied Statistics, vol. 52, no. 5, pp. 1128-1143. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951335/pdf/
Citations: 0
A novel ranked k-nearest neighbors algorithm for missing data imputation.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-10-11 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2414357
Yasir Khan, Said Farooq Shah, Syed Muhammad Asim

Missing data is a common problem in many domains that rely on data analysis. The k Nearest Neighbors imputation method has been widely used to address this issue, but it has limitations in accurately imputing missing values, especially for datasets with small pairwise correlations and small values of k. In this study, we propose a method, Ranked k Nearest Neighbors imputation, that follows a similar approach to k Nearest Neighbors but uses the concept of ranked set sampling to select the most relevant neighbors for imputation. Our results show that the proposed method outperforms the standard k Nearest Neighbors method in terms of imputation accuracy under both the Missing Completely at Random and the Missing at Random mechanisms, as demonstrated by consistently lower MSIE and MAIE values across all datasets. This suggests that the proposed method is a promising alternative for imputing missing values in datasets with small pairwise correlations and small values of k. Thus, the proposed Ranked k Nearest Neighbors method has important implications for data imputation in various domains and can contribute to the development of more efficient and accurate imputation methods without adding computational complexity to the algorithm.
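For orientation, the baseline that this abstract compares against can be sketched as follows. This is plain k Nearest Neighbors imputation, not the ranked variant the authors propose, whose neighbor-selection rule is not given in the abstract. The function name and the choice of fully observed rows as the donor pool are assumptions made for the example.

```python
import math


def knn_impute(rows, k=3):
    """Fill NaN entries with the mean of the k nearest fully observed rows.

    Distance is Euclidean, computed only on the columns that are observed
    in the row being imputed.
    """
    donors = [r for r in rows if not any(math.isnan(v) for v in r)]
    if not donors:
        raise ValueError("at least one fully observed row is required")
    completed = []
    for row in rows:
        missing = [j for j, v in enumerate(row) if math.isnan(v)]
        filled = list(row)
        if missing:
            observed = [j for j in range(len(row)) if j not in missing]
            target = [row[j] for j in observed]
            # Sort donors by distance to the incomplete row, keep the k closest.
            nearest = sorted(
                donors,
                key=lambda d: math.dist([d[j] for j in observed], target),
            )[:k]
            for j in missing:
                filled[j] = sum(d[j] for d in nearest) / len(nearest)
        completed.append(filled)
    return completed
```

With small k, a single poorly matched neighbor can dominate the imputed value, which is the weakness the abstract points to.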

Journal of Applied Statistics, vol. 52, no. 5, pp. 1103-1127. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951327/pdf/
Citations: 0
Robust multi-outcome regression with correlated covariate blocks using fused LAD-lasso.
IF 1.2 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-10-11 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2414346
Jyrki Möttönen, Tero Lähderanta, Janne Salonen, Mikko J Sillanpää

Lasso is a popular and efficient approach to simultaneous estimation and variable selection in high-dimensional regression models. In this paper, a robust fused LAD-lasso method for multiple outcomes is presented that addresses the challenges of non-normal outcome distributions and outlying observations. Covariate data measured over space or time, or across spectral bands or genomic positions, often have a natural correlation structure that depends on the distance between the covariates. The proposed multi-outcome approach handles such covariate blocks with a group fusion penalty, which encourages similarity between neighboring regression coefficient vectors by penalizing their differences, for example, in sequential data settings. Properties of the proposed approach are illustrated by extensive simulations using BIC-type criteria for model selection. The method is also applied to a real-life skewed dataset on retirement behavior with longitudinal heteroscedastic explanatory variables.
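The three ingredients named in this abstract (an absolute-deviation loss, a sparsity penalty, and a fusion penalty on neighboring coefficients) can be illustrated with a small objective function. This sketch is a simplified single-outcome version and is only an assumption about the general form; the paper's method covers multiple outcomes and a group fusion penalty over coefficient vectors. The function and parameter names are hypothetical.

```python
def fused_lad_lasso_objective(X, y, beta, lam_sparse, lam_fuse):
    """Single-outcome fused LAD-lasso objective (illustrative form).

    loss    : sum of absolute residuals (robust to outliers)
    sparsity: lam_sparse * sum |beta_j|
    fusion  : lam_fuse * sum |beta_j - beta_{j-1}| over adjacent coefficients
    """
    lad_loss = sum(
        abs(yi - sum(xij * bj for xij, bj in zip(xi, beta)))
        for xi, yi in zip(X, y)
    )
    sparsity = sum(abs(b) for b in beta)
    fusion = sum(abs(b1 - b0) for b0, b1 in zip(beta, beta[1:]))
    return lad_loss + lam_sparse * sparsity + lam_fuse * fusion
```

Raising `lam_fuse` pushes adjacent coefficients toward equal values, which matches covariates that are ordered in time, space, or along a spectrum.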

Journal of Applied Statistics, vol. 52, no. 5, pp. 1081-1102. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951329/pdf/
Citations: 0
Bayesian Poisson regression tensor train decomposition model for learning mortality pattern changes during COVID-19 pandemic.
IF 1.2 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-10-10 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2411608
Wei Zhang, Antonietta Mira, Ernst C Wit

COVID-19 has led to excess deaths around the world. However, the impact on mortality rates from other causes of death during this time remains unclear. To understand the broader impact of COVID-19 on other causes of death, we analyze Italian official data covering monthly mortality counts from January 2015 to December 2020. To handle the high-dimensional nature of the data, we developed a model that combines Poisson regression with tensor train decomposition to explore the lower-dimensional residual structure of the data. Our Bayesian approach incorporates prior information on model parameters and utilizes an efficient Metropolis-Hastings within Gibbs algorithm for posterior inference. Simulation studies were conducted to validate our approach. Our method not only identifies differential effects of interventions on cause-specific mortality rates through Poisson regression but also provides insights into the relationship between COVID-19 and other causes of death. Additionally, it uncovers latent classes related to demographic characteristics, temporal patterns, and causes of death.

Journal of Applied Statistics, vol. 52, no. 5, pp. 1017-1039. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951336/pdf/
Citations: 0
Regression-based rectangular tolerance regions as reference regions in laboratory medicine.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-10-08 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2411614
Iana Michelle L Garcia, Michael Daniel C Lucagbo

Reference ranges are invaluable in laboratory medicine, as they are indispensable tools for the interpretation of laboratory test results. When assessing measurements on a single analyte, univariate reference intervals are required. In many cases, however, medical practitioners need measurements on several analytes to assess more complicated conditions, such as those affecting kidney or liver function. For such cases, it is recommended to use multivariate reference regions, which account for the cross-correlations among the analytes. Traditionally, multivariate reference regions (MRRs) have been constructed as ellipsoidal regions. The disadvantage of such regions is that they are unable to detect component-wise outlying measurements. Because of this, rectangular reference regions have recently been put forward in the literature. In this study, we develop methodologies to compute rectangular MRRs that incorporate covariate information, which is often necessary in evaluating laboratory test results. We construct the reference region using tolerance-based criteria so that the resulting region possesses the multiple use property. Results show that the proposed regions yield coverage probabilities that are accurate and robust to the sample size. Finally, we apply the proposed procedures to a real-life example on the computation of an MRR for three components of the insulin-like growth factor system.

Journal of Applied Statistics, vol. 52, no. 5, pp. 1040-1062. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951328/pdf/
Citations: 0
COVINet: a deep learning-based and interpretable prediction model for the county-wise trajectories of COVID-19 in the United States.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-10-08 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2412284
Yukang Jiang, Ting Tian, Wenting Zhou, Yuting Zhang, Zhongfei Li, Xueqin Wang, Heping Zhang

The devastating impact of COVID-19 on the United States has been profound since its onset in January 2020. Predicting the trajectory of epidemics accurately and devising strategies to curb their progression are currently formidable challenges. In response to this crisis, we propose COVINet, which combines the architecture of Long Short-Term Memory and Gated Recurrent Unit, incorporating actionable covariates to offer high-accuracy prediction and explainable response. First, we train COVINet models for confirmed cases and total deaths with five input features, and compare Mean Absolute Errors (MAEs) and Mean Relative Errors (MREs) of COVINet against ten competing models from the United States CDC in the last four weeks before April 26, 2021. The results show COVINet outperforms all competing models for MAEs and MREs when predicting total deaths. Then, we focus on prediction for the most severe county in each of the top 10 hot-spot states using COVINet. The MREs are small for all predictions made in the last 7 or 30 days before March 23, 2023. Beyond predictive accuracy, COVINet offers high interpretability, enhancing the understanding of pandemic dynamics. This dual capability positions COVINet as a powerful tool for informing effective strategies in pandemic prevention and governmental decision-making.
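The two error measures used for the model comparison in this abstract can be computed as below. The abstract does not spell out its formulas, so the definitions here are assumptions: MAE as the average absolute difference, and MRE as the average absolute difference divided by the observed value. The function names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all forecast points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Average of |actual - predicted| / |actual| (assumed definition).

    Observed values must be non-zero, otherwise the ratio is undefined.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

MAE is expressed in the units of the counts, so it is dominated by large counties, while MRE is scale-free and makes counties of different sizes comparable.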

Journal of Applied Statistics, vol. 52, no. 5, pp. 1063-1080. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951337/pdf/
Citations: 0
The efficiency of CUSUM schemes for monitoring the multivariate coefficient of variation in short runs process.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-09-25 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2405111
Xuelong Hu, Yixuan Ma, Jiening Zhang, Jiujun Zhang, Ali Yeganeh, Sandile Charles Shongwe

Current monitoring technologies mainly address the monitoring of high-volume production processes. The high flexibility and diversity of current industrial production processes make monitoring technology for small batch processes even more important. In multivariate process monitoring, schemes based on the multivariate coefficient of variation (MCV) are more broadly applicable because they place fewer restrictions on the process. In view of the effectiveness of MCV monitoring, and with the aim of further improving the performance of current MCV monitoring schemes in finite-horizon production, we introduce two one-sided cumulative sum (CUSUM) MCV schemes. For both deterministic and random shifts, the design parameters of the proposed schemes are obtained via an optimization procedure based on the Markov chain method, and the corresponding performance is analysed using different run length (RL) characteristics, including the mean and the standard deviation. Simulation comparisons with existing exponentially weighted moving average (EWMA) MCV schemes show that the proposed CUSUM MCV schemes are more efficient in detecting most of the shifts, both deterministic and random. Finally, to demonstrate the benefits of the new monitoring schemes, a comprehensive case study on monitoring a steel sleeve manufacturing process is conducted.
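The one-sided CUSUM recursion behind such schemes can be sketched in a few lines. This is the generic upward CUSUM applied to an arbitrary monitored statistic, not the authors' MCV-specific design: in the paper the statistic would be the sample MCV, and the reference value and control limit would come from their Markov chain optimization. The names `reference` and `limit` and the function name are hypothetical.

```python
def upper_cusum(values, reference, limit):
    """Upward one-sided CUSUM: C_t = max(0, C_{t-1} + (x_t - reference)).

    Returns (index of the first sample where C_t exceeds `limit`, or None
    if no signal occurs, list of CUSUM statistics up to that point).
    """
    statistic = 0.0
    path = []
    for index, value in enumerate(values):
        # Accumulate only deviations above the reference; reset at zero.
        statistic = max(0.0, statistic + (value - reference))
        path.append(statistic)
        if statistic > limit:
            return index, path
    return None, path
```

A downward scheme mirrors this recursion by accumulating `reference - value`, which is why one-sided charts are usually run in pairs when shifts in both directions matter.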

目前的监测技术强调并解决监测大批量生产过程的问题。当前工业生产过程的高度灵活性和多样性使得小批量过程的监控技术变得更加重要。在多变量过程监控中,基于多变量变异系数(MCV)的监控方案具有较低的过程约束,适用性较广。鉴于MCV监测的有效性,为了进一步提高现有MCV监测方案在有限水平生产中的性能,我们又引入了两种单边累积和(CUSUM) MCV方案。在确定性和随机漂移的情况下,通过马尔可夫链方法设计的优化程序获得了所提出方案的设计参数,并根据不同的运行长度(RL)特征(包括平均值和标准差)分析了相应的性能。与现有指数加权移动平均MCV (exponential weighted moving average, EWMA)方案的仿真比较表明,CUSUM MCV方案能够更有效地监测大多数移动,包括确定性移动和随机移动。最后,为了证明新监测方案的好处,对钢套制造过程的监测进行了全面的案例研究。
Journal of Applied Statistics, vol. 52, no. 4, pp. 966-992. Published 2024-09-25. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11873948/pdf/
Citations: 0
A class of infinite number of unbiased estimators using weighted squared distance for two-deck randomized response model.
IF 1.1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-09-25 eCollection Date: 2025-01-01 DOI: 10.1080/02664763.2024.2399574
Daryan Naatjes, Stephen A Sedory, Sarjinder Singh

We develop a collection of unbiased estimators for the proportion of a population bearing a sensitive characteristic using a randomized response technique with two decks of cards for any choice of weights. The efficiency of the estimator depends on the weights, and we demonstrate how to find an optimal choice. The coefficients of skewness and kurtosis are introduced. We support our findings with a simulation study that models a real survey dataset. We suggest that a careful choice of such weights can also lead to all estimates of proportion lying between [0, 1]. In addition, we illustrate the use of the estimators in a recent study that estimates the proportion of students, 18 years and over, who had returned to the campus and tested positive for COVID-19.

Journal of Applied Statistics, vol. 52, no. 4, pp. 868-893. Published 2024-09-25. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11873971/pdf/
Citations: 0