
Statistics in Medicine: Latest Publications

A Comparison of Variance Estimators for Logistic Regression Models Estimated Using Generalized Estimating Equations (GEE) in the Context of Observational Health Services Research.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-31, DOI: 10.1002/sim.10260
Peter C Austin

In observational health services research, researchers often use clustered data to estimate the independent association between individual outcomes and several cluster-level covariates after adjusting for individual-level characteristics. Generalized estimating equations are a popular method for estimating generalized linear models using clustered data. The conventional Liang-Zeger variance estimator is known to result in estimated standard errors that are biased low when the number of clusters is small. Alternative variance estimators have been proposed for use when the number of clusters is low. Previous studies focused on these alternative variance estimators in the context of cluster randomized trials, which are often characterized by a small number of clusters and by an outcomes regression model that often consists of a single cluster-level variable (the treatment/exposure variable). We addressed the following questions: (i) which estimator is preferred for estimating the standard errors of cluster-level covariates for logistic regression models with multiple binary and continuous cluster-level variables in addition to subject-level variables; (ii) in such settings, how many clusters are required for the Liang-Zeger variance estimator to have acceptable performance for estimating the standard errors of cluster-level covariates. We suggest that when estimating standard errors: (i) when the number of clusters is < 15, use the Kauermann-Carroll estimator; (ii) when the number of clusters is between 15 and 40, use the Fay-Graubard estimator; (iii) when the number of clusters exceeds 40, use the Liang-Zeger estimator or the Fay-Graubard estimator. When estimating confidence intervals, we suggest using the Mancl-DeRouen estimator with a t-distribution.
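As a rough illustration of the comparison described above, the sketch below (not the paper's code) fits a GEE logistic model to simulated clustered data with statsmodels and contrasts the Liang-Zeger sandwich standard errors with the Mancl-DeRouen bias-reduced ones. The Kauermann-Carroll and Fay-Graubard corrections are not built into statsmodels, and the simulated data, cluster counts, and the t degrees of freedom are assumptions of this sketch.

```python
# Minimal sketch: GEE logistic regression on simulated clustered data,
# comparing Liang-Zeger ("robust") and Mancl-DeRouen ("bias_reduced")
# standard errors as exposed by statsmodels.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n_clusters, cluster_size = 20, 50                    # deliberately few clusters
cluster_id = np.repeat(np.arange(n_clusters), cluster_size)

x_subject = rng.normal(size=n_clusters * cluster_size)               # subject-level covariate
x_cluster = np.repeat(rng.normal(size=n_clusters), cluster_size)     # cluster-level covariate
u = np.repeat(rng.normal(scale=0.5, size=n_clusters), cluster_size)  # cluster effect

lin_pred = -1.0 + 0.5 * x_subject + 0.7 * x_cluster + u
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin_pred)))

X = sm.add_constant(np.column_stack([x_subject, x_cluster]))  # columns: const, subject, cluster
model = sm.GEE(y, X, groups=cluster_id,
               family=sm.families.Binomial(),
               cov_struct=sm.cov_struct.Exchangeable())
res = model.fit()

se_lz = np.asarray(res.standard_errors(cov_type="robust"))        # Liang-Zeger sandwich
se_md = np.asarray(res.standard_errors(cov_type="bias_reduced"))  # Mancl-DeRouen correction

# t-based 95% CI for the cluster-level coefficient; (clusters - 2) degrees of
# freedom is one common small-sample choice and an assumption of this sketch.
beta = np.asarray(res.params)[2]
t_crit = stats.t.ppf(0.975, df=n_clusters - 2)
print("Liang-Zeger SE:", se_lz[2], " Mancl-DeRouen SE:", se_md[2])
print("95% CI (t, Mancl-DeRouen):", (beta - t_crit * se_md[2], beta + t_crit * se_md[2]))
```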

Citations: 0
A hybrid approach to sample size re-estimation in cluster randomized trials with continuous outcomes.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-30, Epub Date: 2024-08-28, DOI: 10.1002/sim.10205
Samuel K Sarkodie, James Ms Wason, Michael J Grayling

This study presents a hybrid (Bayesian-frequentist) approach to sample size re-estimation (SSRE) for cluster randomised trials with continuous outcome data, allowing for uncertainty in the intra-cluster correlation (ICC). In the hybrid framework, pre-trial knowledge about the ICC is captured by placing a Truncated Normal prior on it, which is then updated at an interim analysis using the study data and used in expected power control. On average, both the hybrid and frequentist approaches mitigate the consequences of misspecifying the ICC at the trial's design stage. In addition, both frameworks lead to SSRE designs with approximate control of the type I error-rate at the desired level. It is clearly demonstrated how the hybrid approach is able to reduce the high variability in the re-estimated sample size observed within the frequentist framework, depending on the informativeness of the prior. However, misspecification of a highly informative prior can cause significant power loss. In conclusion, a hybrid approach could offer advantages to cluster randomised trials using SSRE. Specifically, when data or expert opinion are available to help guide the choice of prior for the ICC, the hybrid approach can reduce the variance of the re-estimated required sample size compared to a frequentist approach. As SSRE is unlikely to be employed when substantial amounts of such data are available (i.e., when a constructed prior is highly informative), the greatest utility of a hybrid approach to SSRE likely lies where only low-quality evidence is available to guide the choice of prior.
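The core calculation behind expected power control can be illustrated with a short sketch. The code below is an illustration, not the authors' algorithm: it discretizes a Truncated Normal prior on the ICC and searches for the smallest number of clusters per arm whose prior-averaged power reaches 80%; in the full hybrid SSRE design the prior would first be updated with interim data. The effect size, cluster size, and prior parameters are assumed values.

```python
# Minimal sketch: expected power averaged over a Truncated Normal prior on the ICC.
import numpy as np
from scipy import stats

delta, sigma = 0.3, 1.0              # assumed standardized effect and outcome SD
m = 20                               # cluster size
alpha, target_power = 0.05, 0.80
z_a = stats.norm.ppf(1 - alpha / 2)

# Truncated Normal prior on the ICC, restricted to [0, 1)
mu, tau = 0.05, 0.03
a, b = (0 - mu) / tau, (1 - mu) / tau
rho_grid = np.linspace(0.0005, 0.3, 600)
prior = stats.truncnorm.pdf(rho_grid, a, b, loc=mu, scale=tau)
prior /= prior.sum()

def power(k, rho):
    """Approximate power of a two-arm cluster RCT with k clusters per arm."""
    de = 1 + (m - 1) * rho                      # design effect
    se = np.sqrt(2 * sigma**2 * de / (k * m))   # SE of the difference in means
    return stats.norm.cdf(delta / se - z_a)

for k in range(2, 200):
    expected_power = np.sum(prior * power(k, rho_grid))
    if expected_power >= target_power:
        print(f"{k} clusters per arm give expected power {expected_power:.3f}")
        break
```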

Citations: 0
A simulation study of the performance of statistical models for count outcomes with excessive zeros.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-30, Epub Date: 2024-08-28, DOI: 10.1002/sim.10198
Zhengyang Zhou, Dateng Li, David Huh, Minge Xie, Eun-Young Mun

Background: Outcome measures that are count variables with excessive zeros are common in health behaviors research. Examples include the number of standard drinks consumed or alcohol-related problems experienced over time. There is a lack of empirical data about the relative performance of prevailing statistical models for assessing the efficacy of interventions when outcomes are zero-inflated, particularly compared with recently developed marginalized count regression approaches for such data.

Methods: The current simulation study examined five commonly used approaches for analyzing count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three prevailing count distribution-based models (i.e., Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the overall effects on the population mean while adjusting for zero-inflation. Motivated by alcohol misuse prevention trials, extensive simulations were conducted to evaluate and compare the statistical power and Type I error rate of the statistical models and approaches across data conditions that varied in sample size (N = 100 to 500), zero rate (0.2 to 0.8), and intervention effect sizes.

Results: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in higher than expected false positive results. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on the raw scale, negative binomial model, and ZIP model. The performance of the linear model with a log-transformed outcome variable was unsatisfactory.

Conclusions: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. This MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.
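To make the comparison concrete, the sketch below simulates a zero-inflated count outcome with a binary intervention and fits a naive Poisson model and a ZIP model using statsmodels; the marginalized ZIP (MZIP) model studied in the paper is not available in statsmodels and is not shown. All simulation settings are illustrative assumptions.

```python
# Minimal sketch: zero-inflated counts analyzed with Poisson vs. ZIP models.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(7)
n = 500
treat = rng.binomial(1, 0.5, size=n)

# Structural zeros with probability 0.5; Poisson counts otherwise,
# with a modest intervention effect on the count part only.
is_zero = rng.binomial(1, 0.5, size=n)
lam = np.exp(1.0 - 0.3 * treat)
y = np.where(is_zero == 1, 0, rng.poisson(lam))

X = sm.add_constant(treat)

pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
zip_fit = ZeroInflatedPoisson(y, X, exog_infl=X, inflation="logit").fit(maxiter=200, disp=0)

print("Poisson treatment coefficient:", pois.params[1])
print("ZIP parameters (inflation part, then count part):", zip_fit.params)
```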

Citations: 0
Principal quantile treatment effect estimation using principal scores.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-30, Epub Date: 2024-08-19, DOI: 10.1002/sim.10178
Kotaro Mizuma, Takamasa Hashimoto, Sho Sakui, Shingo Kuroda

Intercurrent events and estimands play a key role in precisely defining the treatment effects of interest. In randomized clinical trials, the median or other quantiles of outcomes within a principal stratum defined by the potential occurrence of intercurrent events are sometimes of interest. Naïve analyses, such as those based on the observed occurrence of the intercurrent events, lead to biased results. Therefore, we propose principal quantile treatment effect estimators that can nonparametrically estimate the distribution of potential outcomes by principal score weighting without relying on the exclusion restriction assumption. Our simulation studies show that the proposed method works in situations where the median or quantiles may be regarded as the preferred population-level summary over the mean. We illustrate our proposed method using data from a randomized controlled trial conducted on patients with nonerosive reflux disease.
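The weighting idea can be sketched in a few lines. The code below is a simplified illustration, not the authors' estimator: given already-estimated principal scores, it computes a principal-score-weighted sample quantile of the observed outcomes in one arm; the outcome distribution and scores are simulated placeholders.

```python
# Minimal sketch: a principal-score-weighted sample quantile.
import numpy as np

def weighted_quantile(y, w, q):
    """Quantile of y under nonnegative weights w, via the weighted ECDF."""
    order = np.argsort(y)
    y, w = np.asarray(y)[order], np.asarray(w)[order]
    cdf = np.cumsum(w) / np.sum(w)
    return y[np.searchsorted(cdf, q)]

rng = np.random.default_rng(3)
n = 1000
y_treated = rng.gamma(shape=2.0, scale=1.0, size=n)   # observed outcomes, treated arm
principal_score = rng.uniform(0.1, 0.9, size=n)       # assumed estimated principal scores

median_in_stratum = weighted_quantile(y_treated, principal_score, 0.5)
print("Principal-score-weighted median:", median_in_stratum)
```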

Citations: 0
Anomaly Detection and Correction in Dense Functional Data Within Electronic Medical Records.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-30, Epub Date: 2024-09-03, DOI: 10.1002/sim.10209
Daren Kuwaye, Hyunkeun Ryan Cho

In medical research, the accuracy of data from electronic medical records (EMRs) is critical, particularly when analyzing dense functional data, where anomalies can severely compromise research integrity. Anomalies in EMRs often arise from human errors in data measurement and entry, and increase in frequency with the volume of data. Despite the established methods in computer science, anomaly detection in medical applications remains underdeveloped. We address this deficiency by introducing a novel tool for identifying and correcting anomalies specifically in dense functional EMR data. Our approach utilizes studentized residuals from a mean-shift model, and therefore assumes that the data adheres to a smooth functional trajectory. Additionally, our method is tailored to be conservative, focusing on anomalies that signify actual errors in the data collection process while controlling for false discovery rates and type II errors. To support widespread implementation, we provide a comprehensive R package, ensuring that our methods can be applied in diverse settings. Our methodology's efficacy has been validated through rigorous simulation studies and real-world applications, confirming its ability to accurately identify and correct errors, thus enhancing the reliability and quality of medical data analysis.
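A stripped-down version of this workflow, using a generic smoother in place of the authors' mean-shift functional model and their R package, might look as follows: fit a smooth trajectory, studentize the residuals, and flag points with a Benjamini-Hochberg cutoff. The injected anomalies, the smoother, and the FDR level are all assumptions of the sketch.

```python
# Minimal sketch: flag anomalies in a dense functional record via
# studentized residuals and Benjamini-Hochberg FDR control.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
t = np.linspace(0, 1, 300)
signal = 10 + 3 * np.sin(2 * np.pi * t)
y = signal + rng.normal(scale=0.3, size=t.size)
y[[50, 180]] += [4.0, -5.0]                      # injected entry errors

# (1) Smooth trajectory: a low-order polynomial stands in for the
#     functional model of the paper.
coef = np.polyfit(t, y, deg=6)
fitted = np.polyval(coef, t)

# (2) Studentized residuals and two-sided p-values.
resid = y - fitted
scale = np.std(resid, ddof=7)                    # ddof ~ number of fitted coefficients
z = resid / scale
pvals = 2 * stats.norm.sf(np.abs(z))

# (3) Benjamini-Hochberg at FDR level 0.05.
alpha = 0.05
order = np.argsort(pvals)
ranked = pvals[order]
passed = ranked <= alpha * np.arange(1, len(pvals) + 1) / len(pvals)
k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
flagged = np.sort(order[:k])
print("Flagged time indices:", flagged)
```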

Citations: 0
Modeling multiple-criterion diagnoses by heterogeneous-instance logistic regression.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-30, Epub Date: 2024-08-27, DOI: 10.1002/sim.10202
Chun-Hao Yang, Ming-Han Li, Shu-Fang Wen, Sheng-Mao Chang

Mild cognitive impairment (MCI) is a prodromal stage of Alzheimer's disease (AD) that causes a significant burden in caregiving and medical costs. Clinically, the diagnosis of MCI is determined by the impairment statuses of five cognitive domains. If one of these cognitive domains is impaired, the patient is diagnosed with MCI, and if two out of the five domains are impaired, the patient is diagnosed with AD. In medical records, most of the time, the diagnosis of MCI/AD is given, but not the statuses of the five domains. We may treat the domain statuses as missing variables. This diagnostic procedure relates MCI/AD status modeling to multiple-instance learning, where each domain resembles an instance. However, traditional multiple-instance learning assumes common predictors among instances, but in our case, each domain is associated with different predictors. In this article, we generalized the multiple-instance logistic regression to accommodate the heterogeneity in predictors among different instances. The proposed model is dubbed heterogeneous-instance logistic regression and is estimated via the expectation-maximization algorithm because of the presence of the missing variables. We also derived two variants of the proposed model for the MCI and AD diagnoses. The proposed model is validated in terms of its estimation accuracy, latent status prediction, and robustness via extensive simulation studies. Finally, we analyzed the National Alzheimer's Coordinating Center-Uniform Data Set using the proposed model and demonstrated its potential.
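The EM idea can be illustrated with a toy version of the model: two domains with different predictors, latent domain statuses, and only the bag-level diagnosis observed. The sketch below is not the authors' implementation; it assumes conditional independence of domains given covariates, simulated data, and fractional-response logistic fits for the M-step.

```python
# Minimal EM sketch for multiple-instance logistic regression with
# heterogeneous (per-domain) predictors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, n_iter = 2000, 30
sigmoid = lambda v: 1 / (1 + np.exp(-v))

# Different predictors for the two domains.
X1 = sm.add_constant(rng.normal(size=(n, 1)))
X2 = sm.add_constant(rng.normal(size=(n, 2)))
true_b1, true_b2 = np.array([-1.0, 1.2]), np.array([-1.5, 0.8, -0.6])

D1 = rng.binomial(1, sigmoid(X1 @ true_b1))      # latent domain statuses
D2 = rng.binomial(1, sigmoid(X2 @ true_b2))
Y = np.maximum(D1, D2)                           # observed diagnosis: any domain impaired

b1 = np.zeros(X1.shape[1])                       # EM starting values
b2 = np.zeros(X2.shape[1])
for _ in range(n_iter):
    p1, p2 = sigmoid(X1 @ b1), sigmoid(X2 @ b2)
    denom = 1 - (1 - p1) * (1 - p2)              # P(Y = 1 | covariates)
    # E-step: posterior impairment probabilities given the observed label.
    z1 = np.where(Y == 1, p1 / denom, 0.0)
    z2 = np.where(Y == 1, p2 / denom, 0.0)
    # M-step: fractional-response logistic fits per domain.
    b1 = sm.GLM(z1, X1, family=sm.families.Binomial()).fit().params
    b2 = sm.GLM(z2, X2, family=sm.families.Binomial()).fit().params

print("Domain 1 coefficients:", b1, "(true:", true_b1, ")")
print("Domain 2 coefficients:", b2, "(true:", true_b2, ")")
```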

Citations: 0
Optimal subsampling for semi-parametric accelerated failure time models with massive survival data using a rank-based approach.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-30, Epub Date: 2024-08-20, DOI: 10.1002/sim.10200
Zehan Yang, HaiYing Wang, Jun Yan

Subsampling is a practical strategy for analyzing vast survival data, which are increasingly encountered across diverse research domains. While the optimal subsampling method has been applied to inference for Cox models and parametric accelerated failure time (AFT) models, its application to semi-parametric AFT models with rank-based estimation has received limited attention. The challenges arise from the non-smooth estimating function for regression coefficients and the seemingly zero contribution from censored observations to estimating functions in their commonly used form. To address these challenges, we develop optimal subsampling probabilities for both event and censored observations by expressing the estimating functions through a well-defined stochastic process. Meanwhile, we apply an induced smoothing procedure to the non-smooth estimating functions. As the optimal subsampling probabilities depend on the unknown regression coefficients, we employ a two-step procedure to obtain a feasible estimation method. An additional benefit of the method is its ability to resolve the issue of underestimation of the variance when the subsample size approaches the full sample size. We validate the performance of our estimators through a simulation study and apply the methods to analyze the survival time of lymphoma patients in the Surveillance, Epidemiology, and End Results program.
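A bare-bones version of the estimation step, using uniform rather than optimal subsampling probabilities and a simple bandwidth choice, is sketched below: it draws a subsample, forms an induced-smoothed Gehan estimating function, and solves it for a single regression coefficient. The paper's optimal weights, two-step procedure, and variance corrections are not reproduced.

```python
# Minimal sketch: rank-based AFT estimation via a smoothed Gehan
# estimating function evaluated on a uniform subsample.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
N, r = 20000, 1000                                   # full-data size, subsample size
x = rng.normal(size=N)
log_t = 1.0 + 0.5 * x + 0.5 * rng.gumbel(size=N)     # true slope = 0.5
log_c = rng.normal(loc=2.5, scale=1.0, size=N)       # censoring times
log_y = np.minimum(log_t, log_c)
delta = (log_t <= log_c).astype(float)

idx = rng.choice(N, size=r, replace=False)           # uniform subsample (placeholder weights)
xs, ys, ds = x[idx], log_y[idx], delta[idx]

def smoothed_gehan(beta):
    """Induced-smoothed Gehan estimating function on the subsample."""
    e = ys - beta * xs                               # residuals e_i(beta)
    diff_e = e[None, :] - e[:, None]                 # e_j - e_i
    diff_x = xs[:, None] - xs[None, :]               # x_i - x_j
    h = np.sqrt(diff_x**2 / r + 1e-8)                # bandwidth choice (an assumption)
    w = stats.norm.cdf(diff_e / h)                   # smooth surrogate of 1{e_i <= e_j}
    return np.sum(ds[:, None] * diff_x * w) / r**2

beta_hat = optimize.brentq(smoothed_gehan, -5.0, 5.0)
print("Rank-based AFT slope estimated from the subsample:", beta_hat)
```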

Citations: 0
Group sequential designs for clinical trials when the maximum sample size is uncertain.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-30, Epub Date: 2024-08-21, DOI: 10.1002/sim.10203
Amin Yarahmadi, Lori E Dodd, Thomas Jaki, Peter Horby, Nigel Stallard

Motivated by the experience of COVID-19 trials, we consider clinical trials in the setting of an emerging disease in which the uncertainty of natural disease course and potential treatment effects makes advance specification of a sample size challenging. One approach to such a challenge is to use a group sequential design to allow the trial to stop on the basis of interim analysis results as soon as a conclusion regarding the effectiveness of the treatment under investigation can be reached. As such a trial may be halted before a formal stopping boundary is reached, we consider the final analysis under such a scenario, proposing alternative methods for when the decision to halt the trial is made with or without knowledge of interim analysis results. We address the problems of ensuring that the type I error rate neither exceeds nor falls unnecessarily far below the nominal level. We also propose methods in which there is no maximum sample size, the trial continuing either until the stopping boundary is reached or it is decided to halt the trial.
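The type I error question is easy to probe by simulation. The sketch below is not the authors' proposed procedure: it estimates the empirical two-sided type I error of a two-look design with commonly tabulated O'Brien-Fleming-type critical values when the trial may stop at the interim look. The sample sizes and number of simulations are arbitrary choices.

```python
# Minimal Monte Carlo sketch: empirical type I error of a two-look
# group sequential z-test with early stopping at the interim analysis.
import numpy as np

rng = np.random.default_rng(4)
n_per_look, n_sims = 100, 50_000
c1, c2 = 2.797, 1.977          # two-look O'Brien-Fleming-type critical values

rejections = 0
for _ in range(n_sims):
    stage1 = rng.normal(size=n_per_look)           # data generated under the null
    z1 = stage1.mean() / (stage1.std(ddof=1) / np.sqrt(n_per_look))
    if abs(z1) >= c1:                              # early stopping for efficacy
        rejections += 1
        continue
    stage2 = rng.normal(size=n_per_look)
    pooled = np.concatenate([stage1, stage2])
    z2 = pooled.mean() / (pooled.std(ddof=1) / np.sqrt(2 * n_per_look))
    rejections += abs(z2) >= c2

print("Empirical type I error:", rejections / n_sims)
```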

Citations: 0
Estimating causes of maternal death in data-sparse contexts.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-30, Epub Date: 2024-08-27, DOI: 10.1002/sim.10199
Michael Y C Chong, Marija Pejchinovska, Monica Alexander

Understanding the underlying causes of maternal death across all regions of the world is essential to inform policies and resource allocation to reduce the mortality burden. However, in many countries there exists very little data on the causes of maternal death, and data that do exist do not capture the entire population at risk. In this article, we present a Bayesian hierarchical multinomial model to estimate maternal cause of death distributions globally, regionally, and for all countries worldwide. The framework combines data from various sources to inform estimates, including data from civil registration and vital systems, smaller-scale surveys and studies, and high-quality data from confidential enquiries and surveillance systems. The framework accounts for varying data quality and coverage, and allows for situations where one or more causes of death are missing. We illustrate the results of the model on three case-study countries that have different data availability situations.
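A drastically simplified, single-country version of the borrowing-of-strength idea (not the paper's hierarchical model) can be written as a conjugate Dirichlet-multinomial update that shrinks sparse country counts toward an assumed regional cause distribution; all numbers below are hypothetical.

```python
# Minimal sketch: conjugate Dirichlet-multinomial smoothing of sparse
# cause-of-death counts toward a regional distribution.
import numpy as np

causes = ["haemorrhage", "hypertensive", "sepsis", "abortion", "indirect", "other"]
regional_dist = np.array([0.27, 0.14, 0.11, 0.08, 0.25, 0.15])   # assumed regional shares
concentration = 50                                               # prior "sample size"

country_counts = np.array([12, 3, 5, 0, 9, 4])                   # sparse country data

alpha_post = concentration * regional_dist + country_counts      # Dirichlet posterior
posterior_mean = alpha_post / alpha_post.sum()

rng = np.random.default_rng(0)
draws = rng.dirichlet(alpha_post, size=5000)                     # posterior uncertainty
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)

for k, cause in enumerate(causes):
    print(f"{cause:>13}: {posterior_mean[k]:.3f} ({lower[k]:.3f}, {upper[k]:.3f})")
```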

Citations: 0
Estimands and Cumulative Incidence Function Regression in Clinical Trials: Some New Results on Interpretability and Robustness.
IF 1.8, Medicine (Tier 4), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-29, DOI: 10.1002/sim.10236
Alexandra Bühler, Richard J Cook, Jerald F Lawless

Regression analyses based on transformations of cumulative incidence functions are often adopted when modeling and testing for treatment effects in clinical trial settings involving competing and semi-competing risks. Common frameworks include the Fine-Gray model and models based on direct binomial regression. Using large sample theory, we derive the limiting values of treatment effect estimators based on such models when the data are generated according to multiplicative intensity-based models, and show that the estimand is sensitive to several process features. The rejection rates of hypothesis tests based on cumulative incidence function regression models are also examined for null hypotheses of different types, based on which a robustness property is established. In such settings, supportive secondary analyses of treatment effects are essential to ensure a full understanding of the nature of treatment effects. An application to a palliative study of individuals with breast cancer metastatic to bone is provided for illustration.
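For orientation, the sketch below computes a nonparametric cumulative incidence function for one competing event by treatment arm on simulated data, i.e. the Aalen-Johansen-type quantity that regression models such as Fine-Gray build on; it does not implement the estimators or large-sample results studied in the paper.

```python
# Minimal sketch: nonparametric cumulative incidence of a competing event,
# CIF(t) = sum over event times s <= t of S(s-) * d_event(s) / n(s).
import numpy as np

def cumulative_incidence(time, status, event_code=1):
    """status: 0 = censored, 1 = event of interest, 2 = competing event."""
    order = np.argsort(time)
    time, status = np.asarray(time)[order], np.asarray(status)[order]
    n_at_risk = np.arange(len(time), 0, -1)
    haz_any = (status > 0) / n_at_risk                       # hazard of any event
    surv_before = np.concatenate([[1.0], np.cumprod(1 - haz_any)[:-1]])
    cif = np.cumsum(surv_before * (status == event_code) / n_at_risk)
    return time, cif

rng = np.random.default_rng(9)
n = 400
arm = rng.binomial(1, 0.5, size=n)
t1 = rng.exponential(scale=np.where(arm == 1, 14.0, 10.0))   # event of interest
t2 = rng.exponential(scale=12.0, size=n)                     # competing event
cens = rng.uniform(5, 25, size=n)
time = np.minimum.reduce([t1, t2, cens])
status = np.select([cens <= np.minimum(t1, t2), t1 <= t2], [0, 1], default=2)

for a in (0, 1):
    tt, cif = cumulative_incidence(time[arm == a], status[arm == a])
    print(f"Arm {a}: CIF of the event of interest at t=10 is {cif[tt <= 10][-1]:.3f}")
```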

Citations: 0