Statistics in Medicine: Latest Articles

Smooth Hazards With Multiple Time Scales.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-15 Epub Date: 2024-12-09 DOI: 10.1002/sim.10297
Angela Carollo, Paul Eilers, Hein Putter, Jutta Gampe

Hazard models are the most commonly used tool to analyze time-to-event data. If more than one time scale is relevant for the event under study, models are required that can incorporate the dependence of a hazard along two (or more) time scales. Such models should be flexible to capture the joint influence of several time scales, and nonparametric smoothing techniques are obvious candidates. P-splines offer a flexible way to specify such hazard surfaces, and estimation is achieved by maximizing a penalized Poisson likelihood. Standard observation schemes, such as right-censoring and left-truncation, can be accommodated in a straightforward manner. Proportional hazards regression with a baseline hazard varying over two time scales is presented. Efficient computation is possible by generalized linear array model (GLAM) algorithms or by exploiting a sparse mixed model formulation. A companion R-package is provided.
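
To make the phrase "maximizing a penalized Poisson likelihood" concrete, the following is a minimal sketch and not the authors' method or the companion R package. It is simplified to a single time scale, uses one coefficient per time bin (the degree-0 special case of a B-spline basis), and applies a second-order difference penalty. The function names, the example counts, and the smoothing parameter `lam` are all hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def smooth_hazard(events, exposure, lam=10.0, iters=50):
    """Penalized Poisson fit of a log-hazard per bin (assumed simplification).

    Model: events[i] ~ Poisson(exposure[i] * exp(eta[i])), with penalty
    (lam / 2) * sum of squared second differences of eta.
    """
    n = len(events)
    # Penalty matrix P = D'D, where D takes second-order differences.
    P = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        d = {i: 1.0, i + 1: -2.0, i + 2: 1.0}
        for r, dr in d.items():
            for c, dc in d.items():
                P[r][c] += dr * dc
    # Start from the constant hazard: total events / total exposure.
    eta = [math.log(sum(events) / sum(exposure))] * n
    for _ in range(iters):
        mu = [exposure[i] * math.exp(eta[i]) for i in range(n)]
        # Newton / IRLS step: (W + lam * P) eta_new = W eta + (y - mu), W = diag(mu).
        A = [[lam * P[r][c] + (mu[r] if r == c else 0.0) for c in range(n)]
             for r in range(n)]
        rhs = [mu[i] * eta[i] + events[i] - mu[i] for i in range(n)]
        eta = solve(A, rhs)
    return [math.exp(v) for v in eta]

# Hypothetical event counts and person-time at risk in eight time bins.
events = [2, 5, 3, 8, 9, 12, 10, 15]
exposure = [100.0] * 8
hazard = smooth_hazard(events, exposure)
```

Right-censoring enters only through the exposure (person-time at risk) in each bin, which is one reason the Poisson formulation accommodates such observation schemes easily. A two-time-scale version would replace the per-bin coefficients with a tensor-product B-spline basis and penalize along both dimensions.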

Citations: 0
Linear Mixed Modeling of Federated Data When Only the Mean, Covariance, and Sample Size Are Available.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-15 Epub Date: 2024-12-11 DOI: 10.1002/sim.10300
Marie Analiz April Limpoco, Christel Faes, Niel Hens

In medical research, individual-level patient data provide invaluable information, but the patients' right to confidentiality remains of utmost priority. This poses a huge challenge when estimating statistical models such as a linear mixed model, which is an extension of linear regression models that can account for potential heterogeneity whenever data come from different data providers. Federated learning tackles this hurdle by estimating parameters without retrieving individual-level data. Instead, iterative communication of parameter estimate updates between the data providers and analysts is required. In this article, we propose an alternative framework to federated learning for fitting linear mixed models. Specifically, our approach only requires the mean, covariance, and sample size of multiple covariates from different data providers once. Using the principle of statistical sufficiency within the likelihood framework as theoretical support, this proposed strategy achieves estimates identical to those derived from actual individual-level data. We demonstrate this approach through real data on 15 068 patient records from 70 clinics at the Children's Hospital of Pennsylvania. Assuming that each clinic only shares summary statistics once, we model the COVID-19 polymerase chain reaction test cycle threshold as a function of patient information. Simplicity, communication efficiency, generalisability, and wider scope of implementation in any statistical software distinguish our approach from existing strategies in the literature.
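
The sufficiency argument can be illustrated with a much simpler model than the one in the paper. The sketch below is an assumption-laden simplification: it fits an ordinary least-squares line with one covariate rather than a linear mixed model, and all function names and data are hypothetical. Each site shares only its sample size, means, and (co)variances, yet the pooled fit matches the fit on the stacked individual-level data.

```python
def summarize(xs, ys):
    """What a site would share: sample size, means, and sample (co)variances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return {"n": n, "mean_x": mx, "mean_y": my, "var_x": var_x, "cov_xy": cov_xy}

def pooled_ols(summaries):
    """Rebuild the pooled sums from the summaries, then solve for the OLS line."""
    N = sum(s["n"] for s in summaries)
    sum_x = sum(s["n"] * s["mean_x"] for s in summaries)
    sum_y = sum(s["n"] * s["mean_y"] for s in summaries)
    # sum of x^2 = (n - 1) * var + n * mean^2, and likewise for the cross term.
    sum_xx = sum((s["n"] - 1) * s["var_x"] + s["n"] * s["mean_x"] ** 2
                 for s in summaries)
    sum_xy = sum((s["n"] - 1) * s["cov_xy"] + s["n"] * s["mean_x"] * s["mean_y"]
                 for s in summaries)
    slope = (sum_xy - sum_x * sum_y / N) / (sum_xx - sum_x ** 2 / N)
    intercept = (sum_y - slope * sum_x) / N
    return intercept, slope

def direct_ols(xs, ys):
    """Reference fit on individual-level data, for comparison only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Two hypothetical sites; each needs at least two records for a sample variance.
site_a = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 8.1])
site_b = ([2.0, 5.0, 7.0], [4.2, 9.8, 14.1])
pooled = pooled_ols([summarize(*site_a), summarize(*site_b)])
direct = direct_ols(site_a[0] + site_b[0], site_a[1] + site_b[1])
```

The two results agree up to floating-point rounding because the pooled sums and cross-products are exactly recoverable from each site's mean, covariance, and sample size, which is the sense in which a single round of communication suffices.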

Citations: 0
Sieve Maximum Likelihood Estimation of Partially Linear Transformation Models With Interval-Censored Data.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-30 Epub Date: 2024-11-14 DOI: 10.1002/sim.10225
Changhui Yuan, Shishun Zhao, Shuwei Li, Xinyuan Song

Partially linear models provide a valuable tool for modeling failure time data with nonlinear covariate effects. Their applicability and importance in survival analysis have been widely acknowledged. To date, numerous inference methods for such models have been developed under traditional right censoring. However, the existing studies seldom target interval-censored data, which provide more coarse information and frequently occur in many scientific studies involving periodical follow-up. In this work, we propose a flexible class of partially linear transformation models to examine parametric and nonparametric covariate effects for interval-censored outcomes. We consider the sieve maximum likelihood estimation approach that approximates the cumulative baseline hazard function and nonparametric covariate effect with the monotone splines and B-splines, respectively. We develop an easy-to-implement expectation-maximization algorithm coupled with three-stage data augmentation to facilitate maximization. We establish the consistency of the proposed estimators and the asymptotic distribution of parametric components based on the empirical process techniques. Numerical results from extensive simulation studies indicate that our proposed method performs satisfactorily in finite samples. An application to a study on hypobaric decompression sickness suggests that the variable TR360 exhibits a significant dynamic and nonlinear effect on the risk of developing hypobaric decompression sickness.

Citations: 0
Quantifying Overdiagnosis for Multicancer Detection Tests: A Novel Method.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-30 Epub Date: 2024-11-26 DOI: 10.1002/sim.10285
Stuart G Baker

Multicancer detection (MCD) tests use blood specimens to detect preclinical cancers. A major concern is overdiagnosis, the detection of preclinical cancer on screening that would not have developed into symptomatic cancer in the absence of screening. Because overdiagnosis can lead to unnecessary and harmful treatments, its quantification is important. A key metric is the screen overdiagnosis fraction (SOF), the probability of overdiagnosis at screen detection. Estimating SOF is notoriously difficult because overdiagnosis is not observed. This estimation is more challenging with MCD tests because short-term results are needed as the technology is rapidly changing. To estimate average SOF for a program of yearly MCD tests, I introduce a novel method that requires at least two yearly MCD tests given to persons having a wide range of ages and applies only to cancers for which there is no conventional screening. The method assumes an exponential distribution for the sojourn time in an operational screen-detectable preclinical cancer (OPC) state, defined as once screen-detectable (positive screen and work-up), always screen-detectable. Because this assumption appears in only one term in the SOF formula, the results are robust to violations of the assumption. An SOF plot graphs average SOF versus mean sojourn time. With lung cancer screening data and synthetic data, SOF plots distinguished small from moderate levels of SOF. With its unique set of assumptions, the SOF plot would complement other modeling approaches for estimating SOF once sufficient short-term observational data on MCD tests become available.

Citations: 0
Bayesian Decision Curve Analysis With bayesDCA.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-30 Epub Date: 2024-12-01 DOI: 10.1002/sim.10277
Giuliano Netto Flores Cruz, Keegan Korthauer

Clinical decisions are often guided by clinical prediction models or diagnostic tests. Decision curve analysis (DCA) combines classical assessment of predictive performance with the consequences of using these strategies for clinical decision-making. In DCA, the best decision strategy is the one that maximizes the net benefit: the net number of true positives (or negatives) provided by a given strategy. Here, we employ Bayesian approaches to DCA, addressing four fundamental concerns when evaluating clinical decision strategies: (i) which strategies are clinically useful, (ii) what is the best available decision strategy, (iii) which of two competing strategies is better, and (iv) what is the expected net benefit loss associated with the current level of uncertainty. While often consistent with frequentist point estimates, fully Bayesian DCA allows for an intuitive probabilistic interpretation framework and the incorporation of prior evidence. We evaluate the methods using simulation and provide a comprehensive case study. Software implementation is available in the bayesDCA R package. Ultimately, the Bayesian DCA workflow may help clinicians and health policymakers adopt better-informed decisions.
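
As a rough illustration of net benefit and its Bayesian treatment, the sketch below is not the bayesDCA R package or its API. It assumes the standard net-benefit definition at a decision threshold t, NB = TP/n - (FP/n) * t/(1 - t), and, as a simplifying assumption, independent Beta(1, 1) priors on prevalence, sensitivity, and specificity. All names and counts are hypothetical.

```python
import random

def net_benefit(tp, fp, n, threshold):
    """Point-estimate net benefit at a given decision threshold."""
    w = threshold / (1.0 - threshold)  # odds of the threshold probability
    return tp / n - w * fp / n

def posterior_net_benefit(tp, fn, fp, tn, threshold, draws=4000, seed=1):
    """Posterior draws of net benefit for the model and for treat-all.

    Assumes independent Beta(1, 1) priors, so each posterior is a Beta
    with the observed counts added to the prior parameters.
    """
    rng = random.Random(seed)
    w = threshold / (1.0 - threshold)
    nb_model, nb_all = [], []
    for _ in range(draws):
        prev = rng.betavariate(1 + tp + fn, 1 + fp + tn)
        sens = rng.betavariate(1 + tp, 1 + fn)
        spec = rng.betavariate(1 + tn, 1 + fp)
        nb_model.append(sens * prev - w * (1 - spec) * (1 - prev))
        nb_all.append(prev - w * (1 - prev))
    return nb_model, nb_all

def prob_useful(nb_model, nb_all):
    """Posterior probability that the model beats both treat-all and treat-none."""
    wins = sum(1 for m, a in zip(nb_model, nb_all) if m > max(a, 0.0))
    return wins / len(nb_model)

# Hypothetical 2x2 table: 50 diseased (40 TP, 10 FN), 150 non-diseased (20 FP, 130 TN).
nb_model, nb_all = posterior_net_benefit(tp=40, fn=10, fp=20, tn=130, threshold=0.2)
p_useful = prob_useful(nb_model, nb_all)
```

Working with posterior draws makes question (i) in the abstract a direct probability statement: the fraction of draws in which the model's net benefit exceeds both default strategies.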

Citations: 0
Sensitivity Analysis for Effects of Multiple Exposures in the Presence of Unmeasured Confounding: Non-Gaussian and Time-to-Event Outcomes.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-30 Epub Date: 2024-12-01 DOI: 10.1002/sim.10293
Seungjae Lee, Boram Jeong, Donghwan Lee, Woojoo Lee

In epidemiological studies, evaluating the health impacts stemming from multiple exposures is one of the important goals. To analyze the effects of multiple exposures on discrete or time-to-event health outcomes, researchers often employ generalized linear models, Cox proportional hazards models, and machine learning methods. However, observational studies are prone to unmeasured confounding factors, which can introduce the potential for substantial bias in the multiple exposure effects. To address this issue, we propose a novel outcome model-based sensitivity analysis method for non-Gaussian and time-to-event outcomes with multiple exposures. All the proposed sensitivity analysis problems are formulated as linear programming problems with quadratic and linear constraints, which can be solved efficiently. Analytic solutions are provided for some optimization problems, and a numerical study is performed to examine how the proposed sensitivity analysis behaves in finite samples. We illustrate the proposed method using two real data examples.

Citations: 0
Double Sampling for Informatively Missing Data in Electronic Health Record-Based Comparative Effectiveness Research.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-30 Epub Date: 2024-12-05 DOI: 10.1002/sim.10298
Alexander W Levis, Rajarshi Mukherjee, Rui Wang, Heidi Fischer, Sebastien Haneuse

Missing data arise in most applied settings and are ubiquitous in electronic health records (EHR). When data are missing not at random (MNAR) with respect to measured covariates, sensitivity analyses are often considered. These solutions, however, are often unsatisfying in that they are not guaranteed to yield actionable conclusions. Motivated by an EHR-based study of long-term outcomes following bariatric surgery, we consider the use of double sampling as a means to mitigate MNAR outcome data when the statistical goals are estimation and inference regarding causal effects. We describe assumptions that are sufficient for the identification of the joint distribution of confounders, treatment, and outcome under this design. Additionally, we derive efficient and robust estimators of the average causal treatment effect under a nonparametric model and under a model assuming outcomes were, in fact, initially missing at random (MAR). We compare these in simulations to an approach that adaptively estimates based on evidence of violation of the MAR assumption. Finally, we also show that the proposed double sampling design can be extended to handle arbitrary coarsening mechanisms, and derive nonparametric efficient estimators of any smooth full data functional.

Citations: 0
Dynamic Treatment Regimes on Dyadic Networks.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-30 Epub Date: 2024-11-28 DOI: 10.1002/sim.10278
Marizeh Mussavi Rizi, Joel A Dubin, Micheal P Wallace

Identifying interventions that are optimally tailored to each individual is of significant interest in various fields, in particular precision medicine. Dynamic treatment regimes (DTRs) employ sequences of decision rules that utilize individual patient information to recommend treatments. However, the assumption that an individual's treatment does not impact the outcomes of others, known as the no interference assumption, is often challenged in practical settings. For example, in infectious disease studies, the vaccine status of individuals in close proximity can influence the likelihood of infection. Imposing this assumption when it, in fact, does not hold, may lead to biased results and impact the validity of the resulting DTR optimization. We extend the estimation method of dynamic weighted ordinary least squares (dWOLS), a doubly robust and easily implemented approach for estimating optimal DTRs, to incorporate the presence of interference within dyads (i.e., pairs of individuals). We formalize an appropriate outcome model and describe the estimation of an optimal decision rule in the dyadic-network context. Through comprehensive simulations and analysis of the Population Assessment of Tobacco and Health (PATH) data, we demonstrate the improved performance of the proposed joint optimization strategy compared to the current state-of-the-art conditional optimization methods in estimating the optimal treatment assignments when within-dyad interference exists.

Citations: 0
Transformed ROC Curve for Biomarker Evaluation.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-30 Epub Date: 2024-11-12 DOI: 10.1002/sim.10268
Jianping Yang, Pei-Fen Kuan, Xiangyu Li, Jialiang Li, Xiao-Hua Zhou

To complement the conventional area under the ROC curve (AUC), which cannot fully describe the diagnostic accuracy of some non-standard biomarkers, we introduce a transformed ROC curve and its associated transformed AUC (TAUC), and show that TAUC can relate the original improper biomarker to a proper biomarker after a non-monotone transformation. We then provide nonparametric estimators of the non-monotone transformation and of TAUC, and establish their consistency and asymptotic normality. We conduct extensive simulation studies to assess the performance of the proposed TAUC method and compare it with traditional methods. Case studies on real biomedical data illustrate the proposed method, which identifies additional important biomarkers that tend to escape traditional screening.
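The motivation in the abstract above can be illustrated with a small simulation. This is a hand-built toy example, not the authors' nonparametric TAUC estimator. It shows a biomarker that is elevated in either direction among cases, so the conventional AUC is close to 0.5, while a non-monotone transformation recovers the signal. The transformation is simply assumed here to be the absolute value, and the data and the helper `empirical_auc` are hypothetical.

```python
import numpy as np

def empirical_auc(cases, controls):
    """Mann-Whitney estimate of P(case > control) + 0.5 * P(tie)."""
    diff = cases[:, None] - controls[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

rng = np.random.default_rng(1)
n = 500
controls = rng.normal(0.0, 1.0, size=n)
# Cases deviate in either direction: half are centered at -3, half at +3.
cases = rng.normal(0.0, 1.0, size=n) + np.repeat([-3.0, 3.0], n // 2)

auc_raw = empirical_auc(cases, controls)                      # near 0.5
auc_folded = empirical_auc(np.abs(cases), np.abs(controls))   # well above 0.5
print(f"raw AUC = {auc_raw:.3f}, AUC after |x| = {auc_folded:.3f}")
```

A screening step based only on the raw AUC would discard this biomarker, which is the kind of case the transformed ROC curve is meant to catch.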

Statistics in Medicine, pp. 5681-5697.
Citations: 0
Shape Mediation Analysis in Alzheimer's Disease Studies.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-30 Epub Date: 2024-11-12 DOI: 10.1002/sim.10265
Xingcai Zhou, Miyeon Yeon, Jiangyan Wang, Shengxian Ding, Kaizhou Lei, Yanyong Zhao, Rongjie Liu, Chao Huang

As a crucial tool in neuroscience, mediation analysis has been developed and widely adopted to elucidate the role of intermediary variables derived from neuroimaging data. Typically, structural equation models (SEMs) are employed to investigate the influences of exposures on outcomes, with model coefficients being interpreted as causal effects. While existing SEMs have proven to be effective tools for mediation analysis involving various neuroimaging-related mediators, limited research has explored scenarios where these mediators are derived from the shape space. In addition, the linear relationship assumption adopted in existing SEMs may lead to substantial efficiency losses and decreased predictive accuracy in real-world applications. To address these challenges, we introduce a novel framework for shape mediation analysis, designed to explore the causal relationships between genetic exposures and clinical outcomes, whether or not they are mediated by shape-related factors, while accounting for potential confounding variables. Within our framework, we apply the square-root velocity function to extract elastic shape representations, which reside within the linear Hilbert space of square-integrable functions. Subsequently, we introduce a two-layer shape regression model to characterize the relationships among neurocognitive outcomes, elastic shape mediators, genetic exposures, and clinical confounders. Both estimation and inference procedures are established for the unknown parameters and the corresponding causal estimands. The asymptotic properties of the estimated quantities are investigated as well. Both simulation studies and real-data analyses demonstrate the superior performance of our proposed method in terms of estimation accuracy and robustness when compared to existing approaches for estimating causal estimands.
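The square-root velocity function mentioned in the abstract above has a simple closed form for a parametrized curve β: q(t) = β'(t) / sqrt(‖β'(t)‖). The sketch below computes it for a sampled curve using finite differences. It is a minimal illustration only, with a hypothetical helper name `srvf` and a unit circle as assumed input, and it does not reproduce the paper's shape regression model. As a sanity check it uses the known property that the squared L2 norm of q equals the length of the curve.

```python
import numpy as np

def srvf(curve, t):
    """Square-root velocity function q = beta' / sqrt(||beta'||) of a sampled curve."""
    velocity = np.gradient(curve, t, axis=0)          # finite-difference derivative
    speed = np.linalg.norm(velocity, axis=1)
    return velocity / np.sqrt(np.maximum(speed, 1e-12))[:, None]  # guard against zero speed

t = np.linspace(0.0, 1.0, 2001)
circle = np.column_stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
q = srvf(circle, t)

# Trapezoidal integral of ||q(t)||^2 over [0, 1]; this should approximate
# the curve length, which is 2*pi for the unit circle.
sq = np.sum(q**2, axis=1)
length = float(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(t)))
```

Because q lives in an ordinary L2 space, standard linear tools such as inner products and regression on basis coefficients can be applied to it, which is what makes the representation convenient as a mediator.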

Statistics in Medicine, pp. 5698-5710.
Citations: 0