Probability of Success for Establishing Noninferiority Across Multiple Visits: Extension of Covariate-Adjusted Bayesian Hierarchical Modeling Framework.
Yujie Zhao, Yiran Hu, Xiaotian Chen, Jenny Jiao, Qiang Guo, Li Wang
Effective decision-making plays a vital role throughout the drug development process, particularly once a proof-of-concept (POC) or phase II study has been completed. To determine whether to proceed to a larger-scale, confirmatory phase III study, it is critically important to assess the uncertainty about the underlying treatment effect and the probability of success (POS) of the phase III study. In this paper, we propose and investigate a Bayesian covariate-adjusted hierarchical modeling approach that leverages historical data with longitudinal outcomes to quantitatively assess the POS of a confirmatory phase III trial. Although historical data borrowing methods are widely used and known for their advantages in alleviating recruitment and ethical challenges as well as improving trial operational efficiency, their application to predicting the POS of a future trial with longitudinal outcomes over multiple visits poses methodological challenges. This paper not only provides a comprehensive modeling approach but also demonstrates how the proposed model can be used in a Go/No-Go decision-making framework, using a glaucoma eye care project as an example. For the approval of new drugs targeting glaucoma, regulatory agencies typically require a pivotal phase III trial to demonstrate noninferiority to a standard-of-care treatment, which may involve simultaneously meeting both statistical and clinical margins across multiple visits. Simulations were performed to evaluate the key factors that affect the operating characteristics, such as between-trial heterogeneity, subject-level variance, and between-visit correlation. The proposed decision-making framework can also be applied to studies in other therapeutic areas with similar settings.
{"title":"Probability of Success for Establishing Noninferiority Across Multiple Visits: Extension of Covariate-Adjusted Bayesian Hierarchical Modeling Framework.","authors":"Yujie Zhao, Yiran Hu, Xiaotian Chen, Jenny Jiao, Qiang Guo, Li Wang","doi":"10.1002/sim.70423","DOIUrl":"https://doi.org/10.1002/sim.70423","url":null,"abstract":"<p><p>Effective decision-making plays a vital role throughout the drug development process, particularly when a proof-of-concept (POC) or phase II study has been completed. To determine whether to proceed to a larger-scale, confirmatory phase III study, assessing the uncertainty about the underlying treatment effect and the probability of success (POS) in the phase III study is of critical importance. In this paper, we proposed and investigated a Bayesian covariate-adjusted hierarchical modeling approach leveraging historical data with longitudinal outcome to quantitatively assess the POS of the confirmatory phase III trial. Although historical data borrowing methods are widely used and known for the advantages in alleviating recruitment and ethical challenges as well as improving trial operational efficiency, its application to predicting future trial POS with longitudinal outcome over multiple visits pose methodological challenges. This paper not only provided a comprehensive modeling approach but also demonstrated how the proposed model can be used in a Go/No-Go decision-making framework with a glaucoma eye care project example. For the approval of new drugs targeting glaucoma, regulatory agencies typically require a pivotal phase III trial to demonstrate noninferiority compared to a standard of care treatment. This may involve meeting both statistical and clinical margins across multiple visits simultaneously. Simulations were performed to evaluate the key factors that affect the operating characteristics, such as between-trial heterogeneity, subject-level variance and between-visit correlation. The proposed decision-making framework can also be applied to studies in other therapeutical areas with similar settings.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70423"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146150698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dirichlet Distribution Parameter Estimation With Applications in Microbiome Analyses.
Daniel T Fuller, Sumona Mondal, Shantanu Sur, Nabendu Pal
Microbiome analysis is the process of identifying the composition and function of a community of microorganisms in a particular location, which is essential to understanding human and environmental health. Properly quantifying microbial composition, however, remains challenging and relies on statistical modeling of either the raw taxonomic abundances or the relative abundances. Relative abundance measures are commonly preferred over absolute abundances for microbiome analysis because absolute abundance values depend on the sequencing depth and sequencing method. Despite this, literature on modeling relative abundance with a meaningful probability distribution, followed by subsequent statistical inference, is limited. In this work, the Dirichlet distribution is proposed to model the relative abundances of taxa directly, without any further transformation (e.g., the additive log-ratio or isometric log-ratio transform). In a comprehensive simulation study, we compare the biases and standard errors of two method of moments estimators (MMEs) and the maximum likelihood estimator (MLE) of the Dirichlet distribution. The comparison covers three cases of differing sample size and dimension: (i) small dimension and small sample size; (ii) small dimension and large sample size; (iii) large dimension with both small and large sample sizes. As expected, the MLE shows the best overall performance because, being based on the (minimal) sufficient statistics, it incurs no loss of information. We then explore the asymptotic properties of the MLE using the Fisher information alongside our simulation results. We demonstrate the applicability of the Dirichlet modeling methodology on four real-world microbiome datasets and show how the estimated mean relative abundances obtained from the Dirichlet MLE (DMLE) differ from those obtained by a commonly used method, the Bayesian Dirichlet-multinomial estimator (BDME), which works with absolute abundances. For all four datasets, the DMLE results are comparable to the BDME results while requiring much less computational time, both for single uses and for large simulations.
{"title":"Dirichlet Distribution Parameter Estimation With Applications in Microbiome Analyses.","authors":"Daniel T Fuller, Sumona Mondal, Shantanu Sur, Nabendu Pal","doi":"10.1002/sim.70454","DOIUrl":"https://doi.org/10.1002/sim.70454","url":null,"abstract":"<p><p>Microbiome analysis is the process of identifying the composition and function of a community of microorganisms in a particular location, which is essential in understanding human and environmental health. Properly quantifying microbial composition, however, remains challenging and relies on statistical modeling of either the raw taxonomic abundances or the relative abundances. Relative abundance measures are commonly preferred over the absolute abundances for microbiome analysis because absolute abundance values are dependent on the sequencing depth and sequencing method. Despite this, literature on modeling relative abundance by meaningful probability distribution, followed by subsequent statistical inferences, is limited. In this work, the Dirichlet distribution is proposed to model the relative abundances of taxa directly without the use of any further transformation (e.g., additive log-ratio transform, isometric log-ratio transform). In a comprehensive simulation study, we have compared biases and standard errors of two methods of moments estimators (MMEs) and the maximum likelihood estimator (MLE) of the Dirichlet distribution. comparison of these estimators is done over three cases of differing sample size and dimension: (i) Small dimension and small sample size; (ii) small dimension and large sample size; (iii) large dimension with both small and large sample sizes. As expected, the MLE shows the overall best performance because there is no loss of information since this estimator is based on the (minimal) sufficient statistics. We then explore the asymptotic properties of the MLE utilizing the Fisher information alongside our simulation results. We demonstrate the applicability of Dirichlet modeling methodology with four real world microbiome datasets and show how the estimated mean relative abundances obtained from the Dirichlet MLE (DMLE) differ from those obtained by a commonly used method, that is-Bayesian Dirichlet-multinomial estimator (BDME), which works with absolute abundances. For all the four datasets, the DMLE results are comparable to the BDME results while requiring much less computational time for both single uses and for large simulations.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70454"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146221398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Bayesian Treatment Selection Design for Phase II Randomised Cancer Clinical Trials.
Moka Komaki, Satoru Shinoda, Haiyan Zheng, Kouji Yamamoto
It is crucial to design Phase II cancer clinical trials that balance the efficiency of treatment selection with clinical practicality. Sargent and Goldberg proposed a frequentist design that allows decision-making even when the primary endpoint is ambiguous. However, frequentist approaches rely on fixed thresholds and long-run frequency properties, which can limit flexibility in practical applications. In contrast, a Bayesian decision rule based on posterior probabilities enables transparent decision-making by incorporating prior knowledge and updating beliefs with new data, addressing some of the inherent limitations of frequentist designs. In this study, we propose a novel Bayesian design that allows selection of the best-performing treatment. Specifically, for phase II clinical trials with a binary outcome, our decision rule employs a posterior interval probability, obtained by integrating the joint distribution over all values for which the 'success rate' of the best-performing treatment exceeds that of the others. The design can then determine which treatment should proceed to the next phase, given predefined decision thresholds. Furthermore, we propose two sample size determination methods to support such treatment selection designs in a Bayesian framework. Through simulation studies and real-data applications, we demonstrate how this approach can overcome challenges related to sample size constraints in randomised trials. In addition, we present a user-friendly R Shiny application that enables clinicians to conduct Bayesian designs. Both our methodology and the software application can advance the design and analysis of clinical trials for evaluating cancer treatments.
{"title":"A Bayesian Treatment Selection Design for Phase II Randomised Cancer Clinical Trials.","authors":"Moka Komaki, Satoru Shinoda, Haiyan Zheng, Kouji Yamamoto","doi":"10.1002/sim.70444","DOIUrl":"10.1002/sim.70444","url":null,"abstract":"<p><p>It is crucial to design Phase II cancer clinical trials that balance the efficiency of treatment selection with clinical practicality. Sargent and Goldberg proposed a frequentist design that allows decision-making even when the primary endpoint is ambiguous. However, frequentist approaches rely on fixed thresholds and long-run frequency properties, which can limit flexibility in practical applications. In contrast, the Bayesian decision rule, based on posterior probabilities, enables transparent decision-making by incorporating prior knowledge and updating beliefs with new data, addressing some of the inherent limitations of frequentist designs. In this study, we propose a novel Bayesian design, allowing the selection of the best-performing treatment. Specifically, concerning phase II clinical trials with a binary outcome, our decision rule employs posterior interval probability by integrating the joint distribution over all values, for which the 'success rate' of the best-performing treatment is greater than that of the others. This design can then determine which treatment should proceed to the next phase, given predefined decision thresholds. Furthermore, we propose two sample size determination methods to empower such treatment selection designs implemented in a Bayesian framework. Through simulation studies and real-data applications, we demonstrate how this approach can overcome challenges related to sample size constraints in randomised trials. In addition, we present a user-friendly R Shiny application, enabling clinicians to conduct Bayesian designs. Both our methodology and the software application can advance the design and analysis of clinical trials for evaluating cancer treatments.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70444"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12911244/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146214233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Response-Adaptive Randomization for Cluster Randomized Controlled Trials.
Yunyi Liu, Maile Young Karris, Sonia Jain
Cluster randomized controlled trials, in which groups (or clusters) of individuals rather than single individuals are randomized, are especially useful when individual-level randomization is not feasible or when interventions are naturally delivered at the group level. Balanced randomization in the cluster randomized trial setting can pose logistical challenges and strain resources if subjects are randomized to a non-optimal arm. We propose a Bayesian response-adaptive randomization design for cluster randomized controlled trials based on Thompson sampling, which dynamically allocates clusters to the most efficacious treatment arm based on the interim posterior distributions of treatment effects obtained via Markov chain Monte Carlo sampling. Our design also incorporates early stopping rules for efficacy and futility, determined by prespecified posterior probability thresholds. The performance of the proposed design is evaluated across various operating characteristics under multiple settings, including varying intra-cluster correlation coefficients, cluster sizes, and effect sizes. Our adaptive approach is also compared with a standard, parallel two-arm cluster randomized controlled clinical trial design, highlighting improvements in both ethical considerations and efficiency. Through simulation studies based on an HIV behavioral trial, we demonstrate these improvements by preferentially assigning more clusters to the more efficacious intervention while maintaining robust statistical power and controlling false positive rates.
{"title":"Bayesian Response-Adaptive Randomization for Cluster Randomized Controlled Trials.","authors":"Yunyi Liu, Maile Young Karris, Sonia Jain","doi":"10.1002/sim.70386","DOIUrl":"10.1002/sim.70386","url":null,"abstract":"<p><p>Cluster randomized controlled trials where groups (or clusters) of individuals, rather than single individuals, are randomized are especially useful when individual-level randomization is not feasible or when interventions are naturally delivered at the group level. Balanced randomization in the cluster randomized trial setting can pose logistical challenges and strain resources if subjects are randomized to a non-optimal arm. We propose a Bayesian response-adaptive randomization design for cluster randomized controlled trials based on Thompson sampling, which dynamically allocates clusters to the most efficacious treatment arm based on the interim posterior distributions of treatment effects using Markov chain Monte Carlo sampling. Our design also incorporates early stopping rules for efficacy and futility determined by prespecified posterior probability thresholds. The performance of the proposed design is evaluated across various operating characteristics under multiple settings, including varying intra-cluster correlation coefficients, cluster sizes, and effect sizes. Our adaptive approach is also compared with a standard, parallel two-arm cluster randomized controlled clinical trial design, highlighting improvements in both ethical considerations and efficiency. From our simulation studies based on an HIV behavioral trial, we demonstrate these improvements by preferentially assigning more clusters to the more efficacious intervention while maintaining robust statistical power and controlling false positive rates.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70386"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12824830/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Improved Bayesian Pick-the-Winner (IBPW) Design for Randomized Phase II Clinical Trials.
Wanni Lei, Maosen Peng, Nasser Altorki, Xi Kathy Zhou
Phase II clinical trials play a pivotal role in drug development by screening a large number of drug candidates to identify those with promising preliminary efficacy for phase III testing. Trial designs that enable efficient decision-making with small sample sizes and early futility stopping, while controlling type I and type II errors in hypothesis testing, such as Simon's two-stage design, are preferred. Randomized multi-arm trials are increasingly used in phase II settings to overcome the limitations associated with using historical controls as the reference. However, how to effectively balance efficiency and accurate decision-making remains an important research topic. A notable development in phase II randomized design methodology is the Bayesian pick-the-winner (BPW) design proposed by Chen et al. [1]. Despite multiple appealing features, this method cannot easily control the overall type I and type II errors for winner selection. Here, we introduce an improved randomized two-stage Bayesian pick-the-winner (IBPW) design that formalizes winner-selection-based hypothesis testing and optimizes sample sizes and decision cut-offs by strictly controlling the type I and type II errors under a set of flexible hypotheses for winner selection across two treatment arms. Simulation studies demonstrate that our new design offers improved operating characteristics for winner selection while retaining the desirable features of the BPW design.
{"title":"An Improved Bayesian Pick-the-Winner (IBPW) Design for Randomized Phase II Clinical Trials.","authors":"Wanni Lei, Maosen Peng, Nasser Altorki, Xi Kathy Zhou","doi":"10.1002/sim.70348","DOIUrl":"10.1002/sim.70348","url":null,"abstract":"<p><p>Phase II clinical trials play a pivotal role in drug development by screening a large number of drug candidates to identify those with promising preliminary efficacy for phase III testing. Trial designs that enable efficient decision-making with small sample sizes and early futility stopping while controlling for type I and type II errors in hypothesis testing, such as Simon's two-stage design, are preferred. Randomized multi-arm trials are increasingly used in phase II settings to overcome the limitations associated with using historical controls as the reference. However, how to effectively balance efficiency and accurate decision-making continues to be an important research topic. A notable development in phase II randomized design methodology is the Bayesian pick-the-winner (BPW) design proposed by Chen et al. [1]. Despite multiple appealing features, this method cannot easily control for overall type I and type II errors for winner selection. Here, we introduce an improved randomized two-stage Bayesian pick-the-winner (IBPW) design that formalizes the winner-selection based hypothesis testing, optimizes sample sizes and decision cut-offs by strictly controlling the type I and type II errors under a set of flexible hypotheses for winner-selection across two treatment arms. Simulation studies demonstrate that our new design offers improved operating characteristics for winner selection while retaining the desirable features of the BPW design.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70348"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826356/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Overview and Practical Recommendations on Using Shapley Values for Identifying Predictive Biomarkers via CATE Modeling.
David Svensson, Erik Hermansson, Nikolaos Nikolaou, Konstantinos Sechidis, Ilya Lipkovich
In recent years, two parallel research trends have emerged in machine learning, yet their intersections remain largely unexplored. On one hand, there has been a significant increase in literature focused on Individual Treatment Effect (ITE) modeling, particularly targeting the Conditional Average Treatment Effect (CATE) using meta-learner techniques. These approaches often aim to identify causal effects from observational data. On the other hand, the field of Explainable Machine Learning (XML) has gained traction, with various approaches developed to explain complex models and make their predictions more interpretable. A prominent technique in this area is Shapley Additive Explanations (SHAP), which has become mainstream in data science for analyzing supervised learning models. However, there has been limited exploration of SHAP's application in identifying predictive biomarkers through CATE models, a crucial aspect in pharmaceutical precision medicine. We address inherent challenges associated with the SHAP concept in multi-stage CATE strategies and introduce a surrogate estimation approach that is agnostic to the choice of CATE strategy, effectively reducing computational burdens in high-dimensional data. Using this approach, we conduct simulation benchmarking to evaluate the ability to accurately identify biomarkers using SHAP values derived from various CATE meta-learners and Causal Forest.
{"title":"Overview and Practical Recommendations on Using Shapley Values for Identifying Predictive Biomarkers via CATE Modeling.","authors":"David Svensson, Erik Hermansson, Nikolaos Nikolaou, Konstantinos Sechidis, Ilya Lipkovich","doi":"10.1002/sim.70375","DOIUrl":"10.1002/sim.70375","url":null,"abstract":"<p><p>In recent years, two parallel research trends have emerged in machine learning, yet their intersections remain largely unexplored. On one hand, there has been a significant increase in literature focused on Individual Treatment Effect (ITE) modeling, particularly targeting the Conditional Average Treatment Effect (CATE) using meta-learner techniques. These approaches often aim to identify causal effects from observational data. On the other hand, the field of Explainable Machine Learning (XML) has gained traction, with various approaches developed to explain complex models and make their predictions more interpretable. A prominent technique in this area is Shapley Additive Explanations (SHAP), which has become mainstream in data science for analyzing supervised learning models. However, there has been limited exploration of SHAP's application in identifying predictive biomarkers through CATE models, a crucial aspect in pharmaceutical precision medicine. We address inherent challenges associated with the SHAP concept in multi-stage CATE strategies and introduce a surrogate estimation approach that is agnostic to the choice of CATE strategy, effectively reducing computational burdens in high-dimensional data. Using this approach, we conduct simulation benchmarking to evaluate the ability to accurately identify biomarkers using SHAP values derived from various CATE meta-learners and Causal Forest.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70375"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Variable Selection for High-Dimensional Mediation Analysis: Application to Metabolomics Data in Epidemiological Studies.
Youngho Bae, Chanmin Kim, Fenglei Wang, Qi Sun, Kyu Ha Lee
This research is motivated by integrated epidemiological and blood biomarker studies investigating the relationship between long-term adherence to a Mediterranean diet and cardiometabolic health, with plasma metabolomes as potential mediators. Analyzing causal mediation in high-dimensional omics data presents challenges, including complex dependencies among mediators and the need for advanced regularization or Bayesian techniques to ensure stable and interpretable estimation and selection of indirect effects. To this end, we propose a novel Bayesian framework to identify active pathways and estimate indirect effects in high-dimensional mediation analysis. Central to our method is the introduction of a set of priors for the selection indicators in the mediator and outcome models. A Markov random field prior leverages mediator correlations, enhancing power in detecting mediated effects. Sequential subsetting priors encourage simultaneous selection of relevant mediators and their indirect effects, ensuring a more coherent and efficient variable selection framework. Comprehensive simulation studies demonstrate that the proposed method provides superior power in detecting active mediating pathways. We further illustrate the practical utility of the method by applying it to metabolome data from two sub-studies within the Health Professionals Follow-up Study and Nurses' Health Study II, highlighting its effectiveness in a real-data setting.
{"title":"Bayesian Variable Selection for High-Dimensional Mediation Analysis: Application to Metabolomics Data in Epidemiological Studies.","authors":"Youngho Bae, Chanmin Kim, Fenglei Wang, Qi Sun, Kyu Ha Lee","doi":"10.1002/sim.70365","DOIUrl":"10.1002/sim.70365","url":null,"abstract":"<p><p>This research is motivated by integrated epidemiological and blood biomarker studies, investigating the relationship between long-term adherence to a Mediterranean diet and cardiometabolic health, with plasma metabolomes as potential mediators. Analyzing causal mediation in high-dimensional omics data presents challenges, including complex dependencies among mediators and the need for advanced regularization or Bayesian techniques to ensure stable and interpretable estimation and selection of indirect effects. To this end, we propose a novel Bayesian framework to identify active pathways and estimate indirect effects in high-dimensional mediation analysis. Central to our method is the introduction of a set of priors for the selection indicators in the mediator and outcome models. A Markov random field prior leverages mediator correlations, enhancing power in detecting mediated effects. Sequential subsetting priors encourage simultaneous selection of relevant mediators and their indirect effects, ensuring a more coherent and efficient variable selection framework. Comprehensive simulation studies demonstrate that the proposed method provides superior power in detecting active mediating pathways. We further illustrate the practical utility of the method by applying it to metabolome data from two sub-studies within the Health Professionals Follow-up Study and Nurses' Health Study II, highlighting its effectiveness in a real-data setting.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70365"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confidence Interval Construction for Causally Generalized Estimates With Target Sample Summary Information.
Yi Chen, Guanhua Chen, Menggang Yu
Generalizing causal findings, such as the average treatment effect (ATE), from a source to a target population is a critical topic in biomedical research. Differences in the distributions of treatment effect modifiers between these populations, known as covariate shift, can lead to varying ATEs. Chen et al. [1] introduced a weighting method to estimate the target ATE using only summary-level information from a target sample while accounting for possible covariate shifts. However, the asymptotic variance of the estimate was shown to depend on individual-level data from the target sample, hindering statistical inference. In this article, we propose a resampling-based perturbation method for constructing confidence intervals for the estimated target ATE, utilizing additional summary-level information. We demonstrate the effectiveness of our approach in simulation and real-data settings where only summary-level information is available.
{"title":"Confidence Interval Construction for Causally Generalized Estimates With Target Sample Summary Information.","authors":"Yi Chen, Guanhua Chen, Menggang Yu","doi":"10.1002/sim.70358","DOIUrl":"10.1002/sim.70358","url":null,"abstract":"<p><p>Generalizing causal findings, such as the average treatment effect (ATE), from a source to a target population is a critical topic in biomedical research. Differences in the distributions of treatment effect modifiers between these populations, known as covariate shift, can lead to varying ATEs. Chen et al. [1] introduced a weighting method to estimate the target ATE using only summary-level information from a target sample while accounting for the possible covariate shifts. However, the asymptotic variance of the estimate was shown to depend on individual-level data from the target sample, hindering statistical inference. In this article, we propose a resampling-based perturbation method for confidence interval construction for the estimated target ATE, utilizing additional summary-level information. We demonstrate the effectiveness of our approach through simulation and real data settings when only summary-level information is available.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70358"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826351/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sequential Parallel Comparison Design for Assessing Induction, Maintenance, Long-Term, and Other Treatment Effects on a Binary Endpoint.
Hui Quan, Zhixing Xu, Xun Chen
For a chronic disease, besides the treatment induction effect, it is also important to demonstrate the maintenance effect of long-term treatment use. To fulfill these and other objectives of a clinical study, one of three designs is often applied: the active treatment lead-in followed by randomized maintenance design, the randomized induction followed by re-randomized withdrawal maintenance design, and the treat-through design (FDA 2022). Separately, a two-stage sequential parallel comparison design (SPCD) is frequently used in therapeutic areas where placebo has a large effect. In this paper, we use an SPCD for a clinical trial with a binary endpoint for induction, maintenance, long-term, and other treatment effect assessments. This SPCD can be viewed as a hybrid of the above three designs and offers some additional advantages. For example, compared to the re-randomized withdrawal maintenance design, the SPCD does not require re-randomization, which simplifies trial operations, and it also provides controlled data for formal long-term efficacy and safety analyses. To fully utilize all available data from the two stages in an overall treatment effect evaluation, a weighted combination test is considered that incorporates the correlations of the components. Further, a multiple imputation approach is applied to handle data that are missing not at random. Simulations are conducted to evaluate the performance of the methods, and a data example illustrates their application.
{"title":"Sequential Parallel Comparison Design for Assessing Induction, Maintenance, Long-Term, and Other Treatment Effects on a Binary Endpoint.","authors":"Hui Quan, Zhixing Xu, Xun Chen","doi":"10.1002/sim.70382","DOIUrl":"https://doi.org/10.1002/sim.70382","url":null,"abstract":"<p><p>For a chronic disease, besides the treatment induction effect, it is also important to demonstrate the maintenance effect of long-term treatment use. To fulfill these and other objectives for a clinical study, we often apply one of three designs: the active treatment lead-in followed by randomized maintenance design, the randomized induction followed by re-randomized withdrawal maintenance design and the treat-through design (FDA 2022). Separately, a two-stage sequential parallel comparison design (SPCD) is frequently used in therapeutic areas where placebo has a large effect. In this paper, we use a SPCD for a clinical trial with a binary endpoint for induction, maintenance, long-term and other treatment effect assessments. This SPCD can actually be treated as a hybrid of the above three designs and has some additional advantages. For example, compared to the re-randomized withdrawal maintenance design, the SPCD does not need a re-randomization to simplify trial operation and it also provides controlled data for formal long-term efficacy and safety analyses. To fully utilize all available data of the two stages for an overall treatment effect evaluation, a weighted combination test is considered with the incorporation of correlations of the components. Further, a multiple imputation approach is applied to handle missing not at random data. Simulations are conducted to evaluate the performances of the methods and a data example is employed to illustrate the applications of the methods.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70382"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Saddlepoint Framework for Accurate Inference in Multicenter Clinical Trials With Imbalanced Clusters.
Haidy A Newer
Statistical inference in multicenter clinical trials is often compromised when relying on asymptotic normal approximations, particularly in designs characterized by a small number of centers or severe imbalance in patient enrollment. Such deviations from asymptotic assumptions frequently result in unreliable p-values and a breakdown of error control. To resolve this, we introduce a high-precision saddlepoint approximation framework for aggregate permutation tests within hierarchically structured data. The theoretical core of our approach is the derivation of a multilevel nested cumulant generating function that explicitly models the trial hierarchy, analytically integrating patient-level linear rank statistics with the stochastic aggregation process across centers. A significant innovation of this work is the extension to the bivariate setting to address co-primary endpoints, providing a robust inferential solution for mixed continuous (efficacy) and discrete (safety) outcomes where standard multivariate normality is unattainable. The resulting framework yields simulation-free, highly accurate tail probabilities even in finite-sample regimes. Extensive simulation studies confirm that our method maintains strict Type I error control in scenarios where asymptotic methods exhibit substantial inflation. Furthermore, an application to the multicenter diabetes prevention program trial demonstrates the method's practical utility: it correctly identifies a significant cardiovascular risk factor that standard approximations failed to detect, thereby preventing a critical Type II error and ensuring valid clinical conclusions.
{"title":"A Saddlepoint Framework for Accurate Inference in Multicenter Clinical Trials With Imbalanced Clusters.","authors":"Haidy A Newer","doi":"10.1002/sim.70408","DOIUrl":"https://doi.org/10.1002/sim.70408","url":null,"abstract":"<p><p>Statistical inference in multicenter clinical trials is often compromised when relying on asymptotic normal approximations, particularly in designs characterized by a small number of centers or severe imbalance in patient enrollment. Such deviations from asymptotic assumptions frequently result in unreliable p-values and a breakdown of error control. To resolve this, we introduce a high-precision saddlepoint approximation framework for aggregate permutation tests within hierarchically structured data. The theoretical core of our approach is the derivation of a multilevel nested cumulant generating function that explicitly models the trial hierarchy, analytically integrating patient-level linear rank statistics with the stochastic aggregation process across centers. A significant innovation of this work is the extension to the bivariate setting to address co-primary endpoints, providing a robust inferential solution for mixed continuous (efficacy) and discrete (safety) outcomes where standard multivariate normality is unattainable. The resulting framework yields simulation-free, highly accurate tail probabilities even in finite-sample regimes. Extensive simulation studies confirm that our method maintains strict Type I error control in scenarios where asymptotic methods exhibit substantial inflation. Furthermore, an application to the multicenter diabetes prevention program trial demonstrates the method's practical utility: it correctly identifies a significant cardiovascular risk factor that standard approximations failed to detect, thereby preventing a critical Type II error and ensuring valid clinical conclusions.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70408"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}