Modelling and Predicting Population-Level Growth With Individual-Level Information
Tuuli Kauppala, Tuomo Susi, Sangita Kulathinal

The development of height, weight, and body mass index (BMI) in children has been the subject of considerable interest due to secular changes in growth patterns, such as increases in height and rising obesity rates. Predicting growth in a target population is particularly challenging when the population comprises individuals with and without past growth data. In this study, we present three approaches for the joint prediction of height and weight in that situation. The predictive performance of each approach is evaluated using a range of measures that assess different properties of the prediction distributions. We also compare the approaches in terms of their clinical relevance, particularly prediction accuracy. The developed prediction approaches vary in their use of past growth data. We predict growth for a target population of children aged 4-11 years in 2021, residing in three municipalities in Finland. We employ longitudinal register data on height and weight, collected from children aged 2-11 years between 2014 and 2020 in these municipalities, to construct a Bayesian hierarchical linear model (HLM) for growth prediction. Additionally, we estimate posterior unconditional distributions of height, weight, and BMI for within-sample model validation. The inclusion of individual-level data in the predictions reduced the divergence from observed measurements, particularly for weight and BMI. This is important given that the distributions of these measurements become increasingly skewed with age. Incorporating individual-level information is also beneficial for child-specific predictions. Our study highlights the importance of multiple prediction checks to understand the flaws and strengths of each prediction approach.
{"title":"Modelling and Predicting Population-Level Growth With Individual-Level Information.","authors":"Tuuli Kauppala, Tuomo Susi, Sangita Kulathinal","doi":"10.1002/sim.70421","DOIUrl":"10.1002/sim.70421","url":null,"abstract":"<p><p>The development of height, weight, and body mass index (BMI) in children has been the subject of considerable interest due to secular changes in growth patterns, such as increases in height and rising obesity rates. Predicting growth in a target population is particularly challenging when the population comprises of individuals with and without past growth data. In this study, we present three approaches for the joint prediction of height and weight in that situation. The predictive performance of each approach is evaluated using a range of measures that assess different properties of the prediction distributions. We also compare the approaches to interpret their clinical relevance, particularly in terms of prediction accuracy. The developed prediction approaches vary in their use of past growth data. We predict growth for a target population of children aged 4-11 years in 2021, residing in three municipalities in Finland. We employ longitudinal register data on height and weight, collected from children aged 2-11 years between 2014 and 2020 in these municipalities to construct a Bayesian hierarchical linear model (HLM) for growth prediction. Additionally, we estimate posterior unconditional distributions of height, weight, and BMI for within-sample model validation. The inclusion of individual-level data in the predictions reduced the divergence from observed measurements, particularly for weight and BMI. This is important given the skewed distribution of the measurements with increasing age. Incorporating individual-level information is also beneficial for child-specific predictions. Our study highlights the importance of multiple prediction checks to understand the flaws and strengths of each prediction approach.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70421"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12926727/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147272057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Anticipation Effect in Stepped Wedge Cluster Randomized Trials
Hao Wang, Xinyuan Chen, Katherine R Courtright, Scott D Halpern, Michael O Harhay, Monica Taljaard, Fan Li
In stepped wedge cluster randomized trials (SW-CRTs), the intervention is rolled out to clusters over multiple periods. A standard approach for analyzing SW-CRTs uses the linear mixed model, in which the treatment effect is present only after treatment adoption, under the assumption of no anticipation. This assumption, however, may not always hold in practice, because stakeholders, providers, or individuals who are aware of the treatment adoption timing (especially when blinding is challenging or infeasible) can inadvertently change their behaviors in anticipation of the forthcoming intervention. We provide an analytical framework to address the anticipation effect in SW-CRTs and study its impact. We derive expectations of the estimators based on a collection of linear mixed models and demonstrate that when the anticipation effect is ignored, these estimators give biased estimates of the treatment effect. We also provide updated sample size formulas that explicitly account for anticipation effects, exposure-time heterogeneity, or both in SW-CRTs and illustrate their impact on study power. Through simulation studies and empirical analyses, we compare the treatment effect estimators with and without adjusting for anticipation, and provide some practical considerations.
{"title":"On Anticipation Effect in Stepped Wedge Cluster Randomized Trials.","authors":"Hao Wang, Xinyuan Chen, Katherine R Courtright, Scott D Halpern, Michael O Harhay, Monica Taljaard, Fan Li","doi":"10.1002/sim.70380","DOIUrl":"10.1002/sim.70380","url":null,"abstract":"<p><p>In stepped wedge cluster randomized trials (SW-CRTs), the intervention is rolled out to clusters over multiple periods. A standard approach for analyzing SW-CRTs utilizes the linear mixed model, where the treatment effect is only present after the treatment adoption, under the assumption of no anticipation. This assumption, however, may not always hold in practice because stakeholders, providers, or individuals who are aware of the treatment adoption timing (especially when blinding is challenging or infeasible) can inadvertently change their behaviors in anticipation of the forthcoming intervention. We provide an analytical framework to address the anticipation effect in SW-CRTs and study its impact. We derive expectations of the estimators based on a collection of linear mixed models and demonstrate that when the anticipation effect is ignored, these estimators give biased estimates of the treatment effect. We also provide updated sample size formulas that explicitly account for anticipation effects, exposure-time heterogeneity, or both in SW-CRTs and illustrate their impact on study power. Through simulation studies and empirical analyses, we compare the treatment effect estimators with and without adjusting for anticipation, and provide some practical considerations.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70380"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146150718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marginally Interpretable Spatial Logistic Regression With Bridge Processes
Changwoo J Lee, David B Dunson

When random effects are included to account for dependent observations, the odds ratio interpretation of logistic regression coefficients changes from population-averaged to subject-specific. This is unappealing in many applications, motivating a rich literature on methods that maintain the marginal logistic regression structure without random effects, such as generalized estimating equations. However, for spatial data, random effect approaches are appealing in providing a full probabilistic characterization of the data that can be used for prediction. We propose a new class of spatial logistic regression models that maintain both population-averaged and subject-specific interpretations through a novel class of bridge processes for spatial random effects. These processes are shown to have appealing computational and theoretical properties, including a scale mixture of normal representation. The new methodology is illustrated with simulations and an analysis of childhood malaria prevalence data in Gambia.
{"title":"Marginally Interpretable Spatial Logistic Regression With Bridge Processes.","authors":"Changwoo J Lee, David B Dunson","doi":"10.1002/sim.70399","DOIUrl":"10.1002/sim.70399","url":null,"abstract":"<p><p>In including random effects to account for dependent observations, the odds ratio interpretation of logistic regression coefficients is changed from population-averaged to subject-specific. This is unappealing in many applications, motivating a rich literature on methods that maintain the marginal logistic regression structure without random effects, such as generalized estimating equations. However, for spatial data, random effect approaches are appealing in providing a full probabilistic characterization of the data that can be used for prediction. We propose a new class of spatial logistic regression models that maintain both population-averaged and subject-specific interpretations through a novel class of bridge processes for spatial random effects. These processes are shown to have appealing computational and theoretical properties, including a scale mixture of normal representation. The new methodology is illustrated with simulations and an analysis of childhood malaria prevalence data in Gambia.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70399"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12991412/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Novel Method for Inserting Dose Levels Mid-Trial in Early-Phase Oncology Combination Studies
Matthew George, Ian Wadsworth, Pavel Mozgunov

The use of combination treatments in early-phase oncology trials is growing. The objective of these trials is to search for the maximum tolerated dose combination from a predefined set. However, cases in which the initial set of combinations does not contain one close to the target toxicity pose a significant challenge. Current solutions are typically ad hoc and can be difficult to implement in practice. We propose a novel method for inserting dose levels mid-trial, which features a search for the contour partitioning the dose space into combinations with toxicity truly above and below the target. Establishing this contour with a degree of certainty suggests that no combination is close to the target toxicity, triggering an insertion. We examine our approach in a comprehensive simulation study applied to the PIPE design and the two-dimensional Bayesian logistic regression model (BLRM), though any model-based or model-assisted design is an appropriate candidate. Our results demonstrate that, on average, the insertion method can increase the probability of selecting combinations close to the target toxicity without increasing the probability of subtherapeutic or toxic recommendations.
{"title":"A Novel Method for Inserting Dose Levels Mid-Trial in Early-Phase Oncology Combination Studies.","authors":"Matthew George, Ian Wadsworth, Pavel Mozgunov","doi":"10.1002/sim.70417","DOIUrl":"10.1002/sim.70417","url":null,"abstract":"<p><p>The use of combination treatments in early-phase oncology trials is growing. The objective of these trials is to search for the maximum tolerated dose combination from a predefined set. However, cases in which the initial set of combinations does not contain one close to the target toxicity pose a significant challenge. Currently, solutions are typically ad hoc and may bring practical challenges. We propose a novel method for inserting dose levels mid-trial, which features a search for the contour partitioning the dose space into combinations with toxicity truly above and below the target toxicity. Establishing this contour with a degree of certainty suggests that no combination is close to the target toxicity, triggering an insertion. We examine our approach in a comprehensive simulation study applied to the PIPE design and two-dimensional Bayesian logistic regression model (BLRM), though any model-based or model-assisted design is an appropriate candidate. Our results demonstrate that, on average, the insertion method can increase the probability of selecting combinations close to the target toxicity, without increasing the probability of subtherapeutic or toxic recommendations.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70417"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12917875/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146166837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Probabilistic Clustering Using Multivariate Growth Mixture Model in Clinical Settings - A Scleroderma Example
Ji Soo Kim, Yizhen Xu, Rachel S Wallwork, Laura K Hummers, Ami A Shah, Scott L Zeger
Background: Scleroderma (systemic sclerosis; SSc) is a chronic autoimmune disease known for wide heterogeneity in patients' disease progression in multiple organ systems. Our goal is to guide clinical care by real-time classification of patients into clinically interpretable subpopulations based on their baseline characteristics and the temporal patterns of their disease progression.
Methods: A Bayesian multivariate growth mixture model was fit to identify subgroups of patients from the Johns Hopkins Scleroderma Center Research Registry who share similar lung function trajectories. We jointly modeled forced vital capacity (FVC) and diffusing capacity for carbon monoxide (DLCO) as pulmonary outcomes for 289 patients with SSc and anti-topoisomerase 1 antibodies and developed a framework to sequentially update class membership probabilities for any given patient based on her accumulating data.
Results: We identified a "stable" group of 150 patients for whom both biomarkers changed little from the date of disease onset over the next 10 years, and a "progressor" group of 139 patients that, on average, experienced a clinically significant decline in both measures starting soon after disease onset. For any given patient at any given time, our algorithm calculates the probability of belonging to the progressor group using both baseline characteristics and the patient's longitudinal FVC and DLCO observations.
Conclusions: Our method calculates the probability of being a fast progressor at baseline when no FVC and DLCO are observed, then sequentially updates it as more information becomes available. This sequential integration of patient data and classification of her disease trajectory has the potential to improve clinical decisions and ultimately patient outcomes.
{"title":"Probabilistic Clustering Using Multivariate Growth Mixture Model in Clinical Settings-A Scleroderma Example.","authors":"Ji Soo Kim, Yizhen Xu, Rachel S Wallwork, Laura K Hummers, Ami A Shah, Scott L Zeger","doi":"10.1002/sim.70450","DOIUrl":"10.1002/sim.70450","url":null,"abstract":"<p><strong>Background: </strong>Scleroderma (systemic sclerosis; SSc) is a chronic autoimmune disease known for wide heterogeneity in patients' disease progression in multiple organ systems. Our goal is to guide clinical care by real-time classification of patients into clinically interpretable subpopulations based on their baseline characteristics and the temporal patterns of their disease progression.</p><p><strong>Methods: </strong>A Bayesian multivariate growth mixture model was fit to identify subgroups of patients from the Johns Hopkins Scleroderma Center Research Registry who share similar lung function trajectories. We jointly modeled forced vital capacity (FVC) and diffusing capacity for carbon monoxide (DLCO) as pulmonary outcomes for 289 patients with SSc and anti-topoisomerase 1 antibodies and developed a framework to sequentially update class membership probabilities for any given patient based on her accumulating data.</p><p><strong>Results: </strong>We identified a \"stable\" group of 150 patients for whom both biomarkers changed little from the date of disease onset over the next 10 years, and a \"progressor\" group of 139 patients that, on average, experienced a clinically significant decline in both measures starting soon after disease onset. For any given patient at any given time, our algorithm calculates the probability of belonging to the progressor group using both baseline characteristics and the patient's longitudinal FVC and DLCO observations.</p><p><strong>Conclusions: </strong>Our method calculates the probability of being a fast progressor at baseline when no FVC and DLCO are observed, then sequentially updates it as more information becomes available. This sequential integration of patient data and classification of her disease trajectory has the potential to improve clinical decisions and ultimately patient outcomes.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70450"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12904757/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146195722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Preference-Informed Cluster Randomized Design for Pragmatic Clinical Trials
Yuwei Cheng, Adriana Tremoulet, Sonia Jain

Cluster randomized trials (CRTs), in which entire clusters of subjects are randomized to treatment arms, are widely used in pragmatic trials to evaluate interventions under real-world conditions. However, CRTs are particularly vulnerable to treatment non-adherence, especially when cluster-level preferences lead subjects in clusters to deviate from their assigned treatment. Such deviations can reduce power, introduce bias, and compromise generalizability if not properly addressed. This research is directly motivated by a planned multi-center trial in Kawasaki Disease patients at high risk for coronary artery abnormalities, in which institutional treatment preferences influence both willingness to participate and adherence to the assigned treatment. To address this issue, we propose a Bayesian hierarchical model under a Preference-Informed Cluster Randomized Design (PICRD). This model explicitly incorporates cluster-level treatment switching into the analysis rather than excluding non-willing or non-adherent clusters. We conduct a simulation study to evaluate the performance of the PICRD model across a range of treatment effect sizes and switching proportions. Results demonstrate that the PICRD model consistently outperforms per-protocol analyses by maintaining higher power for the main treatment effect, producing narrower 95% credible intervals, and yielding more stable bias and root mean square error in the presence of substantial non-adherence. By explicitly modeling preference within a Bayesian hierarchical framework, the PICRD approach provides a flexible and robust solution for CRTs conducted in pragmatic settings, where full adherence to the randomized assignment is often unrealistic.
{"title":"Preference-Informed Cluster Randomized Design for Pragmatic Clinical Trials.","authors":"Yuwei Cheng, Adriana Tremoulet, Sonia Jain","doi":"10.1002/sim.70426","DOIUrl":"10.1002/sim.70426","url":null,"abstract":"<p><p>Cluster randomized trials (CRTs), in which entire clusters of subjects are randomized to treatment arms, are widely used in pragmatic trials to evaluate interventions under real-world conditions. However, CRTs are particularly vulnerable to treatment non-adherence, especially when cluster-level preferences lead subjects in clusters to deviate from their assigned treatment. Such deviations can reduce power, introduce bias, and compromise generalizability if not properly addressed. This research is directly motivated by a planned multi-center trial in Kawasaki Disease patients with high risk for coronary artery abnormalities, in which institutional treatment preferences influence both willingness to participate and adhere. To address this issue, we propose a Bayesian hierarchical model under a Preference-Informed Cluster Randomized Design (PICRD). This model explicitly incorporates cluster-level treatment switching into the analysis rather than excluding non-willing or non-adherent clusters. We conduct a simulation study to evaluate the performance of the PICRD model across a range of treatment effect sizes and switching proportions. Results demonstrate that the PICRD model consistently outperforms per-protocol analyses by maintaining higher power for the main treatment effect, producing narrower 95% credible intervals, and yielding more stable bias and root mean square error in the presence of substantial non-adherence. By explicitly modeling preference within a Bayesian hierarchical framework, the PICRD approach provides a flexible and robust solution for CRTs conducted in pragmatic settings when willingness to accept randomization assignment or adherence to randomization is often unrealistic.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70426"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12875034/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146126462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Concave Pairwise Fusion Approach to Heterogeneous Q-Learning for Dynamic Treatment Regimes
Jubo Sun, Wensheng Zhu, Guozhe Sun

A dynamic treatment regime is a sequence of decision rules that map available history information to a treatment option at each decision point. The optimal dynamic treatment regime seeks to make these decisions so as to maximize the expected outcome of interest. Most existing methods assume population homogeneity. In many complex applications, ignoring latent heterogeneous structures may compromise estimation, highlighting the need to account for such structures when estimating optimal treatment regimes. We propose a heterogeneous Q-learning method that estimates optimal dynamic treatment regimes using a concave pairwise fusion penalized approach. The proposed method employs an alternating direction method of multipliers (ADMM) algorithm to solve the concave pairwise fusion penalized least squares problem at each stage. Simulation studies demonstrate that our proposed method outperforms the standard Q-learning method, and it is further illustrated through a real data analysis from the China Rural Hypertension Control Project (CRHCP) study group.
{"title":"A Concave Pairwise Fusion Approach to Heterogeneous Q-Learning for Dynamic Treatment Regimes.","authors":"Jubo Sun, Wensheng Zhu, Guozhe Sun","doi":"10.1002/sim.70415","DOIUrl":"https://doi.org/10.1002/sim.70415","url":null,"abstract":"<p><p>A dynamic treatment regime is a sequence of decision rules that map available history information to a treatment option at each decision point. The optimal dynamic treatment regime seeks to make these decisions to maximize the expected outcome of interest. Most existing methods assume population homogeneity. In many complex applications, ignoring latent heterogeneous structures may compromise estimation, highlighting the necessity of exploring heterogeneous structures during the estimation of optimal treatment regimes. We propose heterogeneous Q-learning that facilitates the estimation of optimal dynamic treatment regimes using a concave pairwise fusion penalized approach. The proposed method employs an alternating direction method of multipliers algorithm to solve the concave pairwise fusion penalized least squares problem in each stage. Simulation studies demonstrate that our proposed method outperforms the standard Q-learning method, and it is further illustrated through a real data analysis from the China Rural Hypertension Control Project (CRHCP) study group.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70415"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patient Retreat in Dose Escalation for Phase I Clinical Trials With Rare Diseases
Jialu Fang, Guosheng Yin

Phase I clinical trials aim to identify the maximum tolerated dose (MTD), a task that becomes challenging in rare diseases due to limited patient recruitment. Traditional dose-finding designs, which assign one dose per patient, require sample sizes that may be infeasible for rare-disease trials. To address these limitations, we propose the patient retreat in dose escalation (PRIDE) scheme, which integrates intra-patient dose escalation and accounts for intra-patient correlation by incorporating random effects into a Bayesian hierarchical framework. We further introduce PRIDE-FA (flexible allocation), an extension of PRIDE with a flexible allocation strategy. By allowing retreated patients to be assigned to any dose level based on trial needs, PRIDE-FA improves resource efficiency, leading to greater reductions in required sample size and trial duration. This paper incorporates random effects into established dose-finding designs, including the calibration-free odds (CFO) design, the Bayesian optimal interval (BOIN) design, and the continual reassessment method (CRM), to account for intra-patient correlation when each patient may receive multiple doses. Simulation studies demonstrate that PRIDE and PRIDE-FA significantly improve the accuracy of MTD selection, reduce the required sample size, and shorten trial duration compared to existing dose-finding methods. Together, PRIDE and PRIDE-FA provide a robust and efficient framework for phase I clinical trials in rare diseases.
{"title":"Patient Retreat in Dose Escalation for Phase I Clinical Trials With Rare Diseases.","authors":"Jialu Fang, Guosheng Yin","doi":"10.1002/sim.70409","DOIUrl":"10.1002/sim.70409","url":null,"abstract":"<p><p>Phase I clinical trials aim to identify the maximum tolerated dose (MTD), a task that becomes challenging in rare disease due to limited patient recruitment. Traditional dose-finding designs, which assign one dose per patient, require a sufficient sample size that may be infeasible for rare disease trials. To address these limitations, we propose the patient retreat in dose escalation (PRIDE) scheme, which integrates intra-patient dose escalation and considers intra-patient correlations by incorporating random effects into a Bayesian hierarchical framework. We further introduce PRIDE-FA (flexible allocation), an extension of PRIDE with a flexible allocation strategy. By allowing retreated patients to be assigned to any dose level based on trial needs, PRIDE-FA improves resource efficiency, leading to greater reductions in required sample size and trial duration. This paper incorporates random effects into established dose-finding designs, including the calibration-free odds (CFO) design, the Bayesian optimal interval (BOIN) design, and the continual reassessment method (CRM) to account for intra-patient correlations when each patient may receive multiple doses. Simulation studies demonstrate that PRIDE and PRIDE-FA significantly improve the accuracy of MTD selection, reduce required sample size, and shorten trial duration compared to existing dose-finding methods. Together, PRIDE and PRIDE-FA provide a robust and efficient framework for phase I clinical trials with rare diseases.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70409"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12873649/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Sample Size Calculations for External Validation Studies of Risk Prediction Models
Mohsen Sadatsafavi, Paul Gustafson, Solmaz Setayeshgar, Laure Wynants, Richard D Riley
Contemporary sample size calculations for external validation of risk prediction models require users to specify fixed values of assumed model performance metrics alongside target precision levels (e.g., 95% CI widths). However, because previous studies are based on finite samples, our knowledge of true model performance in the target population is uncertain, and choosing fixed values therefore paints an incomplete picture. Moreover, for net benefit (NB) as a measure of clinical utility, the relevance of conventional precision-based inference is doubtful. In this work, we propose a general Bayesian framework for multi-criteria sample size determination for the validation of prediction models for binary outcomes. For statistical metrics of performance (e.g., discrimination and calibration), we propose sample size rules that target a desired expected precision or a desired assurance probability that the precision criteria will be satisfied. For NB, we propose rules based on Optimality Assurance (the probability that the planned study correctly identifies the optimal strategy) and Value of Information (VoI) analysis, which quantifies the expected gain in NB from learning about model performance in a validation study of a given size. We showcase these developments in a case study on the validation of a risk prediction model for deterioration among hospitalized COVID-19 patients. Compared to conventional sample size calculation methods, a Bayesian approach requires explicit quantification of uncertainty around model performance, and thereby enables flexible sample size rules based on expected precision, assurance probabilities, and VoI. In our case study, calculations based on VoI for NB suggest that considerably smaller sample sizes are required than when focusing on the precision of calibration metrics. This approach is implemented in the accompanying software.
{"title":"Bayesian Sample Size Calculations for External Validation Studies of Risk Prediction Models.","authors":"Mohsen Sadatsafavi, Paul Gustafson, Solmaz Setayeshgar, Laure Wynants, Richard D Riley","doi":"10.1002/sim.70389","DOIUrl":"10.1002/sim.70389","url":null,"abstract":"<p><p>Contemporary sample size calculations for external validation of risk prediction models require users to specify fixed values of assumed model performance metrics alongside target precision levels (e.g., 95% CI widths). However, due to the finite samples of previous studies, our knowledge of true model performance in the target population is uncertain, and so choosing fixed values represents an incomplete picture. As well, for net benefit (NB) as a measure of clinical utility, the relevance of conventional precision-based inference is doubtful. In this work, we propose a general Bayesian framework for multi-criteria sample size considerations for prediction models for binary outcomes. For statistical metrics of performance (e.g., discrimination and calibration), we propose sample size rules that target desired expected precision or desired assurance probability that the precision criteria will be satisfied. For NB, we propose rules based on Optimality Assurance (the probability that the planned study correctly identifies the optimal strategy) and Value of Information (VoI) analysis, which quantifies the expected gain in NB by learning about model performance from a validation study of a given size. We showcase these developments in a case study on the validation of a risk prediction model for deterioration among hospitalized COVID-19 patients. Compared to conventional sample size calculation methods, a Bayesian approach requires explicit quantification of uncertainty around model performance, and thereby enables flexible sample size rules based on expected precision, assurance probabilities, and VoI. In our case study, calculations based on VoI for NB suggest considerably lower sample sizes are required than when focusing on the precision of calibration metrics. This approach is implemented in the accompanying software.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70389"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12894519/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146166819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploratory and Confirmatory Empirical Research on Algorithms: Implications for Methodological Practice and Education-A Comment on \"On 'Confirmatory' Methodological Research in Statistics and Related Fields\".","authors":"Ulrich Mansmann","doi":"10.1002/sim.70388","DOIUrl":"10.1002/sim.70388","url":null,"abstract":"","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70388"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146221460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}