
Biometrics: Latest Articles

Change surface regression for nonlinear subgroup identification with application to warfarin pharmacogenomics data.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-01-07 | DOI: 10.1093/biomtc/ujae169
Pan Liu, Yaguang Li, Jialiang Li

Pharmacogenomics stands as a pivotal driver toward personalized medicine, aiming to optimize drug efficacy while minimizing adverse effects by uncovering the impact of genetic variations on inter-individual outcome variability. Despite its promise, the intricate landscape of drug metabolism introduces complexity, where the correlation between drug response and genes can be shaped by numerous nongenetic factors, often exhibiting heterogeneity across diverse subpopulations. This challenge is particularly pronounced in datasets such as the International Warfarin Pharmacogenetic Consortium (IWPC), which encompasses diverse patient information from multiple nations. To capture the between-patient heterogeneity in dosing requirement, we formulate a novel change surface model as a model-based approach for multiple subgroup identification in complex datasets. A key feature of our approach is its ability to accommodate nonlinear subgroup divisions, providing a clearer understanding of dynamic drug-gene associations. Furthermore, our model effectively handles high-dimensional data through a doubly penalized approach, ensuring both interpretability and adaptability. We propose an iterative 2-stage method that combines a change point detection technique in the first stage with a smoothed local adaptive majorize-minimization algorithm for surface regression in the second stage. Performance of the proposed methods is evaluated through extensive numerical studies. Application of our method to the IWPC dataset leads to significant new findings, where 3 subgroups subject to different pharmacogenomic relationships are identified, contributing valuable insights into the complex dynamics of drug-gene associations in patients.
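
The first stage of the proposed method relies on change point detection. A minimal sketch of that idea alone, assuming a single threshold on one splitting variable `z` and a separate least-squares line in one covariate `x` on each side, is a grid search over the observed `z` values for the split that minimises the total residual sum of squares. This is not the authors' change surface model, doubly penalized fit, or majorize-minimization step, and the function names are hypothetical.

```python
import random


def ols_line(xs, ys):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def find_change_point(x, y, z, min_size=10):
    """Grid-search one threshold on z that minimises the two-segment SSE.

    Returns (threshold, total_sse); units with z <= threshold form the first segment.
    """
    order = sorted(range(len(z)), key=lambda i: z[i])
    best_threshold, best_sse = None, float("inf")
    for k in range(min_size, len(order) - min_size + 1):
        left, right = order[:k], order[k:]
        sse = (ols_line([x[i] for i in left], [y[i] for i in left])[2]
               + ols_line([x[i] for i in right], [y[i] for i in right])[2])
        if sse < best_sse:
            best_threshold, best_sse = z[order[k - 1]], sse
    return best_threshold, best_sse


if __name__ == "__main__":
    # Simulated two-subgroup data with a true split at z = 0.5.
    random.seed(0)
    z = [random.random() for _ in range(200)]
    x = [random.gauss(0, 1) for _ in range(200)]
    y = [(1 + 2 * xi if zi <= 0.5 else -1 - xi) + random.gauss(0, 0.3)
         for xi, zi in zip(x, z)]
    print(find_change_point(x, y, z))
```

Refitting both segments at every candidate split costs quadratic time, which is acceptable for a sketch. A practical implementation would update the segment sums incrementally instead.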

Biometrics, Vol. 81, No. 1 (2025).
Citations: 0
Causal inference with cross-temporal design.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-01-07 | DOI: 10.1093/biomtc/ujae163
Yi Cao, Pedro L Gozalo, Roee Gutman

When many participants in a randomized trial do not comply with their assigned intervention, the randomized encouragement design is a possible solution. In this design, the causal effects of the intervention can be estimated among participants who would have experienced the intervention if encouraged. For many policy interventions, encouragements cannot be randomized and investigators need to rely on observational data. To address this, we propose a cross-temporal design, which uses time to mimic a randomized encouragement experiment. However, time may be confounded with temporal trends that influence the outcomes. To disentangle these trends from the intervention effects, we replace the commonly used exclusion restrictions with temporal assumptions. We develop Bayesian procedures to estimate the causal effects and compare them with instrumental variable and matching approaches in simulations. The Bayesian approach outperforms the other 2 approaches in terms of estimation accuracy, and it is relatively robust to various violations of the common trends assumption. Taking advantage of the expansion of the Medicare Advantage (MA) program between 2011 and 2017, we apply the proposed method to estimate the effects of MA enrollment on the risk that skilled nursing facility residents are re-hospitalized within 30 days of discharge from the hospital.
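
The abstract benchmarks the Bayesian procedures against instrumental-variable analysis. For reference, the textbook IV (Wald) estimator of the complier average causal effect in an encouragement design is the ratio of two intention-to-treat contrasts. It assumes a binary encouragement `Z`, binary uptake `D`, monotonicity, and the exclusion restriction that the cross-temporal design replaces with temporal assumptions. This generic sketch is not the authors' method, and the function name is hypothetical.

```python
def wald_cace(z, d, y):
    """Wald/IV estimate of the complier average causal effect.

    z: encouragement indicators (0/1); d: uptake indicators (0/1); y: outcomes.
    Assumes (as-if) randomised z, monotonicity and the exclusion restriction.
    """
    def arm_mean(values, arm):
        selected = [v for v, zi in zip(values, z) if zi == arm]
        return sum(selected) / len(selected)

    itt_y = arm_mean(y, 1) - arm_mean(y, 0)  # effect of encouragement on outcome
    itt_d = arm_mean(d, 1) - arm_mean(d, 0)  # effect of encouragement on uptake
    if itt_d == 0:
        raise ValueError("encouragement does not change uptake; CACE not identified")
    return itt_y / itt_d


# Two encouraged compliers gain 2 units; everyone else has outcome 1.
print(wald_cace(z=[1, 1, 1, 1, 0, 0, 0, 0],
                d=[1, 1, 0, 0, 0, 0, 0, 0],
                y=[3, 3, 1, 1, 1, 1, 1, 1]))  # 2.0
```

If `Z` were simply an indicator of the later calendar period, any temporal trend in the outcome would enter `itt_y` and bias this ratio. That is the problem the abstract's temporal assumptions are meant to address.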

Biometrics, Vol. 81, No. 1 (2025). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11725568/pdf/
Citations: 0
High-dimensional partially linear functional Cox models.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-01-07 | DOI: 10.1093/biomtc/ujae164
Xin Chen, Hua Liu, Jiaqi Men, Jinhong You

As a commonly employed method for analyzing time-to-event data involving functional predictors, the functional Cox model assumes a linear relationship between the functional principal component (FPC) scores of the functional predictors and the hazard rates. However, in practical scenarios, such as our study on the survival time of kidney transplant recipients, this assumption often fails to hold. To address this limitation, we introduce a class of high-dimensional partially linear functional Cox models, which accommodates the non-linear effects of functional predictors on the response and allows for diverging numbers of scalar predictors and FPCs as the sample size increases. We employ the group smoothly clipped absolute deviation method to select relevant scalar predictors and FPCs, and use B-splines to obtain a smoothed estimate of the non-linear effect. The finite sample performance of the estimates is evaluated through simulation studies. The model is also applied to a kidney transplant dataset, allowing us to make inferences about the non-linear effects of functional predictors on patients' hazard rates, as well as to identify significant scalar predictors for long-term survival time.
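
For reference, the standard SCAD penalty with the conventional choice a = 3.7 can be written as follows, together with the thresholding rule it induces for a single coefficient under an orthonormal design. The group version assumed here applies the same rule to the Euclidean norm of a coefficient block. For orthonormal within-group columns this gives the group SCAD solution. The paper's exact group scaling and its B-spline step are not shown, and the function names are illustrative.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty value: linear near zero, quadratic taper, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def scad_threshold(z, lam, a=3.7):
    """SCAD-penalised least-squares solution for one coefficient, orthonormal design."""
    t = abs(z)
    sign = math.copysign(1.0, z)
    if t <= 2 * lam:
        return sign * max(t - lam, 0.0)  # soft-thresholding region
    if t <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)  # partial shrinkage
    return z  # large effects are left unshrunk


def group_scad_threshold(z_group, lam, a=3.7):
    """Apply the SCAD rule to the norm of a coefficient block, keeping its direction."""
    norm = math.sqrt(sum(v * v for v in z_group))
    if norm == 0.0:
        return [0.0] * len(z_group)
    shrunk = scad_threshold(norm, lam, a)
    return [v * shrunk / norm for v in z_group]


print(group_scad_threshold([0.3, 0.4], lam=1.0))  # [0.0, 0.0]: whole group dropped
print(group_scad_threshold([3.0, 4.0], lam=1.0))  # [3.0, 4.0]: large group kept as is
```

The constant tail of the penalty is what leaves large effects unbiased, unlike the lasso. The soft-thresholding region near zero is what sets whole groups, such as all coefficients of an irrelevant FPC block, exactly to zero.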

Biometrics, Vol. 81, No. 1 (2025).
Citations: 0
Penalized G-estimation for effect modifier selection in a structural nested mean model for repeated outcomes.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-01-07 | DOI: 10.1093/biomtc/ujae165
Ajmery Jaman, Guanbo Wang, Ashkan Ertefaie, Michèle Bally, Renée Lévesque, Robert W Platt, Mireille E Schnitzer

Effect modification occurs when the impact of the treatment on an outcome varies based on the levels of other covariates known as effect modifiers. Modeling these effect differences is important for etiological goals and for purposes of optimizing treatment. Structural nested mean models (SNMMs) are useful causal models for estimating the potentially heterogeneous effect of a time-varying exposure on the mean of an outcome in the presence of time-varying confounding. A data-adaptive selection approach is necessary if the effect modifiers are unknown a priori and need to be identified. Although variable selection techniques are available for estimating the conditional average treatment effects using marginal structural models or for developing optimal dynamic treatment regimens, all of these methods consider a single end-of-follow-up outcome. In the context of an SNMM for repeated outcomes, we propose a doubly robust penalized G-estimator for the causal effect of a time-varying exposure with a simultaneous selection of effect modifiers and prove the oracle property of our estimator. We conduct a simulation study for the evaluation of its performance in finite samples and verification of its double-robustness property. Our work is motivated by the study of hemodiafiltration for treating patients with end-stage renal disease at the Centre Hospitalier de l'Université de Montréal. We apply the proposed method to investigate the effect heterogeneity of dialysis facility on the repeated session-specific hemodiafiltration outcomes.
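
The core of G-estimation for a linear blip function can be sketched at a single time point. The sketch assumes a blip A(psi0 + psi1 L) with one candidate effect modifier L and a propensity score that is known or estimated beforehand. Under those assumptions, the G-estimating equations sum_i (A_i - pi_i)(1, L_i)[Y_i - A_i(psi0 + psi1 L_i)] = 0 are linear in psi and can be solved in closed form. The sketch omits the repeated outcomes, the doubly robust augmentation, and the penalty that performs effect modifier selection in the paper. All names and the simulated data are hypothetical.

```python
import math
import random


def g_estimate_linear_blip(y, a, l, pi):
    """Solve sum_i (A_i - pi_i) * (1, L_i) * (Y_i - A_i * (psi0 + psi1 * L_i)) = 0.

    pi[i] is the propensity P(A = 1 | L = l[i]), assumed known or pre-estimated.
    Returns (psi0, psi1).
    """
    m00 = m01 = m11 = b0 = b1 = 0.0
    for yi, ai, li, pii in zip(y, a, l, pi):
        r = ai - pii  # treatment residual
        m00 += r * ai
        m01 += r * ai * li
        m11 += r * ai * li * li
        b0 += r * yi
        b1 += r * li * yi
    det = m00 * m11 - m01 * m01
    # Cramer's rule for the symmetric 2x2 system [[m00, m01], [m01, m11]] psi = b.
    return (b0 * m11 - m01 * b1) / det, (m00 * b1 - m01 * b0) / det


def simulate(n, seed=1):
    """Confounded data: L drives both A and Y; the true blip is A * (1 + 2L)."""
    rng = random.Random(seed)
    y, a, l, pi = [], [], [], []
    for _ in range(n):
        li = rng.gauss(0, 1)
        pii = 1 / (1 + math.exp(-li))
        ai = 1 if rng.random() < pii else 0
        y.append(li + ai * (1 + 2 * li) + rng.gauss(0, 1))
        a.append(ai)
        l.append(li)
        pi.append(pii)
    return y, a, l, pi


print(g_estimate_linear_blip(*simulate(20000)))  # close to (1, 2)
```

Because the equations use the treatment residual A - pi, the outcome's own dependence on L does not need to be modelled when pi is correct. A penalised version would shrink psi1 to zero whenever L does not modify the effect.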

Biometrics, Vol. 81, No. 1 (2025).
Citations: 0
Weighted Q-learning for optimal dynamic treatment regimes with nonignorable missing covariates.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-01-07 | DOI: 10.1093/biomtc/ujae161
Jian Sun, Bo Fu, Li Su

Dynamic treatment regimes (DTRs) formalize medical decision-making as a sequence of rules for different stages, mapping patient-level information to recommended treatments. In practice, estimating an optimal DTR using observational data from electronic medical record (EMR) databases can be complicated by nonignorable missing covariates resulting from informative monitoring of patients. Since complete case analysis can provide consistent estimation of outcome model parameters under the assumption of outcome-independent missingness, Q-learning is a natural approach to accommodating nonignorable missing covariates. However, the backward induction algorithm used in Q-learning can introduce challenges, as nonignorable missing covariates at later stages can result in nonignorable missing pseudo-outcomes at earlier stages, leading to suboptimal DTRs, even if the longitudinal outcome variables are fully observed. To address this unique missing data problem in DTR settings, we propose 2 weighted Q-learning approaches where inverse probability weights for missingness of the pseudo-outcomes are obtained through estimating equations with valid nonresponse instrumental variables or sensitivity analysis. The asymptotic properties of the weighted Q-learning estimators are derived, and the finite-sample performance of the proposed methods is evaluated and compared with alternative methods through extensive simulation studies. Using EMR data from the Medical Information Mart for Intensive Care database, we apply the proposed methods to investigate the optimal fluid strategy for sepsis patients in intensive care units.
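
The weighting idea can be sketched for two stages with a scalar state, a binary treatment and linear Q-functions. The sketch assumes the probability `p_obs[i]` that the stage-2 covariate is observed is already available. The paper estimates it with nonresponse instrumental variables or sensitivity analysis, which is not shown here. Stage 2 is fitted on complete cases, and the stage-1 regression on the pseudo-outcomes that can be computed is weighted by `1 / p_obs[i]`. All names and the linear basis are hypothetical, and this is not the authors' estimator.

```python
def solve_linear(m, v):
    """Solve m @ beta = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (aug[r][n] - tail) / aug[r][r]
    return beta


def weighted_ls(rows, targets, weights):
    """Weighted least squares via the normal equations (fine for a few columns)."""
    p = len(rows[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for row, t, w in zip(rows, targets, weights):
        for j in range(p):
            xty[j] += w * row[j] * t
            for k in range(p):
                xtx[j][k] += w * row[j] * row[k]
    return solve_linear(xtx, xty)


def features(x, a):
    """Linear Q-function basis (1, x, a, a*x) for a scalar state and binary action."""
    return [1.0, x, float(a), a * x]


def q_value(beta, x, a):
    return sum(b * f for b, f in zip(beta, features(x, a)))


def weighted_q_learning(stage1, stage2, p_obs):
    """Two-stage Q-learning with inverse-probability weights at stage 1.

    stage1[i] = (x1, a1); stage2[i] = (x2 or None if missing, a2, y);
    p_obs[i] = probability that x2 is observed for unit i (assumed given).
    """
    observed = [i for i, (x2, _, _) in enumerate(stage2) if x2 is not None]
    # Stage 2: complete-case fit of Q2.
    beta2 = weighted_ls([features(stage2[i][0], stage2[i][1]) for i in observed],
                        [stage2[i][2] for i in observed],
                        [1.0] * len(observed))
    # Pseudo-outcome: value of the best stage-2 action; missing whenever x2 is.
    pseudo = [max(q_value(beta2, stage2[i][0], a) for a in (0, 1)) for i in observed]
    # Stage 1: weight the available pseudo-outcomes by 1 / P(observed).
    beta1 = weighted_ls([features(*stage1[i]) for i in observed], pseudo,
                        [1.0 / p_obs[i] for i in observed])
    return beta1, beta2
```

The fitted stage-1 rule recommends treatment when `q_value(beta1, x1, 1) > q_value(beta1, x1, 0)`, that is, when `beta1[2] + beta1[3] * x1 > 0`.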

Biometrics, Vol. 81, No. 1 (2025).
Citations: 0
Robust Bayesian graphical regression models for assessing tumor heterogeneity in proteomic networks.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-01-07 | DOI: 10.1093/biomtc/ujae160
Tsung-Hung Yao, Yang Ni, Anindya Bhadra, Jian Kang, Veerabhadran Baladandayuthapani

Graphical models are powerful tools to investigate complex dependency structures in high-throughput datasets. However, most existing graphical models make one of two canonical assumptions: (i) a homogeneous graph with a common network for all subjects or (ii) an assumption of normality, especially in the context of Gaussian graphical models. Both assumptions are restrictive and can fail to hold in certain applications such as proteomic networks in cancer. To this end, we propose an approach termed robust Bayesian graphical regression (rBGR) to estimate heterogeneous graphs for non-normally distributed data. rBGR is a flexible framework that accommodates non-normality through random marginal transformations and constructs covariate-dependent graphs to accommodate heterogeneity through graphical regression techniques. We formulate a new characterization of edge dependencies in such models called conditional sign independence with covariates, along with an efficient posterior sampling algorithm. In simulation studies, we demonstrate that rBGR outperforms existing graphical regression models for data generated under various levels of non-normality in both edge and covariate selection. We use rBGR to assess proteomic networks in lung and ovarian cancers to systematically investigate the effects of immunogenic heterogeneity within tumors. Our analyses reveal several important protein-protein interactions that are differentially associated with the immune cell abundance; some corroborate existing biological knowledge, whereas others are novel findings.
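
rBGR handles non-normal margins through random marginal transformations inside a Bayesian model. A much simpler, fixed alternative often applied before fitting a Gaussian graphical model is a rank-based normal-scores transform, in the spirit of Gaussian copula or nonparanormal approaches. It is swapped in here only to illustrate why a marginal transformation helps, and it is not the rBGR method. The function name is hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map each value to the standard-normal quantile of rank / (n + 1).

    Ties are broken by position rather than averaged, to keep the sketch short.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# Heavy-tailed raw values and their monotone transform give identical scores.
raw = [0.2, 5.0, -1.0, 3.0, 0.7]
print(normal_scores(raw) == normal_scores([v ** 3 for v in raw]))  # True
```

The transformed variables have approximately normal margins whatever the original distribution. Dependence structure is then estimated on the new scale, which is the role the marginal transformations play in the abstract.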

Biometrics, Vol. 81, No. 1 (2025).
Citations: 0
Gaussian processes for time series with lead-lag effects with applications to biology data.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-01-07 | DOI: 10.1093/biomtc/ujae156
Wancen Mu, Jiawen Chen, Eric S Davis, Kathleen Reed, Douglas Phanstiel, Michael I Love, Didong Li

Investigating the relationship between time series, particularly lead-lag effects, is a common question across many disciplines, especially when uncovering biological processes. However, analyzing such time series presents several challenges. First, for technical reasons, observations are not made at uniformly spaced time points. Second, some lead-lag effects are transient, so the time lag must be estimated from a limited number of time points. Third, external factors also affect these time series, which calls for a similarity metric to assess the lead-lag relationship. To address these issues, we introduce a model grounded in the Gaussian process that can flexibly estimate lead-lag effects for irregularly sampled time series. In addition, our method outputs dissimilarity scores, which extends its use to tasks such as ranking or clustering multiple pairs of time series by the strength of their lead-lag effects with external factors. Crucially, we provide a series of theoretical proofs establishing the validity of the proposed kernels and the identifiability of the kernel parameters. Compared with other leading methods, our model shows improved performance in various simulations and real-world applications, particularly in the study of dynamic chromatin interactions.
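
The abstract does not spell out its kernels. A minimal sketch of the general idea assumes both series are observations of one shared latent function with a squared-exponential kernel, with the second series delayed by an unknown lag. The lag can then be chosen by maximising the Gaussian-process log marginal likelihood over a grid, and irregular, unaligned time points pose no difficulty. The kernel, the fixed hyperparameters and the function names are assumptions, not the authors' model, and the dissimilarity scores are not reproduced.

```python
import math


def rbf(d, length=1.0):
    """Squared-exponential kernel as a function of the time difference d."""
    return math.exp(-0.5 * (d / length) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_marginal_likelihood(times, values, noise=0.01, length=1.0):
    """Zero-mean GP log marginal likelihood with an RBF kernel plus noise variance."""
    n = len(times)
    cov = [[rbf(times[i] - times[j], length) + (noise if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    # Forward substitution low @ z = values, so values' K^{-1} values = z'z.
    z = []
    for i in range(n):
        z.append((values[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)


def estimate_lag(t1, y1, t2, y2, candidate_lags, **gp_kwargs):
    """Return (lag, loglik) maximising the joint likelihood when y2(t) = f(t - lag)."""
    best = None
    for lag in candidate_lags:
        # Put series 2 on series 1's clock by shifting its time stamps.
        times = list(t1) + [t - lag for t in t2]
        ll = log_marginal_likelihood(times, list(y1) + list(y2), **gp_kwargs)
        if best is None or ll > best[1]:
            best = (lag, ll)
    return best
```

A positive lag means the second series repeats the first series' pattern `lag` time units later. In practice the length scale and noise variance would be estimated jointly with the lag rather than fixed.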

Biometrics, Vol. 81, No. 1 (2025). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704948/pdf/
Citations: 0
Improving estimation efficiency for survival data analysis by integrating a coarsened time-to-event outcome from an external study.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-01-07 | DOI: 10.1093/biomtc/ujae168
Daxuan Deng, Lijun Zhang, Hao Feng, Vernon M Chinchilli, Chixiang Chen, Ming Wang

In the era of big data, the increasing availability of data has made combining different data sources to obtain more accurate estimates a popular topic. However, data integration is often hindered by heterogeneity in data forms across studies. In this paper, we focus on a survival analysis setting in which the primary study data contain a continuous time-to-event outcome and complete covariate measurements, whereas the data from an external study contain an outcome observed only at regular intervals and measure only a subset of the covariates. To incorporate the external information while accounting for the different data forms, we posit working models and obtain informative weights by empirical likelihood, which are then used to construct a weighted estimator in the main analysis. We establish theory showing that the new estimator is more efficient than conventional estimators and that this advantage is robust to misspecification of the working models, as confirmed in our simulation studies. To assess its utility, we apply our method to incorporate data from the National Alzheimer's Coordinating Center and thereby improve the analysis of the Alzheimer's Disease Neuroimaging Initiative Phase 1 study.
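
The empirical likelihood step can be illustrated with a single moment constraint. Assume a scalar function g_i whose weighted mean should be zero, for example an internal-sample quantity minus a summary taken from the external study. The weights that maximise prod_i p_i subject to sum_i p_i = 1 and sum_i p_i g_i = 0 are p_i = 1 / (n (1 + lam g_i)), where lam solves sum_i g_i / (1 + lam g_i) = 0. This generic sketch finds lam by bisection. The working models that define g in the paper are not shown, and the function name is hypothetical.

```python
def empirical_likelihood_weights(g, tol=1e-12, max_iter=200):
    """Empirical likelihood weights under the single constraint sum_i p_i * g_i = 0.

    Returns p_i = 1 / (n * (1 + lam * g_i)) with lam found by bisection.
    """
    if not min(g) < 0 < max(g):
        raise ValueError("0 must lie strictly inside the range of g")
    n = len(g)
    # Every 1 + lam * g_i stays positive for lam strictly inside (lo, hi).
    lo, hi = -1.0 / max(g), -1.0 / min(g)

    def score(lam):
        # Strictly decreasing in lam; its root is the Lagrange multiplier.
        return sum(gi / (1.0 + lam * gi) for gi in g)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * gi)) for gi in g]


# Reweight a small sample so its mean matches an (assumed) external mean of 2.0.
sample = [1.0, 1.5, 2.0, 3.5, 4.0]
weights = empirical_likelihood_weights([x - 2.0 for x in sample])
print(round(sum(w * x for w, x in zip(weights, sample)), 6))  # 2.0
```

Bisection is used instead of Newton's method because the score is monotone on the admissible interval, so convergence needs no safeguards. With several constraints, lam becomes a vector and a damped Newton solver is the usual choice.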

Citations: 0
Distributed lag models for retrospective cohort data with application to a study of built environment and body weight.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-01-07 DOI: 10.1093/biomtc/ujae166
Jennifer F Bobb, Stephen J Mooney, Maricela Cruz, Anne Vernez Moudon, Adam Drewnowski, David Arterburn, Andrea J Cook

Distributed lag models (DLMs) estimate the health effects of exposure over multiple time lags prior to the outcome and are widely used in time series studies. Applying DLMs to retrospective cohort studies is challenging because the length of exposure history varies across participants, a common situation when using electronic health record databases. A standard approach is to define subcohorts of individuals with some minimum exposure history, but this limits power and may amplify selection bias. We propose alternative full-cohort methods that use all available data while simultaneously enabling examination of the longest time lag estimable in the cohort. Through simulation studies, we find that restricting to a subcohort can lead to biased estimates of exposure effects due to confounding by correlated exposures at more distant lags. By contrast, full-cohort methods that incorporate multiple imputation of complete exposure histories can avoid this bias and efficiently estimate lagged and cumulative effects. Applying full-cohort DLMs to a study examining the association between residential density (a proxy for walkability) over 12 years and body weight, we find evidence of an immediate effect in the prior 1-2 years. We also observe an association at the maximal lag considered (12 years prior), which we posit reflects an earlier (≥12 years) or incrementally increasing prior effect over time. DLMs can be efficiently incorporated within retrospective cohort studies to identify critical windows of exposure.
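
For readers new to DLMs, a minimal sketch of the basic model may help: the outcome is regressed on the exposure at each lag, and the lag coefficients are constrained to a low-degree polynomial in the lag. This Almon-type polynomial constraint is an illustrative choice; the abstract does not say which lag basis the authors use. The sketch assumes complete 12-period exposure histories for every subject, which is the setting the full-cohort methods aim to recover; the multiple-imputation step is not implemented. `fit_dlm` and the simulated data are hypothetical.

```python
import numpy as np


def fit_dlm(y, X, degree=2):
    """Polynomial-constrained (Almon) distributed lag model fitted by least squares.

    y: (n,) outcome.
    X: (n, L) exposure history; column j holds the exposure j + 1 periods before the outcome.
    The lag coefficients are constrained to beta_j = sum_k theta_k * j**k.
    Returns (intercept, beta), where beta has length L.
    """
    n, L = X.shape
    B = np.vander(np.arange(L, dtype=float), degree + 1, increasing=True)  # (L, degree + 1)
    Z = np.column_stack([np.ones(n), X @ B])  # intercept + basis-weighted exposures
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0], B @ coef[1:]  # map basis coefficients back to lag coefficients


# Hypothetical simulation: 12 yearly exposures per subject, complete histories.
rng = np.random.default_rng(1)
n, L = 2000, 12
X = rng.normal(size=(n, L))
lags = np.arange(L, dtype=float)
beta_true = 0.4 - 0.08 * lags + 0.005 * lags**2
y = 1.0 + X @ beta_true + rng.normal(scale=0.3, size=n)

intercept, beta_hat = fit_dlm(y, X, degree=2)
print(np.round(beta_hat, 3))
print("cumulative effect:", round(beta_hat.sum(), 3), "true:", round(beta_true.sum(), 3))
```

The cumulative effect is the sum of the lag-specific coefficients. The smoothness constraint trades some flexibility for lower variance across the 12 lags.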

Citations: 0
Pseudo-observations for bivariate survival data.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-01-07 DOI: 10.1093/biomtc/ujaf006
Yael Travis-Lumer, Micha Mandel, Rebecca A Betensky

The pseudo-observations approach has been gaining popularity as a method to estimate covariate effects on censored survival data. It is used regularly to estimate covariate effects on quantities such as survival probabilities, restricted mean life, cumulative incidence, and others. In this work, we propose to generalize the pseudo-observations approach to situations where a bivariate failure-time variable is observed, subject to right censoring. The idea is to first estimate the joint survival function of both failure times and then use it to define the relevant pseudo-observations. Once the pseudo-observations are calculated, they are used as the response in a generalized linear model. We consider 2 common nonparametric estimators of the joint survival function: the estimator of Lin and Ying (1993) and the Dabrowska estimator (Dabrowska, 1988). For both estimators, we show that our bivariate pseudo-observations approach produces regression estimates that are consistent and asymptotically normal. Our proposed method enables estimation of covariate effects on quantities such as the joint survival probability at a fixed bivariate time point or simultaneously at several time points and, consequently, can estimate covariate-adjusted conditional survival probabilities. We demonstrate the method using simulations and an analysis of 2 real-world datasets.
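
The pseudo-observation construction can be sketched with the usual jackknife formula θ_i = n·θ̂ - (n - 1)·θ̂_(-i), where θ̂ estimates the joint survival probability at a fixed point (s1, s2). As a simplification, the estimator below is the empirical joint survival function for uncensored pairs. It stands in for the Lin-Ying or Dabrowska estimators the abstract uses. With no censoring, the pseudo-values reduce exactly to the indicators I(T1 > s1, T2 > s2), which makes a convenient sanity check. The function names, the time point (1, 1), the simulated data, and the identity-link least-squares fit are all assumptions for illustration.

```python
import numpy as np


def pseudo_observations(arrays, estimator):
    """Jackknife pseudo-observations: theta_i = n * theta_hat - (n - 1) * theta_hat_(-i).

    arrays: tuple of equal-length per-subject arrays passed to `estimator`.
    estimator: function of those arrays returning a scalar (e.g. a joint survival probability).
    """
    n = len(arrays[0])
    full = estimator(*arrays)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False  # leave subject i out
        pseudo[i] = n * full - (n - 1) * estimator(*(a[keep] for a in arrays))
        keep[i] = True
    return pseudo


def empirical_joint_survival(t1, t2, s1=1.0, s2=1.0):
    """P(T1 > s1, T2 > s2) for fully observed pairs; placeholder for a censoring-aware
    estimator such as Lin-Ying or Dabrowska (not implemented here)."""
    return np.mean((t1 > s1) & (t2 > s2))


# Hypothetical data: binary covariate z; a shared component correlates the two failure times.
rng = np.random.default_rng(2)
n = 400
z = rng.integers(0, 2, size=n)
shared = rng.exponential(size=n)
t1 = (shared + rng.exponential(size=n)) * np.exp(-0.5 * z)
t2 = (shared + rng.exponential(size=n)) * np.exp(-0.5 * z)

pseudo = pseudo_observations((t1, t2), empirical_joint_survival)

# Identity-link regression of the pseudo-values on z (ordinary least squares).
design = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(design, pseudo, rcond=None)
print("intercept, effect of z on P(T1 > 1, T2 > 1):", np.round(coef, 3))
```

To handle right censoring, one would pass a censoring-aware joint survival estimator (taking the event indicators as extra arrays) as `estimator`. The leave-one-out loop is unchanged, though it costs n + 1 estimator evaluations.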

Citations: 0
Journal: Biometrics