The increasing availability and scale of biobanks and "omic" datasets bring new horizons for understanding biological mechanisms. PathGPS is an exploratory data analysis tool to discover genetic architectures using Genome Wide Association Studies (GWAS) summary data. PathGPS is based on a linear structural equation model where traits are regulated by both genetic and environmental pathways. PathGPS decouples the genetic and environmental components by contrasting the GWAS associations of "signal" genes with those of "noise" genes. From the estimated genetic component, PathGPS then extracts genetic pathways via principal component and factor analysis, leveraging the low-rank and sparse properties. In addition, we provide a bootstrap aggregating ("bagging") algorithm to improve stability under data perturbation and hyperparameter tuning. When applied to a metabolomics dataset and the UK Biobank, PathGPS confirms several known gene-trait clusters and suggests multiple new hypotheses for future investigations.
{"title":"PathGPS: discover shared genetic architecture using GWAS summary data.","authors":"Zijun Gao, Qingyuan Zhao, Trevor Hastie","doi":"10.1093/biomtc/ujae060","DOIUrl":"10.1093/biomtc/ujae060","url":null,"abstract":"<p><p>The increasing availability and scale of biobanks and \"omic\" datasets bring new horizons for understanding biological mechanisms. PathGPS is an exploratory data analysis tool to discover genetic architectures using Genome Wide Association Studies (GWAS) summary data. PathGPS is based on a linear structural equation model where traits are regulated by both genetic and environmental pathways. PathGPS decouples the genetic and environmental components by contrasting the GWAS associations of \"signal\" genes with those of \"noise\" genes. From the estimated genetic component, PathGPS then extracts genetic pathways via principal component and factor analysis, leveraging the low-rank and sparse properties. In addition, we provide a bootstrap aggregating (\"bagging\") algorithm to improve stability under data perturbation and hyperparameter tuning. When applied to a metabolomics dataset and the UK Biobank, PathGPS confirms several known gene-trait clusters and suggests multiple new hypotheses for future investigations.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247175/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141615885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
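The contrast idea in the PathGPS abstract above can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the paper: the association matrices are simulated, the "signal" rows carry a rank-2 genetic component plus correlated environmental noise, the "noise" rows carry the environmental noise only, and the genetic component is estimated by subtracting the two second-moment matrices before an eigendecomposition. All names (`loadings`, `env_cov`, `contrast`) are hypothetical, and this is not the PathGPS estimator itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n_signal, n_noise, n_traits = 5000, 5000, 6

# Assumed rank-2 genetic loadings: traits 0-2 share one pathway, traits 3-5 another.
loadings = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)

# Assumed environmental covariance: every pair of traits is correlated.
env_cov = 0.5 * np.eye(n_traits) + 0.5 * np.ones((n_traits, n_traits))
zero_mean = np.zeros(n_traits)

# Rows are genetic variants, columns are traits (simulated association estimates).
pathway_effects = rng.normal(size=(n_signal, 2))
signal_assoc = pathway_effects @ loadings.T + rng.multivariate_normal(
    zero_mean, env_cov, size=n_signal
)
noise_assoc = rng.multivariate_normal(zero_mean, env_cov, size=n_noise)

# Contrast of second moments: the shared environmental part cancels on average,
# leaving an estimate of loadings @ loadings.T.
contrast = signal_assoc.T @ signal_assoc / n_signal - noise_assoc.T @ noise_assoc / n_noise

# Eigendecomposition of the symmetric contrast matrix, largest eigenvalues first.
eigvals, eigvecs = np.linalg.eigh(contrast)
order = np.argsort(eigvals)[::-1]
eigvals_desc = eigvals[order]
leading_directions = eigvecs[:, order[:2]]

print(np.round(eigvals_desc, 2))
```

In this simulation the two leading eigenvalues stand apart from the rest, which is the low-rank structure the abstract refers to; a sparsity-inducing rotation of `leading_directions` would be the natural next step, but it is left out here.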
In randomized controlled trials, adjusting for baseline covariates is commonly used to improve the precision of treatment effect estimation. However, covariates often have missing values. Recently, Zhao and Ding studied two simple strategies, the single imputation method and missingness-indicator method (MIM), to handle missing covariates and showed that both methods can provide an efficiency gain compared to not adjusting for covariates. To better understand and compare these two strategies, we propose and investigate a novel theoretical imputation framework termed cross-world imputation (CWI). This framework includes both single imputation and MIM as special cases, facilitating the comparison of their efficiency. Through the lens of CWI, we show that MIM implicitly searches for the optimal CWI values and thus achieves optimal efficiency. We also derive conditions under which the single imputation method, by searching for the optimal single imputation values, can achieve the same efficiency as the MIM. We illustrate our findings through simulation studies and a real data analysis based on the Childhood Adenotonsillectomy Trial. We conclude by discussing the practical implications of our findings.
{"title":"Adjusting for incomplete baseline covariates in randomized controlled trials: a cross-world imputation framework.","authors":"Yilin Song, James P Hughes, Ting Ye","doi":"10.1093/biomtc/ujae094","DOIUrl":"https://doi.org/10.1093/biomtc/ujae094","url":null,"abstract":"<p><p>In randomized controlled trials, adjusting for baseline covariates is commonly used to improve the precision of treatment effect estimation. However, covariates often have missing values. Recently, Zhao and Ding studied two simple strategies, the single imputation method and missingness-indicator method (MIM), to handle missing covariates and showed that both methods can provide an efficiency gain compared to not adjusting for covariates. To better understand and compare these two strategies, we propose and investigate a novel theoretical imputation framework termed cross-world imputation (CWI). This framework includes both single imputation and MIM as special cases, facilitating the comparison of their efficiency. Through the lens of CWI, we show that MIM implicitly searches for the optimal CWI values and thus achieves optimal efficiency. We also derive conditions under which the single imputation method, by searching for the optimal single imputation values, can achieve the same efficiency as the MIM. We illustrate our findings through simulation studies and a real data analysis based on the Childhood Adenotonsillectomy Trial. 
We conclude by discussing the practical implications of our findings.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398886/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142280079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
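The two strategies compared in the abstract above, single imputation and the missingness-indicator method (MIM), can be sketched on simulated trial data. The data-generating process, the missing-completely-at-random mechanism, and the function names below are assumptions made for illustration, and the plain additive regression used here is a simplification rather than the estimator analysed in the paper.

```python
import numpy as np

def ols_treatment_effect(y, treat, covariates=()):
    """Return the OLS coefficient on the treatment indicator."""
    design = np.column_stack([np.ones(len(y)), treat, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])

def simulate_trial(n, rng, effect=1.0, missing_rate=0.3):
    """Simulate a two-arm trial with one prognostic covariate that is partly missing."""
    x = rng.normal(size=n)
    treat = rng.binomial(1, 0.5, size=n).astype(float)
    y = effect * treat + 2.0 * x + rng.normal(size=n)
    missing = rng.random(n) < missing_rate  # assumed missing completely at random
    return y, treat, x, missing

def three_estimates(y, treat, x, missing):
    """Unadjusted, single-imputation, and missingness-indicator estimates."""
    observed_mean = x[~missing].mean()
    x_imputed = np.where(missing, observed_mean, x)  # single imputation by the observed mean
    x_zero = np.where(missing, 0.0, x)               # MIM: fill with 0 and add an indicator
    return {
        "unadjusted": ols_treatment_effect(y, treat),
        "single_imputation": ols_treatment_effect(y, treat, [x_imputed]),
        "mim": ols_treatment_effect(y, treat, [x_zero, missing.astype(float)]),
    }

rng = np.random.default_rng(0)
y, treat, x, missing = simulate_trial(5000, rng)
estimates = three_estimates(y, treat, x, missing)
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

Because treatment is randomized, all three estimators target the same effect; they differ in precision, which would show up when the simulation is repeated many times and the spread of each estimator is compared.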
Yizhen Xu, Ji Soo Kim, Laura K Hummers, Ami A Shah, Scott L Zeger
Dynamic prediction of causal effects under different treatment regimens is an essential problem in precision medicine. It is challenging because the actual mechanisms of treatment assignment and effects are unknown in observational studies. We propose a multivariate generalized linear mixed-effects model and a Bayesian g-computation algorithm to calculate the posterior distribution of subgroup-specific intervention benefits of dynamic treatment regimes. Unmeasured time-invariant factors are included as subject-specific random effects in the assumed joint distribution of outcomes, time-varying confounders, and treatment assignments. We identify a sequential ignorability assumption conditional on treatment assignment heterogeneity that is analogous to balancing the latent treatment preference due to unmeasured time-invariant factors. We present a simulation study to assess the proposed method's performance. The method is applied to observational clinical data to investigate the efficacy of continuously using mycophenolate in different subgroups of scleroderma patients.
{"title":"Causal inference using multivariate generalized linear mixed-effects models.","authors":"Yizhen Xu, Ji Soo Kim, Laura K Hummers, Ami A Shah, Scott L Zeger","doi":"10.1093/biomtc/ujae100","DOIUrl":"https://doi.org/10.1093/biomtc/ujae100","url":null,"abstract":"<p><p>Dynamic prediction of causal effects under different treatment regimens is an essential problem in precision medicine. It is challenging because the actual mechanisms of treatment assignment and effects are unknown in observational studies. We propose a multivariate generalized linear mixed-effects model and a Bayesian g-computation algorithm to calculate the posterior distribution of subgroup-specific intervention benefits of dynamic treatment regimes. Unmeasured time-invariant factors are included as subject-specific random effects in the assumed joint distribution of outcomes, time-varying confounders, and treatment assignments. We identify a sequential ignorability assumption conditional on treatment assignment heterogeneity, that is, analogous to balancing the latent treatment preference due to unmeasured time-invariant factors. We present a simulation study to assess the proposed method's performance. The method is applied to observational clinical data to investigate the efficacy of continuously using mycophenolate in different subgroups of scleroderma patients.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11422711/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142340678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This discussion provides commentary on the paper by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim entitled "LEAP: the latent exchangeability prior for borrowing information from historical data". The authors propose a novel method to bridge the incorporation of supplemental information into a study while also identifying potentially exchangeable subgroups to better facilitate information sharing. In this discussion, we highlight the potential relationship with other Bayesian model averaging approaches, such as multisource exchangeability modeling, and provide a brief numeric case study to illustrate how the concepts behind the latent exchangeability prior may also improve the performance of existing methods. The results provided by Alt et al. are exciting, and we believe that the method represents a meaningful approach to more efficient information sharing.
{"title":"Discussion on \"LEAP: the latent exchangeability prior for borrowing information from historical data\" by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim.","authors":"Shannon D Thomas, Alexander M Kaizer","doi":"10.1093/biomtc/ujae086","DOIUrl":"10.1093/biomtc/ujae086","url":null,"abstract":"<p><p>This discussion provides commentary on the paper by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim entitled \"LEAP: the latent exchangeability prior for borrowing information from historical data\". The authors propose a novel method to bridge the incorporation of supplemental information into a study while also identifying potentially exchangeable subgroups to better facilitate information sharing. In this discussion, we highlight the potential relationship with other Bayesian model averaging approaches, such as multisource exchangeability modeling, and provide a brief numeric case study to illustrate how the concepts behind latent exchangeability prior may also improve the performance of existing methods. The results provided by Alt et al. are exciting, and we believe that the method represents a meaningful approach to more efficient information sharing.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11427888/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142340681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ethan M Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, Hong Amy Xia, Joseph G Ibrahim
The discussions of our paper provide insights into the practical considerations of the latent exchangeability prior while also highlighting further extensions. In this rejoinder, we briefly summarize the discussions and provide comments.
{"title":"Rejoinder to the discussion on \"LEAP: the latent exchangeability prior for borrowing information from historical data\".","authors":"Ethan M Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, Hong Amy Xia, Joseph G Ibrahim","doi":"10.1093/biomtc/ujae087","DOIUrl":"https://doi.org/10.1093/biomtc/ujae087","url":null,"abstract":"<p><p>The discussions of our paper provide insights into the practical considerations of the latent exchangeability prior while also highlighting further extensions. In this rejoinder, we briefly summarize the discussions and provide comments.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142340684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Such multi-class classifiers are often assessed using the hypervolume under the ROC manifold (HUM). When dealing with a substantial pool of potential predictors and aiming to achieve optimal HUM, it becomes imperative to conduct appropriate statistical inference. However, prevalent methodologies in the existing literature are computationally expensive. We propose to use the jackknife empirical likelihood method to address this issue. Wilks' theorem is established under moderate conditions, and a power analysis under the Pitman alternative is provided. We also introduce a novel network-based rapid computation algorithm specifically designed for computing a general multi-sample $U$-statistic in our test procedure. To compare our approach against existing approaches, we conduct extensive simulations. Results demonstrate the superior performance of our method in terms of test size, power, and implementation time. Furthermore, we apply our method to analyze a real medical dataset and obtain some new findings.
{"title":"Hypothesis tests in ordinal predictive models with optimal accuracy.","authors":"Yuyang Liu, Shan Luo, Jialiang Li","doi":"10.1093/biomtc/ujae079","DOIUrl":"https://doi.org/10.1093/biomtc/ujae079","url":null,"abstract":"<p><p>In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Assessment of such multi-class classifiers often utilizes the hypervolume under ROC manifolds (HUM). When dealing with a substantial pool of potential predictors and achieving optimal HUM, it becomes imperative to conduct appropriate statistical inference. However, prevalent methodologies in existing literature are computationally expensive. We propose to use the jackknife empirical likelihood method to address this issue. The Wilks' theorem under moderate conditions is established and the power analysis under the Pitman alternative is provided. We also introduce a novel network-based rapid computation algorithm specifically designed for computing a general multi-sample $U$-statistic in our test procedure. To compare our approach against existing approaches, we conduct extensive simulations. Results demonstrate the superior performance of our method in terms of test size, power, and implementation time. 
Furthermore, we apply our method to analyze a real medical dataset and obtain some new findings.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142016282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
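To make the accuracy measure in the abstract above concrete, the empirical HUM for three ordered classes is the fraction of triples, one score from each class, that come out in the correct order. The sketch below computes it by counting rather than by enumerating all triples. It is a plain counting routine, not the network-based algorithm mentioned in the abstract; the function name is hypothetical and ties are counted as incorrectly ordered by assumption.

```python
import numpy as np

def empirical_hum(scores_1, scores_2, scores_3):
    """Fraction of triples (a, b, c), one per class, with a < b < c.

    Classes are assumed to be ordered 1 < 2 < 3; ties count as not ordered.
    """
    s1 = np.sort(np.asarray(scores_1, dtype=float))
    s2 = np.asarray(scores_2, dtype=float)
    s3 = np.sort(np.asarray(scores_3, dtype=float))
    # For each class-2 score: how many class-1 scores are strictly smaller.
    below = np.searchsorted(s1, s2, side="left")
    # For each class-2 score: how many class-3 scores are strictly larger.
    above = len(s3) - np.searchsorted(s3, s2, side="right")
    return float(np.sum(below * above)) / (len(s1) * len(s2) * len(s3))

perfect = empirical_hum([1, 2], [3, 4], [5, 6])
partial = empirical_hum([1, 3], [2], [0, 4])
print(perfect, partial)
```

For three classes, a score that carries no information orders a random triple correctly with probability 1/6, so values well above 1/6 indicate useful discrimination.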
The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset.
{"title":"High-dimensional multivariate analysis of variance via geometric median and bootstrapping.","authors":"Guanghui Cheng, Ruitao Lin, Liuhua Peng","doi":"10.1093/biomtc/ujae088","DOIUrl":"10.1093/biomtc/ujae088","url":null,"abstract":"<p><p>The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. 
Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11381952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142153109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
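The geometric median used in the abstract above is the point that minimizes the sum of Euclidean distances to the observations. A minimal sketch of one standard way to compute it, Weiszfeld's fixed-point iteration, is given below. The function name, tolerance, and iteration cap are illustrative choices, and the abstract does not say which algorithm the authors use.

```python
import numpy as np

def geometric_median(points, tol=1e-8, max_iter=1000):
    """Weiszfeld iteration for the minimizer of the summed Euclidean distances."""
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(max_iter):
        distances = np.linalg.norm(points - estimate, axis=1)
        # Guard against division by zero if the estimate lands on a data point.
        weights = 1.0 / np.maximum(distances, 1e-12)
        updated = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(updated - estimate) < tol:
            return updated
        estimate = updated
    return estimate

square = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
center = geometric_median(square)

# One gross outlier drags the mean far away but barely moves the geometric median.
contaminated = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]])
robust = geometric_median(contaminated)
mean = contaminated.mean(axis=0)
print(center, robust, mean)
```

The contaminated example shows why the geometric median is attractive as a robust location estimator: the mean moves to about 20 in each coordinate, while the median stays near the bulk of the data.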
We propose a new Bayesian nonparametric method for estimating the causal effects of mediation in the presence of a post-treatment confounder. The methodology is motivated by the Rural Lifestyle Intervention Treatment Effectiveness Trial (Rural LITE), in which there is interest in estimating causal mediation effects, but estimation is complicated by the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounder, treatment, and baseline confounders). For identifiability, we use the extended version of the standard sequential ignorability (SI) as introduced in Hong et al. along with a Gaussian copula model assumption. The observed data model and causal identification assumptions enable us to estimate and identify the causal effects of mediation, that is, the natural direct effects (NDE) and natural indirect effects (NIE). Our method enables easy computation of NIE and NDE for a subset of confounding variables and addresses missing data through data augmentation under the assumption of ignorable missingness. We conduct simulation studies to assess the performance of our proposed method. Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, finding no strong evidence for the potential mediator.
{"title":"A Bayesian nonparametric approach for causal mediation with a post-treatment confounder.","authors":"Woojung Bae, Michael J Daniels, Michael G Perri","doi":"10.1093/biomtc/ujae099","DOIUrl":"10.1093/biomtc/ujae099","url":null,"abstract":"<p><p>We propose a new Bayesian nonparametric method for estimating the causal effects of mediation in the presence of a post-treatment confounder. The methodology is motivated by the Rural Lifestyle Intervention Treatment Effectiveness Trial (Rural LITE) for which there is interest in estimating causal mediation effects but is complicated by the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounder, treatment, and baseline confounders). For identifiability, we use the extended version of the standard sequential ignorability (SI) as introduced in Hong et al. along with a Gaussian copula model assumption. The observed data model and causal identification assumptions enable us to estimate and identify the causal effects of mediation, that is, the natural direct effects (NDE) and natural indirect effects (NIE). Our method enables easy computation of NIE and NDE for a subset of confounding variables and addresses missing data through data augmentation under the assumption of ignorable missingness. We conduct simulation studies to assess the performance of our proposed method. 
Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, finding that there was not strong evidence for the potential mediator.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11418020/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142280078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gaussian graphical models (GGMs) are useful for understanding the complex relationships between biological entities. Transfer learning can improve the estimation of GGMs in a target dataset by incorporating relevant information from related source studies. However, biomedical research often involves intrinsic and latent heterogeneity within a study, such as heterogeneous subpopulations. This heterogeneity can make it difficult to identify informative source studies or lead to negative transfer if the source study is improperly used. To address this challenge, we developed a heterogeneous latent transfer learning (Latent-TL) approach that accounts for both within-sample and between-sample heterogeneity. The idea behind this approach is to "learn from the alike" by leveraging the similarities between source and target GGMs within each subpopulation. The Latent-TL algorithm simultaneously identifies common subpopulation structures among samples and facilitates the learning of target GGMs using source samples from the same subpopulation. Through extensive simulations and real data application, we have shown that the proposed method outperforms single-site learning and standard transfer learning that ignores the latent structures. We have also demonstrated the applicability of the proposed algorithm in characterizing gene co-expression networks in breast cancer patients, where the inferred genetic networks identified many biologically meaningful gene-gene interactions.
{"title":"Heterogeneous latent transfer learning in Gaussian graphical models.","authors":"Qiong Wu, Chi Wang, Yong Chen","doi":"10.1093/biomtc/ujae096","DOIUrl":"10.1093/biomtc/ujae096","url":null,"abstract":"<p><p>Gaussian graphical models (GGMs) are useful for understanding the complex relationships between biological entities. Transfer learning can improve the estimation of GGMs in a target dataset by incorporating relevant information from related source studies. However, biomedical research often involves intrinsic and latent heterogeneity within a study, such as heterogeneous subpopulations. This heterogeneity can make it difficult to identify informative source studies or lead to negative transfer if the source study is improperly used. To address this challenge, we developed a heterogeneous latent transfer learning (Latent-TL) approach that accounts for both within-sample and between-sample heterogeneity. The idea behind this approach is to \"learn from the alike\" by leveraging the similarities between source and target GGMs within each subpopulation. The Latent-TL algorithm simultaneously identifies common subpopulation structures among samples and facilitates the learning of target GGMs using source samples from the same subpopulation. Through extensive simulations and real data application, we have shown that the proposed method outperforms single-site learning and standard transfer learning that ignores the latent structures. 
We have also demonstrated the applicability of the proposed algorithm in characterizing gene co-expression networks in breast cancer patients, where the inferred genetic networks identified many biologically meaningful gene-gene interactions.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413907/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142280081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
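As background for the abstract above: in a Gaussian graphical model, two variables are joined by an edge exactly when the corresponding entry of the precision (inverse covariance) matrix is nonzero, and the partial correlation is a rescaled version of that entry. The sketch below illustrates this on a three-node chain. It covers only the basic GGM building block, not the Latent-TL algorithm, and the function name and example matrix are made up for illustration.

```python
import numpy as np

def partial_correlations(precision):
    """Partial correlations from a precision matrix: -w_ij / sqrt(w_ii * w_jj)."""
    precision = np.asarray(precision, dtype=float)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial

# Chain graph 0 - 1 - 2: variables 0 and 2 are linked only through variable 1.
precision = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])
covariance = np.linalg.inv(precision)
partial = partial_correlations(precision)

print(np.round(covariance, 3))
print(np.round(partial, 3))
```

Variables 0 and 2 have nonzero covariance but zero partial correlation, which is why GGM estimation targets the precision matrix rather than the covariance matrix.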
Alex Stringer, Tugba Akkaya Hocagil, Richard J Cook, Louise M Ryan, Sandra W Jacobson, Joseph L Jacobson
Benchmark dose analysis aims to estimate the level of exposure to a toxin associated with a clinically significant adverse outcome and quantifies uncertainty using the lower limit of a confidence interval for this level. We develop a novel framework for benchmark dose analysis based on monotone additive dose-response models. We first introduce a flexible approach for fitting monotone additive models via penalized B-splines and Laplace-approximate marginal likelihood. A reflective Newton method is then developed that employs de Boor's algorithm for computing splines and their derivatives for efficient estimation of the benchmark dose. Finally, we develop a novel approach for calculating benchmark dose lower limits based on an approximate pivot for the nonlinear equation solved by the estimated benchmark dose. The favorable properties of this approach compared to the Delta method and a parametric bootstrap are discussed. We apply the new methods to make inferences about the level of prenatal alcohol exposure associated with clinically significant cognitive defects in children using data from six NIH-funded longitudinal cohort studies. Software to reproduce the results in this paper is available online and makes use of the novel semibmd R package, which implements the methods in this paper.
{"title":"Semi-parametric benchmark dose analysis with monotone additive models.","authors":"Alex Stringer, Tugba Akkaya Hocagil, Richard J Cook, Louise M Ryan, Sandra W Jacobson, Joseph L Jacobson","doi":"10.1093/biomtc/ujae098","DOIUrl":"https://doi.org/10.1093/biomtc/ujae098","url":null,"abstract":"<p><p>Benchmark dose analysis aims to estimate the level of exposure to a toxin associated with a clinically significant adverse outcome and quantifies uncertainty using the lower limit of a confidence interval for this level. We develop a novel framework for benchmark dose analysis based on monotone additive dose-response models. We first introduce a flexible approach for fitting monotone additive models via penalized B-splines and Laplace-approximate marginal likelihood. A reflective Newton method is then developed that employs de Boor's algorithm for computing splines and their derivatives for efficient estimation of the benchmark dose. Finally, we develop a novel approach for calculating benchmark dose lower limits based on an approximate pivot for the nonlinear equation solved by the estimated benchmark dose. The favorable properties of this approach compared to the Delta method and a parameteric bootstrap are discussed. We apply the new methods to make inferences about the level of prenatal alcohol exposure associated with clinically significant cognitive defects in children using data from six NIH-funded longitudinal cohort studies. 
Software to reproduce the results in this paper is available online and makes use of the novel semibmd R package, which implements the methods in this paper.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11403299/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142280083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
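The abstract above relies on de Boor's algorithm to evaluate B-splines. A minimal sketch of the textbook recursion for evaluating a spline at a single point is shown below. It is written in Python for illustration only and is not taken from the semibmd R package; the knot vector, coefficients, and function name are assumptions, and derivative evaluation is omitted.

```python
import numpy as np

def de_boor(x, knots, coefs, degree):
    """Evaluate a B-spline with the given knots and coefficients at x."""
    knots = np.asarray(knots, dtype=float)
    # Locate the knot span k with knots[k] <= x < knots[k + 1].
    k = int(np.searchsorted(knots, x, side="right")) - 1
    # Clamp so that x at the right boundary uses the last non-empty span.
    k = min(max(k, degree), len(knots) - degree - 2)
    d = [float(coefs[j + k - degree]) for j in range(degree + 1)]
    # Repeated convex combinations of neighbouring coefficients.
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            left = knots[j + k - degree]
            right = knots[j + 1 + k - r]
            alpha = (x - left) / (right - left)
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[degree]

degree = 3
knots = np.array([0, 0, 0, 0, 1, 2, 3, 3, 3, 3], dtype=float)
n_coefs = len(knots) - degree - 1

# Coefficients placed at the Greville abscissae make the spline reproduce f(x) = x.
greville = np.array([knots[i + 1:i + degree + 1].mean() for i in range(n_coefs)])
identity_value = de_boor(1.7, knots, greville, degree)

# Constant coefficients give a constant spline (partition of unity).
constant_value = de_boor(3.0, knots, np.ones(n_coefs), degree)
print(identity_value, constant_value)
```

The two checks use standard B-spline properties, linear reproduction at the Greville abscissae and partition of unity, so they hold for any valid clamped knot vector and not only for the one chosen here.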