Yuheng Wang, Juan Ye, Xiaohui Li, David L Borchers
Passive acoustic monitoring can be an effective way of monitoring wildlife populations that are acoustically active but difficult to survey visually, but identifying target species calls in recordings is non-trivial. Machine learning (ML) techniques can do detection quickly but may miss calls and produce false positives, i.e., misidentify calls from other sources as being from the target species. While abundance estimation methods can address the former issue effectively, methods to deal with false positives are under-investigated. We propose an acoustic spatial capture-recapture (ASCR) method that deals with false positives by treating species identity as a latent variable. Individual-level outputs from ML techniques are treated as random variables whose distributions depend on the latent identity. This gives rise to a mixture model likelihood that we maximize to estimate call density. We compare our method to existing methods by applying it to an ASCR survey of frogs and simulated acoustic surveys of gibbons based on real gibbon acoustic data. Estimates from our method are closer to ASCR applied to the dataset without false positives than those from a widely used false positive "correction factor" method. Simulations show our method to have bias close to zero and accurate coverage probabilities and to perform substantially better than ASCR without accounting for false positives.
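As a minimal sketch of the latent-identity idea, the toy Python below fits a two-component mixture to simulated classifier scores, marginalizing over whether each detected call is from the target species. All distributions and parameter values here are invented for illustration; the paper's full ASCR likelihood additionally involves detection and localization terms that are omitted.

```python
import math
import random

random.seed(0)

# Toy classifier scores: target calls tend to score high, false positives low.
# (Hypothetical component distributions, not the paper's.)
scores = ([random.gauss(0.8, 0.10) for _ in range(300)]     # latent id: target
          + [random.gauss(0.3, 0.15) for _ in range(200)])  # latent id: other

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_nll(p):
    """Negative log-likelihood, marginalizing the latent species identity."""
    return -sum(math.log(p * norm_pdf(s, 0.8, 0.10)
                         + (1 - p) * norm_pdf(s, 0.3, 0.15)) for s in scores)

# Profile the mixing proportion (share of genuine target calls) on a grid.
grid = [i / 100 for i in range(1, 100)]
p_hat = min(grid, key=mixture_nll)
print(p_hat)  # close to the true share 300/500 = 0.6
```

With the component densities treated as known, the maximizer recovers the proportion of true target calls, which is the quantity that a naive count of detections would overstate.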
{"title":"Towards automated animal density estimation with acoustic spatial capture-recapture.","authors":"Yuheng Wang, Juan Ye, Xiaohui Li, David L Borchers","doi":"10.1093/biomtc/ujae081","DOIUrl":"https://doi.org/10.1093/biomtc/ujae081","url":null,"abstract":"<p><p>Passive acoustic monitoring can be an effective way of monitoring wildlife populations that are acoustically active but difficult to survey visually, but identifying target species calls in recordings is non-trivial. Machine learning (ML) techniques can do detection quickly but may miss calls and produce false positives, i.e., misidentify calls from other sources as being from the target species. While abundance estimation methods can address the former issue effectively, methods to deal with false positives are under-investigated. We propose an acoustic spatial capture-recapture (ASCR) method that deals with false positives by treating species identity as a latent variable. Individual-level outputs from ML techniques are treated as random variables whose distributions depend on the latent identity. This gives rise to a mixture model likelihood that we maximize to estimate call density. We compare our method to existing methods by applying it to an ASCR survey of frogs and simulated acoustic surveys of gibbons based on real gibbon acoustic data. Estimates from our method are closer to ASCR applied to the dataset without false positives than those from a widely used false positive \"correction factor\" method. 
Simulations show our method to have bias close to zero and accurate coverage probabilities and to perform substantially better than ASCR without accounting for false positives.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142079070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Niklas Hagemann, Giampiero Marra, Frank Bretz, Kathrin Möllenhoff
A common problem in clinical trials is to test whether the effect of an explanatory variable on a response of interest is similar between two groups, for example, patient or treatment groups. In this regard, similarity is defined as equivalence up to a pre-specified threshold that denotes an acceptable deviation between the two groups. This issue is typically tackled by assessing if the explanatory variable's effect on the response is similar. This assessment is based on, for example, confidence intervals of differences or a suitable distance between two parametric regression models. Typically, these approaches build on the assumption of a univariate continuous or binary outcome variable. However, multivariate outcomes, especially beyond the case of bivariate binary responses, remain underexplored. This paper introduces an approach based on a generalized joint regression framework exploiting the Gaussian copula. Compared to existing methods, our approach accommodates various outcome variable scales, such as continuous, binary, categorical, and ordinal, including mixed outcomes in multi-dimensional spaces. We demonstrate the validity of this approach through a simulation study and an efficacy-toxicity case study, hence highlighting its practical relevance.
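A Gaussian copula couples arbitrary margins through correlated latent normals, which is what lets the framework mix continuous, binary, categorical, and ordinal outcomes. The sketch below, with made-up margins and correlation, simulates a continuous and a binary outcome joined by a Gaussian copula and checks that the dependence survives the marginal transforms; it illustrates the copula mechanism only, not the authors' joint regression model.

```python
import math
import random

random.seed(1)
rho = 0.6  # copula correlation linking the two outcomes (made up)

def bivariate_normal():
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return z1, rho * z1 + math.sqrt(1 - rho * rho) * z2

def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Gaussian copula: latent normals are correlated, margins are transformed
# separately -- here a continuous (Exp(1)) and a binary outcome.
pairs = []
for _ in range(5000):
    z1, z2 = bivariate_normal()
    y_cont = -math.log(1 - std_normal_cdf(z1))     # Exp(1) via inverse CDF
    y_bin = 1 if std_normal_cdf(z2) > 0.5 else 0   # Bernoulli(0.5) margin
    pairs.append((y_cont, y_bin))

# Dependence survives the marginal transforms: the continuous outcome is
# larger on average when the binary outcome is 1.
mean1 = sum(y for y, b in pairs if b == 1) / sum(b for _, b in pairs)
mean0 = sum(y for y, b in pairs if b == 0) / sum(1 - b for _, b in pairs)
print(mean1 > mean0)  # True
```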
{"title":"Testing for similarity of multivariate mixed outcomes using generalized joint regression models with application to efficacy-toxicity responses.","authors":"Niklas Hagemann, Giampiero Marra, Frank Bretz, Kathrin Möllenhoff","doi":"10.1093/biomtc/ujae077","DOIUrl":"https://doi.org/10.1093/biomtc/ujae077","url":null,"abstract":"<p><p>A common problem in clinical trials is to test whether the effect of an explanatory variable on a response of interest is similar between two groups, for example, patient or treatment groups. In this regard, similarity is defined as equivalence up to a pre-specified threshold that denotes an acceptable deviation between the two groups. This issue is typically tackled by assessing if the explanatory variable's effect on the response is similar. This assessment is based on, for example, confidence intervals of differences or a suitable distance between two parametric regression models. Typically, these approaches build on the assumption of a univariate continuous or binary outcome variable. However, multivariate outcomes, especially beyond the case of bivariate binary responses, remain underexplored. This paper introduces an approach based on a generalized joint regression framework exploiting the Gaussian copula. Compared to existing methods, our approach accommodates various outcome variable scales, such as continuous, binary, categorical, and ordinal, including mixed outcomes in multi-dimensional spaces. 
We demonstrate the validity of this approach through a simulation study and an efficacy-toxicity case study, hence highlighting its practical relevance.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142016283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Catherine Xinrui Yu, Jiaqi Gu, Zhaomeng Chen, Zihuai He
Testing multiple hypotheses of conditional independence with provable error rate control is a fundamental problem with various applications. To infer conditional independence with family-wise error rate (FWER) control when only summary statistics of marginal dependence are accessible, we adopt GhostKnockoff to directly generate knockoff copies of summary statistics and propose a new filter to select features conditionally dependent on the response. In addition, we develop a computationally efficient algorithm to greatly reduce the computational cost of knockoff copy generation without sacrificing power and FWER control. Experiments on simulated data and a real dataset of Alzheimer's disease genetics demonstrate the advantage of the proposed method over existing alternatives in both statistical power and computational efficiency.
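The core knockoff contrast can be illustrated with a toy: for each feature, compare the magnitude of its statistic against that of an exchangeable knockoff copy, and select features whose contrast is large. The simulation below fakes the knockoff copies as pure noise and uses a fixed placeholder threshold; the paper's GhostKnockoff construction from summary statistics and its data-driven FWER filter are not reproduced here.

```python
import random

random.seed(2)
n_features = 200
signal_idx = set(range(10))  # first 10 features are truly associated (toy)

# Toy Z-scores: signals have mean 5, nulls mean 0.  A knockoff copy is an
# exchangeable negative control built without looking at the response; here
# we fake it with a pure-noise draw.
z = [random.gauss(5 if j in signal_idx else 0, 1) for j in range(n_features)]
z_knock = [random.gauss(0, 1) for _ in range(n_features)]

# Knockoff contrast: for a null feature, z and its knockoff are exchangeable,
# so a large positive W_j is evidence that feature j is a real signal.
w = [abs(zj) - abs(zkj) for zj, zkj in zip(z, z_knock)]

threshold = 3.0  # placeholder; the paper derives a data-driven threshold
selected = [j for j, wj in enumerate(w) if wj >= threshold]
print(selected)
```

With these made-up effect sizes the contrast cleanly separates most signals from nulls; the methodological content of the paper is in choosing the threshold so that the FWER is provably controlled.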
{"title":"Summary statistics knockoffs inference with family-wise error rate control.","authors":"Catherine Xinrui Yu, Jiaqi Gu, Zhaomeng Chen, Zihuai He","doi":"10.1093/biomtc/ujae082","DOIUrl":"10.1093/biomtc/ujae082","url":null,"abstract":"<p><p>Testing multiple hypotheses of conditional independence with provable error rate control is a fundamental problem with various applications. To infer conditional independence with family-wise error rate (FWER) control when only summary statistics of marginal dependence are accessible, we adopt GhostKnockoff to directly generate knockoff copies of summary statistics and propose a new filter to select features conditionally dependent on the response. In addition, we develop a computationally efficient algorithm to greatly reduce the computational cost of knockoff copies generation without sacrificing power and FWER control. Experiments on simulated data and a real dataset of Alzheimer's disease genetics demonstrate the advantage of the proposed method over existing alternatives in both statistical power and computational efficiency.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367731/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142104014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Isaac H Goldstein, Daniel M Parker, Sunny Jiang, Volodymyr M Minin
Concentrations of pathogen genomes measured in wastewater have recently become available as a new data source to use when modeling the spread of infectious diseases. One promising use for this data source is inference of the effective reproduction number, the average number of individuals a newly infected person will infect. We propose a model where new infections arrive according to a time-varying immigration rate which can be interpreted as an average number of secondary infections produced by one infectious individual per unit time. This model allows us to estimate the effective reproduction number from concentrations of pathogen genomes, while avoiding difficult-to-verify assumptions about the dynamics of the susceptible population. As a byproduct of our primary goal, we also produce a new model for estimating the effective reproduction number from case data using the same framework. We test this modeling framework in an agent-based simulation study with a realistic data generating mechanism which accounts for the time-varying dynamics of pathogen shedding. Finally, we apply our new model to estimating the effective reproduction number of SARS-CoV-2, the causative agent of COVID-19, in Los Angeles, CA, using pathogen RNA concentrations collected from a large wastewater treatment facility.
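To see the interpretation of a time-varying infection arrival rate, a renewal-equation toy helps: the effective reproduction number at time t is new infections divided by the generation-interval-weighted sum of past infections. All numbers below are invented, and the sketch works from case counts rather than the paper's semiparametric Bayesian model for wastewater concentrations.

```python
# Toy renewal-equation calculation of the effective reproduction number R_t:
# new infections divided by the "infectious pressure", i.e. the convolution
# of past infections with the generation-interval distribution w.
w = [0.2, 0.5, 0.3]            # generation-interval pmf over lags 1..3 (toy)
infections = [10, 12, 15, 18, 22]

def rt(t):
    pressure = sum(w[s - 1] * infections[t - s]
                   for s in range(1, len(w) + 1) if t - s >= 0)
    return infections[t] / pressure

print(round(rt(4), 2))  # 22 / (0.2*18 + 0.5*15 + 0.3*12) = 22/14.7 ≈ 1.5
```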
{"title":"Semiparametric inference of effective reproduction number dynamics from wastewater pathogen surveillance data.","authors":"Isaac H Goldstein, Daniel M Parker, Sunny Jiang, Volodymyr M Minin","doi":"10.1093/biomtc/ujae074","DOIUrl":"10.1093/biomtc/ujae074","url":null,"abstract":"<p><p>Concentrations of pathogen genomes measured in wastewater have recently become available as a new data source to use when modeling the spread of infectious diseases. One promising use for this data source is inference of the effective reproduction number, the average number of individuals a newly infected person will infect. We propose a model where new infections arrive according to a time-varying immigration rate which can be interpreted as an average number of secondary infections produced by one infectious individual per unit time. This model allows us to estimate the effective reproduction number from concentrations of pathogen genomes, while avoiding difficulty to verify assumptions about the dynamics of the susceptible population. As a byproduct of our primary goal, we also produce a new model for estimating the effective reproduction number from case data using the same framework. We test this modeling framework in an agent-based simulation study with a realistic data generating mechanism which accounts for the time-varying dynamics of pathogen shedding. 
Finally, we apply our new model to estimating the effective reproduction number of SARS-CoV-2, the causative agent of COVID-19, in Los Angeles, CA, using pathogen RNA concentrations collected from a large wastewater treatment facility.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141896690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The increasing availability and scale of biobanks and "omic" datasets bring new horizons for understanding biological mechanisms. PathGPS is an exploratory data analysis tool to discover genetic architectures using Genome Wide Association Studies (GWAS) summary data. PathGPS is based on a linear structural equation model where traits are regulated by both genetic and environmental pathways. PathGPS decouples the genetic and environmental components by contrasting the GWAS associations of "signal" genes with those of "noise" genes. From the estimated genetic component, PathGPS then extracts genetic pathways via principal component and factor analysis, leveraging the low-rank and sparse properties. In addition, we provide a bootstrap aggregating ("bagging") algorithm to improve stability under data perturbation and hyperparameter tuning. When applied to a metabolomics dataset and the UK Biobank, PathGPS confirms several known gene-trait clusters and suggests multiple new hypotheses for future investigations.
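The signal-versus-noise contrast can be sketched in a toy two-trait setting: covariance among the GWAS associations of "signal" genes mixes genetic and environmental components, while "noise" genes carry only the latter, so their difference isolates the genetic part. The scalar example below (all parameters invented) is a one-factor caricature of the matrix decomposition PathGPS performs.

```python
import random

random.seed(3)

def cov(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs) / n

# "Signal" genes carry a shared genetic factor g that moves both traits
# together; "noise" genes only reflect environmental / estimation noise.
signal = []
for _ in range(2000):
    g = random.gauss(0, 1)  # shared genetic pathway (toy one-factor model)
    signal.append((g + random.gauss(0, 0.5), g + random.gauss(0, 0.5)))
noise = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(2000)]

# Contrasting the two gene sets removes the non-genetic component,
# leaving approximately Var(g) = 1.
genetic_cov = cov(signal) - cov(noise)
print(round(genetic_cov, 1))
```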
{"title":"PathGPS: discover shared genetic architecture using GWAS summary data.","authors":"Zijun Gao, Qingyuan Zhao, Trevor Hastie","doi":"10.1093/biomtc/ujae060","DOIUrl":"10.1093/biomtc/ujae060","url":null,"abstract":"<p><p>The increasing availability and scale of biobanks and \"omic\" datasets bring new horizons for understanding biological mechanisms. PathGPS is an exploratory data analysis tool to discover genetic architectures using Genome Wide Association Studies (GWAS) summary data. PathGPS is based on a linear structural equation model where traits are regulated by both genetic and environmental pathways. PathGPS decouples the genetic and environmental components by contrasting the GWAS associations of \"signal\" genes with those of \"noise\" genes. From the estimated genetic component, PathGPS then extracts genetic pathways via principal component and factor analysis, leveraging the low-rank and sparse properties. In addition, we provide a bootstrap aggregating (\"bagging\") algorithm to improve stability under data perturbation and hyperparameter tuning. When applied to a metabolomics dataset and the UK Biobank, PathGPS confirms several known gene-trait clusters and suggests multiple new hypotheses for future investigations.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247175/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141615885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In randomized controlled trials, adjusting for baseline covariates is commonly used to improve the precision of treatment effect estimation. However, covariates often have missing values. Recently, Zhao and Ding studied two simple strategies, the single imputation method and missingness-indicator method (MIM), to handle missing covariates and showed that both methods can provide an efficiency gain compared to not adjusting for covariates. To better understand and compare these two strategies, we propose and investigate a novel theoretical imputation framework termed cross-world imputation (CWI). This framework includes both single imputation and MIM as special cases, facilitating the comparison of their efficiency. Through the lens of CWI, we show that MIM implicitly searches for the optimal CWI values and thus achieves optimal efficiency. We also derive conditions under which the single imputation method, by searching for the optimal single imputation values, can achieve the same efficiency as the MIM. We illustrate our findings through simulation studies and a real data analysis based on the Childhood Adenotonsillectomy Trial. We conclude by discussing the practical implications of our findings.
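For concreteness, the missingness-indicator method in its usual form replaces a missing covariate value with a constant and adds a binary missingness column, then adjusts for both. The snippet below only builds that design matrix from hypothetical rows; the paper's cross-world imputation analysis itself is not reproduced.

```python
# Missingness-indicator method (MIM), design-matrix construction only:
# impute a fixed value (here 0) for missing covariates and add a binary
# indicator column, then adjust for both in the outcome regression.
rows = [
    {"treat": 1, "x": 2.3},
    {"treat": 0, "x": None},   # missing baseline covariate
    {"treat": 1, "x": None},
    {"treat": 0, "x": 1.1},
]

design = []
for r in rows:
    missing = r["x"] is None
    design.append({
        "treat": r["treat"],
        "x_imputed": 0.0 if missing else r["x"],  # single-imputation part
        "x_missing": int(missing),                # missingness indicator
    })

print(design[1])  # {'treat': 0, 'x_imputed': 0.0, 'x_missing': 1}
```

In the cross-world framing of the paper, the choice of imputed value is exactly the degree of freedom that MIM implicitly optimizes over.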
{"title":"Adjusting for incomplete baseline covariates in randomized controlled trials: a cross-world imputation framework.","authors":"Yilin Song, James P Hughes, Ting Ye","doi":"10.1093/biomtc/ujae094","DOIUrl":"https://doi.org/10.1093/biomtc/ujae094","url":null,"abstract":"<p><p>In randomized controlled trials, adjusting for baseline covariates is commonly used to improve the precision of treatment effect estimation. However, covariates often have missing values. Recently, Zhao and Ding studied two simple strategies, the single imputation method and missingness-indicator method (MIM), to handle missing covariates and showed that both methods can provide an efficiency gain compared to not adjusting for covariates. To better understand and compare these two strategies, we propose and investigate a novel theoretical imputation framework termed cross-world imputation (CWI). This framework includes both single imputation and MIM as special cases, facilitating the comparison of their efficiency. Through the lens of CWI, we show that MIM implicitly searches for the optimal CWI values and thus achieves optimal efficiency. We also derive conditions under which the single imputation method, by searching for the optimal single imputation values, can achieve the same efficiency as the MIM. We illustrate our findings through simulation studies and a real data analysis based on the Childhood Adenotonsillectomy Trial. 
We conclude by discussing the practical implications of our findings.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398886/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142280079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Assessment of such multi-class classifiers often utilizes the hypervolume under ROC manifolds (HUM). When dealing with a substantial pool of potential predictors and achieving optimal HUM, it becomes imperative to conduct appropriate statistical inference. However, prevalent methodologies in existing literature are computationally expensive. We propose to use the jackknife empirical likelihood method to address this issue. The Wilks' theorem under moderate conditions is established and the power analysis under the Pitman alternative is provided. We also introduce a novel network-based rapid computation algorithm specifically designed for computing a general multi-sample $U$-statistic in our test procedure. To compare our approach against existing approaches, we conduct extensive simulations. Results demonstrate the superior performance of our method in terms of test size, power, and implementation time. Furthermore, we apply our method to analyze a real medical dataset and obtain some new findings.
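For three ordered classes, the HUM is the probability that scores from the classes fall in the correct order, and its natural estimator is a three-sample U-statistic. The brute-force O(n³) version below, on simulated scores, illustrates the estimand; the paper's jackknife empirical likelihood inference and network-based fast computation are not shown.

```python
import random

random.seed(4)

# Simulated scores for three ordinal classes (made-up separations).
x = [random.gauss(0, 1) for _ in range(30)]   # class 1
y = [random.gauss(1, 1) for _ in range(30)]   # class 2
z = [random.gauss(2, 1) for _ in range(30)]   # class 3

# Three-sample U-statistic estimate of HUM = P(X1 < X2 < X3).
hum = (sum(1 for a in x for b in y for c in z if a < b < c)
       / (len(x) * len(y) * len(z)))
print(round(hum, 3))  # well above the chance level 1/6 for ordered classes
```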
{"title":"Hypothesis tests in ordinal predictive models with optimal accuracy.","authors":"Yuyang Liu, Shan Luo, Jialiang Li","doi":"10.1093/biomtc/ujae079","DOIUrl":"https://doi.org/10.1093/biomtc/ujae079","url":null,"abstract":"<p><p>In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Assessment of such multi-class classifiers often utilizes the hypervolume under ROC manifolds (HUM). When dealing with a substantial pool of potential predictors and achieving optimal HUM, it becomes imperative to conduct appropriate statistical inference. However, prevalent methodologies in existing literature are computationally expensive. We propose to use the jackknife empirical likelihood method to address this issue. The Wilks' theorem under moderate conditions is established and the power analysis under the Pitman alternative is provided. We also introduce a novel network-based rapid computation algorithm specifically designed for computing a general multi-sample $U$-statistic in our test procedure. To compare our approach against existing approaches, we conduct extensive simulations. Results demonstrate the superior performance of our method in terms of test size, power, and implementation time. 
Furthermore, we apply our method to analyze a real medical dataset and obtain some new findings.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142016282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset.
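The geometric median itself is commonly computed with the Weiszfeld fixed-point iteration, sketched below on a toy 2-dimensional sample with one gross outlier to show the robustness that motivates its use; the paper's test statistic and wild bootstrap are not implemented here.

```python
import math

def geometric_median(points, n_iter=100):
    """Weiszfeld iteration for the point minimizing the sum of Euclidean
    distances to the data, started from the coordinate-wise mean."""
    m = [sum(p[k] for p in points) / len(points) for k in range(len(points[0]))]
    for _ in range(n_iter):
        weights, acc = 0.0, [0.0] * len(m)
        for p in points:
            d = math.dist(p, m)
            if d < 1e-12:       # iterate sits on a data point; skip its term
                continue
            weights += 1 / d
            for k in range(len(m)):
                acc[k] += p[k] / d
        m = [a / weights for a in acc]
    return m

# Robustness: one gross outlier barely moves the geometric median away from
# the unit-square cluster (the mean would be dragged to roughly (20, 20)).
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]
med = geometric_median(pts)
print([round(c, 2) for c in med])
```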
{"title":"High-dimensional multivariate analysis of variance via geometric median and bootstrapping.","authors":"Guanghui Cheng, Ruitao Lin, Liuhua Peng","doi":"10.1093/biomtc/ujae088","DOIUrl":"10.1093/biomtc/ujae088","url":null,"abstract":"<p><p>The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. 
Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11381952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142153109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This discussion provides commentary on the paper by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim entitled "LEAP: the latent exchangeability prior for borrowing information from historical data". The authors propose a novel method to bridge the incorporation of supplemental information into a study while also identifying potentially exchangeable subgroups to better facilitate information sharing. In this discussion, we highlight the potential relationship with other Bayesian model averaging approaches, such as multisource exchangeability modeling, and provide a brief numeric case study to illustrate how the concepts behind latent exchangeability prior may also improve the performance of existing methods. The results provided by Alt et al. are exciting, and we believe that the method represents a meaningful approach to more efficient information sharing.
{"title":"Discussion on \"LEAP: the latent exchangeability prior for borrowing information from historical data\" by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim.","authors":"Shannon D Thomas, Alexander M Kaizer","doi":"10.1093/biomtc/ujae086","DOIUrl":"10.1093/biomtc/ujae086","url":null,"abstract":"<p><p>This discussion provides commentary on the paper by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim entitled \"LEAP: the latent exchangeability prior for borrowing information from historical data\". The authors propose a novel method to bridge the incorporation of supplemental information into a study while also identifying potentially exchangeable subgroups to better facilitate information sharing. In this discussion, we highlight the potential relationship with other Bayesian model averaging approaches, such as multisource exchangeability modeling, and provide a brief numeric case study to illustrate how the concepts behind latent exchangeability prior may also improve the performance of existing methods. The results provided by Alt et al. are exciting, and we believe that the method represents a meaningful approach to more efficient information sharing.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11427888/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142340681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a new Bayesian nonparametric method for estimating the causal effects of mediation in the presence of a post-treatment confounder. The methodology is motivated by the Rural Lifestyle Intervention Treatment Effectiveness Trial (Rural LITE), in which there is interest in estimating causal mediation effects, but estimation is complicated by the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounder, treatment, and baseline confounders). For identifiability, we use the extended version of the standard sequential ignorability (SI) as introduced in Hong et al. along with a Gaussian copula model assumption. The observed data model and causal identification assumptions enable us to estimate and identify the causal effects of mediation, that is, the natural direct effects (NDE) and natural indirect effects (NIE). Our method enables easy computation of NIE and NDE for a subset of confounding variables and addresses missing data through data augmentation under the assumption of ignorable missingness. We conduct simulation studies to assess the performance of our proposed method. Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, finding that there was not strong evidence for the potential mediator.
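As a point of reference for the NDE/NIE estimands, in the simplest linear structural equation model (no post-treatment confounder, no treatment-mediator interaction) they reduce to products of path coefficients; the paper's Bayesian nonparametric EDPM approach exists precisely to avoid such parametric assumptions. The coefficients below are made up.

```python
# Simplest linear mediation model (toy coefficients):
#   M = a*A + e_M
#   Y = c*A + b*M + e_Y
a, b, c = 0.5, 0.8, 0.3

nie = a * b   # natural indirect effect: transmitted through the mediator
nde = c       # natural direct effect: holding the mediator at its natural value
print(nie, nde)  # 0.4 0.3
```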
{"title":"A Bayesian nonparametric approach for causal mediation with a post-treatment confounder.","authors":"Woojung Bae, Michael J Daniels, Michael G Perri","doi":"10.1093/biomtc/ujae099","DOIUrl":"10.1093/biomtc/ujae099","url":null,"abstract":"<p><p>We propose a new Bayesian nonparametric method for estimating the causal effects of mediation in the presence of a post-treatment confounder. The methodology is motivated by the Rural Lifestyle Intervention Treatment Effectiveness Trial (Rural LITE) for which there is interest in estimating causal mediation effects but is complicated by the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounder, treatment, and baseline confounders). For identifiability, we use the extended version of the standard sequential ignorability (SI) as introduced in Hong et al. along with a Gaussian copula model assumption. The observed data model and causal identification assumptions enable us to estimate and identify the causal effects of mediation, that is, the natural direct effects (NDE) and natural indirect effects (NIE). Our method enables easy computation of NIE and NDE for a subset of confounding variables and addresses missing data through data augmentation under the assumption of ignorable missingness. We conduct simulation studies to assess the performance of our proposed method. 
Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, finding that there was not strong evidence for the potential mediator.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11418020/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142280078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}