Before implementing a biomarker test for early cancer detection into routine clinical care, the test must demonstrate clinical utility; that is, the test results should lead to clinical actions that positively affect patient-relevant outcomes. Unlike therapeutic trials for patients diagnosed with cancer, designing a randomized controlled trial (RCT) to demonstrate the clinical utility of an early detection biomarker with mortality and related endpoints poses unique challenges. The hurdles stem from the prolonged natural progression of the disease and the lack of information regarding the time-varying screening effect on the target asymptomatic population. To facilitate the design of screening trials, we propose using a generic multistate disease history model and derive model-based effect sizes. The model links key performance metrics of the test, such as sensitivity, to primary endpoints like the incidence of late-stage cancer. It also incorporates the practical implementation of the biomarker-testing program in real-world scenarios. Based on a chronological time scale aligned with the RCT, our method allows the assessment of study power based on key features of the new program, including the test sensitivity, the length of follow-up, and the number and frequency of repeated tests. The calculation tool from the proposed method will enable practitioners to perform realistic and quick evaluations when strategizing screening trials for specific diseases. We use numerical examples based on the National Lung Screening Trial to demonstrate the method.
Designing cancer screening trials for reduction in late-stage cancer incidence. Kehao Zhu, Ying-Qi Zhao, Yingye Zheng. Biometrics, vol. 80, no. 3, published 2024-07-01. DOI: 10.1093/biomtc/ujae097. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413908/pdf/
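For the screening-trial design abstract by Zhu, Zhao, and Zheng, the following Python sketch shows how test sensitivity, the number of screening rounds, and the sample size might feed into a power calculation for late-stage incidence. It is not the authors' multistate model: it assumes independent test results across rounds and a hypothetical fraction `frac_detectable` of would-be late-stage cancers that pass through a screen-detectable early stage during the screening window, and then applies a standard normal-approximation two-proportion test. All function names and parameter values are illustrative.

```python
from statistics import NormalDist


def detection_probability(sensitivity: float, n_rounds: int) -> float:
    """P(a preclinical case is caught by at least one of n_rounds screens).

    Assumption: test results are independent across rounds.
    """
    return 1.0 - (1.0 - sensitivity) ** n_rounds


def late_stage_incidence_screened(p_control: float, sensitivity: float,
                                  n_rounds: int, frac_detectable: float) -> float:
    """Hypothetical effect-size model (not the paper's): a fraction
    frac_detectable of would-be late-stage cancers is screen-detectable
    during the program, and every detected case is averted as late-stage."""
    averted = frac_detectable * detection_probability(sensitivity, n_rounds)
    return p_control * (1.0 - averted)


def power_two_proportions(p0: float, p1: float, n_per_arm: int,
                          alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided test comparing cumulative
    late-stage incidence in the control (p0) and screening (p1) arms.
    The negligible opposite-tail term is ignored."""
    z = NormalDist()
    se = ((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_arm) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(p0 - p1) / se - z_crit)
```

For example, `power_two_proportions(0.01, late_stage_incidence_screened(0.01, 0.8, 3, 0.25), 25000)` evaluates a hypothetical three-round program. The approach in the paper replaces the crude `frac_detectable` assumption with model-based effect sizes that depend on the length of follow-up and the number and frequency of tests.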
The paper extends the empirical likelihood (EL) approach of Liu et al. to a new and very flexible family of latent class models for capture-recapture data that also allows for serial dependence on previous capture history, conditionally on latent type and covariates. The EL approach makes it possible to estimate the overall population size directly rather than by summing estimates conditional on covariate configurations. A Fisher-scoring algorithm for maximum likelihood estimation is proposed, and a more efficient alternative to the traditional EL approach for estimating the nonparametric component is introduced; this allows us to show that the mapping between the nonparametric distribution of the covariates and the probabilities of never being captured is one-to-one and strictly increasing. Asymptotic results are outlined, and a procedure for constructing profile likelihood confidence intervals for the population size is presented. Two examples based on real data are used to illustrate the proposed approach, and a simulation study indicates that, when estimating the overall undercount, the method proposed here is substantially more efficient than the one based on conditional maximum likelihood estimation, especially when the sample size is not sufficiently large.
Estimating the size of a closed population by modeling latent and observed heterogeneity. Francesco Bartolucci, Antonio Forcina. Biometrics, vol. 80, no. 2, published 2024-03-27. DOI: 10.1093/biomtc/ujae017.
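The Bartolucci and Forcina abstract contrasts direct EL estimation of the population size with adding up estimates conditional on covariate configurations. The sketch below shows only that conditional baseline, under stated assumptions: a latent class model without serial dependence gives each unit's probability of never being captured, and a Horvitz-Thompson-type estimator inflates each observed unit by 1/(1 - p0). It is not the authors' EL or Fisher-scoring procedure, and the function names are illustrative.

```python
from math import prod


def prob_never_captured(class_weights, capture_probs):
    """P(never captured) under a latent class model WITHOUT serial dependence
    (a simplifying assumption): sum_c pi_c * prod_t (1 - p_{c,t}).

    class_weights: latent class probabilities pi_c
    capture_probs: per-class lists of capture probabilities, one per occasion
    """
    return sum(w * prod(1.0 - p for p in probs)
               for w, probs in zip(class_weights, capture_probs))


def horvitz_thompson_size(p_never):
    """Conditional (Horvitz-Thompson-type) estimate of the population size:
    each observed unit i represents 1 / (1 - p0_i) population members,
    where p0_i is its fitted probability of never being captured."""
    return sum(1.0 / (1.0 - p0) for p0 in p_never)
```

The estimated undercount is then `horvitz_thompson_size(p_never) - len(p_never)`; in practice each `p0_i` would be evaluated at unit i's covariates.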
Charles E McCulloch, John M Neuhaus, Ross D Boylan
Statistical models incorporating cluster-specific intercepts are commonly used in hierarchical settings, for example, observations clustered within patients or patients clustered within hospitals. Predicted values of these intercepts are often used to identify or "flag" extreme or outlying clusters, such as poorly performing hospitals or patients with rapid declines in their health. We consider a variety of flagging rules, assess different predictors, and use different accuracy measures. Using theoretical calculations and comprehensive numerical evaluation, we show that previously proposed rules based on the 2 most commonly used predictors, the usual best linear unbiased predictor (BLUP) and the fixed effects predictor, perform extremely poorly: the incorrect flagging rates are either unacceptably high (approaching 0.5 in the limit) or overly conservative (e.g., much less than 0.05 for reasonable parameter values, leading to very low correct flagging rates). We develop novel methods for flagging extreme clusters that can control the incorrect flagging rates, including very simple-to-use versions that we call "self-calibrated." The new methods have substantially higher correct flagging rates than previously proposed methods for flagging extreme values, while controlling the incorrect flagging rates. We illustrate their application using data on length of stay in pediatric hospitals for children admitted with asthma diagnoses.
Flagging unusual clusters based on linear mixed models using weighted and self-calibrated predictors. Charles E McCulloch, John M Neuhaus, Ross D Boylan. Biometrics, vol. 80, no. 2, published 2024-03-27. DOI: 10.1093/biomtc/ujae022.
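To make the two predictors criticized in the McCulloch, Neuhaus, and Boylan abstract concrete, here is a minimal sketch for the random-intercept model y_ij = mu + b_i + e_ij, assuming mu and both variance components are known. It covers the fixed effects predictor and the usual BLUP only; the weighted and self-calibrated predictors proposed in the paper are not reproduced, and the function names are hypothetical.

```python
def fixed_effects_predictor(cluster_mean: float, grand_mean: float) -> float:
    """Unshrunken predictor: deviation of the cluster mean from mu."""
    return cluster_mean - grand_mean


def blup_intercept(cluster_mean: float, grand_mean: float, n_i: int,
                   var_between: float, var_within: float) -> float:
    """BLUP of the random intercept b_i with mu and variance components
    treated as known: the raw deviation is shrunk toward zero by the factor
    var_between / (var_between + var_within / n_i)."""
    shrinkage = var_between / (var_between + var_within / n_i)
    return shrinkage * fixed_effects_predictor(cluster_mean, grand_mean)
```

Because the shrinkage factor depends on the cluster size n_i, the two predictors can rank clusters differently, which is one reason flagging rules built on them behave so differently.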
Discussion on "Bayesian meta-analysis of penetrance for cancer risk" by Thanthirige Lakshika M. Ruberu, Danielle Braun, Giovanni Parmigiani, and Swati Biswas. Sudipto Banerjee. Biometrics, vol. 80, no. 2, published 2024-03-27. DOI: 10.1093/biomtc/ujae039. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11140848/pdf/
Doubly adaptive biased coin design (DBCD), a response-adaptive randomization scheme, aims to skew subject assignment probabilities based on accrued responses for ethical reasons. Recent years have seen substantial advances in understanding DBCD's theoretical properties, assuming correct model specification for the responses. However, concerns have been raised about the impact of model misspecification on its design and analysis. In this paper, we assess the robustness of DBCD to both design model misspecification and analysis model misspecification. On the one hand, we confirm that the consistency and asymptotic normality of the allocation proportions can be preserved, even when the responses follow a distribution other than the one imposed by the design model during the implementation of DBCD. On the other hand, we extensively investigate three commonly used linear regression models for estimating and inferring the treatment effect, namely difference-in-means, analysis of covariance (ANCOVA) I, and ANCOVA II. Allowing these regression models to be arbitrarily misspecified, so that they need not reflect the true data-generating process, we derive the consistency and asymptotic normality of the treatment effect estimators obtained from the three models. The asymptotic properties show that the ANCOVA II model, which takes covariate-by-treatment interaction terms into account, yields the most efficient estimator. These results can provide theoretical support for using DBCD in scenarios involving model misspecification, thereby promoting the widespread application of this randomization procedure.
Robustness of response-adaptive randomization. Xiaoqing Ye, Feifang Hu, Wei Ma. Biometrics, vol. 80, no. 2, published 2024-03-27. DOI: 10.1093/biomtc/ujae049.
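The DBCD in the Ye, Hu, and Ma abstract can be illustrated with the allocation function of Hu and Zhang, which pushes the next assignment probability toward a target proportion rho. This Python sketch assumes binary responses and uses, as one example target, the allocation sqrt(pA)/(sqrt(pA)+sqrt(pB)) commonly called RSIHR, estimated from smoothed success rates after an alternating burn-in. It illustrates the design only; the paper's robustness results and the ANCOVA analyses are not reproduced, and the burn-in length, smoothing, and seed are assumptions.

```python
import math
import random


def hu_zhang_g(x: float, rho: float, gamma: float = 2.0) -> float:
    """DBCD allocation function: probability of assigning the next subject to
    arm A, given the current proportion x on A and the estimated target rho.
    Larger gamma pulls the allocation toward rho more aggressively."""
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    a = rho * (rho / x) ** gamma
    b = (1 - rho) * ((1 - rho) / (1 - x)) ** gamma
    return a / (a + b)


def rsihr_target(p_a: float, p_b: float) -> float:
    """Example target for binary responses: sqrt(pA) / (sqrt(pA) + sqrt(pB))."""
    return math.sqrt(p_a) / (math.sqrt(p_a) + math.sqrt(p_b))


def simulate_dbcd(p_a, p_b, n, burn_in=10, gamma=2.0, seed=1):
    """Simulate one trial with true success rates p_a, p_b; return the final
    proportion of subjects assigned to arm A."""
    rng = random.Random(seed)
    n_arm = [0, 0]    # subjects assigned to A (index 0) and B (index 1)
    success = [0, 0]  # observed successes per arm
    for i in range(n):
        if i < 2 * burn_in:
            arm = i % 2  # hypothetical burn-in: alternate A, B
        else:
            # Smoothed success rates keep the estimated target inside (0, 1).
            est = [(success[k] + 0.5) / (n_arm[k] + 1) for k in (0, 1)]
            prob_a = hu_zhang_g(n_arm[0] / i, rsihr_target(*est), gamma)
            arm = 0 if rng.random() < prob_a else 1
        n_arm[arm] += 1
        success[arm] += int(rng.random() < (p_a if arm == 0 else p_b))
    return n_arm[0] / n
```

With `p_a=0.7` and `p_b=0.4` the target is about 0.57, and `simulate_dbcd(0.7, 0.4, 2000)` should return a proportion close to it.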
It is now widely accepted that mild cognitive impairment (MCI), one of the early symptoms of Alzheimer's disease (AD), may appear 10 or more years after the emergence of neuropathological abnormalities. Therefore, understanding the progression of AD biomarkers and uncovering when brain alterations begin in the preclinical stage, while patients are still cognitively normal, are crucial for effective early detection and therapeutic development. In this paper, we develop a Bayesian semiparametric framework for a heterogeneous population that jointly models the longitudinal trajectory of an AD biomarker, which has a changepoint relative to symptom onset, together with the time of symptom onset, which is subject to left truncation and right censoring. Furthermore, unlike most existing methods, which assume that everyone in the population will eventually develop the disease, our approach accounts for the possibility that some individuals may never experience MCI or AD, even after a long follow-up time. We evaluate the proposed model through simulation studies and demonstrate its clinical utility by examining an important AD biomarker, ptau181, using a dataset from the Biomarkers of Cognitive Decline Among Normal Individuals (BIOCARD) study.
A Bayesian semi-parametric model for learning biomarker trajectories and changepoints in the preclinical phase of Alzheimer's disease. Kunbo Wang, William Hua, MeiCheng Wang, Yanxun Xu. Biometrics, vol. 80, no. 2, published 2024-03-27. DOI: 10.1093/biomtc/ujae048. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11110494/pdf/
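For the Wang, Hua, Wang, and Xu abstract, the two modeling ingredients can be sketched separately: a broken-stick mean trajectory with a changepoint tau on the time-relative-to-onset scale, and a mixture cure model in which a fraction of the population never reaches symptom onset. This is a parametric simplification chosen for illustration (linear pieces, exponential onset times, hypothetical parameter names); the paper's Bayesian semiparametric joint model, including left truncation and right censoring, is not reproduced.

```python
import math


def biomarker_mean(t: float, beta0: float, slope_before: float,
                   slope_change: float, tau: float) -> float:
    """Broken-stick mean trajectory. t is time relative to symptom onset
    (negative = before onset); the slope changes by slope_change at tau."""
    return beta0 + slope_before * t + slope_change * max(0.0, t - tau)


def population_onset_survival(t: float, cure_fraction: float,
                              hazard: float) -> float:
    """Mixture cure model: P(no symptom onset by time t) when a fraction
    cure_fraction never develops symptoms and the others have an
    exponential onset time with rate `hazard` (an illustrative assumption)."""
    return cure_fraction + (1.0 - cure_fraction) * math.exp(-hazard * t)
```

As t grows, `population_onset_survival` levels off at `cure_fraction` rather than dropping to zero, which is how such a model encodes individuals who never experience MCI or AD.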
In mobile health, tailoring interventions for real-time delivery is of paramount importance. Micro-randomized trials have emerged as the "gold-standard" methodology for developing such interventions. Analyzing data from these trials provides insights into the efficacy of interventions and their potential moderation by specific covariates. The "causal excursion effect," a novel class of causal estimands, addresses these questions. Yet existing research focuses mainly on continuous or binary outcomes, leaving count outcomes largely unexplored. The current work is motivated by the Drink Less micro-randomized trial from the UK, which focuses on a zero-inflated proximal outcome, i.e., the number of screen views in the hour following the intervention decision point. Specifically, we revisit the concept of the causal excursion effect for zero-inflated count outcomes and introduce novel estimation approaches that incorporate nonparametric techniques. Bidirectional asymptotics are established for the proposed estimators. Simulation studies are conducted to evaluate the performance of the proposed methods. As an illustration, we also apply these methods to the Drink Less trial data.
Incorporating nonparametric methods for estimating causal excursion effects in mobile health with zero-inflated count outcomes. Xueqing Liu, Tianchen Qian, Lauren Bell, Bibhas Chakraborty. Biometrics, vol. 80, no. 2, published 2024-03-27. DOI: 10.1093/biomtc/ujae054.
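A simple way to see what a relative causal excursion effect measures for count outcomes, as in the Liu, Qian, Bell, and Chakraborty abstract, is an inverse-probability-weighted ratio of treated and untreated mean outcomes that uses the known micro-randomization probabilities. This sketch assumes a fully marginal effect (no moderators) on the log relative-risk scale, pooled over all participant-decision points; it is not the paper's nonparametric estimator and provides no standard errors.

```python
import math


def marginal_log_rr_excursion(actions, outcomes, rand_probs):
    """IPW estimate of a fully marginal log relative excursion effect,
    log(E[Y(1)] / E[Y(0)]), pooled over all participant-decision points.

    actions:    0/1 treatment indicators A_t
    outcomes:   proximal counts Y_{t+1}; zeros are allowed
    rand_probs: known micro-randomization probabilities P(A_t = 1)
    """
    # Sum of A * Y / p estimates (number of points) * E[Y(1)].
    treated = sum(a * y / p for a, y, p in zip(actions, outcomes, rand_probs))
    # Sum of (1 - A) * Y / (1 - p) estimates (number of points) * E[Y(0)].
    control = sum((1 - a) * y / (1 - p)
                  for a, y, p in zip(actions, outcomes, rand_probs))
    return math.log(treated / control)
```

Zero counts need no special handling in this ratio, provided the untreated outcomes are not all zero; valid inference would still require a variance estimator that accounts for repeated measurements within each participant.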
Yeseul Jeon, Won Chang, Seonghyun Jeong, Sanghoon Han, Jaewoo Park
Convolutional neural networks (CNNs) provide flexible function approximations for a wide variety of applications when the input variables are in the form of images or spatial data. Although CNNs often outperform traditional statistical models in prediction accuracy, statistical inference, such as estimating the effects of covariates and quantifying prediction uncertainty, is not trivial due to the highly complicated model structure and overparameterization. To address this challenge, we propose a new Bayesian approach that embeds CNNs within the generalized linear model (GLM) framework. We use nodes extracted from the last hidden layer of the CNN with Monte Carlo (MC) dropout as informative covariates in the GLM. This improves accuracy in prediction and in inference on regression coefficients, allowing for the interpretation of coefficients and uncertainty quantification. By fitting an ensemble of GLMs across multiple realizations from MC dropout, we can account for uncertainties in extracting the features. We apply our methods to biological and epidemiological problems that have both high-dimensional correlated inputs and vector covariates. Specifically, we consider malaria incidence data, brain tumor image data, and fMRI data. By extracting information from correlated inputs, the proposed method can provide an interpretable Bayesian analysis. Because it enables fast and accurate Bayesian inference, the algorithm can be broadly applied to image regression or correlated data analysis.
A Bayesian convolutional neural network-based generalized linear model. Yeseul Jeon, Won Chang, Seonghyun Jeong, Sanghoon Han, Jaewoo Park. Biometrics, vol. 80, no. 2, published 2024-03-27. DOI: 10.1093/biomtc/ujae057.
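The Jeon et al. abstract fits an ensemble of GLMs across MC-dropout realizations of the extracted features, but it does not spell out how those fits are combined. A natural assumption, sketched below, is to treat the pooled posterior for one coefficient as an equal-weight mixture over realizations, so that its variance adds the between-realization spread of posterior means to the average within-realization variance. The CNN and the GLM fits themselves are omitted, and the function name is hypothetical.

```python
from statistics import fmean, pvariance


def pool_dropout_posteriors(means, variances):
    """Combine per-realization posterior summaries of one GLM coefficient.

    Assumption: the pooled posterior is an equal-weight mixture over the K
    MC-dropout feature realizations. By the law of total variance, its
    variance is the mean within-realization variance plus the population
    variance of the K posterior means.
    """
    pooled_mean = fmean(means)
    pooled_var = fmean(variances) + pvariance(means)
    return pooled_mean, pooled_var
```

The second term is what carries the feature-extraction uncertainty: if every dropout realization gave the same posterior mean, pooling would add nothing to the within-fit variance.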
To leverage advances in genome-wide association studies (GWAS) and quantitative trait loci (QTL) mapping for traits and molecular phenotypes and to gain a mechanistic understanding of genetic regulation, biological researchers often investigate the expression QTLs (eQTLs) that colocalize with QTL or GWAS peaks. Our research is inspired by 2 such studies. One, in maize, aims to identify the causal single nucleotide polymorphisms that are responsible for phenotypic variation and whose effects can be explained by their impacts at the transcriptomic level. The other, in mice, focuses on uncovering the cis-driver genes that induce phenotypic changes by regulating trans-regulated genes. Both studies can be formulated as mediation problems with potentially high-dimensional exposures, confounders, and mediators that seek to estimate the overall indirect effect (IE) for each exposure. In this paper, we propose MedDiC, a novel procedure to estimate the overall IE based on the difference-in-coefficients approach. Our simulation studies find that MedDiC offers valid inference for the IE with higher power, shorter confidence intervals, and faster computing time than competing methods. We apply MedDiC to the 2 aforementioned motivating datasets and find that MedDiC yields reproducible outputs across the analysis of closely related traits, with results supported by external biological evidence. The code and additional information are available on our GitHub page (https://github.com/QiZhangStat/MedDiC).
Dissecting the colocalized GWAS and eQTLs with mediation analysis for high-dimensional exposures and confounders. Qi Zhang, Zhikai Yang, Jinliang Yang. Biometrics, vol. 80, no. 2, published 2024-03-27. DOI: 10.1093/biomtc/ujae050.
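The difference-in-coefficients idea behind MedDiC can be seen in its simplest form: one exposure, one mediator, linear models, and no confounders. The indirect effect is the total-effect slope of y on x minus the direct-effect coefficient of x after adjusting for the mediator. This sketch is that textbook identity only, with hypothetical function names; it is not MedDiC's procedure for high-dimensional exposures, mediators, and confounders.

```python
def _cov(u, v):
    """Sample covariance; the 1/(n-1) factor cancels in the slopes below."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect_diff_in_coef(x, m, y):
    """Difference-in-coefficients indirect effect for one exposure x and one
    mediator m under linear models fitted by OLS:
      total effect    c  = slope of y regressed on x
      direct effect   c' = coefficient of x when y is regressed on (x, m)
      indirect effect    = c - c'
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    c_total = sxy / sxx
    # Coefficient of x in the two-regressor OLS fit (Cramer's rule).
    c_direct = (smm * sxy - sxm * smy) / (sxx * smm - sxm ** 2)
    return c_total - c_direct
```

With x = [1, 2, 3, 4], m = 2x + [1, -1, -1, 1] = [3, 3, 5, 9] and y = x + 3m = [10, 11, 18, 31], the function returns 6.0 up to floating-point rounding, matching the product of the x-to-m slope 2 and the m-to-y coefficient 3, as expected for linear models.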
Evan Kwiatkowski, Jiawen Zhu, Xiao Li, Herbert Pang, Grazyna Lieberman, Matthew A Psioda
We develop a method for hybrid analyses that uses external controls to augment internal control arms in randomized controlled trials (RCTs), where the degree of borrowing is determined by the similarity between RCT and external control patients, to account for systematic differences (e.g., unmeasured confounders). The method is a novel extension of the power prior in which discounting weights are computed separately for each external control based on its compatibility with the randomized control data. The discounting weights are determined using the predictive distribution for the external controls, derived via the posterior distribution of the time-to-event parameters estimated from the RCT. The method is applied using a proportional hazards regression model with a piecewise constant baseline hazard. A simulation study and a real-data example are presented based on a completed trial in non-small cell lung cancer. It is shown that the case-weighted power prior provides robust inference under various forms of incompatibility between the external controls and the RCT population.
Case weighted power priors for hybrid control analyses with time-to-event data. Evan Kwiatkowski, Jiawen Zhu, Xiao Li, Herbert Pang, Grazyna Lieberman, Matthew A Psioda. Biometrics, vol. 80, no. 2, published 2024-03-27. DOI: 10.1093/biomtc/ujae019. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10968526/pdf/
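In the Kwiatkowski et al. abstract, each external control's likelihood contribution is discounted by its own weight. Under a simplification that drops covariates and places an independent Gamma prior on the hazard in each interval of a piecewise constant baseline, this discounting reduces to adding weighted event counts and weighted exposure times to the Gamma parameters, as sketched below. The weights are taken as given inputs (hypothetical values); the paper's predictive-distribution rule for computing them is not reproduced, and the function name is illustrative.

```python
def weighted_piecewise_gamma_posterior(cuts, internal, external, weights,
                                       prior_shape=1.0, prior_rate=1.0):
    """Gamma(shape, rate) posterior for each piece of a piecewise constant
    hazard, with no covariates (a simplification of the PH regression model).

    cuts:      increasing interval starts, e.g. [0, 2] -> (0, 2], (2, inf)
    internal:  list of (time, event) pairs for RCT controls, event in {0, 1}
    external:  list of (time, event) pairs for external controls
    weights:   one discounting weight in [0, 1] per external control
               (assumed given here; weight 1 is used for internal controls)
    Raising each subject's likelihood contribution to its weight means
    weighted events and weighted exposure add to the Gamma parameters.
    """
    subjects = [(t, d, 1.0) for t, d in internal]
    subjects += [(t, d, w) for (t, d), w in zip(external, weights)]
    bounds = list(cuts[1:]) + [float("inf")]
    posterior = []
    for lo, hi in zip(cuts, bounds):
        # Events falling in (lo, hi], each counted with its subject's weight.
        events = sum(w * d for t, d, w in subjects if lo < t <= hi)
        # Time each subject spends at risk within (lo, hi], weighted.
        exposure = sum(w * max(0.0, min(t, hi) - lo) for t, d, w in subjects)
        posterior.append((prior_shape + events, prior_rate + exposure))
    return posterior
```

Setting every external weight to 0 recovers the internal-data-only posterior, and setting every weight to 1 pools the two sources fully; case-specific weights sit between these extremes.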