In randomized controlled trials, adjusting for baseline covariates is commonly used to improve the precision of treatment effect estimation. However, covariates often have missing values. Recently, Zhao and Ding studied two simple strategies, the single imputation method and missingness-indicator method (MIM), to handle missing covariates and showed that both methods can provide an efficiency gain compared to not adjusting for covariates. To better understand and compare these two strategies, we propose and investigate a novel theoretical imputation framework termed cross-world imputation (CWI). This framework includes both single imputation and MIM as special cases, facilitating the comparison of their efficiency. Through the lens of CWI, we show that MIM implicitly searches for the optimal CWI values and thus achieves optimal efficiency. We also derive conditions under which the single imputation method, by searching for the optimal single imputation values, can achieve the same efficiency as the MIM. We illustrate our findings through simulation studies and a real data analysis based on the Childhood Adenotonsillectomy Trial. We conclude by discussing the practical implications of our findings.
"Adjusting for incomplete baseline covariates in randomized controlled trials: a cross-world imputation framework." Yilin Song, James P Hughes, Ting Ye. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae094
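The missingness-indicator method (MIM) discussed above is simple to implement: fill each missing covariate value with an arbitrary constant and add the missingness indicator itself as an extra regressor in the ANCOVA model. A minimal numpy sketch for a single covariate (the function name and defaults are illustrative, not the authors' code):

```python
import numpy as np

def mim_adjusted_estimate(y, t, x, impute_value=0.0):
    """ANCOVA-style treatment-effect estimate using the
    missingness-indicator method: replace each missing covariate
    value with a fixed constant and add the missingness indicator
    itself as an additional regressor."""
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    x_imp = np.where(miss, impute_value, x)
    # design matrix: intercept, treatment, imputed covariate, indicator
    X = np.column_stack([np.ones_like(x_imp), t, x_imp, miss.astype(float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # coefficient on treatment
```

Because treatment is randomized, the choice of `impute_value` affects efficiency but not consistency — which is precisely why the paper can compare single imputation and MIM purely on efficiency grounds.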
In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Assessment of such multi-class classifiers often utilizes the hypervolume under ROC manifolds (HUM). When dealing with a substantial pool of potential predictors, appropriate statistical inference on the optimal HUM becomes imperative. However, prevalent methodologies in the existing literature are computationally expensive. We propose to use the jackknife empirical likelihood method to address this issue. Wilks' theorem is established under moderate conditions, and a power analysis under Pitman alternatives is provided. We also introduce a novel network-based rapid computation algorithm specifically designed for computing a general multi-sample $U$-statistic in our test procedure. To compare our approach against existing approaches, we conduct extensive simulations. Results demonstrate the superior performance of our method in terms of test size, power, and implementation time. Furthermore, we apply our method to analyze a real medical dataset and obtain some new findings.
"Hypothesis tests in ordinal predictive models with optimal accuracy." Yuyang Liu, Shan Luo, Jialiang Li. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae079
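For three ordered classes, the HUM targeted by the test is the probability that scores from the three classes fall in the correct order, and its empirical version is exactly a three-sample U-statistic. A brute-force sketch for intuition (the paper's network-based fast algorithm is not reproduced here; `hum_three_class` is an illustrative name):

```python
import itertools

def hum_three_class(s1, s2, s3):
    """Empirical hypervolume under the ROC manifold (HUM) for three
    ordered classes: the fraction of cross-class score triples that
    appear in the correct order s1 < s2 < s3 -- a three-sample
    U-statistic, computed here by brute force."""
    correct = sum(a < b < c for a, b, c in itertools.product(s1, s2, s3))
    return correct / (len(s1) * len(s2) * len(s3))
```

Brute force costs O(n1·n2·n3), which is exactly the bottleneck the paper's rapid computation algorithm is designed to avoid.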
The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset.
"High-dimensional multivariate analysis of variance via geometric median and bootstrapping." Guanghui Cheng, Ruitao Lin, Liuhua Peng. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae088
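The geometric median underlying the test statistic is commonly computed with Weiszfeld's algorithm, an iteratively re-weighted mean. A minimal sketch (the tolerance and the division guard are illustrative choices, not tied to the paper):

```python
import numpy as np

def geometric_median(X, tol=1e-8, max_iter=500):
    """Weiszfeld's algorithm: an iteratively re-weighted mean that
    converges to the point minimizing the sum of Euclidean
    distances to the rows of X."""
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)                     # start at the sample mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.where(d < tol, tol, d)      # guard against division by zero
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

Unlike the coordinate-wise median, this estimator is equivariant under rotations, which is part of what makes it attractive for multivariate location.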
We propose a new Bayesian nonparametric method for estimating the causal effects of mediation in the presence of a post-treatment confounder. The methodology is motivated by the Rural Lifestyle Intervention Treatment Effectiveness Trial (Rural LITE), in which there is interest in estimating causal mediation effects, but estimation is complicated by the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounder, treatment, and baseline confounders). For identifiability, we use the extended version of the standard sequential ignorability (SI) as introduced in Hong et al. along with a Gaussian copula model assumption. The observed data model and causal identification assumptions enable us to estimate and identify the causal effects of mediation, that is, the natural direct effects (NDE) and natural indirect effects (NIE). Our method enables easy computation of NIE and NDE for a subset of confounding variables and addresses missing data through data augmentation under the assumption of ignorable missingness. We conduct simulation studies to assess the performance of our proposed method. Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, finding that there was not strong evidence for the potential mediator.
"A Bayesian nonparametric approach for causal mediation with a post-treatment confounder." Woojung Bae, Michael J Daniels, Michael G Perri. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae099
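For intuition about the NDE/NIE estimands, the classical all-linear special case under sequential ignorability (and no post-treatment confounder) reduces to the product-of-coefficients method: NIE = a·b and NDE = c from the regressions M ~ T and Y ~ T + M. This plug-in sketch illustrates the estimands only — it is not the paper's Bayesian nonparametric procedure, and the function name is illustrative:

```python
import numpy as np

def mediation_product_method(t, m, y):
    """Plug-in natural direct/indirect effects in the classical
    linear-model special case: fit M ~ T to get slope a, fit
    Y ~ T + M to get slopes c (treatment) and b (mediator),
    then NDE = c and NIE = a * b."""
    t, m, y = (np.asarray(v, dtype=float) for v in (t, m, y))
    Xm = np.column_stack([np.ones_like(t), t])
    a = np.linalg.lstsq(Xm, m, rcond=None)[0][1]
    Xy = np.column_stack([np.ones_like(t), t, m])
    coef = np.linalg.lstsq(Xy, y, rcond=None)[0]
    c, b = coef[1], coef[2]
    return {"NDE": c, "NIE": a * b}
```

A post-treatment confounder of the mediator-outcome relation breaks this simple decomposition, which is exactly the complication the EDPM approach is built to handle.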
Alex Stringer, Tugba Akkaya Hocagil, Richard J Cook, Louise M Ryan, Sandra W Jacobson, Joseph L Jacobson
Benchmark dose analysis aims to estimate the level of exposure to a toxin associated with a clinically significant adverse outcome and quantifies uncertainty using the lower limit of a confidence interval for this level. We develop a novel framework for benchmark dose analysis based on monotone additive dose-response models. We first introduce a flexible approach for fitting monotone additive models via penalized B-splines and Laplace-approximate marginal likelihood. A reflective Newton method is then developed that employs de Boor's algorithm for computing splines and their derivatives for efficient estimation of the benchmark dose. Finally, we develop a novel approach for calculating benchmark dose lower limits based on an approximate pivot for the nonlinear equation solved by the estimated benchmark dose. The favorable properties of this approach compared to the Delta method and a parametric bootstrap are discussed. We apply the new methods to make inferences about the level of prenatal alcohol exposure associated with clinically significant cognitive defects in children using data from six NIH-funded longitudinal cohort studies. Software to reproduce the results in this paper is available online and makes use of the novel semibmd R package, which implements the methods in this paper.
"Semi-parametric benchmark dose analysis with monotone additive models." Alex Stringer, Tugba Akkaya Hocagil, Richard J Cook, Louise M Ryan, Sandra W Jacobson, Joseph L Jacobson. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae098
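The benchmark dose itself is the exposure at which a monotone dose-response fit first exceeds the baseline response by a chosen benchmark response (BMR). A crude sketch of that inversion step, using a pool-adjacent-violators fit in place of the paper's penalized B-splines (both function names are illustrative):

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators algorithm: non-decreasing (isotonic)
    least-squares fit to the sequence y."""
    out = []  # list of [block mean, block size]
    for v in map(float, y):
        out.append([v, 1])
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, w2 = out.pop()
            m1, w1 = out.pop()
            out.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    return np.repeat([m for m, _ in out], [w for _, w in out])

def benchmark_dose(dose, resp, bmr):
    """Smallest dose at which the monotone fit exceeds the baseline
    (lowest-dose) response by the benchmark response bmr, found by
    linear interpolation along the fitted curve."""
    fit = pava(resp)
    return float(np.interp(fit[0] + bmr, fit, dose))
```

The paper's contribution lies in the smooth monotone fit and in the confidence lower limit for this dose, neither of which the sketch attempts.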
We address the challenge of estimating regression coefficients and selecting relevant predictors in the context of mixed linear regression in high dimensions, where the number of predictors greatly exceeds the sample size. Recent advancements in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and also achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.
"Leveraging independence in high-dimensional mixed linear regression." Ning Wang, Kai Deng, Qing Mai, Xin Zhang. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae103
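The EM iteration for a mixture of linear regressions alternates responsibilities (E-step) with weighted least squares per component (M-step). A minimal two-component, unpenalized sketch — the paper's group penalty and independence-based refinements are omitted, and initialization is supplied by the caller, since symmetric mixtures are sensitive to starting values:

```python
import numpy as np

def mixreg_em(X, y, init_betas, n_iter=100):
    """Plain EM for a two-component mixture of linear regressions
    with a shared noise scale: the E-step computes component
    responsibilities, the M-step solves a weighted least-squares
    problem for each component."""
    n, p = X.shape
    betas = np.array(init_betas, dtype=float)  # shape (2, p)
    pi, sigma = 0.5, y.std() + 1e-6
    for _ in range(n_iter):
        # E-step: posterior probability of component membership
        resid = y[:, None] - X @ betas.T               # n x 2
        logw = np.log([pi, 1.0 - pi]) - 0.5 * (resid / sigma) ** 2
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component
        for k in range(2):
            wk = w[:, k]
            A = X.T @ (wk[:, None] * X)
            betas[k] = np.linalg.solve(A, X.T @ (wk * y))
        pi = w[:, 0].mean()
        resid = y[:, None] - X @ betas.T
        sigma = np.sqrt((w * resid ** 2).sum() / n) + 1e-12
    return betas, pi, sigma
```

In the high-dimensional setting of the paper, the M-step would additionally apply a group penalty across components so that the same predictors are selected in every mixture component.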
Gaussian graphical models (GGMs) are useful for understanding the complex relationships between biological entities. Transfer learning can improve the estimation of GGMs in a target dataset by incorporating relevant information from related source studies. However, biomedical research often involves intrinsic and latent heterogeneity within a study, such as heterogeneous subpopulations. This heterogeneity can make it difficult to identify informative source studies or lead to negative transfer if the source study is improperly used. To address this challenge, we developed a heterogeneous latent transfer learning (Latent-TL) approach that accounts for both within-sample and between-sample heterogeneity. The idea behind this approach is to "learn from the alike" by leveraging the similarities between source and target GGMs within each subpopulation. The Latent-TL algorithm simultaneously identifies common subpopulation structures among samples and facilitates the learning of target GGMs using source samples from the same subpopulation. Through extensive simulations and real data application, we have shown that the proposed method outperforms single-site learning and standard transfer learning that ignores the latent structures. We have also demonstrated the applicability of the proposed algorithm in characterizing gene co-expression networks in breast cancer patients, where the inferred genetic networks identified many biologically meaningful gene-gene interactions.
"Heterogeneous latent transfer learning in Gaussian graphical models." Qiong Wu, Chi Wang, Yong Chen. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae096
This article addresses the challenge of estimating receiver operating characteristic (ROC) curves and the areas under these curves (AUC) in the context of an imperfect gold standard, a common issue in diagnostic accuracy studies. We delve into the nonparametric identification and estimation of ROC curves and AUCs when the reference standard for disease status is prone to error. Our approach hinges on the known or estimable accuracy of this imperfect reference standard and the conditional independence assumption, under which we demonstrate the identifiability of ROC curves and propose a nonparametric estimation method. In cases where the accuracy of the imperfect reference standard remains unknown, we establish that while ROC curves are unidentifiable, the sign of the difference between two AUCs is identifiable. This insight leads us to develop a hypothesis-testing method for assessing the relative superiority of AUCs. Unlike existing methods, the proposed methods are nonparametric and therefore do not rely on parametric model assumptions. In addition, they are applicable to both the ROC/AUC analysis of continuous biomarkers and the AUC analysis of ordinal biomarkers. Our theoretical results and simulation studies validate the proposed methods, which we further illustrate through application in two real-world diagnostic studies.
"Nonparametric receiver operating characteristic curve analysis with an imperfect gold standard." Jiarui Sun, Chao Tang, Wuxiang Xie, Xiao-Hua Zhou. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae063
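The nonparametric AUC estimator that such analyses build on is the two-sample Mann-Whitney U-statistic: the proportion of diseased/non-diseased pairs ranked correctly, with ties counted as one half. A minimal sketch (this is the standard estimator with a perfect reference, shown for background — not the paper's correction for an imperfect one):

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """Nonparametric AUC: the probability that a randomly chosen
    diseased subject scores higher than a randomly chosen
    non-diseased one, with ties counted as one half
    (the Mann-Whitney two-sample U-statistic)."""
    n_pairs = len(scores_pos) * len(scores_neg)
    wins = sum((p > q) + 0.5 * (p == q)
               for p in scores_pos for q in scores_neg)
    return wins / n_pairs
```

When the reference standard itself misclassifies subjects, the two groups are contaminated mixtures, which is why the observed AUC must be de-biased — the problem the paper solves.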
The response envelope model proposed by Cook et al. (2010) is an efficient method for estimating regression coefficients in the context of the multivariate linear regression model. It improves estimation efficiency by identifying material and immaterial parts of responses and removing the immaterial variation. The response envelope model has been investigated only for continuous response variables. In this paper, we propose the multivariate probit model with latent envelope, in short, the probit envelope model, as a response envelope model for multivariate binary response variables. The probit envelope model takes into account relations between Gaussian latent variables of the multivariate probit model by using the idea of the response envelope model. We address the identifiability of the probit envelope model by employing the essential identifiability concept and suggest a Bayesian method for parameter estimation. We illustrate the probit envelope model via simulation studies and real-data analysis. The simulation studies show that the probit envelope model has the potential to gain efficiency in estimation compared to the multivariate probit model. The real data analysis shows that the probit envelope model is useful for multi-label classification.
"Bayesian inference for multivariate probit model with latent envelope." Kwangmin Lee, Yeonhee Park. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae059
The summary receiver operating characteristic (SROC) curve has been recommended as one important meta-analytical summary to represent the accuracy of a diagnostic test in the presence of heterogeneous cutoff values. However, selective publication of diagnostic studies for meta-analysis can induce publication bias (PB) on the estimate of the SROC curve. Several sensitivity analysis methods have been developed to quantify PB on the SROC curve, and all these methods utilize parametric selection functions to model the selective publication mechanism. The main contribution of this article is to propose a new sensitivity analysis approach that derives the worst-case bounds for the SROC curve by adopting nonparametric selection functions under minimal assumptions. The estimation procedures of the worst-case bounds use the Monte Carlo method to approximate the bias on the SROC curves along with the corresponding area under the curves, and then the maximum and minimum values of PB under a range of marginal selection probabilities are optimized by nonlinear programming. We apply the proposed method to real-world meta-analyses to show that the worst-case bounds of the SROC curves can provide useful insights for discussing the robustness of meta-analytical findings on diagnostic test accuracy.
"Nonparametric worst-case bounds for publication bias on the summary receiver operating characteristic curve." Yi Zhou, Ao Huang, Satoshi Hattori. Biometrics, 2024-07-01. https://doi.org/10.1093/biomtc/ujae080