Biometrics: Latest Articles

Adjusting for incomplete baseline covariates in randomized controlled trials: a cross-world imputation framework.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae094
Yilin Song, James P Hughes, Ting Ye

In randomized controlled trials, adjusting for baseline covariates is commonly used to improve the precision of treatment effect estimation. However, covariates often have missing values. Recently, Zhao and Ding studied two simple strategies, the single imputation method and missingness-indicator method (MIM), to handle missing covariates and showed that both methods can provide an efficiency gain compared to not adjusting for covariates. To better understand and compare these two strategies, we propose and investigate a novel theoretical imputation framework termed cross-world imputation (CWI). This framework includes both single imputation and MIM as special cases, facilitating the comparison of their efficiency. Through the lens of CWI, we show that MIM implicitly searches for the optimal CWI values and thus achieves optimal efficiency. We also derive conditions under which the single imputation method, by searching for the optimal single imputation values, can achieve the same efficiency as the MIM. We illustrate our findings through simulation studies and a real data analysis based on the Childhood Adenotonsillectomy Trial. We conclude by discussing the practical implications of our findings.
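
As a rough illustration of the two strategies named in the abstract, the sketch below fills missing covariate values with a constant (single imputation) and adds a 0/1 missingness indicator (MIM) before an ordinary least-squares adjustment. The simulated data, the function names `mim_design` and `ols_treatment_effect`, and the simple additive working model are all assumptions made for illustration; this is not the authors' estimator and it does not implement the cross-world imputation framework.

```python
import numpy as np

def mim_design(treat, x):
    """Build a design matrix using the missingness-indicator method.

    Missing covariate entries (NaN) are filled with the constant 0 and an
    extra 0/1 column flags which entries were missing.
    Columns: intercept, treatment, filled covariate, missingness indicator.
    """
    missing = np.isnan(x).astype(float)
    x_filled = np.where(np.isnan(x), 0.0, x)
    return np.column_stack([np.ones_like(x_filled), treat, x_filled, missing])

def ols_treatment_effect(design, y):
    """Return the least-squares coefficient on the treatment column (column 1)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Hypothetical simulated trial: true treatment effect is 1.0.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
treat = rng.integers(0, 2, size=n).astype(float)
y = 1.0 * treat + 2.0 * x + rng.normal(size=n)

# Make about 30% of the covariate values missing completely at random.
x_obs = x.copy()
x_obs[rng.random(n) < 0.3] = np.nan

design = mim_design(treat, x_obs)
tau_hat = ols_treatment_effect(design, y)
print(f"adjusted treatment effect estimate: {tau_hat:.3f}")
```

Dropping the last column of `design` leaves constant single imputation on its own, which makes it easy to compare the two adjustments on the same simulated data.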

Hypothesis tests in ordinal predictive models with optimal accuracy.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae079
Yuyang Liu, Shan Luo, Jialiang Li

In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Assessment of such multi-class classifiers often utilizes the hypervolume under ROC manifolds (HUM). When dealing with a substantial pool of potential predictors and achieving optimal HUM, it becomes imperative to conduct appropriate statistical inference. However, prevalent methodologies in existing literature are computationally expensive. We propose to use the jackknife empirical likelihood method to address this issue. The Wilks' theorem under moderate conditions is established and the power analysis under the Pitman alternative is provided. We also introduce a novel network-based rapid computation algorithm specifically designed for computing a general multi-sample $U$-statistic in our test procedure. To compare our approach against existing approaches, we conduct extensive simulations. Results demonstrate the superior performance of our method in terms of test size, power, and implementation time. Furthermore, we apply our method to analyze a real medical dataset and obtain some new findings.

High-dimensional multivariate analysis of variance via geometric median and bootstrapping.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae088
Guanghui Cheng, Ruitao Lin, Liuhua Peng

The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset.
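
To make the central quantity concrete, the sketch below computes a geometric median, the point minimizing the sum of Euclidean distances to the data, using Weiszfeld's iteration. The choice of Weiszfeld's algorithm, the function name `geometric_median`, and the tolerance settings are assumptions for illustration; the abstract does not say how the authors compute the median, and the MANOVA statistic and wild bootstrap are not implemented here.

```python
import numpy as np

def geometric_median(points, tol=1e-8, max_iter=1000):
    """Approximate the geometric median of the rows of `points`.

    Uses Weiszfeld's fixed-point iteration: each step is a weighted mean of
    the points with weights inversely proportional to their distance from
    the current estimate.
    """
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(max_iter):
        dist = np.linalg.norm(points - estimate, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        weights = 1.0 / dist
        new_estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate

# Three clustered points plus one far outlier.
data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
print("mean:            ", data.mean(axis=0))
print("geometric median:", geometric_median(data))
```

In this toy example the mean is pulled far toward the outlier while the geometric median stays inside the cluster, which is the robustness property the abstract relies on.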

A Bayesian nonparametric approach for causal mediation with a post-treatment confounder.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae099
Woojung Bae, Michael J Daniels, Michael G Perri

We propose a new Bayesian nonparametric method for estimating the causal effects of mediation in the presence of a post-treatment confounder. The methodology is motivated by the Rural Lifestyle Intervention Treatment Effectiveness Trial (Rural LITE), in which estimating causal mediation effects is of interest but is complicated by the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounder, treatment, and baseline confounders). For identifiability, we use the extended version of the standard sequential ignorability (SI) assumption introduced in Hong et al., along with a Gaussian copula model assumption. The observed data model and causal identification assumptions enable us to identify and estimate the causal effects of mediation, that is, the natural direct effects (NDE) and natural indirect effects (NIE). Our method enables easy computation of NIE and NDE for a subset of confounding variables and addresses missing data through data augmentation under the assumption of ignorable missingness. We conduct simulation studies to assess the performance of our proposed method. Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, finding no strong evidence for the potential mediator.

Semi-parametric benchmark dose analysis with monotone additive models.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae098
Alex Stringer, Tugba Akkaya Hocagil, Richard J Cook, Louise M Ryan, Sandra W Jacobson, Joseph L Jacobson

Benchmark dose analysis aims to estimate the level of exposure to a toxin associated with a clinically significant adverse outcome and quantifies uncertainty using the lower limit of a confidence interval for this level. We develop a novel framework for benchmark dose analysis based on monotone additive dose-response models. We first introduce a flexible approach for fitting monotone additive models via penalized B-splines and Laplace-approximate marginal likelihood. A reflective Newton method is then developed that employs de Boor's algorithm for computing splines and their derivatives for efficient estimation of the benchmark dose. Finally, we develop a novel approach for calculating benchmark dose lower limits based on an approximate pivot for the nonlinear equation solved by the estimated benchmark dose. The favorable properties of this approach compared to the Delta method and a parametric bootstrap are discussed. We apply the new methods to make inferences about the level of prenatal alcohol exposure associated with clinically significant cognitive defects in children using data from six NIH-funded longitudinal cohort studies. Software to reproduce the results in this paper is available online and makes use of the novel semibmd R package, which implements the methods in this paper.

Leveraging independence in high-dimensional mixed linear regression.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae103
Ning Wang, Kai Deng, Qing Mai, Xin Zhang

We address the challenge of estimating regression coefficients and selecting relevant predictors in the context of mixed linear regression in high dimensions, where the number of predictors greatly exceeds the sample size. Recent advancements in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and also achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.

Heterogeneous latent transfer learning in Gaussian graphical models.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae096
Qiong Wu, Chi Wang, Yong Chen

Gaussian graphical models (GGMs) are useful for understanding the complex relationships between biological entities. Transfer learning can improve the estimation of GGMs in a target dataset by incorporating relevant information from related source studies. However, biomedical research often involves intrinsic and latent heterogeneity within a study, such as heterogeneous subpopulations. This heterogeneity can make it difficult to identify informative source studies or lead to negative transfer if the source study is improperly used. To address this challenge, we developed a heterogeneous latent transfer learning (Latent-TL) approach that accounts for both within-sample and between-sample heterogeneity. The idea behind this approach is to "learn from the alike" by leveraging the similarities between source and target GGMs within each subpopulation. The Latent-TL algorithm simultaneously identifies common subpopulation structures among samples and facilitates the learning of target GGMs using source samples from the same subpopulation. Through extensive simulations and real data application, we have shown that the proposed method outperforms single-site learning and standard transfer learning that ignores the latent structures. We have also demonstrated the applicability of the proposed algorithm in characterizing gene co-expression networks in breast cancer patients, where the inferred genetic networks identified many biologically meaningful gene-gene interactions.

Nonparametric receiver operating characteristic curve analysis with an imperfect gold standard.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae063
Jiarui Sun, Chao Tang, Wuxiang Xie, Xiao-Hua Zhou

This article addresses the challenge of estimating receiver operating characteristic (ROC) curves and the areas under these curves (AUC) in the context of an imperfect gold standard, a common issue in diagnostic accuracy studies. We delve into the nonparametric identification and estimation of ROC curves and AUCs when the reference standard for disease status is prone to error. Our approach hinges on the known or estimable accuracy of this imperfect reference standard and the conditional independence assumption, under which we demonstrate the identifiability of ROC curves and propose a nonparametric estimation method. In cases where the accuracy of the imperfect reference standard remains unknown, we establish that while ROC curves are unidentifiable, the sign of the difference between two AUCs is identifiable. This insight leads us to develop a hypothesis-testing method for assessing the relative superiority of AUCs. Unlike existing methods, the proposed methods are nonparametric and therefore do not rely on parametric model assumptions. In addition, they are applicable to both the ROC/AUC analysis of continuous biomarkers and the AUC analysis of ordinal biomarkers. Our theoretical results and simulation studies validate the proposed methods, which we further illustrate through application in two real-world diagnostic studies.
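
For orientation, the sketch below computes the usual nonparametric (Mann-Whitney) AUC estimate that applies when the reference standard is taken at face value. It is only the naive baseline, under the assumption of a perfect gold standard; it does not implement the paper's correction for an imperfect reference, and the function name `empirical_auc` and the toy data are hypothetical.

```python
import numpy as np

def empirical_auc(diseased, healthy):
    """Empirical AUC: the fraction of (diseased, healthy) pairs in which the
    diseased subject has the higher biomarker value, counting ties as 1/2."""
    d = np.asarray(diseased, dtype=float)[:, None]  # column vector
    h = np.asarray(healthy, dtype=float)[None, :]   # row vector
    return float(np.mean((d > h) + 0.5 * (d == h)))

# Toy ordinal biomarker scores, labeled by a (possibly error-prone) reference.
diseased_scores = [3, 4, 4, 5]
healthy_scores = [1, 2, 3, 4]
print("naive AUC:", empirical_auc(diseased_scores, healthy_scores))
```

Because ties contribute one half, the same function handles both continuous and ordinal biomarkers, the two settings the abstract mentions.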

{"title":"Nonparametric receiver operating characteristic curve analysis with an imperfect gold standard.","authors":"Jiarui Sun, Chao Tang, Wuxiang Xie, Xiao-Hua Zhou","doi":"10.1093/biomtc/ujae063","DOIUrl":"https://doi.org/10.1093/biomtc/ujae063","url":null,"abstract":"<p><p>This article addresses the challenge of estimating receiver operating characteristic (ROC) curves and the areas under these curves (AUC) in the context of an imperfect gold standard, a common issue in diagnostic accuracy studies. We delve into the nonparametric identification and estimation of ROC curves and AUCs when the reference standard for disease status is prone to error. Our approach hinges on the known or estimable accuracy of this imperfect reference standard and the conditional independent assumption, under which we demonstrate the identifiability of ROC curves and propose a nonparametric estimation method. In cases where the accuracy of the imperfect reference standard remains unknown, we establish that while ROC curves are unidentifiable, the sign of the difference between two AUCs is identifiable. This insight leads us to develop a hypothesis-testing method for assessing the relative superiority of AUCs. Compared to the existing methods, the proposed methods are nonparametric so that they do not rely on the parametric model assumptions. In addition, they are applicable to both the ROC/AUC analysis of continuous biomarkers and the AUC analysis of ordinal biomarkers. 
Our theoretical results and simulation studies validate the proposed methods, which we further illustrate through application in two real-world diagnostic studies.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141589542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bayesian inference for multivariate probit model with latent envelope.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae059
Kwangmin Lee, Yeonhee Park

The response envelope model proposed by Cook et al. (2010) is an efficient method to estimate the regression coefficient under the context of the multivariate linear regression model. It improves estimation efficiency by identifying material and immaterial parts of responses and removing the immaterial variation. The response envelope model has been investigated only for continuous response variables. In this paper, we propose the multivariate probit model with latent envelope, in short, the probit envelope model, as a response envelope model for multivariate binary response variables. The probit envelope model takes into account relations between Gaussian latent variables of the multivariate probit model by using the idea of the response envelope model. We address the identifiability of the probit envelope model by employing the essential identifiability concept and suggest a Bayesian method for the parameter estimation. We illustrate the probit envelope model via simulation studies and real-data analysis. The simulation studies show that the probit envelope model has the potential to gain efficiency in estimation compared to the multivariate probit model. The real data analysis shows that the probit envelope model is useful for multi-label classification.

Citations: 0
Nonparametric worst-case bounds for publication bias on the summary receiver operating characteristic curve.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae080
Yi Zhou, Ao Huang, Satoshi Hattori

The summary receiver operating characteristic (SROC) curve has been recommended as one important meta-analytical summary to represent the accuracy of a diagnostic test in the presence of heterogeneous cutoff values. However, selective publication of diagnostic studies for meta-analysis can induce publication bias (PB) on the estimate of the SROC curve. Several sensitivity analysis methods have been developed to quantify PB on the SROC curve, and all these methods utilize parametric selection functions to model the selective publication mechanism. The main contribution of this article is to propose a new sensitivity analysis approach that derives the worst-case bounds for the SROC curve by adopting nonparametric selection functions under minimal assumptions. The estimation procedures of the worst-case bounds use the Monte Carlo method to approximate the bias on the SROC curves along with the corresponding area under the curves, and then the maximum and minimum values of PB under a range of marginal selection probabilities are optimized by nonlinear programming. We apply the proposed method to real-world meta-analyses to show that the worst-case bounds of the SROC curves can provide useful insights for discussing the robustness of meta-analytical findings on diagnostic test accuracy.

Citations: 0