
Biometrics: Latest Articles

Leveraging independence in high-dimensional mixed linear regression.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae103
Ning Wang, Kai Deng, Qing Mai, Xin Zhang

We address the challenge of estimating regression coefficients and selecting relevant predictors in the context of mixed linear regression in high dimensions, where the number of predictors greatly exceeds the sample size. Recent advancements in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and also achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.
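The EM idea mentioned in the abstract can be illustrated with a minimal sketch. This is plain, unpenalized EM for a low-dimensional mixture of linear regressions with a shared noise variance; it is not the authors' fast group-penalized estimator. The function names `simulate_mixed_regression` and `em_mixed_regression`, the shared-variance assumption, and the fixed iteration count are all assumptions made for illustration. The simulator draws the component label without looking at the predictors, which is the independence the abstract refers to.

```python
import numpy as np

def simulate_mixed_regression(n, betas, noise_sd, rng):
    """Draw (X, y, z) where the component label z is independent of X."""
    betas = np.asarray(betas, dtype=float)          # shape (K, p)
    K, p = betas.shape
    X = rng.standard_normal((n, p))
    z = rng.integers(0, K, size=n)                  # drawn without looking at X
    y = np.einsum("ij,ij->i", X, betas[z]) + noise_sd * rng.standard_normal(n)
    return X, y, z

def em_mixed_regression(X, y, beta_init, n_iter=50):
    """Unpenalized EM for a K-component mixture of linear regressions.

    Returns the coefficients, mixing proportions, shared noise variance, and
    the observed-data log-likelihood evaluated at the start of each iteration.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    betas = np.array(beta_init, dtype=float)        # shape (K, p), copied
    K = betas.shape[0]
    pis = np.full(K, 1.0 / K)
    sigma2 = float(np.var(y))
    loglik_path = []
    for _ in range(n_iter):
        # E-step: posterior probability that each observation came from each component.
        resid = y[:, None] - X @ betas.T            # shape (n, K)
        log_dens = (-0.5 * np.log(2.0 * np.pi * sigma2)
                    - 0.5 * resid**2 / sigma2
                    + np.log(pis))
        log_norm = np.logaddexp.reduce(log_dens, axis=1)
        loglik_path.append(float(log_norm.sum()))
        resp = np.exp(log_dens - log_norm[:, None])
        # M-step: mixing proportions, weighted least squares per component, shared variance.
        pis = resp.mean(axis=0)
        for k in range(K):
            Xw = X * resp[:, k][:, None]
            betas[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        resid = y[:, None] - X @ betas.T
        sigma2 = float((resp * resid**2).sum() / n)
    return betas, pis, sigma2, loglik_path
```

In the high-dimensional setting of the paper, the weighted least-squares step would be replaced by a penalized update; this sketch only shows the E-step/M-step alternation.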

Citations: 0
LEAP: the latent exchangeability prior for borrowing information from historical data.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae083
Ethan M Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, Hong Amy Xia, Joseph G Ibrahim

It is becoming increasingly popular to elicit informative priors on the basis of historical data. Popular existing priors, including the power prior, commensurate prior, and robust meta-analytic predictive prior, provide blanket discounting. Thus, if only a subset of participants in the historical data are exchangeable with the current data, these priors may not be appropriate. In order to combat this issue, propensity score approaches have been proposed. However, these approaches are only concerned with the covariate distribution, whereas exchangeability is typically assessed with parameters pertaining to the outcome. In this paper, we introduce the latent exchangeability prior (LEAP), where observations in the historical data are classified into exchangeable and non-exchangeable groups. The LEAP discounts the historical data by identifying the most relevant subjects from the historical data. We compare our proposed approach against alternative approaches in simulations and present a case study using our proposed prior to augment a control arm in a phase 3 clinical trial in plaque psoriasis with an unbalanced randomization scheme.
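To make "blanket discounting" concrete, here is a minimal sketch of the power prior in the simplest conjugate case, a binomial response rate with a Beta initial prior. It is not the LEAP. The function names, the Beta(1, 1) default, and the example counts are assumptions for illustration. The historical likelihood is raised to a power `a0` in [0, 1], so every historical participant is down-weighted by the same factor regardless of how exchangeable they are with the current data.

```python
def power_prior_posterior(y, n, y0, n0, a0, a=1.0, b=1.0):
    """Beta posterior parameters for a binomial rate under a power prior.

    y, n   : current successes and sample size
    y0, n0 : historical successes and sample size
    a0     : discounting power in [0, 1] applied to the whole historical likelihood
    a, b   : parameters of the initial Beta prior (assumed Beta(1, 1) by default)
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    alpha = a + y + a0 * y0
    beta = b + (n - y) + a0 * (n0 - y0)
    return alpha, beta

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical counts: 12/40 responders now, 45/100 historically.
for a0 in (0.0, 0.5, 1.0):
    alpha, beta = power_prior_posterior(12, 40, 45, 100, a0)
    print(a0, alpha, beta, round(beta_mean(alpha, beta), 3))
```

With `a0 = 0` the historical data are ignored and with `a0 = 1` they are pooled; no value of `a0` can keep one subset of historical participants while dropping another, which is the limitation the abstract points to.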

Citations: 0
Nonparametric worst-case bounds for publication bias on the summary receiver operating characteristic curve.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae080
Yi Zhou, Ao Huang, Satoshi Hattori

The summary receiver operating characteristic (SROC) curve has been recommended as one important meta-analytical summary to represent the accuracy of a diagnostic test in the presence of heterogeneous cutoff values. However, selective publication of diagnostic studies for meta-analysis can induce publication bias (PB) on the estimate of the SROC curve. Several sensitivity analysis methods have been developed to quantify PB on the SROC curve, and all these methods utilize parametric selection functions to model the selective publication mechanism. The main contribution of this article is to propose a new sensitivity analysis approach that derives the worst-case bounds for the SROC curve by adopting nonparametric selection functions under minimal assumptions. The estimation procedures of the worst-case bounds use the Monte Carlo method to approximate the bias on the SROC curves along with the corresponding area under the curves, and then the maximum and minimum values of PB under a range of marginal selection probabilities are optimized by nonlinear programming. We apply the proposed method to real-world meta-analyses to show that the worst-case bounds of the SROC curves can provide useful insights for discussing the robustness of meta-analytical findings on diagnostic test accuracy.

Citations: 0
Bayesian inference for multivariate probit model with latent envelope.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae059
Kwangmin Lee, Yeonhee Park

The response envelope model proposed by Cook et al. (2010) is an efficient method to estimate the regression coefficient under the context of the multivariate linear regression model. It improves estimation efficiency by identifying material and immaterial parts of responses and removing the immaterial variation. The response envelope model has been investigated only for continuous response variables. In this paper, we propose the multivariate probit model with latent envelope, in short, the probit envelope model, as a response envelope model for multivariate binary response variables. The probit envelope model takes into account relations between Gaussian latent variables of the multivariate probit model by using the idea of the response envelope model. We address the identifiability of the probit envelope model by employing the essential identifiability concept and suggest a Bayesian method for the parameter estimation. We illustrate the probit envelope model via simulation studies and real-data analysis. The simulation studies show that the probit envelope model has the potential to gain efficiency in estimation compared to the multivariate probit model. The real data analysis shows that the probit envelope model is useful for multi-label classification.

Citations: 0
Nonparametric receiver operating characteristic curve analysis with an imperfect gold standard.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae063
Jiarui Sun, Chao Tang, Wuxiang Xie, Xiao-Hua Zhou

This article addresses the challenge of estimating receiver operating characteristic (ROC) curves and the areas under these curves (AUC) in the context of an imperfect gold standard, a common issue in diagnostic accuracy studies. We delve into the nonparametric identification and estimation of ROC curves and AUCs when the reference standard for disease status is prone to error. Our approach hinges on the known or estimable accuracy of this imperfect reference standard and the conditional independence assumption, under which we demonstrate the identifiability of ROC curves and propose a nonparametric estimation method. In cases where the accuracy of the imperfect reference standard remains unknown, we establish that while ROC curves are unidentifiable, the sign of the difference between two AUCs is identifiable. This insight leads us to develop a hypothesis-testing method for assessing the relative superiority of AUCs. Compared to the existing methods, the proposed methods are nonparametric and therefore do not rely on parametric model assumptions. In addition, they are applicable to both the ROC/AUC analysis of continuous biomarkers and the AUC analysis of ordinal biomarkers. Our theoretical results and simulation studies validate the proposed methods, which we further illustrate through application in two real-world diagnostic studies.
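For orientation, the quantity being estimated can be sketched with the standard nonparametric (Mann-Whitney) AUC estimator. This sketch assumes the disease labels are correct, so it is the baseline that an imperfect gold standard distorts, not the authors' corrected estimator. The function name `empirical_auc` is an assumption for illustration. Ties count one half, which is what makes the same estimator usable for ordinal biomarkers.

```python
import numpy as np

def empirical_auc(scores_diseased, scores_healthy):
    """Mann-Whitney estimate of P(D > H) + 0.5 * P(D = H).

    Compares every diseased score with every healthy score; ties contribute 1/2.
    """
    d = np.asarray(scores_diseased, dtype=float)[:, None]   # column vector
    h = np.asarray(scores_healthy, dtype=float)[None, :]    # row vector
    return float(np.mean((d > h) + 0.5 * (d == h)))

# Perfectly separated scores give an AUC of 1.
print(empirical_auc([3, 4], [1, 2]))
# Identical ordinal scores in both groups give an AUC of 0.5.
print(empirical_auc([1, 2], [1, 2]))
```

When some subjects are misclassified by the reference standard, the two score samples passed to this function are contaminated mixtures, which is why additional assumptions are needed to recover the true curve.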

Citations: 0
Absolute risk from double nested case-control designs: cause-specific proportional hazards models with and without augmented estimating equations.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae062
Minjung Lee, Mitchell H Gail

We estimate relative hazards and absolute risks (or cumulative incidence or crude risk) under cause-specific proportional hazards models for competing risks from double nested case-control (DNCC) data. In the DNCC design, controls are time-matched not only to cases from the cause of primary interest, but also to cases from competing risks (the phase-two sample). Complete covariate data are available in the phase-two sample, but other cohort members only have information on survival outcomes and some covariates. Design-weighted estimators use inverse sampling probabilities computed from Samuelsen-type calculations for DNCC. To take advantage of additional information available on all cohort members, we augment the estimating equations with a term that is unbiased for zero but improves the efficiency of estimates from the cause-specific proportional hazards model. We establish the asymptotic properties of the proposed estimators, including the estimator of absolute risk, and derive consistent variance estimators. We show that augmented design-weighted estimators are more efficient than design-weighted estimators. Through simulations, we show that the proposed asymptotic methods yield nominal operating characteristics in practical sample sizes. We illustrate the methods using prostate cancer mortality data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial Study of the National Cancer Institute.

Citations: 0
Factor-augmented transformation models for interval-censored failure time data.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae078
Hongxi Li, Shuwei Li, Liuquan Sun, Xinyuan Song

Interval-censored failure time data frequently arise in various scientific studies where each subject experiences periodical examinations for the occurrence of the failure event of interest, and the failure time is only known to lie in a specific time interval. In addition, collected data may include multiple observed variables with a certain degree of correlation, leading to severe multicollinearity issues. This work proposes a factor-augmented transformation model to analyze interval-censored failure time data while reducing model dimensionality and avoiding multicollinearity elicited by multiple correlated covariates. We provide a joint modeling framework by comprising a factor analysis model to group multiple observed variables into a few latent factors and a class of semiparametric transformation models with the augmented factors to examine their and other covariate effects on the failure event. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided. An R package ICTransCFA is also available for practitioners. Data used in preparation of this article were obtained from the ADNI database.

Citations: 0
Improving prediction of linear regression models by integrating external information from heterogeneous populations: James-Stein estimators.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae072
Peisong Han, Haoyue Li, Sung Kyun Park, Bhramar Mukherjee, Jeremy M G Taylor

We consider the setting where (1) an internal study builds a linear regression model for prediction based on individual-level data, (2) some external studies have fitted similar linear regression models that use only subsets of the covariates and provide coefficient estimates for the reduced models without individual-level data, and (3) there is heterogeneity across these study populations. The goal is to integrate the external model summary information into fitting the internal model to improve prediction accuracy. We adapt the James-Stein shrinkage method to propose estimators that are no worse and are oftentimes better in the prediction mean squared error after information integration, regardless of the degree of study population heterogeneity. We conduct comprehensive simulation studies to investigate the numerical performance of the proposed estimators. We also apply the method to enhance a prediction model for patella bone lead level in terms of blood lead level and other covariates by integrating summary information from published literature.
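The shrinkage idea the abstract builds on can be sketched with the classical positive-part James-Stein estimator for a normal mean vector. This is the textbook form, not the authors' estimators for regression with external summary information. The function name, the assumption of a known common variance `sigma2`, and the use of a generic `target` vector (standing in for externally supplied information) are assumptions for illustration.

```python
import numpy as np

def james_stein_positive_part(x, target, sigma2):
    """Shrink an observed vector x toward a target vector.

    Assumes x ~ N(theta, sigma2 * I) with known sigma2 and dimension p >= 3.
    The shrinkage factor is max(0, 1 - (p - 2) * sigma2 / ||x - target||^2).
    """
    x = np.asarray(x, dtype=float)
    target = np.asarray(target, dtype=float) + np.zeros_like(x)  # broadcast to x's shape
    p = x.size
    if p < 3:
        raise ValueError("James-Stein shrinkage requires dimension p >= 3")
    diff = x - target
    dist2 = float(diff @ diff)
    if dist2 == 0.0:
        return target.copy()
    shrink = max(0.0, 1.0 - (p - 2) * sigma2 / dist2)
    return target + shrink * diff

# An estimate far from the target is shrunk only partially.
print(james_stein_positive_part([3.0, 0.0, 0.0, 0.0], np.zeros(4), 1.0))
# An estimate close to the target is pulled all the way onto it.
print(james_stein_positive_part([0.5, 0.5, 0.5, 0.5], np.zeros(4), 1.0))
```

The amount of shrinkage adapts to the distance between the internal estimate and the target: a poorly matching target is largely ignored, which is the intuition behind borrowing external information without being hurt by population heterogeneity.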

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299067/pdf/
Citations: 0
Propensity weighting plus adjustment in proportional hazards model is not doubly robust.
IF 1.4 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae069
Erin E Gabriel, Michael C Sachs, Ingeborg Waernbaum, Els Goetghebeur, Paul F Blanche, Stijn Vansteelandt, Arvid Sjölander, Thomas Scheike

Recently, it has become common for applied works to combine commonly used survival analysis modeling methods, such as the multivariable Cox model and propensity score weighting, with the intention of forming a doubly robust estimator of an exposure effect hazard ratio that is unbiased in large samples when either the Cox model or the propensity score model is correctly specified. This combination does not, in general, produce a doubly robust estimator, even after regression standardization, when there is truly a causal effect. We demonstrate via simulation this lack of double robustness for the semiparametric Cox model, the Weibull proportional hazards model, and a simple proportional hazards flexible parametric model, with both the latter models fit via maximum likelihood. We provide a novel proof that the combination of propensity score weighting and a proportional hazards survival model, fit either via full or partial likelihood, is consistent under the null of no causal effect of the exposure on the outcome under particular censoring mechanisms if either the propensity score or the outcome model is correctly specified and contains all confounders. Given our results suggesting that double robustness only exists under the null, we outline 2 simple alternative estimators that are doubly robust for the survival difference at a given time point (in the above sense), provided the censoring mechanism can be correctly modeled, and one doubly robust method of estimation for the full survival curve. We provide R code to use these estimators for estimation and inference in the supporting information.
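
As a rough illustration of what double robustness means for a difference at a fixed time point, the sketch below applies the standard augmented inverse probability weighting (AIPW) formula to a toy binary outcome, for example an indicator of surviving past a fixed time. It is not the authors' R code: censoring is ignored entirely, and the data-generating model, the function names `expit`, `aipw_difference`, and `simulate`, and all coefficients are assumptions made for illustration.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def aipw_difference(y, a, ps, m1, m0):
    """AIPW estimate of E[Y(1)] - E[Y(0)] from per-subject lists.

    y: binary outcome, a: binary exposure, ps: P(A=1 | x),
    m1/m0: outcome-model predictions of E[Y | A=1, x] and E[Y | A=0, x].
    """
    n = len(y)
    mu1 = sum(m1[i] + a[i] * (y[i] - m1[i]) / ps[i] for i in range(n)) / n
    mu0 = sum(m0[i] + (1 - a[i]) * (y[i] - m0[i]) / (1 - ps[i]) for i in range(n)) / n
    return mu1 - mu0


def simulate(n=50000, seed=1):
    """Hypothetical toy data: one confounder x, no censoring."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ps = [expit(0.5 * xi) for xi in x]      # true propensity score
    p1 = [expit(0.5 + xi) for xi in x]      # true P(Y=1 | A=1, x)
    p0 = [expit(-0.5 + xi) for xi in x]     # true P(Y=1 | A=0, x)
    a = [1 if rng.random() < p else 0 for p in ps]
    y = [1 if rng.random() < (p1[i] if a[i] else p0[i]) else 0 for i in range(n)]
    truth = sum(p1[i] - p0[i] for i in range(n)) / n  # sample-average causal difference
    return y, a, ps, p1, p0, truth


y, a, ps, p1, p0, truth = simulate()
flat = [sum(y) / len(y)] * len(y)   # deliberately wrong outcome model
half = [0.5] * len(y)               # deliberately wrong propensity model

est_ps_correct = aipw_difference(y, a, ps, flat, flat)
est_outcome_correct = aipw_difference(y, a, half, p1, p0)
print(round(truth, 3), round(est_ps_correct, 3), round(est_outcome_correct, 3))
```

In this toy setting both estimates should land close to `truth`: one uses the correct propensity score with a wrong outcome model, the other the reverse. The abstract's point is that the analogous property does not generally carry over to a weighted and adjusted proportional hazards fit of the hazard ratio.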

Citations: 0
Post-selection inference in regression models for group testing data.
IF 1.4 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae101
Qinyan Shen, Karl Gregory, Xianzheng Huang

We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. Aiming at selecting important covariates while accounting for missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Subsequent to variable selection, we make inferences on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from our extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for variable selection.
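
To see why naive inference after selection is unreliable, the toy simulation below may help. It is a sketch under strong assumptions and not the paper's method: there is no group testing, no EM, no LASSO, and no polyhedral correction. It assumes `p` independent standard-normal z-statistics under a global null, picks the one with the largest absolute value as the "selected" effect, and then tests it at the usual two-sided 5% level as if no selection had happened. The function name `naive_rejection_rate` and its parameters are hypothetical.

```python
import random


def naive_rejection_rate(p=10, reps=4000, seed=0, z_crit=1.959964):
    """Share of null simulations where the selected effect is declared significant."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # z-statistics for p candidate covariates, all with zero true effect
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # "selection": keep the covariate with the strongest apparent signal
        selected = max(z, key=abs)
        # naive test that reuses the same data and ignores the selection step
        if abs(selected) > z_crit:
            rejections += 1
    return rejections / reps


rate = naive_rejection_rate()
theory = 1 - 0.95 ** 10   # P(max |z_j| exceeds the cutoff) for 10 independent nulls
print(rate, theory)
```

Under these assumptions the naive test rejects far more often than the nominal 5%, which is the kind of distortion that post-selection inference methods are designed to correct.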

Citations: 0