
Biometrics: Latest Articles

Double robust conditional independence test for novel biomarkers given established risk factors with survival data.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-10-08, DOI: 10.1093/biomtc/ujaf133
Baoying Yang, Jing Qin, Jing Ning, Yukun Liu

Conditional independence is a foundational concept for understanding probabilistic relationships among variables, with broad applications in fields such as causal inference and machine learning. This study focuses on testing conditional independence, $T \perp X \mid Z$, where $T$ represents survival data possibly subject to right censoring, $Z$ represents established risk factors for $T$, and $X$ represents potential novel biomarkers. The goal is to identify novel biomarkers that offer additional merits for further risk assessment and prediction. This can be achieved by using either the partial or parametric likelihood ratio statistic to evaluate whether the coefficient vector of $X$ in the conditional model of $T$ given $(X^{\top}, Z^{\top})^{\top}$ is equal to zero. Traditional tests such as directly comparing likelihood ratios to chi-squared distributions may produce erroneous type-I error rates under model misspecification. As an alternative, we propose a resampling-based method to approximate the distribution of the likelihood ratios. A key advantage of the proposed test is its double robustness: it achieves approximately correct type-I error rates when either the conditional outcome model or the working model of $\mathrm{pr}(X \mid Z)$ is correctly specified. Additionally, machine learning techniques can be incorporated to improve test performance. Simulation studies and the application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data demonstrate the finite-sample performance of the proposed tests.
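
As a rough illustration of the classical test that this abstract contrasts with, suppose the conditional outcome model is a Cox working model. That choice is an assumption made here for concreteness; the abstract only refers to a "partial or parametric likelihood ratio statistic". The symbols $\lambda_0$, $\beta$, $\gamma$, $\ell_p$, $\Lambda_n$, and $p$ are notation introduced for this sketch and do not come from the abstract.

```latex
\begin{aligned}
\lambda(t \mid X, Z) &= \lambda_0(t)\,\exp\!\left(\beta^{\top} X + \gamma^{\top} Z\right), \qquad H_0:\ \beta = 0, \\
\Lambda_n &= 2\left\{\ell_p(\hat\beta, \hat\gamma) - \ell_p(0, \tilde\gamma)\right\}.
\end{aligned}
```

Here $\ell_p$ denotes the log partial likelihood, $(\hat\beta, \hat\gamma)$ the unrestricted maximizer, and $\tilde\gamma$ the maximizer with $\beta$ fixed at zero. The conventional calibration compares $\Lambda_n$ with a $\chi^2_p$ distribution, $p = \dim(X)$; according to the abstract, this is the step that can give wrong type-I error rates when the model is misspecified, and it is the step the resampling approximation is meant to replace.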

Citations: 0
Generalized nonparametric temporal modeling of recurrent events with application to a malaria vaccine trial.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-10-08, DOI: 10.1093/biomtc/ujaf146
Fei Heng, Yanqing Sun, Jing Xu, Peter B Gilbert

Motivated by a malaria vaccine efficacy trial, this paper investigates generalized nonparametric temporal models of intensity processes with multiple time scales. Through the choice of link functions, the proposed models encompass a wide range of models such as the multiplicative temporal intensity model and the additive temporal intensity model. A maximum likelihood estimation procedure is developed to estimate the effects of two time-scales via the local linear smoothing with double kernels. Computational algorithms are developed to facilitate applications of the proposed method. An adaptive algorithm is developed to overcome the challenges of overlapping covariates. A cross-validation bandwidth selection procedure based on the logarithm of likelihood criteria is discussed. The asymptotic properties of the proposed estimators are investigated. Our simulation study shows that the proposed methods have satisfactory finite sample performance for both the multiplicative temporal intensity model and additive temporal intensity model. The proposed methods are applied to analyze the MAL-094/MAL-095 malaria vaccine efficacy trial data to investigate how the new malaria infection risk changes over time and how a prior infection or vaccination changes the future infection risk. The proposed method provides new insight into the protective effects of the malaria vaccine against new malaria infections and how the vaccine efficacy is modified by the history of prior malaria infection over time.

Citations: 0
Deep partially linear transformation model for right-censored survival data.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-10-08, DOI: 10.1093/biomtc/ujaf126
Junkai Yin, Yue Zhang, Zhangsheng Yu

Although the Cox proportional hazards (PH) model is well established and extensively used in the analysis of survival data, the PH assumption may not always hold in practical scenarios. The class of semiparametric transformation models extends the Cox model and also includes many other survival models as special cases. This paper introduces a deep partially linear transformation model as a general and flexible regression framework for right-censored data. The proposed method is capable of avoiding the curse of dimensionality while still retaining the interpretability of some covariates of interest. We derive the overall convergence rate of the maximum likelihood estimators, the minimax lower bound of the nonparametric deep neural network estimator, and the asymptotic normality and the semiparametric efficiency of the parametric estimator. Comprehensive simulation studies demonstrate the impressive performance of the proposed estimation procedure in terms of both the estimation accuracy and the predictive power, which is further validated by an application to a real-world dataset.
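
For orientation, one common way to write a partially linear transformation model is sketched below. The notation ($H$, $\beta$, $g$, $\varepsilon$, $Z$, $X$) is assumed for this illustration and is not taken from the abstract; sign conventions also differ between papers.

```latex
H(T) = -\left\{\beta^{\top} Z + g(X)\right\} + \varepsilon
```

In this sketch $H$ is an unknown increasing function, $Z$ collects the covariates of interest whose linear effect $\beta$ stays interpretable, $g$ is an unknown function of the remaining covariates $X$ (the part a deep neural network would approximate), and $\varepsilon$ has a known distribution. Taking $\varepsilon$ to follow the extreme-value distribution recovers a proportional hazards model, and the standard logistic distribution gives a proportional odds model, which is the sense in which the transformation class extends the Cox model.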

Citations: 0
Randomized optimal selection design for dose optimization.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-10-08, DOI: 10.1093/biomtc/ujaf124
Shuqi Wang, Ying Yuan, Suyu Liu

The US Food and Drug Administration (FDA) launched Project Optimus to shift the objective of dose selection from the maximum tolerated dose to the optimal biological dose (OBD), optimizing the benefit-risk tradeoff. One approach recommended by the FDA's guidance is to conduct randomized trials comparing multiple doses. In this paper, using the selection design framework, we propose a Randomized Optimal SElection (ROSE) design, which minimizes sample size while ensuring the probability of correct selection of the OBD at pre-specified accuracy levels. The ROSE design is simple to implement, involving a straightforward comparison of the difference in response rates between two dose arms against a predetermined decision boundary. We further consider a two-stage ROSE design that allows for early selection of the OBD at the interim when there is sufficient evidence, further reducing the sample size. Simulation studies demonstrate that the ROSE design exhibits desirable operating characteristics in correctly identifying the OBD. A sample size of 15-40 patients per dosage arm typically results in a percentage of correct selection of the optimal dose ranging from 60% to 70%.
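
The decision rule described above, a comparison of the response-rate difference between two dose arms against a fixed boundary, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ROSE design itself: the direction of the rule (pick the higher dose only when its observed response rate beats the lower dose by more than the boundary), the boundary value, the true response rates, and the function names `select_dose` and `simulate_selection` are all hypothetical. In the paper the boundary is chosen to meet pre-specified accuracy levels; here it is simply passed in.

```python
import random


def select_dose(resp_low, n_low, resp_high, n_high, boundary):
    """Return "high" if the observed response-rate difference exceeds the boundary."""
    diff = resp_high / n_high - resp_low / n_low
    return "high" if diff > boundary else "low"


def simulate_selection(p_low, p_high, n_per_arm, boundary, n_sims=2000, seed=0):
    """Monte Carlo estimate of how often the rule picks the higher dose."""
    rng = random.Random(seed)
    picks_high = 0
    for _ in range(n_sims):
        # Draw binomial response counts for each arm as sums of Bernoulli trials.
        resp_low = sum(rng.random() < p_low for _ in range(n_per_arm))
        resp_high = sum(rng.random() < p_high for _ in range(n_per_arm))
        if select_dose(resp_low, n_per_arm, resp_high, n_per_arm, boundary) == "high":
            picks_high += 1
    return picks_high / n_sims


if __name__ == "__main__":
    # Hypothetical scenario: true response rates 0.2 vs 0.6, 30 patients per arm.
    print(simulate_selection(p_low=0.2, p_high=0.6, n_per_arm=30, boundary=0.15))
```

Running the simulation over a grid of sample sizes and boundaries is one way to see how the probability of picking a given dose trades off against the number of patients per arm.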

Citations: 0
Correction to: Nonparametric assessment of regimen response curve estimators.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-10-08, DOI: 10.1093/biomtc/ujaf137
Citations: 0
A semiparametric Gaussian Mixture Model with spatial dependence and its application to whole-slide image clustering analysis.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-10-08, DOI: 10.1093/biomtc/ujaf149
Baichen Yu, Jin Liu, Hansheng Wang

We develop here a semiparametric Gaussian Mixture Model (SGMM) for unsupervised learning with valuable spatial information taken into consideration. Specifically, we assume for each instance a random location. Then, conditional on this random location, we assume for the feature vector a standard Gaussian Mixture Model (GMM). The proposed SGMM allows the mixing probability to be nonparametrically related to the spatial location. Compared with a classical GMM, SGMM is considerably more flexible and allows the instances from the same class to be spatially clustered. To estimate the SGMM, novel EM algorithms are developed and rigorous asymptotic theories are established. Extensive numerical simulations are conducted to demonstrate our finite sample performance. For a real application, we apply our SGMM method to the CAMELYON16 dataset of whole-slide images for breast cancer detection. The SGMM method demonstrates outstanding clustering performance.
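
The model structure described above can be written out roughly as follows. The notation is introduced here for illustration only: $S_i$ is the random location of instance $i$, $C_i$ its latent class, $\pi_k(\cdot)$ the location-dependent mixing probability, and $\phi(\cdot; \mu_k, \Sigma_k)$ a Gaussian density. Keeping $\mu_k$ and $\Sigma_k$ free of the location is an assumption of this sketch; the abstract does not say either way.

```latex
\begin{aligned}
P(C_i = k \mid S_i = s) &= \pi_k(s), \qquad \sum_{k=1}^{K} \pi_k(s) = 1, \\
f(x \mid S_i = s) &= \sum_{k=1}^{K} \pi_k(s)\, \phi(x; \mu_k, \Sigma_k).
\end{aligned}
```

A classical GMM corresponds to constant $\pi_k(s) \equiv \pi_k$. Letting $\pi_k$ vary smoothly with $s$ is what allows instances of the same class to concentrate in particular spatial regions.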

Citations: 0
SPLasso for high-dimensional additive hazards regression with covariate measurement error.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-10-08, DOI: 10.1093/biomtc/ujaf130
Jiarui Zhang, Hongsheng Liu, Xin Chen, Jinfeng Xu

High-dimensional error-prone survival data are prevalent in biomedical studies, where numerous clinical or genetic variables are collected for risk assessment. The presence of measurement errors in covariates complicates parameter estimation and variable selection, leading to non-convex optimization challenges. We propose an error-in-variables additive hazards regression model for high-dimensional noisy survival data. By employing the nearest positive semi-definite matrix projection, we develop a fast Lasso approach (semi-definite projection Lasso, SPLasso) and its soft thresholding variant (SPLasso-T), both with theoretical guarantees. Under mild assumptions, we establish model selection consistency, oracle inequalities, and limiting distributions for these methods. Simulation studies and two real data applications demonstrate the methods' superior efficiency in handling high-dimensional data, particularly showcasing remarkable performance in scenarios with missing values, highlighting their robustness and practical utility in complex biomedical settings.
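
For readers unfamiliar with the term "soft thresholding", the standard operator is $S(x, \lambda) = \operatorname{sign}(x)\max(|x| - \lambda, 0)$. The sketch below implements only this generic operator; it is not the SPLasso-T estimator, whose exact construction is not given in the abstract. The names `soft_threshold` and `soft_threshold_vector` and the example values are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; anything within the band becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def soft_threshold_vector(coefs, threshold):
    """Apply the soft-thresholding operator to each coefficient in a list."""
    return [soft_threshold(c, threshold) for c in coefs]


if __name__ == "__main__":
    # Hypothetical coefficient vector: small entries are set exactly to zero.
    print(soft_threshold_vector([2.5, -0.2, -1.5], 0.5))
```

Small coefficients are set exactly to zero while larger ones are shrunk by the threshold, which is why the operator is commonly used for variable selection.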

Citations: 0
Bayesian scalar-on-image regression with spatial interactions for modeling Alzheimer's disease.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-10-08, DOI: 10.1093/biomtc/ujaf144
Nilanjana Chakraborty, Qi Long, Suprateek Kundu

There has been substantial progress in predictive modeling for cognitive impairment in neurodegenerative disorders such as Alzheimer's disease (AD), based on neuroimaging biomarkers. However, existing approaches typically do not incorporate heterogeneity that may potentially arise due to interactions between the spatially varying imaging features and supplementary demographic, clinical and genetic risk factors in AD. Unfortunately, ignoring such heterogeneity may potentially result in poor prediction and biased estimation. Building on existing scalar-on-image regression framework, we address this issue by incorporating spatially varying interactions between brain image and supplementary risk factors to model cognitive impairment in AD. The proposed Bayesian method tackles spatial interactions via hierarchical representation for the functional regression coefficients depending on supplementary risk factors, which is embedded in a scalar-on-function framework involving a multi-resolution wavelet decomposition. To address the curse of dimensionality, we induce simultaneous sparsity and clustering via a spike and slab mixture prior, where the slab component is characterized by a latent class distribution. We develop an efficient Markov chain Monte Carlo algorithm for posterior computation. Extensive simulations and application to the longitudinal Alzheimer's Disease Neuroimaging Initiative study illustrate significantly improved prediction of cognitive impairment in AD across multiple visits by our model in comparison with alternate approaches. The proposed approach also identifies key brain regions in AD that exhibit significant association with cognitive abilities, either directly or through interactions with risk factors.

Citations: 0
High-dimensional multi-study multi-modality covariate-augmented generalized factor model.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-07-03, DOI: 10.1093/biomtc/ujaf107
Wei Liu, Qingzhi Zhong

Latent factor models that integrate data from multiple sources/studies or modalities have garnered considerable attention across various disciplines. However, existing methods predominantly focus either on multi-study integration or multi-modality integration, rendering them insufficient for analyzing the diverse modalities measured across multiple studies. To address this limitation and cater to practical needs, we introduce a high-dimensional generalized factor model that seamlessly integrates multi-modality data from multiple studies, while also accommodating additional covariates. We conduct a thorough investigation of the identifiability conditions to enhance the model's interpretability. To tackle the complexity of high-dimensional nonlinear integration caused by 4 large latent random matrices, we utilize a variational lower bound to approximate the observed log-likelihood by employing a variational posterior distribution. By profiling the variational parameters, we establish the asymptotical properties of estimators for model parameters using M-estimation theory. Furthermore, we devise a computationally efficient variational expectation maximization (EM) algorithm to execute the estimation process and a criterion to determine the optimal number of both study-shared and study-specific factors. Extensive simulation studies and a real-world application show that the proposed method significantly outperforms existing methods in terms of estimation accuracy and computational efficiency.

Citations: 0
Model robust designs for dose-response models.
IF 1.7 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf112
Belmiro P M Duarte, Anthony C Atkinson, Nuno M C Oliveira

An optimal experimental design is a structured data collection plan aimed at maximizing the amount of information gathered. Determining an optimal experimental design, however, relies on the assumption that a predetermined model structure, relating the response and covariates, is known a priori. In practical scenarios, such as dose-response modeling, the form of the model representing the "true" relationship is frequently unknown, although there exists a finite set or pool of potential alternative models. Designing experiments based on a single model from this set may lead to inefficiency or inadequacy if the "true" model differs from that assumed when calculating the design. One approach to minimizing the impact of model uncertainty on the experimental plan is known as model robust design. In this context, we systematically address the challenge of finding approximate optimal model robust experimental designs. Our focus is on locally optimal designs, which allows some of the models in the pool to be nonlinear. We present three semidefinite programming-based formulations, each aligned with one of the classes of model robustness criteria introduced by Läuter. These formulations exploit the semidefinite representability of the robustness criteria, leading to the representation of the robust problem as a semidefinite program. To ensure comparability of information measures across various models, we employ standardized designs. To illustrate the application of our approach, we consider a dose-response study where, initially, seven models were postulated as potential candidates to describe the dose-response relationship.
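
To make the idea of scoring one design against a pool of candidate models concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: it does not solve a semidefinite program, and the three candidate models, the local parameter values `theta1` and `theta2`, and the function names are all hypothetical. The score is a weighted average of per-parameter log-determinants of the information matrices, one simple way to combine D-type criteria across models.

```python
import numpy as np

# Gradients of the mean response with respect to the model parameters.
# These three candidate models are hypothetical stand-ins for a model pool.
def linear_grad(x):
    return np.array([1.0, x])

def quadratic_grad(x):
    return np.array([1.0, x, x ** 2])

def emax_grad(x, theta1=1.0, theta2=0.5):
    # Emax model theta0 + theta1 * x / (theta2 + x); the design is "locally"
    # optimal because the gradient depends on assumed values of theta1, theta2.
    return np.array([1.0, x / (theta2 + x), -theta1 * x / (theta2 + x) ** 2])

def information_matrix(grad, doses, weights):
    """Normalized information matrix sum_i w_i f(x_i) f(x_i)^T of a design."""
    weights = np.asarray(weights, dtype=float)
    F = np.array([grad(x) for x in doses])
    return F.T @ (weights[:, None] * F)

def compound_log_det(models, doses, weights, model_weights):
    """Weighted average over models of log det(M_k) / p_k (higher is better)."""
    total = 0.0
    for grad, alpha in zip(models, model_weights):
        M = information_matrix(grad, doses, weights)
        sign, logdet = np.linalg.slogdet(M)
        if sign <= 0:
            # The design cannot estimate all parameters of this model.
            return -np.inf
        total += alpha * logdet / M.shape[0]
    return total

models = [linear_grad, quadratic_grad, emax_grad]
model_weights = [1 / 3, 1 / 3, 1 / 3]

# A two-point design suits only the linear model; three points support all three.
two_point = compound_log_det(models, [0.0, 1.0], [0.5, 0.5], model_weights)
three_point = compound_log_det(
    models, [0.0, 0.5, 1.0], [1 / 3, 1 / 3, 1 / 3], model_weights
)
print(two_point, three_point)
```

The two-point design would be adequate if the linear model were known to be true, but it leaves the three-parameter models inestimable, which is the kind of inadequacy a model robust criterion is meant to penalize.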

Citations: 0