
Latest articles in Biometrics

A semiparametric Gaussian Mixture Model with spatial dependence and its application to whole-slide image clustering analysis.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf149
Baichen Yu, Jin Liu, Hansheng Wang

We develop here a semiparametric Gaussian Mixture Model (SGMM) for unsupervised learning with valuable spatial information taken into consideration. Specifically, we assume for each instance a random location. Then, conditional on this random location, we assume for the feature vector a standard Gaussian Mixture Model (GMM). The proposed SGMM allows the mixing probability to be nonparametrically related to the spatial location. Compared with a classical GMM, SGMM is considerably more flexible and allows the instances from the same class to be spatially clustered. To estimate the SGMM, novel EM algorithms are developed and rigorous asymptotic theories are established. Extensive numerical simulations are conducted to demonstrate our finite sample performance. For a real application, we apply our SGMM method to the CAMELYON16 dataset of whole-slide images for breast cancer detection. The SGMM method demonstrates outstanding clustering performance.
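
As a rough illustration of the idea in this abstract (not the authors' algorithm), the sketch below runs EM for a Gaussian mixture whose mixing probabilities vary with spatial location. The kernel smoothing of responsibilities, the spherical covariances, and the names `spatial_gmm_em` and `bandwidth` are all assumptions made for this example.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a spherical Gaussian N(mean, var * I) at each row of x."""
    d = x.shape[1]
    sq = np.sum((x - mean) ** 2, axis=1)
    return np.exp(-0.5 * sq / var) / (2.0 * np.pi * var) ** (d / 2.0)

def spatial_gmm_em(x, s, n_classes=2, bandwidth=0.2, n_iter=50, seed=0):
    """EM sketch: features x (n, d), locations s (n, 2).

    Assumption: the mixing probability pi_k(s_i) is estimated by
    kernel-smoothing the class responsibilities over nearby locations.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Row-normalised Gaussian kernel weights between all pairs of locations.
    dist2 = np.sum((s[:, None, :] - s[None, :, :]) ** 2, axis=2)
    w = np.exp(-0.5 * dist2 / bandwidth ** 2)
    w /= w.sum(axis=1, keepdims=True)
    means = x[rng.choice(n, n_classes, replace=False)].astype(float)
    variances = np.full(n_classes, x.var())
    pi = np.full((n, n_classes), 1.0 / n_classes)
    for _ in range(n_iter):
        # E-step: posterior class probabilities given location-specific priors.
        dens = np.column_stack(
            [gaussian_pdf(x, means[k], variances[k]) for k in range(n_classes)]
        )
        resp = pi * dens + 1e-300
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step (parametric part): weighted means and spherical variances.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        means = (resp.T @ x) / nk[:, None]
        for k in range(n_classes):
            sq = np.sum((x - means[k]) ** 2, axis=1)
            variances[k] = max((resp[:, k] @ sq) / (nk[k] * d), 1e-8)
        # M-step (nonparametric part): smooth responsibilities over space.
        pi = w @ resp
    return pi, means, variances, resp
```

With a classical GMM, `pi` would be a single vector shared by all instances; here each instance gets its own row, which is what lets instances of the same class cluster spatially.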

Citations: 0
SPLasso for high-dimensional additive hazards regression with covariate measurement error.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf130
Jiarui Zhang, Hongsheng Liu, Xin Chen, Jinfeng Xu

High-dimensional error-prone survival data are prevalent in biomedical studies, where numerous clinical or genetic variables are collected for risk assessment. The presence of measurement errors in covariates complicates parameter estimation and variable selection, leading to non-convex optimization challenges. We propose an error-in-variables additive hazards regression model for high-dimensional noisy survival data. By employing the nearest positive semi-definite matrix projection, we develop a fast Lasso approach (semi-definite projection Lasso, SPLasso) and its soft thresholding variant (SPLasso-T), both with theoretical guarantees. Under mild assumptions, we establish model selection consistency, oracle inequalities, and limiting distributions for these methods. Simulation studies and two real data applications demonstrate the methods' superior efficiency in handling high-dimensional data, particularly showcasing remarkable performance in scenarios with missing values, highlighting their robustness and practical utility in complex biomedical settings.
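
The two building blocks named in this abstract can be sketched generically. The code below is not the authors' SPLasso implementation: it assumes the projection is taken in the Frobenius norm (the abstract does not state the norm), and the function names and the coordinate-descent solver are illustrative choices.

```python
import numpy as np

def nearest_psd(a):
    """Frobenius-norm projection of a square matrix onto the PSD cone.

    Symmetrise, then clip negative eigenvalues to zero.
    """
    sym = (a + a.T) / 2.0
    vals, vecs = np.linalg.eigh(sym)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z towards zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_quadratic(sigma, rho, lam, n_iter=200):
    """Coordinate descent for 0.5 * b' sigma b - rho' b + lam * ||b||_1.

    Assumes sigma is PSD with a strictly positive diagonal, which makes
    the objective convex and each coordinate update well defined.
    """
    p = len(rho)
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes coordinate j.
            r = rho[j] - sigma[j] @ beta + sigma[j, j] * beta[j]
            beta[j] = soft_threshold(r, lam) / sigma[j, j]
    return beta
```

The motivation is that a measurement-error-corrected Gram matrix can have negative eigenvalues, which makes the penalised objective non-convex; replacing it with `nearest_psd(sigma)` restores convexity before the Lasso step.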

Citations: 0
Bayesian scalar-on-image regression with spatial interactions for modeling Alzheimer's disease.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf144
Nilanjana Chakraborty, Qi Long, Suprateek Kundu

There has been substantial progress in predictive modeling for cognitive impairment in neurodegenerative disorders such as Alzheimer's disease (AD), based on neuroimaging biomarkers. However, existing approaches typically do not incorporate heterogeneity that may arise due to interactions between the spatially varying imaging features and supplementary demographic, clinical and genetic risk factors in AD. Unfortunately, ignoring such heterogeneity may result in poor prediction and biased estimation. Building on the existing scalar-on-image regression framework, we address this issue by incorporating spatially varying interactions between brain image and supplementary risk factors to model cognitive impairment in AD. The proposed Bayesian method tackles spatial interactions via a hierarchical representation for the functional regression coefficients depending on supplementary risk factors, which is embedded in a scalar-on-function framework involving a multi-resolution wavelet decomposition. To address the curse of dimensionality, we induce simultaneous sparsity and clustering via a spike and slab mixture prior, where the slab component is characterized by a latent class distribution. We develop an efficient Markov chain Monte Carlo algorithm for posterior computation. Extensive simulations and application to the longitudinal Alzheimer's Disease Neuroimaging Initiative study illustrate significantly improved prediction of cognitive impairment in AD across multiple visits by our model in comparison with alternate approaches. The proposed approach also identifies key brain regions in AD that exhibit significant association with cognitive abilities, either directly or through interactions with risk factors.

Citations: 0
High-dimensional multi-study multi-modality covariate-augmented generalized factor model.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf107
Wei Liu, Qingzhi Zhong

Latent factor models that integrate data from multiple sources/studies or modalities have garnered considerable attention across various disciplines. However, existing methods predominantly focus either on multi-study integration or multi-modality integration, rendering them insufficient for analyzing the diverse modalities measured across multiple studies. To address this limitation and cater to practical needs, we introduce a high-dimensional generalized factor model that seamlessly integrates multi-modality data from multiple studies, while also accommodating additional covariates. We conduct a thorough investigation of the identifiability conditions to enhance the model's interpretability. To tackle the complexity of high-dimensional nonlinear integration caused by 4 large latent random matrices, we utilize a variational lower bound to approximate the observed log-likelihood by employing a variational posterior distribution. By profiling the variational parameters, we establish the asymptotical properties of estimators for model parameters using M-estimation theory. Furthermore, we devise a computationally efficient variational expectation maximization (EM) algorithm to execute the estimation process and a criterion to determine the optimal number of both study-shared and study-specific factors. Extensive simulation studies and a real-world application show that the proposed method significantly outperforms existing methods in terms of estimation accuracy and computational efficiency.

Citations: 0
Model robust designs for dose-response models.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf112
Belmiro P M Duarte, Anthony C Atkinson, Nuno M C Oliveira

An optimal experimental design is a structured data collection plan aimed at maximizing the amount of information gathered. Determining an optimal experimental design, however, relies on the assumption that a predetermined model structure, relating the response and covariates, is known a priori. In practical scenarios, such as dose-response modeling, the form of the model representing the "true" relationship is frequently unknown, although there exists a finite set or pool of potential alternative models. Designing experiments based on a single model from this set may lead to inefficiency or inadequacy if the "true" model differs from that assumed when calculating the design. One approach to minimize the impact of the uncertainty in the model on the experimental plan is known as model robust design. In this context, we systematically address the challenge of finding approximate optimal model robust experimental designs. Our focus is on locally optimal designs, so allowing some of the models in the pool to be nonlinear. We present three Semidefinite Programming-based formulations, each aligned with one of the classes of model robustness criteria introduced by Läuter. These formulations exploit the semidefinite representability of the robustness criteria, leading to the representation of the robust problem as a semidefinite program. To ensure comparability of information measures across various models, we employ standardized designs. To illustrate the application of our approach, we consider a dose-response study where, initially, seven models were postulated as potential candidates to describe the dose-response relationship.

Citations: 0
Semi-supervised linear regression: enhancing efficiency and robustness in high dimensions.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf113
Kai Chen, Yuqian Zhang

In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. In this work, we challenge such a claim and show that additional unlabeled samples are beneficial in high-dimensional settings. Initially focusing on a dense scenario, we introduce robust semi-supervised estimators for the regression coefficient without relying on sparse structures in the population slope. Even when the true underlying model is linear, we show that leveraging information from large-scale unlabeled data helps reduce estimation bias, thereby improving both estimation accuracy and inference robustness. Moreover, we propose semi-supervised methods with further enhanced efficiency in scenarios with a sparse linear slope. The performance of the proposed methods is demonstrated through extensive numerical studies.

Citations: 0
Frequency band analysis of nonstationary multivariate time series.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf083
Raanju R Sundararajan, Scott A Bruce

Information from frequency bands in biomedical time series provides useful summaries of the observed signal. Many existing methods consider summaries of the time series obtained over a few well-known, pre-defined frequency bands of interest. However, there is a dearth of data-driven methods for identifying frequency bands that optimally summarize frequency-domain information in the time series. A new method to identify partition points in the frequency space of a multivariate locally stationary time series is proposed. These partition points signify changes across frequencies in the time-varying behavior of the signal and provide frequency band summary measures that best preserve nonstationary dynamics of the observed series. An $L_2$-norm based discrepancy measure that finds differences in the time-varying spectral density matrix is constructed, and its asymptotic properties are derived. New nonparametric bootstrap tests are also provided to identify significant frequency partition points and to identify components and cross-components of the spectral matrix exhibiting changes over frequencies. Finite-sample performance of the proposed method is illustrated via simulations. The proposed method is used to develop optimal frequency band summary measures for characterizing time-varying behavior in resting-state electroencephalography time series, as well as identifying components and cross-components associated with each frequency partition point.

Citations: 0
Causal machine learning for heterogeneous treatment effects in the presence of missing outcome data.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf098
Matthew Pryce, Karla Diaz-Ordaz, Ruth H Keogh, Stijn Vansteelandt

When estimating heterogeneous treatment effects, missing outcome data can complicate treatment effect estimation, causing certain subgroups of the population to be poorly represented. In this work, we discuss this commonly overlooked problem and consider the impact that missing at random outcome data has on causal machine learning estimators for the conditional average treatment effect (CATE). We propose 2 de-biased machine learning estimators for the CATE, the mDR-learner, and mEP-learner, which address the issue of under-representation by integrating inverse probability of censoring weights into the DR-learner and EP-learner, respectively. We show that under reasonable conditions, these estimators are oracle efficient and illustrate their favorable performance through simulated data settings, comparing them to existing CATE estimators, including comparison to estimators that use common missing data techniques. We present an example of their application using the GBSG2 trial, exploring treatment effect heterogeneity when comparing hormonal therapies to non-hormonal therapies among breast cancer patients post surgery, and offer guidance on the decisions a practitioner must make when implementing these estimators.
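
To make the weighting idea concrete, the sketch below computes a doubly robust pseudo-outcome in which the residual term is additionally weighted by the inverse probability that the outcome is observed. This is one plausible form, written under the assumption that all nuisance estimates are already available; it is not taken from the paper, and the function name and argument names are hypothetical.

```python
import numpy as np

def ipcw_dr_pseudo_outcome(y, a, observed, mu0, mu1, e, g):
    """Doubly robust pseudo-outcome with inverse-probability-of-observation weights.

    y        : outcomes (may be NaN where missing)
    a        : binary treatment indicator
    observed : 1 if the outcome was observed, 0 if missing
    mu0, mu1 : estimated outcome regressions under control / treatment
    e        : estimated propensity score P(A = 1 | X)
    g        : estimated probability the outcome is observed given (X, A)
    All arguments are 1-D arrays of equal length (an assumption of this sketch).
    """
    mu_a = np.where(a == 1, mu1, mu0)
    # Missing outcomes contribute no residual; NaNs are zeroed before use.
    resid = np.where(observed == 1, np.nan_to_num(y) - mu_a, 0.0)
    weight = (a - e) / (e * (1.0 - e)) * observed / g
    return mu1 - mu0 + weight * resid
```

In a DR-learner-style workflow, these pseudo-outcomes would then be regressed on the covariates with any flexible learner to obtain a CATE estimate; subjects from subgroups with frequent missingness receive larger weights, which counteracts their under-representation.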

Citations: 0
Exploring the heterogeneity in recurrent episode lengths based on quantile regression.
IF 1.7 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf122
Yi Liu, Guillermo E Umpierrez, Limin Peng

Recurrent episode data frequently arise in chronic disease studies when an event of interest occurs repeatedly and each occurrence lasts for a random period of time. Understanding the heterogeneity in recurrent episode lengths can help guide dynamic and customized disease management. However, relatively little attention has been paid to methods tailored to this end. Existing approaches either do not confer direct interpretation on episode lengths or involve restrictive or unrealistic distributional assumptions, such as exchangeability of within-individual episode lengths. In this work, we propose a modeling strategy that overcomes these limitations by adopting quantile regression and sensibly incorporating time-dependent covariates. Treating recurrent episodes as clustered data, we develop an estimation procedure that properly handles the special complications, including dependent censoring, dependent truncation, and informative cluster size. Our estimation procedure is computationally simple and yields estimators with desirable asymptotic properties. Our numerical studies demonstrate the advantages of the proposed method over naive adaptations of existing approaches.
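
For readers unfamiliar with quantile regression, the following is a minimal sketch of the check (pinball) loss that underlies it, fitted with an intercept only. It is not the authors' estimator: it ignores dependent censoring, dependent truncation, and informative cluster size, all of which the abstract says the proposed procedure handles. The names `check_loss` and `best_constant` and the toy `episode_lengths` values are hypothetical and chosen only for illustration.

```python
import numpy as np

def check_loss(residuals, tau):
    """Sum of the quantile-regression check loss rho_tau(r) = r * (tau - 1{r < 0})."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(r * (tau - (r < 0))))

def best_constant(y, tau):
    """Intercept-only quantile fit: search the observed values for the
    constant that minimizes the check loss (a sample tau-quantile)."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])

# Hypothetical, fully observed episode lengths (no censoring or truncation).
episode_lengths = np.array([2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0])

median_length = best_constant(episode_lengths, tau=0.5)
lower_quartile = best_constant(episode_lengths, tau=0.25)
```

Changing `tau` targets a different part of the episode-length distribution, which is what lets a quantile model describe heterogeneity beyond the mean.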

Citations: 0
Adjusted predictions for generalized estimating equations.
IF 1.4 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf090
Francis K C Hui, Samuel Muller, Alan H Welsh

Generalized estimating equations (GEEs) are a popular statistical method for longitudinal data analysis, requiring specification of the first 2 marginal moments of the response along with a working correlation matrix to capture temporal correlations within a cluster. When it comes to prediction at future/new time points using GEEs, a standard approach adopted by practitioners and software is to base it simply on the marginal mean model. In this article, we propose an alternative approach to prediction for independent cluster GEEs. By viewing the GEE as solving an iterative working linear model, we borrow ideas from universal kriging to construct an adjusted predictor that exploits working cross-correlations between the current and new observations within the same cluster. We establish theoretical conditions for the adjusted GEE predictor to outperform the standard GEE predictor. Simulations and an application to longitudinal data on the growth of Sitka spruces demonstrate that, even when we misspecify the working correlation structure, adjusted GEE predictors can achieve better performance relative to standard GEE predictors, the so-called "oracle" GEE predictor using all time points, and potentially even cluster-specific predictions from a generalized linear mixed model.
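
The kriging idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes an identity link, constant variance, an AR(1) working correlation, and a coefficient vector `beta` that has already been estimated; none of these choices come from the paper, and the function names `ar1_corr` and `adjusted_prediction` are hypothetical. The sketch uses the textbook kriging form (marginal mean plus cross-correlation-weighted residuals) and should not be read as the authors' exact predictor.

```python
import numpy as np

def ar1_corr(times_a, times_b, rho):
    """Working AR(1) correlation rho**|t_a - t_b| between two sets of time points."""
    a = np.asarray(times_a, dtype=float)[:, None]
    b = np.asarray(times_b, dtype=float)[None, :]
    return rho ** np.abs(a - b)

def adjusted_prediction(X_obs, y_obs, t_obs, x_new, t_new, beta, rho):
    """Return (marginal, adjusted) predictions for one cluster at a new time.

    marginal = x_new' beta
    adjusted = marginal + c' R^{-1} (y_obs - X_obs beta),
    where R is the working correlation among observed times and c is the
    working cross-correlation between the new time and the observed times.
    """
    marginal = float(x_new @ beta)
    resid = y_obs - X_obs @ beta
    R = ar1_corr(t_obs, t_obs, rho)
    c = ar1_corr([t_new], t_obs, rho).ravel()
    adjustment = float(c @ np.linalg.solve(R, resid))
    return marginal, marginal + adjustment

# Hypothetical cluster: intercept plus time as covariates, three observed visits.
t_obs = np.array([1.0, 2.0, 3.0])
X_obs = np.column_stack([np.ones(3), t_obs])
beta = np.array([1.0, 0.5])               # assumed to come from a fitted GEE
y_obs = np.array([1.9, 2.1, 2.9])         # cluster sits above its marginal mean
x_new, t_new = np.array([1.0, 4.0]), 4.0

marginal, adjusted = adjusted_prediction(X_obs, y_obs, t_obs, x_new, t_new, beta, rho=0.5)
```

Under an AR(1) working correlation, the adjustment for a time point after the last visit reduces to `rho` raised to the time gap times the last residual, so a cluster that ends above its marginal mean is predicted above it. Setting `rho=0` recovers the standard marginal-mean prediction.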

Citations: 0