Biostatistics: Latest Articles

Bayesian estimation of covariate assisted principal regression for brain functional connectivity.
IF 1.8; Zone 3 (Mathematics); Q3, Mathematical & Computational Biology; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae023
Hyung G Park

This paper presents a Bayesian reformulation of covariate-assisted principal regression for covariance matrix outcomes to identify low-dimensional components in the covariance associated with covariates. By introducing a geometric approach to the covariance matrices and leveraging Euclidean geometry, we estimate dimension reduction parameters and model covariance heterogeneity based on covariates. This method enables joint estimation and uncertainty quantification of relevant model parameters associated with heteroscedasticity. We demonstrate our approach through simulation studies and apply it to analyze associations between covariates and brain functional connectivity using data from the Human Connectome Project.

Citations: 0
A novel high-dimensional model for identifying regional DNA methylation QTLs.
IF 2; Zone 3 (Mathematics); Q3, Mathematical & Computational Biology; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf032
Kaiqiong Zhao, Archer Y Yang, Karim Oualkacha, Yixiao Zeng, Kathleen Klein, Marie Hudson, Inés Colmegna, Sasha Bernatsky, Celia M T Greenwood

Varying coefficient models offer the flexibility to learn the dynamic changes of regression coefficients. Despite their good interpretability and diverse applications, in high-dimensional settings, existing estimation methods for such models have important limitations. For example, we routinely encounter the need for variable selection when faced with a large collection of covariates with nonlinear/varying effects on outcomes, and no ideal solutions exist. One illustration of this situation could be identifying a subset of genetic variants with local influence on methylation levels in a regulatory region. To address this problem, we propose a composite sparse penalty that encourages both sparsity and smoothness for the varying coefficients. We present an efficient proximal gradient descent algorithm that scales to high-dimensional predictor spaces, providing sparse solutions for the varying coefficients. A comprehensive simulation study has been conducted to evaluate the performance of our approach in terms of estimation, prediction and selection accuracy. We show that the inclusion of smoothness control yields much better results over sparsity-only approaches. An adaptive version of the penalty offers additional performance gains. We further demonstrate the utility of our method in identifying regional mQTLs from asymptomatic samples in the CARTaGENE cohort. The methodology is implemented in the R package sparseSOMNiBUS, available on GitHub.
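The proximal gradient idea mentioned in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' composite penalty and not the sparseSOMNiBUS implementation: it swaps in a plain L1 (lasso) penalty on a linear model, whose proximal operator is soft-thresholding. The function names, the toy design matrix, and the tuning values are all hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_proximal_gradient(X, y, lam, step, n_iter=200):
    """Minimize (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by proximal gradient."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the smooth least-squares part.
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Gradient step followed by the proximal (soft-thresholding) step.
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam) for j in range(p)]
    return beta


# Toy design with orthogonal columns (hypothetical data, not from the paper).
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [1.2, 1.0, 3.0, 3.0]
beta_hat = lasso_proximal_gradient(X, y, lam=0.1, step=1.0)
```

Because the columns of this toy design are orthogonal with squared norm n, a step size of 1 is valid and the second coefficient is shrunk exactly to zero. A penalty that also rewards smoothness, as the abstract describes, would need a different proximal step than the one shown here.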

Citations: 0
Assessing treatment efficacy for interval-censored endpoints using multistate semi-Markov models fit to multiple data streams.
IF 2; Zone 3 (Mathematics); Q3, Mathematical & Computational Biology; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf038
Raphaël Morsomme, C Jason Liang, Allyson Mateja, Dean A Follmann, Meagan P O'Brien, Chenguang Wang, Jonathan Fintzi

We introduce a computationally efficient and general approach for utilizing multiple, possibly interval-censored, data streams to study complex biomedical endpoints using multistate semi-Markov models. Our motivating application is the REGEN-2069 trial, which investigated the protective efficacy (PE) of the monoclonal antibody combination REGEN-COV against SARS-CoV-2 when administered prophylactically to individuals in households at high risk of secondary transmission. Using data on symptom onset, episodic RT-qPCR sampling, and serological testing, we estimate the PE of REGEN-COV for asymptomatic infection, its effect on seroconversion following infection, and the duration of viral shedding. We find that REGEN-COV reduced the risk of asymptomatic infection and the duration of viral shedding, and led to lower rates of seroconversion among asymptomatically infected participants. Our algorithm for fitting semi-Markov models to interval-censored data employs a Monte Carlo expectation-maximization algorithm combined with importance sampling to efficiently address the intractability of the marginal likelihood when data are intermittently observed. Our algorithm provides substantial computational improvements over existing methods and allows us to fit semi-parametric models despite complex coarsening of the data.

Citations: 0
Correction to: Scalable kernel balancing weights in a nationwide observational study of hospital profit status and heart attack outcomes.
IF 1.8; Zone 3 (Mathematics); Q3, Mathematical & Computational Biology; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae050
Citations: 0
Recurrent events modeling based on a reflected Brownian motion with application to hypoglycemia.
IF 1.8; Zone 3 (Mathematics); Q3, Mathematical & Computational Biology; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae053
Yingfa Xie, Haoda Fu, Yuan Huang, Vladimir Pozdnyakov, Jun Yan

Patients with type 2 diabetes need to closely monitor blood sugar levels as their routine diabetes self-management. Although many treatment agents aim to tightly control blood sugar, hypoglycemia often stands as an adverse event. In practice, patients can observe hypoglycemic events more easily than hyperglycemic events due to the perception of neurogenic symptoms. We propose to model each patient's observed hypoglycemic event as a lower boundary crossing event for a reflected Brownian motion with an upper reflection barrier. The lower boundary is set by clinical standards. To capture patient heterogeneity and within-patient dependence, covariates and a patient-level frailty are incorporated into the volatility and the upper reflection barrier. This framework provides quantification for the underlying glucose level variability, patient heterogeneity, and risk factors' impact on glucose. We make inferences based on a Bayesian framework using Markov chain Monte Carlo. Two model comparison criteria, the deviance information criterion and the logarithm of the pseudo-marginal likelihood, are used for model selection. The methodology is validated in simulation studies. In analyzing a dataset from the diabetic patients in the DURABLE trial, our model provides adequate fit, generates data similar to the observed data, and offers insights that could be missed by other models.
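To make the event mechanism concrete, the following sketch simulates a driftless Brownian motion with an upper reflection barrier on a time grid and records the times at which the path moves down through a lower boundary. It is only an Euler-type illustration under assumptions of my own: the numeric values are hypothetical, the path simply continues after a crossing, and no covariates or frailty enter the volatility or barrier.

```python
import math
import random


def simulate_reflected_bm(x0, sigma, upper, lower, dt, n_steps, seed=0):
    """Simulate a Brownian path reflected at `upper`; return the path and lower-crossing times."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    crossings = []
    for k in range(1, n_steps + 1):
        prev = x
        x = x + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if x > upper:
            x = 2.0 * upper - x  # mirror the overshoot back below the barrier
        path.append(x)
        if prev > lower and x <= lower:
            crossings.append(k * dt)  # a downward crossing of the lower boundary
    return path, crossings


# Hypothetical values chosen only for illustration.
path, crossings = simulate_reflected_bm(
    x0=6.0, sigma=1.0, upper=9.0, lower=3.9, dt=0.01, n_steps=5000
)
```

A larger `sigma` or a lower `upper` barrier should tend to produce more frequent crossings, which is the intuition behind letting covariates act on those two quantities.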

Citations: 0
Causal functional mediation analysis with an application to functional magnetic resonance imaging data.
IF 1.8; Zone 3 (Mathematics); Q3, Mathematical & Computational Biology; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf019
Yi Zhao, Xi Luo, Michael E Sobel, Martin A Lindquist, Brian S Caffo

A primary goal of task-based functional magnetic resonance imaging (fMRI) studies is to quantify the effective connectivity between brain regions when stimuli are presented. Assessing the dynamics of effective connectivity has attracted increasing attention. Causal mediation analysis serves as a widely implemented tool aiming to delineate the mechanism between task stimuli and brain activations. However, the case where the treatment, mediator, and outcome are continuous functions has not been studied. Causal mediation analysis for functional data is considered. Semiparametric functional linear structural equation models are introduced and causal assumptions are discussed. The proposed models allow for the estimation of individual effect curves. The models are applied to a task-based fMRI study, providing a new perspective of studying dynamic brain connectivity. The R package cfma for implementation is available on CRAN.

Citations: 0
Model-based multifacet clustering with high-dimensional omics applications.
IF 1.8; Zone 3 (Mathematics); Q3, Mathematical & Computational Biology; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae020
Wei Zong, Danyang Li, Marianne L Seney, Colleen A Mcclung, George C Tseng

High-dimensional omics data often contain intricate and multifaceted information, resulting in the coexistence of multiple plausible sample partitions based on different subsets of selected features. Conventional clustering methods typically yield only one clustering solution, limiting their capacity to fully capture all facets of cluster structures in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, where the former mixture achieves facet assignment for gene features and the latter mixture determines cluster assignment of samples. We demonstrate superior facet and cluster assignment accuracy of MFClust through simulation studies. The proposed method is applied to three transcriptomic applications from postmortem brain and lung disease studies. The result captures multifacet clustering structures associated with critical clinical variables and provides intriguing biological insights for further hypothesis generation and discovery.

Citations: 0
Testing for a difference in means of a single feature after clustering.
IF 2; Zone 3 (Mathematics); Q3, Mathematical & Computational Biology; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae046
Yiqun T Chen, Lucy L Gao

For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common interpretation and validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.
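The inflation of the Type I error rate described above is easy to reproduce in a small simulation. The sketch below draws a single feature from one normal population, so there are no true clusters, splits it with an exact one-dimensional 2-means partition, and then applies a classical two-sided z-test as if the groups had been fixed in advance. This shows only the problem, not the selective test proposed in the paper; the function names, sample sizes, and seed are hypothetical.

```python
import math
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def two_means_1d(xs):
    """Exact 2-means in one dimension: the best split of the sorted values."""
    s = sorted(xs)
    _, i = min((sse(s[:i]) + sse(s[i:]), i) for i in range(1, len(s)))
    return s[:i], s[i:]


def naive_p_value(a, b, sigma=1.0):
    """Two-sided z-test for equal means that ignores how a and b were formed."""
    se = sigma * math.sqrt(1.0 / len(a) + 1.0 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_rejection_rate(n=30, reps=200, alpha=0.05, seed=1):
    """Share of null datasets in which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one population: the null is true
        a, b = two_means_1d(xs)
        if naive_p_value(a, b) < alpha:
            rejections += 1
    return rejections / reps


rate = naive_rejection_rate()
```

Because the clusters are chosen to be as far apart as possible, `rate` ends up far above the nominal 0.05 even though every dataset comes from a single population.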

Citations: 0
Bayesian mapping of mortality clusters.
IF 2; Zone 3 (Mathematics); Q3, Mathematical & Computational Biology; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf028
Andrea Sottosanti, Enrico Bovo, Pietro Belloni, Giovanna Boccuzzo

Disease mapping analyses the distribution of several disease outcomes within a territory. Primary goals include identifying areas with unexpected changes in mortality rates, studying the relation among multiple diseases, and dividing the analysed territory into clusters based on the observed levels of disease incidence or mortality. In this work, we focus on detecting spatial mortality clusters, which occur when neighbouring areas within a territory exhibit similar mortality levels due to one or more diseases. When multiple causes of death are examined together, it is relevant to identify not only the spatial boundaries of the clusters but also the diseases that lead to their formation. However, existing methods in the literature struggle to address this dual problem effectively and simultaneously. To overcome these limitations, we introduce perla, a multivariate Bayesian model that clusters areas in a territory according to the observed mortality rates of multiple causes of death, also exploiting the information of external covariates. Our model incorporates the spatial structure of data directly into the clustering probabilities by leveraging the stick-breaking formulation of the multinomial distribution. Additionally, it exploits suitable global-local shrinkage priors to ensure that the detection of clusters depends on diseases showing concrete increases or decreases in mortality levels, while excluding uninformative diseases. We propose a Markov chain Monte Carlo algorithm for posterior inference that consists of closed-form Gibbs sampling moves for nearly every model parameter, without requiring complex tuning operations. This work is primarily motivated by a case study on the territory of a local unit within the Italian public healthcare system, known as ULSS6 Euganea. To demonstrate the flexibility and effectiveness of our methodology, we also validate perla with a series of simulation experiments and an extensive case study on mortality levels in U.S. counties.
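The stick-breaking formulation mentioned above can be sketched generically: K-1 real-valued scores are turned into K cluster probabilities by repeatedly breaking off a share of the remaining probability mass. The logistic link used below is an assumption made for illustration, since the abstract does not state which link is used, and the function name is hypothetical. In a spatial model, the scores would presumably carry the neighbourhood structure.

```python
import math


def stick_breaking_probs(psi):
    """Map K-1 real-valued scores to K probabilities via logistic stick-breaking."""
    probs = []
    remaining = 1.0  # length of the stick not yet assigned
    for score in psi:
        fraction = 1.0 / (1.0 + math.exp(-score))  # share of the remaining stick
        probs.append(remaining * fraction)
        remaining *= 1.0 - fraction
    probs.append(remaining)  # the last category takes what is left
    return probs


probs = stick_breaking_probs([0.0, 0.0])  # [0.5, 0.25, 0.25]
```

Each score only governs a binary choice between taking the current category and passing the remainder on, which is what typically makes such formulations convenient for Gibbs-type updates.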

Citations: 0
Fast standard error estimation for joint models of longitudinal and time-to-event data based on stochastic EM algorithms.
IF 2 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxae043
Tingting Yu, Lang Wu, Ronald J Bosch, Davey M Smith, Rui Wang

Maximum likelihood inference can often become computationally intensive when performing joint modeling of longitudinal and time-to-event data, due to the intractable integrals in the joint likelihood function. The computational challenges escalate further when modeling HIV-1 viral load data, owing to the nonlinear trajectories and the presence of left-censored data resulting from the assay's lower limit of quantification. In this paper, for a joint model comprising a nonlinear mixed-effect model and a Cox Proportional Hazards model, we develop a computationally efficient Stochastic EM (StEM) algorithm for parameter estimation. Furthermore, we propose a novel technique for fast standard error estimation, which directly estimates standard errors from the results of StEM iterations and is broadly applicable to various joint modeling settings, such as those containing generalized linear mixed-effect models, parametric survival models, or joint models with more than two submodels. We evaluate the performance of the proposed methods through simulation studies and apply them to HIV-1 viral load data from six AIDS Clinical Trials Group studies to characterize viral rebound trajectories following the interruption of antiretroviral therapy (ART), accounting for the informative duration of off-ART periods.

Citations: 0