
Biostatistics: Latest Articles

Incorporating historic information to further improve power when conducting Bayesian information borrowing in basket trials.
IF 1.8; Zone 3 (Mathematics); Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf016
Libby Daniells, Pavel Mozgunov, Helen Barnett, Alun Bedding, Thomas Jaki

In basket trials, a single therapeutic treatment is tested on several patient populations simultaneously, each of which forms a basket, where patients across all baskets in the trial share a common genetic aberration. These trials allow treatments to be tested on small groups of patients; however, limited basket sample sizes can result in inadequate precision and power of estimates. It is well known that Bayesian information borrowing models such as the exchangeability-nonexchangeability (EXNEX) model can be implemented to tackle such a problem, drawing on information from one basket when making inference in another. An alternative approach to improving the power of estimates is to incorporate any historical or external information available. This paper considers models that amalgamate both forms of information borrowing, allowing borrowing between baskets in the ongoing trial whilst also drawing on response data from historical sources, with the aim of further improving treatment effect estimates. We propose several Bayesian information borrowing approaches that incorporate historical information into the model. These methods are data-driven, updating the degree of borrowing based on the level of homogeneity between information sources. A thorough simulation study is presented to draw comparisons between the proposed approaches, whilst also comparing to the standard EXNEX model in which no historical information is utilized. The models are also applied to a real-life trial example to demonstrate their performance in practice. We show that the incorporation of historic data under the novel approaches can lead to a substantial improvement in precision and power of treatment effect estimates when such data are homogeneous with the responses in the ongoing trial. Under some approaches, this came alongside an inflation in type I error rate in cases of heterogeneity. However, the use of a power prior in the EXNEX model is shown to increase power and precision, whilst maintaining similar error rates to the standard EXNEX model.
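To make the power prior idea concrete, here is a minimal sketch for a single basket with a binary response. It is not the EXNEX model from the paper: it assumes a conjugate Beta prior, one historical data set, and a fixed borrowing weight `alpha`, whereas the abstract describes data-driven weights. The names `power_prior_posterior`, `BetaPosterior`, `a0`, and `b0` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Parameters of a Beta(a, b) distribution for a response rate."""
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def variance(self) -> float:
        total = self.a + self.b
        return self.a * self.b / (total**2 * (total + 1.0))


def power_prior_posterior(y, n, y_hist, n_hist, alpha, a0=1.0, b0=1.0):
    """Posterior for a response rate under a fixed-weight power prior.

    The historical binomial likelihood (y_hist responders out of n_hist)
    is raised to the power alpha in [0, 1]: alpha = 0 ignores the
    historical data and alpha = 1 pools it fully with the current data
    (y responders out of n). Beta(a0, b0) is the assumed initial prior.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    a = a0 + y + alpha * y_hist
    b = b0 + (n - y) + alpha * (n_hist - y_hist)
    return BetaPosterior(a, b)
```

With 5 responders out of 20 in the current basket and 10 out of 40 in the historical one, raising `alpha` from 0 to 1 shrinks the posterior variance, which is the precision gain the abstract refers to when the two sources are homogeneous.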

Biostatistics, vol. 26, no. 1; Journal Article; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204204/pdf/
Citations: 0
Predicting distributions of physical activity profiles in the National Health and Nutrition Examination Survey database using a partially linear Fréchet single index model.
IF 1.8; Zone 3 (Mathematics); Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf013
Marcos Matabuena, Aritra Ghosal, Wendy Meiring, Alexander Petersen

Object-oriented data analysis is a fascinating and evolving field in modern statistical science, with the potential to make significant contributions to biomedical applications. This statistical framework facilitates the development of new methods to analyze complex data objects that capture more information than traditional clinical biomarkers. This paper applies the object-oriented framework to analyze physical activity levels, measured by accelerometers, as response objects in a regression model. Unlike traditional summary metrics, we utilize a recently proposed representation of physical activity data as a distributional object, providing a more nuanced and complete profile of individual energy expenditure across all ranges of monitoring intensity. A novel hybrid Fréchet regression model is proposed and applied to US population accelerometer data from the National Health and Nutrition Examination Survey (NHANES), 2011 to 2014. The semi-parametric nature of the model allows for the inclusion of nonlinear effects for critical variables, such as age, which are biologically known to have subtle impacts on physical activity. Simultaneously, the inclusion of linear effects preserves interpretability for other variables, particularly categorical covariates such as ethnicity and sex. The results obtained are valuable from a public health perspective and could lead to new strategies for optimizing physical activity interventions in specific American subpopulations.

Biostatistics, vol. 26, no. 1; Journal Article
Citations: 0
Stochastic EM algorithm for partially observed stochastic epidemics with individual heterogeneity.
IF 1.8; Zone 3 (Mathematics); Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae018
Fan Bu, Allison E Aiello, Alexander Volfovsky, Jason Xu

We develop a stochastic epidemic model progressing over dynamic networks, where infection rates are heterogeneous and may vary with individual-level covariates. The joint dynamics are modeled as a continuous-time Markov chain such that disease transmission is constrained by the contact network structure, and network evolution is in turn influenced by individual disease statuses. To accommodate partial epidemic observations commonly seen in real-world data, we propose a stochastic EM algorithm for inference, introducing key innovations that include efficient conditional samplers for imputing missing infection and recovery times that respect the dynamic contact network. Experiments on both synthetic and real datasets demonstrate that our inference method can accurately and efficiently recover model parameters and provide valuable insight in the presence of unobserved disease episodes in epidemic data.
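For readers unfamiliar with epidemics modeled as continuous-time Markov chains, the following is a minimal sketch of the standard Gillespie simulation of a stochastic SIR process. It is a deliberate simplification and not the model of the paper: it assumes a well-mixed population with homogeneous rates, with no contact network, no covariates, and no missing data. The function name `simulate_sir` and the parameters `beta`, `gamma`, and `t_max` are hypothetical.

```python
import random


def simulate_sir(n_susceptible, n_infected, beta, gamma, t_max, seed=0):
    """Simulate a well-mixed stochastic SIR epidemic as a continuous-time Markov chain.

    beta is the per-contact infection rate and gamma (> 0) the recovery rate.
    Returns the list of (time, S, I, R) states visited up to t_max.
    """
    rng = random.Random(seed)
    s, i, r = n_susceptible, n_infected, 0
    n = s + i + r
    t = 0.0
    path = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        total_rate = infection_rate + recovery_rate
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # Choose the event type in proportion to its rate.
        if rng.random() < infection_rate / total_rate:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        path.append((t, s, i, r))
    return path
```

In this toy setting every infection and recovery time is recorded in `path`. The inference problem described in the abstract arises when some of those event times are unobserved and must be imputed.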

Citations: 0
A scalable two-stage Bayesian approach accounting for exposure measurement error in environmental epidemiology.
IF 2; Zone 3 (Mathematics); Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae038
Changwoo J Lee, Elaine Symanski, Amal Rammah, Dong Hun Kang, Philip K Hopke, Eun Sug Park

Accounting for exposure measurement errors has been recognized as a crucial problem in environmental epidemiology for over two decades. Bayesian hierarchical models offer a coherent probabilistic framework for evaluating associations between environmental exposures and health effects, which take into account exposure measurement errors introduced by uncertainty in the estimated exposure as well as spatial misalignment between the exposure and health outcome data. While two-stage Bayesian analyses are often regarded as a good alternative to fully Bayesian analyses when joint estimation is not feasible, there has been minimal research on how to properly propagate uncertainty from the first-stage exposure model to the second-stage health model, especially in the case of a large number of participant locations along with spatially correlated exposures. We propose a scalable two-stage Bayesian approach, called a sparse multivariate normal (sparse MVN) prior approach, based on the Vecchia approximation for assessing associations between exposure and health outcomes in environmental epidemiology. We compare its performance with existing approaches through simulation. Our sparse MVN prior approach shows comparable performance with the fully Bayesian approach, which is a gold standard but is impossible to implement in some cases. We investigate the association between source-specific exposures and pollutant (nitrogen dioxide [NO2])-specific exposures and birth weight of full-term infants born in 2012 in Harris County, Texas, using several approaches, including the newly developed method.

Journal Article; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823266/pdf/
Citations: 0
The winner's curse under dependence: repairing empirical Bayes using convoluted densities.
IF 2; Zone 3 (Mathematics); Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf025
Stijn Hawinkel, Olivier Thas, Steven Maere

The winner's curse is a form of selection bias that arises when estimates are obtained for a large number of features, but only a subset of the most extreme estimates is reported. It occurs in large-scale significance testing as well as in rank-based selection, and imperils reproducibility of findings and follow-up study design. Several methods correcting for this selection bias have been proposed, but questions remain on their susceptibility to dependence between features since theoretical analyses and comparative studies are few. We prove that estimation through Tweedie's formula is biased in the presence of strong dependence, and propose a convolution of its density estimator to restore its competitive performance, which also aids other empirical Bayes methods. Furthermore, we perform a comprehensive simulation study comparing different classes of winner's curse correction methods for point estimates as well as confidence intervals under dependence. We find a bootstrap method and empirical Bayes methods with density convolution to perform best at correcting the selection bias, although this correction generally does not improve the feature ranking. Finally, we apply the methods to a comparison of single-feature versus multi-feature prediction models in predicting Brassica napus phenotypes from gene expression data, demonstrating that the superiority of the best single-feature model may be illusory.
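As background, Tweedie's formula for a normally distributed estimate z with known variance sigma2 gives the posterior mean as E[mu | z] = z + sigma2 * d/dz log f(z), where f is the marginal density of the estimates. The sketch below applies the formula with a user-supplied score function. It does not implement the density convolution proposed in the paper, and the names `tweedie_correct`, `normal_marginal_score`, and `numeric_score` are hypothetical.

```python
def tweedie_correct(z, sigma2, score):
    """Tweedie's formula: E[mu | z] = z + sigma2 * d/dz log f(z).

    score is a function returning the derivative of the log marginal
    density of the estimates, evaluated at z.
    """
    return z + sigma2 * score(z)


def normal_marginal_score(tau2, sigma2):
    """Score of the N(0, tau2 + sigma2) marginal arising when mu ~ N(0, tau2)."""
    total = tau2 + sigma2
    return lambda z: -z / total


def numeric_score(log_density, h=1e-5):
    """Central-difference approximation to the derivative of a log density."""
    return lambda z: (log_density(z + h) - log_density(z - h)) / (2.0 * h)
```

With a N(0, 1) prior on the true effects and unit sampling variance, an extreme estimate of 3.0 is shrunk to 1.5, which illustrates how the correction counteracts the selection of extreme values. In practice the marginal density must itself be estimated from the data, and that estimation step is where the abstract locates the problem under dependence.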

Biostatistics, vol. 26, no. 1; Journal Article
Citations: 0
Assessing spatial disparities: a Bayesian linear regression approach.
IF 2; Zone 3 (Mathematics); Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf048
Kyle Wu, Sudipto Banerjee

Epidemiological investigations of regionally aggregated spatial data often involve detecting spatial health disparities among neighboring regions on a map of disease mortality or incidence rates. Analyzing such data introduces spatial dependence among health outcomes and seeks to report statistically significant spatial disparities by delineating boundaries that separate neighboring regions with disparate health outcomes. However, there are statistical challenges in appropriately defining what constitutes a spatial disparity and in constructing robust probabilistic inferences for spatial disparities. We enrich the familiar Bayesian linear regression framework to introduce spatial autoregression and offer model-based detection of spatial disparities. We derive exploitable analytical tractability that considerably accelerates computation. Simulation experiments conducted on a county map of the entire United States demonstrate the effectiveness of our method, and we apply our method to a data set from the Institute for Health Metrics and Evaluation (IHME) on age-standardized US county-level estimates of lung cancer mortality rates.

Biostatistics, vol. 26, no. 1; Journal Article
Citations: 0
A Bayesian semi-parametric approach to causal mediation for longitudinal mediators and time-to-event outcomes with application to a cardiovascular disease cohort study.
IF 2; Zone 3 (Mathematics); Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf027
Saurabh Bhandari, Michael J Daniels, Maria Josefsson, Donald M Lloyd-Jones, Juned Siddique

Causal mediation analysis of observational data is an important tool for investigating the potential causal effects of medications on disease-related risk factors, and on time-to-death (or disease progression) through these risk factors. However, when analyzing data from a cohort study, such analyses are complicated by the longitudinal structure of the risk factors and the presence of time-varying confounders. Leveraging data from the Atherosclerosis Risk in Communities (ARIC) cohort study, we develop a causal mediation approach, using (semi-parametric) Bayesian Additive Regression Tree (BART) models for the longitudinal and survival data. Our framework is developed using static longitudinal exposure regimes and allows for time-varying confounders and mediators, both of which can be either continuous or binary. We also identify and estimate direct and indirect causal effects in the presence of a competing event. We apply our methods to assess how medication, prescribed to target cardiovascular disease (CVD) risk factors, affects the time-to-CVD death.

Biostatistics, vol. 26, no. 1; Journal Article; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12479244/pdf/
Citations: 0
A modeling framework for detecting and leveraging node-level information in Bayesian network inference.
IF 1.8; Zone 3 (Mathematics); Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae021
Xiaoyue Xi, Hélène Ruffieux

Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.

Citations: 0
Connectivity Regression.
IF 1.8 Zone 3, Mathematics Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf002
Neel Desai, Veera Baladandayuthapani, Russell T Shinohara, Jeffrey S Morris

Assessing how brain functional connectivity networks vary across individuals promises to uncover important scientific questions such as patterns of healthy brain aging through the lifespan or dysconnectivity associated with disease. In this article, we introduce a general regression framework, Connectivity Regression (ConnReg), for regressing subject-specific functional connectivity networks on covariates while accounting for within-network inter-edge dependence. ConnReg utilizes a multivariate generalization of Fisher's transformation to project network objects into an alternative space where Gaussian assumptions are justified and positive semidefinite constraints are automatically satisfied. Penalized multivariate regression is fit in the transformed space to simultaneously induce sparsity in regression coefficients and in covariance elements, which capture within-network inter-edge dependence. We use permutation tests to perform multiplicity-adjusted inference to identify covariates associated with connectivity, and stability selection scores to identify network edges that vary with selected covariates. Simulation studies validate the inferential properties of our proposed method and demonstrate how estimating and accounting for within-network inter-edge dependence leads to more efficient estimation, more powerful inference, and more accurate selection of covariate-dependent network edges. We apply ConnReg to the Human Connectome Project Young Adult study, revealing insights into how connectivity varies with language processing covariates and structural brain features.

Citations: 0
Identification and estimation of causal effects with confounders missing not at random.
IF 1.8 Zone 3, Mathematics Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf015
Jian Sun, Bo Fu

Making causal inferences from observational studies can be challenging when confounders are missing not at random. In such cases, identifying causal effects is often not guaranteed. Motivated by a real example, we consider a treatment-independent missingness assumption under which we establish the identification of causal effects when confounders are missing not at random. We propose a weighted estimating equation approach for estimating model parameters and introduce three estimators for the average causal effect, based on regression, propensity score weighting, and doubly robust estimation. We evaluate the performance of these estimators through simulations, and provide a real data analysis to illustrate our proposed method.

Citations: 0