Biostatistics: Latest Articles

Stochastic EM algorithm for partially observed stochastic epidemics with individual heterogeneity.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-08-08 DOI: 10.1093/biostatistics/kxae018
Fan Bu, Allison E Aiello, Alexander Volfovsky, Jason Xu

We develop a stochastic epidemic model progressing over dynamic networks, where infection rates are heterogeneous and may vary with individual-level covariates. The joint dynamics are modeled as a continuous-time Markov chain such that disease transmission is constrained by the contact network structure, and network evolution is in turn influenced by individual disease statuses. To accommodate partial epidemic observations commonly seen in real-world data, we propose a stochastic EM algorithm for inference, introducing key innovations that include efficient conditional samplers for imputing missing infection and recovery times which respect the dynamic contact network. Experiments on both synthetic and real datasets demonstrate that our inference method can accurately and efficiently recover model parameters and provide valuable insight in the presence of unobserved disease episodes in epidemic data.
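
As a rough illustration of the continuous-time Markov chain described in this abstract, the sketch below computes the rates of the possible next events on a fixed snapshot of the contact network and takes one Gillespie-style step. Everything named here is an assumption made for illustration and is not taken from the paper: the S/I/R status coding, the log-linear rate `beta * exp(b * x[i])`, the scalar covariate `x`, and the function names. Network link changes, which the model also treats as events, are left out.

```python
import math
import random


def event_rates(status, edges, x, beta, b, gamma):
    """Return {(event, person): rate} for every possible next disease event.

    status: dict person -> "S", "I" or "R"
    edges:  set of frozenset({i, j}) pairs currently in contact
    x:      dict person -> scalar covariate (hypothetical)
    """
    rates = {}
    for i, s in status.items():
        if s == "I":
            rates[("recover", i)] = gamma
        elif s == "S":
            # Count infectious neighbours in the current contact network.
            n_inf = sum(
                1 for e in edges
                if i in e and any(status[j] == "I" for j in e if j != i)
            )
            if n_inf > 0:
                # Assumed log-linear heterogeneity in the infection rate.
                rates[("infect", i)] = beta * math.exp(b * x[i]) * n_inf
    return rates


def next_event(rates, rng):
    """One Gillespie step: exponential waiting time, event drawn proportional to rate."""
    total = sum(rates.values())
    wait = rng.expovariate(total)
    events = list(rates)
    event = rng.choices(events, weights=[rates[e] for e in events], k=1)[0]
    return wait, event


status = {1: "I", 2: "S", 3: "S"}
edges = {frozenset({1, 2})}
x = {1: 0.0, 2: 1.0, 3: 0.0}
rates = event_rates(status, edges, x, beta=0.5, b=0.0, gamma=0.2)
wait, event = next_event(rates, random.Random(1))
```

Person 3 has no infectious contact, so no infection event is listed for them; this is how the network constrains transmission in the sketch.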

Citations: 0
Adaptive Gaussian Markov random fields for child mortality estimation.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-08-05 DOI: 10.1093/biostatistics/kxae030
Serge Aleshin-Guendel, Jon Wakefield

The under-5 mortality rate (U5MR), a critical health indicator, is typically estimated from household surveys in lower- and middle-income countries. Spatio-temporal disaggregation of household survey data can lead to highly variable estimates of U5MR, necessitating the use of smoothing models that borrow information across space and time. The assumptions of common smoothing models may be unrealistic when certain time periods or regions are expected to have shocks in mortality relative to their neighbors, which can lead to oversmoothing of U5MR estimates. In this paper, we develop a spatial and temporal smoothing approach based on Gaussian Markov random field models which incorporate knowledge of these expected shocks in mortality. We demonstrate the potential for these models to improve upon alternatives not incorporating knowledge of expected shocks in a simulation study. We apply these models to estimate U5MR in Rwanda at the national level from 1985 to 2019, a time period which includes the Rwandan civil war and genocide.
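
To make the oversmoothing point concrete, here is a minimal sketch of a first-order random walk, one common Gaussian Markov random field for temporal smoothing, with an optional weight on each step between adjacent time points. Down-weighting the steps around an expected shock is an assumption about how prior knowledge could enter such a model; the abstract does not give the authors' construction, and the names `rw1_precision`, `edge_weights` and `roughness` are hypothetical.

```python
def rw1_precision(n, edge_weights=None):
    """Structure matrix of a first-order random walk on n time points.

    edge_weights[t] scales the penalty on (x[t+1] - x[t])**2; a weight
    below 1 relaxes smoothing across that step.
    """
    if edge_weights is None:
        edge_weights = [1.0] * (n - 1)
    q = [[0.0] * n for _ in range(n)]
    for t, w in enumerate(edge_weights):
        q[t][t] += w
        q[t + 1][t + 1] += w
        q[t][t + 1] -= w
        q[t + 1][t] -= w
    return q


def roughness(q, x):
    """Quadratic form x' Q x: the weighted sum of squared first differences."""
    n = len(x)
    return sum(x[i] * q[i][j] * x[j] for i in range(n) for j in range(n))


series = [0.0, 0.0, 5.0, 5.0]                      # a jump between t=1 and t=2
plain = roughness(rw1_precision(4), series)
adaptive = roughness(rw1_precision(4, [1.0, 0.1, 1.0]), series)
```

With equal weights the jump is penalized in full; with the middle step down-weighted the same series incurs a tenth of the penalty, so the prior no longer pulls the shock period toward its neighbors as strongly.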

Citations: 0
Direct estimation and inference of higher-level correlations from lower-level measurements with applications in gene-pathway and proteomics studies.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-31 DOI: 10.1093/biostatistics/kxae027
Yue Wang, Haoran Shi

This paper tackles the challenge of estimating correlations between higher-level biological variables (e.g. proteins and gene pathways) when only lower-level measurements are directly observed (e.g. peptides and individual genes). Existing methods typically aggregate lower-level data into higher-level variables and then estimate correlations based on the aggregated data. However, different data aggregation methods can yield varying correlation estimates as they target different higher-level quantities. Our solution is a latent factor model that directly estimates these higher-level correlations from lower-level data without the need for data aggregation. We further introduce a shrinkage estimator to ensure the positive definiteness and improve the accuracy of the estimated correlation matrix. Furthermore, we establish the asymptotic normality of our estimator, enabling efficient computation of P-values for the identification of significant correlations. The effectiveness of our approach is demonstrated through comprehensive simulations and the analysis of proteomics and gene expression datasets. We develop the R package highcor for implementing our method.

Citations: 0
Estimating causal effects for binary outcomes using per-decision inverse probability weighting.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-30 DOI: 10.1093/biostatistics/kxae025
Yihan Bao, Lauren Bell, Elizabeth Williamson, Claire Garnett, Tianchen Qian

Micro-randomized trials are commonly conducted for optimizing mobile health interventions such as push notifications for behavior change. In analyzing such trials, causal excursion effects are often of primary interest, and their estimation typically involves inverse probability weighting (IPW). However, in a micro-randomized trial, additional treatments can often occur during the time window over which an outcome is defined, and this can greatly inflate the variance of the causal effect estimator because IPW would involve a product of numerous weights. To reduce variance and improve estimation efficiency, we propose two new estimators using a modified version of IPW, which we call "per-decision IPW." The second estimator further improves efficiency using the projection idea from the semiparametric efficiency theory. These estimators are applicable when the outcome is binary and can be expressed as the maximum of a series of sub-outcomes defined over sub-intervals of time. We establish the estimators' consistency and asymptotic normality. Through simulation studies and real data applications, we demonstrate substantial efficiency improvement of the proposed estimator over existing estimators. The new estimators can be used to improve the precision of primary and secondary analyses for micro-randomized trials with binary outcomes.
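
The variance problem mentioned in this abstract, a weight that is the product of many inverse probabilities, can be seen in a few lines. The sketch below is not the authors' estimator. It only computes the running product of weights over a window of decision points for a reference excursion that prescribes the same action at every point, and represents the binary outcome as the maximum of sub-outcomes. The function names, the `target` argument and the constant-action excursion are assumptions made for illustration.

```python
def cumulative_weights(actions, probs, target=0):
    """Running product of inverse-probability weights over a window.

    actions[k]: treatment actually assigned at decision point k (0 or 1)
    probs[k]:   probability with which treatment 1 was randomized at point k
    target:     the action the reference excursion prescribes at each point
    """
    out, w = [], 1.0
    for a, p in zip(actions, probs):
        p_target = p if target == 1 else 1.0 - p
        # The weight is zero as soon as the observed action leaves the excursion.
        w *= (1.0 if a == target else 0.0) / p_target
        out.append(w)
    return out


def binary_outcome(sub_outcomes):
    """Window outcome expressed as the maximum of binary sub-outcomes."""
    return max(sub_outcomes)


followed = cumulative_weights([0, 0, 0], [0.5, 0.5, 0.5])
deviated = cumulative_weights([0, 1, 0], [0.5, 0.5, 0.5])
```

The last element of each list is the full-window weight, which doubles with every extra decision point when the randomization probability is 0.5. One reading of "per-decision" weighting, offered here as an assumption, is that a sub-outcome in sub-interval k would be paired with the shorter product up to point k instead of the full-window product.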

Citations: 0
Incorporating prior information in gene expression network-based cancer heterogeneity analysis.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-29 DOI: 10.1093/biostatistics/kxae028
Rong Li, Shaodong Xu, Yang Li, Zuojian Tang, Di Feng, James Cai, Shuangge Ma

Cancer is molecularly heterogeneous, with seemingly similar patients having different molecular landscapes and accordingly different clinical behaviors. In recent studies, gene expression networks have been shown as more effective/informative for cancer heterogeneity analysis than some simpler measures. Gene interconnections can be classified as "direct" and "indirect," where the latter can be caused by shared genomic regulators (such as transcription factors, microRNAs, and other regulatory molecules) and other mechanisms. It has been suggested that incorporating the regulators of gene expressions in network analysis and focusing on the direct interconnections can lead to a deeper understanding of the more essential gene interconnections. Such analysis can be seriously challenged by the large number of parameters (jointly caused by network analysis, incorporation of regulators, and heterogeneity) and often weak signals. To effectively tackle this problem, we propose incorporating prior information contained in the published literature. A key challenge is that such prior information can be partial or even wrong. We develop a two-step procedure that can flexibly accommodate different levels of prior information quality. Simulation demonstrates the effectiveness of the proposed approach and its superiority over relevant competitors. In the analysis of a breast cancer dataset, findings different from the alternatives are made, and the identified sample subgroups have important clinical differences.

Citations: 0
Model-based multifacet clustering with high-dimensional omics applications.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-13 DOI: 10.1093/biostatistics/kxae020
Wei Zong, Danyang Li, Marianne L Seney, Colleen A Mcclung, George C Tseng

High-dimensional omics data often contain intricate and multifaceted information, resulting in the coexistence of multiple plausible sample partitions based on different subsets of selected features. Conventional clustering methods typically yield only one clustering solution, limiting their capacity to fully capture all facets of cluster structures in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, where the former mixture achieves facet assignment for gene features and the latter mixture determines cluster assignment of samples. We demonstrate superior facet and cluster assignment accuracy of MFClust through simulation studies. The proposed method is applied to three transcriptomic applications from postmortem brain and lung disease studies. The result captures multifacet clustering structures associated with critical clinical variables and provides intriguing biological insights for further hypothesis generation and discovery.

Citations: 0
A marginal structural model for normal tissue complication probability.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-09 DOI: 10.1093/biostatistics/kxae019
Thai-Son Tang, Zhihui Liu, Ali Hosni, John Kim, Olli Saarela

The goal of radiation therapy for cancer is to deliver the prescribed radiation dose to the tumor while minimizing dose to the surrounding healthy tissues. To evaluate treatment plans, the dose distribution to healthy organs is commonly summarized as dose-volume histograms (DVHs). Normal tissue complication probability (NTCP) modeling has centered around making patient-level risk predictions with features extracted from the DVHs, but few have considered adapting a causal framework to evaluate the safety of alternative treatment plans. We propose causal estimands for NTCP based on deterministic and stochastic interventions, as well as estimators based on marginal structural models that impose bivariable monotonicity between dose, volume, and toxicity risk. The properties of these estimators are studied through simulations, and their use is illustrated in the context of radiotherapy treatment of anal canal cancer patients.
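
For readers unfamiliar with dose-volume histograms, the sketch below computes a cumulative DVH: for each dose level, the fraction of an organ's volume that receives at least that dose. It assumes equal-volume voxels and made-up dose values, and the function name `cumulative_dvh` is hypothetical; it illustrates only the summary the abstract refers to, not the proposed estimators.

```python
def cumulative_dvh(voxel_doses, dose_grid):
    """Fraction of (equal-volume) voxels receiving at least each dose in dose_grid."""
    n = len(voxel_doses)
    return [sum(1 for v in voxel_doses if v >= d) / n for d in dose_grid]


doses_gy = [10.0, 20.0, 30.0, 40.0]      # illustrative voxel doses, in Gy
grid_gy = [0.0, 20.0, 35.0, 50.0]
dvh = cumulative_dvh(doses_gy, grid_gy)
```

The resulting curve is non-increasing in dose by construction, which is why DVH features are naturally indexed by a (dose, volume) pair.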

Citations: 0
Bayesian estimation of covariate assisted principal regression for brain functional connectivity.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-09 DOI: 10.1093/biostatistics/kxae023
Hyung G Park

This paper presents a Bayesian reformulation of covariate-assisted principal regression for covariance matrix outcomes to identify low-dimensional components in the covariance associated with covariates. By introducing a geometric approach to the covariance matrices and leveraging Euclidean geometry, we estimate dimension reduction parameters and model covariance heterogeneity based on covariates. This method enables joint estimation and uncertainty quantification of relevant model parameters associated with heteroscedasticity. We demonstrate our approach through simulation studies and apply it to analyze associations between covariates and brain functional connectivity using data from the Human Connectome Project.

Citations: 0
Multivariate spatiotemporal functional principal component analysis for modeling hospitalization and mortality rates in the dialysis population.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad013
Qi Qian, Danh V Nguyen, Donatello Telesca, Esra Kurum, Connie M Rhee, Sudipto Banerjee, Yihao Li, Damla Senturk

Dialysis patients experience frequent hospitalizations and a higher mortality rate compared to other Medicare populations, in whom hospitalizations are a major contributor to morbidity, mortality, and healthcare costs. Patients also typically remain on dialysis for the duration of their lives or until kidney transplantation. Hence, there is growing interest in studying the spatiotemporal trends in the correlated outcomes of hospitalization and mortality among dialysis patients as a function of time starting from transition to dialysis across the United States. Utilizing national data from the United States Renal Data System (USRDS), we propose a novel multivariate spatiotemporal functional principal component analysis model to study the joint spatiotemporal patterns of hospitalization and mortality rates among dialysis patients. The proposal is based on a multivariate Karhunen-Loève expansion that describes leading directions of variation across time and induces spatial correlations among region-specific scores. An efficient estimation procedure is proposed using only univariate principal components decompositions and a Markov Chain Monte Carlo framework for targeting the spatial correlations. The finite sample performance of the proposed method is studied through simulations. Novel applications to the USRDS data highlight hot spots across the United States with higher hospitalization and/or mortality rates and time periods of elevated risk.

Pages: 718-735 Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11358256/pdf/
Citations: 0
A scalable approach for continuous time Markov models with covariates.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad012
Farhad Hatami, Alex Ocampo, Gordon Graham, Thomas E Nichols, Habib Ganjgahi

Existing methods for fitting continuous-time Markov models (CTMM) in the presence of covariates suffer from scalability issues due to the high computational cost of the matrix exponentials calculated for each observation. In this article, we propose an optimization technique for CTMM which uses a stochastic gradient descent algorithm combined with differentiation of the matrix exponential using a Padé approximation. This approach makes fitting large-scale data feasible. We present two methods for computing standard errors, one a novel approach using the Padé expansion and the other using a power series expansion of the matrix exponential. Through simulations, we find improved performance relative to existing CTMM methods, and we demonstrate the method on the large-scale multiple sclerosis NO.MS data set.
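
To make the computational bottleneck concrete, the sketch below evaluates a CTMM log-likelihood for panel data, where each observation requires its own matrix exponential P = exp(Q(x) · dt). It is a minimal illustration under stated assumptions, not the authors' implementation: the log-linear covariate link, the function names (`rate_matrix`, `expm_series`, `log_likelihood`), and the data layout are all hypothetical, and the exponential is computed with a truncated power series plus scaling and squaring rather than the Padé-based differentiation and stochastic gradient descent described in the abstract.

```python
import math


def rate_matrix(base_log_rates, betas, x):
    """Build a generator matrix Q for a scalar covariate x.

    Assumed (hypothetical) log-linear link for each allowed transition r -> s:
        q_rs(x) = exp(base_log_rates[r][s] + betas[r][s] * x)
    Entries set to None in base_log_rates mark disallowed transitions.
    """
    n = len(base_log_rates)
    Q = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for s in range(n):
            if r != s and base_log_rates[r][s] is not None:
                Q[r][s] = math.exp(base_log_rates[r][s] + betas[r][s] * x)
        Q[r][r] = -sum(Q[r])  # each row of a generator sums to zero
    return Q


def matmul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_series(A, terms=20):
    """Matrix exponential via a truncated power series with scaling and squaring."""
    n = len(A)
    norm = max(sum(abs(v) for v in row) for row in A)
    # Halve A until its infinity norm is at most 1 so the series converges fast.
    squarings = max(0, math.ceil(math.log2(norm))) if norm > 0 else 0
    scaled = [[v / (2 ** squarings) for v in row] for row in A]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)  # undo the scaling: exp(A) = exp(A/2)^2
    return result


def log_likelihood(observations, base_log_rates, betas):
    """Sum log transition probabilities over (from, to, dt, x) observations."""
    total = 0.0
    for s_from, s_to, dt, x in observations:
        Q = rate_matrix(base_log_rates, betas, x)
        P = expm_series([[q * dt for q in row] for row in Q])  # one expm per row
        total += math.log(P[s_from][s_to])
    return total


if __name__ == "__main__":
    base = [[None, math.log(0.5)], [math.log(0.2), None]]
    beta = [[0.0, 0.3], [0.1, 0.0]]
    obs = [(0, 1, 1.0, 0.5), (1, 1, 2.0, -1.0), (0, 0, 0.5, 0.0)]
    print(log_likelihood(obs, base, beta))
```

Because the generator depends on each observation's covariate value and time gap, the exponential cannot be shared across rows, so the cost grows linearly with the number of observations. That per-observation cost is what motivates working with mini-batches in a stochastic gradient scheme.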

Citations: 0