Biometrics: Latest Articles

Direct and indirect treatment effects in the presence of semicompeting risks.
IF 1.9 Zone 4 (Mathematics) Q1 Mathematics Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae032
Yuhao Deng, Yi Wang, Xiao-Hua Zhou

Semicompeting risks refer to the phenomenon that the terminal event (such as death) can censor the nonterminal event (such as disease progression) but not vice versa. The treatment effect on the terminal event can be delivered either directly following the treatment or indirectly through the nonterminal event. We consider 2 strategies to decompose the total effect into a direct effect and an indirect effect under the framework of mediation analysis in completely randomized experiments by adjusting the prevalence and hazard of nonterminal events, respectively. They require slightly different assumptions on cross-world quantities to achieve identifiability. We establish asymptotic properties for the estimated counterfactual cumulative incidences and decomposed treatment effects. We illustrate the subtle difference between these 2 decompositions through simulation studies and two real-data applications in the Supplementary Materials.
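The asymmetry described in this abstract (death can censor progression, but progression cannot censor death) can be made concrete with a small sketch of how one subject's latent event times turn into observed data. This is only an illustration of the standard semicompeting-risks data structure, not the authors' estimator; the names `Observed` and `observe` are hypothetical, and ties between event times are ignored for simplicity.

```python
from typing import NamedTuple


class Observed(NamedTuple):
    nonterminal_time: float   # time of progression, or end of follow-up if not seen
    nonterminal_seen: bool    # True if progression was observed
    terminal_time: float      # time of death, or censoring time if death not seen
    terminal_seen: bool       # True if death was observed


def observe(progression: float, death: float, censoring: float) -> Observed:
    """Map latent event times to observed data for one subject (hypothetical helper)."""
    # Follow-up ends at death or at censoring, whichever comes first.
    end_of_follow_up = min(death, censoring)
    # Progression is recorded only if it happens before follow-up ends,
    # so an earlier death hides (censors) a later progression.
    progressed = progression <= end_of_follow_up
    return Observed(
        nonterminal_time=min(progression, end_of_follow_up),
        nonterminal_seen=progressed,
        # Death is still followed after progression: the nonterminal event
        # never censors the terminal one.
        terminal_time=end_of_follow_up,
        terminal_seen=death <= censoring,
    )
```

For example, `observe(7.0, 5.0, 10.0)` reports death at time 5 with no observed progression, whereas `observe(2.0, 5.0, 10.0)` reports both events.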

Citations: 0
Rejoinder to the discussion on "Bayesian meta-analysis of penetrance for cancer risk".
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae040
Thanthirige Lakshika M Ruberu, Danielle Braun, Giovanni Parmigiani, Swati Biswas

The five discussions of our paper provide several modeling alternatives, extensions, and generalizations that can potentially guide future research in meta-analysis. In this rejoinder, we briefly summarize and comment on some of those points.

Citations: 0
Differential recall bias in estimating treatment effects in observational studies.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae058
Suhwan Bong, Kwonsang Lee, Francesca Dominici

Observational studies are frequently used to estimate the effect of an exposure or treatment on an outcome. To obtain an unbiased estimate of the treatment effect, it is crucial to measure the exposure accurately. A common type of exposure misclassification is recall bias, which occurs in retrospective cohort studies when study subjects may inaccurately recall their past exposure. Particularly challenging is differential recall bias in the context of self-reported binary exposures, where the bias may be directional rather than random and its extent varies according to the outcomes experienced. This paper makes several contributions: (1) it establishes bounds for the average treatment effect even when a validation study is not available; (2) it proposes multiple estimation methods across various strategies predicated on different assumptions; and (3) it suggests a sensitivity analysis technique to assess the robustness of the causal conclusion, incorporating insights from prior research. The effectiveness of these methods is demonstrated through simulation studies that explore various model misspecification scenarios. These approaches are then applied to investigate the effect of childhood physical abuse on mental health in adulthood.

Citations: 0
High-dimensional covariate-augmented overdispersed Poisson factor model.
IF 1.9 Zone 4 (Mathematics) Q1 Mathematics Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae031
Wei Liu, Qingzhi Zhong

The current Poisson factor models often assume that the factors are unknown, which overlooks the explanatory potential of certain observable covariates. This study focuses on high dimensional settings, where the number of the count response variables and/or covariates can diverge as the sample size increases. A covariate-augmented overdispersed Poisson factor model is proposed to jointly perform a high-dimensional Poisson factor analysis and estimate a large coefficient matrix for overdispersed count data. A group of identifiability conditions is provided to theoretically guarantee computational identifiability. We incorporate the interdependence of both response variables and covariates by imposing a low-rank constraint on the large coefficient matrix. To address the computation challenges posed by nonlinearity, two high-dimensional latent matrices, and the low-rank constraint, we propose a novel variational estimation scheme that combines Laplace and Taylor approximations. We also develop a criterion based on a singular value ratio to determine the number of factors and the rank of the coefficient matrix. Comprehensive simulation studies demonstrate that the proposed method outperforms the state-of-the-art methods in estimation accuracy and computational efficiency. The practical merit of our method is demonstrated by an application to the CITE-seq dataset. A flexible implementation of our proposed method is available in the R package COAP.
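The abstract mentions a criterion based on a singular value ratio for choosing the number of factors and the rank of the coefficient matrix, but it does not give the formula. As a rough illustration, the sketch below implements one common generic form of such a rule: pick the k that maximizes the ratio of the k-th to the (k+1)-th singular value. The function name `select_rank_by_ratio` and the `max_rank` cap are assumptions made for this example, and the rule is not claimed to match the criterion in the paper or in the COAP package.

```python
def select_rank_by_ratio(singular_values, max_rank):
    """Return the k in 1..max_rank that maximizes s_k / s_(k+1).

    Generic singular-value-ratio rule (an assumption for illustration,
    not the paper's exact criterion).
    """
    # Work on singular values in decreasing order.
    s = sorted(singular_values, reverse=True)
    if not 1 <= max_rank < len(s):
        raise ValueError("max_rank must satisfy 1 <= max_rank < number of singular values")

    best_k, best_ratio = 1, 0.0
    for k in range(1, max_rank + 1):
        numerator, denominator = s[k - 1], s[k]
        # A zero denominator means a perfect gap after the k-th value.
        ratio = float("inf") if denominator == 0 else numerator / denominator
        if ratio > best_ratio:
            best_k, best_ratio = k, ratio
    return best_k
```

With singular values `[10, 9, 8, 0.5, 0.4, 0.3]` and `max_rank=4`, the largest gap sits between the third and fourth values, so the rule selects 3.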

Citations: 0
Identifying temporal pathways using biomarkers in the presence of latent non-Gaussian components.
IF 1.9 Zone 4 (Mathematics) Q1 Mathematics Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae033
Shanghong Xie, Donglin Zeng, Yuanjia Wang

Time-series data collected from a network of random variables are useful for identifying temporal pathways among the network nodes. Observed measurements may contain multiple sources of signals and noises, including Gaussian signals of interest and non-Gaussian noises, including artifacts, structured noise, and other unobserved factors (eg, genetic risk factors, disease susceptibility). Existing methods, including vector autoregression (VAR) and dynamic causal modeling do not account for unobserved non-Gaussian components. Furthermore, existing methods cannot effectively distinguish contemporaneous relationships from temporal relations. In this work, we propose a novel method to identify latent temporal pathways using time-series biomarker data collected from multiple subjects. The model adjusts for the non-Gaussian components and separates the temporal network from the contemporaneous network. Specifically, an independent component analysis (ICA) is used to extract the unobserved non-Gaussian components, and residuals are used to estimate the contemporaneous and temporal networks among the node variables based on method of moments. The algorithm is fast and can easily scale up. We derive the identifiability and the asymptotic properties of the temporal and contemporaneous networks. We demonstrate superior performance of our method by extensive simulations and an application to a study of attention-deficit/hyperactivity disorder (ADHD), where we analyze the temporal relationships between brain regional biomarkers. We find that temporal network edges were across different brain regions, while most contemporaneous network edges were bilateral between the same regions and belong to a subset of the functional connectivity network.

Citations: 0
Rejoinder to "On exact randomization-based covariate-adjusted confidence intervals" by Jacob Fiksel.
IF 1.9 Zone 4 (Mathematics) Q1 Mathematics Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae052
Ke Zhu, Hanzhong Liu
Citations: 0
Testing conditional quantile independence with functional covariate.
IF 1.9 Zone 4 (Mathematics) Q1 Mathematics Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae036
Yongzhen Feng, Jie Li, Xiaojun Song

We propose a new non-parametric conditional independence test for a scalar response and a functional covariate over a continuum of quantile levels. We build a Cramer-von Mises type test statistic based on an empirical process indexed by random projections of the functional covariate, effectively avoiding the "curse of dimensionality" under the projected hypothesis, which is almost surely equivalent to the null hypothesis. The asymptotic null distribution of the proposed test statistic is obtained under some mild assumptions. The asymptotic global and local power properties of our test statistic are then investigated. We specifically demonstrate that the statistic is able to detect a broad class of local alternatives converging to the null at the parametric rate. Additionally, we recommend a simple multiplier bootstrap approach for estimating the critical values. The finite-sample performance of our statistic is examined through several Monte Carlo simulation experiments. Finally, an analysis of an EEG data set is used to show the utility and versatility of our proposed test statistic.

Citations: 0
Topical hidden genome: discovering latent cancer mutational topics using a Bayesian multilevel context-learning approach.
IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae030
Saptarshi Chakraborty, Zoe Guan, Colin B Begg, Ronglai Shen

Inferring the cancer-type specificities of ultra-rare, genome-wide somatic mutations is an open problem. Traditional statistical methods cannot handle such data due to their ultra-high dimensionality and extreme data sparsity. To harness information in rare mutations, we have recently proposed a formal multilevel multilogistic "hidden genome" model. Through its hierarchical layers, the model condenses information in ultra-rare mutations through meta-features embodying mutation contexts to characterize cancer types. Consistent, scalable point estimation of the model can incorporate 10s of millions of variants across thousands of tumors and permit impressive prediction and attribution. However, principled statistical inference is infeasible due to the volume, correlation, and noninterpretability of mutation contexts. In this paper, we propose a novel framework that leverages topic models from computational linguistics to effectuate dimension reduction of mutation contexts producing interpretable, decorrelated meta-feature topics. We propose an efficient MCMC algorithm for implementation that permits rigorous full Bayesian inference at a scale that is orders of magnitude beyond the capability of existing out-of-the-box inferential high-dimensional multi-class regression methods and software. Applying our model to the Pan Cancer Analysis of Whole Genomes dataset reveals interesting biological insights including somatic mutational topics associated with UV exposure in skin cancer, aging in colorectal cancer, and strong influence of epigenome organization in liver cancer. Under cross-validation, our model demonstrates highly competitive predictive performance against blackbox methods of random forest and deep learning.

Citations: 0
Integrating randomized and observational studies to estimate optimal dynamic treatment regimes.
IF 1.9 Zone 4 (Mathematics) Q1 Mathematics Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae046
Anna Batorsky, Kevin J Anstrom, Donglin Zeng

Sequential multiple assignment randomized trials (SMARTs) are the gold standard for estimating optimal dynamic treatment regimes (DTRs), but are costly and require a large sample size. We introduce the multi-stage augmented Q-learning estimator (MAQE) to improve efficiency of estimation of optimal DTRs by augmenting SMART data with observational data. Our motivating example comes from the Back Pain Consortium, where one of the overarching aims is to learn how to tailor treatments for chronic low back pain to individual patient phenotypes, knowledge which is lacking clinically. The Consortium-wide collaborative SMART and observational studies within the Consortium collect data on the same participant phenotypes, treatments, and outcomes at multiple time points, which can easily be integrated. Previously published single-stage augmentation methods for integration of trial and observational study (OS) data were adapted to estimate optimal DTRs from SMARTs using Q-learning. Simulation studies show the MAQE, which integrates phenotype, treatment, and outcome information from multiple studies over multiple time points, more accurately estimates the optimal DTR, and has a higher average value than a comparable Q-learning estimator without augmentation. We demonstrate this improvement is robust to a wide range of trial and OS sample sizes, addition of noise variables, and effect sizes.

Citations: 0
Sequential covariate-adjusted randomization via hierarchically minimizing Mahalanobis distance and marginal imbalance.
IF 1.9 Zone 4 (Mathematics) Q1 Mathematics Pub Date: 2024-03-27 DOI: 10.1093/biomtc/ujae047
Haoyu Yang, Yichen Qin, Yang Li, Feifang Hu

In comparative studies, covariate balance and sequential allocation schemes have attracted growing academic interest. Although many theoretically justified adaptive randomization methods achieve the covariate balance, they often allocate patients in pairs or groups. To better meet the practical requirements where the clinicians cannot wait for other participants to assign the current patient for some economic or ethical reasons, we propose a method that randomizes patients individually and sequentially. The proposed method conceptually separates the covariate imbalance, measured by the newly proposed modified Mahalanobis distance, and the marginal imbalance, that is the sample size difference between the 2 groups, and it minimizes them with an explicit priority order. Compared with the existing sequential randomization methods, the proposed method achieves the best possible covariate balance while maintaining the marginal balance directly, offering us more control of the randomization process. We demonstrate the superior performance of the proposed method through a wide range of simulation studies and real data analysis, and also establish theoretical guarantees for the proposed method in terms of both the convergence of the imbalance measure and the subsequent treatment effect estimation.

Citations: 0