Biostatistics: Latest Articles, Page 5

A Bayesian approach for investigating the pharmacogenetics of combination antiretroviral therapy in people with HIV.

IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae001

Wei Jin, Yang Ni, Amanda B Spence, Leah H Rubin, Yanxun Xu

Combination antiretroviral therapy (ART) with at least three different drugs has become the standard of care for people with HIV (PWH) due to its exceptional effectiveness in viral suppression. However, many ART drugs have been reported to be associated with neuropsychiatric adverse effects, including depression, especially in the presence of certain genetic polymorphisms. Pharmacogenetics is an important consideration in administering combination ART, as it may influence drug efficacy and increase the risk of neuropsychiatric conditions. Large-scale longitudinal HIV databases give researchers the opportunity to investigate the pharmacogenetics of combination ART in a data-driven manner. However, with more than 30 FDA-approved ART drugs, the interplay between the large number of possible drug combinations and genetic polymorphisms poses statistical modeling challenges. We develop a Bayesian approach to examine the longitudinal effects of combination ART, and their interactions with genetic polymorphisms, on depressive symptoms in PWH. The proposed method uses a Gaussian process with a composite kernel function to capture the longitudinal effects of combination ART by directly incorporating individuals' treatment histories, and a Bayesian classification and regression tree to account for individual heterogeneity. Through simulation studies and an application to a dataset from the Women's Interagency HIV Study, we demonstrate the clinical utility of the proposed approach for investigating the pharmacogenetics of combination ART and for helping physicians make effective individualized treatment decisions that can improve health outcomes for PWH.
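
A minimal sketch of what a composite kernel over treatment histories can look like, assuming each visit is summarized by a time stamp and the set of ART drugs taken at that visit. The product of a squared-exponential kernel in time and a Jaccard (Tanimoto) similarity between drug sets is an illustrative choice made here, not the kernel defined in the paper; the drug abbreviations and visit data are hypothetical.

```python
import math
from typing import FrozenSet, List, Tuple

# Hypothetical visit record: (time in years, set of ART drugs taken at that visit)
Visit = Tuple[float, FrozenSet[str]]


def se_kernel(t1: float, t2: float, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel on time."""
    return math.exp(-((t1 - t2) ** 2) / (2.0 * length_scale ** 2))


def jaccard_kernel(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard (Tanimoto) similarity between two drug sets."""
    if not a and not b:
        return 1.0  # convention: two empty regimens are identical
    return len(a & b) / len(a | b)


def composite_kernel(v1: Visit, v2: Visit, length_scale: float = 1.0) -> float:
    """Product kernel: close in time AND similar drug combinations -> high covariance."""
    return se_kernel(v1[0], v2[0], length_scale) * jaccard_kernel(v1[1], v2[1])


def gram_matrix(visits: List[Visit], length_scale: float = 1.0) -> List[List[float]]:
    """Covariance matrix of a GP evaluated at the given visits."""
    return [[composite_kernel(u, v, length_scale) for v in visits] for u in visits]


# Illustrative treatment history for one person
visits = [
    (0.0, frozenset({"TDF", "FTC", "EFV"})),
    (0.5, frozenset({"TDF", "FTC", "DTG"})),
    (1.0, frozenset({"ABC", "3TC", "DTG"})),
]
K = gram_matrix(visits)
for row in K:
    print([round(x, 3) for x in row])
```

Because a product of positive semidefinite kernels is itself positive semidefinite, a Gram matrix built this way can serve as a Gaussian-process covariance; under this particular choice, visits with no drugs in common get zero covariance.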

Pages: 1034-1048

Citations: 0

Fast matrix completion in epigenetic methylation studies with informative covariates.

IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae016

Mélina Ribaud, Aurélie Labbe, Khaled Fouda, Karim Oualkacha

DNA methylation is an important epigenetic mark that modulates gene expression by inhibiting the binding of transcriptional proteins to DNA. As in many other omics experiments, missing values are a major issue, and appropriate imputation techniques are needed both to avoid unnecessary reductions in sample size and to make optimal use of the information collected. We consider the case where relatively few samples are processed with expensive high-density whole-genome bisulfite sequencing (WGBS) and a larger number of samples are processed with more affordable low-density, array-based technologies. In such cases, the low-coverage (array-based) methylation data can be imputed using the high-density information provided by the WGBS samples. In this paper, we propose an efficient Linear Model of Coregionalisation with informative Covariates (LMCC) to predict missing values from observed values and covariates. Our model assumes that, at each site, the methylation vector of all samples is linked to a set of fixed factors (covariates) and a set of latent factors. Furthermore, we exploit the functional nature of the data and the spatial correlation across sites by placing Gaussian processes on the fixed and latent coefficient vectors, respectively. Our simulations show that using covariates can significantly improve the accuracy of imputed values, especially when the missing data contain relevant information about the explanatory variable. We also show that the proposed model is particularly efficient when the number of columns is much greater than the number of rows, which is usually the case in methylation data analysis. Finally, we apply our method to two real methylation datasets and compare it with alternative approaches, showing how covariates such as cell type, tissue type, or age can improve the accuracy of imputed values.
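
The core idea of a linear model of coregionalization can be illustrated through its cross-covariance: each sample's methylation profile is modeled as a weighted sum of a few latent Gaussian processes over genomic position, so Cov(y_i(s), y_j(s')) = Σ_q A[i][q] A[j][q] k_q(s, s'). The sketch below assumes squared-exponential kernels, a hypothetical loading matrix A and made-up length-scales in base pairs; it omits the covariate (fixed-factor) part of LMCC and is not the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def se_kernel(s1: float, s2: float, length_scale: float) -> float:
    """Squared-exponential kernel over genomic position (base pairs)."""
    return math.exp(-((s1 - s2) ** 2) / (2.0 * length_scale ** 2))


def lmc_cross_cov(i: int, j: int, s1: float, s2: float,
                  A: Matrix, length_scales: List[float]) -> float:
    """Cov(y_i(s1), y_j(s2)) = sum_q A[i][q] * A[j][q] * k_q(s1, s2)."""
    return sum(A[i][q] * A[j][q] * se_kernel(s1, s2, ls)
               for q, ls in enumerate(length_scales))


def lmc_cov_matrix(sites: List[float], A: Matrix, length_scales: List[float]) -> Matrix:
    """Joint covariance over all (sample, site) pairs, ordered sample by sample."""
    index: List[Tuple[int, float]] = [(i, s) for i in range(len(A)) for s in sites]
    return [[lmc_cross_cov(i, j, s1, s2, A, length_scales) for (j, s2) in index]
            for (i, s1) in index]


# Hypothetical loadings: 3 samples (rows) on 2 latent processes (columns)
A = [[1.0, 0.5],
     [0.8, -0.2],
     [0.3, 1.0]]
length_scales = [100.0, 1000.0]  # short- and long-range correlation, in bp
sites = [100.0, 150.0, 400.0]

C = lmc_cov_matrix(sites, A, length_scales)
print(len(C), len(C[0]))  # 9 9
```

In an imputation setting, a covariance of this kind would be used to compute the Gaussian conditional mean of the unobserved entries given the observed ones.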

Pages: 1062-1078. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471954/pdf/

Citations: 0

Tree-informed Bayesian multi-source domain adaptation: cross-population probabilistic cause-of-death assignment using verbal autopsy.

IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae005

Zhenke Wu, Zehang R Li, Irena Chen, Mengbing Li

Determining the causes of deaths (CODs) that occur outside civil registration and vital statistics systems is challenging. In practice, a technique called verbal autopsy (VA) is widely used to gather information on such deaths. A VA consists of interviewing relatives of the deceased about the symptoms the deceased experienced in the period leading up to death, and often yields multivariate binary responses. While statistical methods have been devised to estimate cause-specific mortality fractions (CSMFs) for a study population, the continued expansion of VA to new populations (or "domains") calls for approaches that recognize between-domain differences while capitalizing on potential similarities. In this article, we propose such a domain-adaptive method, which integrates external between-domain similarity information encoded by a prespecified rooted weighted tree. Given a cause, we use latent class models to characterize the conditional distributions of the responses, which may vary by domain. We specify a logistic stick-breaking Gaussian diffusion process prior along the tree for the class mixing weights, with node-specific spike-and-slab priors, to pool information across domains in a data-driven way. Posterior inference is conducted via a scalable variational Bayes algorithm. Simulation studies show that the domain adaptation enabled by the proposed method improves both CSMF estimation and individual COD assignment. We also illustrate and evaluate the method using a validation dataset. The article concludes with a discussion of limitations and future directions.
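
Two building blocks named in this abstract can be sketched in a few lines: the logistic stick-breaking map, which turns K-1 unconstrained reals into K mixing weights, and a Gaussian diffusion along a tree, in which each node's value equals its parent's value plus an increment. The tree, node names and increments below are hypothetical, and the increments are fixed numbers rather than random draws; a zero increment plays the role of the "spike" in a spike-and-slab prior, letting a domain inherit its parent's weights exactly.

```python
import math
from typing import Dict, List, Optional


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logistic_stick_breaking(v: List[float]) -> List[float]:
    """Map K-1 unconstrained reals to K nonnegative weights summing to one."""
    weights: List[float] = []
    remaining = 1.0
    for vk in v:
        p = sigmoid(vk)              # fraction of the remaining stick taken by this class
        weights.append(remaining * p)
        remaining *= 1.0 - p
    weights.append(remaining)        # last class takes what is left
    return weights


def diffuse_along_tree(parent: Dict[str, Optional[str]],
                       increments: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Node value = parent value + node increment; the root keeps its own increment."""
    values: Dict[str, List[float]] = {}

    def value(node: str) -> List[float]:
        if node not in values:
            p = parent[node]
            inc = increments[node]
            values[node] = list(inc) if p is None else [a + b for a, b in zip(value(p), inc)]
        return values[node]

    for node in parent:
        value(node)
    return values


# Hypothetical tree: two domains hanging off a common root, K = 3 latent classes
parent = {"root": None, "domain_A": "root", "domain_B": "root"}
increments = {
    "root": [0.0, 0.0],
    "domain_A": [0.0, 0.0],    # zero increment ("spike"): shares the root's weights
    "domain_B": [1.0, -0.5],   # nonzero increment ("slab"): departs from the root
}
values = diffuse_along_tree(parent, increments)
for node, v in values.items():
    print(node, [round(w, 3) for w in logistic_stick_breaking(v)])
```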

Pages: 1233-1253. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471964/pdf/

Citations: 0

Neuroimaging meta regression for coordinate based meta analysis data with a spatial model.

IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae024

Yifan Yu, Rosario Pintos Lobo, Michael Cody Riedel, Katherine Bottenhorn, Angela R Laird, Thomas E Nichols

Coordinate-based meta-analysis combines evidence from a collection of neuroimaging studies to estimate brain activation. In such analyses, a key practical challenge is to find a computationally efficient, statistically interpretable approach to modeling the locations of activation foci. In this article, we propose a generative coordinate-based meta-regression (CBMR) framework that approximates a smooth activation intensity function and investigates the effect of study-level covariates (e.g. year of publication, sample size). We employ a spline parameterization to model the spatial structure of brain activation and consider four stochastic models of the random variation in foci. To examine the validity of CBMR, we estimate brain activation on 20 meta-analytic datasets, conduct spatial homogeneity tests at the voxel level, and compare the results with those of existing kernel-based and model-based approaches.
Keywords: generalized linear models; meta-analysis; spatial statistics; statistical modeling.
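
A one-dimensional sketch of a spline-parameterized intensity, under an assumed log-linear form: the log intensity at location x is a B-spline expansion Σ_j β_j B_j(x) plus a linear term in study-level covariates, so the expected number of foci per unit volume is its exponential. The Cox-de Boor recursion is standard; the uniform knots, coefficients and the single covariate are hypothetical, and an actual CBMR fit would use splines over three-dimensional brain space and estimate the coefficients from the reported foci.

```python
import math
from typing import List


def bspline_basis(i: int, k: int, x: float, knots: List[float]) -> float:
    """i-th B-spline of order k (degree k-1) at x, via the Cox-de Boor recursion."""
    if k == 1:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0:
        left = (x - knots[i]) / d1 * bspline_basis(i, k - 1, x, knots)
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k] - x) / d2 * bspline_basis(i + 1, k - 1, x, knots)
    return left + right


def log_intensity(x: float, beta: List[float], knots: List[float], k: int,
                  gamma: List[float], z: List[float]) -> float:
    """Assumed log-linear form: spline term in space plus study-level covariate term."""
    n_basis = len(knots) - k
    spatial = sum(beta[j] * bspline_basis(j, k, x, knots) for j in range(n_basis))
    study = sum(g * zz for g, zz in zip(gamma, z))
    return spatial + study


knots = [float(t) for t in range(11)]        # uniform knots on [0, 10]
k = 4                                        # cubic B-splines -> 7 basis functions
beta = [0.2, 0.4, 0.9, 1.3, 0.8, 0.3, 0.1]   # hypothetical spline coefficients
gamma, z = [0.1], [2.0]                      # hypothetical covariate effect and value

# The basis sums to one on [knots[k-1], knots[-k]) = [3, 7), so evaluate there
for x in (3.5, 5.0, 6.5):
    lam = math.exp(log_intensity(x, beta, knots, k, gamma, z))
    print(x, round(lam, 3))  # expected foci per unit volume at x
```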

Pages: 1210-1232. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471956/pdf/

Citations: 0

Dynamic models augmented by hierarchical data: an application of estimating HIV epidemics at sub-national level.

IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae003

Bao Le, Xiaoyue Niu, Tim Brown, Jeffrey W Imai-Eaton

Dynamic models have been used successfully to produce estimates of HIV epidemics at the national level because of their epidemiological nature and their ability to estimate prevalence, incidence, and mortality rates simultaneously. Recently, HIV interventions and policies have required more information at sub-national levels to support local planning, decision-making, and resource allocation. Unfortunately, many areas lack sufficient data to derive stable and reliable results, which is a critical technical barrier to more stratified estimates. One solution is to borrow information from other areas within the same country. However, directly building hierarchical structures into HIV dynamic models is complicated and computationally time-consuming. In this article, we propose a simple and innovative way to incorporate hierarchical information into the dynamical systems by using auxiliary data. The proposed method efficiently uses information from multiple areas within each country without increasing the computational burden. As a result, the new model improves predictive ability and uncertainty assessment.
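
The general idea of borrowing information from other areas can be illustrated with the textbook normal-normal partial-pooling estimator, in which each area's estimate is pulled toward a precision-weighted pooled mean by a factor b = v/(v + τ²). This is a generic illustration that assumes known sampling variances v and a known between-area variance τ², with made-up prevalence figures; it is not the auxiliary-data method the article proposes for dynamic models.

```python
from typing import List, Tuple


def shrink(estimates: List[float], variances: List[float],
           tau2: float) -> Tuple[float, List[float]]:
    """Normal-normal partial pooling with known variances.

    Each area's estimate y (sampling variance v) is pulled toward the pooled
    mean mu by the factor b = v / (v + tau2): noisier areas borrow more.
    """
    w = [1.0 / (v + tau2) for v in variances]          # precision weights
    mu = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    pooled = []
    for y, v in zip(estimates, variances):
        b = v / (v + tau2)
        pooled.append((1.0 - b) * y + b * mu)
    return mu, pooled


# Hypothetical area-level prevalence estimates; the third area has sparse data
estimates = [0.12, 0.05, 0.20]
variances = [0.0001, 0.0004, 0.01]
mu, pooled = shrink(estimates, variances, tau2=0.001)
print(round(mu, 4), [round(p, 4) for p in pooled])
```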

Pages: 1049-1061. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471966/pdf/

Citations: 0

Bayesian mixed model inference for genetic association under related samples with brain network phenotype.

IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae008

Xinyuan Tian, Yiting Wang, Selena Wang, Yi Zhao, Yize Zhao

Genetic association studies of brain connectivity phenotypes have gained prominence owing to advances in noninvasive imaging techniques and quantitative genetics. Brain connectivity traits, characterized by network configurations and unique biological structures, present distinct challenges compared with other quantitative phenotypes. Furthermore, sample relatedness in most imaging genetics studies limits the feasibility of adopting existing network-response models. In this article, we fill this gap by proposing a Bayesian network-response mixed-effect model that treats the phenotype as network-variate and incorporates population structure, including pedigrees and unknown sample relatedness. To accommodate the inherent topological architecture of the genetic contributions to the phenotype, we model the effect components via a set of effect network configurations and impose inter-network sparsity and intra-network shrinkage to dissect the phenotypic network configurations affected by the risk genetic variant. We further develop a Markov chain Monte Carlo (MCMC) algorithm to facilitate uncertainty quantification. We evaluate the performance of our model through extensive simulations. By further applying the method to study the genetic basis of brain structural connectivity, using data from the Human Connectome Project with extensive family structures, we obtain plausible and interpretable results. Beyond genetic studies of brain connectivity, the proposed model also provides a general linear mixed-effect regression framework for network-variate outcomes.
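
To show how sample relatedness enters a linear mixed model, the sketch below treats a single edge of the connectivity network as a scalar trait with mean zero and covariance σ_g² R + σ_e² I, where R is a relatedness matrix (here, two full siblings and one unrelated subject), and evaluates its Gaussian log-likelihood through a Cholesky factorization. The variance components and data are hypothetical, and the paper's network-variate model with sparsity and shrinkage priors is considerably richer than this.

```python
import math
from typing import List

Matrix = List[List[float]]


def mixed_model_cov(R: Matrix, sigma_g2: float, sigma_e2: float) -> Matrix:
    """Cov(y) = sigma_g2 * R + sigma_e2 * I for relatedness matrix R."""
    n = len(R)
    return [[sigma_g2 * R[i][j] + (sigma_e2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def cholesky(S: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def gaussian_loglik(y: List[float], S: Matrix) -> float:
    """log N(y; 0, S), solving L z = y by forward substitution."""
    n = len(y)
    L = cholesky(S)
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + sum(zi * zi for zi in z))


# Hypothetical relatedness: subjects 0 and 1 are full siblings, subject 2 is unrelated
R = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
S = mixed_model_cov(R, sigma_g2=0.6, sigma_e2=0.4)
y_edge = [0.8, 1.1, -0.3]  # one connectivity edge, centered
print(round(gaussian_loglik(y_edge, S), 4))
```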

Pages: 1195-1209. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639157/pdf/

Citations: 0

Identifying predictors of resilience to stressors in single-arm studies of pre-post change.

IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxad018

Ravi Varadhan, Jiafeng Zhu, Karen Bandeen-Roche

Many older adults experience a major stressor at some point in their lives. The ability to recover well after a major stressor is known as resilience. An important goal of geriatric research is to identify factors that influence resilience to stressors. Studies of resilience in older adults are typically single-arm studies in which everyone experiences the stressor. The simplistic approach of regressing change on baseline yields biased estimates because of mathematical coupling and regression to the mean (RTM). We develop a method to correct this bias and extend it to include covariates. Our approach considers a counterfactual control group and uses sensitivity analyses to evaluate different settings of the control-group parameters. Only minimal distributional assumptions are required. Simulation studies demonstrate the validity of the method. We illustrate the method using a large registry of older adults (N = 7239) who underwent total knee replacement (TKR), and we demonstrate how external data can be used to constrain the sensitivity analysis. Naive analyses implicated several treatment effect modifiers, including baseline function, age, body-mass index (BMI), gender, number of comorbidities, income, and race. The corrected analysis revealed that baseline (pre-stressor) function was not strongly linked to recovery after TKR and that, among the covariates, only age and number of comorbidities were consistently and negatively associated with post-stressor recovery in all functional domains. Correcting for mathematical coupling and RTM is necessary to draw valid inferences about the effect of covariates and baseline status on pre-post change. Our method provides a simple estimator for this purpose.
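
A small simulation makes the bias concrete. Assuming each subject has a stable true function T, measured with independent noise before and after the stressor and with no true change at all, the OLS slope of change (Y2 - Y1) on baseline Y1 converges to -σ_e²/(σ_T² + σ_e²), which is negative even though baseline has no effect on recovery. The normal distributions and parameter values are assumptions for illustration; the sketch shows only the bias, not the authors' correction.

```python
import random
from typing import List


def ols_slope(x: List[float], y: List[float]) -> float:
    """Least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_rtm(n: int = 20000, sd_true: float = 1.0,
                 sd_err: float = 1.0, seed: int = 1) -> float:
    """Slope of change on baseline when there is NO true change."""
    rng = random.Random(seed)
    baseline, change = [], []
    for _ in range(n):
        t = rng.gauss(0.0, sd_true)        # stable underlying function
        y1 = t + rng.gauss(0.0, sd_err)    # noisy pre-stressor measurement
        y2 = t + rng.gauss(0.0, sd_err)    # noisy post-stressor measurement
        baseline.append(y1)
        change.append(y2 - y1)
    return ols_slope(baseline, change)


def theoretical_slope(sd_true: float, sd_err: float) -> float:
    """Large-sample limit of the naive slope: -sd_err^2 / (sd_true^2 + sd_err^2)."""
    return -(sd_err ** 2) / (sd_true ** 2 + sd_err ** 2)


print(round(simulate_rtm(), 3), theoretical_slope(1.0, 1.0))
```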

Pages: 1094-1111. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639147/pdf/

Citations: 0

Evaluating dynamic and predictive discrimination for recurrent event models: use of a time-dependent C-index.

IF 1.8 CAS Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxad031

Jian Wang, Xinyang Jiang, Jing Ning

Interest in analyzing recurrent event data has increased over the past few decades. One essential aspect of a risk prediction model for recurrent event data is to accurately distinguish individuals with different risks of developing a recurrent event. Although the concordance index (C-index) effectively evaluates the overall discriminative ability of a regression model for recurrent event data, a local measure is also desirable to capture dynamic performance of the regression model over time. Therefore, in this study, we propose a time-dependent C-index measure for inferring the model's discriminative ability locally. We formulated the C-index as a function of time using a flexible parametric model and constructed a concordance-based likelihood for estimation and inference. We adapted a perturbation-resampling procedure for variance estimation. Extensive simulations were conducted to investigate the proposed time-dependent C-index's finite-sample performance and estimation procedure. We applied the time-dependent C-index to three regression models of a study of re-hospitalization in patients with colorectal cancer to evaluate the models' discriminative capability.
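
The estimator in this abstract is a concordance-based likelihood over a flexible parametric model, which is beyond a short snippet. The sketch below is a rough, hypothetical illustration of what a *local* concordance measures. It computes an empirical C-index restricted to a time window [t, t + h). It considers only subjects observed through the whole window. Those with at least one recurrent event in the window are compared against those with none, and the measure counts how often the higher-risk subject is the one with the event. The `Subject` class, the window rule, time-fixed risk scores, and the exclusion of subjects censored inside the window are all assumptions made for the example. The last of these is a simplification that a real estimator would need to handle.

```python
from dataclasses import dataclass, field

@dataclass
class Subject:
    risk: float       # model-predicted risk score; higher means higher predicted risk
    follow_up: float  # end of observation (censoring or study end)
    events: list[float] = field(default_factory=list)  # recurrent event times

def has_event(s: Subject, start: float, stop: float) -> bool:
    """True if the subject has at least one event in [start, stop)."""
    return any(start <= e < stop for e in s.events)

def windowed_c_index(subjects: list[Subject], t: float, h: float) -> float:
    """Empirical concordance over [t, t + h); ties in risk count as 1/2.

    Returns NaN when no usable (case, control) pair exists.
    """
    at_risk = [s for s in subjects if s.follow_up >= t + h]  # observed all window
    cases = [s for s in at_risk if has_event(s, t, t + h)]
    controls = [s for s in at_risk if not has_event(s, t, t + h)]
    score, pairs = 0.0, 0
    for i in cases:
        for j in controls:
            pairs += 1
            if i.risk > j.risk:
                score += 1.0
            elif i.risk == j.risk:
                score += 0.5
    return score / pairs if pairs else float("nan")

# Toy data (invented for illustration).
subjects = [
    Subject(0.9, 10.0, [1.0, 3.5]),
    Subject(0.7, 10.0, [3.2]),
    Subject(0.4, 10.0, [8.0]),
    Subject(0.2, 10.0, []),
    Subject(0.8, 2.0, [0.5]),  # leaves follow-up early
]
print(windowed_c_index(subjects, 3.0, 1.0))  # → 1.0
print(windowed_c_index(subjects, 7.0, 2.0))  # → 0.3333333333333333
```

With the same risk scores, the measure is perfect in [3, 4) but only 1/3 in [7, 9). A single overall C-index averages this kind of time-varying discrimination away.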


Pages : 1140-1155 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471962/pdf/

Citations: 0

Similarity-based multimodal regression.

IF 1.8 CAS Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxad033

Andrew A Chen, Sarah M Weinstein, Azeez Adebimpe, Ruben C Gur, Raquel E Gur, Kathleen R Merikangas, Theodore D Satterthwaite, Russell T Shinohara, Haochang Shou

To better understand complex human phenotypes, large-scale studies have increasingly collected multiple data modalities across domains such as imaging, mobile health, and physical activity. The properties of each data type often differ substantially and require either separate analyses or extensive processing to obtain comparable features for a combined analysis. Multimodal data fusion enables certain analyses on matrix-valued and vector-valued data, but it generally cannot integrate modalities of different dimensions and data structures. For a single data modality, multivariate distance matrix regression provides a distance-based framework for regression accommodating a wide range of data types. However, no distance-based method exists to handle multiple complementary types of data. We propose a novel distance-based regression model, which we refer to as Similarity-based Multimodal Regression (SiMMR), that enables simultaneous regression of multiple modalities through their distance profiles. We demonstrate through simulation, imaging studies, and longitudinal mobile health analyses that our proposed method can detect associations between clinical variables and multimodal data of differing properties and dimensionalities, even with modest sample sizes. We perform experiments to evaluate several different test statistics and provide recommendations for applying our method across a broad range of scenarios.
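
The abstract contrasts SiMMR with single-modality multivariate distance matrix regression (MDMR). The sketch below shows the standard MDMR ingredients for one modality: Gower centring of a distance matrix, the pseudo-F statistic, and a permutation p-value that permutes subjects jointly in the rows and columns of the Gower matrix. It then combines two modalities by summing trace-normalised Gower matrices. That combination rule is a hypothetical stand-in chosen for brevity, not SiMMR or any of the test statistics evaluated in the paper. The covariate `z` and both modalities are simulated.

```python
import numpy as np

def euclidean_distances(features):
    """Pairwise Euclidean distance matrix for an (n, k) feature array."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def gower_center(d):
    """G = C (-D^2 / 2) C, where C is the n x n centring matrix."""
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    return c @ (-0.5 * d ** 2) @ c

def pseudo_f(g, x):
    """MDMR pseudo-F for design matrix x (first column is the intercept)."""
    n, p = x.shape
    h = x @ np.linalg.pinv(x.T @ x) @ x.T  # hat matrix
    resid = np.eye(n) - h
    return (np.trace(h @ g @ h) / (p - 1)) / (np.trace(resid @ g @ resid) / (n - p))

def permutation_test(g, x, n_perm=499, seed=0):
    """Permutation p-value: relabel subjects jointly in rows and columns of G."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(g, x)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(g.shape[0])
        if pseudo_f(g[np.ix_(perm, perm)], x) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Simulated example: modality A depends on covariate z, modality B is pure noise.
rng = np.random.default_rng(42)
n = 60
z = rng.normal(size=n)
x = np.column_stack([np.ones(n), z])
modality_a = z[:, None] + rng.normal(size=(n, 5))
modality_b = rng.normal(size=(n, 20))
g_a = gower_center(euclidean_distances(modality_a))
g_b = gower_center(euclidean_distances(modality_b))
# Hypothetical combination: trace-normalise so neither modality dominates by scale.
g_combined = g_a / np.trace(g_a) + g_b / np.trace(g_b)

results = {name: permutation_test(g, x) for name, g in
           [("A", g_a), ("B", g_b), ("A+B", g_combined)]}
for name, (f_stat, p_value) in results.items():
    print(f"{name}: pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

Because each modality enters only through its n x n distance matrix, modalities with different dimensions (5 and 20 features here) can be combined without first extracting comparable features.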


Pages : 1122-1139 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471965/pdf/

Citations: 0

Estimation of optimal treatment regimes with electronic medical record data using the residual life value estimator.

IF 1.8 CAS Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae002

Grace Rhodes, Marie Davidian, Wenbin Lu

Clinicians and patients must make treatment decisions at a series of key decision points throughout disease progression. A dynamic treatment regime is a set of sequential decision rules that return treatment decisions based on accumulating patient information, like that commonly found in electronic medical record (EMR) data. When applied to a patient population, an optimal treatment regime leads to the most favorable outcome on average. Identifying optimal treatment regimes that maximize residual life is especially desirable for patients with life-threatening diseases such as sepsis, a complex medical condition that involves severe infections with organ dysfunction. We introduce the residual life value estimator (ReLiVE), an estimator for the expected value of cumulative restricted residual life under a fixed treatment regime. Building on ReLiVE, we present a method for estimating an optimal treatment regime that maximizes expected cumulative restricted residual life. Our proposed method, ReLiVE-Q, conducts estimation via the backward induction algorithm Q-learning. We illustrate the utility of ReLiVE-Q in simulation studies, and we apply ReLiVE-Q to estimate an optimal treatment regime for septic patients in the intensive care unit using EMR data from the Multiparameter Intelligent Monitoring Intensive Care database. Ultimately, we demonstrate that ReLiVE-Q leverages accumulating patient information to estimate personalized treatment regimes that optimize a clinically meaningful function of residual life.
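
According to the abstract, ReLiVE-Q estimates its regime with Q-learning, a backward-induction algorithm. The sketch below is plain two-stage Q-learning with linear working models on simulated, fully observed outcomes. It deliberately omits censoring and the restricted-residual-life value that ReLiVE adds, so it shows only the backward-induction step. The data-generating model, variable names, and the linear Q-function designs are assumptions for illustration.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients of y on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def stage2_design(x1, a1, x2, a2):
    # Stage-2 history (x1, a1, x2) plus treatment a2 and its interaction with x2.
    return np.column_stack([np.ones_like(x1), x1, a1, a1 * x1, x2, a2, a2 * x2])

def stage1_design(x1, a1):
    # Stage-1 state x1 plus treatment a1 and its interaction with x1.
    return np.column_stack([np.ones_like(x1), x1, a1, a1 * x1])

def two_stage_q_learning(x1, a1, x2, a2, y):
    """Backward induction: fit stage 2 first, then stage 1 on a pseudo-outcome."""
    beta2 = ols(stage2_design(x1, a1, x2, a2), y)
    q2_untreated = stage2_design(x1, a1, x2, np.zeros_like(a2)) @ beta2
    q2_treated = stage2_design(x1, a1, x2, np.ones_like(a2)) @ beta2
    pseudo = np.maximum(q2_untreated, q2_treated)  # value of acting optimally at stage 2
    beta1 = ols(stage1_design(x1, a1), pseudo)
    return beta1, beta2

# Hypothetical data with randomized binary treatments at both stages.
rng = np.random.default_rng(1)
n = 20_000
x1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n).astype(float)
x2 = 0.5 * x1 + rng.normal(size=n)
a2 = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + x2 + a2 * x2 + a1 * (0.5 - x1) + rng.normal(size=n)

beta1, beta2 = two_stage_q_learning(x1, a1, x2, a2, y)
print(np.round(beta2[5:], 2))  # simulated truth for the stage-2 contrast: 0 + 1 * x2
print(np.round(beta1[2:], 2))  # simulated truth for the stage-1 contrast: 0.5 - 1 * x1
```

Each estimated rule treats when its fitted treatment contrast is positive: `beta2[5] + beta2[6] * x2 > 0` at stage 2 and `beta1[2] + beta1[3] * x1 > 0` at stage 1. The stage-1 history must be included in the stage-2 model. Without `a1` and `a1 * x1` there, the pseudo-outcome would carry no information about the stage-1 treatment.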


Pages : 933-946 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471959/pdf/

Citations: 0