
Biostatistics: Latest Articles

Functional support vector machine.
IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-01, DOI: 10.1093/biostatistics/kxae007
Shanghong Xie, R Todd Ogden

Linear and generalized linear scalar-on-function models have been commonly used to understand the relationship between a scalar response variable (e.g. a continuous or binary outcome) and functional predictors. Such techniques are sensitive to model misspecification when the relationship between the response variable and the functional predictors is complex. Support vector machines (SVMs), on the other hand, are among the most robust prediction models, but they do not account for the high correlations between repeated measurements and cannot be used for irregularly sampled data. In this work, we propose a novel method that integrates functional principal component analysis with SVM techniques for classification and regression, accounting for both the continuous nature of functional data and the nonlinear relationship between the scalar response variable and the functional predictors. We demonstrate the performance of our method through extensive simulation experiments and two real data applications: the classification of alcoholics using electroencephalography signals and the prediction of glucobrassicin concentration using near-infrared reflectance spectroscopy. Our methods are especially advantageous when the measurement errors in the functional predictors are relatively large.
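The basic pipeline in this abstract first reduces each curve to functional principal component (FPC) scores and then passes those scores to an SVM. A minimal sketch is below. It is not the authors' implementation, and it makes several assumptions. Curves are densely observed on a common, equally spaced grid, whereas the paper also handles irregular data. A linear soft-margin SVM fitted by subgradient descent stands in for whatever kernel SVM classifier or regressor the method actually uses. The function names `fpca_scores` and `train_linear_svm` and the simulated data are hypothetical.

```python
import numpy as np


def fpca_scores(curves, grid, n_components):
    """Return FPC scores for curves observed on a common, equally spaced grid.

    curves has shape (n_samples, n_grid); integrals are approximated with a
    simple Riemann sum of step dt (an assumption of this sketch).
    """
    dt = grid[1] - grid[0]
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / (curves.shape[0] - 1)
    # Discretised covariance operator; its eigenvectors approximate eigenfunctions.
    eigvals, eigvecs = np.linalg.eigh(cov * dt)
    top = np.argsort(eigvals)[::-1][:n_components]
    phi = eigvecs[:, top] / np.sqrt(dt)  # rescale so each eigenfunction has unit L2 norm
    return centered @ phi * dt           # score = integral of (X_i - mean) * phi_k


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM (labels in {-1, +1}) fitted by full-batch
    subgradient descent on the L2-regularised hinge loss."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # points inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: the class label shifts the sin(2*pi*t) component,
# while a random cos(2*pi*t) component and pointwise measurement error add noise.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
n = 200
y = rng.choice([-1.0, 1.0], size=n)
curves = (y[:, None] * np.sin(2 * np.pi * grid)
          + rng.standard_normal((n, 1)) * np.cos(2 * np.pi * grid)
          + 0.5 * rng.standard_normal((n, grid.size)))

scores = fpca_scores(curves, grid, n_components=3)
w, b = train_linear_svm(scores, y)
accuracy = np.mean(np.sign(scores @ w + b) == y)
```

The SVM sees only a few smooth scores rather than 101 highly correlated grid values. Each score is an integral over the whole curve, so the pointwise measurement error is largely averaged out before classification.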

Biostatistics, pp. 1178-1194.
Citations: 0
Correction to: Exponential family measurement error models for single-cell CRISPR screens.
IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-01, DOI: 10.1093/biostatistics/kxae022
Biostatistics, p. 1273.
Citations: 0
Bayesian semiparametric model for sequential treatment decisions with informative timing.
IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-01, DOI: 10.1093/biostatistics/kxad035
Arman Oganisian, Kelly D Getz, Todd A Alonzo, Richard Aplenc, Jason A Roy

We develop a Bayesian semiparametric model for the impact of dynamic treatment rules on survival among patients diagnosed with pediatric acute myeloid leukemia (AML). The data consist of a subset of patients enrolled in a phase III clinical trial in which patients move through a sequence of four treatment courses. At each course, they undergo treatment that may or may not include anthracyclines (ACT). While ACT is known to be effective at treating AML, it is also cardiotoxic and can lead to early death for some patients. Our task is to estimate the potential survival probability under hypothetical dynamic ACT treatment strategies, but there are several impediments. First, since ACT is not randomized, its effect on survival is confounded over time. Second, subjects initiate the next course depending on when they recover from the previous course, so the timing of each course is potentially informative about subsequent treatment and survival. Third, patients may die or drop out before completing the full treatment sequence. We develop a generative Bayesian semiparametric model based on gamma process priors to address these complexities. At each treatment course, the model captures subjects' transitions to subsequent treatment or death in continuous time. G-computation is used to compute a posterior distribution over the potential survival probability that is adjusted for time-varying confounding. Using our approach, we estimate the efficacy of hypothetical treatment rules that dynamically modify ACT based on evolving cardiac function.
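The abstract names G-computation as the estimation step. This technique estimates survival under a treatment rule by simulating patients forward from fitted models, with treatment responding to the simulated time-varying confounder at each step. The sketch below works in a simplified discrete setting that is an assumption of this illustration:

- there are four courses and a single cardiac-function covariate;
- hand-picked logistic models stand in for the paper's continuous-time gamma-process model and its posterior draws.

All functions, coefficients, and the 0.5 threshold are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical "fitted" models. In a Bayesian analysis these would be replaced
# by one posterior draw of the outcome and covariate models per simulation run.
def prob_survive_course(cardiac, act):
    """P(survive one course | cardiac function, ACT given). ACT is assumed to
    help overall but to be harmful when cardiac function is already poor."""
    harm = 1.5 if (act and cardiac < 0.5) else 0.0
    return expit(2.0 + 0.8 * act - harm)


def next_cardiac(cardiac, act, rng):
    """Cardiac function after a course; ACT is assumed to lower it by about 0.15."""
    return min(1.0, max(0.0, cardiac - 0.15 * act + rng.gauss(0.0, 0.05)))


def g_computation(rule, n_courses=4, n_sim=20000, seed=1):
    """Monte Carlo G-computation: simulate patients forward under `rule`, letting
    the time-varying confounder (cardiac function) evolve and feed back into
    treatment, and return the estimated probability of surviving all courses."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n_sim):
        cardiac = rng.uniform(0.4, 1.0)  # assumed baseline covariate distribution
        alive = True
        for _ in range(n_courses):
            act = rule(cardiac)
            if rng.random() >= prob_survive_course(cardiac, act):
                alive = False
                break
            cardiac = next_cardiac(cardiac, act, rng)
        survivors += alive
    return survivors / n_sim


def always_act(cardiac):
    """Static rule: give ACT at every course."""
    return 1


def act_if_heart_ok(cardiac):
    """Dynamic rule: withhold ACT once cardiac function drops below 0.5."""
    return 1 if cardiac >= 0.5 else 0


p_always = g_computation(always_act)
p_dynamic = g_computation(act_if_heart_ok)
```

In a Bayesian analysis, you would repeat this simulation once per posterior draw of the models. That yields a posterior distribution for each rule's survival probability rather than a single number.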

Biostatistics, pp. 947-961. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471958/pdf/
Citations: 0
Mendelian randomization analysis using multiple biomarkers of an underlying common exposure.
IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-01, DOI: 10.1093/biostatistics/kxae006
Jin Jin, Guanghao Qi, Zhi Yu, Nilanjan Chatterjee

Mendelian randomization (MR) analysis is increasingly popular for testing the causal effect of exposures on disease outcomes using data from genome-wide association studies (GWAS). In some settings, the underlying exposure, such as systemic inflammation, may not be directly observable, but measurements may be available on multiple biomarkers or other traits that are co-regulated by the exposure. We propose a method for MR analysis of latent exposures (MRLE), which tests the significance and direction of the effect of a latent exposure by leveraging information from multiple related traits. The method constructs a set of estimating functions based on the second-order moments of GWAS summary association statistics for the observable traits. These estimating functions are derived under a structural equation model in which genetic variants are assumed to affect the traits indirectly through the latent exposure, and potentially also directly. Simulation studies show that MRLE has well-controlled type I error rates and greater power than single-trait MR tests under various types of pleiotropy. We applied MRLE to genetic association statistics for five inflammatory biomarkers (CRP, IL-6, IL-8, TNF-α, and MCP-1). These analyses provide evidence for potential causal effects of inflammation in increasing the risk of coronary artery disease, colorectal cancer, and rheumatoid arthritis, whereas standard MR analyses of the individual biomarkers fail to detect consistent evidence for such effects.
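A stripped-down moment calculation can illustrate the idea of building estimating functions from second-order moments. The model below is assumed for this sketch only:

- each SNP j has an effect alpha_j on the latent exposure, with variance fixed at 1;
- its association with trait k is b_kj = lambda_k * alpha_j + e_kj;
- its association with the outcome is b_Yj = theta * alpha_j + e_Yj;
- all error terms are independent with mean zero.

Under these assumptions, E[b_k b_l] = lambda_k * lambda_l for k != l, and E[b_k b_Y] = lambda_k * theta. With three traits and the sign convention lambda_1 > 0, this gives lambda_1^2 = m12 * m13 / m23 and theta = m1Y / lambda_1. The sketch ignores several things that MRLE addresses: sample overlap, sampling error in the summary statistics, direct effects on the outcome, and the formal test itself. The function names and parameter values are hypothetical.

```python
import math
import random


def simulate_summary_stats(lambdas, theta, n_snps, noise_sd, seed=0):
    """Hypothetical SNP-level associations under the simplified latent-exposure model."""
    rng = random.Random(seed)
    traits = [[] for _ in lambdas]
    outcome = []
    for _ in range(n_snps):
        alpha = rng.gauss(0.0, 1.0)  # SNP effect on the latent exposure (variance fixed at 1)
        for k, lam in enumerate(lambdas):
            # indirect effect through the exposure plus an independent direct/noise term
            traits[k].append(lam * alpha + rng.gauss(0.0, noise_sd))
        outcome.append(theta * alpha + rng.gauss(0.0, noise_sd))
    return traits, outcome


def mean_product(x, y):
    """Empirical second moment E[x * y] across SNPs."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def moment_estimate_theta(traits, outcome):
    """Method-of-moments estimate of theta from three traits, fixing lambda_1 > 0."""
    m12 = mean_product(traits[0], traits[1])
    m13 = mean_product(traits[0], traits[2])
    m23 = mean_product(traits[1], traits[2])
    lambda_1 = math.sqrt(m12 * m13 / m23)
    return mean_product(traits[0], outcome) / lambda_1


traits, outcome = simulate_summary_stats([0.8, 0.5, 0.6], theta=0.4, n_snps=50000, noise_sd=0.3)
theta_hat = moment_estimate_theta(traits, outcome)
```

With these settings, theta_hat should land close to the true value of 0.4. The sign of theta is defined only relative to the convention lambda_1 > 0, which is why direction, not just magnitude, depends on how the latent exposure is oriented.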

Biostatistics, pp. 1015-1033.
Citations: 0
A Bayesian approach for investigating the pharmacogenetics of combination antiretroviral therapy in people with HIV.
IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-01, DOI: 10.1093/biostatistics/kxae001
Wei Jin, Yang Ni, Amanda B Spence, Leah H Rubin, Yanxun Xu

Combination antiretroviral therapy (ART) with at least three different drugs has become the standard of care for people with HIV (PWH) owing to its exceptional effectiveness in viral suppression. However, many ART drugs have been reported to be associated with neuropsychiatric adverse effects, including depression, especially in the presence of certain genetic polymorphisms. Pharmacogenetics is an important consideration in administering combination ART, as it may influence drug efficacy and increase the risk of neuropsychiatric conditions. Large-scale longitudinal HIV databases give researchers the opportunity to investigate the pharmacogenetics of combination ART in a data-driven manner. However, there are more than 30 FDA-approved ART drugs, so the interplay between the large number of possible drug combinations and genetic polymorphisms poses statistical modeling challenges. We develop a Bayesian approach to examine the longitudinal effects of combination ART, and their interactions with genetic polymorphisms, on depressive symptoms in PWH. The proposed method has two components. A Gaussian process with a composite kernel function captures the longitudinal effects of combination ART by directly incorporating individuals' treatment histories, and a Bayesian classification and regression tree accounts for individual heterogeneity. We evaluate the approach through simulation studies and an application to a dataset from the Women's Interagency HIV Study. These analyses demonstrate its clinical utility in investigating the pharmacogenetics of combination ART and in helping physicians make effective individualized treatment decisions that can improve health outcomes for PWH.
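A Gaussian process over treatment histories can be pictured using a composite kernel. This kernel multiplies a similarity between visit times by a similarity between ART regimens. As a result, visits that are close in time and use overlapping drug combinations receive correlated effects. The kernel below is an assumption chosen for illustration and not necessarily the kernel used in the paper: it combines a squared-exponential term in time with the Jaccard overlap of drug sets. The regimens and depression scores are made up, and the sketch omits the Bayesian tree component for heterogeneity.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard overlap of two drug sets; a valid (positive semidefinite) set kernel."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def composite_kernel(x1, x2, length_scale=1.0):
    """Hypothetical kernel on (visit time, regimen) pairs: squared-exponential
    similarity in time multiplied by Jaccard similarity of the drug sets."""
    (t1, regimen1), (t2, regimen2) = x1, x2
    time_part = np.exp(-((t1 - t2) ** 2) / (2.0 * length_scale ** 2))
    return time_part * jaccard(regimen1, regimen2)


def gram(xs, ys):
    """Kernel matrix between two lists of (time, regimen) inputs."""
    return np.array([[composite_kernel(a, b) for b in ys] for a in xs])


def gp_posterior_mean(train_x, train_y, test_x, noise_var=1e-4):
    """Zero-mean GP regression posterior mean: K_* (K + noise_var * I)^{-1} y."""
    K = gram(train_x, train_x)
    y = np.asarray(train_y, dtype=float)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(train_x)), y)
    return gram(test_x, train_x) @ alpha


# Made-up visit history: (years since enrolment, drugs in the ART regimen).
visits = [
    (0.0, frozenset({"TDF", "FTC", "EFV"})),
    (1.0, frozenset({"TDF", "FTC", "EFV"})),
    (2.0, frozenset({"TDF", "FTC", "DTG"})),
    (3.0, frozenset({"ABC", "3TC", "DTG"})),
]
depression_scores = [10.0, 12.0, 9.0, 15.0]  # hypothetical symptom scores
fitted = gp_posterior_mean(visits, depression_scores, visits)
```

A product of two valid kernels is itself a valid kernel. This is what makes such compositions convenient: you can add components for dose or drug class in the same way without breaking positive semidefiniteness.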

Biostatistics, pp. 1034-1048.
Citations: 0
Tree-informed Bayesian multi-source domain adaptation: cross-population probabilistic cause-of-death assignment using verbal autopsy.
IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-01, DOI: 10.1093/biostatistics/kxae005
Zhenke Wu, Zehang R Li, Irena Chen, Mengbing Li

Determining the causes of deaths (CODs) that occur outside civil registration and vital statistics systems is challenging. In practice, a technique called verbal autopsy (VA) is widely used to gather information on such deaths. A VA consists of interviewing relatives of a deceased person about the deceased's symptoms in the period leading up to death, often resulting in multivariate binary responses. Statistical methods have been devised for estimating the cause-specific mortality fractions (CSMFs) of a study population. However, the continued expansion of VA to new populations (or "domains") requires approaches that recognize between-domain differences while capitalizing on potential similarities. In this article, we propose such a domain-adaptive method, which integrates external between-domain similarity information encoded by a prespecified rooted weighted tree. Given a cause, we use latent class models to characterize the conditional distributions of the responses, which may vary by domain. For the class mixing weights, we specify a logistic stick-breaking Gaussian diffusion process prior along the tree, with node-specific spike-and-slab priors, to pool information between domains in a data-driven way. Posterior inference is conducted via a scalable variational Bayes algorithm. Simulation studies show that the domain adaptation enabled by the proposed method improves CSMF estimation and individual COD assignment. We also illustrate and evaluate the method using a validation dataset. The article concludes with a discussion of limitations and future directions.
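The prior on class mixing weights combines two pieces named in the abstract:

- logistic stick-breaking, which turns unconstrained real numbers into weights that sum to one;
- diffusion of those numbers along a rooted weighted tree of domains, with a spike-and-slab choice at each node about whether it differs from its parent.

The prior draw below is a simplified sketch under assumed forms: Gaussian increments with variance proportional to branch length, and a single fixed slab probability. The tree, names, and settings are hypothetical, and no posterior inference is shown.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def stick_breaking_weights(v):
    """Logistic stick-breaking: map K-1 real numbers to K mixing weights.
    Class k takes a fraction expit(v_k) of whatever stick is left."""
    weights, remaining = [], 1.0
    for value in v:
        piece = expit(value) * remaining
        weights.append(piece)
        remaining -= piece
    weights.append(remaining)  # last class takes the rest of the stick
    return weights


def diffuse_along_tree(tree, root, n_classes, rng, slab_prob=0.5, scale=1.0):
    """Draw stick-breaking parameters for every node of a rooted weighted tree.

    tree maps a node to a list of (child, branch_length). Each child either
    copies its parent's parameters ("spike") or adds Gaussian noise with
    variance scale**2 * branch_length ("slab").
    """
    params = {root: [rng.gauss(0.0, 1.0) for _ in range(n_classes - 1)]}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, length in tree.get(node, []):
            if rng.random() < slab_prob:
                sd = scale * math.sqrt(length)
                params[child] = [v + rng.gauss(0.0, sd) for v in params[node]]
            else:
                params[child] = list(params[node])
            stack.append(child)
    return params


# Hypothetical domain tree: two regions, one of which contains two study sites.
tree = {
    "root": [("region_A", 1.0), ("region_B", 1.0)],
    "region_A": [("site_1", 0.5), ("site_2", 0.5)],
}
params = diffuse_along_tree(tree, "root", n_classes=4, rng=random.Random(0))
weights = {node: stick_breaking_weights(v) for node, v in params.items()}
```

Domains joined by short branches, or by spike nodes, end up with similar latent-class mixing weights. This is how the tree lets sparsely sampled new domains borrow strength from related ones.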

Biostatistics, pp. 1233-1253. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471964/pdf/
Citations: 0
Fast matrix completion in epigenetic methylation studies with informative covariates.
IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-01, DOI: 10.1093/biostatistics/kxae016
Mélina Ribaud, Aurélie Labbe, Khaled Fouda, Karim Oualkacha

DNA methylation is an important epigenetic mark that modulates gene expression by inhibiting the binding of transcriptional proteins to DNA. As in many other omics experiments, missing values are an important issue. Appropriate imputation techniques are needed both to avoid unnecessary reductions in sample size and to make optimal use of the information collected. We consider the case where relatively few samples are processed with an expensive high-density whole genome bisulfite sequencing (WGBS) strategy, while a larger number of samples are processed with more affordable low-density, array-based technologies. In such cases, one can impute the low-coverage (array-based) methylation data using the high-density information provided by the WGBS samples. In this paper, we propose an efficient Linear Model of Coregionalisation with informative Covariates (LMCC) to predict missing values based on observed values and covariates. Our model assumes that at each site, the methylation vector of all samples is linked to a set of fixed factors (covariates) and a set of latent factors. Furthermore, we exploit the functional nature of the data and the spatial correlation across sites by placing Gaussian processes on the fixed and latent coefficient vectors, respectively. Our simulations show that the use of covariates can significantly improve the accuracy of imputed values, especially when the missing data contain relevant information about the explanatory variable. We also show that the proposed model is particularly efficient when the number of columns is much greater than the number of rows, which is usually the case in methylation data analysis. Finally, we apply our proposed method to two real methylation datasets and compare it with alternative approaches. The results show how covariates such as cell type, tissue type, or age can enhance the accuracy of imputed values.
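The general pattern behind this approach is to explain the sample-by-site methylation matrix with known sample covariates plus a few latent factors, then predict missing entries from the fit. This pattern can be sketched with alternating least squares. Note that this is a swapped-in technique, not LMCC: small ridge penalties stand in for the Gaussian-process priors over sites. The covariate (a binary cell-type indicator), the dimensions, and the missingness rate are hypothetical. Rows are samples and columns are sites, matching the many-more-columns-than-rows setting described in the abstract.

```python
import numpy as np


def ridge_solve(A, b, lam):
    """Solve min ||A x - b||^2 + lam * ||x||^2."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)


def impute_with_covariates(Y, observed, C, rank=2, lam=0.1, n_iter=50, seed=0):
    """Fit Y ~ C @ B.T + U @ V.T on the observed entries by alternating ridge
    regressions, then fill the unobserved entries with the fitted values.

    Y: (n_samples, n_sites), may hold NaN where `observed` is False.
    C: (n_samples, n_covariates) known sample covariates.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    q = C.shape[1]
    U = 0.1 * rng.standard_normal((n, rank))
    G = np.zeros((p, q + rank))  # row j holds [B_j, V_j] for site j
    for _ in range(n_iter):
        F = np.hstack([C, U])
        for j in range(p):  # site step: regress each column on covariates + latent factors
            rows = observed[:, j]
            G[j] = ridge_solve(F[rows], Y[rows, j], lam)
        B, V = G[:, :q], G[:, q:]
        for i in range(n):  # sample step: update latent factors given site loadings
            cols = observed[i]
            resid = Y[i, cols] - B[cols] @ C[i]
            U[i] = ridge_solve(V[cols], resid, lam)
    fitted = np.hstack([C, U]) @ G.T
    return np.where(observed, Y, fitted)


rng = np.random.default_rng(1)
n, p = 40, 200
cell_type = rng.integers(0, 2, size=n)  # hypothetical binary covariate
C = np.column_stack([np.ones(n), cell_type])
Y_full = (C @ rng.standard_normal((p, 2)).T
          + rng.standard_normal((n, 2)) @ rng.standard_normal((p, 2)).T
          + 0.1 * rng.standard_normal((n, p)))
observed = rng.random((n, p)) > 0.2  # about 20% of entries missing
Y_obs = np.where(observed, Y_full, np.nan)

imputed = impute_with_covariates(Y_obs, observed, C)
missing = ~observed
rmse_model = np.sqrt(np.mean((imputed[missing] - Y_full[missing]) ** 2))
column_means = np.broadcast_to(np.nanmean(Y_obs, axis=0), Y_obs.shape)
rmse_mean = np.sqrt(np.mean((column_means[missing] - Y_full[missing]) ** 2))
```

Comparing `rmse_model` with the column-mean baseline `rmse_mean` on the held-out entries shows how much the covariates and shared latent structure add over naive imputation.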

Biostatistics, pp. 1062-1078. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471954/pdf/
Citations: 0
Bayesian mixed model inference for genetic association under related samples with brain network phenotype.
IF 1.8, Zone 3 (Mathematics), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-10-01, DOI: 10.1093/biostatistics/kxae008
Xinyuan Tian, Yiting Wang, Selena Wang, Yi Zhao, Yize Zhao

Genetic association studies of brain connectivity phenotypes have gained prominence owing to advances in noninvasive imaging techniques and quantitative genetics. Brain connectivity traits, characterized by network configurations and unique biological structures, present distinct challenges compared with other quantitative phenotypes. Furthermore, the presence of sample relatedness in most imaging genetics studies limits the feasibility of adopting existing network-response models. In this article, we fill this gap by proposing a Bayesian network-response mixed-effect model that considers a network-variate phenotype and incorporates population structures, including pedigrees and unknown sample relatedness. The model must accommodate the inherent topological architecture associated with genetic contributions to the phenotype. To do so, we model the effect components via a set of effect network configurations and impose inter-network sparsity and intra-network shrinkage to dissect the phenotypic network configurations affected by the risk genetic variant. A Markov chain Monte Carlo (MCMC) algorithm is further developed to facilitate uncertainty quantification. We evaluate the performance of our model through extensive simulations. We then apply the method to study the genetic basis of brain structural connectivity using data from the Human Connectome Project, which has extensive family structure, and obtain plausible and interpretable results. Beyond brain connectivity genetic studies, our proposed model also provides a general linear mixed-effect regression framework for network-variate outcomes.
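The relatedness component of the model can be illustrated with the classical linear mixed model it generalizes. For each edge of the connectivity network, the phenotype is modeled as an intercept plus a SNP effect, plus a polygenic random effect whose covariance is proportional to a relatedness matrix, plus noise. The effect is estimated by generalized least squares (GLS).

The sketch below is a frequentist, edge-by-edge stand-in with several simplifying assumptions:

- all subjects are sibling pairs with genetic relatedness 0.5;
- the variance components are known;
- genotypes are drawn independently for simplicity.

It does not implement the paper's Bayesian network-structured sparsity or its MCMC sampler. All names and values are hypothetical.

```python
import numpy as np


def sibling_pair_covariance(n_families, sigma_g2, sigma_e2, relatedness=0.5):
    """V = sigma_g2 * K + sigma_e2 * I for a cohort of sibling pairs, where the
    relatedness matrix K is block diagonal with off-diagonal entries `relatedness`."""
    block = sigma_g2 * np.array([[1.0, relatedness], [relatedness, 1.0]]) + sigma_e2 * np.eye(2)
    return np.kron(np.eye(n_families), block)


def gls_edge_effects(Y, snp, V):
    """Edge-by-edge GLS estimate of one SNP's effect on every network edge.

    Y has one column per edge (vectorised upper triangle of each subject's
    connectivity matrix); V is the assumed-known between-subject covariance.
    """
    X = np.column_stack([np.ones(len(snp)), snp])
    Vinv_X = np.linalg.solve(V, X)
    Vinv_Y = np.linalg.solve(V, Y)
    coef = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)  # shape (2, n_edges)
    return coef[1]


rng = np.random.default_rng(2)
n_families, n_nodes = 600, 5
n_edges = n_nodes * (n_nodes - 1) // 2  # 10 edges in a 5-node network
snp = rng.binomial(2, 0.3, size=2 * n_families).astype(float)
beta = np.zeros(n_edges)
beta[[0, 3]] = 0.5  # hypothetical: the variant affects only two edges
V = sibling_pair_covariance(n_families, sigma_g2=0.5, sigma_e2=0.5)
errors = np.linalg.cholesky(V) @ rng.standard_normal((2 * n_families, n_edges))
Y = np.outer(snp, beta) + errors
beta_hat = gls_edge_effects(Y, snp, V)
```

Treating edges independently like this ignores the network topology. The paper's inter-network sparsity and intra-network shrinkage priors are designed to exploit that topology instead.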

Biostatistics, 2024-10-01, pp. 1195-1209. DOI: 10.1093/biostatistics/kxae008
Citations: 0
Dynamic models augmented by hierarchical data: an application of estimating HIV epidemics at sub-national level.
IF 1.8 | CAS Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-10-01 | DOI: 10.1093/biostatistics/kxae003
Bao Le, Xiaoyue Niu, Tim Brown, Jeffrey W Imai-Eaton

Dynamic models have been successfully used in producing estimates of HIV epidemics at the national level due to their epidemiological nature and their ability to estimate prevalence, incidence, and mortality rates simultaneously. Recently, HIV interventions and policies have required more information at sub-national levels to support local planning, decision-making and resource allocation. Unfortunately, many areas lack sufficient data for deriving stable and reliable results, and this is a critical technical barrier to more stratified estimates. One solution is to borrow information from other areas within the same country. However, directly assuming hierarchical structures within the HIV dynamic models is complicated and computationally time-consuming. In this article, we propose a simple and innovative way to incorporate hierarchical information into the dynamical systems by using auxiliary data. The proposed method efficiently uses information from multiple areas within each country without increasing the computational burden. As a result, the new model improves predictive ability and uncertainty assessment.
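The abstract's idea of borrowing information from other areas within the same country can be illustrated with the textbook normal-normal partial-pooling estimator. This is a hedged, generic sketch and not the authors' auxiliary-data approach for dynamic models: it assumes each area has a direct estimate y[a] with known sampling variance v[a] and that the between-area variance tau2 is known. The function name `partial_pool` and all numbers are hypothetical.

```python
from typing import List


def partial_pool(y: List[float], v: List[float], tau2: float) -> List[float]:
    """Normal-normal partial pooling of area-level estimates.

    y: direct estimates per area; v: their sampling variances;
    tau2: between-area variance (assumed known here).
    Each estimate is pulled toward the precision-weighted pooled mean,
    and more strongly when its own sampling variance v[a] is large.
    """
    w = [1.0 / (va + tau2) for va in v]
    mu = sum(wa * ya for wa, ya in zip(w, y)) / sum(w)
    return [(tau2 * ya + va * mu) / (tau2 + va) for ya, va in zip(y, v)]


# Hypothetical prevalence estimates: a data-rich area and a data-poor area.
direct = [0.10, 0.30]
var = [0.001, 0.1]
smoothed = partial_pool(direct, var, tau2=0.01)
```

The data-poor area (variance 0.1) is pulled much further toward the pooled mean than the data-rich one, which is the intuition behind stabilizing sparse sub-national estimates.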

Biostatistics, 2024-10-01, pp. 1049-1061. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471966/pdf/
Citations: 0
Neuroimaging meta regression for coordinate based meta analysis data with a spatial model.
IF 1.8 | CAS Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-10-01 | DOI: 10.1093/biostatistics/kxae024
Yifan Yu, Rosario Pintos Lobo, Michael Cody Riedel, Katherine Bottenhorn, Angela R Laird, Thomas E Nichols

Coordinate-based meta-analysis combines evidence from a collection of neuroimaging studies to estimate brain activation. In such analyses, a key practical challenge is to find a computationally efficient approach with good statistical interpretability to model the locations of activation foci. In this article, we propose a generative coordinate-based meta-regression (CBMR) framework to approximate a smooth activation intensity function and investigate the effect of study-level covariates (e.g. year of publication, sample size). We employ a spline parameterization to model the spatial structure of brain activation and consider four stochastic models for modeling the random variation in foci. To examine the validity of CBMR, we estimate brain activation on 20 meta-analytic datasets, conduct spatial homogeneity tests at the voxel level, and compare the results to those generated by existing kernel-based and model-based approaches.
Keywords: generalized linear models; meta-analysis; spatial statistics; statistical modeling.
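To make the idea of a spline-parameterized activation intensity concrete, here is a minimal sketch under strong simplifying assumptions. It is not the CBMR implementation. The sketch assumes a 1-D grid of voxels, a linear B-spline (hat) basis, a Poisson model for foci counts pooled over studies, and no study-level covariates. Under those assumptions it fits the spline coefficients by Newton-Raphson with step halving. The names `hat_basis`, `fit_poisson_spline` and the toy data are hypothetical.

```python
import math
from typing import List


def hat_basis(n_vox: int, knots: List[float]) -> List[List[float]]:
    """Linear B-spline (hat) basis at voxel positions 0..n_vox-1."""
    B = []
    for x in range(n_vox):
        row = []
        for k, c in enumerate(knots):
            left = knots[k - 1] if k > 0 else c
            right = knots[k + 1] if k + 1 < len(knots) else c
            if x == c:
                val = 1.0
            elif left < x < c:
                val = (x - left) / (c - left)
            elif c < x < right:
                val = (right - x) / (right - c)
            else:
                val = 0.0
            row.append(val)
        B.append(row)
    return B


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_poisson_spline(counts, n_studies, B, iters=50):
    """Newton-Raphson for counts[j] ~ Poisson(n_studies * exp(B[j] . beta))."""
    p = len(B[0])
    beta = [0.0] * p

    def loglik(bt):  # Poisson log-likelihood up to an additive constant
        ll = 0.0
        for yj, row in zip(counts, B):
            eta = sum(r * b for r, b in zip(row, bt))
            ll += yj * eta - n_studies * math.exp(eta)
        return ll

    for _ in range(iters):
        mu = [n_studies * math.exp(sum(r * b for r, b in zip(row, beta))) for row in B]
        grad = [sum(B[j][k] * (counts[j] - mu[j]) for j in range(len(B))) for k in range(p)]
        info = [[sum(B[j][k] * B[j][l] * mu[j] for j in range(len(B))) for l in range(p)]
                for k in range(p)]
        step = solve(info, grad)
        t = 1.0
        while True:  # halve the step until the log-likelihood does not decrease
            cand = [b + t * s for b, s in zip(beta, step)]
            if loglik(cand) >= loglik(beta) or t < 1e-8:
                break
            t /= 2
        beta = cand
        if max(abs(t * s) for s in step) < 1e-10:
            break
    return beta


B = hat_basis(9, [0.0, 4.0, 8.0])
true_beta = [-1.0, 0.5, -0.5]
# Noise-free "counts" equal to the model mean, so the MLE equals true_beta.
counts = [20 * math.exp(sum(r * b for r, b in zip(row, true_beta))) for row in B]
beta_hat = fit_poisson_spline(counts, 20, B)
```

The hat basis keeps the example dependency-free; a real analysis would more likely use cubic B-splines in three dimensions and add study-level covariates as a log-linear multiplicative factor.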

Biostatistics, 2024-10-01, pp. 1210-1232. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471956/pdf/
Citations: 0