Addressing dispersion in mis-measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data.

Statistics in Medicine (IF 1.8; Zone 4, Medicine; JCR Q3, Mathematical & Computational Biology). Pub Date: 2024-09-10. Epub Date: 2024-06-26. DOI: 10.1002/sim.10149
Kaiqiong Zhao, Karim Oualkacha, Yixiao Zeng, Cathy Shen, Kathleen Klein, Lajmi Lakhal-Chaieb, Aurélie Labbe, Tomi Pastinen, Marie Hudson, Inés Colmegna, Sasha Bernatsky, Celia M T Greenwood

Abstract

Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS."
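
The two kinds of dispersion named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or the SOMNiBUS implementation. It assumes a single CpG site, an intercept-only fit, a per-sample Gaussian random effect on the logit scale standing in for the additive component, and hypothetical read error rates `p0` and `p1`. The multiplicative scale is estimated with the standard Pearson moment estimator for quasi-binomial models, which may differ from the plug-in estimate used in the paper. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_counts(n_samples, depth, eta, sigma_u, p0, p1, rng):
    """Simulate mis-measured methylated read counts at one site.

    eta     : logit of the true methylation probability (hypothetical value)
    sigma_u : SD of a per-sample random effect on the logit scale,
              used here as a stand-in for additive dispersion
    p0, p1  : assumed probabilities that an unmethylated / methylated
              read is observed as methylated (measurement error)
    """
    counts = []
    for _ in range(n_samples):
        u = rng.gauss(0.0, sigma_u)
        pi_true = expit(eta + u)
        # Probability that a single read is observed as methylated.
        pi_obs = p0 * (1.0 - pi_true) + p1 * pi_true
        y = sum(1 for _ in range(depth) if rng.random() < pi_obs)
        counts.append(y)
    return counts


def pearson_dispersion(counts, depth, n_params=1):
    """Pearson moment estimate of a multiplicative dispersion parameter.

    Uses an intercept-only fit: one pooled proportion for all samples.
    Values near 1 are consistent with plain binomial variation.
    """
    p_hat = sum(counts) / (depth * len(counts))
    chi2 = sum(
        (y - depth * p_hat) ** 2 / (depth * p_hat * (1.0 - p_hat))
        for y in counts
    )
    return chi2 / (len(counts) - n_params)


rng = random.Random(2024)
depth = 30

# No random effect: counts are binomial given the error rates.
plain = simulate_counts(500, depth, 0.0, 0.0, 0.003, 0.9, rng)
# Random effect with SD 1 on the logit scale: extra-binomial variation.
overdispersed = simulate_counts(500, depth, 0.0, 1.0, 0.003, 0.9, rng)

phi_plain = pearson_dispersion(plain, depth)
phi_over = pearson_dispersion(overdispersed, depth)
print(f"dispersion without random effect: {phi_plain:.2f}")
print(f"dispersion with random effect:    {phi_over:.2f}")
```

In this toy setting the estimate stays close to 1 when the counts are purely binomial and rises well above 1 once the random effect is switched on, which is the extra variation that a quasi-binomial scale parameter is meant to absorb.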
