Interaction models matter: an efficient, flexible computational framework for model-specific investigation of epistasis.

IF 4 · Zone 3, Biology · JCR Q1, MATHEMATICAL & COMPUTATIONAL BIOLOGY · Biodata Mining · Pub Date: 2024-02-28 · DOI: 10.1186/s13040-024-00358-0
Sandra Batista, Vered Senderovich Madar, Philip J Freda, Priyanka Bhandary, Attri Ghosh, Nicholas Matsumoto, Apurva S Chitre, Abraham A Palmer, Jason H Moore
Biodata Mining, vol. 17(1), p. 7. Published 2024-02-28. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10900690/pdf/
Citations: 0

Abstract

Purpose: Epistasis, the interaction between two or more genes, is integral to the study of genetics and is present throughout nature. Yet it is seldom fully explored: most approaches focus primarily on single-locus effects, partly because analyzing all pairwise and higher-order interactions requires significant computational resources. Furthermore, existing methods for epistasis detection consider only a Cartesian (multiplicative) model for interaction terms. This is likely limiting, as epistatic interactions can evolve to produce varied relationships between genetic loci, some of them complex and not linearly separable.
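
To illustrate why a Cartesian-only interaction term can be limiting, the following is a minimal sketch, not the authors' algorithm. It assumes genotypes coded 0/1/2 and an XOR term defined as the parity of the genotype sum; that encoding, the toy phenotype, and the helper `fit` are assumptions made for illustration only. On a balanced toy design with a phenotype driven purely by the XOR term, a regression with a Cartesian (product) interaction finds no signal at all, while the XOR-coded regression recovers it exactly.

```python
import numpy as np

# All nine genotype combinations for two loci coded 0/1/2 (balanced toy design).
x1, x2 = (g.ravel() for g in np.meshgrid(np.arange(3), np.arange(3), indexing="ij"))

# Assumed XOR encoding: 1 when exactly one of the two genotypes is odd
# (i.e. exactly one locus is heterozygous), 0 otherwise.
xor_term = (x1 + x2) % 2
# Cartesian (multiplicative) encoding: product of the genotype codes.
cartesian_term = x1 * x2

# Toy phenotype driven purely by the XOR term, with no noise (an assumption).
y = xor_term.astype(float)


def fit(interaction):
    """Least-squares fit of y ~ 1 + x1 + x2 + interaction; returns coefficients."""
    X = np.column_stack([np.ones_like(y), x1, x2, interaction])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


beta_cartesian = fit(cartesian_term)  # [intercept, x1, x2, x1*x2]
beta_xor = fit(xor_term)              # [intercept, x1, x2, xor]

print("Cartesian model coefficients:", np.round(beta_cartesian, 6))
print("XOR model coefficients:      ", np.round(beta_xor, 6))
```

In this toy design the main-effect coefficients are also zero under both codings, so a screen that only follows up loci with main effects would never examine this pair.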

Methods: We present new algorithms for computing the interaction coefficients of standard regression models for epistasis. The algorithms permit many varied models for the interaction terms between loci and use memory efficiently. They are given for two-way and three-way epistasis and may be generalized to higher-order epistasis. Statistical tests for the interaction coefficients are also provided. We also present an efficient matrix-based algorithm for permutation testing for two-way epistasis. We offer a proof and experimental evidence that methods that look for epistasis only at loci with main effects may not be justified. Given the computational efficiency of the algorithm, we applied the method to a rat data set and a mouse data set, each with at least 10,000 loci and 1,000 samples, using the standard Cartesian model and the XOR model to explore body mass index.
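
As a rough picture of what permutation testing of a two-way interaction coefficient involves, here is a naive sketch. It is not the efficient matrix-based algorithm described above: it simply refits the regression for each permuted phenotype. The simulated data, the parity-based XOR encoding, the effect size, and the function names `interaction_coef` and `permutation_p_value` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated toy data (assumption): two loci coded 0/1/2 and a phenotype
# with an XOR-type interaction effect plus Gaussian noise.
n = 200
x1 = rng.integers(0, 3, size=n)
x2 = rng.integers(0, 3, size=n)
xor_term = (x1 + x2) % 2          # assumed XOR encoding (parity of the sum)
y = 2.0 * xor_term + rng.normal(size=n)

# Design matrix: intercept, two main effects, one interaction term.
X = np.column_stack([np.ones(n), x1, x2, xor_term])


def interaction_coef(X, y):
    """Return the least-squares coefficient of the last (interaction) column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-1]


def permutation_p_value(X, y, n_perm=200, rng=rng):
    """Empirical two-sided p-value from shuffling the phenotype n_perm times."""
    observed = abs(interaction_coef(X, y))
    exceed = sum(
        abs(interaction_coef(X, rng.permutation(y))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (1 + exceed) / (1 + n_perm)


p_value = permutation_p_value(X, y)
print("empirical p-value for the interaction coefficient:", p_value)
```

Refitting once per permutation and per locus pair scales poorly with thousands of loci, which is the cost an efficient matrix-based formulation is meant to avoid.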

Results: This study reveals that although many of the loci found to exhibit significant statistical epistasis overlap between models in rats, the pairs are mostly distinct. Further, the XOR model found greater evidence for statistical epistasis in many more pairs of loci in both data sets, with almost all significant epistasis in mice identified using XOR. In the rat data set, loci involved in epistasis under the XOR model are enriched for biologically relevant pathways.

Conclusion: Our results in both species show that many biologically relevant epistatic relationships would have gone undetected had only one interaction model been applied, providing evidence that varied interaction models should be implemented to explore epistatic interactions that occur in living systems.

Source journal
Biodata Mining (MATHEMATICAL & COMPUTATIONAL BIOLOGY)
CiteScore: 7.90
Self-citation rate: 0.00%
Articles published: 28
Review time: 23 weeks
Journal description: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.