评估基于浮雕的算法在检测高阶交互作用方面的局限性。

IF 4 3区 生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Biodata Mining Pub Date : 2024-10-01 DOI:10.1186/s13040-024-00390-0
Philip J Freda, Suyu Ye, Robert Zhang, Jason H Moore, Ryan J Urbanowicz
{"title":"评估基于浮雕的算法在检测高阶交互作用方面的局限性。","authors":"Philip J Freda, Suyu Ye, Robert Zhang, Jason H Moore, Ryan J Urbanowicz","doi":"10.1186/s13040-024-00390-0","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Epistasis, the interaction between genetic loci where the effect of one locus is influenced by one or more other loci, plays a crucial role in the genetic architecture of complex traits. However, as the number of loci considered increases, the investigation of epistasis becomes exponentially more complex, making the selection of key features vital for effective downstream analyses. Relief-Based Algorithms (RBAs) are often employed for this purpose due to their reputation as \"interaction-sensitive\" algorithms and uniquely non-exhaustive approach. However, the limitations of RBAs in detecting interactions, particularly those involving multiple loci, have not been thoroughly defined. This study seeks to address this gap by evaluating the efficiency of RBAs in detecting higher-order epistatic interactions. Motivated by previous findings that suggest some RBAs may rank predictive features involved in higher-order epistasis negatively, we explore the potential of absolute value ranking of RBA feature weights as an alternative approach for capturing complex interactions. In this study, we assess the performance of ReliefF, MultiSURF, and MultiSURFstar on simulated genetic datasets that model various patterns of genotype-phenotype associations, including 2-way to 5-way genetic interactions, and compare their performance to two control methods: a random shuffle and mutual information.</p><p><strong>Results: </strong>Our findings indicate that while RBAs effectively identify lower-order (2 to 3-way) interactions, their capability to detect higher-order interactions is significantly limited, primarily by large feature count but also by signal noise. Specifically, we observe that RBAs are successful in detecting fully penetrant 4-way XOR interactions using an absolute value ranking approach, but this is restricted to datasets with only 20 total features.</p><p><strong>Conclusions: </strong>These results highlight the inherent limitations of current RBAs and underscore the need for the development of Relief-based approaches with enhanced detection capabilities for the investigation of epistasis, particularly in datasets with large feature counts and complex higher-order interactions.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":null,"pages":null},"PeriodicalIF":4.0000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11443793/pdf/","citationCount":"0","resultStr":"{\"title\":\"Assessing the limitations of relief-based algorithms in detecting higher-order interactions.\",\"authors\":\"Philip J Freda, Suyu Ye, Robert Zhang, Jason H Moore, Ryan J Urbanowicz\",\"doi\":\"10.1186/s13040-024-00390-0\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Epistasis, the interaction between genetic loci where the effect of one locus is influenced by one or more other loci, plays a crucial role in the genetic architecture of complex traits. However, as the number of loci considered increases, the investigation of epistasis becomes exponentially more complex, making the selection of key features vital for effective downstream analyses. Relief-Based Algorithms (RBAs) are often employed for this purpose due to their reputation as \\\"interaction-sensitive\\\" algorithms and uniquely non-exhaustive approach. However, the limitations of RBAs in detecting interactions, particularly those involving multiple loci, have not been thoroughly defined. This study seeks to address this gap by evaluating the efficiency of RBAs in detecting higher-order epistatic interactions. Motivated by previous findings that suggest some RBAs may rank predictive features involved in higher-order epistasis negatively, we explore the potential of absolute value ranking of RBA feature weights as an alternative approach for capturing complex interactions. In this study, we assess the performance of ReliefF, MultiSURF, and MultiSURFstar on simulated genetic datasets that model various patterns of genotype-phenotype associations, including 2-way to 5-way genetic interactions, and compare their performance to two control methods: a random shuffle and mutual information.</p><p><strong>Results: </strong>Our findings indicate that while RBAs effectively identify lower-order (2 to 3-way) interactions, their capability to detect higher-order interactions is significantly limited, primarily by large feature count but also by signal noise. Specifically, we observe that RBAs are successful in detecting fully penetrant 4-way XOR interactions using an absolute value ranking approach, but this is restricted to datasets with only 20 total features.</p><p><strong>Conclusions: </strong>These results highlight the inherent limitations of current RBAs and underscore the need for the development of Relief-based approaches with enhanced detection capabilities for the investigation of epistasis, particularly in datasets with large feature counts and complex higher-order interactions.</p>\",\"PeriodicalId\":48947,\"journal\":{\"name\":\"Biodata Mining\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.0000,\"publicationDate\":\"2024-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11443793/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biodata Mining\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s13040-024-00390-0\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biodata Mining","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13040-024-00390-0","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

背景:外显性是遗传位点之间的相互作用,其中一个位点的效应受一个或多个其他位点的影响,在复杂性状的遗传结构中起着至关重要的作用。然而,随着所考虑的基因位点数量的增加,外显性的研究也变得更加复杂,因此选择关键特征对于有效的下游分析至关重要。基于救济的算法(RBA)因其 "交互敏感 "算法的美誉和独特的非穷举方法而经常被用于此目的。然而,RBA 在检测相互作用,尤其是涉及多个位点的相互作用方面的局限性尚未得到彻底界定。本研究试图通过评估 RBA 在检测高阶表观相互作用方面的效率来弥补这一不足。之前的研究结果表明,一些 RBA 可能会对涉及高阶表观相互作用的预测特征进行负排序,受此启发,我们探索了 RBA 特征权重绝对值排序作为捕捉复杂相互作用的另一种方法的潜力。在这项研究中,我们评估了 ReliefF、MultiSURF 和 MultiSURFstar 在模拟遗传数据集上的表现,这些数据集模拟了基因型与表型关联的各种模式,包括 2 向到 5 向遗传相互作用,并将它们的表现与两种对照方法(随机洗牌和互信息)进行了比较:我们的研究结果表明,虽然 RBA 能有效识别低阶(2 至 3 向)相互作用,但其检测高阶相互作用的能力却受到很大限制,这主要是由于特征数量较大,同时也受到信号噪声的影响。具体来说,我们观察到,使用绝对值排序方法,RBA 可以成功检测出完全穿透的 4 向 XOR 相互作用,但这仅限于总特征数只有 20 个的数据集:这些结果凸显了当前 RBAs 的固有局限性,并强调了开发基于 Relief 的方法的必要性,这种方法具有更强的检测能力,可用于研究表观性,特别是在具有大量特征和复杂高阶相互作用的数据集中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Assessing the limitations of relief-based algorithms in detecting higher-order interactions.

Background: Epistasis, the interaction between genetic loci where the effect of one locus is influenced by one or more other loci, plays a crucial role in the genetic architecture of complex traits. However, as the number of loci considered increases, the investigation of epistasis becomes exponentially more complex, making the selection of key features vital for effective downstream analyses. Relief-Based Algorithms (RBAs) are often employed for this purpose due to their reputation as "interaction-sensitive" algorithms and uniquely non-exhaustive approach. However, the limitations of RBAs in detecting interactions, particularly those involving multiple loci, have not been thoroughly defined. This study seeks to address this gap by evaluating the efficiency of RBAs in detecting higher-order epistatic interactions. Motivated by previous findings that suggest some RBAs may rank predictive features involved in higher-order epistasis negatively, we explore the potential of absolute value ranking of RBA feature weights as an alternative approach for capturing complex interactions. In this study, we assess the performance of ReliefF, MultiSURF, and MultiSURFstar on simulated genetic datasets that model various patterns of genotype-phenotype associations, including 2-way to 5-way genetic interactions, and compare their performance to two control methods: a random shuffle and mutual information.

Results: Our findings indicate that while RBAs effectively identify lower-order (2 to 3-way) interactions, their capability to detect higher-order interactions is significantly limited, primarily by large feature count but also by signal noise. Specifically, we observe that RBAs are successful in detecting fully penetrant 4-way XOR interactions using an absolute value ranking approach, but this is restricted to datasets with only 20 total features.

Conclusions: These results highlight the inherent limitations of current RBAs and underscore the need for the development of Relief-based approaches with enhanced detection capabilities for the investigation of epistasis, particularly in datasets with large feature counts and complex higher-order interactions.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Biodata Mining
Biodata Mining MATHEMATICAL & COMPUTATIONAL BIOLOGY-
CiteScore
7.90
自引率
0.00%
发文量
28
审稿时长
23 weeks
期刊介绍: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to: -Development, evaluation, and application of novel data mining and machine learning algorithms. -Adaptation, evaluation, and application of traditional data mining and machine learning algorithms. -Open-source software for the application of data mining and machine learning algorithms. -Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies. -Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.
期刊最新文献
Deep joint learning diagnosis of Alzheimer's disease based on multimodal feature fusion. Modeling heterogeneity of Sudanese hospital stay in neonatal and maternal unit: non-parametric random effect models with Gamma distribution. Ensemble feature selection and tabular data augmentation with generative adversarial networks to enhance cutaneous melanoma identification and interpretability. Priority-Elastic net for binary disease outcome prediction based on multi-omics data. A regularized Cox hierarchical model for incorporating annotation information in predictive omic studies.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1