ML-GAP: machine learning-enhanced genomic analysis pipeline using autoencoders and data augmentation.

Frontiers in Genetics; IF 2.8; Zone 3 (Biology); JCR Q2 (Genetics & Heredity); Pub Date: 2024-09-25; eCollection Date: 2024-01-01; DOI: 10.3389/fgene.2024.1442759
Melih Agraz, Dincer Goksuluk, Peng Zhang, Bum-Rak Choi, Richard T Clements, Gaurav Choudhary, George Em Karniadakis
Citations: 0

Abstract


Introduction: The advent of RNA sequencing (RNA-Seq) has significantly advanced our understanding of the transcriptomic landscape, revealing intricate gene expression patterns across biological states and conditions. However, the complexity and volume of RNA-Seq data pose challenges in identifying differentially expressed genes (DEGs), critical for understanding the molecular basis of diseases like cancer.

Methods: We introduce a novel Machine Learning-Enhanced Genomic Data Analysis Pipeline (ML-GAP) that incorporates autoencoders and innovative data augmentation strategies, notably the MixUp method, to overcome these challenges. By creating synthetic training examples through a linear combination of input pairs and their labels, MixUp significantly enhances the model's ability to generalize from the training data to unseen examples.
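
The MixUp step described in the Methods paragraph, a linear combination of input pairs and their labels, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mixup_pair` and `mixup_batch`, the Beta(alpha, alpha) sampling of the mixing weight, the value `alpha=0.2`, and the toy expression vectors are all assumptions made for illustration.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their labels with mixing weight lam in [0, 1]."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def mixup_batch(xs, ys, alpha=0.2, seed=None):
    """Create one synthetic sample per input by mixing it with a random partner.

    xs: list of feature vectors (e.g. normalized expression values per gene).
    ys: list of one-hot label vectors.
    alpha: Beta(alpha, alpha) parameter; an assumed value, not from the paper.
    """
    rng = random.Random(seed)
    partners = list(range(len(xs)))
    rng.shuffle(partners)  # partner index for each sample
    mixed = []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)  # mixing weight for this pair
        mixed.append(mixup_pair(xs[i], ys[i], xs[j], ys[j], lam))
    return mixed


# Toy data (hypothetical): four samples, three genes, two classes.
xs = [[0.1, 2.0, 5.0], [0.4, 1.5, 4.0], [3.0, 0.2, 0.1], [2.5, 0.3, 0.4]]
ys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
augmented = mixup_batch(xs, ys, alpha=0.2, seed=0)
```

Because each synthetic label is the same convex combination as its features, a mixed sample carries a soft label (for example, 70% class A and 30% class B) rather than a hard one, which is what discourages the model from memorizing individual training points.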

Results: Our results demonstrate ML-GAP's superiority in accuracy, efficiency, and insight. The MixUp method in particular contributes substantially to the pipeline's effectiveness, greatly advancing genomic data analysis and setting a new standard in the field.

Discussion: This, in turn, suggests that ML-GAP not only has the potential to detect DEGs more accurately but also offers new avenues for therapeutic intervention and research. By integrating explainable artificial intelligence (XAI) techniques, ML-GAP ensures a transparent and interpretable analysis, highlighting the significance of identified genetic markers.

Source journal
Frontiers in Genetics (Biochemistry, Genetics and Molecular Biology: Molecular Medicine)
CiteScore: 5.50
Self-citation rate: 8.10%
Articles published: 3491
Review time: 14 weeks
Journal description: Frontiers in Genetics publishes rigorously peer-reviewed research on genes and genomes relating to all the domains of life, from humans to plants to livestock and other model organisms. Led by an outstanding Editorial Board of the world’s leading experts, this multidisciplinary, open-access journal is at the forefront of communicating cutting-edge research to researchers, academics, clinicians, policy makers and the public. The study of inheritance and the impact of the genome on various biological processes is well documented. However, the majority of discoveries are still to come. A new era is seeing major developments in the function and variability of the genome, the use of genetic and genomic tools and the analysis of the genetic basis of various biological phenomena.