GenRepAI: Utilizing Artificial Intelligence to Identify Repeats in Genomic Suffix Trees

IF 2.4 | Zone 3 (Biology) | Q3, BIOCHEMICAL RESEARCH METHODS | Current Bioinformatics | Pub Date: 2024-07-10 | DOI: 10.2174/0115748936303435240702112205
Freeson Kaniwa
Citations: 0

Abstract

Background: The human genome is densely populated with repetitive DNA sequences that play crucial roles in genomic function and structure but are also implicated in over 40 human diseases. Identifying and characterizing these repeats is a significant computational challenge because the complexity and size of the genome overwhelm traditional algorithms.

Methods: To address these challenges, we propose GenRepAI, a deep learning framework for navigating and analyzing genomic suffix trees. GenRepAI employs supervised machine learning classifiers trained on labeled datasets of repeat annotations, together with unsupervised anomaly detection to identify novel repeat sequences. The models use convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and vision transformers to classify and annotate repeats within the human genome.

Results: GenRepAI is designed to comprehensively profile repeats that underlie various neurological diseases, allowing researchers to identify pathogenic expansions. The framework will integrate into existing genomic analysis pipelines, with the capability to screen patient genomes and highlight potential causal variants for further validation.

Conclusion: GenRepAI is set to become a foundational tool in genomics, leveraging artificial intelligence to enhance the characterization of repetitive sequences. It promises significant advancements in the molecular diagnosis of repeat expansion disorders and contributes to a deeper understanding of genomic structure and function, with broad applications in personalized medicine.
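For readers unfamiliar with the underlying data structure, the sketch below illustrates why suffix trees are a natural place to look for repeats: every internal node reached by two or more suffixes corresponds to a substring that occurs at least twice. This is not the GenRepAI method, which the abstract does not specify at this level. It is a toy, uncompressed suffix trie in plain Python, and the function name `repeated_substrings` and its `min_len` parameter are hypothetical names chosen for illustration.

```python
def repeated_substrings(seq, min_len=2):
    """Map each substring of length >= min_len that occurs at least twice
    to the list of its start positions (in increasing order).

    Illustrative only: builds a naive, uncompressed suffix trie in O(n^2)
    time and memory, so it is suitable for toy inputs, not whole genomes.
    """
    # Each trie node records its children and the start positions of all
    # suffixes that pass through it.
    root = {"children": {}, "starts": []}
    for start in range(len(seq)):
        node = root
        for ch in seq[start:]:
            node = node["children"].setdefault(
                ch, {"children": {}, "starts": []}
            )
            node["starts"].append(start)

    # Walk the trie; a node visited by >= 2 suffixes spells a repeat.
    repeats = {}
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        for ch, child in node["children"].items():
            if len(child["starts"]) >= 2:
                substring = label + ch
                if len(substring) >= min_len:
                    repeats[substring] = child["starts"]
                # Only descend while the substring is still repeated.
                stack.append((child, substring))
    return repeats


if __name__ == "__main__":
    # A short CAG trinucleotide run, used here only as a toy input.
    found = repeated_substrings("CAGCAGCAGT", min_len=3)
    for substring in sorted(found, key=len, reverse=True):
        print(substring, found[substring])
```

On the toy input `CAGCAGCAGT`, the unit `CAG` is reported at positions 0, 3, and 6. A production pipeline would use a compressed suffix tree or suffix array rather than this quadratic trie.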
Source journal: Current Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 6.60
Self-citation rate: 2.50%
Articles published: 77
Review time: >12 weeks
Journal description: Current Bioinformatics aims to publish the latest and outstanding developments in bioinformatics. Each issue contains a series of timely in-depth reviews, mini-reviews, research papers, and guest-edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science. The journal focuses on advances in computational molecular and structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.