GenRepAI: Utilizing Artificial Intelligence to Identify Repeats in Genomic Suffix Trees
Authors: Freeson Kaniwa
Journal: Current Bioinformatics (Journal Article), published 2024-07-10
DOI: 10.2174/0115748936303435240702112205 (https://doi.org/10.2174/0115748936303435240702112205)
Journal metrics: impact factor 2.4; JCR Q3 (Biochemical Research Methods); region category: Biology
Citations: 0
Abstract
Background: The human genome is densely populated with repetitive DNA sequences that play crucial roles in genomic function and structure but are also implicated in more than 40 human diseases. Identifying and characterizing these repeats is computationally challenging because the size and complexity of the genome overwhelm traditional algorithms.
Methods: To address these challenges, we propose GenRepAI, a deep learning framework for navigating and analyzing genomic suffix trees. GenRepAI combines supervised machine learning classifiers, trained on labeled datasets of repeat annotations, with unsupervised anomaly detection to identify novel repeat sequences. The models, built on convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and vision transformers, classify and annotate repeats within the human genome.
Results: GenRepAI is designed to comprehensively profile the repeats that underlie various neurological diseases, allowing researchers to identify pathogenic expansions. The framework will integrate into existing genomic analysis pipelines and will be able to screen patient genomes and highlight potential causal variants for further validation.
Conclusion: GenRepAI is set to become a foundational tool in genomics, leveraging artificial intelligence to improve the characterization of repetitive sequences. It promises significant advances in the molecular diagnosis of repeat expansion disorders and a deeper understanding of genomic structure and function, with broad applications in personalized medicine.
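The abstract does not describe how repeats are read off the suffix tree, so the following is only a minimal sketch of the underlying idea, not GenRepAI's method. It swaps in a suffix array plus an LCP (longest common prefix) array, a standard space-efficient substitute for a suffix tree: suffixes that sit next to each other in sorted order and share at least k characters all begin with the same k-mer, so every run of LCP values of at least k yields a repeated substring and all of its positions. All function names (`suffix_array`, `lcp_array`, `repeated_kmers`, `longest_repeat`, `max_tandem_copies`) are hypothetical. The naive suffix sort is for toy sequences only, and `max_tandem_copies` simply counts back-to-back motif copies to illustrate what a repeat "expansion" is. It applies no clinical threshold.

```python
from typing import Dict, List


def suffix_array(s: str) -> List[int]:
    # Naive construction: sort suffix start positions lexicographically.
    # Fine for toy inputs; genome-scale data needs a linear-time builder.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: List[int]) -> List[int]:
    # Kasai's algorithm: lcp[r] is the longest common prefix of the suffixes
    # ranked r-1 and r in the suffix array (lcp[0] is 0 by convention).
    n = len(s)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp = [0] * n
    h = 0
    for p in range(n):
        if rank[p] > 0:
            q = sa[rank[p] - 1]
            while p + h < n and q + h < n and s[p + h] == s[q + h]:
                h += 1
            lcp[rank[p]] = h
            if h > 0:
                h -= 1  # the next suffix keeps at least h-1 matching chars
        else:
            h = 0
    return lcp


def repeated_kmers(s: str, k: int) -> Dict[str, List[int]]:
    # Each maximal run of adjacent suffixes whose LCP is >= k shares one k-mer;
    # runs of length >= 2 are repeats, and the run lists every occurrence.
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    repeats: Dict[str, List[int]] = {}
    i, n = 0, len(sa)
    while i < n:
        j = i + 1
        while j < n and lcp[j] >= k:
            j += 1
        if j - i >= 2:
            repeats[s[sa[i]:sa[i] + k]] = sorted(sa[i:j])
        i = j
    return repeats


def longest_repeat(s: str) -> str:
    # The largest LCP value marks the longest substring occurring at least twice.
    if not s:
        return ""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    best = max(range(len(s)), key=lambda r: lcp[r])
    return s[sa[best]:sa[best] + lcp[best]]


def max_tandem_copies(s: str, motif: str) -> int:
    # Longest run of back-to-back copies of `motif`, checking every reading frame.
    if not motif:
        raise ValueError("motif must be non-empty")
    m, best = len(motif), 0
    for start in range(m):
        run = 0
        for pos in range(start, len(s) - m + 1, m):
            if s[pos:pos + m] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


if __name__ == "__main__":
    seq = "CAGCAGCAGTT"
    print(repeated_kmers(seq, 3))  # {'AGC': [1, 4], 'CAG': [0, 3, 6], 'GCA': [2, 5]}
    print(longest_repeat(seq))  # CAGCAG
    print(max_tandem_copies(seq, "CAG"))  # 3
```

Sorting the suffixes places all occurrences of a repeat in one contiguous block, which is the same grouping a suffix tree expresses as the leaves under a single internal node. That is why one linear scan of the LCP array is enough to enumerate repeats.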
About the journal:
Current Bioinformatics aims to publish the latest and most outstanding developments in bioinformatics. Each issue contains timely in-depth reviews and mini-reviews, research papers, and guest-edited thematic issues written by leaders in the field, covering a wide range of topics at the intersection of biology, computer science, and information science.
The journal focuses on advances in computational molecular and structural biology, including computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues in health care, medicine, genetic disorders, agricultural product development, renewable energy, and environmental protection, among others.