RNA-ModX: a multilabel prediction and interpretation framework for RNA modifications.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Briefings in bioinformatics, Pub Date: 2024-11-22, DOI: 10.1093/bib/bbae688

Chelsea Chen Yuge, Ee Soon Hang, Madasamy Ravi Nadar Mamtha, Shashikant Vishwakarma, Sijia Wang, Cheng Wang, Nguyen Quoc Khanh Le

Briefings in bioinformatics, volume 26, issue 1. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11684893/pdf/

Citations: 0

Abstract

Accurate prediction of RNA modifications holds profound implications for elucidating RNA function and mechanism, with potential applications in drug development. Here, the RNA-ModX presents a highly precise predictive model designed to forecast post-transcriptional RNA modifications, complemented by a user-friendly web application tailored for seamless utilization by future researchers. To achieve exceptional accuracy, the RNA-ModX systematically explored a range of machine learning models, including Long Short-Term Memory (LSTM), Gated Recurrent Unit, and Transformer-based architectures. The model underwent rigorous testing using a dataset comprising RNA sequences containing the four fundamental nucleotides (A, C, G, U) and spanning 12 prevalent modification classes (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um), with sequences of length 1001 nucleotides. Notably, the LSTM model, augmented with 3-mer encoding, demonstrated the highest level of model accuracy. Furthermore, Local Interpretable Model-Agnostic Explanations were employed to facilitate result interpretation, enhancing the transparency and interpretability of the model's predictions. In conjunction with the model development, a user-friendly web application was meticulously crafted, featuring an intuitive interface for researchers to effortlessly upload RNA sequences. Upon submission, the model executes in the backend, generating predictions which are seamlessly presented to the user in a coherent manner. This integration of cutting-edge predictive modeling with a user-centric interface signifies a significant step forward in facilitating the exploration and utilization of RNA modification prediction technologies by the broader research community.
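The abstract names 3-mer encoding and a 12-class multilabel output but does not spell out either. The sketch below shows one plausible reading, not the authors' implementation. It assumes overlapping 3-mers with stride 1, a 64-entry vocabulary in lexicographic order over A, C, G, U with index 0 left free for padding, and a 12-element binary target vector. The names `KMER_INDEX`, `encode_3mers`, and `multilabel_target` are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

# Assumed vocabulary: all 4**3 = 64 possible 3-mers in lexicographic order,
# indexed from 1 so that 0 stays available as a padding index.
KMER_INDEX = {"".join(p): i + 1 for i, p in enumerate(product(NUCLEOTIDES, repeat=3))}

# The 12 modification classes listed in the abstract.
MOD_CLASSES = ["m6A", "m1A", "m5C", "m5U", "m6Am", "m7G",
               "Ψ", "I", "Am", "Cm", "Gm", "Um"]


def encode_3mers(seq: str) -> list[int]:
    """Convert an RNA sequence into overlapping 3-mer indices (stride 1)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as a convenience
    unknown = set(seq) - set(NUCLEOTIDES)
    if unknown:
        raise ValueError(f"unsupported characters: {sorted(unknown)}")
    return [KMER_INDEX[seq[i:i + 3]] for i in range(len(seq) - 2)]


def multilabel_target(present: set[str]) -> list[int]:
    """Build a binary vector with one slot per modification class."""
    return [1 if name in present else 0 for name in MOD_CLASSES]


if __name__ == "__main__":
    window = "A" * 1001  # same length as the sequences described in the abstract
    print(len(encode_3mers(window)))       # number of 3-mer tokens
    print(multilabel_target({"m6A", "Ψ"}))
```

Under these assumptions a 1001-nucleotide window yields 999 tokens, which a recurrent model such as an LSTM could consume through an embedding layer. A binary vector rather than a single class index lets one sequence carry several modification labels at once, matching the multilabel framing in the title.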




Source journal: Briefings in bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.