GenoM7GNet: An Efficient N7-Methylguanosine Site Prediction Approach Based on a Nucleotide Language Model

IF 3.4; Zone 3, Biology; Q2 BIOCHEMICAL RESEARCH METHODS; IEEE/ACM Transactions on Computational Biology and Bioinformatics; Pub Date: 2024-09-20; DOI: 10.1109/TCBB.2024.3459870

Chuang Li;Heshi Wang;Yanhua Wen;Rui Yin;Xiangxiang Zeng;Keqin Li

IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2258-2268. https://ieeexplore.ieee.org/document/10684781/

Citations: 0

Abstract

N$^{7}$-methylguanosine (m7G), one of the mainstream post-transcriptional RNA modifications, occupies a highly significant place in medical treatments. However, classic approaches for identifying m7G sites are costly in both time and equipment. Meanwhile, existing machine learning methods extract limited hidden information from RNA sequences, which makes it difficult to improve accuracy. Therefore, we propose a deep learning network, called "GenoM7GNet," for m7G site identification. The model uses Bidirectional Encoder Representations from Transformers (BERT) pretrained on nucleotide sequence data to capture hidden patterns in RNA sequences for m7G site prediction. Moreover, through detailed comparative experiments with various deep learning models, we found that the one-dimensional convolutional neural network (CNN) exhibits outstanding performance in sequence feature learning and classification. The proposed GenoM7GNet model achieved 0.953 in accuracy, 0.932 in sensitivity, 0.976 in specificity, 0.907 in Matthews Correlation Coefficient, and 0.984 in Area Under the receiver operating characteristic Curve in performance evaluation. Extensive experimental results further show that our GenoM7GNet model markedly surpasses other state-of-the-art models in predicting m7G sites, exhibiting high computing performance.
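The five metrics reported in the abstract (accuracy, sensitivity, specificity, Matthews Correlation Coefficient, and area under the ROC curve) have standard definitions for a binary classifier. The following is a minimal sketch of those definitions in plain Python; it is not the authors' evaluation code, which is not shown here. The function names `classification_metrics` and `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and MCC for binary labels (1 = m7G site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Sensitivity is the true-positive rate; specificity is the true-negative rate.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC is conventionally set to 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N), fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and scores, purely illustrative).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_metrics = classification_metrics(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
```

On this toy data, accuracy, sensitivity and specificity are each 0.75, MCC is 0.5, and AUC is 0.9375. Accuracy, sensitivity, specificity and MCC depend on the chosen decision threshold, whereas AUC is computed from the raw scores and is threshold-independent.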




Source journal

IEEE/ACM Transactions on Computational Biology and Bioinformatics (Engineering & Technology; Computer Science: Interdisciplinary Applications)

CiteScore

7.50

Self-citation rate

6.70%

Articles published

479

Review time

3 months

About the journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; important biological results that are obtained from the use of these methods, programs and databases; and the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system.