Comprehensive Review and Assessment of Computational Methods for Prediction of N6-Methyladenosine Sites.

Biology-Basel, vol. 13, no. 10 · Pub Date: 2024-09-28 · DOI: 10.3390/biology13100777 · IF 3.5 · Zone 3 (Biology) · JCR Q1 (Biology)

Zhengtao Luo, Liyi Yu, Zhaochun Xu, Kening Liu, Lichuan Gu


Citations: 0

Abstract

N6-methyladenosine (m⁶A) plays a crucial regulatory role in the control of cellular functions and gene expression. Recent advances in sequencing techniques for transcriptome-wide m⁶A mapping have accelerated the accumulation of m⁶A site information at a single-nucleotide level, providing more high-confidence training data to develop computational approaches for m⁶A site prediction. However, it is still a major challenge to precisely predict m⁶A sites using in silico approaches. To advance the computational support for m⁶A site identification, here, we curated 13 up-to-date benchmark datasets from nine different species (i.e., H. sapiens, M. musculus, Rat, S. cerevisiae, Zebrafish, A. thaliana, Pig, Rhesus, and Chimpanzee). This will assist the research community in conducting an unbiased evaluation of alternative approaches and support future research on m⁶A modification. We revisited 52 computational approaches published since 2015 for m⁶A site identification, including 30 traditional machine learning-based, 14 deep learning-based, and 8 ensemble learning-based methods. We comprehensively reviewed these computational approaches in terms of their training datasets, calculated features, computational methodologies, performance evaluation strategy, and webserver/software usability. Using these benchmark datasets, we benchmarked nine predictors with available online websites or stand-alone software and assessed their prediction performance. We found that deep learning and traditional machine learning approaches generally outperformed scoring function-based approaches. In summary, the curated benchmark dataset repository and the systematic assessment in this study serve to inform the design and implementation of state-of-the-art computational approaches for m⁶A identification and facilitate more rigorous comparisons of new methods in the future.
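The abstract does not say which metrics were used to assess the nine benchmarked predictors. As an illustration only, the sketch below computes four metrics that are commonly reported for binary site predictors (sensitivity, specificity, accuracy, and the Matthews correlation coefficient). The function names, the 0.5 score threshold, and the toy labels and scores are all assumptions made for this example and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = m6A site, 0 = non-site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, scores, threshold=0.5):
    """Threshold predictor scores and return Sn, Sp, ACC, and MCC.

    The threshold value is an assumption; real predictors may define
    their own cutoffs.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    acc = (tp + tn) / len(y_true) if y_true else 0.0

    # MCC is defined as 0 here when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Toy data (hypothetical): three true sites, three non-sites.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
    print(evaluate(labels, scores))
```

Running the same function over each predictor's scores on the same held-out benchmark set would give a like-for-like comparison, which is the kind of evaluation the curated datasets are meant to support.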




Source journal

Biology-Basel (Biological Science)

CiteScore: 5.70

Self-citation rate: 4.80%

Articles published: 1618

Review time: 11 weeks

Journal description: Biology (ISSN 2079-7737) is an international, peer-reviewed, quick-refereeing open access journal of Biological Science published by MDPI online. It publishes reviews, research papers and communications in all areas of biology and at the interface of related disciplines. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files regarding the full details of the experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.