Comprehensive Review and Assessment of Computational Methods for Prediction of N6-Methyladenosine Sites
Zhengtao Luo, Liyi Yu, Zhaochun Xu, Kening Liu, Lichuan Gu
Biology-Basel, vol. 13, issue 10, published 2024-09-28. DOI: 10.3390/biology13100777
Citations: 0
Abstract
N6-methyladenosine (m6A) plays a crucial regulatory role in the control of cellular functions and gene expression. Recent advances in sequencing techniques for transcriptome-wide m6A mapping have accelerated the accumulation of m6A site information at a single-nucleotide level, providing more high-confidence training data to develop computational approaches for m6A site prediction. However, it is still a major challenge to precisely predict m6A sites using in silico approaches. To advance the computational support for m6A site identification, here, we curated 13 up-to-date benchmark datasets from nine different species (i.e., H. sapiens, M. musculus, Rat, S. cerevisiae, Zebrafish, A. thaliana, Pig, Rhesus, and Chimpanzee). This will assist the research community in conducting an unbiased evaluation of alternative approaches and support future research on m6A modification. We revisited 52 computational approaches published since 2015 for m6A site identification, including 30 traditional machine learning-based, 14 deep learning-based, and 8 ensemble learning-based methods. We comprehensively reviewed these computational approaches in terms of their training datasets, calculated features, computational methodologies, performance evaluation strategy, and webserver/software usability. Using these benchmark datasets, we benchmarked nine predictors with available online websites or stand-alone software and assessed their prediction performance. We found that deep learning and traditional machine learning approaches generally outperformed scoring function-based approaches. In summary, the curated benchmark dataset repository and the systematic assessment in this study serve to inform the design and implementation of state-of-the-art computational approaches for m6A identification and facilitate more rigorous comparisons of new methods in the future.
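The abstract does not say which metrics were used to assess the nine predictors. As a hedged illustration only: binary site predictors in this area are commonly compared using sensitivity (Sn), specificity (Sp), accuracy (ACC) and the Matthews correlation coefficient (MCC). The sketch below computes these four metrics from true labels and predicted scores on a benchmark set. The function name `evaluate_predictor`, the 0.5 decision threshold and the toy data are hypothetical assumptions. This is not the authors' benchmarking code.

```python
import math
from typing import Dict, Sequence


def evaluate_predictor(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute Sn, Sp, ACC and MCC for a binary m6A site predictor.

    labels: 1 = m6A site (positive), 0 = non-site (negative).
    scores: predicted scores; a site is called when score >= threshold.
    The 0.5 default threshold is an illustrative assumption.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    # Binarize scores into site / non-site calls.
    preds = [1 if s >= threshold else 0 for s in scores]

    # Confusion-matrix counts.
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall on sites)
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity (recall on non-sites)
    acc = (tp + tn) / len(labels)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is set to 0 by convention when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Toy example: 4 positive and 4 negative candidate sites.
    y = [1, 1, 1, 0, 0, 0, 0, 1]
    s = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1, 0.4, 0.7]
    print(evaluate_predictor(y, s))
    # {'Sn': 0.75, 'Sp': 0.75, 'ACC': 0.75, 'MCC': 0.5}
```

MCC uses all four confusion-matrix cells. For that reason it is often reported alongside accuracy when the numbers of positive and negative sites in a benchmark dataset are unbalanced.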
About the journal:
Biology (ISSN 2079-7737) is an international, peer-reviewed, quick-refereeing open access journal of Biological Science published by MDPI online. It publishes reviews, research papers and communications in all areas of biology and at the interface of related disciplines. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files regarding the full details of the experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.