Vanda A Gaonac'h-Lovejoy, Martin Sauvageau, John S Mattick, Martin A Smith
{"title":"ECSFinder: Optimized prediction of evolutionarily conserved RNA secondary structures from genome sequences","authors":"Vanda A Gaonac'h-Lovejoy, Martin Sauvageau, John S Mattick, Martin A Smith","doi":"10.1101/2024.09.14.612549","DOIUrl":null,"url":null,"abstract":"Accurate prediction of RNA secondary structures is essential for understanding the evolutionary conservation and functional roles of long noncoding RNAs (lncRNAs) across diverse species. In this study, we benchmarked two leading tools for predicting evolutionarily conserved RNA secondary structures (ECSs), SISSIz and R-scape, using two distinct experimental frameworks: one focusing on well-characterized mitochondrial RNA structures and the other on experimentally validated Rfam structures embedded within simulated genome alignments. While both tools performed comparably overall, each displayed subtle preferences in detecting ECSs. To address these limitations, we evaluated two interpretable machine learning approaches that integrate the strengths of both methods. By balancing thermodynamic stability features from RNALalifold and SISSIz with robust covariation metrics from R-scape, a random forest classifier significantly outperformed both conventional tools. This classifier was implemented in ECSfinder, a new tool that provides a robust, interpretable solution for genome-wide identification of conserved RNA structures, offering valuable insights into lncRNA function and evolutionary conservation. ECSfinder is designed for large-scale comparative genomics applications and promises to facilitate the discovery of novel functional RNA elements.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"188 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"bioRxiv - Bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.09.14.612549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Accurate prediction of RNA secondary structures is essential for understanding the evolutionary conservation and functional roles of long noncoding RNAs (lncRNAs) across diverse species. In this study, we benchmarked two leading tools for predicting evolutionarily conserved RNA secondary structures (ECSs), SISSIz and R-scape, using two distinct experimental frameworks: one focusing on well-characterized mitochondrial RNA structures and the other on experimentally validated Rfam structures embedded within simulated genome alignments. While both tools performed comparably overall, each displayed subtle preferences in detecting ECSs. To address these limitations, we evaluated two interpretable machine learning approaches that integrate the strengths of both methods. By balancing thermodynamic stability features from RNALalifold and SISSIz with robust covariation metrics from R-scape, a random forest classifier significantly outperformed both conventional tools. This classifier was implemented in ECSfinder, a new tool that provides a robust, interpretable solution for genome-wide identification of conserved RNA structures, offering valuable insights into lncRNA function and evolutionary conservation. ECSfinder is designed for large-scale comparative genomics applications and promises to facilitate the discovery of novel functional RNA elements.