OmniGenome: Aligning RNA Sequences with Secondary Structures in Genomic Foundation Models
Heng Yang, Ke Li
arXiv - QuanBio - Genomics, arXiv:2407.11242, published 2024-07-15
Abstract
RNA structures play a vital role in many cellular processes, yet existing
genomic foundation models (FMs) struggle with precise sequence-structure
alignment because the number of possible nucleotide base combinations grows
exponentially with sequence length. In this study, we introduce OmniGenome, a
foundation model that addresses this critical challenge of sequence-structure
alignment in RNA FMs. OmniGenome bridges sequences and secondary
structures through structure-contextualized modeling, enabling hard in-silico
genomic tasks that existing FMs cannot handle, such as RNA design. The
results on two comprehensive genomic benchmarks show that OmniGenome achieves
state-of-the-art performance on complex RNA subtasks. For example, OmniGenome
solved 74% of complex puzzles, whereas SpliceBERT solved only 3%. Moreover,
OmniGenome solves most of the puzzles within 1 hour, whereas existing
methods usually allocate 24 hours for each puzzle.
Overall, OmniGenome supports a wide range of genomic applications and offers
profound insights into biological mechanisms from the perspective of
sequence-structure alignment.