CRBPSA: CircRNA-RBP interaction sites identification using sequence structural attention model.

IF 4.4 | Zone 1, Biology | JCR Q1 (BIOLOGY) | BMC Biology | Pub Date: 2024-11-14 | DOI: 10.1186/s12915-024-02055-0
Chao Cao, Chunyu Wang, Qi Dai, Quan Zou, Tao Wang
BMC Biology 22(1):260. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566611/pdf/
Citations: 0

Abstract


Background: Due to the ability of circRNA to bind with corresponding RBPs and play a critical role in gene regulation and disease prevention, numerous identification algorithms have been developed. Nevertheless, most of the current mainstream methods primarily capture one-dimensional sequence features through various descriptors, while neglecting the effective extraction of secondary structure features. Moreover, as the number of introduced descriptors increases, the issues of sparsity and ineffective representation also rise, causing a significant burden on computational models and leaving room for improvement in predictive performance.

Results: To address these limitations, we focused on capturing secondary structure features in sequences and developed a new architecture called CRBPSA, which is based on a sequence-structure attention mechanism. First, a base-pairing matrix is generated by calculating the matching probability between each pair of bases, with a Gaussian function introduced as a weight to construct the secondary structure. Then, a Structure_Transformer is employed to extract base-pairing information and spatial positional dependencies, enabling the identification of binding sites through deeper feature extraction. Experimental results using the same set of hyperparameters on 37 circRNA datasets, totaling 671,952 samples, show that the CRBPSA algorithm achieves an average AUC of 99.93%, surpassing all existing prediction methods.
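
The base-pairing matrix step can be illustrated with a minimal Python sketch. The abstract does not give the pairing scores, the form of the Gaussian weight, or the window size, so everything below is an assumption made for illustration: the `PAIR_SCORES` values, the `sigma` and `max_offset` parameters, and the function names are hypothetical and are not taken from the CRBPSA code. The sketch scores each position pair (i, j) by summing the pairing scores of the stacked neighbors around it, down-weighting each neighbor by a Gaussian of its distance from (i, j).

```python
import math

# Hypothetical pairing scores (an assumption; the abstract does not specify them).
PAIR_SCORES = {
    ("A", "U"): 2.0, ("U", "A"): 2.0,
    ("G", "C"): 3.0, ("C", "G"): 3.0,
    ("G", "U"): 0.8, ("U", "G"): 0.8,
}


def gaussian(offset, sigma=1.0):
    """Gaussian weight that decays with distance from the central pair."""
    return math.exp(-0.5 * (offset / sigma) ** 2)


def pair_score(a, b):
    """Score for bases a and b; 0.0 if they cannot pair under PAIR_SCORES."""
    return PAIR_SCORES.get((a, b), 0.0)


def base_pairing_matrix(seq, max_offset=30, sigma=1.0):
    """Return an n x n list-of-lists matrix of Gaussian-weighted pairing scores."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    n = len(seq)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            # Walk outward from (i, j): pairs (i-k, j+k), starting with k = 0.
            for k in range(max_offset):
                if i - k < 0 or j + k >= n:
                    break
                s = pair_score(seq[i - k], seq[j + k])
                if s == 0.0:
                    break  # the stack of consecutive pairs ends here
                total += s * gaussian(k, sigma)
            # Walk inward only if the central pair itself can form.
            if total > 0.0:
                for k in range(1, max_offset):
                    if i + k >= n or j - k < 0:
                        break
                    s = pair_score(seq[i + k], seq[j - k])
                    if s == 0.0:
                        break
                    total += s * gaussian(k, sigma)
            mat[i][j] = total
    return mat


if __name__ == "__main__":
    matrix = base_pairing_matrix("GGGAAACCC")
    print(len(matrix), len(matrix[0]))
```

Under these assumptions the matrix is symmetric and has a zero diagonal, and positions inside a longer run of complementary bases score higher than isolated pairs. A matrix of this kind could then be passed to an attention model alongside the sequence encoding, which is the role the abstract assigns to the Structure_Transformer.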

Conclusions: CRBPSA is a lightweight and efficient prediction tool for circRNA-RBP, which can capture structural features of sequences with minimal computational resources and accurately predict protein-binding sites. This tool facilitates a deeper understanding of the biological processes and mechanisms underlying circRNA and protein interactions.

Source journal
BMC Biology (Biology)
CiteScore: 7.80
Self-citation rate: 1.90%
Articles published: 260
Review time: 3 months
About the journal: BMC Biology is a broad-scope journal covering all areas of biology. Our content includes research articles, new methods and tools. BMC Biology also publishes reviews, Q&A, and commentaries.