CRBPSA: CircRNA-RBP interaction sites identification using sequence structural attention model.

BMC Biology · Impact factor 4.4 · Zone 1 (Biology) · JCR Q1 (Biology) · Published 2024-11-14 · DOI: 10.1186/s12915-024-02055-0

Chao Cao, Chunyu Wang, Qi Dai, Quan Zou, Tao Wang

Citation: BMC Biology 22(1):260. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566611/pdf/

Citations: 0

Abstract




Background: Because circRNAs can bind their corresponding RNA-binding proteins (RBPs) and play critical roles in gene regulation and disease prevention, numerous identification algorithms have been developed. Nevertheless, most current mainstream methods mainly capture one-dimensional sequence features through various descriptors and neglect the effective extraction of secondary-structure features. Moreover, as more descriptors are introduced, sparsity and ineffective representation become more severe. This places a significant burden on computational models and leaves room for improvement in predictive performance.

Results: To address this, we focused on capturing the secondary-structure features of sequences and developed CRBPSA, a new architecture based on a sequence-structure attention mechanism. First, a base-pairing matrix is generated by computing the pairing probability between every pair of bases, with a Gaussian function applied as a weight to construct the secondary structure. Then, a Structure_Transformer extracts base-pairing information and spatial positional dependencies, so that binding sites can be identified through deeper feature extraction. Using a single set of hyperparameters on 37 circRNA datasets (671,952 samples in total), CRBPSA achieves an average AUC of 99.93%, surpassing all existing prediction methods.
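The abstract does not give the exact formulas, so what follows is only a minimal pure-Python sketch under stated assumptions. It is not the CRBPSA implementation. The sketch makes three assumptions. First, pairing scores are hypothetical: AU = 2, GC = 3, GU = 0.8. Second, the entry M[i][j] sums the scores along the contiguous stem through (i, j), each weighted by exp(-a²/2σ²) of its offset a, and stops at the first non-pairing position. Third, the structure enters a single-head self-attention as an additive bias lam * M[i][j] on the logits, with no learned projections. The names pairing_matrix, structure_biased_attention, one_hot, max_offset, sigma and lam are all hypothetical.

```python
import math

# ASSUMPTION: hypothetical pairing scores (Watson-Crick AU/GC plus GU wobble).
# These values are illustrative and are not taken from the CRBPSA paper.
PAIR_SCORE = {
    ("A", "U"): 2.0, ("U", "A"): 2.0,
    ("G", "C"): 3.0, ("C", "G"): 3.0,
    ("G", "U"): 0.8, ("U", "G"): 0.8,
}


def pair_score(a, b):
    """Score for bases a and b pairing; 0.0 if they cannot pair."""
    return PAIR_SCORE.get((a, b), 0.0)


def gaussian(offset, sigma=1.0):
    """Gaussian weight that decays with distance from the centre pair."""
    return math.exp(-(offset * offset) / (2.0 * sigma * sigma))


def pairing_matrix(seq, max_offset=30, sigma=1.0):
    """Hypothetical Gaussian-weighted base-pairing matrix M (n x n, symmetric).

    M[i][j] sums the pair scores along the contiguous stem through (i, j),
    each weighted by a Gaussian of its offset from (i, j).
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if pair_score(seq[i], seq[j]) == 0.0:
                continue  # the centre pair itself cannot form
            total = 0.0
            # Outward along the stem: (i - a, j + a), a = 0, 1, ...
            for a in range(max_offset + 1):
                p, q = i - a, j + a
                if p < 0 or q >= n:
                    break
                s = pair_score(seq[p], seq[q])
                if s == 0.0:
                    break  # stem interrupted
                total += s * gaussian(a, sigma)
            # Inward along the stem: (i + a, j - a), a = 1, 2, ... (centre counted above)
            for a in range(1, max_offset + 1):
                p, q = i + a, j - a
                if p >= q:
                    break  # the two strands have met
                s = pair_score(seq[p], seq[q])
                if s == 0.0:
                    break
                total += s * gaussian(a, sigma)
            m[i][j] = m[j][i] = total
    return m


def one_hot(seq, alphabet="ACGU"):
    """One-hot encode an RNA sequence (T is treated as U)."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if c == b else 0.0 for b in alphabet] for c in seq]


def structure_biased_attention(x, m, lam=1.0):
    """Single-head self-attention whose logits get an additive structure bias.

    ASSUMPTION: logits = x_i . x_j / sqrt(d) + lam * M[i][j], with no learned
    Q/K/V projections; the actual Structure_Transformer may differ.
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        logits = [
            sum(x[i][k] * x[j][k] for k in range(d)) / math.sqrt(d) + lam * m[i][j]
            for j in range(n)
        ]
        top = max(logits)  # subtract the max for a numerically stable softmax
        w = [math.exp(v - top) for v in logits]
        z = sum(w)
        out.append([sum(w[j] / z * x[j][k] for j in range(n)) for k in range(d)])
    return out


if __name__ == "__main__":
    seq = "GGGAAACCC"
    m = pairing_matrix(seq)
    y = structure_biased_attention(one_hot(seq), m, lam=5.0)
    print(round(m[0][8], 4))
    print([round(v, 3) for v in y[0]])
```

In the toy hairpin GGGAAACCC, the outermost G at position 0 receives its largest structure bias from the C bases it can stack with. With a large lam, its attention output is therefore dominated by the C channel. Adding the bias to the logits, rather than replacing them, lets sequence similarity and pairing evidence contribute jointly.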

Conclusions: CRBPSA is a lightweight and efficient tool for predicting circRNA-RBP interaction sites. It captures structural features of sequences with minimal computational resources and accurately predicts protein-binding sites. The tool can help deepen understanding of the biological processes and mechanisms underlying circRNA-protein interactions.


Source journal

BMC Biology (Biology)

CiteScore: 7.80

Self-citation rate: 1.90%

Articles published: 260

Review time: 3 months

Journal description: BMC Biology is a broad-scope journal covering all areas of biology. Its content includes research articles and new methods and tools; BMC Biology also publishes reviews, Q&A, and commentaries.