Bi-SeqCNN: A Novel Light-Weight Bi-Directional CNN Architecture for Protein Function Prediction
Authors: Vikash Kumar; Akshay Deepak; Ashish Ranjan; Aravind Prakash
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics, volume 21(6), pages 1922-1933
Published: 2024-07-11
DOI: 10.1109/TCBB.2024.3426491
URL: https://ieeexplore.ieee.org/document/10595435/
Impact factor: 3.6 (JCR Q2, Biochemical Research Methods)
Citations: 0
Abstract
Bi-SeqCNN: A Novel Light-Weight Bi-Directional CNN Architecture for Protein Function Prediction
Deep learning approaches, such as convolutional neural networks (CNNs) and deep recurrent neural networks (RNNs), have been the backbone of protein function prediction, with promising state-of-the-art (SOTA) results. RNNs, with their built-in ability to (i) focus on past information, (ii) capture both short- and long-range dependencies, and (iii) process sequences bi-directionally, offer a strong sequential processing mechanism. CNNs, however, are confined to short-term information drawn from both the past and the future, although they offer parallelism. Therefore, a novel bi-directional CNN that strictly complies with the sequential processing mechanism of RNNs is introduced and used to develop a protein function prediction framework, Bi-SeqCNN. This is a sub-sequence-based framework. Further, Bi-SeqCNN$^+$ is an ensemble approach that improves the prediction results. To our knowledge, this is the first time bi-directional CNNs have been employed for general temporal data analysis, not just for protein sequences. The proposed architecture yields improvements of up to 5.5% over contemporary SOTA methods on three benchmark protein sequence datasets. Moreover, it is substantially lighter and attains these results with (0.50–0.70 times) fewer parameters than the SOTA methods.
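The abstract does not describe the layer design, so the following is only an illustrative sketch of one common way to make a convolution respect a processing direction: causal (one-sided) padding, applied once left-to-right and once right-to-left. It is not the authors' Bi-SeqCNN implementation. The function names, the scalar inputs, and the fixed kernels are all hypothetical; a real model would use learned multi-channel kernels over embedded amino acids.

```python
from typing import List, Sequence, Tuple


def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D convolution where output[t] depends only on x[t-k+1 .. t].

    Left-padding with zeros ensures no future position leaks into output[t].
    (Illustrative helper; not taken from the paper.)
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]


def bidirectional_causal_conv1d(
    x: Sequence[float],
    fwd_kernel: Sequence[float],
    bwd_kernel: Sequence[float],
) -> List[Tuple[float, float]]:
    """Pair a past-only feature with a future-only feature at each position.

    Assumption: the two directions are computed independently and then
    combined per position, analogous to how a bi-directional RNN is usually
    described. Every position is still computed independently of the others,
    so the convolution keeps its parallelism.
    """
    forward = causal_conv1d(x, fwd_kernel)
    # Reverse the input, convolve causally, and reverse the result back, so
    # that backward[t] depends only on x[t .. t+k-1].
    backward = causal_conv1d(list(reversed(x)), bwd_kernel)[::-1]
    return list(zip(forward, backward))


if __name__ == "__main__":
    signal = [1, 2, 3, 4]
    for position, (past, future) in enumerate(
        bidirectional_causal_conv1d(signal, [1, 1], [1, 1])
    ):
        print(position, past, future)
```

With a window of two and all-ones kernels, the forward feature at each position is the sum of the current and previous values, and the backward feature is the sum of the current and next values. Contrast this with a standard centred convolution, which mixes past and future within a single window.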
About the journal:
IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical, and computational methods that are central to bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; important biological results obtained from the use of these methods, programs, and databases; and the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system.