Neural Network-based Approach to Predict Protein Secondary Structure
Arifur Rahman, Anik Mahmud, Pintu Chandra Shill
2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), published 2023-05-04
DOI: https://doi.org/10.1109/ICAAIC56838.2023.10140404
Protein secondary structure prediction is an emerging topic in bioinformatics that helps explain the functions of proteins and their roles in drug discovery, medicine, and biology. In this research we apply two recurrent neural network approaches: Bi-LSTM (Bidirectional Long Short-Term Memory) and LSTM (Long Short-Term Memory). We focus on primary structures of up to 134 amino acids in length. Our proposed model first builds an ‘Indexed Lexicon of corpus’ by converting the primary structure strings into tri-grams. Each tri-gram snippet of a primary structure is then replaced with its associated index in the ‘Indexed corpus’. The resulting index vector is fed into our proposed Bi-LSTM and LSTM models. We obtained the best accuracy with two Bi-LSTM layers in the Bi-LSTM model and three LSTM layers in the LSTM model. To prevent bias and reduce overfitting, we used two dropout layers in each of the two models. We evaluated our models on the ccPDB 2.0 benchmark dataset, which labels protein secondary structure with eight states. For this sst8 secondary structure we achieved 83.24% accuracy with our proposed LSTM model and 89.10% accuracy with our Bi-LSTM model. We trained each model for 50 epochs with a batch size of 64, using the ‘adam’ optimizer and the ‘categorical crossentropy’ loss function. We also employed 5-fold cross-validation to balance how the dataset is presented to our models.
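The tri-gram indexing step can be illustrated with a minimal sketch. The abstract does not give the details, so several choices here are assumptions: the tri-grams overlap with a stride of one residue, index 0 is reserved for padding, index 1 stands for tri-grams missing from the lexicon, and encoded vectors are padded to the 134-residue limit. The function names `to_trigrams`, `build_lexicon`, and `encode` are hypothetical and not taken from the paper.

```python
from typing import Dict, List

PAD_INDEX = 0  # assumption: index 0 is reserved for padding
UNK_INDEX = 1  # assumption: index 1 marks tri-grams absent from the lexicon
MAX_LEN = 134  # maximum primary-structure length stated in the abstract


def to_trigrams(sequence: str) -> List[str]:
    """Split a primary structure into overlapping tri-grams (stride 1, assumed)."""
    return [sequence[i:i + 3] for i in range(len(sequence) - 2)]


def build_lexicon(sequences: List[str]) -> Dict[str, int]:
    """Assign each distinct tri-gram an index in order of first appearance."""
    lexicon: Dict[str, int] = {}
    for seq in sequences:
        for gram in to_trigrams(seq):
            if gram not in lexicon:
                # Indices start at 2 because 0 and 1 are reserved above.
                lexicon[gram] = len(lexicon) + 2
    return lexicon


def encode(sequence: str, lexicon: Dict[str, int], max_len: int = MAX_LEN) -> List[int]:
    """Replace each tri-gram with its index, then truncate or pad to max_len."""
    indices = [lexicon.get(gram, UNK_INDEX) for gram in to_trigrams(sequence)]
    indices = indices[:max_len]
    return indices + [PAD_INDEX] * (max_len - len(indices))


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not from ccPDB 2.0.
    toy_sequences = ["MKTAYIA", "MKTLLV"]
    toy_lexicon = build_lexicon(toy_sequences)
    print(encode("MKTLLV", toy_lexicon, max_len=6))  # [2, 7, 8, 9, 0, 0]
```

A fixed-length integer vector of this kind is the usual input to an embedding layer placed in front of stacked LSTM or Bi-LSTM layers, though the abstract does not say whether the authors used one.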