Highly Accurate Protein Structure Classification and Prediction
Authors: Anirban Saha, Indranil Sarkar
Published in: 2023 2nd International Conference on Computational Systems and Communication (ICCSC), 2023-03-03
DOI: 10.1109/ICCSC56913.2023.10142975 (https://doi.org/10.1109/ICCSC56913.2023.10142975)
Citations: 0
Abstract
Proteins are the main building blocks of every known form of life, and they are the actuators of the biophysical and chemical events that occur in living organisms. Their biological functions are enabled by their native structure, which plays a crucial role in the design of vaccines and drugs. This motivates predicting a protein's structure from its amino acid sequence, coupled with other information, to obtain highly accurate prediction and classification, which is one of the fundamental problems in computational biology. So far, little attention has been given to including sidechain structure information alongside prediction of the protein backbone. This paper shows that SidechainNet, a new dataset that extends ProteinNet, can be used to predict and classify protein structures more accurately, because SidechainNet contains angle and atomic coordinate information that describes almost all of the heavy atoms in each protein structure. The paper first discusses the background on the availability of protein structure data and the importance of ProteinNet. It then describes the additional information that SidechainNet includes and how it helps predict protein structure more accurately. Finally, it shows how a machine learning model that uses SidechainNet as its dataset yields a highly accurate protein structure.
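To illustrate how the angle and atomic coordinate information mentioned in the abstract relate to each other, the following minimal sketch computes a torsion (dihedral) angle from four atomic positions. It is not taken from the paper and does not use the SidechainNet API; the function name `dihedral_degrees` and the example coordinates are hypothetical and chosen only for illustration.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle, in degrees, defined by four consecutive atoms.

    Hypothetical helper for illustration only. It evaluates
    atan2(|u2| * u1.(u2 x u3), (u1 x u2).(u2 x u3)),
    where u1, u2, u3 are the three bond vectors between the atoms.
    """
    u1 = _sub(p1, p0)
    u2 = _sub(p2, p1)
    u3 = _sub(p3, p2)
    u2_len = math.sqrt(_dot(u2, u2))
    y = u2_len * _dot(u1, _cross(u2, u3))
    x = _dot(_cross(u1, u2), _cross(u2, u3))
    return math.degrees(math.atan2(y, x))


# Made-up coordinates (not real protein data): four atoms forming a right-angle twist.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Storing angles like this one alongside Cartesian coordinates gives a model two equivalent views of the same geometry, which is the kind of information the abstract attributes to SidechainNet.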