Machine Learning for Protein Solubility Prediction
Kodai Suzuki, K. Sakakibara, Masaki Nakamura, Suguru Shinoda, Y. Asano
2022 International Conference on Machine Learning and Cybernetics (ICMLC), published 2022-09-09
DOI: 10.1109/ICMLC56445.2022.9941322
Abstract
Proteins with a catalytic function, called enzymes, can be used in many different ways in the chemical industry. Enzymes perform catalysis in a solubilized state. However, during recombinant production, some enzymes aggregate. Insolubilized enzymes that have lost their catalytic function cannot be used in industry. Therefore, searching for new enzymes that can be used for industrial purposes is an important strategy. However, this search takes time and money. In previous research, a machine learning model was constructed to predict protein solubility from the amino acid sequence of a protein, making it possible to predict solubility before the protein is produced. In this study, a model is constructed to predict protein solubility from both the amino acid sequence and the secondary structure information of the protein. We attempt to improve the prediction accuracy by providing the model with information that is thought to influence solubility.
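The abstract does not state how the sequence and secondary structure are encoded, so the following is only a minimal sketch of one possible way to combine them into a single input vector. It assumes a three-state secondary structure string (H for helix, E for strand, C for coil) aligned residue by residue with the sequence, and it uses simple composition fractions as features. The names `composition` and `build_features` are hypothetical and do not come from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_STATES = "HEC"  # assumed 3-state labels: helix, strand, coil


def composition(symbols: str, alphabet: str) -> list[float]:
    """Return the fraction of each alphabet symbol found in `symbols`."""
    counts = Counter(symbols)
    total = sum(counts[a] for a in alphabet)
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[a] / total for a in alphabet]


def build_features(sequence: str, secondary_structure: str) -> list[float]:
    """Concatenate sequence composition and secondary structure composition.

    Assumption: both strings have one character per residue.
    """
    if len(sequence) != len(secondary_structure):
        raise ValueError("sequence and secondary structure must be the same length")
    seq_part = composition(sequence.upper(), AMINO_ACIDS)
    ss_part = composition(secondary_structure.upper(), SS_STATES)
    return seq_part + ss_part


# Toy example with made-up inputs (not data from the paper).
features = build_features("MKTAYIAK", "CHHHHEEC")
print(len(features))
```

A vector built this way could be passed to any standard classifier. Dropping the last three entries gives a sequence-only baseline, which mirrors the comparison the abstract describes between sequence-only input and sequence plus secondary structure input.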