Division of dataset into training and validation subsets by the jackknife validations to predict the pH optimum for beta-cellobiosidase
Shaomin Yan, Guang Wu
2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), published 2022-04-01
DOI: 10.1109/aemcse55572.2022.00136 (https://doi.org/10.1109/aemcse55572.2022.00136)
Abstract
In modeling, it is common practice to divide the dataset into training and validation subsets. Although this appears simple, how best to divide the dataset is still debated. Of the various methods for making the division, the jackknife method is very popular and is advocated by Professor Kuo-Chen Chou. In practice, however, "the jackknife method" mostly refers to delete-1 jackknife validation, because the other jackknife variants are extremely time-consuming and computationally intensive. In this study, we use jackknife validations from delete-1 to delete-n+2 to develop a neural network model for optimizing the pH of an enzymatic reaction of beta-cellobiosidase, an enzyme that is attracting increasing attention from the biofuel industry but has only a small number of documented operational parameters. The best neural network model and the best predictor were selected from 31 candidate neural networks with different numbers of layers and neurons, and from 11 predictors related to the amino acid primary structure. The jackknife validation was performed from delete-1 to delete-18. The results show that the [6], [1] model provides the best performance among two-layer models, and that multi-layer models perform better than two-layer models. The delete-6 jackknife strategy has the best performance, which suggests dividing the dataset at a ratio of one third.
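To illustrate why jackknife variants beyond delete-1 are described as computationally intensive, the following is a minimal sketch of delete-d jackknife splitting in Python. It is not the authors' implementation. The function names `delete_d_splits` and `n_splits` are hypothetical, and the sample count `n = 18` is an assumed value chosen for illustration, not a figure taken from the paper. The sketch enumerates every way of holding out d samples, so the number of models to train grows as the binomial coefficient C(n, d).

```python
from itertools import combinations
from math import comb


def delete_d_splits(n_samples, d):
    """Yield (train_idx, val_idx) for every way of holding out d of n_samples.

    This is an exhaustive delete-d jackknife: each subset of size d is used
    once as the validation set, and the remaining samples form the training set.
    """
    all_idx = range(n_samples)
    for val_idx in combinations(all_idx, d):
        held_out = set(val_idx)
        train_idx = tuple(i for i in all_idx if i not in held_out)
        yield train_idx, val_idx


def n_splits(n_samples, d):
    """Number of distinct delete-d splits, i.e. C(n_samples, d)."""
    return comb(n_samples, d)


# Assumed dataset size, for illustration only (not from the paper).
n = 18

# How many models would have to be trained for a few values of d.
counts = {d: n_splits(n, d) for d in (1, 6, 9)}
print(counts)  # {1: 18, 6: 18564, 9: 48620}

# Inspect the first delete-6 split: 6 held-out samples, 12 for training.
first_train, first_val = next(delete_d_splits(n, 6))
print(len(first_train), len(first_val))  # 12 6
```

With these assumed numbers, delete-1 needs 18 training runs while delete-6 needs 18,564, which is why exhaustive enumeration is often replaced by random sampling of held-out subsets when d is large.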