Machine learning for pyrimidine corrosion inhibitor small dataset
Wise Herowati, Wahyu Aji Eko Prabowo, Muhamad Akrom, Noor Ageng Setiyanto, Achmad Wahid Kurniawan, Novianto Nur Hidayat, Totok Sutojo, Supriadi Rustad
Theoretical Chemistry Accounts, published 2024-08-09
DOI: 10.1007/s00214-024-03140-x (https://doi.org/10.1007/s00214-024-03140-x)
Abstract
Machine learning (ML) approaches have been developed to predict the corrosion inhibition efficiency of materials, particularly pyrimidine compounds. Notably, virtual sample generation (VSG), a novel approach for handling small datasets in this context, enhances prediction accuracy. The random forest model, the best-performing nonlinear algorithm, showed a substantial accuracy improvement after VSG was applied: R² increased from 0.05 to 0.99, and RMSE decreased from 5.60 to 0.42. These results underscore the efficacy of VSG in boosting the predictive performance of ML models, particularly in scenarios constrained by limited data availability.
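The abstract does not say how VSG constructs its virtual samples, so the following is only a minimal sketch under stated assumptions. It uses simple pairwise linear interpolation between real samples, a hypothetical stand-in that is not necessarily the authors' method, together with plain-Python versions of R² and RMSE, the two metrics reported above. The names `generate_virtual_samples`, `rmse`, and `r2_score`, and the `noise` and `seed` parameters, are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

# A sample pairs a molecular descriptor vector with its inhibition efficiency.
Sample = Tuple[List[float], float]


def generate_virtual_samples(
    data: Sequence[Sample], n_virtual: int, noise: float = 0.01, seed: int = 0
) -> List[Sample]:
    """Hypothetical VSG: interpolate between random pairs of real samples.

    Assumption (not from the paper): each virtual sample is
    x_v = x_i + t * (x_j - x_i) with t ~ U(0, 1), the same t is applied to
    the target, and small Gaussian noise is then added to the target.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    pool = list(data)  # needs at least two real samples
    virtual: List[Sample] = []
    for _ in range(n_virtual):
        (xi, yi), (xj, yj) = rng.sample(pool, 2)  # two distinct real samples
        t = rng.random()
        xv = [a + t * (b - a) for a, b in zip(xi, xj)]
        yv = yi + t * (yj - yi) + rng.gauss(0.0, noise)
        virtual.append((xv, yv))
    return virtual


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Usage with toy numbers (not data from the paper):
real = [([0.0, 1.0], 10.0), ([1.0, 3.0], 20.0), ([2.0, 5.0], 30.0)]
augmented = real + generate_virtual_samples(real, n_virtual=20, seed=42)
print(len(augmented))                      # 23
print(rmse([0.0, 0.0], [3.0, 3.0]))        # 3.0
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```

In a fuller pipeline, virtual samples would typically be generated from the training split only and combined with it to fit a regressor such as scikit-learn's `RandomForestRegressor`, with R² and RMSE then computed on held-out real samples.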
About the journal:
TCA publishes papers in all fields of theoretical chemistry, computational chemistry, and modeling. Both fundamental studies and applications fall within its scope. Theorists and computational chemists often have concerns that reach either across the vertical borders between chemistry's special disciplines or across the horizontal borders of structure, spectra, synthesis, and dynamics. TCA is especially interested in papers that have an impact on multiple chemical disciplines.