Ivan Zlobin, Nikita Toroptsev, Gleb Averochkin, Alexander Pavlov
Pre-trained Mol2Vec Embeddings as a Tool for Predicting Polymer Properties
Chinese Journal of Polymer Science, vol. 42, no. 12, pp. 2059-2068. Published 2024-11-11. Journal Article.
DOI: 10.1007/s10118-024-3237-y
https://link.springer.com/article/10.1007/s10118-024-3237-y
Citation count: 0
Abstract
Machine learning-assisted prediction of polymer properties prior to synthesis has the potential to significantly accelerate the discovery and development of new polymer materials. To date, several approaches have been implemented to represent chemical structure in machine learning models, among which Mol2Vec embeddings have attracted considerable attention in the cheminformatics community since their introduction in 2018. However, for small datasets, the use of chemical structure representations typically increases the dimensionality of the input dataset, resulting in a decrease in model performance. Furthermore, the limited diversity of polymer chemical structures hinders the training of reliable embeddings, necessitating complex task-specific architecture implementations. To address these challenges, we examined the efficacy of pre-trained Mol2Vec embeddings in deriving vectorized representations of polymers. This study assesses how incorporating Mol2Vec compound vectors into the input features affects the performance of a model that relies on the physical properties of 214 polymers. We hope the results highlight the potential for improving prediction accuracy in polymer studies by incorporating pre-trained embeddings and promote their use with modestly sized polymer databases.
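The feature construction described in the abstract can be illustrated with a minimal sketch. It assumes the usual Mol2Vec convention that a compound vector is the sum of the vectors of its substructures, and it appends that vector to the physical-property features. The embedding table, substructure identifiers, dimension, and descriptor values below are hypothetical placeholders, not the authors' data or pipeline; a real workflow would load a pre-trained Mol2Vec model and derive substructure identifiers from the polymer's chemical structure.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embedding table: substructure identifier -> vector.
# A real pre-trained Mol2Vec model maps Morgan substructure identifiers
# to learned vectors; the identifiers and values here are made up.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "sub_a": [0.1, 0.0, -0.2, 0.3],
    "sub_b": [0.0, 0.5, 0.1, -0.1],
    "sub_c": [-0.3, 0.2, 0.0, 0.4],
}
EMBEDDING_DIM = 4  # toy size chosen for readability (assumption)
# Assumption for this sketch: unseen substructures contribute a zero vector.
UNKNOWN = [0.0] * EMBEDDING_DIM


def compound_vector(substructures: Sequence[str]) -> List[float]:
    """Sum the substructure vectors of one structure into a single vector."""
    total = [0.0] * EMBEDDING_DIM
    for identifier in substructures:
        vector = TOY_EMBEDDINGS.get(identifier, UNKNOWN)
        total = [t + v for t, v in zip(total, vector)]
    return total


def build_features(physical: Sequence[float],
                   substructures: Sequence[str]) -> List[float]:
    """Concatenate physical-property features with the compound vector."""
    return list(physical) + compound_vector(substructures)


# Hypothetical polymer: two physical descriptors plus three substructures.
features = build_features([1.05, 350.0], ["sub_a", "sub_b", "sub_a"])
print(len(features))  # 2 physical features + 4 embedding dimensions
```

The sketch also makes the dimensionality concern concrete: each polymer's input grows by the embedding dimension, which matters when only a few hundred samples are available.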
About the journal:
Chinese Journal of Polymer Science (CJPS) is a monthly journal published in English and sponsored by the Chinese Chemical Society and the Institute of Chemistry, Chinese Academy of Sciences. CJPS is edited by a distinguished Editorial Board headed by Professor Qi-Feng Zhou and supported by an International Advisory Board that includes many prominent, active polymer scientists from around the world. The journal was first published in 1983 under the title Polymer Communications and has carried its current name since 1985.
CJPS is a peer-reviewed journal dedicated to the timely publication of original research ideas and results in the field of polymer science. Issues may carry regular papers, rapid communications, and notes, as well as feature articles. As a leading polymer journal in China published in English, CJPS reflects new achievements from laboratories across China. It also includes papers submitted by scientists from countries and regions outside China, reflecting the international nature of the journal.