Molecular Similarity Used for Evaluating the Accuracy of Retention Index Predictions in Gas Chromatography Using Deep Learning
D. D. Matyushin, A. Yu. Sholokhova, M. D. Khrisanfov, S. A. Borovikova
Russian Journal of Physical Chemistry A, 98(13), pp. 3212-3219. Published 2025-01-17.
DOI: 10.1134/S0036024424702431
https://link.springer.com/article/10.1134/S0036024424702431
Abstract
When retention indices are predicted using deep learning, there is typically no way to assess how reliable the prediction is for a specific molecule. Using stationary phases based on polyethylene glycol and the NIST 17 database, the present study demonstrates that predictions are generally more accurate when the training dataset includes molecules structurally similar to the compound for which the prediction is made. Of the four algorithms considered, the Tanimoto similarity of ECFP "molecular fingerprints" is the most suitable for this task. For several transformation products of unsymmetrical dimethylhydrazine whose structures had been established using such predictions, the predictions were shown to be unreliable.
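The similarity check described in the abstract can be illustrated with a minimal sketch. It assumes each fingerprint is represented as a set of on-bit indices; the function names, the convention for empty fingerprints, and the example bit sets are all hypothetical and are not taken from the paper. In practice the bits would come from an ECFP implementation in a cheminformatics toolkit, and the paper's actual procedure may differ.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity_to_training(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity between the query and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


# Made-up on-bit sets standing in for ECFP fingerprints (illustration only).
training_set = [{1, 4, 9, 16}, {2, 4, 8, 16, 32}, {3, 5, 7}]
query = {1, 4, 9, 25}

score = max_similarity_to_training(query, training_set)
print(f"{score:.2f}")  # prints 0.60
```

A low maximum similarity would indicate that the training set contains no close structural analogue of the query, which, according to the abstract, is the situation in which predictions tend to be less accurate.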
About the journal:
Russian Journal of Physical Chemistry A. Focus on Chemistry (Zhurnal Fizicheskoi Khimii), founded in 1930, offers a comprehensive review of theoretical and experimental research from the Russian Academy of Sciences and from leading research and academic centers in Russia and around the world.
Articles are devoted to chemical thermodynamics and thermochemistry, biophysical chemistry, photochemistry and magnetochemistry, materials structure, quantum chemistry, physical chemistry of nanomaterials and solutions, surface phenomena and adsorption, and methods and techniques of physicochemical studies.