Prediction of Vacuum Ultraviolet/Ultraviolet Gas-Phase Absorption Spectra Using Molecular Feature Representations and Machine Learning
Linh Ho Manh, Victoria C. P. Chen, Jay Rosenberger, Shouyi Wang, Yujing Yang, Kevin A. Schug
Journal of Chemical Information and Modeling, published 2024-06-28. DOI: 10.1021/acs.jcim.4c00676
Abstract
Ultraviolet (UV) absorption spectroscopy is a widely used tool for the quantitative and qualitative analysis of chemical compounds. In the gas phase, vacuum UV (VUV) and UV absorption spectra are specific and diagnostic for many small molecules. Accurate prediction of VUV/UV absorption spectra can aid the characterization of new or unknown molecules in areas such as fuels, forensics, and pharmaceutical research. Artificial intelligence offers an alternative to quantum chemical spectral prediction. Here, existing and newly developed molecular feature representation techniques were used to encode chemical structures, and three machine learning models were tested for predicting gas-phase VUV/UV absorption spectra. Structure data files (.sdf) and VUV/UV absorption spectra for 1397 volatile and semivolatile chemical compounds were used to train and test the models. New molecular features (termed ABOCH) were introduced to better capture pi-bonding, aromaticity, and halogenation. Incorporating these new features improved spectral prediction, and the resulting models outperformed computationally intensive molecular-based deep learning methods. Among the machine learning methods, a Random Forest regressor achieved the best accuracy score with the shortest training time. The resulting machine learning model also outperformed spectral predictions based on time-dependent density functional theory (TD-DFT).
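The abstract does not say how the ABOCH features are computed. The sketch below is only a rough illustration of the general idea. It parses one V2000 molfile block, which is the record format inside an .sdf file. It then counts the structural information that ABOCH is described as capturing: pi bonds, aromatic bonds, and halogen atoms. The function names (`parse_v2000`, `aboch_like_features`, `feature_vector`), the feature names, and the counting rules are hypothetical and are not the authors' definitions. Bond types are read exactly as written in the file. A kekulized aromatic ring is therefore counted as alternating single and double bonds, not as aromatic bonds.

```python
from collections import Counter

HALOGENS = ("F", "Cl", "Br", "I")
# Hypothetical feature set, loosely inspired by the ABOCH description.
FEATURE_ORDER = (
    "heavy_atoms", "double_bonds", "triple_bonds", "aromatic_bonds",
    "pi_bonds", "halogens", "n_F", "n_Cl", "n_Br", "n_I",
)


def parse_v2000(molblock):
    """Return (atom symbols, bonds) from one V2000 molfile block.

    Each bond is (atom1, atom2, bond_type) with 1-based atom indices as in
    the file; bond_type 1/2/3 = single/double/triple and 4 = aromatic.
    """
    lines = molblock.splitlines()
    counts = lines[3]  # lines 1-3 are the header; line 4 is the counts line
    if "V3000" in counts:
        raise ValueError("only V2000 blocks are handled in this sketch")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # Fixed-width columns: the element symbol occupies columns 32-34.
    symbols = [line[31:34].strip() for line in atom_lines]
    bonds = [(int(line[0:3]), int(line[3:6]), int(line[6:9]))
             for line in bond_lines]
    return symbols, bonds


def aboch_like_features(symbols, bonds):
    """Hypothetical counts inspired by ABOCH; not the published definition."""
    atoms = Counter(symbols)
    bond_types = Counter(bond_type for _, _, bond_type in bonds)
    halogen_counts = {h: atoms[h] for h in HALOGENS}
    return {
        "heavy_atoms": sum(n for s, n in atoms.items() if s != "H"),
        "double_bonds": bond_types[2],
        "triple_bonds": bond_types[3],
        "aromatic_bonds": bond_types[4],
        # Pi bonds counted from explicit double (1) and triple (2) bonds only.
        "pi_bonds": bond_types[2] + 2 * bond_types[3],
        "halogens": sum(halogen_counts.values()),
        **{f"n_{h}": n for h, n in halogen_counts.items()},
    }


def feature_vector(features):
    """Fixed-order list suitable as one row of a design matrix."""
    return [features[name] for name in FEATURE_ORDER]


# Usage: a hand-built V2000 block for vinyl chloride (H atoms omitted).
# _atom and _bond are hypothetical helpers that emit fixed-width lines.
def _atom(symbol):
    return f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {symbol:<3} 0  0  0  0  0"


def _bond(a, b, bond_type):
    return f"{a:3d}{b:3d}{bond_type:3d}  0"


VINYL_CHLORIDE = "\n".join([
    "vinyl chloride", "  sketch", "",
    f"{3:3d}{2:3d}" + "  0" * 8 + "999 V2000",
    _atom("C"), _atom("C"), _atom("Cl"),
    _bond(1, 2, 2), _bond(1, 3, 1),
    "M  END",
])

symbols, bonds = parse_v2000(VINYL_CHLORIDE)
print(feature_vector(aboch_like_features(symbols, bonds)))
# → [3, 1, 0, 0, 1, 1, 0, 1, 0, 0]
```

In a workflow like the one the abstract describes, each compound would contribute one such vector as a row of a design matrix X. The matching target row would hold absorbance values on a fixed wavelength grid. scikit-learn's `RandomForestRegressor` accepts a two-dimensional target of shape (n_samples, n_outputs) directly, so the whole spectrum can be predicted at once.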
About the journal:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.