Gerardo M. Casanola-Martin, Jing Wang, Jian-ge Zhou, Bakhtiyor Rasulev, Jerzy Leszczynski
Journal of Molecular Modeling, 31(1), published 2024-12-12. DOI: 10.1007/s00894-024-06240-4. https://link.springer.com/article/10.1007/s00894-024-06240-4
Chemical feature-based machine learning model for predicting photophysical properties of BODIPY compounds: density functional theory and quantitative structure–property relationship modeling
Context
Boron-dipyrromethene (BODIPY) compounds have unique photophysical properties and have been applied in fluorescence imaging, sensing, optoelectronics, and beyond. Designing effective BODIPY compounds requires a comprehensive understanding of the relationship between BODIPY structure and the corresponding photophysical properties. Fifteen molecular descriptors were found to correlate strongly with the maximum absorption wavelength. The developed ML/QSPR model showed good predictive performance, with coefficients of determination (R²) of 0.945 for the training set and 0.734 for the test set, indicating robustness and reliability. A posterior analysis of some of the selected descriptors provided insight into the structural features that influence BODIPY properties and emphasized the importance of molecular branching, size, and specific functional groups. This work shows that the combined cheminformatics and machine learning approach is robust enough to screen BODIPY compounds and to design novel structures with enhanced performance.
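The coefficient of determination quoted above can be illustrated with a minimal sketch. It assumes the usual definition R² = 1 − SS_res/SS_tot; the abstract does not state which variant the authors used. The function name `r_squared` and the wavelength values are hypothetical, chosen only for illustration, and are not data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Made-up absorption maxima in nm (illustrative only, not from the paper).
observed = [500.0, 520.0, 540.0, 560.0]
predicted = [505.0, 515.0, 545.0, 555.0]
print(round(r_squared(observed, predicted), 3))
```

For these made-up values, SS_res = 100 and SS_tot = 2000, so R² = 0.95. A gap between training and test R², such as 0.945 versus 0.734, is the kind of difference this metric is used to expose.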
Methods
In the present study, all BODIPY models were fully optimized, and the corresponding absorption spectra were obtained at the DFT/TDDFT//B3LYP/6-311G(d,p) level. All of these calculations were carried out with the Gaussian 16 program. Based on the computational results, a machine learning-based quantitative structure–property relationship (ML/QSPR) model was used to predict the maximum absorption wavelength (λ) of BODIPY compounds by combining hand-crafted molecular descriptors (MD) with explainable machine learning (EML) techniques, using the scikit-learn Python library. A dataset of 131 BODIPY compounds with their experimental photophysical properties was used to generate a diverse set of molecular descriptors, computed with Chemaxon and alvaDesc software, that capture the size, shape, connectivity, and other structural features of these compounds. Genetic algorithm (GA) variable selection combined with multi-linear regression (MLR) was applied to develop the best predictive model, using the Genetic Selection Python library.
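The GA-plus-MLR idea can be sketched in a few lines. This is not the authors' pipeline: it is a dependency-free stand-in for scikit-learn and the Genetic Selection library, run on synthetic data. Every name (`fit_mlr`, `ga_select`, and so on), the parsimony penalty, and all GA settings are assumptions made for illustration.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Ordinary least squares with intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def r2(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def select(rows, mask):
    """Keep only the descriptor columns switched on in the mask."""
    return [[v for v, keep in zip(r, mask) if keep] for r in rows]


def fitness(mask, x_tr, y_tr, x_va, y_va, penalty=0.005):
    """Validation R^2 minus an assumed small penalty per selected descriptor."""
    if not any(mask):
        return float("-inf")
    coef = fit_mlr(select(x_tr, mask), y_tr)
    return r2(y_va, predict(coef, select(x_va, mask))) - penalty * sum(mask)


def ga_select(x_tr, y_tr, x_va, y_va, n_feat, rng,
              pop_size=20, generations=25, p_mut=0.1):
    cache = {}

    def score(mask):  # memoize: each distinct mask is fitted only once
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, x_tr, y_tr, x_va, y_va)
        return cache[key]

    # Seed with the full model so elitism never ends below "use everything".
    pop = [[True] * n_feat]
    pop += [[rng.random() < 0.5 for _ in range(n_feat)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [pop[0][:]]  # elitism: carry the best mask over unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n_feat)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)


# Synthetic data: 8 descriptors, only columns 0, 3 and 5 drive the response.
rng = random.Random(0)
n_feat = 8
x = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(60)]
y = [3.0 * r[0] - 2.0 * r[3] + 1.5 * r[5] + rng.gauss(0, 0.1) for r in x]
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]
best_mask, best_score = ga_select(x_tr, y_tr, x_va, y_va, n_feat, rng)
print("selected descriptor indices:", [i for i, keep in enumerate(best_mask) if keep])
print("penalized validation R^2:", round(best_score, 3))
```

Scoring each mask on a held-out split rather than on the training data keeps the search from rewarding masks that merely add descriptors. A real study would more likely use cross-validation and library implementations for both the regression and the genetic search.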
About the journal:
The Journal of Molecular Modeling focuses on "hardcore" modeling, publishing high-quality research and reports. Founded in 1995 as a purely electronic journal, it has adapted its format to include a full-color print edition, and adjusted its aims and scope to fit the fast-changing field of molecular modeling, with a particular focus on three-dimensional modeling.
Today, the journal covers all aspects of molecular modeling including life science modeling; materials modeling; new methods; and computational chemistry.
Topics include computer-aided molecular design; rational drug design, de novo ligand design, receptor modeling and docking; cheminformatics, data analysis, visualization and mining; computational medicinal chemistry; homology modeling; simulation of peptides, DNA and other biopolymers; quantitative structure-activity relationships (QSAR) and ADME-modeling; modeling of biological reaction mechanisms; and combined experimental and computational studies in which calculations play a major role.