Pub Date: 2023-11-19 DOI: 10.1016/j.aichem.2023.100027
Si-Min Qi, Tao Bo, Lei Zhang, Zhi-Fang Chai, Wei-Qun Shi
The thermodynamic and transport properties of high-temperature chloride molten salt systems are of great significance for spent fuel reprocessing in nuclear energy engineering. Here, using the machine-learning-based deep potential (DP) method, we train a high-precision force field model for the LiCl-KCl-LiF system. During force field training, adding new data over multiple iterations improves the accuracy of the force field model and its applicability to more configurations. A comparison of density functional theory (DFT) and DP results for the test dataset indicates that our trained DP model matches the accuracy of DFT. We then comprehensively investigate the local structure, thermophysical properties, and transport properties of the LiCl-KCl and LiCl-KCl-LiF molten salt systems using the trained DP model, and analyze the effects of temperature and LiF concentration on these properties. This work provides guidance for training machine learning force fields for molten salt systems and for studying the basic physical properties of high-temperature chloride molten salt systems.
Title: Machine-learning-driven simulations on microstructure, thermodynamic properties, and transport properties of LiCl-KCl-LiF molten salt
Journal: Artificial intelligence chemistry
Pub Date: 2023-11-17 DOI: 10.1016/j.aichem.2023.100025
Chen Qu, Paul L. Houston, Qi Yu, Priyanka Pandey, Riccardo Conte, Apurba Nandi, Joel M. Bowman
As a follow-up to our recent Communication in the Journal of Chemical Physics [J. Chem. Phys. 159 071101 (2023)], we report and make available the Jupyter Notebook software here. This software performs binary machine learning classification (MLC) with the goal of learning negligible Hamiltonian matrix elements for vibrational dynamics. We illustrate its usefulness for a Hamiltonian matrix for H2O by using three MLC algorithms: Random Forest, Support Vector Machine, and Multi-layer Perceptron.
Title: Machine learning software to learn negligible elements of the Hamiltonian matrix
Pub Date: 2023-11-14 DOI: 10.1016/j.aichem.2023.100026
Jose Isagani B. Janairo
Cholesterol-lowering peptides (CLPs) are bioactive biomolecules often derived from food proteins. These short peptides bind with bile acids, leading to decreased intestinal absorption of cholesterol. CLPs are promising bioceuticals that could be used to support interventions for the management of high cholesterol. Integrating machine learning (ML) into the screening and discovery workflow for CLPs can reduce trial and error, thereby accelerating the overall process and increasing its efficiency. In this study, a support vector machine model that can distinguish CLPs from non-CLPs is presented. The model was built on a diverse dataset of 1840 peptides with sequence lengths ranging from 4 to 7. The ML model needs only 8 features (VHSE scores), and feature importance analysis using Shapley and permutation-based methods found the most important features to be related to peptide polarity and hydrophobicity. The formulated ML classifier is reliable, as demonstrated by AUC >0.7 for a diverse test dataset and AUC >0.9 for a conservative validation dataset composed mainly of the top and bottom CLPs. Overall, the presented ML model represents an incremental yet meaningful advance in the application of ML to understanding the nature of CLPs and to their discovery and development.
Title: A machine learning classification model for cholesterol-lowering peptides
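The AUC figures quoted in the abstract above can be read as the probability that a randomly chosen positive (here, a CLP) receives a higher classifier score than a randomly chosen negative. The following is a minimal sketch of that pairwise definition in plain Python; the function name `roc_auc` and the toy scores and labels are illustrative assumptions and are not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise ranking definition.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six peptides (1 = CLP, 0 = non-CLP).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a value above 0.9 on the validation set indicates much cleaner separation than a value above 0.7 on the more diverse test set.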
Here we describe the results of QSAR analysis based on artificial neural networks, synthesis, activity evaluation and molecular docking of a number of 1,3-oxazole derivatives as anti-E. coli antibacterials. All developed QSAR models showed excellent statistics on training (with determination coefficient q² of 0.76 ± 0.01) and test samples (with q² of 0.78 ± 0.01). The models were successfully used to identify nine novel 5-amino-4-cyano-1,3-oxazoles with potential anti-E. coli activity. All nine 1,3-oxazoles with predicted high antibacterial potential showed different levels of anti-E. coli in vitro activity. 5-Amino-4-cyano-1,3-oxazoles 1 and 3 showed the highest antibacterial activity, on average from 17 to 27 mm, against MDR, hemolytic MDR and ATCC 25922 E. coli colistin-resistant strains, respectively. The comparative docking analysis demonstrated a possible mechanism of the antibacterial action of the studied 1,3-oxazoles 1 and 3 through inhibition of E. coli enoyl-ACP reductase (ENR), which is involved in the biosynthesis of bacterial fatty acids. The localization of 5-amino-4-cyano-1,3-oxazoles 1 and 3 in the E. coli ENR active site is shown, with estimated binding energies from −10.1 to −9.5 kcal/mol and hydrogen bond formation with key amino acids, similar to triclosan. These findings support the proposed hypothesis about the potential antibacterial mechanism of 5-amino-4-cyano-1,3-oxazoles.
Title: Development and application of in silico models to design new antibacterial 5-amino-4-cyano-1,3-oxazoles against colistin-resistant E. coli strains
Ivan Semenyuta, Diana Hodyna, Vasyl Kovalishyn, Bohdan Demydchuk, Maryna Kachaeva, Stepan Pilyo, Volodymyr Brovarets, Larysa Metelytsia
Pub Date: 2023-11-09 DOI: 10.1016/j.aichem.2023.100024
Pub Date: 2023-11-07 DOI: 10.1016/j.aichem.2023.100023
Feng Wang, Vladislav Vasilyev
This study turns the design and screening of new compounds into integer processing of control arrays using a scaffold-based Turing machine model. If small organic fragments are stored in a fragment database (FDB) in which each fragment is labelled by an integer in an array, the position and frequency of the integer control how the fragment clicks onto a scaffold (template compound). This method can robustly screen a large number of candidate fragments for solar cells and other applications such as drug design with minimal human assistance. As a proof of concept, we consider terminal imide substituents on the core perylene diimide (PDI) to develop PDI derivatives capable of absorbing UV–vis light for solar cell applications. The time-dependent density functional theory (TD-DFT) method was employed in the calculations. When the imide substituents are electron donors such as azobenzene (DPI-7), they produce a larger bathochromic shift (Δλmax) from the core DPI band position. The UV–vis absorption transitions of these DPI derivatives have more charge transfer (CT) character, as the highest occupied molecular orbitals (HOMO) are located on the fragments rather than the core DPI region. Our study presents a robust and efficient strategy for screening and designing high-performance organic dyes, and further research in DPI-based solar cell design will focus on promoting the HOMO to LUMO transitions of the optical spectra.
Title: Robust design strategy using a scaffold based Turing machine model: Application to PDI based dyes
Pub Date: 2023-10-29 DOI: 10.1016/j.aichem.2023.100021
Bienfait K. Isamura, Paul L.A. Popelier
FEREBUS is a Gaussian process regression (GPR) engine embedded in the large machinery of FFLUX, a novel machine learnt force field developed from scratch through several well-documented proof-of-concept studies. This package relies on the exploration and exploitation capabilities of metaheuristic algorithms (MAs) to carry out the global optimisation of GPR model hyperparameters (θ). However, because MAs employ different search mechanisms to scrutinise the hyperparameter space, their performance on a specific optimisation task can vary a lot from one technique to another. Herein, we report a series of carefully designed experiments aimed at evaluating the ability of ten metaheuristic algorithms to locate the optimal set of θ values. Selected optimisation techniques belong to four popular families of MAs, namely particle swarm optimisation (4), grey wolf optimisation (2), bat (2) and firefly (2) algorithms. Our calculations suggest that grey wolf optimisers (GWOs) achieve the best results on average. Furthermore, the RMSE(θ) cost function is confirmed to be an excellent guide for the selection of atomic GPR models. This work also briefly introduces an enhanced grey wolf optimiser called GWO-RUHL (Random Update of the Hierarchy Ladder), which accounts for the (so far omitted) natural desire of non-leader wolves to occupy high-ranked leadership positions in the pack. We demonstrate that GWO-RUHL achieves better results than the standard GWO in terms of both convergence speed and quality of solutions.
Title: Metaheuristic optimisation of Gaussian process regression model hyperparameters: Insights from FEREBUS
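To make the grey wolf optimisation mentioned in the abstract above more concrete, here is a minimal sketch of the standard textbook GWO update in plain Python. It is not the FEREBUS implementation and does not include the GWO-RUHL modification. The function names, the pack size, the iteration count, and the toy quadratic cost (a stand-in for an RMSE(θ)-type objective) are all assumptions chosen for illustration.

```python
import random


def toy_cost(theta):
    """Hypothetical stand-in for an RMSE(theta) objective: a simple quadratic bowl."""
    return sum(v * v for v in theta)


def grey_wolf_optimise(cost, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimise `cost` with the standard grey wolf optimiser (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best, best_cost = None, float("inf")

    for it in range(n_iter):
        # Rank the pack: the three best wolves act as alpha, beta and delta.
        ranked = sorted(wolves, key=cost)
        leaders = [list(w) for w in ranked[:3]]
        if cost(leaders[0]) < best_cost:
            best, best_cost = list(leaders[0]), cost(leaders[0])

        # Control parameter decreases linearly from 2 to 0 (exploration -> exploitation).
        a = 2.0 * (1.0 - it / n_iter)

        new_wolves = []
        for w in wolves:
            new_pos = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a   # step coefficient in [-a, a]
                    C = 2.0 * rng.random()           # leader weighting in [0, 2]
                    distance = abs(C * leader[d] - w[d])
                    estimate += leader[d] - A * distance
                # New coordinate is the mean of the three leader-guided estimates,
                # clipped to the search bounds.
                new_pos.append(min(hi, max(lo, estimate / 3.0)))
            new_wolves.append(new_pos)
        wolves = new_wolves

    # Account for the final pack as well.
    final_best = min(wolves, key=cost)
    if cost(final_best) < best_cost:
        best, best_cost = list(final_best), cost(final_best)
    return best, best_cost


best_theta, best_value = grey_wolf_optimise(toy_cost, dim=2, bounds=(-5.0, 5.0))
print(best_theta, best_value)
```

In this scheme only the three best wolves steer the pack, which is the hierarchy that, according to the abstract, GWO-RUHL revisits by letting non-leader wolves compete for leadership positions.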
Pub Date: 2023-10-24 DOI: 10.1016/j.aichem.2023.100022
Laurent Soulère, Yves Queneau
The LC3 proteins play a crucial role in autophagy by participating in the formation of the autophagosome. Modulation of autophagy by molecular interference with LC3 proteins could help to understand this complex fundamental biological process and how it is involved in several pathologies. Identifying new LC3 ligands is a useful contribution to this aim. In the present study, we created a PubChem library of 749 compounds having a structure based on the central scaffold of novobiocin, a reported LC3A ligand. A robust, rapid and exhaustive algorithm was used to dock each compound of this database as a ligand within the dihydronovobiocin binding site, providing a docking score. Remarkable reliability and consistency between docking scores and the reported binding efficiencies of known ligands were observed, validating the machine learning protocol used in this study. Investigation of the binding mode of the ligands with the best docking scores provides additional insights into possible modes of action of the identified LC3 ligands.
Title: Machine learning approaches for the identification of ligands of the autophagy marker LC3
Pub Date: 2023-10-19 DOI: 10.1016/j.aichem.2023.100020
Nikolai Schapin, Maciej Majewski, Alejandro Varela-Rial, Carlos Arroniz, Gianni De Fabritiis
Machine learning (ML) is a promising approach for predicting small molecule properties in drug discovery. Here, we provide a comprehensive overview of various ML methods introduced for this purpose in recent years. We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss existing popular datasets and molecular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks. We also highlight the challenges of predicting and optimizing multiple properties during the hit-to-lead and lead optimization stages of drug discovery, and briefly explore possible multi-objective optimization techniques that can be used to balance diverse properties while optimizing lead candidates. Finally, techniques for understanding model predictions, especially for critical decision-making in drug discovery, are assessed. Overall, this review provides insights into the landscape of ML models for small molecule property predictions in drug discovery. So far, there are multiple diverse approaches, but their performances are often comparable. Neural networks, while more flexible, do not always outperform simpler models. This shows that the availability of high-quality training data remains crucial for training accurate models, and that there is a need for standardized benchmarks, additional performance metrics, and best practices to enable richer comparisons between the different techniques and models.
Title: Machine learning small molecule properties in drug discovery
Pub Date: 2023-10-18 DOI: 10.1016/j.aichem.2023.100019
Jia Li, Jun Li
The interaction between CO2 and N2, both essential components of the Earth’s atmosphere, plays a crucial role in investigating the greenhouse effect. In this work, we sampled 40,930 data points within the full-dimensional configuration space of CO2 and N2 and performed calculations at the explicitly correlated coupled cluster level with single, double, and perturbative triple excitations and the augmented correlation-consistent polarized valence triple-ζ basis set (CCSD(T)-F12a/AVTZ). To ensure computational accuracy while reducing computational costs, we employed the recently proposed Δ-machine learning (Δ-ML) method based on the Permutation Invariant Polynomial-Neural Network (PIP-NN) for basis set superposition error (BSSE) correction. By leveraging the limited extrapolation capability of the NN, efficient sampling was performed within the existing dataset, enabling the construction of a potential energy surface (PES) incorporating BSSE correction with only a small number of data points for BSSE calculations. A total of approximately 1100 data points were selected from the initial dataset to construct a BSSE correction PES. Using this correction PES, BSSE predictions were carried out for all remaining data points, resulting in the successful development of a high-precision full-dimensional PES with BSSE correction for the CO2 + N2 system. The PIP-NN based Δ-ML method reduced the required BSSE calculations by approximately 97.2%, resulting in a final PES with a fitting error of merely 0.026 kcal/mol.
Title: An accurate full-dimensional interaction potential energy surface of CO2+N2 incorporating Δ-machine learning approach via permutation invariant polynomial-neural network. Authors: Jia Li, Jun Li. Journal: Artificial Intelligence Chemistry. Published: 2023-10-18. DOI: https://doi.org/10.1016/j.aichem.2023.100019. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2949747723000192/pdfft?md5=4f0503b66010517c20f46da9e39da648&pid=1-s2.0-S2949747723000192-main.pdf
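The Δ-ML idea in the CO2 + N2 abstract (compute the expensive correction on a small subset, fit a model to it, then predict the correction for every remaining point) can be illustrated with a toy sketch. This is not the authors' PIP-NN implementation: a one-dimensional coordinate, synthetic energies, and a NumPy polynomial fit stand in for the full-dimensional geometries, the ab initio data, and the neural network. The function name `fit_delta_model` and all numerical constants other than the two dataset sizes are hypothetical. With exactly 1100 of 40,930 points the saved fraction comes out near 97.3%, in line with the approximately 97.2% quoted, since the subset size is itself given only approximately.

```python
import numpy as np

def fit_delta_model(x_subset, delta_subset, degree=4):
    """Fit a polynomial to the correction (a stand-in for the PIP-NN fit)."""
    return np.polynomial.Polynomial.fit(x_subset, delta_subset, degree)

rng = np.random.default_rng(0)
n_total, n_subset = 40930, 1100  # dataset sizes quoted in the abstract

# Toy 1D "geometry" coordinate and synthetic energies (hypothetical, not real data).
x = rng.uniform(1.0, 3.0, n_total)
e_uncorrected = np.exp(-2.0 * (x - 1.0)) - 0.5 / x**2
delta_true = 0.05 * np.exp(-x)  # smooth synthetic stand-in for the BSSE correction

# The expensive correction is "computed" only on a small random subset.
idx = rng.choice(n_total, size=n_subset, replace=False)
delta_model = fit_delta_model(x[idx], delta_true[idx])

# The fitted model predicts the correction for every point in the dataset.
e_corrected = e_uncorrected + delta_model(x)

saved_fraction = 1.0 - n_subset / n_total
max_error = np.max(np.abs(delta_model(x) - delta_true))
print(f"correction calculations avoided: {saved_fraction:.1%}")
print(f"largest error of the predicted correction: {max_error:.2e}")
```

The sketch works only because the synthetic correction is smooth and much smaller than the energy itself, which is the same property that lets a Δ-ML model be trained on far fewer points than the underlying surface.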
Pub Date : 2023-10-16 DOI: 10.1016/j.aichem.2023.100018
Li Wang, Zhendong Li, Jingbai Li
Machine learning photodynamics simulations are revolutionary tools for resolving elusive photochemical reaction mechanisms with time-dependent, high-fidelity structural information. Despite recent advances in neural network (NN) potentials, a general rule for designing training data with Wigner sampling and geometry interpolation to learn photochemical reaction mechanisms is still lacking. We present an in-depth investigation of the relationship between the accuracy of multilayer NNs and the combinations of training data based on Wigner sampling and geometry interpolation, using model photochemical reactions of the [3]-ladderdiene systems. The NNs trained with Wigner sampling data alone show underfitting, with NN errors that increase with structural complexity and diversity. The NNs trained with composite Wigner sampling and geometry interpolation data show errors reduced by one order of magnitude, suggesting that geometry interpolation plays an essential role in helping NNs learn the potential energy surfaces. However, increasing the number of interpolation steps results in overfitting if the Wigner-sampled configuration space is narrowed. Correlating the mean absolute errors (MAE) of the NN-predicted energies for the sampled and out-of-sample structures reveals an optimal combination ratio of 100:10 between the Wigner sampling structures and geometry interpolation steps for 1000 training data, at which the MAE of the sampled structures achieves chemical accuracy while the MAE of the out-of-sample structures is minimized. The NNs trained with the optimally combined data can detect out-of-sample structures in adaptive sampling, with a positive correlation between the maximum standard deviation and the MAE of the predicted energies. Collectively, our findings suggest a general rule for designing the training data for ML photodynamics.
Title: Balancing Wigner sampling and geometry interpolation for deep neural networks learning photochemical reactions. Authors: Li Wang, Zhendong Li, Jingbai Li. Journal: Artificial Intelligence Chemistry. Published: 2023-10-16. DOI: https://doi.org/10.1016/j.aichem.2023.100018. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2949747723000180/pdfft?md5=2cdb8ecc2616508d396111c8c149852d&pid=1-s2.0-S2949747723000180-main.pdf
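One possible reading of the 100:10 ratio in the photodynamics abstract is that 100 sampled structures are combined with 10 interpolation steps to give 1000 training geometries. The sketch below illustrates that reading together with a maximum-standard-deviation check for flagging out-of-sample structures. It rests on several assumptions that are not taken from the paper: Gaussian displacements replace true Wigner sampling, the interpolation is linear in Cartesian coordinates, the two end-point geometries are random placeholders, and the uncertainty is taken as the spread across a committee of models. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms = 4
# Placeholder end-point geometries (hypothetical, not a real molecule).
reactant = rng.normal(size=(n_atoms, 3))
product = reactant + rng.normal(scale=0.5, size=(n_atoms, 3))

def sample_displacements(n_samples, scale=0.05):
    """Gaussian displacements, used here as a simple stand-in for Wigner sampling."""
    return rng.normal(scale=scale, size=(n_samples, n_atoms, 3))

def interpolate(start, end, n_steps):
    """Geometries linearly interpolated from start to end, end points included."""
    w = np.linspace(0.0, 1.0, n_steps)[:, None, None]
    return (1.0 - w) * start + w * end

def build_training_set(n_samples=100, n_steps=10):
    """Apply every sampled displacement to every interpolated geometry."""
    path = interpolate(reactant, product, n_steps)  # (n_steps, n_atoms, 3)
    disp = sample_displacements(n_samples)          # (n_samples, n_atoms, 3)
    combined = path[None, :, :, :] + disp[:, None, :, :]
    return combined.reshape(-1, n_atoms, 3)

def max_energy_std(predictions):
    """Largest standard deviation over states for each structure.

    predictions has shape (n_models, n_structures, n_states).
    """
    return predictions.std(axis=0).max(axis=-1)

training_set = build_training_set()
print(training_set.shape)
```

In an adaptive sampling loop, structures whose `max_energy_std` exceeds a chosen threshold would be sent for new reference calculations; the threshold itself is a tuning choice that the abstract does not specify.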