Pub Date: 2023-12-15 DOI: 10.1016/j.aichem.2023.100035
Jinglong Lin, Fanyang Mo
In this review, we explore the integration of intelligent algorithms in chemistry and materials science. We begin by delineating the core principles of machine learning, deep learning, and optimization algorithms, highlighting how they are adapted to these scientific domains. The focus then shifts to the critical processes of data management, including collection, refinement, and feature engineering, alongside strategies for efficient data mining from targeted databases and the literature. Subsequently, we present a concise overview of the diverse applications of these algorithms, emphasizing their transformative impact in both fields. Finally, we discuss the future prospects and challenges of these emerging algorithms.
Title: Empowering research in chemistry and materials science through intelligent algorithms
Journal: Artificial intelligence chemistry
Pub Date: 2023-12-15 DOI: 10.1016/j.aichem.2023.100037
Jin Xiao, YiXiao Chen, LinFeng Zhang, Han Wang, Tong Zhu
In computer-aided drug discovery, accurately determining the structure and properties of drug-like molecules is of utmost importance. This necessitates the use of precise and efficient electronic structure methods. Here, we developed two deep learning-based density functional methods, namely DeePHF and DeePKS, specifically tailored for drug-like molecules. Notably, DeePKS incorporates self-consistency into its framework. With a limited dataset labelled at the CCSD(T)/def2-TZVP level, both models have been able to achieve chemical accuracy in calculating molecular energies and have demonstrated excellent transferability. We anticipate that further advancements in this field will lead to the development of high-quality density functional methods designed specifically for drug discovery purposes. This research showcases the capabilities of deep learning approaches in simplifying the construction complexity associated with traditional DFT methods.
Title: A machine learning-based high-precision density functional method for drug-like molecules
Journal: Artificial intelligence chemistry
Pub Date: 2023-12-12 DOI: 10.1016/j.aichem.2023.100036
Apurba Nandi, Péter R. Nagy
Developing full-dimensional machine-learned potentials with the current “gold-standard” coupled-cluster (CC) level is challenging for medium-sized molecules due to the high computational cost. Consequently, researchers are often bound to use lower-level electronic structure methods such as density functional theory or second-order Møller–Plesset perturbation theory (MP2). Here, we demonstrate on a representative example that gold-standard potentials can now be effectively constructed for molecules of 15 atoms using off-the-shelf hardware. This is achieved by accelerating the CCSD(T) computations via the accurate and cost-effective frozen natural orbital (FNO) approach. The Δ-machine learning (Δ-ML) approach is employed with the use of permutationally invariant polynomials to fit a full-dimensional potential energy surface of the acetylacetone molecule, but any other effective descriptor and ML approach can similarly benefit from the accelerated data generation proposed here. Our benchmarks for the global minima, H-transfer TS, and many high-lying configurations show the excellent agreement of FNO-CCSD(T) results with conventional CCSD(T) while achieving a significant time advantage of about a factor of 30–40. The obtained Δ-ML PES shows high fidelity from multiple perspectives including energetic, structural, and vibrational properties. We obtain the symmetric double well H-transfer barrier of 3.15 kcal/mol in excellent agreement with the direct FNO-CCSD(T) barrier of 3.11 kcal/mol as well as with the benchmark CCSD(F12*)(T+)/CBS value of 3.21 kcal/mol. Furthermore, the tunneling splitting due to H-atom transfer is calculated using a 1D double-well potential, providing improved estimates over previous ones obtained using an MP2-based PES. The methodology introduced here represents a significant advancement in the efficient and precise construction of potentials at the CCSD(T) level for molecules above the current limit of 15 atoms.
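The Δ-ML idea described above can be written as V(corrected) = V(low level) + Δ, where Δ is fitted to the difference between high-level and low-level energies at a small number of geometries. The following is a minimal one-dimensional sketch of that idea, not the authors' implementation: the toy potentials `v_low` and `v_high`, the sampling points, and the single-term correction c·x² (used here in place of permutationally invariant polynomials) are all hypothetical choices made for illustration.

```python
# Toy 1D illustration of Delta-ML; every function and number here is hypothetical.

def v_low(x):
    """Stand-in for a cheap low-level PES (e.g. DFT or MP2), arbitrary units."""
    return x**4 - 2.0 * x**2

def v_high(x):
    """Stand-in for the expensive reference (e.g. CCSD(T)); evaluated at few points only."""
    return x**4 - 2.0 * x**2 + 0.3 * x**2

def fit_delta(points):
    """Closed-form least-squares fit of Delta(x) = c * x**2 to (high - low)."""
    num = sum(x**2 * (v_high(x) - v_low(x)) for x in points)
    den = sum(x**4 for x in points)
    return num / den

def v_corrected(x, c):
    """Delta-ML surface: low-level energy plus the fitted correction."""
    return v_low(x) + c * x**2

sparse_points = [-1.5, -0.5, 0.5, 1.5]   # the only high-level evaluations used
c = fit_delta(sparse_points)
```

In this toy case the correction is exactly representable by the chosen basis, so the corrected surface reproduces the reference away from the training points; a real correction would generally need a richer descriptor and would carry a fitting error.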
Title: Combining state-of-the-art quantum chemistry and machine learning make gold standard potential energy surfaces accessible for medium-sized molecules
Journal: Artificial intelligence chemistry
Pub Date: 2023-12-08 DOI: 10.1016/j.aichem.2023.100033
Sugata Goswami, Silvan Käser, Raymond J. Bemish, Markus Meuwly
The effect of noise in the input data for learning potential energy surfaces (PESs) based on neural networks for chemical applications is assessed. Noise in energies and forces can result from aleatoric and epistemic errors in the quantum chemical reference calculations. Statistical (aleatoric) noise arises, for example, from the need to set convergence thresholds in the self-consistent field (SCF) iterations, whereas systematic (epistemic) noise is due to, inter alia, particular choices of basis sets in the calculations. The two molecules considered here as proxies are H2CO and HONO, which are examples of single- and multi-reference problems, respectively, for geometries around the minimum energy structure. For H2CO it is found that adding noise to energies and forces with magnitudes representative of single-point calculations does not deteriorate the quality of the final PESs, whereas increasing the noise to levels commensurate with electronic structure calculations for more complicated (e.g. metal-containing) systems is expected to have a more notable effect. On the other hand, for HONO, which requires a multi-reference treatment, a clear correlation between model quality and the degree of multi-reference character as measured by the T1 amplitude is found. It is concluded that for chemically “simple” cases the effect of aleatoric and epistemic errors is manageable without evident deterioration of the trained model, but more care needs to be exercised for situations in which multi-reference effects are present.
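A minimal sketch of how the two kinds of error could be injected into reference data before training is given below. It is an illustration under stated assumptions, not the procedure used in the paper: the energies, the noise width `sigma`, and the constant `shift` are hypothetical, and a uniform offset is only a crude stand-in for a systematic (epistemic) error such as a basis-set effect.

```python
import random

def add_aleatoric_noise(values, sigma, rng):
    """Add zero-mean Gaussian noise of width sigma to each value (statistical error)."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def add_epistemic_shift(values, shift):
    """Add the same offset to every value (crude stand-in for a systematic error)."""
    return [v + shift for v in values]

rng = random.Random(42)                  # fixed seed so the perturbation is reproducible
energies = [0.00, 0.12, 0.47, 1.03]      # hypothetical reference energies, kcal/mol
noisy = add_aleatoric_noise(energies, sigma=0.01, rng=rng)
shifted = add_epistemic_shift(energies, shift=0.05)
```

Training one model on `energies` and another on `noisy` or `shifted`, then comparing their test errors, is one simple way to probe how sensitive a fit is to each error type.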
Title: Effects of aleatoric and epistemic errors in reference data on the learnability and quality of NN-based potential energy surfaces
Journal: Artificial intelligence chemistry
Pub Date: 2023-12-07 DOI: 10.1016/j.aichem.2023.100034
Linke He, Yulong Fu, Shaoyi Hou, Guoqiang Wang, Jiabao Zhao, Yipeng Xing, Shuhua Li, Jing Ma
Gaining insights into overarching trends in chemical reaction systems is crucial for refining reaction conditions and developing novel reactions. Such knowledge includes preferences for certain reagents and solvents, and functional group tolerance rules. Traditionally, synthetic chemists have relied on extensive literature searching to acquire this knowledge, a process that is both time-consuming and laborious. To streamline this process, we construct a standardized dataset and knowledge graph on an emerging domain, transition-metal-free transformations with organoborons. The dataset, compiled from organic reaction literature, includes comprehensive details of reaction scopes and conditions. The subsequent construction of a knowledge graph offers a visual representation of the reactions and their interrelationships. Through knowledge graph-based hierarchical analysis and density functional theory (DFT) calculations, we revealed the currently most frequently used reactants, synthetic conditions, and functional group rules in this field. We anticipate this knowledge graph-based approach will accelerate the acquisition and transfer of chemical reaction knowledge, catalyzing the discovery of new reactions. This work provides an automatic and adaptive framework for extracting key insights from reaction datasets to inform the design of novel reactions.
Title: Reaction condition- and functional group-specific knowledge discovery: Data- and computation-based analysis on transition-metal-free transformation of organoborons
Journal: Artificial intelligence chemistry
Pub Date: 2023-11-30 DOI: 10.1016/j.aichem.2023.100030
Eric Paquet, Farzan Soleymani, Gabriel St-Pierre-Lemieux, Herna Lydia Viktor, Wojtek Michalowski
This paper presents a new approach for protein generation based on one-shot learning and hybrid quantum neural networks. Given a single protein complex, the system learns how to predict the remaining unknown properties, without resorting to autoregression, from the physicochemical properties of the receptor and a prior on the physicochemical properties of the ligand. In contrast with other approaches, QuantumBound learns from a single instance rather than from a large dataset, as is common in deep learning. By knowing half of the properties of the ligand, the system can predict the remaining half with an average relative error of 1.43% for a dataset consisting of one hundred and twenty COVID-19 spike complexes. To the best of our knowledge, this is the first time that one-shot learning and hybrid quantum computing have been applied to protein generation.
Title: QuantumBound – Interactive protein generation with one-shot learning and hybrid quantum neural networks
Journal: Artificial intelligence chemistry
Pub Date: 2023-11-29 DOI: 10.1016/j.aichem.2023.100032
Vaneet Saini, Ramesh Kataria, Shruti Rajput
Knowledge of the chemical reactivity of substrates is a prerequisite for accurately designing a chemical reaction; however, obtaining it has been challenging because of slow trial-and-error experimental approaches and the high computational cost of in silico investigations. Artificial intelligence techniques could serve as an alternative for efficiently determining the relative reactivity of chemical entities. Here, we propose an artificial neural network model to predict the bond dissociation energies of hypervalent iodine reagents. An open-source cheminformatics package, Mordred, was employed to calculate various 1D, 2D and topological descriptors. The approach utilizes a dataset of more than 1000 hypervalent iodine reagents, and the bond dissociation energies can be predicted with high accuracy, as indicated by an R² score of 0.97 and a mean absolute error of 1.96 kcal/mol. Owing to its low cost and high efficiency, this machine learning approach can provide an alternative to theoretical and experimental approaches for rationally designing a chemical reaction, without resorting to high-throughput experimentation to reach the desired reaction outcome. To make the model interpretable, a feature importance algorithm was applied, which identified the descriptors contributing most to the model. Features describing electronegativity and polarizability are among the important contributors to the model’s training.
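For reference, the two reported metrics can be computed as in the sketch below, which uses the standard definitions of mean absolute error and the coefficient of determination R². The bond dissociation energies in the example are made-up values for illustration and are not data from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |reference - prediction| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical bond dissociation energies in kcal/mol.
bde_ref = [30.0, 42.5, 55.0, 61.5]
bde_pred = [31.0, 41.5, 56.0, 60.5]

mae = mean_absolute_error(bde_ref, bde_pred)
r2 = r2_score(bde_ref, bde_pred)
```

Note that R² depends on the spread of the reference values, so it should be read together with the MAE rather than on its own.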
Title: A machine learning approach for predicting the reactivity power of hypervalent iodine compounds
Journal: Artificial intelligence chemistry
Pub Date: 2023-11-28 DOI: 10.1016/j.aichem.2023.100031
Shijie Tao, Yi Feng, Wenmin Wang, Tiantian Han, Pieter E.S. Smith, Jun Jiang
Geometric information of molecules is closely related to their properties, and vibrational spectroscopy, as a common and powerful analytical tool for determining molecular structure, can assist in gaining precise geometric information. Traditional methods used to delineate spectrum-structure correlations are often expensive, time-consuming, and require extensive professional expertise. In this work, we used a machine learning protocol to construct a map from spectra to molecular geometric structures, and employed Grad-CAM, a convolutional network interpretation technology, to analyze which kinds of chemical information are important for determining our model’s results. The results obtained for six small molecules of differing structures demonstrate that the model is capable of (1) extracting the crucial spectral features that are vital to downstream tasks without necessitating any manual preprocessing, and (2) enabling retrieval of molecular structural information with high precision.
Title: A machine learning protocol for geometric information retrieval from molecular spectra
Journal: Artificial intelligence chemistry
Pub Date: 2023-11-19 DOI: 10.1016/j.aichem.2023.100029
Beihong Ji, Yuhui Wu, Elena N. Thomas, Jocelyn N. Edwards, Xibing He, Junmei Wang
To accelerate the discovery of novel drug candidates for Coronavirus Disease 2019 (COVID-19) therapeutics, we report a series of machine learning (ML)-based models to accurately predict the anti-SARS-CoV-2 activities of screening compounds. We explored 6 popular ML algorithms in combination with 15 molecular descriptors for molecular structures from 9 screening assays in the COVID-19 OpenData Portal hosted by NCATS. The models constructed by k-nearest neighbors (KNN) using the molecular descriptor GAFF+RDKit achieved the best overall performance, with the highest average accuracy of 0.68 and a relatively high average area under the receiver operating characteristic curve of 0.74, better than the other ML algorithms. Meanwhile, for all assays, the KNN model using the GAFF+RDKit descriptor outperformed those using other descriptors. The overall performance of our models was better than that of REDIAL-2020 (R). A web server (https://clickff.org/amberweb/covid-19-cp) was developed to enable users to predict the anti-SARS-CoV-2 activities of arbitrary compounds using the COVID-19-CP (P) models. Besides the descriptor-based machine learning models, we also developed graph-based Attentive FP (A) models for the 9 assays. We found that the Attentive FP models achieved performance comparable to that of COVID-19-CP and outperformed the REDIAL-2020 models. The consensus prediction utilizing both COVID-19-CP and Attentive FP can significantly boost the prediction accuracy, as assessed by comparing its performance with that of the three individual models (R, P, A) using the Wilcoxon signed-rank test, and can thus ultimately improve the success rate of COVID-19 drug discovery.
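The abstract does not state how the consensus of the two models is formed. The sketch below shows one common possibility, offered purely as an assumption: averaging the predicted activity probabilities of two classifiers and thresholding at 0.5. The probability lists, the labels, and the function names are hypothetical.

```python
def consensus_labels(probs_a, probs_b, threshold=0.5):
    """Average two models' predicted probabilities, then threshold to 0/1 labels."""
    return [1 if (a + b) / 2.0 >= threshold else 0 for a, b in zip(probs_a, probs_b)]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the reference label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical outputs of a descriptor-based model and a graph-based model.
probs_descriptor = [0.9, 0.2, 0.6]
probs_graph = [0.7, 0.4, 0.3]
labels_true = [1, 0, 1]

labels_consensus = consensus_labels(probs_descriptor, probs_graph)
acc = accuracy(labels_true, labels_consensus)
```

Other consensus rules, such as requiring both models to agree before calling a compound active, trade sensitivity for precision and could be substituted in `consensus_labels`.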
{"title":"Predicting anti-SARS-CoV-2 activities of chemical compounds using machine learning models","authors":"Beihong Ji, Yuhui Wu, Elena N. Thomas, Jocelyn N. Edwards, Xibing He, Junmei Wang","doi":"10.1016/j.aichem.2023.100029","DOIUrl":"https://doi.org/10.1016/j.aichem.2023.100029","journal":{"name":"Artificial intelligence chemistry"},"publicationDate":"2023-11-19","publicationTypes":"Journal Article","openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949747723000295/pdfft?md5=6026439e3da02cfb256ffaa4b8f13538&pid=1-s2.0-S2949747723000295-main.pdf"}
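The consensus-and-significance step described in the anti-SARS-CoV-2 abstract above can be illustrated with a short sketch. The abstract does not say how the two models are combined or which per-assay metric enters the Wilcoxon signed-rank test, so the following assumes (hypothetically) that the consensus averages the two models' predicted probabilities and that per-assay accuracies are compared pairwise. All function names and numbers are illustrative and are not taken from the paper. It uses an exact signed-rank test by enumeration, which is practical for 9 paired assays.

```python
from itertools import product


def consensus_probability(p_a, p_b):
    """Average two models' active-class probabilities, compound by compound (assumed rule)."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def accuracy(probs, labels, threshold=0.5):
    """Fraction of compounds whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (W_plus, p_value). Zero differences are dropped, and the p-value
    is obtained by enumerating all 2**n sign assignments of the ranks.
    """
    diffs = [round(a - b, 10) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed + 1e-12:
            hits += 1
    return w_plus, hits / 2 ** len(ranks)


# Toy probabilities for four compounds from two hypothetical models.
labels = [1, 1, 0, 0]
p_model_a = [0.9, 0.4, 0.2, 0.6]
p_model_b = [0.7, 0.7, 0.6, 0.3]
p_consensus = consensus_probability(p_model_a, p_model_b)

# Made-up per-assay accuracies for 9 assays (NOT the paper's results).
acc_consensus = [0.72, 0.70, 0.75, 0.69, 0.74, 0.71, 0.73, 0.68, 0.76]
acc_single = [0.68, 0.69, 0.70, 0.66, 0.71, 0.70, 0.69, 0.67, 0.70]
w_plus, p_value = wilcoxon_signed_rank(acc_consensus, acc_single)
print(accuracy(p_consensus, labels), w_plus, p_value)
```

With larger samples, a library routine such as `scipy.stats.wilcoxon` would be the usual choice. The enumeration here only keeps the sketch dependency-free.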
Pub Date : 2023-11-19 DOI: 10.1016/j.aichem.2023.100028
Oyawale Adetunji Moses , Mukhtar Lawan Adam , Zijian Chen , Collins Izuchukwu Ezeh , Hao Huang , Zhuo Wang , Zixuan Wang , Boyuan Wang , Wentao Li , Chensu Wang , Zongyou Yin , Yang Lu , Xue-Feng Yu , Haitao Zhao
The challenge of data-driven synthesis of advanced nanomaterials can be reduced by using machine learning algorithms to optimize synthesis parameters and expedite the innovation process. In this study, a high-throughput robotic platform was employed to synthesize over 1356 gold nanorods with varying aspect ratios via a seedless approach. The developed models guided us in synthesizing gold nanorods with customized morphology, resulting in highly repeatable morphological yield with quantifiable structure-modulating precursor adjustments. The study provides insight into the dynamic relationships between key structure-modulating precursors and the structural morphology of gold nanorods based on the expected aspect ratio. The gold nanorods fabricated on the high-throughput robotic platform demonstrated precise aspect ratio control when investigated spectrophotometrically and further validated by transmission electron microscopy characterization. These findings demonstrate the potential of high-throughput robot-assisted synthesis and machine learning for optimizing the synthesis of gold nanorods, and they aided the development of models that can guide the synthesis of tailored gold nanorods.
{"title":"Machine learning and robot-assisted synthesis of diverse gold nanorods via seedless approach","authors":"Oyawale Adetunji Moses , Mukhtar Lawan Adam , Zijian Chen , Collins Izuchukwu Ezeh , Hao Huang , Zhuo Wang , Zixuan Wang , Boyuan Wang , Wentao Li , Chensu Wang , Zongyou Yin , Yang Lu , Xue-Feng Yu , Haitao Zhao","doi":"10.1016/j.aichem.2023.100028","DOIUrl":"https://doi.org/10.1016/j.aichem.2023.100028","journal":{"name":"Artificial intelligence chemistry"},"publicationDate":"2023-11-19","publicationTypes":"Journal Article","openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949747723000283/pdfft?md5=8511642b616c7b56dec42d00c89c3ede&pid=1-s2.0-S2949747723000283-main.pdf"}