Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100039
Zhenwu Yang, Yujia Tian, Yue Kong, Yushan Zhu, Aixia Yan
Janus kinase 1 (JAK1) is a key regulator of gene transcription, and inhibition of JAK1 is an intervention for many diseases, including rheumatoid arthritis and Crohn's disease. In this study, we collected a dataset containing 2982 JAK1 inhibitors and characterized the molecules by MACCS fingerprints and Morgan fingerprints. We used support vector machine (SVM), decision tree (DT), random forest (RF) and extreme gradient boosting tree (XGBoost) algorithms to build 16 traditional machine learning classification models. Additionally, we utilized deep neural networks (DNN) to develop four deep learning models. The best model (Model 3B), built with RF and Morgan fingerprints, achieved an accuracy (ACC) of 93.6% and a Matthews correlation coefficient (MCC) of 0.87 on the test set. Furthermore, we carried out structure–activity relationship (SAR) analyses for JAK1 inhibitors based on the output of the random forest models. Analysis of the important keys of the two fingerprint types showed that substructures such as pyrazole, pyrrolotriazolopyrimidine and pyrazolopyrimidine appeared frequently in highly active JAK1 inhibitors.
Title: Classification of JAK1 Inhibitors and SAR Research by Machine Learning Methods
Journal: Artificial intelligence in the life sciences, volume 2, Article 100039
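The two metrics quoted in the JAK1 abstract, accuracy (ACC) and the Matthews correlation coefficient (MCC), can be computed from a binary confusion matrix. The sketch below uses the standard definitions; the labels in `y_true` and `y_pred` are made-up illustration data, not results from the paper, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (assumption): 1 = highly active inhibitor, 0 = weakly active.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("ACC:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the active and inactive classes are imbalanced.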
Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100035
Yu Feng, Yuyao Yang, Wenbin Deng, Hongming Chen, Ting Ran
Target-specific drug design has attracted much attention in drug discovery. However, efficiently exploring the target-focused chemical space remains a great challenge. Fragment-based drug design (FBDD) has shown potential to address this challenge. In this study, we introduce a deep learning-based fragment linking method, SyntaLinker-Hybrid, for target-specific molecular generation. Through transfer learning and fragment hybridization, the method generates a large number of linker fragments that assemble given terminal fragments into molecules with target specificity. This work demonstrates that the method can generate target-specific structures for various targets. We believe that its application could be extended to a broader range of targets.
Title: SyntaLinker-Hybrid: A deep learning approach for target specific drug design
Journal: Artificial intelligence in the life sciences, volume 2, Article 100035
Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100042
Luca Menestrina, Maurizio Recanatini
Drug repurposing consists of identifying additional uses for known drugs and, because these new findings build on previous knowledge, it reduces both the length and the cost of drug development. In this work, we assembled an automated computational pipeline for drug repurposing that also integrates a network-based analysis for screening possible drug combinations. The selection of drugs relies both on their proximity to the disease in the protein-protein interactome and on their influence on the expression of disease-related genes. Combined therapies are then prioritized on the basis of the drugs’ separation in the human interactome and known drug-drug interactions. We ultimately collected a number of molecules, and their plausible combinations, that could be proposed for the treatment of Huntington's disease and multiple sclerosis. Finally, this pipeline could potentially also provide new suggestions for other complex disorders.
Title: An unsupervised computational pipeline identifies potential repurposable drugs to treat Huntington's disease and multiple sclerosis
Journal: Artificial intelligence in the life sciences, volume 2, Article 100042
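The abstract above does not state how drug-disease proximity on the interactome is measured. As an illustration only, the sketch below assumes one commonly used definition, the "closest" distance: for each drug target, take the shortest path length to the nearest disease protein, then average over the targets. The graph `TOY_INTERACTOME`, the node names, and the function names are all hypothetical.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_distance(graph, drug_targets, disease_proteins):
    """Average, over drug targets, of the distance to the nearest disease protein."""
    total = 0
    for target in drug_targets:
        dist = shortest_path_lengths(graph, target)
        reachable = [dist[p] for p in disease_proteins if p in dist]
        if not reachable:
            raise ValueError(f"no disease protein reachable from {target!r}")
        total += min(reachable)
    return total / len(drug_targets)


# Toy undirected interactome (assumption): a simple chain A-B-C-D-E.
TOY_INTERACTOME = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

# Targets A and C lie 3 and 1 hops from the nearest disease protein (D).
print(closest_distance(TOY_INTERACTOME, ["A", "C"], ["D", "E"]))
```

In practice such a raw distance is usually compared against distances for randomly drawn protein sets of matching size and degree, so that highly connected proteins do not dominate the ranking; that normalization is omitted here for brevity.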
Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100049
Denis N. Prada Gori, Lucas N. Alberca, Santiago Rodriguez, Juan I. Alice, Manuel A. Llanos, Carolina L. Bellera, Alan Talevi
Cheminformatics is the chemical field that deals with the storage, retrieval, analysis and manipulation of an increasing volume of available chemical data, and it plays a fundamental role in drug discovery, biology, chemistry, and biochemistry. Open-source and freely available cheminformatics tools contribute not only to the generation of public knowledge but also to reducing the technological gap between high-income and low- to middle-income countries. Here, we describe a series of in-house cheminformatics applications developed by our academic drug discovery team, which are freely available on our website (https://lideb.biol.unlp.edu.ar/) as Web Apps and stand-alone versions. These apps include tools for clustering small molecules, decoy generation, druggability assessment, classificatory model evaluation, and data standardization and visualization.
Title: LIDeB Tools: A Latin American resource of freely available, open-source cheminformatics apps
Journal: Artificial intelligence in the life sciences, volume 2, Article 100049
Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100051
Jürgen Bajorath
Title: Revisiting active learning in drug discovery through open science
Journal: Artificial intelligence in the life sciences, volume 2, Article 100051
Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100045
Satvik Tripathi, Alisha Isabelle Augustin, Adam Dunlop, Rithvik Sukumaran, Suhani Dheer, Alex Zavalny, Owen Haslam, Thomas Austin, Jacob Donchez, Pushpendra Kumar Tripathi, Edward Kim
A growing body of research demonstrates that artificial intelligence and machine learning approaches can provide an essential basis for the drug design and discovery process. Deep learning algorithms are being developed in response to recent advances in computer technology as part of the creation of therapeutically relevant medications for the treatment of a variety of ailments. In this review, we focus on the most recent advances in drug design and discovery research employing generative deep learning methodologies such as generative adversarial network (GAN) frameworks. To begin, we examine drug design and discovery studies that apply several GAN methodologies to one key application, molecular de novo design. Furthermore, we discuss GAN models for dimension reduction of single-cell data at the preclinical stage of the drug development pipeline. We also review various experiments in de novo peptide and protein creation utilizing GAN frameworks. We then discuss the limitations of past drug design and discovery research employing GAN models. Finally, we discuss future research prospects and obstacles.
Title: Recent advances and application of generative adversarial networks in drug discovery, development, and targeting
Journal: Artificial intelligence in the life sciences, volume 2, Article 100045
Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100030
Jürgen Bajorath
Title: AI in Life Science Research – The Road Ahead
Journal: Artificial intelligence in the life sciences, volume 2, Article 100030
Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100044
Rodrigo Ochoa, Ángel Santiago, Melissa Alegría-Arcos
The study of protein-peptide interactions is an active research field from both an experimental and a computational perspective, with the latter presenting challenges in modeling and simulating the peptides' intrinsic flexibility. Predicting affinities towards protein systems of interest, such as proteases, is crucial to understand the specificity of the interactions and support the discovery of novel substrates. Here we provide a set of computational protocols to run structural and dynamical analyses of protein-peptide complexes from a binding perspective. The protocols are based on state-of-the-art methods, but the code is open and can be customized depending on the user's needs. These include a fragment-growing peptide docking protocol to predict bound conformations of flexible peptides, a protocol to extract descriptors from protein-peptide molecular dynamics trajectories, and a workflow to build and test machine learning regression models. As a toy example, we applied the protocols to a serine protease structure with a set of known peptide substrates and random sequences to illustrate the use of the code, which is publicly available at: https://github.com/rochoa85/Protocols-Peptide-Binding
Title: Open protocols for docking and MD-based scoring of peptide substrates
Journal: Artificial intelligence in the life sciences, volume 2, Article 100044
Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100031
Fabio Urbina, Sean Ekins
Anyone involved in designing or finding molecules in the life sciences over the past few years has witnessed a dramatic change in how we now work due to the COVID-19 pandemic. Computational technologies like artificial intelligence (AI) seemed to become ubiquitous in 2020 and have been increasingly applied as scientists worked from home and were separated from the laboratory and their colleagues. This shift may be more permanent, as the future of molecule design across different industries will increasingly require machine learning models for the design and optimization of molecules as they become “designed by AI”. AI and machine learning have essentially become a commodity within the pharmaceutical industry. This perspective will briefly describe our personal opinions of how machine learning has evolved and is being applied to model different molecular properties whose utility crosses industries, which ultimately suggests the potential for tight integration of AI into equipment and automated experimental pipelines. It will also describe how many groups have implemented generative models covering different architectures for de novo design of molecules. We also highlight some of the companies at the forefront of using AI to demonstrate how machine learning has impacted and influenced our work. Finally, we will peer into the future and suggest some of the areas that represent the most interesting technologies that may shape the future of molecule design, highlighting how we can help increase the efficiency of the design-make-test cycle, which is currently a major focus across industries.
Title: The commoditization of AI for molecule design
Journal: Artificial intelligence in the life sciences, volume 2, Article 100031
Pub Date: 2022-12-01 DOI: 10.1016/j.ailsci.2022.100050
James Thompson, W Patrick Walters, Jianwen A Feng, Nicolas A Pabon, Hongcheng Xu, Michael Maser, Brian B Goldman, Demetri Moustakas, Molly Schmidt, Forrest York
While Relative Binding Free Energy (RBFE) calculations have become a mainstay in lead optimization programs, the computational expense of performing these calculations has limited their broader application. Active learning (AL), a machine learning method used to direct a search iteratively, has been used to explore larger chemical libraries with RBFE calculations. While AL has been successfully applied, there has not been a systematic study of the impact of parameter settings on the performance of AL. To address this gap, we have generated an exhaustive dataset of RBFE calculations on 10,000 congeneric molecules. We used this dataset to explore the impact of several AL design choices, including the number of molecules sampled at each iteration, the method used to select an initial sample, the method used to build a machine learning model, and the acquisition function that defines the balance between exploration and exploitation in the search. Our studies demonstrated that the performance of AL is largely insensitive to the specific machine learning method and acquisition functions used. In our studies, the most significant factor impacting performance was the number of molecules sampled at each iteration: selecting too few molecules hurts performance. Under the best conditions, we were able to identify 75% of the 100 top-scoring molecules by sampling only 6% of the dataset. We hope that the dataset of 10K molecules will provide the basis for future studies exploring additional AL strategies. The source code and supporting data for the work are available at https://github.com/google-research/google-research/tree/master/al_for_fep.
Title: Optimizing active learning for free energy calculations
Journal: Artificial intelligence in the life sciences, volume 2, Article 100050
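The active learning loop described in the abstract above (pick an initial sample, fit a model, use an acquisition function to choose the next batch, repeat) can be sketched on toy data. This is not the authors' implementation, which is in the linked repository. Everything here is an assumption for illustration: `oracle` stands in for an expensive RBFE calculation, the surrogate model is a 1-nearest-neighbour lookup, and the acquisition function is purely greedy (exploitation only).

```python
import random


def oracle(x):
    """Hypothetical stand-in for an expensive RBFE calculation (lower is better)."""
    return (x - 0.3) ** 2


def active_learning(pool, n_initial, batch_size, n_rounds, seed=0):
    """Return {pool index: oracle score} for every molecule that was sampled."""
    rng = random.Random(seed)
    # Initial sample: chosen uniformly at random from the pool.
    labeled = {i: oracle(pool[i]) for i in rng.sample(range(len(pool)), n_initial)}
    for _ in range(n_rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break

        def predicted(i):
            # 1-nearest-neighbour surrogate: reuse the score of the closest labeled point.
            nearest = min(labeled, key=lambda j: abs(pool[j] - pool[i]))
            return labeled[nearest]

        # Greedy acquisition: take the batch with the best predicted scores.
        batch = sorted(unlabeled, key=predicted)[:batch_size]
        for i in batch:
            labeled[i] = oracle(pool[i])
    return labeled


def top_k_recall(pool, labeled, k):
    """Fraction of the k truly best-scoring pool members that were sampled."""
    top = sorted(range(len(pool)), key=lambda i: oracle(pool[i]))[:k]
    return sum(1 for i in top if i in labeled) / k


# Toy pool (assumption): 500 "molecules" described by a single feature in [0, 1].
pool = [i / 499 for i in range(500)]
labeled = active_learning(pool, n_initial=10, batch_size=10, n_rounds=5)
print("sampled:", len(labeled), "of", len(pool))
print("top-20 recall:", top_k_recall(pool, labeled, k=20))
```

The `batch_size` argument corresponds to the number of molecules sampled at each iteration, which the abstract identifies as the most influential setting; the recall function mirrors the "fraction of top-scoring molecules found" metric reported there.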