Artificial intelligence in the life sciences: Latest Articles

Classification of JAK1 Inhibitors and SAR Research by Machine Learning Methods
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100039
Zhenwu Yang, Yujia Tian, Yue Kong, Yushan Zhu, Aixia Yan

Janus kinase 1 (JAK1) is a key regulator of gene transcription, and inhibition of JAK1 is an intervention for many diseases, including rheumatoid arthritis and Crohn's disease. In this study, we collected a dataset containing 2982 JAK1 inhibitors and characterized the molecules by MACCS fingerprints and Morgan fingerprints. We used support vector machine (SVM), decision tree (DT), random forest (RF) and extreme gradient boosting tree (XGBoost) algorithms to build 16 traditional machine learning classification models. Additionally, we used deep neural networks (DNN) to develop four deep learning models. The best model (Model 3B), built with RF and Morgan fingerprints, achieved an accuracy (ACC) of 93.6% and a Matthews correlation coefficient (MCC) of 0.87 on the test set. Furthermore, we performed structure–activity relationship (SAR) analyses for JAK1 inhibitors based on the output of the random forest models. Analysis of the important keys of the two types of fingerprints showed that substructures such as pyrazole, pyrrolotriazolopyrimidine and pyrazolopyrimidine appeared frequently in highly active JAK1 inhibitors.
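
The two metrics quoted in this abstract, ACC and MCC, can both be computed from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard formulas, not the authors' evaluation code; the function names and the example counts are hypothetical and chosen only for illustration.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical test-set counts (not taken from the paper).
tp, tn, fp, fn = 45, 48, 2, 5
print(f"ACC = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")
```

MCC is often reported next to accuracy because it stays informative when the active and inactive classes are imbalanced, whereas accuracy alone can look high for a model that mostly predicts the majority class.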

Artificial intelligence in the life sciences, Volume 2, Article 100039
Citations: 0
SyntaLinker-Hybrid: A deep learning approach for target specific drug design
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100035
Yu Feng, Yuyao Yang, Wenbin Deng, Hongming Chen, Ting Ran

Target-specific drug design has attracted much attention in drug discovery, but efficiently exploring the target-focused chemical space remains a great challenge. Fragment-based drug design (FBDD) has shown potential for this task. In this study, we introduce a deep learning-based fragment linking method, SyntaLinker-Hybrid, for target-specific molecular generation. Through transfer learning and fragment hybridization, the method generates a large number of linker fragments that assemble given terminal fragments into molecules with target specificity. This work demonstrates that the method can generate target-specific structures for various targets. We believe that its application could be extended to a broader range of targets.

Artificial intelligence in the life sciences, Volume 2, Article 100035
Citations: 0
An unsupervised computational pipeline identifies potential repurposable drugs to treat Huntington's disease and multiple sclerosis
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100042
Luca Menestrina, Maurizio Recanatini

Drug repurposing consists of identifying additional uses for known drugs and, since these new findings build on previous knowledge, it reduces both the length and the cost of drug development. In this work, we assembled an automated computational pipeline for drug repurposing that also integrates a network-based analysis for screening possible drug combinations. The selection of drugs relies both on their proximity to the disease on the protein-protein interactome and on their influence on the expression of disease-related genes. Combined therapies are then prioritized on the basis of the drugs’ separation on the human interactome and the known drug-drug interactions. We eventually collected a number of molecules, and their plausible combinations, that could be proposed for the treatment of Huntington's disease and multiple sclerosis. Finally, this pipeline could potentially also provide new suggestions for other complex disorders.
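
To make the idea of a drug's "proximity to the disease" on an interactome concrete, the sketch below computes one commonly used network measure: the average, over a drug's targets, of the shortest-path distance to the nearest disease protein. This is an assumption made for illustration; the abstract does not state which proximity measure the pipeline uses, and the toy graph, protein names and function names are all hypothetical.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_proximity(graph, drug_targets, disease_proteins):
    """Mean distance from each drug target to its nearest disease protein.

    Targets that cannot reach any disease protein are skipped; if none
    can, the proximity is reported as infinity.
    """
    nearest = []
    for target in drug_targets:
        dist = shortest_path_lengths(graph, target)
        reachable = [dist[p] for p in disease_proteins if p in dist]
        if reachable:
            nearest.append(min(reachable))
    return sum(nearest) / len(nearest) if nearest else float("inf")


# Hypothetical toy interactome: a simple chain A - B - C - D - E.
toy_graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
disease = {"D", "E"}
drugs = {"drug_x": ["C"], "drug_y": ["A", "B"]}

# Rank drugs from closest to farthest from the disease module.
ranking = sorted(drugs, key=lambda d: closest_proximity(toy_graph, drugs[d], disease))
print(ranking)
```

In practice such raw distances are usually compared against randomized target sets to correct for node degree, a step omitted here for brevity.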

Artificial intelligence in the life sciences, Volume 2, Article 100042
Citations: 0
LIDeB Tools: A Latin American resource of freely available, open-source cheminformatics apps
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100049
Denis N. Prada Gori, Lucas N. Alberca, Santiago Rodriguez, Juan I. Alice, Manuel A. Llanos, Carolina L. Bellera, Alan Talevi

Cheminformatics is the field of chemistry that deals with the storage, retrieval, analysis and manipulation of an increasing volume of available chemical data, and it plays a fundamental role in drug discovery, biology, chemistry, and biochemistry. Open-source and freely available cheminformatics tools not only contribute to the generation of public knowledge but also help reduce the technological gap between high-income and low- to middle-income countries. Here, we describe a series of in-house cheminformatics applications developed by our academic drug discovery team, which are freely available on our website (https://lideb.biol.unlp.edu.ar/) as Web Apps and stand-alone versions. These apps include tools for clustering small molecules, decoy generation, druggability assessment, classificatory model evaluation, and data standardization and visualization.

Artificial intelligence in the life sciences, Volume 2, Article 100049
Citations: 3
Revisiting active learning in drug discovery through open science
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100051
Jürgen Bajorath
Artificial intelligence in the life sciences, Volume 2, Article 100051
Citations: 0
Recent advances and application of generative adversarial networks in drug discovery, development, and targeting
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100045
Satvik Tripathi, Alisha Isabelle Augustin, Adam Dunlop, Rithvik Sukumaran, Suhani Dheer, Alex Zavalny, Owen Haslam, Thomas Austin, Jacob Donchez, Pushpendra Kumar Tripathi, Edward Kim

A growing body of research demonstrates that artificial intelligence and machine learning approaches can provide an essential basis for the drug design and discovery process. Enabled by recent advances in computer technology, deep learning algorithms are being developed to help create therapeutically relevant medications for a variety of ailments. In this review, we focus on the most recent advances in drug design and discovery research employing generative deep learning methodologies such as generative adversarial network (GAN) frameworks. To begin, we examine drug design and discovery studies that apply several GAN methodologies to one key application, molecular de novo design. We then discuss GAN models for dimension reduction of single-cell data at the preclinical stage of the drug development pipeline, and we survey experiments in de novo peptide and protein creation using GAN frameworks. We also discuss the limitations of past drug design and discovery research employing GAN models. Finally, we discuss future research prospects and obstacles.

Artificial intelligence in the life sciences, Volume 2, Article 100045
Citations: 1
AI in Life Science Research – The Road Ahead
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100030
Jürgen Bajorath
Artificial intelligence in the life sciences, Volume 2, Article 100030
Citations: 0
Open protocols for docking and MD-based scoring of peptide substrates
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100044
Rodrigo Ochoa, Ángel Santiago, Melissa Alegría-Arcos

The study of protein-peptide interactions is an active research field from both experimental and computational perspectives, with the latter facing the challenge of modeling and simulating the peptides' intrinsic flexibility. Predicting affinities towards protein systems of interest, such as proteases, is crucial to understanding the specificity of the interactions and supporting the discovery of novel substrates. Here we provide a set of computational protocols to run structural and dynamical analyses of protein-peptide complexes from a binding perspective. The protocols are based on state-of-the-art methods, but the code is open and can be customized according to user needs. These include a fragment-growing peptide docking protocol to predict bound conformations of flexible peptides, a protocol to extract descriptors from protein-peptide molecular dynamics trajectories, and a workflow to build and test machine learning regression models. As a toy example, we applied the protocols to a serine protease structure with a set of known peptide substrates and random sequences to illustrate the use of the code, which is publicly available at: https://github.com/rochoa85/Protocols-Peptide-Binding

Artificial intelligence in the life sciences, Volume 2, Article 100044
Citations: 0
The commoditization of AI for molecule design
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100031
Fabio Urbina, Sean Ekins

Anyone involved in designing or finding molecules in the life sciences over the past few years has witnessed a dramatic change in how we work due to the COVID-19 pandemic. Computational technologies like artificial intelligence (AI) seemed to become ubiquitous in 2020 and have been increasingly applied as scientists worked from home, separated from the laboratory and their colleagues. This shift may be more permanent, as the future of molecule design across different industries will increasingly require machine learning models for the design and optimization of molecules as they become “designed by AI”. AI and machine learning have essentially become a commodity within the pharmaceutical industry. This perspective briefly describes our personal opinions of how machine learning has evolved and is being applied to model different molecule properties whose utility crosses industries, and ultimately suggests the potential for tight integration of AI into equipment and automated experimental pipelines. It also describes how many groups have implemented generative models covering different architectures for de novo design of molecules. We also highlight some of the companies at the forefront of using AI to demonstrate how machine learning has impacted and influenced our work. Finally, we peer into the future and suggest some of the areas that represent the most interesting technologies that may shape the future of molecule design, highlighting how we can help increase the efficiency of the design-make-test cycle, which is currently a major focus across industries.

Artificial intelligence in the life sciences, Volume 2, Article 100031
Citations: 5
Optimizing active learning for free energy calculations
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100050
James Thompson, W Patrick Walters, Jianwen A Feng, Nicolas A Pabon, Hongcheng Xu, Michael Maser, Brian B Goldman, Demetri Moustakas, Molly Schmidt, Forrest York

While Relative Binding Free Energy (RBFE) calculations have become a mainstay in lead optimization programs, the computational expense of performing these calculations has limited their broader application. Active learning (AL), a machine learning method used to direct a search iteratively, has been used to explore larger chemical libraries with RBFE calculations. While AL has been successfully applied, there has not been a systematic study of the impact of parameter settings on its performance. To address this gap, we have generated an exhaustive dataset of RBFE calculations on 10,000 congeneric molecules. We used this dataset to explore the impact of several AL design choices, including the number of molecules sampled at each iteration, the method used to select an initial sample, the method used to build a machine learning model, and the acquisition function that defines the balance between exploration and exploitation in the search. Our studies demonstrated that the performance of AL is largely insensitive to the specific machine learning method and acquisition functions used. In our studies, the most significant factor impacting performance was the number of molecules sampled at each iteration, where selecting too few molecules hurts performance. Under the best conditions, we were able to identify 75% of the 100 top scoring molecules by sampling only 6% of the dataset. We hope that the dataset of 10K molecules will provide the basis for future studies exploring additional AL strategies. The source code and supporting data for the work are available at https://github.com/google-research/google-research/tree/master/al_for_fep.
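
The design choices listed in this abstract (initial sample, batch size per iteration, model, acquisition function) and its evaluation metric (fraction of top-scoring molecules recovered) can be illustrated with a deliberately small toy loop. The sketch below is not the authors' code, which is available at the repository linked above. It assumes a synthetic one-dimensional "descriptor", a hypothetical `oracle` function standing in for an expensive RBFE calculation, a k-nearest-neighbour model, and a purely greedy acquisition function.

```python
import random


def oracle(x):
    """Hypothetical stand-in for an expensive calculation; lower is better."""
    return (x - 0.7) ** 2


def knn_predict(x, labeled, k=3):
    """Predict a score as the mean of the k nearest labeled neighbours."""
    nearest = sorted(labeled, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def active_learning(pool, n_initial, batch_size, n_rounds, seed=0):
    """Greedy active learning: each round, label the batch predicted best."""
    rng = random.Random(seed)
    # Initial sample: chosen uniformly at random from the pool.
    chosen = set(rng.sample(range(len(pool)), n_initial))
    for _ in range(n_rounds):
        labeled = [(pool[i], oracle(pool[i])) for i in chosen]
        candidates = [i for i in range(len(pool)) if i not in chosen]
        # Greedy acquisition: rank unlabeled items by predicted score.
        candidates.sort(key=lambda i: knn_predict(pool[i], labeled))
        chosen.update(candidates[:batch_size])
    return chosen


def top_k_recall(pool, chosen, k):
    """Fraction of the true top-k items that were sampled."""
    ranked = sorted(range(len(pool)), key=lambda i: oracle(pool[i]))
    return len(set(ranked[:k]) & chosen) / k


pool_rng = random.Random(42)
pool = [pool_rng.random() for _ in range(500)]
chosen = active_learning(pool, n_initial=10, batch_size=10, n_rounds=4)
recall = top_k_recall(pool, chosen, k=20)
print(f"sampled {len(chosen)} of {len(pool)}; top-20 recall = {recall:.2f}")
```

Changing `batch_size` and `n_rounds` while holding the total budget fixed is the kind of parameter study the abstract describes; swapping the sort key for one that also rewards uncertainty would move the acquisition function from pure exploitation toward exploration.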

虽然相对结合自由能(RBFE)计算已经成为引线优化程序的主要内容,但执行这些计算的计算费用限制了它们的广泛应用。主动学习(AL)是一种用于迭代指导搜索的机器学习方法,已经使用RBFE计算探索了更大的化学库。虽然人工智能已经成功应用,但还没有系统地研究参数设置对人工智能性能的影响。为了解决这一差距,我们生成了一个详尽的数据集,其中包含了10,000个同源分子的RBFE计算。我们使用该数据集来探索几种人工智能设计选择的影响,包括每次迭代时采样的分子数量,用于选择初始样本的方法,用于构建机器学习模型的方法,以及定义搜索中探索和利用之间平衡的获取函数。我们的研究表明,人工智能的性能在很大程度上对所使用的特定机器学习方法和获取函数不敏感。在我们的研究中,影响性能的最重要因素是每次迭代中采样的分子数量,而选择太少的分子会损害性能。在最好的条件下,我们能够通过仅采样数据集的6%来识别100个得分最高的分子中的75%。我们希望10K个分子的数据集将为未来探索其他人工智能策略的研究提供基础。该工作的源代码和支持数据可在https://github.com/google-research/google-research/tree/master/al_for_fep上获得。
Optimizing active learning for free energy calculations
James Thompson, W Patrick Walters, Jianwen A Feng, Nicolas A Pabon, Hongcheng Xu, Michael Maser, Brian B Goldman, Demetri Moustakas, Molly Schmidt, Forrest York
Artificial intelligence in the life sciences, Volume 2, Article 100050
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100050
Citations: 10