Artificial intelligence in the life sciences: latest articles (page 10)

Revisiting active learning in drug discovery through open science

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100051

Jürgen Bajorath

Citations: 0

Recent advances and application of generative adversarial networks in drug discovery, development, and targeting

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100045

Satvik Tripathi, Alisha Isabelle Augustin, Adam Dunlop, Rithvik Sukumaran, Suhani Dheer, Alex Zavalny, Owen Haslam, Thomas Austin, Jacob Donchez, Pushpendra Kumar Tripathi, Edward Kim

A growing body of research demonstrates that artificial intelligence and machine learning approaches can provide an essential basis for the drug design and discovery process. Enabled by recent advances in computing technology, deep learning algorithms are being developed to help create therapeutically relevant medications for a variety of ailments. In this review, we focus on the most recent advances in drug design and discovery research that employ generative deep learning methodologies such as generative adversarial network (GAN) frameworks. To begin, we examine drug design and discovery studies that apply several GAN methodologies to one key application, molecular de novo design. We then discuss a range of GAN models for dimensionality reduction of single-cell data at the preclinical stage of the drug development pipeline. We also present various experiments in de novo peptide and protein generation using GAN frameworks. Furthermore, we discuss the limitations of past drug design and discovery research employing GAN models. Finally, we discuss future research prospects and obstacles.
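For reference, the GAN frameworks surveyed here build on the standard adversarial objective, in which a generator G maps noise z to candidate samples (for example, molecular representations) and a discriminator D estimates the probability that its input came from the training data. Individual models may modify this loss, so treat it as background rather than as any specific reviewed paper's formulation.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```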

Volume 2, Article 100045

Citations: 1

AI in Life Science Research – The Road Ahead

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100030

Jürgen Bajorath

Citations: 0

Open protocols for docking and MD-based scoring of peptide substrates

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100044

Rodrigo Ochoa, Ángel Santiago, Melissa Alegría-Arcos

The study of protein-peptide interactions is an active research field from both experimental and computational perspectives, the latter facing challenges in modeling and simulating the peptides' intrinsic flexibility. Predicting affinities towards protein systems of interest, such as proteases, is crucial for understanding the specificity of the interactions and supporting the discovery of novel substrates. Here we provide a set of computational protocols for the structural and dynamical analysis of protein-peptide complexes from a binding perspective. The protocols are based on state-of-the-art methods, and the code is open and can be customized to the user's needs. They include a fragment-growing peptide docking protocol to predict the bound conformations of flexible peptides, a protocol to extract descriptors from protein-peptide molecular dynamics trajectories, and a workflow to build and test machine learning regression models. As a toy example, we applied the protocols to a serine protease structure with a set of known peptide substrates and random sequences to illustrate the use of the code, which is publicly available at: https://github.com/rochoa85/Protocols-Peptide-Binding
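The descriptor-extraction and regression steps can be sketched in a few lines. This is a minimal sketch under assumptions, not the repository's code: each trajectory frame is assumed to have already been reduced to a dictionary of numeric descriptors (the names `hbonds` and `contact_area` used in testing are hypothetical), per-complex features are taken as the mean and standard deviation over frames, and a single-descriptor ordinary least-squares fit stands in for the machine learning regression models the protocols actually build.

```python
from statistics import mean, pstdev


def summarize_trajectory(frames):
    """Collapse per-frame descriptors (one dict per MD frame) into mean/std features.

    Descriptor names are whatever keys the frames carry; none are prescribed here.
    """
    features = {}
    for key in frames[0]:
        values = [frame[key] for frame in frames]
        features[f"{key}_mean"] = mean(values)
        features[f"{key}_std"] = pstdev(values)  # population std over frames
    return features


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one descriptor).

    A deliberately simple stand-in for a real ML regression model.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

In practice one would summarize each complex's trajectory, pair one feature (for example a hypothetical `hbonds_mean`) with a measured affinity, and call `fit_line`; richer models would take the whole feature dictionary.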

Volume 2, Article 100044

Citations: 0

The commoditization of AI for molecule design

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100031

Fabio Urbina, Sean Ekins

Anyone involved in designing or finding molecules in the life sciences over the past few years has witnessed a dramatic change in how we now work due to the COVID-19 pandemic. Computational technologies like artificial intelligence (AI) seemed to become ubiquitous in 2020 and were increasingly applied as scientists worked from home, separated from the laboratory and their colleagues. This shift may prove lasting, as molecule design across different industries will increasingly rely on machine learning models for the design and optimization of molecules, which will become “designed by AI”. AI and machine learning have essentially become a commodity within the pharmaceutical industry. This perspective briefly describes our personal opinions of how machine learning has evolved and is being applied to model molecular properties whose utility crosses industries, and ultimately suggests the potential for tight integration of AI into equipment and automated experimental pipelines. It also describes how numerous groups have implemented generative models with different architectures for de novo design of molecules. We also highlight some of the companies at the forefront of using AI to demonstrate how machine learning has influenced our work. Finally, we peer into the future and suggest some of the most interesting technologies that may shape the future of molecule design, highlighting how we can help increase the efficiency of the design-make-test cycle, which is currently a major focus across industries.

Volume 2, Article 100031

Citations: 5

Optimizing active learning for free energy calculations

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100050

James Thompson, W Patrick Walters, Jianwen A Feng, Nicolas A Pabon, Hongcheng Xu, Michael Maser, Brian B Goldman, Demetri Moustakas, Molly Schmidt, Forrest York

While Relative Binding Free Energy (RBFE) calculations have become a mainstay in lead optimization programs, their computational expense has limited their broader application. Active learning (AL), a machine learning method used to direct a search iteratively, has been used to explore larger chemical libraries with RBFE calculations. While AL has been applied successfully, there has been no systematic study of the impact of parameter settings on AL performance. To address this gap, we generated an exhaustive dataset of RBFE calculations on 10,000 congeneric molecules. We used this dataset to explore the impact of several AL design choices, including the number of molecules sampled at each iteration, the method used to select an initial sample, the method used to build a machine learning model, and the acquisition function that defines the balance between exploration and exploitation in the search. Our studies demonstrated that AL performance is largely insensitive to the specific machine learning method and acquisition function used. The most significant factor impacting performance was the number of molecules sampled at each iteration; selecting too few molecules hurt performance. Under the best conditions, we were able to identify 75% of the 100 top-scoring molecules by sampling only 6% of the dataset. We hope that this dataset of 10K molecules will provide the basis for future studies exploring additional AL strategies. The source code and supporting data for the work are available at https://github.com/google-research/google-research/tree/master/al_for_fep.
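The AL design choices listed above can be illustrated with a minimal batch active-learning loop. This is a sketch under stated assumptions, not the authors' code (which lives in the linked repository): each molecule is reduced to a single hypothetical numeric feature, `oracle` stands in for an expensive RBFE calculation, the surrogate model is a k-nearest-neighbour regressor (our substitution), the initial sample is random, and the acquisition function is an upper-confidence-bound score whose `beta` weight trades exploration against exploitation. Higher oracle values are treated as better; for binding free energies one would negate them.

```python
import random
from statistics import mean, pstdev


def knn_predict(x, labeled, k=3):
    """Hypothetical surrogate: mean and spread of the k nearest labeled scores."""
    nearest = sorted(labeled, key=lambda item: abs(item[0] - x))[:k]
    scores = [score for _, score in nearest]
    return mean(scores), pstdev(scores)


def active_learning(pool, oracle, batch_size, n_iterations, beta=0.0, seed=0):
    """Batch AL loop: random initial batch, then UCB-ranked batches.

    `oracle` stands in for an expensive RBFE calculation (assumption);
    higher returned values are treated as better.
    """
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Initial sample: chosen uniformly at random (one of several possible strategies).
    initial, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
    labeled = [(x, oracle(x)) for x in initial]

    def acquisition(x):
        mu, sigma = knn_predict(x, labeled)
        return mu + beta * sigma  # beta = 0: purely greedy exploitation

    for _ in range(n_iterations):
        if not unlabeled:
            break
        unlabeled.sort(key=acquisition, reverse=True)
        # batch_size = number of molecules sampled per iteration.
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled.extend((x, oracle(x)) for x in batch)  # the "expensive" calls
    return labeled
```

For example, `active_learning(range(200), lambda x: -(x - 150) ** 2, batch_size=10, n_iterations=5)` labels 60 of 200 candidates. Varying `batch_size` at a fixed labeling budget is the kind of comparison the study reports as most consequential.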

Volume 2, Article 100050

Citations: 10

Modeling bioconcentration factors in fish with explainable deep learning

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100047

Linlin Zhao, Floriane Montanari, Henry Heberle, Sebastian Schmidt

The Bioconcentration Factor (BCF) is an important parameter in the environmental risk assessment of chemicals, relevant for industrial and academic research as well as required in many regulatory contexts. It represents the potential of a substance to accumulate in organic tissues or whole animals and is most frequently measured in fish. However, animal welfare concerns, throughput limitations, and costs drive the need for alternative methods that allow accurate and reliable in silico estimation of BCF. We present a new deep learning model that predicts BCF values from chemical structures and outperforms currently available models ($R^{2}$ of 0.68 and RMSE of 0.59 log units on an external test set; $R^{2}$ of 0.70 and RMSE of 0.74 log units in a demanding cluster-split validation). The model is based on molecular representations encoded as CDDD descriptors and exploits a large in-house dataset of measured logD values as an auxiliary task.
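The two metrics quoted above can be computed as follows. This sketch assumes $R^{2}$ means the coefficient of determination, 1 - SS_res / SS_tot; some studies instead report the squared Pearson correlation, and the abstract does not say which was used. RMSE is in the same units as the targets, here log BCF units.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the targets (e.g. log BCF units)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For `y_true = [1.0, 2.0, 3.0, 4.0]` and `y_pred = [1.5, 2.0, 2.5, 4.0]`, SS_res is 0.5 and SS_tot is 5.0, so `r_squared` gives 0.9 and `rmse` gives the square root of 0.125 (about 0.354).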

Additionally, we developed a post-hoc explainability method based on SMILES character substitutions to accompany our predictions with atom-level interpretations. These sensitivity scores highlight the most influential moieties in the molecule and can help to understand the predictions better and design new molecules.
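The SMILES-substitution idea can be sketched as a simple perturbation loop: mask one atom character at a time, re-run the model, and record how far the prediction moves. The masking token (`*`, the SMILES wildcard atom), the set of characters treated as atoms, and the toy model in the usage note are assumptions for illustration; the authors' actual substitution scheme and model are not reproduced here, and perturbed strings are not checked for chemical validity.

```python
def sensitivity_scores(smiles, predict, atom_chars="BCNOPSFIbcnops", mask="*"):
    """Return (index, char, |delta prediction|) for each single-letter atom character.

    `predict` is any callable mapping a SMILES string to a number (hypothetical model).
    Bonds, branches, and ring-closure digits are skipped; bracket atoms are not
    handled specially in this simplified sketch.
    """
    baseline = predict(smiles)
    scores = []
    for i, ch in enumerate(smiles):
        if ch not in atom_chars:
            continue
        if smiles[i:i + 2] in ("Cl", "Br"):
            continue  # two-letter halogens are skipped rather than split
        perturbed = smiles[:i] + mask + smiles[i + 1:]
        scores.append((i, ch, abs(predict(perturbed) - baseline)))
    return scores
```

With a hypothetical toy model `toy = lambda s: 0.5 * s.count("C") - 1.0 * s.count("O")`, `sensitivity_scores("CCO", toy)` returns `[(0, "C", 0.5), (1, "C", 0.5), (2, "O", 1.0)]`, flagging the oxygen as the most influential position.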

Volume 2, Article 100047

Citations: 1

Symbolic regression for the interpretation of quantitative structure-property relationships

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100046

Katsushi Takaki, Tomoyuki Miyao

The interpretation of quantitative structure–activity or structure–property relationships is important in the field of chemoinformatics. Although multivariate linear regression models are typically interpretable, they do not generally have high predictive abilities. Symbolic regression (SR) combined with genetic programming (GP) is a well-established technique for generating the mathematical expressions that describe the relationships within a dataset. However, SR sometimes produces complicated expressions that are hard for humans to interpret. This paper proposes a method for generating simpler expressions by incorporating three filters into GP-based SR. The filters are further combined with nonlinear least-squares optimization to give filter-introduced GP (FIGP), which improves the predictive ability of SR models while retaining simple expressions. As a proof-of-concept, the quantitative estimate of drug-likeness and the synthetic accessibility score are predicted based on the chemical structures of compounds. Overall, FIGP generates less-complicated expressions than previous SR methods. In terms of predictive ability, FIGP is better than GP, but is outperformed by a support vector machine with a radial basis function kernel. Furthermore, quantitative structure–activity relationship models are constructed for three matching molecular series with biological targets. In the case of one target, the activity prediction models given by FIGP exhibit better predictive ability than multivariate linear regression and support vector regression with the radial basis function kernel, whereas for the remaining cases, FIGP is slightly less accurate than multivariate linear regression.
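A rough sketch of the filter-then-fit idea, under loudly stated assumptions: expressions are nested tuples over named descriptors, and the three filters below (a size cap, no exp/log nested inside another exp/log, and at least one variable) are hypothetical stand-ins, not FIGP's published filters. Constant fitting is reduced to a closed-form linear least-squares fit of a scale and offset, a simplification of the nonlinear least-squares optimization the abstract describes. A GP engine would call `passes_filters` on each candidate before scoring it.

```python
import math

UNARY = {"exp": math.exp, "log": math.log}
BINARY = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}


def evaluate(expr, row):
    """Evaluate a tuple expression such as ("add", "x", 1.0) on one data row (dict)."""
    if isinstance(expr, str):
        return row[expr]  # variable (descriptor) lookup
    if isinstance(expr, (int, float)):
        return float(expr)  # constant
    op, *args = expr
    values = [evaluate(a, row) for a in args]
    return UNARY[op](*values) if op in UNARY else BINARY[op](*values)


def size(expr):
    """Number of nodes in the expression tree."""
    if isinstance(expr, tuple):
        return 1 + sum(size(a) for a in expr[1:])
    return 1


def has_nested_unary(expr, inside=False):
    """True if an exp/log appears inside another exp/log."""
    if not isinstance(expr, tuple):
        return False
    is_unary = expr[0] in UNARY
    if is_unary and inside:
        return True
    return any(has_nested_unary(a, inside or is_unary) for a in expr[1:])


def uses_variable(expr):
    """True if at least one descriptor name occurs in the expression."""
    if isinstance(expr, str):
        return True
    return isinstance(expr, tuple) and any(uses_variable(a) for a in expr[1:])


def passes_filters(expr, max_size=7):
    """Hypothetical simplicity filters (not FIGP's actual three filters)."""
    return size(expr) <= max_size and not has_nested_unary(expr) and uses_variable(expr)


def fit_scale_offset(expr, rows, ys):
    """Closed-form least squares for y ~ a * expr(x) + b (linear simplification)."""
    fs = [evaluate(expr, r) for r in rows]
    f_bar, y_bar = sum(fs) / len(fs), sum(ys) / len(ys)
    sff = sum((f - f_bar) ** 2 for f in fs)
    sfy = sum((f - f_bar) * (y - y_bar) for f, y in zip(fs, ys))
    a = sfy / sff
    return a, y_bar - a * f_bar
```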

Volume 2, Article 100046

Citations: 1

Interpretation of multi-task clearance models from molecular images supported by experimental design

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100048

Andrés Martínez Mora, Mickael Mogemark, Vigneshwari Subramanian, Filip Miljković

Recent methodological advances in deep learning (DL) architectures have not only improved the performance of predictive models but also enhanced their interpretability potential, thus considerably increasing their transparency. In the context of medicinal chemistry, the potential to not only accurately predict molecular properties, but also chemically interpret them, would be strongly preferred. Previously, we developed accurate multi-task convolutional neural network (CNN) and graph convolutional neural network (GCNN) models to predict a set of diverse intrinsic metabolic clearance parameters from image- and graph-based molecular representations, respectively. Herein, we introduce several model interpretability frameworks to answer whether the model explanations obtained from CNN and GCNN multi-task clearance models could be applied to predict chemical transformations associated with experimentally confirmed metabolic products. We show a strong correlation between the CNN pixel intensities and corresponding clearance predictions, as well as their robustness to different molecular orientations. Using actual case examples, we demonstrate that both CNN and GCNN interpretations frequently complement each other, suggesting their high potential for combined use in guiding medicinal chemistry design.
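To make the reported correlation concrete, one hypothetical way to quantify it is to sum each molecule's attribution (saliency) map into a single intensity and compute Pearson's r against the model's predicted clearance. The map format (a 2-D list of pixel attributions) and the choice of summed intensity are assumptions; the paper's exact procedure may differ.

```python
import math


def total_intensity(saliency_map):
    """Sum all pixel attributions in a 2-D map (list of rows)."""
    return sum(sum(row) for row in saliency_map)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Usage would look like `pearson([total_intensity(m) for m in maps], predicted_clearance)`, where `maps` and `predicted_clearance` are hypothetical per-molecule lists.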

Volume 2, Article 100048

Citations: 0

Deepitope: Prediction of HLA-independent T-cell epitopes mediated by MHC class II using a convolutional neural network

Artificial intelligence in the life sciences

Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100038

Raphael Trevizani , Fábio Lima Custódio

Computational linear T-cell epitope prediction tools can reduce the cost and labor of downstream in vitro testing, but the quality of currently available methods is limited by the scarcity of experimental data and by extensive HLA polymorphism. Prediction quality can, however, be improved by forgoing HLA dependency, which allows all immunogenic sequences to be treated as a single group. This reduces the task to a much simpler two-class classification problem: determining whether a peptide is immunogenic or not. Here, we use a deep convolutional neural network, trained on all peptides deposited in the IEDB, to predict linear T-cell epitope regions in primary structures. We also investigate the use of peptides derived from known human proteins as non-immunogenic counterexamples. We compare our model with a state-of-the-art tool and analyze the benefits of using larger databases. Our results corroborate the usefulness of HLA-free methods for practical applications that require the identification of immunogenic sequences. Deepitope is an open-source project available at https://github.com/raphaeltrevizani/deepitope.
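Dropping HLA dependency turns the task into a single binary classifier over peptide sequences. The sketch below illustrates only the inference side of such a classifier, assuming a one-hot encoding over the 20 standard amino acids, one 1D convolution filter with ReLU, global max pooling, and a sigmoid output. It is not Deepitope's architecture: the filter, weights, and function names are hypothetical and untrained, whereas the real model is trained on IEDB peptides (with human-protein-derived counterexamples) and is available in the linked repository.

```python
import math
from typing import Dict, List

# The 20 standard amino acids; one column per residue in the one-hot encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX: Dict[str, int] = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide: str) -> List[List[float]]:
    """Encode a peptide as a (length x 20) one-hot matrix."""
    rows = []
    for aa in peptide.upper():
        if aa not in INDEX:
            raise ValueError(f"non-standard residue: {aa!r}")
        row = [0.0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_max(x: List[List[float]], kernel: List[List[float]], bias: float) -> float:
    """Slide one (width x 20) filter along the sequence, apply ReLU, then max-pool."""
    width = len(kernel)
    if len(x) < width:
        raise ValueError("peptide is shorter than the filter width")
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting value
    for start in range(len(x) - width + 1):
        z = bias + sum(
            x[start + k][c] * kernel[k][c]
            for k in range(width)
            for c in range(len(AMINO_ACIDS))
        )
        best = max(best, max(z, 0.0))
    return best


def immunogenic_probability(
    peptide: str, kernel: List[List[float]], bias: float, w_out: float, b_out: float
) -> float:
    """Binary classifier head: sigmoid(w_out * pooled_feature + b_out)."""
    pooled = conv1d_max(one_hot(peptide), kernel, bias)
    return 1.0 / (1.0 + math.exp(-(w_out * pooled + b_out)))


# Hypothetical, untrained width-3 filter that responds to leucine-rich windows;
# chosen only for illustration, real weights would be learned from labeled data.
kernel = [[0.0] * len(AMINO_ACIDS) for _ in range(3)]
for k in range(3):
    kernel[k][INDEX["L"]] = 1.0

p = immunogenic_probability("GLLLKAG", kernel, bias=-1.0, w_out=2.0, b_out=-2.0)
print(f"P(immunogenic) = {p:.3f}")
```

Max pooling makes the score depend on the best-matching window rather than on peptide length, which is one common way to let a single classifier handle peptides of varying length.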

Artificial intelligence in the life sciences, Vol. 2, Article 100038 (2022)

Citations: 1