
Artificial intelligence in the life sciences: Latest articles

Actively protective combinatorial analysis: A scalable novel method for detecting variants that contribute to reduced disease prevalence in high-risk individuals
Pub Date : 2025-01-31 DOI: 10.1016/j.ailsci.2025.100125
J Sardell, S Das, K Taylor, C Stubberfield, A Malinowski, M Strivens, S Gardner
We present a novel method for routinely identifying disease resilience associations that offers powerful insights for the discovery of a new class of disease protective targets. We show how this can be used to identify mechanisms in the background of normal cellular biology that work to slow or stop progression of complex, chronic diseases.
Actively protective combinatorial analysis identifies combinations of features that contribute to reducing risk of disease in individuals who remain healthy even though their genomic profile suggests that they have high risk of developing disease. These protective signatures can potentially be used to identify novel drug targets, pharmacogenomic and/or therapeutic mRNA opportunities and to better stratify patients by overall disease risk and mechanistic subtype.
We describe the method and illustrate how it offers increased power for detecting disease-associated genetic variants relative to traditional methods. We exemplify this by identifying individuals who remain healthy despite possessing several disease signatures associated with increased risk of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) or amyotrophic lateral sclerosis (ALS). We then identify combinations of SNP-genotypes significantly associated with reduced disease prevalence in these high-risk protected cohorts.
We discuss how actively protective combinatorial analysis generates novel insights into the genetic drivers of established disease biology and detects gene-disease associations missed by standard statistical approaches such as meta-GWAS. The results support the mechanism of action hypotheses identified in our original causative disease analyses. They also illustrate the potential for development of precision medicine approaches that can increase healthspan by reducing the progression of disease.
Volume : 7 Pages : Article 100125
Citations: 0
Explainable artificial intelligence for targeted protein degradation predictions
Pub Date : 2024-12-09 DOI: 10.1016/j.ailsci.2024.100121
Francis J. Prael III , Jutta Blank , William C. Forrester , Lingling Shen , Raquel Rodríguez-Pérez
Defining structure-activity relationships (SAR) is a central task in medicinal chemistry. Apart from optimizing activity against the target of interest, off-target activities and other properties need to be balanced to ensure a suitable property profile, which is an exceptional challenge in drug design. Machine learning (ML) can identify structural patterns in large compound collections that are correlated to biological activity or other molecular properties. Such ML-based SAR modeling has the potential to greatly assist in compound optimization. However, the black-box character of most ML models has limited their use in establishing SAR hypotheses. Explainable ML or, more generally, explainable artificial intelligence (XAI) aims at “opening the black box” by estimating how model inputs – e.g., chemical structures – contribute to model predictions. Although a variety of model interpretation methods have been proposed, XAI for medicinal chemistry is still an active field of research and XAI strategies are dominated by proofs of concept rather than by practical applications in drug discovery programs. Moreover, with the advent of new modalities, the applicability of ML and XAI models remains under-investigated. Herein, we present a novel application of XAI methods to targeted protein degradation (TPD) predictions. We report a case study of ML-based SAR modeling with explainable predictions of Cereblon (CRBN) glues for GSPT1 (G1 to S phase transition 1 protein). We showcase how XAI results were able to mirror expert knowledge based on structural data. Importantly, quantitative evaluations showed the ability of our ML/XAI workflow to accurately describe TPD activity cliffs across different proteins. These findings support the use of the proposed XAI strategy to help rationalize model predictions and illustrate how XAI methods can be exploited to balance SAR across different targets or properties for the new modality of TPDs.
Volume : 7 Pages : Article 100121
Citations: 0
The path to adoption of open source AI for drug discovery in Africa
Pub Date : 2024-12-05 DOI: 10.1016/j.ailsci.2024.100118
Gemma Turon, Miquel Duran-Frigola
Volume : 7 Pages : Article 100118
Citations: 0
Corrigendum to “Modeling PROTAC degradation activity with machine learning” [Artif. Intell. Life Sci. 6 (2024) 100104]
Pub Date : 2024-12-01 DOI: 10.1016/j.ailsci.2024.100114
Stefano Ribes , Eva Nittinger , Christian Tyrchan , Rocío Mercado
PROTACs are a promising therapeutic modality that harnesses the cell’s built-in degradation machinery to degrade specific proteins. Despite their potential, developing new PROTACs is challenging and requires significant domain expertise, time, and cost. Meanwhile, machine learning has transformed drug design and development. In this work, we present a strategy for curating open-source PROTAC data and an open-source deep learning tool for predicting the degradation activity of novel PROTAC molecules. The curated dataset incorporates important information such as pDC50, Dmax, E3 ligase type, POI amino acid sequence, and experimental cell type. Our model architecture leverages learned embeddings from pretrained machine learning models, in particular for encoding protein sequences and cell type information. We assessed the quality of the curated data and the generalization ability of our model architecture against new PROTACs and targets via three tailored studies, which we recommend other researchers to use in evaluating their degradation activity models. In each study, three models predict protein degradation in a majority vote setting, reaching a top test accuracy of 80.8% and 0.865 ROC-AUC, and a test accuracy of 62.3% and 0.604 ROC-AUC when generalizing to novel protein targets. Our results are not only comparable to state-of-the-art models for protein degradation prediction, but also part of an open-source implementation which is easily reproducible and less computationally complex than existing approaches.
Volume : 6 Pages : Article 100114
Citations: 0
Rethinking the 'best method' paradigm: The effectiveness of hybrid and multidisciplinary approaches in chemoinformatics
Pub Date : 2024-12-01 DOI: 10.1016/j.ailsci.2024.100117
José L. Medina-Franco , Johny R. Rodríguez-Pérez , Héctor F. Cortés-Hernández , Edgar López-López
In Chemoinformatics, as in many other computational disciplines, it is common practice to search for the “single best” approach or methodology: for instance, the best fingerprint representation, the best single virtual screening approach or protocol, the optimal representation of the chemical space, or the best predictive model, to name a few. In molecular modeling, a typical example is finding the best docking program. However, it is also known that each approach has its advantages and limitations. There are examples of benchmark studies comparing different approaches to find the most appropriate solution, and it is common to find that there is no single best program in such studies. Yet, searching for the “best” methods is still common. The main goal of this work is to survey hybrid methodologies recently developed in Chemoinformatics. The list of approaches is not exhaustive, but it aims to cover several representative applications. One of the major outcomes of the survey is that, for various purposes, individual methods do not perform as well as combinations of approaches, because every single method has inherent limitations alongside its advantages.
Volume : 6 Pages : Article 100117
Citations: 0
Corrigendum to “Modeling PROTAC degradation activity with machine learning” [Artificial Intelligence in the Life Sciences 6 (2024) 100104]
Pub Date : 2024-12-01 DOI: 10.1016/j.ailsci.2024.100105
Stefano Ribes , Eva Nittinger , Christian Tyrchan , Rocío Mercado
Volume : 6 Pages : Article 100105
Citations: 0
Pharmacological profiles of neglected tropical disease drugs
Pub Date : 2024-10-30 DOI: 10.1016/j.ailsci.2024.100116
Alessandro Greco , Reagon Karki , Yojana Gadiya , Clara Deecke , Andrea Zaliani , Sheraz Gul
According to the World Health Organization, there is a group of 20 diverse infectious Neglected Tropical Disease (NTD) conditions that primarily affect populations in low-income and developing regions. Despite the limited attention and funding compared to other health concerns, significant efforts to develop drugs for treating and controlling NTDs have been made. However, there is room for developing NTD drugs with improved safety, efficacy and ecotoxicological profiles. In order to facilitate this, we have adapted our existing validated data-driven workflows for understanding disease comorbidity to systematically evaluate the approved drugs that target the major World Health Organization-defined NTDs. The foundation for this work comprised assembling the physicochemical, biological and clinical properties of each NTD drug and identifying patterns that reveal the underlying cause of their efficacy and side-effect profiles. Subsequently, computational methods were employed to identify analogs with potentially improved profiles and were validated in a case study focusing on the teratogenic antileishmanial drug miltefosine. The wider impact of NTD drugs with regard to a One Health cross-disciplinary perspective at the human-animal-environment interface is also discussed.
Volume : 6 Pages : Article 100116
Citations: 0
DTA Atlas: A massive-scale drug repurposing database
Pub Date : 2024-10-18 DOI: 10.1016/j.ailsci.2024.100115
Madina Sultanova , Elizaveta Vinogradova , Alisher Amantay , Ferdinand Molnár , Siamac Fazli
The drug development process is costly and time-consuming. Repurposing existing approved drugs, an efficient and cost-effective strategy, involves assessing numerous drug-protein pairs to uncover new interactions. While modern in silico methods enhance scalability, an open database for projected drug-target interactions across the entire human proteome is still lacking. In this work, we introduce an open database of predicted drug-target interactions, termed DTA Atlas, covering the entire human proteome as well as a wide range of marketed drugs, resulting in over 220 million drug-target pairs. The database integrates 4 billion affinity predictions from advanced deep neural networks and offers a user-friendly web interface, enabling users to explore drug-target affinity predictions for the human proteome. To the best of our knowledge, DTA Atlas represents the first comprehensive collection of drug-target binding strength predictions. It is open-source and can serve as an important resource for drug development, drug repurposing, toxicity studies and more.
Volume : 6 Pages : Article 100115
Citations: 0
Modeling PROTAC degradation activity with machine learning
Pub Date : 2024-07-14 DOI: 10.1016/j.ailsci.2024.100104
Stefano Ribes , Eva Nittinger , Christian Tyrchan , Rocío Mercado

PROTACs are a promising therapeutic modality that harnesses the cell’s built-in degradation machinery to degrade specific proteins. Despite their potential, developing new PROTACs is challenging and requires significant domain expertise, time, and cost. Meanwhile, machine learning has transformed drug design and development. In this work, we present a strategy for curating open-source PROTAC data and an open-source deep learning tool for predicting the degradation activity of novel PROTAC molecules. The curated dataset incorporates important information such as pDC50, Dmax, E3 ligase type, POI amino acid sequence, and experimental cell type. Our model architecture leverages learned embeddings from pretrained machine learning models, in particular for encoding protein sequences and cell type information. We assessed the quality of the curated data and the generalization ability of our model architecture against new PROTACs and targets via three tailored studies, which we recommend other researchers to use in evaluating their degradation activity models. In each study, three models predict protein degradation in a majority vote setting, reaching a top test accuracy of 82.6% and 0.848 ROC AUC, and a test accuracy of 61% and 0.615 ROC AUC when generalizing to novel protein targets. Our results are not only comparable to state-of-the-art models for protein degradation prediction, but also part of an open-source implementation which is easily reproducible and less computationally complex than existing approaches.
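The "majority vote setting" mentioned in this abstract can be illustrated with a minimal sketch. It assumes binary active/inactive labels and an odd number of models; the function names and the toy predictions are hypothetical and are not taken from the authors' open-source implementation.

```python
def majority_vote(binary_predictions):
    """Combine 0/1 predictions from an odd number of models into one label."""
    if len(binary_predictions) % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    # The ensemble predicts 1 only if more than half of the models do.
    return int(sum(binary_predictions) > len(binary_predictions) / 2)


def ensemble_accuracy(per_model_predictions, labels):
    """Vote per sample, then score the votes against the true labels.

    per_model_predictions: one list of 0/1 predictions per model,
    all of the same length as `labels`.
    """
    votes = [majority_vote(sample) for sample in zip(*per_model_predictions)]
    correct = sum(vote == label for vote, label in zip(votes, labels))
    return votes, correct / len(labels)


# Toy data (hypothetical): three models, four test compounds.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
true_labels = [1, 0, 1, 0]

votes, accuracy = ensemble_accuracy([model_a, model_b, model_c], true_labels)
print(votes)     # [1, 0, 1, 1]
print(accuracy)  # 0.75
```

With three voters, a single model's error on a given sample is outvoted as long as the other two agree, which is one plausible reason to report ensemble rather than single-model accuracy.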

Volume : 6 Pages : Article 100104
PDF : https://www.sciencedirect.com/science/article/pii/S2667318524000114/pdfft?md5=fbcd6191bbd4f65eeacdd8602953af66&pid=1-s2.0-S2667318524000114-main.pdf
Citations: 0
Machine learning proteochemometric models for Cereblon glue activity predictions
Pub Date : 2024-06-11 DOI: 10.1016/j.ailsci.2024.100100
Francis J. Prael III, Jiayi Cox, Noé Sturm, Peter Kutchukian, William C. Forrester, Gregory Michaud, Jutta Blank, Lingling Shen, Raquel Rodríguez-Pérez

Targeted protein degradation (TPD) is a rapidly developing drug discovery technique with unique efficacy and target scope stemming from its degradation-based activity. Molecular glue degraders are a promising arm of TPD, as evidenced by the FDA-approved therapeutics within this class, the increasing number of degraders in clinical development, and their predisposition to drug-likeness. Cereblon (CRBN) glue degraders mediate target degradation by generating a neomorphic interface between CRBN and a protein of interest. While promising, the complicated nature of this CRBN-glue-target ternary complex makes the rational design of molecular glue degraders challenging. For other drug modalities, predictive modeling has been established to leverage existing activity data and generate quantitative structure-activity relationships (QSAR). However, the applicability of QSAR strategies to glues remains under-investigated. Herein, machine learning methodologies were developed to predict glue-mediated recruitment of CRBN to target proteins, and they achieved promising performance. The generated models leveraged more than a hundred internal screening campaigns across thousands of CRBN glues to predict glue-mediated recruitment of targets to CRBN. Our results show that the recruitment activity of CRBN glue degraders can be modeled by machine learning, with 89% of models producing an area under the receiver operating characteristic curve (ROC AUC) > 0.8 and 70% of models producing a Matthews correlation coefficient (MCC) > 0.2 for these primary screening data. Importantly, our findings also indicate that combining compound and protein descriptors in so-called proteochemometric models improves performance, with more than 80% of the models exhibiting higher ROC AUC and MCC values than per-target models based only on compound information. Hence, our investigations suggest that proteochemometric modeling is a successful approach for molecular glue degraders. The proposed machine learning strategies can aid compound prioritization based on recruitment efficacy and target selectivity, and thus have the potential to facilitate the design and discovery of therapeutic CRBN molecular glues.
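The two metrics reported in the abstract, ROC AUC and MCC, can be computed from first principles as in the sketch below. This is only an illustration under stated assumptions: the labels, scores, and the 0.5 threshold are made up, the function names are hypothetical, and nothing here reflects the authors' actual evaluation code. ROC AUC is computed as the fraction of positive/negative pairs that the scores rank correctly (ties count as half), and MCC is computed from the confusion matrix.

```python
import math


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical screening labels (1 = recruited) and model scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold

print(round(roc_auc(y_true, scores), 3), round(mcc(y_true, y_pred), 3))
```

ROC AUC is threshold-free, while MCC depends on the chosen cut-off and is sensitive to class imbalance, which is typical of primary screening data. Reporting both gives complementary views of model quality.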

Citations: 0