Title: Experimental Uncertainty in Training Data for Protein-Ligand Binding Affinity Prediction Models
Authors: Carlos A. Hernández-Garrido, Norberto Sánchez-Cruz
Pub Date: 2023-10-04 | DOI: 10.1016/j.ailsci.2023.100087
Journal: Artificial Intelligence in the Life Sciences, Vol. 4, Article 100087

The accuracy of machine learning models for protein-ligand binding affinity prediction depends on the quality of the experimental data they are trained on. Most of these models are trained and tested on different subsets of the PDBbind database, the main public-domain source of protein-ligand complexes with annotated binding affinities. However, estimating its experimental uncertainty is not straightforward, because only a few protein-ligand complexes have more than one associated measurement. In this work, we analyze bioactivity data from ChEMBL to estimate the experimental uncertainty associated with the three binding affinity measures included in PDBbind (Ki, Kd, and IC50), as well as the effect of combining them. The experimental uncertainty of combining these three affinity measures was characterized by a mean absolute error of 0.78 logarithmic units, a root mean square error of 1.04 logarithmic units, and a Pearson correlation coefficient of 0.76. These estimates were contrasted with the performance of state-of-the-art machine learning models for binding affinity prediction, showing that these models tend to be overoptimistic when evaluated on the core set of PDBbind.
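The uncertainty estimates above (MAE, RMSE, and Pearson r between replicate measurements) can be reproduced for any set of paired log-affinity values. A minimal sketch, assuming two aligned arrays of measurements (e.g. pKi values) of the same compound-target pairs from independent experiments; the function name and data are illustrative, not from the paper:

```python
import numpy as np

def replicate_uncertainty(x, y):
    """Given paired log-affinity measurements of the same compound-target
    pairs from independent experiments, return the mean absolute error,
    root mean square error, and Pearson correlation between the two series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    diff = x - y
    mae = np.mean(np.abs(diff))
    rmse = np.sqrt(np.mean(diff ** 2))
    r = np.corrcoef(x, y)[0, 1]
    return mae, rmse, r
```

Applied to all duplicated ChEMBL entries for a given affinity type, these three numbers bound the performance any model can legitimately claim on that data.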
Title: Exploring new horizons: Empowering computer-assisted drug design with few-shot learning
Authors: Sabrina Silva-Mendonça, Arthur Ricardo de Sousa Vitória, Telma Woerle de Lima, Arlindo Rodrigues Galvão-Filho, Carolina Horta Andrade
Pub Date: 2023-09-09 | DOI: 10.1016/j.ailsci.2023.100086
Journal: Artificial Intelligence in the Life Sciences, Vol. 4, Article 100086

Computational approaches, collectively known as Computer-Assisted Drug Design (CADD), have revolutionized the field of drug discovery. Advances in computing power, data generation, digitalization, and artificial intelligence (AI) techniques have played a crucial role in the rise of CADD. These approaches enable the analysis and interpretation of vast amounts of data from diverse sources, such as genomics, structural information, and clinical trial data. By integrating and analyzing these multiple data sources, researchers can efficiently identify potential drug targets and develop new drug candidates. Among AI techniques, machine learning (ML) and deep learning (DL) have shown tremendous promise in drug discovery: ML and DL models can effectively use experimental data to predict the efficacy and safety of drug candidates. Despite these advances, however, certain areas of drug discovery face data scarcity, particularly neglected, rare, and emerging viral diseases. Few-shot learning (FSL) is an emerging approach that addresses the challenge of limited data in drug discovery. FSL enables ML models to learn a new task from a small number of examples, achieving commendable performance by leveraging knowledge learned from related datasets or prior information. It often involves meta-learning, which trains a model to learn how to learn from little data. This ability to adapt quickly to new tasks with little data circumvents the need for extensive training on large datasets. By enabling efficient learning from small amounts of data, few-shot learning has the potential to accelerate the drug discovery process and enhance the success rate of drug development. In this review, we introduce the concept of few-shot learning and its application in drug discovery. Furthermore, we demonstrate the valuable application of few-shot learning in identifying new drug targets, accurately predicting drug efficacy, and designing novel compounds with desired biological properties. This comprehensive review draws on numerous papers from the literature to provide extensive insight into the effectiveness and potential of few-shot learning in these critical areas of drug discovery and development.
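To make the few-shot idea concrete: one widely used metric-based flavor represents each class by the mean of its few labeled "support" examples and classifies new "query" points by the nearest prototype (the core idea behind prototypical networks). This is a generic illustrative sketch, not any specific method from the review:

```python
import numpy as np

def prototype_predict(support_x, support_y, query_x):
    """Nearest-prototype classification: each class is the mean of its
    few support embeddings; queries are assigned to the closest prototype."""
    support_x = np.asarray(support_x, float)
    support_y = np.asarray(support_y)
    query_x = np.asarray(query_x, float)
    classes = np.unique(support_y)
    # One prototype per class, computed from only a handful of examples.
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    # Euclidean distance from every query to every prototype.
    d = np.linalg.norm(query_x[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]
```

With only two labeled examples per class, the classifier already generalizes to nearby queries; in practice the embeddings would come from a network meta-trained across many related tasks.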
Title: Data and code availability requirements in open science and consequences for different research environments
Author: Jürgen Bajorath
Pub Date: 2023-08-23 | DOI: 10.1016/j.ailsci.2023.100085
Journal: Artificial Intelligence in the Life Sciences, Vol. 4, Article 100085

(No abstract available.)
Title: Analysis of Swin-UNet vision transformer for Inferior Vena Cava filter segmentation from CT scans
Authors: Rahul Gomes, Tyler Pham, Nichol He, Connor Kamrowski, Joseph Wildenberg
Pub Date: 2023-08-18 | DOI: 10.1016/j.ailsci.2023.100084
Journal: Artificial Intelligence in the Life Sciences, Vol. 4, Article 100084

Purpose
The purpose of this study is to develop an accurate deep learning model for Inferior Vena Cava (IVC) filter segmentation from CT scans. The study comparatively assesses the impact of Residual Networks (ResNets) combined with reduced convolutional layer depth, and analyzes whether vision transformer architectures can be used without performance degradation.

Materials and Methods
This retrospective experimental study on 84 CT scans comprising 54,618 slices covers the design, implementation, and evaluation of a segmentation algorithm that can be used to generate a clinical report on the presence of IVC filters in abdominal CT scans performed for any reason. Several variants of patch-based 3D Convolutional Neural Networks (CNNs) and the Swin UNet Transformer (Swin-UNETR) are used to retrieve the signature of IVC filters. The Dice score is used as the metric to compare the performance of the segmentation models.

Results
The UNet variant trained with four ResNet layers achieved higher segmentation performance (median Dice = 0.92 [interquartile range (IQR): 0.85, 0.93]) than the plain four-layer UNet (median Dice = 0.89 [IQR: 0.83, 0.92]). The two-layer ResNet variant achieved a median Dice of 0.93 [IQR: 0.87, 0.94], higher than the plain two-layer UNet at a median Dice of 0.87 [IQR: 0.77, 0.90]. Models based on Swin transformers performed significantly better on both training and validation datasets than the four CNN variants. The validation median Dice was highest for the four-layer Swin UNETR at 0.88, followed by the two-layer Swin UNETR at 0.85.

Conclusion
The vision-transformer-based Swin-UNETR produces segmentation output with both low bias and low variance, solving a real-world healthcare problem in advanced Artificial Intelligence (AI) image processing and recognition. The Swin UNETR will reduce the time spent manually tracking IVC filters by centralizing this information within the electronic health record. A link to the GitHub repository is provided in the original article.
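The Dice score used throughout the results above is the overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted and a reference binary mask. A minimal sketch (the function and the toy masks are illustrative, not the study's evaluation code):

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks:
    2 * |A ∩ B| / (|A| + |B|), with eps guarding against empty masks."""
    pred = np.asarray(pred, bool)
    target = np.asarray(target, bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)
```

A Dice of 0.92 thus means the predicted IVC-filter voxels and the ground-truth voxels overlap almost completely; identical masks score 1.0.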
Title: Deep neural network architectures for cardiac image segmentation
Authors: Jasmine El-Taraboulsi, Claudia P. Cabrera, Caroline Roney, Nay Aung
Pub Date: 2023-08-09 | DOI: 10.1016/j.ailsci.2023.100083
Journal: Artificial Intelligence in the Life Sciences, Vol. 4, Article 100083

Imaging plays a fundamental role in the effective diagnosis, staging, management, and monitoring of various cardiac pathologies. Successful radiological analysis relies on accurate image segmentation, a technically arduous process prone to human error. To overcome the laborious and time-consuming nature of cardiac image analysis, deep learning approaches have been developed, enabling accurate, time-efficient, and highly personalised diagnosis, staging, and management of cardiac pathologies.

Here, we present a review of over 60 papers proposing deep learning models for cardiac image segmentation. We summarise the theoretical basis of Convolutional Neural Networks, Fully Convolutional Neural Networks, U-Net, V-Net, No-New-U-Net (nnU-Net), Transformer Networks, DeepLab, Generative Adversarial Networks, Autoencoders, and Recurrent Neural Networks. In addition, we identify pertinent performance-enhancing measures, including adaptive convolutional kernels, atrous convolutions, attention gates, and deep supervision modules.

Top-performing models in ventricular, myocardial, atrial, and aortic segmentation are explored, highlighting U-Net- and nnU-Net-based model architectures that achieve state-of-the-art segmentation accuracy. Additionally, key gaps in current research and technology are identified, and areas of future research are suggested, aiming to guide the innovation and clinical adoption of automated cardiac segmentation methods.
Title: Modeling and survival exploration of breast carcinoma: A statistical, maximum likelihood estimation, and artificial neural network perspective
Authors: Anum Shafiq, Andaç Batur Çolak, Tabassum Naz Sindhu, Showkat Ahmad Lone, Tahani A. Abushal
Pub Date: 2023-07-17 | DOI: 10.1016/j.ailsci.2023.100082
Journal: Artificial Intelligence in the Life Sciences, Vol. 4, Article 100082

The core objective of this research is to describe the behavior of the log-logistic distribution, using maximum likelihood estimation (MLE) to estimate its parameters, and to identify an optimal artificial neural network approach by comparing it with MLE, applying both to real data on breast cancer patients to determine the survival, hazard, and other survival-study functions of the log-logistic distribution. The distribution parameters were defined in the input layer of the artificial neural network developed for the survival analysis, and the reliability function, hazard rate function, probability density function, reversed hazard rate function, Mills ratio, odds function, and cumulative hazard rate (CHR) values were obtained in the output layer. The findings show that the hazard function increases over time and then decreases for the group of breast cancer patients under study, which is consistent with the theoretical properties of the log-logistic distribution. Artificial neural networks, with their high predictive capability, have proven to be an ideal tool for predicting various vital parameters, especially the survival of cancer patients.
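The rise-then-fall hazard observed in the patient data is exactly the shape the log-logistic distribution produces when its shape parameter exceeds 1: with scale α and shape β, the survival function is S(t) = 1 / (1 + (t/α)^β) and the hazard is h(t) = f(t) / S(t). A small illustrative sketch of these functions (parameter values below are arbitrary, not fitted to the study's data):

```python
import numpy as np

def log_logistic(t, alpha, beta):
    """Survival, hazard, and density of the log-logistic distribution
    with scale alpha and shape beta, evaluated at times t."""
    t = np.asarray(t, float)
    z = (t / alpha) ** beta
    surv = 1.0 / (1.0 + z)                                    # S(t)
    pdf = (beta / alpha) * (t / alpha) ** (beta - 1) / (1.0 + z) ** 2
    hazard = pdf / surv                                       # h(t) = f/S
    return surv, hazard, pdf
```

For beta > 1 the hazard is unimodal (it rises, peaks, then declines), matching the clinical pattern reported above; for beta <= 1 it is monotonically decreasing.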
Title: piCRISPR: Physically informed deep learning models for CRISPR/Cas9 off-target cleavage prediction
Authors: Florian Störtz, Jeffrey K. Mak, Peter Minary
Pub Date: 2023-05-15 | DOI: 10.1016/j.ailsci.2023.100075
Journal: Artificial Intelligence in the Life Sciences, Vol. 3, Article 100075

CRISPR/Cas programmable nuclease systems have become ubiquitous in the field of gene editing. As development progresses, applications in in vivo therapeutic gene editing are increasingly within reach, yet they remain limited by possible adverse side effects from unwanted edits. Recent years have thus seen the continuous development of off-target prediction algorithms trained on in vitro cleavage assay data from immortalised cell lines. It has been shown that, in contrast to experimental epigenetic features, computed physically informed features are so far underutilised, despite correlating considerably more strongly with cleavage activity. Here, we implement state-of-the-art deep learning algorithms and feature encodings for off-target prediction, with an emphasis on physically informed features that capture the biological environment of the cleavage site, hence terming our approach piCRISPR. Features were derived from the large, diverse crisprSQL off-target cleavage dataset. We find that our best-performing models highlight the importance of sequence context and chromatin accessibility for cleavage prediction, and compare favourably with standard prediction performance reported in the literature. We further show that our novel, environmentally sensitive features are crucial for accurate prediction on sequence-identical locus pairs, making them highly relevant for clinical guide design. The source code and trained models are available ready to use at github.com/florianst/picrispr.
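As background on what "feature encodings" means here: off-target predictors typically consume an aligned (guide RNA, target DNA) pair encoded numerically, with extra channels appended for epigenetic or physically informed features. The sketch below shows only a generic one-hot pair encoding; it is an illustration of the general technique, not piCRISPR's actual encoding:

```python
import numpy as np

BASES = "ACGT"

def encode_pair(guide, target):
    """One-hot encode an aligned (guide, target) sequence pair into a
    (length, 8) array: channels 0-3 mark the guide base, channels 4-7
    the target base. Mismatched positions show differing channel blocks."""
    assert len(guide) == len(target)
    out = np.zeros((len(guide), 8), dtype=np.float32)
    for i, (g, t) in enumerate(zip(guide.upper(), target.upper())):
        out[i, BASES.index(g)] = 1.0
        out[i, 4 + BASES.index(t)] = 1.0
    return out
```

Environment-sensitive features such as chromatin accessibility would be concatenated as additional per-position channels before the array is fed to the network.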
Title: Trends and challenges in chemoinformatics research in Latin America
Authors: Jazmín Miranda-Salas, Carlos Peña-Varas, Ignacio Valenzuela Martínez, Dionisio A. Olmedo, William J. Zamora, Miguel Angel Chávez-Fumagalli, Daniela Q. Azevedo, Rachel Oliveira Castilho, Vinicius G. Maltarollo, David Ramírez, José L. Medina-Franco
Pub Date: 2023-05-11 | DOI: 10.1016/j.ailsci.2023.100077
Journal: Artificial Intelligence in the Life Sciences, Vol. 3, Article 100077

Chemoinformatics is an independent interdiscipline with broad impact on drug design and discovery, medicinal chemistry, biochemistry, analytical and organic chemistry, natural products, and several other areas of chemistry. Through collaborations, scientific exchanges, and participation in international research networks, Latin American scientists have contributed to the development of this subject. The aim of this perspective is to discuss the status and progress of chemoinformatics in Latin America. We team up to provide the authors' perspective on the topics that have been investigated and published over the past twelve years, collaborations between Latin American researchers and others worldwide, contributions to open-access chemoinformatic tools such as web servers, and education-related resources and events, such as scientific conferences. We conclude that open science and the democratization of science make it possible to link and foster collaboration within each nation, among Latin American nations, and globally. We also outline strategic actions that can boost the development and practice of chemoinformatics in the region and enhance the interaction between Latin American countries and the rest of the world.
Title: Bayesian optimization for ternary complex prediction (BOTCP)
Authors: Arjun Rao, Tin M. Tunjic, Michael Brunsteiner, Michael Müller, Hosein Fooladi, Chiara Gasbarri, Noah Weber
Pub Date: 2023-04-19 | DOI: 10.1016/j.ailsci.2023.100072
Journal: Artificial Intelligence in the Life Sciences, Vol. 3, Article 100072

Proximity-inducing compounds (PICs) are an emergent drug technology in which a protein of interest (POI), often a drug target, is brought into the vicinity of a second protein that modifies the POI's function, abundance, or localisation, giving rise to a therapeutic effect. Among the best-known examples of such compounds are heterobifunctional molecules known as proteolysis targeting chimeras (PROTACs). PROTACs reduce the abundance of the target protein by establishing proximity to an E3 ligase, which labels the protein for degradation via the ubiquitin-proteasomal pathway. Designing PROTACs in silico requires the computational prediction of the ternary complex consisting of the POI, the PROTAC molecule, and the E3 ligase.

We present a novel machine learning-based method for predicting PROTAC-mediated ternary complex structures using Bayesian optimization. We show how a fitness score combining an estimate of protein-protein interactions with PROTAC conformation energy calculations enables sample-efficient exploration of candidate structures. Furthermore, our method introduces two novel scores for filtering and reranking that take PROTAC stability (an AutoDock Vina-based PROTAC stability score) and protein interaction restraints (the TCP-AIR score) into account. We evaluate our method using DockQ scores on a number of available ternary complex structures (including previously unevaluated cases) and demonstrate that, even with a clustering that requires members to have high similarity (i.e., with smaller clusters), we can assign high ranks to those clusters that contain poses close to the experimentally determined native structures of the ternary complexes. We also demonstrate the resulting improved yield of near-native poses in these clusters.
Pub Date : 2023-04-14DOI: 10.1016/j.ailsci.2023.100073
María Andreína Francisco Rodríguez, Jordi Carreras Puigvert, Ola Spjuth
Microplates are indispensable in large-scale biomedical experiments, but the physical location of samples and controls on the microplate can significantly affect the resulting data and quality metric values. We introduce a new method based on constraint programming for designing microplate layouts that reduces unwanted bias and limits the impact of batch effects after error correction and normalisation. We demonstrate that, applied to dose-response experiments, our method leads to more accurate regression curves and lower errors when estimating IC50/EC50, and for drug screening leads to increased precision when compared to random layouts. It also reduces the risk of inflated scores from common microplate quality assessment metrics such as the Z′ factor and SSMD. We make our method available via a suite of tools (PLAID), including a reference constraint model, a web application, and Python notebooks to evaluate and compare designs when planning microplate experiments.
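The constraint-programming idea can be illustrated with a toy backtracking solver. This is not the authors' PLAID reference model: the plate size, treatment set, and the two constraints (no same-treatment adjacency, per-row balance) are purely illustrative stand-ins for the kinds of anti-bias constraints the abstract describes.

```python
from math import ceil

ROWS, COLS = 4, 6                            # a toy 24-well plate
TREATMENTS = "ABCD"                          # four conditions, six replicates each
PER_ROW_CAP = ceil(COLS / len(TREATMENTS))   # balance: at most 2 per row

def ok(layout, r, c, t):
    # Constraint 1: the same treatment may not occupy orthogonally adjacent
    # wells (a crude guard against local spatial bias).
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if layout.get((r + dr, c + dc)) == t:
            return False
    # Constraint 2: each row holds at most PER_ROW_CAP wells of t, spreading
    # every treatment across the plate.
    if sum(1 for cc in range(COLS) if layout.get((r, cc)) == t) >= PER_ROW_CAP:
        return False
    return True

def solve(layout, cells, remaining):
    """Depth-first backtracking; tries the most-needed treatment first,
    a simple value-ordering heuristic that keeps the layout balanced."""
    if not cells:
        return dict(layout)
    (r, c), rest = cells[0], cells[1:]
    for t in sorted(TREATMENTS, key=lambda t: -remaining[t]):
        if remaining[t] > 0 and ok(layout, r, c, t):
            layout[(r, c)] = t
            remaining[t] -= 1
            found = solve(layout, rest, remaining)
            if found is not None:
                return found
            del layout[(r, c)]
            remaining[t] += 1
    return None

cells = [(r, c) for r in range(ROWS) for c in range(COLS)]
plate = solve({}, cells, {t: ROWS * COLS // len(TREATMENTS) for t in TREATMENTS})
```

A real constraint model (the paper's is written for a CP solver) would add constraints for edge effects, control placement, and batch structure, and would let the solver's propagation do the pruning instead of this hand-rolled search.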
{"title":"Designing microplate layouts using artificial intelligence","authors":"María Andreína Francisco Rodríguez, Jordi Carreras Puigvert, Ola Spjuth","doi":"10.1016/j.ailsci.2023.100073","DOIUrl":"https://doi.org/10.1016/j.ailsci.2023.100073","url":null,"abstract":"<div><p>Microplates are indispensable in large-scale biomedical experiments but the physical location of samples and controls on the microplate can significantly affect the resulting data and quality metric values. We introduce a new method based on constraint programming for designing microplate layouts that reduces unwanted bias and limits the impact of batch effects after error correction and normalisation. We demonstrate that our method applied to dose-response experiments leads to more accurate regression curves and lower errors when estimating <span><math><msub><mtext>IC</mtext><mn>50</mn></msub></math></span>/<span><math><msub><mtext>EC</mtext><mn>50</mn></msub></math></span>, and for drug screening leads to increased precision, when compared to random layouts. It also reduces the risk of inflated scores from common microplate quality assessment metrics such as <span><math><msup><mi>Z</mi><mo>′</mo></msup></math></span> factor and SSMD. 
We make our method available via a suite of tools (PLAID) including a reference constraint model, a web application, and Python notebooks to evaluate and compare designs when planning microplate experiments.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"3 ","pages":"Article 100073"},"PeriodicalIF":0.0,"publicationDate":"2023-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49774976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}