Pub Date: 2025-09-26 | DOI: 10.1186/s13321-025-01087-0
Tong Luo, Zheng Zhang, Xian-gan Chen, Zhi Li
Compared with monotherapy, drug combinations exhibit stronger efficacy, fewer side effects, and lower drug resistance in cancer treatment. However, traditional wet-lab methods for screening synergistic drug combinations are costly and inefficient. Recently, the emergence of multiple drug synergy databases has driven the development of various computational drug synergy prediction methods. Many of these methods use multimodal data and achieve good results. However, if all modalities are given equal consideration without accounting for the differences between their features, multi-modal learning may be less effective. We propose a multi-modal contrastive learning method for drug synergy prediction, named MCDSP. Specifically, MCDSP extracts entity embedding features of drugs and cell lines from heterogeneous graphs, and uses molecular fingerprints and gene expression profiles as the biomolecular features of drugs and cell lines, respectively. These two types of features serve as two modalities. Guided by single-modality prediction tasks, we evaluate the relevant information in each modality. Contrastive learning then reduces the prediction bias between the two modalities, improving the quality of the multi-modal features. Experiments show that MCDSP outperforms baseline methods on large datasets and performs well on unseen drug combinations and cell lines, demonstrating its effectiveness for drug synergy prediction.
Title: Multi-modal contrastive drug synergy prediction model guided by single modality | Journal of Cheminformatics, 17(1) | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01087-0
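The abstract does not state which contrastive objective MCDSP uses. A common choice for aligning two views of the same sample is an InfoNCE-style loss, sketched below under that assumption: row i of each embedding matrix describes the same drug/cell-line sample (a positive pair), and all other rows act as negatives. The function names and the temperature of 0.1 are illustrative choices, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(graph_emb, bio_emb, temperature=0.1):
    """One-directional InfoNCE loss over a batch of paired embeddings.

    Row i of graph_emb (heterogeneous-graph modality) and row i of bio_emb
    (fingerprint / gene-expression modality) describe the same sample and form
    the positive pair; every other row j != i in bio_emb acts as a negative.
    """
    n = len(graph_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(graph_emb[i], bio_emb[j]) / temperature for j in range(n)]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive pair
    return total / n
```

When the two modalities agree (each sample's embeddings point the same way) the loss approaches zero; when samples are mismatched it grows large. A real model would compute this on learned mini-batch embeddings with automatic differentiation; plain Python is used here only for readability.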
Pub Date: 2025-09-26 | DOI: 10.1186/s13321-025-01066-5
Srijit Seal, Maria-Anna Trapotsi, Manas Mahale, Vigneshwari Subramanian, Nigel Greene, Ola Spjuth, Andreas Bender
Drug exposure, a key determinant of drug safety and efficacy, is governed by pharmacokinetic (PK) parameters such as volume of distribution (VDss), clearance (CL), half-life (t½), fraction unbound in plasma (fu), and mean residence time (MRT). In this study, we developed machine learning models to predict human PK parameters for 1,283 unique compounds using molecular structure, physicochemical properties, and predicted animal PK data. Our approach involved a two-stage modeling pipeline. First, we trained models to predict rat, dog, and monkey PK parameters (VDss, CL, fu) from chemical structure and properties for 371 compounds. These models were used to predict animal PK values for 1,283 unique compounds with human PK data. These animal PK predictions were then integrated with molecular descriptors and fingerprints to build Random Forest models for human PK parameters. The models demonstrated consistent performance across nested cross-validation and external validation sets, with predictive accuracy for VDss comparable to proprietary models developed by AstraZeneca. Notably, human VDss and CL predictions achieved external R² values of 0.39 and 0.46, respectively. To support broad accessibility and integration into early drug discovery workflows such as Design-Make-Test-Analyze (DMTA), we developed PKSmart (https://broad.io/PKSmart), a freely available web application. All code and models are also open source, enabling local deployment. To our knowledge, this represents the first public suite of PK prediction models with performance on par with industry standard models.
This study introduces the first publicly available pharmacokinetic (PK) models that match industry-standard predictions, using molecular structural fingerprints, physicochemical properties, and predicted animal PK data to model human pharmacokinetics. Our approach is validated through repeated nested cross-validation and an external test set, including a comparison of predictions with an industry-standard model. The models are released via a web-hosted application (https://broad.io/PKSmart) for wider accessibility and use in drug development.
Title: PKSmart: an open-source computational model to predict intravenous pharmacokinetics of small molecules | Journal of Cheminformatics, 17(1) | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01066-5
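The two-stage idea (predict animal PK from structure, then feed those predictions in as extra features for the human model) can be sketched as simple model stacking. This is a minimal illustration under stated assumptions: a k-nearest-neighbour regressor is swapped in for the Random Forests used by PKSmart so the example stays dependency-free, only one animal endpoint is stacked (the paper uses rat, dog, and monkey VDss, CL, and fu), and all function names are hypothetical. The `r2_score` helper shows how the reported R² values are defined.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Mean target of the k training points nearest to x (Euclidean distance)."""
    ranked = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def two_stage_predict(animal_X, animal_y, human_X, human_y, query_X, k=3):
    """Stage 1: descriptors -> predicted animal PK value.
    Stage 2: descriptors + predicted animal PK -> human PK value."""

    def augment(X):
        # Append the stage-1 animal prediction as an extra feature
        return [list(x) + [knn_predict(animal_X, animal_y, x, k)] for x in X]

    human_aug = augment(human_X)
    query_aug = augment(query_X)
    return [knn_predict(human_aug, human_y, q, k) for q in query_aug]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

The key design point is that the animal models can be trained on a different (here, smaller) compound set than the human model, because stage 2 only consumes their predictions, never measured animal data.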
Pub Date: 2025-09-25 | DOI: 10.1186/s13321-025-01067-4
Jordi Gómez Borrego, Marc Torrent Burgas
Advances in docking protocols have significantly enhanced the field of protein–protein interaction (PPI) modulation, with AlphaFold2 (AF2) and molecular dynamics (MD) refinements playing pivotal roles. This study evaluates the performance of AF2 models against experimentally solved structures in docking protocols targeting PPIs. Using a dataset of 16 interactions with validated modulators, we benchmarked eight docking protocols and found similar performance for native and AF2 models. Local docking strategies outperformed blind docking, with TankBind_local and Glide providing the best results across the structural types tested. MD simulations and other ensemble generation algorithms, such as AlphaFlow, refined both native and AF2 models, improving docking outcomes but showing significant variability across conformations. These results suggest that, while structural refinement can enhance docking in some cases, overall performance appears to be constrained by limitations in scoring functions and docking methodologies. Although protein ensembles can improve virtual screening, predicting the most effective conformations for docking remains a challenge. These findings support the use of AF2-generated structures in docking protocols targeting PPIs and highlight the need to improve current scoring methodologies.
This study provides a systematic benchmark of docking protocols applied to protein–protein interactions (PPIs) using both experimentally solved structures and AlphaFold2 models. By integrating molecular dynamics ensembles and AlphaFlow-generated conformations, we show that structural refinement improves docking outcomes in selected cases, but overall performance remains constrained by docking scoring function limitations. Our analysis shows that AlphaFold2 models perform comparably to native structures in PPI docking, validating their use when experimental data are unavailable. These results establish a reference framework for future PPI-focused virtual screening and underscore the need for improved scoring functions and ensemble-based approaches to better exploit emerging structural prediction tools.
Title: Evaluating ligand docking methods for drugging protein–protein interfaces: insights from AlphaFold2 and molecular dynamics refinement | Journal of Cheminformatics, 17(1) | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01067-4
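Ensemble docking needs a rule for combining scores from several receptor conformations (MD snapshots, AlphaFlow models) and a metric for judging the resulting ranking. The abstract specifies neither, so the sketch below assumes two common choices: keep each ligand's best (lowest) score across conformers, and measure early recognition of known modulators with an enrichment factor. Function names and the 1% default cutoff are illustrative assumptions, not the study's protocol.

```python
def ensemble_best_scores(scores_by_conformer):
    """Collapse per-conformer docking scores into one score per ligand.

    scores_by_conformer maps ligand id -> list of scores, one per receptor
    conformer (e.g. MD snapshots or AlphaFlow models). Lower (more negative)
    is assumed to be better, as in most docking programs.
    """
    return {lig: min(scores) for lig, scores in scores_by_conformer.items()}


def enrichment_factor(scores, actives, fraction=0.01):
    """Hit rate among the top-ranked `fraction` of ligands divided by the overall hit rate."""
    ranked = sorted(scores, key=scores.get)  # best (lowest) score first
    n_actives = sum(1 for lig in ranked if lig in actives)
    if n_actives == 0:
        raise ValueError("no actives among the scored ligands")
    n_top = max(1, round(len(ranked) * fraction))
    hits_top = sum(1 for lig in ranked[:n_top] if lig in actives)
    return (hits_top / n_top) / (n_actives / len(ranked))
```

Taking the minimum over conformers rewards any conformation that happens to fit a ligand well, which is one reason ensemble results can vary strongly with the particular conformations included.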
Pub Date: 2025-09-25 | DOI: 10.1186/s13321-025-01084-3
Fabian Liessmann, Paul Eisenhuth, Alexander Fürll, Oanh Vu, Rocco Moretti, Jens Meiler
In this study, we present a pipeline for identifying novel ligands targeting the Tryptophan-Aspartate-Repeat domain 40 (WDR40) of Leucine-Rich Repeat Kinase 2 (LRRK2), a protein associated with Parkinson’s disease, as part of the first Critical Assessment of Computational Hit-finding Experiments (CACHE) challenge, a blind benchmark experiment for drug discovery. Mutations in this protein are the most common genetic cause of familial Parkinson’s disease, yet this target remains understudied. We conducted an ultra-large library screening (ULLS) of the Enamine REAL space using a newly developed evolutionary algorithm, RosettaEvolutionaryLigand (REvoLd), which enables efficient screening of combinatorial compound libraries. The protocol involved refining the target structure with molecular dynamics simulations, identifying a binding site via blind docking, and optimizing compounds with REvoLd, followed by manual selection among the top-scoring REvoLd hits. This first round identified a single binder, derived from the combination of two Enamine building blocks. In the second round, derivatives of this hit were used as input for REvoLd to sample further within the Enamine REAL space. In total, five molecules were identified, three of which show a measurable dissociation constant (KD) below 150 μM, showcasing the effectiveness of this approach. However, the campaign also highlighted shortcomings, such as the preference of the RosettaLigand scoring function for nitrogen-rich rings.
We introduce the first real-world application of REvoLd, an evolutionary docking algorithm that enables efficient ultra-large library screening for flexible protein targets. Our approach identified novel binders for the WDR40 domain of LRRK2 within CACHE challenge #1, representing the first prospective validation of REvoLd. We present a preparation pipeline that allows exploration of a large protein pocket with non-specific binding areas; unlike prior brute-force docking efforts, our method integrates receptor flexibility and combinatorial chemistry optimization.
Title: Cache: Utilizing ultra-large library screening in Rosetta to identify novel binders of the WD-repeat domain of Leucine-Rich Repeat Kinase 2 | Journal of Cheminformatics, 17(1) | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01084-3
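The reason an evolutionary algorithm suits a combinatorial library is that each product is defined by its building blocks, so selection, crossover, and mutation can operate on block indices while only a small fraction of the space is ever scored. The toy sketch below illustrates that general idea only; it is not REvoLd's implementation. The two-block encoding, truncation selection, operator probabilities, and the `score` callback (a stand-in for a RosettaLigand docking score) are all assumptions made for illustration.

```python
import random


def evolve(n_a, n_b, score, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Toy evolutionary search over a two-component combinatorial library.

    A product is a pair (i, j): building block i from list A combined with
    building block j from list B. `score` stands in for a docking score
    (lower is better). Returns (best pair, its score, number of pairs scored).
    """
    rng = random.Random(seed)
    population = [(rng.randrange(n_a), rng.randrange(n_b)) for _ in range(pop_size)]
    cache = {}

    def fitness(ind):
        # Memoise scores so each product is "docked" only once
        if ind not in cache:
            cache[ind] = score(ind)
        return cache[ind]

    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # truncation selection: keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            child = (p1[0], p2[1])  # crossover: block A from one parent, block B from the other
            if rng.random() < mutation_rate:  # mutation: swap in a random building block
                if rng.random() < 0.5:
                    child = (rng.randrange(n_a), child[1])
                else:
                    child = (child[0], rng.randrange(n_b))
            children.append(child)
        population = parents + children

    best = min(population, key=fitness)
    return best, fitness(best), len(cache)
```

Because the best half always survives, the best score never gets worse between generations, and the third return value shows how many of the `n_a * n_b` possible products were actually evaluated.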
Pub Date: 2025-09-23 | DOI: 10.1186/s13321-025-01100-6
Alec Lamens, Jürgen Bajorath
The concept of contrastive explanations, which originates from human reasoning, is used in explainable artificial intelligence. In machine learning, contrastive explanations relate alternative prediction outcomes to each other by identifying the features that lead to opposing model decisions. We introduce a methodological framework for deriving contrastive explanations for machine learning models in chemistry, which systematically generates intuitive explanations of predictions in high-dimensional feature spaces. The molecular contrastive explanations (MolCE) methodology explores alternative model decisions by generating virtual analogues of test compounds through replacement of molecular building blocks, and quantifies the degree of “contrastive shifts” resulting from changes in model probability distributions. In a proof-of-concept study, MolCE was applied to explain selectivity predictions for ligands of D2-like dopamine receptor isoforms.
Title: Contrastive explanations for machine learning predictions in chemistry | Journal of Cheminformatics, 17(1) | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01100-6
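The abstract describes MolCE as quantifying "contrastive shifts" from changes in a model's class-probability distribution, but does not give the formula. The sketch below uses one plausible, hypothetical definition: the change in the probability margin between a "foil" class (the alternative outcome of interest) and the "fact" class (the original prediction) when a building block is replaced. In a selectivity setting the class indices might stand for D2-like isoforms such as D2, D3, and D4; the analogue generation step is not shown.

```python
def contrastive_shift(p_orig, p_analogue, fact, foil):
    """Shift of one virtual analogue from the 'fact' class towards the 'foil' class.

    p_orig, p_analogue: class-probability vectors (same class order) from a classifier
    fact: index of the class predicted for the original test compound
    foil: index of the contrasting class of interest
    Returns (shift, flipped): shift is the change in the foil-minus-fact probability
    margin; flipped is True if the analogue's top-scoring class is the foil.
    """
    margin_orig = p_orig[foil] - p_orig[fact]
    margin_new = p_analogue[foil] - p_analogue[fact]
    top_class = max(range(len(p_analogue)), key=lambda c: p_analogue[c])
    return margin_new - margin_orig, top_class == foil


def rank_analogues(p_orig, analogues, fact, foil):
    """Order analogue names (dict name -> probability vector) by decreasing shift."""
    shifts = {name: contrastive_shift(p_orig, p, fact, foil)[0] for name, p in analogues.items()}
    return sorted(shifts, key=shifts.get, reverse=True)
```

Ranking analogues this way points to the building-block replacements that move the model most strongly towards the alternative decision, which is the kind of "why this class and not that one" answer a contrastive explanation aims to give.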
Pub Date: 2025-09-23 | DOI: 10.1186/s13321-025-01094-1
Kohulan Rajan, Venkata Chandrasekhar, Nisha Sharma, Sri Ram Sagar Kanakam, Felix Baensch, Christoph Steinbeck
The widespread adoption of open-source cheminformatics toolkits remains constrained by technical implementation barriers, including complex installation procedures, dependency management, and integration challenges. Here, we present Cheminformatics Microservice V3, a significant update to the existing platform that provides unified programmatic access to cheminformatics libraries, including RDKit, Chemistry Development Kit (CDK), and Open Babel through a RESTful API framework. This latest version features a newly developed, interactive web-based frontend built with React, providing users with an intuitive graphical interface for manipulating and analysing chemical structures. The frontend supports essential cheminformatics operations, including structure editing, PubChem database integration, batch molecular processing, and standardised InChI/RInChI identifier generation. The microservice V3 addresses critical accessibility barriers in computational chemistry by providing researchers with immediate access to analytical tools, eliminating the need for specialised technical expertise or complex software installations. This approach facilitates reproducible research workflows and broadens the utilisation of cheminformatics methodologies across interdisciplinary research communities. The platform is publicly accessible at https://app.naturalproducts.net, and the complete source code and documentation are available on GitHub.
Title: Cheminformatics Microservice V3: a web portal for chemical structure manipulation and analysis | Journal of Cheminformatics, 17(1) | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01094-1
Pub Date: 2025-09-19 | DOI: 10.1186/s13321-025-01093-2
Nour H. Marzouk, Sahar Selim, Mustafa Elattar, Mai S. Mabrouk, Mohamed Mysara
In drug development, managing interactions such as drug–drug, drug–disease, and drug–nutrient interactions is critical for ensuring the safety and efficacy of pharmacological treatments. These interactions often overlap, forming a complex, interconnected landscape that requires accurate prediction to improve patient outcomes and support evidence-based care. Recent advances in artificial intelligence (AI), powered by large-scale datasets (e.g., DrugBank, TWOSIDES, SIDER), have significantly enhanced interaction prediction. Machine learning, deep learning, and graph-based models show great promise, but challenges persist, including data imbalance, noisy sources, limited explainability, and underrepresentation of certain types of interactions. This systematic review of 147 studies (2018–2024) is the first to comprehensively map AI applications across major interaction types. We present a detailed taxonomy of models and datasets, emphasizing the growing roles of large language models and knowledge graphs in overcoming key limitations. Integrating these approaches with explainable AI tools enhances transparency, paving the way for AI-driven systems that proactively mitigate adverse interactions. By identifying the most promising approaches and critical research gaps, this review lays the groundwork for more robust, interpretable, and personalized models for drug interaction prediction.
Title: A comprehensive landscape of AI applications in broad-spectrum drug interaction prediction: a systematic review | Journal of Cheminformatics, 17(1) | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01093-2
Pub Date: 2025-09-03 | DOI: 10.1186/s13321-025-01089-y
Jun Hyeong Park, Ri Han, Junbo Jang, Jisan Kim, Joonki Paik, Jaesung Heo, Yoonji Lee
The metabolic stability of a drug is a crucial determinant of its pharmacokinetic properties, including clearance, half-life, and oral bioavailability. Accurate predictions of metabolic stability can significantly streamline the drug discovery process. In this study, we present MetaboGNN, a model for predicting liver metabolic stability based on Graph Neural Networks (GNNs) and Graph Contrastive Learning (GCL). Using a high-quality dataset from the 2023 South Korea Data Challenge for Drug Discovery, which comprises 3,498 training molecules and 483 test molecules, we represented molecular structures as graphs to capture the intricate structural relationships that influence metabolic stability. A GCL-driven pretraining step was employed to enhance model generalizability by learning robust, transferable graph-level representations. Notably, incorporating interspecies differences between human liver microsomes (HLM) and mouse liver microsomes (MLM) further improved predictive accuracy, achieving Root Mean Square Error (RMSE) values of 27.91 (HLM) and 27.86 (MLM), both expressed as the percentage of parent compound remaining after a 30-min incubation. Compared with traditional approaches, MetaboGNN demonstrates superior predictive performance and highlights the importance of considering interspecies enzymatic variations. In addition, attention-based analysis identified key molecular fragments associated with metabolic stability, highlighting chemically meaningful structural determinants. These findings establish MetaboGNN as a powerful tool for metabolic stability prediction, supporting more efficient lead optimization in drug discovery.
{"title":"MetaboGNN: predicting liver metabolic stability with graph neural networks and cross-species data","authors":"Jun Hyeong Park, Ri Han, Junbo Jang, Jisan Kim, Joonki Paik, Jaesung Heo, Yoonji Lee","doi":"10.1186/s13321-025-01089-y","DOIUrl":"10.1186/s13321-025-01089-y","url":null,"abstract":"<div><p>The metabolic stability of a drug is a crucial determinant of its pharmacokinetic properties, including clearance, half-life, and oral bioavailability. Accurate predictions of metabolic stability can significantly streamline the drug discovery process. In this study, we present <i>MetaboGNN</i>, an advanced model for predicting liver metabolic stability based on Graph Neural Networks (GNNs) and Graph Contrastive Learning (GCL). Using a high-quality dataset from the 2023 South Korea Data Challenge for Drug Discovery, which comprises 3,498 training molecules and 483 test molecules, we presented molecular structures as graphs to capture the intricate structural relationships that influence metabolic stability. A GCL-driven pretraining step was employed to enhance model generalizability by learning robust, transferable graph-level representations. Notably, incorporating interspecies differences between human liver microsomes (HLM) and mouse liver microsomes (MLM) further improved predictive accuracy, achieving Root Mean Square Error (RMSE) values of 27.91 (HLM) and 27.86 (MLM), both expressed as the percentage of parent compound remaining after a 30-min incubation. Compared to traditional approaches, <i>MetaboGNN</i> demonstrates superior predictive performance and highlights the importance of considering interspecies enzymatic variations. In addition, attention-based analysis identified key molecular fragments associated with metabolic stability, highlighting chemically meaningful structural determinants. 
These findings establish <i>MetaboGNN</i> as a powerful tool for metabolic stability prediction, supporting more efficient lead optimization processes in drug discovery.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":5.7,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01089-y","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-03, DOI: 10.1186/s13321-025-01047-8
Nam-Chul Cho, SeongEun Hong, Jin Sook Song, EuiJu Yeo, SoI Jung, Yuno Lee, Seul Gee Hwang, Su Min Kang, JaeSung Hwang, Tae-Eun Jin
The Korea Chemical Bank (KCB) has generated a dataset containing metabolic stability data for approximately 4,000 compounds tested on human and mouse liver microsomes. The first South Korean data challenge for drug discovery, named the Jump AI Challenge for Drug Discovery (JUMP AI 2023), was launched in 2023 using this KCB metabolic stability data. The objective of JUMP AI 2023 was to promote and encourage the use of artificial intelligence (AI) in new drug development in South Korea. A total of 1,254 teams participated, developing algorithms to estimate the percentage of each compound remaining after 30 min of incubation with human and mouse liver microsomes. The dataset comprised training and test sets of 3,498 and 483 compounds, respectively. This paper provides an overview of JUMP AI 2023 and its outcomes, highlighting the diverse algorithms and AI techniques employed by the competing teams. Among them, five teams that used GNN-based approaches stood out and won awards. As the first AI competition for drug discovery in South Korea, it attracted numerous researchers and played a key role in promoting the application of AI to drug research.
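The challenge target is the percentage of parent compound remaining after a 30-min incubation, predicted separately for human and mouse microsomes. The sketch below shows one way such a target could be computed from parent-compound concentrations and how predictions could be scored per species. Two assumptions apply: the concentration-ratio definition of "percent remaining", and the use of RMSE (the metric MetaboGNN reports on this dataset) as the scoring function, since the abstract does not state the official challenge metric. All numbers are made up.

```python
import math
from typing import Sequence

def percent_remaining(conc_t0: float, conc_t30: float) -> float:
    """Parent compound left after incubation, as a percentage of the t=0 level."""
    return 100.0 * conc_t30 / conc_t0

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, expressed in the same units as the targets."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical HLM measurements for two compounds (arbitrary concentration units).
hlm_measured = [percent_remaining(2.0, 1.5), percent_remaining(2.0, 0.4)]  # [75.0, 20.0]
hlm_predicted = [70.0, 26.0]
print(rmse(hlm_measured, hlm_predicted))  # ≈ 5.52 percentage points
```

Scoring HLM and MLM separately keeps the two endpoints interpretable on their own, which matters when a model exploits interspecies differences as MetaboGNN does.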
{"title":"The first South Korean data challenge for drug discovery using human and mouse liver microsomal stability data","authors":"Nam-Chul Cho, SeongEun Hong, Jin Sook Song, EuiJu Yeo, SoI Jung, Yuno Lee, Seul Gee Hwang, Su Min Kang, JaeSung Hwang, Tae-Eun Jin","doi":"10.1186/s13321-025-01047-8","DOIUrl":"10.1186/s13321-025-01047-8","url":null,"abstract":"<div><p>The Korea Chemical Bank (KCB) has generated a dataset containing metabolic stability data for approximately 4,000 compounds that have been tested on human and mouse liver microsomes. The first South Korea Data Challenge, named the Jump AI Challenge for Drug Discovery (JUMP AI 2023), was opened using the metabolic stability data of KCB in 2023. The objective of the JUMP AI 2023 was to promote and encourage the development of new drugs using artificial intelligence (AI) technology in South Korea. A total of 1254 teams participated in the competition, developing algorithms to estimate the remaining percentage of compounds after 30 min of incubation with human and mouse liver microsomes. The data set comprised training and test sets of 3498 and 483 compounds, respectively. This paper provides an overview of the JUMP AI 2023 and its outcomes, highlighting the diverse range of algorithms and artificial intelligence technologies employed by the competing teams. Among these, five teams stood out by utilizing GNN-based approaches winning awards. 
This competition was the first AI competition for drug discovery in South Korea, attracting numerous researchers and playing a key role in promoting drug research through the application of artificial intelligence technologies.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":5.7,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01047-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01, DOI: 10.1186/s13321-025-01086-1
Adel Memariani, Martin Glauer, Simon Flügel, Fabian Neuhaus, Janna Hastings, Till Mossakowski
Deriving symbolic knowledge from trained deep learning models is challenging due to the lack of transparency in such models. A promising approach to address this issue is to couple a semantic structure with the model outputs and thereby make the model interpretable. In prediction tasks such as multi-label classification, labels tend to form hierarchical relationships. Therefore, we propose enforcing a taxonomic structure on the model’s outputs throughout the training phase. In vector space, a taxonomy can be represented using axis-aligned hyper-rectangles, or boxes, which may overlap or nest within one another. The boundaries of a box determine the extent of a particular category. Thus, we used box-shaped embeddings of ontology classes to learn and transparently represent logical relationships that are only implicit in multi-label datasets. We assessed our model by measuring its ability to approximate the full set of inferred subclass relations in the ChEBI ontology, an important knowledge base in the life sciences. We demonstrate that our model captures implicit hierarchical relationships among labels, ensuring consistency with the underlying ontological conceptualization, while also achieving state-of-the-art performance in multi-label classification. Notably, this is accomplished without requiring an explicit taxonomy during the training process.
Our proposed approach advances chemical classification by enabling interpretable outputs through a structured and geometrically expressive representation of molecules and their classes.
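The geometric idea can be illustrated with hard (non-differentiable) boxes. A molecule's embedding receives a label when it lies inside that class's box, and a subclass relation can be read off when one box is nested inside another. This is a simplified sketch with made-up 2-D coordinates and hypothetical class boxes; the model described above learns its boxes during training, and training box embeddings in practice typically relies on smoothed volume functions (for example softplus or Gumbel boxes), which this sketch omits.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class Box:
    """Axis-aligned hyper-rectangle given by its lower and upper corners."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]

    def volume(self) -> float:
        # Side lengths are clipped at zero, so an empty box has volume 0.
        v = 1.0
        for l, h in zip(self.lo, self.hi):
            v *= max(0.0, h - l)
        return v

    def contains_point(self, x: Sequence[float]) -> bool:
        """True if embedding x lies inside the box (label membership)."""
        return all(l <= xi <= h for l, xi, h in zip(self.lo, x, self.hi))

    def intersect(self, other: "Box") -> "Box":
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        return Box(lo, hi)

def subclass_score(child: Box, parent: Box) -> float:
    """Fraction of child's volume inside parent; 1.0 means child is nested in parent."""
    child_vol = child.volume()
    return child.intersect(parent).volume() / child_vol if child_vol > 0 else 0.0

# Hypothetical boxes for a broad class and a narrower class nested inside it.
organic_compound = Box((0.0, 0.0), (10.0, 10.0))
carboxylic_acid = Box((2.0, 2.0), (4.0, 5.0))
molecule = (3.0, 3.0)  # made-up embedding of one molecule

print(carboxylic_acid.contains_point(molecule))            # True
print(subclass_score(carboxylic_acid, organic_compound))   # 1.0
print(subclass_score(organic_compound, carboxylic_acid))   # 0.06
```

The asymmetry of `subclass_score` is what lets boxes encode direction in a hierarchy. Containment in one direction signals "is a subclass of", while the reverse ratio stays small.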
{"title":"Box embeddings for extending ontologies: a data-driven and interpretable approach","authors":"Adel Memariani, Martin Glauer, Simon Flügel, Fabian Neuhaus, Janna Hastings, Till Mossakowski","doi":"10.1186/s13321-025-01086-1","DOIUrl":"10.1186/s13321-025-01086-1","url":null,"abstract":"<p>Deriving symbolic knowledge from trained deep learning models is challenging due to the lack of transparency in such models. A promising approach to address this issue is to couple a semantic structure with the model outputs and thereby make the model interpretable. In prediction tasks such as multi-label classification, labels tend to form hierarchical relationships. Therefore, we propose enforcing a taxonomical structure on the model’s outputs throughout the training phase. In vector space, a taxonomy can be represented using axis-aligned hyper-rectangles, or boxes, which may overlap or nest within one another. The boundaries of a box determine the extent of a particular category. Thus, we used box-shaped embeddings of ontology classes to learn and transparently represent logical relationships that are only implicit in multi-label datasets. We assessed our model by measuring its ability to approximate the full set of inferred subclass relations in the ChEBI ontology, which is an important knowledge base in the field of life science. We demonstrate that our model captures implicit hierarchical relationships among labels, ensuring consistency with the underlying ontological conceptualization, while also achieving state-of-the-art performance in multi-label classification. 
Notably, this is accomplished without requiring an explicit taxonomy during the training process.</p><p>Our proposed approach advances chemical classification by enabling interpretable outputs through a structured and geometrically expressive representation of molecules and their classes.</p>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":5.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01086-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144924125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}