Journal of Cheminformatics: Latest Articles (Page 8)

Subgrapher: visual fingerprinting of chemical structures

IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-29 DOI: 10.1186/s13321-025-01091-4

Lucas Morin, Gerhard Ingmar Meijer, Valéry Weber, Luc Van Gool, Peter W. J. Staar

Automatic extraction of molecules from scientific literature plays a crucial role in accelerating research across fields ranging from drug discovery to materials science. Patent documents, in particular, contain molecular information in visual form, which is often inaccessible through traditional text-based searches. In this work, we introduce SubGrapher, a method for the visual fingerprinting of molecule and Markush structure images. Unlike conventional Optical Chemical Structure Recognition (OCSR) models that attempt to reconstruct full molecular graphs, SubGrapher focuses on extracting fingerprints directly from images. Using learning-based instance segmentation, SubGrapher identifies functional groups and carbon backbones, constructing a substructure-based fingerprint that enables the retrieval of molecules and Markush structures. Our approach is evaluated against state-of-the-art OCSR and fingerprinting methods, demonstrating superior retrieval performance and robustness across diverse molecule and Markush structure depictions. The benchmark datasets, models, and inference code are publicly available.

SubGrapher introduces a new approach that converts molecule and Markush structure images directly into fingerprints in a single step, bypassing conventional SMILES or graph reconstruction. It outperforms existing OCSR and fingerprinting methods in substructure detection and structure retrieval across diverse datasets, including Markush structure images.
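
SubGrapher's actual fingerprint construction and matching are described in the paper. As a rough, generic illustration of substructure-based retrieval, the sketch below assumes a detector has already returned a set of substructure labels for each image, encodes each set as a binary fingerprint, and ranks a small database by Tanimoto similarity. The vocabulary, labels, and molecule names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical substructure vocabulary; a real system would use its own,
# much larger set of functional groups and carbon-backbone patterns.
VOCAB = ["carboxylic_acid", "amide", "benzene", "pyridine", "hydroxyl", "halogen", "alkyl_chain"]
INDEX = {name: i for i, name in enumerate(VOCAB)}


def to_fingerprint(detected: Set[str]) -> List[int]:
    """Binary fingerprint: bit i is 1 if substructure VOCAB[i] was detected."""
    bits = [0] * len(VOCAB)
    for name in detected:
        if name in INDEX:  # ignore labels outside the vocabulary
            bits[INDEX[name]] = 1
    return bits


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


def retrieve(query: Set[str], database: Dict[str, Set[str]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k database entries most similar to the query."""
    q = to_fingerprint(query)
    scored = [(name, tanimoto(q, to_fingerprint(subs))) for name, subs in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


db = {
    "mol_A": {"benzene", "carboxylic_acid"},
    "mol_B": {"pyridine", "amide"},
    "mol_C": {"benzene", "carboxylic_acid", "hydroxyl"},
}
print(retrieve({"benzene", "carboxylic_acid"}, db, k=2))
```

Because the fingerprint is computed from detected substructures rather than from a reconstructed graph, a partially misread image can still yield a useful similarity score.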

Citations: 0

MolPrice: assessing synthetic accessibility of molecules based on market value

IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-29 DOI: 10.1186/s13321-025-01076-3

Friedrich Hastedt, Klaus Hellgardt, Sophia Yaliraki, Dongda Zhang, Antonio del Rio Chanona

Machine learning approaches for conceptualizing and designing compounds in silico have attracted significant attention. However, the applicability of these compounds is often limited by their synthetic viability and cost-effectiveness. Researchers have introduced proxy scores, known as synthetic accessibility scores, to quantify how easily virtual molecules can be synthesized. Despite their utility, existing synthetic accessibility tools have notable limitations: they overlook compound purchasability, lack physical interpretability, and often rely on imperfect computer-aided synthesis planning algorithms. We introduce MolPrice, an accurate and fast model for molecular price prediction. Utilizing self-supervised contrastive learning, MolPrice autonomously generates price labels for synthetically complex molecules, enabling the model to generalize to molecules beyond the training distribution. Our results show that MolPrice reliably assigns higher prices to synthetically complex molecules than to readily purchasable ones, effectively distinguishing different levels of synthetic accessibility. Furthermore, MolPrice achieves competitive performance on literature benchmarks for synthetic accessibility. To demonstrate its practical utility, we conduct a virtual screening case study, illustrating how MolPrice successfully identifies purchasable molecules from a large candidate library. MolPrice bridges the gap between generative molecular design and real-world feasibility by integrating cost-awareness into synthetic accessibility assessment, making it a powerful model to accelerate molecular discovery.

We introduce MolPrice, a machine learning model that predicts molecular price as a proxy for synthetic accessibility. Unlike existing methods, MolPrice integrates cost-awareness into accessibility assessment, enabling it to distinguish readily purchasable molecules from synthetically complex ones. The model is computationally efficient and suitable for large-scale virtual screening. This work therefore provides a practical tool for prioritizing inexpensive, synthetically feasible compounds in early discovery workflows.
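
The abstract states that MolPrice assigns higher prices to synthetically complex molecules than to purchasable ones. One generic way to express such an ordering constraint during training is a pairwise margin (hinge) ranking loss on predicted log-prices. The sketch below illustrates that idea only; it is not MolPrice's actual contrastive objective, and the margin and example values are arbitrary assumptions.

```python
from typing import Sequence


def margin_ranking_loss(
    complex_logp: Sequence[float],
    purchasable_logp: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Mean hinge loss over all (complex, purchasable) pairs.

    A pair costs nothing once the complex molecule's predicted log-price
    exceeds the purchasable molecule's by at least `margin`.
    """
    if not complex_logp or not purchasable_logp:
        raise ValueError("both groups must be non-empty")
    total = 0.0
    for c in complex_logp:
        for p in purchasable_logp:
            total += max(0.0, margin - (c - p))
    return total / (len(complex_logp) * len(purchasable_logp))


# Hypothetical predicted log-prices (arbitrary units).
well_ordered = margin_ranking_loss([5.0, 6.0], [2.0, 3.0])
mis_ordered = margin_ranking_loss([2.0, 3.0], [5.0, 6.0])
print(well_ordered, mis_ordered)
```

A ranking loss of this kind only constrains the relative order of prices, which is the property a synthetic accessibility proxy needs for prioritization.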

Citations: 0

Multi-modal contrastive drug synergy prediction model guided by single modality

IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-26 DOI: 10.1186/s13321-025-01087-0

Tong Luo, Zheng Zhang, Xian-gan Chen, Zhi Li

Compared with monotherapy, drug combinations exhibit stronger efficacy, fewer side effects, and lower drug resistance in cancer treatment. However, traditional wet-lab methods for screening synergistic drug combinations are both costly and inefficient. Recently, the emergence of multiple drug synergy databases has driven the development of various drug synergy prediction methods. Many of these methods use multimodal data and achieve good results. However, giving the different data modalities equal weight without accounting for the differences between their features can make multi-modal learning less effective. We propose a multi-modal contrastive learning method for drug synergy prediction, named MCDSP. Specifically, MCDSP extracts entity embedding features of drugs and cell lines from heterogeneous graphs, while leveraging molecular fingerprints and gene expression features as biomolecular features for drugs and cell lines, respectively. These two types of features serve as two modalities. Guided by single-modality prediction tasks, we evaluate the relevant information in each modality. Contrastive learning reduces the prediction bias between the two modalities, improving the quality of the multi-modal features. Experiments show that MCDSP outperforms baseline methods on large datasets and performs well on unknown drug combinations and cell lines. MCDSP has demonstrated significant effectiveness in predicting drug synergy.

This study uses contrastive learning to align the features of the two modalities on their informative components, improving the quality of the multi-modal features and substantially improving the performance of the drug synergy prediction model. Its use of contrastive learning guided by single-modality prediction tasks distinguishes this work from previous studies and provides a new tool for drug synergy prediction.
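
Cross-modal contrastive learning of the kind described above is commonly implemented with an InfoNCE-style loss: embeddings of the same drug (or cell line) from the two modalities form a positive pair, and the other items in the batch act as negatives. The pure-Python sketch below computes one direction (graph embedding to biomolecular embedding) of that loss for small hand-written vectors. It is a generic illustration, not MCDSP's exact objective, and the temperature value is an assumption.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(graph_emb: List[Vector], bio_emb: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss where graph_emb[i] and bio_emb[i] describe the same entity.

    For anchor i the positive is bio_emb[i]; every bio_emb[j] with j != i is a negative.
    """
    n = len(graph_emb)
    loss = 0.0
    for i in range(n):
        logits = [cosine(graph_emb[i], bio_emb[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_sum_exp - logits[i]  # negative log-softmax probability of the positive
    return loss / n


graph = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each bio embedding matches its graph counterpart
swapped = [[0.1, 0.9], [0.9, 0.1]]  # pairs deliberately mismatched
print(info_nce(graph, aligned), info_nce(graph, swapped))
```

Minimizing this loss pulls the two views of each entity together and pushes different entities apart; a symmetric variant would also average in the bio-to-graph direction.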

Citations: 0

PKSmart: an open-source computational model to predict intravenous pharmacokinetics of small molecules

IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-26 DOI: 10.1186/s13321-025-01066-5

Srijit Seal, Maria-Anna Trapotsi, Manas Mahale, Vigneshwari Subramanian, Nigel Greene, Ola Spjuth, Andreas Bender

Drug exposure, a key determinant of drug safety and efficacy, is governed by pharmacokinetic (PK) parameters such as volume of distribution (VDss), clearance (CL), half-life (t½), fraction unbound in plasma (fu), and mean residence time (MRT). In this study, we developed machine learning models to predict human PK parameters for 1,283 unique compounds using molecular structure, physicochemical properties, and predicted animal PK data. Our approach involved a two-stage modeling pipeline. First, we trained models to predict rat, dog, and monkey PK parameters (VDss, CL, fu) from chemical structure and properties for 371 compounds. These models were used to predict animal PK values for 1,283 unique compounds with human PK data. These animal PK predictions were then integrated with molecular descriptors and fingerprints to build Random Forest models for human PK parameters. The models demonstrated consistent performance across nested cross-validation and external validation sets, with predictive accuracy for VDss comparable to proprietary models developed by AstraZeneca. Notably, human VDss and CL predictions achieved external R² values of 0.39 and 0.46, respectively. To support broad accessibility and integration into early drug discovery workflows such as Design-Make-Test-Analyze (DMTA), we developed PKSmart (https://broad.io/PKSmart), a freely available web application. All code and models are also open source, enabling local deployment. To our knowledge, this represents the first public suite of PK prediction models with performance on par with industry standard models.

This study introduces the first publicly available pharmacokinetic (PK) models that match industry-standard predictions, utilizing molecular structural fingerprints, physicochemical properties, and predicted animal PK data to model human pharmacokinetics. Our approach is validated through repeated nested cross-validation and an external test set, including comparing predictions to an industry standard model. The models are released via a web-hosted application (https://broad.io/PKSmart) for wider accessibility and utility in drug development processes.
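
The two-stage pipeline described above is a form of stacking: first-stage models predict animal PK endpoints from molecular features, and those predictions are appended as extra features for the second-stage human model. The sketch below shows that data flow with scikit-learn's `RandomForestRegressor` on randomly generated data. Sample sizes, feature counts, the number of animal endpoints, and hyperparameters are placeholders, not the descriptors or settings used in PKSmart (whose own code is open source).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: rows are compounds, columns are molecular features.
X_animal = rng.normal(size=(80, 12))                           # compounds with measured animal PK
Y_animal = X_animal[:, :3] + 0.1 * rng.normal(size=(80, 3))   # e.g. three animal endpoints
X_human = rng.normal(size=(150, 12))                           # compounds with measured human PK
y_human = X_human[:, :3].sum(axis=1) + 0.1 * rng.normal(size=150)


def fit_two_stage(X_a, Y_a, X_h, y_h, seed=0):
    """Stage 1: one random forest per animal endpoint.
    Stage 2: a random forest on [features, predicted animal PK] for the human endpoint."""
    stage1 = [
        RandomForestRegressor(n_estimators=50, random_state=seed).fit(X_a, Y_a[:, j])
        for j in range(Y_a.shape[1])
    ]

    def augment(X):
        # Append one predicted animal PK column per stage-1 model.
        animal_pred = np.column_stack([m.predict(X) for m in stage1])
        return np.hstack([X, animal_pred])

    stage2 = RandomForestRegressor(n_estimators=100, random_state=seed).fit(augment(X_h), y_h)
    return lambda X: stage2.predict(augment(X))


predict_human = fit_two_stage(X_animal, Y_animal, X_human, y_human)
X_new = rng.normal(size=(5, 12))
print(predict_human(X_new))
```

New compounds only need molecular features at prediction time, because their animal PK values are themselves predicted. If the animal and human compound sets overlap, out-of-fold stage-1 predictions help avoid leaking measured values into stage 2.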

Citations: 0

Evaluating ligand docking methods for drugging protein–protein interfaces: insights from AlphaFold2 and molecular dynamics refinement

IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-25 DOI: 10.1186/s13321-025-01067-4

Jordi Gómez Borrego, Marc Torrent Burgas

Advances in docking protocols have significantly enhanced the field of protein–protein interaction (PPI) modulation, with AlphaFold2 (AF2) and molecular dynamics (MD) refinements playing pivotal roles. This study evaluates the performance of AF2 models against experimentally solved structures in docking protocols targeting PPIs. Using a dataset of 16 interactions with validated modulators, we benchmarked eight docking protocols, revealing similar performance between native and AF2 models. Local docking strategies outperformed blind docking, with TankBind_local and Glide providing the best results across the structural types tested. MD simulations and other ensemble generation algorithms, such as AlphaFlow, refined both native and AF2 models, improving docking outcomes but showing significant variability across conformations. These results suggest that, while structural refinement can enhance docking in some cases, overall performance appears to be constrained by limitations in scoring functions and docking methodologies. Although protein ensembles can improve virtual screening, predicting the most effective conformations for docking remains a challenge. These findings support the use of AF2-generated structures in docking protocols targeting PPIs and highlight the need to improve current scoring methodologies.

This study provides a systematic benchmark of docking protocols applied to protein–protein interactions (PPIs) using both experimentally solved structures and AlphaFold2 models. By integrating molecular dynamics ensembles and AlphaFlow-generated conformations, we show that structural refinement improves docking outcomes in selected cases, but overall performance remains constrained by docking scoring function limitations. Our analysis shows that AlphaFold2 models perform comparably to native structures in PPI docking, validating their use when experimental data are unavailable. These results establish a reference framework for future PPI-focused virtual screening and underscore the need for improved scoring functions and ensemble-based approaches to better exploit emerging structural prediction tools.
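
The abstract does not state which success criterion the benchmark used. A common convention in docking benchmarks is to count a prediction as successful when the top-ranked pose lies within 2 Å heavy-atom RMSD of the experimental ligand pose. The sketch below computes that metric for matched coordinate lists; it assumes the atoms are already in corresponding order and ignores symmetry-equivalent atom mappings, which real evaluations usually have to handle.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between two equally ordered atom sets, measured in the receptor frame (no superposition)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    sq = sum((a - b) ** 2 for p, r in zip(pose, reference) for a, b in zip(p, r))
    return math.sqrt(sq / len(pose))


def success_rate(top_poses: List[Sequence[Coord]], references: List[Sequence[Coord]], cutoff: float = 2.0) -> float:
    """Fraction of systems whose top-ranked pose is within `cutoff` angstroms RMSD of the reference."""
    hits = sum(rmsd(p, r) <= cutoff for p, r in zip(top_poses, references))
    return hits / len(references)


ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
close = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]  # every atom shifted by 0.5 Å
far = [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]    # every atom shifted by 3.0 Å
print(rmsd(close, ref), success_rate([close, far], [ref, ref]))
```

No superposition is applied because a docked pose must be correct in the binding site's own coordinate frame, not merely in shape.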

Citations: 0

Cache: Utilizing ultra-large library screening in Rosetta to identify novel binders of the WD-repeat domain of Leucine-Rich Repeat Kinase 2

IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-25 DOI: 10.1186/s13321-025-01084-3

Fabian Liessmann, Paul Eisenhuth, Alexander Fürll, Oanh Vu, Rocco Moretti, Jens Meiler

In this study, we present a pipeline for identifying novel ligands targeting the Tryptophan-Aspartate-Repeat domain 40 (WDR40) of Leucine-Rich Repeat Kinase 2 (LRRK2), a protein associated with Parkinson’s disease, as part of the first Critical Assessment of Computational Hit-finding Experiments (CACHE) challenge, a blind benchmark experiment for drug discovery. Mutations in this protein are the most common genetic cause of familial Parkinson’s disease, yet this target remains understudied. We conducted an ultra-large library screening (ULLS) of the Enamine REAL space using a newly developed evolutionary algorithm, RosettaEvolutionaryLigand (REvoLd), which allows for efficient screening of combinatorial compound libraries. The protocol involved refining the target structure with molecular dynamics simulations, identifying a binding site via blind docking, and optimizing compounds with REvoLd, culminating in a manual selection among the top-scoring REvoLd hits. A single binder, derived from the combination of two Enamine building blocks, was identified. In the second round, derivatives of this hit compound were used as input for REvoLd to sample further within the Enamine REAL space. Ultimately, a total of five molecules were identified, of which three showed a measurable dissociation constant (K_D) better than 150 μM, showcasing the effectiveness of this approach. However, the campaign also highlighted shortcomings, such as the preference of the RosettaLigand scoring function for nitrogen-rich rings.

We introduce the first real-world application for REvoLd, an evolutionary docking algorithm enabling efficient ultra-large library screening for flexible protein targets. Our approach identified novel binders for the WDR40 domain of LRRK2 within the CACHE challenge #1, representing the first prospective validation of REvoLd. Here, we present a preparation pipeline to allow exploration of a large protein pocket with unspecific binding areas, and unlike prior brute-force docking efforts, our method integrates receptor flexibility and combinatorial chemistry optimization.
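
REvoLd's actual operators and its use of RosettaLigand scoring are described in the paper. To illustrate why an evolutionary algorithm suits a combinatorial library, where each product is defined by its building blocks, the toy sketch below evolves pairs of building-block indices with elitist selection, crossover (taking one block from each parent), and mutation (replacing one block). The library size, the `toy_score` function standing in for a docking score (lower is better), and all parameters are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Genome = Tuple[int, int]  # (index of building block A, index of building block B)


def evolve(
    n_a: int,
    n_b: int,
    score: Callable[[Genome], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.3,
    seed: int = 0,
) -> Tuple[Genome, List[float]]:
    """Minimize `score` over (a, b) combinations without enumerating the full space."""
    rng = random.Random(seed)
    population = [(rng.randrange(n_a), rng.randrange(n_b)) for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=score)
        best_history.append(score(population[0]))
        parents = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = (p1[0], p2[1])  # crossover: block A from one parent, block B from the other
            if rng.random() < mutation_rate:
                if rng.random() < 0.5:
                    child = (rng.randrange(n_a), child[1])  # mutate block A
                else:
                    child = (child[0], rng.randrange(n_b))  # mutate block B
            children.append(child)
        population = parents + children
    population.sort(key=score)
    best_history.append(score(population[0]))
    return population[0], best_history


def toy_score(g: Genome) -> float:
    # Hypothetical stand-in for a docking score; the optimum is at (700, 42).
    return (g[0] - 700) ** 2 + (g[1] - 42) ** 2


best, history = evolve(n_a=1000, n_b=1000, score=toy_score)
print(best, history[0], history[-1])
```

The search scores only a few hundred of the million possible combinations here; the same property is what makes evolutionary search attractive for spaces far too large to dock exhaustively.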

Citations: 0

Contrastive explanations for machine learning predictions in chemistry

IF 5.7 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-23 DOI: 10.1186/s13321-025-01100-6

Alec Lamens, Jürgen Bajorath

The concept of contrastive explanations, which originates from human reasoning, is used in explainable artificial intelligence. In machine learning, contrastive explanations relate alternative prediction outcomes to each other by identifying the features that lead to opposing model decisions. We introduce a methodological framework for deriving contrastive explanations for machine learning models in chemistry, to systematically generate intuitive explanations of predictions in high-dimensional feature spaces. The molecular contrastive explanations (MolCE) methodology explores alternative model decisions by generating virtual analogues of test compounds through the replacement of molecular building blocks, and quantifies the degree of “contrastive shifts” resulting from changes in model probability distributions. In a proof-of-concept study, MolCE was applied to explain selectivity predictions for ligands of D2-like dopamine receptor isoforms.
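The abstract does not give MolCE's exact formula for a contrastive shift. One plausible reading, used here purely as an assumption, is the change in the margin between the foil class (the alternative outcome) and the fact class (the original prediction) when a test compound is replaced by a virtual analogue. The class labels, analogue names, and probabilities below are hypothetical.

```python
from typing import Dict, List, Tuple

Probs = Dict[str, float]  # class label -> predicted probability


def contrastive_shift(p_test: Probs, p_analogue: Probs, fact: str, foil: str) -> float:
    """ASSUMED definition: change in the (foil - fact) probability margin.
    Positive values mean the analogue moves the model toward the foil class."""
    margin_test = p_test[foil] - p_test[fact]
    margin_analogue = p_analogue[foil] - p_analogue[fact]
    return margin_analogue - margin_test


def rank_analogues(p_test: Probs, analogues: List[Tuple[str, Probs]],
                   fact: str, foil: str) -> List[Tuple[str, float, bool]]:
    """Return (analogue name, shift, prediction flipped to foil?) sorted by shift."""
    results = []
    for name, probs in analogues:
        shift = contrastive_shift(p_test, probs, fact, foil)
        flipped = max(probs, key=probs.get) == foil
        results.append((name, round(shift, 3), flipped))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical two-class selectivity model output for one test compound.
p_test = {"D2-selective": 0.8, "D3-selective": 0.2}
analogues = [
    ("substituent swap", {"D2-selective": 0.55, "D3-selective": 0.45}),
    ("core replacement", {"D2-selective": 0.3, "D3-selective": 0.7}),
]
for row in rank_analogues(p_test, analogues, fact="D2-selective", foil="D3-selective"):
    print(row)
```

Here the core replacement ranks first (shift 1.0, prediction flipped to the foil), while the substituent swap moves the model halfway (shift 0.5) without changing the predicted class.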

Citations: 0

Cheminformatics Microservice V3: a web portal for chemical structure manipulation and analysis

IF 5.7 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-23 DOI: 10.1186/s13321-025-01094-1

Kohulan Rajan, Venkata Chandrasekhar, Nisha Sharma, Sri Ram Sagar Kanakam, Felix Baensch, Christoph Steinbeck

The widespread adoption of open-source cheminformatics toolkits remains constrained by technical implementation barriers, including complex installation procedures, dependency management, and integration challenges. Here, we present Cheminformatics Microservice V3, a significant update to the existing platform that provides unified programmatic access to cheminformatics libraries, including RDKit, Chemistry Development Kit (CDK), and Open Babel through a RESTful API framework. This latest version features a newly developed, interactive web-based frontend built with React, providing users with an intuitive graphical interface for manipulating and analysing chemical structures. The frontend supports essential cheminformatics operations, including structure editing, PubChem database integration, batch molecular processing, and standardised InChI/RInChI identifier generation. The microservice V3 addresses critical accessibility barriers in computational chemistry by providing researchers with immediate access to analytical tools, eliminating the need for specialised technical expertise or complex software installations. This approach facilitates reproducible research workflows and broadens the utilisation of cheminformatics methodologies across interdisciplinary research communities. The platform is publicly accessible at https://app.naturalproducts.net, and the complete source code and documentation are available on GitHub.
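Programmatic use of a RESTful service like this one usually comes down to building a GET request with a percent-encoded SMILES string. In the minimal sketch below, the base URL `https://api.naturalproducts.net/latest`, the `chem/descriptors` path, and the `toolkit` parameter are assumptions for illustration, not confirmed routes; check the project's documentation on GitHub for the real endpoints.

```python
import json
import urllib.request
from urllib.parse import urlencode

# ASSUMPTION: illustrative base URL; verify routes against the official documentation.
BASE_URL = "https://api.naturalproducts.net/latest"


def build_url(path: str, **params: str) -> str:
    """Build a GET URL, percent-encoding query values (SMILES may contain '#', '+', '=')."""
    return f"{BASE_URL}/{path.lstrip('/')}?{urlencode(params)}"


def get_json(url: str, timeout: float = 30.0):
    """Fetch a URL and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical route and parameters; aspirin as the example structure.
url = build_url("chem/descriptors", smiles="CC(=O)Oc1ccccc1C(=O)O", toolkit="rdkit")
print(url)
# data = get_json(url)  # uncomment to call the live service, if the route exists
```

Percent-encoding matters because characters common in SMILES would otherwise be misread as URL syntax: an unescaped `#` (triple bond) starts a URL fragment, and an unescaped `+` (positive charge) is decoded as a space.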

Citations: 0

A comprehensive landscape of AI applications in broad-spectrum drug interaction prediction: a systematic review

IF 5.7 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-19 DOI: 10.1186/s13321-025-01093-2

Nour H. Marzouk, Sahar Selim, Mustafa Elattar, Mai S. Mabrouk, Mohamed Mysara

In drug development, managing interactions such as drug–drug, drug–disease, and drug–nutrient interactions is critical for ensuring the safety and efficacy of pharmacological treatments. These interactions often overlap, forming a complex, interconnected landscape that requires accurate prediction to improve patient outcomes and support evidence-based care. Recent advances in artificial intelligence (AI), powered by large-scale datasets (e.g., DrugBank, TWOSIDES, SIDER), have significantly improved interaction prediction. Machine learning, deep learning, and graph-based models show great promise, but challenges persist, including data imbalance, noisy data sources, limited explainability, and the underrepresentation of certain interaction types. This systematic review of 147 studies (2018–2024) is the first to comprehensively map AI applications across the major interaction types. We present a detailed taxonomy of models and datasets, emphasizing the growing role of large language models and knowledge graphs in overcoming key limitations. Integrating these with explainable AI tools enhances transparency, paving the way for AI-driven systems that proactively mitigate adverse interactions. By identifying the most promising approaches and critical research gaps, this review lays the groundwork for more robust, interpretable, and personalized models for drug interaction prediction.
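Graph-based interaction prediction often starts by treating known interactions as edges and scoring unlinked pairs as candidates. As a deliberately simple baseline, not a method taken from the review, the sketch below applies the classic Adamic–Adar link-prediction heuristic to a heterogeneous graph mixing drug–drug and drug–nutrient edges. The node names and edges are hypothetical toy data, not real interactions.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected graph of known interactions; node names carry a type prefix."""
    g: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        g[a].add(b)
        g[b].add(a)
    return g


def adamic_adar(g: Dict[str, Set[str]], u: str, v: str) -> float:
    """Sum of 1/log(degree) over shared neighbours, so shared hubs count for less.
    A shared neighbour always has degree >= 2, so log(degree) > 0."""
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v])


def rank_candidates(edges: List[Tuple[str, str]]) -> List[Tuple[str, str, float]]:
    """Score every unlinked pair with at least one shared neighbour, best first."""
    g = build_graph(edges)
    candidates = []
    for u, v in combinations(sorted(g), 2):
        if v not in g[u]:
            score = adamic_adar(g, u, v)
            if score > 0:
                candidates.append((u, v, round(score, 3)))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical toy interaction graph (drug-drug and drug-nutrient edges).
edges = [
    ("drug:A", "drug:B"),
    ("drug:A", "food:F1"),
    ("drug:C", "food:F1"),
    ("drug:C", "drug:B"),
    ("drug:D", "drug:B"),
]
for candidate in rank_candidates(edges):
    print(candidate)
```

The top candidate here is the drug–nutrient pair (`drug:B`, `food:F1`), because both nodes share two low-degree neighbours. The models surveyed in the review (graph neural networks, knowledge-graph embeddings, language models) learn far richer scores, but the underlying task of ranking unobserved pairs is the same.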

Citations: 0

MetaboGNN: predicting liver metabolic stability with graph neural networks and cross-species data

IF 5.7 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics

Pub Date : 2025-09-03 DOI: 10.1186/s13321-025-01089-y

Jun Hyeong Park, Ri Han, Junbo Jang, Jisan Kim, Joonki Paik, Jaesung Heo, Yoonji Lee

The metabolic stability of a drug is a crucial determinant of its pharmacokinetic properties, including clearance, half-life, and oral bioavailability. Accurate predictions of metabolic stability can significantly streamline the drug discovery process. In this study, we present MetaboGNN, an advanced model for predicting liver metabolic stability based on Graph Neural Networks (GNNs) and Graph Contrastive Learning (GCL). Using a high-quality dataset from the 2023 South Korea Data Challenge for Drug Discovery, which comprises 3,498 training molecules and 483 test molecules, we represented molecular structures as graphs to capture the intricate structural relationships that influence metabolic stability. A GCL-driven pretraining step was employed to enhance model generalizability by learning robust, transferable graph-level representations. Notably, incorporating interspecies differences between human liver microsomes (HLM) and mouse liver microsomes (MLM) further improved predictive accuracy, achieving Root Mean Square Error (RMSE) values of 27.91 (HLM) and 27.86 (MLM), both expressed as the percentage of parent compound remaining after a 30-min incubation. Compared to traditional approaches, MetaboGNN demonstrates superior predictive performance and highlights the importance of considering interspecies enzymatic variations. In addition, attention-based analysis identified key molecular fragments associated with metabolic stability, highlighting chemically meaningful structural determinants. These findings establish MetaboGNN as a powerful tool for metabolic stability prediction, supporting more efficient lead optimization in drug discovery.
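The abstract names graph contrastive learning for pretraining but not the loss function. A common choice in graph contrastive learning is the NT-Xent (normalized temperature-scaled cross-entropy) objective, which pulls two augmented views of the same molecular graph together and pushes views of different graphs apart. The pure-Python sketch below assumes that objective and assumes graph-level embeddings have already been produced by some encoder (the toy 2-D vectors are hypothetical). It also shows RMSE, the metric reported above, which is in the same units as the target, here percentage points of parent compound remaining.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nt_xent(z1: List[Sequence[float]], z2: List[Sequence[float]], tau: float = 0.5) -> float:
    """NT-Xent loss: z1[i] and z2[i] embed two augmented views of molecule i."""
    n = len(z1)
    z = list(z1) + list(z2)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # the other view of the same molecule (positive pair)
        numer = math.exp(cosine(z[i], z[j]) / tau)
        # Denominator: every other embedding in the batch, positive included.
        denom = sum(math.exp(cosine(z[i], z[k]) / tau) for k in range(2 * n) if k != i)
        total += -math.log(numer / denom)
    return total / (2 * n)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the target's units (here: % parent remaining)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


z1 = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # second views close to their own first views
swapped = [[0.1, 0.9], [0.9, 0.1]]  # second views closer to the wrong molecule
print(nt_xent(z1, aligned) < nt_xent(z1, swapped))  # True: matching views give lower loss
print(rmse([50.0, 80.0], [60.0, 70.0]))  # 10.0 percentage points
```

An RMSE of about 28, as reported for HLM and MLM, therefore means predictions are off by roughly 28 percentage points of parent compound remaining, in the root-mean-square sense.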

Citations: 0