Journal of Cheminformatics: Latest Articles

Comment on “Advancing material property prediction: using physics-informed machine learning models for viscosity”
IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-08-28 | DOI: 10.1186/s13321-025-01070-9
Maximilian Fleck, Samir Darouich, Marcelle B. M. Spera, Niels Hansen

When data are scarce, predicting properties through purely data-driven machine learning (ML) is challenging. Integrating physically based modeling techniques into ML methods may lead to better performance. In recent work by Chew et al. (“Advancing material property prediction: using physics-informed machine learning models for viscosity”), descriptors from classical molecular dynamics (MD) simulations were incorporated into a quantitative structure–property relationship to accurately predict the temperature-dependent viscosity of pure liquids. Through feature importance analysis, the authors found that the heat of vaporization was the most relevant descriptor for predicting viscosity. In this comment, we discuss the physical origin of this finding with reference to Eyring’s rate theory and develop an alternative modeling approach, built on a thermodynamics-based architecture, that requires less input data.
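The link between viscosity and heat of vaporization can be illustrated with the textbook form of Eyring's rate theory, in which the activation free energy of flow is taken as roughly proportional to the internal energy of vaporization. The sketch below is not the model developed in the comment. It assumes the empirical factor 0.408 and the Trouton-rule style estimate of about 9.4 R T_b for the energy of vaporization, both taken from standard transport-phenomena textbooks, and the function names and inputs are illustrative.

```python
import math

R = 8.314462618        # gas constant, J/(mol K)
H = 6.62607015e-34     # Planck constant, J s
N_A = 6.02214076e23    # Avogadro constant, 1/mol


def eyring_viscosity(temperature_k, molar_volume_m3, du_vap_j_per_mol):
    """Estimate liquid viscosity (Pa s) from Eyring's rate theory.

    Assumes the activation free energy of flow is 0.408 * dU_vap,
    an empirical textbook relation, not a result from the comment.
    """
    prefactor = N_A * H / molar_volume_m3
    return prefactor * math.exp(0.408 * du_vap_j_per_mol / (R * temperature_k))


def du_vap_from_boiling_point(boiling_point_k):
    """Trouton-rule style estimate dU_vap ~ 9.4 * R * T_b (an assumption)."""
    return 9.4 * R * boiling_point_k
```

Under these assumptions a larger energy of vaporization raises the exponent and therefore the predicted viscosity, and the estimate falls as temperature rises, which is the qualitative behaviour the feature importance analysis points to.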

Citations: 0
Systematic benchmarking of 13 AI methods for predicting cyclic peptide membrane permeability
IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-08-28 | DOI: 10.1186/s13321-025-01083-4
Wei Liu, Jianguo Li, Chandra S. Verma, Hwee Kuan Lee

Cyclic peptides are promising drug candidates due to their ability to modulate intracellular protein–protein interactions, a property often inaccessible to small molecules. However, their typically poor membrane permeability limits therapeutic applicability. Accurate computational prediction of permeability can accelerate the identification of cell-permeable candidates, reducing reliance on time-consuming and costly experimental screening. Although deep learning has shown potential in predicting molecular properties, its application in permeability prediction remains underexplored. A systematic evaluation of these models is important to assess current capabilities and guide future development. In this study, we conduct a comprehensive benchmark of 13 machine learning models for predicting cyclic peptide membrane permeability. These models cover four types of molecular representations: fingerprints, SMILES strings, molecular graphs, and 2D images. We use experimentally measured PAMPA permeability data from the CycPeptMPDB database, comprising nearly 6000 cyclic peptides, and evaluate performance across three prediction tasks: regression, binary classification, and soft-label classification. Two data-splitting strategies, random split and scaffold split, are used to assess the generalizability of trained models. Our results show that model performance depends strongly on molecular representation and model architecture. Graph-based models, particularly the Directed Message Passing Neural Network (DMPNN), consistently achieve top performance across tasks. Regression generally outperforms classification. Scaffold-based splitting, although intended to more rigorously assess generalization, yields substantially lower model generalizability compared to random splitting. Comparing prediction errors with experimental variability highlights the practical value of current models while also indicating room for further improvement.
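The difference between the two data-splitting strategies can be sketched as follows. This is a minimal illustration, not the benchmark's code: it assumes each peptide already carries a precomputed scaffold key (for example a Murcko scaffold SMILES), and the function names and the largest-group-first assignment rule are hypothetical choices.

```python
import random
from collections import defaultdict


def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and hold out test_fraction of them as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def scaffold_split(items, scaffold_of, test_fraction=0.2):
    """Assign whole scaffold groups to either train or test.

    scaffold_of maps an item to its scaffold key (assumed precomputed).
    Larger groups are placed in train first, so the test set consists of
    scaffolds that never appear in training.
    """
    groups = defaultdict(list)
    for item in items:
        groups[scaffold_of(item)].append(item)
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_train_target = len(items) - int(round(len(items) * test_fraction))
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= n_train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test
```

Because a scaffold split never places the same scaffold on both sides, test molecules are structurally further from the training data than under a random split, which is consistent with the lower scores reported for it.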

Citations: 0
xBitterT5: an explainable transformer-based framework with multimodal inputs for identifying bitter-taste peptides
IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-08-20 | DOI: 10.1186/s13321-025-01078-1
Nguyen Doan Hieu Nguyen, Nhat Truong Pham, Duong Thanh Tran, Leyi Wei, Adeel Malik, Balachandran Manavalan

Bitter peptides (BPs), derived from the hydrolysis of proteins in food, play a crucial role in both food science and biomedicine by influencing taste perception and participating in various physiological processes. Accurate identification of BPs is essential for understanding food quality and potential health impacts. Traditional machine learning approaches for BP identification have relied on conventional feature descriptors, achieving moderate success but struggling with the complexities of biological sequence data. Recent advances utilizing protein language model embeddings and meta-learning approaches have improved accuracy, but they frequently neglect the molecular representations of peptides and lack interpretability. In this study, we propose xBitterT5, a novel multimodal and interpretable framework for BP identification that integrates pretrained transformer-based embeddings from BioT5+ with the combination of a peptide sequence and its SELFIES molecular representation. Specifically, by incorporating both peptide sequences and their molecular strings, xBitterT5 demonstrates superior performance compared to previous methods on the same benchmark datasets. Importantly, the model provides residue-level interpretability, highlighting chemically meaningful substructures that contribute significantly to a peptide's bitterness, thus offering mechanistic insights beyond black-box predictions. A user-friendly web server (https://balalab-skku.org/xBitterT5/) and a standalone version (https://github.com/cbbl-skku-org/xBitterT5/) are freely available to support both computational biologists and experimental researchers in peptide-based food and biomedicine.

Citations: 0
ReactionT5: a pre-trained transformer model for accurate chemical reaction prediction with limited data
IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-08-19 | DOI: 10.1186/s13321-025-01075-4
Tatsuya Sagawa, Ryosuke Kojima

Accurate chemical reaction prediction is critical for reducing both cost and time in drug development. This study introduces ReactionT5, a transformer-based chemical reaction foundation model pre-trained on the Open Reaction Database—a large publicly available reaction dataset. In benchmarks for product prediction, retrosynthesis, and yield prediction, ReactionT5 outperformed existing models. Specifically, ReactionT5 achieved 97.5% accuracy in product prediction, 71.0% in retrosynthesis, and a coefficient of determination of 0.947 in yield prediction. Remarkably, ReactionT5, when fine-tuned with only a limited dataset of reactions, achieved performance on par with models fine-tuned on the complete dataset. Additionally, the visualization of ReactionT5 embeddings illustrates that the model successfully captures and represents the chemical reaction space, indicating effective learning of reaction properties.
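For readers unfamiliar with the two kinds of metric quoted above, the following sketch shows how an accuracy and a coefficient of determination are typically computed. It is a generic illustration, not the evaluation code of ReactionT5; in particular, comparing predictions to references by exact string match is an assumption, and real benchmarks usually canonicalise the SMILES strings first.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def exact_match_accuracy(predicted, reference):
    """Fraction of predictions that equal the reference string exactly."""
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)
```

A coefficient of determination of 1 means the predicted yields reproduce the measured ones exactly, while 0 means the model does no better than always predicting the mean yield.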

Citations: 0
Improving drug-induced liver injury prediction using graph neural networks with augmented graph features from molecular optimisation
IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-08-18 | DOI: 10.1186/s13321-025-01068-3
Taeyeub Lee, Joram M. Posma

Purpose

Drug-induced liver injury (DILI) is a significant concern in drug development, often leading to the discontinuation of clinical trials and the withdrawal of drugs from the market. This study explores the application of graph neural networks (GNNs) for DILI prediction, using molecular graph representations as the primary input.

Methods

We evaluated several GNN architectures, including Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), Graph Sample and Aggregation (GraphSAGE), and Graph Isomorphism Networks (GINs), using the latest FDA DILI dataset and other molecular property prediction datasets. We introduce a novel approach that creates a custom graph dataset, driven by molecular optimisation, that incorporates detailed and realistic chemical features such as bond lengths and partial charges as input into the GNN models. We have named our model approach DILIGeNN.
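The idea of attaching bond lengths and partial charges to a molecular graph can be sketched with a small data structure. This is an illustration under stated assumptions, not the DILIGeNN implementation: the class and field names are hypothetical, the coordinates are assumed to come from a prior 3D geometry optimisation, and the partial charges are assumed to be supplied by some charge model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Atom:
    element: str
    position: tuple        # (x, y, z) in angstrom, from an optimised 3D structure
    partial_charge: float  # supplied by a charge model (assumed given)


@dataclass
class MolGraph:
    atoms: list
    bonds: list                                   # pairs of atom indices
    edge_features: dict = field(default_factory=dict)

    def add_bond_lengths(self):
        """Store the Euclidean distance between bonded atoms as an edge feature."""
        for i, j in self.bonds:
            self.edge_features[(i, j)] = math.dist(
                self.atoms[i].position, self.atoms[j].position
            )

    def node_features(self):
        """One feature row per atom: [element symbol, partial charge]."""
        return [[a.element, a.partial_charge] for a in self.atoms]
```

Keeping the geometric and electrostatic information on the nodes and edges of one graph, rather than in a separate descriptor vector, is what allows a single graph representation to serve as the only model input.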

Results

DILIGeNN achieved an AUC of 0.897 on the DILI dataset, surpassing the current state-of-the-art model in the DILI prediction task. Furthermore, DILIGeNN outperformed the state-of-the-art in other graph-based molecular prediction tasks, achieving an AUC of 0.918 on the Clintox dataset, 0.993 on the BBBP dataset, and 0.953 on the BACE dataset, indicating strong generalisation and performance across different datasets.

Conclusion

DILIGeNN, which uses a single graph representation as input, outperforms state-of-the-art DILI prediction methods that combine molecular fingerprints with graph-structured data. These findings highlight the effectiveness of our molecular graph generation and GNN training approach as a tool for early-stage drug development and drug-repurposing pipelines.

Scientific Contribution: DILIGeNN is a GNN framework that extracts graph features from 3D-optimised molecular structures, as is done in target-based drug discovery and molecular docking simulation. Our method is the first to encode spatial and electrostatic information into a single graph representation, whereas other work requires multiple graphs or additional chemical descriptors for feature representation. Our approach, which uses warm starts following repeated early stopping during training, outperforms the current state-of-the-art methods in liver toxicity (DILI), permeability (BBBP) and activity (BACE) prediction tasks.

Citations: 0
Cycle-configuration descriptors: a novel graph-theoretic approach to enhancing molecular inference
IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-08-18 | DOI: 10.1186/s13321-025-01042-z
Bowen Song, Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

Inference of molecules with desired activities/properties is one of the key and challenging issues in cheminformatics and bioinformatics. For that purpose, our research group has recently developed a state-of-the-art framework, mol-infer, for molecular inference. This framework first constructs a prediction function for a fixed property using machine learning models, which is then simulated by mixed-integer linear programming to infer desired molecules. The accuracy of the framework relies heavily on the representation power of the descriptors. In this study, we highlight a typical class of non-isomorphic chemical graphs with reasonably different property values that cannot be distinguished by the standard “two-layered (2L) model” of mol-infer. To address this distinguishability problem of the 2L model, we propose a novel family of descriptors, named cycle-configuration (CC), which captures the ortho/meta/para patterns that appear in aromatic rings, something that was not possible in the framework so far. Extensive computational experiments show that with the new descriptors, we can construct prediction functions with similar or better performance, compared with our previous studies, for all 44 tested chemical properties, comprising 27 regression datasets and 17 classification datasets, confirming the effectiveness of the CC descriptors. For inference, we also provide a system of linear constraints that formulates the CC descriptors. We demonstrate that a chemical graph with up to 50 non-hydrogen vertices can be inferred within a practical time frame.
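The ortho/meta/para distinction mentioned above comes down to how far apart two substituents sit around a ring. The sketch below illustrates only that notion, by classifying a pair of ring positions on a six-membered ring by their cyclic distance. It is not the definition of the CC descriptors from the paper, and the function name and position numbering (0 to 5 around the ring) are assumptions made for the example.

```python
def ring_pattern(pos_a, pos_b, ring_size=6):
    """Classify two substituent positions on a six-membered ring."""
    if ring_size != 6:
        raise ValueError("ortho/meta/para is defined here for six-membered rings only")
    gap = abs(pos_a - pos_b) % ring_size
    # Walk around the ring in whichever direction is shorter.
    distance = min(gap, ring_size - gap)
    return {1: "ortho", 2: "meta", 3: "para"}.get(distance, "same position")
```

For instance, `ring_pattern(0, 3)` classifies two substituents on opposite ring atoms as para. Positional isomers of this kind have the same atoms and the same local bonding, so a descriptor has to look at relative positions on the cycle to tell them apart.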

Citations: 0
Multi-fidelity graph neural networks for predicting toluene/water partition coefficients
IF 5.7 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-08-08 | DOI: 10.1186/s13321-025-01057-6
Thomas Nevolianis, Jan G. Rittig, Alexander Mitsos, Kai Leonhard

Accurate prediction of toluene/water partition coefficients of neutral species is crucial in drug discovery and separation processes; however, data-driven modeling of these coefficients remains challenging due to limited available experimental data. To address this limitation, we apply multi-fidelity learning approaches that leverage a quantum chemical dataset (low fidelity) of approximately 9000 entries generated by COSMO-RS and an experimental dataset (high fidelity) of about 250 entries collected from the literature. We explore transfer learning, feature-augmented learning, and multi-target learning in combination with graph neural networks, validating them on two external datasets: one with molecules similar to the training data (EXT-Zamora) and one with more challenging molecules (EXT-SAMPL9). Our results show that multi-target learning significantly improves predictive accuracy, achieving a root-mean-square error of 0.44 $\log P$ units on EXT-Zamora, compared to 0.63 $\log P$ units for single-task models. On the EXT-SAMPL9 dataset, multi-target learning achieves a root-mean-square error of 1.02 $\log P$ units, indicating reasonable performance even for more complex molecular structures. These findings highlight the potential of multi-fidelity learning approaches that leverage quantum chemical data to improve toluene/water partition coefficient predictions and address challenges posed by limited experimental data. We expect the methods to be applicable beyond toluene/water partition coefficients.

We investigate the benefits of transfer learning, feature-augmented learning, and multi-target learning in combination with graph neural networks for predicting toluene/water partition coefficients. We show how a large amount of inexpensive data from the semi-empirical COSMO-RS model can be combined effectively with a small amount of high-fidelity experimental data through multi-target learning, yielding machine learning models with broad applicability and low uncertainty in the partition coefficient, 0.44 to 1.02 log units depending on the test set.
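One common way to realise multi-target learning with two fidelities is to let the model predict both the low-fidelity and the high-fidelity value for every molecule and to compute the loss only over the labels that exist. The sketch below shows that masking idea together with the root-mean-square error used for reporting. It is an assumption about how such a loss can be set up, not the authors' training code, and the function names are hypothetical.

```python
import math


def masked_multi_target_mse(predictions, targets):
    """Mean squared error over the labels that are present.

    predictions and targets are lists of (low_fidelity, high_fidelity)
    pairs; a missing label is given as None and is skipped.
    """
    squared_errors = [
        (p - t) ** 2
        for pred_pair, target_pair in zip(predictions, targets)
        for p, t in zip(pred_pair, target_pair)
        if t is not None
    ]
    return sum(squared_errors) / len(squared_errors)


def rmse(y_pred, y_true):
    """Root-mean-square error between predicted and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true))
```

With this setup, the many molecules that only have a COSMO-RS value still contribute a training signal through the low-fidelity output, while the few molecules with experimental data constrain the high-fidelity output.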
Citations: 0
Advanced machine learning for innovative drug discovery
IF 5.7 Zone 2 Chemistry Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date: 2025-08-08 DOI: 10.1186/s13321-025-01061-w
Igor V. Tetko, Djork-Arné Clevert

This editorial presents an analysis of the articles published in the Journal of Cheminformatics Special Issue “AI in Drug Discovery”. We review how novel machine learning developments are enhancing structure-based drug discovery, providing better forecasts of molecular properties, and improving various elements of chemical reaction prediction. The Special Issue explored methodological developments focused on increasing the accuracy of models via pre-training, estimating the accuracy of predictions, and tuning model hyperparameters while avoiding overfitting, along with a diverse range of other novel methodological aspects, from the incorporation of human expert knowledge to the analysis of the susceptibility of models to adversarial attacks. In summary, the Special Issue brought together an excellent collection of articles that collectively demonstrate how machine learning methods have become an essential asset in modern drug discovery, with the potential to advance autonomous chemistry labs in the near future.

Citations: 0
Nanodesigner: resolving the complex-CDR interdependency with iterative refinement
IF 5.7 Zone 2 Chemistry Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date: 2025-08-07 DOI: 10.1186/s13321-025-01069-2
Melissa Maria Rios Zertuche, Şenay Kafkas, Dominik Renn, Magnus Rueping, Robert Hoehndorf

Camelid heavy-chain-only antibodies consist of two heavy chains and single variable domains (VHHs), which retain antigen-binding functionality even when isolated. The term “nanobody” is now more generally used to describe small, single-domain antibodies. Several antibody generative models have been developed for the sequence and structure co-design of the complementarity-determining regions (CDRs) based on the binding interface with a target antigen. However, these models are not tailored for nanobodies and are often constrained by their reliance on experimentally determined antigen–antibody structures, which are labor-intensive to obtain. Here, we introduce NanoDesigner, a tool for nanobody design and optimization based on generative AI methods. NanoDesigner integrates key stages (structure prediction, docking, CDR generation, and side-chain packing) into an iterative framework based on an expectation-maximization (EM) algorithm. The algorithm tackles an interdependency challenge in which accurate docking presupposes a priori knowledge of the CDR conformation, while effective CDR generation relies on accurate docking outputs to guide its design. NanoDesigner approximately doubles the success rate of de novo nanobody designs through continuous refinement of docking and CDR generation.

Scientific contribution We developed a new method for designing and optimizing nanobodies using generative AI. We use an iterative approach to address the problem that CDR design depends on knowledge of the complex formed by the nanobody and its protein target, while accurate prediction of the complex depends on knowledge of the CDRs. In a direct comparison, we show that our method improves on the current state of the art.
Citations: 0
From molecules to data: the emerging impact of chemoinformatics in chemistry
IF 5.7 Zone 2 Chemistry Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date: 2025-08-07 DOI: 10.1186/s13321-025-00978-6
Anup Basnet Chetry, Keisuke Ohto

Chemoinformatics is a rapidly advancing field that integrates chemistry, computer science, and data analysis to enhance the study and application of chemical systems. This interdisciplinary approach leverages computational tools and large datasets to drive innovation in various chemical disciplines, including drug discovery, materials science, and environmental chemistry. Recent advancements in artificial intelligence (AI) and machine learning (ML) have significantly improved the ability to analyze complex datasets, predict molecular properties, and design new compounds. Additionally, the expansion of open-access databases and collaborative platforms has facilitated broader access to chemical data and fostered global research collaboration. Sophisticated molecular modeling techniques, such as multi-scale modeling and free energy calculations, have enhanced the accuracy of predictions, while big data analytics has enabled the extraction of valuable insights from vast datasets. Emerging technologies, including quantum computing, hold promise for further revolutionizing the field by offering new capabilities for simulating and optimizing chemical processes. Despite these advancements, chemoinformatics faces challenges related to data integrity, computational demands, and interdisciplinary collaboration. Addressing these challenges is crucial for the continued growth and effectiveness of chemoinformatics. Overall, the field is poised to play a pivotal role in advancing chemical research and developing innovative solutions to address global challenges.

Scientific contribution This article highlights the growing impact of chemoinformatics in modern chemistry by integrating computational tools with molecular science to enhance data-driven discovery. It explores advancements in machine learning, artificial intelligence, and big data analytics, which improve molecular property predictions and accelerate chemical innovations. The study also discusses key applications in drug design and materials science, demonstrating how chemoinformatics drives efficiency and sustainability in research. Additionally, it outlines future challenges and opportunities, emphasizing the need for improved algorithms, data standardization, and interdisciplinary collaboration. This work contributes to the evolving role of chemoinformatics as a crucial pillar of modern chemical research.

Citations: 0